Search results for: k-means clustering based feature weighting
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 28570

27580 Orphan Node Inclusion Protocol for Wireless Sensor Network

Authors: Sandeep Singh Waraich

Abstract:

A wireless sensor network (WSN) consists of a large number of sensor nodes. The disparity in their energy consumption usually leads to a loss of equilibrium in the network, which may in turn result in an energy hole problem. In this paper, we consider the inclusion of orphan nodes that usually remain unutilized as intermediate nodes in multi-hop routing. The Orphan Node Inclusion (ONI) Protocol lets cluster members bring orphan nodes into their clusters, thereby saving important resources and increasing network lifetime in critical applications of WSNs.

Keywords: wireless sensor network, orphan node, clustering, ONI protocol

Procedia PDF Downloads 398
27579 Revisiting the Swadesh Wordlist: How Long Should It Be

Authors: Feda Negesse

Abstract:

One of the most important indicators of research quality is a good data-collection instrument that can yield reliable and valid data. The Swadesh wordlist has been used for more than half a century for collecting data in comparative and historical linguistics, though arbitrariness is observed in its application and size. This research compares the classification results of the 100-item Swadesh wordlist with those of its subsets to determine whether reducing the size of the wordlist impacts its effectiveness. In the comparison, the 100, 50 and 40 wordlists were used to compute lexical distances of 29 Cushitic and Semitic languages spoken in Ethiopia and neighbouring countries. Gabmap, a web-based application, was employed to compute the lexical distances and to divide the languages into related clusters. The study shows that the subsets are not as effective as the 100 wordlist in clustering languages into smaller subgroups, but they are equally effective in dividing languages into bigger groups such as subfamilies. It is noted that the subsets may lead to an erroneous classification whereby unrelated languages by chance form a cluster that is not attested by a comparative study. The chance of getting a wrong result is higher when the subsets are used to classify languages that are not closely related. Though further study is still needed to settle the issues around the size of the Swadesh wordlist, this study indicates that the 50 and 40 wordlists cannot be recommended as reliable substitutes for the 100 wordlist under all circumstances. The choice seems to be determined by the objective of the researcher and the degree of affiliation among the languages to be classified.

Keywords: classification, Cushitic, Swadesh, wordlist

Procedia PDF Downloads 281
27578 Study of Mobile Game Addiction Using Electroencephalography Data Analysis

Authors: Arsalan Ansari, Muhammad Dawood Idrees, Maria Hafeez

Abstract:

Use of mobile phones has been increasing considerably over the past decade. Currently, it is one of the main sources of communication and information. Initially, mobile phones were limited to calls and messages, but with the advent of new technology, smartphones came to be used for many other purposes, including video games. Despite positive outcomes, addiction to video games on mobile phones has become a leading cause of psychological and physiological problems among many people. Several researchers have examined different aspects of behavioral addiction using different scales. The objective of this study is to examine any distinction between mobile game addicted and non-addicted players with the use of electroencephalography (EEG), based upon psycho-physiological indicators. The mobile players were asked to play a mobile game, and EEG signals were recorded by BIOPAC equipment with AcqKnowledge as data acquisition software. Electrodes were placed following the 10-20 system. EEG was recorded at a sampling rate of 200 samples/sec (12,000 samples/min). EEG recordings were obtained from the frontal (Fp1, Fp2), parietal (P3, P4), and occipital (O1, O2) lobes of the brain. The frontal lobe is associated with behavioral control, personality, and emotions. The parietal lobe is involved in perception, understanding logic, and arithmetic. The occipital lobe plays a role in visual tasks. For this study, a 60-second time window was chosen for analysis. Preliminary analysis of the signals was carried out with the AcqKnowledge software of BIOPAC Systems. From the survey based on CGS manual study 2010, it was concluded that five participants out of fifteen were in the addictive category. This was used as prior information to group the addicted and non-addicted players by physiological analysis. Statistical analysis showed that, by applying a clustering analysis technique, the authors were able to categorize the addicted and non-addicted players, specifically on the theta frequency range of the occipital area.
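
The band-limited analysis mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the 4-8 Hz theta range, the function name band_power, and the synthetic test signals are assumptions made for illustration, and a plain discrete Fourier transform stands in for whatever the acquisition software computes. Only the 200 samples/sec rate and the 60-second window come from the abstract.

```python
import math

FS = 200  # sampling rate stated in the abstract, in samples per second

def band_power(signal, fs, f_lo, f_hi):
    """Mean squared DFT magnitude over the bins whose frequency lies in [f_lo, f_hi]."""
    n = len(signal)
    total, count = 0.0, 0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
            im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
            total += (re * re + im * im) / (n * n)
            count += 1
    return total / count if count else 0.0

# A 60 s window at 200 samples/s holds 12,000 samples.
samples_per_window = FS * 60

# Synthetic 2 s signals (hypothetical data): a 6 Hz tone inside the assumed
# theta range and a 20 Hz tone outside it.
n = 2 * FS
theta_like = [math.sin(2 * math.pi * 6 * i / FS) for i in range(n)]
beta_like = [math.sin(2 * math.pi * 20 * i / FS) for i in range(n)]

theta_power_a = band_power(theta_like, FS, 4.0, 8.0)
theta_power_b = band_power(beta_like, FS, 4.0, 8.0)
```

A per-channel value such as theta_power_a, computed for the occipital electrodes of each player, is the kind of feature a clustering step could then group.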

Keywords: mobile game, addiction, psycho-physiology, EEG analysis

Procedia PDF Downloads 143
27577 Diversity Indices as a Tool for Evaluating Quality of Water Ways

Authors: Khadra Ahmed, Khaled Kheireldin

Abstract:

In this paper, we present a pedestrian detection descriptor called Fused Structure and Texture (FST) features, based on the combination of local phase information with texture features. Since the phase of the signal conveys more structural information than the magnitude, the phase congruency concept is used to capture the structural features. On the other hand, the Center-Symmetric Local Binary Pattern (CSLBP) approach is used to capture the texture information of the image. The dimensionless quantity of the phase congruency and the robustness of the CSLBP operator on flat images, as well as to blur and illumination changes, make the proposed descriptor more robust and less sensitive to light variations. The proposed descriptor is formed by extracting the phase congruency and the CSLBP values of each pixel of the image with respect to its neighborhood. The histogram of the oriented phase and the histogram of the CSLBP values for the local regions in the image are computed and concatenated to construct the FST descriptor. Several experiments were conducted on the INRIA and the low-resolution DaimlerChrysler datasets to evaluate the detection performance of the pedestrian detection system based on the FST descriptor. A linear Support Vector Machine (SVM) is used to train the pedestrian classifier. These experiments showed that the proposed FST descriptor has better detection performance than a set of state-of-the-art feature extraction methodologies.
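
The texture half of the descriptor can be sketched as follows. This is a minimal illustration of the general CS-LBP operator on an 8-neighbourhood of radius 1, not the authors' implementation: the function names, the default threshold, and the list-of-lists image format are assumptions, and the phase congruency part is not covered.

```python
def cs_lbp(image, threshold=0.0):
    """Return CS-LBP codes (0..15) for the interior pixels of a 2-D list of intensities."""
    # Eight neighbours in circular order; neighbour i and neighbour i + 4 are opposite.
    offsets = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            code = 0
            for i in range(4):
                dr1, dc1 = offsets[i]
                dr2, dc2 = offsets[i + 4]
                diff = image[r + dr1][c + dc1] - image[r + dr2][c + dc2]
                if diff > threshold:
                    code |= 1 << i  # one bit per centre-symmetric pair
            row_codes.append(code)
        codes.append(row_codes)
    return codes

def histogram(codes, bins=16):
    """Count how often each CS-LBP code occurs in a region."""
    hist = [0] * bins
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist

flat_codes = cs_lbp([[5, 5, 5], [5, 5, 5], [5, 5, 5]])
ramp_codes = cs_lbp([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because only four opposite pairs are compared, the histogram has 16 bins per region, which keeps the concatenated descriptor compact.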

Keywords: planktons, diversity indices, water quality index, water ways

Procedia PDF Downloads 496
27576 Societal Acceptance of Trombe Wall in Buildings in Mediterranean Region: A Case Cyprus

Authors: Soad Abokhamis Mousavi

Abstract:

The Trombe wall is an ancient technique that continues to serve as an effective feature of a passive solar system. However, in practice, architects and their clients are not opting for the Trombe wall because of the view of the Trombe wall on the facades of the buildings. Therefore, this study has two main goals, and one of the goals is to find out why the Trombe wall is not considered in the buildings in the Mediterranean region. And the second goal is to find a solution to facilitate the societal acceptance of the Trombe walls in buildings. To cover the goals, the present work attempts to develop and design a different Trombe Wall with different Materials and views in the facades of the buildings. A qualitative data method was used in this article. The qualitative method was developed based on observation and questionnaires with different clients and expert architects in the selected region. Results indicate that the view of the Trombe wall in the facade of buildings can be used with different designs in order to not affect the beauty of the buildings.

Keywords: trombe wall, societal acceptance, building, energy efficacy

Procedia PDF Downloads 64
27575 Challenges and Opportunities: One Stop Processing for the Automation of Indonesian Large-Scale Topographic Base Map Using Airborne LiDAR Data

Authors: Elyta Widyaningrum

Abstract:

LiDAR data acquisition has been recognized as one of the fastest solutions for providing the base data for topographic base mapping in Indonesia. The challenge of accelerating the provision of large-scale topographic base maps as a basis for development planning gives the opportunity to implement an automated scheme in the map production process. One stop processing will also help accelerate map provision, especially in conforming with the Indonesian fundamental spatial data catalog derived from ISO 19110 and with geospatial database integration. Thus, automated LiDAR classification, DTM generation and feature extraction will be conducted in one GIS-software environment to form all layers of the topographic base maps. The quality of the automated topographic base map will be assessed and analyzed based on its completeness, correctness, contiguity, consistency and possible customization.

Keywords: automation, GIS environment, LiDAR processing, map quality

Procedia PDF Downloads 346
27574 Modern Detection and Description Methods for Natural Plants Recognition

Authors: Masoud Fathi Kazerouni, Jens Schlemper, Klaus-Dieter Kuhnert

Abstract:

Green planet is one of the Earth’s names which is known as a terrestrial planet and also can be named the fifth largest planet of the solar system as another scientific interpretation. Plants do not have a constant and steady distribution all around the world, and even plant species’ variations are not the same in one specific region. Presence of plants is not only limited to one field like botany; they exist in different fields such as literature and mythology and they hold useful and inestimable historical records. No one can imagine the world without oxygen which is produced mostly by plants. Their influences become more manifest since no other live species can exist on earth without plants as they form the basic food staples too. Regulation of water cycle and oxygen production are the other roles of plants. The roles affect environment and climate. Plants are the main components of agricultural activities. Many countries benefit from these activities. Therefore, plants have impacts on political and economic situations and future of countries. Due to importance of plants and their roles, study of plants is essential in various fields. Consideration of their different applications leads to focus on details of them too. Automatic recognition of plants is a novel field to contribute other researches and future of studies. Moreover, plants can survive their life in different places and regions by means of adaptations. Therefore, adaptations are their special factors to help them in hard life situations. Weather condition is one of the parameters which affect plants life and their existence in one area. Recognition of plants in different weather conditions is a new window of research in the field. Only natural images are usable to consider weather conditions as new factors. Thus, it will be a generalized and useful system. In order to have a general system, distance from the camera to plants is considered as another factor. 
The other factor considered is the change of light intensity in the environment, as it changes during the day. Adding these factors makes it a considerable challenge to devise an accurate and secure system. Development of an efficient plant recognition system is essential and effective. One important component of a plant is the leaf, which can be used to implement automatic systems for plant recognition without any human interface or interaction. Due to the nature of the images used, a characteristic investigation of plants is carried out. Leaves are the first characteristics to be selected as trustworthy parts. Four different plant species are specified, with the goal of classifying them with an accurate system. The current paper is devoted to the principal directions of the proposed methods and implemented system, the image dataset, and the results. The procedure of the algorithm and classification is explained in detail. The first steps, feature detection and description of visual information, are performed using Scale-invariant feature transform (SIFT), HARRIS-SIFT, and FAST-SIFT methods. The accuracy of the implemented methods is computed. In addition to the comparison, the robustness and efficiency of the results in different conditions are investigated and explained.

Keywords: SIFT combination, feature extraction, feature detection, natural images, natural plant recognition, HARRIS-SIFT, FAST-SIFT

Procedia PDF Downloads 256
27573 Tumor Size and Lymph Node Metastasis Detection in Colon Cancer Patients Using MR Images

Authors: Mohammadreza Hedyehzadeh, Mahdi Yousefi

Abstract:

Colon cancer is one of the most common cancers, and its prevalence is predicted to increase due to poor eating habits. Nowadays, because people are busy, the consumption of fast food is increasing, and therefore the diagnosis of this disease and its treatment are of particular importance. To determine the best treatment approach for each specific colon cancer patient, the oncologist must know the stage of the tumor. The most common method to determine the tumor stage is the TNM staging system. In this system, M indicates the presence of metastasis, N indicates the extent of spread to the lymph nodes, and T indicates the size of the tumor. To determine all three of these parameters, an imaging method must be used, and the gold standard imaging protocols for this purpose are CT and PET/CT. In CT imaging, due to the use of X-rays, the risk of cancer and the absorbed dose of the patient are high, while for PET/CT, access to the device is limited by its high cost. Therefore, in this study, we aimed to estimate the tumor size and the extent of its spread to the lymph nodes using MR images. More than 1300 MR images were collected from the TCIA portal, and in the first step (pre-processing), histogram equalization to improve image quality and resizing to obtain a uniform image size were performed. Two expert radiologists, who have worked for more than 21 years on colon cancer cases, segmented the images and extracted the tumor region from the images. The next step is feature extraction from the segmented images, followed by classification of the data into three classes: T0N0, T3N1 and T3N2. In this article, the VGG-16 convolutional neural network has been used to perform both of the above-mentioned tasks, i.e., feature extraction and classification. This network has 13 convolution layers for feature extraction and three fully connected layers with the softmax activation function for classification.
In order to validate the proposed method, the 10-fold cross validation method was used in such a way that the data was randomly divided into three parts: training (70% of data), validation (10% of data) and the rest for testing. This is repeated 10 times; each time, the accuracy, sensitivity and specificity of the model are calculated, and the average of the ten repetitions is reported as the result. The accuracy, specificity and sensitivity of the proposed method on the testing dataset were 89.09%, 95.8% and 96.4%, respectively. Compared to previous studies, the use of a safe imaging technique (MRI) and the avoidance of predefined hand-crafted imaging features to determine the stage of colon cancer patients are among the advantages of this study.
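
The validation scheme described above can be sketched as a repeated random split. Note that, as described, each repetition draws a fresh 70/10/20 partition, which differs from textbook k-fold cross validation with ten disjoint folds. The function names and the evaluate callback below are hypothetical placeholders; no model training is shown.

```python
import random

def split_indices(n_samples, train_pct=70, val_pct=10, seed=None):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def repeated_evaluation(n_samples, evaluate, repeats=10):
    """Average a score over several independent random splits.

    `evaluate` is a hypothetical callback taking (train, val, test) index
    lists and returning one number, for example test accuracy.
    """
    scores = [evaluate(*split_indices(n_samples, seed=rep)) for rep in range(repeats)]
    return sum(scores) / len(scores)

# Usage with a stand-in callback that only reports the test-set fraction.
train, val, test = split_indices(1300, seed=0)
mean_score = repeated_evaluation(1300, lambda tr, va, te: len(te) / 1300)
```

Using integer percentages avoids floating-point rounding when computing the part sizes.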

Keywords: colon cancer, VGG-16, magnetic resonance imaging, tumor size, lymph node metastasis

Procedia PDF Downloads 42
27572 Methodology to Achieve Non-Cooperative Target Identification Using High Resolution Range Profiles

Authors: Olga Hernán-Vega, Patricia López-Rodríguez, David Escot-Bocanegra, Raúl Fernández-Recio, Ignacio Bravo

Abstract:

Non-Cooperative Target Identification has become a key research domain in the Defense industry since it provides the ability to recognize targets at long distance and under any weather condition. High Resolution Range Profiles, one-dimensional radar images where the reflectivity of a target is projected onto the radar line of sight, are widely used for identification of flying targets. According to that, to face this problem, an approach to Non-Cooperative Target Identification based on the exploitation of Singular Value Decomposition to a matrix of range profiles is presented. Target Identification based on one-dimensional radar images compares a collection of profiles of a given target, namely test set, with the profiles included in a pre-loaded database, namely training set. The classification is improved by using Singular Value Decomposition since it allows to model each aircraft as a subspace and to accomplish recognition in a transformed domain where the main features are easier to extract hence, reducing unwanted information such as noise. Singular Value Decomposition permits to define a signal subspace which contain the highest percentage of the energy, and a noise subspace which will be discarded. This way, only the valuable information of each target is used in the recognition process. The identification algorithm is based on finding the target that minimizes the angle between subspaces and takes place in a transformed domain. Two metrics, F1 and F2, based on Singular Value Decomposition are accomplished in the identification process. In the case of F2, the angle is weighted, since the top vectors set the importance in the contribution to the formation of a target signal, on the contrary F1 simply shows the evolution of the unweighted angle. 
In order to have a wide database of radar signatures and evaluate the performance, range profiles are obtained through numerical simulation of seven civil aircraft at defined trajectories taken from an actual measurement. Taking into account the nature of the datasets, the main drawback of using simulated profiles instead of actual measured profiles is that the former imply an ideal identification scenario, since measured profiles suffer from noise, clutter and other unwanted information and simulated profiles do not. In this case, the test and training samples have a similar nature and usually a similarly high signal-to-noise ratio, so to assess the feasibility of the approach, noise is added before the creation of the test set. The identification results obtained with the unweighted and weighted metrics are analysed to demonstrate which algorithm provides the best robustness against noise in a realistic scenario. To confirm the validity of the methodology, identification experiments on profiles coming from electromagnetic simulations are conducted, revealing promising results. Considering the dissimilarities between the test and training sets when noise is added, the recognition performance improves when weighting is applied. Future experiments with larger sets are expected to be conducted with the aim of finally using actual profiles as test sets in a real hostile situation.
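
The idea of identifying a target by the smallest angle between subspaces can be sketched in a reduced form. The example below is a simplification under stated assumptions: each subspace is cut down to its single dominant left singular vector, found by power iteration, and only an unweighted angle is computed. The metrics F1 and F2 are not specified in enough detail here to reproduce, and all function names and toy profiles are hypothetical.

```python
import math

def dominant_direction(profiles, iters=200):
    """Approximate the top left singular vector of the matrix whose columns
    are the given range profiles, using power iteration on A A^T.

    Assumes non-negative profiles, so the uniform start vector is not
    orthogonal to the dominant direction.
    """
    dim = len(profiles[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # w = A (A^T v): project v onto every profile, then recombine.
        coeffs = [sum(p[i] * v[i] for i in range(dim)) for p in profiles]
        w = [sum(c * p[i] for c, p in zip(coeffs, profiles)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def angle_between(u, v):
    """Angle in radians between two unit vectors, ignoring their sign."""
    cos = abs(sum(a * b for a, b in zip(u, v)))
    return math.acos(min(1.0, cos))

def identify(test_profiles, database):
    """Return the database key whose dominant direction is closest to the test set's."""
    test_dir = dominant_direction(test_profiles)
    return min(database,
               key=lambda name: angle_between(test_dir, dominant_direction(database[name])))

# Toy three-bin profiles for two hypothetical targets.
database = {
    "A": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "B": [[0.0, 0.1, 1.0], [0.0, 0.2, 0.9]],
}
best_match = identify([[1.0, 0.15, 0.05]], database)
```

A fuller version would keep several singular vectors per target and compute principal angles between the resulting subspaces, which needs a complete SVD routine.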

Keywords: HRRP, NCTI, simulated/synthetic database, SVD

Procedia PDF Downloads 336
27571 Assessing the Utility of Unmanned Aerial Vehicle-Borne Hyperspectral Image and Photogrammetry Derived 3D Data for Wetland Species Distribution Quick Mapping

Authors: Qiaosi Li, Frankie Kwan Kit Wong, Tung Fung

Abstract:

A lightweight unmanned aerial vehicle (UAV) loaded with novel sensors offers a low-cost approach to data acquisition in complex environments. This study established a framework for applying a UAV system to quick mapping in complex environments and assessed the performance of a UAV-based hyperspectral image and a digital surface model (DSM) derived from photogrammetric point clouds for the classification of 13 species in the wetland area of the Mai Po Inner Deep Bay Ramsar Site, Hong Kong. The study area was part of a shallow bay with flat terrain, and the major species included reedbed and four mangroves: Kandelia obovata, Aegiceras corniculatum, Acrostichum aureum and Acanthus ilicifolius. Other species included various graminaceous plants, arbor, shrub and the invasive species Mikania micrantha. In particular, the invasive species climbed up to the mangrove canopy and caused damage and morphology change, which might make the species harder to distinguish. Hyperspectral images were acquired by a Headwall Nano sensor with a spectral range from 400 nm to 1000 nm and a spatial resolution of 0.06 m. A sequence of multi-view RGB images was captured with 0.02 m spatial resolution and 75% overlap. The hyperspectral image was corrected for radiative and geometric distortion, while the high-resolution RGB images were matched to generate maximally dense point clouds. Furthermore, a 5 cm grid digital surface model (DSM) was derived from the dense point clouds. Multiple feature reduction methods were compared to identify the most efficient method and to explore the significant spectral bands for distinguishing different species. The examined methods included stepwise discriminant analysis (DA), support vector machine (SVM) and minimum noise fraction (MNF) transformation.
Subsequently, spectral subsets composed of the 20 most important bands extracted by SVM, DA and MNF, and multi-source subsets adding the DSM to the 20 spectral bands, served as input to the maximum likelihood classifier (MLC) and the SVM classifier to compare the classification results. Classification results showed that the feature reduction methods, from best to worst, are MNF transformation, DA and SVM. MNF transformation accuracy was even higher than the result obtained with all bands as input. Selected bands frequently lay along the green peak, red edge and near infrared. Additionally, DA found that the chlorophyll absorption red band and the yellow band were also important for species classification. In terms of 3D data, the DSM enhanced the capacity to discriminate among low plants, arbor and mangrove. Meanwhile, the DSM largely reduced misclassification due to the shadow effect and morphological variation between species. With respect to the classifier, the nonparametric SVM outperformed MLC for high-dimensional and multi-source data in this study. The SVM classifier tended to produce higher overall accuracy and reduce scattered patches, although it takes more time than MLC. The best result was obtained by combining MNF components and DSM in the SVM classifier. This study offered a precise species distribution survey solution for inaccessible wetland areas with a low cost in time and labour. In addition, the findings on the positive effect of the DSM and on spectral feature identification indicate that UAV-borne hyperspectral data and photogrammetry-derived 3D data are promising for further research on wetland species, such as bio-parameter modelling and biological invasion monitoring.

Keywords: digital surface model (DSM), feature reduction, hyperspectral, photogrammetric point cloud, species mapping, unmanned aerial vehicle (UAV)

Procedia PDF Downloads 238
27570 Retrieving Similar Segmented Objects Using Motion Descriptors

Authors: Konstantinos C. Kartsakalis, Angeliki Skoura, Vasileios Megalooikonomou

Abstract:

The fuzzy composition of objects depicted in images acquired through MR imaging or the use of bio-scanners has often been a point of controversy for field experts attempting to effectively delineate between the visualized objects. Modern approaches in medical image segmentation tend to consider fuzziness as a characteristic and inherent feature of the depicted object, instead of an undesirable trait. In this paper, a novel technique for efficient image retrieval in the context of images in which segmented objects are either crisp or fuzzily bounded is presented. Moreover, the proposed method is applied in the case of multiple, even conflicting, segmentations from field experts. Experimental results demonstrate the efficiency of the suggested method in retrieving similar objects from the aforementioned categories while taking into account the fuzzy nature of the depicted data.

Keywords: fuzzy object, fuzzy image segmentation, motion descriptors, MRI imaging, object-based image retrieval

Procedia PDF Downloads 357
27569 Numerical Modelling of Effective Diffusivity in Bone Tissue Engineering

Authors: Ayesha Sohail, Khadija Maqbool, Anila Asif, Haroon Ahmad

Abstract:

The field of tissue engineering is an active area of research. Bone tissue engineering helps to resolve the clinical problems of critical size and non-healing defects by the creation of man-made bone tissue. We will design and validate an efficient numerical model, which will simulate the effective diffusivity in bone tissue engineering. Our numerical model will be based on the finite element analysis of the diffusion-reaction equations. It will have the ability to optimize the diffusivity, even at multi-scale, with the variation of time. It will also have a special feature, with which we will not only be able to predict the oxygen, glucose and cell density dynamics, more accurately, but will also sort the issues arising due to anisotropy. We will fix these problems with the help of modifying the governing equations, by selecting appropriate spatio-temporal finite element schemes, by adaptive grid refinement strategy and by transient analysis.

Keywords: scaffolds, porosity, diffusion, transient analysis

Procedia PDF Downloads 519
27568 Title: Real World Evidence a Tool to Overcome the Lack of a Comparative Arm in Drug Evaluation in the Context of Rare Diseases

Authors: Mohamed Wahba

Abstract:

Objective: To build a comparative arm for product (X) in specific gene mutated advanced gastrointestinal cancer using real world evidence to fulfill HTA requirements in drug evaluation. Methods: Data for product (X) were collected from a phase II clinical trial, while real-world data for (Y) and (Z) were collected from a US database. Real-world (RW) cohorts were matched to clinical trial baseline characteristics using the weighting by odds method. Outcomes included progression-free survival (PFS) and overall survival (OS) rates. Study location and participants: International (product X, n=80) and USA (products Y and Z, n=73). Results: Two comparisons were made: trial cohort 1 (X) versus real-world cohort 1 (Z), and trial cohort 2 (X) versus real-world cohort 2 (Y). For first line, the median OS was 9.7 months (95% CI 8.6-11.5) and the median PFS was 5.2 months (95% CI 4.7-not reached) for real-world cohort 1. For second line, the median OS was 10.6 months (95% CI 4.7-27.3) and the median PFS was 5.0 months (95% CI 2.1-29.3) for real-world cohort 2. Results were statistically significant for the OS analysis but not for the PFS analysis. Conclusion: This study provided the clinical comparative outcomes needed for HTA evaluation.
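
Weighting by odds can be illustrated with a short sketch. It assumes the common formulation in which trial patients keep a weight of 1 and each real-world patient is weighted by p / (1 - p), where p is an estimated probability of belonging to the trial given baseline characteristics. How the study estimated p is not stated, so the propensity values below are hypothetical inputs and the function names are illustrative.

```python
def odds_weights(propensity, in_trial):
    """Weights that make a real-world cohort resemble the trial cohort.

    propensity[i] is the estimated probability that patient i belongs to the
    trial given baseline covariates; in_trial[i] marks actual trial patients.
    """
    weights = []
    for p, trial in zip(propensity, in_trial):
        if trial:
            weights.append(1.0)            # trial patients are the reference
        else:
            weights.append(p / (1.0 - p))  # odds of trial membership
    return weights

def weighted_mean(values, weights):
    """Weighted average of an outcome or baseline characteristic."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical propensities: one trial patient and two real-world patients.
example_weights = odds_weights([0.5, 0.8, 0.2], [True, False, False])
```

Real-world patients who look like trial patients (high p) receive large weights, so weighted summaries of the comparator cohort approximate what would be seen in a trial-like population.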

Keywords: real world evidence, pharmacoeconomics, HTA agencies, oncology

Procedia PDF Downloads 74
27567 Spectrogram Pre-Processing to Improve Isotopic Identification to Discriminate Gamma and Neutrons Sources

Authors: Mustafa Alhamdi

Abstract:

An industrial application for classifying gamma ray and neutron events is investigated in this study using deep machine learning. Identification using a convolutional neural network and a recursive neural network has shown a significant improvement in prediction accuracy in a variety of applications. The ability to identify the isotope type and activity from spectral information depends on feature extraction methods, followed by classification. The features extracted from the spectrum profiles capture patterns and relationships that represent the actual spectrum energy in a low-dimensional space. Increasing the level of separation between classes in feature space makes it possible to enhance classification accuracy. Feature extraction by a neural network is nonlinear and involves a variety of transformations and mathematical optimization, while principal component analysis depends on linear transformations to extract features and subsequently improve the classification accuracy. In this paper, the isotope spectrum information has been preprocessed by finding the frequency components relative to time and using them as a training dataset. The Fourier transform implementation used to extract the frequency components has been optimized by a suitable windowing function. Training and validation samples of different isotope profiles interacting with a CdTe crystal have been simulated using Geant4. The readout electronic noise has been simulated by optimizing the mean and variance of a normal distribution. Ensemble learning, combining the votes of many models, improved the classification accuracy of the neural networks. Discrimination of gamma and neutron events in a single prediction approach using deep machine learning has shown high accuracy. The findings show that classification accuracy can be improved by applying the spectrogram preprocessing stage to the gamma and neutron spectra of different isotopes.
Tuning deep machine learning models by hyperparameter optimization of neural network models enhanced the separation in the latent space and provided the ability to extend the number of detected isotopes in the training database. Ensemble learning contributed significantly to improve the final prediction.
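
The spectrogram preprocessing stage can be sketched as a windowed short-time Fourier transform. This is an illustration only: the Hann window, the frame length, the hop size, and the function names are assumptions, since the abstract says only that a suitable windowing function was used. A naive discrete Fourier transform keeps the example dependency-free.

```python
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram(signal, frame_len=32, hop=16):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            re = sum(x * math.cos(2 * math.pi * k * i / frame_len) for i, x in enumerate(seg))
            im = sum(x * math.sin(2 * math.pi * k * i / frame_len) for i, x in enumerate(seg))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames

# Hypothetical input: a tone with exactly 4 cycles per 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * i / 32) for i in range(128)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

The resulting frames-by-bins array is the kind of two-dimensional input a convolutional network can consume in place of the raw one-dimensional spectrum.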

Keywords: machine learning, nuclear physics, Monte Carlo simulation, noise estimation, feature extraction, classification

Procedia PDF Downloads 133
27566 Medical Neural Classifier Based on Improved Genetic Algorithm

Authors: Fadzil Ahmad, Noor Ashidi Mat Isa

Abstract:

This study introduces an improved genetic algorithm procedure that focuses the search around near-optimal solutions corresponding to a group of elite chromosomes. This is achieved through a novel crossover technique known as Segmented Multi Chromosome Crossover. It preserves the highly important information contained in a gene segment of an elite chromosome and allows an offspring to carry information from the gene segments of multiple chromosomes. In this way, the algorithm has a better chance of exploring the solution space effectively. The improved GA is applied to the automatic and simultaneous parameter optimization and feature selection of an artificial neural network for pattern recognition in medical problems, namely cancer and diabetes. The experimental results show that the average classification accuracy on the cancer and diabetes datasets improved by 0.1% and 0.3%, respectively, using the new algorithm.
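
One possible reading of a segment-wise, multi-parent crossover is sketched below. The operator's actual details are not given here, so this is a guess at the general idea rather than the authors' Segmented Multi Chromosome Crossover: the fixed segment length, the uniform random choice of donor, and the function name are all assumptions.

```python
import random

def segmented_multi_crossover(elites, segment_len, rng=random):
    """Build one offspring by copying each gene segment from a randomly
    chosen elite chromosome, keeping every segment intact."""
    length = len(elites[0])
    child = []
    for start in range(0, length, segment_len):
        donor = rng.choice(elites)  # any elite may donate this segment
        child.extend(donor[start:start + segment_len])
    return child

# Hypothetical elite group of three 8-gene chromosomes.
elites = [[0] * 8, [1] * 8, [2] * 8]
child = segmented_multi_crossover(elites, segment_len=4, rng=random.Random(1))
```

Copying whole segments rather than single genes is what would let co-adapted groups of parameters or feature flags survive from an elite parent into the offspring.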

Keywords: genetic algorithm, artificial neural network, pattern classification, classification accuracy

Procedia PDF Downloads 452
27565 Analyzing the Influence of Hydrometeorological Extremes, Geological Setting, and Social Demographic on Public Health

Authors: Irfan Ahmad Afip

Abstract:

The main objective of this research is to accurately identify the likelihood and severity of a Leptospirosis outbreak in a given area by feeding its input features into a multivariate regression model. The research question is whether the possibility of an outbreak in a specific area is influenced by features such as social demographics and hydrometeorological extremes. If the occurrence of an outbreak depends on these features, then the epidemic severity for an area will differ depending on its environmental setting, because the features will influence the possibility and severity of an outbreak. Specifically, the research objective is three-fold: (a) to identify the relevant multivariate features and visualize the patterns in the data, (b) to develop a multivariate regression model based on the selected features and determine the possibility of a Leptospirosis outbreak in an area, and (c) to compare the predictive ability of the multivariate regression model and machine learning algorithms. Several secondary data features were collected from locations in the state of Negeri Sembilan, Malaysia, based on the possibility that they would be relevant to determining the outbreak severity in the area. The relevant features then become inputs to a multivariate regression model; a linear regression model is a simple and quick solution for creating prognostic capabilities. Multivariate regression models have shown more precise prognostic capabilities than univariate models. The expected outcome of this research is to establish a correlation between social demographic and hydrometeorological features and Leptospirosis bacteria; it will also contribute to understanding the underlying relationship between the pathogen and the ecosystem. The relationship established can help health departments or urban planners to inspect and prepare for future outcomes in event detection and system health monitoring.

Keywords: geographical information system, hydrometeorological, leptospirosis, multivariate regression

Procedia PDF Downloads 96
27564 Hydrochemical Contamination Profiling and Spatial-Temporal Mapping with the Support of Multivariate and Cluster Statistical Analysis

Authors: Sofia Barbosa, Mariana Pinto, José António Almeida, Edgar Carvalho, Catarina Diamantino

Abstract:

The aim of this work was to test a methodology able to generate spatial-temporal maps that simultaneously synthesize the trends of distinct hydrochemical indicators in an old radium-uranium tailings dam deposit. Dimensionality reduction by principal component analysis, followed by data aggregation through cluster analysis, makes it possible to identify distinct hydrochemical behavioural profiles and to generate synthetic evolutionary hydrochemical maps.
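
The clustering step named in the keywords (K-means of PCA scores) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the score values, the choice of k = 2, the seeding rule, and the function name `kmeans` are all hypothetical, and the PCA scores are assumed to have been computed beforehand.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an arbitrary choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical PCA scores (PC1, PC2) for six water samples.
scores = [(-2.1, 0.3), (2.0, 0.1), (-1.9, -0.2), (2.2, -0.3), (-2.0, 0.0), (1.8, 0.4)]
labels, centroids = kmeans(scores, k=2)
```

Each resulting label could then be attached to the sample's location and date to draw one map per hydrochemical profile, which is the kind of synthesis the abstract describes.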

Keywords: Contamination plume migration, K-means of PCA scores, groundwater and mine water monitoring, spatial-temporal hydrochemical trends

Procedia PDF Downloads 202
27563 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm

Authors: Sukhleen Kaur

Abstract:

Intrusion detection systems (IDS) have become an important component of security infrastructure and have received increasing attention in recent years. An IDS is one of the effective ways to detect different kinds of attacks and malicious code in a network and thus helps to secure it. Data mining techniques can be applied to IDS to analyse large amounts of data and give better results. Data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. Studies so far have concentrated on finding attacks, whereas this paper detects malicious files. Some intruders do not attack directly; instead they hide harmful code inside files, or corrupt those files, and attack the system through them. Such files are detected according to defined parameters, which produce two lists: normal files and harmful files. Data mining is then performed. In this paper, a hybrid classifier combining the Naive Bayes and Ripper classification methods is used. The results show how a file uploaded to the database is tested against the parameters and characterised as either normal or harmful, after which mining is performed. Moreover, when a user tries to mine a harmful file, the system raises an exception stating that mining cannot be performed on corrupted or harmful files.

Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper

Procedia PDF Downloads 397
27562 One-Class Support Vector Machine for Sentiment Analysis of Movie Review Documents

Authors: Chothmal, Basant Agarwal

Abstract:

Sentiment analysis classifies a given review document as positive or negative in polarity. Research on sentiment analysis has increased tremendously in recent times owing to its many applications in industry and academia. Sentiment analysis models can be used to determine a user's opinion of any entity or product. E-commerce companies can use sentiment analysis models to improve their products on the basis of users’ opinions. In this paper, we propose a new One-class Support Vector Machine (One-class SVM) based sentiment analysis model for movie review documents. In the proposed approach, we first extract features from one class of documents and then use the one-class SVM model to test whether a new document lies within the model or is an outlier. Experimental results show the effectiveness of the proposed sentiment analysis model.

Keywords: feature selection methods, machine learning, NB, one-class SVM, sentiment analysis, support vector machine

Procedia PDF Downloads 491
27561 Epileptic Seizure Prediction Focusing on Relative Change in Consecutive Segments of EEG Signal

Authors: Mohammad Zavid Parvez, Manoranjan Paul

Abstract:

Epilepsy is a common neurological disorder characterized by sudden recurrent seizures. The electroencephalogram (EEG) is widely used to diagnose possible epileptic seizures. Many research works have been devoted to predicting epileptic seizures by analyzing EEG signals. Seizure prediction from EEG signals is a challenging task because brain signals vary between patients. In this paper, we propose a new approach for feature extraction based on phase correlation in EEG signals. In phase correlation, we calculate the relative change between two consecutive segments of an EEG signal and then combine the changes with neighboring signals to extract features. These features are then used to classify preictal/ictal and interictal EEG signals for seizure prediction. Experimental results show that the proposed method achieves a good prediction rate with greater consistency on the benchmark data set across different brain locations compared with existing state-of-the-art methods.
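
As a rough illustration of phase correlation between two consecutive segments, the sketch below uses the textbook one-dimensional formulation (normalized cross-power spectrum followed by an inverse transform). The abstract does not give the authors' exact feature definition, so the function names, the toy segments, and the use of the peak lag as a "relative change" measure are assumptions.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short segments."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def phase_correlation(seg_a, seg_b, eps=1e-12):
    """Return the phase-correlation surface between two equal-length segments."""
    fa, fb = dft(seg_a), dft(seg_b)
    cross = [a.conjugate() * b for a, b in zip(fa, fb)]
    # Keep only the phase of the cross-power spectrum; drop near-zero bins.
    normalised = [c / abs(c) if abs(c) > eps else 0j for c in cross]
    return [v.real for v in dft(normalised, inverse=True)]

# Toy data: the second segment is the first one circularly delayed by 2 samples.
segment_1 = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0]
segment_2 = segment_1[-2:] + segment_1[:-2]
surface = phase_correlation(segment_1, segment_2)
peak_lag = max(range(len(surface)), key=surface.__getitem__)
```

For real EEG data a fast transform would replace the naive `dft`, and the peak height or the whole surface, rather than the lag alone, might serve as the feature.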

Keywords: EEG, epilepsy, phase correlation, seizure

Procedia PDF Downloads 292
27560 Detection and Quantification of Active Pharmaceutical Ingredients as Adulterants in Garcinia cambogia Slimming Preparations Using NIR Spectroscopy Combined with Chemometrics

Authors: Dina Ahmed Selim, Eman Shawky Anwar, Rasha Mohamed Abu El-Khair

Abstract:

A rapid, simple and efficient method with minimal sample treatment was developed for authentication of Garcinia cambogia fruit peel powder, along with determination of undeclared active pharmaceutical ingredients (APIs) in its herbal slimming dietary supplements, using near-infrared spectroscopy combined with chemometrics. Five featured adulterants, namely sibutramine, metformin, orlistat, ephedrine, and theophylline, were selected as target compounds. The near-infrared spectral data matrix of authentic Garcinia cambogia fruit peel and of specimens deliberately adulterated with the five selected APIs was subjected to hierarchical clustering analysis to investigate their grouping pattern. SIMCA models were established to verify the genuineness of Garcinia cambogia fruit peel, which resulted in perfect classification of all tested specimens. Adulterated samples were used to construct PLSR models based on different API contents at minute levels of fraud practices (LOQ < 0.2% w/w). The suggested approach can be applied to enhance and guarantee the safety and quality of Garcinia fruit peel powder as raw material and in dietary supplements.

Keywords: Garcinia cambogia, Quality control, NIR spectroscopy, Chemometrics

Procedia PDF Downloads 62
27559 Logistic Regression Based Model for Predicting Students’ Academic Performance in Higher Institutions

Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu

Abstract:

In recent years, there has been a desire to forecast student academic achievement prior to graduation. The aim is to help students improve their grades, particularly those with poor performance. The goal of this study is to employ supervised learning techniques to construct a predictive model for student academic achievement. Many academics have already constructed models that predict student academic achievement based on factors such as smoking, demography, culture, social media, parent educational background, parent finances, and family background, to name a few. These features and the models employed may not have correctly classified students in terms of their academic performance. The present model is built using a logistic regression classifier with basic features, namely the previous semester's course score, class attendance, class participation, and the total number of course materials or resources the student is able to cover per semester, to predict whether the student will perform well in future on related courses. The model outperformed other classifiers such as Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and AdaBoost, returning 96.7% accuracy. The model is available as a desktop application, allowing both instructors and students to benefit from user-friendly interfaces for predicting student academic achievement. It is therefore recommended that both students and professors use this tool to better forecast outcomes.
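
A logistic regression classifier over the four features listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: the toy rows, the 0..1 scaling, the learning rate, and the function names are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, targets, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, targets):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability that a student with feature vector x performs well."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

# Hypothetical rows: (previous score, attendance, participation, materials covered), scaled to 0..1.
X = [(0.9, 0.95, 0.8, 0.9), (0.8, 0.9, 0.7, 0.8), (0.3, 0.4, 0.2, 0.3), (0.2, 0.5, 0.1, 0.2)]
y = [1, 1, 0, 0]  # 1 = performed well, 0 = did not
weights, bias = train_logistic(X, y)
```

Calling `predict_proba(weights, bias, X[0])` returns a value above 0.5 for the first (high-performing) toy student, while the low-performing rows fall below 0.5.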

Keywords: artificial intelligence, ML, logistic regression, performance, prediction

Procedia PDF Downloads 78
27558 A Real Time Set Up for Retrieval of Emotional States from Human Neural Responses

Authors: Rashima Mahajan, Dipali Bansal, Shweta Singh

Abstract:

Real-time non-invasive brain-computer interfaces play a significant and growing role in restoring or maintaining quality of life for medically challenged people. This manuscript provides a comprehensive review of emerging research in cognitive/affective computing in the context of human neural responses. The perspectives of different emotion assessment modalities, such as facial expressions, speech, text, gestures, and human physiological responses, are also discussed. Particular attention is paid to the ability of EEG (electroencephalogram) signals to portray thoughts, feelings, and unspoken words. An automated workflow-based protocol is proposed for designing an EEG-based real-time Brain Computer Interface system for the analysis and classification of human emotions elicited by external audio/visual stimuli. The front-end hardware includes a cost-effective and portable Emotive EEG Neuroheadset unit, a personal computer, and a set of external stimulators. Primary signal analysis and processing of the EEG acquired in real time shall be performed using the MATLAB-based advanced brain mapping toolboxes EEGLab/BCILab. This shall be followed by the development of a MATLAB-based self-defined algorithm to capture and characterize temporal and spectral variations in EEG under emotional stimulation. The extracted hybrid feature set shall be used to classify emotional states using artificial intelligence tools such as an Artificial Neural Network. The final system would be an inexpensive, portable, and more intuitive real-time Brain Computer Interface that controls prosthetic devices by translating different brain states into operative control signals.

Keywords: brain computer interface, electroencephalogram, EEGLab, BCILab, emotive, emotions, interval features, spectral features, artificial neural network, control applications

Procedia PDF Downloads 301
27557 Genome-Wide Assessment of Putative Superoxide Dismutases in Unicellular and Filamentous Cyanobacteria

Authors: Shivam Yadav, Neelam Atri

Abstract:

Cyanobacteria are photoautotrophic prokaryotes able to grow in diverse ecological habitats; they originated 2.5 - 3.5 billion years ago and introduced oxygenic photosynthesis. Since then, superoxide dismutases (SODs) have acquired great significance owing to their ability to catalyze the detoxification of byproducts of oxygenic photosynthesis, i.e., superoxide radicals. Sequence information from several cyanobacterial genomes offers a unique opportunity to conduct a comprehensive comparative analysis of the superoxide dismutase family. In the present study, we extracted information regarding SODs from species of sequenced cyanobacteria and investigated their diversity, conservation, domain structure, and evolution. 144 putative SOD homologues were identified. SODs are present in all cyanobacterial species, reflecting their significant role in survival. However, their distribution varies: they are fewer in unicellular marine strains and abundant in filamentous nitrogen-fixing cyanobacteria. Motifs and invariant amino acids typical of eukaryotic SODs were well conserved in these proteins. These SODs were classified into three major families according to their domain structures. Interestingly, they lack the additional domains found in proteins of other families. Phylogenetic relationships correspond well with phylogenies based on 16S rRNA, and clustering occurs on the basis of structural characteristics such as domain organization. Similar conserved motifs and amino acids indicate that cyanobacterial SODs use a catalytic mechanism similar to that of eukaryotic SODs. Gene gain-and-loss is insignificant during SOD evolution, as evidenced by the absence of additional domains. This study has not only examined the overall background of sequence-structure-function interactions for the SOD gene family but also revealed variation in SOD distribution based on ecophysiological and morphological characters.

Keywords: comparative genomics, cyanobacteria, phylogeny, superoxide dismutases

Procedia PDF Downloads 115
27556 Recommendations for Data Quality Filtering of Opportunistic Species Occurrence Data

Authors: Camille Van Eupen, Dirk Maes, Marc Herremans, Kristijn R. R. Swinnen, Ben Somers, Stijn Luca

Abstract:

In ecology, species distribution models are commonly implemented to study species-environment relationships. These models increasingly rely on opportunistic citizen science data when high-quality species records collected through standardized recording protocols are unavailable. While these opportunistic data are abundant, uncertainty is usually high, e.g., due to observer effects or a lack of metadata. Data quality filtering is often used to reduce these types of uncertainty in an attempt to increase the value of studies relying on opportunistic data. However, filtering should not be performed blindly. In this study, recommendations are built for data quality filtering of opportunistic species occurrence data that are used as input for species distribution models. Using an extensive database of 5.7 million citizen science records from 255 species in Flanders, the impact on model performance was quantified by applying three data quality filters, and these results were linked to species traits. More specifically, presence records were filtered based on record attributes that provide information on the observation process or post-entry data validation, and changes in the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were analyzed using the Maxent algorithm with and without filtering. Controlling for sample size enabled us to study the combined impact of data quality filtering, i.e., the simultaneous impact of an increase in data quality and a decrease in sample size. Further, the variation among species in their response to data quality filtering was explored by clustering species based on four traits often related to data quality: commonness, popularity, difficulty, and body size.
Findings show that model performance is affected by i) the quality of the filtered data, ii) the proportional reduction in sample size caused by filtering and the remaining absolute sample size, and iii) a species’ ‘quality profile’, resulting from a species classification based on the four traits related to data quality. The findings resulted in recommendations on when and how to filter volunteer-generated and opportunistically collected data. This study confirms that correctly processed citizen science data can make a valuable contribution to ecological research and species conservation.
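
The three performance measures named in the abstract (AUC, sensitivity, specificity) can be computed from predicted suitability scores as sketched below. The labels, scores, and the 0.5 threshold are made-up illustrations, not values from the study, and the helper names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the chance that a random presence outranks a random absence (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical evaluation set: 1 = presence, 0 = absence/background.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Running the same three functions on model output with and without a given filter gives the per-species differences that the study analyzes.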

Keywords: citizen science, data quality filtering, species distribution models, trait profiles

Procedia PDF Downloads 178
27555 A Relationship Extraction Method from Literary Fiction Considering Korean Linguistic Features

Authors: Hee-Jeong Ahn, Kee-Won Kim, Seung-Hoon Kim

Abstract:

Knowledge of the relationships between characters can help readers understand the overall story or plot of a work of literary fiction. In this paper, we present a method for extracting specific relationships between characters from Korean literary fiction. Generally, methods for extracting relationships between characters in text are statistical or computational methods based on the sentence distance between characters and do not consider Korean linguistic features. Furthermore, it is difficult for such methods to extract directed relationships from text, such as one-sided love, because they consider only the weight of a relationship and not its direction. Therefore, in order to identify specific relationships between characters, we propose a statistical method that considers linguistic features, such as syntactic patterns and speech verbs in Korean. The result of our method is represented by a weighted directed graph of the relationships between the characters. Furthermore, we expect that the proposed method could be applied to relationship analysis between characters in other content, such as movies or TV dramas.
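
The weighted directed graph used to represent the result can be held in a nested dictionary, as in the sketch below. The character names, weights, and the function name `build_relationship_graph` are hypothetical; how the weights are derived from syntactic patterns and speech verbs is not specified in the abstract and is not modelled here.

```python
from collections import defaultdict

def build_relationship_graph(observations):
    """Accumulate (source, target, weight) observations into a directed graph."""
    graph = defaultdict(lambda: defaultdict(float))
    for source, target, weight in observations:
        graph[source][target] += weight  # direction matters: source -> target only
    return graph

# Hypothetical observations extracted from sentences.
observations = [
    ("Cheolsu", "Younghee", 1.0),
    ("Cheolsu", "Younghee", 0.5),
    ("Younghee", "Minsu", 2.0),
]
graph = build_relationship_graph(observations)
```

Because edges are stored per direction, a one-sided relationship shows up as an edge from Cheolsu to Younghee with no matching edge back.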

Keywords: data mining, Korean linguistic feature, literary fiction, relationship extraction

Procedia PDF Downloads 357
27554 Traffic Prediction with Raw Data Utilization and Context Building

Authors: Zhou Yang, Heli Sun, Jianbin Huang, Jizhong Zhao, Shaojie Qiao

Abstract:

Traffic prediction is essential in a multitude of ways in modern urban life. Earlier work in this domain has chiefly had two major focuses: (1) the accurate forecast of future values in multiple time series and (2) knowledge extraction from spatial-temporal correlations. However, two key considerations for traffic prediction are often missed: the completeness of raw data and the full context of the prediction timestamp. Concentrating on these two drawbacks of earlier work, we devise an approach that addresses them in a two-phase framework. First, we utilize the raw trajectories to a greater extent by building a VLA table and applying data compression. We obtain the intra-trajectory features with graph-based encoding and the inter-trajectory ones with a grid-based model and the technique of back projection, which restore their surrounding high-resolution spatial-temporal environment. To the best of our knowledge, we are the first to study direct feature extraction from raw trajectories for traffic prediction and to attempt the use of raw data with the least degree of reduction. In the prediction phase, we provide a broader context for the prediction timestamp by taking into account the information surrounding it in the training dataset. Extensive experiments on several well-known datasets have verified the effectiveness of our solution, which combines the strength of raw trajectory data and prediction context. In terms of performance, our approach surpasses several state-of-the-art methods for traffic prediction.

Keywords: traffic prediction, raw data utilization, context building, data reduction

Procedia PDF Downloads 104
27553 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a hybrid classification model based on information theory and Support Vector Machine (SVM), using the air quality data of four cities in China, namely Beijing, Guangzhou, Shanghai and Tianjin, from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified daily air quality into 6 levels, namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent, based on their respective Air Quality Index (AQI) values. Using information theory, information gain (IG) is calculated and feature selection is carried out for both categorical features and continuous numeric features. The SVM algorithm is then applied to the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.
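
The information gain used for feature selection is the class entropy minus the weighted entropy that remains after splitting on a feature. The sketch below shows the calculation for categorical features; the feature names and toy records are hypothetical, and continuous features would first need discretizing, which is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical daily records: two candidate features and the air quality level.
wind = ["calm", "calm", "windy", "windy"]
season = ["winter", "summer", "winter", "summer"]
level = ["Moderate Pollution", "Moderate Pollution", "Good", "Good"]

ig_wind = information_gain(wind, level)      # wind fully determines the level here
ig_season = information_gain(season, level)  # season carries no information here
```

Features would then be ranked by gain and only the top-ranked ones passed to the classifier.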

Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation

Procedia PDF Downloads 217
27552 Adaptive Dehazing Using Fusion Strategy

Authors: M. Ramesh Kanthan, S. Naga Nandini Sujatha

Abstract:

The goal of haze removal algorithms is to enhance and recover the details of a scene from a foggy image. For enhancement, the proposed method focuses on two main categories: (i) image enhancement based on adaptive contrast histogram equalization, and (ii) an image edge-strengthening gradient model. Accurate haze removal algorithms are needed in many circumstances. The de-fog feature works through a complex algorithm that first determines the fog density of the scene, then analyses the obscured image before applying contrast and sharpness adjustments to the video in real time to produce the image. The fusion strategy is driven by the intrinsic properties of the original image and is highly dependent on the choice of the inputs and the weights. The haze-free output image is then reconstructed using the fusion methodology. To increase accuracy, an interpolation method is used in the output reconstruction. Promising retrieval performance is achieved, especially in particular examples.
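
Per-pixel fusion with weight maps, as named in the keywords, can be illustrated with a normalized weighted average of the derived inputs. This is a single-scale sketch under assumptions: the two tiny "images", their weight maps, and the function name `fuse` are hypothetical, and the multi-scale fusion and interpolation steps mentioned in the abstract are not shown.

```python
def fuse(inputs, weight_maps, eps=1e-9):
    """Blend several equally sized grayscale images using per-pixel weights."""
    rows, cols = len(inputs[0]), len(inputs[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Normalize so the weights at each pixel sum to one.
            total = sum(w[r][c] for w in weight_maps) + eps
            fused[r][c] = sum(img[r][c] * w[r][c]
                              for img, w in zip(inputs, weight_maps)) / total
    return fused

# Hypothetical 1x2 inputs: a contrast-enhanced image and an edge-strengthened image.
contrast_input = [[0.2, 0.8]]
edge_input = [[0.6, 0.4]]
contrast_weights = [[1.0, 3.0]]
edge_weights = [[1.0, 1.0]]

fused = fuse([contrast_input, edge_input], [contrast_weights, edge_weights])
```

Where the contrast weight is higher, the fused pixel moves toward the contrast-enhanced input, which is why the result depends so strongly on the choice of inputs and weights.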

Keywords: single image, fusion, dehazing, multi-scale fusion, per-pixel, weight map

Procedia PDF Downloads 446
27551 Water Quality Calculation and Management System

Authors: H. M. B. N Jayasinghe

Abstract:

Water is found almost everywhere on Earth, but water resources carry a great deal of pollution, and some diseases can spread to living beings through water. To become clean and drinkable, water must undergo a number of treatments, so purification technology for wastewater is essential and wastewater treatment plants play a major role. The procedures that follow the water treatment process have always been based on manual calculations and recordings, and purification plants may involve many manual processes. This makes the work time-consuming, so the final evaluation and the chemical and biological treatment processes are delayed. To prevent these drawbacks, computerized, programmable calculation and analytical techniques are to be introduced to the laboratory staff. An automated system is a solution to this problem, as it guarantees rational selection. A decision support system is a way to model data and make quality decisions based upon it, and it is widely used around the world for various kinds of process automation. Decision support systems that only collect data and organize it effectively are usually called passive models: they do not suggest a specific decision but only reveal information. This web-based system includes a facility for adding global positioning data with map location. Its most valuable feature is an SMS and e-mail alert service that informs the appropriate person of a critical issue. The system is built with HTML, MySQL, PHP, and some other web development technologies. Computerized water chemistry analysis has so far made little progress; one example is the swimming pool water quality calculator. The validity of the system has been verified by test runs and comparison with existing plant data. The automated system will make work easier in terms of both productivity and quality.

Keywords: automated system, wastewater, purification technology, map location

Procedia PDF Downloads 231