Search results for: Pima Indians diabetes dataset
195 GeNS: a Biological Data Integration Platform
Authors: Joel Arrais, João E. Pereira, João Fernandes, José Luís Oliveira
Abstract:
The scientific achievements coming from molecular biology depend greatly on the capability of computational applications to analyze the laboratorial results. A comprehensive analysis of an experiment requires typically the simultaneous study of the obtained dataset with data that is available in several distinct public databases. Nevertheless, developing a centralized access to these distributed databases rises up a set of challenges such as: what is the best integration strategy, how to solve nomenclature clashes, how to solve database overlapping data and how to deal with huge datasets. In this paper we present GeNS, a system that uses a simple and yet innovative approach to address several biological data integration issues. Compared with existing systems, the main advantages of GeNS are related to its maintenance simplicity and to its coverage and scalability, in terms of number of supported databases and data types. To support our claims we present the current use of GeNS in two concrete applications. GeNS currently contains more than 140 million of biological relations and it can be publicly downloaded or remotely access through SOAP web services.Keywords: Data integration, biological databases
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1632194 Multi-Sensor Target Tracking Using Ensemble Learning
Authors: Bhekisipho Twala, Mantepu Masetshaba, Ramapulana Nkoana
Abstract:
Multiple classifier systems combine several individual classifiers to deliver a final classification decision. However, an increasingly controversial question is whether such systems can outperform the single best classifier, and if so, what form of multiple classifiers system yields the most significant benefit. Also, multi-target tracking detection using multiple sensors is an important research field in mobile techniques and military applications. In this paper, several multiple classifiers systems are evaluated in terms of their ability to predict a system’s failure or success for multi-sensor target tracking tasks. The Bristol Eden project dataset is utilised for this task. Experimental and simulation results show that the human activity identification system can fulfil requirements of target tracking due to improved sensors classification performances with multiple classifier systems constructed using boosting achieving higher accuracy rates.
Keywords: Single classifier, machine learning, ensemble learning, multi-sensor target tracking.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 598193 A Phenomic Algorithm for Reconstruction of Gene Networks
Authors: Rio G. L. D'Souza, K. Chandra Sekaran, A. Kandasamy
Abstract:
The goal of Gene Expression Analysis is to understand the processes that underlie the regulatory networks and pathways controlling inter-cellular and intra-cellular activities. In recent times microarray datasets are extensively used for this purpose. The scope of such analysis has broadened in recent times towards reconstruction of gene networks and other holistic approaches of Systems Biology. Evolutionary methods are proving to be successful in such problems and a number of such methods have been proposed. However all these methods are based on processing of genotypic information. Towards this end, there is a need to develop evolutionary methods that address phenotypic interactions together with genotypic interactions. We present a novel evolutionary approach, called Phenomic algorithm, wherein the focus is on phenotypic interaction. We use the expression profiles of genes to model the interactions between them at the phenotypic level. We apply this algorithm to the yeast sporulation dataset and show that the algorithm can identify gene networks with relative ease.
Keywords: Evolutionary computing, gene expression analysis, gene networks, microarray data analysis, phenomic algorithms.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1924192 An Auxiliary Technique for Coronary Heart Disease Prediction by Analyzing ECG Based on ResNet and Bi-LSTM
Authors: Yang Zhang, Jian He
Abstract:
Heart disease is one of the leading causes of death in the world, and coronary heart disease (CHD) is one of the major heart diseases. Electrocardiogram (ECG) is widely used in the detection of heart diseases, but the traditional manual method for CHD prediction by analyzing ECG requires lots of professional knowledge for doctors. This paper presents sliding window and continuous wavelet transform (CWT) to transform ECG signals into images, and then ResNet and Bi-LSTM are introduced to build the ECG feature extraction network (namely ECGNet). At last, an auxiliary system for CHD prediction was developed based on modified ResNet18 and Bi-LSTM, and the public ECG dataset of CHD from MIMIC-3 was used to train and test the system. The experimental results show that the accuracy of the method is 83%, and the F1-score is 83%. Compared with the available methods for CHD prediction based on ECG, such as kNN, decision tree, VGGNet, etc., this method not only improves the prediction accuracy but also could avoid the degradation phenomenon of the deep learning network.
Keywords: Bi-LSTM, CHD, coronary heart disease, ECG, electrocardiogram, ResNet, sliding window.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 334191 An Anatomically-Based Model of the Nerves in the Human Foot
Authors: Muhammad Zeeshan UlHaque, Peng Du, Leo K. Cheng, Marc D. Jacobs
Abstract:
Sensory nerves in the foot play an important part in the diagnosis of various neuropathydisorders, especially in diabetes mellitus.However, a detailed description of the anatomical distribution of the nerves is currently lacking. A computationalmodel of the afferent nerves inthe foot may bea useful tool for the study of diabetic neuropathy. In this study, we present the development of an anatomically-based model of various major sensory nerves of the sole and dorsal sidesof the foot. In addition, we presentan algorithm for generating synthetic somatosensory nerve networks in the big-toe region of a right foot model. The algorithm was based on a modified version of the Monte Carlo algorithm, with the capability of being able to vary the intra-epidermal nerve fiber density in differentregionsof the foot model. Preliminary results from the combinedmodel show the realistic anatomical structure of the major nerves as well as the smaller somatosensory nerves of the foot. The model may now be developed to investigate the functional outcomes of structural neuropathyindiabetic patients.
Keywords: Diabetic neuropathy, Finite element modeling, Monte Carlo Algorithm, Somatosensory nerve networks
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2335190 Application of Advanced Remote Sensing Data in Mineral Exploration in the Vicinity of Heavy Dense Forest Cover Area of Jharkhand and Odisha State Mining Area
Authors: Hemant Kumar, R. N. K. Sharma, A. P. Krishna
Abstract:
The study has been carried out on the Saranda in Jharkhand and a part of Odisha state. Geospatial data of Hyperion, a remote sensing satellite, have been used. This study has used a wide variety of patterns related to image processing to enhance and extract the mining class of Fe and Mn ores.Landsat-8, OLI sensor data have also been used to correctly explore related minerals. In this way, various processes have been applied to increase the mineralogy class and comparative evaluation with related frequency done. The Hyperion dataset for hyperspectral remote sensing has been specifically verified as an effective tool for mineral or rock information extraction within the band range of shortwave infrared used. The abundant spatial and spectral information contained in hyperspectral images enables the differentiation of different objects of any object into targeted applications for exploration such as exploration detection, mining.
Keywords: Hyperion, hyperspectral, sensor, Landsat-8.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 621189 Image Retrieval Based on Multi-Feature Fusion for Heterogeneous Image Databases
Authors: N. W. U. D. Chathurani, Shlomo Geva, Vinod Chandran, Proboda Rajapaksha
Abstract:
Selecting an appropriate image representation is the most important factor in implementing an effective Content-Based Image Retrieval (CBIR) system. This paper presents a multi-feature fusion approach for efficient CBIR, based on the distance distribution of features and relative feature weights at the time of query processing. It is a simple yet effective approach, which is free from the effect of features' dimensions, ranges, internal feature normalization and the distance measure. This approach can easily be adopted in any feature combination to improve retrieval quality. The proposed approach is empirically evaluated using two benchmark datasets for image classification (a subset of the Corel dataset and Oliva and Torralba) and compared with existing approaches. The performance of the proposed approach is confirmed with the significantly improved performance in comparison with the independently evaluated baseline of the previously proposed feature fusion approaches.
Keywords: Feature fusion, image retrieval, membership function, normalization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1344188 Unsupervised Clustering Methods for Identifying Rare Events in Anomaly Detection
Authors: Witcha Chimphlee, Abdul Hanan Abdullah, Mohd Noor Md Sap, Siriporn Chimphlee, Surat Srinoy
Abstract:
It is important problems to increase the detection rates and reduce false positive rates in Intrusion Detection System (IDS). Although preventative techniques such as access control and authentication attempt to prevent intruders, these can fail, and as a second line of defence, intrusion detection has been introduced. Rare events are events that occur very infrequently, detection of rare events is a common problem in many domains. In this paper we propose an intrusion detection method that combines Rough set and Fuzzy Clustering. Rough set has to decrease the amount of data and get rid of redundancy. Fuzzy c-means clustering allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect suspicious activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining-(KDDCup 1999) Dataset show that the method is efficient and practical for intrusion detection systems.Keywords: Network and security, intrusion detection, fuzzy cmeans, rough set.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2859187 Vehicle Type Classification with Geometric and Appearance Attributes
Authors: Ghada S. Moussa
Abstract:
With the increase in population along with economic prosperity, an enormous increase in the number and types of vehicles on the roads occurred. This fact brings a growing need for efficiently yet effectively classifying vehicles into their corresponding categories, which play a crucial role in many areas of infrastructure planning and traffic management.
This paper presents two vehicle-type classification approaches; 1) geometric-based and 2) appearance-based. The two classification approaches are used for two tasks: multi-class and intra-class vehicle classifications. For the evaluation purpose of the proposed classification approaches’ performance and the identification of the most effective yet efficient one, 10-fold cross-validation technique is used with a large dataset. The proposed approaches are distinguishable from previous research on vehicle classification in which: i) they consider both geometric and appearance attributes of vehicles, and ii) they perform remarkably well in both multi-class and intra-class vehicle classification. Experimental results exhibit promising potentials implementations of the proposed vehicle classification approaches into real-world applications.
Keywords: Appearance attributes, Geometric attributes, Support vector machine, Vehicle classification.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4278186 Emotion Classification using Adaptive SVMs
Authors: P. Visutsak
Abstract:
The study of the interaction between humans and computers has been emerging during the last few years. This interaction will be more powerful if computers are able to perceive and respond to human nonverbal communication such as emotions. In this study, we present the image-based approach to emotion classification through lower facial expression. We employ a set of feature points in the lower face image according to the particular face model used and consider their motion across each emotive expression of images. The vector of displacements of all feature points input to the Adaptive Support Vector Machines (A-SVMs) classifier that classify it into seven basic emotions scheme, namely neutral, angry, disgust, fear, happy, sad and surprise. The system was tested on the Japanese Female Facial Expression (JAFFE) dataset of frontal view facial expressions [7]. Our experiments on emotion classification through lower facial expressions demonstrate the robustness of Adaptive SVM classifier and verify the high efficiency of our approach.Keywords: emotion classification, facial expression, adaptive support vector machines, facial expression classifier.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2224185 Automatic Detection of Proliferative Cells in Immunohistochemically Images of Meningioma Using Fuzzy C-Means Clustering and HSV Color Space
Authors: Vahid Anari, Mina Bakhshi
Abstract:
Visual search and identification of immunohistochemically stained tissue of meningioma was performed manually in pathologic laboratories to detect and diagnose the cancers type of meningioma. This task is very tedious and time-consuming. Moreover, because of cell's complex nature, it still remains a challenging task to segment cells from its background and analyze them automatically. In this paper, we develop and test a computerized scheme that can automatically identify cells in microscopic images of meningioma and classify them into positive (proliferative) and negative (normal) cells. Dataset including 150 images are used to test the scheme. The scheme uses Fuzzy C-means algorithm as a color clustering method based on perceptually uniform hue, saturation, value (HSV) color space. Since the cells are distinguishable by the human eye, the accuracy and stability of the algorithm are quantitatively compared through application to a wide variety of real images.
Keywords: Positive cell, color segmentation, HSV color space, immunohistochemistry, meningioma, thresholding, fuzzy c-means.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 693184 Unconstrained Arabic Online Handwritten Words Segmentation using New HMM State Design
Authors: Randa Ibrahim Elanwar, Mohsen Rashwan, Samia Mashali
Abstract:
In this paper we propose a segmentation system for unconstrained Arabic online handwriting. An essential problem addressed by analytical-based word recognition system. The system is composed of two-stages the first is a newly special designed hidden Markov model (HMM) and the second is a rules based stage. In our system, handwritten words are broken up into characters by simultaneous segmentation-recognition using HMMs of unique design trained using online features most of which are novel. The HMM output characters boundaries represent the proposed segmentation points (PSP) which are then validated by rules-based post stage without any contextual information help to solve different segmentation errors. The HMM has been designed and tested using a self collected dataset (OHASD) [1]. Most errors cases are cured and remarkable segmentation enhancement is achieved. Very promising word and character segmentation rates are obtained regarding the unconstrained Arabic handwriting difficulty and not using context help.
Keywords: Arabic, Hidden Markov Models, online handwriting, word segmentation
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1836183 Multinomial Dirichlet Gaussian Process Model for Classification of Multidimensional Data
Authors: Wanhyun Cho, Soonja Kang, Sangkyoon Kim, Soonyoung Park
Abstract:
We present probabilistic multinomial Dirichlet classification model for multidimensional data and Gaussian process priors. Here, we have considered efficient computational method that can be used to obtain the approximate posteriors for latent variables and parameters needed to define the multiclass Gaussian process classification model. We first investigated the process of inducing a posterior distribution for various parameters and latent function by using the variational Bayesian approximations and important sampling method, and next we derived a predictive distribution of latent function needed to classify new samples. The proposed model is applied to classify the synthetic multivariate dataset in order to verify the performance of our model. Experiment result shows that our model is more accurate than the other approximation methods.Keywords: Multinomial dirichlet classification model, Gaussian process priors, variational Bayesian approximation, Importance sampling, approximate posterior distribution, Marginal likelihood evidence.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1614182 Key Factors Influencing Individual Knowledge Capability in KIFs
Authors: Salman Iqbal
Abstract:
Knowledge management (KM) literature has mainly focused on the antecedents of KM. The purpose of this study is to investigate the effect of specific human resource management (HRM) practices on employee knowledge sharing and its outcome as individual knowledge capability. Based on previous literature, a model is proposed for the study and hypotheses are formulated. The cross-sectional dataset comes from a sample of 19 knowledge intensive firms (KIFs). This study has run an item parceling technique followed by Confirmatory Factor Analysis (CFA) on the latent constructs of the research model. Employees’ collaboration and their interpersonal trust can help to improve their knowledge sharing behaviour and knowledge capability within organisations. This study suggests that in future, by using a larger sample, better statistical insight is possible. The findings of this study are beneficial for scholars, policy makers and practitioners. The empirical results of this study are entirely based on employees’ perceptions and make a significant research contribution, given there is a dearth of empirical research focusing on the subcontinent.
Keywords: Employees’ collaboration, individual knowledge capability, knowledge sharing, monetary rewards, structural equation modelling.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1353181 A Comparative Study of Malware Detection Techniques Using Machine Learning Methods
Authors: Cristina Vatamanu, Doina Cosovan, Dragoş Gavriluţ, Henri Luchian
Abstract:
In the past few years, the amount of malicious software increased exponentially and, therefore, machine learning algorithms became instrumental in identifying clean and malware files through (semi)-automated classification. When working with very large datasets, the major challenge is to reach both a very high malware detection rate and a very low false positive rate. Another challenge is to minimize the time needed for the machine learning algorithm to do so. This paper presents a comparative study between different machine learning techniques such as linear classifiers, ensembles, decision trees or various hybrids thereof. The training dataset consists of approximately 2 million clean files and 200.000 infected files, which is a realistic quantitative mixture. The paper investigates the above mentioned methods with respect to both their performance (detection rate and false positive rate) and their practicability.Keywords: Detection Rate, False Positives, Perceptron, One Side Class, Ensembles, Decision Tree, Hybrid methods, Feature Selection.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3280180 Sparse Networks-Based Speedup Technique for Proteins Betweenness Centrality Computation
Authors: Razvan Bocu, Dr Sabin Tabirca
Abstract:
The study of proteomics reached unexpected levels of interest, as a direct consequence of its discovered influence over some complex biological phenomena, such as problematic diseases like cancer. This paper presents the latest authors- achievements regarding the analysis of the networks of proteins (interactome networks), by computing more efficiently the betweenness centrality measure. The paper introduces the concept of betweenness centrality, and then describes how betweenness computation can help the interactome net- work analysis. Current sequential implementations for the between- ness computation do not perform satisfactory in terms of execution times. The paper-s main contribution is centered towards introducing a speedup technique for the betweenness computation, based on modified shortest path algorithms for sparse graphs. Three optimized generic algorithms for betweenness computation are described and implemented, and their performance tested against real biological data, which is part of the IntAct dataset.Keywords: Betweenness centrality, interactome networks, protein-protein interactions, sub-communities, sparse networks, speedup tech-nique, IntAct.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1506179 Improved Rare Species Identification Using Focal Loss Based Deep Learning Models
Authors: Chad Goldsworthy, B. Rajeswari Matam
Abstract:
The use of deep learning for species identification in camera trap images has revolutionised our ability to study, conserve and monitor species in a highly efficient and unobtrusive manner, with state-of-the-art models achieving accuracies surpassing the accuracy of manual human classification. The high imbalance of camera trap datasets, however, results in poor accuracies for minority (rare or endangered) species due to their relative insignificance to the overall model accuracy. This paper investigates the use of Focal Loss, in comparison to the traditional Cross Entropy Loss function, to improve the identification of minority species in the “255 Bird Species” dataset from Kaggle. The results show that, although Focal Loss slightly decreased the accuracy of the majority species, it was able to increase the F1-score by 0.06 and improve the identification of the bottom two, five and ten (minority) species by 37.5%, 15.7% and 10.8%, respectively, as well as resulting in an improved overall accuracy of 2.96%.
Keywords: Convolutional neural networks, data imbalance, deep learning, focal loss, species classification, wildlife conservation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1419178 Detecting and Tracking Vehicles in Airborne Videos
Authors: Hsu-Yung Cheng, Chih-Chang Yu
Abstract:
In this work, we present an automatic vehicle detection system for airborne videos using combined features. We propose a pixel-wise classification method for vehicle detection using Dynamic Bayesian Networks. In spite of performing pixel-wise classification, relations among neighboring pixels in a region are preserved in the feature extraction process. The main novelty of the detection scheme is that the extracted combined features comprise not only pixel-level information but also region-level information. Afterwards, tracking is performed on the detected vehicles. Tracking is performed using efficient Kalman filter with dynamic particle sampling. Experiments were conducted on a wide variety of airborne videos. We do not assume prior information of camera heights, orientation, and target object sizes in the proposed framework. The results demonstrate flexibility and good generalization abilities of the proposed method on a challenging dataset.Keywords: Vehicle Detection, Airborne Video, Tracking, Dynamic Bayesian Networks
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1587177 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data
Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro
Abstract:
Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for mining text due to the high volumes of data generated through the platform daily. Many industries such as telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.
Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 459176 Improvement in Power Transformer Intelligent Dissolved Gas Analysis Method
Authors: S. Qaedi, S. Seyedtabaii
Abstract:
Non-Destructive evaluation of in-service power transformer condition is necessary for avoiding catastrophic failures. Dissolved Gas Analysis (DGA) is one of the important methods. Traditional, statistical and intelligent DGA approaches have been adopted for accurate classification of incipient fault sources. Unfortunately, there are not often enough faulty patterns required for sufficient training of intelligent systems. By bootstrapping the shortcoming is expected to be alleviated and algorithms with better classification success rates to be obtained. In this paper the performance of an artificial neural network, K-Nearest Neighbour and support vector machine methods using bootstrapped data are detailed and shown that while the success rate of the ANN algorithms improves remarkably, the outcome of the others do not benefit so much from the provided enlarged data space. For assessment, two databases are employed: IEC TC10 and a dataset collected from reported data in papers. High average test success rate well exhibits the remarkable outcome.Keywords: Dissolved gas analysis, Transformer incipient fault, Artificial Neural Network, Support Vector Machine (SVM), KNearest Neighbor (KNN)
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2739175 Classification of Political Affiliations by Reduced Number of Features
Authors: Vesile Evrim, Aliyu Awwal
Abstract:
By the evolvement in technology, the way of expressing opinions switched direction to the digital world. The domain of politics, as one of the hottest topics of opinion mining research, merged together with the behavior analysis for affiliation determination in texts, which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 were constituted by Linguistic Inquiry and Word Count (LIWC) features were tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction” and “M5 Rule” classifiers when used with “SVM” and “IGR” feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “Function”, as an aggregate feature of the linguistic category, was found as the most differentiating feature among the 68 features with the accuracy of 81% in classifying articles either as Republican or Democrat.Keywords: Politics, machine learning, feature selection, LIWC.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2365174 Genetic Folding: Analyzing the Mercer-s Kernels Effect in Support Vector Machine using Genetic Folding
Authors: Mohd A. Mezher, Maysam F. Abbod
Abstract:
Genetic Folding (GF) a new class of EA named as is introduced for the first time. It is based on chromosomes composed of floating genes structurally organized in a parent form and separated by dots. Although, the genotype/phenotype system of GF generates a kernel expression, which is the objective function of superior classifier. In this work the question of the satisfying mapping-s rules in evolving populations is addressed by analyzing populations undergoing either Mercer-s or none Mercer-s rule. The results presented here show that populations undergoing Mercer-s rules improve practically models selection of Support Vector Machine (SVM). The experiment is trained multi-classification problem and tested on nonlinear Ionosphere dataset. The target of this paper is to answer the question of evolving Mercer-s rule in SVM addressed using either genetic folding satisfied kernel-s rules or not applied to complicated domains and problems.Keywords: Genetic Folding, GF, Evolutionary Algorithms, Support Vector Machine, Genetic Algorithm, Genetic Programming, Multi-Classification, Mercer's Rules
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1626173 Optimized Preprocessing for Accurate and Efficient Bioassay Prediction with Machine Learning Algorithms
Authors: Jeff Clarine, Chang-Shyh Peng, Daisy Sang
Abstract:
Bioassay is the measurement of the potency of a chemical substance by its effect on a living animal or plant tissue. Bioassay data and chemical structures from pharmacokinetic and drug metabolism screening are mined from and housed in multiple databases. Bioassay prediction is calculated accordingly to determine further advancement. This paper proposes a four-step preprocessing of datasets for improving the bioassay predictions. The first step is instance selection in which dataset is categorized into training, testing, and validation sets. The second step is discretization that partitions the data in consideration of accuracy vs. precision. The third step is normalization where data are normalized between 0 and 1 for subsequent machine learning processing. The fourth step is feature selection where key chemical properties and attributes are generated. The streamlined results are then analyzed for the prediction of effectiveness by various machine learning algorithms including Pipeline Pilot, R, Weka, and Excel. Experiments and evaluations reveal the effectiveness of various combination of preprocessing steps and machine learning algorithms in more consistent and accurate prediction.
Keywords: Bioassay, machine learning, preprocessing, virtual screen.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 981172 Attacks Classification in Adaptive Intrusion Detection using Decision Tree
Authors: Dewan Md. Farid, Nouria Harbi, Emna Bahri, Mohammad Zahidur Rahman, Chowdhury Mofizur Rahman
Abstract:
Recently, information security has become a key issue in information technology as the number of computer security breaches are exposed to an increasing number of security threats. A variety of intrusion detection systems (IDS) have been employed for protecting computers and networks from malicious network-based or host-based attacks by using traditional statistical methods to new data mining approaches in last decades. However, today's commercially available intrusion detection systems are signature-based that are not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for anomaly based network intrusion detection system using decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved 98% detection rate (DR) in comparison with other existing methods.Keywords: Detection rate, decision tree, intrusion detectionsystem, network security.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3628171 Effect of Personality Traits on Classification of Political Orientation
Authors: Vesile Evrim, Aliyu Awwal
Abstract:
Today, there is a large number of political transcripts available on the Web to be mined and used for statistical analysis, and product recommendations. As the online political resources are used for various purposes, automatically determining the political orientation on these transcripts becomes crucial. The methodologies used by machine learning algorithms to do an automatic classification are based on different features that are classified under categories such as Linguistic, Personality etc. Considering the ideological differences between Liberals and Conservatives, in this paper, the effect of Personality traits on political orientation classification is studied. The experiments in this study were based on the correlation between LIWC features and the BIG Five Personality traits. Several experiments were conducted using Convote U.S. Congressional- Speech dataset with seven benchmark classification algorithms. The different methodologies were applied on several LIWC feature sets that constituted by 8 to 64 varying number of features that are correlated to five personality traits. As results of experiments, Neuroticism trait was obtained to be the most differentiating personality trait for classification of political orientation. At the same time, it was observed that the personality trait based classification methodology gives better and comparable results with the related work.Keywords: Politics, personality traits, LIWC, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2162170 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data
Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch
Abstract:
It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2523169 Flexible Cities: A Multisided Spatial Application of Tracking Livability of Urban Environment
Authors: Maria Christofi, George Plastiras, Rafaella Elia, Vaggelis Tsiourtis, Theocharis Theocharides, Miltiadis Katsaros
Abstract:
The rapidly expanding urban areas of the world constitute a challenge of how we need to make the transition to "the next urbanization", which will be defined by new analytical tools and new sources of data. This paper is about the production of a spatial application, the ‘FUMapp’, where space and its initiative will be available literally, in meters, but also abstractly, at a sensed level. While existing spatial applications typically focus on illustrations of the urban infrastructure, the suggested application goes beyond the existing: It investigates how our environment's perception adapts to the alterations of the built environment through a dataset construction of biophysical measurements (eye-tracking, heart beating), and physical metrics (spatial characteristics, size of stimuli, rhythm of mobility). It explores the intersections between architecture, cognition, and computing where future design can be improved and identifies the flexibility and livability of the ‘available space’ of specific examined urban paths.
Keywords: Biophysical data, flexibility of urban, livability, next urbanization, spatial application.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1034168 SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space
Authors: Sanaa Chafik, ImaneDaoudi, Mounim A. El Yacoubi, Hamid El Ouardi
Abstract:
Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH that has been successfully applied in many multimedia applications. However, the Euclidean LSH presents limitations that affect structure and query performances. The main limitation of the Euclidean LSH is the large memory consumption. In order to achieve a good accuracy, a large number of hash tables is required. In this paper, we propose a new hashing algorithm to overcome the storage space problem and improve query time, while keeping a good accuracy as similar to that achieved by the original Euclidean LSH. The Experimental results on a real large-scale dataset show that the proposed approach achieves good performances and consumes less memory than the Euclidean LSH.
Keywords: Approximate Nearest Neighbor Search, Content based image retrieval (CBIR), Curse of dimensionality, Locality sensitive hashing, Multidimensional indexing, Scalability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2577167 Effects of Introducing Similarity Measures into Artificial Bee Colony Approach for Optimization of Vehicle Routing Problem
Authors: P. Shunmugapriya, S. Kanmani, P. Jude Fredieric, U. Vignesh, J. Reman Justin, K. Vivek
Abstract:
Vehicle Routing Problem (VRP) is a complex combinatorial optimization problem and it is quite difficult to find an optimal solution consisting of a set of routes for vehicles whose total cost is minimum. Evolutionary and swarm intelligent (SI) algorithms play a vital role in solving optimization problems. While the SI algorithms perform search, the diversity between the solutions they exploit is very important. This is because of the need to avoid early convergence and to get an appropriate balance between the exploration and exploitation. Therefore, it is important to check how far the solutions are diverse. In this paper, we measure the similarity between solutions, which ABC exploits while optimizing VRP. The similar solutions found are discarded at the end of the iteration and only unique solutions are passed on to the next iteration. The bees of discarded solutions become scouts and they start searching for new solutions. This process is continued and results show that the solution is optimized at lesser number of iterations but with the overhead of computing similarity in all the iterations. The problem instance from Solomon benchmarked dataset has been used for evaluating the presented methodology.
Keywords: ABC algorithm, vehicle routing problem, optimization, Jaccard’s similarity measure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 845166 Defects in Open Source Software: The Role of Online Forums
Authors: Faheem Ahmed, Piers Campbell, Ahmad Jaffar, Luiz Capretz
Abstract:
Free and open source software is gaining popularity at an unprecedented rate of growth. Organizations despite some concerns about the quality have been using them for various purposes. One of the biggest concerns about free and open source software is post release software defects and their fixing. Many believe that there is no appropriate support available to fix the bugs. On the contrary some believe that due to the active involvement of internet user in online forums, they become a major source of communicating the identification and fixing of defects in open source software. The research model of this empirical investigation establishes and studies the relationship between open source software defects and online public forums. The results of this empirical study provide evidence about the realities of software defects myths of open source software. We used a dataset consist of 616 open source software projects covering a broad range of categories to study the research model of this investigation. The results of this investigation show that online forums play a significant role identifying and fixing the defects in open source software.Keywords: About Open source software, software engineering, software defect management, empirical software engineering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1776