Search results for: Data mining classification algorithms
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9097

8617 Comparison between Different Classifications of Periodontal Diseases and Their Advantages

Authors: Ilma Robo, Saimir Heta, Merilda Tarja, Sonila Kapaj, Eduart Kapaj, Geriona Lasku

Abstract:

The classification of periodontal diseases has changed significantly in favor of simplifying the protocol of diagnosis and periodontal treatment. This review study aims to highlight the latest publications in the new periodontal disease classification, talking about the most significant differences versus the old classification with the tendency to express the advantages or disadvantages of clinical application. The aim of the study also includes the growing tendency to link the way of classification of periodontal diseases with predetermined protocols of periodontal treatment of the diagnoses included in the classification. The new classification of periodontal diseases is rather comprehensive in its subdivisions, as the disease is viewed in its entirety, with the biological dimensions of the disease, the degree of aggravation and progression of the disease, in relation to risk factors, predisposition to patient susceptibility and impact of periodontal disease to the general health status of the patient.

Keywords: Periodontal diseases, clinical application, periodontal treatment, oral diagnosis.

Downloads: 597
8616 A Novel Modified Adaptive Fuzzy Inference Engine and Its Application to Pattern Classification

Authors: J. Hossen, A. Rahman, K. Samsudin, F. Rokhani, S. Sayeed, R. Hasan

Abstract:

The Neuro-Fuzzy hybridization scheme has become of research interest in pattern classification over the past decade. The present paper proposes a novel Modified Adaptive Fuzzy Inference Engine (MAFIE) for pattern classification. A modified Apriori algorithm technique is utilized to reduce the decision rules to a minimal set based on input-output data sets. A TSK-type fuzzy inference system is constructed by the automatic generation of membership functions and rules by the fuzzy c-means clustering and Apriori algorithm technique, respectively. The generated adaptive fuzzy inference engine is adjusted by the least-squares fit and a conjugate gradient descent algorithm towards better performance with a minimal set of rules. The proposed MAFIE is able to reduce the number of rules, which increases exponentially when more input variables are involved. The performance of the proposed MAFIE is compared with other existing applications of pattern classification schemes using Fisher's Iris and Wisconsin breast cancer data sets and shown to be very competitive.

Keywords: Apriori algorithm, Fuzzy C-means, MAFIE, TSK

Downloads: 1931
8615 Using Time-Series NDVI to Model Land Cover Change: A Case Study in the Berg River Catchment Area, Western Cape, South Africa

Authors: A. S. Adesuyi, Z. Munch

Abstract:

This study investigates the use of a time-series of MODIS NDVI data to identify agricultural land cover change on an annual time step (2007 - 2012) and characterize the trend. Following an ISODATA classification of the MODIS imagery to selectively mask areas that are not agricultural or semi-natural, NDVI signatures were created to identify areas of cereals and vineyards with the aid of ancillary, pictometry and field sample data for 2010. The NDVI signature curve and training samples were used to create a decision tree model in WEKA 3.6.9 using the decision tree classifier (J48) algorithm; Model 1 included the ISODATA classification and Model 2 did not. These two models were then used to classify all data for the study area for 2010, producing land cover maps with classification accuracies of 77% and 80% for Model 1 and 2 respectively. Model 2 was subsequently used to create land cover classification and change detection maps for all other years. Subtle changes and areas of consistency (unchanged) were observed in the agricultural classes and crop practices over the years, as predicted by the land cover classification. Forty-one percent of the catchment comprised cereals, with 35% possibly following a crop rotation system. Vineyards largely remained constant, with only one percent conversion to vineyard from other land cover classes.

Keywords: Change detection, Land cover, NDVI, time-series.

Downloads: 2291
8614 Genetic Programming Based Data Projections for Classification Tasks

Authors: César Estébanez, Ricardo Aler, José M. Valls

Abstract:

In this paper we present a GP-based method for automatically evolving projections, so that data can be more easily classified in the projected spaces. At the same time, our approach can reduce dimensionality by constructing more relevant attributes. The fitness of each projection measures how easy it is to classify the dataset after applying the projection. This is quickly computed by a Simple Linear Perceptron. We have tested our approach in three domains. The experiments show that it obtains good results, compared to other Machine Learning approaches, while reducing dimensionality in many cases.

Keywords: Classification, genetic programming, projections.

Downloads: 1398
8613 Induction of Expressive Rules using the Binary Coding Method

Authors: Seyed R Mousavi

Abstract:

In most rule-induction algorithms, the only operator used against nominal attributes is the equality operator =. In this paper, we first propose the use of the inequality operator, ≠, in addition to the equality operator, to increase the expressiveness of induced rules. Then, we present a new method, Binary Coding, which can be used along with an arbitrary rule-induction algorithm to make use of the inequality operator without any need to change the algorithm. Experimental results suggest that the Binary Coding method is promising enough for further investigation, especially in cases where the minimum number of rules is desirable.
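
The abstract does not spell out how Binary Coding works, so the following is only a sketch of one plausible reading: each nominal attribute is replaced by one boolean column per value, so that an equality test on a coded column behaves like an inequality test on the original attribute. The function name `binary_code`, the column naming scheme, and the toy rows are all assumptions made for illustration.

```python
def binary_code(rows, attribute):
    """Replace a nominal attribute with one boolean column per distinct value."""
    values = sorted({row[attribute] for row in rows})
    coded = []
    for row in rows:
        # Copy every column except the one being coded.
        new_row = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            new_row[f"{attribute}_is_{value}"] = (row[attribute] == value)
        coded.append(new_row)
    return coded


# Hypothetical toy data, not taken from the paper.
rows = [
    {"color": "red", "label": "stop"},
    {"color": "green", "label": "go"},
    {"color": "blue", "label": "go"},
]
coded = binary_code(rows, "color")

# An equality-only rule on the coded data, "color_is_red = False",
# covers the same rows as "color != red" on the original data.
covered = [r["label"] for r in coded if r["color_is_red"] is False]
print(covered)
```

Under this reading, an unmodified equality-only rule learner run on the coded columns can express "color ≠ red" as a single condition instead of one rule per remaining value.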

Keywords: Data mining, Inequality operator, Number of rules, Rule-induction.

Downloads: 1258
8612 Discovery of Time Series Event Patterns based on Time Constraints from Textual Data

Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara

Abstract:

This paper proposes a method that discovers time series event patterns from textual data with time information. The patterns are composed of sequences of events and each event is extracted from the textual data, where an event is characteristic content included in the textual data such as a company name, an action, and an impression of a customer. The method introduces 7 types of time constraints based on the analysis of the textual data. The method also evaluates these constraints when the frequency of a time series event pattern is calculated. We can flexibly define the time constraints for interesting combinations of events and can discover valid time series event patterns which satisfy these conditions. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.

Keywords: Text mining, sequential mining, time constraints, daily business reports.

Downloads: 1488
8611 An Efficient Classification Method for Inverse Synthetic Aperture Radar Images

Authors: Sang-Hong Park

Abstract:

This paper proposes an efficient method to classify inverse synthetic aperture radar (ISAR) images. Because ISAR images can be translated and rotated in the 2-dimensional image plane, invariance to the two factors is indispensable for successful classification. The proposed method achieves invariance to translation and rotation of ISAR images using a combination of two-dimensional Fourier transform, polar mapping and correlation-based alignment of the image. Classification is conducted using a simple matching score classifier. In simulations using the real ISAR images of five scaled models measured in a compact range, the proposed method yields classification ratios higher than 97%.

Keywords: Radar, ISAR, radar target classification, radar imaging.

Downloads: 2194
8610 Dynamic Clustering using Particle Swarm Optimization with Application in Unsupervised Image Classification

Authors: Mahamed G.H. Omran, Andries P Engelbrecht, Ayed Salman

Abstract:

A new dynamic clustering approach (DCPSO), based on Particle Swarm Optimization, is proposed. This approach is applied to unsupervised image classification. The proposed approach automatically determines the "optimum" number of clusters and simultaneously clusters the data set with minimal user interference. The algorithm starts by partitioning the data set into a relatively large number of clusters to reduce the effects of initial conditions. Using binary particle swarm optimization, the "best" number of clusters is selected. The centers of the chosen clusters are then refined via the K-means clustering algorithm. The experiments conducted show that the proposed approach generally found the "optimum" number of clusters on the tested images.
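
The final refinement step mentioned above can be illustrated with a minimal K-means sketch. It covers only the standard K-means update applied to a given set of starting centers; the binary PSO selection stage is not shown, and the function name `kmeans_refine`, the iteration count, and the sample points are assumptions for illustration.

```python
def kmeans_refine(points, centers, iterations=10):
    """Refine cluster centers with standard K-means (assign, then average)."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to the center with the smallest squared distance.
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers


# Hypothetical 2-D points and starting centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
refined = kmeans_refine(points, [(1.0, 1.0), (8.0, 8.0)])
print(refined)
```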

Keywords: Clustering Validation, Particle Swarm Optimization, Unsupervised Clustering, Unsupervised Image Classification.

Downloads: 2454
8609 MIBiClus: Mutual Information based Biclustering Algorithm

Authors: Neelima Gupta, Seema Aggarwal

Abstract:

Most of the biclustering/projected clustering algorithms are based either on the Euclidean distance or correlation coefficient, which capture only linear relationships. However, in many applications, like gene expression data and word-document data, nonlinear relationships may exist between the objects. Mutual Information between two variables provides a more general criterion to investigate dependencies amongst variables. In this paper, we improve upon our previous algorithm that uses mutual information for biclustering in terms of computation time and also the type of clusters identified. The algorithm is able to find biclusters with mixed relationships and is faster than the previous one. To the best of our knowledge, none of the other existing algorithms for biclustering have used mutual information as a similarity measure. We present the experimental results on synthetic data as well as on the yeast expression data. Biclusters on the yeast data were found to be biologically and statistically significant using GO Tool Box and FuncAssociate.

Keywords: Biclustering, mutual information.

Downloads: 1631
8608 A New Approach for Fingerprint Classification based on Minutiae Distribution

Authors: Jayant V Kulkarni, Jayadevan R, Suresh N Mali, Hemant K Abhyankar, Raghunath S Holambe

Abstract:

The paper describes a new approach for fingerprint classification, based on the distribution of local features (minute details or minutiae) of the fingerprints. The main advantage is that fingerprint classification provides an indexing scheme to facilitate efficient matching in a large fingerprint database. A set of rules based on heuristic approach has been proposed. The area around the core point is treated as the area of interest for extracting the minutiae features as there are substantial variations around the core point as compared to the areas away from the core point. The core point in a fingerprint has been located at a point where there is maximum curvature. The experimental results report an overall average accuracy of 86.57 % in fingerprint classification.

Keywords: Minutiae distribution, Minutiae, Classification, Orientation, Heuristic.

Downloads: 1568
8607 Evolving a Fuzzy Rule-Base for Image Segmentation

Authors: A. Borji, M. Hamidi

Abstract:

A new method for color image segmentation using fuzzy logic is proposed in this paper. Our aim here is to automatically produce a fuzzy system for color classification and image segmentation with the least number of rules and minimum error rate. Particle swarm optimization is a subclass of evolutionary algorithms that has been inspired by the social behavior of fishes, bees, birds, etc., that live together in colonies. We use the comprehensive learning particle swarm optimization (CLPSO) technique to find optimal fuzzy rules and membership functions because it discourages premature convergence. Here each particle of the swarm codes a set of fuzzy rules. During evolution, a population member tries to maximize a fitness criterion, which is here a high classification rate and a small number of rules. Finally, the particle with the highest fitness value is selected as the best set of fuzzy rules for image segmentation. Our results, using this method for soccer field image segmentation in RoboCup contests, show 89% performance. Less computational load is needed when using this method compared with other methods like ANFIS, because it generates a smaller number of fuzzy rules. The large and varied training dataset makes the proposed method invariant to illumination noise.

Keywords: Comprehensive learning Particle Swarm Optimization, fuzzy classification.

Downloads: 1957
8606 On the Efficient Implementation of a Serial and Parallel Decomposition Algorithm for Fast Support Vector Machine Training Including a Multi-Parameter Kernel

Authors: Tatjana Eitrich, Bruno Lang

Abstract:

This work deals with aspects of support vector machine learning for large-scale data mining tasks. Based on a decomposition algorithm for support vector machine training that can be run in serial as well as shared memory parallel mode, we introduce a transformation of the training data that allows for the usage of an expensive generalized kernel without additional costs. We present experiments for the Gaussian kernel, but usage of other kernel functions is possible, too. In order to further speed up the decomposition algorithm, we analyze the critical problem of working set selection for large training data sets. In addition, we analyze the influence of the working set sizes on the scalability of the parallel decomposition scheme. Our tests and conclusions led to several modifications of the algorithm and the improvement of overall support vector machine learning performance. Our method allows for using extensive parameter search methods to optimize classification accuracy.

Keywords: Support Vector Machine Training, Multi-Parameter Kernels, Shared Memory Parallel Computing, Large Data

Downloads: 1443
8605 A Study of Soil Heavy Metal Pollution in the Manganese Mining in Drama, Greece

Authors: A. Argiri, A. Molla, Tzouvalekas, E. Skoufogianni, N. Danalatos

Abstract:

The release of heavy metals into the environment has increased over the last years. In this study, 25 soil samples (0-15 cm) from the fields near the mining area in Drama region were selected. The samples were analyzed in the laboratory for their physicochemical properties and for seven “pseudo-total” heavy metals content, namely Pb, Zn, Cd, Cr, Cu, Ni, and Mn. The total metal concentrations (Pb, Zn, Cd, Cr, Cu, Ni and Mn) in digests were determined by using the atomic absorption spectrophotometer. According to the results, the mean concentrations of the listed heavy metals in the 25 soil samples are Cd 1.1 mg/kg, Cr 15 mg/kg, Cu 21.7 mg/kg, Ni 30.1 mg/kg, Pb 50.8 mg/kg, Zn 99.5 mg/kg and Mn 815.3 mg/kg. The results show that the heavy metals remain in the soil even though the mine closed many years ago.

Keywords: Greece, heavy metals, mining, pollution

Downloads: 584
8604 Content-based Indoor/Outdoor Video Classification System for a Mobile Platform

Authors: Mitko Veta, Tomislav Kartalov, Zoran Ivanovski

Abstract:

Organization of video databases is becoming a difficult task as the amount of video content increases. Video classification based on the content of videos can significantly increase the speed of tasks such as browsing and searching for a particular video in a database. In this paper, a content-based video classification system for the classes indoor and outdoor is presented. The system is intended to be used on a mobile platform with modest resources. The algorithm makes use of the temporal redundancy in videos, which allows using an uncomplicated classification model while still achieving reasonable accuracy. The training and evaluation were done on a video database of 443 videos downloaded from a video sharing service. A total accuracy of 87.36% was achieved.

Keywords: Indoor/outdoor, video classification, image classification

Downloads: 1524
8603 Chilean Wines Classification based only on Aroma Information

Authors: Nicolás H. Beltrán, Manuel A. Duarte-Mermoud, Víctor A. Soto, Sebastián A. Salah, and Matías A. Bustos

Abstract:

Results of Chilean wine classification based on the information provided by an electronic nose are reported in this paper. The classification scheme consists of two parts: in the first stage, Principal Component Analysis is used as a feature extraction method to reduce the dimensionality of the original information. Then, Radial Basis Function Neural Networks are used as a pattern recognition technique to perform the classification. The objective of this study is to classify different Cabernet Sauvignon, Merlot and Carménère wine samples from different years, valleys and vineyards of Chile.

Keywords: Feature extraction techniques, Pattern recognition techniques, Principal component analysis, Radial basis functions neural networks, Wine classification.

Downloads: 1547
8602 Association of Smoking with Chest Radiographic and Lung Function Findings in Retired Bauxite Mining Workers

Authors: L. R. Ferreira, R. C. G. Bianchi, L. C.R. Ferreira, C. M. Galhardi, E. P. Baciuk, L. H. Oliveira

Abstract:

Inhalation hazards are associated with potentially injurious exposure and increased risk for lung diseases within the bauxite mining industry, especially for the smelter workers. Smoking is related to decreased lung function and leads to chronic lung diseases. This study had the objective to evaluate whether smoking is related to functional and radiographic respiratory changes in retired bauxite mining workers. Methods: This was a retrospective and cross-sectional study involving the analysis of database information of 140 retired bauxite mining workers from Poços de Caldas-MG evaluated at Worker’s Health Reference Center and at the Social Security Brazilian National Institute, from July 1st, 2015 until June 30th, 2016. The workers were divided into three groups: non-smokers (n = 47), ex-smokers (n = 46), and smokers (n = 47). The data included: age, gender, spirometry results, and the presence or not of pulmonary pleural and/or parenchymal changes in chest radiographs. Chi-Squared test was used (p < 0.05). Results: In the smokers’ group, 83% of spirometry tests and 64% of chest x-rays were altered. In the non-smokers’ group, 19% of spirometry tests and 13% of chest x-rays were altered. In the ex-smokers’ group, 35% of spirometry tests and 30% of chest x-rays were altered. Most of the results were statistically significant. Results demonstrated a significant difference between smokers’ and non-smokers’ groups in regard to spirometric and radiographic pulmonary alterations. Ex-smokers’ and non-smokers’ groups demonstrated better results when compared to the smokers’ group in relation to altered spirometry and radiograph findings. These data may contribute to planning strategies to enhance smoking cessation programs within the bauxite mining industry.
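
As a rough illustration of the kind of test reported above, the sketch below runs a Pearson chi-squared test on a 2x2 table for altered spirometry in smokers versus non-smokers. The cell counts are not given in the abstract; they are assumptions obtained by rounding the reported percentages (83% and 19% of 47). No continuity correction is applied, and the abstract does not say which pairwise comparisons or corrections the authors actually used.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Assumed counts reconstructed from rounded percentages, not reported data.
smokers_altered, smokers_normal = 39, 8          # about 83% of 47 altered
nonsmokers_altered, nonsmokers_normal = 9, 38    # about 19% of 47 altered

chi2 = chi_squared_2x2(smokers_altered, smokers_normal,
                       nonsmokers_altered, nonsmokers_normal)

# Critical value of the chi-squared distribution with 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE = 3.841
print(round(chi2, 2), chi2 > CRITICAL_VALUE)
```

With these assumed counts the statistic is far above the 0.05 critical value, which is consistent with the significant smoker versus non-smoker difference the abstract reports.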

Keywords: Bauxite mining, spirometry, chest radiography, smoking.

Downloads: 701
8601 Towards Clustering of Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf

Abstract:

Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important in terms of the number of documents available online. Thereby, the task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving the servicing of web users' requests, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts will be represented as so-called generalized trees, which are more general than usual directed rooted trees, e.g., DOM-Trees. As an important preprocessing step we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we will run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore, we outline the application of our approach in Web Usage Mining as future work.

Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining

Downloads: 1507
8600 A Human Activity Recognition System Based On Sensory Data Related to Object Usage

Authors: M. Abdullah-Al-Wadud

Abstract:

Sensor-based Activity Recognition systems usually account for which sensors have been activated to perform an activity. The system then combines the conditional probabilities of those sensors to represent different activities and takes the decision based on that. However, the information about the sensors which are not activated may also be of great help in deciding which activity has been performed. This paper proposes an approach where the sensory data related to both usage and non-usage of objects are utilized to make the classification of activities. Experimental results also show the promising performance of the proposed method.
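
The idea of using both activated and non-activated sensors can be sketched with a Bernoulli-style naive Bayes score, in which each silent sensor contributes a factor of 1 - P(sensor fires | activity). This is a generic illustration, not the paper's exact model; the activities, sensors, probabilities, and priors below are hypothetical.

```python
import math

# Hypothetical P(sensor fires | activity) values.
MODEL = {
    "make_tea": {"kettle": 0.60, "cup": 0.60, "toothbrush": 0.05},
    "brush_teeth": {"kettle": 0.05, "cup": 0.70, "toothbrush": 0.95},
}
PRIOR = {"make_tea": 0.5, "brush_teeth": 0.5}


def log_score(activity, activated, use_non_usage=True):
    """Log of prior times sensor likelihoods for one activity."""
    score = math.log(PRIOR[activity])
    for sensor, p in MODEL[activity].items():
        if sensor in activated:
            score += math.log(p)            # evidence from a sensor that fired
        elif use_non_usage:
            score += math.log(1.0 - p)      # evidence from a sensor that stayed silent
    return score


def classify(activated, use_non_usage=True):
    """Return the activity with the highest log score."""
    return max(MODEL, key=lambda a: log_score(a, activated, use_non_usage))


usage_only = classify({"cup"}, use_non_usage=False)
with_non_usage = classify({"cup"}, use_non_usage=True)
print(usage_only, with_non_usage)
```

In this toy setup only the cup sensor fires. Scoring on usage alone favours `brush_teeth`, while also counting the silent toothbrush sensor shifts the decision to `make_tea`.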

Keywords: Naïve Bayesian-based classification, Activity recognition, sensor data, object-usage model.

Downloads: 1826
8599 An ensemble of Weighted Support Vector Machines for Ordinal Regression

Authors: Willem Waegeman, Luc Boullart

Abstract:

Instead of traditional (nominal) classification we investigate the subject of ordinal classification or ranking. An enhanced method based on an ensemble of Support Vector Machines (SVMs) is proposed. Each binary classifier is trained with specific weights for each object in the training data set. Experiments on benchmark datasets and synthetic data indicate that the performance of our approach is comparable to state-of-the-art kernel methods for ordinal regression. The ensemble method, which is straightforward to implement, provides a very good sensitivity-specificity trade-off for the highest and lowest rank.

Keywords: Ordinal regression, support vector machines, ensemble learning.

Downloads: 1643
8598 Mining Association Rules from Unstructured Documents

Authors: Hany Mahgoub

Abstract:

This paper presents a system for discovering association rules from collections of unstructured documents called EART (Extract Association Rules from Text). The EART system treats text only, not images or figures. EART discovers association rules amongst keywords labeling the collection of textual documents. The main characteristic of EART is that the system integrates XML technology (to transform unstructured documents into structured documents) with an Information Retrieval scheme (TF-IDF) and a Data Mining technique for association rules extraction. EART depends on word features to extract association rules. It consists of four phases: structure phase, index phase, text mining phase and visualization phase. Our work depends on the analysis of the keywords in the extracted association rules through the co-occurrence of the keywords in one sentence in the original text and the existence of the keywords in one sentence without co-occurrence. Experiments were applied to a collection of scientific documents selected from MEDLINE that are related to the outbreak of H5N1 avian influenza virus.

Keywords: Association rules, information retrieval, knowledge discovery in text, text mining.
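
The TF-IDF weighting named above can be sketched as follows. This uses one common variant (term frequency normalized by document length, times the natural log of the inverse document frequency); the abstract does not state which variant EART uses, and the tokenized documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized documents.
docs = [
    ["avian", "influenza", "outbreak"],
    ["avian", "virus", "vaccine"],
    ["avian", "influenza", "virus"],
]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document, such as "avian" here, receives weight zero, so only the more discriminating keywords would be carried forward into rule extraction.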

Downloads: 2442
8597 Ontology-Based Backpropagation Neural Network Classification and Reasoning Strategy for NoSQL and SQL Databases

Authors: Hao-Hsiang Ku, Ching-Ho Chi

Abstract:

Big data applications have become an imperative for many fields. Many researchers have devoted their efforts to increasing correct rates and reducing time complexities. Hence, the study designs and proposes an Ontology-based backpropagation neural network classification and reasoning strategy for NoSQL big data applications, which is called ON4NoSQL. ON4NoSQL is responsible for enhancing the performance of classification in NoSQL and SQL databases to build up mass behavior models. Mass behavior models are made by MapReduce techniques and the Hadoop distributed file system based on the Hadoop service platform. The reference engine of ON4NoSQL is the ontology-based backpropagation neural network classification and reasoning strategy. Simulation results indicate that ON4NoSQL can efficiently construct a high-performance environment for data storing, searching, and retrieving.

Keywords: Hadoop, NoSQL, ontology, backpropagation neural network, and high distributed file system.

Downloads: 999
8596 Quantification of GHGs Emissions from Electricity and Diesel Fuel Consumption in Basalt Mining Industry in Thailand

Authors: S. Kittipongvises, A. Dubsok

Abstract:

The mineral and mining industry is necessary for countries to have an adequate and reliable supply of materials to meet their socio-economic development. Despite its importance, the environmental impacts from mineral exploration are hugely significant. This study aimed to investigate and quantify the amount of GHGs emissions emitted from both electricity and diesel vehicle fuel consumption in basalt mining in Thailand. Plant A, located in the northeastern region of Thailand, was selected as a case study. Results indicated that total GHGs emissions from basalt mining and operation (Plant A) were approximately 2,501,086 kgCO2e and 1,997,412 kgCO2e in 2014 and 2015, respectively. The estimated carbon intensity ranged from 1.824 kgCO2e to 2.284 kgCO2e per ton of rock product. Scope 1 (direct emissions) was the dominant driver of its total GHGs compared to scope 2 (indirect emissions). As such, transport-related combustion of diesel fuels generated the highest GHGs emission (65%) compared to emissions from purchased electricity (35%). Some of the potential implications for mining entities were also presented.

Keywords: Basalt mining, diesel fuel, electricity, GHGs emissions, Thailand.

Downloads: 1055
8595 Classification and Resolving Urban Problems by Means of Fuzzy Approach

Authors: F. Habib, A. Shokoohi

Abstract:

Urban problems are problems of organized complexity. Thus, many models and scientific methods for resolving urban problems have failed. This study is concerned with proposing a fuzzy-system-driven approach for classifying and solving urban problems. The study mainly investigated the selection of the inputs and outputs of urban systems for the classification of urban problems. In this research, five categories of urban problems, with respect to the fuzzy system approach, were recognized: control, polytely, optimizing, open and decision-making problems. Grounded Theory techniques were then applied to analyze the data and develop a new solving method for each category. The findings indicate that the fuzzy system methods are powerful processes and analytic tools for helping planners to resolve complex urban problems. These tools can be successful where others have failed because they incorporate or address uncertainty and risk, complexity, and systems interacting with other systems.

Keywords: Classification, complexity, Fuzzy theory, urban problems.

Downloads: 2114
8594 Classification of Prostate Cell Nuclei using Artificial Neural Network Methods

Authors: M. Sinecen, M. Makinacı

Abstract:

The purpose of this paper is to assess the value of neural networks for classification of cancer and noncancer prostate cells. Gauss Markov Random Fields, Fourier entropy and wavelet average deviation features are calculated from 80 noncancer and 80 cancer prostate cell nuclei. For classification, the artificial neural network techniques used are multilayer perceptron, radial basis function and learning vector quantization. Two methods are utilized for the multilayer perceptron. The first method has a single hidden layer with between 3 and 15 nodes; the second method has two hidden layers, each with between 3 and 15 nodes. An overall classification rate of 86.88% is achieved.

Keywords: Artificial neural networks, texture classification, cancer diagnosis.

Downloads: 1591
8593 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings

Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank

Abstract:

Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms rarely has exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm's computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RT-RICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].
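
For readers unfamiliar with the metric, Q3 is the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. A minimal sketch follows; the two example strings are hypothetical and are not data from the paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed sequences must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 8-residue example: one mismatch at the fourth position.
score = q3_accuracy("HHHCEECC", "HHHEEECC")
print(score)
```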

Keywords: data mining, protein secondary structure prediction, parallelization.

Downloads: 1596
8592 Sensitivity Analysis during the Optimization Process Using Genetic Algorithms

Authors: M. A. Rubio, A. Urquia

Abstract:

Genetic algorithms (GA) are applied to the solution of high-dimensional optimization problems. Additionally, sensitivity analysis (SA) is usually carried out to determine the effect on optimal solutions of changes in parameter values of the objective function. These two analyses (i.e., optimization and sensitivity analysis) are computationally intensive when applied to high-dimensional functions. The approach presented in this paper consists of performing the SA during the GA execution by statistically analyzing the data obtained from running the GA. The advantage is that in this case SA does not involve making additional evaluations of the objective function and, consequently, the proposed approach requires less computational effort than conducting optimization and SA in two consecutive steps.

Keywords: Optimization, sensitivity, genetic algorithms, model calibration.

Downloads: 1474
8591 The Classification Performance in Parametric and Nonparametric Discriminant Analysis for a Class-Unbalanced Data of Diabetes Risk Groups

Authors: Lily Ingsrisawang, Tasanee Nacharoen

Abstract:

The problems arising from unbalanced data sets generally appear in real world applications. Due to unequal class distribution, many researchers have found that the performance of existing classifiers tends to be biased towards the majority class. The k-nearest neighbors nonparametric discriminant analysis is a method that was proposed for classifying unbalanced classes with good performance. In this study, the methods of discriminant analysis are of interest in investigating misclassification error rates for class-imbalanced data of three diabetes risk groups. The purpose of this study was to compare the classification performance of parametric discriminant analysis and nonparametric discriminant analysis in a three-class classification of class-imbalanced data of diabetes risk groups. Data from a project maintaining healthy conditions for 599 employees of a government hospital in Bangkok were obtained for the classification problem. The employees were divided into three diabetes risk groups: non-risk (90%), risk (5%), and diabetic (5%). The original data, including the variables of diabetes risk group, age, gender, blood glucose, and BMI, were analyzed and bootstrapped for 50 and 100 samples, 599 observations per sample, for additional estimation of the misclassification error rate. Each data set was explored for departure from multivariate normality and for equality of the covariance matrices of the three risk groups. Both the original data and the bootstrap samples showed nonnormality and unequal covariance matrices. The parametric linear discriminant function, the quadratic discriminant function, and the nonparametric k-nearest neighbors discriminant function were applied to the 50 and 100 bootstrap samples and to the original data. In searching for the optimal classification rule, the prior probabilities were set to both equal proportions (0.33:0.33:0.33) and unequal proportions of (0.90:0.05:0.05), (0.80:0.10:0.10), and (0.70:0.15:0.15).
The results from 50 and 100 bootstrap samples indicated that the k-nearest neighbors approach with k=3 or k=4 and prior probabilities for non-risk:risk:diabetic of 0.90:0.05:0.05 or 0.80:0.10:0.10 gave the smallest misclassification error rate. The k-nearest neighbors approach is therefore suggested for classifying three-class imbalanced data of diabetes risk groups.
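As a rough illustration of how prior probabilities can enter a k-nearest neighbors discriminant rule, the sketch below weights each class's share of the k neighbors by its prior. The abstract does not state the exact rule used in the study, so the posterior formula here (prior times neighbor count divided by class size, then normalized) is an assumption based on a common formulation, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def knn_posteriors(x, train_X, train_y, priors, k=3):
    """Prior-weighted k-NN posteriors (assumed rule, not confirmed by the source).

    posterior(c | x) is proportional to priors[c] * k_c / n_c, where k_c is the
    number of the k nearest neighbors in class c and n_c is the size of class c.
    """
    class_sizes = Counter(train_y)
    # Take the k training points closest to x by Euclidean distance.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
    k_counts = Counter(label for _, label in neighbors)
    scores = {c: priors[c] * k_counts.get(c, 0) / class_sizes[c] for c in class_sizes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def knn_classify(x, train_X, train_y, priors, k=3):
    """Assign x to the class with the largest prior-weighted posterior."""
    posteriors = knn_posteriors(x, train_X, train_y, priors, k)
    return max(posteriors, key=posteriors.get)


# Toy two-feature data (made up for illustration, not the hospital data).
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (9, 9), (9, 10)]
train_y = ["non-risk"] * 4 + ["risk"] * 2 + ["diabetic"] * 2
priors = {"non-risk": 0.90, "risk": 0.05, "diabetic": 0.05}

print(knn_classify((5.2, 5.4), train_X, train_y, priors, k=3))
```

Dividing by the class size keeps a large majority class from dominating purely because it contributes more training points, while the prior lets the analyst reintroduce the expected class proportions explicitly.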

Keywords: Bootstrap, diabetes risk groups, error rate, k-nearest neighbors.

Downloads: 2008
8590 Automatic Fingerprint Classification Using Graph Theory

Authors: Mana Tarjoman, Shaghayegh Zarei

Abstract:

Efficient classification methods are necessary for automatic fingerprint recognition systems. This paper introduces a new structural approach to fingerprint classification that uses the directional image of fingerprints to increase the number of subclasses. In this method, the directional image of a fingerprint is segmented into regions consisting of pixels with the same direction. Afterwards, the relational graph of the segmented image is constructed and, from it, a super graph containing the prominent information of this graph is formed. Finally, a matching technique based on a cost function is applied to compare the obtained graph with the model graphs in order to classify fingerprints. Increasing the number of subclasses while keeping acceptable classification accuracy and achieving faster processing in fingerprint recognition makes this system superior.

Keywords: Classification, Directional image, Fingerprint, Graph, Super graph.

Downloads: 3634
8589 Incorporating Multiple Supervised Learning Algorithms for Effective Intrusion Detection

Authors: Umar Albalawi, Sang C. Suh, Jinoh Kim

Abstract:

As the Internet continues to expand its usage with an enormous number of applications, cyber-threats have increased significantly. Thus, accurate and timely detection of malicious traffic is a critical security concern in today's Internet. One approach for intrusion detection is to use Machine Learning (ML) techniques. Several methods based on ML algorithms have been introduced over the past years, but they are largely limited in terms of detection accuracy and/or time and space complexity. In this work, we present a novel method for intrusion detection that incorporates a set of supervised learning algorithms. The proposed technique provides high accuracy and outperforms existing techniques that simply utilize a single learning method. In addition, our technique relies on partial flow information (rather than full information) for detection, and thus it is lightweight and desirable for online operation with the property of early identification. Using the publicly available mid-Atlantic CCDC intrusion dataset, we show that our proposed technique yields a detection rate of over 99% with a very low false alarm rate (0.4%).
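The two figures reported above can be read as standard confusion-matrix ratios. The sketch below assumes the usual definitions (detection rate as the fraction of malicious flows flagged, false alarm rate as the fraction of benign flows flagged); the abstract does not spell these out, and the function name and the toy labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for binary labels.

    Labels: 1 = malicious, 0 = benign (an assumed encoding).
    detection_rate   = TP / (TP + FN)
    false_alarm_rate = FP / (FP + TN)
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one malicious and one benign sample")
    return tp / (tp + fn), fp / (fp + tn)


# Made-up labels for ten flows (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
rate, false_alarms = detection_metrics(y_true, y_pred)
print(f"detection rate = {rate:.2%}, false alarm rate = {false_alarms:.2%}")
```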

 

Keywords: Intrusion Detection, Supervised Learning, Traffic Classification.

Downloads: 2035
8588 Wavelet-Based Classification of Outdoor Natural Scenes by Resilient Neural Network

Authors: Amitabh Wahi, Sundaramurthy S.

Abstract:

Natural outdoor scene classification is an active and promising research area around the globe. In this study, the classification is carried out in two phases. In the first phase, features are extracted from the images by a wavelet decomposition method and stored in a database as feature vectors. In the second phase, two neural classifiers, a back-propagation neural network (BPNN) and a resilient back-propagation neural network (RPNN), are employed for the classification of scenes. Four hundred color images from the MIT database are considered, in two classes: forest and street. A comparative study has been carried out on the performance of the two neural classifiers, BPNN and RPNN, with an increasing number of test samples. RPNN showed better classification results than BPNN on the larger test sets.

Keywords: BPNN, Classification, Feature extraction, RPNN, Wavelet.

Downloads: 1943