Search results for: Naïve Bayesian-based classification
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1144

754 Rigorous Electromagnetic Model of Fourier Transform Infrared (FT-IR) Spectroscopic Imaging Applied to Automated Histology of Prostate Tissue Specimens

Authors: Rohith K Reddy, David Mayerich, Michael Walsh, P Scott Carney, Rohit Bhargava

Abstract:

Fourier transform infrared (FT-IR) spectroscopic imaging is an emerging technique that provides both chemically and spatially resolved information. The rich chemical content of data may be utilized for computer-aided determinations of structure and pathologic state (cancer diagnosis) in histological tissue sections for prostate cancer. FT-IR spectroscopic imaging of prostate tissue has shown that tissue type (histological) classification can be performed to a high degree of accuracy [1] and cancer diagnosis can be performed with an accuracy of about 80% [2] on a microscopic (≈ 6μm) length scale. In performing these analyses, it has been observed that there is large variability (more than 60%) between spectra from different points on tissue that is expected to consist of the same essential chemical constituents. Spectra at the edges of tissues are characteristically and consistently different from chemically similar tissue in the middle of the same sample. Here, we explain these differences using a rigorous electromagnetic model for light-sample interaction. Spectra from FT-IR spectroscopic imaging of chemically heterogeneous samples are different from bulk spectra of individual chemical constituents of the sample. This is because spectra not only depend on chemistry, but also on the shape of the sample. Using coupled wave analysis, we characterize and quantify the nature of spectral distortions at the edges of tissues. Furthermore, we present a method of performing histological classification of tissue samples. Since the mid-infrared spectrum is typically assumed to be a quantitative measure of chemical composition, classification results can vary widely due to spectral distortions. However, we demonstrate that the selection of localized metrics based on chemical information can make our data robust to the spectral distortions caused by scattering at the tissue boundary.

Keywords: Infrared, Spectroscopy, Imaging, Tissue classification

Downloads: 1634
753 Personal Information Classification Based on Deep Learning in Automatic Form Filling System

Authors: Shunzuo Wu, Xudong Luo, Yuanxiu Liao

Abstract:

Recently, the rapid development of deep learning has allowed artificial intelligence (AI) to penetrate many fields, replacing manual work there. In particular, AI systems have become a research focus in the field of office automation. To meet real needs in office automation, in this paper we develop an automatic form filling system. Specifically, it uses two classical neural network models and several word embedding models to classify various relevant information elicited from the Internet. When training the neural network models, we use less noisy and balanced data. We conduct a series of experiments to test our system, and the results show that it can achieve better classification results.

Keywords: Personal information, deep learning, auto fill, NLP, document analysis.

Downloads: 861
752 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) has been used for classification of the Pima Indians diabetes dataset. The aims of this study are to reduce the variance within the attributes of the diabetes dataset and to improve the classification accuracy of the classifier algorithms by transforming a non-linearly separable dataset into a linearly separable one. The Pima Indians diabetes dataset has two classes: normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of the K-means clustering method and is one of the most used clustering methods in data mining and machine learning applications. In this study, as the first stage, fuzzy C-means clustering has been used to find the centers of the attributes in the Pima Indians diabetes dataset, and the dataset has then been weighted according to the ratios of the means of the attributes to their centers. Secondly, after the weighting process, support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers have been used to classify the weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of the Pima Indians diabetes dataset.
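
The weighting step in this abstract can be illustrated with a minimal sketch. It is one possible reading of the description, not the author's implementation: each attribute is clustered on its own with a small one-dimensional fuzzy C-means, and every value is multiplied by the ratio of the attribute mean to its nearest cluster center. The function names, the choice of two clusters, the fuzzifier m = 2 and the sample values are all assumptions made for illustration.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-9):
    """Return c cluster centers for a list of scalars (plain fuzzy C-means)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly across the value range.
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    for _ in range(iters):
        memberships = []
        for x in values:
            dists = [abs(x - v) for v in centers]
            if any(d == 0.0 for d in dists):
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            memberships.append([r / total for r in row])
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def weight_attribute(values, c=2):
    """Scale each value by (attribute mean / nearest center). Assumes nonzero centers."""
    centers = fuzzy_c_means_1d(values, c=c)
    mean = sum(values) / len(values)
    return [x * mean / min(centers, key=lambda v: abs(x - v)) for x in values]


# Hypothetical attribute column with two well-separated groups of values.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]
weighted = weight_attribute(values)
```

In this toy column the two groups are pulled toward the attribute mean, which is the variance-reducing effect the abstract describes; the weighted columns would then be passed to a classifier such as SVM or k-NN.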

Keywords: Fuzzy C-means clustering, Fuzzy C-means clustering based attribute weighting, Pima Indians diabetes dataset, SVM.

Downloads: 1763
751 A Novel Technique for Ferroresonance Identification in Distribution Networks

Authors: G. Mokryani, M. R. Haghifam, J. Esmaeilpoor

Abstract:

The occurrence of Ferroresonance is one of the causes of wear and destruction of transformers, so recognizing Ferroresonance is of special importance. A novel method for classification of Ferroresonance is presented in this paper. Using this method, Ferroresonance can be discriminated from other transients such as capacitor switching, load switching, and transformer switching. The wavelet transform is used for decomposition of signals and a Competitive Neural Network is used for classification. Ferroresonance data and other transients were obtained by simulation using the EMTP program. Using the Daubechies wavelet transform, signals have been decomposed to six levels. The energies of the six detail signals obtained by the wavelet transform are used for training and testing the Competitive Neural Network. Results show that the proposed procedure is efficient in distinguishing Ferroresonance from other events.
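
The feature-extraction step (energies of six detail signals) can be sketched as follows. The abstract does not state which Daubechies wavelet was used, so this sketch assumes the Haar wavelet, the simplest member of the Daubechies family; the function names and the test signal are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def detail_energies(signal, levels=6):
    """Energy (sum of squared coefficients) of the detail signal at each level."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            raise ValueError("signal too short for the requested number of levels")
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies


# Hypothetical input: a 50 Hz sine sampled at 3200 Hz, 256 samples.
signal = [math.sin(2 * math.pi * 50 * n / 3200) for n in range(256)]
features = detail_energies(signal, levels=6)  # six-element feature vector
```

The resulting six-element vector is the kind of input the abstract feeds to the Competitive Neural Network; the network itself is not sketched here.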

Keywords: Competitive Neural Network, Ferroresonance, EMTP program, Wavelet transform.

Downloads: 1424
750 Using Data Mining Techniques for Finding Cardiac Outlier Patients

Authors: Farhan Ismaeel Dakheel, Raoof Smko, K. Negrat, Abdelsalam Almarimi

Abstract:

In this paper we used data mining techniques to identify outlier patients who use large amounts of drugs over a long period of time. Any healthcare or health insurance system should deal with the quantities of drugs utilized by chronic disease patients. In the Kingdom of Bahrain, about 20% of the health budget is spent on medications. The managers of healthcare systems do not have enough information about how chronic disease patients utilize drugs, whether there is any misuse, or whether there are outlier patients. In this work, which was done in cooperation with the information department of the Bahrain Defence Force hospital, we selected the data for cardiac patients in the period from 1/1/2008 to 31/12/2008 as the data for the model in this paper. We used three techniques for analyzing drug utilization by cardiac patients. First we applied a clustering technique, followed by a measurement of clustering validity, and finally we applied a decision tree as the classification algorithm. According to the clustering results, the 1603 patients, who received 15,806 prescriptions during this period, can be partitioned into three groups by drug utilization, where 23 patients (2.59%) who received 1316 prescriptions (8.32%) are classified as outliers. The classification algorithm shows that average drug utilization and the age and gender of the patient can be considered the main predictive factors in the induced model.

Keywords: Data Mining, Clustering, Classification, Drug Utilization.

Downloads: 1898
749 Slice Bispectrogram Analysis-Based Classification of Environmental Sounds Using Convolutional Neural Network

Authors: Katsumi Hirata

Abstract:

Certain systems can function well only if they recognize the sound environment as humans do. In this research, we focus on sound classification by adopting a convolutional neural network and aim to develop a method that automatically classifies various environmental sounds. Although the neural network is a powerful technique, the performance depends on the type of input data. Therefore, we propose an approach via a slice bispectrogram, which is a third-order spectrogram and is a slice version of the amplitude for the short-time bispectrum. This paper explains the slice bispectrogram and discusses the effectiveness of the derived method by evaluating the experimental results using the ESC‑50 sound dataset. As a result, the proposed scheme gives high accuracy and stability. Furthermore, some relationship between the accuracy and non-Gaussianity of sound signals was confirmed.

Keywords: Bispectrum, convolutional neural network, environmental sound, slice bispectrogram, spectrogram.

Downloads: 618
748 Selection of Best Band Combination for Soil Salinity Studies using ETM+ Satellite Images (A Case study: Nyshaboor Region, Iran)

Authors: Sanaeinejad, S. H.; A. Astaraei, P. Mirhoseini.Mousavi, M. Ghaemi

Abstract:

One of the main environmental problems affecting extensive areas of the world is soil salinity. Traditional data collection methods are neither sufficient for addressing this important environmental problem nor accurate enough for soil studies. Remote sensing data could overcome most of these problems. Although satellite images are commonly used for these studies, there is still a need to find the best calibration between the data and real conditions in each specific area. The Neyshaboor area in the north-east of Iran was selected as the field study area for this research. Landsat satellite images of this area were used to prepare suitable learning samples for processing and classifying the images. 300 locations were selected randomly in the area to collect soil samples, and finally 273 locations were reselected for further laboratory work and image processing analysis. The electrical conductivity of all samples was measured. Six reflective bands of ETM+ satellite images taken of the study area in 2002 were used for soil salinity classification. The classification was carried out using common algorithms based on the best band composition. The results showed that reflective bands 7, 3, 4 and 1 are the best band composition for preparing the color composite images. We also found that hybrid classification is a suitable method for identifying and delineating different salinity classes in the area.

Keywords: Soil salinity, Remote sensing, Image processing, ETM+, Nyshaboor

Downloads: 2021
747 Wavelet-Based ECG Signal Analysis and Classification

Authors: Madina Hamiane, May Hashim Ali

Abstract:

This paper presents the processing and analysis of ECG signals. The study is based on the wavelet transform and uses exclusively the MATLAB environment. It includes removing baseline wander and further de-noising through the wavelet transform; metrics such as signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR) and mean squared error (MSE) are used to assess the efficiency of the de-noising techniques. Feature extraction is subsequently performed, whereby signal features such as heart rate and rise and fall levels are extracted and the QRS complex is detected, which helps in classifying the ECG signal. Classification is the last step in the analysis of the ECG signals, and it is shown that they are successfully classified as normal rhythm or abnormal rhythm. The final result proved the adequacy of using the wavelet transform for the analysis of ECG signals.
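
The three de-noising metrics named in this abstract can be computed as in the sketch below. The paper works in MATLAB; Python is used here only for illustration. Definitions of SNR and PSNR vary between papers, so the formulas are assumptions: SNR is taken as reference power over error power, and PSNR uses the largest absolute value of the reference signal as the peak. The sample values are hypothetical.

```python
import math


def mse(reference, estimate):
    """Mean squared error between a reference signal and its de-noised estimate."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB: reference power over residual error power."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


def psnr_db(reference, estimate):
    """Peak SNR in dB, with the peak taken as max |reference| (an assumption)."""
    peak = max(abs(r) for r in reference)
    return 10.0 * math.log10(peak ** 2 / mse(reference, estimate))


# Hypothetical clean signal and de-noised estimate.
clean = [0.0, 1.0, 0.0, -1.0]
denoised = [0.1, 0.9, 0.1, -0.9]
scores = (mse(clean, denoised), snr_db(clean, denoised), psnr_db(clean, denoised))
```

All three functions assume the estimate differs from the reference somewhere; an exact match would make the error power zero and the logarithms undefined.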

Keywords: ECG Signal, QRS detection, thresholding, wavelet decomposition, feature extraction.

Downloads: 1273
746 STATISTICA Software: A State of the Art Review

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, P. Ranjetha

Abstract:

Data mining is growing rapidly in recognition and popularity. The foremost aim of data mining is to extract information from a huge data set in forms that can be understood and put to further use. Data mining is a technology with rich potential that can support industries and businesses seeking to collect the information needed to understand their customers' performance. Several methods are available for extracting such information, such as classification, clustering, association, discovery, and visualization, each with its own diverse algorithms that attempt to fit an appropriate model to the data. STATISTICA mostly deals with very large groups of data, which impose heavy computational constraints. These challenges led to the emergence of powerful STATISTICA Data Mining technologies. This survey gives an overview of the STATISTICA software along with its significant features.

Keywords: Data Mining, STATISTICA Data Miner, Text Miner, Enterprise Server, Classification, Association, Clustering, Regression.

Downloads: 2607
745 Automated Particle Picking based on Correlation Peak Shape Analysis and Iterative Classification

Authors: Hrabe Thomas, Beck Florian, Nickell Stephan

Abstract:

Cryo-electron microscopy (CEM) in combination with single particle analysis (SPA) is a widely used technique for elucidating structural details of macromolecular assemblies at close-to-atomic resolutions. However, development of automated software for SPA processing is still vital since thousands to millions of individual particle images need to be processed. Here, we present our workflow for automated particle picking. Our approach integrates peak shape analysis into the classical correlation and an iterative approach to separate macromolecules and background by classification. This particle selection workflow furthermore provides a robust means for SPA with little user interaction. The performance of the presented tools is assessed by processing simulated and experimental data.

Keywords: Cryo-electron Microscopy, Single Particle Analysis, Image Processing.

Downloads: 1668
744 A New Model for Question Answering Systems

Authors: Mohammad Reza Kangavari, Samira Ghandchi, Manak Golpour

Abstract:

Most Question Answering (QA) systems are composed of three main modules: question processing, document processing and answer processing. The question processing module plays an important role in QA systems. If this module does not work properly, it causes problems for the other modules. Moreover, answer processing is an emerging topic in Question Answering, where systems are often required to rank and validate candidate answers. These techniques, which aim at finding short and precise answers, are often based on semantic classification. This paper discusses a new model for question answering that improves two main modules, question processing and answer processing. Two important components form the basis of question processing. The first is question classification, which specifies the types of question and answer. The second is reformulation, which converts the user's question into a question the QA system can understand in a specific domain. The answer processing module consists of candidate answer filtering and candidate answer ordering components, and it also has a validation section for interacting with the user. This module makes it easier to find the exact answer. In this paper we describe the question and answer processing modules, covering the modeling, implementation and evaluation of the system. The system was implemented in two versions. Results show that 'Version No.1' gave correct answers to 70% of questions (30 correct answers to 50 asked questions) and 'Version No.2' gave correct answers to 94% of questions (47 correct answers to 50 asked questions).

Keywords: Answer Processing, Classification, Question Answering, Query Reformulation.

Downloads: 2125
743 Heritage Tree Expert Assessment and Classification: Malaysian Perspective

Authors: B.-Y.-S. Lau, Y.-C.-T. Jonathan, M.-S. Alias

Abstract:

Heritage trees are large, individual natural trees of exceptional value due to their association with age, events or distinguished people. In Malaysia, there is an abundance of tropical heritage trees throughout the country. It is essential to set up a repository of heritage trees to prevent valuable trees from being cut down. In this cross-domain study, a web-based online expert system, the Heritage Tree Expert Assessment and Classification (HTEAC), is developed and deployed for the public to nominate potential heritage trees. Based on the nominations, tree care experts or arborists evaluate and verify the nominated trees as heritage trees. The expert system automatically rates the approved heritage trees according to pre-defined grades via the Delphi technique. Features and a usability test of the expert system are presented. The preliminary results are promising for the system to be used as a full-scale public system.

Keywords: Arboriculture, Delphi, expert system, heritage tree, urban forestry.

Downloads: 1430
742 Dynamic Features Selection for Heart Disease Classification

Authors: Walid MOUDANI

Abstract:

The healthcare environment is generally perceived as being information rich yet knowledge poor. There is a lack of effective analysis tools to discover hidden relationships and trends in data, yet valuable knowledge can be discovered by applying data mining techniques in healthcare systems. In this study, a methodology is presented for extracting significant patterns from Coronary Heart Disease data warehouses for the prediction of heart attack, which unfortunately continues to be a leading cause of mortality worldwide. For this purpose, we propose to dynamically enumerate the optimal subsets of reduced features of high interest by using the rough sets technique combined with dynamic programming. We then propose to validate the classification using a Random Forest (RF) decision tree to identify risky heart disease cases. This work is based on a large amount of data collected from several clinical institutions based on the medical profiles of patients. Moreover, the experts' knowledge in this field has been taken into consideration in order to define the disease and its risk factors and to establish significant knowledge relationships among the medical factors. A computer-aided system is developed for this purpose based on a population of 525 adults. The performance of the proposed model is analyzed and evaluated against a set of benchmark techniques applied to this classification problem.

Keywords: Multi-Classifier Decisions Tree, Features Reduction, Dynamic Programming, Rough Sets.

Downloads: 2532
741 Contextual Sentiment Analysis with Untrained Annotators

Authors: Lucas A. Silva, Carla R. Aguiar

Abstract:

This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm without requiring extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a system that analyzes the congruence between annotations. This ensured satisfactory results even without using annotators specialized in the context of the research, avoiding the generation of biased training data for the classifiers. For this, a case study was conducted on an entrepreneurship blog. The experimental results were consistent with the literature on annotation using a formalized process with experts.

Keywords: Contextualized classifier, naïve Bayes, sentiment analysis, untrained annotators.

Downloads: 4703
740 A Pattern Recognition Neural Network Model for Detection and Classification of SQL Injection Attacks

Authors: Naghmeh Moradpoor Sheykhkanloo

Abstract:

Thousands of organisations store important and confidential information related to them, their customers, and their business partners in databases all across the world. The stored data ranges from less sensitive (e.g. first name, last name, date of birth) to more sensitive data (e.g. password, pin code, and credit card information). Losing data, disclosing confidential information or even changing the value of data are the severe damages that a Structured Query Language injection (SQLi) attack can cause to a given database. It is a code injection technique where malicious SQL statements are inserted into a given SQL database by simply using a web browser. In this paper, we propose an effective pattern recognition neural network model for detection and classification of SQLi attacks. The proposed model is built from three main elements: a Uniform Resource Locator (URL) generator, which generates thousands of malicious and benign URLs; a URL classifier, which 1) classifies each generated URL as either benign or malicious and 2) classifies the malicious URLs into different SQLi attack categories; and a NN model, which 1) detects whether a given URL is malicious or benign and 2) identifies the type of SQLi attack for each malicious URL. The model is first trained and then evaluated by employing thousands of benign and malicious URLs. The results of the experiments are presented in order to demonstrate the effectiveness of the proposed approach.

Keywords: Neural Networks, pattern recognition, SQL injection attacks, SQL injection attack classification, SQL injection attack detection.

Downloads: 2844
739 A New History Based Method to Handle the Recurring Concept Shifts in Data Streams

Authors: Hossein Morshedlou, Ahmad Abdollahzade Barforoush

Abstract:

Recent developments in storage technology and networking architectures have made it possible for broad areas of applications to rely on data streams for quick response and accurate decision making. Data streams are generated from real-world events, so it is logical that the associations among the occurrences of these events also exist among the concepts of data streams. Extracting these hidden associations can be useful for predicting subsequent concepts in concept-shifting data streams. In this paper we present a new method for learning associations among the concepts of a data stream and predicting what the next concept will be. Knowing the next concept makes an informed update of the data model possible. The results of the conducted experiments show that the proposed method is suitable for classification of concept-shifting data streams.

Keywords: Data Stream, Classification, Concept Shift, History.

Downloads: 1278
738 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The high demand for image annotation and archiving on the web has led researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: the description of the elementary characteristics of the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP) that handles both limitations simultaneously. The algorithm uses the histogram of gradients as the feature descriptor. It applies the Label Priority Power-set as the multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better on some of the evaluation metrics compared with the two existing techniques.
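
The idea behind a power-set problem transformation can be sketched in a few lines. This shows only the plain label power-set, in which every distinct set of labels becomes one class of an ordinary single-label problem so that label co-occurrence is preserved; the priority scheme of the paper's Label Priority Power-set is not described in the abstract and is not reproduced here. The function name and the example labels are hypothetical.

```python
def label_powerset(label_sets):
    """Map each distinct label set to one integer class id.

    Returns (encoded, class_of): encoded[i] is the class id of the i-th
    sample, and class_of maps each frozenset of labels to its id.
    """
    class_of = {}
    encoded = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        encoded.append(class_of[key])
    return encoded, class_of


# Hypothetical image annotations: the first and third images share a label set.
annotations = [{"sky", "sea"}, {"sky"}, {"sea", "sky"}]
encoded, class_of = label_powerset(annotations)  # encoded == [0, 1, 0]
```

Any single-label classifier can then be trained on the encoded classes, with predictions mapped back to label sets by inverting class_of.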

Keywords: Data mining, information retrieval system, multi-label, problem transformation, histogram of gradients.

Downloads: 1315
737 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly, which leads to the growth of various industries. Nowadays, software-based applications have been adopted widely for business purposes. For any software industry, development of reliable software is becoming a challenging task because a faulty software module may be harmful for the growth of industry and business. Hence there is a need to develop techniques which can be used for early prediction of software defects. Due to complexities in manual prediction, automated software defect prediction techniques have been introduced. These techniques are based on learning patterns from previous software versions and finding the defects in the current version. These techniques have attracted researchers due to their significant impact on industrial growth by identifying the bugs in software. Several studies have been carried out on this basis, but achieving desirable defect prediction performance is still a challenging task. To address this issue, here we present a machine learning based hybrid technique for software defect prediction. First, a Genetic Algorithm (GA) is presented in which an improved fitness function is used for better optimization of features in data sets. Later, these features are processed through a Decision Tree (DT) classification model. Finally, an experimental study is presented in which results from the proposed GA-DT based hybrid approach are compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: Decision tree, genetic algorithm, machine learning, software defect prediction.

Downloads: 1465
736 Investigation on Feature Extraction and Classification of Medical Images

Authors: P. Gnanasekar, A. Nagappan, S. Sharavanan, O. Saravanan, D. Vinodkumar, T. Elayabharathi, G. Karthik

Abstract:

In this paper we present an in-depth study of biomedical images and tag them with some basic extracted features (e.g. color, pixel value, etc.). The classification is done using a nearest neighbor classifier with various distance measures as well as the automatic combination of classifier results. This process selects a subset of relevant features from a group of features of the image. It also helps to gain a better understanding of the image by describing which features are important. The accuracy can be improved by increasing the number of features selected. Various types of classification have been developed for medical images, such as the Support Vector Machine (SVM), which is used for classifying bacterial types. The Ant Colony Optimization method is used for optimal results. It has high approximation capability and much faster convergence. Other methods include texture feature extraction based on Gabor wavelets.

Keywords: ACO Ant Colony Optimization, Correlogram, CCM Co-Occurrence Matrix, RTS Rough-Set theory

Downloads: 3012
735 Non-negative Principal Component Analysis for Face Recognition

Authors: Zhang Yan, Yu Bin

Abstract:

Principal component analysis is often combined with state-of-the-art classification algorithms to recognize human faces. However, principal component analysis can only capture those features contributing to the global characteristics of data because it is a global feature selection algorithm. It misses those features contributing to the local characteristics of data because each principal component only contains some levels of global characteristics of data. In this study, we present a novel face recognition approach using non-negative principal component analysis, which adds a non-negativity constraint to improve data locality and help elucidate latent data structures. Experiments are performed on the Cambridge ORL face database. We demonstrate the strong performance of the algorithm in recognizing human faces in comparison with PCA and NREMF approaches.

Keywords: classification, face recognition, non-negative principal component analysis (NPCA)

Downloads: 1695
734 A Case-Based Reasoning-Decision Tree Hybrid System for Stock Selection

Authors: Yaojun Wang, Yaoqing Wang

Abstract:

Stock selection is an important decision-making problem. Many machine learning and data mining technologies are employed to build automatic stock-selection systems. A profitable stock-selection system should consider both the stock's investment value and the market timing. In this paper, we present a hybrid system for stock selection that takes both into account. This system uses a case-based reasoning (CBR) model to perform the stock classification and a decision-tree model to help with market timing and stock selection. The experiments show that the performance of this hybrid system is better than that of other techniques with regard to classification accuracy, average return and the Sharpe ratio.

Keywords: Case-based reasoning, decision tree, stock selection, machine learning.

Downloads: 1705
733 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We use gene data that was initially collected for autism detection to find out whether, and how accurately, this data can be used for identification applications. Our main goal is to find out whether our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviations, up to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
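
The experimental setup described here (nearest neighbor matching against noise-corrupted test vectors) can be sketched as follows. This is an illustrative toy with k = 1, not the author's pipeline: the templates are made-up three-dimensional vectors rather than preprocessed gene data, and the function names are hypothetical.

```python
import random


def nearest_neighbor(query, templates):
    """Return the identity whose stored vector is closest in squared Euclidean distance."""
    return min(
        templates,
        key=lambda ident: sum((q - t) ** 2 for q, t in zip(query, templates[ident])),
    )


def verification_rate(templates, noise_std, rng):
    """Fraction of subjects still matched to themselves after Gaussian noise is added."""
    correct = 0
    for ident, vector in templates.items():
        noisy = [v + rng.gauss(0.0, noise_std) for v in vector]
        if nearest_neighbor(noisy, templates) == ident:
            correct += 1
    return correct / len(templates)


# Hypothetical, widely separated templates for three subjects.
templates = {
    "subject_a": [0.0, 0.0, 0.0],
    "subject_b": [50.0, 0.0, 0.0],
    "subject_c": [0.0, 50.0, 0.0],
}
rate = verification_rate(templates, noise_std=1.0, rng=random.Random(0))
```

Repeating the call over a range of noise_std values would give the kind of rate-versus-noise curve the abstract refers to; with real data the templates are far less separated than in this toy.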

Keywords: Biometrics, identity verification, genetic data, k-nearest neighbor.

Downloads: 1120
732 Roof Material Detection Based on Object-Based Approach Using WorldView-2 Satellite Imagery

Authors: Ebrahim Taherzadeh, Helmi Z. M. Shafri, Kaveh Shahi

Abstract:

One of the most important tasks in urban remote sensing is the detection of impervious surfaces (IS), such as roofs and roads. However, detection of IS in heterogeneous areas remains one of the most challenging tasks. In this study, the detection of concrete roofs using an object-based approach is proposed. A new rule-based classification was developed to detect concrete roof tiles. The proposed rule-based classification was applied to a WorldView-2 image, and the results showed that the proposed rule has good potential to predict concrete roof material from WorldView-2 images, with 85% accuracy.

Keywords: Urban remote sensing, impervious surface, Object-Based, Roof Material, Concrete tile, WorldView-2.

Downloads 3793
731 Video Classification by Partitioned Frequency Spectra of Repeating Movements

Authors: Kahraman Ayyildiz, Stefan Conrad

Abstract:

In this paper we present a system for classifying videos by frequency spectra. Many videos contain activities with repeating movements. Sports videos, home improvement videos, and videos showing mechanical motion are some example areas. Motion in these areas usually repeats with a certain main frequency and several side frequencies. Transforming repeating motion to its frequency domain via FFT reveals these frequencies. Average amplitudes of frequency intervals can be seen as features of cyclic motion. Hence, determining these features can help to classify videos with repeating movements. In this paper we explain how to compute frequency spectra for video clips and how to use them for classification. Our approach uses a series of image moments as a function, which is in turn transformed into its frequency domain.
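The idea of using average amplitudes of frequency intervals as features can be illustrated with a minimal sketch. It is not the authors' implementation: the input series is a synthetic sine standing in for a per-frame image moment, the names `dft_amplitudes` and `band_features` are hypothetical, and a naive DFT replaces the FFT so that the example needs only the standard library.

```python
import cmath
import math


def dft_amplitudes(signal):
    """Amplitude of each non-negative frequency bin of a real signal (naive O(n^2) DFT)."""
    n = len(signal)
    amps = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        amps.append(abs(coeff) / n)
    return amps


def band_features(amplitudes, num_bands):
    """Average the amplitudes over num_bands contiguous frequency intervals."""
    size = len(amplitudes) / num_bands
    features = []
    for b in range(num_bands):
        lo, hi = int(b * size), int((b + 1) * size)
        band = amplitudes[lo:hi] or [0.0]
        features.append(sum(band) / len(band))
    return features


# Assumed input: one value per frame (64 frames), repeating 4 times over the clip.
series = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
amps = dft_amplitudes(series)

# Main frequency: the strongest bin, ignoring the constant (DC) component.
main_bin = max(range(1, len(amps)), key=amps.__getitem__)

# Feature vector: average amplitude in 4 frequency intervals (DC bin skipped).
features = band_features(amps[1:], 4)
```

The resulting `features` list could then be passed to any classifier; the number of intervals is a free parameter chosen here only for illustration.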

Keywords: action recognition, frequency feature, motion recognition, repeating movement, video classification

Downloads 1883
730 Danger Theory and Intelligent Data Processing

Authors: Anjum Iqbal, Mohd Aizaini Maarof

Abstract:

Artificial Immune System (AIS) is a relatively naive paradigm for intelligent computation. The inspiration for AIS is derived from the natural Immune System (IS). Classically, it is believed that the IS strives to discriminate between self and non-self. Most existing AIS research is based on this approach. Danger Theory (DT) disputes this approach and proposes that the IS fights against danger-producing elements and tolerates others. We, as computational researchers, are not concerned with the arguments among immunologists but try to extract from them novel abstractions for intelligent computation. This paper aims to follow the DT inspiration for intelligent data processing. The approach may open a new avenue in intelligent processing. The data used are system call data, which are potentially significant in intrusion detection applications.

Keywords: artificial immune system, danger theory, intelligent processing, system calls

Downloads 1883
729 ANFIS Approach for Locating Faults in Underground Cables

Authors: Magdy B. Eteiba, Wael Ismael Wahba, Shimaa Barakat

Abstract:

This paper presents a fault identification, classification, and fault location estimation method based on the Discrete Wavelet Transform and the Adaptive Network Fuzzy Inference System (ANFIS) for medium-voltage cables in the distribution system.

Different faults and locations are simulated by ATP/EMTP, and certain selected features of the wavelet-transformed signals are then used as input for training the ANFIS. An accurate fault classifier and locator algorithm was then designed, trained, and tested using current samples only. The results obtained from the ANFIS output were compared with the real output. The percentage error between the ANFIS output and the real output was found to be less than three percent. Hence, it can be concluded that the proposed technique offers high accuracy in both fault classification and fault location.

Keywords: ANFIS, Fault location, Underground Cable, Wavelet Transform.

Downloads 2741
728 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Authors: Ghada A. Alfattni

Abstract:

Analysing unbalanced datasets is one of the challenges that practitioners in the machine learning field face. Many studies have been carried out to determine the effectiveness of the synthetic minority over-sampling technique (SMOTE) in addressing this issue. The aim of this study was therefore to compare the effectiveness of SMOTE across different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets; the same datasets were then oversampled using SMOTE and applied again to the three models to compare the differences in performance. The experimental results show that using the highest number of nearest neighbours gives lower error rates.
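As a rough illustration of how SMOTE oversamples the minority class, the sketch below creates each synthetic point by interpolating between a minority sample and one of its nearest minority neighbours. This is a simplified version under stated assumptions, not the code used in the study: the function name `smote`, the toy data, and the defaults `k=3` and `seed=0` are all hypothetical.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating towards nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class with two features per sample (illustrative values only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.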

Keywords: Imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour.

Downloads 1314
727 Estimation Model of Dry Docking Duration Using Data Mining

Authors: Isti Surjandari, Riara Novita

Abstract:

Maintenance is one of the most important activities in the shipyard industry. However, it is sometimes not supported by adequate services from the shipyard, where inaccuracy in estimating the duration of ship maintenance is still common. This makes estimating ship maintenance duration crucial. This study uses a data mining approach, namely CART (Classification and Regression Tree), to estimate the duration of ship maintenance limited to dock works, known as dry docking. By using the volume of dock works as input to estimate the maintenance duration, four classes of dry docking duration were obtained, each with a different linear model and job criteria. These linear models can then be used to estimate the duration of dry docking based on job criteria.

Keywords: Classification and regression tree (CART), data mining, dry docking, maintenance duration.

Downloads 2433
726 Liver Tumor Detection by Classification through FD Enhancement of CT Image

Authors: N. Ghatwary, A. Ahmed, H. Jalab

Abstract:

In this paper, an approach for liver tumor detection in computed tomography (CT) images is presented. The detection process is based on classifying the features of a target liver cell as either tumor or non-tumor. Fractional differential (FD) is applied to enhance liver CT images, with the aim of enhancing texture and edge features. A fusion method is then applied to merge the various enhanced images and produce a variety of feature improvements, which increases the accuracy of classification. Each image is divided into N×N non-overlapping blocks to extract the desired features. A support vector machine (SVM) classifier is then trained on a supplied dataset different from the tested one. Finally, each block cell is classified as tumor or non-tumor. Our approach is validated on a group of patients’ CT liver tumor datasets. The experimental results demonstrate the detection efficiency of the proposed technique.
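The block-partitioning step can be sketched as follows. The abstract does not specify which features are extracted, so mean intensity is used here purely as a placeholder; the helper names `split_into_blocks` and `mean_intensity`, the toy image, and the choice to drop incomplete edge blocks are assumptions for illustration.

```python
def split_into_blocks(image, n):
    """Split a 2-D list into non-overlapping n x n blocks, dropping incomplete edge blocks."""
    rows, cols = len(image), len(image[0])
    blocks = []
    for r in range(0, rows - n + 1, n):
        for c in range(0, cols - n + 1, n):
            blocks.append([row[c:c + n] for row in image[r:r + n]])
    return blocks


def mean_intensity(block):
    """Placeholder feature: average pixel value of the block."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)


# Toy 4 x 4 "image" with pixel values 0..15, split into 2 x 2 blocks.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
blocks = split_into_blocks(image, 2)
features = [mean_intensity(b) for b in blocks]
```

In a full pipeline, one feature vector per block would be passed to the trained classifier, which labels the block as tumor or non-tumor.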

Keywords: Fractional differential (FD), Computed Tomography (CT), fusion.

Downloads 1682
725 Clustering Multivariate Empiric Characteristic Functions for Multi-Class SVM Classification

Authors: María-Dolores Cubiles-de-la-Vega, Rafael Pino-Mejías, Esther-Lydia Silva-Ramírez

Abstract:

A dissimilarity measure between the empiric characteristic functions of the subsamples associated with the different classes in a multivariate data set is proposed. This measure can be efficiently computed, and it depends on all the cases of each class. It may be used to find groups of similar classes, which could be joined for further analysis, or it could be employed to perform an agglomerative hierarchical cluster analysis of the set of classes. The final tree can serve to build a family of binary classification models, offering an alternative approach to the multi-class SVM problem. We have compared this dendrogram-based SVM approach with the one-against-one SVM approach over four publicly available data sets, three of them being microarray data. The performance of both approaches has been found to be equivalent, but the first solution requires a smaller number of binary SVM models.
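The difference in the number of binary models can be made concrete with a small count. The sketch assumes one binary SVM per internal node of the dendrogram (a binary tree over K classes has K - 1 internal nodes), which is an interpretation of the abstract rather than a detail it states; one-against-one trains one model per pair of classes. Both function names are hypothetical.

```python
def one_against_one_models(num_classes):
    """One binary model per unordered pair of classes: K * (K - 1) / 2."""
    return num_classes * (num_classes - 1) // 2


def dendrogram_models(num_classes):
    """Assumed: one binary model per internal node of a binary tree with K leaves: K - 1."""
    return num_classes - 1


# Model counts for a few class counts: (K, one-against-one, dendrogram-based).
counts = [(k, one_against_one_models(k), dendrogram_models(k)) for k in (3, 5, 10)]
```

Under this assumption the gap widens quickly as the number of classes grows, since the pairwise count is quadratic in K while the tree-based count is linear.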

Keywords: Cluster Analysis, Empiric Characteristic Function, Multi-class SVM, R.

Downloads 1877