Search results for: K-Nearest Neighbours
12 Massive Lesions Classification using Features based on Morphological Lesion Differences
Authors: U. Bottigli, D. Cascio, F. Fauci, B. Golosio, R. Magro, G. L. Masala, P. Oliva, G. Raso, S. Stumbo
Abstract:
The purpose of this work is the development of an automatic classification system which could be useful for radiologists in the investigation of breast cancer. The software has been designed in the framework of the MAGIC-5 collaboration. In the automatic classification system, the suspicious regions with a high probability of including a lesion are extracted from the image as regions of interest (ROIs). Each ROI is characterized by some features based on morphological lesion differences. Several classifiers, namely a Feed Forward Neural Network, a K-Nearest Neighbours classifier and a Support Vector Machine, are used to distinguish the pathological records from the healthy ones. The results, obtained in terms of sensitivity (percentage of pathological ROIs correctly classified) and specificity (percentage of non-pathological ROIs correctly classified), are presented through the Receiver Operating Characteristic (ROC) curve. In particular, the best performance is an area under the ROC curve of 88% ± 1, obtained with the Feed Forward Neural Network.
Keywords: Neural Networks, K-Nearest Neighbours, Support Vector Machine, Computer Aided Diagnosis.
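As an illustration of the two metrics defined in this abstract, the following minimal Python sketch computes sensitivity and specificity from binary labels. The function name, the label encoding (1 = pathological ROI, 0 = healthy ROI) and the example data are assumptions made for illustration; they are not taken from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding: 1 = pathological ROI, 0 = healthy ROI.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # pathological, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # pathological, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping the decision threshold of a classifier and plotting sensitivity against (1 - specificity) at each threshold would trace out a ROC curve of the kind the abstract reports.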
Downloads 1382
11 Superior Performances of the Neural Network on the Masses Lesions Classification through Morphological Lesion Differences
Authors: U. Bottigli, R. Chiarucci, B. Golosio, G. L. Masala, P. Oliva, S. Stumbo, D. Cascio, F. Fauci, M. Glorioso, M. Iacomi, R. Magro, G. Raso
Abstract:
The purpose of this work is to develop an automatic classification system that could be useful for radiologists in breast cancer investigation. The software has been designed in the framework of the MAGIC-5 collaboration. In an automatic classification system, the suspicious regions with a high probability of including a lesion are extracted from the image as regions of interest (ROIs). Each ROI is characterized by some features based generally on morphological lesion differences. A study of the feature space representation is made, and some classifiers are tested to distinguish the pathological regions from the healthy ones. The results, provided in terms of sensitivity and specificity, are presented through ROC (Receiver Operating Characteristic) curves. In particular, the best performances are obtained with the Neural Networks in comparison with the K-Nearest Neighbours and the Support Vector Machine: the Radial Basis Function supplies the best results, with an area under the ROC curve of 0.89 ± 0.01, but similar results are obtained with the Probabilistic Neural Network and a Multi Layer Perceptron.
Keywords: Neural Networks, K-Nearest Neighbours, Support Vector Machine, Computer Aided Detection
Downloads 1616
10 Weighted Clustering Coefficient for Identifying Modular Formations in Protein-Protein Interaction Networks
Authors: Zelmina Lubovac, Björn Olsson, Jonas Gamalielsson
Abstract:
This paper describes a novel approach for deriving modules from protein-protein interaction networks, which combines functional information with topological properties of the network. This approach is based on a weighted clustering coefficient, which uses weights representing the functional similarities between the proteins. These weights are calculated according to the semantic similarity between the proteins, which is based on their Gene Ontology terms. We recently proposed an algorithm for identification of functional modules, called SWEMODE (Semantic WEights for MODule Elucidation), that identifies dense sub-graphs containing functionally similar proteins. The rationale underlying this approach is that each module can be reduced to a set of triangles (protein triplets connected to each other). Here, we propose considering semantic similarity weights of all triangle-forming edges between proteins. We also apply varying semantic similarity thresholds between neighbours of each node that are not neighbours of each other (and hence do not form a triangle), to derive new potential triangles to include in the module-defining procedure. The results show an improvement over a purely topological approach, in terms of the number of predicted modules that match known complexes.
Keywords: Modules, systems biology, protein interaction networks, yeast.
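The idea of weighting triangles by the similarity of their edges can be sketched as follows. The abstract does not give the formula used by SWEMODE, so the definition below is an assumption chosen for illustration: each triangle through a node contributes the mean weight of its three edges, and the sum is divided by the number of neighbour pairs. With all weights equal to 1 it reduces to the ordinary (unweighted) clustering coefficient. The function names and the toy graph are hypothetical.

```python
from itertools import combinations


def build_graph(weighted_edges):
    """Build an adjacency map and an edge-weight map from (u, v, w) triples."""
    adj, weight = {}, {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        weight[frozenset((u, v))] = w
    return adj, weight


def weighted_clustering_coefficient(adj, weight, v):
    """Illustrative weighted clustering coefficient of node v (assumed formula)."""
    neighbours = sorted(adj[v])
    k = len(neighbours)
    if k < 2:
        return 0.0
    total = 0.0
    for a, b in combinations(neighbours, 2):
        if b in adj[a]:  # a, b and v form a triangle
            total += (
                weight[frozenset((v, a))]
                + weight[frozenset((v, b))]
                + weight[frozenset((a, b))]
            ) / 3.0
    return total / (k * (k - 1) / 2)


# Toy network: the weights stand in for semantic similarities (made-up values).
edges = [("A", "B", 0.9), ("A", "C", 0.6), ("B", "C", 0.3), ("A", "D", 0.5)]
adj, weight = build_graph(edges)
print(weighted_clustering_coefficient(adj, weight, "A"))
```

Node A has three neighbours and one triangle (A, B, C), so only one of its three neighbour pairs contributes.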
Downloads 2108
9 Improved Tropical Wood Species Recognition System based on Multi-feature Extractor and Classifier
Authors: Marzuki Khalid, Rubiyah Yusof, Anis Salwa Mohd Khairuddin
Abstract:
An automated wood recognition system is designed to classify tropical wood species. The wood features are extracted using two feature extractors: the Basic Grey Level Aura Matrix (BGLAM) technique and the statistical properties of pores distribution (SPPD) technique. Due to the nonlinearity of the separation boundaries between tropical wood species, a pre-classification stage is proposed which consists of K-means clustering and kernel discriminant analysis (KDA). Finally, a Linear Discriminant Analysis (LDA) classifier and a K-Nearest Neighbour (KNN) classifier are implemented for comparison purposes. The study compares the system with and without pre-classification, using the KNN and LDA classifiers. The results show that the inclusion of the pre-classification stage improves the accuracy of both the LDA and KNN classifiers by more than 12%.
Keywords: Tropical wood species, nonlinear data, feature extractors, classification
Downloads 2000
8 Improvement in Power Transformer Intelligent Dissolved Gas Analysis Method
Authors: S. Qaedi, S. Seyedtabaii
Abstract:
Non-destructive evaluation of in-service power transformer condition is necessary for avoiding catastrophic failures. Dissolved Gas Analysis (DGA) is one of the important methods. Traditional, statistical and intelligent DGA approaches have been adopted for accurate classification of incipient fault sources. Unfortunately, there are often not enough faulty patterns for sufficient training of intelligent systems. Bootstrapping is expected to alleviate this shortcoming and to yield algorithms with better classification success rates. In this paper, the performance of an artificial neural network (ANN), K-Nearest Neighbour and support vector machine methods using bootstrapped data is detailed, and it is shown that while the success rate of the ANN algorithms improves remarkably, the other methods do not benefit as much from the enlarged data space. For assessment, two databases are employed: IEC TC10 and a dataset collected from data reported in papers. The high average test success rate demonstrates the remarkable outcome.
Keywords: Dissolved gas analysis, Transformer incipient fault, Artificial Neural Network, Support Vector Machine (SVM), K-Nearest Neighbor (KNN)
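Bootstrapping, in its basic form, enlarges a small set of patterns by drawing from it with replacement. The sketch below shows only that basic resampling step; the abstract does not describe the exact bootstrap scheme the authors used, and the function name, record layout and gas values here are made up for illustration.

```python
import random


def bootstrap_resample(records, n_samples, rng):
    """Draw n_samples records uniformly at random, with replacement."""
    return [rng.choice(records) for _ in range(n_samples)]


# Hypothetical faulty patterns: (feature vector, fault label). Values are invented.
faulty_patterns = [
    ((120.0, 35.0), "fault_A"),
    ((80.0, 60.0), "fault_B"),
    ((200.0, 15.0), "fault_A"),
]
rng = random.Random(42)  # fixed seed so the run is repeatable
enlarged = bootstrap_resample(faulty_patterns, 12, rng)
print(len(enlarged), "records drawn from", len(faulty_patterns), "originals")
```

Because sampling is with replacement, the enlarged set contains repeated records rather than new measurements, which may be one reason some classifiers gain little from it.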
Downloads 2739
7 Improved Automated Classification of Alcoholics and Non-alcoholics
Authors: Ramaswamy Palaniappan
Abstract:
In this paper, several improvements are proposed to previous work on the automated classification of alcoholics and non-alcoholics. In the previous paper, a multilayer-perceptron neural network classifying the energy of gamma band Visual Evoked Potential (VEP) signals gave the best classification performance using 800 VEP signals from 10 alcoholics and 10 non-alcoholics. Here, the dataset is extended to include 3560 VEP signals from 102 subjects: 62 alcoholics and 40 non-alcoholics. Three modifications are introduced to improve the classification performance: i) increasing the gamma band spectral range by increasing the pass-band width of the filter used; ii) the use of the Multiple Signal Classification algorithm to obtain the power of the dominant frequency in gamma band VEP signals as features; and iii) the use of the simple but effective k-nearest neighbour classifier. To validate that these modifications do give improved performance, a 10-fold cross validation classification (CVC) scheme is used. The previously used methodology is repeated on the extended dataset, and an improvement from 94.49% to 98.71% in maximum averaged CVC accuracy is obtained using the modifications. These latest results show that VEP based classification of alcoholics is worth exploring further for system development.
Keywords: Alcoholic, Multilayer-perceptron, Nearest neighbour, Gamma band, MUSIC, Visual evoked potential.
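The combination of a k-nearest neighbour classifier with 10-fold cross validation can be sketched in plain Python as below. This is a generic illustration under stated assumptions, not the paper's pipeline: the Euclidean distance, the value k = 3, the fold construction and the synthetic two-cluster data are all choices made here for the example, and no VEP features are involved.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds, rng):
    """Mean k-NN accuracy over n_folds folds of a shuffled dataset."""
    indices = list(range(len(X)))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / n_folds


# Synthetic data: two well-separated 2-D clusters standing in for two classes.
rng = random.Random(0)
X = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(50)]
X += [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(50)]
y = [0] * 50 + [1] * 50
mean_accuracy = cross_val_accuracy(X, y, k=3, n_folds=10, rng=rng)
print(f"mean 10-fold accuracy: {mean_accuracy:.3f}")
```

Each sample is held out exactly once, so the averaged accuracy is computed only on data the classifier did not see during that fold.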
Downloads 1378
6 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique
Authors: Ghada A. Alfattni
Abstract:
Analysing unbalanced datasets is one of the challenges that practitioners in the machine learning field face. Much research has been carried out to determine the effectiveness of the synthetic minority over-sampling technique (SMOTE) in addressing this issue. The aim of this study was therefore to compare the effectiveness of SMOTE across different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets; the same datasets were then oversampled using SMOTE and applied again to the three models to compare the differences in performance. The experimental results show that the highest number of nearest neighbours gives lower error rates.
Keywords: Imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour.
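The core step of SMOTE is to create a synthetic minority sample on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below illustrates that interpolation step only; the function name, the parameter values and the data points are assumptions for illustration, and production use would normally rely on an established library implementation instead.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k, rng):
    """Generate synthetic points by interpolating towards nearest minority neighbours.

    Assumes len(minority) >= 2 and points given as equal-length tuples of floats.
    """
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(xi + u * (ni - xi) for xi, ni in zip(x, neighbour)))
    return synthetic


# Made-up minority-class points in two dimensions.
minority = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.6), (1.2, 0.7)]
rng = random.Random(7)
new_points = smote_sketch(minority, n_synthetic=6, k=2, rng=rng)
print(new_points)
```

Since every synthetic point lies between two existing minority points, the new samples stay inside the region already occupied by the minority class.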
Downloads 1315
5 A Distributed Topology Control Algorithm to Conserve Energy in Heterogeneous Wireless Mesh Networks
Authors: F. O. Aron, T. O. Olwal, A. Kurien, M. O. Odhiambo
Abstract:
A considerable amount of energy is consumed during transmission and reception of messages in a wireless mesh network (WMN). Reducing per-node transmission power would greatly increase the network lifetime via power conservation, in addition to increasing the network capacity via better spatial bandwidth reuse. In this work, the problem of topology control in a hybrid WMN of heterogeneous wireless devices with varying maximum transmission ranges is considered. A localized distributed topology control algorithm is presented which calculates the optimal transmission power so that (1) network connectivity is maintained, (2) node transmission power is reduced to cover only the nearest neighbours, and (3) network lifetime is extended. Simulations and analysis of results are carried out in the NS-2 environment to demonstrate the correctness and effectiveness of the proposed algorithm.
Keywords: Topology Control, Wireless Mesh Networks, Backbone, Energy Efficiency, Localized Algorithm.
Downloads 1394
4 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine Learning and Data Mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries, especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM), using the air quality data of four cities in China, namely Beijing, Guangzhou, Shanghai and Tianjin, from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels, namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent, based on their respective Air Quality Index (AQI) values. Using information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then the SVM algorithm is applied to the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.
Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.
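Information gain for a categorical feature is the entropy of the class labels minus the label entropy that remains after splitting on the feature. The sketch below covers that categorical case only; how the paper handles continuous features is not stated in the abstract and is not shown here. The function names and the toy labels are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of labels minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: one feature separates the classes perfectly, the other not at all.
labels = ["good", "good", "polluted", "polluted"]
informative = ["low", "low", "high", "high"]
uninformative = ["x", "y", "x", "y"]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

Ranking features by this score and keeping the highest-scoring ones is one common way to perform the kind of feature selection the abstract describes.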
Downloads 948
3 Rolling Element Bearing Diagnosis by Improved Envelope Spectrum: Optimal Frequency Band Selection
Authors: Juan David Arango, Alejandro Restrepo-Martinez
Abstract:
Rolling Element Bearing (REB) vibration diagnosis is of special interest because of the variety of REBs and the widespread need for those elements in industrial applications. The presence of a localized fault in a REB gives rise to a vibrational response, characterized by the modulation of a carrier signal. The frequency content of the carrier signal (Spectral Frequency, f) is mainly related to the resonance frequencies of the REB. This carrier signal is modulated by another signal, governed by the periodicity of the fault impact (Cyclic Frequency, α). In this sense, the REB fault vibration response gives rise to a second-order cyclostationary signal. Second-order cyclostationary signals can be represented in a bi-spectral map, where the Spectral Coherence (SCoh) is plotted against f and α. The Improved Envelope Spectrum (IES) is a useful approach to REB fault diagnosis. The IES can be obtained by integrating the SCoh over a predefined bandwidth on the f axis. Approaches to selecting the f-bandwidth have recently been presented, based on the definition of a metric which evaluates the magnitude of the IES at the fault characteristic frequencies. This metric is represented in a 1/3-binary tree as a function of the frequency bandwidth and centre. Based on this binary tree, the optimal frequency band is selected. However, some advantages have been seen if the metric is changed, which tends to dictate a different optimal f-bandwidth and so improve the IES representation. This paper evaluates the behaviour of the IES under the optimization of a different metric. This metric is based on the sample correlation coefficient, detecting high peaks at the selected frequencies while penalizing high peaks in the neighbours of the selected frequencies. Preliminary results indicate an improvement in the signal-to-noise ratio (SNR) in around 86% of the samples analysed, which belong to the IMS database.
Keywords: Sample Correlation IESFOgram, cyclostationary analysis, improved envelope spectrum, IES, rolling element bearing diagnosis, spectral coherence.
Downloads 742
2 Smart Meters and In-Home Displays to Encourage Water Conservation through Behavioural Change
Authors: Julia Terlet, Thomas H. Beach, Yacine Rezgui
Abstract:
Urbanization, population growth, climate change and the current increase in water demand have made the adoption of innovative demand management strategies crucial to the water industry. Water conservation in urban areas has to be improved by encouraging consumers to adopt more sustainable habits and behaviours. This includes informing and educating them about their households’ water consumption and advising them about ways to achieve significant savings on a daily basis. This paper presents a study conducted in the context of the European FP7 WISDOM Project. By integrating innovative Information and Communication Technologies (ICT) frameworks, this project aims at achieving a change in water savings. More specifically, behavioural change will be attempted by implementing smart meters and in-home displays in a trial group of selected households within Cardiff (UK). Using this device, consumers will be able to receive feedback and information about their consumption but will also have the opportunity to compare their consumption to the consumption of other consumers and similar households. Following an initial survey, it appeared necessary to implement these in-home displays in a way that matches consumers’ motivations to save water. The results demonstrated the importance of various factors influencing people’s daily water consumption. Both the relevant literature on the subject and the results of our survey therefore led us to include within the in-home device a variety of elements. It first appeared crucial to make consumers aware of the economic aspect of water conservation and especially of the significant financial savings that can be achieved by reducing their household’s water consumption in the long term. Likewise, reminding participants of the impact of their consumption on the environment by making them more aware of water scarcity issues around the world will help increase their motivation to save water.
Additionally, peer pressure and social comparisons with neighbours and other consumers, accentuated by the use of online social networks such as Facebook or Twitter, will likely encourage consumers to reduce their consumption. Participants will also be able to compare their current consumption to their past consumption and to observe the consequences of their efforts to save water through diverse graphs and charts. Finally, including a virtual water game within the display will help the whole household, children and adults, to achieve significant reductions by providing them with simple tips and advice to save water on a daily basis. Moreover, by setting daily and weekly goals for them to reach, the game is expected to generate cooperation between family members. Members of each household will indeed be encouraged to work together to reduce their water consumption within different rooms of the house, such as the bathroom, the kitchen, or the toilets. Overall, this study will allow us to understand the elements that attract consumers the most and the features that are most commonly used by the participants. In this way, we intend to determine the main factors influencing water consumption in order to identify the measures that will most encourage water conservation in both the long and short term.
Keywords: Behavioural change, ICT technologies, water consumption, water conservation.
Downloads 1587
1 Using Statistical Significance and Prediction to Test Long/Short Term Public Services and Patients Cohorts: A Case Study in Scotland
Authors: Sotirios Raptis
Abstract:
Health and Social care (HSc) services planning and scheduling are facing unprecedented challenges due to the pandemic pressure, and also suffer from unplanned spending that is negatively impacted by the global financial crisis. Data-driven approaches can help to improve policies and to plan and design service provision schedules, using algorithms that assist healthcare managers in facing unexpected demands with fewer resources. The paper discusses services packing using statistical significance tests and machine learning (ML) to evaluate demand similarity and coupling. This is achieved by predicting the range of the demand (class) using ML methods such as Classification and Regression Trees (CART), Random Forests (RF), and Logistic Regression (LGR). The Chi-Squared and Student’s significance tests are applied to data covering a 39-year span for which records exist for services delivered in Scotland. The demands are associated using probabilities and form parts of statistical hypotheses. These hypotheses, as their NULL part, assume that the target demand is statistically dependent on other services’ demands. This linking is checked using the data. In addition, ML methods are used to linearly predict the above target demands from the statistically found associations and to extend the linear dependence of the target’s demand to independent demands, thus forming groups of services. Statistical tests confirmed the ML coupling, made the prediction statistically meaningful, and proved that a target service can be matched reliably to other services, while ML showed that such marked relationships can also be linear ones. Zero padding was used for the records of missing years and better illustrated such relationships, both for limited years and for the entire span, offering long-term data visualizations, while limited-year periods explained how well patient numbers can be related in short periods of time, or that they can change over time as opposed to behaviours across more years.
The prediction performance of the associations was measured using metrics such as Receiver Operating Characteristic (ROC), Area Under Curve (AUC) and Accuracy (ACC), as well as the Chi-Squared and Student’s statistical tests. Co-plots and comparison tables for the RF, CART, and LGR methods, as well as the p-values from the tests and Information Exchange (IE/MIE) measures, are provided, showing the relative performance of the ML methods and of the statistical tests as well as the behaviour under different learning ratios. The impact of first groupings by k-neighbours classification (k-NN), Cross-Correlation (CC) and C-Means (CM) was also studied over limited years and for the entire span. It was found that CART was generally behind RF and LGR, but in some interesting cases LGR reached an AUC = 0, falling below CART, while the ACC was as high as 0.912, showing that ML methods can be confused by zero-padding, by irregularities in the data, or by outliers. On average, 3 linear predictors were sufficient; LGR was found to compete well with RF, and CART followed with the same performance at higher learning ratios. Services were packed only when a significance level (p-value) of their association coefficient was more than 0.05. Social factor relationships were observed between home care services and treatment of old people, low birth weights, alcoholism, drug abuse, and emergency admissions. The work found that different HSc services can be well packed as plans of limited duration, across various service sectors and learning configurations, as confirmed by using statistical hypotheses.
Keywords: Class, cohorts, data frames, grouping, prediction, probabilities, services.
Downloads 461