Search results for: classifiers ensembles
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 226

Search results for: classifiers ensembles

166 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or on an early onset of developmental stuttering at childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in the speech corpus encompassing bilingual spontaneous utterances from school going children – English and Tamil. Two classifiers including Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment CLI). The ability of the models in classifying the disfluencies was measured in terms of F-measure, Recall, and Precision.

Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies

Procedia PDF Downloads 217
165 Multi-Objective Evolutionary Computation Based Feature Selection Applied to Behaviour Assessment of Children

Authors: F. Jiménez, R. Jódar, M. Martín, G. Sánchez, G. Sciavicco

Abstract:

Abstract—Attribute or feature selection is one of the basic strategies to improve the performances of data classification tasks, and, at the same time, to reduce the complexity of classifiers, and it is a particularly fundamental one when the number of attributes is relatively high. Its application to unsupervised classification is restricted to a limited number of experiments in the literature. Evolutionary computation has already proven itself to be a very effective choice to consistently reduce the number of attributes towards a better classification rate and a simpler semantic interpretation of the inferred classifiers. We present a feature selection wrapper model composed by a multi-objective evolutionary algorithm, the clustering method Expectation-Maximization (EM), and the classifier C4.5 for the unsupervised classification of data extracted from a psychological test named BASC-II (Behavior Assessment System for Children - II ed.) with two objectives: Maximizing the likelihood of the clustering model and maximizing the accuracy of the obtained classifier. We present a methodology to integrate feature selection for unsupervised classification, model evaluation, decision making (to choose the most satisfactory model according to a a posteriori process in a multi-objective context), and testing. We compare the performance of the classifier obtained by the multi-objective evolutionary algorithms ENORA and NSGA-II, and the best solution is then validated by the psychologists that collected the data.

Keywords: evolutionary computation, feature selection, classification, clustering

Procedia PDF Downloads 372
164 A Psychophysiological Evaluation of an Effective Recognition Technique Using Interactive Dynamic Virtual Environments

Authors: Mohammadhossein Moghimi, Robert Stone, Pia Rotshtein

Abstract:

Recording psychological and physiological correlates of human performance within virtual environments and interpreting their impacts on human engagement, ‘immersion’ and related emotional or ‘effective’ states is both academically and technologically challenging. By exposing participants to an effective, real-time (game-like) virtual environment, designed and evaluated in an earlier study, a psychophysiological database containing the EEG, GSR and Heart Rate of 30 male and female gamers, exposed to 10 games, was constructed. Some 174 features were subsequently identified and extracted from a number of windows, with 28 different timing lengths (e.g. 2, 3, 5, etc. seconds). After reducing the number of features to 30, using a feature selection technique, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) methods were subsequently employed for the classification process. The classifiers categorised the psychophysiological database into four effective clusters (defined based on a 3-dimensional space – valence, arousal and dominance) and eight emotion labels (relaxed, content, happy, excited, angry, afraid, sad, and bored). The KNN and SVM classifiers achieved average cross-validation accuracies of 97.01% (±1.3%) and 92.84% (±3.67%), respectively. However, no significant differences were found in the classification process based on effective clusters or emotion labels.

Keywords: virtual reality, effective computing, effective VR, emotion-based effective physiological database

Procedia PDF Downloads 234
163 Classification of Potential Biomarkers in Breast Cancer Using Artificial Intelligence Algorithms and Anthropometric Datasets

Authors: Aref Aasi, Sahar Ebrahimi Bajgani, Erfan Aasi

Abstract:

Breast cancer (BC) continues to be the most frequent cancer in females and causes the highest number of cancer-related deaths in women worldwide. Inspired by recent advances in studying the relationship between different patient attributes and features and the disease, in this paper, we have tried to investigate the different classification methods for better diagnosis of BC in the early stages. In this regard, datasets from the University Hospital Centre of Coimbra were chosen, and different machine learning (ML)-based and neural network (NN) classifiers have been studied. For this purpose, we have selected favorable features among the nine provided attributes from the clinical dataset by using a random forest algorithm. This dataset consists of both healthy controls and BC patients, and it was noted that glucose, BMI, resistin, and age have the most importance, respectively. Moreover, we have analyzed these features with various ML-based classifier methods, including Decision Tree (DT), K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Naive Bayes (NB), and Support Vector Machine (SVM) along with NN-based Multi-Layer Perceptron (MLP) classifier. The results revealed that among different techniques, the SVM and MLP classifiers have the most accuracy, with amounts of 96% and 92%, respectively. These results divulged that the adopted procedure could be used effectively for the classification of cancer cells, and also it encourages further experimental investigations with more collected data for other types of cancers.

Keywords: breast cancer, diagnosis, machine learning, biomarker classification, neural network

Procedia PDF Downloads 139
162 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 130
161 Detection of Powdery Mildew Disease in Strawberry Using Image Texture and Supervised Classifiers

Authors: Sultan Mahmud, Qamar Zaman, Travis Esau, Young Chang

Abstract:

Strawberry powdery mildew (PM) is a serious disease that has a significant impact on strawberry production. Field scouting is still a major way to find PM disease, which is not only labor intensive but also almost impossible to monitor disease severity. To reduce the loss caused by PM disease and achieve faster automatic detection of the disease, this paper proposes an approach for detection of the disease, based on image texture and classified with support vector machines (SVMs) and k-nearest neighbors (kNNs). The methodology of the proposed study is based on image processing which is composed of five main steps including image acquisition, pre-processing, segmentation, features extraction and classification. Two strawberry fields were used in this study. Images of healthy leaves and leaves infected with PM (Sphaerotheca macularis) disease under artificial cloud lighting condition. Colour thresholding was utilized to segment all images before textural analysis. Colour co-occurrence matrix (CCM) was introduced for extraction of textural features. Forty textural features, related to a physiological parameter of leaves were extracted from CCM of National television system committee (NTSC) luminance, hue, saturation and intensity (HSI) images. The normalized feature data were utilized for training and validation, respectively, using developed classifiers. The classifiers have experimented with internal, external and cross-validations. The best classifier was selected based on their performance and accuracy. Experimental results suggested that SVMs classifier showed 98.33%, 85.33%, 87.33%, 93.33% and 95.0% of accuracy on internal, external-I, external-II, 4-fold cross and 5-fold cross-validation, respectively. Whereas, kNNs results represented 90.0%, 72.00%, 74.66%, 89.33% and 90.3% of classification accuracy, respectively. The outcome of this study demonstrated that SVMs classified PM disease with a highest overall accuracy of 91.86% and 1.1211 seconds of processing time. Therefore, overall results concluded that the proposed study can significantly support an accurate and automatic identification and recognition of strawberry PM disease with SVMs classifier.

Keywords: powdery mildew, image processing, textural analysis, color co-occurrence matrix, support vector machines, k-nearest neighbors

Procedia PDF Downloads 122
160 Artificial Intelligence Assisted Sentiment Analysis of Hotel Reviews Using Topic Modeling

Authors: Sushma Ghogale

Abstract:

With a surge in user-generated content or feedback or reviews on the internet, it has become possible and important to know consumers' opinions about products and services. This data is important for both potential customers and businesses providing the services. Data from social media is attracting significant attention and has become the most prominent channel of expressing an unregulated opinion. Prospective customers look for reviews from experienced customers before deciding to buy a product or service. Several websites provide a platform for users to post their feedback for the provider and potential customers. However, the biggest challenge in analyzing such data is in extracting latent features and providing term-level analysis of the data. This paper proposes an approach to use topic modeling to classify the reviews into topics and conduct sentiment analysis to mine the opinions. This approach can analyse and classify latent topics mentioned by reviewers on business sites or review sites, or social media using topic modeling to identify the importance of each topic. It is followed by sentiment analysis to assess the satisfaction level of each topic. This approach provides a classification of hotel reviews using multiple machine learning techniques and comparing different classifiers to mine the opinions of user reviews through sentiment analysis. This experiment concludes that Multinomial Naïve Bayes classifier produces higher accuracy than other classifiers.

Keywords: latent Dirichlet allocation, topic modeling, text classification, sentiment analysis

Procedia PDF Downloads 97
159 Voice Liveness Detection Using Kolmogorov Arnold Networks

Authors: Arth J. Shah, Madhu R. Kamble

Abstract:

Voice biometric liveness detection is customized to certify an authentication process of the voice data presented is genuine and not a recording or synthetic voice. With the rise of deepfakes and other equivalently sophisticated spoofing generation techniques, it’s becoming challenging to ensure that the person on the other end is a live speaker or not. Voice Liveness Detection (VLD) system is a group of security measures which detect and prevent voice spoofing attacks. Motivated by the recent development of the Kolmogorov-Arnold Network (KAN) based on the Kolmogorov-Arnold theorem, we proposed KAN for the VLD task. To date, multilayer perceptron (MLP) based classifiers have been used for the classification tasks. We aim to capture not only the compositional structure of the model but also to optimize the values of univariate functions. This study explains the mathematical as well as experimental analysis of KAN for VLD tasks, thereby opening a new perspective for scientists to work on speech and signal processing-based tasks. This study emerges as a combination of traditional signal processing tasks and new deep learning models, which further proved to be a better combination for VLD tasks. The experiments are performed on the POCO and ASVSpoof 2017 V2 database. We used Constant Q-transform, Mel, and short-time Fourier transform (STFT) based front-end features and used CNN, BiLSTM, and KAN as back-end classifiers. The best accuracy is 91.26 % on the POCO database using STFT features with the KAN classifier. In the ASVSpoof 2017 V2 database, the lowest EER we obtained was 26.42 %, using CQT features and KAN as a classifier.

Keywords: Kolmogorov Arnold networks, multilayer perceptron, pop noise, voice liveness detection

Procedia PDF Downloads 44
158 Predictive Analysis of Chest X-rays Using NLP and Large Language Models with the Indiana University Dataset and Random Forest Classifier

Authors: Azita Ramezani, Ghazal Mashhadiagha, Bahareh Sanabakhsh

Abstract:

This study researches the combination of Random. Forest classifiers with large language models (LLMs) and natural language processing (NLP) to improve diagnostic accuracy in chest X-ray analysis using the Indiana University dataset. Utilizing advanced NLP techniques, the research preprocesses textual data from radiological reports to extract key features, which are then merged with image-derived data. This improved dataset is analyzed with Random Forest classifiers to predict specific clinical results, focusing on the identification of health issues and the estimation of case urgency. The findings reveal that the combination of NLP, LLMs, and machine learning not only increases diagnostic precision but also reliability, especially in quickly identifying critical conditions. Achieving an accuracy of 99.35%, the model shows significant advancements over conventional diagnostic techniques. The results emphasize the large potential of machine learning in medical imaging, suggesting that these technologies could greatly enhance clinician judgment and patient outcomes by offering quicker and more precise diagnostic approximations.

Keywords: natural language processing (NLP), large language models (LLMs), random forest classifier, chest x-ray analysis, medical imaging, diagnostic accuracy, indiana university dataset, machine learning in healthcare, predictive modeling, clinical decision support systems

Procedia PDF Downloads 47
157 Classification Using Worldview-2 Imagery of Giant Panda Habitat in Wolong, Sichuan Province, China

Authors: Yunwei Tang, Linhai Jing, Hui Li, Qingjie Liu, Xiuxia Li, Qi Yan, Haifeng Ding

Abstract:

The giant panda (Ailuropoda melanoleuca) is an endangered species, mainly live in central China, where bamboos act as the main food source of wild giant pandas. Knowledge of spatial distribution of bamboos therefore becomes important for identifying the habitat of giant pandas. There have been ongoing studies for mapping bamboos and other tree species using remote sensing. WorldView-2 (WV-2) is the first high resolution commercial satellite with eight Multi-Spectral (MS) bands. Recent studies demonstrated that WV-2 imagery has a high potential in classification of tree species. The advanced classification techniques are important for utilising high spatial resolution imagery. It is generally agreed that object-based image analysis is a more desirable method than pixel-based analysis in processing high spatial resolution remotely sensed data. Classifiers that use spatial information combined with spectral information are known as contextual classifiers. It is suggested that contextual classifiers can achieve greater accuracy than non-contextual classifiers. Thus, spatial correlation can be incorporated into classifiers to improve classification results. The study area is located at Wuyipeng area in Wolong, Sichuan Province. The complex environment makes it difficult for information extraction since bamboos are sparsely distributed, mixed with brushes, and covered by other trees. Extensive fieldworks in Wuyingpeng were carried out twice. The first one was on 11th June, 2014, aiming at sampling feature locations for geometric correction and collecting training samples for classification. The second fieldwork was on 11th September, 2014, for the purposes of testing the classification results. In this study, spectral separability analysis was first performed to select appropriate MS bands for classification. Also, the reflectance analysis provided information for expanding sample points under the circumstance of knowing only a few. Then, a spatially weighted object-based k-nearest neighbour (k-NN) classifier was applied to the selected MS bands to identify seven land cover types (bamboo, conifer, broadleaf, mixed forest, brush, bare land, and shadow), accounting for spatial correlation within classes using geostatistical modelling. The spatially weighted k-NN method was compared with three alternatives: the traditional k-NN classifier, the Support Vector Machine (SVM) method and the Classification and Regression Tree (CART). Through field validation, it was proved that the classification result obtained using the spatially weighted k-NN method has the highest overall classification accuracy (77.61%) and Kappa coefficient (0.729); the producer’s accuracy and user’s accuracy achieve 81.25% and 95.12% for the bamboo class, respectively, also higher than the other methods. Photos of tree crowns were taken at sample locations using a fisheye camera, so the canopy density could be estimated. It is found that it is difficult to identify bamboo in the areas with a large canopy density (over 0.70); it is possible to extract bamboos in the areas with a median canopy density (from 0.2 to 0.7) and in a sparse forest (canopy density is less than 0.2). In summary, this study explores the ability of WV-2 imagery for bamboo extraction in a mountainous region in Sichuan. The study successfully identified the bamboo distribution, providing supporting knowledge for assessing the habitats of giant pandas.

Keywords: bamboo mapping, classification, geostatistics, k-NN, worldview-2

Procedia PDF Downloads 313
156 Performance Study of Classification Algorithms for Consumer Online Shopping Attitudes and Behavior Using Data Mining

Authors: Rana Alaa El-Deen Ahmed, M. Elemam Shehab, Shereen Morsy, Nermeen Mekawie

Abstract:

With the growing popularity and acceptance of e-commerce platforms, users face an ever increasing burden in actually choosing the right product from the large number of online offers. Thus, techniques for personalization and shopping guides are needed by users. For a pleasant and successful shopping experience, users need to know easily which products to buy with high confidence. Since selling a wide variety of products has become easier due to the popularity of online stores, online retailers are able to sell more products than a physical store. The disadvantage is that the customers might not find products they need. In this research the customer will be able to find the products he is searching for, because recommender systems are used in some ecommerce web sites. Recommender system learns from the information about customers and products and provides appropriate personalized recommendations to customers to find the needed product. In this paper eleven classification algorithms are comparatively tested to find the best classifier fit for consumer online shopping attitudes and behavior in the experimented dataset. The WEKA knowledge analysis tool, which is an open source data mining workbench software used in comparing conventional classifiers to get the best classifier was used in this research. In this research by using the data mining tool (WEKA) with the experimented classifiers the results show that decision table and filtered classifier gives the highest accuracy and the lowest accuracy classification via clustering and simple cart.

Keywords: classification, data mining, machine learning, online shopping, WEKA

Procedia PDF Downloads 352
155 The Major Challenge of the Health System Health Management Services in Kosovo and Impact on Satisfaction

Authors: Nevruz Zogu, Shpetim Rezniqi

Abstract:

In the framework of transformational economic development social pluralism and the free, market health systems operating in the countries of our region are naturally involved in a process of profound change and reform. Health systems actually represent complex ensembles centers and public and private institutions (domestic and foreign), who administer substantial amounts of human, technological, material, financial, information and scientific facts • The goal of health systems is much more than medical care. It includes the promotion, protection, treatment and rehabilitation of health of the population. • Meeting the needs of increasingly diverse broader health services efficient, secure the quality and affordability of their increasing cost of unstoppable, requires the necessary reform of health systems and implementing policies and new management methods, to ensure effectiveness and health benefits as higher population.

Keywords: health, management, economy, finance

Procedia PDF Downloads 425
154 Musical Instrument Recognition in Polyphonic Audio Through Convolutional Neural Networks and Spectrograms

Authors: Rujia Chen, Akbar Ghobakhlou, Ajit Narayanan

Abstract:

This study investigates the task of identifying musical instruments in polyphonic compositions using Convolutional Neural Networks (CNNs) from spectrogram inputs, focusing on binary classification. The model showed promising results, with an accuracy of 97% on solo instrument recognition. When applied to polyphonic combinations of 1 to 10 instruments, the overall accuracy was 64%, reflecting the increasing challenge with larger ensembles. These findings contribute to the field of Music Information Retrieval (MIR) by highlighting the potential and limitations of current approaches in handling complex musical arrangements. Future work aims to include a broader range of musical sounds, including electronic and synthetic sounds, to improve the model's robustness and applicability in real-time MIR systems.

Keywords: binary classifier, CNN, spectrogram, instrument

Procedia PDF Downloads 85
153 Machine Learning Classification of Fused Sentinel-1 and Sentinel-2 Image Data Towards Mapping Fruit Plantations in Highly Heterogenous Landscapes

Authors: Yingisani Chabalala, Elhadi Adam, Khalid Adem Ali

Abstract:

Mapping smallholder fruit plantations using optical data is challenging due to morphological landscape heterogeneity and crop types having overlapped spectral signatures. Furthermore, cloud covers limit the use of optical sensing, especially in subtropical climates where they are persistent. This research assessed the effectiveness of Sentinel-1 (S1) and Sentinel-2 (S2) data for mapping fruit trees and co-existing land-use types by using support vector machine (SVM) and random forest (RF) classifiers independently. These classifiers were also applied to fused data from the two sensors. Feature ranks were extracted using the RF mean decrease accuracy (MDA) and forward variable selection (FVS) to identify optimal spectral windows to classify fruit trees. Based on RF MDA and FVS, the SVM classifier resulted in relatively high classification accuracy with overall accuracy (OA) = 0.91.6% and kappa coefficient = 0.91% when applied to the fused satellite data. Application of SVM to S1, S2, S2 selected variables and S1S2 fusion independently produced OA = 27.64, Kappa coefficient = 0.13%; OA= 87%, Kappa coefficient = 86.89%; OA = 69.33, Kappa coefficient = 69. %; OA = 87.01%, Kappa coefficient = 87%, respectively. Results also indicated that the optimal spectral bands for fruit tree mapping are green (B3) and SWIR_2 (B10) for S2, whereas for S1, the vertical-horizontal (VH) polarization band. Including the textural metrics from the VV channel improved crop discrimination and co-existing land use cover types. The fusion approach proved robust and well-suited for accurate smallholder fruit plantation mapping.

Keywords: smallholder agriculture, fruit trees, data fusion, precision agriculture

Procedia PDF Downloads 56
152 Evaluation of Machine Learning Algorithms and Ensemble Methods for Prediction of Students’ Graduation

Authors: Soha A. Bahanshal, Vaibhav Verdhan, Bayong Kim

Abstract:

Graduation rates at six-year colleges are becoming a more essential indicator for incoming fresh students and for university rankings. Predicting student graduation is extremely beneficial to schools and has a huge potential for targeted intervention. It is important for educational institutions since it enables the development of strategic plans that will assist or improve students' performance in achieving their degrees on time (GOT). A first step and a helping hand in extracting useful information from these data and gaining insights into the prediction of students' progress and performance is offered by machine learning techniques. Data analysis and visualization techniques are applied to understand and interpret the data. The data used for the analysis contains students who have graduated in 6 years in the academic year 2017-2018 for science majors. This analysis can be used to predict the graduation of students in the next academic year. Different Predictive modelings such as logistic regression, decision trees, support vector machines, Random Forest, Naïve Bayes, and KNeighborsClassifier are applied to predict whether a student will graduate. These classifiers were evaluated with k folds of 5. The performance of these classifiers was compared based on accuracy measurement. The results indicated that Ensemble Classifier achieves better accuracy, about 91.12%. This GOT prediction model would hopefully be useful to university administration and academics in developing measures for assisting and boosting students' academic performance and ensuring they graduate on time.

Keywords: prediction, decision trees, machine learning, support vector machine, ensemble model, student graduation, GOT graduate on time

Procedia PDF Downloads 73
151 Comparison of Deep Learning and Machine Learning Algorithms to Diagnose and Predict Breast Cancer

Authors: F. Ghazalnaz Sharifonnasabi, Iman Makhdoom

Abstract:

Breast cancer is a serious health concern that affects many people around the world. According to a study published in the Breast journal, the global burden of breast cancer is expected to increase significantly over the next few decades. The number of deaths from breast cancer has been increasing over the years, but the age-standardized mortality rate has decreased in some countries. It’s important to be aware of the risk factors for breast cancer and to get regular check- ups to catch it early if it does occur. Machin learning techniques have been used to aid in the early detection and diagnosis of breast cancer. These techniques, that have been shown to be effective in predicting and diagnosing the disease, have become a research hotspot. In this study, we consider two deep learning approaches including: Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN). We also considered the five-machine learning algorithm titled: Decision Tree (C4.5), Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) Algorithm and XGBoost (eXtreme Gradient Boosting) on the Breast Cancer Wisconsin Diagnostic dataset. We have carried out the process of evaluating and comparing classifiers involving selecting appropriate metrics to evaluate classifier performance and selecting an appropriate tool to quantify this performance. The main purpose of the study is predicting and diagnosis breast cancer, applying the mentioned algorithms and also discovering of the most effective with respect to confusion matrix, accuracy and precision. It is realized that CNN outperformed all other classifiers and achieved the highest accuracy (0.982456). The work is implemented in the Anaconda environment based on Python programing language.

Keywords: breast cancer, multi-layer perceptron, Naïve Bayesian, SVM, decision tree, convolutional neural network, XGBoost, KNN

Procedia PDF Downloads 78
150 Evaluation of Random Forest and Support Vector Machine Classification Performance for the Prediction of Early Multiple Sclerosis from Resting State FMRI Connectivity Data

Authors: V. Saccà, A. Sarica, F. Novellino, S. Barone, T. Tallarico, E. Filippelli, A. Granata, P. Valentino, A. Quattrone

Abstract:

The work aim was to evaluate how well Random Forest (RF) and Support Vector Machine (SVM) algorithms could support the early diagnosis of Multiple Sclerosis (MS) from resting-state functional connectivity data. In particular, we wanted to explore the ability in distinguishing between controls and patients of mean signals extracted from ICA components corresponding to 15 well-known networks. Eighteen patients with early-MS (mean-age 37.42±8.11, 9 females) were recruited according to McDonald and Polman, and matched for demographic variables with 19 healthy controls (mean-age 37.55±14.76, 10 females). MRI was acquired by a 3T scanner with 8-channel head coil: (a)whole-brain T1-weighted; (b)conventional T2-weighted; (c)resting-state functional MRI (rsFMRI), 200 volumes. Estimated total lesion load (ml) and number of lesions were calculated using LST-toolbox from the corrected T1 and FLAIR. All rsFMRIs were pre-processed using tools from the FMRIB's Software Library as follows: (1) discarding of the first 5 volumes to remove T1 equilibrium effects, (2) skull-stripping of images, (3) motion and slice-time correction, (4) denoising with high-pass temporal filter (128s), (5) spatial smoothing with a Gaussian kernel of FWHM 8mm. No statistical significant differences (t-test, p < 0.05) were found between the two groups in the mean Euclidian distance and the mean Euler angle. WM and CSF signal together with 6 motion parameters were regressed out from the time series. We applied an independent component analysis (ICA) with the GIFT-toolbox using the Infomax approach with number of components=21. Fifteen mean components were visually identified by two experts. The resulting z-score maps were thresholded and binarized to extract the mean signal of the 15 networks for each subject. Statistical and machine learning analysis were then conducted on this dataset composed of 37 rows (subjects) and 15 features (mean signal in the network) with R language. The dataset was randomly splitted into training (75%) and test sets and two different classifiers were trained: RF and RBF-SVM. We used the intrinsic feature selection of RF, based on the Gini index, and recursive feature elimination (rfe) for the SVM, to obtain a rank of the most predictive variables. Thus, we built two new classifiers only on the most important features and we evaluated the accuracies (with and without feature selection) on test-set. The classifiers, trained on all the features, showed very poor accuracies on training (RF:58.62%, SVM:65.52%) and test sets (RF:62.5%, SVM:50%). Interestingly, when feature selection by RF and rfe-SVM were performed, the most important variable was the sensori-motor network I in both cases. Indeed, with only this network, RF and SVM classifiers reached an accuracy of 87.5% on test-set. More interestingly, the only misclassified patient resulted to have the lowest value of lesion volume. We showed that, with two different classification algorithms and feature selection approaches, the best discriminant network between controls and early MS, was the sensori-motor I. Similar importance values were obtained for the sensori-motor II, cerebellum and working memory networks. These findings, in according to the early manifestation of motor/sensorial deficits in MS, could represent an encouraging step toward the translation to the clinical diagnosis and prognosis.

Keywords: feature selection, machine learning, multiple sclerosis, random forest, support vector machine

Procedia PDF Downloads 241
149 Differences in the Level of Self-Efficacy and Intensity of Narcissism among Band and Solo Musicians

Authors: Weronika Molińska, Joanna Rajchert

Abstract:

A musical career is not only about the quality of performing or playing music. Musicians can choose from a variety of specializations and career paths. The described study focused on psychological traits which relate to a solo career (performing individually or as a leader) or performing as part of a chamber ensemble, ensemble, choir, or orchestra. The hypothesis predicted that narcissism and self-efficacy would be higher in musicians performing solo. The study involved 124 professional musicians: instrumentalists and soloists, singers (n = 59), and ensemble instrumentalists and singers (n = 65). The results confirmed the hypothesis and showed that soloists were higher on self-efficacy and narcissism. In particular, soloists were higher on leader characteristics, demand for admiration, and vanity than musicians performing in ensembles. The result of these studies is a good introduction to a broader project answering the questions of what can increase or decrease the musician's sense of self-efficacy and whether the decreased self-efficacy could induce musicians to give up their solo careers.

Keywords: self-efficacy, musicians, musical profession, narcissism, soloists

Procedia PDF Downloads 66
148 Stationary Energy Partition between Waves in a Carbyne Chain

Authors: Svetlana Nikitenkova, Dmitry Kovriguine

Abstract:

Stationary energy partition between waves in a one dimensional carbyne chain at ambient temperatures is investigated. The study is carried out by standard asymptotic methods of nonlinear dynamics in the framework of classical mechanics, based on a simple mathematical model, taking into account central and noncentral interactions between carbon atoms. Within the first-order nonlinear approximation analysis, triple-mode resonant ensembles of quasi-harmonic waves are revealed. Any resonant triad consists of a single primary high-frequency longitudinal mode and a pair of secondary low-frequency transverse modes of oscillations. In general, the motion of the carbyne chain is described by a superposition of resonant triads of various spectral scales. It is found that the stationary energy distribution is obeyed to the classical Rayleigh–Jeans law, at the expense of the proportional amplitude dispersion, except a shift in the frequency band, upwards the spectrum.

Keywords: resonant triplet, Rayleigh–Jeans law, amplitude dispersion, carbyne

Procedia PDF Downloads 444
147 A Multi-Output Network with U-Net Enhanced Class Activation Map and Robust Classification Performance for Medical Imaging Analysis

Authors: Jaiden Xuan Schraut, Leon Liu, Yiqiao Yin

Abstract:

Computer vision in medical diagnosis has achieved a high level of success in diagnosing diseases with high accuracy. However, conventional classifiers that produce an image to-label result provides insufficient information for medical professionals to judge and raise concerns over the trust and reliability of a model with results that cannot be explained. In order to gain local insight into cancerous regions, separate tasks such as imaging segmentation need to be implemented to aid the doctors in treating patients, which doubles the training time and costs which renders the diagnosis system inefficient and difficult to be accepted by the public. To tackle this issue and drive AI-first medical solutions further, this paper proposes a multi-output network that follows a U-Net architecture for image segmentation output and features an additional convolutional neural networks (CNN) module for auxiliary classification output. Class activation maps are a method of providing insight into a convolutional neural network’s feature maps that leads to its classification but in the case of lung diseases, the region of interest is enhanced by U-net-assisted Class Activation Map (CAM) visualization. Therefore, our proposed model combines image segmentation models and classifiers to crop out only the lung region of a chest X-ray’s class activation map to provide a visualization that improves the explainability and is able to generate classification results simultaneously which builds trust for AI-led diagnosis systems. The proposed U-Net model achieves 97.61% accuracy and a dice coefficient of 0.97 on testing data from the COVID-QU-Ex Dataset which includes both diseased and healthy lungs.

Keywords: multi-output network model, U-net, class activation map, image classification, medical imaging analysis

Procedia PDF Downloads 205
146 Influence of Dryer Autumn Conditions on Weed Control Based on Soil Active Herbicides

Authors: Juergen Junk, Franz Ronellenfitsch, Michael Eickermann

Abstract:

An appropriate weed management in autumn is a prerequisite for an economically successful harvest in the following year. In Luxembourg oilseed rape, wheat and barley is sown from August until October, accompanied by a chemical weed control with soil active herbicides, depending on the state of the weeds and the meteorological conditions. Based on regular ground and surface water-analysis, high levels of contamination by transformation products of respective herbicide compounds have been found in Luxembourg. The most ideal conditions for incorporating soil active herbicides are single rain events. Weed control may be reduced if application is made when weeds are under drought stress or if repeated light rain events followed by dry spells, because the herbicides tend to bind tightly to the soil particles. These effects have been frequently reported for Luxembourg throughout the last years. In the framework of a multisite long-term field experiment (EFFO) weed monitoring, plants observations and corresponding meteorological measurements were conducted. Long-term time series (1947-2016) from the SYNOP station Findel-Airport (WMO ID = 06590) showed a decrease in the number of days with precipitation. As the total precipitation amount has not significantly changed, this indicates a trend towards rain events with higher intensity. All analyses are based on decades (10-day periods) for September and October of each individual year. To assess the future meteorological conditions for Luxembourg, two different approaches were applied. First, multi-model ensembles from the CORDEX experiments (spatial resolution ~12.5 km; transient projections until 2100) were analysed for two different Representative Concentration Pathways (RCP8.5 and RCP4.5), covering the time span from 2005 until 2100. The multi-model ensemble approach allows for the quantification of the uncertainties and also to assess the differences between the two emission scenarios. Second, to assess smaller scale differences within the country a high resolution model projection using the COSMO-LM model was used (spatial resolution 1.3 km). To account for the higher computational demands, caused by the increased spatial resolution, only 10-year time slices have been simulated (reference period 1991-2000; near future 2041-2050 and far future 2091-2100). Statistically significant trends towards higher air temperatures, +1.6 K for September (+5.3 K far future) and +1.3 K for October (+4.3 K), were predicted for the near future compared to the reference period. Precipitation simultaneously decreased by 9.4 mm (September) and 5.0 mm (October) for the near future and -49 mm (September) and -10 mm (October) in the far future. Beside the monthly values also decades were analyzed for the two future time periods of the CLM model. For all decades of September and October the number of days with precipitation decreased for the projected near and far future. Changes in meteorological variables such as air temperature and precipitation did already induce transformations in weed societies (composition, late-emerging etc.) of arable ecosystems in Europe. Therefore, adaptations of agronomic practices as well as effective weed control strategies must be developed to maintain crop yield.

Keywords: CORDEX projections, dry spells, ensembles, weed management

Procedia PDF Downloads 235
145 Comparing SVM and Naïve Bayes Classifier for Automatic Microaneurysm Detections

Authors: A. Sopharak, B. Uyyanonvara, S. Barman

Abstract:

Diabetic retinopathy is characterized by the development of retinal microaneurysms. The damage can be prevented if disease is treated in its early stages. In this paper, we are comparing Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers for automatic microaneurysm detection in images acquired through non-dilated pupils. The Nearest Neighbor classifier is used as a baseline for comparison. Detected microaneurysms are validated with expert ophthalmologists’ hand-drawn ground-truths. The sensitivity, specificity, precision and accuracy of each method are also compared.

Keywords: diabetic retinopathy, microaneurysm, naive Bayes classifier, SVM classifier

Procedia PDF Downloads 329
144 The Optimization of Decision Rules in Multimodal Decision-Level Fusion Scheme

Authors: Andrey V. Timofeev, Dmitry V. Egorov

Abstract:

This paper introduces an original method of parametric optimization of the structure for multimodal decision-level fusion scheme which combines the results of the partial solution of the classification task obtained from assembly of the mono-modal classifiers. As a result, a multimodal fusion classifier which has the minimum value of the total error rate has been obtained.

Keywords: classification accuracy, fusion solution, total error rate, multimodal fusion classifier

Procedia PDF Downloads 467
143 Active Features Determination: A Unified Framework

Authors: Meenal Badki

Abstract:

We address the issue of active feature determination, where the objective is to determine the set of examples on which additional data (such as lab tests) needs to be gathered, given a large number of examples with some features (such as demographics) and some examples with all the features (such as the complete Electronic Health Record). We note that certain features may be more costly, unique, or laborious to gather. Our proposal is a general active learning approach that is independent of classifiers and similarity metrics. It allows us to identify examples that differ from the full data set and obtain all the features for the examples that match. Our comprehensive evaluation shows the efficacy of this approach, which is driven by four authentic clinical tasks.

Keywords: feature determination, classification, active learning, sample-efficiency

Procedia PDF Downloads 77
142 A Computer-Aided System for Tooth Shade Matching

Authors: Zuhal Kurt, Meral Kurt, Bilge T. Bal, Kemal Ozkan

Abstract:

Shade matching and reproduction is the most important element of success in prosthetic dentistry. Until recently, shade matching procedure was implemented by dentists visual perception with the help of shade guides. Since many factors influence visual perception; tooth shade matching using visual devices (shade guides) is highly subjective and inconsistent. Subjective nature of this process has lead to the development of instrumental devices. Nowadays, colorimeters, spectrophotometers, spectroradiometers and digital image analysing systems are used for instrumental shade selection. Instrumental devices have advantages that readings are quantifiable, can obtain more rapidly and simply, objectively and precisely. However, these devices have noticeable drawbacks. For example, translucent structure and irregular surfaces of teeth lead to defects on measurement with these devices. Also between the results acquired by devices with different measurement principles may make inconsistencies. So, its obligatory to search for new methods for dental shade matching process. A computer-aided system device; digital camera has developed rapidly upon today. Currently, advances in image processing and computing have resulted in the extensive use of digital cameras for color imaging. This procedure has a much cheaper process than the use of traditional contact-type color measurement devices. Digital cameras can be taken by the place of contact-type instruments for shade selection and overcome their disadvantages. Images taken from teeth show morphology and color texture of teeth. In last decades, a new method was recommended to compare the color of shade tabs taken by a digital camera using color features. This method showed that visual and computer-aided shade matching systems should be used as concatenated. Recently using methods of feature extraction techniques are based on shape description and not used color information. However, color is mostly experienced as an essential property in depicting and extracting features from objects in the world around us. When local feature descriptors with color information are extended by concatenating color descriptor with the shape descriptor, that descriptor will be effective on visual object recognition and classification task. Therefore, the color descriptor is to be used in combination with a shape descriptor it does not need to contain any spatial information, which leads us to use local histograms. This local color histogram method is remain reliable under variation of photometric changes, geometrical changes and variation of image quality. So, coloring local feature extraction methods are used to extract features, and also the Scale Invariant Feature Transform (SIFT) descriptor used to for shape description in the proposed method. After the combination of these descriptors, the state-of-art descriptor named by Color-SIFT will be used in this study. Finally, the image feature vectors obtained from quantization algorithm are fed to classifiers such as Nearest Neighbor (KNN), Naive Bayes or Support Vector Machines (SVM) to determine label(s) of the visual object category or matching. In this study, SVM are used as classifiers for color determination and shade matching. Finally, experimental results of this method will be compared with other recent studies. It is concluded from the study that the proposed method is remarkable development on computer aided tooth shade determination system.

Keywords: classifiers, color determination, computer-aided system, tooth shade matching, feature extraction

Procedia PDF Downloads 448
141 Using Flow Line Modelling, Remote Sensing for Reconstructing Glacier Volume Loss Model for Athabasca Glacier, Canadian Rockies

Authors: Rituparna Nath, Shawn J. Marshall

Abstract:

Glaciers are one of the main sensitive climatic indicators, as they respond strongly to small climatic shifts. We develop a flow line model of glacier dynamics to simulate the past and future extent of glaciers in the Canadian Rocky Mountains, with the aim of coupling this model within larger scale regional climate models of glacier response to climate change. This paper will focus on glacier-climate modeling and reconstructions of glacier volume from the Little Ice Age (LIA) to present for Athabasca Glacier, Alberta, Canada. Glacier thickness, volume and mass change will be constructed using flow line modelling and examination of different climate scenarios that are able to give good reconstructions of LIA ice extent. With the availability of SPOT 5 imagery, Digital elevation models and GIS Arc Hydro tool, ice catchment properties-glacier width and LIA moraines have been extracted using automated procedures. Simulation of glacier mass change will inform estimates of meltwater run off over the historical period and model calibration from the LIA reconstruction will aid in future projections of the effects of climate change on glacier recession. Furthermore, the model developed will be effective for further future studies with ensembles of glaciers.

Keywords: flow line modeling, Athabasca Glacier, glacier mass balance, Remote Sensing, Arc hydro tool, little ice age

Procedia PDF Downloads 269
140 Ribotaxa: Combined Approaches for Taxonomic Resolution Down to the Species Level from Metagenomics Data Revealing Novelties

Authors: Oshma Chakoory, Sophie Comtet-Marre, Pierre Peyret

Abstract:

Metagenomic classifiers are widely used for the taxonomic profiling of metagenomic data and estimation of taxa relative abundance. Small subunit rRNA genes are nowadays a gold standard for the phylogenetic resolution of complex microbial communities, although the power of this marker comes down to its use as full-length. We benchmarked the performance and accuracy of rRNA-specialized versus general-purpose read mappers, reference-targeted assemblers and taxonomic classifiers. We then built a pipeline called RiboTaxa to generate a highly sensitive and specific metataxonomic approach. Using metagenomics data, RiboTaxa gave the best results compared to other tools (Kraken2, Centrifuge (1), METAXA2 (2), PhyloFlash (3)) with precise taxonomic identification and relative abundance description, giving no false positive detection. Using real datasets from various environments (ocean, soil, human gut) and from different approaches (metagenomics and gene capture by hybridization), RiboTaxa revealed microbial novelties not seen by current bioinformatics analysis opening new biological perspectives in human and environmental health. In a study focused on corals’ health involving 20 metagenomic samples (4), an affiliation of prokaryotes was limited to the family level with Endozoicomonadaceae characterising healthy octocoral tissue. RiboTaxa highlighted 2 species of uncultured Endozoicomonas which were dominant in the healthy tissue. Both species belonged to a genus not yet described, opening new research perspectives on corals’ health. Applied to metagenomics data from a study on human gut and extreme longevity (5), RiboTaxa detected the presence of an uncultured archaeon in semi-supercentenarians (aged 105 to 109 years) highlighting an archaeal genus, not yet described, and 3 uncultured species belonging to the Enorma genus that could be species of interest participating in the longevity process. RiboTaxa is user-friendly, rapid, allowing microbiota structure description from any environment and the results can be easily interpreted. This software is freely available at https://github.com/oschakoory/RiboTaxa under the GNU Affero General Public License 3.0.

Keywords: metagenomics profiling, microbial diversity, SSU rRNA genes, full-length phylogenetic marker

Procedia PDF Downloads 123
139 Natural Language Processing for the Classification of Social Media Posts in Post-Disaster Management

Authors: Ezgi Şendil

Abstract:

Information extracted from social media has received great attention since it has become an effective alternative for collecting people’s opinions and emotions based on specific experiences in a faster and easier way. The paper aims to put data in a meaningful way to analyze users’ posts and get a result in terms of the experiences and opinions of the users during and after natural disasters. The posts collected from Reddit are classified into nine different categories, including injured/dead people, infrastructure and utility damage, missing/found people, donation needs/offers, caution/advice, and emotional support, identified by using labelled Twitter data and four different machine learning (ML) classifiers.

Keywords: disaster, NLP, postdisaster management, sentiment analysis

Procedia PDF Downloads 75
138 Secure Multiparty Computations for Privacy Preserving Classifiers

Authors: M. Sumana, K. S. Hareesha

Abstract:

Secure computations are essential while performing privacy preserving data mining. Distributed privacy preserving data mining involve two to more sites that cannot pool in their data to a third party due to the violation of law regarding the individual. Hence in order to model the private data without compromising privacy and information loss, secure multiparty computations are used. Secure computations of product, mean, variance, dot product, sigmoid function using the additive and multiplicative homomorphic property is discussed. The computations are performed on vertically partitioned data with a single site holding the class value.

Keywords: homomorphic property, secure product, secure mean and variance, secure dot product, vertically partitioned data

Procedia PDF Downloads 412
137 Social Media Data Analysis for Personality Modelling and Learning Styles Prediction Using Educational Data Mining

Authors: Srushti Patil, Preethi Baligar, Gopalkrishna Joshi, Gururaj N. Bhadri

Abstract:

In designing learning environments, the instructional strategies can be tailored to suit the learning style of an individual to ensure effective learning. In this study, the information shared on social media like Facebook is being used to predict learning style of a learner. Previous research studies have shown that Facebook data can be used to predict user personality. Users with a particular personality exhibit an inherent pattern in their digital footprint on Facebook. The proposed work aims to correlate the user's’ personality, predicted from Facebook data to the learning styles, predicted through questionnaires. For Millennial learners, Facebook has become a primary means for information sharing and interaction with peers. Thus, it can serve as a rich bed for research and direct the design of learning environments. The authors have conducted this study in an undergraduate freshman engineering course. Data from 320 freshmen Facebook users was collected. The same users also participated in the learning style and personality prediction survey. The Kolb’s Learning style questionnaires and Big 5 personality Inventory were adopted for the survey. The users have agreed to participate in this research and have signed individual consent forms. A specific page was created on Facebook to collect user data like personal details, status updates, comments, demographic characteristics and egocentric network parameters. This data was captured by an application created using Python program. The data captured from Facebook was subjected to text analysis process using the Linguistic Inquiry and Word Count dictionary. An analysis of the data collected from the questionnaires performed reveals individual student personality and learning style. The results obtained from analysis of Facebook, learning style and personality data were then fed into an automatic classifier that was trained by using the data mining techniques like Rule-based classifiers and Decision trees. This helps to predict the user personality and learning styles by analysing the common patterns. Rule-based classifiers applied for text analysis helps to categorize Facebook data into positive, negative and neutral. There were totally two models trained, one to predict the personality from Facebook data; another one to predict the learning styles from the personalities. The results show that the classifier model has high accuracy which makes the proposed method to be a reliable one for predicting the user personality and learning styles.

Keywords: educational data mining, Facebook, learning styles, personality traits

Procedia PDF Downloads 231