Search results for: Bagging Classifier
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 366

Search results for: Bagging Classifier

246 Bilingualism: A Case Study of Assamese and Bodo Classifiers

Authors: Samhita Bharadwaj

Abstract:

This is an empirical study of classifiers in Assamese and Bodo, two genetically unrelated languages of India. The objective of the paper is to address the language contact between Assamese and Bodo as reflected in classifiers. The data has been collected through fieldwork in Bodo recording narratives and folk tales and eliciting specific data from the speakers. The data for Assamese is self-produced as native speaker of the language. Assamese is the easternmost New-Indo-Aryan (henceforth NIA) language mainly spoken in the Brahmaputra valley of Assam and some other north-eastern states of India. It is the lingua franca of Assam and is creolised in the neighbouring state of Nagaland. Bodo, on the other hand, is a Tibeto-Burman (henceforth TB) language of the Bodo-Garo group. It has the highest number of speakers among the TB languages of Assam. However, compared to Assamese, it is still a lesser documented language and due to the prestige of Assamese, all the Bodo speakers are fluent bi-lingual in Assamese, though the opposite isn’t the case. With this context, classifiers, a characteristic phenomenon of TB languages, but not so much of NIA languages, presents an interesting case study on language contact caused by bilingualism. Assamese, as a result of its language contact with the TB languages which are rich in classifiers; has developed the richest classifier system among the IA languages in India. Yet, as a part of rampant borrowing of Assamese words and patterns into Bodo; Bodo is seen to borrow even Assamese classifiers into its system. This paper analyses the borrowed classifiers of Bodo and finds the route of this borrowing phenomenon in the number system of the languages. As the Bodo speakers start replacing the higher numbers from five with Assamese ones, they also choose the Assamese classifiers to attach to these numbers. Thus, the partial loss of number in Bodo as a result of language contact and bilingualism in Assamese is found to be the reason behind the borrowing of classifiers in Bodo. The significance of the study lies in exploring an interesting aspect of language contact in Assam. It is hoped that this will attract further research on bilingualism and classifiers in Assam.

Keywords: Assamese, bi-lingual, Bodo, borrowing, classifier, language contact

Procedia PDF Downloads 192
245 Pneumoperitoneum Creation Assisted with Optical Coherence Tomography and Automatic Identification

Authors: Eric Yi-Hsiu Huang, Meng-Chun Kao, Wen-Chuan Kuo

Abstract:

For every laparoscopic surgery, a safe pneumoperitoneumcreation (gaining access to the peritoneal cavity) is the first and essential step. However, closed pneumoperitoneum is usually obtained by blind insertion of a Veress needle into the peritoneal cavity, which may carry potential risks suchas bowel and vascular injury.Until now, there remains no definite measure to visually confirm the position of the needle tip inside the peritoneal cavity. Therefore, this study established an image-guided Veress needle method by combining a fiber probe with optical coherence tomography (OCT). An algorithm was also proposed for determining the exact location of the needle tip through the acquisition of OCT images. Our method not only generates a series of “live” two-dimensional (2D) images during the needle puncture toward the peritoneal cavity but also can eliminate operator variation in image judgment, thus improving peritoneal access safety. This study was approved by the Ethics Committee of Taipei Veterans General Hospital (Taipei VGH IACUC 2020-144). A total of 2400 in vivo OCT images, independent of each other, were acquired from experiments of forty peritoneal punctures on two piglets. Characteristic OCT image patterns could be observed during the puncturing process. The ROC curve demonstrates the discrimination capability of these quantitative image features of the classifier, showing the accuracy of the classifier for determining the inside vs. outside of the peritoneal was 98% (AUC=0.98). In summary, the present study demonstrates the ability of the combination of our proposed automatic identification method and OCT imaging for automatically and objectively identifying the location of the needle tip. OCT images translate the blind closed technique of peritoneal access into a visualized procedure, thus improving peritoneal access safety.

Keywords: pneumoperitoneum, optical coherence tomography, automatic identification, veress needle

Procedia PDF Downloads 96
244 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 292
243 Self-Supervised Learning for Hate-Speech Identification

Authors: Shrabani Ghosh

Abstract:

Automatic offensive language detection in social media has become a stirring task in today's NLP. Manual Offensive language detection is tedious and laborious work where automatic methods based on machine learning are only alternatives. Previous works have done sentiment analysis over social media in different ways such as supervised, semi-supervised, and unsupervised manner. Domain adaptation in a semi-supervised way has also been explored in NLP, where the source domain and the target domain are different. In domain adaptation, the source domain usually has a large amount of labeled data, while only a limited amount of labeled data is available in the target domain. Pretrained transformers like BERT, RoBERTa models are fine-tuned to perform text classification in an unsupervised manner to perform further pre-train masked language modeling (MLM) tasks. In previous work, hate speech detection has been explored in Gab.ai, which is a free speech platform described as a platform of extremist in varying degrees in online social media. In domain adaptation process, Twitter data is used as the source domain, and Gab data is used as the target domain. The performance of domain adaptation also depends on the cross-domain similarity. Different distance measure methods such as L2 distance, cosine distance, Maximum Mean Discrepancy (MMD), Fisher Linear Discriminant (FLD), and CORAL have been used to estimate domain similarity. Certainly, in-domain distances are small, and between-domain distances are expected to be large. The previous work finding shows that pretrain masked language model (MLM) fine-tuned with a mixture of posts of source and target domain gives higher accuracy. However, in-domain performance of the hate classifier on Twitter data accuracy is 71.78%, and out-of-domain performance of the hate classifier on Gab data goes down to 56.53%. Recently self-supervised learning got a lot of attention as it is more applicable when labeled data are scarce. Few works have already been explored to apply self-supervised learning on NLP tasks such as sentiment classification. Self-supervised language representation model ALBERTA focuses on modeling inter-sentence coherence and helps downstream tasks with multi-sentence inputs. Self-supervised attention learning approach shows better performance as it exploits extracted context word in the training process. In this work, a self-supervised attention mechanism has been proposed to detect hate speech on Gab.ai. This framework initially classifies the Gab dataset in an attention-based self-supervised manner. On the next step, a semi-supervised classifier trained on the combination of labeled data from the first step and unlabeled data. The performance of the proposed framework will be compared with the results described earlier and also with optimized outcomes obtained from different optimization techniques.

Keywords: attention learning, language model, offensive language detection, self-supervised learning

Procedia PDF Downloads 82
242 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, text-based information sharing and distribution. The use of natural language processing techniques for text classification and unbiased decision-making has not been far-fetched. Proper classification of this textual information in a given context has also been very difficult. As a result, we decided to conduct a systematic review of previous literature on sentiment classification and AI-based techniques that have been used in order to gain a better understanding of the process of designing and developing a robust and more accurate sentiment classifier that can correctly classify social media textual information of a given context between hate speech and inverted compliments with a high level of accuracy by assessing different artificial intelligence techniques. We evaluated over 250 articles from digital sources like ScienceDirect, ACM, Google Scholar, and IEEE Xplore and whittled down the number of research to 31. Findings revealed that Deep learning approaches such as CNN, RNN, BERT, and LSTM outperformed various machine learning techniques in terms of performance accuracy. A large dataset is also necessary for developing a robust sentiment classifier and can be obtained from places like Twitter, movie reviews, Kaggle, SST, and SemEval Task4. Hybrid Deep Learning techniques like CNN+LSTM, CNN+GRU, CNN+BERT outperformed single Deep Learning techniques and machine learning techniques. Python programming language outperformed Java programming language in terms of sentiment analyzer development due to its simplicity and AI-based library functionalities. Based on some of the important findings from this study, we made a recommendation for future research.

Keywords: artificial intelligence, natural language processing, sentiment analysis, social network, text

Procedia PDF Downloads 90
241 Development of Biodegradable Plastic as Mango Fruit Bag

Authors: Andres M. Tuates Jr., Ofero A. Caparino

Abstract:

Plastics have achieved a dominant position in agriculture because of their transparency, lightness in weight, impermeability to water and their resistance to microbial attack. However, this generates a higher quantity of wastes that are difficult to dispose of by farmers. To address these problems, the project aim to develop and evaluate the biodegradable film for mango fruit bag during development. The PBS and starch were melt-blended in a twin-screw extruder and then blown into film extrusion machine. The physic-chemical-mechanical properties of biodegradable fruit bag were done following standard methods of test. Field testing of fruit bag was also conducted to evaluate its durability and efficiency field condition. The PHilMech-FiC fruit bag is made of biodegradable material measuring 6 x 8 inches with a thickness of 150 microns. The tensile strength is within the range of LDPE while the elongation is within the range of HDPE. It is projected that after thirty-six (36) weeks, the film will be totally degraded. Results of field testing show that the quality of harvested fruits using PHilMech-FiC biodegradable fruit bag in terms of percent marketable, non-marketable and export, peel color at the ripe stage, flesh color, TSS, oBrix, percent edible portion is comparable with the existing bagging materials such as Chinese brown paper bag and old newspaper.

Keywords: cassava starch, PBS, biodegradable, chemical, mechanical properties

Procedia PDF Downloads 250
240 3D Classification Optimization of Low-Density Airborne Light Detection and Ranging Point Cloud by Parameters Selection

Authors: Baha Eddine Aissou, Aichouche Belhadj Aissa

Abstract:

Light detection and ranging (LiDAR) is an active remote sensing technology used for several applications. Airborne LiDAR is becoming an important technology for the acquisition of a highly accurate dense point cloud. A classification of airborne laser scanning (ALS) point cloud is a very important task that still remains a real challenge for many scientists. Support vector machine (SVM) is one of the most used statistical learning algorithms based on kernels. SVM is a non-parametric method, and it is recommended to be used in cases where the data distribution cannot be well modeled by a standard parametric probability density function. Using a kernel, it performs a robust non-linear classification of samples. Often, the data are rarely linearly separable. SVMs are able to map the data into a higher-dimensional space to become linearly separable, which allows performing all the computations in the original space. This is one of the main reasons that SVMs are well suited for high-dimensional classification problems. Only a few training samples, called support vectors, are required. SVM has also shown its potential to cope with uncertainty in data caused by noise and fluctuation, and it is computationally efficient as compared to several other methods. Such properties are particularly suited for remote sensing classification problems and explain their recent adoption. In this poster, the SVM classification of ALS LiDAR data is proposed. Firstly, connected component analysis is applied for clustering the point cloud. Secondly, the resulting clusters are incorporated in the SVM classifier. Radial basic function (RFB) kernel is used due to the few numbers of parameters (C and γ) that needs to be chosen, which decreases the computation time. In order to optimize the classification rates, the parameters selection is explored. It consists to find the parameters (C and γ) leading to the best overall accuracy using grid search and 5-fold cross-validation. The exploited LiDAR point cloud is provided by the German Society for Photogrammetry, Remote Sensing, and Geoinformation. The ALS data used is characterized by a low density (4-6 points/m²) and is covering an urban area located in residential parts of the city Vaihingen in southern Germany. The class ground and three other classes belonging to roof superstructures are considered, i.e., a total of 4 classes. The training and test sets are selected randomly several times. The obtained results demonstrated that a parameters selection can orient the selection in a restricted interval of (C and γ) that can be further explored but does not systematically lead to the optimal rates. The SVM classifier with hyper-parameters is compared with the most used classifiers in literature for LiDAR data, random forest, AdaBoost, and decision tree. The comparison showed the superiority of the SVM classifier using parameters selection for LiDAR data compared to other classifiers.

Keywords: classification, airborne LiDAR, parameters selection, support vector machine

Procedia PDF Downloads 122
239 Identification of Blood Biomarkers Unveiling Early Alzheimer's Disease Diagnosis Through Single-Cell RNA Sequencing Data and Autoencoders

Authors: Hediyeh Talebi, Shokoofeh Ghiam, Changiz Eslahchi

Abstract:

Traditionally, Alzheimer’s disease research has focused on genes with significant fold changes, potentially neglecting subtle but biologically important alterations. Our study introduces an integrative approach that highlights genes crucial to underlying biological processes, regardless of their fold change magnitude. Alzheimer's Single-cell RNA-seq data related to the peripheral blood mononuclear cells (PBMC) was extracted from the Gene Expression Omnibus (GEO). After quality control, normalization, scaling, batch effect correction, and clustering, differentially expressed genes (DEGs) were identified with adjusted p-values less than 0.05. These DEGs were categorized based on cell-type, resulting in four datasets, each corresponding to a distinct cell type. To distinguish between cells from healthy individuals and those with Alzheimer's, an adversarial autoencoder with a classifier was employed. This allowed for the separation of healthy and diseased samples. To identify the most influential genes in this classification, the weight matrices in the network, which includes the encoder and classifier components, were multiplied, and focused on the top 20 genes. The analysis revealed that while some of these genes exhibit a high fold change, others do not. These genes, which may be overlooked by previous methods due to their low fold change, were shown to be significant in our study. The findings highlight the critical role of genes with subtle alterations in diagnosing Alzheimer's disease, a facet frequently overlooked by conventional methods. These genes demonstrate remarkable discriminatory power, underscoring the need to integrate biological relevance with statistical measures in gene prioritization. This integrative approach enhances our understanding of the molecular mechanisms in Alzheimer’s disease and provides a promising direction for identifying potential therapeutic targets.

Keywords: alzheimer's disease, single-cell RNA-seq, neural networks, blood biomarkers

Procedia PDF Downloads 29
238 Structural Design Optimization of Reinforced Thin-Walled Vessels under External Pressure Using Simulation and Machine Learning Classification Algorithm

Authors: Lydia Novozhilova, Vladimir Urazhdin

Abstract:

An optimization problem for reinforced thin-walled vessels under uniform external pressure is considered. The conventional approaches to optimization generally start with pre-defined geometric parameters of the vessels, and then employ analytic or numeric calculations and/or experimental testing to verify functionality, such as stability under the projected conditions. The proposed approach consists of two steps. First, the feasibility domain will be identified in the multidimensional parameter space. Every point in the feasibility domain defines a design satisfying both geometric and functional constraints. Second, an objective function defined in this domain is formulated and optimized. The broader applicability of the suggested methodology is maximized by implementing the Support Vector Machines (SVM) classification algorithm of machine learning for identification of the feasible design region. Training data for SVM classifier is obtained using the Simulation package of SOLIDWORKS®. Based on the data, the SVM algorithm produces a curvilinear boundary separating admissible and not admissible sets of design parameters with maximal margins. Then optimization of the vessel parameters in the feasibility domain is performed using the standard algorithms for the constrained optimization. As an example, optimization of a ring-stiffened closed cylindrical thin-walled vessel with semi-spherical caps under high external pressure is implemented. As a functional constraint, von Mises stress criterion is used but any other stability constraint admitting mathematical formulation can be incorporated into the proposed approach. Suggested methodology has a good potential for reducing design time for finding optimal parameters of thin-walled vessels under uniform external pressure.

Keywords: design parameters, feasibility domain, von Mises stress criterion, Support Vector Machine (SVM) classifier

Procedia PDF Downloads 298
237 Advancements in Predicting Diabetes Biomarkers: A Machine Learning Epigenetic Approach

Authors: James Ladzekpo

Abstract:

Background: The urgent need to identify new pharmacological targets for diabetes treatment and prevention has been amplified by the disease's extensive impact on individuals and healthcare systems. A deeper insight into the biological underpinnings of diabetes is crucial for the creation of therapeutic strategies aimed at these biological processes. Current predictive models based on genetic variations fall short of accurately forecasting diabetes. Objectives: Our study aims to pinpoint key epigenetic factors that predispose individuals to diabetes. These factors will inform the development of an advanced predictive model that estimates diabetes risk from genetic profiles, utilizing state-of-the-art statistical and data mining methods. Methodology: We have implemented a recursive feature elimination with cross-validation using the support vector machine (SVM) approach for refined feature selection. Building on this, we developed six machine learning models, including logistic regression, k-Nearest Neighbors (k-NN), Naive Bayes, Random Forest, Gradient Boosting, and Multilayer Perceptron Neural Network, to evaluate their performance. Findings: The Gradient Boosting Classifier excelled, achieving a median recall of 92.17% and outstanding metrics such as area under the receiver operating characteristics curve (AUC) with a median of 68%, alongside median accuracy and precision scores of 76%. Through our machine learning analysis, we identified 31 genes significantly associated with diabetes traits, highlighting their potential as biomarkers and targets for diabetes management strategies. Conclusion: Particularly noteworthy were the Gradient Boosting Classifier and Multilayer Perceptron Neural Network, which demonstrated potential in diabetes outcome prediction. We recommend future investigations to incorporate larger cohorts and a wider array of predictive variables to enhance the models' predictive capabilities.

Keywords: diabetes, machine learning, prediction, biomarkers

Procedia PDF Downloads 15
236 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.

Keywords: product recommender system, ensemble technique, association rules, decision tree, artificial neural networks

Procedia PDF Downloads 262
235 Musical Instruments Classification Using Machine Learning Techniques

Authors: Bhalke D. G., Bormane D. S., Kharate G. K.

Abstract:

This paper presents classification of musical instrument using machine learning techniques. The classification has been carried out using temporal, spectral, cepstral and wavelet features. Detail feature analysis is carried out using separate and combined features. Further, instrument model has been developed using K-Nearest Neighbor and Support Vector Machine (SVM). Benchmarked McGill university database has been used to test the performance of the system. Experimental result shows that SVM performs better as compared to KNN classifier.

Keywords: feature extraction, SVM, KNN, musical instruments

Procedia PDF Downloads 454
234 Frequency Decomposition Approach for Sub-Band Common Spatial Pattern Methods for Motor Imagery Based Brain-Computer Interface

Authors: Vitor M. Vilas Boas, Cleison D. Silva, Gustavo S. Mafra, Alexandre Trofino Neto

Abstract:

Motor imagery (MI) based brain-computer interfaces (BCI) uses event-related (de)synchronization (ERS/ ERD), typically recorded using electroencephalography (EEG), to translate brain electrical activity into control commands. To mitigate undesirable artifacts and noise measurements on EEG signals, methods based on band-pass filters defined by a specific frequency band (i.e., 8 – 30Hz), such as the Infinity Impulse Response (IIR) filters, are typically used. Spatial techniques, such as Common Spatial Patterns (CSP), are also used to estimate the variations of the filtered signal and extract features that define the imagined motion. The CSP effectiveness depends on the subject's discriminative frequency, and approaches based on the decomposition of the band of interest into sub-bands with smaller frequency ranges (SBCSP) have been suggested to EEG signals classification. However, despite providing good results, the SBCSP approach generally increases the computational cost of the filtering step in IM-based BCI systems. This paper proposes the use of the Fast Fourier Transform (FFT) algorithm in the IM-based BCI filtering stage that implements SBCSP. The goal is to apply the FFT algorithm to reduce the computational cost of the processing step of these systems and to make them more efficient without compromising classification accuracy. The proposal is based on the representation of EEG signals in a matrix of coefficients resulting from the frequency decomposition performed by the FFT, which is then submitted to the SBCSP process. The structure of the SBCSP contemplates dividing the band of interest, initially defined between 0 and 40Hz, into a set of 33 sub-bands spanning specific frequency bands which are processed in parallel each by a CSP filter and an LDA classifier. A Bayesian meta-classifier is then used to represent the LDA outputs of each sub-band as scores and organize them into a single vector, and then used as a training vector of an SVM global classifier. Initially, the public EEG data set IIa of the BCI Competition IV is used to validate the approach. The first contribution of the proposed method is that, in addition to being more compact, because it has a 68% smaller dimension than the original signal, the resulting FFT matrix maintains the signal information relevant to class discrimination. In addition, the results showed an average reduction of 31.6% in the computational cost in relation to the application of filtering methods based on IIR filters, suggesting FFT efficiency when applied in the filtering step. Finally, the frequency decomposition approach improves the overall system classification rate significantly compared to the commonly used filtering, going from 73.7% using IIR to 84.2% using FFT. The accuracy improvement above 10% and the computational cost reduction denote the potential of FFT in EEG signal filtering applied to the context of IM-based BCI implementing SBCSP. Tests with other data sets are currently being performed to reinforce such conclusions.

Keywords: brain-computer interfaces, fast Fourier transform algorithm, motor imagery, sub-band common spatial patterns

Procedia PDF Downloads 94
233 Establishment of a Classifier Model for Early Prediction of Acute Delirium in Adult Intensive Care Unit Using Machine Learning

Authors: Pei Yi Lin

Abstract:

Objective: The objective of this study is to use machine learning methods to build an early prediction classifier model for acute delirium to improve the quality of medical care for intensive care patients. Background: Delirium is a common acute and sudden disturbance of consciousness in critically ill patients. After the occurrence, it is easy to prolong the length of hospital stay and increase medical costs and mortality. In 2021, the incidence of delirium in the intensive care unit of internal medicine was as high as 59.78%, which indirectly prolonged the average length of hospital stay by 8.28 days, and the mortality rate is about 2.22% in the past three years. Therefore, it is expected to build a delirium prediction classifier through big data analysis and machine learning methods to detect delirium early. Method: This study is a retrospective study, using the artificial intelligence big data database to extract the characteristic factors related to delirium in intensive care unit patients and let the machine learn. The study included patients aged over 20 years old who were admitted to the intensive care unit between May 1, 2022, and December 31, 2022, excluding GCS assessment <4 points, admission to ICU for less than 24 hours, and CAM-ICU evaluation. The CAMICU delirium assessment results every 8 hours within 30 days of hospitalization are regarded as an event, and the cumulative data from ICU admission to the prediction time point are extracted to predict the possibility of delirium occurring in the next 8 hours, and collect a total of 63,754 research case data, extract 12 feature selections to train the model, including age, sex, average ICU stay hours, visual and auditory abnormalities, RASS assessment score, APACHE-II Score score, number of invasive catheters indwelling, restraint and sedative and hypnotic drugs. Through feature data cleaning, processing and KNN interpolation method supplementation, a total of 54595 research case events were extracted to provide machine learning model analysis, using the research events from May 01 to November 30, 2022, as the model training data, 80% of which is the training set for model training, and 20% for the internal verification of the verification set, and then from December 01 to December 2022 The CU research event on the 31st is an external verification set data, and finally the model inference and performance evaluation are performed, and then the model has trained again by adjusting the model parameters. Results: In this study, XG Boost, Random Forest, Logistic Regression, and Decision Tree were used to analyze and compare four machine learning models. The average accuracy rate of internal verification was highest in Random Forest (AUC=0.86), and the average accuracy rate of external verification was in Random Forest and XG Boost was the highest, AUC was 0.86, and the average accuracy of cross-validation was the highest in Random Forest (ACC=0.77). Conclusion: Clinically, medical staff usually conduct CAM-ICU assessments at the bedside of critically ill patients in clinical practice, but there is a lack of machine learning classification methods to assist ICU patients in real-time assessment, resulting in the inability to provide more objective and continuous monitoring data to assist Clinical staff can more accurately identify and predict the occurrence of delirium in patients. It is hoped that the development and construction of predictive models through machine learning can predict delirium early and immediately, make clinical decisions at the best time, and cooperate with PADIS delirium care measures to provide individualized non-drug interventional care measures to maintain patient safety, and then Improve the quality of care.

Keywords: critically ill patients, machine learning methods, delirium prediction, classifier model

Procedia PDF Downloads 35
232 Methods for Enhancing Ensemble Learning or Improving Classifiers of This Technique in the Analysis and Classification of Brain Signals

Authors: Seyed Mehdi Ghezi, Hesam Hasanpoor

Abstract:

This scientific article explores enhancement methods for ensemble learning with the aim of improving the performance of classifiers in the analysis and classification of brain signals. The research approach in this field consists of two main parts, each with its own strengths and weaknesses. The choice of approach depends on the specific research question and available resources. By combining these approaches and leveraging their respective strengths, researchers can enhance the accuracy and reliability of classification results, consequently advancing our understanding of the brain and its functions. The first approach focuses on utilizing machine learning methods to identify the best features among the vast array of features present in brain signals. The selection of features varies depending on the research objective, and different techniques have been employed for this purpose. For instance, the genetic algorithm has been used in some studies to identify the best features, while optimization methods have been utilized in others to identify the most influential features. Additionally, machine learning techniques have been applied to determine the influential electrodes in classification. Ensemble learning plays a crucial role in identifying the best features that contribute to learning, thereby improving the overall results. The second approach concentrates on designing and implementing methods for selecting the best classifier or utilizing meta-classifiers to enhance the final results in ensemble learning. In a different section of the research, a single classifier is used instead of multiple classifiers, employing different sets of features to improve the results. The article provides an in-depth examination of each technique, highlighting their advantages and limitations. By integrating these techniques, researchers can enhance the performance of classifiers in the analysis and classification of brain signals. This advancement in ensemble learning methodologies contributes to a better understanding of the brain and its functions, ultimately leading to improved accuracy and reliability in brain signal analysis and classification.

Keywords: ensemble learning, brain signals, classification, feature selection, machine learning, genetic algorithm, optimization methods, influential features, influential electrodes, meta-classifiers

Procedia PDF Downloads 38
231 Combining Multiscale Patterns of Weather and Sea States into a Machine Learning Classifier for Mid-Term Prediction of Extreme Rainfall in North-Western Mediterranean Sea

Authors: Pinel Sebastien, Bourrin François, De Madron Du Rieu Xavier, Ludwig Wolfgang, Arnau Pedro

Abstract:

Heavy precipitation constitutes a major meteorological threat in the western Mediterranean. Research has investigated the relationship between the states of the Mediterranean Sea and the atmosphere with the precipitation for short temporal windows. However, at a larger temporal scale, the precursor signals of heavy rainfall in the sea and atmosphere have drawn little attention. Moreover, despite ongoing improvements in numerical weather prediction, the medium-term forecasting of rainfall events remains a difficult task. Here, we aim to investigate the influence of early-spring environmental parameters on the following autumnal heavy precipitations. Hence, we develop a machine learning model to predict extreme autumnal rainfall with a 6-month lead time over the Spanish Catalan coastal area, based on i) the sea pattern (main current-LPC and Sea Surface Temperature-SST) at the mesoscale scale, ii) 4 European weather teleconnection patterns (NAO, WeMo, SCAND, MO) at synoptic scale, and iii) the hydrological regime of the main local river (Rhône River). The accuracy of the developed model classifier is evaluated via statistical analysis based on classification accuracy, logarithmic and confusion matrix by comparing with rainfall estimates from rain gauges and satellite observations (CHIRPS-2.0). Sensitivity tests are carried out by changing the model configuration, such as sea SST, sea LPC, river regime, and synoptic atmosphere configuration. The sensitivity analysis suggests a negligible influence from the hydrological regime, unlike SST, LPC, and specific teleconnection weather patterns. At last, this study illustrates how public datasets can be integrated into a machine learning model for heavy rainfall prediction and can interest local policies for management purposes.

Keywords: extreme hazards, sensitivity analysis, heavy rainfall, machine learning, sea-atmosphere modeling, precipitation forecasting

Procedia PDF Downloads 105
230 Smartphone-Based Human Activity Recognition by Machine Learning Methods

Authors: Yanting Cao, Kazumitsu Nawata

Abstract:

As smartphones upgrading, their software and hardware are getting smarter, so the smartphone-based human activity recognition will be described as more refined, complex, and detailed. In this context, we analyzed a set of experimental data obtained by observing and measuring 30 volunteers with six activities of daily living (ADL). Due to the large sample size, especially a 561-feature vector with time and frequency domain variables, cleaning these intractable features and training a proper model becomes extremely challenging. After a series of feature selection and parameters adjustment, a well-performed SVM classifier has been trained.

Keywords: smart sensors, human activity recognition, artificial intelligence, SVM

Procedia PDF Downloads 119
229 Human Gait Recognition Using Moment with Fuzzy

Authors: Jyoti Bharti, Navneet Manjhi, M. K.Gupta, Bimi Jain

Abstract:

A reliable gait features are required to extract the gait sequences from an images. In this paper suggested a simple method for gait identification which is based on moments. Moment values are extracted on different number of frames of gray scale and silhouette images of CASIA database. These moment values are considered as feature values. Fuzzy logic and nearest neighbour classifier are used for classification. Both achieved higher recognition.

Keywords: gait, fuzzy logic, nearest neighbour, recognition rate, moments

Procedia PDF Downloads 722
228 An Intelligent Text Independent Speaker Identification Using VQ-GMM Model Based Multiple Classifier System

Authors: Ben Soltane Cheima, Ittansa Yonas Kelbesa

Abstract:

Speaker Identification (SI) is the task of establishing identity of an individual based on his/her voice characteristics. The SI task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker specific feature parameters from the speech and generates speaker models accordingly. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Even though performance of speaker identification systems has improved due to recent advances in speech processing techniques, there is still need of improvement. In this paper, a Closed-Set Tex-Independent Speaker Identification System (CISI) based on a Multiple Classifier System (MCS) is proposed, using Mel Frequency Cepstrum Coefficient (MFCC) as feature extraction and suitable combination of vector quantization (VQ) and Gaussian Mixture Model (GMM) together with Expectation Maximization algorithm (EM) for speaker modeling. The use of Voice Activity Detector (VAD) with a hybrid approach based on Short Time Energy (STE) and Statistical Modeling of Background Noise in the pre-processing step of the feature extraction yields a better and more robust automatic speaker identification system. Also investigation of Linde-Buzo-Gray (LBG) clustering algorithm for initialization of GMM, for estimating the underlying parameters, in the EM step improved the convergence rate and systems performance. It also uses relative index as confidence measures in case of contradiction in identification process by GMM and VQ as well. Simulation results carried out on voxforge.org speech database using MATLAB highlight the efficacy of the proposed method compared to earlier work.

Keywords: feature extraction, speaker modeling, feature matching, Mel frequency cepstrum coefficient (MFCC), Gaussian mixture model (GMM), vector quantization (VQ), Linde-Buzo-Gray (LBG), expectation maximization (EM), pre-processing, voice activity detection (VAD), short time energy (STE), background noise statistical modeling, closed-set tex-independent speaker identification system (CISI)

Procedia PDF Downloads 280
227 Epileptic Seizure Prediction by Exploiting Signal Transitions Phenomena

Authors: Mohammad Zavid Parvez, Manoranjan Paul

Abstract:

A seizure prediction method is proposed by extracting global features using phase correlation between adjacent epochs for detecting relative changes and local features using fluctuation/deviation within an epoch for determining fine changes of different EEG signals. A classifier and a regularization technique are applied for the reduction of false alarms and improvement of the overall prediction accuracy. The experiments show that the proposed method outperforms the state-of-the-art methods and provides high prediction accuracy (i.e., 97.70%) with low false alarm using EEG signals in different brain locations from a benchmark data set.

Keywords: Epilepsy, seizure, phase correlation, fluctuation, deviation.

Procedia PDF Downloads 440
226 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 513
225 A New Internal Architecture Based On Feature Selection for Holonic Manufacturing System

Authors: Jihan Abdulazeez Ahmed, Adnan Mohsin Abdulazeez Brifcani

Abstract:

This paper suggests a new internal architecture of holon based on feature selection model using the combination of Bees Algorithm (BA) and Artificial Neural Network (ANN). BA is used to generate features while ANN is used as a classifier to evaluate the produced features. Proposed system is applied on the Wine data set, the statistical result proves that the proposed system is effective and has the ability to choose informative features with high accuracy.

Keywords: artificial neural network, bees algorithm, feature selection, Holon

Procedia PDF Downloads 429
224 Multi-Channel Information Fusion in C-OTDR Monitoring Systems: Various Approaches to Classify of Targeted Events

Authors: Andrey V. Timofeev

Abstract:

The paper presents new results concerning selection of optimal information fusion formula for ensembles of C-OTDR channels. The goal of information fusion is to create an integral classificator designed for effective classification of seismoacoustic target events. The LPBoost (LP-β and LP-B variants), the Multiple Kernel Learning, and Weighing of Inversely as Lipschitz Constants (WILC) approaches were compared. The WILC is a brand new approach to optimal fusion of Lipschitz Classifiers Ensembles. Results of practical usage are presented.

Keywords: Lipschitz Classifier, classifiers ensembles, LPBoost, C-OTDR systems

Procedia PDF Downloads 430
223 High Altitude Glacier Surface Mapping in Dhauliganga Basin of Himalayan Environment Using Remote Sensing Technique

Authors: Aayushi Pandey, Manoj Kumar Pandey, Ashutosh Tiwari, Kireet Kumar

Abstract:

Glaciers play an important role in climate change and are sensitive phenomena of global climate change scenario. Glaciers in Himalayas are unique as they are predominantly valley type and are located in tropical, high altitude regions. These glaciers are often covered with debris which greatly affects ablation rate of glaciers and work as a sensitive indicator of glacier health. The aim of this study is to map high altitude Glacier surface with a focus on glacial lake and debris estimation using different techniques in Nagling glacier of dhauliganga basin in Himalayan region. Different Image Classification techniques i.e. thresholding on different band ratios and supervised classification using maximum likelihood classifier (MLC) have been used on high resolution sentinel 2A level 1c satellite imagery of 14 October 2017.Here Near Infrared (NIR)/Shortwave Infrared (SWIR) ratio image was used to extract the glaciated classes (Snow, Ice, Ice Mixed Debris) from other non-glaciated terrain classes. SWIR/BLUE Ratio Image was used to map valley rock and Debris while Green/NIR ratio image was found most suitable for mapping Glacial Lake. Accuracy assessment was performed using high resolution (3 meters) Planetscope Imagery using 60 stratified random points. The overall accuracy of MLC was 85 % while the accuracy of Band Ratios was 96.66 %. According to Band Ratio technique total areal extent of glaciated classes (Snow, Ice ,IMD) in Nagling glacier was 10.70 km2 nearly 38.07% of study area comprising of 30.87 % Snow covered area, 3.93% Ice and 3.27 % IMD covered area. Non-glaciated classes (vegetation, glacial lake, debris and valley rock) covered 61.93 % of the total area out of which valley rock is dominant with 33.83% coverage followed by debris covering 27.7 % of the area in nagling glacier. Glacial lake and Debris were accurately mapped using Band ratio technique Hence, Band Ratio approach appears to be useful for the mapping of debris covered glacier in Himalayan Region.

Keywords: band ratio, Dhauliganga basin, glacier mapping, Himalayan region, maximum likelihood classifier (MLC), Sentinel-2 satellite image

Procedia PDF Downloads 201
222 Health Status Monitoring of COVID-19 Patient's through Blood Tests and Naïve-Bayes

Authors: Carlos Arias-Alcaide, Cristina Soguero-Ruiz, Paloma Santos-Álvarez, Adrián García-Romero, Inmaculada Mora-Jiménez

Abstract:

Analysing clinical data with computers in such a way that have an impact on the practitioners’ workflow is a challenge nowadays. This paper provides a first approach for monitoring the health status of COVID-19 patients through the use of some biomarkers (blood tests) and the simplest Naïve Bayes classifier. Data of two Spanish hospitals were considered, showing the potential of our approach to estimate reasonable posterior probabilities even some days before the event.

Keywords: Bayesian model, blood biomarkers, classification, health tracing, machine learning, posterior probability

Procedia PDF Downloads 183
221 Clustering-Based Detection of Alzheimer's Disease Using Brain MR Images

Authors: Sofia Matoug, Amr Abdel-Dayem

Abstract:

This paper presents a comprehensive survey of recent research studies to segment and classify brain MR (magnetic resonance) images in order to detect significant changes to brain ventricles. The paper also presents a general framework for detecting regions that atrophy, which can help neurologists in detecting and staging Alzheimer. Furthermore, a prototype was implemented to segment brain MR images in order to extract the region of interest (ROI) and then, a classifier was employed to differentiate between normal and abnormal brain tissues. Experimental results show that the proposed scheme can provide a reliable second opinion that neurologists can benefit from.

Keywords: Alzheimer, brain images, classification techniques, Magnetic Resonance Images MRI

Procedia PDF Downloads 274
220 Short-Term Forecast of Wind Turbine Production with Machine Learning Methods: Direct Approach and Indirect Approach

Authors: Mamadou Dione, Eric Matzner-lober, Philippe Alexandre

Abstract:

The Energy Transition Act defined by the French State has precise implications on Renewable Energies, in particular on its remuneration mechanism. Until then, a purchase obligation contract permitted the sale of wind-generated electricity at a fixed rate. Tomorrow, it will be necessary to sell this electricity on the Market (at variable rates) before obtaining additional compensation intended to reduce the risk. This sale on the market requires to announce in advance (about 48 hours before) the production that will be delivered on the network, so to be able to predict (in the short term) this production. The fundamental problem remains the variability of the Wind accentuated by the geographical situation. The objective of the project is to provide, every day, short-term forecasts (48-hour horizon) of wind production using weather data. The predictions of the GFS model and those of the ECMWF model are used as explanatory variables. The variable to be predicted is the production of a wind farm. We do two approaches: a direct approach that predicts wind generation directly from weather data, and an integrated approach that estimâtes wind from weather data and converts it into wind power by power curves. We used machine learning techniques to predict this production. The models tested are random forests, CART + Bagging, CART + Boosting, SVM (Support Vector Machine). The application is made on a wind farm of 22MW (11 wind turbines) of the Compagnie du Vent (that became Engie Green France). Our results are very conclusive compared to the literature.

Keywords: forecast aggregation, machine learning, spatio-temporal dynamics modeling, wind power forcast

Procedia PDF Downloads 186
219 Modelisation of a Full-Scale Closed Cement Grinding

Authors: D. Touil, L. Ouadah

Abstract:

An industrial model of cement grinding circuit is proposed on the basis of sampling surveys undertaken in the Meftah cement plant in Algiers, Algeria. The ball mill is described by a series of equal fully mixed stages that incorporates the effect of air sweeping. The kinetic parameters of this material in the energy normalized form obtained using the data of batch dry ball milling are taken into account in developing the present scale-up procedure. The dynamic separator is represented by the air classifier selectivity equation corrected by empirical factors. The model is incorporated in computer program that predict full size distributions and mass flow rates for all streams in a circuit under a particular set of operating conditions.

Keywords: grinding circuit, clinker, cement, modeling, population balance, energy

Procedia PDF Downloads 499
218 Social Media Data Analysis for Personality Modelling and Learning Styles Prediction Using Educational Data Mining

Authors: Srushti Patil, Preethi Baligar, Gopalkrishna Joshi, Gururaj N. Bhadri

Abstract:

In designing learning environments, the instructional strategies can be tailored to suit the learning style of an individual to ensure effective learning. In this study, the information shared on social media like Facebook is being used to predict learning style of a learner. Previous research studies have shown that Facebook data can be used to predict user personality. Users with a particular personality exhibit an inherent pattern in their digital footprint on Facebook. The proposed work aims to correlate the user's’ personality, predicted from Facebook data to the learning styles, predicted through questionnaires. For Millennial learners, Facebook has become a primary means for information sharing and interaction with peers. Thus, it can serve as a rich bed for research and direct the design of learning environments. The authors have conducted this study in an undergraduate freshman engineering course. Data from 320 freshmen Facebook users was collected. The same users also participated in the learning style and personality prediction survey. The Kolb’s Learning style questionnaires and Big 5 personality Inventory were adopted for the survey. The users have agreed to participate in this research and have signed individual consent forms. A specific page was created on Facebook to collect user data like personal details, status updates, comments, demographic characteristics and egocentric network parameters. This data was captured by an application created using Python program. The data captured from Facebook was subjected to text analysis process using the Linguistic Inquiry and Word Count dictionary. An analysis of the data collected from the questionnaires performed reveals individual student personality and learning style. The results obtained from analysis of Facebook, learning style and personality data were then fed into an automatic classifier that was trained by using the data mining techniques like Rule-based classifiers and Decision trees. This helps to predict the user personality and learning styles by analysing the common patterns. Rule-based classifiers applied for text analysis helps to categorize Facebook data into positive, negative and neutral. There were totally two models trained, one to predict the personality from Facebook data; another one to predict the learning styles from the personalities. The results show that the classifier model has high accuracy which makes the proposed method to be a reliable one for predicting the user personality and learning styles.

Keywords: educational data mining, Facebook, learning styles, personality traits

Procedia PDF Downloads 198
217 Review and Comparison of Associative Classification Data Mining Approaches

Authors: Suzan Wedyan

Abstract:

Data mining is one of the main phases in the Knowledge Discovery Database (KDD) which is responsible of finding hidden and useful knowledge from databases. There are many different tasks for data mining including regression, pattern recognition, clustering, classification, and association rule. In recent years a promising data mining approach called associative classification (AC) has been proposed, AC integrates classification and association rule discovery to build classification models (classifiers). This paper surveys and critically compares several AC algorithms with reference of the different procedures are used in each algorithm, such as rule learning, rule sorting, rule pruning, classifier building, and class allocation for test cases.

Keywords: associative classification, classification, data mining, learning, rule ranking, rule pruning, prediction

Procedia PDF Downloads 507