Search results for: KaraAgroAI cocoa dataset
448 Random Forest Classification for Population Segmentation
Authors: Regina Chua
Abstract:
To reduce the costs of re-fielding a large survey, a Random Forest classifier was applied to measure the accuracy of classifying individuals into their assigned segments with the fewest possible questions. Given a long survey, one needed to determine the most predictive ten or fewer questions that would accurately assign new individuals to custom segments. Furthermore, the solution needed to be quick in its classification and usable in non-Python environments. In this paper, a supervised Random Forest classifier was modeled on a dataset with 7,000 individuals, 60 questions, and 254 features. The Random Forest consisted of an iterative collection of individual decision trees that result in a predicted segment with robust precision and recall scores compared to a single tree. A random 70-30 stratified sampling for training the algorithm was used, and accuracy trade-offs at different depths for each segment were identified. Ultimately, the Random Forest classifier performed at 87% accuracy at a depth of 10 with 20 instead of 254 features and 10 instead of 60 questions. With an acceptable accuracy in prioritizing feature selection, new tools were developed for non-Python environments: a worksheet with a formulaic version of the algorithm and an embedded function to predict the segment of an individual in real-time. Random Forest was determined to be an optimal classification model by its feature selection, performance, processing speed, and flexible application in other environments.Keywords: machine learning, supervised learning, data science, random forest, classification, prediction, predictive modeling
Procedia PDF Downloads 94447 Automated End-to-End Pipeline Processing Solution for Autonomous Driving
Authors: Ashish Kumar, Munesh Raghuraj Varma, Nisarg Joshi, Gujjula Vishwa Teja, Srikanth Sambi, Arpit Awasthi
Abstract:
Autonomous driving vehicles are revolutionizing the transportation system of the 21st century. This has been possible due to intensive research put into making a robust, reliable, and intelligent program that can perceive and understand its environment and make decisions based on the understanding. It is a very data-intensive task with data coming from multiple sensors and the amount of data directly reflects on the performance of the system. Researchers have to design the preprocessing pipeline for different datasets with different sensor orientations and alignments before the dataset can be fed to the model. This paper proposes a solution that provides a method to unify all the data from different sources into a uniform format using the intrinsic and extrinsic parameters of the sensor used to capture the data allowing the same pipeline to use data from multiple sources at a time. This also means easy adoption of new datasets or In-house generated datasets. The solution also automates the complete deep learning pipeline from preprocessing to post-processing for various tasks allowing researchers to design multiple custom end-to-end pipelines. Thus, the solution takes care of the input and output data handling, saving the time and effort spent on it and allowing more time for model improvement.Keywords: augmentation, autonomous driving, camera, custom end-to-end pipeline, data unification, lidar, post-processing, preprocessing
Procedia PDF Downloads 123446 Design of an Ensemble Learning Behavior Anomaly Detection Framework
Authors: Abdoulaye Diop, Nahid Emad, Thierry Winter, Mohamed Hilia
Abstract:
Data assets protection is a crucial issue in the cybersecurity field. Companies use logical access control tools to vault their information assets and protect them against external threats, but they lack solutions to counter insider threats. Nowadays, insider threats are the most significant concern of security analysts. They are mainly individuals with legitimate access to companies information systems, which use their rights with malicious intents. In several fields, behavior anomaly detection is the method used by cyber specialists to counter the threats of user malicious activities effectively. In this paper, we present the step toward the construction of a user and entity behavior analysis framework by proposing a behavior anomaly detection model. This model combines machine learning classification techniques and graph-based methods, relying on linear algebra and parallel computing techniques. We show the utility of an ensemble learning approach in this context. We present some detection methods tests results on an representative access control dataset. The use of some explored classifiers gives results up to 99% of accuracy.Keywords: cybersecurity, data protection, access control, insider threat, user behavior analysis, ensemble learning, high performance computing
Procedia PDF Downloads 128445 Correlation between Speech Emotion Recognition Deep Learning Models and Noises
Authors: Leah Lee
Abstract:
This paper examines the correlation between deep learning models and emotions with noises to see whether or not noises mask emotions. The deep learning models used are plain convolutional neural networks (CNN), auto-encoder, long short-term memory (LSTM), and Visual Geometry Group-16 (VGG-16). Emotion datasets used are Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D), Toronto Emotional Speech Set (TESS), and Surrey Audio-Visual Expressed Emotion (SAVEE). To make it four times bigger, audio set files, stretch, and pitch augmentations are utilized. From the augmented datasets, five different features are extracted for inputs of the models. There are eight different emotions to be classified. Noise variations are white noise, dog barking, and cough sounds. The variation in the signal-to-noise ratio (SNR) is 0, 20, and 40. In summation, per a deep learning model, nine different sets with noise and SNR variations and just augmented audio files without any noises will be used in the experiment. To compare the results of the deep learning models, the accuracy and receiver operating characteristic (ROC) are checked.Keywords: auto-encoder, convolutional neural networks, long short-term memory, speech emotion recognition, visual geometry group-16
Procedia PDF Downloads 75444 Adolescent Social Anxiety, School Satisfaction, and School Absenteeism; Findings from Young-HUNT3 and Norwegian National Education Data
Authors: Malik D. Halidu, Cathrine F. Moe, Tommy Haugan
Abstract:
Purpose: The demand for effective school-based interventions in shaping adolescents' unmet mental health needs is growing. Grounding in the functional contextualism approach, this study investigates the role of school satisfaction (SS) in serving as a buffer to school absenteeism (SAB) among adolescents experiencing social anxiety (SA). Methods: A unique and large population-based sample of adolescents (upper secondary school pupils; n= 1864) from the Young-HUNT 3 survey dataset merged with the national educational registry from Norway. Moderation regression analysis was performed using Stata 17. Results: We find a statistically significant moderating role of school satisfaction on the relationship between social anxiety and school absenteeism (β=-0.109,p<0.01) among upper secondary school pupils. Among socially anxious adolescents associated with a higher perceived quality of school life, it functions as a buffer by reducing the positive relationship between SA and SAB. But, there was no statistically significant difference between social anxiety and school absenteeism for adolescents with low school satisfaction. Conclusion: Overall, the study's hypothesis model was statistically supported and contributes to the discourse that school satisfaction as a target of school-based interventions can effectively improve school outcomes (e.g., reduced absenteeism) among socially anxious pupils.Keywords: social anxiety, school satisfaction, school absenteeism, Norwegian adolescent
Procedia PDF Downloads 90443 Optimization of Hate Speech and Abusive Language Detection on Indonesian-language Twitter using Genetic Algorithms
Authors: Rikson Gultom
Abstract:
Hate Speech and Abusive language on social media is difficult to detect, usually, it is detected after it becomes viral in cyberspace, of course, it is too late for prevention. An early detection system that has a fairly good accuracy is needed so that it can reduce conflicts that occur in society caused by postings on social media that attack individuals, groups, and governments in Indonesia. The purpose of this study is to find an early detection model on Twitter social media using machine learning that has high accuracy from several machine learning methods studied. In this study, the support vector machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) methods were compared with the Support Vector machine with genetic algorithm (SVM-GA), Nave Bayes with genetic algorithm (NB-GA), and Random Forest Decision Tree with Genetic Algorithm (RFDT-GA). The study produced a comparison table for the accuracy of the hate speech and abusive language detection model, and presented it in the form of a graph of the accuracy of the six algorithms developed based on the Indonesian-language Twitter dataset, and concluded the best model with the highest accuracy.Keywords: abusive language, hate speech, machine learning, optimization, social media
Procedia PDF Downloads 128442 Post Pandemic Mobility Analysis through Indexing and Sharding in MongoDB: Performance Optimization and Insights
Authors: Karan Vishavjit, Aakash Lakra, Shafaq Khan
Abstract:
The COVID-19 pandemic has pushed healthcare professionals to use big data analytics as a vital tool for tracking and evaluating the effects of contagious viruses. To effectively analyze huge datasets, efficient NoSQL databases are needed. The analysis of post-COVID-19 health and well-being outcomes and the evaluation of the effectiveness of government efforts during the pandemic is made possible by this research’s integration of several datasets, which cuts down on query processing time and creates predictive visual artifacts. We recommend applying sharding and indexing technologies to improve query effectiveness and scalability as the dataset expands. Effective data retrieval and analysis are made possible by spreading the datasets into a sharded database and doing indexing on individual shards. Analysis of connections between governmental activities, poverty levels, and post-pandemic well being is the key goal. We want to evaluate the effectiveness of governmental initiatives to improve health and lower poverty levels. We will do this by utilising advanced data analysis and visualisations. The findings provide relevant data that supports the advancement of UN sustainable objectives, future pandemic preparation, and evidence-based decision-making. This study shows how Big Data and NoSQL databases may be used to address problems with global health.Keywords: big data, COVID-19, health, indexing, NoSQL, sharding, scalability, well being
Procedia PDF Downloads 70441 Mapping of Alteration Zones in Mineral Rich Belt of South-East Rajasthan Using Remote Sensing Techniques
Authors: Mrinmoy Dhara, Vivek K. Sengar, Shovan L. Chattoraj, Soumiya Bhattacharjee
Abstract:
Remote sensing techniques have emerged as an asset for various geological studies. Satellite images obtained by different sensors contain plenty of information related to the terrain. Digital image processing further helps in customized ways for the prospecting of minerals. In this study, an attempt has been made to map the hydrothermally altered zones using multispectral and hyperspectral datasets of South East Rajasthan. Advanced Space-borne Thermal Emission and Reflection Radiometer (ASTER) and Hyperion (Level1R) dataset have been processed to generate different Band Ratio Composites (BRCs). For this study, ASTER derived BRCs were generated to delineate the alteration zones, gossans, abundant clays and host rocks. ASTER and Hyperion images were further processed to extract mineral end members and classified mineral maps have been produced using Spectral Angle Mapper (SAM) method. Results were validated with the geological map of the area which shows positive agreement with the image processing outputs. Thus, this study concludes that the band ratios and image processing in combination play significant role in demarcation of alteration zones which may provide pathfinders for mineral prospecting studies.Keywords: ASTER, hyperion, band ratios, alteration zones, SAM
Procedia PDF Downloads 280440 Towards a Broader Understanding of Journal Impact: Measuring Relationships between Journal Characteristics and Scholarly Impact
Authors: X. Gu, K. L. Blackmore
Abstract:
The impact factor was introduced to measure the quality of journals. Various impact measures exist from multiple bibliographic databases. In this research, we aim to provide a broader understanding of the relationship between scholarly impact and other characteristics of academic journals. Data used for this research were collected from Ulrich’s Periodicals Directory (Ulrichs), Cabell’s (Cabells), and SCImago Journal & Country Rank (SJR) from 1999 to 2015. A master journal dataset was consolidated via Journal Title and ISSN. We adopted a two-step analysis process to study the quantitative relationships between scholarly impact and other journal characteristics. Firstly, we conducted a correlation analysis over the data attributes, with results indicating that there are no correlations between any of the identified journal characteristics. Secondly, we examined the quantitative relationship between scholarly impact and other characteristics using quartile analysis. The results show interesting patterns, including some expected and others less anticipated. Results show that higher quartile journals publish more in both frequency and quantity, and charge more for subscription cost. Top quartile journals also have the lowest acceptance rates. Non-English journals are more likely to be categorized in lower quartiles, which are more likely to stop publishing than higher quartiles. Future work is suggested, which includes analysis of the relationship between scholars and their publications, based on the quartile ranking of journals in which they publish.Keywords: academic journal, acceptance rate, impact factor, journal characteristics
Procedia PDF Downloads 304439 Corn Production in the Visayas: An Industry Study from 2002-2019
Authors: Julie Ann L. Gadin, Andrearose C. Igano, Carl Joseph S. Ignacio, Christopher C. Bacungan
Abstract:
Corn production has become an important and pervasive industry in the Visayas for many years. Its role as a substitute commodity to rice heightens demand for health-particular consumers. Unfortunately, the corn industry is confronted with several challenges, such as weak institutions. Considering these issues, the paper examined the factors that influence corn production in the three administrative regions in the Visayas, namely, Western Visayas, Central Visayas, and Eastern Visayas. The data used was retrieved from a variety of publicly available data sources such as the Philippine Statistics Authority, the Department of Agriculture, the Philippine Crop Insurance Corporation, and the International Disaster Database. Utilizing a dataset from 2002 to 2019, the indicators were tested using three multiple linear regression (MLR) models. Results showed that the land area harvested (p=0.02), and the value of corn production (p=0.00) are statistically significant variables that influence corn production in the Visayas. Given these findings, it is suggested that the policy of forest conversion and sustainable land management should be effective in enabling farmworkers to obtain land to grow corn crops, especially in rural regions. Furthermore, the Biofuels Act of 2006, the Livestock Industry Restructuring and Rationalization Act, and supported policy, Senate Bill No. 225, or an Act Establishing the Philippine Corn Research Institute and Appropriating Funds, should be enforced inclusively in order to improve the demand for the corn-allied industries which may lead to an increase in the value and volume of corn production in the Visayas.Keywords: corn, industry, production, MLR, Visayas
Procedia PDF Downloads 213438 Attitudinal Change: A Major Therapy for Non–Technical Losses in the Nigerian Power Sector
Authors: Fina O. Faithpraise, Effiong O. Obisung, Azele E. Peter, Chris R. Chatwin
Abstract:
This study investigates and identifies consumer attitude as a major influence that results in non-technical losses in the Nigerian electricity supply sector. This discovery is revealed by the combination of quantitative and qualitative research to complete a survey. The dataset employed is a simple random sampling of households using electricity (public power supply), and the number of units chosen is based on statistical power analysis. The units were subdivided into two categories (household with and without electrical meters). The hypothesis formulated was tested and analyzed using a chi-square statistical method. The results obtained shows that the critical value for the household with electrical prepared meter (EPM) was (9.488 < 427.4) and those without electrical prepared meter (EPMn) was (9.488 < 436.1) with a p-value of 0.01%. The analysis demonstrated so far established the real-time position, which shows that the wrong attitude towards handling the electricity supplied (not turning off light bulbs and electrical appliances when not in use within the rooms and outdoors within 12 hours of the day) characterized the non-technical losses in the power sector. Therefore the adoption of efficient lighting attitudes in individual households as recommended by the researcher is greatly encouraged. The results from this study should serve as a model for energy efficiency and use for the improvement of electricity consumption as well as a stable economy.Keywords: attitudinal change, household, non-technical losses, prepared meter
Procedia PDF Downloads 179437 A Fuzzy-Rough Feature Selection Based on Binary Shuffled Frog Leaping Algorithm
Authors: Javad Rahimipour Anaraki, Saeed Samet, Mahdi Eftekhari, Chang Wook Ahn
Abstract:
Feature selection and attribute reduction are crucial problems, and widely used techniques in the field of machine learning, data mining and pattern recognition to overcome the well-known phenomenon of the Curse of Dimensionality. This paper presents a feature selection method that efficiently carries out attribute reduction, thereby selecting the most informative features of a dataset. It consists of two components: 1) a measure for feature subset evaluation, and 2) a search strategy. For the evaluation measure, we have employed the fuzzy-rough dependency degree (FRFDD) of the lower approximation-based fuzzy-rough feature selection (L-FRFS) due to its effectiveness in feature selection. As for the search strategy, a modified version of a binary shuffled frog leaping algorithm is proposed (B-SFLA). The proposed feature selection method is obtained by hybridizing the B-SFLA with the FRDD. Nine classifiers have been employed to compare the proposed approach with several existing methods over twenty two datasets, including nine high dimensional and large ones, from the UCI repository. The experimental results demonstrate that the B-SFLA approach significantly outperforms other metaheuristic methods in terms of the number of selected features and the classification accuracy.Keywords: binary shuffled frog leaping algorithm, feature selection, fuzzy-rough set, minimal reduct
Procedia PDF Downloads 226436 Major Depressive Disorder: Diagnosis based on Electroencephalogram Analysis
Authors: Wajid Mumtaz, Aamir Saeed Malik, Syed Saad Azhar Ali, Mohd Azhar Mohd Yasin
Abstract:
In this paper, a technique based on electroencephalogram (EEG) analysis is presented, aiming for diagnosing major depressive disorder (MDD) among a potential population of MDD patients and healthy controls. EEG is recognized as a clinical modality during applications such as seizure diagnosis, index for anesthesia, detection of brain death or stroke. However, its usability for psychiatric illnesses such as MDD is less studied. Therefore, in this study, for the sake of diagnosis, 2 groups of study participants were recruited, 1) MDD patients, 2) healthy people as controls. EEG data acquired from both groups were analyzed involving inter-hemispheric asymmetry and composite permutation entropy index (CPEI). To automate the process, derived quantities from EEG were utilized as inputs to classifier such as logistic regression (LR) and support vector machine (SVM). The learning of these classification models was tested with a test dataset. Their learning efficiency is provided as accuracy of classifying MDD patients from controls, their sensitivities and specificities were reported, accordingly (LR =81.7 % and SVM =81.5 %). Based on the results, it is concluded that the derived measures are indicators for diagnosing MDD from a potential population of normal controls. In addition, the results motivate further exploring other measures for the same purpose.Keywords: major depressive disorder, diagnosis based on EEG, EEG derived features, CPEI, inter-hemispheric asymmetry
Procedia PDF Downloads 546435 Toward the Understanding of Shadow Port's Growth: The Level of Shadow Port
Authors: Chayakarn Bamrungbutr, James Sillitoe
Abstract:
The term ‘shadow port’ is used to describe a port whose markets are dominated by an adjacent port that has a more competitive capability. Recently, researchers have put effort into studying the mechanisms of how a regional port, in the shadow of a nearby predominant port which is a capital city port, can compete and grow. However, such mechanism is still unclear. This study thus focuses on understanding the growth of shadow port and the type of shadow port by using the two capital city ports of Thailand; Bangkok port (the former main port) and Laem Chabang port (the current main port), as the case study. By developing an understanding of the mechanisms of shadow, port could ultimately lead to an increase in the competitiveness. In this study, a framework of opportunity capture (introduced by Magala, 2004) will be used to create a framework for the study of the growth of the selected shadow port. In the process of building this framework, five groups of port development experts, consisting of government, council, academia, logistics provider and industry, will be interviewed. To facilitate this work, the Noticing, Collecting and Thinking model which was developed by Seidel (1998) will be used in an analysis of the dataset. The resulting analysis will be used to classify the type of shadow port. The type of these ports will be a significant factor for developing a feasible strategic guideline for the future management planning of ports, particularly, shadow ports, and then to increase the competitiveness of a nation’s maritime transport industry, and eventually lead to a boost in the national economy.Keywords: shadow port, Bangkok Port, Laem Chabang Port, port growth
Procedia PDF Downloads 177434 Crop Classification using Unmanned Aerial Vehicle Images
Authors: Iqra Yaseen
Abstract:
One of the well-known areas of computer science and engineering, image processing in the context of computer vision has been essential to automation. In remote sensing, medical science, and many other fields, it has made it easier to uncover previously undiscovered facts. Grading of diverse items is now possible because of neural network algorithms, categorization, and digital image processing. Its use in the classification of agricultural products, particularly in the grading of seeds or grains and their cultivars, is widely recognized. A grading and sorting system enables the preservation of time, consistency, and uniformity. Global population growth has led to an increase in demand for food staples, biofuel, and other agricultural products. To meet this demand, available resources must be used and managed more effectively. Image processing is rapidly growing in the field of agriculture. Many applications have been developed using this approach for crop identification and classification, land and disease detection and for measuring other parameters of crop. Vegetation localization is the base of performing these task. Vegetation helps to identify the area where the crop is present. The productivity of the agriculture industry can be increased via image processing that is based upon Unmanned Aerial Vehicle photography and satellite. In this paper we use the machine learning techniques like Convolutional Neural Network, deep learning, image processing, classification, You Only Live Once to UAV imaging dataset to divide the crop into distinct groups and choose the best way to use it.Keywords: image processing, UAV, YOLO, CNN, deep learning, classification
Procedia PDF Downloads 107433 A Posteriori Trading-Inspired Model-Free Time Series Segmentation
Authors: Plessen Mogens Graf
Abstract:
Within the context of multivariate time series segmentation, this paper proposes a method inspired by a posteriori optimal trading. After a normalization step, time series are treated channelwise as surrogate stock prices that can be traded optimally a posteriori in a virtual portfolio holding either stock or cash. Linear transaction costs are interpreted as hyperparameters for noise filtering. Trading signals, as well as trading signals obtained on the reversed time series, are used for unsupervised channelwise labeling before a consensus over all channels is reached that determines the final segmentation time instants. The method is model-free such that no model prescriptions for segments are made. Benefits of proposed approach include simplicity, computational efficiency, and adaptability to a wide range of different shapes of time series. Performance is demonstrated on synthetic and real-world data, including a large-scale dataset comprising a multivariate time series of dimension 1000 and length 2709. Proposed method is compared to a popular model-based bottom-up approach fitting piecewise affine models and to a recent model-based top-down approach fitting Gaussian models and found to be consistently faster while producing more intuitive results in the sense of segmenting time series at peaks and valleys.Keywords: time series segmentation, model-free, trading-inspired, multivariate data
Procedia PDF Downloads 136432 Prediction of Music Track Popularity: A Machine Learning Approach
Authors: Syed Atif Hassan, Luv Mehta, Syed Asif Hassan
Abstract:
Hit song science is a field of investigation wherein machine learning techniques are applied to music tracks in order to extract such features from audio signals which can capture information that could explain the popularity of respective tracks. Record companies invest huge amounts of money into recruiting fresh talents and churning out new music each year. Gaining insight into the basis of why a song becomes popular will result in tremendous benefits for the music industry. This paper aims to extract basic musical and more advanced, acoustic features from songs while also taking into account external factors that play a role in making a particular song popular. We use a dataset derived from popular Spotify playlists divided by genre. We use ten genres (blues, classical, country, disco, hip-hop, jazz, metal, pop, reggae, rock), chosen on the basis of clear to ambiguous delineation in the typical sound of their genres. We feed these features into three different classifiers, namely, SVM with RBF kernel, a deep neural network, and a recurring neural network, to build separate predictive models and choosing the best performing model at the end. Predicting song popularity is particularly important for the music industry as it would allow record companies to produce better content for the masses resulting in a more competitive market.Keywords: classifier, machine learning, music tracks, popularity, prediction
Procedia PDF Downloads 663431 Complete Ensemble Empirical Mode Decomposition with Adaptive Noise Temporal Convolutional Network for Remaining Useful Life Prediction of Lithium Ion Batteries
Authors: Jing Zhao, Dayong Liu, Shihao Wang, Xinghua Zhu, Delong Li
Abstract:
Uhumanned Underwater Vehicles generally operate in the deep sea, which has its own unique working conditions. Lithium-ion power batteries should have the necessary stability and endurance for use as an underwater vehicle’s power source. Therefore, it is essential to accurately forecast how long lithium-ion batteries will last in order to maintain the system’s reliability and safety. In order to model and forecast lithium battery Remaining Useful Life (RUL), this research suggests a model based on Complete Ensemble Empirical Mode Decomposition with Adaptive noise-Temporal Convolutional Net (CEEMDAN-TCN). In this study, two datasets, NASA and CALCE, which have a specific gap in capacity data fluctuation, are used to verify the model and examine the experimental results in order to demonstrate the generalizability of the concept. The experiments demonstrate the network structure’s strong universality and ability to achieve good fitting outcomes on the test set for various battery dataset types. The evaluation metrics reveal that the CEEMDAN-TCN prediction performance of TCN is 25% to 35% better than that of a single neural network, proving that feature expansion and modal decomposition can both enhance the model’s generalizability and be extremely useful in industrial settings.Keywords: lithium-ion battery, remaining useful life, complete EEMD with adaptive noise, temporal convolutional net
Procedia PDF Downloads 154430 From Electroencephalogram to Epileptic Seizures Detection by Using Artificial Neural Networks
Authors: Gaetano Zazzaro, Angelo Martone, Roberto V. Montaquila, Luigi Pavone
Abstract:
Seizure is the main factor that affects the quality of life of epileptic patients. The diagnosis of epilepsy, and hence the identification of epileptogenic zone, is commonly made by using continuous Electroencephalogram (EEG) signal monitoring. Seizure identification on EEG signals is made manually by epileptologists and this process is usually very long and error prone. The aim of this paper is to describe an automated method able to detect seizures in EEG signals, using knowledge discovery in database process and data mining methods and algorithms, which can support physicians during the seizure detection process. Our detection method is based on Artificial Neural Network classifier, trained by applying the multilayer perceptron algorithm, and by using a software application, called Training Builder that has been developed for the massive extraction of features from EEG signals. This tool is able to cover all the data preparation steps ranging from signal processing to data analysis techniques, including the sliding window paradigm, the dimensionality reduction algorithms, information theory, and feature selection measures. The final model shows excellent performances, reaching an accuracy of over 99% during tests on data of a single patient retrieved from a publicly available EEG dataset.Keywords: artificial neural network, data mining, electroencephalogram, epilepsy, feature extraction, seizure detection, signal processing
Procedia PDF Downloads 188429 Leveraging SHAP Values for Effective Feature Selection in Peptide Identification
Authors: Sharon Li, Zhonghang Xia
Abstract:
Post-database search is an essential phase in peptide identification using tandem mass spectrometry (MS/MS) to refine peptide-spectrum matches (PSMs) produced by database search engines. These engines frequently face difficulty differentiating between correct and incorrect peptide assignments. Despite advances in statistical and machine learning methods aimed at improving the accuracy of peptide identification, challenges remain in selecting critical features for these models. In this study, two machine learning models—a random forest tree and a support vector machine—were applied to three datasets to enhance PSMs. SHAP values were utilized to determine the significance of each feature within the models. The experimental results indicate that the random forest model consistently outperformed the SVM across all datasets. Further analysis of SHAP values revealed that the importance of features varies depending on the dataset, indicating that a feature's role in model predictions can differ significantly. This variability in feature selection can lead to substantial differences in model performance, with false discovery rate (FDR) differences exceeding 50% between different feature combinations. Through SHAP value analysis, the most effective feature combinations were identified, significantly enhancing model performance.Keywords: peptide identification, SHAP value, feature selection, random forest tree, support vector machine
Procedia PDF Downloads 23428 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems
Authors: Rodolfo Lorbieski, Silvia Modesto Nassar
Abstract:
Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.Keywords: stacking, multi-layers, ensemble, multi-class
Procedia PDF Downloads 269427 Parkinson’s Disease Detection Analysis through Machine Learning Approaches
Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee
Abstract:
Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier
Procedia PDF Downloads 129426 Vision-Based Daily Routine Recognition for Healthcare with Transfer Learning
Authors: Bruce X. B. Yu, Yan Liu, Keith C. C. Chan
Abstract:
We propose to record Activities of Daily Living (ADLs) of elderly people using a vision-based system so as to provide better assistive and personalization technologies. Current ADL-related research is based on data collected with help from non-elderly subjects in laboratory environments and the activities performed are predetermined for the sole purpose of data collection. To obtain more realistic datasets for the application, we recorded ADLs for the elderly with data collected from real-world environment involving real elderly subjects. Motivated by the need to collect data for more effective research related to elderly care, we chose to collect data in the room of an elderly person. Specifically, we installed Kinect, a vision-based sensor on the ceiling, to capture the activities that the elderly subject performs in the morning every day. Based on the data, we identified 12 morning activities that the elderly person performs daily. To recognize these activities, we created a HARELCARE framework to investigate into the effectiveness of existing Human Activity Recognition (HAR) algorithms and propose the use of a transfer learning algorithm for HAR. We compared the performance, in terms of accuracy, and training progress. Although the collected dataset is relatively small, the proposed algorithm has a good potential to be applied to all daily routine activities for healthcare purposes such as evidence-based diagnosis and treatment.Keywords: daily activity recognition, healthcare, IoT sensors, transfer learning
Procedia PDF Downloads 132425 Autism Disease Detection Using Transfer Learning Techniques: Performance Comparison between Central Processing Unit vs. Graphics Processing Unit Functions for Neural Networks
Authors: Mst Shapna Akter, Hossain Shahriar
Abstract:
Neural network approaches are machine learning methods used in many domains, such as healthcare and cyber security. Neural networks are mostly known for dealing with image datasets. While training with the images, several fundamental mathematical operations are carried out in the Neural Network. The operation includes a number of algebraic and mathematical functions, including derivative, convolution, and matrix inversion and transposition. Such operations require higher processing power than is typically needed for computer usage. Central Processing Unit (CPU) is not appropriate for a large image size of the dataset as it is built with serial processing. While Graphics Processing Unit (GPU) has parallel processing capabilities and, therefore, has higher speed. This paper uses advanced Neural Network techniques such as VGG16, Resnet50, Densenet, Inceptionv3, Xception, Mobilenet, XGBOOST-VGG16, and our proposed models to compare CPU and GPU resources. A system for classifying autism disease using face images of an autistic and non-autistic child was used to compare performance during testing. We used evaluation matrices such as Accuracy, F1 score, Precision, Recall, and Execution time. It has been observed that GPU runs faster than the CPU in all tests performed. Moreover, the performance of the Neural Network models in terms of accuracy increases on GPU compared to CPU.Keywords: autism disease, neural network, CPU, GPU, transfer learning
Procedia PDF Downloads 118424 Traffic Prediction with Raw Data Utilization and Context Building
Authors: Zhou Yang, Heli Sun, Jianbin Huang, Jizhong Zhao, Shaojie Qiao
Abstract:
Traffic prediction is essential in a multitude of ways in modern urban life. The researchers of earlier work in this domain carry out the investigation chiefly with two major focuses: (1) the accurate forecast of future values in multiple time series and (2) knowledge extraction from spatial-temporal correlations. However, two key considerations for traffic prediction are often missed: the completeness of raw data and the full context of the prediction timestamp. Concentrating on the two drawbacks of earlier work, we devise an approach that can address these issues in a two-phase framework. First, we utilize the raw trajectories to a greater extent through building a VLA table and data compression. We obtain the intra-trajectory features with graph-based encoding and the intertrajectory ones with a grid-based model and the technique of back projection that restore their surrounding high-resolution spatial-temporal environment. To the best of our knowledge, we are the first to study direct feature extraction from raw trajectories for traffic prediction and attempt the use of raw data with the least degree of reduction. In the prediction phase, we provide a broader context for the prediction timestamp by taking into account the information that are around it in the training dataset. Extensive experiments on several well-known datasets have verified the effectiveness of our solution that combines the strength of raw trajectory data and prediction context. In terms of performance, our approach surpasses several state-of-the-art methods for traffic prediction.Keywords: traffic prediction, raw data utilization, context building, data reduction
Procedia PDF Downloads 128423 Grey Wolf Optimization Technique for Predictive Analysis of Products in E-Commerce: An Adaptive Approach
Authors: Shital Suresh Borse, Vijayalaxmi Kadroli
Abstract:
E-commerce industries nowadays implement the latest AI, ML Techniques to improve their own performance and prediction accuracy. This helps to gain a huge profit from the online market. Ant Colony Optimization, Genetic algorithm, Particle Swarm Optimization, Neural Network & GWO help many e-commerce industries for up-gradation of their predictive performance. These algorithms are providing optimum results in various applications, such as stock price prediction, prediction of drug-target interaction & user ratings of similar products in e-commerce sites, etc. In this study, customer reviews will play an important role in prediction analysis. People showing much interest in buying a lot of services& products suggested by other customers. This ultimately increases net profit. In this work, a convolution neural network (CNN) is proposed which further is useful to optimize the prediction accuracy of an e-commerce website. This method shows that CNN is used to optimize hyperparameters of GWO algorithm using an appropriate coding scheme. Accurate model results are verified by comparing them to PSO results whose hyperparameters have been optimized by CNN in Amazon's customer review dataset. Here, experimental outcome proves that this proposed system using the GWO algorithm achieves superior execution in terms of accuracy, precision, recovery, etc. in prediction analysis compared to the existing systems.Keywords: prediction analysis, e-commerce, machine learning, grey wolf optimization, particle swarm optimization, CNN
Procedia PDF Downloads 113422 AutoML: Comprehensive Review and Application to Engineering Datasets
Authors: Parsa Mahdavi, M. Amin Hariri-Ardebili
Abstract:
The development of accurate machine learning and deep learning models traditionally demands hands-on expertise and a solid background to fine-tune hyperparameters. With the continuous expansion of datasets in various scientific and engineering domains, researchers increasingly turn to machine learning methods to unveil hidden insights that may elude classic regression techniques. This surge in adoption raises concerns about the adequacy of the resultant meta-models and, consequently, the interpretation of the findings. In response to these challenges, automated machine learning (AutoML) emerges as a promising solution, aiming to construct machine learning models with minimal intervention or guidance from human experts. AutoML encompasses crucial stages such as data preparation, feature engineering, hyperparameter optimization, and neural architecture search. This paper provides a comprehensive overview of the principles underpinning AutoML, surveying several widely-used AutoML platforms. Additionally, the paper offers a glimpse into the application of AutoML on various engineering datasets. By comparing these results with those obtained through classical machine learning methods, the paper quantifies the uncertainties inherent in the application of a single ML model versus the holistic approach provided by AutoML. These examples showcase the efficacy of AutoML in extracting meaningful patterns and insights, emphasizing its potential to revolutionize the way we approach and analyze complex datasets.Keywords: automated machine learning, uncertainty, engineering dataset, regression
Procedia PDF Downloads 61421 Multi-Atlas Segmentation Based on Dynamic Energy Model: Application to Brain MR Images
Authors: Jie Huo, Jonathan Wu
Abstract:
Segmentation of anatomical structures in medical images is essential for scientific inquiry into the complex relationships between biological structure and clinical diagnosis, treatment and assessment. As a method of incorporating the prior knowledge and the anatomical structure similarity between a target image and atlases, multi-atlas segmentation has been successfully applied in segmenting a variety of medical images, including the brain, cardiac, and abdominal images. The basic idea of multi-atlas segmentation is to transfer the labels in atlases to the coordinate of the target image by matching the target patch to the atlas patch in the neighborhood. However, this technique is limited by the pairwise registration between target image and atlases. In this paper, a novel multi-atlas segmentation approach is proposed by introducing a dynamic energy model. First, the target is mapped to each atlas image by minimizing the dynamic energy function, then the segmentation of target image is generated by weighted fusion based on the energy. The method is tested on MICCAI 2012 Multi-Atlas Labeling Challenge dataset which includes 20 target images and 15 atlases images. The paper also analyzes the influence of different parameters of the dynamic energy model on the segmentation accuracy and measures the dice coefficient by using different feature terms with the energy model. The highest mean dice coefficient obtained with the proposed method is 0.861, which is competitive compared with the recently published method.Keywords: brain MRI segmentation, dynamic energy model, multi-atlas segmentation, energy minimization
Procedia PDF Downloads 336420 GeneNet: Temporal Graph Data Visualization for Gene Nomenclature and Relationships
Authors: Jake Gonzalez, Tommy Dang
Abstract:
This paper proposes a temporal graph approach to visualize and analyze the evolution of gene relationships and nomenclature over time. An interactive web-based tool implements this temporal graph, enabling researchers to traverse a timeline and observe coupled dynamics in network topology and naming conventions. Analysis of a real human genomic dataset reveals the emergence of densely interconnected functional modules over time, representing groups of genes involved in key biological processes. For example, the antimicrobial peptide DEFA1A3 shows increased connections to related alpha-defensins involved in infection response. Tracking degree and betweenness centrality shifts over timeline iterations also quantitatively highlight the reprioritization of certain genes’ topological importance as knowledge advances. Examination of the CNR1 gene encoding the cannabinoid receptor CB1 demonstrates changing synonymous relationships and consolidating naming patterns over time, reflecting its unique functional role discovery. The integrated framework interconnecting these topological and nomenclature dynamics provides richer contextual insights compared to isolated analysis methods. Overall, this temporal graph approach enables a more holistic study of knowledge evolution to elucidate complex biology.Keywords: temporal graph, gene relationships, nomenclature evolution, interactive visualization, biological insights
Procedia PDF Downloads 61419 Study on the Relationship between the Urban Geography and Urban Agglomeration to the Effects of Carbon Emissions
Authors: Peng-Shao Chen, Yen-Jong Chen
Abstract:
In recent years, global warming, the dramatic change in energy prices and the exhaustion of natural resources illustrated that energy-related topic cannot be ignored. Despite the relationship between the cities and CO₂ emissions has been extensively studied in recent years, little attention has been paid to differences in the geographical location of the city. However, the geographical climate has a great impact on lifestyle from city to city, such as the type of buildings, the major industry of the city, etc. Therefore, the paper instigates empirically the effects of kinds of urban factors and CO₂ emissions with consideration of the different geographic, climatic zones which cities are located. Using the regression model and a dataset of urban agglomeration in East Asia cities with over one million population, including 2005, 2010, and 2015 three years, the findings suggest that the impact of urban factors on CO₂ emissions vary with the latitude of the cities. Surprisingly, all kinds of urban factors, including the urban population, the share of GDP in service industry, per capita income, and others, have different level of impact on the cities locate in the tropical climate zone and temperate climate zone. The results of the study analyze the impact of different urban factors on CO₂ emissions in urban area with different geographical climate zones. These findings will be helpful for the formulation of relevant policies for urban planners and policy makers in different regions.Keywords: carbon emissions, urban agglomeration, urban factor, urban geography
Procedia PDF Downloads 267