Search results for: central machine learning
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 10806

10356 Python Implementation for S1000D Applicability Depended Processing Model - SALERNO

Authors: Theresia El Khoury, Georges Badr, Amir Hajjam El Hassani, Stéphane N’Guyen Van Ky

Abstract:

The widespread adoption of machine learning and artificial intelligence across different domains can be attributed to the digitization of data over several decades, resulting in vast amounts of data of varied types and structures. Data processing and preparation therefore turn out to be a crucial stage. However, applying these techniques to S1000D standard-based data poses a challenge due to its complexity and the need to preserve logical information. This paper describes SALERNO, an S1000d AppLicability dEpended pRocessiNg mOdel. This Python-based model analyzes and converts XML S1000D-based files into a simpler data format that can be used with machine learning techniques while preserving the different logic and relationships in the files. The model parses the files in a given folder, filters them, and extracts the required information to be saved in appropriate data frames and Excel sheets. Its main idea is to group the extracted information by applicability. In addition, it extracts the full text by replacing internal and external references while maintaining the relationships between files, as well as the necessary requirements. The resulting files can then be saved in databases and used in different models. Documents in both English and French were tested, and special characters were decoded. Updates to the technical manuals were taken into consideration as well. The model was tested on different versions of the S1000D standard, and the results demonstrated its ability to effectively handle applicability, requirements, references, and relationships across all files and at different levels.
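
A minimal sketch of the extraction step described above, assuming hypothetical tag names (applic, para) and folder and output names, since the abstract does not detail the model's internals; a real implementation would follow the actual S1000D schema.

```python
# Hedged sketch: parse S1000D-style XML data modules, group extracted text by
# applicability, and export to Excel. Tag names and file names are assumptions.
import xml.etree.ElementTree as ET
from pathlib import Path
import pandas as pd

def extract_records(folder):
    rows = []
    for path in Path(folder).glob("*.xml"):
        root = ET.parse(path).getroot()
        # 'applic' and 'para' are placeholder element names for illustration only.
        applicability = root.findtext(".//applic", default="ALL")
        text = " ".join(p.text or "" for p in root.iter("para"))
        rows.append({"file": path.name, "applicability": applicability, "text": text})
    return pd.DataFrame(rows)

df = extract_records("data_modules")
grouped = df.groupby("applicability")["text"].apply(" ".join)  # group content by applicability
grouped.to_excel("by_applicability.xlsx")
```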

Keywords: aeronautics, big data, data processing, machine learning, S1000D

Procedia PDF Downloads 102
10355 A Machine Learning Approach to Detecting Evasive PDF Malware

Authors: Vareesha Masood, Ammara Gul, Nabeeha Areej, Muhammad Asif Masood, Hamna Imran

Abstract:

The universal use of PDF files has prompted hackers to use them for malicious purposes by hiding malicious code in PDF files on their victims' machines. Machine learning has proven highly efficient at identifying benign files and detecting files carrying PDF malware. This paper proposes an approach using a tuned decision tree classifier. A modern, inclusive dataset, CIC-Evasive-PDFMal2022, produced by Lockheed Martin's Cyber Security wing, is used; it is one of the most reliable datasets in this field. We designed a PDF malware detection system that achieved 99.2% accuracy. Compared to other cutting-edge models in the same field, the suggested model performs strongly in detecting PDF malware. Accordingly, this paper provides a fast, reliable, and efficient PDF malware detection approach.
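
A hedged sketch of the kind of classifier described above, assuming the dataset's features have already been exported to a CSV file; the file and column names are illustrative, not the dataset's actual schema.

```python
# Decision-tree detector on PDF structural features (illustrative sketch).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("pdfmal2022_features.csv")      # assumed export of the dataset
X = df.drop(columns=["label"])                   # e.g. object counts, JavaScript flags
y = df["label"]                                  # 1 = malicious, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(max_depth=10, random_state=42)  # example hyperparameters
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```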

Keywords: PDF, PDF malware, decision tree classifier, random forest classifier

Procedia PDF Downloads 66
10354 South African Students' Statistical Literacy in the Conceptual Understanding about Measures of Central Tendency after Completing Their High School Studies

Authors: Lukanda Kalobo

Abstract:

In South Africa, the High School Mathematics Curriculum provides teachers with specific aims and skills to be developed, which include the understanding of measures of central tendency. The exploration begins with definitions of statistical literacy and measures of central tendency, and a discussion of why statistical literacy is essential today. It furthermore discusses the statistical literacy basics involved in understanding the concepts of measures of central tendency. A statistical literacy test on measures of central tendency, administered to 78 first-year students coming directly from high school, was used to collect data. The results indicated that students seemed to have lost the statistical literacy needed to understand the concepts of measures of central tendency after completing their high school studies. The authors present inferences regarding the alignment between statistical literacy and the understanding of the concepts of measures of central tendency, leading to the conclusion that there is a need to provide in-service and pre-service training.

Keywords: conceptual understanding, mean, median, mode, statistical literacy

Procedia PDF Downloads 280
10353 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling

Authors: Florin Leon, Silvia Curteanu

Abstract:

Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously, a large number of kinetic parameters are involved, and the chemical and physical phenomena of mixtures involving polymers are sometimes poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of the machine learning field. They have the ability to learn the trends of a process without any knowledge of its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate carried out in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, number-average molecular weight, and weight-average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.
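
The abstract does not specify the sampling criterion, so the sketch below uses the local gradient of the output as a stand-in measure of variation; the curve and thresholds are illustrative.

```python
# Variation-driven adaptive sampling on a 1-D curve (illustrative sketch).
import numpy as np

def adaptive_sample(x, y, base_step=20, dense_step=2, threshold=None):
    """Sample sparsely overall, but densely where |dy/dx| is large."""
    dy = np.abs(np.gradient(y, x))
    if threshold is None:
        threshold = np.percentile(dy, 75)           # top quartile of variation
    idx = set(range(0, len(x), base_step))          # coarse baseline grid
    for i in range(0, len(x), dense_step):          # refine high-variation regions
        if dy[i] > threshold:
            idx.add(i)
    return np.array(sorted(idx))

# Conversion-like curve with a sharp rise, loosely mimicking the gel effect.
x = np.linspace(0.0, 1.0, 500)
y = 1.0 / (1.0 + np.exp(-30.0 * (x - 0.6)))
sel = adaptive_sample(x, y)
print(f"kept {len(sel)} of {len(x)} points, concentrated near the steep region")
```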

Keywords: batch bulk methyl methacrylate polymerization, adaptive sampling, machine learning, large margin nearest neighbor regression

Procedia PDF Downloads 283
10352 Analysis of the Significance of Multimedia Channels Using Sparse PCA and Regularized SVD

Authors: Kourosh Modarresi

Abstract:

The abundance of media channels and devices has given users a variety of options to extract, discover, and explore information in the digital world. Since a typical user often ventures down a long and complicated path before taking any significant action (such as purchasing goods or services), it is critical to know how each node (media channel) in the user's path has contributed to the final action. In this work, the significance of each media channel is computed using statistical analysis and machine learning techniques. More specifically, regularized singular value decomposition and sparse principal component analysis have been used to compute the significance of each channel toward the final action. The results of this work are a considerable improvement over existing approaches.
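
The paper's exact regularized-SVD formulation is not given, so the sketch below uses scikit-learn's SparsePCA on a synthetic user-by-channel matrix as an illustrative stand-in; the channel names and the significance proxy are assumptions.

```python
# Channel significance from sparse principal components (illustrative sketch).
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
channels = ["email", "search", "display", "social", "video"]
X = rng.poisson(lam=2.0, size=(1000, len(channels))).astype(float)  # synthetic touch counts

spca = SparsePCA(n_components=2, alpha=1.0, random_state=0).fit(X)

# Simple significance proxy: magnitude of each channel's loadings across components.
significance = np.abs(spca.components_).sum(axis=0)
for name, score in sorted(zip(channels, significance), key=lambda t: -t[1]):
    print(f"{name:8s} {score:.3f}")
```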

Keywords: multimedia attribution, sparse principal component, regularization, singular value decomposition, feature significance, machine learning, linear systems, variable shrinkage

Procedia PDF Downloads 285
10351 Neural Network and Support Vector Machine for Prediction of Foot Disorders Based on Foot Analysis

Authors: Monireh Ahmadi Bani, Adel Khorramrouz, Lalenoor Morvarid, Bagheri Mahtab

Abstract:

Background: Foot disorders are common musculoskeletal problems. Plantar pressure distribution measurement is one of the most important parts of quantitative foot disorder diagnosis. However, the association between plantar pressure and foot disorders is not clear. With growing datasets and machine learning methods, the relationship between foot disorders and plantar pressure can be detected. Significance of the study: The purpose of this study was to predict the probability of common foot disorders based on peak plantar pressure distribution and center of pressure during walking. Methodologies: 2323 participants were assessed in a foot therapy clinic between 2015 and 2021. Foot disorders were diagnosed by an experienced physician, and the participants were then asked to walk on a force plate scanner. After data preprocessing, and because of differences in walking time and foot size, we normalized the samples by time and foot size. Selected force plate variables were used as input to a deep neural network (DNN), and the probability of each foot disorder was estimated. In the next step, we used a support vector machine (SVM) and ran the dataset for each foot disorder as a binary (yes/no) classification. We compared the DNN and the SVM for foot disorder prediction based on plantar pressure distributions and center of pressure. Findings: The results demonstrated that the accuracy of the deep learning architecture is sufficient for most clinical and research applications in the study population. In addition, the SVM approach was more accurate, enabling applications in foot disorder diagnosis. The detection accuracy was 71% for the deep learning algorithm and 78% for the SVM algorithm. Moreover, working with the peak plantar pressure distribution was more accurate than working with the center-of-pressure dataset. Conclusion: Both algorithms, deep learning and SVM, will help therapists and patients improve the data pool and enhance foot disorder prediction with less expense and error, after some restrictions are properly removed.

Keywords: deep neural network, foot disorder, plantar pressure, support vector machine

Procedia PDF Downloads 320
10350 A Study on the Impact of Artificial Intelligence on Human Society and the Necessity for Setting up the Boundaries on AI Intrusion

Authors: Swarna Pundir, Prabuddha Hans

Abstract:

As AI has already stepped into the daily life of human society, one cannot be ignorant of the data it collects and uses to provide quality services tailored to individuals' choices. It also helps present options for decision-making versus choice selection, calculated from the history of our search criteria. Over the past decade or so, the way Artificial Intelligence (AI) has impacted society is undoubtedly large. AI has changed the way we shop, the way we entertain and challenge ourselves, the way information is handled, and has automated some sections of our life. We have answered what AI is, but not why one may see it as useful. AI is useful because it is capable of learning and predicting outcomes, using Machine Learning (ML) and Deep Learning (DL) with the help of Artificial Neural Networks (ANN). AI can also be a system that can act like humans. One of the major impacts is joblessness through automation via AI, seen mostly in manufacturing sectors, especially in routine manual and blue-collar occupations and among those without a college degree. It raises serious concerns about AI with regard to reduced employment, ethics in moral decision-making, individual privacy, human judgement, natural emotions, biased decisions, and discrimination. So, the question is: if an error occurs, who will be responsible, or will it simply be waved off as a "machine error" with no one taking responsibility for any wrongdoing? It is essential to form rules for using AI where both machines and humans are involved.

Keywords: AI, ML, DL, ANN

Procedia PDF Downloads 68
10349 EEG-Based Screening Tool for School Student’s Brain Disorders Using Machine Learning Algorithms

Authors: Abdelrahman A. Ramzy, Bassel S. Abdallah, Mohamed E. Bahgat, Sarah M. Abdelkader, Sherif H. ElGohary

Abstract:

Attention-Deficit/Hyperactivity Disorder (ADHD), epilepsy, and autism affect millions of children worldwide, many of whom are undiagnosed even though all of these disorders are detectable in early childhood. Late diagnosis can cause severe problems due to delayed treatment and to the misconceptions and general lack of awareness surrounding these disorders. Moreover, electroencephalography (EEG) has played a vital role in the assessment of neural function in children. Therefore, quantitative EEG measurement is utilized as a tool for the evaluation of patients who may have ADHD, epilepsy, or autism. We propose a screening tool that uses EEG signals and machine learning algorithms to detect these disorders at an early age in an automated manner. The proposed classifiers, applied to epilepsy as the first step of the work so far, provided an accuracy of approximately 97% using SVM, Naïve Bayes, and decision tree, and 98% using KNN, which is encouraging for the work yet to be conducted.

Keywords: ADHD, autism, epilepsy, EEG, SVM

Procedia PDF Downloads 168
10348 Machine Learning Models for the Prediction of Heating and Cooling Loads of a Residential Building

Authors: Aaditya U. Jhamb

Abstract:

Due to the current energy crisis that many countries are battling, and because of growing worries about energy consumption and its effects on the environment, energy-efficient buildings are the subject of extensive research in the modern technological era. The paper explores eight factors that help determine the energy efficiency of a building (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution), using a dataset provided by Tsanas and Xifara. The dataset comprises 768 different residential building models used to predict heating and cooling loads with a low mean squared error. By optimizing these characteristics, machine learning algorithms may assess and properly forecast a building's heating and cooling loads, lowering energy usage while increasing the quality of people's lives. The paper therefore studied the magnitude of the correlation between these input factors and the two output variables using various statistical methods of analysis, after determining which input variable was most closely associated with the output loads. The most conclusive model was the Decision Tree Regressor, which had a mean squared error of 0.258, whilst the least definitive model was the Isotonic Regressor, which had a mean squared error of 21.68. This paper also investigated the KNN Regressor and Linear Regression, which had mean squared errors of 3.349 and 18.141, respectively. In conclusion, the model, given the eight input variables, was able to predict the heating and cooling loads of a residential building accurately and precisely.
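
A hedged sketch of the model comparison described above, assuming the Tsanas and Xifara dataset has been saved as a CSV with the eight inputs followed by the two load columns; the file and column names are assumptions.

```python
# Compare regressors on heating/cooling load prediction (illustrative sketch).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

df = pd.read_csv("energy_efficiency.csv")             # assumed export of the dataset
X = df.iloc[:, :8]                                     # relative compactness ... glazing distribution
y = df[["heating_load", "cooling_load"]]               # assumed column names

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
models = {
    "DecisionTree": DecisionTreeRegressor(random_state=1),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Linear": LinearRegression(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "MSE:", mean_squared_error(y_te, model.predict(X_te)))
```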

Keywords: energy efficient buildings, heating load, cooling load, machine learning models

Procedia PDF Downloads 73
10347 Framework for Detecting External Plagiarism from Monolingual Documents: Use of Shallow NLP and N-Gram Frequency Comparison

Authors: Saugata Bose, Ritambhra Korpal

Abstract:

The internet has increased copy-paste behaviour amongst students as well as researchers, leading to different levels of plagiarized documents. For this reason, much research has focused on detecting plagiarism automatically. In this paper, an initiative is discussed where Natural Language Processing (NLP) techniques as well as supervised machine learning algorithms have been combined to detect plagiarized texts. Here, the major emphasis is on constructing a framework that successfully detects external plagiarism in monolingual texts. To detect plagiarism successfully, an n-gram frequency comparison approach has been implemented to construct the model framework. The framework is based on 120 characteristics extracted during document pre-processing using an NLP approach. Afterwards, filter metrics have been applied to select the most relevant characteristics, and then a supervised classification algorithm has been used to classify the documents into four levels of plagiarism. A confusion matrix was built to estimate false positives and false negatives. Our plagiarism framework achieved a very high accuracy score.
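
A minimal sketch of the n-gram frequency comparison idea, assuming simple whitespace tokenization; the 120-feature pipeline, filter metrics, and four-level classification from the paper are not reproduced here.

```python
# Word n-gram overlap between a suspicious document and a source (illustrative sketch).
from collections import Counter

def word_ngrams(text, n=3):
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_score(suspicious, source, n=3):
    """Fraction of the suspicious document's n-grams that also occur in the source."""
    s, src = word_ngrams(suspicious, n), word_ngrams(source, n)
    if not s:
        return 0.0
    shared = sum(min(c, src[g]) for g, c in s.items() if g in src)
    return shared / sum(s.values())

source = "machine learning methods are widely used to detect plagiarism in documents"
suspect = "machine learning methods are widely used to detect copied text in documents"
print(f"trigram overlap: {overlap_score(suspect, source):.2f}")
```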

Keywords: lexical matching, shallow NLP, supervised machine learning algorithm, word n-gram

Procedia PDF Downloads 336
10346 Predicting Data Center Resource Usage Using Quantile Regression to Conserve Energy While Fulfilling the Service Level Agreement

Authors: Ahmed I. Alutabi, Naghmeh Dezhabad, Sudhakar Ganti

Abstract:

Data centers have been growing continuously in size and demand over the last two decades. Planning for the deployment of resources has been shallow and has always resorted to over-provisioning. Data center operators try to maximize the availability of their services by allocating multiples of the needed resources. One resource that has been wasted, with little thought, is energy. In recent years, programmable resource allocation has paved the way for more efficient and robust data centers. In this work, we examine the predictability of resource usage in a data center environment. We use a number of models that cover a wide spectrum of machine learning categories. Then we establish a framework to guarantee the client service level agreement (SLA). Our results show that using prediction can cut energy loss by up to 55%.
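
A sketch of quantile-based capacity planning on synthetic hourly demand; the paper's actual models are not specified, so gradient boosting with a quantile loss stands in. Planning to a high quantile (e.g. the 90th percentile) leaves headroom so the SLA is still met without full over-provisioning.

```python
# Predict a high quantile of resource demand for next-day planning (illustrative sketch).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
hours = np.arange(24 * 60)                                   # 60 days of hourly samples
demand = 50 + 30 * np.sin(2 * np.pi * (hours % 24) / 24) + rng.normal(0, 5, hours.size)

X = np.column_stack([hours % 24, (hours // 24) % 7])         # hour-of-day, day-of-week
q90 = GradientBoostingRegressor(loss="quantile", alpha=0.9).fit(X, demand)

next_day = np.column_stack([np.arange(24), np.zeros(24)])
print("planned hourly capacity (90th percentile):", q90.predict(next_day).round(1))
```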

Keywords: machine learning, artificial intelligence, prediction, data center, resource allocation, green computing

Procedia PDF Downloads 85
10345 Developing an Out-of-Distribution Generalization Model Selection Framework through Impurity and Randomness Measurements and a Bias Index

Authors: Todd Zhou, Mikhail Yurochkin

Abstract:

Out-of-distribution (OOD) detection is receiving increasing amounts of attention in the machine learning research community, boosted by recent technologies, such as autonomous driving and image processing. This burgeoning field has called for more effective and efficient out-of-distribution generalization methods. Without access to label information, deploying machine learning models to out-of-distribution domains becomes extremely challenging since it is impossible to evaluate model performance on unseen domains. To tackle this difficulty, we designed a model selection pipeline algorithm and developed a model selection framework with different impurity and randomness measurements to evaluate and choose the best-performing models for out-of-distribution data. By exploring different randomness scores based on predicted probabilities, we adopted the out-of-distribution entropy and developed a custom-designed score, "CombinedScore," as the evaluation criterion. This proposed score was created by adding labeled source information into the judging space of the uncertainty entropy score using the harmonic mean. Furthermore, prediction bias was explored through the equality-of-opportunity violation measurement. We also improved machine learning model performance through model calibration. The effectiveness of the framework with the proposed evaluation criteria was validated on the Folktables American Community Survey (ACS) datasets.
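
An illustrative reconstruction of a "CombinedScore"-style criterion: the harmonic mean of an accuracy-like score on labeled source data and a confidence score derived from prediction entropy on unlabeled out-of-distribution data. The paper's exact formulation is not given, so this is only a sketch.

```python
# Harmonic-mean combination of source accuracy and OOD entropy confidence (sketch).
import numpy as np

def entropy_confidence(probs, eps=1e-12):
    """1 minus the normalized mean entropy of predicted probabilities (higher = more confident)."""
    probs = np.clip(probs, eps, 1.0)
    ent = -np.sum(probs * np.log(probs), axis=1)
    return 1.0 - float(np.mean(ent) / np.log(probs.shape[1]))

def combined_score(source_accuracy, ood_probs):
    conf = entropy_confidence(ood_probs)
    return 2 * source_accuracy * conf / (source_accuracy + conf + 1e-12)   # harmonic mean

# Two candidate models: one uncertain on OOD data, one confident on OOD data.
probs_a = np.full((5, 3), 1 / 3)
probs_b = np.array([[0.9, 0.05, 0.05]] * 5)
print(combined_score(0.85, probs_a), combined_score(0.70, probs_b))
```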

Keywords: model selection, domain generalization, model fairness, randomness measurements, bias index

Procedia PDF Downloads 104
10344 Effect of Personality Traits on Classification of Political Orientation

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

Today, as in other domains, there is an enormous number of political transcripts available on the Web waiting to be mined and used for various purposes such as statistics and recommendations. Therefore, automatically determining the political orientation of these transcripts becomes crucial. The methodologies used by machine learning algorithms for automatic classification are based on different features, such as linguistic features. Considering the ideological differences between liberals and conservatives, this paper studies the effect of personality traits on political orientation classification. This is done by considering the correlation between LIWC features and the Big Five personality traits. Several experiments are conducted on the Convote U.S. Congressional Speech dataset with seven benchmark classification algorithms. The different methodologies are applied to select feature sets consisting of 8 to 64 features. Neuroticism was found to be the most differentiating personality trait for classifying political polarity, and when its top 10 representative features were combined with several classification algorithms, the results outperformed those presented in previous research.

Keywords: politics, personality traits, LIWC, machine learning

Procedia PDF Downloads 471
10343 Personalized Email Marketing Strategy: A Reinforcement Learning Approach

Authors: Lei Zhang, Tingting Xu, Jun He, Zhenyu Yan

Abstract:

Email marketing is one of the most important segments of online marketing. It has proven to be the most effective way to acquire and retain customers. The email content is vital to customers. Different customers may have different familiarity with a product, so a successful marketing strategy must personalize email content based on individual customers' product affinity. In this study, we build our personalized email marketing strategy with three types of emails: nurture, promotion, and conversion. Each type of email has a different influence on customers. We investigate this difference by analyzing customers' open rates, click rates, and opt-out rates. Feature importance from the response models is also analyzed. The goal of the marketing strategy is to improve the click rate on conversion-type emails. To build the personalized strategy, we formulate the problem as a reinforcement learning problem and adopt a Q-learning algorithm with variations. The simulation results show that our model-based strategy outperforms the current marketer's strategy.
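
A toy Q-learning sketch for choosing among the three email types; the states (product-affinity levels), simulated click probabilities, and learning parameters are invented for illustration and do not reflect the paper's environment or algorithm variations.

```python
# Tabular Q-learning over email types (illustrative sketch).
import numpy as np

rng = np.random.default_rng(3)
n_states, actions = 3, ["nurture", "promotion", "conversion"]  # states = affinity level
Q = np.zeros((n_states, len(actions)))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

# Assumed click probabilities: conversion emails only pay off at high affinity.
click_prob = np.array([[0.20, 0.05, 0.01],
                       [0.15, 0.15, 0.05],
                       [0.05, 0.10, 0.30]])

for _ in range(20000):
    s = rng.integers(n_states)
    a = rng.integers(len(actions)) if rng.random() < epsilon else int(Q[s].argmax())
    reward = float(rng.random() < click_prob[s, a])             # 1 if the email is clicked
    s_next = min(s + int(reward), n_states - 1)                 # clicks raise affinity
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

for s in range(n_states):
    print(f"affinity {s}: send {actions[int(Q[s].argmax())]}")
```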

Keywords: email marketing, email content, reinforcement learning, machine learning, Q-learning

Procedia PDF Downloads 173
10342 Exploring the Determinants of Personal Finance Difficulties by Machine Learning: Focus on Socio-Economic and Behavioural Changes Brought by COVID-19

Authors: Brian Tung, Yam Wing Siu, Tsun Se Cheong

Abstract:

Purpose: This research aims to explore how personal and environmental factors, especially the socio-economic and behavioral changes fostered by the COVID-19 pandemic, affect the financial vulnerability of a specific segment of people in financial distress. An innovative machine learning methodology is applied to data collected from over 300 local individuals in Hong Kong who have sought counseling or similar services in recent years. Results: First, machine learning has found that too much exposure to digital services and information on digitized services may adversely affect respondents' financial vulnerability. Second, an improvement in financial literacy level benefits the financially vulnerable group, especially those respondents who started at a lower level. Third, serious addiction to digital technology can lead to worsened debt servicing ability. Machine learning also found a strong correlation between debt servicing situations and income-seeking behavior as well as spending behavior. In addition, if the vulnerable groups are able to make appropriate investments, they can reduce the probability of incurring financial distress. Finally, being too active in borrowing and repayment can result in a higher likelihood of over-indebtedness. Conclusion: The findings can be employed in formulating better counseling strategies for professionals, making debt counseling services more preventive in nature. For example, according to the findings, respondents with a low level of financial literacy are prone to overspending, unable to react properly to e-marketing promotional messages popping up from digital services, and may even fall into financial or investment scams. In addition, people with low levels of financial knowledge will benefit from financial education. Therefore, financial education programs could include tech-savvy matters as special features.

Keywords: personal finance, digitization of the economy, COVID-19 pandemic, addiction to digital technology, financial vulnerability

Procedia PDF Downloads 35
10341 Utilizing Temporal and Frequency Features in Fault Detection of Electric Motor Bearings with Advanced Methods

Authors: Mohammad Arabi

Abstract:

The development of advanced technologies in the field of signal processing and vibration analysis has enabled more accurate analysis and fault detection in electrical systems. This research investigates the application of temporal and frequency features in detecting faults in electric motor bearings, aiming to enhance fault detection accuracy and prevent unexpected failures. The use of methods such as deep learning algorithms and neural networks in this process can yield better results. The main objective of this research is to evaluate the efficiency and accuracy of methods based on temporal and frequency features in identifying faults in electric motor bearings to prevent sudden breakdowns and operational issues. Additionally, the feasibility of using techniques such as machine learning and optimization algorithms to improve the fault detection process is also considered. This research employed an experimental method and random sampling. Vibration signals were collected from electric motors under normal and faulty conditions. After standardizing the data, temporal and frequency features were extracted. These features were then analyzed using statistical methods such as analysis of variance (ANOVA) and t-tests, as well as machine learning algorithms like artificial neural networks and support vector machines (SVM). The results showed that using temporal and frequency features significantly improves the accuracy of fault detection in electric motor bearings. ANOVA indicated significant differences between normal and faulty signals. Additionally, t-tests confirmed statistically significant differences between the features extracted from normal and faulty signals. Machine learning algorithms such as neural networks and SVM also significantly increased detection accuracy, demonstrating high effectiveness in timely and accurate fault detection. This study demonstrates that using temporal and frequency features combined with machine learning algorithms can serve as an effective tool for detecting faults in electric motor bearings. This approach not only enhances fault detection accuracy but also simplifies and streamlines the detection process. However, challenges such as data standardization and the cost of implementing advanced monitoring systems must also be considered. Utilizing temporal and frequency features in fault detection of electric motor bearings, along with advanced machine learning methods, offers an effective solution for preventing failures and ensuring the operational health of electric motors. Given the promising results of this research, it is recommended that this technology be more widely adopted in industrial maintenance processes.
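
A sketch of the temporal and frequency feature extraction described above, applied to a synthetic vibration signal; the sampling rate, fault signature, and classifier settings are placeholders rather than the study's experimental setup.

```python
# Temporal + spectral features from vibration signals, classified with an SVM (sketch).
import numpy as np
from scipy.fft import rfft, rfftfreq
from sklearn.svm import SVC

fs = 10_000                                              # Hz, assumed sampling rate

def features(signal):
    """Temporal (RMS, peak, crest factor) and spectral (dominant frequency, band energy) features."""
    rms = np.sqrt(np.mean(signal ** 2))
    peak = np.max(np.abs(signal))
    spec = np.abs(rfft(signal))
    freqs = rfftfreq(signal.size, d=1 / fs)
    dominant = freqs[np.argmax(spec[1:]) + 1]            # skip the DC bin
    band_energy = spec[(freqs > 500) & (freqs < 2000)].sum()
    return [rms, peak, peak / rms, dominant, band_energy]

rng = np.random.default_rng(0)
t = np.arange(0, 1.0, 1 / fs)
normal = [np.sin(2 * np.pi * 50 * t) + 0.1 * rng.normal(size=t.size) for _ in range(20)]
faulty = [np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
          + 0.1 * rng.normal(size=t.size) for _ in range(20)]

X = np.array([features(s) for s in normal + faulty])
y = np.array([0] * 20 + [1] * 20)
clf = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", clf.score(X, y))
```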

Keywords: electric motor, fault detection, frequency features, temporal features

Procedia PDF Downloads 18
10340 A Framework of Dynamic Rule Selection Method for Dynamic Flexible Job Shop Problem by Reinforcement Learning Method

Authors: Rui Wu

Abstract:

In the volatile modern manufacturing environment, new orders randomly occur at any time, and pre-emptive scheduling methods are infeasible. This calls for a real-time scheduling method that can produce a reasonably good schedule quickly. The dynamic Flexible Job Shop problem is an NP-hard scheduling problem that hybridizes the dynamic Job Shop problem with the Parallel Machine problem. A Flexible Job Shop contains different work centres. Each work centre contains parallel machines that can process certain operations. Many algorithms, such as genetic algorithms or simulated annealing, have been proposed to solve static Flexible Job Shop problems. However, the time efficiency of these methods is low, and they are not feasible for a dynamic scheduling problem. Therefore, a dynamic rule selection scheduling system based on reinforcement learning is proposed in this research, in which the dynamic Flexible Job Shop problem is divided into several parallel machine problems to decrease its complexity. Firstly, the features of jobs, machines, work centres, and flexible job shops are selected to describe the status of the dynamic Flexible Job Shop problem at each decision point in each work centre. Secondly, a reinforcement learning framework using a double-layer deep Q-learning network is applied to select appropriate composite dispatching rules based on the status of each work centre. Then, based on the selected composite dispatching rule, an available operation is selected from the waiting buffer and assigned to an available machine in each work centre. Finally, the proposed algorithm is compared with well-known dispatching rules on the objectives of mean tardiness, mean flow time, mean waiting time, and mean percentage of waiting time in the real-time Flexible Job Shop problem. The simulation results show that the proposed framework has reasonable performance and time efficiency.

Keywords: dynamic scheduling problem, flexible job shop, dispatching rules, deep reinforcement learning

Procedia PDF Downloads 79
10339 How to Guide Students from Surface to Deep Learning: Applied Philosophy in Management Education

Authors: Lihong Wu, Raymond Young

Abstract:

The ability to learn is one of the most critical skills in the information age. However, many students do not have a clear understanding of what learning is, what they are learning, and why they are learning. Many students study simply to pass rather than to learn something useful for their career and their life. They have a misconception about learning and a wrong attitude towards it. This research explores student attitudes to study in management education and how to intervene to lead students from surface to deeper modes of learning.

Keywords: knowledge, surface learning, deep learning, education

Procedia PDF Downloads 478
10338 Development of Geo-computational Model for Analysis of Lassa Fever Dynamics and Lassa Fever Outbreak Prediction

Authors: Adekunle Taiwo Adenike, I. K. Ogundoyin

Abstract:

Lassa fever is a neglected tropical disease that has become a significant public health issue in Nigeria, the country with the greatest burden in Africa. This paper presents a Geo-Computational Model for Analysis and Prediction of Lassa Fever Dynamics and Outbreaks in Nigeria. The model investigates the dynamics of the virus with respect to environmental factors and human populations. It confirms the role of the rodent host in virus transmission and identifies how climate and human population are affected. The proposed methodology is carried out on a Linux operating system using the OSGeoLive virtual machine for geographical computing, which serves as a base for spatial ecology computing. The model design uses the Unified Modeling Language (UML), and the performance evaluation uses machine learning algorithms such as random forest, fuzzy logic, and neural networks. The study aims to contribute to the control of Lassa fever, which is achievable through the combined efforts of public health professionals and geo-computational and machine learning tools. The research findings will potentially be more readily accepted and utilized by decision-makers for the attainment of Lassa fever elimination.

Keywords: geo-computational model, lassa fever dynamics, lassa fever, outbreak prediction, nigeria

Procedia PDF Downloads 67
10337 Facilitating Written Biology Assessment in Large-Enrollment Courses Using Machine Learning

Authors: Luanna B. Prevost, Kelli Carter, Margaurete Romero, Kirsti Martinez

Abstract:

Writing is an essential scientific practice, yet, in several countries, increasing university science class sizes limit the use of written assessments. Written assessments allow students to demonstrate their learning in their own words and permit the faculty to evaluate students' understanding. However, the time and resources required to grade written assessments prohibit their use in large-enrollment science courses. This study examined the use of machine learning algorithms to automatically analyze student writing and provide timely feedback to the faculty about students' writing in biology. Written responses to questions about matter and energy transformation were collected from large-enrollment undergraduate introductory biology classrooms. Responses were analyzed using the LightSide text mining and classification software. Cohen's kappa was used to measure agreement between the LightSide models and human raters. Predictive models achieved agreement with human coding of 0.7 Cohen's kappa or greater. The models captured that, when writing about matter-energy transformation at the ecosystem level, students focused primarily on the concepts of heat loss, recycling of matter, and conservation of matter and energy. Models were also produced to capture writing about processes such as decomposition and biochemical cycling. The models created in this study can be used to provide automatic feedback about students' understanding of these concepts to biology faculty who wish to use formative written assessments in larger-enrollment biology classes but do not have the time or personnel for manual grading.

Keywords: machine learning, written assessment, biology education, text mining

Procedia PDF Downloads 253
10336 Spectrogram Pre-Processing to Improve Isotopic Identification to Discriminate Gamma and Neutrons Sources

Authors: Mustafa Alhamdi

Abstract:

An industrial application for classifying gamma-ray and neutron events using deep machine learning is investigated in this study. Identification using a convolutional neural network and a recursive neural network has shown a significant improvement in prediction accuracy in a variety of applications. The ability to identify the isotope type and activity from spectral information depends on the feature extraction methods, followed by classification. The features extracted from the spectrum profiles try to find patterns and relationships that represent the actual spectrum energy in a low-dimensional space. Increasing the level of separation between classes in feature space improves the possibility of enhancing classification accuracy. The nonlinear feature extraction performed by a neural network involves a variety of transformations and mathematical optimization, while principal component analysis depends on linear transformations to extract features and subsequently improve classification accuracy. In this paper, the isotope spectrum information has been preprocessed by finding the frequency components over time and using them as a training dataset. The Fourier transform implementation used to extract the frequency components has been optimized with a suitable windowing function. Training and validation samples of different isotope profiles interacting with a CdTe crystal have been simulated using Geant4. The readout electronic noise has been simulated by optimizing the mean and variance of a normal distribution. Ensemble learning, combining the votes of many models, managed to improve the classification accuracy of the neural networks. Discriminating gamma and neutron events in a single prediction approach has shown high accuracy using deep learning. The findings show that classification accuracy can be improved by applying the spectrogram preprocessing stage to the gamma and neutron spectra of different isotopes. Tuning the deep machine learning models through hyperparameter optimization enhanced the separation in the latent space and made it possible to extend the number of detected isotopes in the training database. Ensemble learning contributed significantly to improving the final prediction.
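
A sketch of the spectrogram pre-processing stage: a windowed short-time Fourier transform of a simulated detector pulse, giving time-frequency features a classifier could be trained on. The pulse shape, sampling rate, and window parameters are illustrative, not the Geant4-simulated responses used in the paper.

```python
# Windowed spectrogram of a simulated detector pulse (illustrative sketch).
import numpy as np
from scipy.signal import spectrogram

fs = 1_000_000                                           # 1 MHz, assumed sampling rate
rng = np.random.default_rng(1)
t = np.arange(0, 0.01, 1 / fs)

pulse = np.exp(-t * 2e4) * np.sin(2 * np.pi * 5e4 * t)   # crude stand-in for a detector pulse
signal = pulse + 0.02 * rng.normal(size=t.size)          # simulated readout electronic noise

f, time_bins, Sxx = spectrogram(signal, fs=fs, window="hann", nperseg=256, noverlap=128)
features = np.log1p(Sxx).flatten()                       # flattened log-spectrogram as training input
print("spectrogram shape:", Sxx.shape, "-> feature vector length:", features.size)
```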

Keywords: machine learning, nuclear physics, Monte Carlo simulation, noise estimation, feature extraction, classification

Procedia PDF Downloads 124
10335 Developed Text-Independent Speaker Verification System

Authors: Mohammed Arif, Abdessalam Kifouche

Abstract:

Speech is a very convenient way of communication between people and machines. It conveys information about the identity of the talker. Since speaker recognition technology is increasingly securing our everyday lives, the objective of this paper is to develop two automatic text-independent speaker verification (TI SV) systems using low-level spectral features and machine learning methods: (i) the first system is based on a support vector machine (SVM), widely used in voice signal processing for speaker recognition, i.e., verifying the identity of a speaker based on their voice characteristics; and (ii) the second is based on a Gaussian Mixture Model (GMM) with a Universal Background Model (UBM), which combines different functions from different resources to complement the SVM-based system.
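
A rough sketch of the GMM-UBM side of such a system: MFCC features, a universal background model, and a per-speaker model scored against the UBM by a log-likelihood ratio. File names, mixture counts, and the threshold are illustrative, and a full system would MAP-adapt the speaker model from the UBM rather than train it independently.

```python
# GMM-UBM verification score from MFCC features (illustrative sketch).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path):
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T          # frames x 13

background = np.vstack([mfcc_features(f) for f in ["bg1.wav", "bg2.wav"]])  # many speakers
enrol = mfcc_features("target_speaker.wav")
test = mfcc_features("unknown_utterance.wav")

ubm = GaussianMixture(n_components=64, covariance_type="diag").fit(background)
speaker_gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(enrol)

llr = speaker_gmm.score(test) - ubm.score(test)                   # average log-likelihood ratio
print("accept" if llr > 0.0 else "reject", f"(LLR = {llr:.2f})")
```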

Keywords: speaker verification, text-independent, support vector machine, Gaussian mixture model, cepstral analysis

Procedia PDF Downloads 28
10334 Predictive Analysis of Chest X-rays Using NLP and Large Language Models with the Indiana University Dataset and Random Forest Classifier

Authors: Azita Ramezani, Ghazal Mashhadiagha, Bahareh Sanabakhsh

Abstract:

This study investigates the combination of Random Forest classifiers with large language models (LLMs) and natural language processing (NLP) to improve diagnostic accuracy in chest X-ray analysis using the Indiana University dataset. Utilizing advanced NLP techniques, the research preprocesses textual data from radiological reports to extract key features, which are then merged with image-derived data. This enriched dataset is analyzed with Random Forest classifiers to predict specific clinical results, focusing on the identification of health issues and the estimation of case urgency. The findings reveal that the combination of NLP, LLMs, and machine learning increases not only diagnostic precision but also reliability, especially in quickly identifying critical conditions. Achieving an accuracy of 99.35%, the model shows significant advancements over conventional diagnostic techniques. The results emphasize the large potential of machine learning in medical imaging, suggesting that these technologies could greatly enhance clinician judgment and patient outcomes by offering quicker and more precise diagnostic approximations.

Keywords: natural language processing (NLP), large language models (LLMs), random forest classifier, chest x-ray analysis, medical imaging, diagnostic accuracy, indiana university dataset, machine learning in healthcare, predictive modeling, clinical decision support systems

Procedia PDF Downloads 15
10333 Impact Location From Instrumented Mouthguard Kinematic Data In Rugby

Authors: Jazim Sohail, Filipe Teixeira-Dias

Abstract:

Mild traumatic brain injury (mTBI) within non-helmeted contact sports is a growing concern due to the serious risk of potential injury. Extensive research is being conducted looking into head kinematics in non-helmeted contact sports utilizing instrumented mouthguards that allow researchers to record accelerations and velocities of the head during and after an impact. This does not, however, allow the location of the impact on the head, and its magnitude and orientation, to be determined. This research proposes and validates two methods to quantify impact locations from instrumented mouthguard kinematic data, one using rigid body dynamics, the other utilizing machine learning. The rigid body dynamics technique focuses on establishing and matching moments from Euler’s and torque equations in order to find the impact location on the head. The methodology is validated with impact data collected from a lab test with the dummy head fitted with an instrumented mouthguard. Additionally, a Hybrid III Dummy head finite element model was utilized to create synthetic kinematic data sets for impacts from varying locations to validate the impact location algorithm. The algorithm calculates accurate impact locations; however, it will require preprocessing of live data, which is currently being done by cross-referencing data timestamps to video footage. The machine learning technique focuses on eliminating the preprocessing aspect by establishing trends within time-series signals from instrumented mouthguards to determine the impact location on the head. An unsupervised learning technique is used to cluster together impacts within similar regions from an entire time-series signal. The kinematic signals established from mouthguards are converted to the frequency domain before using a clustering algorithm to cluster together similar signals within a time series that may span the length of a game. Impacts are clustered within predetermined location bins. The same Hybrid III Dummy finite element model is used to create impacts that closely replicate on-field impacts in order to create synthetic time-series datasets consisting of impacts in varying locations. These time-series data sets are used to validate the machine learning technique. The rigid body dynamics technique provides a good method to establish accurate impact location of impact signals that have already been labeled as true impacts and filtered out of the entire time series. However, the machine learning technique provides a method that can be implemented with long time series signal data but will provide impact location within predetermined regions on the head. Additionally, the machine learning technique can be used to eliminate false impacts captured by sensors saving additional time for data scientists using instrumented mouthguard kinematic data as validating true impacts with video footage would not be required.
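
A sketch of the unsupervised branch described above: windowed kinematic signals are converted to the frequency domain and clustered into location bins. The synthetic signals, sampling rate, and number of bins are assumptions.

```python
# Cluster frequency-domain impact signatures into location bins (illustrative sketch).
import numpy as np
from scipy.fft import rfft
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
fs, n = 1000, 200                                        # Hz, samples per impact window

def synthetic_impact(freq):
    t = np.arange(n) / fs
    return np.exp(-t * 30) * np.sin(2 * np.pi * freq * t) + 0.05 * rng.normal(size=n)

# Impacts from two assumed regions with different spectral signatures.
signals = [synthetic_impact(60) for _ in range(30)] + [synthetic_impact(120) for _ in range(30)]
spectra = np.abs(np.array([rfft(s) for s in signals]))   # frequency-domain representation

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(spectra)
print("cluster sizes:", np.bincount(labels))
```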

Keywords: head impacts, impact location, instrumented mouthguard, machine learning, mTBI

Procedia PDF Downloads 194
10332 Prediction of Remaining Life of Industrial Cutting Tools with Deep Learning-Assisted Image Processing Techniques

Authors: Gizem Eser Erdek

Abstract:

This study investigates predicting the remaining life of industrial cutting tools used in the production process with deep learning methods. As cutting tools near the end of their life, they damage the raw material they are processing. The aim is to predict the remaining life of the cutting tool based on the damage it causes to the raw material. For this, hole photos were collected from the hole-drilling machine for 8 months. Photos were labeled into 5 classes according to hole quality. In this way, the problem was transformed into a classification problem. Using the prepared data set, a model was created with convolutional neural networks, which is a deep learning method. In addition, VGGNet and ResNet architectures, which have been successful in the literature, have been tested on the data set. A hybrid model using convolutional neural networks and support vector machines is also used for comparison. When all models are compared, the model using convolutional neural networks gives successful results with a 74% accuracy rate. In the preliminary studies, the data set was arranged to include only the best and worst classes, and the study gave ~93% accuracy when the binary classification model was applied. The results of this study showed that the remaining life of cutting tools could be predicted by deep learning methods based on the damage to the raw material. Experiments have proven that deep learning methods can be used as an alternative for cutting tool life estimation.
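
A compact sketch of a five-class convolutional network of the kind described above, written with tf.keras; the image size, layer sizes, and training details are assumptions rather than the paper's exact architecture.

```python
# Small CNN for 5-class hole-quality classification (illustrative sketch).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),                   # grayscale hole photos, assumed size
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(5, activation="softmax"),               # 5 hole-quality classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=20, validation_split=0.2)  # with the labeled photos
```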

Keywords: classification, convolutional neural network, deep learning, remaining life of industrial cutting tools, ResNet, support vector machine, VggNet

Procedia PDF Downloads 52
10331 A Heart Arrhythmia Prediction Using Machine Learning’s Classification Approach and the Concept of Data Mining

Authors: Roshani S. Golhar, Neerajkumar S. Sathawane, Snehal Dongre

Abstract:

Background and objectives: Cardiovascular illnesses are increasing and becoming a leading cause of mortality worldwide, killing a large number of people each year. Arrhythmia is a type of cardiac illness characterized by a change in the regularity of the heartbeat. The goal of this study is to develop novel deep learning algorithms for successfully interpreting arrhythmia from a single one-second segment. Because the ECG signal reflects unique electrical heart activity across time, considerable changes between time intervals are detected. Such variances, as well as the limited amount of learning data available for each arrhythmia, make standard learning methods difficult and impede their generalization. Conclusions: The proposed method was able to outperform several state-of-the-art methods. The proposed technique is also an effective and convenient deep learning approach to heartbeat interpretation that could probably be used in real-time healthcare monitoring systems.

Keywords: electrocardiogram, ECG classification, neural networks, convolutional neural networks, portable document format

Procedia PDF Downloads 52
10330 Experiments on Weakly-Supervised Learning on Imperfect Data

Authors: Yan Cheng, Yijun Shao, James Rudolph, Charlene R. Weir, Beth Sahlmann, Qing Zeng-Treitler

Abstract:

Supervised predictive models require labeled data for training purposes. Complete and accurate labeled data, i.e., a ‘gold standard’, is not always available, and imperfectly labeled data may need to serve as an alternative. An important question is if the accuracy of the labeled data creates a performance ceiling for the trained model. In this study, we trained several models to recognize the presence of delirium in clinical documents using data with annotations that are not completely accurate (i.e., weakly-supervised learning). In the external evaluation, the support vector machine model with a linear kernel performed best, achieving an area under the curve of 89.3% and accuracy of 88%, surpassing the 80% accuracy of the training sample. We then generated a set of simulated data and carried out a series of experiments which demonstrated that models trained on imperfect data can (but do not always) outperform the accuracy of the training data, e.g., the area under the curve for some models is higher than 80% when trained on the data with an error rate of 40%. Our experiments also showed that the error resistance of linear modeling is associated with larger sample size, error type, and linearity of the data (all p-values < 0.001). In conclusion, this study sheds light on the usefulness of imperfect data in clinical research via weakly-supervised learning.
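
A small simulation in the spirit of the experiments above: train on labels with a known error rate and check whether accuracy against clean test labels can exceed the accuracy of the noisy training labels. The linear data-generating setup is an assumption, not the paper's simulated datasets.

```python
# Training on noisy labels can beat the label accuracy itself (illustrative sketch).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)
n, d, error_rate = 5000, 20, 0.40

X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y_true = (X @ w > 0).astype(int)                          # clean labels
flip = rng.random(n) < error_rate
y_noisy = np.where(flip, 1 - y_true, y_true)              # 40% of training labels corrupted

clf = LinearSVC(C=1.0, max_iter=5000).fit(X[:4000], y_noisy[:4000])
print("training-label accuracy:", 1 - error_rate)
print("model accuracy on clean test labels:", clf.score(X[4000:], y_true[4000:]))
```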

Keywords: weakly-supervised learning, support vector machine, prediction, delirium, simulation

Procedia PDF Downloads 170
10329 EEG-Based Classification of Psychiatric Disorders: Bipolar Mood Disorder vs. Schizophrenia

Authors: Han-Jeong Hwang, Jae-Hyun Jo, Fatemeh Alimardani

Abstract:

An accurate diagnosis of psychiatric diseases is a challenging issue, in particular when symptoms of different diseases overlap, such as delusions appearing in both bipolar mood disorder (BMD) and schizophrenia (SCH). In the present study, we propose a useful way to discriminate BMD and SCH using electroencephalography (EEG). A total of thirty BMD and SCH patients (15 vs. 15) took part in our experiment. EEG signals were measured with nineteen electrodes attached to the scalp using the international 10-20 system, while the participants were exposed to a visual stimulus flickering at 16 Hz for 95 s. The flickering visual stimulus induces a certain brain signal, known as the steady-state visual evoked potential (SSVEP), which is observed differently in patients with BMD and SCH in terms of SSVEP amplitude because they process the same visual information in their own unique ways. For classifying BMD and SCH patients, a machine learning technique with leave-one-out cross-validation was employed. The SSVEPs induced at the fundamental (16 Hz) and second harmonic (32 Hz) stimulation frequencies were extracted using the fast Fourier transform (FFT), and they were used as features. The most discriminative feature was selected using the Fisher score, and a support vector machine (SVM) was used as the classifier. From the analysis, we obtained a classification accuracy of 83.33%, showing the feasibility of discriminating patients with BMD and SCH using EEG. We expect that our approach can help psychiatrists more accurately diagnose the psychiatric disorders BMD and SCH.

Keywords: bipolar mood disorder, electroencephalography, schizophrenia, machine learning

Procedia PDF Downloads 389
10328 Prediction-Based Midterm Operation Planning for Energy Management of Exhibition Hall

Authors: Doseong Eom, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

Large exhibition halls require a lot of energy to maintain a comfortable atmosphere for the visitors inside. One way of reducing the energy cost is to have thermal energy storage systems installed so that thermal energy can be stored in the middle of the night when the energy price is low and then used later when the price is high. To minimize the overall energy cost, however, we should be able to decide exactly how much energy to store during each time period. If we can foresee the future energy load and the corresponding cost, we will be able to make such decisions reasonably. In this paper, we use machine learning techniques to obtain models for predicting weather conditions and the number of visitors on an hourly basis for the next day. Based on the energy load thus predicted, we build a cost-optimal daily operation plan for the thermal energy storage systems and cooling and heating facilities through simulation-based optimization.

Keywords: building energy management, machine learning, operation planning, simulation-based optimization

Procedia PDF Downloads 298
10327 A Unique Multi-Class Support Vector Machine Algorithm Using MapReduce

Authors: Aditi Viswanathan, Shree Ranjani, Aruna Govada

Abstract:

With data sizes constantly expanding, and with classical machine learning algorithms that analyze such data requiring larger and larger amounts of computation time and storage space, the need to distribute computation and memory requirements among several computers has become apparent. Although substantial work has been done in developing distributed binary SVM algorithms and multi-class SVM algorithms individually, the field of multi-class distributed SVMs remains largely unexplored. This research seeks to develop an algorithm that implements the Support Vector Machine over a multi-class data set and is efficient in a distributed environment. For this, we recursively choose the best binary split of a set of classes using a greedy technique, much like a divide-and-conquer approach. Our algorithm has shown better computation time during the testing phase than the traditional sequential SVM methods (One vs. One, One vs. Rest) and outperforms them as the size of the data set grows. This approach also classifies the data with higher accuracy than the traditional multi-class algorithms.
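
A sketch of the greedy recursive binary-split idea as a tree of binary SVMs; the split heuristic is simplified (first half of the classes versus the rest) and the MapReduce distribution layer is omitted, so this only illustrates the class-partitioning structure.

```python
# Recursive binary-split multi-class SVM (illustrative, non-distributed sketch).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

class BinarySplitSVM:
    def fit(self, X, y):
        classes = sorted(set(y))
        if len(classes) == 1:
            self.leaf = classes[0]
            return self
        self.leaf = None
        left = set(classes[: len(classes) // 2])            # greedy split placeholder
        side = np.array([0 if c in left else 1 for c in y])
        self.clf = LinearSVC(max_iter=5000).fit(X, side)
        self.children = [BinarySplitSVM().fit(X[side == s], y[side == s]) for s in (0, 1)]
        return self

    def predict_one(self, x):
        if self.leaf is not None:
            return self.leaf
        return self.children[int(self.clf.predict([x])[0])].predict_one(x)

X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
tree = BinarySplitSVM().fit(X, y)
preds = np.array([tree.predict_one(x) for x in X])
print("training accuracy:", float(np.mean(preds == y)))
```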

Keywords: distributed algorithm, MapReduce, multi-class, support vector machine

Procedia PDF Downloads 374