Search results for: python machine learning libraries
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 8446

Search results for: python machine learning libraries

8296 A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning

Authors: Samina Khalid, Shamila Nasreen

Abstract:

Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is important area, where many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques have analyzed with the purpose of how effectively these techniques can be used to achieve high performance of learning algorithms that ultimately improves predictive accuracy of classifier. An endeavor to analyze dimensionality reduction techniques briefly with the purpose to investigate strengths and weaknesses of some widely used dimensionality reduction methods is presented.

Keywords: age related macular degeneration, feature selection feature subset selection feature extraction/transformation, FSA’s, relief, correlation based method, PCA, ICA

Procedia PDF Downloads 458
8295 AI-Based Techniques for Online Social Media Network Sentiment Analysis: A Methodical Review

Authors: A. M. John-Otumu, M. M. Rahman, O. C. Nwokonkwo, M. C. Onuoha

Abstract:

Online social media networks have long served as a primary arena for group conversations, gossip, text-based information sharing and distribution. The use of natural language processing techniques for text classification and unbiased decision-making has not been far-fetched. Proper classification of this textual information in a given context has also been very difficult. As a result, we decided to conduct a systematic review of previous literature on sentiment classification and AI-based techniques that have been used in order to gain a better understanding of the process of designing and developing a robust and more accurate sentiment classifier that can correctly classify social media textual information of a given context between hate speech and inverted compliments with a high level of accuracy by assessing different artificial intelligence techniques. We evaluated over 250 articles from digital sources like ScienceDirect, ACM, Google Scholar, and IEEE Xplore and whittled down the number of research to 31. Findings revealed that Deep learning approaches such as CNN, RNN, BERT, and LSTM outperformed various machine learning techniques in terms of performance accuracy. A large dataset is also necessary for developing a robust sentiment classifier and can be obtained from places like Twitter, movie reviews, Kaggle, SST, and SemEval Task4. Hybrid Deep Learning techniques like CNN+LSTM, CNN+GRU, CNN+BERT outperformed single Deep Learning techniques and machine learning techniques. Python programming language outperformed Java programming language in terms of sentiment analyzer development due to its simplicity and AI-based library functionalities. Based on some of the important findings from this study, we made a recommendation for future research.

Keywords: artificial intelligence, natural language processing, sentiment analysis, social network, text

Procedia PDF Downloads 91
8294 Identification of Biological Pathways Causative for Breast Cancer Using Unsupervised Machine Learning

Authors: Karthik Mittal

Abstract:

This study performs an unsupervised machine learning analysis to find clusters of related SNPs which highlight biological pathways that are important for the biological mechanisms of breast cancer. Studying genetic variations in isolation is illogical because these genetic variations are known to modulate protein production and function; the downstream effects of these modifications on biological outcomes are highly interconnected. After extracting the SNPs and their effect on different types of breast cancer using the MRBase library, two unsupervised machine learning clustering algorithms were implemented on the genetic variants: a k-means clustering algorithm and a hierarchical clustering algorithm; furthermore, principal component analysis was executed to visually represent the data. These algorithms specifically used the SNP’s beta value on the three different types of breast cancer tested in this project (estrogen-receptor positive breast cancer, estrogen-receptor negative breast cancer, and breast cancer in general) to perform this clustering. Two significant genetic pathways validated the clustering produced by this project: the MAPK signaling pathway and the connection between the BRCA2 gene and the ESR1 gene. This study provides the first proof of concept showing the importance of unsupervised machine learning in interpreting GWAS summary statistics.

Keywords: breast cancer, computational biology, unsupervised machine learning, k-means, PCA

Procedia PDF Downloads 117
8293 Review of Different Machine Learning Algorithms

Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui

Abstract:

Classification is a data mining technique, which is recognizedon Machine Learning (ML) algorithm. It is used to classifythe individual articlein a knownofinformation into a set of predefinemodules or group. Web mining is also a portion of that sympathetic of data mining methods. The main purpose of this paper to analysis and compare the performance of Naïve Bayse Algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN)and Support Vector Machine (SVM). This paper consists of different ML algorithm and their advantages and disadvantages and also define research issues.

Keywords: Data Mining, Web Mining, classification, ML Algorithms

Procedia PDF Downloads 257
8292 Modeling Bessel Beams and Their Discrete Superpositions from the Generalized Lorenz-Mie Theory to Calculate Optical Forces over Spherical Dielectric Particles

Authors: Leonardo A. Ambrosio, Carlos. H. Silva Santos, Ivan E. L. Rodrigues, Ayumi K. de Campos, Leandro A. Machado

Abstract:

In this work, we propose an algorithm developed under Python language for the modeling of ordinary scalar Bessel beams and their discrete superpositions and subsequent calculation of optical forces exerted over dielectric spherical particles. The mathematical formalism, based on the generalized Lorenz-Mie theory, is implemented in Python for its large number of free mathematical (as SciPy and NumPy), data visualization (Matplotlib and PyJamas) and multiprocessing libraries. We also propose an approach, provided by a synchronized Software as Service (SaaS) in cloud computing, to develop a user interface embedded on a mobile application, thus providing users with the necessary means to easily introduce desired unknowns and parameters and see the graphical outcomes of the simulations right at their mobile devices. Initially proposed as a free Android-based application, such an App enables data post-processing in cloud-based architectures and visualization of results, figures and numerical tables.

Keywords: Bessel Beams and Frozen Waves, Generalized Lorenz-Mie Theory, Numerical Methods, optical forces

Procedia PDF Downloads 354
8291 Gradient Boosted Trees on Spark Platform for Supervised Learning in Health Care Big Data

Authors: Gayathri Nagarajan, L. D. Dhinesh Babu

Abstract:

Health care is one of the prominent industries that generate voluminous data thereby finding the need of machine learning techniques with big data solutions for efficient processing and prediction. Missing data, incomplete data, real time streaming data, sensitive data, privacy, heterogeneity are few of the common challenges to be addressed for efficient processing and mining of health care data. In comparison with other applications, accuracy and fast processing are of higher importance for health care applications as they are related to the human life directly. Though there are many machine learning techniques and big data solutions used for efficient processing and prediction in health care data, different techniques and different frameworks are proved to be effective for different applications largely depending on the characteristics of the datasets. In this paper, we present a framework that uses ensemble machine learning technique gradient boosted trees for data classification in health care big data. The framework is built on Spark platform which is fast in comparison with other traditional frameworks. Unlike other works that focus on a single technique, our work presents a comparison of six different machine learning techniques along with gradient boosted trees on datasets of different characteristics. Five benchmark health care datasets are considered for experimentation, and the results of different machine learning techniques are discussed in comparison with gradient boosted trees. The metric chosen for comparison is misclassification error rate and the run time of the algorithms. The goal of this paper is to i) Compare the performance of gradient boosted trees with other machine learning techniques in Spark platform specifically for health care big data and ii) Discuss the results from the experiments conducted on datasets of different characteristics thereby drawing inference and conclusion. The experimental results show that the accuracy is largely dependent on the characteristics of the datasets for other machine learning techniques whereas gradient boosting trees yields reasonably stable results in terms of accuracy without largely depending on the dataset characteristics.

Keywords: big data analytics, ensemble machine learning, gradient boosted trees, Spark platform

Procedia PDF Downloads 216
8290 Intrusion Detection Based on Graph Oriented Big Data Analytics

Authors: Ahlem Abid, Farah Jemili

Abstract:

Intrusion detection has been the subject of numerous studies in industry and academia, but cyber security analysts always want greater precision and global threat analysis to secure their systems in cyberspace. To improve intrusion detection system, the visualisation of the security events in form of graphs and diagrams is important to improve the accuracy of alerts. In this paper, we propose an approach of an IDS based on cloud computing, big data technique and using a machine learning graph algorithm which can detect in real time different attacks as early as possible. We use the MAWILab intrusion detection dataset . We choose Microsoft Azure as a unified cloud environment to load our dataset on. We implement the k2 algorithm which is a graphical machine learning algorithm to classify attacks. Our system showed a good performance due to the graphical machine learning algorithm and spark structured streaming engine.

Keywords: Apache Spark Streaming, Graph, Intrusion detection, k2 algorithm, Machine Learning, MAWILab, Microsoft Azure Cloud

Procedia PDF Downloads 115
8289 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation

Procedia PDF Downloads 201
8288 A Study on Big Data Analytics, Applications and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, Healthcare, and business intelligence contain voluminous and incremental data, which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organization's decision-making strategy can be enhanced using big data analytics and applying different machine learning techniques and statistical tools on such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates on various frameworks in the process of Analysis using different machine-learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 53
8287 A Study on Big Data Analytics, Applications, and Challenges

Authors: Chhavi Rana

Abstract:

The aim of the paper is to highlight the existing development in the field of big data analytics. Applications like bioinformatics, smart infrastructure projects, healthcare, and business intelligence contain voluminous and incremental data which is hard to organise and analyse and can be dealt with using the framework and model in this field of study. An organisation decision-making strategy can be enhanced by using big data analytics and applying different machine learning techniques and statistical tools to such complex data sets that will consequently make better things for society. This paper reviews the current state of the art in this field of study as well as different application domains of big data analytics. It also elaborates various frameworks in the process of analysis using different machine learning techniques. Finally, the paper concludes by stating different challenges and issues raised in existing research.

Keywords: big data, big data analytics, machine learning, review

Procedia PDF Downloads 67
8286 Insider Theft Detection in Organizations Using Keylogger and Machine Learning

Authors: Shamatha Shetty, Sakshi Dhabadi, Prerana M., Indushree B.

Abstract:

About 66% of firms claim that insider attacks are more likely to happen. The frequency of insider incidents has increased by 47% in the last two years. The goal of this work is to prevent dangerous employee behavior by using keyloggers and the Machine Learning (ML) model. Every keystroke that the user enters is recorded by the keylogging program, also known as keystroke logging. Keyloggers are used to stop improper use of the system. This enables us to collect all textual data, save it in a CSV file, and analyze it using an ML algorithm and the VirusTotal API. Many large companies use it to methodically monitor how their employees use computers, the internet, and email. We are utilizing the SVM algorithm and the VirusTotal API to improve overall efficiency and accuracy in identifying specific patterns and words to automate and offer the report for improved monitoring.

Keywords: cyber security, machine learning, cyclic process, email notification

Procedia PDF Downloads 28
8285 Implementation in Python of a Method to Transform One-Dimensional Signals in Graphs

Authors: Luis Andrey Fajardo Fajardo

Abstract:

We are immersed in complex systems. The human brain, the galaxies, the snowflakes are examples of complex systems. An area of interest in Complex systems is the chaos theory. This revolutionary field of science presents different ways of study than determinism and reductionism. Here is where in junction with the Nonlinear DSP, chaos theory offer valuable techniques that establish a link between time series and complex theory in terms of complex networks, so that, the study of signals can be explored from the graph theory. Recently, some people had purposed a method to transform time series in graphs, but no one had developed a suitable implementation in Python with signals extracted from Chaotic Systems or Complex systems. That’s why the implementation in Python of an existing method to transform one dimensional chaotic signals from time domain to graph domain and some measures that may reveal information not extracted in the time domain is proposed.

Keywords: Python, complex systems, graph theory, dynamical systems

Procedia PDF Downloads 483
8284 Review on Implementation of Artificial Intelligence and Machine Learning for Controlling Traffic and Avoiding Accidents

Authors: Neha Singh, Shristi Singh

Abstract:

Accidents involving motor vehicles are more likely to cause serious injuries and fatalities. It also has a host of other perpetual issues, such as the regular loss of life and goods in accidents. To solve these issues, appropriate measures must be implemented, such as establishing an autonomous incident detection system that makes use of machine learning and artificial intelligence. In order to reduce traffic accidents, this article examines the overview of artificial intelligence and machine learning in autonomous event detection systems. The paper explores the major issues, prospective solutions, and use of artificial intelligence and machine learning in road transportation systems for minimising traffic accidents. There is a lot of discussion on additional, fresh, and developing approaches that less frequent accidents in the transportation industry. The study structured the following subtopics specifically: traffic management using machine learning and artificial intelligence and an incident detector with these two technologies. The internet of vehicles and vehicle ad hoc networks, as well as the use of wireless communication technologies like 5G wireless networks and the use of machine learning and artificial intelligence for the planning of road transportation systems, are elaborated. In addition, safety is the primary concern of road transportation. Route optimization, cargo volume forecasting, predictive fleet maintenance, real-time vehicle tracking, and traffic management, according to the review's key conclusions, are essential for ensuring the safety of road transportation networks. In addition to highlighting research trends, unanswered problems, and key research conclusions, the study also discusses the difficulties in applying artificial intelligence to road transport systems. Planning and managing the road transportation system might use the work as a resource.

Keywords: artificial intelligence, machine learning, incident detector, road transport systems, traffic management, automatic incident detection, deep learning

Procedia PDF Downloads 72
8283 Charting Sentiments with Naive Bayes and Logistic Regression

Authors: Jummalla Aashrith, N. L. Shiva Sai, K. Bhavya Sri

Abstract:

The swift progress of web technology has not only amassed a vast reservoir of internet data but also triggered a substantial surge in data generation. The internet has metamorphosed into one of the dynamic hubs for online education, idea dissemination, as well as opinion-sharing. Notably, the widely utilized social networking platform Twitter is experiencing considerable expansion, providing users with the ability to share viewpoints, participate in discussions spanning diverse communities, and broadcast messages on a global scale. The upswing in online engagement has sparked a significant curiosity in subjective analysis, particularly when it comes to Twitter data. This research is committed to delving into sentiment analysis, focusing specifically on the realm of Twitter. It aims to offer valuable insights into deciphering information within tweets, where opinions manifest in a highly unstructured and diverse manner, spanning a spectrum from positivity to negativity, occasionally punctuated by neutrality expressions. Within this document, we offer a comprehensive exploration and comparative assessment of modern approaches to opinion mining. Employing a range of machine learning algorithms such as Naive Bayes and Logistic Regression, our investigation plunges into the domain of Twitter data streams. We delve into overarching challenges and applications inherent in the realm of subjectivity analysis over Twitter.

Keywords: machine learning, sentiment analysis, visualisation, python

Procedia PDF Downloads 26
8282 Predictive Modeling of Student Behavior in Virtual Reality: A Machine Learning Approach

Authors: Gayathri Sadanala, Shibam Pokhrel, Owen Murphy

Abstract:

In the ever-evolving landscape of education, Virtual Reality (VR) environments offer a promising avenue for enhancing student engagement and learning experiences. However, understanding and predicting student behavior within these immersive settings remain challenging tasks. This paper presents a comprehensive study on the predictive modeling of student behavior in VR using machine learning techniques. We introduce a rich data set capturing student interactions, movements, and progress within a VR orientation program. The dataset is divided into training and testing sets, allowing us to develop and evaluate predictive models for various aspects of student behavior, including engagement levels, task completion, and performance. Our machine learning approach leverages a combination of feature engineering and model selection to reveal hidden patterns in the data. We employ regression and classification models to predict student outcomes, and the results showcase promising accuracy in forecasting behavior within VR environments. Furthermore, we demonstrate the practical implications of our predictive models for personalized VR-based learning experiences and early intervention strategies. By uncovering the intricate relationship between student behavior and VR interactions, we provide valuable insights for educators, designers, and developers seeking to optimize virtual learning environments.

Keywords: interaction, machine learning, predictive modeling, virtual reality

Procedia PDF Downloads 89
8281 ANAC-id - Facial Recognition to Detect Fraud

Authors: Giovanna Borges Bottino, Luis Felipe Freitas do Nascimento Alves Teixeira

Abstract:

This article aims to present a case study of the National Civil Aviation Agency (ANAC) in Brazil, ANAC-id. ANAC-id is the artificial intelligence algorithm developed for image analysis that recognizes standard images of unobstructed and uprighted face without sunglasses, allowing to identify potential inconsistencies. It combines YOLO architecture and 3 libraries in python - face recognition, face comparison, and deep face, providing robust analysis with high level of accuracy.

Keywords: artificial intelligence, deepface, face compare, face recognition, YOLO, computer vision

Procedia PDF Downloads 125
8280 Advanced Machine Learning Algorithm for Credit Card Fraud Detection

Authors: Manpreet Kaur

Abstract:

When legitimate credit card users are mistakenly labelled as fraudulent in numerous financial delated applications, there are numerous ethical problems. The innovative machine learning approach we have suggested in this research outperforms the current models and shows how to model a data set for credit card fraud detection while minimizing false positives. As a result, we advise using random forests as the best machine learning method for predicting and identifying credit card transaction fraud. The majority of victims of these fraudulent transactions were discovered to be credit card users over the age of 60, with a higher percentage of fraudulent transactions taking place between the specific hours.

Keywords: automated fraud detection, isolation forest method, local outlier factor, ML algorithm, credit card

Procedia PDF Downloads 79
8279 Optimization of 3D Printing Parameters Using Machine Learning to Enhance Mechanical Properties in Fused Deposition Modeling (FDM) Technology

Authors: Darwin Junnior Sabino Diego, Brando Burgos Guerrero, Diego Arroyo Villanueva

Abstract:

Additive manufacturing, commonly known as 3D printing, has revolutionized modern manufacturing by enabling the agile creation of complex objects. However, challenges persist in the consistency and quality of printed parts, particularly in their mechanical properties. This study focuses on addressing these challenges through the optimization of printing parameters in FDM technology, using Machine Learning techniques. Our aim is to improve the mechanical properties of printed objects by optimizing parameters such as speed, temperature, and orientation. We implement a methodology that combines experimental data collection with Machine Learning algorithms to identify relationships between printing parameters and mechanical properties. The results demonstrate the potential of this methodology to enhance the quality and consistency of 3D printed products, with significant applications across various industrial fields. This research not only advances understanding of additive manufacturing but also opens new avenues for practical implementation in industrial settings.

Keywords: 3D printing, additive manufacturing, machine learning, mechanical properties

Procedia PDF Downloads 11
8278 Machine Learning Analysis of Student Success in Introductory Calculus Based Physics I Course

Authors: Chandra Prayaga, Aaron Wade, Lakshmi Prayaga, Gopi Shankar Mallu

Abstract:

This paper presents the use of machine learning algorithms to predict the success of students in an introductory physics course. Data having 140 rows pertaining to the performance of two batches of students was used. The lack of sufficient data to train robust machine learning models was compensated for by generating synthetic data similar to the real data. CTGAN and CTGAN with Gaussian Copula (Gaussian) were used to generate synthetic data, with the real data as input. To check the similarity between the real data and each synthetic dataset, pair plots were made. The synthetic data was used to train machine learning models using the PyCaret package. For the CTGAN data, the Ada Boost Classifier (ADA) was found to be the ML model with the best fit, whereas the CTGAN with Gaussian Copula yielded Logistic Regression (LR) as the best model. Both models were then tested for accuracy with the real data. ROC-AUC analysis was performed for all the ten classes of the target variable (Grades A, A-, B+, B, B-, C+, C, C-, D, F). The ADA model with CTGAN data showed a mean AUC score of 0.4377, but the LR model with the Gaussian data showed a mean AUC score of 0.6149. ROC-AUC plots were obtained for each Grade value separately. The LR model with Gaussian data showed consistently better AUC scores compared to the ADA model with CTGAN data, except in two cases of the Grade value, C- and A-.

Keywords: machine learning, student success, physics course, grades, synthetic data, CTGAN, gaussian copula CTGAN

Procedia PDF Downloads 18
8277 An Extensible Software Infrastructure for Computer Aided Custom Monitoring of Patients in Smart Homes

Authors: Ritwik Dutta, Marylin Wolf

Abstract:

This paper describes the trade-offs and the design from scratch of a self-contained, easy-to-use health dashboard software system that provides customizable data tracking for patients in smart homes. The system is made up of different software modules and comprises a front-end and a back-end component. Built with HTML, CSS, and JavaScript, the front-end allows adding users, logging into the system, selecting metrics, and specifying health goals. The back-end consists of a NoSQL Mongo database, a Python script, and a SimpleHTTPServer written in Python. The database stores user profiles and health data in JSON format. The Python script makes use of the PyMongo driver library to query the database and displays formatted data as a daily snapshot of user health metrics against target goals. Any number of standard and custom metrics can be added to the system, and corresponding health data can be fed automatically, via sensor APIs or manually, as text or picture data files. A real-time METAR request API permits correlating weather data with patient health, and an advanced query system is implemented to allow trend analysis of selected health metrics over custom time intervals. Available on the GitHub repository system, the project is free to use for academic purposes of learning and experimenting, or practical purposes by building on it.

Keywords: flask, Java, JavaScript, health monitoring, long-term care, Mongo, Python, smart home, software engineering, webserver

Procedia PDF Downloads 364
8276 Injury Prediction for Soccer Players Using Machine Learning

Authors: Amiel Satvedi, Richard Pyne

Abstract:

Injuries in professional sports occur on a regular basis. Some may be minor, while others can cause huge impact on a player's career and earning potential. In soccer, there is a high risk of players picking up injuries during game time. This research work seeks to help soccer players reduce the risk of getting injured by predicting the likelihood of injury while playing in the near future and then providing recommendations for intervention. The injury prediction tool will use a soccer player's number of minutes played on the field, number of appearances, distance covered and performance data for the current and previous seasons as variables to conduct statistical analysis and provide injury predictive results using a machine learning linear regression model.

Keywords: injury predictor, soccer injury prevention, machine learning in soccer, big data in soccer

Procedia PDF Downloads 146
8275 A Survey in Techniques for Imbalanced Intrusion Detection System Datasets

Authors: Najmeh Abedzadeh, Matthew Jacobs

Abstract:

An intrusion detection system (IDS) is a software application that monitors malicious activities and generates alerts if any are detected. However, most network activities in IDS datasets are normal, and the relatively few numbers of attacks make the available data imbalanced. Consequently, cyber-attacks can hide inside a large number of normal activities, and machine learning algorithms have difficulty learning and classifying the data correctly. In this paper, a comprehensive literature review is conducted on different types of algorithms for both implementing the IDS and methods in correcting the imbalanced IDS dataset. The most famous algorithms are machine learning (ML), deep learning (DL), synthetic minority over-sampling technique (SMOTE), and reinforcement learning (RL). Most of the research use the CSE-CIC-IDS2017, CSE-CIC-IDS2018, and NSL-KDD datasets for evaluating their algorithms.

Keywords: IDS, imbalanced datasets, sampling algorithms, big data

Procedia PDF Downloads 279
8274 Fake News Detection for Korean News Using Machine Learning Techniques

Authors: Tae-Uk Yun, Pullip Chung, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Fake news is defined as the news articles that are intentionally and verifiably false, and could mislead readers. Spread of fake news may provoke anxiety, chaos, fear, or irrational decisions of the public. Thus, detecting fake news and preventing its spread has become very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible to identify it by a human. Under this context, researchers have tried to develop automated fake news detection using machine learning techniques over the past years. But, there have been no prior studies proposed an automated fake news detection method for Korean news to our best knowledge. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news contents to be analyzed is convert to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). After that, in step 2, classifiers are trained using the values produced in step 1. As the classifiers, machine learning techniques such as logistic regression, backpropagation network, support vector machine, and deep neural network can be applied. To validate the effectiveness of the proposed method, we collected about 200 short Korean news from Seoul National University’s FactCheck. which provides with detailed analysis reports from 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.

Keywords: fake news detection, Korean news, machine learning, text mining

Procedia PDF Downloads 243
8273 Community Integration: Post-Secondary Education (PSE) and Library Programming

Authors: Leah Plocharczyk, Matthew Conner

Abstract:

This paper analyzes the relatively new trend of PSE programs which seek to provide education, vocational training, and a college experience to individuals with an intellectual and developmental disability (IDD). Specifically, the paper examines the degree of interaction between PSE programs and the libraries of their college campuses. Using ThinkCollege, a clearinghouse and advocate for PSE programs, the researchers identified 293 programs throughout the country. These were all contacted with an email survey asking them about the nature of their involvement, if any, with the academic libraries on their campus. Where indicated by the responses, the libraries of PSE programs were contacted for additional information about their programming. Responses to the survey questions were tabulated and analyzed quantitatively. Written comments were analyzed for themes which were then tabulated. This paper presents the results of this study. They show obvious preferences for library programming, such as group formal instruction, individual liaisons, embedded reference, and various instructional designs. These are discussed in terms of special education principles of mainstreaming, level of restriction, training demands and cost effectiveness. The work serves as a foundation for best practices that can advance the field.

Keywords: disability studies, instructional design, universal design for learning, assessment methodology

Procedia PDF Downloads 42
8272 Prediction of MicroRNA-Target Gene by Machine Learning Algorithms in Lung Cancer Study

Authors: Nilubon Kurubanjerdjit, Nattakarn Iam-On, Ka-Lok Ng

Abstract:

MicroRNAs are small non-coding RNA found in many different species. They play crucial roles in cancer such as biological processes of apoptosis and proliferation. The identification of microRNA-target genes can be an essential first step towards to reveal the role of microRNA in various cancer types. In this paper, we predict miRNA-target genes for lung cancer by integrating prediction scores from miRanda and PITA algorithms used as a feature vector of miRNA-target interaction. Then, machine-learning algorithms were implemented for making a final prediction. The approach developed in this study should be of value for future studies into understanding the role of miRNAs in molecular mechanisms enabling lung cancer formation.

Keywords: microRNA, miRNAs, lung cancer, machine learning, Naïve Bayes, SVM

Procedia PDF Downloads 366
8271 Improved Performance in Content-Based Image Retrieval Using Machine Learning Approach

Authors: B. Ramesh Naik, T. Venugopal

Abstract:

This paper presents a novel approach which improves the high-level semantics of images based on machine learning approach. The contemporary approaches for image retrieval and object recognition includes Fourier transforms, Wavelets, SIFT and HoG. Though these descriptors helpful in a wide range of applications, they exploit zero order statistics, and this lacks high descriptiveness of image features. These descriptors usually take benefit of primitive visual features such as shape, color, texture and spatial locations to describe images. These features do not adequate to describe high-level semantics of the images. This leads to a gap in semantic content caused to unacceptable performance in image retrieval system. A novel method has been proposed referred as discriminative learning which is derived from machine learning approach that efficiently discriminates image features. The analysis and results of proposed approach were validated thoroughly on WANG and Caltech-101 Databases. The results proved that this approach is very competitive in content-based image retrieval.

Keywords: CBIR, discriminative learning, region weight learning, scale invariant feature transforms

Procedia PDF Downloads 148
8270 Uplink Throughput Prediction in Cellular Mobile Networks

Authors: Engin Eyceyurt, Josko Zec

Abstract:

The current and future cellular mobile communication networks generate enormous amounts of data. Networks have become extremely complex with extensive space of parameters, features and counters. These networks are unmanageable with legacy methods and an enhanced design and optimization approach is necessary that is increasingly reliant on machine learning. This paper proposes that machine learning as a viable approach for uplink throughput prediction. LTE radio metric, such as Reference Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), and Signal to Noise Ratio (SNR) are used to train models to estimate expected uplink throughput. The prediction accuracy with high determination coefficient of 91.2% is obtained from measurements collected with a simple smartphone application.

Keywords: drive test, LTE, machine learning, uplink throughput prediction

Procedia PDF Downloads 127
8269 Missing Link Data Estimation with Recurrent Neural Network: An Application Using Speed Data of Daegu Metropolitan Area

Authors: JaeHwan Yang, Da-Woon Jeong, Seung-Young Kho, Dong-Kyu Kim

Abstract:

In terms of ITS, information on link characteristic is an essential factor for plan or operation. But in practical cases, not every link has installed sensors on it. The link that does not have data on it is called “Missing Link”. The purpose of this study is to impute data of these missing links. To get these data, this study applies the machine learning method. With the machine learning process, especially for the deep learning process, missing link data can be estimated from present link data. For deep learning process, this study uses “Recurrent Neural Network” to take time-series data of road. As input data, Dedicated Short-range Communications (DSRC) data of Dalgubul-daero of Daegu Metropolitan Area had been fed into the learning process. Neural Network structure has 17 links with present data as input, 2 hidden layers, for 1 missing link data. As a result, forecasted data of target link show about 94% of accuracy compared with actual data.

Keywords: data estimation, link data, machine learning, road network

Procedia PDF Downloads 486
8268 Fine-Grained Sentiment Analysis: Recent Progress

Authors: Jie Liu, Xudong Luo, Pingping Lin, Yifan Fan

Abstract:

Facebook, Twitter, Weibo, and other social media and significant e-commerce sites generate a massive amount of online texts, which can be used to analyse people’s opinions or sentiments for better decision-making. So, sentiment analysis, especially fine-grained sentiment analysis, is a very active research topic. In this paper, we survey various methods for fine-grained sentiment analysis, including traditional sentiment lexicon-based methods, machine learning-based methods, and deep learning-based methods in aspect/target/attribute-based sentiment analysis tasks. Besides, we discuss their advantages and problems worthy of careful studies in the future.

Keywords: sentiment analysis, fine-grained, machine learning, deep learning

Procedia PDF Downloads 219
8267 Resilient Machine Learning in the Nuclear Industry: Crack Detection as a Case Study

Authors: Anita Khadka, Gregory Epiphaniou, Carsten Maple

Abstract:

There is a dramatic surge in the adoption of machine learning (ML) techniques in many areas, including the nuclear industry (such as fault diagnosis and fuel management in nuclear power plants), autonomous systems (including self-driving vehicles), space systems (space debris recovery, for example), medical surgery, network intrusion detection, malware detection, to name a few. With the application of learning methods in such diverse domains, artificial intelligence (AI) has become a part of everyday modern human life. To date, the predominant focus has been on developing underpinning ML algorithms that can improve accuracy, while factors such as resiliency and robustness of algorithms have been largely overlooked. If an adversarial attack is able to compromise the learning method or data, the consequences can be fatal, especially but not exclusively in safety-critical applications. In this paper, we present an in-depth analysis of five adversarial attacks and three defence methods on a crack detection ML model. Our analysis shows that it can be dangerous to adopt machine learning techniques in security-critical areas such as the nuclear industry without rigorous testing since they may be vulnerable to adversarial attacks. While common defence methods can effectively defend against different attacks, none of the three considered can provide protection against all five adversarial attacks analysed.

Keywords: adversarial machine learning, attacks, defences, nuclear industry, crack detection

Procedia PDF Downloads 130