Search results for: accuracy
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3723

2913 Emotion Detection in Twitter Messages Using Combination of Long Short-Term Memory and Convolutional Deep Neural Networks

Authors: Bahareh Golchin, Nooshin Riahi

Abstract:

Recognizing sentiments and emotions in social media texts has received considerable attention in recent years. Sentiment and emotion analysis aims to recognize conceptual information such as the opinions, feelings, attitudes, and emotions of people towards products, services, organizations, people, topics, events, and features in written text. This breadth indicates the size of the problem space. In the real world, businesses and organizations are always looking for tools to gather people's ideas, emotions, and views about their products, services, or related events. This article uses the Twitter social network, one of the most popular social networks with about 420 million active users, to extract data. On this network, users can share their information and opinions about personal issues, policies, products, events, etc. Because its data are readily available, Twitter is well suited to the classification of emotional states. In this study, supervised learning and deep neural network algorithms are used to classify the emotional states of Twitter users. Given the large amount of available data, deep learning methods have the advantage of increasing the learning capacity of the model. Tweets collected on various topics are classified into four classes using a combination of two Bidirectional Long Short-Term Memory networks and a Convolutional network. With an average accuracy of 93%, the proposed framework yields good results and improves on the accuracy of previous work.

Keywords: emotion classification, sentiment analysis, social networks, deep neural networks

Procedia PDF Downloads 137
2912 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

The Propensity Score Matching (PSM) technique has been widely used for estimating the causal effect of treatment in observational studies. One major step in implementing PSM is estimating the propensity score (PS). A logistic regression model with additive linear terms of the covariates is the most commonly used technique in many studies. Logistic regression is also used with cubic splines to retain flexibility in the model. However, choosing the functional form of the logistic regression model remains an open question, since the effectiveness of PSM depends on how accurately the PS has been estimated. In many situations, the linearity assumption of linear logistic regression may not hold, and a non-linear relation between the logit and the covariates may be appropriate. One can estimate the PS using machine learning techniques such as random forests, neural networks, etc., for greater accuracy in non-linear situations. In this study, an attempt has been made to assess the efficacy of the Generalized Additive Model (GAM) in various linear and non-linear settings and to compare its performance with the usual logistic regression. GAM is a non-parametric technique in which the functional form of the covariates can be left unspecified and a flexible regression model can be fitted. Various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units), and we examined which method leads to better covariate balance in the matched dataset. It is found that the logistic regression model is impressively robust to the inclusion of quadratic and interaction terms and reduces the mean difference between the treatment and control sets as efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in either simple or complex models. The analysis also suggests that a larger proportion of controls than treatment units leads to better balance for both methods.
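The two quantities this abstract turns on, a propensity score from a logistic model with linear terms and the mean difference used to judge covariate balance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the single covariate, the plain gradient-ascent fit, and the toy data are not taken from the study, and the balance measure shown is the commonly used standardized mean difference rather than whatever exact statistic the author computed.

```python
from math import exp, sqrt
from statistics import mean, variance


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + exp(-z))


def fit_propensity(x, t, lr=0.1, steps=2000):
    """Fit logit(PS) = b0 + b1 * x by gradient ascent on the log-likelihood.

    x: one covariate value per unit (hypothetical single-covariate case)
    t: 1 for treated units, 0 for controls
    """
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ti in zip(x, t):
            err = ti - sigmoid(b0 + b1 * xi)
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def propensity_scores(x, b0, b1):
    """Estimated probability of treatment for each covariate value."""
    return [sigmoid(b0 + b1 * xi) for xi in x]


def standardized_mean_difference(treated, control):
    """(mean_t - mean_c) divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Toy data: treatment becomes more likely as the covariate grows.
x = [0, 1, 2, 3, 4, 5]
t = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_propensity(x, t)
scores = propensity_scores(x, b0, b1)

# Balance check on a covariate, before any matching.
smd = standardized_mean_difference([2, 4, 6], [1, 3, 5])
```

In a matching workflow, the same balance statistic would be recomputed on the matched sample; a value closer to zero indicates better covariate balance.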

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 367
2911 A Robust System for Foot Arch Type Classification from Static Foot Pressure Distribution Data Using Linear Discriminant Analysis

Authors: R. Periyasamy, Deepak Joshi, Sneh Anand

Abstract:

Foot posture assessment is important for evaluating foot type, which can cause gait and postural defects in all age groups. Although different methods are used for classification of foot arch type in clinical/research examination, there is no clear approach for selecting the most appropriate measurement system. Therefore, the aim of this study was to develop a system for evaluation of foot type as a clinical decision-making aid for diagnosis of flat and normal arch based on the Arch Index (AI) and foot pressure distribution parameter - Power Ratio (PR) data. The accuracy of the system was evaluated for 27 subjects with age ranging from 24 to 65 years. Foot area measurements (hind foot, mid foot, and forefoot) were acquired simultaneously from the foot pressure intensity image using the portable PedoPowerGraph system, and the image was analyzed in the frequency domain to obtain the foot pressure distribution parameter - PR data. From our results, we obtain 100% classification accuracy of normal and flat foot by using the linear discriminant analysis method. We observe no misclassification of foot types because foot pressure distribution data are incorporated instead of the arch index (AI) alone. We found that the mid-foot pressure distribution ratio data and arch index (AI) value are well correlated to foot arch type based on visual analysis. Therefore, this paper suggests that the proposed system is accurate and makes it easy to determine foot arch type from the arch index (AI) together with mid-foot pressure distribution ratio data instead of physical area of contact. Hence, such a computational-tool-based system can help clinicians assess foot structure and cross-check their diagnosis of flat foot from mid-foot pressure distribution.

Keywords: arch index, computational tool, static foot pressure intensity image, foot pressure distribution, linear discriminant analysis

Procedia PDF Downloads 499
2910 Modeling Engagement with Multimodal Multisensor Data: The Continuous Performance Test as an Objective Tool to Track Flow

Authors: Mohammad H. Taheri, David J. Brown, Nasser Sherkat

Abstract:

Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detect student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data labeled by objective performance outcomes to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. In order to achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Overall, the random forest classification approach achieved the best classification results. Using random forest, 93.3% classification for engagement and 42.9% accuracy for disengagement were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. We found that using high-level handpicked features can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature to the classification of engagement and distraction was shown to be eye gaze. 
It has been shown that we can accurately predict the level of engagement of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability, human observation or reliant on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.
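The two figures quoted above (93.3% for engagement, 42.9% for disengagement) read as per-class accuracies, that is, the share of sessions of each true class that the classifier labeled correctly. A minimal sketch of that calculation follows; the function name, labels, and toy predictions are hypothetical and are not the study's data.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples within each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical labels: 4 engaged samples and 2 disengaged samples.
y_true = ["engaged"] * 4 + ["disengaged"] * 2
y_pred = ["engaged", "engaged", "engaged", "disengaged", "disengaged", "engaged"]
result = per_class_accuracy(y_true, y_pred)
# result == {"engaged": 0.75, "disengaged": 0.5}
```

Reporting the classes separately matters when they are imbalanced, because a single overall accuracy can hide weak performance on the rarer class.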

Keywords: affective computing in education, affect detection, continuous performance test, engagement, flow, HCI, interaction, learning disabilities, machine learning, multimodal, multisensor, physiological sensors, student engagement

Procedia PDF Downloads 94
2909 On the Solution of Fractional-Order Dynamical Systems Endowed with Block Hybrid Methods

Authors: Kizito Ugochukwu Nwajeri

Abstract:

This paper presents a distinct approach to solving fractional dynamical systems using hybrid block methods (HBMs). Fractional calculus extends the concept of derivatives and integrals to non-integer orders and finds increasing application in fields such as physics, engineering, and finance. However, traditional numerical techniques often struggle to accurately capture the complex behaviors exhibited by these systems. To address this challenge, we develop HBMs that integrate single-step and multi-step methods, enabling the simultaneous computation of multiple solution points while maintaining high accuracy. Our approach employs polynomial interpolation and collocation techniques to derive a system of equations that effectively models the dynamics of fractional systems. We also directly incorporate boundary and initial conditions into the formulation, enhancing the stability and convergence properties of the numerical solution. An adaptive step-size mechanism is introduced to optimize performance based on the local behavior of the solution. Extensive numerical simulations are conducted to evaluate the proposed methods, demonstrating significant improvements in accuracy and efficiency compared to traditional numerical approaches. The results indicate that our hybrid block methods are robust and versatile, making them suitable for a wide range of applications involving fractional dynamical systems. This work contributes to the existing literature by providing an effective numerical framework for analyzing complex behaviors in fractional systems, thereby opening new avenues for research and practical implementation across various disciplines.

Keywords: fractional calculus, numerical simulation, stability and convergence, Adaptive step-size mechanism, collocation methods

Procedia PDF Downloads 43
2908 Comparison of Deep Learning and Machine Learning Algorithms to Diagnose and Predict Breast Cancer

Authors: F. Ghazalnaz Sharifonnasabi, Iman Makhdoom

Abstract:

Breast cancer is a serious health concern that affects many people around the world. According to a study published in the Breast journal, the global burden of breast cancer is expected to increase significantly over the next few decades. The number of deaths from breast cancer has been increasing over the years, but the age-standardized mortality rate has decreased in some countries. It is important to be aware of the risk factors for breast cancer and to get regular check-ups to catch it early if it does occur. Machine learning techniques have been used to aid in the early detection and diagnosis of breast cancer. These techniques, which have been shown to be effective in predicting and diagnosing the disease, have become a research hotspot. In this study, we consider two deep learning approaches: Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN). We also consider five machine learning algorithms: Decision Tree (C4.5), Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost (eXtreme Gradient Boosting), applied to the Breast Cancer Wisconsin Diagnostic dataset. Evaluating and comparing the classifiers involved selecting appropriate metrics to evaluate classifier performance and an appropriate tool to quantify that performance. The main purpose of the study is to predict and diagnose breast cancer by applying the mentioned algorithms and to discover the most effective one with respect to the confusion matrix, accuracy, and precision. CNN outperformed all other classifiers and achieved the highest accuracy (0.982456). The work is implemented in the Anaconda environment using the Python programming language.
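The comparison criteria named above (confusion matrix, accuracy, precision) are related by simple formulas. The sketch below shows them for a binary classifier; the function name and the counts are hypothetical and do not reproduce the study's confusion matrices.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy and precision from the four cells of a binary confusion matrix.

    tp: malignant cases predicted malignant    fp: benign cases predicted malignant
    fn: malignant cases predicted benign       tn: benign cases predicted benign
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # share of all cases classified correctly
    precision = tp / (tp + fp)     # share of positive predictions that are right
    return accuracy, precision


# Hypothetical counts for 100 test cases.
accuracy, precision = binary_metrics(tp=40, fp=10, fn=5, tn=45)
# accuracy == 0.85, precision == 0.8
```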

Keywords: breast cancer, multi-layer perceptron, Naïve Bayesian, SVM, decision tree, convolutional neural network, XGBoost, KNN

Procedia PDF Downloads 75
2907 Weight Estimation Using the K-Means Method in Steelmaking’s Overhead Cranes in Order to Reduce Swing Error

Authors: Seyedamir Makinejadsanij

Abstract:

One of the most important factors in the production of quality steel is knowing the exact weight of steel in the steelmaking area. In this study, a calculation method is presented to estimate the exact weight of the melt as well as of the objects transported by the overhead crane. Iran Alloy Steel Company's steelmaking area has three 90-ton cranes, which are responsible for transferring the ladles and ladle caps between 34 areas in the melt shop. Each crane is equipped with a Disomat Tersus weighing system that calculates and displays real-time weight. A moving object has a variable measured weight due to swinging, and the weighing system has an error of about ±5%. This means that when a crane is moving an object that weighs about 80 tons, the device (Disomat Tersus system) reads about 4 tons more or 4 tons less, and this is the biggest problem in calculating the real weight. The k-means algorithm, an unsupervised clustering method, was used here. Compared to the plain average (one center) or two, four, five, and six centers, the best result was obtained with 3 centers, which is logically due to the elimination of noise above and below the real weight. Every day, a standard weight is moved with the working cranes to test and calibrate them. The results show that the accuracy is about 40 kilos per 60 tons (standard weight). As a result, with this method, the accuracy of the moving weight is calculated as 99.95%. K-means is used to calculate the exact mean of objects. The stopping criterion of the algorithm is either 1000 repetitions or no points moving between the clusters. As a result of the implementation of this system, the crane operator does not stop while moving objects and continues working regardless of weight calculations. Also, production speed increased, and human error decreased.
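The clustering step described above can be sketched as a one-dimensional k-means over the stream of weight readings. This is an illustrative reading of the abstract, not the author's implementation: taking the middle of the three centers as the weight estimate, the quantile-style initialization, the function names, and the sample readings are all assumptions. The stopping rule follows the abstract (at most 1000 iterations, or no point changing cluster).

```python
def kmeans_1d(values, k=3, max_iter=1000):
    """Plain 1-D k-means; stops when no point changes cluster or after max_iter."""
    data = sorted(values)
    # Assumed deterministic initialization: centers spread across the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break  # no point moved between clusters
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, labels


def estimate_weight(readings):
    """Assumed rule: the middle center, discarding the swing noise above and below."""
    centers, _ = kmeans_1d(readings, k=3)
    return sorted(centers)[1]


# Hypothetical readings (tons) from a swinging load of about 80 tons.
readings = [76.1, 76.3, 79.9, 80.0, 80.1, 80.0, 83.8, 84.0]
estimate = estimate_weight(readings)
```

With these sample readings, the low and high swing readings fall into the two outer clusters, so the middle center averages only the readings near the true weight.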

Keywords: k-means, overhead crane, melt weight, weight estimation, swing problem

Procedia PDF Downloads 90
2906 A New Center of Motion in Cabling Robots

Authors: Alireza Abbasi Moshaii, Farshid Najafi

Abstract:

In this paper, a new model for creating a centre of motion is proposed. The new method uses cables, which makes it very useful in robots because it is light and easy to assemble. It is particularly well suited to robots that need to remain in contact with an object, as described in the paper. The accuracy of the idea is verified by an experiment. The system could be used in robots that need a fixed point of contact with an object while making a circular motion, such as dancer, physician, or repair robots.

Keywords: centre of motion, robotic cables, permanent touching, mechatronics engineering

Procedia PDF Downloads 443
2905 Thick Data Techniques for Identifying Abnormality in Video Frames for Wireless Capsule Endoscopy

Authors: Jinan Fiaidhi, Sabah Mohammed, Petros Zezos

Abstract:

Capsule endoscopy (CE) is an established noninvasive diagnostic modality in investigating small bowel disease. CE has a pivotal role in assessing patients with suspected bleeding or identifying evidence of active Crohn's disease in the small bowel. However, CE produces lengthy videos with at least eighty thousand frames, with a frequency rate of 2 frames per second. Gastroenterologists cannot dedicate 8 to 15 hours to reading the CE video frames to arrive at a diagnosis. This is why the issue of analyzing CE videos based on modern artificial intelligence techniques becomes a necessity. However, machine learning, including deep learning, has failed to report robust results because of the lack of large samples to train its neural nets. In this paper, we are describing a thick data approach that learns from a few anchor images. We are using sound datasets like KVASIR and CrohnIPI to filter candidate frames that include interesting anomalies in any CE video. We are identifying candidate frames based on feature extraction to provide representative measures of the anomaly, like the size of the anomaly and the color contrast compared to the image background, and later feed these features to a decision tree that can classify the candidate frames as having a condition like the Crohn's Disease. Our thick data approach reported accuracy of detecting Crohn's Disease based on the availability of ulcer areas at the candidate frames for KVASIR was 89.9% and for the CrohnIPI was 83.3%. We are continuing our research to fine-tune our approach by adding more thick data methods for enhancing diagnosis accuracy.

Keywords: thick data analytics, capsule endoscopy, Crohn’s disease, siamese neural network, decision tree

Procedia PDF Downloads 156
2904 A 3D Cell-Based Biosensor for Real-Time and Non-Invasive Monitoring of 3D Cell Viability and Drug Screening

Authors: Yuxiang Pan, Yong Qiu, Chenlei Gu, Ping Wang

Abstract:

In the past decade, three-dimensional (3D) tumor cell models have attracted increasing interest in the field of drug screening due to their great advantages in simulating heterogeneous tumor behavior in vivo more accurately. Drug sensitivity testing based on 3D tumor cell models can provide more reliable in vivo efficacy prediction. However, gold-standard fluorescence staining can hardly achieve real-time, label-free monitoring of the viability of 3D tumor cell models. In this study, a micro-groove impedance sensor (MGIS) was specially developed for dynamic and non-invasive monitoring of 3D cell viability. 3D tumor cells were trapped in micro-grooves with opposing gold electrodes for in-situ impedance measurement. A change in the number of live cells causes an inversely proportional change in the impedance magnitude of the entire cell/matrigel construct, reflecting the proliferation and apoptosis of the 3D cells. It was confirmed that the 3D cell viability detected by the MGIS platform is highly consistent with standard live/dead staining. Furthermore, the accuracy of the MGIS platform was demonstrated quantitatively using a 3D lung cancer model and sophisticated drug sensitivity testing. In addition, the parameters of micro-groove impedance chip processing and of the measurement experiments were optimized in detail. The results demonstrated that the MGIS and 3D cell-based biosensor would be a promising platform to improve the efficiency and accuracy of cell-based anti-cancer drug screening in vitro.

Keywords: micro-groove impedance sensor, 3D cell-based biosensors, 3D cell viability, micro-electromechanical systems

Procedia PDF Downloads 128
2903 Improving the Accuracy of Stress Intensity Factors Obtained by Scaled Boundary Finite Element Method on Hybrid Quadtree Meshes

Authors: Adrian W. Egger, Savvas P. Triantafyllou, Eleni N. Chatzi

Abstract:

The scaled boundary finite element method (SBFEM) is a semi-analytical numerical method, which introduces a scaling center in each element’s domain, thus transitioning from a Cartesian reference frame to one resembling polar coordinates. Consequently, an analytical solution is achieved in radial direction, implying that only the boundary need be discretized. The only limitation imposed on the resulting polygonal elements is that they remain star-convex. Further arbitrary p- or h-refinement may be applied locally in a mesh. The polygonal nature of SBFEM elements has been exploited in quadtree meshes to alleviate all issues conventionally associated with hanging nodes. Furthermore, since in 2D this results in only 16 possible cell configurations, these are precomputed in order to accelerate the forward analysis significantly. Any cells, which are clipped to accommodate the domain geometry, must be computed conventionally. However, since SBFEM permits polygonal elements, significantly coarser meshes at comparable accuracy levels are obtained when compared with conventional quadtree analysis, further increasing the computational efficiency of this scheme. The generalized stress intensity factors (gSIFs) are computed by exploiting the semi-analytical solution in radial direction. This is initiated by placing the scaling center of the element containing the crack at the crack tip. Taking an analytical limit of this element’s stress field as it approaches the crack tip, delivers an expression for the singular stress field. By applying the problem specific boundary conditions, the geometry correction factor is obtained, and the gSIFs are then evaluated based on their formal definition. Since the SBFEM solution is constructed as a power series, not unlike mode superposition in FEM, the two modes contributing to the singular response of the element can be easily identified in post-processing. 
Compared to the extended finite element method (XFEM), this approach is highly convenient, since neither enrichment terms nor a priori knowledge of the singularity is required. Computation of the gSIFs by SBFEM permits exceptional accuracy; however, when combined with hybrid quadtrees employing linear elements, this does not always hold. Nevertheless, it has been shown that crack propagation schemes are highly effective even given very coarse discretization, since they only rely on the ratio of mode one to mode two gSIFs. The absolute values of the gSIFs may still be subject to large errors. Hence, we propose a post-processing scheme that minimizes the error resulting from the approximation space of the cracked element, thus limiting the error in the gSIFs to the discretization error of the quadtree mesh. This is achieved by h- and/or p-refinement of the cracked element, which increases the number of modes present in the solution. The resulting numerical description of the element is highly accurate, with the main error source now stemming from its boundary displacement solution. Numerical examples show that this post-processing procedure can significantly improve the accuracy of the computed gSIFs with negligible computational cost, even on coarse meshes resulting from hybrid quadtrees.

Keywords: linear elastic fracture mechanics, generalized stress intensity factors, scaled finite element method, hybrid quadtrees

Procedia PDF Downloads 146
2902 Uniqueness of Fingerprint Biometrics to Human Dynasty: A Review

Authors: Siddharatha Sharma

Abstract:

With the advent of technology and machines, biometrics is taking an important place in society for secure living. Security issues are a major concern in today’s world and continue to grow in intensity and complexity. Biometrics-based recognition, which involves precise measurement of the characteristics of living beings, is not a new method. Fingerprints have been used for many years by law enforcement and forensic agencies to identify culprits and apprehend them. Biometrics is based on four basic principles, i.e., (i) uniqueness, (ii) accuracy, (iii) permanency, and (iv) peculiarity. In today’s world, fingerprints are the most popular and unique biometric method, claiming a social benefit in government-sponsored programs. A remarkable example is the UIDAI (Unique Identification Authority of India). In fingerprint biometrics, the matching accuracy is very high. It has been observed empirically that even identical twins do not have similar prints. With the passage of time, there has been immense progress in sensing techniques, computational speed, operating environment, and storage capabilities, and the technology has become more convenient for users. Only a small fraction of the population may be unsuitable for automatic identification because of genetic factors, aging, or environmental or occupational reasons, for example, workers whose hands have cuts and bruises that keep their fingerprints changing. Fingerprints are limited to human beings because of the presence of volar skin with corrugated ridges, which are unique to this species. Fingerprint biometrics has proved to be a high-level authentication system for the identification of human beings. It has limitations, though: for example, it may be inefficient and ineffective if the ridges of the finger(s) or palm are moist, in which case authentication becomes difficult.
This paper would focus on uniqueness of fingerprints to the human beings in comparison to other living beings and review the advancement in emerging technologies and their limitations.

Keywords: fingerprinting, biometrics, human beings, authentication

Procedia PDF Downloads 325
2901 Electrical Machine Winding Temperature Estimation Using Stateful Long Short-Term Memory Networks (LSTM) and Truncated Backpropagation Through Time (TBPTT)

Authors: Yujiang Wu

Abstract:

As electrical machine (e-machine) power density requirements become more stringent in vehicle electrification, mounting a temperature sensor for e-machine stator windings becomes increasingly difficult. This can lead to higher manufacturing costs, complicated harnesses, and reduced reliability. In this paper, we propose a deep-learning method for predicting electric machine winding temperature, which can either replace the sensor entirely or serve as a backup to the existing sensor. We compare the performance of our method, the stateful long short-term memory network (LSTM) with truncated backpropagation through time (TBPTT), with that of linear regression, as well as stateless LSTM with and without a residual connection. Our results demonstrate the strength of combining stateful LSTM and TBPTT in tackling nonlinear time series prediction problems with long sequence lengths. Additionally, in industrial applications, prediction accuracy in the high-temperature region is more important because winding temperature sensing is typically used for derating machine power when the temperature is high. To evaluate the performance of our algorithm, we developed a temperature-stratified MSE. We propose a simple but effective data preprocessing trick to improve prediction accuracy in the high-temperature region. Our experimental results demonstrate the effectiveness of our proposed method in accurately predicting winding temperature, particularly in high-temperature regions, while also reducing manufacturing costs and improving reliability.
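The abstract does not define the temperature-stratified MSE. One plausible reading, sketched here purely as an assumption, is to bin samples by their true temperature and report a separate mean squared error per bin, so that errors in the high-temperature region are not averaged away by the more numerous low-temperature samples. The function name, bin edges, and values are hypothetical.

```python
def stratified_mse(y_true, y_pred, bin_edges):
    """MSE per temperature bin [lo, hi), keyed by (lo, hi); empty bins are omitted."""
    sums = {}
    for truth, pred in zip(y_true, y_pred):
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            if lo <= truth < hi:
                sq_sum, count = sums.get((lo, hi), (0.0, 0))
                sums[(lo, hi)] = (sq_sum + (truth - pred) ** 2, count + 1)
                break
    return {key: sq_sum / count for key, (sq_sum, count) in sums.items()}


# Hypothetical winding temperatures in degrees Celsius.
y_true = [50, 60, 150, 160]
y_pred = [51, 59, 154, 156]
result = stratified_mse(y_true, y_pred, bin_edges=[0, 100, 200])
# result == {(0, 100): 1.0, (100, 200): 16.0}
```

In this toy case the overall MSE would be 8.5, which understates the error in the hot bin that matters most for derating.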

Keywords: deep learning, electrical machine, functional safety, long short-term memory networks (LSTM), thermal management, time series prediction

Procedia PDF Downloads 99
2900 Establishment of a Test Bed for Integrated Map of Underground Space and Verification of GPR Exploration Equipment

Authors: Jisong Ryu, Woosik Lee, Yonggu Jang

Abstract:

The paper discusses the process of establishing a reliable test bed for verifying the usability of Ground Penetrating Radar (GPR) exploration equipment based on an integrated underground spatial map in Korea. The aim of this study is to construct a test bed consisting of metal and non-metal pipelines to verify the performance of GPR equipment and improve the accuracy of the underground spatial integrated map. The study involved the design and construction of a test bed for metal and non-metal pipe detecting tests. The test bed was built in the SOC Demonstration Research Center (Yeoncheon) of the Korea Institute of Civil Engineering and Building Technology, burying metal and non-metal pipelines up to a depth of 5m. The test bed was designed in both vehicle-type and cart-type GPR-mounted equipment. The study collected data through the construction of the test bed and conducting metal and non-metal pipe detecting tests. The study analyzed the reliability of GPR detecting results by comparing them with the basic drawings, such as the underground space integrated map. The study contributes to the improvement of GPR equipment performance evaluation and the accuracy of the underground spatial integrated map, which is essential for urban planning and construction. The study addressed the question of how to verify the usability of GPR exploration equipment based on an integrated underground spatial map and improve its performance. The study found that the test bed is reliable for verifying the performance of GPR exploration equipment and accurately detecting metal and non-metal pipelines using an integrated underground spatial map. The study concludes that the establishment of a test bed for verifying the usability of GPR exploration equipment based on an integrated underground spatial map is essential. 
The proposed Korean-style test bed can be used for the evaluation of GPR equipment performance and support the construction of a national non-metal pipeline exploration equipment performance evaluation center in Korea.

Keywords: Korea-style GPR testbed, GPR, metal pipe detecting, non-metal pipe detecting

Procedia PDF Downloads 100
2899 Numerical Analysis of a Strainer Using Porous Media Technique

Authors: Ji-Hoon Byeon, Kwon-Hee Lee

Abstract:

Strainer filter serves to block the inflow of impurities while mixed fluid is entering or exiting the piping. The filter of the strainer has a perforated structure, so that the pressure drop and the velocity change necessarily occur when the mixed fluid passes through the filter. It is possible to predict the pressure drop and velocity change of the strainer by numerical analysis by implementing all the perforated plates. However, if the size of the perforated plate exceeds a certain size, it is difficult to perform the numerical analysis, and sometimes we cannot guarantee its accuracy. In this study, we tried to predict the pressure drop and velocity change by using the porous media technique to obtain the equivalent resistance without actual implementation of the perforation shape of the strainer. Ansys-CFX, a commercial software, is used to perform the numerical analysis. The analysis procedure is as follows. Firstly, the unit pattern of the perforated plate is modeled, and the pressure drop is analyzed by varying the velocity by symmetry of the wall surface. Secondly, since the equation for obtaining resistance is a quadratic equation of pressure having unknown velocity, the viscous resistance and the inertia resistance of the perforated plate are obtained from the relationship between pressure and speed. Thirdly, by using the calculated resistance values, the values are substituted into the flat plate implemented as a two-dimensional porous media, and the accuracy is verified by comparing the pressure drop and the velocity change. Fourthly, the pressure drop and velocity change in the whole strainer are analyzed by using the resistance values obtained on the perforated plate in the actual whole strainer model. Using the porous media technique, it is found that pressure drop and velocity change can be predicted in relatively short time without modeling the overall shape of the filter. 
Acknowledgements: This work was supported by the Valve Center from the Regional Innovation Center (RIC) Program of the Ministry of Trade, Industry & Energy (MOTIE).
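The second step of the procedure (recovering a viscous and an inertial resistance from pressure-velocity data) can be sketched as a least-squares fit. This is only an illustration under stated assumptions, not the authors' workflow: the pressure drop is assumed to follow dp = a*v + b*v**2 with no constant term, and the function name and the sample data are hypothetical. How a and b map onto the solver's porous-media inputs depends on the solver's own definitions and is not covered here.

```python
def fit_porous_resistance(velocities, pressure_drops):
    """Least-squares fit of dp = a*v + b*v**2 (no intercept).

    a is the linear (viscous) coefficient and b the quadratic (inertial)
    coefficient. Both names are illustrative assumptions.
    """
    s11 = sum(v ** 2 for v in velocities)
    s12 = sum(v ** 3 for v in velocities)
    s22 = sum(v ** 4 for v in velocities)
    t1 = sum(v * p for v, p in zip(velocities, pressure_drops))
    t2 = sum(v ** 2 * p for v, p in zip(velocities, pressure_drops))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("need at least two distinct non-zero velocities")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


if __name__ == "__main__":
    # Hypothetical unit-cell results generated from a = 2, b = 3.
    v = [1.0, 2.0, 3.0, 4.0]
    dp = [2.0 * x + 3.0 * x ** 2 for x in v]
    print(fit_porous_resistance(v, dp))
```

Forcing the fit through the origin reflects the physical expectation that there is no pressure drop at zero velocity.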

Keywords: strainer, porous media, CFD, numerical analysis

Procedia PDF Downloads 371
2898 Quality Analysis of Vegetables Through Image Processing

Authors: Abdul Khalique Baloch, Ali Okatan

Abstract:

The quality analysis of food and vegetables from images is a hot topic nowadays, with researchers improving on previous findings through different techniques and methods. In this research, we review the literature, identify gaps in it, propose an improved approach, design the algorithm, and develop software that measures quality from images; the resulting accuracy shows better results when compared with previous work done so far. The application uses an open-source dataset and the Python language with the TensorFlow Lite framework. We focus on sorting food and vegetables from images: after processing the images, the application sorts and grades the items, and it could produce fewer errors than manual, human-based grading. Digital picture datasets were created, and the collected images were arranged by class. The classification accuracy of the system was about 94%. As fruits and vegetables play a main role in day-to-day life, their quality is essential in evaluating agricultural produce, and customers always want to buy good-quality fruits and vegetables. This document is about quality detection of fruits and vegetables using images. Many customers suffer from unhealthy food and vegetables provided by suppliers, and no proper quality measurement standard is followed by hotel managements. We have developed software that measures the quality of fruits and vegetables from images and reports whether they are fresh or rotten. The algorithms reviewed in this thesis include digital images, ResNet, VGG16, CNN, and transfer learning for grading and feature extraction. The application uses an open-source image dataset, is written in Python, and defines a framework for the system.

Keywords: deep learning, computer vision, image processing, rotten fruit detection, fruits quality criteria, vegetables quality criteria

Procedia PDF Downloads 70
2897 Land Use Change Detection Using Satellite Images for Najran City, Kingdom of Saudi Arabia (KSA)

Authors: Ismail Elkhrachy

Abstract:

Determining land use change is an important component of regional planning, with applications ranging from urban fringe change detection to monitoring land use change. Such data are very useful for natural resources management. Moreover, the technologies and methods of change detection have evolved dramatically during the past 20 years, and change detection is now well recognized as among the best methods for studying the dynamic change of land use with multi-temporal remotely sensed data. The objective of this paper is to assess, evaluate and monitor land use change in the area surrounding Najran city, Kingdom of Saudi Arabia (KSA), using a Landsat image (June 23, 2009) and an ETM+ image (June 21, 2014). The post-classification change detection technique was applied: two-date subset images of Najran city were compared on a pixel-by-pixel basis using the post-classification comparison method, a from-to change matrix was produced, and the land use change information was obtained. Three classes (urban, bare land and agricultural land) were obtained by unsupervised classification using Erdas Imagine and ArcGIS software. Accuracy assessment of the classification was performed before calculating change detection for the study area. The obtained accuracy is between 61% and 87% for all the classes. Change detection analysis shows that the urban area increased by 73.2%, the agricultural area decreased by 10.5% and the barren area decreased by 7% between 2009 and 2014. The quantitative study indicated that, for the urban class, 58.2 km² was unchanged, 70.3 km² was gained and 16 km² was lost. For the bare land class, 586.4 km² was unchanged, 53.2 km² was gained and 101.5 km² was lost. For the agricultural class, 20.2 km² was unchanged, 31.2 km² was gained and 37.2 km² was lost.
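The percentage changes quoted in the abstract can be cross-checked against the unchanged, gained and lost areas with a few lines of arithmetic. The sketch below assumes that "lost" means area that left the class after 2009 and "gained" means area that entered it by 2014; the function name is illustrative. Under that reading, the reported areas reproduce the reported percentages after rounding.

```python
def net_change_percent(unchanged, gained, lost):
    """Net change of a class area between two dates, in percent.

    Assumes: area at the first date  = unchanged + lost,
             area at the second date = unchanged + gained.
    """
    area_start = unchanged + lost
    area_end = unchanged + gained
    return (area_end - area_start) / area_start * 100.0


# Areas in square kilometres as reported in the abstract:
# (unchanged, gained, lost)
reported_areas = {
    "urban": (58.2, 70.3, 16.0),
    "bare land": (586.4, 53.2, 101.5),
    "agricultural": (20.2, 31.2, 37.2),
}

if __name__ == "__main__":
    for name, (unchanged, gained, lost) in reported_areas.items():
        change = net_change_percent(unchanged, gained, lost)
        print(f"{name}: {change:+.1f}%")
```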

Keywords: land use, remote sensing, change detection, satellite images, image classification

Procedia PDF Downloads 524
2896 A Hybrid Multi-Criteria Hotel Recommender System Using Explicit and Implicit Feedbacks

Authors: Ashkan Ebadi, Adam Krzyzak

Abstract:

Recommender systems, also known as recommender engines, have become an important research area and are now being applied in various fields. In addition, the techniques behind recommender systems have improved over time. In general, such systems help users find the products or services they need (e.g., books, music) by analyzing and aggregating other users’ activities and behavior, mainly in the form of reviews, and making the best recommendations. The recommendations can facilitate the user’s decision-making process. Despite the wide literature on the topic, using multiple data sources of different types as input has not been widely studied. Recommender systems can benefit from the high availability of digital data to collect input data of different types, which implicitly or explicitly help the system improve its accuracy. Moreover, most of the existing research in this area is based on single rating measures, in which a single rating is used to link users to items. This paper proposes a highly accurate hotel recommender system, implemented in several layers. Using a multi-aspect rating system and benefitting from large-scale data of different types, the recommender system suggests hotels that are personalized and tailored for the given user. The system employs natural language processing and topic modelling techniques to assess the sentiment of the users’ reviews and extract implicit features. The entire recommender engine contains multiple sub-systems, namely user clustering, a matrix factorization module, and a hybrid recommender system. Each sub-system contributes to the final composite set of recommendations by covering a specific aspect of the problem. The accuracy of the proposed recommender system has been tested intensively, and the results confirm the high performance of the system.

Keywords: tourism, hotel recommender system, hybrid, implicit features

Procedia PDF Downloads 272
2895 Software Development for AASHTO and Ethiopian Roads Authority Flexible Pavement Design Methods

Authors: Amare Setegn Enyew, Bikila Teklu Wodajo

Abstract:

The primary aim of flexible pavement design is to ensure the development of economical and safe road infrastructure. However, failures can still occur due to improper or erroneous structural design. In Ethiopia, the design of flexible pavements relies on doing calculations manually and selecting pavement structure from catalogue. The catalogue offers, in eight different charts, alternative structures for combinations of traffic and subgrade classes, as outlined in the Ethiopian Roads Authority (ERA) Pavement Design Manual 2001. Furthermore, design modification is allowed in accordance with the structural number principles outlined in the AASHTO 1993 Guide for Design of Pavement Structures. Nevertheless, the manual calculation and design process involves the use of nomographs, charts, tables, and formulas, which increases the likelihood of human errors and inaccuracies, and this may lead to unsafe or uneconomical road construction. To address the challenge, a software called AASHERA has been developed for AASHTO 1993 and ERA design methods, using MATLAB language. The software accurately determines the required thicknesses of flexible pavement surface, base, and subbase layers for the two methods. It also digitizes design inputs and references like nomographs, charts, default values, and tables. Moreover, the software allows easier comparison of the two design methods in terms of results and cost of construction. AASHERA's accuracy has been confirmed through comparisons with designs from handbooks and manuals. The software can aid in reducing human errors, inaccuracies, and time consumption as compared to the conventional manual design methods employed in Ethiopia. AASHERA, with its validated accuracy, proves to be an indispensable tool for flexible pavement structure designers.
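The structural number principle mentioned in the abstract can be illustrated with a short calculation. This is a minimal sketch of the layered structural number relation from the AASHTO 1993 guide, SN = a1*D1 + a2*D2*m2 + a3*D3*m3, and not the AASHERA code (which the abstract says is written in MATLAB). The layer coefficients, thicknesses (in inches) and drainage coefficients below are hypothetical inputs chosen only for illustration.

```python
def structural_number(layers):
    """Sum a_i * D_i * m_i over the pavement layers.

    Each layer is (layer_coefficient, thickness_in_inches, drainage_coefficient).
    The surface layer conventionally uses a drainage coefficient of 1.0.
    """
    return sum(a * d * m for a, d, m in layers)


def meets_required_sn(layers, required_sn):
    """True if the provided structure reaches the required structural number."""
    return structural_number(layers) >= required_sn


if __name__ == "__main__":
    # Hypothetical trial section: surface, base, subbase.
    trial = [(0.44, 4.0, 1.0), (0.14, 8.0, 1.0), (0.11, 10.0, 1.0)]
    print(structural_number(trial), meets_required_sn(trial, 3.5))
```

A design tool would obtain the required structural number from traffic, reliability and subgrade inputs; that step is omitted here.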

Keywords: flexible pavement design, AASHTO 1993, ERA, MATLAB, AASHERA

Procedia PDF Downloads 63
2894 J-Integral Method for Assessment of Structural Integrity of a Pressure Vessel

Authors: Karthik K. R, Viswanath V, Asraff A. K

Abstract:

The first stage of a new-generation launch vehicle of ISRO makes use of large pressure vessels made of aluminium alloy AA2219 to store fuel and oxidizer. These vessels have many weld joints, which may contain cracks or crack-like defects introduced during fabrication. These defects may propagate across the vessel during pressure testing or in service under the influence of tensile stresses, leading to catastrophe. Though ductile materials exhibit significant stable crack growth prior to failure, such growth is not generally acceptable for an aerospace component. There is therefore a need to predict the initiation of stable crack growth. The structural integrity of the vessel from fracture considerations can be studied by constructing the Failure Assessment Diagram (FAD), which accounts for both brittle fracture and plastic collapse. Critical crack sizes of the pressure vessel may be highly conservative if they are predicted from the FAD alone. If the J-R curve for the material under consideration is available a priori, the critical crack sizes can be predicted to a certain degree of accuracy. In this paper, a novel approach is proposed to predict the integrity of a weld in a pressure vessel made of AA2219. The fracture parameter ‘J-integral’ at the crack front, evaluated through finite element analyses, is used in the new procedure. Based on the simulation of tension tests carried out on SCT specimens by NASA, a cut-off value of the J-integral (J_cut-off) is finalised. For the pressure vessel, the J-integral at the crack front is evaluated through FE simulations incorporating different surface cracks at the long seam weld in the cylinder and in the dome petal welds. The J-integral obtained at vessel level is compared with J_cut-off, and the integrity of the vessel weld in the presence of the surface crack is established. The advantage of this methodology is that if SCT test data for any metal are available, the critical crack size in hardware fabricated from that material can be predicted to a better level of accuracy.

Keywords: FAD, J-integral, fracture, surface crack

Procedia PDF Downloads 187
2893 Financial Fraud Prediction for Russian Non-Public Firms Using Relational Data

Authors: Natalia Feruleva

Abstract:

The goal of this paper is to develop a fraud risk assessment model based on both relational and financial data and to test the impact of relationships between Russian non-public companies on the likelihood of committing financial fraud. Relationships are the various linkages between companies, such as parent-subsidiary relationships and person-related relationships. These linkages may provide additional opportunities for committing fraud. Person-related relationships arise when firms share a director or when the director owns another firm. To measure the relationships, the number of companies owned by the CEO and managed by the CEO and the number of subsidiaries were calculated. Moreover, a dummy variable describing the existence of a parent company was included in the model. Control variables such as financial leverage and return on assets were also included because they describe the motivating factors of fraud. To test the hypotheses about the influence of the chosen parameters on the likelihood of financial fraud, information was collected about person-related relationships between companies, the existence of a parent company and subsidiaries, profitability and the level of debt. The resulting sample consists of 160 Russian non-public firms: 80 fraudsters and 80 non-fraudsters operating in 2006-2017. The dependent variable is dichotomous: it takes the value 1 if the firm is engaged in financial crime and 0 otherwise. A probit model revealed that the number of companies that belong to the CEO of the firm or are managed by the CEO has a significant impact on the likelihood of financial fraud. The results indicate that the more companies are affiliated with the CEO, the higher the likelihood that the company will be involved in financial crime. The forecast accuracy of the model is about 80%. Thus, a model based on both relational and financial data gives a high level of forecast accuracy.

Keywords: financial fraud, fraud prediction, non-public companies, regression analysis, relational data

Procedia PDF Downloads 119
2892 Implementation of Fuzzy Version of Block Backward Differentiation Formulas for Solving Fuzzy Differential Equations

Authors: Z. B. Ibrahim, N. Ismail, K. I. Othman

Abstract:

Fuzzy Differential Equations (FDEs) play an important role in modelling many real-life phenomena. FDEs are used to model the behaviour of problems subject to the uncertain, vague or imprecise information that constantly arises in mathematical models in various branches of science and engineering. These uncertainties have to be taken into account in order to obtain a more realistic model, and for many of these models the analytic solution is difficult and sometimes impossible to obtain. Thus, many authors have attempted to extend or modify the existing numerical methods developed for solving Ordinary Differential Equations (ODEs) into fuzzy versions suitable for solving FDEs. In this paper, we propose a fuzzy version of the three-point block method based on Block Backward Differentiation Formulas (FBBDF) for the numerical solution of first-order FDEs. The three-point block FBBDF method is implemented with a uniform step size and produces three new approximations simultaneously at each integration step using the same back values. A Newton iteration of the FBBDF is formulated, and the implementation is based on the predictor and corrector formulas in the PECE mode. For greater efficiency of the block method, the coefficients of the FBBDF are stored at the start of the program. The proposed FBBDF is validated through numerical results on some standard problems found in the literature, and comparisons are made with the existing fuzzy versions of the Modified Simpson and Euler methods in terms of the accuracy of the approximated solutions. The numerical results show that the FBBDF method is more accurate than the Euler method when solving FDEs.

Keywords: block, backward differentiation formulas, first order, fuzzy differential equations

Procedia PDF Downloads 319
2891 Detecting Indigenous Languages: A System for Maya Text Profiling and Machine Learning Classification Techniques

Authors: Alejandro Molina-Villegas, Silvia Fernández-Sabido, Eduardo Mendoza-Vargas, Fátima Miranda-Pestaña

Abstract:

The automatic detection of indigenous languages in digital texts is essential to promote their inclusion in digital media. Underrepresented languages, such as Maya, are often excluded from language detection tools like Google’s language-detection library, LANGDETECT. This study addresses these limitations by developing a hybrid language detection solution that accurately distinguishes Maya (YUA) from Spanish (ES). Two strategies are employed: the first focuses on creating a profile for the Maya language within the LANGDETECT library, while the second involves training a Naive Bayes classification model with two categories, YUA and ES. The process includes comprehensive data preprocessing steps, such as cleaning, normalization, tokenization, and n-gram counting, applied to text samples collected from various sources, including articles from La Jornada Maya, a major newspaper in Mexico and the only media outlet that includes a Maya section. After the training phase, a portion of the data is used to create the YUA profile within LANGDETECT, which achieves an accuracy rate above 95% in identifying the Maya language during testing. Additionally, the Naive Bayes classifier, trained and tested on the same database, achieves an accuracy close to 98% in distinguishing between Maya and Spanish, with further validation through F1 score, recall, and logarithmic scoring, without signs of overfitting. This strategy, which combines the LANGDETECT profile with a Naive Bayes model, highlights an adaptable framework that can be extended to other underrepresented languages in future research. This fills a gap in Natural Language Processing and supports the preservation and revitalization of these languages.
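The second strategy (a two-class Naive Bayes model over n-gram counts) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' pipeline: it uses character trigrams with add-one smoothing, the class and function names are hypothetical, and the six training phrases are short illustrative samples rather than data from the study's corpus.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lower-case the text, pad it with spaces and return its character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = {}       # label -> Counter of n-grams
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocabulary = set()

    def fit(self, samples):
        for text, label in samples:
            grams = char_ngrams(text, self.n)
            self.ngram_counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocabulary.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, counts in self.ngram_counts.items():
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training phrases (illustrative only, not from the study).
TOY_SAMPLES = [
    ("bix a beel", "YUA"),
    ("ma'alob k'iin", "YUA"),
    ("dios bo'otik", "YUA"),
    ("como estas", "ES"),
    ("buenos dias", "ES"),
    ("muchas gracias", "ES"),
]

if __name__ == "__main__":
    model = NgramNaiveBayes().fit(TOY_SAMPLES)
    print(model.predict("ma'alob k'iin"), model.predict("buenos dias"))
```

A realistic evaluation would hold out test data and report metrics such as recall and F1, as the abstract describes.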

Keywords: indigenous languages, language detection, Maya language, Naive Bayes classifier, natural language processing, low-resource languages

Procedia PDF Downloads 16
2890 Heliport Remote Safeguard System Based on Real-Time Stereovision 3D Reconstruction Algorithm

Authors: Ł. Morawiński, C. Jasiński, M. Jurkiewicz, S. Bou Habib, M. Bondyra

Abstract:

With the development of optics, electronics, and computers, vision systems are increasingly used in various areas of life, science, and industry. Vision systems have a huge number of applications. They can be used in quality control, object detection, data reading (e.g., QR codes), etc. A large part of them is used for measurement purposes. Some of them make it possible to obtain a 3D reconstruction of the tested objects or measurement areas. 3D reconstruction algorithms are mostly based on creating depth maps from data that can be acquired by active or passive methods. Because of the specific application in airfield technology, only passive methods are applicable, since other systems already operating on the site could be blinded at most spectral levels. Furthermore, the reconstruction is required to work over long distances, ranging from hundreds of meters to tens of kilometers, with low loss of accuracy even in harsh conditions such as fog, rain, or snow. In response to those requirements, HRESS (Heliport REmote Safeguard System) was developed. Its main part is a rotational head with a two-camera stereovision rig that gathers images around the head through 360 degrees, combined with stereovision 3D reconstruction and point cloud combination. The sub-pixel analysis introduced in the HRESS system makes it possible to obtain an increased distance measurement resolution and an accuracy of about 3% for distances over one kilometer. Ultimately, this leads to more accurate and reliable measurement data in the form of a point cloud. Moreover, the program algorithm introduces operations that filter erroneously collected data out of the point cloud. All activities on the programming, mechanical and optical sides are aimed at obtaining the most accurate 3D reconstruction of the environment in the measurement area.
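Why sub-pixel analysis matters at long range can be seen from the standard rectified-stereo relations. The sketch below is a generic pinhole-camera illustration, not the HRESS algorithm: depth is Z = f*B/d for focal length f (in pixels), baseline B and disparity d, and a disparity uncertainty of delta_d pixels maps to a depth uncertainty of roughly Z**2 / (f*B) * delta_d. All parameter values are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_uncertainty(focal_px, baseline_m, depth_m, disparity_error_px):
    """First-order depth error: dZ ~ Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    # Hypothetical rig: 1000 px focal length, 0.5 m baseline.
    z = depth_from_disparity(1000.0, 0.5, 5.0)
    # Compare whole-pixel matching with a tenth-of-a-pixel estimate.
    print(z, depth_uncertainty(1000.0, 0.5, z, 1.0),
          depth_uncertainty(1000.0, 0.5, z, 0.1))
```

Because the error grows with the square of the distance, reducing the disparity error below one pixel has the largest payoff for distant targets.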

Keywords: airfield monitoring, artificial intelligence, stereovision, 3D reconstruction

Procedia PDF Downloads 124
2889 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs

Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa

Abstract:

Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders increase or decrease hatchability. This study aims to identify the extrinsic parameters affecting the commercial hatchability of local chicken eggs and to determine the most efficient classification model with a hatchability rate greater than 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeders' age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the most influential variable on hatchability. First, the correlation between each parameter and hatchability was checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify the hatchability. This grouping process was conducted using binary classification techniques. Hatchability was negatively correlated with egg weight, breeders' age, shell width, and shell length, and positively correlated with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than single linear models, with the highest coefficient of determination (R²) of 94% and the minimum AIC and BIC values. According to the classification results, RF, CART, and kNN achieved the highest accuracy values of 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, RF is the most appropriate machine learning algorithm for classifying breeder outcomes as economically profitable or not in a commercial hatchery.

Keywords: classification models, egg weight, fertilised eggs, multiple linear regression

Procedia PDF Downloads 87
2888 Local Directional Encoded Derivative Binary Pattern Based Coral Image Classification Using Weighted Distance Gray Wolf Optimization Algorithm

Authors: Annalakshmi G., Sakthivel Murugan S.

Abstract:

This paper presents a local directional encoded derivative binary pattern (LDEDBP) feature extraction method for the classification of submarine coral reef images. Classifying coral reef images using texture features is difficult because of the dissimilarities among class samples. Texture features are extracted with the proposed LDEDBP method. The proposed approach extracts the complete structural arrangement of the local region using the local binary pattern (LBP) and also extracts edge information using the local directional pattern (LDP) from the edge responses available in a particular region, thereby achieving additional discriminative feature values. Typically, the LDP extracts edge details in all eight directions. Integrating the edge responses with the local binary pattern yields a more robust texture descriptor than the other descriptors used in texture feature extraction methods. Finally, the proposed technique is applied to an extreme learning machine (ELM) with a meta-heuristic algorithm known as the weighted distance grey wolf optimizer (WDGWO), which optimizes the input weights and biases of single-hidden-layer feed-forward neural networks (SLFN). In the empirical results, ELM-WDGWO demonstrated better accuracy on all coral datasets, namely RSMAS, EILAT, EILAT2, and MLC, than other state-of-the-art algorithms. The proposed method achieves the highest overall classification accuracy, 94%, among the compared state-of-the-art methods.
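For readers unfamiliar with the local binary pattern that the descriptor builds on, the basic 3x3 LBP code can be sketched as follows. This shows only the classic LBP component, not the proposed LDEDBP descriptor. The neighbour ordering (clockwise from the top-left pixel, first neighbour as the least significant bit) is one common convention and is an assumption here, as is the example patch.

```python
# Offsets (row, column) of the eight neighbours, clockwise from top-left.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(patch):
    """Basic LBP code of the centre pixel of a 3x3 grey-level patch.

    Each neighbour contributes a 1 bit if it is >= the centre value.
    """
    centre = patch[1][1]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if patch[1 + dr][1 + dc] >= centre:
            code |= 1 << bit
    return code


if __name__ == "__main__":
    example_patch = [[6, 5, 2],
                     [7, 6, 1],
                     [9, 8, 7]]
    print(lbp_code(example_patch))
```

A texture descriptor is then typically formed from the histogram of these codes over an image region.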

Keywords: feature extraction, local directional pattern, ELM classifier, GWO optimization

Procedia PDF Downloads 163
2887 Investigation of Different Machine Learning Algorithms in Large-Scale Land Cover Mapping within the Google Earth Engine

Authors: Amin Naboureh, Ainong Li, Jinhu Bian, Guangbin Lei, Hamid Ebrahimy

Abstract:

Large-scale land cover mapping has become a new challenge in the land change and remote sensing field because it involves a large volume of data. Moreover, selecting the right classification method is quite difficult, especially when the study area contains different types of landscapes. This paper compares the performance of different machine learning (ML) algorithms for generating a land cover map of the China-Central Asia–West Asia Corridor, which is considered one of the main parts of the Belt and Road Initiative (BRI) project. The cloud-based Google Earth Engine (GEE) platform was used to generate a land cover map of the study area from Landsat-8 images (2017) by applying three frequently used ML algorithms: random forest (RF), support vector machine (SVM), and artificial neural network (ANN). The selected algorithms were trained and tested using reference data obtained from the MODIS yearly land cover product and very high-resolution satellite images. The study found that, among the three algorithms, RF gave the best result in producing a land cover map for the China-Central Asia–West Asia Corridor, with 91% overall accuracy, whereas ANN gave the worst result, with 85% overall accuracy. The strong performance of GEE in applying different ML algorithms and handling a huge volume of remotely sensed data in the present study shows that it could also help researchers generate reliable long-term land cover change maps. The findings of this research are of great importance for decision-makers and BRI authorities in strategic land use planning.

Keywords: land cover, google earth engine, machine learning, remote sensing

Procedia PDF Downloads 113
2886 The Direct Deconvolution Model for the Large Eddy Simulation of Turbulence

Authors: Ning Chang, Zelong Yuan, Yunpeng Wang, Jianchun Wang

Abstract:

Large eddy simulation (LES) has been extensively used in the investigation of turbulence. LES calculates the grid-resolved large-scale motions and leaves the small scales to be modeled by subfilter-scale (SFS) models. Among the existing SFS models, the deconvolution model has been used successfully in the LES of engineering flows and geophysical flows. Despite the wide application of deconvolution models, the effects of subfilter-scale dynamics and filter anisotropy on the accuracy of SFS modeling have not been investigated in depth. The results of LES are highly sensitive to the selection of filters and the anisotropy of the grid, which has been overlooked in previous research. In the current study, two critical aspects of LES are investigated. Firstly, we analyze the influence of subfilter-scale (SFS) dynamics on the accuracy of direct deconvolution models (DDM) at varying filter-to-grid ratios (FGR) in isotropic turbulence. An array of invertible filters is employed, encompassing Gaussian, Helmholtz I and II, Butterworth, Chebyshev I and II, Cauchy, Pao, and rapidly decaying filters. The significance of the FGR becomes evident, as it acts as a pivotal factor in error control for precise SFS stress prediction. When the FGR is set to 1, the DDM models cannot accurately reconstruct the SFS stress due to the insufficient resolution of SFS dynamics. Notably, prediction capabilities are enhanced at an FGR of 2, resulting in accurate SFS stress reconstruction, except for cases involving the Helmholtz I and II filters. A remarkable precision close to 100% is achieved at an FGR of 4 for all DDM models. Secondly, the study examines filter anisotropy and its impact on the SFS dynamics and LES accuracy. By employing the dynamic Smagorinsky model (DSM), dynamic mixed model (DMM), and direct deconvolution model (DDM) with anisotropic filters, aspect ratios (AR) ranging from 1 to 16 in LES filters are evaluated.
The findings highlight the DDM's proficiency in accurately predicting SFS stresses under highly anisotropic filtering conditions. High correlation coefficients exceeding 90% are observed in the a priori study for the DDM's reconstructed SFS stresses, surpassing those of the DSM and DMM models. However, these correlations tend to decrease as filter anisotropy increases. In the a posteriori studies, the DDM model consistently outperforms the DSM and DMM models across various turbulence statistics, encompassing velocity spectra, probability density functions related to vorticity, SFS energy flux, velocity increments, strain-rate tensors, and SFS stress. As filter anisotropy intensifies, the results of the DSM and DMM become worse, while the DDM continues to deliver satisfactory results across all filter-anisotropy scenarios. The findings emphasize the DDM framework's potential as a valuable tool for advancing the development of sophisticated SFS models for LES of turbulence.
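The idea of direct deconvolution with an invertible filter, and the role of the filter-to-grid ratio, can be illustrated for a single Fourier mode. This is a rough sketch under stated assumptions, not the authors' implementation: it uses the commonly quoted Gaussian transfer function G(k) = exp(-k**2 * delta**2 / 24) for a filter of width delta, takes the grid cut-off wavenumber as pi / h for grid spacing h, and all function names are illustrative.

```python
import math


def gaussian_transfer(k, delta):
    """Assumed Gaussian filter transfer function at wavenumber k."""
    return math.exp(-(k * delta) ** 2 / 24.0)


def filter_mode(amplitude, k, delta):
    """Amplitude of a Fourier mode after filtering."""
    return amplitude * gaussian_transfer(k, delta)


def deconvolve_mode(filtered_amplitude, k, delta):
    """Recover the unfiltered amplitude by dividing by the transfer function."""
    return filtered_amplitude / gaussian_transfer(k, delta)


def attenuation_at_grid_cutoff(fgr):
    """Transfer function at k_c = pi / h when delta = fgr * h.

    The product k_c * delta equals pi * fgr, so h cancels out.
    """
    return gaussian_transfer(math.pi * fgr, 1.0)


if __name__ == "__main__":
    for fgr in (1, 2, 4):
        print(fgr, attenuation_at_grid_cutoff(fgr))
```

Under these assumptions, a larger filter-to-grid ratio means the filtered field carries less energy near the grid cut-off, so more of what the inverse filter must recover lies within the resolved wavenumber range.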

Keywords: deconvolution model, large eddy simulation, subfilter scale modeling, turbulence

Procedia PDF Downloads 75
2885 Comparison of Different Artificial Intelligence-Based Protein Secondary Structure Prediction Methods

Authors: Jamerson Felipe Pereira Lima, Jeane Cecília Bezerra de Melo

Abstract:

The difficulty and cost of obtaining protein tertiary structure information through experimental methods, such as X-ray crystallography or NMR spectroscopy, have driven the development of computational methods for the same purpose. One computational approach is to predict the three-dimensional structure from the residue chain; however, this has been shown to be an NP-hard problem, owing to the complexity of the process, as explained by the Levinthal paradox. An alternative is the prediction of intermediate structures, such as the secondary structure of the protein. Artificial intelligence methods, such as Bayesian statistics, artificial neural networks (ANN), and support vector machines (SVM), among others, have been used to predict protein secondary structure. Owing to their good results, artificial neural networks have become a standard method for predicting protein secondary structure. Recently published methods that use this technique generally achieve a Q3 accuracy between 75% and 83%, whereas the theoretical accuracy limit for protein prediction is 88%. Alternatively, to achieve better results, prediction methods based on support vector machines have been developed. The statistical evaluation of methods that use different AI techniques, such as ANNs and SVMs, is not a trivial problem, since different training sets, validation techniques, and other variables can influence the behavior of a prediction method. In this study, we propose a prediction method based on artificial neural networks, which is then compared with a selected SVM method. The chosen SVM protein secondary structure prediction method is the one proposed by Huang in his work Extracting Physicochemical Features to Predict Protein Secondary Structure (2013).
The developed ANN method uses the same training and testing process that Huang used to validate his method, which comprises the CB513 protein data set and three-fold cross-validation, so that the comparative analysis can be made by directly comparing the statistical results of each method.
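The Q3 measure and the three-fold protocol mentioned in the abstract can be sketched briefly. This is an illustrative example, not the authors' evaluation code: Q3 is taken as the fraction of residues whose three-state label (H for helix, E for strand, C for coil) is predicted correctly, and the fold assignment (every k-th index) is an arbitrary assumption, since the abstract does not say how the folds were formed.

```python
def q3_accuracy(true_states, predicted_states):
    """Fraction of residues with a correctly predicted H/E/C state."""
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have the same length")
    if not true_states:
        raise ValueError("sequences must not be empty")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return correct / len(true_states)


def k_fold_indices(n_items, k=3):
    """Return k (train_indices, test_indices) pairs.

    Fold i holds out items i, i + k, i + 2k, ... (an illustrative choice).
    """
    folds = []
    for i in range(k):
        test = list(range(i, n_items, k))
        held_out = set(test)
        train = [j for j in range(n_items) if j not in held_out]
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    print(q3_accuracy("HHHEECCC", "HHHECCCC"))
    print(k_fold_indices(6))
```

Evaluating two methods on identical folds, as the abstract describes, removes the data split as a source of difference between their reported scores.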

Keywords: artificial neural networks, protein secondary structure, protein structure prediction, support vector machines

Procedia PDF Downloads 621
2884 An Automated Stock Investment System Using Machine Learning Techniques: An Application in Australia

Authors: Carol Anne Hargreaves

Abstract:

A key issue in stock investment is how to select representative features for stock selection. The first objective of this paper is to determine whether an automated stock investment system, using machine learning techniques, may be used to identify a portfolio of growth stocks that are highly likely to provide returns better than the stock market index. The second objective is to identify the technical features that best characterize whether a stock’s price is likely to go up and to identify the most important factors and their contribution to predicting the likelihood of the stock price going up. Unsupervised machine learning techniques, such as cluster analysis, were applied to the stock data to identify a cluster of stocks that was likely to go up in price (portfolio 1). Next, principal component analysis was used to select stocks that rated high on component one and component two (portfolio 2). Thirdly, a supervised machine learning technique, logistic regression, was used to select stocks with a high probability of their price going up (portfolio 3). The predictive models were validated with metrics such as sensitivity (recall), specificity and overall accuracy. All accuracy measures were above 70%. All portfolios outperformed the market by more than eight times. The top three stocks were selected for each of the three stock portfolios and traded in the market for one month. After one month, the return for each stock portfolio was computed and compared with the stock market index return. The returns were 23.87% for the principal component analysis portfolio, 11.65% for the logistic regression portfolio and 8.88% for the K-means cluster portfolio, while the stock market return was 0.38%. This study confirms that an automated stock investment system using machine learning techniques can identify top-performing stock portfolios that outperform the stock market.
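The validation metrics named in the abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts in the example are hypothetical and are not results from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity (recall), specificity and overall accuracy.

    tp/fn: positive cases predicted correctly / incorrectly.
    tn/fp: negative cases predicted correctly / incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical counts for a "price goes up" classifier.
    print(binary_metrics(tp=40, fn=10, tn=30, fp=20))
```

Reporting sensitivity and specificity alongside accuracy shows whether a model is better at spotting rising stocks or at ruling out falling ones, which a single accuracy figure can hide.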

Keywords: machine learning, stock market trading, logistic regression, cluster analysis, factor analysis, decision trees, neural networks, automated stock investment system

Procedia PDF Downloads 157