Search results for: prediction model accuracy
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 19244

Search results for: prediction model accuracy

19094 Infilling Strategies for Surrogate Model Based Multi-disciplinary Analysis and Applications to Velocity Prediction Programs

Authors: Malo Pocheau-Lesteven, Olivier Le Maître

Abstract:

Engineering and optimisation of complex systems is often achieved through multi-disciplinary analysis of the system, where each subsystem is modeled and interacts with other subsystems to model the complete system. The coherence of the output of the different sub-systems is achieved through the use of compatibility constraints, which enforce the coupling between the different subsystems. Due to the complexity of some sub-systems and the computational cost of evaluating their respective models, it is often necessary to build surrogate models of these subsystems to allow repeated evaluation these subsystems at a relatively low computational cost. In this paper, gaussian processes are used, as their probabilistic nature is leveraged to evaluate the likelihood of satisfying the compatibility constraints. This paper presents infilling strategies to build accurate surrogate models of the subsystems in areas where they are likely to meet the compatibility constraint. It is shown that these infilling strategies can reduce the computational cost of building surrogate models for a given level of accuracy. An application of these methods to velocity prediction programs used in offshore racing naval architecture further demonstrates these method's applicability in a real engineering context. Also, some examples of the application of uncertainty quantification to field of naval architecture are presented.

Keywords: infilling strategy, gaussian process, multi disciplinary analysis, velocity prediction program

Procedia PDF Downloads 134
19093 Market Index Trend Prediction using Deep Learning and Risk Analysis

Authors: Shervin Alaei, Reza Moradi

Abstract:

Trading in financial markets is subject to risks due to their high volatilities. Here, using an LSTM neural network, and by doing some risk-based feature engineering tasks, we developed a method that can accurately predict trends of the Tehran stock exchange market index from a few days ago. Our test results have shown that the proposed method with an average prediction accuracy of more than 94% is superior to the other common machine learning algorithms. To the best of our knowledge, this is the first work incorporating deep learning and risk factors to accurately predict market trends.

Keywords: deep learning, LSTM, trend prediction, risk management, artificial neural networks

Procedia PDF Downloads 119
19092 Prediction of Soil Liquefaction by Using UBC3D-PLM Model in PLAXIS

Authors: A. Daftari, W. Kudla

Abstract:

Liquefaction is a phenomenon in which the strength and stiffness of a soil is reduced by earthquake shaking or other rapid cyclic loading. Liquefaction and related phenomena have been responsible for huge amounts of damage in historical earthquakes around the world. Modelling of soil behaviour is the main step in soil liquefaction prediction process. Nowadays, several constitutive models for sand have been presented. Nevertheless, only some of them can satisfy this mechanism. One of the most useful models in this term is UBCSAND model. In this research, the capability of this model is considered by using PLAXIS software. The real data of superstition hills earthquake 1987 in the Imperial Valley was used. The results of the simulation have shown resembling trend of the UBC3D-PLM model.

Keywords: liquefaction, plaxis, pore-water pressure, UBC3D-PLM

Procedia PDF Downloads 284
19091 Homeless Population Modeling and Trend Prediction Through Identifying Key Factors and Machine Learning

Authors: Shayla He

Abstract:

Background and Purpose: According to Chamie (2017), it’s estimated that no less than 150 million people, or about 2 percent of the world’s population, are homeless. The homeless population in the United States has grown rapidly in the past four decades. In New York City, the sheltered homeless population has increased from 12,830 in 1983 to 62,679 in 2020. Knowing the trend on the homeless population is crucial at helping the states and the cities make affordable housing plans, and other community service plans ahead of time to better prepare for the situation. This study utilized the data from New York City, examined the key factors associated with the homelessness, and developed systematic modeling to predict homeless populations of the future. Using the best model developed, named HP-RNN, an analysis on the homeless population change during the months of 2020 and 2021, which were impacted by the COVID-19 pandemic, was conducted. Moreover, HP-RNN was tested on the data from Seattle. Methods: The methodology involves four phases in developing robust prediction methods. Phase 1 gathered and analyzed raw data of homeless population and demographic conditions from five urban centers. Phase 2 identified the key factors that contribute to the rate of homelessness. In Phase 3, three models were built using Linear Regression, Random Forest, and Recurrent Neural Network (RNN), respectively, to predict the future trend of society's homeless population. Each model was trained and tuned based on the dataset from New York City for its accuracy measured by Mean Squared Error (MSE). In Phase 4, the final phase, the best model from Phase 3 was evaluated using the data from Seattle that was not part of the model training and tuning process in Phase 3. Results: Compared to the Linear Regression based model used by HUD et al (2019), HP-RNN significantly improved the prediction metrics of Coefficient of Determination (R2) from -11.73 to 0.88 and MSE by 99%. HP-RNN was then validated on the data from Seattle, WA, which showed a peak %error of 14.5% between the actual and the predicted count. Finally, the modeling results were collected to predict the trend during the COVID-19 pandemic. It shows a good correlation between the actual and the predicted homeless population, with the peak %error less than 8.6%. Conclusions and Implications: This work is the first work to apply RNN to model the time series of the homeless related data. The Model shows a close correlation between the actual and the predicted homeless population. There are two major implications of this result. First, the model can be used to predict the homeless population for the next several years, and the prediction can help the states and the cities plan ahead on affordable housing allocation and other community service to better prepare for the future. Moreover, this prediction can serve as a reference to policy makers and legislators as they seek to make changes that may impact the factors closely associated with the future homeless population trend.

Keywords: homeless, prediction, model, RNN

Procedia PDF Downloads 96
19090 Machine Learning Approach for Yield Prediction in Semiconductor Production

Authors: Heramb Somthankar, Anujoy Chakraborty

Abstract:

This paper presents a classification study on yield prediction in semiconductor production using machine learning approaches. A complicated semiconductor production process is generally monitored continuously by signals acquired from sensors and measurement sites. A monitoring system contains a variety of signals, all of which contain useful information, irrelevant information, and noise. In the case of each signal being considered a feature, "Feature Selection" is used to find the most relevant signals. The open-source UCI SECOM Dataset provides 1567 such samples, out of which 104 fail in quality assurance. Feature extraction and selection are performed on the dataset, and useful signals were considered for further study. Afterward, common machine learning algorithms were employed to predict whether the signal yields pass or fail. The most relevant algorithm is selected for prediction based on the accuracy and loss of the ML model.

Keywords: deep learning, feature extraction, feature selection, machine learning classification algorithms, semiconductor production monitoring, signal processing, time-series analysis

Procedia PDF Downloads 85
19089 The Direct Deconvolutional Model in the Large-Eddy Simulation of Turbulence

Authors: Ning Chang, Zelong Yuan, Yunpeng Wang, Jianchun Wang

Abstract:

The utilization of Large Eddy Simulation (LES) has been extensive in turbulence research. LES concentrates on resolving the significant grid-scale motions while representing smaller scales through subfilter-scale (SFS) models. The deconvolution model, among the available SFS models, has proven successful in LES of engineering and geophysical flows. Nevertheless, the thorough investigation of how sub-filter scale dynamics and filter anisotropy affect SFS modeling accuracy remains lacking. The outcomes of LES are significantly influenced by filter selection and grid anisotropy, factors that have not been adequately addressed in earlier studies. This study examines two crucial aspects of LES: Firstly, the accuracy of direct deconvolution models (DDM) is evaluated concerning sub-filter scale (SFS) dynamics across varying filter-to-grid ratios (FGR) in isotropic turbulence. Various invertible filters are employed, including Gaussian, Helmholtz I and II, Butterworth, Chebyshev I and II, Cauchy, Pao, and rapidly decaying filters. The importance of FGR becomes evident as it plays a critical role in controlling errors for precise SFS stress prediction. When FGR is set to 1, the DDM models struggle to faithfully reconstruct SFS stress due to inadequate resolution of SFS dynamics. Notably, prediction accuracy improves when FGR is set to 2, leading to accurate reconstruction of SFS stress, except for cases involving Helmholtz I and II filters. Remarkably high precision, nearly 100%, is achieved at an FGR of 4 for all DDM models. Furthermore, the study extends to filter anisotropy and its impact on SFS dynamics and LES accuracy. By utilizing the dynamic Smagorinsky model (DSM), dynamic mixed model (DMM), and direct deconvolution model (DDM) with anisotropic filters, aspect ratios (AR) ranging from 1 to 16 are examined in LES filters. The results emphasize the DDM’s proficiency in accurately predicting SFS stresses under highly anisotropic filtering conditions. Notably high correlation coefficients exceeding 90% are observed in the a priori study for the DDM’s reconstructed SFS stresses, surpassing those of the DSM and DMM models. However, these correlations tend to decrease as filter anisotropy increases. In the a posteriori analysis, the DDM model consistently outperforms the DSM and DMM models across various turbulence statistics, including velocity spectra, probability density functions related to vorticity, SFS energy flux, velocity increments, strainrate tensors, and SFS stress. It is evident that as filter anisotropy intensifies, the results of DSM and DMM deteriorate, while the DDM consistently delivers satisfactory outcomes across all filter-anisotropy scenarios. These findings underscore the potential of the DDM framework as a valuable tool for advancing the development of sophisticated SFS models for LES in turbulence research.

Keywords: deconvolution model, large eddy simulation, subfilter scale modeling, turbulence

Procedia PDF Downloads 49
19088 Springback Prediction for Sheet Metal Cold Stamping Using Convolutional Neural Networks

Authors: Lei Zhu, Nan Li

Abstract:

Cold stamping has been widely applied in the automotive industry for the mass production of a great range of automotive panels. Predicting the springback to ensure the dimensional accuracy of the cold-stamped components is a critical step. The main approaches for the prediction and compensation of springback in cold stamping include running Finite Element (FE) simulations and conducting experiments, which require forming process expertise and can be time-consuming and expensive for the design of cold stamping tools. Machine learning technologies have been proven and successfully applied in learning complex system behaviours using presentative samples. These technologies exhibit the promising potential to be used as supporting design tools for metal forming technologies. This study, for the first time, presents a novel application of a Convolutional Neural Network (CNN) based surrogate model to predict the springback fields for variable U-shape cold bending geometries. A dataset is created based on the U-shape cold bending geometries and the corresponding FE simulations results. The dataset is then applied to train the CNN surrogate model. The result shows that the surrogate model can achieve near indistinguishable full-field predictions in real-time when compared with the FE simulation results. The application of CNN in efficient springback prediction can be adopted in industrial settings to aid both conceptual and final component designs for designers without having manufacturing knowledge.

Keywords: springback, cold stamping, convolutional neural networks, machine learning

Procedia PDF Downloads 124
19087 Electricity Demand Modeling and Forecasting in Singapore

Authors: Xian Li, Qing-Guo Wang, Jiangshuai Huang, Jidong Liu, Ming Yu, Tan Kok Poh

Abstract:

In power industry, accurate electricity demand forecasting for a certain leading time is important for system operation and control, etc. In this paper, we investigate the modeling and forecasting of Singapore’s electricity demand. Several standard models, such as HWT exponential smoothing model, the ARMA model and the ANNs model have been proposed based on historical demand data. We applied them to Singapore electricity market and proposed three refinements based on simulation to improve the modeling accuracy. Compared with existing models, our refined model can produce better forecasting accuracy. It is demonstrated in the simulation that by adding forecasting error into the forecasting equation, the modeling accuracy could be improved greatly.

Keywords: power industry, electricity demand, modeling, forecasting

Procedia PDF Downloads 615
19086 Virtual Metrology for Copper Clad Laminate Manufacturing

Authors: Misuk Kim, Seokho Kang, Jehyuk Lee, Hyunchang Cho, Sungzoon Cho

Abstract:

In semiconductor manufacturing, virtual metrology (VM) refers to methods to predict properties of a wafer based on machine parameters and sensor data of the production equipment, without performing the (costly) physical measurement of the wafer properties (Wikipedia). Additional benefits include avoidance of human bias and identification of important factors affecting the quality of the process which allow improving the process quality in the future. It is however rare to find VM applied to other areas of manufacturing. In this work, we propose to use VM to copper clad laminate (CCL) manufacturing. CCL is a core element of a printed circuit board (PCB) which is used in smartphones, tablets, digital cameras, and laptop computers. The manufacturing of CCL consists of three processes: Treating, lay-up, and pressing. Treating, the most important process among the three, puts resin on glass cloth, heat up in a drying oven, then produces prepreg for lay-up process. In this process, three important quality factors are inspected: Treated weight (T/W), Minimum Viscosity (M/V), and Gel Time (G/T). They are manually inspected, incurring heavy cost in terms of time and money, which makes it a good candidate for VM application. We developed prediction models of the three quality factors T/W, M/V, and G/T, respectively, with process variables, raw material, and environment variables. The actual process data was obtained from a CCL manufacturer. A variety of variable selection methods and learning algorithms were employed to find the best prediction model. We obtained prediction models of M/V and G/T with a high enough accuracy. They also provided us with information on “important” predictor variables, some of which the process engineers had been already aware and the rest of which they had not. They were quite excited to find new insights that the model revealed and set out to do further analysis on them to gain process control implications. T/W did not turn out to be possible to predict with a reasonable accuracy with given factors. The very fact indicates that the factors currently monitored may not affect T/W, thus an effort has to be made to find other factors which are not currently monitored in order to understand the process better and improve the quality of it. In conclusion, VM application to CCL’s treating process was quite successful. The newly built quality prediction model allowed one to reduce the cost associated with actual metrology as well as reveal some insights on the factors affecting the important quality factors and on the level of our less than perfect understanding of the treating process.

Keywords: copper clad laminate, predictive modeling, quality control, virtual metrology

Procedia PDF Downloads 332
19085 6D Posture Estimation of Road Vehicles from Color Images

Authors: Yoshimoto Kurihara, Tad Gonsalves

Abstract:

Currently, in the field of object posture estimation, there is research on estimating the position and angle of an object by storing a 3D model of the object to be estimated in advance in a computer and matching it with the model. However, in this research, we have succeeded in creating a module that is much simpler, smaller in scale, and faster in operation. Our 6D pose estimation model consists of two different networks – a classification network and a regression network. From a single RGB image, the trained model estimates the class of the object in the image, the coordinates of the object, and its rotation angle in 3D space. In addition, we compared the estimation accuracy of each camera position, i.e., the angle from which the object was captured. The highest accuracy was recorded when the camera position was 75°, the accuracy of the classification was about 87.3%, and that of regression was about 98.9%.

Keywords: 6D posture estimation, image recognition, deep learning, AlexNet

Procedia PDF Downloads 123
19084 Additive Weibull Model Using Warranty Claim and Finite Element Analysis Fatigue Analysis

Authors: Kanchan Mondal, Dasharath Koulage, Dattatray Manerikar, Asmita Ghate

Abstract:

This paper presents an additive reliability model using warranty data and Finite Element Analysis (FEA) data. Warranty data for any product gives insight to its underlying issues. This is often used by Reliability Engineers to build prediction model to forecast failure rate of parts. But there is one major limitation in using warranty data for prediction. Warranty periods constitute only a small fraction of total lifetime of a product, most of the time it covers only the infant mortality and useful life zone of a bathtub curve. Predicting with warranty data alone in these cases is not generally provide results with desired accuracy. Failure rate of a mechanical part is driven by random issues initially and wear-out or usage related issues at later stages of the lifetime. For better predictability of failure rate, one need to explore the failure rate behavior at wear out zone of a bathtub curve. Due to cost and time constraints, it is not always possible to test samples till failure, but FEA-Fatigue analysis can provide the failure rate behavior of a part much beyond warranty period in a quicker time and at lesser cost. In this work, the authors proposed an Additive Weibull Model, which make use of both warranty and FEA fatigue analysis data for predicting failure rates. It involves modeling of two data sets of a part, one with existing warranty claims and other with fatigue life data. Hazard rate base Weibull estimation has been used for the modeling the warranty data whereas S-N curved based Weibull parameter estimation is used for FEA data. Two separate Weibull models’ parameters are estimated and combined to form the proposed Additive Weibull Model for prediction.

Keywords: bathtub curve, fatigue, FEA, reliability, warranty, Weibull

Procedia PDF Downloads 46
19083 Model Averaging in a Multiplicative Heteroscedastic Model

Authors: Alan Wan

Abstract:

In recent years, the body of literature on frequentist model averaging in statistics has grown significantly. Most of this work focuses on models with different mean structures but leaves out the variance consideration. In this paper, we consider a regression model with multiplicative heteroscedasticity and develop a model averaging method that combines maximum likelihood estimators of unknown parameters in both the mean and variance functions of the model. Our weight choice criterion is based on a minimisation of a plug-in estimator of the model average estimator's squared prediction risk. We prove that the new estimator possesses an asymptotic optimality property. Our investigation of finite-sample performance by simulations demonstrates that the new estimator frequently exhibits very favourable properties compared to some existing heteroscedasticity-robust model average estimators. The model averaging method hedges against the selection of very bad models and serves as a remedy to variance function misspecification, which often discourages practitioners from modeling heteroscedasticity altogether. The proposed model average estimator is applied to the analysis of two real data sets.

Keywords: heteroscedasticity-robust, model averaging, multiplicative heteroscedasticity, plug-in, squared prediction risk

Procedia PDF Downloads 341
19082 Data-Driven Approach to Predict Inpatient's Estimated Discharge Date

Authors: Ayliana Dharmawan, Heng Yong Sheng, Zhang Xiaojin, Tan Thai Lian

Abstract:

To facilitate discharge planning, doctors are presently required to assign an Estimated Discharge Date (EDD) for each patient admitted to the hospital. This assignment of the EDD is largely based on the doctor’s judgment. This can be difficult for cases which are complex or relatively new to the doctor. It is hypothesized that a data-driven approach would be able to facilitate the doctors to make accurate estimations of the discharge date. Making use of routinely collected data on inpatient discharges between January 2013 and May 2016, a predictive model was developed using machine learning techniques to predict the Length of Stay (and hence the EDD) of inpatients, at the point of admission. The predictive performance of the model was compared to that of the clinicians using accuracy measures. Overall, the best performing model was found to be able to predict EDD with an accuracy improvement in Average Squared Error (ASE) by -38% as compared to the first EDD determined by the present method. It was found that important predictors of the EDD include the provisional diagnosis code, patient’s age, attending doctor at admission, medical specialty at admission, accommodation type, and the mean length of stay of the patient in the past year. The predictive model can be used as a tool to accurately predict the EDD.

Keywords: inpatient, estimated discharge date, EDD, prediction, data-driven

Procedia PDF Downloads 144
19081 Using Water Erosion Prediction Project Simulation Model for Studying Some Soil Properties in Egypt

Authors: H. A. Mansour

Abstract:

The objective of this research work is studying the water use prediction, prediction technology for water use by action agencies, and others involved in conservation, planning, and environmental assessment of the Water Erosion Prediction Project (WEPP) simulation model. Models the important physical, processes governing erosion in Egypt (climate, infiltration, runoff, ET, detachment by raindrops, detachment by flowing water, deposition, etc.). Simulation of the non-uniform slope, soils, cropping/management., and Egyptian databases for climate, soils, and crops. The study included important parameters in Egyptian conditions as follows: Water Balance & Percolation, Soil Component (Tillage impacts), Plant Growth & Residue Decomposition, Overland Flow Hydraulics. It could be concluded that we can adapt the WEPP simulation model to determining the previous important parameters under Egyptian conditions.

Keywords: WEPP, adaptation, soil properties, tillage impacts, water balance, soil percolation

Procedia PDF Downloads 271
19080 Machine Learning Approaches Based on Recency, Frequency, Monetary (RFM) and K-Means for Predicting Electrical Failures and Voltage Reliability in Smart Cities

Authors: Panaya Sudta, Wanchalerm Patanacharoenwong, Prachya Bumrungkun

Abstract:

As With the evolution of smart grids, ensuring the reliability and efficiency of electrical systems in smart cities has become crucial. This paper proposes a distinct approach that combines advanced machine learning techniques to accurately predict electrical failures and address voltage reliability issues. This approach aims to improve the accuracy and efficiency of reliability evaluations in smart cities. The aim of this research is to develop a comprehensive predictive model that accurately predicts electrical failures and voltage reliability in smart cities. This model integrates RFM analysis, K-means clustering, and LSTM networks to achieve this objective. The research utilizes RFM analysis, traditionally used in customer value assessment, to categorize and analyze electrical components based on their failure recency, frequency, and monetary impact. K-means clustering is employed to segment electrical components into distinct groups with similar characteristics and failure patterns. LSTM networks are used to capture the temporal dependencies and patterns in customer data. This integration of RFM, K-means, and LSTM results in a robust predictive tool for electrical failures and voltage reliability. The proposed model has been tested and validated on diverse electrical utility datasets. The results show a significant improvement in prediction accuracy and reliability compared to traditional methods, achieving an accuracy of 92.78% and an F1-score of 0.83. This research contributes to the proactive maintenance and optimization of electrical infrastructures in smart cities. It also enhances overall energy management and sustainability. The integration of advanced machine learning techniques in the predictive model demonstrates the potential for transforming the landscape of electrical system management within smart cities. The research utilizes diverse electrical utility datasets to develop and validate the predictive model. RFM analysis, K-means clustering, and LSTM networks are applied to these datasets to analyze and predict electrical failures and voltage reliability. The research addresses the question of how accurately electrical failures and voltage reliability can be predicted in smart cities. It also investigates the effectiveness of integrating RFM analysis, K-means clustering, and LSTM networks in achieving this goal. The proposed approach presents a distinct, efficient, and effective solution for predicting and mitigating electrical failures and voltage issues in smart cities. It significantly improves prediction accuracy and reliability compared to traditional methods. This advancement contributes to the proactive maintenance and optimization of electrical infrastructures, overall energy management, and sustainability in smart cities.

Keywords: electrical state prediction, smart grids, data-driven method, long short-term memory, RFM, k-means, machine learning

Procedia PDF Downloads 27
19079 Enhancing Patch Time Series Transformer with Wavelet Transform for Improved Stock Prediction

Authors: Cheng-yu Hsieh, Bo Zhang, Ahmed Hambaba

Abstract:

Stock market prediction has long been an area of interest for both expert analysts and investors, driven by its complexity and the noisy, volatile conditions it operates under. This research examines the efficacy of combining the Patch Time Series Transformer (PatchTST) with wavelet transforms, specifically focusing on Haar and Daubechies wavelets, in forecasting the adjusted closing price of the S&P 500 index for the following day. By comparing the performance of the augmented PatchTST models with traditional predictive models such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Transformers, this study highlights significant enhancements in prediction accuracy. The integration of the Daubechies wavelet with PatchTST notably excels, surpassing other configurations and conventional models in terms of Mean Absolute Error (MAE) and Mean Squared Error (MSE). The success of the PatchTST model paired with Daubechies wavelet is attributed to its superior capability in extracting detailed signal information and eliminating irrelevant noise, thus proving to be an effective approach for financial time series forecasting.

Keywords: deep learning, financial forecasting, stock market prediction, patch time series transformer, wavelet transform

Procedia PDF Downloads 1
19078 Applying the Regression Technique for ‎Prediction of the Acute Heart Attack ‎

Authors: Paria Soleimani, Arezoo Neshati

Abstract:

Myocardial infarction is one of the leading causes of ‎death in the world. Some of these deaths occur even before the patient ‎reaches the hospital. Myocardial infarction occurs as a result of ‎impaired blood supply. Because the most of these deaths are due to ‎coronary artery disease, hence the awareness of the warning signs of a ‎heart attack is essential. Some heart attacks are sudden and intense, but ‎most of them start slowly, with mild pain or discomfort, then early ‎detection and successful treatment of these symptoms is vital to save ‎them. Therefore, importance and usefulness of a system designing to ‎assist physicians in the early diagnosis of the acute heart attacks is ‎obvious.‎ The purpose of this study is to determine how well a predictive ‎model would perform based on the only patient-reportable clinical ‎history factors, without using diagnostic tests or physical exams. This ‎type of the prediction model might have application outside of the ‎hospital setting to give accurate advice to patients to influence them to ‎seek care in appropriate situations. For this purpose, the data were ‎collected on 711 heart patients in Iran hospitals. 28 attributes of clinical ‎factors can be reported by patients; were studied. Three logistic ‎regression models were made on the basis of the 28 features to predict ‎the risk of heart attacks. The best logistic regression model in terms of ‎performance had a C-index of 0.955 and with an accuracy of 94.9%. ‎The variables, severe chest pain, back pain, cold sweats, shortness of ‎breath, nausea, and vomiting were selected as the main features.‎

Keywords: Coronary heart disease, Acute heart attacks, Prediction, Logistic ‎regression‎

Procedia PDF Downloads 429
19077 [Keynote Speech]: Feature Selection and Predictive Modeling of Housing Data Using Random Forest

Authors: Bharatendra Rai

Abstract:

Predictive data analysis and modeling involving machine learning techniques become challenging in presence of too many explanatory variables or features. Presence of too many features in machine learning is known to not only cause algorithms to slow down, but they can also lead to decrease in model prediction accuracy. This study involves housing dataset with 79 quantitative and qualitative features that describe various aspects people consider while buying a new house. Boruta algorithm that supports feature selection using a wrapper approach build around random forest is used in this study. This feature selection process leads to 49 confirmed features which are then used for developing predictive random forest models. The study also explores five different data partitioning ratios and their impact on model accuracy are captured using coefficient of determination (r-square) and root mean square error (rsme).

Keywords: housing data, feature selection, random forest, Boruta algorithm, root mean square error

Procedia PDF Downloads 291
19076 Identification of Landslide Features Using Back-Propagation Neural Network on LiDAR Digital Elevation Model

Authors: Chia-Hao Chang, Geng-Gui Wang, Jee-Cheng Wu

Abstract:

The prediction of a landslide is a difficult task because it requires a detailed study of past activities using a complete range of investigative methods to determine the changing condition. In this research, first step, LiDAR 1-meter by 1-meter resolution of digital elevation model (DEM) was used to generate six environmental factors of landslide. Then, back-propagation neural networks (BPNN) was adopted to identify scarp, landslide areas and non-landslide areas. The BPNN uses 6 environmental factors in input layer and 1 output layer. Moreover, 6 landslide areas are used as training areas and 4 landslide areas as test areas in the BPNN. The hidden layer is set to be 1 and 2; the hidden layer neurons are set to be 4, 5, 6, 7 and 8; the learning rates are set to be 0.01, 0.1 and 0.5. When using 1 hidden layer with 7 neurons and the learning rate sets to be 0.5, the result of Network training root mean square error is 0.001388. Finally, evaluation of BPNN classification accuracy by the confusion matrix shows that the overall accuracy can reach 94.4%, and the Kappa value is 0.7464.

Keywords: digital elevation model, DEM, environmental factors, back-propagation neural network, BPNN, LiDAR

Procedia PDF Downloads 113
19075 Comparison of Existing Predictor and Development of Computational Method for S- Palmitoylation Site Identification in Arabidopsis Thaliana

Authors: Ayesha Sanjana Kawser Parsha

Abstract:

S-acylation is an irreversible bond in which cysteine residues are linked to fatty acids palmitate (74%) or stearate (22%), either at the COOH or NH2 terminal, via a thioester linkage. There are several experimental methods that can be used to identify the S-palmitoylation site; however, since they require a lot of time, computational methods are becoming increasingly necessary. There aren't many predictors, however, that can locate S- palmitoylation sites in Arabidopsis Thaliana with sufficient accuracy. This research is based on the importance of building a better prediction tool. To identify the type of machine learning algorithm that predicts this site more accurately for the experimental dataset, several prediction tools were examined in this research, including the GPS PALM 6.0, pCysMod, GPS LIPID 1.0, CSS PALM 4.0, and NBA PALM. These analyses were conducted by constructing the receiver operating characteristics plot and the area under the curve score. An AI-driven deep learning-based prediction tool has been developed utilizing the analysis and three sequence-based input data, such as the amino acid composition, binary encoding profile, and autocorrelation features. The model was developed using five layers, two activation functions, associated parameters, and hyperparameters. The model was built using various combinations of features, and after training and validation, it performed better when all the features were present while using the experimental dataset for 8 and 10-fold cross-validations. While testing the model with unseen and new data, such as the GPS PALM 6.0 plant and pCysMod mouse, the model performed better, and the area under the curve score was near 1. It can be demonstrated that this model outperforms the prior tools in predicting the S- palmitoylation site in the experimental data set by comparing the area under curve score of 10-fold cross-validation of the new model with the established tools' area under curve score with their respective training sets. The objective of this study is to develop a prediction tool for Arabidopsis Thaliana that is more accurate than current tools, as measured by the area under the curve score. Plant food production and immunological treatment targets can both be managed by utilizing this method to forecast S- palmitoylation sites.

Keywords: S- palmitoylation, ROC PLOT, area under the curve, cross- validation score

Procedia PDF Downloads 49
19074 Composite Forecasts Accuracy for Automobile Sales in Thailand

Authors: Watchareeporn Chaimongkol

Abstract:

In this paper, we compare the statistical measures accuracy of composite forecasting model to estimate automobile customer demand in Thailand. A modified simple exponential smoothing and autoregressive integrate moving average (ARIMA) forecasting model is built to estimate customer demand of passenger cars, instead of using information of historical sales data. Our model takes into account special characteristic of the Thai automobile market such as sales promotion, advertising and publicity, petrol price, and interest rate for loan. We evaluate our forecasting model by comparing forecasts with actual data using six accuracy measurements, mean absolute percentage error (MAPE), geometric mean absolute error (GMAE), symmetric mean absolute percentage error (sMAPE), mean absolute scaled error (MASE), median relative absolute error (MdRAE), and geometric mean relative absolute error (GMRAE).

Keywords: composite forecasting, simple exponential smoothing model, autoregressive integrate moving average model selection, accuracy measurements

Procedia PDF Downloads 337
19073 Residual Life Prediction for a System Subject to Condition Monitoring and Two Failure Modes

Authors: Akram Khaleghei, Ghosheh Balagh, Viliam Makis

Abstract:

In this paper, we investigate the residual life prediction problem for a partially observable system subject to two failure modes, namely a catastrophic failure and a failure due to the system degradation. The system is subject to condition monitoring and the degradation process is described by a hidden Markov model with unknown parameters. The parameter estimation procedure based on an EM algorithm is developed and the formulas for the conditional reliability function and the mean residual life are derived, illustrated by a numerical example.

Keywords: partially observable system, hidden Markov model, competing risks, residual life prediction

Procedia PDF Downloads 388
19072 Stock Market Prediction Using Convolutional Neural Network That Learns from a Graph

Authors: Mo-Se Lee, Cheol-Hwi Ahn, Kee-Young Kwahk, Hyunchul Ahn

Abstract:

Over the past decade, deep learning has been in spotlight among various machine learning algorithms. In particular, CNN (Convolutional Neural Network), which is known as effective solution for recognizing and classifying images, has been popularly applied to classification and prediction problems in various fields. In this study, we try to apply CNN to stock market prediction, one of the most challenging tasks in the machine learning research. In specific, we propose to apply CNN as the binary classifier that predicts stock market direction (up or down) by using a graph as its input. That is, our proposal is to build a machine learning algorithm that mimics a person who looks at the graph and predicts whether the trend will go up or down. Our proposed model consists of four steps. In the first step, it divides the dataset into 5 days, 10 days, 15 days, and 20 days. And then, it creates graphs for each interval in step 2. In the next step, CNN classifiers are trained using the graphs generated in the previous step. In step 4, it optimizes the hyper parameters of the trained model by using the validation dataset. To validate our model, we will apply it to the prediction of KOSPI200 for 1,986 days in eight years (from 2009 to 2016). The experimental dataset will include 14 technical indicators such as CCI, Momentum, ROC and daily closing price of KOSPI200 of Korean stock market.

Keywords: convolutional neural network, deep learning, Korean stock market, stock market prediction

Procedia PDF Downloads 407
19071 A Study for Area-level Mosquito Abundance Prediction by Using Supervised Machine Learning Point-level Predictor

Authors: Theoktisti Makridou, Konstantinos Tsaprailis, George Arvanitakis, Charalampos Kontoes

Abstract:

In the literature, the data-driven approaches for mosquito abundance prediction relaying on supervised machine learning models that get trained with historical in-situ measurements. The counterpart of this approach is once the model gets trained on pointlevel (specific x,y coordinates) measurements, the predictions of the model refer again to point-level. These point-level predictions reduce the applicability of those solutions once a lot of early warning and mitigation actions applications need predictions for an area level, such as a municipality, village, etc... In this study, we apply a data-driven predictive model, which relies on public-open satellite Earth Observation and geospatial data and gets trained with historical point-level in-Situ measurements of mosquito abundance. Then we propose a methodology to extract information from a point-level predictive model to a broader area-level prediction. Our methodology relies on the randomly spatial sampling of the area of interest (similar to the Poisson hardcore process), obtaining the EO and geomorphological information for each sample, doing the point-wise prediction for each sample, and aggregating the predictions to represent the average mosquito abundance of the area. We quantify the performance of the transformation from the pointlevel to the area-level predictions, and we analyze it in order to understand which parameters have a positive or negative impact on it. The goal of this study is to propose a methodology that predicts the mosquito abundance of a given area by relying on point-level prediction and to provide qualitative insights regarding the expected performance of the area-level prediction. We applied our methodology to historical data (of Culex pipiens) of two areas of interest (Veneto region of Italy and Central Macedonia of Greece). In both cases, the results were consistent. The mean mosquito abundance of a given area can be estimated with similar accuracy to the point-level predictor, sometimes even better. The density of the samples that we use to represent one area has a positive effect on the performance in contrast to the actual number of sampling points which is not informative at all regarding the performance without the size of the area. Additionally, we saw that the distance between the sampling points and the real in-situ measurements that were used for training did not strongly affect the performance.

Keywords: mosquito abundance, supervised machine learning, culex pipiens, spatial sampling, west nile virus, earth observation data

Procedia PDF Downloads 112
19070 A New Model for Production Forecasting in ERP

Authors: S. F. Wong, W. I. Ho, B. Lin, Q. Huang

Abstract:

ERP has been used in many enterprises for management, the accuracy of the production forecasting module is vital to the decision making of the enterprise, and the profit is affected directly. Therefore, enhancing the accuracy of the production forecasting module can also increase the efficiency and profitability. To deal with a lot of data, a suitable, reliable and accurate statistics model is necessary. LSSVM and Grey System are two main models to be studied in this paper, and a case study is used to demonstrate how the combination model is effective to the result of forecasting.

Keywords: ERP, grey system, LSSVM, production forecasting

Procedia PDF Downloads 425
19069 Computational Models for Accurate Estimation of Joint Forces

Authors: Ibrahim Elnour Abdelrahman Eltayeb

Abstract:

Computational modelling is a method used to investigate joint forces during a movement. It can get high accuracy in the joint forces via subject-specific models. However, the construction of subject-specific models remains time-consuming and expensive. The purpose of this paper was to identify what alterations we can make to generic computational models to get a better estimation of the joint forces. It appraised the impact of these alterations on the accuracy of the estimated joint forces. It found different strategies of alterations: joint model, muscle model, and an optimisation problem. All these alterations affected joint contact force accuracy, so showing the potential for improving the model predictions without involving costly and time-consuming medical images.

Keywords: joint force, joint model, optimisation problem, validation

Procedia PDF Downloads 145
19068 Intelligent Earthquake Prediction System Based On Neural Network

Authors: Emad Amar, Tawfik Khattab, Fatma Zada

Abstract:

Predicting earthquakes is an important issue in the study of geography. Accurate prediction of earthquakes can help people to take effective measures to minimize the loss of personal and economic damage, such as large casualties, destruction of buildings and broken of traffic, occurred within a few seconds. United States Geological Survey (USGS) science organization provides reliable scientific information of Earthquake Existed throughout history & Preliminary database from the National Center Earthquake Information (NEIC) show some useful factors to predict an earthquake in a seismic area like Aleutian Arc in the U.S. state of Alaska. The main advantage of this prediction method that it does not require any assumption, it makes prediction according to the future evolution of object's time series. The article compares between simulation data result from trained BP and RBF neural network versus actual output result from the system calculations. Therefore, this article focuses on analysis of data relating to real earthquakes. Evaluation results show better accuracy and higher speed by using radial basis functions (RBF) neural network.

Keywords: BP neural network, prediction, RBF neural network, earthquake

Procedia PDF Downloads 470
19067 Monthly River Flow Prediction Using a Nonlinear Prediction Method

Authors: N. H. Adenan, M. S. M. Noorani

Abstract:

River flow prediction is an essential to ensure proper management of water resources can be optimally distribute water to consumers. This study presents an analysis and prediction by using nonlinear prediction method involving monthly river flow data in Tanjung Tualang from 1976 to 2006. Nonlinear prediction method involves the reconstruction of phase space and local linear approximation approach. The phase space reconstruction involves the reconstruction of one-dimensional (the observed 287 months of data) in a multidimensional phase space to reveal the dynamics of the system. Revenue of phase space reconstruction is used to predict the next 72 months. A comparison of prediction performance based on correlation coefficient (CC) and root mean square error (RMSE) have been employed to compare prediction performance for nonlinear prediction method, ARIMA and SVM. Prediction performance comparisons show the prediction results using nonlinear prediction method is better than ARIMA and SVM. Therefore, the result of this study could be used to developed an efficient water management system to optimize the allocation water resources.

Keywords: river flow, nonlinear prediction method, phase space, local linear approximation

Procedia PDF Downloads 388
19066 Neural Network based Risk Detection for Dyslexia and Dysgraphia in Sinhala Language Speaking Children

Authors: Budhvin T. Withana, Sulochana Rupasinghe

Abstract:

The educational system faces a significant concern with regards to Dyslexia and Dysgraphia, which are learning disabilities impacting reading and writing abilities. This is particularly challenging for children who speak the Sinhala language due to its complexity and uniqueness. Commonly used methods to detect the risk of Dyslexia and Dysgraphia rely on subjective assessments, leading to limited coverage and time-consuming processes. Consequently, delays in diagnoses and missed opportunities for early intervention can occur. To address this issue, the project developed a hybrid model that incorporates various deep learning techniques to detect the risk of Dyslexia and Dysgraphia. Specifically, Resnet50, VGG16, and YOLOv8 models were integrated to identify handwriting issues. The outputs of these models were then combined with other input data and fed into an MLP model. Hyperparameters of the MLP model were fine-tuned using Grid Search CV, enabling the identification of optimal values for the model. This approach proved to be highly effective in accurately predicting the risk of Dyslexia and Dysgraphia, providing a valuable tool for early detection and intervention. The Resnet50 model exhibited a training accuracy of 0.9804 and a validation accuracy of 0.9653. The VGG16 model achieved a training accuracy of 0.9991 and a validation accuracy of 0.9891. The MLP model demonstrated impressive results with a training accuracy of 0.99918, a testing accuracy of 0.99223, and a loss of 0.01371. These outcomes showcase the high accuracy achieved by the proposed hybrid model in predicting the risk of Dyslexia and Dysgraphia.

Keywords: neural networks, risk detection system, dyslexia, dysgraphia, deep learning, learning disabilities, data science

Procedia PDF Downloads 36
19065 Breast Cancer Prediction Using Score-Level Fusion of Machine Learning and Deep Learning Models

Authors: Sam Khozama, Ali M. Mayya

Abstract:

Breast cancer is one of the most common types in women. Early prediction of breast cancer helps physicians detect cancer in its early stages. Big cancer data needs a very powerful tool to analyze and extract predictions. Machine learning and deep learning are two of the most efficient tools for predicting cancer based on textual data. In this study, we developed a fusion model of two machine learning and deep learning models. To obtain the final prediction, Long-Short Term Memory (LSTM) and ensemble learning with hyper parameters optimization are used, and score-level fusion is used. Experiments are done on the Breast Cancer Surveillance Consortium (BCSC) dataset after balancing and grouping the class categories. Five different training scenarios are used, and the tests show that the designed fusion model improved the performance by 3.3% compared to the individual models.

Keywords: machine learning, deep learning, cancer prediction, breast cancer, LSTM, fusion

Procedia PDF Downloads 135