Search results for: meteorological prediction data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25496

25376 Development of Fuzzy Logic and Neuro-Fuzzy Surface Roughness Prediction Systems Coupled with Cutting Current in Milling Operation

Authors: Joseph C. Chen, Venkata Mohan Kudapa

Abstract:

The development of two real-time surface roughness (Ra) prediction systems for milling operations was attempted. The systems used not only cutting parameters, such as feed rate and spindle speed, but also the cutting current generated and collected by a clamp-type energy sensor. Two different approaches were developed. First, a fuzzy inference system (FIS), in which the fuzzy logic rules are generated by experts in the milling processes, was used to conduct prediction modeling using cutting current data. Second, a neuro-fuzzy system (ANFIS) was explored. Neuro-fuzzy systems are adaptive techniques in which data are collected on the network and processed, and rules are generated by the system. The inference system then uses these rules to predict Ra as the output. Experimental results showed that the parameters of spindle speed, feed rate, depth of cut, and input current variation could predict Ra. These two systems enable the prediction of Ra during the milling operation with an average accuracy of 91.83% for FIS and 94.48% for ANFIS. Statistically, the ANFIS system provided better prediction accuracy than the FIS system.
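
As a rough illustration of how a rule-based fuzzy inference step can map cutting inputs to a roughness estimate, the sketch below uses triangular membership functions and a zero-order Sugeno-style weighted average. It is not the authors' system: the two inputs, every membership range, and every rule output are hypothetical placeholders chosen only to make the example runnable.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and a peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def predict_ra(feed_rate, current):
    """Zero-order Sugeno-style inference over two inputs.

    All ranges and rule outputs are illustrative placeholders,
    not values taken from the paper.
    """
    feed_low = triangular(feed_rate, 0.0, 0.1, 0.3)
    feed_high = triangular(feed_rate, 0.1, 0.3, 0.4)
    cur_low = triangular(current, 0.0, 2.0, 6.0)
    cur_high = triangular(current, 2.0, 6.0, 8.0)

    # Each rule: (firing strength using min as AND, crisp Ra output).
    rules = [
        (min(feed_low, cur_low), 0.8),
        (min(feed_low, cur_high), 1.6),
        (min(feed_high, cur_low), 2.0),
        (min(feed_high, cur_high), 3.2),
    ]
    total = sum(weight for weight, _ in rules)
    if total == 0.0:
        raise ValueError("inputs fall outside every membership function")
    # Defuzzify as the firing-strength-weighted average of rule outputs.
    return sum(weight * out for weight, out in rules) / total
```

In an expert-built FIS the rule table would come from domain knowledge; in a neuro-fuzzy system such as ANFIS the membership parameters and rule outputs would instead be fitted from data.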

Keywords: surface roughness, input current, fuzzy logic, neuro-fuzzy, milling operations

Procedia PDF Downloads 120
25375 Optimization of Air Pollution Control Model for Mining

Authors: Zunaira Asif, Zhi Chen

Abstract:

Sustainable air quality management is recognized as one of the most serious environmental concerns in mining regions. Mining operations emit various types of pollutants that have significant impacts on the environment. This study presents a stochastic control strategy by developing an air pollution control model to achieve a cost-effective solution. The optimization method is formulated to predict the cost of treatment using linear programming with an objective function and multiple constraints. The constraints mainly focus on two factors: production of metal should not exceed the available resources, and air quality should meet the standard criteria for each pollutant. The applicability of this model is explored through a case study of an open-pit metal mine in Utah, USA. The method simultaneously uses meteorological data as a dispersion transfer function to reflect practical local conditions. The probabilistic analysis and the uncertainties in the meteorological conditions are handled by Monte Carlo simulation. Reasonable results have been obtained to select the optimized treatment technology for PM2.5, PM10, NOx, and SO2. Additional comparison analysis shows that the baghouse is the least-cost option compared with electrostatic precipitators and wet scrubbers for particulate matter, whereas non-selective catalytic reduction and dry flue gas desulfurization are suitable for NOx and SO2 reduction, respectively. Thus, this model can help planners reduce these pollutants at a marginal cost by suggesting pollution control devices, while accounting for dynamic meteorological conditions and mining activities.
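
The interplay between an uncertain dispersion factor and a least-cost choice can be sketched in a few lines. The example below is a deliberate simplification: it enumerates a handful of discrete options rather than solving a linear program, and it draws the transfer factor from a uniform range. The option names, costs, efficiencies, and the range itself are all made-up placeholders, not figures from the study.

```python
import random

# Hypothetical options: (name, annual cost, removal efficiency).
OPTIONS = [
    ("option_a", 100.0, 0.90),
    ("option_b", 250.0, 0.99),
    ("option_c", 60.0, 0.50),
]


def compliance_probability(emission, efficiency, standard,
                           n_draws=10_000, seed=0):
    """Monte Carlo estimate of P(ambient concentration <= standard)."""
    rng = random.Random(seed)
    compliant = 0
    for _ in range(n_draws):
        # Assumed: the meteorological transfer factor is uniform on [0.5, 1.5].
        transfer = rng.uniform(0.5, 1.5)
        concentration = emission * (1.0 - efficiency) * transfer
        if concentration <= standard:
            compliant += 1
    return compliant / n_draws


def cheapest_compliant(emission, standard, min_probability=0.95):
    """Return the lowest-cost option meeting the standard often enough."""
    feasible = [
        (cost, name)
        for name, cost, efficiency in OPTIONS
        if compliance_probability(emission, efficiency, standard) >= min_probability
    ]
    if not feasible:
        return None
    return min(feasible)[1]
```

A full treatment would replace the enumeration with a linear program over continuous decision variables and add the resource constraint on metal production.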

Keywords: air pollution, linear programming, mining, optimization, treatment technologies

Procedia PDF Downloads 187
25374 Pattern Recognition Using Feature Based Die-Map Clustering in the Semiconductor Manufacturing Process

Authors: Seung Hwan Park, Cheng-Sool Park, Jun Seok Kim, Youngji Yoo, Daewoong An, Jun-Geol Baek

Abstract:

As big data analysis becomes more important, yield prediction using data from the semiconductor process is essential. In general, yield prediction and analysis of the causes of failure are closely related. The purpose of this study is to analyze patterns that affect the final test results using die-map-based clustering. Many studies have been conducted using die data from the semiconductor test process. However, such analysis is limited because the test data are less directly related to the final test results. Therefore, this study proposes a framework for analysis through clustering using more detailed data than existing die data. The study consists of three phases. In the first phase, a die map is created from fail bit data in each sub-area of the die. In the second phase, clustering is performed using the map data. In the third phase, patterns that affect the final test result are identified. Finally, the proposed three phases are applied to actual industrial data, and the experimental results show the potential for field application.

Keywords: die-map clustering, feature extraction, pattern recognition, semiconductor manufacturing process

Procedia PDF Downloads 379
25373 Performance Evaluation of Arrival Time Prediction Models

Authors: Bin Li, Mei Liu

Abstract:

Arrival time information is a crucial component of advanced public transport systems (APTS). Displaying arrival times at stops can help reduce the waiting time and anxiety of passengers and improve the quality of service. In this research, an experiment was conducted to compare the prediction accuracy and precision of link-based and path-based historical travel time models, using automatic vehicle location (AVL) data collected from an actual bus route. The results show that the path-based model is superior to the link-based model and achieves the greatest improvement during peak hours.

Keywords: bus transit, arrival time prediction, link-based, path-based

Procedia PDF Downloads 343
25372 Stock Movement Prediction Using Price Factor and Deep Learning

Authors: Hy Dang, Bo Mei

Abstract:

The development of machine learning methods and techniques has opened doors for investigation in many areas such as medicine, economics, and finance. One active research area involving machine learning is stock market prediction. This paper considers multiple techniques and methods for stock movement prediction using historical prices or price factors. It explores the effectiveness of several deep learning frameworks for stock forecasting. Moreover, an architecture (TimeStock) is proposed that takes the representation of time into account in addition to the price information itself. Our model achieves a promising result, suggesting a viable approach to the stock movement prediction problem.

Keywords: classification, machine learning, time representation, stock prediction

Procedia PDF Downloads 119
25371 Evaluation of Weather Risk Insurance for Agricultural Products Using a 3-Factor Pricing Model

Authors: O. Benabdeljelil, A. Karioun, S. Amami, R. Rouger, M. Hamidine

Abstract:

A model for preventing the risks related to climate conditions in the agricultural sector is presented. It determines the yearly optimum premium to be paid by a producer in order to reach the required turnover. The model is based on both climatic stability and the 'soft' responses of commonly grown species to average climate variations at the same place and inside a safety ball, which can be determined from past meteorological data. This allows the use of a linear regression expression for the dependence of the production result on the driving meteorological parameters, the main ones being daily average sunlight, rainfall, and temperature. By a simple best-parameter fit from the expert table drawn up with professionals, the optimal representation of yearly production is determined from records of previous years, and the yearly payback is evaluated from the minimum yearly turnover. The model also requires accurate pricing of the commodity at N+1. Therefore, a pricing model is developed using 3 state variables, namely the spot price, the difference between the mean-term and the long-term forward price, and the long-term structure of the model. Historical data make it possible to calibrate the parameters of the state variables and to price the commodity. Application to beet sugar underlines the precision of the pricer: the agreement between the computed result and the real world is 99.5%. The optimal premium is then deduced and gives the producer a useful bound for negotiating an offer from insurance companies to effectively protect the harvest. The application to beet production in the French Oise department illustrates the reliability of the present model, with as little as 6% difference between predicted and real data. The model can be adapted to almost any agricultural field by changing the state parameters and calibrating their associated coefficients.
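
The linear dependence of production on sunlight, rainfall, and temperature described above can be illustrated with an ordinary least-squares fit. This is a minimal sketch, not the authors' calibration procedure: the function names are hypothetical, and the fit is done by solving the normal equations with a small Gaussian elimination routine so that no third-party library is needed.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_yield_model(records):
    """Fit production = b0 + b1*sunlight + b2*rainfall + b3*temperature.

    records: list of (sunlight, rainfall, temperature, production) tuples.
    """
    rows = [[1.0, s, r, t] for s, r, t, _ in records]
    y = [rec[3] for rec in records]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_yield(coeffs, sunlight, rainfall, temperature):
    """Evaluate the fitted linear model for one year's mean conditions."""
    b0, b1, b2, b3 = coeffs
    return b0 + b1 * sunlight + b2 * rainfall + b3 * temperature
```

At least four non-collinear yearly records are needed for the fit to be determined; with real data, many more years would be required before the coefficients could be trusted.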

Keywords: agriculture, production model, optimal price, meteorological factors, 3-factor model, parameter calibration, forward price

Procedia PDF Downloads 355
25370 Modified Naive Bayes-Based Prediction Modeling for Crop Yield Prediction

Authors: Kefaya Qaddoum

Abstract:

Most greenhouse growers desire a predetermined yield in order to accurately meet market requirements. The purpose of this paper is to model a simple but often satisfactory supervised classification method. The original naive Bayes has a serious weakness, namely that it produces redundant predictors. In this paper, a regularization technique is used to obtain a computationally efficient classifier based on naive Bayes. The suggested construction, which uses an L1 penalty, is capable of removing redundant predictors; a modification of the LARS algorithm is devised to solve this problem, making the method applicable to a wide range of data. In the experimental section, a study is conducted to examine the effect of redundant and irrelevant predictors and to test the method on the WSG data set for tomato yields, where there are many more predictors than data points and the goal is to meet the urgent need to predict weekly yield. Finally, the modified approach is compared with several naive Bayes variants and other classification algorithms (SVM and kNN) and is shown to perform fairly well.

Keywords: tomato yield prediction, naive Bayes, redundancy, WSG

Procedia PDF Downloads 213
25369 Dynamic vs. Static Bankruptcy Prediction Models: A Dynamic Performance Evaluation Framework

Authors: Mohammad Mahdi Mousavi

Abstract:

Bankruptcy prediction models have been implemented for continuous evaluation and monitoring of firms. Given the huge number of bankruptcy models, many studies have focused on the question of which of these models perform best. In practice, one drawback of existing comparative studies is that the relative assessment of alternative bankruptcy models remains a mono-criterion exercise. Further, only a very restricted number of criteria and measures have been applied to compare the performance of competing bankruptcy prediction models. In this research, we overcome these methodological gaps by implementing an extensive range of criteria and measures for comparing dynamic and static bankruptcy models, and by proposing a multi-criteria framework to compare the relative performance of bankruptcy models in forecasting distress for UK firms.

Keywords: bankruptcy prediction, data envelopment analysis, performance criteria, performance measures

Procedia PDF Downloads 227
25368 An Intelligent Prediction Method for Annular Pressure Driven by Mechanism and Data

Authors: Zhaopeng Zhu, Xianzhi Song, Gensheng Li, Shuo Zhu, Shiming Duan, Xuezhe Yao

Abstract:

Accurate calculation of wellbore pressure is of great significance for preventing wellbore risk during drilling. The traditional mechanism model requires many iterative solving procedures, which reduces calculation efficiency and makes it difficult to meet the demand for dynamic control of wellbore pressure. In recent years, many scholars have introduced artificial intelligence algorithms into wellbore pressure calculation, which significantly improves its efficiency and accuracy. However, because of the 'black box' nature of intelligent algorithms, existing intelligent calculation models of wellbore pressure perform poorly outside the scope of the training data and overreact to data noise, often producing abnormal results. In this study, the multi-phase flow mechanism is embedded into the objective function of the neural network model as a constraint condition, and an intelligent prediction model of wellbore pressure under this constraint is established based on more than 400,000 sets of pressure measurement while drilling (MPD) data. The constraint of the multi-phase flow mechanism makes the prediction results of the neural network model more consistent with the distribution law of wellbore pressure, which overcomes the black-box attribute of the neural network model to some extent. In particular, the accuracy on the independent test data set is further improved, and abnormal calculated values largely disappear. This method is driven by both MPD data and the multi-phase flow mechanism, and it is a promising way to predict wellbore pressure accurately and efficiently in the future.
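
Embedding a physical mechanism into the objective function typically means adding a penalty on the residual of a governing relation to the usual data-fit term. The sketch below shows that structure only. The multi-phase flow equations used in the study are not given in the abstract, so a single-phase hydrostatic relation (pressure = density × gravity × depth) stands in as an assumed placeholder, and the function names and default fluid density are hypothetical.

```python
def hydrostatic_residuals(predicted, depths, density=1200.0, gravity=9.81):
    """Residual of each prediction against a stand-in physical relation.

    Assumption: a static single-phase fluid column, used here only as a
    placeholder for the multi-phase flow mechanism. Pressures are in Pa,
    depths in m, density in kg/m^3.
    """
    return [p - density * gravity * d for p, d in zip(predicted, depths)]


def constrained_loss(predicted, measured, physics_residuals, weight=1.0):
    """Mean squared data error plus a weighted mean squared physics residual."""
    n = len(predicted)
    data_term = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n
    physics_term = sum(r ** 2 for r in physics_residuals) / n
    return data_term + weight * physics_term
```

During training, `constrained_loss` would be minimized with respect to the network parameters; the `weight` argument controls how strongly predictions are pulled toward physically consistent values when the data are noisy or sparse.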

Keywords: multiphase flow mechanism, pressure while drilling data, wellbore pressure, mechanism constraints, combined drive

Procedia PDF Downloads 158
25367 A Machine Learning Model for Dynamic Prediction of Chronic Kidney Disease Risk Using Laboratory Data, Non-Laboratory Data, and Metabolic Indices

Authors: Amadou Wurry Jallow, Adama N. S. Bah, Karamo Bah, Shih-Ye Wang, Kuo-Chung Chu, Chien-Yeh Hsu

Abstract:

Chronic kidney disease (CKD) is a major public health challenge with high prevalence, rising incidence, and serious adverse consequences. Developing effective risk prediction models is a cost-effective approach to predicting and preventing complications of chronic kidney disease (CKD). This study aimed to develop an accurate machine learning model that can dynamically identify individuals at risk of CKD using various kinds of diagnostic data, with or without laboratory data, at different follow-up points. Creatinine is a key component used to predict CKD. These models will enable affordable and effective screening for CKD even with incomplete patient data, such as the absence of creatinine testing. This retrospective cohort study included data on 19,429 adults provided by a private research institute and screening laboratory in Taiwan, gathered between 2001 and 2015. Univariate Cox proportional hazard regression analyses were performed to determine the variables with high prognostic values for predicting CKD. We then identified interacting variables and grouped them according to diagnostic data categories. Our models used three types of data gathered at three points in time: non-laboratory, laboratory, and metabolic indices data. Next, we used subgroups of variables within each category to train two machine learning models (Random Forest and XGBoost). Our machine learning models can dynamically discriminate individuals at risk for developing CKD. All the models performed well using all three kinds of data, with or without laboratory data. Using only non-laboratory-based data (such as age, sex, body mass index (BMI), and waist circumference), both models predict chronic kidney disease as accurately as models using laboratory and metabolic indices data. Our machine learning models have demonstrated the use of different categories of diagnostic data for CKD prediction, with or without laboratory data. 
The machine learning models are simple to use and flexible because they work even with incomplete data and can be applied in any clinical setting, including settings where laboratory data is difficult to obtain.

Keywords: chronic kidney disease, glomerular filtration rate, creatinine, novel metabolic indices, machine learning, risk prediction

Procedia PDF Downloads 81
25366 Hybrid Approach for Software Defect Prediction Using Machine Learning with Optimization Technique

Authors: C. Manjula, Lilly Florence

Abstract:

Software technology is developing rapidly, which leads to the growth of various industries. Nowadays, software-based applications have been adopted widely for business purposes. For any software company, developing reliable software is becoming a challenging task because a faulty software module may harm the growth of the industry and the business. Hence, there is a need for techniques that can predict software defects early. Because manual prediction is complex, automated software defect prediction techniques have been introduced. These techniques learn patterns from previous software versions and use them to find defects in the current version. They have attracted researchers because identifying bugs in software has a significant impact on industrial growth. Several studies have been carried out on this basis, but achieving the desired defect prediction performance is still a challenging task. To address this issue, we present a machine-learning-based hybrid technique for software defect prediction. First, a Genetic Algorithm (GA) is presented in which an improved fitness function is used for better optimization of features in data sets. These features are then processed by a Decision Tree (DT) classification model. Finally, an experimental study is presented in which results from the proposed GA-DT hybrid approach are compared with those from the DT classification technique. The results show that the proposed hybrid approach achieves better classification accuracy.

Keywords: decision tree, genetic algorithm, machine learning, software defect prediction

Procedia PDF Downloads 313
25365 Regional Adjustment to the Analytical Attenuation Coefficient in the GMPM BSSA 14 for the Region of Spain

Authors: Gonzalez Carlos, Martinez Fransisco

Abstract:

Several types of analysis make it possible to account for the seismic phenomena that place strong demands on the structures society designs; one of them is probabilistic analysis, which relies on prediction equations built from seismic metadata compiled in different regions. These equations form models that describe the 5% damped pseudo-response spectra for the various zones from a few readily available input parameters. The main difficulty in creating these models is that they require data with robust statistics to support the results, and in several places this type of information is not available; there, alternative methodologies help to adjust existing seismic prediction models.

Keywords: GMPM, 5% damped pseudo-response spectra, models of seismic prediction, PSHA

Procedia PDF Downloads 56
25364 Nonparametric Quantile Regression for Multivariate Spatial Data

Authors: S. H. Arnaud Kanga, O. Hili, S. Dabo-Niang

Abstract:

Spatial prediction is an issue of interest in several fields such as agriculture, environmental sciences, ecology, econometrics, and many others. Although multiple non-parametric prediction methods exist for spatial data, they are based on the conditional expectation. This paper takes a different approach by examining a non-parametric spatial predictor of the conditional quantile. The study considers a stationary multidimensional spatial process observed over a rectangular domain. The proposed quantile is obtained by inverting the conditional distribution function. Furthermore, the proposed estimator of the conditional distribution function depends on three kernels, one of which controls the distance between spatial locations, while the other two control the distance between observations. In addition, the almost complete convergence and the convergence in mean of order q of the kernel predictor are obtained when the sample considered is alpha-mixing. This prediction approach offers an accuracy advantage because it overcomes sensitivity to extreme values and outliers.
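
Obtaining a conditional quantile by inverting an estimated conditional distribution function can be sketched as follows. This is a simplified, non-spatial version and not the estimator studied in the paper: it uses a single Gaussian kernel on the covariate and a plain indicator on the response, and it omits the kernel on spatial locations. The function names and the bandwidth parameter are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def conditional_cdf(y, x0, xs, ys, bandwidth):
    """Kernel-weighted estimate of P(Y <= y | X = x0)."""
    weights = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation carries weight near x0")
    return sum(w for w, yi in zip(weights, ys) if yi <= y) / total


def conditional_quantile(tau, x0, xs, ys, bandwidth):
    """Invert the estimated CDF: smallest observed y with CDF >= tau."""
    for y in sorted(ys):
        if conditional_cdf(y, x0, xs, ys, bandwidth) >= tau:
            return y
    return max(ys)
```

With `tau=0.5` this returns a local median, which illustrates the robustness claim: replacing the largest nearby response with an extreme value leaves the result unchanged, whereas a local mean would shift.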

Keywords: conditional quantile, kernel, nonparametric, stationary

Procedia PDF Downloads 130
25363 Epileptic Seizure Prediction Focusing on Relative Change in Consecutive Segments of EEG Signal

Authors: Mohammad Zavid Parvez, Manoranjan Paul

Abstract:

Epilepsy is a common neurological disorder characterized by sudden recurrent seizures. The electroencephalogram (EEG) is widely used to diagnose possible epileptic seizures. Many research works have been devoted to predicting epileptic seizures by analyzing EEG signals. Seizure prediction from EEG signals is a challenging task because brain signals vary between patients. In this paper, we propose a new approach for feature extraction based on phase correlation in EEG signals. In phase correlation, we calculate the relative change between two consecutive segments of an EEG signal and then combine the changes with neighboring signals to extract features. These features are then used to classify preictal/ictal and interictal EEG signals for seizure prediction. Experimental results show that the proposed method achieves a good prediction rate with greater consistency on the benchmark data set across different brain locations compared with existing state-of-the-art methods.

Keywords: EEG, epilepsy, phase correlation, seizure

Procedia PDF Downloads 291
25362 Proposing an Architecture for Drug Response Prediction by Integrating Multiomics Data and Utilizing Graph Transformers

Authors: Nishank Raisinghani

Abstract:

Efficiently predicting drug response remains a challenge in the realm of drug discovery. To address this issue, we propose four model architectures that combine graphical representation with varying positions of multiheaded self-attention mechanisms. By leveraging two types of multi-omics data, transcriptomics and genomics, we create a comprehensive representation of target cells and enable drug response prediction in precision medicine. A majority of our architectures utilize multiple transformer models, one with a graph attention mechanism and the other with a multiheaded self-attention mechanism, to generate latent representations of both drug and omics data, respectively. Our model architectures apply an attention mechanism to both drug and multiomics data, with the goal of procuring more comprehensive latent representations. The latent representations are then concatenated and input into a fully connected network to predict the IC-50 score, a measure of cell drug response. We experiment with all four of these architectures and extract results from all of them. Our study greatly contributes to the future of drug discovery and precision medicine by looking to optimize the time and accuracy of drug response prediction.

Keywords: drug discovery, transformers, graph neural networks, multiomics

Procedia PDF Downloads 124
25361 Masked Candlestick Model: A Pre-Trained Model for Trading Prediction

Authors: Ling Qi, Matloob Khushi, Josiah Poon

Abstract:

This paper introduces a pre-trained Masked Candlestick Model (MCM) for trading time-series data. The pre-trained model is based on three core designs. First, we convert trading price data at each data point as a set of normalized elements and produce embeddings of each element. Second, we generate a masked sequence of such embedded elements as inputs for self-supervised learning. Third, we use the encoder mechanism from the transformer to train the inputs. The masked model learns the contextual relations among the sequence of embedded elements, which can aid downstream classification tasks. To evaluate the performance of the pre-trained model, we fine-tune MCM for three different downstream classification tasks to predict future price trends. The fine-tuned models achieved better accuracy rates for all three tasks than the baseline models. To better analyze the effectiveness of MCM, we test the same architecture for three currency pairs, namely EUR/GBP, AUD/USD, and EUR/JPY. The experimentation results demonstrate MCM’s effectiveness on all three currency pairs and indicate the MCM’s capability for signal extraction from trading data.
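
The first two design steps, turning each candlestick into normalized elements and masking part of the sequence for self-supervised learning, can be sketched as below. The abstract does not specify the normalization or masking scheme, so both are assumptions here: prices are expressed as relative changes from the open, and a fixed fraction of positions is replaced by a `MASK` placeholder. All names and the 15% default ratio are hypothetical.

```python
import random

MASK = None  # hypothetical placeholder standing in for a learned mask token


def normalize_candle(open_, high, low, close):
    """Express high, low, and close as relative changes from the open."""
    return (
        (high - open_) / open_,
        (low - open_) / open_,
        (close - open_) / open_,
    )


def mask_sequence(elements, mask_ratio=0.15, seed=0):
    """Hide a random subset of positions and return them as training targets."""
    if not elements:
        raise ValueError("cannot mask an empty sequence")
    rng = random.Random(seed)
    n_mask = max(1, round(len(elements) * mask_ratio))
    positions = sorted(rng.sample(range(len(elements)), n_mask))
    masked = list(elements)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]   # what the model must reconstruct
        masked[pos] = MASK           # what the model actually sees
    return masked, targets
```

A transformer encoder would then be trained to recover `targets` from `masked`, which is what lets it learn contextual relations before fine-tuning on trend classification.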

Keywords: masked language model, transformer, time series prediction, trading prediction, embedding, transfer learning, self-supervised learning

Procedia PDF Downloads 101
25360 Leveraging the Power of Dual Spatial-Temporal Data Scheme for Traffic Prediction

Authors: Yang Zhou, Heli Sun, Jianbin Huang, Jizhong Zhao, Shaojie Qiao

Abstract:

Traffic prediction is a fundamental problem in urban environment, facilitating the smart management of various businesses, such as taxi dispatching, bike relocation, and stampede alert. Most earlier methods rely on identifying the intrinsic spatial-temporal correlation to forecast. However, the complex nature of this problem entails a more sophisticated solution that can simultaneously capture the mutual influence of both adjacent and far-flung areas, with the information of time-dimension also incorporated seamlessly. To tackle this difficulty, we propose a new multi-phase architecture, DSTDS (Dual Spatial-Temporal Data Scheme for traffic prediction), that aims to reveal the underlying relationship that determines future traffic trend. First, a graph-based neural network with an attention mechanism is devised to obtain the static features of the road network. Then, a multi-granularity recurrent neural network is built in conjunction with the knowledge from a grid-based model. Subsequently, the preceding output is fed into a spatial-temporal super-resolution module. With this 3-phase structure, we carry out extensive experiments on several real-world datasets to demonstrate the effectiveness of our approach, which surpasses several state-of-the-art methods.

Keywords: traffic prediction, spatial-temporal, recurrent neural network, dual data scheme

Procedia PDF Downloads 92
25359 Inferring Human Mobility in India Using Machine Learning

Authors: Asra Yousuf, Ajaykumar Tannirkulum

Abstract:

Inferring rural-urban migration trends can help design effective policies that promote better urban planning and rural development. In this paper, we describe how machine learning algorithms can be applied to predict people's internal migration decisions. We use data collected from household surveys in Tamil Nadu to train our model. To measure the performance of the model, we use data on past migration from the National Sample Survey Organisation of India. The factors used for training include the socioeconomic characteristics of each individual, such as age, gender, place of residence, outstanding loans, and household size, as well as the individual's past migration history. We perform a comparative analysis of a number of machine learning algorithms to determine their prediction accuracy. Our results show that machine learning algorithms provide higher prediction accuracy than statistical models. Our goal through this research is to propose the use of data science techniques in understanding human decisions and behaviour in developing countries.

Keywords: development, migration, internal migration, machine learning, prediction

Procedia PDF Downloads 251
25358 Winter Wheat Yield Forecasting Using Sentinel-2 Imagery at the Early Stages

Authors: Chunhua Liao, Jinfei Wang, Bo Shan, Yang Song, Yongjun He, Taifeng Dong

Abstract:

Winter wheat is one of the main crops in Canada. Forecasting of within-field variability of yield in winter wheat at the early stages is essential for precision farming. However, the crop yield modelling based on high spatial resolution satellite data is generally affected by the lack of continuous satellite observations, resulting in reducing the generalization ability of the models and increasing the difficulty of crop yield forecasting at the early stages. In this study, the correlations between Sentinel-2 data (vegetation indices and reflectance) and yield data collected by combine harvester were investigated and a generalized multivariate linear regression (MLR) model was built and tested with data acquired in different years. It was found that the four-band reflectance (blue, green, red, near-infrared) performed better than their vegetation indices (NDVI, EVI, WDRVI and OSAVI) in wheat yield prediction. The optimum phenological stage for wheat yield prediction with highest accuracy was at the growing stages from the end of the flowering to the beginning of the filling stage. The best MLR model was therefore built to predict wheat yield before harvest using Sentinel-2 data acquired at the end of the flowering stage. Further, to improve the ability of the yield prediction at the early stages, three simple unsupervised domain adaptation (DA) methods were adopted to transform the reflectance data at the early stages to the optimum phenological stage. The winter wheat yield prediction using multiple vegetation indices showed higher accuracy than using single vegetation index. The optimum stage for winter wheat yield forecasting varied with different fields when using vegetation indices, while it was consistent when using multispectral reflectance and the optimum stage for winter wheat yield prediction was at the end of flowering stage. The average testing RMSE of the MLR model at the end of the flowering stage was 604.48 kg/ha. 
Near the booting stage, the average testing RMSE of yield prediction using the best MLR was reduced to 799.18 kg/ha when applying the mean matching domain adaptation approach to transform the data to the target domain (at the end of the flowering) compared to that using the original data based on the models developed at the booting stage directly (“MLR at the early stage”) (RMSE =1140.64 kg/ha). This study demonstrated that the simple mean matching (MM) performed better than other DA methods and it was found that “DA then MLR at the optimum stage” performed better than “MLR directly at the early stages” for winter wheat yield forecasting at the early stages. The results indicated that the DA had a great potential in near real-time crop yield forecasting at the early stages. This study indicated that the simple domain adaptation methods had a great potential in crop yield prediction at the early stages using remote sensing data.
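
Mean matching, the simplest of the domain adaptation methods mentioned above, shifts each early-stage feature so that its mean equals the mean of the same feature at the optimum stage; the RMSE then measures prediction error in kg/ha. The sketch below shows both steps under the assumption that each row holds the band reflectances of one sample; the function names are hypothetical.

```python
import math


def column_means(rows):
    """Mean of each feature (column) across all samples (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_match(source_rows, target_rows):
    """Shift source features so each column mean equals the target mean."""
    src_mean = column_means(source_rows)
    tgt_mean = column_means(target_rows)
    return [
        [v - s + t for v, s, t in zip(row, src_mean, tgt_mean)]
        for row in source_rows
    ]


def rmse(predicted, observed):
    """Root mean squared error between predictions and observations."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))
```

Here the source domain would be reflectance acquired near the booting stage and the target domain reflectance at the end of flowering; the shifted data are then passed to the MLR model fitted at the optimum stage.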

Keywords: wheat yield prediction, domain adaptation, Sentinel-2, within-field scale

Procedia PDF Downloads 45
25357 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.
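
A hard majority vote combines the class labels predicted by several base models and keeps the most frequent label for each student. The sketch below is a minimal version with a hypothetical function name; the tie-breaking rule (prefer the earliest model in the list) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list of voted labels.

    predictions: list of lists, one per model, all of the same length.
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # Assumed tie-break: the first model that voted for a top label wins.
        winner = next(label for label in labels if counts[label] == top)
        combined.append(winner)
    return combined
```

For example, with three models predicting `["pass", "pass", "fail"]` for one student, the voted label is `"pass"`.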

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 88
25356 Hard Disk Failure Predictions in Supercomputing System Based on CNN-LSTM and Oversampling Technique

Authors: Yingkun Huang, Li Guo, Zekang Lan, Kai Tian

Abstract:

Hard disk drive (HDD) failure in an exascale supercomputing system may interrupt service, invalidate previous calculations, and cause permanent data loss. Therefore, initiating corrective action before hard drive failures materialize is critical to the continued operation of jobs. In this paper, a highly accurate analysis model based on CNN-LSTM and an oversampling technique is proposed, which can correctly predict the need for a disk replacement as much as ten days in advance. Learning-based methods generally perform poorly on training datasets with a long-tailed distribution, and fault prediction is a classic example because failure data are scarce. To overcome this problem, a new oversampling method was employed to augment the data, and an improved CNN-LSTM with a shortcut connection was built to learn more effective features. The shortcut passes on the output of the previous CNN layer, which is fused by weighting with the output of the next layer and then used as the input of the LSTM model. Finally, a detailed empirical comparison of 6 prediction methods on a public dataset is presented and discussed. The experiments indicate that the proposed method predicts disk failure with 0.91 Precision, 0.91 Recall, 0.91 F-measure, and 0.90 MCC for a 10-day prediction horizon. Thus, the proposed algorithm is an efficient algorithm for predicting HDD failure in supercomputing.
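The abstract above does not specify its new oversampling method. As a rough illustration of the general idea only, the sketch below uses plain random oversampling (duplicating minority-class samples until the classes are balanced), which is a swapped-in baseline and not the authors' technique. The function name, the toy feature vectors, and the labels are all hypothetical.

```python
import random


def random_oversample(samples, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority samples until both classes are equal in size.

    Plain random oversampling for a two-class problem; returns new lists and
    leaves the inputs unchanged.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples with the minority label")
    majority_count = sum(1 for y in labels if y != minority_label)
    extra_needed = max(0, majority_count - len(minority))
    # Draw duplicates with replacement from the minority class.
    extras = [rng.choice(minority) for _ in range(extra_needed)]
    return samples + extras, labels + [minority_label] * extra_needed


# Hypothetical toy data: 8 healthy disks and 2 failed disks.
features = [[0.1 * i, 1.0] for i in range(8)] + [[5.0, 9.0], [6.0, 8.5]]
targets = ["healthy"] * 8 + ["failed"] * 2

balanced_features, balanced_targets = random_oversample(features, targets, "failed")
print(balanced_targets.count("healthy"), balanced_targets.count("failed"))
```

Oversampling should be applied to the training split only; duplicating samples before splitting would leak copies into the evaluation data.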

Keywords: HDD replacement, failure, CNN-LSTM, oversampling, prediction

Procedia PDF Downloads 60
25355 Data Refinement Enhances The Accuracy of Short-Term Traffic Latency Prediction

Authors: Man Fung Ho, Lap So, Jiaqi Zhang, Yuheng Zhao, Huiyang Lu, Tat Shing Choi, K. Y. Michael Wong

Abstract:

Nowadays, a tremendous amount of data is available in the transportation system, enabling the development of various machine learning approaches to make short-term latency predictions. A natural question is then the choice of relevant information to enable accurate predictions. Using traffic data collected from the Taiwan Freeway System, we consider the prediction of short-term latency of a freeway segment with a length of 17 km covering 5 measurement points, each collecting vehicle-by-vehicle data through the electronic toll collection system. The processed data include the past latencies of the freeway segment with different time lags, the traffic conditions of the individual segments (the accumulations, the traffic fluxes, the entrance and exit rates), the total accumulations, and the weekday latency profiles obtained by Gaussian process regression of past data. We arrive at several important conclusions about how data should be refined to obtain accurate predictions, which have implications for future system-wide latency predictions. (1) We find that the prediction of median latency is much more accurate and meaningful than the prediction of average latency, as the latter is plagued by outliers. This is verified by machine-learning prediction using XGBoost that yields a 35% improvement in the mean square error of the 5-minute averaged latencies. (2) We find that the median latency of the segment 15 minutes ago is a very good baseline for performance comparison, and we have evidence that further improvement is achieved by machine learning approaches such as XGBoost and Long Short-Term Memory (LSTM). (3) By analyzing the feature importance score in XGBoost and calculating the mutual information between the inputs and the latencies to be predicted, we identify a sequence of inputs ranked in importance. 
It confirms that the past latencies are most informative of the predicted latencies, followed by the total accumulation, whereas inputs such as the entrance and exit rates are uninformative. It also confirms that the inputs are much less informative of the average latencies than of the median latencies. (4) For predicting the latencies of segments composed of two or three sub-segments, summing up the predicted latencies of each sub-segment is more accurate than the one-step prediction of the whole segment, especially when the latency prediction of the downstream sub-segments is trained to anticipate latencies several minutes ahead. The duration of the anticipation time is an increasing function of the traveling time of the upstream segment. The above findings have important implications for predicting the full set of latencies among the various locations in the freeway system.
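Two points from the abstract above can be sketched in a few lines: the median of a window of latencies is far less sensitive to outliers than the mean, and a simple baseline predicts each window's latency from the median observed a fixed lag earlier. The travel times, the helper name `persistence_forecast`, and the choice of three 5-minute windows as a 15-minute lag are assumptions made for illustration, not values from the study.

```python
from statistics import mean, median

# Hypothetical travel times (minutes) for one 5-minute window;
# the last vehicle is an outlier, for example one that stopped on the way.
travel_times = [10.2, 10.5, 9.8, 10.1, 10.4, 45.0]

window_mean = mean(travel_times)      # pulled upward by the outlier
window_median = median(travel_times)  # stays close to typical traffic
print(window_mean, window_median)


def persistence_forecast(window_medians, lag_windows):
    """Baseline forecast: each window's latency is the median seen lag_windows earlier.

    The returned list lines up with the targets window_medians[lag_windows:].
    """
    return window_medians[:-lag_windows]


# Hypothetical sequence of 5-minute median latencies; a lag of 3 windows is 15 minutes.
medians = [10.3, 10.6, 11.0, 11.4, 12.1, 12.0]
baseline = persistence_forecast(medians, lag_windows=3)
targets = medians[3:]
print(list(zip(baseline, targets)))
```

Any learned model can then be scored against `baseline` on the same targets to see how much it adds over persistence.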

Keywords: data refinement, machine learning, mutual information, short-term latency prediction

Procedia PDF Downloads 151
25354 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities

Authors: Retius Chifurira

Abstract:

The logistic regression model is the most widely used regression model for predicting meteorological drought probabilities. When the dependent variable is extreme, the logistic model fails to adequately capture drought probabilities. In order to adequately predict drought probabilities, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. Maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and the GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models is assessed using goodness-of-fit measures, namely relative root mean square error (RRMSE) and relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of a GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.
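The link function described in the abstract above can be sketched as follows, assuming the standard GEV distribution (location 0, scale 1) with shape parameter `xi`, so that the probability is the GEV distribution function evaluated at the linear predictor `eta`. This is one common parameterization and may differ from the one used by the author (some GEV regression formulations use one minus this quantity); the function names are hypothetical.

```python
import math


def gev_inverse_link(eta, xi):
    """Map a linear predictor eta to a probability with the standard GEV CDF."""
    if xi == 0:
        # Gumbel limit of the GEV family.
        return math.exp(-math.exp(-eta))
    t = 1.0 + xi * eta
    if t <= 0:
        # Outside the support: the CDF is 0 below it (xi > 0) and 1 above it (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_link(p, xi):
    """Quantile function of the standard GEV: map a probability to a linear predictor."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if xi == 0:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-xi) - 1.0) / xi


# Round trip for a hypothetical drought probability and shape parameter.
eta = gev_link(0.2, xi=0.3)
print(eta, gev_inverse_link(eta, xi=0.3))
```

Unlike the logistic link, this response curve is asymmetric, and `xi` controls how quickly the probability approaches 0 or 1, which is the property that motivates its use for rare, extreme events.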

Keywords: generalized extreme value distribution, generalized linear model, mean annual rainfall, meteorological drought probabilities

Procedia PDF Downloads 175
25353 Engagement Analysis Using DAiSEE Dataset

Authors: Naman Solanki, Souraj Mondal

Abstract:

With the world moving towards online communication, the volume of stored video has exploded in the past few years. Consequently, it has become crucial to analyse participants' engagement levels in online communication videos. Engagement prediction of people in videos can be useful in many domains, such as education, client meetings, and dating. Video-level or frame-level prediction of a user's engagement requires robust models that can capture facial micro-emotions efficiently. Developing an engagement prediction model requires a widely accepted standard dataset for engagement analysis. DAiSEE is one such dataset: it consists of in-the-wild data and has gold-standard annotations for engagement prediction. Earlier research using the DAiSEE dataset involved training and testing standard models, such as CNN-based models, but the results were not satisfactory by industry standards. In this paper, a multi-level classification approach is introduced to create a more robust model for engagement analysis using the DAiSEE dataset. This approach achieved testing accuracies of 0.638, 0.7728, 0.8195, and 0.866 for predicting boredom level, engagement level, confusion level, and frustration level, respectively.

Keywords: computer vision, engagement prediction, deep learning, multi-level classification

Procedia PDF Downloads 97
25352 Prediction of Marijuana Use among Iranian Early Youth: an Application of Integrative Model of Behavioral Prediction

Authors: Mehdi Mirzaei Alavijeh, Farzad Jalilian

Abstract:

Background: Marijuana is the most widely used illicit drug worldwide, especially among adolescents and young adults, and its use can cause numerous complications. The aim of this study was to determine the pattern of, motivation for, and factors related to marijuana use among Iranian youths based on the integrative model of behavioral prediction. Methods: A cross-sectional study was conducted among 174 young marijuana users in Kermanshah County and Isfahan County during summer 2014, who were selected by convenience sampling. A self-report questionnaire was used to collect data. Data were analyzed with SPSS version 21 using bivariate correlations and linear regression. Results: Respondents used marijuana a mean of 4.60 times per week [95% CI: 4.06, 5.15]. Linear regression showed that the constructs of the integrative model of behavioral prediction accounted for 36% of the variation in weekly marijuana use (R2 = 36%, P < 0.001); among them, attitude, marijuana refusal, and subjective norms were the stronger predictors. Conclusion: Comprehensive health education and prevention programs need to emphasize the cognitive factors that predict youths' health-related behaviors. Based on our findings, designing educational and behavioral interventions that reduce positive beliefs about marijuana, promote marijuana refusal self-efficacy, and reduce the subjective norms that encourage marijuana use has the potential to protect youths from marijuana use.

Keywords: marijuana, youth, integrative model of behavioral prediction, Iran

Procedia PDF Downloads 540
25351 Impact of Air Pollution and Climate on the Incidence of Emergency Interventions in Slavonski Brod

Authors: Renata Josipovic, Ante Cvitkovic

Abstract:

Particulate matter is among the pollutants that, depending on the severity of the effects, can lead to respiratory problems or premature death after long-term or short-term exposure. The aim of the study is to determine whether the climatic conditions in the period from January 1st to August 31st, 2018 increased the number of emergency interventions in Slavonski Brod in relation to the pollutants hydrogen sulfide, particles smaller than 10 µm (PM10), and particles smaller than 2.5 µm (PM2.5). Analytical data on pollutant concentrations, as well as on climatic conditions, were collected from the Croatian Meteorological and Hydrological Service, which monitors the operation of two meteorological stations in Slavonski Brod. Statistical data on emergency interventions were collected from the Emergency Medicine Department of Slavonski Brod. All data (air pollution, emergency interventions) were compared according to climatic conditions (air humidity and air temperature) and statistically processed. Statistically significant, although weak, positive correlations with the number of interventions were found for PM2.5 (correlation coefficient 0.147; p = 0.036), PM10 (correlation coefficient 0.122; p = 0.048), hydrogen sulfide (correlation coefficient 0.141; p = 0.035), and maximum temperature (correlation coefficient 0.202; p = 0.002). The association with mean air humidity was significant but negative (correlation coefficient -0.172; p = 0.007). The influence of air pressure was not determined. As the problem of air pollution is very complex, coordinated action at many levels is needed to reduce air pollution in Slavonski Brod and its consequences for human health.

Keywords: emergency interventions, human health, hydrogen sulfide, particulate matter

Procedia PDF Downloads 140
25350 A Multilevel Approach for Stroke Prediction Combining Risk Factors and Retinal Images

Authors: Jeena R. S., Sukesh Kumar A.

Abstract:

Stroke is one of the major causes of adult disability and morbidity in many developing countries, such as India. Early diagnosis of stroke is essential for timely prevention and cure. Various conventional statistical methods and computational intelligence models have been developed for predicting the risk and outcome of stroke. This research work focuses on a multilevel approach for predicting the occurrence of stroke based on various risk factors and non-invasive techniques such as retinal imaging. This risk prediction model can aid clinical decision making and give patients an improved and more reliable risk prediction.

Keywords: prediction, retinal imaging, risk factors, stroke

Procedia PDF Downloads 278
25349 Blood Glucose Measurement and Analysis: Methodology

Authors: I. M. Abd Rahim, H. Abdul Rahim, R. Ghazali

Abstract:

Researchers have developed numerous non-invasive blood glucose measurement techniques, and near-infrared (NIR) spectroscopy is currently a promising one. However, there is some disagreement about the optimal wavelength range to use as the reference for glucose in the blood. This paper focuses on the experimental data collection technique and on the method used to analyze the data gained from the experiment. Selecting a suitable linear or non-linear model structure is essential in a prediction system, as the system developed must be sufficiently accurate.

Keywords: linear, near-infrared (NIR), non-invasive, non-linear, prediction system

Procedia PDF Downloads 443
25348 Water Balance Components under Climate Change in Croatia

Authors: Jelena Bašić, Višnjica Vučetić, Mislav Anić, Tomislav Bašić

Abstract:

Lack of precipitation combined with high temperatures causes great damage to agriculture and the economy in Croatia. Therefore, it is important to understand water circulation and balance. We set out to gain better insight into the spatial distribution of water balance components (WBC) and their long-term changes in Croatia. The WBC are precipitation (P), potential evapotranspiration (PET), actual evapotranspiration (ET), soil moisture content (S), runoff (RO), recharge (R), and soil moisture loss (L). Since measurements of these components in Croatia are very rare, the Palmer model has been applied to estimate them. We refined the method by taking into account a corrective factor for the influence of wind, as well as a maximum soil capacity for specific soil types. We will present one-hundred-year time series of PET and ET showing the trends at a few meteorological stations, together with a comparison of the components for two climatological periods. Meteorological data from 109 stations have been used for the spatial distribution map of the WBC of Croatia.

Keywords: Croatia, long-term trends, the Palmer method, water balance components

Procedia PDF Downloads 121
25347 Predicting Match Outcomes in Team Sport via Machine Learning: Evidence from National Basketball Association

Authors: Jacky Liu

Abstract:

This paper develops a team sports outcome prediction system with potential for wide-ranging applications across various disciplines. Despite significant advancements in predictive analytics, existing studies in sports outcome prediction have considerable limitations, including insufficient feature engineering and underutilization of advanced machine learning techniques. To address these issues, we extend the Sports Cross Industry Standard Process for Data Mining (SRP-CRISP-DM) framework and propose a unique, comprehensive predictive system, using National Basketball Association (NBA) data as an example to test this extended framework. Our approach follows a holistic methodology in feature engineering, employing both Time Series and Non-Time Series Data, as well as conducting Exploratory Data Analysis and Feature Selection. Furthermore, we contribute to the discourse on target variable choice in team sports outcome prediction, asserting that point spread prediction yields higher profits than game-winner prediction. Using machine learning algorithms, particularly XGBoost, results in a significant improvement in the predictive accuracy of team sports outcomes. Applied to point spread betting strategies, it offers an annual return of approximately 900% on an initial investment of $100. Our findings not only contribute to the academic literature but also have critical practical implications for sports betting. Our study advances the understanding of team sports outcome prediction, a burgeoning area in complex system prediction, and paves the way for potential profitability and more informed decision making in sports betting markets.

Keywords: machine learning, team sports, game outcome prediction, sports betting, profits simulation

Procedia PDF Downloads 78