Search results for: empirical bayesian kriging regression prediction
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7346

Search results for: empirical bayesian kriging regression prediction

7286 Free Fatty Acid Assessment of Crude Palm Oil Using a Non-Destructive Approach

Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim, Rashidah Ghazali, Noramli Abdul Razak

Abstract:

Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of prediction models has facilitated the estimation process in recent years. In this study, 110 crude palm oil (CPO) samples were used to build a free fatty acid (FFA) prediction model. 60% of the collected data were used for training purposes and the remaining 40% used for testing. The visible peaks on the NIR spectrum were at 1725 nm and 1760 nm, indicating the existence of the first overtone of C-H bands. Principal component regression (PCR) was applied to the data in order to build this mathematical prediction model. The optimal number of principal components was 10. The results showed R2=0.7147 for the training set and R2=0.6404 for the testing set.

Keywords: palm oil, fatty acid, NIRS, regression

Procedia PDF Downloads 479
7285 Discrete State Prediction Algorithm Design with Self Performance Enhancement Capacity

Authors: Smail Tigani, Mohamed Ouzzif

Abstract:

This work presents a discrete quantitative state prediction algorithm with intelligent behavior making it able to self-improve some performance aspects. The specificity of this algorithm is the capacity of self-rectification of the prediction strategy before the final decision. The auto-rectification mechanism is based on two parallel mathematical models. In one hand, the algorithm predicts the next state based on event transition matrix updated after each observation. In the other hand, the algorithm extracts its residues trend with a linear regression representing historical residues data-points in order to rectify the first decision if needs. For a normal distribution, the interactivity between the two models allows the algorithm to self-optimize its performance and then make better prediction. Designed key performance indicator, computed during a Monte Carlo simulation, shows the advantages of the proposed approach compared with traditional one.

Keywords: discrete state, Markov Chains, linear regression, auto-adaptive systems, decision making, Monte Carlo Simulation

Procedia PDF Downloads 475
7284 Modified Weibull Approach for Bridge Deterioration Modelling

Authors: Niroshan K. Walgama Wellalage, Tieling Zhang, Richard Dwight

Abstract:

State-based Markov deterioration models (SMDM) sometimes fail to find accurate transition probability matrix (TPM) values, and hence lead to invalid future condition prediction or incorrect average deterioration rates mainly due to drawbacks of existing nonlinear optimization-based algorithms and/or subjective function types used for regression analysis. Furthermore, a set of separate functions for each condition state with age cannot be directly derived by using Markov model for a given bridge element group, which however is of interest to industrial partners. This paper presents a new approach for generating Homogeneous SMDM model output, namely, the Modified Weibull approach, which consists of a set of appropriate functions to describe the percentage condition prediction of bridge elements in each state. These functions are combined with Bayesian approach and Metropolis Hasting Algorithm (MHA) based Markov Chain Monte Carlo (MCMC) simulation technique for quantifying the uncertainty in model parameter estimates. In this study, factors contributing to rail bridge deterioration were identified. The inspection data for 1,000 Australian railway bridges over 15 years were reviewed and filtered accordingly based on the real operational experience. Network level deterioration model for a typical bridge element group was developed using the proposed Modified Weibull approach. The condition state predictions obtained from this method were validated using statistical hypothesis tests with a test data set. Results show that the proposed model is able to not only predict the conditions in network-level accurately but also capture the model uncertainties with given confidence interval.

Keywords: bridge deterioration modelling, modified weibull approach, MCMC, metropolis-hasting algorithm, bayesian approach, Markov deterioration models

Procedia PDF Downloads 693
7283 Modeling Spatio-Temporal Variation in Rainfall Using a Hierarchical Bayesian Regression Model

Authors: Sabyasachi Mukhopadhyay, Joseph Ogutu, Gundula Bartzke, Hans-Peter Piepho

Abstract:

Rainfall is a critical component of climate governing vegetation growth and production, forage availability and quality for herbivores. However, reliable rainfall measurements are not always available, making it necessary to predict rainfall values for particular locations through time. Predicting rainfall in space and time can be a complex and challenging task, especially where the rain gauge network is sparse and measurements are not recorded consistently for all rain gauges, leading to many missing values. Here, we develop a flexible Bayesian model for predicting rainfall in space and time and apply it to Narok County, situated in southwestern Kenya, using data collected at 23 rain gauges from 1965 to 2015. Narok County encompasses the Maasai Mara ecosystem, the northern-most section of the Mara-Serengeti ecosystem, famous for its diverse and abundant large mammal populations and spectacular migration of enormous herds of wildebeest, zebra and Thomson's gazelle. The model incorporates geographical and meteorological predictor variables, including elevation, distance to Lake Victoria and minimum temperature. We assess the efficiency of the model by comparing it empirically with the established Gaussian process, Kriging, simple linear and Bayesian linear models. We use the model to predict total monthly rainfall and its standard error for all 5 * 5 km grid cells in Narok County. Using the Monte Carlo integration method, we estimate seasonal and annual rainfall and their standard errors for 29 sub-regions in Narok. Finally, we use the predicted rainfall to predict large herbivore biomass in the Maasai Mara ecosystem on a 5 * 5 km grid for both the wet and dry seasons. We show that herbivore biomass increases with rainfall in both seasons. The model can handle data from a sparse network of observations with many missing values and performs at least as well as or better than four established and widely used models, on the Narok data set. The model produces rainfall predictions consistent with expectation and in good agreement with the blended station and satellite rainfall values. The predictions are precise enough for most practical purposes. The model is very general and applicable to other variables besides rainfall.

Keywords: non-stationary covariance function, gaussian process, ungulate biomass, MCMC, maasai mara ecosystem

Procedia PDF Downloads 258
7282 Line Heating Forming: Methodology and Application Using Kriging and Fifth Order Spline Formulations

Authors: Henri Champliaud, Zhengkun Feng, Ngan Van Lê, Javad Gholipour

Abstract:

In this article, a method is presented to effectively estimate the deformed shape of a thick plate due to line heating. The method uses a fifth order spline interpolation, with up to C3 continuity at specific points to compute the shape of the deformed geometry. First and second order derivatives over a surface are the resulting parameters of a given heating line on a plate. These parameters are determined through experiments and/or finite element simulations. Very accurate kriging models are fitted to real or virtual surfaces to build-up a database of maps. Maps of first and second order derivatives are then applied on numerical plate models to evaluate their evolving shapes through a sequence of heating lines. Adding an optimization process to this approach would allow determining the trajectories of heating lines needed to shape complex geometries, such as Francis turbine blades.

Keywords: deformation, kriging, fifth order spline interpolation, first, second and third order derivatives, C3 continuity, line heating, plate forming, thermal forming

Procedia PDF Downloads 432
7281 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization for statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers or leverage points. We consider a new robust estimation which is composed of the weighted loss function of the pairwise difference of residuals and the adaptive penalty function regulating the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is robust to leverage points of the predictor variable. Furthermore, the adaptive penalty function gives us good statistical properties in variable selection such as oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in program R. We used an optimal tuning parameter based on the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real data set and contaminated data.

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression

Procedia PDF Downloads 432
7280 Comparative Study on Daily Discharge Estimation of Soolegan River

Authors: Redvan Ghasemlounia, Elham Ansari, Hikmet Kerem Cigizoglu

Abstract:

Hydrological modeling in arid and semi-arid regions is very important. Iran has many regions with these climate conditions such as Chaharmahal and Bakhtiari province that needs lots of attention with an appropriate management. Forecasting of hydrological parameters and estimation of hydrological events of catchments, provide important information that used for design, management and operation of water resources such as river systems, and dams, widely. Discharge in rivers is one of these parameters. This study presents the application and comparison of some estimation methods such as Feed-Forward Back Propagation Neural Network (FFBPNN), Multi Linear Regression (MLR), Gene Expression Programming (GEP) and Bayesian Network (BN) to predict the daily flow discharge of the Soolegan River, located at Chaharmahal and Bakhtiari province, in Iran. In this study, Soolegan, station was considered. This Station is located in Soolegan River at 51° 14՜ Latitude 31° 38՜ longitude at North Karoon basin. The Soolegan station is 2086 meters higher than sea level. The data used in this study are daily discharge and daily precipitation of Soolegan station. Feed Forward Back Propagation Neural Network(FFBPNN), Multi Linear Regression (MLR), Gene Expression Programming (GEP) and Bayesian Network (BN) models were developed using the same input parameters for Soolegan's daily discharge estimation. The results of estimation models were compared with observed discharge values to evaluate performance of the developed models. Results of all methods were compared and shown in tables and charts.

Keywords: ANN, multi linear regression, Bayesian network, forecasting, discharge, gene expression programming

Procedia PDF Downloads 533
7279 Optimized Dynamic Bayesian Networks and Neural Verifier Test Applied to On-Line Isolated Characters Recognition

Authors: Redouane Tlemsani, Redouane, Belkacem Kouninef, Abdelkader Benyettou

Abstract:

In this paper, our system is a Markovien system which we can see it like a Dynamic Bayesian Networks. One of the major interests of these systems resides in the complete training of the models (topology and parameters) starting from training data. The Bayesian Networks are representing models of dubious knowledge on complex phenomena. They are a union between the theory of probability and the graph theory in order to give effective tools to represent a joined probability distribution on a set of random variables. The representation of knowledge bases on description, by graphs, relations of causality existing between the variables defining the field of study. The theory of Dynamic Bayesian Networks is a generalization of the Bayesians networks to the dynamic processes. Our objective amounts finding the better structure which represents the relationships (dependencies) between the variables of a dynamic bayesian network. In applications in pattern recognition, one will carry out the fixing of the structure which obliges us to admit some strong assumptions (for example independence between some variables).

Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition, networks

Procedia PDF Downloads 585
7278 Dry Relaxation Shrinkage Prediction of Bordeaux Fiber Using a Feed Forward Neural

Authors: Baeza S. Roberto

Abstract:

The knitted fabric suffers a deformation in its dimensions due to stretching and tension factors, transverse and longitudinal respectively, during the process in rectilinear knitting machines so it performs a dry relaxation shrinkage procedure and thermal action of prefixed to obtain stable conditions in the knitting. This paper presents a dry relaxation shrinkage prediction of Bordeaux fiber using a feed forward neural network and linear regression models. Six operational alternatives of shrinkage were predicted. A comparison of the results was performed finding neural network models with higher levels of explanation of the variability and prediction. The presence of different reposes are included. The models were obtained through a neural toolbox of Matlab and Minitab software with real data in a knitting company of Southern Guanajuato. The results allow predicting dry relaxation shrinkage of each alternative operation.

Keywords: neural network, dry relaxation, knitting, linear regression

Procedia PDF Downloads 553
7277 Estimation of Functional Response Model by Supervised Functional Principal Component Analysis

Authors: Hyon I. Paek, Sang Rim Kim, Hyon A. Ryu

Abstract:

In functional linear regression, one typical problem is to reduce dimension. Compared with multivariate linear regression, functional linear regression is regarded as an infinite-dimensional case, and the main task is to reduce dimensions of functional response and functional predictors. One common approach is to adapt functional principal component analysis (FPCA) on functional predictors and then use a few leading functional principal components (FPC) to predict the functional model. The leading FPCs estimated by the typical FPCA explain a major variation of the functional predictor, but these leading FPCs may not be mostly correlated with the functional response, so they may not be significant in the prediction for response. In this paper, we propose a supervised functional principal component analysis method for a functional response model with FPCs obtained by considering the correlation of the functional response. Our method would have a better prediction accuracy than the typical FPCA method.

Keywords: supervised, functional principal component analysis, functional response, functional linear regression

Procedia PDF Downloads 40
7276 Application of Bayesian Model Averaging and Geostatistical Output Perturbation to Generate Calibrated Ensemble Weather Forecast

Authors: Muhammad Luthfi, Sutikno Sutikno, Purhadi Purhadi

Abstract:

Weather forecast has necessarily been improved to provide the communities an accurate and objective prediction as well. To overcome such issue, the numerical-based weather forecast was extensively developed to reduce the subjectivity of forecast. Yet the Numerical Weather Predictions (NWPs) outputs are unfortunately issued without taking dynamical weather behavior and local terrain features into account. Thus, NWPs outputs are not able to accurately forecast the weather quantities, particularly for medium and long range forecast. The aim of this research is to aid and extend the development of ensemble forecast for Meteorology, Climatology, and Geophysics Agency of Indonesia. Ensemble method is an approach combining various deterministic forecast to produce more reliable one. However, such forecast is biased and uncalibrated due to its underdispersive or overdispersive nature. As one of the parametric methods, Bayesian Model Averaging (BMA) generates the calibrated ensemble forecast and constructs predictive PDF for specified period. Such method is able to utilize ensemble of any size but does not take spatial correlation into account. Whereas space dependencies involve the site of interest and nearby site, influenced by dynamic weather behavior. Meanwhile, Geostatistical Output Perturbation (GOP) reckons the spatial correlation to generate future weather quantities, though merely built by a single deterministic forecast, and is able to generate an ensemble of any size as well. This research conducts both BMA and GOP to generate the calibrated ensemble forecast for the daily temperature at few meteorological sites nearby Indonesia international airport.

Keywords: Bayesian Model Averaging, ensemble forecast, geostatistical output perturbation, numerical weather prediction, temperature

Procedia PDF Downloads 250
7275 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams

Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem

Abstract:

In this 'Information Explosion Era' analyzing data 'a critical commodity' and mining knowledge from vertically distributed data stream incurs huge communication cost. However, an effort to decrease the communication in the distributed environment has an adverse influence on the classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.

Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data

Procedia PDF Downloads 129
7274 Prediction of PM₂.₅ Concentration in Ulaanbaatar with Deep Learning Models

Authors: Suriya

Abstract:

Rapid socio-economic development and urbanization have led to an increasingly serious air pollution problem in Ulaanbaatar (UB), the capital of Mongolia. PM₂.₅ pollution has become the most pressing aspect of UB air pollution. Therefore, monitoring and predicting PM₂.₅ concentration in UB is of great significance for the health of the local people and environmental management. As of yet, very few studies have used models to predict PM₂.₅ concentrations in UB. Using data from 0:00 on June 1, 2018, to 23:00 on April 30, 2020, we proposed two deep learning models based on Bayesian-optimized LSTM (Bayes-LSTM) and CNN-LSTM. We utilized hourly observed data, including Himawari8 (H8) aerosol optical depth (AOD), meteorology, and PM₂.₅ concentration, as input for the prediction of PM₂.₅ concentrations. The correlation strengths between meteorology, AOD, and PM₂.₅ were analyzed using the gray correlation analysis method; the comparison of the performance improvement of the model by using the AOD input value was tested, and the performance of these models was evaluated using mean absolute error (MAE) and root mean square error (RMSE). The prediction accuracies of Bayes-LSTM and CNN-LSTM deep learning models were both improved when AOD was included as an input parameter. Improvement of the prediction accuracy of the CNN-LSTM model was particularly enhanced in the non-heating season; in the heating season, the prediction accuracy of the Bayes-LSTM model slightly improved, while the prediction accuracy of the CNN-LSTM model slightly decreased. We propose two novel deep learning models for PM₂.₅ concentration prediction in UB, Bayes-LSTM, and CNN-LSTM deep learning models. Pioneering the use of AOD data from H8 and demonstrating the inclusion of AOD input data improves the performance of our two proposed deep learning models.

Keywords: deep learning, AOD, PM2.5, prediction, Ulaanbaatar

Procedia PDF Downloads 18
7273 Effect of Drying on the Concrete Structures

Authors: A. Brahma

Abstract:

The drying of hydraulics materials is unavoidable and conducted to important spontaneous deformations. In this study, we show that it is possible to describe the drying shrinkage of the high-performance concrete by a simple expression. A multiple regression model was developed for the prediction of the drying shrinkage of the high-performance concrete. The assessment of the proposed model has been done by a set of statistical tests. The model developed takes in consideration the main parameters of confection and conservation. There was a very good agreement between drying shrinkage predicted by the multiple regression model and experimental results. The developed model adjusts easily to all hydraulic concrete types.

Keywords: hydraulic concretes, drying, shrinkage, prediction, modeling

Procedia PDF Downloads 341
7272 Soil Degradati̇on Mapping Using Geographic Information System, Remote Sensing and Laboratory Analysis in the Oum Er Rbia High Basin, Middle Atlas, Morocco

Authors: Aafaf El Jazouli, Ahmed Barakat, Rida Khellouk

Abstract:

Mapping of soil degradation is derived from field observations, laboratory measurements, and remote sensing data, integrated quantitative methods to map the spatial characteristics of soil properties at different spatial and temporal scales to provide up-to-date information on the field. Since soil salinity, texture and organic matter play a vital role in assessing topsoil characteristics and soil quality, remote sensing can be considered an effective method for studying these properties. The main objective of this research is to asses soil degradation by combining remote sensing data and laboratory analysis. In order to achieve this goal, the required study of soil samples was taken at 50 locations in the upper basin of Oum Er Rbia in the Middle Atlas in Morocco. These samples were dried, sieved to 2 mm and analyzed in the laboratory. Landsat 8 OLI imagery was analyzed using physical or empirical methods to derive soil properties. In addition, remote sensing can serve as a supporting data source. Deterministic potential (Spline and Inverse Distance weighting) and probabilistic interpolation methods (ordinary kriging and universal kriging) were used to produce maps of each grain size class and soil properties using GIS software. As a result, a correlation was found between soil texture and soil organic matter content. This approach developed in ongoing research will improve the prospects for the use of remote sensing data for mapping soil degradation in arid and semi-arid environments.

Keywords: Soil degradation, GIS, interpolation methods (spline, IDW, kriging), Landsat 8 OLI, Oum Er Rbia high basin

Procedia PDF Downloads 138
7271 Representativity Based Wasserstein Active Regression

Authors: Benjamin Bobbia, Matthias Picard

Abstract:

In recent years active learning methodologies based on the representativity of the data seems more promising to limit overfitting. The presented query methodology for regression using the Wasserstein distance measuring the representativity of our labelled dataset compared to the global distribution. In this work a crucial use of GroupSort Neural Networks is made therewith to draw a double advantage. The Wasserstein distance can be exactly expressed in terms of such neural networks. Moreover, one can provide explicit bounds for their size and depth together with rates of convergence. However, heterogeneity of the dataset is also considered by weighting the Wasserstein distance with the error of approximation at the previous step of active learning. Such an approach leads to a reduction of overfitting and high prediction performance after few steps of query. After having detailed the methodology and algorithm, an empirical study is presented in order to investigate the range of our hyperparameters. The performances of this method are compared, in terms of numbers of query needed, with other classical and recent query methods on several UCI datasets.

Keywords: active learning, Lipschitz regularization, neural networks, optimal transport, regression

Procedia PDF Downloads 58
7270 Kriging-Based Global Optimization Method for Bluff Body Drag Reduction

Authors: Bingxi Huang, Yiqing Li, Marek Morzynski, Bernd R. Noack

Abstract:

We propose a Kriging-based global optimization method for active flow control with multiple actuation parameters. This method is designed to converge quickly and avoid getting trapped into local minima. We follow the model-free explorative gradient method (EGM) to alternate between explorative and exploitive steps. This facilitates a convergence similar to a gradient-based method and the parallel exploration of potentially better minima. In contrast to EGM, both kinds of steps are performed with Kriging surrogate model from the available data. The explorative step maximizes the expected improvement, i.e., favors regions of large uncertainty. The exploitive step identifies the best location of the cost function from the Kriging surrogate model for a subsequent weight-biased linear-gradient descent search method. To verify the effectiveness and robustness of the improved Kriging-based optimization method, we have examined several comparative test problems of varying dimensions with limited evaluation budgets. The results show that the proposed algorithm significantly outperforms some model-free optimization algorithms like genetic algorithm and differential evolution algorithm with a quicker convergence for a given budget. We have also performed direct numerical simulations of the fluidic pinball (N. Deng et al. 2020 J. Fluid Mech.) on three circular cylinders in equilateral-triangular arrangement immersed in an incoming flow at Re=100. The optimal cylinder rotations lead to 44.0% net drag power saving with 85.8% drag reduction and 41.8% actuation power. The optimal results for active flow control based on this configuration have achieved boat-tailing mechanism by employing Coanda forcing and wake stabilization by delaying separation and minimizing the wake region.

Keywords: direct numerical simulations, flow control, kriging, stochastic optimization, wake stabilization

Procedia PDF Downloads 79
7269 A Comparative Analysis of Machine Learning Techniques for PM10 Forecasting in Vilnius

Authors: Mina Adel Shokry Fahim, Jūratė Sužiedelytė Visockienė

Abstract:

With the growing concern over air pollution (AP), it is clear that this has gained more prominence than ever before. The level of consciousness has increased and a sense of knowledge now has to be forwarded as a duty by those enlightened enough to disseminate it to others. This realisation often comes after an understanding of how poor air quality indices (AQI) damage human health. The study focuses on assessing air pollution prediction models specifically for Lithuania, addressing a substantial need for empirical research within the region. Concentrating on Vilnius, it specifically examines particulate matter concentrations 10 micrometers or less in diameter (PM10). Utilizing Gaussian Process Regression (GPR) and Regression Tree Ensemble, and Regression Tree methodologies, predictive forecasting models are validated and tested using hourly data from January 2020 to December 2022. The study explores the classification of AP data into anthropogenic and natural sources, the impact of AP on human health, and its connection to cardiovascular diseases. The study revealed varying levels of accuracy among the models, with GPR achieving the highest accuracy, indicated by an RMSE of 4.14 in validation and 3.89 in testing.

Keywords: air pollution, anthropogenic and natural sources, machine learning, Gaussian process regression, tree ensemble, forecasting models, particulate matter

Procedia PDF Downloads 25
7268 Improved 3D Structure Prediction of Beta-Barrel Membrane Proteins by Using Evolutionary Coupling Constraints, Reduced State Space and an Empirical Potential Function

Authors: Wei Tian, Jie Liang, Hammad Naveed

Abstract:

Beta-barrel membrane proteins are found in the outer membrane of gram-negative bacteria, mitochondria, and chloroplasts. They carry out diverse biological functions, including pore formation, membrane anchoring, enzyme activity, and bacterial virulence. In addition, beta-barrel membrane proteins increasingly serve as scaffolds for bacterial surface display and nanopore-based DNA sequencing. Due to difficulties in experimental structure determination, they are sparsely represented in the protein structure databank and computational methods can help to understand their biophysical principles. We have developed a novel computational method to predict the 3D structure of beta-barrel membrane proteins using evolutionary coupling (EC) constraints and a reduced state space. Combined with an empirical potential function, we can successfully predict strand register at > 80% accuracy for a set of 49 non-homologous proteins with known structures. This is a significant improvement from previous results using EC alone (44%) and using empirical potential function alone (73%). Our method is general and can be applied to genome-wide structural prediction.

Keywords: beta-barrel membrane proteins, structure prediction, evolutionary constraints, reduced state space

Procedia PDF Downloads 580
7267 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest

Procedia PDF Downloads 206
7266 Loan Repayment Prediction Using Machine Learning: Model Development, Django Web Integration and Cloud Deployment

Authors: Seun Mayowa Sunday

Abstract:

Loan prediction is one of the most significant and recognised fields of research in the banking, insurance, and the financial security industries. Some prediction systems on the market include the construction of static software. However, due to the fact that static software only operates with strictly regulated rules, they cannot aid customers beyond these limitations. Application of many machine learning (ML) techniques are required for loan prediction. Four separate machine learning models, random forest (RF), decision tree (DT), k-nearest neighbour (KNN), and logistic regression, are used to create the loan prediction model. Using the anaconda navigator and the required machine learning (ML) libraries, models are created and evaluated using the appropriate measuring metrics. From the finding, the random forest performs with the highest accuracy of 80.17% which was later implemented into the Django framework. For real-time testing, the web application is deployed on the Alibabacloud which is among the top 4 biggest cloud computing provider. Hence, to the best of our knowledge, this research will serve as the first academic paper which combines the model development and the Django framework, with the deployment into the Alibaba cloud computing application.

Keywords: k-nearest neighbor, random forest, logistic regression, decision tree, django, cloud computing, alibaba cloud

Procedia PDF Downloads 102
7265 Evaluation of a Surrogate Based Method for Global Optimization

Authors: David Lindström

Abstract:

We evaluate the performance of a numerical method for global optimization of expensive functions. The method is using a response surface to guide the search for the global optimum. This metamodel could be based on radial basis functions, kriging, or a combination of different models. We discuss how to set the cycling parameters of the optimization method to get a balance between local and global search. We also discuss the eventual problem with Runge oscillations in the response surface.

Keywords: expensive function, infill sampling criterion, kriging, global optimization, response surface, Runge phenomenon

Procedia PDF Downloads 553
7264 Using Dynamic Bayesian Networks to Characterize and Predict Job Placement

Authors: Xupin Zhang, Maria Caterina Bramati, Enrest Fokoue

Abstract:

Understanding the career placement of graduates from the university is crucial for both the qualities of education and ultimate satisfaction of students. In this research, we adapt the capabilities of dynamic Bayesian networks to characterize and predict students’ job placement using data from various universities. We also provide elements of the estimation of the indicator (score) of the strength of the network. The research focuses on overall findings as well as specific student groups including international and STEM students and their insight on the career path and what changes need to be made. The derived Bayesian network has the potential to be used as a tool for simulating the career path for students and ultimately helps universities in both academic advising and career counseling.

Keywords: dynamic bayesian networks, indicator estimation, job placement, social networks

Procedia PDF Downloads 344
7263 Compression Index Estimation by Water Content and Liquid Limit and Void Ratio Using Statistics Method

Authors: Lizhou Chen, Abdelhamid Belgaid, Assem Elsayed, Xiaoming Yang

Abstract:

Compression index is essential in foundation settlement calculation. The traditional method for determining compression index is consolidation test which is expensive and time consuming. Many researchers have used regression methods to develop empirical equations for predicting compression index from soil properties. Based on a large number of compression index data collected from consolidation tests, the accuracy of some popularly empirical equations were assessed. It was found that primary compression index is significantly overestimated in some equations while it is underestimated in others. The sensitivity analyses of soil parameters including water content, liquid limit and void ratio were performed. The results indicate that the compression index obtained from void ratio is most accurate. The ANOVA (analysis of variance) demonstrates that the equations with multiple soil parameters cannot provide better predictions than the equations with single soil parameter. In other words, it is not necessary to develop the relationships between compression index and multiple soil parameters. Meanwhile, it was noted that secondary compression index is approximately 0.7-5.0% of primary compression index with an average of 2.0%. In the end, the proposed prediction equations using power regression technique were provided that can provide more accurate predictions than those from existing equations.

Keywords: compression index, clay, settlement, consolidation, secondary compression index, soil parameter

Procedia PDF Downloads 133
7262 A Generalized Weighted Loss for Support Vextor Classification and Multilayer Perceptron

Authors: Filippo Portera

Abstract:

Usually standard algorithms employ a loss where each error is the mere absolute difference between the true value and the prediction, in case of a regression task. In the present, we present several error weighting schemes that are a generalization of the consolidated routine. We study both a binary classification model for Support Vextor Classification and a regression net for Multylayer Perceptron. Results proves that the error is never worse than the standard procedure and several times it is better.

Keywords: loss, binary-classification, MLP, weights, regression

Procedia PDF Downloads 66
7261 One-Step Time Series Predictions with Recurrent Neural Networks

Authors: Vaidehi Iyer, Konstantin Borozdin

Abstract:

Time series prediction problems have many important practical applications, but are notoriously difficult for statistical modeling. Recently, machine learning methods have been attracted significant interest as a practical tool applied to a variety of problems, even though developments in this field tend to be semi-empirical. This paper explores application of Long Short Term Memory based Recurrent Neural Networks to the one-step prediction of time series for both trend and stochastic components. Two types of data are analyzed - daily stock prices, that are often considered to be a typical example of a random walk, - and weather patterns dominated by seasonal variations. Results from both analyses are compared, and reinforced learning framework is used to select more efficient between Recurrent Neural Networks and more traditional auto regression methods. It is shown that both methods are able to follow long-term trends and seasonal variations closely, but have difficulties with reproducing day-to-day variability. Future research directions and potential real world applications are briefly discussed.

Keywords: long short term memory, prediction methods, recurrent neural networks, reinforcement learning

Procedia PDF Downloads 201
7260 Prediction of Terrorist Activities in Nigeria using Bayesian Neural Network with Heterogeneous Transfer Functions

Authors: Tayo P. Ogundunmade, Adedayo A. Adepoju

Abstract:

Terrorist attacks in liberal democracies bring about a few pessimistic results, for example, sabotaged public support in the governments they target, disturbing the peace of a protected environment underwritten by the state, and a limitation of individuals from adding to the advancement of the country, among others. Hence, seeking for techniques to understand the different factors involved in terrorism and how to deal with those factors in order to completely stop or reduce terrorist activities is the topmost priority of the government in every country. This research aim is to develop an efficient deep learning-based predictive model for the prediction of future terrorist activities in Nigeria, addressing low-quality prediction accuracy problems associated with the existing solution methods. The proposed predictive AI-based model as a counterterrorism tool will be useful by governments and law enforcement agencies to protect the lives of individuals in society and to improve the quality of life in general. A Heterogeneous Bayesian Neural Network (HETBNN) model was derived with Gaussian error normal distribution. Three primary transfer functions (HOTTFs), as well as two derived transfer functions (HETTFs) arising from the convolution of the HOTTFs, are namely; Symmetric Saturated Linear transfer function (SATLINS ), Hyperbolic Tangent transfer function (TANH), Hyperbolic Tangent sigmoid transfer function (TANSIG), Symmetric Saturated Linear and Hyperbolic Tangent transfer function (SATLINS-TANH) and Symmetric Saturated Linear and Hyperbolic Tangent Sigmoid transfer function (SATLINS-TANSIG). Data on the Terrorist activities in Nigeria gathered through questionnaires for the purpose of this study were used. Mean Square Error (MSE), Mean Absolute Error (MAE) and Test Error are the forecast prediction criteria. The results showed that the HETFs performed better in terms of prediction and factors associated with terrorist activities in Nigeria were determined. The proposed predictive deep learning-based model will be useful to governments and law enforcement agencies as an effective counterterrorism mechanism to understand the parameters of terrorism and to design strategies to deal with terrorism before an incident actually happens and potentially causes the loss of precious lives. The proposed predictive AI-based model will reduce the chances of terrorist activities and is particularly helpful for security agencies to predict future terrorist activities.

Keywords: activation functions, Bayesian neural network, mean square error, test error, terrorism

Procedia PDF Downloads 137
7259 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

The logistic regression and Cox regression models (proportional hazard model) at present are being employed in the analysis of prospective epidemiologic research looking into risk factors in their application on chronic diseases. However, a theoretical relationship between the two models has been studied. By definition, Cox regression model also called Cox proportional hazard model is a procedure that is used in modeling data regarding time leading up to an event where censored cases exist. Whereas the Logistic regression model is mostly applicable in cases where the independent variables consist of numerical as well as nominal values while the resultant variable is binary (dichotomous). Arguments and findings of many researchers focused on the overview of Cox and Logistic regression models and their different applications in different areas. In this work, the analysis is done on secondary data whose source is SPSS exercise data on BREAST CANCER with a sample size of 1121 women where the main objective is to show the application difference between Cox regression model and logistic regression model based on factors that cause women to die due to breast cancer. Thus we did some analysis manually i.e. on lymph nodes status, and SPSS software helped to analyze the mentioned data. This study found out that there is an application difference between Cox and Logistic regression models which is Cox regression model is used if one wishes to analyze data which also include the follow-up time whereas Logistic regression model analyzes data without follow-up-time. Also, they have measurements of association which is different: hazard ratio and odds ratio for Cox and logistic regression models respectively. A similarity between the two models is that they are both applicable in the prediction of the upshot of a categorical variable i.e. a variable that can accommodate only a restricted number of categories. In conclusion, Cox regression model differs from logistic regression by assessing a rate instead of proportion. The two models can be applied in many other researches since they are suitable methods for analyzing data but the more recommended is the Cox, regression model.

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

Procedia PDF Downloads 424
7258 Growth Curves Genetic Analysis of Native South Caspian Sea Poultry Using Bayesian Statistics

Authors: Jamal Fayazi, Farhad Anoosheh, Mohammad R. Ghorbani, Ali R. Paydar

Abstract:

In this study, to determine the best non-linear regression model describing the growth curve of native poultry, 9657 chicks of generations 18, 19, and 20 raised in Mazandaran breeding center were used. Fowls and roosters of this center distributed in south of Caspian Sea region. To estimate the genetic variability of none linear regression parameter of growth traits, a Gibbs sampling of Bayesian analysis was used. The average body weight traits in the first day (BW1), eighth week (BW8) and twelfth week (BW12) were respectively estimated as 36.05, 763.03, and 1194.98 grams. Based on the coefficient of determination, mean squares of error and Akaike information criteria, Gompertz model was selected as the best growth descriptive function. In Gompertz model, the estimated values for the parameters of maturity weight (A), integration constant (B) and maturity rate (K) were estimated to be 1734.4, 3.986, and 0.282, respectively. The direct heritability of BW1, BW8 and BW12 were respectively reported to be as 0.378, 0.3709, 0.316, 0.389, 0.43, 0.09 and 0.07. With regard to estimated parameters, the results of this study indicated that there is a possibility to improve some property of growth curve using appropriate selection programs.

Keywords: direct heritability, Gompertz, growth traits, maturity weight, native poultry

Procedia PDF Downloads 233
7257 Application of Multilinear Regression Analysis for Prediction of Synthetic Shear Wave Velocity Logs in Upper Assam Basin

Authors: Triveni Gogoi, Rima Chatterjee

Abstract:

Shear wave velocity (Vs) estimation is an important approach in the seismic exploration and characterization of a hydrocarbon reservoir. There are varying methods for prediction of S-wave velocity, if recorded S-wave log is not available. But all the available methods for Vs prediction are empirical mathematical models. Shear wave velocity can be estimated using P-wave velocity by applying Castagna’s equation, which is the most common approach. The constants used in Castagna’s equation vary for different lithologies and geological set-ups. In this study, multiple regression analysis has been used for estimation of S-wave velocity. The EMERGE module from Hampson-Russel software has been used here for generation of S-wave log. Both single attribute and multi attributes analysis have been carried out for generation of synthetic S-wave log in Upper Assam basin. Upper Assam basin situated in North Eastern India is one of the most important petroleum provinces of India. The present study was carried out using four wells of the study area. Out of these wells, S-wave velocity was available for three wells. The main objective of the present study is a prediction of shear wave velocities for wells where S-wave velocity information is not available. The three wells having S-wave velocity were first used to test the reliability of the method and the generated S-wave log was compared with actual S-wave log. Single attribute analysis has been carried out for these three wells within the depth range 1700-2100m, which corresponds to Barail group of Oligocene age. The Barail Group is the main target zone in this study, which is the primary producing reservoir of the basin. A system generated list of attributes with varying degrees of correlation appeared and the attribute with the highest correlation was concerned for the single attribute analysis. Crossplot between the attributes shows the variation of points from line of best fit. The final result of the analysis was compared with the available S-wave log, which shows a good visual fit with a correlation of 72%. Next multi-attribute analysis has been carried out for the same data using all the wells within the same analysis window. A high correlation of 85% has been observed between the output log from the analysis and the recorded S-wave. The almost perfect fit between the synthetic S-wave and the recorded S-wave log validates the reliability of the method. For further authentication, the generated S-wave data from the wells have been tied to the seismic and correlated them. Synthetic share wave log has been generated for the well M2 where S-wave is not available and it shows a good correlation with the seismic. Neutron porosity, density, AI and P-wave velocity are proved to be the most significant variables in this statistical method for S-wave generation. Multilinear regression method thus can be considered as a reliable technique for generation of shear wave velocity log in this study.

Keywords: Castagna's equation, multi linear regression, multi attribute analysis, shear wave logs

Procedia PDF Downloads 199