Search results for: nonparametric regression model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18172

Search results for: nonparametric regression model

18142 Application and Verification of Regression Model to Landslide Susceptibility Mapping

Authors: Masood Beheshtirad

Abstract:

Identification of regions having potential for landslide occurrence is one of the basic measures in natural resources management. Different landslide hazard mapping models are proposed based on the environmental condition and goals. In this research landslide hazard map using multiple regression model were provided and applicability of this model is investigated in Baghdasht watershed. Dependent variable is landslide inventory map and independent variables consist of information layers as Geology, slope, aspect, distance from river, distance from road, fault and land use. For doing this, existing landslides have been identified and an inventory map made. The landslide hazard map is based on the multiple regression provided. The level of similarity potential hazard classes and figures of this model were compared with the landslide inventory map in the SPSS environments. Results of research showed that there is a significant correlation between the potential hazard classes and figures with area of the landslides. The multiple regression model is suitable for application in the Baghdasht Watershed.

Keywords: landslide, mapping, multiple model, regression

Procedia PDF Downloads 297
18141 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado

Authors: Ana Paula Camelo, Keila Sanches

Abstract:

The Geographically Weighted Regression (GWR) was proposed in geography literature to allow relationship in a regression model to vary over space. In Brazil, the agricultural exploitation of the Cerrado Biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. Therefore, it was used the set of exploratory spatial data analysis tools (ESDA) and confirmatory analysis using GWR. It was made the calibration a non-spatial model, evaluation the nature of the regression curve, election of the variables by stepwise process and multicollinearity analysis. After the evaluation of the non-spatial model was processed the spatial-regression model, statistic evaluation of the intercept and verification of its effect on calibration. In an analysis of Spearman’s correlation the results between deforestation and livestock was +0.783 and with soybeans +0.405. The model presented R²=0.936 and showed a strong spatial dependence of agricultural activity of soybeans associated to maize and cotton crops. The GWR is a very effective tool presenting results closer to the reality of deforestation in the Cerrado when compared with other analysis.

Keywords: deforestation, geographically weighted regression, land use, spatial analysis

Procedia PDF Downloads 324
18140 Nonparametric Specification Testing for the Drift of the Short Rate Diffusion Process Using a Panel of Yields

Authors: John Knight, Fuchun Li, Yan Xu

Abstract:

Based on a new method of the nonparametric estimator of the drift function, we propose a consistent test for the parametric specification of the drift function in the short rate diffusion process using observations from a panel of yields. The test statistic is shown to follow an asymptotic normal distribution under the null hypothesis that the parametric drift function is correctly specified, and converges to infinity under the alternative. Taking the daily 7-day European rates as a proxy of the short rate, we use our test to examine whether the drift of the short rate diffusion process is linear or nonlinear, which is an unresolved important issue in the short rate modeling literature. The testing results indicate that none of the drift functions in this literature adequately captures the dynamics of the drift, but nonlinear specification performs better than the linear specification.

Keywords: diffusion process, nonparametric estimation, derivative security price, drift function and volatility function

Procedia PDF Downloads 338
18139 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation

Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski

Abstract:

Edgeworth approximation is one of the most important statistical methods that has a considered contribution in the reduction of the sum of standard deviation of the independent variables’ coefficients in a Quantile Regression Model. This model estimates the conditional median or other quantiles. In this paper, we have applied approximating statistical methods in an economical problem. We have created and generated a quantile regression model to see how the profit gained is connected with the realized sales of the cosmetic products in a real data, taken from a local business. The Linear Regression of the generated profit and the realized sales was not free of autocorrelation and heteroscedasticity, so this is the reason that we have used this model instead of Linear Regression. Our aim is to analyze in more details the relation between the variables taken into study: the profit and the finalized sales and how to minimize the standard errors of the independent variable involved in this study, the level of realized sales. The statistical methods that we have applied in our work are Edgeworth Approximation for Independent and Identical distributed (IID) cases, Bootstrap version of the Model and the Edgeworth approximation for Bootstrap Quantile Regression Model. The graphics and the results that we have presented here identify the best approximating model of our study.

Keywords: bootstrap, edgeworth approximation, IID, quantile

Procedia PDF Downloads 126
18138 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 328
18137 The Impact of Female Education on Fertility: A Natural Experiment from Egypt

Authors: Fatma Romeh, Shiferaw Gurmu

Abstract:

This paper examines the impact of female education on fertility, using the change in length of primary schooling in Egypt in 1988-89 as the source of exogenous variation in schooling. In particular, beginning in 1988, children had to attend primary school for only five years rather than six years. This change was applicable to all individuals born on or after October 1977. Using a nonparametric regression discontinuity approach, we compare education and fertility of women born just before and after October 1977. The results show that female education significantly reduces the number of children born per woman and delays the time until first birth. Applying a robust regression discontinuity approach, however, the impact of education on the number of children is no longer significant. The impact on the timing of first birth remained significant under the robust approach. Each year of female education postponed childbearing by three months, on average.

Keywords: Egypt, female education, fertility, robust regression discontinuity

Procedia PDF Downloads 301
18136 Establishment of the Regression Uncertainty of the Critical Heat Flux Power Correlation for an Advanced Fuel Bundle

Authors: L. Q. Yuan, J. Yang, A. Siddiqui

Abstract:

A new regression uncertainty analysis methodology was applied to determine the uncertainties of the critical heat flux (CHF) power correlation for an advanced 43-element bundle design, which was developed by Canadian Nuclear Laboratories (CNL) to achieve improved economics, resource utilization and energy sustainability. The new methodology is considered more appropriate than the traditional methodology in the assessment of the experimental uncertainty associated with regressions. The methodology was first assessed using both the Monte Carlo Method (MCM) and the Taylor Series Method (TSM) for a simple linear regression model, and then extended successfully to a non-linear CHF power regression model (CHF power as a function of inlet temperature, outlet pressure and mass flow rate). The regression uncertainty assessed by MCM agrees well with that by TSM. An equation to evaluate the CHF power regression uncertainty was developed and expressed as a function of independent variables that determine the CHF power.

Keywords: CHF experiment, CHF correlation, regression uncertainty, Monte Carlo Method, Taylor Series Method

Procedia PDF Downloads 387
18135 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić

Abstract:

This paper examines economic and Information and Communication Technology (ICT) development influence on recently increasing Internet purchases by individuals for European Union member states. After a growing trend for Internet purchases in EU27 was noticed, all possible regression analysis was applied using nine independent variables in 2011. Finally, two linear regression models were studied in detail. Conducted simple linear regression analysis confirmed the research hypothesis that the Internet purchases in analysed EU countries is positively correlated with statistically significant variable Gross Domestic Product per capita (GDPpc). Also, analysed multiple linear regression model with four regressors, showing ICT development level, indicates that ICT development is crucial for explaining the Internet purchases by individuals, confirming the research hypothesis.

Keywords: European union, Internet purchases, multiple linear regression model, outlier

Procedia PDF Downloads 274
18134 Formulating a Flexible-Spread Fuzzy Regression Model Based on Dissemblance Index

Authors: Shih-Pin Chen, Shih-Syuan You

Abstract:

This study proposes a regression model with flexible spreads for fuzzy input-output data to cope with the situation that the existing measures cannot reflect the actual estimation error. The main idea is that a dissemblance index (DI) is carefully identified and defined for precisely measuring the actual estimation error. Moreover, the graded mean integration (GMI) representation is adopted for determining more representative numeric regression coefficients. Notably, to comprehensively compare the performance of the proposed model with other ones, three different criteria are adopted. The results from commonly used test numerical examples and an application to Taiwan's business monitoring indicator illustrate that the proposed dissemblance index method not only produces valid fuzzy regression models for fuzzy input-output data, but also has satisfactory and stable performance in terms of the total estimation error based on these three criteria.

Keywords: dissemblance index, forecasting, fuzzy sets, linear regression

Procedia PDF Downloads 324
18133 Landslide Susceptibility Mapping: A Comparison between Logistic Regression and Multivariate Adaptive Regression Spline Models in the Municipality of Oudka, Northern of Morocco

Authors: S. Benchelha, H. C. Aoudjehane, M. Hakdaoui, R. El Hamdouni, H. Mansouri, T. Benchelha, M. Layelmam, M. Alaoui

Abstract:

The logistic regression (LR) and multivariate adaptive regression spline (MarSpline) are applied and verified for analysis of landslide susceptibility map in Oudka, Morocco, using geographical information system. From spatial database containing data such as landslide mapping, topography, soil, hydrology and lithology, the eight factors related to landslides such as elevation, slope, aspect, distance to streams, distance to road, distance to faults, lithology map and Normalized Difference Vegetation Index (NDVI) were calculated or extracted. Using these factors, landslide susceptibility indexes were calculated by the two mentioned methods. Before the calculation, this database was divided into two parts, the first for the formation of the model and the second for the validation. The results of the landslide susceptibility analysis were verified using success and prediction rates to evaluate the quality of these probabilistic models. The result of this verification was that the MarSpline model is the best model with a success rate (AUC = 0.963) and a prediction rate (AUC = 0.951) higher than the LR model (success rate AUC = 0.918, rate prediction AUC = 0.901).

Keywords: landslide susceptibility mapping, regression logistic, multivariate adaptive regression spline, Oudka, Taounate

Procedia PDF Downloads 158
18132 Regression Model Evaluation on Depth Camera Data for Gaze Estimation

Authors: James Purnama, Riri Fitri Sari

Abstract:

We investigate the machine learning algorithm selection problem in the term of a depth image based eye gaze estimation, with respect to its essential difficulty in reducing the number of required training samples and duration time of training. Statistics based prediction accuracy are increasingly used to assess and evaluate prediction or estimation in gaze estimation. This article evaluates Root Mean Squared Error (RMSE) and R-Squared statistical analysis to assess machine learning methods on depth camera data for gaze estimation. There are 4 machines learning methods have been evaluated: Random Forest Regression, Regression Tree, Support Vector Machine (SVM), and Linear Regression. The experiment results show that the Random Forest Regression has the lowest RMSE and the highest R-Squared, which means that it is the best among other methods.

Keywords: gaze estimation, gaze tracking, eye tracking, kinect, regression model, orange python

Procedia PDF Downloads 505
18131 Time Series Regression with Meta-Clusters

Authors: Monika Chuchro

Abstract:

This paper presents a preliminary attempt to apply classification of time series using meta-clusters in order to improve the quality of regression models. In this case, clustering was performed as a method to obtain a subgroups of time series data with normal distribution from inflow into waste water treatment plant data which Composed of several groups differing by mean value. Two simple algorithms: K-mean and EM were chosen as a clustering method. The rand index was used to measure the similarity. After simple meta-clustering, regression model was performed for each subgroups. The final model was a sum of subgroups models. The quality of obtained model was compared with the regression model made using the same explanatory variables but with no clustering of data. Results were compared by determination coefficient (R2), measure of prediction accuracy mean absolute percentage error (MAPE) and comparison on linear chart. Preliminary results allows to foresee the potential of the presented technique.

Keywords: clustering, data analysis, data mining, predictive models

Procedia PDF Downloads 432
18130 Adaptive Nonparametric Approach for Guaranteed Real-Time Detection of Targeted Signals in Multichannel Monitoring Systems

Authors: Andrey V. Timofeev

Abstract:

An adaptive nonparametric method is proposed for stable real-time detection of seismoacoustic sources in multichannel C-OTDR systems with a significant number of channels. This method guarantees given upper boundaries for probabilities of Type I and Type II errors. Properties of the proposed method are rigorously proved. The results of practical applications of the proposed method in a real C-OTDR-system are presented in this report.

Keywords: guaranteed detection, multichannel monitoring systems, change point, interval estimation, adaptive detection

Procedia PDF Downloads 417
18129 Robustified Asymmetric Logistic Regression Model for Global Fish Stock Assessment

Authors: Osamu Komori, Shinto Eguchi, Hiroshi Okamura, Momoko Ichinokawa

Abstract:

The long time-series data on population assessments are essential for global ecosystem assessment because the temporal change of biomass in such a database reflects the status of global ecosystem properly. However, the available assessment data usually have limited sample sizes and the ratio of populations with low abundance of biomass (collapsed) to those with high abundance (non-collapsed) is highly imbalanced. To allow for the imbalance and uncertainty involved in the ecological data, we propose a binary regression model with mixed effects for inferring ecosystem status through an asymmetric logistic model. In the estimation equation, we observe that the weights for the non-collapsed populations are relatively reduced, which in turn puts more importance on the small number of observations of collapsed populations. Moreover, we extend the asymmetric logistic regression model using propensity score to allow for the sample biases observed in the labeled and unlabeled datasets. It robustified the estimation procedure and improved the model fitting.

Keywords: double robust estimation, ecological binary data, mixed effect logistic regression model, propensity score

Procedia PDF Downloads 231
18128 Effect of Drying on the Concrete Structures

Authors: A. Brahma

Abstract:

The drying of hydraulics materials is unavoidable and conducted to important spontaneous deformations. In this study, we show that it is possible to describe the drying shrinkage of the high-performance concrete by a simple expression. A multiple regression model was developed for the prediction of the drying shrinkage of the high-performance concrete. The assessment of the proposed model has been done by a set of statistical tests. The model developed takes in consideration the main parameters of confection and conservation. There was a very good agreement between drying shrinkage predicted by the multiple regression model and experimental results. The developed model adjusts easily to all hydraulic concrete types.

Keywords: hydraulic concretes, drying, shrinkage, prediction, modeling

Procedia PDF Downloads 337
18127 Credit Risk Prediction Based on Bayesian Estimation of Logistic Regression Model with Random Effects

Authors: Sami Mestiri, Abdeljelil Farhat

Abstract:

The aim of this current paper is to predict the credit risk of banks in Tunisia, over the period (2000-2005). For this purpose, two methods for the estimation of the logistic regression model with random effects: Penalized Quasi Likelihood (PQL) method and Gibbs Sampler algorithm are applied. By using the information on a sample of 528 Tunisian firms and 26 financial ratios, we show that Bayesian approach improves the quality of model predictions in terms of good classification as well as by the ROC curve result.

Keywords: forecasting, credit risk, Penalized Quasi Likelihood, Gibbs Sampler, logistic regression with random effects, curve ROC

Procedia PDF Downloads 507
18126 A Brief Study about Nonparametric Adherence Tests

Authors: Vinicius R. Domingues, Luan C. S. M. Ozelim

Abstract:

The statistical study has become indispensable for various fields of knowledge. Not any different, in Geotechnics the study of probabilistic and statistical methods has gained power considering its use in characterizing the uncertainties inherent in soil properties. One of the situations where engineers are constantly faced is the definition of a probability distribution that represents significantly the sampled data. To be able to discard bad distributions, goodness-of-fit tests are necessary. In this paper, three non-parametric goodness-of-fit tests are applied to a data set computationally generated to test the goodness-of-fit of them to a series of known distributions. It is shown that the use of normal distribution does not always provide satisfactory results regarding physical and behavioral representation of the modeled parameters.

Keywords: Kolmogorov-Smirnov test, Anderson-Darling test, Cramer-Von-Mises test, nonparametric adherence tests

Procedia PDF Downloads 410
18125 A Fuzzy Nonlinear Regression Model for Interval Type-2 Fuzzy Sets

Authors: O. Poleshchuk, E. Komarov

Abstract:

This paper presents a regression model for interval type-2 fuzzy sets based on the least squares estimation technique. Unknown coefficients are assumed to be triangular fuzzy numbers. The basic idea is to determine aggregation intervals for type-1 fuzzy sets, membership functions of whose are low membership function and upper membership function of interval type-2 fuzzy set. These aggregation intervals were called weighted intervals. Low and upper membership functions of input and output interval type-2 fuzzy sets for developed regression models are considered as piecewise linear functions.

Keywords: interval type-2 fuzzy sets, fuzzy regression, weighted interval

Procedia PDF Downloads 332
18124 Efficient Model Selection in Linear and Non-Linear Quantile Regression by Cross-Validation

Authors: Yoonsuh Jung, Steven N. MacEachern

Abstract:

Check loss function is used to define quantile regression. In the prospect of cross validation, it is also employed as a validation function when underlying truth is unknown. However, our empirical study indicates that the validation with check loss often leads to choosing an over estimated fits. In this work, we suggest a modified or L2-adjusted check loss which rounds the sharp corner in the middle of check loss. It has a large effect of guarding against over fitted model in some extent. Through various simulation settings of linear and non-linear regressions, the improvement of check loss by L2 adjustment is empirically examined. This adjustment is devised to shrink to zero as sample size grows.

Keywords: cross-validation, model selection, quantile regression, tuning parameter selection

Procedia PDF Downloads 406
18123 Arabic Character Recognition Using Regression Curves with the Expectation Maximization Algorithm

Authors: Abdullah A. AlShaher

Abstract:

In this paper, we demonstrate how regression curves can be used to recognize 2D non-rigid handwritten shapes. Each shape is represented by a set of non-overlapping uniformly distributed landmarks. The underlying models utilize 2nd order of polynomials to model shapes within a training set. To estimate the regression models, we need to extract the required coefficients which describe the variations for a set of shape class. Hence, a least square method is used to estimate such modes. We then proceed by training these coefficients using the apparatus Expectation Maximization algorithm. Recognition is carried out by finding the least error landmarks displacement with respect to the model curves. Handwritten isolated Arabic characters are used to evaluate our approach.

Keywords: character recognition, regression curves, handwritten Arabic letters, expectation maximization algorithm

Procedia PDF Downloads 111
18122 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that could help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that can minimize the total cost for car rental service. With two types of cars, there is an upgrade possibility for lower type to upper type. This makes the model more complex than one type of cars scenario. We have found that convexity can be proved in this case. Sensitivity analysis of the parameters is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from multiple linear regression model is relatively close to the optimal solution (with the adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from optimal solution is only 0.5 to 1 percent (on average) lower than the cost from regression model, while it is approximately 67% lower than the cost obtained by the naïve method. It indicates that our proposed simplification method using regression analysis can effectively perform in estimating the overbooking level.

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 140
18121 Estimate of Maximum Expected Intensity of One-Half-Wave Lines Dancing

Authors: A. Bekbaev, M. Dzhamanbaev, R. Abitaeva, A. Karbozova, G. Nabyeva

Abstract:

In this paper, the regression dependence of dancing intensity from wind speed and length of span was established due to the statistic data obtained from multi-year observations on line wires dancing accumulated by power systems of Kazakhstan and the Russian Federation. The lower and upper limitations of the equations parameters were estimated, as well as the adequacy of the regression model. The constructed model will be used in research of dancing phenomena for the development of methods and means of protection against dancing and for zoning plan of the territories of line wire dancing.

Keywords: power lines, line wire dancing, dancing intensity, regression equation, dancing area intensity

Procedia PDF Downloads 282
18120 Predicting Survival in Cancer: How Cox Regression Model Compares to Artifial Neural Networks?

Authors: Dalia Rimawi, Walid Salameh, Amal Al-Omari, Hadeel AbdelKhaleq

Abstract:

Predication of Survival time of patients with cancer, is a core factor that influences oncologist decisions in different aspects; such as offered treatment plans, patients’ quality of life and medications development. For a long time proportional hazards Cox regression (ph. Cox) was and still the most well-known statistical method to predict survival outcome. But due to the revolution of data sciences; new predication models were employed and proved to be more flexible and provided higher accuracy in that type of studies. Artificial neural network is one of those models that is suitable to handle time to event predication. In this study we aim to compare ph Cox regression with artificial neural network method according to data handling and Accuracy of each model.

Keywords: Cox regression, neural networks, survival, cancer.

Procedia PDF Downloads 152
18119 Survival and Hazard Maximum Likelihood Estimator with Covariate Based on Right Censored Data of Weibull Distribution

Authors: Al Omari Mohammed Ahmed

Abstract:

This paper focuses on Maximum Likelihood Estimator with Covariate. Covariates are incorporated into the Weibull model. Under this regression model with regards to maximum likelihood estimator, the parameters of the covariate, shape parameter, survival function and hazard rate of the Weibull regression distribution with right censored data are estimated. The mean square error (MSE) and absolute bias are used to compare the performance of Weibull regression distribution. For the simulation comparison, the study used various sample sizes and several specific values of the Weibull shape parameter.

Keywords: weibull regression distribution, maximum likelihood estimator, survival function, hazard rate, right censoring

Procedia PDF Downloads 410
18118 Analyzing the Influence of Hydrometeorlogical Extremes, Geological Setting, and Social Demographic on Public Health

Authors: Irfan Ahmad Afip

Abstract:

This main research objective is to accurately identify the possibility for a Leptospirosis outbreak severity of a certain area based on its input features into a multivariate regression model. The research question is the possibility of an outbreak in a specific area being influenced by this feature, such as social demographics and hydrometeorological extremes. If the occurrence of an outbreak is being subjected to these features, then the epidemic severity for an area will be different depending on its environmental setting because the features will influence the possibility and severity of an outbreak. Specifically, this research objective was three-fold, namely: (a) to identify the relevant multivariate features and visualize the patterns data, (b) to develop a multivariate regression model based from the selected features and determine the possibility for Leptospirosis outbreak in an area, and (c) to compare the predictive ability of multivariate regression model and machine learning algorithms. Several secondary data features were collected locations in the state of Negeri Sembilan, Malaysia, based on the possibility it would be relevant to determine the outbreak severity in the area. The relevant features then will become an input in a multivariate regression model; a linear regression model is a simple and quick solution for creating prognostic capabilities. A multivariate regression model has proven more precise prognostic capabilities than univariate models. The expected outcome from this research is to establish a correlation between the features of social demographic and hydrometeorological with Leptospirosis bacteria; it will also become a contributor for understanding the underlying relationship between the pathogen and the ecosystem. The relationship established can be beneficial for the health department or urban planner to inspect and prepare for future outcomes in event detection and system health monitoring.

Keywords: geographical information system, hydrometeorological, leptospirosis, multivariate regression

Procedia PDF Downloads 76
18117 Modeling Standpipe Pressure Using Multivariable Regression Analysis by Combining Drilling Parameters and a Herschel-Bulkley Model

Authors: Seydou Sinde

Abstract:

The aims of this paper are to formulate mathematical expressions that can be used to estimate the standpipe pressure (SPP). The developed formulas take into account the main factors that, directly or indirectly, affect the behavior of SPP values. Fluid rheology and well hydraulics are some of these essential factors. Mud Plastic viscosity, yield point, flow power, consistency index, flow rate, drillstring, and annular geometries are represented by the frictional pressure (Pf), which is one of the input independent parameters and is calculated, in this paper, using Herschel-Bulkley rheological model. Other input independent parameters include the rate of penetration (ROP), applied load or weight on the bit (WOB), bit revolutions per minute (RPM), bit torque (TRQ), and hole inclination and direction coupled in the hole curvature or dogleg (DL). The technique of repeating parameters and Buckingham PI theorem are used to reduce the number of the input independent parameters into the dimensionless revolutions per minute (RPMd), the dimensionless torque (TRQd), and the dogleg, which is already in the dimensionless form of radians. Multivariable linear and polynomial regression technique using PTC Mathcad Prime 4.0 is used to analyze and determine the exact relationships between the dependent parameter, which is SPP, and the remaining three dimensionless groups. Three models proved sufficiently satisfactory to estimate the standpipe pressure: multivariable linear regression model 1 containing three regression coefficients for vertical wells; multivariable linear regression model 2 containing four regression coefficients for deviated wells; and multivariable polynomial quadratic regression model containing six regression coefficients for both vertical and deviated wells. Although that the linear regression model 2 (with four coefficients) is relatively more complex and contains an additional term over the linear regression model 1 (with three coefficients), the former did not really add significant improvements to the later except for some minor values. Thus, the effect of the hole curvature or dogleg is insignificant and can be omitted from the input independent parameters without significant losses of accuracy. The polynomial quadratic regression model is considered the most accurate model due to its relatively higher accuracy for most of the cases. Data of nine wells from the Middle East were used to run the developed models with satisfactory results provided by all of them, even if the multivariable polynomial quadratic regression model gave the best and most accurate results. Development of these models is useful not only to monitor and predict, with accuracy, the values of SPP but also to early control and check for the integrity of the well hydraulics as well as to take the corrective actions should any unexpected problems appear, such as pipe washouts, jet plugging, excessive mud losses, fluid gains, kicks, etc.

Keywords: standpipe, pressure, hydraulics, nondimensionalization, parameters, regression

Procedia PDF Downloads 54
18116 Urban Energy Demand Modelling: Spatial Analysis Approach

Authors: Hung-Chu Chen, Han Qi, Bauke de Vries

Abstract:

Energy consumption in the urban environment has attracted numerous researches in recent decades. However, it is comparatively rare to find literary works which investigated 3D spatial analysis of urban energy demand modelling. In order to analyze the spatial correlation between urban morphology and energy demand comprehensively, this paper investigates their relation by using the spatial regression tool. In addition, the spatial regression tool which is applied in this paper is ordinary least squares regression (OLS) and geographically weighted regression (GWR) model. Normalized Difference Built-up Index (NDBI), Normalized Difference Vegetation Index (NDVI), and building volume are explainers of urban morphology, which act as independent variables of Energy-land use (E-L) model. NDBI and NDVI are used as the index to describe five types of land use: urban area (U), open space (O), artificial green area (G), natural green area (V), and water body (W). Accordingly, annual electricity, gas demand and energy demand are dependent variables of the E-L model. Based on the analytical result of E-L model relation, it revealed that energy demand and urban morphology are closely connected and the possible causes and practical use are discussed. Besides, the spatial analysis methods of OLS and GWR are compared.

Keywords: energy demand model, geographically weighted regression, normalized difference built-up index, normalized difference vegetation index, spatial statistics

Procedia PDF Downloads 117
18115 The Theory behind Logistic Regression

Authors: Jan Henrik Wosnitza

Abstract:

The logistic regression has developed into a standard approach for estimating conditional probabilities in a wide range of applications including credit risk prediction. The article at hand contributes to the current literature on logistic regression fourfold: First, it is demonstrated that the binary logistic regression automatically meets its model assumptions under very general conditions. This result explains, at least in part, the logistic regression's popularity. Second, the requirement of homoscedasticity in the context of binary logistic regression is theoretically substantiated. The variances among the groups of defaulted and non-defaulted obligors have to be the same across the level of the aggregated default indicators in order to achieve linear logits. Third, this article sheds some light on the question why nonlinear logits might be superior to linear logits in case of a small amount of data. Fourth, an innovative methodology for estimating correlations between obligor-specific log-odds is proposed. In order to crystallize the key ideas, this paper focuses on the example of credit risk prediction. However, the results presented in this paper can easily be transferred to any other field of application.

Keywords: correlation, credit risk estimation, default correlation, homoscedasticity, logistic regression, nonlinear logistic regression

Procedia PDF Downloads 389
18114 Neural Network Modelling for Turkey Railway Load Carrying Demand

Authors: Humeyra Bolakar Tosun

Abstract:

The transport sector has an undisputed place in human life. People need transport access to continuous increase day by day with growing population. The number of rail network, urban transport planning, infrastructure improvements, transportation management and other related areas is a key factor affecting our country made it quite necessary to improve the work of transportation. In this context, it plays an important role in domestic rail freight demand planning. Alternatives that the increase in the transportation field and has made it mandatory requirements such as the demand for improving transport quality. In this study generally is known and used in studies by the definition, rail freight transport, railway line length, population, energy consumption. In this study, Iron Road Load Net Demand was modeled by multiple regression and ANN methods. In this study, model dependent variable (Output) is Iron Road Load Net demand and 6 entries variable was determined. These outcome values extracted from the model using ANN and regression model results. In the regression model, some parameters are considered as determinative parameters, and the coefficients of the determinants give meaningful results. As a result, ANN model has been shown to be more successful than traditional regression model.

Keywords: railway load carrying, neural network, modelling transport, transportation

Procedia PDF Downloads 117
18113 Ordinal Regression with Fenton-Wilkinson Order Statistics: A Case Study of an Orienteering Race

Authors: Joonas Pääkkönen

Abstract:

In sports, individuals and teams are typically interested in final rankings. Final results, such as times or distances, dictate these rankings, also known as places. Places can be further associated with ordered random variables, commonly referred to as order statistics. In this work, we introduce a simple, yet accurate order statistical ordinal regression function that predicts relay race places with changeover-times. We call this function the Fenton-Wilkinson Order Statistics model. This model is built on the following educated assumption: individual leg-times follow log-normal distributions. Moreover, our key idea is to utilize Fenton-Wilkinson approximations of changeover-times alongside an estimator for the total number of teams as in the notorious German tank problem. This original place regression function is sigmoidal and thus correctly predicts the existence of a small number of elite teams that significantly outperform the rest of the teams. Our model also describes how place increases linearly with changeover-time at the inflection point of the log-normal distribution function. With real-world data from Jukola 2019, a massive orienteering relay race, the model is shown to be highly accurate even when the size of the training set is only 5% of the whole data set. Numerical results also show that our model exhibits smaller place prediction root-mean-square-errors than linear regression, mord regression and Gaussian process regression.

Keywords: Fenton-Wilkinson approximation, German tank problem, log-normal distribution, order statistics, ordinal regression, orienteering, sports analytics, sports modeling

Procedia PDF Downloads 94