Search results for: geographically-weighted regression
3131 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models
Authors: Jihye Jeon
Abstract:
This paper analyzes the conceptual framework of three statistical methods, multiple regression, path analysis, and structural equation models. When establishing research model of the statistical modeling of complex social phenomenon, it is important to know the strengths and limitations of three statistical models. This study explored the character, strength, and limitation of each modeling and suggested some strategies for accurate explaining or predicting the causal relationships among variables. Especially, on the studying of depression or mental health, the common mistakes of research modeling were discussed.Keywords: multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon
Procedia PDF Downloads 6533130 QSRR Analysis of 17-Picolyl and 17-Picolinylidene Androstane Derivatives Based on Partial Least Squares and Principal Component Regression
Authors: Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Lidija Jevrić, Evgenija Djurendić, Jovana Ajduković
Abstract:
There are several methods for determination of the lipophilicity of biologically active compounds, however chromatography has been shown as a very suitable method for this purpose. Chromatographic (C18-RP-HPLC) analysis of a series of 24 17-picolyl and 17-picolinylidene androstane derivatives was carried out. The obtained retention indices (logk, methanol (90%) / water (10%)) were correlated with calculated physicochemical and lipophilicity descriptors. The QSRR analysis was carried out applying principal component regression (PCR) and partial least squares regression (PLS). The PCR and PLS model were selected on the basis of the highest variance and the lowest root mean square error of cross-validation. The obtained PCR and PLS model successfully correlate the calculated molecular descriptors with logk parameter indicating the significance of the lipophilicity of compounds in chromatographic process. On the basis of the obtained results it can be concluded that the obtained logk parameters of the analyzed androstane derivatives can be considered as their chromatographic lipophilicity. These results are the part of the project No. 114-451-347/2015-02, financially supported by the Provincial Secretariat for Science and Technological Development of Vojvodina and CMST COST Action CM1105.Keywords: androstane derivatives, chromatography, molecular structure, principal component regression, partial least squares regression
Procedia PDF Downloads 2763129 Detecting Earnings Management via Statistical and Neural Networks Techniques
Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie
Abstract:
Predicting earnings management is vital for the capital market participants, financial analysts and managers. The aim of this research is attempting to respond to this query: Is there a significant difference between the regression model and neural networks’ models in predicting earnings management, and which one leads to a superior prediction of it? In approaching this question, a Linear Regression (LR) model was compared with two neural networks including Multi-Layer Perceptron (MLP), and Generalized Regression Neural Network (GRNN). The population of this study includes 94 listed companies in Tehran Stock Exchange (TSE) market from 2003 to 2011. After the results of all models were acquired, ANOVA was exerted to test the hypotheses. In general, the summary of statistical results showed that the precision of GRNN did not exhibit a significant difference in comparison with MLP. In addition, the mean square error of the MLP and GRNN showed a significant difference with the multi variable LR model. These findings support the notion of nonlinear behavior of the earnings management. Therefore, it is more appropriate for capital market participants to analyze earnings management based upon neural networks techniques, and not to adopt linear regression models.Keywords: earnings management, generalized linear regression, neural networks multi-layer perceptron, Tehran stock exchange
Procedia PDF Downloads 4213128 Minimizing the Impact of Covariate Detection Limit in Logistic Regression
Authors: Shahadut Hossain, Jacek Wesolowski, Zahirul Hoque
Abstract:
In many epidemiological and environmental studies covariate measurements are subject to the detection limit. In most applications, covariate measurements are usually truncated from below which is known as left-truncation. Because the measuring device, which we use to measure the covariate, fails to detect values falling below the certain threshold. In regression analyses, it causes inflated bias and inaccurate mean squared error (MSE) to the estimators. This paper suggests a response-based regression calibration method to correct the deleterious impact introduced by the covariate detection limit in the estimators of the parameters of simple logistic regression model. Compared to the maximum likelihood method, the proposed method is computationally simpler, and hence easier to implement. It is robust to the violation of distributional assumption about the covariate of interest. In producing correct inference, the performance of the proposed method compared to the other competing methods has been investigated through extensive simulations. A real-life application of the method is also shown using data from a population-based case-control study of non-Hodgkin lymphoma.Keywords: environmental exposure, detection limit, left truncation, bias, ad-hoc substitution
Procedia PDF Downloads 2363127 Comparative Study od Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast
Authors: Nabilah Filzah Mohd Radzuan, Andi Putra, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan
Abstract:
Precipitation forecast is important to avoid natural disaster incident which can cause losses in the involved area. This paper reviews three techniques logistic regression, decision tree, and random forest which are used in making precipitation forecast. These combination techniques through the vector auto-regression (VAR) model help in finding the advantages and strengths of each technique in the forecast process. The data-set contains variables of the rain’s domain. Adaptation of artificial intelligence techniques involved in rain domain enables the forecast process to be easier and systematic for precipitation forecast.Keywords: logistic regression, decisions tree, random forest, VAR model
Procedia PDF Downloads 4463126 A Study of User Awareness and Attitudes Towards Civil-ID Authentication in Oman’s Electronic Services
Authors: Raya Al Khayari, Rasha Al Jassim, Muna Al Balushi, Fatma Al Moqbali, Said El Hajjar
Abstract:
This study utilizes linear regression analysis to investigate the correlation between user account passwords and the probability of civil ID exposure, offering statistical insights into civil ID security. The study employs multiple linear regression (MLR) analysis to further investigate the elements that influence consumers’ views of civil ID security. This aims to increase awareness and improve preventive measures. The results obtained from the MLR analysis provide a thorough comprehension and can guide specific educational and awareness campaigns aimed at promoting improved security procedures. In summary, the study’s results offer significant insights for improving existing security measures and developing more efficient tactics to reduce risks related to civil ID security in Oman. By identifying key factors that impact consumers’ perceptions, organizations can tailor their strategies to address vulnerabilities effectively. Additionally, the findings can inform policymakers on potential regulatory changes to enhance civil ID security in the country.Keywords: civil-id disclosure, awareness, linear regression, multiple regression
Procedia PDF Downloads 573125 A Research on Inference from Multiple Distance Variables in Hedonic Regression Focus on Three Variables
Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro
Abstract:
In urban context, urban nodes such as amenity or hazard will certainly affect house price, while classic hedonic analysis will employ distance variables measured from each urban nodes. However, effects from distances to facilities on house prices generally do not represent the true price of the property. Distance variables measured on the same surface are suffering a problem called multicollinearity, which is usually presented as magnitude variance and mean value in regression, errors caused by instability. In this paper, we provided a theoretical framework to identify and gather the data with less bias, and also provided specific sampling method on locating the sample region to avoid the spatial multicollinerity problem in three distance variable’s case.Keywords: hedonic regression, urban node, distance variables, multicollinerity, collinearity
Procedia PDF Downloads 4653124 Policy Implications of Demographic Impacts on COVID-19, Pneumonia, and Influenza Mortality: A Multivariable Regression Approach to Death Toll Reduction
Authors: Saiakhil Chilaka
Abstract:
Understanding the demographic factors that influence mortality from respiratory diseases like COVID-19, pneumonia, and influenza is crucial for informing public health policy. This study utilizes multivariable regression models to assess the relationship between state, sex, and age group on deaths from these diseases using U.S. data from 2020 to 2023. The analysis reveals that age and sex play significant roles in mortality, while state-level variations are minimal. Although the model’s low R-squared values indicate that additional factors are at play, this paper discusses how these findings, in light of recent research, can inform future public health policy, resource allocation, and intervention strategies.Keywords: COVID-19, multivariable regression, public policy, data science
Procedia PDF Downloads 213123 Urban Energy Demand Modelling: Spatial Analysis Approach
Authors: Hung-Chu Chen, Han Qi, Bauke de Vries
Abstract:
Energy consumption in the urban environment has attracted numerous researches in recent decades. However, it is comparatively rare to find literary works which investigated 3D spatial analysis of urban energy demand modelling. In order to analyze the spatial correlation between urban morphology and energy demand comprehensively, this paper investigates their relation by using the spatial regression tool. In addition, the spatial regression tool which is applied in this paper is ordinary least squares regression (OLS) and geographically weighted regression (GWR) model. Normalized Difference Built-up Index (NDBI), Normalized Difference Vegetation Index (NDVI), and building volume are explainers of urban morphology, which act as independent variables of Energy-land use (E-L) model. NDBI and NDVI are used as the index to describe five types of land use: urban area (U), open space (O), artificial green area (G), natural green area (V), and water body (W). Accordingly, annual electricity, gas demand and energy demand are dependent variables of the E-L model. Based on the analytical result of E-L model relation, it revealed that energy demand and urban morphology are closely connected and the possible causes and practical use are discussed. Besides, the spatial analysis methods of OLS and GWR are compared.Keywords: energy demand model, geographically weighted regression, normalized difference built-up index, normalized difference vegetation index, spatial statistics
Procedia PDF Downloads 1483122 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines
Authors: Arun Goel
Abstract:
The present paper attempts to investigate the prediction of air entrainment rate and aeration efficiency of a free over-fall jets issuing from a triangular sharp crested weir by using regression based modelling. The empirical equations, support vector machine (polynomial and radial basis function) models and the linear regression techniques were applied on the triangular sharp crested weirs relating the air entrainment rate and the aeration efficiency to the input parameters namely drop height, discharge, and vertex angle. It was observed that there exists a good agreement between the measured values and the values obtained using empirical equations, support vector machine (Polynomial and rbf) models, and the linear regression techniques. The test results demonstrated that the SVM based (Poly & rbf) model also provided acceptable prediction of the measured values with reasonable accuracy along with empirical equations and linear regression techniques in modelling the air entrainment rate and the aeration efficiency of a free over-fall jets issuing from triangular sharp crested weir. Further sensitivity analysis has also been performed to study the impact of input parameter on the output in terms of air entrainment rate and aeration efficiency.Keywords: air entrainment rate, dissolved oxygen, weir, SVM, regression
Procedia PDF Downloads 4363121 Use of Regression Analysis in Determining the Length of Plastic Hinge in Reinforced Concrete Columns
Authors: Mehmet Alpaslan Köroğlu, Musa Hakan Arslan, Muslu Kazım Körez
Abstract:
Basic objective of this study is to create a regression analysis method that can estimate the length of a plastic hinge which is an important design parameter, by making use of the outcomes of (lateral load-lateral displacement hysteretic curves) the experimental studies conducted for the reinforced square concrete columns. For this aim, 170 different square reinforced concrete column tests results have been collected from the existing literature. The parameters which are thought affecting the plastic hinge length such as cross-section properties, features of material used, axial loading level, confinement of the column, longitudinal reinforcement bars in the columns etc. have been obtained from these 170 different square reinforced concrete column tests. In the study, when determining the length of plastic hinge, using the experimental test results, a regression analysis have been separately tested and compared with each other. In addition, the outcome of mentioned methods on determination of plastic hinge length of the reinforced concrete columns has been compared to other methods available in the literature.Keywords: columns, plastic hinge length, regression analysis, reinforced concrete
Procedia PDF Downloads 4793120 Measurement Errors and Misclassifications in Covariates in Logistic Regression: Bayesian Adjustment of Main and Interaction Effects and the Sample Size Implications
Authors: Shahadut Hossain
Abstract:
Measurement errors in continuous covariates and/or misclassifications in categorical covariates are common in epidemiological studies. Regression analysis ignoring such mismeasurements seriously biases the estimated main and interaction effects of covariates on the outcome of interest. Thus, adjustments for such mismeasurements are necessary. In this research, we propose a Bayesian parametric framework for eliminating deleterious impacts of covariate mismeasurements in logistic regression. The proposed adjustment method is unified and thus can be applied to any generalized linear and non-linear regression models. Furthermore, adjustment for covariate mismeasurements requires validation data usually in the form of either gold standard measurements or replicates of the mismeasured covariates on a subset of the study population. Initial investigation shows that adequacy of such adjustment depends on the sizes of main and validation samples, especially when prevalences of the categorical covariates are low. Thus, we investigate the impact of main and validation sample sizes on the adjusted estimates, and provide a general guideline about these sample sizes based on simulation studies.Keywords: measurement errors, misclassification, mismeasurement, validation sample, Bayesian adjustment
Procedia PDF Downloads 4083119 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents
Authors: M. Ouassaf, S. Belaid
Abstract:
A series of quinoline derivatives with antimalarial activity were subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were implemented using multiple regression linear MLR, a regression partial least squares (PLS), nonlinear regression (MNLR), to see which descriptors are closely related to the activity biologic. We relied on a principal component analysis (PCA). Based on our results, a comparison of the quality of, MLR, PLS, and MNLR models shows that the MNLR (R = 0.914 and R² = 0.835, RCV= 0.853) models have substantially better predictive capability because the MNLR approach gives better results than MLR (R = 0.835 and R² = 0,752, RCV=0.601)), PLS (R = 0.742 and R² = 0.552, RCV=0.550) The model of MNLR gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. The obtained results suggested that our proposed model MNLR may be useful to predict the biological activity of derivatives of quinoline.Keywords: antimalarial, quinoline, QSAR, PCA, MLR , MNLR, MLR
Procedia PDF Downloads 1563118 Agile Software Effort Estimation Using Regression Techniques
Authors: Mikiyas Adugna
Abstract:
Effort estimation is among the activities carried out in software development processes. An accurate model of estimation leads to project success. The method of agile effort estimation is a complex task because of the dynamic nature of software development. Researchers are still conducting studies on agile effort estimation to enhance prediction accuracy. Due to these reasons, we investigated and proposed a model on LASSO and Elastic Net regression to enhance estimation accuracy. The proposed model has major components: preprocessing, train-test split, training with default parameters, and cross-validation. During the preprocessing phase, the entire dataset is normalized. After normalization, a train-test split is performed on the dataset, setting training at 80% and testing set to 20%. We chose two different phases for training the two algorithms (Elastic Net and LASSO) regression following the train-test-split. In the first phase, the two algorithms are trained using their default parameters and evaluated on the testing data. In the second phase, the grid search technique (the grid is used to search for tuning and select optimum parameters) and 5-fold cross-validation to get the final trained model. Finally, the final trained model is evaluated using the testing set. The experimental work is applied to the agile story point dataset of 21 software projects collected from six firms. The results show that both Elastic Net and LASSO regression outperformed the compared ones. Compared to the proposed algorithms, LASSO regression achieved better predictive performance and has acquired PRED (8%) and PRED (25%) results of 100.0, MMRE of 0.0491, MMER of 0.0551, MdMRE of 0.0593, MdMER of 0.063, and MSE of 0.0007. The result implies LASSO regression algorithm trained model is the most acceptable, and higher estimation performance exists in the literature.Keywords: agile software development, effort estimation, elastic net regression, LASSO
Procedia PDF Downloads 713117 Robustified Asymmetric Logistic Regression Model for Global Fish Stock Assessment
Authors: Osamu Komori, Shinto Eguchi, Hiroshi Okamura, Momoko Ichinokawa
Abstract:
The long time-series data on population assessments are essential for global ecosystem assessment because the temporal change of biomass in such a database reflects the status of global ecosystem properly. However, the available assessment data usually have limited sample sizes and the ratio of populations with low abundance of biomass (collapsed) to those with high abundance (non-collapsed) is highly imbalanced. To allow for the imbalance and uncertainty involved in the ecological data, we propose a binary regression model with mixed effects for inferring ecosystem status through an asymmetric logistic model. In the estimation equation, we observe that the weights for the non-collapsed populations are relatively reduced, which in turn puts more importance on the small number of observations of collapsed populations. Moreover, we extend the asymmetric logistic regression model using propensity score to allow for the sample biases observed in the labeled and unlabeled datasets. It robustified the estimation procedure and improved the model fitting.Keywords: double robust estimation, ecological binary data, mixed effect logistic regression model, propensity score
Procedia PDF Downloads 2663116 Urban-Rural Inequality in Mexico after Nafta: A Quantile Regression Analysis
Authors: Rene Valdiviezo-Issa
Abstract:
In this paper, we use Mexico’s Households Income and Expenditures (ENIGH) survey to explain the behaviour that the urban-rural expenditure gap has had since Mexico’s incorporation to the North American Free Trade Agreement (NAFTA) in 1994 and we compare it with the latest available survey, which took place in 2014. We use real trimestral expenditure per capita (RTEPC) as the measure of welfare. We use quantile regressions and a quantile regression decomposition to describe the gap between urban and rural distributions of log RTEPC. We discover that the decrease in the difference between the urban and rural distributions of log RTEPC, or inequality, is motivated because of a deprivation of the urban areas, in very specific characteristics, rather than an improvement of the urban areas. When using the decomposition we observe that the gap is primarily brought about because differences in returns to covariates between the urban and rural areas.Keywords: quantile regression, urban-rural inequality, inequality in Mexico, income decompositon
Procedia PDF Downloads 2823115 Developing Variable Repetitive Group Sampling Control Chart Using Regression Estimator
Authors: Liaquat Ahmad, Muhammad Aslam, Muhammad Azam
Abstract:
In this article, we propose a control chart based on repetitive group sampling scheme for the location parameter. This charting scheme is based on the regression estimator; an estimator that capitalize the relationship between the variables of interest to provide more sensitive control than the commonly used individual variables. The control limit coefficients have been estimated for different sample sizes for less and highly correlated variables. The monitoring of the production process is constructed by adopting the procedure of the Shewhart’s x-bar control chart. Its performance is verified by the average run length calculations when the shift occurs in the average value of the estimator. It has been observed that the less correlated variables have rapid false alarm rate.Keywords: average run length, control charts, process shift, regression estimators, repetitive group sampling
Procedia PDF Downloads 5653114 BART Matching Method: Using Bayesian Additive Regression Tree for Data Matching
Authors: Gianna Zou
Abstract:
Propensity score matching (PSM), introduced by Paul R. Rosenbaum and Donald Rubin in 1983, is a popular statistical matching technique which tries to estimate the treatment effects by taking into account covariates that could impact the efficacy of study medication in clinical trials. PSM can be used to reduce the bias due to confounding variables. However, PSM assumes that the response values are normally distributed. In some cases, this assumption may not be held. In this paper, a machine learning method - Bayesian Additive Regression Tree (BART), is used as a more robust method of matching. BART can work well when models are misspecified since it can be used to model heterogeneous treatment effects. Moreover, it has the capability to handle non-linear main effects and multiway interactions. In this research, a BART Matching Method (BMM) is proposed to provide a more reliable matching method over PSM. By comparing the analysis results from PSM and BMM, BMM can perform well and has better prediction capability when the response values are not normally distributed.Keywords: BART, Bayesian, matching, regression
Procedia PDF Downloads 1473113 The Relationship Between Hourly Compensation and Unemployment Rate Using the Panel Data Regression Analysis
Authors: S. K. Ashiquer Rahman
Abstract:
the paper concentrations on the importance of hourly compensation, emphasizing the significance of the unemployment rate. There are the two most important factors of a nation these are its unemployment rate and hourly compensation. These are not merely statistics but they have profound effects on individual, families, and the economy. They are inversely related to one another. When we consider the unemployment rate that will probably decline as hourly compensations in manufacturing rise. But when we reduced the unemployment rates and increased job prospects could result from higher compensation. That’s why, the increased hourly compensation in the manufacturing sector that could have a favorable effect on job changing issues. Moreover, the relationship between hourly compensation and unemployment is complex and influenced by broader economic factors. In this paper, we use panel data regression models to evaluate the expected link between hourly compensation and unemployment rate in order to determine the effect of hourly compensation on unemployment rate. We estimate the fixed effects model, evaluate the error components, and determine which model (the FEM or ECM) is better by pooling all 60 observations. We then analysis and review the data by comparing 3 several countries (United States, Canada and the United Kingdom) using panel data regression models. Finally, we provide result, analysis and a summary of the extensive research on how the hourly compensation effects on the unemployment rate. Additionally, this paper offers relevant and useful informational to help the government and academic community use an econometrics and social approach to lessen on the effect of the hourly compensation on Unemployment rate to eliminate the problem.Keywords: hourly compensation, Unemployment rate, panel data regression models, dummy variables, random effects model, fixed effects model, the linear regression model
Procedia PDF Downloads 813112 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling
Authors: Florin Leon, Silvia Curteanu
Abstract:
Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.Keywords: batch bulk methyl methacrylate polymerization, adaptive sampling, machine learning, large margin nearest neighbor regression
Procedia PDF Downloads 3043111 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography
Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević
Abstract:
This study considers the selection of the most suitable in silico molecular descriptors that could be used for s-triazine pesticides characterization. Suitable descriptors among topological, geometrical and physicochemical are used for quantitative structure-retention relationships (QSRR) model establishment. Established models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, MLR models were established avoiding multicollinearity among the selected molecular descriptors. Statistical quality of established models was evaluated by standard and cross-validation statistical parameters. For detection of similarity or dissimilarity among investigated s-triazine pesticides and their classification, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar grouping. This study is financially supported by COST action TD1305.Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis
Procedia PDF Downloads 3933110 Support Vector Regression Combined with Different Optimization Algorithms to Predict Global Solar Radiation on Horizontal Surfaces in Algeria
Authors: Laidi Maamar, Achwak Madani, Abdellah El Ahdj Abdellah
Abstract:
The aim of this work is to use Support Vector regression (SVR) combined with dragonfly, firefly, Bee Colony and particle swarm Optimization algorithm to predict global solar radiation on horizontal surfaces in some cities in Algeria. Combining these optimization algorithms with SVR aims principally to enhance accuracy by fine-tuning the parameters, speeding up the convergence of the SVR model, and exploring a larger search space efficiently; these parameters are the regularization parameter (C), kernel parameters, and epsilon parameter. By doing so, the aim is to improve the generalization and predictive accuracy of the SVR model. Overall, the aim is to leverage the strengths of both SVR and optimization algorithms to create a more powerful and effective regression model for various cities and under different climate conditions. Results demonstrate close agreement between predicted and measured data in terms of different metrics. In summary, SVM has proven to be a valuable tool in modeling global solar radiation, offering accurate predictions and demonstrating versatility when combined with other algorithms or used in hybrid forecasting models.Keywords: support vector regression (SVR), optimization algorithms, global solar radiation prediction, hybrid forecasting models
Procedia PDF Downloads 353109 Non-Linear Regression Modeling for Composite Distributions
Authors: Mostafa Aminzadeh, Min Deng
Abstract:
Modeling loss data is an important part of actuarial science. Actuaries use models to predict future losses and manage financial risk, which can be beneficial for marketing purposes. In the insurance industry, small claims happen frequently while large claims are rare. Traditional distributions such as Normal, Exponential, and inverse-Gaussian are not suitable for describing insurance data, which often show skewness and fat tails. Several authors have studied classical and Bayesian inference for parameters of composite distributions, such as Exponential-Pareto, Weibull-Pareto, and Inverse Gamma-Pareto. These models separate small to moderate losses from large losses using a threshold parameter. This research introduces a computational approach using a nonlinear regression model for loss data that relies on multiple predictors. Simulation studies were conducted to assess the accuracy of the proposed estimation method. The simulations confirmed that the proposed method provides precise estimates for regression parameters. It's important to note that this approach can be applied to datasets if goodness-of-fit tests confirm that the composite distribution under study fits the data well. To demonstrate the computations, a real data set from the insurance industry is analyzed. A Mathematica code uses the Fisher information algorithm as an iteration method to obtain the maximum likelihood estimation (MLE) of regression parameters.Keywords: maximum likelihood estimation, fisher scoring method, non-linear regression models, composite distributions
Procedia PDF Downloads 333108 Statistic Regression and Open Data Approach for Identifying Economic Indicators That Influence e-Commerce
Authors: Apollinaire Barme, Simon Tamayo, Arthur Gaudron
Abstract:
This paper presents a statistical approach to identify explanatory variables linearly related to e-commerce sales. The proposed methodology allows specifying a regression model in order to quantify the relevance between openly available data (economic and demographic) and national e-commerce sales. The proposed methodology consists in collecting data, preselecting input variables, performing regressions for choosing variables and models, testing and validating. The usefulness of the proposed approach is twofold: on the one hand, it allows identifying the variables that influence e- commerce sales with an accessible approach. And on the other hand, it can be used to model future sales from the input variables. Results show that e-commerce is linearly dependent on 11 economic and demographic indicators.Keywords: e-commerce, statistical modeling, regression, empirical research
Procedia PDF Downloads 2263107 Support Vector Regression for Retrieval of Soil Moisture Using Bistatic Scatterometer Data at X-Band
Authors: Dileep Kumar Gupta, Rajendra Prasad, Pradeep Kumar, Varun Narayan Mishra, Ajeet Kumar Vishwakarma, Prashant K. Srivastava
Abstract:
An approach was evaluated for the retrieval of soil moisture of bare soil surface using bistatic scatterometer data in the angular range of 200 to 700 at VV- and HH- polarization. The microwave data was acquired by specially designed X-band (10 GHz) bistatic scatterometer. The linear regression analysis was done between scattering coefficients and soil moisture content to select the suitable incidence angle for retrieval of soil moisture content. The 250 incidence angle was found more suitable. The support vector regression analysis was used to approximate the function described by the input-output relationship between the scattering coefficient and corresponding measured values of the soil moisture content. The performance of support vector regression algorithm was evaluated by comparing the observed and the estimated soil moisture content by statistical performance indices %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE). The values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 2.9451, 1.0986, and 0.9214, respectively at HH-polarization. At VV- polarization, the values of %Bias, root mean squared error (RMSE) and Nash-Sutcliffe Efficiency (NSE) were found 3.6186, 0.9373, and 0.9428, respectively.Keywords: bistatic scatterometer, soil moisture, support vector regression, RMSE, %Bias, NSE
Procedia PDF Downloads 4283106 A Comparative Analysis of Machine Learning Techniques for PM10 Forecasting in Vilnius
Authors: Mina Adel Shokry Fahim, Jūratė Sužiedelytė Visockienė
Abstract:
With the growing concern over air pollution (AP), it is clear that this has gained more prominence than ever before. The level of consciousness has increased and a sense of knowledge now has to be forwarded as a duty by those enlightened enough to disseminate it to others. This realisation often comes after an understanding of how poor air quality indices (AQI) damage human health. The study focuses on assessing air pollution prediction models specifically for Lithuania, addressing a substantial need for empirical research within the region. Concentrating on Vilnius, it specifically examines particulate matter concentrations 10 micrometers or less in diameter (PM10). Utilizing Gaussian Process Regression (GPR) and Regression Tree Ensemble, and Regression Tree methodologies, predictive forecasting models are validated and tested using hourly data from January 2020 to December 2022. The study explores the classification of AP data into anthropogenic and natural sources, the impact of AP on human health, and its connection to cardiovascular diseases. The study revealed varying levels of accuracy among the models, with GPR achieving the highest accuracy, indicated by an RMSE of 4.14 in validation and 3.89 in testing.Keywords: air pollution, anthropogenic and natural sources, machine learning, Gaussian process regression, tree ensemble, forecasting models, particulate matter
Procedia PDF Downloads 533105 Forecasting Equity Premium Out-of-Sample with Sophisticated Regression Training Techniques
Authors: Jonathan Iworiso
Abstract:
Forecasting the equity premium out-of-sample is a major concern to researchers in finance and emerging markets. The quest for a superior model that can forecast the equity premium with significant economic gains has resulted in several controversies on the choice of variables and suitable techniques among scholars. This research focuses mainly on the application of Regression Training (RT) techniques to forecast monthly equity premium out-of-sample recursively with an expanding window method. A broad category of sophisticated regression models involving model complexity was employed. The RT models include Ridge, Forward-Backward (FOBA) Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), Relaxed LASSO, Elastic Net, and Least Angle Regression were trained and used to forecast the equity premium out-of-sample. In this study, the empirical investigation of the RT models demonstrates significant evidence of equity premium predictability both statistically and economically relative to the benchmark historical average, delivering significant utility gains. They seek to provide meaningful economic information on mean-variance portfolio investment for investors who are timing the market to earn future gains at minimal risk. Thus, the forecasting models appeared to guarantee an investor in a market setting who optimally reallocates a monthly portfolio between equities and risk-free treasury bills using equity premium forecasts at minimal risk.Keywords: regression training, out-of-sample forecasts, expanding window, statistical predictability, economic significance, utility gains
Procedia PDF Downloads 1073104 Self-Image of Police Officers
Authors: Leo Carlo B. Rondina
Abstract:
Self-image is an important factor to improve the self-esteem of the personnel. The purpose of the study is to determine the self-image of the police. The respondents were the 503 policemen assigned in different Police Station in Davao City, and they were chosen with the used of random sampling. With the used of Exploratory Factor Analysis (EFA), latent construct variables of police image were identified as follows; professionalism, obedience, morality and justice and fairness. Further, ordinal regression indicates statistical characteristics on ages 21-40 which means the age of the respondent statistically improves self-image.Keywords: police image, exploratory factor analysis, ordinal regression, Galatea effect
Procedia PDF Downloads 2883103 Regression Analysis of Travel Indicators and Public Transport Usage in Urban Areas
Authors: Mehdi Moeinaddini, Zohreh Asadi-Shekari, Muhammad Zaly Shah, Amran Hamzah
Abstract:
Currently, planners try to have more green travel options to decrease economic, social and environmental problems. Therefore, this study tries to find significant urban travel factors to be used to increase the usage of alternative urban travel modes. This paper attempts to identify the relationship between prominent urban mobility indicators and daily trips by public transport in 30 cities from various parts of the world. Different travel modes, infrastructures and cost indicators were evaluated in this research as mobility indicators. The results of multi-linear regression analysis indicate that there is a significant relationship between mobility indicators and the daily usage of public transport.Keywords: green travel modes, urban travel indicators, daily trips by public transport, multi-linear regression analysis
Procedia PDF Downloads 5493102 Development of Generalized Correlation for Liquid Thermal Conductivity of N-Alkane and Olefin
Authors: A. Ishag Mohamed, A. A. Rabah
Abstract:
The objective of this research is to develop a generalized correlation for the prediction of thermal conductivity of n-Alkanes and Alkenes. There is a minority of research and lack of correlation for thermal conductivity of liquids in the open literature. The available experimental data are collected covering the groups of n-Alkanes and Alkenes.The data were assumed to correlate to temperature using Filippov correlation. Nonparametric regression of Grace Algorithm was used to develop the generalized correlation model. A spread sheet program based on Microsoft Excel was used to plot and calculate the value of the coefficients. The results obtained were compared with the data that found in Perry's Chemical Engineering Hand Book. The experimental data correlated to the temperature ranged "between" 273.15 to 673.15 K, with R2 = 0.99.The developed correlation reproduced experimental data that which were not included in regression with absolute average percent deviation (AAPD) of less than 7 %. Thus the spread sheet was quite accurate which produces reliable data.Keywords: N-Alkanes, N-Alkenes, nonparametric, regression
Procedia PDF Downloads 654