Search results for: linear regression estimation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 7480

Search results for: linear regression estimation

7390 Sensor Fault-Tolerant Model Predictive Control for Linear Parameter Varying Systems

Authors: Yushuai Wang, Feng Xu, Junbo Tan, Xueqian Wang, Bin Liang

Abstract:

In this paper, a sensor fault-tolerant control (FTC) scheme using robust model predictive control (RMPC) and set theoretic fault detection and isolation (FDI) is extended to linear parameter varying (LPV) systems. First, a group of set-valued observers are designed for passive fault detection (FD) and the observer gains are obtained through minimizing the size of invariant set of state estimation-error dynamics. Second, an input set for fault isolation (FI) is designed offline through set theory for actively isolating faults after FD. Third, an RMPC controller based on state estimation for LPV systems is designed to control the system in the presence of disturbance and measurement noise and tolerate faults. Besides, an FTC algorithm is proposed to maintain the plant operate in the corresponding mode when the fault occurs. Finally, a numerical example is used to show the effectiveness of the proposed results.

Keywords: fault detection, linear parameter varying, model predictive control, set theory

Procedia PDF Downloads 252
7389 Competition between Regression Technique and Statistical Learning Models for Predicting Credit Risk Management

Authors: Chokri Slim

Abstract:

The objective of this research is attempting to respond to this question: Is there a significant difference between the regression model and statistical learning models in predicting credit risk management? A Multiple Linear Regression (MLR) model was compared with neural networks including Multi-Layer Perceptron (MLP), and a Support vector regression (SVR). The population of this study includes 50 listed Banks in Tunis Stock Exchange (TSE) market from 2000 to 2016. Firstly, we show the factors that have significant effect on the quality of loan portfolios of banks in Tunisia. Secondly, it attempts to establish that the systematic use of objective techniques and methods designed to apprehend and assess risk when considering applications for granting credit, has a positive effect on the quality of loan portfolios of banks and their future collectability. Finally, we will try to show that the bank governance has an impact on the choice of methods and techniques for analyzing and measuring the risks inherent in the banking business, including the risk of non-repayment. The results of empirical tests confirm our claims.

Keywords: credit risk management, multiple linear regression, principal components analysis, artificial neural networks, support vector machines

Procedia PDF Downloads 150
7388 Regression Analysis of Travel Indicators and Public Transport Usage in Urban Areas

Authors: Mehdi Moeinaddini, Zohreh Asadi-Shekari, Muhammad Zaly Shah, Amran Hamzah

Abstract:

Currently, planners try to have more green travel options to decrease economic, social and environmental problems. Therefore, this study tries to find significant urban travel factors to be used to increase the usage of alternative urban travel modes. This paper attempts to identify the relationship between prominent urban mobility indicators and daily trips by public transport in 30 cities from various parts of the world. Different travel modes, infrastructures and cost indicators were evaluated in this research as mobility indicators. The results of multi-linear regression analysis indicate that there is a significant relationship between mobility indicators and the daily usage of public transport.

Keywords: green travel modes, urban travel indicators, daily trips by public transport, multi-linear regression analysis

Procedia PDF Downloads 548
7387 Estimation of a Finite Population Mean under Random Non Response Using Improved Nadaraya and Watson Kernel Weights

Authors: Nelson Bii, Christopher Ouma, John Odhiambo

Abstract:

Non-response is a potential source of errors in sample surveys. It introduces bias and large variance in the estimation of finite population parameters. Regression models have been recognized as one of the techniques of reducing bias and variance due to random non-response using auxiliary data. In this study, it is assumed that random non-response occurs in the survey variable in the second stage of cluster sampling, assuming full auxiliary information is available throughout. Auxiliary information is used at the estimation stage via a regression model to address the problem of random non-response. In particular, the auxiliary information is used via an improved Nadaraya-Watson kernel regression technique to compensate for random non-response. The asymptotic bias and mean squared error of the estimator proposed are derived. Besides, a simulation study conducted indicates that the proposed estimator has smaller values of the bias and smaller mean squared error values compared to existing estimators of finite population mean. The proposed estimator is also shown to have tighter confidence interval lengths at a 95% coverage rate. The results obtained in this study are useful, for instance, in choosing efficient estimators of the finite population mean in demographic sample surveys.

Keywords: mean squared error, random non-response, two-stage cluster sampling, confidence interval lengths

Procedia PDF Downloads 137
7386 Exploration and Evaluation of the Effect of Multiple Countermeasures on Road Safety

Authors: Atheer Al-Nuaimi, Harry Evdorides

Abstract:

Every day many people die or get disabled or injured on roads around the world, which necessitates more specific treatments for transportation safety issues. International road assessment program (iRAP) model is one of the comprehensive road safety models which accounting for many factors that affect road safety in a cost-effective way in low and middle income countries. In iRAP model road safety has been divided into five star ratings from 1 star (the lowest level) to 5 star (the highest level). These star ratings are based on star rating score which is calculated by iRAP methodology depending on road attributes, traffic volumes and operating speeds. The outcome of iRAP methodology are the treatments that can be used to improve road safety and reduce fatalities and serious injuries (FSI) numbers. These countermeasures can be used separately as a single countermeasure or mix as multiple countermeasures for a location. There is general agreement that the adequacy of a countermeasure is liable to consistent losses when it is utilized as a part of mix with different countermeasures. That is, accident diminishment appraisals of individual countermeasures cannot be easily added together. The iRAP model philosophy makes utilization of a multiple countermeasure adjustment factors to predict diminishments in the effectiveness of road safety countermeasures when more than one countermeasure is chosen. A multiple countermeasure correction factors are figured for every 100-meter segment and for every accident type. However, restrictions of this methodology incorporate a presumable over-estimation in the predicted crash reduction. This study aims to adjust this correction factor by developing new models to calculate the effect of using multiple countermeasures on the number of fatalities for a location or an entire road. Regression models have been used to establish relationships between crash frequencies and the factors that affect their rates. Multiple linear regression, negative binomial regression, and Poisson regression techniques were used to develop models that can address the effectiveness of using multiple countermeasures. Analyses are conducted using The R Project for Statistical Computing showed that a model developed by negative binomial regression technique could give more reliable results of the predicted number of fatalities after the implementation of road safety multiple countermeasures than the results from iRAP model. The results also showed that the negative binomial regression approach gives more precise results in comparison with multiple linear and Poisson regression techniques because of the overdispersion and standard error issues.

Keywords: international road assessment program, negative binomial, road multiple countermeasures, road safety

Procedia PDF Downloads 240
7385 Stature and Gender Estimation Using Foot Measurements in South Indian Population

Authors: Jagadish Rao Padubidri, Mehak Bhandary, Sowmya J. Rao

Abstract:

Introduction: The significance of the human foot and its measurements in identifying an individual has been proved a lot of times by different studies in different geographical areas and its association to the stature and gender of the individual has been justified by many researches. In our study we have used different foot measurements including the length, width, malleol height and navicular height for establishing its association to stature and gender and to find out its accuracy. The purpose of this study is to show the relation of foot measurements with stature and gender, and to derive Multiple and Logistic regression equations for stature and gender estimation in South Indian population. Materials and Methods: The subjects for this study were 200 South Indian students out of which 100 were females and 100 were males, aged between 18 to 24 years. The data for the present study included the stature, foot length, foot breath, foot malleol height, foot navicular height of both right and left foot. Descriptive statistics, T-test and Pearson correlation coefficients were derived between stature, gender and foot measurements. The stature was estimated from right and left foot measurements for both male and female South Indian population using multiple regression analysis and logistic regression analysis for gender estimation. Results: The means, standard deviation, stature, right and left foot measurements and T-test in male population were higher than in females. LFL (Left foot length) is more than RFL (Right Foot length) in male groups, but in female groups the length of both foot are almost equal [RFL=226.6, LFL=227.1]. There is not much of difference in means of RFW (Right foot width) and LFW (Left foot width) in both the genders. Significant difference were seen in mean values of malleol and navicular height of right and left feet in male gender. No such difference was seen in female subjects. Conclusions: The study has successfully demonstrated the correlation of foot length in stature estimation in all the three study groups in both right and left foot. Next in parameters are Foot width and malleol height in estimating stature among male and female groups. Navicular height of both right and left foot showed poor relationship with stature estimation in both male and female groups. Multiple regression equations for both right and left foot measurements to estimate stature were derived with standard error ranging from 11-12 cm in males and 10-11 cm in females. The SEE was 5.8 when both male and female groups were pooled together. The logistic regression model which was derived to determine gender showed 85% accuracy and 92.5% accuracy using right and left foot measurements respectively. We believe that stature and gender can be estimated with foot measurements in South Indian population.

Keywords: foot length, gender, stature, South Indian

Procedia PDF Downloads 335
7384 Non-Parametric Regression over Its Parametric Couterparts with Large Sample Size

Authors: Jude Opara, Esemokumo Perewarebo Akpos

Abstract:

This paper is on non-parametric linear regression over its parametric counterparts with large sample size. Data set on anthropometric measurement of primary school pupils was taken for the analysis. The study used 50 randomly selected pupils for the study. The set of data was subjected to normality test, and it was discovered that the residuals are not normally distributed (i.e. they do not follow a Gaussian distribution) for the commonly used least squares regression method for fitting an equation into a set of (x,y)-data points using the Anderson-Darling technique. The algorithms for the nonparametric Theil’s regression are stated in this paper as well as its parametric OLS counterpart. The use of a programming language software known as “R Development” was used in this paper. From the analysis, the result showed that there exists a significant relationship between the response and the explanatory variable for both the parametric and non-parametric regression. To know the efficiency of one method over the other, the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) are used, and it is discovered that the nonparametric regression performs better than its parametric regression counterparts due to their lower values in both the AIC and BIC. The study however recommends that future researchers should study a similar work by examining the presence of outliers in the data set, and probably expunge it if detected and re-analyze to compare results.

Keywords: Theil’s regression, Bayesian information criterion, Akaike information criterion, OLS

Procedia PDF Downloads 305
7383 Measurement Errors and Misclassifications in Covariates in Logistic Regression: Bayesian Adjustment of Main and Interaction Effects and the Sample Size Implications

Authors: Shahadut Hossain

Abstract:

Measurement errors in continuous covariates and/or misclassifications in categorical covariates are common in epidemiological studies. Regression analysis ignoring such mismeasurements seriously biases the estimated main and interaction effects of covariates on the outcome of interest. Thus, adjustments for such mismeasurements are necessary. In this research, we propose a Bayesian parametric framework for eliminating deleterious impacts of covariate mismeasurements in logistic regression. The proposed adjustment method is unified and thus can be applied to any generalized linear and non-linear regression models. Furthermore, adjustment for covariate mismeasurements requires validation data usually in the form of either gold standard measurements or replicates of the mismeasured covariates on a subset of the study population. Initial investigation shows that adequacy of such adjustment depends on the sizes of main and validation samples, especially when prevalences of the categorical covariates are low. Thus, we investigate the impact of main and validation sample sizes on the adjusted estimates, and provide a general guideline about these sample sizes based on simulation studies.

Keywords: measurement errors, misclassification, mismeasurement, validation sample, Bayesian adjustment

Procedia PDF Downloads 408
7382 Statistical Model of Water Quality in Estero El Macho, Machala-El Oro

Authors: Rafael Zhindon Almeida

Abstract:

Surface water quality is an important concern for the evaluation and prediction of water quality conditions. The objective of this study is to develop a statistical model that can accurately predict the water quality of the El Macho estuary in the city of Machala, El Oro province. The methodology employed in this study is of a basic type that involves a thorough search for theoretical foundations to improve the understanding of statistical modeling for water quality analysis. The research design is correlational, using a multivariate statistical model involving multiple linear regression and principal component analysis. The results indicate that water quality parameters such as fecal coliforms, biochemical oxygen demand, chemical oxygen demand, iron and dissolved oxygen exceed the allowable limits. The water of the El Macho estuary is determined to be below the required water quality criteria. The multiple linear regression model, based on chemical oxygen demand and total dissolved solids, explains 99.9% of the variance of the dependent variable. In addition, principal component analysis shows that the model has an explanatory power of 86.242%. The study successfully developed a statistical model to evaluate the water quality of the El Macho estuary. The estuary did not meet the water quality criteria, with several parameters exceeding the allowable limits. The multiple linear regression model and principal component analysis provide valuable information on the relationship between the various water quality parameters. The findings of the study emphasize the need for immediate action to improve the water quality of the El Macho estuary to ensure the preservation and protection of this valuable natural resource.

Keywords: statistical modeling, water quality, multiple linear regression, principal components, statistical models

Procedia PDF Downloads 98
7381 Agriculture Yield Prediction Using Predictive Analytic Techniques

Authors: Nagini Sabbineni, Rajini T. V. Kanth, B. V. Kiranmayee

Abstract:

India’s economy primarily depends on agriculture yield growth and their allied agro industry products. The agriculture yield prediction is the toughest task for agricultural departments across the globe. The agriculture yield depends on various factors. Particularly countries like India, majority of agriculture growth depends on rain water, which is highly unpredictable. Agriculture growth depends on different parameters, namely Water, Nitrogen, Weather, Soil characteristics, Crop rotation, Soil moisture, Surface temperature and Rain water etc. In our paper, lot of Explorative Data Analysis is done and various predictive models were designed. Further various regression models like Linear, Multiple Linear, Non-linear models are tested for the effective prediction or the forecast of the agriculture yield for various crops in Andhra Pradesh and Telangana states.

Keywords: agriculture yield growth, agriculture yield prediction, explorative data analysis, predictive models, regression models

Procedia PDF Downloads 313
7380 Count Data Regression Modeling: An Application to Spontaneous Abortion in India

Authors: Prashant Verma, Prafulla K. Swain, K. K. Singh, Mukti Khetan

Abstract:

Objective: In India, around 20,000 women die every year due to abortion-related complications. In the modelling of count variables, there is sometimes a preponderance of zero counts. This article concerns the estimation of various count regression models to predict the average number of spontaneous abortion among women in the Punjab state of India. It also assesses the factors associated with the number of spontaneous abortions. Materials and methods: The study included 27,173 married women of Punjab obtained from the DLHS-4 survey (2012-13). Poisson regression (PR), Negative binomial (NB) regression, zero hurdle negative binomial (ZHNB), and zero-inflated negative binomial (ZINB) models were employed to predict the average number of spontaneous abortions and to identify the determinants affecting the number of spontaneous abortions. Results: Statistical comparisons among four estimation methods revealed that the ZINB model provides the best prediction for the number of spontaneous abortions. Antenatal care (ANC) place, place of residence, total children born to a woman, woman's education and economic status were found to be the most significant factors affecting the occurrence of spontaneous abortion. Conclusions: The study offers a practical demonstration of techniques designed to handle count variables. Statistical comparisons among four estimation models revealed that the ZINB model provided the best prediction for the number of spontaneous abortions and is recommended to be used to predict the number of spontaneous abortions. The study suggests that women receive institutional Antenatal care to attain limited parity. It also advocates promoting higher education among women in Punjab, India.

Keywords: count data, spontaneous abortion, Poisson model, negative binomial model, zero hurdle negative binomial, zero-inflated negative binomial, regression

Procedia PDF Downloads 155
7379 Stability-Indicating High-Performance Thin-Layer Chromatography Method for Estimation of Naftopidil

Authors: P. S. Jain, K. D. Bobade, S. J. Surana

Abstract:

A simple, selective, precise and Stability-indicating High-performance thin-layer chromatographic method for analysis of Naftopidil both in a bulk and in pharmaceutical formulation has been developed and validated. The method employed, HPTLC aluminium plates precoated with silica gel as the stationary phase. The solvent system consisted of hexane: ethyl acetate: glacial acetic acid (4:4:2 v/v). The system was found to give compact spot for Naftopidil (Rf value of 0.43±0.02). Densitometric analysis of Naftopidil was carried out in the absorbance mode at 253 nm. The linear regression analysis data for the calibration plots showed good linear relationship with r2=0.999±0.0001 with respect to peak area in the concentration range 200-1200 ng per spot. The method was validated for precision, recovery and robustness. The limits of detection and quantification were 20.35 and 61.68 ng per spot, respectively. Naftopidil was subjected to acid and alkali hydrolysis, oxidation and thermal degradation. The drug undergoes degradation under acidic, basic, oxidation and thermal conditions. This indicates that the drug is susceptible to acid, base, oxidation and thermal conditions. The degraded product was well resolved from the pure drug with significantly different Rf value. Statistical analysis proves that the method is repeatable, selective and accurate for the estimation of investigated drug. The proposed developed HPTLC method can be applied for identification and quantitative determination of Naftopidil in bulk drug and pharmaceutical formulation.

Keywords: naftopidil, HPTLC, validation, stability, degradation

Procedia PDF Downloads 400
7378 Chemometric Estimation of Phytochemicals Affecting the Antioxidant Potential of Lettuce

Authors: Milica Karadzic, Lidija Jevric, Sanja Podunavac-Kuzmanovic, Strahinja Kovacevic, Aleksandra Tepic-Horecki, Zdravko Sumic

Abstract:

In this paper, the influence of six different phytochemical content (phenols, carotenoids, chlorophyll a, chlorophyll b, chlorophyll a + b and vitamin C) on antioxidant potential of Murai and Levistro lettuce varieties was evaluated. Variable selection was made by generalized pair correlation method (GPCM) as a novel ranking method. This method is used for the discrimination between two variables that almost equal correlate to a dependent variable. Fisher’s conditional exact and McNemar’s test were carried out. Established multiple linear (MLR) models were statistically evaluated. As the best phytochemicals for the antioxidant potential prediction, chlorophyll a, chlorophyll a + b and total carotenoids content stand out. This was confirmed through both GPCM and MLR, predictive ability of obtained MLR can be used for antioxidant potential estimation for similar lettuce samples. This article is based upon work from the project of the Provincial Secretariat for Science and Technological Development of Vojvodina (No. 114-451-347/2015-02).

Keywords: antioxidant activity, generalized pair correlation method, lettuce, regression analysis

Procedia PDF Downloads 387
7377 Copula-Based Estimation of Direct and Indirect Effects in Path Analysis Model

Authors: Alam Ali, Ashok Kumar Pathak

Abstract:

Path analysis is a statistical technique used to evaluate the strength of the direct and indirect effects of variables. One or more structural regression equations are used to estimate a series of parameters in order to find the better fit of data. Sometimes, exogenous variables do not show a significant strength of their direct and indirect effect when the assumption of classical regression (ordinary least squares (OLS)) are violated by the nature of the data. The main motive of this article is to investigate the efficacy of the copula-based regression approach over the classical regression approach and calculate the direct and indirect effects of variables when data violates the OLS assumption and variables are linked through an elliptical copula. We perform this study using a well-organized numerical scheme. Finally, a real data application is also presented to demonstrate the performance of the superiority of the copula approach.

Keywords: path analysis, copula-based regression models, direct and indirect effects, k-fold cross validation technique

Procedia PDF Downloads 71
7376 Derivation of Bathymetry from High-Resolution Satellite Images: Comparison of Empirical Methods through Geographical Error Analysis

Authors: Anusha P. Wijesundara, Dulap I. Rathnayake, Nihal D. Perera

Abstract:

Bathymetric information is fundamental importance to coastal and marine planning and management, nautical navigation, and scientific studies of marine environments. Satellite-derived bathymetry data provide detailed information in areas where conventional sounding data is lacking and conventional surveys are inaccessible. The two empirical approaches of log-linear bathymetric inversion model and non-linear bathymetric inversion model are applied for deriving bathymetry from high-resolution multispectral satellite imagery. This study compares these two approaches by means of geographical error analysis for the site Kankesanturai using WorldView-2 satellite imagery. Based on the Levenberg-Marquardt method calibrated the parameters of non-linear inversion model and the multiple-linear regression model was applied to calibrate the log-linear inversion model. In order to calibrate both models, Single Beam Echo Sounding (SBES) data in this study area were used as reference points. Residuals were calculated as the difference between the derived depth values and the validation echo sounder bathymetry data and the geographical distribution of model residuals was mapped. The spatial autocorrelation was calculated by comparing the performance of the bathymetric models and the results showing the geographic errors for both models. A spatial error model was constructed from the initial bathymetry estimates and the estimates of autocorrelation. This spatial error model is used to generate more reliable estimates of bathymetry by quantifying autocorrelation of model error and incorporating this into an improved regression model. Log-linear model (R²=0.846) performs better than the non- linear model (R²=0.692). Finally, the spatial error models improved bathymetric estimates derived from linear and non-linear models up to R²=0.854 and R²=0.704 respectively. The Root Mean Square Error (RMSE) was calculated for all reference points in various depth ranges. The magnitude of the prediction error increases with depth for both the log-linear and the non-linear inversion models. Overall RMSE for log-linear and the non-linear inversion models were ±1.532 m and ±2.089 m, respectively.

Keywords: log-linear model, multi spectral, residuals, spatial error model

Procedia PDF Downloads 297
7375 Point Estimation for the Type II Generalized Logistic Distribution Based on Progressively Censored Data

Authors: Rana Rimawi, Ayman Baklizi

Abstract:

Skewed distributions are important models that are frequently used in applications. Generalized distributions form a class of skewed distributions and gain widespread use in applications because of their flexibility in data analysis. More specifically, the Generalized Logistic Distribution with its different types has received considerable attention recently. In this study, based on progressively type-II censored data, we will consider point estimation in type II Generalized Logistic Distribution (Type II GLD). We will develop several estimators for its unknown parameters, including maximum likelihood estimators (MLE), Bayes estimators and linear estimators (BLUE). The estimators will be compared using simulation based on the criteria of bias and Mean square error (MSE). An illustrative example of a real data set will be given.

Keywords: point estimation, type II generalized logistic distribution, progressive censoring, maximum likelihood estimation

Procedia PDF Downloads 197
7374 Estimation of Missing Values in Aggregate Level Spatial Data

Authors: Amitha Puranik, V. S. Binu, Seena Biju

Abstract:

Missing data is a common problem in spatial analysis especially at the aggregate level. Missing can either occur in covariate or in response variable or in both in a given location. Many missing data techniques are available to estimate the missing data values but not all of these methods can be applied on spatial data since the data are autocorrelated. Hence there is a need to develop a method that estimates the missing values in both response variable and covariates in spatial data by taking account of the spatial autocorrelation. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) Spatial autocorrelation of the response variable (b) Spatial autocorrelation of covariates and (c) Correlation between covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly account for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates and between a response variable and covariates. The precise estimation of the missing data points in spatial data will result in an increased precision of the estimated effects of independent variables on the response variable in spatial regression analysis.

Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis

Procedia PDF Downloads 382
7373 Phase II Monitoring of First-Order Autocorrelated General Linear Profiles

Authors: Yihua Wang, Yunru Lai

Abstract:

Statistical process control has been successfully applied in a variety of industries. In some applications, the quality of a process or product is better characterized and summarized by a functional relationship between a response variable and one or more explanatory variables. A collection of this type of data is called a profile. Profile monitoring is used to understand and check the stability of this relationship or curve over time. The independent assumption for the error term is commonly used in the existing profile monitoring studies. However, in many applications, the profile data show correlations over time. Therefore, we focus on a general linear regression model with a first-order autocorrelation between profiles in this study. We propose an exponentially weighted moving average charting scheme to monitor this type of profile. The simulation study shows that our proposed methods outperform the existing schemes based on the average run length criterion.

Keywords: autocorrelation, EWMA control chart, general linear regression model, profile monitoring

Procedia PDF Downloads 460
7372 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization for statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers or leverage points. We consider a new robust estimation which is composed of the weighted loss function of the pairwise difference of residuals and the adaptive penalty function regulating the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is robust to leverage points of the predictor variable. Furthermore, the adaptive penalty function gives us good statistical properties in variable selection such as oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in program R. We used an optimal tuning parameter based on the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real data set and contaminated data.

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression

Procedia PDF Downloads 474
7371 Gender Estimation by Means of Quantitative Measurements of Foramen Magnum: An Analysis of CT Head Images

Authors: Thilini Hathurusinghe, Uthpalie Siriwardhana, W. M. Ediri Arachchi, Ranga Thudugala, Indeewari Herath, Gayani Senanayake

Abstract:

The foramen magnum is more prone to protect than other skeletal remains during high impact and severe disruptive injuries. Therefore, it is worthwhile to explore whether these measurements can be used to determine the human gender which is vital in forensic and anthropological studies. The idea was to find out the ability to use quantitative measurements of foramen magnum as an anatomical indicator for human gender estimation and to evaluate the gender-dependent variations of foramen magnum using quantitative measurements. Randomly selected 113 subjects who underwent CT head scans at Sri Jayawardhanapura General Hospital of Sri Lanka within a period of six months, were included in the study. The sample contained 58 males (48.76 ± 14.7 years old) and 55 females (47.04 ±15.9 years old). Maximum length of the foramen magnum (LFM), maximum width of the foramen magnum (WFM), minimum distance between occipital condyles (MnD) and maximum interior distance between occipital condyles (MxID) were measured. Further, AreaT and AreaR were also calculated. The gender was estimated using binomial logistic regression. The mean values of all explanatory variables (LFM, WFM, MnD, MxID, AreaT, and AreaR) were greater among male than female. All explanatory variables except MnD (p=0.669) were statistically significant (p < 0.05). Significant bivariate correlations were demonstrated by AreaT and AreaR with the explanatory variables. The results evidenced that WFM and MxID were the best measurements in predicting gender according to binomial logistic regression. The estimated model was: log (p/1-p) =10.391-0.136×MxID-0.231×WFM, where p is the probability of being a female. The classification accuracy given by the above model was 65.5%. The quantitative measurements of foramen magnum can be used as a reliable anatomical marker for human gender estimation in the Sri Lankan context.

Keywords: foramen magnum, forensic and anthropological studies, gender estimation, logistic regression

Procedia PDF Downloads 151
7370 A Research on Tourism Market Forecast and Its Evaluation

Authors: Min Wei

Abstract:

The traditional prediction methods of the forecast for tourism market are paid more attention to the accuracy of the forecasts, ignoring the results of the feasibility of forecasting and predicting operability, which had made it difficult to predict the results of scientific testing. With the application of Linear Regression Model, this paper attempts to construct a scientific evaluation system for predictive value, both to ensure the accuracy, stability of the predicted value, and to ensure the feasibility of forecasting and predicting the results of operation. The findings show is that a scientific evaluation system can implement the scientific concept of development, the harmonious development of man and nature co-ordinate.

Keywords: linear regression model, tourism market, forecast, tourism economics

Procedia PDF Downloads 332
7369 Mapping of Urban Micro-Climate in Lyon (France) by Integrating Complementary Predictors at Different Scales into Multiple Linear Regression Models

Authors: Lucille Alonso, Florent Renard

Abstract:

The characterizations of urban heat island (UHI) and their interactions with climate change and urban climates are the main research and public health issue, due to the increasing urbanization of the population. These solutions require a better knowledge of the UHI and micro-climate in urban areas, by combining measurements and modelling. This study is part of this topic by evaluating microclimatic conditions in dense urban areas in the Lyon Metropolitan Area (France) using a combination of data traditionally used such as topography, but also from LiDAR (Light Detection And Ranging) data, Landsat 8 satellite observation and Sentinel and ground measurements by bike. These bicycle-dependent weather data collections are used to build the database of the variable to be modelled, the air temperature, over Lyon’s hyper-center. This study aims to model the air temperature, measured during 6 mobile campaigns in Lyon in clear weather, using multiple linear regressions based on 33 explanatory variables. They are of various categories such as meteorological parameters from remote sensing, topographic variables, vegetation indices, the presence of water, humidity, bare soil, buildings, radiation, urban morphology or proximity and density to various land uses (water surfaces, vegetation, bare soil, etc.). The acquisition sources are multiple and come from the Landsat 8 and Sentinel satellites, LiDAR points, and cartographic products downloaded from an open data platform in Greater Lyon. Regarding the presence of low, medium, and high vegetation, the presence of buildings and ground, several buffers close to these factors were tested (5, 10, 20, 25, 50, 100, 200 and 500m). The buffers with the best linear correlations with air temperature for ground are 5m around the measurement points, for low and medium vegetation, and for building 50m and for high vegetation is 100m. The explanatory model of the dependent variable is obtained by multiple linear regression of the remaining explanatory variables (Pearson correlation matrix with a |r| < 0.7 and VIF with < 5) by integrating a stepwise sorting algorithm. Moreover, holdout cross-validation is performed, due to its ability to detect over-fitting of multiple regression, although multiple regression provides internal validation and randomization (80% training, 20% testing). Multiple linear regression explained, on average, 72% of the variance for the study days, with an average RMSE of only 0.20°C. The impact on the model of surface temperature in the estimation of air temperature is the most important variable. Other variables are recurrent such as distance to subway stations, distance to water areas, NDVI, digital elevation model, sky view factor, average vegetation density, or building density. Changing urban morphology influences the city's thermal patterns. The thermal atmosphere in dense urban areas can only be analysed on a microscale to be able to consider the local impact of trees, streets, and buildings. There is currently no network of fixed weather stations sufficiently deployed in central Lyon and most major urban areas. Therefore, it is necessary to use mobile measurements, followed by modelling to characterize the city's multiple thermal environments.

Keywords: air temperature, LIDAR, multiple linear regression, surface temperature, urban heat island

Procedia PDF Downloads 137
7368 Factors Affecting Green Consumption Behaviors of the Urban Residents in Hanoi, Vietnam

Authors: Phan Thi Song Thuong

Abstract:

This paper uses data from a survey on the green consumption behavior of Hanoi residents in October 2022. Data was gathered from a survey conducted in ten districts in the center of Hanoi, with 393 respondents. The hypothesis focuses on understanding the factors that may affect green consumption behavior, such as demographic characteristics, concerns about the environment and health, people living around, self-efficiency, and mass media. A number of methods, such as the T-test, exploratory factor analysis, and a linear regression model, are used to prove the hypotheses. Accordingly, the results show that gender, age, and education level have separate effects on the green consumption behavior of respondents.

Keywords: green consumption, urban residents, environment, sustainable, linear regression

Procedia PDF Downloads 131
7367 A Case Comparative Study of Infant Mortality Rate in North-West Nigeria

Authors: G. I. Onwuka, A. Danbaba, S. U. Gulumbe

Abstract:

This study investigated of Infant Mortality Rate as observed at a general hospital in Kaduna-South, Kaduna State, North West Nigeria. The causes of infant Mortality were examined. The data used for this analysis were collected at the statistics unit of the Hospital. The analysis was carried out on the data using Multiple Linear regression Technique and this showed that there is linear relationship between the dependent variable (death) and the independent variables (malaria, measles, anaemia, and coronary heart disease). The resultant model also revealed that a unit increment in each of these diseases would result to a unit increment in death recorded, 98.7% of the total variation in mortality is explained by the given model. The highest number of mortality was recorded in July, 2005 and the lowest mortality recorded in October, 2009.Recommendations were however made based on the results of the study.

Keywords: infant mortality rate, multiple linear regression, diseases, serial correlation

Procedia PDF Downloads 329
7366 Nonparametric Truncated Spline Regression Model on the Data of Human Development Index in Indonesia

Authors: Kornelius Ronald Demu, Dewi Retno Sari Saputro, Purnami Widyaningsih

Abstract:

Human Development Index (HDI) is a standard measurement for a country's human development. Several factors may have influenced it, such as life expectancy, gross domestic product (GDP) based on the province's annual expenditure, the number of poor people, and the percentage of an illiterate people. The scatter plot between HDI and the influenced factors show that the plot does not follow a specific pattern or form. Therefore, the HDI's data in Indonesia can be applied with a nonparametric regression model. The estimation of the regression curve in the nonparametric regression model is flexible because it follows the shape of the data pattern. One of the nonparametric regression's method is a truncated spline. Truncated spline regression is one of the nonparametric approach, which is a modification of the segmented polynomial functions. The estimator of a truncated spline regression model was affected by the selection of the optimal knots point. Knot points is a focus point of spline truncated functions. The optimal knots point was determined by the minimum value of generalized cross validation (GCV). In this article were applied the data of Human Development Index with a truncated spline nonparametric regression model. The results of this research were obtained the best-truncated spline regression model to the HDI's data in Indonesia with the combination of optimal knots point 5-5-5-4. Life expectancy and the percentage of an illiterate people were the significant factors depend to the HDI in Indonesia. The coefficient of determination is 94.54%. This means the regression model is good enough to applied on the data of HDI in Indonesia.

Keywords: generalized cross validation (GCV), Human Development Index (HDI), knots point, nonparametric regression, truncated spline

Procedia PDF Downloads 338
7365 An Application of Sinc Function to Approximate Quadrature Integrals in Generalized Linear Mixed Models

Authors: Altaf H. Khan, Frank Stenger, Mohammed A. Hussein, Reaz A. Chaudhuri, Sameera Asif

Abstract:

This paper discusses a novel approach to approximate quadrature integrals that arise in the estimation of likelihood parameters for the generalized linear mixed models (GLMM) as well as Bayesian methodology also requires computation of multidimensional integrals with respect to the posterior distributions in which computation are not only tedious and cumbersome rather in some situations impossible to find solutions because of singularities, irregular domains, etc. An attempt has been made in this work to apply Sinc function based quadrature rules to approximate intractable integrals, as there are several advantages of using Sinc based methods, for example: order of convergence is exponential, works very well in the neighborhood of singularities, in general quite stable and provide high accurate and double precisions estimates. The Sinc function based approach seems to be utilized first time in statistical domain to our knowledge, and it's viability and future scopes have been discussed to apply in the estimation of parameters for GLMM models as well as some other statistical areas.

Keywords: generalized linear mixed model, likelihood parameters, qudarature, Sinc function

Procedia PDF Downloads 394
7364 Regression for Doubly Inflated Multivariate Poisson Distributions

Authors: Ishapathik Das, Sumen Sen, N. Rao Chaganty, Pooja Sengupta

Abstract:

Dependent multivariate count data occur in several research studies. These data can be modeled by a multivariate Poisson or Negative binomial distribution constructed using copulas. However, when some of the counts are inflated, that is, the number of observations in some cells are much larger than other cells, then the copula based multivariate Poisson (or Negative binomial) distribution may not fit well and it is not an appropriate statistical model for the data. There is a need to modify or adjust the multivariate distribution to account for the inflated frequencies. In this article, we consider the situation where the frequencies of two cells are higher compared to the other cells, and develop a doubly inflated multivariate Poisson distribution function using multivariate Gaussian copula. We also discuss procedures for regression on covariates for the doubly inflated multivariate count data. For illustrating the proposed methodologies, we present a real data containing bivariate count observations with inflations in two cells. Several models and linear predictors with log link functions are considered, and we discuss maximum likelihood estimation to estimate unknown parameters of the models.

Keywords: copula, Gaussian copula, multivariate distributions, inflated distributios

Procedia PDF Downloads 156
7363 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest

Procedia PDF Downloads 231
7362 Apricot Insurance Portfolio Risk

Authors: Kasirga Yildirak, Ismail Gur

Abstract:

We propose a model to measure hail risk of an Agricultural Insurance portfolio. Hail is one of the major catastrophic event that causes big amount of loss to an insurer. Moreover, it is very hard to predict due to its strange atmospheric characteristics. We make use of parcel based claims data on apricot damage collected by the Turkish Agricultural Insurance Pool (TARSIM). As our ultimate aim is to compute the loadings assigned to specific parcels, we build a portfolio risk model that makes use of PD and the severity of the exposures. PD is computed by Spherical-Linear and Circular –Linear regression models as the data carries coordinate information and seasonality. Severity is mapped into integer brackets so that Probability Generation Function could be employed. Individual regressions are run on each clusters estimated on different criteria. Loss distribution is constructed by Panjer Recursion technique. We also show that one risk-one crop model can easily be extended to the multi risk–multi crop model by assuming conditional independency.

Keywords: hail insurance, spherical regression, circular regression, spherical clustering

Procedia PDF Downloads 251
7361 A Linear Regression Model for Estimating Anxiety Index Using Wide Area Frontal Lobe Brain Blood Volume

Authors: Takashi Kaburagi, Masashi Takenaka, Yosuke Kurihara, Takashi Matsumoto

Abstract:

Major depressive disorder (MDD) is one of the most common mental illnesses today. It is believed to be caused by a combination of several factors, including stress. Stress can be quantitatively evaluated using the State-Trait Anxiety Inventory (STAI), one of the best indices to evaluate anxiety. Although STAI scores are widely used in applications ranging from clinical diagnosis to basic research, the scores are calculated based on a self-reported questionnaire. An objective evaluation is required because the subject may intentionally change his/her answers if multiple tests are carried out. In this article, we present a modified index called the “multi-channel Laterality Index at Rest (mc-LIR)” by recording the brain activity from a wider area of the frontal lobe using multi-channel functional near-infrared spectroscopy (fNIRS). The presented index aims to measure multiple positions near the Fpz defined by the international 10-20 system positioning. Using 24 subjects, the dependencies on the number of measuring points used to calculate the mc-LIR and its correlation coefficients with the STAI scores are reported. Furthermore, a simple linear regression was performed to estimate the STAI scores from mc-LIR. The cross-validation error is also reported. The experimental results show that using multiple positions near the Fpz will improve the correlation coefficients and estimation than those using only two positions.

Keywords: frontal lobe, functional near-infrared spectroscopy, state-trait anxiety inventory score, stress

Procedia PDF Downloads 249