Search results for: poisson regression model
18856 Behind Fuzzy Regression Approach: An Exploration Study
Authors: Lavinia B. Dulla
Abstract:
The exploration study of the fuzzy regression approach aims to show that fuzzy regression can be used as a possible alternative to classical regression. It likewise seeks to assess the differences and characteristics of simple linear regression and fuzzy regression using the width of the prediction interval, the mean absolute deviation, and the variance of residuals. Based on the simple linear regression model, the fuzzy regression approach is worth considering as an alternative to simple linear regression when the sample size is between 10 and 20. As the sample size increases, the fuzzy regression approach is no longer applicable, since the large-sample assumptions of simple linear regression then hold. Nonetheless, it can be recommended as a practical alternative when decisions often have to be made on the basis of small data sets.
Keywords: fuzzy regression approach, minimum fuzziness criterion, interval regression, prediction interval
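The minimum fuzziness criterion named in the keywords is usually posed as a linear program in Tanaka-style possibilistic regression. The sketch below is a minimal Python rendering of that formulation, assuming symmetric triangular coefficients; the function name, the h-level, and the toy data are illustrative, not the author's implementation.

```python
# Minimal sketch of Tanaka-style fuzzy linear regression (minimum
# fuzziness criterion), assuming symmetric triangular coefficients
# A_j = (center a_j, spread c_j). Illustrative only.
import numpy as np
from scipy.optimize import linprog

def fuzzy_linear_regression(X, y, h=0.0):
    """Fit y_i ~ sum_j (a_j, c_j) x_ij by minimizing the total spread."""
    n, p = X.shape
    absX = np.abs(X)
    # Decision variables: [a_1..a_p, c_1..c_p]; objective = total spread.
    cost = np.concatenate([np.zeros(p), absX.sum(axis=0)])
    # Inclusion constraints at level h:
    #   a'x_i + (1-h) c'|x_i| >= y_i   and   a'x_i - (1-h) c'|x_i| <= y_i
    A_ub = np.vstack([np.hstack([-X, -(1 - h) * absX]),
                      np.hstack([ X, -(1 - h) * absX])])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * p + [(0, None)] * p  # spreads c_j >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:p], res.x[p:]  # centers a, spreads c

X = np.column_stack([np.ones(12), np.linspace(1, 12, 12)])  # intercept + x
y = 2.0 + 0.5 * X[:, 1] + np.random.default_rng(1).normal(0, 0.3, 12)
a, c = fuzzy_linear_regression(X, y)
print("centers:", a, "spreads:", c)
```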
Procedia PDF Downloads 299
18855 Integrated Nested Laplace Approximations For Quantile Regression
Authors: Kajingulu Malandala, Ranganai Edmore
Abstract:
The asymmetric Laplace distribution (ALD) is commonly used as the likelihood function in Bayesian quantile regression, and it offers a flexible family of likelihoods for quantile regression. Notwithstanding its popularity and practicality, the ALD is not smooth, which makes its likelihood difficult to maximize. Furthermore, Bayesian inference is time consuming, and the choice of likelihood may mislead the inference, as Bayes' theorem does not automatically validate the posterior inference. The ALD also fails to account for pronounced skewness and kurtosis. This paper develops a new quantile regression approach for count data based on the inverse of the cumulative distribution function of the Poisson, binomial, and Delaporte distributions, using integrated nested Laplace approximations (INLA). Our results validate the benefit of using INLA and support the approach for count data.
Keywords: quantile regression, Delaporte distribution, count data, integrated nested Laplace approximation
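The inverse-CDF idea underlying this approach can be sketched briefly: the τ-th conditional quantile of a count model follows by passing the fitted mean through the discrete inverse CDF. In the sketch below, a maximum-likelihood Poisson GLM stands in for the Bayesian INLA fit (R-INLA itself is an R package), and the data are synthetic.

```python
# Sketch of the inverse-CDF step behind count-data quantile regression:
# map a fitted Poisson mean to conditional quantiles via poisson.ppf.
# A plain ML Poisson GLM stands in for the Bayesian INLA fit.
import numpy as np
import statsmodels.api as sm
from scipy.stats import poisson

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = fit.predict(X)                      # conditional means exp(x'beta)

for tau in (0.25, 0.5, 0.9):
    q = poisson.ppf(tau, mu)             # conditional tau-th quantiles
    print(f"tau={tau}: average conditional quantile {q.mean():.2f}")
```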
Procedia PDF Downloads 163
18854 Optimization of Machine Learning Regression Results: An Application on Health Expenditures
Authors: Songul Cinaroglu
Abstract:
Machine learning regression methods are recommended as an alternative to classical regression methods when some variables are difficult to model. Health expenditure data are typically non-normal and heavily skewed. This study aims to compare machine learning regression methods, tuned by their hyperparameters, in predicting health expenditure per capita. A multiple regression model was fitted, and the performance of lasso regression, random forest regression, and support vector machine regression was recorded under different hyperparameter settings. The hyperparameters considered were the lambda (λ) value for lasso regression, the number of trees for random forest regression, and the epsilon (ε) value for support vector regression. With k-fold cross-validation, for k varying from 5 to 50, the differences among the machine learning regression results in terms of R², RMSE, and MAE were statistically significant (p < 0.001). The results reveal that random forest regression (R² ˃ 0.7500, RMSE ≤ 0.6000, and MAE ≤ 0.4000) outperforms the other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.
Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure
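A minimal sketch of the comparison described above, assuming sklearn-style estimators: lasso (its alpha parameter playing the role of λ), random forest (number of trees), and SVR (ε), scored by k-fold cross-validation on synthetic skewed data standing in for the health expenditure figures.

```python
# Sketch: compare lasso, random forest and SVR under k-fold CV.
# Synthetic, exponentiated (right-skewed) response; illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=8, noise=10, random_state=0)
y = np.exp(y / y.std())                      # induce a heavy right skew

models = {
    "lasso (lambda=0.1)": Lasso(alpha=0.1),
    "random forest (500 trees)": RandomForestRegressor(n_estimators=500,
                                                       random_state=0),
    "SVR (epsilon=0.2)": SVR(epsilon=0.2),
}
for k in (5, 10):                            # the study varied k from 5 to 50
    for name, model in models.items():
        r2 = cross_val_score(model, X, y, cv=k, scoring="r2").mean()
        print(f"k={k:2d}  {name:28s} mean R^2 = {r2:.3f}")
```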
Procedia PDF Downloads 226
18853 A Nonlocal Means Algorithm for Poisson Denoising Based on Information Geometry
Authors: Dongxu Chen, Yipeng Li
Abstract:
This paper presents an information-geometric nonlocal means (NLM) algorithm for Poisson denoising. NLM estimates a noise-free pixel as a weighted average of image pixels, where each pixel is weighted according to the similarity between image patches, conventionally measured in Euclidean space. In this work, every pixel is modeled as a Poisson distribution locally estimated by maximum likelihood (ML), and these distributions together form a statistical manifold. The NLM denoising algorithm is then carried out on this statistical manifold, where the Fisher information matrix defines geodesic distances between distributions that serve as the similarity between patches. This approach is demonstrated to be competitive with related state-of-the-art methods.
Keywords: image denoising, Poisson noise, information geometry, nonlocal means
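One concrete ingredient can be sketched: for the Poisson family, the Fisher-Rao geodesic distance has the closed form d(λ₁, λ₂) = 2|√λ₁ − √λ₂|, so patch similarity can be measured on the statistical manifold rather than in Euclidean space. The weighting below is a generic NLM kernel, not the authors' exact scheme.

```python
# Sketch: NLM weighting with the Poisson Fisher-Rao geodesic distance
# d(l1, l2) = 2*|sqrt(l1) - sqrt(l2)| applied pixelwise and aggregated
# over the patch. Generic kernel; illustrative only.
import numpy as np

def poisson_geodesic(patch_a, patch_b):
    """Aggregate Fisher-Rao distance between ML Poisson parameters."""
    return np.linalg.norm(2.0 * (np.sqrt(patch_a) - np.sqrt(patch_b)))

def nlm_pixel(patches, center_idx, h=1.0):
    """Weighted average of patch centers; weights from geodesic distance."""
    ref = patches[center_idx]
    d2 = np.array([poisson_geodesic(ref, p) ** 2 for p in patches])
    w = np.exp(-d2 / h ** 2)
    centers = patches[:, patches.shape[1] // 2]  # central pixel of each patch
    return np.sum(w * centers) / np.sum(w)

rng = np.random.default_rng(3)
patches = rng.poisson(5.0, size=(50, 9)).astype(float)  # flattened 3x3 patches
print(nlm_pixel(patches, center_idx=0))
```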
Procedia PDF Downloads 285
18852 Modeling Karachi Dengue Outbreak and Exploration of Climate Structure
Authors: Syed Afrozuddin Ahmed, Junaid Saghir Siddiqi, Sabah Quaiser
Abstract:
Various studies have reported that global warming causes an unstable climate and many serious impacts on the physical environment and public health. The increasing incidence of dengue fever is now a priority health issue and has become a health burden for Pakistan. This study investigates whether the spatial pattern of the environment causes the emergence or an increasing rate of dengue fever incidence that affects the population and its health. The climatic or environmental structure data and the dengue fever (DF) data were processed by coding, editing, tabulating, recoding, and restructuring through re-tabulation, and finally different statistical methods, techniques, and procedures were applied for the evaluation. The five climatic variables studied are precipitation (P), maximum temperature (Mx), minimum temperature (Mn), humidity (H), and wind speed (W), collected from 1980 to 2012. Dengue cases in Karachi from 2010 to 2012 are reported on a weekly basis. Principal component analysis is applied to explore the climatic variables and/or the climatic structure that may influence the increase or decrease in the number of dengue fever cases in Karachi. PC1 for the whole period represents the general atmospheric condition. PC2 for the dengue period is the contrast between precipitation and wind speed. PC3 is the weighted difference between maximum temperature and wind speed. PC4 for the dengue period is the contrast between maximum temperature and wind speed. Negative binomial and Poisson regression models are used to relate dengue fever incidence to the climatic variables and principal component scores. Relative humidity is estimated to increase the chance of dengue occurrence by 1.71%. Maximum temperature increases the chance of dengue occurrence by 19.48%. Minimum temperature increases the chance of dengue occurrence by 11.51%. Wind speed reduces the weekly occurrence of dengue fever by 7.41%.
Keywords: principal component analysis, dengue fever, negative binomial regression model, Poisson regression model
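The regression step can be sketched as follows, with exp(β) − 1 read as the percentage change in expected weekly cases per unit of a climate variable (the source of figures such as 1.71% and 19.48% above); the data below are synthetic placeholders, not the Karachi series.

```python
# Sketch: weekly counts modeled with Poisson and negative binomial GLMs;
# exp(beta) - 1 gives the percent change per unit covariate. Synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 156                                       # three years of weekly data
climate = np.column_stack([rng.normal(70, 8, n),    # humidity
                           rng.normal(33, 3, n)])   # maximum temperature
X = sm.add_constant(climate)
y = rng.poisson(np.exp(-4 + 0.017 * climate[:, 0] + 0.10 * climate[:, 1]))

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()

pct = (np.exp(pois.params[1:]) - 1) * 100
print("percent change in expected cases per unit:", pct.round(2))
print("Poisson AIC:", round(pois.aic, 1), " NB AIC:", round(nb.aic, 1))
```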
Procedia PDF Downloads 445
18851 Generalized Extreme Value Regression with Binary Dependent Variable: An Application for Predicting Meteorological Drought Probabilities
Authors: Retius Chifurira
Abstract:
The logistic regression model is the most widely used regression model for predicting meteorological drought probabilities. When the dependent variable is extreme, however, the logistic model fails to adequately capture drought probabilities. To predict drought probabilities adequately, we use the generalized linear model (GLM) with the quantile function of the generalized extreme value distribution (GEVD) as the link function. Maximum likelihood estimation is used to estimate the parameters of the generalized extreme value (GEV) regression model. We compare the performance of the logistic and GEV regression models in predicting drought probabilities for Zimbabwe. The performance of the regression models is assessed using goodness-of-fit measures, namely the relative root mean square error (RRMSE) and the relative mean absolute error (RMAE). Results show that the GEV regression model performs better than the logistic model, thereby providing a good alternative candidate for predicting drought probabilities. This paper provides the first application of a GLM derived from extreme value theory to predict drought probabilities for a drought-prone country such as Zimbabwe.
Keywords: generalized extreme value distribution, generalized linear model, mean annual rainfall, meteorological drought probabilities
Procedia PDF Downloads 200
18850 A Regression Model for Residual-State Creep Failure
Authors: Deepak Raj Bhat, Ryuichi Yatabe
Abstract:
In this study, a residual-state creep failure model was developed based on residual-state creep test results for clayey soils. To develop the proposed model, regression analyses were carried out in R. The model results for the failure time (tf) and critical displacement (δc) were compared with experimental results and found to be in close agreement. It is expected that the proposed regression model for residual-state creep failure will be useful for predicting the displacement of different clayey soils in the future.
Keywords: regression model, residual-state creep failure, displacement prediction, clayey soils
Procedia PDF Downloads 408
18849 A Learning-Based EM Mixture Regression Algorithm
Authors: Yi-Cheng Tian, Miin-Shen Yang
Abstract:
The mixture likelihood approach to clustering is a popular clustering method, and the expectation-maximization (EM) algorithm is the most used mixture likelihood method. In the literature, the EM algorithm has been used for mixture regression models. However, these EM mixture regression algorithms are sensitive to initial values and require an a priori number of clusters. In this paper, to resolve these drawbacks, we construct a learning-based schema for the EM mixture regression algorithm such that it is free of initialization and can automatically obtain an approximately optimal number of clusters. Some numerical examples and comparisons demonstrate the superiority and usefulness of the proposed learning-based EM mixture regression algorithm.
Keywords: clustering, EM algorithm, Gaussian mixture model, mixture regression model
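For orientation, a compact EM for a two-component Gaussian mixture of linear regressions is sketched below; this plain version still requires initial values and a fixed number of clusters, which is precisely the limitation the proposed learning-based schema removes. All names and toy data are illustrative.

```python
# Sketch: plain EM for a two-component mixture of linear regressions.
# E-step: responsibilities from normal densities of the residuals.
# M-step: weighted least squares per component.
import numpy as np

def em_mixture_regression(X, y, n_iter=100):
    n, p = X.shape
    rng = np.random.default_rng(0)
    beta = rng.normal(size=(2, p)); sigma = np.ones(2); pi = np.array([.5, .5])
    for _ in range(n_iter):
        dens = np.stack([pi[k] / sigma[k] *
                         np.exp(-0.5 * ((y - X @ beta[k]) / sigma[k]) ** 2)
                         for k in range(2)])
        r = dens / (dens.sum(axis=0) + 1e-12)          # responsibilities
        for k in range(2):
            W = np.diag(r[k])
            beta[k] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
            resid = y - X @ beta[k]
            sigma[k] = np.sqrt((r[k] * resid ** 2).sum() / r[k].sum())
        pi = r.mean(axis=1)
    return beta, sigma, pi

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300); X = np.column_stack([np.ones(300), x])
z = rng.random(300) < 0.5
y = np.where(z, 1 + 2 * x, -1 - 2 * x) + rng.normal(0, 0.3, 300)
print(em_mixture_regression(X, y)[0])   # two sets of regression coefficients
```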
Procedia PDF Downloads 510
18848 A Fuzzy Linear Regression Model Based on Dissemblance Index
Authors: Shih-Pin Chen, Shih-Syuan You
Abstract:
Fuzzy regression models are useful for investigating the relationship between explanatory variables and responses in fuzzy environments. To overcome the deficiencies of previous models and increase the explanatory power of fuzzy data, the graded mean integration (GMI) representation is applied to determine representative crisp regression coefficients. A fuzzy regression model is then constructed based on the modified dissemblance index (MDI), which can precisely measure the actual total error. Compared with previous studies based on distance criteria, results from commonly used test examples show that the proposed fuzzy linear regression model built on the MDI has higher explanatory power and forecasting accuracy.
Keywords: dissemblance index, fuzzy linear regression, graded mean integration, mathematical programming
Procedia PDF Downloads 439
18847 New High Order Group Iterative Schemes in the Solution of Poisson Equation
Authors: Sam Teek Ling, Norhashidah Hj. Mohd. Ali
Abstract:
We investigate the formulation and implementation of new explicit group iterative methods for solving the two-dimensional Poisson equation with Dirichlet boundary conditions. The methods are derived from a fourth-order compact nine-point finite difference discretization. The methods are compared with the existing second-order standard five-point formula to show the dramatic improvement in computed accuracy. Numerical experiments are presented to illustrate the effectiveness of the proposed methods.
Keywords: explicit group iterative method, finite difference, fourth order compact, Poisson equation
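The second-order five-point benchmark referred to above is easy to sketch (the fourth-order compact nine-point group schemes themselves are not reproduced here): a point Gauss-Seidel iteration for u_xx + u_yy = f with Dirichlet boundaries, on a test problem with a known exact solution.

```python
# Sketch: standard five-point Gauss-Seidel for the 2D Poisson equation
# u_xx + u_yy = f with zero Dirichlet boundaries. Test problem chosen so
# that u = sin(pi x) sin(pi y) is exact; the error is O(h^2).
import numpy as np

n = 33                                   # grid points per side
h = 1.0 / (n - 1)
x = np.linspace(0, 1, n)
X, Y = np.meshgrid(x, x, indexing="ij")
f = -2 * np.pi ** 2 * np.sin(np.pi * X) * np.sin(np.pi * Y)

u = np.zeros((n, n))                     # boundary values stay zero
for sweep in range(2000):
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            u[i, j] = 0.25 * (u[i + 1, j] + u[i - 1, j] +
                              u[i, j + 1] + u[i, j - 1] - h ** 2 * f[i, j])

exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
print("max error:", np.abs(u - exact).max())
```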
Procedia PDF Downloads 432
18846 A Hybrid Model Tree and Logistic Regression Model for Prediction of Soil Shear Strength in Clay
Authors: Ehsan Mehryaar, Seyed Armin Motahari Tabari
Abstract:
Without a doubt, shear strength is the most important property of a soil. The majority of fatal and catastrophic geological accidents are related to shear strength failure of the soil, so its prediction is a matter of high importance. However, acquiring the shear strength is usually a cumbersome task that may require complicated laboratory testing; predicting it from common, easily obtained soil properties can therefore simplify projects substantially. In this paper, a hybrid model based on the classification and regression tree (CART) algorithm and logistic regression is proposed, in which each leaf of the tree is an independent regression model. A database of 189 points for clay soil, including moisture content, liquid limit, plastic limit, clay content, and shear strength, was collected. The performance of the developed model is compared to existing models and equations using the root mean squared error and the coefficient of correlation.
Keywords: model tree, CART, logistic regression, soil shear strength
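A simplified sketch of the hybrid idea follows, assuming a shallow CART tree with an independent regression fit inside each leaf; plain linear regression is used per leaf here for clarity, where the paper pairs the tree with logistic regression, and the data are synthetic stand-ins for the 189-point clay database.

```python
# Sketch: model tree = shallow CART partition + one regression per leaf.
# Synthetic index properties; illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(189, 4))     # moisture, LL, PL, clay content
y = 20 + 30 * X[:, 0] + 10 * X[:, 1] * (X[:, 2] > 0.5) + rng.normal(0, 2, 189)

tree = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0).fit(X, y)
leaf_of = tree.apply(X)                  # leaf index for every sample
leaf_models = {leaf: LinearRegression().fit(X[leaf_of == leaf],
                                            y[leaf_of == leaf])
               for leaf in np.unique(leaf_of)}

def predict(Xnew):
    leaves = tree.apply(Xnew)
    return np.array([leaf_models[l].predict(row[None, :])[0]
                     for l, row in zip(leaves, Xnew)])

pred = predict(X)
rmse = np.sqrt(np.mean((y - pred) ** 2))
r = np.corrcoef(y, pred)[0, 1]
print(f"RMSE={rmse:.2f}, correlation={r:.3f}")   # the paper's two criteria
```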
Procedia PDF Downloads 197
18845 Determination of Poisson’s Ratio and Elastic Modulus of Compression Textile Materials
Authors: Chongyang Ye, Rong Liu
Abstract:
Compression textiles such as compression stockings (CSs) have been extensively applied for the prevention and treatment of chronic venous insufficiency of the lower extremities. The involvement of multiple mechanical factors, such as interface pressure, frictional force, and elastic materials, makes the interactions between the lower limb and CSs complex. Determining the Poisson's ratio and elastic moduli of CS materials is critical for constructing finite element (FE) models to numerically simulate the complex interactive system of CS and lower limb. In this study, a mixed approach, combining an analytic model based on orthotropic Hooke's law with experimental study (uniaxial tension testing and pure shear testing), is proposed to determine the Young's modulus, Poisson's ratio, and shear modulus of CS fabrics. The results indicated a linear stress-strain relationship for the studied CS samples under controlled stretch ratios (< 100%). The newly proposed method and the determined key mechanical properties of elastic orthotropic CS fabrics facilitate FE modeling for in-depth analysis of the effects of compression material design on the resultant biomechanical function in compression therapy.
Keywords: elastic compression stockings, Young’s modulus, Poisson’s ratio, shear modulus, mechanical analysis
Procedia PDF Downloads 119
18844 Statistical Modeling of Local Area Fading Channels Based on Triply Stochastic Filtered Marked Poisson Point Processes
Authors: Jihad Daba, Jean-Pierre Dubois
Abstract:
Multipath fading noise degrades the performance of cellular communication, most notably in femto- and pico-cells in 3G and 4G systems. When the wireless channel consists of a small number of scattering paths, the statistics of the fading noise are not analytically tractable, which poses a serious challenge to developing closed canonical forms that can be analysed and used in the design of efficient and optimal receivers. In this context, the noise is multiplicative and is referred to as stochastically local fading. Many analytical investigations of multiplicative noise invoke exponential or gamma statistics. More recent advances by the author of this paper have utilized Poisson-modulated and weighted generalized Laguerre polynomials with controlling parameters under uncorrelated-noise assumptions. In this paper, we investigate the statistics of a multi-diversity, stochastically local area fading channel in which the channel consists of randomly distributed Rayleigh and Rician scattering centers with a coherent specular Nakagami-distributed line-of-sight component and an underlying doubly stochastic Poisson process driven by a lognormal intensity. These combined statistics form a unifying triply stochastic filtered marked Poisson point process model.
Keywords: cellular communication, femto- and pico-cells, stochastically local area fading channel, triply stochastic filtered marked Poisson point process
Procedia PDF Downloads 448
18843 A Modified Periodic 2D Cellular Re-Entrant Honeycomb Model to Enhance the Auxetic Elastic Properties
Authors: Sohaib Z. Khan, Farrukh Mustahsan, Essam R. I. Mahmoud, S. H. Masood
Abstract:
Materials or structures that contract laterally under a compressive load, and vice versa, are called auxetic materials; they exhibit a negative Poisson's ratio (NPR). Numerous auxetic structures have been proposed in the literature, and one of the most studied periodic auxetic structures is the re-entrant honeycomb model. In this paper, a modified re-entrant model is proposed to enhance the auxetic behavior. The paper investigates the elastic behaviour of the proposed model through an analytical model, with the goal of improving Young's modulus and the NPR. Finite element analysis (FEA) is also conducted to support the analytical results. A significant increase in Young's modulus and NPR can be achieved in one of the two orthogonal loading directions at the cost of compromising these values in the other direction. The proposed modification results in lower relative densities when compared to the existing re-entrant honeycomb structure. This trade-off in the elastic properties in one direction at low relative density makes the proposed model suitable for unidirectional applications where higher stiffness and NPR are required and the strength-to-weight ratio is important.
Keywords: 2D model, auxetic materials, re-entrant honeycomb, negative Poisson's ratio
Procedia PDF Downloads 138
18842 Application and Verification of Regression Model to Landslide Susceptibility Mapping
Authors: Masood Beheshtirad
Abstract:
Identifying regions with potential for landslide occurrence is one of the basic measures in natural resources management. Different landslide hazard mapping models have been proposed, depending on environmental conditions and goals. In this research, a landslide hazard map was produced using a multiple regression model, and the applicability of this model was investigated in the Baghdasht watershed. The dependent variable is the landslide inventory map, and the independent variables consist of information layers such as geology, slope, aspect, distance from rivers, distance from roads, faults, and land use. To this end, existing landslides were identified and an inventory map was made. The landslide hazard map was then produced from the multiple regression model. The similarity between the model's potential hazard classes and values and the landslide inventory map was assessed in the SPSS environment. The results showed a significant correlation between the potential hazard classes and values and the landslide areas. The multiple regression model is therefore suitable for application in the Baghdasht watershed.
Keywords: landslide, mapping, multiple model, regression
Procedia PDF Downloads 325
18841 Nonparametric Truncated Spline Regression Model on the Data of Human Development Index in Indonesia
Authors: Kornelius Ronald Demu, Dewi Retno Sari Saputro, Purnami Widyaningsih
Abstract:
The Human Development Index (HDI) is a standard measurement of a country's human development. Several factors may influence it, such as life expectancy, gross domestic product (GDP) based on the province's annual expenditure, the number of poor people, and the percentage of illiterate people. The scatter plots between HDI and these factors show that the data do not follow a specific pattern or form. Therefore, a nonparametric regression model can be applied to the HDI data in Indonesia. The estimate of the regression curve in a nonparametric regression model is flexible because it follows the shape of the data pattern. One nonparametric regression method is the truncated spline, a modification of segmented polynomial functions. The estimator of a truncated spline regression model depends on the selection of optimal knot points, the join points of the truncated spline functions. The optimal knot points were determined by minimizing the generalized cross-validation (GCV) value. In this article, a truncated spline nonparametric regression model was applied to the Human Development Index data. The best truncated spline regression model for the HDI data in Indonesia was obtained with the optimal knot-point combination 5-5-5-4. Life expectancy and the percentage of illiterate people were the factors significantly related to HDI in Indonesia. The coefficient of determination is 94.54%, meaning the regression model fits the HDI data in Indonesia well.
Keywords: generalized cross validation (GCV), Human Development Index (HDI), knot points, nonparametric regression, truncated spline
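The mechanics of knot selection by GCV can be sketched for a single predictor with linear truncated terms; the HDI data and the 5-5-5-4 knot combination are not reproduced, and the candidate knots below are illustrative.

```python
# Sketch: truncated spline regression with knots chosen by minimizing
# GCV = n * RSS / (n - tr(H))^2. One predictor, linear truncated terms.
import numpy as np
from itertools import combinations

def truncated_basis(x, knots):
    """Design matrix [1, x, (x-k1)_+, (x-k2)_+, ...]."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def gcv(x, y, knots):
    X = truncated_basis(x, knots)
    H = X @ np.linalg.pinv(X)                 # hat matrix
    resid = y - H @ y
    n = len(y)
    return n * (resid @ resid) / (n - np.trace(H)) ** 2

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 120))
y = np.where(x < 4, 1 + 0.5 * x, 3 - 0.3 * (x - 4)) + rng.normal(0, 0.2, 120)

candidates = np.linspace(1, 9, 9)
best = min(combinations(candidates, 2), key=lambda k: gcv(x, y, k))
print("optimal knots by GCV:", best)
```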
Procedia PDF Downloads 339
18840 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado
Authors: Ana Paula Camelo, Keila Sanches
Abstract:
Geographically weighted regression (GWR) was proposed in the geography literature to allow the relationships in a regression model to vary over space. In Brazil, agricultural exploitation of the Cerrado biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. We used the set of exploratory spatial data analysis (ESDA) tools and confirmatory analysis using GWR. A non-spatial model was calibrated, the nature of the regression curve evaluated, variables selected by a stepwise process, and multicollinearity analyzed. After the non-spatial model was evaluated, the spatial regression model was fitted, with statistical evaluation of the intercept and verification of its effect on the calibration. In a Spearman's correlation analysis, the result between deforestation and livestock was +0.783, and with soybeans +0.405. The model presented R² = 0.936 and showed a strong spatial dependence on the agricultural activity of soybeans associated with maize and cotton crops. GWR is a very effective tool, presenting results closer to the reality of deforestation in the Cerrado when compared with other analyses.
Keywords: deforestation, geographically weighted regression, land use, spatial analysis
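The GWR mechanics can be sketched directly: at each location a weighted least-squares fit is computed with kernel weights that decay with distance, so the coefficients vary over space. The Gaussian kernel, bandwidth, and synthetic coordinates below are illustrative placeholders for the Cerrado data.

```python
# Sketch: geographically weighted regression via per-location weighted
# least squares with a Gaussian distance kernel. Synthetic data.
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    betas = []
    for c in coords:
        d = np.linalg.norm(coords - c, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)      # Gaussian kernel
        W = np.diag(w)
        betas.append(np.linalg.solve(X.T @ W @ X, X.T @ W @ y))
    return np.array(betas)                           # one row per location

rng = np.random.default_rng(5)
coords = rng.uniform(0, 100, size=(150, 2))
livestock = rng.uniform(0, 1, 150)
soy = rng.uniform(0, 1, 150)
X = np.column_stack([np.ones(150), livestock, soy])
slope = 1 + coords[:, 0] / 100                       # spatially varying effect
y = slope * livestock + 0.4 * soy + rng.normal(0, 0.1, 150)

betas = gwr_coefficients(coords, X, y, bandwidth=20.0)
print("livestock coefficient range:",
      betas[:, 1].min().round(2), "to", betas[:, 1].max().round(2))
```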
Procedia PDF Downloads 363
18839 Characterization of the Upper Crust in Botswana Using Vp/Vs and Poisson's Ratios from Body Waves
Authors: Rapelang E. Simon, Thebeetsile A. Olebetse, Joseph R. Maritinkole, Ruth O. Moleleke
Abstract:
The P- and S-wave seismic velocity ratios (Vp/Vs) of some aftershocks are investigated using the method of Wadati diagrams. These aftershocks occurred after Botswana's Mw 6.5 earthquake of 3 April 2017 and were recorded by the Network of Autonomously Recording Seismographs (NARS)-Botswana temporary network, deployed from 2013 to 2018. In this paper, P- and S-wave data with good signal-to-noise ratios from twenty events of local magnitude greater than or equal to 4.0 are analysed with the SEISAN software and used to infer properties of the upper crust in Botswana. The Vp/Vs ratios are determined from the travel times of body waves and then converted to Poisson's ratio, which is useful in determining the physical state of subsurface materials. The Vp/Vs ratios of the upper crust in Botswana show regional variations from 1.70 to 1.77, with an average of 1.73. The Poisson's ratios range from 0.24 to 0.27, with an average of 0.25, and correlate well with the geological structures in Botswana.
Keywords: Botswana, earthquake, Poisson's ratio, seismic velocity, Vp/Vs ratio
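The conversion from Vp/Vs to Poisson's ratio follows from linear elasticity: with r = Vp/Vs, ν = (r² − 2) / (2(r² − 1)). A quick check below reproduces the abstract's range (r = 1.70-1.77 gives ν ≈ 0.24-0.27).

```python
# Sketch: Poisson's ratio from the Vp/Vs ratio, nu = (r^2-2)/(2(r^2-1)).
import numpy as np

def poisson_from_vpvs(r):
    r = np.asarray(r, dtype=float)
    return (r ** 2 - 2) / (2 * (r ** 2 - 1))

for r in (1.70, 1.73, 1.77):
    print(f"Vp/Vs = {r:.2f}  ->  Poisson's ratio = {poisson_from_vpvs(r):.3f}")
```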
Procedia PDF Downloads 135
18838 A Hyperexponential Approximation to Finite-Time and Infinite-Time Ruin Probabilities of Compound Poisson Processes
Authors: Amir T. Payandeh Najafabadi
Abstract:
This article considers the problem of evaluating the infinite-time (or finite-time) ruin probability under a given compound Poisson surplus process by approximating the claim size distribution by a finite mixture of exponentials, i.e., a hyperexponential distribution. It restates the infinite-time (or finite-time) ruin probability as a solvable ordinary differential equation (or partial differential equation). An application of our findings is given through a simulation study.
Keywords: ruin probability, compound Poisson processes, mixture exponential (hyperexponential) distribution, heavy-tailed distributions
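For context, the compound Poisson model admits a closed form when claims are exponential, the one-component special case of the hyperexponential family used above: with mean claim μ and premium loading θ, ψ(u) = e^(−θu/(μ(1+θ))) / (1+θ). The sketch below checks that formula by simulation; all parameter values are illustrative.

```python
# Sketch: classical ruin probability with exponential claims, checked by
# Monte Carlo over a long finite horizon (most ruin occurs early).
import numpy as np

def ruin_prob_exponential(u, mu=1.0, theta=0.2):
    return np.exp(-theta * u / (mu * (1 + theta))) / (1 + theta)

def ruin_prob_simulated(u, mu=1.0, theta=0.2, lam=1.0,
                        horizon=200.0, n_paths=4000, seed=0):
    rng = np.random.default_rng(seed)
    c = (1 + theta) * lam * mu                 # premium rate
    ruined = 0
    for _ in range(n_paths):
        t, s = 0.0, float(u)
        while t < horizon:
            dt = rng.exponential(1 / lam)      # time to next claim
            t += dt
            s += c * dt - rng.exponential(mu)  # premium income minus claim
            if s < 0:
                ruined += 1
                break
    return ruined / n_paths

print("analytic :", round(ruin_prob_exponential(5.0), 4))
print("simulated:", ruin_prob_simulated(5.0))
```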
Procedia PDF Downloads 341
18837 Prevalence and Risk Factors of Cardiovascular Diseases among Bangladeshi Adults: Findings from a Cross Sectional Study
Authors: Fouzia Khanam, Belal Hossain, Kaosar Afsana, Mahfuzar Rahman
Abstract:
Aim: Although cardiovascular disease (CVD) has already been recognized as a major cause of death in developed countries, its prevalence is also rising in developing countries, creating a challenge for the health sector. Bangladesh has experienced an epidemiological transition from communicable to non-communicable diseases over the last few decades, so the rising prevalence of CVD and its risk factors pose a major problem for the country. We aimed to examine the prevalence of CVD and the socioeconomic and lifestyle factors related to it using a population-based survey. Methods: The data used for this study were collected as part of a large-scale cross-sectional study conducted to explore the overall health status of children, mothers, and senior citizens in Bangladesh. A multistage cluster random sampling procedure was applied, with unions as clusters and households as the primary sampling unit, to select a total of 11,428 households for the base survey. The present analysis encompassed 12,338 respondents aged ≥ 35 years, selected from both rural areas and urban slums of the country. Socioeconomic, demographic, and lifestyle information was obtained from each individual through face-to-face interviews recorded on the ODK platform, and height, weight, blood pressure, and glycosuria were measured using standardized methods. Chi-square tests and univariate and multivariate modified Poisson regression models were run in STATA (version 13.0). Results: Overall, the prevalence of CVD was 4.51%; 1.78% of respondents had had a stroke and 3.17% suffered from heart disease. Males had a higher prevalence of stroke (2.20%) than females (1.37%). Notably, thirty percent of respondents had high blood pressure, 5% had diabetes, and more than half of the population was pre-hypertensive. Additionally, 20% were overweight, 77% smoked or consumed smokeless tobacco, and 28% of respondents were physically inactive. Eighty-two percent of respondents took extra salt while eating, and 29% were sleep-deprived. Furthermore, the prevalence of CVD risk factors varied by gender. Women had a higher prevalence of overweight, obesity, and diabetes; they were also less physically active than men and took more extra salt. Smoking was less common in women than in men, and women slept less than their counterparts. After adjusting for confounders in the modified Poisson regression model, age, gender, occupation, wealth quintile, BMI, extra salt intake, daily sleep, tiredness, diabetes, and hypertension remained risk factors for CVD. Conclusion: The prevalence of CVD is significant in Bangladesh, and there is evidence of a rising trend in its risk factors, such as hypertension and diabetes, especially among the older population, women, and high-income groups. In this ongoing epidemiological transition, immediate public health intervention is therefore warranted to address the overwhelming CVD risk.
Keywords: cardiovascular diseases, diabetes, hypertension, stroke
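The "modified Poisson regression" used here is, in the usual sense, Poisson regression applied to a binary outcome with a robust (sandwich) variance, so that exponentiated coefficients are prevalence ratios. A minimal sketch with a synthetic binary CVD indicator and illustrative covariates follows.

```python
# Sketch: modified Poisson regression = Poisson GLM on a binary outcome
# with robust (HC0) standard errors; exp(beta) gives prevalence ratios.
# Variable names and data are synthetic placeholders.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 5000
age = rng.uniform(35, 80, n)
hypertension = rng.binomial(1, 0.3, n)
p = np.minimum(np.exp(-5 + 0.04 * age + 0.5 * hypertension), 1.0)
cvd = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([age, hypertension]))
fit = sm.GLM(cvd, X, family=sm.families.Poisson()).fit(cov_type="HC0")
print("prevalence ratios:", np.exp(fit.params[1:]).round(3))
print("robust 95% CIs:", np.exp(fit.conf_int()[1:]).round(3))
```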
Procedia PDF Downloads 381
18836 An Analysis of a Queueing System with Heterogeneous Servers Subject to Catastrophes
Authors: M. Reni Sagayaraj, S. Anand Gnana Selvam, R. Reynald Susainathan
Abstract:
This study analyzes a queueing system with blocking and no waiting line. Customers arrive according to a Poisson process, and the service times follow an exponential distribution. There are two non-identical servers in the system. The queue discipline is FCFS, and customers select the servers on a fastest-server-first (FSF) basis. The service times are exponentially distributed with parameters μ1 and μ2 at servers I and II, respectively. In addition, catastrophes occur in a Poisson manner with rate γ. When server I is busy or blocked, an arriving customer leaves the system without being served; such customers are called lost customers. The probability of losing a customer is computed for the system. The explicit time-dependent probabilities of the system size are obtained, and a numerical example is presented to show the managerial insights of the model. Finally, the probability that an arriving customer finds the system busy and the average number of busy servers in steady state are obtained numerically.
Keywords: queueing system, blocking, Poisson process, heterogeneous servers, queue discipline FCFS, busy period
Procedia PDF Downloads 506
18835 Bayesian Hidden Markov Modelling of Blood Type Distribution for COVID-19 Cases Using Poisson Distribution
Authors: Johnson Joseph Kwabina Arhinful, Owusu-Ansah Emmanuel Degraft Johnson, Okyere Gabrial Asare, Adebanji Atinuke Olusola
Abstract:
This paper proposes a model to describe the blood type distribution of new coronavirus (COVID-19) cases using the Bayesian Poisson hidden Markov model (BP-HMM). With the help of the Gibbs sampler algorithm, implemented in OpenBUGS, the study first identifies the number of hidden states fitting European (EU) and African (AF) data sets of COVID-19 cases by blood type frequency. The study then compares the state-dependent mean of infection within and across the two geographical areas. The findings show that the number of hidden states and the infection rates within and across the two geographical areas differ according to blood type.
Keywords: BP-HMM, COVID-19, blood types, Gibbs sampler
Procedia PDF Downloads 129
18834 Statistical Analysis for Overdispersed Medical Count Data
Authors: Y. N. Phang, E. F. Loh
Abstract:
Many researchers have suggested the use of zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models for over-dispersed medical count data with extra variation caused by excess zeros and unobserved heterogeneity. Studies indicate that ZIP and ZINB always provide a better fit than the standard Poisson and negative binomial models for such data. In this study, we propose the use of zero-inflated inverse trinomial (ZIIT), zero-inflated Poisson inverse Gaussian (ZIPIG), and zero-inflated strict arcsine models for over-dispersed medical count data. These proposed models are not widely used, especially in the medical field. The results show that the three suggested models can serve as alternatives for modeling over-dispersed medical count data, supported by their application to a real-life medical data set. The inverse trinomial, Poisson inverse Gaussian, and strict arcsine are discrete distributions with a cubic variance function of the mean; therefore, ZIIT, ZIPIG, and ZISA are able to accommodate data with excess zeros and very heavy tails. They are recommended for modeling over-dispersed medical count data when ZIP and ZINB are inadequate.
Keywords: zero inflated, inverse trinomial distribution, Poisson inverse Gaussian distribution, strict arcsine distribution, Pearson's goodness of fit
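A minimal sketch of fitting a zero-inflated count model is given below; ZIP is shown because statsmodels ships it directly, while the proposed ZIIT, ZIPIG, and ZISA models have no off-the-shelf Python implementation and are not reproduced. The data are synthetic.

```python
# Sketch: fit a zero-inflated Poisson model to synthetic counts with
# structural (excess) zeros, using a constant-only inflation part.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(8)
n = 1000
x = rng.uniform(0, 1, n)
structural_zero = rng.random(n) < 0.3          # extra zeros
counts = np.where(structural_zero, 0, rng.poisson(np.exp(0.5 + 1.0 * x)))

X = sm.add_constant(x)
zip_fit = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1)),
                              inflation="logit").fit(disp=False)
print(zip_fit.params)      # inflation logit intercept + count-model betas
```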
Procedia PDF Downloads 544
18833 Time of Week Intensity Estimation from Interval Censored Data with Application to Police Patrol Planning
Authors: Jiahao Tian, Michael D. Porter
Abstract:
Law enforcement agencies are tasked with crime prevention and crime reduction under limited resources, and an accurate temporal estimate of the crime rate would be valuable in achieving that goal. However, estimation is usually complicated by the interval-censored nature of crime data. We cast the problem of intensity estimation as a Poisson regression and use an EM algorithm to estimate the parameters. Two special penalties are added to provide smoothness over the time of day and the day of the week. The approach presented here provides accurate intensity estimates and can also uncover day-of-week clusters that share the same intensity patterns. Anticipating where and when crimes might occur is a key element of successful policing strategies, but this task is complicated by the presence of interval-censored data. Censored data refers to data for which the event time is only known to lie within an interval rather than being observed exactly. This type of data is prevalent in criminology because of the absence of victims for certain types of crime. Despite its importance, research on the temporal analysis of crime has lagged behind the spatial component. Inspired by the success of statistical approaches to crime-related problems, we propose a statistical model for the temporal intensity estimation of crime with censored data. The model is built on Poisson regression with special penalty terms added to the likelihood. An EM algorithm was derived to obtain maximum likelihood estimates, and the resulting model shows superior performance to the competing model. Our research is in line with the Smart Policing Initiative (SPI) proposed by the Bureau of Justice Assistance (BJA) as an effort to support law enforcement agencies in building evidence-based, data-driven law enforcement tactics, with the goal of identifying strategic approaches that are effective in crime prevention and reduction. In our case, we allow agencies to deploy their resources over a relatively short period of time to achieve the maximum level of crime reduction. By analyzing a particular area within a city where data are available, our proposed approach provides not only an accurate estimate of intensities for the time unit considered but also a time-varying crime incidence pattern. Both will be helpful in allocating limited resources, either by improving the existing patrol plan using the discovered day-of-week clusters or by directing extra resources where available.
Keywords: cluster detection, EM algorithm, interval censoring, intensity estimation
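The EM idea for interval-censored counts can be stripped to its core: each event is only known to fall within a window of time bins, the E-step spreads it over the window in proportion to the current intensity, and the M-step re-estimates the per-bin intensity. The sketch below omits the paper's smoothness penalties over time of day and day of week; all data are synthetic.

```python
# Sketch: EM for interval-censored event times on an hourly grid.
# E-step: allocate each event over its window proportionally to the
# current intensity. M-step: per-bin intensity update (no penalty).
import numpy as np

def em_intensity(windows, n_bins, n_iter=200):
    lam = np.ones(n_bins)                      # initial flat intensity
    for _ in range(n_iter):
        expected = np.zeros(n_bins)
        for lo, hi in windows:                 # one event per window [lo, hi)
            w = lam[lo:hi] / lam[lo:hi].sum()  # E-step allocation
            expected[lo:hi] += w
        lam = expected                         # M-step
    return lam

rng = np.random.default_rng(6)
true = 1 + 4 * np.exp(-0.5 * (((np.arange(24) - 20) % 24) / 3.0) ** 2)
events = np.repeat(np.arange(24), rng.poisson(20 * true / true.sum(), 24) * 5)
windows = [(max(0, t - 2), min(24, t + 3)) for t in events]  # +-2h windows
print(np.round(em_intensity(windows, 24), 2))
```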
Procedia PDF Downloads 66
18832 The Profit Trend of Cosmetics Products Using Bootstrap Edgeworth Approximation
Authors: Edlira Donefski, Lorenc Ekonomi, Tina Donefski
Abstract:
Edgeworth approximation is one of the most important statistical methods, with a considerable contribution to reducing the standard deviations of the independent variables' coefficients in a quantile regression model, which estimates the conditional median or other quantiles. In this paper, we apply these approximating statistical methods to an economic problem. We build a quantile regression model to see how the profit gained is connected with the realized sales of cosmetic products, using real data taken from a local business. A linear regression of the generated profit on the realized sales was not free of autocorrelation and heteroscedasticity, which is why we used the quantile regression model instead of linear regression. Our aim is to analyze in more detail the relation between the variables under study, profit and realized sales, and to minimize the standard errors of the independent variable involved, the level of realized sales. The statistical methods applied in our work are the Edgeworth approximation for independent and identically distributed (IID) cases, the bootstrap version of the model, and the Edgeworth approximation for the bootstrap quantile regression model. The graphics and results presented here identify the best approximating model of our study.
Keywords: bootstrap, Edgeworth approximation, IID, quantile
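The bootstrap step for a median regression of profit on realized sales can be sketched as resampling pairs and refitting to obtain the spread of the slope; the Edgeworth expansion of the bootstrap distribution itself is not reproduced, and the data below are synthetic.

```python
# Sketch: pairs bootstrap of a median (quantile) regression slope.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
sales = rng.uniform(100, 1000, 120)
profit = 0.3 * sales + rng.standard_t(3, 120) * 20   # heavy-tailed noise
df = pd.DataFrame({"sales": sales, "profit": profit})

fit = smf.quantreg("profit ~ sales", df).fit(q=0.5)
boot = []
for _ in range(500):
    sample = df.sample(len(df), replace=True)        # resample pairs
    boot.append(smf.quantreg("profit ~ sales", sample).fit(q=0.5)
                .params["sales"])
print("median-regression slope:", round(fit.params["sales"], 3),
      "bootstrap SE:", round(float(np.std(boot)), 4))
```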
Procedia PDF Downloads 159
18831 Generalized Additive Model for Estimating Propensity Score
Authors: Tahmidul Islam
Abstract:
The propensity score matching (PSM) technique has been widely used for estimating the causal effect of a treatment in observational studies. One major step in implementing PSM is estimating the propensity score (PS). A logistic regression model with additive linear terms in the covariates is the most used technique in many studies; logistic regression is also used with cubic splines to retain flexibility in the model. However, choosing the functional form of the logistic regression model has been an open question, since the effectiveness of PSM depends on how accurately the PS has been estimated. In many situations, the linearity assumption of linear logistic regression may not hold, and a non-linear relation between the logit and the covariates may be appropriate. One can estimate the PS using machine learning techniques such as random forests or neural networks for more accuracy in non-linear situations. In this study, an attempt has been made to compare the efficacy of the generalized additive model (GAM) in various linear and non-linear settings and to compare its performance with the usual logistic regression. GAM is a non-parametric technique in which the functional form of the covariates can be left unspecified and a flexible regression model can be fitted. Various simple and complex models for the treatment have been considered under several situations (small/large sample, low/high number of treatment units), and we examined which method leads to better covariate balance in the matched dataset. It is found that the logistic regression model is impressively robust to the inclusion of quadratic and interaction terms and reduces the mean difference between the treatment and control sets as efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in either the simple or the complex models. The analysis also suggests that a larger proportion of controls than treatment units leads to better balance for both methods.
Keywords: accuracy, covariate balance, generalized additive model, logistic regression, non-linearity, propensity score matching
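The comparison can be sketched with a linear-logit propensity score against a flexible spline-based one (patsy's bs() inside a statsmodels formula standing in for a GAM), with the standardized mean difference after inverse-probability weighting as a quick balance check in place of matching. All data are synthetic.

```python
# Sketch: linear-logit PS vs spline-based PS (GAM stand-in), with the
# standardized mean difference after IP weighting as the balance check.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n = 2000
x = rng.normal(0, 1, n)
treat = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x ** 2 - 0.5))), n)  # non-linear
df = pd.DataFrame({"x": x, "treat": treat})

ps_lin = smf.logit("treat ~ x", df).fit(disp=False).predict(df)
ps_gam = smf.logit("treat ~ bs(x, df=5)", df).fit(disp=False).predict(df)

def smd_after_weighting(ps):
    w = np.where(df.treat == 1, 1 / ps, 1 / (1 - ps))
    m1 = np.average(df.x[df.treat == 1], weights=w[df.treat == 1])
    m0 = np.average(df.x[df.treat == 0], weights=w[df.treat == 0])
    return abs(m1 - m0) / df.x.std()

print("SMD, linear logit:", round(smd_after_weighting(ps_lin), 3))
print("SMD, spline logit:", round(smd_after_weighting(ps_gam), 3))
```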
Procedia PDF Downloads 367
18830 Semiparametric Regression of Truncated Spline Biresponse on Farmer Loyalty and Attachment Modeling
Authors: Adji Achmad Rinaldo Fernandes
Abstract:
Regression analysis is a statistical method that is able to describe and predict causal relationships between variables. Not all relationships have a known curve shape; often the form of the relationship cannot be known in advance. Moreover, a cause can have an impact on more than one effect, and the effects themselves can be closely related to one another. Relationships of this kind can be investigated with biresponse semiparametric truncated spline regression. The purpose of this study is to examine the function estimator and determine the best model of biresponse semiparametric truncated spline regression. The results on secondary data showed that the best model is of quadratic order with a maximum of two knots, with a goodness of fit, in the form of adjusted R², of 88.5%.
Keywords: biresponse, farmer attachment, farmer loyalty, truncated spline
Procedia PDF Downloads 37
18829 Establishment of the Regression Uncertainty of the Critical Heat Flux Power Correlation for an Advanced Fuel Bundle
Authors: L. Q. Yuan, J. Yang, A. Siddiqui
Abstract:
A new regression uncertainty analysis methodology was applied to determine the uncertainties of the critical heat flux (CHF) power correlation for an advanced 43-element bundle design, which was developed by Canadian Nuclear Laboratories (CNL) to achieve improved economics, resource utilization, and energy sustainability. The new methodology is considered more appropriate than the traditional methodology for assessing the experimental uncertainty associated with regressions. The methodology was first assessed using both the Monte Carlo Method (MCM) and the Taylor Series Method (TSM) for a simple linear regression model, and then extended successfully to a non-linear CHF power regression model (CHF power as a function of inlet temperature, outlet pressure, and mass flow rate). The regression uncertainty assessed by MCM agrees well with that by TSM. An equation to evaluate the CHF power regression uncertainty was developed and expressed as a function of the independent variables that determine the CHF power.
Keywords: CHF experiment, CHF correlation, regression uncertainty, Monte Carlo Method, Taylor Series Method
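The Monte Carlo Method for regression uncertainty can be sketched generically: perturb the measured inputs within their assumed standard uncertainties, refit the regression each time, and read the spread of the prediction. The linear model below mirrors the simple-regression check case; the CHF correlation and CNL data are not reproduced.

```python
# Sketch: MCM for regression uncertainty on a simple linear model.
# Inputs are perturbed within assumed uncertainties and the fit repeated.
import numpy as np

rng = np.random.default_rng(12)
x_meas = np.linspace(1, 10, 20)
y_meas = 2.0 + 0.5 * x_meas + rng.normal(0, 0.1, 20)
u_x, u_y = 0.05, 0.1                       # assumed standard uncertainties

preds = []
for _ in range(5000):                      # MCM trials
    x = x_meas + rng.normal(0, u_x, 20)
    y = y_meas + rng.normal(0, u_y, 20)
    b, a = np.polyfit(x, y, 1)             # slope, intercept
    preds.append(a + b * 5.0)              # prediction at x = 5

print("prediction at x=5: mean %.3f, MC std %.4f"
      % (np.mean(preds), np.std(preds)))
```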
Procedia PDF Downloads 416
18828 A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures
Authors: Adriano Z. Zambom, Preethi Ravikumar
Abstract:
One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effect of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work, the efficiency of completely nonparametric regression estimators such as loess is compared to estimators that assume additivity, in several situations including additive and non-additive regression scenarios. The comparison is done by computing the oracle mean squared error of the estimators with respect to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criterion (AIC) is proposed, computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of time it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included when the nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure, and the selected variables are identified.
Keywords: additive model, nonparametric regression, variable selection, Akaike Information Criterion
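The backward elimination step can be sketched on a linear working model, assuming the usual rule of dropping the variable whose removal lowers the AIC most and stopping when no removal helps; the paper applies the same idea with additive or nonparametric fits, which are not reproduced here.

```python
# Sketch: backward elimination by AIC on an OLS working model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 1, n)   # only x0, x2 matter

active = list(range(p))
while True:
    base_aic = sm.OLS(y, sm.add_constant(X[:, active])).fit().aic
    # AIC after dropping each remaining variable in turn
    trials = {j: sm.OLS(y, sm.add_constant(X[:, [k for k in active if k != j]]))
                 .fit().aic for j in active}
    best_j, best_aic = min(trials.items(), key=lambda t: t[1])
    if best_aic < base_aic:
        active.remove(best_j)              # dropping improves AIC
    else:
        break
print("selected variables:", active)       # expect [0, 2]
```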
Procedia PDF Downloads 265
18827 Internet Purchases in European Union Countries: Multiple Linear Regression Approach
Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić
Abstract:
This paper examines the influence of economic and information and communication technology (ICT) development on the recently increasing Internet purchases by individuals in European Union member states. After a growing trend in Internet purchases in the EU27 was noticed, an all-possible-regressions analysis was applied using nine independent variables for 2011, and two linear regression models were then studied in detail. The simple linear regression analysis confirmed the research hypothesis that Internet purchases in the analysed EU countries are positively correlated with the statistically significant variable gross domestic product per capita (GDPpc). The multiple linear regression model with four regressors describing the level of ICT development likewise indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.
Keywords: European Union, Internet purchases, multiple linear regression model, outlier
Procedia PDF Downloads 303