Search results for: polynomial regression model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18276

Search results for: polynomial regression model

18036 Development of Generalized Correlation for Liquid Thermal Conductivity of N-Alkane and Olefin

Authors: A. Ishag Mohamed, A. A. Rabah

Abstract:

The objective of this research is to develop a generalized correlation for the prediction of thermal conductivity of n-Alkanes and Alkenes. There is a minority of research and lack of correlation for thermal conductivity of liquids in the open literature. The available experimental data are collected covering the groups of n-Alkanes and Alkenes.The data were assumed to correlate to temperature using Filippov correlation. Nonparametric regression of Grace Algorithm was used to develop the generalized correlation model. A spread sheet program based on Microsoft Excel was used to plot and calculate the value of the coefficients. The results obtained were compared with the data that found in Perry's Chemical Engineering Hand Book. The experimental data correlated to the temperature ranged "between" 273.15 to 673.15 K, with R2 = 0.99.The developed correlation reproduced experimental data that which were not included in regression with absolute average percent deviation (AAPD) of less than 7 %. Thus the spread sheet was quite accurate which produces reliable data.

Keywords: N-Alkanes, N-Alkenes, nonparametric, regression

Procedia PDF Downloads 623
18035 Competitors’ Influence Analysis of a Retailer by Using Customer Value and Huff’s Gravity Model

Authors: Yepeng Cheng, Yasuhiko Morimoto

Abstract:

Customer relationship analysis is vital for retail stores, especially for supermarkets. The point of sale (POS) systems make it possible to record the daily purchasing behaviors of customers as an identification point of sale (ID-POS) database, which can be used to analyze customer behaviors of a supermarket. The customer value is an indicator based on ID-POS database for detecting the customer loyalty of a store. In general, there are many supermarkets in a city, and other nearby competitor supermarkets significantly affect the customer value of customers of a supermarket. However, it is impossible to get detailed ID-POS databases of competitor supermarkets. This study firstly focused on the customer value and distance between a customer's home and supermarkets in a city, and then constructed the models based on logistic regression analysis to analyze correlations between distance and purchasing behaviors only from a POS database of a supermarket chain. During the modeling process, there are three primary problems existed, including the incomparable problem of customer values, the multicollinearity problem among customer value and distance data, and the number of valid partial regression coefficients. The improved customer value, Huff’s gravity model, and inverse attractiveness frequency are considered to solve these problems. This paper presents three types of models based on these three methods for loyal customer classification and competitors’ influence analysis. In numerical experiments, all types of models are useful for loyal customer classification. The type of model, including all three methods, is the most superior one for evaluating the influence of the other nearby supermarkets on customers' purchasing of a supermarket chain from the viewpoint of valid partial regression coefficients and accuracy.

Keywords: customer value, Huff's Gravity Model, POS, Retailer

Procedia PDF Downloads 92
18034 Understanding the Effect of Fall Armyworm and Integrated Pest Management Practices on the Farm Productivity and Food Security in Malawi

Authors: Innocent Pangapanga, Eric Mungatana

Abstract:

Fall armyworm (FAW) (Spodoptera frugiperda), an invasive lepidopteran pest, has caused substantial yield loss since its first detection in September 2016, thereby threatening the farm productivity food security and poverty reduction initiatives in Malawi. Several stakeholders, including households, have adopted chemical pesticides to control FAW without accounting for its costs on welfare, health and the environment. Thus, this study has used panel data endogenous switching regression model to investigate the impact of FAW and the integrated pest management (IPM) –related practices on-farm productivity and food security. The study finds that FAW substantively reduces farm productivity by seven (7) percent and influences the adoption of IPM –related practices, namely, intercropping, mulching, and agroforestry, by 6 percent, ceteris paribus. Interestingly, multiple adoptions of the IPM -related practices noticeably increase farm productivity by 21 percent. After accounting for potential endogeneity through the endogenous switching regression model, the IPM practices further demonstrate tenfold more improvement on food security, implying the role of the IPM –related practices in containing the effect of FAW at the household level.

Keywords: hunger, invasive fall army worms, integrated pest management practices, farm productivity, endogenous switching regression

Procedia PDF Downloads 106
18033 Method of Parameter Calibration for Error Term in Stochastic User Equilibrium Traffic Assignment Model

Authors: Xiang Zhang, David Rey, S. Travis Waller

Abstract:

Stochastic User Equilibrium (SUE) model is a widely used traffic assignment model in transportation planning, which is regarded more advanced than Deterministic User Equilibrium (DUE) model. However, a problem exists that the performance of the SUE model depends on its error term parameter. The objective of this paper is to propose a systematic method of determining the appropriate error term parameter value for the SUE model. First, the significance of the parameter is explored through a numerical example. Second, the parameter calibration method is developed based on the Logit-based route choice model. The calibration process is realized through multiple nonlinear regression, using sequential quadratic programming combined with least square method. Finally, case analysis is conducted to demonstrate the application of the calibration process and validate the better performance of the SUE model calibrated by the proposed method compared to the SUE models under other parameter values and the DUE model.

Keywords: parameter calibration, sequential quadratic programming, stochastic user equilibrium, traffic assignment, transportation planning

Procedia PDF Downloads 261
18032 Parameter Estimation via Metamodeling

Authors: Sergio Haram Sarmiento, Arcady Ponosov

Abstract:

Based on appropriate multivariate statistical methodology, we suggest a generic framework for efficient parameter estimation for ordinary differential equations and the corresponding nonlinear models. In this framework classical linear regression strategies is refined into a nonlinear regression by a locally linear modelling technique (known as metamodelling). The approach identifies those latent variables of the given model that accumulate most information about it among all approximations of the same dimension. The method is applied to several benchmark problems, in particular, to the so-called ”power-law systems”, being non-linear differential equations typically used in Biochemical System Theory.

Keywords: principal component analysis, generalized law of mass action, parameter estimation, metamodels

Procedia PDF Downloads 476
18031 A Contribution to the Polynomial Eigen Problem

Authors: Malika Yaici, Kamel Hariche, Tim Clarke

Abstract:

The relationship between eigenstructure (eigenvalues and eigenvectors) and latent structure (latent roots and latent vectors) is established. In control theory eigenstructure is associated with the state space description of a dynamic multi-variable system and a latent structure is associated with its matrix fraction description. Beginning with block controller and block observer state space forms and moving on to any general state space form, we develop the identities that relate eigenvectors and latent vectors in either direction. Numerical examples illustrate this result. A brief discussion of the potential of these identities in linear control system design follows. Additionally, we present a consequent result: a quick and easy method to solve the polynomial eigenvalue problem for regular matrix polynomials.

Keywords: eigenvalues/eigenvectors, latent values/vectors, matrix fraction description, state space description

Procedia PDF Downloads 432
18030 Mathematical Modeling Pressure Losses of Trapezoidal Labyrinth Channel and Bi-Objective Optimization of the Design Parameters

Authors: Nina Philipova

Abstract:

The influence of the geometric parameters of trapezoidal labyrinth channel on the pressure losses along the labyrinth length is investigated in this work. The impact of the dentate height is studied at fixed values of the dentate angle and the dentate spacing. The objective of the work presented in this paper is to derive a mathematical model of the pressure losses along the labyrinth length depending on the dentate height. The numerical simulations of the water flow movement are performed by using Commercial codes ANSYS GAMBIT and FLUENT. Dripper inlet pressure is set up to be 1 bar. As a result, the mathematical model of the pressure losses is determined as a second-order polynomial by means Commercial code STATISTIKA. Bi-objective optimization is performed by using the mean algebraic function of utility. The optimum value of the dentate height is defined at fixed values of the dentate angle and the dentate spacing. The derived model of the pressure losses and the optimum value of the dentate height are used as a basis for a more successful emitter design.

Keywords: drip irrigation, labyrinth channel hydrodynamics, numerical simulations, Reynolds stress model

Procedia PDF Downloads 123
18029 Combined Analysis of m⁶A and m⁵C Modulators on the Prognosis of Hepatocellular Carcinoma

Authors: Hongmeng Su, Luyu Zhao, Yanyan Qian, Hong Fan

Abstract:

Aim: Hepatocellular carcinoma (HCC) is one of the most common malignant tumors that endanger human health seriously. RNA methylation, especially N6-methyladenosine (m⁶A) and 5-methylcytosine (m⁵C), a crucial epigenetic transcriptional regulatory mechanism, plays an important role in tumorigenesis, progression and prognosis. This research aims to systematically evaluate the prognostic value of m⁶A and m⁵C modulators in HCC patients. Methods: Twenty-four modulators of m⁶A and m⁵C were candidates to analyze their expression level and their contribution to predict the prognosis of HCC. Consensus clustering analysis was applied to classify HCC patients. Cox and LASSO regression were used to construct the risk model. According to the risk score, HCC patients were divided into high-risk and low/medium-risk groups. The clinical pathology factors of HCC patients were analyzed by univariate and multivariate Cox regression analysis. Results: The HCC patients were classified into 2 clusters with significant differences in overall survival and clinical characteristics. Nine-gene risk model was constructed including METTL3, VIRMA, YTHDF1, YTHDF2, NOP2, NSUN4, NSUN5, DNMT3A and ALYREF. It was indicated that the risk score could serve as an independent prognostic factor for patients with HCC. Conclusion: This study constructed a Nine-gene risk model by modulators of m⁶A and m⁵C and investigated its effect on the clinical prognosis of HCC. This model may provide important consideration for the therapeutic strategy and prognosis evaluation analysis of patients with HCC.

Keywords: hepatocellular carcinoma, m⁶A, m⁵C, prognosis, RNA methylation

Procedia PDF Downloads 32
18028 Student Loan Debt among Students with Disabilities

Authors: Kaycee Bills

Abstract:

This study will determine if students with disabilities have higher student loan debt payments than other student populations. The hypothesis was that students with disabilities would have significantly higher student loan debt payments than other students due to the length of time they spend in school. Using the Bachelorette and Beyond Study Wave 2015/017 dataset, quantitative methods were employed. These data analysis methods included linear regression and a correlation matrix. Due to the exploratory nature of the study, the significance levels for the overall model and each variable were set at .05. The correlation matrix demonstrated that students with certain types of disabilities are more likely to fall under higher student loan payment brackets than students without disabilities. These results also varied among the different types of disabilities. The result of the overall linear regression model was statistically significant (p = .04). Despite the overall model being statistically significant, the majority of the significance values for the different types of disabilities were null. However, several other variables had statistically significant results, such as veterans, people of minority races, and people who attended private schools. Implications for how this impacts the economy, capitalism, and financial wellbeing of various students are discussed.

Keywords: disability, student loan debt, higher education, social work

Procedia PDF Downloads 145
18027 Analysis of Active Compounds in Thai Herbs by near Infrared Spectroscopy

Authors: Chaluntorn Vichasilp, Sutee Wangtueai

Abstract:

This study aims to develop a new method to detect active compounds in Thai herbs (1-deoxynojirimycin (DNJ) in mulberry leave, anthocyanin in Mao and curcumin in turmeric) using near infrared spectroscopy (NIRs). NIRs is non-destructive technique that rapid, non-chemical involved and low-cost determination. By NIRs and chemometrics technique, it was found that the DNJ prediction equation conducted with partial least square regression with cross-validation had low accuracy R2 (0.42) and SEP (31.87 mg/100g). On the other hand, the anthocyanin prediction equation showed moderate good results (R2 and SEP of 0.78 and 0.51 mg/g) with Multiplication scattering correction at wavelength of 2000-2200 nm. The high absorption could be observed at wavelength of 2047 nm and this model could be used as screening level. For curcumin prediction, the good result was obtained when applied original spectra with smoothing technique. The wavelength of 1400-2500 nm was created regression model with R2 (0.68) and SEP (0.17 mg/g). This model had high NIRs absorption at a wavelength of 1476, 1665, 1986 and 2395 nm, respectively. NIRs showed prospective technique for detection of some active compounds in Thai herbs.

Keywords: anthocyanin, curcumin, 1-deoxynojirimycin (DNJ), near infrared spectroscopy (NIRs)

Procedia PDF Downloads 349
18026 Derivation of Bathymetry from High-Resolution Satellite Images: Comparison of Empirical Methods through Geographical Error Analysis

Authors: Anusha P. Wijesundara, Dulap I. Rathnayake, Nihal D. Perera

Abstract:

Bathymetric information is fundamental importance to coastal and marine planning and management, nautical navigation, and scientific studies of marine environments. Satellite-derived bathymetry data provide detailed information in areas where conventional sounding data is lacking and conventional surveys are inaccessible. The two empirical approaches of log-linear bathymetric inversion model and non-linear bathymetric inversion model are applied for deriving bathymetry from high-resolution multispectral satellite imagery. This study compares these two approaches by means of geographical error analysis for the site Kankesanturai using WorldView-2 satellite imagery. Based on the Levenberg-Marquardt method calibrated the parameters of non-linear inversion model and the multiple-linear regression model was applied to calibrate the log-linear inversion model. In order to calibrate both models, Single Beam Echo Sounding (SBES) data in this study area were used as reference points. Residuals were calculated as the difference between the derived depth values and the validation echo sounder bathymetry data and the geographical distribution of model residuals was mapped. The spatial autocorrelation was calculated by comparing the performance of the bathymetric models and the results showing the geographic errors for both models. A spatial error model was constructed from the initial bathymetry estimates and the estimates of autocorrelation. This spatial error model is used to generate more reliable estimates of bathymetry by quantifying autocorrelation of model error and incorporating this into an improved regression model. Log-linear model (R²=0.846) performs better than the non- linear model (R²=0.692). Finally, the spatial error models improved bathymetric estimates derived from linear and non-linear models up to R²=0.854 and R²=0.704 respectively. The Root Mean Square Error (RMSE) was calculated for all reference points in various depth ranges. The magnitude of the prediction error increases with depth for both the log-linear and the non-linear inversion models. Overall RMSE for log-linear and the non-linear inversion models were ±1.532 m and ±2.089 m, respectively.

Keywords: log-linear model, multi spectral, residuals, spatial error model

Procedia PDF Downloads 272
18025 Integrated Nested Laplace Approximations For Quantile Regression

Authors: Kajingulu Malandala, Ranganai Edmore

Abstract:

The asymmetric Laplace distribution (ADL) is commonly used as the likelihood function of the Bayesian quantile regression, and it offers different families of likelihood method for quantile regression. Notwithstanding their popularity and practicality, ADL is not smooth and thus making it difficult to maximize its likelihood. Furthermore, Bayesian inference is time consuming and the selection of likelihood may mislead the inference, as the Bayes theorem does not automatically establish the posterior inference. Furthermore, ADL does not account for greater skewness and Kurtosis. This paper develops a new aspect of quantile regression approach for count data based on inverse of the cumulative density function of the Poisson, binomial and Delaporte distributions using the integrated nested Laplace Approximations. Our result validates the benefit of using the integrated nested Laplace Approximations and support the approach for count data.

Keywords: quantile regression, Delaporte distribution, count data, integrated nested Laplace approximation

Procedia PDF Downloads 132
18024 The Influence of Air Temperature Controls in Estimation of Air Temperature over Homogeneous Terrain

Authors: Fariza Yunus, Jasmee Jaafar, Zamalia Mahmud, Nurul Nisa’ Khairul Azmi, Nursalleh K. Chang, Nursalleh K. Chang

Abstract:

Variation of air temperature from one place to another is cause by air temperature controls. In general, the most important control of air temperature is elevation. Another significant independent variable in estimating air temperature is the location of meteorological stations. Distances to coastline and land use type are also contributed to significant variations in the air temperature. On the other hand, in homogeneous terrain direct interpolation of discrete points of air temperature work well to estimate air temperature values in un-sampled area. In this process the estimation is solely based on discrete points of air temperature. However, this study presents that air temperature controls also play significant roles in estimating air temperature over homogenous terrain of Peninsular Malaysia. An Inverse Distance Weighting (IDW) interpolation technique was adopted to generate continuous data of air temperature. This study compared two different datasets, observed mean monthly data of T, and estimation error of T–T’, where T’ estimated value from a multiple regression model. The multiple regression model considered eight independent variables of elevation, latitude, longitude, coastline, and four land use types of water bodies, forest, agriculture and build up areas, to represent the role of air temperature controls. Cross validation analysis was conducted to review accuracy of the estimation values. Final results show, estimation values of T–T’ produced lower errors for mean monthly mean air temperature over homogeneous terrain in Peninsular Malaysia.

Keywords: air temperature control, interpolation analysis, peninsular Malaysia, regression model, air temperature

Procedia PDF Downloads 348
18023 On the Performance of Improvised Generalized M-Estimator in the Presence of High Leverage Collinearity Enhancing Observations

Authors: Habshah Midi, Mohammed A. Mohammed, Sohel Rana

Abstract:

Multicollinearity occurs when two or more independent variables in a multiple linear regression model are highly correlated. The ridge regression is the commonly used method to rectify this problem. However, the ridge regression cannot handle the problem of multicollinearity which is caused by high leverage collinearity enhancing observation (HLCEO). Since high leverage points (HLPs) are responsible for inducing multicollinearity, the effect of HLPs needs to be reduced by using Generalized M estimator. The existing GM6 estimator is based on the Minimum Volume Ellipsoid (MVE) which tends to swamp some low leverage points. Hence an improvised GM (MGM) estimator is presented to improve the precision of the GM6 estimator. Numerical example and simulation study are presented to show how HLPs can cause multicollinearity. The numerical results show that our MGM estimator is the most efficient method compared to some existing methods.

Keywords: identification, high leverage points, multicollinearity, GM-estimator, DRGP, DFFITS

Procedia PDF Downloads 219
18022 Replicating Brain’s Resting State Functional Connectivity Network Using a Multi-Factor Hub-Based Model

Authors: B. L. Ho, L. Shi, D. F. Wang, V. C. T. Mok

Abstract:

The brain’s functional connectivity while temporally non-stationary does express consistency at a macro spatial level. The study of stable resting state connectivity patterns hence provides opportunities for identification of diseases if such stability is severely perturbed. A mathematical model replicating the brain’s spatial connections will be useful for understanding brain’s representative geometry and complements the empirical model where it falls short. Empirical computations tend to involve large matrices and become infeasible with fine parcellation. However, the proposed analytical model has no such computational problems. To improve replicability, 92 subject data are obtained from two open sources. The proposed methodology, inspired by financial theory, uses multivariate regression to find relationships of every cortical region of interest (ROI) with some pre-identified hubs. These hubs acted as representatives for the entire cortical surface. A variance-covariance framework of all ROIs is then built based on these relationships to link up all the ROIs. The result is a high level of match between model and empirical correlations in the range of 0.59 to 0.66 after adjusting for sample size; an increase of almost forty percent. More significantly, the model framework provides an intuitive way to delineate between systemic drivers and idiosyncratic noise while reducing dimensions by more than 30 folds, hence, providing a way to conduct attribution analysis. Due to its analytical nature and simple structure, the model is useful as a standalone toolkit for network dependency analysis or as a module for other mathematical models.

Keywords: functional magnetic resonance imaging, multivariate regression, network hubs, resting state functional connectivity

Procedia PDF Downloads 125
18021 Nowcasting Indonesian Economy

Authors: Ferry Kurniawan

Abstract:

In this paper, we nowcast quarterly output growth in Indonesia by exploiting higher frequency data (monthly indicators) using a mixed-frequency factor model and exploiting both quarterly and monthly data. Nowcasting quarterly GDP in Indonesia is particularly relevant for the central bank of Indonesia which set the policy rate in the monthly Board of Governors Meeting; whereby one of the important step is the assessment of the current state of the economy. Thus, having an accurate and up-to-date quarterly GDP nowcast every time new monthly information becomes available would clearly be of interest for central bank of Indonesia, for example, as the initial assessment of the current state of the economy -including nowcast- will be used as input for longer term forecast. We consider a small scale mixed-frequency factor model to produce nowcasts. In particular, we specify variables as year-on-year growth rates thus the relation between quarterly and monthly data is expressed in year-on-year growth rates. To assess the performance of the model, we compare the nowcasts with two other approaches: autoregressive model –which is often difficult when forecasting output growth- and Mixed Data Sampling (MIDAS) regression. In particular, both mixed frequency factor model and MIDAS nowcasts are produced by exploiting the same set of monthly indicators. Hence, we compare the nowcasts performance of the two approaches directly. To preview the results, we find that by exploiting monthly indicators using mixed-frequency factor model and MIDAS regression we improve the nowcast accuracy over a benchmark simple autoregressive model that uses only quarterly frequency data. However, it is not clear whether the MIDAS or mixed-frequency factor model is better. Neither set of nowcasts encompasses the other; suggesting that both nowcasts are valuable in nowcasting GDP but neither is sufficient. By combining the two individual nowcasts, we find that the nowcast combination not only increases the accuracy - relative to individual nowcasts- but also lowers the risk of the worst performance of the individual nowcasts.

Keywords: nowcasting, mixed-frequency data, factor model, nowcasts combination

Procedia PDF Downloads 305
18020 Short-Term Energy Efficiency Decay and Risk Analysis of Ground Source Heat Pump System

Authors: Tu Shuyang, Zhang Xu, Zhou Xiang

Abstract:

The objective of this paper is to investigate the effect of short-term heat exchange decay of ground heat exchanger (GHE) on the ground source heat pump (GSHP) energy efficiency and capacity. A resistance-capacitance (RC) model was developed and adopted to simulate the transient characteristics of the ground thermal condition and heat exchange. The capacity change of the GSHP was linked to the inlet and outlet water temperature by polynomial fitting according to measured parameters given by heat pump manufacturers. Thus, the model, which combined the heat exchange decay with the capacity change, reflected the energy efficiency decay of the whole system. A case of GSHP system was analyzed by the model, and the result showed that there was risk that the GSHP might not meet the load demand because of the efficiency decay in a short-term operation. The conclusion would provide some guidances for GSHP system design to overcome the risk.

Keywords: capacity, energy efficiency, GSHP, heat exchange

Procedia PDF Downloads 311
18019 Efficient Estimation for the Cox Proportional Hazards Cure Model

Authors: Khandoker Akib Mohammad

Abstract:

While analyzing time-to-event data, it is possible that a certain fraction of subjects will never experience the event of interest, and they are said to be cured. When this feature of survival models is taken into account, the models are commonly referred to as cure models. In the presence of covariates, the conditional survival function of the population can be modelled by using the cure model, which depends on the probability of being uncured (incidence) and the conditional survival function of the uncured subjects (latency), and a combination of logistic regression and Cox proportional hazards (PH) regression is used to model the incidence and latency respectively. In this paper, we have shown the asymptotic normality of the profile likelihood estimator via asymptotic expansion of the profile likelihood and obtain the explicit form of the variance estimator with an implicit function in the profile likelihood. We have also shown the efficient score function based on projection theory and the profile likelihood score function are equal. Our contribution in this paper is that we have expressed the efficient information matrix as the variance of the profile likelihood score function. A simulation study suggests that the estimated standard errors from bootstrap samples (SMCURE package) and the profile likelihood score function (our approach) are providing similar and comparable results. The numerical result of our proposed method is also shown by using the melanoma data from SMCURE R-package, and we compare the results with the output obtained from the SMCURE package.

Keywords: Cox PH model, cure model, efficient score function, EM algorithm, implicit function, profile likelihood

Procedia PDF Downloads 108
18018 Assessing the Impacts of Urbanization on Urban Precincts: A Case of Golconda Precinct, Hyderabad

Authors: Sai AKhila Budaraju

Abstract:

Heritage sites are an integral part of cities and carry a sense of identity to the cities/ towns, but the process of urbanization is a carrying potential threat for the loss of these heritage sites/monuments. Both Central and State Governments listed the historic Golconda fort as National Important Monument and the Heritage precinct with eight heritage-listed buildings and two historical sites respectively, for conservation and preservation, due to the presence of IT Corridor 6kms away accommodating more people in the precinct is under constant pressure. The heritage precinct possesses high property values, being a prime location connecting the IT corridor and CBD (central business district )areas. The primary objective of the study was to assess and identify the factors that are affecting the heritage precinct through Mapping and documentation, Identifying and assessing the factors through empirical analysis, Ordinal regression analysis and Hedonic Pricing Model. Ordinal regression analysis was used to identify the factors that contribute to the changes in the precinct due to urbanization. Hedonic Pricing Model was used to understand and establish a relation whether the presence of historical monuments is also a contributing factor to the property value and to what extent this influence can contribute. The above methods and field visit indicates the Physical, socio-economic factors and the neighborhood characteristics of the precinct contributing to the property values. The outturns and the potential elements derived from the analysis of the Development Control Rules were derived as recommendations to Integrate both Old and newly built environments.

Keywords: heritage planning, heritage conservation, hedonic pricing model, ordinal regression analysis

Procedia PDF Downloads 162
18017 Estimation of Missing Values in Aggregate Level Spatial Data

Authors: Amitha Puranik, V. S. Binu, Seena Biju

Abstract:

Missing data is a common problem in spatial analysis especially at the aggregate level. Missing can either occur in covariate or in response variable or in both in a given location. Many missing data techniques are available to estimate the missing data values but not all of these methods can be applied on spatial data since the data are autocorrelated. Hence there is a need to develop a method that estimates the missing values in both response variable and covariates in spatial data by taking account of the spatial autocorrelation. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) Spatial autocorrelation of the response variable (b) Spatial autocorrelation of covariates and (c) Correlation between covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly account for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates and between a response variable and covariates. The precise estimation of the missing data points in spatial data will result in an increased precision of the estimated effects of independent variables on the response variable in spatial regression analysis.

Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis

Procedia PDF Downloads 343
18016 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models

Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti

Abstract:

In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.

Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics

Procedia PDF Downloads 9
18015 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization for statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers or leverage points. We consider a new robust estimation which is composed of the weighted loss function of the pairwise difference of residuals and the adaptive penalty function regulating the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is robust to leverage points of the predictor variable. Furthermore, the adaptive penalty function gives us good statistical properties in variable selection such as oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in program R. We used an optimal tuning parameter based on the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real data set and contaminated data.

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression

Procedia PDF Downloads 429
18014 Monte Carlo Estimation of Heteroscedasticity and Periodicity Effects in a Panel Data Regression Model

Authors: Nureni O. Adeboye, Dawud A. Agunbiade

Abstract:

This research attempts to investigate the effects of heteroscedasticity and periodicity in a Panel Data Regression Model (PDRM) by extending previous works on balanced panel data estimation within the context of fitting PDRM for Banks audit fee. The estimation of such model was achieved through the derivation of Joint Lagrange Multiplier (LM) test for homoscedasticity and zero-serial correlation, a conditional LM test for zero serial correlation given heteroscedasticity of varying degrees as well as conditional LM test for homoscedasticity given first order positive serial correlation via a two-way error component model. Monte Carlo simulations were carried out for 81 different variations, of which its design assumed a uniform distribution under a linear heteroscedasticity function. Each of the variation was iterated 1000 times and the assessment of the three estimators considered are based on Variance, Absolute bias (ABIAS), Mean square error (MSE) and the Root Mean Square (RMSE) of parameters estimates. Eighteen different models at different specified conditions were fitted, and the best-fitted model is that of within estimator when heteroscedasticity is severe at either zero or positive serial correlation value. LM test results showed that the tests have good size and power as all the three tests are significant at 5% for the specified linear form of heteroscedasticity function which established the facts that Banks operations are severely heteroscedastic in nature with little or no periodicity effects.

Keywords: audit fee lagrange multiplier test, heteroscedasticity, lagrange multiplier test, Monte-Carlo scheme, periodicity

Procedia PDF Downloads 113
18013 An Efficient Algorithm of Time Step Control for Error Correction Method

Authors: Youngji Lee, Yonghyeon Jeon, Sunyoung Bu, Philsu Kim

Abstract:

The aim of this paper is to construct an algorithm of time step control for the error correction method most recently developed by one of the authors for solving stiff initial value problems. It is achieved with the generalized Chebyshev polynomial and the corresponding error correction method. The main idea of the proposed scheme is in the usage of the duplicated node points in the generalized Chebyshev polynomials of two different degrees by adding necessary sample points instead of re-sampling all points. At each integration step, the proposed method is comprised of two equations for the solution and the error, respectively. The constructed algorithm controls both the error and the time step size simultaneously and possesses a good performance in the computational cost compared to the original method. Two stiff problems are numerically solved to assess the effectiveness of the proposed scheme.

Keywords: stiff initial value problem, error correction method, generalized Chebyshev polynomial, node points

Procedia PDF Downloads 532
18012 Meta Model for Optimum Design Objective Function of Steel Frames Subjected to Seismic Loads

Authors: Salah R. Al Zaidee, Ali S. Mahdi

Abstract:

Except for simple problems of statically determinate structures, optimum design problems in structural engineering have implicit objective functions where structural analysis and design are essential within each searching loop. With these implicit functions, the structural engineer is usually enforced to write his/her own computer code for analysis, design, and searching for optimum design among many feasible candidates and cannot take advantage of available software for structural analysis, design, and searching for the optimum solution. The meta-model is a regression model used to transform an implicit objective function into objective one and leads in turn to decouple the structural analysis and design processes from the optimum searching process. With the meta-model, well-known software for structural analysis and design can be used in sequence with optimum searching software. In this paper, the meta-model has been used to develop an explicit objective function for plane steel frames subjected to dead, live, and seismic forces. Frame topology is assumed as predefined based on architectural and functional requirements. Columns and beams sections and different connections details are the main design variables in this study. Columns and beams are grouped to reduce the number of design variables and to make the problem similar to that adopted in engineering practice. Data for the implicit objective function have been generated based on analysis and assessment for many design proposals with CSI SAP software. These data have been used later in SPSS software to develop a pure quadratic nonlinear regression model for the explicit objective function. Good correlations with a coefficient, R2, in the range from 0.88 to 0.99 have been noted between the original implicit functions and the corresponding explicit functions generated with meta-model.

Keywords: meta-modal, objective function, steel frames, seismic analysis, design

Procedia PDF Downloads 216
18011 Phase II Monitoring of First-Order Autocorrelated General Linear Profiles

Authors: Yihua Wang, Yunru Lai

Abstract:

Statistical process control has been successfully applied in a variety of industries. In some applications, the quality of a process or product is better characterized and summarized by a functional relationship between a response variable and one or more explanatory variables. A collection of this type of data is called a profile. Profile monitoring is used to understand and check the stability of this relationship or curve over time. The independent assumption for the error term is commonly used in the existing profile monitoring studies. However, in many applications, the profile data show correlations over time. Therefore, we focus on a general linear regression model with a first-order autocorrelation between profiles in this study. We propose an exponentially weighted moving average charting scheme to monitor this type of profile. The simulation study shows that our proposed methods outperform the existing schemes based on the average run length criterion.

Keywords: autocorrelation, EWMA control chart, general linear regression model, profile monitoring

Procedia PDF Downloads 433
18010 Breast Cancer Detection Using Machine Learning Algorithms

Authors: Jiwan Kumar, Pooja, Sandeep Negi, Anjum Rouf, Amit Kumar, Naveen Lakra

Abstract:

In modern times where, health issues are increasing day by day, breast cancer is also one of them, which is very crucial and really important to find in the early stages. Doctors can use this model in order to tell their patients whether a cancer is not harmful (benign) or harmful (malignant). We have used the knowledge of machine learning in order to produce the model. we have used algorithms like Logistic Regression, Random forest, support Vector Classifier, Bayesian Network and Radial Basis Function. We tried to use the data of crucial parts and show them the results in pictures in order to make it easier for doctors. By doing this, we're making ML better at finding breast cancer, which can lead to saving more lives and better health care.

Keywords: Bayesian network, radial basis function, ensemble learning, understandable, data making better, random forest, logistic regression, breast cancer

Procedia PDF Downloads 5
18009 Mathematical Modeling of Carotenoids and Polyphenols Content of Faba Beans (Vicia faba L.) during Microwave Treatments

Authors: Ridha Fethi Mechlouch, Ahlem Ayadi, Ammar Ben Brahim

Abstract:

Given the importance of the preservation of polyphenols and carotenoids during thermal processing, we attempted in this study to investigate the variation of these two parameters in faba beans during microwave treatment using different power densities (1; 2; and 3W/g), then to perform a mathematical modeling by using non-linear regression analysis to evaluate the models constants. The variation of the carotenoids and polyphenols ratio of faba beans and the models are tested to validate the experimental results. Exponential models were found to be suitable to describe the variation of caratenoid ratio (R²= 0.945, 0.927 and 0.946) for power densities (1; 2; and 3W/g) respectively, and polyphenol ratio (R²= 0.931, 0.989 and 0.982) for power densities (1; 2; and 3W/g) respectively. The effect of microwave power density Pd(W/g) on the coefficient k of models were also investigated. The coefficient is highly correlated (R² = 1) and can be expressed as a polynomial function.

Keywords: microwave treatment, power density, carotenoid, polyphenol, modeling

Procedia PDF Downloads 225
18008 MapReduce Logistic Regression Algorithms with RHadoop

Authors: Byung Ho Jung, Dong Hoon Lim

Abstract:

Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in the logistic regression based on MapReduce framework with RHadoop that integrates R and Hadoop environment applicable to large scale data. There exist three learning algorithms for logistic regression, namely Gradient descent method, Cost minimization method and Newton-Rhapson's method. The Newton-Rhapson's method does not require a learning rate, while gradient descent and cost minimization methods need to manually pick a learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Rhapson's method with gradient descent and cost minimization methods. The results showed that our newton's method appeared to be the most robust to all data tested.

Keywords: big data, logistic regression, MapReduce, RHadoop

Procedia PDF Downloads 244
18007 The Impact of Public Open Space System on Housing Price in Chicago

Authors: Si Chen, Le Zhang, Xian He

Abstract:

The research explored the influences of public open space system on housing price through hedonic models, in order to support better open space plans and economic policies. We have three initial hypotheses: 1) public open space system has an overall positive influence on surrounding housing prices. 2) Different public open space types have different levels of influence on motivating surrounding housing prices. 3) Walking and driving accessibilities from property to public open spaces have different statistical relation with housing prices. Cook County, Illinois, was chosen to be a study area since data availability, sufficient open space types, and long-term open space preservation strategies. We considered the housing attributes, driving and walking accessibility scores from houses to nearby public open spaces, and driving accessibility scores to hospitals as influential features and used real housing sales price in 2010 as a dependent variable in the built hedonic model. Through ordinary least squares (OLS) regression analysis, General Moran’s I analysis and geographically weighted regression analysis, we observed the statistical relations between public open spaces and housing sale prices in the three built hedonic models and confirmed all three hypotheses.

Keywords: hedonic model, public open space, housing sale price, regression analysis, accessibility score

Procedia PDF Downloads 98