Search results for: Cox regression model
18601 Analysis of Active Compounds in Thai Herbs by near Infrared Spectroscopy
Authors: Chaluntorn Vichasilp, Sutee Wangtueai
Abstract:
This study aims to develop a new method to detect active compounds in Thai herbs (1-deoxynojirimycin (DNJ) in mulberry leave, anthocyanin in Mao and curcumin in turmeric) using near infrared spectroscopy (NIRs). NIRs is non-destructive technique that rapid, non-chemical involved and low-cost determination. By NIRs and chemometrics technique, it was found that the DNJ prediction equation conducted with partial least square regression with cross-validation had low accuracy R2 (0.42) and SEP (31.87 mg/100g). On the other hand, the anthocyanin prediction equation showed moderate good results (R2 and SEP of 0.78 and 0.51 mg/g) with Multiplication scattering correction at wavelength of 2000-2200 nm. The high absorption could be observed at wavelength of 2047 nm and this model could be used as screening level. For curcumin prediction, the good result was obtained when applied original spectra with smoothing technique. The wavelength of 1400-2500 nm was created regression model with R2 (0.68) and SEP (0.17 mg/g). This model had high NIRs absorption at a wavelength of 1476, 1665, 1986 and 2395 nm, respectively. NIRs showed prospective technique for detection of some active compounds in Thai herbs.Keywords: anthocyanin, curcumin, 1-deoxynojirimycin (DNJ), near infrared spectroscopy (NIRs)
Procedia PDF Downloads 38218600 MapReduce Logistic Regression Algorithms with RHadoop
Authors: Byung Ho Jung, Dong Hoon Lim
Abstract:
Logistic regression is a statistical method for analyzing a dataset in which there are one or more independent variables that determine an outcome. Logistic regression is used extensively in numerous disciplines, including the medical and social science fields. In this paper, we address the problem of estimating parameters in the logistic regression based on MapReduce framework with RHadoop that integrates R and Hadoop environment applicable to large scale data. There exist three learning algorithms for logistic regression, namely Gradient descent method, Cost minimization method and Newton-Rhapson's method. The Newton-Rhapson's method does not require a learning rate, while gradient descent and cost minimization methods need to manually pick a learning rate. The experimental results demonstrated that our learning algorithms using RHadoop can scale well and efficiently process large data sets on commodity hardware. We also compared the performance of our Newton-Rhapson's method with gradient descent and cost minimization methods. The results showed that our newton's method appeared to be the most robust to all data tested.Keywords: big data, logistic regression, MapReduce, RHadoop
Procedia PDF Downloads 28518599 Method of Parameter Calibration for Error Term in Stochastic User Equilibrium Traffic Assignment Model
Authors: Xiang Zhang, David Rey, S. Travis Waller
Abstract:
Stochastic User Equilibrium (SUE) model is a widely used traffic assignment model in transportation planning, which is regarded more advanced than Deterministic User Equilibrium (DUE) model. However, a problem exists that the performance of the SUE model depends on its error term parameter. The objective of this paper is to propose a systematic method of determining the appropriate error term parameter value for the SUE model. First, the significance of the parameter is explored through a numerical example. Second, the parameter calibration method is developed based on the Logit-based route choice model. The calibration process is realized through multiple nonlinear regression, using sequential quadratic programming combined with least square method. Finally, case analysis is conducted to demonstrate the application of the calibration process and validate the better performance of the SUE model calibrated by the proposed method compared to the SUE models under other parameter values and the DUE model.Keywords: parameter calibration, sequential quadratic programming, stochastic user equilibrium, traffic assignment, transportation planning
Procedia PDF Downloads 29918598 Derivation of Bathymetry from High-Resolution Satellite Images: Comparison of Empirical Methods through Geographical Error Analysis
Authors: Anusha P. Wijesundara, Dulap I. Rathnayake, Nihal D. Perera
Abstract:
Bathymetric information is fundamental importance to coastal and marine planning and management, nautical navigation, and scientific studies of marine environments. Satellite-derived bathymetry data provide detailed information in areas where conventional sounding data is lacking and conventional surveys are inaccessible. The two empirical approaches of log-linear bathymetric inversion model and non-linear bathymetric inversion model are applied for deriving bathymetry from high-resolution multispectral satellite imagery. This study compares these two approaches by means of geographical error analysis for the site Kankesanturai using WorldView-2 satellite imagery. Based on the Levenberg-Marquardt method calibrated the parameters of non-linear inversion model and the multiple-linear regression model was applied to calibrate the log-linear inversion model. In order to calibrate both models, Single Beam Echo Sounding (SBES) data in this study area were used as reference points. Residuals were calculated as the difference between the derived depth values and the validation echo sounder bathymetry data and the geographical distribution of model residuals was mapped. The spatial autocorrelation was calculated by comparing the performance of the bathymetric models and the results showing the geographic errors for both models. A spatial error model was constructed from the initial bathymetry estimates and the estimates of autocorrelation. This spatial error model is used to generate more reliable estimates of bathymetry by quantifying autocorrelation of model error and incorporating this into an improved regression model. Log-linear model (R²=0.846) performs better than the non- linear model (R²=0.692). Finally, the spatial error models improved bathymetric estimates derived from linear and non-linear models up to R²=0.854 and R²=0.704 respectively. The Root Mean Square Error (RMSE) was calculated for all reference points in various depth ranges. The magnitude of the prediction error increases with depth for both the log-linear and the non-linear inversion models. Overall RMSE for log-linear and the non-linear inversion models were ±1.532 m and ±2.089 m, respectively.Keywords: log-linear model, multi spectral, residuals, spatial error model
Procedia PDF Downloads 29718597 Interference among Lambsquarters and Oil Rapeseed Cultivars
Authors: Reza Siyami, Bahram Mirshekari
Abstract:
Seed and oil yield of rapeseed is considerably affected by weeds interference including mustard (Sinapis arvensis L.), lambsquarters (Chenopodium album L.) and redroot pigweed (Amaranthus retroflexus L.) throughout the East Azerbaijan province in Iran. To formulate the relationship between four independent growth variables measured in our experiment with a dependent variable, multiple regression analysis was carried out for the weed leaves number per plant (X1), green cover percentage (X2), LAI (X3) and leaf area per plant (X4) as independent variables and rapeseed oil yield as a dependent variable. The multiple regression equation is shown as follows: Seed essential oil yield (kg/ha) = 0.156 + 0.0325 (X1) + 0.0489 (X2) + 0.0415 (X3) + 0.133 (X4). Furthermore, the stepwise regression analysis was also carried out for the data obtained to test the significance of the independent variables affecting the oil yield as a dependent variable. The resulted stepwise regression equation is shown as follows: Oil yield = 4.42 + 0.0841 (X2) + 0.0801 (X3); R2 = 81.5. The stepwise regression analysis verified that the green cover percentage and LAI of weed had a marked increasing effect on the oil yield of rapeseed.Keywords: green cover percentage, independent variable, interference, regression
Procedia PDF Downloads 42018596 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models
Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti
Abstract:
In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics
Procedia PDF Downloads 5318595 Assessing the Impacts of Urbanization on Urban Precincts: A Case of Golconda Precinct, Hyderabad
Authors: Sai AKhila Budaraju
Abstract:
Heritage sites are an integral part of cities and carry a sense of identity to the cities/ towns, but the process of urbanization is a carrying potential threat for the loss of these heritage sites/monuments. Both Central and State Governments listed the historic Golconda fort as National Important Monument and the Heritage precinct with eight heritage-listed buildings and two historical sites respectively, for conservation and preservation, due to the presence of IT Corridor 6kms away accommodating more people in the precinct is under constant pressure. The heritage precinct possesses high property values, being a prime location connecting the IT corridor and CBD (central business district )areas. The primary objective of the study was to assess and identify the factors that are affecting the heritage precinct through Mapping and documentation, Identifying and assessing the factors through empirical analysis, Ordinal regression analysis and Hedonic Pricing Model. Ordinal regression analysis was used to identify the factors that contribute to the changes in the precinct due to urbanization. Hedonic Pricing Model was used to understand and establish a relation whether the presence of historical monuments is also a contributing factor to the property value and to what extent this influence can contribute. The above methods and field visit indicates the Physical, socio-economic factors and the neighborhood characteristics of the precinct contributing to the property values. The outturns and the potential elements derived from the analysis of the Development Control Rules were derived as recommendations to Integrate both Old and newly built environments.Keywords: heritage planning, heritage conservation, hedonic pricing model, ordinal regression analysis
Procedia PDF Downloads 19318594 Non-Linear Regression Modeling for Composite Distributions
Authors: Mostafa Aminzadeh, Min Deng
Abstract:
Modeling loss data is an important part of actuarial science. Actuaries use models to predict future losses and manage financial risk, which can be beneficial for marketing purposes. In the insurance industry, small claims happen frequently while large claims are rare. Traditional distributions such as Normal, Exponential, and inverse-Gaussian are not suitable for describing insurance data, which often show skewness and fat tails. Several authors have studied classical and Bayesian inference for parameters of composite distributions, such as Exponential-Pareto, Weibull-Pareto, and Inverse Gamma-Pareto. These models separate small to moderate losses from large losses using a threshold parameter. This research introduces a computational approach using a nonlinear regression model for loss data that relies on multiple predictors. Simulation studies were conducted to assess the accuracy of the proposed estimation method. The simulations confirmed that the proposed method provides precise estimates for regression parameters. It's important to note that this approach can be applied to datasets if goodness-of-fit tests confirm that the composite distribution under study fits the data well. To demonstrate the computations, a real data set from the insurance industry is analyzed. A Mathematica code uses the Fisher information algorithm as an iteration method to obtain the maximum likelihood estimation (MLE) of regression parameters.Keywords: maximum likelihood estimation, fisher scoring method, non-linear regression models, composite distributions
Procedia PDF Downloads 3418593 Replicating Brain’s Resting State Functional Connectivity Network Using a Multi-Factor Hub-Based Model
Authors: B. L. Ho, L. Shi, D. F. Wang, V. C. T. Mok
Abstract:
The brain’s functional connectivity while temporally non-stationary does express consistency at a macro spatial level. The study of stable resting state connectivity patterns hence provides opportunities for identification of diseases if such stability is severely perturbed. A mathematical model replicating the brain’s spatial connections will be useful for understanding brain’s representative geometry and complements the empirical model where it falls short. Empirical computations tend to involve large matrices and become infeasible with fine parcellation. However, the proposed analytical model has no such computational problems. To improve replicability, 92 subject data are obtained from two open sources. The proposed methodology, inspired by financial theory, uses multivariate regression to find relationships of every cortical region of interest (ROI) with some pre-identified hubs. These hubs acted as representatives for the entire cortical surface. A variance-covariance framework of all ROIs is then built based on these relationships to link up all the ROIs. The result is a high level of match between model and empirical correlations in the range of 0.59 to 0.66 after adjusting for sample size; an increase of almost forty percent. More significantly, the model framework provides an intuitive way to delineate between systemic drivers and idiosyncratic noise while reducing dimensions by more than 30 folds, hence, providing a way to conduct attribution analysis. Due to its analytical nature and simple structure, the model is useful as a standalone toolkit for network dependency analysis or as a module for other mathematical models.Keywords: functional magnetic resonance imaging, multivariate regression, network hubs, resting state functional connectivity
Procedia PDF Downloads 15118592 Estimation of Missing Values in Aggregate Level Spatial Data
Authors: Amitha Puranik, V. S. Binu, Seena Biju
Abstract:
Missing data is a common problem in spatial analysis especially at the aggregate level. Missing can either occur in covariate or in response variable or in both in a given location. Many missing data techniques are available to estimate the missing data values but not all of these methods can be applied on spatial data since the data are autocorrelated. Hence there is a need to develop a method that estimates the missing values in both response variable and covariates in spatial data by taking account of the spatial autocorrelation. The present study aims to develop a model to estimate the missing data points at the aggregate level in spatial data by accounting for (a) Spatial autocorrelation of the response variable (b) Spatial autocorrelation of covariates and (c) Correlation between covariates and the response variable. Estimating the missing values of spatial data requires a model that explicitly account for the spatial autocorrelation. The proposed model not only accounts for spatial autocorrelation but also utilizes the correlation that exists between covariates, within covariates and between a response variable and covariates. The precise estimation of the missing data points in spatial data will result in an increased precision of the estimated effects of independent variables on the response variable in spatial regression analysis.Keywords: spatial regression, missing data estimation, spatial autocorrelation, simulation analysis
Procedia PDF Downloads 38218591 Nowcasting Indonesian Economy
Authors: Ferry Kurniawan
Abstract:
In this paper, we nowcast quarterly output growth in Indonesia by exploiting higher frequency data (monthly indicators) using a mixed-frequency factor model and exploiting both quarterly and monthly data. Nowcasting quarterly GDP in Indonesia is particularly relevant for the central bank of Indonesia which set the policy rate in the monthly Board of Governors Meeting; whereby one of the important step is the assessment of the current state of the economy. Thus, having an accurate and up-to-date quarterly GDP nowcast every time new monthly information becomes available would clearly be of interest for central bank of Indonesia, for example, as the initial assessment of the current state of the economy -including nowcast- will be used as input for longer term forecast. We consider a small scale mixed-frequency factor model to produce nowcasts. In particular, we specify variables as year-on-year growth rates thus the relation between quarterly and monthly data is expressed in year-on-year growth rates. To assess the performance of the model, we compare the nowcasts with two other approaches: autoregressive model –which is often difficult when forecasting output growth- and Mixed Data Sampling (MIDAS) regression. In particular, both mixed frequency factor model and MIDAS nowcasts are produced by exploiting the same set of monthly indicators. Hence, we compare the nowcasts performance of the two approaches directly. To preview the results, we find that by exploiting monthly indicators using mixed-frequency factor model and MIDAS regression we improve the nowcast accuracy over a benchmark simple autoregressive model that uses only quarterly frequency data. However, it is not clear whether the MIDAS or mixed-frequency factor model is better. Neither set of nowcasts encompasses the other; suggesting that both nowcasts are valuable in nowcasting GDP but neither is sufficient. By combining the two individual nowcasts, we find that the nowcast combination not only increases the accuracy - relative to individual nowcasts- but also lowers the risk of the worst performance of the individual nowcasts.Keywords: nowcasting, mixed-frequency data, factor model, nowcasts combination
Procedia PDF Downloads 33118590 Efficient Estimation for the Cox Proportional Hazards Cure Model
Authors: Khandoker Akib Mohammad
Abstract:
While analyzing time-to-event data, it is possible that a certain fraction of subjects will never experience the event of interest, and they are said to be cured. When this feature of survival models is taken into account, the models are commonly referred to as cure models. In the presence of covariates, the conditional survival function of the population can be modelled by using the cure model, which depends on the probability of being uncured (incidence) and the conditional survival function of the uncured subjects (latency), and a combination of logistic regression and Cox proportional hazards (PH) regression is used to model the incidence and latency respectively. In this paper, we have shown the asymptotic normality of the profile likelihood estimator via asymptotic expansion of the profile likelihood and obtain the explicit form of the variance estimator with an implicit function in the profile likelihood. We have also shown the efficient score function based on projection theory and the profile likelihood score function are equal. Our contribution in this paper is that we have expressed the efficient information matrix as the variance of the profile likelihood score function. A simulation study suggests that the estimated standard errors from bootstrap samples (SMCURE package) and the profile likelihood score function (our approach) are providing similar and comparable results. The numerical result of our proposed method is also shown by using the melanoma data from SMCURE R-package, and we compare the results with the output obtained from the SMCURE package.Keywords: Cox PH model, cure model, efficient score function, EM algorithm, implicit function, profile likelihood
Procedia PDF Downloads 14418589 Phase II Monitoring of First-Order Autocorrelated General Linear Profiles
Authors: Yihua Wang, Yunru Lai
Abstract:
Statistical process control has been successfully applied in a variety of industries. In some applications, the quality of a process or product is better characterized and summarized by a functional relationship between a response variable and one or more explanatory variables. A collection of this type of data is called a profile. Profile monitoring is used to understand and check the stability of this relationship or curve over time. The independent assumption for the error term is commonly used in the existing profile monitoring studies. However, in many applications, the profile data show correlations over time. Therefore, we focus on a general linear regression model with a first-order autocorrelation between profiles in this study. We propose an exponentially weighted moving average charting scheme to monitor this type of profile. The simulation study shows that our proposed methods outperform the existing schemes based on the average run length criterion.Keywords: autocorrelation, EWMA control chart, general linear regression model, profile monitoring
Procedia PDF Downloads 46018588 Breast Cancer Detection Using Machine Learning Algorithms
Authors: Jiwan Kumar, Pooja, Sandeep Negi, Anjum Rouf, Amit Kumar, Naveen Lakra
Abstract:
In modern times where, health issues are increasing day by day, breast cancer is also one of them, which is very crucial and really important to find in the early stages. Doctors can use this model in order to tell their patients whether a cancer is not harmful (benign) or harmful (malignant). We have used the knowledge of machine learning in order to produce the model. we have used algorithms like Logistic Regression, Random forest, support Vector Classifier, Bayesian Network and Radial Basis Function. We tried to use the data of crucial parts and show them the results in pictures in order to make it easier for doctors. By doing this, we're making ML better at finding breast cancer, which can lead to saving more lives and better health care.Keywords: Bayesian network, radial basis function, ensemble learning, understandable, data making better, random forest, logistic regression, breast cancer
Procedia PDF Downloads 5318587 Monte Carlo Estimation of Heteroscedasticity and Periodicity Effects in a Panel Data Regression Model
Authors: Nureni O. Adeboye, Dawud A. Agunbiade
Abstract:
This research attempts to investigate the effects of heteroscedasticity and periodicity in a Panel Data Regression Model (PDRM) by extending previous works on balanced panel data estimation within the context of fitting PDRM for Banks audit fee. The estimation of such model was achieved through the derivation of Joint Lagrange Multiplier (LM) test for homoscedasticity and zero-serial correlation, a conditional LM test for zero serial correlation given heteroscedasticity of varying degrees as well as conditional LM test for homoscedasticity given first order positive serial correlation via a two-way error component model. Monte Carlo simulations were carried out for 81 different variations, of which its design assumed a uniform distribution under a linear heteroscedasticity function. Each of the variation was iterated 1000 times and the assessment of the three estimators considered are based on Variance, Absolute bias (ABIAS), Mean square error (MSE) and the Root Mean Square (RMSE) of parameters estimates. Eighteen different models at different specified conditions were fitted, and the best-fitted model is that of within estimator when heteroscedasticity is severe at either zero or positive serial correlation value. LM test results showed that the tests have good size and power as all the three tests are significant at 5% for the specified linear form of heteroscedasticity function which established the facts that Banks operations are severely heteroscedastic in nature with little or no periodicity effects.Keywords: audit fee lagrange multiplier test, heteroscedasticity, lagrange multiplier test, Monte-Carlo scheme, periodicity
Procedia PDF Downloads 14118586 The Impact of Public Open Space System on Housing Price in Chicago
Authors: Si Chen, Le Zhang, Xian He
Abstract:
The research explored the influences of public open space system on housing price through hedonic models, in order to support better open space plans and economic policies. We have three initial hypotheses: 1) public open space system has an overall positive influence on surrounding housing prices. 2) Different public open space types have different levels of influence on motivating surrounding housing prices. 3) Walking and driving accessibilities from property to public open spaces have different statistical relation with housing prices. Cook County, Illinois, was chosen to be a study area since data availability, sufficient open space types, and long-term open space preservation strategies. We considered the housing attributes, driving and walking accessibility scores from houses to nearby public open spaces, and driving accessibility scores to hospitals as influential features and used real housing sales price in 2010 as a dependent variable in the built hedonic model. Through ordinary least squares (OLS) regression analysis, General Moran’s I analysis and geographically weighted regression analysis, we observed the statistical relations between public open spaces and housing sale prices in the three built hedonic models and confirmed all three hypotheses.Keywords: hedonic model, public open space, housing sale price, regression analysis, accessibility score
Procedia PDF Downloads 13318585 Integrating Machine Learning and Rule-Based Decision Models for Enhanced B2B Sales Forecasting and Customer Prioritization
Authors: Wenqi Liu, Reginald Bailey
Abstract:
This study proposes a comprehensive and effective approach to business-to-business (B2B) sales forecasting by integrating advanced machine learning models with a rule-based decision-making framework. The methodology addresses the critical challenge of optimizing sales pipeline performance and improving conversion rates through predictive analytics and actionable insights. The first component involves developing a classification model to predict the likelihood of conversion, aiming to outperform traditional methods such as logistic regression in terms of accuracy, precision, recall, and F1 score. Feature importance analysis highlights key predictive factors, such as client revenue size and sales velocity, providing valuable insights into conversion dynamics. The second component focuses on forecasting sales value using a regression model, designed to achieve superior performance compared to linear regression by minimizing mean absolute error (MAE), mean squared error (MSE), and maximizing R-squared metrics. The regression analysis identifies primary drivers of sales value, further informing data-driven strategies. To bridge the gap between predictive modeling and actionable outcomes, a rule-based decision framework is introduced. This model categorizes leads into high, medium, and low priorities based on thresholds for conversion probability and predicted sales value. By combining classification and regression outputs, this framework enables sales teams to allocate resources effectively, focus on high-value opportunities, and streamline lead management processes. The integrated approach significantly enhances lead prioritization, increases conversion rates, and drives revenue generation, offering a robust solution to the declining pipeline conversion rates faced by many B2B organizations. Our findings demonstrate the practical benefits of blending machine learning with decision-making frameworks, providing a scalable, data-driven solution for strategic sales optimization. This study underscores the potential of predictive analytics to transform B2B sales operations, enabling more informed decision-making and improved organizational outcomes in competitive markets.Keywords: machine learning, XGBoost, regression, decision making framework, system engineering
Procedia PDF Downloads 1718584 Meta Model for Optimum Design Objective Function of Steel Frames Subjected to Seismic Loads
Authors: Salah R. Al Zaidee, Ali S. Mahdi
Abstract:
Except for simple problems of statically determinate structures, optimum design problems in structural engineering have implicit objective functions where structural analysis and design are essential within each searching loop. With these implicit functions, the structural engineer is usually enforced to write his/her own computer code for analysis, design, and searching for optimum design among many feasible candidates and cannot take advantage of available software for structural analysis, design, and searching for the optimum solution. The meta-model is a regression model used to transform an implicit objective function into objective one and leads in turn to decouple the structural analysis and design processes from the optimum searching process. With the meta-model, well-known software for structural analysis and design can be used in sequence with optimum searching software. In this paper, the meta-model has been used to develop an explicit objective function for plane steel frames subjected to dead, live, and seismic forces. Frame topology is assumed as predefined based on architectural and functional requirements. Columns and beams sections and different connections details are the main design variables in this study. Columns and beams are grouped to reduce the number of design variables and to make the problem similar to that adopted in engineering practice. Data for the implicit objective function have been generated based on analysis and assessment for many design proposals with CSI SAP software. These data have been used later in SPSS software to develop a pure quadratic nonlinear regression model for the explicit objective function. Good correlations with a coefficient, R2, in the range from 0.88 to 0.99 have been noted between the original implicit functions and the corresponding explicit functions generated with meta-model.Keywords: meta-modal, objective function, steel frames, seismic analysis, design
Procedia PDF Downloads 24318583 A Case Comparative Study of Infant Mortality Rate in North-West Nigeria
Authors: G. I. Onwuka, A. Danbaba, S. U. Gulumbe
Abstract:
This study investigated of Infant Mortality Rate as observed at a general hospital in Kaduna-South, Kaduna State, North West Nigeria. The causes of infant Mortality were examined. The data used for this analysis were collected at the statistics unit of the Hospital. The analysis was carried out on the data using Multiple Linear regression Technique and this showed that there is linear relationship between the dependent variable (death) and the independent variables (malaria, measles, anaemia, and coronary heart disease). The resultant model also revealed that a unit increment in each of these diseases would result to a unit increment in death recorded, 98.7% of the total variation in mortality is explained by the given model. The highest number of mortality was recorded in July, 2005 and the lowest mortality recorded in October, 2009.Recommendations were however made based on the results of the study.Keywords: infant mortality rate, multiple linear regression, diseases, serial correlation
Procedia PDF Downloads 33118582 Image Compression Based on Regression SVM and Biorthogonal Wavelets
Authors: Zikiou Nadia, Lahdir Mourad, Ameur Soltane
Abstract:
In this paper, we propose an effective method for image compression based on SVM Regression (SVR), with three different kernels, and biorthogonal 2D Discrete Wavelet Transform. SVM regression could learn dependency from training data and compressed using fewer training points (support vectors) to represent the original data and eliminate the redundancy. Biorthogonal wavelet has been used to transform the image and the coefficients acquired are then trained with different kernels SVM (Gaussian, Polynomial, and Linear). Run-length and Arithmetic coders are used to encode the support vectors and its corresponding weights, obtained from the SVM regression. The peak signal noise ratio (PSNR) and their compression ratios of several test images, compressed with our algorithm, with different kernels are presented. Compared with other kernels, Gaussian kernel achieves better image quality. Experimental results show that the compression performance of our method gains much improvement.Keywords: image compression, 2D discrete wavelet transform (DWT-2D), support vector regression (SVR), SVM Kernels, run-length, arithmetic coding
Procedia PDF Downloads 38218581 Artificial Neural Network and Statistical Method
Authors: Tomas Berhanu Bekele
Abstract:
Traffic congestion is one of the main problems related to transportation in developed as well as developing countries. Traffic control systems are based on the idea of avoiding traffic instabilities and homogenizing traffic flow in such a way that the risk of accidents is minimized and traffic flow is maximized. Lately, Intelligent Transport Systems (ITS) has become an important area of research to solve such road traffic-related issues for making smart decisions. It links people, roads and vehicles together using communication technologies to increase safety and mobility. Moreover, accurate prediction of road traffic is important to manage traffic congestion. The aim of this study is to develop an ANN model for the prediction of traffic flow and to compare the ANN model with the linear regression model of traffic flow predictions. Data extraction was carried out in intervals of 15 minutes from the video player. Video of mixed traffic flow was taken and then counted during office work in order to determine the traffic volume. Vehicles were classified into six categories, namely Car, Motorcycle, Minibus, mid-bus, Bus, and Truck vehicles. The average time taken by each vehicle type to travel the trap length was measured by time displayed on a video screen.Keywords: intelligent transport system (ITS), traffic flow prediction, artificial neural network (ANN), linear regression
Procedia PDF Downloads 6718580 In and Out-Of-Sample Performance of Non Simmetric Models in International Price Differential Forecasting in a Commodity Country Framework
Authors: Nicola Rubino
Abstract:
This paper presents an analysis of a group of commodity exporting countries' nominal exchange rate movements in relationship to the US dollar. Using a series of Unrestricted Self-exciting Threshold Autoregressive models (SETAR), we model and evaluate sixteen national CPI price differentials relative to the US dollar CPI. Out-of-sample forecast accuracy is evaluated through calculation of mean absolute error measures on the basis of two-hundred and fifty-three months rolling window forecasts and extended to three additional models, namely a logistic smooth transition regression (LSTAR), an additive non linear autoregressive model (AAR) and a simple linear Neural Network model (NNET). Our preliminary results confirm presence of some form of TAR non linearity in the majority of the countries analyzed, with a relatively higher goodness of fit, with respect to the linear AR(1) benchmark, in five countries out of sixteen considered. Although no model appears to statistically prevail over the other, our final out-of-sample forecast exercise shows that SETAR models tend to have quite poor relative forecasting performance, especially when compared to alternative non-linear specifications. Finally, by analyzing the implied half-lives of the > coefficients, our results confirms the presence, in the spirit of arbitrage band adjustment, of band convergence with an inner unit root behaviour in five of the sixteen countries analyzed.Keywords: transition regression model, real exchange rate, nonlinearities, price differentials, PPP, commodity points
Procedia PDF Downloads 27818579 Adaptive Neuro Fuzzy Inference System Model Based on Support Vector Regression for Stock Time Series Forecasting
Authors: Anita Setianingrum, Oki S. Jaya, Zuherman Rustam
Abstract:
Forecasting stock price is a challenging task due to the complex time series of the data. The complexity arises from many variables that affect the stock market. Many time series models have been proposed before, but those previous models still have some problems: 1) put the subjectivity of choosing the technical indicators, and 2) rely upon some assumptions about the variables, so it is limited to be applied to all datasets. Therefore, this paper studied a novel Adaptive Neuro-Fuzzy Inference System (ANFIS) time series model based on Support Vector Regression (SVR) for forecasting the stock market. In order to evaluate the performance of proposed models, stock market transaction data of TAIEX and HIS from January to December 2015 is collected as experimental datasets. As a result, the method has outperformed its counterparts in terms of accuracy.Keywords: ANFIS, fuzzy time series, stock forecasting, SVR
Procedia PDF Downloads 24718578 Electricity Load Modeling: An Application to Italian Market
Authors: Giovanni Masala, Stefania Marica
Abstract:
Forecasting electricity load plays a crucial role regards decision making and planning for economical purposes. Besides, in the light of the recent privatization and deregulation of the power industry, the forecasting of future electricity load turned out to be a very challenging problem. Empirical data about electricity load highlights a clear seasonal behavior (higher load during the winter season), which is partly due to climatic effects. We also emphasize the presence of load periodicity at a weekly basis (electricity load is usually lower on weekends or holidays) and at daily basis (electricity load is clearly influenced by the hour). Finally, a long-term trend may depend on the general economic situation (for example, industrial production affects electricity load). All these features must be captured by the model. The purpose of this paper is then to build an hourly electricity load model. The deterministic component of the model requires non-linear regression and Fourier series while we will investigate the stochastic component through econometrical tools. The calibration of the parameters’ model will be performed by using data coming from the Italian market in a 6 year period (2007- 2012). Then, we will perform a Monte Carlo simulation in order to compare the simulated data respect to the real data (both in-sample and out-of-sample inspection). The reliability of the model will be deduced thanks to standard tests which highlight a good fitting of the simulated values.Keywords: ARMA-GARCH process, electricity load, fitting tests, Fourier series, Monte Carlo simulation, non-linear regression
Procedia PDF Downloads 39518577 Life in Bequia in the Era of Climate Change: Societal Perception of Adaptation and Vulnerability
Authors: Sherry Ann Ganase, Sandra Sookram
Abstract:
This study examines adaptation measures and factors that influence adaptation decisions in Bequia by using multiple linear regression and a structural equation model. Using survey data, the results suggest that households are knowledgeable and concerned about climate change but lack knowledge about the measures needed to adapt. The findings from the SEM suggest that a positive relationship exist between vulnerability and adaptation, vulnerability and perception, along with a negative relationship between perception and adaptation. This suggests that being aware of the terms associated with climate change and knowledge about climate change is insufficient for implementing adaptation measures; instead the risk and importance placed on climate change, vulnerability experienced with household flooding, drainage and expected threat of future sea level are the main factors that influence the adaptation decision. The results obtained in this study are beneficial to all as adaptation requires a collective effort by stakeholders.Keywords: adaptation, Bequia, multiple linear regression, structural equation model
Procedia PDF Downloads 46318576 Growth Curves Genetic Analysis of Native South Caspian Sea Poultry Using Bayesian Statistics
Authors: Jamal Fayazi, Farhad Anoosheh, Mohammad R. Ghorbani, Ali R. Paydar
Abstract:
In this study, to determine the best non-linear regression model describing the growth curve of native poultry, 9657 chicks of generations 18, 19, and 20 raised in Mazandaran breeding center were used. Fowls and roosters of this center distributed in south of Caspian Sea region. To estimate the genetic variability of none linear regression parameter of growth traits, a Gibbs sampling of Bayesian analysis was used. The average body weight traits in the first day (BW1), eighth week (BW8) and twelfth week (BW12) were respectively estimated as 36.05, 763.03, and 1194.98 grams. Based on the coefficient of determination, mean squares of error and Akaike information criteria, Gompertz model was selected as the best growth descriptive function. In Gompertz model, the estimated values for the parameters of maturity weight (A), integration constant (B) and maturity rate (K) were estimated to be 1734.4, 3.986, and 0.282, respectively. The direct heritability of BW1, BW8 and BW12 were respectively reported to be as 0.378, 0.3709, 0.316, 0.389, 0.43, 0.09 and 0.07. With regard to estimated parameters, the results of this study indicated that there is a possibility to improve some property of growth curve using appropriate selection programs.Keywords: direct heritability, Gompertz, growth traits, maturity weight, native poultry
Procedia PDF Downloads 26518575 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques
Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas
Abstract:
The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.Keywords: Artificial Neural network, Competitive dynamics, Logistic Regression, Text classification, Text mining
Procedia PDF Downloads 12118574 Predicting Bridge Pier Scour Depth with SVM
Authors: Arun Goel
Abstract:
Prediction of maximum local scour is necessary for the safety and economical design of the bridges. A number of equations have been developed over the years to predict local scour depth using laboratory data and a few pier equations have also been proposed using field data. Most of these equations are empirical in nature as indicated by the past publications. In this paper, attempts have been made to compute local depth of scour around bridge pier in dimensional and non-dimensional form by using linear regression, simple regression and SVM (Poly and Rbf) techniques along with few conventional empirical equations. The outcome of this study suggests that the SVM (Poly and Rbf) based modeling can be employed as an alternate to linear regression, simple regression and the conventional empirical equations in predicting scour depth of bridge piers. The results of present study on the basis of non-dimensional form of bridge pier scour indicates the improvement in the performance of SVM (Poly and Rbf) in comparison to dimensional form of scour.Keywords: modeling, pier scour, regression, prediction, SVM (Poly and Rbf kernels)
Procedia PDF Downloads 45118573 Development of a Turbulent Boundary Layer Wall-pressure Fluctuations Power Spectrum Model Using a Stepwise Regression Algorithm
Authors: Zachary Huffman, Joana Rocha
Abstract:
Wall-pressure fluctuations induced by the turbulent boundary layer (TBL) developed over aircraft are a significant source of aircraft cabin noise. Since the power spectral density (PSD) of these pressure fluctuations is directly correlated with the amount of sound radiated into the cabin, the development of accurate empirical models that predict the PSD has been an important ongoing research topic. The sound emitted can be represented from the pressure fluctuations term in the Reynoldsaveraged Navier-Stokes equations (RANS). Therefore, early TBL empirical models (including those from Lowson, Robertson, Chase, and Howe) were primarily derived by simplifying and solving the RANS for pressure fluctuation and adding appropriate scales. Most subsequent models (including Goody, Efimtsov, Laganelli, Smol’yakov, and Rackl and Weston models) were derived by making modifications to these early models or by physical principles. Overall, these models have had varying levels of accuracy, but, in general, they are most accurate under the specific Reynolds and Mach numbers they were developed for, while being less accurate under other flow conditions. Despite this, recent research into the possibility of using alternative methods for deriving the models has been rather limited. More recent studies have demonstrated that an artificial neural network model was more accurate than traditional models and could be applied more generally, but the accuracy of other machine learning techniques has not been explored. In the current study, an original model is derived using a stepwise regression algorithm in the statistical programming language R, and TBL wall-pressure fluctuations PSD data gathered at the Carleton University wind tunnel. The theoretical advantage of a stepwise regression approach is that it will automatically filter out redundant or uncorrelated input variables (through the process of feature selection), and it is computationally faster than machine learning. The main disadvantage is the potential risk of overfitting. The accuracy of the developed model is assessed by comparing it to independently sourced datasets.Keywords: aircraft noise, machine learning, power spectral density models, regression models, turbulent boundary layer wall-pressure fluctuations
Procedia PDF Downloads 13518572 Satellite LiDAR-Based Digital Terrain Model Correction using Gaussian Process Regression
Authors: Keisuke Takahata, Hiroshi Suetsugu
Abstract:
Forest height is an important parameter for forest biomass estimation, and precise elevation data is essential for accurate forest height estimation. There are several globally or nationally available digital elevation models (DEMs) like SRTM and ASTER. However, its accuracy is reported to be low particularly in mountainous areas where there are closed canopy or steep slope. Recently, space-borne LiDAR, such as the Global Ecosystem Dynamics Investigation (GEDI), have started to provide sparse but accurate ground elevation and canopy height estimates. Several studies have reported the high degree of accuracy in their elevation products on their exact footprints, while it is not clear how this sparse information can be used for wider area. In this study, we developed a digital terrain model correction algorithm by spatially interpolating the difference between existing DEMs and GEDI elevation products by using Gaussian Process (GP) regression model. The result shows that our GP-based methodology can reduce the mean bias of the elevation data from 3.7m to 0.3m when we use airborne LiDAR-derived elevation information as ground truth. Our algorithm is also capable of quantifying the elevation data uncertainty, which is critical requirement for biomass inventory. Upcoming satellite-LiDAR missions, like MOLI (Multi-footprint Observation Lidar and Imager), are expected to contribute to the more accurate digital terrain model generation.Keywords: digital terrain model, satellite LiDAR, gaussian processes, uncertainty quantification
Procedia PDF Downloads 183