Search results for: shrinkage regression
3175 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs
Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa
Abstract:
Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders cause to increase or decrease the hatchability. This study aims to identify the affecting extrinsic parameters on the commercial hatchability of local chicken's eggs and determine the most efficient classification model with a hatchability rate greater than 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeders age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the most influencing variable on hatchability. First, the correlation between each parameter and hatchability were checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify the hatchability. This grouping process was conducted using binary classification techniques. Hatchability was negatively correlated with egg weight, breeders' age, shell width, shell length, and positive correlations were identified with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than single linear models regarding the highest coefficient of determination (R²) with 94% and minimum AIC and BIC values. According to the classification results, RF, CART, and kNN had performed the highest accuracy values 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, the RF is the most appropriate machine learning algorithm for classifying the breeder outcomes, which are economically profitable or not, in a commercial hatchery.Keywords: classification models, egg weight, fertilised eggs, multiple linear regression
Procedia PDF Downloads 873174 Non-Methane Hydrocarbons Emission during the Photocopying Process
Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Kecić S. Vesna, Oros B. Ivana
Abstract:
The prosperity of electronic equipment in photocopying environment not only has improved work efficiency, but also has changed indoor air quality. Considering the number of photocopying employed, indoor air quality might be worse than in general office environments. Determining the contribution from any type of equipment to indoor air pollution is a complex matter. Non-methane hydrocarbons are known to have an important role of air quality due to their high reactivity. The presence of hazardous pollutants in indoor air has been detected in one photocopying shop in Novi Sad, Serbia. Air samples were collected and analyzed for five days, during 8-hr working time in three-time intervals, whereas three different sampling points were determined. Using multiple linear regression model and software package STATISTICA 10 the concentrations of occupational hazards and micro-climates parameters were mutually correlated. Based on the obtained multiple coefficients of determination (0.3751, 0.2389, and 0.1975), a weak positive correlation between the observed variables was determined. Small values of parameter F indicated that there was no statistically significant difference between the concentration levels of non-methane hydrocarbons and micro-climates parameters. The results showed that variable could be presented by the general regression model: y = b0 + b1xi1+ b2xi2. Obtained regression equations allow to measure the quantitative agreement between the variation of variables and thus obtain more accurate knowledge of their mutual relations.Keywords: non-methane hydrocarbons, photocopying process, multiple regression analysis, indoor air quality, pollutant emission
Procedia PDF Downloads 3783173 Principal Component Regression in Amylose Content on the Malaysian Market Rice Grains Using Near Infrared Reflectance Spectroscopy
Authors: Syahira Ibrahim, Herlina Abdul Rahim
Abstract:
The amylose content is an essential element in determining the texture and taste of rice grains. This paper evaluates the use of VIS-SWNIRS in estimating the amylose content for seven varieties of rice grains available in the Malaysian market. Each type consists of 30 samples and all the samples are scanned using the spectroscopy to obtain a range of values between 680-1000nm. The Savitzky-Golay (SG) smoothing filter is applied to each sample’s data before the Principal Component Regression (PCR) technique is used to examine the data and produce a single value for each sample. This value is then compared with reference values obtained from the standard iodine colorimetric test in terms of its coefficient of determination, R2. Results show that this technique produced low R2 values of less than 0.50. In order to improve the result, the range should include a wavelength range of 1100-2500nm and the number of samples processed should also be increased.Keywords: amylose content, diffuse reflectance, Malaysia rice grain, principal component regression (PCR), Visible and Shortwave near-infrared spectroscopy (VIS-SWNIRS)
Procedia PDF Downloads 3823172 Improved Regression Relations Between Different Magnitude Types and the Moment Magnitude in the Western Balkan Earthquake Catalogue
Authors: Anila Xhahysa, Migena Ceyhan, Neki Kuka, Klajdi Qoshi, Damiano Koxhaj
Abstract:
The seismic event catalog has been updated in the framework of a bilateral project supported by the Central European Investment Fund and with the extensive support of Global Earthquake Model Foundation to update Albania's national seismic hazard model. The earthquake catalogue prepared within this project covers the Western Balkan area limited by 38.0° - 48°N, 12.5° - 24.5°E and includes 41,806 earthquakes that occurred in the region between 510 BC and 2022. Since the moment magnitude characterizes the earthquake size accurately and the selected ground motion prediction equations for the seismic hazard assessment employ this scale, it was chosen as the uniform magnitude scale for the catalogue. Therefore, proxy values of moment magnitude had to be obtained by using new magnitude conversion equations between the local and other magnitude types to this unified scale. The Global Centroid Moment Tensor Catalogue was considered the most authoritative for moderate to large earthquakes for moment magnitude reports; hence it was used as a reference for calibrating other sources. The best fit was observed when compared to some regional agencies, whereas, with reports of moment magnitudes from Italy, Greece and Turkey, differences were observed in all magnitude ranges. For teleseismic magnitudes, to account for the non-linearity of the relationships, we used the exponential model for the derivation of the regression equations. The obtained regressions for the surface wave magnitude and short-period body-wave magnitude show considerable differences with Global Earthquake Model regression curves, especially for low magnitude ranges. Moreover, a conversion relation was obtained between the local magnitude of Albania and the corresponding moment magnitude as reported by the global and regional agencies. As errors were present in both variables, the Deming regression was used.Keywords: regression, seismic catalogue, local magnitude, tele-seismic magnitude, moment magnitude
Procedia PDF Downloads 703171 Modeling the Impacts of Road Construction on Lands Values
Authors: Maha Almumaiz, Harry Evdorides
Abstract:
Change in land value typically occurs when a new interurban road construction causes an increase in accessibility; this change in the adjacent lands values differs according to land characteristics such as geographic location, land use type, land area and sale time (appraisal time). A multiple regression model is obtained to predict the percent change in land value (CLV) based on four independent variables namely land distance from the constructed road, area of land, nature of land use and time from the works completion of the road. The random values of percent change in land value were generated using Microsoft Excel with a range of up to 35%. The trend of change in land value with the four independent variables was determined from the literature references. The statistical analysis and model building process has been made by using the IBM SPSS V23 software. The Regression model suggests, for lands that are located within 3 miles as the straight distance from the road, the percent CLV is between (0-35%) which is depending on many factors including distance from the constructed road, land use, land area and time from works completion of the new road.Keywords: interurban road, land use types, new road construction, percent CLV, regression model
Procedia PDF Downloads 2663170 Quantitative Structure Activity Relationship and Insilco Docking of Substituted 1,3,4-Oxadiazole Derivatives as Potential Glucosamine-6-Phosphate Synthase Inhibitors
Authors: Suman Bala, Sunil Kamboj, Vipin Saini
Abstract:
Quantitative Structure Activity Relationship (QSAR) analysis has been developed to relate antifungal activity of novel substituted 1,3,4-oxadiazole against Candida albicans and Aspergillus niger using computer assisted multiple regression analysis. The study has shown the better relationship between antifungal activities with respect to various descriptors established by multiple regression analysis. The analysis has shown statistically significant correlation with R2 values 0.932 and 0.782 against Candida albicans and Aspergillus niger respectively. These derivatives were further subjected to molecular docking studies to investigate the interactions between the target compounds and amino acid residues present in the active site of glucosamine-6-phosphate synthase. All the synthesized compounds have better docking score as compared to standard fluconazole. Our results could be used for the further design as well as development of optimal and potential antifungal agents.Keywords: 1, 3, 4-oxadiazole, QSAR, multiple linear regression, docking, glucosamine-6-phosphate synthase
Procedia PDF Downloads 3413169 A Study on Characteristics of Hedonic Price Models in Korea Based on Meta-Regression Analysis
Authors: Minseo Jo
Abstract:
The purpose of this paper is to examine the factors in the hedonic price models, that has significance impact in determining the price of apartments. There are many variables employed in the hedonic price models and their effectiveness vary differently according to the researchers and the regions they are analysing. In order to consider various conditions, the meta-regression analysis has been selected for the study. In this paper, four meta-independent variables, from the 65 hedonic price models to analysis. The factors that influence the prices of apartments, as well as including factors that influence the prices of apartments, regions, which are divided into two of the research performed, years of research performed, the coefficients of the functions employed. The covariance between the four meta-variables and p-value of the coefficients and the four meta-variables and number of data used in the 65 hedonic price models have been analyzed in this study. The six factors that are most important in deciding the prices of apartments are positioning of apartments, the noise of the apartments, points of the compass and views from the apartments, proximity to the public transportations, companies that have constructed the apartments, social environments (such as schools etc.).Keywords: hedonic price model, housing price, meta-regression analysis, characteristics
Procedia PDF Downloads 4023168 Switched System Diagnosis Based on Intelligent State Filtering with Unknown Models
Authors: Nada Slimane, Foued Theljani, Faouzi Bouani
Abstract:
The paper addresses the problem of fault diagnosis for systems operating in several modes (normal or faulty) based on states assessment. We use, for this purpose, a methodology consisting of three main processes: 1) sequential data clustering, 2) linear model regression and 3) state filtering. Typically, Kalman Filter (KF) is an algorithm that provides estimation of unknown states using a sequence of I/O measurements. Inevitably, although it is an efficient technique for state estimation, it presents two main weaknesses. First, it merely predicts states without being able to isolate/classify them according to their different operating modes, whether normal or faulty modes. To deal with this dilemma, the KF is endowed with an extra clustering step based fully on sequential version of the k-means algorithm. Second, to provide state estimation, KF requires state space models, which can be unknown. A linear regularized regression is used to identify the required models. To prove its effectiveness, the proposed approach is assessed on a simulated benchmark.Keywords: clustering, diagnosis, Kalman Filtering, k-means, regularized regression
Procedia PDF Downloads 1823167 Utilization of Sludge in the Manufacturing of Fired Clay Bricks
Authors: Anjali G. Pillai, S. Chadrakaran
Abstract:
The extensive amount of sludge generated throughout the world, as a part of water treatment works, have caused various social and economic issues, such as a demand on landfill spaces, increase in environmental pollution and raising the waste management cost. With growing social awareness about toxic incinerator emissions and the increasing concern over the disposal of sludge on the agricultural land, the recovery of sewage sludge as a building and construction raw material can be considered as an innovative approach to tackle the sludge disposal problem. The proposed work aims at studying the recycling ability of the sludge, generated from the water treatment process, by incorporating it into the fired clay brick units. The work involves initial study of the geotechnical characteristics of the brick-clay and the sludge. Chemical compatibility of both the materials will be analyzed by X-ray fluorescence technique. The variation in the strength aspects with varying proportions of sludge i.e. 10%, 20%, 30% and 40% in the sludge-clay mix will also be determined by the proctor density test. Based on the optimum moisture content, the sludge-clay bricks will be manufactured in a brick manufacturing plant and the modified brick units will be tested to determine the variation in compressive strength, bulk density, firing shrinkage, shrinkage loss and initial water absorption rate with respect to the conventional clay bricks. The results will be compared with the specifications given in Indian Standards to arrive at the potential use of the new bricks. The durability aspect will be studied by conducting the leachate analysis test using atomic adsorption spectrometry. The lightweight characteristics of the sludge modified bricks will be ascertained with the scanning electron microscope technique which will be indicative of the variation in pore structure with the increase in sludge content within the bricks. The work will determine the suitable proportion of the sludge – clay mix in the brick which can then be effectively implemented. The feasibility aspect of the work will be determined for commercial production of the units. The work involves providing a strategy for conversion of waste to resource. Moreover, it provides an alternative solution to the problem of growing scarcity of brick-clay for the manufacturing of fired clay bricks.Keywords: eco-bricks, green construction material, sludge amended bricks, sludge disposal, waste management
Procedia PDF Downloads 3063166 Water Temperature on Early Age Concrete Property
Authors: Tesfaye Sisay Dessalegn
Abstract:
The long-term performance of concrete structures is affected by the properties and behavior of concrete at an early age. However, the fundamental mechanisms affecting the early-age behavior of concrete have not yet been fully studied. The effect of water temperature on concrete is not sufficiently studied, and at the same time, the majority of studies focused on the effect of mixing water temperature on the workability and mechanical properties of concrete. However, to the best of the authors' knowledge, the effect of mixing water temperatures on plastic shrinkage cracking of concrete has not been studied yet.Keywords: water temperature, early age concrete strength, mechanical properties of concrete, strength
Procedia PDF Downloads 583165 Examining the Effects of College Education on Democratic Attitudes in China: A Regression Discontinuity Analysis
Authors: Gang Wang
Abstract:
Education is widely believed to be a prerequisite for democracy and civil society, but the causal link between education and outcome variables is usually hardly to be identified. This study applies a fuzzy regression discontinuity design to examine the effects of college education on democratic attitudes in the Chinese context. In the analysis treatment assignment is determined by students’ college entry years and thus naturally selected by subjects’ ages. Using a sample of Chinese college students collected in Beijing in 2009, this study finds that college education actually reduces undergraduates’ motivation for political development in China but promotes political loyalty to the authoritarian government. Further hypotheses tests explain these interesting findings from two perspectives. The first is related to the complexity of politics. As college students progress over time, they increasingly realize the complexity of political reform in China’s authoritarian regime and rather stay away from politics. The second is related to students’ career opportunities. As students are close to graduation, they are immersed with job hunting and have a reduced interest in political freedom.Keywords: china, college education, democratic attitudes, regression discontinuity
Procedia PDF Downloads 3513164 Count Data Regression Modeling: An Application to Spontaneous Abortion in India
Authors: Prashant Verma, Prafulla K. Swain, K. K. Singh, Mukti Khetan
Abstract:
Objective: In India, around 20,000 women die every year due to abortion-related complications. In the modelling of count variables, there is sometimes a preponderance of zero counts. This article concerns the estimation of various count regression models to predict the average number of spontaneous abortion among women in the Punjab state of India. It also assesses the factors associated with the number of spontaneous abortions. Materials and methods: The study included 27,173 married women of Punjab obtained from the DLHS-4 survey (2012-13). Poisson regression (PR), Negative binomial (NB) regression, zero hurdle negative binomial (ZHNB), and zero-inflated negative binomial (ZINB) models were employed to predict the average number of spontaneous abortions and to identify the determinants affecting the number of spontaneous abortions. Results: Statistical comparisons among four estimation methods revealed that the ZINB model provides the best prediction for the number of spontaneous abortions. Antenatal care (ANC) place, place of residence, total children born to a woman, woman's education and economic status were found to be the most significant factors affecting the occurrence of spontaneous abortion. Conclusions: The study offers a practical demonstration of techniques designed to handle count variables. Statistical comparisons among four estimation models revealed that the ZINB model provided the best prediction for the number of spontaneous abortions and is recommended to be used to predict the number of spontaneous abortions. The study suggests that women receive institutional Antenatal care to attain limited parity. It also advocates promoting higher education among women in Punjab, India.Keywords: count data, spontaneous abortion, Poisson model, negative binomial model, zero hurdle negative binomial, zero-inflated negative binomial, regression
Procedia PDF Downloads 1553163 Business Constraints and Growth Potential of Smes: Case Study of Electrical Industry in Pakistan
Authors: Muhammad Waseem Akram
Abstract:
The current study attempts to analyze the impact of business constraints on the growth potential and performance of Small and Medium Enterprises (SMEs) in the electrical industry of Pakistan. Primary data have been utilized for the study collected from the electrical industry cluster in Sargodha, Pakistan. OLS regression is used to assess the impact of business constraints on the performance of SMEs by controlling the effect of Technology Level, Innovations, and Firm Size. To associate business constraints with the growth potential of SMEs, the study utilized Tetrachoric Correlation and Logistic Regression. Findings reveal that all the business constraints negatively affect the performance of SMEs in the electrical industry except Political Instability. Results of Tetrachoric Correlation show that all the business constraints are negatively correlated with the growth potential of SMEs. Logistic Regression results show that Energy Constraint, Inflation and Price Instability, and Bad Business Practices, all three business constraints cause to reduce the probability of income growth in sample SMEs.Keywords: SMEs, business constraints, performance, growth potential
Procedia PDF Downloads 1693162 Application of Nonparametric Geographically Weighted Regression to Evaluate the Unemployment Rate in East Java
Authors: Sifriyani Sifriyani, I Nyoman Budiantara, Sri Haryatmi, Gunardi Gunardi
Abstract:
East Java Province has a first rank as a province that has the most counties and cities in Indonesia and has the largest population. In 2015, the population reached 38.847.561 million, this figure showed a very high population growth. High population growth is feared to lead to increase the levels of unemployment. In this study, the researchers mapped and modeled the unemployment rate with 6 variables that were supposed to influence. Modeling was done by nonparametric geographically weighted regression methods with truncated spline approach. This method was chosen because spline method is a flexible method, these models tend to look for its own estimation. In this modeling, there were point knots, the point that showed the changes of data. The selection of the optimum point knots was done by selecting the most minimun value of Generalized Cross Validation (GCV). Based on the research, 6 variables were declared to affect the level of unemployment in eastern Java. They were the percentage of population that is educated above high school, the rate of economic growth, the population density, the investment ratio of total labor force, the regional minimum wage and the ratio of the number of big industry and medium scale industry from the work force. The nonparametric geographically weighted regression models with truncated spline approach had a coefficient of determination 98.95% and the value of MSE equal to 0.0047.Keywords: East Java, nonparametric geographically weighted regression, spatial, spline approach, unemployed rate
Procedia PDF Downloads 3213161 Comparative Analysis of Predictive Models for Customer Churn Prediction in the Telecommunication Industry
Authors: Deepika Christopher, Garima Anand
Abstract:
To determine the best model for churn prediction in the telecom industry, this paper compares 11 machine learning algorithms, namely Logistic Regression, Support Vector Machine, Random Forest, Decision Tree, XGBoost, LightGBM, Cat Boost, AdaBoost, Extra Trees, Deep Neural Network, and Hybrid Model (MLPClassifier). It also aims to pinpoint the top three factors that lead to customer churn and conducts customer segmentation to identify vulnerable groups. According to the data, the Logistic Regression model performs the best, with an F1 score of 0.6215, 81.76% accuracy, 68.95% precision, and 56.57% recall. The top three attributes that cause churn are found to be tenure, Internet Service Fiber optic, and Internet Service DSL; conversely, the top three models in this article that perform the best are Logistic Regression, Deep Neural Network, and AdaBoost. The K means algorithm is applied to establish and analyze four different customer clusters. This study has effectively identified customers that are at risk of churn and may be utilized to develop and execute strategies that lower customer attrition.Keywords: attrition, retention, predictive modeling, customer segmentation, telecommunications
Procedia PDF Downloads 573160 Research on the Spatio-Temporal Evolution Pattern of Traffic Dominance in Shaanxi Province
Authors: Leng Jian-Wei, Wang Lai-Jun, Li Ye
Abstract:
In order to measure and analyze the transportation situation within the counties of Shaanxi province over a certain period of time and to promote the province's future transportation planning and development, this paper proposes a reasonable layout plan and compares model rationality. The study uses entropy weight method to measure the transportation advantages of 107 counties in Shaanxi province from three dimensions: road network density, trunk line influence and location advantage in 2013 and 2021, and applies spatial autocorrelation analysis method to analyze the spatial layout and development trend of county-level transportation, and conducts ordinary least square (OLS)regression on transportation impact factors and other influencing factors. The paper also compares the regression fitting degree of the Geographically weighted regression(GWR) model and the OLS model. The results show that spatially, the transportation advantages of Shaanxi province generally show a decreasing trend from the Weihe Plain to the surrounding areas and mainly exhibit high-high clustering phenomenon. Temporally, transportation advantages show an overall upward trend, and the phenomenon of spatial imbalance gradually decreases. People's travel demands have changed to some extent, and the demand for rapid transportation has increased overall. The GWR model regression fitting degree of transportation advantages is 0.74, which is higher than the OLS regression model's fitting degree of 0.64. Based on the evolution of transportation advantages, it is predicted that this trend will continue for a period of time in the future. To improve the transportation advantages of Shaanxi province increasing the layout of rapid transportation can effectively enhance the transportation advantages of Shaanxi province. When analyzing spatial heterogeneity, geographic factors should be considered to establish a more reliable modelKeywords: traffic dominance, GWR model, spatial autocorrelation analysis, temporal and spatial evolution
Procedia PDF Downloads 893159 Effect of Manual Compacting and Semi-Automatic Compacting on Behavior of Stabilized Earth Concrete
Authors: Sihem Chaibeddra, Fattoum Kharchi, Fahim Kahlouche, Youcef Benna
Abstract:
In the recent years, a considerable level of interest has been developed on the use of earth in construction, led by its rediscovery as an environmentally building material. The Stabilized Earth Concrete (SEC) is a good alternative to the cement concrete, thanks to its thermal and moisture regulating features. Many parameters affect the behavior of stabilized earth concrete. This article presents research results related to the influence of the compacting nature on some SEC properties namely: The mechanical behavior, capillary absorption, shrinkage and sustainability to water erosion, and this, basing on two types of compacting: Manual and semi-automatic.Keywords: behavior, compacting, manual, SEC, semi-automatic
Procedia PDF Downloads 3613158 Influence of Parameters of Modeling and Data Distribution for Optimal Condition on Locally Weighted Projection Regression Method
Authors: Farhad Asadi, Mohammad Javad Mollakazemi, Aref Ghafouri
Abstract:
Recent research in neural networks science and neuroscience for modeling complex time series data and statistical learning has focused mostly on learning from high input space and signals. Local linear models are a strong choice for modeling local nonlinearity in data series. Locally weighted projection regression is a flexible and powerful algorithm for nonlinear approximation in high dimensional signal spaces. In this paper, different learning scenario of one and two dimensional data series with different distributions are investigated for simulation and further noise is inputted to data distribution for making different disordered distribution in time series data and for evaluation of algorithm in locality prediction of nonlinearity. Then, the performance of this algorithm is simulated and also when the distribution of data is high or when the number of data is less the sensitivity of this approach to data distribution and influence of important parameter of local validity in this algorithm with different data distribution is explained.Keywords: local nonlinear estimation, LWPR algorithm, online training method, locally weighted projection regression method
Procedia PDF Downloads 5023157 Exploration and Evaluation of the Effect of Multiple Countermeasures on Road Safety
Authors: Atheer Al-Nuaimi, Harry Evdorides
Abstract:
Every day many people die or get disabled or injured on roads around the world, which necessitates more specific treatments for transportation safety issues. International road assessment program (iRAP) model is one of the comprehensive road safety models which accounting for many factors that affect road safety in a cost-effective way in low and middle income countries. In iRAP model road safety has been divided into five star ratings from 1 star (the lowest level) to 5 star (the highest level). These star ratings are based on star rating score which is calculated by iRAP methodology depending on road attributes, traffic volumes and operating speeds. The outcome of iRAP methodology are the treatments that can be used to improve road safety and reduce fatalities and serious injuries (FSI) numbers. These countermeasures can be used separately as a single countermeasure or mix as multiple countermeasures for a location. There is general agreement that the adequacy of a countermeasure is liable to consistent losses when it is utilized as a part of mix with different countermeasures. That is, accident diminishment appraisals of individual countermeasures cannot be easily added together. The iRAP model philosophy makes utilization of a multiple countermeasure adjustment factors to predict diminishments in the effectiveness of road safety countermeasures when more than one countermeasure is chosen. A multiple countermeasure correction factors are figured for every 100-meter segment and for every accident type. However, restrictions of this methodology incorporate a presumable over-estimation in the predicted crash reduction. This study aims to adjust this correction factor by developing new models to calculate the effect of using multiple countermeasures on the number of fatalities for a location or an entire road. Regression models have been used to establish relationships between crash frequencies and the factors that affect their rates. Multiple linear regression, negative binomial regression, and Poisson regression techniques were used to develop models that can address the effectiveness of using multiple countermeasures. Analyses are conducted using The R Project for Statistical Computing showed that a model developed by negative binomial regression technique could give more reliable results of the predicted number of fatalities after the implementation of road safety multiple countermeasures than the results from iRAP model. The results also showed that the negative binomial regression approach gives more precise results in comparison with multiple linear and Poisson regression techniques because of the overdispersion and standard error issues.Keywords: international road assessment program, negative binomial, road multiple countermeasures, road safety
Procedia PDF Downloads 2403156 Rd-PLS Regression: From the Analysis of Two Blocks of Variables to Path Modeling
Authors: E. Tchandao Mangamana, V. Cariou, E. Vigneau, R. Glele Kakai, E. M. Qannari
Abstract:
A new definition of a latent variable associated with a dataset makes it possible to propose variants of the PLS2 regression and the multi-block PLS (MB-PLS). We shall refer to these variants as Rd-PLS regression and Rd-MB-PLS respectively because they are inspired by both Redundancy analysis and PLS regression. Usually, a latent variable t associated with a dataset Z is defined as a linear combination of the variables of Z with the constraint that the length of the loading weights vector equals 1. Formally, t=Zw with ‖w‖=1. Denoting by Z' the transpose of Z, we define herein, a latent variable by t=ZZ’q with the constraint that the auxiliary variable q has a norm equal to 1. This new definition of a latent variable entails that, as previously, t is a linear combination of the variables in Z and, in addition, the loading vector w=Z’q is constrained to be a linear combination of the rows of Z. More importantly, t could be interpreted as a kind of projection of the auxiliary variable q onto the space generated by the variables in Z, since it is collinear to the first PLS1 component of q onto Z. Consider the situation in which we aim to predict a dataset Y from another dataset X. These two datasets relate to the same individuals and are assumed to be centered. Let us consider a latent variable u=YY’q to which we associate the variable t= XX’YY’q. Rd-PLS consists in seeking q (and therefore u and t) so that the covariance between t and u is maximum. The solution to this problem is straightforward and consists in setting q to the eigenvector of YY’XX’YY’ associated with the largest eigenvalue. For the determination of higher order components, we deflate X and Y with respect to the latent variable t. Extending Rd-PLS to the context of multi-block data is relatively easy. Starting from a latent variable u=YY’q, we consider its ‘projection’ on the space generated by the variables of each block Xk (k=1, ..., K) namely, tk= XkXk'YY’q. Thereafter, Rd-MB-PLS seeks q in order to maximize the average of the covariances of u with tk (k=1, ..., K). The solution to this problem is given by q, eigenvector of YY’XX’YY’, where X is the dataset obtained by horizontally merging datasets Xk (k=1, ..., K). For the determination of latent variables of order higher than 1, we use a deflation of Y and Xk with respect to the variable t= XX’YY’q. In the same vein, extending Rd-MB-PLS to the path modeling setting is straightforward. Methods are illustrated on the basis of case studies and performance of Rd-PLS and Rd-MB-PLS in terms of prediction is compared to that of PLS2 and MB-PLS.Keywords: multiblock data analysis, partial least squares regression, path modeling, redundancy analysis
Procedia PDF Downloads 1473155 Applying the Regression Technique for Prediction of the Acute Heart Attack
Authors: Paria Soleimani, Arezoo Neshati
Abstract:
Myocardial infarction is one of the leading causes of death in the world. Some of these deaths occur even before the patient reaches the hospital. Myocardial infarction occurs as a result of impaired blood supply. Because the most of these deaths are due to coronary artery disease, hence the awareness of the warning signs of a heart attack is essential. Some heart attacks are sudden and intense, but most of them start slowly, with mild pain or discomfort, then early detection and successful treatment of these symptoms is vital to save them. Therefore, importance and usefulness of a system designing to assist physicians in the early diagnosis of the acute heart attacks is obvious. The purpose of this study is to determine how well a predictive model would perform based on the only patient-reportable clinical history factors, without using diagnostic tests or physical exams. This type of the prediction model might have application outside of the hospital setting to give accurate advice to patients to influence them to seek care in appropriate situations. For this purpose, the data were collected on 711 heart patients in Iran hospitals. 28 attributes of clinical factors can be reported by patients; were studied. Three logistic regression models were made on the basis of the 28 features to predict the risk of heart attacks. The best logistic regression model in terms of performance had a C-index of 0.955 and with an accuracy of 94.9%. The variables, severe chest pain, back pain, cold sweats, shortness of breath, nausea, and vomiting were selected as the main features.Keywords: Coronary heart disease, Acute heart attacks, Prediction, Logistic regression
Procedia PDF Downloads 4493154 Fuzzy Logic Classification Approach for Exponential Data Set in Health Care System for Predication of Future Data
Authors: Manish Pandey, Gurinderjit Kaur, Meenu Talwar, Sachin Chauhan, Jagbir Gill
Abstract:
Health-care management systems are a unit of nice connection as a result of the supply a straightforward and fast management of all aspects relating to a patient, not essentially medical. What is more, there are unit additional and additional cases of pathologies during which diagnosing and treatment may be solely allotted by victimization medical imaging techniques. With associate ever-increasing prevalence, medical pictures area unit directly acquired in or regenerate into digital type, for his or her storage additionally as sequent retrieval and process. Data Mining is the process of extracting information from large data sets through using algorithms and Techniques drawn from the field of Statistics, Machine Learning and Data Base Management Systems. Forecasting may be a prediction of what's going to occur within the future, associated it's an unsure method. Owing to the uncertainty, the accuracy of a forecast is as vital because the outcome foretold by foretelling the freelance variables. A forecast management should be wont to establish if the accuracy of the forecast is within satisfactory limits. Fuzzy regression strategies have normally been wont to develop shopper preferences models that correlate the engineering characteristics with shopper preferences relating to a replacement product; the patron preference models offer a platform, wherever by product developers will decide the engineering characteristics so as to satisfy shopper preferences before developing the merchandise. Recent analysis shows that these fuzzy regression strategies area units normally will not to model client preferences. We tend to propose a Testing the strength of Exponential Regression Model over regression toward the mean Model.Keywords: health-care management systems, fuzzy regression, data mining, forecasting, fuzzy membership function
Procedia PDF Downloads 2793153 Glucose Monitoring System Using Machine Learning Algorithms
Authors: Sangeeta Palekar, Neeraj Rangwani, Akash Poddar, Jayu Kalambe
Abstract:
The bio-medical analysis is an indispensable procedure for identifying health-related diseases like diabetes. Monitoring the glucose level in our body regularly helps us identify hyperglycemia and hypoglycemia, which can cause severe medical problems like nerve damage or kidney diseases. This paper presents a method for predicting the glucose concentration in blood samples using image processing and machine learning algorithms. The glucose solution is prepared by the glucose oxidase (GOD) and peroxidase (POD) method. An experimental database is generated based on the colorimetric technique. The image of the glucose solution is captured by the raspberry pi camera and analyzed using image processing by extracting the RGB, HSV, LUX color space values. Regression algorithms like multiple linear regression, decision tree, RandomForest, and XGBoost were used to predict the unknown glucose concentration. The multiple linear regression algorithm predicts the results with 97% accuracy. The image processing and machine learning-based approach reduce the hardware complexities of existing platforms.Keywords: artificial intelligence glucose detection, glucose oxidase, peroxidase, image processing, machine learning
Procedia PDF Downloads 2043152 Statistical Analysis of the Impact of Maritime Transport Gross Domestic Product (GDP) on Nigeria’s Economy
Authors: Kehinde Peter Oyeduntan, Kayode Oshinubi
Abstract:
Nigeria is referred as the ‘Giant of Africa’ due to high population, land mass and large economy. However, it still trails far behind many smaller economies in the continent in terms of maritime operations. As we have seen that the maritime industry is the spark plug for national growth, because it houses the most crucial infrastructure that generates wealth for a nation, it is worrisome that a nation with six seaports lag in maritime activities. In this research, we have studied how the Gross Domestic Product (GDP) of the maritime transport influences the Nigerian economy. To do this, we applied Simple Linear Regression (SLR), Support Vector Machine (SVM), Polynomial Regression Model (PRM), Generalized Additive Model (GAM) and Generalized Linear Mixed Model (GLMM) to model the relationship between the nation’s Total GDP (TGDP) and the Maritime Transport GDP (MGDP) using a time series data of 20 years. The result showed that the MGDP is statistically significant to the Nigerian economy. Amongst the statistical tool applied, the PRM of order 4 describes the relationship better when compared to other methods. The recommendations presented in this study will guide policy makers and help improve the economy of Nigeria in terms of its GDP.Keywords: maritime transport, economy, GDP, regression, port
Procedia PDF Downloads 1543151 The Effect of Accounting Conservatism on Cost of Capital: A Quantile Regression Approach for MENA Countries
Authors: Maha Zouaoui Khalifa, Hakim Ben Othman, Hussaney Khaled
Abstract:
Prior empirical studies have investigated the economic consequences of accounting conservatism by examining its impact on the cost of equity capital (COEC). However, findings are not conclusive. We assume that inconsistent results of such association may be attributed to the regression models used in data analysis. To address this issue, we re-examine the effect of different dimension of accounting conservatism: unconditional conservatism (U_CONS) and conditional conservatism (C_CONS) on the COEC for a sample of listed firms from Middle Eastern and North Africa (MENA) countries, applying quantile regression (QR) approach developed by Koenker and Basset (1978). While classical ordinary least square (OLS) method is widely used in empirical accounting research, however it may produce inefficient and bias estimates in the case of departures from normality or long tail error distribution. QR method is more powerful than OLS to handle this kind of problem. It allows the coefficient on the independent variables to shift across the distribution of the dependent variable whereas OLS method only estimates the conditional mean effects of a response variable. We find as predicted that U_CONS has a significant positive effect on the COEC however, C_CONS has a negative impact. Findings suggest also that the effect of the two dimensions of accounting conservatism differs considerably across COEC quantiles. Comparing results from QR method with those of OLS, this study throws more lights on the association between accounting conservatism and COEC.Keywords: unconditional conservatism, conditional conservatism, cost of equity capital, OLS, quantile regression, emerging markets, MENA countries
Procedia PDF Downloads 3563150 Optimizing the Scanning Time with Radiation Prediction Using a Machine Learning Technique
Authors: Saeed Eskandari, Seyed Rasoul Mehdikhani
Abstract:
Radiation sources have been used in many industries, such as gamma sources in medical imaging. These waves have destructive effects on humans and the environment. It is very important to detect and find the source of these waves because these sources cannot be seen by the eye. A portable robot has been designed and built with the purpose of revealing radiation sources that are able to scan the place from 5 to 20 meters away and shows the location of the sources according to the intensity of the waves on a two-dimensional digital image. The operation of the robot is done by measuring the pixels separately. By increasing the image measurement resolution, we will have a more accurate scan of the environment, and more points will be detected. But this causes a lot of time to be spent on scanning. In this paper, to overcome this challenge, we designed a method that can optimize this time. In this method, a small number of important points of the environment are measured. Hence the remaining pixels are predicted and estimated by regression algorithms in machine learning. The research method is based on comparing the actual values of all pixels. These steps have been repeated with several other radiation sources. The obtained results of the study show that the values estimated by the regression method are very close to the real values.Keywords: regression, machine learning, scan radiation, robot
Procedia PDF Downloads 803149 Chemometric Regression Analysis of Radical Scavenging Ability of Kombucha Fermented Kefir-Like Products
Authors: Strahinja Kovacevic, Milica Karadzic Banjac, Jasmina Vitas, Stefan Vukmanovic, Radomir Malbasa, Lidija Jevric, Sanja Podunavac-Kuzmanovic
Abstract:
The present study deals with chemometric regression analysis of quality parameters and the radical scavenging ability of kombucha fermented kefir-like products obtained with winter savory (WS), peppermint (P), stinging nettle (SN) and wild thyme tea (WT) kombucha inoculums. Each analyzed sample was described by milk fat content (MF, %), total unsaturated fatty acids content (TUFA, %), monounsaturated fatty acids content (MUFA, %), polyunsaturated fatty acids content (PUFA, %), the ability of free radicals scavenging (RSA Dₚₚₕ, % and RSA.ₒₕ, %) and pH values measured after each hour from the start until the end of fermentation. The aim of the conducted regression analysis was to establish chemometric models which can predict the radical scavenging ability (RSA Dₚₚₕ, % and RSA.ₒₕ, %) of the samples by correlating it with the MF, TUFA, MUFA, PUFA and the pH value at the beginning, in the middle and at the end of fermentation process which lasted between 11 and 17 hours, until pH value of 4.5 was reached. The analysis was carried out applying univariate linear (ULR) and multiple linear regression (MLR) methods on the raw data and the data standardized by the min-max normalization method. The obtained models were characterized by very limited prediction power (poor cross-validation parameters) and weak statistical characteristics. Based on the conducted analysis it can be concluded that the resulting radical scavenging ability cannot be precisely predicted only on the basis of MF, TUFA, MUFA, PUFA content, and pH values, however, other quality parameters should be considered and included in the further modeling. This study is based upon work from project: Kombucha beverages production using alternative substrates from the territory of the Autonomous Province of Vojvodina, 142-451-2400/2019-03, supported by Provincial Secretariat for Higher Education and Scientific Research of AP Vojvodina.Keywords: chemometrics, regression analysis, kombucha, quality control
Procedia PDF Downloads 1423148 Enhancing Spatial Interpolation: A Multi-Layer Inverse Distance Weighting Model for Complex Regression and Classification Tasks in Spatial Data Analysis
Authors: Yakin Hajlaoui, Richard Labib, Jean-François Plante, Michel Gamache
Abstract:
This study introduces the Multi-Layer Inverse Distance Weighting Model (ML-IDW), inspired by the mathematical formulation of both multi-layer neural networks (ML-NNs) and Inverse Distance Weighting model (IDW). ML-IDW leverages ML-NNs' processing capabilities, characterized by compositions of learnable non-linear functions applied to input features, and incorporates IDW's ability to learn anisotropic spatial dependencies, presenting a promising solution for nonlinear spatial interpolation and learning from complex spatial data. it employ gradient descent and backpropagation to train ML-IDW, comparing its performance against conventional spatial interpolation models such as Kriging and standard IDW on regression and classification tasks using simulated spatial datasets of varying complexity. the results highlight the efficacy of ML-IDW, particularly in handling complex spatial datasets, exhibiting lower mean square error in regression and higher F1 score in classification.Keywords: deep learning, multi-layer neural networks, gradient descent, spatial interpolation, inverse distance weighting
Procedia PDF Downloads 523147 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models
Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti
Abstract:
In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics
Procedia PDF Downloads 533146 Scoring System for the Prognosis of Sepsis Patients in Intensive Care Units
Authors: Javier E. García-Gallo, Nelson J. Fonseca-Ruiz, John F. Duitama-Munoz
Abstract:
Sepsis is a syndrome that occurs with physiological and biochemical abnormalities induced by severe infection and carries a high mortality and morbidity, therefore the severity of its condition must be interpreted quickly. After patient admission in an intensive care unit (ICU), it is necessary to synthesize the large volume of information that is collected from patients in a value that represents the severity of their condition. Traditional severity of illness scores seeks to be applicable to all patient populations, and usually assess in-hospital mortality. However, the use of machine learning techniques and the data of a population that shares a common characteristic could lead to the development of customized mortality prediction scores with better performance. This study presents the development of a score for the one-year mortality prediction of the patients that are admitted to an ICU with a sepsis diagnosis. 5650 ICU admissions extracted from the MIMICIII database were evaluated, divided into two groups: 70% to develop the score and 30% to validate it. Comorbidities, demographics and clinical information of the first 24 hours after the ICU admission were used to develop a mortality prediction score. LASSO (least absolute shrinkage and selection operator) and SGB (Stochastic Gradient Boosting) variable importance methodologies were used to select the set of variables that make up the developed score; each of this variables was dichotomized and a cut-off point that divides the population into two groups with different mean mortalities was found; if the patient is in the group that presents a higher mortality a one is assigned to the particular variable, otherwise a zero is assigned. These binary variables are used in a logistic regression (LR) model, and its coefficients were rounded to the nearest integer. The resulting integers are the point values that make up the score when multiplied with each binary variables and summed. The one-year mortality probability was estimated using the score as the only variable in a LR model. Predictive power of the score, was evaluated using the 1695 admissions of the validation subset obtaining an area under the receiver operating characteristic curve of 0.7528, which outperforms the results obtained with Sequential Organ Failure Assessment (SOFA), Oxford Acute Severity of Illness Score (OASIS) and Simplified Acute Physiology Score II (SAPSII) scores on the same validation subset. Observed and predicted mortality rates within estimated probabilities deciles were compared graphically and found to be similar, indicating that the risk estimate obtained with the score is close to the observed mortality, it is also observed that the number of events (deaths) is indeed increasing as the outcome go from the decile with the lowest probabilities to the decile with the highest probabilities. Sepsis is a syndrome that carries a high mortality, 43.3% for the patients included in this study; therefore, tools that help clinicians to quickly and accurately predict a worse prognosis are needed. This work demonstrates the importance of customization of mortality prediction scores since the developed score provides better performance than traditional scoring systems.Keywords: intensive care, logistic regression model, mortality prediction, sepsis, severity of illness, stochastic gradient boosting
Procedia PDF Downloads 222