Search results for: stepwise multiple regression analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30687

Search results for: stepwise multiple regression analysis

30657 Development of a Turbulent Boundary Layer Wall-pressure Fluctuations Power Spectrum Model Using a Stepwise Regression Algorithm

Authors: Zachary Huffman, Joana Rocha

Abstract:

Wall-pressure fluctuations induced by the turbulent boundary layer (TBL) developed over aircraft are a significant source of aircraft cabin noise. Since the power spectral density (PSD) of these pressure fluctuations is directly correlated with the amount of sound radiated into the cabin, the development of accurate empirical models that predict the PSD has been an important ongoing research topic. The sound emitted can be represented from the pressure fluctuations term in the Reynoldsaveraged Navier-Stokes equations (RANS). Therefore, early TBL empirical models (including those from Lowson, Robertson, Chase, and Howe) were primarily derived by simplifying and solving the RANS for pressure fluctuation and adding appropriate scales. Most subsequent models (including Goody, Efimtsov, Laganelli, Smol’yakov, and Rackl and Weston models) were derived by making modifications to these early models or by physical principles. Overall, these models have had varying levels of accuracy, but, in general, they are most accurate under the specific Reynolds and Mach numbers they were developed for, while being less accurate under other flow conditions. Despite this, recent research into the possibility of using alternative methods for deriving the models has been rather limited. More recent studies have demonstrated that an artificial neural network model was more accurate than traditional models and could be applied more generally, but the accuracy of other machine learning techniques has not been explored. In the current study, an original model is derived using a stepwise regression algorithm in the statistical programming language R, and TBL wall-pressure fluctuations PSD data gathered at the Carleton University wind tunnel. The theoretical advantage of a stepwise regression approach is that it will automatically filter out redundant or uncorrelated input variables (through the process of feature selection), and it is computationally faster than machine learning. The main disadvantage is the potential risk of overfitting. The accuracy of the developed model is assessed by comparing it to independently sourced datasets.

Keywords: aircraft noise, machine learning, power spectral density models, regression models, turbulent boundary layer wall-pressure fluctuations

Procedia PDF Downloads 111
30656 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

This study considers the selection of the most suitable in silico molecular descriptors that could be used for s-triazine pesticides characterization. Suitable descriptors among topological, geometrical and physicochemical are used for quantitative structure-retention relationships (QSRR) model establishment. Established models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, MLR models were established avoiding multicollinearity among the selected molecular descriptors. Statistical quality of established models was evaluated by standard and cross-validation statistical parameters. For detection of similarity or dissimilarity among investigated s-triazine pesticides and their classification, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar grouping. This study is financially supported by COST action TD1305.

Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis

Procedia PDF Downloads 356
30655 Factors Predicting Preventive Behavior for Osteoporosis in University Students

Authors: Thachamon Sinsoongsud, Noppawan Piaseu

Abstract:

This predictive study was aimed to 1) describe self efficacy for risk reduction and preventive behavior for osteoporosis, and 2) examine factors predicting preventive behavior for osteoporosis in nursing students. Through purposive sampling, the sample included 746 nursing students in a public university in Bangkok, Thailand. Data were collected by a self-reported questionnaire on self efficacy and preventive behavior for osteoporosis. Data were analyzed using descriptive statistics and multiple regression analysis with stepwise method. Results revealed that majority of the students were female (98.3%) with mean age of 19.86 + 1.26 years. The students had self efficacy and preventive behavior for osteoporosis at moderate level. Self efficacy and level of education could together predicted 35.2% variance of preventive behavior for osteoporosis (p< .001). Results suggest approaches for promoting preventive behavior for osteoporosis through enhancing self efficacy among nursing students in a public university in Bangkok, Thailand.

Keywords: osteoporosis, self-efficacy, preventive behavior, nursing students

Procedia PDF Downloads 341
30654 Prediction of Coronary Artery Stenosis Severity Based on Machine Learning Algorithms

Authors: Yu-Jia Jian, Emily Chia-Yu Su, Hui-Ling Hsu, Jian-Jhih Chen

Abstract:

Coronary artery is the major supplier of myocardial blood flow. When fat and cholesterol are deposit in the coronary arterial wall, narrowing and stenosis of the artery occurs, which may lead to myocardial ischemia and eventually infarction. According to the World Health Organization (WHO), estimated 740 million people have died of coronary heart disease in 2015. According to Statistics from Ministry of Health and Welfare in Taiwan, heart disease (except for hypertensive diseases) ranked the second among the top 10 causes of death from 2013 to 2016, and it still shows a growing trend. According to American Heart Association (AHA), the risk factors for coronary heart disease including: age (> 65 years), sex (men to women with 2:1 ratio), obesity, diabetes, hypertension, hyperlipidemia, smoking, family history, lack of exercise and more. We have collected a dataset of 421 patients from a hospital located in northern Taiwan who received coronary computed tomography (CT) angiography. There were 300 males (71.26%) and 121 females (28.74%), with age ranging from 24 to 92 years, and a mean age of 56.3 years. Prior to coronary CT angiography, basic data of the patients, including age, gender, obesity index (BMI), diastolic blood pressure, systolic blood pressure, diabetes, hypertension, hyperlipidemia, smoking, family history of coronary heart disease and exercise habits, were collected and used as input variables. The output variable of the prediction module is the degree of coronary artery stenosis. The output variable of the prediction module is the narrow constriction of the coronary artery. In this study, the dataset was randomly divided into 80% as training set and 20% as test set. Four machine learning algorithms, including logistic regression, stepwise regression, neural network and decision tree, were incorporated to generate prediction results. We used area under curve (AUC) / accuracy (Acc.) to compare the four models, the best model is neural network, followed by stepwise logistic regression, decision tree, and logistic regression, with 0.68 / 79 %, 0.68 / 74%, 0.65 / 78%, and 0.65 / 74%, respectively. Sensitivity of neural network was 27.3%, specificity was 90.8%, stepwise Logistic regression sensitivity was 18.2%, specificity was 92.3%, decision tree sensitivity was 13.6%, specificity was 100%, logistic regression sensitivity was 27.3%, specificity 89.2%. From the result of this study, we hope to improve the accuracy by improving the module parameters or other methods in the future and we hope to solve the problem of low sensitivity by adjusting the imbalanced proportion of positive and negative data.

Keywords: decision support, computed tomography, coronary artery, machine learning

Procedia PDF Downloads 203
30653 Relationship between Codependency, Perceived Social Support, and Depression in Mothers of Children with Intellectual Disability

Authors: Sajed Yaghoubnezhad, Mina Karimi, Seyede Marjan Modirkhazeni

Abstract:

The goal of this research was to study the relationship between codependency, perceived social support and depression in mothers of children with intellectual disability (ID). The correlational method was used in this study. The research population is comprised of mothers of educable children with ID in the age range of 25 to 61 years. From among this, a sample of 251 individuals, in the multistage cluster sampling method, was selected from educational districts in Tehran, who responded to the Spann-Fischer Codependency Scale (SFCDS), the Social Support Questionnaire and the Beck Depression Inventory (BDI). The findings of this study indicate that among mothers of children with ID depression has a positive and significant correlation with codependency (P<0.01, r=0.4) and a negative and significant correlation with the total score of social support (P<0.01, r=-0.34). Moreover, the results of stepwise multiple regression analysis showed that codependency is allocated a higher variance than social support in explaining depression (R2=0.023).

Keywords: codependency, social support, depression, mothers of children with ID

Procedia PDF Downloads 329
30652 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that could help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that can minimize the total cost for car rental service. With two types of cars, there is an upgrade possibility for lower type to upper type. This makes the model more complex than one type of cars scenario. We have found that convexity can be proved in this case. Sensitivity analysis of the parameters is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from multiple linear regression model is relatively close to the optimal solution (with the adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from optimal solution is only 0.5 to 1 percent (on average) lower than the cost from regression model, while it is approximately 67% lower than the cost obtained by the naïve method. It indicates that our proposed simplification method using regression analysis can effectively perform in estimating the overbooking level.

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 146
30651 Mapping of Urban Micro-Climate in Lyon (France) by Integrating Complementary Predictors at Different Scales into Multiple Linear Regression Models

Authors: Lucille Alonso, Florent Renard

Abstract:

The characterizations of urban heat island (UHI) and their interactions with climate change and urban climates are the main research and public health issue, due to the increasing urbanization of the population. These solutions require a better knowledge of the UHI and micro-climate in urban areas, by combining measurements and modelling. This study is part of this topic by evaluating microclimatic conditions in dense urban areas in the Lyon Metropolitan Area (France) using a combination of data traditionally used such as topography, but also from LiDAR (Light Detection And Ranging) data, Landsat 8 satellite observation and Sentinel and ground measurements by bike. These bicycle-dependent weather data collections are used to build the database of the variable to be modelled, the air temperature, over Lyon’s hyper-center. This study aims to model the air temperature, measured during 6 mobile campaigns in Lyon in clear weather, using multiple linear regressions based on 33 explanatory variables. They are of various categories such as meteorological parameters from remote sensing, topographic variables, vegetation indices, the presence of water, humidity, bare soil, buildings, radiation, urban morphology or proximity and density to various land uses (water surfaces, vegetation, bare soil, etc.). The acquisition sources are multiple and come from the Landsat 8 and Sentinel satellites, LiDAR points, and cartographic products downloaded from an open data platform in Greater Lyon. Regarding the presence of low, medium, and high vegetation, the presence of buildings and ground, several buffers close to these factors were tested (5, 10, 20, 25, 50, 100, 200 and 500m). The buffers with the best linear correlations with air temperature for ground are 5m around the measurement points, for low and medium vegetation, and for building 50m and for high vegetation is 100m. The explanatory model of the dependent variable is obtained by multiple linear regression of the remaining explanatory variables (Pearson correlation matrix with a |r| < 0.7 and VIF with < 5) by integrating a stepwise sorting algorithm. Moreover, holdout cross-validation is performed, due to its ability to detect over-fitting of multiple regression, although multiple regression provides internal validation and randomization (80% training, 20% testing). Multiple linear regression explained, on average, 72% of the variance for the study days, with an average RMSE of only 0.20°C. The impact on the model of surface temperature in the estimation of air temperature is the most important variable. Other variables are recurrent such as distance to subway stations, distance to water areas, NDVI, digital elevation model, sky view factor, average vegetation density, or building density. Changing urban morphology influences the city's thermal patterns. The thermal atmosphere in dense urban areas can only be analysed on a microscale to be able to consider the local impact of trees, streets, and buildings. There is currently no network of fixed weather stations sufficiently deployed in central Lyon and most major urban areas. Therefore, it is necessary to use mobile measurements, followed by modelling to characterize the city's multiple thermal environments.

Keywords: air temperature, LIDAR, multiple linear regression, surface temperature, urban heat island

Procedia PDF Downloads 107
30650 Optimizing Nitrogen Fertilizer Application in Rice Cultivation: A Decision Model for Top and Ear Dressing Dosages

Authors: Ya-Li Tsai

Abstract:

Nitrogen is a vital element crucial for crop growth, significantly influencing crop yield. In rice cultivation, farmers often apply substantial nitrogen fertilizer to maximize yields. However, excessive nitrogen application increases the risk of lodging and pest infestation, leading to yield losses. Additionally, conventional flooded irrigation methods consume significant water resources, necessitating precise agricultural and intelligent water management systems. In this study, it leveraged physiological data and field images captured by unmanned aerial vehicles, considering fertilizer treatment and irrigation as key factors. Statistical models incorporating rice physiological data, yield, and vegetation indices from image data were developed. Missing physiological data were addressed using multiple imputation and regression methods, and regression models were established using principal component analysis and stepwise regression. Target nitrogen accumulation at key growth stages was identified to optimize fertilizer application, with the difference between actual and target nitrogen accumulation guiding recommendations for ear dressing dosage. Field experiments conducted in 2022 validated the recommended ear dressing dosage, demonstrating no significant difference in final yield compared to traditional fertilizer levels under alternate wetting and drying irrigation. These findings highlight the efficacy of applying recommended dosages based on fertilizer decision models, offering the potential for reduced fertilizer use while maintaining yield in rice cultivation.

Keywords: intelligent fertilizer management, nitrogen top and ear dressing fertilizer, rice, yield optimization

Procedia PDF Downloads 27
30649 Competition between Regression Technique and Statistical Learning Models for Predicting Credit Risk Management

Authors: Chokri Slim

Abstract:

The objective of this research is attempting to respond to this question: Is there a significant difference between the regression model and statistical learning models in predicting credit risk management? A Multiple Linear Regression (MLR) model was compared with neural networks including Multi-Layer Perceptron (MLP), and a Support vector regression (SVR). The population of this study includes 50 listed Banks in Tunis Stock Exchange (TSE) market from 2000 to 2016. Firstly, we show the factors that have significant effect on the quality of loan portfolios of banks in Tunisia. Secondly, it attempts to establish that the systematic use of objective techniques and methods designed to apprehend and assess risk when considering applications for granting credit, has a positive effect on the quality of loan portfolios of banks and their future collectability. Finally, we will try to show that the bank governance has an impact on the choice of methods and techniques for analyzing and measuring the risks inherent in the banking business, including the risk of non-repayment. The results of empirical tests confirm our claims.

Keywords: credit risk management, multiple linear regression, principal components analysis, artificial neural networks, support vector machines

Procedia PDF Downloads 121
30648 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs

Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa

Abstract:

Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders cause to increase or decrease the hatchability. This study aims to identify the affecting extrinsic parameters on the commercial hatchability of local chicken's eggs and determine the most efficient classification model with a hatchability rate greater than 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeders age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the most influencing variable on hatchability. First, the correlation between each parameter and hatchability were checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify the hatchability. This grouping process was conducted using binary classification techniques. Hatchability was negatively correlated with egg weight, breeders' age, shell width, shell length, and positive correlations were identified with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than single linear models regarding the highest coefficient of determination (R²) with 94% and minimum AIC and BIC values. According to the classification results, RF, CART, and kNN had performed the highest accuracy values 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, the RF is the most appropriate machine learning algorithm for classifying the breeder outcomes, which are economically profitable or not, in a commercial hatchery.

Keywords: classification models, egg weight, fertilised eggs, multiple linear regression

Procedia PDF Downloads 58
30647 Regional Flood Frequency Analysis in Narmada Basin: A Case Study

Authors: Ankit Shah, R. K. Shrivastava

Abstract:

Flood and drought are two main features of hydrology which affect the human life. Floods are natural disasters which cause millions of rupees’ worth of damage each year in India and the whole world. Flood causes destruction in form of life and property. An accurate estimate of the flood damage potential is a key element to an effective, nationwide flood damage abatement program. Also, the increase in demand of water due to increase in population, industrial and agricultural growth, has let us know that though being a renewable resource it cannot be taken for granted. We have to optimize the use of water according to circumstances and conditions and need to harness it which can be done by construction of hydraulic structures. For their safe and proper functioning of hydraulic structures, we need to predict the flood magnitude and its impact. Hydraulic structures play a key role in harnessing and optimization of flood water which in turn results in safe and maximum use of water available. Mainly hydraulic structures are constructed on ungauged sites. There are two methods by which we can estimate flood viz. generation of Unit Hydrographs and Flood Frequency Analysis. In this study, Regional Flood Frequency Analysis has been employed. There are many methods for estimating the ‘Regional Flood Frequency Analysis’ viz. Index Flood Method. National Environmental and Research Council (NERC Methods), Multiple Regression Method, etc. However, none of the methods can be considered universal for every situation and location. The Narmada basin is located in Central India. It is drained by most of the tributaries, most of which are ungauged. Therefore it is very difficult to estimate flood on these tributaries and in the main river. As mentioned above Artificial Neural Network (ANN)s and Multiple Regression Method is used for determination of Regional flood Frequency. The annual peak flood data of 20 sites gauging sites of Narmada Basin is used in the present study to determine the Regional Flood relationships. Homogeneity of the considered sites is determined by using the Index Flood Method. Flood relationships obtained by both the methods are compared with each other, and it is found that ANN is more reliable than Multiple Regression Method for the present study area.

Keywords: artificial neural network, index flood method, multi layer perceptrons, multiple regression, Narmada basin, regional flood frequency

Procedia PDF Downloads 387
30646 Non-Methane Hydrocarbons Emission during the Photocopying Process

Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Kecić S. Vesna, Oros B. Ivana

Abstract:

The prosperity of electronic equipment in photocopying environment not only has improved work efficiency, but also has changed indoor air quality. Considering the number of photocopying employed, indoor air quality might be worse than in general office environments. Determining the contribution from any type of equipment to indoor air pollution is a complex matter. Non-methane hydrocarbons are known to have an important role of air quality due to their high reactivity. The presence of hazardous pollutants in indoor air has been detected in one photocopying shop in Novi Sad, Serbia. Air samples were collected and analyzed for five days, during 8-hr working time in three-time intervals, whereas three different sampling points were determined. Using multiple linear regression model and software package STATISTICA 10 the concentrations of occupational hazards and micro-climates parameters were mutually correlated. Based on the obtained multiple coefficients of determination (0.3751, 0.2389, and 0.1975), a weak positive correlation between the observed variables was determined. Small values of parameter F indicated that there was no statistically significant difference between the concentration levels of non-methane hydrocarbons and micro-climates parameters. The results showed that variable could be presented by the general regression model: y = b0 + b1xi1+ b2xi2. Obtained regression equations allow to measure the quantitative agreement between the variation of variables and thus obtain more accurate knowledge of their mutual relations.

Keywords: non-methane hydrocarbons, photocopying process, multiple regression analysis, indoor air quality, pollutant emission

Procedia PDF Downloads 350
30645 Housing Price Prediction Using Machine Learning Algorithms: The Case of Melbourne City, Australia

Authors: The Danh Phan

Abstract:

House price forecasting is a main topic in the real estate market research. Effective house price prediction models could not only allow home buyers and real estate agents to make better data-driven decisions but may also be beneficial for the property policymaking process. This study investigates the housing market by using machine learning techniques to analyze real historical house sale transactions in Australia. It seeks useful models which could be deployed as an application for house buyers and sellers. Data analytics show a high discrepancy between the house price in the most expensive suburbs and the most affordable suburbs in the city of Melbourne. In addition, experiments demonstrate that the combination of Stepwise and Support Vector Machine (SVM), based on the Mean Squared Error (MSE) measurement, consistently outperforms other models in terms of prediction accuracy.

Keywords: house price prediction, regression trees, neural network, support vector machine, stepwise

Procedia PDF Downloads 188
30644 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze big data increasing exponentially with traditional technologies. Hadoop is a new technology to make that possible. R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with different sizes of actual data. Experimental results showed our RHadoop system was much faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and big lm packages available on big memory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 407
30643 Analysis of Ferroresonant Overvoltages in Cable-fed Transformers

Authors: George Eduful, Ebenezer A. Jackson, Kingsford A. Atanga

Abstract:

This paper investigates the impacts of cable length and capacity of transformer on ferroresonant overvoltage in cable-fed transformers. The study was conducted by simulation using the EMTP RV. Results show that ferroresonance can cause dangerous overvoltages ranging from 2 to 5 per unit. These overvoltages impose stress on insulations of transformers and cables and subsequently result in system failures. Undertaking Basic Multiple Regression Analysis (BMR) on the results obtained, a statistical model was obtained in terms of cable length and transformer capacity. The model is useful for ferroresonant prediction and control in cable-fed transformers.

Keywords: ferroresonance, cable-fed transformers, EMTP RV, regression analysis

Procedia PDF Downloads 497
30642 Model-Based Software Regression Test Suite Reduction

Authors: Shiwei Deng, Yang Bao

Abstract:

In this paper, we present a model-based regression test suite reducing approach that uses EFSM model dependence analysis and probability-driven greedy algorithm to reduce software regression test suites. The approach automatically identifies the difference between the original model and the modified model as a set of elementary model modifications. The EFSM dependence analysis is performed for each elementary modification to reduce the regression test suite, and then the probability-driven greedy algorithm is adopted to select the minimum set of test cases from the reduced regression test suite that cover all interaction patterns. Our initial experience shows that the approach may significantly reduce the size of regression test suites.

Keywords: dependence analysis, EFSM model, greedy algorithm, regression test

Procedia PDF Downloads 398
30641 Exploration and Evaluation of the Effect of Multiple Countermeasures on Road Safety

Authors: Atheer Al-Nuaimi, Harry Evdorides

Abstract:

Every day many people die or get disabled or injured on roads around the world, which necessitates more specific treatments for transportation safety issues. International road assessment program (iRAP) model is one of the comprehensive road safety models which accounting for many factors that affect road safety in a cost-effective way in low and middle income countries. In iRAP model road safety has been divided into five star ratings from 1 star (the lowest level) to 5 star (the highest level). These star ratings are based on star rating score which is calculated by iRAP methodology depending on road attributes, traffic volumes and operating speeds. The outcome of iRAP methodology are the treatments that can be used to improve road safety and reduce fatalities and serious injuries (FSI) numbers. These countermeasures can be used separately as a single countermeasure or mix as multiple countermeasures for a location. There is general agreement that the adequacy of a countermeasure is liable to consistent losses when it is utilized as a part of mix with different countermeasures. That is, accident diminishment appraisals of individual countermeasures cannot be easily added together. The iRAP model philosophy makes utilization of a multiple countermeasure adjustment factors to predict diminishments in the effectiveness of road safety countermeasures when more than one countermeasure is chosen. A multiple countermeasure correction factors are figured for every 100-meter segment and for every accident type. However, restrictions of this methodology incorporate a presumable over-estimation in the predicted crash reduction. This study aims to adjust this correction factor by developing new models to calculate the effect of using multiple countermeasures on the number of fatalities for a location or an entire road. Regression models have been used to establish relationships between crash frequencies and the factors that affect their rates. Multiple linear regression, negative binomial regression, and Poisson regression techniques were used to develop models that can address the effectiveness of using multiple countermeasures. Analyses are conducted using The R Project for Statistical Computing showed that a model developed by negative binomial regression technique could give more reliable results of the predicted number of fatalities after the implementation of road safety multiple countermeasures than the results from iRAP model. The results also showed that the negative binomial regression approach gives more precise results in comparison with multiple linear and Poisson regression techniques because of the overdispersion and standard error issues.

Keywords: international road assessment program, negative binomial, road multiple countermeasures, road safety

Procedia PDF Downloads 211
30640 A Case Comparative Study of Infant Mortality Rate in North-West Nigeria

Authors: G. I. Onwuka, A. Danbaba, S. U. Gulumbe

Abstract:

This study investigated of Infant Mortality Rate as observed at a general hospital in Kaduna-South, Kaduna State, North West Nigeria. The causes of infant Mortality were examined. The data used for this analysis were collected at the statistics unit of the Hospital. The analysis was carried out on the data using Multiple Linear regression Technique and this showed that there is linear relationship between the dependent variable (death) and the independent variables (malaria, measles, anaemia, and coronary heart disease). The resultant model also revealed that a unit increment in each of these diseases would result to a unit increment in death recorded, 98.7% of the total variation in mortality is explained by the given model. The highest number of mortality was recorded in July, 2005 and the lowest mortality recorded in October, 2009.Recommendations were however made based on the results of the study.

Keywords: infant mortality rate, multiple linear regression, diseases, serial correlation

Procedia PDF Downloads 300
30639 Determination of Genetic Markers, Microsatellites Type, Liked to Milk Production Traits in Goats

Authors: Mohamed Fawzy Elzarei, Yousef Mohammed Al-Dakheel, Ali Mohamed Alseaf

Abstract:

Modern molecular techniques, like single marker analysis for linked traits to these markers, can provide us with rapid and accurate genetic results. In the last two decades of the last century, the applications of molecular techniques were reached a faraway point in cattle, sheep, and pig. In goats, especially in our region, the application of molecular techniques is still far from other species. As reported by many researchers, microsatellites marker is one of the suitable markers for lie studies. The single marker linked to traits of interest is one technique allowed us to early select animals without the necessity for mapping the entire genome. Simplicity, applicability, and low cost of this technique gave this technique a wide range of applications in many areas of genetics and molecular biology. Also, this technique provides a useful approach for evaluating genetic differentiation, particularly in populations that are poorly known genetically. The expected breeding value (EBV) and yield deviation (YD) are considered as the most parameters used for studying the linkage between quantitative characteristics and molecular markers, since these values are raw data corrected for the non-genetic factors. A total of 17 microsatellites markers (from chromosomes 6, 14, 18, 20 and 23) were used in this study to search for areas that could be responsible for genetic variability for some milk traits and search of chromosomal regions that explain part of the phenotypic variance. Results of single-marker analyses were used to identify the linkage between microsatellite markers and variation in EBVs of these traits, Milk yield, Protein percentage, Fat percentage, Litter size and weight at birth, and litter size and weight at weaning. The estimates of the parameters from forward and backward solutions using stepwise regression procedure on milk yield trait, only two markers, OARCP9 and AGLA29, showed a highly significant effect (p≤0.01) in backward and forward solutions. The forward solution for different equations conducted that R2 of these equations were highly depending on only two partials regressions coefficient (βi,) for these markers. For the milk protein trait, four marker showed significant effect BMS2361, CSSM66 (p≤0.01), BMS2626, and OARCP9 (p≤0.05). By the other way, four markers (MCM147, BM1225, INRA006, andINRA133) showed highly significant effect (p≤0.01) in both backward and forward solutions in association with milk fat trait. For both litter size at birth and at weaning traits, only one marker (BM143(p≤0.01) and RJH1 (p≤0.05), respectively) showed a significant effect in backward and forward solutions. The estimates of the parameters from forward and backward solution using stepwise regression procedure on litter weight at birth (LWB) trait only one marker (MCM147) showed highly significant effect (p≤0.01) and two marker (ILSTS011, CSSM66) showed a significant effect (p≤0.05) in backward and forward solutions.

Keywords: microsatellites marker, estimated breeding value, stepwise regression, milk traits

Procedia PDF Downloads 53
30638 Public Preferences for Lung Cancer Screening in China: A Discrete Choice Experiment

Authors: Zixuan Zhao, Lingbin Du, Le Wang, Youqing Wang, Yi Yang, Jingjun Chen, Hengjin Dong

Abstract:

Objectives: Few results from public attitudes for lung cancer screening are available both in China and abroad. This study aimed to identify preferred lung cancer screening modalities in a Chinese population and predict uptake rates of different modalities. Materials and Methods: A discrete choice experiment questionnaire was administered to 392 Chinese individuals aged 50–74 years who were at high risk for lung cancer. Each choice set had two lung screening options and an option to opt-out, and respondents were asked to choose the most preferred one. Both mixed logit analysis and stepwise logistic analysis were conducted to explore whether preferences were related to respondent characteristics and identify which kinds of respondents were more likely to opt out of any screening. Results: On mixed logit analysis, attributes that were predictive of choice at 1% level of statistical significance included the screening interval, screening venue, and out-of-pocket costs. The preferred screening modality seemed to be screening by low-dose computed tomography (LDCT) + blood test once a year in a general hospital at a cost of RMB 50; this could increase the uptake rate by 0.40 compared to the baseline setting. On stepwise logistic regression, those with no endowment insurance were more likely to opt out; those who were older and housewives/househusbands, and those with a health check habit and with commercial endowment insurance were less likely to opt out from a screening programme. Conclusions: There was considerable variance between real risk and self-perceived risk of lung cancer among respondents, and further research is required in this area. Lung cancer screening uptake can be increased by offering various screening modalities, so as to help policymakers further design the screening modality.

Keywords: lung cancer, screening, China., discrete choice experiment

Procedia PDF Downloads 219
30637 A DFT-Based QSARs Study of Kovats Retention Indices of Adamantane Derivatives

Authors: Z. Bayat

Abstract:

A quantitative structure–property relationship (QSPR) study was performed to develop models those relate the structures of 65 Kovats retention index (RI) of adamantane derivatives. Molecular descriptors derived solely from 3D structures of the molecular compounds. The usefulness of the quantum chemical descriptors, calculated at the level of the DFT theories using 6-311+G** basis set for QSAR study of adamantane derivatives was examined. The use of descriptors calculated only from molecular structure eliminates the need to experimental determination of properties for use in the correlation and allows for the estimation of RI for molecules not yet synthesized. The prediction results are in good agreement with the experimental value. A multi-parametric equation containing maximum Four descriptors at B3LYP/6-31+G** method with good statistical qualities (R2train=0.913, Ftrain=97.67, R2test=0.770, Ftest=3.21, Q2LOO=0.895, R2adj=0.904, Q2LGO=0.844) was obtained by Multiple Linear Regression using stepwise method.

Keywords: DFT, adamantane, QSAR, Kovat

Procedia PDF Downloads 339
30636 Blood Pressure and Anthropometric Measurements: A Correlational Study

Authors: Abdul-Monim Batiha, Manar AlAzzam, Mohammed ALBashtawy, Loai Tawalbeh, Ahmad Tubaishat, Fadwa N. Alhalaiqa

Abstract:

Background: Obesity is the major modifiable risk factor for many chronic illnesses especially high blood pressure. Objectives: To evaluate the relationship between anthropometric indices and high blood pressure, and which one was most strongly correlated with high blood pressure in Jordanian population. Methods: A cross-sectional study was conducted with a total 622 students and workers from three Jordanian universities. Results: Nearly half of the participant are overweight (34.7%) and obese (15.4%) and hypertension was detected among 138 (22.2%) of the participants. Linear correlation was significant (p<0.01) between both systolic blood pressure and diastolic blood pressure for all anthropometric indices, except for A body shape index and diastolic blood pressure was significant at p< 0.05. Stepwise multiple linear regression analysis was used to assess the influence of age and anthropometric measurements. Conclusions: The waist circumference was the only independent predictor of hypertension, showing that this simple measurement may be an importance marker of high blood pressure in Jordanian population.

Keywords: anthropometric indices, Jordan, blood pressure, cross-sectional study, obesity, hypertension, waist circumference

Procedia PDF Downloads 262
30635 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models

Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti

Abstract:

In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.

Keywords: indian premier league (IPL), cricket, score prediction, machine learning, support vector machines (SVM), xgboost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics

Procedia PDF Downloads 19
30634 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

The logistic regression and Cox regression models (proportional hazard model) at present are being employed in the analysis of prospective epidemiologic research looking into risk factors in their application on chronic diseases. However, a theoretical relationship between the two models has been studied. By definition, Cox regression model also called Cox proportional hazard model is a procedure that is used in modeling data regarding time leading up to an event where censored cases exist. Whereas the Logistic regression model is mostly applicable in cases where the independent variables consist of numerical as well as nominal values while the resultant variable is binary (dichotomous). Arguments and findings of many researchers focused on the overview of Cox and Logistic regression models and their different applications in different areas. In this work, the analysis is done on secondary data whose source is SPSS exercise data on BREAST CANCER with a sample size of 1121 women where the main objective is to show the application difference between Cox regression model and logistic regression model based on factors that cause women to die due to breast cancer. Thus we did some analysis manually i.e. on lymph nodes status, and SPSS software helped to analyze the mentioned data. This study found out that there is an application difference between Cox and Logistic regression models which is Cox regression model is used if one wishes to analyze data which also include the follow-up time whereas Logistic regression model analyzes data without follow-up-time. Also, they have measurements of association which is different: hazard ratio and odds ratio for Cox and logistic regression models respectively. A similarity between the two models is that they are both applicable in the prediction of the upshot of a categorical variable i.e. a variable that can accommodate only a restricted number of categories. In conclusion, Cox regression model differs from logistic regression by assessing a rate instead of proportion. The two models can be applied in many other researches since they are suitable methods for analyzing data but the more recommended is the Cox, regression model.

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

Procedia PDF Downloads 423
30633 Behind Fuzzy Regression Approach: An Exploration Study

Authors: Lavinia B. Dulla

Abstract:

The exploration study of the fuzzy regression approach attempts to present that fuzzy regression can be used as a possible alternative to classical regression. It likewise seeks to assess the differences and characteristics of simple linear regression and fuzzy regression using the width of prediction interval, mean absolute deviation, and variance of residuals. Based on the simple linear regression model, the fuzzy regression approach is worth considering as an alternative to simple linear regression when the sample size is between 10 and 20. As the sample size increases, the fuzzy regression approach is not applicable to use since the assumption regarding large sample size is already operating within the framework of simple linear regression. Nonetheless, it can be suggested for a practical alternative when decisions often have to be made on the basis of small data.

Keywords: fuzzy regression approach, minimum fuzziness criterion, interval regression, prediction interval

Procedia PDF Downloads 258
30632 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States

Authors: Yakubu Suleiman, S. A. Musa

Abstract:

Nigeria is the largest cowpea producer in the world, accounting for about 45%, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, Borno in the north, Oyo in the west, and to the lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of Cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to Cowpea farmers in the three states mentioned above of the study area. The data collected were analyzed using regression analysis (Cobb–Douglass production function model). The result of the regression analysis revealed the coefficient of multiple determinations, R2, to be 72.5% and the F ration to be 106.20 and was found to be significant (P < 0.01). The regression coefficient of constant is 0.5382 and is significant (P < 0.01). The regression coefficient with respect to labor and seeds were 0.65554 and 0.4336, respectively, and they are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341 which is significant (P < 0.05). This implies that a unit increase of any one of the variable inputs used while holding all other variables inputs constants, will significantly increase the total Cowpea output by their corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The result revealed that Cowpea farmer in Kano, Jigawa and Katsina States realized a profit of N15,997, N34,016 and N19,788 per hectare respectively. It is hereby recommended that more attention should be given to Cowpea production by government and research institutions.

Keywords: coefficient, constant, inputs, regression

Procedia PDF Downloads 387
30631 An In-Depth Inquiry into the Impact of Poor Teacher-Student Relationships on Chronic Absenteeism in Secondary Schools of West Java Province, Indonesia

Authors: Yenni Anggrayni

Abstract:

The lack of awareness of the significant prevalence of school absenteeism in Indonesia, which ultimately results in high rates of school dropouts, is an unresolved issue. Therefore, this study aims to investigate the root causes of chronic absenteeism qualitatively and quantitatively using the bioecological systems paradigm in secondary schools for any reason. This study used an open-ended questionnaire to collect data from 1,148 students in six West Java Province districts/cities. Univariate and stepwise multiple logistic regression analyses produced a prediction model for the components. Analysis results show that poor teacher-student relationships, bullying by peers or teachers, negative perception of education, and lack of parental involvement in learning activities are the leading causes of chronic absenteeism. Another finding is to promote home-school partnerships to improve school climate and parental involvement in learning to address chronic absenteeism.

Keywords: bullying, chronic absenteeism, dropout of school, home-school partnerships, parental involvement

Procedia PDF Downloads 34
30630 SVM-Based Modeling of Mass Transfer Potential of Multiple Plunging Jets

Authors: Surinder Deswal, Mahesh Pal

Abstract:

The paper investigates the potential of support vector machines based regression approach to model the mass transfer capacity of multiple plunging jets, both vertical (θ = 90°) and inclined (θ = 60°). The data set used in this study consists of four input parameters with a total of eighty eight cases. For testing, tenfold cross validation was used. Correlation coefficient values of 0.971 and 0.981 (root mean square error values of 0.0025 and 0.0020) were achieved by using polynomial and radial basis kernel functions based support vector regression respectively. Results suggest an improved performance by radial basis function in comparison to polynomial kernel based support vector machines. The estimated overall mass transfer coefficient, by both the kernel functions, is in good agreement with actual experimental values (within a scatter of ±15 %); thereby suggesting the utility of support vector machines based regression approach.

Keywords: mass transfer, multiple plunging jets, support vector machines, ecological sciences

Procedia PDF Downloads 424
30629 Reliability of Using Standard Penetration Test (SPT) in Evaluation of Soil Properties

Authors: Hossein Alimohammadi, Mohsen Amirmojahedi, Mehrdad Rowhani

Abstract:

Soil properties are used by geotechnical engineers to evaluate and analyze site conditions for designing purposes. Although basic soil classification tests are easy to perform and provide useful information to determine the properties of soils, it may take time to get the result and add some costs to the projects. Standard Penetration Test (SPT) provides an opportunity to evaluate soil parameters without performing laboratory tests. In addition to its simplicity and cheapness, the results become available immediately. This research provides a guideline on the application of the SPT test method, reliability of adapting the SPT test results in evaluating soil physical and mechanical properties such as Atterberg limits, shear strength, and compressive strength compressibility parameters. A total of 70 boreholes were investigated in this study by taking soil samples between depths of 1.2 to 15.25 meters. The project site was located in Morrow County, Ohio. A regression-based formula was proposed based on Tobit regression with a stepwise variable selection analysis conducted between SPT and other typical soil properties obtained from soil tests. The results of the research illustrated that the shear strength and physical properties of the soil affect the SPT number. The proposed correlation can help engineers to use SPT test results in their design with higher accuracy.

Keywords: standard penetration test, soil properties, soil classification, regression method

Procedia PDF Downloads 159
30628 Predictors of School Safety Awareness among Malaysian Primary School Teachers

Authors: Ssekamanya, Mastura Badzis, Khamsiah Ismail, Dayang Shuzaidah Bt Abduludin

Abstract:

With rising incidents of school violence worldwide, educators and researchers are trying to understand and find ways to enhance the safety of children at school. The purpose of this study was to investigate the extent to which the demographic variables of gender, age, length of service, position, academic qualification, and school location predicted teachers’ awareness about school safety practices in Malaysian primary schools. A stratified random sample of 380 teachers was selected in the central Malaysian states of Kuala Lumpur and Selangor. Multiple regression analysis revealed that none of the factors was a good predictor of awareness about school safety training, delivery methods of school safety information, and available school safety programs. Awareness about school safety activities was significantly predicted by school location (whether the school was located in a rural or urban area). While these results may reflect a general lack of awareness about school safety among primary school teachers in the selected locations, a national study needs to be conducted for the whole country.

Keywords: school safety awareness, predictors of school safety, multiple regression analysis, malaysian primary schools

Procedia PDF Downloads 418