Search results for: multiple regression analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30586

30586 Internet Purchases in European Union Countries: Multiple Linear Regression Approach

Authors: Ksenija Dumičić, Anita Čeh Časni, Irena Palić

Abstract:

This paper examines the influence of economic and Information and Communication Technology (ICT) development on the recently increasing Internet purchases by individuals in European Union member states. After a growing trend in Internet purchases in the EU27 was noticed, all-possible-regressions analysis was applied using nine independent variables for 2011. Finally, two linear regression models were studied in detail. The simple linear regression analysis confirmed the research hypothesis that Internet purchases in the analysed EU countries are positively correlated with the statistically significant variable Gross Domestic Product per capita (GDPpc). In addition, the analysed multiple linear regression model with four regressors reflecting the level of ICT development indicates that ICT development is crucial for explaining Internet purchases by individuals, confirming the research hypothesis.
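
As a rough illustration of what an all-possible-regressions search over nine independent variables involves, the sketch below enumerates every non-empty subset of regressors that would have to be fitted. The variable names are hypothetical placeholders, since the abstract does not list the nine variables, and the fitting step itself is omitted.

```python
from itertools import combinations


def all_subsets(names):
    """Yield every non-empty subset of regressors, smallest subsets first."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


# Hypothetical placeholder names; the abstract does not name the nine variables.
regressors = [f"x{i}" for i in range(1, 10)]
subsets = list(all_subsets(regressors))

# Each subset corresponds to one candidate regression model.
print(len(subsets))  # 511, i.e. 2**9 - 1
four_regressor_models = [s for s in subsets if len(s) == 4]
print(len(four_regressor_models))  # 126 candidate four-regressor models
```

The count grows as 2^p − 1, which is why exhaustive search is practical for nine variables but quickly becomes expensive for larger sets.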

Keywords: European union, Internet purchases, multiple linear regression model, outlier

Procedia PDF Downloads 275
30585 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

Authors: Jihye Jeon

Abstract:

This paper analyzes the conceptual framework of three statistical methods: multiple regression, path analysis, and structural equation models. When establishing a research model for the statistical modeling of complex social phenomena, it is important to know the strengths and limitations of these three statistical models. This study explored the characteristics, strengths, and limitations of each modeling approach and suggested some strategies for accurately explaining or predicting the causal relationships among variables. In particular, common mistakes of research modeling in studies of depression or mental health were discussed.

Keywords: multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon

Procedia PDF Downloads 598
30584 Multi-Linear Regression Based Prediction of Mass Transfer by Multiple Plunging Jets

Authors: S. Deswal, M. Pal

Abstract:

The paper aims to compare the performance of vertical and inclined multiple plunging jets and to model and predict their mass transfer capacity using a multi-linear regression based approach. The multiple vertical plunging jets have a jet impact angle of θ = 90°, whereas the multiple inclined plunging jets have a jet impact angle of θ = 60°. The results of the study suggest that mass transfer is higher for multiple jets, and that inclined multiple plunging jets have up to 1.6 times higher mass transfer than vertical multiple plunging jets under similar conditions. The derived relationship, based on the multi-linear regression approach, successfully predicted the volumetric mass transfer coefficient (KLa) from the operational parameters of multiple plunging jets with a correlation coefficient of 0.973, a root mean square error of 0.002 and a coefficient of determination of 0.946. The results suggest that the predicted overall mass transfer coefficient is in good agreement with the actual experimental values, which indicates that the derived multi-linear regression relationship is useful and can be successfully employed in modelling mass transfer by multiple plunging jets.
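
The two reported fit statistics can be cross-checked against each other. The sketch below assumes that the coefficient of determination is the square of the reported correlation coefficient (which holds for a least-squares fit with an intercept) and that both figures were rounded to three decimals; neither assumption is stated in the abstract.

```python
# Values as reported in the abstract (assumed rounded to three decimals).
r_reported = 0.973
r2_reported = 0.946

# Any unrounded r in [0.9725, 0.9735) would be reported as 0.973,
# so the implied R^2 lies between the squares of those bounds.
r2_low = 0.9725 ** 2
r2_high = 0.9735 ** 2

consistent = r2_low <= r2_reported <= r2_high
print(f"implied R^2 range: {r2_low:.4f} to {r2_high:.4f}; consistent: {consistent}")
```

Under these assumptions the reported pair is mutually consistent, with the small gap between 0.973² and 0.946 explained by rounding.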

Keywords: mass transfer, multiple plunging jets, multi-linear regression, earth sciences

Procedia PDF Downloads 424
30583 Interference among Lambsquarters and Oil Rapeseed Cultivars

Authors: Reza Siyami, Bahram Mirshekari

Abstract:

Seed and oil yield of rapeseed is considerably affected by weed interference, including mustard (Sinapis arvensis L.), lambsquarters (Chenopodium album L.) and redroot pigweed (Amaranthus retroflexus L.), throughout the East Azerbaijan province of Iran. To formulate the relationship between four independent growth variables measured in our experiment and a dependent variable, multiple regression analysis was carried out with the weed leaf number per plant (X1), green cover percentage (X2), LAI (X3) and leaf area per plant (X4) as independent variables and rapeseed oil yield as the dependent variable. The multiple regression equation is as follows: Seed essential oil yield (kg/ha) = 0.156 + 0.0325 (X1) + 0.0489 (X2) + 0.0415 (X3) + 0.133 (X4). Furthermore, stepwise regression analysis was carried out on the data to test the significance of the independent variables affecting the oil yield as the dependent variable. The resulting stepwise regression equation is as follows: Oil yield = 4.42 + 0.0841 (X2) + 0.0801 (X3); R² = 81.5%. The stepwise regression analysis verified that the green cover percentage and LAI of the weeds had a marked increasing effect on the oil yield of rapeseed.
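
The two fitted equations quoted above can be written as small functions for quick what-if calculations. This is only a sketch: the function names and the input values in the example are hypothetical, and the coefficients are copied from the abstract as printed.

```python
def full_model_oil_yield(x1, x2, x3, x4):
    """Four-variable equation from the abstract.

    x1: weed leaf number per plant, x2: green cover percentage,
    x3: LAI, x4: leaf area per plant.
    """
    return 0.156 + 0.0325 * x1 + 0.0489 * x2 + 0.0415 * x3 + 0.133 * x4


def stepwise_oil_yield(x2, x3):
    """Stepwise equation from the abstract (green cover percentage and LAI only)."""
    return 4.42 + 0.0841 * x2 + 0.0801 * x3


# Hypothetical inputs, chosen only to show how the functions are called.
print(stepwise_oil_yield(x2=10.0, x3=2.0))
print(full_model_oil_yield(x1=5.0, x2=10.0, x3=2.0, x4=1.0))
```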

Keywords: green cover percentage, independent variable, interference, regression

Procedia PDF Downloads 386
30582 Ketones Emission during Pad Printing Process

Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Oros B. Ivana, Kecić S. Vesna, Djogo Z. Maja

Abstract:

The paper investigates the effect of light intensity on the formation of two ketones, acetone and methyl ethyl ketone, in the working premises of five pad printing departments in Novi Sad, Serbia. Multiple linear regression analysis, performed with the Statistica software package (version 10), examined the form of interdependency between the concentrations of methyl ethyl ketone and acetone and light intensity in five printing presses at seven sampling points. The results show an average stacking variation of the investigated variables and can be presented by the general regression model: $y = b_0 + b_1 x_{i1} + b_2 x_{i2}$.
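
A minimal sketch of how the coefficients b0, b1 and b2 of a two-regressor model of this form can be estimated by ordinary least squares is given below. It solves the normal equations in plain Python instead of using Statistica, and the data are synthetic values generated from a known equation, not the paper's measurements; `solve` and `fit_ols` are illustrative helper names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return [b0, b1, ...] minimising the squared error of y = b0 + b1*x1 + ..."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic data generated from y = 1 + 2*x1 + 3*x2 (not measured values).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
b0, b1, b2 = fit_ols(rows, y)
print(b0, b1, b2)
```

Because the synthetic data contain no noise, the fit should recover the generating coefficients up to floating-point error.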

Keywords: acetone, methyl ethyl ketone, multiple linear regression analysis, pad printing

Procedia PDF Downloads 387
30581 A Preliminary Study of the Subcontractor Evaluation System for the International Construction Market

Authors: Hochan Seok, Woosik Jang, Seung-Heon Han

Abstract:

The stagnant global construction market has intensified competition since 2008 among firms that aim to win overseas contracts. Against this backdrop, subcontractor selection is identified as one of the most critical success factors in overseas construction projects. However, it is difficult to select qualified subcontractors due to the lack of evaluation standards and reliability. This study aims to identify the problems associated with existing subcontractor evaluations using correlation analysis and multiple regression analysis of pre-qualification and performance evaluation data for 121 firms in six countries.

Keywords: subcontractor evaluation system, pre-qualification, performance evaluation, correlation analysis, multiple regression analysis

Procedia PDF Downloads 333
30580 Quantitative Structure Activity Relationship and Insilco Docking of Substituted 1,3,4-Oxadiazole Derivatives as Potential Glucosamine-6-Phosphate Synthase Inhibitors

Authors: Suman Bala, Sunil Kamboj, Vipin Saini

Abstract:

A Quantitative Structure Activity Relationship (QSAR) analysis has been developed to relate the antifungal activity of novel substituted 1,3,4-oxadiazoles against Candida albicans and Aspergillus niger using computer-assisted multiple regression analysis. The study has shown a better relationship between antifungal activity and the various descriptors established by multiple regression analysis. The analysis has shown statistically significant correlations, with R² values of 0.932 and 0.782 against Candida albicans and Aspergillus niger, respectively. These derivatives were further subjected to molecular docking studies to investigate the interactions between the target compounds and the amino acid residues present in the active site of glucosamine-6-phosphate synthase. All the synthesized compounds have better docking scores than the standard, fluconazole. Our results could be used for the further design and development of optimal and potential antifungal agents.

Keywords: 1, 3, 4-oxadiazole, QSAR, multiple linear regression, docking, glucosamine-6-phosphate synthase

Procedia PDF Downloads 304
30579 Application and Verification of Regression Model to Landslide Susceptibility Mapping

Authors: Masood Beheshtirad

Abstract:

Identification of regions with potential for landslide occurrence is one of the basic measures in natural resources management. Different landslide hazard mapping models have been proposed based on environmental conditions and goals. In this research, a landslide hazard map was produced using a multiple regression model, and the applicability of this model was investigated in the Baghdasht watershed. The dependent variable is the landslide inventory map, and the independent variables consist of information layers such as geology, slope, aspect, distance from river, distance from road, fault and land use. To this end, existing landslides were identified and an inventory map was made. The landslide hazard map is based on the resulting multiple regression. The level of similarity between the potential hazard classes and figures of this model and the landslide inventory map was assessed in the SPSS environment. The results showed that there is a significant correlation between the potential hazard classes and figures and the area of the landslides. The multiple regression model is suitable for application in the Baghdasht watershed.

Keywords: landslide, mapping, multiple model, regression

Procedia PDF Downloads 299
30578 Statistical Model of Water Quality in Estero El Macho, Machala-El Oro

Authors: Rafael Zhindon Almeida

Abstract:

Surface water quality is an important concern for the evaluation and prediction of water quality conditions. The objective of this study is to develop a statistical model that can accurately predict the water quality of the El Macho estuary in the city of Machala, El Oro province. The methodology employed in this study is of a basic type that involves a thorough search for theoretical foundations to improve the understanding of statistical modeling for water quality analysis. The research design is correlational, using a multivariate statistical model involving multiple linear regression and principal component analysis. The results indicate that water quality parameters such as fecal coliforms, biochemical oxygen demand, chemical oxygen demand, iron and dissolved oxygen exceed the allowable limits. The water of the El Macho estuary is determined to be below the required water quality criteria. The multiple linear regression model, based on chemical oxygen demand and total dissolved solids, explains 99.9% of the variance of the dependent variable. In addition, principal component analysis shows that the model has an explanatory power of 86.242%. The study successfully developed a statistical model to evaluate the water quality of the El Macho estuary. The estuary did not meet the water quality criteria, with several parameters exceeding the allowable limits. The multiple linear regression model and principal component analysis provide valuable information on the relationship between the various water quality parameters. The findings of the study emphasize the need for immediate action to improve the water quality of the El Macho estuary to ensure the preservation and protection of this valuable natural resource.

Keywords: statistical modeling, water quality, multiple linear regression, principal components, statistical models

Procedia PDF Downloads 44
30577 Multiobjective Optimization of a Pharmaceutical Formulation Using Regression Method

Authors: J. Satya Eswari, Ch. Venkateswarlu

Abstract:

The formulation of a commercial pharmaceutical product involves several composition factors and response characteristics. When the formulation must satisfy multiple conflicting response characteristics, finding an optimal solution requires an efficient multiobjective optimization technique. In this work, a regression model is combined with non-dominated sorting differential evolution (NSDE) involving Naïve & Slow and ε constraint techniques to derive different multiobjective optimization strategies, which are then evaluated by means of a trapidil pharmaceutical formulation. The analysis of the results shows the effectiveness of the strategy that combines the regression model and NSDE with the integration of both Naïve & Slow and ε constraint techniques for Pareto optimization of the trapidil formulation. With this strategy, the optimal formulation at pH = 6.8 is obtained with the decision variables of microcrystalline cellulose, hydroxypropyl methylcellulose and compression pressure. The corresponding response characteristics of rate constant and release order are also noted. The comparison of these results with the experimental data and with those of other multiple regression model based multiobjective evolutionary optimization strategies signifies the better performance of this strategy for optimal trapidil formulation.

Keywords: pharmaceutical formulation, multiple regression model, response surface method, radial basis function network, differential evolution, multiobjective optimization

Procedia PDF Downloads 380
30576 Optimization of Machine Learning Regression Results: An Application on Health Expenditures

Authors: Songul Cinaroglu

Abstract:

Machine learning regression methods are recommended as an alternative to classical regression methods when variables are difficult to model. Health expenditure data are typically non-normal and have a heavily skewed distribution. This study aims to compare machine learning regression methods with hyperparameter tuning to predict health expenditure per capita. A multiple regression model was fitted, and the performance results of Lasso Regression, Random Forest Regression and Support Vector Machine Regression were recorded for different hyperparameter settings. The lambda (λ) value for Lasso Regression, the number of trees for Random Forest Regression, and the epsilon (ε) value for Support Vector Regression were chosen as hyperparameters. Study results, obtained using k-fold cross-validation with k varied from 5 to 50, indicate that the differences between machine learning regression results in terms of R², RMSE and MAE values are statistically significant (p < 0.001). Study results reveal that Random Forest Regression (R² > 0.7500, RMSE ≤ 0.6000 and MAE ≤ 0.4000) outperforms the other machine learning regression methods. It is highly advisable to use machine learning regression methods for modelling health expenditures.
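
The evaluation protocol described here rests on two building blocks: splitting the data into k folds and scoring predictions with RMSE and MAE. The sketch below shows both in plain Python under simplifying assumptions (contiguous folds, no shuffling, no model fitting); the function names are illustrative and do not come from the study.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print(test, "held out;", len(train), "samples used for training")
```

In a full comparison, each candidate hyperparameter value would be fitted on every training split and scored on the matching held-out fold, with the fold scores averaged per candidate.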

Keywords: machine learning, lasso regression, random forest regression, support vector regression, hyperparameter tuning, health expenditure

Procedia PDF Downloads 187
30575 Form of Distribution of Traffic Accident and Environment Factors of Road Affecting of Traffic Accident in Dusit District, Only Area Responsible of Samsen Police Station

Authors: Musthaya Patchanee

Abstract:

This research aimed to study the form of traffic distribution and the environmental factors of roads that affect traffic accidents in Dusit District, limited to the area under the responsibility of Samsen Police Station. The data used in this analysis are secondary data on traffic accident cases from the year 2011. The observed area units are 15 traffic lines under the responsibility of Samsen Police Station. The techniques and methods used are the Cartographic Method, Correlation Analysis, and Multiple Regression Analysis. The results on the form of traffic accidents show that the Samsen Road area had the most traffic accidents (24.29%), followed by Rachvithi Road (18.10%), Sukhothai Road (15.71%), Rachasrima Road (12.38%), and Amnuaysongkram Road (7.62%). The results for the part of Dusit District under the responsibility of Samsen Police Station suggest that the scale of accidents has a high positive correlation, statistically significant at the 0.05 level, with the frequency of travel (r = 0.857). Traffic intersection points (r = 0.763) and traffic control equipment (r = 0.713) are the next most relevant factors, respectively. According to the Multiple Regression Analysis, travel frequency is the only factor that has considerable influence on traffic accidents in the part of Dusit District under Samsen Police Station. Frequency of travel explains 73.40% of the change in the scale of traffic accidents (R² = 0.734). The multiple regression equation obtained from the analysis was Ŷ = −7.977 + 0.044 X6.
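
Because the final model keeps a single predictor, its R² should equal the square of that predictor's correlation with the outcome, which gives a quick consistency check on the reported figures. The sketch below performs that check and evaluates the fitted line; the function name and the input value are hypothetical, and the units of X6 are not given in the abstract.

```python
R_TRAVEL_FREQUENCY = 0.857  # correlation reported for frequency of travel
INTERCEPT = -7.977
SLOPE = 0.044


def predicted_accident_scale(travel_frequency):
    """Evaluate the reported equation Y-hat = -7.977 + 0.044 * X6."""
    return INTERCEPT + SLOPE * travel_frequency


# With one predictor, R^2 is the squared correlation coefficient.
r_squared = R_TRAVEL_FREQUENCY ** 2
print(round(r_squared, 3))  # 0.734, matching the reported R^2

# Hypothetical travel frequency, used only to show the call.
print(predicted_accident_scale(1000))
```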

Keywords: form of traffic distribution, environmental factors of road, traffic accidents, Dusit district

Procedia PDF Downloads 354
30574 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

This study considers the selection of the most suitable in silico molecular descriptors that could be used for s-triazine pesticides characterization. Suitable descriptors among topological, geometrical and physicochemical are used for quantitative structure-retention relationships (QSRR) model establishment. Established models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, MLR models were established avoiding multicollinearity among the selected molecular descriptors. Statistical quality of established models was evaluated by standard and cross-validation statistical parameters. For detection of similarity or dissimilarity among investigated s-triazine pesticides and their classification, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar grouping. This study is financially supported by COST action TD1305.

Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis

Procedia PDF Downloads 352
30573 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that could help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that can minimize the total cost for car rental service. With two types of cars, there is an upgrade possibility for lower type to upper type. This makes the model more complex than one type of cars scenario. We have found that convexity can be proved in this case. Sensitivity analysis of the parameters is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from multiple linear regression model is relatively close to the optimal solution (with the adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from optimal solution is only 0.5 to 1 percent (on average) lower than the cost from regression model, while it is approximately 67% lower than the cost obtained by the naïve method. It indicates that our proposed simplification method using regression analysis can effectively perform in estimating the overbooking level.
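
The adjusted R-squared quoted for the regression simplification penalises the ordinary R² for the number of predictors. A minimal sketch of the standard formula follows; the sample size and predictor count in the example are hypothetical, since the abstract reports neither.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical values: R^2 = 0.75 from 50 observations and 3 predictors.
example = adjusted_r_squared(0.75, n_obs=50, n_predictors=3)
print(round(example, 4))
```

The adjusted value never exceeds the unadjusted one when at least one predictor is present, so it gives a more conservative measure of fit when comparing models with different numbers of independent variables.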

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 142
30572 A Study of Anthropometric Correlation between Upper and Lower Limb Dimensions in Sudanese Population

Authors: Altayeb Abdalla Ahmed

Abstract:

Skeletal phenotype is a product of a balanced interaction between genetics and environmental factors throughout different life stages. Therefore, interlimb proportions are variable between populations. Although interlimb proportion indices have been used in anthropology in assessing the influence of various environmental factors on limbs, an extensive literature review revealed a paucity of published research assessing interlimb part correlations and the possibility of reconstruction. Hence, this study aims to assess the relationships between upper and lower limb parts and develop regression formulae to reconstruct the parts from one another. The left upper arm length, ulnar length, wrist breadth, hand length, hand breadth, tibial length, bimalleolar breadth, foot length, and foot breadth of 376 right-handed subjects, comprising 187 males and 189 females (aged 25-35 years), were measured. Initially, the data were analyzed using basic univariate analysis and independent t-tests; then sex-specific simple and multiple linear regression models were used to estimate upper limb parts from lower limb parts and vice versa. The results of this study indicated significant sexual dimorphism for all variables. The results indicated a significant correlation between the upper and lower limb parts (p < 0.01). Linear and multiple (stepwise) regression equations were developed to reconstruct the limb parts in the presence of a single dimension or multiple dimensions from the other limb. Multiple stepwise regression equations generated better reconstructions than simple equations. These results are significant in forensics, as they can aid in the identification of multiple isolated limb parts, particularly during mass disasters and criminal dismemberment. Although DNA analysis is the most reliable tool for identification, its use has multiple limitations in undeveloped countries, e.g., cost, facility availability, and trained personnel.
Furthermore, the results have important implications for plastic and orthopedic reconstructive surgeries. This study is the only reported study assessing the correlation and prediction capabilities between many of the upper and lower limb dimensions. The present study demonstrates a significant correlation between the interlimb parts in both sexes, which indicates the possibility of reconstruction using regression equations.

Keywords: anthropometry, correlation, limb, Sudanese

Procedia PDF Downloads 260
30571 Competition between Regression Technique and Statistical Learning Models for Predicting Credit Risk Management

Authors: Chokri Slim

Abstract:

The objective of this research is to answer the following question: Is there a significant difference between the regression model and statistical learning models in predicting credit risk management? A Multiple Linear Regression (MLR) model was compared with neural networks, including a Multi-Layer Perceptron (MLP), and with Support Vector Regression (SVR). The population of this study comprises 50 banks listed on the Tunis Stock Exchange (TSE) from 2000 to 2016. Firstly, we show the factors that have a significant effect on the quality of the loan portfolios of banks in Tunisia. Secondly, we attempt to establish that the systematic use of objective techniques and methods designed to apprehend and assess risk when considering applications for credit has a positive effect on the quality of banks' loan portfolios and their future collectability. Finally, we try to show that bank governance has an impact on the choice of methods and techniques for analyzing and measuring the risks inherent in the banking business, including the risk of non-repayment. The results of empirical tests confirm our claims.

Keywords: credit risk management, multiple linear regression, principal components analysis, artificial neural networks, support vector machines

Procedia PDF Downloads 119
30570 The Potential Factors Relating to the Decision of Return Migration of Myanmar Migrant Workers: A Case Study in Prachuap Khiri Khan Province

Authors: Musthaya Patchanee

Abstract:

The aim of this research is to study potential factors relating to the decision of return migration of Myanmar migrant workers in Prachuap Khiri Khan Province by randomly sampling 400 people aged 15-59 who migrated from Myanmar. The information collected through interviews was analyzed to find percentages and means and by using Stepwise Multiple Regression Analysis. The results show that 33.25% of Myanmar migrant workers want to return to their home country within the next 1-5 years, 46.25% in 6-10 years, and the rest in over 10 years. The decision of return migration has a positive relationship, statistically significant at the 0.05 level, with conformity with friends and relatives (r = 0.886), relationship with family and community (r = 0.782), possession of land in the hometown (r = 0.756) and educational level (r = 0.699). However, property possession in Prachuap Khiri Khan is the only factor with a high negative relationship (r = −0.537). The Stepwise Multiple Regression Analysis shows that conformity with friends and relatives and educational level influence the decision of return migration of Myanmar migrant workers in Prachuap Khiri Khan Province; together they predict the decision at 86.60%, and the multiple regression equation from the analysis is Y = 6.744 + 1.198 conformity + 0.647 education.

Keywords: decision of return migration, factors of return migration, Myanmar migrant workers, Prachuap Khiri Khan Province

Procedia PDF Downloads 508
30569 A Study on the Conspicuous Consumption, Involvement and Physical and Mental Health of Pet Owners

Authors: Chi-Yueh Hsu, Hsuan-Liang Hsu, Hsiu-Hui Chiang

Abstract:

This study explores the relationship among conspicuous consumption, leisure involvement, and physical and mental health, and examines how well conspicuous consumption and leisure involvement predict physical and mental health. The data were collected by purposive sampling, and the research subjects were dog walkers in Taiwan. A total of 300 questionnaires were issued; after removing invalid questionnaires, 246 valid samples remained, for an effective rate of 82%. The data were analyzed by correlation analysis and multiple stepwise regression analysis. The results showed that there was a significant correlation between conspicuous consumption and leisure involvement, and that the conspicuous consumption and leisure involvement of dog walkers have a significant impact on physical and mental health; in particular, the self-expression, attractiveness and centrality dimensions of leisure involvement have a significant impact on physical and mental health.

Keywords: walking dog, attractiveness, self-expression, multiple stepwise regression analysis

Procedia PDF Downloads 219
30568 A Statistical Approach to Predict and Classify the Commercial Hatchability of Chickens Using Extrinsic Parameters of Breeders and Eggs

Authors: M. S. Wickramarachchi, L. S. Nawarathna, C. M. B. Dematawewa

Abstract:

Hatchery performance is critical for the profitability of poultry breeder operations. Some extrinsic parameters of eggs and breeders increase or decrease hatchability. This study aims to identify the extrinsic parameters affecting the commercial hatchability of local chicken eggs and to determine the most efficient classification model for a hatchability rate greater than 90%. In this study, seven extrinsic parameters were considered: egg weight, moisture loss, breeders' age, number of fertilised eggs, shell width, shell length, and shell thickness. Multiple linear regression was performed to determine the variable with the greatest influence on hatchability. First, the correlation between each parameter and hatchability was checked. Then a multiple regression model was developed, and the accuracy of the fitted model was evaluated. Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), k-Nearest Neighbors (kNN), Support Vector Machines (SVM) with a linear kernel, and Random Forest (RF) algorithms were applied to classify the hatchability. This grouping process was conducted using binary classification techniques. Hatchability was negatively correlated with egg weight, breeders' age, shell width, and shell length, and positively correlated with moisture loss, number of fertilised eggs, and shell thickness. Multiple linear regression models were more accurate than single linear models, with the highest coefficient of determination (R²) of 94% and minimum AIC and BIC values. According to the classification results, RF, CART, and kNN achieved the highest accuracy values of 0.99, 0.975, and 0.972, respectively, for the commercial hatchery process. Therefore, RF is the most appropriate machine learning algorithm for classifying breeder outcomes as economically profitable or not in a commercial hatchery.

Keywords: classification models, egg weight, fertilised eggs, multiple linear regression

Procedia PDF Downloads 56
30567 Regional Flood Frequency Analysis in Narmada Basin: A Case Study

Authors: Ankit Shah, R. K. Shrivastava

Abstract:

Floods and droughts are two main features of hydrology that affect human life. Floods are natural disasters that cause millions of rupees' worth of damage each year in India and around the world, destroying both life and property. An accurate estimate of the flood damage potential is a key element of an effective, nationwide flood damage abatement program. Also, the increase in water demand due to population, industrial and agricultural growth has shown that water, though a renewable resource, cannot be taken for granted. We have to optimize the use of water according to circumstances and conditions and need to harness it, which can be done by constructing hydraulic structures. For the safe and proper functioning of hydraulic structures, we need to predict the flood magnitude and its impact. Hydraulic structures play a key role in harnessing and optimizing flood water, which in turn results in safe and maximum use of the water available. Hydraulic structures are mainly constructed on ungauged sites. There are two methods by which floods can be estimated, viz. generation of Unit Hydrographs and Flood Frequency Analysis. In this study, Regional Flood Frequency Analysis has been employed. There are many methods for Regional Flood Frequency Analysis, viz. the Index Flood Method, National Environmental and Research Council (NERC) methods, the Multiple Regression Method, etc. However, none of the methods can be considered universal for every situation and location. The Narmada basin is located in Central India. It is drained by many tributaries, most of which are ungauged. Therefore, it is very difficult to estimate floods on these tributaries and in the main river. Artificial Neural Networks (ANNs) and the Multiple Regression Method are used to determine regional flood frequency.
The annual peak flood data of 20 gauging sites in the Narmada Basin are used in the present study to determine the regional flood relationships. Homogeneity of the considered sites is determined using the Index Flood Method. The flood relationships obtained by both methods are compared with each other, and ANN is found to be more reliable than the Multiple Regression Method for the present study area.

Keywords: artificial neural network, index flood method, multi layer perceptrons, multiple regression, Narmada basin, regional flood frequency

Procedia PDF Downloads 385
30566 Non-Methane Hydrocarbons Emission during the Photocopying Process

Authors: Kiurski S. Jelena, Aksentijević M. Snežana, Kecić S. Vesna, Oros B. Ivana

Abstract:

The proliferation of electronic equipment in photocopying environments has not only improved work efficiency but has also changed indoor air quality. Considering the number of photocopiers employed, indoor air quality might be worse than in general office environments. Determining the contribution of any type of equipment to indoor air pollution is a complex matter. Non-methane hydrocarbons are known to play an important role in air quality due to their high reactivity. The presence of hazardous pollutants in indoor air has been detected in one photocopying shop in Novi Sad, Serbia. Air samples were collected and analyzed for five days, during the 8-hour working time, in three time intervals and at three different sampling points. Using a multiple linear regression model and the software package STATISTICA 10, the concentrations of occupational hazards and microclimate parameters were mutually correlated. Based on the obtained multiple coefficients of determination (0.3751, 0.2389, and 0.1975), a weak positive correlation between the observed variables was determined. Small values of the parameter F indicated that there was no statistically significant difference between the concentration levels of non-methane hydrocarbons and microclimate parameters. The results showed that the variable could be presented by the general regression model: $y = b_0 + b_1 x_{i1} + b_2 x_{i2}$. The obtained regression equations make it possible to measure the quantitative agreement between the variation of the variables and thus to obtain more accurate knowledge of their mutual relations.

Keywords: non-methane hydrocarbons, photocopying process, multiple regression analysis, indoor air quality, pollutant emission

Procedia PDF Downloads 347
30565 Big Data Analysis with RHadoop

Authors: Ji Eun Shin, Byung Ho Jung, Dong Hoon Lim

Abstract:

It is almost impossible to store or analyze exponentially growing big data with traditional technologies. Hadoop is a newer technology that makes this possible. The R programming language is by far the most popular statistical tool for big data analysis based on distributed processing with Hadoop technology. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis on actual data sets of different sizes. Experimental results showed that our RHadoop system became much faster as the number of data nodes increased. We also compared the performance of our RHadoop system with the lm function and the biglm package available on bigmemory. The results showed that our RHadoop system was faster than the other packages owing to parallel processing, with the number of map tasks increasing as the size of the data increases.
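A common way to parallelize least-squares regression in a map/reduce setting is to have each map task reduce its chunk of rows to additive sufficient statistics, which the reduce step then sums. The sketch below illustrates that idea in plain Python for a single regressor. It is not the authors' RHadoop code, and all function names and data are hypothetical. For multiple regression the same pattern applies to the matrices X'X and X'y.

```python
from functools import reduce


def map_chunk(chunk):
    """Map step: summarize one chunk of (x, y) rows as sufficient statistics."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: statistics from two chunks simply add element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def solve(stats):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Invented data split into three chunks, standing in for data nodes.
rows = [(float(x), 2.0 + 3.0 * x) for x in range(10)]
chunks = [rows[0:4], rows[4:7], rows[7:10]]
intercept, slope = solve(reduce(combine, map(map_chunk, chunks)))
print(intercept, slope)
```

Because the statistics are additive, the result does not depend on how the rows are split, so more map tasks can be added as the data grow.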

Keywords: big data, Hadoop, parallel regression analysis, R, RHadoop

Procedia PDF Downloads 406
30564 Analysis of Ferroresonant Overvoltages in Cable-fed Transformers

Authors: George Eduful, Ebenezer A. Jackson, Kingsford A. Atanga

Abstract:

This paper investigates the impact of cable length and transformer capacity on ferroresonant overvoltage in cable-fed transformers. The study was conducted by simulation using EMTP RV. Results show that ferroresonance can cause dangerous overvoltages ranging from 2 to 5 per unit. These overvoltages impose stress on the insulation of transformers and cables and subsequently result in system failures. Basic Multiple Regression Analysis (BMR) of the simulation results yielded a statistical model in terms of cable length and transformer capacity. The model is useful for ferroresonance prediction and control in cable-fed transformers.

Keywords: ferroresonance, cable-fed transformers, EMTP RV, regression analysis

Procedia PDF Downloads 495
30563 Model-Based Software Regression Test Suite Reduction

Authors: Shiwei Deng, Yang Bao

Abstract:

In this paper, we present a model-based regression test suite reduction approach that uses EFSM model dependence analysis and a probability-driven greedy algorithm to reduce software regression test suites. The approach automatically identifies the difference between the original model and the modified model as a set of elementary model modifications. The EFSM dependence analysis is performed for each elementary modification to reduce the regression test suite, and the probability-driven greedy algorithm is then used to select, from the reduced regression test suite, a minimum set of test cases that covers all interaction patterns. Our initial experience shows that the approach may significantly reduce the size of regression test suites.
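The selection step described above resembles greedy set cover: repeatedly pick the test case that covers the most still-uncovered interaction patterns. The sketch below shows that plain greedy heuristic in Python. It is not the authors' probability-driven variant, and the function name, test names, and patterns are hypothetical. A greedy choice gives a small covering set but does not guarantee the true minimum.

```python
def greedy_reduce(test_coverage):
    """Select tests until every coverable pattern is covered.

    test_coverage maps a test name to the set of patterns it covers.
    Ties are broken alphabetically so the result is deterministic.
    """
    uncovered = set().union(*test_coverage.values())
    selected = []
    while uncovered:
        # max() keeps the first maximal item, so sorting fixes the tie-break.
        best = max(sorted(test_coverage),
                   key=lambda t: len(test_coverage[t] & uncovered))
        selected.append(best)
        uncovered -= test_coverage[best]
    return selected


# Invented example suite: four tests covering four interaction patterns.
suite = {
    "t1": {"p1", "p2"},
    "t2": {"p2", "p3"},
    "t3": {"p1", "p2", "p3"},
    "t4": {"p4"},
}
reduced = greedy_reduce(suite)
print(reduced)
```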

Keywords: dependence analysis, EFSM model, greedy algorithm, regression test

Procedia PDF Downloads 396
30562 Exploration and Evaluation of the Effect of Multiple Countermeasures on Road Safety

Authors: Atheer Al-Nuaimi, Harry Evdorides

Abstract:

Every day many people are killed, disabled, or injured on roads around the world, which calls for more specific treatments of transportation safety issues. The International Road Assessment Program (iRAP) model is one of the comprehensive road safety models that account for many factors affecting road safety in a cost-effective way in low- and middle-income countries. In the iRAP model, road safety is divided into five star ratings, from 1 star (the lowest level) to 5 stars (the highest level). These star ratings are based on a star rating score, which is calculated by the iRAP methodology from road attributes, traffic volumes, and operating speeds. The outcome of the iRAP methodology is a set of treatments that can be used to improve road safety and reduce the number of fatalities and serious injuries (FSI). These countermeasures can be used separately as single countermeasures or combined as multiple countermeasures for a location. There is general agreement that the effectiveness of a countermeasure is subject to consistent losses when it is used in combination with other countermeasures. That is, the crash reduction estimates of individual countermeasures cannot simply be added together. The iRAP methodology uses multiple countermeasure adjustment factors to predict reductions in the effectiveness of road safety countermeasures when more than one countermeasure is chosen. Multiple countermeasure correction factors are computed for every 100-meter segment and for every accident type. However, the limitations of this methodology include a probable over-estimation of the predicted crash reduction. This study aims to adjust this correction factor by developing new models to calculate the effect of using multiple countermeasures on the number of fatalities for a location or an entire road. Regression models have been used to establish relationships between crash frequencies and the factors that affect their rates. Multiple linear regression, negative binomial regression, and Poisson regression techniques were used to develop models that can address the effectiveness of using multiple countermeasures. Analyses conducted using The R Project for Statistical Computing showed that a model developed with the negative binomial regression technique could give more reliable predictions of the number of fatalities after the implementation of multiple road safety countermeasures than the iRAP model. The results also showed that the negative binomial regression approach gives more precise results than the multiple linear and Poisson regression techniques because of overdispersion and standard error issues.
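The overdispersion mentioned above can be checked before choosing between Poisson and negative binomial models. A Poisson model assumes the variance equals the mean, while the common NB2 form of the negative binomial allows variance = mean + alpha * mean^2. The following Python sketch computes the variance-to-mean ratio and a simple method-of-moments estimate of alpha. The crash counts and function name are invented for illustration and are not data from the study, which used R.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Return (mean, sample variance, variance/mean ratio, moment estimate of alpha)."""
    m = mean(counts)
    v = variance(counts)  # sample variance with n - 1 in the denominator
    ratio = v / m         # about 1 for Poisson-like data, above 1 if overdispersed
    # NB2 variance function: v = m + alpha * m**2, solved for alpha.
    alpha = max(0.0, (v - m) / (m * m))
    return m, v, ratio, alpha


# Invented fatality counts per road segment (illustrative only).
crashes = [0, 1, 0, 2, 0, 7, 1, 0, 3, 12, 0, 1]
m, v, ratio, alpha = dispersion_summary(crashes)
print(m, v, ratio, alpha)
```

A ratio well above 1 suggests that Poisson standard errors would be too small, which is one reason a negative binomial model can be preferred for crash counts.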

Keywords: international road assessment program, negative binomial, road multiple countermeasures, road safety

Procedia PDF Downloads 209
30561 A Case Comparative Study of Infant Mortality Rate in North-West Nigeria

Authors: G. I. Onwuka, A. Danbaba, S. U. Gulumbe

Abstract:

This study investigated the infant mortality rate as observed at a general hospital in Kaduna-South, Kaduna State, North-West Nigeria. The causes of infant mortality were examined. The data used for the analysis were collected at the statistics unit of the hospital. The analysis was carried out using the multiple linear regression technique and showed a linear relationship between the dependent variable (death) and the independent variables (malaria, measles, anaemia, and coronary heart disease). The resulting model also revealed that a unit increment in each of these diseases would result in a unit increment in deaths recorded; 98.7% of the total variation in mortality is explained by the model. The highest mortality was recorded in July 2005 and the lowest in October 2009. Recommendations were made based on the results of the study.

Keywords: infant mortality rate, multiple linear regression, diseases, serial correlation

Procedia PDF Downloads 297
30560 The Use of Geographically Weighted Regression for Deforestation Analysis: Case Study in Brazilian Cerrado

Authors: Ana Paula Camelo, Keila Sanches

Abstract:

Geographically Weighted Regression (GWR) was proposed in the geography literature to allow the relationships in a regression model to vary over space. In Brazil, the agricultural exploitation of the Cerrado Biome is the main cause of deforestation. In this study, we propose a methodology using geostatistical methods to characterize the spatial dependence of deforestation in the Cerrado based on agricultural production indicators. To this end, a set of exploratory spatial data analysis (ESDA) tools was used, followed by confirmatory analysis using GWR. A non-spatial model was calibrated, the nature of the regression curve was evaluated, the variables were selected by a stepwise process, and multicollinearity was analyzed. After the non-spatial model was evaluated, the spatial regression model was fitted, the intercept was statistically evaluated, and its effect on calibration was verified. Spearman's correlation between deforestation and livestock was +0.783, and between deforestation and soybeans +0.405. The model presented R²=0.936 and showed a strong spatial dependence of agricultural activity of soybeans associated with maize and cotton crops. GWR is a very effective tool, presenting results closer to the reality of deforestation in the Cerrado than other analyses.
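The core idea of GWR is to fit a separate weighted least-squares regression at each location, with nearby observations weighted more heavily than distant ones. The sketch below is a minimal single-regressor illustration in Python using a Gaussian distance kernel. The function name, coordinates, bandwidth, and data are invented and do not come from the study.

```python
import math


def gwr_local_fit(target, sites, bandwidth):
    """Weighted least squares for y = a + b*x around one target location.

    target is (easting, northing); sites is a list of
    (easting, northing, x, y) tuples. Returns (intercept, slope).
    """
    tx, ty = target
    # Gaussian kernel: weight decays with distance from the target.
    w = [math.exp(-0.5 * (math.hypot(e - tx, n - ty) / bandwidth) ** 2)
         for e, n, _, _ in sites]
    sw = sum(w)
    xbar = sum(wi * s[2] for wi, s in zip(w, sites)) / sw
    ybar = sum(wi * s[3] for wi, s in zip(w, sites)) / sw
    sxx = sum(wi * (s[2] - xbar) ** 2 for wi, s in zip(w, sites))
    sxy = sum(wi * (s[2] - xbar) * (s[3] - ybar) for wi, s in zip(w, sites))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Invented data: the x-y relationship has slope 1 in the west, 3 in the east.
sites = [
    (0.0, 0.0, 1.0, 1.0), (0.0, 1.0, 2.0, 2.0), (1.0, 0.0, 3.0, 3.0),
    (10.0, 0.0, 1.0, 3.0), (10.0, 1.0, 2.0, 6.0), (11.0, 0.0, 3.0, 9.0),
]
_, slope_west = gwr_local_fit((0.0, 0.0), sites, bandwidth=2.0)
_, slope_east = gwr_local_fit((10.0, 0.0), sites, bandwidth=2.0)
print(slope_west, slope_east)
```

A single global regression would average the two regimes, whereas the local fits recover a different slope in each region. This is the spatial variation GWR is designed to expose.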

Keywords: deforestation, geographically weighted regression, land use, spatial analysis

Procedia PDF Downloads 324
30559 Indian Premier League (IPL) Score Prediction: Comparative Analysis of Machine Learning Models

Authors: Rohini Hariharan, Yazhini R, Bhamidipati Naga Shrikarti

Abstract:

In the realm of cricket, particularly within the context of the Indian Premier League (IPL), the ability to predict team scores accurately holds significant importance for both cricket enthusiasts and stakeholders alike. This paper presents a comprehensive study on IPL score prediction utilizing various machine learning algorithms, including Support Vector Machines (SVM), XGBoost, Multiple Regression, Linear Regression, K-nearest neighbors (KNN), and Random Forest. Through meticulous data preprocessing, feature engineering, and model selection, we aimed to develop a robust predictive framework capable of forecasting team scores with high precision. Our experimentation involved the analysis of historical IPL match data encompassing diverse match and player statistics. Leveraging this data, we employed state-of-the-art machine learning techniques to train and evaluate the performance of each model. Notably, Multiple Regression emerged as the top-performing algorithm, achieving an impressive accuracy of 77.19% and a precision of 54.05% (within a threshold of +/- 10 runs). This research contributes to the advancement of sports analytics by demonstrating the efficacy of machine learning in predicting IPL team scores. The findings underscore the potential of advanced predictive modeling techniques to provide valuable insights for cricket enthusiasts, team management, and betting agencies. Additionally, this study serves as a benchmark for future research endeavors aimed at enhancing the accuracy and interpretability of IPL score prediction models.
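The abstract reports results "within a threshold of +/- 10 runs" but does not define the metric. One plausible reading, stated here as an assumption, is the share of predictions whose absolute error is at most 10 runs. The sketch below computes that share in Python, with a hypothetical function name and invented scores.

```python
def within_tolerance_rate(actual, predicted, tolerance=10):
    """Fraction of predictions whose absolute error is at most `tolerance` runs."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Invented innings totals and model predictions (illustrative only).
actual = [165, 180, 142, 201]
predicted = [158, 195, 150, 199]
rate = within_tolerance_rate(actual, predicted)
print(rate)
```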

Keywords: Indian Premier League (IPL), cricket, score prediction, machine learning, support vector machines (SVM), XGBoost, multiple regression, linear regression, k-nearest neighbors (KNN), random forest, sports analytics

Procedia PDF Downloads 7
30558 Application Difference between Cox and Logistic Regression Models

Authors: Idrissa Kayijuka

Abstract:

Logistic regression and Cox regression (proportional hazards) models are currently employed in the analysis of prospective epidemiologic research into risk factors for chronic diseases. A theoretical relationship between the two models has already been studied. By definition, the Cox regression model, also called the Cox proportional hazards model, is a procedure for modeling time-to-event data in which censored cases exist, whereas the logistic regression model is mostly applicable when the independent variables consist of numerical as well as nominal values and the outcome variable is binary (dichotomous). The arguments and findings of many researchers have focused on overviews of the Cox and logistic regression models and their applications in different areas. In this work, the analysis is done on secondary data, the SPSS exercise data on breast cancer, with a sample size of 1121 women. The main objective is to show the application difference between the Cox regression model and the logistic regression model based on factors that cause women to die of breast cancer. Some of the analysis, for example on lymph node status, was done manually, and SPSS software was used to analyze the data. The study found that the application difference between the two models is that the Cox regression model is used to analyze data that include follow-up time, whereas the logistic regression model analyzes data without follow-up time. Their measures of association also differ: the hazard ratio for the Cox model and the odds ratio for the logistic regression model. A similarity between the two models is that both are applicable to predicting the outcome of a categorical variable, i.e., a variable that can take only a restricted number of categories. In conclusion, the Cox regression model differs from logistic regression by assessing a rate instead of a proportion. Both models can be applied in many other research settings since they are suitable methods for analyzing data, but the Cox regression model is the more recommended of the two.
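The distinction between a proportion-based and a rate-based measure can be seen in a small calculation. The odds ratio uses only counts of events and non-events, while a rate ratio divides events by accumulated follow-up time. The rate ratio coincides with the hazard ratio only under the simplifying assumption of constant hazards, so it serves here as a stand-in for a fitted Cox model. The Python sketch below uses invented numbers and hypothetical function names, not the SPSS breast cancer data.

```python
def odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Odds ratio from event counts; follow-up time is ignored."""
    odds_e = events_exposed / (n_exposed - events_exposed)
    odds_u = events_unexposed / (n_unexposed - events_unexposed)
    return odds_e / odds_u


def rate_ratio(events_exposed, time_exposed, events_unexposed, time_unexposed):
    """Ratio of event rates per unit of follow-up time (e.g. person-years)."""
    return (events_exposed / time_exposed) / (events_unexposed / time_unexposed)


# Invented cohort: 100 women per group, with different total follow-up time.
or_value = odds_ratio(30, 100, 15, 100)
rr_value = rate_ratio(30, 400.0, 15, 450.0)
print(or_value, rr_value)
```

The two measures differ because the groups accumulate different amounts of follow-up time, which the odds ratio cannot take into account.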

Keywords: logistic regression model, Cox regression model, survival analysis, hazard ratio

Procedia PDF Downloads 421
30557 Economic Analysis of Cowpea (Unguiculata spp) Production in Northern Nigeria: A Case Study of Kano Katsina and Jigawa States

Authors: Yakubu Suleiman, S. A. Musa

Abstract:

Nigeria is the largest cowpea producer in the world, accounting for about 45% of production, followed by Brazil with about 17%. Cowpea is grown in Kano, Bauchi, Katsina, and Borno in the north, Oyo in the west, and to a lesser extent in Enugu in the east. This study was conducted to determine the input–output relationship of cowpea production in Kano, Katsina, and Jigawa states of Nigeria. The data were collected with the aid of 1000 structured questionnaires that were randomly distributed to cowpea farmers in the three states of the study area. The data collected were analyzed using regression analysis (Cobb–Douglas production function model). The regression analysis gave a coefficient of multiple determination, R², of 72.5% and an F ratio of 106.20, which was found to be significant (P < 0.01). The regression constant is 0.5382 and is significant (P < 0.01). The regression coefficients with respect to labor and seeds were 0.65554 and 0.4336, respectively, and both are highly significant (P < 0.01). The regression coefficient with respect to fertilizer is 0.26341, which is significant (P < 0.05). This implies that a unit increase in any one of the variable inputs, holding all other variable inputs constant, will significantly increase total cowpea output by the corresponding coefficient. This indicated that farmers in the study area are operating in stage II of the production function. The results revealed that cowpea farmers in Kano, Jigawa, and Katsina States realized profits of N15,997, N34,016, and N19,788 per hectare, respectively. It is recommended that government and research institutions give more attention to cowpea production.
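In a Cobb–Douglas production function, output is Q = A * L^b1 * S^b2 * F^b3, which becomes linear in logarithms. In that log-log form each coefficient is an output elasticity, the approximate percentage change in output for a 1% change in that input. The Python sketch below illustrates this with the labor, seed, and fertilizer coefficients reported in the abstract. The input quantities, the scale factor of 1.0, and the function name are placeholders assumed for illustration.

```python
import math


def cobb_douglas(inputs, elasticities, scale=1.0):
    """Evaluate Q = scale * prod(input_k ** elasticity_k) via the log-linear form."""
    log_q = math.log(scale) + sum(
        elasticities[k] * math.log(inputs[k]) for k in elasticities
    )
    return math.exp(log_q)


# Coefficients as reported in the abstract.
elasticities = {"labor": 0.65554, "seed": 0.4336, "fertilizer": 0.26341}
# Placeholder input levels (arbitrary units, not from the study).
base = {"labor": 50.0, "seed": 20.0, "fertilizer": 100.0}
more_labor = dict(base, labor=base["labor"] * 1.01)  # 1% more labor

pct_change = (
    cobb_douglas(more_labor, elasticities) / cobb_douglas(base, elasticities) - 1.0
) * 100.0
print(pct_change)
```

The percentage change does not depend on the placeholder input levels or the scale factor, because they cancel in the ratio.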

Keywords: coefficient, constant, inputs, regression

Procedia PDF Downloads 383