Search results for: multi-variable regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3205

3085 Heart Attack Prediction Using Several Machine Learning Methods

Authors: Suzan Anwar, Utkarsh Goyal

Abstract:

Heart rate (HR) is a predictor of cardiovascular, cerebrovascular, and all-cause mortality in the general population, as well as in patients with cardiovascular and cerebrovascular diseases. Machine learning (ML) significantly improves the accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment while avoiding unnecessary treatment of others. This research examines the relationship between an individual's heart health inputs, such as age, sex, cp, trestbps, thalach, and oldpeak, and the likelihood of developing heart disease. Machine learning techniques such as logistic regression and decision trees are implemented in Python. The results of testing and evaluating the models on the Heart Failure Prediction Dataset show the chance of a person having heart disease, with accuracy varying by method. Logistic regression yielded an accuracy of 80.48% without data handling. With data handling (normalization, StandardScaler), logistic regression reached an improved accuracy of 87.80%, decision tree 100%, random forest 100%, and SVM 100%.
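The data handling step mentioned in the abstract can be illustrated with a minimal sketch of z-score standardization, which is the transformation a standard scaler applies to each feature column. The function name `standardize` and the sample readings are hypothetical and do not come from the paper or its dataset.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Rescale a feature column to zero mean and unit variance (z-scores)."""
    mu = fmean(column)
    sigma = pstdev(column)  # population standard deviation (divides by n)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical resting blood pressure values (trestbps), for illustration only.
trestbps = [120, 130, 140, 150, 160]
scaled = standardize(trestbps)
```

Scaling of this kind mainly matters for models that are sensitive to feature magnitudes, such as logistic regression and SVM; tree-based models are largely unaffected by it.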

Keywords: heart rate, machine learning, SVM, decision tree, logistic regression, random forest

Procedia PDF Downloads 135
3084 Efficient Model Selection in Linear and Non-Linear Quantile Regression by Cross-Validation

Authors: Yoonsuh Jung, Steven N. MacEachern

Abstract:

The check loss function is used to define quantile regression. In cross-validation, it is also employed as the validation function when the underlying truth is unknown. However, our empirical study indicates that validation with the check loss often leads to choosing overfitted models. In this work, we suggest a modified, L2-adjusted check loss that rounds the sharp corner in the middle of the check loss. This adjustment guards against overfitted models to some extent. Through various simulation settings of linear and non-linear regression, the improvement of the check loss by the L2 adjustment is empirically examined. The adjustment is devised to shrink to zero as the sample size grows.
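For reference, the standard (unadjusted) check loss, also called the pinball loss, can be sketched as follows. This shows only the usual definition; the L2-adjusted variant proposed in the paper is not specified in the abstract and is not reproduced here. The function names are illustrative.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def mean_check_loss(y_true, y_pred, tau):
    """Average check loss of predictions at quantile level tau."""
    losses = [check_loss(y - p, tau) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Under-prediction (positive residual) is weighted by tau,
# over-prediction (negative residual) by 1 - tau.
example = mean_check_loss([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], tau=0.5)
```

The sharp corner that the abstract refers to is the kink of this function at a residual of zero.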

Keywords: cross-validation, model selection, quantile regression, tuning parameter selection

Procedia PDF Downloads 431
3083 Oral Sex Practice among Men Who Have Sex with Men: A Cross-Sectional Study in Indonesian Urban Settings

Authors: I Putu Yuda Hananta, Inke Kusumastuti

Abstract:

The latest Indonesian Biology and Behavior Surveillance (IBBS), conducted by the Indonesian Ministry of Health, reported that a large proportion of men who have sex with men (MSM) engaged in oral sex in their recent sexual history. While it is considered pleasurable and safe, oral sex might facilitate the transmission of various sexually transmitted infection (STI) pathogens. This study aimed to investigate oral sex practice among MSM in Indonesian urban settings to help delineate the demographic and behavioral determinants of such practice. In 2014, 501 MSM were recruited from 8 clinic-based and outreach STI services in Jakarta, Yogyakarta and Denpasar, Indonesia. Respondents completed a self-administered questionnaire about their demographics and sexual history. The median age (interquartile range) of the respondents was 27 (24-30) years; most had completed senior high school (54.3%), worked in informal jobs (57.9%), and were single (60.9%); and 32.3% reported receiving money in exchange for sex. Oral sex was practiced by most respondents: insertive only (10.0%), receptive only (6.0%), and both (82.4%). Separate multivariable analyses were performed using logistic regression to identify the determinants of receptive and insertive oral sex. Factors associated with receptive oral sex were having more than 10 sex partners in the preceding 6 months vs. 1 partner, adjusted odds ratio (aOR) [95% CI]=3.40 [1.22-9.42], p=0.03; and history of receptive-insertive anal sex vs. no history, aOR=4.37 [1.76-10.82], p=0.01. Factors associated with insertive oral sex were receiving money for sex vs. not receiving, aOR=2.98 [1.10-8.04], p=0.02; and history of receptive-insertive anal sex vs. no history, aOR=2.10 [0.51-8.74], p<0.001. Only a few respondents reported consistent condom use (11.6% and 12.0% for receptive and insertive oral sex, respectively). Our findings demonstrate that while oral sex is a common practice among MSM, consistent condom use in oral sex is very low.
In addition, certain sexual behaviors (number of sex partners, sex work and history of anal sex) were associated with oral sex, and these might need to be addressed during health promotion efforts on the prevention of STI transmission through oral-genital contact.

Keywords: behavior, Indonesia, men who have sex with men, oral sex

Procedia PDF Downloads 237
3082 Instability Index Method and Logistic Regression to Assess Landslide Susceptibility in County Route 89, Taiwan

Authors: Y. H. Wu, Ji-Yuan Lin, Yu-Ming Liou

Abstract:

This study aims to produce a landslide susceptibility map of County Route 89 at Ren-Ai Township in Nantou County using the Instability Index Method and logistic regression. Seven susceptibility factors, namely slope angle, aspect, elevation, distance to fold, distance to river, distance to road, and accumulated rainfall, were obtained by GIS based on the Typhoon Toraji landslide area identified by the Industrial Technology Research Institute in 2001. The landslide percentage of each factor was calculated, and the weights and grid grades were obtained by means of the Instability Index Method. In this study, landslide susceptibility is classified into four grades (high, medium high, medium low, and low) in order to determine the advantages and disadvantages of the two models. The precision of each model is verified by a classification error matrix and the SRC curve. The results suggest that the logistic regression model is preferable to the instability index method for assessing landslide susceptibility. It is suitable for future landslide prediction and precaution in this area.

Keywords: instability index method, logistic regression, landslide susceptibility, SRC curve

Procedia PDF Downloads 284
3081 Clinical Prediction Score for Ruptured Appendicitis In ED

Authors: Thidathit Prachanukool, Chaiyaporn Yuksen, Welawat Tienpratarn, Sorravit Savatmongkorngul, Panvilai Tangkulpanich, Chetsadakon Jenpanitpong, Yuranan Phootothum, Malivan Phontabtim, Promphet Nuanprom

Abstract:

Background: Ruptured appendicitis has high morbidity and mortality and requires immediate surgery. The Alvarado Score is used as a tool to predict the risk of acute appendicitis, but there is no such score for predicting rupture. This study aimed to develop a prediction score to determine the likelihood of ruptured appendicitis in an Asian population. Methods: This was a diagnostic, retrospective cross-sectional study with an exploratory model, conducted at the Emergency Medicine Department of Ramathibodi Hospital between March 2016 and March 2018. The inclusion criteria were age >15 years and an available pathology report after appendectomy. Clinical factors included gender, age >60 years, right lower quadrant pain, migratory pain, nausea and/or vomiting, diarrhea, anorexia, fever >37.3°C, rebound tenderness, guarding, white blood cell count, polymorphonuclear white blood cells (PMN) >75%, and the pain duration before presentation. The predictive model and prediction score for ruptured appendicitis were developed by multivariable logistic regression analysis. Results: During the study period, 480 patients met the inclusion criteria; of these, 77 (16%) had ruptured appendicitis. Five independent factors were predictive of rupture: age >60 years, fever >37.3°C, guarding, PMN >75%, and duration of pain >24 hours before presentation. A score >6 increased the likelihood ratio of ruptured appendicitis by 3.88 times. Conclusion: Using the Ramathibodi Welawat Ruptured Appendicitis Score (RAMA WeRA Score) developed in this study, a score of >6 was associated with ruptured appendicitis.

Keywords: predictive model, risk score, ruptured appendicitis, emergency room

Procedia PDF Downloads 161
3080 Regret-Regression for Multi-Armed Bandit Problem

Authors: Deyadeen Ali Alshibani

Abstract:

In the literature, the multi-armed bandit problem is described as a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. Several algorithms and models have been applied to this problem. In this paper, we evaluate regret-regression by comparing it with the Q-learning method. A simulation on the determination of an optimal treatment regime is presented in detail.
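To make the trade-off between optimizing decisions and gathering information concrete, here is a minimal sketch of an epsilon-greedy agent on a Bernoulli bandit, tracking cumulative expected regret. This is a generic textbook baseline, not the regret-regression or Q-learning procedure evaluated in the paper; the arm means, step count, and function name are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(true_means, steps, epsilon, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit and return
    (estimated means, pull counts, cumulative expected regret)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        # Expected regret: gap between the best arm and the chosen arm.
        regret += best_mean - true_means[arm]
    return estimates, counts, regret


# Hypothetical success probabilities for three arms (e.g. treatments).
estimates, counts, regret = epsilon_greedy([0.2, 0.5, 0.7], steps=1000, epsilon=0.1)
```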

Keywords: optimal, bandit problem, optimization, dynamic programming

Procedia PDF Downloads 450
3079 The Strengths and Limitations of the Statistical Modeling of Complex Social Phenomenon: Focusing on SEM, Path Analysis, or Multiple Regression Models

Authors: Jihye Jeon

Abstract:

This paper analyzes the conceptual framework of three statistical methods: multiple regression, path analysis, and structural equation models. When establishing a research model for the statistical modeling of complex social phenomena, it is important to know the strengths and limitations of these three statistical models. This study explored the characteristics, strengths, and limitations of each model and suggested some strategies for accurately explaining or predicting the causal relationships among variables. In particular, common mistakes in research modeling in studies of depression and mental health were discussed.

Keywords: multiple regression, path analysis, structural equation models, statistical modeling, social and psychological phenomenon

Procedia PDF Downloads 641
3078 QSRR Analysis of 17-Picolyl and 17-Picolinylidene Androstane Derivatives Based on Partial Least Squares and Principal Component Regression

Authors: Sanja Podunavac-Kuzmanović, Strahinja Kovačević, Lidija Jevrić, Evgenija Djurendić, Jovana Ajduković

Abstract:

There are several methods for determining the lipophilicity of biologically active compounds; however, chromatography has been shown to be a very suitable method for this purpose. Chromatographic (C18-RP-HPLC) analysis of a series of 24 17-picolyl and 17-picolinylidene androstane derivatives was carried out. The obtained retention indices (logk, methanol (90%) / water (10%)) were correlated with calculated physicochemical and lipophilicity descriptors. The QSRR analysis was carried out by applying principal component regression (PCR) and partial least squares regression (PLS). The PCR and PLS models were selected on the basis of the highest variance and the lowest root mean square error of cross-validation. The obtained PCR and PLS models successfully correlate the calculated molecular descriptors with the logk parameter, indicating the significance of the lipophilicity of the compounds in the chromatographic process. On the basis of the obtained results, it can be concluded that the logk parameters of the analyzed androstane derivatives can be considered as their chromatographic lipophilicity. These results are part of project No. 114-451-347/2015-02, financially supported by the Provincial Secretariat for Science and Technological Development of Vojvodina and CMST COST Action CM1105.

Keywords: androstane derivatives, chromatography, molecular structure, principal component regression, partial least squares regression

Procedia PDF Downloads 271
3077 Detecting Earnings Management via Statistical and Neural Networks Techniques

Authors: Mohammad Namazi, Mohammad Sadeghzadeh Maharluie

Abstract:

Predicting earnings management is vital for capital market participants, financial analysts and managers. This research aims to answer the following question: Is there a significant difference between the regression model and neural network models in predicting earnings management, and which one leads to a superior prediction? To approach this question, a Linear Regression (LR) model was compared with two neural networks: a Multi-Layer Perceptron (MLP) and a Generalized Regression Neural Network (GRNN). The population of this study includes 94 companies listed on the Tehran Stock Exchange (TSE) from 2003 to 2011. After the results of all models were acquired, ANOVA was applied to test the hypotheses. In general, the summary of statistical results showed that the precision of GRNN did not differ significantly from that of MLP. In addition, the mean square errors of the MLP and GRNN differed significantly from that of the multivariable LR model. These findings support the notion of nonlinear behavior of earnings management. Therefore, it is more appropriate for capital market participants to analyze earnings management using neural network techniques rather than linear regression models.

Keywords: earnings management, generalized linear regression, neural networks multi-layer perceptron, Tehran stock exchange

Procedia PDF Downloads 416
3076 Anemia Among Pregnant Women in Kuwait: Findings from Kuwait Birth Cohort Study

Authors: Majeda Hammoud

Abstract:

Background: Anemia during pregnancy increases the risk of delivery by cesarean section, low birth weight, preterm birth, perinatal mortality, stillbirth, and maternal mortality. In this study, we aimed to assess the prevalence of anemia in pregnant women and its associated factors in the Kuwait birth cohort study. Methods: The Kuwait birth cohort (N=1108) was a prospective cohort study in which pregnant women were recruited in the third trimester. Data were collected through personal interviews with mothers who attend antenatal care visits, including data on socio-economic status and lifestyle factors. Blood samples were taken after the recruitment to measure multiple laboratory indicators. Clinical data were extracted from the medical records by a clinician including data on comorbidities. Anemia was defined as having Hemoglobin (Hb) <110 g/L with further classification as mild (100-109 g/L), moderate (70-99 g/L), or severe (<70 g/L). Predictors of anemia were classified as underlying or direct factors, and logistic regression was used to investigate their association with anemia. Results: The mean Hb level in the study group was 115.21 g/L (95%CI: 114.56- 115.87 g/L), with significant differences between age groups (p=0.034). The prevalence of anemia was 28.16% (95%CI: 25.53-30.91%), with no significant difference by age group (p=0.164). Of all 1108 pregnant women, 8.75% had moderate anemia, and 19.40% had mild anemia, but no pregnant women had severe anemia. In multivariable analysis, getting pregnant while using contraception, adjusted odds ratio (AOR) 1.73(95%CI:1.01-2.96); p=0.046 and current use of supplements, AOR 0.50 (95%CI: 0.26-0.95); p=0.035 were significantly associated with anemia (underlying factors). From the direct factors group, only iron and ferritin levels were significantly associated with anemia (P<0.001). 
Conclusion: Although severe anemia is rare among pregnant women in Kuwait, mild and moderate anemia remain a significant health problem despite free access to antenatal care.
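The hemoglobin cut-offs stated in the Methods can be written as a small classification rule. This is a sketch of those thresholds only; the function name is hypothetical, and treating the bands as half-open intervals (so that non-integer values such as 99.5 g/L fall into a band) is an assumption not stated in the abstract.

```python
def classify_anemia(hb_g_per_l):
    """Classify anemia severity from hemoglobin (g/L) using the abstract's
    cut-offs: anemia < 110, mild 100-109, moderate 70-99, severe < 70."""
    if hb_g_per_l < 70:
        return "severe"
    if hb_g_per_l < 100:
        return "moderate"
    if hb_g_per_l < 110:
        return "mild"
    return "none"


# Illustrative values only; 115.21 g/L is the cohort mean Hb reported above.
labels = [classify_anemia(hb) for hb in (65, 85, 105, 115.21)]
```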

Keywords: anemia, pregnancy, hemoglobin, ferritin

Procedia PDF Downloads 46
3075 Minimizing the Impact of Covariate Detection Limit in Logistic Regression

Authors: Shahadut Hossain, Jacek Wesolowski, Zahirul Hoque

Abstract:

In many epidemiological and environmental studies, covariate measurements are subject to a detection limit. In most applications, covariate measurements are truncated from below, which is known as left-truncation, because the measuring device fails to detect values falling below a certain threshold. In regression analyses, this inflates the bias and yields an inaccurate mean squared error (MSE) of the estimators. This paper suggests a response-based regression calibration method to correct the deleterious impact of the covariate detection limit on the parameter estimators of the simple logistic regression model. Compared to the maximum likelihood method, the proposed method is computationally simpler and hence easier to implement. It is robust to violations of the distributional assumption about the covariate of interest. The performance of the proposed method in producing correct inference, relative to other competing methods, has been investigated through extensive simulations. A real-life application of the method is also shown using data from a population-based case-control study of non-Hodgkin lymphoma.

Keywords: environmental exposure, detection limit, left truncation, bias, ad-hoc substitution

Procedia PDF Downloads 229
3074 Comparative Study of Three Artificial Intelligence Techniques for Rain Domain in Precipitation Forecast

Authors: Nabilah Filzah Mohd Radzuan, Andi Putra, Zalinda Othman, Azuraliza Abu Bakar, Abdul Razak Hamdan

Abstract:

Precipitation forecasting is important for avoiding natural disasters, which can cause losses in the affected area. This paper reviews three techniques used in precipitation forecasting: logistic regression, decision tree, and random forest. Combining these techniques through the vector auto-regression (VAR) model helps in identifying the advantages and strengths of each technique in the forecast process. The data set contains variables of the rain domain. Adapting artificial intelligence techniques to the rain domain makes the precipitation forecast process easier and more systematic.

Keywords: logistic regression, decision tree, random forest, VAR model

Procedia PDF Downloads 441
3073 A Study of User Awareness and Attitudes Towards Civil-ID Authentication in Oman’s Electronic Services

Authors: Raya Al Khayari, Rasha Al Jassim, Muna Al Balushi, Fatma Al Moqbali, Said El Hajjar

Abstract:

This study utilizes linear regression analysis to investigate the correlation between user account passwords and the probability of civil ID exposure, offering statistical insights into civil ID security. The study employs multiple linear regression (MLR) analysis to further investigate the elements that influence consumers’ views of civil ID security. This aims to increase awareness and improve preventive measures. The results obtained from the MLR analysis provide a thorough comprehension and can guide specific educational and awareness campaigns aimed at promoting improved security procedures. In summary, the study’s results offer significant insights for improving existing security measures and developing more efficient tactics to reduce risks related to civil ID security in Oman. By identifying key factors that impact consumers’ perceptions, organizations can tailor their strategies to address vulnerabilities effectively. Additionally, the findings can inform policymakers on potential regulatory changes to enhance civil ID security in the country.

Keywords: civil-id disclosure, awareness, linear regression, multiple regression

Procedia PDF Downloads 51
3072 A Research on Inference from Multiple Distance Variables in Hedonic Regression Focus on Three Variables

Authors: Yan Wang, Yasushi Asami, Yukio Sadahiro

Abstract:

In an urban context, urban nodes such as amenities or hazards affect house prices, and classic hedonic analysis employs distance variables measured from each urban node. However, the estimated effects of distances to facilities on house prices generally do not represent the true price of the property. Distance variables measured on the same surface suffer from a problem called multicollinearity, which typically appears in regression as distorted magnitudes, variances, and mean values of the estimates, that is, errors caused by instability. In this paper, we provide a theoretical framework for identifying and gathering data with less bias, and a specific sampling method for locating the sample region so as to avoid the spatial multicollinearity problem in the case of three distance variables.
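A small numerical sketch shows why distance variables measured on the same surface tend to be collinear. It uses only two urban nodes and a hypothetical strip of house locations between them (the paper treats the three-variable case, which is not reproduced here), and computes the Pearson correlation together with the two-predictor variance inflation factor, VIF = 1 / (1 - r²). All coordinates are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical layout: two urban nodes and houses sampled in a strip between them.
node_a = (0.0, 0.0)
node_b = (10.0, 0.0)
houses = [(float(x), float(y)) for x in range(1, 10) for y in range(0, 3)]

dist_a = [math.dist(h, node_a) for h in houses]
dist_b = [math.dist(h, node_b) for h in houses]

r = pearson(dist_a, dist_b)
vif = 1.0 / (1.0 - r ** 2)  # variance inflation factor with two predictors
```

Because moving towards one node means moving away from the other in this sample region, the two distances are strongly negatively correlated, which is the kind of instability a different choice of sample region is meant to avoid.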

Keywords: hedonic regression, urban node, distance variables, multicollinearity, collinearity

Procedia PDF Downloads 459
3071 Knowledge, Attitude, Practice and Contributing Factors on Menstrual Hygiene Among High School Students, Ethiopia: Cross-Sectional Study

Authors: Getnet Gedefaw, Fentanesh Endalew, Bitewush Azmeraw, Bethelhem Walelign, Eyob Shitie

Abstract:

Introduction: The issue of menstrual hygiene is often overlooked and has not been sufficiently addressed in the fields of reproductive health in low and middle-income countries. Inadequate menstrual hygiene practices can increase the risk of various infectious and chronic obstetric and gynaecological complications for girls and adolescents. Hence, this study seeks to investigate the knowledge, attitudes, and practices related to menstrual hygiene, along with the factors influencing them, among high school students. Methods: A facility based cross-sectional study was conducted involving a total of 423 study subjects. A systematic random sampling technique was utilized. Data was entered and analyzed through Epi data 3.1 and SPSS 22, respectively. Both univariable and multivariable logistic regression models were employed. A p-value of less than 0.05 was considered statistically significant. Results: This study revealed that 365(89.2%), 200(48.9%) and 196(47.9%) of the study participants have good knowledge, good practice, and good attitudes about menstrual hygiene, respectively. Being higher grade students (grade 10) [AOR=3.96, 95% CI =2.0-7.8] and having good practice of menstrual hygiene (AOR=2.52, 95% CI= 1.26-5) had a positive association with menstrual hygiene knowledge. Whereas maternal education level (AOR=1.86, 95% CI=1.18-2.9) and being a grade 10 student (AOR=2.3, 95% CI=1.48-3.56) were associated factors for practising menstrual hygiene. Additionally, being higher grade students (AOR=1.9, 95% CI=1.2-2.8), age ≥18 years (AOR=1.67, 95% CI=1.09-2.55) were statistically and positively associated with the attitude of menstrual hygiene. Conclusion: The study findings indicated that the knowledge of the study participants regarding menstrual hygiene was high, while their attitudes and practices towards menstrual hygiene were low. 
It is suggested that raising awareness among reproductive health groups and educating their families and parents could potentially lead to a positive change in their poor practices and attitudes towards menstrual hygiene.
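The adjusted odds ratios reported above come from logistic regression coefficients, and the relationship can be sketched in a few lines. The example back-calculates the log-odds coefficient and its standard error from one reported estimate (AOR=1.86, 95% CI=1.18-2.9 for maternal education level), assuming a Wald-type interval that is symmetric on the log scale; that assumption and the function names are not taken from the paper.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def log_odds_from_ci(odds_ratio, ci_low, ci_high):
    """Recover the coefficient and standard error implied by an OR and its
    95% CI, assuming a Wald interval symmetric on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def odds_ratio_with_ci(beta, se):
    """Convert a logistic regression coefficient into an OR with 95% CI."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


beta, se = log_odds_from_ci(1.86, 1.18, 2.9)
aor, ci_low, ci_high = odds_ratio_with_ci(beta, se)
```

The round trip reproduces the reported interval only approximately, since published bounds are rounded.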

Keywords: menstrual hygiene, menstruation, students, reproductive health

Procedia PDF Downloads 52
3070 Urban Energy Demand Modelling: Spatial Analysis Approach

Authors: Hung-Chu Chen, Han Qi, Bauke de Vries

Abstract:

Energy consumption in the urban environment has attracted considerable research in recent decades. However, studies that apply 3D spatial analysis to urban energy demand modelling are comparatively rare. In order to analyze the spatial correlation between urban morphology and energy demand comprehensively, this paper investigates their relation using spatial regression tools. The spatial regression tools applied in this paper are the ordinary least squares (OLS) regression model and the geographically weighted regression (GWR) model. Normalized Difference Built-up Index (NDBI), Normalized Difference Vegetation Index (NDVI), and building volume are descriptors of urban morphology, which act as independent variables of the Energy-land use (E-L) model. NDBI and NDVI are used as indices to describe five types of land use: urban area (U), open space (O), artificial green area (G), natural green area (V), and water body (W). Accordingly, annual electricity demand, gas demand, and energy demand are the dependent variables of the E-L model. The analytical results of the E-L model reveal that energy demand and urban morphology are closely connected; the possible causes and practical uses are discussed. In addition, the OLS and GWR spatial analysis methods are compared.
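The two indices used as independent variables follow the usual normalized-difference form: NDVI = (NIR - Red) / (NIR + Red) and NDBI = (SWIR - NIR) / (SWIR + NIR). A minimal sketch is given below; the reflectance values are invented for illustration, and returning 0.0 when both bands are zero is an assumption made here to avoid division by zero.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference (a - b) / (a + b), bounded in [-1, 1]
    for non-negative reflectances."""
    total = band_a + band_b
    if total == 0:
        return 0.0  # assumption: treat an all-zero pixel as neutral
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)


def ndbi(swir, nir):
    """Normalized Difference Built-up Index."""
    return normalized_difference(swir, nir)


# Hypothetical surface reflectances for a vegetated pixel.
vegetated_ndvi = ndvi(nir=0.5, red=0.1)
vegetated_ndbi = ndbi(swir=0.3, nir=0.5)
```

High NDVI with negative NDBI is typical of vegetation, while built-up surfaces tend to show the opposite pattern.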

Keywords: energy demand model, geographically weighted regression, normalized difference built-up index, normalized difference vegetation index, spatial statistics

Procedia PDF Downloads 143
3069 Modeling Aeration of Sharp Crested Weirs by Using Support Vector Machines

Authors: Arun Goel

Abstract:

The present paper investigates the prediction of the air entrainment rate and aeration efficiency of free over-fall jets issuing from a triangular sharp crested weir by using regression-based modelling. Empirical equations, support vector machine (polynomial and radial basis function) models, and linear regression techniques were applied to triangular sharp crested weirs, relating the air entrainment rate and the aeration efficiency to the input parameters, namely drop height, discharge, and vertex angle. Good agreement was observed between the measured values and the values obtained using the empirical equations, the support vector machine (polynomial and RBF) models, and the linear regression techniques. The test results demonstrated that the SVM-based (polynomial and RBF) models predicted the measured values with reasonable accuracy, alongside the empirical equations and linear regression techniques, in modelling the air entrainment rate and the aeration efficiency of free over-fall jets issuing from a triangular sharp crested weir. A sensitivity analysis was also performed to study the impact of each input parameter on the output in terms of air entrainment rate and aeration efficiency.

Keywords: air entrainment rate, dissolved oxygen, weir, SVM, regression

Procedia PDF Downloads 428
3068 Use of Regression Analysis in Determining the Length of Plastic Hinge in Reinforced Concrete Columns

Authors: Mehmet Alpaslan Köroğlu, Musa Hakan Arslan, Muslu Kazım Körez

Abstract:

The basic objective of this study is to develop a regression analysis method that can estimate the length of a plastic hinge, which is an important design parameter, by making use of the outcomes (lateral load-lateral displacement hysteretic curves) of experimental studies conducted on square reinforced concrete columns. To this end, the results of 170 different square reinforced concrete column tests have been collected from the existing literature. The parameters thought to affect the plastic hinge length, such as cross-section properties, properties of the materials used, axial loading level, confinement of the column, and longitudinal reinforcement bars in the columns, have been obtained from these 170 tests. In determining the plastic hinge length from the experimental test results, regression analyses have been separately tested and compared with each other. In addition, the plastic hinge lengths obtained with the mentioned methods have been compared to those of other methods available in the literature.

Keywords: columns, plastic hinge length, regression analysis, reinforced concrete

Procedia PDF Downloads 475
3067 Measurement Errors and Misclassifications in Covariates in Logistic Regression: Bayesian Adjustment of Main and Interaction Effects and the Sample Size Implications

Authors: Shahadut Hossain

Abstract:

Measurement errors in continuous covariates and/or misclassifications in categorical covariates are common in epidemiological studies. Regression analysis ignoring such mismeasurements seriously biases the estimated main and interaction effects of covariates on the outcome of interest. Thus, adjustments for such mismeasurements are necessary. In this research, we propose a Bayesian parametric framework for eliminating deleterious impacts of covariate mismeasurements in logistic regression. The proposed adjustment method is unified and thus can be applied to any generalized linear and non-linear regression models. Furthermore, adjustment for covariate mismeasurements requires validation data usually in the form of either gold standard measurements or replicates of the mismeasured covariates on a subset of the study population. Initial investigation shows that adequacy of such adjustment depends on the sizes of main and validation samples, especially when prevalences of the categorical covariates are low. Thus, we investigate the impact of main and validation sample sizes on the adjusted estimates, and provide a general guideline about these sample sizes based on simulation studies.

Keywords: measurement errors, misclassification, mismeasurement, validation sample, Bayesian adjustment

Procedia PDF Downloads 404
3066 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents

Authors: M. Ouassaf, S. Belaid

Abstract:

A series of quinoline derivatives with antimalarial activity was subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were built using multiple linear regression (MLR), partial least squares regression (PLS), and nonlinear regression (MNLR) to see which descriptors are closely related to the biological activity. We relied on principal component analysis (PCA). A comparison of the quality of the MLR, PLS, and MNLR models shows that the MNLR model (R = 0.914, R² = 0.835, RCV = 0.853) has substantially better predictive capability than the MLR (R = 0.835, R² = 0.752, RCV = 0.601) and PLS (R = 0.742, R² = 0.552, RCV = 0.550) models. The MNLR model gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. The obtained results suggest that our proposed MNLR model may be useful for predicting the biological activity of quinoline derivatives.

Keywords: antimalarial, quinoline, QSAR, PCA, MLR, MNLR

Procedia PDF Downloads 151
3065 Agile Software Effort Estimation Using Regression Techniques

Authors: Mikiyas Adugna

Abstract:

Effort estimation is among the activities carried out in software development processes. An accurate estimation model contributes to project success. Agile effort estimation is a complex task because of the dynamic nature of software development, and researchers are still conducting studies on it to enhance prediction accuracy. For these reasons, we investigated and proposed a model based on LASSO and Elastic Net regression to enhance estimation accuracy. The proposed model has four major components: preprocessing, train-test split, training with default parameters, and cross-validation. During the preprocessing phase, the entire dataset is normalized. After normalization, a train-test split is performed on the dataset, with 80% used for training and 20% for testing. Following the train-test split, the two regression algorithms (Elastic Net and LASSO) are trained in two phases. In the first phase, the two algorithms are trained using their default parameters and evaluated on the testing data. In the second phase, grid search (used to tune and select the optimum parameters) and 5-fold cross-validation are applied to obtain the final trained model. Finally, the final trained model is evaluated using the testing set. The experimental work is applied to an agile story point dataset of 21 software projects collected from six firms. The results show that both Elastic Net and LASSO regression outperformed the models they were compared against. Of the two proposed algorithms, LASSO regression achieved better predictive performance, with PRED (8%) and PRED (25%) results of 100.0, MMRE of 0.0491, MMER of 0.0551, MdMRE of 0.0593, MdMER of 0.063, and MSE of 0.0007. The results imply that the trained LASSO regression model is the most acceptable and achieves higher estimation performance than existing models in the literature.
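The accuracy measures quoted in the abstract are built on the magnitude of relative error, MRE = |actual - predicted| / actual. The sketch below computes MMRE (mean MRE), MdMRE (median MRE), and PRED(x), the percentage of estimates whose MRE is at most x%, using these commonly used definitions. The effort values are invented for illustration and are not from the paper's dataset.

```python
from statistics import fmean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean magnitude of relative error."""
    return fmean(mre(a, p) for a, p in zip(actuals, predictions))


def mdmre(actuals, predictions):
    """Median magnitude of relative error."""
    return median(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, threshold_percent):
    """Percentage of estimates with MRE <= threshold_percent / 100."""
    limit = threshold_percent / 100.0
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= limit)
    return 100.0 * hits / len(actuals)


# Hypothetical actual and estimated efforts for three projects.
actual_effort = [10.0, 20.0, 40.0]
estimated_effort = [9.0, 25.0, 40.0]

summary = {
    "MMRE": mmre(actual_effort, estimated_effort),
    "MdMRE": mdmre(actual_effort, estimated_effort),
    "PRED(25)": pred(actual_effort, estimated_effort, 25),
    "PRED(8)": pred(actual_effort, estimated_effort, 8),
}
```

Lower MMRE and MdMRE and higher PRED values indicate better estimates, which is how the LASSO results above should be read.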

Keywords: agile software development, effort estimation, elastic net regression, LASSO

Procedia PDF Downloads 61
3064 The Risk of Post-stroke Pneumonia and Its One-Year Disability in Taiwan

Authors: Hui-Chi Huang, Su-Ju Yang, Ching-Wei Lin, Jui-Yao Tsai, Liang-Yiang

Abstract:

Background: Evidence exists that pneumonia is a frequently encountered complication after stroke and is associated with a higher rate of mortality and increased long-term disability. Purpose: To determine the predictors associated with the risk of one-year disability in acute stroke. Methods: Data for this longitudinal follow-up study were extracted from the stroke registry database of a tertiary referral medical center in Northern Taipei. Eligible patients with acute stroke who were admitted to the hospital and completed a one-year follow-up were recruited for analysis. Favorable outcome was defined as a modified Rankin Scale score ≤ 2. SAS version 9.2 was used for the multivariable regression analyses to examine the factors correlated with one-year disability in stroke patients. Results: From January 2012 to December 2013, a total of 1373 consecutively admitted acute stroke patients (mean age: 70.49 ± 15.4 years; 913 (66.5%) males) were recruited. Overall, the rate of one-year disability was 37.20% (404/1086) in those without post-stroke pneumonia and increased to 82.93% (238/287) in patients who developed post-stroke pneumonia. Factors associated with increased risk of disability were age ≥ 75 (OR = 4.845, p < .0001), female gender (OR = 1.568, p = .0062), previous stroke (OR = 1.868, p < .0001), dementia (OR = 2.872, p = .0047), ventilator use (OR = 4.653, p < .0001), age ≥ 75/pneumonia (OR = 1.236, p < .0001), ICU admission (OR = 3.314, p < .0001), nasogastric tube insertion (OR = 4.28, p < .0001), speech therapy (OR = 1.79, p = .0142), urinary tract infection (OR = 1.865, p = .0018), estimated glomerular filtration rate (eGFR) > 60 (OR = 0.525, p = .0029), admission NIHSS > 11 (OR = 2.101, p = .0099), and length of hospitalization > 30 days (OR = 5.182, p < .0001).
Conclusion: Older age, severe neurological deficit, complications, rehabilitation intervention, length of hospitalization > 30 days, and cognitive impairment were significantly associated with post-stroke functional impairment, especially in patients with post-stroke pneumonia. These findings could open new avenues in the management of stroke patients.
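As a worked illustration of how an odds ratio is obtained from counts, the sketch below computes the crude (unadjusted) odds ratio of one-year disability for patients with versus without post-stroke pneumonia, using only the counts reported in the Results (238/287 and 404/1086). This is not one of the adjusted ORs from the study's multivariable model; the Woolf (log) confidence interval and the function name are assumptions chosen for the example.

```python
import math


def odds_ratio_with_ci(exposed_events, exposed_total,
                       unexposed_events, unexposed_total, z=1.96):
    """Crude odds ratio from a 2x2 table, with a Woolf (log-scale) interval."""
    a = exposed_events                       # disabled, with pneumonia
    b = exposed_total - exposed_events       # not disabled, with pneumonia
    c = unexposed_events                     # disabled, without pneumonia
    d = unexposed_total - unexposed_events   # not disabled, without pneumonia
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se_log)
    high = math.exp(math.log(estimate) + z * se_log)
    return estimate, low, high


# Counts as reported in the abstract.
crude_or, ci_low, ci_high = odds_ratio_with_ci(238, 287, 404, 1086)
print(f"crude OR = {crude_or:.2f}, approx. 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

The crude figure ignores age, stroke severity, and the other covariates listed above, so it should not be compared directly with the adjusted estimates.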

Keywords: stroke, risk, pneumonia, disability

Procedia PDF Downloads 227
3063 Robustified Asymmetric Logistic Regression Model for Global Fish Stock Assessment

Authors: Osamu Komori, Shinto Eguchi, Hiroshi Okamura, Momoko Ichinokawa

Abstract:

Long time-series data on population assessments are essential for global ecosystem assessment because the temporal change of biomass in such a database properly reflects the status of the global ecosystem. However, the available assessment data usually have limited sample sizes, and the ratio of populations with low abundance of biomass (collapsed) to those with high abundance (non-collapsed) is highly imbalanced. To allow for the imbalance and uncertainty involved in the ecological data, we propose a binary regression model with mixed effects for inferring ecosystem status through an asymmetric logistic model. In the estimating equation, we observe that the weights for the non-collapsed populations are relatively reduced, which in turn puts more importance on the small number of observations of collapsed populations. Moreover, we extend the asymmetric logistic regression model using propensity scores to allow for the sample biases observed in the labeled and unlabeled datasets. This extension robustifies the estimation procedure and improves the model fit.

Keywords: double robust estimation, ecological binary data, mixed effect logistic regression model, propensity score

Procedia PDF Downloads 261
3062 Urban-Rural Inequality in Mexico after Nafta: A Quantile Regression Analysis

Authors: Rene Valdiviezo-Issa

Abstract:

In this paper, we use Mexico’s Households Income and Expenditures (ENIGH) survey to explain how the urban-rural expenditure gap has behaved since Mexico joined the North American Free Trade Agreement (NAFTA) in 1994, comparing the 1994 survey with the latest available survey, which took place in 2014. We use real trimestral expenditure per capita (RTEPC) as the measure of welfare. We use quantile regressions and a quantile regression decomposition to describe the gap between the urban and rural distributions of log RTEPC. We find that the decrease in the difference between the urban and rural distributions of log RTEPC, that is, in inequality, is driven by a deterioration of the urban areas in very specific characteristics rather than by an improvement of the rural areas. Using the decomposition, we observe that the gap arises primarily from differences in returns to covariates between the urban and rural areas.
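Quantile regression is defined through the check (pinball) loss, and a minimal Python sketch can show the idea in its simplest form: with no covariates, the constant that minimizes the total check loss at level tau is a tau-quantile of the sample. The function names and the five data values are illustrative assumptions and have nothing to do with the ENIGH data.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_quantile(values, tau):
    """Return the observed value that minimizes the total check loss at level tau."""
    candidates = sorted(set(values))
    return min(
        candidates,
        key=lambda c: sum(check_loss(v - c, tau) for v in values),
    )


# Made-up sample with one large value; the median is robust to it.
sample = [1, 2, 3, 4, 100]
median_fit = best_constant_quantile(sample, 0.5)
upper_fit = best_constant_quantile(sample, 0.9)
print(median_fit, upper_fit)
```

A full quantile regression replaces the constant with a linear function of the covariates and minimizes the same loss, which is what allows the gap to be described at different points of the expenditure distribution rather than only at the mean.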

Keywords: quantile regression, urban-rural inequality, inequality in Mexico, income decomposition

Procedia PDF Downloads 276
3061 Global Optimization: The Alienor Method Mixed with Piyavskii-Shubert Technique

Authors: Guettal Djaouida, Ziadi Abdelkader

Abstract:

In this paper, we study a coupling of the Alienor method with the Piyavskii-Shubert algorithm. Classical multidimensional global optimization methods are very difficult to implement in high dimensions. The Alienor method makes it possible to transform a multivariable function into a function of a single variable, for which efficient and rapid methods for calculating the global optimum can be used. This simplification is based on a reducing transformation called Alienor.
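The one-dimensional stage can be sketched in Python as follows. This is a minimal illustration of the Piyavskii-Shubert idea for a Lipschitz function on an interval, assuming the Lipschitz constant is known; it does not implement the Alienor reducing transformation, which would supply the single-variable function in the coupled method. The function name, tolerance, and test function are assumptions made for the example.

```python
import heapq
import math


def piyavskii_shubert(f, a, b, lipschitz, tol=1e-4, max_evals=100_000):
    """Minimize f on [a, b] given a valid Lipschitz constant for f."""

    def cell(x0, f0, x1, f1):
        # Lowest point of the saw-tooth lower bound on [x0, x1] and its location.
        x_new = 0.5 * (x0 + x1) + (f0 - f1) / (2.0 * lipschitz)
        bound = 0.5 * (f0 + f1) - 0.5 * lipschitz * (x1 - x0)
        return (bound, x_new, x0, f0, x1, f1)

    fa, fb = f(a), f(b)
    best_x, best_f = (a, fa) if fa <= fb else (b, fb)
    heap = [cell(a, fa, b, fb)]
    evals = 2
    while evals < max_evals:
        # The interval with the smallest lower bound is refined next.
        bound, x_new, x0, f0, x1, f1 = heapq.heappop(heap)
        if best_f - bound <= tol:
            break  # no interval can improve on best_f by more than tol
        f_new = f(x_new)
        evals += 1
        if f_new < best_f:
            best_x, best_f = x_new, f_new
        heapq.heappush(heap, cell(x0, f0, x_new, f_new))
        heapq.heappush(heap, cell(x_new, f_new, x1, f1))
    return best_x, best_f


def multimodal(x):
    """Illustrative test function with several local minima on [2.7, 7.5]."""
    return math.sin(x) + math.sin(10.0 * x / 3.0)


# |f'(x)| <= 1 + 10/3, so 13/3 is a valid Lipschitz constant here.
x_min, f_min = piyavskii_shubert(multimodal, 2.7, 7.5, lipschitz=13.0 / 3.0)
print(f"x* ~ {x_min:.4f}, f(x*) ~ {f_min:.4f}")
```

Because the lower bound is valid everywhere on the interval, the loop stops only when the best value found is within the tolerance of the global minimum, which is the property that makes a single-variable reformulation attractive.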

Keywords: global optimization, reducing transformation, α-dense curves, Alienor method, Piyavskii-Shubert algorithm

Procedia PDF Downloads 498
3060 Developing Variable Repetitive Group Sampling Control Chart Using Regression Estimator

Authors: Liaquat Ahmad, Muhammad Aslam, Muhammad Azam

Abstract:

In this article, we propose a control chart for the location parameter based on a repetitive group sampling scheme. The charting scheme uses the regression estimator, an estimator that capitalizes on the relationship between the variables of interest to provide more sensitive control than the commonly used individual variables. The control limit coefficients have been estimated for different sample sizes for weakly and highly correlated variables. The monitoring of the production process is constructed by adopting the procedure of Shewhart’s x-bar control chart. The chart’s performance is verified by average run length calculations when a shift occurs in the average value of the estimator. It has been observed that the less correlated variables produce false alarms more rapidly.
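A repetitive group sampling chart of this kind can be sketched in Python under stated assumptions. The sketch uses the textbook regression estimator of a mean with a known auxiliary mean, whose variance is sigma_y^2 (1 - rho^2) / n when the slope rho * sigma_y / sigma_x is known, and a generic two-limit decision rule (inner and outer limits, with resampling in between). The function names and the limit coefficients 3 and 1 are hypothetical and are not the coefficients estimated in the paper.

```python
import math


def regression_estimate(y_bar, x_bar, mu_x, rho, sigma_y, sigma_x):
    """Regression estimator of the study-variable mean using a known auxiliary mean."""
    slope = rho * sigma_y / sigma_x
    return y_bar + slope * (mu_x - x_bar)


def rgs_decision(stat, mu_y, sigma_y, rho, n, k_outer, k_inner):
    """Classify a plotted statistic under a two-limit repetitive sampling rule."""
    # Standard error of the regression estimator when the slope is known.
    std_err = sigma_y * math.sqrt((1.0 - rho ** 2) / n)
    deviation = abs(stat - mu_y)
    if deviation > k_outer * std_err:
        return "out of control"
    if deviation <= k_inner * std_err:
        return "in control"
    return "resample"  # between the inner and outer limits: draw a new subgroup


# Hypothetical in-control parameters and limit coefficients.
params = dict(mu_y=10.0, sigma_y=2.0, rho=0.8, n=4, k_outer=3.0, k_inner=1.0)
stat = regression_estimate(y_bar=10.9, x_bar=5.2, mu_x=5.0,
                           rho=0.8, sigma_y=2.0, sigma_x=1.0)
print(round(stat, 3), rgs_decision(stat, **params))
```

The factor (1 - rho^2) shows why correlation matters: the stronger the correlation between the study and auxiliary variables, the narrower the limits for the same coefficients.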

Keywords: average run length, control charts, process shift, regression estimators, repetitive group sampling

Procedia PDF Downloads 555
3059 The Effect of Slum Neighborhoods on Pregnancy Outcomes in Tanzania: Secondary Analysis of the 2015-2016 Tanzania Demographic and Health Survey Data

Authors: Luisa Windhagen, Atsumi Hirose, Alex Bottle

Abstract:

Global urbanization has resulted in the expansion of slums, leaving over 10 million Tanzanians in urban poverty and at risk of poor health. Whilst rural residence has historically been associated with an increased risk of adverse pregnancy outcomes, recent studies found higher perinatal mortality rates in urban Tanzania. This study aims to understand to what extent slum neighborhoods may account for the spatial disparities seen in Tanzania. We generated a slum indicator based on UN-HABITAT criteria to identify slum clusters within the 2015-2016 Tanzania Demographic and Health Survey. Descriptive statistics, disaggregated by urban slum, urban non-slum, and rural areas, were produced. Simple and multivariable logistic regression examined the association between cluster residence type and neonatal mortality and stillbirth. For neonatal mortality, we additionally built a multilevel logistic regression model, adjusting for confounding and clustering. The neonatal mortality ratio was highest in slums (38.3 deaths per 1000 live births); the stillbirth rate was three times higher in slums (32.4 deaths per 1000 births) than in urban non-slums. Neonatal death was more likely to occur in slums than in urban non-slums (aOR=2.15, 95% CI=1.02-4.56) and rural areas (aOR=1.78, 95% CI=1.15-2.77). Odds of stillbirth were over five times higher among rural than urban non-slum residents (aOR=5.25, 95% CI=1.31-20.96). The results suggest that slums contribute to the urban disadvantage in Tanzanian neonatal health. Higher neonatal mortality in slums may be attributable to lack of education, lower socioeconomic status, poor healthcare access, and environmental factors, including indoor and outdoor air pollution and unsanitary conditions from inadequate housing. However, further research is required to ascertain specific causalities as well as significant associations between residence type and other pregnancy outcomes. 
The high neonatal mortality, stillbirth, and slum formation rates in Tanzania signify that considerable change is necessary to achieve international goals for health and human settlements. Disparities in access to adequate housing, safe water and sanitation, high standard antenatal, intrapartum, and neonatal care, and maternal education need to urgently be addressed. This study highlights the spatial neonatal mortality shift from rural settings to urban informal settlements in Tanzania. Importantly, other low- and middle-income countries experiencing overwhelming urbanization and slum expansion may also be at risk of a reversing trend in residential neonatal health differences.

Keywords: urban health, slum residence, neonatal mortality, stillbirth, global urbanisation

Procedia PDF Downloads 60
3058 BART Matching Method: Using Bayesian Additive Regression Tree for Data Matching

Authors: Gianna Zou

Abstract:

Propensity score matching (PSM), introduced by Paul R. Rosenbaum and Donald Rubin in 1983, is a popular statistical matching technique that tries to estimate treatment effects by taking into account covariates that could impact the efficacy of study medication in clinical trials. PSM can be used to reduce the bias due to confounding variables. However, PSM assumes that the response values are normally distributed, and in some cases this assumption may not hold. In this paper, a machine learning method, Bayesian Additive Regression Tree (BART), is used as a more robust method of matching. BART can work well when models are misspecified since it can be used to model heterogeneous treatment effects. Moreover, it has the capability to handle non-linear main effects and multiway interactions. In this research, a BART Matching Method (BMM) is proposed to provide a more reliable matching method than PSM. Comparing the analysis results from PSM and BMM shows that BMM performs well and has better prediction capability when the response values are not normally distributed.
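The matching step itself can be illustrated with a short Python sketch of greedy one-to-one nearest-neighbor matching with a caliper. It takes the scores as given (in practice they would come from a fitted model such as logistic regression) and is a generic illustration, not the BMM procedure proposed in the paper. The function name, caliper, and scores are made up for the example.

```python
def match_nearest(treated, controls, caliper=0.05):
    """Greedy 1:1 matching of treated units to controls on a score.

    `treated` and `controls` map unit ids to scores in [0, 1]. Each control
    is used at most once, and pairs farther apart than `caliper` are dropped.
    """
    available = dict(controls)
    pairs = []
    # Process treated units in order of increasing score for reproducibility.
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


# Hypothetical scores, for illustration only.
treated_scores = {"t1": 0.30, "t2": 0.70, "t3": 0.95}
control_scores = {"c1": 0.32, "c2": 0.68, "c3": 0.10}
matched = match_nearest(treated_scores, control_scores)
print(matched)
```

Treated units with no control inside the caliper stay unmatched, which trades sample size for closer covariate balance.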

Keywords: BART, Bayesian, matching, regression

Procedia PDF Downloads 140
3057 Cross-Sectional Association between Socio-Demographic Factors and Paid Blood Donation in Half Million Chinese Population

Authors: Jiashu Shen, Guoting Zhang, Zhicheng Wang, Yu Wang, Yun Liang, Siyu Zou, Fan Yang, Kun Tang

Abstract:

Objectives: This study aims to enhance the understanding of paid blood donors’ characteristics in the Chinese population and to devise strategies to protect these paid donors. Background: Paid blood donation was the predominant mode of blood donation in China from the 1970s to 1998 and caused several health and social problems, including a greatly increased risk of infectious diseases due to nonstandard operation in unhygienic conditions. Methods: This study utilized cross-sectional data from the China Kadoorie Biobank, which covers about 0.5 million people from 10 regions of China enrolled from 2004 to 2008. Multivariable logistic regression was performed to examine the associations between socio-demographic factors and paid blood donation. Furthermore, a stratified analysis of education level and annual household income was applied by rural and urban areas. Results: The prevalence of paid blood donation was 0.50% in China, and males were more likely to donate blood than females (adjusted odds ratio (AOR) = 0.81, 95% confidence interval (CI): 0.75-0.88). Urban people had much lower odds than rural people (AOR = 0.24, 95% CI: 0.21-0.27). People with a high annual household income had lower odds of paid blood donation than people with a low income (AOR = 0.37, 95% CI: 0.31-0.44). Compared with people who didn’t receive school education, people with a higher level of education had increased odds of paid blood donation (AOR = 2.31, 95% CI: 1.94-2.74). Conclusion: Paid blood donation in China was associated with being male, living in rural areas, low annual household income, and educational background.

Keywords: China Kadoorie Biobank, Chinese population, paid blood donation, socio-demographic factors

Procedia PDF Downloads 146
3056 The Relationship Between Hourly Compensation and Unemployment Rate Using the Panel Data Regression Analysis

Authors: S. K. Ashiquer Rahman

Abstract:

This paper concentrates on the importance of hourly compensation, emphasizing the significance of the unemployment rate. The unemployment rate and hourly compensation are two of the most important indicators for a nation. They are not merely statistics; they have profound effects on individuals, families, and the economy. They are inversely related to one another: the unemployment rate will probably decline as hourly compensation in manufacturing rises, and reduced unemployment and increased job prospects could in turn result from higher compensation. Increased hourly compensation in the manufacturing sector could therefore have a favorable effect on job-changing issues. Moreover, the relationship between hourly compensation and unemployment is complex and influenced by broader economic factors. In this paper, we use panel data regression models to evaluate the expected link between hourly compensation and the unemployment rate in order to determine the effect of hourly compensation on the unemployment rate. We estimate the fixed effects model (FEM), evaluate the error components model (ECM), and determine which of the two is better by pooling all 60 observations. We then analyze and review the data by comparing three countries (the United States, Canada, and the United Kingdom) using panel data regression models. Finally, we provide results, analysis, and a summary of the extensive research on how hourly compensation affects the unemployment rate. Additionally, this paper offers relevant and useful information to help the government and academic community use an econometric and social approach to lessen the effect of hourly compensation on the unemployment rate and address the problem.
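The fixed effects model mentioned above can be illustrated with a minimal Python sketch of the within (demeaning) estimator for a single regressor. The panel below is synthetic: three hypothetical entities labeled A, B, and C with 20 periods each (60 observations, mirroring the panel size in the abstract), generated without noise from made-up intercepts and a made-up slope of -0.2. None of the numbers come from the paper.

```python
from statistics import mean


def within_estimator(panel):
    """Fixed effects slope for one regressor via entity demeaning.

    `panel` maps an entity label to a list of (x, y) observations.
    """
    numerator = 0.0
    denominator = 0.0
    for observations in panel.values():
        x_bar = mean(x for x, _ in observations)
        y_bar = mean(y for _, y in observations)
        for x, y in observations:
            # Demeaning removes each entity's time-invariant intercept.
            numerator += (x - x_bar) * (y - y_bar)
            denominator += (x - x_bar) ** 2
    return numerator / denominator


# Synthetic panel: y = intercept_i + slope * x, no noise (illustration only).
true_slope = -0.2
intercepts = {"A": 8.0, "B": 9.0, "C": 7.0}
panel = {
    entity: [(10.0 + t, alpha + true_slope * (10.0 + t)) for t in range(20)]
    for entity, alpha in intercepts.items()
}
slope_hat = within_estimator(panel)
n_obs = sum(len(obs) for obs in panel.values())
print(n_obs, round(slope_hat, 6))
```

Because the entity intercepts drop out after demeaning, the estimate recovers the common slope even though the three entities sit at different levels. An error components (random effects) model instead treats those intercepts as random draws, and the choice between the two is typically made with a specification test such as the Hausman test.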

Keywords: hourly compensation, Unemployment rate, panel data regression models, dummy variables, random effects model, fixed effects model, the linear regression model

Procedia PDF Downloads 72