Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 3258

Search results for: logistic regression

3168 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition

Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar

Abstract:

At the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models, namely K-Nearest Neighbors (KNN), logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: CREMA-D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features such as Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and the Mel spectrogram. These features are used to train the models and evaluate their ability to recognize eight emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm performed best, achieving approximately 79% accuracy, which suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques, and the findings hold promise for the development of more precise emotion recognition systems. This abstract provides a succinct overview of the paper’s content, methods, and results.

Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers
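The abstract names ZCR and RMS among the extracted features. The sketch below is a minimal plain-Python illustration, assuming a mono signal stored as a list of floats; the frame length of 2048 samples, the hop of 512 samples, and normalising crossings by `len(frame) - 1` are illustrative choices, not settings taken from the paper (audio libraries such as librosa offer framewise versions of both features with their own conventions).

```python
import math

def frames(signal, frame_length, hop_length):
    """Split a 1-D signal into overlapping frames, dropping any incomplete tail."""
    return [signal[start:start + frame_length]
            for start in range(0, len(signal) - frame_length + 1, hop_length)]

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms(frame):
    """Root-mean-square value of a frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def frame_features(signal, frame_length=2048, hop_length=512):
    """One (ZCR, RMS) pair per frame; such rows would be fed to a classifier."""
    return [(zero_crossing_rate(f), rms(f))
            for f in frames(signal, frame_length, hop_length)]

# Usage: one second of a 440 Hz tone with amplitude 0.5, sampled at 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = frame_features(tone)
```

For this pure tone the ZCR stays near 2 × 440 / 16000 ≈ 0.055 and the RMS near 0.5/√2 ≈ 0.354 in every frame; real speech frames vary much more from frame to frame.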

3167 Classical and Bayesian Inference of the Generalized Log-Logistic Distribution with Applications to Survival Data

Authors: Abdisalam Hassan Muse, Samuel Mwalili, Oscar Ngesa

Abstract:

A generalized log-logistic distribution with variable hazard-rate shapes was introduced and studied. It extends the classical log-logistic distribution by adding an extra parameter, giving greater flexibility in analysing and modeling various data types. The proposed distribution includes many well-known lifetime sub-models, such as the Weibull, log-logistic, exponential, and Burr XII distributions. Its basic mathematical and statistical properties were derived. The method of maximum likelihood was adopted to estimate the unknown parameters of the proposed distribution, and a Monte Carlo simulation study was carried out to assess the behavior of the estimators. The importance of this distribution lies in its ability to model both monotone (increasing and decreasing) and non-monotone (unimodal, bathtub-shaped, or reversed bathtub-shaped) hazard rate functions, which are quite common in survival and reliability data analysis. Furthermore, the flexibility and usefulness of the proposed distribution are illustrated on a real-life data set, where it is compared with its sub-models (Weibull, log-logistic, and Burr XII) and with other three-parameter survival distributions, namely the exponentiated Weibull, 3-parameter lognormal, 3-parameter gamma, 3-parameter Weibull, and 3-parameter log-logistic (also known as shifted log-logistic) distributions. The proposed distribution provided a better fit than all of the competing distributions based on goodness-of-fit tests, log-likelihood, and information criterion values. Finally, a Bayesian analysis of the data set was carried out, and the performance of Gibbs sampling was assessed.

Keywords: hazard rate function, log-logistic distribution, maximum likelihood estimation, generalized log-logistic distribution, survival data, Monte Carlo simulation
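For reference, the hazard of the classical two-parameter log-logistic baseline (scale α, shape β) is h(t) = (β/α)(t/α)^(β-1) / (1 + (t/α)^β). It is decreasing for β ≤ 1 and unimodal for β > 1, with its peak at t* = α(β - 1)^(1/β); the extra parameter of the generalized distribution, whose exact form the abstract does not give, is what adds the further hazard shapes described above. The sketch below checks the two classical cases numerically on a grid; it does not implement the authors' generalized distribution.

```python
def loglogistic_hazard(t, alpha, beta):
    """Hazard of the classical log-logistic distribution (scale alpha, shape beta), t > 0."""
    z = (t / alpha) ** beta
    return (beta / alpha) * (t / alpha) ** (beta - 1) / (1 + z)

def hazard_mode(alpha, beta):
    """Location of the hazard peak, which exists only for beta > 1."""
    if beta <= 1:
        return None  # the hazard decreases monotonically
    return alpha * (beta - 1) ** (1 / beta)

# Evaluate both shapes on a fine grid of times in (0, 10].
grid = [0.001 * i for i in range(1, 10001)]
decreasing = [loglogistic_hazard(t, 2.0, 0.8) for t in grid]
unimodal = [loglogistic_hazard(t, 2.0, 3.0) for t in grid]
peak_time = grid[max(range(len(grid)), key=unimodal.__getitem__)]
```

With α = 2 and β = 3 the grid maximum sits at about t = 2.52, matching 2 · 2^(1/3).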

3166 Study on the Factors Influencing the Built Environment of Residential Areas on the Lifestyle Walking Trips of the Elderly

Authors: Daming Xu, Yuanyuan Wang

Abstract:

Under rapid urban expansion, the motorized character of cities has become increasingly pronounced, seriously affecting the walkability of urban space. Because walking is the main mode of travel for the elderly in daily life, building walkable space has become increasingly important in the current context of severe population aging. The residential settlement is the most basic living unit, and daily shopping, medical visits, and other routine trips are closely tied to the daily lives of elderly people. It is therefore of great practical significance to explore how the built environment at the settlement level affects elderly people's daily walking trips, in order to build pedestrian-friendly settlements for the elderly. The study takes three typical settlements from three different periods in Daoli District, Harbin, as examples and obtains data on elderly people's walking trips and built environment characteristics through field research, questionnaires, and internet data acquisition. Correlation analysis and a multinomial logistic regression model, controlling for personal attribute variables, were then applied to analyze how the built environment influences elderly people's walking, in order to provide reference and guidance for building walkable environments for elderly people in the future.

Keywords: built environment, elderly, walkability, multinomial logistic regression model

3165 A Machine Learning Model for Predicting Students’ Academic Performance in Higher Institutions

Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu

Abstract:

There has been a need in recent years to predict student academic achievement prior to graduation, to help students improve their grades, especially those who have struggled in the past. The purpose of this research is to use supervised learning techniques to create a model that predicts student academic progress. Many scholars have developed models that predict student academic achievement from characteristics including smoking, demography, culture, social media, parents' educational background, parents' finances, and family background, to mention a few. These features, as well as the models used, could have misclassified students in terms of their academic achievement. To predict whether a student will perform well in related courses in the future, this model is built using a logistic regression classifier with basic features such as the previous semester's course score, class attendance, class participation, and the total number of course materials or resources the student is able to cover per semester. With 96.7 percent accuracy, the model outperformed other classifiers such as Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and AdaBoost. The model is offered as a desktop application with user-friendly interfaces for forecasting student academic progress for both teachers and students. Both students and professors are therefore encouraged to use this technique to better predict outcomes.

Keywords: artificial intelligence, ML, logistic regression, performance, prediction

3164 Investigating the Impacts on Cyclist Casualty Severity at Roundabouts: A UK Case Study

Authors: Nurten Akgun, Dilum Dissanayake, Neil Thorpe, Margaret C. Bell

Abstract:

Cycling has gained considerable attention owing to its comparable speeds, low cost, health benefits, and reduced environmental impact. The main challenge associated with cycling is ensuring the safety of people who choose to cycle as their main means of transport. From a road safety point of view, cyclists are considered vulnerable road users because they are at higher risk of serious casualty in the urban network, and specifically at roundabouts. This research develops an enhanced mathematical model by including a broad spectrum of casualty-related variables: geometric design measures (approach number of lanes and entry path radius), speed limit, meteorological condition variables (light, weather, road surface), socio-demographic characteristics (age and gender), and contributory factors. Contributory factors included driver behavior variables such as failed to look properly, sudden braking, a vehicle passing too close to a cyclist, junction overshot, failed to judge other person’s path, restart moving off at the junction, poor turn or manoeuvre, and disobeyed give-way. Tyne and Wear in the UK was selected as the case study area, and the cyclist casualty data were obtained from the UK STATS19 national dataset. The outcome categories of the regression model were slight and serious cyclist casualties; therefore, binary logistic regression was applied. The analysis showed that approach number of lanes was statistically significant at the 95% confidence level: a higher number of approach lanes increased the probability of a more severe cyclist casualty. In addition, sudden braking significantly increased cyclist casualty severity at the 95% confidence level. The study concluded that cyclist casualty severity was strongly related to approach number of lanes and sudden braking.
Further in-depth research should explore the connection between sudden braking and approach number of lanes in order to investigate driver behavior at approach locations. The output of this research will inform investment in measures to improve the safety of cyclists at roundabouts.

Keywords: binary logistic regression, casualty severity, cyclist safety, roundabout
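Binary logistic regression models the probability of a serious (rather than slight) casualty as p = 1 / (1 + e^-(b0 + b1·x)), and exp(b1) is the odds ratio per unit increase in x. As a minimal sketch, assuming a single predictor and a small hypothetical data set (not STATS19 records), the coefficients can be estimated by Newton-Raphson on the log-likelihood, which for this model coincides with iteratively reweighted least squares.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector X^T (y - p)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix X^T W X with weights p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step (X^T W X)^{-1} X^T (y - p), via the explicit 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: approach lanes (x) and serious-casualty indicator (y).
lanes = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
serious = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(lanes, serious)
odds_ratio_per_lane = math.exp(b1)
```

In this toy data the share of serious casualties is 1/4, 1/2, and 3/4 for one, two, and three lanes, so the fitted odds triple with each extra lane (odds ratio 3). A real analysis would add the other covariates, standard errors, and a convergence check.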

3163 Modelling the Effect of Physical Environment Factors on Child Pedestrian Severity Collisions in Malaysia: A Multinomial Logistic Regression Analysis

Authors: Muhamad N. Borhan, Nur S. Darus, Siti Z. Ishak, Rozmi Ismail, Siti F. M. Razali

Abstract:

Children are at greater risk of being involved in road traffic collisions owing to the complex interaction of various elements in the transportation system, including the behavior of children and drivers as well as physical and social environment factors. The present study examined the effect of physical environment factors on the severity of child pedestrian collisions. Collision severity was categorized into four outcomes: fatal, serious injury, slight injury, and damage. The sample comprised 2487 child pedestrian-vehicle collisions involving children aged 7 to 12 years in Malaysia from 2006 to 2015. A multinomial logistic regression was applied to establish the relationship between severity levels and physical environment factors. The results showed that eight contributing factors influence the probability of injury: road surface material, traffic system, road marking, control type, lighting condition, type of location, land use, and road surface condition. Understanding the effect of physical environment factors may contribute to improved physical environment design and reduced collision involvement.

Keywords: child pedestrian, collisions, primary school, road injuries

3162 Development of the Logistic Service Providers under the Pandemic Affects during COVID-19 in Turkey

Authors: Süleyman Günes

Abstract:

The COVID-19 pandemic has had crucial effects on social and economic systems in Turkey and all over the world, affecting logistics providers and worldwide supply chains. Unexpected risks played a central role in creating vulnerabilities for logistics service operations during the pandemic. This study aims to research and design qualitative and quantitative contributions to logistic services. The pandemic brought unavoidable risks to the logistics industry in Turkey, and Logistic Service Providers (LSPs) have had to learn how to manage the uncertainties and risks triggered by its main and adverse effects. The risks that LSPs encountered during the pandemic were investigated and the associated uncertainties identified, and the cause-effect structures were revealed through qualitative and quantitative studies. The results suggest that the pandemic triggered supply chain and demand changes, while also influencing financial failure, forecast horizons, and operational performance.

Keywords: logistic service providers, COVID-19, development, financial failure

3161 Charting Sentiments with Naive Bayes and Logistic Regression

Authors: Jummalla Aashrith, N. L. Shiva Sai, K. Bhavya Sri

Abstract:

The swift progress of web technology has not only amassed a vast reservoir of internet data but also triggered a substantial surge in data generation. The internet has become a dynamic hub for online education, idea dissemination, and opinion sharing. Notably, the widely used social networking platform Twitter is expanding considerably, allowing users to share viewpoints, participate in discussions across diverse communities, and broadcast messages globally. This rise in online engagement has sparked significant interest in subjectivity analysis, particularly of Twitter data. This research focuses on sentiment analysis of Twitter. It aims to offer insights into deciphering information in tweets, where opinions are expressed in a highly unstructured and diverse manner, ranging from positive to negative with occasional neutral expressions. We offer a comprehensive exploration and comparative assessment of modern approaches to opinion mining, applying machine learning algorithms such as Naive Bayes and Logistic Regression to Twitter data streams, and we discuss the overarching challenges and applications of subjectivity analysis on Twitter.

Keywords: machine learning, sentiment analysis, visualisation, python
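As a minimal sketch of the Naive Bayes side of this comparison, assuming lower-case whitespace tokenisation, add-one (Laplace) smoothing, and a handful of made-up example tweets (none from the study), a multinomial Naive Bayes classifier scores each sentiment class by its log prior plus the log-probabilities of the tweet's words and picks the highest score.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case whitespace tokenisation (a deliberately simple assumption)."""
    return text.lower().split()

class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[c][word]
                    score += math.log((count + 1) / (self.totals[c] + vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)

# Made-up training tweets, two per class.
texts = ["i love this phone", "great service love it",
         "terrible battery hate it", "worst update ever hate",
         "the meeting is at noon", "new version released today"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = MultinomialNB().fit(texts, labels)
```

For example, `model.predict("hate this update")` returns `"negative"`. Logistic Regression would instead learn one weight per word by maximising the conditional likelihood, rather than estimating word frequencies per class.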

3160 Improving the Logistic System to Secure Effective Food Fish Supply Chain in Indonesia

Authors: Atikah Nurhayati, Asep A. Handaka

Abstract:

Indonesia is one of the world’s major fish producers and can feed not only its own citizens but also people elsewhere in the world. Currently, the total annual production is 11 tons and is expected to double by 2050. Given this potential, fisheries have been an important part of Indonesia's national food security system. Despite this potential, Indonesians face a big challenge in making fish a reliable source of food, more specifically of protein intake. The long geographic distance between fish production centers and consumer concentrations has prevented an effective supply chain from producers to consumers and therefore demands a good logistic system. This paper is based on our research, which aimed to analyze the fish supply chain and suggest relevant improvements to it. The research was conducted in 2016 in selected locations on Java Island where fishery commodities are intensively traded. Data used in this research comprise secondary time-series data on production and distribution and primary data on distribution aspects, collected through interviews with 100 purposively selected respondents representing fishers, traders, and processors. The data were analyzed within a supply chain management framework and processed using logistic regression and validity tests. The main findings are as follows. First, improperly managed connectivity and logistic chains are the main cause of insecure availability and affordability for consumers. Second, the low quality of most locally processed products is a major obstacle to improving affordability and connectivity. The paper concludes with a number of recommended strategies to tackle the problem, including rationalizing the length of the existing supply chain, intensifying processing activities, and improving distribution infrastructure and facilities.

Keywords: fishery, food security, logistic, supply chain

3159 Modelling the Impacts of Geophysical Parameters on Deforestation and Forest Degradation in Pre and Post Ban Logging Periods in Hindu Kush Himalayas

Authors: Alam Zeb, Glen W. Armstrong, Muhammad Qasim

Abstract:

Loss of forest cover is one of the most important land cover changes and has been of great concern to policy makers. This study quantified forest cover changes over the pre-logging-ban (1973-1993) and post-logging-ban (1993-2015) periods to examine the role of geophysical factors and spatial attributes of land in the two periods. We show that despite a complete ban on green felling, forest cover decreased by 28%, mostly through conversion to rangeland. Nevertheless, the logging ban was completely effective in controlling agricultural expansion. Binary logistic regression revealed that south-facing aspects at low elevation experienced more deforestation in the pre-ban period than in the post-ban period. In contrast to deforestation, forest degradation was more prominent on northern aspects at higher elevation during the policy period. Agricultural expansion was widespread in low-elevation flat areas with gentle slopes, while during the policy period agricultural contraction, in the form of regeneration, was observed in low-elevation areas of north-facing slopes. All proximity variables except distance to administrative boundary showed similar trends across the two periods and were important explanatory variables for understanding forest and agricultural expansion. The changes in the determinants of forest and agricultural expansion and contraction over the two periods might be attributed to the influence of policy and a general decrease in resource availability.

Keywords: forest conservation, wood harvesting ban, logistic regression, deforestation, forest degradation, agriculture expansion, Chitral, Pakistan

3158 Prevalence and Associated Factors of Attention Deficit Hyperactivity Disorder among Children Age 6 to 17 Years Old Living in Girja District, Oromia Regional State, Rural Ethiopia: Community Based Cross-Sectional Study

Authors: Hirbaye Mokona, Abebaw Gebeyehu, Aemro Zerihun

Abstract:

Introduction: Attention deficit hyperactivity disorder is a serious public health problem affecting millions of children throughout the world. Method: A cross-sectional study was conducted from May to June 2015 among children aged 6 to 17 years living in the rural area of Girja district. A multi-stage cluster sampling technique was used to select 1302 study participants. The Disruptive Behavior Disorder rating scale was used to collect the data. Data were coded, entered, and cleaned with Epi-Data version 3.1 and analyzed with SPSS version 20. Logistic regression analysis was used, and variables with p-values less than 0.05 in the multivariable logistic regression were considered statistically significant. Results: The prevalence of attention deficit hyperactivity disorder (ADHD) among children aged 6 to 17 years was 7.3%. Being male [AOR=1.81, 95% CI: (1.13, 2.91)]; living with a single parent [AOR=5.0, 95% CI: (2.35, 10.65)]; child birth order/rank [AOR=2.35, 95% CI: (1.30, 4.25)]; low family socio-economic status [AOR=2.43, 95% CI: (1.29, 4.59)]; maternal alcohol/khat use during pregnancy [AOR=3.14, 95% CI: (1.37, 7.37)]; and complications at delivery [AOR=3.56, 95% CI: (1.19, 10.64)] were associated with higher odds of developing ADHD. Conclusion: In this study, the prevalence of attention deficit hyperactivity disorder was similar to the worldwide prevalence. Prevention and early management of its modifiable risk factors should be carried out alongside increasing community awareness.

Keywords: attention deficit hyperactivity disorder, ADHD, associated factors, children, prevalence
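Adjusted odds ratios are exponentiated multivariable logistic regression coefficients, and their 95% CIs are commonly Wald intervals, exp(b ± 1.96·SE). Assuming the reported intervals are of this form (the abstract does not say), the standard error on the log scale and an approximate p-value can be recovered from a reported AOR and CI. The sketch applies this to the AOR for being male.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def log_se_from_ci(lower, upper, z=Z_95):
    """SE of log(OR) implied by a Wald CI: (ln upper - ln lower) / (2z)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_p_value(odds_ratio, lower, upper, z=Z_95):
    """Approximate two-sided p-value for H0: OR = 1 from a reported OR and CI."""
    se = log_se_from_ci(lower, upper, z)
    z_stat = math.log(odds_ratio) / se
    return math.erfc(abs(z_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# "Being male": AOR = 1.81, 95% CI (1.13, 2.91), as reported in the abstract
se_male = log_se_from_ci(1.13, 2.91)
p_male = wald_p_value(1.81, 1.13, 2.91)
```

This gives a log-scale SE of about 0.24 and p ≈ 0.014, consistent with the p < 0.05 criterion; because published figures are rounded, back-calculated values are only approximate.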

3157 Teachers’ Intention to Leave: Educational Policies as External Stress Factor

Authors: A. Myrzabekova, D. Nurmukhamed, K. Nurumov, A. Zhulbarissova

Abstract:

It is widely believed that stress can affect teachers’ intention to change workplace. While existing research primarily focuses on intrinsic sources of stress stemming from the school climate, the current study analyzes educational policies as one of the determinants of teachers’ intention to leave their schools. In this respect, Kazakhstan presents a unique case, since the country endorsed several educational policies that directly affected teaching and administrative practices within schools. Using data from the Teaching and Learning International Survey 2018 (TALIS) with the country-specific questionnaire, we construct a statistical measure of stress caused by the implementation of educational policies and test its impact on teachers’ intention to leave through logistic regression. In addition, we control for sociodemographic, professional, and student-related covariates while considering the intrinsic dimension of stress stemming from the school climate. Overall, our results suggest that stress caused by educational policies has a statistically significant positive effect on teachers’ intentions to transfer between schools. Both policy makers and educational scholars could find these results beneficial. For the former, careful planning and addressing the negative effects of educational policies are critical for the sustainability of the educational process. For the latter, accounting for exogenous sources of stress can lead to a more complete understanding of why teachers decide to change schools.

Keywords: educational policies, Kazakhstani teachers, logistic regression, factor analysis, sustainability education, TALIS, teacher turnover intention, work stress

3156 Point Estimation for the Type II Generalized Logistic Distribution Based on Progressively Censored Data

Authors: Rana Rimawi, Ayman Baklizi

Abstract:

Skewed distributions are important models that are frequently used in applications. Generalized distributions form a class of skewed distributions and have gained widespread use because of their flexibility in data analysis. More specifically, the Generalized Logistic Distribution in its different types has received considerable attention recently. In this study, based on progressively type-II censored data, we consider point estimation for the type II Generalized Logistic Distribution (Type II GLD). We develop several estimators of its unknown parameters, including maximum likelihood estimators (MLE), Bayes estimators, and best linear unbiased estimators (BLUE). The estimators are compared by simulation using bias and mean squared error (MSE) as criteria. An illustrative example with a real data set is given.

Keywords: point estimation, type II generalized logistic distribution, progressive censoring, maximum likelihood estimation
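As a sketch of the MLE part only, assume the one-parameter Type II GLD with CDF F(x) = 1 - (1 + e^x)^(-α) and density f(x) = α e^(-αx) / (1 + e^(-x))^(α+1); this is an assumption, since the abstract does not state the parameterisation or whether location and scale parameters are included. With m observed failures x_1 < ... < x_m and R_i units withdrawn at the i-th failure, the log-likelihood is Σ[log f(x_i) + R_i log(1 - F(x_i))], and setting its derivative to zero gives the closed form α̂ = m / Σ(1 + R_i) log(1 + e^(x_i)). The code simulates such a sample with the Balakrishnan-Sandhu uniform-transformation algorithm and checks the estimator; the Bayes and BLUE estimators from the paper are not shown, and the censoring scheme is hypothetical.

```python
import math
import random

def gld2_quantile(u, alpha):
    """Inverse of the assumed CDF F(x) = 1 - (1 + e^x)^(-alpha)."""
    return math.log((1 - u) ** (-1 / alpha) - 1)

def progressive_type2_sample(alpha, removals, rng):
    """Progressively type-II censored sample (Balakrishnan-Sandhu algorithm).

    removals[i] is the number of surviving units withdrawn at the (i+1)-th
    failure, so m = len(removals) failures are observed out of
    n = m + sum(removals) units on test.
    """
    m = len(removals)
    w = [rng.random() for _ in range(m)]
    # V_i = W_i^(1 / (i + R_m + R_{m-1} + ... + R_{m-i+1}))
    v = [w[i - 1] ** (1 / (i + sum(removals[m - i:]))) for i in range(1, m + 1)]
    u, prod = [], 1.0
    for i in range(1, m + 1):
        prod *= v[m - i]    # running product V_m * V_{m-1} * ... * V_{m-i+1}
        u.append(1 - prod)  # U_1 < U_2 < ... < U_m: censored uniform order statistics
    return [gld2_quantile(ui, alpha) for ui in u]

def mle_alpha(x, removals):
    """Closed-form MLE: m / sum((1 + R_i) * log(1 + e^{x_i}))."""
    return len(x) / sum((1 + r) * math.log1p(math.exp(xi)) for xi, r in zip(x, removals))

# Hypothetical scheme: withdraw one unit at every other failure (m = 2000, n = 3000).
rng = random.Random(2024)
removals = [1 if i % 2 == 0 else 0 for i in range(2000)]
sample = progressive_type2_sample(2.0, removals, rng)
alpha_hat = mle_alpha(sample, removals)
```

Under this model the observed information is m/α², so with m = 2000 and α = 2 the estimate should typically land within about ±0.1 of the true value.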

3155 Lifestyle Factors Associated With Overweight/obesity Status In Croatian Adolescents: A Population-Based Study

Authors: Lovro Štefan

Abstract:

The main purpose of the present study was to investigate the associations between overweight/obesity status and lifestyle factors. In this cross-sectional study, participants were 1950 urban secondary-school students (54.7% female) aged 17 to 18 years. The dependent variable was body-mass index status derived from self-reported height and weight. The outcome was binarised: participants with values <25 kg/m² were collapsed into the "normal" category and those with values ≥25 kg/m² into the "overweight/obesity" category. Independent variables were gender, type of school, physical activity, sedentary behaviour, self-rated health, self-perceived socioeconomic status, and psychological distress. The associations between the dependent and independent variables were analyzed using multiple logistic regression analysis. In the univariate model, being overweight/obese was significantly associated with being a male student (OR 0.31; 95% CI 0.23 to 0.42), attending a vocational school (OR 1.87; 95% CI 1.42 to 2.48), not meeting the recommendations for moderate-to-vigorous physical activity (OR 0.44; 95% CI 0.22 to 0.88), spending more time in sedentary behaviour (OR 1.53; 95% CI 1.07 to 2.19), poor self-rated health (OR 0.35; 95% CI 0.20 to 0.56), and lower socioeconomic status (OR 0.63; 95% CI 0.48 to 0.84). In the multivariate model, the same associations occurred between the dependent and independent variables. In both models, psychological distress was not associated with being overweight/obese. In conclusion, our findings suggest that lifestyle factors are independently associated with body-mass index.

Keywords: body mass index, secondary-school students, Croatia, physical activity, sedentary behaviour, logistic regression
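The binarised outcome and a univariate odds ratio can be illustrated in a few lines. This is a sketch with hypothetical counts: BMI is weight in kilograms divided by the square of height in metres, the 25 kg/m² cut-off follows the abstract, and for a single binary exposure the univariate logistic regression odds ratio equals the cross-product ratio ad/bc of the 2×2 table, with the Woolf standard error √(1/a + 1/b + 1/c + 1/d) on the log scale.

```python
import math

def bmi(weight_kg, height_m):
    """Body-mass index in kg/m²."""
    return weight_kg / height_m ** 2

def bmi_category(weight_kg, height_m, cutoff=25.0):
    """Binarise as in the abstract: < 25 -> 'normal', >= 25 -> 'overweight/obesity'."""
    return "overweight/obesity" if bmi(weight_kg, height_m) >= cutoff else "normal"

def crude_odds_ratio(a, b, c, d, z=1.96):
    """Univariate OR from a 2x2 table with a Woolf (log-scale) 95% CI.

    a = exposed & overweight, b = exposed & normal,
    c = unexposed & overweight, d = unexposed & normal.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

# Hypothetical counts, not from the study
odds_ratio, ci_low, ci_high = crude_odds_ratio(10, 20, 5, 40)
```

The multivariate model differs in that each OR is adjusted for the other covariates, which a 2×2 table cannot do.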

3154 Mediterranean Diet, Duration of Admission and Mortality in Elderly, Hospitalized Patients: A Cross-Sectional Study

Authors: Christos Lampropoulos, Maria Konsta, Ifigenia Apostolou, Vicky Dradaki, Tamta Sirbilatze, Irini Dri, Christina Kordali, Vaggelis Lambas, Kostas Argyros, Georgios Mavras

Abstract:

Objectives: The Mediterranean diet has been associated with a lower incidence of cardiovascular disease and cancer. The purpose of our study was to examine the hypothesis that the Mediterranean diet may protect against mortality and reduce admission duration in elderly, hospitalized patients. Methods: The sample included 150 patients (78 men, 72 women; mean age 80 ± 8.2 years). The following data were included in the analysis: anthropometric and laboratory data, dietary habits (MedDiet score), nutritional status [Mini Nutritional Assessment (MNA) score], physical activity (International Physical Activity Questionnaire, IPAQ), smoking status, cause and duration of the current admission, and medical history (co-morbidities, previous admissions). Primary endpoints were mortality (from admission until 6 months afterwards) and duration of admission compared with national guidelines for closed consolidated medical expenses. Logistic regression and linear regression analyses were performed to identify independent predictors of mortality and of admission duration difference, respectively. Results: According to the MNA, nutrition was normal in 54/150 (36%) patients, 46/150 (30.7%) were at risk of malnutrition, and the remaining 50/150 (33.3%) were malnourished. Multivariate logistic regression showed that the odds of death decreased by 30% for each unit increase in MedDiet score (OR=0.7, 95% CI: 0.6-0.8, p < 0.0001). Patients with a cancer-related admission had 37.7 times the odds of death of those admitted with infection (OR=37.7, 95% CI: 4.4-325, p=0.001). According to multivariate linear regression analysis, admission duration was inversely related to the Mediterranean diet, decreasing by 0.18 days on average for each unit increase in MedDiet score (b: -0.18, 95% CI: -0.33 to -0.035, p=0.02). Additionally, the duration of the current admission increased on average by 0.83 days for each previous hospital admission (b: 0.83, 95% CI: 0.5-1.16, p<0.0001).
The admission duration of patients with cancer was on average 4.5 days longer than that of patients admitted due to infection (b: 4.5, 95% CI: 0.9-8, p=0.015). Conclusion: The Mediterranean diet adequately protects elderly, hospitalized patients against mortality and reduces the duration of hospitalization.

Keywords: Mediterranean diet, malnutrition, nutritional status, prognostic factors for mortality
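The reported OR of 0.7 per MedDiet point is multiplicative on the odds scale: a k-point increase multiplies the odds of death by 0.7^k, and the resulting change in the probability of death depends on the baseline risk. The sketch below makes this concrete; the 20% baseline risk is purely hypothetical, and treating the per-unit OR as constant across the score range is the usual logistic-model assumption.

```python
def odds_multiplier(odds_ratio, units=1):
    """Factor by which the odds change for a `units`-point increase."""
    return odds_ratio ** units

def percent_change_in_odds(odds_ratio, units=1):
    return (odds_multiplier(odds_ratio, units) - 1) * 100

def updated_probability(baseline_prob, odds_ratio, units=1):
    """Probability after a `units`-point increase, starting from baseline_prob."""
    odds = baseline_prob / (1 - baseline_prob) * odds_multiplier(odds_ratio, units)
    return odds / (1 + odds)

one_point = percent_change_in_odds(0.7)        # about -30% odds per point
five_points = percent_change_in_odds(0.7, 5)   # about -83% odds for five points
risk_after = updated_probability(0.20, 0.7)    # hypothetical 20% baseline risk
days_after_five_points = -0.18 * 5             # linear model: effects add, not multiply
```

With a 20% baseline risk, one extra point lowers the modelled risk to about 14.9%, a relative drop of about 25.5% rather than 30%, because odds and probabilities diverge as risk grows. The admission-duration slope, by contrast, is additive: five extra points correspond to about 0.9 fewer days.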

3153 A Hybrid Adomian Decomposition Method in the Solution of Logistic Abelian Ordinary Differential and Its Comparism with Some Standard Numerical Scheme

Authors: F. J. Adeyeye, D. Eni, K. M. Okedoye

Abstract:

In this paper we present a hybrid Adomian decomposition method (ADM), obtained by substituting a one-step Taylor series approximation of orders I and II into the nonlinear part of the ADM, which results in a convergent series scheme. The scheme is applied to some logistic problems represented as Abelian differential equations, and the results are compared with the exact solution and with the fourth-order Runge-Kutta method to ascertain the accuracy and efficiency of the scheme. The findings show that the scheme is efficient enough to solve the logistic problems considered in this paper.

Keywords: Adomian decomposition method, nonlinear part, one-step method, Taylor series approximation, hybrid of Adomian polynomial, logistic problem, Malthusian parameter, Verhulst Model
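For reference, the logistic (Verhulst) equation dy/dt = r·y·(1 - y/K) has the exact solution y(t) = K·y0 / (y0 + (K - y0)·e^(-rt)). The sketch below compares that solution with the standard Adomian decomposition series (not the authors' hybrid Taylor-substitution variant) and with classical fourth-order Runge-Kutta. In the standard ADM the nonlinear term y² is expanded in Adomian polynomials A_n = Σ_{i=0..n} y_i·y_{n-i}; because y0 is constant, every component is a monomial y_n = c_n·t^n, so the recursion reduces to one on the coefficients c_n. The parameters r = 0.5, K = 10, y0 = 1 are illustrative.

```python
import math

def exact_logistic(t, r, K, y0):
    """Closed-form solution of dy/dt = r * y * (1 - y / K), y(0) = y0."""
    return K * y0 / (y0 + (K - y0) * math.exp(-r * t))

def adm_coefficients(r, K, y0, n_terms):
    """Coefficients c_n of the standard ADM components y_n(t) = c_n * t**n.

    y_{n+1} = integral_0^t [r * y_n - (r / K) * A_n] ds, with the Adomian
    polynomials of y**2 given by A_n = sum_{i=0..n} y_i * y_{n-i}.
    """
    c = [y0]
    for n in range(n_terms - 1):
        a_n = sum(c[i] * c[n - i] for i in range(n + 1))
        c.append((r * c[n] - (r / K) * a_n) / (n + 1))
    return c

def adm_solution(t, coeffs):
    """Partial sum of the ADM series at time t."""
    return sum(cn * t ** n for n, cn in enumerate(coeffs))

def rk4_logistic(t_end, r, K, y0, h):
    """Classical fourth-order Runge-Kutta with a fixed step h."""
    def f(y):
        return r * y * (1 - y / K)

    y = y0
    for _ in range(round(t_end / h)):
        k1 = f(y)
        k2 = f(y + h * k1 / 2)
        k3 = f(y + h * k2 / 2)
        k4 = f(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y

r, K, y0 = 0.5, 10.0, 1.0
exact_value = exact_logistic(1.0, r, K, y0)
adm_value = adm_solution(1.0, adm_coefficients(r, K, y0, 20))
rk4_value = rk4_logistic(1.0, r, K, y0, h=0.1)
```

At t = 1 both approximations agree with the exact value to well within 1e-6. The ADM series is a power series in t, however, and for these parameters it converges only for |t| below about 7.7 (the distance to the nearest complex singularity of the exact solution), whereas RK4 simply keeps stepping.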

3152 Trajectories of Depression Anxiety and Stress among Breast Cancer Patients: Assessment at First Year of Diagnosis

Authors: Jyoti Srivastava, Sandhya S. Kaushik, Mallika Tewari, Hari S. Shukla

Abstract:

Little information is available about how psychological well-being develops over time among women undergoing treatment for breast cancer. The aim of this study was to identify the trajectories of depression, anxiety, and stress among women with early-stage breast cancer. Of the 48 Indian women with newly diagnosed early-stage breast cancer recruited from a surgical oncology unit, 39 completed an interview and were assessed for depression, anxiety, and stress (Depression Anxiety Stress Scale, DASS-21) before their first course of chemotherapy (baseline) and at follow-up interviews 3, 6, and 9 months thereafter. Growth mixture modeling was used to identify distinct trajectories of depression, anxiety, and stress symptoms. Logistic regression analysis was used to evaluate the characteristics of women in distinct groups. Most women showed mild to moderate levels of depression and anxiety (68%) and normal to mild levels of stress (71%). However, about one in 11 women was chronically anxious (9%) and depressed (9%). Young age, having a partner, shorter education, and receiving chemotherapy but not radiotherapy might characterize women whose psychological symptoms remain strong nine months after diagnosis. By looking beyond the mean, it was found that several socio-demographic and treatment factors characterized the women whose depression, anxiety, and stress levels remained severe even nine months after diagnosis. The results suggest that support provided to cancer patients should focus especially on the relatively small group of patients most in need.

Keywords: psychological well-being, growth mixture modeling, logistic regression analysis, socio-demographic factors

3151 Stock Prediction and Portfolio Optimization Thesis

Authors: Deniz Peksen

Abstract:

This thesis aims to predict trend movement of closing price of stock and to maximize portfolio by utilizing the predictions. In this context, the study aims to define a stock portfolio strategy from models created by using Logistic Regression, Gradient Boosting and Random Forest. Recently, predicting the trend of stock price has gained a significance role in making buy and sell decisions and generating returns with investment strategies formed by machine learning basis decisions. There are plenty of studies in the literature on the prediction of stock prices in capital markets using machine learning methods but most of them focus on closing prices instead of the direction of price trend. Our study differs from literature in terms of target definition. Ours is a classification problem which is focusing on the market trend in next 20 trading days. To predict trend direction, fourteen years of data were used for training. Following three years were used for validation. Finally, last three years were used for testing. Training data are between 2002-06-18 and 2016-12-30 Validation data are between 2017-01-02 and 2019-12-31 Testing data are between 2020-01-02 and 2022-03-17 We determine Hold Stock Portfolio, Best Stock Portfolio and USD-TRY Exchange rate as benchmarks which we should outperform. We compared our machine learning basis portfolio return on test data with return of Hold Stock Portfolio, Best Stock Portfolio and USD-TRY Exchange rate. We assessed our model performance with the help of roc-auc score and lift charts. We use logistic regression, Gradient Boosting and Random Forest with grid search approach to fine-tune hyper-parameters. As a result of the empirical study, the existence of uptrend and downtrend of five stocks could not be predicted by the models. When we use these predictions to define buy and sell decisions in order to generate model-based-portfolio, model-based-portfolio fails in test dataset. 
Model-based buy and sell decisions produced a portfolio strategy whose returns could not outperform non-model portfolio strategies on the test dataset. We found that any effort to predict the trend from stock prices is a challenge, and our results are consistent with the Random Walk Theory, which holds that stock prices and price changes are unpredictable. Although several of our models performed well on the validation dataset, all model iterations failed on the test dataset. The more complex models (Random Forest and Gradient Boosting) provided no additional performance over Logistic Regression, so greater model complexity is not an answer to the stock prediction problem. Our approach was to predict the trend instead of the price, which turned the problem into a classification task. However, this labeling approach did not solve the stock prediction problem, nor did it refute the Random Walk Theory.
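The target definition and date-based split described above can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes a day is labeled 1 when the close 20 trading days later is higher than that day's close (the abstract does not state the exact trend rule), and the function names `trend_labels` and `chronological_split` are hypothetical. The period boundaries are the dates given in the abstract.

```python
from datetime import date

HORIZON = 20  # trend horizon in trading days, as stated in the abstract

# Period boundaries (inclusive) taken from the abstract.
SPLITS = {
    "train": (date(2002, 6, 18), date(2016, 12, 30)),
    "validation": (date(2017, 1, 2), date(2019, 12, 31)),
    "test": (date(2020, 1, 2), date(2022, 3, 17)),
}


def trend_labels(closes, horizon=HORIZON):
    """Assumed rule: label day t as 1 if the close `horizon` trading days
    later is higher than the close on day t, else 0. The last `horizon`
    days have no future close and are labeled None."""
    labels = []
    for t, price in enumerate(closes):
        if t + horizon < len(closes):
            labels.append(1 if closes[t + horizon] > price else 0)
        else:
            labels.append(None)
    return labels


def chronological_split(rows):
    """Assign (day, features, label) rows to periods by date, without
    shuffling. Rows that fall outside every period are dropped."""
    parts = {name: [] for name in SPLITS}
    for row in rows:
        day = row[0]
        for name, (start, end) in SPLITS.items():
            if start <= day <= end:
                parts[name].append(row)
                break
    return parts
```

Splitting by date rather than shuffling keeps future information out of training. Note that the labels for the last 20 trading days of one period look ahead into the next period, so a stricter setup might drop those rows.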

Keywords: stock prediction, portfolio optimization, data science, machine learning

3150 Machine Learning Approach for Stress Detection Using Wireless Physical Activity Tracker

Authors: B. Padmaja, V. V. Rama Prasad, K. V. N. Sunitha, E. Krishna Rao Patro

Abstract:

Stress is a psychological condition that reduces the quality of sleep and affects every facet of life. Constant exposure to stress is detrimental to both mind and body. To cope with stress, however, one must first identify it. This paper provides an effective method for detecting cognitive stress levels using data from the Fitbit physical activity tracker, which records users' daily food intake, weight, sleep, heart rate, and physical activity. Four major stressors, namely physical activity, sleep patterns, working hours and changes in heart rate, are used to assess individuals' stress levels. The main aim of this system is to apply a machine learning approach to stress detection with the help of smartphone sensor technology. The effect of each stressor is first evaluated individually with logistic regression; a combined model is then built and assessed with ordinal regression variants using logit, probit and complementary log-log links. The quality of each model is evaluated with the Akaike Information Criterion (AIC), and probit was found to be the most suitable model for our dataset. The system was tested and evaluated in a real-time environment using data from adults working in IT and other sectors in India. The novelty of this work lies in keeping the stress detection system as non-invasive as possible for users.
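As a rough sketch of what distinguishes the three model variants, the snippet below computes ordinal category probabilities under a cumulative link model, P(Y <= j) = F(theta_j - eta), for the logit, probit and complementary log-log inverse links, together with the AIC formula 2k - 2 ln L used to compare them. The single-predictor log-likelihood helper and any parameter values are illustrative assumptions; in practice each model's thresholds and coefficients would first be fitted by maximum likelihood, and the AIC values compared at those fitted estimates.

```python
import math


def _expit(x):
    """Numerically stable logistic CDF."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Inverse link functions F used in P(Y <= j) = F(theta_j - eta).
LINKS = {
    "logit": _expit,
    "probit": lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))),
    "cloglog": lambda x: 1.0 - math.exp(-math.exp(x)),
}


def category_probs(eta, thresholds, link="logit"):
    """Probabilities of len(thresholds) + 1 ordered categories.
    `thresholds` must be strictly increasing."""
    F = LINKS[link]
    cumulative = [F(theta - eta) for theta in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


def log_likelihood(data, thresholds, beta, link):
    """Sum of log P(observed category) over (x, category_index) pairs,
    with one predictor x and linear predictor eta = beta * x."""
    return sum(math.log(category_probs(beta * x, thresholds, link)[y]) for x, y in data)


def aic(loglik, n_params):
    """Akaike Information Criterion; lower values indicate a better
    trade-off between fit and number of parameters."""
    return 2 * n_params - 2 * loglik
```

Only the function F changes between the three variants, which is why they can be compared on the same data with a likelihood-based criterion such as AIC.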

Keywords: physical activity tracker, sleep pattern, working hours, heart rate, smartphone sensor

3149 Organic Farming Profitability: Evidence from South Korea

Authors: Saem Lee, Thanh Nguyen, Hio-Jung Shin, Thomas Koellner

Abstract:

Land-use management influences the provision of ecosystem services in dynamic agricultural landscapes, and agricultural land use is important for maintaining the productivity and sustainability of agricultural ecosystems. In Korea, however, intensive farming in the highland agricultural zone of the upper Soyang stream has contaminated the soil through overuse of pesticides and fertilizers. This has reduced water and soil quality, with consequences for ecosystem services and human wellbeing. Conventional farming still accounts for a high share of farming in this area, and there are no specific measures to prevent the poor water quality caused by farming activities. The adoption of environmentally friendly farming has therefore been considered one of the alternatives that could improve water quality and increase biomass production. At the same time, the share of farm households practicing environmentally friendly farming remains low. Our research therefore involved a farm household survey covering conventional farms, farms in transition and organic farms in the Soyang watershed. A further aim was to compare the economic advantage of farmers adopting environmentally friendly farming with that of non-adopters, and to investigate the factors behind adoption by logistic regression analysis with socio-economic and benefit-cost ratio variables. Farmers practicing environmentally friendly farming tended to be younger than conventional farmers and farmers in transition. The groups were similar in gender, being predominantly male. Farmers practicing environmentally friendly farming were more educated and had less farming experience than conventional farmers and farmers in transition. Based on the benefit-cost analysis, the total annual costs of farmers in transition were about twice those of environmentally friendly farmers. 
The benefit to organic farmers was estimated at 2,800 KRW per household per year. In the logistic regression, the statistically significant factors were subsidy, district, residence period and benefit-cost ratio; district and residence period had a negative effect on the practice of environmentally friendly farming techniques. Our results provide important information for Korean policy-making on agricultural and water management and point to policy approaches that would support sustainable resource management.
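The benefit-cost ratio used as an explanatory variable is the quotient of total benefits over total costs. A minimal sketch, assuming benefits and costs are already expressed per household per year in the same currency; the figures and item labels in the example are hypothetical.

```python
def benefit_cost_ratio(benefits, costs):
    """Total benefits divided by total costs; a value above 1 means the
    benefits exceed the costs."""
    total_costs = sum(costs)
    if total_costs <= 0:
        raise ValueError("total costs must be positive")
    return sum(benefits) / total_costs


# Hypothetical annual figures for one household (same currency, e.g. KRW).
household_benefits = [3000, 1000]  # e.g. crop sales, subsidy
household_costs = [2000, 1000]     # e.g. inputs, labour
bcr = benefit_cost_ratio(household_benefits, household_costs)  # 4000 / 3000
```

Each household's ratio can then enter the logistic regression as one predictor alongside the socio-economic variables.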

Keywords: organic farming, logistic regression, profitability, agricultural land-use

3148 Research of the Factors Affecting the Administrative Capacity of Enterprises in the Logistic Sector of Bulgaria

Authors: R. Kenova, K. Anguelov, R. Nikolova

Abstract:

The human factor plays a major role in boosting the competitiveness of enterprises, and this is of particular importance for logistics companies. On the one hand, they must comply strictly with legislation; on the other hand, they must be competitive in pricing and delivery times. Moreover, their policies should allow them to be as flexible as possible. All these circumstances pose serious challenges to the qualification, motivation and experience of the human resources working in logistics companies or in the logistics departments of trade and industrial enterprises. Bulgaria's geographic location gives it specific competitive advantages in the transport of goods between Europe and Asia, and a number of logistics companies operate in this sphere in Bulgaria. In this paper, the authors aim to establish the state of the administrative capacity and human resources in logistics companies and in the logistics departments of trade and industrial companies in Bulgaria, in order to propose guidelines for improving their effectiveness. Through independent empirical research conducted in Bulgarian logistics, trade and industrial enterprises, the authors investigate both the degree of impact and the interdependence of the various factors that characterize administrative capacity. The study was conducted with a prepared questionnaire in the form of direct interviews. The sample comprises 50 respondents: general managers of industrial or trade enterprises; logistics managers of industrial or trade enterprises; general managers of forwarding companies, with either their own or hired transport; experts from the Bulgarian association of logistics; and logistics lobbyists and scientists in the field. 
The data were gathered over 3 months, then organized with specialized software and analyzed against preset criteria. The results of this methodological toolbox indicate a correlation between the individual criteria, as well as a relationship between administrative capacity and the other factors that determine the competitiveness of the studied companies. The authors present results of the empirical research on staffing levels and workload in the logistics departments of the enterprises, and discuss experience in logistics process management and human resources competence. Moreover, the overload of logistics specialists is analyzed as one of the main threats leading to mistakes and lost clients. The paper argues that it is indispensable to build effective and efficient administrative capacity based on the number, qualification, experience and motivation of the staff in logistics companies. It ends with recommendations on the qualification and experience of specialists in logistics departments, on providing effective and efficient administrative capacity in those departments, and on the interdependence of the human factor and the other factors that influence enterprise competitiveness.

Keywords: administrative capacity, human resources, logistic competitiveness, staff qualification

3147 Farmers’ Access to Agricultural Extension Services Delivery Systems: Evidence from a Field Study in India

Authors: Ankit Nagar, Dinesh Kumar Nauriyal, Sukhpal Singh

Abstract:

This paper examines the key determinants of farmers’ access to agricultural extension services and the sources of extension services that farmers prefer and access. An ordered logistic regression model was used to analyse data on 360 sample households from a primary survey conducted in western Uttar Pradesh, India. The study finds that farmers’ decision to engage in the agricultural extension programme is significantly influenced by education level, gender, farming experience, social group, group membership, farm size, credit access, awareness of the extension scheme, farmers’ perception, and distance from extension sources. The most intriguing finding is that progressive farmers, who have long been regarded as a major source of knowledge diffusion, are the most distrusted sources of information, as they are suspected of withholding vital information from potential beneficiaries. The positive relationship between farm size and ‘Access’ underlines that extension services should revisit their strategies to better target marginal and small farmers, who constitute over 85 percent of agricultural households, by incorporating their priorities into outreach programs. The study suggests that the productive potential of marginal and small farmers could still be greatly augmented by appropriate technology, advisory services, guidance, and improved market access. Also, the perception that public extension services are of poor quality can be corrected by initiatives that build extension workers’ capacity.
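The ordered logistic model assumes proportional odds: a predictor shifts the log-odds of being above every cut point by the same amount, so its odds ratio exp(beta) is the same at each threshold. The sketch below checks this numerically under the convention logit P(Y <= j) = theta_j - beta * x; the coefficient and cut points are hypothetical, not the study's estimates.

```python
import math


def cumulative_prob(x, beta, theta):
    """P(Y <= j | x) under an ordered logit with cut point theta_j:
    logit P(Y <= j) = theta_j - beta * x."""
    return 1.0 / (1.0 + math.exp(-(theta - beta * x)))


def odds_above(x, beta, theta):
    """Odds of falling in a category above j: P(Y > j) / P(Y <= j)."""
    p = cumulative_prob(x, beta, theta)
    return (1.0 - p) / p


def odds_ratios_by_cutpoint(beta, thetas, x0=0.0):
    """Odds ratio for a one-unit increase in x, evaluated at every cut point."""
    return [odds_above(x0 + 1.0, beta, t) / odds_above(x0, beta, t) for t in thetas]


# Hypothetical coefficient and cut points for a four-level 'Access' outcome.
ratios = odds_ratios_by_cutpoint(0.4, [-1.0, 0.5, 2.0])  # each approximately exp(0.4)
```

If the odds ratios estimated separately at each cut point differed markedly, the proportional-odds assumption behind the ordered logit would be questionable.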

Keywords: agriculture, access, extension services, ordered logistic regression

3146 Local Interpretable Model-agnostic Explanations (LIME) Approach to Email Spam Detection

Authors: Rohini Hariharan, Yazhini R., Blessy Maria Mathew

Abstract:

Detecting email spam is an important task in the digital era, which demands effective ways of curbing unwanted messages. This paper presents an approach that makes email spam classification algorithms transparent, reliable and more trustworthy by incorporating Local Interpretable Model-agnostic Explanations (LIME). Our technique provides interpretable explanations for individual email classifications to help users understand the model's decision-making process. We developed a complete pipeline that incorporates LIME into the spam classification framework and creates simplified, interpretable models tailored to individual emails. LIME identifies influential terms, pointing out the key elements that drive classification results and thus reducing the opacity inherent in conventional machine learning models. We also propose a visualization scheme for displaying keywords that improves users' understanding of classification decisions. We test our method on a diverse email dataset and compare its performance with various baseline models: Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes, Support Vector Classifier, K-Nearest Neighbors, Decision Tree, and Logistic Regression. Our results show that our model outperforms all the other models, achieving an accuracy of 96.59% and a precision of 99.12%.
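To make the LIME idea concrete, here is a simplified, self-contained sketch of a LIME-style explanation for one email. It is not the authors' pipeline or the `lime` package's API: it perturbs the email by randomly dropping words, queries a black-box spam probability for each perturbed version, weights each sample with an exponential kernel on its cosine distance from the original, and fits a weighted ridge regression whose coefficients serve as per-word importances. The kernel width, sample count and the toy classifier `toy_spam_model` are assumptions chosen for illustration; the real `lime` package differs in details such as its kernel and feature selection.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def toy_spam_model(tokens):
    """Hypothetical stand-in for a trained spam classifier: returns a spam
    probability of 0.9 if the word 'free' is present, else 0.1."""
    return 0.9 if "free" in tokens else 0.1


def lime_text(words, predict_proba, n_samples=500, kernel_width=0.25, ridge=1e-3, seed=0):
    """LIME-style local explanation for one tokenized email.

    Returns (word, weight) pairs sorted by absolute weight; each weight is the
    coefficient of that word's keep/drop indicator in a weighted ridge
    regression of the black-box probability on the perturbation masks."""
    rng = random.Random(seed)
    d = len(words)
    masks = [[1] * d]  # always include the unperturbed email
    masks += [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_samples - 1)]
    rows, targets, weights = [], [], []
    for mask in masks:
        kept = [w for w, keep in zip(words, mask) if keep]
        distance = 1.0 - math.sqrt(sum(mask) / d)  # cosine distance to the all-ones mask
        rows.append([1.0] + [float(v) for v in mask])  # intercept column + indicators
        targets.append(predict_proba(kept))
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    p = d + 1
    # Weighted ridge normal equations: (X^T W X + ridge * I) beta = X^T W y
    a = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights)) for i in range(p)]
    beta = solve(a, b)
    return sorted(zip(words, beta[1:]), key=lambda pair: -abs(pair[1]))


explanation = lime_text(["claim", "your", "free", "prize", "today"], toy_spam_model)
```

Because the toy model depends only on the word "free", the surrogate assigns that word a weight close to 0.8 (the jump from 0.1 to 0.9) and near-zero weights to the other words; with a real classifier, the top-weighted words are the ones a keyword visualization would highlight.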

Keywords: text classification, LIME (local interpretable model-agnostic explanations), stemming, tokenization, logistic regression.

3145 Effect of Genuine Missing Data Imputation on Prediction of Urinary Incontinence

Authors: Suzan Arslanturk, Mohammad-Reza Siadat, Theophilus Ogunyemi, Ananias Diokno

Abstract:

Missing data are a common challenge in the statistical analysis of most clinical survey datasets, and a variety of methods have been developed to handle missing values in survey data. Imputation is the most commonly used of these methods. However, to minimize the bias introduced by imputation, one must choose the right imputation technique and apply it to the correct type of missing data. In this paper, we identified different types of missing values: missing data due to skip patterns (SPMD), undetermined missing data (UMD), and genuine missing data (GMD), and applied rough set imputation only to the GMD portion. To evaluate the effect of such imputation on prediction, we generated several simulation datasets based on an existing epidemiological dataset (MESA). To measure how well each dataset lends itself to the prediction model (logistic regression), we used p-values from the Wald test. To evaluate the accuracy of the prediction, we considered the width of the 95% confidence interval for the probability of incontinence. Both imputed and non-imputed simulation datasets were fit to the prediction model, and both turned out to be significant (p-value < 0.05). However, the Wald score shows a better fit for the imputed datasets than for the non-imputed ones (28.7 vs. 23.4). The average confidence interval width decreased by 10.4% when the imputed dataset was used, meaning higher precision. The results show that using the rough set method to impute GMD improves the predictive capability of the logistic regression. Further studies are required to generalize this conclusion to other clinical survey datasets.
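The two evaluation quantities mentioned above can be computed from a fitted logistic model's coefficients and standard errors. This sketch assumes a 1-degree-of-freedom Wald test, W = (beta / SE)^2, and a 95% interval for a predicted probability built on the logit scale and back-transformed; the inputs are illustrative, and the study may have computed its statistics and intervals differently.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_test(beta, se):
    """Wald chi-square statistic (1 df) and two-sided p-value for H0: beta = 0."""
    w = (beta / se) ** 2
    p = math.erfc(math.sqrt(w / 2.0))  # P(chi2_1 > w) = P(|Z| > sqrt(w))
    return w, p


def probability_ci(eta, se_eta, z=Z95):
    """CI for a predicted probability: build eta ± z * se on the logit
    scale, then map both ends through the logistic function."""
    def expit(v):
        return 1.0 / (1.0 + math.exp(-v))
    return expit(eta - z * se_eta), expit(eta + z * se_eta)


def relative_width_reduction(width_before, width_after):
    """Fractional reduction in CI width, e.g. 0.104 for a 10.4% decrease."""
    return (width_before - width_after) / width_before
```

Building the interval on the logit scale keeps both limits inside (0, 1), which a symmetric interval on the probability scale does not guarantee.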

Keywords: rough set, imputation, clinical survey data simulation, genuine missing data, predictive index

3144 Establishment of a Nomogram Prediction Model for Postpartum Hemorrhage during Vaginal Delivery

Authors: Yinglisong, Jingge Chen, Jingxuan Chen, Yan Wang, Hui Huang, Jing Zhnag, Qianqian Zhang, Zhenzhen Zhang, Ji Zhang

Abstract:

Purpose: The study aims to establish a nomogram prediction model for postpartum hemorrhage (PPH) in vaginal delivery. Patients and Methods: Clinical data were retrospectively collected from vaginal delivery patients admitted to a hospital in Zhengzhou, China, from June 1 to October 31, 2022. Univariate and multivariate logistic regression were used to identify independent risk factors, and a nomogram model for PPH in vaginal delivery was established from the risk factors' coefficients. Bootstrapping was used for internal validation. To assess discrimination and calibration, receiver operating characteristic (ROC) curves and calibration curves were generated for the derivation and validation groups. Results: A total of 1340 vaginal deliveries were enrolled, 81 (6.04%) of which had PPH. Logistic regression indicated that history of uterine surgery, induction of labor, duration of the first stage of labor, neonatal weight, WBC count (during the first stage of labor), and cervical lacerations were all independent risk factors for hemorrhage (P < 0.05). The areas under the ROC curves (AUC) for the derivation and validation groups were 0.817 and 0.821, respectively, indicating good discrimination. The two calibration curves showed that the nomogram predictions and actual outcomes were highly consistent (P = 0.105, P = 0.113). Conclusion: The developed individualized risk prediction nomogram can help midwives recognize and diagnose groups at high risk of PPH and initiate early warning to reduce PPH incidence.
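A nomogram is a graphical rendering of the fitted logistic model: each predictor's contribution beta_i * x_i is rescaled to points, the points are summed, and the total maps back to a predicted probability. The sketch below uses one common scaling convention (the predictor with the widest contribution range spans 0-100 points); the coefficients, ranges and predictor names are hypothetical placeholders, not the study's estimates.

```python
import math

# Hypothetical logistic-model coefficients and observed predictor ranges,
# for illustration only (NOT the study's estimates).
INTERCEPT = -4.0
PREDICTORS = {
    # name: (coefficient, minimum, maximum)
    "binary_factor": (1.2, 0.0, 1.0),
    "continuous_factor": (0.8, 2.0, 5.0),
    "protective_factor": (-0.5, 0.0, 4.0),
}

# Points per unit of linear predictor: the widest contribution range spans 100 points.
SCALE = 100.0 / max(abs(c) * (hi - lo) for c, lo, hi in PREDICTORS.values())


def _lowest_risk_value(coef, lo, hi):
    """End of the range where the predictor contributes least to the risk."""
    return lo if coef > 0 else hi


def points(name, value):
    """Nomogram points for one predictor (0 at its lowest-risk value)."""
    coef, lo, hi = PREDICTORS[name]
    return coef * (value - _lowest_risk_value(coef, lo, hi)) * SCALE


def probability_from_points(total_points):
    """Convert a total point score back to the model's predicted probability."""
    eta = INTERCEPT + sum(c * _lowest_risk_value(c, lo, hi) for c, lo, hi in PREDICTORS.values())
    eta += total_points / SCALE
    return 1.0 / (1.0 + math.exp(-eta))
```

Because points are a linear rescaling of the logistic model's linear predictor, reading the total off the nomogram gives the same probability as evaluating the regression equation directly.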

Keywords: vaginal delivery, postpartum hemorrhage, risk factor, nomogram

3143 Unveiling Comorbidities in Irritable Bowel Syndrome: A UK BioBank Study utilizing Supervised Machine Learning

Authors: Uswah Ahmad Khan, Muhammad Moazam Fraz, Humayoon Shafique Satti, Qasim Aziz

Abstract:

Approximately 10-14% of the global population experiences a functional disorder known as irritable bowel syndrome (IBS), defined by persistent abdominal pain and an irregular bowel pattern. IBS significantly impairs work productivity and disrupts patients' daily lives and activities. Although IBS is widespread, its underlying pathophysiology is still incompletely understood. This study aims to help characterize the phenotype of IBS patients by using machine learning algorithms to differentiate the comorbidities found in IBS patients from those in non-IBS patients. We extracted participants with a code for IBS from the UK BioBank cohort and randomly selected participants without an IBS code, giving a total sample size of 18,000. We selected the comorbidity codes of these cases from 2 years before to 2 years after their IBS diagnosis and compared them with the comorbidities in the non-IBS cohort. Machine learning models, including Decision Trees, Gradient Boosting, Support Vector Machine (SVM), AdaBoost, Logistic Regression, and XGBoost, were assessed for their accuracy in predicting IBS, and the most accurate model was then used to identify the features associated with IBS. In our case, we used XGBoost feature importance for feature selection and applied the different models to the top 10% of features (50 features). Gradient Boosting, Logistic Regression and XGBoost predicted an IBS diagnosis with optimal accuracies of 71.08%, 71.427%, and 71.53%, respectively. The comorbidities most closely associated with IBS included gut diseases (haemorrhoids, diverticular diseases), atopic conditions (asthma), and psychiatric comorbidities (depressive episodes or disorder, anxiety). 
This finding emphasizes the need for a comprehensive approach when evaluating the phenotype of IBS, suggesting the possibility of identifying new subsets of IBS rather than relying solely on the conventional classification based on stool type. Additionally, our study demonstrates the potential of machine learning algorithms in predicting the development of IBS based on comorbidities, which may enhance diagnosis and facilitate better management of modifiable risk factors for IBS. Further research is necessary to confirm our findings and establish cause and effect. Alternative feature selection methods and even larger and more diverse datasets may lead to more accurate classification models. Despite these limitations, our findings highlight the effectiveness of Logistic Regression and XGBoost in predicting IBS diagnosis.
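Selecting the top 10% of features by importance is a simple ranking step. In this sketch the importance scores are assumed to come from a fitted model such as XGBoost (here they are just a dict), and `top_percent` is a hypothetical helper; integer arithmetic is used for the ceiling so that floating-point rounding cannot shift the cut-off.

```python
def top_percent(importances, percent=10):
    """Names of the top `percent`% of features ranked by importance score.

    `importances` maps feature name -> score (e.g. from a fitted tree model).
    Ties are broken by name so the selection is reproducible."""
    n = len(importances)
    k = max(1, (n * percent + 99) // 100)  # integer ceiling of n * percent / 100
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]
```

With 500 candidate features, for example, this keeps 50; the selected subset is then used to retrain and compare the classifiers.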

Keywords: comorbidities, disease association, irritable bowel syndrome (IBS), predictive analytics

3142 Exploring the Factors Affecting the Presence of Farmers’ Markets in Rural British Columbia

Authors: Amirmohsen Behjat, Aleck Ostry, Christina Miewald, Bernie Pauly

Abstract:

Farmers’ markets have become important suppliers of healthy food in both rural communities and urban settings. Farmers’ markets are evolving, and their number has increased rapidly in the past decade. Despite this sharp increase, farmers’ markets are unevenly distributed across areas. The main goal of this study is to explore the socioeconomic, geographic, and demographic variables that affect the establishment of farmers’ markets in rural communities in British Columbia (BC). Data on farmers’ markets in rural areas were collected from the BC Association of Farmers’ Markets and spatially joined to the BC map at the Dissemination Area (DA) level using ArcGIS software, linking each farmers’ market to the communities it serves. Then, to understand in which rural communities farmers’ markets tend to operate, a binary logistic regression analysis was performed with the availability of farmers’ markets at DA level as the dependent variable and the Deprivation Index (DI), Metro Influence Zone (MIZ) and population as independent variables. The results indicated that DI and MIZ were not statistically significant, whereas population was the only variable that contributed significantly to predicting the availability of farmers’ markets in rural BC. Moreover, this study found that farmers’ markets usually do not operate in rural food deserts, where other healthy food providers such as supermarkets and grocery stores are absent. In conclusion, the presence of farmers’ markets is not associated with the socioeconomic and geographic characteristics of rural communities in BC, but farmers’ markets tend to operate in more populated rural communities.
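For readers unfamiliar with binary logistic regression, the sketch below fits P(market present | x) = 1 / (1 + exp(-(b0 + b . x))) by batch gradient ascent on the log-likelihood and reports exp(b1) as an odds ratio. It is a teaching sketch, not the study's analysis: statistical packages use Newton-type solvers and also report standard errors and p-values, and the data in the example are hypothetical.

```python
import math


def fit_logistic(X, y, lr=0.1, n_iter=10000):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b . x))) by batch gradient
    ascent on the average log-likelihood. Returns [b0, b1, ..., bp].
    Standardizing the predictors first usually speeds convergence."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = yi - 1.0 / (1.0 + math.exp(-eta))  # y minus predicted probability
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical data: one predictor (e.g. a scaled population measure) and
# market presence (1) or absence (0) per dissemination area.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(X, y)
odds_ratio = math.exp(b1)  # multiplicative change in odds per unit of the predictor
```

An odds ratio above 1 means that areas with larger values of the predictor have higher odds of hosting a market; whether that effect is statistically significant requires the standard errors a full statistical package provides.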

Keywords: farmers’ markets, socioeconomic and demographic variables, metro influence zone, logistic regression, ArcGIS

3141 Orthogonal Regression for Nonparametric Estimation of Errors-In-Variables Models

Authors: Anastasiia Yu. Timofeeva

Abstract:

Two new algorithms for nonparametric estimation of errors-in-variables models are proposed. The first algorithm is based on a penalized regression spline: the spline is represented as a piecewise-linear function, and orthogonal regression is estimated for each linear portion. This algorithm is iterative. The second algorithm involves locally weighted regression estimation; when the independent variable is measured with error, such estimation is a complex nonlinear optimization problem. The simulation results showed the advantage of the second algorithm under the assumption that the true smoothing parameter values are known. Nevertheless, using goodness-of-fit indices to select the smoothing parameters gives similar results but has an oversmoothing effect.
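The building block of the first algorithm, orthogonal regression on a linear piece, has a closed form for a straight line. The sketch fits y = a + b x by minimizing squared perpendicular distances, which corresponds to assuming equal error variances in x and y; extending it to the piecewise-linear penalized spline described above is not shown.

```python
import math


def orthogonal_line(xs, ys):
    """Orthogonal (total least squares) fit of y = a + b * x.

    Minimizes the sum of squared perpendicular distances, treating x and y
    as equally error-prone (Deming regression with error-variance ratio 1).
    Returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b
```

Unlike ordinary least squares, this fit is symmetric in the two variables: regressing x on y gives a slope that is exactly the reciprocal of the y-on-x slope, which is why it suits data where the independent variable is also measured with error.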

Keywords: grade point average, orthogonal regression, penalized regression spline, locally weighted regression

3140 Agroforestry Systems and Practices and Its Adoption in Kilombero Cluster of Sagcot, Tanzania

Authors: Lazaro E. Nnko, Japhet J. Kashaigili, Gerald C. Monela, Pantaleo K. T. Munishi

Abstract:

Agroforestry systems and practices are perceived to improve livelihoods and the sustainable management of natural resources. However, their adoption differs across regions with biophysical conditions and societal characteristics. This study was conducted in Kilombero District to investigate the factors influencing the adoption of different agroforestry systems and practices in agro-ecosystems and farming systems. A household survey, key informant interviews, and focus group discussions were used for data collection in three villages. Descriptive statistics and multinomial logistic regression in SPSS were applied for analysis. Results show that home garden practices dominated in Igima and Ngajengwa villages (63.3% and 66.7%, respectively), while mixed intercropping dominated in Mbingu village (56.67%). Agrosilvopasture systems were dominant in Igima and Ngajengwa villages (56.7% and 66.7%, respectively), while in Mbingu village the dominant system was agrosilviculture (66.7%). The multinomial logistic regression shows that different explanatory variables were statistically significant predictors of the adoption of agroforestry systems and practices. Residence type and sex were the most dominant factors influencing the adoption of agroforestry systems, while duration of stay in the village, availability of extension education, residence, and sex were the dominant factors influencing the adoption of agroforestry practices. The most important and statistically significant of these factors were residence type and sex. The study concludes that agroforestry will be more successful if local priorities, including the socio-economic needs of the society, are considered in designing systems and practices, and that these needs should be addressed when expanding the adoption of agroforestry systems and practices.
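Multinomial logistic regression models several unordered outcomes (here, different agroforestry systems or practices) relative to a reference category. A minimal sketch of how fitted coefficients turn into category probabilities; the category names and coefficients are hypothetical, and exp(slope) is the quantity usually reported as a relative risk ratio.

```python
import math


def multinomial_probs(x, coefs, reference="reference"):
    """Category probabilities for a baseline-category multinomial logit.

    `coefs` maps each non-reference category to (intercept, [slopes]); the
    reference category's linear predictor is fixed at 0."""
    scores = {reference: 0.0}
    for category, (b0, slopes) in coefs.items():
        scores[category] = b0 + sum(b * v for b, v in zip(slopes, x))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def relative_risk_ratio(slope):
    """exp(slope): multiplicative change in P(category) / P(reference) per unit of x."""
    return math.exp(slope)


# Hypothetical coefficients for two systems relative to a reference system,
# with one binary predictor (e.g. a residence-type indicator).
coefs = {"system_a": (0.5, [0.2]), "system_b": (-0.3, [0.4])}
probs = multinomial_probs([1.0], coefs, reference="system_ref")
```

Each non-reference category gets its own set of coefficients, so a variable such as sex can matter for one system and not for another, which is how the study can report different significant predictors for different outcomes.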

Keywords: agroforestry adoption, agroforestry systems, agroforestry practices, agroforestry, Kilombero

3139 Cigarette Smoking and Alcohol Use among Mauritian Adolescents: Analysis of 2017 WHO Global School-Based Student Health Survey

Authors: Iyanujesu Adereti, Tajudeen Basiru, Ayodamola Olanipekun

Abstract:

Background: Substance abuse among adolescents is a global public health concern. Although alcohol and cigarettes are the substances most commonly abused by adolescents, few studies have examined the prevalence of alcohol use and cigarette smoking among adolescents in Mauritius. Objectives: To determine the prevalence of cigarette smoking and alcohol use, and their correlates, among school-going adolescents in Mauritius. Methodology: Data from the 2017 WHO Global School-based Student Health Survey (GSHS) of 3,012 school-going adolescents in Mauritius were analyzed using STATA. Descriptive statistics were used to obtain prevalence, and bivariate and multivariate logistic regression analyses were used to evaluate predictors of cigarette smoking and alcohol use. Results: The prevalence of alcohol consumption and cigarette smoking was 26.0% and 17.1%, respectively. Smoking and alcohol use were more prevalent among males, younger adolescents, and those in higher school grades (p < .001). In multivariable logistic regression, male gender was associated with a higher risk of cigarette smoking (adjusted odds ratio (aOR) [95% confidence interval (CI)] = 1.51 [1.06-2.14]) but a lower risk of alcohol use (aOR [95% CI] = 0.69 [0.53-0.90]), while older age (mid and late adolescence) and parental smoking were associated with an increased risk of alcohol use (aOR [95% CI] = 1.94 [1.34-2.99] and 1.36 [1.05-1.78], respectively). Marijuana use, truancy, being in a fight and suicidal ideation were associated with increased odds of alcohol use (aOR [95% CI] = 3.82 [3.39-6.09]; 2.15 [1.62-2.87]; 1.83 [1.34-2.49] and 1.93 [1.38-2.69], respectively) and of cigarette smoking (aOR [95% CI] = 17.28 [10.4-28.51]; 1.73 [1.21-2.49]; 1.67 [1.14-2.45] and 2.17 [1.43-3.28], respectively), while involvement in sexual activity was associated with a reduced risk of alcohol use (aOR [95% CI] = 0.50 [0.37-0.68]) and cigarette smoking (aOR [95% CI] = 0.47 [0.33-0.69]). 
Parental support and parental monitoring were uniquely associated with a lower risk of cigarette smoking (aOR [95% CI] = 0.69 [0.47-0.99] and 0.62 [0.43-0.91], respectively). Conclusion: The high prevalence of alcohol use and cigarette smoking in this study shows the need for the government of Mauritius to strengthen policies that address this issue, taking into account the various risk and protective factors.
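The adjusted odds ratios above are exponentiated logistic regression coefficients, and their 95% intervals are normally exp(beta ± 1.96 SE). The sketch converts in both directions, assuming symmetric Wald intervals on the log-odds scale (the abstract does not state how its intervals were computed).

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal


def or_with_ci(beta, se, z=Z95):
    """Odds ratio exp(beta) with Wald 95% CI exp(beta ± z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_or(odds_ratio, lower, upper, z=Z95):
    """Back out the log-odds coefficient and its SE from a reported OR and
    95% CI, assuming the interval is symmetric on the log scale."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported aOR for male gender and cigarette smoking: 1.51 [1.06-2.14].
beta_male, se_male = beta_se_from_or(1.51, 1.06, 2.14)
z_male = beta_male / se_male  # Wald z-statistic on the log-odds scale
```

For this example, beta is about 0.41 and SE about 0.18, giving a z-statistic of about 2.3. Under a symmetric log-scale interval, a reported odds ratio should sit near the geometric mean of its limits, which offers a quick consistency check when transcribing results.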

Keywords: adolescent health, alcohol use, cigarette smoking, global school-based student health survey
