Search results for: Tobit regression model
18476 Predicting Football Player Performance: Integrating Data Visualization and Machine Learning
Authors: Saahith M. S., Sivakami R.
Abstract:
In the realm of football analytics, particularly focusing on predicting football player performance, the ability to forecast player success accurately is of paramount importance for teams, managers, and fans. This study introduces an elaborate examination of predicting football player performance through the integration of data visualization methods and machine learning algorithms. The research entails the compilation of an extensive dataset comprising player attributes, conducting data preprocessing, feature selection, model selection, and model training to construct predictive models. The analysis within this study will involve delving into feature significance using methodologies like Select Best and Recursive Feature Elimination (RFE) to pinpoint pertinent attributes for predicting player performance. Various machine learning algorithms, including Random Forest, Decision Tree, Linear Regression, Support Vector Regression (SVR), and Artificial Neural Networks (ANN), will be explored to develop predictive models. The evaluation of each model's performance utilizing metrics such as Mean Squared Error (MSE) and R-squared will be executed to gauge their efficacy in predicting player performance. Furthermore, this investigation will encompass a top player analysis to recognize the top-performing players based on the anticipated overall performance scores. Nationality analysis will entail scrutinizing the player distribution based on nationality and investigating potential correlations between nationality and player performance. Positional analysis will concentrate on examining the player distribution across various positions and assessing the average performance of players in each position. Age analysis will evaluate the influence of age on player performance and identify any discernible trends or patterns associated with player age groups. The primary objective is to predict a football player's overall performance accurately based on their individual attributes, leveraging data-driven insights to enrich the comprehension of player success on the field. By amalgamating data visualization and machine learning methodologies, the aim is to furnish valuable tools for teams, managers, and fans to effectively analyze and forecast player performance. This research contributes to the progression of sports analytics by showcasing the potential of machine learning in predicting football player performance and offering actionable insights for diverse stakeholders in the football industry.Keywords: football analytics, player performance prediction, data visualization, machine learning algorithms, random forest, decision tree, linear regression, support vector regression, artificial neural networks, model evaluation, top player analysis, nationality analysis, positional analysis
Procedia PDF Downloads 3818475 Analytical Modelling of Surface Roughness during Compacted Graphite Iron Milling Using Ceramic Inserts
Authors: Ş. Karabulut, A. Güllü, A. Güldaş, R. Gürbüz
Abstract:
This study investigates the effects of the lead angle and chip thickness variation on surface roughness during the machining of compacted graphite iron using ceramic cutting tools under dry cutting conditions. Analytical models were developed for predicting the surface roughness values of the specimens after the face milling process. Experimental data was collected and imported to the artificial neural network model. A multilayer perceptron model was used with the back propagation algorithm employing the input parameters of lead angle, cutting speed and feed rate in connection with chip thickness. Furthermore, analysis of variance was employed to determine the effects of the cutting parameters on surface roughness. Artificial neural network and regression analysis were used to predict surface roughness. The values thus predicted were compared with the collected experimental data, and the corresponding percentage error was computed. Analysis results revealed that the lead angle is the dominant factor affecting surface roughness. Experimental results indicated an improvement in the surface roughness value with decreasing lead angle value from 88° to 45°.Keywords: CGI, milling, surface roughness, ANN, regression, modeling, analysis
Procedia PDF Downloads 44818474 Use of Front-Face Fluorescence Spectroscopy and Multiway Analysis for the Prediction of Olive Oil Quality Features
Authors: Omar Dib, Rita Yaacoub, Luc Eveleigh, Nathalie Locquet, Hussein Dib, Ali Bassal, Christophe B. Y. Cordella
Abstract:
The potential of front-face fluorescence coupled with chemometric techniques, namely parallel factor analysis (PARAFAC) and multiple linear regression (MLR) as a rapid analysis tool to characterize Lebanese virgin olive oils was investigated. Fluorescence fingerprints were acquired directly on 102 Lebanese virgin olive oil samples in the range of 280-540 nm in excitation and 280-700 nm in emission. A PARAFAC model with seven components was considered optimal with a residual of 99.64% and core consistency value of 78.65. The model revealed seven main fluorescence profiles in olive oil and was mainly associated with tocopherols, polyphenols, chlorophyllic compounds and oxidation/hydrolysis products. 23 MLR regression models based on PARAFAC scores were generated, the majority of which showed a good correlation coefficient (R > 0.7 for 12 predicted variables), thus satisfactory prediction performances. Acid values, peroxide values, and Delta K had the models with the highest predictions, with R values of 0.89, 0.84 and 0.81 respectively. Among fatty acids, linoleic and oleic acids were also highly predicted with R values of 0.8 and 0.76, respectively. Factors contributing to the model's construction were related to common fluorophores found in olive oil, mainly chlorophyll, polyphenols, and oxidation products. This study demonstrates the interest of front-face fluorescence as a promising tool for quality control of Lebanese virgin olive oils.Keywords: front-face fluorescence, Lebanese virgin olive oils, multiple Linear regressions, PARAFAC analysis
Procedia PDF Downloads 45218473 Automatic API Regression Analyzer and Executor
Authors: Praveena Sridhar, Nihar Devathi, Parikshit Chakraborty
Abstract:
As the software product changes versions across releases, there are changes to the API’s and features and the upgrades become necessary. Hence, it becomes imperative to get the impact of upgrading the dependent components. This tool finds out API changes across two versions and their impact on other API’s followed by execution of the automated regression suites relevant to updates and their impacted areas. This tool has 4 layer architecture, each layer with its own unique pre-assigned capability which it does and sends the required information to next layer. This are the 4 layers. 1) Comparator: Compares the two versions of API. 2) Analyzer: Analyses the API doc and gives the modified class and its dependencies along with implemented interface details. 3) Impact Filter: Find the impact of the modified class on the other API methods. 4) Auto Executer: Based on the output given by Impact Filter, Executor will run the API regression Suite. Tool reads the java doc and extracts the required information of classes, interfaces and enumerations. The extracted information is saved into a data structure which shows the class details and its dependencies along with interfaces and enumerations that are listed in the java doc.Keywords: automation impact regression, java doc, executor, analyzer, layers
Procedia PDF Downloads 48818472 Gender Estimation by Means of Quantitative Measurements of Foramen Magnum: An Analysis of CT Head Images
Authors: Thilini Hathurusinghe, Uthpalie Siriwardhana, W. M. Ediri Arachchi, Ranga Thudugala, Indeewari Herath, Gayani Senanayake
Abstract:
The foramen magnum is more prone to protect than other skeletal remains during high impact and severe disruptive injuries. Therefore, it is worthwhile to explore whether these measurements can be used to determine the human gender which is vital in forensic and anthropological studies. The idea was to find out the ability to use quantitative measurements of foramen magnum as an anatomical indicator for human gender estimation and to evaluate the gender-dependent variations of foramen magnum using quantitative measurements. Randomly selected 113 subjects who underwent CT head scans at Sri Jayawardhanapura General Hospital of Sri Lanka within a period of six months, were included in the study. The sample contained 58 males (48.76 ± 14.7 years old) and 55 females (47.04 ±15.9 years old). Maximum length of the foramen magnum (LFM), maximum width of the foramen magnum (WFM), minimum distance between occipital condyles (MnD) and maximum interior distance between occipital condyles (MxID) were measured. Further, AreaT and AreaR were also calculated. The gender was estimated using binomial logistic regression. The mean values of all explanatory variables (LFM, WFM, MnD, MxID, AreaT, and AreaR) were greater among male than female. All explanatory variables except MnD (p=0.669) were statistically significant (p < 0.05). Significant bivariate correlations were demonstrated by AreaT and AreaR with the explanatory variables. The results evidenced that WFM and MxID were the best measurements in predicting gender according to binomial logistic regression. The estimated model was: log (p/1-p) =10.391-0.136×MxID-0.231×WFM, where p is the probability of being a female. The classification accuracy given by the above model was 65.5%. The quantitative measurements of foramen magnum can be used as a reliable anatomical marker for human gender estimation in the Sri Lankan context.Keywords: foramen magnum, forensic and anthropological studies, gender estimation, logistic regression
Procedia PDF Downloads 15118471 A Predictive Machine Learning Model of the Survival of Female-led and Co-Led Small and Medium Enterprises in the UK
Authors: Mais Khader, Xingjie Wei
Abstract:
This research sheds light on female entrepreneurs by providing new insights on the survival predictions of companies led by females in the UK. This study aims to build a predictive machine learning model of the survival of female-led & co-led small & medium enterprises (SMEs) in the UK over the period 2000-2020. The predictive model built utilised a combination of financial and non-financial features related to both companies and their directors to predict SMEs' survival. These features were studied in terms of their contribution to the resultant predictive model. Five machine learning models are used in the modelling: Decision tree, AdaBoost, Naïve Bayes, Logistic regression and SVM. The AdaBoost model had the highest performance of the five models, with an accuracy of 73% and an AUC of 80%. The results show high feature importance in predicting companies' survival for company size, management experience, financial performance, industry, region, and females' percentage in management.Keywords: company survival, entrepreneurship, females, machine learning, SMEs
Procedia PDF Downloads 10118470 Developing and Evaluating Clinical Risk Prediction Models for Coronary Artery Bypass Graft Surgery
Authors: Mohammadreza Mohebbi, Masoumeh Sanagou
Abstract:
The ability to predict clinical outcomes is of great importance to physicians and clinicians. A number of different methods have been used in an effort to accurately predict these outcomes. These methods include the development of scoring systems based on multivariate statistical modelling, and models involving the use of classification and regression trees. The process usually consists of two consecutive phases, namely model development and external validation. The model development phase consists of building a multivariate model and evaluating its predictive performance by examining calibration and discrimination, and internal validation. External validation tests the predictive performance of a model by assessing its calibration and discrimination in different but plausibly related patients. A motivate example focuses on prediction modeling using a sample of patients undergone coronary artery bypass graft (CABG) has been used for illustrative purpose and a set of primary considerations for evaluating prediction model studies using specific quality indicators as criteria to help stakeholders evaluate the quality of a prediction model study has been proposed.Keywords: clinical prediction models, clinical decision rule, prognosis, external validation, model calibration, biostatistics
Procedia PDF Downloads 29718469 Using Machine Learning to Enhance Win Ratio for College Ice Hockey Teams
Authors: Sadixa Sanjel, Ahmed Sadek, Naseef Mansoor, Zelalem Denekew
Abstract:
Collegiate ice hockey (NCAA) sports analytics is different from the national level hockey (NHL). We apply and compare multiple machine learning models such as Linear Regression, Random Forest, and Neural Networks to predict the win ratio for a team based on their statistics. Data exploration helps determine which statistics are most useful in increasing the win ratio, which would be beneficial to coaches and team managers. We ran experiments to select the best model and chose Random Forest as the best performing. We conclude with how to bridge the gap between the college and national levels of sports analytics and the use of machine learning to enhance team performance despite not having a lot of metrics or budget for automatic tracking.Keywords: NCAA, NHL, sports analytics, random forest, regression, neural networks, game predictions
Procedia PDF Downloads 11418468 Electroencephalogram Based Approach for Mental Stress Detection during Gameplay with Level Prediction
Authors: Priyadarsini Samal, Rajesh Singla
Abstract:
Many mobile games come with the benefits of entertainment by introducing stress to the human brain. In recognizing this mental stress, the brain-computer interface (BCI) plays an important role. It has various neuroimaging approaches which help in analyzing the brain signals. Electroencephalogram (EEG) is the most commonly used method among them as it is non-invasive, portable, and economical. Here, this paper investigates the pattern in brain signals when introduced with mental stress. Two healthy volunteers played a game whose aim was to search hidden words from the grid, and the levels were chosen randomly. The EEG signals during gameplay were recorded to investigate the impacts of stress with the changing levels from easy to medium to hard. A total of 16 features of EEG were analyzed for this experiment which includes power band features with relative powers, event-related desynchronization, along statistical features. Support vector machine was used as the classifier, which resulted in an accuracy of 93.9% for three-level stress analysis; for two levels, the accuracy of 92% and 98% are achieved. In addition to that, another game that was similar in nature was played by the volunteers. A suitable regression model was designed for prediction where the feature sets of the first and second game were used for testing and training purposes, respectively, and an accuracy of 73% was found.Keywords: brain computer interface, electroencephalogram, regression model, stress, word search
Procedia PDF Downloads 18718467 Analytical Authentication of Butter Using Fourier Transform Infrared Spectroscopy Coupled with Chemometrics
Authors: M. Bodner, M. Scampicchio
Abstract:
Fourier Transform Infrared (FT-IR) spectroscopy coupled with chemometrics was used to distinguish between butter samples and non-butter samples. Further, quantification of the content of margarine in adulterated butter samples was investigated. Fingerprinting region (1400-800 cm–1) was used to develop unsupervised pattern recognition (Principal Component Analysis, PCA), supervised modeling (Soft Independent Modelling by Class Analogy, SIMCA), classification (Partial Least Squares Discriminant Analysis, PLS-DA) and regression (Partial Least Squares Regression, PLS-R) models. PCA of the fingerprinting region shows a clustering of the two sample types. All samples were classified in their rightful class by SIMCA approach; however, nine adulterated samples (between 1% and 30% w/w of margarine) were classified as belonging both at the butter class and at the non-butter one. In the two-class PLS-DA model’s (R2 = 0.73, RMSEP, Root Mean Square Error of Prediction = 0.26% w/w) sensitivity was 71.4% and Positive Predictive Value (PPV) 100%. Its threshold was calculated at 7% w/w of margarine in adulterated butter samples. Finally, PLS-R model (R2 = 0.84, RMSEP = 16.54%) was developed. PLS-DA was a suitable classification tool and PLS-R a proper quantification approach. Results demonstrate that FT-IR spectroscopy combined with PLS-R can be used as a rapid, simple and safe method to identify pure butter samples from adulterated ones and to determine the grade of adulteration of margarine in butter samples.Keywords: adulterated butter, margarine, PCA, PLS-DA, PLS-R, SIMCA
Procedia PDF Downloads 14318466 The Labor Participation–Fertility Trade-off: The Case of the Philippines
Authors: Daphne Ashley Sze, Kenneth Santos, Ariane Gabrielle Lim
Abstract:
As women are now given more freedom and choice to pursue employment, the world’s over-all fertility has been decreasing mainly due to the shift in time allocation between working and child rearing. As such, we study the case of the Philippines, where there exists a decreasing fertility rate and increasing openness for women labor participation. We focused on the distinction between fertility and fecundity, the former being the manifestation of the latter and aim to trace and compare the effects of both fecundity and fertility to women’s employment status through the estimation of the reproduction function and multinomial logistic function. Findings suggest that the perception of women regarding employment opportunities in the Philippines links the negative relationship observed between fertility, fecundity and women’s employment status. Today, there has been a convergence in the traditional family roles of men and women, as both genders now have identical employment opportunities that continue to shape their preferences.Keywords: multinomial logistic function, tobit, fertility, women employment status, fecundity
Procedia PDF Downloads 60618465 Bayesian Variable Selection in Quantile Regression with Application to the Health and Retirement Study
Authors: Priya Kedia, Kiranmoy Das
Abstract:
There is a rich literature on variable selection in regression setting. However, most of these methods assume normality for the response variable under consideration for implementing the methodology and establishing the statistical properties of the estimates. In many real applications, the distribution for the response variable may be non-Gaussian, and one might be interested in finding the best subset of covariates at some predetermined quantile level. We develop dynamic Bayesian approach for variable selection in quantile regression framework. We use a zero-inflated mixture prior for the regression coefficients, and consider the asymmetric Laplace distribution for the response variable for modeling different quantiles of its distribution. An efficient Gibbs sampler is developed for our computation. Our proposed approach is assessed through extensive simulation studies, and real application of the proposed approach is also illustrated. We consider the data from health and retirement study conducted by the University of Michigan, and select the important predictors when the outcome of interest is out-of-pocket medical cost, which is considered as an important measure for financial risk. Our analysis finds important predictors at different quantiles of the outcome, and thus enhance our understanding on the effects of different predictors on the out-of-pocket medical cost.Keywords: variable selection, quantile regression, Gibbs sampler, asymmetric Laplace distribution
Procedia PDF Downloads 15618464 The Labor Participation-Fertility Trade-Off: Exploring Fecundity and Its Consequences to Women's Employment in the Philippines
Authors: Ariane C. Lim, Daphne Ashley L. Sze, Kenneth S. Santos
Abstract:
As women are now given more freedom and choice to pursue employment, the world’s over-all fertility has been decreasing mainly due to the shift in time allocation between working and child-rearing. As such, we study the case of the Philippines, where there exists a decreasing fertility rate and increasing openness for women labor participation. We focused on the distinction between fertility and fecundity, the former being the manifestation of the latter and aim to trace and compare the effects of both fecundity and fertility to women’s employment status through the estimation of the reproduction function and multinomial logistic function. Findings suggest that the perception of women regarding employment opportunities in the Philippines links the negative relationship observed between fertility, fecundity and women’s employment status. Today, there has been a convergence in the traditional family roles of men and women, as both genders now have identical employment opportunities that continue to shape their preferences.Keywords: multinomial logistic function, tobit, fertility, women employment status, fecundity
Procedia PDF Downloads 62918463 Partial Least Square Regression for High-Dimentional and High-Correlated Data
Authors: Mohammed Abdullah Alshahrani
Abstract:
The research focuses on investigating the use of partial least squares (PLS) methodology for addressing challenges associated with high-dimensional correlated data. Recent technological advancements have led to experiments producing data characterized by a large number of variables compared to observations, with substantial inter-variable correlations. Such data patterns are common in chemometrics, where near-infrared (NIR) spectrometer calibrations record chemical absorbance levels across hundreds of wavelengths, and in genomics, where thousands of genomic regions' copy number alterations (CNA) are recorded from cancer patients. PLS serves as a widely used method for analyzing high-dimensional data, functioning as a regression tool in chemometrics and a classification method in genomics. It handles data complexity by creating latent variables (components) from original variables. However, applying PLS can present challenges. The study investigates key areas to address these challenges, including unifying interpretations across three main PLS algorithms and exploring unusual negative shrinkage factors encountered during model fitting. The research presents an alternative approach to addressing the interpretation challenge of predictor weights associated with PLS. Sparse estimation of predictor weights is employed using a penalty function combining a lasso penalty for sparsity and a Cauchy distribution-based penalty to account for variable dependencies. The results demonstrate sparse and grouped weight estimates, aiding interpretation and prediction tasks in genomic data analysis. High-dimensional data scenarios, where predictors outnumber observations, are common in regression analysis applications. Ordinary least squares regression (OLS), the standard method, performs inadequately with high-dimensional and highly correlated data. Copy number alterations (CNA) in key genes have been linked to disease phenotypes, highlighting the importance of accurate classification of gene expression data in bioinformatics and biology using regularized methods like PLS for regression and classification.Keywords: partial least square regression, genetics data, negative filter factors, high dimensional data, high correlated data
Procedia PDF Downloads 4918462 Currency Exchange Rate Forecasts Using Quantile Regression
Authors: Yuzhi Cai
Abstract:
In this paper, we discuss a Bayesian approach to quantile autoregressive (QAR) time series model estimation and forecasting. Together with a combining forecasts technique, we then predict USD to GBP currency exchange rates. Combined forecasts contain all the information captured by the fitted QAR models at different quantile levels and are therefore better than those obtained from individual models. Our results show that an unequally weighted combining method performs better than other forecasting methodology. We found that a median AR model can perform well in point forecasting when the predictive density functions are symmetric. However, in practice, using the median AR model alone may involve the loss of information about the data captured by other QAR models. We recommend that combined forecasts should be used whenever possible.Keywords: combining forecasts, MCMC, predictive density functions, quantile forecasting, quantile modelling
Procedia PDF Downloads 25618461 The Impact of Insider Trading on Open Market Share Repurchase: A Study in Indian Context
Authors: Sarthak Kumar Jena, Chandra Sekhar Mishra, Prabina Rajib
Abstract:
Purpose: This paper aims to derive undervaluation signal from the insiders trading of Indian companies where the ownership is complex and concentrated, investors protection is weak, and the insider rules and regulations are not stringent like developed country. This study examines the relationship between insider trading with short term and long term abnormal return. The study also examines the relationship between insider trading and the actual share repurchase by the firm. Methodology: A sample of 78 companies over the period 2008-2013 are analyzed in the study due to not availability of insider data in Indian context. For preliminary analysis T-test and Wilcoxon rank sum test is used to find the difference between the insider trading before and after the share repurchase announcement. Tobit model is used to find out whether insider trading influence shares repurchase decisions or not. Return on the basis of market model and buy hold are calculated in the previous year and the following year of share repurchase announcement. Findings: The paper finds that insider trading around share repurchase is more than control firms and there is positive and significant difference in insider buying between the previous year of share buyback announcement and the following year of buyback announcement. Insider buying before share repurchase announcement has a positive influence on share repurchase decisions. We find insider buying has a positive and significant relationship with announcement return, whereas insider selling has a negative significant relationship with announcement return. Actual share repurchase and program completion also depend on insider trading before share repurchase. Research limitation: The study is constrained by the small sample size, so the results should be viewed by keeping this limitation in mind. Originality: The paper is to our best knowledge the first study based on Indian context to extend the insider trading literature to share repurchase event and examine insider trading to find out undervaluation signal associated with insider buying.Keywords: insider trading, buyback, open market share repurchase, signalling
Procedia PDF Downloads 19918460 Exploring Factors Affecting Electricity Production in Malaysia
Authors: Endang Jati Mat Sahid, Hussain Ali Bekhet
Abstract:
Ability to supply reliable and secure electricity has been one of the crucial components of economic development for any country. Forecasting of electricity production is therefore very important for accurate investment planning of generation power plants. In this study, we aim to examine and analyze the factors that affect electricity generation. Multiple regression models were used to find the relationship between various variables and electricity production. The models will simultaneously determine the effects of the variables on electricity generation. Many variables influencing electricity generation, i.e. natural gas (NG), coal (CO), fuel oil (FO), renewable energy (RE), gross domestic product (GDP) and fuel prices (FP), were examined for Malaysia. The results demonstrate that NG, CO, and FO were the main factors influencing electricity generation growth. This study then identified a number of policy implications resulting from the empirical results.Keywords: energy policy, energy security, electricity production, Malaysia, the regression model
Procedia PDF Downloads 16318459 Machine Learning Approach for Stress Detection Using Wireless Physical Activity Tracker
Authors: B. Padmaja, V. V. Rama Prasad, K. V. N. Sunitha, E. Krishna Rao Patro
Abstract:
Stress is a psychological condition that reduces the quality of sleep and affects every facet of life. Constant exposure to stress is detrimental not only for mind but also body. Nevertheless, to cope with stress, one should first identify it. This paper provides an effective method for the cognitive stress level detection by using data provided from a physical activity tracker device Fitbit. This device gathers people’s daily activities of food, weight, sleep, heart rate, and physical activities. In this paper, four major stressors like physical activities, sleep patterns, working hours and change in heart rate are used to assess the stress levels of individuals. The main motive of this system is to use machine learning approach in stress detection with the help of Smartphone sensor technology. Individually, the effect of each stressor is evaluated using logistic regression and then combined model is built and assessed using variants of ordinal logistic regression models like logit, probit and complementary log-log. Then the quality of each model is evaluated using Akaike Information Criterion (AIC) and probit is assessed as the more suitable model for our dataset. This system is experimented and evaluated in a real time environment by taking data from adults working in IT and other sectors in India. The novelty of this work lies in the fact that stress detection system should be less invasive as possible for the users.Keywords: physical activity tracker, sleep pattern, working hours, heart rate, smartphone sensor
Procedia PDF Downloads 25618458 Stock Prediction and Portfolio Optimization Thesis
Authors: Deniz Peksen
Abstract:
This thesis aims to predict trend movement of closing price of stock and to maximize portfolio by utilizing the predictions. In this context, the study aims to define a stock portfolio strategy from models created by using Logistic Regression, Gradient Boosting and Random Forest. Recently, predicting the trend of stock price has gained a significance role in making buy and sell decisions and generating returns with investment strategies formed by machine learning basis decisions. There are plenty of studies in the literature on the prediction of stock prices in capital markets using machine learning methods but most of them focus on closing prices instead of the direction of price trend. Our study differs from literature in terms of target definition. Ours is a classification problem which is focusing on the market trend in next 20 trading days. To predict trend direction, fourteen years of data were used for training. Following three years were used for validation. Finally, last three years were used for testing. Training data are between 2002-06-18 and 2016-12-30 Validation data are between 2017-01-02 and 2019-12-31 Testing data are between 2020-01-02 and 2022-03-17 We determine Hold Stock Portfolio, Best Stock Portfolio and USD-TRY Exchange rate as benchmarks which we should outperform. We compared our machine learning basis portfolio return on test data with return of Hold Stock Portfolio, Best Stock Portfolio and USD-TRY Exchange rate. We assessed our model performance with the help of roc-auc score and lift charts. We use logistic regression, Gradient Boosting and Random Forest with grid search approach to fine-tune hyper-parameters. As a result of the empirical study, the existence of uptrend and downtrend of five stocks could not be predicted by the models. When we use these predictions to define buy and sell decisions in order to generate model-based-portfolio, model-based-portfolio fails in test dataset. It was found that Model-based buy and sell decisions generated a stock portfolio strategy whose returns can not outperform non-model portfolio strategies on test dataset. We found that any effort for predicting the trend which is formulated on stock price is a challenge. We found same results as Random Walk Theory claims which says that stock price or price changes are unpredictable. Our model iterations failed on test dataset. Although, we built up several good models on validation dataset, we failed on test dataset. We implemented Random Forest, Gradient Boosting and Logistic Regression. We discovered that complex models did not provide advantage or additional performance while comparing them with Logistic Regression. More complexity did not lead us to reach better performance. Using a complex model is not an answer to figure out the stock-related prediction problem. Our approach was to predict the trend instead of the price. This approach converted our problem into classification. However, this label approach does not lead us to solve the stock prediction problem and deny or refute the accuracy of the Random Walk Theory for the stock price.Keywords: stock prediction, portfolio optimization, data science, machine learning
Procedia PDF Downloads 8018457 Impact Factor Analysis for Spatially Varying Aerosol Optical Depth in Wuhan Agglomeration
Authors: Wenting Zhang, Shishi Liu, Peihong Fu
Abstract:
As an indicator of air quality and directly related to concentration of ground PM2.5, the spatial-temporal variation and impact factor analysis of Aerosol Optical Depth (AOD) have been a hot spot in air pollution. This paper concerns the non-stationarity and the autocorrelation (with Moran’s I index of 0.75) of the AOD in Wuhan agglomeration (WHA), in central China, uses the geographically weighted regression (GRW) to identify the spatial relationship of AOD and its impact factors. The 3 km AOD product of Moderate Resolution Imaging Spectrometer (MODIS) is used in this study. Beyond the economic-social factor, land use density factors, vegetable cover, and elevation, the landscape metric is also considered as one factor. The results suggest that the GWR model is capable of dealing with spatial varying relationship, with R square, corrected Akaike Information Criterion (AICc) and standard residual better than that of ordinary least square (OLS) model. The results of GWR suggest that the urban developing, forest, landscape metric, and elevation are the major driving factors of AOD. Generally, the higher AOD trends to located in the place with higher urban developing, less forest, and flat area.Keywords: aerosol optical depth, geographically weighted regression, land use change, Wuhan agglomeration
Procedia PDF Downloads 35718456 Development of a Data-Driven Method for Diagnosing the State of Health of Battery Cells, Based on the Use of an Electrochemical Aging Model, with a View to Their Use in Second Life
Authors: Desplanches Maxime
Abstract:
Accurate estimation of the remaining useful life of lithium-ion batteries for electronic devices is crucial. Data-driven methodologies encounter challenges related to data volume and acquisition protocols, particularly in capturing a comprehensive range of aging indicators. To address these limitations, we propose a hybrid approach that integrates an electrochemical model with state-of-the-art data analysis techniques, yielding a comprehensive database. Our methodology involves infusing an aging phenomenon into a Newman model, leading to the creation of an extensive database capturing various aging states based on non-destructive parameters. This database serves as a robust foundation for subsequent analysis. Leveraging advanced data analysis techniques, notably principal component analysis and t-Distributed Stochastic Neighbor Embedding, we extract pivotal information from the data. This information is harnessed to construct a regression function using either random forest or support vector machine algorithms. The resulting predictor demonstrates a 5% error margin in estimating remaining battery life, providing actionable insights for optimizing usage. Furthermore, the database was built from the Newman model calibrated for aging and performance using data from a European project called Teesmat. The model was then initialized numerous times with different aging values, for instance, with varying thicknesses of SEI (Solid Electrolyte Interphase). This comprehensive approach ensures a thorough exploration of battery aging dynamics, enhancing the accuracy and reliability of our predictive model. Of particular importance is our reliance on the database generated through the integration of the electrochemical model. This database serves as a crucial asset in advancing our understanding of aging states. Beyond its capability for precise remaining life predictions, this database-driven approach offers valuable insights for optimizing battery usage and adapting the predictor to various scenarios. This underscores the practical significance of our method in facilitating better decision-making regarding lithium-ion battery management.Keywords: Li-ion battery, aging, diagnostics, data analysis, prediction, machine learning, electrochemical model, regression
Procedia PDF Downloads 6918455 An Application of the Single Equation Regression Model
Authors: S. K. Ashiquer Rahman
Abstract:
Recently, oil has become more influential in almost every economic sector as a key material. As can be seen from the news, when there are some changes in an oil price or OPEC announces a new strategy, its effect spreads to every part of the economy directly and indirectly. That’s a reason why people always observe the oil price and try to forecast the changes of it. The most important factor affecting the price is its supply which is determined by the number of wildcats drilled. Therefore, a study about the number of wellheads and other economic variables may give us some understanding of the mechanism indicated by the amount of oil supplies. In this paper, we will consider a relationship between the number of wellheads and three key factors: the price of the wellhead, domestic output, and GNP constant dollars. We also add trend variables in the models because the consumption of oil varies from time to time. Moreover, this paper will use an econometrics method to estimate parameters in the model, apply some tests to verify the result we acquire, and then conclude the model.Keywords: price, domestic output, GNP, trend variable, wildcat activity
Procedia PDF Downloads 6218454 Use of Protection Motivation Theory to Assess Preventive Behaviors of COVID-19
Authors: Maryam Khazaee-Pool, Tahereh Pashaei, Koen Ponnet
Abstract:
Background: The global prevalence and morbidity of Coronavirus disease 2019 (COVID-19) are high. Preventive behaviors are proven to reduce the damage caused by the disease. There is a paucity of information on determinants of preventive behaviors in response to COVID-19 in Mazandaran province, north of Iran. So, we aimed to evaluate the protection motivation theory (PMT) in promoting preventive behaviors of COVID-19 in Mazandaran province. Materials and Methods: In this descriptive cross-sectional study, 1220 individuals participated. They were selected via social networks using convenience sampling in 2020. Data were collected online using a demographic questionnaire and a valid and reliable scale based on PMT. Data analysis was done using the Pearson correlation coefficient and linear regression in SPSS V24. Result: The mean age of the participants was 39.34±8.74 years. The regression model showed perceived threat (ß =0.033, P =0.007), perceived costs (ß=0.039, P=0.045), perceived self-efficacy (ß =0.116, P>0.001), and perceived fear (ß=0.131, P>0.001) as the significant predictors of COVID-19 preventive behaviors. This model accounted for 78% of the variance in these behaviors. Conclusion: According to constructs of the PMT associated with protection against COVID-19, educational programs and health promotion based on the theory and benefiting from social networks could be helpful in increasing the motivation of people towards protective behaviors against COVID-19.Keywords: questionnaire development, validation, intention, prevention, covid-19
Procedia PDF Downloads 4218453 Modeling Karachi Dengue Outbreak and Exploration of Climate Structure
Authors: Syed Afrozuddin Ahmed, Junaid Saghir Siddiqi, Sabah Quaiser
Abstract:
Various studies have reported that global warming causes unstable climate and many serious impact to physical environment and public health. The increasing incidence of dengue incidence is now a priority health issue and become a health burden of Pakistan. In this study it has been investigated that spatial pattern of environment causes the emergence or increasing rate of dengue fever incidence that effects the population and its health. The climatic or environmental structure data and the Dengue Fever (DF) data was processed by coding, editing, tabulating, recoding, restructuring in terms of re-tabulating was carried out, and finally applying different statistical methods, techniques, and procedures for the evaluation. Five climatic variables which we have studied are precipitation (P), Maximum temperature (Mx), Minimum temperature (Mn), Humidity (H) and Wind speed (W) collected from 1980-2012. The dengue cases in Karachi from 2010 to 2012 are reported on weekly basis. Principal component analysis is applied to explore the climatic variables and/or the climatic (structure) which may influence in the increase or decrease in the number of dengue fever cases in Karachi. PC1 for all the period is General atmospheric condition. PC2 for dengue period is contrast between precipitation and wind speed. PC3 is the weighted difference between maximum temperature and wind speed. PC4 for dengue period contrast between maximum and wind speed. Negative binomial and Poisson regression model are used to correlate the dengue fever incidence to climatic variable and principal component score. Relative humidity is estimated to positively influence on the chances of dengue occurrence by 1.71% times. Maximum temperature positively influence on the chances dengue occurrence by 19.48% times. Minimum temperature affects positively on the chances of dengue occurrence by 11.51% times. Wind speed is effecting negatively on the weekly occurrence of dengue fever by 7.41% times.Keywords: principal component analysis, dengue fever, negative binomial regression model, poisson regression model
Procedia PDF Downloads 44518452 On Estimating the Headcount Index by Using the Logistic Regression Estimator
Authors: Encarnación Álvarez, Rosa M. García-Fernández, Juan F. Muñoz, Francisco J. Blanco-Encomienda
Abstract:
The problem of estimating a proportion has important applications in the field of economics, and in general, in many areas such as social sciences. A common application in economics is the estimation of the headcount index. In this paper, we define the general headcount index as a proportion. Furthermore, we introduce a new quantitative method for estimating the headcount index. In particular, we suggest to use the logistic regression estimator for the problem of estimating the headcount index. Assuming a real data set, results derived from Monte Carlo simulation studies indicate that the logistic regression estimator can be more accurate than the traditional estimator of the headcount index.Keywords: poverty line, poor, risk of poverty, Monte Carlo simulations, sample
Procedia PDF Downloads 42318451 On Reliability of a Credit Default Swap Contract during the EMU Debt Crisis
Authors: Petra Buzkova, Milos Kopa
Abstract:
Reliability of the credit default swap market had been questioned repeatedly during the EMU debt crisis. The article examines whether this development influenced sovereign EMU CDS prices in general. We regress the CDS market price on a model risk neutral CDS price obtained from an adopted reduced form valuation model in the 2009-2013 period. We look for a break point in the single-equation and multi-equation econometric models in order to show the changes in relations between CDS market and model prices. Our results differ according to the risk profile of a country. We find that in the case of riskier countries, the relationship between the market and model price changed when market participants started to question the ability of CDS contracts to protect their buyers. Specifically, it weakened after the change. In the case of less risky countries, the change happened earlier and the effect of a weakened relationship is not observed.Keywords: chow stability test, credit default swap, debt crisis, reduced form valuation model, seemingly unrelated regression
Procedia PDF Downloads 26218450 Development and Validation of a Coronary Heart Disease Risk Score in Indian Type 2 Diabetes Mellitus Patients
Authors: Faiz N. K. Yusufi, Aquil Ahmed, Jamal Ahmad
Abstract:
Diabetes in India is growing at an alarming rate and the complications caused by it need to be controlled. Coronary heart disease (CHD) is one of the complications that will be discussed for prediction in this study. India has the second most number of diabetes patients in the world. To the best of our knowledge, there is no CHD risk score for Indian type 2 diabetes patients. Any form of CHD has been taken as the event of interest. A sample of 750 was determined and randomly collected from the Rajiv Gandhi Centre for Diabetes and Endocrinology, J.N.M.C., A.M.U., Aligarh, India. Collected variables include patients data such as sex, age, height, weight, body mass index (BMI), blood sugar fasting (BSF), post prandial sugar (PP), glycosylated haemoglobin (HbA1c), diastolic blood pressure (DBP), systolic blood pressure (SBP), smoking, alcohol habits, total cholesterol (TC), triglycerides (TG), high density lipoprotein (HDL), low density lipoprotein (LDL), very low density lipoprotein (VLDL), physical activity, duration of diabetes, diet control, history of antihypertensive drug treatment, family history of diabetes, waist circumference, hip circumference, medications, central obesity and history of CHD. Predictive risk scores of CHD events are designed by cox proportional hazard regression. Model calibration and discrimination is assessed from Hosmer Lemeshow and area under receiver operating characteristic (ROC) curve. Overfitting and underfitting of the model is checked by applying regularization techniques and best method is selected between ridge, lasso and elastic net regression. Youden’s index is used to choose the optimal cut off point from the scores. Five year probability of CHD is predicted by both survival function and Markov chain two state model and the better technique is concluded. The risk scores for CHD developed can be calculated by doctors and patients for self-control of diabetes. Furthermore, the five-year probabilities can be implemented as well to forecast and maintain the condition of patients.Keywords: coronary heart disease, cox proportional hazard regression, ROC curve, type 2 diabetes Mellitus
Procedia PDF Downloads 21918449 Challenges in Achieving Profitability for MRO Companies in the Aviation Industry: An Analytical Approach
Authors: Nur Sahver Uslu, Ali̇ Hakan Büyüklü
Abstract:
Maintenance, Repair, and Overhaul (MRO) costs are significant in the aviation industry. On the other hand, companies that provide MRO services to the aviation industry but are not dominant in the sector, need to determine the right strategies for sustainable profitability in a competitive environment. This study examined the operational real data of a small medium enterprise (SME) MRO company where analytical methods are not widely applied. The company's customers were divided into two categories: airline companies and non-airline companies, and the variables that best explained profitability were analyzed with Logistic Regression for each category and the results were compared. First, data reduction was applied to the transformed variables that went through the data cleaning and preparation stages, and the variables to be included in the model were decided. The misclassification rates for the logistic regression results concerning both customer categories are similar, indicating consistent model performance across different segments. Less profit margin is obtained from airline customers, which can be explained by the variables part description, time to quotation (TTQ), turnaround time (TAT), manager, part cost, and labour cost. The higher profit margin obtained from non-airline customers is explained only by the variables part description, part cost, and labour cost. Based on the two models, it can be stated that it is significantly more challenging for the MRO company, which is the subject of our study, to achieve profitability from Airline customers. While operational processes and organizational structure also affect the profit from airline customers, only the type of parts and costs determine the profit for non-airlines.Keywords: aircraft, aircraft components, aviation, data analytics, data science, gini index, maintenance, repair, and overhaul, MRO, logistic regression, profit, variable clustering, variable reduction
Procedia PDF Downloads 3318448 Smallholder’s Agricultural Water Management Technology Adoption, Adoption Intensity and Their Determinants: The Case of Meda Welabu Woreda, Oromia, Ethiopia
Authors: Naod Mekonnen Anega
Abstract:
The very objective of this paper was to empirically identify technology tailored determinants to the adoption and adoption intensity (extent of use) of agricultural water management technologies in Meda Welabu Woreda, Oromia regional state, Ethiopia. Meda Welabu Woreda which is one of the administrative Woredas of the Oromia regional state was selected purposively as this Woreda is one of the Woredas in the region where small scale irrigation practices and the use of agricultural water management technologies can be found among smallholders. Using the existence water management practices (use of water management technologies) and land use pattern as a criterion Genale Mekchira Kebele is selected to undergo the study. A total of 200 smallholders were selected from the Kebele using the technique developed by Krejeie and Morgan. The study employed the Logit and Tobit models to estimate and identify the economic, social, geographical, household, institutional, psychological, technological factors that determine adoption and adoption intensity of water management technologies. The study revealed that while 55 of the sampled households are adopters of agricultural water management technology the rest 140 were non adopters of the technologies. Among the adopters included in the sample 97% are using river diversion technology (traditional) with traditional canal while the rest 7% percent are using pond with treadle pump technology. The Logit estimation reveled that while adoption of river diversion is positively and significantly affected by membership to local institutions, active labor force, income, access to credit and land ownership, adoption of treadle pump technology is positively and significantly affected by family size, education level, access to credit, extension contact, income, access to market, and slope. The Logit estimation also revealed that whereas, group action requirement, distance to farm, and size of active labor force negative and significantly influenced adoption of river diversion, age and perception has negatively and significantly influenced adoption decision of treadle pump technology. On the other hand, the Tobit estimation reveled that while adoption intensity (extent of use) of agricultural water management is positively and significantly affected by education, credit, and extension contact, access to credit, access to market and income. This study revealed that technology tailored study on adoption of Agricultural water management technologies (AWMTs) should be considered to indentify and scale up best agricultural water management practices. In fact, in countries like Ethiopia, where there is difference in social, economic, cultural, environmental and agro ecological conditions even within the same Kebele technology tailored study that fit the condition of each Kebele would help to identify and scale up best practices in agricultural water management.Keywords: water management technology, adoption, adoption intensity, smallholders, technology tailored approach
Procedia PDF Downloads 45418447 Removal of Phenol from Aqueous Solution Using Watermelon (Citrullus C. lanatus) Rind
Authors: Fidelis Chigondo
Abstract:
This study focuses on investigating the effectiveness of watermelon rind in phenol removal from aqueous solution. The effects of various parameters (pH, initial phenol concentration, biosorbent dosage and contact time) on phenol adsorption were investigated. The pH of 2, initial phenol concentration of 40 ppm, the biosorbent dosage of 0.6 g and contact time of 6 h also deduced to be the optimum conditions for the adsorption process. The maximum phenol removal under optimized conditions was 85%. The sorption data fitted to the Freundlich isotherm with a regression coefficient of 0.9824. The kinetics was best described by the intraparticle diffusion model and Elovich Equation with regression coefficients of 1 and 0.8461 respectively showing that the reaction is chemisorption on a heterogeneous surface and the intraparticle diffusion rate only is the rate determining step. The study revealed that watermelon rind has a potential of removing phenol from industrial wastewaters.Keywords: biosorption, phenol, biosorbent, watermelon rind
Procedia PDF Downloads 247