Search results for: polynomial regression model
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18327

18117 A Machine Learning Model for Predicting Students’ Academic Performance in Higher Institutions

Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu

Abstract:

In recent years there has been a need to predict student academic achievement prior to graduation, in order to help students improve their grades, especially those who have struggled in the past. The purpose of this research is to use supervised learning techniques to create a model that predicts student academic progress. Many scholars have developed models that predict student academic achievement based on characteristics including smoking, demography, culture, social media, parent educational background, parent finances, and family background, to mention a few. These features, as well as the models used, could have misclassified students in terms of their academic achievement. As a prerequisite to predicting whether a student will perform well in the future on related courses, this model is built using a logistic regression classifier with basic features such as the previous semester's course score, class attendance, class participation, and the total number of course materials or resources the student is able to cover per semester. With 96.7 percent accuracy, the model outperformed other classifiers such as Naive Bayes, support vector machine (SVM), decision tree, random forest, and AdaBoost. This model is offered as a desktop application with user-friendly interfaces for forecasting student academic progress for both teachers and students. As a result, both students and professors are encouraged to use this technique to predict outcomes better.

Keywords: artificial intelligence, ML, logistic regression, performance, prediction

Procedia PDF Downloads 86
18116 Minimizing the Impact of Covariate Detection Limit in Logistic Regression

Authors: Shahadut Hossain, Jacek Wesolowski, Zahirul Hoque

Abstract:

In many epidemiological and environmental studies, covariate measurements are subject to a detection limit. In most applications, covariate measurements are truncated from below, which is known as left-truncation, because the measuring device used to measure the covariate fails to detect values falling below a certain threshold. In regression analyses, this leads to inflated bias and inaccurate mean squared error (MSE) of the estimators. This paper suggests a response-based regression calibration method to correct the deleterious impact introduced by the covariate detection limit in the estimators of the parameters of a simple logistic regression model. Compared to the maximum likelihood method, the proposed method is computationally simpler, and hence easier to implement. It is robust to violation of the distributional assumption about the covariate of interest. The performance of the proposed method in producing correct inference, compared to other competing methods, has been investigated through extensive simulations. A real-life application of the method is also shown using data from a population-based case-control study of non-Hodgkin lymphoma.

Keywords: environmental exposure, detection limit, left truncation, bias, ad-hoc substitution

Procedia PDF Downloads 212
18115 2 Stage CMOS Regulated Cascode Distributed Amplifier Design Based On Inductive Coupling Technique in Submicron CMOS Process

Authors: Kittipong Tripetch, Nobuhiko Nakano

Abstract:

This paper proposes one-stage and two-stage CMOS Complementary Regulated Cascode Distributed Amplifier (CRCDA) designs based on inductive and transformer coupling techniques. Usually, a distributed amplifier is based on inductor coupling between the gates of the MOSFETs and between the drains of the MOSFETs. This paper proposes a new idea: coupling the differential primary windings of a transformer between the gates of the first-stage and second-stage MOSFETs of the regulated cascode amplifier, and coupling the differential secondary windings of the transformer between the drains of the first-stage and second-stage MOSFETs of the regulated cascode amplifier. This paper also proposes polynomial modeling of the silicon transformer passive equivalent circuit from Nanyang Technological University, which is used to extract the frequency response of the transformer. Cadence simulation results are used to verify the validity of the transformer polynomial modeling, which can be used to design a distributed amplifier without Cadence. The four parameters of the two-port scattering matrix of the proposed circuit are derived as functions of the four parameters of the impedance matrix.

Keywords: CMOS regulated cascode distributed amplifier, silicon transformer modeling with polynomial, low power consumption, distribute amplification technique

Procedia PDF Downloads 480
18114 Poverty Dynamics in Thailand: Evidence from Household Panel Data

Authors: Nattabhorn Leamcharaskul

Abstract:

This study aims to examine the determining factors of the dynamics of poverty in Thailand by using panel data on 3,567 households in 2007-2017. Four estimation techniques are employed to analyze the situation of poverty across households and time periods: the multinomial logit model, the sequential logit model, the quantile regression model, and the difference-in-difference model. Households are categorized based on their experiences into five groups, namely chronically poor, falling into poverty, re-entering poverty, exiting from poverty, and never poor households. Estimation results emphasize the effects of demographic and socioeconomic factors as well as unexpected events on the economic status of a household. It is found that remittances have a positive impact on a household’s economic status in that they are likely to lower the probability of falling into poverty or being trapped in poverty, while they tend to increase the probability of exiting from poverty. In addition, receiving a secondary source of household income not only raises the probability of being a never poor household, but also significantly increases the household income per capita of chronically poor households and of households falling into poverty. Public work programs are recommended as an important tool to relieve household financial burden and uncertainty and thus increase the chance for households to escape from poverty.

Keywords: difference in difference, dynamic, multinomial logit model, panel data, poverty, quantile regression, remittance, sequential logit model, Thailand, transfer

Procedia PDF Downloads 85
18113 Response Surface Methodology to Obtain Disopyramide Phosphate Loaded Controlled Release Ethyl Cellulose Microspheres

Authors: Krutika K. Sawant, Anil Solanki

Abstract:

The present study deals with the preparation and optimization of ethyl cellulose-containing disopyramide phosphate loaded microspheres using the solvent evaporation technique. A central composite design consisting of a two-level full factorial design superimposed on a star design was employed for optimizing the preparation of the microspheres. The drug:polymer ratio (X1) and the speed of the stirrer (X2) were chosen as the independent variables. The cumulative release of the drug at different times (2, 6, 10, 14, and 18 hr) was selected as the dependent variable. An optimum polynomial equation was generated for the prediction of the response variable at 10 hr. Based on the results of multiple linear regression analysis and F statistics, it was concluded that sustained action can be obtained when X1 and X2 are kept at high levels. The X1X2 interaction was found to be statistically significant. The drug release pattern fitted the Higuchi model well. The data of a selected batch were subjected to an optimization study using a Box-Behnken design, and an optimal formulation was fabricated. Good agreement was observed between the predicted and the observed dissolution profiles of the optimal formulation.
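The polynomial equation mentioned in this abstract can be illustrated with a minimal sketch, assuming a full second-order model in X1 and X2 (intercept, main effects, the X1X2 interaction, and squared terms) fitted by least squares on a central composite design. The coded design points, the axial distance of sqrt(2), and the coefficient values below are made-up assumptions for illustration, not data from the study.

```python
import math

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def quadratic_terms(x1, x2):
    """Model terms: 1, X1, X2, X1*X2, X1^2, X2^2."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]

def fit_response_surface(points, responses):
    """Least-squares fit via the normal equations (X'X) b = X'y."""
    rows = [quadratic_terms(x1, x2) for x1, x2 in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(k)]
    return solve_linear(xtx, xty)

# Central composite design in coded units: 4 factorial points,
# 4 star (axial) points, and 1 center point.
ALPHA = math.sqrt(2.0)  # assumed axial distance
DESIGN = [(-1, -1), (1, -1), (-1, 1), (1, 1),
          (-ALPHA, 0), (ALPHA, 0), (0, -ALPHA), (0, ALPHA), (0, 0)]

# Hypothetical "true" coefficients used only to generate noise-free responses.
TRUE_COEFFS = [60.0, 8.0, 5.0, 3.0, -2.0, -1.0]
responses = [sum(b * t for b, t in zip(TRUE_COEFFS, quadratic_terms(x1, x2)))
             for x1, x2 in DESIGN]

coeffs = fit_response_surface(DESIGN, responses)
```

Because the synthetic responses contain no noise, the fitted coefficients should reproduce the assumed ones up to rounding; with real dissolution data each coefficient would also need a significance test, as the abstract describes.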

Keywords: disopyramide phosphate, ethyl cellulose, microspheres, controlled release, Box-Behnken design, factorial design

Procedia PDF Downloads 430
18112 Multicollinearity and MRA in Sustainability: Application of the Raise Regression

Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez

Abstract:

Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the cross-product term is often highly correlated with its constitutive terms. Thus, serious multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. The variances of the estimators may be inflated; regressors that are probably relevant tend to be judged non-significant even though the coefficient of determination is very high; coefficients may take incorrect signs; and the results may be highly sensitive to small changes in the dataset. Finally, the strong relationship among explanatory variables makes it difficult to isolate the individual effect of each one on the model under study. Carried over to the moderated analysis, these consequences may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. A review of the works that applied MRA in the ten top journals of the field makes it clear that multicollinearity is mostly disregarded. Fewer than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. In particular, raise regression is analyzed.
This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of, for example, deleting variables. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is applied to data from countries of the European Union for the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can substantially improve the results of the analysis. In particular, the use of raise regression mitigates serious multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.
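The geometric idea of separating two nearly collinear variables can be sketched as follows. This is a minimal two-variable illustration under stated assumptions, not the authors' implementation: the problematic variable is regressed on the other one, and a multiple `lam` of the residual is added back to it. The function names, the parameter name `lam`, and the synthetic data are all hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def raise_variable(target, other, lam):
    """Return target + lam * e, where e is the residual of the simple
    OLS regression (with intercept) of target on other."""
    mt, mo = mean(target), mean(other)
    slope = (sum((o - mo) * (t - mt) for o, t in zip(other, target))
             / sum((o - mo) ** 2 for o in other))
    intercept = mt - slope * mo
    residuals = [t - (intercept + slope * o) for o, t in zip(other, target)]
    return [t + lam * e for t, e in zip(target, residuals)]

# Synthetic, nearly collinear pair (illustrative data only).
rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x2 = [a + 0.1 * rng.gauss(0.0, 1.0) for a in x1]

x2_raised = raise_variable(x2, x1, lam=5.0)
r_before = correlation(x1, x2)
r_after = correlation(x1, x2_raised)
```

The residual is orthogonal to `x1`, so adding more of it moves the raised variable away from `x1` and lowers their correlation without discarding any observation. How large `lam` should be is a modelling choice that this sketch does not address.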

Keywords: multicollinearity, MRA, interaction, raise

Procedia PDF Downloads 74
18111 Binary Logistic Regression Model in Predicting the Employability of Senior High School Graduates

Authors: Cromwell F. Gopo, Joy L. Picar

Abstract:

This study aimed to predict the employability of senior high school graduates for S.Y. 2018-2019 in the Davao del Norte Division through a quantitative research design using descriptive status and predictive approaches among the indicated parameters, namely gender, school type, academics, academic award recipient, skills, values, and strand. The respondents of the study were the 33 secondary schools offering senior high school programs identified through simple random sampling, which resulted in 1,530 cases of graduates’ secondary data, which were analyzed using frequency, percentage, mean, standard deviation, and binary logistic regression. Results showed that the majority of the senior high school graduates who come from large schools were females. Further, less than half of these graduates received any academic award in any semester. In general, the graduates’ performance in academics, skills, and values was proficient. Moreover, less than half of the graduates were not employed. Those who were employed were either contractual, casual, or part-time workers, dominated by GAS graduates. Further, the predictors of employability were gender and the Information and Communications Technology (ICT) strand, while the remaining variables did not add significantly to the model. The null hypothesis was rejected, as the coefficients of the predictors in the binary logistic regression equation did not take the value of 0. After utilizing the model, it was concluded that Technical-Vocational-Livelihood (TVL) graduates except ICT had greater estimates of employability.

Keywords: employability, senior high school graduates, Davao del Norte, Philippines

Procedia PDF Downloads 113
18110 Using the Bootstrap for Problems Statistics

Authors: Brahim Boukabcha, Amar Rebbouh

Abstract:

The bootstrap method, based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we present a theoretical study of the different bootstrap methods and use the resampling technique in statistical inference to calculate the standard error of an estimator and to determine a confidence interval for an estimated parameter. We apply these methods to regression models and the Pareto model, where they give the best approximations.
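A minimal sketch of the resampling idea, assuming the estimator is the sample mean and the confidence interval is the simple percentile interval (the abstract does not say which bootstrap variants the authors use). The Pareto shape parameter, sample size, and number of resamples are illustrative assumptions.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples, rng):
    """Resample with replacement and return the mean of each resample."""
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n))
            for _ in range(n_resamples)]

def percentile_interval(values, level=0.95):
    """Percentile bootstrap interval from the sorted replicate values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper

rng = random.Random(42)
# Illustrative sample drawn from a Pareto distribution with shape 3.
sample = [rng.paretovariate(3.0) for _ in range(100)]

replicates = bootstrap_means(sample, 2000, rng)
standard_error = statistics.stdev(replicates)  # bootstrap SE of the mean
ci_low, ci_high = percentile_interval(replicates)
```

The standard deviation of the replicates estimates the standard error of the mean, and the 2.5th and 97.5th percentiles of the replicates give an approximate 95% interval. Other estimators (median, variance, regression coefficients) can be substituted for `statistics.fmean` in the same loop.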

Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models

Procedia PDF Downloads 356
18109 Determining the Factors Affecting Social Media Addiction (Virtual Tolerance, Virtual Communication), Phubbing, and Perception of Addiction in Nurses

Authors: Fatima Zehra Allahverdi, Nukhet Bayer

Abstract:

Objective: Three questions were formulated to examine stressful working units (intensive care units, emergency unit nurses) utilizing the self-perception theory and social support theory. This study provides a distinctive input by inspecting the combination of variables regarding stressful working environments. Method: The descriptive research was conducted with the participation of 400 nurses working at Ankara City Hospital. The study used Multivariate Analysis of Variance (MANOVA), regression analysis, and a mediation model. Hypothesis one used MANOVA followed by a Scheffe post hoc test. Hypothesis two utilized regression analysis using a hierarchical linear regression model. Hypothesis three used a mediation model. Result: The study utilized mediation analyses. Findings supported the hypotheses that intensive care units have significantly high scores in virtual communication and virtual tolerance. The number of years on the job, virtual communication, virtual tolerance, and phubbing significantly predicted 51% of the variance of perception of addiction. Interestingly, the number of years on the job, while significant, was negatively related to perception of addiction. Conclusion: The reasoning behind these findings and the lack of significance in the emergency unit is discussed. Around 7% of the variance of phubbing was accounted for through working in intensive care units. The model accounted for 26.80 % of the differences in the perception of addiction.

Keywords: phubbing, social media, working units, years on the job, stress

Procedia PDF Downloads 21
18108 An Epsilon Hierarchical Fuzzy Twin Support Vector Regression

Authors: Arindam Chaudhuri

Abstract:

The research presents epsilon-hierarchical fuzzy twin support vector regression (epsilon-HFTSVR) based on epsilon-fuzzy twin support vector regression (epsilon-FTSVR) and epsilon-twin support vector regression (epsilon-TSVR). Epsilon-FTSVR is achieved by incorporating trapezoidal fuzzy numbers into epsilon-TSVR, which takes care of the uncertainty existing in forecasting problems. Epsilon-FTSVR determines a pair of epsilon-insensitive proximal functions by solving two related quadratic programming problems. The structural risk minimization principle is implemented by introducing a regularization term in the primal problems of epsilon-FTSVR. This yields stable positive definite dual problems, which improve regression performance. Epsilon-FTSVR is then reformulated as epsilon-HFTSVR, consisting of a set of hierarchical layers each containing epsilon-FTSVR. Experimental results on both synthetic and real datasets reveal that epsilon-HFTSVR has remarkable generalization performance with minimum training time.

Keywords: regression, epsilon-TSVR, epsilon-FTSVR, epsilon-HFTSVR

Procedia PDF Downloads 333
18107 Probabilistic Slope Stability Analysis of Excavation Induced Landslides Using Hermite Polynomial Chaos

Authors: Schadrack Mwizerwa

Abstract:

The characterization and prediction of landslides are crucial for assessing geological hazards and mitigating risks to infrastructure and communities. This research aims to develop a probabilistic framework for analyzing excavation-induced landslides. The study uses Hermite polynomial chaos, a non-stationary random process, to analyze the stability of a slope and characterize the failure probability of a real landslide induced by highway construction excavation. The correlation within the data is captured using the Karhunen-Loève (KL) expansion theory, and the finite element method is used to analyze the slope's stability. The research contributes to the field of landslide characterization by employing advanced random field approaches, providing valuable insights into the complex nature of landslide behavior and the effectiveness of advanced probabilistic models for risk assessment and management. The data collected from the Baiyuzui landslide, induced by highway construction, are used as an illustrative example. The findings highlight the importance of considering the probabilistic nature of landslides.

Keywords: Hermite polynomial chaos, Karhunen-Loeve, slope stability, probabilistic analysis

Procedia PDF Downloads 44
18106 Ground Motion Modeling Using the Least Absolute Shrinkage and Selection Operator

Authors: Yildiz Stella Dak, Jale Tezcan

Abstract:

Ground motion models that relate a strong motion parameter of interest to a set of predictive seismological variables describing the earthquake source, the propagation path of the seismic wave, and the local site conditions constitute a critical component of seismic hazard analyses. When a sufficient number of strong motion records are available, ground motion relations are developed using statistical analysis of the recorded ground motion data. In regions lacking a sufficient number of recordings, a synthetic database is developed using stochastic, theoretical or hybrid approaches. Regardless of the manner in which the database was developed, ground motion relations are developed using regression analysis. Development of a ground motion relation is a challenging process which inevitably requires the modeler to make subjective decisions regarding the inclusion criteria of the recordings, the functional form of the model and the set of seismological variables to be included in the model. Because these decisions are critically important to the validity and the applicability of the model, there is a continuous interest in procedures that will facilitate the development of ground motion models. This paper proposes the use of the Least Absolute Shrinkage and Selection Operator (LASSO) in selecting the set of predictive seismological variables to be used in developing a ground motion relation. The LASSO can be described as a penalized regression technique with a built-in capability of variable selection. Similar to ridge regression, the LASSO is based on the idea of shrinking the regression coefficients to reduce the variance of the model. Unlike ridge regression, where the coefficients are shrunk but never set equal to zero, the LASSO sets some of the coefficients exactly to zero, effectively performing variable selection.
Given a set of candidate input variables and the output variable of interest, LASSO allows ranking the input variables in terms of their relative importance, thereby facilitating the selection of the set of variables to be included in the model. Because the risk of overfitting increases as the ratio of the number of predictors to the number of recordings increases, selection of a compact set of variables is important in cases where a small number of recordings are available. In addition, identification of a small set of variables can improve the interpretability of the resulting model, especially when there is a large number of candidate predictors. A practical application of the proposed approach is presented, using more than 600 recordings from the National Geospatial-Intelligence Agency (NGA) database, where the effect of a set of seismological predictors on the 5% damped maximum direction spectral acceleration is investigated. The set of candidate predictors considered are Magnitude, Rrup, Vs30. Using LASSO, the relative importance of the candidate predictors has been ranked. Regression models with increasing levels of complexity were constructed using one, two, three, and four best predictors, and the models’ ability to explain the observed variance in the target variable has been compared. The bias-variance trade-off in the context of model selection is discussed.
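The way the LASSO sets some coefficients exactly to zero can be sketched with cyclic coordinate descent and soft-thresholding. This is a generic textbook-style solver, offered as an assumption about how one might compute the estimate, not the procedure used in the paper. It minimizes (1/(2n))·||y − Xb||² + lam·||b||₁ with no intercept, so the data are assumed to be roughly centered; the synthetic predictors and the penalty value are illustrative.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(x, y, lam, n_iter=200):
    """Cyclic coordinate descent for the LASSO objective stated above."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in x]
            z = sum(v * v for v in col) / n
            # Correlation of column j with the partial residual.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                beta[j] = new_beta
    return beta

# Synthetic data: only the first of three predictors drives the response.
rng = random.Random(1)
x = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] + 0.5 * rng.gauss(0.0, 1.0) for row in x]

beta = lasso_coordinate_descent(x, y, lam=0.3)
```

With this penalty, the coefficients of the two irrelevant predictors are driven to exactly zero while the first coefficient is shrunk but retained. Repeating the fit over a decreasing sequence of `lam` values shows the order in which predictors enter the model, which is one way to obtain the kind of importance ranking the abstract describes.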

Keywords: ground motion modeling, least absolute shrinkage and selection operator, penalized regression, variable selection

Procedia PDF Downloads 304
18105 BART Matching Method: Using Bayesian Additive Regression Tree for Data Matching

Authors: Gianna Zou

Abstract:

Propensity score matching (PSM), introduced by Paul R. Rosenbaum and Donald Rubin in 1983, is a popular statistical matching technique that tries to estimate treatment effects by taking into account covariates that could impact the efficacy of study medication in clinical trials. PSM can be used to reduce the bias due to confounding variables. However, PSM assumes that the response values are normally distributed. In some cases, this assumption may not hold. In this paper, a machine learning method, Bayesian Additive Regression Tree (BART), is used as a more robust method of matching. BART can work well when models are misspecified since it can be used to model heterogeneous treatment effects. Moreover, it has the capability to handle non-linear main effects and multiway interactions. In this research, a BART Matching Method (BMM) is proposed to provide a more reliable matching method than PSM. A comparison of the analysis results from PSM and BMM shows that BMM performs well and has better prediction capability when the response values are not normally distributed.

Keywords: BART, Bayesian, matching, regression

Procedia PDF Downloads 115
18104 A Survey on Quasi-Likelihood Estimation Approaches for Longitudinal Set-ups

Authors: Naushad Mamode Khan

Abstract:

The Com-Poisson (CMP) model is one of the most popular discrete generalized linear models (GLMs) and handles equi-, over-, and under-dispersed data. In the longitudinal context, an integer-valued autoregressive (INAR(1)) process that incorporates covariate specification has been developed to model longitudinal CMP counts. However, the joint CMP likelihood function is difficult to specify, which restricts likelihood-based estimation. The joint generalized quasi-likelihood approach (GQL-I) was considered instead, but it is rather computationally intensive and may not even estimate the regression effects due to a complex and frequently ill-conditioned covariance structure. This paper proposes a new GQL approach for estimating the regression parameters (GQL-III) that is based on a single score vector representation. The performance of GQL-III is compared with GQL-I and separate marginal GQLs (GQL-II) through simulation experiments; it is shown to yield estimates as efficient as those of GQL-I while being far more computationally stable.
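The claim that the CMP model covers equi-, over-, and under-dispersion can be checked numerically with a short sketch. It assumes the standard Conway-Maxwell-Poisson form, in which the probability of a count y is proportional to lam^y / (y!)^nu, and approximates the infinite normalizing sum by truncation at `max_count`; the parameter values and the truncation point are illustrative choices.

```python
import math

def cmp_pmf(lam, nu, max_count=200):
    """Truncated CMP probabilities for y = 0..max_count.

    Terms are computed on the log scale to avoid overflow:
    log term(y) = y * log(lam) - nu * log(y!).
    """
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1)
                 for y in range(max_count + 1)]
    peak = max(log_terms)
    weights = [math.exp(t - peak) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

def dispersion_index(pmf):
    """Variance-to-mean ratio of a distribution on 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(pmf))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return variance / mean

equi = dispersion_index(cmp_pmf(4.0, 1.0))   # nu = 1 is the Poisson case
over = dispersion_index(cmp_pmf(4.0, 0.5))   # nu < 1
under = dispersion_index(cmp_pmf(4.0, 2.0))  # nu > 1
```

A ratio of 1 corresponds to equi-dispersion, a ratio above 1 to over-dispersion, and a ratio below 1 to under-dispersion, so the single parameter `nu` moves the model across all three regimes.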

Keywords: longitudinal, com-Poisson, ill-conditioned, INAR(1), GLMS, GQL

Procedia PDF Downloads 334
18103 The Relationship between Coping Styles and Internet Addiction among High School Students

Authors: Adil Kaval, Digdem Muge Siyez

Abstract:

As internet use has come to have negative effects on people's lives, it has become an issue, one that has mostly been investigated under the heading of internet addiction. In the literature, some theoretical models have been proposed to explain the reasons for internet addiction. In addition to these theoretical models, the coping style for stressful events may be a predictor of internet addiction. This study aimed to test, with logistic regression, the effect of high school students' coping styles on internet addiction levels. The sample of the study consisted of 770 Turkish adolescents (471 girls, 299 boys) selected from high schools in İzmir province in the 2017-2018 academic year. The Internet Addiction Test, the Coping Scale for Child and Adolescents and a demographic information form were used in this study. The results of the logistic regression analysis indicated that the coping styles model provides a statistically significant prediction of internet addiction. Gender does not predict internet addiction. The active coping style has no effect on internet addiction levels, while the avoiding and negative coping styles do. With this model, 79.1% of internet addiction in high school is estimated. The Nagelkerke pseudo R2 indicated that the model accounted for 35% of the total variance. The results of this study on Turkish adolescents are similar to the results of other studies in the literature. It can be argued that avoiding and negative coping styles are important risk factors in the development of internet addiction.

Keywords: adolescents, coping, internet addiction, regression analysis

Procedia PDF Downloads 148
18102 Modeling Default Probabilities of the Chosen Czech Banks in the Time of the Financial Crisis

Authors: Petr Gurný

Abstract:

One of the most important tasks in risk management is the correct determination of the probability of default (PD) of particular financial subjects. This paper discusses the possibility of determining a financial institution’s PD according to credit-scoring models. The paper is divided into two parts. The first part is devoted to the estimation of three different models (based on linear discriminant analysis, logit regression and probit regression) from a sample of almost three hundred US commercial banks. Afterwards, these models are compared and verified on a control sample in order to choose the best one. The second part of the paper applies the chosen model to a portfolio of three key Czech banks to estimate their present financial stability. However, it is no less important to be able to estimate the evolution of PD in the future. For this reason, the second task in this paper is to estimate the probability distribution of the future PD for the Czech banks. To this end, the values of particular indicators are sampled randomly and the distribution of the PDs is estimated, under the assumption that the indicators are distributed according to the multidimensional subordinated Lévy model (in particular, the Variance Gamma model and the Normal Inverse Gaussian model). Although the obtained results show that all banks are relatively healthy, there is still a high chance that “a financial crisis” will occur, at least in terms of probability. This is indicated by the estimation of various quantiles of the estimated distributions. Finally, it should be noted that the applicability of the estimated model (with respect to the data used) is limited to the recessionary phase of the financial market.

Keywords: credit-scoring models, multidimensional subordinated Lévy model, probability of default

Procedia PDF Downloads 425
18101 Project Time Prediction Model: A Case Study of Construction Projects in Sindh, Pakistan

Authors: Tauha Hussain Ali, Shabir Hussain Khahro, Nafees Ahmed Memon

Abstract:

Accurate prediction of project time at the planning and bid preparation stage should contain realistic dates. Constructors use their experience to estimate the duration of new projects, an approach based on intuition. Accurate prediction of project duration at the bid preparation stage has been a constant concern for both researchers and constructors. In Pakistan, studies of the time-cost relationship for predicting the duration performance of construction projects have been lacking. This study is an attempt to explore the time-cost relationship, concluding with a mathematical model to predict the time for drainage rehabilitation projects in the province of Sindh, Pakistan. The data have been collected from National Engineering Services (NESPAK), Pakistan, and regression analysis has been carried out to analyze the results. A significant relationship has been found between the time and cost of construction projects in Sindh, and the generated mathematical model can be used by constructors to predict the duration of upcoming projects of the same nature. This study also provides professionals with the requisite knowledge to make decisions regarding project duration, which is important for winning projects at the bid stage.

Keywords: BTC Model, project time, relationship of time cost, regression

Procedia PDF Downloads 356
18100 Cryptographic Attack on Lucas Based Cryptosystems Using Chinese Remainder Theorem

Authors: Tze Jin Wong, Lee Feng Koo, Pang Hung Yiu

Abstract:

Lenstra’s attack uses the Chinese remainder theorem as a tool and requires a faulty signature to be successful. This paper reports on the security responses of the fourth and sixth order Lucas based (LUC4,6) cryptosystem under Lenstra’s attack as compared to two other Lucas based cryptosystems, LUC and LUC3. All the Lucas based cryptosystems were exposed mathematically to Lenstra’s attack using the Chinese remainder theorem and Dickson polynomials. The results show that Lenstra’s attack is less likely to succeed against the LUC4,6 cryptosystem than against the LUC3 and LUC cryptosystems. The study concludes that the LUC4,6 cryptosystem is more secure than the LUC and LUC3 cryptosystems against Lenstra’s attack.
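Why a single faulty signature is so damaging when the Chinese remainder theorem is used can be shown with a toy sketch. Note the substitution: the example below uses textbook RSA with tiny primes rather than the Lucas based cryptosystems studied in the paper, because the RSA version of the fault attack is the simplest to state. All numbers and function names are illustrative assumptions.

```python
from math import gcd

def crt_pair(r_p, p, r_q, q):
    """Return the unique x in [0, p*q) with x = r_p (mod p), x = r_q (mod q)."""
    h = ((r_q - r_p) * pow(p, -1, q)) % q
    return r_p + p * h

def sign_with_crt(message, d, p, q, fault=0):
    """Sign by computing the two half-signatures and recombining them.

    A nonzero fault corrupts only the half computed modulo q,
    imitating a hardware or software error during signing.
    """
    s_p = pow(message, d % (p - 1), p)
    s_q = (pow(message, d % (q - 1), q) + fault) % q
    return crt_pair(s_p, p, s_q, q)

# Toy key (far too small for real use).
P, Q, E, D = 61, 53, 17, 2753
N = P * Q
message = 65

good = sign_with_crt(message, D, P, Q)
faulty = sign_with_crt(message, D, P, Q, fault=1)

# The faulty signature is still correct modulo P but wrong modulo Q,
# so faulty**E - message is a multiple of P only.
recovered_factor = gcd((pow(faulty, E, N) - message) % N, N)
```

The correct signature verifies as usual, while the faulty one lets anyone who knows the message and the public key recover a prime factor of the modulus with one `gcd` call. Checking a signature before releasing it is the usual countermeasure in the RSA setting.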

Keywords: Lucas sequence, Dickson polynomial, faulty signature, corresponding signature, congruence

Procedia PDF Downloads 133
18099 Predictive Analysis of the Stock Price Market Trends with Deep Learning

Authors: Suraj Mehrotra

Abstract:

The stock market is a volatile, bustling marketplace that is a cornerstone of economics. It defines whether companies are successful or in a downward spiral. A thorough understanding of it is important: many companies have whole divisions dedicated to the analysis of both their own stock and that of rival companies. Linking the world of finance, especially the stock market, with artificial intelligence (AI) has been a relatively recent development. Predicting how stocks will do, considering all external factors and previous data, has always been a human task. With the help of AI, however, machine learning models can help us make more complete predictions of financial trends. For the stock market specifically, predicting the open, closing, high, and low prices for the next day is very hard to do. Machine learning makes this task a lot easier. A model that builds upon itself and takes in external factors as weights can predict trends far into the future. When such models are used effectively, new doors can be opened in the business and finance world, and companies can make better and more complete decisions. This paper explores the various techniques used in the prediction of stock prices, from traditional statistical methods to deep learning and neural network based approaches, among other methods. It provides a detailed analysis of the techniques and also explores the challenges in predictive analysis. Four different models (linear regression, neural network, decision tree, and naïve Bayes) were compared on the stocks of Apple, Google, Tesla, Amazon, United Healthcare, Exxon Mobil, J.P. Morgan & Chase, and Johnson & Johnson. For the testing set, the naïve Bayes model had the highest accuracy along with the linear regression model, followed by the neural network model and then the decision tree model.
The training set had similar results, except that the decision tree model achieved complete accuracy in its predictions. This suggests that the decision tree model likely overfitted the training set, which is consistent with its weaker performance on the testing set.

Keywords: machine learning, testing set, artificial intelligence, stock analysis

Procedia PDF Downloads 61
18098 Qualitative and Quantitative Analysis of Motivation Letters to Model Turnover in Non-Governmental Organization

Authors: A. Porshnev, A. Zaporozhtchuk

Abstract:

Motivation, regarded as a key factor in labor turnover, is especially important for volunteers working on an altruistic basis in NGOs. Despite the motivation letter, candidate selection depends on the impression of the selection committee, which can be subject to human bias. We expect that structured and unstructured information provided in motivation letters could be used to improve candidate selection procedures. In our paper, we perform qualitative and quantitative analysis of 2280 motivation letters, fit a logistic regression, and build a decision tree to improve selection procedures. Our analysis showed that motivation factors are significant, enable the human resources department to forecast labor turnover, and provide extra information in addition to demographic, professional, and timing questions. In spite of the average level of accuracy, the model demonstrates that the selection procedures of the company under consideration can be improved. We also discuss the interrelation between answers to open and closed motivation questions, and recommend changes to motivation letter templates to obtain more relevant information about applicants, as well as further steps toward a more accurate model.

Keywords: decision trees, logistic regression, model, motivational letter, non-governmental organization, retention, turnover

Procedia PDF Downloads 151
18097 Modeling of Traffic Turning Movement

Authors: Michael Tilahun Mulugeta

Abstract:

Pedestrians are the most vulnerable road users, as they are more exposed to the risk of collision. Pedestrian safety at road intersections remains the most vital and yet unsolved issue in Addis Ababa, Ethiopia. One of the critical points in pedestrian safety is the occurrence of conflicts between turning vehicles and pedestrians at unsignalized intersections. A better understanding of the factors that affect the likelihood of these conflicts would help provide direction for countermeasures aimed at reducing the number of crashes. This paper seeks to explore a model describing the relation between traffic conflicts and influencing factors using multiple linear regression. The main focus of this research is to study the interaction of turning (left and right) vehicles with pedestrians at unsignalized intersections. The specific objectives are to determine the factors that affect the number of potential conflicts and to develop a model of potential conflict.

Keywords: potential, regression analysis, pedestrian, conflicts

Procedia PDF Downloads 26
18096 Shedding Light on the Black Box: Explaining Deep Neural Network Prediction of Clinical Outcome

Authors: Yijun Shao, Yan Cheng, Rashmee U. Shah, Charlene R. Weir, Bruce E. Bray, Qing Zeng-Treitler

Abstract:

Deep neural network (DNN) models are being explored in the clinical domain, following their recent success in other domains such as image recognition. For clinical adoption, outcome prediction models require explanation, but due to their multiple non-linear inner transformations, DNN models are viewed by many as a black box. In this study, we developed a deep neural network model for predicting 1-year mortality of patients who underwent major cardiovascular procedures (MCVPs), using a temporal image representation of past medical history as input. The dataset was obtained from the electronic medical data warehouse administered by Veteran Affairs Information and Computing Infrastructure (VINCI). We identified 21,355 veterans who had their first MCVP in 2014. Features for prediction included demographics, diagnoses, procedures, medication orders, hospitalizations, and frailty measures extracted from clinical notes. Temporal variables were created based on the patient history data in the 2-year window prior to the index MCVP. A temporal image was created based on these variables for each individual patient. To generate the explanation for the DNN model, we defined a new concept called impact score, based on the impact of the presence/value of clinical conditions on the predicted outcome. Like the (log) odds ratios reported by the logistic regression (LR) model, impact scores are continuous variables intended to shed light on the black box model. For comparison, a logistic regression model was fitted on the same dataset. In our cohort, about 6.8% of patients died within one year. The prediction of the DNN model achieved an area under the curve (AUC) of 78.5%, while the LR model achieved an AUC of 74.6%. A strong but not perfect correlation was found between the aggregated impact scores and the log odds ratios (Spearman’s rho = 0.74), which helped validate our explanation.
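The rank correlation quoted at the end of this abstract can be illustrated with a minimal Python sketch. It assumes no tied values, and the arrays `impact_scores` and `log_odds` are made-up placeholders, not data from the study.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort gives 0-based ranks when all values are distinct
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical per-feature values, for illustration only
impact_scores = [0.1, 0.4, 0.2, 0.9, 0.5]
log_odds = [1.0, 1.2, 2.5, 3.0, 2.0]
rho = spearman_rho(impact_scores, log_odds)
print(round(rho, 3))
```

With tied values, average ranks would be needed instead of the simple double argsort used here.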

Keywords: deep neural network, temporal data, prediction, frailty, logistic regression model

Procedia PDF Downloads 134
18095 Regional Flood-Duration-Frequency Models for Norway

Authors: Danielle M. Barna, Kolbjørn Engeland, Thordis Thorarinsdottir, Chong-Yu Xu

Abstract:

Design flood values give estimates of flood magnitude within a given return period and are essential to making adaptive decisions around land use planning, infrastructure design, and disaster mitigation. Often design flood values are needed at locations with insufficient data. Additionally, in hydrologic applications where flood retention is important (e.g., floodplain management and reservoir design), design flood values are required at different flood durations. A statistical approach to this problem is a development of a regression model for extremes where some of the parameters are dependent on flood duration in addition to being covariate-dependent. In hydrology, this is called a regional flood-duration-frequency (regional-QDF) model. Typically, the underlying statistical distribution is chosen to be the Generalized Extreme Value (GEV) distribution. However, as the support of the GEV distribution depends on both its parameters and the range of the data, special care must be taken with the development of the regional model. In particular, we find that the GEV is problematic when developing a GAMLSS-type analysis due to the difficulty of proposing a link function that is independent of the unknown parameters and the observed data. We discuss these challenges in the context of developing a regional QDF model for Norway.

Keywords: design flood values, bayesian statistics, regression modeling of extremes, extreme value analysis, GEV

Procedia PDF Downloads 45
18094 A Medical Resource Forecasting Model for Emergency Room Patients with Acute Hepatitis

Authors: R. J. Kuo, W. C. Cheng, W. C. Lien, T. J. Yang

Abstract:

Taiwan is a hyperendemic area for the Hepatitis B virus (HBV). The estimated total number of HBsAg carriers in the general population who are more than 20 years old is more than 3 million. Therefore, a case record review is conducted from January 2003 to June 2007 for all patients with a diagnosis of acute hepatitis who were admitted to the Emergency Department (ED) of a well-known teaching hospital. The cost for the use of medical resources is defined as the total medical fee. In this study, principal component analysis (PCA) is first employed to reduce the number of dimensions. Support vector regression (SVR) and artificial neural network (ANN) are then used to develop the forecasting model. A total of 117 patients meet the inclusion criteria, and 61% of the patients involved in this study are hepatitis B related. The computational results show that the proposed PCA-SVR model outperforms the other compared algorithms. In conclusion, the Child-Pugh score and echogram can both be used to predict the cost of medical resources for patients with acute hepatitis in the ED.

Keywords: acute hepatitis, medical resource cost, artificial neural network, support vector regression

Procedia PDF Downloads 402
18093 Mathematical Modeling of Drip Emitter Discharge of Trapezoidal Labyrinth Channel

Authors: N. Philipova

Abstract:

The influence of the geometric parameters of a trapezoidal labyrinth channel on the emitter discharge is investigated in this work. Among the geometric parameters of the labyrinth channel, the impact of the dentate angle, the dentate spacing, and the dentate height is studied. Numerical simulations of the water flow are performed according to a central cubic composite design using the commercial codes GAMBIT and FLUENT. The inlet pressure of the dripper is set to 1 bar. The objective of this paper is to derive a mathematical model of the emitter discharge as a function of the dentate angle, the dentate spacing, and the dentate height of the labyrinth channel. The obtained mathematical model is a second-order polynomial that includes 2-way interactions among the geometric parameters. The dentate spacing has the most important and positive influence on the emitter discharge, followed by the simultaneous impact of the dentate spacing and the dentate height. The dentate angle in the observed interval has no significant effect on the emitter discharge. The obtained model can be used as a basis for future emitter design.
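A second-order polynomial with 2-way interactions in three factors can be fitted by ordinary least squares, as in the following Python sketch. The data are synthetic and the coefficient values are arbitrary placeholders; nothing here reproduces the paper's simulations or its fitted model.

```python
import numpy as np
from itertools import combinations

def quadratic_design_matrix(X):
    """Columns: intercept, linear terms, 2-way interactions, squared terms."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def fit_quadratic(X, y):
    """Least-squares coefficients of the second-order polynomial model."""
    A = quadratic_design_matrix(X)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic, noise-free illustration with three coded factors
# (stand-ins for dentate angle, spacing, and height; values are made up)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 3))
true_coef = np.array([2.0, 0.0, 1.5, 0.3, 0.0, 0.0, 0.8, 0.0, -0.4, 0.1])
y = quadratic_design_matrix(X) @ true_coef
coef = fit_quadratic(X, y)
print(np.round(coef, 3))
```

For three factors the design matrix has 10 columns (1 intercept, 3 linear, 3 interaction, 3 squared), so at least 10 distinct design points are needed to estimate all coefficients.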

Keywords: drip irrigation, labyrinth channel hydrodynamics, numerical simulations, Reynolds stress model

Procedia PDF Downloads 166
18092 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

This study considers the selection of the most suitable in silico molecular descriptors that could be used for s-triazine pesticides characterization. Suitable descriptors among topological, geometrical and physicochemical are used for quantitative structure-retention relationships (QSRR) model establishment. Established models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, MLR models were established avoiding multicollinearity among the selected molecular descriptors. Statistical quality of established models was evaluated by standard and cross-validation statistical parameters. For detection of similarity or dissimilarity among investigated s-triazine pesticides and their classification, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar grouping. This study is financially supported by COST action TD1305.

Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis

Procedia PDF Downloads 358
18091 Friction Behavior of Wood-Plastic Composites against Uncoated Cemented Carbide

Authors: Almontas Vilutis, Vytenis Jankauskas

Abstract:

The paper presents the results of the investigation of the dry sliding friction of wood-plastic composites (WPCs) against WC-Co cemented carbide. The dependence of the dynamic coefficient of friction on the main influencing factors (vertical load, temperature, and sliding distance) was investigated by evaluating their mutual interaction. Multiple regression analysis showed a high polynomial dependence (adjusted R² > 0.98). The resistance of the composite to thermo-mechanical effects determines how temperature and force factors affect the magnitude of the coefficient of friction. WPC-B composite has the lowest friction and highest resistance compared to WPC-A, while composite and cemented carbide materials wear the least. Energy dispersive spectroscopy (EDS), based on elemental composition, provided important insights into the friction process.

Keywords: friction, composite, carbide, factors

Procedia PDF Downloads 50
18090 A Geographic Information System Mapping Method for Creating Improved Satellite Solar Radiation Dataset Over Qatar

Authors: Sachin Jain, Daniel Perez-Astudillo, Dunia A. Bachour, Antonio P. Sanfilippo

Abstract:

The future of solar energy in Qatar is evolving steadily. Hence, high-quality spatial solar radiation data is of the utmost importance for any planning and commissioning of solar technology. Generally, two types of solar radiation data are available: satellite data and ground observations. Satellite solar radiation data is derived from physical and statistical models. Ground data is collected by solar radiation measurement stations. The ground data is of high quality, but it is limited to distributed point locations, and the ground stations are costly to install and maintain. Satellite solar radiation data, on the other hand, is continuous and available across geographical locations, but it is less accurate than ground data. To exploit the advantages of both, a product has been developed here that provides spatial continuity and higher accuracy than either dataset alone. The popular satellite database, the National Solar Radiation Database (NSRDB; PSM V3 model, spatial resolution: 4 km), is chosen here for merging with ground-measured solar radiation in Qatar. The spatial distribution of ground solar radiation measurement stations is comprehensive in Qatar, with a network of 13 ground stations. The monthly average of the daily total Global Horizontal Irradiation (GHI) component from ground and satellite data is used for error analysis. Normalized root mean square error (NRMSE) values of 3.31%, 6.53%, and 6.63% were observed for October, November, and December 2019, respectively, when comparing in-situ and NSRDB data. The method is based on the Empirical Bayesian Kriging Regression Prediction model available in ArcGIS, ESRI. The workflow of the algorithm combines regression and kriging methods. A regression model (OLS, ordinary least squares) is fitted between the ground and NSRDB data points. A semi-variogram model is fitted to the experimental semi-variogram obtained from the residuals. The kriged residuals obtained after fitting the semi-variogram model are added to the values predicted from the NSRDB data by the regression model to obtain the final predicted values. The NRMSE values obtained after merging are 1.84%, 1.28%, and 1.81% for October, November, and December 2019, respectively. One more explanatory variable, the ground elevation, has been incorporated in the regression and kriging methods to reduce the error and to provide higher spatial resolution (30 m). The final GHI maps have been created after merging, and NRMSE values of 1.24%, 1.28%, and 1.28% have been observed for October, November, and December 2019, respectively. The proposed merging method has proven to be highly accurate. An additional method is also proposed here to generate calibrated maps using the regression and kriging model, and then to use the calibrated model to generate solar radiation maps from the explanatory variable alone when not enough historical ground data is available for long-term analysis. The NRMSE values obtained by comparing the calibrated maps with ground data are 5.60% and 5.31% for November and December 2019, respectively.
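The regression-plus-residual workflow described in this abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions: inverse-distance weighting (IDW) is used as a stand-in for kriging, NRMSE is assumed to be normalized by the mean of the observations (the abstract does not state the normalization), and all station coordinates and irradiation values are synthetic. It does not reproduce the ArcGIS Empirical Bayesian Kriging Regression Prediction tool.

```python
import numpy as np

def nrmse(observed, predicted):
    """RMSE as a percentage of the mean observation (assumed normalization)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    return 100.0 * rmse / np.mean(observed)

def fit_ols(satellite, ground):
    """Simple linear regression: ground ~ slope * satellite + intercept."""
    slope, intercept = np.polyfit(satellite, ground, 1)
    return slope, intercept

def idw_residuals(xy_known, residuals, xy_query, power=2.0):
    """Interpolate residuals by inverse-distance weighting (NOT kriging)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return (w @ residuals) / w.sum(axis=1)

# Synthetic data for 13 hypothetical stations (values are made up)
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 100.0, size=(13, 2))       # station coordinates, km
satellite = rng.uniform(4.0, 7.0, size=13)       # satellite GHI, kWh/m^2/day
ground = 1.1 * satellite - 0.2 + 0.005 * xy[:, 0]  # ground GHI with a spatial trend

slope, intercept = fit_ols(satellite, ground)
regressed = slope * satellite + intercept
residuals = ground - regressed

# Merged estimate = regression prediction + interpolated residual
merged = regressed + idw_residuals(xy, residuals, xy)

nrmse_before = nrmse(ground, satellite)
nrmse_after = nrmse(ground, merged)
print(round(nrmse_before, 2), round(nrmse_after, 2))
```

Evaluating the merged estimate at the station locations themselves reproduces the ground values almost exactly by construction, so a realistic accuracy assessment would hold stations out, for example with leave-one-out cross-validation.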

Keywords: global horizontal irradiation, GIS, empirical bayesian kriging regression prediction, NSRDB

Procedia PDF Downloads 63
18089 Selection of Designs in Ordinal Regression Models under Linear Predictor Misspecification

Authors: Ishapathik Das

Abstract:

The purpose of this article is to find a method of comparing designs for ordinal regression models using quantile dispersion graphs in the presence of linear predictor misspecification. The true relationship between the response variable and the corresponding control variables is usually unknown. The experimenter assumes a certain form of the linear predictor of the ordinal regression model, and this assumed form may not always be correct. Thus, the maximum likelihood estimates (MLE) of the unknown parameters of the model may be biased due to misspecification of the linear predictor. In this article, the uncertainty in the linear predictor is represented by an unknown function. An algorithm is provided to estimate the unknown function at the design points where observations are available. The unknown function is estimated at all points in the design region using multivariate parametric kriging. The comparison of the designs is based on a scalar-valued function of the mean squared error of prediction (MSEP) matrix, which incorporates both the variance and the bias of the prediction caused by the misspecification in the linear predictor. The designs are compared using the quantile dispersion graphs approach. The graphs also visually depict the robustness of the designs to changes in the parameter values. Numerical examples are presented to illustrate the proposed methodology.

Keywords: model misspecification, multivariate kriging, multivariate logistic link, ordinal response models, quantile dispersion graphs

Procedia PDF Downloads 359
18088 An Improved Model of Estimation Global Solar Irradiation from in situ Data: Case of Oran Algeria Region

Authors: Houcine Naim, Abdelatif Hassini, Noureddine Benabadji, Alex Van Den Bossche

Abstract:

In this paper, two models for estimating the monthly average daily global radiation on a horizontal surface are applied to the site of Oran (35.38° N, 0.37° W). The first is a regression equation of the Angstrom type. The second, developed by the present authors, introduces modifications that use as input parameters astronomical parameters (latitude, longitude, and altitude) and meteorological parameters (relative humidity). The comparison is made using the mean bias error (MBE), root mean square error (RMSE), mean percentage error (MPE), and mean absolute bias error (MABE). It shows that the second model is closer to the experimental values than the Angstrom model.
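The four error statistics named in this abstract can be computed as in the Python sketch below. The sign convention (estimated minus measured) and the expression of MPE as a percentage of the measured value are assumptions, since the abstract does not give the formulas, and the sample values are made up.

```python
import numpy as np

def error_metrics(measured, estimated):
    """Return MBE, RMSE, MPE (%), and MABE for estimated vs. measured values."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    err = estimated - measured  # assumed sign convention
    return {
        "MBE": float(np.mean(err)),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MPE": float(100.0 * np.mean(err / measured)),
        "MABE": float(np.mean(np.abs(err))),
    }

# Hypothetical monthly mean daily radiation values, for illustration only
measured = [4.0, 5.0, 6.0]
estimated = [4.4, 4.5, 6.0]
metrics = error_metrics(measured, estimated)
print({name: round(value, 4) for name, value in metrics.items()})
```

MBE and MPE can be close to zero even when individual errors are large, because positive and negative errors cancel, which is why RMSE and MABE are reported alongside them.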

Keywords: meteorology, global radiation, Angstrom model, Oran

Procedia PDF Downloads 205