Search results for: sparse regression
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3381

3081 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main aim of this paper is to find an efficient way to predict corn yield based on meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to their modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with the environment as a dynamical system. However, calibrating the dynamical system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process, but it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity.
The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method for calibrating the mechanistic model from easily accessible datasets offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest

Procedia PDF Downloads 231
3080 Free Fatty Acid Assessment of Crude Palm Oil Using a Non-Destructive Approach

Authors: Siti Nurhidayah Naqiah Abdull Rani, Herlina Abdul Rahim, Rashidah Ghazali, Noramli Abdul Razak

Abstract:

Near infrared (NIR) spectroscopy has always been of great interest in the food and agriculture industries. The development of prediction models has facilitated the estimation process in recent years. In this study, 110 crude palm oil (CPO) samples were used to build a free fatty acid (FFA) prediction model. 60% of the collected data were used for training purposes and the remaining 40% used for testing. The visible peaks on the NIR spectrum were at 1725 nm and 1760 nm, indicating the existence of the first overtone of C-H bands. Principal component regression (PCR) was applied to the data in order to build this mathematical prediction model. The optimal number of principal components was 10. The results showed R2=0.7147 for the training set and R2=0.6404 for the testing set.
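
As a rough illustration of the principal component regression workflow described above, the following is a minimal sketch using NumPy only. It reuses the sample count (110), the 60/40 split, and the 10 components from the abstract, but the data are synthetic stand-ins for NIR spectra, and the helper names `pcr_fit`, `pcr_predict`, and `r2` are hypothetical; the R² values it produces are unrelated to those reported by the authors.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit PCR: project centred X onto its first k principal axes, then run OLS."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V_k = Vt[:k].T                                  # (p, k) principal axes
    scores = (X - x_mean) @ V_k                     # (n, k) component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, V_k @ gamma              # coefficients in original feature space

def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

# Synthetic stand-in for spectra: 110 samples, 50 "wavelengths", 3 latent factors (assumed).
rng = np.random.default_rng(0)
n, p = 110, 50
latent = rng.standard_normal((n, 3))
X = latent @ rng.standard_normal((3, p)) + 0.05 * rng.standard_normal((n, p))
y = latent @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.standard_normal(n)

# 60% training / 40% testing split, as in the abstract.
idx = rng.permutation(n)
train, test = idx[:66], idx[66:]

model = pcr_fit(X[train], y[train], k=10)
r2_train = r2(y[train], pcr_predict(model, X[train]))
r2_test = r2(y[test], pcr_predict(model, X[test]))
print(f"R2 train = {r2_train:.3f}, R2 test = {r2_test:.3f}")
```

In practice the number of components would be chosen by cross-validation rather than fixed in advance.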

Keywords: palm oil, fatty acid, NIRS, regression

Procedia PDF Downloads 506
3079 Estimation of Foliar Nitrogen in Selected Vegetation Communities of Uttrakhand Himalayas Using Hyperspectral Satellite Remote Sensing

Authors: Yogita Mishra, Arijit Roy, Dhruval Bhavsar

Abstract:

The study estimates the nitrogen concentration in selected vegetation communities, i.e., chir pine (Pinus roxburghii), using hyperspectral satellite data, and identifies the appropriate spectral bands and nitrogen indices. The Short Wave InfraRed reflectance spectrum at 1790 nm and 1680 nm shows the maximum possible absorption by nitrogen in the selected species. Among the nitrogen indices, log normalized nitrogen index performed positively and negatively too. A strong positive correlation was obtained from 1510 nm and 760 nm for Pinus roxburghii for leaf nitrogen concentration and leaf nitrogen mass when using NDNI. The R² of the linear regression reached a maximum of 0.7525 for the satellite image data and 0.547 for the ground truth data for Pinus roxburghii.

Keywords: hyperspectral, NDNI, nitrogen concentration, regression value

Procedia PDF Downloads 295
3078 Predictors of Quality of Life among Older Refugees Aging out of Place

Authors: Jonix Owino, Heather Fuller

Abstract:

Refugees flee from their home countries due to civil unrest, war, persecution and migrate to Western countries such as the United States in search of a safe haven. Transitioning into a new society and culture can be challenging, thereby affecting refugee’s quality of life and well-being in the host communities. Moreover, as individuals age, they experience physical, cognitive and socioemotional changes that may impact their quality of life. However, little is known about the predictors of quality of life among aging refugees. It is not clear how quality of life varies by age, that is, between midlife refugees in comparison to their older counterparts. In addition to age, other sociodemographic factors such as gender, socioeconomic status, or country of origin are likely to have differential associations to quality of life, yet research on such variations among older refugees is sparse. Thus the present study seeks to explore factors associated with quality of life by asking the following research questions: 1) Do sociodemographic factors (such as age and gender) predict quality of life among older refugees, 2) Is there an association between social integration and quality of life, and 3) Is there an association between migratory related experiences (such as post migratory adjustments) and quality of life. The present study recruited 90 refugees (primarily originating from Bhutan, Somalia, Burundi, and Sudan) aged 50 or older living in the US. The participants completed a structured questionnaire which assessed factors such as participant’s sociodemographic attributes (e.g., age, gender, length of residence in the US, country of origin, employment, level of education, and marital status), and validated measures of social integration, post-migration living difficulties, and quality of life. Preliminary results suggest sociodemographic variability in quality of life among these refugees. 
Further analyses will be conducted using hierarchical regression analyses to address the following hypotheses: first, it is hypothesized that quality of life will vary by age and gender such that younger refugees and men will report higher quality of life. Second, it is expected that refugees with greater levels of social integration will also report better quality of life. Finally, post-migration factors such as language barriers and family stress are hypothesized to predict poorer quality of life. Further results will be analyzed, including potential moderating effects of age and gender, and resulting findings will be interpreted and discussed. The findings from this study have potential implications for communities on how they can better support older refugees as well as develop social programs that can effectively cater to their well-being. Conclusions will be drawn and discussed in light of policies related to both aging and refugee migration within the context of the US.

Keywords: aging out of place, migration, older refugees, quality of life, social integration

Procedia PDF Downloads 100
3077 A Multinomial Logistic Regression Analysis of Factors Influencing Couples' Fertility Preferences in Kenya

Authors: Naomi W. Maina

Abstract:

Fertility preference is a subject of great significance in developing countries. Studies reveal that the preferences of fertility are actually significant in determining the society’s fertility levels because the fertility behavior of the future has a high likelihood of falling under the effect of currently observed fertility inclinations. The objective of this study was to establish the factors associated with fertility preference amongst couples in Kenya by fitting a multinomial logistic regression model against 5,265 couple data obtained from Kenya demographic health survey 2014. Results revealed that the type of place of residence, the region of residence, age and spousal age gap significantly influence desire for additional children among couples in Kenya. There was the notable high likelihood of couples living in rural settlements having similar fertility preference compared to those living in urban settlements. Moreover, geographical disparities such as in northern Kenya revealed significant differences in a couples desire to have additional children compared to Nairobi. The odds of a couple’s desire for additional children were further observed to vary dependent on either the wife or husbands age and to a large extent the spousal age gap. Evidenced from the study, was the fact that as spousal age gap increases, the desire for more children amongst couples decreases. Insights derived from this study would be attractive to demographers, health practitioners, policymakers, and non-governmental organizations implementing fertility related interventions in Kenya among other stakeholders. Moreover, with the adoption of devolution, there is a clear need for adoption of population policies that are County specific as opposed to a national population policy as is the current practice in Kenya. 
Additionally, researchers or students who have little understanding in the application of multinomial logistic regression, both theoretical understanding and practical analysis in SPSS as well as application on real datasets, will find this article useful.

Keywords: couples' desire, fertility, fertility preference, multinomial regression analysis

Procedia PDF Downloads 181
3076 Estimation of a Finite Population Mean under Random Non Response Using Improved Nadaraya and Watson Kernel Weights

Authors: Nelson Bii, Christopher Ouma, John Odhiambo

Abstract:

Non-response is a potential source of errors in sample surveys. It introduces bias and large variance in the estimation of finite population parameters. Regression models have been recognized as one of the techniques of reducing bias and variance due to random non-response using auxiliary data. In this study, it is assumed that random non-response occurs in the survey variable in the second stage of cluster sampling, assuming full auxiliary information is available throughout. Auxiliary information is used at the estimation stage via a regression model to address the problem of random non-response. In particular, the auxiliary information is used via an improved Nadaraya-Watson kernel regression technique to compensate for random non-response. The asymptotic bias and mean squared error of the estimator proposed are derived. Besides, a simulation study conducted indicates that the proposed estimator has smaller values of the bias and smaller mean squared error values compared to existing estimators of finite population mean. The proposed estimator is also shown to have tighter confidence interval lengths at a 95% coverage rate. The results obtained in this study are useful, for instance, in choosing efficient estimators of the finite population mean in demographic sample surveys.
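
To make the role of the auxiliary variable concrete, here is a minimal sketch of the standard Nadaraya-Watson estimator with a Gaussian kernel, used to fill in non-responding units before averaging. It is not the improved estimator proposed by the authors, and it ignores the two-stage cluster design; the data, the 30% non-response rate, the bandwidth `h`, and the function name `nadaraya_watson` are all assumptions made for illustration.

```python
import numpy as np

def nadaraya_watson(x_obs, y_obs, x_new, h):
    """Kernel-weighted average of y_obs at each x_new (Gaussian kernel, bandwidth h)."""
    u = (x_new[:, None] - x_obs[None, :]) / h
    w = np.exp(-0.5 * u ** 2)                # kernel weights, shape (n_new, n_obs)
    return (w @ y_obs) / w.sum(axis=1)

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0.0, 10.0, n)                # auxiliary variable, known for every unit
y = 2.0 + np.sin(x) + 0.1 * rng.standard_normal(n)   # survey variable (synthetic)

responded = rng.random(n) > 0.3              # roughly 30% random non-response (assumed)

# Predict the missing survey values from the auxiliary variable.
y_hat = nadaraya_watson(x[responded], y[responded], x[~responded], h=0.3)

# Combine observed and imputed values to estimate the population mean.
y_filled = y.copy()
y_filled[~responded] = y_hat
mean_est = y_filled.mean()
print(f"estimated mean = {mean_est:.3f}, full-data mean = {y.mean():.3f}")
```

The bandwidth controls the usual bias-variance trade-off: a larger `h` smooths more and biases the fit near curvature and boundaries, while a smaller `h` follows the noise.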

Keywords: mean squared error, random non-response, two-stage cluster sampling, confidence interval lengths

Procedia PDF Downloads 137
3075 Logistic Regression Based Model for Predicting Students’ Academic Performance in Higher Institutions

Authors: Emmanuel Osaze Oshoiribhor, Adetokunbo MacGregor John-Otumu

Abstract:

In recent years, there has been a desire to forecast student academic achievement prior to graduation. This is to help them improve their grades, particularly for individuals with poor performance. The goal of this study is to employ supervised learning techniques to construct a predictive model for student academic achievement. Many academics have already constructed models that predict student academic achievement based on factors such as smoking, demography, culture, social media, parent educational background, parent finances, and family background, to name a few. This feature and the model employed may not have correctly classified the students in terms of their academic performance. This model is built using a logistic regression classifier with basic features such as the previous semester's course score, attendance to class, class participation, and the total number of course materials or resources the student is able to cover per semester as a prerequisite to predict if the student will perform well in future on related courses. The model outperformed other classifiers such as Naive bayes, Support vector machine (SVM), Decision Tree, Random forest, and Adaboost, returning a 96.7% accuracy. This model is available as a desktop application, allowing both instructors and students to benefit from user-friendly interfaces for predicting student academic achievement. As a result, it is recommended that both students and professors use this tool to better forecast outcomes.

Keywords: artificial intelligence, ML, logistic regression, performance, prediction

Procedia PDF Downloads 97
3074 Electrical Load Estimation Using Estimated Fuzzy Linear Parameters

Authors: Bader Alkandari, Jamal Y. Madouh, Ahmad M. Alkandari, Anwar A. Alnaqi

Abstract:

A new formulation of the fuzzy linear estimation problem is presented. It is formulated as a linear programming problem. The objective is to minimize the spread of the data points, taking into consideration the type of the membership function of the fuzzy parameters, to satisfy the constraints on each measurement point and to ensure that the original membership is included in the estimated membership. Different models are developed for a fuzzy triangular membership. The proposed models are applied to different examples from the area of fuzzy linear regression and finally to different examples for estimating the electrical load on a busbar. It was found that the proposed technique is better suited for electrical load estimation, since the nature of the load is characterized by uncertainty and vagueness.

Keywords: fuzzy regression, load estimation, fuzzy linear parameters, electrical load estimation

Procedia PDF Downloads 540
3073 Stature and Gender Estimation Using Foot Measurements in South Indian Population

Authors: Jagadish Rao Padubidri, Mehak Bhandary, Sowmya J. Rao

Abstract:

Introduction: The significance of the human foot and its measurements in identifying an individual has been demonstrated many times by different studies in different geographical areas, and its association with the stature and gender of the individual has been supported by many studies. In our study we have used different foot measurements including the length, width, malleol height and navicular height for establishing their association with stature and gender and to find out their accuracy. The purpose of this study is to show the relation of foot measurements with stature and gender, and to derive multiple and logistic regression equations for stature and gender estimation in the South Indian population. Materials and Methods: The subjects for this study were 200 South Indian students, of whom 100 were females and 100 were males, aged 18 to 24 years. The data for the present study included the stature, foot length, foot breadth, foot malleol height, and foot navicular height of both right and left foot. Descriptive statistics, T-test and Pearson correlation coefficients were derived between stature, gender and foot measurements. The stature was estimated from right and left foot measurements for both the male and female South Indian population using multiple regression analysis, and logistic regression analysis was used for gender estimation. Results: The means, standard deviation, stature, right and left foot measurements and T-test in the male population were higher than in females. LFL (Left foot length) is more than RFL (Right Foot length) in male groups, but in female groups the lengths of both feet are almost equal [RFL=226.6, LFL=227.1]. There is not much difference in the means of RFW (Right foot width) and LFW (Left foot width) in either gender. Significant differences were seen in the mean values of malleol and navicular height of right and left feet in males. No such difference was seen in female subjects.
Conclusions: The study has successfully demonstrated the correlation of foot length in stature estimation in all the three study groups in both right and left foot. Next in parameters are Foot width and malleol height in estimating stature among male and female groups. Navicular height of both right and left foot showed poor relationship with stature estimation in both male and female groups. Multiple regression equations for both right and left foot measurements to estimate stature were derived with standard error ranging from 11-12 cm in males and 10-11 cm in females. The SEE was 5.8 when both male and female groups were pooled together. The logistic regression model which was derived to determine gender showed 85% accuracy and 92.5% accuracy using right and left foot measurements respectively. We believe that stature and gender can be estimated with foot measurements in South Indian population.

Keywords: foot length, gender, stature, South Indian

Procedia PDF Downloads 335
3072 Uncovering the Relationship between EFL Students' Self-Concept and Their Willingness to Communicate in Language Classes

Authors: Seyedeh Khadijeh Amirian, Seyed Mohammad Reza Amirian, Narges Hekmati

Abstract:

The current study aims at examining the relationship between English as a foreign language (EFL) students' self-concept and their willingness to communicate (WTC) in EFL classes. To this effect, two questionnaires, namely 'Willingness to Communicate' (MacIntyre et al., 2001) and 'Self-Concept Scale' (Liu and Wang, 2005), were distributed among 174 (45 males and 129 females) Iranian EFL university students. Correlation and regression analyses were conducted to examine the relationship between the two variables. The results indicated that there was a significantly positive correlation between EFL students' self-concept and their WTC in EFL classes (p < 0.05). Moreover, regression analyses indicated that self-concept has a significantly positive influence on students’ WTC in language classes (B = .302, p < 0.05) and explains .302 percent of the variance in the dependent variable (WTC). The results are discussed with regard to individual differences in educational contexts, and implications are offered.

Keywords: EFL students, language classes, willingness to communicate, self-concept

Procedia PDF Downloads 126
3071 The Influence of Interest, Beliefs, and Identity with Mathematics on Achievement

Authors: Asma Alzahrani, Elizabeth Stojanovski

Abstract:

This study investigated factors that influence mathematics achievement based on a sample of ninth-grade students (N  =  21,444) from the High School Longitudinal Study of 2009 (HSLS09). Key aspects studied included efficacy in mathematics, interest and enjoyment of mathematics, identity with mathematics and future utility beliefs and how these influence mathematics achievement. The predictability of mathematics achievement based on these factors was assessed using correlation coefficients and multiple linear regression. Spearman rank correlations and multiple regression analyses indicated positive and statistically significant relationships between the explanatory variables: mathematics efficacy, identity with mathematics, interest in and future utility beliefs with the response variable, achievement in mathematics.

Keywords: Mathematics achievement, math efficacy, mathematics interest, factors influence

Procedia PDF Downloads 150
3070 Determinants of Free Independent Traveler Tourist Expenditures in Israel: Quantile Regression Model

Authors: Shlomit Hon-Snir, Sharon Teitler-Regev, Anabel Lifszyc Friedlander

Abstract:

Tourism, one of the world's largest and fastest growing industries, exerts a major economic influence. The number of international tourists is growing every year, and the relative portion of independent (FIT) tourists is growing as well. The characteristics of independent tourists differ from those of tourists who travel in organized trips. The purpose of the research is to identify the factors that affect the individual tourist's expenses in Israel: total expenses, expenses per day, expenses per tourist, expenses per day per tourist, accommodation expenses, dining expenses and transportation expenses. Most of the research analyzed the total expenses using OLS regression. The determinants influencing expenses were divided into four groups: budget constraints, socio-demographic data, psychological characteristics and travel-related characteristics. Since the effect of each variable may change over different levels of total expenses the quantile regression (QR) theory will be applied. The current research will use data collected by the Israeli Ministry of Tourism in 2015 from individual independent tourists at the end of their visit to Israel. Preliminary results show that: At lower levels of expense, only income has a (positive) effect on total expenses, while at higher levels of expense, both income and length of stay have (positive) effects. -The effect of income on total expenses is higher for higher levels of expenses than for lower level of expenses. -The number of sites visited during the trip has a (negative) effect on tourist accommodation expenses only for tourists with a high level of total expenses. Due to the increasing share of independent tourism in Israel and around the world and due to the importance of tourism to Israel, it is very important to understand the factors that influence the expenses and behavior of independent tourists. 
Understanding the factors that affect independent tourists' expenses in Israel can help Israeli policymakers in their promotional efforts to attract tourism to Israel.
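
The motivation for quantile regression given above, namely that a variable's effect may differ across levels of expense, can be illustrated with a small numerical sketch. The data below are synthetic (the `income` and `expense` arrays are hypothetical and unrelated to the Ministry of Tourism survey), and the fitting routine is a simple iteratively reweighted least squares approximation chosen to keep the example dependency-free; a dedicated solver would normally be preferred.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Average check (pinball) loss that quantile regression minimises."""
    return np.mean(np.where(residuals >= 0, tau * residuals, (tau - 1.0) * residuals))

def quantile_regression_irls(x, y, tau, n_iter=100, eps=1e-4):
    """Approximate linear quantile regression via iteratively reweighted least squares."""
    A = np.column_stack([np.ones(len(y)), x])          # intercept + predictor
    beta = np.linalg.lstsq(A, y, rcond=None)[0]        # start from the OLS fit
    for _ in range(n_iter):
        r = y - A @ beta
        # Asymmetric weights: |r| is approximated by r^2 / |r_previous|.
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        Aw = A * w[:, None]
        beta = np.linalg.solve(A.T @ Aw, Aw.T @ y)
    return beta

# Synthetic heteroscedastic data: spread of expenses grows with income (assumed).
rng = np.random.default_rng(0)
n = 500
income = rng.uniform(1.0, 5.0, n)
expense = 1.0 + 2.0 * income + income * rng.standard_normal(n)

A = np.column_stack([np.ones(n), income])
beta_ols = np.linalg.lstsq(A, expense, rcond=None)[0]
beta_10 = quantile_regression_irls(income, expense, tau=0.1)
beta_90 = quantile_regression_irls(income, expense, tau=0.9)

print("OLS slope:          ", round(beta_ols[1], 2))
print("10th-quantile slope:", round(beta_10[1], 2))
print("90th-quantile slope:", round(beta_90[1], 2))
```

In this construction the slope at the upper quantile is larger than at the lower quantile, which is the kind of pattern a single OLS coefficient would hide.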

Keywords: independent tourist, quantile regression theory, tourism expenses, tourism

Procedia PDF Downloads 274
3069 Binary Logistic Regression Model in Predicting the Employability of Senior High School Graduates

Authors: Cromwell F. Gopo, Joy L. Picar

Abstract:

This study aimed to predict the employability of senior high school graduates for S.Y. 2018- 2019 in the Davao del Norte Division through quantitative research design using the descriptive status and predictive approaches among the indicated parameters, namely gender, school type, academics, academic award recipient, skills, values, and strand. The respondents of the study were the 33 secondary schools offering senior high school programs identified through simple random sampling, which resulted in 1,530 cases of graduates’ secondary data, which were analyzed using frequency, percentage, mean, standard deviation, and binary logistic regression. Results showed that the majority of the senior high school graduates who come from large schools were females. Further, less than half of these graduates received any academic award in any semester. In general, the graduates’ performance in academics, skills, and values were proficient. Moreover, less than half of the graduates were not employed. Then, those who were employed were either contractual, casual, or part-time workers dominated by GAS graduates. Further, the predictors of employability were gender and the Information and Communications Technology (ICT) strand, while the remaining variables did not add significantly to the model. The null hypothesis had been rejected as the coefficients of the predictors in the binary logistic regression equation did not take the value of 0. After utilizing the model, it was concluded that Technical-Vocational-Livelihood (TVL) graduates except ICT had greater estimates of employability.

Keywords: employability, senior high school graduates, Davao del Norte, Philippines

Procedia PDF Downloads 152
3068 Prediction of Coronary Artery Stenosis Severity Based on Machine Learning Algorithms

Authors: Yu-Jia Jian, Emily Chia-Yu Su, Hui-Ling Hsu, Jian-Jhih Chen

Abstract:

The coronary arteries are the major suppliers of myocardial blood flow. When fat and cholesterol are deposited in the coronary arterial wall, narrowing and stenosis of the artery occur, which may lead to myocardial ischemia and eventually infarction. According to the World Health Organization (WHO), an estimated 7.4 million people died of coronary heart disease in 2015. According to statistics from the Ministry of Health and Welfare in Taiwan, heart disease (except for hypertensive diseases) ranked second among the top 10 causes of death from 2013 to 2016, and it still shows a growing trend. According to the American Heart Association (AHA), the risk factors for coronary heart disease include age (> 65 years), sex (men to women with 2:1 ratio), obesity, diabetes, hypertension, hyperlipidemia, smoking, family history, lack of exercise and more. We collected a dataset of 421 patients from a hospital located in northern Taiwan who received coronary computed tomography (CT) angiography. There were 300 males (71.26%) and 121 females (28.74%), with ages ranging from 24 to 92 years and a mean age of 56.3 years. Prior to coronary CT angiography, basic data of the patients, including age, gender, obesity index (BMI), diastolic blood pressure, systolic blood pressure, diabetes, hypertension, hyperlipidemia, smoking, family history of coronary heart disease and exercise habits, were collected and used as input variables. The output variable of the prediction module is the degree of coronary artery stenosis. In this study, the dataset was randomly divided into 80% as training set and 20% as test set. Four machine learning algorithms, including logistic regression, stepwise regression, neural network and decision tree, were used to generate prediction results.
We used area under curve (AUC) / accuracy (Acc.) to compare the four models. The best model is the neural network, followed by stepwise logistic regression, decision tree, and logistic regression, with 0.68 / 79%, 0.68 / 74%, 0.65 / 78%, and 0.65 / 74%, respectively. The sensitivity and specificity were 27.3% and 90.8% for the neural network, 18.2% and 92.3% for stepwise logistic regression, 13.6% and 100% for the decision tree, and 27.3% and 89.2% for logistic regression. Based on these results, we hope to improve accuracy in the future by tuning the module parameters or by other methods, and to address the low sensitivity by adjusting the imbalanced proportion of positive and negative data.

Keywords: decision support, computed tomography, coronary artery, machine learning

Procedia PDF Downloads 228
3067 Bank Concentration and Industry Structure: Evidence from China

Authors: Jingjing Ye, Cijun Fan, Yan Dong

Abstract:

The development of the financial sector plays an important role in shaping industrial structure. However, evidence on the micro-level channels through which this relation manifests remains relatively sparse, particularly for developing countries. In this paper, we compile an industry-by-city dataset based on manufacturing firms and registered banks in 287 Chinese cities from 1998 to 2008. Based on a difference-in-difference approach, we find that a highly concentrated banking sector decreases the competitiveness of firms in each manufacturing industry. There are two main reasons: i) bank accessibility successfully fosters firm expansion within each industry, but only for sufficiently large enterprises; ii) state-owned enterprises are favored by the banking industry in China. The results are robust after considering alternative concentration and external finance dependence measures.

Keywords: bank concentration, China, difference-in-difference, industry structure

Procedia PDF Downloads 388
3066 Determinants of Poverty: A Logit Regression Analysis of Zakat Applicants

Authors: Zunaidah Ab Hasan, Azhana Othman, Abd Halim Mohd Noor, Nor Shahrina Mohd Rafien

Abstract:

Zakat is a portion of wealth contributed by financially able Muslims to be distributed to predetermined recipients, chief among them the poor and the needy. The zakat fund is distributed with the objective of lifting the recipients out of poverty. Due to the multidimensional and multifaceted nature of poverty, it is imperative that the causes of poverty are properly identified so that assistance given by zakat authorities reaches the intended target. Despite various studies undertaken to identify the poor correctly, there are reports of the poor not receiving adequate assistance from zakat. Thus, this study examines the determinants of poverty among applicants for zakat assistance distributed by the State Islamic Religious Council in Malacca (SIRCM). Malacca is a state in Malaysia. The respondents were drawn from the list of names of new zakat applicants for the months of April and May 2014 provided by SIRCM. A binary logistic regression was estimated based on this data, with whether the zakat application was rejected or accepted as the dependent variable and a set of demographic variables and health as the explanatory variables. Overall, the logistic model successfully predicted factors of acceptance of zakat applications. Three independent variables namely gender, age; size of households and health significantly explain the likelihood of a successful zakat application. Among others, the finding suggests the importance of focusing on providing education opportunity in helping the poor.

Keywords: logistic regression, zakat distribution, status of zakat applications, poverty, education

Procedia PDF Downloads 336
3065 Modelling the Impact of Installation of Heat Cost Allocators in District Heating Systems Using Machine Learning

Authors: Danica Maljkovic, Igor Balen, Bojana Dalbelo Basic

Abstract:

Following the EU Directive on Energy Efficiency, specifically Article 9, individual metering in district heating systems had to be introduced by the end of 2016. These provisions have been implemented in the legal frameworks of the member states, Croatia being one of them. The directive allows installation of both heat metering devices and heat cost allocators. Mainly due to poor communication and PR, the general public formed the false impression that heat cost allocators are devices that save energy. Although this notion is wrong, the aim of this work is to develop a model that would precisely express the influence of installing heat cost allocators on potential energy savings in each unit within multifamily buildings. At the same time, in recent years, machine learning has gained wider application in various fields, as it has been shown to give good results in cases where large amounts of data are to be processed with the aim of recognizing patterns and correlations among the relevant parameters, as well as in cases where the problem is too complex for human intelligence to solve. A particular machine learning method, the decision tree method, has achieved an accuracy of over 92% in predicting general building consumption. In this paper, machine learning algorithms will be used to isolate the sole impact of installing heat cost allocators on a single building among multifamily houses connected to district heating systems. Special emphasis will be given to regression analysis, logistic regression, support vector machines, decision trees and the random forest method.

Keywords: district heating, heat cost allocator, energy efficiency, machine learning, decision tree model, regression analysis, logistic regression, support vector machines, decision trees and random forest method

Procedia PDF Downloads 249
3064 Ground Motion Modeling Using the Least Absolute Shrinkage and Selection Operator

Authors: Yildiz Stella Dak, Jale Tezcan

Abstract:

Ground motion models that relate a strong motion parameter of interest to a set of predictive seismological variables describing the earthquake source, the propagation path of the seismic wave, and the local site conditions constitute a critical component of seismic hazard analyses. When a sufficient number of strong motion records are available, ground motion relations are developed using statistical analysis of the recorded ground motion data. In regions lacking a sufficient number of recordings, a synthetic database is developed using stochastic, theoretical or hybrid approaches. Regardless of how the database was developed, ground motion relations are developed using regression analysis. Development of a ground motion relation is a challenging process which inevitably requires the modeler to make subjective decisions regarding the inclusion criteria of the recordings, the functional form of the model and the set of seismological variables to be included in the model. Because these decisions are critically important to the validity and the applicability of the model, there is a continuing interest in procedures that will facilitate the development of ground motion models. This paper proposes the use of the Least Absolute Shrinkage and Selection Operator (LASSO) in selecting the set of predictive seismological variables to be used in developing a ground motion relation. The LASSO can be described as a penalized regression technique with a built-in capability of variable selection. Similar to ridge regression, the LASSO is based on the idea of shrinking the regression coefficients to reduce the variance of the model. Unlike ridge regression, where the coefficients are shrunk but never set equal to zero, the LASSO sets some of the coefficients exactly to zero, effectively performing variable selection.
Given a set of candidate input variables and the output variable of interest, LASSO allows ranking the input variables in terms of their relative importance, thereby facilitating the selection of the set of variables to be included in the model. Because the risk of overfitting increases as the ratio of the number of predictors to the number of recordings increases, selection of a compact set of variables is important in cases where a small number of recordings are available. In addition, identification of a small set of variables can improve the interpretability of the resulting model, especially when there is a large number of candidate predictors. A practical application of the proposed approach is presented, using more than 600 recordings from the National Geospatial-Intelligence Agency (NGA) database, where the effect of a set of seismological predictors on the 5% damped maximum direction spectral acceleration is investigated. The candidate predictors considered are Magnitude, Rrup, and Vs30. Using LASSO, the relative importance of the candidate predictors has been ranked. Regression models with increasing levels of complexity were constructed using one, two, three, and four best predictors, and the models’ ability to explain the observed variance in the target variable has been compared. The bias-variance trade-off in the context of model selection is discussed.
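The contrast between ridge regression and the LASSO described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the penalty value `lam` is arbitrary, and the helper names (`soft_threshold`, `lasso_cd`, `ridge`) are hypothetical. The LASSO is solved here by plain cyclic coordinate descent, one common way to fit it.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove the contribution of every feature except j.
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

def ridge(X, y, lam):
    """Minimize (1/(2n))*||y - X b||^2 + (lam/2)*||b||_2^2 in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

# Synthetic data (an assumption for illustration): only the first two of four
# standardized predictors actually influence the response.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(n)
y = y - y.mean()

b_lasso = lasso_cd(X, y, lam=0.5)
b_ridge = ridge(X, y, lam=0.5)
print("lasso:", b_lasso)
print("ridge:", b_ridge)
```

In this setup the LASSO coefficients of the two irrelevant predictors come out exactly zero, while ridge only shrinks them. Sweeping `lam` from large to small and noting the order in which coefficients become nonzero is one way to obtain an importance ranking of the kind the abstract mentions.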

Keywords: ground motion modeling, least absolute shrinkage and selection operator, penalized regression, variable selection

Procedia PDF Downloads 330
3063 Quality Parameters of Offset Printing Wastewater

Authors: Kiurski S. Jelena, Kecić S. Vesna, Aksentijević M. Snežana

Abstract:

Samples of tap water and wastewater were collected in three offset printing facilities in Novi Sad, Serbia. Ten physicochemical parameters were analyzed in all collected samples: pH, conductivity, m-alkalinity, p-alkalinity, acidity, carbonate concentration, hydrogen carbonate concentration, active oxygen content, chloride concentration and total alkali content. All measurements were conducted using standard analytical and instrumental methods. Comparing the results obtained for tap water and wastewater, a clear quality difference was noticeable, since all physicochemical parameters were significantly higher in the wastewater samples. The study also applies simple linear regression analysis to the obtained dataset. Using the software package ORIGIN 5, the pH value was correlated with the other physicochemical parameters. Based on the obtained values of the Pearson correlation coefficient, a strong negative correlation between chloride concentration and pH (r = -0.943), as well as between acidity and pH (r = -0.855), was determined. In addition, statistical significance was obtained only for the relationships of acidity and chloride concentration with pH, since the values of parameter F (247.634 and 182.536) were higher than Fcritical (5.59). In this way, the results of the statistical analysis highlighted acidity and chloride concentration as the most influential parameters of water contamination in offset printing. The results showed that the variable dependence could be represented by the general regression model y = a0 + a1x + k, which further resulted in matching regression plots.
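The analysis described above (fit y = a0 + a1x, compute Pearson's r, compare F with a critical value) can be sketched as follows. The data points are invented for illustration and are not the study's measurements, and the function name is hypothetical. The sketch assumes nine observations because the quoted critical value of 5.59 matches the 5% point of the F distribution with (1, 7) degrees of freedom; that reading of the abstract is an assumption, as is treating the k term as the residual.

```python
import numpy as np

def simple_linear_regression(x, y):
    """Return intercept a0, slope a1, Pearson r and the F statistic of the fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    a1, a0 = np.polyfit(x, y, 1)          # polyfit returns slope first for degree 1
    r = np.corrcoef(x, y)[0, 1]
    # For one predictor, F = MSR / MSE reduces to r^2 (n - 2) / (1 - r^2).
    f_stat = r ** 2 * (n - 2) / (1.0 - r ** 2)
    return a0, a1, r, f_stat

# Hypothetical predictor (x) and response (y), nine observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [9.1, 8.2, 6.8, 6.1, 5.2, 3.9, 3.1, 1.8, 1.2]

a0, a1, r, f_stat = simple_linear_regression(x, y)
F_CRITICAL = 5.59  # value quoted in the abstract
print(f"y = {a0:.3f} + {a1:.3f} x, r = {r:.3f}, F = {f_stat:.1f}")
print("significant:", f_stat > F_CRITICAL)
```

A negative r goes with a negative slope a1, so the sign of r gives the direction of the relationship and F indicates whether it is statistically significant.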

Keywords: pollution, printing industry, simple linear regression analysis, wastewater

Procedia PDF Downloads 235
3062 A Hybrid-Evolutionary Optimizer for Modeling the Process of Obtaining Bricks

Authors: Marius Gavrilescu, Sabina-Adriana Floria, Florin Leon, Silvia Curteanu, Costel Anton

Abstract:

Natural sciences provide a wide range of experimental data whose related problems require study and modeling beyond the capabilities of conventional methodologies. Such problems have solution spaces whose complexity and high dimensionality require correspondingly complex regression methods for proper characterization. In this context, we propose an optimization method which consists of a hybrid dual optimizer setup: a global optimizer based on a modified variant of the popular Imperialist Competitive Algorithm (ICA), and a local optimizer based on a gradient descent approach. The ICA is modified such that intermediate solution populations are more quickly and efficiently pruned of low-fitness individuals by appropriately altering the assimilation, revolution and competition phases, which, combined with an initialization strategy based on low-discrepancy sampling, allows for a more effective exploration of the corresponding solution space. Subsequently, gradient-based optimization is used locally to seek the optimal solution in the neighborhoods of the solutions found through the modified ICA. We use this combined approach to find the optimal configuration and weights of a fully-connected neural network, resulting in regression models used to characterize the process of obtaining bricks using silicon-based materials. Installations in the raw ceramics industry, i.e., bricks, are characterized by significant energy consumption and large quantities of emissions. Thus, the purpose of our approach is to determine by simulation the working conditions, including the manufacturing mix recipe with the addition of different materials, to minimize the emissions represented by CO and CH4. Our approach determines regression models which perform significantly better than those found using the traditional ICA for the aforementioned problem, resulting in better convergence and a substantially lower error.

Keywords: optimization, biologically inspired algorithm, regression models, bricks, emissions

Procedia PDF Downloads 82
3061 Econometric Analysis of West African Countries’ Container Terminal Throughput and Gross Domestic Products

Authors: Kehinde Peter Oyeduntan, Kayode Oshinubi

Abstract:

West African ports have experienced a large inflow and outflow of containerized cargo in recent decades, and this has created a quest amongst the countries to attain the status of hub port for the sub-region. This study analyzed the relationship between the container throughput and Gross Domestic Products (GDP) of nine West African countries, using Simple Linear Regression (SLR), Polynomial Regression Model (PRM) and Support Vector Machines (SVM) with a time series of 20 years. The results showed that there exists a high correlation between the GDP and container throughput. The model also predicted the container throughput in West Africa for the next 20 years. The findings and recommendations presented in this research will guide policy makers and help improve the management of container ports and terminals in West Africa, thereby boosting the economy.

Keywords: container, ports, terminals, throughput

Procedia PDF Downloads 214
3060 The Influence of the Vocational Teachers Empowerment toward the Vocational High Schools’ Performance Based on the Education National Standards of Indonesia

Authors: Abdul Haris Setiawan

Abstract:

Teacher empowerment is one of the important factors considered to contribute significantly to the achievement of national education goals. This study was conducted to determine the influence of vocational teacher empowerment on the performance of vocational high schools based on the Education National Standards of Indonesia. The population of the study was all vocational teachers at the state vocational high schools in Surakarta, Central Java Province, Indonesia. The sample was drawn by proportional random sampling. The study used quantitative descriptive statistical analysis. The data were collected using questionnaires and then checked with analysis requirements tests. After these tests, the data were processed using regression analysis between the independent and dependent variables to determine the effect and the regression equation. The results of the study found that the level of vocational high schools’ performance based on the Education National Standards of Indonesia was 74.29%, which falls in the high category; the level of vocational teacher empowerment was 76.20%, also in the high category; and there was a positive influence of vocational teacher empowerment on the vocational high schools’ performance based on the Education National Standards of Indonesia, with a correlation coefficient of 0.886, a contribution of 78.50%, and the regression equation Y = 79.431 + 0.534 X.
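The reported correlation coefficient and contribution are consistent with each other if the contribution is read as the coefficient of determination of a one-predictor regression, which is an interpretation rather than something the abstract states. Under that assumption, with r the correlation coefficient and X the empowerment score:

```latex
\[
R^2 = r^2 = 0.886^2 \approx 0.785 \quad (\text{about } 78.5\%),
\qquad
\hat{Y} = 79.431 + 0.534\,X .
\]
```

Read this way, each one-unit increase in X is associated with an increase of 0.534 in the predicted performance score.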

Keywords: vocational teachers, empowerment, vocational high school, the education national standards

Procedia PDF Downloads 394
3059 Prediction of Index-Mechanical Properties of Pyroclastic Rock Utilizing Electrical Resistivity Method

Authors: İsmail İnce

Abstract:

The aim of this study is to determine the index and mechanical properties of pyroclastic rock in a practical way by means of the electrical resistivity method. For this purpose, the electrical resistivity, uniaxial compressive strength, point load strength, P-wave velocity, density and porosity values of 10 different pyroclastic rocks were measured in the laboratory. Simple regression analysis was performed between the index-mechanical properties of the samples and their electrical resistivity values. A strong exponential relation was found between the index-mechanical properties and the electrical resistivity values. The electrical resistivity method can be used, as a non-destructive method, to assess the engineering properties of rock from which it is difficult to obtain regularly shaped samples.
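An exponential relation of the form y = a·exp(b·x) can be fitted with simple regression by regressing ln(y) on x. The sketch below shows that approach on noise-free synthetic numbers. The values, the variable names and the choice of log-linear least squares are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y) versus x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, ln_a = np.polyfit(x, np.log(y), 1)    # slope, intercept of the log-linear fit
    r = np.corrcoef(x, np.log(y))[0, 1]
    return np.exp(ln_a), b, r ** 2           # a, b, R^2 in log space

# Hypothetical resistivity readings (x) and a rock property (y) that follows
# an exact exponential law, so the fit should recover a = 5.0 and b = 0.01.
resistivity = np.array([10.0, 20.0, 40.0, 80.0, 160.0, 320.0])
rock_property = 5.0 * np.exp(0.01 * resistivity)

a, b, r2 = fit_exponential(resistivity, rock_property)
print(f"y = {a:.3f} * exp({b:.4f} * x), R^2 = {r2:.4f}")
```

With real measurements the points scatter around the curve, and the R² of the log-linear fit gives one measure of how strong the exponential relation is.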

Keywords: electrical resistivity, index-mechanical properties, pyroclastic rocks, regression analysis

Procedia PDF Downloads 473
3058 Using Machine Learning to Enhance Win Ratio for College Ice Hockey Teams

Authors: Sadixa Sanjel, Ahmed Sadek, Naseef Mansoor, Zelalem Denekew

Abstract:

Collegiate ice hockey (NCAA) sports analytics is different from the national level hockey (NHL). We apply and compare multiple machine learning models such as Linear Regression, Random Forest, and Neural Networks to predict the win ratio for a team based on their statistics. Data exploration helps determine which statistics are most useful in increasing the win ratio, which would be beneficial to coaches and team managers. We ran experiments to select the best model and chose Random Forest as the best performing. We conclude with how to bridge the gap between the college and national levels of sports analytics and the use of machine learning to enhance team performance despite not having a lot of metrics or budget for automatic tracking.

Keywords: NCAA, NHL, sports analytics, random forest, regression, neural networks, game predictions

Procedia PDF Downloads 114
3057 A Survey on Quasi-Likelihood Estimation Approaches for Longitudinal Set-ups

Authors: Naushad Mamode Khan

Abstract:

The Com-Poisson (CMP) model is one of the most popular discrete generalized linear models (GLMs) and handles equi-, over- and under-dispersed data. In the longitudinal context, an integer-valued autoregressive (INAR(1)) process that incorporates covariate specification has been developed to model longitudinal CMP counts. However, the joint CMP likelihood function is difficult to specify, which restricts likelihood-based estimation. The joint generalized quasi-likelihood approach (GQL-I) was considered instead, but it is rather computationally intensive and may even fail to estimate the regression effects because of a complex and frequently ill-conditioned covariance structure. This paper proposes a new GQL approach for estimating the regression parameters (GQL-III) that is based on a single score vector representation. The performance of GQL-III is compared with GQL-I and separate marginal GQLs (GQL-II) through simulation experiments; GQL-III is shown to yield estimates as efficient as those of GQL-I while being far more computationally stable.

Keywords: longitudinal, com-Poisson, ill-conditioned, INAR(1), GLMS, GQL

Procedia PDF Downloads 354
3056 Extended Arithmetic Precision in Meshfree Calculations

Authors: Edward J. Kansa, Pavel Holoborodko

Abstract:

Continuously differentiable radial basis functions (RBFs) are meshfree, converge faster as the dimensionality increases, and are theoretically spectrally convergent. When implemented on current single and double precision computers, such RBFs can suffer from ill-conditioning because the systems of equations needed to be solved to find the expansion coefficients are full. However, the Advanpix extended precision software package allows computer mathematics to resemble asymptotically ideal Platonic mathematics. Additionally, full systems with extended precision execute faster on graphics processing units and field-programmable gate arrays because no branching is needed. Sparse equation systems are fast for iterative solvers in a very limited number of cases.
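The ill-conditioning mentioned above can be observed directly in double precision. The following sketch builds the full interpolation matrix of a Gaussian RBF on 20 equally spaced points and compares its condition number for two shape parameters. The Gaussian kernel, the point set and the shape values are illustrative assumptions, and the sketch uses only NumPy rather than the Advanpix package named in the abstract.

```python
import numpy as np

def gaussian_rbf_matrix(points, shape):
    """Full (dense) interpolation matrix A[i, j] = exp(-(shape * |x_i - x_j|)^2)."""
    d = np.abs(points[:, None] - points[None, :])
    return np.exp(-(shape * d) ** 2)

points = np.linspace(0.0, 1.0, 20)

# A large shape parameter gives narrow, nearly independent basis functions.
cond_sharp = np.linalg.cond(gaussian_rbf_matrix(points, shape=20.0))
# A small shape parameter gives flat, nearly identical basis functions.
cond_flat = np.linalg.cond(gaussian_rbf_matrix(points, shape=1.0))

print(f"condition number, shape = 20: {cond_sharp:.3e}")
print(f"condition number, shape = 1:  {cond_flat:.3e}")
```

The flat case is the accurate regime in theory, yet its condition number is beyond what 64-bit floating point can resolve, which is the situation where extended precision arithmetic becomes attractive.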

Keywords: partial differential equations, meshfree radial basis functions, no restrictions on spatial dimensions, extended arithmetic precision

Procedia PDF Downloads 149
3055 A Review on Intelligent Systems for Geoscience

Authors: R Palson Kennedy, P.Kiran Sai

Abstract:

This article introduces machine learning (ML) researchers to the hurdles that geoscience problems present, as well as the opportunities for improvement in both ML and the geosciences. To meet that need, it presents a review from the data life cycle perspective. Numerous facets of the geosciences present unique difficulties for the study of intelligent systems. Geoscience data is notoriously difficult to analyze since it is frequently unpredictable, intermittent, sparse, multi-resolution, and multi-scale. The first half addresses data science’s essential concepts and theoretical underpinnings, while the second half covers key themes and shared experiences from current publications focused on each stage of the data life cycle. Finally, themes such as open science, smart data, and team science are considered.

Keywords: Data science, intelligent system, machine learning, big data, data life cycle, recent development, geo science

Procedia PDF Downloads 135
3054 The Relationship between Coping Styles and Internet Addiction among High School Students

Authors: Adil Kaval, Digdem Muge Siyez

Abstract:

The negative effects of internet use on people's lives have made internet use a subject of concern, and it has mostly been investigated as internet addiction. In the literature, it is noteworthy that some theoretical models have been proposed to explain the reasons for internet addiction. In addition to these theoretical models, the style of coping with stressful events may be a predictor of internet addiction. This study aimed to test, with logistic regression, the effect of high school students' coping styles on internet addiction levels. The sample consisted of 770 Turkish adolescents (471 girls, 299 boys) selected from high schools in İzmir province in the 2017-2018 academic year. The Internet Addiction Test, the Coping Scale for Child and Adolescents and a demographic information form were used in this study. The results of the logistic regression analysis indicated that the model in which coping styles predict internet addiction provides a statistically significant prediction of internet addiction. Gender does not predict whether or not a student is addicted to the internet. The active coping style has no effect on internet addiction levels, while the avoiding and negative coping styles do. With this model, 79.1% of internet addiction in high school is estimated. The Nagelkerke pseudo R2 indicated that the model accounted for 35% of the total variance. The results of this study on Turkish adolescents are similar to the results of other studies in the literature. It can be argued that avoiding and negative coping styles are important risk factors in the development of internet addiction.

Keywords: adolescents, coping, internet addiction, regression analysis

Procedia PDF Downloads 173
3053 Anticipation of Bending Reinforcement Based on Iranian Concrete Code Using Meta-Heuristic Tools

Authors: Seyed Sadegh Naseralavi, Najmeh Bemani

Abstract:

In this paper, different concrete codes, including those of America, New Zealand, Mexico, Italy, India, Canada, Hong Kong and Britain and the Euro Code, are compared with the Iranian concrete design code. First, by using the Adaptive Neuro Fuzzy Inference System (ANFIS), the codes that correlate most strongly with the ninth issue of the Iranian national regulation are determined. Two predictive methods are then used for comparing the codes: Artificial Neural Network (ANN) and multi-variable regression. The results show that ANN performs better. Prediction uses only the tensile steel ratio and ignores the compression steel ratio.

Keywords: adaptive neuro fuzzy inference system, anticipate method, artificial neural network, concrete design code, multi-variable regression

Procedia PDF Downloads 284
3052 Efficient Credit Card Fraud Detection Based on Multiple ML Algorithms

Authors: Neha Ahirwar

Abstract:

In the contemporary digital era, the rise of credit card fraud poses a significant threat to both financial institutions and consumers. As fraudulent activities become more sophisticated, there is an escalating demand for robust and effective fraud detection mechanisms. Advanced machine learning algorithms have become crucial tools in addressing this challenge. This paper conducts a thorough examination of the design and evaluation of a credit card fraud detection system, utilizing four prominent machine learning algorithms: random forest, logistic regression, decision tree, and XGBoost. The surge in digital transactions has opened avenues for fraudsters to exploit vulnerabilities within payment systems. Consequently, there is an urgent need for proactive and adaptable fraud detection systems. This study addresses this imperative by exploring the efficacy of machine learning algorithms in identifying fraudulent credit card transactions. The selection of random forest, logistic regression, decision tree, and XGBoost for scrutiny in this study is based on their documented effectiveness in diverse domains, particularly in credit card fraud detection. These algorithms are renowned for their capability to model intricate patterns and provide accurate predictions. Each algorithm is implemented and evaluated for its performance in a controlled environment, utilizing a diverse dataset comprising both genuine and fraudulent credit card transactions.

Keywords: efficient credit card fraud detection, random forest, logistic regression, XGBoost, decision tree

Procedia PDF Downloads 66