Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 4396

Search results for: multivariate regression tree

3676 On Improving Breast Cancer Prediction Using GRNN-CP

Abstract:

The aim of this study is to predict breast cancer and to construct a supportive model that will stimulate a more reliable prediction as a factor that is fundamental for public health. In this study, we utilize general regression neural networks (GRNN) to replace the normal predictions with prediction periods to achieve a reasonable percentage of confidence. The mechanism employed here utilises a machine learning system called conformal prediction (CP), in order to assign consistent confidence measures to predictions, which are combined with GRNN. We apply the resulting algorithm to the problem of breast cancer diagnosis. The results show that the prediction constructed by this method is reasonable and could be useful in practice.

Keywords: neural network, conformal prediction, cancer classification, regression

Procedia PDF Downloads 281

3675 Multiple Linear Regression for Rapid Estimation of Subsurface Resistivity from Apparent Resistivity Measurements

Authors: Sabiu Bala Muhammad, Rosli Saad

Abstract:

Multiple linear regression (MLR) models for fast estimation of true subsurface resistivity from apparent resistivity field measurements are developed and assessed in this study. The parameters investigated were apparent resistivity (ρₐ), horizontal location (X) and depth (Z) of measurement as the independent variables; and true resistivity (ρₜ) as the dependent variable. To achieve linearity in both resistivity variables, datasets were first transformed into logarithmic domain following diagnostic checks of normality of the dependent variable and heteroscedasticity to ensure accurate models. Four MLR models were developed based on hierarchical combination of the independent variables. The generated MLR coefficients were applied to another data set to estimate ρₜ values for validation. Contours of the estimated ρₜ values were plotted and compared to the observed data plots at the colour scale and blanking for visual assessment. The accuracy of the models was assessed using coefficient of determination (R²), standard error (SE) and weighted mean absolute percentage error (wMAPE). It is concluded that the MLR models can estimate ρₜ for with high level of accuracy.

Keywords: apparent resistivity, depth, horizontal location, multiple linear regression, true resistivity

Procedia PDF Downloads 271

3674 Release Response of Black Spruce and White Spruce Following Overstory Lodgepole Pine Mortality Due to Mountain Pine Beetle Attack

Authors: F. O. Oboite, P. G. Comeau

Abstract:

Advance regeneration is present in many lodgepole pine stands in Alberta. When the overstory pine canopy is killed by Mountain Pine Beetle (MPB) the growth of this advance is likely to increase. Understanding the growth response of these understory tree species is needed to improve mid-term timber supply projections and management decisions. To quantify the growth (diameter, height, height/diameter ratio) responses of black spruce and white spruce to lodgepole pine mortality, sample trees of black and white spruce advance regeneration were selected from 7 lodgepole pine dominated stands (5 attacked; 2 control) in the Foothills Region of western Alberta. Measurements were collected 7-8 years after MPB attack across a wide range of spruce height and stand densities. Analysis was done using mixed model linear regression. Result indicates that there was an increase in both diameter and height growth after MPB attack; however, this increase in growth was delayed for about four years. Both spruce species had similar height response and their height/diameter ratio decreased after release, partly as a result of increased understory light associated with loss of needles in the pine canopy. In addition, the diameter and height growth responses of both spruce species were strongly related to density, prerelease growth and initial size.

Keywords: mountain pine beetle, forest regeneration, lodgepole pine, growth response

Procedia PDF Downloads 374

3673 Multicollinearity and MRA in Sustainability: Application of the Raise Regression

Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez

Abstract:

Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the moderated factor is often highly correlated with the constitutive terms. Thus, great multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. Inflated variances of the estimators may appear, there is a tendency to consider non-significant regressors that they probably are together with a very high coefficient of determination, incorrect signs of our coefficients may appear and also the high sensibility of the results to small changes in the dataset. Finally, the high relationship among explanatory variables implies difficulties in fixing the individual effects of each one on the model under study. These consequences shifted to the moderated analysis may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. After a review of those works that applied the MRA among the ten top journals of the field, it is clear that multicollinearity is mostly disregarded. Less than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. Particularly, the raised regression is analyzed. This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of deleting variables, for example. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is implemented to data from countries of the European Union during the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can improve in a substantial way the results of the analysis. Particularly, the use of raised regression mitigates great multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.

Keywords: multicollinearity, MRA, interaction, raise

Procedia PDF Downloads 100

3672 A Validated Estimation Method to Predict the Interior Wall of Residential Buildings Based on Easy to Collect Variables

Authors: B. Gepts, E. Meex, E. Nuyts, E. Knaepen, G. Verbeeck

Abstract:

The importance of resource efficiency and environmental impact assessment has raised the interest in knowing the amount of materials used in buildings. If no BIM model or energy performance certificate is available, material quantities can be obtained through an estimation or time-consuming calculation. For the interior wall area, no validated estimation method exists. However, in the case of environmental impact assessment or evaluating the existing building stock as future material banks, knowledge of the material quantities used in interior walls is indispensable. This paper presents a validated method for the estimation of the interior wall area for dwellings based on easy-to-collect building characteristics. A database of 4963 residential buildings spread all over Belgium is used. The data are collected through onsite measurements of the buildings during the construction phase (between mid-2010 and mid-2017). The interior wall area refers to the area of all interior walls in the building, including the inner leaf of exterior (party) walls, minus the area of windows and doors, unless mentioned otherwise. The two predictive modelling techniques used are 1) a (stepwise) linear regression and 2) a decision tree. The best estimation method is selected based on the best R² k-fold (5) fit. The research shows that the building volume is by far the most important variable to estimate the interior wall area. A stepwise regression based on building volume per building, building typology, and type of house provides the best fit, with R² k-fold (5) = 0.88. Although the best R² k-fold value is obtained when the other parameters ‘building typology’ and ‘type of house’ are included, the contribution of these variables can be seen as statistically significant but practically irrelevant. Thus, if these parameters are not available, a simplified estimation method based on only the volume of the building can also be applied (R² k-fold = 0.87). The robustness and precision of the method (output) are validated three times. Firstly, the prediction of the interior wall area is checked by means of alternative calculations of the building volume and of the interior wall area; thus, other definitions are applied to the same data. Secondly, the output is tested on an extension of the database, so it has the same definitions but on other data. Thirdly, the output is checked on an unrelated database with other definitions and other data. The validation of the estimation methods demonstrates that the methods remain accurate when underlying data are changed. The method can support environmental as well as economic dimensions of impact assessment, as it can be used in early design. As it allows the prediction of the amount of interior wall materials to be produced in the future or that might become available after demolition, the presented estimation method can be part of material flow analyses on input and on output.

Keywords: buildings as material banks, building stock, estimation method, interior wall area

Procedia PDF Downloads 23

3671 Detecting Music Enjoyment Level Using Electroencephalogram Signals and Machine Learning Techniques

Authors: Raymond Feng, Shadi Ghiasi

Abstract:

An electroencephalogram (EEG) is a non-invasive technique that records electrical activity in the brain using scalp electrodes. Researchers have studied the use of EEG to detect emotions and moods by collecting signals from participants and analyzing how those signals correlate with their activities. In this study, researchers investigated the relationship between EEG signals and music enjoyment. Participants listened to music while data was collected. During the signal-processing phase, power spectral densities (PSDs) were computed from the signals, and dominant brainwave frequencies were extracted from the PSDs to form a comprehensive feature matrix. A machine learning approach was then taken to find correlations between the processed data and the music enjoyment level indicated by the participants. To improve on previous research, multiple machine learning models were employed, including K-Nearest Neighbors Classifier, Support Vector Classifier, and Decision Tree Classifier. Hyperparameters were used to fine-tune each model to further increase its performance. The experiments showed that a strong correlation exists, with the Decision Tree Classifier with hyperparameters yielding 85% accuracy. This study proves that EEG is a reliable means to detect music enjoyment and has future applications, including personalized music recommendation, mood adjustment, and mental health therapy.

Keywords: EEG, electroencephalogram, machine learning, mood, music enjoyment, physiological signals

Procedia PDF Downloads 56

3670 State and Determinant of Caregiver’s Mental Health in Thailand: A Household Level Analysis

Authors: Ruttana Phetsitong, Patama Vapattanawong, Malee Sunpuwan, Marc Voelker

Abstract:

The majority of care for older people at home in Thai society falls upon caregivers resulting in caregiver’s mental health problem. Beyond individual characteristics, household factors might have a profound effect on the caregiver’s mental health. But reliable data capturing this at the household level have been limited to date. The objectives of the present study were to explore the levels of Thai caregiver’s mental health and to investigate the factors affecting the mental health at household level. Data were obtained from the 2011 National Survey of Thai Older Persons conducted by the National Statistical Office of Thailand. Caregiver’s mental health was measured by using the 15- items-short version of the Thai Mental Health Indicator (TMHI-15) developed by the Department of Mental Health, the Ministry of Public Health. Multivariate logistic regression models were used to explore the impact of potential factors on caregiver’s mental health. The THMI-15 produced an overall average caregiver mental health score of 30.9 out of 45 (SD 5.3). The score can be categorized into good (34.02-45), fair (27.01-34), and poor (0-27). Duration of care for older people, household wealth, and functional dependency of the older people significantly predicted total caregiver’s mental health. Household economic factor was key in predicting better mental health. Compared to those poorest households, the adjusted effect of the fifth quintile household wealth was high (OR=2.34; 95%CI=1.47-3.73). The findings of this study provide a fuller picture to a better understanding of the level and factors that cause the mental health of Thai caregivers. Health care providers and policymakers should consider these factors when designing interventions aimed at alleviating caregiver’s psychological burden when provided care for older people at home.

Keywords: caregiver’s mental health, household, older people, Thailand

Procedia PDF Downloads 141

3669 Bayesian Reliability of Weibull Regression with Type-I Censored Data

Authors: Al Omari Moahmmed Ahmed

Abstract:

In the Bayesian, we developed an approach by using non-informative prior with covariate and obtained by using Gauss quadrature method to estimate the parameters of the covariate and reliability function of the Weibull regression distribution with Type-I censored data. The maximum likelihood seen that the estimators obtained are not available in closed forms, although they can be solved it by using Newton-Raphson methods. The comparison criteria are the MSE and the performance of these estimates are assessed using simulation considering various sample size, several specific values of shape parameter. The results show that Bayesian with non-informative prior is better than Maximum Likelihood Estimator.

Keywords: non-informative prior, Bayesian method, type-I censoring, Gauss quardature

Procedia PDF Downloads 499

3668 Walmart Sales Forecasting using Machine Learning in Python

Authors: Niyati Sharma, Om Anand, Sanjeev Kumar Prasad

Abstract:

Assuming future sale value for any of the organizations is one of the major essential characteristics of tactical development. Walmart Sales Forecasting is the finest illustration to work with as a beginner; subsequently, it has the major retail data set. Walmart uses this sales estimate problem for hiring purposes also. We would like to analyzing how the internal and external effects of one of the largest companies in the US can walk out their Weekly Sales in the future. Demand forecasting is the planned prerequisite of products or services in the imminent on the basis of present and previous data and different stages of the market. Since all associations is facing the anonymous future and we do not distinguish in the future good demand. Hence, through exploring former statistics and recent market statistics, we envisage the forthcoming claim and building of individual goods, which are extra challenging in the near future. As a result of this, we are producing the required products in pursuance of the petition of the souk in advance. We will be using several machine learning models to test the exactness and then lastly, train the whole data by Using linear regression and fitting the training data into it. Accuracy is 8.88%. The extra trees regression model gives the best accuracy of 97.15%.

Keywords: random forest algorithm, linear regression algorithm, extra trees classifier, mean absolute error

Procedia PDF Downloads 146

3667 Machine Learning Models for the Prediction of Heating and Cooling Loads of a Residential Building

Authors: Aaditya U. Jhamb

Abstract:

Due to the current energy crisis that many countries are battling, energy-efficient buildings are the subject of extensive research in the modern technological era because of growing worries about energy consumption and its effects on the environment. The paper explores 8 factors that help determine energy efficiency for a building: (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution), with Tsanas and Xifara providing a dataset. The data set employed 768 different residential building models to anticipate heating and cooling loads with a low mean squared error. By optimizing these characteristics, machine learning algorithms may assess and properly forecast a building's heating and cooling loads, lowering energy usage while increasing the quality of people's lives. As a result, the paper studied the magnitude of the correlation between these input factors and the two output variables using various statistical methods of analysis after determining which input variable was most closely associated with the output loads. The most conclusive model was the Decision Tree Regressor, which had a mean squared error of 0.258, whilst the least definitive model was the Isotonic Regressor, which had a mean squared error of 21.68. This paper also investigated the KNN Regressor and the Linear Regression, which had to mean squared errors of 3.349 and 18.141, respectively. In conclusion, the model, given the 8 input variables, was able to predict the heating and cooling loads of a residential building accurately and precisely.

Keywords: energy efficient buildings, heating load, cooling load, machine learning models

Procedia PDF Downloads 90

3666 Analysis of Ferroresonant Overvoltages in Cable-fed Transformers

Authors: George Eduful, Ebenezer A. Jackson, Kingsford A. Atanga

Abstract:

This paper investigates the impacts of cable length and capacity of transformer on ferroresonant overvoltage in cable-fed transformers. The study was conducted by simulation using the EMTP RV. Results show that ferroresonance can cause dangerous overvoltages ranging from 2 to 5 per unit. These overvoltages impose stress on insulations of transformers and cables and subsequently result in system failures. Undertaking Basic Multiple Regression Analysis (BMR) on the results obtained, a statistical model was obtained in terms of cable length and transformer capacity. The model is useful for ferroresonant prediction and control in cable-fed transformers.

Keywords: ferroresonance, cable-fed transformers, EMTP RV, regression analysis

Procedia PDF Downloads 529

3665 The Role of HPV Status in Patients with Overlapping Grey Zone Cancer in Oral Cavity and Oropharynx

Authors: Yao Song

Abstract:

Objectives: We aimed to explore the clinicodemographic characteristics and prognosis of grey zone squamous cell cancer (GZSCC) located in the overlapping or ambiguous area of the oral cavity and oropharynx and to identify valuable factors that would improve its differential diagnosis and prognosis. Methods: Information of GZSCC patients in the Surveillance, Epidemiology, and End Results (SEER) database was compared to patients with an oral cavity (OCSCC) and oropharyngeal (OPSCC) squamous cell carcinomas with corresponding HPV status, respectively. Kaplan-Meier method with log-rank test and multivariate Cox regression analysis were applied to assess associations between clinical characteristics and overall survival (OS). A predictive model integrating age, gender, marital status, HPV status, and staging variables was conducted to classify GZSCC patients into three risk groups and verified internally by 10-fold cross validation. Results: A total of 3318 GZSCC, 10792 OPSCC, and 6656 OCSCC patients were identified. HPV-positive GZSCC patients had the best 5-year OS as HPV-positive OPSCC (81% vs. 82%). However, the 5-year OS of HPV-negative/unknown GZSCC (43%/42%) was the worst among all groups, indicating that HPV status and the overlapping nature of tumors were valuable prognostic predictors in GZSCC patients. Compared with the strategy of dividing GZSCC into two groups by HPV status, the predictive model integrating more variables could additionally identify a unique high-risk GZSCC group with the lowest OS rate. Conclusions: GZSCC patients had distinct clinical characteristics and prognoses compared with OPSCC and OCSCC; integrating HPV status and other clinical factors could help distinguish GZSCC and predict their prognosis.

Keywords: GZSCC, OCSCC, OPSCC, HPV

Procedia PDF Downloads 72

3664 Geostatistical Analysis of Contamination of Soils in an Urban Area in Ghana

Authors: S. K. Appiah, E. N. Aidoo, D. Asamoah Owusu, M. W. Nuonabuor

Abstract:

Urbanization remains one of the unique predominant factors which is linked to the destruction of urban environment and its associated cases of soil contamination by heavy metals through the natural and anthropogenic activities. These activities are important sources of toxic heavy metals such as arsenic (As), cadmium (Cd), chromium (Cr), copper (Cu), iron (Fe), manganese (Mn), and lead (Pb), nickel (Ni) and zinc (Zn). Often, these heavy metals lead to increased levels in some areas due to the impact of atmospheric deposition caused by their proximity to industrial plants or the indiscriminately burning of substances. Information gathered on potentially hazardous levels of these heavy metals in soils leads to establish serious health and urban agriculture implications. However, characterization of spatial variations of soil contamination by heavy metals in Ghana is limited. Kumasi is a Metropolitan city in Ghana, West Africa and is challenged with the recent spate of deteriorating soil quality due to rapid economic development and other human activities such as “Galamsey”, illegal mining operations within the metropolis. The paper seeks to use both univariate and multivariate geostatistical techniques to assess the spatial distribution of heavy metals in soils and the potential risk associated with ingestion of sources of soil contamination in the Metropolis. Geostatistical tools have the ability to detect changes in correlation structure and how a good knowledge of the study area can help to explain the different scales of variation detected. To achieve this task, point referenced data on heavy metals measured from topsoil samples in a previous study, were collected at various locations. Linear models of regionalisation and coregionalisation were fitted to all experimental semivariograms to describe the spatial dependence between the topsoil heavy metals at different spatial scales, which led to ordinary kriging and cokriging at unsampled locations and production of risk maps of soil contamination by these heavy metals. Results obtained from both the univariate and multivariate semivariogram models showed strong spatial dependence with range of autocorrelations ranging from 100 to 300 meters. The risk maps produced show strong spatial heterogeneity for almost all the soil heavy metals with extremely risk of contamination found close to areas with commercial and industrial activities. Hence, ongoing pollution interventions should be geared towards these highly risk areas for efficient management of soil contamination to avert further pollution in the metropolis.

Keywords: coregionalization, heavy metals, multivariate geostatistical analysis, soil contamination, spatial distribution

Procedia PDF Downloads 296

3663 An Adaptive Controller Method Based on Full-State Linear Model of Variable Cycle Engine

Authors: Jia Li, Huacong Li, Xiaobao Han

Abstract:

Due to the more variable geometry parameters of VCE (variable cycle aircraft engine), presents an adaptive controller method based on the full-state linear model of VCE and has simulated to solve the multivariate controller design problem of the whole flight envelops. First, analyzes the static and dynamic performances of bypass ratio and other state parameters caused by variable geometric components, and develops nonlinear component model of VCE. Then based on the component model, through small deviation linearization of main fuel (Wf), the area of tail nozzle throat (A8) and the angle of rear bypass ejector (A163), setting up multiple linear model which variable geometric parameters can be inputs. Second, designs the adaptive controllers for VCE linear models of different nominal points. Among them, considering of modeling uncertainties and external disturbances, derives the adaptive law by lyapunov function. The simulation results showed that, the adaptive controller method based on full-state linear model used the angle of rear bypass ejector as input and effectively solved the multivariate control problems of VCE. The performance of all nominal points could track the desired closed-loop reference instructions. The adjust time was less than 1.2s, and the system overshoot was less than 1%, at the same time, the errors of steady states were less than 0.5% and the dynamic tracking errors were less than 1%. In addition, the designed controller could effectively suppress interference and reached the desired commands with different external random noise signals.

Keywords: variable cycle engine (VCE), full-state linear model, adaptive control, by-pass ratio

Procedia PDF Downloads 311

3662 Writings About Homeland: Palestinian American Poetry

Authors: Laila Shikaki

Abstract:

‘Writings about Home’ discusses the poetry of Palestinian American female poets, especially ones who write about their homelands, living away from home, as well as their family ties to the land. This is a paper about poetry, but it is also about Palestinian American women who use English to convey issues pertaining to homesickness, family, and language. She study poems by Naomi Shihab Nye and Natalie Hanal. In ‘My Father and the Fig Tree,’ for example, Nye depicts her father’s life away from Palestine and his attachment to a tree that represents his homeland and nostalgia. Nye’s style is diverse and unified, and her attention is to details and images. While her words and imagery are usually simple, they are always rich in meaning. Nathalie Handal’s poetry, on the other hand, has a more complicated, multi-layered, and nuanced style as the poet herself lived in many areas and spoke multiple languages. ‘Bethlehem,’ for instance, depicts her city of origin, recalling her grandfather. Her poem ‘Blue Hours’ illustrates a persona’s difficulty in belonging, switching from one language to the next, and feeling a betrayal in both. This paper pays attention to language and how being bilingual adds another level of exile and pain to those who have fled or were forced to leave Palestine. This paper is very timely as the issue of Palestinian freedom and its right to autonomy and self-determination is the central stage for many Americans, seen in their protests, university encampments, and graduation ceremonies, not forgetting its effect on voters’ decisions for president and elected officials.

Keywords: Palestinian American, poetry, homeland, Nye, Handal

Procedia PDF Downloads 25

3661 Prevalence and Occupational Factors Associated with Low Back Pain among the Female Garment Workers: A Cross-Sectional Study in Bangladesh

Authors: Fazle Rabbi, Mashuda Khanom Tithi, Tasnim Mirza, Sanjida Rowshan Anannya, Ahmed Hossain

Abstract:

Background: Low Back Pain (LBP) is one of the common health problems among the garment workers that causes workers absenteeism from the work. The purpose of the study is to identify the association between occupational factors and LBP among the female garment workers in Bangladesh. Materials and Methods: A cross-sectional study was conducted with 487 female garment workers from three compliant garment factories of Bangladesh. Face-to-face interview on four different LBP measures along with questions on socio-demographic, occupational, and physical factors were used to collect the data. Result: The prevalence rates for LBP lasts for at least one day during the last six months, chronic pain, intense pain, and seeking medical care for LBP were found 63.04%, 38.60%, 13.76%, and 18.89%, respectively among the female garments workers. The multivariate logistic regression analysis indicates that duration of employment (>5 years), regular weight bearing and extended weekly working hours (>48 hours) are positively associated with LBP. Besides, age, BMI, family income, marital status and number of children are also found positively associated with the LBP measures. Conclusion: The prevalence of LBP among female garment workers in Bangladesh is found high. The duration of employment (>5 years), regular weight bearing and extended weekly working hours (>48 hours) play a significant role in developing LBP among the female workers. Factories need to consider training programs on the appropriate technique of weight bearing. It is also important to conduct regular screening programs to identify LBP, especially with married, overweight/obese and older age group to reduce the occurrence of LBP.

Keywords: Bangladesh, garment workers, low back pain, occupational health

Procedia PDF Downloads 196

3660 Evaluation of the CRISP-DM Business Understanding Step: An Approach for Assessing the Predictive Power of Regression versus Classification for the Quality Prediction of Hydraulic Test Results

Authors: Christian Neunzig, Simon Fahle, Jürgen Schulz, Matthias Möller, Bernd Kuhlenkötter

Abstract:

Digitalisation in production technology is a driver for the application of machine learning methods. Through the application of predictive quality, the great potential for saving necessary quality control can be exploited through the data-based prediction of product quality and states. However, the serial use of machine learning applications is often prevented by various problems. Fluctuations occur in real production data sets, which are reflected in trends and systematic shifts over time. To counteract these problems, data preprocessing includes rule-based data cleaning, the application of dimensionality reduction techniques, and the identification of comparable data subsets to extract stable features. Successful process control of the target variables aims to centre the measured values around a mean and minimise variance. Competitive leaders claim to have mastered their processes. As a result, much of the real data has a relatively low variance. For the training of prediction models, the highest possible generalisability is required, which is at least made more difficult by this data availability. The implementation of a machine learning application can be interpreted as a production process. The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model with six phases that describes the life cycle of data science. As in any process, the costs to eliminate errors increase significantly with each advancing process phase. For the quality prediction of hydraulic test steps of directional control valves, the question arises in the initial phase whether a regression or a classification is more suitable. In the context of this work, the initial phase of the CRISP-DM, the business understanding, is critically compared for the use case at Bosch Rexroth with regard to regression and classification. The use of cross-process production data along the value chain of hydraulic valves is a promising approach to predict the quality characteristics of workpieces. Suitable methods for leakage volume flow regression and classification for inspection decision are applied. Impressively, classification is clearly superior to regression and achieves promising accuracies.

Keywords: classification, CRISP-DM, machine learning, predictive quality, regression

Procedia PDF Downloads 140

3659 Statistical Model to Examine the Impact of the Inflation Rate and Real Interest Rate on the Bahrain Economy

Authors: Ghada Abo-Zaid

Abstract:

Introduction: Oil is one of the most income source in Bahrain. Low oil price influence on the economy growth and the investment rate in Bahrain. For example, the economic growth was 3.7% in 2012, and it reduced to 2.9% in 2015. Investment rate was 9.8% in 2012, and it is reduced to be 5.9% and -12.1% in 2014 and 2015, respectively. The inflation rate is increased to the peak point in 2013 with 3.3 %. Objectives: The objectives here are to build statistical models to examine the effect of the interest rate inflation rate on the growth economy in Bahrain from 2000 to 2018. Methods: This study based on 18 years, and the multiple regression model is used for the analysis. All of the missing data are omitted from the analysis. Results: Regression model is used to examine the association between the Growth national product (GNP), the inflation rate, and real interest rate. We found that (i) Increase the real interest rate decrease the GNP. (ii) Increase the inflation rate does not effect on the growth economy in Bahrain since the average of the inflation rate was almost 2%, and this is considered as a low percentage. Conclusion: There is a positive impact of the real interest rate on the GNP in Bahrain. While the inflation rate does not show any negative influence on the GNP as the inflation rate was not large enough to effect negatively on the economy growth rate in Bahrain.

Keywords: growth national product, egypt, regression model, interest rate

Procedia PDF Downloads 159

3658 Support Vector Regression with Weighted Least Absolute Deviations

Authors: Kang-Mo Jung

Abstract:

Least squares support vector machine (LS-SVM) is a penalized regression which considers both fitting and generalization ability of a model. However, the squared loss function is very sensitive to even single outlier. We proposed a weighted absolute deviation loss function for the robustness of the estimates in least absolute deviation support vector machine. The proposed estimates can be obtained by a quadratic programming algorithm. Numerical experiments on simulated datasets show that the proposed algorithm is competitive in view of robustness to outliers.

Keywords: least absolute deviation, quadratic programming, robustness, support vector machine, weight

Procedia PDF Downloads 520

3657 The Prediction of Effective Equation on Drivers' Behavioral Characteristics of Lane Changing

Authors: Khashayar Kazemzadeh, Mohammad Hanif Dasoomi

Abstract:

According to the increasing volume of traffic, lane changing plays a crucial role in traffic flow. Lane changing in traffic depends on several factors including road geometrical design, speed, drivers’ behavioral characteristics, etc. A great deal of research has been carried out regarding these fields. Despite of the other significant factors, the drivers’ behavioral characteristics of lane changing has been emphasized in this paper. This paper has predicted the effective equation based on personal characteristics of lane changing by regression models.

Keywords: effective equation, lane changing, drivers’ behavioral characteristics, regression models

Procedia PDF Downloads 444

3656 Optimization of Hate Speech and Abusive Language Detection on Indonesian-language Twitter using Genetic Algorithms

Authors: Rikson Gultom

Abstract:

Hate Speech and Abusive language on social media is difficult to detect, usually, it is detected after it becomes viral in cyberspace, of course, it is too late for prevention. An early detection system that has a fairly good accuracy is needed so that it can reduce conflicts that occur in society caused by postings on social media that attack individuals, groups, and governments in Indonesia. The purpose of this study is to find an early detection model on Twitter social media using machine learning that has high accuracy from several machine learning methods studied. In this study, the support vector machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) methods were compared with the Support Vector machine with genetic algorithm (SVM-GA), Nave Bayes with genetic algorithm (NB-GA), and Random Forest Decision Tree with Genetic Algorithm (RFDT-GA). The study produced a comparison table for the accuracy of the hate speech and abusive language detection model, and presented it in the form of a graph of the accuracy of the six algorithms developed based on the Indonesian-language Twitter dataset, and concluded the best model with the highest accuracy.

Keywords: abusive language, hate speech, machine learning, optimization, social media

Procedia PDF Downloads 125

3655 Climate Changes in Albania and Their Effect on Cereal Yield

Authors: Lule Basha, Eralda Gjika

Abstract:

This study is focused on analyzing climate change in Albania and its potential effects on cereal yields. Initially, monthly temperature and rainfalls in Albania were studied for the period 1960-2021. Climacteric variables are important variables when trying to model cereal yield behavior, especially when significant changes in weather conditions are observed. For this purpose, in the second part of the study, linear and nonlinear models explaining cereal yield are constructed for the same period, 1960-2021. The multiple linear regression analysis and lasso regression method are applied to the data between cereal yield and each independent variable: average temperature, average rainfall, fertilizer consumption, arable land, land under cereal production, and nitrous oxide emissions. In our regression model, heteroscedasticity is not observed, data follow a normal distribution, and there is a low correlation between factors, so we do not have the problem of multicollinearity. Machine-learning methods, such as random forest, are used to predict cereal yield responses to climacteric and other variables. Random Forest showed high accuracy compared to the other statistical models in the prediction of cereal yield. We found that changes in average temperature negatively affect cereal yield. The coefficients of fertilizer consumption, arable land, and land under cereal production are positively affecting production. Our results show that the Random Forest method is an effective and versatile machine-learning method for cereal yield prediction compared to the other two methods.

Keywords: cereal yield, climate change, machine learning, multiple regression model, random forest

Procedia PDF Downloads 84

3654 Impact of Perceived Stress on Psychological Well-Being, Aggression and Emotional Regulation

Authors: Nishtha Batra

Abstract:

This study was conducted to identify the effect of perceived stress on emotional regulation, aggression and psychological well-being. Analysis was conducted using correlational and regression models to examine the relationships between perceived stress (independent variable) and psychological factors containing emotional intelligence, psychological well-being and aggression. Subjects N=100, Male students 50 and Female students 50. The data was collected using Cohen's Perceived Stress Scale, Gross’s Emotional Regulation Questionnaire (ERQ), Ryff’s Psychological Well-being scale and Orispina’s aggression scale. Correlation and regression (SPSS version 22) Emotional regulation and psychological well-being had a significant relationship with Perceived stress.

Keywords: perceived stress, psychological well-being, aggression, emotional regulation, students

Procedia PDF Downloads 11

3653 Exploring the Spatial Relationship between Built Environment and Ride-hailing Demand: Applying Street-Level Images

Authors: Jingjue Bao, Ye Li, Yujie Qi

Abstract:

The explosive growth of ride-hailing has reshaped residents' travel behavior and plays a crucial role in urban mobility within the built environment. Contributing to the research of the spatial variation of ride-hailing demand and its relationship to the built environment and socioeconomic factors, this study utilizes multi-source data from Haikou, China, to construct a Multi-scale Geographically Weighted Regression model (MGWR), considering spatial scale heterogeneity. The regression results showed that MGWR model was demonstrated superior interpretability and reliability with an improvement of 3.4% on R2 and from 4853 to 4787 on AIC, compared with Geographically Weighted Regression model (GWR). Furthermore, to precisely identify the surrounding environment of sampling point, DeepLabv3+ model is employed to segment street-level images. Features extracted from these images are incorporated as variables in the regression model, further enhancing its rationality and accuracy by 7.78% improvement on R2 compared with the MGWR model only considered region-level variables. By integrating multi-scale geospatial data and utilizing advanced computer vision techniques, this study provides a comprehensive understanding of the spatial dynamics between ride-hailing demand and the urban built environment. The insights gained from this research are expected to contribute significantly to urban transportation planning and policy making, as well as ride-hailing platforms, facilitating the development of more efficient and effective mobility solutions in modern cities.

Keywords: travel behavior, ride-hailing, spatial relationship, built environment, street-level image

Procedia PDF Downloads 73

3652 Impact of Land-Use and Climate Change on the Population Structure and Distribution Range of the Rare and Endangered Dracaena ombet and Dobera glabra in Northern Ethiopia

Authors: Emiru Birhane, Tesfay Gidey, Haftu Abrha, Abrha Brhan, Amanuel Zenebe, Girmay Gebresamuel, Florent Noulèkoun

Abstract:

Dracaena ombet and Dobera glabra are two of the most rare and endangered tree species in dryland areas. Unfortunately, their sustainability is being compromised by different anthropogenic and natural factors. However, the impacts of ongoing land use and climate change on the population structure and distribution of the species are less explored. This study was carried out in the grazing lands and hillside areas of the Desa'a dry Afromontane forest, northern Ethiopia, to characterize the population structure of the species and predict the impact of climate change on their potential distributions. In each land-use type, abundance, diameter at breast height, and height of the trees were collected using 70 sampling plots distributed over seven transects spaced one km apart. The geographic coordinates of each individual tree were also recorded. The results showed that the species populations were characterized by low abundance and unstable population structure. The latter was evinced by a lack of seedlings and mature trees. The study also revealed that the total abundance and dendrometric traits of the trees were significantly different between the two land uses. The hillside areas had a denser abundance of bigger and taller trees than the grazing lands. Climate change predictions using the MaxEnt model highlighted that future temperature increases coupled with reduced precipitation would lead to significant reductions in the suitable habitats of the species in northern Ethiopia. The species' suitable habitats were predicted to decline by 48–83% for D. ombet and 35–87% for D. glabra. Hence, to sustain the species populations, different strategies should be adopted, namely the introduction of alternative livelihoods (e.g., gathering NTFP) to reduce the overexploitation of the species for subsistence income and the protection of the current habitats that will remain suitable in the future using community-based exclosures. Additionally, the preservation of the species' seeds in gene banks is crucial to ensure their long-term conservation.

Keywords: grazing lands, hillside areas, land-use change, MaxEnt, range limitation, rare and endangered tree species

Procedia PDF Downloads 88

3651 Constraints and Opportunities of Wood Production Value Chain: Evidence from Southwest Ethiopia

Authors: Abduselam Faris, Rijalu Negash, Zera Kedir

Abstract:

This study was initiated to identify constraints and opportunities of the wood production value chain in Southwest Ethiopia. About 385 wood trees growing farmers were randomly interviewed. Similarly, about 30 small-scale wood processors, 30 retailers, 15 local collectors and 5 wholesalers were purposively included in the study. The results of the study indicated that 98.96 % of the smallholder farmers that engaged in the production of wood trees which is used for wood were male-headed, with an average age of 46.88 years. The main activity that the household engaged was agriculture (crop and livestock) which accounts for about 61.56% of the sample respondents. Through value chain mapping of actors, the major value chain participant and supporting actors were identified. On average, the tree-growing farmers generated gross income of 9385.926 Ethiopian birr during the survey year. Among the critical constraints identified along the wood production value chain was limited supply of credit, poor market information dissemination, high interference of brokers, and shortage of machines, inadequate working area and electricity. The availability of forest resources is the leading opportunity in the wood production value chain. Reinforcing the linkage among wood production value chain actors, providing skill training for small-scale processors, and developing suitable policy for wood tree wise use is key recommendations forward.

Keywords: value chain analysis, wood production, southwest Ethiopia, constraints and opportunities

Procedia PDF Downloads 83

3650 Utilizing Mahogany (Swietenia Macrophylla) Fruits, Leaves, and Branches as Biochar for Soil Amendment in Okra (Abelmoschus Esculentus) Plant

Authors: Ayaka A. Matsuo, Gweyneth Victoria I. Maranan, Shawn Mikel Hobayan

Abstract:

In this study, we delve into the application of mahogany fruits as biochar for soil amendment, aiming to evaluate their effectiveness in improving soil quality and influencing the growth parameters of okra plants through a comprehensive analysis employing various multivariate tests. In a more straightforward approach, our results show that biochar derived from isn't just a minor player but emerges as a key contributor to our study. This finding holds profound implications, as it highlights the material significance of biochar derived from Mahogany (Swietenia macrophylla) fruits, leaves, and branches in shaping the outcomes. The importance of this discovery lies in its contribution to an enhanced comprehension of the overall effects of biochar on the variables explored in our investigation. Notably, the positive changes observed in height, number of leaves, and width of leaves in okra plants further support the premise that the incorporation of biochar improves soil quality. These findings provide valuable insights for agricultural practices, suggesting that biochar derived from Mahogany (Swietenia macrophylla) fruits, leaves, and branches holds promise as a sustainable soil amendment with positive implications for plant growth. The statistical results from multivariate tests serve to solidify the conclusion that biochar plays a pivotal role in driving the observed outcomes in our study. In essence, this research not only sheds light on the potential of mahogany fruit-derived biochar but also emphasizes its significance in fostering healthier soil conditions and, consequently, enhanced plant growth.

Keywords: soil amendment, biochar, mahogany, soil health

Procedia PDF Downloads 66

3649 Investigating the Influence of the Ferro Alloys Consumption on the Slab Product Standard Cost with Different Grades Using Regression Analysis (A Case Study of Iran's Iron and Steel Industry)

Authors: Iman Fakhrian, Ali Salehi Manzari

Abstract:

Consistent Profitability is one of the most important priorities in manufacturing companies. One of the fundamental factors for increasing the companies profitability is cost management. Isfahan's mobarakeh steel company is one of the largest producers of the slab product grades in the middle east. Raw material cost constitutes about 70% of the company's expenditures. The costs of the ferro alloys have a remarkable contribution of the raw material costs. This research aims to determine the ferro alloys which have significant effect on the variability of the standard cost of the slab product grades. Used data in this study were collected from standard costing system of isfahan's mobarakeh steel company in 2022. The results of conducting the regression analysis model show that expense items: 03020, 03045, 03125, 03130 and 03150 have dominant role in variability of the standard cost of the slab product grades. In other words, the mentioned ferro alloys have noticeable and significant role in variability of the standard cost of the slab product grades.

Keywords: consistent profitability, ferro alloys, slab product grades, regression analysis

Procedia PDF Downloads 64

3648 Characterizing Multivariate Thresholds in Industrial Engineering

Authors: Ali E. Abbas

Abstract:

This paper highlights some of the normative issues that might result by setting independent thresholds in risk analyses and particularly with safety regions. A second objective is to explain how such regions can be specified appropriately in a meaningful way. We start with a review of the importance of setting deterministic trade-offs among target requirements. We then show how to determine safety regions for risk analysis appropriately using utility functions.

Keywords: decision analysis, thresholds, risk, reliability

Procedia PDF Downloads 309

3647 Long-Term Indoor Air Monitoring for Students with Emphasis on Particulate Matter (PM2.5) Exposure

Authors: Seyedtaghi Mirmohammadi, Jamshid Yazdani, Syavash Etemadi Nejad

Abstract:

One of the main indoor air parameters in classrooms is dust pollution and it depends on the particle size and exposure duration. However, there is a lake of data about the exposure level to PM2.5 concentrations in rural area classrooms. The objective of the current study was exposure assessment for PM2.5 for students in the classrooms. One year monitoring was carried out for fifteen schools by time-series sampling to evaluate the indoor air PM2.5 in the rural district of Sari city, Iran. A hygrometer and thermometer were used to measure some psychrometric parameters (temperature, relative humidity, and wind speed) and Real-Time Dust Monitor, (MicroDust Pro, Casella, UK) was used to monitor particulate matters (PM2.5) concentration. The results show the mean indoor PM2.5 concentration in the studied classrooms was 135µg/m3. The regression model indicated that a positive correlation between indoor PM2.5 concentration and relative humidity, also with distance from city center and classroom size. Meanwhile, the regression model revealed that the indoor PM2.5 concentration, the relative humidity, and dry bulb temperature was significant at 0.05, 0.035, and 0.05 levels, respectively. A statistical predictive model was obtained from multiple regressions modeling for indoor PM2.5 concentration and indoor psychrometric parameters conditions.

Keywords: classrooms, concentration, humidity, particulate matters, regression

Procedia PDF Downloads 331