Search results for: regression tree
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3926

Search results for: regression tree

3596 Discerning Divergent Nodes in Social Networks

Authors: Mehran Asadi, Afrand Agah

Abstract:

In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules, which can be applied to classification trees. In this research, we used online social network dataset and all of its attributes (e.g., Node features, labels, etc.) to determine what constitutes an above average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we utilize overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is being utilized to probe the structure of a data set, which allow us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the result classification methods and utilized decision trees (with k-fold cross validation to prune the tree). The method performed on this dataset is decision trees. Decision tree is a non-parametric classification method, which uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.

Keywords: online social networks, data mining, social cloud computing, interaction and collaboration

Procedia PDF Downloads 130
3595 A Comprehensive Method of Fault Detection and Isolation based on Testability Modeling Data

Authors: Junyou Shi, Weiwei Cui

Abstract:

Testability modeling is a commonly used method in testability design and analysis of system. A dependency matrix will be obtained from testability modeling, and we will give a quantitative evaluation about fault detection and isolation. Based on the dependency matrix, we can obtain the diagnosis tree. The tree provides the procedures of the fault detection and isolation. But the dependency matrix usually includes built-in test (BIT) and manual test in fact. BIT runs the test automatically and is not limited by the procedures. The method above cannot give a more efficient diagnosis and use the advantages of the BIT. A Comprehensive method of fault detection and isolation is proposed. This method combines the advantages of the BIT and Manual test by splitting the matrix. The result of the case study shows that the method is effective.

Keywords: fault detection, fault isolation, testability modeling, BIT

Procedia PDF Downloads 315
3594 Measurement Errors and Misclassifications in Covariates in Logistic Regression: Bayesian Adjustment of Main and Interaction Effects and the Sample Size Implications

Authors: Shahadut Hossain

Abstract:

Measurement errors in continuous covariates and/or misclassifications in categorical covariates are common in epidemiological studies. Regression analysis ignoring such mismeasurements seriously biases the estimated main and interaction effects of covariates on the outcome of interest. Thus, adjustments for such mismeasurements are necessary. In this research, we propose a Bayesian parametric framework for eliminating deleterious impacts of covariate mismeasurements in logistic regression. The proposed adjustment method is unified and thus can be applied to any generalized linear and non-linear regression models. Furthermore, adjustment for covariate mismeasurements requires validation data usually in the form of either gold standard measurements or replicates of the mismeasured covariates on a subset of the study population. Initial investigation shows that adequacy of such adjustment depends on the sizes of main and validation samples, especially when prevalences of the categorical covariates are low. Thus, we investigate the impact of main and validation sample sizes on the adjusted estimates, and provide a general guideline about these sample sizes based on simulation studies.

Keywords: measurement errors, misclassification, mismeasurement, validation sample, Bayesian adjustment

Procedia PDF Downloads 395
3593 An Application to Predict the Best Study Path for Information Technology Students in Learning Institutes

Authors: L. S. Chathurika

Abstract:

Early prediction of student performance is an important factor to be gained academic excellence. Whatever the study stream in secondary education, students lay the foundation for higher studies during the first year of their degree or diploma program in Sri Lanka. The information technology (IT) field has certain improvements in the education domain by selecting specialization areas to show the talents and skills of students. These specializations can be software engineering, network administration, database administration, multimedia design, etc. After completing the first-year, students attempt to select the best path by considering numerous factors. The purpose of this experiment is to predict the best study path using machine learning algorithms. Five classification algorithms: decision tree, support vector machine, artificial neural network, Naïve Bayes, and logistic regression are selected and tested. The support vector machine obtained the highest accuracy, 82.4%. Then affecting features are recognized to select the best study path.

Keywords: algorithm, classification, evaluation, features, testing, training

Procedia PDF Downloads 107
3592 PDDA: Priority-Based, Dynamic Data Aggregation Approach for Sensor-Based Big Data Framework

Authors: Lutful Karim, Mohammed S. Al-kahtani

Abstract:

Sensors are being used in various applications such as agriculture, health monitoring, air and water pollution monitoring, traffic monitoring and control and hence, play the vital role in the growth of big data. However, sensors collect redundant data. Thus, aggregating and filtering sensors data are significantly important to design an efficient big data framework. Current researches do not focus on aggregating and filtering data at multiple layers of sensor-based big data framework. Thus, this paper introduces (i) three layers data aggregation and framework for big data and (ii) a priority-based, dynamic data aggregation scheme (PDDA) for the lowest layer at sensors. Simulation results show that the PDDA outperforms existing tree and cluster-based data aggregation scheme in terms of overall network energy consumptions and end-to-end data transmission delay.

Keywords: big data, clustering, tree topology, data aggregation, sensor networks

Procedia PDF Downloads 320
3591 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents

Authors: M. Ouassaf, S. Belaid

Abstract:

A series of quinoline derivatives with antimalarial activity were subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were implemented using multiple regression linear MLR, a regression partial least squares (PLS), nonlinear regression (MNLR), to see which descriptors are closely related to the activity biologic. We relied on a principal component analysis (PCA). Based on our results, a comparison of the quality of, MLR, PLS, and MNLR models shows that the MNLR (R = 0.914 and R² = 0.835, RCV= 0.853) models have substantially better predictive capability because the MNLR approach gives better results than MLR (R = 0.835 and R² = 0,752, RCV=0.601)), PLS (R = 0.742 and R² = 0.552, RCV=0.550) The model of MNLR gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. The obtained results suggested that our proposed model MNLR may be useful to predict the biological activity of derivatives of quinoline.

Keywords: antimalarial, quinoline, QSAR, PCA, MLR , MNLR, MLR

Procedia PDF Downloads 136
3590 A Predictive Machine Learning Model of the Survival of Female-led and Co-Led Small and Medium Enterprises in the UK

Authors: Mais Khader, Xingjie Wei

Abstract:

This research sheds light on female entrepreneurs by providing new insights on the survival predictions of companies led by females in the UK. This study aims to build a predictive machine learning model of the survival of female-led & co-led small & medium enterprises (SMEs) in the UK over the period 2000-2020. The predictive model built utilised a combination of financial and non-financial features related to both companies and their directors to predict SMEs' survival. These features were studied in terms of their contribution to the resultant predictive model. Five machine learning models are used in the modelling: Decision tree, AdaBoost, Naïve Bayes, Logistic regression and SVM. The AdaBoost model had the highest performance of the five models, with an accuracy of 73% and an AUC of 80%. The results show high feature importance in predicting companies' survival for company size, management experience, financial performance, industry, region, and females' percentage in management.

Keywords: company survival, entrepreneurship, females, machine learning, SMEs

Procedia PDF Downloads 73
3589 Agile Software Effort Estimation Using Regression Techniques

Authors: Mikiyas Adugna

Abstract:

Effort estimation is among the activities carried out in software development processes. An accurate model of estimation leads to project success. The method of agile effort estimation is a complex task because of the dynamic nature of software development. Researchers are still conducting studies on agile effort estimation to enhance prediction accuracy. Due to these reasons, we investigated and proposed a model on LASSO and Elastic Net regression to enhance estimation accuracy. The proposed model has major components: preprocessing, train-test split, training with default parameters, and cross-validation. During the preprocessing phase, the entire dataset is normalized. After normalization, a train-test split is performed on the dataset, setting training at 80% and testing set to 20%. We chose two different phases for training the two algorithms (Elastic Net and LASSO) regression following the train-test-split. In the first phase, the two algorithms are trained using their default parameters and evaluated on the testing data. In the second phase, the grid search technique (the grid is used to search for tuning and select optimum parameters) and 5-fold cross-validation to get the final trained model. Finally, the final trained model is evaluated using the testing set. The experimental work is applied to the agile story point dataset of 21 software projects collected from six firms. The results show that both Elastic Net and LASSO regression outperformed the compared ones. Compared to the proposed algorithms, LASSO regression achieved better predictive performance and has acquired PRED (8%) and PRED (25%) results of 100.0, MMRE of 0.0491, MMER of 0.0551, MdMRE of 0.0593, MdMER of 0.063, and MSE of 0.0007. The result implies LASSO regression algorithm trained model is the most acceptable, and higher estimation performance exists in the literature.

Keywords: agile software development, effort estimation, elastic net regression, LASSO

Procedia PDF Downloads 43
3588 Robustified Asymmetric Logistic Regression Model for Global Fish Stock Assessment

Authors: Osamu Komori, Shinto Eguchi, Hiroshi Okamura, Momoko Ichinokawa

Abstract:

The long time-series data on population assessments are essential for global ecosystem assessment because the temporal change of biomass in such a database reflects the status of global ecosystem properly. However, the available assessment data usually have limited sample sizes and the ratio of populations with low abundance of biomass (collapsed) to those with high abundance (non-collapsed) is highly imbalanced. To allow for the imbalance and uncertainty involved in the ecological data, we propose a binary regression model with mixed effects for inferring ecosystem status through an asymmetric logistic model. In the estimation equation, we observe that the weights for the non-collapsed populations are relatively reduced, which in turn puts more importance on the small number of observations of collapsed populations. Moreover, we extend the asymmetric logistic regression model using propensity score to allow for the sample biases observed in the labeled and unlabeled datasets. It robustified the estimation procedure and improved the model fitting.

Keywords: double robust estimation, ecological binary data, mixed effect logistic regression model, propensity score

Procedia PDF Downloads 249
3587 PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Authors: Snezhana G. Gocheva-Ilieva, Maya P. Stoimenova

Abstract:

Ambient air pollution with fine particulate matter (PM10) is a systematic permanent problem in many countries around the world. The accumulation of a large number of measurements of both the PM10 concentrations and the accompanying atmospheric factors allow for their statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method for building and analyzing PM10 models. In the empirical study, average daily air data for the city of Pleven, Bulgaria for a period of 5 years are used. Predictors in the models are seven meteorological variables, time variables, as well as lagged PM10 variables and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series, respectively. The degree of influence of the predictors in the models is determined. The selected best CART models are used to forecast future PM10 concentrations for two days ahead after the last date in the modeling procedure and show very accurate results.

Keywords: cross-validation, decision tree, lagged variables, short-term forecasting

Procedia PDF Downloads 177
3586 Urban-Rural Inequality in Mexico after Nafta: A Quantile Regression Analysis

Authors: Rene Valdiviezo-Issa

Abstract:

In this paper, we use Mexico’s Households Income and Expenditures (ENIGH) survey to explain the behaviour that the urban-rural expenditure gap has had since Mexico’s incorporation to the North American Free Trade Agreement (NAFTA) in 1994 and we compare it with the latest available survey, which took place in 2014. We use real trimestral expenditure per capita (RTEPC) as the measure of welfare. We use quantile regressions and a quantile regression decomposition to describe the gap between urban and rural distributions of log RTEPC. We discover that the decrease in the difference between the urban and rural distributions of log RTEPC, or inequality, is motivated because of a deprivation of the urban areas, in very specific characteristics, rather than an improvement of the urban areas. When using the decomposition we observe that the gap is primarily brought about because differences in returns to covariates between the urban and rural areas.

Keywords: quantile regression, urban-rural inequality, inequality in Mexico, income decompositon

Procedia PDF Downloads 264
3585 Estimating Tree Height and Forest Classification from Multi Temporal Risat-1 HH and HV Polarized Satellite Aperture Radar Interferometric Phase Data

Authors: Saurav Kumar Suman, P. Karthigayani

Abstract:

In this paper the height of the tree is estimated and forest types is classified from the multi temporal RISAT-1 Horizontal-Horizontal (HH) and Horizontal-Vertical (HV) Polarised Satellite Aperture Radar (SAR) data. The novelty of the proposed project is combined use of the Back-scattering Coefficients (Sigma Naught) and the Coherence. It uses Water Cloud Model (WCM). The approaches use two main steps. (a) Extraction of the different forest parameter data from the Product.xml, BAND-META file and from Grid-xxx.txt file come with the HH & HV polarized data from the ISRO (Indian Space Research Centre). These file contains the required parameter during height estimation. (b) Calculation of the Vegetation and Ground Backscattering, Coherence and other Forest Parameters. (c) Classification of Forest Types using the ENVI 5.0 Tool and ROI (Region of Interest) calculation.

Keywords: RISAT-1, classification, forest, SAR data

Procedia PDF Downloads 384
3584 The Role of Agroforestry Practices in Climate Change Mitigation in Western Kenya

Authors: Humphrey Agevi, Harrison Tsingalia, Richard Onwonga, Shem Kuyah

Abstract:

Most of the world ecosystems have been affected by the effects of climate change. Efforts have been made to mitigate against climate change effects. While most studies have been done in forest ecosystems and pure plant plantations, trees on farms including agroforestry have only received attention recently. Agroforestry systems and tree cover on agricultural lands make an important contribution to climate change mitigation but are not systematically accounted for in the global carbon budgets. This study sought to: (i) determine tree diversity in different agroforestry practices; (ii) determine tree biomass in different agroforestry practices. Study area was determined according to the Land degradation surveillance framework (LSDF). Two study sites were established. At each of the site, a 5km x 10km block was established on a map using Google maps and satellite images. Way points were then uploaded in a GPS helped locate the blocks on the ground. In each of the blocks, Nine (8) sentinel clusters measuring 1km x 1km were randomized. Randomization was done in a common spreadsheet program and later be downloaded to a Global Positioning System (GPS) so that during surveys the researchers were able to navigate to the sampling points. In each of the sentinel cluster, two farm boundaries were randomly identified for convenience and to avoid bias. This led to 16 farms in Kakamega South and 16 farms in Kakamega North totalling to 32 farms in Kakamega Site. Species diversity was determined using Shannon wiener index. Tree biomass was determined using allometric equation. Two agroforestry practices were found; homegarden and hedgerow. Species diversity ranged from 0.25-2.7 with a mean of 1.8 ± 0.10. Species diversity in homegarden ranged from 1-2.7 with a mean of 1.98± 0.14. Hedgerow species diversity ranged from 0.25-2.52 with a mean of 1.74± 0.11. Total Aboveground Biomass (AGB) determined was 13.96±0.37 Mgha-1. Homegarden with the highest abundance of trees had higher above ground biomass (AGB) compared to hedgerow agroforestry. This study is timely as carbon budgets in the agroforestry can be incorporated in the global carbon budgets and improve the accuracy of national reporting of greenhouse gases.

Keywords: agroforestry, allometric equations, biomass, climate change

Procedia PDF Downloads 337
3583 Distribution of Epiphytic Lichen Biodiversity and Comparision with Their Preferred Tree Species around the Şeker Canyon, Karabük, Turkey

Authors: Hatice Esra Akgül, Celaleddin Öztürk

Abstract:

Lichen biodiversity in forests is controlled by environmental conditions. Epiphytic lichens have some degree of substrate specificity. Diversity and distribution of epiphytic lichens are affected by humidity, light, altitude, temperature, bark pH of the trees.This study describes the epiphytic lichen communities with comparing their preferred tree species. 34 epiphytic lichen taxa are reported on Pinus sp. L., Quercus sp. L., Fagus sp. L., Carpinus sp. L., Abies sp. Mill., Fraxinus sp. Tourn. ex L. from different altitudes around the Şeker Canyon (Karabük, Turkey). 11 of these taxa are growing on Quercus sp., 10 of them are growing on Fagus sp., 7 of them are growing on Pinus sp., 4 of them are on Carpinus sp., 2 of them are on Abies sp. and one of them is on Fraxinus sp. Evernia prunastri (L.) Ach. is growing on both of Fagus sp. and Quercus sp. Lecanora pulicaris (Pers.) Ach. is growing on both of Abies sp. and Quercus sp.

Keywords: biodiversity, epiphytic lichen, forest, Turkey

Procedia PDF Downloads 321
3582 Developing Variable Repetitive Group Sampling Control Chart Using Regression Estimator

Authors: Liaquat Ahmad, Muhammad Aslam, Muhammad Azam

Abstract:

In this article, we propose a control chart based on repetitive group sampling scheme for the location parameter. This charting scheme is based on the regression estimator; an estimator that capitalize the relationship between the variables of interest to provide more sensitive control than the commonly used individual variables. The control limit coefficients have been estimated for different sample sizes for less and highly correlated variables. The monitoring of the production process is constructed by adopting the procedure of the Shewhart’s x-bar control chart. Its performance is verified by the average run length calculations when the shift occurs in the average value of the estimator. It has been observed that the less correlated variables have rapid false alarm rate.

Keywords: average run length, control charts, process shift, regression estimators, repetitive group sampling

Procedia PDF Downloads 547
3581 Mosquito Repellent Finishing of Cotton Using Pepper Tree (Schinus molle) Seed Oil Extract

Authors: Granch Berhe Tseghai, Tekalgn Gebremedhin Belay, Abrehaley Hagos Gebremariam

Abstract:

Mosquito repellent textiles are one of the most growing ways to advance the textile field by providing the needed characteristics of protecting against mosquitoes, especially in the tropical areas. These types of textiles ensure the protection of human beings from the mosquitoes and the mosquito-borne disease includes malaria, filariasis and dengue fever. In this work Schinus Molle oil (pepper tree oil) was used for mosquito repellent finish as a preformatted thing. This study focused on the penetration of mosquito repellent finish in textile applications as well as nature based alternatives to commercial chemical mosquito repellents in the market. Suitable techniques and materials to achieve mosquito repellency are discussed and pointed out according to our project. In this study textile, sample was treated with binder and schinus oil. The different property has been studied for effective mosquito repellency.

Keywords: cotton, Schinus molle seed oil, mosquito repellent, mosquito-borne diseases

Procedia PDF Downloads 262
3580 Identification of Healthy and BSR-Infected Oil Palm Trees Using Color Indices

Authors: Siti Khairunniza-Bejo, Yusnida Yusoff, Nik Salwani Nik Yusoff, Idris Abu Seman, Mohamad Izzuddin Anuar

Abstract:

Most of the oil palm plantations have been threatened by Basal Stem Rot (BSR) disease which causes serious economic impact. This study was conducted to identify the healthy and BSR-infected oil palm tree using thirteen color indices. Multispectral and thermal camera was used to capture 216 images of the leaves taken from frond number 1, 9 and 17. Indices of normalized difference vegetation index (NDVI), red (R), green (G), blue (B), near infrared (NIR), green – blue (GB), green/blue (G/B), green – red (GR), green/red (G/R), hue (H), saturation (S), intensity (I) and thermal index (T) were used. From this study, it can be concluded that G index taken from frond number 9 is the best index to differentiate between the healthy and BSR-infected oil palm trees. It not only gave high value of correlation coefficient (R=-0.962), but also high value of separation between healthy and BSR-infected oil palm tree. Furthermore, power and S model developed using G index gave the highest R2 value which is 0.985.

Keywords: oil palm, image processing, disease, leaves

Procedia PDF Downloads 484
3579 Classification Using Worldview-2 Imagery of Giant Panda Habitat in Wolong, Sichuan Province, China

Authors: Yunwei Tang, Linhai Jing, Hui Li, Qingjie Liu, Xiuxia Li, Qi Yan, Haifeng Ding

Abstract:

The giant panda (Ailuropoda melanoleuca) is an endangered species, mainly live in central China, where bamboos act as the main food source of wild giant pandas. Knowledge of spatial distribution of bamboos therefore becomes important for identifying the habitat of giant pandas. There have been ongoing studies for mapping bamboos and other tree species using remote sensing. WorldView-2 (WV-2) is the first high resolution commercial satellite with eight Multi-Spectral (MS) bands. Recent studies demonstrated that WV-2 imagery has a high potential in classification of tree species. The advanced classification techniques are important for utilising high spatial resolution imagery. It is generally agreed that object-based image analysis is a more desirable method than pixel-based analysis in processing high spatial resolution remotely sensed data. Classifiers that use spatial information combined with spectral information are known as contextual classifiers. It is suggested that contextual classifiers can achieve greater accuracy than non-contextual classifiers. Thus, spatial correlation can be incorporated into classifiers to improve classification results. The study area is located at Wuyipeng area in Wolong, Sichuan Province. The complex environment makes it difficult for information extraction since bamboos are sparsely distributed, mixed with brushes, and covered by other trees. Extensive fieldworks in Wuyingpeng were carried out twice. The first one was on 11th June, 2014, aiming at sampling feature locations for geometric correction and collecting training samples for classification. The second fieldwork was on 11th September, 2014, for the purposes of testing the classification results. In this study, spectral separability analysis was first performed to select appropriate MS bands for classification. Also, the reflectance analysis provided information for expanding sample points under the circumstance of knowing only a few. Then, a spatially weighted object-based k-nearest neighbour (k-NN) classifier was applied to the selected MS bands to identify seven land cover types (bamboo, conifer, broadleaf, mixed forest, brush, bare land, and shadow), accounting for spatial correlation within classes using geostatistical modelling. The spatially weighted k-NN method was compared with three alternatives: the traditional k-NN classifier, the Support Vector Machine (SVM) method and the Classification and Regression Tree (CART). Through field validation, it was proved that the classification result obtained using the spatially weighted k-NN method has the highest overall classification accuracy (77.61%) and Kappa coefficient (0.729); the producer’s accuracy and user’s accuracy achieve 81.25% and 95.12% for the bamboo class, respectively, also higher than the other methods. Photos of tree crowns were taken at sample locations using a fisheye camera, so the canopy density could be estimated. It is found that it is difficult to identify bamboo in the areas with a large canopy density (over 0.70); it is possible to extract bamboos in the areas with a median canopy density (from 0.2 to 0.7) and in a sparse forest (canopy density is less than 0.2). In summary, this study explores the ability of WV-2 imagery for bamboo extraction in a mountainous region in Sichuan. The study successfully identified the bamboo distribution, providing supporting knowledge for assessing the habitats of giant pandas.

Keywords: bamboo mapping, classification, geostatistics, k-NN, worldview-2

Procedia PDF Downloads 297
3578 The Relationship Between Hourly Compensation and Unemployment Rate Using the Panel Data Regression Analysis

Authors: S. K. Ashiquer Rahman

Abstract:

the paper concentrations on the importance of hourly compensation, emphasizing the significance of the unemployment rate. There are the two most important factors of a nation these are its unemployment rate and hourly compensation. These are not merely statistics but they have profound effects on individual, families, and the economy. They are inversely related to one another. When we consider the unemployment rate that will probably decline as hourly compensations in manufacturing rise. But when we reduced the unemployment rates and increased job prospects could result from higher compensation. That’s why, the increased hourly compensation in the manufacturing sector that could have a favorable effect on job changing issues. Moreover, the relationship between hourly compensation and unemployment is complex and influenced by broader economic factors. In this paper, we use panel data regression models to evaluate the expected link between hourly compensation and unemployment rate in order to determine the effect of hourly compensation on unemployment rate. We estimate the fixed effects model, evaluate the error components, and determine which model (the FEM or ECM) is better by pooling all 60 observations. We then analysis and review the data by comparing 3 several countries (United States, Canada and the United Kingdom) using panel data regression models. Finally, we provide result, analysis and a summary of the extensive research on how the hourly compensation effects on the unemployment rate. Additionally, this paper offers relevant and useful informational to help the government and academic community use an econometrics and social approach to lessen on the effect of the hourly compensation on Unemployment rate to eliminate the problem.

Keywords: hourly compensation, Unemployment rate, panel data regression models, dummy variables, random effects model, fixed effects model, the linear regression model

Procedia PDF Downloads 57
3577 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling

Authors: Florin Leon, Silvia Curteanu

Abstract:

Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.

Keywords: batch bulk methyl methacrylate polymerization, adaptive sampling, machine learning, large margin nearest neighbor regression

Procedia PDF Downloads 289
3576 Mycorrhizal Autochthonous Consortium Induced Defense-Related Mechanisms of Olive Trees against Verticillium dahliae

Authors: Hanane Boutaj, Abdelilah Meddich, Said Wahbi, Zainab El Alaoui-Talibi, Allal Douira, Abdelkarim Filali-Maltouf, Cherkaoui El Modafar

Abstract:

The present work aims to investigate the effect of arbuscular mycorrhizal fungi (AMF) in improving the olive tree resistance to Verticillium wilt caused by Verticillium dahliae. Inoculated plants with a mycorrhizal autochthonous consortium 'Rhizolive consortium' and pure strain 'Glomus irregulare' were infected after three months with V. dahliae. The improving of olive tree resistance was determined through disease severity, incidence, and defoliation. On the other hand, the defense mechanisms of olive plants were evaluated through lignin content, phenylalanine ammonia lyase (PAL) activity, and polyphenol content. The results revealed that both AMF significantly (p < 0.05) reduced disease development and the rate of defoliation in infected olive plants. Moreover, the contents of lignin were boosted after mycorrhizal inoculation in both the roots and the stems of olive plants, which remained significantly (p < 0.001) higher after the 90th days of V. dahliae inoculation. PAL activity was increased after V. dahliae inoculation in the stems of 'Rhizolive consortium' treatment that were 17 times higher than those in the roots of olive plants. The polyphenol content in the stems was about twice higher than those in the roots. The reduction of disease severity was accompanied by increased levels of lignin content, PAL activity, and polyphenol content, particularly in the stems of olive plants, indicating the strengthening of the olive plant immune system against V. dahliae.

Keywords: olive tree, Mycorrhizal autochthonous consortium, Glomus irregulare, Verticillium dahliae, defense mechanisms

Procedia PDF Downloads 96
3575 Composite Kernels for Public Emotion Recognition from Twitter

Authors: Chien-Hung Chen, Yan-Chun Hsing, Yung-Chun Chang

Abstract:

The Internet has grown into a powerful medium for information dispersion and social interaction that leads to a rapid growth of social media which allows users to easily post their emotions and perspectives regarding certain topics online. Our research aims at using natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates tree kernel with the linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation to analyze the syntactic and content information in tweets. The experiment results demonstrate that our method can effectively detect public emotion of tweets while outperforming the other compared methods.

Keywords: emotion recognition, natural language processing, composite kernel, sentiment analysis, text mining

Procedia PDF Downloads 201
3574 Warfare Ships at Ancient Egypt: Since Pre-Historic Era (3700 B.C.) Uptill the End of the 2nd Intermediate Period (1550 B.C.)

Authors: Mohsen Negmeddin

Abstract:

Throughout their history, ancient Egyptians had known several kinds and types of boats, which were made from two main kinds of materials, the local one, as the dried papyrus reeds and the local tree trunks, the imported one, as the boats which were made from Lebanon cedar tree trunks. A varied using of these boats, as the fish hunting small boats, the transportation and trade boats "Cargo Boats", as well as the ceremonial boats, and the warfare boats. The research is intending for the last one, the warfare boats and the river/maritime battles since the beginning of ancient Egyptian civilization at the pre-historic era up till the end of the second intermediate period, to reveal the kinds and types of those fighting ships before establishing the Egyptian navy at the beginning of the New Kingdome (1550-1770 B.C). Two methods will follow at this research, the mention of names and titles of these ships through the texts (ancient Egyptian language) resources, and the depiction of it at the scenes.

Keywords: the warfare boats, the maritime battles, the pre-historic era, the second intermediate period

Procedia PDF Downloads 255
3573 Nature of Forest Fragmentation Owing to Human Population along Elevation Gradient in Different Countries in Hindu Kush Himalaya Mountains

Authors: Pulakesh Das, Mukunda Dev Behera, Manchiraju Sri Ramachandra Murthy

Abstract:

Large numbers of people living in and around the Hindu Kush Himalaya (HKH) region, depends on this diverse mountainous region for ecosystem services. Following the global trend, this region also experiencing rapid population growth, and demand for timber and agriculture land. The eight countries sharing the HKH region have different forest resources utilization and conservation policies that exert varying forces in the forest ecosystem. This created a variable spatial as well altitudinal gradient in rate of deforestation and corresponding forest patch fragmentation. The quantitative relationship between fragmentation and demography has not been established before for HKH vis-à-vis along elevation gradient. This current study was carried out to attribute the overall and different nature in landscape fragmentations along the altitudinal gradient with the demography of each sharing countries. We have used the tree canopy cover data derived from Landsat data to analyze the deforestation and afforestation rate, and corresponding landscape fragmentation observed during 2000 – 2010. Area-weighted mean radius of gyration (AMN radius of gyration) was computed owing to its advantage as spatial indicator of fragmentation over non-spatial fragmentation indices. Using the subtraction method, the change in fragmentation was computed during 2000 – 2010. Using the tree canopy cover data as a surrogate of forest cover, highest forest loss was observed in Myanmar followed by China, India, Bangladesh, Nepal, Pakistan, Bhutan, and Afghanistan. However, the sequence of fragmentation was different after the maximum fragmentation observed in Myanmar followed by India, China, Bangladesh, and Bhutan; whereas increase in fragmentation was seen following the sequence of as Nepal, Pakistan, and Afghanistan. Using SRTM-derived DEM, we observed higher rate of fragmentation up to 2400m that corroborated with high human population for the year 2000 and 2010. To derive the nature of fragmentation along the altitudinal gradients, the Statistica software was used, where the user defined function was utilized for regression applying the Gauss-Newton estimation method with 50 iterations. We observed overall logarithmic decrease in fragmentation change (area-weighted mean radius of gyration), forest cover loss and population growth during 2000-2010 along the elevation gradient with very high R2 values (i.e., 0.889, 0.895, 0.944 respectively). The observed negative logarithmic function with the major contribution in the initial elevation gradients suggest to gap filling afforestation in the lower altitudes to enhance the forest patch connectivity. Our finding on the pattern of forest fragmentation and human population across the elevation gradient in HKH region will have policy level implication for different nations and would help in characterizing hotspots of change. Availability of free satellite derived data products on forest cover and DEM, grid-data on demography, and utility of geospatial tools helped in quick evaluation of the forest fragmentation vis-a-vis human impact pattern along the elevation gradient in HKH.

Keywords: area-weighted mean radius of gyration, fragmentation, human impact, tree canopy cover

Procedia PDF Downloads 200
3572 Chemometric QSRR Evaluation of Behavior of s-Triazine Pesticides in Liquid Chromatography

Authors: Lidija R. Jevrić, Sanja O. Podunavac-Kuzmanović, Strahinja Z. Kovačević

Abstract:

This study considers the selection of the most suitable in silico molecular descriptors that could be used for s-triazine pesticides characterization. Suitable descriptors among topological, geometrical and physicochemical are used for quantitative structure-retention relationships (QSRR) model establishment. Established models were obtained using linear regression (LR) and multiple linear regression (MLR) analysis. In this paper, MLR models were established avoiding multicollinearity among the selected molecular descriptors. Statistical quality of established models was evaluated by standard and cross-validation statistical parameters. For detection of similarity or dissimilarity among investigated s-triazine pesticides and their classification, principal component analysis (PCA) and hierarchical cluster analysis (HCA) were used and gave similar grouping. This study is financially supported by COST action TD1305.

Keywords: chemometrics, classification analysis, molecular descriptors, pesticides, regression analysis

Procedia PDF Downloads 375
3571 Support Vector Regression Combined with Different Optimization Algorithms to Predict Global Solar Radiation on Horizontal Surfaces in Algeria

Authors: Laidi Maamar, Achwak Madani, Abdellah El Ahdj Abdellah

Abstract:

The aim of this work is to use Support Vector regression (SVR) combined with dragonfly, firefly, Bee Colony and particle swarm Optimization algorithm to predict global solar radiation on horizontal surfaces in some cities in Algeria. Combining these optimization algorithms with SVR aims principally to enhance accuracy by fine-tuning the parameters, speeding up the convergence of the SVR model, and exploring a larger search space efficiently; these parameters are the regularization parameter (C), kernel parameters, and epsilon parameter. By doing so, the aim is to improve the generalization and predictive accuracy of the SVR model. Overall, the aim is to leverage the strengths of both SVR and optimization algorithms to create a more powerful and effective regression model for various cities and under different climate conditions. Results demonstrate close agreement between predicted and measured data in terms of different metrics. In summary, SVM has proven to be a valuable tool in modeling global solar radiation, offering accurate predictions and demonstrating versatility when combined with other algorithms or used in hybrid forecasting models.

Keywords: support vector regression (SVR), optimization algorithms, global solar radiation prediction, hybrid forecasting models

Procedia PDF Downloads 14
3570 Statistic Regression and Open Data Approach for Identifying Economic Indicators That Influence e-Commerce

Authors: Apollinaire Barme, Simon Tamayo, Arthur Gaudron

Abstract:

This paper presents a statistical approach to identify explanatory variables linearly related to e-commerce sales. The proposed methodology allows specifying a regression model in order to quantify the relevance between openly available data (economic and demographic) and national e-commerce sales. The proposed methodology consists in collecting data, preselecting input variables, performing regressions for choosing variables and models, testing and validating. The usefulness of the proposed approach is twofold: on the one hand, it allows identifying the variables that influence e- commerce sales with an accessible approach. And on the other hand, it can be used to model future sales from the input variables. Results show that e-commerce is linearly dependent on 11 economic and demographic indicators.

Keywords: e-commerce, statistical modeling, regression, empirical research

Procedia PDF Downloads 208
3569 Characteristics of Tremella fuciformis and Annulohypoxylon stygium for Optimal Cultivation Conditions

Authors: Eun-Ji Lee, Hye-Sung Park, Chan-Jung Lee, Won-Sik Kong

Abstract:

We analyzed the DNA sequence of the ITS (Internal Transcribed Spacer) region of the 18S ribosomal gene and compared it with the gene sequence of T. fuciformis and Hypoxylon sp. in the BLAST database. The sequences of collected T. fuciformis and Hypoxylon sp. have over 99% homology in the T. fuciformis and Hypoxylon sp. sequence BLAST database. In order to select the optimal medium for T. fuciformis, five kinds of a medium such as Potato Dextrose Agar (PDA), Mushroom Complete Medium (MCM), Malt Extract Agar (MEA), Yeast extract (YM), and Compost Extract Dextrose Agar (CDA) were used. T. fuciformis showed the best growth on PDA medium, and Hypoxylon sp. showed the best growth on MCM. So as to investigate the optimum pH and temperature, the pH range was set to pH4 to pH8 and the temperature range was set to 15℃ to 35℃ (5℃ degree intervals). Optimum culture conditions for the T. fuciformis growth were pH5 at 25℃. Hypoxylon sp. were pH6 at 25°C. In order to confirm the most suitable carbon source, we used fructose, galactose, saccharose, soluble starch, inositol, glycerol, xylose, dextrose, lactose, dextrin, Na-CMC, adonitol. Mannitol, mannose, maltose, raffinose, cellobiose, ethanol, salicine, glucose, arabinose. In the optimum carbon source, T. fuciformis is xylose and Hypoxylon sp. is arabinose. Using the column test, we confirmed sawdust a suitable for T. fuciformis, since the composition of sawdust affects the growth of fruiting bodies of T. fuciformis. The sawdust we used is oak tree, pine tree, poplar, birch, cottonseed meal, cottonseed hull. In artificial cultivation of T. fuciformis with sawdust medium, T. fuciformis and Hypoxylon sp. showed fast mycelial growth on mixture of oak tree sawdust, cottonseed hull, and wheat bran.

Keywords: cultivation, optimal condition, tremella fuciformis, nutritional source

Procedia PDF Downloads 187
3568 Expert Supporting System for Diagnosing Lymphoid Neoplasms Using Probabilistic Decision Tree Algorithm and Immunohistochemistry Profile Database

Authors: Yosep Chong, Yejin Kim, Jingyun Choi, Hwanjo Yu, Eun Jung Lee, Chang Suk Kang

Abstract:

For the past decades, immunohistochemistry (IHC) has been playing an important role in the diagnosis of human neoplasms, by helping pathologists to make a clearer decision on differential diagnosis, subtyping, personalized treatment plan, and finally prognosis prediction. However, the IHC performed in various tumors of daily practice often shows conflicting and very challenging results to interpret. Even comprehensive diagnosis synthesizing clinical, histologic and immunohistochemical findings can be helpless in some twisted cases. Another important issue is that the IHC data is increasing exponentially and more and more information have to be taken into account. For this reason, we reached an idea to develop an expert supporting system to help pathologists to make a better decision in diagnosing human neoplasms with IHC results. We gave probabilistic decision tree algorithm and tested the algorithm with real case data of lymphoid neoplasms, in which the IHC profile is more important to make a proper diagnosis than other human neoplasms. We designed probabilistic decision tree based on Bayesian theorem, program computational process using MATLAB (The MathWorks, Inc., USA) and prepared IHC profile database (about 104 disease category and 88 IHC antibodies) based on WHO classification by reviewing the literature. The initial probability of each neoplasm was set with the epidemiologic data of lymphoid neoplasm in Korea. With the IHC results of 131 patients sequentially selected, top three presumptive diagnoses for each case were made and compared with the original diagnoses. After the review of the data, 124 out of 131 were used for final analysis. As a result, the presumptive diagnoses were concordant with the original diagnoses in 118 cases (93.7%). The major reason of discordant cases was that the similarity of the IHC profile between two or three different neoplasms. The expert supporting system algorithm presented in this study is in its elementary stage and need more optimization using more advanced technology such as deep-learning with data of real cases, especially in differentiating T-cell lymphomas. Although it needs more refinement, it may be used to aid pathological decision making in future. A further application to determine IHC antibodies for a certain subset of differential diagnoses might be possible in near future.

Keywords: database, expert supporting system, immunohistochemistry, probabilistic decision tree

Procedia PDF Downloads 210
3567 Using Data Mining Techniques to Evaluate the Different Factors Affecting the Academic Performance of Students at the Faculty of Information Technology in Hashemite University in Jordan

Authors: Feras Hanandeh, Majdi Shannag

Abstract:

This research studies the different factors that could affect the Faculty of Information Technology in Hashemite University students’ accumulative average. The research paper verifies the student information, background, their academic records, and how this information will affect the student to get high grades. The student information used in the study is extracted from the student’s academic records. The data mining tools and techniques are used to decide which attribute(s) will affect the student’s accumulative average. The results show that the most important factor which affects the students’ accumulative average is the student Acceptance Type. And we built a decision tree model and rules to determine how the student can get high grades in their courses. The overall accuracy of the model is 44% which is accepted rate.

Keywords: data mining, classification, extracting rules, decision tree

Procedia PDF Downloads 399