Search results for: tree ensemble
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 446

Search results for: tree ensemble

446 Ensemble Learning with Decision Tree for Remote Sensing Classification

Authors: Mahesh Pal

Abstract:

In recent years, a number of works proposing the combination of multiple classifiers to produce a single classification have been reported in remote sensing literature. The resulting classifier, referred to as an ensemble classifier, is generally found to be more accurate than any of the individual classifiers making up the ensemble. As accuracy is the primary concern, much of the research in the field of land cover classification is focused on improving classification accuracy. This study compares the performance of four ensemble approaches (boosting, bagging, DECORATE and random subspace) with a univariate decision tree as base classifier. Two training datasets, one without ant noise and other with 20 percent noise was used to judge the performance of different ensemble approaches. Results with noise free data set suggest an improvement of about 4% in classification accuracy with all ensemble approaches in comparison to the results provided by univariate decision tree classifier. Highest classification accuracy of 87.43% was achieved by boosted decision tree. A comparison of results with noisy data set suggests that bagging, DECORATE and random subspace approaches works well with this data whereas the performance of boosted decision tree degrades and a classification accuracy of 79.7% is achieved which is even lower than that is achieved (i.e. 80.02%) by using unboosted decision tree classifier.

Keywords: Ensemble learning, decision tree, remote sensingclassification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2583
445 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has recorded substantial considerations. Techniques of data mining in one way or the other have been proposed to dig out out-of-sight knowledge in educational data. The result of the study got assists academic institutions in further enhancing their process of learning and methods of passing knowledge to students. Consequently, the performance of students boasts and the educational products are by no doubt enhanced. This study adopted a student performance prediction model premised on techniques of data mining with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student's predictive model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduce Error Pruning Tree (REP). Consequently, ensemble methods of Bagging, Boosting, and Random Forest (RF) are applied to improve the performance of these single classifiers. The study reveals that the result shows a robust affinity between learners' behaviors and their academic attainment. Result from the study shows that the REP Tree and its ensemble record the highest accuracy of 83.33% using SEF. Hence, in terms of the Receiver Operating Curve (ROC), boosting method of REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 760
444 Analysis of Diverse Cluster Ensemble Techniques

Authors: S. Sarumathi, N. Shanthi, P. Ranjetha

Abstract:

Data mining is the procedure of determining interesting patterns from the huge amount of data. With the intention of accessing the data faster the most supporting processes needed is clustering. Clustering is the process of identifying similarity between data according to the individuality present in the data and grouping associated data objects into clusters. Cluster ensemble is the technique to combine various runs of different clustering algorithms to obtain a general partition of the original dataset, aiming for consolidation of outcomes from a collection of individual clustering outcomes. The performances of clustering ensembles are mainly affecting by two principal factors such as diversity and quality. This paper presents the overview about the different cluster ensemble algorithm along with their methods used in cluster ensemble to improve the diversity and quality in the several cluster ensemble related papers and shows the comparative analysis of different cluster ensemble also summarize various cluster ensemble methods. Henceforth this clear analysis will be very useful for the world of clustering experts and also helps in deciding the most appropriate one to determine the problem in hand.

Keywords: Cluster Ensemble, Consensus Function, CSPA, Diversity, HGPA, MCLA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1841
443 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.

Keywords: Classifier ensemble, breast cancer survivability, data mining, SEER.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1671
442 A Comparative Analysis of Machine Learning Techniques for PM10 Forecasting in Vilnius

Authors: M. A. S. Fahim, J. Sužiedelytė Visockienė

Abstract:

With the growing concern over air pollution (AP), it is clear that this has gained more prominence than ever before. The level of consciousness has increased and a sense of knowledge now has to be forwarded as a duty by those enlightened enough to disseminate it to others. This realization often comes after an understanding of how poor air quality indices (AQI) damage human health. The study focuses on assessing air pollution prediction models specifically for Lithuania, addressing a substantial need for empirical research within the region. Concentrating on Vilnius, it specifically examines particulate matter concentrations 10 micrometers or less in diameter (PM10). Utilizing Gaussian Process Regression (GPR) and Regression Tree Ensemble, and Regression Tree methodologies, predictive forecasting models are validated and tested using hourly data from January 2020 to December 2022. The study explores the classification of AP data into anthropogenic and natural sources, the impact of AP on human health, and its connection to cardiovascular diseases. The study revealed varying levels of accuracy among the models, with GPR achieving the highest accuracy, indicated by an RMSE of 4.14 in validation and 3.89 in testing.

Keywords: Air pollution, anthropogenic and natural sources, machine learning, Gaussian process regression, tree ensemble, forecasting models, particulate matter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 117
441 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection

Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi

Abstract:

It is obvious in this present time, internet has become an indispensable part of human life since its inception. The Internet has provided diverse opportunities to make life so easy for human beings, through the adoption of various channels. Among these channels are email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication hugely accepted among individuals and organizations globally. But over decades the security integrity of this platform has been challenged with malicious activities like Phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a lot of financial damage to email users globally which has resulted in bankruptcy, sudden death of victims, and other health-related sicknesses. Although many methods have been proposed to detect email phishing, in this research, the results of multiple machine-learning methods for predicting email phishing have been compared with the use of filter-wrapper feature selection. It is worth noting that all three models performed substantially but one outperformed the other. The dataset used for these models is obtained from Kaggle online data repository, while three classifiers: decision tree, Naïve Bayes, and Logistic regression are ensemble (Bagging) respectively. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, hybrid, filter-wrapper, phishing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 178
440 Application of Machine Learning Methods to Online Test Error Detection in Semiconductor Test

Authors: Matthias Kirmse, Uwe Petersohn, Elief Paffrath

Abstract:

As in today's semiconductor industries test costs can make up to 50 percent of the total production costs, an efficient test error detection becomes more and more important. In this paper, we present a new machine learning approach to test error detection that should provide a faster recognition of test system faults as well as an improved test error recall. The key idea is to learn a classifier ensemble, detecting typical test error patterns in wafer test results immediately after finishing these tests. Since test error detection has not yet been discussed in the machine learning community, we define central problem-relevant terms and provide an analysis of important domain properties. Finally, we present comparative studies reflecting the failure detection performance of three individual classifiers and three ensemble methods based upon them. As base classifiers we chose a decision tree learner, a support vector machine and a Bayesian network, while the compared ensemble methods were simple and weighted majority vote as well as stacking. For the evaluation, we used cross validation and a specially designed practical simulation. By implementing our approach in a semiconductor test department for the observation of two products, we proofed its practical applicability.

Keywords: Ensemble methods, fault detection, machine learning, semiconductor test.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2273
439 Evaluation of Ensemble Classifiers for Intrusion Detection

Authors: M. Govindarajan

Abstract:

One of the major developments in machine learning in the past decade is the ensemble method, which finds highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed with homogeneous ensemble classifier using bagging and heterogeneous ensemble classifier using arcing and their performances are analyzed in terms of accuracy. A Classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by the means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: preprocessing phase, classification phase, and combining phase. A wide range of comparative experiments is conducted for standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers are compared to the performance of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include Error correcting output codes, Dagging and heterogeneous ensemble methods include majority voting, stacking. The proposed ensemble methods provide significant improvement of accuracy compared to individual classifiers and the proposed bagged RBF and SVM performs significantly better than ECOC and Dagging and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also heterogeneous models exhibit better results than homogeneous models for standard datasets of intrusion detection. 

Keywords: Data mining, ensemble, radial basis function, support vector machine, accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1700
438 Spanning Tree Transformation of Connected Graphs into Single-Row Networks

Authors: S.L. Loh, S. Salleh, N.H. Sarmin

Abstract:

A spanning tree of a connected graph is a tree which consists the set of vertices and some or perhaps all of the edges from the connected graph. In this paper, a model for spanning tree transformation of connected graphs into single-row networks, namely Spanning Tree of Connected Graph Modeling (STCGM) will be introduced. Path-Growing Tree-Forming algorithm applied with Vertex-Prioritized is contained in the model to produce the spanning tree from the connected graph. Paths are produced by Path-Growing and they are combined into a spanning tree by Tree-Forming. The spanning tree that is produced from the connected graph is then transformed into single-row network using Tree Sequence Modeling (TSM). Finally, the single-row routing problem is solved using a method called Enhanced Simulated Annealing for Single-Row Routing (ESSR).

Keywords: Graph theory, simulated annealing, single-rowrouting and spanning tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1735
437 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.

Keywords: Product recommender system, Ensemble technique, Association rules, Decision tree, Artificial neural networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4221
436 Judges System for Classifiers Specialization

Authors: Abdel Rodríguez, Isis Bonet, Ricardo Grau, María M. García

Abstract:

In this paper we designed and implemented a new ensemble of classifiers based on a sequence of classifiers which were specialized in regions of the training dataset where errors of its trained homologous are concentrated. In order to separate this regions, and to determine the aptitude of each classifier to properly respond to a new case, it was used another set of classifiers built hierarchically. We explored a selection based variant to combine the base classifiers. We validated this model with different base classifiers using 37 training datasets. It was carried out a statistical comparison of these models with the well known Bagging and Boosting, obtaining significantly superior results with the hierarchical ensemble using Multilayer Perceptron as base classifier. Therefore, we demonstrated the efficacy of the proposed ensemble, as well as its applicability to general problems.

Keywords: classifiers, delegation, ensemble

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1304
435 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1553
434 Neural Network Ensemble-based Solar Power Generation Short-Term Forecasting

Authors: A. Chaouachi, R.M. Kamel, R. Ichikawa, H. Hayashi, K. Nagasaka

Abstract:

This paper presents the applicability of artificial neural networks for 24 hour ahead solar power generation forecasting of a 20 kW photovoltaic system, the developed forecasting is suitable for a reliable Microgrid energy management. In total four neural networks were proposed, namely: multi-layred perceptron, radial basis function, recurrent and a neural network ensemble consisting in ensemble of bagged networks. Forecasting reliability of the proposed neural networks was carried out in terms forecasting error performance basing on statistical and graphical methods. The experimental results showed that all the proposed networks achieved an acceptable forecasting accuracy. In term of comparison the neural network ensemble gives the highest precision forecasting comparing to the conventional networks. In fact, each network of the ensemble over-fits to some extent and leads to a diversity which enhances the noise tolerance and the forecasting generalization performance comparing to the conventional networks.

Keywords: Neural network ensemble, Solar power generation, 24 hour forecasting, Comparative study

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3275
433 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Keywords: Consensus, curse of correlation, imbalanced classification, rank-based chain-mode ensemble.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 734
432 Meta Random Forests

Authors: Praveen Boinee, Alessandro De Angelis, Gian Luca Foresti

Abstract:

Leo Breimans Random Forests (RF) is a recent development in tree based classifiers and quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved results of classifications on standard data sets. Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets with mainly decision trees as their base classifiers. In this paper we experiment to apply these Meta learning techniques to the random forests. We experiment the working of the ensembles of random forests on the standard data sets available in UCI data sets. We compare the original random forest algorithm with their ensemble counterparts and discuss the results.

Keywords: Random Forests [RF], ensembles, UCI.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2709
431 A Review: Comparative Analysis of Different Categorical Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Over the past epoch a rampant amount of work has been done in the data clustering research under the unsupervised learning technique in Data mining. Furthermore several algorithms and methods have been proposed focusing on clustering different data types, representation of cluster models, and accuracy rates of the clusters. However no single clustering algorithm proves to be the most efficient in providing best results. Accordingly in order to find the solution to this issue a new technique, called Cluster ensemble method was bloomed. This cluster ensemble is a good alternative approach for facing the cluster analysis problem. The main hope of the cluster ensemble is to merge different clustering solutions in such a way to achieve accuracy and to improve the quality of individual data clustering. Due to the substantial and unremitting development of new methods in the sphere of data mining and also the incessant interest in inventing new algorithms, makes obligatory to scrutinize a critical analysis of the existing techniques and the future novelty. This paper exposes the comparative study of different cluster ensemble methods along with their features, systematic working process and the average accuracy and error rates of each ensemble methods. Consequently this speculative and comprehensive analysis will be very useful for the community of clustering practitioners and also helps in deciding the most suitable one to rectify the problem in hand.

Keywords: Clustering, Cluster Ensemble methods, Co-association matrix, Consensus function, Median partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2603
430 Construction Of Decentralized Lifetime Maximizing Tree for Data Aggregation in Wireless Sensor Networks

Authors: Deepali Virmani , Satbir Jain

Abstract:

To meet the demands of wireless sensor networks (WSNs) where data are usually aggregated at a single source prior to transmitting to any distant user, there is a need to establish a tree structure inside any given event region. In this paper , a novel technique to create one such tree is proposed .This tree preserves the energy and maximizes the lifetime of event sources while they are constantly transmitting for data aggregation. The term Decentralized Lifetime Maximizing Tree (DLMT) is used to denote this tree. DLMT features in nodes with higher energy tend to be chosen as data aggregating parents so that the time to detect the first broken tree link can be extended and less energy is involved in tree maintenance. By constructing the tree in such a way, the protocol is able to reduce the frequency of tree reconstruction, minimize the amount of data loss ,minimize the delay during data collection and preserves the energy.

Keywords: branch energy, decentralized, energy level , lifetime, tree energy, wireless sensor networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1487
429 An Experimental Study of a Self-Supervised Classifier Ensemble

Authors: Neamat El Gayar

Abstract:

Learning using labeled and unlabelled data has received considerable amount of attention in the machine learning community due its potential in reducing the need for expensive labeled data. In this work we present a new method for combining labeled and unlabeled data based on classifier ensembles. The model we propose assumes each classifier in the ensemble observes the input using different set of features. Classifiers are initially trained using some labeled samples. The trained classifiers learn further through labeling the unknown patterns using a teaching signals that is generated using the decision of the classifier ensemble, i.e. the classifiers self-supervise each other. Experiments on a set of object images are presented. Our experiments investigate different classifier models, different fusing techniques, different training sizes and different input features. Experimental results reveal that the proposed self-supervised ensemble learning approach reduces classification error over the single classifier and the traditional ensemble classifier approachs.

Keywords: Multiple Classifier Systems, classifier ensembles, learning using labeled and unlabelled data, K-nearest neighbor classifier, Bayes classifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1643
428 Enhancing Predictive Accuracy in Pharmaceutical Sales Through an Ensemble Kernel Gaussian Process Regression Approach

Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf

Abstract:

This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matérn, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matérn, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.

Keywords: Gaussian Process Regression, Ensemble Kernels, Bayesian Optimization, Pharmaceutical Sales Analysis, Time Series Forecasting, Data Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 111
427 Combining Bagging and Additive Regression

Authors: Sotiris B. Kotsiantis

Abstract:

Bagging and boosting are among the most popular re-sampling ensemble methods that generate and combine a diversity of regression models using the same learning algorithm as base-learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in this work we built an ensemble using an averaging methodology of bagging and boosting ensembles with 10 sub-learners in each one. We performed a comparison with simple bagging and boosting ensembles with 25 sub-learners on standard benchmark datasets and the proposed ensemble gave better accuracy.

Keywords: Regressors, statistical learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1639
426 An ensemble of Weighted Support Vector Machines for Ordinal Regression

Authors: Willem Waegeman, Luc Boullart

Abstract:

Instead of traditional (nominal) classification we investigate the subject of ordinal classification or ranking. An enhanced method based on an ensemble of Support Vector Machines (SVM-s) is proposed. Each binary classifier is trained with specific weights for each object in the training data set. Experiments on benchmark datasets and synthetic data indicate that the performance of our approach is comparable to state of the art kernel methods for ordinal regression. The ensemble method, which is straightforward to implement, provides a very good sensitivity-specificity trade-off for the highest and lowest rank.

Keywords: Ordinal regression, support vector machines, ensemblelearning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1641
425 Balancing of Quad Tree using Point Pattern Analysis

Authors: Amitava Chakraborty, Sudip Kumar De, Ranjan Dasgupta

Abstract:

Point quad tree is considered as one of the most common data organizations to deal with spatial data & can be used to increase the efficiency for searching the point features. As the efficiency of the searching technique depends on the height of the tree, arbitrary insertion of the point features may make the tree unbalanced and lead to higher time of searching. This paper attempts to design an algorithm to make a nearly balanced quad tree. Point pattern analysis technique has been applied for this purpose which shows a significant enhancement of the performance and the results are also included in the paper for the sake of completeness.

Keywords: Algorithm, Height balanced tree, Point patternanalysis, Point quad tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2698
424 Developing of Thai Classical Music Ensemble in Rattanakosin Period

Authors: Pansak Vandee

Abstract:

The research titled “Developing of Thai Classical Music Ensemble in Rattanakosin Period" aimed 1) to study the history of Thai Classical Music Ensemble in Rattanakosin Period and 2) to analyze changing in each period of Rattanakosin Era. This is the historical and documentary research. The data was collected by in-depth interview those musicians, and academic music experts and field study. The focus group discussion was conducted to analyze and conclude the findings. The research found that the history of Thai Classical Music Ensemble in Rattanakosin Period derived from the Ayutthaya period. Thai classical music ensemble consisted of “Wong Pipat", “Wong Mahori", “Wong Kreang Sai". “Wong Kubmai", “Wong Krongkak", “Brass Band", and “Kan Band" which were used to ceremony, ritual, drama, performs and entertainment. Changed of the Thai music in the early Rattanakosin Period were passed from the Ayutthaya Period and the influence of the western civilization. New Band formed in Thai Music were “Orchestra" and “Contemporary Band". The role of Thai music was changed from the ceremonial rituals to entertainment. Development of the Thai music during the reign of King Rama 1 to King Rama 7, was improved from the court. But after the revolution, the musical patronage of the court was maintained by the Government. Thai Classical Music Ensemble were performed to be standard pattern.

Keywords: Development, Rattanakosin Period, Thai Classical Music Ensemble.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3006
423 The Application of an Ensemble of Boosted Elman Networks to Time Series Prediction: A Benchmark Study

Authors: Chee Peng Lim, Wei Yee Goh

Abstract:

In this paper, the application of multiple Elman neural networks to time series data regression problems is studied. An ensemble of Elman networks is formed by boosting to enhance the performance of the individual networks. A modified version of the AdaBoost algorithm is employed to integrate the predictions from multiple networks. Two benchmark time series data sets, i.e., the Sunspot and Box-Jenkins gas furnace problems, are used to assess the effectiveness of the proposed system. The simulation results reveal that an ensemble of boosted Elman networks can achieve a higher degree of generalization as well as performance than that of the individual networks. The results are compared with those from other learning systems, and implications of the performance are discussed.

Keywords: AdaBoost, Elman network, neural network ensemble, time series regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1689
422 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Authors: Rodolfo Lorbieski, Silvia Modesto Nassar

Abstract:

Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.

Keywords: Stacking, multi-layers, ensemble, multi-class.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1093
421 Clinical Decision Support for Disease Classification based on the Tests Association

Authors: Sung Ho Ha, Seong Hyeon Joo, Eun Kyung Kwon

Abstract:

Until recently, researchers have developed various tools and methodologies for effective clinical decision-making. Among those decisions, chest pain diseases have been one of important diagnostic issues especially in an emergency department. To improve the ability of physicians in diagnosis, many researchers have developed diagnosis intelligence by using machine learning and data mining. However, most of the conventional methodologies have been generally based on a single classifier for disease classification and prediction, which shows moderate performance. This study utilizes an ensemble strategy to combine multiple different classifiers to help physicians diagnose chest pain diseases more accurately than ever. Specifically the ensemble strategy is applied by using the integration of decision trees, neural networks, and support vector machines. The ensemble models are applied to real-world emergency data. This study shows that the performance of the ensemble models is superior to each of single classifiers.

Keywords: Diagnosis intelligence, ensemble approach, data mining, emergency department

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1634
420 Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems

Authors: Andrey V. Timofeev

Abstract:

This paper introduces an original method for guaranteed estimation of the accuracy for an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smaller the cardinality of the discrete set of hypothetical classes is, the higher is the classification accuracy. Experiments have shown that if cardinality of the classifiers ensemble is increased then the cardinality of this set of hypothetical classes is reduced. The problem of the guaranteed estimation of the accuracy for an ensemble of Lipschitz classifiers is relevant in multichannel classification of target events in C-OTDR monitoring systems. Results of suggested approach practical usage to accuracy control in C-OTDR monitoring systems are present.

Keywords: Lipschitz classifiers, confidence set, C-OTDR monitoring, classifiers accuracy, classifiers ensemble.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1953
419 An Attribute-Centre Based Decision Tree Classification Algorithm

Authors: Gökhan Silahtaroğlu

Abstract:

Decision tree algorithms have very important place at classification model of data mining. In literature, algorithms use entropy concept or gini index to form the tree. The shape of the classes and their closeness to each other some of the factors that affect the performance of the algorithm. In this paper we introduce a new decision tree algorithm which employs data (attribute) folding method and variation of the class variables over the branches to be created. A comparative performance analysis has been held between the proposed algorithm and C4.5.

Keywords: Classification, decision tree, split, pruning, entropy, gini.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1368
418 A Comprehensive Review on Different Mixed Data Clustering Ensemble Methods

Authors: S. Sarumathi, N. Shanthi, S. Vidhya, M. Sharmila

Abstract:

An extensive amount of work has been done in data clustering research under the unsupervised learning technique in Data Mining during the past two decades. Moreover, several approaches and methods have been emerged focusing on clustering diverse data types, features of cluster models and similarity rates of clusters. However, none of the single clustering algorithm exemplifies its best nature in extracting efficient clusters. Consequently, in order to rectify this issue, a new challenging technique called Cluster Ensemble method was bloomed. This new approach tends to be the alternative method for the cluster analysis problem. The main objective of the Cluster Ensemble is to aggregate the diverse clustering solutions in such a way to attain accuracy and also to improve the eminence the individual clustering algorithms. Due to the massive and rapid development of new methods in the globe of data mining, it is highly mandatory to scrutinize a vital analysis of existing techniques and the future novelty. This paper shows the comparative analysis of different cluster ensemble methods along with their methodologies and salient features. Henceforth this unambiguous analysis will be very useful for the society of clustering experts and also helps in deciding the most appropriate one to resolve the problem in hand.

Keywords: Clustering, Cluster Ensemble Methods, Coassociation matrix, Consensus Function, Median Partition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2104
417 Improving Fault Resilience and Reconstruction of Overlay Multicast Tree Using Leaving Time of Participants

Authors: Bhed Bahadur Bista

Abstract:

Network layer multicast, i.e. IP multicast, even after many years of research, development and standardization, is not deployed in large scale due to both technical (e.g. upgrading of routers) and political (e.g. policy making and negotiation) issues. Researchers looked for alternatives and proposed application/overlay multicast where multicast functions are handled by end hosts, not network layer routers. Member hosts wishing to receive multicast data form a multicast delivery tree. The intermediate hosts in the tree act as routers also, i.e. they forward data to the lower hosts in the tree. Unlike IP multicast, where a router cannot leave the tree until all members below it leave, in overlay multicast any member can leave the tree at any time thus disjoining the tree and disrupting the data dissemination. All the disrupted hosts have to rejoin the tree. This characteristic of the overlay multicast causes multicast tree unstable, data loss and rejoin overhead. In this paper, we propose that each node sets its leaving time from the tree and sends join request to a number of nodes in the tree. The nodes in the tree will reject the request if their leaving time is earlier than the requesting node otherwise they will accept the request. The node can join at one of the accepting nodes. This makes the tree more stable as the nodes will join the tree according to their leaving time, earliest leaving time node being at the leaf of the tree. Some intermediate nodes may not follow their leaving time and leave earlier than their leaving time thus disrupting the tree. For this, we propose a proactive recovery mechanism so that disrupted nodes can rejoin the tree at predetermined nodes immediately. We have shown by simulation that there is less overhead when joining the multicast tree and the recovery time of the disrupted nodes is much less than the previous works. Keywords

Keywords: Network layer multicast, Fault Resilience, IP multicast

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1387