Search results for: rank-based chain-mode ensemble
191 Evaluation of Ensemble Classifiers for Intrusion Detection
Authors: M. Govindarajan
Abstract:
One of the major developments in machine learning in the past decade is the ensemble method, which finds highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed with homogeneous ensemble classifier using bagging and heterogeneous ensemble classifier using arcing and their performances are analyzed in terms of accuracy. A Classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by the means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: preprocessing phase, classification phase, and combining phase. A wide range of comparative experiments is conducted for standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers are compared to the performance of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include Error correcting output codes, Dagging and heterogeneous ensemble methods include majority voting, stacking. The proposed ensemble methods provide significant improvement of accuracy compared to individual classifiers and the proposed bagged RBF and SVM performs significantly better than ECOC and Dagging and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also heterogeneous models exhibit better results than homogeneous models for standard datasets of intrusion detection.Keywords: data mining, ensemble, radial basis function, support vector machine, accuracy
Procedia PDF Downloads 249190 Random Subspace Ensemble of CMAC Classifiers
Authors: Somaiyeh Dehghan, Mohammad Reza Kheirkhahan Haghighi
Abstract:
The rapid growth of domains that have data with a large number of features, while the number of samples is limited has caused difficulty in constructing strong classifiers. To reduce the dimensionality of the feature space becomes an essential step in classification task. Random subspace method (or attribute bagging) is an ensemble classifier that consists of several classifiers that each base learner in ensemble has subset of features. In the present paper, we introduce Random Subspace Ensemble of CMAC neural network (RSE-CMAC), each of which has training with subset of features. Then we use this model for classification task. For evaluation performance of our model, we compare it with bagging algorithm on 36 UCI datasets. The results reveal that the new model has better performance.Keywords: classification, random subspace, ensemble, CMAC neural network
Procedia PDF Downloads 332189 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis
Authors: C. B. Le, V. N. Pham
Abstract:
In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering
Procedia PDF Downloads 191188 Rank-Based Chain-Mode Ensemble for Binary Classification
Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu
Abstract:
In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.Keywords: consensus, curse of correlation, imbalance classification, rank-based chain-mode ensemble
Procedia PDF Downloads 138187 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data
Authors: Muthukumarasamy Govindarajan
Abstract:
Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine
Procedia PDF Downloads 143186 Extreme Temperature Response to Solar Radiation Management in Southeast Asia
Authors: Heri Kuswanto, Brina Miftahurrohmah, Fatkhurokhman Fauzi
Abstract:
Southeast Asia has experienced rising temperatures and is predicted to reach a 1.5°C increase by 2030, which is earlier than the Paris Agreement target. Solar Radiation Management (SRM) has been proposed as an alternative to combat global warming. This research investigates changes in the annual maximum temperature (TXx) with and without SRM over southeast Asia. We examined outputs from three ensemble members of the Geoengineering Large Ensemble Project (GLENS) experiment for the period 2051 to 2080. One ensemble member generated outputs that significantly deviated from the others, leading to the removal of ensemble 3 from the impact analysis. Our observations indicate that the magnitude of TXx changes with SRM is heterogeneous across countries. We found that SRM significantly reduces TXx levels compared to historical periods. Furthermore, SRM can reduce temperatures by up to 5°C compared to scenarios without SRM, with even more pronounced effects in Thailand, Cambodia, Laos, and Myanmar. This indicates that SRM can mitigate climate change by lowering future TXx levels.Keywords: solar radiation management, GLENS, extreme, temperature, ensemble
Procedia PDF Downloads 18185 Enhancing Predictive Accuracy in Pharmaceutical Sales through an Ensemble Kernel Gaussian Process Regression Approach
Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf
Abstract:
This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matern, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matern, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.Keywords: Gaussian process regression, ensemble kernels, bayesian optimization, pharmaceutical sales analysis, time series forecasting, data analysis
Procedia PDF Downloads 71184 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble
Authors: Jaehong Yu, Seoung Bum Kim
Abstract:
Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking
Procedia PDF Downloads 339183 Application of Bayesian Model Averaging and Geostatistical Output Perturbation to Generate Calibrated Ensemble Weather Forecast
Authors: Muhammad Luthfi, Sutikno Sutikno, Purhadi Purhadi
Abstract:
Weather forecast has necessarily been improved to provide the communities an accurate and objective prediction as well. To overcome such issue, the numerical-based weather forecast was extensively developed to reduce the subjectivity of forecast. Yet the Numerical Weather Predictions (NWPs) outputs are unfortunately issued without taking dynamical weather behavior and local terrain features into account. Thus, NWPs outputs are not able to accurately forecast the weather quantities, particularly for medium and long range forecast. The aim of this research is to aid and extend the development of ensemble forecast for Meteorology, Climatology, and Geophysics Agency of Indonesia. Ensemble method is an approach combining various deterministic forecast to produce more reliable one. However, such forecast is biased and uncalibrated due to its underdispersive or overdispersive nature. As one of the parametric methods, Bayesian Model Averaging (BMA) generates the calibrated ensemble forecast and constructs predictive PDF for specified period. Such method is able to utilize ensemble of any size but does not take spatial correlation into account. Whereas space dependencies involve the site of interest and nearby site, influenced by dynamic weather behavior. Meanwhile, Geostatistical Output Perturbation (GOP) reckons the spatial correlation to generate future weather quantities, though merely built by a single deterministic forecast, and is able to generate an ensemble of any size as well. This research conducts both BMA and GOP to generate the calibrated ensemble forecast for the daily temperature at few meteorological sites nearby Indonesia international airport.Keywords: Bayesian Model Averaging, ensemble forecast, geostatistical output perturbation, numerical weather prediction, temperature
Procedia PDF Downloads 282182 Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems
Authors: Andrey V. Timofeev
Abstract:
This paper introduces an original method for guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with a probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smaller the cardinality of the discrete set of hypothetical classes is, the higher is the classification accuracy. Experiments have shown that if the cardinality of the classifiers ensemble is increased then the cardinality of this set of hypothetical classes is reduced. The problem of the guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers is relevant in the multichannel classification of target events in C-OTDR monitoring systems. Results of suggested approach practical usage to accuracy control in C-OTDR monitoring systems are present.Keywords: Lipschitz classifiers, confidence set, C-OTDR monitoring, classifiers accuracy, classifiers ensemble
Procedia PDF Downloads 493181 Simulation of Optimal Runoff Hydrograph Using Ensemble of Radar Rainfall and Blending of Runoffs Model
Authors: Myungjin Lee, Daegun Han, Jongsung Kim, Soojun Kim, Hung Soo Kim
Abstract:
Recently, the localized heavy rainfall and typhoons are frequently occurred due to the climate change and the damage is becoming bigger. Therefore, we may need a more accurate prediction of the rainfall and runoff. However, the gauge rainfall has the limited accuracy in space. Radar rainfall is better than gauge rainfall for the explanation of the spatial variability of rainfall but it is mostly underestimated with the uncertainty involved. Therefore, the ensemble of radar rainfall was simulated using error structure to overcome the uncertainty and gauge rainfall. The simulated ensemble was used as the input data of the rainfall-runoff models for obtaining the ensemble of runoff hydrographs. The previous studies discussed about the accuracy of the rainfall-runoff model. Even if the same input data such as rainfall is used for the runoff analysis using the models in the same basin, the models can have different results because of the uncertainty involved in the models. Therefore, we used two models of the SSARR model which is the lumped model, and the Vflo model which is a distributed model and tried to simulate the optimum runoff considering the uncertainty of each rainfall-runoff model. The study basin is located in Han river basin and we obtained one integrated runoff hydrograph which is an optimum runoff hydrograph using the blending methods such as Multi-Model Super Ensemble (MMSE), Simple Model Average (SMA), Mean Square Error (MSE). From this study, we could confirm the accuracy of rainfall and rainfall-runoff model using ensemble scenario and various rainfall-runoff model and we can use this result to study flood control measure due to climate change. Acknowledgements: This work is supported by the Korea Agency for Infrastructure Technology Advancement(KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant 18AWMP-B083066-05).Keywords: radar rainfall ensemble, rainfall-runoff models, blending method, optimum runoff hydrograph
Procedia PDF Downloads 280180 Breast Cancer Survivability Prediction via Classifier Ensemble
Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia
Abstract:
This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.Keywords: classifier ensemble, breast cancer survivability, data mining, SEER
Procedia PDF Downloads 329179 A Video Surveillance System Using an Ensemble of Simple Neural Network Classifiers
Authors: Rodrigo S. Moreira, Nelson F. F. Ebecken
Abstract:
This paper proposes a maritime vessel tracker composed of an ensemble of WiSARD weightless neural network classifiers. A failure detector analyzes vessel movement with a Kalman filter and corrects the tracking, if necessary, using FFT matching. The use of the WiSARD neural network to track objects is uncommon. The additional contributions of the present study include a performance comparison with four state-of-art trackers, an experimental study of the features that improve maritime vessel tracking, the first use of an ensemble of classifiers to track maritime vessels and a new quantization algorithm that compares the values of pixel pairs.Keywords: ram memory, WiSARD weightless neural network, object tracking, quantization
Procedia PDF Downloads 312178 Statistical Comparison of Ensemble Based Storm Surge Forecasting Models
Authors: Amin Salighehdar, Ziwen Ye, Mingzhe Liu, Ionut Florescu, Alan F. Blumberg
Abstract:
Storm surge is an abnormal water level caused by a storm. Accurate prediction of a storm surge is a challenging problem. Researchers developed various ensemble modeling techniques to combine several individual forecasts to produce an overall presumably better forecast. There exist some simple ensemble modeling techniques in literature. For instance, Model Output Statistics (MOS), and running mean-bias removal are widely used techniques in storm surge prediction domain. However, these methods have some drawbacks. For instance, MOS is based on multiple linear regression and it needs a long period of training data. To overcome the shortcomings of these simple methods, researchers propose some advanced methods. For instance, ENSURF (Ensemble SURge Forecast) is a multi-model application for sea level forecast. This application creates a better forecast of sea level using a combination of several instances of the Bayesian Model Averaging (BMA). An ensemble dressing method is based on identifying best member forecast and using it for prediction. Our contribution in this paper can be summarized as follows. First, we investigate whether the ensemble models perform better than any single forecast. Therefore, we need to identify the single best forecast. We present a methodology based on a simple Bayesian selection method to select the best single forecast. Second, we present several new and simple ways to construct ensemble models. We use correlation and standard deviation as weights in combining different forecast models. Third, we use these ensembles and compare with several existing models in literature to forecast storm surge level. We then investigate whether developing a complex ensemble model is indeed needed. To achieve this goal, we use a simple average (one of the simplest and widely used ensemble model) as benchmark. Predicting the peak level of Surge during a storm as well as the precise time at which this peak level takes place is crucial, thus we develop a statistical platform to compare the performance of various ensemble methods. This statistical analysis is based on root mean square error of the ensemble forecast during the testing period and on the magnitude and timing of the forecasted peak surge compared to the actual time and peak. In this work, we analyze four hurricanes: hurricanes Irene and Lee in 2011, hurricane Sandy in 2012, and hurricane Joaquin in 2015. Since hurricane Irene developed at the end of August 2011 and hurricane Lee started just after Irene at the beginning of September 2011, in this study we consider them as a single contiguous hurricane event. The data set used for this study is generated by the New York Harbor Observing and Prediction System (NYHOPS). We find that even the simplest possible way of creating an ensemble produces results superior to any single forecast. We also show that the ensemble models we propose generally have better performance compared to the simple average ensemble technique.Keywords: Bayesian learning, ensemble model, statistical analysis, storm surge prediction
Procedia PDF Downloads 309177 Faster, Lighter, More Accurate: A Deep Learning Ensemble for Content Moderation
Authors: Arian Hosseini, Mahmudul Hasan
Abstract:
To address the increasing need for efficient and accurate content moderation, we propose an efficient and lightweight deep classification ensemble structure. Our approach is based on a combination of simple visual features, designed for high-accuracy classification of violent content with low false positives. Our ensemble architecture utilizes a set of lightweight models with narrowed-down color features, and we apply it to both images and videos. We evaluated our approach using a large dataset of explosion and blast contents and compared its performance to popular deep learning models such as ResNet-50. Our evaluation results demonstrate significant improvements in prediction accuracy, while benefiting from 7.64x faster inference and lower computation cost. While our approach is tailored to explosion detection, it can be applied to other similar content moderation and violence detection use cases as well. Based on our experiments, we propose a "think small, think many" philosophy in classification scenarios. We argue that transforming a single, large, monolithic deep model into a verification-based step model ensemble of multiple small, simple, and lightweight models with narrowed-down visual features can possibly lead to predictions with higher accuracy.Keywords: deep classification, content moderation, ensemble learning, explosion detection, video processing
Procedia PDF Downloads 55176 Assessing Student Collaboration in Music Ensemble Class: From the Formulation of Grading Rubrics to Their Effective Implementation
Authors: Jason Sah
Abstract:
Music ensemble class is a non-traditional classroom in the sense that it is always a group effort during rehearsal. When measuring student performance ability in class, it is imperative that the grading rubric includes a collaborative skill component. Assessments that stop short of testing students' ability to make music with others undermine the group mentality by elevating individual prowess. Applying empirical and evidence-based methodology, this research develops a grading rubric that defines the criteria for assessing collaborative skill, and then explores different strategies for implementing this rubric in a timely and effective manner. Findings show that when collaborative skill is regularly tested, students gradually shift their attention from playing their own part well to sharing their part with others.Keywords: assessment, ensemble class, grading rubric, student collaboration
Procedia PDF Downloads 136175 An Ensemble-based Method for Vehicle Color Recognition
Authors: Saeedeh Barzegar Khalilsaraei, Manoocheher Kelarestaghi, Farshad Eshghi
Abstract:
The vehicle color, as a prominent and stable feature, helps to identify a vehicle more accurately. As a result, vehicle color recognition is of great importance in intelligent transportation systems. Unlike conventional methods which use only a single Convolutional Neural Network (CNN) for feature extraction or classification, in this paper, four CNNs, with different architectures well-performing in different classes, are trained to extract various features from the input image. To take advantage of the distinct capability of each network, the multiple outputs are combined using a stack generalization algorithm as an ensemble technique. As a result, the final model performs better than each CNN individually in vehicle color identification. The evaluation results in terms of overall average accuracy and accuracy variance show the proposed method’s outperformance compared to the state-of-the-art rivals.Keywords: Vehicle Color Recognition, Ensemble Algorithm, Stack Generalization, Convolutional Neural Network
Procedia PDF Downloads 85174 Machine Learning Predictive Models for Hydroponic Systems: A Case Study Nutrient Film Technique and Deep Flow Technique
Authors: Kritiyaporn Kunsook
Abstract:
Machine learning algorithms (MLAs) such us artificial neural networks (ANNs), decision tree, support vector machines (SVMs), Naïve Bayes, and ensemble classifier by voting are powerful data driven methods that are relatively less widely used in the mapping of technique of system, and thus have not been comparatively evaluated together thoroughly in this field. The performances of a series of MLAs, ANNs, decision tree, SVMs, Naïve Bayes, and ensemble classifier by voting in technique of hydroponic systems prospectively modeling are compared based on the accuracy of each model. Classification of hydroponic systems only covers the test samples from vegetables grown with Nutrient film technique (NFT) and Deep flow technique (DFT). The feature, which are the characteristics of vegetables compose harvesting height width, temperature, require light and color. The results indicate that the classification performance of the ANNs is 98%, decision tree is 98%, SVMs is 97.33%, Naïve Bayes is 96.67%, and ensemble classifier by voting is 98.96% algorithm respectively.Keywords: artificial neural networks, decision tree, support vector machines, naïve Bayes, ensemble classifier by voting
Procedia PDF Downloads 375173 Ensemble-Based SVM Classification Approach for miRNA Prediction
Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam
Abstract:
In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed. It is used for miRNA prediction. Three problems, commonly associated with previous approaches, are alleviated. These problems arise due to impose assumptions on the secondary structural of premiRNA, imbalance between the numbers of the laboratory checked miRNAs and the pseudo-hairpins, and finally using a training data set that does not consider all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers; namely, Triplet-SVM, Virgo and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used in aggregating the final output. The proposed approach is trained and then tested with balanced data sets. The results of the proposed approach outperform the three base classifiers. Improved values for the metrics of 88.88% f-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and the area under the ROC curve is 0.91 are achieved.Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data
Procedia PDF Downloads 349172 Recommender Systems Using Ensemble Techniques
Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim
Abstract:
This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.Keywords: product recommender system, ensemble technique, association rules, decision tree, artificial neural networks
Procedia PDF Downloads 295171 Differences in the Level of Self-Efficacy and Intensity of Narcissism among Band and Solo Musicians
Authors: Weronika Molińska, Joanna Rajchert
Abstract:
A musical career is not only about the quality of performing or playing music. Musicians can choose from a variety of specializations and career paths. The described study focused on psychological traits which relate to a solo career (performing individually or as a leader) or performing as part of a chamber ensemble, ensemble, choir, or orchestra. The hypothesis predicted that narcissism and self-efficacy would be higher in musicians performing solo. The study involved 124 professional musicians: instrumentalists and soloists, singers (n = 59), and ensemble instrumentalists and singers (n = 65). The results confirmed the hypothesis and showed that soloists were higher on self-efficacy and narcissism. In particular, soloists were higher on leader characteristics, demand for admiration, and vanity than musicians performing in ensembles. The result of these studies is a good introduction to a broader project answering the questions of what can increase or decrease the musician's sense of self-efficacy and whether the decreased self-efficacy could induce musicians to give up their solo careers.Keywords: self-efficacy, musicians, musical profession, narcissism, soloists
Procedia PDF Downloads 66170 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education
Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue
Abstract:
In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education
Procedia PDF Downloads 109169 An Ensemble Learning Method for Applying Particle Swarm Optimization Algorithms to Systems Engineering Problems
Authors: Ken Hampshire, Thomas Mazzuchi, Shahram Sarkani
Abstract:
As a subset of metaheuristics, nature-inspired optimization algorithms such as particle swarm optimization (PSO) have shown promise both in solving intractable problems and in their extensibility to novel problem formulations due to their general approach requiring few assumptions. Unfortunately, single instantiations of algorithms require detailed tuning of parameters and cannot be proven to be best suited to a particular illustrative problem on account of the “no free lunch” (NFL) theorem. Using these algorithms in real-world problems requires exquisite knowledge of the many techniques and is not conducive to reconciling the various approaches to given classes of problems. This research aims to present a unified view of PSO-based approaches from the perspective of relevant systems engineering problems, with the express purpose of then eliciting the best solution for any problem formulation in an ensemble learning bucket of models approach. The central hypothesis of the research is that extending the PSO algorithms found in the literature to real-world optimization problems requires a general ensemble-based method for all problem formulations but a specific implementation and solution for any instance. The main results are a problem-based literature survey and a general method to find more globally optimal solutions for any systems engineering optimization problem.Keywords: particle swarm optimization, nature-inspired optimization, metaheuristics, systems engineering, ensemble learning
Procedia PDF Downloads 99168 Methods for Enhancing Ensemble Learning or Improving Classifiers of This Technique in the Analysis and Classification of Brain Signals
Authors: Seyed Mehdi Ghezi, Hesam Hasanpoor
Abstract:
This scientific article explores enhancement methods for ensemble learning with the aim of improving the performance of classifiers in the analysis and classification of brain signals. The research approach in this field consists of two main parts, each with its own strengths and weaknesses. The choice of approach depends on the specific research question and available resources. By combining these approaches and leveraging their respective strengths, researchers can enhance the accuracy and reliability of classification results, consequently advancing our understanding of the brain and its functions. The first approach focuses on utilizing machine learning methods to identify the best features among the vast array of features present in brain signals. The selection of features varies depending on the research objective, and different techniques have been employed for this purpose. For instance, the genetic algorithm has been used in some studies to identify the best features, while optimization methods have been utilized in others to identify the most influential features. Additionally, machine learning techniques have been applied to determine the influential electrodes in classification. Ensemble learning plays a crucial role in identifying the best features that contribute to learning, thereby improving the overall results. The second approach concentrates on designing and implementing methods for selecting the best classifier or utilizing meta-classifiers to enhance the final results in ensemble learning. In a different section of the research, a single classifier is used instead of multiple classifiers, employing different sets of features to improve the results. The article provides an in-depth examination of each technique, highlighting their advantages and limitations. By integrating these techniques, researchers can enhance the performance of classifiers in the analysis and classification of brain signals. This advancement in ensemble learning methodologies contributes to a better understanding of the brain and its functions, ultimately leading to improved accuracy and reliability in brain signal analysis and classification.Keywords: ensemble learning, brain signals, classification, feature selection, machine learning, genetic algorithm, optimization methods, influential features, influential electrodes, meta-classifiers
Procedia PDF Downloads 76167 Multilabel Classification with Neural Network Ensemble Method
Authors: Sezin Ekşioğlu
Abstract:
Multilabel classification has a huge importance for several applications, it is also a challenging research topic. It is a kind of supervised learning that contains binary targets. The distance between multilabel and binary classification is having more than one class in multilabel classification problems. Features can belong to one class or many classes. There exists a wide range of applications for multi label prediction such as image labeling, text categorization, gene functionality. Even though features are classified in many classes, they may not always be properly classified. There are many ensemble methods for the classification. However, most of the researchers have been concerned about better multilabel methods. Especially little ones focus on both efficiency of classifiers and pairwise relationships at the same time in order to implement better multilabel classification. In this paper, we worked on modified ensemble methods by getting benefit from k-Nearest Neighbors and neural network structure to address issues within a beneficial way and to get better impacts from the multilabel classification. Publicly available datasets (yeast, emotion, scene and birds) are performed to demonstrate the developed algorithm efficiency and the technique is measured by accuracy, F1 score and hamming loss metrics. Our algorithm boosts benchmarks for each datasets with different metrics.Keywords: multilabel, classification, neural network, KNN
Procedia PDF Downloads 155166 A Genetic Algorithm Based Ensemble Method with Pairwise Consensus Score on Malware Cacophonous Labels
Authors: Shih-Yu Wang, Shun-Wen Hsiao
Abstract:
In the field of cybersecurity, there exists many vendors giving malware samples classified results, namely naming after the label that contains some important information which is also called AV label. Lots of researchers relay on AV labels for research. Unfortunately, AV labels are too cluttered. They do not have a fixed format and fixed naming rules because the naming results were based on each classifiers' viewpoints. A way to fix the problem is taking a majority vote. However, voting can sometimes create problems of bias. Thus, we create a novel ensemble approach which does not rely on the cacophonous naming result but depend on group identification to aggregate everyone's opinion. To achieve this purpose, we develop an scoring system called Pairwise Consensus Score (PCS) to calculate result similarity. The entire method architecture combine Genetic Algorithm and PCS to find maximum consensus in the group. Experimental results revealed that our method outperformed the majority voting by 10% in term of the score.Keywords: genetic algorithm, ensemble learning, malware family, malware labeling, AV labels
Procedia PDF Downloads 87165 Neuroevolution Based on Adaptive Ensembles of Biologically Inspired Optimization Algorithms Applied for Modeling a Chemical Engineering Process
Authors: Sabina-Adriana Floria, Marius Gavrilescu, Florin Leon, Silvia Curteanu, Costel Anton
Abstract:
Neuroevolution is a subfield of artificial intelligence used to solve various problems in different application areas. Specifically, neuroevolution is a technique that applies biologically inspired methods to generate neural network architectures and optimize their parameters automatically. In this paper, we use different biologically inspired optimization algorithms in an ensemble strategy with the aim of training multilayer perceptron neural networks, resulting in regression models used to simulate the industrial chemical process of obtaining bricks from silicone-based materials. Installations in the raw ceramics industry, i.e., bricks, are characterized by significant energy consumption and large quantities of emissions. In addition, the initial conditions that were taken into account during the design and commissioning of the installation can change over time, which leads to the need to add new mixes to adjust the operating conditions for the desired purpose, e.g., material properties and energy saving. The present approach follows the study by simulation of a process of obtaining bricks from silicone-based materials, i.e., the modeling and optimization of the process. Optimization aims to determine the working conditions that minimize the emissions represented by nitrogen monoxide. We first use a search procedure to find the best values for the parameters of various biologically inspired optimization algorithms. Then, we propose an adaptive ensemble strategy that uses only a subset of the best algorithms identified in the search stage. The adaptive ensemble strategy combines the results of selected algorithms and automatically assigns more processing capacity to the more efficient algorithms. Their efficiency may also vary at different stages of the optimization process. In a given ensemble iteration, the most efficient algorithms aim to maintain good convergence, while the less efficient algorithms can improve population diversity. The proposed adaptive ensemble strategy outperforms the individual optimizers and the non-adaptive ensemble strategy in convergence speed, and the obtained results provide lower error values.Keywords: optimization, biologically inspired algorithm, neuroevolution, ensembles, bricks, emission minimization
Procedia PDF Downloads 118164 A Dynamic Ensemble Learning Approach for Online Anomaly Detection in Alibaba Datacenters
Authors: Wanyi Zhu, Xia Ming, Huafeng Wang, Junda Chen, Lu Liu, Jiangwei Jiang, Guohua Liu
Abstract:
Anomaly detection is a first and imperative step needed to respond to unexpected problems and to assure high performance and security in large data center management. This paper presents an online anomaly detection system through an innovative approach of ensemble machine learning and adaptive differentiation algorithms, and applies them to performance data collected from a continuous monitoring system for multi-tier web applications running in Alibaba data centers. We evaluate the effectiveness and efficiency of this algorithm with production traffic data and compare with the traditional anomaly detection approaches such as a static threshold and other deviation-based detection techniques. The experiment results show that our algorithm correctly identifies the unexpected performance variances of any running application, with an acceptable false positive rate. This proposed approach has already been deployed in real-time production environments to enhance the efficiency and stability in daily data center operations.Keywords: Alibaba data centers, anomaly detection, big data computation, dynamic ensemble learning
Procedia PDF Downloads 203163 An Ensemble Deep Learning Architecture for Imbalanced Classification of Thoracic Surgery Patients
Authors: Saba Ebrahimi, Saeed Ahmadian, Hedie Ashrafi
Abstract:
Selecting appropriate patients for surgery is one of the main issues in thoracic surgery (TS). Both short-term and long-term risks and benefits of surgery must be considered in the patient selection criteria. There are some limitations in the existing datasets of TS patients because of missing values of attributes and imbalanced distribution of survival classes. In this study, a novel ensemble architecture of deep learning networks is proposed based on stacking different linear and non-linear layers to deal with imbalance datasets. The categorical and numerical features are split using different layers with ability to shrink the unnecessary features. Then, after extracting the insight from the raw features, a novel biased-kernel layer is applied to reinforce the gradient of the minority class and cause the network to be trained better comparing the current methods. Finally, the performance and advantages of our proposed model over the existing models are examined for predicting patient survival after thoracic surgery using a real-life clinical data for lung cancer patients.Keywords: deep learning, ensemble models, imbalanced classification, lung cancer, TS patient selection
Procedia PDF Downloads 146162 Ensemble of Deep CNN Architecture for Classifying the Source and Quality of Teff Cereal
Authors: Belayneh Matebie, Michael Melese
Abstract:
The study focuses on addressing the challenges in classifying and ensuring the quality of Eragrostis Teff, a small and round grain that is the smallest cereal grain. Employing a traditional classification method is challenging because of its small size and the similarity of its environmental characteristics. To overcome this, this study employs a machine learning approach to develop a source and quality classification system for Teff cereal. Data is collected from various production areas in the Amhara regions, considering two types of cereal (high and low quality) across eight classes. A total of 5,920 images are collected, with 740 images for each class. Image enhancement techniques, including scaling, data augmentation, histogram equalization, and noise removal, are applied to preprocess the data. Convolutional Neural Network (CNN) is then used to extract relevant features and reduce dimensionality. The dataset is split into 80% for training and 20% for testing. Different classifiers, including FVGG16, FINCV3, QSCTC, EMQSCTC, SVM, and RF, are employed for classification, achieving accuracy rates ranging from 86.91% to 97.72%. The ensemble of FVGG16, FINCV3, and QSCTC using the Max-Voting approach outperforms individual algorithms.Keywords: Teff, ensemble learning, max-voting, CNN, SVM, RF
Procedia PDF Downloads 57