Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 132

Search results for: multiple-k ensemble

132 Evaluation of Ensemble Classifiers for Intrusion Detection

Authors: M. Govindarajan

Abstract:

One of the major developments in machine learning in the past decade is the ensemble method, which finds highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed with homogeneous ensemble classifier using bagging and heterogeneous ensemble classifier using arcing and their performances are analyzed in terms of accuracy. A Classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by the means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: preprocessing phase, classification phase, and combining phase. A wide range of comparative experiments is conducted for standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers are compared to the performance of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include Error correcting output codes, Dagging and heterogeneous ensemble methods include majority voting, stacking. The proposed ensemble methods provide significant improvement of accuracy compared to individual classifiers and the proposed bagged RBF and SVM performs significantly better than ECOC and Dagging and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also heterogeneous models exhibit better results than homogeneous models for standard datasets of intrusion detection. 

Keywords: data mining, ensemble, radial basis function, support vector machine, accuracy

Procedia PDF Downloads 169
131 Random Subspace Ensemble of CMAC Classifiers

Authors: Somaiyeh Dehghan, Mohammad Reza Kheirkhahan Haghighi

Abstract:

The rapid growth of domains that have data with a large number of features, while the number of samples is limited has caused difficulty in constructing strong classifiers. To reduce the dimensionality of the feature space becomes an essential step in classification task. Random subspace method (or attribute bagging) is an ensemble classifier that consists of several classifiers that each base learner in ensemble has subset of features. In the present paper, we introduce Random Subspace Ensemble of CMAC neural network (RSE-CMAC), each of which has training with subset of features. Then we use this model for classification task. For evaluation performance of our model, we compare it with bagging algorithm on 36 UCI datasets. The results reveal that the new model has better performance.

Keywords: classification, random subspace, ensemble, CMAC neural network

Procedia PDF Downloads 241
130 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 67
129 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Keywords: consensus, curse of correlation, imbalance classification, rank-based chain-mode ensemble

Procedia PDF Downloads 49
128 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data

Authors: Muthukumarasamy Govindarajan

Abstract:

Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.

Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine

Procedia PDF Downloads 57
127 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 238
126 Application of Bayesian Model Averaging and Geostatistical Output Perturbation to Generate Calibrated Ensemble Weather Forecast

Authors: Muhammad Luthfi, Sutikno Sutikno, Purhadi Purhadi

Abstract:

Weather forecast has necessarily been improved to provide the communities an accurate and objective prediction as well. To overcome such issue, the numerical-based weather forecast was extensively developed to reduce the subjectivity of forecast. Yet the Numerical Weather Predictions (NWPs) outputs are unfortunately issued without taking dynamical weather behavior and local terrain features into account. Thus, NWPs outputs are not able to accurately forecast the weather quantities, particularly for medium and long range forecast. The aim of this research is to aid and extend the development of ensemble forecast for Meteorology, Climatology, and Geophysics Agency of Indonesia. Ensemble method is an approach combining various deterministic forecast to produce more reliable one. However, such forecast is biased and uncalibrated due to its underdispersive or overdispersive nature. As one of the parametric methods, Bayesian Model Averaging (BMA) generates the calibrated ensemble forecast and constructs predictive PDF for specified period. Such method is able to utilize ensemble of any size but does not take spatial correlation into account. Whereas space dependencies involve the site of interest and nearby site, influenced by dynamic weather behavior. Meanwhile, Geostatistical Output Perturbation (GOP) reckons the spatial correlation to generate future weather quantities, though merely built by a single deterministic forecast, and is able to generate an ensemble of any size as well. This research conducts both BMA and GOP to generate the calibrated ensemble forecast for the daily temperature at few meteorological sites nearby Indonesia international airport.

Keywords: Bayesian Model Averaging, ensemble forecast, geostatistical output perturbation, numerical weather prediction, temperature

Procedia PDF Downloads 200
125 Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems

Authors: Andrey V. Timofeev

Abstract:

This paper introduces an original method for guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with a probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smaller the cardinality of the discrete set of hypothetical classes is, the higher is the classification accuracy. Experiments have shown that if the cardinality of the classifiers ensemble is increased then the cardinality of this set of hypothetical classes is reduced. The problem of the guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers is relevant in the multichannel classification of target events in C-OTDR monitoring systems. Results of suggested approach practical usage to accuracy control in C-OTDR monitoring systems are present.

Keywords: Lipschitz classifiers, confidence set, C-OTDR monitoring, classifiers accuracy, classifiers ensemble

Procedia PDF Downloads 405
124 Simulation of Optimal Runoff Hydrograph Using Ensemble of Radar Rainfall and Blending of Runoffs Model

Authors: Myungjin Lee, Daegun Han, Jongsung Kim, Soojun Kim, Hung Soo Kim

Abstract:

Recently, the localized heavy rainfall and typhoons are frequently occurred due to the climate change and the damage is becoming bigger. Therefore, we may need a more accurate prediction of the rainfall and runoff. However, the gauge rainfall has the limited accuracy in space. Radar rainfall is better than gauge rainfall for the explanation of the spatial variability of rainfall but it is mostly underestimated with the uncertainty involved. Therefore, the ensemble of radar rainfall was simulated using error structure to overcome the uncertainty and gauge rainfall. The simulated ensemble was used as the input data of the rainfall-runoff models for obtaining the ensemble of runoff hydrographs. The previous studies discussed about the accuracy of the rainfall-runoff model. Even if the same input data such as rainfall is used for the runoff analysis using the models in the same basin, the models can have different results because of the uncertainty involved in the models. Therefore, we used two models of the SSARR model which is the lumped model, and the Vflo model which is a distributed model and tried to simulate the optimum runoff considering the uncertainty of each rainfall-runoff model. The study basin is located in Han river basin and we obtained one integrated runoff hydrograph which is an optimum runoff hydrograph using the blending methods such as Multi-Model Super Ensemble (MMSE), Simple Model Average (SMA), Mean Square Error (MSE). From this study, we could confirm the accuracy of rainfall and rainfall-runoff model using ensemble scenario and various rainfall-runoff model and we can use this result to study flood control measure due to climate change. Acknowledgements: This work is supported by the Korea Agency for Infrastructure Technology Advancement(KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant 18AWMP-B083066-05).

Keywords: radar rainfall ensemble, rainfall-runoff models, blending method, optimum runoff hydrograph

Procedia PDF Downloads 189
123 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.

Keywords: classifier ensemble, breast cancer survivability, data mining, SEER

Procedia PDF Downloads 247
122 A Video Surveillance System Using an Ensemble of Simple Neural Network Classifiers

Authors: Rodrigo S. Moreira, Nelson F. F. Ebecken

Abstract:

This paper proposes a maritime vessel tracker composed of an ensemble of WiSARD weightless neural network classifiers. A failure detector analyzes vessel movement with a Kalman filter and corrects the tracking, if necessary, using FFT matching. The use of the WiSARD neural network to track objects is uncommon. The additional contributions of the present study include a performance comparison with four state-of-art trackers, an experimental study of the features that improve maritime vessel tracking, the first use of an ensemble of classifiers to track maritime vessels and a new quantization algorithm that compares the values of pixel pairs.

Keywords: ram memory, WiSARD weightless neural network, object tracking, quantization

Procedia PDF Downloads 217
121 Statistical Comparison of Ensemble Based Storm Surge Forecasting Models

Authors: Amin Salighehdar, Ziwen Ye, Mingzhe Liu, Ionut Florescu, Alan F. Blumberg

Abstract:

Storm surge is an abnormal water level caused by a storm. Accurate prediction of a storm surge is a challenging problem. Researchers developed various ensemble modeling techniques to combine several individual forecasts to produce an overall presumably better forecast. There exist some simple ensemble modeling techniques in literature. For instance, Model Output Statistics (MOS), and running mean-bias removal are widely used techniques in storm surge prediction domain. However, these methods have some drawbacks. For instance, MOS is based on multiple linear regression and it needs a long period of training data. To overcome the shortcomings of these simple methods, researchers propose some advanced methods. For instance, ENSURF (Ensemble SURge Forecast) is a multi-model application for sea level forecast. This application creates a better forecast of sea level using a combination of several instances of the Bayesian Model Averaging (BMA). An ensemble dressing method is based on identifying best member forecast and using it for prediction. Our contribution in this paper can be summarized as follows. First, we investigate whether the ensemble models perform better than any single forecast. Therefore, we need to identify the single best forecast. We present a methodology based on a simple Bayesian selection method to select the best single forecast. Second, we present several new and simple ways to construct ensemble models. We use correlation and standard deviation as weights in combining different forecast models. Third, we use these ensembles and compare with several existing models in literature to forecast storm surge level. We then investigate whether developing a complex ensemble model is indeed needed. To achieve this goal, we use a simple average (one of the simplest and widely used ensemble model) as benchmark. Predicting the peak level of Surge during a storm as well as the precise time at which this peak level takes place is crucial, thus we develop a statistical platform to compare the performance of various ensemble methods. This statistical analysis is based on root mean square error of the ensemble forecast during the testing period and on the magnitude and timing of the forecasted peak surge compared to the actual time and peak. In this work, we analyze four hurricanes: hurricanes Irene and Lee in 2011, hurricane Sandy in 2012, and hurricane Joaquin in 2015. Since hurricane Irene developed at the end of August 2011 and hurricane Lee started just after Irene at the beginning of September 2011, in this study we consider them as a single contiguous hurricane event. The data set used for this study is generated by the New York Harbor Observing and Prediction System (NYHOPS). We find that even the simplest possible way of creating an ensemble produces results superior to any single forecast. We also show that the ensemble models we propose generally have better performance compared to the simple average ensemble technique.

Keywords: Bayesian learning, ensemble model, statistical analysis, storm surge prediction

Procedia PDF Downloads 219
120 Assessing Student Collaboration in Music Ensemble Class: From the Formulation of Grading Rubrics to Their Effective Implementation

Authors: Jason Sah

Abstract:

Music ensemble class is a non-traditional classroom in the sense that it is always a group effort during rehearsal. When measuring student performance ability in class, it is imperative that the grading rubric includes a collaborative skill component. Assessments that stop short of testing students' ability to make music with others undermine the group mentality by elevating individual prowess. Applying empirical and evidence-based methodology, this research develops a grading rubric that defines the criteria for assessing collaborative skill, and then explores different strategies for implementing this rubric in a timely and effective manner. Findings show that when collaborative skill is regularly tested, students gradually shift their attention from playing their own part well to sharing their part with others.

Keywords: assessment, ensemble class, grading rubric, student collaboration

Procedia PDF Downloads 51
119 Machine Learning Predictive Models for Hydroponic Systems: A Case Study Nutrient Film Technique and Deep Flow Technique

Authors: Kritiyaporn Kunsook

Abstract:

Machine learning algorithms (MLAs) such us artificial neural networks (ANNs), decision tree, support vector machines (SVMs), Naïve Bayes, and ensemble classifier by voting are powerful data driven methods that are relatively less widely used in the mapping of technique of system, and thus have not been comparatively evaluated together thoroughly in this field. The performances of a series of MLAs, ANNs, decision tree, SVMs, Naïve Bayes, and ensemble classifier by voting in technique of hydroponic systems prospectively modeling are compared based on the accuracy of each model. Classification of hydroponic systems only covers the test samples from vegetables grown with Nutrient film technique (NFT) and Deep flow technique (DFT). The feature, which are the characteristics of vegetables compose harvesting height width, temperature, require light and color. The results indicate that the classification performance of the ANNs is 98%, decision tree is 98%, SVMs is 97.33%, Naïve Bayes is 96.67%, and ensemble classifier by voting is 98.96% algorithm respectively.

Keywords: artificial neural networks, decision tree, support vector machines, naïve Bayes, ensemble classifier by voting

Procedia PDF Downloads 222
118 Ensemble-Based SVM Classification Approach for miRNA Prediction

Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam

Abstract:

In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed. It is used for miRNA prediction. Three problems, commonly associated with previous approaches, are alleviated. These problems arise due to impose assumptions on the secondary structural of premiRNA, imbalance between the numbers of the laboratory checked miRNAs and the pseudo-hairpins, and finally using a training data set that does not consider all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers; namely, Triplet-SVM, Virgo and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used in aggregating the final output. The proposed approach is trained and then tested with balanced data sets. The results of the proposed approach outperform the three base classifiers. Improved values for the metrics of 88.88% f-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and the area under the ROC curve is 0.91 are achieved.

Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data

Procedia PDF Downloads 255
117 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.

Keywords: product recommender system, ensemble technique, association rules, decision tree, artificial neural networks

Procedia PDF Downloads 213
116 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 13
115 Multilabel Classification with Neural Network Ensemble Method

Authors: Sezin Ekşioğlu

Abstract:

Multilabel classification has a huge importance for several applications, it is also a challenging research topic. It is a kind of supervised learning that contains binary targets. The distance between multilabel and binary classification is having more than one class in multilabel classification problems. Features can belong to one class or many classes. There exists a wide range of applications for multi label prediction such as image labeling, text categorization, gene functionality. Even though features are classified in many classes, they may not always be properly classified. There are many ensemble methods for the classification. However, most of the researchers have been concerned about better multilabel methods. Especially little ones focus on both efficiency of classifiers and pairwise relationships at the same time in order to implement better multilabel classification. In this paper, we worked on modified ensemble methods by getting benefit from k-Nearest Neighbors and neural network structure to address issues within a beneficial way and to get better impacts from the multilabel classification. Publicly available datasets (yeast, emotion, scene and birds) are performed to demonstrate the developed algorithm efficiency and the technique is measured by accuracy, F1 score and hamming loss metrics. Our algorithm boosts benchmarks for each datasets with different metrics.

Keywords: multilabel, classification, neural network, KNN

Procedia PDF Downloads 32
114 A Dynamic Ensemble Learning Approach for Online Anomaly Detection in Alibaba Datacenters

Authors: Wanyi Zhu, Xia Ming, Huafeng Wang, Junda Chen, Lu Liu, Jiangwei Jiang, Guohua Liu

Abstract:

Anomaly detection is a first and imperative step needed to respond to unexpected problems and to assure high performance and security in large data center management. This paper presents an online anomaly detection system through an innovative approach of ensemble machine learning and adaptive differentiation algorithms, and applies them to performance data collected from a continuous monitoring system for multi-tier web applications running in Alibaba data centers. We evaluate the effectiveness and efficiency of this algorithm with production traffic data and compare with the traditional anomaly detection approaches such as a static threshold and other deviation-based detection techniques. The experiment results show that our algorithm correctly identifies the unexpected performance variances of any running application, with an acceptable false positive rate. This proposed approach has already been deployed in real-time production environments to enhance the efficiency and stability in daily data center operations.

Keywords: Alibaba data centers, anomaly detection, big data computation, dynamic ensemble learning

Procedia PDF Downloads 101
113 An Ensemble Deep Learning Architecture for Imbalanced Classification of Thoracic Surgery Patients

Authors: Saba Ebrahimi, Saeed Ahmadian, Hedie Ashrafi

Abstract:

Selecting appropriate patients for surgery is one of the main issues in thoracic surgery (TS). Both short-term and long-term risks and benefits of surgery must be considered in the patient selection criteria. There are some limitations in the existing datasets of TS patients because of missing values of attributes and imbalanced distribution of survival classes. In this study, a novel ensemble architecture of deep learning networks is proposed based on stacking different linear and non-linear layers to deal with imbalance datasets. The categorical and numerical features are split using different layers with ability to shrink the unnecessary features. Then, after extracting the insight from the raw features, a novel biased-kernel layer is applied to reinforce the gradient of the minority class and cause the network to be trained better comparing the current methods. Finally, the performance and advantages of our proposed model over the existing models are examined for predicting patient survival after thoracic surgery using a real-life clinical data for lung cancer patients.

Keywords: deep learning, ensemble models, imbalanced classification, lung cancer, TS patient selection

Procedia PDF Downloads 60
112 Ensemble Machine Learning Approach for Estimating Missing Data from CO₂ Time Series

Authors: Atbin Mahabbati, Jason Beringer, Matthias Leopold

Abstract:

To address the global challenges of climate and environmental changes, there is a need for quantifying and reducing uncertainties in environmental data, including observations of carbon, water, and energy. Global eddy covariance flux tower networks (FLUXNET), and their regional counterparts (i.e., OzFlux, AmeriFlux, China Flux, etc.) were established in the late 1990s and early 2000s to address the demand. Despite the capability of eddy covariance in validating process modelling analyses, field surveys and remote sensing assessments, there are some serious concerns regarding the challenges associated with the technique, e.g. data gaps and uncertainties. To address these concerns, this research has developed an ensemble model to fill the data gaps of CO₂ flux to avoid the limitations of using a single algorithm, and therefore, provide less error and decline the uncertainties associated with the gap-filling process. In this study, the data of five towers in the OzFlux Network (Alice Springs Mulga, Calperum, Gingin, Howard Springs and Tumbarumba) during 2013 were used to develop an ensemble machine learning model, using five feedforward neural networks (FFNN) with different structures combined with an eXtreme Gradient Boosting (XGB) algorithm. The former methods, FFNN, provided the primary estimations in the first layer, while the later, XGB, used the outputs of the first layer as its input to provide the final estimations of CO₂ flux. The introduced model showed slight superiority over each single FFNN and the XGB, while each of these two methods was used individually, overall RMSE: 2.64, 2.91, and 3.54 g C m⁻² yr⁻¹ respectively (3.54 provided by the best FFNN). The most significant improvement happened to the estimation of the extreme diurnal values (during midday and sunrise), as well as nocturnal estimations, which is generally considered as one of the most challenging parts of CO₂ flux gap-filling. The towers, as well as seasonality, showed different levels of sensitivity to improvements provided by the ensemble model. For instance, Tumbarumba showed more sensitivity compared to Calperum, where the differences between the Ensemble model on the one hand and the FFNNs and XGB, on the other hand, were the least of all 5 sites. Besides, the performance difference between the ensemble model and its components individually were more significant during the warm season (Jan, Feb, Mar, Oct, Nov, and Dec) compared to the cold season (Apr, May, Jun, Jul, Aug, and Sep) due to the higher amount of photosynthesis of plants, which led to a larger range of CO₂ exchange. In conclusion, the introduced ensemble model slightly improved the accuracy of CO₂ flux gap-filling and robustness of the model. Therefore, using ensemble machine learning models is potentially capable of improving data estimation and regression outcome when it seems to be no more room for improvement while using a single algorithm.

Keywords: carbon flux, Eddy covariance, extreme gradient boosting, gap-filling comparison, hybrid model, OzFlux network

Procedia PDF Downloads 59
111 Study of Functional Relevant Conformational Mobility of β-2 Adrenoreceptor by Means of Molecular Dynamics Simulation

Authors: G. V. Novikov, V. S. Sivozhelezov, S. S. Kolesnikov, K. V. Shaitan

Abstract:

The study reports about the influence of binding of orthosteric ligands as well as point mutations on the conformational dynamics of β-2-adrenoreceptor. Using molecular dynamics simulation we found that there was a little fraction of active states of the receptor in its apo (ligand free) ensemble corresponded to its constitutive activity. Analysis of MD trajectories indicated that such spontaneous activation of the receptor is accompanied by the motion in intracellular part of its alpha-helices. Thus receptor’s constitutive activity directly results from its conformational dynamics. On the other hand the binding of a full agonist resulted in a significant shift of the initial equilibrium towards its active state. Finally, the binding of the inverse agonist stabilized the receptor in its inactive state. It is likely that the binding of inverse agonists might be a universal way of constitutive activity inhibition in vivo. Our results indicate that ligand binding redistribute pre-existing conformational degrees of freedom (in accordance to the Monod-Wyman-Changeux-Model) of the receptor rather than cause induced fit in it. Therefore, the ensemble of biologically relevant receptor conformations is encoded in its spatial structure, and individual conformations from that ensemble might be used by the cell in conformity with the physiological behaviour.

Keywords: seven-transmembrane receptors, constitutive activity, activation, x-ray crystallography, principal component analysis, molecular dynamics simulation

Procedia PDF Downloads 163
110 Ethnic Identity Formation in Diaspora of Bajau Samah: An Ethnomusicological Study of Bertitik Music Ensemble in the Northwest Coast of Sabah, Malaysia

Authors: Mohd Hassan Abdullah, Mohd Azam Sulong, Mohd Nizam Nasrifan, Nor Azman Mohd Ramli, Suflan Faidzal Arshad

Abstract:

The Bajau Samah is a maritime ethnic community that inhabits the west coast of Sabah, Malaysia. The majority of these ethnicities embrace Islam and practice their own culture. Bertitik music ensemble is one of the musical practices performed in various social events, especially weddings. The ensemble, which combines several musical instruments including gongs, drums and kulintangan is played by six musicians to accompany various social events in the community. The position of the Bajau Samah in a multi-ethnic community such as Kadazandusun, Rungus, Suluk, Malay, Iranun and others exposes to the cultural activities with various artistic elements of the surrounding community. Western influences have also played an important role in the process of hybridity and acculturation in this society. Cultural change and the influx of foreign cultures have threatened the sustainability of this musical practice. This study aims to musicologically analyze the elements of bertitik ensemble that form the uniqueness of the cultural identity of the Bajau Samah Ethnic group. An ethnomusicological approach has been used to parse the essence of the bertitik music repertoire in depth. Ethnographic study design which comprises fieldwork, interviews, observations and document analysis as the main methods were utilized to collect data. Music recordings were transcribed in the form of musical notation and then analyzed based on the theory of "the norms of musical styles". This study reveals that musical elements featured in the ensemble represent the symbol and cultural identity to this ethnic group. The findings of the study were documented in the form of musicological analysis, audio and video as well as transcriptions of the musical notation of the repertoire of the music ensemble. This study is in line with the National cultural policy gazetted by the government, which is "Conservation, preservation and development of culture towards strengthening the foundations of National Culture through joint research, development, education, expansion and cultural relations" It will benefit various parties including students, teachers, academics, cultural arts activists and so on towards preserving the nation's cultural heritage as well as strengthening the spirit of nationhood among the people of various races and ethnic group in Malaysia.

Keywords: ethnomusicology, ethnic music, Malaysian music, cultural identity

Procedia PDF Downloads 49
109 Multiple Relaxation Times in the Gibbs Ensemble Monte Carlo Simulation of Phase Separation

Authors: Bina Kumari, Subir K. Sarkar, Pradipta Bandyopadhyay

Abstract:

The autocorrelation function of the density fluctuation is studied in each of the two phases in a Gibbs Ensemble Monte Carlo (GEMC) simulation of the problem of phase separation for a square well potential with various values of its range. We find that the normalized autocorrelation function is described very well as a linear combination of an exponential function with a time scale τ₂ and a stretched exponential function with a time scale τ₁ and an exponent α. Dependence of (α, τ₁, τ₂) on the parameters of the GEMC algorithm and the range of the square well potential is investigated and interpreted. We also analyse the issue of how to choose the parameters of the GEMC simulation optimally.

Keywords: autocorrelation function, density fluctuation, GEMC, simulation

Procedia PDF Downloads 64
108 Design of an Ensemble Learning Behavior Anomaly Detection Framework

Authors: Abdoulaye Diop, Nahid Emad, Thierry Winter, Mohamed Hilia

Abstract:

Data assets protection is a crucial issue in the cybersecurity field. Companies use logical access control tools to vault their information assets and protect them against external threats, but they lack solutions to counter insider threats. Nowadays, insider threats are the most significant concern of security analysts. They are mainly individuals with legitimate access to companies information systems, which use their rights with malicious intents. In several fields, behavior anomaly detection is the method used by cyber specialists to counter the threats of user malicious activities effectively. In this paper, we present the step toward the construction of a user and entity behavior analysis framework by proposing a behavior anomaly detection model. This model combines machine learning classification techniques and graph-based methods, relying on linear algebra and parallel computing techniques. We show the utility of an ensemble learning approach in this context. We present some detection methods tests results on an representative access control dataset. The use of some explored classifiers gives results up to 99% of accuracy.

Keywords: cybersecurity, data protection, access control, insider threat, user behavior analysis, ensemble learning, high performance computing

Procedia PDF Downloads 43
107 Pipat Ensemble and Music for Ligkey in Amphur Muaeng, Chachoengsao Province

Authors: Prasan Briboonnanggoul

Abstract:

The major objective of this research study was to explore some aspects of the performance culture of musical folk drama called Ligkey. This study was undertaken in an effect to focus on the specific functions of orchestra which accompanied Ligkey on Thai musical instruments in Chachoengsao Province. The process of study and exploration consisted of questionnaire, interview, a tape recording of an interview and photographs of performances which all of them were analyzed for the finding. The information obtained from the study indicated that Ligkey still received stable attention from people despite lesser performances affected by economics crisis. Almost all of the performances were organized and supported by both the public sector and the private sector. Based on the summary and finding of this study, a) there were ten Ligkey ensemble and ten orchestra which were Mon orchestra, not the precedent and the predecessor known as Thai orchestra; b) a variety of functions performed by musicians must harmonize discipline, punctuality, patience, no negligence, proficiency in performance; c) folklore melodies known as Plengnapad were performed as usual, but folklore melodies and songs known as Plangsongchan got lesser and got a tendency towards extinction because of the plot which corresponded with a market-driven entertainment. Therefore, a purpose-built schema of the preservation of Thai folklore songs was that they should have been recognized by both the performers and the audiences and patronized by the public sector via the government media to publicize the value of popular art form.

Keywords: Pipat Ensemble, Ligkey, Amphur Muaeng, Chachoengsao Province

Procedia PDF Downloads 255
106 Performance Assessment of Multi-Level Ensemble for Multi-Class Problems

Authors: Rodolfo Lorbieski, Silvia Modesto Nassar

Abstract:

Many supervised machine learning tasks require decision making across numerous different classes. Multi-class classification has several applications, such as face recognition, text recognition and medical diagnostics. The objective of this article is to analyze an adapted method of Stacking in multi-class problems, which combines ensembles within the ensemble itself. For this purpose, a training similar to Stacking was used, but with three levels, where the final decision-maker (level 2) performs its training by combining outputs from the tree-based pair of meta-classifiers (level 1) from Bayesian families. These are in turn trained by pairs of base classifiers (level 0) of the same family. This strategy seeks to promote diversity among the ensembles forming the meta-classifier level 2. Three performance measures were used: (1) accuracy, (2) area under the ROC curve, and (3) time for three factors: (a) datasets, (b) experiments and (c) levels. To compare the factors, ANOVA three-way test was executed for each performance measure, considering 5 datasets by 25 experiments by 3 levels. A triple interaction between factors was observed only in time. The accuracy and area under the ROC curve presented similar results, showing a double interaction between level and experiment, as well as for the dataset factor. It was concluded that level 2 had an average performance above the other levels and that the proposed method is especially efficient for multi-class problems when compared to binary problems.

Keywords: stacking, multi-layers, ensemble, multi-class

Procedia PDF Downloads 191
105 Decision Trees Constructing Based on K-Means Clustering Algorithm

Authors: Loai Abdallah, Malik Yousef

Abstract:

A domain space for the data should reflect the actual similarity between objects. Since objects belonging to the same cluster usually share some common traits even though their geometric distance might be relatively large. In general, the Euclidean distance of data points that represented by large number of features is not capturing the actual relation between those points. In this study, we propose a new method to construct a different space that is based on clustering to form a new distance metric. The new distance space is based on ensemble clustering (EC). The EC distance space is defined by tracking the membership of the points over multiple runs of clustering algorithm metric. Over this distance, we train the decision trees classifier (DT-EC). The results obtained by applying DT-EC on 10 datasets confirm our hypotheses that embedding the EC space as a distance metric would improve the performance.

Keywords: ensemble clustering, decision trees, classification, K nearest neighbors

Procedia PDF Downloads 91
104 Optimizing Approach for Sifting Process to Solve a Common Type of Empirical Mode Decomposition Mode Mixing

Authors: Saad Al-Baddai, Karema Al-Subari, Elmar Lang, Bernd Ludwig

Abstract:

Empirical mode decomposition (EMD), a new data-driven of time-series decomposition, has the advantage of supposing that a time series is non-linear or non-stationary, as is implicitly achieved in Fourier decomposition. However, the EMD suffers of mode mixing problem in some cases. The aim of this paper is to present a solution for a common type of signals causing of EMD mode mixing problem, in case a signal suffers of an intermittency. By an artificial example, the solution shows superior performance in terms of cope EMD mode mixing problem comparing with the conventional EMD and Ensemble Empirical Mode decomposition (EEMD). Furthermore, the over-sifting problem is also completely avoided; and computation load is reduced roughly six times compared with EEMD, an ensemble number of 50.

Keywords: empirical mode decomposition (EMD), mode mixing, sifting process, over-sifting

Procedia PDF Downloads 291
103 The Design of a Vehicle Traffic Flow Prediction Model for a Gauteng Freeway Based on an Ensemble of Multi-Layer Perceptron

Authors: Tebogo Emma Makaba, Barnabas Ndlovu Gatsheni

Abstract:

The cities of Johannesburg and Pretoria both located in the Gauteng province are separated by a distance of 58 km. The traffic queues on the Ben Schoeman freeway which connects these two cities can stretch for almost 1.5 km. Vehicle traffic congestion impacts negatively on the business and the commuter’s quality of life. The goal of this paper is to identify variables that influence the flow of traffic and to design a vehicle traffic prediction model, which will predict the traffic flow pattern in advance. The model will unable motorist to be able to make appropriate travel decisions ahead of time. The data used was collected by Mikro’s Traffic Monitoring (MTM). Multi-Layer perceptron (MLP) was used individually to construct the model and the MLP was also combined with Bagging ensemble method to training the data. The cross—validation method was used for evaluating the models. The results obtained from the techniques were compared using predictive and prediction costs. The cost was computed using combination of the loss matrix and the confusion matrix. The predicted models designed shows that the status of the traffic flow on the freeway can be predicted using the following parameters travel time, average speed, traffic volume and day of month. The implications of this work is that commuters will be able to spend less time travelling on the route and spend time with their families. The logistics industry will save more than twice what they are currently spending.

Keywords: bagging ensemble methods, confusion matrix, multi-layer perceptron, vehicle traffic flow

Procedia PDF Downloads 256