Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 235

Search results for: ensemble kernels

235 Enhancing Predictive Accuracy in Pharmaceutical Sales through an Ensemble Kernel Gaussian Process Regression Approach

Authors: Shahin Mirshekari, Mohammadreza Moradi, Hossein Jafari, Mehdi Jafari, Mohammad Ensaf

Abstract:

This research employs Gaussian Process Regression (GPR) with an ensemble kernel, integrating Exponential Squared, Revised Matern, and Rational Quadratic kernels to analyze pharmaceutical sales data. Bayesian optimization was used to identify optimal kernel weights: 0.76 for Exponential Squared, 0.21 for Revised Matern, and 0.13 for Rational Quadratic. The ensemble kernel demonstrated superior performance in predictive accuracy, achieving an R² score near 1.0, and significantly lower values in MSE, MAE, and RMSE. These findings highlight the efficacy of ensemble kernels in GPR for predictive analytics in complex pharmaceutical sales datasets.

Keywords: Gaussian process regression, ensemble kernels, bayesian optimization, pharmaceutical sales analysis, time series forecasting, data analysis

Procedia PDF Downloads 26

234 On the Fourth-Order Hybrid Beta Polynomial Kernels in Kernel Density Estimation

Authors: Benson Ade Eniola Afere

Abstract:

This paper introduces a family of fourth-order hybrid beta polynomial kernels developed for statistical analysis. The assessment of these kernels' performance centers on two critical metrics: asymptotic mean integrated squared error (AMISE) and kernel efficiency. Through the utilization of both simulated and real-world datasets, a comprehensive evaluation was conducted, facilitating a thorough comparison with conventional fourth-order polynomial kernels. The evaluation procedure encompassed the computation of AMISE and efficiency values for both the proposed hybrid kernels and the established classical kernels. The consistently observed trend was the superior performance of the hybrid kernels when compared to their classical counterparts. This trend persisted across diverse datasets, underscoring the resilience and efficacy of the hybrid approach. By leveraging these performance metrics and conducting evaluations on both simulated and real-world data, this study furnishes compelling evidence in favour of the superiority of the proposed hybrid beta polynomial kernels. The discernible enhancement in performance, as indicated by lower AMISE values and higher efficiency scores, strongly suggests that the proposed kernels offer heightened suitability for statistical analysis tasks when compared to traditional kernels.

Keywords: AMISE, efficiency, fourth-order Kernels, hybrid Kernels, Kernel density estimation

Procedia PDF Downloads 40

233 On the Cluster of the Families of Hybrid Polynomial Kernels in Kernel Density Estimation

Authors: Benson Ade Eniola Afere

Abstract:

Over the years, kernel density estimation has been extensively studied within the context of nonparametric density estimation. The fundamental components of kernel density estimation are the kernel function and the bandwidth. While the mathematical exploration of the kernel component has been relatively limited, its selection and development remain crucial. The Mean Integrated Squared Error (MISE), serving as a measure of discrepancy, provides a robust framework for assessing the effectiveness of any kernel function. A kernel function with a lower MISE is generally considered to perform better than one with a higher MISE. Hence, the primary aim of this article is to create kernels that exhibit significantly reduced MISE when compared to existing classical kernels. Consequently, this article introduces a cluster of hybrid polynomial kernel families. The construction of these proposed kernel functions is carried out heuristically by combining two kernels from the classical polynomial kernel family using probability axioms. We delve into the analysis of error propagation within these kernels. To assess their performance, simulation experiments, and real-life datasets are employed. The obtained results demonstrate that the proposed hybrid kernels surpass their classical kernel counterparts in terms of performance.

Keywords: classical polynomial kernels, cluster of families, global error, hybrid Kernels, Kernel density estimation, Monte Carlo simulation

Procedia PDF Downloads 58

232 Image Compression Based on Regression SVM and Biorthogonal Wavelets

Authors: Zikiou Nadia, Lahdir Mourad, Ameur Soltane

Abstract:

In this paper, we propose an effective method for image compression based on SVM Regression (SVR), with three different kernels, and biorthogonal 2D Discrete Wavelet Transform. SVM regression could learn dependency from training data and compressed using fewer training points (support vectors) to represent the original data and eliminate the redundancy. Biorthogonal wavelet has been used to transform the image and the coefficients acquired are then trained with different kernels SVM (Gaussian, Polynomial, and Linear). Run-length and Arithmetic coders are used to encode the support vectors and its corresponding weights, obtained from the SVM regression. The peak signal noise ratio (PSNR) and their compression ratios of several test images, compressed with our algorithm, with different kernels are presented. Compared with other kernels, Gaussian kernel achieves better image quality. Experimental results show that the compression performance of our method gains much improvement.

Keywords: image compression, 2D discrete wavelet transform (DWT-2D), support vector regression (SVR), SVM Kernels, run-length, arithmetic coding

Procedia PDF Downloads 349

231 Sharp Estimates of Oscillatory Singular Integrals with Rough Kernels

Authors: H. Al-Qassem, L. Cheng, Y. Pan

Abstract:

In this paper, we establish sharp bounds for oscillatory singular integrals with an arbitrary real polynomial phase P. Our kernels are allowed to be rough both on the unit sphere and in the radial direction. We show that the bounds grow no faster than log (deg(P)), which is optimal and was first obtained by Parissis and Papadimitrakis for kernels without any radial roughness. Our results substantially improve many previously known results. Among key ingredients of our methods are an L¹→L² sharp estimate and using extrapolation.

Keywords: oscillatory singular integral, rough kernel, singular integral, orlicz spaces, block spaces, extrapolation, L^{p} boundedness

Procedia PDF Downloads 431

230 Evaluation of Ensemble Classifiers for Intrusion Detection

Authors: M. Govindarajan

Abstract:

One of the major developments in machine learning in the past decade is the ensemble method, which finds highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed with homogeneous ensemble classifier using bagging and heterogeneous ensemble classifier using arcing and their performances are analyzed in terms of accuracy. A Classifier ensemble is designed using Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by the means of standard datasets of intrusion detection. The main originality of the proposed approach is based on three main parts: preprocessing phase, classification phase, and combining phase. A wide range of comparative experiments is conducted for standard datasets of intrusion detection. The performance of the proposed homogeneous and heterogeneous ensemble classifiers are compared to the performance of other standard homogeneous and heterogeneous ensemble methods. The standard homogeneous ensemble methods include Error correcting output codes, Dagging and heterogeneous ensemble methods include majority voting, stacking. The proposed ensemble methods provide significant improvement of accuracy compared to individual classifiers and the proposed bagged RBF and SVM performs significantly better than ECOC and Dagging and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also heterogeneous models exhibit better results than homogeneous models for standard datasets of intrusion detection.

Keywords: data mining, ensemble, radial basis function, support vector machine, accuracy

Procedia PDF Downloads 220

229 Random Subspace Ensemble of CMAC Classifiers

Authors: Somaiyeh Dehghan, Mohammad Reza Kheirkhahan Haghighi

Abstract:

The rapid growth of domains that have data with a large number of features, while the number of samples is limited has caused difficulty in constructing strong classifiers. To reduce the dimensionality of the feature space becomes an essential step in classification task. Random subspace method (or attribute bagging) is an ensemble classifier that consists of several classifiers that each base learner in ensemble has subset of features. In the present paper, we introduce Random Subspace Ensemble of CMAC neural network (RSE-CMAC), each of which has training with subset of features. Then we use this model for classification task. For evaluation performance of our model, we compare it with bagging algorithm on 36 UCI datasets. The results reveal that the new model has better performance.

Keywords: classification, random subspace, ensemble, CMAC neural network

Procedia PDF Downloads 300

228 Fuzzy Optimization Multi-Objective Clustering Ensemble Model for Multi-Source Data Analysis

Authors: C. B. Le, V. N. Pham

Abstract:

In modern data analysis, multi-source data appears more and more in real applications. Multi-source data clustering has emerged as a important issue in the data mining and machine learning community. Different data sources provide information about different data. Therefore, multi-source data linking is essential to improve clustering performance. However, in practice multi-source data is often heterogeneous, uncertain, and large. This issue is considered a major challenge from multi-source data. Ensemble is a versatile machine learning model in which learning techniques can work in parallel, with big data. Clustering ensemble has been shown to outperform any standard clustering algorithm in terms of accuracy and robustness. However, most of the traditional clustering ensemble approaches are based on single-objective function and single-source data. This paper proposes a new clustering ensemble method for multi-source data analysis. The fuzzy optimized multi-objective clustering ensemble method is called FOMOCE. Firstly, a clustering ensemble mathematical model based on the structure of multi-objective clustering function, multi-source data, and dark knowledge is introduced. Then, rules for extracting dark knowledge from the input data, clustering algorithms, and base clusterings are designed and applied. Finally, a clustering ensemble algorithm is proposed for multi-source data analysis. The experiments were performed on the standard sample data set. The experimental results demonstrate the superior performance of the FOMOCE method compared to the existing clustering ensemble methods and multi-source clustering methods.

Keywords: clustering ensemble, multi-source, multi-objective, fuzzy clustering

Procedia PDF Downloads 142

227 Rank-Based Chain-Mode Ensemble for Binary Classification

Authors: Chongya Song, Kang Yen, Alexander Pons, Jin Liu

Abstract:

In the field of machine learning, the ensemble has been employed as a common methodology to improve the performance upon multiple base classifiers. However, the true predictions are often canceled out by the false ones during consensus due to a phenomenon called “curse of correlation” which is represented as the strong interferences among the predictions produced by the base classifiers. In addition, the existing practices are still not able to effectively mitigate the problem of imbalanced classification. Based on the analysis on our experiment results, we conclude that the two problems are caused by some inherent deficiencies in the approach of consensus. Therefore, we create an enhanced ensemble algorithm which adopts a designed rank-based chain-mode consensus to overcome the two problems. In order to evaluate the proposed ensemble algorithm, we employ a well-known benchmark data set NSL-KDD (the improved version of dataset KDDCup99 produced by University of New Brunswick) to make comparisons between the proposed and 8 common ensemble algorithms. Particularly, each compared ensemble classifier uses the same 22 base classifiers, so that the differences in terms of the improvements toward the accuracy and reliability upon the base classifiers can be truly revealed. As a result, the proposed rank-based chain-mode consensus is proved to be a more effective ensemble solution than the traditional consensus approach, which outperforms the 8 ensemble algorithms by 20% on almost all compared metrices which include accuracy, precision, recall, F1-score and area under receiver operating characteristic curve.

Keywords: consensus, curse of correlation, imbalance classification, rank-based chain-mode ensemble

Procedia PDF Downloads 105

226 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data

Authors: Muthukumarasamy Govindarajan

Abstract:

Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.

Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine

Procedia PDF Downloads 111

225 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 306

224 Sorting Maize Haploids from Hybrids Using Single-Kernel Near-Infrared Spectroscopy

Authors: Paul R Armstrong

Abstract:

Doubled haploids (DHs) have become an important breeding tool for creating maize inbred lines, although several bottlenecks in the DH production process limit wider development, application, and adoption of the technique. DH kernels are typically sorted manually and represent about 10% of the seeds in a much larger pool where the remaining 90% are hybrid siblings. This introduces time constraints on DH production and manual sorting is often not accurate. Automated sorting based on the chemical composition of the kernel can be effective, but devices, namely NMR, have not achieved the sorting speed to be a cost-effective replacement to manual sorting. This study evaluated a single kernel near-infrared reflectance spectroscopy (skNIR) platform to accurately identify DH kernels based on oil content. The skNIR platform is a higher-throughput device, approximately 3 seeds/s, that uses spectra to predict oil content of each kernel from maize crosses intentionally developed to create larger than normal oil differences, 1.5%-2%, between DH and hybrid kernels. Spectra from the skNIR were used to construct a partial least squares regression (PLS) model for oil and for a categorical reference model of 1 (DH kernel) or 2 (hybrid kernel) and then used to sort several crosses to evaluate performance. Two approaches were used for sorting. The first used a general PLS model developed from all crosses to predict oil content and then used for sorting each induction cross, the second was the development of a specific model from a single induction cross where approximately fifty DH and one hundred hybrid kernels used. This second approach used a categorical reference value of 1 and 2, instead of oil content, for the PLS model and kernels selected for the calibration set were manually referenced based on traditional commercial methods using coloration of the tip cap and germ areas. The generalized PLS oil model statistics were R2 = 0.94 and RMSE = .93% for kernels spanning an oil content of 2.7% to 19.3%. Sorting by this model resulted in extracting 55% to 85% of haploid kernels from the four induction crosses. Using the second method of generating a model for each cross yielded model statistics ranging from R2s = 0.96 to 0.98 and RMSEs from 0.08 to 0.10. Sorting in this case resulted in 100% correct classification but required models that were cross. In summary, the first generalized model oil method could be used to sort a significant number of kernels from a kernel pool but was not close to the accuracy of developing a sorting model from a single cross. The penalty for the second method is that a PLS model would need to be developed for each individual cross. In conclusion both methods could find useful application in the sorting of DH from hybrid kernels.

Keywords: NIR, haploids, maize, sorting

Procedia PDF Downloads 276

223 Application of Bayesian Model Averaging and Geostatistical Output Perturbation to Generate Calibrated Ensemble Weather Forecast

Authors: Muhammad Luthfi, Sutikno Sutikno, Purhadi Purhadi

Abstract:

Weather forecast has necessarily been improved to provide the communities an accurate and objective prediction as well. To overcome such issue, the numerical-based weather forecast was extensively developed to reduce the subjectivity of forecast. Yet the Numerical Weather Predictions (NWPs) outputs are unfortunately issued without taking dynamical weather behavior and local terrain features into account. Thus, NWPs outputs are not able to accurately forecast the weather quantities, particularly for medium and long range forecast. The aim of this research is to aid and extend the development of ensemble forecast for Meteorology, Climatology, and Geophysics Agency of Indonesia. Ensemble method is an approach combining various deterministic forecast to produce more reliable one. However, such forecast is biased and uncalibrated due to its underdispersive or overdispersive nature. As one of the parametric methods, Bayesian Model Averaging (BMA) generates the calibrated ensemble forecast and constructs predictive PDF for specified period. Such method is able to utilize ensemble of any size but does not take spatial correlation into account. Whereas space dependencies involve the site of interest and nearby site, influenced by dynamic weather behavior. Meanwhile, Geostatistical Output Perturbation (GOP) reckons the spatial correlation to generate future weather quantities, though merely built by a single deterministic forecast, and is able to generate an ensemble of any size as well. This research conducts both BMA and GOP to generate the calibrated ensemble forecast for the daily temperature at few meteorological sites nearby Indonesia international airport.

Keywords: Bayesian Model Averaging, ensemble forecast, geostatistical output perturbation, numerical weather prediction, temperature

Procedia PDF Downloads 246

222 Rough Oscillatory Singular Integrals on Rⁿ

Authors: H. M. Al-Qassem, L. Cheng, Y. Pan

Abstract:

In this paper we establish sharp bounds for oscillatory singular integrals with an arbitrary real polynomial phase P. Our kernels are allowed to be rough both on the unit sphere and in the radial direction. We show that the bounds grow no faster than log(deg(P)), which is optimal and was first obtained by Parissis and Papadimitrakis for kernels without any radial roughness. Among key ingredients of our methods are an L¹→L² estimate and extrapolation.

Keywords: oscillatory singular integral, rough kernel, singular integral, Orlicz spaces, Block spaces, extrapolation, L^{p} boundedness

Procedia PDF Downloads 326

221 On the Mathematical Modelling of Aggregative Stability of Disperse Systems

Authors: Arnold M. Brener, Lesbek Tashimov, Ablakim S. Muratov

Abstract:

The paper deals with the special model for coagulation kernels which represents new control parameters in the Smoluchowski equation for binary aggregation. On the base of the model the new approach to evaluating aggregative stability of disperse systems has been submitted. With the help of this approach the simple estimates for aggregative stability of various types of hydrophilic nano-suspensions have been obtained.

Keywords: aggregative stability, coagulation kernels, disperse systems, mathematical model

Procedia PDF Downloads 285

220 The Linear Combination of Kernels in the Estimation of the Cumulative Distribution Functions

Authors: Abdel-Razzaq Mugdadi, Ruqayyah Sani

Abstract:

The Kernel Distribution Function Estimator (KDFE) method is the most popular method for nonparametric estimation of the cumulative distribution function. The kernel and the bandwidth are the most important components of this estimator. In this investigation, we replace the kernel in the KDFE with a linear combination of kernels to obtain a new estimator based on the linear combination of kernels, the mean integrated squared error (MISE), asymptotic mean integrated squared error (AMISE) and the asymptotically optimal bandwidth for the new estimator are derived. We propose a new data-based method to select the bandwidth for the new estimator. The new technique is based on the Plug-in technique in density estimation. We evaluate the new estimator and the new technique using simulations and real-life data.

Keywords: estimation, bandwidth, mean square error, cumulative distribution function

Procedia PDF Downloads 542

219 Lipschitz Classifiers Ensembles: Usage for Classification of Target Events in C-OTDR Monitoring Systems

Authors: Andrey V. Timofeev

Abstract:

This paper introduces an original method for guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers. The solution was obtained as a finite closed set of alternative hypotheses, which contains an object of classification with a probability of not less than the specified value. Thus, the classification is represented by a set of hypothetical classes. In this case, the smaller the cardinality of the discrete set of hypothetical classes is, the higher is the classification accuracy. Experiments have shown that if the cardinality of the classifiers ensemble is increased then the cardinality of this set of hypothetical classes is reduced. The problem of the guaranteed estimation of the accuracy of an ensemble of Lipschitz classifiers is relevant in the multichannel classification of target events in C-OTDR monitoring systems. Results of suggested approach practical usage to accuracy control in C-OTDR monitoring systems are present.

Keywords: Lipschitz classifiers, confidence set, C-OTDR monitoring, classifiers accuracy, classifiers ensemble

Procedia PDF Downloads 458

218 Simulation of Optimal Runoff Hydrograph Using Ensemble of Radar Rainfall and Blending of Runoffs Model

Authors: Myungjin Lee, Daegun Han, Jongsung Kim, Soojun Kim, Hung Soo Kim

Abstract:

Recently, the localized heavy rainfall and typhoons are frequently occurred due to the climate change and the damage is becoming bigger. Therefore, we may need a more accurate prediction of the rainfall and runoff. However, the gauge rainfall has the limited accuracy in space. Radar rainfall is better than gauge rainfall for the explanation of the spatial variability of rainfall but it is mostly underestimated with the uncertainty involved. Therefore, the ensemble of radar rainfall was simulated using error structure to overcome the uncertainty and gauge rainfall. The simulated ensemble was used as the input data of the rainfall-runoff models for obtaining the ensemble of runoff hydrographs. The previous studies discussed about the accuracy of the rainfall-runoff model. Even if the same input data such as rainfall is used for the runoff analysis using the models in the same basin, the models can have different results because of the uncertainty involved in the models. Therefore, we used two models of the SSARR model which is the lumped model, and the Vflo model which is a distributed model and tried to simulate the optimum runoff considering the uncertainty of each rainfall-runoff model. The study basin is located in Han river basin and we obtained one integrated runoff hydrograph which is an optimum runoff hydrograph using the blending methods such as Multi-Model Super Ensemble (MMSE), Simple Model Average (SMA), Mean Square Error (MSE). From this study, we could confirm the accuracy of rainfall and rainfall-runoff model using ensemble scenario and various rainfall-runoff model and we can use this result to study flood control measure due to climate change. Acknowledgements: This work is supported by the Korea Agency for Infrastructure Technology Advancement(KAIA) grant funded by the Ministry of Land, Infrastructure and Transport (Grant 18AWMP-B083066-05).

Keywords: radar rainfall ensemble, rainfall-runoff models, blending method, optimum runoff hydrograph

Procedia PDF Downloads 244

217 Study of Biodegradable Composite Materials Based on Polylactic Acid and Vegetal Reinforcements

Authors: Manel Hannachi, Mustapha Nechiche, Said Azem

Abstract:

This study focuses on biodegradable materials made from Poly-lactic acid (PLA) and vegetal reinforcements. Three materials are developed from PLA, as a matrix, and : (i) olive kernels (OK); (ii) alfa (α) short fibers and (iii) OK+ α mixture, as reinforcements. After processing of PLA pellets and olive kernels in powder and alfa stems in short fibers, three mixtures, namely PLA-OK, PLA-α, and PLA-OK-α are prepared and homogenized in Turbula®. These mixtures are then compacted at 180°C under 10 MPa during 15 mn. Scanning Electron Microscopy (SEM) examinations show that PLA matrix adheres at surface of all reinforcements and the dispersion of these ones in matrix is good. X-ray diffraction (XRD) analyses highlight an increase of PLA inter-reticular distances, especially for the PLA-OK case. These results are explained by the dissociation of some molecules derived from reinforcements followed by diffusion of the released atoms in the structure of PLA. This is consistent with Fourier Transform Infrared Spectroscopy (FTIR) and Differential Scanning Calorimetry (DSC) analysis results.

Keywords: alfa short fibers, biodegradable composite, olive kernels, poly-lactic acid

Procedia PDF Downloads 124

216 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.

Keywords: classifier ensemble, breast cancer survivability, data mining, SEER

Procedia PDF Downloads 292

215 A Video Surveillance System Using an Ensemble of Simple Neural Network Classifiers

Authors: Rodrigo S. Moreira, Nelson F. F. Ebecken

Abstract:

This paper proposes a maritime vessel tracker composed of an ensemble of WiSARD weightless neural network classifiers. A failure detector analyzes vessel movement with a Kalman filter and corrects the tracking, if necessary, using FFT matching. The use of the WiSARD neural network to track objects is uncommon. The additional contributions of the present study include a performance comparison with four state-of-art trackers, an experimental study of the features that improve maritime vessel tracking, the first use of an ensemble of classifiers to track maritime vessels and a new quantization algorithm that compares the values of pixel pairs.

Keywords: ram memory, WiSARD weightless neural network, object tracking, quantization

Procedia PDF Downloads 279

214 Statistical Comparison of Ensemble Based Storm Surge Forecasting Models

Authors: Amin Salighehdar, Ziwen Ye, Mingzhe Liu, Ionut Florescu, Alan F. Blumberg

Abstract:

Storm surge is an abnormal water level caused by a storm. Accurate prediction of a storm surge is a challenging problem. Researchers developed various ensemble modeling techniques to combine several individual forecasts to produce an overall presumably better forecast. There exist some simple ensemble modeling techniques in literature. For instance, Model Output Statistics (MOS), and running mean-bias removal are widely used techniques in storm surge prediction domain. However, these methods have some drawbacks. For instance, MOS is based on multiple linear regression and it needs a long period of training data. To overcome the shortcomings of these simple methods, researchers propose some advanced methods. For instance, ENSURF (Ensemble SURge Forecast) is a multi-model application for sea level forecast. This application creates a better forecast of sea level using a combination of several instances of the Bayesian Model Averaging (BMA). An ensemble dressing method is based on identifying best member forecast and using it for prediction. Our contribution in this paper can be summarized as follows. First, we investigate whether the ensemble models perform better than any single forecast. Therefore, we need to identify the single best forecast. We present a methodology based on a simple Bayesian selection method to select the best single forecast. Second, we present several new and simple ways to construct ensemble models. We use correlation and standard deviation as weights in combining different forecast models. Third, we use these ensembles and compare with several existing models in literature to forecast storm surge level. We then investigate whether developing a complex ensemble model is indeed needed. To achieve this goal, we use a simple average (one of the simplest and widely used ensemble model) as benchmark. Predicting the peak level of Surge during a storm as well as the precise time at which this peak level takes place is crucial, thus we develop a statistical platform to compare the performance of various ensemble methods. This statistical analysis is based on root mean square error of the ensemble forecast during the testing period and on the magnitude and timing of the forecasted peak surge compared to the actual time and peak. In this work, we analyze four hurricanes: hurricanes Irene and Lee in 2011, hurricane Sandy in 2012, and hurricane Joaquin in 2015. Since hurricane Irene developed at the end of August 2011 and hurricane Lee started just after Irene at the beginning of September 2011, in this study we consider them as a single contiguous hurricane event. The data set used for this study is generated by the New York Harbor Observing and Prediction System (NYHOPS). We find that even the simplest possible way of creating an ensemble produces results superior to any single forecast. We also show that the ensemble models we propose generally have better performance compared to the simple average ensemble technique.

Keywords: Bayesian learning, ensemble model, statistical analysis, storm surge prediction

Procedia PDF Downloads 284

213 Faster, Lighter, More Accurate: A Deep Learning Ensemble for Content Moderation

Authors: Arian Hosseini, Mahmudul Hasan

Abstract:

To address the increasing need for efficient and accurate content moderation, we propose an efficient and lightweight deep classification ensemble structure. Our approach is based on a combination of simple visual features, designed for high-accuracy classification of violent content with low false positives. Our ensemble architecture utilizes a set of lightweight models with narrowed-down color features, and we apply it to both images and videos. We evaluated our approach using a large dataset of explosion and blast contents and compared its performance to popular deep learning models such as ResNet-50. Our evaluation results demonstrate significant improvements in prediction accuracy, while benefiting from 7.64x faster inference and lower computation cost. While our approach is tailored to explosion detection, it can be applied to other similar content moderation and violence detection use cases as well. Based on our experiments, we propose a "think small, think many" philosophy in classification scenarios. We argue that transforming a single, large, monolithic deep model into a verification-based step model ensemble of multiple small, simple, and lightweight models with narrowed-down visual features can possibly lead to predictions with higher accuracy.

Keywords: deep classification, content moderation, ensemble learning, explosion detection, video processing

Procedia PDF Downloads 19

212 Assessing Student Collaboration in Music Ensemble Class: From the Formulation of Grading Rubrics to Their Effective Implementation

Authors: Jason Sah

Abstract:

Music ensemble class is a non-traditional classroom in the sense that it is always a group effort during rehearsal. When measuring student performance ability in class, it is imperative that the grading rubric includes a collaborative skill component. Assessments that stop short of testing students' ability to make music with others undermine the group mentality by elevating individual prowess. Applying empirical and evidence-based methodology, this research develops a grading rubric that defines the criteria for assessing collaborative skill, and then explores different strategies for implementing this rubric in a timely and effective manner. Findings show that when collaborative skill is regularly tested, students gradually shift their attention from playing their own part well to sharing their part with others.

Keywords: assessment, ensemble class, grading rubric, student collaboration

Procedia PDF Downloads 105

211 An Ensemble-based Method for Vehicle Color Recognition

Authors: Saeedeh Barzegar Khalilsaraei, Manoocheher Kelarestaghi, Farshad Eshghi

Abstract:

The vehicle color, as a prominent and stable feature, helps to identify a vehicle more accurately. As a result, vehicle color recognition is of great importance in intelligent transportation systems. Unlike conventional methods which use only a single Convolutional Neural Network (CNN) for feature extraction or classification, in this paper, four CNNs, with different architectures well-performing in different classes, are trained to extract various features from the input image. To take advantage of the distinct capability of each network, the multiple outputs are combined using a stack generalization algorithm as an ensemble technique. As a result, the final model performs better than each CNN individually in vehicle color identification. The evaluation results in terms of overall average accuracy and accuracy variance show the proposed method’s outperformance compared to the state-of-the-art rivals.

Keywords: Vehicle Color Recognition, Ensemble Algorithm, Stack Generalization, Convolutional Neural Network

Procedia PDF Downloads 48

210 Machine Learning Predictive Models for Hydroponic Systems: A Case Study Nutrient Film Technique and Deep Flow Technique

Authors: Kritiyaporn Kunsook

Abstract:

Machine learning algorithms (MLAs) such us artificial neural networks (ANNs), decision tree, support vector machines (SVMs), Naïve Bayes, and ensemble classifier by voting are powerful data driven methods that are relatively less widely used in the mapping of technique of system, and thus have not been comparatively evaluated together thoroughly in this field. The performances of a series of MLAs, ANNs, decision tree, SVMs, Naïve Bayes, and ensemble classifier by voting in technique of hydroponic systems prospectively modeling are compared based on the accuracy of each model. Classification of hydroponic systems only covers the test samples from vegetables grown with Nutrient film technique (NFT) and Deep flow technique (DFT). The feature, which are the characteristics of vegetables compose harvesting height width, temperature, require light and color. The results indicate that the classification performance of the ANNs is 98%, decision tree is 98%, SVMs is 97.33%, Naïve Bayes is 96.67%, and ensemble classifier by voting is 98.96% algorithm respectively.

Keywords: artificial neural networks, decision tree, support vector machines, naïve Bayes, ensemble classifier by voting

Procedia PDF Downloads 327

209 Extraction and Antibacterial Studies of Oil from Three Mango Kernel Obtained from Makurdi, Nigeria

Authors: K. Asemave, D. O. Abakpa, T. T. Ligom

Abstract:

The ability of bacteria to develop resistance to many antibiotics cannot be undermined, given the multifaceted health challenges in the present times. For this reason, a lot of attention is on botanicals and their products in search of new antibacterial agents. On the other hand, mango kernel oils (MKO) can be heavily valorized by taking advantage of the myriads bioactive phytochemicals it contains. Herein, we validated the use of MKO as bioactive agent against bacteria. The MKOs for the study were extracted by soxhlet means with ethanol and hexane for 4 h from 3 different mango kernels, namely; 'local' (sample A), 'julie' (sample B), and 'john' (sample C). Prior to the extraction, ground fine particles of the kernels were obtained from the seed kernels dried in oven at 100 °C for 8 h. Hexane gave higher yield of the oils than ethanol. It was also qualitatively confirmed that the mango kernel oils contain some phytochemicals such as phenol, quinone, saponin, and terpenoid. The results of the antibacterial activities of the MKO against both gram positive (Staphylococcus aureus) and gram negative (Pseudomonas aeruginosa) at different concentrations showed that the oils extracted with ethanol gave better antibacterial properties than those of the hexane. More so, the bioactivities were best with the local mango kernel oil. Indeed this work has completely validated the previous claim that MKOs are effective antibacterial agents. Thus, these oils (especially the ethanol-derived ones) can be used as bacteriostatic and antibacterial agents in say food, cosmetics, and allied industries.

Keywords: bacteria, mango, kernel, oil, phytochemicals

Procedia PDF Downloads 121

208 Ensemble-Based SVM Classification Approach for miRNA Prediction

Authors: Sondos M. Hammad, Sherin M. ElGokhy, Mahmoud M. Fahmy, Elsayed A. Sallam

Abstract:

In this paper, an ensemble-based Support Vector Machine (SVM) classification approach is proposed. It is used for miRNA prediction. Three problems, commonly associated with previous approaches, are alleviated. These problems arise due to impose assumptions on the secondary structural of premiRNA, imbalance between the numbers of the laboratory checked miRNAs and the pseudo-hairpins, and finally using a training data set that does not consider all the varieties of samples in different species. We aggregate the predicted outputs of three well-known SVM classifiers; namely, Triplet-SVM, Virgo and Mirident, weighted by their variant features without any structural assumptions. An additional SVM layer is used in aggregating the final output. The proposed approach is trained and then tested with balanced data sets. The results of the proposed approach outperform the three base classifiers. Improved values for the metrics of 88.88% f-score, 92.73% accuracy, 90.64% precision, 96.64% specificity, 87.2% sensitivity, and the area under the ROC curve is 0.91 are achieved.

Keywords: MiRNAs, SVM classification, ensemble algorithm, assumption problem, imbalance data

Procedia PDF Downloads 305

207 Evolving Convolutional Filter Using Genetic Algorithm for Image Classification

Authors: Rujia Chen, Ajit Narayanan

Abstract:

Convolutional neural networks (CNN), as typically applied in deep learning, use layer-wise backpropagation (BP) to construct filters and kernels for feature extraction. Such filters are 2D or 3D groups of weights for constructing feature maps at subsequent layers of the CNN and are shared across the entire input. BP as a gradient descent algorithm has well-known problems of getting stuck at local optima. The use of genetic algorithms (GAs) for evolving weights between layers of standard artificial neural networks (ANNs) is a well-established area of neuroevolution. In particular, the use of crossover techniques when optimizing weights can help to overcome problems of local optima. However, the application of GAs for evolving the weights of filters and kernels in CNNs is not yet an established area of neuroevolution. In this paper, a GA-based filter development algorithm is proposed. The results of the proof-of-concept experiments described in this paper show the proposed GA algorithm can find filter weights through evolutionary techniques rather than BP learning. For some simple classification tasks like geometric shape recognition, the proposed algorithm can achieve 100% accuracy. The results for MNIST classification, while not as good as possible through standard filter learning through BP, show that filter and kernel evolution warrants further investigation as a new subarea of neuroevolution for deep architectures.

Keywords: neuroevolution, convolutional neural network, genetic algorithm, filters, kernels

Procedia PDF Downloads 155

206 Recommender Systems Using Ensemble Techniques

Authors: Yeonjeong Lee, Kyoung-jae Kim, Youngtae Kim

Abstract:

This study proposes a novel recommender system that uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user’s preference. The proposed model consists of two steps. In the first step, this study uses logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. Then, this study combines the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. In the second step, this study uses the market basket analysis to extract association rules for co-purchased products. Finally, the system selects customers who have high likelihood to purchase products in each product group and recommends proper products from same or different product groups to them through above two steps. We test the usability of the proposed system by using prototype and real-world transaction and profile data. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The results also show that the proposed system may be useful in real-world online shopping store.

Keywords: product recommender system, ensemble technique, association rules, decision tree, artificial neural networks

Procedia PDF Downloads 261