Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 21

Search results for: cross-validation

21 Optimizing the Probabilistic Neural Network Training Algorithm for Multi-Class Identification

Authors: Abdelhadi Lotfi, Abdelkader Benyettou

Abstract:

In this work, a training algorithm for probabilistic neural networks (PNN) is presented. The algorithm addresses one of the major drawbacks of PNN, which is the size of the hidden layer in the network. By using a cross-validation training algorithm, the number of hidden neurons is shrunk to a smaller number consisting of the most representative samples of the training set. This is done without affecting the overall architecture of the network. Performance of the network is compared against performance of standard PNN for different databases from the UCI database repository. Results show an important gain in network size and performance.

Keywords: Classification, probabilistic neural networks, network optimization, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 142
20 Specific Emitter Identification Based on Refined Composite Multiscale Dispersion Entropy

Authors: Shaoying Guo, Yanyun Xu, Meng Zhang, Weiqing Huang

Abstract:

The wireless communication network is developing rapidly, thus the wireless security becomes more and more important. Specific emitter identification (SEI) is an vital part of wireless communication security as a technique to identify the unique transmitters. In this paper, a SEI method based on multiscale dispersion entropy (MDE) and refined composite multiscale dispersion entropy (RCMDE) is proposed. The algorithms of MDE and RCMDE are used to extract features for identification of five wireless devices and cross-validation support vector machine (CV-SVM) is used as the classifier. The experimental results show that the total identification accuracy is 99.3%, even at low signal-to-noise ratio(SNR) of 5dB, which proves that MDE and RCMDE can describe the communication signal series well. In addition, compared with other methods, the proposed method is effective and provides better accuracy and stability for SEI.

Keywords: Cross-validation support vector machine, refined composite multiscale dispersion entropy, specific emitter identification, transient signal, wireless communication device.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 126
19 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.

Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 336
18 PM10 Prediction and Forecasting Using CART: A Case Study for Pleven, Bulgaria

Authors: Snezhana G. Gocheva-Ilieva, Maya P. Stoimenova

Abstract:

Ambient air pollution with fine particulate matter (PM10) is a systematic permanent problem in many countries around the world. The accumulation of a large number of measurements of both the PM10 concentrations and the accompanying atmospheric factors allow for their statistical modeling to detect dependencies and forecast future pollution. This study applies the classification and regression trees (CART) method for building and analyzing PM10 models. In the empirical study, average daily air data for the city of Pleven, Bulgaria for a period of 5 years are used. Predictors in the models are seven meteorological variables, time variables, as well as lagged PM10 variables and some lagged meteorological variables, delayed by 1 or 2 days with respect to the initial time series, respectively. The degree of influence of the predictors in the models is determined. The selected best CART models are used to forecast future PM10 concentrations for two days ahead after the last date in the modeling procedure and show very accurate results.

Keywords: Cross-validation, decision tree, lagged variables, short-term forecasting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 241
17 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find out efficient way to predict the yield of corn based meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system comes up with much difficulty, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of datasets (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods, methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-folds cross-validation process and two accuracy metrics: root mean square error of prediction(RMSEP), mean absolute error of prediction(MAEP) were used to evaluate the crop prediction capacity. The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method to calibrate the mechanistic model from dataset easy to access offers several side-perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.

Keywords: Crop yield prediction, crop model, sensitivity analysis, paramater estimation, particle swarm optimization, random forest.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 420
16 A Psychophysiological Evaluation of an Effective Recognition Technique Using Interactive Dynamic Virtual Environments

Authors: Mohammadhossein Moghimi, Robert Stone, Pia Rotshtein

Abstract:

Recording psychological and physiological correlates of human performance within virtual environments and interpreting their impacts on human engagement, ‘immersion’ and related emotional or ‘effective’ states is both academically and technologically challenging. By exposing participants to an effective, real-time (game-like) virtual environment, designed and evaluated in an earlier study, a psychophysiological database containing the EEG, GSR and Heart Rate of 30 male and female gamers, exposed to 10 games, was constructed. Some 174 features were subsequently identified and extracted from a number of windows, with 28 different timing lengths (e.g. 2, 3, 5, etc. seconds). After reducing the number of features to 30, using a feature selection technique, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) methods were subsequently employed for the classification process. The classifiers categorised the psychophysiological database into four effective clusters (defined based on a 3-dimensional space – valence, arousal and dominance) and eight emotion labels (relaxed, content, happy, excited, angry, afraid, sad, and bored). The KNN and SVM classifiers achieved average cross-validation accuracies of 97.01% (±1.3%) and 92.84% (±3.67%), respectively. However, no significant differences were found in the classification process based on effective clusters or emotion labels.

Keywords: Virtual Reality, effective computing, effective VR, emotion-based effective physiological database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 468
15 A Linear Regression Model for Estimating Anxiety Index Using Wide Area Frontal Lobe Brain Blood Volume

Authors: Takashi Kaburagi, Masashi Takenaka, Yosuke Kurihara, Takashi Matsumoto

Abstract:

Major depressive disorder (MDD) is one of the most common mental illnesses today. It is believed to be caused by a combination of several factors, including stress. Stress can be quantitatively evaluated using the State-Trait Anxiety Inventory (STAI), one of the best indices to evaluate anxiety. Although STAI scores are widely used in applications ranging from clinical diagnosis to basic research, the scores are calculated based on a self-reported questionnaire. An objective evaluation is required because the subject may intentionally change his/her answers if multiple tests are carried out. In this article, we present a modified index called the “multi-channel Laterality Index at Rest (mc-LIR)” by recording the brain activity from a wider area of the frontal lobe using multi-channel functional near-infrared spectroscopy (fNIRS). The presented index aims to measure multiple positions near the Fpz defined by the international 10-20 system positioning. Using 24 subjects, the dependencies on the number of measuring points used to calculate the mc-LIR and its correlation coefficients with the STAI scores are reported. Furthermore, a simple linear regression was performed to estimate the STAI scores from mc-LIR. The cross-validation error is also reported. The experimental results show that using multiple positions near the Fpz will improve the correlation coefficients and estimation than those using only two positions.

Keywords: Stress, functional near-infrared spectroscopy, frontal lobe, state-trait anxiety inventory score.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 636
14 Radiochemical Purity of 68Ga-BCA-Peptides: Separation of All 68Ga Species with a Single iTLC Strip

Authors: Anton A. Larenkov, Alesya Ya Maruk

Abstract:

In the present study, highly effective iTLC single strip method for the determination of radiochemical purity (RCP) of 68Ga-BCA-peptides was developed (with no double-developing, changing of eluents or other additional manipulation). In this method iTLC-SG strips and commonly used eluent TFAaq. (3-5 % (v/v)) are used. The method allows determining each of the key radiochemical forms of 68Ga (colloidal, bound, ionic) separately with the peaks separation being no less than 4 σ. Rf = 0.0-0.1 for 68Ga-colloid; Rf = 0.5-0.6 for 68Ga-BCA-peptides; Rf = 0.9-1.0 for ionic 68Ga. The method is simple and fast: For developing length of 75 mm only 4-6 min is required (versus 18-20 min for pharmacopoeial method). The method has been tested on various compounds (including 68Ga-DOTA-TOC, 68Ga-DOTA-TATE, 68Ga-NODAGA-RGD2 etc.). The cross-validation work for every specific form of 68Ga showed good correlation between method developed and control (pharmacopoeial) methods. The method can become convenient and much more informative replacement for pharmacopoeial methods, including HPLC.

Keywords: DOTA-TATE, 68Ga, quality control, radiochemical purity, radiopharmaceuticals, iTLC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1259
13 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models

Authors: Yoonsuh Jung

Abstract:

As DNA microarray data contain relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of tuning parameter (or, penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for the tuning parameter selection, which selects a parameter value with the smallest cross-validated score. However, selecting a single value as an ‘optimal’ value for the parameter can be very unstable due to the sampling variation since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates of tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increase the computational cost, while it can considerably improve the traditional cross-validation. We show that the selected value from the suggested methods often lead to stable parameter selection as well as improved detection of significant genetic variables compared to the tradition cross-validation via real data and simulated data sets.

Keywords: Cross Validation, Parameter Averaging, Parameter Selection, Regularization Parameter Search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1173
12 Classifying Students for E-Learning in Information Technology Course Using ANN

Authors: S. Areerachakul, N. Ployong, S. Na Songkla

Abstract:

This research’s objective is to select the model with most accurate value by using Neural Network Technique as a way to filter potential students who enroll in IT course by Electronic learning at Suan Suanadha Rajabhat University. It is designed to help students selecting the appropriate courses by themselves. The result showed that the most accurate model was 100 Folds Cross-validation which had 73.58% points of accuracy.

Keywords: Artificial neural network, classification, students.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1126
11 Vehicle Type Classification with Geometric and Appearance Attributes

Authors: Ghada S. Moussa

Abstract:

With the increase in population along with economic prosperity, an enormous increase in the number and types of vehicles on the roads occurred. This fact brings a growing need for efficiently yet effectively classifying vehicles into their corresponding categories, which play a crucial role in many areas of infrastructure planning and traffic management.

This paper presents two vehicle-type classification approaches; 1) geometric-based and 2) appearance-based. The two classification approaches are used for two tasks: multi-class and intra-class vehicle classifications. For the evaluation purpose of the proposed classification approaches’ performance and the identification of the most effective yet efficient one, 10-fold cross-validation technique is used with a large dataset. The proposed approaches are distinguishable from previous research on vehicle classification in which: i) they consider both geometric and appearance attributes of vehicles, and ii) they perform remarkably well in both multi-class and intra-class vehicle classification. Experimental results exhibit promising potentials implementations of the proposed vehicle classification approaches into real-world applications.

Keywords: Appearance attributes, Geometric attributes, Support vector machine, Vehicle classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3443
10 QSAR Studies of Certain Novel Heterocycles Derived from Bis-1, 2, 4 Triazoles as Anti-Tumor Agents

Authors: Madhusudan Purohit, Stephen Philip, Bharathkumar Inturi

Abstract:

In this paper we report the quantitative structure activity relationship of novel bis-triazole derivatives for predicting the activity profile. The full model encompassed a dataset of 46 Bis- triazoles. Tripos Sybyl X 2.0 program was used to conduct CoMSIA QSAR modeling. The Partial Least-Squares (PLS) analysis method was used to conduct statistical analysis and to derive a QSAR model based on the field values of CoMSIA descriptor. The compounds were divided into test and training set. The compounds were evaluated by various CoMSIA parameters to predict the best QSAR model. An optimum numbers of components were first determined separately by cross-validation regression for CoMSIA model, which were then applied in the final analysis. A series of parameters were used for the study and the best fit model was obtained using donor, partition coefficient and steric parameters. The CoMSIA models demonstrated good statistical results with regression coefficient (r2) and the cross-validated coefficient (q2) of 0.575 and 0.830 respectively. The standard error for the predicted model was 0.16322. In the CoMSIA model, the steric descriptors make a marginally larger contribution than the electrostatic descriptors. The finding that the steric descriptor is the largest contributor for the CoMSIA QSAR models is consistent with the observation that more than half of the binding site area is occupied by steric regions.

Keywords: 3D QSAR, CoMSIA, Triazoles.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1143
9 Stature Estimation Based On Lower Limb Dimensions in the Malaysian Population

Authors: F. M. Nor, N. Abdullah, Al-M. Mustapa, L. Q. Wen, N. A. Faisal, D. A. A. Ahmad Nazari

Abstract:

Estimation of stature is an important step in developing a biological profile for human identification. It may provide a valuable indicator for unknown individual in a population. The aim of this study was to analyses the relationship between stature and lower limb dimensions in the Malaysian population. The sample comprised 100 corpses, which included 69 males and 31 females between age ranges of 20 to 90 years old. The parameters measured were stature, thigh length, lower leg length, leg length, foot length, foot height and foot breadth. Results showed that mean values in males were significantly higher than those in females (P < 0.05). There were significant correlations between lower limb dimensions and stature. Cross-validation of the equation on 100 individuals showed close approximation between known stature and estimated stature. It was concluded that lower limb dimensions were useful for estimation of stature, which should be validated in future studies. 

Keywords: Forensic anthropology population data, lower leg length, Malaysian, stature.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2821
8 Autonomously Determining the Parameters for SVDD with RBF Kernel from a One-Class Training Set

Authors: Andreas Theissler, Ian Dear

Abstract:

The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, the ease of use is crucial. The results of SVDD are massively determined by the choice of the regularisation parameter C and the kernel parameter  of the widely used RBF kernel. While for two-class SVMs the parameters can be tuned using cross-validation based on the confusion matrix, for a one-class SVM this is not possible, because only true positives and false negatives can occur during training. This paper proposes an approach to find the optimal set of parameters for SVDD solely based on a training set from one class and without any user parameterisation. Results on artificial and real data sets are presented, underpinning the usefulness of the approach.

Keywords: Support vector data description, anomaly detection, one-class classification, parameter tuning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2472
7 Comparative Study of Filter Characteristics as Statistical Vocal Correlates of Clinical Psychiatric State in Human

Authors: Thaweesak Yingthawornsuk, Chusak Thanawattano

Abstract:

Acoustical properties of speech have been shown to be related to mental states of speaker with symptoms: depression and remission. This paper describes way to address the issue of distinguishing depressed patients from remitted subjects based on measureable acoustics change of their spoken sound. The vocal-tract related frequency characteristics of speech samples from female remitted and depressed patients were analyzed via speech processing techniques and consequently, evaluated statistically by cross-validation with Support Vector Machine. Our results comparatively show the classifier's performance with effectively correct separation of 93% determined from testing with the subjectbased feature model and 88% from the frame-based model based on the same speech samples collected from hospital visiting interview sessions between patients and psychiatrists.

Keywords: Depression, SVM, Vocal Extract, Vocal Tract

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1225
6 Geostatistical Analysis and Mapping of Groundlevel Ozone in a Medium Sized Urban Area

Authors: F. J. Moral García, P. Valiente González, F. López Rodríguez

Abstract:

Ground-level tropospheric ozone is one of the air pollutants of most concern. It is mainly produced by photochemical processes involving nitrogen oxides and volatile organic compounds in the lower parts of the atmosphere. Ozone levels become particularly high in regions close to high ozone precursor emissions and during summer, when stagnant meteorological conditions with high insolation and high temperatures are common. In this work, some results of a study about urban ozone distribution patterns in the city of Badajoz, which is the largest and most industrialized city in Extremadura region (southwest Spain) are shown. Fourteen sampling campaigns, at least one per month, were carried out to measure ambient air ozone concentrations, during periods that were selected according to favourable conditions to ozone production, using an automatic portable analyzer. Later, to evaluate the ozone distribution at the city, the measured ozone data were analyzed using geostatistical techniques. Thus, first, during the exploratory analysis of data, it was revealed that they were distributed normally, which is a desirable property for the subsequent stages of the geostatistical study. Secondly, during the structural analysis of data, theoretical spherical models provided the best fit for all monthly experimental variograms. The parameters of these variograms (sill, range and nugget) revealed that the maximum distance of spatial dependence is between 302-790 m and the variable, air ozone concentration, is not evenly distributed in reduced distances. Finally, predictive ozone maps were derived for all points of the experimental study area, by use of geostatistical algorithms (kriging). High prediction accuracy was obtained in all cases as cross-validation showed. Useful information for hazard assessment was also provided when probability maps, based on kriging interpolation and kriging standard deviation, were produced.

Keywords: Kriging, map, tropospheric ozone, variogram.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1450
5 Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms

Authors: Chia-Ta Tsai, Wen-Lin Huang, Shinn-Jang Ho, Li-Sun Shu, Shinn-Ying Ho

Abstract:

Prediction of bacterial virulent protein sequences can give assistance to identification and characterization of novel virulence-associated factors and discover drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation which describes functions of genes and gene products as a controlled vocabulary of terms has been shown effectively for a variety of tasks such as gene expression study, GO annotation prediction, protein subcellular localization, etc. In this study, we propose a sequence-based method Virulent-GO by mining informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologies with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO using the single kind of GO term features with an accuracy of 82.5% is slightly better than VirulentPred with 81.8% using five kinds of sequence-based features. For the evaluation of independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When evaluating single kind of feature with SVM, the GO term feature performs much well, compared with each of the five kinds of features.

Keywords: Bacterial virulence factors, GO terms, prediction, protein sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1730
4 Detailed Mapping of Pyroclastic Flow Deposits by SAR Data Processing for an Active Volcano in the Torrid Zone

Authors: Asep Saepuloh, Katsuaki Koike

Abstract:

Field mapping activity for an active volcano mainly in the Torrid Zone is usually hampered by several problems such as steep terrain and bad atmosphere conditions. In this paper we present a simple solution for such problem by a combination Synthetic Aperture Radar (SAR) and geostatistical methods. By this combination, we could reduce the speckle effect from the SAR data and then estimate roughness distribution of the pyroclastic flow deposits. The main purpose of this study is to detect spatial distribution of new pyroclastic flow deposits termed as P-zone accurately using the β°data from two RADARSAT-1 SAR level-0 data. Single scene of Hyperion data and field observation were used for cross-validation of the SAR results. Mt. Merapi in central Java, Indonesia, was chosen as a study site and the eruptions in May-June 2006 were examined. The P-zones were found in the western and southern flanks. The area size and the longest flow distance were calculated as 2.3 km2 and 6.8 km, respectively. The grain size variation of the P-zone was mapped in detail from fine to coarse deposits regarding the C-band wavelength of 5.6 cm.

Keywords: Geostatistical Method, Mt. Merapi, Pyroclastic, RADARSAT-1.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1022
3 Performance Optimization of Data Mining Application Using Radial Basis Function Classifier

Authors: M. Govindarajan, R. M.Chandrasekaran

Abstract:

Text data mining is a process of exploratory data analysis. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before examining the data. This paper describes proposed radial basis function Classifier that performs comparative crossvalidation for existing radial basis function Classifier. The feasibility and the benefits of the proposed approach are demonstrated by means of data mining problem: direct Marketing. Direct marketing has become an important application field of data mining. Comparative Cross-validation involves estimation of accuracy by either stratified k-fold cross-validation or equivalent repeated random subsampling. While the proposed method may have high bias; its performance (accuracy estimation in our case) may be poor due to high variance. Thus the accuracy with proposed radial basis function Classifier was less than with the existing radial basis function Classifier. However there is smaller the improvement in runtime and larger improvement in precision and recall. In the proposed method Classification accuracy and prediction accuracy are determined where the prediction accuracy is comparatively high.

Keywords: Text Data Mining, Comparative Cross-validation, Radial Basis Function, runtime, accuracy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1250
2 Ensembling Adaptively Constructed Polynomial Regression Models

Authors: Gints Jekabsons

Abstract:

The approach of subset selection in polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that is sufficient to describe the target relation sufficiently well. However, in most cases the necessary set of basis functions is not known and needs to be guessed – a potentially non-trivial (and long) trial and error process. In our research we consider a potentially more efficient approach – Adaptive Basis Function Construction (ABFC). It lets the model building method itself construct the basis functions necessary for creating a model of arbitrary complexity with adequate predictive performance. However, there are two issues that to some extent plague the methods of both the subset selection and the ABFC, especially when working with relatively small data samples: the selection bias and the selection instability. We try to correct these issues by model post-evaluation using Cross-Validation and model ensembling. To evaluate the proposed method, we empirically compare it to ABFC methods without ensembling, to a widely used method of subset selection, as well as to some other well-known regression modeling methods, using publicly available data sets.

Keywords: Basis function construction, heuristic search, modelensembles, polynomial regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1319
1 An Evaluation of Algorithms for Single-Echo Biosonar Target Classification

Authors: Turgay Temel, John Hallam

Abstract:

A recent neurospiking coding scheme for feature extraction from biosonar echoes of various plants is examined with avariety of stochastic classifiers. Feature vectors derived are employedin well-known stochastic classifiers, including nearest-neighborhood,single Gaussian and a Gaussian mixture with EM optimization.Classifiers' performances are evaluated by using cross-validation and bootstrapping techniques. It is shown that the various classifers perform equivalently and that the modified preprocessing configuration yields considerably improved results.

Keywords: Classification, neuro-spike coding, non-parametricmodel, parametric model, Gaussian mixture, EM algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1298