Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 14

Search results for: high dimensional data

14 Bidirectional Discriminant Supervised Locality Preserving Projection for Face Recognition

Authors: Yiqin Lin, Wenbo Li

Abstract:

Dimensionality reduction and feature extraction are of crucial importance for achieving high efficiency in manipulating the high dimensional data. Two-dimensional discriminant locality preserving projection (2D-DLPP) and two-dimensional discriminant supervised LPP (2D-DSLPP) are two effective two-dimensional projection methods for dimensionality reduction and feature extraction of face image matrices. Since 2D-DLPP and 2D-DSLPP preserve the local structure information of the original data and exploit the discriminant information, they usually have good recognition performance. However, 2D-DLPP and 2D-DSLPP only employ single-sided projection, and thus the generated low dimensional data matrices have still many features. In this paper, by combining the discriminant supervised LPP with the bidirectional projection, we propose the bidirectional discriminant supervised LPP (BDSLPP). The left and right projection matrices for BDSLPP can be computed iteratively. Experimental results show that the proposed BDSLPP achieves higher recognition accuracy than 2D-DLPP, 2D-DSLPP, and bidirectional discriminant LPP (BDLPP).

Keywords: Face Recognition, Dimension Reduction, locality preserving projection, discriminant information, bidirectional projection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 64
13 Automatic Threshold Search for Heat Map Based Feature Selection: A Cancer Dataset Analysis

Authors: Carlos Huertas, Reyes Juarez-Ramirez

Abstract:

Public health is one of the most critical issues today; therefore, there is great interest to improve technologies in the area of diseases detection. With machine learning and feature selection, it has been possible to aid the diagnosis of several diseases such as cancer. In this work, we present an extension to the Heat Map Based Feature Selection algorithm, this modification allows automatic threshold parameter selection that helps to improve the generalization performance of high dimensional data such as mass spectrometry. We have performed a comparison analysis using multiple cancer datasets and compare against the well known Recursive Feature Elimination algorithm and our original proposal, the results show improved classification performance that is very competitive against current techniques.

Keywords: Cancer, Mass Spectrometry, Feature selection, biomarker discovery

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1068
12 Distribution Sampling of Vector Variance without Duplications

Authors: Erna T. Herdiani, Maman A. Djauhari

Abstract:

In recent years, the use of vector variance as a measure of multivariate variability has received much attention in wide range of statistics. This paper deals with a more economic measure of multivariate variability, defined as vector variance minus all duplication elements. For high dimensional data, this will increase the computational efficiency almost 50 % compared to the original vector variance. Its sampling distribution will be investigated to make its applications possible.

Keywords: covariance matrix, likelihood ratio test, Asymptotic distribution, vector variance

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1173
11 Visualization and Indexing of Spectral Databases

Authors: Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, Janos Abonyi

Abstract:

On-line (near infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral database can be used to estimate unmeasured product properties and monitor the operation of the process. These techniques are based on looking for similar spectra by nearest neighborhood algorithms and distance based searching methods. Search for nearest neighbors in the spectral space is an NP-hard problem, the computational complexity increases by the number of points in the discrete spectrum and the number of samples in the database. To reduce the calculation time some kind of indexing could be used. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirement of estimation algorithms by providing a two dimensional indexing that can also be used to visualize the structure of the spectral database. This 2D visualization of spectral database does not only support application of distance and similarity based techniques but enables the utilization of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. This means the prediction has not to use the high dimension space but can be based on the mapped space too. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance based learning algorithms.

Keywords: Clustering, similarity, indexing high dimensional databases, dimensional reduction, k-nn algorithm

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1423
10 Support Vector Fuzzy Based Neural Networks For Exchange Rate Modeling

Authors: Prof. Chokri SLIM

Abstract:

A Novel fuzzy neural network combining with support vector learning mechanism called support-vector-based fuzzy neural networks (SVBFNN) is proposed. The SVBFNN combine the capability of minimizing the empirical risk (training error) and expected risk (testing error) of support vector learning in high dimensional data spaces and the efficient human-like reasoning of FNN.

Keywords: Machine Learning, Neural Network, fuzzy inference, fuzzy modeling and rule extraction, support vector regression

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16223
9 Dimension Reduction of Microarray Data Based on Local Principal Component

Authors: Ali Anaissi, Paul J. Kennedy, Madhu Goyal

Abstract:

Analysis and visualization of microarraydata is veryassistantfor biologists and clinicians in the field of diagnosis and treatment of patients. It allows Clinicians to better understand the structure of microarray and facilitates understanding gene expression in cells. However, microarray dataset is a complex data set and has thousands of features and a very small number of observations. This very high dimensional data set often contains some noise, non-useful information and a small number of relevant features for disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm Local Principal Component (LPC) which aims to maps high dimensional data to a lower dimensional space. The reduced data represents the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments also show how this algorithm reduces high dimensional data whilst preserving the neighbourhoods of the points in the low dimensional space as in the high dimensional space.

Keywords: Principal Component Analysis, Linear Dimension Reduction, Non-Linear Dimension Reduction, Biologists

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1249
8 Assessing and Visualizing the Stability of Feature Selectors: A Case Study with Spectral Data

Authors: R.Guzman-Martinez, Oscar Garcia-Olalla, R.Alaiz-Rodriguez

Abstract:

Feature selection plays an important role in applications with high dimensional data. The assessment of the stability of feature selection/ranking algorithms becomes an important issue when the dataset is small and the aim is to gain insight into the underlying process by analyzing the most relevant features. In this work, we propose a graphical approach that enables to analyze the similarity between feature ranking techniques as well as their individual stability. Moreover, it works with whatever stability metric (Canberra distance, Spearman's rank correlation coefficient, Kuncheva's stability index,...). We illustrate this visualization technique evaluating the stability of several feature selection techniques on a spectral binary dataset. Experimental results with a neural-based classifier show that stability and ranking quality may not be linked together and both issues have to be studied jointly in order to offer answers to the domain experts.

Keywords: Data Visualization, Feature Selection Stability, Spectral data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1183
7 An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering

Authors: Dharmveer Singh Rajput , P. K. Singh, Mahua Bhattacharya

Abstract:

Clustering in high dimensional space is a difficult problem which is recurrent in many fields of science and engineering, e.g., bioinformatics, image processing, pattern reorganization and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering. In very high dimensions it is common for all the objects in a dataset to be nearly equidistant from each other, completely masking the clusters. Hence, performance of the clustering algorithm decreases. In this paper, we propose an algorithmic framework which combines the (reduct) concept of rough set theory with the k-means algorithm to remove the irrelevant dimensions in a high dimensional space and obtain appropriate clusters. Our experiment on test data shows that this framework increases efficiency of the clustering process and accuracy of the results.

Keywords: rough set, discernibility matrix, k-means, High dimensional clustering, sub-space

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1498
6 Evaluation of the Impact of Dataset Characteristics for Classification Problems in Biological Applications

Authors: Kanthida Kusonmano, Michael Netzer, Bernhard Pfeifer, Christian Baumgartner, Klaus R. Liedl, Armin Graber

Abstract:

Availability of high dimensional biological datasets such as from gene expression, proteomic, and metabolic experiments can be leveraged for the diagnosis and prognosis of diseases. Many classification methods in this area have been studied to predict disease states and separate between predefined classes such as patients with a special disease versus healthy controls. However, most of the existing research only focuses on a specific dataset. There is a lack of generic comparison between classifiers, which might provide a guideline for biologists or bioinformaticians to select the proper algorithm for new datasets. In this study, we compare the performance of popular classifiers, which are Support Vector Machine (SVM), Logistic Regression, k-Nearest Neighbor (k-NN), Naive Bayes, Decision Tree, and Random Forest based on mock datasets. We mimic common biological scenarios simulating various proportions of real discriminating biomarkers and different effect sizes thereof. The result shows that SVM performs quite stable and reaches a higher AUC compared to other methods. This may be explained due to the ability of SVM to minimize the probability of error. Moreover, Decision Tree with its good applicability for diagnosis and prognosis shows good performance in our experimental setup. Logistic Regression and Random Forest, however, strongly depend on the ratio of discriminators and perform better when having a higher number of discriminators.

Keywords: Machine Learning, classification, high dimensional data

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1955
5 ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset

Authors: Sunita Jahirabadkar, Parag Kulkarni

Abstract:

Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-art techniques. ISC determines the input parameter such as Ñ” – distance at various levels of Subspace Clustering which helps in finding meaningful clusters. The uniform parameters approach is not suitable for different kind of databases. ISC implements dynamic and adaptive determination of Meaningful clustering parameters based on hierarchical filtering approach. Third and most important feature of ISC is the ability of incremental learning and dynamic inclusion and exclusions of subspaces which lead to better cluster formation.

Keywords: high dimensional data, Subspace Clustering, Density based clustering, dynamic parameter setting

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1650
4 Unsupervised Feature Selection Using Feature Density Functions

Authors: Mina Alibeigi, Sattar Hashemi, Ali Hamzeh

Abstract:

Since dealing with high dimensional data is computationally complex and sometimes even intractable, recently several feature reductions methods have been developed to reduce the dimensionality of the data in order to simplify the calculation analysis in various applications such as text categorization, signal processing, image retrieval, gene expressions and etc. Among feature reduction techniques, feature selection is one the most popular methods due to the preservation of the original features. In this paper, we propose a new unsupervised feature selection method which will remove redundant features from the original feature space by the use of probability density functions of various features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on the several datasets derived from UCI repository database, illustrate the effectiveness of our proposed methods in comparison with the other compared methods in terms of both classification accuracy and the number of selected features.

Keywords: Feature selection, filter, feature, Probability Density Function

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673
3 Network Intrusion Detection Design Using Feature Selection of Soft Computing Paradigms

Authors: T. S. Chou, K. K. Yen, J. Luo

Abstract:

The network traffic data provided for the design of intrusion detection always are large with ineffective information and enclose limited and ambiguous information about users- activities. We study the problems and propose a two phases approach in our intrusion detection design. In the first phase, we develop a correlation-based feature selection algorithm to remove the worthless information from the original high dimensional database. Next, we design an intrusion detection method to solve the problems of uncertainty caused by limited and ambiguous information. In the experiments, we choose six UCI databases and DARPA KDD99 intrusion detection data set as our evaluation tools. Empirical studies indicate that our feature selection algorithm is capable of reducing the size of data set. Our intrusion detection method achieves a better performance than those of participating intrusion detectors.

Keywords: Intrusion Detection, Feature selection, Fuzzy Clustering, Dempster-Shafer theory, k-nearest neighbors

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1503
2 An Intelligent Human-Computer Interaction System for Decision Support

Authors: Chee Siong Teh, Chee Peng Lim

Abstract:

This paper proposes a novel architecture for developing decision support systems. Unlike conventional decision support systems, the proposed architecture endeavors to reveal the decision-making process such that humans' subjectivity can be incorporated into a computerized system and, at the same time, to preserve the capability of the computerized system in processing information objectively. A number of techniques used in developing the decision support system are elaborated to make the decisionmarking process transparent. These include procedures for high dimensional data visualization, pattern classification, prediction, and evolutionary computational search. An artificial data set is first employed to compare the proposed approach with other methods. A simulated handwritten data set and a real data set on liver disease diagnosis are then employed to evaluate the efficacy of the proposed approach. The results are analyzed and discussed. The potentials of the proposed architecture as a useful decision support system are demonstrated.

Keywords: Topographic Map, pattern classification, Interactive evolutionary computation, multivariate data projection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1109
1 Investigation of Some Technical Indexes inStock Forecasting Using Neural Networks

Authors: Myungsook Klassen

Abstract:

Training neural networks to capture an intrinsic property of a large volume of high dimensional data is a difficult task, as the training process is computationally expensive. Input attributes should be carefully selected to keep the dimensionality of input vectors relatively small. Technical indexes commonly used for stock market prediction using neural networks are investigated to determine its effectiveness as inputs. The feed forward neural network of Levenberg-Marquardt algorithm is applied to perform one step ahead forecasting of NASDAQ and Dow stock prices.

Keywords: Neural Networks, stock market prediction, Levenberg-Marquadt Algorithm, Technical Indexes

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1608