Search results for: high dimensionality

19425 An Adaptive Dimensionality Reduction Approach for Hyperspectral Imagery Semantic Interpretation

Authors: Akrem Sellami, Imed Riadh Farah, Basel Solaiman

Abstract:

With the development of HyperSpectral Imagery (HSI) technology, the spectral resolution of HSI became denser, which resulted in large number of spectral bands, high correlation between neighboring, and high data redundancy. However, the semantic interpretation is a challenging task for HSI analysis due to the high dimensionality and the high correlation of the different spectral bands. In fact, this work presents a dimensionality reduction approach that allows to overcome the different issues improving the semantic interpretation of HSI. Therefore, in order to preserve the spatial information, the Tensor Locality Preserving Projection (TLPP) has been applied to transform the original HSI. In the second step, knowledge has been extracted based on the adjacency graph to describe the different pixels. Based on the transformation matrix using TLPP, a weighted matrix has been constructed to rank the different spectral bands based on their contribution score. Thus, the relevant bands have been adaptively selected based on the weighted matrix. The performance of the presented approach has been validated by implementing several experiments, and the obtained results demonstrate the efficiency of this approach compared to various existing dimensionality reduction techniques. Also, according to the experimental results, we can conclude that this approach can adaptively select the relevant spectral improving the semantic interpretation of HSI.

Keywords: band selection, dimensionality reduction, feature extraction, hyperspectral imagery, semantic interpretation

Procedia PDF Downloads 320

19424 A Survey of Feature Selection and Feature Extraction Techniques in Machine Learning

Authors: Samina Khalid, Shamila Nasreen

Abstract:

Dimensionality reduction as a preprocessing step to machine learning is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection and feature extraction methods with respect to efficiency and effectiveness. In the field of machine learning and pattern recognition, dimensionality reduction is important area, where many approaches have been proposed. In this paper, some widely used feature selection and feature extraction techniques have analyzed with the purpose of how effectively these techniques can be used to achieve high performance of learning algorithms that ultimately improves predictive accuracy of classifier. An endeavor to analyze dimensionality reduction techniques briefly with the purpose to investigate strengths and weaknesses of some widely used dimensionality reduction methods is presented.

Keywords: age related macular degeneration, feature selection feature subset selection feature extraction/transformation, FSA’s, relief, correlation based method, PCA, ICA

Procedia PDF Downloads 446

19423 A Spatial Hypergraph Based Semi-Supervised Band Selection Method for Hyperspectral Imagery Semantic Interpretation

Authors: Akrem Sellami, Imed Riadh Farah

Abstract:

Hyperspectral imagery (HSI) typically provides a wealth of information captured in a wide range of the electromagnetic spectrum for each pixel in the image. Hence, a pixel in HSI is a high-dimensional vector of intensities with a large spectral range and a high spectral resolution. Therefore, the semantic interpretation is a challenging task of HSI analysis. We focused in this paper on object classification as HSI semantic interpretation. However, HSI classification still faces some issues, among which are the following: The spatial variability of spectral signatures, the high number of spectral bands, and the high cost of true sample labeling. Therefore, the high number of spectral bands and the low number of training samples pose the problem of the curse of dimensionality. In order to resolve this problem, we propose to introduce the process of dimensionality reduction trying to improve the classification of HSI. The presented approach is a semi-supervised band selection method based on spatial hypergraph embedding model to represent higher order relationships with different weights of the spatial neighbors corresponding to the centroid of pixel. This semi-supervised band selection has been developed to select useful bands for object classification. The presented approach is evaluated on AVIRIS and ROSIS HSIs and compared to other dimensionality reduction methods. The experimental results demonstrate the efficacy of our approach compared to many existing dimensionality reduction methods for HSI classification.

Keywords: dimensionality reduction, hyperspectral image, semantic interpretation, spatial hypergraph

Procedia PDF Downloads 279

19422 Microarray Gene Expression Data Dimensionality Reduction Using PCA

Authors: Fuad M. Alkoot

Abstract:

Different experimental technologies such as microarray sequencing have been proposed to generate high-resolution genetic data, in order to understand the complex dynamic interactions between complex diseases and the biological system components of genes and gene products. However, the generated samples have a very large dimension reaching thousands. Therefore, hindering all attempts to design a classifier system that can identify diseases based on such data. Additionally, the high overlap in the class distributions makes the task more difficult. The data we experiment with is generated for the identification of autism. It includes 142 samples, which is small compared to the large dimension of the data. The classifier systems trained on this data yield very low classification rates that are almost equivalent to a guess. We aim at reducing the data dimension and improve it for classification. Here, we experiment with applying a multistage PCA on the genetic data to reduce its dimensionality. Results show a significant improvement in the classification rates which increases the possibility of building an automated system for autism detection.

Keywords: PCA, gene expression, dimensionality reduction, classification, autism

Procedia PDF Downloads 525

19421 Hydrothermally Fabricated 3-D Nanostructure Metal Oxide Sensors

Authors: Mohammad Alenezi

Abstract:

Hierarchical nanostructures with higher dimensionality, consisting of nanostructure building blocks such as nanowires, nanotubes, or nanosheets are very attractive. They hold great properties like the high surface-to-volume ratio and well-ordered porous structures, which can be very challenging to attain for other mono-morphological nanostructures. Well-ordered hierarchical nanostructures with high surface-to-volume ratios facilitate gas diffusion into their surfaces as well as scattering of light. Therefore, hierarchical nanostructures are expected to perform highly as gas sensors. A multistage controlled hydrothermal synthesis method to fabricate high-performance single ZnO brushlike hierarchical nanostructure gas sensor from initial nanowires is reported. The performance of the sensor based on brush-like hierarchical nanostructure is analyzed and compared to that of a nanowire gas sensor. The hierarchical gas sensor demonstrated high sensitivity toward low concentration of acetone at high speed of response. The enhancement in the hierarchical sensor performance is attributed to the increased surface to volume ratio, reduction in dimensionality of the nanowire building blocks, formation of junctions between the initial nanowire and the secondary nanowires, and enhanced gas diffusion into the surfaces of the hierarchical nanostructures.

Keywords: metal oxide, nanostructure, hydrothermal, sensor

Procedia PDF Downloads 238

19420 Using Confirmatory Factor Analysis to Test the Dimensional Structure of Tourism Service Quality

Authors: Ibrahim A. Elshaer, Alaa M. Shaker

Abstract:

Several previous empirical studies have operationalized service quality as either a multidimensional or unidimensional construct. While few earlier studies investigated some practices of the assumed dimensional structure of service quality, no study has been found to have tested the construct’s dimensionality using confirmatory factor analysis (CFA). To gain a better insight into the dimensional structure of service quality construct, this paper tests its dimensionality using three CFA models (higher order factor model, oblique factor model, and one factor model) on a set of data collected from 390 British tourists visited Egypt. The results of the three tests models indicate that service quality construct is multidimensional. This result helps resolving the problems that might arise from the lack of clarity concerning the dimensional structure of service quality, as without testing the dimensional structure of a measure, researchers cannot assume that the significant correlation is a result of factors measuring the same construct.

Keywords: service quality, dimensionality, confirmatory factor analysis, Egypt

Procedia PDF Downloads 550

19419 A Local Invariant Generalized Hough Transform Method for Integrated Circuit Visual Positioning

Authors: Wei Feilong

Abstract:

In this study, an local invariant generalized Houghtransform (LI-GHT) method is proposed for integrated circuit (IC) visual positioning. The original generalized Hough transform (GHT) is robust to external noise; however, it is not suitable for visual positioning of IC chips due to the four-dimensionality (4D) of parameter space which leads to the substantial storage requirement and high computational complexity. The proposed LI-GHT method can reduce the dimensionality of parameter space to 2D thanks to the rotational invariance of local invariant geometric feature and it can estimate the accuracy position and rotation angle of IC chips in real-time under noise and blur influence. The experiment results show that the proposed LI-GHT can estimate position and rotation angle of IC chips with high accuracy and fast speed. The proposed LI-GHT algorithm was implemented in IC visual positioning system of radio frequency identification (RFID) packaging equipment.

Keywords: Integrated Circuit Visual Positioning, Generalized Hough Transform, Local invariant Generalized Hough Transform, ICpacking equipment

Procedia PDF Downloads 234

19418 Solving Dimensionality Problem and Finding Statistical Constructs on Latent Regression Models: A Novel Methodology with Real Data Application

Authors: Sergio Paez Moncaleano, Alvaro Mauricio Montenegro

Abstract:

This paper presents a novel statistical methodology for measuring and founding constructs in Latent Regression Analysis. This approach uses the qualities of Factor Analysis in binary data with interpretations on Item Response Theory (IRT). In addition, based on the fundamentals of submodel theory and with a convergence of many ideas of IRT, we propose an algorithm not just to solve the dimensionality problem (nowadays an open discussion) but a new research field that promises more fear and realistic qualifications for examiners and a revolution on IRT and educational research. In the end, the methodology is applied to a set of real data set presenting impressive results for the coherence, speed and precision. Acknowledgments: This research was financed by Colciencias through the project: 'Multidimensional Item Response Theory Models for Practical Application in Large Test Designed to Measure Multiple Constructs' and both authors belong to SICS Research Group from Universidad Nacional de Colombia.

Keywords: item response theory, dimensionality, submodel theory, factorial analysis

Procedia PDF Downloads 338

19417 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye

Abstract:

When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is chosen either ad-hoc or is calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. Unlike in natural language, in genomics, especially in genome sequence processing, unlike in natural language processing, there is no notion of a “word,” but rather, there are sequence substrings of length k called k-mers. K-mers sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using the matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and their embedding dimension. This is completed by studying the scaling capability of three embedding algorithms, namely Latent Semantic analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurate computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 88

19416 The Spectral Power Amplification on the Regular Lattices

Authors: Kotbi Lakhdar, Hachi Mostefa

Abstract:

We show that a simple transformation between the regular lattices (the square, the triangular, and the honeycomb) belonging to the same dimensionality can explain in a natural way the universality of the critical exponents found in phase transitions and critical phenomena. It suffices that the Hamiltonian and the lattice present similar writing forms. In addition, it appears that if a property can be calculated for a given lattice then it can be extrapolated simply to any other lattice belonging to the same dimensionality. In this study, we have restricted ourselves on the spectral power amplification (SPA), we note that the SPA does not have an effect on the critical exponents but does have an effect by the criticality temperature of the lattice; the generalisation to other lattice could be shown according to the containment principle.

Keywords: ising model, phase transitions, critical temperature, critical exponent, spectral power amplification

Procedia PDF Downloads 276

19415 The Effect of Feature Selection on Pattern Classification

Authors: Chih-Fong Tsai, Ya-Han Hu

Abstract:

The aim of feature selection (or dimensionality reduction) is to filter out unrepresentative features (or variables) making the classifier perform better than the one without feature selection. Since there are many well-known feature selection algorithms, and different classifiers based on different selection results may perform differently, very few studies consider examining the effect of performing different feature selection algorithms on the classification performances by different classifiers over different types of datasets. In this paper, two widely used algorithms, which are the genetic algorithm (GA) and information gain (IG), are used to perform feature selection. On the other hand, three well-known classifiers are constructed, which are the CART decision tree (DT), multi-layer perceptron (MLP) neural network, and support vector machine (SVM). Based on 14 different types of datasets, the experimental results show that in most cases IG is a better feature selection algorithm than GA. In addition, the combinations of IG with DT and IG with SVM perform best and second best for small and large scale datasets.

Keywords: data mining, feature selection, pattern classification, dimensionality reduction

Procedia PDF Downloads 632

19414 Music Genre Classification Based on Non-Negative Matrix Factorization Features

Authors: Soyon Kim, Edward Kim

Abstract:

In order to retrieve information from the massive stream of songs in the music industry, music search by title, lyrics, artist, mood, and genre has become more important. Despite the subjectivity and controversy over the definition of music genres across different nations and cultures, automatic genre classification systems that facilitate the process of music categorization have been developed. Manual genre selection by music producers is being provided as statistical data for designing automatic genre classification systems. In this paper, an automatic music genre classification system utilizing non-negative matrix factorization (NMF) is proposed. Short-term characteristics of the music signal can be captured based on the timbre features such as mel-frequency cepstral coefficient (MFCC), decorrelated filter bank (DFB), octave-based spectral contrast (OSC), and octave band sum (OBS). Long-term time-varying characteristics of the music signal can be summarized with (1) the statistical features such as mean, variance, minimum, and maximum of the timbre features and (2) the modulation spectrum features such as spectral flatness measure, spectral crest measure, spectral peak, spectral valley, and spectral contrast of the timbre features. Not only these conventional basic long-term feature vectors, but also NMF based feature vectors are proposed to be used together for genre classification. In the training stage, NMF basis vectors were extracted for each genre class. The NMF features were calculated in the log spectral magnitude domain (NMF-LSM) as well as in the basic feature vector domain (NMF-BFV). For NMF-LSM, an entire full band spectrum was used. However, for NMF-BFV, only low band spectrum was used since high frequency modulation spectrum of the basic feature vectors did not contain important information for genre classification. In the test stage, using the set of pre-trained NMF basis vectors, the genre classification system extracted the NMF weighting values of each genre as the NMF feature vectors. A support vector machine (SVM) was used as a classifier. The GTZAN multi-genre music database was used for training and testing. It is composed of 10 genres and 100 songs for each genre. To increase the reliability of the experiments, 10-fold cross validation was used. For a given input song, an extracted NMF-LSM feature vector was composed of 10 weighting values that corresponded to the classification probabilities for 10 genres. An NMF-BFV feature vector also had a dimensionality of 10. Combined with the basic long-term features such as statistical features and modulation spectrum features, the NMF features provided the increased accuracy with a slight increase in feature dimensionality. The conventional basic features by themselves yielded 84.0% accuracy, but the basic features with NMF-LSM and NMF-BFV provided 85.1% and 84.2% accuracy, respectively. The basic features required dimensionality of 460, but NMF-LSM and NMF-BFV required dimensionalities of 10 and 10, respectively. Combining the basic features, NMF-LSM and NMF-BFV together with the SVM with a radial basis function (RBF) kernel produced the significantly higher classification accuracy of 88.3% with a feature dimensionality of 480.

Keywords: mel-frequency cepstral coefficient (MFCC), music genre classification, non-negative matrix factorization (NMF), support vector machine (SVM)

Procedia PDF Downloads 261

19413 Hierarchical Piecewise Linear Representation of Time Series Data

Authors: Vineetha Bettaiah, Heggere S. Ranganath

Abstract:

This paper presents a Hierarchical Piecewise Linear Approximation (HPLA) for the representation of time series data in which the time series is treated as a curve in the time-amplitude image space. The curve is partitioned into segments by choosing perceptually important points as break points. Each segment between adjacent break points is recursively partitioned into two segments at the best point or midpoint until the error between the approximating line and the original curve becomes less than a pre-specified threshold. The HPLA representation achieves dimensionality reduction while preserving prominent local features and general shape of time series. The representation permits course-fine processing at different levels of details, allows flexible definition of similarity based on mathematical measures or general time series shape, and supports time series data mining operations including query by content, clustering and classification based on whole or subsequence similarity.

Keywords: data mining, dimensionality reduction, piecewise linear representation, time series representation

Procedia PDF Downloads 242

19412 Novel Recommender Systems Using Hybrid CF and Social Network Information

Authors: Kyoung-Jae Kim

Abstract:

Collaborative Filtering (CF) is a popular technique for the personalization in the E-commerce domain to reduce information overload. In general, CF provides recommending items list based on other similar users’ preferences from the user-item matrix and predicts the focal user’s preference for particular items by using them. Many recommender systems in real-world use CF techniques because it’s excellent accuracy and robustness. However, it has some limitations including sparsity problems and complex dimensionality in a user-item matrix. In addition, traditional CF does not consider the emotional interaction between users. In this study, we propose recommender systems using social network and singular value decomposition (SVD) to alleviate some limitations. The purpose of this study is to reduce the dimensionality of data set using SVD and to improve the performance of CF by using emotional information from social network data of the focal user. In this study, we test the usability of hybrid CF, SVD and social network information model using the real-world data. The experimental results show that the proposed model outperforms conventional CF models.

Keywords: recommender systems, collaborative filtering, social network information, singular value decomposition

Procedia PDF Downloads 253

19411 A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

Authors: Adriano Z. Zambom, Preethi Ravikumar

Abstract:

One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work the efficiency of completely nonparametric regression estimators such as the Loess is compared to the estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is done by computing the oracle mean square error of the estimators with regards to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criteria is proposed, which is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of time it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included when nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure and the selected variables are identified.

Keywords: additive model, nonparametric regression, variable selection, Akaike Information Criteria

Procedia PDF Downloads 235

19410 Study of Thermal and Mechanical Properties of Ethylene/1-Octene Copolymer Based Nanocomposites

Authors: Sharmila Pradhan, Ralf Lach, George Michler, Jean Mark Saiter, Rameshwar Adhikari

Abstract:

Ethylene/1-octene copolymer was modified incorporating three types of nanofillers differed in their dimensionality in order to investigate the effect of filler dimensionality on mechanical properties, for instance, tensile strength, microhardness etc. The samples were prepared by melt mixing followed by compression moldings. The microstructure of the novel material was characterized by Fourier transform infrared spectroscopy (FTIR), X-ray diffraction (XRD) method and Transmission electron microscopy (TEM). Other important properties such as melting, crystallizing and thermal stability were also investigated via differential scanning calorimetry (DSC) and Thermogravimetry analysis (TGA). The FTIR and XRD results showed that the composites were formed by physical mixing. The TEM result supported the homogeneous dispersion of nanofillers in the matrix. The mechanical characterization performed by tensile testing showed that the composites with 1D nanofiller effectively reinforced the polymer. TGA results revealed that the thermal stability of pure EOC is marginally improved by the addition of nanofillers. Likewise, melting and crystallizing properties of the composites are not much different from that of pure.

Keywords: copolymer, differential scanning calorimetry, nanofiller, tensile strength

Procedia PDF Downloads 202

19409 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 302

19408 Spatial Rank-Based High-Dimensional Monitoring through Random Projection

Authors: Chen Zhang, Nan Chen

Abstract:

High-dimensional process monitoring becomes increasingly important in many application domains, where usually the process distribution is unknown and much more complicated than the normal distribution, and the between-stream correlation can not be neglected. However, since the process dimension is generally much bigger than the reference sample size, most traditional nonparametric multivariate control charts fail in high-dimensional cases due to the curse of dimensionality. Furthermore, when the process goes out of control, the influenced variables are quite sparse compared with the whole dimension, which increases the detection difficulty. Targeting at these issues, this paper proposes a new nonparametric monitoring scheme for high-dimensional processes. This scheme first projects the high-dimensional process into several subprocesses using random projections for dimension reduction. Then, for every subprocess with the dimension much smaller than the reference sample size, a local nonparametric control chart is constructed based on the spatial rank test to detect changes in this subprocess. Finally, the results of all the local charts are fused together for decision. Furthermore, after an out-of-control (OC) alarm is triggered, a diagnostic framework is proposed. using the square-root LASSO. Numerical studies demonstrate that the chart has satisfactory detection power for sparse OC changes and robust performance for non-normally distributed data, The diagnostic framework is also effective to identify truly changed variables. Finally, a real-data example is presented to demonstrate the application of the proposed method.

Keywords: random projection, high-dimensional process control, spatial rank, sequential change detection

Procedia PDF Downloads 271

19407 Enhanced Image Representation for Deep Belief Network Classification of Hyperspectral Images

Authors: Khitem Amiri, Mohamed Farah

Abstract:

Image classification is a challenging task and is gaining lots of interest since it helps us to understand the content of images. Recently Deep Learning (DL) based methods gave very interesting results on several benchmarks. For Hyperspectral images (HSI), the application of DL techniques is still challenging due to the scarcity of labeled data and to the curse of dimensionality. Among other approaches, Deep Belief Network (DBN) based approaches gave a fair classification accuracy. In this paper, we address the problem of the curse of dimensionality by reducing the number of bands and replacing the HSI channels by the channels representing radiometric indices. Therefore, instead of using all the HSI bands, we compute the radiometric indices such as NDVI (Normalized Difference Vegetation Index), NDWI (Normalized Difference Water Index), etc, and we use the combination of these indices as input for the Deep Belief Network (DBN) based classification model. Thus, we keep almost all the pertinent spectral information while reducing considerably the size of the image. In order to test our image representation, we applied our method on several HSI datasets including the Indian pines dataset, Jasper Ridge data and it gave comparable results to the state of the art methods while reducing considerably the time of training and testing.

Keywords: hyperspectral images, deep belief network, radiometric indices, image classification

Procedia PDF Downloads 241

19406 SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space

Authors: Sanaa Chafik, Imane Daoudi, Mounim A. El Yacoubi, Hamid El Ouardi

Abstract:

Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH that has been successfully applied in many multimedia applications. However, the Euclidean LSH presents limitations that affect structure and query performances. The main limitation of the Euclidean LSH is the large memory consumption. In order to achieve a good accuracy, a large number of hash tables is required. In this paper, we propose a new hashing algorithm to overcome the storage space problem and improve query time, while keeping a good accuracy as similar to that achieved by the original Euclidean LSH. The Experimental results on a real large-scale dataset show that the proposed approach achieves good performances and consumes less memory than the Euclidean LSH.

Keywords: approximate nearest neighbor search, content based image retrieval (CBIR), curse of dimensionality, locality sensitive hashing, multidimensional indexing, scalability

Procedia PDF Downloads 299

19405 Combining Diffusion Maps and Diffusion Models for Enhanced Data Analysis

Authors: Meng Su

Abstract:

High-dimensional data analysis often presents challenges in capturing the complex, nonlinear relationships and manifold structures inherent to the data. This article presents a novel approach that leverages the strengths of two powerful techniques, Diffusion Maps and Diffusion Probabilistic Models (DPMs), to address these challenges. By integrating the dimensionality reduction capability of Diffusion Maps with the data modeling ability of DPMs, the proposed method aims to provide a comprehensive solution for analyzing and generating high-dimensional data. The Diffusion Map technique preserves the nonlinear relationships and manifold structure of the data by mapping it to a lower-dimensional space using the eigenvectors of the graph Laplacian matrix. Meanwhile, DPMs capture the dependencies within the data, enabling effective modeling and generation of new data points in the low-dimensional space. The generated data points can then be mapped back to the original high-dimensional space, ensuring consistency with the underlying manifold structure. Through a detailed example implementation, the article demonstrates the potential of the proposed hybrid approach to achieve more accurate and effective modeling and generation of complex, high-dimensional data. Furthermore, it discusses possible applications in various domains, such as image synthesis, time-series forecasting, and anomaly detection, and outlines future research directions for enhancing the scalability, performance, and integration with other machine learning techniques. By combining the strengths of Diffusion Maps and DPMs, this work paves the way for more advanced and robust data analysis methods.

Keywords: diffusion maps, diffusion probabilistic models (DPMs), manifold learning, high-dimensional data analysis

Procedia PDF Downloads 62

19404 Detection and Classification of Mammogram Images Using Principle Component Analysis and Lazy Classifiers

Authors: Rajkumar Kolangarakandy

Abstract:

Feature extraction and selection is the primary part of any mammogram classification algorithms. The choice of feature, attribute or measurements have an important influence in any classification system. Discrete Wavelet Transformation (DWT) coefficients are one of the prominent features for representing images in frequency domain. The features obtained after the decomposition of the mammogram images using wavelet transformations have higher dimension. Even though the features are higher in dimension, they were highly correlated and redundant in nature. The dimensionality reduction techniques play an important role in selecting the optimum number of features from the higher dimension data, which are highly correlated. PCA is a mathematical tool that reduces the dimensionality of the data while retaining most of the variation in the dataset. In this paper, a multilevel classification of mammogram images using reduced discrete wavelet transformation coefficients and lazy classifiers is proposed. The classification is accomplished in two different levels. In the first level, mammogram ROIs extracted from the dataset is classified as normal and abnormal types. In the second level, all the abnormal mammogram ROIs is classified into benign and malignant too. A further classification is also accomplished based on the variation in structure and intensity distribution of the images in the dataset. The Lazy classifiers called Kstar, IBL and LWL are used for classification. The classification results obtained with the reduced feature set is highly promising and the result is also compared with the performance obtained without dimension reduction.

Keywords: PCA, wavelet transformation, lazy classifiers, Kstar, IBL, LWL

Procedia PDF Downloads 304

19403 Ordinary Differentiation Equations (ODE) Reconstruction of High-Dimensional Genetic Networks through Game Theory with Application to Dissecting Tree Salt Tolerance

Authors: Libo Jiang, Huan Li, Rongling Wu

Abstract:

Ordinary differentiation equations (ODE) have proven to be powerful for reconstructing precise and informative gene regulatory networks (GRNs) from dynamic gene expression data. However, joint modeling and analysis of all genes, essential for the systematical characterization of genetic interactions, are challenging due to high dimensionality and a complex pattern of genetic regulation including activation, repression, and antitermination. Here, we address these challenges by unifying variable selection and game theory through ODE. Each gene within a GRN is co-expressed with its partner genes in a way like a game of multiple players, each of which tends to choose an optimal strategy to maximize its “fitness” across the whole network. Based on this unifying theory, we designed and conducted a real experiment to infer salt tolerance-related GRNs for Euphrates poplar, a hero tree that can grow in the saline desert. The pattern and magnitude of interactions between several hub genes within these GRNs were found to determine the capacity of Euphrates poplar to resist to saline stress.

Keywords: gene regulatory network, ordinary differential equation, game theory, LASSO, saline resistance

Procedia PDF Downloads 609

19402 Mathematical Based Forecasting of Heart Attack

Authors: Razieh Khalafi

Abstract:

Myocardial infarction (MI) or acute myocardial infarction (AMI), commonly known as a heart attack, occurs when blood flow stops to part of the heart causing damage to the heart muscle. An ECG can often show evidence of a previous heart attack or one that's in progress. The patterns on the ECG may indicate which part of your heart has been damaged, as well as the extent of the damage. In chaos theory, the correlation dimension is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. In this research by considering ECG signal as a random walk we work on forecasting the oncoming heart attack by analyzing the ECG signals using the correlation dimension. In order to test the model a set of ECG signals for patients before and after heart attack was used and the strength of model for forecasting the behavior of these signals were checked. Results shows this methodology can forecast the ECG and accordingly heart attack with high accuracy.

Keywords: heart attack, ECG, random walk, correlation dimension, forecasting

Procedia PDF Downloads 500

19401 A New Mathematical Method for Heart Attack Forecasting

Authors: Razi Khalafi

Abstract:

Myocardial Infarction (MI) or acute Myocardial Infarction (AMI), commonly known as a heart attack, occurs when blood flow stops to part of the heart causing damage to the heart muscle. An ECG can often show evidence of a previous heart attack or one that's in progress. The patterns on the ECG may indicate which part of your heart has been damaged, as well as the extent of the damage. In chaos theory, the correlation dimension is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. In this research by considering ECG signal as a random walk we work on forecasting the oncoming heart attack by analysing the ECG signals using the correlation dimension. In order to test the model a set of ECG signals for patients before and after heart attack was used and the strength of model for forecasting the behaviour of these signals were checked. Results show this methodology can forecast the ECG and accordingly heart attack with high accuracy.

Keywords: heart attack, ECG, random walk, correlation dimension, forecasting

Procedia PDF Downloads 463

19400 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to information overflow problem. In order to solve these problems for effective information retrieval, document clustering in text mining becomes a popular research topic. Clustering is the unsupervised classification of data items into groups without the need of training data. Many conventional document clustering methods perform inefficiently for large document collections because they were originally designed for relational database. Therefore they are impractical in real-world document clustering and require special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, which is a hierarchical clustering method developed for document clustering, where the intuition of FIHC is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds hierarchical topic tree. In this paper, we combine FIHC algorithm with ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use the closed frequent itemsets instead of only use frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than those of well-known document clustering algorithms.

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 366

19399 Quantum Computing with Qudits on a Graph

Authors: Aleksey Fedorov

Abstract:

Building a scalable platform for quantum computing remains one of the most challenging tasks in quantum science and technologies. However, the implementation of most important quantum operations with qubits (quantum analogues of classical bits), such as multiqubit Toffoli gate, requires either a polynomial number of operation or a linear number of operations with the use of ancilla qubits. Therefore, the reduction of the number of operations in the presence of scalability is a crucial goal in quantum information processing. One of the most elegant ideas in this direction is to use qudits (multilevel systems) instead of qubits and rely on additional levels of qudits instead of ancillas. Although some of the already obtained results demonstrate a reduction of the number of operation, they suffer from high complexity and/or of the absence of scalability. We show a strong reduction of the number of operations for the realization of the Toffoli gate by using qudits for a scalable multi-qudit processor. This is done on the basis of a general relation between the dimensionality of qudits and their topology of connections, that we derived.

Keywords: quantum computing, qudits, Toffoli gates, gate decomposition

Procedia PDF Downloads 112

19398 Approach Based on Fuzzy C-Means for Band Selection in Hyperspectral Images

Authors: Diego Saqui, José H. Saito, José R. Campos, Lúcio A. de C. Jorge

Abstract:

Hyperspectral images and remote sensing are important for many applications. A problem in the use of these images is the high volume of data to be processed, stored and transferred. Dimensionality reduction techniques can be used to reduce the volume of data. In this paper, an approach to band selection based on clustering algorithms is presented. This approach allows to reduce the volume of data. The proposed structure is based on Fuzzy C-Means (or K-Means) and NWHFC algorithms. New attributes in relation to other studies in the literature, such as kurtosis and low correlation, are also considered. A comparison of the results of the approach using the Fuzzy C-Means and K-Means with different attributes is performed. The use of both algorithms show similar good results but, particularly when used attributes variance and kurtosis in the clustering process, however applicable in hyperspectral images.

Keywords: band selection, fuzzy c-means, k-means, hyperspectral image

Procedia PDF Downloads 365

19397 Stacking Ensemble Approach for Combining Different Methods in Real Estate Prediction

Authors: Sol Girouard, Zona Kostic

Abstract:

A home is often the largest and most expensive purchase a person makes. Whether the decision leads to a successful outcome will be determined by a combination of critical factors. In this paper, we propose a method that efficiently handles all the factors in residential real estate and performs predictions given a feature space with high dimensionality while controlling for overfitting. The proposed method was built on gradient descent and boosting algorithms and uses a mixed optimizing technique to improve the prediction power. Usually, a single model cannot handle all the cases thus our approach builds multiple models based on different subsets of the predictors. The algorithm was tested on 3 million homes across the U.S., and the experimental results demonstrate the efficiency of this approach by outperforming techniques currently used in forecasting prices. With everyday changes on the real estate market, our proposed algorithm capitalizes from new events allowing more efficient predictions.

Keywords: real estate prediction, gradient descent, boosting, ensemble methods, active learning, training

Procedia PDF Downloads 241

19396 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment Analysis is the process of detecting the contextual polarity of text. In other words, it determines whether a piece of writing is positive, negative or neutral.Sentiment analysis of documents holds great importance in today's world, when numerous information is stored in databases and in the world wide web. An efficient algorithm to illicit such information, would be beneficial for social, economic as well as medical purposes. In this project, we have developed an algorithm to classify a document into positive or negative. Using our algorithm, we obtained a feature set from the data, and classified the documents based on this feature set. It is important to note that, in the classification, we have not used the independence assumption, which is considered by many procedures like the Naive Bayes. This makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use empirical distribution for estimation, but developed a method by finding degree of close clustering of the data points. We have applied our algorithm on a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, Run's Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 369