Search results for: curse of dimensionality
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 138

108 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis of high-dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because their results can be interpreted directly in terms of the original features. Unsupervised feature selection methods can be categorized into feature subset selection and feature ranking methods; we focus on unsupervised feature ranking methods, which evaluate features based on their importance scores. Recently, several unsupervised feature ranking methods have been developed based on ensemble approaches to achieve higher accuracy and stability. However, most ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate feature importance based on the ensemble clustering solution, so they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we propose an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with a random subspace ensemble, and all evaluation results are combined into ensemble importance scores. Moreover, by using the multiple-k ensemble idea, FRRM does not require the true number of clusters to be determined in advance. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrate that the proposed FRRM outperforms the competing methods.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking
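
The abstract does not spell out how FRRM scores a feature within a single ensemble member, so the following is only a minimal sketch of the random-subspace, multiple-k idea under stated assumptions. In each round a random subset of features is clustered with k-means, with k drawn at random from a range (standing in for the multiple-k ensemble), and each sampled feature is scored by the share of its variance explained by the resulting partition; scores are averaged over all rounds in which a feature was sampled. The function names, this scoring rule, and the default parameters are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns a cluster label for each row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def feature_r2(X, labels):
    """Per-feature share of variance explained by the partition (between / total SS)."""
    overall = X.mean(axis=0)
    total = ((X - overall) ** 2).sum(axis=0)
    between = np.zeros(X.shape[1])
    for j in np.unique(labels):
        Xj = X[labels == j]
        between += len(Xj) * (Xj.mean(axis=0) - overall) ** 2
    return between / np.maximum(total, 1e-12)

def rank_features(X, n_rounds=100, subspace_size=3, k_range=(2, 6), seed=0):
    """Hypothetical random-subspace / multiple-k importance scores (higher = more important)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    score_sum = np.zeros(n_features)
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        feats = rng.choice(n_features, size=subspace_size, replace=False)  # random subspace
        k = rng.integers(k_range[0], k_range[1] + 1)                      # random k per round
        labels = kmeans(X[:, feats], k, rng)
        score_sum[feats] += feature_r2(X[:, feats], labels)
        counts[feats] += 1
    return score_sum / np.maximum(counts, 1)  # average over rounds that sampled each feature
```

On data where only a few features carry cluster structure, those features should receive the highest average scores, without the true number of clusters ever being supplied.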

107 Evaluation of E-Government Service Quality

Authors: Nguyen Manh Hien

Abstract:

Service quality is the most important requirement for users, especially for electronic government services. Over the past decades, it has become a major area of academic investigation, and many studies have evaluated its dimensions in e-service contexts. This study also identifies the dimensions of service quality, but it focuses on a new conceptual framework and provides a new methodology for developing measurement scales of e-service quality, covering information quality, service quality, and organization quality. Finally, the study suggests a key factor for better evaluating e-government service quality.

Keywords: dimensionality, e-government, e-service, e-service quality

106 Questioning the Predominant Feminism in Ahalya, a Short Film by Sujoy Ghosh

Authors: Somya Sharma

Abstract:

Ahalya, the critically acclaimed short film, is known for demolishing the gender constructs of the age-old myth of Ahalya. This paper tries to crack the overt meaning of the short film by reading between the dialogues and deconstructing the idea of pseudo-feminism in Sujoy Ghosh's Ahalya. The film subverts the role of the male character by making him seem submissive compared with the female character, but this appears to be only a surface-level reading of the text. Sujoy Ghosh seems not only to change the paradigm but also to try to alter history by doing so. The paper explores the age-old myth that places Ahalya among the five virgins (panchkanya) of Hindu mythology. God's manoeuvre cannot be questioned, and the two male characters once again shape the deeds and the life of the female character, Ahalya. It is important to note that even in the 21st century, progressive actors such as Radhika Apte fail to acknowledge the politics of altering history in a way that is not progressive. On first viewing, the film blinds the viewer into admiring the female character's strength and ownership of her sexuality, which is reflected in the opening scene itself, where she opens the gate for the policeman Indra Sen (representing the god Indra, who seduced her), who is charmed by her white dress. White, in Hindu mythology, stands for mourning, and this can be read as a hint of what is about to come. Ahalya, bold, strong, and confident in this scene, seems to be in total ownership of her sexual identity. As the film progresses, Ahalya's control over her acts becomes even more dominant. In the myth of Ahalya, Gautama Maharishi, her husband, who wins her through Brahma's favour, curses her for her infidelity. She is turned into a stone because of the curse and is redeemed when Lord Rama's foot brushes the stone. In the film, it is with the help of Ahalya that Goutam Sadhu turns Indra Sen into a stone doll.
Ahalya is seen as a seductress who bewitches Indra Sen, and because he falls into the trap laid by the husband-and-wife duo, he is turned into a doll. The paper attempts to read Ahalya as a stand-in wife who is yet again a pawn in Goutama's revenge on Indra (who in the myth escapes any curse or punishment for the act). The paper therefore reverses the reading that the film has so far been taken to signify and studies the feminism that the film appropriates. It is essential to break down the structure formed by such overtly transgressive films in order to show how feminism is twisted and moulded according to a man's wishes.

Keywords: deconstructing, Hindu mythology, Panchkanya, predominant feminism, seductress, stone doll

105 Construction of the Large Scale Biological Networks from Microarrays

Authors: Fadhl Alakwaa

Abstract:

One of the long-standing goals of systems biology is understanding gene-gene interactions. Hence, gene regulatory networks (GRNs) need to be constructed to understand disease ontology and to reduce the cost of drug development. Constructing gene regulatory networks from gene expression data requires overcoming many challenges, such as data denoising and high dimensionality. In this paper, we develop an integrated system that reduces the data dimension and removes noise. The network generated by our system was validated against available interaction databases and compared with previous methods. The results demonstrate the performance of the proposed method.

Keywords: gene regulatory network, biclustering, denoising, system biology

104 Fast and Accurate Finite-Difference Method Solving Multicomponent Smoluchowski Coagulation Equation

Authors: Alexander P. Smirnov, Sergey A. Matveev, Dmitry A. Zheltkov, Eugene E. Tyrtyshnikov

Abstract:

We propose a new computational technique for the multidimensional (multicomponent) Smoluchowski coagulation equation. Using low-rank approximations in the Tensor Train format of both the solution and the coagulation kernel, we accelerate the classical finite-difference Runge-Kutta scheme while keeping its level of accuracy. The complexity of the finite-difference scheme is reduced from O(N^{2d}) to O(d^2 N log N), where N is the number of grid nodes and d is the dimensionality of the problem. The efficiency and accuracy of the new method are demonstrated on a concrete problem with a known analytical solution.

Keywords: tensor train decomposition, multicomponent Smoluchowski equation, Runge-Kutta scheme, convolution
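
For orientation, the classical scheme that the paper accelerates can be sketched for the simplest case d = 1 (a single component). This is a minimal sketch under assumptions not stated in the abstract: a constant kernel K(i, j) = 1 and a monodisperse initial condition, for which the exact solution n_k(t) = (t/2)^(k-1) / (1 + t/2)^(k+1) is known. The truncated system is integrated with classical fourth-order Runge-Kutta, and the gain term is a discrete convolution that np.convolve evaluates directly, costing O(N^2) per right-hand-side evaluation (the same direct evaluation grows to O(N^{2d}) with d components). It is not the authors' Tensor Train method.

```python
import numpy as np

def smoluchowski_rhs(n):
    """dn/dt for the truncated single-component Smoluchowski equation with K(i, j) = 1.
    n[k - 1] is the concentration of clusters of size k, for k = 1..N."""
    N = len(n)
    conv = np.convolve(n, n)          # conv[m] = sum over i + j = m of n[i] * n[j] (0-based)
    gain = np.zeros(N)
    gain[1:] = 0.5 * conv[:N - 1]     # size k is formed from sizes summing to k: m = k - 2
    loss = n * n.sum()                # size-k clusters colliding with any cluster
    return gain - loss

def rk4(n, dt, steps):
    """Classical fourth-order Runge-Kutta time stepping."""
    for _ in range(steps):
        k1 = smoluchowski_rhs(n)
        k2 = smoluchowski_rhs(n + 0.5 * dt * k1)
        k3 = smoluchowski_rhs(n + 0.5 * dt * k2)
        k4 = smoluchowski_rhs(n + dt * k3)
        n = n + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return n

N = 50
n0 = np.zeros(N)
n0[0] = 1.0                                  # monodisperse start: monomers only
n_final = rk4(n0, dt=0.01, steps=100)        # integrate to t = 1
k = np.arange(1, N + 1)
exact = 0.5 ** (k - 1) / 1.5 ** (k + 1)      # exact solution at t = 1
print(np.max(np.abs(n_final - exact)))
```

Replacing the direct convolution with an FFT-based one would bring that step down to O(N log N) for one component; the low-rank Tensor Train machinery is what keeps the multicomponent case tractable.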

103 Features Dimensionality Reduction and Multi-Dimensional Voice-Processing Program to Parkinson Disease Discrimination

Authors: Djamila Meghraoui, Bachir Boudraa, Thouraya Meksen, M. Boudraa

Abstract:

Parkinson's disease is a pathology that involves characteristic perturbations in patients' voices. This paper describes a method that aims to diagnose persons with Parkinson's (PWP) by analyzing their voice signals online. First, threshold alterations in the voice signals are determined by the Multi-Dimensional Voice Program (MDVP). Principal Component Analysis (PCA) is then used to select the main principal components of the voice that are significantly affected in a patient. The decision phase is realized by a Multinomial Naive Bayes (MNB) classifier that assigns an analyzed voice to one of two classes: healthy or PWP. The achieved prediction accuracy of 98.8% is very promising.

Keywords: Parkinson’s disease recognition, PCA, MDVP, multinomial Naive Bayes
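
The PCA step can be sketched with a singular value decomposition. The abstract does not say how many components were kept, so this sketch assumes a cumulative explained-variance threshold (95% by default); the MDVP measurements are replaced by a generic samples-by-features matrix, and the classifier stage is not shown.

```python
import numpy as np

def pca_reduce(X, var_threshold=0.95):
    """Project X (samples x features) onto the fewest principal components whose
    cumulative explained variance reaches var_threshold."""
    Xc = X - X.mean(axis=0)                          # center every feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)              # variance ratio per component
    n_comp = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    n_comp = min(n_comp, len(s))                     # guard against rounding just below 1.0
    return Xc @ Vt[:n_comp].T, explained[:n_comp]
```

The returned scores would then be the inputs to whatever classifier follows; the threshold is the single knob that trades compactness against retained variance.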

102 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring

Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti

Abstract:

Autonomous structural health monitoring (SHM) of structures and bridges has become a topic of paramount importance for maintenance and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and anomaly detection in a bridge from vibrational data, and compares different feature extraction schemes to increase accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of its extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by density-based time-domain filtering (tracking). The extracted fundamental frequencies are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer), which tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., mean value, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)), which combines the fundamental frequencies to extract new damage-sensitive features in a low-dimensional feature space. Finally, one-class classifier (OCC) algorithms perform anomaly detection; they are trained with standard-condition points and tested with both normal and anomalous ones. In particular, a new anomaly detection strategy is proposed, namely one-class classifier neural network two (OCCNN2), which exploits the classification capability of standard classifiers in an anomaly detection problem, finding the standard class (the boundary of the feature space in normal operating conditions) through a two-step approach: coarse and fine boundary estimation.
The coarse estimation uses classical OCC techniques, while the fine estimation is performed by a feedforward neural network (NN) trained on the boundaries estimated in the coarse step. The detection algorithms are then compared with known methods based on principal component analysis (PCA), kernel principal component analysis (KPCA), and auto-associative neural network (ANN). In many cases, the proposed solution improves performance with respect to the standard OCC algorithms in terms of F1 score and accuracy. In particular, when the correct features are selected, anomalies can be detected with accuracy and F1 score greater than 96% using the proposed method.

Keywords: anomaly detection, frequencies selection, modal analysis, neural network, sensor network, structural health monitoring, vibration measurement
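
As a point of reference for the classical OCC techniques used in the coarse step, a one-class classifier can be as simple as a Mahalanobis-distance boundary learned from normal-condition data only. The sketch below is such a baseline, not OCCNN2; the 99% training quantile used as the boundary and the synthetic four-frequency features in the test (not Z-24 data) are assumptions. An F1 helper is included because the abstract reports results in terms of F1 score.

```python
import numpy as np

class MahalanobisOCC:
    """One-class classifier trained on normal-condition feature vectors only.
    A point is flagged as anomalous when its Mahalanobis distance from the
    training mean exceeds the chosen quantile of the training distances."""

    def __init__(self, quantile=0.99):
        self.quantile = quantile

    def fit(self, X):
        self.mean = X.mean(axis=0)
        self.precision = np.linalg.inv(np.cov(X, rowvar=False))
        self.threshold = np.quantile(self._distance(X), self.quantile)
        return self

    def _distance(self, X):
        D = X - self.mean
        return np.sqrt(np.einsum("ij,jk,ik->i", D, self.precision, D))

    def predict(self, X):
        return self._distance(X) > self.threshold   # True means "anomaly"

def f1_score(y_true, y_pred):
    """F1 score with "anomaly" (True) as the positive class."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 2 * tp / (2 * tp + fp + fn)
```

The quantile sets the expected false-alarm rate on normal data; a two-step method such as OCCNN2 refines this kind of boundary rather than replacing the idea of training on normal points only.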

101 Strong Antiferromagnetic Super Exchange in AgF2

Authors: Wojciech Grochala

Abstract:

AgF2 is an important two-dimensional antiferromagnet and an analogue of the [CuO2]^2− sheet. However, the strength of its magnetic superexchange and its magnetic dimensionality have not been explored before. Here we report our recent Raman and neutron scattering experiments, which have led to a better understanding of the magnetic properties of the title compound. The intra-sheet magnetic superexchange constant reaches 70 meV, some 2/3 of the value measured for the parent compounds of oxocuprate superconductors, which exceeds 100 meV. The ratio of intra- to inter-sheet superexchange constants is of the order of 10^2, rendering AgF2 a quasi-2D material, similar to the oxocuprates. Quantum mechanical calculations reproduce these values quite well and point to substantial covalence of the Ag–F bonding. After three decades of intense research on layered oxocuprates, AgF2 now stands as a second-to-none analogue of these fascinating systems. It remains to be seen whether this parent compound can be doped to achieve superconductivity.

Keywords: antiferromagnets, superexchange, silver, fluorine

100 Broad Survey of Fine Root Traits to Investigate the Root Economic Spectrum Hypothesis and Plant-Fire Dynamics Worldwide

Authors: Jacob Lewis Watts, Adam F. A. Pellegrini

Abstract:

Prairies, grasslands, and forests cover an expansive portion of the world's surface and contribute significantly to Earth's carbon cycle. The largest driver of carbon dynamics in some of these ecosystems is fire. As the global climate changes, most fire-dominated ecosystems will experience increased fire frequency and intensity, leading to increased carbon flux into the atmosphere and soil nutrient depletion. The plant communities associated with different fire regimes are important for the reassimilation of carbon lost during fire and for soil recovery. More frequent fires promote conservative plant functional traits aboveground; however, belowground fine root traits are poorly explored and are arguably more important drivers of ecosystem function, as they form the primary interface between soil and plant. The root economic spectrum (RES) hypothesis describes single-dimensional covariation between important fine-root traits along a range of plant strategies from acquisitive to conservative, parallel to the well-established leaf economic spectrum (LES). However, because of the paucity of root trait data, the complex nature of the rhizosphere, and the phylogenetic conservatism of root traits, it is unknown whether the RES hypothesis accurately describes plant nutrient and water acquisition strategies. This project uses plants grown in common garden conditions in the Cambridge University Botanic Garden and a meta-analysis of long-term fire manipulation experiments to examine the belowground physiological traits of fire-adapted and non-fire-adapted herbaceous species in order to 1) test the RES hypothesis and 2) describe the effect of fire regimes on fine root functional traits, which in turn affect carbon and nutrient cycling. A suite of morphological, chemical, and biological root traits (e.g., root diameter, specific root length, percent N, and percent mycorrhizal colonization) of 50 herbaceous species was measured and tested for phylogenetic conservatism and RES dimensionality.
The traits of fire-adapted and non-fire-adapted plants were compared using phylogenetic PCA techniques. Preliminary evidence suggests that phylogenetic conservatism may weaken the single-dimensionality of the RES, implying that there may not be a single way in which plants optimize nutrient and water acquisition and storage in the complex rhizosphere; additionally, fire-adapted species are expected to be more conservative than non-fire-adapted species, which may indicate slower carbon cycling with increasing fire frequency and intensity.

Keywords: climate change, fire regimes, root economic spectrum, fine roots

99 Curvelet Features with Mouth and Face Edge Ratios for Facial Expression Identification

Authors: S. Kherchaoui, A. Houacine

Abstract:

This paper presents a facial expression recognition system that identifies and classifies the seven basic expressions: happiness, surprise, fear, disgust, sadness, anger, and the neutral state. It consists of three main parts. The first is the detection of the face and the corresponding facial features to extract the most expressive portion of the face, followed by normalization of the region of interest. Second, curvelet coefficients are computed, and their dimensionality is reduced through principal component analysis. The resulting coefficients are combined with two ratios, the mouth ratio and the face edge ratio, to form the complete feature vector. The third step is the classification of the emotional state using an SVM in the feature space.

Keywords: facial expression identification, curvelet coefficient, support vector machine (SVM), recognition system

98 Random Subspace Ensemble of CMAC Classifiers

Authors: Somaiyeh Dehghan, Mohammad Reza Kheirkhahan Haghighi

Abstract:

The rapid growth of domains whose data have a large number of features but a limited number of samples has made it difficult to construct strong classifiers. Reducing the dimensionality of the feature space thus becomes an essential step in classification tasks. The random subspace method (or attribute bagging) is an ensemble method in which each base learner is trained on a subset of the features. In this paper, we introduce a Random Subspace Ensemble of CMAC neural networks (RSE-CMAC), each of which is trained on a subset of the features, and we use this model for classification. To evaluate the performance of our model, we compare it with the bagging algorithm on 36 UCI datasets. The results show that the new model performs better.

Keywords: classification, random subspace, ensemble, CMAC neural network
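
The random subspace idea itself is independent of the base learner. In the sketch below, a nearest-centroid classifier stands in for the CMAC network (which is not reproduced here), each ensemble member sees its own random subset of features, and predictions are combined by majority vote. The class names, subspace size, and number of members are illustrative assumptions.

```python
import numpy as np

class NearestCentroid:
    """Stand-in base learner (not a CMAC network): predicts the class whose mean is closest."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return self.classes[d.argmin(axis=1)]

class RandomSubspaceEnsemble:
    """Each member is trained on its own random subset of features; members vote."""

    def __init__(self, n_members=25, subspace_size=5, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.classes = np.unique(y)
        self.members = []
        for _ in range(self.n_members):
            feats = rng.choice(X.shape[1], size=self.subspace_size, replace=False)
            self.members.append((feats, NearestCentroid().fit(X[:, feats], y)))
        return self

    def predict(self, X):
        votes = np.zeros((len(X), len(self.classes)), dtype=int)
        for feats, model in self.members:
            pred = model.predict(X[:, feats])
            # np.unique returns sorted classes, so searchsorted maps labels to vote columns.
            votes[np.arange(len(X)), np.searchsorted(self.classes, pred)] += 1
        return self.classes[votes.argmax(axis=1)]
```

Because each member works in a low-dimensional subspace, it has fewer parameters to estimate from the limited samples, while the vote recovers information spread across many features.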

97 Homosexuality in Burundi and Homosexuals Rights

Authors: Ciza Didier

Abstract:

By definition, homosexuality designates sexual or amorous attraction towards a person of the same sex or gender as one's own. Burundi has an area of 27,834 km2 and a population of 13 million. There are groups of people who identify as homosexual and who want to claim their rights. Burundian homosexuals often organise seminars in the premises of the National Health Security Agency (NHSA), located in the Kigobe quarter of Bujumbura; this is where they meet to exchange ideas and to create an association to claim their rights. There are two categories of homosexuals: gays (homosexuality between men) and lesbians (homosexuality between women). In a gay couple, one partner behaves like a woman and often wears feminine styles, while the other remains masculine and always wears masculine styles. In a lesbian couple, one partner behaves like a man and wears men's styles, while the other remains feminine. In general, Burundian society is against homosexuality. Our society sees homosexuals as pariahs carrying a curse. According to Burundian culture and customs, homosexuality is satanic and therefore a great sin. In April 2011, Burundian President Pierre Nkurunziza signed a law criminalizing homosexual acts and providing for a sentence of three months to two years in prison, as well as a fine of BIF 50,000 to BIF 100,000, for any homosexual behavior. A recent survey showed that of 300 people questioned, 299 were against homosexuality, saying that it is against Burundian culture, and 1 was for homosexuality. Not all Burundians are against homosexuality. The country must therefore take into consideration the small proportion of people who are for homosexuality. Homosexuals, too, need to live like others.

Keywords: homosexuality, lesbian, gay, law

96 Mathematical Based Forecasting of Heart Attack

Authors: Razieh Khalafi

Abstract:

Myocardial infarction (MI) or acute myocardial infarction (AMI), commonly known as a heart attack, occurs when blood flow to part of the heart stops, causing damage to the heart muscle. An ECG can often show evidence of a previous heart attack or one that is in progress. The patterns on the ECG may indicate which part of the heart has been damaged, as well as the extent of the damage. In chaos theory, the correlation dimension is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. In this research, treating the ECG signal as a random walk, we forecast an oncoming heart attack by analyzing ECG signals using the correlation dimension. To test the model, a set of ECG signals from patients before and after a heart attack was used, and the model's ability to forecast the behavior of these signals was checked. The results show that this methodology can forecast the ECG, and accordingly a heart attack, with high accuracy.

Keywords: heart attack, ECG, random walk, correlation dimension, forecasting
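
The correlation dimension mentioned above is usually estimated with the Grassberger–Procaccia correlation sum C(r), the fraction of point pairs closer than r, whose slope on a log-log plot approximates the dimension. This sketch assumes that a time-delay embedding turns a scalar signal such as an ECG into points; the embedding dimension, delay, and radii are free parameters chosen for illustration, and neither ECG data nor the paper's forecasting rule is reproduced.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embedding: row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def correlation_dimension(points, radii):
    """Slope of log C(r) against log r, where C(r) is the fraction of pairs closer than r."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_dist = dist[np.triu_indices(len(points), k=1)]   # each unordered pair once
    C = np.array([np.mean(pair_dist < r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(C), 1)
    return slope
```

As a sanity check, points filling a square should give a slope near 2 and points on a line segment a slope near 1; for a real signal, one would apply `correlation_dimension` to `delay_embed(signal, dim, tau)` and watch how the estimate changes over time.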

95 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and finding constructs in Latent Semantic Analysis. The approach uses the properties of Factor Analysis for binary data, with interpretations drawn from Item Response Theory. More precisely, we propose first reducing dimensionality through a specific use of Principal Component Analysis on the linguistic data and then producing group axes from a clustering analysis of the semantic data. This approach allows the user to give meaning to the resulting clusters and to find the real latent structure present in the data. The methodology is applied to a set of real semantic data, with impressive results in terms of coherence, speed, and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

94 A New Mathematical Method for Heart Attack Forecasting

Authors: Razi Khalafi

Abstract:

Myocardial Infarction (MI) or acute Myocardial Infarction (AMI), commonly known as a heart attack, occurs when blood flow to part of the heart stops, causing damage to the heart muscle. An ECG can often show evidence of a previous heart attack or one that is in progress. The patterns on the ECG may indicate which part of the heart has been damaged, as well as the extent of the damage. In chaos theory, the correlation dimension is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. In this research, treating the ECG signal as a random walk, we forecast an oncoming heart attack by analysing ECG signals using the correlation dimension. To test the model, a set of ECG signals from patients before and after a heart attack was used, and the model's ability to forecast the behaviour of these signals was checked. The results show that this methodology can forecast the ECG, and accordingly a heart attack, with high accuracy.

Keywords: heart attack, ECG, random walk, correlation dimension, forecasting

93 Extended Arithmetic Precision in Meshfree Calculations

Authors: Edward J. Kansa, Pavel Holoborodko

Abstract:

Continuously differentiable radial basis functions (RBFs) are meshfree, converge faster as the dimensionality increases, and are theoretically spectrally convergent. When implemented in the single- and double-precision arithmetic of current computers, such RBFs can suffer from ill-conditioning because the systems of equations that must be solved to find the expansion coefficients are full. However, the Advanpix extended-precision software package allows computer mathematics to resemble asymptotically ideal Platonic mathematics. Additionally, full systems in extended precision execute faster on graphics processing units and field-programmable gate arrays because no branching is needed. Sparse equation systems are fast for iterative solvers in only a very limited number of cases.

Keywords: partial differential equations, meshfree radial basis functions, no restrictions on spatial dimensions, extended arithmetic precision
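
The ill-conditioning described above is easy to observe in ordinary double precision. This sketch assumes a Gaussian RBF, phi(r) = exp(-(eps r)^2), on equally spaced 1-D nodes (the abstract names neither the basis nor the node layout) and prints the condition number of the full interpolation matrix as the shape parameter eps decreases, i.e., as the basis functions flatten. It does not use Advanpix, which is a MATLAB toolbox; in Python, the mpmath library offers arbitrary-precision arithmetic for similar experiments.

```python
import numpy as np

def gaussian_rbf_matrix(nodes, eps):
    """Full interpolation matrix A[i, j] = exp(-(eps * |x_i - x_j|)^2)."""
    r = np.abs(nodes[:, None] - nodes[None, :])
    return np.exp(-(eps * r) ** 2)

nodes = np.linspace(0.0, 1.0, 10)
shape_params = [20.0, 10.0, 5.0, 3.0]          # basis functions get flatter left to right
conds = [np.linalg.cond(gaussian_rbf_matrix(nodes, eps)) for eps in shape_params]
for eps, c in zip(shape_params, conds):
    print(f"eps = {eps:5.1f}   cond(A) = {c:.3e}")
```

The flat regime is where RBF approximations tend to be most accurate, which is why higher working precision is attractive: it postpones the point at which round-off in solving the full system swamps the approximation.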

92 2.5D Face Recognition Using Gabor Discrete Cosine Transform

Authors: Ali Cheraghian, Farshid Hajati, Soheila Gheisari, Yongsheng Gao

Abstract:

In this paper, we present a novel 2.5D face recognition method based on the Gabor Discrete Cosine Transform (GDCT). In the proposed method, Gabor filters are applied to extract feature vectors from the texture and depth information. Then, the Discrete Cosine Transform (DCT) is used for dimensionality and redundancy reduction to improve computational efficiency. The system combines texture and depth information at the decision level, which yields higher performance than methods that use texture and depth information separately. The proposed algorithm is evaluated on the publicly available Bosphorus database, including models with pose variation. The experimental results show that the proposed method outperforms the benchmark.

Keywords: Gabor filter, discrete cosine transform, 2.5D face recognition, pose
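
The DCT reduction step can be sketched independently of the Gabor stage. This sketch assumes an orthonormal 2-D DCT-II applied to a single response map (in the test, a smooth synthetic image stands in for a Gabor-filtered texture or depth map) and keeps only a low-frequency top-left block of coefficients as the feature vector; the block size is an illustrative assumption.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector of length n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    M[0, :] /= np.sqrt(2.0)                 # rescale the DC row so that M is orthonormal
    return M

def dct_features(img, keep=8):
    """2-D DCT of an image; the keep x keep low-frequency block becomes the feature vector."""
    Mr, Mc = dct_matrix(img.shape[0]), dct_matrix(img.shape[1])
    coeffs = Mr @ img @ Mc.T                # separable transform along both image axes
    return coeffs[:keep, :keep].ravel()
```

Because the transform is orthonormal, the energy of the kept coefficients directly measures how much of the image the reduced vector retains; for smooth maps, a 32 x 32 input can shrink to 64 numbers with little loss.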

91 Quantum Computing with Qudits on a Graph

Authors: Aleksey Fedorov

Abstract:

Building a scalable platform for quantum computing remains one of the most challenging tasks in quantum science and technology. The implementation of the most important quantum operations with qubits (quantum analogues of classical bits), such as the multiqubit Toffoli gate, requires either a polynomial number of operations or a linear number of operations with the use of ancilla qubits. Therefore, reducing the number of operations while preserving scalability is a crucial goal in quantum information processing. One of the most elegant ideas in this direction is to use qudits (multilevel systems) instead of qubits and to rely on the additional levels of the qudits instead of ancillas. Although some existing results demonstrate a reduction in the number of operations, they suffer from high complexity and/or a lack of scalability. We show a strong reduction in the number of operations needed to realize the Toffoli gate by using qudits in a scalable multi-qudit processor. This is achieved on the basis of a general relation, which we derive, between the dimensionality of qudits and their topology of connections.

Keywords: quantum computing, qudits, Toffoli gates, gate decomposition

90 Approach Based on Fuzzy C-Means for Band Selection in Hyperspectral Images

Authors: Diego Saqui, José H. Saito, José R. Campos, Lúcio A. de C. Jorge

Abstract:

Hyperspectral images and remote sensing are important for many applications. One problem in using these images is the high volume of data to be processed, stored, and transferred. Dimensionality reduction techniques can be used to reduce this volume. In this paper, an approach to band selection based on clustering algorithms is presented, which reduces the volume of data. The proposed structure is based on the Fuzzy C-Means (or K-Means) and NWHFC algorithms. Attributes that are new relative to other studies in the literature, such as kurtosis and low correlation, are also considered. The results of the approach using Fuzzy C-Means and K-Means with different attributes are compared. Both algorithms give similarly good results, particularly when the variance and kurtosis attributes are used in the clustering process, and the approach is applicable to hyperspectral images.

Keywords: band selection, fuzzy c-means, k-means, hyperspectral image
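
A minimal sketch of clustering-based band selection follows. Each band is described by two attributes the abstract mentions, variance and kurtosis; the bands are clustered with Fuzzy C-Means; and the band with the highest membership in each cluster is kept. The NWHFC step and the low-correlation attribute are omitted, and the farthest-point initialisation, fuzzifier m = 2, and iteration count are assumptions.

```python
import numpy as np

def band_attributes(cube):
    """z-scored (variance, excess kurtosis) per band; cube has shape (rows, cols, bands)."""
    X = cube.reshape(-1, cube.shape[2]).astype(float)    # pixels x bands
    mu, var = X.mean(axis=0), X.var(axis=0)
    kurt = ((X - mu) ** 4).mean(axis=0) / np.maximum(var ** 2, 1e-12) - 3.0
    F = np.column_stack([var, kurt])
    return (F - F.mean(axis=0)) / np.maximum(F.std(axis=0), 1e-12)

def fuzzy_c_means(F, c, m=2.0, n_iter=100):
    """Standard FCM updates; returns the membership matrix U (bands x clusters)."""
    # Farthest-point initialisation: start from the band farthest from the mean,
    # then keep adding the band farthest from all centres chosen so far.
    idx = [int(np.argmax(np.linalg.norm(F - F.mean(axis=0), axis=1)))]
    while len(idx) < c:
        d = np.linalg.norm(F[:, None, :] - F[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = F[idx].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1 per band
        W = U ** m
        centers = (W.T @ F) / W.sum(axis=0)[:, None]      # membership-weighted centres
    return U

def select_bands(cube, n_bands):
    """Keep, for each cluster, the band with the highest membership in that cluster."""
    U = fuzzy_c_means(band_attributes(cube), n_bands)
    return sorted({int(U[:, j].argmax()) for j in range(n_bands)})
```

Swapping `fuzzy_c_means` for hard K-Means (taking the band nearest each centroid) gives the other variant the abstract compares.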

89 A t-SNE and UMAP Based Neural Network Image Classification Algorithm

Authors: Shelby Simpson, William Stanley, Namir Naba, Xiaodi Wang

Abstract:

Both t-SNE and UMAP are recent state-of-the-art tools that predominantly preserve local structure, that is, they group neighboring data points together, which provides a very informative visualization of the heterogeneity in the data. In this research, we develop a t-SNE- and UMAP-based neural network image classification algorithm that embeds the original dataset into a corresponding low-dimensional dataset as a preprocessing step and then uses this embedded dataset as input to a specially designed neural network classifier for image classification. In our experiments, we use the Fashion-MNIST dataset, a labeled dataset of images of clothing items. t-SNE and UMAP are used for dimensionality reduction of the dataset and thus produce low-dimensional embeddings. The embeddings from t-SNE and UMAP are then fed into two neural networks. The accuracy of the models from these two neural networks is compared with that of a dense neural network that does not use an embedding as input, to show which model classifies the images of clothing items more accurately.

Keywords: t-SNE, UMAP, fashion MNIST, neural networks

88 Stacking Ensemble Approach for Combining Different Methods in Real Estate Prediction

Authors: Sol Girouard, Zona Kostic

Abstract:

A home is often the largest and most expensive purchase a person makes. Whether the decision leads to a successful outcome is determined by a combination of critical factors. In this paper, we propose a method that efficiently handles all of these factors in residential real estate and makes predictions in a high-dimensional feature space while controlling for overfitting. The proposed method is built on gradient descent and boosting algorithms and uses a mixed optimization technique to improve predictive power. Because a single model usually cannot handle all cases, our approach builds multiple models based on different subsets of the predictors. The algorithm was tested on 3 million homes across the U.S., and the experimental results demonstrate its efficiency by outperforming techniques currently used for forecasting prices. As the real estate market changes daily, our proposed algorithm capitalizes on new events, allowing more efficient predictions.

Keywords: real estate prediction, gradient descent, boosting, ensemble methods, active learning, training
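
The general stacking pattern described above (several models, each on its own subset of predictors, combined by a second-level learner) can be sketched compactly. For brevity, the sketch uses ordinary least-squares models at both levels instead of the gradient-descent and boosting learners named in the abstract, and the predictor subsets, fold count, and synthetic data in the test are assumptions. The second-level model is trained on out-of-fold predictions so that it does not simply reward base models that overfit.

```python
import numpy as np

def fit_ls(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefficients...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ls(coef, X):
    return coef[0] + X @ coef[1:]

def fit_stack(X, y, subsets, n_folds=5, seed=0):
    """Level 0: one model per predictor subset. Level 1: least squares on the
    level-0 models' out-of-fold predictions. Returns a predict function."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    oof = np.zeros((len(X), len(subsets)))
    for fold in folds:
        train = np.setdiff1d(np.arange(len(X)), fold)
        for j, cols in enumerate(subsets):
            coef = fit_ls(X[np.ix_(train, cols)], y[train])
            oof[fold, j] = predict_ls(coef, X[np.ix_(fold, cols)])
    base = [fit_ls(X[:, cols], y) for cols in subsets]    # refit level 0 on all rows
    meta = fit_ls(oof, y)

    def predict(X_new):
        P = np.column_stack([predict_ls(b, X_new[:, cols]) for b, cols in zip(base, subsets)])
        return predict_ls(meta, P)

    return predict
```

Any regressor with fit and predict steps can replace `fit_ls` at level 0 (for example, a boosted tree model) without changing the out-of-fold logic.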

87 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment analysis is the process of detecting the contextual polarity of text; in other words, it determines whether a piece of writing is positive, negative, or neutral. Sentiment analysis of documents is of great importance today, when vast amounts of information are stored in databases and on the World Wide Web. An efficient algorithm to elicit such information would be beneficial for social, economic, and medical purposes. In this project, we developed an algorithm to classify a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. Notably, the classification does not use the independence assumption made by many procedures such as Naive Bayes, which makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use the empirical distribution for estimation but instead developed a method based on the degree of close clustering of the data points. We applied our algorithm to a movie review dataset obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, runs test, cross validation, higher dimensional pmf estimation

86 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekami Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Although most existing solutions achieve good accuracy, they are often overly complex and computationally expensive. With this in mind, we propose using dimensionality reduction techniques to address the problem of facial pose estimation. First, a face image is converted into a one-dimensional time series using a Hilbert space-filling curve, and the approach then converts this time series into a symbolic representation. Next, a distance matrix is computed between the symbolic series of a training set of images to build classifiers for frontal vs. profile face pose. The proposed method is evaluated on three public datasets. Experimental results show that our approach achieves a correct classification rate exceeding 97% with the k-NN algorithm.
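
The first two steps of that pipeline, Hilbert-order flattening and symbolization, can be sketched in NumPy as follows. The abstract does not name the symbolic representation, so SAX (z-normalization, piecewise aggregate averaging, then Gaussian breakpoints for a 4-letter alphabet) is an assumption here. The function names are hypothetical, and the image is assumed to be square with a side that is a power of 2.

```python
import numpy as np

def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/flip the sub-square so consecutive cells stay adjacent
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_flatten(img):
    """Read a square image in Hilbert order: a 1-D series that preserves pixel locality."""
    n = img.shape[0]
    coords = [hilbert_d2xy(n, d) for d in range(n * n)]
    return np.array([img[y, x] for x, y in coords], dtype=float)

def sax(series, n_segments, breakpoints=(-0.6745, 0.0, 0.6745), alphabet="abcd"):
    """Assumed SAX step: z-normalize, average equal segments (PAA), map each mean to a letter."""
    z = (series - series.mean()) / series.std()
    paa = z.reshape(n_segments, -1).mean(axis=1)  # length must be divisible by n_segments
    return "".join(alphabet[i] for i in np.searchsorted(breakpoints, paa))
```

For example, `sax(np.arange(16.0), 4)` returns `"abcd"`, because a ramp's four segment means fall into the four Gaussian quartile bins. Strings produced this way could then be compared pairwise to fill the distance matrix used by k-NN. That step, and the choice of symbolic distance, is not shown.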

Keywords: machine learning, pattern recognition, facial pose classification, time series

85 Sparse Unmixing of Hyperspectral Data by Exploiting Joint-Sparsity and Rank-Deficiency

Authors: Fanqiang Kong, Chending Bian

Abstract:

In this work, we exploit two assumed properties of the abundances of the observed signatures (endmembers) in order to reconstruct the abundances from hyperspectral data. The first property, joint sparsity, assumes that adjacent pixels can be expressed as different linear combinations of the same materials. The second property, rank deficiency, means that the number of endmembers participating in the hyperspectral data is very small compared with the dimensionality of the spectral library, so the abundance matrix of the endmembers is low-rank. These assumptions lead to an optimization problem for the sparse unmixing model that minimizes a combination of the l2,p-norm and the nuclear norm. We propose a variable splitting and augmented Lagrangian algorithm to solve this optimization problem. Experimental evaluation on synthetic and real hyperspectral data shows that the proposed method outperforms state-of-the-art algorithms in spectral unmixing accuracy.
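
The abstract names the penalty terms but not the full model. A common form for this kind of objective, assumed here, is min over X of 0.5 * ||AX - Y||_F^2 + lambda * ||X||_{2,p} + tau * ||X||_*. In this form A is the spectral library, Y holds one observed pixel per column, and X is the abundance matrix. ||X||_{2,p} is taken as the sum over rows of ||X_i||_2^p, which is one of several conventions. The sketch evaluates that assumed objective and implements singular value thresholding, the standard proximal step for the nuclear norm inside variable-splitting (ADMM-style) solvers. It is not the authors' full algorithm, and it omits any nonnegativity constraint.

```python
import numpy as np

def l2p_rows(X, p):
    """Sum over rows of ||X_i||_2^p; smaller p pushes more rows (endmembers) to zero."""
    return np.sum(np.linalg.norm(X, axis=1) ** p)

def objective(A, X, Y, lam, tau, p=0.5):
    """Assumed unmixing objective: data fit + joint-sparsity penalty + low-rank penalty."""
    fit = 0.5 * np.linalg.norm(A @ X - Y, "fro") ** 2
    return fit + lam * l2p_rows(X, p) + tau * np.linalg.norm(X, "nuc")

def svt(M, thresh):
    """Proximal operator of thresh * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt
```

Soft-thresholding the singular values drives the small ones to exactly zero, which is how the low-rank (rank-deficiency) assumption is enforced at each iteration.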

Keywords: hyperspectral unmixing, joint-sparse, low-rank representation, abundance estimation

84 A Data-Driven Monitoring Technique Using Combined Anomaly Detectors

Authors: Fouzi Harrou, Ying Sun, Sofiane Khadraoui

Abstract:

Anomaly detection based on Principal Component Analysis (PCA) has been studied intensively and widely applied to multivariate processes with highly cross-correlated process variables. Monitoring metrics such as Hotelling's T2 and the Q statistic are usually used in PCA-based monitoring to reveal pattern variations in the principal and residual subspaces, respectively. However, these metrics are ill-suited to detecting small faults. In this paper, Exponentially Weighted Moving Average (EWMA) schemes based on the T2 and Q statistics, T2-EWMA and Q-EWMA, were developed for detecting faults in the process mean. The performance of the proposed methods was compared with that of the conventional PCA-based fault detection method using synthetic data. The results clearly show the benefit and effectiveness of the proposed methods over the conventional PCA method, especially for detecting small faults in highly correlated multivariate data.
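
The sketch below shows how T2 and Q are computed from a PCA model and then smoothed by an EWMA, using NumPy. The smoothing constant alpha = 0.2, the start value, the empirical 99th-percentile control limit, and the simulated process are all illustrative assumptions. The paper's own limits and data are not given in the abstract.

```python
import numpy as np

def fit_pca(X, n_comp):
    """Fit PCA on standardized in-control data and keep n_comp principal directions."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_comp].T                      # loadings: variables x components
    lam = s[:n_comp] ** 2 / (len(Z) - 1)   # variances of the retained scores
    return mu, sd, P, lam

def t2_q(X, model):
    """Hotelling's T2 (principal subspace) and Q/SPE (residual subspace) for each sample."""
    mu, sd, P, lam = model
    Z = (X - mu) / sd
    T = Z @ P
    t2 = np.sum(T ** 2 / lam, axis=1)
    q = np.sum((Z - T @ P.T) ** 2, axis=1)
    return t2, q

def ewma(x, alpha=0.2, start=0.0):
    """z_i = alpha * x_i + (1 - alpha) * z_(i-1); a small alpha lets small shifts accumulate."""
    z = np.empty(len(x))
    prev = start
    for i, v in enumerate(x):
        prev = alpha * v + (1 - alpha) * prev
        z[i] = prev
    return z

# Hypothetical demo: two latent factors drive five correlated variables.
rng = np.random.default_rng(0)
W = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0, 1.0]])

def simulate(n):
    return rng.normal(size=(n, 2)) @ W + 0.1 * rng.normal(size=(n, 5))

train = simulate(500)
model = fit_pca(train, n_comp=2)
_, q_train = t2_q(train, model)

test = simulate(400)
test[200:, 0] += 0.5                       # small mean shift in one variable from sample 200 on
_, q_test = t2_q(test, model)

q_ewma = ewma(q_test, alpha=0.2, start=q_train.mean())
limit = np.quantile(ewma(q_train, alpha=0.2, start=q_train.mean()), 0.99)  # assumed limit
alarms = q_ewma > limit
```

The shift breaks the correlation between variables 0 and 1, so it appears mostly in the residual subspace and raises Q. The EWMA then averages away the sample-to-sample noise that would hide such a small change in the raw Q values.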

Keywords: data-driven method, process control, anomaly detection, dimensionality reduction

83 An Approach for Multilayered Ecological Networks

Authors: N. F. F. Ebecken, G. C. Pereira

Abstract:

Although networks provide a powerful approach to the study of a wide variety of ecological systems, their formulation usually does not include multiple types of interactions, interactions that vary in space and time, or interconnected systems such as networks. The emerging field of 'multilayer networks' provides a natural framework for extending the analysis of ecological systems to include these multiple layers of complexity, because it explicitly distinguishes and models intralayer and interlayer connectivity. The framework provides a set of concepts and tools that can be adapted and applied to ecology, facilitating research on high-dimensional, heterogeneous systems in nature. Here, we formally define ecological multilayer networks based on a review of prior and related approaches, illustrate their application and potential with analyses of existing data, and discuss limitations, challenges, and future applications. The integration of multilayer network theory into ecology offers largely untapped potential to further address ecological complexity and to provide new theoretical and empirical insights into the architecture and dynamics of ecological systems.
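
A standard way to encode intralayer and interlayer connectivity in one object is the supra-adjacency matrix. The layers' adjacency matrices sit on the diagonal blocks, and the interlayer couplings fill the off-diagonal blocks. The NumPy sketch below assumes a multiplex setting: every layer has the same n nodes, and each node is linked only to its own replica in other layers, with a single hypothetical weight omega. General multilayer networks also allow non-diagonal and non-uniform interlayer edges, which this sketch does not cover. The species layers in the example are made up for illustration.

```python
import numpy as np

def supra_adjacency(layers, omega):
    """Build an (L*n) x (L*n) supra-adjacency matrix from L intralayer adjacency matrices.

    Assumes all layers share the same n nodes and that node i is coupled to its own
    replica in every other layer with weight omega (diagonal, uniform coupling).
    """
    L, n = len(layers), layers[0].shape[0]
    S = np.zeros((L * n, L * n))
    for a in range(L):
        for b in range(L):
            block = slice(a * n, (a + 1) * n), slice(b * n, (b + 1) * n)
            S[block] = layers[a] if a == b else omega * np.eye(n)
    return S

# Hypothetical example: three species, one predation layer and one mutualism layer.
predation = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
mutualism = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)
S = supra_adjacency([predation, mutualism], omega=0.5)
degree = S.sum(axis=1)  # intralayer degree plus interlayer coupling for each node-layer
```

Here `degree` is `[1.5, 2.5, 1.5, 1.5, 0.5, 1.5]`: each node-layer's intralayer degree plus 0.5 for its link to the other layer. Because everything lives in one matrix, ordinary spectral or walk-based network measures can be applied across layers at once.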

Keywords: ecological networks, multilayered networks, sea ecology, Brazilian Coastal Area

82 A Chinese Nested Named Entity Recognition Model Based on Lexical Features

Authors: Shuo Liu, Dan Liu

Abstract:

In the field of named entity recognition, most research has focused on simple entities. However, nested named entities, which contain entities within entities, have been difficult to identify accurately because of their boundary ambiguity. In this paper, we construct a hierarchical recognition model that uses the grammatical structure and semantic features of Chinese text to calculate entity boundaries from lexical features. The analysis is carried out separately at the levels of granularity, semantics, and lexicality, which avoids repetitive work and reduces computational effort, and the semantic features of words are used to calculate entity boundaries, improving recognition accuracy. Experiments on web-based microblogging data show that the model achieves an accuracy of 86.33% and an F1 score of 89.27% in recognizing nested named entities, addressing shortcomings of some previous recognition models and improving the efficiency of nested named entity recognition.

Keywords: coarse-grained, nested named entity, Chinese natural language processing, word embedding, t-SNE dimensionality reduction algorithm

81 On the Estimation of Crime Rate in the Southwest of Nigeria: Principal Component Analysis Approach

Authors: Kayode Balogun, Femi Ayoola

Abstract:

Crime is occurring at an alarming rate in this part of the world, and many factors contribute to this antisocial behaviour among both young and old. In this work, principal component analysis (PCA) was used as a tool to reduce dimensionality while retaining as much of the information as possible, and to identify the crime-prone variables in the study region. Data on twenty-eight crime variables were collected from the National Bureau of Statistics (NBS) databank over a period of fifteen years. We used PCA to determine the number of major variables and contributors to crime in Southwest Nigeria. Our analysis revealed that eight principal components should be retained, based on the scree plot and loading plot, which implies that an eight-equation solution is appropriate for the data. The eight components explained 93.81% of the total variation in the dataset. We also found that the most frequently committed crimes in Southwestern Nigeria were assault, grievous harm and wounding, theft/stealing, burglary, house breaking, false pretence, unlawful arms possession, and breach of public peace.
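
A figure such as "eight components explain 93.81% of the variation" is the cumulative share of the largest eigenvalues of the correlation matrix. A scree plot is a plot of those same eigenvalues. The NumPy sketch below computes the shares and applies a cumulative-variance threshold. The threshold rule and the function names are assumptions for illustration. The authors chose their eight components from the scree and loading plots.

```python
import numpy as np

def explained_variance_ratio(X):
    """Eigenvalues of the correlation matrix (PCA on standardized variables), largest first,
    expressed as fractions of the total variance."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

def n_components_for(ratios, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
```

For example, with shares `[0.5, 0.3, 0.15, 0.05]` the cumulative shares are 0.5, 0.8, 0.95 and 1.0, so `n_components_for(..., 0.9)` returns 3.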

Keywords: crime rates, data, Southwest Nigeria, principal component analysis, variables

80 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung

Abstract:

In the field of time series data mining, the Perceptually Important Point (PIP) identification process was first introduced in 2001. It was originally developed for financial time series pattern matching and was later found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of a time series by identifying its salient points. With the rise of Big Data, time series make up a major proportion of data, especially data generated by sensors in Internet of Things (IoT) environments. Given the nature of PIP identification and its successful applications, it is worth further exploring the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification has always been considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches remove the bottleneck of running the PIP identification process on a standalone computer, and the distributed versions achieve an improvement in speed.
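
The standalone PIP process that the distributed versions speed up can be sketched as follows. It starts from the two end points, then repeatedly adds the point farthest from the straight line joining its two neighbouring PIPs. The abstract does not say which distance is used, so vertical distance is assumed here; the PIP literature also uses Euclidean and perpendicular distances. The function name is hypothetical, and the SB-tree and distributed variants are not shown.

```python
import bisect

def identify_pips(y, k):
    """Return indices of the first k perceptually important points, in the order found.

    Assumes vertical distance: each new PIP is the point with the largest vertical gap
    to the straight line joining its two neighbouring, already-chosen PIPs.
    """
    n = len(y)
    if n <= 2:
        return list(range(n))[:k]
    found = [0, n - 1]    # the two end points are always the first PIPs
    ordered = [0, n - 1]  # the same points, kept sorted by position
    while len(found) < min(k, n):
        best_d, best_i = -1.0, None
        for left, right in zip(ordered, ordered[1:]):
            for i in range(left + 1, right):
                on_line = y[left] + (y[right] - y[left]) * (i - left) / (right - left)
                d = abs(y[i] - on_line)
                if d > best_d:
                    best_d, best_i = d, i
        found.append(best_i)
        bisect.insort(ordered, best_i)
    return found[:k]
```

For `y = [0, 1, 5, 1, 0, 0, 3, 0]`, the first four PIPs are `[0, 7, 2, 3]`: the peak at index 2, then the point farthest below the line from that peak to the end. Each new point rescans every gap, so finding k points costs O(k·n). That cost is the bottleneck the tree-based and distributed versions target.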

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

79 Dimensionality and Superconducting Parameters of YBa2Cu3O7 Foams

Authors: Michael Koblischka, Anjela Koblischka-Veneva, XianLin Zeng, Essia Hannachi, Yassine Slimani

Abstract:

Superconducting foams of YBa2Cu3O7 (abbreviated Y-123) were produced by the infiltration growth (IG) technique from Y2BaCuO5 (Y-211) foams. The samples were investigated by SEM (scanning electron microscopy) and electrical resistivity measurements. SEM observations revealed the specific microstructure of the foam struts, with numerous tiny Y-211 particles (50-100 nm in diameter) embedded in channel-like structures between the Y-123 grains. The excess conductivity of the differently prepared composites was analyzed using the Aslamazov-Larkin (AL) model. The investigated samples exhibited five distinct fluctuation regimes, namely short-wave (SWF), one-dimensional (1D), two-dimensional (2D), three-dimensional (3D), and critical (CR) fluctuation regimes. The coherence length along the c-axis at zero temperature (ξc(0)), the lower and upper critical magnetic fields (Bc1 and Bc2), the critical current density (Jc), and numerous other superconducting parameters were estimated from the data. The analysis reveals that the presence of the tiny Y-211 particles alters the excess conductivity and the fluctuation behavior observed in standard YBCO samples.
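
In the AL framework, the excess conductivity is Δσ = σ(T) − σN(T), where σN is the extrapolated normal-state conductivity. It follows a power law Δσ = A·ε^(−λ) in the reduced temperature ε = (T − Tc^mf)/Tc^mf. The exponent is λ = 2 − D/2, which gives 1/2 for 3D, 1 for 2D and 3/2 for 1D fluctuations. The NumPy sketch below recovers λ in one temperature window by a straight-line fit on a log-log scale. The mean-field Tc, σN and the window are assumed to be known already. The short-wave and critical regimes follow other exponents and are not covered. The synthetic data in the test are illustrative only.

```python
import numpy as np

def fit_al_exponent(T, sigma, sigma_normal, tc_mf):
    """Fit Delta_sigma = A * eps**(-lam) on a log-log scale; returns (lam, A)."""
    eps = (T - tc_mf) / tc_mf
    dsigma = sigma - sigma_normal
    ok = (eps > 0) & (dsigma > 0)  # logarithms need positive values
    slope, intercept = np.polyfit(np.log(eps[ok]), np.log(dsigma[ok]), 1)
    return -slope, np.exp(intercept)

def al_dimension(lam):
    """Invert lam = 2 - D/2: 0.5 -> 3D, 1.0 -> 2D, 1.5 -> 1D."""
    return 4 - 2 * lam
```

In practice the fit is repeated over successive temperature windows. A change in slope on the ln Δσ versus ln ε plot marks the crossover between fluctuation regimes.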

Keywords: Excess conductivity, Foam, Microstructure, Superconductor YBa2Cu3Oy
