Search results for: dimensionality
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 120

90 Multidimensional Item Response Theory Models for Practical Application in Large Tests Designed to Measure Multiple Constructs

Authors: Maria Fernanda Ordoñez Martinez, Alvaro Mauricio Montenegro

Abstract:

This work presents a statistical methodology for measuring and identifying constructs in Latent Semantic Analysis. The approach draws on Factor Analysis for binary data, with interpretations taken from Item Response Theory. More precisely, we propose first reducing dimensionality using Principal Component Analysis on the linguistic data and then producing axes of groups derived from a cluster analysis of the semantic data. This approach allows the user to give meaning to the resulting clusters and to find the real latent structure present in the data. The methodology is applied to a set of real semantic data, with impressive results for coherence, speed and precision.

Keywords: semantic analysis, factorial analysis, dimension reduction, penalized logistic regression

Procedia PDF Downloads 408
89 A New Mathematical Method for Heart Attack Forecasting

Authors: Razi Khalafi

Abstract:

Myocardial Infarction (MI) or acute Myocardial Infarction (AMI), commonly known as a heart attack, occurs when blood flow to part of the heart stops, causing damage to the heart muscle. An ECG can often show evidence of a previous heart attack or one that is in progress. The patterns on the ECG may indicate which part of the heart has been damaged, as well as the extent of the damage. In chaos theory, the correlation dimension is a measure of the dimensionality of the space occupied by a set of random points, often referred to as a type of fractal dimension. In this research, by treating the ECG signal as a random walk, we work on forecasting an oncoming heart attack by analysing ECG signals using the correlation dimension. To test the model, a set of ECG signals from patients before and after a heart attack was used, and the strength of the model for forecasting the behaviour of these signals was checked. Results show that this methodology can forecast the ECG, and accordingly a heart attack, with high accuracy.
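
As a rough illustration of the correlation dimension mentioned in this abstract, the sketch below estimates it from the Grassberger-Procaccia correlation sum, i.e. the fraction of point pairs closer than a radius r. The function names, the two-radius slope estimate, and the synthetic points are assumptions made for illustration; the abstract does not say how the authors embed the ECG signal or compute the dimension.

```python
import math
from itertools import combinations


def correlation_sum(points, r):
    """Fraction of distinct point pairs whose Euclidean distance is below r."""
    pairs = list(combinations(points, 2))
    close = sum(1 for p, q in pairs if math.dist(p, q) < r)
    return close / len(pairs)


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) against log r between two radii (a crude two-point estimate)."""
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return (math.log(c_large) - math.log(c_small)) / (math.log(r_large) - math.log(r_small))


# Hypothetical test data: 200 evenly spaced points on a line segment in the plane.
line_points = [(i / 200, 0.0) for i in range(200)]
dim = correlation_dimension(line_points, 0.0123, 0.0987)
print(round(dim, 2))
```

For points lying on a line the estimate should come out close to 1. A practical estimate would fit the slope over many radii rather than two.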

Keywords: heart attack, ECG, random walk, correlation dimension, forecasting

Procedia PDF Downloads 464
88 Extended Arithmetic Precision in Meshfree Calculations

Authors: Edward J. Kansa, Pavel Holoborodko

Abstract:

Continuously differentiable radial basis functions (RBFs) are meshfree, converge faster as the dimensionality increases, and are theoretically spectrally convergent. When implemented on current single and double precision computers, such RBFs can suffer from ill-conditioning because the systems of equations that must be solved to find the expansion coefficients are full. However, the Advanpix extended precision software package allows computer mathematics to resemble asymptotically ideal Platonic mathematics. Additionally, full systems with extended precision execute faster on graphical processing units and field-programmable gate arrays because no branching is needed. Sparse equation systems are fast for iterative solvers in a very limited number of cases.

Keywords: partial differential equations, meshfree radial basis functions, no restrictions on spatial dimensions, extended arithmetic precision

Procedia PDF Downloads 114
87 2.5D Face Recognition Using Gabor Discrete Cosine Transform

Authors: Ali Cheraghian, Farshid Hajati, Soheila Gheisari, Yongsheng Gao

Abstract:

In this paper, we present a novel 2.5D face recognition method based on Gabor Discrete Cosine Transform (GDCT). In the proposed method, the Gabor filter is applied to extract feature vectors from the texture and the depth information. Then, Discrete Cosine Transform (DCT) is used for dimensionality and redundancy reduction to improve computational efficiency. The system combines texture and depth information at the decision level, which yields higher performance than methods that use texture and depth information separately. The proposed algorithm is evaluated on the publicly available Bosphorus database, including models with pose variation. The experimental results show that the proposed method has a higher performance compared to the benchmark.
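
The DCT-based reduction step described in this abstract can be sketched as follows: transform a feature vector with a DCT-II and keep only the first few low-frequency coefficients. The unnormalized one-dimensional DCT-II, the function names, and the constant input vector are assumptions for illustration; the abstract does not state which DCT variant or how many coefficients the authors use.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_samples = len(signal)
    return [
        sum(x * math.cos(math.pi / n_samples * (n + 0.5) * k) for n, x in enumerate(signal))
        for k in range(n_samples)
    ]


def reduce_features(signal, keep):
    """Keep only the first `keep` (lowest-frequency) DCT coefficients."""
    return dct2(signal)[:keep]


# Hypothetical feature vector; in the paper the inputs would be Gabor filter responses.
features = [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
reduced = reduce_features(features, 3)
```

A constant input concentrates all of its energy in the first coefficient, which is the property that makes truncating the higher coefficients a reasonable way to drop redundancy in smooth signals.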

Keywords: Gabor filter, discrete cosine transform, 2.5d face recognition, pose

Procedia PDF Downloads 292
86 Quantum Computing with Qudits on a Graph

Authors: Aleksey Fedorov

Abstract:

Building a scalable platform for quantum computing remains one of the most challenging tasks in quantum science and technologies. However, the implementation of the most important quantum operations with qubits (quantum analogues of classical bits), such as the multiqubit Toffoli gate, requires either a polynomial number of operations or a linear number of operations with the use of ancilla qubits. Therefore, the reduction of the number of operations in the presence of scalability is a crucial goal in quantum information processing. One of the most elegant ideas in this direction is to use qudits (multilevel systems) instead of qubits and rely on additional levels of qudits instead of ancillas. Although some of the already obtained results demonstrate a reduction of the number of operations, they suffer from high complexity and/or the absence of scalability. We show a strong reduction of the number of operations for the realization of the Toffoli gate by using qudits for a scalable multi-qudit processor. This is done on the basis of a general relation that we derived between the dimensionality of qudits and their topology of connections.

Keywords: quantum computing, qudits, Toffoli gates, gate decomposition

Procedia PDF Downloads 112
85 Approach Based on Fuzzy C-Means for Band Selection in Hyperspectral Images

Authors: Diego Saqui, José H. Saito, José R. Campos, Lúcio A. de C. Jorge

Abstract:

Hyperspectral images and remote sensing are important for many applications. A problem in the use of these images is the high volume of data to be processed, stored and transferred. Dimensionality reduction techniques can be used to reduce the volume of data. In this paper, an approach to band selection based on clustering algorithms is presented. This approach makes it possible to reduce the volume of data. The proposed structure is based on Fuzzy C-Means (or K-Means) and NWHFC algorithms. New attributes in relation to other studies in the literature, such as kurtosis and low correlation, are also considered. A comparison of the results of the approach using Fuzzy C-Means and K-Means with different attributes is performed. Both algorithms show similarly good results, particularly when the variance and kurtosis attributes are used in the clustering process, and both are applicable to hyperspectral images.
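
To make the Fuzzy C-Means step more concrete, the sketch below computes the standard FCM membership of one scalar attribute value in each cluster center. The function name, the scalar attribute, the fuzzifier m = 2, and the example centers are assumptions for illustration; the abstract does not describe how the authors combine this with NWHFC or with the per-band attributes.

```python
def fcm_memberships(value, centers, m=2.0):
    """Fuzzy C-Means membership of a scalar value in each cluster center.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j = |value - center_j|.
    """
    distances = [abs(value - c) for c in centers]
    # A value that coincides with a center belongs fully to that center.
    if 0.0 in distances:
        hits = distances.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


# Hypothetical per-band attribute (e.g. a variance score) and two cluster centers.
memberships = fcm_memberships(2.0, [1.0, 4.0])
```

Unlike K-Means, which would assign the value to the nearer center only, the memberships are graded and sum to 1 across clusters.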

Keywords: band selection, fuzzy c-means, k-means, hyperspectral image

Procedia PDF Downloads 366
84 A t-SNE and UMAP Based Neural Network Image Classification Algorithm

Authors: Shelby Simpson, William Stanley, Namir Naba, Xiaodi Wang

Abstract:

Both t-SNE and UMAP are state-of-the-art tools that predominantly preserve local structure, that is, they group neighboring data points together, which provides a very informative visualization of heterogeneity in our data. In this research, we develop a t-SNE and UMAP-based neural network image classification algorithm that embeds the original dataset into a corresponding low-dimensional dataset as a preprocessing step, then uses this embedded dataset as input to our specially designed neural network classifier for image classification. In our experiments we use the Fashion MNIST data set, a labeled data set of images of clothing objects. t-SNE and UMAP are used for dimensionality reduction of the data set and thus produce low-dimensional embeddings. Furthermore, we feed the embeddings from t-SNE and UMAP into two neural networks. The accuracy of the models from the two neural networks is then compared to that of a dense neural network that does not use an embedding as input, to show which model classifies the images of clothing objects more accurately.

Keywords: t-SNE, UMAP, fashion MNIST, neural networks

Procedia PDF Downloads 160
83 Stacking Ensemble Approach for Combining Different Methods in Real Estate Prediction

Authors: Sol Girouard, Zona Kostic

Abstract:

A home is often the largest and most expensive purchase a person makes. Whether the decision leads to a successful outcome will be determined by a combination of critical factors. In this paper, we propose a method that efficiently handles all the factors in residential real estate and performs predictions given a feature space with high dimensionality while controlling for overfitting. The proposed method was built on gradient descent and boosting algorithms and uses a mixed optimizing technique to improve the prediction power. Usually, a single model cannot handle all the cases; thus, our approach builds multiple models based on different subsets of the predictors. The algorithm was tested on 3 million homes across the U.S., and the experimental results demonstrate the efficiency of this approach by outperforming techniques currently used in forecasting prices. With everyday changes in the real estate market, our proposed algorithm capitalizes on new events, allowing more efficient predictions.

Keywords: real estate prediction, gradient descent, boosting, ensemble methods, active learning, training

Procedia PDF Downloads 241
82 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment Analysis is the process of detecting the contextual polarity of text. In other words, it determines whether a piece of writing is positive, negative or neutral. Sentiment analysis of documents holds great importance in today's world, when vast amounts of information are stored in databases and on the world wide web. An efficient algorithm to elicit such information would be beneficial for social, economic as well as medical purposes. In this project, we have developed an algorithm to classify a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. It is important to note that, in the classification, we have not used the independence assumption, which is made by many procedures such as Naive Bayes. This makes the algorithm more general in scope. Moreover, because of the sparsity and high dimensionality of such data, we did not use the empirical distribution for estimation, but developed a method based on finding the degree of close clustering of the data points. We applied our algorithm to a movie review data set obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, Run's Test, cross validation, higher dimensional pmf estimation

Procedia PDF Downloads 369
81 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed

Abstract:

Pose estimation is an important task in computer vision. Though the majority of the existing solutions provide good accuracy, they are often overly complex and computationally expensive. From this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. First, a face image is converted into a one-dimensional time series using a Hilbert space-filling curve; the approach then converts these time series data to a symbolic representation. Furthermore, a distance matrix is calculated between the symbolic series of an input learning dataset of images, to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated with three public datasets. Experimental results have shown that our approach is able to achieve a correct classification rate exceeding 97% with the K-NN algorithm.

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 319
80 Sparse Unmixing of Hyperspectral Data by Exploiting Joint-Sparsity and Rank-Deficiency

Authors: Fanqiang Kong, Chending Bian

Abstract:

In this work, we exploit two assumed properties of the abundances of the observed signatures (endmembers) in order to reconstruct the abundances from hyperspectral data. Joint-sparsity is the first property of the abundances, which assumes that adjacent pixels can be expressed as different linear combinations of the same materials. The second property is rank-deficiency, where the number of endmembers participating in the hyperspectral data is very small compared with the dimensionality of the spectral library, which means that the abundance matrix of the endmembers is a low-rank matrix. These assumptions lead to an optimization problem for the sparse unmixing model that requires minimizing a combined l2,p-norm and nuclear norm. We propose a variable splitting and augmented Lagrangian algorithm to solve the optimization problem. Experimental evaluation carried out on synthetic and real hyperspectral data shows that the proposed method outperforms the state-of-the-art algorithms with a better spectral unmixing accuracy.

Keywords: hyperspectral unmixing, joint-sparse, low-rank representation, abundance estimation

Procedia PDF Downloads 214
79 A Data-Driven Monitoring Technique Using Combined Anomaly Detectors

Authors: Fouzi Harrou, Ying Sun, Sofiane Khadraoui

Abstract:

Anomaly detection based on Principal Component Analysis (PCA) has been studied intensively and widely applied to multivariate processes with highly cross-correlated process variables. Monitoring metrics such as Hotelling's T2 and the Q statistic are usually used in PCA-based monitoring to elucidate the pattern variations in the principal and residual subspaces, respectively. However, these metrics are ill suited to detecting small faults. In this paper, Exponentially Weighted Moving Average (EWMA) schemes based on the Q and T2 statistics, T2-EWMA and Q-EWMA, were developed for detecting faults in the process mean. The performance of the proposed methods was compared with that of the conventional PCA-based fault detection method using synthetic data. The results clearly show the benefit and the effectiveness of the proposed methods over the conventional PCA method, especially for detecting small faults in highly correlated multivariate data.
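
The EWMA idea in this abstract can be sketched as a simple recursion applied to a monitoring statistic. The function names, the smoothing weight, the fixed alarm threshold, and the statistic values are all hypothetical; the abstract does not give the authors' parameters, and a real chart would derive its control limit from the in-control variance of the statistic rather than use a hand-picked constant.

```python
def ewma(values, lam=0.2, start=0.0):
    """EWMA recursion: z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = start."""
    smoothed = []
    z = start
    for x in values:
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed


def first_alarm(values, threshold, lam=0.2, start=0.0):
    """Index of the first EWMA value above `threshold`, or None if there is none."""
    for i, z in enumerate(ewma(values, lam, start)):
        if z > threshold:
            return i
    return None


# Hypothetical Q-statistic sequence with a sustained shift from index 5 onwards.
q_stats = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
alarm_index = first_alarm(q_stats, threshold=0.5)
```

Because each EWMA value accumulates information from past samples, a small but persistent shift gradually pushes the smoothed statistic across the limit, which is the usual motivation for pairing EWMA with the T2 and Q statistics.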

Keywords: data-driven method, process control, anomaly detection, dimensionality reduction

Procedia PDF Downloads 263
78 An Approach for Multilayered Ecological Networks

Authors: N. F. F. Ebecken, G. C. Pereira

Abstract:

Although networks provide a powerful approach to the study of a wide variety of ecological systems, their formulation usually does not include various types of interactions, interactions that vary in space and time, and interconnected systems such as networks. The emerging field of 'multilayer networks' provides a natural framework for extending ecological systems analysis to include these multiple layers of complexity, as it specifically allows for differentiation and modeling of intralayer and interlayer connectivity. The structure provides a set of concepts and tools that can be adapted and applied to ecology, facilitating research on high-dimensionality, heterogeneous systems in nature. Here, we formally define ecological multilayer networks based on a review of prior and related approaches, illustrate their application and potential with existing data analyses, and discuss limitations, challenges, and future applications. The integration of multilayer network theory into ecology offers a largely untapped potential to further address ecological complexity and, ultimately, to provide new theoretical and empirical insights into the architecture and dynamics of ecological systems.

Keywords: ecological networks, multilayered networks, sea ecology, Brazilian Coastal Area

Procedia PDF Downloads 116
77 A Chinese Nested Named Entity Recognition Model Based on Lexical Features

Authors: Shuo Liu, Dan Liu

Abstract:

In the field of named entity recognition, most of the research has been conducted around simple entities. However, for nested named entities, which still contain entities within entities, it has been difficult to identify them accurately due to their boundary ambiguity. In this paper, a hierarchical recognition model is constructed based on the grammatical structure and semantic features of Chinese text for boundary calculation based on lexical features. The analysis is carried out at different levels in terms of granularity, semantics, and lexicality, respectively, avoiding repetitive work to reduce computational effort and using the semantic features of words to calculate the boundaries of entities to improve the accuracy of the recognition work. The results of the experiments carried out on web-based microblogging data show that the model achieves an accuracy of 86.33% and an F1 value of 89.27% in recognizing nested named entities, making up for the shortcomings of some previous recognition models and improving the efficiency of recognition of nested named entities.

Keywords: coarse-grained, nested named entity, Chinese natural language processing, word embedding, T-SNE dimensionality reduction algorithm

Procedia PDF Downloads 95
76 On the Estimation of Crime Rate in the Southwest of Nigeria: Principal Component Analysis Approach

Authors: Kayode Balogun, Femi Ayoola

Abstract:

Crime is at an alarming rate in this part of the world, and there are many factors contributing to this antisocietal behaviour among both the young and the old. In this work, principal component analysis (PCA) was used as a tool to reduce the dimensionality, while retaining as much of the information as possible, and to identify those variables that were crime prone in the study region. Data were collected on twenty-eight crime variables from the National Bureau of Statistics (NBS) databank for a period of fifteen years. We use PCA in this study to determine the number of major variables and contributors to crime in Southwest Nigeria. The results of our analysis revealed that eight principal variables were retained using the scree plot and loading plot, which implies that an eight-equation solution will be appropriate for the data. The eight components explained 93.81% of the total variation in the data set. We also found that the highest and most commonly committed crimes in Southwestern Nigeria were: assault, grievous harm and wounding, theft/stealing, burglary, house breaking, false pretence, unlawful arms possession and breach of public peace.

Keywords: crime rates, data, Southwest Nigeria, principal component analysis, variables

Procedia PDF Downloads 406
75 Distributed Perceptually Important Point Identification for Time Series Data Mining

Authors: Tak-Chung Fu, Ying-Kit Hung, Fu-Lai Chung

Abstract:

In the field of time series data mining, the concept of the Perceptually Important Point (PIP) identification process was first introduced in 2001. This process was originally designed for financial time series pattern matching and was then found suitable for time series dimensionality reduction and representation. Its strength lies in preserving the overall shape of the time series by identifying the salient points in it. With the rise of Big Data, time series data makes up a major proportion, especially data generated by sensors in the Internet of Things (IoT) environment. Given the nature of PIP identification and its successful use cases, it is worth further exploring the opportunity to apply PIP to time series ‘Big Data’. However, the performance of PIP identification is always considered a limitation when dealing with ‘Big’ time series data. In this paper, two distributed versions of PIP identification based on the Specialized Binary (SB) Tree are proposed. The proposed approaches solve the bottleneck of running the PIP identification process on a standalone computer. An improvement in terms of speed is obtained by the distributed versions.
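
For readers unfamiliar with PIP identification, the sketch below shows a basic sequential version: start from the two endpoints and repeatedly add the point that lies farthest from the line joining its neighbouring PIPs. The function name and the use of vertical distance are assumptions for illustration; this is neither the SB-Tree variant nor the distributed versions proposed in the paper.

```python
def find_pips(series, count):
    """Return sorted indices of `count` perceptually important points.

    Starts from the two endpoints and repeatedly adds the point with the
    largest vertical distance to the line joining its neighbouring PIPs.
    """
    if count < 2 or count > len(series):
        raise ValueError("count must be between 2 and len(series)")
    pips = [0, len(series) - 1]
    while len(pips) < count:
        best_index, best_distance = None, -1.0
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                # Height of the chord between the two adjacent PIPs at position i.
                slope = (series[right] - series[left]) / (right - left)
                on_line = series[left] + slope * (i - left)
                distance = abs(series[i] - on_line)
                if distance > best_distance:
                    best_index, best_distance = i, distance
        pips.append(best_index)
        pips.sort()
    return pips


# Hypothetical series with one dominant peak in the middle.
pips = find_pips([0, 1, 0, 5, 0, 1, 0], 3)
```

Each new PIP requires a scan over all remaining points, so the cost grows with both the series length and the number of PIPs requested, which is the kind of bottleneck that motivates a distributed version.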

Keywords: distributed computing, performance analysis, Perceptually Important Point identification, time series data mining

Procedia PDF Downloads 392
74 Dimensionality and Superconducting Parameters of YBa2Cu3O7 Foams

Authors: Michael Koblischka, Anjela Koblischka-Veneva, XianLin Zeng, Essia Hannachi, Yassine Slimani

Abstract:

Superconducting foams of YBa2Cu3O7 (abbreviated Y-123) were produced using the infiltration growth (IG) technique from Y2BaCuO5 (Y-211) foams. The samples were investigated by SEM (scanning electron microscopy) and electrical resistivity measurements. SEM observations indicated the specific microstructure of the foam struts, with numerous tiny Y-211 particles (50-100 nm diameter) embedded in channel-like structures between the Y-123 grains. The excess conductivity of the different prepared composites was analyzed using the Aslamazov-Larkin (AL) model. The investigated samples exhibited five distinct fluctuation regimes, namely short-wave (SWF), one-dimensional (1D), two-dimensional (2D), three-dimensional (3D), and critical (CR) fluctuation regimes. The coherence length along the c-axis at zero temperature (ξc(0)), the lower and upper critical magnetic fields (Bc1 and Bc2), the critical current density (Jc) and numerous other superconducting parameters were estimated from the data. The analysis reveals that the presence of the tiny Y-211 particles alters the excess conductivity and the fluctuation behavior observed in standard YBCO samples.

Keywords: Excess conductivity, Foam, Microstructure, Superconductor YBa2Cu3Oy

Procedia PDF Downloads 137
73 SC-LSH: An Efficient Indexing Method for Approximate Similarity Search in High Dimensional Space

Authors: Sanaa Chafik, Imane Daoudi, Mounim A. El Yacoubi, Hamid El Ouardi

Abstract:

Locality Sensitive Hashing (LSH) is one of the most promising techniques for solving the nearest neighbour search problem in high dimensional space. Euclidean LSH is the most popular variation of LSH and has been successfully applied in many multimedia applications. However, Euclidean LSH has limitations that affect structure and query performance. Its main limitation is its large memory consumption: to achieve good accuracy, a large number of hash tables is required. In this paper, we propose a new hashing algorithm to overcome the storage space problem and improve query time, while keeping accuracy similar to that achieved by the original Euclidean LSH. Experimental results on a real large-scale dataset show that the proposed approach achieves good performance and consumes less memory than Euclidean LSH.
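
As background for the Euclidean LSH baseline mentioned in this abstract, the sketch below shows one hash function of the commonly used form h(v) = floor((a · v + b) / w), with a Gaussian projection vector a and a random offset b. The function names, the bucket width, and the seed are assumptions for illustration; this is the baseline idea only, not the SC-LSH method the paper proposes.

```python
import math
import random


def make_hash(dim, width, seed=0):
    """Draw the parameters (a, b) of one Euclidean LSH function."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian projection vector
    b = rng.random() * width                       # random offset in [0, width)
    return a, b


def lsh_bucket(vector, a, b, width):
    """Bucket index floor((a . v + b) / width) of `vector`."""
    projection = sum(x * y for x, y in zip(a, vector))
    return math.floor((projection + b) / width)


# Hypothetical 4-dimensional descriptor hashed with bucket width 4.0.
a, b = make_hash(dim=4, width=4.0)
bucket = lsh_bucket([0.5, 1.0, -2.0, 3.0], a, b, 4.0)
```

Nearby vectors have similar projections and so tend to fall into the same bucket. In practice several such functions are concatenated per table and many tables are kept, which is the source of the memory cost the abstract refers to.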

Keywords: approximate nearest neighbor search, content based image retrieval (CBIR), curse of dimensionality, locality sensitive hashing, multidimensional indexing, scalability

Procedia PDF Downloads 299
72 Optimal Feature Extraction Dimension in Finger Vein Recognition Using Kernel Principal Component Analysis

Authors: Amir Hajian, Sepehr Damavandinejadmonfared

Abstract:

In this paper, the issue of dimensionality reduction is investigated in finger vein recognition systems using kernel Principal Component Analysis (KPCA). One aspect of KPCA is finding the most appropriate kernel function for finger vein recognition, as there are several kernel functions that can be used within PCA-based algorithms. In this paper, however, another side of PCA-based algorithms, particularly KPCA, is investigated. The dimension of the feature vector in PCA-based algorithms is important, especially when it comes to real-world applications and usage of such algorithms. It means that a fixed feature vector dimension has to be set to reduce the dimension of the input and output data and extract the features from them. Then a classifier is applied to classify the data and make the final decision. We analyze KPCA (Polynomial, Gaussian, and Laplacian) in detail in this paper and investigate the optimal feature extraction dimension in finger vein recognition using KPCA.

Keywords: biometrics, finger vein recognition, principal component analysis (PCA), kernel principal component analysis (KPCA)

Procedia PDF Downloads 333
71 Electromyography Pattern Classification with Laplacian Eigenmaps in Human Running

Authors: Elnaz Lashgari, Emel Demircan

Abstract:

Electromyography (EMG) is one of the most important interfaces between humans and robots for rehabilitation. Decoding this signal helps to recognize muscle activation and converts it into smooth motion for the robots. Detecting each muscle’s pattern during walking and running is vital for improving the quality of a patient’s life. In this study, EMG data from 10 muscles in 10 subjects at 4 different speeds were analyzed. EMG signals are nonlinear with high dimensionality. To deal with this challenge, we extracted some features in the time-frequency domain and used manifold learning with the Laplacian Eigenmaps algorithm to find the intrinsic features that represent the data in a low-dimensional space. We then used a Bayesian classifier to identify various patterns of EMG signals for different muscles across a range of running speeds. The best result, for the vastus medialis muscle, corresponds to 97.87±0.69 for sensitivity and 88.37±0.79 for specificity with 97.07±0.29 accuracy using the Bayesian classifier. The results of this study provide important insight into human movement and its application for robotics research.

Keywords: electromyography, manifold learning, ISOMAP, Laplacian Eigenmaps, locally linear embedding

Procedia PDF Downloads 327
70 Contractual Complexity and Contract Parties' Opportunistic Behavior in Construction Projects: In a Contractual Function View

Authors: Mengxia Jin, Yongqiang Chen, Wenqian Wang, Yu Wang

Abstract:

The complexity and specificity of construction projects have made opportunism a common phenomenon, and contractual governance of opportunism has been a topic of considerable ongoing research. Based on TCE, the research distinguishes control and coordination as different functions of the contract in order to investigate their complexity separately. In a more nuanced way, the dimensionality of contractual control is also examined. Through an analysis of the motivation and capability for strong-form or weak-form opportunism, the framework focuses on the relationship between the complexity of the above contractual dimensions and different types of opportunistic behavior, and attempts to verify the possible explanatory mechanism. The explanatory power of the research model is evaluated in the light of empirical evidence from questionnaires. We collect data from Chinese companies in the construction industry, and the data collection is still in progress. The findings will speak to the debate surrounding the effects of contract complexity on opportunistic behavior. This nuanced research will derive implications for research on the role of contractual mechanisms in dealing with inter-organizational opportunism and offer suggestions for curbing contract parties’ opportunistic behavior in construction projects.

Keywords: contractual complexity, contractual control, contractual coordination, opportunistic behavior

Procedia PDF Downloads 354
69 Isothermal Crystallization Kinetics of Lauric Acid Methyl Ester from DSC Measurements

Authors: Charine Faith H. Lagrimas, Rommel N. Galvan, Rizalinda L. de Leon

Abstract:

An ongoing study on the use of methyl laurate as a refrigerant in an HVAC system requires the crystallization kinetics of this substance. Step-wise and normal forms of Avrami model parameters were used to describe the isothermal crystallization kinetics of methyl laurate at different temperatures from Differential Scanning Calorimetry (DSC) measurements. At 3 °C, the parameters showed that methyl laurate exhibits a secondary crystallization. The primary crystallization occurred with instantaneous nucleation and spherulitic growth, followed by a secondary instantaneous nucleation with growth of lower dimensionality (rod-like). At 4 °C to 6 °C, the exotherms from DSC implied that the system was within the isokinetic range. The kinetic behavior is the same throughout this range: instantaneous nucleation with one-dimensional growth. The differences among the isokinetic range temperatures are the activation energies (directly proportional to T) and nucleation rates (inversely proportional to T). Images obtained with an optical microscope during the crystallization of methyl laurate confirm that the observed nucleation and crystal growth modes are consistent with the parameters from the Avrami model.
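
The normal form of the Avrami model referred to in this abstract is X(t) = 1 - exp(-k t^n), which becomes a straight line, ln(-ln(1 - X)) = ln k + n ln t, so k and n can be read off a linear fit. The sketch below illustrates that fit. The function names and the synthetic data (generated from assumed values k = 0.3, n = 1) are hypothetical and are not the paper's DSC measurements.

```python
import math


def avrami_fraction(t, k, n):
    """Avrami equation: crystallized fraction X(t) = 1 - exp(-k * t**n)."""
    return 1.0 - math.exp(-k * t ** n)


def fit_avrami(times, fractions):
    """Least-squares fit of ln(-ln(1 - X)) = ln(k) + n * ln(t); returns (k, n)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - x)) for x in fractions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx
    k = math.exp(mean_y - n * mean_x)
    return k, n


# Synthetic data generated from assumed parameters, not DSC measurements.
times = [0.5, 1.0, 1.5, 2.0, 2.5]
fractions = [avrami_fraction(t, k=0.3, n=1.0) for t in times]
k_fit, n_fit = fit_avrami(times, fractions)
```

The fit recovers the parameters used to generate the data. In the usual reading of the Avrami exponent, a value near 1 corresponds to instantaneous nucleation with one-dimensional growth, matching the behavior the abstract reports for the isokinetic range.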

Keywords: Avrami model, isothermal crystallization, lipids kinetics, methyl laurate

Procedia PDF Downloads 298
68 Curating Pluralistic Futures: Leveling up for Whole-Systems Change

Authors: Daniel Schimmelpfennig

Abstract:

This paper attempts to delineate the idea of curating the leveling up for whole-systems change. Curation is the act of selecting, organizing, looking after, or presenting information from a professional point of view through expert knowledge. The trans-paradigmatic, trans-contextual, trans-disciplinary, trans-perspective of trans-media futures studies hopes to enable a move from a monochrome intellectual pursuit towards breathing a higher dimensionality. Progressing to the next level to equip actors for whole-systems change is, in consideration of the commonly known symptoms of our time as well as in anticipation of future challenges, both a necessity and a desirability. Systems of collective intelligence could potentially scale regenerative, adaptive, and anticipatory capacities. How could such a curation then be enacted and implemented, to initiate the process of leveling up? The suggestion here is to focus on the metasystem transition, the bio-digital fusion, namely, by merging neurosciences, the ontological design of money as our operating system, and our understanding of the billions of years of time-proven permutations in nature, biomimicry, and biological metaphors like symbiogenesis. Evolutionary cybernetics accompanies the process of whole-systems change.

Keywords: bio-digital fusion, evolutionary cybernetics, metasystem transition, symbiogenesis, transmedia futures studies

Procedia PDF Downloads 115
67 Hyperspectral Image Classification Using Tree Search Algorithm

Authors: Shreya Pare, Parvin Akhter

Abstract:

Remote sensing image classification becomes a very challenging task owing to the high dimensionality of hyperspectral images. Pixel-wise classification methods fail to take the spatial structure information of an image into account. Therefore, to improve the performance of classification, spatial information can be integrated into the classification process. In this paper, a multilevel thresholding algorithm based on a modified fuzzy entropy (MFE) function is used to perform the segmentation of hyperspectral images. The fuzzy parameters of the MFE function have been optimized by using a new meta-heuristic algorithm based on the Tree-Search algorithm. The segmented image is classified by a large distribution machine (LDM) classifier. Experimental results are shown on a hyperspectral image dataset. The experimental outputs indicate that the proposed technique (MFE-TSA-LDM) achieves much higher classification accuracy for hyperspectral images when compared to state-of-the-art classification techniques. The proposed algorithm provides accurate segmentation and classification maps, thus becoming more suitable for image classification with large spatial structures.

Keywords: classification, hyperspectral images, large distribution margin, modified fuzzy entropy function, multilevel thresholding, tree search algorithm, hyperspectral image classification using tree search algorithm

Procedia PDF Downloads 135
66 Supervised/Unsupervised Mahalanobis Algorithm for Improving Performance for Cyberattack Detection over Communications Networks

Authors: Radhika Ranjan Roy

Abstract:

Deployment of machine learning (ML)/deep learning (DL) algorithms for cyberattack detection in operational communications networks (wireless and/or wire-line) is being delayed because of poor performance on metrics such as recall, precision, and f₁-score. If datasets become imbalanced, which is the usual case for communications networks, the performance tends to become worse. The complexity of reducing the dimensionality of the feature sets to increase performance is also a major problem. Mahalanobis algorithms have been widely applied in scientific research because Mahalanobis distance metric learning is a successful framework. In this paper, we investigate the Mahalanobis binary classifier algorithm for increasing cyberattack detection performance over communications networks as a proof of concept. We also find that high-dimensional information in intermediate features, which is not utilized as much for classification tasks in ML/DL algorithms, is the main contributor to the improved, state-of-the-art performance of the Mahalanobis method, even for imbalanced and sparse datasets. With no feature reduction, the Mahalanobis distance (MD) offers uniform results for precision, recall, and f₁-score on the imbalanced and sparse NSL-KDD datasets.
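As an illustration of the general idea of a Mahalanobis binary classifier, the sketch below fits a mean and covariance to normal traffic only and flags a sample as an attack when its squared Mahalanobis distance exceeds a threshold. This is not the authors' algorithm: the function names, the toy two-feature data, and the threshold of 9.21 (the 0.99 chi-square quantile for two degrees of freedom) are all assumptions made for the example.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [row[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[v / (n - 1) for v in line] for line in cov]


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [line[:] + [b[i]] for i, line in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - s) / m[i][i]
    return y


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))


def fit(normal_rows):
    """Estimate mean and covariance from normal (non-attack) samples."""
    mu = mean_vector(normal_rows)
    return mu, covariance_matrix(normal_rows, mu)


def is_attack(x, model, threshold):
    """Flag x as an attack if it lies too far from the normal distribution."""
    mu, cov = model
    return mahalanobis_sq(x, mu, cov) > threshold


# Hypothetical two-feature "normal traffic" samples.
model = fit([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
print(is_attack([10.0, -5.0], model, threshold=9.21))  # far from normal traffic
print(is_attack([2.5, 2.5], model, threshold=9.21))    # at the mean
```

Solving the linear system instead of inverting the covariance matrix keeps the sketch short; a singular covariance (for example, from perfectly correlated features) raises an error rather than returning a misleading distance.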

Keywords: Mahalanobis distance, machine learning, deep learning, NSL-KDD, local intrinsic dimensionality, chi-square, positive semi-definite, area under the curve

Procedia PDF Downloads 47
65 Ordinary Differentiation Equations (ODE) Reconstruction of High-Dimensional Genetic Networks through Game Theory with Application to Dissecting Tree Salt Tolerance

Authors: Libo Jiang, Huan Li, Rongling Wu

Abstract:

Ordinary differential equations (ODE) have proven to be powerful for reconstructing precise and informative gene regulatory networks (GRNs) from dynamic gene expression data. However, joint modeling and analysis of all genes, essential for the systematic characterization of genetic interactions, are challenging due to high dimensionality and a complex pattern of genetic regulation including activation, repression, and antitermination. Here, we address these challenges by unifying variable selection and game theory through ODE. Each gene within a GRN is co-expressed with its partner genes much like a game with multiple players, each of which tends to choose an optimal strategy to maximize its “fitness” across the whole network. Based on this unifying theory, we designed and conducted a real experiment to infer salt tolerance-related GRNs for Euphrates poplar, a hero tree that can grow in the saline desert. The pattern and magnitude of interactions between several hub genes within these GRNs were found to determine the capacity of Euphrates poplar to resist saline stress.

Keywords: gene regulatory network, ordinary differential equation, game theory, LASSO, saline resistance

Procedia PDF Downloads 609
64 A Portable Cognitive Tool for Engagement Level and Activity Identification

Authors: Terry Teo, Sun Woh Lye, Yufei Li, Zainuddin Zakaria

Abstract:

Wearable devices such as Electroencephalography (EEG) headsets hold immense potential for monitoring and assessing a person’s task engagement. This is especially so at remote or online sites. Research into their use in measuring an individual's cognitive state while performing task activities is therefore expected to increase. Despite the growing body of EEG research into a person's brain functioning, key challenges remain in adopting EEG for real-time operations. These include limited portability, long preparation time, high channel dimensionality, intrusiveness, and the level of accuracy in acquiring neurological data. This paper proposes an approach using 4-6 EEG channels to determine the cognitive states of a subject undertaking a set of passive and active monitoring tasks. Air traffic controller (ATC) dynamic tasks are used as a proxy. The work found that, when using the channel reduction and identifier algorithm, good trend adherence of 89.1% can be obtained between a commercially available 14-channel BCI headset, the Emotiv EPOC+ EEG, and a carefully selected, reduced set of 4-6 channels. The approach can also identify different levels of engagement activities, ranging from general monitoring to ad hoc and repeated active monitoring activities involving information search, extraction, and memory activities.

Keywords: assessment, neurophysiology, monitoring, EEG

Procedia PDF Downloads 42
63 Gait Biometric for Person Re-Identification

Authors: Lavanya Srinivasan

Abstract:

Biometric identification relies on unique features of a person, such as fingerprints, iris, ear, and voice, whose recognition needs the subject's permission and physical contact. Gait biometrics identify the unique gait of a person by extracting moving features. The main advantage of gait biometrics is that a person's gait can be identified at a distance, without any physical contact. In this work, gait biometrics are used for person re-identification. A person walking naturally is compared with the same person walking with a bag, coat, and case, recorded using longwave infrared, short wave infrared, medium wave infrared, and visible cameras. The videos are recorded in rural and urban environments. The pre-processing technique includes human identification using YOLO, background subtraction, silhouette extraction, and synthesis of the Gait Entropy Image by averaging the silhouettes. The moving features are extracted from the Gait Entropy Energy Image. The dimensionality of the extracted features is reduced by principal component analysis, and the features are recognised using different classifiers. The comparative results across classifiers show that linear discriminant analysis outperforms the other classifiers, with 95.8% for visible in the rural dataset and 94.8% for longwave infrared in the urban dataset.
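The silhouette-averaging step can be sketched as follows. The abstract does not give the formula used, so this example assumes one common construction: average the aligned binary silhouettes to get a per-pixel foreground probability, then take the binary Shannon entropy of that probability. The function names and the tiny 2x2 frames are hypothetical.

```python
import math


def average_silhouette(frames):
    """Pixel-wise mean of equally sized binary silhouettes (0 = background, 1 = body)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def binary_entropy(p):
    """Shannon entropy in bits of a pixel that is foreground with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pixel that never changes carries no motion information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def gait_entropy_image(frames):
    """Assumed construction: per-pixel entropy of the averaged silhouettes."""
    return [[binary_entropy(p) for p in row] for row in average_silhouette(frames)]


# Two hypothetical 2x2 silhouette frames; only the bottom-left pixel changes.
frames = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
]
print(gait_entropy_image(frames))
```

Under this construction, static pixels (always body or always background) map to zero and pixels that change over the gait cycle map to high values, so the image emphasises the moving parts of the silhouette.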

Keywords: biometric, gait, silhouettes, YOLO

Procedia PDF Downloads 144
62 Data Science-Based Key Factor Analysis and Risk Prediction of Diabetic

Authors: Fei Gao, Rodolfo C. Raga Jr.

Abstract:

This research proposal aims to ascertain the major risk factors for diabetes and to design a predictive model for risk assessment. The project aims to improve early detection and management of diabetes by utilizing data science techniques, which may improve patient outcomes and healthcare efficiency. Using the Diabetes Health Indicators Dataset from Kaggle’s data as the research data, the phase relation values of each attribute were used to analyze and choose the attributes that might influence the examiner's survival probability. We compare and evaluate eight machine learning algorithms. Our investigation begins with comprehensive data preprocessing, including feature engineering and dimensionality reduction, aimed at enhancing data quality. The dataset, comprising health indicators and medical data, serves as a foundation for training and testing these algorithms. A rigorous cross-validation process is applied, and we assess their performance using five key metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). After analyzing the data characteristics, we investigate their impact on the likelihood of diabetes and develop corresponding risk indicators.
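Four of the five metrics named above follow directly from the confusion counts of a binary classifier, as the sketch below shows. AUC-ROC is omitted because it needs predicted scores rather than hard labels. The function name and the toy labels are assumptions for illustration and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = diabetic, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

On an imbalanced dataset, accuracy alone can look good while recall on the minority class is poor, which is one reason to report the metrics together.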

Keywords: diabetes, risk factors, predictive model, risk assessment, data science techniques, early detection, data analysis, Kaggle

Procedia PDF Downloads 36
61 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to an information overflow problem. To solve this problem and enable effective information retrieval, document clustering in text mining has become a popular research topic. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases. They are therefore impractical for real-world document clustering, which requires special handling for high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, a hierarchical clustering method developed for document clustering; the intuition behind FIHC is that there exist some common words for each cluster. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of only frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
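To make the difference between frequent and closed frequent itemsets concrete, the sketch below treats each document as a set of words, counts the itemsets that appear in at least `min_support` documents, and then keeps only the closed ones, that is, those with no proper superset of equal support. It is a brute-force illustration with hypothetical names and toy documents, not the FIHC algorithm or the authors' mining procedure.

```python
from itertools import combinations


def frequent_itemsets(documents, min_support):
    """Map each frequent itemset (frozenset of words) to its document count."""
    transactions = [frozenset(d) for d in documents]
    vocabulary = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(vocabulary) + 1):
        found = False
        for combo in combinations(vocabulary, size):
            items = frozenset(combo)
            count = sum(1 for t in transactions if items <= t)
            if count >= min_support:
                result[items] = count
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        items: count
        for items, count in frequent.items()
        if not any(items < other and count == other_count
                   for other, other_count in frequent.items())
    }


# Four hypothetical documents reduced to their word sets.
docs = [
    {"gene", "cell", "protein"},
    {"gene", "cell"},
    {"gene", "cell", "network"},
    {"network", "graph"},
]
frequent = frequent_itemsets(docs, min_support=2)
closed = closed_itemsets(frequent)
print(len(frequent), len(closed))
```

In this toy collection, "gene" and "cell" always occur together, so the single-word itemsets are absorbed into the pair and fewer itemsets need to be carried forward, which is the source of the efficiency gain.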

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 366