Search results for: High dimensional data
12603 An Efficient and Generic Hybrid Framework for High Dimensional Data Clustering
Authors: Dharmveer Singh Rajput, P. K. Singh, Mahua Bhattacharya
Abstract:
Clustering in high dimensional space is a difficult problem that recurs in many fields of science and engineering, e.g., bioinformatics, image processing, pattern recognition and data mining. In high dimensional space some of the dimensions are likely to be irrelevant, thus hiding the possible clustering. In very high dimensions it is common for all the objects in a dataset to be nearly equidistant from each other, completely masking the clusters. Hence, the performance of the clustering algorithm decreases. In this paper, we propose an algorithmic framework which combines the reduct concept of rough set theory with the k-means algorithm to remove the irrelevant dimensions in a high dimensional space and obtain appropriate clusters. Our experiment on test data shows that this framework increases the efficiency of the clustering process and the accuracy of the results.
Keywords: High dimensional clustering, sub-space, k-means, rough set, discernibility matrix.
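The near-equidistance effect described in the abstract above can be illustrated with a small sketch. This is not the authors' framework: the function name `relative_contrast`, the uniform random data and the point counts are assumptions chosen only for illustration.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (max_dist - min_dist) / min_dist from one query point to all others.

    Values near zero mean the other points are almost equidistant from the query.
    """
    rng = random.Random(seed)
    # Hypothetical data: points drawn uniformly from the unit hypercube.
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)


# Compare a low dimensional and a high dimensional data set of the same size.
low = relative_contrast(200, 2)
high = relative_contrast(200, 1000)
print(f"contrast in 2 dimensions:    {low:.2f}")
print(f"contrast in 1000 dimensions: {high:.2f}")
```

Under these assumptions the contrast shrinks sharply as the dimension grows, which is why distance-based methods such as k-means tend to benefit from removing irrelevant dimensions first.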
12602 Dimension Reduction of Microarray Data Based on Local Principal Component
Authors: Ali Anaissi, Paul J. Kennedy, Madhu Goyal
Abstract:
Analysis and visualization of microarray data is of great assistance to biologists and clinicians in the diagnosis and treatment of patients. It allows clinicians to better understand the structure of microarray data and facilitates understanding of gene expression in cells. However, a microarray dataset is complex: it has thousands of features and a very small number of observations. Such a very high dimensional data set often contains noise, non-useful information and only a small number of features relevant to a disease or genotype. This paper proposes a non-linear dimensionality reduction algorithm, Local Principal Component (LPC), which aims to map high dimensional data to a lower dimensional space. The reduced data represents the most important variables underlying the original data. Experimental results and comparisons are presented to show the quality of the proposed algorithm. Moreover, experiments also show how this algorithm reduces high dimensional data whilst preserving the neighbourhoods of the points in the low dimensional space as in the high dimensional space.
Keywords: Linear Dimension Reduction, Non-Linear Dimension Reduction, Principal Component Analysis, Biologists.
12601 Application of Multi-Dimensional Principal Component Analysis to Medical Data
Authors: Naoki Yamamoto, Jun Murakami, Chiharu Okuma, Yutaro Shigeto, Satoko Saito, Takashi Izumi, Nozomi Hayashida
Abstract:
Multi-dimensional principal component analysis (PCA) is an extension of PCA, which is widely used as a dimensionality reduction technique in multivariate data analysis, to handle multi-dimensional data. To calculate the PCA, the singular value decomposition (SVD) is commonly employed because of its numerical stability. The multi-dimensional PCA can be calculated by using the higher-order SVD (HOSVD), proposed by Lathauwer et al., in the same way as ordinary PCA. In this paper, we apply the multi-dimensional PCA to multi-dimensional medical data including the functional independence measure (FIM) score, and describe the results of the experimental analysis.
Keywords: multi-dimensional principal component analysis, higher-order SVD (HOSVD), functional independence measure (FIM), medical data, tensor decomposition
12600 ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset
Authors: Sunita Jahirabadkar, Parag Kulkarni
Abstract:
Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace clustering algorithms automatically identify lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC – Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-the-art techniques. ISC determines input parameters such as the ε-distance at various levels of subspace clustering, which helps in finding meaningful clusters. A uniform-parameter approach is not suitable for different kinds of databases, so ISC implements dynamic and adaptive determination of meaningful clustering parameters based on a hierarchical filtering approach. The third and most important feature of ISC is its ability to learn incrementally and to dynamically include and exclude subspaces, which leads to better cluster formation.
Keywords: Density based clustering, high dimensional data, subspace clustering, dynamic parameter setting.
12599 Dimensional Modeling of HIV Data Using Open Source
Authors: Charles D. Otine, Samuel B. Kucel, Lena Trojer
Abstract:
Selecting the data modeling technique for an information system is determined by the objective of the resultant data model. Dimensional modeling is the preferred modeling technique for data destined for data warehouses and data mining, presenting data models that ease analysis and queries, in contrast with entity relationship modeling. The establishment of data warehouses as components of information system landscapes in many organizations has subsequently led to the development of dimensional modeling. This has been significantly more developed and reported for commercial database management systems than for open source ones, thereby making it less affordable for those in resource constrained settings. This paper presents dimensional modeling of HIV patient information using open source modeling tools. It aims to take advantage of the fact that the regions most affected by HIV (sub-Saharan Africa) are also heavily resource constrained while holding large quantities of HIV data. Two HIV data source systems were studied to identify appropriate dimensions and facts; these were then modeled using two open source dimensional modeling tools. Use of open source would reduce the software costs for dimensional modeling and in turn make data warehousing and data mining more feasible even for those in resource constrained settings but with data available.
Keywords: About Database, Data Mining, Data warehouse, Dimensional Modeling, Open Source.
12598 A Survey on Facial Feature Points Detection Techniques and Approaches
Authors: Rachid Ahdid, Khaddouj Taifi, Said Safi, Bouzid Manaut
Abstract:
Automatic detection of facial feature points plays an important role in applications such as facial feature tracking, human-machine interaction and face recognition. The majority of facial feature point detection methods using two-dimensional or three-dimensional data are covered in existing survey papers. In this article, selected approaches to facial feature detection have been gathered and described. This overview focuses on the class of research that exploits facial feature point detection to represent the facial surface for two-dimensional or three-dimensional faces. In the conclusion, we discuss the advantages and disadvantages of the presented algorithms.
Keywords: Facial feature points, face recognition, facial feature tracking, two-dimensional data, three-dimensional data.
12597 Bidirectional Discriminant Supervised Locality Preserving Projection for Face Recognition
Abstract:
Dimensionality reduction and feature extraction are of crucial importance for achieving high efficiency in manipulating high dimensional data. Two-dimensional discriminant locality preserving projection (2D-DLPP) and two-dimensional discriminant supervised LPP (2D-DSLPP) are two effective two-dimensional projection methods for dimensionality reduction and feature extraction of face image matrices. Since 2D-DLPP and 2D-DSLPP preserve the local structure information of the original data and exploit the discriminant information, they usually have good recognition performance. However, 2D-DLPP and 2D-DSLPP only employ single-sided projection, and thus the generated low dimensional data matrices still have many features. In this paper, by combining the discriminant supervised LPP with the bidirectional projection, we propose the bidirectional discriminant supervised LPP (BDSLPP). The left and right projection matrices for BDSLPP can be computed iteratively. Experimental results show that the proposed BDSLPP achieves higher recognition accuracy than 2D-DLPP, 2D-DSLPP, and bidirectional discriminant LPP (BDLPP).
Keywords: Face recognition, dimension reduction, locality preserving projection, discriminant information, bidirectional projection.
12596 Influence of Parameters of Modeling and Data Distribution for Optimal Condition on Locally Weighted Projection Regression Method
Authors: Farhad Asadi, Mohammad Javad Mollakazemi, Aref Ghafouri
Abstract:
Recent research in neural networks and neuroscience on modeling complex time series data and on statistical learning has focused mostly on learning from high dimensional input spaces and signals. Local linear models are a strong choice for modeling local nonlinearity in data series. Locally weighted projection regression is a flexible and powerful algorithm for nonlinear approximation in high dimensional signal spaces. In this paper, different learning scenarios of one and two dimensional data series with different distributions are investigated by simulation, and noise is added to the data to produce different disordered distributions in the time series and to evaluate the algorithm's local prediction of nonlinearity. The performance of the algorithm is then simulated, and its sensitivity to the data distribution when the distribution of data is high or the number of data points is low, together with the influence of the important local validity parameter under different data distributions, is explained.
Keywords: Local nonlinear estimation, LWPR algorithm, Online training method.
12595 Efficient Tuning Parameter Selection by Cross-Validated Score in High Dimensional Models
Authors: Yoonsuh Jung
Abstract:
As DNA microarray data contain a relatively small sample size compared to the number of genes, high dimensional models are often employed. In high dimensional models, the selection of the tuning parameter (or penalty parameter) is often one of the crucial parts of the modeling. Cross-validation is one of the most common methods for tuning parameter selection; it selects the parameter value with the smallest cross-validated score. However, selecting a single value as an ‘optimal’ value for the parameter can be very unstable due to sampling variation, since the sample sizes of microarray data are often small. Our approach is to choose multiple candidates for the tuning parameter first, then average the candidates with different weights depending on their performance. The additional step of estimating the weights and averaging the candidates rarely increases the computational cost, while it can considerably improve on traditional cross-validation. We show, via real and simulated data sets, that the value selected by the suggested methods often leads to stable parameter selection as well as improved detection of significant genetic variables compared to traditional cross-validation.
Keywords: Cross Validation, Parameter Averaging, Parameter Selection, Regularization Parameter Search.
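The idea of averaging several tuning parameter candidates, weighted by their cross-validated performance, can be sketched as follows. The abstract does not state the weighting rule, so the inverse-score weights, the function name `weighted_tuning_parameter`, the `top` cutoff and the numbers are all assumptions made for illustration, not the author's method.

```python
def weighted_tuning_parameter(candidates, cv_scores, top=3):
    """Average the `top` best candidates, weighting each by 1 / CV score.

    The inverse-score weighting is an assumed rule; cv_scores must be positive,
    and a lower score means better cross-validated performance.
    """
    # Pair each score with its candidate and keep the lowest-scoring ones.
    ranked = sorted(zip(cv_scores, candidates))[:top]
    weights = [1.0 / score for score, _ in ranked]
    total = sum(weights)
    return sum(w * lam for w, (_, lam) in zip(weights, ranked)) / total


# Hypothetical penalty values and their cross-validated scores.
lambdas = [0.01, 0.1, 1.0, 10.0]
scores = [2.0, 1.0, 1.0, 4.0]
averaged = weighted_tuning_parameter(lambdas, scores)
print(averaged)
```

Compared with picking the single minimizer, the averaged value changes less when two candidates have nearly identical scores, which is the instability the abstract describes.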
12594 Normalizing Flow to Augmented Posterior: Conditional Density Estimation with Interpretable Dimension Reduction for High Dimensional Data
Authors: Cheng Zeng, George Michailidis, Hitoshi Iyatomi, Leo L Duan
Abstract:
The conditional density characterizes the distribution of a response variable y given another predictor x, and plays a key role in many statistical tasks, including classification and outlier detection. Although there has been abundant work on the problem of Conditional Density Estimation (CDE) for a low-dimensional response in the presence of a high-dimensional predictor, little work has been done for a high-dimensional response such as images. The promising performance of normalizing flow (NF) neural networks in unconditional density estimation acts as a motivating starting point. In this work, we extend NF neural networks to the case where an external x is present. Specifically, we use the NF to parameterize a one-to-one transform between a high-dimensional y and a latent z that comprises two components [zP, zN]. The zP component is a low-dimensional subvector obtained from the posterior distribution of an elementary predictive model for x, such as logistic/linear regression. The zN component is a high-dimensional independent Gaussian vector, which explains the variations in y that are unrelated or less related to x. Unlike existing CDE methods, the proposed approach, coined Augmented Posterior CDE (AP-CDE), only requires a simple modification of the common normalizing flow framework, while significantly improving the interpretation of the latent component, since zP represents a supervised dimension reduction. In image analytics applications, AP-CDE shows good separation of x-related variations, due to factors such as lighting condition and subject id, from the other random variations. Further, the experiments show that an unconditional NF neural network, based on an unsupervised model of z such as a Gaussian mixture, fails to generate interpretable results.
Keywords: Conditional density estimation, image generation, normalizing flow, supervised dimension reduction.
12593 Generic Data Warehousing for Consumer Electronics Retail Industry
Authors: S. Habte, K. Ouazzane, P. Patel, S. Patel
Abstract:
The dynamic and highly competitive nature of the consumer electronics retail industry means that businesses in this industry are experiencing different decision making challenges in relation to pricing, inventory control, consumer satisfaction and product offerings. To overcome the challenges facing retailers and create opportunities, we propose a generic data warehousing solution which can be applied to a wide range of consumer electronics retailers with minimum configuration. The solution includes a dimensional data model, a template SQL script, a high-level architectural description, an ETL tool developed using C#, a set of APIs, and data access tools. It has been successfully applied by ASK Outlets Ltd UK, resulting in improved productivity and enhanced sales growth.
Keywords: Consumer electronics retail, dimensional data model, data analysis, generic data warehousing, reporting.
12592 The Robust Clustering with Reduction Dimension
Authors: Dyah E. Herwindiati
Abstract:
Clustering is the process of identifying homogeneous groups of objects, called clusters. Clustering is an interesting topic in data mining. The members of a group or class have similar characteristics. This paper discusses a robust clustering process for image data with two dimension reduction approaches, i.e., two dimensional principal component analysis (2DPCA) and principal component analysis (PCA). A standard approach to overcome this problem is dimension reduction, which transforms high-dimensional data into a lower-dimensional space with limited loss of information. One of the most common forms of dimensionality reduction is principal component analysis (PCA). 2DPCA is often called a variant of PCA in which the image matrices are treated directly as 2D matrices; they do not need to be transformed into vectors, so the image covariance matrix can be constructed directly from the original image matrices. The decomposed classical covariance matrix is very sensitive to outlying observations. The objective of the paper is to compare the performance of robust minimizing vector variance (MVV) in the two dimensional projection PCA (2DPCA) and in PCA for clustering arbitrary image data when outliers are hidden in the data set. The simulation aspects of robustness and an illustration of clustering images are discussed at the end of the paper.
Keywords: Breakdown point, Consistency, 2DPCA, PCA, Outlier, Vector Variance
12591 One Dimensional Reactor Modeling for Methanol Steam Reforming to Hydrogen
Authors: Hongfang Ma, Mingchuan Zhou, Haitao Zhang, Weiyong Ying
Abstract:
One dimensional pseudo-homogeneous modeling has been performed for a methanol steam reforming reactor. The results show that the models predict the industrial data well. The reactor had a minimum temperature along the axial direction because of the endothermic reaction. Hydrogen production and temperature profiles along the axial direction were investigated with regard to operating conditions such as the inlet mass flow rate and mass fraction of methanol and the inlet temperature of the external thermal oil. A low inlet mass flow rate of methanol, a low inlet temperature, and a high mass fraction of methanol decreased the minimum temperature along the axial direction. A low inlet mass flow rate of methanol, a high mass fraction of methanol, and a high inlet temperature of thermal oil moved the cold point forward. A low mass fraction, a high mass flow rate, and a high inlet temperature of thermal oil increased hydrogen production. One dimensional models can be a guide for industrial operation.
Keywords: Reactor, modeling, methanol, steam reforming.
12590 NOHIS-Tree: High-Dimensional Index Structure for Similarity Search
Authors: Mounira Taileb, Sami Touati
Abstract:
In Content-Based Image Retrieval systems it is important to use an efficient indexing technique in order to perform and accelerate the search in huge databases. The indexing technique used should also support the high dimensions of image features. In this paper we present the hierarchical index NOHIS-tree (Non Overlapping Hierarchical Index Structure) when we scale up to very large databases. We also present a study of the influence of clustering on search time. The performance test results show that NOHIS-tree performs better than SR-tree. Tests also show that NOHIS-tree keeps its performance in high dimensional spaces. We include performance tests that try to determine the number of clusters in NOHIS-tree that gives the best search time.
Keywords: High-dimensional indexing, k-nearest neighbors search.
12589 Comparison between Higher-Order SVD and Third-order Orthogonal Tensor Product Expansion
Authors: Chiharu Okuma, Jun Murakami, Naoki Yamamoto
Abstract:
In digital signal processing it is important to approximate multi-dimensional data by the method called rank reduction, in which we reduce the rank of multi-dimensional data from higher to lower. For 2-dimensional data, singular value decomposition (SVD) is one of the best known rank reduction techniques. Additionally, an outer product expansion extended from SVD was proposed and implemented for multi-dimensional data, and it has been widely applied to image processing and pattern recognition. However, the multi-dimensional outer product expansion has high computational complexity and lacks orthogonality between the expansion terms. Therefore we have proposed an alternative method, Third-order Orthogonal Tensor Product Expansion, 3-OTPE for short. 3-OTPE uses the power method instead of a nonlinear optimization method in order to decrease computing time. At the same time, the group of B. D. Lathauwer proposed Higher-Order SVD (HOSVD), which is also developed as an extension of SVD for multi-dimensional data. 3-OTPE and HOSVD are similar in the rank reduction of multi-dimensional data. The computation results obtained with these two methods are the same in some cases and slightly different in others. In this paper, we compare 3-OTPE with HOSVD in accuracy of calculation and computing time of resolution, and clarify the difference between these two methods.
Keywords: Singular value decomposition (SVD), higher-order SVD (HOSVD), higher-order tensor, outer product expansion, power method.
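The power method mentioned in the abstract above can be illustrated on the ordinary 2-dimensional (matrix) case. The sketch below is plain power iteration for the dominant singular triplet of a matrix; it is not the 3-OTPE algorithm itself, and the function name `leading_singular_triplet`, the iteration count and the test matrix are assumptions.

```python
import math


def leading_singular_triplet(a, iters=200):
    """Power iteration on A^T A for the dominant singular value and vectors.

    `a` is a list of rows. Returns (sigma, u, v) with A v = sigma * u.
    """
    rows, cols = len(a), len(a[0])
    # Start from a uniform unit vector (assumed not orthogonal to the answer).
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        # u = A v
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # w = A^T u, then normalize to get the next right vector.
        w = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical test matrix with singular values 3 and 1.
A = [[3.0, 0.0], [0.0, 1.0]]
sigma, u, v = leading_singular_triplet(A)
print(sigma, u, v)
```

Keeping only the term sigma * u * v^T gives the best rank-1 approximation of the matrix, which is the simplest instance of the rank reduction the abstract refers to.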
12588 Three Dimensional Analysis of Sequential Quasi Isotropic Composite Disc for Rotating Machine Application
Authors: Amin Almasi
Abstract:
Composite laminates are relatively weak in out of plane loading, inter-laminar stress, stress concentration near the edge and stress singularities. This paper develops a new analytical formulation for laminated composite rotating disc fabricated from symmetric sequential quasi isotropic layers to predict three dimensional stress and deformation. This analysis is necessary to evaluate mechanical integrity of fiber reinforced multi-layer laminates used for high speed rotating applications such as high speed impellers. Three dimensional governing equations are written for rotating composite disc. Explicit solution is obtained with "Frobenius" expansion series. Based on analytical results, there are two separate zones of three dimensional stress fields in centre and edge of rotating disc. For thin discs, out of plane deformations and stresses are small in comparison with plane ones. For relatively thick discs deformation and stress fields are three dimensional.
Keywords: Composite Disc, Rotating Machine.
12587 A Software Framework for Predicting Oil-Palm Yield from Climate Data
Authors: Mohd. Noor Md. Sap, A. Majid Awan
Abstract:
Intelligent systems based on machine learning techniques, such as classification and clustering, are gaining widespread popularity in real world applications. This paper presents work on developing a software system for predicting crop yield, for example oil-palm yield, from climate and plantation data. At the core of our system is a method for unsupervised partitioning of data for finding spatio-temporal patterns in climate data using kernel methods, which offer the strength to deal with complex data. This work is inspired by the notion that a non-linear data transformation into some high dimensional feature space increases the possibility of linear separability of the patterns in the transformed space. Therefore, it simplifies exploration of the associated structure in the data. Kernel methods implicitly perform a non-linear mapping of the input data into a high dimensional feature space by replacing the inner products with an appropriate positive definite function. In this paper we present a robust weighted kernel k-means algorithm incorporating spatial constraints for clustering the data. The proposed algorithm can effectively handle noise, outliers and auto-correlation in the spatial data, for effective and efficient data analysis by exploring patterns and structures in the data, and thus can be used for predicting oil-palm yield by analyzing various factors affecting the yield.
Keywords: Pattern analysis, clustering, kernel methods, spatial data, crop yield
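The statement that kernel methods replace inner products with a positive definite function can be checked on a small example. The sketch below uses the homogeneous degree-2 polynomial kernel, chosen here only because its feature map is easy to write out; the abstract does not say which kernel the authors use, and the function names are assumptions.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_feature_map(x):
    """Explicit feature map for the same kernel: (x1^2, sqrt(2) x1 x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly2_kernel(x, y)                               # no mapping needed
explicit = dot(poly2_feature_map(x), poly2_feature_map(y))  # mapped inner product
print(implicit, explicit)
```

Both values agree, so an algorithm such as kernel k-means can work with `poly2_kernel` alone and never build the higher dimensional vectors.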
12586 Sensitivity Analysis during the Optimization Process Using Genetic Algorithms
Authors: M. A. Rubio, A. Urquia
Abstract:
Genetic algorithms (GA) are applied to the solution of high-dimensional optimization problems. Additionally, sensitivity analysis (SA) is usually carried out to determine the effect on optimal solutions of changes in parameter values of the objective function. These two analyses (i.e., optimization and sensitivity analysis) are computationally intensive when applied to high-dimensional functions. The approach presented in this paper consists of performing the SA during the GA execution, by statistically analyzing the data obtained from running the GA. The advantage is that in this case SA does not involve making additional evaluations of the objective function and, consequently, the proposed approach requires less computational effort than conducting optimization and SA in two consecutive steps.
Keywords: Optimization, sensitivity, genetic algorithms, model calibration.
12585 Estimation of Attenuation and Phase Delay in Driving Voltage Waveform of an Ultra-High-Speed Image Sensor by Dimensional Analysis
Authors: V. T. S. Dao, T. G. Etoh, C. Vo Le, H. D. Nguyen, K. Takehara, T. Akino, K. Nishi
Abstract:
We present an explicit expression to estimate driving voltage attenuation through an RC network representation of an ultra-high-speed image sensor. The Elmore delay metric for a fundamental RC chain is employed as the first-order approximation. By applying dimensional analysis to SPICE simulation data, we found a simple expression that significantly improves the accuracy of the approximation. The estimation error of the resultant expression for uniform RC networks is less than 2%. Similarly, another simple closed-form model to estimate the 50% delay through fundamental RC networks is also derived with sufficient accuracy. The framework of this analysis can be extended to address delay or attenuation issues of other VLSI structures.
Keywords: Dimensional Analysis, Elmore model, RC network, Signal Attenuation, Ultra-High-Speed Image Sensor.
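The Elmore delay that the abstract above takes as its first-order approximation can be computed for an RC chain as the sum, over each resistor, of that resistance times all the capacitance downstream of it. The sketch below is the generic textbook formulation, not the authors' refined expression; the function name `elmore_delay` and the component values are illustrative assumptions.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC chain.

    Segment k has series resistance resistances[k] followed by a grounded
    capacitance capacitances[k]; the source drives segment 0.
    """
    delay = 0.0
    downstream_c = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        # Each resistor charges every capacitor at or beyond its own node.
        delay += r * downstream_c
        downstream_c -= c
    return delay


# Hypothetical uniform chain: 4 segments of 1 ohm and 1 farad each.
n = 4
tau = elmore_delay([1.0] * n, [1.0] * n)
print(tau)  # uniform chain gives R * C * n * (n + 1) / 2
```

For a uniform chain the delay grows quadratically with the number of segments, which is why long driving lines in a large sensor are a concern.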
12584 Unsupervised Segmentation by Hidden Markov Chain with Bi-dimensional Observed Process
Authors: Abdelali Joumad, Abdelaziz Nasroallah
Abstract:
In the unsupervised segmentation context, we propose a bi-dimensional hidden Markov chain model (X, Y) that we adapt to the image segmentation problem. The bi-dimensional observed process Y = (Y1, Y2) is such that Y1 represents the noisy image and Y2 represents noisy supplementary information on the image, for example a noisy proportion of pixels of the same type in a neighborhood of the current pixel. The proposed model can be seen as a competitive alternative to the Hilbert-Peano scan. We propose a Bayesian algorithm to estimate the parameters of the considered model. The performance of this algorithm is globally favorable compared to the bi-dimensional EM algorithm, as shown through numerical and visual data.
Keywords: Image segmentation, Hidden Markov chain with a bi-dimensional observed process, Peano-Hilbert scan, Bayesian approach, MCMC methods, Bi-dimensional EM algorithm.
12583 The Comparison of Anchor and Star Schema from a Query Performance Perspective
Authors: Radek Němec
Abstract:
Today's business environment requires that companies have access to highly relevant information in a matter of seconds. Modern Business Intelligence tools rely on data structured mostly in traditional dimensional database schemas, typically represented by star schemas. Dimensional modeling is already recognized as a leading industry standard in the field of data warehousing although several drawbacks and pitfalls were reported. This paper focuses on the analysis of another data warehouse modeling technique - the anchor modeling, and its characteristics in context with the standardized dimensional modeling technique from a query performance perspective. The results of the analysis show information about performance of queries executed on database schemas structured according to principles of each database modeling technique.
Keywords: Data warehousing, anchor modeling, star schema, anchor schema, query performance.
12582 Parallel Algorithm for Numerical Solution of Three-Dimensional Poisson Equation
Authors: Alibek Issakhov
Abstract:
This paper develops and implements a new algorithm for solving the three-dimensional Poisson equation. This equation is used in research on turbulent mixing, computational fluid dynamics, atmospheric fronts, ocean flows and so on. Moreover, to speed up demanding calculations, an up-to-date and effective parallel programming technology, MPI in combination with OpenMP directives, was applied, which makes it possible to handle problems with very large amounts of data. The resulting products can be used in solving important applied and fundamental problems in mathematics and physics.
Keywords: MPI, OpenMP, three dimensional Poisson equation
12581 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings
Authors: G. Candel, D. Naccache
Abstract:
t-SNE is an embedding method that the data science community has widely used. It helps with two main tasks: displaying results by coloring items according to the item class or feature value; and forensics, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embeddings. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic, and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped to exactly the same position, making them indistinguishable. This type of model would be unable to adapt to new outliers or concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the match with the support embedding. The embedding with the support process can be repeated more than once, with the newly obtained embedding.
The successive embeddings can be used to study the impact of one variable on the dataset distribution or to monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n^2) to O(n^2/k), and the memory requirement from n^2 to 2(n/k)^2, which enables computation on recent laptops. The method showed promising results on a real-world dataset, making it possible to observe the birth, evolution and death of clusters. The proposed approach facilitates identifying significant trends and changes, which enables monitoring of the dynamics of high dimensional datasets.
Keywords: Concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning.
Downloads 489
12580 Hierarchies Based On the Number of Cooperating Systems of Finite Automata on Four-Dimensional Input Tapes
Authors: Makoto Sakamoto, Yasuo Uchida, Makoto Nagatomo, Takao Ito, Tsunehiro Yoshinaga, Satoshi Ikeda, Masahiro Yokomichi, Hiroshi Furutani
Abstract:
In theoretical computer science, the Turing machine has played a number of important roles in understanding and exploiting basic concepts and mechanisms in computing and information processing [20]. It is a simple mathematical model of computers [9]. Later, in 1967, M. Blum and C. Hewitt first proposed two-dimensional automata as a computational model of two-dimensional pattern processing and investigated their pattern recognition abilities [7]. Since then, many researchers in this field have investigated properties of automata on two- or three-dimensional tapes. On the other hand, the question of whether processing four-dimensional digital patterns is much more difficult than processing two- or three-dimensional ones is of great interest from both theoretical and practical standpoints. Thus, the study of four-dimensional automata as a computational model of four-dimensional pattern processing has been meaningful [8]-[19],[21]. This paper introduces a cooperating system of four-dimensional finite automata as one model of four-dimensional automata. A cooperating system of four-dimensional finite automata consists of a finite number of four-dimensional finite automata and a four-dimensional input tape on which these finite automata work independently (in parallel). Finite automata whose input heads scan the same cell of the input tape can communicate with each other; that is, every finite automaton is allowed to know the internal states of the other finite automata on the cell it is currently scanning. In this paper, we mainly investigate some accepting powers of cooperating systems of eight- or seven-way four-dimensional finite automata. The seven-way four-dimensional finite automaton is an eight-way four-dimensional finite automaton whose input head can move east, west, south, north, up, down, or into the future, but not into the past, on a four-dimensional input tape.
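The difference between the eight-way and seven-way head movements described above can be sketched as two move tables. This is an illustrative sketch only: the coordinate order (x, y, z, t), the sign chosen for each direction, and the names `EIGHT_WAY`, `SEVEN_WAY` and `move` are assumptions that do not come from the paper.

```python
# Unit steps on a four-dimensional tape, written as (x, y, z, t).
# The axis and sign assigned to each direction are assumptions.
EIGHT_WAY = {
    "east": (1, 0, 0, 0),
    "west": (-1, 0, 0, 0),
    "north": (0, 1, 0, 0),
    "south": (0, -1, 0, 0),
    "up": (0, 0, 1, 0),
    "down": (0, 0, -1, 0),
    "future": (0, 0, 0, 1),
    "past": (0, 0, 0, -1),
}

# The seven-way automaton has the same moves except "past".
SEVEN_WAY = {name: step for name, step in EIGHT_WAY.items() if name != "past"}


def move(cell, direction, moves):
    """Return the cell reached from `cell`, or raise if the move is not allowed."""
    if direction not in moves:
        raise ValueError(f"direction {direction!r} is not allowed")
    return tuple(c + d for c, d in zip(cell, moves[direction]))


print(move((0, 0, 0, 0), "future", SEVEN_WAY))
```

With these tables, a seven-way head can never decrease its fourth coordinate, which is the restriction the abstract describes.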
Keywords: computational complexity, cooperating system, finite automaton, four-dimension, hierarchy, multihead.
Downloads 1888
12579 k-Neighborhood Template A-Type Three-Dimensional Bounded Cellular Acceptor
Authors: Makoto Nagatomo, Yasuo Uchida, Makoto Sakamoto, Tuo Zhang, Tatsuma Kurogi, Takao Ito, Tsunehiro Yoshinaga, Satoshi Ikeda, Masahiro Yokomichi, Hiroshi Furutani
Abstract:
This paper presents a four-dimensional computational model, the k-neighborhood template A-type three-dimensional bounded cellular acceptor (abbreviated as A-3BCA(k)), and discusses its hierarchical properties. An A-3BCA(k) is a four-dimensional automaton which consists of a pair of a converter and a configuration-reader. The former converts the given four-dimensional tape to three- and two-dimensional configurations, and the latter determines the acceptance or nonacceptance of the given four-dimensional tape according to whether or not the derived two-dimensional configuration is accepted. We mainly investigate the difference in accepting power based on the difference in the configuration-reader. It is shown that the difference in the accepting power of the configuration-reader tends to directly affect that of the A-3BCA(k) when the converter is deterministic. On the other hand, the results are not analogous for the nondeterministic case.
Keywords: Cellular acceptor, configuration-reader, converter, finite automaton, four-dimension, on-line tessellation acceptor, parallel/sequential array acceptor, Turing machine.
Downloads 1510
12578 Predicting Bridge Pier Scour Depth with SVM
Authors: Arun Goel
Abstract:
Prediction of maximum local scour is necessary for the safe and economical design of bridges. A number of equations have been developed over the years to predict local scour depth using laboratory data, and a few pier equations have also been proposed using field data. Most of these equations are empirical in nature, as indicated by past publications. In this paper, attempts have been made to compute the local depth of scour around a bridge pier in dimensional and non-dimensional form using linear regression, simple regression and SVM (Poly & Rbf) techniques along with a few conventional empirical equations. The outcome of this study suggests that SVM (Poly & Rbf) based modeling can be employed as an alternative to linear regression, simple regression and the conventional empirical equations in predicting the scour depth of bridge piers. The results of the present study based on the non-dimensional form of bridge pier scour indicate an improvement in the performance of SVM (Poly & Rbf) in comparison to the dimensional form of scour.
Keywords: Modeling, pier scour, regression, prediction, SVM (Poly & Rbf kernels).
Downloads 1543
12577 An Investigation of a Three-Dimensional Constitutive Model of Gas Diffusion Layers in Polymer Electrolyte Membrane Fuel Cells
Authors: Yanqin Chen, Chao Jiang, Chongdu Cho
Abstract:
This research presents the three-dimensional mechanical characteristics of a commercial gas diffusion layer through experimental and simulation results. Although the mechanical performance of gas diffusion layers has attracted much attention, its reliability and accuracy are still a major challenge. Simulation analysis methods benefit the extensive commercial development of the gas diffusion layer and the overall stress analysis of proton electrolyte membrane fuel cells during the pre-production design period. Therefore, in this paper, a three-dimensional constitutive model of a commercial gas diffusion layer, including its material stiffness matrix parameters, is developed and coded in the user-defined material model of a commercial finite element method software for simulation. The model is then validated by comparing experimental results with simulation outcomes. The experimental data and simulation results show good agreement with each other, with high accuracy.
Keywords: Gas diffusion layer, proton electrolyte membrane fuel cell, stiffness matrix, three-dimensional mechanical characteristics, user-defined material model.
Downloads 948
12576 Extend Three-wave Method for the (3+1)-Dimensional Soliton Equation
Authors: Somayeh Arbabi Mohammad-Abadi, Maliheh Najafi
Abstract:
In this paper, we study the (3+1)-dimensional Soliton equation. We employ Hirota's bilinear method to obtain the bilinear form of the (3+1)-dimensional Soliton equation. Then, using the idea of the extended three-wave method, some exact soliton solutions, including breather-type solutions, are presented.
Keywords: Three-wave method, (3+1)-dimensional Soliton equation, Hirota's bilinear form.
Downloads 1559
12575 Effects of Double Delta Doping on Millimeter and Sub-millimeter Wave Response of Two-Dimensional Hot Electrons in GaAs Nanostructures
Authors: N. Basanta Singh, Sanjoy Deb, G. P Mishra, Subir Kumar Sarkar
Abstract:
Carrier mobility has become the most important characteristic of high-speed low-dimensional devices. Owing to the development of very fast switching semiconductor devices, the speed of computer and communication equipment has been increasing day by day and will continue to do so in the future. As the response of any device depends on the carrier motion within the device, extensive studies of carrier mobility have become essential for growth in the field of low-dimensional devices. Small-signal ac transport of degenerate two-dimensional hot electrons in GaAs quantum wells is studied here, incorporating deformation potential acoustic, polar optic and ionized impurity scattering in the framework of a heated drifted Fermi-Dirac carrier distribution. Delta doping is considered in the calculations to investigate the effects of double delta doping on the millimeter and submillimeter wave response of two-dimensional hot electrons in GaAs nanostructures. The inclusion of delta doping is found to considerably enhance the two-dimensional electron density, which in turn improves the carrier mobility (both ac and dc) in the GaAs quantum wells, thereby providing scope for higher-speed devices in the future.
Keywords: Carrier mobility, Delta doping, Hot carriers, Quantum wells.
Downloads 1672
12574 High-Fidelity 1D Dynamic Model of a Hydraulic Servo Valve Using 3D Computational Fluid Dynamics and Electromagnetic Finite Element Analysis
Authors: D. Henninger, A. Zopey, T. Ihde, C. Mehring
Abstract:
The dynamic performance of a 4-way solenoid-operated hydraulic spool valve has been analyzed by means of a one-dimensional modeling approach capturing flow, magnetic and fluid forces, valve inertia forces, fluid compressibility, and damping. Increased model accuracy was achieved by analyzing the detailed three-dimensional electromagnetic behavior of the solenoids and flow behavior through the spool valve body for a set of relevant operating conditions, thereby allowing the accurate mapping of flow and magnetic forces on the moving valve body, in lieu of representing the respective forces by lower-order models or by means of simplistic textbook correlations. The resulting high-fidelity one-dimensional model provided the basis for specific and timely design modification eliminating experimentally observed valve oscillations.
Keywords: Dynamic performance model, high-fidelity model, 1D-3D decoupled analysis, solenoid-operated hydraulic servo valve, CFD and electromagnetic FEA.
Downloads 1152