Search results for: pipelined algorithm.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3437

Search results for: pipelined algorithm.

1337 The Different Ways to Describe Regular Languages by Using Finite Automata and the Changing Algorithm Implementation

Authors: Abdulmajid Mukhtar Afat

Abstract:

This paper aims at introducing finite automata theory, the different ways to describe regular languages and create a program to implement the subset construction algorithms to convert nondeterministic finite automata (NFA) to deterministic finite automata (DFA). This program is written in c++ programming language. The program reads FA 5tuples from text file and then classifies it into either DFA or NFA. For DFA, the program will read the string w and decide whether it is acceptable or not. If accepted, the program will save the tracking path and point it out. On the other hand, when the automation is NFA, the program will change the Automation to DFA so that it is easy to track and it can decide whether the w exists in the regular language or not.

Keywords: Finite Automata, subset construction DFA, NFA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1986
1336 Reservoir Operating by Ant Colony Optimization for Continuous Domains (ACOR) Case Study: Dez Reservoir

Authors: A. B. Dariane, A. M. Moradi

Abstract:

A direct search approach to determine optimal reservoir operating is proposed with ant colony optimization for continuous domains (ACOR). The model is applied to a system of single reservoir to determine the optimum releases during 42 years of monthly steps. A disadvantage of ant colony based methods and the ACOR in particular, refers to great amount of computer run time consumption. In this study a highly effective procedure for decreasing run time has been developed. The results are compared to those of a GA based model.

Keywords: Ant colony optimization, continuous, metaheuristics, reservoir, decreasing run time, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2029
1335 Cooperative Multi Agent Soccer Robot Team

Authors: Vahid Rostami, Saeed Ebrahimijam, P.khajehpoor, P.Mirzaei, Mahdi Yousefiazar

Abstract:

This paper introduces our first efforts of developing a new team for RoboCup Middle Size Competition. In our robots we have applied omni directional based mobile system with omnidirectional vision system and fuzzy control algorithm to navigate robots. The control architecture of MRL middle-size robots is a three layered architecture, Planning, Sequencing, and Executing. It also uses Blackboard system to achieve coordination among agents. Moreover, the architecture should have minimum dependency on low level structure and have a uniform protocol to interact with real robot.

Keywords: Robocup, Soccer robots, Fuzzy controller, Multi agent.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1557
1334 A Framework for Semantics Preserving SPARQL-to-SQL Translation

Authors: N. Soussi, M. Bahaj

Abstract:

The enormous amount of information stored on the web increases from one day to the next, exposing the web currently faced with the inevitable difficulties of research pertinent information that users really want. The problem today is not limited to expanding the size of the information highways, but to design a system for intelligent search. The vast majority of this information is stored in relational databases, which in turn represent a backend for managing RDF data of the semantic web. This problem has motivated us to write this paper in order to establish an effective approach to support semantic transformation algorithm for SPARQL queries to SQL queries, more precisely SPARQL SELECT queries; by adopting this method, the relational database can be questioned easily with SPARQL queries maintaining the same performance.

Keywords: RDF, Semantic Web, SPARQL, SPARQL Query Transformation, SQL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1754
1333 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text

Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert

Abstract:

This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. It describes automatic text recognition in images. OCR is necessary because optical input devices can only transmit raster graphics as a result. Text recognition describes the task of recognizing letters shown as such, to identify and assign them an assigned numerical value in accordance with the usual text encoding (ASCII, Unicode). The peculiarity of this study conducted by the authors using the example of the ABBYY FineReader, was confirmed and shown in practice, the improvement of digital text recognition platforms developed by Electronic Publication.

Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 781
1332 Harmonic Reduction In Three-Phase Parallel Connected Inverter

Authors: M.A.A. Younis, N. A. Rahim, S. Mekhilef

Abstract:

This paper presents the design and analysis of a parallel connected inverter configuration of. The configuration consists of parallel connected three-phase dc/ac inverter. Series resistors added to the inverter output to maintain same current in each inverter of the two parallel inverters, and to reduce the circulating current in the parallel inverters to the minimum. High frequency third harmonic injection PWM (THIPWM) employed to reduce the total harmonic distortion and to make maximum use of the voltage source. DSP was used to generate the THIPWM and the control algorithm for the converter. Selected experimental results have been shown to validate the proposed system.

Keywords: Three-phase inverter, Third harmonic injection PWM, inverters parallel connection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3775
1331 A Nonlinear Parabolic Partial Differential Equation Model for Image Enhancement

Authors: Tudor Barbu

Abstract:

We present a robust nonlinear parabolic partial differential equation (PDE)-based denoising scheme in this article. Our approach is based on a second-order anisotropic diffusion model that is described first. Then, a consistent and explicit numerical approximation algorithm is constructed for this continuous model by using the finite-difference method. Finally, our restoration experiments and method comparison, which prove the effectiveness of this proposed technique, are discussed in this paper.

Keywords: Image denoising and restoration, nonlinear PDE model, anisotropic diffusion, numerical approximation scheme, finite differences.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1303
1330 Tool Path Generation and Manufacturing Process for Blades of a Compressor Rotor

Authors: C. Tung, P.-L. Tso

Abstract:

This paper presents a complete procedure for tool path planning and blade machining in 5-axis manufacturing. The actual cutting contact and cutter locations can be determined by lead and tilt angles. The tool path generation is implemented by piecewise curved approximation and chordal deviation detection. An application about drive surface method promotes flexibility of tool control and stability of machine motion. A real manufacturing process is proposed to separate the operation into three regions with five stages and to modify the local tool orientation with an interactive algorithm.

Keywords: 5-axis machining, tool orientation, lead and tilt angles, tool path generation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2266
1329 Learning FCM by Tabu Search

Authors: Somayeh Alizadeh, Mehdi Ghazanfari, Mostafa Jafari, Salman Hooshmand

Abstract:

Fuzzy Cognitive Maps (FCMs) is a causal graph, which shows the relations between essential components in complex systems. Experts who are familiar with the system components and their relations can generate a related FCM. There is a big gap when human experts cannot produce FCM or even there is no expert to produce the related FCM. Therefore, a new mechanism must be used to bridge this gap. In this paper, a novel learning method is proposed to construct causal graph based on historical data and by using metaheuristic such Tabu Search (TS). The efficiency of the proposed method is shown via comparison of its results of some numerical examples with those of some other methods.

Keywords: Fuzzy Cognitive Map (FCM), Learning, Meta heuristic, Genetic Algorithm, Tabu search.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1864
1328 Tabu Search Approach to Solve Routing Issues in Communication Networks

Authors: Anant Oonsivilai, Wichai Srisuruk, Boonruang Marungsri, Thanatchai Kulworawanichpong

Abstract:

Optimal routing in communication networks is a major issue to be solved. In this paper, the application of Tabu Search (TS) in the optimum routing problem where the aim is to minimize the computational time and improvement of quality of the solution in the communication have been addressed. The goal is to minimize the average delays in the communication. The effectiveness of Tabu Search method is shown by the results of simulation to solve the shortest path problem. Through this approach computational cost can be reduced.

Keywords: Communication networks, optimum routing network, tabu search algorithm, shortest path.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2096
1327 Moving Vehicles Detection Using Automatic Background Extraction

Authors: Saad M. Al-Garni, Adel A. Abdennour

Abstract:

Vehicle detection is the critical step for highway monitoring. In this paper we propose background subtraction and edge detection technique for vehicle detection. This technique uses the advantages of both approaches. The practical applications approved the effectiveness of this method. This method consists of two procedures: First, automatic background extraction procedure, in which the background is extracted automatically from the successive frames; Second vehicles detection procedure, which depend on edge detection and background subtraction. Experimental results show the effective application of this algorithm. Vehicles detection rate was higher than 91%.

Keywords: Image processing, Automatic background extraction, Moving vehicle detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2424
1326 Classification of Non Stationary Signals Using Ben Wavelet and Artificial Neural Networks

Authors: Mohammed Benbrahim, Khalid Benjelloun, Aomar Ibenbrahim, Adil Daoudi

Abstract:

The automatic classification of non stationary signals is an important practical goal in several domains. An essential classification task is to allocate the incoming signal to a group associated with the kind of physical phenomena producing it. In this paper, we present a modular system composed by three blocs: 1) Representation, 2) Dimensionality reduction and 3) Classification. The originality of our work consists in the use of a new wavelet called "Ben wavelet" in the representation stage. For the dimensionality reduction, we propose a new algorithm based on the random projection and the principal component analysis.

Keywords: Seismic signals, Ben Wavelet, Dimensionality reduction, Artificial neural networks, Classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1450
1325 Independent Encryption Technique for Mobile Voice Calls

Authors: Nael Hirzalla

Abstract:

The legality of some countries or agencies’ acts to spy on personal phone calls of the public became a hot topic to many social groups’ talks. It is believed that this act is considered an invasion to someone’s privacy. Such act may be justified if it is singling out specific cases but to spy without limits is very unacceptable. This paper discusses the needs for not only a simple and light weight technique to secure mobile voice calls but also a technique that is independent from any encryption standard or library. It then presents and tests one encrypting algorithm that is based of Frequency scrambling technique to show fair and delay-free process that can be used to protect phone calls from such spying acts.

Keywords: Frequency Scrambling, Mobile Applications, Real- Time Voice Encryption, Spying on Calls.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2557
1324 Accurate Optical Flow Based on Spatiotemporal Gradient Constancy Assumption

Authors: Adam Rabcewicz

Abstract:

Variational methods for optical flow estimation are known for their excellent performance. The method proposed by Brox et al. [5] exemplifies the strength of that framework. It combines several concepts into single energy functional that is then minimized according to clear numerical procedure. In this paper we propose a modification of that algorithm starting from the spatiotemporal gradient constancy assumption. The numerical scheme allows to establish the connection between our model and the CLG(H) method introduced in [18]. Experimental evaluation carried out on synthetic sequences shows the significant superiority of the spatial variant of the proposed method. The comparison between methods for the realworld sequence is also enclosed.

Keywords: optical flow, variational methods, gradient constancy assumption.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2180
1323 Kinetics of Palm Oil Cracking in Batch Reactor

Authors: Farouq Twaiq, Ishaq Al-Anbari, Mustafa Nasser

Abstract:

The kinetics of palm oil catalytic cracking over aluminum containing mesoporous silica Al-MCM-41 (5% Al) was investigated in a batch autoclave reactor at the temperatures range of 573 – 673 K. The catalyst was prepared by using sol-gel technique and has been characterized by nitrogen adsorption and x-ray diffraction methods. Surface area of 1276 m2/g with average pore diameter of 2.54 nm and pore volume of 0.811 cm3/g was obtained. The experimental catalytic cracking runs were conducted using 50 g of oil and 1 g of catalyst. The reaction pressure was recorded at different time intervals and the data were analyzed using Levenberg- Marquardt (LM) algorithm using polymath software. The results show that the reaction order was found to be -1.5 and activation energy of 3200 J/gmol.

Keywords: Batch Reactor, Catalytic Cracking, Kinetics, Palm Oil.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2998
1322 Similarity Based Retrieval in Case Based Reasoning for Analysis of Medical Images

Authors: M. Das Gupta, S. Banerjee

Abstract:

Content Based Image Retrieval (CBIR) coupled with Case Based Reasoning (CBR) is a paradigm that is becoming increasingly popular in the diagnosis and therapy planning of medical ailments utilizing the digital content of medical images. This paper presents a survey of some of the promising approaches used in the detection of abnormalities in retina images as well in mammographic screening and detection of regions of interest in MRI scans of the brain. We also describe our proposed algorithm to detect hard exudates in fundus images of the retina of Diabetic Retinopathy patients.

Keywords: Case based reasoning, Exudates, Retina image, Similarity based retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2124
1321 Designing and Implementing a Novel Scheduler for Multiprocessor System using Genetic Algorithm

Authors: Iman Zangeneh, Mostafa Moradi, Mazyar Baranpouyan

Abstract:

System is using multiple processors for computing and information processing, is increasing rapidly speed operation of these systems compared with single processor systems, very significant impact on system performance is increased .important differences to yield a single multi-processor cpu, the scheduling policies, to reduce the implementation time of all processes. Notwithstanding the famous algorithms such as SPT, LPT, LSPT and RLPT for scheduling and there, but none led to the answer are not optimal.In this paper scheduling using genetic algorithms and innovative way to finish the whole process faster that we do and the result compared with three algorithms we mentioned.

Keywords: Multiprocessor system, genetic algorithms, time implementation process.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1559
1320 A New Method of Adaptation in Integrated Learning Environment

Authors: Ildar Galeev, Renat Mustaphin, C. Ardil

Abstract:

A new method of adaptation in a partially integrated learning environment that includes electronic textbook (ET) and integrated tutoring system (ITS) is described. The algorithm of adaptation is described in detail. It includes: establishment of Interconnections of operations and concepts; estimate of the concept mastering level (for all concepts); estimate of student-s non-mastering level on the current learning step of information on each page of ET; creation of a rank-order list of links to the e-manual pages containing information that require repeated work.

Keywords: Adaptation, Integrated Learning Environment, Integrated Tutoring System, Electronic Textbook.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1468
1319 Electrodermal Activity Measurement Using Constant Current AC Source

Authors: Cristian Chacha, David Asiain, Jesús Ponce de León, José Ramón Beltrán

Abstract:

This work explores and characterizes the behavior of the AFE AD5941 in impedance measurement using an embedded algorithm that allows using a constant current AC source. The main aim of this research is to improve the exact measurement of impedance values for their application in EDA-focused wearable devices. Through comprehensive study and characterization, it has been observed that employing a measurement sequence with a constant current source produces results with increased dispersion but higher accuracy and a more linear behavior with respect to error. As a result, this approach leads to a more accurate system for impedance measurement.

Keywords: Electrodermal Activity, constant current AC source, wearable, precision, accuracy, impedance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 86
1318 Blind Channel Estimation Based on URV Decomposition Technique for Uplink of MC-CDMA

Authors: Pradya Pornnimitkul, Suwich Kunaruttanapruk, Bamrung Tau Sieskul, Somchai Jitapunkul

Abstract:

In this paper, we investigate a blind channel estimation method for Multi-carrier CDMA systems that use a subspace decomposition technique. This technique exploits the orthogonality property between the noise subspace and the received user codes to obtain channel of each user. In the past we used Singular Value Decomposition (SVD) technique but SVD have most computational complexity so in this paper use a new algorithm called URV Decomposition, which serve as an intermediary between the QR decomposition and SVD, replaced in SVD technique to track the noise space of the received data. Because of the URV decomposition has almost the same estimation performance as the SVD, but has less computational complexity.

Keywords: Channel estimation, MC-CDMA, SVD, URV.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1780
1317 Capacity Optimization in Cooperative Cognitive Radio Networks

Authors: Mahdi Pirmoradian, Olayinka Adigun, Christos Politis

Abstract:

Cooperative spectrum sensing is a crucial challenge in cognitive radio networks. Cooperative sensing can increase the reliability of spectrum hole detection, optimize sensing time and reduce delay in cooperative networks. In this paper, an efficient central capacity optimization algorithm is proposed to minimize cooperative sensing time in a homogenous sensor network using OR decision rule subject to the detection and false alarm probabilities constraints. The evaluation results reveal significant improvement in the sensing time and normalized capacity of the cognitive sensors.

Keywords: Cooperative networks, normalized capacity, sensing time.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1879
1316 Application of Computational Intelligence for Sensor Fault Detection and Isolation

Authors: A. Jabbari, R. Jedermann, W. Lang

Abstract:

The new idea of this research is application of a new fault detection and isolation (FDI) technique for supervision of sensor networks in transportation system. In measurement systems, it is necessary to detect all types of faults and failures, based on predefined algorithm. Last improvements in artificial neural network studies (ANN) led to using them for some FDI purposes. In this paper, application of new probabilistic neural network features for data approximation and data classification are considered for plausibility check in temperature measurement. For this purpose, two-phase FDI mechanism was considered for residual generation and evaluation.

Keywords: Fault detection and Isolation, Neural network, Temperature measurement, measurement approximation and classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2070
1315 Effect of a Linear-Exponential Penalty Functionon the GA-s Efficiency in Optimization of a Laminated Composite Panel

Authors: A. Abedian, M. H. Ghiasi, B. Dehghan-Manshadi

Abstract:

A stiffened laminated composite panel (1 m length × 0.5m width) was optimized for minimum weight and deflection under several constraints using genetic algorithm. Here, a significant study on the performance of a penalty function with two kinds of static and dynamic penalty factors was conducted. The results have shown that linear dynamic penalty factors are more effective than the static ones. Also, a specially combined linear-exponential function has shown to perform more effective than the previously mentioned penalty functions. This was then resulted in the less sensitivity of the GA to the amount of penalty factor.

Keywords: Genetic algorithms, penalty function, stiffenedcomposite panel, finite element method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1678
1314 Real-time ROI Acquisition for Unsupervised and Touch-less Palmprint

Authors: Yi Feng, Jingwen Li, Lei Huang, Changping Liu

Abstract:

In this paper we proposed a novel method to acquire the ROI (Region of interest) of unsupervised and touch-less palmprint captured from a web camera in real-time. We use Viola-Jones approach and skin model to get the target area in real time. Then an innovative course-to-fine approach to detect the key points on the hand is described. A new algorithm is used to find the candidate key points coarsely and quickly. In finely stage, we verify the hand key points with the shape context descriptor. To make the user much comfortable, it can process the hand image with different poses, even the hand is closed. Experiments show promising result by using the proposed method in various conditions.

Keywords: Palmprint recoginition, hand detection, touch-lesspalmprint, ROI localization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1726
1313 An Improved Switching Median filter for Uniformly Distributed Impulse Noise Removal

Authors: Rajoo Pandey

Abstract:

The performance of an image filtering system depends on its ability to detect the presence of noisy pixels in the image. Most of the impulse detection schemes assume the presence of salt and pepper noise in the images and do not work satisfactorily in case of uniformly distributed impulse noise. In this paper, a new algorithm is presented to improve the performance of switching median filter in detection of uniformly distributed impulse noise. The performance of the proposed scheme is demonstrated by the results obtained from computer simulations on various images.

Keywords: Switching median filter, Impulse noise, Imagefiltering, Impulse detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1957
1312 On Improving Breast Cancer Prediction Using GRNN-CP

Authors: Kefaya Qaddoum

Abstract:

The aim of this study is to predict breast cancer and to construct a supportive model that will stimulate a more reliable prediction as a factor that is fundamental for public health. In this study, we utilize general regression neural networks (GRNN) to replace the normal predictions with prediction periods to achieve a reasonable percentage of confidence. The mechanism employed here utilises a machine learning system called conformal prediction (CP), in order to assign consistent confidence measures to predictions, which are combined with GRNN. We apply the resulting algorithm to the problem of breast cancer diagnosis. The results show that the prediction constructed by this method is reasonable and could be useful in practice.

Keywords: Neural network, conformal prediction, cancer classification, regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 839
1311 Arabic Character Recognition using Artificial Neural Networks and Statistical Analysis

Authors: Ahmad M. Sarhan, Omar I. Al Helalat

Abstract:

In this paper, an Arabic letter recognition system based on Artificial Neural Networks (ANNs) and statistical analysis for feature extraction is presented. The ANN is trained using the Least Mean Squares (LMS) algorithm. In the proposed system, each typed Arabic letter is represented by a matrix of binary numbers that are used as input to a simple feature extraction system whose output, in addition to the input matrix, are fed to an ANN. Simulation results are provided and show that the proposed system always produces a lower Mean Squared Error (MSE) and higher success rates than the current ANN solutions.

Keywords: ANN, Backpropagation, Gaussian, LMS, MSE, Neuron, standard deviation, Widrow-Hoff rule.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2014
1310 Design and Development of Automatic Leveling and Equalizing Hoist Device for Spacecraft

Authors: Fu Hao, Sun Gang, Tang Laiying, Cui Junfeng

Abstract:

To solve the quick and accurate level-adjusting problem in the process of spacecraft precise mating, automatic leveling and equalizing hoist device for spacecraft is developed. Based on lifting point adjustment by utilizing XY-workbench, the leveling and equalizing controller by a self-adaptive control algorithm is proposed. By simulation analysis and lifting test using engineering prototype, validity and reliability of the hoist device is verified, which can meet the precision mating requirements of practical applications for spacecraft.

Keywords: automatic leveling and equalizing, hoist device, lifting point adjustment, self-adaptive control

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2020
1309 A Visual Control Flow Language and Its Termination Properties

Authors: László Lengyel, Tihamér Levendovszky, Hassan Charaf

Abstract:

This paper presents the visual control flow support of Visual Modeling and Transformation System (VMTS), which facilitates composing complex model transformations out of simple transformation steps and executing them. The VMTS Visual Control Flow Language (VCFL) uses stereotyped activity diagrams to specify control flow structures and OCL constraints to choose between different control flow branches. This work discusses the termination properties of VCFL and provides an algorithm to support the termination analysis of VCFL transformations.

Keywords: Control Flow, Metamodel-Based Visual Model Transformation, OCL, Termination Properties, UML.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2066
1308 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: G. Candel, D. Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps two main tasks: to display results by coloring items according to the item class or feature value; and for forensic, giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high dimensional space cannot be represented correctly in low dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high to low dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embedding. In a forensic approach, analysts would like to compare two or more datasets using their embedding. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic, and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers nor concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the support embedding’ match. The embedding with the support process can be repeated more than once, with the newly obtained embedding. The successive embedding can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n2) to O(n2/k), and the memory requirement from n2 to 2(n/k)2 which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing to observe the birth, evolution and death of clusters. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring high dimensional datasets’ dynamics.

Keywords: Concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 489