Search results for: VSS (Vector Space Similarity)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5143

Search results for: VSS (Vector Space Similarity)

5113 Exploring Students' Alternative Conception in Vector Components

Authors: Umporn Wutchana

Abstract:

An open ended problem and unstructured interview had been used to explore students’ conceptual and procedural understanding of vector components. The open ended problem had been designed based on research instrument used in previous physics education research. Without physical context, we asked students to find out magnitude and draw graphical form of vector components. The open ended problem was given to 211 first year students of faculty of science during the third (summer) semester in 2014 academic year. The students spent approximately 15 minutes of their second time of the General Physics I course to complete the open ended problem after they had failed. Consequently, their responses were classified based on the similarity of errors performed in the responses. Then, an unstructured interview was conducted. 7 students were randomly selected and asked to reason and explain their answers. The study results showed that 53% of 211 students provided correct numerical magnitude of vector components while 10.9% of them confused and punctuated the magnitude of vectors in x- with y-components. Others 20.4% provided just symbols and the last 15.6% gave no answer. When asking to draw graphical form of vector components, only 10% of 211 students made corrections. A majority of them produced errors and revealed alternative conceptions. 46.5% drew longer and/or shorter magnitude of vector components. 43.1% drew vectors in different forms or wrote down other symbols. Results from the unstructured interview indicated that some students just memorized the method to get numerical magnitude of x- and y-components. About graphical form of component vectors, some students though that the length of component vectors should be shorter than those of the given one. So then, it could be combined to be equal length of the given vectors while others though that component vectors should has the same length as the given vectors. It was likely to be that many students did not develop a strong foundation of understanding in vector components but just learn by memorizing its solution or the way to compute its magnitude and attribute little meaning to such concept.

Keywords: graphical vectors, vectors, vector components, misconceptions, alternative conceptions

Procedia PDF Downloads 165
5112 2D Fingerprint Performance for PubChem Chemical Database

Authors: Fatimah Zawani Abdullah, Shereena Mohd Arif, Nurul Malim

Abstract:

The study of molecular similarity search in chemical database is increasingly widespread, especially in the area of drug discovery. Similarity search is an application in the field of Chemoinformatics to measure the similarity between the molecular structure which is known as the query and the structure of chemical compounds in the database. Similarity search is also one of the approaches in virtual screening which involves computational techniques and scoring the probabilities of activity. The main objective of this work is to determine the best fingerprint when compared to the other five fingerprints selected in this study using PubChem chemical dataset. This paper will discuss the similarity searching process conducted using 6 types of descriptors, which are ECFP4, ECFC4, FCFP4, FCFC4, SRECFC4 and SRFCFC4 on 15 activity classes of PubChem dataset using Tanimoto coefficient to calculate the similarity between the query structures and each of the database structure. The results suggest that ECFP4 performs the best to be used with Tanimoto coefficient in the PubChem dataset.

Keywords: 2D fingerprints, Tanimoto, PubChem, similarity searching, chemoinformatics

Procedia PDF Downloads 261
5111 Component Based Testing Using Clustering and Support Vector Machine

Authors: Iqbaldeep Kaur, Amarjeet Kaur

Abstract:

Software Reusability is important part of software development. So component based software development in case of software testing has gained a lot of practical importance in the field of software engineering from academic researcher and also from software development industry perspective. Finding test cases for efficient reuse of test cases is one of the important problems aimed by researcher. Clustering reduce the search space, reuse test cases by grouping similar entities according to requirements ensuring reduced time complexity as it reduce the search time for retrieval the test cases. In this research paper we proposed approach for re-usability of test cases by unsupervised approach. In unsupervised learning we proposed k-mean and Support Vector Machine. We have designed the algorithm for requirement and test case document clustering according to its tf-idf vector space and the output is set of highly cohesive pattern groups.

Keywords: software testing, reusability, clustering, k-mean, SVM

Procedia PDF Downloads 398
5110 A Comparative Analysis of Multicarrier SPWM Strategies for Five-Level Flying Capacitor Inverter

Authors: Bachir Belmadani, Rachid Taleb, Zinelaabidine Boudjema, Adil Yahdou

Abstract:

Carrier-based methods have been used widely for switching of multilevel inverters due to their simplicity, flexibility and reduced computational requirements compared to space vector modulation (SVM). This paper focuses on Multicarrier Sinusoidal Pulse Width Modulation (MCSPWM) strategy for the three phase Five-Level Flying Capacitor Inverter (5LFCI). The inverter is simulated for Induction Motor (IM) load and Total Harmonic Distortion (THD) for output waveforms is observed for different controlling schemes.

Keywords: flying capacitor inverter, multicarrier sinusoidal pulse width modulation, space vector modulation, total harmonic distortion, induction motor

Procedia PDF Downloads 384
5109 Decision Trees Constructing Based on K-Means Clustering Algorithm

Authors: Loai Abdallah, Malik Yousef

Abstract:

A domain space for the data should reflect the actual similarity between objects. Since objects belonging to the same cluster usually share some common traits even though their geometric distance might be relatively large. In general, the Euclidean distance of data points that represented by large number of features is not capturing the actual relation between those points. In this study, we propose a new method to construct a different space that is based on clustering to form a new distance metric. The new distance space is based on ensemble clustering (EC). The EC distance space is defined by tracking the membership of the points over multiple runs of clustering algorithm metric. Over this distance, we train the decision trees classifier (DT-EC). The results obtained by applying DT-EC on 10 datasets confirm our hypotheses that embedding the EC space as a distance metric would improve the performance.

Keywords: ensemble clustering, decision trees, classification, K nearest neighbors

Procedia PDF Downloads 160
5108 Optimality Conditions for Weak Efficient Solutions Generated by a Set Q in Vector Spaces

Authors: Elham Kiyani, S. Mansour Vaezpour, Javad Tavakoli

Abstract:

In this paper, we first introduce a new distance function in a linear space not necessarily endowed with a topology. The algebraic concepts of interior and closure are useful to study optimization problems without topology. So, we define Q-weak efficient solutions generated by the algebraic interior of a set Q, where Q is not necessarily convex. Studying nonconvex vector optimization is valuable since, for a convex cone K in topological spaces, we have int(K)=cor(K), which means that topological interior of a convex cone K is equal to the algebraic interior of K. Moreover, we used the scalarization technique including the distance function generated by the vectorial closure of a set to characterize these Q-weak efficient solutions. Scalarization is a useful approach for solving vector optimization problems. This technique reduces the optimization problem to a scalar problem which tends to be an optimization problem with a real-valued objective function. For instance, Q-weak efficient solutions of vector optimization problems can be characterized and computed as solutions of appropriate scalar optimization problems. In the convex case, linear functionals can be used as objective functionals of the scalar problems. But in the nonconvex case, we should present a suitable objective function. It is the aim of this paper to present a new distance function that be useful to obtain sufficient and necessary conditions for Q-weak efficient solutions of general optimization problems via scalarization.

Keywords: weak efficient, algebraic interior, vector closure, linear space

Procedia PDF Downloads 195
5107 Space Vector Pulse Width Modulation Based Design and Simulation of a Three-Phase Voltage Source Converter Systems

Authors: Farhan Beg

Abstract:

A space vector based pulse width modulation control technique for the three-phase PWM converter is proposed in this paper. The proposed control scheme is based on a synchronous reference frame model. High performance and efficiency is obtained with regards to the DC bus voltage and the power factor considerations of the PWM rectifier thus leading to low losses. MATLAB/SIMULINK are used as a platform for the simulations and a SIMULINK model is presented in the paper. The results show that the proposed model demonstrates better performance and properties compared to the traditional SPWM method and the method improves the dynamic performance of the closed loop drastically. For the space vector based pulse width modulation, sine signal is the reference waveform and triangle waveform is the carrier waveform. When the value of sine signal is larger than triangle signal, the pulse will start producing to high; and then when the triangular signals higher than sine signal, the pulse will come to low. SPWM output will change by changing the value of the modulation index and frequency used in this system to produce more pulse width. When more pulse width is produced, the output voltage will have lower harmonics contents and the resolution will increase.

Keywords: power factor, SVPWM, PWM rectifier, SPWM

Procedia PDF Downloads 309
5106 Intracellular Strategies for Gene Delivery into Mammalian Cells Using Bacteria as a Vector

Authors: Kumaran Narayanan, Andrew N. Osahor

Abstract:

E. coli has been engineered by our group and by others as a vector to deliver DNA into cultured human and animal cells. However, so far conditions to improve gene delivery using this vector have not been investigated, resulting in a major gap in our understanding of the requirements for this vector to function optimally. Our group recently published novel data showing that simple addition of the DNA transfection reagent Lipofectamine increased the efficiency of the E. coli vector by almost 3-fold, providing the first strong evidence that further optimization of bactofection is possible. This presentation will discuss advances that demonstrate the effects of several intracellular strategies that improve the efficiency of this vector. Conditions that promote endosomal escape of internalized bacteria to evade lysosomal destruction after entry in the cell, a known obstacle limiting this vector, are elucidated. Further, treatments that increase bacterial lysis so that the vector can release its transgene into the mammalian environment for expression will be discussed. These experiments will provide valuable new insight to advance this E. coli system as an important class of vector technology for genetic correction of human disease models in cells and whole animals.

Keywords: DNA, E. coli, gene expression, vector

Procedia PDF Downloads 329
5105 Experimental Implementation of Model Predictive Control for Permanent Magnet Synchronous Motor

Authors: Abdelsalam A. Ahmed

Abstract:

Fast speed drives for Permanent Magnet Synchronous Motor (PMSM) is a crucial performance for the electric traction systems. In this paper, PMSM is drived with a Model-based Predictive Control (MPC) technique. Fast speed tracking is achieved through optimization of the DC source utilization using MPC. The technique is based on predicting the optimum voltage vector applied to the driver. Control technique is investigated by comparing to the cascaded PI control based on Space Vector Pulse Width Modulation (SVPWM). MPC and SVPWM-based FOC are implemented with the TMS320F2812 DSP and its power driver circuits. The designed MPC for a PMSM drive is experimentally validated on a laboratory test bench. The performances are compared with those obtained by a conventional PI-based system in order to highlight the improvements, especially regarding speed tracking response.

Keywords: permanent magnet synchronous motor, model-based predictive control, DC source utilization, cascaded PI control, space vector pulse width modulation, TMS320F2812 DSP

Procedia PDF Downloads 612
5104 Speed up Vector Median Filtering by Quasi Euclidean Norm

Authors: Vinai K. Singh

Abstract:

For reducing impulsive noise without degrading image contours, median filtering is a powerful tool. In multiband images as for example colour images or vector fields obtained by optic flow computation, a vector median filter can be used. Vector median filters are defined on the basis of a suitable distance, the best performing distance being the Euclidean. Euclidean distance is evaluated by using the Euclidean norms which is quite demanding from the point of view of computation given that a square root is required. In this paper an optimal piece-wise linear approximation of the Euclidean norm is presented which is applied to vector median filtering.

Keywords: euclidean norm, quasi euclidean norm, vector median filtering, applied mathematics

Procedia PDF Downloads 437
5103 ANN Based Simulation of PWM Scheme for Seven Phase Voltage Source Inverter Using MATLAB/Simulink

Authors: Mohammad Arif Khan

Abstract:

This paper analyzes and presents the development of Artificial Neural Network based controller of space vector modulation (ANN-SVPWM) for a seven-phase voltage source inverter. At first, the conventional method of producing sinusoidal output voltage by utilizing six active and one zero space vectors are used to synthesize the input reference, is elaborated and then new PWM scheme called Artificial Neural Network Based PWM is presented. The ANN based controller has the advantage of the very fast implementation and analyzing the algorithms and avoids the direct computation of trigonometric and non-linear functions. The ANN controller uses the individual training strategy with the fixed weight and supervised models. A computer simulation program has been developed using Matlab/Simulink together with the neural network toolbox for training the ANN-controller. A comparison of the proposed scheme with the conventional scheme is presented based on various performance indices. Extensive Simulation results are provided to validate the findings.

Keywords: space vector PWM, total harmonic distortion, seven-phase, voltage source inverter, multi-phase, artificial neural network

Procedia PDF Downloads 431
5102 Theoretical BER Analyzing of MPSK Signals Based on the Signal Space

Authors: Jing Qing-feng, Liu Danmei

Abstract:

Based on the optimum detection, signal projection and Maximum A Posteriori (MAP) rule, Proakis has deduced the theoretical BER equation of Gray coded MPSK signals. Proakis analyzed the BER theoretical equations mainly based on the projection of signals, which is difficult to be understood. This article solve the same problem based on the signal space, which explains the vectors relations among the sending signals, received signals and noises. The more explicit and easy-deduced process is illustrated in this article based on the signal space, which can illustrated the relations among the signals and noises clearly. This kind of deduction has a univocal geometry meaning. It can explain the correlation between the production and calculation of BER in vector level.

Keywords: MPSK, MAP, signal space, BER

Procedia PDF Downloads 318
5101 A Comparative Study of Approaches in User-Centred Health Information Retrieval

Authors: Harsh Thakkar, Ganesh Iyer

Abstract:

In this paper, we survey various user-centered or context-based biomedical health information retrieval systems. We present and discuss the performance of systems submitted in CLEF eHealth 2014 Task 3 for this purpose. We classify and focus on comparing the two most prevalent retrieval models in biomedical information retrieval namely: Language Model (LM) and Vector Space Model (VSM). We also report on the effectiveness of using external medical resources and ontologies like MeSH, Metamap, UMLS, etc. We observed that the LM based retrieval systems outperform VSM based systems on various fronts. From the results we conclude that the state-of-art system scores for MAP was 0.4146, P@10 was 0.7560 and NDCG@10 was 0.7445, respectively. All of these score were reported by systems built on language modeling approaches.

Keywords: clinical document retrieval, concept-based information retrieval, query expansion, language models, vector space models

Procedia PDF Downloads 290
5100 Similarity Based Membership of Elements to Uncertain Concept in Information System

Authors: M. Kamel El-Sayed

Abstract:

The process of determining the degree of membership for an element to an uncertain concept has been found in many ways, using equivalence and symmetry relations in information systems. In the case of similarity, these methods did not take into account the degree of symmetry between elements. In this paper, we use a new definition for finding the membership based on the degree of symmetry. We provide an example to clarify the suggested methods and compare it with previous methods. This method opens the door to more accurate decisions in information systems.

Keywords: information system, uncertain concept, membership function, similarity relation, degree of similarity

Procedia PDF Downloads 188
5099 Support Vector Machine Based Retinal Therapeutic for Glaucoma Using Machine Learning Algorithm

Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Yang Yung, Tracy Lin Huan

Abstract:

Glaucoma is a group of visual maladies represented by the scheduled optic nerve neuropathy; means to the increasing dwindling in vision ground, resulting in loss of sight. In this paper, a novel support vector machine based retinal therapeutic for glaucoma using machine learning algorithm is conservative. The algorithm has fitting pragmatism; subsequently sustained on correlation clustering mode, it visualizes perfect computations in the multi-dimensional space. Support vector clustering turns out to be comparable to the scale-space advance that investigates the cluster organization by means of a kernel density estimation of the likelihood distribution, where cluster midpoints are idiosyncratic by the neighborhood maxima of the concreteness. The predicted planning has 91% attainment rate on data set deterrent on a consolidation of 500 realistic images of resolute and glaucoma retina; therefore, the computational benefit of depending on the cluster overlapping system pedestal on machine learning algorithm has complete performance in glaucoma therapeutic.

Keywords: machine learning algorithm, correlation clustering mode, cluster overlapping system, glaucoma, kernel density estimation, retinal therapeutic

Procedia PDF Downloads 212
5098 Comparing Deep Architectures for Selecting Optimal Machine Translation

Authors: Despoina Mouratidis, Katia Lida Kermanidis

Abstract:

Machine translation (MT) is a very important task in Natural Language Processing (NLP). MT evaluation is crucial in MT development, as it constitutes the means to assess the success of an MT system, and also helps improve its performance. Several methods have been proposed for the evaluation of (MT) systems. Some of the most popular ones in automatic MT evaluation are score-based, such as the BLEU score, and others are based on lexical similarity or syntactic similarity between the MT outputs and the reference involving higher-level information like part of speech tagging (POS). This paper presents a language-independent machine learning framework for classifying pairwise translations. This framework uses vector representations of two machine-produced translations, one from a statistical machine translation model (SMT) and one from a neural machine translation model (NMT). The vector representations consist of automatically extracted word embeddings and string-like language-independent features. These vector representations used as an input to a multi-layer neural network (NN) that models the similarity between each MT output and the reference, as well as between the two MT outputs. To evaluate the proposed approach, a professional translation and a "ground-truth" annotation are used. The parallel corpora used are English-Greek (EN-GR) and English-Italian (EN-IT), in the educational domain and of informal genres (video lecture subtitles, course forum text, etc.) that are difficult to be reliably translated. They have tested three basic deep learning (DL) architectures to this schema: (i) fully-connected dense, (ii) Convolutional Neural Network (CNN), and (iii) Long Short-Term Memory (LSTM). Experiments show that all tested architectures achieved better results when compared against those of some of the well-known basic approaches, such as Random Forest (RF) and Support Vector Machine (SVM). Better accuracy results are obtained when LSTM layers are used in our schema. In terms of a balance between the results, better accuracy results are obtained when dense layers are used. The reason for this is that the model correctly classifies more sentences of the minority class (SMT). For a more integrated analysis of the accuracy results, a qualitative linguistic analysis is carried out. In this context, problems have been identified about some figures of speech, as the metaphors, or about certain linguistic phenomena, such as per etymology: paronyms. It is quite interesting to find out why all the classifiers led to worse accuracy results in Italian as compared to Greek, taking into account that the linguistic features employed are language independent.

Keywords: machine learning, machine translation evaluation, neural network architecture, pairwise classification

Procedia PDF Downloads 103
5097 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

Procedia PDF Downloads 444
5096 Design and Simulation of a Double-Stator Linear Induction Machine with Short Squirrel-Cage Mover

Authors: David Rafetseder, Walter Bauer, Florian Poltschak, Wolfgang Amrhein

Abstract:

A flat double-stator linear induction machine (DSLIM) with a short squirrel-cage mover is designed for high thrust force at moderate speed < 5m/s. The performance and motor parameters are determined on the basis of a 2D time-transient simulation with the finite element (FE) software Maxwell 2015. Design guidelines and transformation rules for space vector theory of the LIM are presented. Resulting thrust calculated by flux and current vectors is compared with the FE results showing good coherence and reduced noise. The parameters of the equivalent circuit model are obtained.

Keywords: equivalent circuit model, finite element model, linear induction motor, space vector theory

Procedia PDF Downloads 537
5095 Efficient Antenna Array Beamforming with Robustness against Random Steering Mismatch

Authors: Ju-Hong Lee, Ching-Wei Liao, Kun-Che Lee

Abstract:

This paper deals with the problem of using antenna sensors for adaptive beamforming in the presence of random steering mismatch. We present an efficient adaptive array beamformer with robustness to deal with the considered problem. The robustness of the proposed beamformer comes from the efficient designation of the steering vector. Using the received array data vector, we construct an appropriate correlation matrix associated with the received array data vector and a correlation matrix associated with signal sources. Then, the eigenvector associated with the largest eigenvalue of the constructed signal correlation matrix is designated as an appropriate estimate of the steering vector. Finally, the adaptive weight vector required for adaptive beamforming is obtained by using the estimated steering vector and the constructed correlation matrix of the array data vector. Simulation results confirm the effectiveness of the proposed method.

Keywords: adaptive beamforming, antenna array, linearly constrained minimum variance, robustness, steering vector

Procedia PDF Downloads 170
5094 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.

Procedia PDF Downloads 147
5093 Reconstructed Phase Space Features for Estimating Post Traumatic Stress Disorder

Authors: Andre Wittenborn, Jarek Krajewski

Abstract:

Trauma-related sadness in speech can alter the voice in several ways. The generation of non-linear aerodynamic phenomena within the vocal tract is crucial when analyzing trauma-influenced speech production. They include non-laminar flow and formation of jets rather than well-behaved laminar flow aspects. Especially state-space reconstruction methods based on chaotic dynamics and fractal theory have been suggested to describe these aerodynamic turbulence-related phenomena of the speech production system. To extract the non-linear properties of the speech signal, we used the time delay embedding method to reconstruct from a scalar time series (reconstructed phase space, RPS). This approach results in the extraction of 7238 Features per .wav file (N= 47, 32 m, 15 f). The speech material was prompted by telling about autobiographical related sadness-inducing experiences (sampling rate 16 kHz, 8-bit resolution). After combining these features in a support vector machine based machine learning approach (leave-one-sample out validation), we achieved a correlation of r = .41 with the well-established, self-report ground truth measure (RATS) of post-traumatic stress disorder (PTSD).

Keywords: non-linear dynamics features, post traumatic stress disorder, reconstructed phase space, support vector machine

Procedia PDF Downloads 79
5092 Comparative Analysis of SVPWM and the Standard PWM Technique for Three Level Diode Clamped Inverter fed Induction Motor

Authors: L. Lakhdari, B. Bouchiba, M. Bechar

Abstract:

The multi-level inverters present an important novelty in the field of energy control with high voltage and power. The major advantage of all multi-level inverters is the improvement and spectral quality of its generated output signals. In recent years, various pulse width modulation techniques have been developed. From these technics we have: Sinusoidal Pulse Width Modulation (SPWM) and Space Vector Pulse Width Modulation (SVPWM). This work presents a detailed analysis of the comparative advantage of space vector pulse width modulation (SVPWM) and the standard SPWM technique for Three Level Diode Clamped Inverter fed Induction Motor. The comparison is based on the evaluation of harmonic distortion THD.

Keywords: induction motor, multilevel inverters, SVPWM, SPWM, THD

Procedia PDF Downloads 309
5091 A Clustering Algorithm for Massive Texts

Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen

Abstract:

Internet users have to face the massive amount of textual data every day. Organizing texts into categories can help users dig the useful information from large-scale text collection. Clustering, in fact, is one of the most promising tools for categorizing texts due to its unsupervised characteristic. Unfortunately, most of traditional clustering algorithms lose their high qualities on large-scale text collection. This situation mainly attributes to the high- dimensional vectors generated from texts. To effectively and efficiently cluster large-scale text collection, this paper proposes a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in cluster’s representative vector. This algorithm alternately repeats two sub-processes until it converges. One process is partial tuning sub-process, where feature’s weight is fine-tuned by iterative process. To accelerate clustering velocity, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other process is overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features useless to represent the cluster are removed from cluster’s representative vector. Experimental results on the three text collections (including two small-scale and one large-scale text collections) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.

Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process

Procedia PDF Downloads 404
5090 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

The word discovery is a key problem in text information retrieval technology. Methods in new word discovery tend to be closely related to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by the meaning of semantic replacement between sentences. The experiment verifies that the framework not only completes the rapid discovery of network words but also realizes the standard word meaning of the discovery of network words, which reflects the effectiveness of our work.

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 61
5089 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 480
5088 Tool for Determining the Similarity between Two Web Applications

Authors: Doru Anastasiu Popescu, Raducanu Dragos Ionut

Abstract:

In this paper the presentation of a tool which measures the similarity between two websites is made. The websites are compound only from webpages created with HTML. The tool uses three ways of calculating the similarity between two websites based on certain results already published. The first way compares all the webpages within a website, the second way compares a webpage with all the pages within the second website and the third way compares two webpages. Java programming language and technologies such as spring, Jsoup, log4j were used for the implementation of the tool.

Keywords: Java, Jsoup, HTM, spring

Procedia PDF Downloads 351
5087 Algorithms for Fast Computation of Pan Matrix Profiles of Time Series Under Unnormalized Euclidean Distances

Authors: Jing Zhang, Daniel Nikovski

Abstract:

We propose an approximation algorithm called LINKUMP to compute the Pan Matrix Profile (PMP) under the unnormalized l∞ distance (useful for value-based similarity search) using double-ended queue and linear interpolation. The algorithm has comparable time/space complexities as the state-of-the-art algorithm for typical PMP computation under the normalized l₂ distance (useful for shape-based similarity search). We validate its efficiency and effectiveness through extensive numerical experiments and a real-world anomaly detection application.

Keywords: pan matrix profile, unnormalized euclidean distance, double-ended queue, discord discovery, anomaly detection

Procedia PDF Downloads 213
5086 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.

Keywords: visual search, deep learning, convolutional neural network, machine learning

Procedia PDF Downloads 188
5085 Parallel Vector Processing Using Multi Level Orbital DATA

Authors: Nagi Mekhiel

Abstract:

Many applications use vector operations by applying single instruction to multiple data that map to different locations in conventional memory. Transferring data from memory is limited by access latency and bandwidth affecting the performance gain of vector processing. We present a memory system that makes all of its content available to processors in time so that processors need not to access the memory, we force each location to be available to all processors at a specific time. The data move in different orbits to become available to other processors in higher orbits at different time. We use this memory to apply parallel vector operations to data streams at first orbit level. Data processed in the first level move to upper orbit one data element at a time, allowing a processor in that orbit to apply another vector operation to deal with serial code limitations inherited in all parallel applications and interleaved it with lower level vector operations.

Keywords: Memory Organization, Parallel Processors, Serial Code, Vector Processing

Procedia PDF Downloads 239
5084 Impact of Similarity Ratings on Human Judgement

Authors: Ian A. McCulloh, Madelaine Zinser, Jesse Patsolic, Michael Ramos

Abstract:

Recommender systems are a common artificial intelligence (AI) application. For any given input, a search system will return a rank-ordered list of similar items. As users review returned items, they must decide when to halt the search and either revise search terms or conclude their requirement is novel with no similar items in the database. We present a statistically designed experiment that investigates the impact of similarity ratings on human judgement to conclude a search item is novel and halt the search. 450 participants were recruited from Amazon Mechanical Turk to render judgement across 12 decision tasks. We find the inclusion of ratings increases the human perception that items are novel. Percent similarity increases novelty discernment when compared with star-rated similarity or the absence of a rating. Ratings reduce the time to decide and improve decision confidence. This suggests the inclusion of similarity ratings can aid human decision-makers in knowledge search tasks.

Keywords: ratings, rankings, crowdsourcing, empirical studies, user studies, similarity measures, human-centered computing, novelty in information retrieval

Procedia PDF Downloads 91