Search results for: VSS (Vector Space Similarity)
5180 A New Nonlinear State-Space Model and Its Application
Authors: Abdullah Eqal Al Mazrooei
Abstract:
In this work, a new nonlinear model will be introduced. The model is in the state-space form. The nonlinearity of this model is in the state equation where the state vector is multiplied by its self. This technique makes our model generalizes many famous models as Lotka-Volterra model and Lorenz model which have many applications in the real life. We will apply our new model to estimate the wind speed by using a new nonlinear estimator which suitable to work with our model.Keywords: nonlinear systems, state-space model, Kronecker product, nonlinear estimator
Procedia PDF Downloads 6755179 2D Fingerprint Performance for PubChem Chemical Database
Authors: Fatimah Zawani Abdullah, Shereena Mohd Arif, Nurul Malim
Abstract:
The study of molecular similarity search in chemical database is increasingly widespread, especially in the area of drug discovery. Similarity search is an application in the field of Chemoinformatics to measure the similarity between the molecular structure which is known as the query and the structure of chemical compounds in the database. Similarity search is also one of the approaches in virtual screening which involves computational techniques and scoring the probabilities of activity. The main objective of this work is to determine the best fingerprint when compared to the other five fingerprints selected in this study using PubChem chemical dataset. This paper will discuss the similarity searching process conducted using 6 types of descriptors, which are ECFP4, ECFC4, FCFP4, FCFC4, SRECFC4 and SRFCFC4 on 15 activity classes of PubChem dataset using Tanimoto coefficient to calculate the similarity between the query structures and each of the database structure. The results suggest that ECFP4 performs the best to be used with Tanimoto coefficient in the PubChem dataset.Keywords: 2D fingerprints, Tanimoto, PubChem, similarity searching, chemoinformatics
Procedia PDF Downloads 2795178 Exploring Students' Alternative Conception in Vector Components
Authors: Umporn Wutchana
Abstract:
An open ended problem and unstructured interview had been used to explore students’ conceptual and procedural understanding of vector components. The open ended problem had been designed based on research instrument used in previous physics education research. Without physical context, we asked students to find out magnitude and draw graphical form of vector components. The open ended problem was given to 211 first year students of faculty of science during the third (summer) semester in 2014 academic year. The students spent approximately 15 minutes of their second time of the General Physics I course to complete the open ended problem after they had failed. Consequently, their responses were classified based on the similarity of errors performed in the responses. Then, an unstructured interview was conducted. 7 students were randomly selected and asked to reason and explain their answers. The study results showed that 53% of 211 students provided correct numerical magnitude of vector components while 10.9% of them confused and punctuated the magnitude of vectors in x- with y-components. Others 20.4% provided just symbols and the last 15.6% gave no answer. When asking to draw graphical form of vector components, only 10% of 211 students made corrections. A majority of them produced errors and revealed alternative conceptions. 46.5% drew longer and/or shorter magnitude of vector components. 43.1% drew vectors in different forms or wrote down other symbols. Results from the unstructured interview indicated that some students just memorized the method to get numerical magnitude of x- and y-components. About graphical form of component vectors, some students though that the length of component vectors should be shorter than those of the given one. So then, it could be combined to be equal length of the given vectors while others though that component vectors should has the same length as the given vectors. It was likely to be that many students did not develop a strong foundation of understanding in vector components but just learn by memorizing its solution or the way to compute its magnitude and attribute little meaning to such concept.Keywords: graphical vectors, vectors, vector components, misconceptions, alternative conceptions
Procedia PDF Downloads 1755177 Component Based Testing Using Clustering and Support Vector Machine
Authors: Iqbaldeep Kaur, Amarjeet Kaur
Abstract:
Software Reusability is important part of software development. So component based software development in case of software testing has gained a lot of practical importance in the field of software engineering from academic researcher and also from software development industry perspective. Finding test cases for efficient reuse of test cases is one of the important problems aimed by researcher. Clustering reduce the search space, reuse test cases by grouping similar entities according to requirements ensuring reduced time complexity as it reduce the search time for retrieval the test cases. In this research paper we proposed approach for re-usability of test cases by unsupervised approach. In unsupervised learning we proposed k-mean and Support Vector Machine. We have designed the algorithm for requirement and test case document clustering according to its tf-idf vector space and the output is set of highly cohesive pattern groups.Keywords: software testing, reusability, clustering, k-mean, SVM
Procedia PDF Downloads 4185176 Decision Trees Constructing Based on K-Means Clustering Algorithm
Authors: Loai Abdallah, Malik Yousef
Abstract:
A domain space for the data should reflect the actual similarity between objects. Since objects belonging to the same cluster usually share some common traits even though their geometric distance might be relatively large. In general, the Euclidean distance of data points that represented by large number of features is not capturing the actual relation between those points. In this study, we propose a new method to construct a different space that is based on clustering to form a new distance metric. The new distance space is based on ensemble clustering (EC). The EC distance space is defined by tracking the membership of the points over multiple runs of clustering algorithm metric. Over this distance, we train the decision trees classifier (DT-EC). The results obtained by applying DT-EC on 10 datasets confirm our hypotheses that embedding the EC space as a distance metric would improve the performance.Keywords: ensemble clustering, decision trees, classification, K nearest neighbors
Procedia PDF Downloads 1765175 A Comparative Analysis of Multicarrier SPWM Strategies for Five-Level Flying Capacitor Inverter
Authors: Bachir Belmadani, Rachid Taleb, Zinelaabidine Boudjema, Adil Yahdou
Abstract:
Carrier-based methods have been used widely for switching of multilevel inverters due to their simplicity, flexibility and reduced computational requirements compared to space vector modulation (SVM). This paper focuses on Multicarrier Sinusoidal Pulse Width Modulation (MCSPWM) strategy for the three phase Five-Level Flying Capacitor Inverter (5LFCI). The inverter is simulated for Induction Motor (IM) load and Total Harmonic Distortion (THD) for output waveforms is observed for different controlling schemes.Keywords: flying capacitor inverter, multicarrier sinusoidal pulse width modulation, space vector modulation, total harmonic distortion, induction motor
Procedia PDF Downloads 3985174 Optimality Conditions for Weak Efficient Solutions Generated by a Set Q in Vector Spaces
Authors: Elham Kiyani, S. Mansour Vaezpour, Javad Tavakoli
Abstract:
In this paper, we first introduce a new distance function in a linear space not necessarily endowed with a topology. The algebraic concepts of interior and closure are useful to study optimization problems without topology. So, we define Q-weak efficient solutions generated by the algebraic interior of a set Q, where Q is not necessarily convex. Studying nonconvex vector optimization is valuable since, for a convex cone K in topological spaces, we have int(K)=cor(K), which means that topological interior of a convex cone K is equal to the algebraic interior of K. Moreover, we used the scalarization technique including the distance function generated by the vectorial closure of a set to characterize these Q-weak efficient solutions. Scalarization is a useful approach for solving vector optimization problems. This technique reduces the optimization problem to a scalar problem which tends to be an optimization problem with a real-valued objective function. For instance, Q-weak efficient solutions of vector optimization problems can be characterized and computed as solutions of appropriate scalar optimization problems. In the convex case, linear functionals can be used as objective functionals of the scalar problems. But in the nonconvex case, we should present a suitable objective function. It is the aim of this paper to present a new distance function that be useful to obtain sufficient and necessary conditions for Q-weak efficient solutions of general optimization problems via scalarization.Keywords: weak efficient, algebraic interior, vector closure, linear space
Procedia PDF Downloads 2175173 Space Vector Pulse Width Modulation Based Design and Simulation of a Three-Phase Voltage Source Converter Systems
Authors: Farhan Beg
Abstract:
A space vector based pulse width modulation control technique for the three-phase PWM converter is proposed in this paper. The proposed control scheme is based on a synchronous reference frame model. High performance and efficiency is obtained with regards to the DC bus voltage and the power factor considerations of the PWM rectifier thus leading to low losses. MATLAB/SIMULINK are used as a platform for the simulations and a SIMULINK model is presented in the paper. The results show that the proposed model demonstrates better performance and properties compared to the traditional SPWM method and the method improves the dynamic performance of the closed loop drastically. For the space vector based pulse width modulation, sine signal is the reference waveform and triangle waveform is the carrier waveform. When the value of sine signal is larger than triangle signal, the pulse will start producing to high; and then when the triangular signals higher than sine signal, the pulse will come to low. SPWM output will change by changing the value of the modulation index and frequency used in this system to produce more pulse width. When more pulse width is produced, the output voltage will have lower harmonics contents and the resolution will increase.Keywords: power factor, SVPWM, PWM rectifier, SPWM
Procedia PDF Downloads 3225172 Intracellular Strategies for Gene Delivery into Mammalian Cells Using Bacteria as a Vector
Authors: Kumaran Narayanan, Andrew N. Osahor
Abstract:
E. coli has been engineered by our group and by others as a vector to deliver DNA into cultured human and animal cells. However, so far conditions to improve gene delivery using this vector have not been investigated, resulting in a major gap in our understanding of the requirements for this vector to function optimally. Our group recently published novel data showing that simple addition of the DNA transfection reagent Lipofectamine increased the efficiency of the E. coli vector by almost 3-fold, providing the first strong evidence that further optimization of bactofection is possible. This presentation will discuss advances that demonstrate the effects of several intracellular strategies that improve the efficiency of this vector. Conditions that promote endosomal escape of internalized bacteria to evade lysosomal destruction after entry in the cell, a known obstacle limiting this vector, are elucidated. Further, treatments that increase bacterial lysis so that the vector can release its transgene into the mammalian environment for expression will be discussed. These experiments will provide valuable new insight to advance this E. coli system as an important class of vector technology for genetic correction of human disease models in cells and whole animals.Keywords: DNA, E. coli, gene expression, vector
Procedia PDF Downloads 3435171 Experimental Implementation of Model Predictive Control for Permanent Magnet Synchronous Motor
Authors: Abdelsalam A. Ahmed
Abstract:
Fast speed drives for Permanent Magnet Synchronous Motor (PMSM) is a crucial performance for the electric traction systems. In this paper, PMSM is drived with a Model-based Predictive Control (MPC) technique. Fast speed tracking is achieved through optimization of the DC source utilization using MPC. The technique is based on predicting the optimum voltage vector applied to the driver. Control technique is investigated by comparing to the cascaded PI control based on Space Vector Pulse Width Modulation (SVPWM). MPC and SVPWM-based FOC are implemented with the TMS320F2812 DSP and its power driver circuits. The designed MPC for a PMSM drive is experimentally validated on a laboratory test bench. The performances are compared with those obtained by a conventional PI-based system in order to highlight the improvements, especially regarding speed tracking response.Keywords: permanent magnet synchronous motor, model-based predictive control, DC source utilization, cascaded PI control, space vector pulse width modulation, TMS320F2812 DSP
Procedia PDF Downloads 6285170 Speed up Vector Median Filtering by Quasi Euclidean Norm
Authors: Vinai K. Singh
Abstract:
For reducing impulsive noise without degrading image contours, median filtering is a powerful tool. In multiband images as for example colour images or vector fields obtained by optic flow computation, a vector median filter can be used. Vector median filters are defined on the basis of a suitable distance, the best performing distance being the Euclidean. Euclidean distance is evaluated by using the Euclidean norms which is quite demanding from the point of view of computation given that a square root is required. In this paper an optimal piece-wise linear approximation of the Euclidean norm is presented which is applied to vector median filtering.Keywords: euclidean norm, quasi euclidean norm, vector median filtering, applied mathematics
Procedia PDF Downloads 4585169 ANN Based Simulation of PWM Scheme for Seven Phase Voltage Source Inverter Using MATLAB/Simulink
Authors: Mohammad Arif Khan
Abstract:
This paper analyzes and presents the development of Artificial Neural Network based controller of space vector modulation (ANN-SVPWM) for a seven-phase voltage source inverter. At first, the conventional method of producing sinusoidal output voltage by utilizing six active and one zero space vectors are used to synthesize the input reference, is elaborated and then new PWM scheme called Artificial Neural Network Based PWM is presented. The ANN based controller has the advantage of the very fast implementation and analyzing the algorithms and avoids the direct computation of trigonometric and non-linear functions. The ANN controller uses the individual training strategy with the fixed weight and supervised models. A computer simulation program has been developed using Matlab/Simulink together with the neural network toolbox for training the ANN-controller. A comparison of the proposed scheme with the conventional scheme is presented based on various performance indices. Extensive Simulation results are provided to validate the findings.Keywords: space vector PWM, total harmonic distortion, seven-phase, voltage source inverter, multi-phase, artificial neural network
Procedia PDF Downloads 4455168 Theoretical BER Analyzing of MPSK Signals Based on the Signal Space
Authors: Jing Qing-feng, Liu Danmei
Abstract:
Based on the optimum detection, signal projection and Maximum A Posteriori (MAP) rule, Proakis has deduced the theoretical BER equation of Gray coded MPSK signals. Proakis analyzed the BER theoretical equations mainly based on the projection of signals, which is difficult to be understood. This article solve the same problem based on the signal space, which explains the vectors relations among the sending signals, received signals and noises. The more explicit and easy-deduced process is illustrated in this article based on the signal space, which can illustrated the relations among the signals and noises clearly. This kind of deduction has a univocal geometry meaning. It can explain the correlation between the production and calculation of BER in vector level.Keywords: MPSK, MAP, signal space, BER
Procedia PDF Downloads 3375167 A Comparative Study of Approaches in User-Centred Health Information Retrieval
Authors: Harsh Thakkar, Ganesh Iyer
Abstract:
In this paper, we survey various user-centered or context-based biomedical health information retrieval systems. We present and discuss the performance of systems submitted in CLEF eHealth 2014 Task 3 for this purpose. We classify and focus on comparing the two most prevalent retrieval models in biomedical information retrieval namely: Language Model (LM) and Vector Space Model (VSM). We also report on the effectiveness of using external medical resources and ontologies like MeSH, Metamap, UMLS, etc. We observed that the LM based retrieval systems outperform VSM based systems on various fronts. From the results we conclude that the state-of-art system scores for MAP was 0.4146, P@10 was 0.7560 and NDCG@10 was 0.7445, respectively. All of these score were reported by systems built on language modeling approaches.Keywords: clinical document retrieval, concept-based information retrieval, query expansion, language models, vector space models
Procedia PDF Downloads 3105166 Similarity Based Membership of Elements to Uncertain Concept in Information System
Authors: M. Kamel El-Sayed
Abstract:
The process of determining the degree of membership for an element to an uncertain concept has been found in many ways, using equivalence and symmetry relations in information systems. In the case of similarity, these methods did not take into account the degree of symmetry between elements. In this paper, we use a new definition for finding the membership based on the degree of symmetry. We provide an example to clarify the suggested methods and compare it with previous methods. This method opens the door to more accurate decisions in information systems.Keywords: information system, uncertain concept, membership function, similarity relation, degree of similarity
Procedia PDF Downloads 2085165 Comparing Deep Architectures for Selecting Optimal Machine Translation
Authors: Despoina Mouratidis, Katia Lida Kermanidis
Abstract:
Machine translation (MT) is a very important task in Natural Language Processing (NLP). MT evaluation is crucial in MT development, as it constitutes the means to assess the success of an MT system, and also helps improve its performance. Several methods have been proposed for the evaluation of (MT) systems. Some of the most popular ones in automatic MT evaluation are score-based, such as the BLEU score, and others are based on lexical similarity or syntactic similarity between the MT outputs and the reference involving higher-level information like part of speech tagging (POS). This paper presents a language-independent machine learning framework for classifying pairwise translations. This framework uses vector representations of two machine-produced translations, one from a statistical machine translation model (SMT) and one from a neural machine translation model (NMT). The vector representations consist of automatically extracted word embeddings and string-like language-independent features. These vector representations used as an input to a multi-layer neural network (NN) that models the similarity between each MT output and the reference, as well as between the two MT outputs. To evaluate the proposed approach, a professional translation and a "ground-truth" annotation are used. The parallel corpora used are English-Greek (EN-GR) and English-Italian (EN-IT), in the educational domain and of informal genres (video lecture subtitles, course forum text, etc.) that are difficult to be reliably translated. They have tested three basic deep learning (DL) architectures to this schema: (i) fully-connected dense, (ii) Convolutional Neural Network (CNN), and (iii) Long Short-Term Memory (LSTM). Experiments show that all tested architectures achieved better results when compared against those of some of the well-known basic approaches, such as Random Forest (RF) and Support Vector Machine (SVM). Better accuracy results are obtained when LSTM layers are used in our schema. In terms of a balance between the results, better accuracy results are obtained when dense layers are used. The reason for this is that the model correctly classifies more sentences of the minority class (SMT). For a more integrated analysis of the accuracy results, a qualitative linguistic analysis is carried out. In this context, problems have been identified about some figures of speech, as the metaphors, or about certain linguistic phenomena, such as per etymology: paronyms. It is quite interesting to find out why all the classifiers led to worse accuracy results in Italian as compared to Greek, taking into account that the linguistic features employed are language independent.Keywords: machine learning, machine translation evaluation, neural network architecture, pairwise classification
Procedia PDF Downloads 1165164 Support Vector Machine Based Retinal Therapeutic for Glaucoma Using Machine Learning Algorithm
Authors: P. S. Jagadeesh Kumar, Mingmin Pan, Yang Yung, Tracy Lin Huan
Abstract:
Glaucoma is a group of visual maladies represented by the scheduled optic nerve neuropathy; means to the increasing dwindling in vision ground, resulting in loss of sight. In this paper, a novel support vector machine based retinal therapeutic for glaucoma using machine learning algorithm is conservative. The algorithm has fitting pragmatism; subsequently sustained on correlation clustering mode, it visualizes perfect computations in the multi-dimensional space. Support vector clustering turns out to be comparable to the scale-space advance that investigates the cluster organization by means of a kernel density estimation of the likelihood distribution, where cluster midpoints are idiosyncratic by the neighborhood maxima of the concreteness. The predicted planning has 91% attainment rate on data set deterrent on a consolidation of 500 realistic images of resolute and glaucoma retina; therefore, the computational benefit of depending on the cluster overlapping system pedestal on machine learning algorithm has complete performance in glaucoma therapeutic.Keywords: machine learning algorithm, correlation clustering mode, cluster overlapping system, glaucoma, kernel density estimation, retinal therapeutic
Procedia PDF Downloads 2315163 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures
Authors: Salima Kouici, Abdelkader Khelladi
Abstract:
In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering
Procedia PDF Downloads 4685162 Design and Simulation of a Double-Stator Linear Induction Machine with Short Squirrel-Cage Mover
Authors: David Rafetseder, Walter Bauer, Florian Poltschak, Wolfgang Amrhein
Abstract:
A flat double-stator linear induction machine (DSLIM) with a short squirrel-cage mover is designed for high thrust force at moderate speed < 5m/s. The performance and motor parameters are determined on the basis of a 2D time-transient simulation with the finite element (FE) software Maxwell 2015. Design guidelines and transformation rules for space vector theory of the LIM are presented. Resulting thrust calculated by flux and current vectors is compared with the FE results showing good coherence and reduced noise. The parameters of the equivalent circuit model are obtained.Keywords: equivalent circuit model, finite element model, linear induction motor, space vector theory
Procedia PDF Downloads 5545161 Empirical Study of Partitions Similarity Measures
Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd
Abstract:
This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.
Procedia PDF Downloads 1825160 Efficient Antenna Array Beamforming with Robustness against Random Steering Mismatch
Authors: Ju-Hong Lee, Ching-Wei Liao, Kun-Che Lee
Abstract:
This paper deals with the problem of using antenna sensors for adaptive beamforming in the presence of random steering mismatch. We present an efficient adaptive array beamformer with robustness to deal with the considered problem. The robustness of the proposed beamformer comes from the efficient designation of the steering vector. Using the received array data vector, we construct an appropriate correlation matrix associated with the received array data vector and a correlation matrix associated with signal sources. Then, the eigenvector associated with the largest eigenvalue of the constructed signal correlation matrix is designated as an appropriate estimate of the steering vector. Finally, the adaptive weight vector required for adaptive beamforming is obtained by using the estimated steering vector and the constructed correlation matrix of the array data vector. Simulation results confirm the effectiveness of the proposed method.Keywords: adaptive beamforming, antenna array, linearly constrained minimum variance, robustness, steering vector
Procedia PDF Downloads 1895159 Reconstructed Phase Space Features for Estimating Post Traumatic Stress Disorder
Authors: Andre Wittenborn, Jarek Krajewski
Abstract:
Trauma-related sadness in speech can alter the voice in several ways. The generation of non-linear aerodynamic phenomena within the vocal tract is crucial when analyzing trauma-influenced speech production. They include non-laminar flow and formation of jets rather than well-behaved laminar flow aspects. Especially state-space reconstruction methods based on chaotic dynamics and fractal theory have been suggested to describe these aerodynamic turbulence-related phenomena of the speech production system. To extract the non-linear properties of the speech signal, we used the time delay embedding method to reconstruct from a scalar time series (reconstructed phase space, RPS). This approach results in the extraction of 7238 Features per .wav file (N= 47, 32 m, 15 f). The speech material was prompted by telling about autobiographical related sadness-inducing experiences (sampling rate 16 kHz, 8-bit resolution). After combining these features in a support vector machine based machine learning approach (leave-one-sample out validation), we achieved a correlation of r = .41 with the well-established, self-report ground truth measure (RATS) of post-traumatic stress disorder (PTSD).Keywords: non-linear dynamics features, post traumatic stress disorder, reconstructed phase space, support vector machine
Procedia PDF Downloads 945158 A Clustering Algorithm for Massive Texts
Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen
Abstract:
Internet users have to face the massive amount of textual data every day. Organizing texts into categories can help users dig the useful information from large-scale text collection. Clustering, in fact, is one of the most promising tools for categorizing texts due to its unsupervised characteristic. Unfortunately, most of traditional clustering algorithms lose their high qualities on large-scale text collection. This situation mainly attributes to the high- dimensional vectors generated from texts. To effectively and efficiently cluster large-scale text collection, this paper proposes a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in cluster’s representative vector. This algorithm alternately repeats two sub-processes until it converges. One process is partial tuning sub-process, where feature’s weight is fine-tuned by iterative process. To accelerate clustering velocity, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other process is overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features useless to represent the cluster are removed from cluster’s representative vector. Experimental results on the three text collections (including two small-scale and one large-scale text collections) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process
Procedia PDF Downloads 4165157 Comparative Analysis of SVPWM and the Standard PWM Technique for Three Level Diode Clamped Inverter fed Induction Motor
Authors: L. Lakhdari, B. Bouchiba, M. Bechar
Abstract:
The multi-level inverters present an important novelty in the field of energy control with high voltage and power. The major advantage of all multi-level inverters is the improvement and spectral quality of its generated output signals. In recent years, various pulse width modulation techniques have been developed. From these technics we have: Sinusoidal Pulse Width Modulation (SPWM) and Space Vector Pulse Width Modulation (SVPWM). This work presents a detailed analysis of the comparative advantage of space vector pulse width modulation (SVPWM) and the standard SPWM technique for Three Level Diode Clamped Inverter fed Induction Motor. The comparison is based on the evaluation of harmonic distortion THD.Keywords: induction motor, multilevel inverters, SVPWM, SPWM, THD
Procedia PDF Downloads 3315156 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications
Authors: K. P. Sandesh, M. H. Suman
Abstract:
Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms
Procedia PDF Downloads 5085155 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity
Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang
Abstract:
The word discovery is a key problem in text information retrieval technology. Methods in new word discovery tend to be closely related to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by the meaning of semantic replacement between sentences. The experiment verifies that the framework not only completes the rapid discovery of network words but also realizes the standard word meaning of the discovery of network words, which reflects the effectiveness of our work.Keywords: text information retrieval, natural language processing, new word discovery, information extraction
Procedia PDF Downloads 815154 Tool for Determining the Similarity between Two Web Applications
Authors: Doru Anastasiu Popescu, Raducanu Dragos Ionut
Abstract:
In this paper the presentation of a tool which measures the similarity between two websites is made. The websites are compound only from webpages created with HTML. The tool uses three ways of calculating the similarity between two websites based on certain results already published. The first way compares all the webpages within a website, the second way compares a webpage with all the pages within the second website and the third way compares two webpages. Java programming language and technologies such as spring, Jsoup, log4j were used for the implementation of the tool.Keywords: Java, Jsoup, HTM, spring
Procedia PDF Downloads 3765153 Algorithms for Fast Computation of Pan Matrix Profiles of Time Series Under Unnormalized Euclidean Distances
Authors: Jing Zhang, Daniel Nikovski
Abstract:
We propose an approximation algorithm called LINKUMP to compute the Pan Matrix Profile (PMP) under the unnormalized l∞ distance (useful for value-based similarity search) using double-ended queue and linear interpolation. The algorithm has comparable time/space complexities as the state-of-the-art algorithm for typical PMP computation under the normalized l₂ distance (useful for shape-based similarity search). We validate its efficiency and effectiveness through extensive numerical experiments and a real-world anomaly detection application.Keywords: pan matrix profile, unnormalized euclidean distance, double-ended queue, discord discovery, anomaly detection
Procedia PDF Downloads 2365152 Improving Similarity Search Using Clustered Data
Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong
Abstract:
This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.Keywords: visual search, deep learning, convolutional neural network, machine learning
Procedia PDF Downloads 2035151 Impact of Similarity Ratings on Human Judgement
Authors: Ian A. McCulloh, Madelaine Zinser, Jesse Patsolic, Michael Ramos
Abstract:
Recommender systems are a common artificial intelligence (AI) application. For any given input, a search system will return a rank-ordered list of similar items. As users review returned items, they must decide when to halt the search and either revise search terms or conclude their requirement is novel with no similar items in the database. We present a statistically designed experiment that investigates the impact of similarity ratings on human judgement to conclude a search item is novel and halt the search. 450 participants were recruited from Amazon Mechanical Turk to render judgement across 12 decision tasks. We find the inclusion of ratings increases the human perception that items are novel. Percent similarity increases novelty discernment when compared with star-rated similarity or the absence of a rating. Ratings reduce the time to decide and improve decision confidence. This suggests the inclusion of similarity ratings can aid human decision-makers in knowledge search tasks.Keywords: ratings, rankings, crowdsourcing, empirical studies, user studies, similarity measures, human-centered computing, novelty in information retrieval
Procedia PDF Downloads 109