Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5

similarity measure Related Publications

5 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3154
4 A Review on Important Aspects of Information Retrieval

Authors: Yogesh Gupta, Ashish Saini, A.K. Saxena

Abstract:

Information retrieval has become an important field of study and research under computer science due to explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive study, which discusses not only emergence and evolution of information retrieval but also includes different information retrieval models and some important aspects such as document representation, similarity measure and query expansion.

Keywords: Information Retrieval, query expansion, similarity measure, vector space model

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2616
3 Prediction of Research Topics Using Ensemble of Best Predictors from Similar Dataset

Authors: Indra Budi, Rizal Fathoni Aji, Agus Widodo

Abstract:

Prediction of future research topics by using time series analysis either statistical or machine learning has been conducted previously by several researchers. Several methods have been proposed to combine the forecasting results into single forecast. These methods use fixed combination of individual forecast to get the final forecast result. In this paper, quite different approach is employed to select the forecasting methods, in which every point to forecast is calculated by using the best methods used by similar validation dataset. The dataset used in the experiment is time series derived from research report in Garuda, which is an online sites belongs to the Ministry of Education in Indonesia, over the past 20 years. The experimental result demonstrates that the proposed method may perform better compared to the fix combination of predictors. In addition, based on the prediction result, we can forecast emerging research topics for the next few years.

Keywords: Forecasting, Machine Learning, Time series, Research Topics, prediction, Ensemble, similarity measure, combination, emerging topics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1588
2 Similarity Measure Functions for Strategy-Based Biometrics

Authors: Roman V. Yampolskiy, Venu Govindaraju

Abstract:

Functioning of a biometric system in large part depends on the performance of the similarity measure function. Frequently a generalized similarity distance measure function such as Euclidian distance or Mahalanobis distance is applied to the task of matching biometric feature vectors. However, often accuracy of a biometric system can be greatly improved by designing a customized matching algorithm optimized for a particular biometric application. In this paper we propose a tailored similarity measure function for behavioral biometric systems based on the expert knowledge of the feature level data in the domain. We compare performance of a proposed matching algorithm to that of other well known similarity distance functions and demonstrate its superiority with respect to the chosen domain.

Keywords: Matching, Behavioral Biometrics, similarity measure, Euclidian Distance

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1339
1 Clustering Protein Sequences with Tailored General Regression Model Technique

Authors: G. Lavanya Devi, Allam Appa Rao, A. Damodaram, GR Sridhar, G. Jaya Suma

Abstract:

Cluster analysis divides data into groups that are meaningful, useful, or both. Analysis of biological data is creating a new generation of epidemiologic, prognostic, diagnostic and treatment modalities. Clustering of protein sequences is one of the current research topics in the field of computer science. Linear relation is valuable in rule discovery for a given data, such as if value X goes up 1, value Y will go down 3", etc. The classical linear regression models the linear relation of two sequences perfectly. However, if we need to cluster a large repository of protein sequences into groups where sequences have strong linear relationship with each other, it is prohibitively expensive to compare sequences one by one. In this paper, we propose a new technique named General Regression Model Technique Clustering Algorithm (GRMTCA) to benignly handle the problem of linear sequences clustering. GRMT gives a measure, GR*, to tell the degree of linearity of multiple sequences without having to compare each pair of them.

Keywords: Clustering, similarity measure, General Regression Model, Protein Sequences

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1275