Search results for: similarity function.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2481

Search results for: similarity function.

2451 Image Similarity: A Genetic Algorithm Based Approach

Authors: R. C. Joshi, Shashikala Tapaswi

Abstract:

The paper proposes an approach using genetic algorithm for computing the region based image similarity. The image is denoted using a set of segmented regions reflecting color and texture properties of an image. An image is associated with a family of image features corresponding to the regions. The resemblance of two images is then defined as the overall similarity between two families of image features, and quantified by a similarity measure, which integrates properties of all the regions in the images. A genetic algorithm is applied to decide the most plausible matching. The performance of the proposed method is illustrated using examples from an image database of general-purpose images, and is shown to produce good results.

Keywords: Image Features, color descriptor, segmented classes, texture descriptors, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2330
2450 A Combination of Similarity Ranking and Time for Social Research Paper Searching

Authors: P. Jomsri

Abstract:

Nowadays social media are important tools for web resource discovery. The performance and capabilities of web searches are vital, especially search results from social research paper bookmarking. This paper proposes a new algorithm for ranking method that is a combination of similarity ranking with paper posted time or CSTRank. The paper posted time is static ranking for improving search results. For this particular study, the paper posted time is combined with similarity ranking to produce a better ranking than other methods such as similarity ranking or SimRank. The retrieval performance of combination rankings is evaluated using mean values of NDCG. The evaluation in the experiments implies that the chosen CSTRank ranking by using weight score at ratio 90:10 can improve the efficiency of research paper searching on social bookmarking websites.

Keywords: combination ranking, information retrieval, time, similarity ranking, static ranking, weight score

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1668
2449 Impact of Similarity Ratings on Human Judgement

Authors: Ian A. McCulloh, Madelaine Zinser, Jesse Patsolic, Michael Ramos

Abstract:

Recommender systems are a common artificial intelligence (AI) application. For any given input, a search system will return a rank-ordered list of similar items. As users review returned items, they must decide when to halt the search and either revise search terms or conclude their requirement is novel with no similar items in the database. We present a statistically designed experiment that investigates the impact of similarity ratings on human judgement to conclude a search item is novel and halt the search. In the study, 450 participants were recruited from Amazon Mechanical Turk to render judgement across 12 decision tasks. We find the inclusion of ratings increases the human perception that items are novel. Percent similarity increases novelty discernment when compared with star-rated similarity or the absence of a rating. Ratings reduce the time to decide and improve decision confidence. This suggests that the inclusion of similarity ratings can aid human decision-makers in knowledge search tasks.

Keywords: Ratings, rankings, crowdsourcing, empirical studies, user studies, similarity measures, human-centered computing, novelty in information retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 482
2448 Graph-Based Text Similarity Measurement by Exploiting Wikipedia as Background Knowledge

Authors: Lu Zhang, Chunping Li, Jun Liu, Hui Wang

Abstract:

Text similarity measurement is a fundamental issue in many textual applications such as document clustering, classification, summarization and question answering. However, prevailing approaches based on Vector Space Model (VSM) more or less suffer from the limitation of Bag of Words (BOW), which ignores the semantic relationship among words. Enriching document representation with background knowledge from Wikipedia is proven to be an effective way to solve this problem, but most existing methods still cannot avoid similar flaws of BOW in a new vector space. In this paper, we propose a novel text similarity measurement which goes beyond VSM and can find semantic affinity between documents. Specifically, it is a unified graph model that exploits Wikipedia as background knowledge and synthesizes both document representation and similarity computation. The experimental results on two different datasets show that our approach significantly improves VSM-based methods in both text clustering and classification.

Keywords: Text classification, Text clustering, Text similarity, Wikipedia

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2120
2447 Discovering the Dimension of Abstractness: Structure-Based Model that Learns New Categories and Categorizes on Different Levels of Abstraction

Authors: Georgi I. Petkov, Ivan I. Vankov, Yolina A. Petrova

Abstract:

A structure-based model of category learning and categorization at different levels of abstraction is presented. The model compares different structures and expresses their similarity implicitly in the forms of mappings. Based on this similarity, the model can categorize different targets either as members of categories that it already has or creates new categories. The model is novel using two threshold parameters to evaluate the structural correspondence. If the similarity between two structures exceeds the higher threshold, a new sub-ordinate category is created. Vice versa, if the similarity does not exceed the higher threshold but does the lower one, the model creates a new category on higher level of abstraction.

Keywords: Analogy-making, categorization, learning of categories, abstraction, hierarchical structure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 783
2446 Comparative Analysis of Diversity and Similarity Indices with Special Relevance to Vegetations around Sewage Drains

Authors: Ekta Singh

Abstract:

Indices summarizing community structure are used to evaluate fundamental community ecology, species interaction, biogeographical factors, and environmental stress. Some of these indices are insensitive to gross community changes induced by contaminants of pollution. Diversity indices and similarity indices are reviewed considering their ecological application, both theoretical and practical. For some useful indices, empirical equations are given to calculate the expected maximum value of the indices to which the observed values can be related at any combination of sample sizes at the experimental sites. This paper examines the effects of sample size and diversity on the expected values of diversity indices and similarity indices, using various formulae. It has been shown that all indices are strongly affected by sample size and diversity. In some indices, this influence is greater than the others and an attempt has been made to deal with these influences.

Keywords: Biogeographical factors, Diversity Indices, Ecology and Similarity Indices

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3001
2445 Investigation of Self-Similarity Solution for Wake Flow of a Cylinder

Authors: A. B. Khoshnevis, F. Zeydabadi, F. Sokhanvar

Abstract:

The data measurement of mean velocity has been taken for the wake of single circular cylinder with three different diameters for two different velocities. The effects of change in diameter and in velocity are studied in self-similar coordinate system. The spatial variations of velocity defect and that of the half-width have been investigated. The results are compared with those published by H.Schlichting. In the normalized coordinates, it is also observed that all cases except for the first station are self-similar. By attention to self-similarity profiles of mean velocity, it is observed for all the cases at the each station curves tend to zero at a same point.

Keywords: Self-similarity, wake of single circular cylinder

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2402
2444 Map Matching Performance under Various Similarity Metrics for Heterogeneous Robot Teams

Authors: M. C. Akay, A. Aybakan, H. Temeltas

Abstract:

Aerial and ground robots have various advantages of usage in different missions. Aerial robots can move quickly and get a different sight of view of the area, but those vehicles cannot carry heavy payloads. On the other hand, unmanned ground vehicles (UGVs) are slow moving vehicles, since those can carry heavier payloads than unmanned aerial vehicles (UAVs). In this context, we investigate the performances of various Similarity Metrics to provide a common map for Heterogeneous Robot Team (HRT) in complex environments. Within the usage of Lidar Odometry and Octree Mapping technique, the local 3D maps of the environment are gathered.  In order to obtain a common map for HRT, informative theoretic similarity metrics are exploited. All types of these similarity metrics gave adequate as allowable simulation time and accurate results that can be used in different types of applications. For the heterogeneous multi robot team, those methods can be used to match different types of maps.

Keywords: Common maps, heterogeneous robot team, map matching, informative theoretic similarity metrics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 907
2443 A Simplified and Effective Algorithm Used to Mine Similar Processes: An Illustrated Example

Authors: Min-Hsun Kuo, Yun-Shiow Chen

Abstract:

The running logs of a process hold valuable information about its executed activity behavior and generated activity logic structure. Theses informative logs can be extracted, analyzed and utilized to improve the efficiencies of the process's execution and conduction. One of the techniques used to accomplish the process improvement is called as process mining. To mine similar processes is such an improvement mission in process mining. Rather than directly mining similar processes using a single comparing coefficient or a complicate fitness function, this paper presents a simplified heuristic process mining algorithm with two similarity comparisons that are able to relatively conform the activity logic sequences (traces) of mining processes with those of a normalized (regularized) one. The relative process conformance is to find which of the mining processes match the required activity sequences and relationships, further for necessary and sufficient applications of the mined processes to process improvements. One similarity presented is defined by the relationships in terms of the number of similar activity sequences existing in different processes; another similarity expresses the degree of the similar (identical) activity sequences among the conforming processes. Since these two similarities are with respect to certain typical behavior (activity sequences) occurred in an entire process, the common problems, such as the inappropriateness of an absolute comparison and the incapability of an intrinsic information elicitation, which are often appeared in other process conforming techniques, can be solved by the relative process comparison presented in this paper. To demonstrate the potentiality of the proposed algorithm, a numerical example is illustrated.

Keywords: process mining, process similarity, artificial intelligence, process conformance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1448
2442 Use of Gaussian-Euclidean Hybrid Function Based Artificial Immune System for Breast Cancer Diagnosis

Authors: Cuneyt Yucelbas, Seral Ozsen, Sule Yucelbas, Gulay Tezel

Abstract:

Due to the fact that there exist only a small number of complex systems in artificial immune system (AIS) that work out nonlinear problems, nonlinear AIS approaches, among the well-known solution techniques, need to be developed. Gaussian function is usually used as similarity estimation in classification problems and pattern recognition. In this study, diagnosis of breast cancer, the second type of the most widespread cancer in women, was performed with different distance calculation functions that euclidean, gaussian and gaussian-euclidean hybrid function in the clonal selection model of classical AIS on Wisconsin Breast Cancer Dataset (WBCD), which was taken from the University of California, Irvine Machine-Learning Repository. We used 3-fold cross validation method to train and test the dataset. According to the results, the maximum test classification accuracy was reported as 97.35% by using of gaussian-euclidean hybrid function for fold-3. Also, mean of test classification accuracies for all of functions were obtained as 94.78%, 94.45% and 95.31% with use of euclidean, gaussian and gaussian-euclidean, respectively. With these results, gaussian-euclidean hybrid function seems to be a potential distance calculation method, and it may be considered as an alternative distance calculation method for hard nonlinear classification problems.

Keywords: Artificial Immune System, Breast Cancer Diagnosis, Euclidean Function, Gaussian Function.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2124
2441 Similarity Detection in Collaborative Development of Object-Oriented Formal Specifications

Authors: Fathi Taibi, Fouad Mohammed Abbou, Md. Jahangir Alam

Abstract:

The complexity of today-s software systems makes collaborative development necessary to accomplish tasks. Frameworks are necessary to allow developers perform their tasks independently yet collaboratively. Similarity detection is one of the major issues to consider when developing such frameworks. It allows developers to mine existing repositories when developing their own views of a software artifact, and it is necessary for identifying the correspondences between the views to allow merging them and checking their consistency. Due to the importance of the requirements specification stage in software development, this paper proposes a framework for collaborative development of Object- Oriented formal specifications along with a similarity detection approach to support the creation, merging and consistency checking of specifications. The paper also explores the impact of using additional concepts on improving the matching results. Finally, the proposed approach is empirically evaluated.

Keywords: Collaborative Development, Formal methods, Object-Oriented, Similarity detection

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1472
2440 Flocking Behaviors for Multiple Groups with Heterogeneous Agents

Authors: Jae Moon Lee

Abstract:

Most of researches for conventional simulations were studied focusing on flocks with a single species. While there exist the flocking behaviors with a single species in nature, the flocking behaviors are frequently observed with multi-species. This paper studies on the flocking simulation for heterogeneous agents. In order to simulate the flocks for heterogeneous agents, the conventional method uses the identifier of flock, while the proposed method defines the feature vector of agent and uses the similarity between agents by comparing with those feature vectors. Based on the similarity, the paper proposed the attractive force and repulsive force and then executed the simulation by applying two forces. The results of simulation showed that flock formation with heterogeneous agents is very natural in both cases. In addition, it showed that unlike the existing method, the proposed method can not only control the density of the flocks, but also be possible for two different groups of agents to flock close to each other if they have a high similarity.

Keywords: Flocking behavior, heterogeneous agents, similarity, simulation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1603
2439 An Extension of the Kratzel Function and Associated Inverse Gaussian Probability Distribution Occurring in Reliability Theory

Authors: R. K. Saxena, Ravi Saxena

Abstract:

In view of their importance and usefulness in reliability theory and probability distributions, several generalizations of the inverse Gaussian distribution and the Krtzel function are investigated in recent years. This has motivated the authors to introduce and study a new generalization of the inverse Gaussian distribution and the Krtzel function associated with a product of a Bessel function of the third kind )(zKQ and a Z - Fox-Wright generalized hyper geometric function introduced in this paper. The introduced function turns out to be a unified gamma-type function. Its incomplete forms are also discussed. Several properties of this gamma-type function are obtained. By means of this generalized function, we introduce a generalization of inverse Gaussian distribution, which is useful in reliability analysis, diffusion processes, and radio techniques etc. The inverse Gaussian distribution thus introduced also provides a generalization of the Krtzel function. Some basic statistical functions associated with this probability density function, such as moments, the Mellin transform, the moment generating function, the hazard rate function, and the mean residue life function are also obtained.KeywordsFox-Wright function, Inverse Gaussian distribution, Krtzel function & Bessel function of the third kind.

Keywords: Fox-Wright function, Inverse Gaussian distribution, Krtzel function & Bessel function of the third kind.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1726
2438 A Relational Case-Based Reasoning Framework for Project Delivery System Selection

Authors: Yang Cui, Yong Qiang Chen

Abstract:

An appropriate project delivery system (PDS) is crucial to the success of a construction projects. Case-based Reasoning (CBR) is a useful support for PDS selection. However, the traditional CBR approach represents cases as attribute-value vectors without taking relations among attributes into consideration, and could not calculate the similarity when the structures of cases are not strictly same. Therefore, this paper solves this problem by adopting the Relational Case-based Reasoning (RCBR) approach for PDS selection, considering both the structural similarity and feature similarity. To develop the feature terms of the construction projects, the criteria and factors governing PDS selection process are first identified. Then feature terms for the construction projects are developed. Finally, the mechanism of similarity calculation and a case study indicate how RCBR works for PDS selection. The adoption of RCBR in PDS selection expands the scope of application of traditional CBR method and improves the accuracy of the PDS selection system.

Keywords: Relational Cased-based Reasoning, Case-based Reasoning, Project delivery system, Selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1997
2437 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

Keywords: Metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2338
2436 Computing Entropy for Ortholog Detection

Authors: Hsing-Kuo Pao, John Case

Abstract:

Biological sequences from different species are called or-thologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences -in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In prac-tice one can approximate it by computable compression methods. How-ever, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new ap-proach to overcome the problem that compression approximations may notwork well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empir-ical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and nega-tive (non-ortholog) data - better than with good, previously known alter-natives (which do not employ some means to handle short sequences well).Also empirically compared are the new entropy based attribute set and a number of other, more standard similarity attributes sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclu-sion: the new, entropy based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.

Keywords: compression, decision tree, entropy, ortholog, ROC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1829
2435 Effects of Introducing Similarity Measures into Artificial Bee Colony Approach for Optimization of Vehicle Routing Problem

Authors: P. Shunmugapriya, S. Kanmani, P. Jude Fredieric, U. Vignesh, J. Reman Justin, K. Vivek

Abstract:

Vehicle Routing Problem (VRP) is a complex combinatorial optimization problem and it is quite difficult to find an optimal solution consisting of a set of routes for vehicles whose total cost is minimum. Evolutionary and swarm intelligent (SI) algorithms play a vital role in solving optimization problems. While the SI algorithms perform search, the diversity between the solutions they exploit is very important. This is because of the need to avoid early convergence and to get an appropriate balance between the exploration and exploitation. Therefore, it is important to check how far the solutions are diverse. In this paper, we measure the similarity between solutions, which ABC exploits while optimizing VRP. The similar solutions found are discarded at the end of the iteration and only unique solutions are passed on to the next iteration. The bees of discarded solutions become scouts and they start searching for new solutions. This process is continued and results show that the solution is optimized at lesser number of iterations but with the overhead of computing similarity in all the iterations. The problem instance from Solomon benchmarked dataset has been used for evaluating the presented methodology.

Keywords: ABC algorithm, vehicle routing problem, optimization, Jaccard’s similarity measure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 850
2434 Graph Cuts Segmentation Approach Using a Patch-Based Similarity Measure Applied for Interactive CT Lung Image Segmentation

Authors: Aicha Majda, Abdelhamid El Hassani

Abstract:

Lung CT image segmentation is a prerequisite in lung CT image analysis. Most of the conventional methods need a post-processing to deal with the abnormal lung CT scans such as lung nodules or other lesions. The simplest similarity measure in the standard Graph Cuts Algorithm consists of directly comparing the pixel values of the two neighboring regions, which is not accurate because this kind of metrics is extremely sensitive to minor transformations such as noise or other artifacts problems. In this work, we propose an improved version of the standard graph cuts algorithm based on the Patch-Based similarity metric. The boundary penalty term in the graph cut algorithm is defined Based on Patch-Based similarity measurement instead of the simple intensity measurement in the standard method. The weights between each pixel and its neighboring pixels are Based on the obtained new term. The graph is then created using theses weights between its nodes. Finally, the segmentation is completed with the minimum cut/Max-Flow algorithm. Experimental results show that the proposed method is very accurate and efficient, and can directly provide explicit lung regions without any post-processing operations compared to the standard method.

Keywords: Graph cuts, lung CT scan, lung parenchyma segmentation, patch based similarity metric.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 749
2433 OCIRS: An Ontology-based Chinese Idioms Retrieval System

Authors: Hu Haibo, Tu Chunmei, Fu Chunlei, Fu Li, Mao Fan, Ma Yuan

Abstract:

Chinese Idioms are a type of traditional Chinese idiomatic expressions with specific meanings and stereotypes structure which are widely used in classical Chinese and are still common in vernacular written and spoken Chinese today. Currently, Chinese Idioms are retrieved in glossary with key character or key word in morphology or pronunciation index that can not meet the need of searching semantically. OCIRS is proposed to search the desired idiom in the case of users only knowing its meaning without any key character or key word. The user-s request in a sentence or phrase will be grammatically analyzed in advance by word segmentation, key word extraction and semantic similarity computation, thus can be mapped to the idiom domain ontology which is constructed to provide ample semantic relations and to facilitate description logics-based reasoning for idiom retrieval. The experimental evaluation shows that OCIRS realizes the function of searching idioms via semantics, obtaining preliminary achievement as requested by the users.

Keywords: Chinese idiom, idiom retrieval, semantic searching, ontology, semantics similarity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1723
2432 Maximum Common Substructure Extraction in RNA Secondary Structures Using Clique Detection Approach

Authors: Shih-Yi Chao

Abstract:

The similarity comparison of RNA secondary structures is important in studying the functions of RNAs. In recent years, most existing tools represent the secondary structures by tree-based presentation and calculate the similarity by tree alignment distance. Different to previous approaches, we propose a new method based on maximum clique detection algorithm to extract the maximum common structural elements in compared RNA secondary structures. A new graph-based similarity measurement and maximum common subgraph detection procedures for comparing purely RNA secondary structures is introduced. Given two RNA secondary structures, the proposed algorithm consists of a process to determine the score of the structural similarity, followed by comparing vertices labelling, the labelled edges and the exact degree of each vertex. The proposed algorithm also consists of a process to extract the common structural elements between compared secondary structures based on a proposed maximum clique detection of the problem. This graph-based model also can work with NC-IUB code to perform the pattern-based searching. Therefore, it can be used to identify functional RNA motifs from database or to extract common substructures between complex RNA secondary structures. We have proved the performance of this proposed algorithm by experimental results. It provides a new idea of comparing RNA secondary structures. This tool is helpful to those who are interested in structural bioinformatics.

Keywords: Clique detection, labeled vertices, RNA secondary structures, subgraph, similarity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1459
2431 SAF: A Substitution and Alignment Free Similarity Measure for Protein Sequences

Authors: Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski

Abstract:

The literature reports a large number of approaches for measuring the similarity between protein sequences. Most of these approaches estimate this similarity using alignment-based techniques that do not necessarily yield biologically plausible results, for two reasons. First, for the case of non-alignable (i.e., not yet definitively aligned and biologically approved) sequences such as multi-domain, circular permutation and tandem repeat protein sequences, alignment-based approaches do not succeed in producing biologically plausible results. This is due to the nature of the alignment, which is based on the matching of subsequences in equivalent positions, while non-alignable proteins often have similar and conserved domains in non-equivalent positions. Second, the alignment-based approaches lead to similarity measures that depend heavily on the parameters set by the user for the alignment (e.g., gap penalties and substitution matrices). For easily alignable protein sequences, it's possible to supply a suitable combination of input parameters that allows such an approach to yield biologically plausible results. However, for difficult-to-align protein sequences, supplying different combinations of input parameters yields different results. Such variable results create ambiguities and complicate the similarity measurement task. To overcome these drawbacks, this paper describes a novel and effective approach for measuring the similarity between protein sequences, called SAF for Substitution and Alignment Free. Without resorting either to the alignment of protein sequences or to substitution relations between amino acids, SAF is able to efficiently detect the significant subsequences that best represent the intrinsic properties of protein sequences, those underlying the chronological dependencies of structural features and biochemical activities of protein sequences. Moreover, by using a new efficient subsequence matching scheme, SAF more efficiently handles protein sequences that contain similar structural features with significant meaning in chronologically non-equivalent positions. To show the effectiveness of SAF, extensive experiments were performed on protein datasets from different databases, and the results were compared with those obtained by several mainstream algorithms.

Keywords: Protein, Similarity, Substitution, Alignment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1412
2430 Sequence Relationships Similarity of Swine Influenza a (H1N1) Virus

Authors: Patsaraporn Somboonsak, Mud-Armeen Munlin

Abstract:

In April 2009, a new variant of Influenza A virus subtype H1N1 emerged in Mexico and spread all over the world. The influenza has three subtypes in human (H1N1, H1N2 and H3N2) Types B and C influenza tend to be associated with local or regional epidemics. Preliminary genetic characterization of the influenza viruses has identified them as swine influenza A (H1N1) viruses. Nucleotide sequence analysis of the Haemagglutinin (HA) and Neuraminidase (NA) are similar to each other and the majority of their genes of swine influenza viruses, two genes coding for the neuraminidase (NA) and matrix (M) proteins are similar to corresponding genes of swine influenza. Sequence similarity between the 2009 A (H1N1) virus and its nearest relatives indicates that its gene segments have been circulating undetected for an extended period. Nucleic acid sequence Maximum Likelihood (MCL) and DNA Empirical base frequencies, Phylogenetic relationship amongst the HA genes of H1N1 virus isolated in Genbank having high nucleotide sequence homology. In this paper we used 16 HA nucleotide sequences from NCBI for computing sequence relationships similarity of swine influenza A virus using the following method MCL the result is 28%, 36.64% for Optimal tree with the sum of branch length, 35.62% for Interior branch phylogeny Neighber – Join Tree, 1.85% for the overall transition/transversion, and 8.28% for Overall mean distance.

Keywords: Sequence DNA, Relationship of swine, Swineinfluenza, Sequence Similarity

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2128
2429 Robust Face Recognition using AAM and Gabor Features

Authors: Sanghoon Kim, Sun-Tae Chung, Souhwan Jung, Seoungseon Jeon, Jaemin Kim, Seongwon Cho

Abstract:

In this paper, we propose a face recognition algorithm using AAM and Gabor features. Gabor feature vectors which are well known to be robust with respect to small variations of shape, scaling, rotation, distortion, illumination and poses in images are popularly employed for feature vectors for many object detection and recognition algorithms. EBGM, which is prominent among face recognition algorithms employing Gabor feature vectors, requires localization of facial feature points where Gabor feature vectors are extracted. However, localization method employed in EBGM is based on Gabor jet similarity and is sensitive to initial values. Wrong localization of facial feature points affects face recognition rate. AAM is known to be successfully applied to localization of facial feature points. In this paper, we devise a facial feature point localization method which first roughly estimate facial feature points using AAM and refine facial feature points using Gabor jet similarity-based facial feature localization method with initial points set by the rough facial feature points obtained from AAM, and propose a face recognition algorithm using the devised localization method for facial feature localization and Gabor feature vectors. It is observed through experiments that such a cascaded localization method based on both AAM and Gabor jet similarity is more robust than the localization method based on only Gabor jet similarity. Also, it is shown that the proposed face recognition algorithm using this devised localization method and Gabor feature vectors performs better than the conventional face recognition algorithm using Gabor jet similarity-based localization method and Gabor feature vectors like EBGM.

Keywords: Face Recognition, AAM, Gabor features, EBGM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2210
2428 A New Definition of the Intrinsic Mode Function

Authors: Zhihua Yang, Lihua Yang

Abstract:

This paper makes a detailed analysis regarding the definition of the intrinsic mode function and proves that Condition 1 of the intrinsic mode function can really be deduced from Condition 2. Finally, an improved definition of the intrinsic mode function is given.

Keywords: Empirical Mode Decomposition (EMD), Hilbert-Huang transform(HHT), Intrinsic Mode Function(IMF).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3597
2427 Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias

Abstract:

Annotation of a protein sequence is pivotal for the understanding of its function. Accuracy of manual annotation provided by curators is still questionable by having lesser evidence strength and yet a hard task and time consuming. A number of computational methods including tools have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult to be setup by the bioscientists, or depend on time intensive and blind sequence similarity search like Basic Local Alignment Search Tool. This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is fully based on Gene Ontology data and annotations. Two problems had been identified to achieve this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that can be easy to assess and process. Thus, these files can be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second problem involves searching for a set of semantically similar Gene Ontology terms to a given query. The details of macro and micro problems involved and their solutions including objective of this study are described. This paper also describes the protein sequence annotation and the Gene Ontology. The methodology of this study and Gene Ontology based protein sequence annotation tool namely extended UTMGO is presented. Furthermore, its basic version which is a Gene Ontology browser that is based on semantic similarity search is also introduced.

Keywords: automatic clustering, bioinformatics tool, gene ontology, protein sequence annotation, semantic similarity search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3131
2426 Stability Analysis of Three-Dimensional Flow and Heat Transfer over a Permeable Shrinking Surface in a Cu-Water Nanofluid

Authors: Roslinda Nazar, Amin Noor, Khamisah Jafar, Ioan Pop

Abstract:

In this paper, the steady laminar three-dimensional boundary layer flow and heat transfer of a copper (Cu)-water nanofluid in the vicinity of a permeable shrinking flat surface in an otherwise quiescent fluid is studied. The nanofluid mathematical model in which the effect of the nanoparticle volume fraction is taken into account is considered. The governing nonlinear partial differential equations are transformed into a system of nonlinear ordinary differential equations using a similarity transformation which is then solved numerically using the function bvp4c from Matlab. Dual solutions (upper and lower branch solutions) are found for the similarity boundary layer equations for a certain range of the suction parameter. A stability analysis has been performed to show which branch solutions are stable and physically realizable. The numerical results for the skin friction coefficient and the local Nusselt number as well as the velocity and temperature profiles are obtained, presented and discussed in detail for a range of various governing parameters.

Keywords: Heat Transfer, Nanofluid, Shrinking Surface, Stability Analysis, Three-Dimensional Flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2196
2425 A Review and Comparative Analysis on Cluster Ensemble Methods

Authors: S. Sarumathi, P. Ranjetha, C. Saraswathy, M. Vaishnavi, S. Geetha

Abstract:

Clustering is an unsupervised learning technique for aggregating data objects into meaningful classes so that intra cluster similarity is maximized and inter cluster similarity is minimized in data mining. However, no single clustering algorithm proves to be the most effective in producing the best result. As a result, a new challenging technique known as the cluster ensemble approach has blossomed in order to determine the solution to this problem. For the cluster analysis issue, this new technique is a successful approach. The cluster ensemble's main goal is to combine similar clustering solutions in a way that achieves the precision while also improving the quality of individual data clustering. Because of the massive and rapid creation of new approaches in the field of data mining, the ongoing interest in inventing novel algorithms necessitates a thorough examination of current techniques and future innovation. This paper presents a comparative analysis of various cluster ensemble approaches, including their methodologies, formal working process, and standard accuracy and error rates. As a result, the society of clustering practitioners will benefit from this exploratory and clear research, which will aid in determining the most appropriate solution to the problem at hand.

Keywords: Clustering, cluster ensemble methods, consensus function, data mining, unsupervised learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 830
2424 Towards Clustering of Web-based Document Structures

Authors: Matthias Dehmer, Frank Emmert Streib, Jürgen Kilian, Andreas Zulauf

Abstract:

Methods for organizing web data into groups in order to analyze web-based hypertext data and facilitate data availability are very important in terms of the number of documents available online. Thereby, the task of clustering web-based document structures has many applications, e.g., improving information retrieval on the web, better understanding of user navigation behavior, improving web users requests servicing, and increasing web information accessibility. In this paper we investigate a new approach for clustering web-based hypertexts on the basis of their graph structures. The hypertexts will be represented as so called generalized trees which are more general than usual directed rooted trees, e.g., DOM-Trees. As a important preprocessing step we measure the structural similarity between the generalized trees on the basis of a similarity measure d. Then, we apply agglomerative clustering to the obtained similarity matrix in order to create clusters of hypertext graph patterns representing navigation structures. In the present paper we will run our approach on a data set of hypertext structures and obtain good results in Web Structure Mining. Furthermore we outline the application of our approach in Web Usage Mining as future work.

Keywords: Clustering methods, graph-based patterns, graph similarity, hypertext structures, web structure mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1508
2423 Incorporating Semantic Similarity Measure in Genetic Algorithm : An Approach for Searching the Gene Ontology Terms

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias, Hany T. Alashwal, Rohayanti Hassan, FarhanMohamed

Abstract:

The most important property of the Gene Ontology is the terms. These control vocabularies are defined to provide consistent descriptions of gene products that are shareable and computationally accessible by humans, software agent, or other machine-readable meta-data. Each term is associated with information such as definition, synonyms, database references, amino acid sequences, and relationships to other terms. This information has made the Gene Ontology broadly applied in microarray and proteomic analysis. However, the process of searching the terms is still carried out using traditional approach which is based on keyword matching. The weaknesses of this approach are: ignoring semantic relationships between terms, and highly depending on a specialist to find similar terms. Therefore, this study combines semantic similarity measure and genetic algorithm to perform a better retrieval process for searching semantically similar terms. The semantic similarity measure is used to compute similitude strength between two terms. Then, the genetic algorithm is employed to perform batch retrievals and to handle the situation of the large search space of the Gene Ontology graph. The computational results are presented to show the effectiveness of the proposed algorithm.

Keywords: Gene Ontology, Semantic similarity measure, Genetic algorithm, Ontology search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1491
2422 Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns

Authors: Haider A Ramadhan, Khalil Shihab

Abstract:

Clustering techniques have been used by many intelligent software agents to group similar access patterns of the Web users into high level themes which express users intentions and interests. However, such techniques have been mostly focusing on one salient feature of the Web document visited by the user, namely the extracted keywords. The major aim of these techniques is to come up with an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we focus on both keyword and similarity thresholds to generate themes with concentrated themes, and hence build a more sound model of the user behavior. The purpose of this paper is two fold: use distance based clustering methods to recognize overall themes from the Proxy log file, and suggest an efficient cut off levels for the keyword and similarity thresholds which tend to produce more optimal clusters with better focus and efficient size.

Keywords: Data mining, knowledge discovery, clustering, dataanalysis, Web log analysis, theme based searching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1457