Search results for: similarity ranking
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 547

Search results for: similarity ranking

427 Weighted Clustering Coefficient for Identifying Modular Formations in Protein-Protein Interaction Networks

Authors: Zelmina Lubovac, Björn Olsson, Jonas Gamalielsson

Abstract:

This paper describes a novel approach for deriving modules from protein-protein interaction networks, which combines functional information with topological properties of the network. This approach is based on weighted clustering coefficient, which uses weights representing the functional similarities between the proteins. These weights are calculated according to the semantic similarity between the proteins, which is based on their Gene Ontology terms. We recently proposed an algorithm for identification of functional modules, called SWEMODE (Semantic WEights for MODule Elucidation), that identifies dense sub-graphs containing functionally similar proteins. The rational underlying this approach is that each module can be reduced to a set of triangles (protein triplets connected to each other). Here, we propose considering semantic similarity weights of all triangle-forming edges between proteins. We also apply varying semantic similarity thresholds between neighbours of each node that are not neighbours to each other (and hereby do not form a triangle), to derive new potential triangles to include in module-defining procedure. The results show an improvement of pure topological approach, in terms of number of predicted modules that match known complexes.

Keywords: Modules, systems biology, protein interactionnetworks, yeast.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2053
426 A Novel Pareto-Based Meta-Heuristic Algorithm to Optimize Multi-Facility Location-Allocation Problem

Authors: Vahid Hajipour, Samira V. Noshafagh, Reza Tavakkoli-Moghaddam

Abstract:

This article proposes a novel Pareto-based multiobjective meta-heuristic algorithm named non-dominated ranking genetic algorithm (NRGA) to solve multi-facility location-allocation problem. In NRGA, a fitness value representing rank is assigned to each individual of the population. Moreover, two features ranked based roulette wheel selection including select the fronts and choose solutions from the fronts, are utilized. The proposed solving methodology is validated using several examples taken from the specialized literature. The performance of our approach shows that NRGA algorithm is able to generate true and well distributed Pareto optimal solutions.

Keywords: Non-dominated ranking genetic algorithm, Pareto solutions, Multi-facility location-allocation problem.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2131
425 Using Suffix Tree Document Representation in Hierarchical Agglomerative Clustering

Authors: Daniel I. Morariu, Radu G. Cretulescu, Lucian N. Vintan

Abstract:

In text categorization problem the most used method for documents representation is based on words frequency vectors called VSM (Vector Space Model). This representation is based only on words from documents and in this case loses any “word context" information found in the document. In this article we make a comparison between the classical method of document representation and a method called Suffix Tree Document Model (STDM) that is based on representing documents in the Suffix Tree format. For the STDM model we proposed a new approach for documents representation and a new formula for computing the similarity between two documents. Thus we propose to build the suffix tree only for any two documents at a time. This approach is faster, it has lower memory consumption and use entire document representation without using methods for disposing nodes. Also for this method is proposed a formula for computing the similarity between documents, which improves substantially the clustering quality. This representation method was validated using HAC - Hierarchical Agglomerative Clustering. In this context we experiment also the stemming influence in the document preprocessing step and highlight the difference between similarity or dissimilarity measures to find “closer" documents.

Keywords: Text Clustering, Suffix tree documentrepresentation, Hierarchical Agglomerative Clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1866
424 Effects of Some Natural Antioxidants Mixtures on Margarine Stability

Authors: Maryam Azizkhani, Parvin Zandi

Abstract:

Application of synthetic antioxidants such as tertbutylhydroquinon (TBHQ), in spite of their efficiency, is questioned because of their possible carcinogenic effect. The purpose of this study was application of mixtures of natural antioxidants that provide the best oxidative stability for margarine. Antioxidant treatments included 10 various mixtures (F1- F10) containing 100-500ppm tocopherol mixture (Toc), 100-200ppm ascorbyl palmitate (AP), 100- 200ppm rosemary extract (Ros) and 1000ppm lecithin(Lec) along with a control or F0 (with no antioxidant) and F11 with 120ppm TBHQ. The effect of antioxidant mixtures on the stability of margarine samples during oven test (60°C), rancimat test at 110°C and storage at 4°C was evaluated. Final ranking of natural antioxidant mixtures was as follows: F2,F10>F5,F9>F8>F1,F3,F4>F6, F7. Considering the results of this research and ranking criteria, F2(200ppmAp + 200ppmRos) and F10(200ppmRos + 200ppmToc +1000ppmLec) were recommended as substitutes for TBHQ to maintain the quality and increase the shelf-life of margarine.

Keywords: Margarine, Natural antioxidant, Oxidative stability, Shelf-life.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3079
423 Analysis of Diverse Clustering Tools in Data Mining

Authors: S. Sarumathi, N. Shanthi, M. Sharmila

Abstract:

Clustering in data mining is an unsupervised learning technique of aggregating the data objects into meaningful groups such that the intra cluster similarity of objects are maximized and inter cluster similarity of objects are minimized. Over the past decades several clustering tools were emerged in which clustering algorithms are inbuilt and are easier to use and extract the expected results. Data mining mainly deals with the huge databases that inflicts on cluster analysis and additional rigorous computational constraints. These challenges pave the way for the emergence of powerful expansive data mining clustering softwares. In this survey, a variety of clustering tools used in data mining are elucidated along with the pros and cons of each software.

Keywords: Cluster Analysis, Clustering Algorithms, Clustering Techniques, Association, Visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2150
422 Unsteady Reversed Stagnation-Point Flow over a Flat Plate

Authors: Vai Kuong Sin, Chon Kit Chio

Abstract:

This paper investigates the nature of the development of two-dimensional laminar flow of an incompressible fluid at the reversed stagnation-point. ". In this study, we revisit the problem of reversed stagnation-point flow over a flat plate. Proudman and Johnson (1962) first studied the flow and obtained an asymptotic solution by neglecting the viscous terms. This is no true in neglecting the viscous terms within the total flow field. In particular it is pointed out that for a plate impulsively accelerated from rest to a constant velocity V0 that a similarity solution to the self-similar ODE is obtained which is noteworthy completely analytical.

Keywords: reversed stagnation-point flow, similarity solutions, analytical solution, numerical solution

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1414
421 Isolation and Identification of Diacylglycerol Acyltransferase Type- 2 (GAT2) Genes from Three Egyptian Olive Cultivars

Authors: Yahia I. Mohamed, Ahmed I. Marzouk, Mohamed A. Yacout

Abstract:

Aim of this work was to study the genetic basis for oil accumulation in olive fruit via tracking DGAT2 (Diacylglycerol acyltransferase type-2) gene in three Egyptian Origen Olive cultivars namely Toffahi, Hamed and Maraki using molecular marker techniques and bioinformatics tools. Results illustrate that, firstly: specific genomic band of Maraki cultivars was identified as DGAT2 (Diacylglycerol acyltransferase type-2) and identical for this gene in Olea europaea with 100% of similarity. Secondly, differential genomic band of Maraki cultivars which produced from RAPD fingerprinting technique reflected predicted distinguished sequence which identified as DGAT2 (Diacylglycerol acyltransferase type-2) in Fragaria vesca subsp. Vesca with 76% of sequential similarity. Third and finally, specific genomic specific band of Hamed cultivars was identified as two fragments, 1- Olea europaea cultivar Koroneiki diacylglycerol acyltransferase type 2 mRNA, complete cds with two matches regions with 99% or 2- Predicted: Fragaria vesca subsp. vesca diacylglycerol O-acyltransferase 2-like (LOC101313050), mRNA with 86 % of similarity.

Keywords: Olea europaea, fingerprinting, Diacylglycerol acyltransferase type- 2 (DGAT2).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2375
420 A Comparative Study of Page Ranking Algorithms for Information Retrieval

Authors: Ashutosh Kumar Singh, Ravi Kumar P

Abstract:

This paper gives an introduction to Web mining, then describes Web Structure mining in detail, and explores the data structure used by the Web. This paper also explores different Page Rank algorithms and compare those algorithms used for Information Retrieval. In Web Mining, the basics of Web mining and the Web mining categories are explained. Different Page Rank based algorithms like PageRank (PR), WPR (Weighted PageRank), HITS (Hyperlink-Induced Topic Search), DistanceRank and DirichletRank algorithms are discussed and compared. PageRanks are calculated for PageRank and Weighted PageRank algorithms for a given hyperlink structure. Simulation Program is developed for PageRank algorithm because PageRank is the only ranking algorithm implemented in the search engine (Google). The outputs are shown in a table and chart format.

Keywords: Web Mining, Web Structure, Web Graph, LinkAnalysis, PageRank, Weighted PageRank, HITS, DistanceRank, DirichletRank,

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2771
419 Incremental Algorithm to Cluster the Categorical Data with Frequency Based Similarity Measure

Authors: S.Aranganayagi, K.Thangavel

Abstract:

Clustering categorical data is more complicated than the numerical clustering because of its special properties. Scalability and memory constraint is the challenging problem in clustering large data set. This paper presents an incremental algorithm to cluster the categorical data. Frequencies of attribute values contribute much in clustering similar categorical objects. In this paper we propose new similarity measures based on the frequencies of attribute values and its cardinalities. The proposed measures and the algorithm are experimented with the data sets from UCI data repository. Results prove that the proposed method generates better clusters than the existing one.

Keywords: Clustering, Categorical, Incremental, Frequency, Domain

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1783
418 A Social Decision Support Mechanism for Group Purchasing

Authors: Lien-Fa Lin, Yung-Ming Li, Fu-Shun Hsieh

Abstract:

With the advancement of information technology and development of group commerce, people have obviously changed in their lifestyle. However, group commerce faces some challenging problems. The products or services provided by vendors do not satisfactorily reflect customers’ opinions, so that the sale and revenue of group commerce gradually become lower. On the other hand, the process for a formed customer group to reach group-purchasing consensus is time-consuming and the final decision is not the best choice for each group members. In this paper, we design a social decision support mechanism, by using group discussion message to recommend suitable options for group members and we consider social influence and personal preference to generate option ranking list. The proposed mechanism can enhance the group purchasing decision making efficiently and effectively and venders can provide group products or services according to the group option ranking list.

Keywords: Social network, group decision, text mining, group commerce.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1351
417 The Effect of Increment in Simulation Samples on a Combined Selection Procedure

Authors: Mohammad H. Almomani, Rosmanjawati Abdul Rahman

Abstract:

Statistical selection procedures are used to select the best simulated system from a finite set of alternatives. In this paper, we present a procedure that can be used to select the best system when the number of alternatives is large. The proposed procedure consists a combination between Ranking and Selection, and Ordinal Optimization procedures. In order to improve the performance of Ordinal Optimization, Optimal Computing Budget Allocation technique is used to determine the best simulation lengths for all simulation systems and to reduce the total computation time. We also argue the effect of increment in simulation samples for the combined procedure. The results of numerical illustration show clearly the effect of increment in simulation samples on the proposed combination of selection procedure.

Keywords: Indifference-Zone, Optimal Computing Budget Allocation, Ordinal Optimization, Ranking and Selection, Subset Selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1199
416 Wasp Venom Peptides may play a role in the Pathogenesis of Acute Disseminated Encephalomyelitis in Humans: A Structural Similarity Analysis

Authors: Permphan Dharmasaroja

Abstract:

Acute disseminated encephalomyelitis (ADEM) has been reported to develop after a hymenoptera sting, but its pathogenesis is not known in detail. Myelin basic protein (MBP)- specific T cells have been detected in the blood of patients with ADEM, and a proportion of these patients develop multiple sclerosis (MS). In an attempt to understand the mechanisms underlying ADEM, molecular mimicry between hymenoptera venom peptides and the human immunodominant MBP peptide was scrutinized, based on the sequence and structural similarities, whether it was the root of the disease. The results suggest that the three wasp venom peptides have low sequence homology with the human immunodominant MBP residues 85-99. Structural similarity analysis among the three venom peptides and the MS-related HLA-DR2b (DRA, DRB1*1501)-associated immunodominant MHC binding/TCR contact residues 88-93, VVHFFK showed that hyaluronidase residues 7-12, phospholipase A1 residues 98-103, and antigen 5 residues 109-114 showed a high degree of similarity 83.3%, 100%, and 83.3% respectively. In conclusion, some wasp venom peptides, particularly phospholipase A1, may potentially act as the molecular motifs of the human 3HLA-DR2b-associated immunodominant MBP88-93, and possibly present a mechanism for induction of wasp sting-associated ADEM.

Keywords: central nervous system, Hymenoptera, myelin basicprotein, molecular mimicry.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1576
415 Image Indexing Using a Color Similarity Metric based on the Human Visual System

Authors: Angelo Nodari, Ignazio Gallo

Abstract:

The novelty proposed in this study is twofold and consists in the developing of a new color similarity metric based on the human visual system and a new color indexing based on a textual approach. The new color similarity metric proposed is based on the color perception of the human visual system. Consequently the results returned by the indexing system can fulfill as much as possibile the user expectations. We developed a web application to collect the users judgments about the similarities between colors, whose results are used to estimate the metric proposed in this study. In order to index the image's colors, we used a text indexing engine to facilitate the integration of visual features in a database of text documents. The textual signature is build by weighting the image's colors in according to their occurrence in the image. The use of a textual indexing engine, provide us a simple, fast and robust solution to index images. A typical usage of the system proposed in this study, is the development of applications whose data type is both visual and textual. In order to evaluate the proposed method we chose a price comparison engine as a case of study, collecting a series of commercial offers containing the textual description and the image representing a specific commercial offer.

Keywords: Color Extraction, Content-Based Image Retrieval, Indexing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2986
414 An Optimization Algorithm Based on Dynamic Schema with Dissimilarities and Similarities of Chromosomes

Authors: Radhwan Yousif Sedik Al-Jawadi

Abstract:

Optimization is necessary for finding appropriate solutions to a range of real-life problems. In particular, genetic (or more generally, evolutionary) algorithms have proved very useful in solving many problems for which analytical solutions are not available. In this paper, we present an optimization algorithm called Dynamic Schema with Dissimilarity and Similarity of Chromosomes (DSDSC) which is a variant of the classical genetic algorithm. This approach constructs new chromosomes from a schema and pairs of existing ones by exploring their dissimilarities and similarities. To show the effectiveness of the algorithm, it is tested and compared with the classical GA, on 15 two-dimensional optimization problems taken from literature. We have found that, in most cases, our method is better than the classical genetic algorithm.

Keywords: Genetic algorithm, similarity and dissimilarity, chromosome injection, dynamic schema.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1255
413 Skin Lesion Segmentation Using Color Channel Optimization and Clustering-based Histogram Thresholding

Authors: Rahil Garnavi, Mohammad Aldeen, M. Emre Celebi, Alauddin Bhuiyan, Constantinos Dolianitis, George Varigos

Abstract:

Automatic segmentation of skin lesions is the first step towards the automated analysis of malignant melanoma. Although numerous segmentation methods have been developed, few studies have focused on determining the most effective color space for melanoma application. This paper proposes an automatic segmentation algorithm based on color space analysis and clustering-based histogram thresholding, a process which is able to determine the optimal color channel for detecting the borders in dermoscopy images. The algorithm is tested on a set of 30 high resolution dermoscopy images. A comprehensive evaluation of the results is provided, where borders manually drawn by four dermatologists, are compared to automated borders detected by the proposed algorithm, applying three previously used metrics of accuracy, sensitivity, and specificity and a new metric of similarity. By performing ROC analysis and ranking the metrics, it is demonstrated that the best results are obtained with the X and XoYoR color channels, resulting in an accuracy of approximately 97%. The proposed method is also compared with two state-of-theart skin lesion segmentation methods.

Keywords: Border detection, Color space analysis, Dermoscopy, Histogram thresholding, Melanoma, Segmentation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2194
412 A Similarity Metric for Assessment of Image Fusion Algorithms

Authors: Nedeljko Cvejic, Artur Łoza, David Bull, Nishan Canagarajah

Abstract:

In this paper, we present a novel objective nonreference performance assessment algorithm for image fusion. It takes into account local measurements to estimate how well the important information in the source images is represented by the fused image. The metric is based on the Universal Image Quality Index and uses the similarity between blocks of pixels in the input images and the fused image as the weighting factors for the metrics. Experimental results confirm that the values of the proposed metrics correlate well with the subjective quality of the fused images, giving a significant improvement over standard measures based on mean squared error and mutual information.

Keywords: Fusion performance measures, image fusion, nonreferencequality measures, objective quality measures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2411
411 Generation of Sets of Synthetic Classifiers for the Evaluation of Abstract-Level Combination Methods

Authors: N. Greco, S. Impedovo, R.Modugno, G. Pirlo

Abstract:

This paper presents a new technique for generating sets of synthetic classifiers to evaluate abstract-level combination methods. The sets differ in terms of both recognition rates of the individual classifiers and degree of similarity. For this purpose, each abstract-level classifier is considered as a random variable producing one class label as the output for an input pattern. From the initial set of classifiers, new slightly different sets are generated by applying specific operators, which are defined at the purpose. Finally, the sets of synthetic classifiers have been used to estimate the performance of combination methods for abstract-level classifiers. The experimental results demonstrate the effectiveness of the proposed approach.

Keywords: Abstract-level Classifier, Dempster-Shafer Rule, Multi-expert Systems, Similarity Index, System Evaluation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1435
410 Automatic Segmentation of Dermoscopy Images Using Histogram Thresholding on Optimal Color Channels

Authors: Rahil Garnavi, Mohammad Aldeen, M. Emre Celebi, Alauddin Bhuiyan, Constantinos Dolianitis, George Varigos

Abstract:

Automatic segmentation of skin lesions is the first step towards development of a computer-aided diagnosis of melanoma. Although numerous segmentation methods have been developed, few studies have focused on determining the most discriminative and effective color space for melanoma application. This paper proposes a novel automatic segmentation algorithm using color space analysis and clustering-based histogram thresholding, which is able to determine the optimal color channel for segmentation of skin lesions. To demonstrate the validity of the algorithm, it is tested on a set of 30 high resolution dermoscopy images and a comprehensive evaluation of the results is provided, where borders manually drawn by four dermatologists, are compared to automated borders detected by the proposed algorithm. The evaluation is carried out by applying three previously used metrics of accuracy, sensitivity, and specificity and a new metric of similarity. Through ROC analysis and ranking the metrics, it is shown that the best results are obtained with the X and XoYoR color channels which results in an accuracy of approximately 97%. The proposed method is also compared with two state-ofthe- art skin lesion segmentation methods, which demonstrates the effectiveness and superiority of the proposed segmentation method.

Keywords: Border detection, Color space analysis, Dermoscopy, Histogram thresholding, Melanoma, Segmentation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2024
409 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Authors: Loai AbdAllah, Mahmoud Kaiyal

Abstract:

Missing values in real-world datasets are a common problem. Many algorithms were developed to deal with this problem, most of them replace the missing values with a fixed value that was computed based on the observed values. In our work, we used a distance function based on Bhattacharyya distance to measure the distance between objects with missing values. Bhattacharyya distance, which measures the similarity of two probability distributions. The proposed distance distinguishes between known and unknown values. Where the distance between two known values is the Mahalanobis distance. When, on the other hand, one of them is missing the distance is computed based on the distribution of the known values, for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps to improve prevention of chronic diseases such as diabetes and cancer. In order for Wikaya’s recommendation system to work distance between users need to be measured. Since there are missing values in the collected data, there is a need to develop a distance function distances between incomplete users profiles. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between different objects, when some of them contain missing values, we integrated it within the framework of k nearest neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that kNN classifier using our proposed distance function outperforms the kNN using other existing methods.

Keywords: Missing values, distance metric, Bhattacharyya distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 733
408 Genetic Characterization of Barley Genotypes via Inter-Simple Sequence Repeat

Authors: Mustafa Yorgancılar, Emine Atalay, Necdet Akgün, Ali Topal

Abstract:

In this study, polymerase chain reaction based Inter-simple sequence repeat (ISSR) from DNA fingerprinting techniques were used to investigate the genetic relationships among barley crossbreed genotypes in Turkey. It is important that selection based on the genetic base in breeding programs via ISSR, in terms of breeding time. 14 ISSR primers generated a total of 97 bands, of which 81 (83.35%) were polymorphic. The highest total resolution power (RP) value was obtained from the F2 (0.53) and M16 (0.51) primers. According to the ISSR result, the genetic similarity index changed between 0.64–095; Lane 3 with Line 6 genotypes were the closest, while Line 36 were the most distant ones. The ISSR markers were found to be promising for assessing genetic diversity in barley crossbreed genotypes.

Keywords: Barley, crossbreed, genetic similarity, ISSR.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 864
407 The Appraisal of Construction Sites Productivity: In Kendall’s Concordance

Authors: Abdulkadir Abu Lawal

Abstract:

For the dearth of reliable cardinal numerical data, the linked phenomena in productivity indices such as operational costs and company turnovers, etc. could not be investigated. This would not give us insight to the root of productivity problems at unique sites. So, ordinal ranking by professionals who were most directly involved with construction sites was applied for Kendall’s concordance. Responses gathered from independent architects, builders/engineers, and quantity surveyors were herein analyzed. They were responses based on factors that affect sites productivity, and these factors were categorized as head office factors, resource management effectiveness factors, motivational factors, and training/skill development factors. It was found that productivity is low and has to be improved in order to facilitate Nigerian efforts in bridging its infrastructure deficit. The significance of this work is underlined with the Kendall’s coefficient of concordance of 0.78, while remedial measures must be emphasized to stimulate better productivity. Further detailed study can be undertaken by using Fuzzy logic analysis on wider Delphi survey.

Keywords: Factors, Kendall’s coefficient of concordance, magnitude of agreement, percentage magnitude of dichotomy, ranking variables.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 929
406 XML Schema Automatic Matching Solution

Authors: Huynh Quyet Thang, Vo Sy Nam

Abstract:

Schema matching plays a key role in many different applications, such as schema integration, data integration, data warehousing, data transformation, E-commerce, peer-to-peer data management, ontology matching and integration, semantic Web, semantic query processing, etc. Manual matching is expensive and error-prone, so it is therefore important to develop techniques to automate the schema matching process. In this paper, we present a solution for XML schema automated matching problem which produces semantic mappings between corresponding schema elements of given source and target schemas. This solution contributed in solving more comprehensively and efficiently XML schema automated matching problem. Our solution based on combining linguistic similarity, data type compatibility and structural similarity of XML schema elements. After describing our solution, we present experimental results that demonstrate the effectiveness of this approach.

Keywords: XML Schema, Schema Matching, SemanticMatching, Automatic XML Schema Matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1785
405 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

Keywords: Metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2270
404 Fighter Aircraft Selection Using Technique for Order Preference by Similarity to Ideal Solution with Multiple Criteria Decision Making Analysis

Authors: C. Ardil

Abstract:

This paper presents a multiple criteria decision making analysis technique for selecting fighter aircraft for the national air force. The selection of military aircraft is a process consisting of contradictory goals and objectives. When a modern air force needs to choose fighter aircraft to upgrade existing fleets, a multiple criteria decision making analysis and scenario planning for defense acquisition has been put forward. The selection of fighter aircraft for the air defense force is a strategic decision making process, since the purchase or lease of fighter jets, maintenance and operating costs and having a fleet is the biggest cost for the air force. Multiple criteria decision making analysis methods are effectively applied to facilitate decision making from various available options. The selection criteria were determined using the literature on the problem of fighter aircraft selection. The selection of fighter aircraft to be purchased for the air defense forces is handled using a multiple criteria decision making analysis technique that also determines a suitable methodological approach for the defense procurement and fleet upgrade planning process. The aim of this study is to originate an approach to evaluate fighter aircraft alternatives, Su-35, F-35, and TF-X (MMU), based on technique for order preference by similarity to ideal solution (TOPSIS).

Keywords: Fighter Aircraft, Fighter Aircraft Selection, Technique for Order Preference by Similarity to Ideal Solution, TOPSIS, Multiple Criteria Decision Making, Multiple Criteria Decision Making Analysis, MCDMA, Su-35, F-35, TF-X (MMU)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 562
403 Computing Entropy for Ortholog Detection

Authors: Hsing-Kuo Pao, John Case

Abstract:

Biological sequences from different species are called or-thologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences -in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In prac-tice one can approximate it by computable compression methods. How-ever, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new ap-proach to overcome the problem that compression approximations may notwork well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empir-ical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and nega-tive (non-ortholog) data - better than with good, previously known alter-natives (which do not employ some means to handle short sequences well).Also empirically compared are the new entropy based attribute set and a number of other, more standard similarity attributes sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclu-sion: the new, entropy based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.

Keywords: compression, decision tree, entropy, ortholog, ROC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1788
402 Determinate Fuzzy Set Ranking Analysis for Combat Aircraft Selection with Multiple Criteria Group Decision Making

Authors: C. Ardil

Abstract:

Using the aid of Hausdorff distance function and Minkowski distance function, this study proposes a novel method for selecting combat aircraft for Air Force. In order to do this, the proximity measure method was developed with determinate fuzzy degrees based on the relationship between attributes and combat aircraft alternatives. The combat aircraft selection attributes were identified as payloadability, maneuverability, speedability, stealthability, and survivability. Determinate fuzzy data from the combat aircraft attributes was then aggregated using the determinate fuzzy weighted arithmetic average operator. For the selection of combat aircraft, correlation analysis of the ranking order patterns of options was also examined. A numerical example from military aviation is used to demonstrate the applicability and effectiveness of the proposed method.

Keywords: Combat aircraft selection, multiple criteria decision making, fuzzy sets, determinate fuzzy sets, intuitionistic fuzzy sets, proximity measure method, Hausdorff distance function, Minkowski distance function, PMM, MCDM

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 269
401 Web Proxy Detection via Bipartite Graphs and One-Mode Projections

Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo

Abstract:

With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonate phishing pages to steal sensitive data or redirect victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become an increasingly challenging task to detect the proxy service due to their dynamic update and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of bipartite graphs for discovering social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent web proxy users behavior clusters. The web proxy URL may vary from time to time. Still, the inherent interest would not. So, based on the intuition, by dint of our private tools implemented by WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experiment results based on real datasets show that the behavior clusters not only reduce the number of URLs analysis but also provide an effective way to detect the web proxies, especially for the unknown web proxies.

Keywords: Bipartite graph, clustering, one-mode projection, web proxy detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 696
400 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources to understand broadcasting contents. Topic modeling is the method to get the summary of the broadcasting contents from its scripts. Generally, scripts represent contents descriptively with directions and speeches, and provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scene segments consist of speeches mainly, however, relatively small co-occurrences among words in the scene segments are observed. This causes inevitably the bad quality of topics by statistical learning method. To tackle this problem, we propose a method to improve topic quality with additional word co-occurrence information obtained using scene similarities. The main idea of improving topic quality is that the information that two or more texts are topically related can be useful to learn high quality of topics. In addition, more accurate topical representations lead to get information more accurate whether two texts are related or not. In this paper, we regard two scene segments are related if their topical similarity is high enough. We also consider that words are co-occurred if they are in topically related scene segments together. By iteratively inferring topics and determining semantically neighborhood scene segments, we draw a topic space represents broadcasting contents well. In the experiments, we showed the proposed method generates a higher quality of topics from Korean drama scripts than the baselines.

Keywords: Broadcasting contents, generalized P´olya urn model, scripts, text similarity, topic model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1775
399 Error-Robust Nature of Genome Profiling Applied for Clustering of Species Demonstrated by Computer Simulation

Authors: Shamim Ahmed Koichi Nishigaki

Abstract:

Genome profiling (GP), a genotype based technology, which exploits random PCR and temperature gradient gel electrophoresis, has been successful in identification/classification of organisms. In this technology, spiddos (Species identification dots) and PaSS (Pattern similarity score) were employed for measuring the closeness (or distance) between genomes. Based on the closeness (PaSS), we can buildup phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This fact was confirmed quantitatively in this study by computer-simulation, providing the limit of the reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for identification/classification of organisms.

Keywords: Fluctuation, Genome profiling (GP), Pattern similarity score (PaSS), Robustness, Spiddos-shift.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1493
398 Destination Port Detection for Vessels: An Analytic Tool for Optimizing Port Authorities Resources

Authors: Lubna Eljabu, Mohammad Etemad, Stan Matwin

Abstract:

Port authorities have many challenges in congested ports to allocate their resources to provide a safe and secure loading/unloading procedure for cargo vessels. Selecting a destination port is the decision of a vessel master based on many factors such as weather, wavelength and changes of priorities. Having access to a tool which leverages Automatic Identification System (AIS) messages to monitor vessel’s movements and accurately predict their next destination port promotes an effective resource allocation process for port authorities. In this research, we propose a method, namely, Reference Route of Trajectory (RRoT) to assist port authorities in predicting inflow and outflow traffic in their local environment by monitoring AIS messages. Our RRo method creates a reference route based on historical AIS messages. It utilizes some of the best trajectory similarity measures to identify the destination of a vessel using their recent movement. We evaluated five different similarity measures such as Discrete Frechet Distance (DFD), Dynamic Time ´ Warping (DTW), Partial Curve Mapping (PCM), Area between two curves (Area) and Curve length (CL). Our experiments show that our method identifies the destination port with an accuracy of 98.97% and an f-measure of 99.08% using Dynamic Time Warping (DTW) similarity measure.

Keywords: Spatial temporal data mining, trajectory mining, trajectory similarity, resource optimization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 623