Search results for: similarity searching
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 997

Search results for: similarity searching

997 2D Fingerprint Performance for PubChem Chemical Database

Authors: Fatimah Zawani Abdullah, Shereena Mohd Arif, Nurul Malim

Abstract:

The study of molecular similarity search in chemical database is increasingly widespread, especially in the area of drug discovery. Similarity search is an application in the field of Chemoinformatics to measure the similarity between the molecular structure which is known as the query and the structure of chemical compounds in the database. Similarity search is also one of the approaches in virtual screening which involves computational techniques and scoring the probabilities of activity. The main objective of this work is to determine the best fingerprint when compared to the other five fingerprints selected in this study using PubChem chemical dataset. This paper will discuss the similarity searching process conducted using 6 types of descriptors, which are ECFP4, ECFC4, FCFP4, FCFC4, SRECFC4 and SRFCFC4 on 15 activity classes of PubChem dataset using Tanimoto coefficient to calculate the similarity between the query structures and each of the database structure. The results suggest that ECFP4 performs the best to be used with Tanimoto coefficient in the PubChem dataset.

Keywords: 2D fingerprints, Tanimoto, PubChem, similarity searching, chemoinformatics

Procedia PDF Downloads 267
996 3D Objects Indexing Using Spherical Harmonic for Optimum Measurement Similarity

Authors: S. Hellam, Y. Oulahrir, F. El Mounchid, A. Sadiq, S. Mbarki

Abstract:

In this paper, we propose a method for three-dimensional (3-D)-model indexing based on defining a new descriptor, which we call new descriptor using spherical harmonics. The purpose of the method is to minimize, the processing time on the database of objects models and the searching time of similar objects to request object. Firstly we start by defining the new descriptor using a new division of 3-D object in a sphere. Then we define a new distance which will be used in the search for similar objects in the database.

Keywords: 3D indexation, spherical harmonic, similarity of 3D objects, measurement similarity

Procedia PDF Downloads 406
995 Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees

Authors: Doru Anastasiu Popescu, Dan Rădulescu

Abstract:

In this paper, we determine the similarity of two HTML web applications. We are going to use a genetic algorithm in order to determine the most significant web pages of each application (we are not going to use every web page of a site). Using these significant web pages, we will find the similarity value between the two applications. The algorithm is going to be efficient because we are going to use a reduced number of web pages for comparisons but it will return an approximate value of the similarity. The binary trees are used to keep the tags from the significant pages. The algorithm was implemented in Java language.

Keywords: Tag, HTML, web page, genetic algorithm, similarity value, binary tree

Procedia PDF Downloads 336
994 Measuring Text-Based Semantics Relatedness Using WordNet

Authors: Madiha Khan, Sidrah Ramzan, Seemab Khan, Shahzad Hassan, Kamran Saeed

Abstract:

Measuring semantic similarity between texts is calculating semantic relatedness between texts using various techniques. Our web application (Measuring Relatedness of Concepts-MRC) allows user to input two text corpuses and get semantic similarity percentage between both using WordNet. Our application goes through five stages for the computation of semantic relatedness. Those stages are: Preprocessing (extracts keywords from content), Feature Extraction (classification of words into Parts-of-Speech), Synonyms Extraction (retrieves synonyms against each keyword), Measuring Similarity (using keywords and synonyms, similarity is measured) and Visualization (graphical representation of similarity measure). Hence the user can measure similarity on basis of features as well. The end result is a percentage score and the word(s) which form the basis of similarity between both texts with use of different tools on same platform. In future work we look forward for a Web as a live corpus application that provides a simpler and user friendly tool to compare documents and extract useful information.

Keywords: Graphviz representation, semantic relatedness, similarity measurement, WordNet similarity

Procedia PDF Downloads 212
993 Quick Similarity Measurement of Binary Images via Probabilistic Pixel Mapping

Authors: Adnan A. Y. Mustafa

Abstract:

In this paper we present a quick technique to measure the similarity between binary images. The technique is based on a probabilistic mapping approach and is fast because only a minute percentage of the image pixels need to be compared to measure the similarity, and not the whole image. We exploit the power of the Probabilistic Matching Model for Binary Images (PMMBI) to arrive at an estimate of the similarity. We show that the estimate is a good approximation of the actual value, and the quality of the estimate can be improved further with increased image mappings. Furthermore, the technique is image size invariant; the similarity between big images can be measured as fast as that for small images. Examples of trials conducted on real images are presented.

Keywords: big images, binary images, image matching, image similarity

Procedia PDF Downloads 171
992 A Context-Sensitive Algorithm for Media Similarity Search

Authors: Guang-Ho Cha

Abstract:

This paper presents a context-sensitive media similarity search algorithm. One of the central problems regarding media search is the semantic gap between the low-level features computed automatically from media data and the human interpretation of them. This is because the notion of similarity is usually based on high-level abstraction but the low-level features do not sometimes reflect the human perception. Many media search algorithms have used the Minkowski metric to measure similarity between image pairs. However those functions cannot adequately capture the aspects of the characteristics of the human visual system as well as the nonlinear relationships in contextual information given by images in a collection. Our search algorithm tackles this problem by employing a similarity measure and a ranking strategy that reflect the nonlinearity of human perception and contextual information in a dataset. Similarity search in an image database based on this contextual information shows encouraging experimental results.

Keywords: context-sensitive search, image search, similarity ranking, similarity search

Procedia PDF Downloads 342
991 Nazca: A Context-Based Matching Method for Searching Heterogeneous Structures

Authors: Karine B. de Oliveira, Carina F. Dorneles

Abstract:

The structure level matching is the problem of combining elements of a structure, which can be represented as entities, classes, XML elements, web forms, and so on. This is a challenge due to large number of distinct representations of semantically similar structures. This paper describes a structure-based matching method applied to search for different representations in data sources, considering the similarity between elements of two structures and the data source context. Using real data sources, we have conducted an experimental study comparing our approach with our baseline implementation and with another important schema matching approach. We demonstrate that our proposal reaches higher precision than the baseline.

Keywords: context, data source, index, matching, search, similarity, structure

Procedia PDF Downloads 338
990 Review and Suggestions of the Similarity between Employee and Its Workplace

Authors: Gi Ryung Song, Kyoung Seok Kim

Abstract:

This study reviewed the literature that focused on similarity of various characteristics such as values, personality, or demographics between employee and other elements in its organization for example employee with leader, job, and organization. We divided a body of this study into two parts and organized and demonstrated recent studies in first part. Three issues appeared in this part, which are statistical ways of measuring similarity, supervisor-subordinate similarity, and person-organization fit with person-job fit. In the latter part, based on the three issues of recent studies, we suggested three propositions about points that the recent studies missed or the studies did not orient. First proposition argued about the direction of similarity, which could also be interpreted as there is causal relation between employee and its workplace environments. Second, we suggested a consideration of eliminating common variance buried in one’s characteristics or its profiles. Third proposition was about the similarity of extra role behavior between individual and organization, and we treated this organization’s level of extra role behavior as a kind of its culture. In doing so, similarity of individual’s extra role behavior and organization’s has the meaning that individual’s congruence against their organization culture.

Keywords: similarity, person-organization fit, supervisor-subordinate similarity, literature review

Procedia PDF Downloads 253
989 Searching Linguistic Synonyms through Parts of Speech Tagging

Authors: Faiza Hussain, Usman Qamar

Abstract:

Synonym-based searching is recognized to be a complicated problem as text mining from unstructured data of web is challenging. Finding useful information which matches user need from bulk of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed for addressing this problem. For replacement of semantics, user intent is taken into consideration to realize the technique. Parts-of-Speech tagging is applied for pattern generation of the query and a thesaurus for this experiment was formed and used. Comparison with Non-Context Based Searching, Context Based searching proved to be a more efficient approach while dealing with linguistic semantics. This approach is very beneficial in doing intent based searching. Finally, results and future dimensions are presented.

Keywords: natural language processing, text mining, information retrieval, parts-of-speech tagging, grammar, semantics

Procedia PDF Downloads 288
988 Development of a Vegetation Searching System

Authors: Rattanathip Rattanachai, Kunyanuth Kularbphettong

Abstract:

This paper describes the development of a Vegetation Searching System based on Web Application in case of Suan Sunandha Rajabhat University. The model was developed by PHP, JavaScript, and MySQL database system and it was designed to support searching endemic and rare species of tree on web site. We describe the design methods and functional components of this prototype. To evaluate the system performance, questionnaires for system usability and Black Box Testing were used to measure expert and user satisfaction. The results were satisfactory as followed: Means for experts and users were 4.3 and 4.5, and standard deviation for experts and users were 0.61 and 0.73 respectively. Further analysis showed that the quality of plant searching web site was also at a good level as well.

Keywords: endemic species, vegetation, web-based system, black box testing, Thailand

Procedia PDF Downloads 292
987 Predictive Analysis of Personnel Relationship in Graph Database

Authors: Kay Thi Yar, Khin Mar Lar Tun

Abstract:

Nowadays, social networks are so popular and widely used in all over the world. In addition, searching personal information of each person and searching connection between them (peoples’ relation in real world) becomes interesting issue in our society. In this paper, we propose a framework with three portions for exploring peoples’ relations from their connected information. The first portion focuses on the Graph database structure to store the connected data of peoples’ information. The second one proposes the graph database searching algorithm, the Modified-SoS-ACO (Sense of Smell-Ant Colony Optimization). The last portion proposes the Deductive Reasoning Algorithm to define two persons’ relationship. This study reveals the proper storage structure for connected information, graph searching algorithm and deductive reasoning algorithm to predict and analyze the personnel relationship from peoples’ relation in their connected information.

Keywords: personnel information, graph storage structure, graph searching algorithm, deductive reasoning algorithm

Procedia PDF Downloads 431
986 Similarity Based Membership of Elements to Uncertain Concept in Information System

Authors: M. Kamel El-Sayed

Abstract:

The process of determining the degree of membership for an element to an uncertain concept has been found in many ways, using equivalence and symmetry relations in information systems. In the case of similarity, these methods did not take into account the degree of symmetry between elements. In this paper, we use a new definition for finding the membership based on the degree of symmetry. We provide an example to clarify the suggested methods and compare it with previous methods. This method opens the door to more accurate decisions in information systems.

Keywords: information system, uncertain concept, membership function, similarity relation, degree of similarity

Procedia PDF Downloads 195
985 Agglomerative Hierarchical Clustering Using the Tθ Family of Similarity Measures

Authors: Salima Kouici, Abdelkader Khelladi

Abstract:

In this work, we begin with the presentation of the Tθ family of usual similarity measures concerning multidimensional binary data. Subsequently, some properties of these measures are proposed. Finally, the impact of the use of different inter-elements measures on the results of the Agglomerative Hierarchical Clustering Methods is studied.

Keywords: binary data, similarity measure, Tθ measures, agglomerative hierarchical clustering

Procedia PDF Downloads 461
984 Internet of Things: Route Search Optimization Applying Ant Colony Algorithm and Theory of Computer Science

Authors: Tushar Bhardwaj

Abstract:

Internet of Things (IoT) possesses a dynamic network where the network nodes (mobile devices) are added and removed constantly and randomly, hence the traffic distribution in the network is quite variable and irregular. The basic but very important part in any network is route searching. We have many conventional route searching algorithms like link-state, and distance vector algorithms but they are restricted to the static point to point network topology. In this paper we propose a model that uses the Ant Colony Algorithm for route searching. It is dynamic in nature and has positive feedback mechanism that conforms to the route searching. We have also embedded the concept of Non-Deterministic Finite Automata [NDFA] minimization to reduce the network to increase the performance. Results show that Ant Colony Algorithm gives the shortest path from the source to destination node and NDFA minimization reduces the broadcasting storm effectively.

Keywords: routing, ant colony algorithm, NDFA, IoT

Procedia PDF Downloads 418
983 Empirical Study of Partitions Similarity Measures

Authors: Abdelkrim Alfalah, Lahcen Ouarbya, John Howroyd

Abstract:

This paper investigates and compares the performance of four existing distances and similarity measures between partitions. The partition measures considered are Rand Index (RI), Adjusted Rand Index (ARI), Variation of Information (VI), and Normalised Variation of Information (NVI). This work investigates the ability of these partition measures to capture three predefined intuitions: the variation within randomly generated partitions, the sensitivity to small perturbations, and finally the independence from the dataset scale. It has been shown that the Adjusted Rand Index performed well overall, with regards to these three intuitions.

Keywords: clustering, comparing partitions, similarity measure, partition distance, partition metric, similarity between partitions, clustering comparison.

Procedia PDF Downloads 167
982 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data-mining, and web search. Measuring the similarity between the documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature the proposed measure takes the following three cases into account: (1) The feature appears in both documents; (2) The feature appears in only one document and; (3) The feature appears in none of the documents. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 497
981 Tool for Determining the Similarity between Two Web Applications

Authors: Doru Anastasiu Popescu, Raducanu Dragos Ionut

Abstract:

In this paper the presentation of a tool which measures the similarity between two websites is made. The websites are compound only from webpages created with HTML. The tool uses three ways of calculating the similarity between two websites based on certain results already published. The first way compares all the webpages within a website, the second way compares a webpage with all the pages within the second website and the third way compares two webpages. Java programming language and technologies such as spring, Jsoup, log4j were used for the implementation of the tool.

Keywords: Java, Jsoup, HTM, spring

Procedia PDF Downloads 364
980 Improving Similarity Search Using Clustered Data

Authors: Deokho Kim, Wonwoo Lee, Jaewoong Lee, Teresa Ng, Gun-Ill Lee, Jiwon Jeong

Abstract:

This paper presents a method for improving object search accuracy using a deep learning model. A major limitation to provide accurate similarity with deep learning is the requirement of huge amount of data for training pairwise similarity scores (metrics), which is impractical to collect. Thus, similarity scores are usually trained with a relatively small dataset, which comes from a different domain, causing limited accuracy on measuring similarity. For this reason, this paper proposes a deep learning model that can be trained with a significantly small amount of data, a clustered data which of each cluster contains a set of visually similar images. In order to measure similarity distance with the proposed method, visual features of two images are extracted from intermediate layers of a convolutional neural network with various pooling methods, and the network is trained with pairwise similarity scores which is defined zero for images in identical cluster. The proposed method outperforms the state-of-the-art object similarity scoring techniques on evaluation for finding exact items. The proposed method achieves 86.5% of accuracy compared to the accuracy of the state-of-the-art technique, which is 59.9%. That is, an exact item can be found among four retrieved images with an accuracy of 86.5%, and the rest can possibly be similar products more than the accuracy. Therefore, the proposed method can greatly reduce the amount of training data with an order of magnitude as well as providing a reliable similarity metric.

Keywords: visual search, deep learning, convolutional neural network, machine learning

Procedia PDF Downloads 194
979 Impact of Similarity Ratings on Human Judgement

Authors: Ian A. McCulloh, Madelaine Zinser, Jesse Patsolic, Michael Ramos

Abstract:

Recommender systems are a common artificial intelligence (AI) application. For any given input, a search system will return a rank-ordered list of similar items. As users review returned items, they must decide when to halt the search and either revise search terms or conclude their requirement is novel with no similar items in the database. We present a statistically designed experiment that investigates the impact of similarity ratings on human judgement to conclude a search item is novel and halt the search. 450 participants were recruited from Amazon Mechanical Turk to render judgement across 12 decision tasks. We find the inclusion of ratings increases the human perception that items are novel. Percent similarity increases novelty discernment when compared with star-rated similarity or the absence of a rating. Ratings reduce the time to decide and improve decision confidence. This suggests the inclusion of similarity ratings can aid human decision-makers in knowledge search tasks.

Keywords: ratings, rankings, crowdsourcing, empirical studies, user studies, similarity measures, human-centered computing, novelty in information retrieval

Procedia PDF Downloads 99
978 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. Otherwise, TFIDF performs surprisingly well in other cases: in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 149
977 Static vs. Stream Mining Trajectories Similarity Measures

Authors: Musaab Riyadh, Norwati Mustapha, Dina Riyadh

Abstract:

Trajectory similarity can be defined as the cost of transforming one trajectory into another based on certain similarity method. It is the core of numerous mining tasks such as clustering, classification, and indexing. Various approaches have been suggested to measure similarity based on the geometric and dynamic properties of trajectory, the overlapping between trajectory segments, and the confined area between entire trajectories. In this article, an evaluation of these approaches has been done based on computational cost, usage memory, accuracy, and the amount of data which is needed in advance to determine its suitability to stream mining applications. The evaluation results show that the stream mining applications support similarity methods which have low computational cost and memory, single scan on data, and free of mathematical complexity due to the high-speed generation of data.

Keywords: global distance measure, local distance measure, semantic trajectory, spatial dimension, stream data mining

Procedia PDF Downloads 376
976 Discovering the Dimension of Abstractness: Structure-Based Model that Learns New Categories and Categorizes on Different Levels of Abstraction

Authors: Georgi I. Petkov, Ivan I. Vankov, Yolina A. Petrova

Abstract:

A structure-based model of category learning and categorization at different levels of abstraction is presented. The model compares different structures and expresses their similarity implicitly in the forms of mappings. Based on this similarity, the model can categorize different targets either as members of categories that it already has or creates new categories. The model is novel using two threshold parameters to evaluate the structural correspondence. If the similarity between two structures exceeds the higher threshold, a new sub-ordinate category is created. Vice versa, if the similarity does not exceed the higher threshold but does the lower one, the model creates a new category on higher level of abstraction.

Keywords: analogy-making, categorization, learning of categories, abstraction, hierarchical structure

Procedia PDF Downloads 163
975 Graph Similarity: Algebraic Model and Its Application to Nonuniform Signal Processing

Authors: Nileshkumar Vishnav, Aditya Tatu

Abstract:

A recent approach of representing graph signals and graph filters as polynomials is useful for graph signal processing. In this approach, the adjacency matrix plays pivotal role; instead of the more common approach involving graph-Laplacian. In this work, we follow the adjacency matrix based approach and corresponding algebraic signal model. We further expand the theory and introduce the concept of similarity of two graphs. The similarity of graphs is useful in that key properties (such as filter-response, algebra related to graph) get transferred from one graph to another. We demonstrate potential applications of the relation between two similar graphs, such as nonuniform filter design, DTMF detection and signal reconstruction.

Keywords: graph signal processing, algebraic signal processing, graph similarity, isospectral graphs, nonuniform signal processing

Procedia PDF Downloads 324
974 The Use of Voice in Online Public Access Catalog as Faster Searching Device

Authors: Maisyatus Suadaa Irfana, Nove Eka Variant Anna, Dyah Puspitasari Sri Rahayu

Abstract:

Technological developments provide convenience to all the people. Nowadays, the communication of human with the computer is done via text. With the development of technology, human and computer communications have been conducted with a voice like communication between human beings. It provides an easy facility for many people, especially those who have special needs. Voice search technology is applied in the search of book collections in the OPAC (Online Public Access Catalog), so library visitors will find it faster and easier to find books that they need. Integration with Google is needed to convert the voice into text. To optimize the time and the results of searching, Server will download all the book data that is available in the server database. Then, the data will be converted into JSON format. In addition, the incorporation of some algorithms is conducted including Decomposition (parse) in the form of array of JSON format, the index making, analyzer to the result. It aims to make the process of searching much faster than the usual searching in OPAC because the data are directly taken to the database for every search warrant. Data Update Menu is provided with the purpose to enable users perform their own data updates and get the latest data information.

Keywords: OPAC, voice, searching, faster

Procedia PDF Downloads 320
973 Clustering of Association Rules of ISIS & Al-Qaeda Based on Similarity Measures

Authors: Tamanna Goyal, Divya Bansal, Sanjeev Sofat

Abstract:

In world-threatening terrorist attacks, where early detection, distinction, and prediction are effective diagnosis techniques and for functionally accurate and precise analysis of terrorism data, there are so many data mining & statistical approaches to assure accuracy. The computational extraction of derived patterns is a non-trivial task which comprises specific domain discovery by means of sophisticated algorithm design and analysis. This paper proposes an approach for similarity extraction by obtaining the useful attributes from the available datasets of terrorist attacks and then applying feature selection technique based on the statistical impurity measures followed by clustering techniques on the basis of similarity measures. On the basis of degree of participation of attributes in the rules, the associative dependencies between the attacks are analyzed. Consequently, to compute the similarity among the discovered rules, we applied a weighted similarity measure. Finally, the rules are grouped by applying using hierarchical clustering. We have applied it to an open source dataset to determine the usability and efficiency of our technique, and a literature search is also accomplished to support the efficiency and accuracy of our results.

Keywords: association rules, clustering, similarity measure, statistical approaches

Procedia PDF Downloads 296
972 Map Matching Performance under Various Similarity Metrics for Heterogeneous Robot Teams

Authors: M. C. Akay, A. Aybakan, H. Temeltas

Abstract:

Aerial and ground robots have various advantages of usage in different missions. Aerial robots can move quickly and get a different sight of view of the area, but those vehicles cannot carry heavy payloads. On the other hand, unmanned ground vehicles (UGVs) are slow moving vehicles, since those can carry heavier payloads than unmanned aerial vehicles (UAVs). In this context, we investigate the performances of various Similarity Metrics to provide a common map for Heterogeneous Robot Team (HRT) in complex environments. Within the usage of Lidar Odometry and Octree Mapping technique, the local 3D maps of the environment are gathered.  In order to obtain a common map for HRT, informative theoretic similarity metrics are exploited. All types of these similarity metrics gave adequate as allowable simulation time and accurate results that can be used in different types of applications. For the heterogeneous multi robot team, those methods can be used to match different types of maps.

Keywords: common maps, heterogeneous robot team, map matching, informative theoretic similarity metrics

Procedia PDF Downloads 140
971 A Similarity/Dissimilarity Measure to Biological Sequence Alignment

Authors: Muhammad A. Khan, Waseem Shahzad

Abstract:

Analysis of protein sequences is carried out for the purpose to discover their structural and ancestry relationship. Sequence similarity determines similar protein structures, similar function, and homology detection. Biological sequences composed of amino acid residues or nucleotides provide significant information through sequence alignment. In this paper, we present a new similarity/dissimilarity measure to sequence alignment based on the primary structure of a protein. The approach finds the distance between the two given sequences using the novel sequence alignment algorithm and a mathematical model. The algorithm runs at a time complexity of O(n²). A distance matrix is generated to construct a phylogenetic tree of different species. The new similarity/dissimilarity measure outperforms other existing methods.

Keywords: alignment, distance, homology, mathematical model, phylogenetic tree

Procedia PDF Downloads 156
970 Analytical Similarity Assessment of Bevacizumab Biosimilar Candidate MB02 Using Multiple State-of-the-Art Assays

Authors: Marie-Elise Beydon, Daniel Sacristan, Isabel Ruppen

Abstract:

MB02 (Alymsys®) is a candidate biosimilar to bevacizumab, which was developed against the reference product (RP) Avastin® sourced from both the European Union (EU) and United States (US). MB02 has been extensively characterized comparatively to Avastin® at a physicochemical and biological level using sensitive orthogonal state-of-the-art analytical methods. MB02 has been demonstrated similar to the RP with regard to its primary and higher-order structure, post- and co-translational profiles such as glycosylation, charge, and size variants. Specific focus has been put on the characterization of Fab-related activities, such as binding to VEGF A 165, which directly reflect the bevacizumab mechanism of action. Fc-related functionality was also investigated, including binding to FcRn, which is indicative of antibodies' half-life. The data generated during the analytical similarity assessment demonstrate the high analytical similarity of MB02 to its RP.

Keywords: analytical similarity, bevacizumab, biosimilar, MB02

Procedia PDF Downloads 251
969 A Word-to-Vector Formulation for Word Representation

Authors: Sandra Rizkallah, Amir F. Atiya

Abstract:

This work presents a novel word to vector representation that is based on embedding the words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into consideration the antonymity between words, not only the synonymity, because of the suitability to handle the polarity nature of words. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The obtained results show that the proposed approach can capture the essence of the language, and can be generalized to estimate a correct similarity of any new pair of words.

Keywords: natural language processing, word to vector, text similarity, text mining

Procedia PDF Downloads 249
968 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari

Abstract:

Lots of efforts have been made in order to measure the semantic similarity between the text corpora in the documents. Techniques have been evolved to measure the similarity of two documents. One such state-of-art technique in the field of Natural Language Processing (NLP) is word to vector models, which converts the words into their word-embedding and measures the similarity between the vectors. We found this to be quite useful for the task of resume ranking. So, this research paper is the implementation of the word2vec model along with other Natural Language Processing techniques in order to rank the resumes for the particular job description so as to automate the process of hiring. The research paper proposes the system and the findings that were made during the process of building the system.

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

Procedia PDF Downloads 134