Search results for: Jaccard similarity
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 675

615 Genomic and Proteomic Variability in Glycine Max Genotypes in Response to Salt Stress

Authors: Faheema Khan

Abstract:

To investigate the ability of sensitive and tolerant genotypes of Glycine max to adapt to a saline environment in a field, we examined growth performance, water relations and the activities of antioxidant enzymes in relation to photosynthetic rate, chlorophyll a fluorescence, photosynthetic pigment concentration, protein and proline in plants exposed to salt stress. Ten soybean genotypes (Pusa-20, Pusa-40, Pusa-37, Pusa-16, Pusa-24, Pusa-22, BRAGG, PK-416, PK-1042, and DS-9712) were selected and grown hydroponically. After 3 days of proper germination, the seedlings were transferred to Hoagland’s solution (Hoagland and Arnon 1950). The growth chamber was maintained at a photosynthetic photon flux density of 430 μmol m−2 s−1, 14 h of light, 10 h of dark and a relative humidity of 60%. The nutrient solution was bubbled with sterile air and changed on alternate days. Ten-day-old seedlings were given seven levels of salt in the form of NaCl, viz. T1 = 0 mM NaCl, T2 = 25 mM NaCl, T3 = 50 mM NaCl, T4 = 75 mM NaCl, T5 = 100 mM NaCl, T6 = 125 mM NaCl, T7 = 150 mM NaCl. The investigation showed that genotypes Pusa-24, PK-416 and Pusa-20 appeared to be the most salt-sensitive, as inferred from their significantly reduced length, fresh weight and dry weight in response to NaCl exposure. Pusa-37 appeared to be the most tolerant genotype, since no significant effect of NaCl treatment on growth was found. We observed a greater decline in photosynthetic variables such as photosynthetic rate, chlorophyll fluorescence and chlorophyll content in the salt-sensitive genotype (Pusa-24) than in the salt-tolerant Pusa-37 under high salinity. Numerous primers obtained from Operon Technologies were tested on the ten soybean genotypes, among which 30 RAPD primers showed high polymorphism and genetic variation. Jaccard’s similarity coefficient was calculated for each pairwise comparison between cultivars, and a similarity coefficient matrix was constructed.
The closer varieties in the cluster behaved similarly in their response to salinity. Intra-clustering within the two clusters grouped the 10 genotypes into sub-clusters, as expected from their physiological findings. The salt-tolerant genotype Pusa-37 was further analysed by two-dimensional gel electrophoresis to examine the differential expression of proteins under high salt stress. In the present study, 173 protein spots were identified. Of these, 40 proteins responsive to salinity were either up- or down-regulated in Pusa-37. Proteomic analysis in the salt-tolerant genotype (Pusa-37) led to the detection of proteins involved in a variety of biological processes, such as protein synthesis (12%), redox regulation (19%), primary and secondary metabolism (25%), or disease- and defence-related processes (32%). In conclusion, the soybean plants in our study responded to salt stress by changing their protein expression pattern. The photosynthetic, biochemical and molecular study showed that there is variability in salt tolerance behaviour among soybean genotypes. Pusa-24 is the salt-sensitive and Pusa-37 is the salt-tolerant genotype. Moreover, this study gives new insights into the salt-stress response in soybean and demonstrates the power of genomic and proteomic approaches in plant biology studies, which could ultimately help in identifying the possible regulatory switches (gene/s) controlling the salt-tolerant genotype of crop plants and their possible role in defence mechanisms.
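
The pairwise Jaccard step described in this abstract can be illustrated with a small sketch. It assumes each genotype is scored as a 0/1 band-presence profile across marker positions, which is the usual encoding for RAPD data but is not spelled out in the abstract. The genotype labels `G1` to `G3` and their profiles are made up for illustration and are not data from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two equal-length 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)   # bands present in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # bands present in at least one
    return shared / either if either else 1.0


def similarity_matrix(profiles):
    """Build a symmetric matrix (dict of dicts) of pairwise Jaccard coefficients."""
    names = sorted(profiles)
    matrix = {name: {name: 1.0} for name in names}
    for p, q in combinations(names, 2):
        matrix[p][q] = matrix[q][p] = jaccard(profiles[p], profiles[q])
    return matrix


# Hypothetical band-presence profiles (1 = band present), for illustration only.
bands = {
    "G1": [1, 1, 0, 1, 0, 1],
    "G2": [1, 1, 0, 0, 0, 1],
    "G3": [0, 1, 1, 0, 1, 0],
}
matrix = similarity_matrix(bands)
```

A matrix of this kind is what a clustering step would typically consume; the abstract does not state which clustering method was used.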

Keywords: glycine max, salt stress, RAPD, genomic and proteomic variability

Procedia PDF Downloads 423
614 Top-K Shortest Distance as a Similarity Measure

Authors: Andrey Lebedev, Ilya Dmitrenok, JooYoung Lee, Leonard Johard

Abstract:

The top-k shortest path routing problem is an extension of finding the shortest path in a given network. The shortest path is one of the most essential measures, as it reveals the relation between two nodes in a network. However, in many real-world networks whose diameters are small, the top-k shortest paths are more interesting, as they contain more information about the network topology. Many variations of computing top-k shortest paths have been studied. In this paper, we apply an efficient top-k shortest distance routing algorithm to the link prediction problem and test its efficacy. We compare the results with other baseline and state-of-the-art methods as well as with the shortest path. We also propose a top-k distance-based graph matching algorithm.
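
As a rough illustration of what "top-k shortest distances" between two nodes means, the sketch below enumerates simple paths in order of increasing cost with a priority queue. This is a naive best-first search that is only practical on small graphs; it is not the efficient routing algorithm the authors refer to, and the example graph is hypothetical. How the k distances are turned into a link-prediction score is not stated in the abstract, so that step is left out.

```python
import heapq


def top_k_shortest_distances(graph, source, target, k):
    """Return up to k smallest costs of simple paths from source to target.

    graph maps each node to a dict of neighbor -> non-negative edge weight.
    """
    distances = []
    heap = [(0, [source])]  # (cost so far, path taken)
    while heap and len(distances) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            distances.append(cost)  # paths are popped in non-decreasing cost order
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return distances


# Hypothetical directed graph, for illustration only.
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 1, "d": 5},
    "c": {"d": 1},
    "d": {},
}
top3 = top_k_shortest_distances(graph, "a", "d", 3)
```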

Keywords: graph matching, link prediction, shortest path, similarity

Procedia PDF Downloads 358
613 The Effects of Different Types of Herbicides Used for Lawn Maintenance on the Dynamics of Weeds in an Urban Environment

Authors: Yetunde I. Bulu, Moses B. Adewole, Julius O. Faluyi

Abstract:

This study investigates the effect of aggressive application of herbicide on weed succession in an urban environment in Ile-Ife, Osun State. An inspection of the communities was carried out to identify sites maintained by herbicides (test plots) and those without herbicide history (control plots). Four different experimental plots located at Olasode, Eleweran, Ife City and Parakin within Ile-Ife town were monitored during the study. Comprehensive enumeration and identification of plant populations to species level was carried out on each of the plots and at every visit to determine the direction of succession. Index of similarities was used to determine the relationship in plant species composition between plots treated with herbicide and the untreated plots. The trend of increasing plant species was observed in all the study plots. Low Similarity Index between the treated plots and the control vegetation was observed at all visitations. Low similarity was also observed between the above-ground vegetation and the seed bank in all the plots. The study concluded that the weed population observed from the experimental plots showed an increase in species richness and diversity when the plots were left to recover compared to the control plots.

Keywords: herbicide, index of similarity, population, soil seed bank, succession

Procedia PDF Downloads 161
612 Case-Based Reasoning Approach for Process Planning of Internal Thread Cold Extrusion

Authors: D. Zhang, H. Y. Du, G. W. Li, J. Zeng, D. W. Zuo, Y. P. You

Abstract:

For the difficult issues of process selection, case-based reasoning technology is applied to computer aided process planning system for cold form tapping of internal threads on the basis of similarity in the process. A model is established based on the analysis of process planning. Case representation and similarity computing method are given. Confidence degree is used to evaluate the case. Rule-based reuse strategy is presented. The scheme is illustrated and verified by practical application. The case shows the design results with the proposed method are effective.

Keywords: case-based reasoning, internal thread, cold extrusion, process planning

Procedia PDF Downloads 510
611 Algorithms for Fast Computation of Pan Matrix Profiles of Time Series Under Unnormalized Euclidean Distances

Authors: Jing Zhang, Daniel Nikovski

Abstract:

We propose an approximation algorithm called LINKUMP to compute the Pan Matrix Profile (PMP) under the unnormalized l∞ distance (useful for value-based similarity search) using a double-ended queue and linear interpolation. The algorithm has time and space complexities comparable to those of the state-of-the-art algorithm for typical PMP computation under the normalized l₂ distance (useful for shape-based similarity search). We validate its efficiency and effectiveness through extensive numerical experiments and a real-world anomaly detection application.
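
To make the quantity being approximated concrete, here is a brute-force sketch of a matrix profile under the unnormalized l∞ (Chebyshev) distance, repeated over a range of window lengths to form a pan matrix profile. This is a quadratic-time baseline, not LINKUMP. The exclusion zone of half the window length is an assumption, as are the function names and the toy series.

```python
def chebyshev(x, y):
    """Unnormalized l-infinity distance between two equal-length sequences."""
    return max(abs(a - b) for a, b in zip(x, y))


def matrix_profile_linf(series, m):
    """For each length-m subsequence, distance to its nearest non-trivial match."""
    n = len(series) - m + 1
    excl = m // 2  # assumed exclusion zone to skip trivial self-matches
    profile = []
    for i in range(n):
        best = float("inf")
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            best = min(best, chebyshev(series[i:i + m], series[j:j + m]))
        profile.append(best)
    return profile


def pan_matrix_profile(series, min_m, max_m):
    """Matrix profiles for every window length in [min_m, max_m]."""
    return {m: matrix_profile_linf(series, m) for m in range(min_m, max_m + 1)}


# Toy series with one outlying value (9), for illustration only.
series = [0, 1, 2, 0, 1, 2, 9, 1, 2, 0]
profile = matrix_profile_linf(series, 3)
pan = pan_matrix_profile(series, 3, 4)
```

In a discord-discovery setting, the subsequences with the largest profile values are the anomaly candidates; here those are the windows that contain the value 9.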

Keywords: pan matrix profile, unnormalized euclidean distance, double-ended queue, discord discovery, anomaly detection

Procedia PDF Downloads 247
610 Flow Behavior and Performances of Centrifugal Compressor Stage Vaneless Diffusers

Authors: Y. Galerkin, O. Solovieva

Abstract:

Flow parameters are calculated in vaneless diffusers with a relative width of 0.014–0.10, constant along the radius. Inlet flow angles and similarity criteria were varied. Information about the flow structure is presented: meridional streamline configuration, full flow development, and flow separation. Polytropic efficiency and loss and recovery coefficients are used to compare the diffusers’ effectiveness. An example of narrow diffuser optimization by means of conical walls is presented. Three tapered variants of a wide diffuser are also compared. The work was carried out in the R&D laboratory “Gas dynamics of turbo machines” of the TU SPb.

Keywords: vaneless diffuser, relative width, flow angle, flow separation, loss coefficient, similarity criteria

Procedia PDF Downloads 490
609 Isolation and Identification of Diacylglycerol Acyltransferase Type-2 (DGAT2) Genes from Three Egyptian Olive Cultivars

Authors: Yahia I. Mohamed, Ahmed I. Marzouk, Mohamed A. Yacout

Abstract:

The aim of this work was to study the genetic basis of oil accumulation in olive fruit by tracking the DGAT2 (diacylglycerol acyltransferase type-2) gene in three olive cultivars of Egyptian origin, namely Toffahi, Hamed and Maraki, using molecular marker techniques and bioinformatics tools. The results show, first, that a specific genomic band of the Maraki cultivar was identified as DGAT2 and was identical to this gene in Olea europaea, with 100% similarity. Second, a differential genomic band of the Maraki cultivar, produced by the RAPD fingerprinting technique, yielded a distinct predicted sequence identified as DGAT2 in Fragaria vesca subsp. vesca, with 76% sequence similarity. Third, a specific genomic band of the Hamed cultivar was identified as two fragments: (1) Olea europaea cultivar Koroneiki diacylglycerol acyltransferase type 2 mRNA, complete cds, with two matching regions at 99% similarity; and (2) PREDICTED: Fragaria vesca subsp. vesca diacylglycerol O-acyltransferase 2-like (LOC101313050), mRNA, with 86% similarity.

Keywords: Olea europaea, fingerprinting, diacylglycerol acyltransferase type-2 (DGAT2), Egypt

Procedia PDF Downloads 503
608 Bird Diversity along Boat Touring Routes in Tha Ka Sub-District, Amphawa District, Samut Songkram Province, Thailand

Authors: N. Charoenpokaraj, P. Chitman

Abstract:

This research aims to study the species, abundance and status of birds, and the similarities and activity characteristics of birds that benefit from the research area, along boat touring routes in Tha Ka sub-district, Amphawa District, Samut Songkram Province, Thailand, from October 2012 to September 2013. The data were analyzed to find the abundance and similarity index of the birds. The survey of birds on all three routes found 33 families and 63 species. Route 3 (traditional coconut sugar making kiln – resort) had the most species, with 56. There were 18 species of commonly found birds with an abundance level of 5, which amounts to 28.57% of all bird species. In August, 46 species were found, the greatest number of bird species benefiting from this route. As for the status of the birds, there are 51 resident birds, 7 resident and migratory birds, and 5 migratory birds. Between Route 2 and Route 3, the similarity index value is 0.881. The birds are classified by their activity characteristics, i.e. insectivore, piscivore, granivore, nectarivore and aquatic invertebrate feeder birds. Some birds also use the area for nesting.

Keywords: bird diversity, boat touring routes, Samut Songkram, similarity index

Procedia PDF Downloads 336
607 Correlation between Funding and Publications: A Pre-Step towards Future Research Prediction

Authors: Ning Kang, Marius Doornenbal

Abstract:

Funding is a very important – if not crucial – resource for research projects. Usually, funding organizations will publish a description of the funded research to describe the scope of the funding award. Logically, we would expect research outcomes to align with this funding award. For that reason, we might be able to predict future research topics based on present funding award data. That said, it remains to be shown if and how future research topics can be predicted by using the funding information. In this paper, we extract funding project information and the abstracts of the papers generated by those projects from the Gateway to Research database as one group, and use papers from the same domains and publication years in the Scopus database as a baseline comparison group. We annotate both the project awards and the papers resulting from the funded projects with linguistic features (noun phrases), and then calculate tf-idf and cosine similarity between these two sets of features. We show that the cosine similarity for the project-generated papers group is higher than that for the project-baseline group, and also that these two groups of similarities are significantly different. Based on this result, we conclude that the funding information actually correlates with the content of future research output for the funded project on the topical level. How funding really changes the course of science or of scientific careers remains an elusive question.
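
The tf-idf and cosine similarity step can be sketched as follows. The sketch assumes each document has already been reduced to a list of noun-phrase tokens, and it uses a smoothed idf, which is one common variant; the abstract does not say which weighting the authors used. The three token lists are invented for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse tf-idf vector (dict term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # smoothed idf (an assumed variant): log((1 + n) / (1 + df)) + 1
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical noun-phrase tokens, for illustration only.
award = ["salt stress", "gene expression", "soybean", "proteomics"]
funded_paper = ["salt stress", "soybean", "protein spots", "gene expression"]
baseline_paper = ["graph matching", "link prediction", "shortest path"]

award_vec, funded_vec, baseline_vec = tfidf_vectors([award, funded_paper, baseline_paper])
sim_funded = cosine(award_vec, funded_vec)
sim_baseline = cosine(award_vec, baseline_vec)
```

In this toy setup the award shares terms only with the funded paper, so `sim_funded` is positive and `sim_baseline` is zero, which mirrors the direction of the comparison reported in the abstract.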

Keywords: natural language processing, noun phrase, tf-idf, cosine similarity

Procedia PDF Downloads 245
606 An Optimization Algorithm Based on Dynamic Schema with Dissimilarities and Similarities of Chromosomes

Authors: Radhwan Yousif Sedik Al-Jawadi

Abstract:

Optimization is necessary for finding appropriate solutions to a range of real-life problems. In particular, genetic (or more generally, evolutionary) algorithms have proved very useful in solving many problems for which analytical solutions are not available. In this paper, we present an optimization algorithm called Dynamic Schema with Dissimilarity and Similarity of Chromosomes (DSDSC) which is a variant of the classical genetic algorithm. This approach constructs new chromosomes from a schema and pairs of existing ones by exploring their dissimilarities and similarities. To show the effectiveness of the algorithm, it is tested and compared with the classical GA, on 15 two-dimensional optimization problems taken from literature. We have found that, in most cases, our method is better than the classical genetic algorithm.

Keywords: chromosome injection, dynamic schema, genetic algorithm, similarity and dissimilarity

Procedia PDF Downloads 348
605 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

Word discovery is a key problem in text information retrieval technology. Methods for new word discovery tend to be closely tied to words, because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How to detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by means of semantic replacement between sentences. The experiment verifies that the framework not only discovers network words rapidly but also recovers the standard word meaning of the discovered network words, which reflects the effectiveness of our work.

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 95
604 ScRNA-Seq RNA Sequencing-Based Program-Polygenic Risk Scores Associated with Pancreatic Cancer Risks in the UK Biobank Cohort

Authors: Yelin Zhao, Xinxiu Li, Martin Smelik, Oleg Sysoev, Firoj Mahmud, Dina Mansour Aly, Mikael Benson

Abstract:

Background: Early diagnosis of pancreatic cancer (PC) is clinically challenging due to vague or absent symptoms and a lack of biomarkers. Polygenic risk scores (PRSs) may provide a valuable tool to assess increased or decreased risk of PC. This study aimed to develop such PRSs by filtering genetic variants identified by GWAS using transcriptional programs identified by single-cell RNA sequencing (scRNA-seq). Methods: ScRNA-seq data from 24 pancreatic ductal adenocarcinoma (PDAC) tumor samples and 11 normal pancreases were analyzed to identify differentially expressed genes (DEGs) in tumor and microenvironment cell types compared to healthy tissues. Pathway analysis showed that the DEGs were enriched for hundreds of significant pathways. These were clustered into 40 “programs” based on gene similarity, using the Jaccard index. Published genetic variants associated with PDAC were mapped to each program to generate program PRSs (pPRSs). These pPRSs, along with five previously published PRSs (PGS000083, PGS000725, PGS000663, PGS000159, and PGS002264), were evaluated in a European-origin population from the UK Biobank, consisting of 1,310 PDAC participants and 407,473 non-pancreatic cancer participants. Stepwise Cox regression analysis was performed to determine associations between pPRSs and the development of PC, with adjustment for sex and principal components of genetic ancestry. Results: The PDAC genetic variants were mapped to 23 programs and were used to generate pPRSs for these programs. Four distinct pPRSs (P1, P6, P11, and P16) and two published PRSs (PGS000663 and PGS002264) were significantly associated with an increased risk of developing PC. Among these, P6 exhibited the greatest hazard ratio (adjusted HR[95% CI] = 1.67[1.14-2.45], p = 0.008). In contrast, P10 and P4 were associated with lower risk of developing PC (adjusted HR[95% CI] = 0.58[0.42-0.81], p = 0.001, and adjusted HR[95% CI] = 0.75[0.59-0.96], p = 0.019).
By comparison, two of the five published PRS exhibited an association with PDAC onset with HR (PGS000663: adjusted HR[95% CI] = 1.24[1.14-1.35], p < 0.001 and PGS002264: adjusted HR[95% CI] = 1.14[1.07-1.22], p < 0.001). Conclusion: Compared to published PRSs, scRNA-seq-based pPRSs may be used not only to assess increased but also decreased risk of PDAC.
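
The grouping of pathways into "programs" by gene overlap can be pictured with the sketch below. It computes the Jaccard index between pathway gene sets and merges any two pathways whose index reaches a threshold (single-linkage grouping via union-find). The abstract does not state the clustering procedure or any threshold, so both are assumptions, and the pathway and gene names are placeholders.

```python
def jaccard_index(a, b):
    """Jaccard index of two gene sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_jaccard(gene_sets, threshold):
    """Group pathways whose gene sets overlap with Jaccard index >= threshold."""
    names = list(gene_sets)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if jaccard_index(gene_sets[p], gene_sets[q]) >= threshold:
                parent[find(p)] = find(q)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Placeholder pathways and genes, for illustration only.
pathways = {
    "pw_a": {"g1", "g2", "g3"},
    "pw_b": {"g2", "g3", "g4"},
    "pw_c": {"g7", "g8"},
    "pw_d": {"g7", "g8", "g9"},
}
programs = group_by_jaccard(pathways, threshold=0.5)
```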

Keywords: cox regression, pancreatic cancer, polygenic risk score, scRNA-seq, UK biobank

Procedia PDF Downloads 101
603 Nazca: A Context-Based Matching Method for Searching Heterogeneous Structures

Authors: Karine B. de Oliveira, Carina F. Dorneles

Abstract:

The structure level matching is the problem of combining elements of a structure, which can be represented as entities, classes, XML elements, web forms, and so on. This is a challenge due to large number of distinct representations of semantically similar structures. This paper describes a structure-based matching method applied to search for different representations in data sources, considering the similarity between elements of two structures and the data source context. Using real data sources, we have conducted an experimental study comparing our approach with our baseline implementation and with another important schema matching approach. We demonstrate that our proposal reaches higher precision than the baseline.

Keywords: context, data source, index, matching, search, similarity, structure

Procedia PDF Downloads 364
602 3D Model Completion Based on Similarity Search with Slim-Tree

Authors: Alexis Aldo Mendoza Villarroel, Ademir Clemente Villena Zevallos, Cristian Jose Lopez Del Alamo

Abstract:

With the advancement of technology, it is now possible to scan entire objects and obtain their digital representation by using point clouds or polygon meshes. However, some objects may be broken or have missing parts; thus, several methods focused on this problem have been proposed based on Geometric Deep Learning, such as GCNN, ACNN, and PointNet, among others. In this article, an approach from a different paradigm is proposed: metric data structures are used to index global descriptors in the spectral domain and allow the retrieval of a set of similar models in polynomial time; the Iterative Closest Point algorithm is then used to recover the parts of the incomplete model using the geometry and topology of the model with the smallest Hausdorff distance.

Keywords: 3D reconstruction method, point cloud completion, shape completion, similarity search

Procedia PDF Downloads 122
601 A Nonlocal Means Algorithm for Poisson Denoising Based on Information Geometry

Authors: Dongxu Chen, Yipeng Li

Abstract:

This paper presents an information geometry Nonlocal Means (NLM) algorithm for Poisson denoising. NLM estimates a noise-free pixel as a weighted average of image pixels, where each pixel is weighted according to the similarity between image patches in Euclidean space. In this work, every pixel is modelled as a Poisson distribution locally estimated by Maximum Likelihood (ML), and all the distributions together form a statistical manifold. The NLM denoising algorithm is carried out on the statistical manifold, where the Fisher information matrix can be used to compute geodesics between distributions, which serve as the similarity between patches. This approach was demonstrated to be competitive with related state-of-the-art methods.

Keywords: image denoising, Poisson noise, information geometry, nonlocal-means

Procedia PDF Downloads 285
600 Phishing Detection: Comparison between Uniform Resource Locator and Content-Based Detection

Authors: Nuur Ezaini Akmar Ismail, Norbazilah Rahim, Norul Huda Md Rasdi, Maslina Daud

Abstract:

Web applications are the most frequent target of attackers because they are accessible to end users. This has become more advantageous to attackers since not all end users are aware of what kind of sensitive data they have already leaked through the Internet, especially via social networks for the sake of ‘sharing’. An attacker can use information such as personal details, favourite artists, favourite actors or actresses, music, politics, and medical records to customize a phishing attack and thus trick the user into clicking on malware-laced attachments. Phishing is one of the most popular social engineering attacks against web applications. There are several methods to detect phishing websites, such as blacklist/whitelist-based, heuristic-based, and visual similarity-based detection. This paper presents a comparison between the heuristic-based technique, which uses features of a uniform resource locator (URL), and visual similarity-based detection techniques, which compare the content of a suspected phishing page with the legitimate one in order to detect new phishing sites, based on papers reviewed from the past few years. The comparison focuses on three indicators: false positives and negatives, accuracy of the method, and time taken to detect a phishing website.

Keywords: heuristic-based technique, phishing detection, social engineering, visual similarity-based technique

Procedia PDF Downloads 177
599 Genetic Characterization of Barley Genotypes via Inter-Simple Sequence Repeat

Authors: Mustafa Yorgancılar, Emine Atalay, Necdet Akgün, Ali Topal

Abstract:

In this study, the polymerase chain reaction-based inter-simple sequence repeat (ISSR) DNA fingerprinting technique was used to investigate the genetic relationships among barley crossbreed genotypes in Turkey. Selection on a genetic basis via ISSR is important in breeding programs in terms of breeding time. The 14 ISSR primers generated a total of 97 bands, of which 81 (83.35%) were polymorphic. The highest total resolution power (RP) values were obtained from the F2 (0.53) and M16 (0.51) primers. According to the ISSR results, the genetic similarity index ranged from 0.64 to 0.95; the Lane 3 and Line 6 genotypes were the closest, while Line 36 was the most distant. The ISSR markers were found to be promising for assessing genetic diversity in barley crossbreed genotypes.

Keywords: barley, crossbreed, genetic similarity, ISSR

Procedia PDF Downloads 347
598 An Integrated Fuzzy Inference System and Technique for Order of Preference by Similarity to Ideal Solution Approach for Evaluation of Lean Healthcare Systems

Authors: Aydin M. Torkabadi, Ehsan Pourjavad

Abstract:

A decade after the introduction of Lean in Saskatchewan’s public healthcare system, its effectiveness remains a controversial subject among health researchers, workers, managers, and politicians. Therefore, developing a framework to quantitatively assess the Lean achievements is significant. This study investigates the success of initiatives across Saskatchewan health regions by recognizing the Lean healthcare criteria, measuring the success levels, comparing the regions, and identifying the areas for improvement. This study proposes an integrated intelligent computing approach by applying a Fuzzy Inference System (FIS) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). FIS is used as an efficient approach to assess the Lean healthcare criteria, and TOPSIS is applied for ranking the values with regard to the level of leanness. Due to the innate uncertainty in decision makers’ judgments on criteria, principles of fuzzy theory are applied. Finally, FIS-TOPSIS was established as an efficient technique for determining the lean merit of healthcare systems.
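
For readers unfamiliar with TOPSIS, the ranking step can be sketched as below. This is the standard crisp TOPSIS procedure (vector normalization, weighting, distances to the ideal and anti-ideal solutions, closeness coefficient); it assumes the criteria scores have already been produced, for example by a fuzzy inference step, and it does not reproduce the authors' FIS stage. The regions, scores and weights are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the TOPSIS closeness coefficient (0..1) for each alternative.

    matrix:  rows are alternatives, columns are criteria scores.
    weights: one weight per criterion.
    benefit: True where larger is better, False where smaller is better.
    """
    columns = list(zip(*matrix))
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    weighted = [
        [w * x / norm for x, norm, w in zip(row, norms, weights)]
        for row in matrix
    ]
    weighted_cols = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(weighted_cols, benefit)]
    anti_ideal = [min(c) if b else max(c) for c, b in zip(weighted_cols, benefit)]

    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti_ideal)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.5)  # all alternatives identical
    return scores


# Hypothetical scores for three regions on three criteria
# (two benefit criteria and one cost criterion), for illustration only.
region_scores = [
    [9, 9, 1],  # region A
    [5, 5, 5],  # region B
    [1, 1, 9],  # region C
]
closeness = topsis(region_scores, weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
ranking = sorted(range(len(closeness)), key=lambda i: closeness[i], reverse=True)
```

Region A is best on every criterion in this toy input, so it coincides with the ideal solution and ranks first.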

Keywords: lean healthcare, intelligent computing, fuzzy inference system, healthcare evaluation, technique for order of preference by similarity to ideal solution, multi-criteria decision making, MCDM

Procedia PDF Downloads 162
597 Semantic-Based Collaborative Filtering to Improve Visitor Cold Start in Recommender Systems

Authors: Baba Mbaye

Abstract:

In collaborative filtering recommendation systems, a user receives suggested items based on the opinions and evaluations of a community of users. This type of recommendation system uses only the information (ratings given as numerical values) contained in a usage matrix as input data. This matrix can be constructed based on users' behaviors or by inviting users to declare their opinions on the items they know. The cold start problem leads to very poor performance for new users. It is a phenomenon that occurs at the beginning of use, when the system lacks data to make recommendations. There are three types of cold start problems: cold start for a new item, a new system, and a new user. In this article, we are interested in the cold start for a new user. When the system welcomes a new user, the profile exists but does not have enough data, and its communities with other users' profiles are still unknown. This leads to recommendations not adapted to the profile of the new user. In this paper, we propose an approach that improves cold start by using the notions of similarity and semantic proximity between users' profiles during cold start. We use the available cold metadata (metadata extracted from the new user's data), which is useful in positioning the new user within a community. The aim is to look for similarities and semantic proximities with the old and current user profiles of the system. Proximity is represented by close concepts considered to belong to the same group, while similarity groups together elements that appear similar. Similarity and proximity are two close but not similar concepts. This leads us to a construction of similarity that is based on: a) the concepts (properties, terms, instances) independent of ontology structure and, b) the simultaneous representation of the two concepts (relations, presence of terms in a document, simultaneous presence of the authorities).
We propose an ontology, OIVCSRS (Ontology of Improvement Visitor Cold Start in Recommender Systems), in order to structure the terms and concepts representing the meaning of an information field, whether by the metadata of a namespace, or the elements of a knowledge domain. This approach allows us to automatically attach the new user to a user community, partially compensate for the data that was not initially provided and ultimately to associate a better first profile with the cold start. Thus, the aim of this paper is to propose an approach to improving cold start using semantic technologies.

Keywords: visitor cold start, recommender systems, collaborative filtering, semantic filtering

Procedia PDF Downloads 218
596 Semantic Search Engine Based on Query Expansion with Google Ranking and Similarity Measures

Authors: Ahmad Shahin, Fadi Chakik, Walid Moudani

Abstract:

Our study elaborates a potential solution for a search engine that uses semantic technology to retrieve information and display it meaningfully. Semantic search engines are not widely used on the web, as the majority are still in beta stage or under construction. Many problems face current applications in semantic search; the major one is to analyze and calculate the meaning of a query in order to retrieve relevant information. Another problem is the ontology-based index and its updates. Ranking results according to concept meaning and its relation to the query is another challenge. In this paper, we offer a light meta-engine (QESM) which uses Google search, and therefore Google’s index, with some adaptations to its returned results by adding multi-query expansion. The goal was to find a reliable ranking algorithm that involves semantics and uses concepts and meanings to rank results. First, the engine finds synonyms of each query term entered by the user based on a lexical database. Then, query expansion is applied to generate different semantically analogous sentences. These are generated randomly by combining the found synonyms and the original query terms. Our model suggests the use of semantic similarity measures between two sentences. In practice, we used this method to calculate the semantic similarity between each query and the description of each page’s content generated by Google. The generated sentences are sent to the Google engine one by one, and the results are ranked again all together with the adapted ranking method (QESM). Finally, our system places the Google pages with higher similarities at the top of the results. We conducted experiments with 6 different queries. We observed that most results ranked with QESM were reordered relative to Google’s originally generated pages. With the queries in our experiments, QESM frequently achieves better accuracy than Google. In the worst cases, it behaves like Google.

Keywords: semantic search engine, Google indexing, query expansion, similarity measures

Procedia PDF Downloads 425
595 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources for understanding broadcasting contents. Since broadcast media wield considerable influence over the public, tools for understanding broadcasting contents are increasingly needed. Topic modeling is a method for obtaining a summary of broadcasting contents from their scripts. Generally, scripts represent contents descriptively with directions and speeches. Scripts also provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. However, because scripts consist mainly of speeches, relatively few word co-occurrences are observed within scene segments. This inevitably degrades the quality of topics learned by statistical methods. To tackle this problem, we propose a method of learning with additional word co-occurrence information obtained using scene similarities. The main idea is that the information that two or more texts are topically related can be useful for learning high-quality topics. In addition, by using high-quality topics, we can determine more accurately whether two texts are related. In this paper, we regard two scene segments as related if their topical similarity is high enough. We also consider words to co-occur if they appear together in topically related scene segments. In the experiments, we show that the proposed method generates higher-quality topics from Korean drama scripts than the baselines.

Keywords: broadcasting contents, scripts, text similarity, topic model

Procedia PDF Downloads 318
594 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Authors: Loai AbdAllah, Mahmoud Kaiyal

Abstract:

Missing values in real-world datasets are a common problem. Many algorithms have been developed to deal with this problem; most of them replace the missing values with a fixed value computed from the observed values. In our work, we used a distance function based on the Bhattacharyya distance, which measures the similarity of two probability distributions, to measure the distance between objects with missing values. The proposed distance distinguishes between known and unknown values: the distance between two known values is the Mahalanobis distance. When, on the other hand, one of them is missing, the distance is computed based on the distribution of the known values for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps to improve the prevention of chronic diseases such as diabetes and cancer. For Wikaya’s recommendation system to work, the distance between users needs to be measured. Since there are missing values in the collected data, there is a need to develop a distance function that measures distances between incomplete user profiles. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between different objects when some of them contain missing values, we integrated it within the framework of the k nearest neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that the kNN classifier using our proposed distance function outperforms kNN using other existing methods.
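For reference, the standard Bhattacharyya distance between two discrete probability distributions can be sketched as below. This shows only the textbook building block; it does not reproduce the combined known/missing-value distance proposed in the abstract, whose details are not given here. The function name and the list-based input format are illustrative assumptions.

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions p and q."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Bhattacharyya coefficient: overlap between the two distributions.
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0.0:
        return math.inf  # no overlap at all
    return -math.log(coefficient)
```

Identical distributions give a distance of zero, and the distance grows as the overlap between the distributions shrinks.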

Keywords: missing values, incomplete data, distance, incomplete diabetes data

Procedia PDF Downloads 225
593 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Developing methods to annotate unknown gene functions is an important task in bioinformatics. One approach to annotation is the identification of the metabolic pathway that genes are involved in. Gene expression data have been utilized for this identification, since they reflect various intracellular phenomena. However, it has been difficult to estimate gene function with high accuracy. The low accuracy of the estimation is considered to be caused by the difficulty of accurately measuring gene expression: even when measured under the same conditions, gene expression levels usually vary. In this study, we proposed a feature extraction method focusing on the variability of gene expression to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene's expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained a multiclass SVM for identifying the genes' metabolic pathway. To evaluate our method, we applied it to budding yeast and trained the multiclass SVM to identify seven metabolic pathways. As a result, the accuracy obtained with our method was higher than that obtained from the raw gene expression data. Thus, our method combined with KL divergence is useful for identifying the genes' metabolic pathway.
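The first two steps (estimating a distribution per gene from replicates, then comparing gene pairs by KL divergence) might look like the sketch below. It assumes each gene's expression is modeled as a univariate Gaussian, which the abstract does not state; the helper names are hypothetical. Note that KL divergence is asymmetric, and lower values indicate more similar distributions.

```python
import math
from statistics import mean, stdev

def fit_gaussian(replicates):
    """Estimate mean and sample standard deviation from replicate measurements."""
    return mean(replicates), stdev(replicates)

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P and Q (sigmas must be positive)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)
```

Under this assumption, a gene's feature vector would be its list of divergences to every other gene, for example `kl_gaussian(*fit_gaussian(gene_a), *fit_gaussian(gene_b))` for each pair.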

Keywords: metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning

Procedia PDF Downloads 403
592 Recruitment Model (FSRM) for Faculty Selection Based on Fuzzy Soft

Authors: G. S. Thakur

Abstract:

This paper presents a Fuzzy Soft Recruitment Model (FSRM) for faculty selection in MHRD technical institutions. The selection criteria are based on a 4-tier flexible structure in the institutions. The Advisory Committee on Faculty Recruitment (ACoFAR) suggested nine criteria for faculty in the proposed FSRM. The fuzzy soft model is proposed in consultation with ACoFAR, based on the selection criteria. Fuzzy soft distance similarity measures are applied to find the best faculty from the applicant pool.

Keywords: fuzzy soft set, fuzzy sets, fuzzy soft distance, fuzzy soft similarity measures, ACoFAR

Procedia PDF Downloads 347
591 Decoding Gender Disparities in AI: An Experimental Exploration Within the Realm of AI and Trust Building

Authors: Alexander Scott English, Yilin Ma, Xiaoying Liu

Abstract:

The widespread use of artificial intelligence in everyday life has triggered a fervent discussion covering a wide range of areas. However, to date, research on the influence of gender in various segments and factors from a social science perspective is still limited. This study aims to explore whether there are gender differences in human trust in AI for its application in basic everyday life, and whether trust correlates with perceived similarity, perceived emotions (including competence and warmth), and attractiveness. We conducted a study involving 321 participants using a 2 (masculinized vs. feminized voice of the AI) × 2 (pitch level of the AI's voice) between-subjects experimental design. Four contexts were created for the study and randomly assigned. The results showed significant gender differences in perceived similarity, trust, and perceived emotion of the AIs, with females rating them significantly higher than males. Trust was higher in relation to AIs presenting the same gender (e.g., human female to female AI, human male to male AI). Mediation modeling tests indicated that emotion perception and similarity played a sufficient mediating role in trust. Notably, although trust in AIs was strongly correlated with human gender, there was no significant effect of the gender of the AI. In addition, the study discusses the effects of subjects' age, job search experience, and job type on the findings.

Keywords: artificial intelligence, gender differences, human-robot trust, mediation modeling

Procedia PDF Downloads 45
590 Plagiarism Detection for Flowchart and Figures in Texts

Authors: Ahmadu Maidorawa, Idrissa Djibo, Muhammad Tella

Abstract:

This paper presents a method for detecting flowchart and figure plagiarism based on shape-based image processing and multimedia retrieval. The method managed to retrieve flowcharts with ranked similarity according to different matching sets. Plagiarism is a well-known phenomenon in the academic arena. Copying other people's work is considered a serious offense that needs to be checked. Many plagiarism detection systems, such as Turnitin, have been developed to provide these checks. Most, if not all, discard the figures and charts before checking for plagiarism. Discarding the figures and charts results in loopholes that people can take advantage of: figures and charts can easily be plagiarized without current plagiarism systems detecting it. Very few papers discuss flowchart plagiarism detection. Therefore, there is a need to develop a system that will detect plagiarism in figures and charts.

Keywords: flowchart, multimedia retrieval, figures similarity, image comparison, figure retrieval

Procedia PDF Downloads 464
589 U-Net Based Multi-Output Network for Lung Disease Segmentation and Classification Using Chest X-Ray Dataset

Authors: Jaiden X. Schraut

Abstract:

Medical image segmentation of chest X-rays is used to identify and differentiate lung cancer, pneumonia, COVID-19, and similar respiratory diseases. Widespread application of computer-supported perception methods in the diagnostic pipeline has been demonstrated to increase prognostic accuracy and aid doctors in efficiently treating patients. Modern models attempt the tasks of segmentation and classification separately and improve diagnostic efficiency; however, to further enhance this process, this paper proposes a multi-output network that follows a U-Net architecture for image segmentation output and features an additional CNN module for auxiliary classification output. The proposed model achieves a final Jaccard index of 0.9634 for image segmentation and a final accuracy of 0.9600 for classification on the COVID-19 radiography database.
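The Jaccard index reported for segmentation is the intersection over union of the predicted and ground-truth masks. A minimal sketch, assuming masks are given as nested lists of 0/1 values of equal shape (the function name and input format are illustrative, not taken from the paper):

```python
def jaccard_index(pred_mask, true_mask):
    """Jaccard index (intersection over union) of two binary masks."""
    intersection = union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention here.
    return intersection / union if union else 1.0
```

A value of 1.0 means the predicted region matches the ground truth exactly, and 0.0 means the two regions do not overlap.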

Keywords: chest X-ray, deep learning, image segmentation, image classification

Procedia PDF Downloads 144
588 BiFormerDTA: Structural Embedding of Protein in Drug Target Affinity Prediction Using BiFormer

Authors: Leila Baghaarabani, Parvin Razzaghi, Mennatolla Magdy Mostafa, Ahmad Albaqsami, Al Warith Al Rushaidi, Masoud Al Rawahi

Abstract:

Predicting the interaction between drugs and their molecular targets is pivotal for advancing drug development processes. Due to time and cost limitations, computational approaches have emerged as an effective way to predict drug-target interaction (DTI). Most of the computational approaches introduced so far use the drug molecule and protein sequence as input. This study not only utilizes these inputs but also introduces a protein representation developed using a masked protein language model. In this representation, for every individual amino acid residue within the protein sequence, there exists a corresponding probability distribution that indicates the likelihood of each amino acid being present at that particular position. Then, the similarity between each pair of amino acids is computed to create a similarity matrix. To encode the knowledge of the similarity matrix, Bi-Level Routing Attention (BiFormer) is utilized, which combines aspects of transformer-based models with protein sequence analysis and represents a significant advancement in the field of drug-protein interaction prediction. BiFormer has the ability to pinpoint the most effective regions of the protein sequence that are responsible for facilitating interactions between the protein and drugs, thereby enhancing the understanding of these critical interactions. Thus, it appears promising in its ability to capture the local structural relationships of proteins and how they contribute to drug-protein interactions, thereby facilitating more accurate predictions. To evaluate the proposed method, it was tested on two widely recognized datasets: Davis and KIBA. A comprehensive series of experiments was conducted to illustrate its effectiveness in comparison to cutting-edge techniques.

Keywords: BiFormer, transformer, protein language processing, self-attention mechanism, binding affinity, drug target interaction, similarity matrix, protein masked representation, protein language model

Procedia PDF Downloads 7
587 Web Proxy Detection via Bipartite Graphs and One-Mode Projections

Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo

Abstract:

With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonating legitimate pages for phishing to steal sensitive data, or redirecting victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, detecting proxy services has become increasingly challenging due to their dynamic updates and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of the bipartite graphs to discover the social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent behavior clusters of web proxy users. The web proxy URL may vary from time to time, but the users' inherent interest does not. Based on this intuition, and using our private tools implemented with WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experimental results on real datasets show that the behavior clusters not only reduce the number of URLs to analyze but also provide an effective way to detect web proxies, especially unknown ones.
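A one-mode projection of a host-to-site bipartite graph can be sketched as below. The abstract does not say how projected edges are weighted; Jaccard similarity of the visited-site sets is used here purely as an illustrative assumption, and the input format (a dict mapping each host to a set of sites) is hypothetical.

```python
from itertools import combinations

def one_mode_projection(host_to_sites):
    """Project a host-site bipartite graph onto hosts.

    Two hosts are linked when they share at least one site; the edge
    weight is the Jaccard similarity of their site sets.
    """
    edges = {}
    for a, b in combinations(sorted(host_to_sites), 2):
        shared = host_to_sites[a] & host_to_sites[b]
        if shared:
            union = host_to_sites[a] | host_to_sites[b]
            edges[(a, b)] = len(shared) / len(union)
    return edges
```

The resulting weighted edges form a host-by-host similarity matrix, which is the kind of input a spectral clustering step would consume.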

Keywords: bipartite graph, one-mode projection, clustering, web proxy detection

Procedia PDF Downloads 245
586 Comparison of Crossover Types to Obtain Optimal Queries Using Adaptive Genetic Algorithm

Authors: Wafa’ Alma'Aitah, Khaled Almakadmeh

Abstract:

This study presents an information retrieval system that uses a genetic algorithm to increase information retrieval efficiency. In the vector space model, information retrieval is based on the similarity measurement between the query and the documents. Documents with high similarity to the query are judged more relevant to the query and should be retrieved first. Using genetic algorithms, each query is represented by a chromosome; these chromosomes are fed into the genetic operator process (selection, crossover, and mutation) until an optimized query chromosome is obtained for document retrieval. Results show that information retrieval with adaptive crossover probability, single-point crossover, and roulette wheel selection gives the highest recall. The proposed approach is verified using 242 proceedings abstracts collected from the Saudi Arabian national conference.
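Two of the ingredients named in this abstract can be illustrated with a short sketch: cosine similarity, a common choice for query-document similarity in the vector space model (the abstract does not name the exact measure), and single-point crossover on query chromosomes. Representing a chromosome as a list of term weights is an assumption made for illustration, and the adaptive crossover probability and roulette wheel selection are not shown.

```python
import math
import random

def cosine_similarity(query, document):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(q * d for q, d in zip(query, document))
    norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(d * d for d in document))
    return dot / norm if norm else 0.0

def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two chromosomes at one random cut point."""
    # Assumes both parents have the same length, at least 2.
    point = rng.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])
```

In a genetic search loop, the fitness of each query chromosome could be derived from how well its `cosine_similarity` scores rank the known relevant documents.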

Keywords: genetic algorithm, information retrieval, optimal queries, crossover

Procedia PDF Downloads 292