Search results for: lexical similarity
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 882

762 Case-Based Reasoning Approach for Process Planning of Internal Thread Cold Extrusion

Authors: D. Zhang, H. Y. Du, G. W. Li, J. Zeng, D. W. Zuo, Y. P. You

Abstract:

For the difficult issues of process selection, case-based reasoning technology is applied to computer aided process planning system for cold form tapping of internal threads on the basis of similarity in the process. A model is established based on the analysis of process planning. Case representation and similarity computing method are given. Confidence degree is used to evaluate the case. Rule-based reuse strategy is presented. The scheme is illustrated and verified by practical application. The case shows the design results with the proposed method are effective.

Keywords: case-based reasoning, internal thread, cold extrusion, process planning

Procedia PDF Downloads 479
761 Algorithms for Fast Computation of Pan Matrix Profiles of Time Series Under Unnormalized Euclidean Distances

Authors: Jing Zhang, Daniel Nikovski

Abstract:

We propose an approximation algorithm called LINKUMP to compute the Pan Matrix Profile (PMP) under the unnormalized l∞ distance (useful for value-based similarity search) using a double-ended queue and linear interpolation. The algorithm has time and space complexities comparable to those of the state-of-the-art algorithm for typical PMP computation under the normalized l₂ distance (useful for shape-based similarity search). We validate its efficiency and effectiveness through extensive numerical experiments and a real-world anomaly detection application.
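The abstract does not spell out how the double-ended queue is used, so the following is only a minimal sketch of one standard building block, not LINKUMP itself: the l∞ (Chebyshev) distance between two aligned length-m windows is the maximum absolute pointwise difference, and a monotonic deque yields that maximum for every window position in linear time. The function name `sliding_chebyshev` and its parameters are assumptions made for illustration.

```python
from collections import deque


def sliding_chebyshev(a, b, m):
    """l-infinity distance between a[i:i+m] and b[i:i+m] for every valid i.

    Illustrative only: a sliding-window maximum over |a[i] - b[i]|,
    computed with a monotonic deque in O(n) total time.
    """
    n = min(len(a), len(b))
    diffs = [abs(a[i] - b[i]) for i in range(n)]
    window = deque()  # indices whose diffs are strictly decreasing
    distances = []
    for i, d in enumerate(diffs):
        # Drop candidates that can never be the maximum again.
        while window and diffs[window[-1]] <= d:
            window.pop()
        window.append(i)
        # Drop the front index once it falls out of the current window.
        if window[0] <= i - m:
            window.popleft()
        if i >= m - 1:
            distances.append(diffs[window[0]])
    return distances
```

Each index enters and leaves the deque at most once, which is what keeps the per-window cost constant on average.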

Keywords: pan matrix profile, unnormalized euclidean distance, double-ended queue, discord discovery, anomaly detection

Procedia PDF Downloads 213
760 Flow Behavior and Performances of Centrifugal Compressor Stage Vaneless Diffusers

Authors: Y. Galerkin, O. Solovieva

Abstract:

Flow parameters are calculated in vaneless diffusers with relative widths of 0.014–0.10, constant along the radius. Inlet flow angles and similarity criteria were varied. Information about the flow structure is presented: meridional streamline configuration, full flow development, and flow separation. Polytropic efficiency and loss and recovery coefficients are used to compare the diffusers’ effectiveness. An example of narrow diffuser optimization by the application of conical walls is presented. Three tapered variants of a wide diffuser are also compared. The work was carried out in the R&D laboratory “Gas dynamics of turbo machines” of the TU SPb.

Keywords: vaneless diffuser, relative width, flow angle, flow separation, loss coefficient, similarity criteria

Procedia PDF Downloads 459
759 Historical Development of Negative Emotive Intensifiers in Hungarian

Authors: Martina Katalin Szabó, Bernadett Lipóczi, Csenge Guba, István Uveges

Abstract:

In this study, an exhaustive analysis of the historical development of negative emotive intensifiers in the Hungarian language was carried out via NLP methods. Intensifiers are linguistic elements which modify or reinforce a variable character in the lexical unit they apply to. Therefore, intensifiers appear with other lexical items, such as adverbs, adjectives, verbs, and, infrequently, nouns. Due to the complexity of this phenomenon (a set of sociolinguistic, semantic, and historical aspects), there are many lexical items which can operate as intensifiers. The group of intensifiers is admittedly one of the most rapidly changing elements in the language. From a linguistic point of view, a special group of intensifiers is particularly interesting: the so-called negative emotive intensifiers, which, on their own, without context, have semantic content that can be associated with negative emotion, but which in particular cases may function as intensifiers (e.g., borzasztóan jó ‘awfully good’, which means ‘excellent’). Despite their special semantic features, negative emotive intensifiers have scarcely been examined in the literature on the basis of large historical corpora via NLP methods. In order to become better acquainted with trends over time concerning the intensifiers, we exhaustively analysed a specific historical corpus, namely the Magyar Történeti Szövegtár (Hungarian Historical Corpus). This corpus (containing 3 million text words) is a collection of texts of various genres and styles, produced between 1772 and 2010. Since the corpus consists of raw texts and does not contain any additional information about the language features of the data (such as stemming or morphological analysis), a large amount of manual work was required to process the data. Thus, based on a lexicon of negative emotive intensifiers compiled in a previous phase of the research, every occurrence of each intensifier was queried, and the results were stored in a separate data frame.
Then, basic linguistic processing (POS-tagging, lemmatization, etc.) was carried out automatically with the ‘magyarlanc’ NLP toolkit. Finally, the frequency and collocation features of all the negative emotive words were automatically analyzed in the corpus. The outcomes of the research revealed in detail how these words have undergone grammaticalization over time, i.e., they change from lexical elements to grammatical ones, and they slowly go through a delexicalization process (their negative content diminishes over time). What is more, it was also shown which negative emotive intensifiers are at the same stage of this process in the same time period. Taking a closer look at the different domains of the analysed corpus, it also became clear that during this process the importance of the pragmatic role increases: the newer use expresses the speaker's subjective, evaluative opinion at a certain level.

Keywords: historical corpus analysis, historical linguistics, negative emotive intensifiers, semantic changes over time

Procedia PDF Downloads 196
758 Isolation and Identification of Diacylglycerol Acyltransferase Type-2 (DGAT2) Genes from Three Egyptian Olive Cultivars

Authors: Yahia I. Mohamed, Ahmed I. Marzouk, Mohamed A. Yacout

Abstract:

The aim of this work was to study the genetic basis of oil accumulation in olive fruit by tracking the DGAT2 (diacylglycerol acyltransferase type-2) gene in three olive cultivars of Egyptian origin, namely Toffahi, Hamed and Maraki, using molecular marker techniques and bioinformatics tools. The results show that, first, a specific genomic band of the Maraki cultivar was identified as DGAT2 and was identical to this gene in Olea europaea, with 100% similarity. Second, a differential genomic band of the Maraki cultivar, produced by the RAPD fingerprinting technique, yielded a distinct predicted sequence identified as DGAT2 in Fragaria vesca subsp. vesca, with 76% sequence similarity. Third, a specific genomic band of the Hamed cultivar was identified as two fragments: (1) Olea europaea cultivar Koroneiki diacylglycerol acyltransferase type 2 mRNA, complete cds, with two matching regions at 99%; and (2) PREDICTED: Fragaria vesca subsp. vesca diacylglycerol O-acyltransferase 2-like (LOC101313050), mRNA, with 86% similarity.

Keywords: Olea europaea, fingerprinting, diacylglycerol acyltransferase type-2 (DGAT2), Egypt

Procedia PDF Downloads 470
757 Bird Diversity along Boat Touring Routes in Tha Ka Sub-District, Amphawa District, Samut Songkram Province, Thailand

Authors: N. Charoenpokaraj, P. Chitman

Abstract:

This research aims to study the species, abundance, and status of birds, as well as the similarities and activity characteristics of birds that benefit from the research area, along boat touring routes in Tha Ka sub-district, Amphawa District, Samut Songkram Province, Thailand, from October 2012 to September 2013. The data were analyzed to find the abundance and similarity index of the birds. The survey of all three routes found 33 families and 63 species. Route 3 (traditional coconut sugar making kiln – resort) had the most species, with 56. There were 18 species of commonly found birds with an abundance level of 5, which amounts to 28.57% of all bird species. In August, 46 species were found, the greatest number of bird species benefiting from this route. As for the status of the birds, there are 51 resident species, 7 resident and migratory species, and 5 migratory species. The similarity index between Route 2 and Route 3 is 0.881. The birds are classified by their activity characteristics, i.e., insectivore, piscivore, granivore, nectarivore, and aquatic invertebrate feeder birds. Some birds also use the area for nesting.
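The abstract does not say which similarity index was used. As a hedged illustration only, the sketch below computes the Sørensen index, a common choice for comparing species lists between two sites, defined as twice the number of shared species divided by the sum of the two species counts. The function name and the placeholder species labels are assumptions, not data from the study.

```python
def sorensen_index(species_a, species_b):
    """Sørensen similarity between two species lists: 2c / (a + b).

    a and b are the numbers of species on each route and c is the number
    of species found on both. Assumed here for illustration; the source
    does not state which index it used.
    """
    a, b = set(species_a), set(species_b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Placeholder labels, not survey data.
route_x = {"species_1", "species_2", "species_3"}
route_y = {"species_2", "species_3", "species_4"}
similarity = sorensen_index(route_x, route_y)
```

The index ranges from 0 (no shared species) to 1 (identical species lists).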

Keywords: bird diversity, boat touring routes, Samut Songkram, similarity index

Procedia PDF Downloads 307
756 The Processing of Context-Dependent and Context-Independent Scalar Implicatures

Authors: Liu Jia’nan

Abstract:

The default accounts hold that there exists a kind of scalar implicature which can be processed without context and which enjoys a psychological privilege over other scalar implicatures that depend on context. In contrast, Relevance Theory regards context as a must because all scalar implicatures have to meet the need for relevance in discourse. However, the experimental results in Katsos showed that although adults quantitatively rejected under-informative utterances with lexical scales (context-independent) and with ad hoc scales (context-dependent) at almost the same rate, they still regarded violations in utterances with lexical scales as much more severe than those with ad hoc scales. Neither the default account nor Relevance Theory can fully explain this result. Thus, this result raises two questions: (1) Is it possible that the strange discrepancy is due to factors other than the generation of scalar implicature? (2) Are the ad hoc scales truly formed under the possible influence of mental context? Do the participants generate scalar implicatures with ad hoc scales rather than just comparing semantic differences among target objects in the under-informative utterance? In our Experiment 1, question (1) will be answered by replicating Experiment 1 of Katsos. Test materials will be shown in PowerPoint in the form of pictures, and each procedure will be carried out under the guidance of a tester in a quiet room. Our Experiment 2 is intended to answer question (2). The picture-based test material will be transformed into literal words in DMDX, and the target sentence will be shown word by word to participants in the soundproof room in our lab. The reading time of the target parts, i.e., words containing scalar implicatures, will be recorded.
We presume that in the lexical-scale group, a pragmatically standardized mental context would help generate the scalar implicature as soon as the scalar word occurs, leading participants to expect the upcoming words to be informative. Thus, if the new input after the scalar word is under-informative, more time will be needed for the extra semantic processing. In the ad hoc scale group, however, a scalar implicature can hardly be generated without the support of a fixed mental context for the scale. Thus, whether the new input is informative or not should not matter, and the reading time of the target parts will be the same in informative and under-informative utterances. The human mind may be a dynamic system in which many factors co-occur. If Katsos’ experimental result is reliable, will it shed light on the interplay of default accounts and contextual factors in scalar implicature processing? Based on our experiments, we might be able to assume that a single dominant processing paradigm is not plausible. Furthermore, in the processing of scalar implicature, the semantic and pragmatic interpretations may interact dynamically in the mind. For lexical scales, the pragmatic reading may prevail over the semantic reading because of its greater exposure in daily language use, which may also lead the default or standardized paradigm to override the role of context. However, the objects in an ad hoc scale are not usually treated as members of a scale in the mental context, and thus the lexical-semantic association of the objects may prevent their pragmatic reading from generating a scalar implicature. Only when sufficient contextual factors are highlighted can the pragmatic reading gain privilege and generate a scalar implicature.

Keywords: scalar implicature, ad hoc scale, dynamic interplay, default account, Mandarin Chinese processing

Procedia PDF Downloads 290
755 Correlation between Funding and Publications: A Pre-Step towards Future Research Prediction

Authors: Ning Kang, Marius Doornenbal

Abstract:

Funding is a very important – if not crucial – resource for research projects. Usually, funding organizations will publish a description of the funded research to describe the scope of the funding award. Logically, we would expect research outcomes to align with this funding award. For that reason, we might be able to predict future research topics based on present funding award data. That said, it remains to be shown if and how future research topics can be predicted by using the funding information. In this paper, we extract funding project information and the abstracts of the resulting papers from the Gateway to Research database as one group, and use papers from the same domains and publication years in the Scopus database as a baseline comparison group. We annotate both the project awards and the papers resulting from the funded projects with linguistic features (noun phrases), and then calculate tf-idf and the cosine similarity between these two sets of features. We show that the cosine similarity for the project-generated papers group is higher than for the project-baseline group, and also that these two groups of similarities are significantly different. Based on this result, we conclude that the funding information actually correlates with the content of future research output for the funded project on the topical level. How funding really changes the course of science or of scientific careers remains an elusive question.
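The tf-idf and cosine similarity step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: documents are taken to be already reduced to lists of noun-phrase tokens, the idf uses one common smoothed variant, and the function names and toy tokens are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document.

    docs is a list of token lists (here, noun phrases). The idf is the
    smoothed form ln((1 + n) / (1 + df)) + 1, assumed for illustration.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy noun phrases, not data from the paper.
award = ["gene expression", "tumour growth", "mouse model"]
funded_paper = ["gene expression", "tumour growth", "cell line"]
baseline_paper = ["graph theory", "random walk", "spectral gap"]
vec_award, vec_funded, vec_baseline = tfidf_vectors([award, funded_paper, baseline_paper])
```

In this toy setup `cosine(vec_award, vec_funded)` is positive because two noun phrases are shared, while `cosine(vec_award, vec_baseline)` is zero because none are.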

Keywords: natural language processing, noun phrase, tf-idf, cosine similarity

Procedia PDF Downloads 217
754 The Perception and Integration of Lexical Tone and Vowel in Mandarin-speaking Children with Autism: An Event-Related Potential Study

Authors: Rui Wang, Luodi Yu, Dan Huang, Hsuan-Chih Chen, Yang Zhang, Suiping Wang

Abstract:

Enhanced discrimination of pure tones but diminished discrimination of speech pitch (i.e., lexical tone) were found in children with autism who speak a tonal language (Mandarin), suggesting a speech-specific impairment of pitch perception in these children. However, in tonal languages, both lexical tone and vowel are phonemic cues and integrally dependent on each other. Therefore, it is unclear whether the presence of phonemic vowel dimension contributes to the observed lexical tone deficits in Mandarin-speaking children with autism. The current study employed a multi-feature oddball paradigm to examine how vowel and tone dimensions contribute to the neural responses for syllable change detection and involuntary attentional orienting in school-age Mandarin-speaking children with autism. In the oddball sequence, syllable /da1/ served as the standard stimulus. There were three deviant stimulus conditions, representing tone-only change (TO, /da4/), vowel-only change (VO, /du1/), and change of tone and vowel simultaneously (TV, /du4/). EEG data were collected from 25 children with autism and 20 age-matched normal controls during passive listening to the stimulation. For each deviant condition, difference waveform measuring mismatch negativity (MMN) was derived from subtracting the ERP waveform to the standard sound from that to the deviant sound for each participant. Additionally, the linear summation of TO and VO difference waveforms was compared to the TV difference waveform, to examine whether neural sensitivity for TV change detection reflects simple summation or nonlinear integration of the two individual dimensions. The MMN results showed that the autism group had smaller amplitude compared with the control group in the TO and VO conditions, suggesting impaired discriminative sensitivity for both dimensions. 
In the control group, amplitude of the TV difference waveform approximated the linear summation of the TO and VO waveforms only in the early time window but not in the late window, suggesting a time course from dimensional summation to nonlinear integration. In the autism group, however, the nonlinear TV integration was already present in the early window. These findings suggest that speech perception atypicality in children with autism rests not only in the processing of single phonemic dimensions, but also in the dimensional integration process.
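The difference-waveform and additivity comparison described above can be sketched in a few lines. This is an illustrative simplification with assumed names: each ERP is a plain list of amplitudes sampled at the same time points, and the time windows are given as sample indices rather than milliseconds.

```python
def difference_wave(deviant_erp, standard_erp):
    """Deviant-minus-standard difference waveform, sample by sample."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def mean_amplitude(wave, start, stop):
    """Mean amplitude over the sample window [start, stop)."""
    segment = wave[start:stop]
    return sum(segment) / len(segment)


def additivity_gap(to_diff, vo_diff, tv_diff, start, stop):
    """Mean of (TO + VO) minus mean of TV in a window.

    A value near zero is consistent with linear summation of the two
    dimensions; a clear departure from zero suggests nonlinear integration.
    """
    summed = [t + v for t, v in zip(to_diff, vo_diff)]
    return mean_amplitude(summed, start, stop) - mean_amplitude(tv_diff, start, stop)
```

In practice the gap would be computed per participant for the early and late windows and then tested statistically; the sketch only shows the arithmetic.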

Keywords: autism, event-related potentials, mismatch negativity, speech perception

Procedia PDF Downloads 172
753 An Optimization Algorithm Based on Dynamic Schema with Dissimilarities and Similarities of Chromosomes

Authors: Radhwan Yousif Sedik Al-Jawadi

Abstract:

Optimization is necessary for finding appropriate solutions to a range of real-life problems. In particular, genetic (or more generally, evolutionary) algorithms have proved very useful in solving many problems for which analytical solutions are not available. In this paper, we present an optimization algorithm called Dynamic Schema with Dissimilarity and Similarity of Chromosomes (DSDSC) which is a variant of the classical genetic algorithm. This approach constructs new chromosomes from a schema and pairs of existing ones by exploring their dissimilarities and similarities. To show the effectiveness of the algorithm, it is tested and compared with the classical GA, on 15 two-dimensional optimization problems taken from literature. We have found that, in most cases, our method is better than the classical genetic algorithm.

Keywords: chromosome injection, dynamic schema, genetic algorithm, similarity and dissimilarity

Procedia PDF Downloads 314
752 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

Word discovery is a key problem in text information retrieval technology. Methods for new word discovery tend to be closely tied to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How to detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by means of semantic replacement between sentences. The experiment verifies that the framework not only discovers network words rapidly but also recovers the standard word meaning of the discovered network words, which reflects the effectiveness of our work.

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 61
751 Reduplication In Urdu-Hindi Nonsensical Words: An OT Analysis

Authors: Riaz Ahmed Mangrio

Abstract:

Reduplication in Urdu-Hindi affects all major word categories, particles, and even nonsensical words. It conveys a variety of meanings, including distribution, emphasis, iteration, adjectival and adverbial. This study will primarily discuss reduplicative structures of nonsensical words in Urdu-Hindi and then briefly look at some examples from other Indo-Aryan languages to introduce the debate regarding the same structures in them. The goal of this study is to present counter-evidence against Keane (2005: 241), who claims “the base in the cases of lexical and phrasal echo reduplication is always independently meaningful”. However, Urdu-Hindi reduplication derives meaningful compounds from nonsensical words, e.g., gũ mgũ (A) ‘silent and confused’ and d̪əb d̪əb-a (N) ‘one’s fear over others’. This needs a comprehensive examination to see whether and how the various structures form patterns of a base-reduplicant relationship or, rather, they are merely sublexical items joining together to form a word pattern of any grammatical category in content words. Another interesting theoretical question arises within the Optimality framework: in an OT analysis, is it necessary to identify one of the two constituents as the base and the other as reduplicant? Or is it best to consider this a pattern, but then how does this fit in with an OT analysis? This may be an even more interesting theoretical question. Looking for the solution to such questions can serve to make an important contribution. In the case at hand, each of the two constituents is an independent nonsensical word, but their echo reduplication is nonetheless meaningful. This casts significant doubt upon Keane’s (2005: 241) observation of some examples from Hindi and Tamil reduplication that “the base in cases of lexical and phrasal echo reduplication is always independently meaningful”. The debate on the point becomes further interesting when the triplication of nonsensical words in Urdu-Hindi, e.g., aẽ baẽ ʃaẽ (N) ‘useless talk’, is also seen, which is equally important to discuss. The example is challenging to Harrison’s (1973) claim that only the monosyllabic verbs in their progressive forms reduplicate twice to result in triplication, which is not the case with the example presented. The study will consist of a thorough descriptive analysis of the data for the purpose of documentation, and then there will be OT analysis.

Keywords: reduplication, urdu-hindi, nonsensical, optimality theory

Procedia PDF Downloads 40
750 Investigating Translations of Websites of Pakistani Public Offices

Authors: Sufia Maroof

Abstract:

This empirical study investigated the web translations of five Pakistani public offices (FPSC, FIA, HEC, USB, and Ministry of Finance) that offer an Urdu tab as an option for accessing information on their official websites. A triangulation of quantitative and qualitative research designs informed the researcher of the semantic, lexical and syntactic caveats in these translations. The study hypothesized that the majority of the Pakistani population is unaware of the Supreme Court’s amendments to the language policy concerning the national and official language; hence, the Urdu web translations of the public departments have not been accessed effectively. First, the researcher conducted an online survey comprising two sections, with closed-ended and short-answer questions. Second, the researcher compiled a corpus of the five selected websites in tabular form to compare the data. Third, the administrators of the departments were contacted regarding the methods of translation and the expertise of the personnel involved. The corpus was assessed for TQA after examining the lexical, semantic, syntactic and technical alignment inaccuracies and imperfections. The study suggests that the public offices invest in their Urdu websites by either hiring expert translators or engaging a translation agency for this project in order to offer quality translations to the public.

Keywords: machine translations, public offices, Urdu translations, websites

Procedia PDF Downloads 98
749 Contentious Issues Concerning the Methodology of Using the Lexical Approach in Teaching ESP

Authors: Elena Krutskikh, Elena Khvatova

Abstract:

In tertiary settings expanding students’ vocabulary and teaching discursive competence is seen as one of the chief goals of a professional development course. However, such a focus often is detrimental to students’ cognitive competences, such as analysis, synthesis, and creative processing of information, and deprives students of motivation for self-improvement and self-development of language skills. The presentation is going to argue that in an ESP course special attention should be paid to reading/listening which can promote understanding and using the language as a tool for solving significant real world problems, including professional ones. It is claimed that in the learning process it is necessary to maintain a balance between the content and the linguistic aspect of the educational process as language acquisition is inextricably linked with mental activity and the need to express oneself is a primary stimulus for using a language. A study conducted among undergraduates indicates that they place a premium on quality materials that motivate them and stimulate their further linguistic and professional development. Thus, more demands are placed on study materials that should contain new information for students and serve not only as a source of new vocabulary but also prepare them for real tasks related to professional activities.

Keywords: critical reading, english for professional development, english for specific purposes, high order thinking skills, lexical approach, vocabulary acquisition

Procedia PDF Downloads 139
748 Nazca: A Context-Based Matching Method for Searching Heterogeneous Structures

Authors: Karine B. de Oliveira, Carina F. Dorneles

Abstract:

Structure-level matching is the problem of combining elements of a structure, which can be represented as entities, classes, XML elements, web forms, and so on. This is a challenge due to the large number of distinct representations of semantically similar structures. This paper describes a structure-based matching method applied to search for different representations in data sources, considering the similarity between elements of two structures and the data source context. Using real data sources, we have conducted an experimental study comparing our approach with our baseline implementation and with another important schema matching approach. We demonstrate that our proposal reaches higher precision than the baseline.

Keywords: context, data source, index, matching, search, similarity, structure

Procedia PDF Downloads 333
747 3D Model Completion Based on Similarity Search with Slim-Tree

Authors: Alexis Aldo Mendoza Villarroel, Ademir Clemente Villena Zevallos, Cristian Jose Lopez Del Alamo

Abstract:

With the advancement of technology, it is now possible to scan entire objects and obtain their digital representation as point clouds or polygon meshes. However, some objects may be broken or have missing parts; several methods addressing this problem have therefore been proposed based on Geometric Deep Learning, such as GCNN, ACNN, and PointNet, among others. This article proposes an approach from a different paradigm: metric data structures are used to index global descriptors in the spectral domain and retrieve a set of similar models in polynomial time; the Iterative Closest Point algorithm is then used to recover the missing parts of the incomplete model from the geometry and topology of the retrieved model with the smallest Hausdorff distance.
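The final selection step, choosing the retrieved model with the smallest Hausdorff distance, can be illustrated with a brute-force sketch. It assumes point clouds are lists of 3D coordinate tuples that have already been aligned (for example by ICP); the Slim-Tree index and spectral descriptors from the abstract are not reproduced here, and all names are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets.

    Brute force, O(len(a) * len(b)); fine for a sketch, too slow for
    large scans without a spatial index.
    """
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def closest_model(partial, candidates):
    """Name of the candidate model with the smallest Hausdorff distance.

    candidates maps a model name to its point list.
    """
    return min(candidates, key=lambda name: hausdorff(partial, candidates[name]))
```

`math.dist` requires Python 3.8 or later. Whether the directed or the symmetric distance is more appropriate for an incomplete scan is a design choice the abstract leaves open.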

Keywords: 3D reconstruction method, point cloud completion, shape completion, similarity search

Procedia PDF Downloads 95
746 A Nonlocal Means Algorithm for Poisson Denoising Based on Information Geometry

Authors: Dongxu Chen, Yipeng Li

Abstract:

This paper presents an information-geometry Nonlocal Means (NLM) algorithm for Poisson denoising. NLM estimates a noise-free pixel as a weighted average of image pixels, where each pixel is weighted according to the similarity between image patches in Euclidean space. In this work, every pixel is modeled as a Poisson distribution locally estimated by Maximum Likelihood (ML), and together these distributions constitute a statistical manifold. The NLM denoising algorithm is carried out on this statistical manifold, where the Fisher information matrix is used to compute geodesics between distributions, which serve as the similarity between patches. This approach was demonstrated to be competitive with related state-of-the-art methods.
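For a single Poisson distribution with rate λ, the Fisher information is 1/λ, so the geodesic (Fisher-Rao) distance between rates λ₁ and λ₂ is 2|√λ₁ − √λ₂|. The sketch below plugs that distance into an ordinary NLM weighting. It is an illustration under assumptions, not the paper's algorithm: patches are flat lists of estimated rates, the patch distance sums squared per-pixel geodesic distances, and the exponential kernel with bandwidth `h` is the usual NLM choice rather than something the abstract specifies.

```python
import math


def poisson_geodesic(lam1, lam2):
    """Fisher-Rao distance between Poisson(lam1) and Poisson(lam2)."""
    return 2.0 * abs(math.sqrt(lam1) - math.sqrt(lam2))


def patch_distance_sq(patch_p, patch_q):
    """Squared distance between two patches of estimated Poisson rates.

    Treats the patch as a product of independent Poisson pixels, so the
    squared per-pixel geodesic distances add up (an assumption).
    """
    return sum(poisson_geodesic(a, b) ** 2 for a, b in zip(patch_p, patch_q))


def nlm_estimate(reference_patch, patches, centers, h):
    """Weighted average of center pixels, weighted by patch similarity.

    patches[i] is the patch around candidate pixel i, centers[i] is its
    center value, and h > 0 is a hypothetical smoothing bandwidth.
    """
    weights = [
        math.exp(-patch_distance_sq(reference_patch, patch) / (h * h))
        for patch in patches
    ]
    return sum(w * c for w, c in zip(weights, centers)) / sum(weights)
```

Replacing `poisson_geodesic` with a plain absolute difference recovers the Euclidean patch comparison of standard NLM, which makes the two variants easy to compare side by side.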

Keywords: image denoising, Poisson noise, information geometry, nonlocal-means

Procedia PDF Downloads 262
745 Phishing Detection: Comparison between Uniform Resource Locator and Content-Based Detection

Authors: Nuur Ezaini Akmar Ismail, Norbazilah Rahim, Norul Huda Md Rasdi, Maslina Daud

Abstract:

Web applications are the most common target of attackers because they are directly accessible to end users. This has become even more advantageous to attackers because not all end users are aware of how much sensitive data they have already leaked through the Internet, especially via social networks, for the sake of ‘sharing’. An attacker can use information such as personal details, favourite artists, favourite actors or actresses, music, politics, and medical records to customize a phishing attack and thus trick the user into clicking on malware-laced attachments. Phishing is one of the most popular social engineering attacks against web applications. There are several methods for detecting phishing websites, such as blacklist/whitelist-based, heuristic-based, and visual similarity-based detection. This paper compares the heuristic-based technique, which uses features of a uniform resource locator (URL), with visual similarity-based detection techniques, which compare the content of a suspected phishing page with the legitimate one in order to detect new phishing sites, based on papers reviewed from the past few years. The comparison focuses on three indicators: false positives and negatives, accuracy of the method, and time taken to detect a phishing website.

Keywords: heuristic-based technique, phishing detection, social engineering, visual similarity-based technique

Procedia PDF Downloads 152
744 Genetic Characterization of Barley Genotypes via Inter-Simple Sequence Repeat

Authors: Mustafa Yorgancılar, Emine Atalay, Necdet Akgün, Ali Topal

Abstract:

In this study, the polymerase chain reaction-based inter-simple sequence repeat (ISSR) DNA fingerprinting technique was used to investigate the genetic relationships among barley crossbreed genotypes in Turkey. Selection on a genetic basis via ISSR is important in breeding programs with respect to breeding time. Fourteen ISSR primers generated a total of 97 bands, of which 81 (83.35%) were polymorphic. The highest total resolution power (RP) values were obtained from the F2 (0.53) and M16 (0.51) primers. According to the ISSR results, the genetic similarity index ranged from 0.64 to 0.95; the Line 3 and Line 6 genotypes were the closest, while Line 36 was the most distant. The ISSR markers were found to be promising for assessing genetic diversity in barley crossbreed genotypes.

Keywords: barley, crossbreed, genetic similarity, ISSR

Procedia PDF Downloads 311
743 An Integrated Fuzzy Inference System and Technique for Order of Preference by Similarity to Ideal Solution Approach for Evaluation of Lean Healthcare Systems

Authors: Aydin M. Torkabadi, Ehsan Pourjavad

Abstract:

A decade after the introduction of Lean in Saskatchewan’s public healthcare system, its effectiveness remains a controversial subject among health researchers, workers, managers, and politicians. Therefore, developing a framework to quantitatively assess the Lean achievements is significant. This study investigates the success of initiatives across Saskatchewan health regions by recognizing the Lean healthcare criteria, measuring the success levels, comparing the regions, and identifying the areas for improvement. This study proposes an integrated intelligent computing approach by applying a Fuzzy Inference System (FIS) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). FIS is used as an efficient approach to assess the Lean healthcare criteria, and TOPSIS is applied for ranking the values with regard to the level of leanness. Due to the innate uncertainty in decision makers’ judgments on criteria, the principles of fuzzy theory are applied. Finally, FIS-TOPSIS was established as an efficient technique for determining the lean merit of healthcare systems.
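The ranking step can be illustrated with a plain (crisp) TOPSIS sketch. It is a simplified stand-in under stated assumptions: the fuzzy inference stage is omitted, the decision matrix is taken as already scored, and the function name, weights, and example values are invented for illustration rather than taken from the study.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness of each alternative to the ideal solution, in [0, 1].

    matrix[i][j] is the score of alternative i on criterion j, weights[j]
    is the criterion weight, and benefit[j] is True when larger is better
    (False for cost criteria). Every column needs a nonzero entry.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti_ideal = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)       # distance to the ideal solution
        d_neg = math.dist(row, anti_ideal)  # distance to the anti-ideal solution
        total = d_pos + d_neg
        # All alternatives identical: no basis for ranking, return a neutral score.
        scores.append(d_neg / total if total else 0.5)
    return scores


# Hypothetical regions scored on one benefit and one cost criterion.
scores = topsis([[9, 2], [5, 6], [1, 9]], weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Alternatives are ranked by descending closeness score; here the first region dominates on both criteria and the third is dominated on both.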

Keywords: lean healthcare, intelligent computing, fuzzy inference system, healthcare evaluation, technique for order of preference by similarity to ideal solution, multi-criteria decision making, MCDM

Procedia PDF Downloads 134
742 The Presence of Anglicisms in Italian Fashion Magazines and Fashion Blogs

Authors: Vivian Orsi

Abstract:

The present research investigates the lexicon of a fashion magazine, whose universe is very receptive to lexical loans, especially those from English, called Anglicisms. Specifically, we intend to discuss the presence of English items and expressions in the Vogue Italia fashion magazine. In addition, we aim to study the Anglicisms used in an Italian fashion blog called The Blonde Salad. Within the discussion of fashion blogs and their contributions to scientific studies, we adopt the theories of Lexicology / Lexicography to define Anglicism (BIDERMAN, 2001), and the observation of its prestige in the Italian language (ROGATO, 2008; BISETTO, 2003). On this theoretical basis, we intend to make a brief analysis of the Anglicisms collected from posts of the first year of existence of this fashion blog, also emphasizing the keywords that serve to encapsulate the content of the text, allowing the reader to retrieve information from the blog post. Regarding the use of English in Italian magazines and blogs, we can affirm that it seems to represent sophistication, taking on the value of a prerequisite for participating in the fashion centers of the world. Moreover, we believe, as Barthes says (1990, p. 215), that “Fashion does not evolve, it changes: its lexicon is new each year, like that of a language which always keeps the same system but suddenly and regularly ‘changes’ the currency of its words”. Fashion is a mode of communication: it is present in man's interaction with the world, which means that this lexical universe is represented according to the particularities of each culture.

Keywords: anglicism, lexicology, magazines, blogs, fashion

Procedia PDF Downloads 301
741 Aspects of Semantics of Standard British English and Nigerian English: A Contrastive Study

Authors: Chris Adetuyi, Adeola Adeniran

Abstract:

The concept of meaning is a complex one in language study when cultural features are added. This is necessary because language cannot be completely separated from culture; language and culture complement each other. When there are two varieties of a language in a society, i.e. two varieties functioning side by side in a speech community, there is a tendency to compare the varieties with each other. There is, therefore, a need for a comparative linguistic study of the varieties of such languages. In this paper, a semantic contrastive study is made between Standard British English (SBE) and Nigerian English (NE). The study is limited to the following aspects of semantics: semantic extension (kinship terms, metaphors), semantic shift (the lexical items considered are ‘drop’, ‘befriend’, ‘dowry’ and ‘escort’), acronyms (NEPA, JAMB, NTA), linguistic borrowing or loan words (Seriki, Agbada, Eba, Dodo, Iroko), and coinages (long leg, bush meat, bottom power and juju). In the study of these aspects of the semantics of SBE and NE lexical terms, contrastive statements are made, and problem areas and a hierarchy of difficulties are highlighted with a view to bringing out areas of difference. The study will also serve as a guide for further contrastive studies in other areas of language.

Keywords: aspect, British, English, Nigeria, semantics

Procedia PDF Downloads 319
740 Revisiting the Swadesh Wordlist: How Long Should It Be

Authors: Feda Negesse

Abstract:

One of the most important indicators of research quality is a good data-collection instrument that can yield reliable and valid data. The Swadesh wordlist has been used for more than half a century for collecting data in comparative and historical linguistics, though arbitrariness is observed in its application and size. This research compares the classification results of the 100-item Swadesh wordlist with those of its subsets to determine whether reducing the size of the wordlist impacts its effectiveness. In the comparison, the 100-, 50- and 40-item wordlists were used to compute lexical distances of 29 Cushitic and Semitic languages spoken in Ethiopia and neighbouring countries. Gabmap, a web-based application, was employed to compute the lexical distances and to divide the languages into related clusters. The study shows that the subsets are not as effective as the 100-item wordlist in clustering languages into smaller subgroups, but they are equally effective in dividing languages into bigger groups such as subfamilies. It is noted that the subsets may lead to an erroneous classification whereby unrelated languages by chance form a cluster that is not attested by a comparative study. The chance of getting a wrong result is higher when the subsets are used to classify languages that are not closely related. Though further study is still needed to settle the issues around the size of the Swadesh wordlist, this study indicates that the 50- and 40-item wordlists cannot be recommended as reliable substitutes for the 100-item wordlist under all circumstances. The choice seems to be determined by the objective of the researcher and the degree of affiliation among the languages to be classified.
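
As a rough illustration of how a lexical distance between two languages can be derived from aligned wordlists, the sketch below averages length-normalized Levenshtein distances over concept slots. This is an assumed measure for illustration only, not necessarily the one Gabmap applies, and the word forms are invented placeholders rather than real Cushitic or Semitic data.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lexical_distance(words_a, words_b):
    """Mean normalized edit distance over concepts attested in both lists."""
    # Skip concept slots where either language has no recorded form.
    pairs = [(x, y) for x, y in zip(words_a, words_b) if x and y]
    if not pairs:
        raise ValueError("no concept is attested in both wordlists")
    total = sum(levenshtein(x, y) / max(len(x), len(y)) for x, y in pairs)
    return total / len(pairs)


# Invented placeholder forms for three concepts in two languages.
lang_a = ["mata", "biru", "kalo"]
lang_b = ["mato", "biru", ""]
distance = lexical_distance(lang_a, lang_b)
```

Shortening the wordlist reduces the number of pairs in the average, which is one way to see why small subsets give noisier distances between distantly related languages.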

Keywords: classification, Cushitic, Swadesh, wordlist

Procedia PDF Downloads 271
739 Semantic-Based Collaborative Filtering to Improve Visitor Cold Start in Recommender Systems

Authors: Baba Mbaye

Abstract:

In collaborative filtering recommendation systems, a user receives suggested items based on the opinions and evaluations of a community of users. This type of recommendation system uses only the information (ratings as numerical values) contained in a usage matrix as input data. This matrix can be constructed from users' behaviors or by inviting users to state their opinions on the items they know. The cold start problem leads to very poor performance for new users. It is a phenomenon that occurs at the beginning of use, when the system lacks the data needed to make recommendations. There are three types of cold start problems: cold start for a new item, a new system, and a new user. In this article, we focus on the cold start for a new user. When the system welcomes a new user, the profile exists but does not contain enough data, and the communities it shares with other user profiles are still unknown. This leads to recommendations that are not adapted to the profile of the new user. In this paper, we propose an approach that improves cold start by using the notions of similarity and semantic proximity between user profiles during cold start. We use the available cold metadata (metadata extracted from the new user's data) to position the new user within a community. The aim is to look for similarities and semantic proximities with the old and current user profiles of the system. Proximity is represented by close concepts considered to belong to the same group, while similarity groups together elements that appear similar. Similarity and proximity are two close but not identical concepts. This leads us to a construction of similarity based on: a) the concepts (properties, terms, instances) independent of the ontology structure and, b) the simultaneous representation of the two concepts (relations, presence of terms in a document, simultaneous presence of the authorities).
We propose an ontology, OIVCSRS (Ontology of Improvement Visitor Cold Start in Recommender Systems), in order to structure the terms and concepts representing the meaning of an information field, whether by the metadata of a namespace or the elements of a knowledge domain. This approach allows us to automatically attach the new user to a user community, partially compensate for the data that was not initially provided, and ultimately associate a better first profile with the cold start. Thus, the aim of this paper is to propose an approach to improving cold start using semantic technologies.

Keywords: visitor cold start, recommender systems, collaborative filtering, semantic filtering

Procedia PDF Downloads 195
738 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park

Abstract:

Scripts are one of the basic text resources for understanding broadcasting contents. Since broadcast media wield considerable influence over the public, tools for understanding broadcasting contents are increasingly needed. Topic modeling is a method for obtaining a summary of broadcasting contents from their scripts. Generally, scripts represent contents descriptively with directions and speeches. Scripts also provide scene segments that can be seen as semantic units. Therefore, a script can be topic modeled by treating a scene segment as a document. Because scripts consist mainly of speeches, however, relatively few co-occurrences among words are observed in the scene segments. This inevitably degrades the quality of topics learned by statistical methods. To tackle this problem, we propose a method of learning with additional word co-occurrence information obtained using scene similarities. The main idea for improving topic quality is that the information that two or more texts are topically related can be useful for learning high-quality topics. In addition, by using high-quality topics, we can determine more accurately whether two texts are related. In this paper, we regard two scene segments as related if their topical similarity is high enough. We also consider words to co-occur if they appear together in topically related scene segments. In the experiments, we show that the proposed method generates higher-quality topics from Korean drama scripts than the baselines.

Keywords: broadcasting contents, scripts, text similarity, topic model

Procedia PDF Downloads 291
737 Learning Physics Concepts through Language Syntagmatic Paradigmatic Relations

Authors: C. E. Laburu, M. A. Barros, A. F. Zompero, O. H. M. Silva

Abstract:

This work presents a teaching strategy that employs syntagmatic and paradigmatic linguistic relations to monitor students' understanding of physics concepts. Syntagmatic and paradigmatic relations are theoretical elements of semiotic studies, and our research contextualizes and justifies them within the research program of multi-modal representations. Among the multi-modal representations for learning scientific knowledge, the scope of action of syntagmatic and paradigmatic relations belongs to the discursive written form. The purpose of using such relations is to innovate in didactic work with discourse representation in written form before translation into a different representational form. The research was conducted with a sample of first-year high school students. The students were asked to produce syntagmatic and paradigmatic relations for the statement of Newton's first law. The statement was handed out on paper to each student, who was to write the relations individually. The students' records were collected for analysis. In the case of one student, used here as an example, it was possible to observe that the moneme rearrangements and replacements produced by, respectively, syntagmatic and paradigmatic relations kept the original meaning of the law. In the paradigmatic production, he specified relevant significant units of the linguistic signs, the monemes, which constitute the first articulation, and each substituted word kept equivalence to the meaning of the original moneme. It was also noted that many diverse monemes were chosen, with a balanced combination of grammatical monemes (a grammatical moneme is one that changes the meaning of a word, in certain positions of the syntagma, along with a relatively small number of other monemes; it is the smallest linguistic unit that has grammatical meaning) and lexical monemes (a lexical moneme is one that belongs to unlimited inventories; it is the moneme endowed with lexical meaning).
In the syntagmatic production, the orderings of monemes were syntactically coherent, with semantic conservation and a preserved number of monemes. In general, the results showed that the written representation mode based on paradigmatic and syntagmatic linguistic relations qualifies for use in the classroom as a potential means of identifying and following the meanings acquired by students in the process of scientific inquiry.

Keywords: semiotics, language, high school, physics teaching

Procedia PDF Downloads 106
736 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Authors: Loai AbdAllah, Mahmoud Kaiyal

Abstract:

Missing values in real-world datasets are a common problem. Many algorithms have been developed to deal with this problem; most of them replace the missing values with a fixed value computed from the observed values. In our work, we use a distance function based on the Bhattacharyya distance, which measures the similarity of two probability distributions, to measure the distance between objects with missing values. The proposed distance distinguishes between known and unknown values: the distance between two known values is the Mahalanobis distance, whereas when one of them is missing, the distance is computed from the distribution of the known values for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps improve the prevention of chronic diseases such as diabetes and cancer. For Wikaya’s recommendation system to work, the distance between users needs to be measured. Since there are missing values in the collected data, there is a need for a distance function between incomplete user profiles. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between different objects when some of them contain missing values, we integrated it within the framework of the k nearest neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that the kNN classifier using our proposed distance function outperforms kNN using other existing methods.
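
For reference, the standard Bhattacharyya distance between two discrete probability distributions can be sketched as below. This shows only the textbook definition; how the authors combine it with the Mahalanobis distance for missing coordinates is not specified in the abstract and is not reproduced here. The function name and example histograms are hypothetical.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions.

    p and q are sequences of probabilities over the same bins, each
    assumed to sum to 1. Returns 0 for identical distributions and
    infinity when the distributions share no probability mass.
    """
    if len(p) != len(q):
        raise ValueError("distributions must be defined over the same bins")
    # Bhattacharyya coefficient: overlap between the two distributions.
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0.0:
        return math.inf
    return -math.log(coefficient)


# Hypothetical histograms of one attribute in two groups of users.
group_a = [0.1, 0.4, 0.5]
group_b = [0.3, 0.4, 0.3]
distance = bhattacharyya_distance(group_a, group_b)
```

The coefficient lies in [0, 1], so the distance is non-negative and grows as the overlap between the two distributions shrinks.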

Keywords: missing values, incomplete data, distance, incomplete diabetes data

Procedia PDF Downloads 188
735 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

Developing methods to annotate unknown gene functions is an important task in bioinformatics. One approach to annotation is to identify the metabolic pathway in which genes are involved. Gene expression data have been utilized for this identification, since they reflect various intracellular phenomena. However, it has been difficult to estimate gene function with high accuracy. The low accuracy of the estimation is thought to be caused by the difficulty of measuring gene expression accurately: even when measured under the same conditions, gene expression usually varies. In this study, we propose a feature extraction method focusing on the variability of gene expression to estimate the genes' metabolic pathways accurately. First, we estimated the distribution of each gene's expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which measures how much one distribution differs from another. Finally, we used the similarity vectors as feature vectors and trained a multiclass SVM to identify the genes' metabolic pathways. To evaluate the method, we applied it to budding yeast and trained the multiclass SVM to identify seven metabolic pathways. As a result, the accuracy obtained with our method was higher than that obtained from the raw gene expression data. Thus, our method combined with KL divergence is useful for identifying genes' metabolic pathways.
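
The abstract does not state which distribution family was fitted to the replicates. As a minimal sketch, assuming each gene's expression is modeled as a univariate Gaussian estimated from its replicate measurements, the KL divergence between two genes has a closed form. The function names and replicate values are hypothetical.

```python
import math
import statistics


def gaussian_kl(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) for univariate Gaussians P and Q with positive variances."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)


def kl_from_replicates(replicates_p, replicates_q):
    """Fit a Gaussian to each gene's replicates, then compute KL(P || Q).

    Each list needs at least two values with non-zero spread, because
    the sample variance is used as the variance estimate.
    """
    return gaussian_kl(statistics.mean(replicates_p),
                       statistics.variance(replicates_p),
                       statistics.mean(replicates_q),
                       statistics.variance(replicates_q))


# Hypothetical replicate expression levels for two genes.
gene_a = [2.1, 2.4, 1.9, 2.2]
gene_b = [3.0, 3.5, 2.8, 3.3]
divergence = kl_from_replicates(gene_a, gene_b)
```

KL divergence is asymmetric, so KL(P || Q) and KL(Q || P) generally differ; a feature vector built from it depends on which direction (or which symmetrized form) is chosen.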

Keywords: metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning

Procedia PDF Downloads 375
734 Recruitment Model (FSRM) for Faculty Selection Based on Fuzzy Soft

Authors: G. S. Thakur

Abstract:

This paper presents a Fuzzy Soft Recruitment Model (FSRM) for faculty selection in MHRD technical institutions. The selection criteria are based on a 4-tier flexible structure in the institutions. The Advisory Committee on Faculty Recruitment (ACoFAR) suggested nine criteria for faculty in the proposed FSRM. The Fuzzy Soft model is proposed in consultation with ACoFAR based on the selection criteria. Fuzzy Soft distance similarity measures are applied to find the best faculty candidates from the applicant pool.

Keywords: fuzzy soft set, fuzzy sets, fuzzy soft distance, fuzzy soft similarity measures, ACoFAR

Procedia PDF Downloads 314
733 A Corpus-Based Study on the Lexical, Syntactic and Sequential Features across Interpreting Types

Authors: Qianxi Lv, Junying Liang

Abstract:

Among the various modes of interpreting, simultaneous interpreting (SI) is regarded as a ‘complex’ and ‘extreme condition’ of cognitive tasks, while in consecutive interpreting (CI) interpreters do not have to share processing capacity between tasks. Given that SI exerts great cognitive demand, it makes sense to posit that the output of SI may be more compromised than that of CI in its linguistic features. The bulk of the research has stressed the varying cognitive demand and processes involved in different modes of interpreting; however, related empirical research is sparse. In keeping with our interest in the quantitative linguistic factors discriminating between SI and CI, the current study examines potential lexical simplification, syntactic complexity and sequential organization mechanisms using a self-made inter-model corpus of transcribed simultaneous and consecutive interpretation, translated speech and original speech texts totaling 321,960 running words. The lexical features are extracted in terms of lexical density, list head coverage, hapax legomena, and type-token ratio, as well as core vocabulary percentage. Dependency distance, an index of syntactic complexity that reflects processing demand, is employed. The frequency motif, a non-grammatically-bound sequential unit, is also used to visualize the local function distribution of the interpreting output. While SI is generally regarded as multitasking with a high cognitive load, our findings show that CI may tax cognitive resources more heavily, albeit differently, and hence yields more lexically and syntactically simplified output. In addition, the sequential features show that SI and CI organize the sequences from the source text into the output in different ways, each minimizing its own cognitive load. We interpret the results within a framework in which cognitive demand is exerted on both the maintaining and coordinating components of working memory.
On the one hand, the information maintained in CI is inherently larger in volume than in SI. On the other hand, time constraints directly influence the sentence reformulation process. The temporal pressure from the input in SI makes interpreters keep only a small chunk of information in the focus of attention. Thus, SI interpreters usually produce the output by largely retaining the source structure so as to release the information from working memory immediately after it is formulated in the target language. Conversely, CI interpreters receive at least a few sentences before reformulation and are therefore more self-paced. CI interpreters may thus tend to retain and generate the information in a way that lessens the demand. In other words, interpreters cope with the high demand in the reformulation phase of CI by generating output with densely distributed function words, more content words of higher frequency and less variation, simpler structures and more frequently used language sequences. We consequently propose a revised effort model based on the results to better illustrate cognitive demand during both interpreting types.
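
Three of the lexical measures named above (type-token ratio, hapax legomena, lexical density) can be sketched in a few lines. The function-word list here is a tiny illustrative assumption; a real study would rely on part-of-speech tagging or a full stop list, and the authors' exact operationalization is not given in the abstract.

```python
from collections import Counter

# Tiny illustrative function-word list (an assumption, not the authors' list).
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}


def lexical_profile(tokens):
    """Return type-token ratio, hapax ratio and lexical density for a text."""
    tokens = [t.lower() for t in tokens]
    if not tokens:
        raise ValueError("token list is empty")
    counts = Counter(tokens)
    n_tokens = len(tokens)
    return {
        # Distinct word forms divided by running words.
        "type_token_ratio": len(counts) / n_tokens,
        # Share of running words whose form occurs exactly once.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n_tokens,
        # Share of running words that are content (non-function) words.
        "lexical_density": sum(1 for t in tokens if t not in FUNCTION_WORDS) / n_tokens,
    }


profile = lexical_profile("The cat saw the dog".split())
```

Type-token ratio falls as texts get longer, so comparisons between SI and CI output are only meaningful on samples of similar size or with a length-corrected variant.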

Keywords: cognitive demand, corpus-based, dependency distance, frequency motif, interpreting types, lexical simplification, sequential units distribution, syntactic complexity

Procedia PDF Downloads 141