Search results for: POS tagging
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 52

52 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part-of-speech (POS) tagging has long been a challenging task in natural language processing. This article presents POS tagging for Nepali text using a Hidden Markov Model (HMM) and the Viterbi algorithm. Training and testing data sets are randomly separated from an annotated Nepali corpus, and both methods are applied to them. The Viterbi algorithm is found to be computationally faster and more accurate than the HMM, achieving an accuracy of 95.43%. An error analysis of where the mismatches occurred is discussed in detail.
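The abstract does not describe the model parameters, so the following is only a minimal sketch of Viterbi decoding over a first-order HMM. The tag set, the probability tables and the English toy words are all hypothetical stand-ins, not values from the paper; unseen events are given a small floor probability, which is an assumption rather than the authors' smoothing method.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    if not words:
        return []

    # Work in log space to avoid underflow; zero probabilities fall back to `floor`.
    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = log-probability of the best path that ends in tag t at position i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy parameters (hypothetical, for illustration only).
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {
    "NOUN": {"dogs": 0.6, "bark": 0.1, "cats": 0.3},
    "VERB": {"bark": 0.7, "chase": 0.3},
}

print(viterbi(["dogs", "bark"], TAGS, START, TRANS, EMIT))
```

In a real tagger the three tables would be estimated from the annotated training split rather than written by hand.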

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 301
51 Touch Interaction through Tagging Context

Authors: Gabriel Chavira, Jorge Orozco, Salvador Nava, Eduardo Álvarez, Julio Rolón, Roberto Pichardo

Abstract:

Ambient Intelligence promotes a shift in computing that involves fitting out environments with devices to support context-aware applications. One of its main objectives is to reduce the user’s interactive effort to a minimum; the diversity and quantity of devices that surround people in existing environments make this goal harder to achieve. The mobile phone, with its remarkable global penetration, is an excellent device for delivering new services to the user without requiring a learning effort. The environment will have to be able to perceive all of the interaction techniques. In this paper, we present the PICTAC model (Perceiving touch Interaction through TAgging Context), which similarly delivers service to members of a research group.

Keywords: ambient intelligence, tagging context, touch interaction, touching services

Procedia PDF Downloads 353
50 Searching Linguistic Synonyms through Parts of Speech Tagging

Authors: Faiza Hussain, Usman Qamar

Abstract:

Synonym-based searching is recognized as a complicated problem because mining text from unstructured web data is challenging. Finding useful information that matches the user's need in a mass of web pages is a cumbersome task. In this paper, a novel and practical synonym retrieval technique is proposed to address this problem. For replacement of semantics, user intent is taken into consideration to realize the technique. Parts-of-speech tagging is applied to generate patterns from the query, and a thesaurus was built and used for this experiment. Compared with non-context-based searching, context-based searching proved to be a more efficient approach when dealing with linguistic semantics. This approach is very beneficial for intent-based searching. Finally, results and future directions are presented.

Keywords: natural language processing, text mining, information retrieval, parts-of-speech tagging, grammar, semantics

Procedia PDF Downloads 278
49 Unsupervised Part-of-Speech Tagging for Amharic Using K-Means Clustering

Authors: Zelalem Fantahun

Abstract:

Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in naturally occurring text. It is a fundamental task in almost all natural language processing. In natural language processing, providing large amounts of manually annotated data is a knowledge acquisition bottleneck. Since Amharic is an under-resourced language, the availability of tagged corpora is a bottleneck for natural language processing, especially for POS tagging. A promising direction for tackling this problem is to provide a system that does not require manually tagged data. In unsupervised learning, the learner is not provided with classifications. Unsupervised algorithms seek out similarity between pieces of data in order to determine whether they can be characterized as forming a group. This paper describes the development of an unsupervised part-of-speech tagger for Amharic using K-means clustering, motivated by the large amount of data produced in day-to-day activities. The tagger is developed in four phases. First, the unlabeled data (raw text) is divided into 10 folds and tokenized; at this level, the raw text is split into sentences and then into words. The second phase is feature extraction, which includes word frequency and the syntactic and morphological features of a word. The third phase is clustering: among different clustering algorithms, K-means is selected and implemented in this study to bring groups of similar words together. The fourth phase is mapping, in which each cluster is examined carefully and the most common tag is assigned to the group. The study identifies two features that are capable of distinguishing one part of speech from others, namely morphological features and positional information, and shows that it is possible to use unsupervised learning for Amharic POS tagging.
To increase the performance of the unsupervised part-of-speech tagger, other features not included in this study, such as semantic information, need to be incorporated. Finally, based on the experimental results, the system achieves a maximum accuracy of 81%.
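The four phases named in the abstract (tokenization, feature extraction, clustering, mapping) can be sketched roughly as follows. This is an illustrative toy, not the authors' system: the English stand-in text, the two suffix indicators, the relative-position feature and the small hand-checked sample used for mapping are all assumptions, and the 10-fold split is omitted.

```python
from collections import Counter

def features(word, position, sentence_len):
    """Toy feature vector: two suffix indicators (morphology) plus relative position."""
    return [
        1.0 if word.endswith("ing") else 0.0,
        1.0 if word.endswith("s") else 0.0,
        position / max(sentence_len - 1, 1),
    ]

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic initialisation (first k distinct points)."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    assign = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def map_clusters(assign, sample_tags):
    """Label each cluster with the most common tag among its hand-checked members."""
    votes = {}
    for idx, tag in sample_tags.items():
        votes.setdefault(assign[idx], Counter())[tag] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

# Phase 1: tokenization (sentences, then words).
raw = "dogs running. cats jumping. birds singing."
sentences = [s.split() for s in raw.split(".") if s.strip()]
tokens = [(w, i, len(s)) for s in sentences for i, w in enumerate(s)]

# Phase 2: feature extraction. Phase 3: clustering.
points = [features(*t) for t in tokens]
assign = kmeans(points, k=2)

# Phase 4: mapping, using two hand-checked tokens (hypothetical sample).
mapping = map_clusters(assign, {0: "NOUN", 1: "VERB"})
tagged = [(tok[0], mapping[assign[i]]) for i, tok in enumerate(tokens)]
print(tagged)
```

The mapping step here relies on a handful of inspected tokens per cluster; the abstract only says each cluster is examined and given its most common tag, so how that inspection was done in the study is not specified.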

Keywords: POS tagging, Amharic, unsupervised learning, k-means

Procedia PDF Downloads 413
48 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper addresses Assamese, a highly inflectional language. It is one of the national languages of India, yet very little computational research has been carried out on it. Building a language processing tool for a natural language is not straightforward, as standards and language representation change at various levels. This paper presents the inflectional suffixes of Assamese verbs and shows how statistical tools, combined with linguistic features, can improve tagging accuracy. A conditional random fields (CRF) tool was used to train on and automatically tag the text data; accuracy improved after linguistic features were fed into the training data. Because Assamese is highly inflectional, standardizing its morphology is challenging. Inflectional suffixes are used as a feature of the text data. To analyze the inflections of Assamese word forms, a list was prepared of all possible suffixes that the various categories can take. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and uninflected classes (adverb and particle). The corpus used for this morphological analysis contains a large number of tokens. It is a mixed corpus and has given satisfactory accuracy. The accuracy of the tagger has gradually improved with the modified training data.
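One way a suffix list can be turned into per-token features for a sequence tagger is sketched below. The function names, the feature keys and the suffix list are hypothetical; the suffixes are English placeholders, not Assamese morphology, and the abstract does not say which other features the authors used.

```python
def longest_suffix(word, suffixes):
    """Return the longest listed suffix that `word` ends with, or '' if none matches."""
    matches = [s for s in suffixes if word.endswith(s) and len(word) > len(s)]
    return max(matches, key=len) if matches else ""

def token_features(sentence, i, suffixes):
    """Feature dict for token i: the word, its matched suffix, and neighbouring words."""
    word = sentence[i]
    return {
        "word": word,
        "suffix": longest_suffix(word, suffixes),
        "prev_word": sentence[i - 1] if i > 0 else "<BOS>",
        "next_word": sentence[i + 1] if i < len(sentence) - 1 else "<EOS>",
    }

# Placeholder suffix list (not real Assamese suffixes).
SUFFIXES = ["ed", "ing", "s", "ings"]
sentence = ["she", "walked", "home"]
feats = [token_features(sentence, i, SUFFIXES) for i in range(len(sentence))]
print(feats[1])
```

A list of such dictionaries per sentence, paired with the gold tags, is the kind of input a CRF trainer would typically consume; adding the suffix key is what "feeding linguistic features into the training data" could look like in practice.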

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 167
47 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The crucial information needed for tagging a word comes from its internal structure rather than from its neighboring words. The internal structure of a word comprises its morphological features and grammatical information. This paper presents a morpheme-based parts of speech tagger for the Kannada language. The proposed work uses a hierarchical tag set for assigning tags. The system is tested on Kannada words taken from the EMILLE corpus. Experimental results show that the performance of the proposed system is above 90%.

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 261
46 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English

Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista

Abstract:

The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English, as well as, to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented on Filemaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words. 
In its present state, the lemmatiser Norna can automatically assign a lemma to around 80% of textual forms by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation emphasise the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is left for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.

Keywords: corpus linguistics, historical linguistics, old English, parallel corpus

Procedia PDF Downloads 170
45 Extracting Actions with Improved Part of Speech Tagging for Social Networking Texts

Authors: Yassine Jamoussi, Ameni Youssfi, Henda Ben Ghezala

Abstract:

With the growing interest in social networking, the interaction of social actors has evolved into a source of knowledge on which context-aware reasoning can be performed. Information extraction from social networks, especially Twitter and Facebook, is one of the problems in this area. To extract text from social networks, we need several lexical features and large-scale word clustering. We attempt to extend an existing tokenizer and to develop our own tagger in order to handle the non-standard words found on Facebook and Twitter. Our goal in this work is to benefit from the lexical features developed for Twitter and online conversational text in previous works, and to develop an extraction model for constructing a large knowledge base of actions.

Keywords: social networking, information extraction, part-of-speech tagging, natural language processing

Procedia PDF Downloads 274
44 Tagging a corpus of Media Interviews with Diplomats: Challenges and Solutions

Authors: Roberta Facchinetti, Sara Corrizzato, Silvia Cavalieri

Abstract:

Increasing interconnection between data digitalization and linguistic investigation has given rise to unprecedented potentialities and challenges for corpus linguists, who need to master IT tools for data analysis and text processing, as well as to develop techniques for efficient and reliable annotation in specific mark-up languages that encode documents in a format that is both human and machine-readable. In the present paper, the challenges emerging from the compilation of a linguistic corpus will be taken into consideration, focusing on the English language in particular. To do so, the case study of the InterDiplo corpus will be illustrated. The corpus, currently under development at the University of Verona (Italy), represents a novelty in terms both of the data included and of the tag set used for its annotation. The corpus covers media interviews and debates with diplomats and international operators conversing in English with journalists who do not share the same lingua-cultural background as their interviewees. To date, this appears to be the first tagged corpus of international institutional spoken discourse and will be an important database not only for linguists interested in corpus analysis but also for experts operating in international relations. In the present paper, special attention will be dedicated to the structural mark-up, parts of speech annotation, and tagging of discursive traits, that are the innovational parts of the project being the result of a thorough study to find the best solution to suit the analytical needs of the data. Several aspects will be addressed, with special attention to the tagging of the speakers’ identity, the communicative events, and anthropophagic. Prominence will be given to the annotation of question/answer exchanges to investigate the interlocutors’ choices and how such choices impact communication. 
Indeed, the automated identification of questions, in relation to the expected answers, helps to show how interviewers elicit information and how interviewees provide their answers to fulfill their respective communicative aims. A detailed description of the aforementioned elements will be given using the InterDiplo-Covid19 pilot corpus. The results of our preliminary analysis will highlight the viable solutions found in the construction of the corpus in terms of XML conversion, metadata definition, tagging system, and discursive-pragmatic annotation to be included via Oxygen.

Keywords: spoken corpus, diplomats’ interviews, tagging system, discursive-pragmatic annotation, english linguistics

Procedia PDF Downloads 153
43 Geophysical and Laboratory Evaluation of Aquifer Position, Aquifer Protective Capacity and Groundwater Quality in Selected Dumpsites in Calabar Municipal Local Government Area, South Eastern Nigeria

Authors: Egor Atan Obeten, Abong Augustine Agwul, Bissong A. Samson

Abstract:

The position of the aquifer, its protective capacity, and the quality of the groundwater beneath the dumpsite were all investigated. The techniques employed were laboratory analysis, tritium tagging, electrical resistivity tomography (ERT), and vertical electrical sounding (VES). Fifteen VES stations were used with a maximum electrode spacing of 500 meters, and the data collected were analyzed with IPI2win software. Six ERT stations were deployed for the 2D survey to determine the resistivity map of the dumpsite. The tritium tagging method was used to ascertain the degree of soil infiltration beneath the dumpsite. Groundwater samples were taken from neighboring boreholes and examined using a conventional laboratory procedure. The findings showed three to five geoelectric layers, with the aquifer inferred to lie between 24.2 and 75.1 meters deep in the third, fourth, and fifth layers. Protective capacity values ranged from 0.0235 to 0.1908 siemens, indicating that the aquifer is, at best, weakly to poorly protected. The obtained porosity values ranged from 44.45 to 89.75. High calculated values for transmissivity and porosity indicate a permeable aquifer system with considerable storativity. According to the tritium tagging results, the area has an infiltration value between 8 and 22 percent. The analyzed groundwater samples reveal levels of NO2, DO, Pb2+, magnesium, and cadmium that are higher than those approved by the NSDWQ. Overall, the results from the methods described above show that the study area's aquifer system is porous and that contaminants will circulate through it quickly once they enter it.

Keywords: aquifer, transmissivity, dumpsite, groundwater

Procedia PDF Downloads 12
42 A Tagging Algorithm in Augmented Reality for Mobile Device Screens

Authors: Doga Erisik, Ahmet Karaman, Gulfem Alptekin, Ozlem Durmaz Incel

Abstract:

Augmented reality (AR) is a type of virtual reality that aims to duplicate the real-world environment on a computer’s video feed. The mobile application built for this project (called SARAS) enables the annotation of real-world points of interest (POIs) located near the mobile user. In this paper, we introduce a robust and simple algorithm for placing labels in an augmented reality system. The system places labels on the mobile device screen for POIs whose GPS coordinates are given. The proposed algorithm is compared to an existing one in terms of energy consumption and accuracy. The results show that the proposed algorithm gives better results in energy consumption and accuracy while standing still, and acceptably accurate results when driving. The technique provides benefits to AR browsers with its open access algorithm. Going forward, the algorithm will be improved to react more rapidly to position changes while driving.

Keywords: accurate tagging algorithm, augmented reality, localization, location-based AR

Procedia PDF Downloads 339
41 Resource Creation Using Natural Language Processing Techniques for Malay Translated Qur'an

Authors: Nor Diana Ahmad, Eric Atwell, Brandon Bennett

Abstract:

Text processing techniques for English have been developed over several decades, but text processing methods for the Malay language are still far behind. Moreover, few resources and tools for computational linguistic analysis are available for the Malay language. Therefore, this research presents the use of natural language processing (NLP) in processing Malay translated Qur’an text. As a result, a new language resource for the Malay translated Qur’an was created. This resource will help other researchers to build the necessary processing tools for the Malay language. This research also develops a simple question-answer prototype to demonstrate the use of the Malay Qur’an resource for text processing. This prototype has been developed using Python. The prototype pre-processes the Malay Qur’an and an input query using a stemming algorithm and then searches for occurrences of the query word stem. The results show an improved likelihood of matching a user query with its answer. A POS-tagging algorithm has also been produced. The stemming and tagging algorithms can be used as tools for research related to other Malay texts and can be used to support applications such as information retrieval, question answering systems, ontology-based search and other text analysis tasks.

Keywords: language resource, Malay translated Qur'an, natural language processing (NLP), text processing

Procedia PDF Downloads 281
40 Video Processing of a Football Game: Detecting Features of a Football Match for Automated Calculation of Statistics

Authors: Rishabh Beri, Sahil Shah

Abstract:

We have applied a range of filters and processing steps in order to extract various features of the football game, such as the field lines of a football field. Another important aspect was the detection of the players on the field and tagging them according to their teams, distinguished by their jersey colours. Combining this extracted information about the players and the field helped us to create a virtual field that consists of the playing field and the players mapped to their locations in it.

Keywords: Detect, Football, Players, Virtual

Procedia PDF Downloads 300
39 Methods and Techniques for Lower Danube Sturgeon Monitoring Used for the Assessment of Anthropic Activities Pressures and the Quantification of Risks on These Species

Authors: Gyorgy Deak, Marius C. Raischi, Lucian P. Georgescu, Tiberius M. Danalache, Elena Holban, Madalina G. Boboc, Monica Matei, Catalina Iticescu, Marius V. Olteanu, Stefan Zamfir, Gabriel Cornateanu

Abstract:

At present, different types of pressure have been identified on the Lower Danube that affect anadromous sturgeon stocks and lead to their decline. This paper presents techniques and procedures used by Romanian experts in the tagging and monitoring of anadromous sturgeons, as well as internationally unique results based on data collected over more than 7 years of monitoring the behavior of these species (both adults and ultrasonically tagged juveniles) on the Lower Danube. The local impact of hydrotechnical constructions (bottom sill, maritime navigation channel), the global impact of poaching, and the impact of restocking programs with sturgeon juveniles were assessed. Thus, the impact of the bottom sill on the Bala branch, of the Bastroe Channel (cross-border impact), and of poaching on the Lower Danube was analyzed on the basis of a unique body of data obtained with monitoring systems patented by the Romanian experts (DKTB and DKMR-01T, respectively). The results from the monitoring of ultrasonically tagged sturgeon juveniles from the 2015 repopulation program are also presented. The conclusions of this research can provide a favorable basis for finding conservation solutions for CITES-protected sturgeon species that have survived for millions of years. Currently, one species is on the brink of extinction (Russian sturgeon), two species are in danger of extinction (Beluga sturgeon and Stellate sturgeon), and two species are already extinct in the Lower Danube (common sturgeon and ship sturgeon).

Keywords: Lower Danube, sturgeons monitoring (adults and juveniles), tagging, impact on conservation

Procedia PDF Downloads 212
38 The Amount of Conformity of Persian Subject Headlines with Users' Social Tagging

Authors: Amir Reza Asnafi, Masoumeh Kazemizadeh, Najmeh Salemi

Abstract:

Given the growing diversity of information resources in the Web 2.0 environment, social tagging systems should be used to describe Internet resources. Studying the relevance of social tags to subject headings can help enrich resources and make them more accessible. The present research is applied-theoretical in type and uses content analysis as its method. Using the listing method and content analysis, the study determined the level of exact, approximate, relative, and non-conformity between the social tags of books in the field of information science and bibliography on the Kitabrah website and the Persian subject headings. Exact matches between subject headings and social tags averaged 22 items, approximate matches averaged 36 items, relative matches averaged 36 items, and non-matches averaged 116 items. According to the findings, exact matching of subject headings with social tags is the lowest category and non-matching the highest. The study showed that the average non-conformity of subject headings with social tags is even higher than the sum of the three types of exact, relative, and approximate matching. As a result, the relevance of subject headings to social tags is low. Subject headings take the form of static text, and users are not allowed to interact with them or insert newly selected words and topics, whereas websites based on Web 2.0 and the social classification system give users this possibility. An important point of the present study, and of earlier studies that have examined the syntactic and semantic matching of social tags with subject headings, is that the degree of conformity between the two is low.
Therefore, these two methods can complement each other and create a hybrid cataloging that includes subject headings and social tags. The low level of conformity of subject headings with social tags confirms the results of earlier studies that have compared the social tags of books with the Library of Congress subject headings. Matching social tags with subject headings is not enough on its own; the two methods can be complementary.

Keywords: Web 2.0, social tags, subject headings, hybrid cataloging

Procedia PDF Downloads 133
37 Synthesis of AgInS2–ZnS at Low Temperature with Tunable Photoluminescence for Photovoltaic Applications

Authors: Nitu Chhikara, S. B. Tyagi, Kiran Jain, Mamta Kharkwal

Abstract:

The I–III–VI2 semiconductor nanocrystals such as AgInS2 are of great interest for various applications such as optical devices (solar cells and LEDs), cellular imaging and bio-tagging. We synthesized phase- and shape-controlled chalcopyrite AgInS2 (AIS) colloidal nanoparticles by thermal decomposition of metal xanthate at low temperature in an organic solvent containing surfactant molecules. Here we focus on enhancing the photoluminescence of AgInS2 NPs by coating them with ZnS at low temperature for application in optical devices. The size of the core-shell NPs was less than 50 nm. By increasing the time and temperature, the emission wavelength of the ZnS-coated AgInS2 NPs could be tuned from the visible region to the IR. The QY of the AgInS2 NPs could be increased from 20 to 80% by coating with ZnS, which compares well with previously reported values. The synthesized NPs were characterized by PL, UV, XRD and TEM.

Keywords: PL, UV, XRD, TEM

Procedia PDF Downloads 344
36 Data about Loggerhead Sea Turtle (Caretta caretta) and Green Turtle (Chelonia mydas) in Vlora Bay, Albania

Authors: Enerit Sacdanaku, Idriz Haxhiu

Abstract:

This study was conducted in the area of Vlora Bay, Albania. Data on the sea turtles Caretta caretta and Chelonia mydas from two periods (1984–1991 and 2008–2014) are given. All data gathered were analyzed using recent methodologies. For all turtles captured (as bycatch), the Curved Carapace Length (CCL) and Curved Carapace Width (CCW) were measured. These data were statistically analyzed; the mean was 67.11 cm for CCL and 57.57 cm for CCW across all individuals studied (n=13). All untagged marine turtles were tagged using metallic tags (Stockbrand’s titanium tag) with an Albanian address. Sex was determined: 45.4% of individuals were females, 27.3% males and 27.3% juveniles. All turtles were examined for the presence of epibionts. The area of Vlora Bay is used by marine turtles (Caretta caretta) as a migratory corridor to pass from the Mediterranean to the northern part of the Adriatic Sea.

Keywords: Caretta caretta, Chelonia mydas, CCL, CCW, tagging, Vlora Bay

Procedia PDF Downloads 151
35 Neuro-Fuzzy Based Model for Phrase Level Emotion Understanding

Authors: Vadivel Ayyasamy

Abstract:

The present approach deals with the identification of emotions and the classification of emotional patterns at phrase level with respect to positive and negative orientation. The proposed approach considers emotion-triggering terms, their co-occurring terms and the associated sentences for recognizing emotions. It uses Part of Speech Tagging and Emotion Actifiers for classification. Sentence patterns are broken into phrases, and a Neuro-Fuzzy model is used for classification, which results in 16 patterns of emotional phrases. Suitable intensities are assigned to capture the degree of emotional content present in the semantics of the patterns. These emotional phrases are assigned weights, which help in deciding the positive or negative orientation of emotions. The approach uses web documents for experimental purposes, and the proposed classification approach performs well and achieves good F-scores.

Keywords: emotions, sentences, phrases, classification, patterns, fuzzy, positive orientation, negative orientation

Procedia PDF Downloads 348
34 Named Entity Recognition System for Tigrinya Language

Authors: Sham Kidane, Fitsum Gaim, Ibrahim Abdella, Sirak Asmerom, Yoel Ghebrihiwot, Simon Mulugeta, Natnael Ambassager

Abstract:

The lack of annotated datasets is a bottleneck to the progress of NLP in low-resourced languages. The work presented here consists of large-scale annotated datasets and models for the named entity recognition (NER) system for the Tigrinya language. Our manually constructed corpus comprises over 340K words tagged for NER, with over 118K of the tokens also having parts-of-speech (POS) tags, annotated with 12 distinct classes of entities, represented using several types of tagging schemes. We conducted extensive experiments covering convolutional neural networks and transformer models; the highest performance achieved is 88.8% weighted F1-score. These results are especially noteworthy given the unique challenges posed by Tigrinya’s distinct grammatical structure and complex word morphologies. The system can be an essential building block for the advancement of NLP systems in Tigrinya and other related low-resourced languages and serve as a bridge for cross-referencing against higher-resourced languages.
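The abstract mentions several tagging schemes without naming them. As one illustration, the widely used BIO scheme marks the first token of an entity with `B-`, the following tokens with `I-`, and everything else with `O`. The helper below and its English example are hypothetical and only show the encoding; they are not taken from the Tigrinya corpus.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans (start, end_exclusive, label) over token indices to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags

tokens = ["Ada", "Lovelace", "visited", "London"]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```

Other schemes differ mainly in how they mark entity boundaries, so the same span list can be re-encoded without re-annotating the text.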

Keywords: Tigrinya NER corpus, TiBERT, TiRoBERTa, BiLSTM-CRF

Procedia PDF Downloads 61
33 Social Data-Based Users Profiles' Enrichment

Authors: Amel Hannech, Mehdi Adda, Hamid Mcheick

Abstract:

In this paper, we propose a generic user profile model that integrates several elements that may positively impact the research process. We exploit the classical behavior of users and delimit their research activities into several research sessions enriched with contextual and temporal information, which makes it possible to reflect the users' current interests in each period of time and to infer data freshness. We argue that the annotation of resources gives more transparency on users' needs. It also strengthens social links among resources and users, and can thus increase the scope of the user profile. Based on this idea, we integrate the practice of social tagging in order to exploit users' social behavior to enrich their profiles. These profiles are then integrated into a recommendation system that predicts personalized items of interest to users, assisting them in their research and further enriching their profiles. Through these recommendations, we offer users new research experiences.

Keywords: user profiles, topical ontology, contextual information, folksonomies, tags' clusters, data freshness, association rules, data recommendation

Procedia PDF Downloads 238
32 The Advancements of Transformer Models in Part-of-Speech Tagging System for Low-Resource Tigrinya Language

Authors: Shamm Kidane, Ibrahim Abdella, Fitsum Gaim, Simon Mulugeta, Sirak Asmerom, Natnael Ambasager, Yoel Ghebrihiwot

Abstract:

The call for natural language processing (NLP) systems for low-resource languages has become more apparent than ever in the past few years, while arduous challenges remain in preparing such systems. This paper presents an improved version of the Nagaoka Tigrinya Corpus for a Parts-of-Speech (POS) classification system in the Tigrinya language. The initial Nagaoka dataset was extended, bringing the new tagged corpus to 118K tokens annotated with the 12 basic POS tags used previously. The additional content was also annotated manually in a stringent manner, following rules similar to those of the former dataset, and was formatted in CoNLL format. The system makes use of a recent approach in NLP tasks, namely the monolingually pre-trained TiELECTRA, TiBERT and TiRoBERTa transformer models. The highest score achieved is a weighted F1-score of 94.2%, which surpasses the previous systems by a significant margin. The system will prove useful for the progress of NLP-related tasks for Tigrinya and related low-resource languages, with room for cross-referencing against higher-resource languages.
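For readers unfamiliar with the reported metric, a weighted F1-score averages the per-class F1 values, weighting each class by its number of gold instances. The sketch below computes it from scratch on a made-up four-token example; the tags and predictions are hypothetical and unrelated to the paper's data.

```python
from collections import Counter

def weighted_f1(gold, pred):
    """Support-weighted average of per-class F1 scores."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    support = Counter(gold)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n * f1  # weight each class by its gold support
    return total / len(gold)

gold = ["N", "N", "V", "ADJ"]
pred = ["N", "V", "V", "ADJ"]
print(round(weighted_f1(gold, pred), 4))  # 0.75
```

Because frequent tags dominate the average, a high weighted F1 can coexist with weak performance on rare tags, which is worth keeping in mind when comparing systems.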

Keywords: Tigrinya POS corpus, TiBERT, TiRoBERTa, conditional random fields

Procedia PDF Downloads 55
31 MhAGCN: Multi-Head Attention Graph Convolutional Network for Web Services Classification

Authors: Bing Li, Zhi Li, Yilong Yang

Abstract:

Web service classification can improve the quality of service discovery and management in a service repository. It is widely used to locate the services that developers want. Although traditional classification methods based on supervised learning models can perform classification tasks, developers need to manually mark web services, and the quality of these tags may not be sufficient to establish an accurate classifier for service classification. With the doubling of the number of web services, the manual tagging method has become unrealistic. In recent years, the attention mechanism has made remarkable progress in the field of deep learning, and its huge potential has been fully demonstrated in various fields. This paper designs a multi-head attention graph convolutional network (MHAGCN) service classification method, which can assign different weights to the neighborhood nodes without complicated matrix operations or relying on understanding the entire graph structure. The framework combines the advantages of the attention mechanism and the graph convolutional neural network. It can classify web services through automatic feature extraction. The comprehensive experimental results on a real dataset not only show the superior performance of the proposed model over the existing models but also demonstrate its potentially good interpretability for graph analysis.
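
The idea of assigning different weights to neighborhood nodes using only local information can be sketched as follows. This is a simplified, pure-Python illustration of attention-weighted neighbor aggregation with several heads, not the MHAGCN architecture itself: the scoring vectors in `heads` stand in for learned parameters, and the learned feature transform and nonlinearity that a full graph attention layer would apply are omitted. All names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(node, neighbors, features, heads):
    """Aggregate the neighbors of one node, once per attention head.

    neighbors: dict mapping a node to a list of its neighbor nodes
    features:  dict mapping a node to its feature list (length d)
    heads:     list of scoring vectors (length 2*d), assumed to be learned
    """
    dim = len(features[node])
    output = []
    for scoring_vector in heads:
        # Score each neighbor from the concatenated (node, neighbor) features.
        scores = [dot(scoring_vector, features[node] + features[j])
                  for j in neighbors[node]]
        weights = softmax(scores)
        # Weighted average of neighbor features for this head.
        aggregated = [sum(w * features[j][k]
                          for w, j in zip(weights, neighbors[node]))
                      for k in range(dim)]
        output.extend(aggregated)  # concatenate the heads
    return output


features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2]}
# First head weights neighbors uniformly; second head favors node 2.
print(attend(0, neighbors, features, heads=[[0, 0, 0, 0], [0, 0, 10, 0]]))
```

Only the node's own neighbor list is consulted, which is what makes this kind of layer independent of the global graph structure.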

Keywords: attention mechanism, graph convolutional network, interpretability, service classification, service discovery

Procedia PDF Downloads 108
30 Atlantic Sailfish (Istiophorus albicans) Distribution off the East Coast of Florida from 2003 to 2018 in Response to Sea Surface Temperature

Authors: Meredith M. Pratt

Abstract:

The Atlantic sailfish (Istiophorus albicans) ranges from 40°N to 40°S in the Western Atlantic Ocean and has great economic and recreational value for sport fishers. Off the eastern coast of Florida, charter boats often target this species. Stuart, Florida, bills itself as the sailfish capital of the world. Sailfish tag data from The Billfish Foundation and NOAA was used to determine the relationship between sea surface temperature (SST) and the distribution of Atlantic sailfish caught and released over a fifteen-year period (2003 to 2018). Tagging information was collected from local sports fishermen in Florida. Using the time and location of each landed sailfish, a satellite-derived SST value was obtained for each point. The purpose of this study was to determine if sea surface warming was associated with changes in sailfish distribution. On average, sailfish were caught at 26.16 ± 1.70°C (x̄ ± s.d.) over the fifteen-year period. The most sailfish catches occurred at temperatures ranging from 25.2°C to 25.5°C. Over the fifteen-year period, sailfish catches decreased at lower temperatures (~23°C and ~24°C) and at 31°C. At ~25°C and ~30°C there was no change in catch numbers of sailfish. From 26°C to 29°C, there was an increase in the number of sailfish. Based on these results, increasing ocean temperatures will have an impact on the distribution and habitat utilization of sailfish. Warming sea surface temperatures create a need for more policy and regulation to protect the Atlantic sailfish and related highly migratory billfish species.

Keywords: Atlantic sailfish, billfish, Istiophorus albicans, sea surface temperature

Procedia PDF Downloads 106
29 Extraction of Compound Words in Malay Sentences Using Linguistic and Statistical Approaches

Authors: Zamri Abu Bakar Zamri, Normaly Kamal Ismail Normaly, Mohd Izani Mohamed Rawi Izani

Abstract:

Malay noun compounds are phrases that consist of two or more nouns. The key characteristic of noun compounds lies in their frequent occurrence within the text. Therefore, extracting these noun compounds is essential for several domains of research such as Information Retrieval, Sentiment Analysis and Question Answering. Many research efforts have proposed extracting Malay noun compounds using linguistic and statistical approaches. Most of the existing methods have concentrated on the extraction of bi-gram noun+noun compounds. However, extracting noun+verb, noun+adjective and noun+prepositional compounds is challenging due to the difficulty of selecting an appropriate method with effective results. Thus, there is still room for improvement in the effectiveness of compound word extraction. Therefore, this study proposes a combination of a linguistic approach and statistical measures in order to enhance the extraction of compound words. Several preprocessing steps are involved, including normalization, tokenization, and stemming. The linguistic approach used in this study is Part-of-Speech (POS) tagging. In addition, a new linguistic pattern for named entities has been utilized, using a list of Malay named entities, in order to enhance the linguistic approach in terms of noun compound recognition. The proposed statistical measures consist of NC-value, NTC-value and NLC-value.
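
The linguistic filtering step can be sketched as matching adjacent POS-tag pairs against a set of patterns. The sketch assumes the text has already been tokenized and tagged; the tag names (`NOUN`, `VERB`, `ADJ`), the pattern set, and the example sentence are illustrative assumptions, and the statistical measures named in the abstract (NC-value, NTC-value, NLC-value) are not implemented because the abstract does not define them.

```python
# Hypothetical tag pairs for the noun+noun, noun+verb and noun+adjective cases.
PATTERNS = {("NOUN", "NOUN"), ("NOUN", "VERB"), ("NOUN", "ADJ")}


def candidate_compounds(tagged_tokens):
    """Return adjacent word pairs whose POS tags match one of the patterns."""
    candidates = []
    for (word1, tag1), (word2, tag2) in zip(tagged_tokens, tagged_tokens[1:]):
        if (tag1, tag2) in PATTERNS:
            candidates.append((word1, word2))
    return candidates


# "kereta api" (train) followed by "tiba" (arrive), with assumed tags.
tagged = [("kereta", "NOUN"), ("api", "NOUN"), ("tiba", "VERB")]
print(candidate_compounds(tagged))
```

The pattern filter also returns the pair ("api", "tiba"), which is not a compound. Over-generation of this kind is the reason a statistical ranking step is applied to the candidates afterwards.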

Keywords: Compound Word, Noun Compound, Linguistic Approach, Statistical Approach

Procedia PDF Downloads 315
28 Following the Modulation of Transcriptional Activity of Genes by Chromatin Modifications during the Cell Cycle in Living Cells

Authors: Sharon Yunger, Liat Altman, Yuval Garini, Yaron Shav-Tal

Abstract:

Understanding the dynamics of transcription in living cells has improved since the development of quantitative fluorescence-based imaging techniques. We established a method for following transcription from a single copy gene in living cells. A gene tagged with MS2 repeats, used for mRNA tagging, in its 3' UTR was integrated into a single genomic locus. The actively transcribing gene was detected and analyzed by fluorescence in situ hybridization (FISH) and live-cell imaging. Several cell clones were created that differed in the promoter regulating the gene. Thus, comparative analysis could be obtained without the risk of different position effects at each integration site. Cells in S/G2 phases could be detected exhibiting two adjacent transcription sites on sister chromatids. A sharp reduction in the transcription levels was observed as cells progressed along the cell cycle. We hypothesized that a change in chromatin structure acts as a general mechanism during the cell cycle leading to down-regulation in the activity of some genes. We addressed this question by treating the cells with chromatin decondensing agents. Quantifying and imaging the treated cells suggests that chromatin structure plays a role both in regulating transcriptional levels along the cell cycle, as well as in limiting an active gene from reaching its maximum transcription potential at any given time. These results contribute to understanding the role of chromatin as a regulator of gene expression.

Keywords: cell cycle, living cells, nucleus, transcription

Procedia PDF Downloads 255
27 Characteristic Sentence Stems in Academic English Texts: Definition, Identification, and Extraction

Authors: Jingjie Li, Wenjie Hu

Abstract:

Phraseological units in academic English texts have been a central focus in recent corpus linguistic research. A wide variety of phraseological units have been explored, including collocations, chunks, lexical bundles, patterns, semantic sequences, etc. This paper describes a special category of clause-level phraseological units, namely, Characteristic Sentence Stems (CSSs), with a view to describing their defining criteria and extraction method. CSSs are contiguous lexico-grammatical sequences which contain a subject-predicate structure and which are frame expressions characteristic of academic writing. The extraction of CSSs consists of six steps: part-of-speech tagging, n-gram segmentation, structure identification, significance of occurrence calculation, text range calculation, and overlapping sequence reduction. Significance of occurrence calculation is the crux of this study. It includes computing both the internal association and the boundary independence of a CSS, and so tests the significance of the CSS's occurrence from both internal and external perspectives. A new normalization algorithm is also introduced into the calculation of LocalMaxs for reducing overlapping sequences. It is argued that many sentence stems are so recurrent in academic texts that the most typical of them have become the habitual ways of making meaning in academic writing. Therefore, studies of CSSs could have potential implications and reference value for academic discourse analysis, English for Academic Purposes (EAP) teaching and writing.
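
Two of the six steps, n-gram segmentation and text range calculation, are simple enough to sketch directly. The code below assumes each text is already tokenized and takes "text range" to mean the number of distinct texts in which a sequence occurs; that reading, the function names, and the toy corpus are assumptions. The significance measures and the LocalMaxs normalization are not shown because the abstract does not specify them.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-token sequences from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_and_range(texts, n):
    """Count each n-gram and the number of distinct texts containing it."""
    frequency = Counter()
    containing_texts = defaultdict(set)
    for text_id, tokens in enumerate(texts):
        for gram in ngrams(tokens, n):
            frequency[gram] += 1
            containing_texts[gram].add(text_id)
    text_range = {gram: len(ids) for gram, ids in containing_texts.items()}
    return frequency, text_range


# Toy corpus of three tokenized "texts".
corpus = [
    ["it", "is", "argued", "that", "x"],
    ["it", "is", "argued", "that", "y"],
    ["it", "is", "clear"],
]
freq, rng = frequency_and_range(corpus, 3)
print(freq[("it", "is", "argued")], rng[("it", "is", "argued")])
```

Keeping frequency and range separate matters because a sequence repeated many times in a single text is less characteristic of a register than one spread across many texts.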

Keywords: characteristic sentence stem, extraction method, phraseological unit, the statistical measure

Procedia PDF Downloads 135
26 Personalized Social Resource Recommender Systems on Interest-Based Social Networks

Authors: C. L. Huang, J. J. Sia

Abstract:

Interest-based social networks, also known as social bookmark sharing systems, are useful platforms for people to conveniently read and collect internet resources. These platforms also provide social network functions, so users can share and explore internet resources through their social networks. Providing personalized internet resources to users is an important issue on these platforms. This study uses two types of relationship on the social networks, following and follower, and proposes a collaborative recommender system consisting of two main steps. First, this study calculates the relationship strength between the target user and the target user's followings and followers to find the top-N similar neighbors. Second, from the top-N similar neighbors, the articles (internet resources) that may interest the target user are recommended to the target user. In this system, users can efficiently obtain recent, related and diverse internet resources (knowledge) from the interest-based social network. This study collected the experimental dataset from Diigo, a well-known bookmark sharing system. The experimental results show that the proposed recommendation model is more accurate than two traditional baseline recommendation models but slightly less accurate than the cosine model. However, on the diversity and execution time metrics, the proposed model outperforms the cosine model.
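
The two-step procedure can be sketched as below. The abstract does not define how relationship strength is calculated, so this sketch substitutes the Jaccard overlap of two users' bookmarked articles as a stand-in; that choice, the function names, and the sample data are all assumptions rather than the authors' method.

```python
def jaccard(a, b):
    """Overlap of two sets, used here as an assumed relationship strength."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, links, bookmarks, top_n=2, k=3):
    """Recommend up to k articles from the target's top_n strongest neighbors.

    links:     dict mapping a user to the set of users they follow
               or are followed by
    bookmarks: dict mapping a user to the set of articles they saved
    """
    own = bookmarks[target]
    strength = {u: jaccard(own, bookmarks[u]) for u in links.get(target, set())}
    # Step 1: keep the top-N neighbors by relationship strength.
    neighbors = sorted(strength, key=lambda u: (-strength[u], u))[:top_n]
    # Step 2: score unseen articles by the strength of the neighbors who saved them.
    scores = {}
    for user in neighbors:
        for article in bookmarks[user] - own:
            scores[article] = scores.get(article, 0.0) + strength[user]
    return sorted(scores, key=lambda a: (-scores[a], a))[:k]


bookmarks = {
    "ann": {"a1", "a2"},
    "bob": {"a1", "a2", "a3"},
    "cy": {"a2", "a4"},
    "dee": {"a9"},
}
links = {"ann": {"bob", "cy", "dee"}}
print(recommend("ann", links, bookmarks))
```

Restricting candidates to followings and followers keeps the neighbor search small compared with scoring every user on the platform, which is consistent with the execution-time advantage the abstract reports, although the abstract does not attribute the advantage to this.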

Keywords: recommender systems, social networks, tagging, bookmark sharing systems, collaborative recommender systems, knowledge management

Procedia PDF Downloads 137
25 Label Free Detection of Small Molecules Using Surface-Enhanced Raman Spectroscopy with Gold Nanoparticles Synthesized with Various Capping Agents

Authors: Zahra Khan

Abstract:

Surface-Enhanced Raman Spectroscopy (SERS) has received increased attention in recent years, focusing on biological and medical applications due to its great sensitivity as well as molecular specificity. In the context of biological samples, there are generally two methodologies for SERS-based applications: label-free detection and the use of SERS tags. The necessity of tagging can make the process slower and limits its use in real-life settings. Label-free detection offers the advantage that it reports direct spectroscopic evidence associated with the target molecule rather than the label. Reproducible, highly monodisperse gold nanoparticles (Au NPs) were synthesized using a relatively facile seed-mediated growth method. Different capping agents (TRIS, citrate, and CTAB) were used during synthesis, and characterization was performed. The NPs were then mixed with different analyte solutions before drop-casting onto a glass slide prior to Raman measurements, to see which NPs displayed the highest SERS activity as well as their stability. A host of different analytes were tested, both non-biomolecules and biomolecules, which were all successfully detected using this method at concentrations as low as 10⁻³ M, with salicylic acid reaching a detection limit in the nanomolar range. SERS was also performed on samples with a mixture of analytes present, whereby peaks from both target molecules were distinctly observed. This is a fast and effective way of testing samples and offers potential applications in the biomedical field as a tool for diagnostic and treatment purposes.

Keywords: gold nanoparticles, label free, seed-mediated growth, SERS

Procedia PDF Downloads 94
24 Policy Brief/Note of Philippine Health Issues: Human Rights Violations Committed on Healthcare Workers

Authors: Trina Isabel Santiago, Daniel Chua, Jumee Tayaban, Joseph Daniel Timbol, Joshua Yanes

Abstract:

Numerous instances of human rights violations against healthcare workers have been reported during the COVID-19 pandemic in the Philippines. This brief aims to explore these civil and political rights violations and propose recommendations to address them. Our review shows that a wide range of civil and political human rights violations have been committed by individual citizens and government agencies against individual healthcare workers and health worker groups. These violations include discrimination, red-tagging, evictions, illegal arrests, and acts of violence ranging from chemical attacks to homicide. If left unchecked, these issues, compounded by the pandemic, may lead to the exacerbation of the pre-existing problems of the Philippine healthcare system. Despite existing reports by human rights groups and public media articles, there still seems to be a lack of government action to condemn and prevent these violations. The existence of government agencies which directly contribute to these violations, together with the lack of condemnation from other agencies, further propagates the problem. Given these issues, this policy brief recommends the establishment of an interagency task force for the protection of the human rights of healthcare workers, as well as the expedited passing of current legislative bills towards the same goal. For more immediate action, we call for the establishment of a dedicated hotline for these incidents with adequate appointment and training of point persons, construction of clear guidelines, and closer collaboration between government agencies in standing united against these issues.

Keywords: human rights violations, healthcare workers, COVID-19 pandemic, Philippines

Procedia PDF Downloads 579
23 Designing of Content Management Systems (CMS) for Web Development

Authors: Abdul Basit Kiani, Maryam Kiani

Abstract:

Content Management Systems (CMS) have transformed the landscape of web development by providing an accessible and efficient platform for creating and managing digital content. This abstract explores the key features and benefits of CMS in web development, highlighting its impact on website creation and maintenance. CMS offers a user-friendly interface that empowers individuals to create, edit, and publish content without requiring extensive technical knowledge. With customizable templates and themes, users can personalize the design and layout of their websites, ensuring a visually appealing online presence. Furthermore, CMS facilitates efficient content organization through categorization and tagging, enabling visitors to navigate and search for information effortlessly. It also supports version control, allowing users to track and manage revisions effectively. Scalability is a notable advantage of CMS, as it offers a wide range of plugins and extensions to integrate additional features into websites. From e-commerce functionality to social media integration, CMS adapts to evolving business needs. Additionally, CMS enhances collaborative workflows by allowing multiple user roles and permissions. This enables teams to collaborate effectively on content creation and management, streamlining processes and ensuring smooth coordination. In conclusion, CMS serves as a powerful tool in web development, simplifying content creation, customization, organization, scalability, and collaboration. With CMS, individuals and businesses can create dynamic and engaging websites, establishing a strong online presence with ease.

Keywords: web development, content management systems, information technology, programming

Procedia PDF Downloads 46