Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 264

Search results for: grammatical annotation

174 Identifying Protein-Coding and Non-Coding Regions in Transcriptomes

Abstract:

Protein-coding and Non-coding regions determine the biology of a sequenced transcriptome. Research advances have shown that Non-coding regions are important in disease progression and clinical diagnosis. Existing bioinformatics tools have been targeted towards Protein-coding regions alone. Therefore, there are challenges associated with gaining biological insights from transcriptome sequence data. These tools are also limited to computationally intensive sequence alignment, which is inadequate and less accurate for identifying both Protein-coding and Non-coding regions. Alignment-free techniques can overcome this limitation and identify both regions. Therefore, this study was designed to develop an efficient sequence alignment-free model for identifying both Protein-coding and Non-coding regions in sequenced transcriptomes. Feature grouping and randomization procedures were applied to the input transcriptomes (37,503 data points). Successive iterations were carried out to compute the gradient vector that converged the developed Protein-coding and Non-coding Region Identifier (PNRI) model to the approximate coefficient vector. The logistic regression algorithm was used with a sigmoid activation function. A parameter vector was estimated for every sample in the 37,503 data points in order to reduce the generalization error and cost. Maximum Likelihood Estimation (MLE) was used for parameter estimation by taking the log-likelihood of six features and combining them into a summation function. Dynamic thresholding was used to classify the Protein-coding and Non-coding regions, and the Receiver Operating Characteristic (ROC) curve was determined. The generalization performance of PNRI was determined in terms of F1 score, accuracy, sensitivity, and specificity. The average generalization performance of PNRI was determined using a benchmark of multi-species organisms.
The generalization error for identifying Protein-coding and Non-coding regions decreased from 0.514 to 0.508 and then to 0.378 over the first three iterations. The cost (difference between the predicted and the actual outcome) also decreased from 1.446 to 0.842 and to 0.718 for the first, second, and third iterations, respectively. The iterations terminated at the 390th epoch, with an error of 0.036 and a cost of 0.316. The computed elements of the parameter vector that maximized the objective function were 0.043, 0.519, 0.715, 0.878, 1.157, and 2.575. The PNRI gave an ROC of 0.97, indicating an improved predictive ability. The PNRI identified both Protein-coding and Non-coding regions with an F1 score of 0.970, accuracy of 0.969, sensitivity of 0.966, and specificity of 0.973. Using 13 non-human multi-species model organisms, the average generalization performance of the traditional method was 74.4%, while that of the developed model was 85.2%, making the developed model better at identifying Protein-coding and Non-coding regions in transcriptomes. The developed Protein-coding and Non-coding region identifier model efficiently identified the Protein-coding and Non-coding transcriptomic regions. It could be used in genome annotation and in the analysis of transcriptomes.
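The ingredients named in this abstract (a sigmoid-activated logistic regression fitted by following the gradient of the log-likelihood, then scored by F1, accuracy, sensitivity, and specificity) can be sketched roughly as follows. This is only an illustration and not the PNRI implementation: the function names, the learning rate, and the fixed 0.5 cut-off are assumptions, and the dynamic thresholding described in the abstract is not reproduced.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features):
    """Probability of the positive class for one sample."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def gradient_ascent_step(weights, rows, labels, learning_rate=0.1):
    """One batch update in the direction of the log-likelihood gradient.

    The learning rate is a hypothetical value chosen for illustration.
    """
    gradient = [0.0] * len(weights)
    for features, label in zip(rows, labels):
        error = label - predict_proba(weights, features)
        for j, x in enumerate(features):
            gradient[j] += error * x
    n = len(rows)
    return [w + learning_rate * g / n for w, g in zip(weights, gradient)]


def classify(weights, features, threshold=0.5):
    """Hard label from a fixed threshold (an assumption, not dynamic)."""
    return 1 if predict_proba(weights, features) >= threshold else 0


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

Repeating `gradient_ascent_step` until the update becomes negligible plays the role of the successive iterations mentioned above; sensitivity and specificity are computed separately because they measure errors on the two classes independently.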

Keywords: sequence alignment-free model, dynamic thresholding classification, input randomization, genome annotation

Procedia PDF Downloads 26

173 Enhancement of Indexing Model for Heterogeneous Multimedia Documents: User Profile Based Approach

Authors: Aicha Aggoune, Abdelkrim Bouramoul, Mohamed Khiereddine Kholladi

Abstract:

Recent research shows that the user profile, as an important element, can improve heterogeneous information retrieval through its content. In this context, we present our indexing model for heterogeneous multimedia documents. This model is based on integrating the user profile into the indexing process. The general idea of our proposal is to exploit the concepts common to the representation of a document and the definition of a user through his or her profile. These two elements are added as additional indexing entities to enrich the indexes of the heterogeneous corpus documents. We have developed the IRONTO domain ontology, which allows the annotation of documents. We also present the tool developed to validate the proposed model.

Keywords: indexing model, user profile, multimedia document, heterogeneity of sources, ontology

Procedia PDF Downloads 321

172 The Influence of Modernity and Globalization upon Language: The Korean Language between Confucianism and Americanization

Authors: Raluca-Ioana Antonescu

Abstract:

The research field of the paper stands at the intersection of Linguistics and Sociology, and the research problem is the importance of language in the modernization process and in a globalized society. The research objective is to show that language is a stimulant for modernity while it also defines the tradition and culture of a specific society. In order to examine the linguistic change of the Korean language due to modernity and globalization, the paper tries to answer one main question, ‘What changes did the Korean language undergo from a traditional version of Korean towards one influenced by modernity?’, and two secondary questions, ‘How are the relations between globalization (and modernity) and culture (focusing on language) explored in the specialized literature?’ and ‘What influences the Korean language?’ To answer the research questions, the paper adopts the main premise that, due to modernity and globalization, the Korean language changed its discourse construction, along with two secondary hypotheses: first, that the literature has explored the relations between culture and modernity less in terms of language discourse construction than in terms of identity issues and commodification problems; and second, that the Korean language is influenced by traditional values (like Confucianism) while also being influenced by the globalization process (especially by the English language). In terms of methodology, the paper analyzes the two main influences upon the Korean language, traditionalism (defined as the influence of Confucianism) and modernism (the influence of other countries’ languages and cultures), and how the Korean language was constructed and modified by these two elements. The paper analyzes at what level (grammatical, lexical, etc.) traditionalism helped construct the Korean language, and what changes modernism brought at each level. As for the results, the influence of modernism changed the Korean language both lexically and grammatically. In 60 years the increase of English influence is astonishing, and this paper shows the main changes the Korean language underwent, such as loanwords (Konglish), but also the reduction of speech levels and the easing of register variation use. The grammatical influence of modernity and globalization can therefore be seen in the reduction of speech levels and register variation, while the lexical change comes especially from the influence of the English language, with about 10% of the Korean vocabulary considered to be loanwords. The paper also presents the interrelation between traditionalism and modernity, with the example of Konglish among others (we can also consider the Korean greetings that Koreans translate when they speak other languages, bringing their cultural characteristics into English discourse construction), which makes Koreans global, since they speak an international language, but still local, since they cannot completely get rid of their culture.

Keywords: Confucianism, globalization, language and linguistic change, modernism, traditionalism

Procedia PDF Downloads 170

171 Linguistic Analysis of Argumentation Structures in Georgian Political Speeches

Authors: Mariam Matiashvili

Abstract:

Argumentation is an integral part of our daily communication, formal or informal. Argumentative reasoning, techniques, and language tools are used both in personal conversations and in the business environment. Verbalizing opinions requires the use of extraordinary syntactic-pragmatic structural quantities, namely arguments that add credibility to the statement. The study of argumentative structures allows us to identify the linguistic features that make a text argumentative. Knowing what elements make up an argumentative text in a particular language helps the users of that language improve their skills. Also, natural language processing (NLP) has become especially relevant recently. In this context, one of the main emphases is on the computational processing of argumentative texts, which will enable the automatic recognition and analysis of large volumes of textual data. The research deals with the linguistic analysis of the argumentative structures of Georgian political speeches, particularly the linguistic structure, characteristics, and functions of the parts of the argumentative text: claims, support, and attack statements. The research aims to describe the linguistic cues that give a sentence a judgmental/controversial character and help to identify the reasoning parts of the argumentative text. The empirical data comes from the Georgian Political Corpus, particularly TV debates. Consequently, the texts are of a dialogical nature, representing a discussion between two or more people (most often between a journalist and a politician).
The research uses the following approaches to identify and analyze the argumentative structures: (1) Lexical Classification and Analysis, which identifies lexical items that are relevant to the process of creating argumentative texts and builds the lexicon of argumentation (groups of words gathered from a semantic point of view); (2) Grammatical Analysis and Classification, meaning grammatical analysis of the words and phrases identified on the basis of the argumentation lexicon; and (3) Argumentation Schemas, which describes and identifies the argumentation schemes most likely to be used in Georgian political speeches. As a final step, we analyzed the relations between the above-mentioned components. For example, if an identified argument scheme is “Argument from Analogy”, the identified lexical items semantically express analogy too, and they are most likely adverbs in Georgian. As a result, we created a lexicon of the words that play a significant role in creating Georgian argumentative structures. Linguistic analysis has shown that verbs play a crucial role in creating argumentative structures.

Keywords: Georgian, argumentation schemas, argumentation structures, argumentation lexicon

Procedia PDF Downloads 47

170 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The high demand for image annotation and archiving on the web has led researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: neither the description of the elementary characteristics of the image nor the correlation between labels is taken into account. In this paper, we present an algorithm (MIML-HOGLPP) that handles both limitations simultaneously. The algorithm uses the histogram of gradients as a feature descriptor. It applies the Label Priority Power-set as a multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better on some of the evaluation metrics than those of the two existing techniques.
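As a rough illustration of what a power-set style problem transformation does, the sketch below maps every distinct set of labels to one class of an ordinary single-label problem, so that labels occurring together are kept together. It shows only the plain label power-set idea; the priority mechanism of the Label Priority Power-set used by MIML-HOGLPP is not described in the abstract and is not reproduced here. The function name and the example labels are hypothetical.

```python
def label_powerset_encode(label_sets):
    """Map each distinct label set to a single class index.

    Returns the encoded class per sample and a dictionary that maps a
    class index back to its original label set.
    """
    class_of = {}
    encoded = []
    for labels in label_sets:
        key = frozenset(labels)  # order of labels must not matter
        if key not in class_of:
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    decode = {index: set(key) for key, index in class_of.items()}
    return encoded, decode


# Hypothetical image annotations: the first and third images carry the
# same label set, so they receive the same class.
encoded, decode = label_powerset_encode(
    [{"sky", "sea"}, {"tree"}, {"sea", "sky"}]
)
```

Any single-label classifier can then be trained on the encoded classes, and a predicted class is turned back into a set of labels through `decode`.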

Keywords: data mining, information retrieval system, multi-label, problem transformation, histogram of gradients

Procedia PDF Downloads 346

169 Language in Court: Ideology, Power and Cognition

Authors: Mehdi Damaliamiri

Abstract:

Undoubtedly, the power of language is hardly a new topic; indeed, the persuasive power of language accompanied by ideology has long been recognized in different aspects of life. The two-and-a-half-thousand-year-old Bisitun inscriptions in Iran, proclaiming the victories of the Persian King, Darius, are considered by some historians to have been an early example of the use of propaganda. Added to this, the modern age is the true cradle of fully-fledged ideologies and the ongoing process of centrifugal ideologization. The most visible work on ideology today within the field of linguistics is “Critical Discourse Analysis” (CDA). The focus of CDA is on “uncovering injustice, inequality, taking sides with the powerless and suppressed” and making “mechanisms of manipulation, discrimination, demagogy, and propaganda explicit and transparent.” One possible way of relating language to ideology is to propose that ideology and language are inextricably intertwined. From this perspective, language is always ideological, and ideology depends on language. All language use involves ideology, and so ideology is ubiquitous – in our everyday encounters, as much as in the business of the struggle for power within and between nation-states and social statuses. At the same time, ideology requires language. Its key characteristics – its power and pervasiveness, its mechanisms for continuity and for change – all come out of the inner organization of language. The two phenomena are homologous: they share the same evolutionary trajectory. To get a more robust portrait of power and ideology, we need to examine their potential place in the structure and consider how such structures pattern in terms of the functional elements which organize meanings in the clause. This is based on the belief that all grammatical, including syntactic, knowledge is stored mentally as constructions, a view that has become immensely popular.
When the structure of the clause is taken into account, power and ideology have a preference for Complement over Subject and Adjunct. The subject is a central interpersonal element in discourse: it is one of two elements that form the central interactive nub of a proposition. Conceptually, there are countless ways of construing a given event, and linguistically, a variety of grammatical devices are usually available as alternate means of coding a given conception, such as political crime and corruption. In the theory of construal, then, which, like transitivity in Halliday, makes options available, Cognitive Linguistics can offer a cognitive account of ideology in language, where ideology is made possible by the choices a language allows for representing the same material situation in different ways. The possibility of promoting alternative construals of the same reality means that any particular choice in representation is always ideologically constrained or motivated and indicates the perspective and interests of the text-producer.

Keywords: power, ideology, court, discourse

Procedia PDF Downloads 136

168 A Comparison of YOLO Family for Apple Detection and Counting in Orchards

Authors: Yuanqing Li, Changyi Lei, Zhaopeng Xue, Zhuo Zheng, Yanbo Long

Abstract:

In agricultural production and breeding, implementing automatic picking robots in orchard farming to reduce human labour and error is challenging. Their core function is automatic identification based on machine vision. This paper focuses on apple detection and counting in orchards and implements several deep learning methods. Extensive datasets are used, and a semi-automatic annotation method is proposed. The proposed deep learning models belong to the state-of-the-art YOLO family. Given the nature of the models with their various backbones, a detailed multi-dimensional comparison is made in terms of counting accuracy, mAP, and model memory, laying the foundation for realising automatic precision agriculture.

Keywords: agricultural object detection, deep learning, machine vision, YOLO family

Procedia PDF Downloads 164

167 Scalable Learning of Tree-Based Models on Sparsely Representable Data

Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou

Abstract:

Many machine learning tasks, such as text annotation, require training over very large datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of-the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than of the number of non-zero input elements. In this paper, we propose an efficient splitting algorithm that leverages input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude, and it has been incorporated in the current version of scikit-learn (scikit-learn.org), the most popular open source Python machine learning library.
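To make the sparsity argument concrete, the sketch below evaluates one candidate split of the form `x <= threshold` while visiting only the non-zero entries of a feature column; the statistics of the implicit zeros are recovered by subtraction from totals that are known once per tree node. It is a simplified illustration under assumed names and binary labels, not the splitting algorithm of the paper and not the scikit-learn implementation.

```python
def left_partition_stats(nonzero_rows, nonzero_values, y,
                         n_samples, total_positives, threshold):
    """Return (n_left, positives_left) for the split `x <= threshold`.

    nonzero_rows / nonzero_values: sparse form of one feature column.
    y: binary labels indexed by row.
    n_samples / total_positives: node-level totals, assumed to be
    computed once per node rather than once per feature.
    The loop costs O(number of non-zeros), not O(n_samples).
    """
    nz_left = 0        # non-zero entries that go to the left child
    nz_left_pos = 0    # positives among those entries
    nz_pos = 0         # positives among all non-zero entries
    for row, value in zip(nonzero_rows, nonzero_values):
        nz_pos += y[row]
        if value <= threshold:
            nz_left += 1
            nz_left_pos += y[row]

    if threshold >= 0:
        # Every implicit zero satisfies 0 <= threshold and goes left.
        n_zero = n_samples - len(nonzero_rows)
        zero_pos = total_positives - nz_pos
        return nz_left + n_zero, nz_left_pos + zero_pos
    # With a negative threshold, no implicit zero goes left.
    return nz_left, nz_left_pos


# Hypothetical column [0, 2.0, 0, -1.0, 3.0] stored sparsely.
y = [1, 0, 1, 0, 1]
stats = left_partition_stats([1, 3, 4], [2.0, -1.0, 3.0], y,
                             n_samples=5, total_positives=3,
                             threshold=0.5)
```

From `n_left` and `positives_left`, the right-hand statistics follow by subtraction, which is all an impurity criterion such as Gini or entropy needs for a binary target.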

Keywords: big data, sparsely representable data, tree-based models, scalable learning

Procedia PDF Downloads 236

166 The Platform for Digitization of Georgian Documents

Authors: Erekle Magradze, Davit Soselia, Levan Shughliashvili, Irakli Koberidze, Shota Tsiskaridze, Victor Kakhniashvili, Tamar Chaghiashvili

Abstract:

Since the beginning of active publishing activity in Georgia, voluminous printed material has been accumulated, the digitization of which is an important task. Digitized materials will be available to the audience, and it will be possible to find text in them and conduct various factual research. Digitizing scanned documents means scanning documents, extracting text from the scanned documents, and processing the text into a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable, and automated platform, where the digital service developed for each stage will perform the task assigned to it; at the same time, it will be possible to develop these services dynamically so that there is no interruption in the work of the platform.

Keywords: NLP, OCR, BERT, Kubernetes, transformers

Procedia PDF Downloads 114

165 A Stylistic Analysis of the Short Story ‘The Escape’ by Qaisra Shahraz

Authors: Huma Javed

Abstract:

Stylistics is a broad term concerned with both literature and linguistics, which increases its significance. This research aims to analyze Qaisra Shahraz's short story ‘The Escape’ from the viewpoint of stylistic analysis. The study focuses on three aspects of the short story: grammatical category, lexical category, and figure of speech. The research design of this article is both explorative and descriptive. The analysis of the data shows that the writer has used more nouns in the story than other lexical items, which suggests that the story has a descriptive rather than narrative style.

Keywords: The Escape, stylistics, grammatical category, lexical category, figure of speech

Procedia PDF Downloads 190

164 A Syntactic Approach to Applied and Socio-Linguistics in Arabic Language in Modern Communications

Authors: Adeyemo Abduljeeel Taiwo

Abstract:

This research attempts to create a conducive atmosphere for a phonological and morphological compendium of the Arabic language in Modern Standard Arabic (MSA) for modern-day communications. The research is carried out with the chief aim of a grammatical analysis of the two broad fields of Arabic linguistics, namely Applied and Socio-Linguistics. It draws a pictorial record of Applied and Socio-Linguistics in Arabic phonology and morphology. Thematically, it postulates and contemplates, to a large degree, the theory of concord in contemporary modern Arabic language acquisition. It utilizes an analytical method and portrays Arabic as a Semitic language that promotes linguistics and syntax among the scholars of these fields.

Keywords: Arabic language, applied linguistics, socio-linguistics, modern communications

Procedia PDF Downloads 295

163 Background Knowledge and Reading Comprehension in ELT Classes: A Pedagogical Perspective

Authors: Davoud Ansari Kejal, Meysam Sabour

Abstract:

For a long time, it has been believed that a reader can easily comprehend a text if he or she is strong enough in vocabulary and grammatical knowledge, but this view took no account of the ability to understand different subjects based on the reader’s understanding of the surrounding world, which is called world background knowledge. This paper investigates the reading comprehension process, applying schema theory as an influential factor in comprehending texts, in order to demonstrate the important role of background knowledge in reading comprehension. Based on the discussion, some teaching methods are suggested for employing world background knowledge in the elaborated teaching of reading comprehension in an active learning environment in EFL classes.

Keywords: background knowledge, reading comprehension, schema theory, ELT classes

Procedia PDF Downloads 421

162 The Effects of the Inference Process in Reading Texts in Arabic

Authors: May George

Abstract:

Inference plays an important role in the learning process, and it can lead to rapid acquisition of a second language. When learning a non-native language, such as a critical language like Arabic, students depend on the teacher’s support most of the time to learn new concepts. The students focus on memorizing new vocabulary and stress learning all the grammatical rules. Hence, the students become mechanical and cannot produce the language easily. As a result, they are unable to predict the meaning of words in context because they rely heavily on the teacher: they cannot link their prior knowledge or even identify the meaning of words without the teacher’s support. This study explores how the teacher guides students’ learning during the inference process and which learning processes can direct students’ inference.

Keywords: inference, reading, Arabic, language acquisition

Procedia PDF Downloads 501

161 Opinion Mining and Sentiment Analysis on DEFT

Authors: Najiba Ouled Omar, Azza Harbaoui, Henda Ben Ghezala

Abstract:

Current research applies sentiment analysis with a focus on social networks. The DEfi Fouille de Texte (DEFT) (Text Mining Challenge) evaluation campaign focuses on opinion mining and sentiment analysis on social networks, especially Twitter. It aims to compare the systems produced by several teams from public and private research laboratories. DEFT offers participants the opportunity to work on regularly renewed themes and has proposed opinion mining tasks in several editions. The purpose of this article is to scrutinize and analyze the work on opinion mining and sentiment analysis on the Twitter social network carried out within DEFT. It examines the tasks proposed by the organizers of the challenge and the methods used by the participants.

Keywords: opinion mining, sentiment analysis, emotion, polarity, annotation, OSEE, figurative language, DEFT, Twitter, Tweet

Procedia PDF Downloads 109

160 The English Translation of Arabic Metaphors in the Holy Qura’n

Authors: Mohammad Hamzah Alshehab

Abstract:

Metaphor is a substitute expression used in everyday life in language, thought, and action. It has an original value in language use, with different conceptual and grammatical properties. In addition, it is a central concept in literary studies. The present paper aims at investigating the types of metaphor embedded in some Holy Verses (HV). To achieve the objectives of this paper, two English versions were used: the first is the Translation of the Meanings of the Noble Qura’n in the English Language by Mohammad AlHilali and Mohammad Khan, and the second is the English Translation of the Holy Qura’n by Mohammad Ali. The researcher selected 20 Holy Verses that include metaphors to be analyzed and investigated. Metaphor types were categorized through an assessment of the two translations, followed by a discussion comparing the two versions.

Keywords: metaphor, metaphor’s types, Holy Qura’n, Holy Verses

Procedia PDF Downloads 616

159 The Effect of Information vs. Reasoning Gap Tasks on the Frequency of Conversational Strategies and Accuracy in Speaking among Iranian Intermediate EFL Learners

Authors: Hooriya Sadr Dadras, Shiva Seyed Erfani

Abstract:

Speaking skills merit meticulous attention on the part of both learners and teachers. In particular, accuracy is a critical component in guaranteeing that messages are conveyed through conversation, because a wrongful change may adversely alter the content and purpose of the talk. Different types of tasks have served teachers to meet numerous educational objectives. Besides, negotiation of meaning and the use of different strategies have been areas of concern in socio-cultural theories of SLA. Negotiation of meaning is among the conversational processes which have a crucial role in facilitating the understanding and expression of meaning in a given second language. Conversational strategies are used during interaction when there is a breakdown in communication that leads to the interlocutor attempting to remedy the gap through talk. Therefore, this study was an attempt to investigate whether there was any significant difference between the effects of reasoning gap tasks and information gap tasks on the frequency of conversational strategies used in negotiation of meaning in classrooms on the one hand, and on the speaking accuracy of Iranian intermediate EFL learners on the other. After a pilot study to check the practicality of the treatments, at the outset of the main study, the Preliminary English Test (PET) was administered to ensure the homogeneity of 87 out of 107 participants who attended the intact classes of a 15-session term in one control and two experimental groups. Also, the speaking sections of the PET were used as pretest and posttest to examine their speaking accuracy. The tests were recorded and transcribed, and speaking accuracy was measured as the percentage of clauses with no grammatical errors among all clauses produced. In all groups, the grammatical points of accuracy were taught and the use of conversational strategies was practiced.
Then, different kinds of reasoning gap tasks (matchmaking, deciding on the course of action, and working out a timetable) and information gap tasks (restoring an incomplete chart, spotting the differences, arranging sentences into stories, and a guessing game) were used in the experimental groups during treatment sessions, and the students were required to practice conversational strategies when doing speaking tasks. The conversations throughout the term were recorded and transcribed to count the frequency of the conversational strategies used in all groups. The results of the statistical analysis demonstrated that applying both the reasoning gap tasks and the information gap tasks significantly affected the frequency of conversational strategies through negotiation. Although both led to improvements, the reasoning gap tasks had a more significant impact on encouraging the negotiation of meaning and increasing the frequency of conversational strategies in every session. The findings also indicated that both task types could help learners significantly improve their speaking accuracy. Here, applying the reasoning gap tasks was more effective than the information gap tasks in improving the level of learners’ speaking accuracy.

Keywords: accuracy in speaking, conversational strategies, information gap tasks, reasoning gap tasks

Procedia PDF Downloads 281

158 Number Variation of the Personal Pronoun We in American Spoken English

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of a language community, which might become the developmental trend of that language. The personal pronoun we is prescribed as a plural pronoun in grammar, but its number value is more flexible in actual use. Based on a self-built Friends corpus, the present research explores the number value of the first-person pronoun we in present-day American spoken English. Considering the subjectivity of we, this paper used ‘we + PCU (perception-cognition-utterance) verbs’ collocations and ‘we + plural categories’ as the parameters. Results from corpus data and manual annotation show that: 1) the overall frequency of we has been increasing; 2) we has been increasingly used with other plural categories, indicating a weakening of its plural reference; and 3) we has been increasingly used with PCU verbs of strong subjectivity, indicating a strengthening of its singular reference. All these seem to support our hypothesis that we is undergoing a process of further grammaticalization towards a singular reference, though further evidence is needed to attest the bold prediction.

Keywords: number, PCU verbs, personal pronoun we

Procedia PDF Downloads 203

157 A Study of Bilingual Development of a Mandarin and English Bilingual Preschool Child from China to Australia

Authors: Qiang Guo, Ruying Qi

Abstract:

This project aims to trace the developmental patterns of a child's Mandarin and English from China to Australia from age 3;03 to 5;06. In childhood bilingual studies, there is an assumption that age 3 is the dividing line between simultaneous bilinguals and sequential bilinguals. Determining similarities and differences between Bilingual First Language Acquisition, Early Second Language Acquisition, and Second Language Acquisition is of great theoretical significance. Studies on Bilingual First Language Acquisition (hereafter BFLA) over the past three decades have shown that the grammatical development of bilingual children progresses through the same developmental trajectories as that of their monolingual counterparts. Cross-linguistic interaction does not change the basic grammatical knowledge, even in the weaker language. While BFLA studies show consistent results under the conditions of adequate input and meaningful interactional context, the research findings of Early Second Language Acquisition (ESLA) have demonstrated that this cohort develops early English differently from both BFLA and SLA. The different development could be attributed to the age of migration, input pattern, and their Environmental Languages (Lε). In the meantime, the dynamic relationship between the two languages is an issue that invites further attention. The present study attempts to fill this gap. The child in this case study started acquiring L1 Mandarin from birth in China, where the environmental language (Lε) coincided with L1 Mandarin. When she migrated to Australia at 3;06, where the environmental language (Lε) was L2 English, her Mandarin exposure was reduced. On the other hand, she received limited English input starting from 1;02 in China, where the environmental language (Lε) was L1 Mandarin, a non-English environment. When she relocated to Australia at 3;06, where the environmental language (Lε) coincided with L2 English, her English exposure significantly increased.
The child’s linguistic profile provides an opportunity to explore: (1) What does the child’s English developmental route look like? (2) What does the L1 Mandarin developmental pattern look like in different environmental languages? (3) How do input and environmental language interact in shaping the bilingual child’s linguistic repertoire? In order to answer these questions, two linguistic areas are selected as the focus of the investigation, namely, subject realization and wh-questions. The chosen areas are contrastive in structure but perform the same semantic functions in the two linguistically distant languages and can serve as an ideal testing ground for exploring the developmental path in the two languages. The longitudinal case study adopts a combined approach of qualitative and quantitative analysis. Two years’ Mandarin and English data are examined, and comparisons are made with age-matched monolinguals in each language in CHILDES. To the best of the authors’ knowledge, this study is the first of its kind to examine a Mandarin-English bilingual child's bilingual development at a critical age, in different input patterns, and in different environmental languages (Lε). It also expands the scope of the theory of Lε, adding empirical evidence on the relationship between input and Lε in bilingual acquisition.

Keywords: bilingual development, age, input, environmental language (Lε)

Procedia PDF Downloads 104

156 Isolation and Characterization of a Narrow-Host Range Aeromonas hydrophila Lytic Bacteriophage

Authors: Sumeet Rai, Anuj Tyagi, B. T. Naveen Kumar, Shubhkaramjeet Kaur, Niraj K. Singh

Abstract:

Since the discovery of antibiotics, their indiscriminate use in human, veterinary, and aquaculture systems has resulted in the global emergence and spread of multidrug-resistant bacterial pathogens. Alternative approaches to controlling bacterial infections have therefore become a matter of utmost importance. The high selectivity and specificity of bacteriophages (phages) permit the targeting of specific bacteria without affecting the desirable flora. In this study, a lytic phage (Ahp1) specific to Aeromonas hydrophila subsp. hydrophila was isolated from a finfish aquaculture pond. The host range of Ahp1 was tested against 10 isolates of A. hydrophila, 7 isolates of A. veronii, 25 Vibrio cholerae isolates, 4 V. parahaemolyticus isolates, and one isolate each of V. harveyi and Salmonella enterica collected previously. Apart from the host A. hydrophila subsp. hydrophila strain, no lytic activity was detected against any other bacterial isolate. In adsorption rate and one-step growth curve analyses, 69.7% of phage particles adsorbed to host cells, followed by the release of 93 ± 6 progeny phages per host cell after a latent period of ~30 min. Phage nucleic acid was extracted by column purification. After the phage nucleic acid was determined to be dsDNA, the phage genome was subjected to next-generation sequencing by generating paired-end (PE, 2 × 300 bp) reads on the Illumina MiSeq system. De novo assembly of the sequencing reads generated a circular phage genome of 42,439 bp with a G+C content of 58.95%. During open reading frame (ORF) prediction and annotation, 22 of the 49 predicted ORFs were functionally annotated, and the rest encoded hypothetical proteins. The phage genome encoded proteins involved in major functions such as phage structure formation and packaging, DNA replication and repair, DNA transcription, and host cell lysis. The complete genome sequence of Ahp1, along with its gene annotation, was submitted to NCBI GenBank (accession number MF683623).
The stability of Ahp1 preparations at storage temperatures of 4 °C, 30 °C, and 40 °C was studied over a period of 9 months. At 40 °C, phage counts declined by 4 log units within one month, with a total loss of viability after 2 months. At 30 °C, the phage preparation was stable for < 5 months. In contrast, phage counts decreased by only 2 log units over a period of 9 months during storage at 4 °C. As some phages have been reported to be glycerol sensitive, the stability of Ahp1 preparations in glycerol stocks (0%, 15%, 30%, and 45%) was also studied during storage at -80 °C over a period of 9 months. The phage counts decreased by only 2 log units during storage, and no significant difference in phage counts was observed between glycerol concentrations. The Ahp1 phage discovered in our study had a very narrow host range and may be useful for phage typing applications. Moreover, the endolysin and holin genes in the Ahp1 genome could be ideal candidates for recombinant cloning and expression of antimicrobial proteins.
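
The storage results above are stated in log units, that is, the base-10 logarithm of the ratio between the starting and the remaining phage count. As a minimal sketch of that arithmetic, the function name `log_reduction` and the example titers below are illustrative assumptions and are not taken from the study:

```python
import math


def log_reduction(initial_titer, final_titer):
    """Return the drop in phage count expressed in log10 units."""
    if initial_titer <= 0 or final_titer <= 0:
        raise ValueError("titers must be positive")
    return math.log10(initial_titer / final_titer)


# Hypothetical titers in PFU/mL, chosen only to illustrate a 2-log decline.
start, end = 1e9, 1e7
print(f"{log_reduction(start, end):.1f} log units")
```

A decline of 2 log units therefore means that about 1% of the initial count remains, and 4 log units means about 0.01%.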

Keywords: Aeromonas hydrophila, endolysin, phage, narrow host range

Procedia PDF Downloads 142

155 The Effect of Written Corrective Feedback on the Accurate Use of Grammatical Forms by Japanese Low-Intermediate EFL Learners

Authors: Ayako Hasegawa, Ken Ubukata

Abstract:

The purpose of this study is to investigate whether corrective feedback has any significant effect on Japanese low-intermediate EFL learners’ performance on a specific set of linguistic features. The subjects are Japanese college students majoring in English. They have studied English for about 7 years, but their interlanguage seems to have fossilized, because non-target-like errors are frequently observed under the traditional deductive, teacher-fronted approach. It has been reported that corrective feedback plays an important role in diminishing or overcoming interlanguage fossilization and achieving target language (TL) competency. Therefore, this study examined how corrective feedback (specifically, metalinguistic feedback) and self-correction raised the students’ awareness and helped them notice the gaps between their interlanguage and the TL.

Keywords: written corrective feedback, fossilized error, grammar teaching, language teaching

Procedia PDF Downloads 332

154 Morpheme Based Parts of Speech Tagger for Kannada Language

Authors: M. C. Padma, R. J. Prathibha

Abstract:

Parts of speech tagging is the process of assigning appropriate parts of speech tags to the words in a given text. The critical information needed for tagging a word comes from its internal structure rather than from its neighboring words. The internal structure of a word comprises its morphological features and grammatical information. This paper presents a morpheme based parts of speech tagger for the Kannada language. The proposed work uses a hierarchical tag set for assigning tags. The system is tested on some Kannada words taken from the EMILLE corpus. Experimental results show that the performance of the proposed system is above 90%.
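
The idea of tagging a word from its internal structure can be sketched as a suffix lookup. This is only a toy illustration and not the authors' tagger: the romanized suffixes, the dotted tag names such as `N.PL`, and the function `tag_word` are all assumptions made for this example, and a real morphological analyzer would need paradigms and a full lexicon.

```python
# Toy suffix table: (romanized suffix, hierarchical tag).
# Both the suffix spellings and the tag names are illustrative assumptions.
SUFFIX_TAGS = [
    ("galu", "N.PL"),   # plural marker on a noun
    ("annu", "N.ACC"),  # accusative case marker
    ("alli", "N.LOC"),  # locative case marker
]


def tag_word(word):
    """Split a word into (stem, suffix, tag) using the longest matching suffix."""
    for suffix, tag in sorted(SUFFIX_TAGS, key=lambda pair: -len(pair[0])):
        # Require a non-empty stem so a bare suffix is not tagged.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix, tag
    return word, "", "UNK"


for w in ["pustakagalu", "mane"]:
    print(w, tag_word(w))
```

The dotted tags hint at how a hierarchical tag set can encode a coarse category first and finer morphological detail after it.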

Keywords: hierarchical tag set, morphological analyzer, natural language processing, paradigms, parts of speech

Procedia PDF Downloads 262

153 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousands of gene expression values simultaneously, which is a very important task for drug development and testing, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithms have been used as an alternative approach to identifying structures in gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify a normalized matrix of gene expression data, followed by the Mixed-Clustering algorithm and the Lift algorithm, which are inspired by the node-deletion and node-addition phases proposed by Cheng and Church and based on Agglomerative Hierarchical Clustering (AHC). An experimental study on standard datasets demonstrated the effectiveness of the algorithm on gene expression data.
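
Cheng and Church's node-deletion and node-addition phases are driven by the mean squared residue of a candidate bicluster, which is low when the selected rows and columns follow a coherent additive pattern. The following is a minimal sketch of that score only, not of the Mixed-Clustering or Lift algorithms; the function name and the toy matrices are assumptions made for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by row and column indices."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what remains after removing row, column, and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data: the first matrix is perfectly additive, the second is not.
coherent = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
noisy = [[1, 2], [2, 1]]
print(mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2]))
print(mean_squared_residue(noisy, [0, 1], [0, 1]))
```

Node deletion removes the rows or columns that contribute most to this score, and node addition brings back those that do not raise it.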

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 249

152 Provenance in Scholarly Publications: Introducing the provCite Ontology

Authors: Maria Joseph Israel, Ahmed Amer

Abstract:

Our work aims to broaden the application of provenance technology beyond its traditional domains of scientific workflow management and database systems by offering a general provenance framework to capture richer and extensible metadata in unstructured textual data sources such as literary texts, commentaries, translations, and digital humanities. Specifically, we demonstrate the feasibility of capturing and representing expressive provenance metadata, including more of the context for citing scholarly works (e.g., the author’s explicit or inferred intentions at the time of developing their research content for publication), while also supporting subsequent augmentation with similar additional metadata (by third parties, be they human or automated). To better capture the nature and types of possible citations, in our proposed provenance scheme, metaScribe, we extend standard provenance conceptual models to form our proposed provCite ontology. This provides a conceptual framework that can accurately capture and describe more of the functional and rhetorical properties of a citation than can be achieved with any current model.

Keywords: knowledge representation, provenance architecture, ontology, metadata, bibliographic citation, semantic web annotation

Procedia PDF Downloads 83

151 Social Data-Based Users Profiles' Enrichment

Authors: Amel Hannech, Mehdi Adda, Hamid Mcheick

Abstract:

In this paper, we propose a generic user profile model that integrates several elements that may positively impact the research process. We exploit the classical behavior of users and delimit their research activities into several research sessions enriched with contextual and temporal information, which makes it possible to reflect the current interests of these users in every period of time and to infer data freshness. We argue that the annotation of resources gives more transparency on users' needs. It also strengthens social links among resources and users, and can thus increase the scope of the user profile. Based on this idea, we integrate the social tagging practice in order to exploit users' social behavior to enrich their profiles. These profiles are then integrated into a recommendation system that predicts personalized items of interest to users, assisting them in their research and further enriching their profiles. Through these recommendations, we provide users with new research experiences.

Keywords: user profiles, topical ontology, contextual information, folksonomies, tags' clusters, data freshness, association rules, data recommendation

Procedia PDF Downloads 239

150 A Web-Based Self-Learning Grammar for Spoken Language Understanding

Authors: S. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno

Abstract:

One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user utters. In the SDS domain, the Spoken Language Understanding (SLU) module classifies user utterances by means of predefined conceptual knowledge. The SLU module can recognize only meanings previously included in its knowledge base. Due to the vastness of that knowledge, storing the information is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities such as proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Named Entity Recognition (NER) applied to an SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern based Annotation through Knowledge On the Web) method at runtime. The proposed method extracts information from the Web to extend the SLU knowledge module and reduce the development effort. In particular, the Google Search Engine is used to extract information from the Facebook social network.
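
PANKOW-style annotation instantiates lexical patterns with an unknown entity and each candidate concept, counts how often each phrase occurs on the Web, and picks the concept with the most evidence. The sketch below only illustrates that counting step under loud assumptions: the two patterns, the `categorize` function, and the hit counts in `FAKE_HITS` are invented for this example, and a stub dictionary stands in for any real search engine query.

```python
# Illustrative patterns; a real system would use a larger, tuned set.
PATTERNS = ["{instance} is a {concept}", "{concept}s such as {instance}"]

# Stub standing in for a search engine: hypothetical hit counts per phrase.
FAKE_HITS = {
    "Catania is a town": 40,
    "towns such as Catania": 15,
    "Catania is a river": 2,
}


def fake_hit_count(phrase):
    """Return the made-up number of hits for a phrase (0 if unseen)."""
    return FAKE_HITS.get(phrase, 0)


def categorize(instance, concepts, hit_count):
    """Score each concept by summed pattern hits and return the best one."""
    scores = {
        concept: sum(
            hit_count(p.format(instance=instance, concept=concept))
            for p in PATTERNS
        )
        for concept in concepts
    }
    best = max(scores, key=scores.get)
    return best, scores


print(categorize("Catania", ["town", "river"], fake_hit_count))
```

Passing the counting function as a parameter keeps the scoring logic independent of where the counts come from, so a stub can be swapped for a live lookup.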

Keywords: spoken dialog system, spoken language understanding, web semantic, named entity recognition

Procedia PDF Downloads 310

149 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, such as DNA microarray data analysis, we need to cluster the rows (genes) and columns (conditions) of a data matrix simultaneously to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective ones are highly desirable. We introduce a new algorithm, called Enumerative Tree (EnumTree), for biclustering of binary microarray data. EnumTree adopts the approach of enumerating biclusters and extracts all biclusters of consistently good quality. The main idea of EnumTree is the construction of a new tree structure that adequately represents the different biclusters discovered during the enumeration process. The algorithm adopts the strategy of discovering all biclusters at a time. Its performance is assessed using both synthetic and real DNA microarray data, on which it outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevant biclusters.
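
To make the notion of enumerating biclusters in binary data concrete, the sketch below lists every maximal all-ones submatrix of a small 0/1 matrix by brute force. It does not reproduce EnumTree or its tree structure; the function name, the size thresholds, and the toy matrix are assumptions for illustration, and the exhaustive loop over column subsets is only practical for a handful of columns.

```python
from itertools import combinations


def enumerate_biclusters(matrix, min_rows=2, min_cols=2):
    """Return all maximal all-ones biclusters as (row indices, column indices)."""
    n_cols = len(matrix[0])
    found = set()
    for size in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Rows that contain a 1 in every selected column.
            rows = tuple(i for i, row in enumerate(matrix)
                         if all(row[j] for j in cols))
            if len(rows) < min_rows:
                continue
            # Extend to every column that is all ones on those rows,
            # so each reported bicluster is maximal.
            closed = tuple(j for j in range(n_cols)
                           if all(matrix[i][j] for i in rows))
            found.add((rows, closed))
    return sorted(found)


# Toy binary expression matrix: rows are genes, columns are conditions.
data = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
print(enumerate_biclusters(data))
```

Collecting results in a set removes duplicates that arise when different column subsets close to the same bicluster, which is the redundancy a dedicated enumeration structure is meant to avoid.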

Keywords: DNA microarray, biclustering, gene expression data, tree, data mining

Procedia PDF Downloads 343

148 Collision Theory Based Sentiment Detection Using Discourse Analysis in Hadoop

Authors: Anuta Mukherjee, Saswati Mukherjee

Abstract:

Data is growing every day. Social networking sites such as Twitter are becoming an integral part of our daily lives and contribute substantially to this growth. Twitter is a rich source for sentiment detection or mining, since people often express honest opinions through tweets. However, although sentiment analysis of text is a well-researched topic, applying it to Twitter data poses additional challenges, since tweets are unstructured, contain abbreviations, and do not follow strict grammatical rules. We have employed collision theory to perform sentiment analysis on Twitter data. We have also incorporated discourse analysis into the collision theory based model to detect sentiment from tweets accurately. In addition, we have used the retweet field to assign weights to certain tweets and obtained the overall weightage of a topic provided in the form of a query. Hadoop has been exploited for speed. Our experiments show effective results.

Keywords: sentiment analysis, twitter, collision theory, discourse analysis

Procedia PDF Downloads 501

147 Changes of First-Person Pronoun Pragmatic Functions in Three Historical Chinese Texts

Authors: Cher Leng Lee

Abstract:

The existence of multiple first-person pronouns (1PPs) in classical Chinese is an issue that linguists have not resolved from a grammatical perspective. This paper proposes pragmatics as a viable solution. There is also a lack of research exploring the evolving usage patterns of 1PPs within the historical context of Chinese language use. Such research can help us comprehend the changes and developments of these linguistic elements. To fill these research gaps, we use the diachronic pragmatics approach to contrast the functions of Chinese 1PPs in three representative texts from three different historical periods: The Analects (The Spring and Autumn Period), The Grand Scribe’s Records (Grand Records) (Qin and Han Period), and A New Account of Tales of the World (New Account) (The Wei, Jin and Southern and Northern Period). The 1PPs in these texts are manually identified and classified according to their pragmatic functions in the given contexts in order to observe their historical changes, understand the factors that contribute to these changes, and offer possible explanations of how wo became the only 1PP in today’s spoken Mandarin.

Keywords: historical, Chinese, pronouns, pragmatics

Procedia PDF Downloads 16

146 Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Authors: Jiangbo Shi, Zeyu Gao, Chen Li

Abstract:

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, developing such an automated method requires a large amount of data with precisely annotated masks, which are hard to obtain. Training with weakly labeled data is a popular solution for reducing the annotation workload. In this paper, we propose a novel meta-learning-based nuclei segmentation method that follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully convolutional meta-model that can correct noisy masks by using a small amount of clean meta-data. The corrected masks are then used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclei segmentation datasets show that our method achieves state-of-the-art results. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.

Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations

Procedia PDF Downloads 111

145 De Novo Assembly and Characterization of the Transcriptome during Seed Development, and Generation of Genic-SSR Markers in Pomegranate (Punica granatum L.)

Authors: Ozhan Simsek, Dicle Donmez, Burhanettin Imrak, Ahsen Isik Ozguven, Yildiz Aka Kacar

Abstract:

Pomegranate (Punica granatum L.) is known to be one of the oldest edible fruit tree species, with a wide global geographical distribution. Fruits from the two defined varieties (Hicaznar and 33N26) were sampled at intervals after pollination and fertilization, at different sizes. Seed samples were used for transcriptome sequencing. Primary sequencing data were produced on the Illumina Hi-Seq™ 2000. The raw reads were first subjected to quality control (QC), then filtered into clean reads and aligned to the reference sequences. De novo analysis was performed to detect genes expressed in the seeds of the pomegranate varieties. We performed downstream analysis to determine differentially expressed genes. We generated about 27.09 Gb in total after Illumina Hi-Seq sequencing. All samples were assembled together, yielding 59,264 Unigenes; the total length, average length, N50, and GC content of the Unigenes are 84,547,276 bp, 1,426 bp, 2,137 bp, and 46.20%, respectively. The Unigenes were annotated against 7 functional databases: 42,681 (NR: 72.02%), 39,660 (NT: 66.92%), 30,790 (Swissprot: 51.95%), 20,212 (COG: 34.11%), 27,689 (KEGG: 46.72%), 12,328 (GO: 20.80%), and 33,833 (Interpro: 57.09%) Unigenes were annotated. Using the functional annotation results, we detected 42,376 CDS and 4,999 SSRs distributed across 16,143 Unigenes.
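
The N50 reported for the assembly is the length L such that sequences of length L or longer account for at least half of the total assembled bases. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are assumptions for illustration and are unrelated to the pomegranate data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half the bases.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths in bp.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Because N50 is weighted toward long sequences, it is usually larger than the average length, as is the case for the Unigene statistics above.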

Keywords: next generation sequencing, SSR, RNA-Seq, Illumina

Procedia PDF Downloads 209