Search results for: annotation
15 MIM: A Species Independent Approach for Classifying Coding and Non-Coding DNA Sequences in Bacterial and Archaeal Genomes
Authors: Achraf El Allali, John R. Rose
Abstract:
A number of competing methodologies have been developed to identify genes and classify DNA sequences into coding and non-coding sequences. This classification process is fundamental in gene finding and gene annotation tools and is one of the most challenging tasks in bioinformatics and computational biology. An information theory measure based on mutual information has shown good accuracy in classifying DNA sequences into coding and noncoding. In this paper we describe a species independent iterative approach that distinguishes coding from non-coding sequences using the mutual information measure (MIM). A set of sixty prokaryotes is used to extract universal training data. To facilitate comparisons with the published results of other researchers, a test set of 51 bacterial and archaeal genomes was used to evaluate MIM. These results demonstrate that MIM produces superior results while remaining species independent.Keywords: Coding Non-coding Classification, Entropy, GeneRecognition, Mutual Information.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 172614 JConqurr - A Multi-Core Programming Toolkit for Java
Authors: G.A.C.P. Ganegoda, D.M.A. Samaranayake, L.S. Bandara, K.A.D.N.K. Wimalawarne
Abstract:
With the popularity of the multi-core and many-core architectures there is a great requirement for software frameworks which can support parallel programming methodologies. In this paper we introduce an Eclipse toolkit, JConqurr which is easy to use and provides robust support for flexible parallel progrmaming. JConqurr is a multi-core and many-core programming toolkit for Java which is capable of providing support for common parallel programming patterns which include task, data, divide and conquer and pipeline parallelism. The toolkit uses an annotation and a directive mechanism to convert the sequential code into parallel code. In addition to that we have proposed a novel mechanism to achieve the parallelism using graphical processing units (GPU). Experiments with common parallelizable algorithms have shown that our toolkit can be easily and efficiently used to convert sequential code to parallel code and significant performance gains can be achieved.
Keywords: Multi-core, parallel programming patterns, GPU, Java, Eclipse plugin, toolkit,
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 211013 Opinion Mining and Sentiment Analysis on DEFT
Authors: Najiba Ouled Omar, Azza Harbaoui, Henda Ben Ghezala
Abstract:
Current research practices sentiment analysis with a focus on social networks, DEfi Fouille de Texte (DEFT) (Text Mining Challenge) evaluation campaign focuses on opinion mining and sentiment analysis on social networks, especially social network Twitter. It aims to confront the systems produced by several teams from public and private research laboratories. DEFT offers participants the opportunity to work on regularly renewed themes and proposes to work on opinion mining in several editions. The purpose of this article is to scrutinize and analyze the works relating to opinions mining and sentiment analysis in the Twitter social network realized by DEFT. It examines the tasks proposed by the organizers of the challenge and the methods used by the participants.
Keywords: Opinion mining, sentiment analysis, emotion, polarity, annotation, OSEE, figurative language, DEFT, Twitter, Tweet.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 82012 The Spiral_OWL Model – Towards Spiral Knowledge Engineering
Authors: Hafizullah A. Hashim, Aniza. A
Abstract:
The Spiral development model has been used successfully in many commercial systems and in a good number of defense systems. This is due to the fact that cost-effective incremental commitment of funds, via an analogy of the spiral model to stud poker and also can be used to develop hardware or integrate software, hardware, and systems. To support adaptive, semantic collaboration between domain experts and knowledge engineers, a new knowledge engineering process, called Spiral_OWL is proposed. This model is based on the idea of iterative refinement, annotation and structuring of knowledge base. The Spiral_OWL model is generated base on spiral model and knowledge engineering methodology. A central paradigm for Spiral_OWL model is the concentration on risk-driven determination of knowledge engineering process. The collaboration aspect comes into play during knowledge acquisition and knowledge validation phase. Design rationales for the Spiral_OWL model are to be easy-to-implement, well-organized, and iterative development cycle as an expanding spiral.Keywords: Domain Expert, Knowledge Base, Ontology, Software Process.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 176711 Use of Bayesian Network in Information Extraction from Unstructured Data Sources
Authors: Quratulain N. Rajput, Sajjad Haider
Abstract:
This paper applies Bayesian Networks to support information extraction from unstructured, ungrammatical, and incoherent data sources for semantic annotation. A tool has been developed that combines ontologies, machine learning, and information extraction and probabilistic reasoning techniques to support the extraction process. Data acquisition is performed with the aid of knowledge specified in the form of ontology. Due to the variable size of information available on different data sources, it is often the case that the extracted data contains missing values for certain variables of interest. It is desirable in such situations to predict the missing values. The methodology, presented in this paper, first learns a Bayesian network from the training data and then uses it to predict missing data and to resolve conflicts. Experiments have been conducted to analyze the performance of the presented methodology. The results look promising as the methodology achieves high degree of precision and recall for information extraction and reasonably good accuracy for predicting missing values.Keywords: Information Extraction, Bayesian Network, ontology, Machine Learning
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 223010 Automatic Enhanced Update Summary Generation System for News Documents
Authors: S. V. Kogilavani, C. S. Kanimozhiselvi, S. Malliga
Abstract:
Fast changing knowledge systems on the Internet can be accessed more efficiently with the help of automatic document summarization and updating techniques. The aim of multi-document update summary generation is to construct a summary unfolding the mainstream of data from a collection of documents based on the hypothesis that the user has already read a set of previous documents. In order to provide a lot of semantic information from the documents, deeper linguistic or semantic analysis of the source documents were used instead of relying only on document word frequencies to select important concepts. In order to produce a responsive summary, meaning oriented structural analysis is needed. To address this issue, the proposed system presents a document summarization approach based on sentence annotation with aspects, prepositions and named entities. Semantic element extraction strategy is used to select important concepts from documents which are used to generate enhanced semantic summary.
Keywords: Aspects, named entities, prepositions, update summary.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21339 A Web-Based Self-Learning Grammar for Spoken Language Understanding
Authors: S. M. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno
Abstract:
One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user utters. In the SDS domain, the Spoken Language Understanding (SLU) Module classifies user utterances by means of a pre-definite conceptual knowledge. The SLU module is able to recognize only the meaning previously included in its knowledge base. Due the vastity of that knowledge, the information storing is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities like proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Name Entity Recognition (NER) applied to a SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern based Annotation through Knowledge On the Web) method at runtime. The method being proposed extracts information from the Web to increase the SLU knowledge module and reduces the development effort. In particular, the Google Search Engine is used to extract information from the Facebook social network.
Keywords: Spoken Dialog System, Spoken Language Understanding, Web Semantic, Name Entity Recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 17758 Combining the Description Features of UMLRT and CSP+T Specifications Applied to a Complete Design of Real-Time Systems
Authors: Kawtar Benghazi Akhlaki, Manuel I. Capel-Tuñón
Abstract:
UML is a collection of notations for capturing a software system specification. These notations have a specific syntax defined by the Object Management Group (OMG), but many of their constructs only present informal semantics. They are primarily graphical, with textual annotation. The inadequacies of standard UML as a vehicle for complete specification and implementation of real-time embedded systems has led to a variety of competing and complementary proposals. The Real-time UML profile (UML-RT), developed and standardized by OMG, defines a unified framework to express the time, scheduling and performance aspects of a system. We present in this paper a framework approach aimed at deriving a complete specification of a real-time system. Therefore, we combine two methods, a semiformal one, UML-RT, which allows the visual modeling of a realtime system and a formal one, CSP+T, which is a design language including the specification of real-time requirements. As to show the applicability of the approach, a correct design of a real-time system with hard real time constraints by applying a set of mapping rules is obtained.
Keywords: CSP+T, formal software specification, process algebras, real-time systems, unified modeling language.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18087 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes
Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani
Abstract:
Development of a method to estimate gene functions is an important task in bioinformatics. One of the approaches for the annotation is the identification of the metabolic pathway that genes are involved in. Since gene expression data reflect various intracellular phenomena, those data are considered to be related with genes’ functions. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.
Keywords: Metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 23356 Cross Signal Identification for PSG Applications
Authors: Carmen Grigoraş, Victor Grigoraş, Daniela Boişteanu
Abstract:
The standard investigational method for obstructive sleep apnea syndrome (OSAS) diagnosis is polysomnography (PSG), which consists of a simultaneous, usually overnight recording of multiple electro-physiological signals related to sleep and wakefulness. This is an expensive, encumbering and not a readily repeated protocol, and therefore there is need for simpler and easily implemented screening and detection techniques. Identification of apnea/hypopnea events in the screening recordings is the key factor for the diagnosis of OSAS. The analysis of a solely single-lead electrocardiographic (ECG) signal for OSAS diagnosis, which may be done with portable devices, at patient-s home, is the challenge of the last years. A novel artificial neural network (ANN) based approach for feature extraction and automatic identification of respiratory events in ECG signals is presented in this paper. A nonlinear principal component analysis (NLPCA) method was considered for feature extraction and support vector machine for classification/recognition. An alternative representation of the respiratory events by means of Kohonen type neural network is discussed. Our prospective study was based on OSAS patients of the Clinical Hospital of Pneumology from Iaşi, Romania, males and females, as well as on non-OSAS investigated human subjects. Our computed analysis includes a learning phase based on cross signal PSG annotation.Keywords: Artificial neural networks, feature extraction, obstructive sleep apnea syndrome, pattern recognition, signalprocessing.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 15395 Implementation of a Paraconsistent-Fuzzy Digital PID Controller in a Level Control Process
Authors: H. M. Côrtes, J. I. Da Silva Filho, M. F. Blos, B. S. Zanon
Abstract:
In a modern society the factor corresponding to the increase in the level of quality in industrial production demand new techniques of control and machinery automation. In this context, this work presents the implementation of a Paraconsistent-Fuzzy Digital PID controller. The controller is based on the treatment of inconsistencies both in the Paraconsistent Logic and in the Fuzzy Logic. Paraconsistent analysis is performed on the signals applied to the system inputs using concepts from the Paraconsistent Annotated Logic with annotation of two values (PAL2v). The signals resulting from the paraconsistent analysis are two values defined as Dc - Degree of Certainty and Dct - Degree of Contradiction, which receive a treatment according to the Fuzzy Logic theory, and the resulting output of the logic actions is a single value called the crisp value, which is used to control dynamic system. Through an example, it was demonstrated the application of the proposed model. Initially, the Paraconsistent-Fuzzy Digital PID controller was built and tested in an isolated MATLAB environment and then compared to the equivalent Digital PID function of this software for standard step excitation. After this step, a level control plant was modeled to execute the controller function on a physical model, making the tests closer to the actual. For this, the control parameters (proportional, integral and derivative) were determined for the configuration of the conventional Digital PID controller and of the Paraconsistent-Fuzzy Digital PID, and the control meshes in MATLAB were assembled with the respective transfer function of the plant. Finally, the results of the comparison of the level control process between the Paraconsistent-Fuzzy Digital PID controller and the conventional Digital PID controller were presented.
Keywords: Fuzzy logic, paraconsistent annotated logic, level control, digital PID.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 12364 A Hybrid Ontology Based Approach for Ranking Documents
Authors: Sarah Motiee, Azadeh Nematzadeh, Mehrnoush Shamsfard
Abstract:
Increasing growth of information volume in the internet causes an increasing need to develop new (semi)automatic methods for retrieval of documents and ranking them according to their relevance to the user query. In this paper, after a brief review on ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination reserves the precision of ranking without loosing the speed. Our approach exploits natural language processing techniques to extract phrases from documents and the query and doing stemming on words. Then an ontology based conceptual method will be used to annotate documents and expand the query. To expand a query the spread activation algorithm is improved so that the expansion can be done flexible and in various aspects. The annotated documents and the expanded query will be processed to compute the relevance degree exploiting statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing to documents, (3) extracting and using both words and phrases to compute relevance degree, (4) improving the spread activation algorithm to do the expansion based on weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 16293 Image Ranking to Assist Object Labeling for Training Detection Models
Authors: Tonislav Ivanov, Oleksii Nedashkivskyi, Denis Babeshko, Vadim Pinskiy, Matthew Putman
Abstract:
Training a machine learning model for object detection that generalizes well is known to benefit from a training dataset with diverse examples. However, training datasets usually contain many repeats of common examples of a class and lack rarely seen examples. This is due to the process commonly used during human annotation where a person would proceed sequentially through a list of images labeling a sufficiently high total number of examples. Instead, the method presented involves an active process where, after the initial labeling of several images is completed, the next subset of images for labeling is selected by an algorithm. This process of algorithmic image selection and manual labeling continues in an iterative fashion. The algorithm used for the image selection is a deep learning algorithm, based on the U-shaped architecture, which quantifies the presence of unseen data in each image in order to find images that contain the most novel examples. Moreover, the location of the unseen data in each image is highlighted, aiding the labeler in spotting these examples. Experiments performed using semiconductor wafer data show that labeling a subset of the data, curated by this algorithm, resulted in a model with a better performance than a model produced from sequentially labeling the same amount of data. Also, similar performance is achieved compared to a model trained on exhaustive labeling of the whole dataset. Overall, the proposed approach results in a dataset that has a diverse set of examples per class as well as more balanced classes, which proves beneficial when training a deep learning model.Keywords: Computer vision, deep learning, object detection, semiconductor.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 8232 ORank: An Ontology Based System for Ranking Documents
Authors: Mehrnoush Shamsfard, Azadeh Nematzadeh, Sarah Motiee
Abstract:
Increasing growth of information volume in the internet causes an increasing need to develop new (semi)automatic methods for retrieval of documents and ranking them according to their relevance to the user query. In this paper, after a brief review on ranking models, a new ontology based approach for ranking HTML documents is proposed and evaluated in various circumstances. Our approach is a combination of conceptual, statistical and linguistic methods. This combination reserves the precision of ranking without loosing the speed. Our approach exploits natural language processing techniques for extracting phrases and stemming words. Then an ontology based conceptual method will be used to annotate documents and expand the query. To expand a query the spread activation algorithm is improved so that the expansion can be done in various aspects. The annotated documents and the expanded query will be processed to compute the relevance degree exploiting statistical methods. The outstanding features of our approach are (1) combining conceptual, statistical and linguistic features of documents, (2) expanding the query with its related concepts before comparing to documents, (3) extracting and using both words and phrases to compute relevance degree, (4) improving the spread activation algorithm to do the expansion based on weighted combination of different conceptual relationships and (5) allowing variable document vector dimensions. A ranking system called ORank is developed to implement and test the proposed model. The test results will be included at the end of the paper.Keywords: Document ranking, Ontology, Spread activation algorithm, Annotation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18871 Adapting Tools for Text Monitoring and for Scenario Analysis Related to the Field of Social Disasters
Authors: Svetlana Cojocaru, Mircea Petic, Inga Titchiev
Abstract:
Humanity faces more and more often with different social disasters, which in turn can generate new accidents and catastrophes. To mitigate their consequences, it is important to obtain early possible signals about the events which are or can occur and to prepare the corresponding scenarios that could be applied. Our research is focused on solving two problems in this domain: identifying signals related that an accident occurred or may occur and mitigation of some consequences of disasters. To solve the first problem, methods of selecting and processing texts from global network Internet are developed. Information in Romanian is of special interest for us. In order to obtain the mentioned tools, we should follow several steps, divided into preparatory stage and processing stage. Throughout the first stage, we manually collected over 724 news articles and classified them into 10 categories of social disasters. It constitutes more than 150 thousand words. Using this information, a controlled vocabulary of more than 300 keywords was elaborated, that will help in the process of classification and identification of the texts related to the field of social disasters. To solve the second problem, the formalism of Petri net has been used. We deal with the problem of inhabitants’ evacuation in useful time. The analysis methods such as reachability or coverability tree and invariants technique to determine dynamic properties of the modeled systems will be used. To perform a case study of properties of extended evacuation system by adding time, the analysis modules of PIPE such as Generalized Stochastic Petri Nets (GSPN) Analysis, Simulation, State Space Analysis, and Invariant Analysis have been used. These modules helped us to obtain the average number of persons situated in the rooms and the other quantitative properties and characteristics related to its dynamics.Keywords: Lexicon of disasters, modelling, Petri nets, text annotation, social disasters.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1156