Search results for: annotation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 98

Search results for: annotation

68 VIAN-DH: Computational Multimodal Conversation Analysis Software and Infrastructure

Authors: Teodora Vukovic, Christoph Hottiger, Noah Bubenhofer

Abstract:

The development of VIAN-DH aims at bridging two linguistic approaches: conversation analysis/interactional linguistics (IL), so far a dominantly qualitative field, and computational/corpus linguistics and its quantitative and automated methods. Contemporary IL investigates the systematic organization of conversations and interactions composed of speech, gaze, gestures, and body positioning, among others. These highly integrated multimodal behaviour is analysed based on video data aimed at uncovering so called “multimodal gestalts”, patterns of linguistic and embodied conduct that reoccur in specific sequential positions employed for specific purposes. Multimodal analyses (and other disciplines using videos) are so far dependent on time and resource intensive processes of manual transcription of each component from video materials. Automating these tasks requires advanced programming skills, which is often not in the scope of IL. Moreover, the use of different tools makes the integration and analysis of different formats challenging. Consequently, IL research often deals with relatively small samples of annotated data which are suitable for qualitative analysis but not enough for making generalized empirical claims derived quantitatively. VIAN-DH aims to create a workspace where many annotation layers required for the multimodal analysis of videos can be created, processed, and correlated in one platform. VIAN-DH will provide a graphical interface that operates state-of-the-art tools for automating parts of the data processing. The integration of tools that already exist in computational linguistics and computer vision, facilitates data processing for researchers lacking programming skills, speeds up the overall research process, and enables the processing of large amounts of data. The main features to be introduced are automatic speech recognition for the transcription of language, automatic image recognition for extraction of gestures and other visual cues, as well as grammatical annotation for adding morphological and syntactic information to the verbal content. In the ongoing instance of VIAN-DH, we focus on gesture extraction (pointing gestures, in particular), making use of existing models created for sign language and adapting them for this specific purpose. In order to view and search the data, VIAN-DH will provide a unified format and enable the import of the main existing formats of annotated video data and the export to other formats used in the field, while integrating different data source formats in a way that they can be combined in research. VIAN-DH will adapt querying methods from corpus linguistics to enable parallel search of many annotation levels, combining token-level and chronological search for various types of data. VIAN-DH strives to bring crucial and potentially revolutionary innovation to the field of IL, (that can also extend to other fields using video materials). It will allow the processing of large amounts of data automatically and, the implementation of quantitative analyses, combining it with the qualitative approach. It will facilitate the investigation of correlations between linguistic patterns (lexical or grammatical) with conversational aspects (turn-taking or gestures). Users will be able to automatically transcribe and annotate visual, spoken and grammatical information from videos, and to correlate those different levels and perform queries and analyses.

Keywords: multimodal analysis, corpus linguistics, computational linguistics, image recognition, speech recognition

Procedia PDF Downloads 69
67 A Clinician’s Perspective on Electroencephalography Annotation and Analysis for Driver Drowsiness Estimation

Authors: Ruxandra Aursulesei, David O’Callaghan, Cian Ryan, Diarmaid O’Cualain, Viktor Varkarakis, Alina Sultana, Joseph Lemley

Abstract:

Human errors caused by drowsiness are among the leading causes of road accidents. Neurobiological research gives information about the electrical signals emitted by neurons firing within the brain. Electrical signal frequencies can be determined by attaching bio-sensors to the head surface. By observing the electrical impulses and the rhythmic interaction of neurons with each other, we can predict the mental state of a person. In this paper, we aim to better understand intersubject and intrasubject variability in terms of electrophysiological patterns that occur at the onset of drowsiness and their evolution with the decreasing of vigilance. The purpose is to lay the foundations for an algorithm that detects the onset of drowsiness before the physical signs become apparent.

Keywords: electroencephalography, drowsiness, ADAS, annotations, clinician

Procedia PDF Downloads 77
66 Reading as Moral Afternoon Tea: An Empirical Study on the Compensation Effect between Literary Novel Reading and Readers’ Moral Motivation

Authors: Chong Jiang, Liang Zhao, Hua Jian, Xiaoguang Wang

Abstract:

The belief that there is a strong relationship between reading narrative and morality has generally become the basic assumption of scholars, philosophers, critics, and cultural critics. The virtuality constructed by literary novels inspires readers to regard the narrative as a thinking experiment, creating the distance between readers and events so that they can freely and morally experience the positions of different roles. Therefore, the virtual narrative combined with literary characteristics is always considered as a "moral laboratory." Well-established findings revealed that people show less lying and deceptive behaviors in the morning than in the afternoon, called the morning morality effect. As a limited self-regulation resource, morality will be constantly depleted with the change of time rhythm under the influence of the morning morality effect. It can also be compensated and restored in various ways, such as eating, sleeping, etc. As a common form of entertainment in modern society, literary novel reading gives people more virtual experience and emotional catharsis, just as a relaxing afternoon tea that helps people break away from fast-paced work, restore physical strength, and relieve stress in a short period of leisure. In this paper, inspired by the compensation control theory, we wonder whether reading literary novels in the digital environment could replenish a kind of spiritual energy for self-regulation to compensate for people's moral loss in the afternoon. Based on this assumption, we leverage the social annotation text content generated by readers in digital reading to represent the readers' reading attention. We then recognized the semantics and calculated the readers' moral motivation expressed in the annotations and investigated the fine-grained dynamics of the moral motivation changing in each time slot within 24 hours of a day. Comprehensively comparing the division of different time intervals, sufficient experiments showed that the moral motivation reflected in the annotations in the afternoon is significantly higher than that in the morning. The results robustly verified the hypothesis that reading compensates for moral motivation, which we called the moral afternoon tea effect. Moreover, we quantitatively identified that such moral compensation can last until 14:00 in the afternoon and 21:00 in the evening. In addition, it is interesting to find that the division of time intervals of different units impacts the identification of moral rhythms. Dividing the time intervals by four-hour time slot brings more insights of moral rhythms compared with that of three-hour and six-hour time slot.

Keywords: digital reading, social annotation, moral motivation, morning morality effect, control compensation

Procedia PDF Downloads 120
65 Identifying Protein-Coding and Non-Coding Regions in Transcriptomes

Authors: Angela U. Makolo

Abstract:

Protein-coding and Non-coding regions determine the biology of a sequenced transcriptome. Research advances have shown that Non-coding regions are important in disease progression and clinical diagnosis. Existing bioinformatics tools have been targeted towards Protein-coding regions alone. Therefore, there are challenges associated with gaining biological insights from transcriptome sequence data. These tools are also limited to computationally intensive sequence alignment, which is inadequate and less accurate to identify both Protein-coding and Non-coding regions. Alignment-free techniques can overcome the limitation of identifying both regions. Therefore, this study was designed to develop an efficient sequence alignment-free model for identifying both Protein-coding and Non-coding regions in sequenced transcriptomes. Feature grouping and randomization procedures were applied to the input transcriptomes (37,503 data points). Successive iterations were carried out to compute the gradient vector that converged the developed Protein-coding and Non-coding Region Identifier (PNRI) model to the approximate coefficient vector. The logistic regression algorithm was used with a sigmoid activation function. A parameter vector was estimated for every sample in 37,503 data points in a bid to reduce the generalization error and cost. Maximum Likelihood Estimation (MLE) was used for parameter estimation by taking the log-likelihood of six features and combining them into a summation function. Dynamic thresholding was used to classify the Protein-coding and Non-coding regions, and the Receiver Operating Characteristic (ROC) curve was determined. The generalization performance of PNRI was determined in terms of F1 score, accuracy, sensitivity, and specificity. The average generalization performance of PNRI was determined using a benchmark of multi-species organisms. The generalization error for identifying Protein-coding and Non-coding regions decreased from 0.514 to 0.508 and to 0.378, respectively, after three iterations. The cost (difference between the predicted and the actual outcome) also decreased from 1.446 to 0.842 and to 0.718, respectively, for the first, second and third iterations. The iterations terminated at the 390th epoch, having an error of 0.036 and a cost of 0.316. The computed elements of the parameter vector that maximized the objective function were 0.043, 0.519, 0.715, 0.878, 1.157, and 2.575. The PNRI gave an ROC of 0.97, indicating an improved predictive ability. The PNRI identified both Protein-coding and Non-coding regions with an F1 score of 0.970, accuracy (0.969), sensitivity (0.966), and specificity of 0.973. Using 13 non-human multi-species model organisms, the average generalization performance of the traditional method was 74.4%, while that of the developed model was 85.2%, thereby making the developed model better in the identification of Protein-coding and Non-coding regions in transcriptomes. The developed Protein-coding and Non-coding region identifier model efficiently identified the Protein-coding and Non-coding transcriptomic regions. It could be used in genome annotation and in the analysis of transcriptomes.

Keywords: sequence alignment-free model, dynamic thresholding classification, input randomization, genome annotation

Procedia PDF Downloads 24
64 Enhancement of Indexing Model for Heterogeneous Multimedia Documents: User Profile Based Approach

Authors: Aicha Aggoune, Abdelkrim Bouramoul, Mohamed Khiereddine Kholladi

Abstract:

Recent research shows that user profile as important element can improve heterogeneous information retrieval with its content. In this context, we present our indexing model for heterogeneous multimedia documents. This model is based on the combination of user profile to the indexing process. The general idea of our proposal is to operate the common concepts between the representation of a document and the definition of a user through his profile. These two elements will be added as additional indexing entities to enrich the heterogeneous corpus documents indexes. We have developed IRONTO domain ontology allowing annotation of documents. We will present also the developed tool validating the proposed model.

Keywords: indexing model, user profile, multimedia document, heterogeneous of sources, ontology

Procedia PDF Downloads 318
63 An Improvement of Multi-Label Image Classification Method Based on Histogram of Oriented Gradient

Authors: Ziad Abdallah, Mohamad Oueidat, Ali El-Zaart

Abstract:

Image Multi-label Classification (IMC) assigns a label or a set of labels to an image. The big demand for image annotation and archiving in the web attracts the researchers to develop many algorithms for this application domain. The existing techniques for IMC have two drawbacks: The description of the elementary characteristics from the image and the correlation between labels are not taken into account. In this paper, we present an algorithm (MIML-HOGLPP), which simultaneously handles these limitations. The algorithm uses the histogram of gradients as feature descriptor. It applies the Label Priority Power-set as multi-label transformation to solve the problem of label correlation. The experiment shows that the results of MIML-HOGLPP are better in terms of some of the evaluation metrics comparing with the two existing techniques.

Keywords: data mining, information retrieval system, multi-label, problem transformation, histogram of gradients

Procedia PDF Downloads 345
62 A Comparison of YOLO Family for Apple Detection and Counting in Orchards

Authors: Yuanqing Li, Changyi Lei, Zhaopeng Xue, Zhuo Zheng, Yanbo Long

Abstract:

In agricultural production and breeding, implementing automatic picking robot in orchard farming to reduce human labour and error is challenging. The core function of it is automatic identification based on machine vision. This paper focuses on apple detection and counting in orchards and implements several deep learning methods. Extensive datasets are used and a semi-automatic annotation method is proposed. The proposed deep learning models are in state-of-the-art YOLO family. In view of the essence of the models with various backbones, a multi-dimensional comparison in details is made in terms of counting accuracy, mAP and model memory, laying the foundation for realising automatic precision agriculture.

Keywords: agricultural object detection, deep learning, machine vision, YOLO family

Procedia PDF Downloads 158
61 Scalable Learning of Tree-Based Models on Sparsely Representable Data

Authors: Fares Hedayatit, Arnauld Joly, Panagiotis Papadimitriou

Abstract:

Many machine learning tasks such as text annotation usually require training over very big datasets, e.g., millions of web documents, that can be represented in a sparse input space. State-of the-art tree-based ensemble algorithms cannot scale to such datasets, since they include operations whose running time is a function of the input space size rather than a function of the non-zero input elements. In this paper, we propose an efficient splitting algorithm to leverage input sparsity within decision tree methods. Our algorithm improves training time over sparse datasets by more than two orders of magnitude and it has been incorporated in the current version of scikit-learn.org, the most popular open source Python machine learning library.

Keywords: big data, sparsely representable data, tree-based models, scalable learning

Procedia PDF Downloads 229
60 Opinion Mining and Sentiment Analysis on DEFT

Authors: Najiba Ouled Omar, Azza Harbaoui, Henda Ben Ghezala

Abstract:

Current research practices sentiment analysis with a focus on social networks, DEfi Fouille de Texte (DEFT) (Text Mining Challenge) evaluation campaign focuses on opinion mining and sentiment analysis on social networks, especially social network Twitter. It aims to confront the systems produced by several teams from public and private research laboratories. DEFT offers participants the opportunity to work on regularly renewed themes and proposes to work on opinion mining in several editions. The purpose of this article is to scrutinize and analyze the works relating to opinions mining and sentiment analysis in the Twitter social network realized by DEFT. It examines the tasks proposed by the organizers of the challenge and the methods used by the participants.

Keywords: opinion mining, sentiment analysis, emotion, polarity, annotation, OSEE, figurative language, DEFT, Twitter, Tweet

Procedia PDF Downloads 104
59 Number Variation of the Personal Pronoun We in American Spoken English

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of language community, which might become the developmental trend of that language. The personal pronoun we is prescribed as a plural pronoun in grammar, but its number value is more flexible in actual use. Based on the homemade Friends corpus, the present research explores the number value of the first person pronoun we in nowadays American spoken English. With consideration of the subjectivity of we, this paper used ‘we+ PCU (Perception-cognation-utterance) verbs’ collocations and ‘we+ plural categories’ as the parameters. Results from corpus data and manual annotation show that: 1) the overall frequency of we has been increasing; 2) we has been increasingly used with other plural categories, indicating a weakening of its plural reference; and 3) we has been increasingly used with PCU (perception-cognition-utterance) verbs of strong subjectivity, indicating a strengthening of its singular reference. All these seem to support our hypothesis that we is undergoing the process of further grammaticalization towards a singular reference, though future evidence is needed to attest the bold prediction.

Keywords: number, PCU verbs, personal pronoun we,

Procedia PDF Downloads 201
58 Isolation and Characterization of a Narrow-Host Range Aeromonas hydrophila Lytic Bacteriophage

Authors: Sumeet Rai, Anuj Tyagi, B. T. Naveen Kumar, Shubhkaramjeet Kaur, Niraj K. Singh

Abstract:

Since their discovery, indiscriminate use of antibiotics in human, veterinary and aquaculture systems has resulted in global emergence/spread of multidrug-resistant bacterial pathogens. Thus, the need for alternative approaches to control bacterial infections has become utmost important. High selectivity/specificity of bacteriophages (phages) permits the targeting of specific bacteria without affecting the desirable flora. In this study, a lytic phage (Ahp1) specific to Aeromonas hydrophila subsp. hydrophila was isolated from finfish aquaculture pond. The host range of Ahp1 range was tested against 10 isolates of A. hydrophila, 7 isolates of A. veronii, 25 Vibrio cholerae isolates, 4 V. parahaemolyticus isolates and one isolate each of V. harveyi and Salmonella enterica collected previously. Except the host A. hydrophila subsp. hydrophila strain, no lytic activity against any other bacterial was detected. During the adsorption rate and one-step growth curve analysis, 69.7% of phage particles were able to get adsorbed on host cell followed by the release of 93 ± 6 phage progenies per host cell after a latent period of ~30 min. Phage nucleic acid was extracted by column purification methods. After determining the nature of phage nucleic acid as dsDNA, phage genome was subjected to next-generation sequencing by generating paired-end (PE, 2 x 300bp) reads on Illumina MiSeq system. De novo assembly of sequencing reads generated circular phage genome of 42,439 bp with G+C content of 58.95%. During open read frame (ORF) prediction and annotation, 22 ORFs (out of 49 total predicted ORFs) were functionally annotated and rest encoded for hypothetical proteins. Proteins involved in major functions such as phage structure formation and packaging, DNA replication and repair, DNA transcription and host cell lysis were encoded by the phage genome. The complete genome sequence of Ahp1 along with gene annotation was submitted to NCBI GenBank (accession number MF683623). Stability of Ahp1 preparations at storage temperatures of 4 °C, 30 °C, and 40 °C was studied over a period of 9 months. At 40 °C storage, phage counts declined by 4 log units within one month; with a total loss of viability after 2 months. At 30 °C temperature, phage preparation was stable for < 5 months. On the other hand, phage counts decreased by only 2 log units over a period of 9 during storage at 4 °C. As some of the phages have also been reported as glycerol sensitive, the stability of Ahp1 preparations in (0%, 15%, 30% and 45%) glycerol stocks were also studied during storage at -80 °C over a period of 9 months. The phage counts decreased only by 2 log units during storage, and no significant difference in phage counts was observed at different concentrations of glycerol. The Ahp1 phage discovered in our study had a very narrow host range and it may be useful for phage typing applications. Moreover, the endolysin and holin genes in Ahp1 genome could be ideal candidates for recombinant cloning and expression of antimicrobial proteins.

Keywords: Aeromonas hydrophila, endolysin, phage, narrow host range

Procedia PDF Downloads 139
57 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousand gene expression data simultaneously and a very important task for drug development and test, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithm has been used as an alternative approach to identifying structures from gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to identify normalized matrix of gene expression data followed by Mixed-Clustering algorithm and the Lift algorithm, inspired in the node-deletion and node-addition phases proposed by Cheng and Church based on Agglomerative Hierarchical Clustering (AHC). Experimental study on standard datasets demonstrated the effectiveness of the algorithm in gene expression data.

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 245
56 Provenance in Scholarly Publications: Introducing the provCite Ontology

Authors: Maria Joseph Israel, Ahmed Amer

Abstract:

Our work aims to broaden the application of provenance technology beyond its traditional domains of scientific workflow management and database systems by offering a general provenance framework to capture richer and extensible metadata in unstructured textual data sources such as literary texts, commentaries, translations, and digital humanities. Specifically, we demonstrate the feasibility of capturing and representing expressive provenance metadata, including more of the context for citing scholarly works (e.g., the authors’ explicit or inferred intentions at the time of developing his/her research content for publication), while also supporting subsequent augmentation with similar additional metadata (by third parties, be they human or automated). To better capture the nature and types of possible citations, in our proposed provenance scheme metaScribe, we extend standard provenance conceptual models to form our proposed provCite ontology. This provides a conceptual framework which can accurately capture and describe more of the functional and rhetorical properties of a citation than can be achieved with any current models.

Keywords: knowledge representation, provenance architecture, ontology, metadata, bibliographic citation, semantic web annotation

Procedia PDF Downloads 81
55 Social Data-Based Users Profiles' Enrichment

Authors: Amel Hannech, Mehdi Adda, Hamid Mcheick

Abstract:

In this paper, we propose a generic model of user profile integrating several elements that may positively impact the research process. We exploit the classical behavior of users and integrate a delimitation process of their research activities into several research sessions enriched with contextual and temporal information, which allows reflecting the current interests of these users in every period of time and infer data freshness. We argue that the annotation of resources gives more transparency on users' needs. It also strengthens social links among resources and users, and can so increase the scope of the user profile. Based on this idea, we integrate the social tagging practice in order to exploit the social users' behavior to enrich their profiles. These profiles are then integrated into a recommendation system in order to predict the interesting personalized items of users allowing to assist them in their researches and further enrich their profiles. In this recommendation, we provide users new research experiences.

Keywords: user profiles, topical ontology, contextual information, folksonomies, tags' clusters, data freshness, association rules, data recommendation

Procedia PDF Downloads 237
54 A Web-Based Self-Learning Grammar for Spoken Language Understanding

Authors: S. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno

Abstract:

One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user utters. In the SDS domain, the Spoken Language Understanding (SLU) Module classifies user utterances by means of a pre-definite conceptual knowledge. The SLU module is able to recognize only the meaning previously included in its knowledge base. Due the vastity of that knowledge, the information storing is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities like proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Name Entity Recognition (NER) applied to a SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern based Annotation through Knowledge On the Web) method at runtime. The method being proposed extracts information from the Web to increase the SLU knowledge module and reduces the development effort. In particular, the Google Search Engine is used to extract information from the Facebook social network.

Keywords: spoken dialog system, spoken language understanding, web semantic, name entity recognition

Procedia PDF Downloads 308
53 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, like in DNA microarray data analysis, we need to cluster simultaneously rows (genes) and columns (conditions) of a data matrix to identify groups of constant rows with a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis. More effective biclustering algorithms are highly desirable and needed. We introduce a new algorithm called, Enumerative tree (EnumTree) for biclustering of binary microarray data. is an algorithm adopting the approach of enumerating biclusters. This algorithm extracts all biclusters consistent good quality. The main idea of ​​EnumLat is the construction of a new tree structure to represent adequately different biclusters discovered during the process of enumeration. This algorithm adopts the strategy of all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA micryarray data, our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevent biclusters.

Keywords: DNA microarray, biclustering, gene expression data, tree, datamining.

Procedia PDF Downloads 340
52 Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Authors: Jiangbo Shi, Zeyu Gao, Chen Li

Abstract:

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks by using a small amount of clean meta-data. Then the corrected masks are used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclear segmentation datasets show that our method achieves the state-of-the-art result. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.

Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations

Procedia PDF Downloads 108
51 De Novo Assembly and Characterization of the Transcriptome during Seed Development, and Generation of Genic-SSR Markers in Pomegranate (Punica granatum L.)

Authors: Ozhan Simsek, Dicle Donmez, Burhanettin Imrak, Ahsen Isik Ozguven, Yildiz Aka Kacar

Abstract:

Pomegranate (Punica granatum L.) is known to be one of the oldest edible fruit tree species, with a wide geographical global distribution. Fruits from the two defined varieties (Hicaznar and 33N26) were taken at intervals after pollination and fertilization at different sizes. Seed samples were used for transcriptome sequencing. Primary sequencing was produced by Illumina Hi-Seq™ 2000. Firstly, we had raw reads, and it was subjected to quality control (QC). Raw reads were filtered into clean reads and aligned to the reference sequences. De novo analysis was performed to detect genes expressed in seeds of pomegranate varieties. We performed downstream analysis to determine differentially expressed genes. We generated about 27.09 gb bases in total after Illumina Hi-Seq sequencing. All samples were assembled together, we got 59,264 Unigenes, the total length, average length, N50, and GC content of Unigenes are 84.547.276 bp, 1.426 bp, 2,137 bp, and 46.20 %, respectively. Unigenes were annotated with 7 functional databases, finally, 42.681(NR: 72.02%), 39.660 (NT: 66.92%), 30.790 (Swissprot: 51.95%), 20.212 (COG: 34.11%), 27.689 (KEGG: 46.72%), 12.328 (GO: 20.80%), and 33,833 (Interpro: 57.09%) Unigenes were annotated. With functional annotation results, we detected 42.376 CDS, and 4.999 SSR distribute on 16.143 Unigenes.

Keywords: next generation sequencing, SSR, RNA-Seq, Illumina

Procedia PDF Downloads 206
50 Engagement Analysis Using DAiSEE Dataset

Authors: Naman Solanki, Souraj Mondal

Abstract:

With the world moving towards online communication, the video datastore has exploded in the past few years. Consequently, it has become crucial to analyse participant’s engagement levels in online communication videos. Engagement prediction of people in videos can be useful in many domains, like education, client meetings, dating, etc. Video-level or frame-level prediction of engagement for a user involves the development of robust models that can capture facial micro-emotions efficiently. For the development of an engagement prediction model, it is necessary to have a widely-accepted standard dataset for engagement analysis. DAiSEE is one of the datasets which consist of in-the-wild data and has a gold standard annotation for engagement prediction. Earlier research done using the DAiSEE dataset involved training and testing standard models like CNN-based models, but the results were not satisfactory according to industry standards. In this paper, a multi-level classification approach has been introduced to create a more robust model for engagement analysis using the DAiSEE dataset. This approach has recorded testing accuracies of 0.638, 0.7728, 0.8195, and 0.866 for predicting boredom level, engagement level, confusion level, and frustration level, respectively.

Keywords: computer vision, engagement prediction, deep learning, multi-level classification

Procedia PDF Downloads 85
49 Photosynthesis Metabolism Affects Yield Potentials in Jatropha curcas L.: A Transcriptomic and Physiological Data Analysis

Authors: Nisha Govender, Siju Senan, Zeti-Azura Hussein, Wickneswari Ratnam

Abstract:

Jatropha curcas, a well-described bioenergy crop has been extensively accepted as future fuel need especially in tropical regions. Ideal planting material required for large-scale plantation is still lacking. Breeding programmes for improved J. curcas varieties are rendered difficult due to limitations in genetic diversity. Using a combined transcriptome and physiological data, we investigated the molecular and physiological differences in high and low yielding Jatropha curcas to address plausible heritable variations underpinning these differences, in regard to photosynthesis, a key metabolism affecting yield potentials. A total of 6 individual Jatropha plant from 4 accessions described as high and low yielding planting materials were selected from the Experimental Plot A, Universiti Kebangsaan Malaysia (UKM), Bangi. The inflorescence and shoots were collected for transcriptome study. For the physiological study, each individual plant (n=10) from the high and low yielding populations were screened for agronomic traits, chlorophyll content and stomatal patterning. The J. curcas transcriptomes are available under BioProject PRJNA338924 and BioSample SAMN05827448-65, respectively Each transcriptome was subjected to functional annotation analysis of sequence datasets using the BLAST2Go suite; BLASTing, mapping, annotation, statistical analysis and visualization Large-scale phenotyping of the number of fruits per plant (NFPP) and fruits per inflorescence (FPI) classified the high yielding Jatropha accessions with average NFPP =60 and FPI > 10, whereas the low yielding accessions yielded an average NFPP=10 and FPI < 5. Next generation sequencing revealed genes with differential expressions in the high yielding Jatropha relative to the low yielding plants. Distinct differences were observed in transcript level associated to photosynthesis metabolism. DEGs collection in the low yielding population showed comparable CAM photosynthetic metabolism and photorespiration, evident as followings: phosphoenolpyruvate phosphate translocator chloroplastic like isoform with 2.5 fold change (FC) and malate dehydrogenase (2.03 FC). Green leaves have the most pronounced photosynthetic activity in a plant body due to significant accumulation of chloroplast. In most plants, the leaf is always the dominant photosynthesizing heart of the plant body. Large number of the DEGS in the high-yielding population were found attributable to chloroplast and chloroplast associated events; STAY-GREEN chloroplastic, Chlorophyllase-1-like (5.08 FC), beta-amylase (3.66 FC), chlorophyllase-chloroplastic-like (3.1 FC), thiamine thiazole chloroplastic like (2.8 FC), 1-4, alpha glucan branching enzyme chloroplastic amyliplastic (2.6FC), photosynthetic NDH subunit (2.1 FC) and protochlorophyllide chloroplastic (2 FC). The results were parallel to a significant increase in chlorophyll a content in the high yielding population. In addition to the chloroplast associated transcript abundance, the TOO MANY MOUTHS (TMM) at 2.9 FC, which code for distant stomatal distribution and patterning in the high-yielding population may explain high concentration of CO2. The results were in agreement with the role of TMM. Clustered stomata causes back diffusion in the presence of gaps localized closely to one another. We conclude that high yielding Jatropha population corresponds to a collective function of C3 metabolism with a low degree of CAM photosynthetic fixation. From the physiological descriptions, high chlorophyll a content and even distribution of stomata in the leaf contribute to better photosynthetic efficiency in the high yielding Jatropha compared to the low yielding population.

Keywords: chlorophyll, gene expression, genetic variation, stomata

Procedia PDF Downloads 210
48 Detection of PCD-Related Transcription Factors for Improving Salt Tolerance in Plant

Authors: A. Bahieldin, A. Atef, S. Edris, N. O. Gadalla, S. M. Hassan, M. A. Al-Kordy, A. M. Ramadan, A. S. M. Al- Hajar, F. M. El-Domyati

Abstract:

The idea of this work is based on a natural exciting phenomenon suggesting that suppression of genes related to the program cell death (or PCD) mechanism might help the plant cells to efficiently tolerate abiotic stresses. The scope of this work was the detection of PCD-related transcription factors (TFs) that might also be related to salt stress tolerance in plant. Two model plants, e.g., tobacco and Arabidopsis, were utilized in order to investigate this phenomenon. Occurrence of PCD was first proven by Evans blue staining and DNA laddering after tobacco leaf discs were treated with oxalic acid (OA) treatment (20 mM) for 24 h. A number of 31 TFs up regulated after 2 h and co-expressed with genes harboring PCD-related domains were detected via RNA-Seq analysis and annotation. These TFs were knocked down via virus induced gene silencing (VIGS), an RNA interference (RNAi) approach, and tested for their influence on triggering PCD machinery. Then, Arabidopsis SALK knocked out T-DNA insertion mutants in selected TFs analogs to those in tobacco were tested under salt stress (up to 250 mM NaCl) in order to detect the influence of different TFs on conferring salt tolerance in Arabidopsis. Involvement of a number of candidate abiotic-stress related TFs was investigated.

Keywords: VIGS, PCD, RNA-Seq, transcription factors

Procedia PDF Downloads 240
47 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification

Authors: A. Elsehemy, M. Abdeen , T. Nazmy

Abstract:

Since the appearance of the Semantic web, many semantic search techniques and models were proposed to exploit the information in ontology to enhance the traditional keyword-based search. Many advances were made in languages such as English, German, French and Spanish. However, other languages such as Arabic are not fully supported yet. In this paper we present a framework for ontology based information retrieval for Arabic language. Our system consists of four main modules, namely query parser, indexer, search and a ranking module. Our approach includes building a semantic index by linking ontology concepts to documents, including an annotation weight for each link, to be used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies: Sports, Economic and Politics as example for the Arabic language. We built a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using the precision and recall metrics. We have done many retrieval operations on a sample of 40,316 documents with a size 320 MB of pure text. The results show that the semantic search enhanced with text classification gives better performance results than the system without classification.

Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontology

Procedia PDF Downloads 495
46 Number Variation of the Personal Pronoun we Used by Chinese English Learners

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage of language community, which might become the developmental trend of that language. However, language textbooks cannot keep up with these emergent usages. Most Chinese English learners nowadays are still exposed to traditional grammar prescribed in the textbook so that some variational usages cannot be acquired. The personal pronoun we is prescribed as a plural pronoun in the textbook grammar, but its number value is more flexible in actual use. Based on the Chinese Learner English Corpus (CLEC), and with the homemade Friends corpus as reference, the present research explores the number value of the first person pronoun we used by Chinese English learners. With consideration of the subjectivity of we, this paper annotated the number value of all the wes in “we+ PCU (Perception-cognation-utterance) verbs” collocations. Results show that though exposed to traditional textbooks which prescribe the plural reference of we, there still exists some unconventional usage (singular or vague in reference) in the writings of Chinese English learners, which is less frequent than that of the native speeches. Corpus data and results from manual semantic annotation show that this could be due to the impact of formulaic sequence on the learners and the positive transfer from their native language. An improved SLA model of native language, target language and interlanguage is put forward to recognize the existence of variation in second language acquisition, which should be given more attention during teaching.

Keywords: Chinese English learners, number, PCU verbs, Personal pronoun we

Procedia PDF Downloads 325
45 Mining the Proteome of Fusobacterium nucleatum for Potential Therapeutics Discovery

Authors: Abdul Musaweer Habib, Habibul Hasan Mazumder, Saiful Islam, Sohel Sikder, Omar Faruk Sikder

Abstract:

The plethora of genome sequence information of bacteria in recent times has ushered in many novel strategies for antibacterial drug discovery and facilitated medical science to take up the challenge of the increasing resistance of pathogenic bacteria to current antibiotics. In this study, we adopted subtractive genomics approach to analyze the whole genome sequence of the Fusobacterium nucleatum, a human oral pathogen having association with colorectal cancer. Our study divulged 1499 proteins of Fusobacterium nucleatum, which has no homolog in human genome. These proteins were subjected to screening further by using the Database of Essential Genes (DEG) that resulted in the identification of 32 vitally important proteins for the bacterium. Subsequent analysis of the identified pivotal proteins, using the KEGG Automated Annotation Server (KAAS) resulted in sorting 3 key enzymes of F. nucleatum that may be good candidates as potential drug targets, since they are unique for the bacterium and absent in humans. In addition, we have demonstrated the 3-D structure of these three proteins. Finally, determination of ligand binding sites of the key proteins as well as screening for functional inhibitors that best fitted with the ligands sites were conducted to discover effective novel therapeutic compounds against Fusobacterium nucleatum.

Keywords: colorectal cancer, drug target, Fusobacterium nucleatum, homology modeling, ligands

Procedia PDF Downloads 355
44 Application of KL Divergence for Estimation of Each Metabolic Pathway Genes

Authors: Shohei Maruyama, Yasuo Matsuyama, Sachiyo Aburatani

Abstract:

The development of the method to annotate unknown gene functions is an important task in bioinformatics. One of the approaches for the annotation is The identification of the metabolic pathway that genes are involved in. Gene expression data have been utilized for the identification, since gene expression data reflect various intracellular phenomena. However, it has been difficult to estimate the gene function with high accuracy. It is considered that the low accuracy of the estimation is caused by the difficulty of accurately measuring a gene expression. Even though they are measured under the same condition, the gene expressions will vary usually. In this study, we proposed a feature extraction method focusing on the variability of gene expressions to estimate the genes' metabolic pathway accurately. First, we estimated the distribution of each gene expression from replicate data. Next, we calculated the similarity between all gene pairs by KL divergence, which is a method for calculating the similarity between distributions. Finally, we utilized the similarity vectors as feature vectors and trained the multiclass SVM for identifying the genes' metabolic pathway. To evaluate our developed method, we applied the method to budding yeast and trained the multiclass SVM for identifying the seven metabolic pathways. As a result, the accuracy that calculated by our developed method was higher than the one that calculated from the raw gene expression data. Thus, our developed method combined with KL divergence is useful for identifying the genes' metabolic pathway.

Keywords: metabolic pathways, gene expression data, microarray, Kullback–Leibler divergence, KL divergence, support vector machines, SVM, machine learning

Procedia PDF Downloads 372
43 Assessing the Walkability and Urban Design Qualities of Campus Streets

Authors: Zhehao Zhang

Abstract:

Walking has become an indispensable and sustainable way of travel for college students in their daily lives; campus street is an important carrier for students to walk and take part in a variety of activities, improving the walkability of campus streets plays an important role in optimizing the quality of campus space environment, promoting the campus walking system and inducing multiple walking behaviors. The purpose of this paper is to explore the effect of campus layout, facility distribution, and location site selection on the walkability of campus streets, and assess the street design qualities from the elements of imageability, enclosure, complexity, transparency, and human scale, and further examines the relationship between street-level urban design perceptual qualities and walkability and its effect on walking behavior in the campus. Taking Tianjin University as the research object, this paper uses the optimized walk score method based on walking frequency, variety, and distance to evaluate the walkability of streets from a macro perspective and measures the urban design qualities in terms of the calculation of street physical environment characteristics, as well as uses behavior annotation and street image data to establish temporal and spatial behavior database to analyze walking activity from the microscopic view. In addition, based on the conclusions, the improvement and design strategy will be presented from the aspects of the built walking environment, street vitality, and walking behavior.

Keywords: walkability, streetscapes, pedestrian activity, walk score

Procedia PDF Downloads 112
42 Deep Learning-Based Object Detection on Low Quality Images: A Case Study of Real-Time Traffic Monitoring

Authors: Jean-Francois Rajotte, Martin Sotir, Frank Gouineau

Abstract:

The installation and management of traffic monitoring devices can be costly from both a financial and resource point of view. It is therefore important to take advantage of in-place infrastructures to extract the most information. Here we show how low-quality urban road traffic images from cameras already available in many cities (such as Montreal, Vancouver, and Toronto) can be used to estimate traffic flow. To this end, we use a pre-trained neural network, developed for object detection, to count vehicles within images. We then compare the results with human annotations gathered through crowdsourcing campaigns. We use this comparison to assess performance and calibrate the neural network annotations. As a use case, we consider six months of continuous monitoring over hundreds of cameras installed in the city of Montreal. We compare the results with city-provided manual traffic counting performed in similar conditions at the same location. The good performance of our system allows us to consider applications which can monitor the traffic conditions in near real-time, making the counting usable for traffic-related services. Furthermore, the resulting annotations pave the way for building a historical vehicle counting dataset to be used for analysing the impact of road traffic on many city-related issues, such as urban planning, security, and pollution.

Keywords: traffic monitoring, deep learning, image annotation, vehicles, roads, artificial intelligence, real-time systems

Procedia PDF Downloads 165
41 Metabolome-based Profiling of African Baobab Fruit (Adansonia Digitata L.) Using a Multiplex Approach of MS and NMR Techniques in Relation to Its Biological Activity

Authors: Marwa T. Badawy, Alaa F. Bakr, Nesrine Hegazi, Mohamed A. Farag, Ahmed Abdellatif

Abstract:

Diabetes Mellitus (DM) is a chronic disease affecting a large population worldwide. Africa is rich in native medicinal plants with myriad health benefits, though less explored towards the development of specific drug therapy as in diabetes. This study aims to determine the in vivo antidiabetic potential of the well-reported and traditionally used fruits of Baobab (Adansonia digitata L.) using STZ induced diabetic model. The in-vitro cytotoxic and antioxidant properties were examined using MTT assay on L-929 fibroblast cells and DPPH antioxidant assays, respectively. The extract showed minimal cytotoxicity with an IC50 value of 105.7 µg/mL. Histopathological and immunohistochemical investigations showed the hepatoprotective and the renoprotective effects of A. digitata fruits’ extract, implying its protective effects against diabetes complications. These findings were further supported by biochemical assays, which showed that i.p., injection of a low dose (150 mg/kg) of A. digitata twice a week lowered the fasting blood glucose levels, lipid profile, hepatic and renal markers. For a comprehensive overview of extract metabolites composition, ultrahigh performance (UHPLC) analysis coupled to high-resolution tandem mass spectrometry (HRMS/MS) in synchronization with molecular networks led to the annotation of 77 metabolites, among which 50% are reported for the first time in A. digitata fruits.

Keywords: adansonia digital, diabetes mellitus, metabolomics, streptozotocin, Sprague, dawley rats

Procedia PDF Downloads 128
40 Implementation of a Paraconsistent-Fuzzy Digital PID Controller in a Level Control Process

Authors: H. M. Côrtes, J. I. Da Silva Filho, M. F. Blos, B. S. Zanon

Abstract:

In a modern society the factor corresponding to the increase in the level of quality in industrial production demand new techniques of control and machinery automation. In this context, this work presents the implementation of a Paraconsistent-Fuzzy Digital PID controller. The controller is based on the treatment of inconsistencies both in the Paraconsistent Logic and in the Fuzzy Logic. Paraconsistent analysis is performed on the signals applied to the system inputs using concepts from the Paraconsistent Annotated Logic with annotation of two values (PAL2v). The signals resulting from the paraconsistent analysis are two values defined as Dc - Degree of Certainty and Dct - Degree of Contradiction, which receive a treatment according to the Fuzzy Logic theory, and the resulting output of the logic actions is a single value called the crisp value, which is used to control dynamic system. Through an example, it was demonstrated the application of the proposed model. Initially, the Paraconsistent-Fuzzy Digital PID controller was built and tested in an isolated MATLAB environment and then compared to the equivalent Digital PID function of this software for standard step excitation. After this step, a level control plant was modeled to execute the controller function on a physical model, making the tests closer to the actual. For this, the control parameters (proportional, integral and derivative) were determined for the configuration of the conventional Digital PID controller and of the Paraconsistent-Fuzzy Digital PID, and the control meshes in MATLAB were assembled with the respective transfer function of the plant. Finally, the results of the comparison of the level control process between the Paraconsistent-Fuzzy Digital PID controller and the conventional Digital PID controller were presented.

Keywords: fuzzy logic, paraconsistent annotated logic, level control, digital PID

Procedia PDF Downloads 254
39 Genome-Wide Analysis of Long Terminal Repeat (LTR) Retrotransposons in Rabbit (Oryctolagus cuniculus)

Authors: Zeeshan Khan, Faisal Nouroz, Shumaila Noureen

Abstract:

European or common rabbit (Oryctolagus cuniculus) belongs to class Mammalia, order Lagomorpha of family Leporidae. They are distributed worldwide and are native to Europe (France, Spain and Portugal) and Africa (Morocco and Algeria). LTR retrotransposons are major Class I mobile genetic elements of eukaryotic genomes and play a crucial role in genome expansion, evolution and diversification. They were mostly annotated in various genomes by conventional approaches of homology searches, which restricted the annotation of novel elements. Present work involved de novo identification of LTR retrotransposons by LTR_FINDER in haploid genome of rabbit (2247.74 Mb) distributed in 22 chromosomes, of which 7,933 putative full-length or partial copies were identified containing 69.38 Mb of elements, accounting 3.08% of the genome. Highest copy numbers (731) were found on chromosome 7, followed by chromosome 12 (705), while the lowest copy numbers (27) were detected in chromosome 19 with no elements identified from chromosome 21 due to partially sequenced chromosome, unidentified nucleotides (N) and repeated simple sequence repeats (SSRs). The identified elements ranged in sizes from 1.2 - 25.8 Kb with average sizes between 2-10 Kb. Highest percentage (4.77%) of elements was found in chromosome 15, while lowest (0.55%) in chromosome 19. The most frequent tRNA type was Arginine present in majority of the elements. Based on gained results, it was estimated that rabbit exhibits 15,866 copies having 137.73 Mb of elements accounting 6.16% of diploid genome (44 chromosomes). Further molecular analyses will be helpful in chromosomal localization and distribution of these elements on chromosomes.

Keywords: rabbit, LTR retrotransposons, genome, chromosome

Procedia PDF Downloads 115