Search results for: genome annotation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 434

254 PARP1 Links Transcription of a Subset of RBL2-Dependent Genes with Cell Cycle Progression

Authors: Ewelina Wisnik, Zsolt Regdon, Kinga Chmielewska, Laszlo Virag, Agnieszka Robaszkiewicz

Abstract:

Apart from protecting the genome, PARP1 has been documented to regulate many intracellular processes, inter alia gene transcription, by physically interacting with chromatin-bound proteins and by their ADP-ribosylation. Our recent findings indicate that expression of PARP1 decreases during the differentiation of human CD34+ hematopoietic stem cells to monocytes as a consequence of differentiation-associated cell growth arrest and formation of the E2F4-RBL2-HDAC1-SWI/SNF repressive complex at the promoter of this gene. Since the RBL2 complexes repress genes in an E2F-dependent manner and are widespread in the genome in G0-arrested cells, we asked (a) if RBL2 directly contributes to defining monocyte phenotype and function by targeting gene promoters and (b) if RBL2 controls gene transcription indirectly by repressing PARP1. To identify genes controlled by RBL2 and/or PARP1, we used primer libraries for surface receptors and TLR signaling mediators; genes were silenced by siRNA or shRNA; gene promoter occupation by selected proteins was analysed by ChIP-qPCR; statistical analysis was performed in GraphPad Prism 5 and STATISTICA; and ChIP-Seq data were analysed in Galaxy 2.5.0.0. On the list of 28 genes regulated by RBL2, we identified only four solely repressed by the RBL2-E2F4-HDAC1-BRM complex. Surprisingly, 24 out of the 28 genes controlled by RBL2 were co-regulated by PARP1 in six different manners. In one mode of RBL2/PARP1 co-operation, represented by MAP2K6 and MAPK3, PARP1 was found to associate with gene promoters upon RBL2 silencing, which was previously shown to restore PARP1 expression in monocytes. The effect of PARP1 on gene transcription was observed only in the presence of active EP300, which acetylated gene promoters and activated transcription. 
Further analysis revealed that PARP1 binding to MAP2K6 and MAPK3 promoters enabled recruitment of EP300 in monocytes, while in proliferating cancer cell lines, which actively transcribe PARP1, this protein maintained EP300 at the promoters of MAP2K6 and MAPK3. Genome-wide analysis revealed a similar distribution of PARP1 and EP300 around transcription start sites and the co-occupancy of some gene promoters by PARP1 and EP300 in cancer cells. Here, we describe a new RBL2/PARP1/EP300 axis which controls gene transcription regardless of the cell type. In this model, cell cycle-dependent transcription of PARP1 regulates expression of some genes repressed by RBL2 upon cell cycle limitation. Thus, RBL2 may indirectly regulate transcription of some genes by controlling the expression of EP300-recruiting PARP1. Acknowledgement: This work was financed by Polish National Science Centre grants no. DEC-2013/11/D/NZ2/00033 and DEC-2015/19/N/NZ2/01735. L.V. is funded by the National Research, Development and Innovation Office grants GINOP-2.3.2-15-2016-00020 TUMORDNS, GINOP-2.3.2-15-2016-00048-STAYALIVE and OTKA K112336. A.R. is supported by Polish Ministry of Science and Higher Education 776/STYP/11/2016.

Keywords: retinoblastoma transcriptional co-repressor like 2 (RBL2), poly(ADP-ribose) polymerase 1 (PARP1), E1A binding protein p300 (EP300), monocytes

Procedia PDF Downloads 183
253 Deleterious SNP’s Detection Using Machine Learning

Authors: Hamza Zidoum

Abstract:

This paper investigates the impact of human genetic variation on the function of human proteins using machine-learning algorithms. Single-nucleotide polymorphisms represent the most common form of human genome variation. We focus on single amino-acid polymorphisms located in the coding region, as they can affect protein function and lead to pathological phenotypic change. We use several supervised machine-learning methods to identify structural properties correlated with an increased risk of a missense mutation being damaging. An SVM combined with Principal Component Analysis gives the best performance.

Keywords: single-nucleotide polymorphism, machine learning, feature selection, SVM

Procedia PDF Downloads 356
252 Opinion Mining and Sentiment Analysis on DEFT

Authors: Najiba Ouled Omar, Azza Harbaoui, Henda Ben Ghezala

Abstract:

Current research applies sentiment analysis with a focus on social networks. The DEfi Fouille de Texte (DEFT) (Text Mining Challenge) evaluation campaign focuses on opinion mining and sentiment analysis on social networks, especially Twitter. It aims to compare the systems produced by several teams from public and private research laboratories. DEFT offers participants the opportunity to work on regularly renewed themes and has proposed opinion mining tasks in several editions. The purpose of this article is to scrutinize and analyze the work on opinion mining and sentiment analysis in the Twitter social network carried out within DEFT. It examines the tasks proposed by the organizers of the challenge and the methods used by the participants.

Keywords: opinion mining, sentiment analysis, emotion, polarity, annotation, OSEE, figurative language, DEFT, Twitter, Tweet

Procedia PDF Downloads 114
251 Genomic Characterisation of Equine Sarcoid-derived Bovine Papillomavirus Type 1 and 2 Using Nanopore-Based Sequencing

Authors: Lien Gysens, Bert Vanmechelen, Maarten Haspeslagh, Piet Maes, Ann Martens

Abstract:

Bovine papillomavirus (BPV) types 1 and 2 play a central role in the etiology of the most common neoplasm in horses, the equine sarcoid. The unknown mechanism behind the unique variety in clinical presentation on the one hand, and the host-dependent clinical outcome of BPV-1 infection on the other hand, indicate the involvement of additional factors. Earlier studies have reported the potential functional significance of intratypic sequence variants, along with the existence of sarcoid-sourced BPV variants. Therefore, intratypic sequence variation seems to be an important emerging viral factor. This study aimed to give a broad insight into sarcoid-sourced BPV variation and explore its potential association with disease presentation. To do this, a nanopore sequencing approach was successfully optimized for screening a wide spectrum of clinical samples. Specimens of each tumour were initially screened for BPV-1/-2 by quantitative real-time PCR. A custom-designed primer set was used on BPV-positive samples to amplify the complete viral genome in two multiplex PCR reactions, resulting in a set of overlapping amplicons. For phylogenetic analysis, separate alignments were made of all available complete genome sequences for BPV-1/-2. The resulting alignments were used to infer Bayesian phylogenetic trees. We found substantial genetic variation among sarcoid-derived BPV-1, although this variation could not be linked to disease severity. Several of the BPV-1 genomes had multiple major deletions. Remarkably, the majority of these cluster within the region coding for late viral genes. Together with the extent (up to 603 nucleotides) of the described deletions, this suggests an altered function of L1/L2 in disease pathogenesis. 
By generating a significant amount of complete-length BPV genomes, we succeeded in introducing next-generation sequencing into veterinary research focusing on the equine sarcoid, thus facilitating the first report of both nanopore-based sequencing of complete sarcoid-sourced BPV-1/-2 and the simultaneous nanopore sequencing of multiple complete genomes originating from a single clinical sample.

Keywords: Bovine papillomavirus, equine sarcoid, horse, nanopore sequencing, phylogenetic analysis

Procedia PDF Downloads 156
250 Number Variation of the Personal Pronoun We in American Spoken English

Authors: Qiong Hu, Ming Yue

Abstract:

Language variation signals the newest usage in a language community, which might become the developmental trend of that language. The personal pronoun we is prescribed as a plural pronoun in grammar, but its number value is more flexible in actual use. Based on a self-built Friends corpus, the present research explores the number value of the first-person pronoun we in present-day American spoken English. With consideration of the subjectivity of we, this paper used ‘we+ PCU (perception-cognition-utterance) verbs’ collocations and ‘we+ plural categories’ as the parameters. Results from corpus data and manual annotation show that: 1) the overall frequency of we has been increasing; 2) we has been increasingly used with other plural categories, indicating a weakening of its plural reference; and 3) we has been increasingly used with PCU verbs of strong subjectivity, indicating a strengthening of its singular reference. All these seem to support our hypothesis that we is undergoing a process of further grammaticalization towards a singular reference, though further evidence is needed to test this bold prediction.

Keywords: number, PCU verbs, personal pronoun we

Procedia PDF Downloads 207
249 Transcriptomic Analysis of Acanthamoeba castellanii Virulence Alteration by Epigenetic DNA Methylation

Authors: Yi-Hao Wong, Li-Li Chan, Chee-Onn Leong, Stephen Ambu, Joon-Wah Mak, Priyasashi Sahu

Abstract:

Background: Acanthamoeba is a genus of amoebae that live either free in nature or as human pathogens causing severe brain and eye infections. The virulence potential of Acanthamoeba is not constant and can change with growth conditions. DNA methylation, an epigenetic process which adds methyl groups to DNA, is used by eukaryotic cells, including several human parasites, to control their gene expression. We used qPCR, siRNA gene silencing, and RNA sequencing (RNA-Seq) to study the DNA-methyltransferase gene family (DNMT) in order to assess its possible involvement in programming Acanthamoeba virulence potential. Methods: A virulence-attenuated Acanthamoeba isolate (designation: ATCC; original isolate: ATCC 50492) was subjected to mouse passages to restore its pathogenicity; a virulence-reactivated isolate (designation: AC/5) was generated. Several established factors associated with the Acanthamoeba virulence phenotype were examined to confirm the success of the reactivation process. Differential expression of DNMT between ATCC and AC/5 isolates was measured by qPCR. Silencing of DNMT gene expression in the AC/5 isolate was achieved with an siRNA duplex. Total RNAs extracted from ATCC, AC/5, and siRNA-treated (designation: si-146) isolates were subjected to RNA-Seq for comparative transcriptomic analysis in order to identify the genome-wide effect of DNMT in regulating Acanthamoeba gene expression. qPCR was performed to validate the RNA-Seq results. Results: Physiological and cytopathic assays demonstrated an increase in virulence potential of the AC/5 isolate after mouse passages. DNMT gene expression was significantly higher in AC/5 compared to the ATCC isolate (p ≤ 0.01) by qPCR. The si-146 duplex reduced DNMT gene expression in the AC/5 isolate by 30%. 
Comparative transcriptome analysis identified differentially expressed genes: 3768 genes in AC/5 vs ATCC isolate, 2102 genes in si-146 vs AC/5 isolate, and 3422 genes in si-146 vs ATCC isolate (fold-change of ≥ 2 or ≤ 0.5, p-value adjusted (padj) < 0.05). Of the 2102 genes, 840 and 1262 were upregulated and downregulated, respectively, in si-146 vs AC/5 isolate. Eukaryotic orthologous group (KOG) assignments revealed that a higher percentage of the genes downregulated in si-146 compared to AC/5 isolate were related to posttranslational modification, signal transduction and energy production. Gene Ontology (GO) terms for those downregulated genes were associated with transport activity, oxidation-reduction process, and metabolic process. Among these downregulated genes were putative genes encoding heat shock proteins, transporters, ubiquitin-related proteins, proteins for vesicular trafficking (small GTPases), and oxidoreductases. Functional analyses of similar predicted proteins have been described in other parasitic protozoa in relation to their survival and pathogenicity. Decreased expression of these genes in the si-146-treated isolate may account in part for reduced Acanthamoeba pathogenicity. qPCR on 6 selected genes upregulated in AC/5 compared to ATCC isolate corroborated the RNA sequencing findings, indicating good concordance between these two analyses. Conclusion: To the best of our knowledge, this study represents the first genome-wide analysis of DNA methylation and its effects on gene expression in Acanthamoeba spp. The present data indicate that DNA methylation has a substantial effect on global gene expression, allowing further dissection of the genome-wide effects of the DNA-methyltransferase gene in regulating Acanthamoeba pathogenicity.
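The thresholds stated in the abstract (fold-change ≥ 2 or ≤ 0.5, padj < 0.05) can be illustrated with a minimal Python sketch. The `GeneResult` record, the gene identifiers, and the demo values are hypothetical and are not taken from the study; the abstract does not say which software applied the filter.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str        # hypothetical identifier, for illustration only
    fold_change: float  # expression ratio between two isolates
    padj: float         # adjusted p-value


def classify(result, fc_up=2.0, fc_down=0.5, alpha=0.05):
    """Label a gene using the thresholds quoted in the abstract."""
    if result.padj >= alpha:
        return "not significant"
    if result.fold_change >= fc_up:
        return "up"
    if result.fold_change <= fc_down:
        return "down"
    return "not significant"


# Invented demo values, one per possible outcome.
demo = [
    GeneResult("gene_a", 3.1, 0.001),  # large increase, significant
    GeneResult("gene_b", 0.4, 0.020),  # large decrease, significant
    GeneResult("gene_c", 2.5, 0.200),  # large increase, not significant
    GeneResult("gene_d", 1.2, 0.001),  # significant, but small change
]
labels = [classify(r) for r in demo]

# Consistency check on the reported counts for si-146 vs AC/5.
reported_up, reported_down, reported_total = 840, 1262, 2102
counts_agree = (reported_up + reported_down == reported_total)
```

The last lines only confirm that the reported upregulated and downregulated counts add up to the reported total for the si-146 vs AC/5 comparison.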

Keywords: Acanthamoeba, DNA methylation, RNA sequencing, virulence

Procedia PDF Downloads 174
248 Prevalence and Mechanisms of Antibiotic Resistance in Escherichia coli Isolated from Mastitic Dairy Cattle in Canada

Authors: Satwik Majumder, Dongyun Jung, Jennifer Ronholm, Saji George

Abstract:

Bovine mastitis is the most common infectious disease in dairy cattle, with major economic implications for the dairy industry worldwide. Continuous monitoring for the emergence of antimicrobial resistance (AMR) among bacterial isolates from dairy farms is vital not only for animal husbandry but also for public health. In this study, the prevalence of AMR in 113 Escherichia coli isolates from cases of bovine clinical mastitis in Canada was investigated. A Kirby-Bauer disk diffusion test with 18 antibiotics and a microdilution method with three heavy metals (copper, zinc, and silver) were performed to determine antibiotic and heavy-metal susceptibility. Resistant strains were assessed for efflux and ß-lactamase activities, as well as for biofilm formation and hemolysis. Whole-genome sequences for each of the isolates were examined to detect the presence of genes corresponding to the observed AMR and virulence factors. Phenotypic analysis revealed that 32 isolates were resistant to one or more antibiotics, and 107 showed resistance against at least one heavy metal. Quinolones and silver were the most effective against the tested isolates. Among the AMR isolates, AcrAB-TolC efflux activity and ß-lactamase enzyme activities were detected in 13 and 14 isolates, respectively. All isolates produced biofilm but with different capacities, and 33 isolates showed α-hemolysin activity. A positive correlation (Pearson r = +0.89) between efflux pump activity and quantity of biofilm was observed. Genes associated with aggregation, adhesion, cyclic di-GMP, and quorum sensing were detected in the AMR isolates, corroborating phenotype observations. This investigation showed the prevalence of AMR in E. coli isolates from bovine clinical mastitis. The results also suggest the inadequacy of antimicrobials with a single mode of action to curtail AMR bacteria with multiple mechanisms of resistance and virulence factors. 
Therefore, it calls for combinatorial therapy to manage AMR infections effectively in dairy farms and to combat their potential transmission to the food supply chain through milk and dairy products.

Keywords: antimicrobial resistance, E. coli, bovine mastitis, antibiotics, heavy-metals, efflux pump, ß-lactamase enzyme, biofilm, whole-genome sequencing

Procedia PDF Downloads 183
247 Finding Bicluster on Gene Expression Data of Lymphoma Based on Singular Value Decomposition and Hierarchical Clustering

Authors: Alhadi Bustaman, Soeganda Formalidin, Titin Siswantining

Abstract:

DNA microarray technology is used to analyze thousands of gene expression values simultaneously and is very important for drug development and testing, function annotation, and cancer diagnosis. Various clustering methods have been used for analyzing gene expression data. However, when analyzing very large and heterogeneous collections of gene expression data, conventional clustering methods often cannot produce a satisfactory solution. Biclustering algorithms have been used as an alternative approach to identifying structures in gene expression data. In this paper, we introduce a transform technique based on singular value decomposition to obtain a normalized matrix of gene expression data, followed by the Mixed-Clustering algorithm and the Lift algorithm, inspired by the node-deletion and node-addition phases proposed by Cheng and Church and based on Agglomerative Hierarchical Clustering (AHC). An experimental study on standard datasets demonstrated the effectiveness of the algorithm on gene expression data.
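As background for the Cheng and Church phases mentioned above, the following Python sketch computes the mean squared residue score that Cheng and Church use to judge a candidate bicluster. It is an assumption that the abstract's algorithm relies on this exact score; the function name and the toy matrices are illustrative only.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by rows x cols.

    A score of 0 means every selected entry equals
    row mean + column mean - overall mean (a perfectly additive pattern).
    """
    n_rows, n_cols = len(rows), len(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_cols for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_rows for j in cols}
    overall = sum(row_mean.values()) / n_rows
    total = sum(
        (matrix[i][j] - row_mean[i] - col_mean[j] + overall) ** 2
        for i in rows
        for j in cols
    )
    return total / (n_rows * n_cols)


# Toy data: rows differ only by a constant shift, so the residue is ~0.
additive = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
]
# Same matrix with one perturbed entry, so the residue becomes positive.
noisy = [
    [1.0, 2.0, 3.0],
    [2.0, 9.0, 4.0],
    [4.0, 5.0, 6.0],
]

score_additive = mean_squared_residue(additive, [0, 1, 2], [0, 1, 2])
score_noisy = mean_squared_residue(noisy, [0, 1, 2], [0, 1, 2])
```

In a node-deletion step, rows or columns that contribute most to this score are removed until the score falls below a chosen threshold; node addition then re-admits rows or columns that do not raise it.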

Keywords: agglomerative hierarchical clustering (AHC), biclustering, gene expression data, lymphoma, singular value decomposition (SVD)

Procedia PDF Downloads 257
246 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. 
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with state-of-the-art methods applied directly to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
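The first part of step (i), generating a vocabulary of k-mers from raw reads, can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the toy reads, and the choice of k = 3 are hypothetical, and the sketch does not learn embeddings or reproduce the metagenome2vec implementation.

```python
from collections import Counter


def kmers(read, k):
    """Return all overlapping substrings of length k from one read."""
    read = read.upper()
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocabulary(reads, k):
    """Map each distinct k-mer to an integer index.

    Indices are assigned by descending frequency, with ties broken
    alphabetically so the mapping is deterministic.
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    ordered = sorted(counts, key=lambda km: (-counts[km], km))
    return {km: idx for idx, km in enumerate(ordered)}


# Toy reads, invented for illustration.
reads = ["ACGTAC", "CGTACG"]
vocab = build_vocabulary(reads, k=3)
encoded = [[vocab[km] for km in kmers(read, 3)] for read in reads]
```

Each read becomes a sequence of integer tokens, which is the form an embedding model would consume in the later steps.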

Keywords: deep learning, disease prediction, end-to-end machine learning, metagenomics, multiple instance learning, precision medicine

Procedia PDF Downloads 102
245 CRISPR/Cas9 Based Gene Stacking in Plants for Virus Resistance Using Site-Specific Recombinases

Authors: Sabin Aslam, Sultan Habibullah Khan, James G. Thomson, Abhaya M. Dandekar

Abstract:

Losses due to viral diseases are posing a serious threat to crop production. A quick breakdown of resistance to viruses like Cotton Leaf Curl Virus (CLCuV) demands the application of a proficient technology to engineer durable resistance. Gene stacking has recently emerged as a potential approach for integrating multiple genes in crop plants. In the present study, recombinase technology has been used for site-specific gene stacking. A target vector (pG-Rec) was designed for engineering a predetermined specific site in the plant genome whereby genes can be stacked repeatedly. Using Agrobacterium-mediated transformation, the pG-Rec was transformed into Coker-312 along with Nicotiana tabacum L. cv. Xanthi and Nicotiana benthamiana. The transgene analysis of target lines was conducted through junction PCR. The transgene positive target lines were used for further transformations to site-specifically stack two genes of interest using Bxb1 and PhiC31 recombinases. In the first instance, Cas9 driven by multiplex gRNAs (for Rep gene of CLCuV) was site-specifically integrated into the target lines and determined by the junction PCR and real-time PCR. The resulting plants were subsequently used to stack the second gene of interest (AVP3 gene from Arabidopsis for enhancing cotton plant growth). The addition of the genes is simultaneously achieved with the removal of marker genes for recycling with the next round of gene stacking. Consequently, transgenic marker-free plants were produced with two genes stacked at the specific site. These transgenic plants can be potential germplasm to introduce resistance against various strains of cotton leaf curl virus (CLCuV) and abiotic stresses. The results of the research demonstrate gene stacking in crop plants, a technology that can be used to introduce multiple genes sequentially at predefined genomic sites. 
The current climate change scenario highlights the use of such technologies so that gigantic environmental issues can be tackled by several traits in a single step. After evaluating virus resistance in the resulting plants, the lines can be a primer to initiate stacking of further genes in Cotton for other traits as well as molecular breeding with elite cotton lines.

Keywords: cotton, CRISPR/Cas9, gene stacking, genome editing, recombinases

Procedia PDF Downloads 124
244 Provenance in Scholarly Publications: Introducing the provCite Ontology

Authors: Maria Joseph Israel, Ahmed Amer

Abstract:

Our work aims to broaden the application of provenance technology beyond its traditional domains of scientific workflow management and database systems by offering a general provenance framework to capture richer and extensible metadata in unstructured textual data sources such as literary texts, commentaries, translations, and digital humanities. Specifically, we demonstrate the feasibility of capturing and representing expressive provenance metadata, including more of the context for citing scholarly works (e.g., the author's explicit or inferred intentions at the time of developing their research content for publication), while also supporting subsequent augmentation with similar additional metadata (by third parties, be they human or automated). To better capture the nature and types of possible citations, in our proposed provenance scheme metaScribe, we extend standard provenance conceptual models to form our proposed provCite ontology. This provides a conceptual framework which can accurately capture and describe more of the functional and rhetorical properties of a citation than can be achieved with any current models.

Keywords: knowledge representation, provenance architecture, ontology, metadata, bibliographic citation, semantic web annotation

Procedia PDF Downloads 94
243 Modeling Competition Between Subpopulations with Variable DNA Content in Resource-Limited Microenvironments

Authors: Parag Katira, Frederika Rentzeperis, Zuzanna Nowicka, Giada Fiandaca, Thomas Veith, Jack Farinhas, Noemi Andor

Abstract:

Resource limitations shape the outcome of competitions between genetically heterogeneous pre-malignant cells. One example of such heterogeneity is in the ploidy (DNA content) of pre-malignant cells. A whole-genome duplication (WGD) transforms a diploid cell into a tetraploid one and has been detected in 28-56% of human cancers. If a tetraploid subclone expands, it consistently does so early in tumor evolution, when cell density is still low, and competition for nutrients is comparatively weak – an observation confirmed for several tumor types. WGD+ cells need more resources to synthesize increasing amounts of DNA, RNA, and proteins. To quantify resource limitations and how they relate to ploidy, we performed a pan-cancer analysis of WGD, PET/CT, and MRI scans. Segmentation of >20 different organs from >900 PET/CT scans was performed with MOOSE. We observed a strong correlation between organ-wide population-average estimates of Oxygen and the average ploidy of cancers growing in the respective organ (Pearson R = 0.66; P = 0.001). In-vitro experiments using near-diploid and near-tetraploid lineages derived from a breast cancer cell line supported the hypothesis that DNA content influences Glucose- and Oxygen-dependent proliferation, death and migration rates. To model how subpopulations with variable DNA content compete in the resource-limited environment of the human brain, we developed a stochastic state-space model of the brain (S3MB). The model discretizes the brain into voxels, whereby the state of each voxel is defined by 8+ variables that are updated over time: stiffness, Oxygen, phosphate, glucose, vasculature, dead cells, migrating cells and proliferating cells of various DNA content, and treatment conditions such as radiotherapy and chemotherapy. Well-established Fokker-Planck partial differential equations govern the distribution of resources and cells across voxels. We applied S3MB to sequencing and imaging data obtained from a primary GBM patient. 
We performed whole genome sequencing (WGS) of four surgical specimens collected during the 1ˢᵗ and 2ⁿᵈ surgeries of the GBM and used HATCHET to quantify its clonal composition and how it changes between the two surgeries. HATCHET identified two aneuploid subpopulations of ploidy 1.98 and 2.29, respectively. The low-ploidy clone was dominant at the time of the first surgery and became even more dominant upon recurrence. MRI images were available before and after each surgery and registered to MNI space. The S3MB domain was initiated from 4mm³ voxels of the MNI space. T1 post and T2 flair scan acquired after the 1ˢᵗ surgery informed tumor cell densities per voxel. Magnetic Resonance Elastography scans and PET/CT scans informed stiffness and Glucose access per voxel. We performed a parameter search to recapitulate the GBM’s tumor cell density and ploidy composition before the 2ⁿᵈ surgery. Results suggest that the high-ploidy subpopulation had a higher Glucose-dependent proliferation rate (0.70 vs. 0.49), but a lower Glucose-dependent death rate (0.47 vs. 1.42). These differences resulted in spatial differences in the distribution of the two subpopulations. Our results contribute to a better understanding of how genomics and microenvironments interact to shape cell fate decisions and could help pave the way to therapeutic strategies that mimic prognostically favorable environments.

Keywords: tumor evolution, intra-tumor heterogeneity, whole-genome doubling, mathematical modeling

Procedia PDF Downloads 49
242 Integration of Microarray Data into a Genome-Scale Metabolic Model to Study Flux Distribution after Gene Knockout

Authors: Mona Heydari, Ehsan Motamedian, Seyed Abbas Shojaosadati

Abstract:

Prediction of perturbations after genetic manipulation (especially gene knockout) is one of the important challenges in systems biology. In this paper, a new algorithm is introduced that integrates microarray data into the metabolic model. The algorithm was used to study the change in the cell phenotype after knockout of the Gss gene in Escherichia coli BW25113. Algorithm implementation indicated that gene deletion resulted in greater activation of the metabolic network. Growth yield was higher and fewer regulating genes were identified for the mutant in comparison with the wild-type strain.

Keywords: metabolic network, gene knockout, flux balance analysis, microarray data, integration

Procedia PDF Downloads 558
241 Social Data-Based Users Profiles' Enrichment

Authors: Amel Hannech, Mehdi Adda, Hamid Mcheick

Abstract:

In this paper, we propose a generic user profile model integrating several elements that may positively impact the search process. We exploit the classical behavior of users and divide their search activities into several search sessions enriched with contextual and temporal information, which makes it possible to reflect the current interests of these users in every period of time and to infer data freshness. We argue that the annotation of resources gives more transparency on users' needs. It also strengthens social links among resources and users, and can thus increase the scope of the user profile. Based on this idea, we integrate the social tagging practice in order to exploit users' social behavior to enrich their profiles. These profiles are then integrated into a recommendation system in order to predict personalized items of interest to users, assisting them in their searches and further enriching their profiles. Through this recommendation, we provide users with new search experiences.

Keywords: user profiles, topical ontology, contextual information, folksonomies, tags' clusters, data freshness, association rules, data recommendation

Procedia PDF Downloads 243
240 Inbreeding Study Using Runs of Homozygosity in Nelore Beef Cattle

Authors: Priscila A. Bernardes, Marcos E. Buzanskas, Luciana C. A. Regitano, Ricardo V. Ventura, Danisio P. Munari

Abstract:

The best linear unbiased predictor (BLUP) is a method commonly used in genetic evaluations of breeding programs. However, this approach can lead to higher inbreeding coefficients in the population due to the intensive use of few bulls with higher genetic potential, usually presenting some degree of relatedness. High levels of inbreeding are associated with low genetic viability, fertility, and performance for some economically important traits and should therefore be constantly monitored. Unreliable pedigree data can also lead to misleading results. Genomic information (i.e., single nucleotide polymorphism – SNP) is a useful tool to estimate the inbreeding coefficient. Runs of homozygosity have been used to evaluate homozygous segments inherited due to direct or collateral inbreeding and allow inferring population selection history. This study aimed to evaluate runs of homozygosity (ROH) and inbreeding in a population of Nelore beef cattle. A total of 814 animals were genotyped with the Illumina BovineHD BeadChip, and quality control was carried out excluding SNPs located in non-autosomal regions, with unknown position, with a p-value for Hardy-Weinberg equilibrium lower than 10⁻⁵, or with a call rate lower than 0.98, and samples with a call rate lower than 0.90. After quality control, 809 animals and 509,107 SNPs remained for analyses. For the ROH analysis, PLINK software was used considering segments with at least 50 SNPs and a minimum length of 1Mb in each animal. The inbreeding coefficient was calculated as the ratio between the sum of all ROH sizes and the size of the whole genome (2,548,724kb). A total of 25,711 ROH were observed, presenting mean, median, minimum, and maximum lengths of 3.34Mb, 2Mb, 1Mb, and 80.8Mb, respectively. The number of SNPs present in ROH segments varied from 50 to 14,954. The greatest total ROH length was observed in one animal, with 634Mb (24.88% of the genome). 
Four bulls were among the 10 animals with the greatest total ROH length, presenting 11% of ROH longer than 10Mb. Segments longer than 10Mb indicate recent inbreeding. Therefore, the results indicate an intensive use of few sires in the studied data. The distribution of ROH along the chromosomes showed that chromosomes 5 and 6 presented a large number of segments when compared to other chromosomes. The mean, median, minimum, and maximum inbreeding coefficients were 5.84%, 5.40%, 0.00%, and 24.88%, respectively. Although the mean inbreeding was considered low, the ROH indicate a recent and intensive use of few sires, which should be avoided for the genetic progress of the breed.
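The ROH-based inbreeding coefficient described in this abstract is the summed ROH length of an animal divided by the genome size (2,548,724kb). A minimal Python sketch of that ratio follows; the function names are hypothetical, and the single-segment input simply stands in for the 634Mb total reported for the most inbred animal, since the individual segment lengths are not given.

```python
GENOME_KB = 2_548_724  # genome size stated in the abstract, in kb


def f_roh(roh_lengths_kb, genome_kb=GENOME_KB):
    """Inbreeding coefficient as total ROH length over genome length."""
    return sum(roh_lengths_kb) / genome_kb


def share_of_long_segments(roh_lengths_kb, min_kb=10_000):
    """Fraction of ROH segments longer than min_kb (10Mb by default)."""
    if not roh_lengths_kb:
        return 0.0
    long_count = sum(1 for length in roh_lengths_kb if length > min_kb)
    return long_count / len(roh_lengths_kb)


# The animal with 634Mb of ROH in total, entered as one lump sum in kb.
most_inbred = f_roh([634_000])
most_inbred_percent = round(100 * most_inbred, 2)

# Invented segment lengths (kb) to show the long-segment share.
toy_segments = [1_200, 2_000, 3_500, 12_000]
toy_share = share_of_long_segments(toy_segments)
```

With the reported 634Mb total, the ratio reproduces the maximum inbreeding coefficient of 24.88% quoted in the abstract.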

Keywords: autozygosity, Bos taurus indicus, genomic information, single nucleotide polymorphism

Procedia PDF Downloads 129
239 A Web-Based Self-Learning Grammar for Spoken Language Understanding

Authors: S. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno

Abstract:

One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user utters. In the SDS domain, the Spoken Language Understanding (SLU) module classifies user utterances by means of predefined conceptual knowledge. The SLU module is able to recognize only the meanings previously included in its knowledge base. Due to the vastness of that knowledge, storing the information is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities such as proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Named Entity Recognition (NER) applied to an SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern based Annotation through Knowledge On the Web) method at runtime. The proposed method extracts information from the Web to extend the SLU knowledge module and reduce the development effort. In particular, the Google Search Engine is used to extract information from the Facebook social network.

Keywords: spoken dialog system, spoken language understanding, web semantic, name entity recognition

Procedia PDF Downloads 318
238 PCR Based DNA Analysis in Detecting P53 Mutation in Human Breast Cancer (MDA-468)

Authors: Debbarma Asis, Guha Chandan

Abstract:

Tumor Protein-53 (P53) is one of the tumor suppressor proteins. P53 regulates the cell cycle and conserves genome stability by preventing mutation. It is so named because it runs as a 53-kilodalton (kDa) protein on polyacrylamide gel electrophoresis, although its actual mass is 43.7 kDa. Experimental evidence has indicated that P53 cancer mutants lose tumor suppression activity and subsequently gain oncogenic activities that promote tumourigenesis. Tumor-specific DNA has recently been detected in the plasma of breast cancer patients. Detection of tumor-specific genetic material in cancer patients may provide a unique and valuable tumor marker for diagnosis and prognosis. The commercially available MDA-468 breast cancer cell line was used for the proposed study.

Keywords: tumor protein (P53), cancer mutants, MDA-468, tumor suppressor gene

Procedia PDF Downloads 454
237 EnumTree: An Enumerative Biclustering Algorithm for DNA Microarray Data

Authors: Haifa Ben Saber, Mourad Elloumi

Abstract:

In a number of domains, such as DNA microarray data analysis, we need to cluster simultaneously the rows (genes) and columns (conditions) of a data matrix to identify groups of rows that are constant across a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective biclustering algorithms are highly desirable. We introduce a new algorithm called Enumerative Tree (EnumTree) for biclustering of binary microarray data. EnumTree adopts the approach of enumerating biclusters and extracts all biclusters of consistently good quality. The main idea of EnumTree is the construction of a new tree structure to adequately represent the different biclusters discovered during the enumeration process. This algorithm adopts the strategy of discovering all biclusters at a time. The performance of the proposed algorithm is assessed using both synthetic and real DNA microarray data; our algorithm outperforms other biclustering algorithms for binary microarray data. Biclusters with different numbers of rows. Moreover, we test the biological significance using a gene annotation web tool to show that our proposed method is able to produce biologically relevant biclusters.
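To make the idea of enumerating biclusters in binary data concrete, here is a minimal brute-force sketch. It is not the EnumTree algorithm, whose tree structure the abstract does not describe; it only illustrates the target output, assuming a bicluster is a maximal set of rows that are all 1 on a shared set of columns. The function name and the `min_rows`/`min_cols` parameters are hypothetical.

```python
from itertools import combinations


def maximal_ones_biclusters(matrix, min_rows=2, min_cols=2):
    """Enumerate maximal all-ones (rows, columns) submatrices by brute force."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = set()
    for size in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns on which every selected row has a 1.
            cols = tuple(c for c in range(n_cols)
                         if all(matrix[r][c] for r in rows))
            if len(cols) < min_cols:
                continue
            # Extend to every row that is 1 on all of those columns,
            # so the reported bicluster cannot be enlarged.
            closed_rows = tuple(r for r in range(n_rows)
                                if all(matrix[r][c] for c in cols))
            found.add((closed_rows, cols))
    return sorted(found)


# Toy binary expression matrix: rows are genes, columns are conditions.
toy = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
biclusters = maximal_ones_biclusters(toy)
print(biclusters)
```

The exhaustive loop over row subsets is exponential in the number of genes, which is why dedicated enumeration structures are needed for real microarray data.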

Keywords: DNA microarray, biclustering, gene expression data, tree, data mining

Procedia PDF Downloads 353
236 Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Authors: Jiangbo Shi, Zeyu Gao, Chen Li

Abstract:

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks, which are hard to obtain. Training with weakly labeled data is a popular solution for reducing the annotation workload. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully convolutional meta-model that can correct noisy masks by using a small amount of clean meta-data. The corrected masks are then used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model. Extensive experimental results on two nuclear segmentation datasets show that our method achieves state-of-the-art results. In particular, in some noise scenarios, it even exceeds the performance of training on supervised data.

Keywords: deep learning, histopathological image, meta-learning, nuclei segmentation, weak annotations

Procedia PDF Downloads 120
235 Association of Nuclear – Mitochondrial Epistasis with BMI in Type 1 Diabetes Mellitus Patients

Authors: Agnieszka H. Ludwig-Slomczynska, Michal T. Seweryn, Przemyslaw Kapusta, Ewelina Pitera, Katarzyna Cyganek, Urszula Mantaj, Lucja Dobrucka, Ewa Wender-Ozegowska, Maciej T. Malecki, Pawel Wolkow

Abstract:

Obesity results from an imbalance between energy intake and its expenditure. Genome-Wide Association Study (GWAS) analyses have led to the discovery of only about 100 variants influencing body mass index (BMI), which explain only a small portion of genetic variability. Analysis of gene epistasis gives a chance to discover another part. Since it has been shown that interaction and communication between the nuclear and mitochondrial genomes are indispensable for normal cell function, we looked for epistatic interactions between the two genomes to find their correlation with BMI. Methods: The analysis was performed on 366 T1DM patients using the Illumina Infinium OmniExpressExome-8 chip, followed by imputation on the Michigan Imputation Server. Only genes which influence mitochondrial functioning (listed in Human MitoCarta 2.0) were included in the analysis: variants of nuclear origin (MAF > 5%) in 1140 genes and 42 mitochondrial variants (MAF > 1%). Gene expression analysis was performed on GTEx data. Association analysis between genetic variants and BMI was performed with the use of Linear Mixed Models as implemented in the package 'GENESIS' in R. Analysis of association between mRNA expression and BMI was performed with the use of linear models and standard significance tests in R. Results: Among variants involved in epistasis between mitochondria and nucleus, we identified one in the mitochondrial transcription factor TFB2M (rs6701836). It interacted with mitochondrial variants localized to MT-RNR1 (p=0.0004, MAF=15%), MT-ND2 (p=0.07, MAF=5%) and MT-ND4 (p=0.01, MAF=1.1%). Analysis of the interaction between the nuclear variant rs6701836 (nuc) and rs3021088, localized to the MT-ND2 mitochondrial gene (mito), showed that the combination of the two led to a BMI decrease (p=0.024). Each of the variants on its own does not correlate with higher BMI [p(nuc)=0.856, p(mito)=0.116]. Although rs6701836 is intronic, it influences gene expression in the thyroid (p=0.000037).
rs3021088 is a missense variant that leads to an alanine to threonine substitution in the MT-ND2 gene, which belongs to complex I of the electron transport chain. The analysis of the influence of genetic variants on gene expression has confirmed the trend explained above: the interaction of the two genes leads to a BMI decrease (p=0.0308). Each of the mRNAs on its own is associated with higher BMI (p(mito)=0.0244 and p(nuc)=0.0269). Conclusions: Our results show that nuclear-mitochondrial epistasis can influence BMI in T1DM patients. The correlation between transcription factor expression and mitochondrial genetic variants will be subject to further analysis.

Keywords: body mass index, epistasis, mitochondria, type 1 diabetes

Procedia PDF Downloads 152
234 A Comprehensive Analysis of LACK (Leishmania Homologue of Receptors for Activated C Kinase) in the Context of Visceral Leishmaniasis

Authors: Sukrat Sinha, Abhay Kumar, Shanthy Sundaram

Abstract:

The Leishmania homologue of receptors for activated C kinase (LACK) is a known T cell epitope from soluble Leishmania antigens (SLA) that confers protection against Leishmania challenge. This antigen has been found to be highly conserved among Leishmania strains. LACK has been shown to be protective against L. donovani challenge. A comprehensive analysis of several LACK sequences was completed. The analysis shows a high level of conservation, lower variability and higher antigenicity in specific portions of the LACK protein. This information provides insights for the potential consideration of LACK as a putative vaccine target candidate for visceral leishmaniasis.

Keywords: bioinformatics, genome assembly, leishmania activated protein kinase c (lack), next-generation sequencing

Procedia PDF Downloads 317
233 De Novo Assembly and Characterization of the Transcriptome during Seed Development, and Generation of Genic-SSR Markers in Pomegranate (Punica granatum L.)

Authors: Ozhan Simsek, Dicle Donmez, Burhanettin Imrak, Ahsen Isik Ozguven, Yildiz Aka Kacar

Abstract:

Pomegranate (Punica granatum L.) is known to be one of the oldest edible fruit tree species, with a wide global geographical distribution. Fruits from the two defined varieties (Hicaznar and 33N26) were taken at intervals after pollination and fertilization, at different sizes. Seed samples were used for transcriptome sequencing. Primary sequencing was performed on the Illumina Hi-Seq™ 2000. The raw reads were first subjected to quality control (QC), filtered into clean reads and aligned to the reference sequences. De novo analysis was performed to detect genes expressed in seeds of the pomegranate varieties. We performed downstream analysis to determine differentially expressed genes. We generated about 27.09 Gb in total after Illumina Hi-Seq sequencing. All samples were assembled together, yielding 59,264 Unigenes; the total length, average length, N50, and GC content of the Unigenes are 84,547,276 bp, 1,426 bp, 2,137 bp, and 46.20%, respectively. Unigenes were annotated with 7 functional databases; finally, 42,681 (NR: 72.02%), 39,660 (NT: 66.92%), 30,790 (Swissprot: 51.95%), 20,212 (COG: 34.11%), 27,689 (KEGG: 46.72%), 12,328 (GO: 20.80%), and 33,833 (Interpro: 57.09%) Unigenes were annotated. With the functional annotation results, we detected 42,376 CDS and 4,999 SSRs distributed on 16,143 Unigenes.
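The assembly statistics quoted above (N50 and the share of Unigenes annotated per database) are straightforward to compute. The sketch below uses the common definition of N50, the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function names and the toy length list are hypothetical; the counts 42,681 and 59,264 are taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least half of the total length
            return length
    return 0  # empty input


def annotated_percent(annotated, total):
    """Percentage of sequences with an annotation, rounded to two decimals."""
    return round(annotated / total * 100, 2)


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical unigene lengths
toy_n50 = n50(toy_lengths)
nr_percent = annotated_percent(42_681, 59_264)  # NR hits among all Unigenes
print(toy_n50, nr_percent)
```

Applied to the NR counts, the percentage helper reproduces the 72.02% given in the abstract.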

Keywords: next generation sequencing, SSR, RNA-Seq, Illumina

Procedia PDF Downloads 217
232 Engagement Analysis Using DAiSEE Dataset

Authors: Naman Solanki, Souraj Mondal

Abstract:

With the world moving towards online communication, the volume of stored video has exploded in the past few years. Consequently, it has become crucial to analyse participants’ engagement levels in online communication videos. Engagement prediction of people in videos can be useful in many domains, like education, client meetings, dating, etc. Video-level or frame-level prediction of engagement for a user involves the development of robust models that can capture facial micro-emotions efficiently. For the development of an engagement prediction model, it is necessary to have a widely-accepted standard dataset for engagement analysis. DAiSEE is one such dataset; it consists of in-the-wild data and has a gold standard annotation for engagement prediction. Earlier research using the DAiSEE dataset involved training and testing standard models such as CNN-based models, but the results were not satisfactory according to industry standards. In this paper, a multi-level classification approach is introduced to create a more robust model for engagement analysis using the DAiSEE dataset. This approach recorded testing accuracies of 0.638, 0.7728, 0.8195, and 0.866 for predicting boredom level, engagement level, confusion level, and frustration level, respectively.

Keywords: computer vision, engagement prediction, deep learning, multi-level classification

Procedia PDF Downloads 96
231 Physicians’ Knowledge and Perception of Gene Profiling in Malaysia: A Pilot Study

Authors: Farahnaz Amini, Woo Yun Kin, Lazwani Kolandaiveloo

Abstract:

The availability of different genetic tests after the completion of the Human Genome Project increases physicians’ responsibility to keep themselves updated on the potential implementation of these genetic tests in their daily practice. However, due to a number of barriers, many physicians are still either not aware of these tests or not willing to offer them or refer their patients for genetic tests. This study was conducted as an anonymous, cross-sectional, mail-based survey to develop primary data on Malaysian physicians’ level of knowledge and perception of gene profiling. The questionnaire had 29 questions. Total scores on selected questions were used to assess the level of knowledge. The highest possible score was 11. Descriptive statistics, one-way ANOVA and the chi-squared test were used for statistical analysis. Sixty-three completed questionnaires were returned by 27 general practitioners (GPs) and 36 medical specialists. Respondents’ ages ranged from 24 to 55 years (mean 30.2 ± 6.4). About 40% of the participants rated themselves as having a poor level of knowledge of genetics in general, whilst 60% believed that they had a fair level of knowledge. However, almost half (46%) of the respondents felt that they were not knowledgeable about available genetic tests. A majority (94%) of the respondents were not aware of any lab or company offering gene profiling services in Malaysia. Only 4% of participants were aware of the use of gene profiling for determining the dosage of some drugs. Respondents perceived greater utility of gene profiling for breast cancer (38%) than for familial colorectal cancer (3%). The knowledge score ranged from 2 to 8 (mean 4.38 ± 1.67). Non-significant differences were observed between the knowledge scores of GPs and specialists, with scores of 4.19 and 4.58, respectively. There was no significant association between any demographic factor and level of knowledge.
However, those who graduated between 2001 and 2005 had a higher level of knowledge. Overall, 83% of participants showed a relatively high level of perception of the value of gene profiling for detecting a patient’s risk of disease. However, low perception was observed both for using gene profiling in the general population in order to alter lifestyle (25%) and for having the full sequence of a patient’s genome for the purpose of determining the patient’s best match for treatment (18%). The lack of clinical guidelines, limited provider knowledge and awareness, lack of time and resources to educate patients, lack of evidence-based clinical information and the cost of tests were the barriers to ordering gene profiling most often mentioned by physicians. In conclusion, the Malaysian physicians who participated in this study had a mediocre level of knowledge and awareness of gene profiling. Low exposure to genetic questions and problems might be a key predictor of the lack of awareness and knowledge of available genetic tests. Educational and training workshops might be useful in helping Malaysian physicians incorporate genetic profiling into practice for eligible patients.

Keywords: gene profiling, knowledge, Malaysia, physician

Procedia PDF Downloads 308
230 Photosynthesis Metabolism Affects Yield Potentials in Jatropha curcas L.: A Transcriptomic and Physiological Data Analysis

Authors: Nisha Govender, Siju Senan, Zeti-Azura Hussein, Wickneswari Ratnam

Abstract:

Jatropha curcas, a well-described bioenergy crop, has been widely accepted as a future fuel source, especially in tropical regions. Ideal planting material for large-scale plantations is still lacking. Breeding programmes for improved J. curcas varieties are made difficult by limited genetic diversity. Using combined transcriptome and physiological data, we investigated the molecular and physiological differences between high and low yielding Jatropha curcas to address plausible heritable variations underpinning these differences with regard to photosynthesis, a key metabolism affecting yield potential. A total of 6 individual Jatropha plants from 4 accessions described as high and low yielding planting materials were selected from the Experimental Plot A, Universiti Kebangsaan Malaysia (UKM), Bangi. The inflorescences and shoots were collected for the transcriptome study. For the physiological study, individual plants (n=10) from the high and low yielding populations were screened for agronomic traits, chlorophyll content and stomatal patterning. The J. curcas transcriptomes are available under BioProject PRJNA338924 and BioSample SAMN05827448-65, respectively. Each transcriptome was subjected to functional annotation analysis of sequence datasets using the BLAST2Go suite: BLASTing, mapping, annotation, statistical analysis and visualization. Large-scale phenotyping of the number of fruits per plant (NFPP) and fruits per inflorescence (FPI) classified the high yielding Jatropha accessions with average NFPP = 60 and FPI > 10, whereas the low yielding accessions yielded an average NFPP = 10 and FPI < 5. Next generation sequencing revealed genes with differential expression in the high yielding Jatropha relative to the low yielding plants. Distinct differences were observed in transcript levels associated with photosynthesis metabolism.
The collection of differentially expressed genes (DEGs) in the low yielding population showed comparable CAM photosynthetic metabolism and photorespiration, as evidenced by the following: phosphoenolpyruvate phosphate translocator chloroplastic like isoform with a 2.5 fold change (FC) and malate dehydrogenase (2.03 FC). Green leaves have the most pronounced photosynthetic activity in a plant body due to significant accumulation of chloroplasts. In most plants, the leaf is the dominant photosynthesizing organ of the plant body. A large number of the DEGs in the high-yielding population were attributable to chloroplasts and chloroplast-associated events: STAY-GREEN chloroplastic, Chlorophyllase-1-like (5.08 FC), beta-amylase (3.66 FC), chlorophyllase-chloroplastic-like (3.1 FC), thiamine thiazole chloroplastic like (2.8 FC), 1-4, alpha glucan branching enzyme chloroplastic amyliplastic (2.6 FC), photosynthetic NDH subunit (2.1 FC) and protochlorophyllide chloroplastic (2 FC). The results paralleled a significant increase in chlorophyll a content in the high yielding population. In addition to the chloroplast-associated transcript abundance, TOO MANY MOUTHS (TMM) at 2.9 FC, which codes for distant stomatal distribution and patterning, may explain the high concentration of CO2 in the high-yielding population. The results were in agreement with the role of TMM. Clustered stomata cause back diffusion in the presence of gaps localized closely to one another. We conclude that the high yielding Jatropha population corresponds to a collective function of C3 metabolism with a low degree of CAM photosynthetic fixation. From the physiological descriptions, high chlorophyll a content and even distribution of stomata in the leaf contribute to better photosynthetic efficiency in the high yielding Jatropha compared to the low yielding population.

Keywords: chlorophyll, gene expression, genetic variation, stomata

Procedia PDF Downloads 214
229 Detection of PCD-Related Transcription Factors for Improving Salt Tolerance in Plant

Authors: A. Bahieldin, A. Atef, S. Edris, N. O. Gadalla, S. M. Hassan, M. A. Al-Kordy, A. M. Ramadan, A. S. M. Al-Hajar, F. M. El-Domyati

Abstract:

This work is based on an intriguing natural phenomenon suggesting that suppression of genes related to the programmed cell death (PCD) mechanism might help plant cells to tolerate abiotic stresses efficiently. The scope of this work was the detection of PCD-related transcription factors (TFs) that might also be related to salt stress tolerance in plants. Two model plants, tobacco and Arabidopsis, were utilized to investigate this phenomenon. Occurrence of PCD was first proven by Evans blue staining and DNA laddering after tobacco leaf discs were treated with oxalic acid (OA, 20 mM) for 24 h. A total of 31 TFs that were upregulated after 2 h and co-expressed with genes harboring PCD-related domains were detected via RNA-Seq analysis and annotation. These TFs were knocked down via virus-induced gene silencing (VIGS), an RNA interference (RNAi) approach, and tested for their influence on triggering the PCD machinery. Then, Arabidopsis SALK T-DNA insertion knockout mutants in selected TFs analogous to those in tobacco were tested under salt stress (up to 250 mM NaCl) in order to detect the influence of different TFs on conferring salt tolerance in Arabidopsis. The involvement of a number of candidate abiotic-stress-related TFs was investigated.

Keywords: VIGS, PCD, RNA-Seq, transcription factors

Procedia PDF Downloads 251
228 Computational Investigation on Structural and Functional Impact of Oncogenes and Tumor Suppressor Genes on Cancer

Authors: Abdoulie K. Ceesay

Abstract:

Within the sequence of the whole genome, it is known that 99.9% of the human genome is similar, whilst our differences lie in just 0.1%. Among these minor dissimilarities, the most common type of genetic variation that occurs in a population is the single nucleotide polymorphism (SNP), which arises from a nucleotide substitution that, in a protein-coding sequence, can lead to protein destabilization, altered dynamics, and distortion of other physicochemical properties. While causing variation, SNPs are equally responsible for differences in the way we respond to a treatment or a disease, including various cancer types. There are two types of SNPs: synonymous single nucleotide polymorphisms (sSNPs) and non-synonymous single nucleotide polymorphisms (nsSNPs). sSNPs occur in the gene coding region without causing a change in the encoded amino acid, while nsSNPs can be deleterious because the replacement of a nucleotide in the gene sequence results in a change in the encoded amino acid. Predicting the effects of cancer-related nsSNPs on protein stability, function, and dynamics is important due to the significance of the phenotype-genotype association in cancer. In this thesis, data on 5 oncogenes (ONGs) (AKT1, ALK, ERBB2, KRAS, BRAF) and 5 tumor suppressor genes (TSGs) (ESR1, CASP8, TET2, PALB2, PTEN) were retrieved from ClinVar. Five common in silico tools (Polyphen, Provean, Mutation Assessor, Suspect, and FATHMM) were used to predict and categorize nsSNPs as deleterious, benign, or neutral. To understand the impact of each variation on the phenotype, the in silico structural prediction tools Maestro, PremPS, Cupsat, and mCSM-NA were used. This study comprises an in-depth analysis of variants of 10 cancer genes downloaded from ClinVar. Various analyses of the genes were conducted to derive meaningful conclusions from the data. The results indicated that pathogenic variants are more common among ONGs. Our research also shows that pathogenic and destabilizing variants are more common among ONGs than TSGs.
Moreover, our data indicated that ALK (409) and BRAF (86) have the higher benign counts among ONGs, whilst among TSGs, PALB2 (1308) and PTEN (318) have the higher benign counts. Looking at the predisposition or frequency with which the individual cancer genes cause cancer according to our research data, KRAS (76%), BRAF (55%), and ERBB2 (36%) among ONGs, and PTEN (29%) and ESR1 (17%) among TSGs, have higher tendencies to cause cancer. The results obtained can shed light on future research and open new frontiers in cancer therapies.

Keywords: tumor suppressor genes (TSGs), oncogenes (ONGs), non synonymous single nucleotide polymorphism (nsSNP), single nucleotide polymorphism (SNP)

Procedia PDF Downloads 65
227 From Primer Generation to Chromosome Identification: A Primer Generation Genotyping Method for Bacterial Identification and Typing

Authors: Wisam H. Benamer, Ehab A. Elfallah, Mohamed A. Elshaari, Farag A. Elshaari

Abstract:

A challenge for laboratories is to provide bacterial identification and antibiotic sensitivity results within a short time. Hence, advancement in the required technology is desirable to improve timing, accuracy and quality. Even with the current advances in methods for both phenotypic and genotypic identification of bacteria, there remains a need to develop methods that improve the accuracy and turnaround time of bacteriology laboratories. The hypothesis introduced here is based on the assumption that the chromosome of any bacterium contains unique sequences that can be used for its identification and typing. The outcome of a pilot study designed to test this hypothesis is reported in this manuscript. Methods: The complete chromosome sequences of several bacterial species were downloaded to use as search targets for unique sequences. Visual Basic and SQL Server (2014) were used to generate a complete set of 18-base-long primers, a process that started with reverse translation of 6 randomly chosen amino acids to limit the number of generated primers. In addition, software was designed to scan the downloaded chromosomes for similarities to the generated primers, and the resulting hits were classified according to the number of similar chromosomal sequences, i.e., unique or otherwise. Results: All primers that had identical or similar sequences in the selected genome sequence(s) were classified according to the number of hits in the chromosome search. Those that were identical to a single site on a single bacterial chromosome were referred to as unique. On the other hand, most generated primer sequences were identical to multiple sites on a single chromosome or on multiple chromosomes. Following scanning, the generated primers were classified based on their ability to differentiate between medically important bacteria, and the initial results look promising.
Conclusion: A simple strategy that starts by generating primers was introduced; the primers were used to screen bacterial genomes for matches. Primers that were uniquely identical to a specific DNA sequence on a specific bacterial chromosome were selected. The identified unique sequences can be used in different molecular diagnostic techniques, possibly to identify bacteria. In addition, a single primer that can identify multiple sites in a single chromosome can be exploited for region or genome identification. Draft genome sequences of isolates enable high-throughput primer design using an alignment strategy, which enhances diagnostic performance in comparison to traditional molecular assays; in this method, however, the generated primers can be used to identify an organism before the draft sequence is completed. In addition, the generated primers can be used to build a bank for easy access to primers that can be used to identify bacteria.
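The pipeline described above (reverse-translate six amino acids into 18-base primers, scan chromosomes, classify primers by hit count) can be sketched as follows. This is an illustrative Python sketch, not the authors' Visual Basic and SQL Server implementation. It assumes exact matching on the forward strand only, uses a deliberately partial codon table, and all function names, the example peptide and the toy chromosome sequences are hypothetical.

```python
from itertools import product

# Partial codon table (standard genetic code), enough for this example only.
CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "K": ["AAA", "AAG"],
    "D": ["GAT", "GAC"],
    "F": ["TTT", "TTC"],
    "C": ["TGT", "TGC"],
}


def reverse_translate(peptide):
    """All DNA sequences that encode the peptide under the table above."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


def count_hits(primer, sequence):
    """Count exact, possibly overlapping, occurrences on the forward strand."""
    count, start = 0, sequence.find(primer)
    while start != -1:
        count += 1
        start = sequence.find(primer, start + 1)
    return count


def classify(primers, chromosomes):
    """Label each primer 'absent', 'unique' (one site overall) or 'multiple'."""
    labels = {}
    for primer in primers:
        total = sum(count_hits(primer, seq) for seq in chromosomes.values())
        labels[primer] = "absent" if total == 0 else "unique" if total == 1 else "multiple"
    return labels


primers = reverse_translate("MWKDFC")  # six amino acids give 18-base primers
toy_chromosomes = {  # hypothetical sequences, not real bacterial genomes
    "species_A": "CC" + "ATGTGGAAAGATTTTTGT" + "GG",
    "species_B": "ATGTGGAAGGACTTCTGC" + "TT" + "ATGTGGAAGGACTTCTGC",
}
labels = classify(primers, toy_chromosomes)
print(labels["ATGTGGAAAGATTTTTGT"], labels["ATGTGGAAGGACTTCTGC"])
```

A fuller version would also scan the reverse complement and tolerate mismatches, since the abstract mentions similar as well as identical sequences.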

Keywords: bacteria chromosome, bacterial identification, sequence, primer generation

Procedia PDF Downloads 171
226 Enhanced Arabic Semantic Information Retrieval System Based on Arabic Text Classification

Authors: A. Elsehemy, M. Abdeen, T. Nazmy

Abstract:

Since the appearance of the Semantic Web, many semantic search techniques and models have been proposed to exploit the information in ontologies to enhance traditional keyword-based search. Many advances have been made for languages such as English, German, French and Spanish. However, other languages such as Arabic are not yet fully supported. In this paper, we present a framework for ontology-based information retrieval for the Arabic language. Our system consists of four main modules, namely a query parser, an indexer, a search module and a ranking module. Our approach includes building a semantic index by linking ontology concepts to documents, including an annotation weight for each link, to be used in ranking the results. We also augmented the framework with an automatic document categorizer, which enhances the overall document ranking. We have built three Arabic domain ontologies (Sports, Economics and Politics) as examples for the Arabic language. We built a knowledge base that consists of 79 classes and more than 1456 instances. The system is evaluated using the precision and recall metrics. We performed many retrieval operations on a sample of 40,316 documents with a size of 320 MB of pure text. The results show that semantic search enhanced with text classification gives better performance than the system without classification.
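The two mechanisms named above, ranking by annotation weight and evaluation by precision and recall, can be illustrated with a small sketch. The abstract does not give the scoring formula, so summing the weights of the links between query concepts and documents is an assumption made here for illustration; the index contents, concept names and document identifiers are all hypothetical.

```python
def rank_documents(semantic_index, query_concepts):
    """Rank documents by the summed annotation weights of matching concepts."""
    scores = {}
    for concept in query_concepts:
        for doc_id, weight in semantic_index.get(concept, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical semantic index: concept -> [(document id, annotation weight)].
index = {
    "football": [("d1", 0.5), ("d2", 0.25)],
    "league": [("d2", 0.5), ("d3", 0.125)],
}
ranking = rank_documents(index, ["football", "league"])
precision, recall = precision_recall(ranking, ["d2", "d3", "d4", "d5"])
print(ranking, precision, recall)
```

In this toy case every retrieved document but one is relevant, while half of the relevant documents are missed, which is the kind of trade-off the two metrics are meant to expose.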

Keywords: Arabic text classification, ontology based retrieval, Arabic semantic web, information retrieval, Arabic ontology

Procedia PDF Downloads 502
225 Fat-Tail Test of Regulatory DNA Sequences

Authors: Jian-Jun Shu

Abstract:

The statistical properties of cis-regulatory modules (CRMs) are explored by estimating the similar-word set occurrence distribution. It is observed that CRMs tend to have a fat-tail distribution for similar-word set occurrence. Thus, a fat-tail test with two fatness coefficients is proposed to distinguish CRMs from non-CRMs, especially from exons. For the first fatness coefficient, the separation accuracy between CRMs and exons is increased compared with the existing content-based CRM prediction method, the fluffy-tail test. For the second fatness coefficient, the computing time is reduced compared with the fluffy-tail test, making it very suitable for long sequences and large database analysis in the post-genome era. Moreover, these indexes may be used to predict CRMs which have not yet been observed experimentally. This can serve as a valuable filtering process for experiments.

Keywords: statistical approach, transcription factor binding sites, cis-regulatory modules, DNA sequences

Procedia PDF Downloads 268