Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1382

Search results for: whole genome sequence

1352 In Agile Projects - Arithmetic Sequence is More Effective than Fibonacci Sequence to Use for Estimating the Implementation Effort of User Stories

Abstract:

The estimation of effort in software development is a complex task. The traditional Waterfall approach used to develop software systems requires a lot of time to estimate the effort needed to implement user requirements. Agile manifesto, however, is currently more used in the industry than the Waterfall to develop software systems. In Agile, the user requirement is referred to as a user story. Agile teams mostly use the Fibonacci sequence 1, 2, 3, 5, 8, 11, etc. in estimating the effort needed to implement the user story. This work shows through analysis that the Arithmetic sequence, e.g., 3, 6, 9, 12, etc., is more effective than the Fibonacci sequence in estimating the user stories. This paper mathematically and visually proves the effectiveness of the Arithmetic sequence over the FB sequence.

Keywords: agie, scrum, estimation, fibonacci sequence

Procedia PDF Downloads 164

1351 Systematic Identification of Noncoding Cancer Driver Somatic Mutations

Authors: Zohar Manber, Ran Elkon

Abstract:

Accumulation of somatic mutations (SMs) in the genome is a major driving force of cancer development. Most SMs in the tumor's genome are functionally neutral; however, some cause damage to critical processes and provide the tumor with a selective growth advantage (termed cancer driver mutations). Current research on functional significance of SMs is mainly focused on finding alterations in protein coding sequences. However, the exome comprises only 3% of the human genome, and thus, SMs in the noncoding genome significantly outnumber those that map to protein-coding regions. Although our understanding of noncoding driver SMs is very rudimentary, it is likely that disruption of regulatory elements in the genome is an important, yet largely underexplored mechanism by which somatic mutations contribute to cancer development. The expression of most human genes is controlled by multiple enhancers, and therefore, it is conceivable that regulatory SMs are distributed across different enhancers of the same target gene. Yet, to date, most statistical searches for regulatory SMs have considered each regulatory element individually, which may reduce statistical power. The first challenge in considering the cumulative activity of all the enhancers of a gene as a single unit is to map enhancers to their target promoters. Such mapping defines for each gene its set of regulating enhancers (termed "set of regulatory elements" (SRE)). Considering multiple enhancers of each gene as one unit holds great promise for enhancing the identification of driver regulatory SMs. However, the success of this approach is greatly dependent on the availability of comprehensive and accurate enhancer-promoter (E-P) maps. To date, the discovery of driver regulatory SMs has been hindered by insufficient sample sizes and statistical analyses that often considered each regulatory element separately. In this study, we analyzed more than 2,500 whole-genome sequence (WGS) samples provided by The Cancer Genome Atlas (TCGA) and The International Cancer Genome Consortium (ICGC) in order to identify such driver regulatory SMs. Our analyses took into account the combinatorial aspect of gene regulation by considering all the enhancers that control the same target gene as one unit, based on E-P maps from three genomics resources. The identification of candidate driver noncoding SMs is based on their recurrence. We searched for SREs of genes that are "hotspots" for SMs (that is, they accumulate SMs at a significantly elevated rate). To test the statistical significance of recurrence of SMs within a gene's SRE, we used both global and local background mutation rates. Using this approach, we detected - in seven different cancer types - numerous "hotspots" for SMs. To support the functional significance of these recurrent noncoding SMs, we further examined their association with the expression level of their target gene (using gene expression data provided by the ICGC and TCGA for samples that were also analyzed by WGS).

Keywords: cancer genomics, enhancers, noncoding genome, regulatory elements

Procedia PDF Downloads 81

1350 Unraveling the Evolution of Mycoplasma Hominis Through Its Genome Sequence

Authors: Boutheina Ben Abdelmoumen Mardassi, Salim Chibani, Safa Boujemaa, Amaury Vaysse, Julien Guglielmini, Elhem Yacoub

Abstract:

Background and aim: Mycoplasma hominis (MH) is a pathogenic bacterium belonging to the Mollicutes class. It causes a wide range of gynecological infections and infertility among adults. Recently, we have explored for the first time the phylodistribution of Tunisian M. hominis clinical strains using an expanded MLST. We have demonstrated their distinction into two pure lineages, which each corresponding to a specific pathotype: genital infections and infertility. The aim of this project is to gain further insight into the evolutionary dynamics and the specific genetic factors that distinguish MH pathotypes Methods: Whole genome sequencing of Mycoplasma hominis clinical strains was performed using illumina Miseq. Denovo assembly was performed using a publicly available in-house pipeline. We used prokka to annotate the genomes, panaroo to generate the gene presence matrix and Jolytree to establish the phylogenetic tree. We used treeWAS to identify genetic loci associated with the pathothype of interest from the presence matrix and phylogenetic tree. Results: Our results revealed a clear categorization of the 62 MH clinical strains into two distinct genetic lineages, with each corresponding to a specific pathotype.; gynecological infections and infertility[AV1] . Genome annotation showed that GC content is ranging between 26 and 27%, which is a known characteristic of Mycoplasma genome. Housekeeping genes belonging to the core genome are highly conserved among our strains. TreeWas identified 4 virulence genes associated with the pathotype gynecological infection. encoding for asparagine--tRNA ligase, restriction endonuclease subunit S, Eco47II restriction endonuclease, and transcription regulator XRE (involved in tolerance to oxidative stress). Five genes have been identified that have a statistical association with infertility, tow lipoprotein, one hypothetical protein, a glycosyl transferase involved in capsule synthesis, and pyruvate kinase involved in biofilm formation. All strains harbored an efflux pomp that belongs to the family of multidrug resistance ABC transporter, which confers resistance to a wide range of antibiotics. Indeed many adhesion factors and lipoproteins (p120, p120', p60, p80, Vaa) have been checked and confirmed in our strains with a relatively 99 % to 96 % conserved domain and hypervariable domain that represent 1 to 4 % of the reference sequence extracted from gene bank. Conclusion: In summary, this study led to the identification of specific genetic loci associated with distinct pathotypes in M hominis.

Keywords: mycoplasma hominis, infertility, gynecological infections, virulence genes, antibiotic resistance

Procedia PDF Downloads 56

1349 Genome Editing in Sorghum: Advancements and Future Possibilities: A Review

Authors: Micheale Yifter Weldemichael, Hailay Mehari Gebremedhn, Teklehaimanot Hailesslasie

Abstract:

The advancement of target-specific genome editing tools, including clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein9 (Cas9), mega-nucleases, base editing (BE), prime editing (PE), transcription activator-like endonucleases (TALENs), and zinc-finger nucleases (ZFNs), have paved the way for a modern era of gene editing. CRISPR/Cas9, as a versatile, simple, cost-effective and robust system for genome editing, has dominated the genome manipulation field over the last few years. The application of CRISPR/Cas9 in sorghum improvement is particularly vital in the context of ecological, environmental and agricultural challenges, as well as global climate change. In this context, gene editing using CRISPR/Cas9 can improve nutritional value, yield, resistance to pests and disease and tolerance to different abiotic stress. Moreover, CRISPR/Cas9 can potentially perform complex editing to reshape already available elite varieties and new genetic variations. However, existing research is targeted at improving even further the effectiveness of the CRISPR/Cas9 genome editing techniques to fruitfully edit endogenous sorghum genes. These findings suggest that genome editing is a feasible and successful venture in sorghum. Newer improvements and developments of CRISPR/Cas9 techniques have further qualified researchers to modify extra genes in sorghum with improved efficiency. The fruitful application and development of CRISPR techniques for genome editing in sorghum will not only help in gene discovery, creating new, improved traits in sorghum regulating gene expression sorghum functional genomics, but also in making site-specific integration events.

Keywords: CRISPR/Cas9, genome editing, quality, sorghum, stress, yield

Procedia PDF Downloads 34

1348 Development and Characterization of Polymorphic Genomic-SSR Markers in Asian Long-Horned Beetle (Anoplophora glabripennis)

Authors: Zhao Yang Liu, Jing Tao

Abstract:

The Asian long-horned beetle, Anoplophora glabripennis (Motschulsky) (Coleoptera: Cerambycidae: Lamiinae), is a wood-borer and polyphagous xylophages native to Asia and killing healthy trees. As it causes serious danger to trees, the beetle has been paid close attention in the world. However, the genetic markers limited, especially microsatellite. In this study, 24 novel simple sequence repeat (SSR) molecular markers, a powerful tool for genetic diversity studies and linkage map construction, were developed and characterized from whole genome shotgun sequences. We developed SSR loci of 2 to 6 repeated and perfect units including 9895 points, the density of SSRs was found one SSR per 56.57 kb and the abundance of SSR was 0.02/kb, besides 140 types of repeats motifs were found. Half of the 48 pairs SSR primers (containing 4 di-, 7 tri-, 2 tetra- and 11 hexamers SSRs) we selected randomly from 1222 pairs of primers were polymorphism. The number of alleles for these markers in 48 individuals varied from 3 to 21 with an average of 7.71, the number of effective alleles ranged from 1.22 to 9.97 with an average of 3.54. Besides this, the polymorphic information content (PIC) ranged from 0.18 to 0.89 with a mean of 0.65, And Shannon's Information index (I) ranged from 0.46 to 2.62 with an average of 1.44. The results suggest that the method for screening of SSR in the whole genome is feasible and efficient. SSR markers developed in this study can be used for population genetic studies of A. glabripennis. Moreover, they may also be helpful for the development of microsatellites for other Coleoptera.

Keywords: SSR markers, Anoplophora glabripennis, genetic diversity, whole genome

Procedia PDF Downloads 361

1347 CMPD: Cancer Mutant Proteome Database

Authors: Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Julie Lichieh Chu, Tin-Wen Chen, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu, Petrus Tang

Abstract:

Whole-exome sequencing focuses on the protein coding regions of disease/cancer associated genes based on a priori knowledge is the most cost-effective method to study the association between genetic alterations and disease. Recent advances in high throughput sequencing technologies and proteomic techniques has provided an opportunity to integrate genomics and proteomics, allowing readily detectable mutated peptides corresponding to mutated genes. Since sequence database search is the most widely used method for protein identification using Mass spectrometry (MS)-based proteomics technology, a mutant proteome database is required to better approximate the real protein pool to improve disease-associated mutated protein identification. Large-scale whole exome/genome sequencing studies were launched by National Cancer Institute (NCI), Broad Institute, and The Cancer Genome Atlas (TCGA), which provide not only a comprehensive report on the analysis of coding variants in diverse samples cell lines but a invaluable resource for extensive research community. No existing database is available for the collection of mutant protein sequences related to the identified variants in these studies. CMPD is designed to address this issue, serving as a bridge between genomic data and proteomic studies and focusing on protein sequence-altering variations originated from both germline and cancer-associated somatic variations.

Keywords: TCGA, cancer, mutant, proteome

Procedia PDF Downloads 564

1346 Merging Sequence Diagrams Based Slicing

Authors: Bouras Zine Eddine, Talai Abdelouaheb

Abstract:

The need to merge software artifacts seems inherent to modern software development. Distribution of development over several teams and breaking tasks into smaller, more manageable pieces are an effective means to deal with the kind of complexity. In each case, the separately developed artifacts need to be assembled as efficiently as possible into a consistent whole in which the parts still function as described. Also, earlier changes are introduced into the life cycle and easier is their management by designers. Interaction-based specifications such as UML sequence diagrams have been found effective in this regard. As a result, sequence diagrams can be used not only for capturing system behaviors but also for merging changes in order to create a new version. The objective of this paper is to suggest a new approach to deal with the problem of software merging at the level of sequence diagrams by using the concept of dependence analysis that captures, formally, all mapping and differences between elements of sequence diagrams and serves as a key concept to create a new version of sequence diagram.

Keywords: system behaviors, sequence diagram merging, dependence analysis, sequence diagram slicing

Procedia PDF Downloads 317

1345 Biotechnological Interventions for Crop Improvement in Nutricereal Pearl Millet

Authors: Supriya Ambawat, Subaran Singh, C. Tara Satyavathi, B. S. Rajpurohit, Ummed Singh, Balraj Singh

Abstract:

Pearl millet [Pennisetum glaucum (L.) R. Br.] is an important staple food of the arid and semiarid tropical regions of Asia, Africa, and Latin America. It is rightly termed as nutricereal as it has high nutrition value and a good source of carbohydrate, protein, fat, ash, dietary fiber, potassium, magnesium, iron, zinc, etc. Pearl millet has low prolamine fraction and is gluten free which is useful for people having a gluten allergy. It has several health benefits like reduction in blood pressure, thyroid, diabe¬tes, cardiovascular and celiac diseases but its direct consumption as food has significantly declined due to several reasons. Keeping this in view, it is important to reorient the ef¬forts to generate demand through value-addition and quality improvement and create awareness on the nutritional merits of pearl millet. In India, through Indian Council of Agricultural Research-All India Coordinated Research Project on Pearl millet, multilocational coordinated trials for developed hybrids were conducted at various centers. The gene banks of pearl millet contain varieties with high levels of iron and zinc which were used to produce new pearl millet varieties with elevated iron levels bred with the high‐yielding varieties. Thus, using breeding approaches and biochemical analysis, a total of 167 hybrids and 61 varieties were identified and released for cultivation in different agro-ecological zones of the country which also includes some biofortified hybrids rich in Fe and Zn. Further, using several biotechnological interventions such as molecular markers, next-generation sequencing (NGS), association mapping, nested association mapping (NAM), MAGIC populations, genome editing, genotyping by sequencing (GBS), genome wide association studies (GWAS) advancement in millet improvement has become possible by identifying and tagging of genes underlying a trait in the genome. Using DArT markers very high density linkage maps were constructed for pearl millet. Improved HHB67 has been released using marker assisted selection (MAS) strategies, and genomic tools were used to identify Fe-Zn Quantitative Trait Loci (QTL). The draft genome sequence of millet has also opened various ways to explore pearl millet. Further, genomic positions of significantly associated simple sequence repeat (SSR) markers with iron and zinc content in the consensus map is being identified and research is in progress towards mapping QTLs for flour rancidity. The sequence information is being used to explore genes and enzymatic pathways responsible for rancidity of flour. Thus, development and application of several biotechnological approaches along with biofortification can accelerate the genetic gain targets for pearl millet improvement and help improve its quality.

Keywords: Biotechnological approaches, genomic tools, malnutrition, MAS, nutricereal, pearl millet, sequencing.

Procedia PDF Downloads 141

1344 Isolation and Characterization of Cotton Infecting Begomoviruses in Alternate Hosts from Cotton Growing Regions of Pakistan

Authors: M. Irfan Fareed, Muhammad Tahir, Alvina Gul Kazi

Abstract:

Castor bean (Ricinus communis; family Euphorbiaceae) is cultivated for the production of oil and as an ornamental plant throughout tropical regions. Leaf samples from castor bean plants with leaf curl and vein thickening were collected from areas around Okara (Pakistan) in 2011. PCR amplification using diagnostic primers showed the presence of a begomovirus and subsequently the specific pair (BurNF 5’- CCATGGTTGTGGCAGTTGATTGACAGATAC-3’, BurNR 5’- CCATGGATTCACGCACAGGGGAACCC-3’) was used to amplify and clone the whole genome of the virus. The complete nucleotide sequence was determined to be 2,759 nt (accession No. HE985227). Alignments showed the highest levels of nucleotide sequence identity (98.8%) with Cotton leaf curl Burewala virus (CLCuBuV; accession No. JF416947) No. JF416947). The virus in castor beans lacks on intact C2 gene, as is typical of CLCuBuV in cotton. An amplification product of ca. 1.4 kb was obtained in PCR with primers for betasatellites and the complete nucleotide sequence of a clone was determined to be 1373 nt (HE985228). The sequence showed 96.3% nucleotide sequence identity to the recombinant Cotton leaf curl Multan betasatellite (CLCuMB; JF502389). This is the first report of CLCuBuV and its betasatellite infecting castor bean, showing this plant species as an alternate host of the virus. Already many alternate host have been reported from different alternate host like tobacco, tomato, hibiscus, okra, ageratum, Digera arvensis, habiscus, Papaya and now in Ricinus communis. So, it is suggested that these alternate hosts should be avoided to grow near cotton growing regions.

Keywords: Ricinus communis, begomovirus, betasatellite, agriculture

Procedia PDF Downloads 490

1343 Exploring an Exome Target Capture Method for Cross-Species Population Genetic Studies

Authors: Benjamin A. Ha, Marco Morselli, Xinhui Paige Zhang, Elizabeth A. C. Heath-Heckman, Jonathan B. Puritz, David K. Jacobs

Abstract:

Next-generation sequencing has enhanced the ability to acquire massive amounts of sequence data to address classic population genetic questions for non-model organisms. Targeted approaches allow for cost effective or more precise analyses of relevant sequences; although, many such techniques require a known genome and it can be costly to purchase probes from a company. This is challenging for non-model organisms with no published genome and can be expensive for large population genetic studies. Expressed exome capture sequencing (EecSeq) synthesizes probes in the lab from expressed mRNA, which is used to capture and sequence the coding regions of genomic DNA from a pooled suite of samples. A normalization step produces probes to recover transcripts from a wide range of expression levels. This approach offers low cost recovery of a broad range of genes in the genome. This research project expands on EecSeq to investigate if mRNA from one taxon may be used to capture relevant sequences from a series of increasingly less closely related taxa. For this purpose, we propose to use the endangered Northern Tidewater goby, Eucyclogobius newberryi, a non-model organism that inhabits California coastal lagoons. mRNA will be extracted from E. newberryi to create probes and capture exomes from eight other taxa, including the more at-risk Southern Tidewater goby, E. kristinae, and more divergent species. Captured exomes will be sequenced, analyzed bioinformatically and phylogenetically, then compared to previously generated phylogenies across this group of gobies. This will provide an assessment of the utility of the technique in cross-species studies and for analyzing low genetic variation within species as is the case for E. kristinae. This method has potential applications to provide economical ways to expand population genetic and evolutionary biology studies for non-model organisms.

Keywords: coastal lagoons, endangered species, non-model organism, target capture method

Procedia PDF Downloads 161

1342 Genomic Sequence Representation Learning: An Analysis of K-Mer Vector Embedding Dimensionality

Authors: James Jr. Mashiyane, Risuna Nkolele, Stephanie J. Müller, Gciniwe S. Dlamini, Rebone L. Meraba, Darlington S. Mapiye

Abstract:

When performing language tasks in natural language processing (NLP), the dimensionality of word embeddings is chosen either ad-hoc or is calculated by optimizing the Pairwise Inner Product (PIP) loss. The PIP loss is a metric that measures the dissimilarity between word embeddings, and it is obtained through matrix perturbation theory by utilizing the unitary invariance of word embeddings. Unlike in natural language, in genomics, especially in genome sequence processing, unlike in natural language processing, there is no notion of a “word,” but rather, there are sequence substrings of length k called k-mers. K-mers sizes matter, and they vary depending on the goal of the task at hand. The dimensionality of word embeddings in NLP has been studied using the matrix perturbation theory and the PIP loss. In this paper, the sufficiency and reliability of applying word-embedding algorithms to various genomic sequence datasets are investigated to understand the relationship between the k-mer size and their embedding dimension. This is completed by studying the scaling capability of three embedding algorithms, namely Latent Semantic analysis (LSA), Word2Vec, and Global Vectors (GloVe), with respect to the k-mer size. Utilising the PIP loss as a metric to train embeddings on different datasets, we also show that Word2Vec outperforms LSA and GloVe in accurate computing embeddings as both the k-mer size and vocabulary increase. Finally, the shortcomings of natural language processing embedding algorithms in performing genomic tasks are discussed.

Keywords: word embeddings, k-mer embedding, dimensionality reduction

Procedia PDF Downloads 102

1341 Difference in Virulence Factor Genes Between Transient and Persistent Streptococcus Uberis Intramammary Infection in Dairy Cattle

Authors: Anyaphat Srithanasuwan, Noppason Pangprasit, Montira Intanon, Phongsakorn Chuammitri, Witaya Suriyasathaporn, Ynte H. Schukken

Abstract:

Streptococcus uberis is one of the most common mastitis-causing pathogens, with a wide range of intramammary infection (IMI) durations and pathogenicity. This study aimed to compare shared or unique virulence factor gene clusters distinguishing persistent and transient strains of S. uberis. A total of 139 S. uberis strains were isolated from three small-holder dairy herds with a high prevalence of S. uberis mastitis. The duration of IMI was used to categorize bacteria into two groups: transient and persistent strains with an IMI duration of less than 1 month and longer than 2 months, respectively. Six representative S. uberis strains, three from each group (transience and persistence) were selected for analysis. All transient strains exhibited multi-locus sequence types (MLST), indicating a highly diverse population of transient S. uberis. In contrast, MLST of persistent strains was available in an online database (pubMLST). Identification of virulence genes was performed using whole-genome sequencing (WGS) data. Differences in genomic size and number of virulent genes were found. For example, the BCA gene or alpha-c protein and the gene associated with capsule formation (hasAB), found in persistent strains, are important for attachment and invasion, as well as the evasion of the antimicrobial mechanisms and survival persistence, respectively. These findings suggest a genetic-level difference between the two strain types. Consequently, a comprehensive study of 139 S. uberis isolates will be conducted to perform an in-depth genetic assessment through WGS analysis on an Illumina platform.

Keywords: Streptococcus Uberis, mastitis, whole genome sequence, intramammary infection, persistent S. Uberis, transient s. Uberis

Procedia PDF Downloads 25

1340 Genome-Wide Identification and Characterization of MLO Family Genes in Pumpkin (Cucurbita maxima Duch.)

Authors: Khin Thanda Win, Chunying Zhang, Sanghyeob Lee

Abstract:

Mildew resistance locus o (Mlo), a plant-specific gene family with seven-transmembrane (TM), plays an important role in plant resistance to powdery mildew (PM). PM caused by Podosphaera xanthii is a widespread plant disease and probably represents the major fungal threat for many Cucurbits. The recent Cucurbita maxima genome sequence data provides an opportunity to identify and characterize the MLO gene family in this species. Total twenty genes (designated CmaMLO1 through CmaMLO20) have been identified by using an in silico cloning method with the MLO gene sequences of Cucumis sativus, Cucumis melo, Citrullus lanatus and Cucurbita pepo as probes. These CmaMLOs were evenly distributed on 15 chromosomes of 20 C. maxima chromosomes without any obvious clustering. Multiple sequence alignment showed that the common structural features of MLO gene family, such as TM domains, a calmodulin-binding domain and 30 important amino acid residues for MLO function, were well conserved. Phylogenetic analysis of the CmaMLO genes and other plant species reveals seven different clades (I through VII) and only clade IV is specific to monocots (rice, barley, and wheat). Phylogenetic and structural analyses provided preliminary evidence that five genes belonged to clade V could be the susceptibility genes which may play the importance role in PM resistance. This study is the first comprehensive report on MLO genes in C. maxima to our knowledge. These findings will facilitate the functional analysis of the MLOs related to PM susceptibility and are valuable resources for the development of disease resistance in pumpkin.

Keywords: Mildew resistance locus o (Mlo), powdery mildew, phylogenetic relationship, susceptibility genes

Procedia PDF Downloads 159

1339 Human Papillomavirus Type 16 E4 Gene Variation as Risk Factor for Cervical Cancer

Authors: Yudi Zhao, Ziyun Zhou, Yueting Yao, Shuying Dai, Zhiling Yan, Longyu Yang, Chuanyin Li, Li Shi, Yufeng Yao

Abstract:

HPV16 E4 gene plays an important role in viral genome amplification and release. Therefore, a variation of the E4 gene nucleic acid sequence may affect the carcinogenicity of HPV16. In order to understand the relationship between the variation of HPV16 E4 gene and cervical cancer, this study was to amplify and sequence the DNA sequences of E4 genes in 118 HPV16-positive cervical cancer patients and 151 HPV16-positive asymptomatic individuals. After obtaining E4 gene sequences, the phylogenetic trees were constructed by the Neighbor-joining method for gene variation analysis. The results showed that: 1) The distribution of HPV16 variants between the case group and the control group differed greatly (P = 0.015)，and the Asian-American（AA）variant was likely to relate to the occurrence of cervical cancer. 2) DNA sequence analysis showed that there were significant differences in the distribution of 8 variants between the case group and the control group (P < 0.05). And 3) In European (EUR) variant, two variations, C3384T (L18L) and A3449G (P39P), were associated with the initiation and development of cervical cancer. The results suggested that the variation of HPV16 E4 gene may be a contributor affecting the occurrence as well as the development of cervical cancer, and different HPV16 variants may have different carcinogenic capability.

Keywords: cervical cancer, HPV16, E4 gene, variations

Procedia PDF Downloads 144

1338 TAXAPRO, A Streamlined Pipeline to Analyze Shotgun Metagenomes

Authors: Sofia Sehli, Zainab El Ouafi, Casey Eddington, Soumaya Jbara, Kasambula Arthur Shem, Islam El Jaddaoui, Ayorinde Afolayan, Olaitan I. Awe, Allissa Dillman, Hassan Ghazal

Abstract:

The ability to promptly sequence whole genomes at a relatively low cost has revolutionized the way we study the microbiome. Microbiologists are no longer limited to studying what can be grown in a laboratory and instead are given the opportunity to rapidly identify the makeup of microbial communities in a wide variety of environments. Analyzing whole genome sequencing (WGS) data is a complex process that involves multiple moving parts and might be rather unintuitive for scientists that don’t typically work with this type of data. Thus, to help lower the barrier for less-computationally inclined individuals, TAXAPRO was developed at the first Omics Codeathon held virtually by the African Society for Bioinformatics and Computational Biology (ASBCB) in June 2021. TAXAPRO is an advanced metagenomics pipeline that accurately assembles organelle genomes from whole-genome sequencing data. TAXAPRO seamlessly combines WGS analysis tools to create a pipeline that automatically processes raw WGS data and presents organism abundance information in both a tabular and graphical format. TAXAPRO was evaluated using COVID-19 patient gut microbiome data. Analysis performed by TAXAPRO demonstrated a high abundance of Clostridia and Bacteroidia genera and a low abundance of Proteobacteria genera relative to others in the gut microbiome of patients hospitalized with COVID-19, consistent with the original findings derived using a different analysis methodology. This provides crucial evidence that the TAXAPRO workflow dispenses reliable organism abundance information overnight without the hassle of performing the analysis manually.

Keywords: metagenomics, shotgun metagenomic sequence analysis, COVID-19, pipeline, bioinformatics

Procedia PDF Downloads 177

1337 Motif Search-Aided Screening of the Pseudomonas syringae pv. Maculicola Genome for Genes Encoding Tertiary Alcohol Ester Hydrolases

Authors: M. L. Mangena, N. Mokoena, K. Rashamuse, M. G. Tlou

Abstract:

Tertiary alcohol ester (TAE) hydrolases are a group of esterases (EC 3.1.1.-) that catalyze the kinetic resolution of TAEs and as a result, they are sought-after for the production of optically pure tertiary alcohols (TAs) which are useful as building blocks for number biologically active compounds. What sets these enzymes apart is, the presence of a GGG(A)X-motif in the active site which appears to be the main reason behind their activity towards the sterically demanding TAEs. The genome of Pseudomonas syringae pv. maculicola (Psm) comprises a multitude of genes that encode esterases. We therefore, hypothesize that some of these genes encode TAE hydrolases. In this study, Psm was screened for TAE hydrolase activity using the linalyl acetate (LA) plate assay and a positive reaction was observed. As a result, the genome of Psm was screened for esterases with a GGG(A)X-motif using the motif search tool and two potential TAE hydrolase genes (PsmEST1 and 2, 1100 and 1000bp, respectively) were identified, PsmEST1 was amplified by PCR and the gene sequenced for confirmation. Analysis of the sequence data with the SingnalP 4.1 server revealed that the protein comprises a signal peptide (22 amino acid residues) on the N-terminus. Primers specific for the gene encoding the mature protein (without the signal peptide) were designed such that they contain NdeI and XhoI restriction sites for directional cloning of the PCR products into pET28a. The gene was expressed in E. coli JM109 (DE3) and the clones screened for TAE hydrolase activity using the LA plate assay. A positive clone was selected, overexpressed and the protein purified using nickel affinity chromatography. The activity of the esterase towards LA was confirmed using thin layer chromatography.

Keywords: hydrolases, tertiary alcohol esters, tertiary alcohols, screening, Pseudomonas syringae pv., maculicola genome, esterase activity, linalyl acetate

Procedia PDF Downloads 326

1336 The Cleavage of DNA by the Anti-Tumor Drug Bleomycin at the Transcription Start Sites of Human Genes Using Genome-Wide Techniques

Authors: Vincent Murray

Abstract:

The glycopeptide bleomycin is used in the treatment of testicular cancer, Hodgkin's lymphoma, and squamous cell carcinoma. Bleomycin damages and cleaves DNA in human cells, and this is considered to be the main mode of action for bleomycin's anti-tumor activity. In particular, double-strand breaks are thought to be the main mechanism for the cellular toxicity of bleomycin. Using Illumina next-generation DNA sequencing techniques, the genome-wide sequence specificity of bleomycin-induced double-strand breaks was determined in human cells. The degree of bleomycin cleavage was also assessed at the transcription start sites (TSSs) of actively transcribed genes and compared with non-transcribed genes. It was observed that bleomycin preferentially cleaved at the TSSs of actively transcribed human genes. There was a correlation between the degree of this enhanced cleavage at TSSs and the level of transcriptional activity. Bleomycin cleavage is also affected by chromatin structure and at TSSs, the peaks of bleomycin cleavage were approximately 200 bp apart. This indicated that bleomycin was able to detect phased nucleosomes at the TSSs of actively transcribed human genes. The genome-wide cleavage pattern of the bleomycin analogues 6′-deoxy-BLM Z and zorbamycin was also investigated in human cells. As found for bleomycin, these bleomycin analogues also preferentially cleaved at the TSSs of actively transcribed human genes. The cytotoxicity (IC₅₀ values) of these bleomycin analogues was determined. It was found that the degree of enhanced cleavage at TSSs was inversely correlated with the IC₅₀ values of the bleomycin analogues. This suggested that the level of cleavage at the TSSs of actively transcribed human genes was important for the cytotoxicity of bleomycin and analogues. Hence this study provided a deeper understanding of the cellular processes involved in the cancer chemotherapeutic activity of bleomycin.

Keywords: anti-tumour activity, bleomycin analogues, chromatin structure, genome-wide study, Illumina DNA sequencing

Procedia PDF Downloads 97

1335 Genome-Wide Association Study Identify COL2A1 as a Susceptibility Gene for the Hand Development Failure of Kashin-Beck Disease

Authors: Feng Zhang

Abstract:

Kashin-Beck disease (KBD) is a chronic osteochondropathy. The mechanism of hand growth and development failure of KBD remains elusive now. In this study, we conducted a two-stage genome-wide association study (GWAS) of palmar length-width ratio (LWR) of KBD, totally involving 493 Chinese Han KBD patients. Affymetrix Genome Wide Human SNP Array 6.0 was applied for SNP genotyping. Association analysis was conducted by PLINK software. Imputation analysis was performed by IMPUTE against the reference panel of the 1000 genome project. In the GWAS, the most significant association was observed between palmar LWR and rs2071358 of COL2A1 gene (P value = 4.68×10-8). Imputation analysis identified 3 SNPs surrounding rs2071358 with significant or suggestive association signals. Replication study observed additional significant association signals at both rs2071358 (P value = 0.017) and rs4760608 (P value = 0.002) of COL2A1 gene after Bonferroni correction. Our results suggest that COL2A1 gene was a novel susceptibility gene involved in the growth and development failure of hand of KBD.

Keywords: Kashin-Beck disease, genome-wide association study, COL2A1, hand

Procedia PDF Downloads 183

1334 Metagenomics-Based Molecular Epidemiology of Viral Diseases

Authors: Vyacheslav Furtak, Merja Roivainen, Olga Mirochnichenko, Majid Laassri, Bella Bidzhieva, Tatiana Zagorodnyaya, Vladimir Chizhikov, Konstantin Chumakov

Abstract:

Molecular epidemiology and environmental surveillance are parts of a rational strategy to control infectious diseases. They have been widely used in the worldwide campaign to eradicate poliomyelitis, which otherwise would be complicated by the inability to rapidly respond to outbreaks and determine sources of the infection. The conventional scheme involves isolation of viruses from patients and the environment, followed by their identification by nucleotide sequences analysis to determine phylogenetic relationships. This is a tedious and time-consuming process that yields definitive results when it may be too late to implement countermeasures. Because of the difficulty of high-throughput full-genome sequencing, most such studies are conducted by sequencing only capsid genes or their parts. Therefore the important information about the contribution of other parts of the genome and inter- and intra-species recombination to viral evolution is not captured. Here we propose a new approach based on the rapid concentration of sewage samples with tangential flow filtration followed by deep sequencing and reconstruction of nucleotide sequences of viruses present in the samples. The entire nucleic acids content of each sample is sequenced, thus preserving in digital format the complete spectrum of viruses. A set of rapid algorithms was developed to separate deep sequence reads into discrete populations corresponding to each virus and assemble them into full-length consensus contigs, as well as to generate a complete profile of sequence heterogeneities in each of them. This provides an effective approach to study molecular epidemiology and evolution of natural viral populations.

Keywords: poliovirus, eradication, environmental surveillance, laboratory diagnosis

Procedia PDF Downloads 250

1333 Encryption and Decryption of Nucleic Acid Using Deoxyribonucleic Acid Algorithm

Authors: Iftikhar A. Tayubi, Aabdulrahman Alsubhi, Abdullah Althrwi

Abstract:

The deoxyribonucleic acid text provides a single source of high-quality Cryptography about Deoxyribonucleic acid sequence for structural biologists. We will provide an intuitive, well-organized and user-friendly web interface that allows users to encrypt and decrypt Deoxy Ribonucleic Acid sequence text. It includes complex, securing by using Algorithm to encrypt and decrypt Deoxy Ribonucleic Acid sequence. The utility of this Deoxy Ribonucleic Acid Sequence Text is that, it can provide a user-friendly interface for users to Encrypt and Decrypt store the information about Deoxy Ribonucleic Acid sequence. These interfaces created in this project will satisfy the demands of the scientific community by providing fully encrypt of Deoxy Ribonucleic Acid sequence during this website. We have adopted a methodology by using C# and Active Server Page.NET for programming which is smart and secure. Deoxy Ribonucleic Acid sequence text is a wonderful piece of equipment for encrypting large quantities of data, efficiently. The users can thus navigate from one encoding and store orange text, depending on the field for user’s interest. Algorithm classification allows a user to Protect the deoxy ribonucleic acid sequence from change, whether an alteration or error occurred during the Deoxy Ribonucleic Acid sequence data transfer. It will check the integrity of the Deoxy Ribonucleic Acid sequence data during the access.

Keywords: algorithm, ASP.NET, DNA, encrypt, decrypt

Procedia PDF Downloads 203

1332 From Primer Generation to Chromosome Identification: A Primer Generation Genotyping Method for Bacterial Identification and Typing

Authors: Wisam H. Benamer, Ehab A. Elfallah, Mohamed A. Elshaari, Farag A. Elshaari

Abstract:

A challenge for laboratories is to provide bacterial identification and antibiotic sensitivity results within a short time. Hence, advancement in the required technology is desirable to improve timing, accuracy and quality. Even with the current advances in methods used for both phenotypic and genotypic identification of bacteria the need is there to develop method(s) that enhance the outcome of bacteriology laboratories in accuracy and time. The hypothesis introduced here is based on the assumption that the chromosome of any bacteria contains unique sequences that can be used for its identification and typing. The outcome of a pilot study designed to test this hypothesis is reported in this manuscript. Methods: The complete chromosome sequences of several bacterial species were downloaded to use as search targets for unique sequences. Visual basic and SQL server (2014) were used to generate a complete set of 18-base long primers, a process started with reverse translation of randomly chosen 6 amino acids to limit the number of the generated primers. In addition, the software used to scan the downloaded chromosomes using the generated primers for similarities was designed, and the resulting hits were classified according to the number of similar chromosomal sequences, i.e., unique or otherwise. Results: All primers that had identical/similar sequences in the selected genome sequence(s) were classified according to the number of hits in the chromosomes search. Those that were identical to a single site on a single bacterial chromosome were referred to as unique. On the other hand, most generated primers sequences were identical to multiple sites on a single or multiple chromosomes. Following scanning, the generated primers were classified based on ability to differentiate between medically important bacterial and the initial results looks promising. Conclusion: A simple strategy that started by generating primers was introduced; the primers were used to screen bacterial genomes for match. Primer(s) that were uniquely identical to specific DNA sequence on a specific bacterial chromosome were selected. The identified unique sequence can be used in different molecular diagnostic techniques, possibly to identify bacteria. In addition, a single primer that can identify multiple sites in a single chromosome can be exploited for region or genome identification. Although genomes sequences draft of isolates of organism DNA enable high throughput primer design using alignment strategy, and this enhances diagnostic performance in comparison to traditional molecular assays. In this method the generated primers can be used to identify an organism before the draft sequence is completed. In addition, the generated primers can be used to build a bank for easy access of the primers that can be used to identify bacteria.

Keywords: bacteria chromosome, bacterial identification, sequence, primer generation

Procedia PDF Downloads 166

1331 Prediction of Solanum Lycopersicum Genome Encoded microRNAs Targeting Tomato Spotted Wilt Virus

Authors: Muhammad Shahzad Iqbal, Zobia Sarwar, Salah-ud-Din

Abstract:

Tomato spotted wilt virus (TSWV) belongs to the genus Tospoviruses (family Bunyaviridae). It is one of the most devastating pathogens of tomato (Solanum Lycopersicum) and heavily damages the crop yield each year around the globe. In this study, we retrieved 329 mature miRNA sequences from two microRNA databases (miRBase and miRSoldb) and checked the putative target sites in the downloaded-genome sequence of TSWV. A consensus of three miRNA target prediction tools (RNA22, miRanda and psRNATarget) was used to screen the false-positive microRNAs targeting sites in the TSWV genome. These tools calculated different target sites by calculating minimum free energy (mfe), site-complementarity, minimum folding energy and other microRNA-mRNA binding factors. R language was used to plot the predicted target-site data. All the genes having possible target sites for different miRNAs were screened by building a consensus table. Out of these 329 mature miRNAs predicted by three algorithms, only eight miRNAs met all the criteria/threshold specifications. MC-Fold and MC-Sym were used to predict three-dimensional structures of miRNAs and further analyzed in USCF chimera to visualize the structural and conformational changes before and after microRNA-mRNA interactions. The results of the current study show that the predicted eight miRNAs could further be evaluated by in vitro experiments to develop TSWV-resistant transgenic tomato plants in the future.

Keywords: tomato spotted wild virus (TSWV), Solanum lycopersicum, plant virus, miRNAs, microRNA target prediction, mRNA

Procedia PDF Downloads 121

1330 Complete Genome Sequence Analysis of Pasteurella multocida Subspecies multocida Serotype A Strain PMTB2.1

Authors: Shagufta Jabeen, Faez J. Firdaus Abdullah, Zunita Zakaria, Nurulfiza M. Isa, Yung C. Tan, Wai Y. Yee, Abdul R. Omar

Abstract:

Pasteurella multocida (PM) is an important veterinary opportunistic pathogen particularly associated with septicemic pasteurellosis, pneumonic pasteurellosis and hemorrhagic septicemia in cattle and buffaloes. P. multocida serotype A has been reported to cause fatal pneumonia and septicemia. Pasteurella multocida subspecies multocida of serotype A Malaysian isolate PMTB2.1 was first isolated from buffaloes died of septicemia. In this study, the genome of P. multocida strain PMTB2.1 was sequenced using third-generation sequencing technology, PacBio RS2 system and analyzed bioinformatically via de novo analysis followed by in-depth analysis based on comparative genomics. Bioinformatics analysis based on de novo assembly of PacBio raw reads generated 3 contigs followed by gap filling of aligned contigs with PCR sequencing, generated a single contiguous circular chromosome with a genomic size of 2,315,138 bp and a GC content of approximately 40.32% (Accession number CP007205). The PMTB2.1 genome comprised of 2,176 protein-coding sequences, 6 rRNA operons and 56 tRNA and 4 ncRNAs sequences. The comparative genome sequence analysis of PMTB2.1 with nine complete genomes which include Actinobacillus pleuropneumoniae, Haemophilus parasuis, Escherichia coli and five P. multocida complete genome sequences including, PM70, PM36950, PMHN06, PM3480, PMHB01 and PMTB2.1 was carried out based on OrthoMCL analysis and Venn diagram. The analysis showed that 282 CDs (13%) are unique to PMTB2.1and 1,125 CDs with orthologs in all. This reflects overall close relationship of these bacteria and supports the classification in the Gamma subdivision of the Proteobacteria. In addition, genomic distance analysis among all nine genomes indicated that PMTB2.1 is closely related with other five Pasteurella species with genomic distance less than 0.13. Synteny analysis shows subtle differences in genetic structures among different P.multocida indicating the dynamics of frequent gene transfer events among different P. multocida strains. However, PM3480 and PM70 exhibited exceptionally large structural variation since they were swine and chicken isolates. Furthermore, genomic structure of PMTB2.1 is more resembling that of PM36950 with a genomic size difference of approximately 34,380 kb (smaller than PM36950) and strain-specific Integrative and Conjugative Elements (ICE) which was found only in PM36950 is absent in PMTB2.1. Meanwhile, two intact prophages sequences of approximately 62 kb were found to be present only in PMTB2.1. One of phage is similar to transposable phage SfMu. The phylogenomic tree was constructed and rooted with E. coli, A. pleuropneumoniae and H. parasuis based on OrthoMCL analysis. The genomes of P. multocida strain PMTB2.1 were clustered with bovine isolates of P. multocida strain PM36950 and PMHB01 and were separated from avian isolate PM70 and swine isolates PM3480 and PMHN06 and are distant from Actinobacillus and Haemophilus. Previous studies based on Single Nucleotide Polymorphism (SNPs) and Multilocus Sequence Typing (MLST) unable to show a clear phylogenetic relatedness between Pasteurella multocida and the different host. In conclusion, this study has provided insight on the genomic structure of PMTB2.1 in terms of potential genes that can function as virulence factors for future study in elucidating the mechanisms behind the ability of the bacteria in causing diseases in susceptible animals.

Keywords: comparative genomics, DNA sequencing, phage, phylogenomics

Procedia PDF Downloads 154

1329 Constructing Orthogonal De Bruijn and Kautz Sequences and Applications

Authors: Yaw-Ling Lin

Abstract:

A de Bruijn graph of order k is a graph whose vertices representing all length-k sequences with edges joining pairs of vertices whose sequences have maximum possible overlap (length k−1). Every Hamiltonian cycle of this graph defines a distinct, minimum length de Bruijn sequence containing all k-mers exactly once. A Kautz sequence is the minimal generating sequence so as the sequence of minimal length that produces all possible length-k sequences with the restriction that every two consecutive alphabets in the sequences must be different. A collection of de Bruijn/Kautz sequences are orthogonal if any two sequences are of maximally differ in sequence composition; that is, the maximum length of their common substring is k. In this paper, we discuss how such a collection of (maximal) orthogonal de Bruijn/Kautz sequences can be made and use the algorithm to build up a web application service for the synthesized DNA and other related biomolecular sequences.

Keywords: biomolecular sequence synthesis, de Bruijn sequences, Eulerian cycle, Hamiltonian cycle, Kautz sequences, orthogonal sequences

Procedia PDF Downloads 129

1328 Comparing the Sequence and Effectiveness of Teaching the Four Basic Operations and Mathematics in Primary Schools

Authors: Abubakar Sadiq Mensah, Hassan Usman

Abstract:

The study compared the effectiveness of Audition, Multiplication, subtraction and Division (AMSD) and Addition, subtraction, Multiplication and Division (ASMD), sequence of teaching these four basic operations in mathematics to primary one pupil’s in Katsina Local Government, Katsina State. The study determined the sequence that was more effective and mostly adopted by teachers of the operations. One hundred (100) teachers and sixty pupils (60) from primary one were used for the study. The pupils were divided into two equal groups. The researcher taught these operations to each group separately for four weeks (4 weeks). Group one was taught using the ASMD sequence, while group two was taught using ASMD sequence. In order to generate the needed data for the study, questionnaires and tests were administered on the samples. Data collected were analyzed and major findings were arrived at: (i) Two primary mathematics text books were used in all the primary schools in the area; (ii) Each of the textbooks contained the ASMD sequence; (iii) 73% of the teachers sampled adopted the ASMD sequence of teaching these operations; and (iv) Group one of the pupils (taught using AMSD sequence) performed significantly better than their counter parts in group two (taught using AMSD sequence). On the basis of this, the researcher concluded that the AMSD sequence was more effective in teaching the operations than the ASMD sequence. Consequently, the researcher concluded that primary schools teachers, authors of primary mathematics textbooks, and curriculum planner should adopt the AMSD sequence of teaching these operations.

Keywords: matematic, high school, four basic operations, effectiveness of teaching

Procedia PDF Downloads 229

1327 On Paranorm Zweier I-Convergent Sequence Spaces

Authors: Nazneen Khan, Vakeel A. Khan

Abstract:

In this article we introduce the Paranorm Zweier I-convergent sequence spaces, for a sequence of positive real numbers. We study some topological properties, prove the decomposition theorem and study some inclusion relations on these spaces.

Keywords: ideal, filter, I-convergence, I-nullity, paranorm

Procedia PDF Downloads 453

1326 Genetic Diversity and Discovery of Unique SNPs in Five Country Cultivars of Sesamum indicum by Next-Generation Sequencing

Authors: Nam-Kuk Kim, Jin Kim, Soomin Park, Changhee Lee, Mijin Chu, Seong-Hun Lee

Abstract:

In this study, we conducted whole genome re-sequencing of 10 cultivars originated from five countries including Korea, China, India, Pakistan and Ethiopia with Sesamum indicum (Zhongzho No. 13) genome as a reference. Almost 80% of the whole genome sequences of the reference genome could be covered by sequenced reads. Numerous SNP and InDel were detected by bioinformatic analysis. Among these variants, 266,051 SNPs were identified as unique to countries. Pakistan and Ethiopia had high densities of SNPs compared to other countries. Three main clusters (cluster 1: Korea, cluster 2: Pakistan and India, cluster 3: Ethiopia and China) were recovered by neighbor-joining analysis using all variants. Interestingly, some variants were detected in DGAT1 (diacylglycerol O-acyltransferase 1) and FADS (fatty acid desaturase) genes, which are known to be related with fatty acid synthesis and metabolism. These results can provide useful information to understand the regional characteristics and develop DNA markers for origin discrimination of sesame.

Keywords: Sesamum indicum, NGS, SNP, DNA marker

Procedia PDF Downloads 301

1325 Identifying Protein-Coding and Non-Coding Regions in Transcriptomes

Authors: Angela U. Makolo

Abstract:

Protein-coding and Non-coding regions determine the biology of a sequenced transcriptome. Research advances have shown that Non-coding regions are important in disease progression and clinical diagnosis. Existing bioinformatics tools have been targeted towards Protein-coding regions alone. Therefore, there are challenges associated with gaining biological insights from transcriptome sequence data. These tools are also limited to computationally intensive sequence alignment, which is inadequate and less accurate to identify both Protein-coding and Non-coding regions. Alignment-free techniques can overcome the limitation of identifying both regions. Therefore, this study was designed to develop an efficient sequence alignment-free model for identifying both Protein-coding and Non-coding regions in sequenced transcriptomes. Feature grouping and randomization procedures were applied to the input transcriptomes (37,503 data points). Successive iterations were carried out to compute the gradient vector that converged the developed Protein-coding and Non-coding Region Identifier (PNRI) model to the approximate coefficient vector. The logistic regression algorithm was used with a sigmoid activation function. A parameter vector was estimated for every sample in 37,503 data points in a bid to reduce the generalization error and cost. Maximum Likelihood Estimation (MLE) was used for parameter estimation by taking the log-likelihood of six features and combining them into a summation function. Dynamic thresholding was used to classify the Protein-coding and Non-coding regions, and the Receiver Operating Characteristic (ROC) curve was determined. The generalization performance of PNRI was determined in terms of F1 score, accuracy, sensitivity, and specificity. The average generalization performance of PNRI was determined using a benchmark of multi-species organisms. The generalization error for identifying Protein-coding and Non-coding regions decreased from 0.514 to 0.508 and to 0.378, respectively, after three iterations. The cost (difference between the predicted and the actual outcome) also decreased from 1.446 to 0.842 and to 0.718, respectively, for the first, second and third iterations. The iterations terminated at the 390th epoch, having an error of 0.036 and a cost of 0.316. The computed elements of the parameter vector that maximized the objective function were 0.043, 0.519, 0.715, 0.878, 1.157, and 2.575. The PNRI gave an ROC of 0.97, indicating an improved predictive ability. The PNRI identified both Protein-coding and Non-coding regions with an F1 score of 0.970, accuracy (0.969), sensitivity (0.966), and specificity of 0.973. Using 13 non-human multi-species model organisms, the average generalization performance of the traditional method was 74.4%, while that of the developed model was 85.2%, thereby making the developed model better in the identification of Protein-coding and Non-coding regions in transcriptomes. The developed Protein-coding and Non-coding region identifier model efficiently identified the Protein-coding and Non-coding transcriptomic regions. It could be used in genome annotation and in the analysis of transcriptomes.

Keywords: sequence alignment-free model, dynamic thresholding classification, input randomization, genome annotation

Procedia PDF Downloads 31

1324 Genomics of Aquatic Adaptation

Authors: Agostinho Antunes

Abstract:

The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completed sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms provides also the framework to better understand the genetic makeup of such species and related ones, allowing to explore the genetic changes underlining the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of selected marine animal species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been determinant in the success of adaptive radiations into diverse habitats and lifestyles.

Keywords: comparative genomics, adaptive evolution, bioinformatics, phylogenetics, genome mining

Procedia PDF Downloads 504

1323 Genomics of Adaptation in the Sea

Authors: Agostinho Antunes

Abstract:

Keywords: marine genomics, evolutionary bioinformatics, human genome sequencing, genomic analyses

Procedia PDF Downloads 584