Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1068

Search results for: genome sequence

1068 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk

Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour

Abstract:

The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer remain controversial. Determining the amount of variation explained by these factors requires experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate not only that individual genome sequencing is uninformative in determining cancer risk, but also that assigning a unique genome sequence to any given individual (healthy or affected) is not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since the genome sequence differs from cell to cell and changes over time, determining the risk of complex diseases from genome sequence appears unrealistic, and the resulting data are therefore likely to be inherently uninformative.

Keywords: cancer risk, extrinsic factors, genome sequencing, intrinsic factors

Procedia PDF Downloads 122
1067 Insights into the Annotated Genome Sequence of Defluviitoga tunisiensis L3 Isolated from a Thermophilic Rural Biogas Producing Plant

Authors: Irena Maus, Katharina Gabriella Cibis, Andreas Bremges, Yvonne Stolze, Geizecler Tomazetto, Daniel Wibberg, Helmut König, Alfred Pühler, Andreas Schlüter

Abstract:

Within the agricultural sector, the production of biogas from organic substrates represents an economically attractive technology to generate bioenergy. Complex consortia of microorganisms are responsible for biomass decomposition and biogas production. Recently, species belonging to the phylum Thermotogae were detected in thermophilic biogas-production plants utilizing renewable primary products for biomethanation. To analyze adaptive genome features of representative Thermotogae strains, Defluviitoga tunisiensis L3 was isolated from a rural thermophilic biogas plant (54°C) and completely sequenced on an Illumina MiSeq system. Sequencing and assembly of the D. tunisiensis L3 genome yielded a circular chromosome with a size of 2,053,097 bp and a mean GC content of 31.38%. Functional annotation of the complete genome sequence revealed that the thermophilic strain L3 encodes several genes predicted to facilitate growth of this microorganism on arabinose, galactose, maltose, mannose, fructose, raffinose, ribose, cellobiose, lactose, xylose, xylan, lactate and mannitol. Acetate, hydrogen (H2) and carbon dioxide (CO2) are predicted to be the end products of the fermentation process. These end products are metabolites for methanogenic archaea, the key players in the final step of the anaerobic digestion process. To determine the degree of relatedness of dominant biogas community members within selected digester systems to D. tunisiensis L3, metagenome sequences from corresponding communities were mapped on the L3 genome. These fragment recruitments revealed that metagenome reads originating from a thermophilic biogas plant covered 95% of the D. tunisiensis L3 genome sequence. In conclusion, availability of the D. tunisiensis L3 genome sequence and insights into its metabolic capabilities provide the basis for biotechnological exploitation of genome features involved in thermophilic fermentation processes utilizing renewable primary products.

Keywords: genome sequence, thermophilic biogas plant, Thermotogae, Defluviitoga tunisiensis

Procedia PDF Downloads 332
1066 Exploring MPI-Based Parallel Computing in Analyzing Very Large Sequences

Authors: Bilal Wajid, Erchin Serpedin

Abstract:

The health industry is moving towards personalized medicine. If a patient's genome needs to be sequenced, it is important that the entire analysis be completed quickly. This paper explores the use of parallel computing to analyze very large sequences. Two cases are considered. In the first case, the sequence length is kept constant and the effect of increasing the number of MPI-based processes is evaluated in terms of execution time, speed and efficiency. In the second case, the number of MPI-based processes is kept constant, whereas the length of the sequence is increased.
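The evaluation described in the first case can be illustrated with the standard definitions of parallel speedup and efficiency. The sketch below is not taken from the paper: the function names and the timing values are hypothetical, and it only assumes that a serial run time and a run time on p processes have been measured.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T(1) / T(p): how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Efficiency E = S / p: fraction of ideal linear speedup achieved."""
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by number of processes.
    timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}
    t1 = timings[1]
    for p, tp in sorted(timings.items()):
        print(p, round(speedup(t1, tp), 2), round(efficiency(t1, tp, p), 2))
```

Efficiency below 1 indicates overhead such as communication or load imbalance; how it falls as processes are added is what the first case measures.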

Keywords: parallel computing, alignment, genome assembly

Procedia PDF Downloads 138
1065 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Darlington Mapiye, Mpho Mokoatle, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.
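The k-mer representation that this abstract builds on can be sketched as a sliding-window count over a sequence. This is only a minimal illustration, not the authors' pipeline: the function name `kmer_counts` and the toy sequence are hypothetical, and the resampling and model-fitting steps are omitted.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # Toy sequence; a real isolate would be read from a FASTA file.
    counts = kmer_counts("ACGTACGT", 4)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

Each genome then becomes a vector of k-mer counts. Since there are up to 4^k distinct k-mers, the feature space grows quickly with k, which is one source of the computing cost the abstract mentions.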

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 16
1064 Phenotype Prediction of DNA Sequence Data: A Machine and Statistical Learning Approach

Authors: Mpho Mokoatle, Darlington Mapiye, James Mashiyane, Stephanie Muller, Gciniwe Dlamini

Abstract:

Great advances in high-throughput sequencing technologies have resulted in availability of huge amounts of sequencing data in public and private repositories, enabling a holistic understanding of complex biological phenomena. Sequence data are used for a wide range of applications such as gene annotations, expression studies, personalized treatment and precision medicine. However, this rapid growth in sequence data poses a great challenge which calls for novel data processing and analytic methods, as well as huge computing resources. In this work, a machine and statistical learning approach for DNA sequence classification based on k-mer representation of sequence data is proposed. The approach is tested using whole genome sequences of Mycobacterium tuberculosis (MTB) isolates to (i) reduce the size of genomic sequence data, (ii) identify an optimum size of k-mers and utilize it to build classification models, (iii) predict the phenotype from whole genome sequence data of a given bacterial isolate, and (iv) demonstrate computing challenges associated with the analysis of whole genome sequence data in producing interpretable and explainable insights. The classification models were trained on 104 whole genome sequences of MTB isolates. Cluster analysis showed that k-mers may be used to discriminate phenotypes and that the discrimination becomes more concise as the size of the k-mers increases. The best performing classification model had a k-mer size of 10 (the longest k-mer) and an accuracy, recall, precision, specificity, and Matthews correlation coefficient of 72.0%, 80.5%, 80.5%, 63.6%, and 0.4, respectively. This study provides a comprehensive approach for resampling whole genome sequencing data, objectively selecting a k-mer size, and performing classification for phenotype prediction. The analysis also highlights the importance of increasing the k-mer size to produce more biologically explainable results, which brings to the fore the interplay that exists amongst accuracy, computing resources and explainability of classification results. However, the analysis provides a new way to elucidate genetic information from genomic data, and identify phenotype relationships which are important especially in explaining complex biological mechanisms.

Keywords: AWD-LSTM, bootstrapping, k-mers, next generation sequencing

Procedia PDF Downloads 18
1063 PMEL Marker Identification of Dark and Light Feather Colours in Local Canary

Authors: Mudawamah Mudawamah, Muhammad Z. Fadli, Gatot Ciptadi, Aulanni’am

Abstract:

Canary breeding has spread throughout Indonesia among low- and middle-income communities and has become a source of income for them. An interesting feature of the canary market is that feather colour is one of the factors determining price. This research contributes to a molecular database that can serve as a basis for selection and mating by Indonesian canary breeders. The study was experimental, using genomic DNA isolated from canary blood. The DNA was amplified by PCR with a PMEL marker and then sequenced. Twenty-four canaries with light and dark feather colours were used. The data were analysed with BioEdit and Network 4.6.0.0 software. All samples were amplified with the PMEL marker, giving a fragment length of 500 bp. At base position 40, cytosine (C) was found in the light-coloured canaries, while thymine (T) was found at the same position in the dark-coloured canaries. The sequences ranged from 286 to 415 bp in length and comprised 10 haplotypes. In conclusion, the PMEL gene (a white pigment gene) can likely be used to detect molecular genetic variation between dark and light feather colours.

Keywords: canary, haplotype, PMEL, sequence

Procedia PDF Downloads 126
1062 Genome-Wide Analysis of BES1/BZR1 Gene Family in Five Plant Species

Authors: Jafar Ahmadi, Zhohreh Asiaban, Sedigheh Fabriki Ourang

Abstract:

Brassinosteroids (BRs) regulate cell elongation, vascular differentiation, senescence and stress responses. BRs signal through the BES1/BZR1 family of transcription factors, which regulate hundreds of target genes involved in this pathway. In this research, a comprehensive genome-wide analysis of the BES1/BZR1 gene family was carried out in Arabidopsis thaliana, Cucumis sativus, Vitis vinifera, Glycine max, and Brachypodium distachyon. Sequence specifications, dot plots and hydropathy plots were analyzed for the protein and genome sequences of the five plant species. The longest protein sequence was Brdic3g, with 374 aa, and the shortest was Gm7g, with 163 aa. The maximum instability index was that of protein sequence AT1G19350, at 79.99, and the minimum was that of protein sequence Gm5g, at 33.22. The aliphatic index of these protein sequences ranged from 47.82 to 78.79 in Arabidopsis thaliana, 49.91 to 57.50 in Vitis vinifera, 55.09 to 82.43 in Glycine max, 54.09 to 54.28 in Brachypodium distachyon, and 55.36 to 56.83 in Cucumis sativus. Overall, the data obtained from our investigation contribute to a better understanding of the complexity of the BES1/BZR1 gene family and provide a first step towards directing future experimental designs for systematic analysis of the functions of the BES1/BZR1 gene family.

Keywords: BES1/BZR1, brassinosteroids, phylogenetic analysis, transcription factor

Procedia PDF Downloads 234
1061 Implementation of CNV-CH Algorithm Using Map-Reduce Approach

Authors: Aishik Deb, Rituparna Sinha

Abstract:

We have developed an algorithm to detect abnormal segments (structural variations) in the genome across a number of samples. We have worked on simulated as well as real data from BAM files and have designed a segmentation algorithm in which abnormal segments are detected. This algorithm aims to improve the accuracy and performance of the existing CNV-CH algorithm. The next-generation sequencing (NGS) approach is very fast and can generate large sequences in a reasonable time. The resulting huge volume of sequence information gives rise to the need for Big Data and parallel approaches to segmentation. Therefore, we have designed a map-reduce approach for the existing CNV-CH algorithm in which a large amount of sequence data can be segmented and structural variations in the human genome can be detected. We have compared the efficiency of the traditional and map-reduce algorithms with respect to precision, sensitivity, and F-score. The advantages of our algorithm are that it is fast and has better accuracy. This algorithm can be applied to detect structural variations within a genome, which in turn can be used to detect various genetic disorders such as cancer. The defects may be caused by new mutations or changes to the DNA and generally result in abnormally high or low base coverage and quantification values.
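The three comparison metrics named in this abstract have standard definitions in terms of true positives, false positives and false negatives. The sketch below uses those textbook formulas only. The function name and the example counts are hypothetical, and how a detected segment is matched to a true one is not specified in the abstract and is not modelled here.

```python
def precision_sensitivity_f(tp, fp, fn):
    """Return (precision, sensitivity, F-score) from detection counts.

    precision   = TP / (TP + FP)
    sensitivity = TP / (TP + FN)   (also called recall)
    F-score     = harmonic mean of precision and sensitivity
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_score


if __name__ == "__main__":
    # Hypothetical counts of correctly and incorrectly called segments.
    print(precision_sensitivity_f(tp=8, fp=2, fn=2))
```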

Keywords: cancer detection, convex hull segmentation, map reduce, next generation sequencing

Procedia PDF Downloads 10
1060 Genome-Wide Mining of Potential Guide RNAs for Streptococcus pyogenes and Neisseria meningitidis CRISPR-Cas Systems for Genome Engineering

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system can facilitate targeted genome editing in organisms. Dual or single guide RNA (gRNA) can program the Cas9 nuclease to cut target DNA in particular areas, thus introducing precise mutations either via error-prone non-homologous end-joining repair or via incorporation of foreign DNA by homologous recombination between donor DNA and the target area. Despite high demand for this promising technology, developing a well-organized procedure for reliable mining of potential gRNA target sites in large genomic data is still challenging. Hence, we aimed to perform high-throughput detection of target sites by specific PAMs not only for the common Streptococcus pyogenes (SpCas9) but also for the Neisseria meningitidis (NmCas9) CRISPR-Cas system. Previous research confirmed the successful application of such RNA-guided Cas9 orthologs for effective gene targeting and subsequent genome manipulation. However, Cas9 orthologs need their particular PAM sequence for DNA cleavage activity. Activity levels are based on the sequence of the protospacer and specific combinations of favorable PAM bases. Therefore, based on the specific length and sequence of the PAM and a constant length of the target site for the two Cas9 orthologs, we created a reliable procedure to explore possible gRNA sequences. To mine CRISPR target sites, four different searching modes of sgRNA binding to the target DNA strand were applied. These searching modes are as follows: i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. Finally, a complete list of all potential gRNAs along with their locations, strands, and PAM sequence orientation can be provided for both SpCas9 and another potential Cas9 ortholog (NmCas9). The artificial design of potential gRNAs in a genome of interest can accelerate functional genomic studies. Consequently, the application of this novel genome editing tool (CRISPR/Cas technology) will be enhanced through increased versatility and efficiency.
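A PAM-anchored search on both strands, as outlined above, can be sketched with a regular expression. This is only an illustration under stated assumptions, not the authors' procedure. The PAM patterns and protospacer lengths used here (NGG with 20 nt for SpCas9, NNNNGATT with 24 nt for NmCas9) are commonly cited values rather than values given in the abstract, all names are hypothetical, and the paired-gRNA mode is not covered.

```python
import re

# Assumed PAM patterns, written as regular expressions (N = any base).
SPCAS9_PAM = "[ACGT]GG"        # 5'-NGG-3'
NMCAS9_PAM = "[ACGT]{4}GATT"   # 5'-NNNNGATT-3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan(strand_seq, pam_regex, spacer_len):
    """Find every protospacer immediately followed (3') by a PAM.

    A lookahead is used so that overlapping candidate sites are all reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_regex))
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(strand_seq)]


def mine_grna_sites(seq, pam_regex, spacer_len):
    """Return (strand, start, protospacer, pam) for both strands.

    Start positions are 0-based and refer to the scanned strand, so "-"
    hits are given in reverse-complement coordinates.
    """
    seq = seq.upper()
    hits = [("+", *h) for h in _scan(seq, pam_regex, spacer_len)]
    hits += [("-", *h)
             for h in _scan(reverse_complement(seq), pam_regex, spacer_len)]
    return hits


if __name__ == "__main__":
    toy = "A" * 20 + "TGG"  # toy sequence with exactly one SpCas9-style site
    for hit in mine_grna_sites(toy, SPCAS9_PAM, 20):
        print(hit)
```

Restricting the result to "+" or "-" hits corresponds to single-strand searching, and the full list corresponds to searching both strands.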

Keywords: CRISPR/Cas9 genome editing, gRNA mining, SpCas9, NmCas9

Procedia PDF Downloads 139
1059 Genome Sequencing of Infectious Bronchitis Virus QX-Like Strain Isolated in Malaysia

Authors: M. Suwaibah, S. W. Tan, I. Aiini, K. Yusoff, A. R. Omar

Abstract:

Respiratory diseases are the most important infectious diseases affecting poultry worldwide. One of the avian respiratory viruses of global importance, causing significant economic losses, is Infectious Bronchitis Virus (IBV). The virus causes a wide-spectrum disease known as Infectious Bronchitis (IB), affecting not only the respiratory system but also the kidney and the reproductive system, depending on the strain. IB and Newcastle disease are two of the most prevalent diseases affecting poultry in Malaysia. However, studies on the molecular characterization of Malaysian IBV are lacking. In this study, IBV strain IBS130, which was isolated in 2015, was fully sequenced using a next-generation sequencing approach. The complete genome, polyprotein 1ab and S1 gene sequences of IBS130 were compared with other IBV sequences available in GenBank, National Center for Biotechnology Information (NCBI). IBV strain IBS130 is characterised as a QX-like strain based on whole genome and S1 gene sequence analysis. Comparisons of the virus with other IBV strains showed that the nucleotide identity ranged from 67% to 99.2%, depending on the region analysed. Whole-genome nucleotide similarity ranged from 84.9% to 90.7%, with IBS130 least similar to Singapore strains (84.9%) and highly similar to China QX-like strains. Meanwhile, similarity in polyprotein 1ab ranged from 85.3% to 89.9%, with IBS130 least similar to Singapore strains (85.3%) and highly similar to Mass strains from the USA.

Keywords: infectious bronchitis virus, phylogenetic analysis, chicken, Malaysia

Procedia PDF Downloads 91
1058 Genome Analyses of Pseudomonas fluorescens B29B from Coastal Kerala

Authors: Wael Ali Mohammed Hadi

Abstract:

Pseudomonas fluorescens B29B, which has asparaginase enzymatic activity, was isolated from the surface coastal seawater of Trivandrum, India. We report the complete genome of Pseudomonas fluorescens B29B, sequenced, identified, and annotated from a marine source. The genome comprises at least a 7,331,508 bp single circular chromosome with a GC content of 62.19% and 6883 protein-coding genes. Genome analysis of P. fluorescens B29B identified 340 subsystems, including two predicted asparaginases for further investigation. These genome data will support further industrial biotechnology applications of proteins in general and of asparaginase as a target.

Keywords: Pseudomonas, marine, asparaginases, Kerala, whole-genome

Procedia PDF Downloads 44
1057 Computing the Similarity and the Diversity in the Species Based on Cronobacter Genome

Authors: E. Al Daoud

Abstract:

The purpose of computing the similarity and the diversity among species is to trace the process of evolution, to find the relationships between the species, and to discover the unique, the special, the common and the universal proteins. The proteins of the whole genomes of 40 species are compared with the Cronobacter genome, which is used as the reference genome. More than 3 billion pairwise alignments are performed using blastp. Several findings are presented in this study. For example, we found 172 proteins in the Cronobacter genome that have insignificant hits in other species, 116 proteins with significant hits and very high scores in all the tested species, and 129 proteins that are common in plants but have insignificant hits in mammals, birds, fishes, and insects.

Keywords: genome, species, blastp, conserved genes, Cronobacter

Procedia PDF Downloads 350
1056 Mining the Proteome of Fusobacterium nucleatum for Potential Therapeutics Discovery

Authors: Abdul Musaweer Habib, Habibul Hasan Mazumder, Saiful Islam, Sohel Sikder, Omar Faruk Sikder

Abstract:

The plethora of bacterial genome sequence information now available has ushered in many novel strategies for antibacterial drug discovery and has helped medical science take up the challenge of the increasing resistance of pathogenic bacteria to current antibiotics. In this study, we adopted a subtractive genomics approach to analyze the whole genome sequence of Fusobacterium nucleatum, a human oral pathogen associated with colorectal cancer. Our study revealed 1499 proteins of Fusobacterium nucleatum that have no homolog in the human genome. These proteins were screened further using the Database of Essential Genes (DEG), which resulted in the identification of 32 vitally important proteins for the bacterium. Subsequent analysis of the identified pivotal proteins using the KEGG Automated Annotation Server (KAAS) singled out 3 key enzymes of F. nucleatum that may be good candidates as potential drug targets, since they are unique to the bacterium and absent in humans. In addition, we have demonstrated the 3-D structure of these three proteins. Finally, the ligand binding sites of the key proteins were determined and functional inhibitors that best fitted the ligand sites were screened to discover effective novel therapeutic compounds against Fusobacterium nucleatum.

Keywords: colorectal cancer, drug target, Fusobacterium nucleatum, homology modeling, ligands

Procedia PDF Downloads 261
1055 Toward Particular Series with (k,h)-Jacobsthal Sequence

Authors: Seyyd Hossein Jafari-Petroudi, Maryam Pirouz

Abstract:

This note is devoted to the (k, h)-Jacobsthal sequence as the general term of particular series. Further formulas are derived for the nth term and the sum of the first n terms of series whose general terms are the (k, h)-Jacobsthal sequence and the (k, h)-Jacobsthal-Petroudi sequence. Finally, other properties of these sequences are presented.

Keywords: (k, h)-Jacobsthal sequence, (k, h)-Jacobsthal-Petroudi sequence, recursive relation, sum

Procedia PDF Downloads 225
1054 Genome-Wide Analysis of Long Terminal Repeat (LTR) Retrotransposons in Rabbit (Oryctolagus cuniculus)

Authors: Zeeshan Khan, Faisal Nouroz, Shumaila Noureen

Abstract:

The European or common rabbit (Oryctolagus cuniculus) belongs to class Mammalia, order Lagomorpha, family Leporidae. Rabbits are distributed worldwide and are native to Europe (France, Spain and Portugal) and Africa (Morocco and Algeria). LTR retrotransposons are major Class I mobile genetic elements of eukaryotic genomes and play a crucial role in genome expansion, evolution and diversification. They have mostly been annotated in various genomes by conventional homology searches, which restricted the annotation of novel elements. The present work involved de novo identification of LTR retrotransposons by LTR_FINDER in the haploid genome of the rabbit (2247.74 Mb), distributed in 22 chromosomes, in which 7,933 putative full-length or partial copies were identified, containing 69.38 Mb of elements and accounting for 3.08% of the genome. The highest copy numbers (731) were found on chromosome 7, followed by chromosome 12 (705), while the lowest copy numbers (27) were detected on chromosome 19. No elements were identified on chromosome 21 because the chromosome is only partially sequenced and contains unidentified nucleotides (N) and simple sequence repeats (SSRs). The identified elements ranged in size from 1.2 to 25.8 kb, with average sizes between 2 and 10 kb. The highest percentage of elements (4.77%) was found on chromosome 15, and the lowest (0.55%) on chromosome 19. The most frequent tRNA type was arginine, present in the majority of the elements. Based on these results, it was estimated that the rabbit carries 15,866 copies comprising 137.73 Mb of elements, accounting for 6.16% of the diploid genome (44 chromosomes). Further molecular analyses will be helpful in the chromosomal localization and distribution of these elements.

Keywords: rabbit, LTR retrotransposons, genome, chromosome

Procedia PDF Downloads 34
1053 In silico Comparative Analysis of Chloroplast Genome (cpDNA) and Some Individual Genes (rbcL and trnH-psbA) in Pooideae Subfamily Members

Authors: Ibrahim Ilker Ozyigit, Ertugrul Filiz, Ilhan Dogan

Abstract:

An in silico analysis of Brachypodium distachyon, Triticum aestivum, Festuca arundinacea, Lolium perenne, and Hordeum vulgare subsp. vulgare of the Pooideae was performed based on complete chloroplast genomes, and on the rbcL coding and trnH-psbA intergenic spacer regions alone, to compare phylogenetic resolving power. Neighbor-joining, Minimum Evolution, and Unweighted Pair Group Method with Arithmetic Mean methods were used to reconstruct phylogenies, with the highest bootstrap support obtained from whole chloroplast genome sequences. The highest and lowest values from nucleotide diversity (π) analysis were found to be 0.315813 and 0.043495, in the rbcL coding region of the chloroplast genome and in the complete chloroplast genome, respectively. The highest transition/transversion bias (R) value was recorded as 1.384 in complete chloroplast genomes. An F. arundinacea-L. perenne clade was recovered in all phylogenies. Sequences of the rbcL and trnH-psbA regions were not able to resolve the Pooideae phylogenies due to lack of genetic variation.
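Nucleotide diversity (π) is commonly defined as the average number of nucleotide differences per site between two sequences drawn from the sample. The sketch below applies that definition to a small alignment. It is an assumption-laden illustration rather than the software used in the study: the function name and the toy alignment are hypothetical, every sequence is weighted equally, and gap characters are treated like any other mismatch.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site over all pairs of aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs or length == 0:
        return 0.0
    # Count mismatching columns for each pair, then average per pair and site.
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACGT"]  # hypothetical aligned reads
    print(nucleotide_diversity(toy_alignment))
```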

Keywords: chloroplast DNA, Pooideae, phylogenetic analysis, rbcL, trnH-psbA

Procedia PDF Downloads 271
1052 Isolate-Specific Variations among Clinical Isolates of Brucella Identified by Whole-Genome Sequencing, Bioinformatics and Comparative Genomics

Authors: Abu S. Mustafa, Mohammad W. Khan, Faraz Shaheed Khan, Nazima Habibi

Abstract:

Brucellosis is a zoonotic disease of worldwide prevalence. There are at least four species and several strains of Brucella that cause human disease. Brucella genomes have very limited variation across strains, which hinders strain identification using classical molecular techniques, including PCR and 16S rDNA sequencing. The aim of this study was to perform whole genome sequencing of clinical isolates of Brucella and to perform bioinformatics and comparative genomics analyses to determine the existence of genetic differences across the isolates of a single Brucella species and strain. The draft sequence data were generated from 15 clinical isolates of Brucella melitensis (biovar 2 strain 63/9) using the MiSeq next generation sequencing platform. The generated reads were used for further assembly and analysis. All analyses were performed on a bioinformatics workstation (8-core i7 processor, 8 GB RAM, Bio-Linux operating system). FastQC was used to determine the quality of reads, and low quality reads were trimmed or eliminated using Fastx_trimmer. Assembly was done using the Velvet and ABySS software. The assembled contigs were ordered with Mauve. The online server RAST was employed to annotate the contig assemblies. Annotated genomes were compared using the Mauve and ACT tools. The QC score for the DNA sequence data generated by MiSeq was higher than 30 for 80% of reads, with more than 100x coverage, which suggested that the data could be utilized for further analysis. However, when analyzed by FastQC, the read quality of four samples was not good enough for creating a complete genome draft, so the remaining 11 samples were used for further analysis. The comparative genome analyses showed that, despite sharing the same gene sets, single nucleotide polymorphisms and insertions/deletions existed across different genomes, which provided a variable extent of diversity to these bacteria. In conclusion, next generation sequencing, bioinformatics, and comparative genome analysis can be utilized to find variations (point mutations, insertions and deletions) across different genomes of Brucella within a single strain. This information could be useful in surveillance and epidemiological studies. The study was supported by Kuwait University Research Sector grants MI04/15 and SRUL02/13.

Keywords: Brucella, bioinformatics, comparative genomics, whole genome sequencing

Procedia PDF Downloads 241
1051 An Improved Ant Colony Algorithm for Genome Rearrangements

Authors: Essam Al Daoud

Abstract:

Genome rearrangement is an important area in computational biology and bioinformatics. The basic problem in genome rearrangements is to compute the edit distance, i.e., the minimum number of operations needed to transform one genome into another. Unfortunately, the unsigned genome rearrangement problem is NP-hard. In this study, an improved ant colony optimization algorithm to approximate the edit distance is proposed. The main idea is to convert the unsigned permutation to a signed permutation and evaluate the ants by using the Kaplan algorithm. Two new operations are added to the standard ant colony algorithm: replacing the worst ants by re-sampling the ants from a new probability distribution, and applying crossover operations to the best ants. The proposed algorithm is tested and compared with the improved breakpoint reversal sort algorithm using three datasets. The results indicate that the proposed algorithm achieves a better accuracy ratio than the previous methods.
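The breakpoint notion behind reversal sorting can be made concrete for unsigned permutations. In the usual textbook formulation, the permutation is framed with 0 and n+1, and a breakpoint is any adjacent pair that is not consecutive. Since one reversal removes at most two breakpoints, half the breakpoint count is a lower bound on the reversal distance. The sketch below shows only this bound, not the ant colony method proposed in the paper, and the function names are hypothetical.

```python
import math


def count_breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n.

    The permutation is extended with 0 at the front and n+1 at the end;
    adjacent elements that differ by anything other than 1 form a breakpoint.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


def reversal_distance_lower_bound(perm):
    """Each reversal removes at most two breakpoints, so d >= ceil(b / 2)."""
    return math.ceil(count_breakpoints(perm) / 2)


if __name__ == "__main__":
    perm = [3, 2, 1, 4]  # one reversal of the first three elements sorts it
    print(count_breakpoints(perm), reversal_distance_lower_bound(perm))
```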

Keywords: ant colony algorithm, edit distance, genome breakpoint, genome rearrangement, reversal sort

Procedia PDF Downloads 189
1050 Genome Sequencing, Assembly and Annotation of Gelidium pristoides from Kenton-on-Sea, South Africa

Authors: Sandisiwe Mangali, Graeme Bradley

Abstract:

Genome is complete set of the organism's hereditary information encoded as either deoxyribonucleic acid or ribonucleic acid in most viruses. The three different types of genomes are nuclear, mitochondrial and the plastid genome and their sequences which are uncovered by genome sequencing are known as an archive for all genetic information and enable researchers to understand the composition of a genome, regulation of gene expression and also provide information on how the whole genome works. These sequences enable researchers to explore the population structure, genetic variations, and recent demographic events in threatened species. Particularly, genome sequencing refers to a process of figuring out the exact arrangement of the basic nucleotide bases of a genome and the process through which all the afore-mentioned genomes are sequenced is referred to as whole or complete genome sequencing. Gelidium pristoides is South African endemic Rhodophyta species which has been harvested in the Eastern Cape since the 1950s for its high economic value which is one motivation for its sequencing. Its endemism further motivates its sequencing for conservation biology as endemic species are more vulnerable to anthropogenic activities endangering a species. As sequencing, mapping and annotating the Gelidium pristoides genome is the aim of this study. To accomplish this aim, the genomic DNA was extracted and quantified using the Nucleospin Plank Kit, Qubit 2.0 and Nanodrop. Thereafter, the Ion Plus Fragment Library was used for preparation of a 600bp library which was then sequenced through the Ion S5 sequencing platform for two runs. The produced reads were then quality-controlled and assembled through the SPAdes assembler with default parameters and the genome assembly was quality assessed through the QUAST software. 
From this assembly, the plastid and mitochondrial genomes were extracted using Gelidiales organellar genomes as search queries and ordered according to them using the Geneious software. The Qubit and Nanodrop instruments revealed A260/A280 and A230/A260 values of 1.81 and 1.52, respectively. A total of 30,792,074 reads were obtained and assembled into 94,140 contigs, giving a total sequence length of 217.06 Mbp with an N50 value of 3,072 bp and a GC content of 41.72%. Total lengths of 179,281 bp and 25,734 bp were obtained for the plastid and mitochondrial genomes, respectively. Genomic data allow a clear understanding of the genomic constituents of an organism and are valuable as foundational information for studies of individual genes and for resolving the evolutionary relationships between organisms, including Rhodophytes and other seaweeds.
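The assembly statistics quoted above (N50 and GC content) can be illustrated with a minimal Python sketch. This is not the QUAST implementation; the function names `n50` and `gc_content` are hypothetical, and the sketch assumes contigs are available as plain strings.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def gc_content(contigs):
    """Return the percentage of G and C bases over all contigs."""
    total = sum(len(c) for c in contigs)
    if total == 0:
        raise ValueError("no sequence given")
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return 100.0 * gc / total
```

A low N50 relative to the total assembly length, as reported in the abstract, indicates a fragmented assembly made of many short contigs.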

Keywords: Gelidium pristoides, genome, genome sequencing and assembly, Ion S5 sequencing platform

Procedia PDF Downloads 43
1049 High-Throughput Artificial Guide RNA Sequence Design for Type I, II and III CRISPR/Cas-Mediated Genome Editing

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

A revolution in genome engineering has emerged from the discovery of CRISPR (clustered regularly interspaced short palindromic repeats) and CRISPR-associated (Cas) genes in bacteria. The function of the type II Streptococcus pyogenes (Sp) CRISPR/Cas9 system has been confirmed in various species. Other S. thermophilus (St) CRISPR-Cas systems, CRISPR1-Cas and CRISPR3-Cas, have also been reported to prevent phage infection. The CRISPR1-Cas system interferes by cleaving foreign dsDNA entering the cell in a length-specific and orientation-dependent manner. The S. thermophilus CRISPR3-Cas system also acts by cleaving phage dsDNA genomes at the same specific position inside the targeted protospacer as observed in the CRISPR1-Cas system. For effective DNA cleavage activity, RNA-guided Cas9 orthologs require their own specific PAM (protospacer adjacent motif) sequences. Activity levels depend on the sequence of the protospacer and on specific combinations of favorable PAM bases. Therefore, given the specific length and sequence of the PAM, followed by a constant length of target site, for the three orthologs of the Cas9 protein, a well-organized procedure is required for high-throughput and accurate mining of possible target sites in a large genomic dataset. Consequently, we created a reliable procedure to explore potential gRNA sequences for type I (Streptococcus thermophilus), II (Streptococcus pyogenes), and III (Streptococcus thermophilus) CRISPR/Cas systems. To mine CRISPR target sites, four different searching modes of sgRNA binding to the target DNA strand were applied: i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. The output of this procedure highlights the power of comparative genome mining for different CRISPR/Cas systems. 
This could yield a repertoire of Cas9 variants with expanded capabilities for gRNA design, and will pave the way for further advances in genome and epigenome engineering.
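The strand-wise searching modes can be sketched in Python as follows. This is an illustration only, not the authors' procedure: it assumes the well-known S. pyogenes Cas9 PAM (NGG) located 3' of a 20-nt protospacer, and the names `find_targets`, `revcomp` and `IUPAC` are hypothetical. The paired-gRNA mode is not covered.

```python
import re

# Subset of IUPAC nucleotide codes, enough for simple PAM patterns (assumption).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]", "W": "[AT]"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_targets(seq, pam="NGG", spacer_len=20, mode="both"):
    """List (strand, start, protospacer, pam) for every protospacer
    immediately followed by the PAM. Starts on the '-' strand are
    reported relative to the reverse-complemented sequence."""
    seq = seq.upper()
    pam_re = "".join(IUPAC[base] for base in pam)
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_re}))")
    strands = {
        "coding": [("+", seq)],
        "anti-coding": [("-", revcomp(seq))],
        "both": [("+", seq), ("-", revcomp(seq))],
    }[mode]
    hits = []
    for strand, s in strands:
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits
```

Passing a different `pam` string and `spacer_len` would adapt the same search to another Cas9 ortholog, provided its PAM also lies 3' of the protospacer.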

Keywords: CRISPR/Cas systems, gRNA mining, Streptococcus pyogenes, Streptococcus thermophilus

Procedia PDF Downloads 145
1048 Novel Coprocessor for DNA Sequence Alignment in Resequencing Applications

Authors: Atef Ibrahim, Hamed Elsimary, Abdullah Aljumah, Fayez Gebali

Abstract:

This paper presents a novel semi-systolic array architecture for an optimized parallel sequence alignment algorithm. The architecture can be modified for reuse in multiple-pass processing, which increases both the number of processing elements that can be packed into a single FPGA and the number of sequences that can be aligned in parallel in a single FPGA. This resolves the potential problem of many FPGA resources being left unused, for designs with large values of short read length, when using the previously published conventional hardware design. FPGA implementation results show that, for large values of short read length (M>128), the proposed design has a slightly higher speedup and FPGA utilization than the conventional one.

Keywords: bioinformatics, genome sequence alignment, re-sequencing applications, systolic array

Procedia PDF Downloads 408
1047 Molecular-Genetics Studies of New Unknown APMV Isolated from Wild Bird in Ukraine

Authors: Borys Stegniy, Anton Gerilovych, Oleksii Solodiankin, Vitaliy Bolotin, Anton Stegniy, Denys Muzyka, Claudio Afonso

Abstract:

A new APMV was isolated from a white-fronted goose in Ukraine. The isolate was tested serologically using monoclonal antibodies in haemagglutination-inhibition tests against APMV1-9 and showed cross-reactions with APMV7. Subsequent investigations aimed at full genome sequencing using random primers and cloning into pCRII-TOPO. Analysis of 100 transformed E. coli colonies by traditional sequencing yielded only 3 regions that could be identified by BLAST. The first region, 367 bp in length, had 70% nucleotide sequence identity to the APMV 12 isolate Wigeon/Italy/3920_1/2005 at genome position 2419-2784. The next region (344 bp) had 66% identity to the same APMV 12 isolate at position 4760-5103. The last region (365 bp) showed 71% identity to Newcastle disease virus strain M4 at position 12569-12928.

Keywords: APMV, Newcastle disease virus, Ukraine, full genome sequencing

Procedia PDF Downloads 280
1046 Phylogenetic Relationships between the Whole Sets of Individual Flow Sorted U, M, S and C Chromosomes of Aegilops and Wheat as Revealed by COS Markers

Authors: András Farkas, István Molnár, Jan Vrána, Veronika Burešová, Petr Cápal, András Cseh, Márta Molnár-Láng, Jaroslav Doležel

Abstract:

Species of Aegilops played a central role in the evolution of wheat and are sources of traits related to yield quality and tolerance against biotic and abiotic stresses. These wild genes and alleles are desirable for use in crop improvement programs via introgressive hybridization. However, the success of chromosome-mediated gene transfer to wheat is hampered by the poor knowledge of the genome structure of Aegilops relative to wheat and by the low number of cost-effective molecular markers specific for Aegilops chromosomes. The conserved orthologous set (COS) markers are specific for genes conserved throughout evolution in both sequence and copy number between Triticeae/Aegilops taxa and define orthologous regions, thus enabling the comparison of regions on the chromosomes of related species. The present study compared individual chromosomes of Aegilops umbellulata (UU), Ae. comosa (MM), Ae. speltoides (SS) and Ae. caudata (CC), purified by fluorescent labelling with oligonucleotide SSR repeats and biparametric flow cytometry, with wheat by identifying orthologous chromosomal regions using COS markers. The linear order of bin-mapped COS markers along the wheat D chromosomes was identified using chromosome-specific sequence data and virtual gene order. Syntenic regions of wheat identifying genome rearrangements that differentiate the U, M, S or C genomes from the D genome of wheat were detected. The COS markers assigned to Aegilops chromosomes promise to accelerate gene introgression by facilitating the identification of alien chromatin. The syntenic relationships between the Aegilops species and wheat will facilitate the targeted development of new markers specific for U, M, S and C genomic regions and will contribute to the understanding of molecular processes related to the evolution of Aegilops.

Keywords: Aegilops, cos-markers, flow-sorting, wheat

Procedia PDF Downloads 371
1045 Merging Sequence Diagrams Based Slicing

Authors: Bouras Zine Eddine, Talai Abdelouaheb

Abstract:

The need to merge software artifacts seems inherent to modern software development. Distributing development over several teams and breaking tasks into smaller, more manageable pieces are effective means of dealing with this kind of complexity. In each case, the separately developed artifacts need to be assembled as efficiently as possible into a consistent whole in which the parts still function as described. Moreover, the earlier changes are introduced into the life cycle, the easier they are for designers to manage. Interaction-based specifications such as UML sequence diagrams have been found effective in this regard. As a result, sequence diagrams can be used not only for capturing system behaviors but also for merging changes in order to create a new version. The objective of this paper is to suggest a new approach to the problem of software merging at the level of sequence diagrams, using the concept of dependence analysis, which formally captures all mappings and differences between elements of sequence diagrams and serves as a key concept for creating a new version of a sequence diagram.

Keywords: system behaviors, sequence diagram merging, dependence analysis, sequence diagram slicing

Procedia PDF Downloads 237
1044 RNA-Seq Based Transcriptomic Analysis of Wheat Cultivars for Unveiling of Genomic Variations and Isolation of Drought Tolerant Genes for Genome Editing

Authors: Ghulam Muhammad Ali

Abstract:

The unveiling of genes involved in drought tolerance and root architecture using transcriptomic analyses has remained fragmented, limiting further improvement of wheat through genome editing. The purpose of this research was to unveil the variations in different genes implicated in drought tolerance and root architecture in wheat through RNA-seq data analysis. In this study, 8-day-old seedlings of 6 wheat cultivars, namely Batis, Blue Silver, Local White, UZ888, Chakwal 50 and Synthetic wheat S22, were subjected to transcriptomic analysis for root and shoot genes. A total of 12 RNA samples were sequenced by Illumina. Using updated wheat transcripts from Ensembl and IWGC references with 54,175 gene models, we found that 49,621 out of 54,175 (91.5%) genes are expressed at an RPKM of 0.1 or more (in at least 1 sample). The number of genes expressed was higher in Local White than in Batis. The number of differentially expressed genes (DEGs) was higher in Chakwal 50. Expression-based clustering indicated conserved function of DRO1 and RPK1 between Arabidopsis and wheat. The dendrogram showed that Local White is sister to Chakwal 50, while Batis is closely related to Blue Silver. This study reveals transcriptomic sequence variations among cultivars, including mutations in drought-associated genes that may directly contribute to drought tolerance. The DRO1 and RPK1 genes were isolated for genome editing. These genes are being edited in wheat through CRISPR-Cas9 for yield enhancement.
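The expression criterion above (RPKM of 0.1 or more in at least one sample) can be illustrated with a short Python sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads); the function names and the dictionary layout of the table are assumptions made for illustration, not the authors' pipeline.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(rpkm_table, threshold=0.1):
    """Return the genes whose RPKM reaches the threshold in at least one sample.

    rpkm_table maps a gene identifier to a list of per-sample RPKM values
    (hypothetical layout).
    """
    return {gene for gene, values in rpkm_table.items()
            if values and max(values) >= threshold}
```

For example, a 1,000 bp gene with 100 mapped reads in a library of one million mapped reads has an RPKM of 100.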

Keywords: transcriptomic, wheat, genome editing, drought, CRISPR-Cas9, yield enhancement

Procedia PDF Downloads 27
1043 Systematic Identification of Noncoding Cancer Driver Somatic Mutations

Authors: Zohar Manber, Ran Elkon

Abstract:

Accumulation of somatic mutations (SMs) in the genome is a major driving force of cancer development. Most SMs in the tumor's genome are functionally neutral; however, some cause damage to critical processes and provide the tumor with a selective growth advantage (termed cancer driver mutations). Current research on functional significance of SMs is mainly focused on finding alterations in protein coding sequences. However, the exome comprises only 3% of the human genome, and thus, SMs in the noncoding genome significantly outnumber those that map to protein-coding regions. Although our understanding of noncoding driver SMs is very rudimentary, it is likely that disruption of regulatory elements in the genome is an important, yet largely underexplored mechanism by which somatic mutations contribute to cancer development. The expression of most human genes is controlled by multiple enhancers, and therefore, it is conceivable that regulatory SMs are distributed across different enhancers of the same target gene. Yet, to date, most statistical searches for regulatory SMs have considered each regulatory element individually, which may reduce statistical power. The first challenge in considering the cumulative activity of all the enhancers of a gene as a single unit is to map enhancers to their target promoters. Such mapping defines for each gene its set of regulating enhancers (termed "set of regulatory elements" (SRE)). Considering multiple enhancers of each gene as one unit holds great promise for enhancing the identification of driver regulatory SMs. However, the success of this approach is greatly dependent on the availability of comprehensive and accurate enhancer-promoter (E-P) maps. To date, the discovery of driver regulatory SMs has been hindered by insufficient sample sizes and statistical analyses that often considered each regulatory element separately. 
In this study, we analyzed more than 2,500 whole-genome sequence (WGS) samples provided by The Cancer Genome Atlas (TCGA) and The International Cancer Genome Consortium (ICGC) in order to identify such driver regulatory SMs. Our analyses took into account the combinatorial aspect of gene regulation by considering all the enhancers that control the same target gene as one unit, based on E-P maps from three genomics resources. The identification of candidate driver noncoding SMs is based on their recurrence. We searched for SREs of genes that are "hotspots" for SMs (that is, they accumulate SMs at a significantly elevated rate). To test the statistical significance of recurrence of SMs within a gene's SRE, we used both global and local background mutation rates. Using this approach, we detected - in seven different cancer types - numerous "hotspots" for SMs. To support the functional significance of these recurrent noncoding SMs, we further examined their association with the expression level of their target gene (using gene expression data provided by the ICGC and TCGA for samples that were also analyzed by WGS).
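The recurrence test described above can be sketched as follows. The abstract does not state which statistical test was used, so this example assumes a simple Poisson model in which the expected number of SMs in a gene's SRE is its total length times the number of samples times a background mutation rate; `poisson_sf` and `sre_hotspot_pvalue` are hypothetical names.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the first k terms of the pmf."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def sre_hotspot_pvalue(observed, sre_length_bp, n_samples, background_rate):
    """P-value for seeing at least `observed` SMs across all enhancers of one
    gene, given a per-base, per-sample background mutation rate (assumption)."""
    expected = sre_length_bp * n_samples * background_rate
    return poisson_sf(observed, expected)
```

Calling `sre_hotspot_pvalue` once with a genome-wide rate and once with a rate estimated from the flanking region would mirror the global and local backgrounds mentioned in the abstract; in practice, the resulting p-values would also need correction for multiple testing across genes.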

Keywords: cancer genomics, enhancers, noncoding genome, regulatory elements

Procedia PDF Downloads 15
1042 Encryption and Decryption of Nucleic Acid Using Deoxyribonucleic Acid Algorithm

Authors: Iftikhar A. Tayubi, Aabdulrahman Alsubhi, Abdullah Althrwi

Abstract:

The deoxyribonucleic acid (DNA) sequence text tool provides structural biologists with a single source of high-quality cryptography for DNA sequences. We will provide an intuitive, well-organized and user-friendly web interface that allows users to encrypt and decrypt DNA sequence text. It secures the sequence by using an algorithm to encrypt and decrypt it. The utility of this tool is that it provides a user-friendly interface through which users can encrypt, decrypt and store information about DNA sequences. The interfaces created in this project will satisfy the demands of the scientific community by providing full encryption of DNA sequences through this website. We adopted a methodology using C# and Active Server Pages .NET (ASP.NET) for programming, which is smart and secure. The tool is well suited to encrypting large quantities of data efficiently. Users can thus navigate from one encoding to another and store the text, depending on their field of interest. Algorithm classification allows a user to protect the DNA sequence from change, whether an alteration or an error occurred during DNA sequence data transfer. It also checks the integrity of the DNA sequence data during access.

Keywords: algorithm, ASP.NET, DNA, encrypt, decrypt

Procedia PDF Downloads 113
1041 Isolation and Characterization of Cotton Infecting Begomoviruses in Alternate Hosts from Cotton Growing Regions of Pakistan

Authors: M. Irfan Fareed, Muhammad Tahir, Alvina Gul Kazi

Abstract:

Castor bean (Ricinus communis; family Euphorbiaceae) is cultivated for the production of oil and as an ornamental plant throughout tropical regions. Leaf samples from castor bean plants with leaf curl and vein thickening were collected from areas around Okara (Pakistan) in 2011. PCR amplification using diagnostic primers showed the presence of a begomovirus, and subsequently a specific primer pair (BurNF 5’- CCATGGTTGTGGCAGTTGATTGACAGATAC-3’, BurNR 5’- CCATGGATTCACGCACAGGGGAACCC-3’) was used to amplify and clone the whole genome of the virus. The complete nucleotide sequence was determined to be 2,759 nt (accession No. HE985227). Alignments showed the highest level of nucleotide sequence identity (98.8%) with Cotton leaf curl Burewala virus (CLCuBuV; accession No. JF416947). The virus in castor bean lacks an intact C2 gene, as is typical of CLCuBuV in cotton. An amplification product of ca. 1.4 kb was obtained in PCR with primers for betasatellites, and the complete nucleotide sequence of a clone was determined to be 1,373 nt (HE985228). The sequence showed 96.3% nucleotide sequence identity to the recombinant Cotton leaf curl Multan betasatellite (CLCuMB; JF502389). This is the first report of CLCuBuV and its betasatellite infecting castor bean, showing that this plant species is an alternate host of the virus. Many alternate hosts have already been reported, including tobacco, tomato, hibiscus, okra, ageratum, Digera arvensis and papaya, and now Ricinus communis. It is therefore suggested that these alternate hosts should not be grown near cotton-growing regions.

Keywords: Ricinus communis, begomovirus, betasatellite, agriculture

Procedia PDF Downloads 412
1040 Constructing Orthogonal De Bruijn and Kautz Sequences and Applications

Authors: Yaw-Ling Lin

Abstract:

A de Bruijn graph of order k is a graph whose vertices represent all length-k sequences, with edges joining pairs of vertices whose sequences have the maximum possible overlap (length k−1). Every Hamiltonian cycle of this graph defines a distinct, minimum-length de Bruijn sequence containing every k-mer exactly once. A Kautz sequence is the minimal generating sequence, that is, the sequence of minimal length that produces all possible length-k sequences under the restriction that every two consecutive letters in the sequences must be different. A collection of de Bruijn/Kautz sequences is orthogonal if any two sequences differ maximally in sequence composition; that is, the maximum length of their common substring is k. In this paper, we discuss how such a collection of (maximal) orthogonal de Bruijn/Kautz sequences can be constructed, and we use the algorithm to build a web application service for synthesized DNA and other related biomolecular sequences.
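To make the definition concrete, the following Python sketch generates one de Bruijn sequence of order k over a given alphabet. It does not reproduce the paper's algorithm for orthogonal collections; it uses the classical Lyndon-word concatenation construction (often attributed to Fredricksen, Kessler and Maiorana) instead of an explicit Hamiltonian-cycle search, and the helper `cyclic_kmers` is a hypothetical name added only for checking the result.

```python
def de_bruijn(alphabet, k):
    """Return a cyclic de Bruijn sequence of order k over `alphabet`
    (a string of distinct single-character symbols)."""
    n = len(alphabet)
    a = [0] * (k + 1)   # working buffer for the current necklace prefix
    sequence = []

    def db(t, p):
        if t > k:
            # Emit the prefix when its period p divides k (a Lyndon word).
            if k % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def cyclic_kmers(seq, k):
    """All length-k windows of seq, read cyclically."""
    wrapped = seq + seq[:k - 1]
    return [wrapped[i:i + k] for i in range(len(seq))]
```

For the DNA alphabet "ACGT" and k = 3, the result has length 4^3 = 64 and, read cyclically, contains each of the 64 possible 3-mers exactly once.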

Keywords: biomolecular sequence synthesis, de Bruijn sequences, Eulerian cycle, Hamiltonian cycle, Kautz sequences, orthogonal sequences

Procedia PDF Downloads 20
1039 Human Papillomavirus Type 16 E4 Gene Variation as Risk Factor for Cervical Cancer

Authors: Yudi Zhao, Ziyun Zhou, Yueting Yao, Shuying Dai, Zhiling Yan, Longyu Yang, Chuanyin Li, Li Shi, Yufeng Yao

Abstract:

The HPV16 E4 gene plays an important role in viral genome amplification and release. Therefore, variation in the E4 gene nucleic acid sequence may affect the carcinogenicity of HPV16. To understand the relationship between variation in the HPV16 E4 gene and cervical cancer, this study amplified and sequenced the E4 genes of 118 HPV16-positive cervical cancer patients and 151 HPV16-positive asymptomatic individuals. After the E4 gene sequences were obtained, phylogenetic trees were constructed by the neighbor-joining method for gene variation analysis. The results showed that: 1) the distribution of HPV16 variants differed significantly between the case group and the control group (P = 0.015), and the Asian-American (AA) variant was likely to be related to the occurrence of cervical cancer; 2) DNA sequence analysis showed significant differences in the distribution of 8 variants between the case group and the control group (P < 0.05); and 3) in the European (EUR) variant, two variations, C3384T (L18L) and A3449G (P39P), were associated with the initiation and development of cervical cancer. The results suggest that variation in the HPV16 E4 gene may contribute to the occurrence and development of cervical cancer, and that different HPV16 variants may have different carcinogenic capabilities.

Keywords: cervical cancer, HPV16, E4 gene, variations

Procedia PDF Downloads 32