Search results for: genome rearrangements

47 An Improved Ant Colony Algorithm for Genome Rearrangements

Abstract:

Genome rearrangement is an important area in computational biology and bioinformatics. The basic problem in genome rearrangements is to compute the edit distance, i.e., the minimum number of operations needed to transform one genome into another. Unfortunately, unsigned genome rearrangement problem is NP-hard. In this study an improved ant colony optimization algorithm to approximate the edit distance is proposed. The main idea is to convert the unsigned permutation to signed permutation and evaluate the ants by using Kaplan algorithm. Two new operations are added to the standard ant colony algorithm: Replacing the worst ants by re-sampling the ants from a new probability distribution and applying the crossover operations on the best ants. The proposed algorithm is tested and compared with the improved breakpoint reversal sort algorithm by using three datasets. The results indicate that the proposed algorithm achieves better accuracy ratio than the previous methods.

Keywords: Ant colony algorithm, Edit distance, Genome breakpoint, Genome rearrangement, Reversal sort.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1857

46 Sorting Primitives and Genome Rearrangementin Bioinformatics: A Unified Perspective

Authors: Swapnoneel Roy, Minhazur Rahman, Ashok Kumar Thakur

Abstract:

Bioinformatics and computational biology involve the use of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems usually on the molecular level. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and proteinprotein interactions, and the modeling of evolution. Various global rearrangements of permutations, such as reversals and transpositions,have recently become of interest because of their applications in computational molecular biology. A reversal is an operation that reverses the order of a substring of a permutation. A transposition is an operation that swaps two adjacent substrings of a permutation. The problem of determining the smallest number of reversals required to transform a given permutation into the identity permutation is called sorting by reversals. Similar problems can be defined for transpositions and other global rearrangements. In this work we perform a study about some genome rearrangement primitives. We show how a genome is modelled by a permutation, introduce some of the existing primitives and the lower and upper bounds on them. We then provide a comparison of the introduced primitives.

Keywords: Sorting Primitives, Genome Rearrangements, Transpositions, Block Interchanges, Strip Exchanges.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2098

45 On Reversal and Transposition Medians

Authors: Martin Bader

Abstract:

During the last years, the genomes of more and more species have been sequenced, providing data for phylogenetic recon- struction based on genome rearrangement measures. A main task in all phylogenetic reconstruction algorithms is to solve the median of three problem. Although this problem is NP-hard even for the sim- plest distance measures, there are exact algorithms for the breakpoint median and the reversal median that are fast enough for practical use. In this paper, this approach is extended to the transposition median as well as to the weighted reversal and transposition median. Although there is no exact polynomial algorithm known even for the pairwise distances, we will show that it is in most cases possible to solve these problems exactly within reasonable time by using a branch and bound algorithm.

Keywords: Comparative genomics, genome rearrangements, me-dian, reversals, transpositions.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1640

44 Computing the Similarity and the Diversity in the Species Based on Cronobacter Genome

Authors: E. Al Daoud

Abstract:

The purpose of computing the similarity and the diversity in the species is to trace the process of evolution and to find the relationship between the species and discover the unique, the special, the common and the universal proteins. The proteins of the whole genome of 40 species are compared with the cronobacter genome which is used as reference genome. More than 3 billion pairwise alignments are performed using blastp. Several findings are introduced in this study, for example, we found 172 proteins in cronobacter genome which have insignificant hits in other species, 116 significant proteins in the all tested species with very high score value and 129 common proteins in the plants but have insignificant hits in mammals, birds, fishes, and insects.

Keywords: Genome, species, blastp, conserved genes, cronobacter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 962

43 Block Sorting: A New Characterization and a New Heuristic

Authors: Swapnoneel Roy, Ashok Kumar Thakur, Minhazur Rahman

Abstract:

The Block Sorting problem is to sort a given permutation moving blocks. A block is defined as a substring of the given permutation, which is also a substring of the identity permutation. Block Sorting has been proved to be NP-Hard. Until now two different 2-Approximation algorithms have been presented for block sorting. These are the best known algorithms for Block Sorting till date. In this work we present a different characterization of Block Sorting in terms of a transposition cycle graph. Then we suggest a heuristic, which we show to exhibit a 2-approximation performance guarantee for most permutations.

Keywords: Block Sorting, Optical Character Recognition, Genome Rearrangements, Sorting Primitives, ApproximationAlgorithms

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2087

42 Evolutionary Distance in the Yeast Genome

Authors: Somayyeh Azizi, Saeed Kaboli, Atsushi Yagi

Abstract:

Whole genome duplication (WGD) increased the number of yeast Saccharomyces cerevisiae chromosomes from 8 to 16. In spite of retention the number of chromosomes in the genome of this organism after WGD to date, chromosomal rearrangement events have caused an evolutionary distance between current genome and its ancestor. Studies under evolutionary-based approaches on eukaryotic genomes have shown that the rearrangement distance is an approximable problem. In the case of S. cerevisiae, we describe that rearrangement distance is accessible by using dedoubled adjacency graph drawn for 55 large paired chromosomal regions originated from WGD. Then, we provide a program extracted from a C program database to draw a dedoubled genome adjacency graph for S. cerevisiae. From a bioinformatical perspective, using the duplicated blocks of current genome in S. cerevisiae, we infer that genomic organization of eukaryotes has the potential to provide valuable detailed information about their ancestrygenome.

Keywords: Whole-genome duplication, Evolution, Double-cutand- join operation, Yeast.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1462

41 The Role and Importance of Genome Sequencing in Prediction of Cancer Risk

Authors: M. Sadeghi, H. Pezeshk, R. Tusserkani, A. Sharifi Zarchi, A. Malekpour, M. Foroughmand, S. Goliaei, M. Totonchi, N. Ansari–Pour

Abstract:

The role and relative importance of intrinsic and extrinsic factors in the development of complex diseases such as cancer still remains a controversial issue. Determining the amount of variation explained by these factors needs experimental data and statistical models. These models are nevertheless based on the occurrence and accumulation of random mutational events during stem cell division, thus rendering cancer development a stochastic outcome. We demonstrate that not only individual genome sequencing is uninformative in determining cancer risk, but also assigning a unique genome sequence to any given individual (healthy or affected) is not meaningful. Current whole-genome sequencing approaches are therefore unlikely to realize the promise of personalized medicine. In conclusion, since genome sequence differs from cell to cell and changes over time, it seems that determining the risk factor of complex diseases based on genome sequence is somewhat unrealistic, and therefore, the resulting data are likely to be inherently uninformative.

Keywords: Cancer risk, extrinsic factors, genome sequencing, intrinsic factors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1059

40 SIMGraph: Simplifying Contig Graph to Improve de Novo Genome Assembly Using Next-generation Sequencing Data

Authors: Chien-Ju Li, Chun-Hui Yu, Chi-Chuan Hwang, Tsunglin Liu , Darby Tien-Hao Chang

Abstract:

De novo genome assembly is always fragmented. Assembly fragmentation is more serious using the popular next generation sequencing (NGS) data because NGS sequences are shorter than the traditional Sanger sequences. As the data throughput of NGS is high, the fragmentations in assemblies are usually not the result of missing data. On the contrary, the assembled sequences, called contigs, are often connected to more than one other contigs in a complicated manner, leading to the fragmentations. False connections in such complicated connections between contigs, named a contig graph, are inevitable because of repeats and sequencing/assembly errors. Simplifying a contig graph by removing false connections directly improves genome assembly. In this work, we have developed a tool, SIMGraph, to resolve ambiguous connections between contigs using NGS data. Applying SIMGraph to the assembly of a fungus and a fish genome, we resolved 27.6% and 60.3% ambiguous contig connections, respectively. These results can reduce the experimental efforts in resolving contig connections.

Keywords: Contig graph, NGS, de novo assembly, scaffold.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1685

39 Modified Genome-Scale Metabolic Model of Escherichia coli by Adding Hyaluronic Acid Biosynthesis-Related Enzymes (GLMU2 and HYAD) from Pasteurella multocida

Authors: P. Pasomboon, P. Chumnanpuen, T. E-kobon

Abstract:

Hyaluronic acid (HA) consists of linear heteropolysaccharides repeat of D-glucuronic acid and N-acetyl-D-glucosamine. HA has various useful properties to maintain skin elasticity and moisture, reduce inflammation, and lubricate the movement of various body parts without causing immunogenic allergy. HA can be found in several animal tissues as well as in the capsule component of some bacteria including Pasteurella multocida. This study aimed to modify a genome-scale metabolic model of Escherichia coli using computational simulation and flux analysis methods to predict HA productivity under different carbon sources and nitrogen supplement by the addition of two enzymes (GLMU2 and HYAD) from P. multocida to improve the HA production under the specified amount of carbon sources and nitrogen supplements. Result revealed that threonine and aspartate supplement raised the HA production by 12.186%. Our analyses proposed the genome-scale metabolic model is useful for improving the HA production and narrows the number of conditions to be tested further.

Keywords: Pasteurella multocida, Escherichia coli, hyaluronic acid, genome-scale metabolic model, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 718

38 Error-Robust Nature of Genome Profiling Applied for Clustering of Species Demonstrated by Computer Simulation

Authors: Shamim Ahmed Koichi Nishigaki

Abstract:

Genome profiling (GP), a genotype based technology, which exploits random PCR and temperature gradient gel electrophoresis, has been successful in identification/classification of organisms. In this technology, spiddos (Species identification dots) and PaSS (Pattern similarity score) were employed for measuring the closeness (or distance) between genomes. Based on the closeness (PaSS), we can buildup phylogenetic trees of the organisms. We noticed that the topology of the tree is rather robust against the experimental fluctuation conveyed by spiddos. This fact was confirmed quantitatively in this study by computer-simulation, providing the limit of the reliability of this highly powerful methodology. As a result, we could demonstrate the effectiveness of the GP approach for identification/classification of organisms.

Keywords: Fluctuation, Genome profiling (GP), Pattern similarity score (PaSS), Robustness, Spiddos-shift.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1490

37 Application of Whole Genome Amplification Technique for Genotype Analysis of Bovine Embryos

Authors: S. Moghaddaszadeh-Ahrabi, S. Farajnia, Gh. Rahimi-Mianji, A. Nejati-Javaremi

Abstract:

In recent years, there has been an increasing interest toward the use of bovine genotyped embryos for commercial embryo transfer programs. Biopsy of a few cells in morulla stage is essential for preimplantation genetic diagnosis (PGD). Low amount of DNA have limited performing the several molecular analyses within PGD analyses. Whole genome amplification (WGA) promises to eliminate this problem. We evaluated the possibility and performance of an improved primer extension preamplification (I-PEP) method with a range of starting bovine genomic DNA from 1-8 cells into the WGA reaction. We optimized a short and simple I-PEP (ssI-PEP) procedure (~3h). This optimized WGA method was assessed by 6 loci specific polymerase chain reactions (PCRs), included restriction fragments length polymorphism (RFLP). Optimized WGA procedure possesses enough sensitivity for molecular genetic analyses through the few input cells. This is a new era for generating characterized bovine embryos in preimplantation stage.

Keywords: Whole genome amplification (WGA), Genotyping, Bovine, Preimplantation genetic diagnosis (PGD)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1621

36 Analysis and Classification of Hiv-1 Sub- Type Viruses by AR Model through Artificial Neural Networks

Authors: O. Yavuz, L. Ozyilmaz

Abstract:

HIV-1 genome is highly heterogeneous. Due to this variation, features of HIV-I genome is in a wide range. For this reason, the ability to infection of the virus changes depending on different chemokine receptors. From this point of view, R5 HIV viruses use CCR5 coreceptor while X4 viruses use CXCR5 and R5X4 viruses can utilize both coreceptors. Recently, in Bioinformatics, R5X4 viruses have been studied to classify by using the experiments on HIV-1 genome. In this study, R5X4 type of HIV viruses were classified using Auto Regressive (AR) model through Artificial Neural Networks (ANNs). The statistical data of R5X4, R5 and X4 viruses was analyzed by using signal processing methods and ANNs. Accessible residues of these virus sequences were obtained and modeled by AR model since the dimension of residues is large and different from each other. Finally the pre-processed data was used to evolve various ANN structures for determining R5X4 viruses. Furthermore ROC analysis was applied to ANNs to show their real performances. The results indicate that R5X4 viruses successfully classified with high sensitivity and specificity values training and testing ROC analysis for RBF, which gives the best performance among ANN structures.

Keywords: Auto-Regressive Model, HIV, Neural Networks, ROC Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1138

35 An Integrated Biotechnology Database of the National Agricultural Information Center in Korea

Authors: Chang Kug Kim, Dong Suk Park, Young Joo Seol, Jang Ho Hahn

Abstract:

The National Agricultural Biotechnology Information Center (NABIC) plays a leading role in the biotechnology information database for agricultural plants in Korea. Since 2002, we have concentrated on functional genomics of major crops, building an integrated biotechnology database for agro-biotech information that focuses on bioinformatics of major agricultural resources such as rice, Chinese cabbage, and microorganisms. In the NABIC, integration-based biotechnology database provides useful information through a user-friendly web interface that allows analysis of genome infrastructure, multiple plants, microbial resources, and living modified organisms.

Keywords: biotechnology, database, genome information

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2503

34 Reconstruction of a Genome-Scale Metabolic Model to Simulate Uncoupled Growth of Zymomonas mobilis

Authors: Maryam Saeidi, Ehsan Motamedian, Seyed Abbas Shojaosadati

Abstract:

Zymomonas mobilis is known as an example of the uncoupled growth phenomenon. This microorganism also has a unique metabolism that degrades glucose by the Entner–Doudoroff (ED) pathway. In this paper, a genome-scale metabolic model including 434 genes, 757 reactions and 691 metabolites was reconstructed to simulate uncoupled growth and study its effect on flux distribution in the central metabolism. The model properly predicted that ATPase was activated in experimental growth yields of Z. mobilis. Flux distribution obtained from model indicates that the major carbon flux passed through ED pathway that resulted in the production of ethanol. Small amounts of carbon source were entered into pentose phosphate pathway and TCA cycle to produce biomass precursors. Predicted flux distribution was in good agreement with experimental data. The model results also indicated that Z. mobilis metabolism is able to produce biomass with maximum growth yield of 123.7 g (mol glucose)-1 if ATP synthase is coupled with growth and produces 82 mmol ATP gDCW-1h-1. Coupling the growth and energy reduced ethanol secretion and changed the flux distribution to produce biomass precursors.

Keywords: Genome-scale metabolic model, Zymomonas mobilis, uncoupled growth, flux distribution, ATP dissipation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1102

33 Exons and Introns Classification in Human and Other Organisms

Authors: Benjamin Y. M. Kwan, Jennifer Y. Y. Kwan, Hon Keung Kwan

Abstract:

In the paper, the relative performances on spectral classification of short exon and intron sequences of the human and eleven model organisms is studied. In the simulations, all combinations of sixteen one-sequence numerical representations, four threshold values, and four window lengths are considered. Sequences of 150-base length are chosen and for each organism, a total of 16,000 sequences are used for training and testing. Results indicate that an appropriate combination of one-sequence numerical representation, threshold value, and window length is essential for arriving at top spectral classification results. For fixed-length sequences, the precisions on exon and intron classification obtained for different organisms are not the same because of their genomic differences. In general, precision increases as sequence length increases.

Keywords: Exons and introns classification, Human genome, Model organism genome, Spectral analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2009

32 Genome-Wide Analysis of BES1/BZR1 Gene Family in Five Plant Species

Authors: Jafar Ahmadi, Zhohreh Asiaban, Sedigheh Fabriki Ourang

Abstract:

Brassinosteroids (BRs) regulate cell elongation, vascular differentiation, senescence, and stress responses. BRs signal through the BES1/BZR1 family of transcription factors, which regulate hundreds of target genes involved in this pathway. In this research a comprehensive genome-wide analysis was carried out in BES1/BZR1 gene family in Arabidopsis thaliana, Cucumis sativus, Vitis vinifera, Glycin max and Brachypodium distachyon. Specifications of the desired sequences, dot plot and hydropathy plot were analyzed in the protein and genome sequences of five plant species. The maximum amino acid length was attributed to protein sequence Brdic3g with 374aa and the minimum amino acid length was attributed to protein sequence Gm7g with 163aa. The maximum Instability index was attributed to protein sequence AT1G19350 equal with 79.99 and the minimum Instability index was attributed to protein sequence Gm5g equal with 33.22. Aliphatic index of these protein sequences ranged from 47.82 to 78.79 in Arabidopsis thaliana, 49.91 to 57.50 in Vitis vinifera, 55.09 to 82.43 in Glycin max, 54.09 to 54.28 in Brachypodium distachyon 55.36 to 56.83 in Cucumis sativus. Overall, data obtained from our investigation contributes a better understanding of the complexity of the BES1/BZR1 gene family and provides the first step towards directing future experimental designs to perform systematic analysis of the functions of the BES1/BZR1 gene family.

Keywords: BES1/BZR1, Brassinosteroids, Phylogenetic analysis, Transcription factor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2205

31 Optimal Multilayer Perceptron Structure For Classification of HIV Sub-Type Viruses

Authors: Zeyneb Kurt, Oguzhan Yavuz

Abstract:

The feature of HIV genome is in a wide range because of it is highly heterogeneous. Hence, the infection ability of the virus changes related with different chemokine receptors. From this point, R5 and X4 HIV viruses use CCR5 and CXCR5 coreceptors respectively while R5X4 viruses can utilize both coreceptors. Recently, in Bioinformatics, R5X4 viruses have been studied to classify by using the coreceptors of HIV genome. The aim of this study is to develop the optimal Multilayer Perceptron (MLP) for high classification accuracy of HIV sub-type viruses. To accomplish this purpose, the unit number in hidden layer was incremented one by one, from one to a particular number. The statistical data of R5X4, R5 and X4 viruses was preprocessed by the signal processing methods. Accessible residues of these virus sequences were extracted and modeled by Auto-Regressive Model (AR) due to the dimension of residues is large and different from each other. Finally the pre-processed dataset was used to evolve MLP with various number of hidden units to determine R5X4 viruses. Furthermore, ROC analysis was used to figure out the optimal MLP structure.

Keywords: Multilayer Perceptron, Auto-Regressive Model, HIV, ROC Analysis

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1392

30 Bioinformatic Analysis of Retroelement-Associated Sequences in Human and Mouse Promoters

Authors: Nadezhda M. Usmanova, Nikolai V. Tomilin

Abstract:

Mammalian genomes contain large number of retroelements (SINEs, LINEs and LTRs) which could affect expression of protein coding genes through associated transcription factor binding sites (TFBS). Activity of the retroelement-associated TFBS in many genes is confirmed experimentally but their global functional impact remains unclear. Human SINEs (Alu repeats) and mouse SINEs (B1 and B2 repeats) are known to be clustered in GCrich gene rich genome segments consistent with the view that they can contribute to regulation of gene expression. We have shown earlier that Alu are involved in formation of cis-regulatory modules (clusters of TFBS) in human promoters, and other authors reported that Alu located near promoter CpG islands have an increased frequency of CpG dinucleotides suggesting that these Alu are undermethylated. Human Alu and mouse B1/B2 elements have an internal bipartite promoter for RNA polymerase III containing conserved sequence motif called B-box which can bind basal transcription complex TFIIIC. It has been recently shown that TFIIIC binding to B-box leads to formation of a boundary which limits spread of repressive chromatin modifications in S. pombe. SINEassociated B-boxes may have similar function but conservation of TFIIIC binding sites in SINEs located near mammalian promoters has not been studied earlier. Here we analysed abundance and distribution of retroelements (SINEs, LINEs and LTRs) in annotated sequences of the Database of mammalian transcription start sites (DBTSS). Fractions of SINEs in human and mouse promoters are slightly lower than in all genome but >40% of human and mouse promoters contain Alu or B1/B2 elements within -1000 to +200 bp interval relative to transcription start site (TSS). Most of these SINEs is associated with distal segments of promoters (-1000 to -200 bp relative to TSS) indicating that their insertion at distances >200 bp upstream of TSS is tolerated during evolution. Distribution of SINEs in promoters correlates negatively with the distribution of CpG sequences. Using analysis of abundance of 12-mer motifs from the B1 and Alu consensus sequences in genome and DBTSS it has been confirmed that some subsegments of Alu and B1 elements are poorly conserved which depends in part on the presence of CpG dinucleotides. One of these CpG-containing subsegments in B1 elements overlaps with SINE-associated B-box and it shows better conservation in DBTSS compared to genomic sequences. It has been also studied conservation in DBTSS and genome of the B-box containing segments of old (AluJ, AluS) and young (AluY) Alu repeats and found that CpG sequence of the B-box of old Alu is better conserved in DBTSS than in genome. This indicates that Bbox- associated CpGs in promoters are better protected from methylation and mutation than B-box-associated CpGs in genomic SINEs. These results are consistent with the view that potential TFIIIC binding motifs in SINEs associated with human and mouse promoters may be functionally important. These motifs may protect promoters from repressive histone modifications which spread from adjacent sequences. This can potentially explain well known clustering of SINEs in GC-rich gene rich genome compartments and existence of unmethylated CpG islands.

Keywords: Retroelement, promoter, CpG island, DNAmethylation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1527

29 Identification of Complex Sense-antisense Gene's Module on 17q11.2 Associated with Breast Cancer Aggressiveness and Patient's Survival

Authors: O. Grinchuk, E. Motakis, V. Kuznetsov

Abstract:

Sense-antisense gene pair (SAGP) is a pair of two oppositely transcribed genes sharing a common region on a chromosome. In the mammalian genomes, SAGPs can be organized in more complex sense-antisense gene architectures (CSAGA) in which at least one gene could share loci with two or more antisense partners. Many dozens of CSAGAs can be found in the human genome. However, CSAGAs have not been systematically identified and characterized in context of their role in human diseases including cancers. In this work we characterize the structural-functional properties of a cluster of 5 genes –TMEM97, IFT20, TNFAIP1, POLDIP2 and TMEM199, termed TNFAIP1 / POLDIP2 module. This cluster is organized as CSAGA in cytoband 17q11.2. Affymetrix U133A&B expression data of two large cohorts (410 atients, in total) of breast cancer patients and patient survival data were used. For the both studied cohorts, we demonstrate (i) strong and reproducible transcriptional co-regulatory patterns of genes of TNFAIP1/POLDIP2 module in breast cancer cell subtypes and (ii) significant associations of TNFAIP1/POLDIP2 CSAGA with amplification of the CSAGA region in breast cancer, (ii) cancer aggressiveness (e.g. genetic grades) and (iv) disease free patient-s survival. Moreover, gene pairs of this module demonstrate strong synergetic effect in the prognosis of time of breast cancer relapse. We suggest that TNFAIP1/ POLDIP2 cluster can be considered as a novel type of structural-functional gene modules in the human genome.

Keywords: Sense-antisense gene pair, complex genome architecture, TMEM97, IFT20, TNFAIP1, POLDIP2, TMEM199, 17q11.2, breast cancer, transcription regulation, survival analysis, prognosis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1625

28 A Hybrid Feature Selection and Deep Learning Algorithm for Cancer Disease Classification

Authors: Niousha Bagheri Khulenjani, Mohammad Saniee Abadeh

Abstract:

Learning from very big datasets is a significant problem for most present data mining and machine learning algorithms. MicroRNA (miRNA) is one of the important big genomic and non-coding datasets presenting the genome sequences. In this paper, a hybrid method for the classification of the miRNA data is proposed. Due to the variety of cancers and high number of genes, analyzing the miRNA dataset has been a challenging problem for researchers. The number of features corresponding to the number of samples is high and the data suffer from being imbalanced. The feature selection method has been used to select features having more ability to distinguish classes and eliminating obscures features. Afterward, a Convolutional Neural Network (CNN) classifier for classification of cancer types is utilized, which employs a Genetic Algorithm to highlight optimized hyper-parameters of CNN. In order to make the process of classification by CNN faster, Graphics Processing Unit (GPU) is recommended for calculating the mathematic equation in a parallel way. The proposed method is tested on a real-world dataset with 8,129 patients, 29 different types of tumors, and 1,046 miRNA biomarkers, taken from The Cancer Genome Atlas (TCGA) database.

Keywords: Cancer classification, feature selection, deep learning, genetic algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1172

27 Investigation in Physically-Chemical Parameters of in Latvia Harvested Conventional and Organic Triticale Grains

Authors: Solvita Kalnina, Tatjana Rakcejeva, Daiga Kunkulberga, Anda Linina

Abstract:

Triticale is a manmade hybrid of wheat and rye that carries the A and B genome of durum wheat and the R genome of rye. In the scientific literature information about in Latvia harvested organic and conventional triticale grain physically-chemical composition was not found in general. Therefore, the main purpose of the current research was to investigate physically-chemical parameters of in Latvia harvested organic and convectional triticale grains. The research was accomplished on in Year 2012 from State Priekuli Plant Breeding Institute (Latvia) harvested organic and conventional triticale grains: “Dinaro”, “9403-97”, “9405-23” and “9402-3”. In the present research significant differences in chemical composition between organic and conventional triticale grains harvested in Latvia was found. It is necessary to mention that higher 1000 grain weight, bulk density and gluten index was obtained for conventional and organic triticale grain variety “9403-97”. However higher falling number, gluten and protein content was obtained for triticale grain variety “9405-23”.

Keywords: Physically-chemical parameters, technological properties, triticale grains.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2642

26 BeamGA Median: A Hybrid Heuristic Search Approach

Authors: Ghada Badr, Manar Hosny, Nuha Bintayyash, Eman Albilali, Souad Larabi Marie-Sainte

Abstract:

The median problem is significantly applied to derive the most reasonable rearrangement phylogenetic tree for many species. More specifically, the problem is concerned with finding a permutation that minimizes the sum of distances between itself and a set of three signed permutations. Genomes with equal number of genes but different order can be represented as permutations. In this paper, an algorithm, namely BeamGA median, is proposed that combines a heuristic search approach (local beam) as an initialization step to generate a number of solutions, and then a Genetic Algorithm (GA) is applied in order to refine the solutions, aiming to achieve a better median with the smallest possible reversal distance from the three original permutations. In this approach, any genome rearrangement distance can be applied. In this paper, we use the reversal distance. To the best of our knowledge, the proposed approach was not applied before for solving the median problem. Our approach considers true biological evolution scenario by applying the concept of common intervals during the GA optimization process. This allows us to imitate a true biological behavior and enhance genetic approach time convergence. We were able to handle permutations with a large number of genes, within an acceptable time performance and with same or better accuracy as compared to existing algorithms.

Keywords: Median problem, phylogenetic tree, permutation, genetic algorithm, beam search, genome rearrangement distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 929

25 Statistics of Exon Lengths in Animals, Plants, Fungi, and Protists

Authors: Alexander Kaplunovsky, Vladimir Khailenko, Alexander Bolshoy, Shara Atambayeva, AnatoliyIvashchenko

Abstract:

Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from the RNA transcripts before translation into a protein. The exon-intron structures of different eukaryotic species are quite different from each other, and the evolution of such structures raises many questions. We try to address some of these questions using statistical analysis of whole genomes. We go through all the protein-coding genes in a genome and study correlations between the net length of all the exons in a gene, the number of the exons, and the average length of an exon. We also take average values of these features for each chromosome and study correlations between those averages on the chromosomal level. Our data show universal features of exon-intron structures common to animals, plants, and protists (specifically, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Cryptococcus neoformans, Homo sapiens, Mus musculus, Oryza sativa, and Plasmodium falciparum). We have verified linear correlation between the number of exons in a gene and the length of a protein coded by the gene, while the protein length increases in proportion to the number of exons. On the other hand, the average length of an exon always decreases with the number of exons. Finally, chromosome clustering based on average chromosome properties and parameters of linear regression between the number of exons in a gene and the net length of those exons demonstrates that these average chromosome properties are genome-specific features.

Keywords: Comparative genomics, exon-intron structure, eukaryotic clustering, linear regression.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2516

24 Computational Analysis of the MembraneTargeting Domains of Plant-specific PRAF Proteins

Authors: Ewa Wywial, Shaneen M. Singh

Abstract:

The PRAF family of proteins is a plant specific family of proteins with distinct domain architecture and various unique sequence/structure traits. We have carried out an extensive search of the Arabidopsis genome using an automated pipeline and manual methods to verify previously known and identify unknown instances of PRAF proteins, characterize their sequence and build 3D structures of their individual domains. Integrating the sequence, structure and whatever little known experimental details for each of these proteins and their domains, we present a comprehensive characterization of the different domains in these proteins and their variant properties.

Keywords: PRAF proteins, homology modeling, Arabidopsisthaliana

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1370

23 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist in multiple steps including quality control, filtering, alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time consuming and rely on a large number of parameters that often provide variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of the prediction for each genome. Using two public real-life data-sets as well a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data though mainstream bioinformatics workflows. These results are encouraging for this proof of concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: Metagenomics, phenotype prediction, deep learning, embeddings, multiple instance learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 814

22 Integration of Microarray Data into a Genome-Scale Metabolic Model to Study Flux Distribution after Gene Knockout

Authors: Mona Heydari, Ehsan Motamedian, Seyed Abbas Shojaosadati

Abstract:

Prediction of perturbations after genetic manipulation (especially gene knockout) is one of the important challenges in systems biology. In this paper, a new algorithm is introduced that integrates microarray data into the metabolic model. The algorithm was used to study the change in the cell phenotype after knockout of Gss gene in Escherichia coli BW25113. Algorithm implementation indicated that gene deletion resulted in more activation of the metabolic network. Growth yield was more and less regulating gene were identified for mutant in comparison with the wild-type strain.

Keywords: Metabolic network, gene knockout, flux balance analysis, microarray data, integration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 955

21 Novel Inhibitor of E. coli DNA Adenine Methyltransferase (Ecodam)

Authors: H. Elsawy, A. Jeltsch

Abstract:

EcoDam is an adenine-N6 DNA methyltransferase that methylates the GATC sites in the Escherichia coli genome. DNA-adenine methylation is not present in higher eukaryotes including humans. These observations raise the possibility that dam inhibitors may be used as anti-microbial agents. Polyphosphate (Poly(P)) is an important metabolite and signaling molecule in prokaryotes and eukaryotes. Here, by using gel retardation experiments to investigate the competition of DNA binding by EcoDam in the presence of polyphosphate, we found that Poly (P) strongly interferes with DNA binding by EcoDam, while same concentration of monophosphate does not. In addition, we demonstrated that Poly (P) binding inhibits the activity of EcoDam and our results suggest that Poly (P) led to strong inhibition of the EcoDam catalytic activity, while monophosphate had only moderate effect.

Keywords: Antibacterial drugs, EcoDam inhibitors, Polyphosphate.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2519

20 Physicians’ Knowledge and Perception of Gene Profiling in Malaysia

Authors: Farahnaz Amini, Woo Yun Kin, Lazwani Kolandaiveloo

Abstract:

Availability of different genetic tests after completion of Human Genome Project increases the physicians’ responsibility to keep themselves update on the potential implementation of these genetic tests in their daily practice. However, due to numbers of barriers, still many of physicians are not either aware of these tests or are not willing to offer or refer their patients for genetic tests. This study was conducted an anonymous, cross-sectional, mailed-based survey to develop a primary data of Malaysian physicians’ level of knowledge and perception of gene profiling. Questionnaire had 29 questions. Total scores on selected questions were used to assess the level of knowledge. The highest possible score was 11. Descriptive statistics, one way ANOVA and chi-squared test was used for statistical analysis. Sixty three completed questionnaires were returned by 27 general practitioners (GPs) and 36 medical specialists. Responders’ age ranges from 24 to 55 years old (mean 30.2 ± 6.4). About 40% of the participants rated themselves as having poor level of knowledge in genetics in general whilst 60% believed that they have fair level of knowledge; however, almost half (46%) of the respondents felt that they were not knowledgeable about available genetic tests. A majority (94%) of the responders were not aware of any lab or company which is offering gene profiling services in Malaysia. Only 4% of participants were aware of using gene profiling for detection of dosage of some drugs. Respondents perceived greater utility of gene profiling for breast cancer (38%) compared to the colorectal familial cancer (3%). The score of knowledge ranged from 2 to 8 (mean 4.38 ± 1.67). Non- significant differences between score of knowledge of GPs and specialists were observed, with score of 4.19 and 4.58 respectively. There was no significant association between any demographic factors and level of knowledge. However, those who graduated between years 2001 to 2005 had higher level of knowledge. Overall, 83% of participants showed relatively high level of perception on value of gene profiling to detect patient’s risk of disease. However, low perception was observed for both statements of using gene profiling for general population in order to alter their lifestyle (25%) as well as having the full sequence of a patient genome for the purpose of determining a patient’s best match for treatment (18%). The lack of clinical guidelines, limited provider knowledge and awareness, lack of time and resources to educate patients, lack of evidence-based clinical information and cost of tests were the most barriers of ordering gene profiling mentioned by physicians. In conclusion Malaysian physicians who participate in this study had mediocre level of knowledge and awareness in gene profiling. The low exposure to the genetic questions and problems might be a key predictor of lack of awareness and knowledge on available genetic tests. Educational and training workshop might be useful in helping Malaysian physicians incorporate genetic profiling into practice for eligible patients.

Keywords: Gene Profiling, Knowledge, Malaysia, Physician.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1906

19 Detection of Pathogenic Escherichia coli Strains Pollution in Red Deer Meat in Latvia and Determination the Compatibility of VT1, VT2, eae A Genes in their Isolate

Authors: S. Liepina, A. Jemeljanovs

Abstract:

Tasks of the work were study the possible E.coli contamination in red deer meat, identify pathogenic strains from isolated E.coli, determine their incidence in red deer meat and determine the presence of VT1, VT2 and eaeA genes for the pathogenic E.coli. 8 (10%) samples were randomly selected from 80 analysed isolates of E.coli and PCR reaction was performed on them. PCR was done both on initial materials – samples of red deer meat - and for already isolated liqueurs. Two of analysed venison samples contain verotoxin-producing strains of E. coli. It means that this meat is not safe to consumer. It was proven by the sequestration reaction of E. coli and by comparison of the obtained results with the database of microorganism genome available on the internet that the isolated culture corresponds to region 16S rDNS of E. coli thus presenting correctness of the microbiological methods.

Keywords: Deer meat, pathogenic Escherichia coli

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1952

18 From Primer Generation to Chromosome Identification: A Primer Generation Genotyping Method for Bacterial Identification and Typing

Authors: Wisam H. Benamer, Ehab A. Elfallah, Mohamed A. Elshaari, Farag A. Elshaari

Abstract:

A challenge for laboratories is to provide bacterial identification and antibiotic sensitivity results within a short time. Hence, advancement in the required technology is desirable to improve timing, accuracy and quality. Even with the current advances in methods used for both phenotypic and genotypic identification of bacteria the need is there to develop method(s) that enhance the outcome of bacteriology laboratories in accuracy and time. The hypothesis introduced here is based on the assumption that the chromosome of any bacteria contains unique sequences that can be used for its identification and typing. The outcome of a pilot study designed to test this hypothesis is reported in this manuscript. Methods: The complete chromosome sequences of several bacterial species were downloaded to use as search targets for unique sequences. Visual basic and SQL server (2014) were used to generate a complete set of 18-base long primers, a process started with reverse translation of randomly chosen 6 amino acids to limit the number of the generated primers. In addition, the software used to scan the downloaded chromosomes using the generated primers for similarities was designed, and the resulting hits were classified according to the number of similar chromosomal sequences, i.e., unique or otherwise. Results: All primers that had identical/similar sequences in the selected genome sequence(s) were classified according to the number of hits in the chromosomes search. Those that were identical to a single site on a single bacterial chromosome were referred to as unique. On the other hand, most generated primers sequences were identical to multiple sites on a single or multiple chromosomes. Following scanning, the generated primers were classified based on ability to differentiate between medically important bacterial and the initial results looks promising. Conclusion: A simple strategy that started by generating primers was introduced; the primers were used to screen bacterial genomes for match. Primer(s) that were uniquely identical to specific DNA sequence on a specific bacterial chromosome were selected. The identified unique sequence can be used in different molecular diagnostic techniques, possibly to identify bacteria. In addition, a single primer that can identify multiple sites in a single chromosome can be exploited for region or genome identification. Although genomes sequences draft of isolates of organism DNA enable high throughput primer design using alignment strategy, and this enhances diagnostic performance in comparison to traditional molecular assays. In this method the generated primers can be used to identify an organism before the draft sequence is completed. In addition, the generated primers can be used to build a bank for easy access of the primers that can be used to identify bacteria.

Keywords: Bacteria chromosome, bacterial identification, sequence, primer generation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 994