Search results for: Genomic regions of differences
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 44

14 Selecting Negative Examples for Protein-Protein Interaction

Authors: Mohammad Shoyaib, M. Abdullah-Al-Wadud, Oksam Chae

Abstract:

Proteomics is one of the largest areas of research in bioinformatics and medical science. An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. Predicting protein-protein interactions (PPIs) is one of the crucial problems in current research. Genomic data offer a great opportunity, and at the same time many challenges, for the identification of these interactions. Many methods have already been proposed in this regard. For in-silico identification, most methods require both positive and negative examples of protein interaction, and the quality of these examples is crucial for the final prediction accuracy. Positive examples are relatively easy to obtain from well-known databases, but the generation of negative examples is not a trivial task. Current PPI identification methods generate negative examples based on assumptions that are likely to affect their prediction accuracy. Hence, if more reliable negative examples are used, PPI prediction methods may achieve higher accuracy. Focusing on this issue, a graph-based negative example generation method is proposed, which is simple and more accurate than the existing approaches. An interaction graph of the protein sequences is created. The basic assumption is that the longer the shortest path between two protein sequences in the interaction graph, the lower the possibility of their interaction. A well-established PPI detection algorithm is employed with our negative examples, and in most cases accuracy increases by more than 10% in comparison with the negative pair selection method used in that algorithm's original paper.

Keywords: Interaction graph, Negative training data, Protein-Protein interaction, Support vector machine.
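
The shortest-path assumption stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the protein names `P1` to `P6`, the function names and the `min_distance` threshold are all hypothetical, and unconnected pairs are simply skipped here because the abstract does not say how they are handled.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def candidate_negative_pairs(graph, min_distance):
    """Return (a, b, distance) for pairs at least min_distance hops apart.

    Pairs with no connecting path are skipped in this sketch (an assumption).
    """
    all_dist = {p: shortest_path_lengths(graph, p) for p in graph}
    pairs = []
    for a, b in combinations(sorted(graph), 2):
        d = all_dist[a].get(b)
        if d is not None and d >= min_distance:
            pairs.append((a, b, d))
    return pairs


# Hypothetical interaction graph: a chain P1-P2-P3-P4-P5 plus the edge P2-P6.
toy_graph = {
    "P1": {"P2"},
    "P2": {"P1", "P3", "P6"},
    "P3": {"P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
    "P6": {"P2"},
}

# Pairs far apart in the graph are treated as candidate negative examples.
negatives = candidate_negative_pairs(toy_graph, min_distance=3)
for a, b, d in negatives:
    print(a, b, d)
```

Raising `min_distance` trades the number of negative pairs for confidence that they do not interact, under the abstract's assumption.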

Downloads: 1666
13 Computing Entropy for Ortholog Detection

Authors: Hsing-Kuo Pao, John Case

Abstract:

Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences, in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data, better than with good, previously known alternatives (which do not employ some means to handle short sequences well). Also empirically compared are the new entropy-based attribute set and a number of other, more standard similarity attribute sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion: the new, entropy-based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them.

Keywords: compression, decision tree, entropy, ortholog, ROC.
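
As a rough illustration of approximating Kolmogorov complexity with a compressor, the sketch below computes the normalized compression distance (NCD), a standard compression-based similarity measure. It is a swapped-in, well-known technique and not the conditional entropy attributes proposed in the paper. The choice of `zlib` and the synthetic sequences are assumptions made for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a complexity estimate."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Lower values suggest higher similarity.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic DNA-like sequences (hypothetical data, fixed seed for repeatability).
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(400))
seq_b = "".join(rng.choice("ACGT") for _ in range(400))

print("NCD(a, a):", ncd(seq_a, seq_a))
print("NCD(a, b):", ncd(seq_a, seq_b))
```

A general-purpose compressor adds a fixed overhead to every output, so for very short inputs the compressed size says little about the sequence itself, which is the short-sequence problem the abstract refers to.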

Downloads: 1795
12 Web–Based Tools and Databases for Micro-RNA Analysis: A Review

Authors: Sitansu Kumar Verma, Soni Yadav, Jitendra Singh, Shraddha, Ajay Kumar

Abstract:

MicroRNAs (miRNAs) are a class of approximately 22-nucleotide-long non-coding RNAs that play critical roles in different biological processes. The mature microRNA is usually 19–27 nucleotides long and is derived from a larger precursor that folds into an imperfect stem-loop structure. Mature microRNAs are involved in many cellular processes, including development, proliferation, stress response, apoptosis, and fat metabolism, through gene regulation. Recent findings reveal that certain viruses encode their own miRNAs, which are processed by the cellular RNAi machinery. Recent research also indicates that cellular microRNAs can target the genetic material of invading viruses. Cellular microRNAs can be used in the virus life cycle, either to upregulate or to downregulate viral gene expression. The computational tools used in miRNA target prediction have been changing drastically in recent years. Many of the methods have been made available on the web and can be used by experimental researchers and scientists without expert knowledge of bioinformatics. With the development and ease of use of genomic technologies and computational tools, the field of microRNA biology has advanced tremendously over the previous decade. This review attempts to give an overview of the genome-wide approaches that have allowed for the discovery of new miRNAs and the development of new miRNA target prediction tools and databases.

Keywords: MicroRNAs, computational tools, gene regulation, databases, RNAi.

Downloads: 3132
11 Detection of Transgenes in Cotton (Gossypium hirsutum L.) by Using Biotechnology/Molecular Biological Techniques

Authors: Ahmad Ali Shahid, Muhammad Shakil Shaukat, Kamran Shehzad Bajwa, Abdul Qayyum Rao, Tayyab Husnain

Abstract:

Agriculture is the backbone of the economy of Pakistan, and cotton is the major agricultural export and the principal source of raw fiber for our textile industry. To combat severe insect and weed problems, a combination of three genes, namely Cry1Ac, Cry2A and EPSPS, was transferred into the locally cultivated cotton variety MNH-786 using Agrobacterium-mediated genetic transformation. The present study focused on the molecular screening of transgenic cotton plants at the T3 generation in order to confirm integration and expression of all three genes (Cry1Ac, Cry2A and EPSP synthase) in the cotton genome. Initially, a glyphosate spray assay was used to screen transgenic cotton plants containing the EPSP synthase gene at the T3 generation. Transgenic cotton plants that were healthy and showed no damage on leaves were selected 7 days after spraying. For molecular analysis in the laboratory, the genomic DNA of these transgenic cotton plants was isolated and subjected to amplification of the three genes. Seventeen out of twenty plants (Cry1Ac gene), ten out of twenty (Cry2A gene) and all twenty (EPSP synthase gene) produced positive amplification. On the basis of PCR amplification, ten transgenic plant samples were subjected to protein expression analysis through ELISA. The results showed that eight out of ten plants were actively expressing the three transgenes. Real-time PCR was also performed to quantify the mRNA expression levels of the Cry1Ac and EPSP synthase genes. Finally, eight plants were confirmed for the presence and active expression of all three genes at the T3 generation.

Keywords: Agriculture, Cotton, Transformation, Cry Genes, ELISA and PCR.

Downloads: 3084
10 Evolutionary Origin of the αC Helix in Integrins

Authors: B. Chouhan, A. Denesyuk, J. Heino, M. S. Johnson, K. Denessiouk

Abstract:

Integrins are a large family of multidomain α/β cell signaling receptors. Some integrins contain an additional inserted I domain, whose earliest expression appears to be with the chordates, since it is observed in the urochordates Ciona intestinalis (vase tunicate) and Halocynthia roretzi (sea pineapple), but not in integrins of earlier diverging species. The domain's presence is viewed as a hallmark of integrins of higher metazoans; however, in vertebrates there are clearly three structurally different classes: integrins without I domains, and two groups of integrins with I domains that are separable by the presence or absence of an additional αC helix. For example, the αI domains in collagen-binding integrins from Osteichthyes (bony fish) and all higher vertebrates contain the specific αC helix, whereas the αI domains in non-collagen-binding integrins from vertebrates and the αI domains from earlier diverging urochordate integrins, i.e. tunicates, do not. Unfortunately, within the early chordates, there is an evolutionary gap due to extinctions between the tunicates and cartilaginous fish. This, coupled with a knowledge gap due to the lack of complete genomic data from surviving species, means that the origin of collagen-binding αC-containing αI domains remains unknown. Here, we analyzed two available genomes, from Callorhinchus milii (ghost shark/elephant shark; Chondrichthyes – cartilaginous fish) and Petromyzon marinus (sea lamprey; Agnathostomata), and several available Expressed Sequence Tags from two Chondrichthyes species, Raja erinacea (little skate) and Squalus acanthias (dogfish shark), and from Eptatretus burgeri (inshore hagfish; Agnathostomata), which evolutionarily reside between the urochordates and Osteichthyes. In P. marinus, we observed several fragments coding for the αC-containing αI domain, allowing us to shed more light on the evolution of the collagen-binding integrins.

Keywords: Integrin αI domain, integrin evolution, collagen binding, structure, αC helix

Downloads: 3632
9 Screen of MicroRNA Targets in Zebrafish Using Heterogeneous Data Sources: A Case Study for Dre-miR-10 and Dre-miR-196

Authors: Yanju Zhang, Joost M. Woltering, Fons J. Verbeek

Abstract:

It has been established that microRNAs (miRNAs) play an important role in gene expression by post-transcriptional regulation of messenger RNAs (mRNAs). However, the precise relationships between microRNAs and their target genes in terms of numbers, types and biological relevance remain largely unclear. Dissecting the miRNA-target relationships will provide more insight for miRNA target identification and validation, and therefore promote the understanding of miRNA function. In miRBase, miRanda is the key algorithm used for target prediction for zebrafish. This algorithm is high-throughput but produces many false positives (noise). Since validation of a large number of targets through laboratory experiments is very time-consuming, computational methods for miRNA target validation should be developed. In this paper, we present an integrative method to investigate several aspects of the relationships between miRNAs and their targets, with the final purpose of extracting high-confidence targets from the pool of miRanda-predicted targets. This is achieved by using techniques ranging from statistical tests to clustering and association rules. Our research focuses on zebrafish. It was found that validated targets do not necessarily associate with the highest sequence matching. In addition, for some miRNA families, the frequency of their predicted targets is significantly higher in the genomic region near their own physical location. Finally, in a case study of dre-miR-10 and dre-miR-196, it was found that the predicted target genes hoxd13a, hoxd11a, hoxd10a and hoxc4a of dre-miR-10, and hoxa9a, hoxc8a and hoxa13a of dre-miR-196, have characteristics similar to those of validated target genes and therefore represent high-confidence target candidates.

Keywords: MicroRNA targets validation, microRNA-target relationships, dre-miR-10, dre-miR-196.

Downloads: 1944
8 Mutation Analysis of the ATP7B Gene in 43 Vietnamese Wilson’s Disease Patients

Authors: Huong M. T. Nguyen, Hoa A. P. Nguyen, Mai P. T. Nguyen, Ngoc D. Ngo, Van T. Ta, Hai T. Le, Chi V. Phan

Abstract:

Wilson’s disease (WD) is an autosomal recessive disorder of copper metabolism, which is caused by a mutation in the copper-transporting P-type ATPase (ATP7B). The mechanism of this disease is the failure of hepatic excretion of copper to bile, which leads to copper deposits in the liver and other organs. The ATP7B gene is located on the long arm of chromosome 13 (13q14.3). This study aimed to investigate the gene mutations in Vietnamese patients with WD, and to make a presymptomatic diagnosis for their family members. Forty-three WD patients and their 65 siblings were identified as having ATP7B gene mutations. Genomic DNA was extracted from peripheral blood samples; 21 exons and exon-intron boundaries of the ATP7B gene were analyzed by direct sequencing. We recognized four mutations ([R723=; H724Tfs*34], V1042Cfs*79, D1027H, and IVS6+3A>G) among a total of 20 detectable mutations, accounting for 87.2% of the total. Mutation S105* was found at a high rate (32.6%) in this study. The hotspot regions of ATP7B were found at exons 2, 16, and 8, and intron 14, in 39.6%, 11.6%, 9.3%, and 7% of patients, respectively. Among nine homozygote/compound heterozygote siblings of the patients with WD, three individuals were determined to be asymptomatic by screening for the mutations of the probands. They would begin treatment after diagnosis. In conclusion, 20 different mutations were detected in 43 WD patients. Of these, four were novel mutations: [R723=; H724Tfs*34], V1042Cfs*79, D1027H, and IVS6+3A>G. The mutation S105* is the most prevalent and has been considered as a biomarker that can be used in a rapid detection assay for diagnosis of WD patients. Exons 2, 8, and 16, and intron 14 should be screened initially for WD patients in Vietnam. Based on the risk profile for WD, genetic testing for presymptomatic patients is also useful in diagnosis and treatment.

Keywords: ATP7B gene, mutation detection, presymptomatic diagnosis, Vietnamese Wilson’s disease.

Downloads: 1644
7 Identification of Promiscuous Epitopes for Cellular Immune Responses in the Major Antigenic Protein Rv3873 Encoded by Region of Difference 1 of Mycobacterium tuberculosis

Authors: Abu Salim Mustafa

Abstract:

Rv3873 is a relatively large size protein (371 amino acids in length) and its gene is located in the immunodominant genomic region of difference (RD)1 that is present in the genome of Mycobacterium tuberculosis but deleted from the genomes of all the vaccine strains of Bacillus Calmette Guerin (BCG) and most other mycobacteria. However, when tested for cellular immune responses using peripheral blood mononuclear cells from tuberculosis patients and BCG-vaccinated healthy subjects, this protein was found to be a major stimulator of cell mediated immune responses in both groups of subjects. In order to further identify the sequence of immunodominant epitopes and explore their Human Leukocyte Antigen (HLA)-restriction for epitope recognition, 24 peptides (25-mers overlapping with the neighboring peptides by 10 residues) covering the sequence of Rv3873 were synthesized chemically using fluorenylmethyloxycarbonyl chemistry and tested in cell mediated immune responses. The results of these experiments helped in the identification of an immunodominant peptide P9 that was recognized by people expressing varying HLA-DR types. Furthermore, it was also predicted to be a promiscuous binder with multiple epitopes for binding to HLA-DR, HLA-DP and HLA-DQ alleles of HLA-class II molecules that present antigens to T helper cells, and to HLA-class I molecules that present antigens to T cytotoxic cells. In addition, the evaluation of peptide P9 using an immunogenicity predictor server yielded a high score (0.94), which indicated a greater probability of this peptide to elicit a protective cellular immune response. In conclusion, P9, a peptide with multiple epitopes and ability to bind several HLA class I and class II molecules for presentation to cells of the cellular immune response, may be useful as a peptide-based vaccine against tuberculosis.

Keywords: Mycobacterium tuberculosis, Rv3873, peptides, vaccine

Downloads: 794
6 TNFRSF11B Gene Polymorphisms A163G and G11811C in Prediction of Osteoporosis Risk

Authors: Boroňová I., Bernasovská J., Kľoc J., Tomková Z., Petrejčíková E., Gabriková D., Mačeková S.

Abstract:

Osteoporosis is a complex disease characterized by low bone mineral density, which is determined by an interaction of genetics with metabolic and environmental factors. Current research in the genetics of osteoporosis is focused on identification of the responsible genes and polymorphisms. The TNFRSF11B gene plays a key role in bone remodeling. The aim of this study was to investigate the genotype and allele distribution of the A163G (rs3102735) osteoprotegerin gene promoter polymorphism and the G1181C (rs2073618) osteoprotegerin first exon polymorphism in a group of 180 unrelated postmenopausal women with diagnosed osteoporosis and 180 normal controls. Genomic DNA was isolated from peripheral blood leukocytes using standard methodology. Genotyping for the presence of the different polymorphisms was performed using Custom TaqMan® SNP Genotyping assays. Hardy-Weinberg equilibrium was tested for each SNP in the groups of participants using the chi-square (χ2) test. The distribution of the investigated genotypes in the group of patients with osteoporosis was as follows: AA (66.7%), AG (32.2%), GG (1.1%) for the A163G polymorphism; GG (19.4%), CG (44.4%), CC (36.1%) for the G1181C polymorphism. The distribution of genotypes in normal controls was as follows: AA (71.1%), AG (26.1%), GG (2.8%) for the A163G polymorphism; GG (22.2%), CG (48.9%), CC (28.9%) for the G1181C polymorphism. For the A163G polymorphism, the variant G allele was more common among patients with osteoporosis: 17.2% versus 15.8% in normal controls. Similarly, for the G1181C polymorphism, the C allele occurred more frequently in the group of patients with osteoporosis (58.3% versus 53.3%). Genotype and allele distributions showed no significant differences (A163G: χ2=0.270, p=0.605; χ2=0.250, p=0.616; G1181C: χ2=1.730, p=0.188; χ2=1.820, p=0.177). Our results represent an initial study; further studies on a larger sample and association studies will be carried out.
Knowing the distribution of genotypes is important for assessing the impact of these polymorphisms on various parameters associated with osteoporosis. Screening to identify “at-risk” women likely to develop osteoporosis and initiating subsequent early intervention appears to be the most effective strategy to substantially reduce the risks of osteoporosis.

Keywords: Osteoporosis, Real-time PCR method, SNP polymorphisms.
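
The allele frequencies quoted in the abstract follow from the genotype distribution by simple counting, and the same counts feed a Hardy-Weinberg chi-square statistic. The sketch below is illustrative only: the function names are hypothetical, and the genotype counts are an assumption, reconstructed by applying the reported A163G percentages to the 180 subjects in each group.

```python
def allele_frequencies(n_hom_ref, n_het, n_hom_var):
    """Return (reference, variant) allele frequencies from genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_var)
    variant = (n_het + 2 * n_hom_var) / total_alleles
    return 1.0 - variant, variant


def hardy_weinberg_chi2(n_hom_ref, n_het, n_hom_var):
    """Chi-square statistic comparing observed genotype counts with the
    counts expected under Hardy-Weinberg equilibrium (p^2, 2pq, q^2)."""
    n = n_hom_ref + n_het + n_hom_var
    p, q = allele_frequencies(n_hom_ref, n_het, n_hom_var)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_var)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumed counts for A163G (AA, AG, GG), reconstructed from the percentages.
patients = (120, 58, 2)   # 66.7%, 32.2%, 1.1% of 180
controls = (128, 47, 5)   # 71.1%, 26.1%, 2.8% of 180

_, g_patients = allele_frequencies(*patients)
_, g_controls = allele_frequencies(*controls)
print(f"G allele, patients: {100 * g_patients:.1f}%")
print(f"G allele, controls: {100 * g_controls:.1f}%")
print("HWE chi-square, patients:", hardy_weinberg_chi2(*patients))
```

With these assumed counts the G allele frequencies round to the 17.2% and 15.8% reported in the abstract.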

Downloads: 2210
5 DNA Polymorphism Studies of β-Lactoglobulin Gene in Saudi Goats

Authors: Amr A. El Hanafy, Muhammad Qureshi, Jamal Sabir, Mohamed Mutawakil, Mohamed M. Ahmed, Hassan El Ashmaoui, Hassan Ramadan, Mohamed Abou-Alsoud, Mahmoud Abdel Sadek

Abstract:

Domestic goats (Capra hircus) are an extremely diverse species and a principal animal genetic resource of the developing world. They provide a persistent supply of meat, milk, fibre, and skin and are considered important revenue generators in small pastoral environments. This study aimed to fingerprint the β-LG gene at the PCR-RFLP level in native Saudi goat breeds (Ardi, Habsi and Harri) in an attempt to obtain a preliminary picture of β-LG genotypic patterns in Saudi breeds as compared to foreign breeds such as Indian and Egyptian ones. Phylogenetic analysis was also carried out to investigate evolutionary trends and similarities between the caprine β-LG gene and that of other domestic species, viz. cow, buffalo and sheep. Blood samples were collected from 300 animals (100 for each breed) and genomic DNA was extracted. A fragment of the β-LG gene (427 bp) was amplified using specific primers. Subsequent digestion with Sac II restriction endonuclease revealed two alleles (A and B) and three different banding patterns or genotypes, i.e. AA, AB and BB. The statistical analysis showed a general trend that the β-LG AA genotype had higher milk yield than the β-LG AB and β-LG BB genotypes. Nucleotide sequencing of the selected β-LG fragments was done and the sequences were submitted to GenBank NCBI (Accession No. KJ544248, KJ588275, KJ588276, KJ783455, KJ783456 and KJ874959). Phylogenetic analysis on the basis of nucleotide sequences of native Saudi goats indicated evolutionary similarity with the GenBank reference sequences of goat, Bubalus bubalis and Bos taurus. However, sheep, which is the most closely related species from the evolutionary point of view, was located some distance away.

Keywords: β-Lactoglobulin, Saudi goats, PCR-RFLP, Phylogenetic analysis.

Downloads: 6105
4 A Novel Multiplex Real-Time PCR Assay Using TaqMan MGB Probes for Rapid Detection of Trisomy 21

Authors: Mehrdad Hashemi, Mitra Behrooz Aghdam, Reza Mahdian, Ahmad Reza Kamyab

Abstract:

Cytogenetic analysis remains the gold standard method for prenatal diagnosis of trisomy 21 (Down syndrome, DS). Nevertheless, conventional cytogenetic analysis needs live cultured cells and is too time-consuming for clinical application. In contrast, molecular methods such as FISH, QF-PCR, MLPA and quantitative Real-time PCR are rapid assays with results available in 24 h. In the present study, we have successfully used a novel MGB TaqMan probe-based Real-time PCR assay for rapid diagnosis of trisomy 21 status in Down syndrome samples. We have also compared the results of this molecular method with the corresponding results obtained by cytogenetic analysis. Blood samples obtained from DS patients (n=25) and normal controls (n=20) were tested by quantitative Real-time PCR in parallel with standard G-banding analysis. Genomic DNA was extracted from peripheral blood lymphocytes. A high-precision TaqMan probe quantitative Real-time PCR assay was developed to determine the gene dosage of DSCAM (target gene on 21q22.2) relative to PMP22 (reference gene on 17p11.2). The DSCAM/PMP22 ratio was calculated according to the formula: ratio = 2^(-ΔΔCT). The quantitative Real-time PCR was able to distinguish between trisomy 21 samples and normal controls, with gene ratios of 1.49±0.13 and 1.03±0.04 respectively (p value <0.001). These results reflect the presence of 3 copies of the target gene in DS samples vs. 2 copies in normal controls. The results of quantitative Real-time PCR were in complete agreement with the results of cytogenetic analysis. This study confirms previous reports regarding successful implementation of quantitative Real-time PCR for detection of trisomy 21. However, the assay has been improved by using MGB probes and more accurate data analysis. This assay, in particular when performed in combination with another molecular assay such as QF-PCR or MLPA, can be used as a reliable technique for rapid prenatal diagnosis of trisomy 21.

Keywords: Trisomy 21, Real-time PCR, MGB-TaqMan Probes, Gene Dosage.
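
The gene dosage formula in the abstract is the usual comparative CT calculation, which can be sketched as follows. The use of a separate calibrator sample, the function names and all CT values below are assumptions for illustration; the abstract does not state which calibrator or CT values were used.

```python
import math


def delta_ct(ct_target, ct_reference):
    """CT of the target gene minus CT of the reference gene."""
    return ct_target - ct_reference


def relative_gene_dosage(sample_target_ct, sample_reference_ct,
                         calibrator_target_ct, calibrator_reference_ct):
    """Target/reference ratio of a sample relative to a calibrator.

    ratio = 2^(-ΔΔCT), with ΔΔCT = ΔCT(sample) - ΔCT(calibrator).
    """
    ddct = (delta_ct(sample_target_ct, sample_reference_ct)
            - delta_ct(calibrator_target_ct, calibrator_reference_ct))
    return 2.0 ** (-ddct)


# Hypothetical CT values. With three target copies instead of two, the target
# is expected to cross the threshold log2(1.5) cycles earlier than in a
# two-copy calibrator, assuming ideal amplification efficiency.
calibrator = (25.0, 25.0)                    # (target CT, reference CT)
trisomy_like = (25.0 - math.log2(1.5), 25.0)

print(relative_gene_dosage(*trisomy_like, *calibrator))
print(relative_gene_dosage(*calibrator, *calibrator))
```

Under these idealized assumptions the first call gives a ratio close to 1.5 and the second gives 1.0, which is in line with the 3-copy versus 2-copy interpretation in the abstract.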

Downloads: 2498
3 Modeling Stress-Induced Regulatory Cascades with Artificial Neural Networks

Authors: Maria E. Manioudaki, Panayiota Poirazi

Abstract:

Yeast cells live in a constantly changing environment that requires the continuous adaptation of their genomic program in order to sustain their homeostasis, survive and proliferate. Due to the advancement of high-throughput technologies, there is currently a large amount of data, such as gene expression, gene deletion and protein-protein interaction data, for S. cerevisiae under various environmental conditions. Mining these datasets requires efficient computational methods capable of integrating different types of data, identifying inter-relations between different components and inferring functional groups or 'modules' that shape intracellular processes. This study uses computational methods to delineate some of the mechanisms used by yeast cells to respond to environmental changes. The GRAM algorithm is first used to integrate gene expression data and ChIP-chip data in order to find modules of co-expressed and co-regulated genes as well as the transcription factors (TFs) that regulate these modules. Since transcription factors are themselves transcriptionally regulated, a three-layer regulatory cascade consisting of the TF-regulators, the TFs and the regulated modules is subsequently considered. This three-layer cascade is then modeled quantitatively using artificial neural networks (ANNs), where the input layer corresponds to the expression of the upstream transcription factors (TF-regulators) and the output layer corresponds to the expression of genes within each module. This work (a) shows that the expression of at least 33 genes over time and for different stress conditions is well predicted by the expression of the top-layer transcription factors, including cases in which the effect of upstream regulators is shifted in time, and (b) identifies at least 6 novel regulatory interactions that were not previously associated with stress-induced changes in gene expression.
These findings suggest that the combination of gene expression and protein-DNA interaction data with artificial neural networks can successfully model biological pathways and capture quantitative dependencies between distant regulators and downstream genes.

Keywords: gene modules, artificial neural networks, yeast, stress

Downloads: 1428
2 Analysis of Metallothionein Gene MT1A (rs11076161) and MT2A (rs10636) Polymorphisms as a Molecular Marker in Type 2 Diabetes Mellitus among Malay Population

Authors: Norsakinah Mohammad Osman, Ali Etemad, Patimah Ismail

Abstract:

Type 2 diabetes mellitus (T2DM) is a complex metabolic disorder characterized by high blood glucose resulting from insulin resistance and insulin insufficiency due to deteriorating function of the β-cells of the islets of Langerhans. T2DM is commonly caused by a combination of inherited genetic variations and lifestyle. Metallothionein (MT) is a cysteine-rich protein that helps maintain zinc homeostasis, which is important in insulin signaling and secretion, and that protects the body from reactive oxygen species (ROS). MT scavenges ROS and free radicals, which are among the causes of T2DM and its complications. The objective of this study was to investigate the association of MT1A and MT2A polymorphisms with T2DM by comparing T2DM and control subjects in the Malay population. This study involved 150 T2DM patients and 120 healthy individuals of Malay ethnicity, of mixed gender. Genomic DNA was extracted from buccal cells and amplified at the MT1A and MT2A loci by polymerase chain reaction (PCR), producing bands of 347 bp and 238 bp respectively. The PCR products were digested with the MluCI and Tsp45I restriction enzymes respectively, producing fragment lengths of 158/189/347 bp and 103/135/238 bp. An ANOVA test showed a significant difference between diabetic and control subjects for age, BMI, WHR, SBP, FPG, HbA1c, LDL, TG, TC and family history (P<0.05), whereas HDL, CVD risk ratio and DBP did not show any significant difference (P>0.05). The genotype frequencies of AA, AG and GG for the MT1A polymorphism were 72.7%, 22.7% and 4.7% in cases and 15%, 55% and 30% in controls respectively. For MT2A, the genotype frequencies of GG, GC and CC were 42.7%, 27.3% and 30% in cases and 5%, 40% and 55% in controls respectively. Both polymorphisms showed a significant difference between the two investigated groups (P=0.000).
A post hoc test showed a significant difference between the genotypes within each polymorphism (P=0.000). The MT1A and MT2A polymorphisms are believed to be reliable molecular markers for distinguishing T2DM subjects from healthy individuals in the Malay population.

Keywords: Type 2 Diabetes Mellitus (T2DM), Metallothionein (MT), MT1A (rs11076161), MT2A (rs10636), Malay, Genetic Polymorphism.
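
One common way to compare genotype distributions between cases and controls is a chi-square test of independence on the genotype table. The sketch below uses that test as a swapped-in illustration; the abstract itself reports ANOVA and post hoc tests. The function name is hypothetical, and the counts are an assumption, reconstructed by applying the reported MT1A percentages to 150 cases and 120 controls.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Assumed MT1A genotype counts (AA, AG, GG), reconstructed from percentages.
mt1a_table = [
    [109, 34, 7],   # cases: 72.7%, 22.7%, 4.7% of 150
    [18, 66, 36],   # controls: 15%, 55%, 30% of 120
]

chi2, dof = chi_square_independence(mt1a_table)
# 5.991 is the chi-square critical value for 2 degrees of freedom at 0.05.
print(f"chi2 = {chi2:.2f}, dof = {dof}, significant at 0.05: {chi2 > 5.991}")
```

With these assumed counts the statistic is far above the critical value, which agrees in direction with the significant difference reported in the abstract.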

Downloads: 2278
1 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist of multiple steps including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time-consuming and rely on a large number of parameters that often introduce variability and affect the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning each genome a weight that reflects its influence on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with state-of-the-art methods applied directly to data processed through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.

Keywords: Metagenomics, phenotype prediction, deep learning, embeddings, multiple instance learning.
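
The first half of step (i) in the abstract, generating a vocabulary of k-mers, can be sketched as below. This is a minimal illustration and not the metagenome2vec code: the function names, the overlapping stride-1 tokenization, the value of `k` and the example reads are assumptions, and the embedding learning itself is not shown.

```python
from collections import Counter


def kmers(read, k):
    """Split a read into its overlapping k-mers (stride 1)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_vocabulary(reads, k):
    """Map each observed k-mer to an integer index.

    Indices are assigned by descending frequency, ties broken alphabetically.
    """
    counts = Counter()
    for read in reads:
        counts.update(kmers(read.upper(), k))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {kmer: index for index, (kmer, _) in enumerate(ordered)}


def encode(read, vocabulary, k):
    """Turn a read into a list of k-mer indices, skipping unknown k-mers."""
    return [vocabulary[km] for km in kmers(read.upper(), k) if km in vocabulary]


# Hypothetical reads standing in for sequences from a fastq file.
example_reads = ["ACGTAC", "CGTACG"]
vocabulary = build_vocabulary(example_reads, k=3)
print(vocabulary)
print(encode("ACGTAC", vocabulary, k=3))
```

The integer sequences produced by `encode` are the kind of input on which a word-embedding model could then be trained, in the same way that sentences are tokenized into word indices.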

Downloads: 842