Search results for: protein sequence.
876 Parallezation Protein Sequence Similarity Algorithms using Remote Method Interface
Authors: Mubarak Saif Mohsen, Zurinahni Zainol, Rosalina Abdul Salam, Wahidah Husain
Abstract:
One of the major problems in the genomic field is performing sequence comparison on DNA and protein sequences. Executing sequence comparison on DNA and protein data is a computationally intensive task. Sequence comparison is the basic step for all algorithms in protein sequence similarity. Parallel computing is an attractive solution to provide the computational power needed to speed up the lengthy process of sequence comparison. Our main research is to enhance the protein sequence algorithm using the dynamic programming method. In our approach, we parallelize the dynamic programming algorithm using a multithreaded program to perform the sequence comparison, and we also developed a distributed protein database among many PCs using Remote Method Interface (RMI). As a result, we showed how different sizes of protein sequence data and computation of the scoring matrix of these protein sequences on different numbers of processors affected the processing time and speed, as opposed to sequential processing.
Keywords: Protein sequence algorithm, dynamic programming algorithm, multithread
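As an illustration of the idea in this abstract, the following is a minimal sketch of a dynamic programming sequence comparison fanned out over a thread pool. The abstract does not name the exact algorithm or scoring scheme, so the Needleman-Wunsch global alignment score, the match/mismatch/gap values, and the function names `global_alignment_score` and `score_database` are all assumptions made for this example; the RMI-based distributed database is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, no traceback).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    # prev[j] is the best score for aligning the residues of `a` seen so far
    # with the first j residues of `b`; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: a[:i] aligned against gaps only
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def score_database(query, database, workers=4):
    """Score `query` against every sequence in `database` using a thread pool.

    Results are returned in the same order as `database`.
    """
    score = partial(global_alignment_score, query)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, database))


if __name__ == "__main__":
    print(score_database("AC", ["AC", "A", "G"]))
```

Note that in CPython, threads do not run CPU-bound Python code in parallel, so this sketch shows only how the work is partitioned; a process pool or a compiled implementation would be needed to observe the kind of speedup the abstract discusses.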
875 Predicting Protein Function using Decision Tree
Authors: Manpreet Singh, Parminder Kaur Wadhwa, Surinder Kaur
Abstract:
The drug discovery process starts with protein identification because proteins are responsible for many functions required for the maintenance of life. Protein identification in turn requires determination of protein function. The proposed method develops a classifier for human protein function prediction. The model uses a decision tree for the classification process. The protein function is predicted on the basis of matched sequence derived features for each protein function. The research work includes the development of a tool which determines sequence derived features by analyzing different parameters. The other sequence derived features are determined using various web based tools.
Keywords: Sequence Derived Features, decision tree.
874 UTMGO: A Tool for Searching a Group of Semantically Related Gene Ontology Terms and Application to Annotation of Anonymous Protein Sequence
Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias
Abstract:
Gene Ontology terms have been actively used to annotate various protein sets. SWISS-PROT, TrEMBL, and InterPro are protein databases that are annotated according to the Gene Ontology terms. However, direct implementation of the Gene Ontology terms for annotation of anonymous protein sequences is not easy, especially for species not commonly represented in biological databases. UTMGO is developed as a tool that allows the user to quickly and easily search for a group of semantically related Gene Ontology terms. The applicability of UTMGO is demonstrated by applying it to the annotation of anonymous protein sequences. The extended UTMGO uses the Gene Ontology terms together with protein sequences associated with the terms to perform the annotation task. GOPET, GOtcha, GoFigure, and JAFA are used to compare the performance of the extended UTMGO.
Keywords: Anonymous protein sequence, Gene Ontology, Protein sequence annotation, Protein sequence alignment
873 Detecting Remote Protein Evolutionary Relationships via String Scoring Method
Authors: Nazar Zaki, Safaai Deris
Abstract:
The amount of information being churned out by the field of biology has jumped manifold and now requires the extensive use of computer techniques for its management. Within this sea of biological information, protein sequence similarity is key information for detecting protein evolutionary relationships. Protein sequence similarity typically implies homology, which in turn may imply structural and functional similarities. In this work, we propose a learning method for detecting remote protein homology. The proposed method uses a transformation that converts protein sequences into fixed-dimensional representative feature vectors. Each feature vector records the sensitivity of a protein sequence to a set of amino acid substrings generated from the protein sequences of interest. These features are then used in conjunction with support vector machines for the detection of remote protein homology. The proposed method is tested and evaluated on two different benchmark protein datasets, and it is able to deliver improvements over most of the existing homology detection methods.
Keywords: Protein homology detection; support vector machine; string kernel.
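To make the notion of substring-based features concrete, here is a minimal sketch of a k-mer spectrum kernel, one of the simplest string kernels. It is a stand-in chosen for brevity and is not the string scoring method of this paper: the substring length `k`, the raw counting scheme, and the function names are assumptions for illustration only.

```python
from collections import Counter


def kmer_counts(seq, k=3):
    """Count every overlapping substring of length k in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Dot product of the k-mer count vectors of two sequences.

    A larger value means the sequences share more length-k substrings.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers absent from `b`, so only shared k-mers add.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    print(spectrum_kernel("ABCABC", "ABC"))
```

A kernel value of this kind can be passed to a support vector machine as a precomputed similarity between sequence pairs.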
872 Comparison of Domain and Hydrophobicity Features for the Prediction of Protein-Protein Interactions using Support Vector Machines
Authors: Hany Alashwal, Safaai Deris, Razib M. Othman
Abstract:
The protein domain structure has been widely used as the most informative sequence feature to computationally predict protein-protein interactions. However, in a recent study, a research group has reported a very high accuracy of 94% using the hydrophobicity feature. Therefore, in this study we compare and verify the usefulness of protein domain structure and hydrophobicity properties as the sequence features. Using Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved an accuracy of nearly 80%. Furthermore, domain structure had a receiver operating characteristic (ROC) score of 0.8480 with a running time of 34 seconds, while hydrophobicity had a ROC score of 0.8159 with a running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interaction can be predicted from domain structure with reliable accuracy and acceptable running time.
Keywords: Bioinformatics, protein-protein interactions, support vector machines, protein features.
871 Introducing Sequence-Order Constraint into Prediction of Protein Binding Sites with Automatically Extracted Templates
Authors: Yi-Zhong Weng, Chien-Kang Huang, Yu-Feng Huang, Chi-Yuan Yu, Darby Tien-Hao Chang
Abstract:
Searching for a tertiary substructure that geometrically matches the 3D pattern of the binding site of a well-studied protein provides a way to predict protein functions. In our previous work, a web server was built to predict protein-ligand binding sites based on automatically extracted templates. However, a drawback of such templates is that the web server was prone to producing many false positive matches. In this study, we present a sequence-order constraint to reduce the false positive matches that arise when using automatically extracted templates to predict protein-ligand binding sites. The binding site predictor comprises i) an automatically constructed template library and ii) a local structure alignment algorithm for querying the library. The sequence-order constraint is employed to identify the inconsistency between the local regions of the query protein and the templates. Experimental results reveal that the sequence-order constraint can largely reduce the false positive matches and is effective for template-based binding site prediction.
Keywords: Protein structure, binding site, functional prediction
870 A Novel Approach for Protein Classification Using Fourier Transform
Authors: A. F. Ali, D. M. Shawky
Abstract:
Discovering new biological knowledge from high-throughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily, and thereby functionally, related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on the Fast Fourier Transform (FFT) is used. The proposed classifier uses a multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that the spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance, especially at the family level.
Keywords: Bioinformatics, Artificial Neural Networks, Protein Sequence Analysis, Feature Extraction.
869 Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms
Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias
Abstract:
Annotation of a protein sequence is pivotal for the understanding of its function. The accuracy of manual annotation provided by curators is still questionable because of its lesser evidence strength, and manual annotation is also a hard and time-consuming task. A number of computational methods, including tools, have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult for bioscientists to set up, or depend on time-intensive and blind sequence similarity searches such as the Basic Local Alignment Search Tool. This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is fully based on Gene Ontology data and annotations. Two problems were identified in achieving this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that are easy to assess and process. These files can then be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second problem involves searching for a set of Gene Ontology terms that are semantically similar to a given query. The details of the macro and micro problems involved and their solutions, including the objective of this study, are described. This paper also describes protein sequence annotation and the Gene Ontology. The methodology of this study and a Gene Ontology based protein sequence annotation tool, namely extended UTMGO, are presented. Furthermore, its basic version, which is a Gene Ontology browser based on semantic similarity search, is also introduced.
Keywords: automatic clustering, bioinformatics tool, gene ontology, protein sequence annotation, semantic similarity search
868 Predicting Protein-Protein Interactions from Protein Sequences Using Phylogenetic Profiles
Authors: Omer Nebil Yaveroglu, Tolga Can
Abstract:
In this study, a high accuracy protein-protein interaction prediction method is developed. The importance of the proposed method is that it only uses sequence information of proteins while predicting interaction. The method extracts phylogenetic profiles of proteins by using their sequence information. By combining the phylogenetic profiles of two proteins, checking the existence of homologs in different species, and fitting this combined profile into a statistical model, it is possible to make predictions about the interaction status of two proteins. For this purpose, we apply a collection of pattern recognition techniques on the dataset of combined phylogenetic profiles of protein pairs. Support Vector Machines, Feature Extraction using ReliefF, Naive Bayes Classification, K-Nearest Neighborhood Classification, Decision Trees, and Random Forest Classification are the methods we applied for finding the classification method that best predicts the interaction status of protein pairs. Random Forest Classification outperformed all other methods with a prediction accuracy of 76.93%.
Keywords: Protein Interaction Prediction, Phylogenetic Profile, SVM, ReliefF, Decision Trees, Random Forest Classification
867 Genome-Wide Analysis of BES1/BZR1 Gene Family in Five Plant Species
Authors: Jafar Ahmadi, Zhohreh Asiaban, Sedigheh Fabriki Ourang
Abstract:
Brassinosteroids (BRs) regulate cell elongation, vascular differentiation, senescence, and stress responses. BRs signal through the BES1/BZR1 family of transcription factors, which regulate hundreds of target genes involved in this pathway. In this research, a comprehensive genome-wide analysis of the BES1/BZR1 gene family was carried out in Arabidopsis thaliana, Cucumis sativus, Vitis vinifera, Glycine max and Brachypodium distachyon. Specifications of the desired sequences, dot plots and hydropathy plots were analyzed in the protein and genome sequences of the five plant species. The maximum amino acid length was found in protein sequence Brdic3g with 374 aa, and the minimum in protein sequence Gm7g with 163 aa. The maximum instability index was found in protein sequence AT1G19350 at 79.99, and the minimum in protein sequence Gm5g at 33.22. The aliphatic index of these protein sequences ranged from 47.82 to 78.79 in Arabidopsis thaliana, 49.91 to 57.50 in Vitis vinifera, 55.09 to 82.43 in Glycine max, 54.09 to 54.28 in Brachypodium distachyon, and 55.36 to 56.83 in Cucumis sativus. Overall, the data obtained from our investigation contribute to a better understanding of the complexity of the BES1/BZR1 gene family and provide a first step towards directing future experimental designs to perform systematic analysis of the functions of the BES1/BZR1 gene family.
Keywords: BES1/BZR1, Brassinosteroids, Phylogenetic analysis, Transcription factor.
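For readers unfamiliar with the aliphatic index reported in this abstract, the sketch below computes it from a one-letter amino acid sequence. The abstract does not state which definition or tool was used, so this assumes the commonly used definition of Ikai (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu); the function name `aliphatic_index` and the toy input are illustrative and are not one of the sequences in the study.

```python
def aliphatic_index(seq):
    """Aliphatic index of a protein sequence given in one-letter code.

    Assumes the Ikai definition:
        AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu))
    where X(.) is the mole percent of each residue in the sequence.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # Toy sequence for illustration only.
    print(round(aliphatic_index("MAVILGG"), 2))
```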
866 Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms
Authors: Chia-Ta Tsai, Wen-Lin Huang, Shinn-Jang Ho, Li-Sun Shu, Shinn-Ying Ho
Abstract:
Prediction of bacterial virulent protein sequences can assist the identification and characterization of novel virulence-associated factors and the discovery of drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation, which describes functions of genes and gene products as a controlled vocabulary of terms, has been shown to be effective for a variety of tasks such as gene expression study, GO annotation prediction, protein subcellular localization, etc. In this study, we propose a sequence-based method, Virulent-GO, which mines informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologs with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO using the single kind of GO term features, with an accuracy of 82.5%, is slightly better than VirulentPred, with 81.8% using five kinds of sequence-based features. In the independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When evaluating a single kind of feature with SVM, the GO term feature performs much better than each of the five kinds of features.
Keywords: Bacterial virulence factors, GO terms, prediction, protein sequence.
865 Cloning, Expression and Protein Purification of AV1 Gene of Okra Leaf Curl Virus Egyptian Isolate and Genetic Diversity between Whitefly and Different Plant Hosts
Authors: Dalia. G. Aseel
Abstract:
Begomoviruses are economically important plant viruses that infect dicotyledonous plants and are exclusively transmitted by the whitefly Bemisia tabaci. Here, the replicative form was isolated from okra, cotton and tomato plants and from whitefly infected with Begomoviruses. Using coat protein specific primers (AV1), the viral infection was verified with an amplicon at 450 bp. The sequence of the OLCuV-AV1 gene was recorded and received an accession number (FJ441605) from GenBank. In the phylogenetic tree, OLCuV was closely related to Okra leaf curl virus previously isolated from Cameroon and USA, with nucleotide sequence identity of 92%. The protein purification was carried out using His-Tag methodology with affinity chromatography. The purified protein was separated by SDS-PAGE, and an enriched band of the expected size at 30 kDa was observed. Furthermore, RAPD and SDS-PAGE were used to detect genetic variability between different hosts of okra leaf curl virus (OLCuV), cotton leaf curl virus (CLCuV), tomato yellow leaf curl virus (TYLCuV) and the whitefly vector. Finally, the present study would help to understand the relationship between the whitefly and different economic crops in Egypt.
Keywords: Begomovirus, AV1 gene, sequence, cloning, whitefly, okra, cotton, tomato, RAPD, phylogenetic tree and SDS-PAGE.
864 Protein-Protein Interaction Detection Based on Substring Sensitivity Measure
Authors: Nazar Zaki, Safaai Deris, Hany Alashwal
Abstract:
Detecting protein-protein interactions is a central problem in computational biology, and aberrant interactions may be implicated in a number of neurological disorders. As a result, the prediction of protein-protein interactions has recently received considerable attention from biologists around the globe. Computational tools that are capable of effectively identifying protein-protein interactions are much needed. In this paper, we propose a method to detect protein-protein interaction based on a substring similarity measure. Two protein sequences may interact by means of the similarities of the substrings they contain. When applied to the currently available protein-protein interaction data for the yeast Saccharomyces cerevisiae, the proposed method delivered reasonable improvement over the existing ones.
Keywords: Protein-Protein Interaction, support vector machine, feature extraction, pairwise alignment, Smith-Waterman score.
863 A System to Integrate and Manipulate Protein Database Using BioPerl and XML
Authors: Zurinahni Zainol, Rosalina Abdul Salam, Rosni Abdullah, Nur'Aini, Wahidah Husain
Abstract:
The size, complexity and number of databases used for protein information have caused bioinformatics to lag behind in adapting to the need to handle this distributed information. Integrating all the information from different databases into one database is a challenging problem. Our main research is to develop a tool which can be used to access and manipulate protein information from different databases. In our approach, we have integrated different databases such as Swiss-Prot, PDB, InterPro, and EMBL and transformed these databases from flat file format into relational form using XML and BioPerl. As a result, we showed that this tool can search different sizes of protein information stored in a relational database and that the result can be retrieved faster compared to a flat file database. A web based user interface is provided to allow users to access or search for protein information in the local database.
Keywords: Protein sequence database, relational database, integrated database.
862 PIELG: A Protein Interaction Extraction System using a Link Grammar Parser from Biomedical Abstracts
Authors: Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, Yasser M. Kadah
Abstract:
Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses the linkage given by the Link Grammar Parser to start a case based analysis of the contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome preposition combination problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that the PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.
Keywords: Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.
861 Multiple Sequence Alignment Using Optimization Algorithms
Authors: M. F. Omar, R. A. Salam, R. Abdullah, N. A. Rashid
Abstract:
Proteins or genes that have similar sequences are likely to perform the same function. One of the most widely used techniques for sequence comparison is sequence alignment. Sequence alignment allows mismatches and insertions/deletions, which represent biological mutations. Sequence alignment is usually performed only on two sequences. Multiple sequence alignment is a natural extension of two-sequence alignment. In multiple sequence alignment, the emphasis is on finding the optimal alignment for a group of sequences. Several applicable techniques were examined in this research, from traditional methods such as dynamic programming to widely used stochastic optimization methods such as Genetic Algorithms (GAs) and Simulated Annealing. A framework combining a Genetic Algorithm and Simulated Annealing is presented to solve the Multiple Sequence Alignment problem. The Genetic Algorithm phase tries to find new regions of the solution space, while Simulated Annealing can be considered an alignment improver for any near optimal solution produced by GAs.
Keywords: Simulated annealing, genetic algorithm, sequence alignment, multiple sequence alignment.
860 Analytical Modeling of Globular Protein-Ferritin in α-Helical Conformation: A White Noise Functional Approach
Authors: Vernie C. Convicto, Henry P. Aringa, Wilson I. Barredo
Abstract:
This study presents a conformational model of the helical structures of globular proteins, particularly ferritin, in the framework of the white noise path integral formulation, using Associated Legendre functions, Bessel functions, and the convolution of Bessel and trigonometric functions as modulating functions. The model incorporates chirality features of proteins and their helix-turn-helix sequence structural motif.
Keywords: Globular protein, modulating function, white noise, winding probability.
859 Expression of Tissue Plasminogen Activator in Transgenic Tobacco Plants by Signal Peptides Targeting for Delivery to Apoplast, Endoplasmic Reticulum and Cytosol Spaces
Authors: Sadegh Lotfieblisofla, Arash Khodabakhshi
Abstract:
Tissue plasminogen activator (tPA), a serine protease, plays an important role in the fibrinolytic system and the dissolution of fibrin clots in the human body. The production of this drug in plants such as tobacco could reduce its production costs. In this study, expression of the tPA gene and protein targeting to different plant cell compartments using various signal peptides has been investigated. For a high level of expression, the Kozak sequence was used after CaMV35S at the beginning of the gene. In order to design the final construct, Extensin, KDEL (amino acid sequence Lys-Asp-Glu-Leu) and SP (γ-zein signal peptide coding sequence) were used as leader signals to direct this protein into the apoplast, endoplasmic reticulum and cytosol spaces, respectively. The human tPA gene, cloned under the CaMV (Cauliflower mosaic virus) 35S promoter and NOS (Nopaline Synthase) terminator into the pBI121 plasmid, was transferred into tobacco explants by Agrobacterium tumefaciens strain LBA4404. The presence and copy number of the genes in transgenic tobacco were confirmed by Southern blotting. Enzymatic activity of the rt-PA protein in transgenic plants compared to non-transgenic plants was confirmed by zymography assay. The presence and amount of rt-PA recombinant protein in plants was estimated by ELISA analysis on crude protein extract of transgenic tobacco using a specific antibody. The yields of recombinant tPA in transgenic tobacco for the SP, KDEL and Extensin signals were 0.50, 0.68 and 0.69 micrograms per milligram of total soluble protein, respectively.
Keywords: Recombinant tissue plasminogen activator, plant cell compartment, leader signals, transgenic tobacco.
858 Investigations of Protein Aggregation Using Sequence and Structure Based Features
Authors: M. Michael Gromiha, A. Mary Thangakani, Sandeep Kumar, D. Velmurugan
Abstract:
The main cause of several neurodegenerative diseases such as Alzheimer's, Parkinson's and spongiform encephalopathies is the formation of amyloid fibrils and plaques in proteins. We have analyzed different sets of proteins and peptides to understand the influence of sequence based features on the protein aggregation process. The comparison of 373 pairs of homologous mesophilic and thermophilic proteins showed that aggregation prone regions (APRs) are present in both. However, the thermophilic protein monomers show greater ability to 'stow away' the APRs in their hydrophobic cores and protect them from solvent exposure. The comparison of amyloid forming and amorphous β-aggregating hexapeptides suggested distinct preferences for specific residues at the six positions as well as all possible combinations of nine residue pairs. The compositions of residues at different positions and residue pairs have been converted into energy potentials and utilized for distinguishing between amyloid forming and amorphous β-aggregating peptides. Our method could correctly identify the amyloid forming peptides with an accuracy of 95-100% in different datasets of peptides.
Keywords: Aggregation prone regions, amyloids, thermophilic proteins, amino acid residues, machine learning.
857 Multiple Sequence Alignment Using Three-Dimensional Fragments
Authors: Layal Al Ait, Eduardo Corel, Kifah Tout, Burkhard Morgenstern
Abstract:
Background: DIALIGN is a DNA/protein alignment tool for performing pairwise and multiple pairwise alignments through the comparison of gap-free segments (fragments) between sequence pairs. An alignment of two sequences is a chain of fragments, i.e. local gap-free pairwise alignments, with the highest total score. Method: A new approach is defined in this article which relies on the concept of using three-dimensional fragments (i.e. local three-way alignments) in the alignment process instead of two-dimensional ones. These three-dimensional fragments are gap-free alignments consisting of equal-length segments belonging to three distinct sequences. Results: The obtained results showed good improvements over the performance of DIALIGN.
Keywords: DIALIGN, Multiple sequence alignment, Three-dimensional fragments.
856 Cloning of a β-Glucosidase Gene (BGL1) from Traditional Starter Yeast Saccharomycopsis fibuligera BMQ 908 and Expression in Pichia pastoris
Authors: Le Thuy Mai, Vu Nguyen Thanh
Abstract:
β-Glucosidase is an important enzyme for the production of ethanol from lignocellulose. With hydrolytic activity on cellooligosaccharides, especially cellobiose, β-glucosidase removes the product inhibitory effect on cellulases and forms fermentable sugars. In this study, the β-glucosidase encoding gene (BGL1) from the traditional starter yeast Saccharomycopsis fibuligera BMQ 908 was cloned and expressed in Pichia pastoris. BGL1 of S. fibuligera BMQ 908 shared 98% nucleotide homology with the closest GenBank sequence (M22475) but was identical in the amino-acid sequences of the catalytic domains. The recombinant plasmid pPICZαA/BGL1, containing the sequence encoding the BGL1 mature protein and the α-factor secretion signal, was constructed and transformed into the methylotrophic yeast P. pastoris by electroporation. The recombinant strain produced a single extracellular protein with a molecular weight of 120 kDa and cellobiase activity of 60 IU/ml. The optimum pH of the recombinant β-glucosidase was 5.0 and the optimum temperature was 50°C.
Keywords: β-Glucosidase, Pichia pastoris, Saccharomycopsis fibuligera, recombinant enzyme.
855 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings
Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank
Abstract:
Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms has rarely exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm's computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RT-RICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].
Keywords: data mining, protein secondary structure prediction, parallelization.
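The Q3 accuracy quoted in this abstract is the percentage of residues whose three-state secondary structure label is predicted correctly. A minimal sketch follows, assuming the usual one-letter state strings (H for helix, E for strand, C for coil); the function name `q3_accuracy` and the example strings are illustrative and are not data from the paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted state matches the observed one.

    Both arguments are equal-length strings over the three states H, E and C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy example: two of the four residues are predicted correctly.
    print(q3_accuracy("HHHH", "HHEE"))
```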
854 Analysis of DNA-Recognizing Enzyme Interaction using Deaminated Lesions
Authors: Seung Pil Pack
Abstract:
Deaminated lesions were produced via nitrosative oxidation of natural nucleobases: uracil (Ura, U) from cytosine (Cyt, C), hypoxanthine (Hyp, H) from adenine (Ade, A), and xanthine (Xan, X) and oxanine (Oxa, O) from guanine (Gua, G). Such damaged nucleobases may induce mutagenic problems, so much attention and effort have been devoted to revealing their mechanisms in vivo or in vitro. In this study, we employed these deaminated lesions as useful probes for the analysis of DNA-binding/recognizing proteins or enzymes. Since purine lesions such as Hyp, Oxa and Xan can be employed as analogues of guanine, their comparative use is informative for analyzing the role of Gua in a DNA sequence in DNA-protein interaction. Several DNA oligomers containing Hyp, Oxa or Xan substituted for Gua were designed to reveal the molecular interaction between DNA and protein. From this approach, we obtained useful information for understanding the molecular mechanisms of DNA-recognizing enzymes, which had not previously been observed using conventional DNA oligomers composed of only natural nucleobases.
Keywords: Deaminated lesion, DNA-protein interaction, DNA-recognizing enzymes
Downloads: 1292
853 An Algebra for Protein Structure Data
Authors: Yanchao Wang, Rajshekhar Sunderraman
Abstract:
This paper presents an algebraic approach to optimizing queries in a domain-specific database management system for protein structure data. The approach introduces several protein-structure-specific algebraic operators to query the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types along with a comprehensive collection of appropriate genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in the algebra into low-level query specifications in Protein-QL, a query language designed to query protein structure data. The query transformation process uses a Protein Ontology that serves as a dictionary.
Keywords: Domain-Specific Data Management, Protein Algebra, Protein Ontology, Protein Structure Data.
Downloads: 1542
852 Comparison of Phylogenetic Trees of Multiple Protein Sequence Alignment Methods
Authors: Khaddouja Boujenfa, Nadia Essoussi, Mohamed Limam
Abstract:
Multiple sequence alignment is a fundamental part of many bioinformatics applications such as phylogenetic analysis. Many alignment methods have been proposed. Each method gives a different result for the same data set and consequently generates a different phylogenetic tree. Hence, the chosen alignment method affects the resulting tree. However, the literature contains no evaluation of multiple alignment methods based on a comparison of their phylogenetic trees. This work evaluates eight aligners: ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN, ProbCons and Align-m, based on the phylogenetic trees (test trees) they produce on a given data set. The Neighbor-Joining method is used to estimate trees. Three criteria, namely dNNI, dRF and Id_Tree, are established to test the ability of the different alignment methods to produce a test tree close to the reference one (true tree). Results show that the method which produces the most accurate alignment gives the test tree nearest to the reference tree. MUSCLE outperforms all aligners with respect to the three criteria and for all datasets, performing particularly well when sequence identities are within 10-20%. It is followed by T-Coffee at lower sequence identity (<10%), Align-m at 20-30% identity, and ClustalX and ProbCons at 30-50% identity. It is also noted that when sequence identities are higher (>30%), the tree scores of all methods become similar.
Keywords: Multiple alignment methods, phylogenetic trees, Neighbor-Joining method, Robinson-Foulds distance.
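The keywords suggest that the dRF criterion is the Robinson-Foulds distance, which counts the non-trivial leaf bipartitions present in one tree but not the other. The abstract does not define it, so the sketch below rests on that assumption; the nested-tuple tree encoding and the function names are illustrative choices, and both trees are assumed to share the same leaf set.

```python
def leaf_sets(tree, out):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    if isinstance(tree, str):          # a leaf is a plain string label
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Return the non-trivial bipartitions of the tree, each represented by
    the side that does NOT contain a fixed anchor leaf."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf on either side, or the whole set).
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Size of the symmetric difference of the two bipartition sets."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical reference and test trees over the same five taxa.
reference = ((("A", "B"), "C"), ("D", "E"))
test_tree = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(reference, test_tree))
```

A distance of zero means the two trees have the same unrooted topology; larger values mean more splits disagree.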
Downloads: 1826
851 Software Evolution Based Sequence Diagrams Merging
Authors: Zine-Eddine Bouras, Abdelouaheb Talai
Abstract:
The need to merge software artifacts seems inherent to modern software development. Distributing development over several teams and breaking tasks into smaller, more manageable pieces are effective means of dealing with this kind of complexity. In each case, the separately developed artifacts need to be assembled as efficiently as possible into a consistent whole in which the parts still function as described. In addition, the earlier changes are introduced into the life cycle, the easier they are for designers to manage. Interaction-based specifications such as UML sequence diagrams have been found effective in this regard. As a result, sequence diagrams can be used not only for capturing system behaviors but also for merging changes in order to create a new version. The objective of this paper is to suggest a new approach to the problem of software merging at the level of sequence diagrams, using the concept of dependence analysis, which formally captures all mappings and differences between elements of sequence diagrams and serves as a key concept for creating a new version of a sequence diagram.
Keywords: System behaviors, sequence diagram merging, dependence analysis, sequence diagram slicing.
Downloads: 1762
850 One-Class Support Vector Machines for Protein-Protein Interactions Prediction
Authors: Hany Alashwal, Safaai Deris, Razib M. Othman
Abstract:
Predicting protein-protein interactions represents a key step in understanding protein functions, because proteins usually work in the context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques treat the task as a binary classification problem. Although it is easy to obtain a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to serve as negative examples. Therefore, in this paper we treat the task as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in the training phase, the one-class SVM achieves an accuracy of about 80%. These results imply that protein-protein interactions can be predicted using a one-class classifier with accuracy comparable to that of binary classifiers that use artificially constructed negative examples.
Keywords: Bioinformatics, Protein-protein interactions, One-Class Support Vector Machines
Downloads: 1988
849 A New Class F2 (M, 0, N)L„ p)F of The Double Difference Sequences of Fuzzy Numbers
Authors: N. Subramanian, C. Murugesan
Abstract:
The double difference sequence space I2 (M, of fuzzy numbers for both 1 < p < ∞ and 0 < p < 1, is introduced. Some general properties of this sequence space are studied. Some inclusion relations involving this sequence space are obtained.
Keywords: Orlicz function, solid space, metric space, completeness
Downloads: 1013
848 An Improved Fast Search Method Using Histogram Features for DNA Sequence Database
Authors: Qiu Chen, Feifei Lee, Koji Kotani, Tadahiro Ohmi
Abstract:
In this paper, we propose an efficient hierarchical DNA sequence search method that improves search speed while keeping accuracy constant. For a given query DNA sequence, a fast local search method using histogram features is first used as a filtering mechanism before scanning the sequences in the database. An overlapping process is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity are thereby excluded from later searching. The Smith-Waterman algorithm is then applied to each remaining sequence. Experimental results using GenBank sequence data show that the proposed method, which combines histogram information with the Smith-Waterman algorithm, is more efficient for DNA sequence search.
Keywords: Fast search, DNA sequence, Histogram feature, Smith-Waterman algorithm, Local search
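The two-stage idea described here (a cheap histogram filter, then Smith-Waterman on the survivors) can be sketched as follows. This is a simplified illustration, not the authors' method: the k-mer histogram, the overlap threshold `min_overlap`, the scoring parameters and all function names are assumptions, and the paper's overlapping local windows are not reproduced.

```python
from collections import Counter


def kmer_histogram(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def histogram_overlap(query: str, target: str, k: int = 3) -> float:
    """Fraction of the query's k-mers that are also present in the target
    (histogram intersection normalised by the query's k-mer count)."""
    hq, ht = kmer_histogram(query, k), kmer_histogram(target, k)
    total = sum(hq.values())
    if total == 0:
        return 0.0
    return sum(min(count, ht[kmer]) for kmer, count in hq.items()) / total


def smith_waterman_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: dict, min_overlap: float = 0.5, k: int = 3):
    """Filter by histogram overlap, then rank the survivors by alignment score."""
    hits = []
    for name, seq in database.items():
        if histogram_overlap(query, seq, k) < min_overlap:
            continue  # excluded before the expensive alignment step
        hits.append((name, smith_waterman_score(query, seq)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; real input would be GenBank records.
database = {"similar": "TTACGTACGTTT", "unrelated": "GGGGGGGGGGGG"}
print(search("ACGTACGT", database))
```

The filter costs time linear in the sequence length, whereas Smith-Waterman is quadratic, so discarding dissimilar sequences early is where the speed-up would come from.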
Downloads: 1328
847 Optimization of Protein Hydrolysate Production Process from Jatropha curcas Cake
Authors: Waraporn Apiwatanapiwat, Pilanee Vaithanomsat, Phanu Somkliang, Taweesiri Malapant
Abstract:
This is the first report of an investigation into optimizing protein hydrolysate production from J. curcas cake. Proximate analysis of the raw material showed 18.98% protein, 5.31% ash, 8.52% moisture and 12.18% lipid. The appropriate protein hydrolysate production process began with grinding the J. curcas cake into small pieces. The ground cake was then suspended in 2.5% sodium hydroxide solution at a solution to J. curcas cake ratio of 80:1 (v/w). The hydrolysis reaction was held at 50 °C in a water bath for 45 minutes. After that, the supernatant (protein hydrolysate) was separated by centrifugation at 8000 × g for 30 minutes. The maximum yield of the resulting protein hydrolysate was 73.27%, with 7.34% moisture, 71.69% total protein, 7.12% lipid and 2.49% ash. The product also dissolved well in water.
Keywords: Production, protein hydrolysate, Jatropha curcas cake, optimization.
Downloads: 1954