Search results for: Protein sequence database
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1545

Search results for: Protein sequence database

1545 Parallezation Protein Sequence Similarity Algorithms using Remote Method Interface

Authors: Mubarak Saif Mohsen, Zurinahni Zainol, Rosalina Abdul Salam, Wahidah Husain

Abstract:

One of the major problems in genomic field is to perform sequence comparison on DNA and protein sequences. Executing sequence comparison on the DNA and protein data is a computationally intensive task. Sequence comparison is the basic step for all algorithms in protein sequences similarity. Parallel computing is an attractive solution to provide the computational power needed to speedup the lengthy process of the sequence comparison. Our main research is to enhance the protein sequence algorithm using dynamic programming method. In our approach, we parallelize the dynamic programming algorithm using multithreaded program to perform the sequence comparison and also developed a distributed protein database among many PCs using Remote Method Interface (RMI). As a result, we showed how different sizes of protein sequences data and computation of scoring matrix of these protein sequence on different number of processors affected the processing time and speed, as oppose to sequential processing.

Keywords: Protein sequence algorithm, dynamic programming algorithm, multithread

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1857
1544 A System to Integrate and Manipulate Protein Database Using BioPerl and XML

Authors: Zurinahni Zainol, Rosalina Abdul Salam, Rosni Abdullah, Nur'Aini, Wahidah Husain

Abstract:

The size, complexity and number of databases used for protein information have caused bioinformatics to lag behind in adapting to the need to handle this distributed information. Integrating all the information from different databases into one database is a challenging problem. Our main research is to develop a tool which can be used to access and manipulate protein information from difference databases. In our approach, we have integrated difference databases such as Swiss-prot, PDB, Interpro, and EMBL and transformed these databases in flat file format into relational form using XML and Bioperl. As a result, we showed this tool can search different sizes of protein information stored in relational database and the result can be retrieved faster compared to flat file database. A web based user interface is provided to allow user to access or search for protein information in the local database.

Keywords: Protein sequence database, relational database, integrated database.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1395
1543 A Novel Approach for Protein Classification Using Fourier Transform

Authors: A. F. Ali, D. M. Shawky

Abstract:

Discovering new biological knowledge from the highthroughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a new approach for protein classification. Proteins that are evolutionarily- and thereby functionally- related are said to belong to the same classification. Identifying protein classification is of fundamental importance to document the diversity of the known protein universe. It also provides a means to determine the functional roles of newly discovered protein sequences. Our goal is to predict the functional classification of novel protein sequences based on a set of features extracted from each protein sequence. The proposed technique used datasets extracted from the Structural Classification of Proteins (SCOP) database. A set of spectral domain features based on Fast Fourier Transform (FFT) is used. The proposed classifier uses multilayer back propagation (MLBP) neural network for protein classification. The maximum classification accuracy is about 91% when applying the classifier to the full four levels of the SCOP database. However, it reaches a maximum of 96% when limiting the classification to the family level. The classification results reveal that spectral domain contains information that can be used for classification with high accuracy. In addition, the results emphasize that sequence similarity measures are of great importance especially at the family level.

Keywords: Bioinformatics, Artificial Neural Networks, Protein Sequence Analysis, Feature Extraction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2300
1542 Predicting Protein Function using Decision Tree

Authors: Manpreet Singh, Parminder Kaur Wadhwa, Surinder Kaur

Abstract:

The drug discovery process starts with protein identification because proteins are responsible for many functions required for maintenance of life. Protein identification further needs determination of protein function. Proposed method develops a classifier for human protein function prediction. The model uses decision tree for classification process. The protein function is predicted on the basis of matched sequence derived features per each protein function. The research work includes the development of a tool which determines sequence derived features by analyzing different parameters. The other sequence derived features are determined using various web based tools.

Keywords: Sequence Derived Features, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1900
1541 UTMGO: A Tool for Searching a Group of Semantically Related Gene Ontology Terms and Application to Annotation of Anonymous Protein Sequence

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias

Abstract:

Gene Ontology terms have been actively used to annotate various protein sets. SWISS-PROT, TrEMBL, and InterPro are protein databases that are annotated according to the Gene Ontology terms. However, direct implementation of the Gene Ontology terms for annotation of anonymous protein sequences is not easy, especially for species not commonly represented in biological databases. UTMGO is developed as a tool that allows the user to quickly and easily search for a group of semantically related Gene Ontology terms. The applicability of the UTMGO is demonstrated by applying it to annotation of anonymous protein sequence. The extended UTMGO uses the Gene Ontology terms together with protein sequences associated with the terms to perform the annotation task. GOPET, GOtcha, GoFigure, and JAFA are used to compare the performance of the extended UTMGO.

Keywords: Anonymous protein sequence, Gene Ontology, Protein sequence annotation, Protein sequence alignment

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1393
1540 Detecting Remote Protein Evolutionary Relationships via String Scoring Method

Authors: Nazar Zaki, Safaai Deris

Abstract:

The amount of the information being churned out by the field of biology has jumped manifold and now requires the extensive use of computer techniques for the management of this information. The predominance of biological information such as protein sequence similarity in the biological information sea is key information for detecting protein evolutionary relationship. Protein sequence similarity typically implies homology, which in turn may imply structural and functional similarities. In this work, we propose, a learning method for detecting remote protein homology. The proposed method uses a transformation that converts protein sequence into fixed-dimensional representative feature vectors. Each feature vector records the sensitivity of a protein sequence to a set of amino acids substrings generated from the protein sequences of interest. These features are then used in conjunction with support vector machines for the detection of the protein remote homology. The proposed method is tested and evaluated on two different benchmark protein datasets and it-s able to deliver improvements over most of the existing homology detection methods.

Keywords: Protein homology detection; support vectormachine; string kernel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1340
1539 Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures

Authors: Do Phuc, Nguyen Thi Kim Phung

Abstract:

In this paper, we represent protein structure by using graph. A protein structure database will become a graph database. Each graph is represented by a spectral vector. We use Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian representation of adjacency matrix of graph. To measure the similarity between two graphs, we calculate the Euclidean distance between two graph spectral vectors. To cluster the graphs, we use M-tree with the Euclidean distance to cluster spectral vectors. Besides, M-tree can be used for graph searching in graph database. Our proposal method was tested with graph database of 100 graphs representing 100 protein structures downloaded from Protein Data Bank (PDB) and we compare the result with the SCOP hierarchical structure.

Keywords: Eigenvalues, m-tree, graph database, protein structure, spectra graph theory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1595
1538 Comparison of Domain and Hydrophobicity Features for the Prediction of Protein-Protein Interactions using Support Vector Machines

Authors: Hany Alashwal, Safaai Deris, Razib M. Othman

Abstract:

The protein domain structure has been widely used as the most informative sequence feature to computationally predict protein-protein interactions. However, in a recent study, a research group has reported a very high accuracy of 94% using hydrophobicity feature. Therefore, in this study we compare and verify the usefulness of protein domain structure and hydrophobicity properties as the sequence features. Using the Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved accuracy of nearly 80%. Furthermore, domains structure had receiver operating characteristic (ROC) score of 0.8480 with running time of 34 seconds, while hydrophobicity had ROC score of 0.8159 with running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interaction can be predicted from domain structure with reliable accuracy and acceptable running time.

Keywords: Bioinformatics, protein-protein interactions, support vector machines, protein features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1872
1537 An Algebra for Protein Structure Data

Authors: Yanchao Wang, Rajshekhar Sunderraman

Abstract:

This paper presents an algebraic approach to optimize queries in domain-specific database management system for protein structure data. The approach involves the introduction of several protein structure specific algebraic operators to query the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types along with a comprehensive collection of appropriate genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in algebra into low-level query specifications in Protein-QL, a query language designed to query protein structure data. The query transformation process uses a Protein Ontology that serves the purpose of a dictionary.

Keywords: Domain-Specific Data Management, Protein Algebra, Protein Ontology, Protein Structure Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1488
1536 Fast Database Indexing for Large Protein Sequence Collections Using Parallel N-Gram Transformation Algorithm

Authors: Jehad A. H. Hammad, Nur'Aini binti Abdul Rashid

Abstract:

With the rapid development in the field of life sciences and the flooding of genomic information, the need for faster and scalable searching methods has become urgent. One of the approaches that were investigated is indexing. The indexing methods have been categorized into three categories which are the lengthbased index algorithms, transformation-based algorithms and mixed techniques-based algorithms. In this research, we focused on the transformation based methods. We embedded the N-gram method into the transformation-based method to build an inverted index table. We then applied the parallel methods to speed up the index building time and to reduce the overall retrieval time when querying the genomic database. Our experiments show that the use of N-Gram transformation algorithm is an economical solution; it saves time and space too. The result shows that the size of the index is smaller than the size of the dataset when the size of N-Gram is 5 and 6. The parallel N-Gram transformation algorithm-s results indicate that the uses of parallel programming with large dataset are promising which can be improved further.

Keywords: Biological sequence, Database index, N-gram indexing, Parallel computing, Sequence retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2078
1535 An Improved Fast Search Method Using Histogram Features for DNA Sequence Database

Authors: Qiu Chen, Feifei Lee, Koji Kotani, Tadahiro Ohmi

Abstract:

In this paper, we propose an efficient hierarchical DNA sequence search method to improve the search speed while the accuracy is being kept constant. For a given query DNA sequence, firstly, a fast local search method using histogram features is used as a filtering mechanism before scanning the sequences in the database. An overlapping processing is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity will be excluded for latter searching. The Smith-Waterman algorithm is then applied to each remainder sequences. Experimental results using GenBank sequence data show the proposed method combining histogram information and Smith-Waterman algorithm is more efficient for DNA sequence search.

Keywords: Fast search, DNA sequence, Histogram feature, Smith-Waterman algorithm, Local search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1279
1534 Database Development and Discrimination Algorithms for Membrane Protein Functions

Authors: M. Michael Gromiha, Y. Yabuki, K. Imai, P. Horton, K. Fukui

Abstract:

We have developed a database for membrane protein functions, which has more than 3000 experimental data on functionally important amino acid residues in membrane proteins along with sequence, structure and literature information. Further, we have proposed different methods for identifying membrane proteins based on their functions: (i) discrimination of membrane transport proteins from other globular and membrane proteins and classifying them into channels/pores, electrochemical and active transporters, and (ii) β-signal for the insertion of mitochondrial β-barrel outer membrane proteins and potential targets. Our method showed an accuracy of 82% in discriminating transport proteins and 68% to classify them into three different transporters. In addition, we have identified a motif for targeting β-signal and potential candidates for mitochondrial β-barrel membrane proteins. Our methods can be used as effective tools for genome-wide annotations.

Keywords: Membrane proteins, database, transporters, discrimination, β-signal.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1523
1533 Introducing Sequence-Order Constraint into Prediction of Protein Binding Sites with Automatically Extracted Templates

Authors: Yi-Zhong Weng, Chien-Kang Huang, Yu-Feng Huang, Chi-Yuan Yu, Darby Tien-Hao Chang

Abstract:

Search for a tertiary substructure that geometrically matches the 3D pattern of the binding site of a well-studied protein provides a solution to predict protein functions. In our previous work, a web server has been built to predict protein-ligand binding sites based on automatically extracted templates. However, a drawback of such templates is that the web server was prone to resulting in many false positive matches. In this study, we present a sequence-order constraint to reduce the false positive matches of using automatically extracted templates to predict protein-ligand binding sites. The binding site predictor comprises i) an automatically constructed template library and ii) a local structure alignment algorithm for querying the library. The sequence-order constraint is employed to identify the inconsistency between the local regions of the query protein and the templates. Experimental results reveal that the sequence-order constraint can largely reduce the false positive matches and is effective for template-based binding site prediction.

Keywords: Protein structure, binding site, functional prediction

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1409
1532 Computational Method for Annotation of Protein Sequence According to Gene Ontology Terms

Authors: Razib M. Othman, Safaai Deris, Rosli M. Illias

Abstract:

Annotation of a protein sequence is pivotal for the understanding of its function. Accuracy of manual annotation provided by curators is still questionable by having lesser evidence strength and yet a hard task and time consuming. A number of computational methods including tools have been developed to tackle this challenging task. However, they require high-cost hardware, are difficult to be setup by the bioscientists, or depend on time intensive and blind sequence similarity search like Basic Local Alignment Search Tool. This paper introduces a new method of assigning highly correlated Gene Ontology terms of annotated protein sequences to partially annotated or newly discovered protein sequences. This method is fully based on Gene Ontology data and annotations. Two problems had been identified to achieve this method. The first problem relates to splitting the single monolithic Gene Ontology RDF/XML file into a set of smaller files that can be easy to assess and process. Thus, these files can be enriched with protein sequences and Inferred from Electronic Annotation evidence associations. The second problem involves searching for a set of semantically similar Gene Ontology terms to a given query. The details of macro and micro problems involved and their solutions including objective of this study are described. This paper also describes the protein sequence annotation and the Gene Ontology. The methodology of this study and Gene Ontology based protein sequence annotation tool namely extended UTMGO is presented. Furthermore, its basic version which is a Gene Ontology browser that is based on semantic similarity search is also introduced.

Keywords: automatic clustering, bioinformatics tool, gene ontology, protein sequence annotation, semantic similarity search

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3084
1531 Predicting Protein-Protein Interactions from Protein Sequences Using Phylogenetic Profiles

Authors: Omer Nebil Yaveroglu, Tolga Can

Abstract:

In this study, a high accuracy protein-protein interaction prediction method is developed. The importance of the proposed method is that it only uses sequence information of proteins while predicting interaction. The method extracts phylogenetic profiles of proteins by using their sequence information. Combining the phylogenetic profiles of two proteins by checking existence of homologs in different species and fitting this combined profile into a statistical model, it is possible to make predictions about the interaction status of two proteins. For this purpose, we apply a collection of pattern recognition techniques on the dataset of combined phylogenetic profiles of protein pairs. Support Vector Machines, Feature Extraction using ReliefF, Naive Bayes Classification, K-Nearest Neighborhood Classification, Decision Trees, and Random Forest Classification are the methods we applied for finding the classification method that best predicts the interaction status of protein pairs. Random Forest Classification outperformed all other methods with a prediction accuracy of 76.93%

Keywords: Protein Interaction Prediction, Phylogenetic Profile, SVM , ReliefF, Decision Trees, Random Forest Classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1566
1530 Genome-Wide Analysis of BES1/BZR1 Gene Family in Five Plant Species

Authors: Jafar Ahmadi, Zhohreh Asiaban, Sedigheh Fabriki Ourang

Abstract:

Brassinosteroids (BRs) regulate cell elongation, vascular differentiation, senescence, and stress responses. BRs signal through the BES1/BZR1 family of transcription factors, which regulate hundreds of target genes involved in this pathway. In this research a comprehensive genome-wide analysis was carried out in BES1/BZR1 gene family in Arabidopsis thaliana, Cucumis sativus, Vitis vinifera, Glycin max and Brachypodium distachyon. Specifications of the desired sequences, dot plot and hydropathy plot were analyzed in the protein and genome sequences of five plant species. The maximum amino acid length was attributed to protein sequence Brdic3g with 374aa and the minimum amino acid length was attributed to protein sequence Gm7g with 163aa. The maximum Instability index was attributed to protein sequence AT1G19350 equal with 79.99 and the minimum Instability index was attributed to protein sequence Gm5g equal with 33.22. Aliphatic index of these protein sequences ranged from 47.82 to 78.79 in Arabidopsis thaliana, 49.91 to 57.50 in Vitis vinifera, 55.09 to 82.43 in Glycin max, 54.09 to 54.28 in Brachypodium distachyon 55.36 to 56.83 in Cucumis sativus. Overall, data obtained from our investigation contributes a better understanding of the complexity of the BES1/BZR1 gene family and provides the first step towards directing future experimental designs to perform systematic analysis of the functions of the BES1/BZR1 gene family.

Keywords: BES1/BZR1, Brassinosteroids, Phylogenetic analysis, Transcription factor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2204
1529 Connectivity Characteristic of Transcription Factor

Authors: T. Mahalakshmi, Aswathi B. L., Achuthsankar S. Nair

Abstract:

Transcription factors are a group of proteins that helps for interpreting the genetic information in DNA. Protein-protein interactions play a major role in the execution of key biological functions of a cell. These interactions are represented in the form of a graph with nodes and edges. Studies have showed that some nodes have high degree of connectivity and such nodes, known as hub nodes, are the inevitable parts of the network. In the present paper a method is proposed to identify hub transcription factor proteins using sequence information. On a complete data set of transcription factor proteins available from the APID database, the proposed method showed an accuracy of 77%, sensitivity of 79% and specificity of 76%.

Keywords: Transcription Factor Proteins, Hub Proteins, Shannon Index, Transfer Free Energy to Surface (TFES).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1341
1528 Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms

Authors: Chia-Ta Tsai, Wen-Lin Huang, Shinn-Jang Ho, Li-Sun Shu, Shinn-Ying Ho

Abstract:

Prediction of bacterial virulent protein sequences can give assistance to identification and characterization of novel virulence-associated factors and discover drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation which describes functions of genes and gene products as a controlled vocabulary of terms has been shown effectively for a variety of tasks such as gene expression study, GO annotation prediction, protein subcellular localization, etc. In this study, we propose a sequence-based method Virulent-GO by mining informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologies with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO using the single kind of GO term features with an accuracy of 82.5% is slightly better than VirulentPred with 81.8% using five kinds of sequence-based features. For the evaluation of independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When evaluating single kind of feature with SVM, the GO term feature performs much well, compared with each of the five kinds of features.

Keywords: Bacterial virulence factors, GO terms, prediction, protein sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2138
1527 Cloning, Expression and Protein Purification of AV1 Gene of Okra Leaf Curl Virus Egyptian Isolate and Genetic Diversity between Whitefly and Different Plant Hosts

Authors: Dalia. G. Aseel

Abstract:

Begomoviruses are economically important plant viruses that infect dicotyledonous plants and exclusively transmitted by the whitefly Bemisia tabaci. Here, replicative form was isolated from Okra, Cotton, Tomato plants and whitefly infected with Begomoviruses. Using coat protein specific primers (AV1), the viral infection was verified with amplicon at 450 bp. The sequence of OLCuV-AV1 gene was recorded and received an accession number (FJ441605) from Genebank. The phylogenetic tree of OLCuV was closely related to Okra leaf curl virus previously isolated from Cameroon and USA with nucleotide sequence identity of 92%. The protein purification was carried out using His-Tag methodology by using Affinity Chromatography. The purified protein was separated on SDS-PAGE analysis and an enriched expected size of band at 30 kDa was observed. Furthermore, RAPD and SDS-PAGE were used to detect genetic variability between different hosts of okra leaf curl virus (OLCuV), cotton leaf curl virus (CLCuV), tomato yellow leaf curl virus (TYLCuV) and the whitefly vector. Finally, the present study would help to understand the relationship between the whitefly and different economical crops in Egypt.

Keywords: Begomovirus, AV1 gene, sequence, cloning, whitefly, okra, cotton, tomato, RAPD, phylogenetic tree and SDS-PAGE.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 824
1526 Protein Graph Partitioning by Mutually Maximization of cycle-distributions

Authors: Frank Emmert Streib

Abstract:

The classification of the protein structure is commonly not performed for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step to a protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph theoretical algorithm. We represent the protein structure as an undirected, unweighted and unlabeled graph which nodes correspond the secondary structure elements of the protein. This graph is call the protein graph. The domains are then identified as partitions of the graph corresponding to vertices sets obtained by the maximization of an objective function, which mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm does not utilize any other kind of information besides the cycle-distribution to find the partitions. If a partition is found, the algorithm is iteratively applied to each of the resulting subgraphs. As stop criterion, we calculate numerically a significance level which indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, our algorithm terminates automatically its iterative application. We present results for one and two domain proteins and compare our results with the manually assigned domains by the SCOP database and differences are discussed.

Keywords: Graph partitioning, unweighted graph, protein domains.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1304
1525 Protein-Protein Interaction Detection Based on Substring Sensitivity Measure

Authors: Nazar Zaki, Safaai Deris, Hany Alashwal

Abstract:

Detecting protein-protein interactions is a central problem in computational biology and aberrant such interactions may have implicated in a number of neurological disorders. As a result, the prediction of protein-protein interactions has recently received considerable attention from biologist around the globe. Computational tools that are capable of effectively identifying protein-protein interactions are much needed. In this paper, we propose a method to detect protein-protein interaction based on substring similarity measure. Two protein sequences may interact by the mean of the similarities of the substrings they contain. When applied on the currently available protein-protein interaction data for the yeast Saccharomyces cerevisiae, the proposed method delivered reasonable improvement over the existing ones.

Keywords: Protein-Protein Interaction, support vector machine, feature extraction, pairwise alignment, Smith-Waterman score.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1891
1524 Mining Sequential Patterns Using I-PrefixSpan

Authors: Dhany Saputra, Dayang R. A. Rambli, Oi Mean Foong

Abstract:

In this paper, we propose an improvement of pattern growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use sufficient data structure for Seq-Tree framework and separator database to reduce the execution time and memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after index set is constructed. The experimental result shows that using Java 2, this method improves the speed of PrefixSpan up to almost two orders of magnitude as well as the memory usage to more than one order of magnitude.

Keywords: ArrayList, ArrayIntList, minimum support, sequence database, sequential patterns.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1524
1523 PIELG: A Protein Interaction Extraction Systemusing a Link Grammar Parser from Biomedical Abstracts

Authors: Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, Yasser M. Kadah

Abstract:

Due to the ever growing amount of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction System using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses linkage given by the Link Grammar Parser to start a case based analysis of contents of various syntactic roles as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verbs patterns to overcome preposition combinations problems. The recall and precision are 74.4% and 62.65%, respectively. Experimental evaluations with two other state-of-the-art extraction systems indicate that PIELG system achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The result shows that the performance is remarkably promising.

Keywords: Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2194
1522 Multiple Sequence Alignment Using Optimization Algorithms

Authors: M. F. Omar, R. A. Salam, R. Abdullah, N. A. Rashid

Abstract:

Proteins or genes that have similar sequences are likely to perform the same function. One of the most widely used techniques for sequence comparison is sequence alignment. Sequence alignment allows mismatches and insertion/deletion, which represents biological mutations. Sequence alignment is usually performed only on two sequences. Multiple sequence alignment, is a natural extension of two-sequence alignment. In multiple sequence alignment, the emphasis is to find optimal alignment for a group of sequences. Several applicable techniques were observed in this research, from traditional method such as dynamic programming to the extend of widely used stochastic optimization method such as Genetic Algorithms (GAs) and Simulated Annealing. A framework with combination of Genetic Algorithm and Simulated Annealing is presented to solve Multiple Sequence Alignment problem. The Genetic Algorithm phase will try to find new region of solution while Simulated Annealing can be considered as an alignment improver for any near optimal solution produced by GAs.

Keywords: Simulated annealing, genetic algorithm, sequence alignment, multiple sequence alignment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2361
1521 Analytical Modeling of Globular Protein-Ferritin in α-Helical Conformation: A White Noise Functional Approach

Authors: Vernie C. Convicto, Henry P. Aringa, Wilson I. Barredo

Abstract:

This study presents a conformational model of the helical structures of globular protein particularly ferritin in the framework of white noise path integral formulation by using Associated Legendre functions, Bessel and convolution of Bessel and trigonometric functions as modulating functions. The model incorporates chirality features of proteins and their helix-turn-helix sequence structural motif.

Keywords: Globular protein, modulating function, white noise, winding probability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1901
1520 Expression of Tissue Plasminogen Activator in Transgenic Tobacco Plants by Signal Peptides Targeting for Delivery to Apoplast, Endoplasmic Reticulum and Cytosol Spaces

Authors: Sadegh Lotfieblisofla, Arash Khodabakhshi

Abstract:

Tissue plasminogen activator (tPA) as a serine protease plays an important role in the fibrinolytic system and the dissolution of fibrin clots in human body. The production of this drug in plants such as tobacco could reduce its production costs. In this study, expression of tPA gene and protein targeting to different plant cell compartments, using various signal peptides has been investigated. For high level of expression, Kozak sequence was used after CaMV35S in the beginning of the gene. In order to design the final construction, Extensin, KDEL (amino acid sequence including Lys-Asp-Glu-Leu) and SP (γ-zein signal peptide coding sequence) were used as leader signals to conduct this protein into apoplast, endoplasmic reticulum and cytosol spaces, respectively. Cloned human tPA gene under the CaMV (Cauliflower mosaic virus) 35S promoter and NOS (Nopaline Synthase) terminator into pBI121 plasmid was transferred into tobacco explants by Agrobacterium tumefaciens strain LBA4404. The presence and copy number of genes in transgenic tobacco was proved by Southern blotting. Enzymatic activity of the rt-PA protein in transgenic plants compared to non-transgenic plants was confirmed by Zymography assay. The presence and amount of rt-PA recombinant protein in plants was estimated by ELISA analysis on crude protein extract of transgenic tobacco using a specific antibody. The yield of recombinant tPA in transgenic tobacco for SP, KDEL, Extensin signals were counted 0.50, 0.68, 0.69 microgram per milligram of total soluble proteins.

Keywords: Recombinant tissue plasminogen activator, plant cell comportment, leader signals, transgenic tobacco.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 652
1519 Observation of the Correlations between Pair Wise Interaction and Functional Organization of the Proteins, in the Protein Interaction Network of Saccaromyces Cerevisiae

Authors: N. Tuncbag, T. Haliloglu, O. Keskin

Abstract:

Understanding the cell's large-scale organization is an interesting task in computational biology. Thus, protein-protein interactions can reveal important organization and function of the cell. Here, we investigated the correspondence between protein interactions and function for the yeast. We obtained the correlations among the set of proteins. Then these correlations are clustered using both the hierarchical and biclustering methods. The detailed analyses of proteins in each cluster were carried out by making use of their functional annotations. As a result, we found that some functional classes appear together in almost all biclusters. On the other hand, in hierarchical clustering, the dominancy of one functional class is observed. In brief, from interaction data to function, some correlated results are noticed about the relationship between interaction and function which might give clues about the organization of the proteins.

Keywords: Pair-wise protein interactions, DIP database, functional correlations, biclustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1651
1518 Investigations of Protein Aggregation Using Sequence and Structure Based Features

Authors: M. Michael Gromiha, A. Mary Thangakani, Sandeep Kumar, D. Velmurugan

Abstract:

The main cause of several neurodegenerative diseases such as Alzhemier, Parkinson and spongiform encephalopathies is formation of amyloid fibrils and plaques in proteins. We have analyzed different sets of proteins and peptides to understand the influence of sequence based features on protein aggregation process. The comparison of 373 pairs of homologous mesophilic and thermophilic proteins showed that aggregation prone regions (APRs) are present in both. But, the thermophilic protein monomers show greater ability to ‘stow away’ the APRs in their hydrophobic cores and protect them from solvent exposure. The comparison of amyloid forming and amorphous b-aggregating hexapeptides suggested distinct preferences for specific residues at the six positions as well as all possible combinations of nine residue pairs. The compositions of residues at different positions and residue pairs have been converted into energy potentials and utilized for distinguishing between amyloid forming and amorphous b-aggregating peptides. Our method could correctly identify the amyloid forming peptides at an accuracy of 95-100% in different datasets of peptides.

Keywords: Aggregation prone regions, amyloids, thermophilic proteins, amino acid residues, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1449
1517 Computational Analysis of Potential Inhibitors Selected Based On Structural Similarity for the Src SH2 Domain

Authors: W. P. Hu, J. V. Kumar, Jeffrey J. P. Tsai

Abstract:

The inhibition of SH2 domain regulated protein-protein interactions is an attractive target for developing an effective chemotherapeutic approach in the treatment of disease. Molecular simulation is a useful tool for developing new drugs and for studying molecular recognition. In this study, we searched potential drug compounds for the inhibition of SH2 domain by performing structural similarity search in PubChem Compound Database. A total of 37 compounds were screened from the database, and then we used the LibDock docking program to evaluate the inhibition effect. The best three compounds (AP22408, CID 71463546 and CID 9917321) were chosen for MD simulations after the LibDock docking. Our results show that the compound CID 9917321 can produce a more stable protein-ligand complex compared to other two currently known inhibitors of Src SH2 domain. The compound CID 9917321 may be useful for the inhibition of SH2 domain based on these computational results. Subsequently experiments are needed to verify the effect of compound CID 9917321 on the SH2 domain in the future studies.

Keywords: Nonpeptide inhibitor, Src SH2 domain, LibDock, molecular dynamics simulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2027
1516 Multiple Sequence Alignment Using Three- Dimensional Fragments

Authors: Layal Al Ait, Eduardo Corel, Kifah Tout, Burkhard Morgenstern

Abstract:

Background: Dialign is a DNA/Protein alignment tool for performing pairwise and multiple pairwise alignments through the comparison of gap-free segments (fragments) between sequence pairs. An alignment of two sequences is a chain of fragments, i.e local gap-free pairwise alignments, with the highest total score. METHOD: A new approach is defined in this article which relies on the concept of using three-dimensional fragments – i.e. local threeway alignments -- in the alignment process instead of twodimensional ones. These three-dimensional fragments are gap-free alignments constituting of equal-length segments belonging to three distinct sequences. RESULTS: The obtained results showed good improvments over the performance of DIALIGN.

Keywords: DIALIGN, Multiple sequence alignment, Threedimensional fragments.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1494