Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3718

Search results for: protein structure prediction

3718 Protein Secondary Structure Prediction

Authors: Manpreet Singh, Parvinder Singh Sandhu, Reet Kamal Kaur

Abstract:

Protein structure determination and prediction has been a focal research subject in the field of bioinformatics due to the importance of protein structure in understanding the biological and chemical activities of organisms. The experimental methods used by biotechnologists to determine the structures of proteins demand sophisticated equipment and time. A host of computational methods are developed to predict the location of secondary structure elements in proteins for complementing or creating insights into experimental results. However, prediction accuracies of these methods rarely exceed 70%.

Keywords: Protein, Secondary Structure, Prediction, DNA, RNA.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1116
3717 Protein Secondary Structure Prediction Using Parallelized Rule Induction from Coverings

Authors: Leong Lee, Cyriac Kandoth, Jennifer L. Leopold, Ronald L. Frank

Abstract:

Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms rarely has exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. Although this demonstrated that RT-RICO might be a promising approach for predicting secondary structure, the algorithm-s computational complexity and program running time limited its use. Herein a parallelized implementation of a slightly modified RT-RICO approach is presented. This new version of the algorithm facilitated the testing of a much larger dataset of 396 protein domains [2]. Parallelized RTRICO achieved a Q3 score of 74.6%, which is higher than the consensus prediction accuracy of 72.9% that was achieved for the same test dataset by a combination of four secondary structure prediction methods [2].

Keywords: data mining, protein secondary structure prediction, parallelization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1351
3716 Protein Residue Contact Prediction using Support Vector Machine

Authors: Chan Weng Howe, Mohd Saberi Mohamad

Abstract:

Protein residue contact map is a compact representation of secondary structure of protein. Due to the information hold in the contact map, attentions from researchers in related field were drawn and plenty of works have been done throughout the past decade. Artificial intelligence approaches have been widely adapted in related works such as neural networks, genetic programming, and Hidden Markov model as well as support vector machine. However, the performance of the prediction was not generalized which probably depends on the data used to train and generate the prediction model. This situation shown the importance of the features or information used in affecting the prediction performance. In this research, support vector machine was used to predict protein residue contact map on different combination of features in order to show and analyze the effectiveness of the features.

Keywords: contact map, protein residue contact, support vector machine, protein structure prediction

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1610
3715 Comparison of Domain and Hydrophobicity Features for the Prediction of Protein-Protein Interactions using Support Vector Machines

Authors: Hany Alashwal, Safaai Deris, Razib M. Othman

Abstract:

The protein domain structure has been widely used as the most informative sequence feature to computationally predict protein-protein interactions. However, in a recent study, a research group has reported a very high accuracy of 94% using hydrophobicity feature. Therefore, in this study we compare and verify the usefulness of protein domain structure and hydrophobicity properties as the sequence features. Using the Support Vector Machines (SVM) as the learning system, our results indicate that both features achieved accuracy of nearly 80%. Furthermore, domains structure had receiver operating characteristic (ROC) score of 0.8480 with running time of 34 seconds, while hydrophobicity had ROC score of 0.8159 with running time of 20,571 seconds (5.7 hours). These results indicate that protein-protein interaction can be predicted from domain structure with reliable accuracy and acceptable running time.

Keywords: Bioinformatics, protein-protein interactions, support vector machines, protein features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1704
3714 Introducing Sequence-Order Constraint into Prediction of Protein Binding Sites with Automatically Extracted Templates

Authors: Yi-Zhong Weng, Chien-Kang Huang, Yu-Feng Huang, Chi-Yuan Yu, Darby Tien-Hao Chang

Abstract:

Search for a tertiary substructure that geometrically matches the 3D pattern of the binding site of a well-studied protein provides a solution to predict protein functions. In our previous work, a web server has been built to predict protein-ligand binding sites based on automatically extracted templates. However, a drawback of such templates is that the web server was prone to resulting in many false positive matches. In this study, we present a sequence-order constraint to reduce the false positive matches of using automatically extracted templates to predict protein-ligand binding sites. The binding site predictor comprises i) an automatically constructed template library and ii) a local structure alignment algorithm for querying the library. The sequence-order constraint is employed to identify the inconsistency between the local regions of the query protein and the templates. Experimental results reveal that the sequence-order constraint can largely reduce the false positive matches and is effective for template-based binding site prediction.

Keywords: Protein structure, binding site, functional prediction

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1233
3713 Improving Protein-Protein Interaction Prediction by Using Encoding Strategies and Random Indices

Authors: Essam Al-Daoud

Abstract:

A New features are extracted and compared to improve the prediction of protein-protein interactions. The basic idea is to select and use the best set of features from the Tensor matrices that are produced by the frequency vectors of the protein sequences. Three set of features are compared, the first set is based on the indices that are the most common in the interacting proteins, the second set is based on the indices that tend to be common in the interacting and non-interacting proteins, and the third set is constructed by using random indices. Moreover, three encoding strategies are compared; that are based on the amino asides polarity, structure, and chemical properties. The experimental results indicate that the highest accuracy can be obtained by using random indices with chemical properties encoding strategy and support vector machine.

Keywords: protein-protein interactions, random indices, encoding strategies, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1332
3712 Sequence-based Prediction of Gamma-turn Types using a Physicochemical Property-based Decision Tree Method

Authors: Chyn Liaw, Chun-Wei Tung, Shinn-Jang Ho, Shinn-Ying Ho

Abstract:

The γ-turns play important roles in protein folding and molecular recognition. The prediction and analysis of γ-turn types are important for both protein structure predictions and better understanding the characteristics of different γ-turn types. This study proposed a physicochemical property-based decision tree (PPDT) method to interpretably predict γ-turn types. In addition to the good prediction performance of PPDT, three simple and human interpretable IF-THEN rules are extracted from the decision tree constructed by PPDT. The identified informative physicochemical properties and concise rules provide a simple way for discriminating and understanding γ-turn types.

Keywords: Classification and regression tree (CART), γ-turn, Physicochemical properties, Protein secondary structure.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1315
3711 An Algebra for Protein Structure Data

Authors: Yanchao Wang, Rajshekhar Sunderraman

Abstract:

This paper presents an algebraic approach to optimize queries in domain-specific database management system for protein structure data. The approach involves the introduction of several protein structure specific algebraic operators to query the complex data stored in an object-oriented database system. The Protein Algebra provides an extensible set of high-level Genomic Data Types and Protein Data Types along with a comprehensive collection of appropriate genomic and protein functions. The paper also presents a query translator that converts high-level query specifications in algebra into low-level query specifications in Protein-QL, a query language designed to query protein structure data. The query transformation process uses a Protein Ontology that serves the purpose of a dictionary.

Keywords: Domain-Specific Data Management, Protein Algebra, Protein Ontology, Protein Structure Data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1296
3710 Prediction of Protein Subchloroplast Locations using Random Forests

Authors: Chun-Wei Tung, Chyn Liaw, Shinn-Jang Ho, Shinn-Ying Ho

Abstract:

Protein subchloroplast locations are correlated with its functions. In contrast to the large amount of available protein sequences, the information of their locations and functions is less known. The experiment works for identification of protein locations and functions are costly and time consuming. The accurate prediction of protein subchloroplast locations can accelerate the study of functions of proteins in chloroplast. This study proposes a Random Forest based method, ChloroRF, to predict protein subchloroplast locations using interpretable physicochemical properties. In addition to high prediction accuracy, the ChloroRF is able to select important physicochemical properties. The important physicochemical properties are also analyzed to provide insights into the underlying mechanism.

Keywords: Chloroplast, Physicochemical properties, Proteinlocations, Random Forests.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1403
3709 Protein-Protein Interaction Detection Based on Substring Sensitivity Measure

Authors: Nazar Zaki, Safaai Deris, Hany Alashwal

Abstract:

Detecting protein-protein interactions is a central problem in computational biology and aberrant such interactions may have implicated in a number of neurological disorders. As a result, the prediction of protein-protein interactions has recently received considerable attention from biologist around the globe. Computational tools that are capable of effectively identifying protein-protein interactions are much needed. In this paper, we propose a method to detect protein-protein interaction based on substring similarity measure. Two protein sequences may interact by the mean of the similarities of the substrings they contain. When applied on the currently available protein-protein interaction data for the yeast Saccharomyces cerevisiae, the proposed method delivered reasonable improvement over the existing ones.

Keywords: Protein-Protein Interaction, support vector machine, feature extraction, pairwise alignment, Smith-Waterman score.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1692
3708 Selecting Negative Examples for Protein-Protein Interaction

Authors: Mohammad Shoyaib, M. Abdullah-Al-Wadud, Oksam Chae

Abstract:

Proteomics is one of the largest areas of research for bioinformatics and medical science. An ambitious goal of proteomics is to elucidate the structure, interactions and functions of all proteins within cells and organisms. Predicting Protein-Protein Interaction (PPI) is one of the crucial and decisive problems in current research. Genomic data offer a great opportunity and at the same time a lot of challenges for the identification of these interactions. Many methods have already been proposed in this regard. In case of in-silico identification, most of the methods require both positive and negative examples of protein interaction and the perfection of these examples are very much crucial for the final prediction accuracy. Positive examples are relatively easy to obtain from well known databases. But the generation of negative examples is not a trivial task. Current PPI identification methods generate negative examples based on some assumptions, which are likely to affect their prediction accuracy. Hence, if more reliable negative examples are used, the PPI prediction methods may achieve even more accuracy. Focusing on this issue, a graph based negative example generation method is proposed, which is simple and more accurate than the existing approaches. An interaction graph of the protein sequences is created. The basic assumption is that the longer the shortest path between two protein-sequences in the interaction graph, the less is the possibility of their interaction. A well established PPI detection algorithm is employed with our negative examples and in most cases it increases the accuracy more than 10% in comparison with the negative pair selection method in that paper.

Keywords: Interaction graph, Negative training data, Protein-Protein interaction, Support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1430
3707 Detecting Community Structure in Amino Acid Interaction Networks

Authors: Omar GACI, Stefan BALEV, Antoine DUTOT

Abstract:

In this paper we introduce the notion of protein interaction network. This is a graph whose vertices are the protein-s amino acids and whose edges are the interactions between them. Using a graph theory approach, we observe that according to their structural roles, the nodes interact differently. By leading a community structure detection, we confirm this specific behavior and describe thecommunities composition to finally propose a new approach to fold a protein interaction network.

Keywords: interaction network, protein structure, community structure detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1284
3706 Predicting Protein-Protein Interactions from Protein Sequences Using Phylogenetic Profiles

Authors: Omer Nebil Yaveroglu, Tolga Can

Abstract:

In this study, a high accuracy protein-protein interaction prediction method is developed. The importance of the proposed method is that it only uses sequence information of proteins while predicting interaction. The method extracts phylogenetic profiles of proteins by using their sequence information. Combining the phylogenetic profiles of two proteins by checking existence of homologs in different species and fitting this combined profile into a statistical model, it is possible to make predictions about the interaction status of two proteins. For this purpose, we apply a collection of pattern recognition techniques on the dataset of combined phylogenetic profiles of protein pairs. Support Vector Machines, Feature Extraction using ReliefF, Naive Bayes Classification, K-Nearest Neighborhood Classification, Decision Trees, and Random Forest Classification are the methods we applied for finding the classification method that best predicts the interaction status of protein pairs. Random Forest Classification outperformed all other methods with a prediction accuracy of 76.93%

Keywords: Protein Interaction Prediction, Phylogenetic Profile, SVM , ReliefF, Decision Trees, Random Forest Classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1300
3705 Protein Graph Partitioning by Mutually Maximization of cycle-distributions

Authors: Frank Emmert Streib

Abstract:

The classification of the protein structure is commonly not performed for the whole protein but for structural domains, i.e., compact functional units preserved during evolution. Hence, a first step to a protein structure classification is the separation of the protein into its domains. We approach the problem of protein domain identification by proposing a novel graph theoretical algorithm. We represent the protein structure as an undirected, unweighted and unlabeled graph which nodes correspond the secondary structure elements of the protein. This graph is call the protein graph. The domains are then identified as partitions of the graph corresponding to vertices sets obtained by the maximization of an objective function, which mutually maximizes the cycle distributions found in the partitions of the graph. Our algorithm does not utilize any other kind of information besides the cycle-distribution to find the partitions. If a partition is found, the algorithm is iteratively applied to each of the resulting subgraphs. As stop criterion, we calculate numerically a significance level which indicates the stability of the predicted partition against a random rewiring of the protein graph. Hence, our algorithm terminates automatically its iterative application. We present results for one and two domain proteins and compare our results with the manually assigned domains by the SCOP database and differences are discussed.

Keywords: Graph partitioning, unweighted graph, protein domains.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1109
3704 Using Spectral Vectors and M-Tree for Graph Clustering and Searching in Graph Databases of Protein Structures

Authors: Do Phuc, Nguyen Thi Kim Phung

Abstract:

In this paper, we represent protein structure by using graph. A protein structure database will become a graph database. Each graph is represented by a spectral vector. We use Jacobi rotation algorithm to calculate the eigenvalues of the normalized Laplacian representation of adjacency matrix of graph. To measure the similarity between two graphs, we calculate the Euclidean distance between two graph spectral vectors. To cluster the graphs, we use M-tree with the Euclidean distance to cluster spectral vectors. Besides, M-tree can be used for graph searching in graph database. Our proposal method was tested with graph database of 100 graphs representing 100 protein structures downloaded from Protein Data Bank (PDB) and we compare the result with the SCOP hierarchical structure.

Keywords: Eigenvalues, m-tree, graph database, protein structure, spectra graph theory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1316
3703 On the Prediction of Transmembrane Helical Segments in Membrane Proteins

Authors: Yu Bin, Zhang Yan

Abstract:

The prediction of transmembrane helical segments (TMHs) in membrane proteins is an important field in the bioinformatics research. In this paper, a method based on discrete wavelet transform (DWT) has been developed to predict the number and location of TMHs in membrane proteins. PDB coded as 1F88 was chosen as an example to describe the prediction of the number and location of TMHs in membrane proteins by using this method. One group of test data sets that contain total 19 protein sequences was utilized to access the effect of this method. Compared with the prediction results of DAS, PRED-TMR2, SOSUI, HMMTOP2.0 and TMHMM2.0, the obtained results indicate that the presented method has higher prediction accuracy.

Keywords: hydrophobicity, membrane protein, transmembranehelical segments, wavelet transform

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1366
3702 Critical Assessment of Scoring Schemes for Protein-Protein Docking Predictions

Authors: Dhananjay C. Joshi, Jung-Hsin Lin

Abstract:

Protein-protein interactions (PPI) play a crucial role in many biological processes such as cell signalling, transcription, translation, replication, signal transduction, and drug targeting, etc. Structural information about protein-protein interaction is essential for understanding the molecular mechanisms of these processes. Structures of protein-protein complexes are still difficult to obtain by biophysical methods such as NMR and X-ray crystallography, and therefore protein-protein docking computation is considered an important approach for understanding protein-protein interactions. However, reliable prediction of the protein-protein complexes is still under way. In the past decades, several grid-based docking algorithms based on the Katchalski-Katzir scoring scheme were developed, e.g., FTDock, ZDOCK, HADDOCK, RosettaDock, HEX, etc. However, the success rate of protein-protein docking prediction is still far from ideal. In this work, we first propose a more practical measure for evaluating the success of protein-protein docking predictions,the rate of first success (RFS), which is similar to the concept of mean first passage time (MFPT). Accordingly, we have assessed the ZDOCK bound and unbound benchmarks 2.0 and 3.0. We also createda new benchmark set for protein-protein docking predictions, in which the complexes have experimentally determined binding affinity data. We performed free energy calculation based on the solution of non-linear Poisson-Boltzmann equation (nlPBE) to improve the binding mode prediction. We used the well-studied thebarnase-barstarsystem to validate the parameters for free energy calculations. Besides,thenlPBE-based free energy calculations were conducted for the badly predicted cases by ZDOCK and ZRANK. We found that direct molecular mechanics energetics cannot be used to discriminate the native binding pose from the decoys.Our results indicate that nlPBE-based calculations appeared to be one of the promising approaches for improving the success rate of binding pose predictions.

Keywords: protein-protein docking, protein-protein interaction, molecular mechanics energetics, Poisson-Boltzmann calculations

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577
3701 Predicting Protein Function using Decision Tree

Authors: Manpreet Singh, Parminder Kaur Wadhwa, Surinder Kaur

Abstract:

The drug discovery process starts with protein identification because proteins are responsible for many functions required for maintenance of life. Protein identification further needs determination of protein function. Proposed method develops a classifier for human protein function prediction. The model uses decision tree for classification process. The protein function is predicted on the basis of matched sequence derived features per each protein function. The research work includes the development of a tool which determines sequence derived features by analyzing different parameters. The other sequence derived features are determined using various web based tools.

Keywords: Sequence Derived Features, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1706
3700 One-Class Support Vector Machines for Protein-Protein Interactions Prediction

Authors: Hany Alashwal, Safaai Deris, Razib M. Othman

Abstract:

Predicting protein-protein interactions represent a key step in understanding proteins functions. This is due to the fact that proteins usually work in context of other proteins and rarely function alone. Machine learning techniques have been applied to predict protein-protein interactions. However, most of these techniques address this problem as a binary classification problem. Although it is easy to get a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to be considered as negative examples. Therefore, in this paper we solve this problem as a one-class classification problem using one-class support vector machines (SVM). Using only positive examples (interacting protein pairs) in training phase, the one-class SVM achieves accuracy of about 80%. These results imply that protein-protein interaction can be predicted using one-class classifier with comparable accuracy to the binary classifiers that use artificially constructed negative examples.

Keywords: Bioinformatics, Protein-protein interactions, One-Class Support Vector Machines

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1590
3699 Effect of Calcium Chloride on Rheological Properties and Structure of Inulin - Whey Protein Gels

Authors: Pawel Glibowski, Agnieszka Glibowska

Abstract:

The rheological properties, structure and potential synergistic interactions of whey proteins (1-6%) and inulin (20%) in mixed gels in the presence of CaCl2 was the aim of this study. Whey proteins have a strong influence on inulin gel formation. At low concentrations (2%) whey proteins did not impair in inulin gel formation. At higher concentration (4%) whey proteins impaired inulin gelation and inulin impaired the formation of a Ca2+-induced whey protein network. The presence of whey proteins at a level allowing for protein gel network formation (6%) significantly increased the rheological parameters values of the gels. SEM micrographs showed that whey protein structure was coated by inulin moieties which could make the mixed gels firmer. The protein surface hydrophobicity measurements did not exclude synergistic interactions between inulin and whey proteins, however. The use of an electrophoretic technique did not show any stable inulin-whey protein complexes.

Keywords: gels, hydrophobicity, inulin, SEM, whey proteins.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1998
3698 Development of the Structure of the Knowledgebase for Countermeasures in the Knowledge Acquisition Process for Trouble Prediction in Healthcare Processes

Authors: Shogo Kato, Daisuke Okamoto, Satoko Tsuru, Yoshinori Iizuka, Ryoko Shimono

Abstract:

Healthcare safety has been perceived important. It is essential to prevent troubles in healthcare processes for healthcare safety. Trouble prevention is based on trouble prediction using accumulated knowledge on processes, troubles, and countermeasures. However, information on troubles has not been accumulated in hospitals in the appropriate structure, and it has not been utilized effectively to prevent troubles. In the previous study, however a detailed knowledge acquisition process for trouble prediction was proposed, the knowledgebase for countermeasures was not involved. In this paper, we aim to propose the structure of the knowledgebase for countermeasures, in the knowledge acquisition process for trouble prediction in healthcare process. We first design the structure of countermeasures and propose the knowledge representation form on countermeasures. Then, we evaluate the validity of the proposal, by applying it into an actual hospital.

Keywords: Trouble prevention, knowledge structure, structured knowledge, reusable knowledge.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1428
3697 An Integrative Bayesian Approach to Supporting the Prediction of Protein-Protein Interactions: A Case Study in Human Heart Failure

Authors: Fiona Browne, Huiru Zheng, Haiying Wang, Francisco Azuaje

Abstract:

Recent years have seen a growing trend towards the integration of multiple information sources to support large-scale prediction of protein-protein interaction (PPI) networks in model organisms. Despite advances in computational approaches, the combination of multiple “omic" datasets representing the same type of data, e.g. different gene expression datasets, has not been rigorously studied. Furthermore, there is a need to further investigate the inference capability of powerful approaches, such as fullyconnected Bayesian networks, in the context of the prediction of PPI networks. This paper addresses these limitations by proposing a Bayesian approach to integrate multiple datasets, some of which encode the same type of “omic" data to support the identification of PPI networks. The case study reported involved the combination of three gene expression datasets relevant to human heart failure (HF). In comparison with two traditional methods, Naive Bayesian and maximum likelihood ratio approaches, the proposed technique can accurately identify known PPI and can be applied to infer potentially novel interactions.

Keywords: Bayesian network, Classification, Data integration, Protein interaction networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1387
3696 On the Prediction of Transmembrane Helical Segments in Membrane Proteins Based on Wavelet Transform

Authors: Yu Bin, Zhang Yan

Abstract:

The prediction of transmembrane helical segments (TMHs) in membrane proteins is an important field in the bioinformatics research. In this paper, a new method based on discrete wavelet transform (DWT) has been developed to predict the number and location of TMHs in membrane proteins. PDB coded as 1KQG was chosen as an example to describe the prediction of the number and location of TMHs in membrane proteins by using this method. To access the effect of the method, 80 proteins with known 3D-structure from Mptopo database are chosen at random as the test objects (including 325 TMHs), 308 of which can be predicted accurately, the average predicted accuracy is 96.3%. In addition, the above 80 membrane proteins are divided into 13 groups according to their function and type. In particular, the results of the prediction of TMHs of the 13 groups are satisfying.

Keywords: discrete wavelet transform, hydrophobicity, membrane protein, transmembrane helical segments

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1203
3695 A General Model for Amino Acid Interaction Networks

Authors: Omar Gaci, Stefan Balev

Abstract:

In this paper we introduce the notion of protein interaction network. This is a graph whose vertices are the protein-s amino acids and whose edges are the interactions between them. Using a graph theory approach, we identify a number of properties of these networks. We compare them to the general small-world network model and we analyze their hierarchical structure.

Keywords: interaction network, protein structure, small-world network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1355
3694 Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms

Authors: Chia-Ta Tsai, Wen-Lin Huang, Shinn-Jang Ho, Li-Sun Shu, Shinn-Ying Ho

Abstract:

Prediction of bacterial virulent protein sequences can give assistance to identification and characterization of novel virulence-associated factors and discover drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation which describes functions of genes and gene products as a controlled vocabulary of terms has been shown effectively for a variety of tasks such as gene expression study, GO annotation prediction, protein subcellular localization, etc. In this study, we propose a sequence-based method Virulent-GO by mining informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologies with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO using the single kind of GO term features with an accuracy of 82.5% is slightly better than VirulentPred with 81.8% using five kinds of sequence-based features. For the evaluation of independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When evaluating single kind of feature with SVM, the GO term feature performs much well, compared with each of the five kinds of features.

Keywords: Bacterial virulence factors, GO terms, prediction, protein sequence.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1894
3693 A Bayesian Kernel for the Prediction of Protein- Protein Interactions

Authors: Hany Alashwal, Safaai Deris, Razib M. Othman

Abstract:

Understanding proteins functions is a major goal in the post-genomic era. Proteins usually work in context of other proteins and rarely function alone. Therefore, it is highly relevant to study the interaction partners of a protein in order to understand its function. Machine learning techniques have been widely applied to predict protein-protein interactions. Kernel functions play an important role for a successful machine learning technique. Choosing the appropriate kernel function can lead to a better accuracy in a binary classifier such as the support vector machines. In this paper, we describe a Bayesian kernel for the support vector machine to predict protein-protein interactions. The use of Bayesian kernel can improve the classifier performance by incorporating the probability characteristic of the available experimental protein-protein interactions data that were compiled from different sources. In addition, the probabilistic output from the Bayesian kernel can assist biologists to conduct more research on the highly predicted interactions. The results show that the accuracy of the classifier has been improved using the Bayesian kernel compared to the standard SVM kernels. These results imply that protein-protein interaction can be predicted using Bayesian kernel with better accuracy compared to the standard SVM kernels.

Keywords: Bioinformatics, Protein-protein interactions, Bayesian Kernel, Support Vector Machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1885
3692 Multi-Label Hierarchical Classification for Protein Function Prediction

Authors: Helyane B. Borges, Julio Cesar Nievola

Abstract:

Hierarchical classification is a problem with applications in many areas as protein function prediction where the dates are hierarchically structured. Therefore, it is necessary the development of algorithms able to induce hierarchical classification models. This paper presents experimenters using the algorithm for hierarchical classification called Multi-label Hierarchical Classification using a Competitive Neural Network (MHC-CNN). It was tested in ten datasets the Gene Ontology (GO) Cellular Component Domain. The results are compared with the Clus-HMC and Clus-HSC using the hF-Measure.

Keywords: Hierarchical Classification, Competitive Neural Network, Global Classifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2133
3691 Analysis of Physicochemical Properties on Prediction of R5, X4 and R5X4 HIV-1 Coreceptor Usage

Authors: Kai-Ti Hsu, Hui-Ling Huang, Chun-Wei Tung, Yi-Hsiung Chen, Shinn-Ying Ho

Abstract:

Bioinformatics methods for predicting the T cell coreceptor usage from the array of membrane protein of HIV-1 are investigated. In this study, we aim to propose an effective prediction method for dealing with the three-class classification problem of CXCR4 (X4), CCR5 (R5) and CCR5/CXCR4 (R5X4). We made efforts in investigating the coreceptor prediction problem as follows: 1) proposing a feature set of informative physicochemical properties which is cooperated with SVM to achieve high prediction test accuracy of 81.48%, compared with the existing method with accuracy of 70.00%; 2) establishing a large up-to-date data set by increasing the size from 159 to 1225 sequences to verify the proposed prediction method where the mean test accuracy is 88.59%, and 3) analyzing the set of 14 informative physicochemical properties to further understand the characteristics of HIV-1coreceptors.

Keywords: Coreceptor, genetic algorithm, HIV-1, SVM, physicochemical properties, prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2156
3690 Sorting Primitives and Genome Rearrangementin Bioinformatics: A Unified Perspective

Authors: Swapnoneel Roy, Minhazur Rahman, Ashok Kumar Thakur

Abstract:

Bioinformatics and computational biology involve the use of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems usually on the molecular level. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and proteinprotein interactions, and the modeling of evolution. Various global rearrangements of permutations, such as reversals and transpositions,have recently become of interest because of their applications in computational molecular biology. A reversal is an operation that reverses the order of a substring of a permutation. A transposition is an operation that swaps two adjacent substrings of a permutation. The problem of determining the smallest number of reversals required to transform a given permutation into the identity permutation is called sorting by reversals. Similar problems can be defined for transpositions and other global rearrangements. In this work we perform a study about some genome rearrangement primitives. We show how a genome is modelled by a permutation, introduce some of the existing primitives and the lower and upper bounds on them. We then provide a comparison of the introduced primitives.

Keywords: Sorting Primitives, Genome Rearrangements, Transpositions, Block Interchanges, Strip Exchanges.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1892
3689 Classifying and Predicting Efficiencies Using Interval DEA Grid Setting

Authors: Yiannis G. Smirlis

Abstract:

The classification and the prediction of efficiencies in Data Envelopment Analysis (DEA) is an important issue, especially in large scale problems or when new units frequently enter the under-assessment set. In this paper, we contribute to the subject by proposing a grid structure based on interval segmentations of the range of values for the inputs and outputs. Such intervals combined, define hyper-rectangles that partition the space of the problem. This structure, exploited by Interval DEA models and a dominance relation, acts as a DEA pre-processor, enabling the classification and prediction of efficiency scores, without applying any DEA models.

Keywords: Data envelopment analysis, interval DEA, efficiency classification, efficiency prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 430