Shinn-Ying Ho

Publications

4 Sequence-based Prediction of Gamma-turn Types using a Physicochemical Property-based Decision Tree Method

Authors: Shinn-Jang Ho, Shinn-Ying Ho, Chyn Liaw, Chun-Wei Tung

Abstract:

The γ-turns play important roles in protein folding and molecular recognition. The prediction and analysis of γ-turn types are important for both protein structure predictions and better understanding the characteristics of different γ-turn types. This study proposed a physicochemical property-based decision tree (PPDT) method to interpretably predict γ-turn types. In addition to the good prediction performance of PPDT, three simple and human interpretable IF-THEN rules are extracted from the decision tree constructed by PPDT. The identified informative physicochemical properties and concise rules provide a simple way for discriminating and understanding γ-turn types.

Keywords: Physicochemical properties, protein secondary structure, Classification and regression tree (CART), γ-turn

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1224
3 Prediction of Protein Subchloroplast Locations using Random Forests

Authors: Shinn-Jang Ho, Shinn-Ying Ho, Chyn Liaw, Chun-Wei Tung

Abstract:

Protein subchloroplast locations are correlated with its functions. In contrast to the large amount of available protein sequences, the information of their locations and functions is less known. The experiment works for identification of protein locations and functions are costly and time consuming. The accurate prediction of protein subchloroplast locations can accelerate the study of functions of proteins in chloroplast. This study proposes a Random Forest based method, ChloroRF, to predict protein subchloroplast locations using interpretable physicochemical properties. In addition to high prediction accuracy, the ChloroRF is able to select important physicochemical properties. The important physicochemical properties are also analyzed to provide insights into the underlying mechanism.

Keywords: Physicochemical properties, random forests, chloroplast, Proteinlocations

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1280
2 Analysis of Physicochemical Properties on Prediction of R5, X4 and R5X4 HIV-1 Coreceptor Usage

Authors: KaI-Ti Hsu, Shinn-Ying Ho, Chun-Wei Tung, Hui-Ling Huang, Yi-Hsiung Chen

Abstract:

Bioinformatics methods for predicting the T cell coreceptor usage from the array of membrane protein of HIV-1 are investigated. In this study, we aim to propose an effective prediction method for dealing with the three-class classification problem of CXCR4 (X4), CCR5 (R5) and CCR5/CXCR4 (R5X4). We made efforts in investigating the coreceptor prediction problem as follows: 1) proposing a feature set of informative physicochemical properties which is cooperated with SVM to achieve high prediction test accuracy of 81.48%, compared with the existing method with accuracy of 70.00%; 2) establishing a large up-to-date data set by increasing the size from 159 to 1225 sequences to verify the proposed prediction method where the mean test accuracy is 88.59%, and 3) analyzing the set of 14 informative physicochemical properties to further understand the characteristics of HIV-1coreceptors.

Keywords: Physicochemical properties, Genetic Algorithm, prediction, SVM, HIV-1, Coreceptor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2057
1 Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms

Authors: Chia-Ta Tsai, Wen-Lin Huang, Shinn-Jang Ho, Li-Sun Shu, Shinn-Ying Ho

Abstract:

Prediction of bacterial virulent protein sequences can give assistance to identification and characterization of novel virulence-associated factors and discover drug/vaccine targets against proteins indispensable to pathogenicity. Gene Ontology (GO) annotation which describes functions of genes and gene products as a controlled vocabulary of terms has been shown effectively for a variety of tasks such as gene expression study, GO annotation prediction, protein subcellular localization, etc. In this study, we propose a sequence-based method Virulent-GO by mining informative GO terms as features for predicting bacterial virulent proteins. Each protein in the datasets used by the existing method VirulentPred is annotated by using BLAST to obtain its homologies with known accession numbers for retrieving GO terms. After investigating various popular classifiers using the same five-fold cross-validation scheme, Virulent-GO using the single kind of GO term features with an accuracy of 82.5% is slightly better than VirulentPred with 81.8% using five kinds of sequence-based features. For the evaluation of independent test, Virulent-GO also yields better results (82.0%) than VirulentPred (80.7%). When evaluating single kind of feature with SVM, the GO term feature performs much well, compared with each of the five kinds of features.

Keywords: prediction, Bacterial virulence factors, GO terms, protein sequence

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1755

Abstracts

1 Prediction and Analysis of Human Transmembrane Transporter Proteins Based on SCM

Authors: Shinn-Ying Ho, Hui-Ling Huang, Tamara Vasylenko, Phasit Charoenkwan, Shih-Hsiang Chiu

Abstract:

The knowledge of the human transporters is still limited due to technically demanding procedure of crystallization for the structural characterization of transporters by spectroscopic methods. It is desirable to develop bioinformatics tools for effective analysis of available sequences in order to identify human transmembrane transporter proteins (HMTPs). This study proposes a scoring card method (SCM) based method for predicting HMTPs. We estimated a set of propensity scores of dipeptides to be HMTPs using SCM from the training dataset (HTS732) consisting of 366 HMTPs and 366 non-HMTPs. SCM using the estimated propensity scores of 20 amino acids and 400 dipeptides -as HMTPs, has a training accuracy of 87.63% and a test accuracy of 66.46%. The five top-ranked dipeptides include LD, NV, LI, KY, and MN with scores 996, 992, 989, 987, and 985, respectively. Five amino acids with the highest propensity scores are Ile, Phe, Met, Gly, and Leu, that hydrophobic residues are mostly highly-scored. Furthermore, obtained propensity scores were used to analyze physicochemical properties of human transporters.

Keywords: physicochemical property, dipeptide composition, human transmembrane transporter proteins, human transmembrane transporters binding propensity, scoring card method

Procedia PDF Downloads 248