Prediction of Protein Subchloroplast Locations using Random Forests

Authors: Chun-Wei Tung, Chyn Liaw, Shinn-Jang Ho, Shinn-Ying Ho

Abstract:

The subchloroplast location of a protein is correlated with its function. Although a large number of protein sequences are available, far less is known about their locations and functions. Experimental identification of protein locations and functions is costly and time-consuming, so accurate prediction of protein subchloroplast locations can accelerate the study of protein function in the chloroplast. This study proposes a Random Forest-based method, ChloroRF, to predict protein subchloroplast locations using interpretable physicochemical properties. In addition to achieving high prediction accuracy, ChloroRF can select important physicochemical properties. These properties are also analyzed to provide insights into the underlying mechanism.
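
As an illustration of what "interpretable physicochemical properties" can mean in practice, the following minimal sketch encodes a protein sequence as the mean value of each property scale over its residues. It is not the ChloroRF implementation: the abstract does not state the encoding or the selected properties, so the mean-value encoding, the choice of the Kyte-Doolittle hydropathy scale, the simplified charge scale, and the names `PROPERTIES` and `encode` are all assumptions made for this example. Vectors of this kind could then be passed to any Random Forest implementation, whose feature-importance scores would rank the property scales.

```python
# Kyte-Doolittle hydropathy index, one widely used amino acid scale.
# Illustrative choice only: the abstract does not list the properties ChloroRF selects.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH (assumption for this sketch):
# K and R count as +1, D and E as -1, every other residue as 0.
CHARGE = {aa: 0.0 for aa in KYTE_DOOLITTLE}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})

# Hypothetical property set; each entry maps a property name to a residue scale.
PROPERTIES = {"hydropathy": KYTE_DOOLITTLE, "charge": CHARGE}


def encode(sequence, properties=PROPERTIES):
    """Return one mean property value per scale, in the order of `properties`.

    Characters that are not among the 20 standard amino acids are skipped.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [
        sum(scale[aa] for aa in residues) / len(residues)
        for scale in properties.values()
    ]
```

Calling `encode("MKV")` returns a two-element list, the mean hydropathy followed by the mean charge. Because each feature corresponds to one named scale, an importance ranking over features reads directly as a ranking over physicochemical properties.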

Keywords: Chloroplast, Physicochemical properties, Protein locations, Random Forests.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080193

