An Information Theoretic Approach to Rescoring Peptides Produced by De Novo Peptide Sequencing

Authors: John R. Rose, James P. Cleveland, Alvin Fox


Tandem mass spectrometry (MS/MS) is the engine driving high-throughput protein identification. Protein mixtures, possibly representing thousands of proteins from multiple species, are treated with proteolytic enzymes that cut the proteins into smaller peptides, which are then analyzed to generate MS/MS spectra. Determining the identity of a peptide from its spectrum is currently the weak point in the process. Current approaches to de novo sequencing compute candidate peptides efficiently; the problem lies in the limitations of current scoring functions. In this paper we introduce the concept of a proteome signature. By examining proteins and compiling proteome signatures (amino acid usage), it is possible to characterize likely combinations of amino acids and better distinguish between candidate peptides. Our results strongly support the hypothesis that a scoring function that considers amino acid usage patterns is better able to distinguish between candidate peptides, which in turn leads to higher accuracy in peptide prediction.

Keywords: Tandem mass spectrometry, proteomics, scoring, peptide, de novo, mutual information
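
The abstract does not give the scoring function itself, so the following is only a minimal sketch of how an amino-acid-usage signature could be used to rerank candidate peptides. It assumes the signature consists of single-residue and adjacent-pair counts, and it scores a peptide by summing smoothed pointwise mutual information over its adjacent residue pairs. The function names (`proteome_signature`, `pmi_score`, `rescore`), the pseudocount smoothing, and the restriction to adjacent pairs are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def proteome_signature(proteins):
    """Count single residues and adjacent residue pairs across protein sequences.

    Hypothetical stand-in for a "proteome signature" (amino acid usage).
    """
    singles, pairs = Counter(), Counter()
    for seq in proteins:
        singles.update(seq)
        pairs.update(zip(seq, seq[1:]))
    return singles, pairs


def pmi_score(peptide, singles, pairs, pseudocount=1.0, alphabet_size=20):
    """Sum of smoothed pointwise mutual information over adjacent residue pairs.

    Additive smoothing (an assumption) keeps unseen pairs from producing log(0).
    """
    n_single = sum(singles.values()) + pseudocount * alphabet_size
    n_pair = sum(pairs.values()) + pseudocount * alphabet_size ** 2
    score = 0.0
    for a, b in zip(peptide, peptide[1:]):
        p_a = (singles[a] + pseudocount) / n_single
        p_b = (singles[b] + pseudocount) / n_single
        p_ab = (pairs[(a, b)] + pseudocount) / n_pair
        score += math.log2(p_ab / (p_a * p_b))
    return score


def rescore(candidates, singles, pairs):
    """Return candidate peptides ordered from most to least signature-consistent."""
    return sorted(candidates, key=lambda p: pmi_score(p, singles, pairs), reverse=True)


if __name__ == "__main__":
    # Toy "proteome" for illustration only; real signatures would be compiled
    # from full protein sequence collections.
    singles, pairs = proteome_signature(["ACACACAC", "ACACAC"])
    for peptide in rescore(["CCAA", "ACAC"], singles, pairs):
        print(peptide, round(pmi_score(peptide, singles, pairs), 3))
```

In practice such a usage-based score would presumably be combined with the spectrum-matching score from the de novo sequencer rather than used alone; how the two are weighted is not specified in the abstract.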


