Investigations of Protein Aggregation Using Sequence and Structure Based Features

Authors: M. Michael Gromiha, A. Mary Thangakani, Sandeep Kumar, D. Velmurugan


The formation of amyloid fibrils and plaques by proteins is the main cause of several neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, and spongiform encephalopathies. We analyzed different sets of proteins and peptides to understand the influence of sequence-based features on the protein aggregation process. A comparison of 373 pairs of homologous mesophilic and thermophilic proteins showed that aggregation-prone regions (APRs) are present in both. However, thermophilic protein monomers show a greater ability to ‘stow away’ the APRs in their hydrophobic cores and protect them from solvent exposure. A comparison of amyloid-forming and amorphous β-aggregating hexapeptides suggested distinct preferences for specific residues at the six positions, as well as for all possible combinations of nine residue pairs. The compositions of residues at different positions and of residue pairs were converted into energy potentials and used to distinguish between amyloid-forming and amorphous β-aggregating peptides. Our method correctly identified the amyloid-forming peptides with an accuracy of 95-100% in different datasets of peptides.
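The idea of turning position-specific residue compositions into energy-like potentials can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the log-odds form of the potential, the pseudocount, the zero threshold, the function names, and the toy peptides, which are made-up placeholders rather than experimental data. Residue-pair potentials are omitted for brevity.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_potentials(amyloid, amorphous, pseudocount=1.0):
    """Build one residue -> potential table per peptide position.

    Assumed form (not from the paper): potential = -ln(f_amyloid / f_amorphous),
    so residues enriched in amyloid-forming peptides get negative values.
    """
    length = len(amyloid[0])
    pos_total = len(amyloid) + pseudocount * len(AMINO_ACIDS)
    neg_total = len(amorphous) + pseudocount * len(AMINO_ACIDS)
    potentials = []
    for pos in range(length):
        pos_counts = Counter(p[pos] for p in amyloid)
        neg_counts = Counter(p[pos] for p in amorphous)
        table = {}
        for aa in AMINO_ACIDS:
            # The pseudocount keeps unseen residues from producing log(0).
            f_pos = (pos_counts[aa] + pseudocount) / pos_total
            f_neg = (neg_counts[aa] + pseudocount) / neg_total
            table[aa] = -math.log(f_pos / f_neg)
        potentials.append(table)
    return potentials


def score(peptide, potentials):
    """Sum the position-specific potentials along the peptide."""
    return sum(table[aa] for aa, table in zip(peptide, potentials))


def classify(peptide, potentials, threshold=0.0):
    """Label a peptide; the zero threshold is an arbitrary illustrative choice."""
    return "amyloid" if score(peptide, potentials) < threshold else "amorphous"


# Hypothetical toy hexapeptides, invented purely to exercise the code.
toy_amyloid = ["VVVVVV", "VIVIVI", "IVIVIV"]
toy_amorphous = ["KKKKKK", "KEKEKE", "EKEKEK"]
toy_potentials = position_potentials(toy_amyloid, toy_amorphous)
print(classify("VVVVVV", toy_potentials), classify("KKKKKK", toy_potentials))
```

In this sketch a lower total score means a peptide resembles the amyloid-forming set more closely. A real application would need curated datasets, a threshold tuned by cross-validation, and the pair terms described above.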

Keywords: machine learning, amyloids, thermophilic proteins, amino acid residues, aggregation-prone regions


