Comparison of Phylogenetic Trees of Multiple Protein Sequence Alignment Methods

Khaddouja Boujenfa; Nadia Essoussi; Mohamed Limam

Abstract:

Multiple sequence alignment is a fundamental step in many bioinformatics applications, such as phylogenetic analysis. Many alignment methods have been proposed. Each method gives a different result for the same data set and consequently generates a different phylogenetic tree, so the choice of alignment method affects the resulting tree. However, the literature contains no evaluation of multiple alignment methods based on a comparison of their phylogenetic trees. This work evaluates eight aligners (ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN, ProbCons and Align-m) based on the phylogenetic trees (test trees) they produce on a given data set. The Neighbor-Joining method is used to estimate trees. Three criteria, namely dNNI, dRF and Id_Tree, are established to test the ability of each alignment method to produce a test tree close to the reference tree (true tree). Results show that the method that produces the most accurate alignment gives the test tree nearest to the reference tree. MUSCLE outperforms all other aligners with respect to the three criteria and for all data sets, performing particularly well when sequence identities are within 10-20%. It is followed by T-Coffee at lower sequence identity (<10%), Align-m at 20-30% identity, and ClustalX and ProbCons at 30-50% identity. In addition, when sequence identities are higher (>30%), the tree scores of all methods become similar.
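To make the tree-comparison idea concrete, the following is a minimal sketch of the Robinson-Foulds distance, the measure listed in the keywords, computed between a reference tree and a test tree. It is an illustration only, not the authors' implementation: the nested-tuple tree encoding, the function names (`leaves`, `splits`, `robinson_foulds`) and the five-taxon example are assumptions made here, and the abstract does not say whether the paper's dRF is normalized, so the raw count of differing splits is returned.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical tree encoding (an assumption for this sketch):
# a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions (splits) induced by internal edges."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)  # fixed leaf used to pick a canonical side of each split
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        # Splits with fewer than two leaves on a side occur in every tree; skip them.
        if len(clade) >= 2 and len(other) >= 2:
            # Store the side that does not contain the reference leaf.
            found.add(other if ref in clade else clade)
        return clade

    visit(tree)
    return found


def robinson_foulds(a: Tree, b: Tree) -> int:
    """Unnormalized Robinson-Foulds distance: splits present in exactly one tree."""
    if leaves(a) != leaves(b):
        raise ValueError("both trees must be defined on the same set of taxa")
    return len(splits(a) ^ splits(b))


# Made-up five-taxon example, not data from the paper.
reference_tree = (("A", "B"), ("C", "D"), "E")
test_tree = (("A", "C"), ("B", "D"), "E")
distance = robinson_foulds(reference_tree, test_tree)
```

A distance of 0 means the test tree has the same topology as the reference tree; larger values mean more internal branches differ. Because splits are compared rather than rooted clades, the position of the root in the tuple encoding does not affect the result.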

Keywords: Multiple alignment methods, phylogenetic trees, Neighbor-Joining method, Robinson-Foulds distance.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1331675
