An Improved Fast Search Method Using Histogram Features for DNA Sequence Database
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32794
An Improved Fast Search Method Using Histogram Features for DNA Sequence Database

Authors: Qiu Chen, Feifei Lee, Koji Kotani, Tadahiro Ohmi


In this paper, we propose an efficient hierarchical DNA sequence search method to improve the search speed while the accuracy is being kept constant. For a given query DNA sequence, firstly, a fast local search method using histogram features is used as a filtering mechanism before scanning the sequences in the database. An overlapping processing is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity will be excluded for latter searching. The Smith-Waterman algorithm is then applied to each remainder sequences. Experimental results using GenBank sequence data show the proposed method combining histogram information and Smith-Waterman algorithm is more efficient for DNA sequence search.

Keywords: Fast search, DNA sequence, Histogram feature, Smith-Waterman algorithm, Local search

Digital Object Identifier (DOI):

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1275


[1] J. C. Venter, M. D. Adams, E. W. Myers, etc., "The sequence of the human genome", Science, vol. 291, no. 5507, pp. 1304 -1351, 2001.
[2] F. S. Collins, M. Morgan, A. Patrinos, "The human genome project: lessons from Large-Scale Biology", Science, vol. 300, no. 5617, pp. 286-290, 2003.
[3] S.B.Needlman and C.D.Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins", Journal of Molecular Biology, vol. 48, pp. 443- 453, 1970.
[4] T. F. Smith and M. S.Waterman, "Identification of common molecular subsequences", Journal of Molecular Biology, vol. 47, pp. 195- 197, 1981.
[5] S. F. Altscgul, W.Gish, W. Miller, E. W. Myers and D. J. Lipman, "Basic local alignment search tool", Journal of Molecular Biology, vol. 215, pp. 403- 410, 1990.
[6] D. Lipman and W. R. Pearson, "Rapid and sensitive protein similarity searches", Science, vol. 227, pp. 1435-1441, 1985.
[7] GenBank,
[9] M. Li, and B. Ma, "PatternHunter II: highly sensitive and fast homology search", Genome Informatics, vol. 14, pp. 164-175, 2003.
[10] B. Ma, J. Tromp, and M. Li, "PatternHunter: faster and more sensitive homology search", Bioinformatics, vol. 18, no. 3, pp. 440- 445, 2002.
[11] Q. Chen, K. Kotani, F. Lee, and T. Ohmi, "A Fast Retrieval of DNA Sequences Using Histogram Information", 2009 Int-l Conf. on Future Information Technology and Management Engineering (FITME 2009), pp. 529-532, Sanya, China, Dec., 2009.