An Improved Fast Search Method Using Histogram Features for DNA Sequence Database
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 32807
An Improved Fast Search Method Using Histogram Features for DNA Sequence Database

Authors: Qiu Chen, Feifei Lee, Koji Kotani, Tadahiro Ohmi

Abstract:

In this paper, we propose an efficient hierarchical DNA sequence search method to improve the search speed while the accuracy is being kept constant. For a given query DNA sequence, firstly, a fast local search method using histogram features is used as a filtering mechanism before scanning the sequences in the database. An overlapping processing is newly added to improve the robustness of the algorithm. A large number of DNA sequences with low similarity will be excluded for latter searching. The Smith-Waterman algorithm is then applied to each remainder sequences. Experimental results using GenBank sequence data show the proposed method combining histogram information and Smith-Waterman algorithm is more efficient for DNA sequence search.

Keywords: Fast search, DNA sequence, Histogram feature, Smith-Waterman algorithm, Local search

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1076388

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1281

References:


[1] J. C. Venter, M. D. Adams, E. W. Myers, etc., "The sequence of the human genome", Science, vol. 291, no. 5507, pp. 1304 -1351, 2001.
[2] F. S. Collins, M. Morgan, A. Patrinos, "The human genome project: lessons from Large-Scale Biology", Science, vol. 300, no. 5617, pp. 286-290, 2003.
[3] S.B.Needlman and C.D.Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins", Journal of Molecular Biology, vol. 48, pp. 443- 453, 1970.
[4] T. F. Smith and M. S.Waterman, "Identification of common molecular subsequences", Journal of Molecular Biology, vol. 47, pp. 195- 197, 1981.
[5] S. F. Altscgul, W.Gish, W. Miller, E. W. Myers and D. J. Lipman, "Basic local alignment search tool", Journal of Molecular Biology, vol. 215, pp. 403- 410, 1990.
[6] D. Lipman and W. R. Pearson, "Rapid and sensitive protein similarity searches", Science, vol. 227, pp. 1435-1441, 1985.
[7] GenBank, ftp://ftp.ncbi.nih.gov/genbank/
[8] http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html.
[9] M. Li, and B. Ma, "PatternHunter II: highly sensitive and fast homology search", Genome Informatics, vol. 14, pp. 164-175, 2003.
[10] B. Ma, J. Tromp, and M. Li, "PatternHunter: faster and more sensitive homology search", Bioinformatics, vol. 18, no. 3, pp. 440- 445, 2002.
[11] Q. Chen, K. Kotani, F. Lee, and T. Ohmi, "A Fast Retrieval of DNA Sequences Using Histogram Information", 2009 Int-l Conf. on Future Information Technology and Management Engineering (FITME 2009), pp. 529-532, Sanya, China, Dec., 2009.