**Commenced**in January 2007

**Frequency:**Monthly

**Edition:**International

**Paper Count:**31903

##### Constraint Based Frequent Pattern Mining Technique for Solving GCS Problem

**Authors:**
First G.M. Karthik,
Second Ramachandra.V.Pujeri,
Dr.

**Abstract:**

**Keywords:**
Constraint Based Mining,
FP tree,
Data mining,
GCS problem,
CBFP mining technique.

**Digital Object Identifier (DOI):**
doi.org/10.5281/zenodo.1076646

**References:**

[1] C.N. Meneses, Z. Lu, A.S. Oliveria, and P.M. Pardolos, "Optimal Solutions for the Closest String Problem via Integer Programming," INFORMS J. Computing, vol. 16, no. 4, pp. 419-429, 2004.

[2] E. Eskin and P. Pevzner, "Finding Composite Regulatory Patterns in DNA Sequences," Bioinformatics, pp. 354-363, 2002.

[3] F.Y.L. Chin and H.C.M. Leung, "Voting Algorithms for Discovering Long Motifs," Proc. Third Asia-Pacific Bioinformatics Conf. (APBC -05), pp. 261-271, 2005.

[4] G. Grahne, L. Lakshmanan, and X. Wang, "Efficient mining of constrained correlated sets," In Proc. 2000 Int. Conf. Data Engineering (ICDE-00), San Diego, CA, pp. 512-521, 2000.

[5] G. Pavesi, G. Mauri, and G. Pesole, "An Algorithm for Finding Signals of Unknown Length in DNA Sequences," Bioinformatics, vol. 17, pp. 207-214, 2001.

[6] G. Pavesi, G. Mauri, and G. Pesole, "In Silico Representation and Discovery of Transcription Factor Binding Sites," Brief in Bioinformatics, vol. 5, no. 3, pp. 217-36, 2004.

[7] G.D. Stormo, "DNA Binding Sites: Representation and Discovery," Bionformatics, vol. 16, no. 1, pp. 16-23, 2004.

[8] H. Mannila, H. Toivonen, "Levelwise search and borders of theories in knowledge discovery," Data Mining and Knowledge Discovery 1, pp 241-258, 1997.

[9] H.C.M. Leung and F.Y.L. Chin, "Algorithms for Challenging Motif Problems," J. Bioinformatics and Computational Biology, vol. 4, no. 1, pp. 43-58, 2006.

[10] J. Buhler and M. Tompa, "Finding Motifs Using Random Projections," Proc. Fifth Ann. Int-l Conf. Computational Molecular Biology (RECOMB), pp. 69-76, 2001.

[11] J. Davila, S. Balla, and S. Rajasekaran, "Space and Time Efficient Algorithms for Planted Motif Search," Proc. Int-l Workshop Bioinformatics Research and Applications (IWBRA -06), May 2006.

[12] J. Gramm, R. Niedermeier, and P. Rossmanith, "Exact Solutions for CLOSEST STRING and Related Problems," Proc. 12th Int-l Symp. Algorithms and Computation, pp. 441-453, 2001.

[13] J. Guan, D.Y. Liu, and D.A. Bell, "Discovering Motifs in DNA Sequences," Fundamenta Informaticae, vol. 59, pp. 119-132, 2004.

[14] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," In Proc. 2000 ACMSIGMOD, Int. Conf. Management of Data (SIGMOD-00), Dallas, TX, pp. 1-12, 2000.

[15] J. Han, J. Pei, and Y. Yin, "Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD, pp. 1-12, 2000.

[16] J. Han, L.V.S. Lakshmanan, T.N.Raymond, "Constraint-Based Multidimensional Data Mining," IEEE Computer, 32(8), 46- 50,1999.

[17] J. Pei, J. Han, and L.V.S. Lakshmanan, "Mining frequent itemsets with convertible constraints," In Proc. 2001 Int. Conf. Data Engineering (ICDE-01), Heidelberg, Germany, pp. 433-332, 2001.

[18] J. Pei, J. Han, and R. Mao, "CLOSET: An efficient algorithm for mining frequent closed itemsets," In Proc. 2000 ACM-SIGMOD Int. Workshop Data Mining and Knowledge Discovery (DMKD-00), Dallas, TX, pp. 11-20, 2000.

[19] K. Lanctot, M. Li, B. Ma, and L. Zhang, "Distinguishing String Selection Problems," Information and Computation, vol. 185, no. 1, pp. 44-51, 2003.

[20] L. De Raedt, S. Kramer, "The levelwise version space algorithm and its application to molecular fragment finding," In. IJCA101: 7th Int. Joint Conf. Artificial Intelligence, 2001.

[21] L. Marsan and M.F. Sagot, "Algorithms for Extracting Structured Motifs Using a Suffix Tree with Application to Promoter and Regulatory Site Consensus Identification," J. Computational Biology, vol. 7, pp. 345-360, 2000.

[22] M. Ester and X. Zhang, "A Top-Down Method for Mining Most Specific Frequent Patterns in Biological Sequence Data," Proc. SIAM Int-l Conf. on Data Mining (SDM -04), Apr. 2004.

[23] M. Frances and A. Litman, "On Covering Problems of Codes," Theoretical Computer System, vol. 30, pp. 113-119, 1997.

[24] M. Li, B. Ma, and L. Wang, "Finding Similar Regions in Many Strings," Proc. 31st Ann. ACM Symp. Theory of Computing (STOC -99), pp. 473-482, 1999.

[25] M. Li, B. Ma, and L. Wang, "On the Closest String and Substring Problems," J. ACM, vol. 49, no. 2, pp. 151-171, 2002.

[26] M. Tompa et al., "Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites," Nature Biotechnology, vol. 23, no. 1, pp. 137-144, 2005.

[27] M. Tompa, "An Exact Method for Finding Short Motifs in Sequences, with Application to the Ribosome Binding Site Problem," Proc. Seventh Int-l Conf. Intelligent Systems for Molecular Biology, pp. 262-271, 1999.

[28] M.F. Sagot, "Spelling Approximate Repeated or Common Motifs Using a Suffix Tree," Proc. Third Latin Am. Symp. Theoretical Informatics, pp. 111-127, 1998.

[29] M.J. Zaki, and C.J. Hsiao, "CHARM: An efficient algorithm for closed itemset mining," In Proc. 2002 SIAMInt. Conf. Data Mining, Arlington, VA, pp. 457-473, 2002.

[30] M.N. Garofalakis, R. Rastogi, and K. Shim, "SPIRIT: Sequential pattern mining with regular expression constraints," In Proc. 25th Int. Conf. Very Large Data Bases (VLDB '99), San Francisco, Morgan Kaufmann, pp 223-234, 1999.

[31] M.S. Waterman, R. Arratia, and D.J. Galas, "Pattern Recognition in Several Sequences: Consensus and Alignment," Bull. Math. Biology, vol. 46, pp. 515-527, 1984

[32] P. Pevzner and S. Sze, "Combinatorial Approaches to Finding Subtle Signals in DNA Sequences," Proc. Int-l Conf. Intelligent Systems for Molecular Biology, pp. 269-278, 2000.

[33] P.A. Evans and A.D. Smith, "Complexity of Approximating Closest Substring Problems," Fundamentals of Computation Theory, pp. 210-221, 2003.

[34] P.A. Evans and A.D. Smith, "Toward Optimal Motif Enumeration," Proc. Algorithms and Data Structures, Eighth Int-l Workshop (WADS -03), pp. 47-58, 2003.

[35] P.A. Evans and H.T. Wareham, "Practical Algorithms for Universal DNA Primer Design: An Exercise in Algorithm Engineering," Currents in Computational Molecular Biology 2001, N. El-Mabrouk, T. Lengauer, and D. Sankoff, eds., pp. 25-26, Les Publications CRM, 2001.

[36] P.A. Evans, A.D. Smith, and H.T. Wareham, "On the Complexity of Finding Common Approximate Substrings," Theoretical Computer Science, vol. 306, no. 1-3, pp. 407-430, 2003.

[37] R. Agarwal, C. Aggarwal, and V.V.V. Prasad, "A tree projection algorithm for generation of frequentitemsets," Journal of Parallel and Distributed Computing, 61:350-371, 2001.

[38] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int-l Conf. Very Large Data Bases (VLDB -94), pp. 487-499, 1994.

[39] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and I. Verkamo, "Fast Discovery of the Association Rules," Advances in Knowledge Discovery and Data Mining, pp. 307-328, AAAI Press, 1996.

[40] R. Ng et al., "Exploratory Mining and Pruning Optimizations of Constrained Associations Rules," Proc. 1998 ACM SIGMOD Int'l Conf. Management of Data, ACM Press, New York, 1998, pp. 13- 24.

[41] R. Srikant, and R. Agrawal, "Mining sequential patterns: Generalizations and performance improvements," In Proc. 5th Int. Conf. Extending Database Technology (EDBT-96), Avignon, France, pp. 3-17, 1996.

[42] R. Srikant, Q. Vu, and R. Agrawal, "Mining association rules with item constraints," In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD-97), Newport Beach, CA, pp. 67-73, 1997.

[43] Ruqian Lu, Caiyan Jia, Shaofang Zhang, Lusheng Chen, Hongyu Zhang, "An Exact Data Mining Method for Finding Center Strings and All Their Instances," IEEE Trans. Knowledge and Data Engineering," 19(4), 509-522, 2007.

[44] S. Rajasekaran, S. Balla, and C.-H. Huang, "Exact Algorithms for the Planted Motif Problem," J. Computational Biology, vol. 12, no. 8, pp. 1117-1128, 2005.

[45] S. Sinha and M. Tompa, "Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation," Nucleic Acids Research, vol. 30, no. 24, pp. 5549-5560, 2002.

[46] S. Sinha and M. Tompa, "YMF: A Program or Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation," Nucleic Acids Research, vol. 31, pp. 3586- 3588, 2003.

[47] S.T. Jensen, X.S. Liu, Q. Zhou, and J.S. Liu, "Computational Discovery of Gene Regulatory Binding Motifs: A Bayesian Perspective," Statistical Science, vol. 19, pp. 188-204, 2004.

[48] Sau Dan Lee, Luc De Raedt, "Constraint Based Mining of First Order Sequences in SeqLog," Database Support for Data Mining Applications 154-173, 2004.