Model Discovery and Validation for the QSAR Problem using Association Rule Mining

Luminita Dumitriu; Cristina Segal; Marian Craciun; Adina Cocu; Lucian P. Georgescu

Abstract:

There are several approaches to solving the Quantitative Structure-Activity Relationship (QSAR) problem. These approaches are based either on statistical methods or on predictive data mining. The statistical methods include regression analysis, pattern recognition (such as cluster analysis, factor analysis and principal components analysis) and partial least squares. Predictive data mining techniques use neural networks, genetic programming, or neuro-fuzzy knowledge. These approaches have low explanatory capability or none at all. This paper attempts to establish a new approach to solving QSAR problems using descriptive data mining. This way, the relationship between the chemical properties and the activity of a substance would be comprehensibly modeled.
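As a rough illustration of the descriptive approach mentioned in the abstract, the sketch below mines association rules of the form "descriptor items -> activity class" and reports their support and confidence. It is a minimal brute-force sketch, not the authors' algorithm: the toy compounds, the discretized descriptors (`logP`, `MW`), the thresholds, and the function name `mine_rules` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each compound is a set of discretized descriptor
# items plus one activity label. None of these values come from the paper.
compounds = [
    ({"logP=high", "MW=low"}, "active"),
    ({"logP=high", "MW=high"}, "active"),
    ({"logP=low", "MW=low"}, "inactive"),
    ({"logP=high", "MW=low"}, "active"),
    ({"logP=low", "MW=high"}, "inactive"),
    ({"logP=high", "MW=high"}, "inactive"),
]


def mine_rules(data, min_support=0.3, min_confidence=0.7, max_len=2):
    """Return (antecedent, label, support, confidence) tuples by brute force.

    support    = fraction of all compounds matching antecedent AND label
    confidence = fraction of antecedent-matching compounds with that label
    """
    n = len(data)
    items = sorted(set().union(*(descr for descr, _ in data)))
    labels = sorted({label for _, label in data})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            ante = set(antecedent)
            # Labels of the compounds whose descriptors contain the antecedent.
            covered = [label for descr, label in data if ante <= descr]
            if not covered:
                continue
            for label in labels:
                both = covered.count(label)
                support = both / n
                confidence = both / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, label, support, confidence))
    return rules


rules = mine_rules(compounds)
for antecedent, label, support, confidence in rules:
    print(f"{' & '.join(antecedent)} -> {label} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

Each printed rule can be read directly as a statement about which descriptor ranges co-occur with an activity class, which is the kind of explanatory output that regression or neural-network models do not provide. Exhaustive enumeration is used here only because the toy data set is tiny; a real study would need a scalable frequent-itemset miner.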

Keywords: association rules, classification, data mining, Quantitative Structure-Activity Relationship.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1080494
