Data-organization Before Learning Multi-Entity Bayesian Networks Structure
Authors: H. Bouhamed, A. Rebai, T. Lecroq, M. Jaoua
Abstract:
The objective of our work is to develop a new approach for discovering knowledge from large volumes of data. The result of applying this approach is an expert system that serves as a diagnostic tool for a phenomenon related to a large information system. We first recall the general problem of learning Bayesian network structure from data and suggest a way to reduce its complexity by organizing and optimizing the data beforehand. We then propose a new heuristic for learning the structure of Multi-Entity Bayesian Networks. We apply our approach to biological facts concerning complex hereditary diseases, for which the biology literature identifies the responsible variables. Finally, we conclude with the limitations of this work.
Keywords: Data organization, data optimization, automatic knowledge discovery, Multi-Entity Bayesian Networks, score merging.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1333382