Use of Bayesian Network in Information Extraction from Unstructured Data Sources

Authors: Quratulain N. Rajput, Sajjad Haider


This paper applies Bayesian networks to support information extraction from unstructured, ungrammatical, and incoherent data sources for semantic annotation. A tool has been developed that combines ontologies, machine learning, information extraction, and probabilistic reasoning techniques to support the extraction process. Data acquisition is guided by knowledge specified in the form of an ontology. Because the amount of information available varies across data sources, the extracted data often contains missing values for certain variables of interest, and it is desirable to predict these values. The methodology presented in this paper first learns a Bayesian network from the training data and then uses it to predict missing data and to resolve conflicts. Experiments have been conducted to analyze the performance of the methodology. The results are promising: the methodology achieves a high degree of precision and recall for information extraction and reasonably good accuracy for predicting missing values.
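
The idea of learning a Bayesian network from training data and using it to predict a missing value can be illustrated with a minimal sketch. It is not the authors' tool: the variable names (brand, condition, price_range), the training records, and the hand-fixed network structure are all hypothetical, and only the conditional probability tables are estimated (by counting with Laplace smoothing), whereas the paper may also learn the structure. With exactly one variable missing and all others observed, the posterior over the missing variable is proportional to the joint probability of each completed record.

```python
from collections import Counter, defaultdict

# Hypothetical structure (an assumption, not from the paper): child -> parents.
PARENTS = {"brand": (), "condition": (), "price_range": ("brand", "condition")}

# Hypothetical, fully observed training records.
TRAINING = [
    {"brand": "A", "condition": "new", "price_range": "high"},
    {"brand": "A", "condition": "new", "price_range": "high"},
    {"brand": "A", "condition": "used", "price_range": "low"},
    {"brand": "B", "condition": "new", "price_range": "low"},
    {"brand": "B", "condition": "used", "price_range": "low"},
    {"brand": "B", "condition": "used", "price_range": "low"},
]


def learn_cpts(records, parents, alpha=1.0):
    """Estimate P(var | parents) by counting, with Laplace smoothing alpha."""
    domains = {v: sorted({r[v] for r in records}) for v in parents}
    counts = {v: defaultdict(Counter) for v in parents}
    for r in records:
        for v, ps in parents.items():
            counts[v][tuple(r[p] for p in ps)][r[v]] += 1

    def prob(var, value, assignment):
        # Look up the counts for this parent configuration and smooth them.
        key = tuple(assignment[p] for p in parents[var])
        c = counts[var][key]
        return (c[value] + alpha) / (sum(c.values()) + alpha * len(domains[var]))

    return domains, prob


def predict_missing(record, target, parents, domains, prob):
    """Posterior over one missing variable, given that all others are observed."""
    scores = {}
    for value in domains[target]:
        full = dict(record, **{target: value})
        joint = 1.0
        for v in parents:  # joint = product of the local conditionals
            joint *= prob(v, full[v], full)
        scores[value] = joint
    total = sum(scores.values())
    posterior = {k: s / total for k, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


domains, prob = learn_cpts(TRAINING, PARENTS)
best, posterior = predict_missing(
    {"brand": "A", "condition": "new"}, "price_range", PARENTS, domains, prob
)
print(best, posterior)
```

The same function can fill in a parent variable, for example predicting brand from condition and price_range, because the joint is normalized over the candidate values of whichever variable is missing. Handling several missing variables at once would require summing over their combinations or a general inference algorithm, which this sketch leaves out.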

Keywords: Machine learning, ontology, information extraction, Bayesian network


