Using Textual Pre-Processing and Text Mining to Create Semantic Links

Authors: Ricardo Avila, Gabriel Lopes, Vania Vidal, Jose Macedo


This article offers an approach to the automatic discovery of semantic concepts and links in the domain of Oil Exploration and Production (E&P). Machine learning methods combined with textual pre-processing techniques were used to detect local patterns in texts and thus generate new concepts and new semantic links. Even with the highly specific vocabularies of the oil domain, our approach achieved satisfactory results, suggesting that the proposal can be applied to other domains and languages with only minor adjustments.
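
The general idea of pre-processing text and then proposing links between concepts can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the learning algorithms used, so plain Jaccard overlap between glossary definitions stands in for the machine learning step. The stopword list, the threshold of 0.3, the function names, and the toy glossary entries are all assumptions made for illustration; only the `skos:related` predicate comes from the SKOS vocabulary named in the keywords.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (assumption);
# a real pipeline would use a fuller, language-specific one.
STOPWORDS = {"a", "an", "the", "of", "in", "to", "and", "or",
             "for", "from", "is", "that", "used"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def propose_links(glossary, threshold=0.3):
    """Propose skos:related links between terms with similar definitions.

    `glossary` maps a term to its definition. The threshold is an
    illustrative assumption, not a value taken from the article.
    """
    bags = {term: preprocess(defn) for term, defn in glossary.items()}
    links = []
    for t1, t2 in combinations(sorted(bags), 2):
        score = jaccard(bags[t1], bags[t2])
        if score >= threshold:
            links.append((t1, "skos:related", t2, round(score, 2)))
    return links


# Toy glossary entries, invented for this example.
glossary = {
    "wellbore": "The hole drilled in the ground to extract oil or gas",
    "borehole": "A hole drilled in the ground for exploration of oil or gas",
    "royalty": "Payment made to the owner of mineral rights",
}

for link in propose_links(glossary):
    print(link)
```

In this sketch, the two drilling terms share most of their definition tokens and are proposed as related, while the unrelated financial term is left unlinked. A supervised classifier trained on known links could replace the fixed threshold.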

Keywords: Semantic links, data mining, linked data, SKOS.



