On the Interactive Search with Web Documents

Mario Kubek; Herwig Unger

Abstract:

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy, usually linearly ordered result lists of web search engines, which do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. In particular, formulating queries with proper terms that represent specific information needs requires considerable effort from the user. This problem becomes even bigger when the user's knowledge of a subject and its technical terms is insufficient to do so. This article presents DocAnalyser, a new interactive search application that addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents.
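The abstract does not describe how DocAnalyser extracts search words or formulates queries. Purely as an illustration of the general idea, the sketch below uses a well-known graph-based approach in the spirit of TextRank: it ranks the terms of a document with PageRank over a word co-occurrence graph and joins the top-ranked terms into a query. The stopword list, the co-occurrence window, the damping factor and all function names are assumptions made for this example; it is not DocAnalyser's actual implementation.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stopword list (illustrative only, not DocAnalyser's).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "by", "with", "that", "this", "as", "be", "can"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cooccurrence_graph(tokens, window=2):
    """Link each term to the distinct terms among the next `window` tokens (undirected)."""
    graph = defaultdict(set)
    for i, term in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            other = tokens[j]
            if other != term:
                graph[term].add(other)
                graph[other].add(term)
    return dict(graph)


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; every node in `graph` has at least one neighbour."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Each term distributes its score evenly over its co-occurring terms.
        scores = {
            v: (1 - damping) / n
            + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in nodes
        }
    return scores


def formulate_query(text, k=3):
    """Hypothetical helper: return the k highest-ranked terms as a query string."""
    scores = pagerank(cooccurrence_graph(tokenize(text)))
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return " ".join(ranked[:k])


if __name__ == "__main__":
    sample = ("Interactive web search helps users find related web documents. "
              "Search word extraction ranks terms of web documents for query formulation.")
    print(formulate_query(sample))
```

Ranking terms by their position in the co-occurrence graph, rather than by raw frequency alone, favours terms that connect to many other terms of the text; such terms tend to describe its topic. The window size and the number of query terms are illustrative defaults, not values taken from the article.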

Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1097152
