Commenced in January 2007
Frequency: Monthly
Edition: International
Information Retrieval in Domain Specific Search Engine with Machine Learning Approaches

Authors: Shilpy Sharma

Abstract:

As the web continues to grow exponentially, crawling the entire web on a regular basis becomes less and less feasible; domain-specific search engines were proposed to cover the information in a specific domain instead. As more information becomes available on the World Wide Web, it becomes more difficult to provide effective search tools for information access. Today, people access web information through two main kinds of search interfaces: browsers (clicking and following hyperlinks) and query engines (queries expressed as a set of keywords describing the topic of interest) [2]. Web search tools need better support for expressing one's information need and for returning high-quality search results. There appears to be a need for systems that reason under uncertainty and are flexible enough to recover from the contradictions, inconsistencies, and irregularities that such reasoning involves. In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views), each of which is sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated. This paper describes the use of a semi-structured machine learning approach with active learning for domain-specific search engines. A domain-specific search engine is “an information access system that allows access to all the information on the web that is relevant to a particular domain.” The proposed work shows that, with this approach, relevant data can be extracted with a minimum number of queries fired by the user. It requires a small amount of labeled data and a pool of unlabeled data, to which the learning algorithm is applied to extract the required data.
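
The idea of starting from a small labeled set and a pool of unlabeled data can be illustrated with a minimal sketch of pool-based active learning. The abstract does not specify the learner or the query strategy, so everything here is an assumption made for illustration: a tiny Naive Bayes text classifier, uncertainty sampling (query the document whose predicted relevance is closest to 0.5), toy documents, and a hypothetical `oracle` function that stands in for the user who labels a queried document.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Tiny two-class multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = {0: 0, 1: 0}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
            self.n_docs[label] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_relevant(self, doc):
        """Estimated probability that doc belongs to the domain (class 1)."""
        log_odds = math.log((self.n_docs[1] + 1) / (self.n_docs[0] + 1))
        v = len(self.vocab) + 1  # +1 leaves room for unseen words
        total1 = sum(self.counts[1].values())
        total0 = sum(self.counts[0].values())
        for word in tokenize(doc):
            p1 = (self.counts[1][word] + 1) / (total1 + v)
            p0 = (self.counts[0][word] + 1) / (total0 + v)
            log_odds += math.log(p1 / p0)
        return 1.0 / (1.0 + math.exp(-log_odds))


def active_learn(labeled, pool, oracle, n_queries):
    """Pool-based active learning with uncertainty sampling.

    labeled: list of (document, label) pairs, label 1 = relevant to the domain
    pool:    list of unlabeled documents
    oracle:  hypothetical stand-in for the user; returns the label of a document
    """
    labeled, pool, queried = list(labeled), list(pool), []
    for _ in range(min(n_queries, len(pool))):
        model = NaiveBayes().fit(*zip(*labeled))
        # Query the document the current model is least certain about.
        doc = min(pool, key=lambda d: abs(model.prob_relevant(d) - 0.5))
        pool.remove(doc)
        labeled.append((doc, oracle(doc)))
        queried.append(doc)
    return NaiveBayes().fit(*zip(*labeled)), queried


# Toy data (assumed for illustration): the domain is "machine learning".
seed = [("neural network training", 1), ("football match score", 0)]
unlabeled = [
    "gradient descent training loss",
    "league football transfer news",
    "support vector machine classifier",
    "chocolate cake recipe",
]


def oracle(doc):
    # Simulated user judgment, not part of the original work.
    return int(any(w in doc for w in ("training", "classifier", "gradient")))


model, queried = active_learn(seed, unlabeled, oracle, n_queries=2)
print("Documents sent to the user for labeling:", queried)
print("P(relevant | 'gradient descent'):", model.prob_relevant("gradient descent"))
```

Only the queried documents cost the user any effort; the rest of the pool stays unlabeled. A multi-view variant along the lines mentioned in the abstract would train one such classifier per view, which this sketch does not attempt.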

Keywords: Search engines, machine learning, information retrieval, active logic

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1074553


References:


[1] LookOff E-book, Engine Basics, http://www.lookoff.com/tactics/engines.php3, Oct. 24, 2000.
[2] M. Jaczynski, B. Trousse, Broadway: A Case-Based System for Cooperative Information Browsing on the World-Wide-Web, Collaboration between Human and Artificial Societies, pp. 264-283, 1999.
[3] Internet Fact and State, http://optistreams.com/factsandstats15.htm
[4] The Censorware Project, http://www.censorware.org/web_size, Jan. 26, 1999
[5] S. Lawrence and C.L. Giles, Searching the World Wide Web, Science 280:98-100, 1998.
[6] S. Lawrence and C.L. Giles, Accessibility of Information on the Web, Nature 400:107-109, 1999.
[7] S. Chakrabarti, Data mining for hypertext: A tutorial survey, SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, ACM 1(2): 1-11, 2000.
[8] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Proceedings of the Seventh International World Wide Web Conference, 1998.
[9] S. Mizzaro, Relevance: The whole history, Journal of the American Society for Information Science, 48(9): 810-832, 1997.
[10] S. Lawrence, Context in web Search, IEEE Data Engineering Bulletin,
[11] A. Blum and T. Mitchell, Combining labeled and unlabeled data with co-training, Proc. of the Conference on Computational Learning Theory, pp. 92-100, 1998.
[12] M. Collins and Y. Singer, Unsupervised models for named entity classification, Proc. of the Empirical NLP and Very Large Corpora Conference, pp. 100-110, 1999. de Sa, V., & Ballard, D. (1998).
[13] T. M. Mitchell, Machine Learning, New York: McGraw-Hill, 1997.
[14] S. Chakrabarti, M. van den Berg, and B. Dom, Focused crawling: a new approach to topic-specific web resource discovery, Proceedings of the 8th International World Wide Web Conference (WWW8), 1999.