WASET

%0 Journal Article
%A Shilpy Sharma
%D 2008
%J International Journal of Industrial and Manufacturing Engineering
%B World Academy of Science, Engineering and Technology
%I Open Science Index 18, 2008
%T Information Retrieval in Domain Specific Search Engine with Machine Learning Approaches
%U https://publications.waset.org/pdf/10075
%V 18
%X As the web continues to grow exponentially, crawling the entire
web on a regular basis becomes less and less feasible, so
domain-specific search engines, which cover only the information in a
specific domain, were proposed. As more information
becomes available on the World Wide Web, it becomes more difficult
to provide effective search tools for information access. Today,
people access web information through two main kinds of search
interfaces: Browsers (clicking and following hyperlinks) and Query
Engines (queries in the form of a set of keywords showing the topic
of interest) [2]. Web search tools need better support for expressing
one's information need and for returning high-quality search results.
There appears to be a need for systems that reason
under uncertainty and are flexible enough to recover from the
contradictions, inconsistencies, and irregularities that such reasoning
involves. In a multi-view problem, the features of the domain can be
partitioned into disjoint subsets (views) that are sufficient to learn the
target concept. Semi-supervised, multi-view algorithms, which
reduce the amount of labeled data required for learning, rely on the
assumptions that the views are compatible and uncorrelated. This
paper describes the use of a semi-structured machine learning approach
with active learning for domain-specific search engines. A
domain-specific search engine is “an information access system that
allows access to all the information on the web that is relevant to a
particular domain.” The proposed work shows that with the help of
this approach relevant data can be extracted with a minimal number of
queries issued by the user. It requires only a small amount of labeled
data and a pool of unlabeled data, to which the learning algorithm is
applied to extract the required data.
%P 643 - 646