%0 Journal Article
	%A David B. Bracewell and Fuji Ren and Shingo Kuroiwa
	%D 2008
	%J International Journal of Computer and Information Engineering
	%B World Academy of Science, Engineering and Technology
	%I Open Science Index 18, 2008
	%T Mining News Sites to Create Special Domain News Collections
	%U https://publications.waset.org/pdf/2162
	%V 18
	%X We present a method to create special domain
collections from news sites. The method requires only a single
sample article as a seed. No prior corpus statistics are needed and the
method is applicable to multiple languages. We examine various
similarity measures and the creation of document collections for
English and Japanese. The main contributions are as follows. First,
the algorithm can build special domain collections from as little as
one sample document. Second, unlike other algorithms, it does not
require a second “general” corpus to compute statistics. Third, in our
testing the algorithm outperformed others in creating collections
made up of highly relevant articles.
	%P 2072 - 2079