%0 Journal Article
%A David B. Bracewell and Fuji Ren and Shingo Kuroiwa
%D 2008
%J International Journal of Computer and Information Engineering
%B World Academy of Science, Engineering and Technology
%I Open Science Index 18, 2008
%T Mining News Sites to Create Special Domain News Collections
%U https://publications.waset.org/pdf/2162
%V 18
%X We present a method to create special domain collections from news sites. The method only requires a single sample article as a seed. No prior corpus statistics are needed and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms it does not require a second “general” corpus to compute statistics. Third, in our testing the algorithm outperformed others in creating collections made up of highly relevant articles.
%P 2072 - 2079