Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1

Special DomainCollections Related Publications

1 Mining News Sites to Create Special Domain News Collections

Authors: David B. Bracewell, Fuji Ren, Shingo Kuroiwa

Abstract:

We present a method to create special domain collections from news sites. The method only requires a single sample article as a seed. No prior corpus statistics are needed and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms it does not require a second “general" corpus to compute statistics. Third, in our testing the algorithm outperformed others in creating collections made up of highly relevant articles.

Keywords: Information Retrieval, news, Special DomainCollections

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1234