TY - JFULL
AU - Dasom Kim and Chen Liu and Myungsu Lim and Soo-Hyeon Jeon and Byeoung Kug Jeon and Kee-Young Kwahk and Namgyu Kim
PY - 2015/11/
TI - A Methodology for Automatic Diversification of Document Categories
T2 - International Journal of Computer and Information Engineering
SP - 2199
EP - 2205
VL - 9
SN - 1307-6892
UR - https://publications.waset.org/pdf/10002629
PU - World Academy of Science, Engineering and Technology
NX - Open Science Index 106, 2015
N2 - Recently, numerous documents containing large volumes of unstructured data and text have been created because of the rapid increase in the use of social media and the Internet. These documents are usually categorized for the convenience of users. Because the accuracy of manual categorization is not guaranteed, and such categorization requires a large amount of time and incurs huge costs, many studies on automatic categorization have been conducted to help mitigate the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to complex documents with multiple topics because they assume that each document belongs to a single category only. To overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, the learning process employed in these studies requires training on a multi-categorized document set. Traditional multi-categorization algorithms therefore cannot be applied to most documents unless multi-categorized training sets are provided. To overcome this limitation, in this study, we review our novel methodology for extending the category of a single-categorized document to multiple categories, and then introduce a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology.
ER -