Improvement of a Label Extraction Method for a Risk Search System

Authors: Shigeaki Sakurai, Ryohei Orihara

Abstract:

This paper proposes a method for improving the classification efficiency of a classification model. The model is used in a risk search system and extracts specific labels from articles posted on bulletin board sites. The system can analyze important discussions composed of these articles. The method introduces ensemble learning, which combines multiple classification models, and incorporates expressions related to the specific labels into the generation of word vectors. The paper applies the method to articles collected from three bulletin board sites selected by users and verifies its effectiveness.
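
As a rough illustration of the two ideas named in the abstract, the following minimal sketch augments bag-of-words vectors with label-related expression features and combines several base models by majority vote over bootstrap samples (bagging). It is not the authors' implementation: the expression list, the nearest-centroid base model, the toy articles, and all function names are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter

# Hypothetical label-related expressions (assumption; the paper's list is not given here).
LABEL_EXPRESSIONS = ["recall", "defect", "broke"]


def vectorize(text):
    """Bag-of-words counts plus one extra feature per matched label expression."""
    text = text.lower()
    vec = Counter(text.split())
    for expr in LABEL_EXPRESSIONS:
        if expr in text:
            vec["EXPR:" + expr] += 1
    return vec


def train_centroids(samples):
    """Base model (assumption): summed word vectors per label."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def predict_one(centroids, text):
    """Pick the label whose normalized centroid overlaps most with the text."""
    vec = vectorize(text)

    def score(label):
        c = centroids[label]
        total = sum(c.values()) or 1
        return sum(vec[w] * c[w] / total for w in vec)

    # sorted() makes tie-breaking deterministic.
    return max(sorted(centroids), key=score)


def train_bagging(samples, n_models=5, seed=0):
    """Train each base model on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    return [
        train_centroids([rng.choice(samples) for _ in samples])
        for _ in range(n_models)
    ]


def predict_bagging(models, text):
    """Majority vote over the base models' predictions."""
    votes = Counter(predict_one(m, text) for m in models)
    return votes.most_common(1)[0][0]


# Toy training articles (invented for illustration only).
SAMPLES = [
    ("the product broke after one week", "risk"),
    ("they should recall this defect model", "risk"),
    ("battery broke and caught fire", "risk"),
    ("great price and fast shipping", "other"),
    ("i like the new color", "other"),
    ("shipping was fast and cheap", "other"),
]

models = train_bagging(SAMPLES)
label = predict_bagging(models, "another defect report, maybe a recall")
```

In this sketch the `EXPR:` features give label-related expressions extra weight beyond their plain word counts, and the vote across resampled models is intended to reduce the sensitivity of any single model to its training sample.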

Keywords: Text mining, Risk search system, Corporate reputation, Bulletin board site, Ensemble learning

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1333662

