Analysis of Textual Data Based On Multiple 2-Class Classification Models

Authors: Shigeaki Sakurai, Ryohei Orihara


This paper proposes a new method for analyzing textual data in which each item is described from multiple viewpoints. The method acquires a 2-class classification model for each viewpoint by applying an inductive learning method to items labeled with multiple viewpoints. Using these models, it infers whether each viewpoint should be assigned to a new item. It then extracts expressions from the new items classified into each viewpoint and identifies the characteristic expressions of each viewpoint by comparing expression frequencies among the viewpoints. The method is applied to questionnaire data collected from hotel guests, and its effectiveness is verified through numerical experiments.
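
The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: a small multinomial naive Bayes model stands in for the unspecified inductive learning method, single words stand in for "expressions", the stopword list and the sample hotel comments are invented, and the characteristic score (share of items within a viewpoint minus the share in the other viewpoints) is only one plausible way to compare frequencies.

```python
import math
from collections import Counter

# Hypothetical stopword list; the abstract does not specify any preprocessing.
STOPWORDS = {"the", "was", "and", "a", "in", "with", "were"}

def tokenize(text):
    """Lowercase word tokens without stopwords (a stand-in for expression extraction)."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def train_binary_model(items, viewpoint):
    """Fit one 2-class model: is `viewpoint` assigned to an item or not?"""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, labels in items:
        y = viewpoint in labels
        counts[y].update(tokenize(text))
        docs[y] += 1
    vocab = set(counts[True]) | set(counts[False])
    return {"counts": counts, "docs": docs, "vocab": vocab}

def predict(model, text):
    """Return True if the model assigns its viewpoint to `text`."""
    n_docs = sum(model["docs"].values())
    v = len(model["vocab"])
    score = {}
    for y in (True, False):
        total = sum(model["counts"][y].values())
        s = math.log((model["docs"][y] + 1) / (n_docs + 2))  # smoothed class prior
        for w in tokenize(text):
            if w in model["vocab"]:  # words unseen in training are skipped
                s += math.log((model["counts"][y][w] + 1) / (total + v))
        score[y] = s
    return score[True] > score[False]

def characteristic_expressions(models, new_items, top_n=3):
    """Rank words per viewpoint by how much more often they occur there than elsewhere."""
    freq = {vp: Counter() for vp in models}
    size = {vp: 0 for vp in models}
    for text in new_items:
        for vp, model in models.items():
            if predict(model, text):
                freq[vp].update(set(tokenize(text)))  # count each word once per item
                size[vp] += 1
    result = {}
    for vp in models:
        rest, rest_size = Counter(), 0
        for other in models:
            if other != vp:
                rest.update(freq[other])
                rest_size += size[other]

        def score(w):
            return freq[vp][w] / max(size[vp], 1) - rest[w] / max(rest_size, 1)

        result[vp] = sorted(freq[vp], key=lambda w: (-score(w), w))[:top_n]
    return result

# Invented training items: (comment, set of assigned viewpoints).
training = [
    ("the room was clean and quiet", {"room"}),
    ("the bed in the room was small", {"room"}),
    ("breakfast was tasty and fresh", {"meal"}),
    ("dinner was cold and bland", {"meal"}),
    ("the room was clean and breakfast was tasty", {"room", "meal"}),
    ("staff were friendly", set()),
]
models = {vp: train_binary_model(training, vp) for vp in ("room", "meal")}

# Invented new, unlabeled items.
new_items = [
    "clean room with a big bed",
    "tasty dinner and fresh breakfast",
    "quiet room and small bed",
    "clean room and tasty breakfast",
]
result = characteristic_expressions(models, new_items)
print(result)
```

Because each viewpoint has its own binary model, a single comment can be assigned to several viewpoints or to none, which is why the sketch trains one model per viewpoint rather than a single multi-class classifier.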

Keywords: Text mining, Multiple viewpoints, Differential analysis, Questionnaire data


