Discovery of Time Series Event Patterns Based on Time Constraints from Textual Data
Authors: Shigeaki Sakurai, Ken Ueno, Ryohei Orihara
Abstract:
This paper proposes a method that discovers time series event patterns from textual data with time information. Each pattern is a sequence of events, and each event is characteristic content extracted from the textual data, such as a company name, an action, or a customer's impression. Based on an analysis of the textual data, the method introduces seven types of time constraints and evaluates them when calculating the frequency of a time series event pattern. We can therefore flexibly define time constraints for combinations of events of interest and discover valid time series event patterns that satisfy those constraints. The paper applies the method to daily business reports collected by a sales force automation system and verifies its effectiveness through numerical experiments.
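The abstract does not spell out the seven constraint types, so the following is only a minimal Python sketch of the general idea of counting a pattern's frequency under a time constraint. It assumes a single hypothetical constraint: consecutive events in a pattern must occur between `min_gap` and `max_gap` days apart. The function names `matches` and `support`, the sample reports, and the definition of frequency as the fraction of report sequences containing the pattern are illustrative assumptions, not the authors' algorithm.

```python
from typing import List, Set, Tuple

# Hypothetical representation: one sequence = a list of (day, events) pairs,
# e.g. the daily reports about a single customer.
Sequence = List[Tuple[int, Set[str]]]


def matches(seq: Sequence, pattern: List[str], min_gap: int, max_gap: int) -> bool:
    """Return True if the events of `pattern` occur in order in `seq` (sorted by day),
    with each consecutive pair between min_gap and max_gap days apart (inclusive).
    The gap rule is an assumed example constraint, not one of the paper's seven types."""

    def search(k: int, start: int, prev_day: int) -> bool:
        if k == len(pattern):
            return True  # every event of the pattern has been matched
        for i in range(start, len(seq)):
            day, events = seq[i]
            if k > 0:
                gap = day - prev_day
                if gap > max_gap:
                    break  # reports are sorted, so later ones are even farther away
                if gap < min_gap:
                    continue
            # Backtrack: if this occurrence leads nowhere, try a later one.
            if pattern[k] in events and search(k + 1, i + 1, day):
                return True
        return False

    return search(0, 0, 0)


def support(sequences: List[Sequence], pattern: List[str],
            min_gap: int = 1, max_gap: int = 7) -> float:
    """Fraction of sequences containing `pattern` under the assumed gap constraint."""
    hits = sum(
        1 for seq in sequences
        if matches(sorted(seq, key=lambda r: r[0]), pattern, min_gap, max_gap)
    )
    return hits / len(sequences)


if __name__ == "__main__":
    # Made-up reports for illustration only.
    reports = [
        [(1, {"visit", "CompanyA"}), (3, {"proposal"}), (10, {"order"})],
        [(1, {"visit"}), (20, {"proposal"}), (22, {"order"})],
        [(2, {"visit"}), (4, {"proposal"}), (6, {"order"})],
    ]
    # The second sequence fails because its proposal comes 19 days after the visit.
    print(support(reports, ["visit", "proposal", "order"]))  # 2 of 3 sequences match
```

The backtracking search matters here: with a maximum-gap constraint, matching each event at its earliest occurrence can miss a valid occurrence that starts later, so the sketch retries later positions before giving up.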
Keywords: Text mining, sequential mining, time constraints, daily business reports.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1070747