Unsupervised Outlier Detection in Streaming Data Using Weighted Clustering
Authors: Yogita, Durga Toshniwal
Abstract:
Outlier detection in streaming data is very challenging because streaming data cannot be scanned multiple times and new concepts may keep evolving. Irrelevant attributes, which can be termed noisy attributes, further magnify the challenge of working with data streams. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, since clustering is an unsupervised data mining task that does not require labeled data, and it combines density-based and partitioning clustering for outlier detection. Partitioning clustering is also used to assign adaptive weights to attributes according to their relevance; these weights help reduce or remove the effect of noisy attributes. Keeping in view the challenges of streaming data, the proposed scheme is incremental and adaptive to concept evolution. Experimental results on synthetic and real-world data sets show that the proposed approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, and across increasing percentages of outliers.
Keywords: Concept Evolution, Irrelevant Attributes, Streaming Data, Unsupervised Outlier Detection.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1081627
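The abstract describes the method only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' algorithm. It shows attribute weighting inside partitioning (k-means style) clustering, applied chunk by chunk to a stream. The weight update is borrowed from W-k-means (Huang et al., 2005); the outlier rule (a z-score threshold on weighted distances) is a simple stand-in; the density-based clustering component is omitted; and concept evolution is handled only by warm-starting each chunk from the previous chunk's centroids and weights. All function names and parameters (`beta`, `z`, `iters`) are hypothetical.

```python
import math

# Hypothetical sketch: W-k-means-style attribute weighting (Huang et al., 2005)
# plus a z-score outlier rule. NOT the paper's exact method.


def weighted_dist(x, c, weights, beta):
    """Squared Euclidean distance with attribute j scaled by weights[j] ** beta."""
    return sum((w ** beta) * (xj - cj) ** 2 for xj, cj, w in zip(x, c, weights))


def assign(points, centroids, weights, beta):
    """Label each point with the index of its nearest centroid under the weighted distance."""
    return [
        min(range(len(centroids)), key=lambda k: weighted_dist(p, centroids[k], weights, beta))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of its members; empty clusters keep their old centroid."""
    new = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new.append(list(old))
    return new


def update_weights(points, labels, centroids, beta):
    """W-k-means-style update: w_j = 1 / sum_t (D_j / D_t) ** (1 / (beta - 1)),
    where D_j is the within-cluster dispersion of attribute j (requires beta > 1).
    Attributes with larger dispersion (e.g. noisy ones) receive smaller weights."""
    m = len(points[0])
    disp = [
        sum((p[j] - centroids[lab][j]) ** 2 for p, lab in zip(points, labels))
        for j in range(m)
    ]
    nonzero = [d for d in disp if d > 0]
    # Attributes with zero dispersion get weight 0 in this sketch.
    return [
        1.0 / sum((d / t) ** (1.0 / (beta - 1)) for t in nonzero) if d > 0 else 0.0
        for d in disp
    ]


def weighted_kmeans(points, centroids, weights, beta=2.0, iters=10):
    """Alternate assignment, centroid and weight updates for a fixed number of rounds."""
    for _ in range(iters):
        labels = assign(points, centroids, weights, beta)
        centroids = update_centroids(points, labels, centroids)
        weights = update_weights(points, labels, centroids, beta)
    return centroids, weights, assign(points, centroids, weights, beta)


def detect_outliers_in_stream(chunks, init_centroids, beta=2.0, z=3.0, iters=10):
    """Process the stream one chunk at a time (each chunk is seen once), warm-starting
    from the previous chunk's centroids and weights. A point is flagged when its
    weighted distance to its centroid exceeds the chunk mean by more than z std devs."""
    centroids = [list(c) for c in init_centroids]
    m = len(centroids[0])
    weights = [1.0 / m] * m  # start with uniform attribute weights
    flagged = []
    for chunk in chunks:
        centroids, weights, labels = weighted_kmeans(chunk, centroids, weights, beta, iters)
        dists = [weighted_dist(p, centroids[lab], weights, beta) for p, lab in zip(chunk, labels)]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        flagged.append([i for i, d in enumerate(dists) if std > 0 and d > mean + z * std])
    return flagged, weights
```

With `beta = 2` the update reduces to normalized inverse dispersion, so an attribute whose within-cluster spread is four times larger than another's ends up with a quarter of its weight. A single extreme point placed far from two tight clusters in one chunk is the only point flagged in that chunk, while a following chunk without it yields no flags.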