Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan

Abstract:

Parallel hybrid storage systems consist of a hierarchy of storage devices that differ in data read speed: the higher the level in the hierarchy, the faster the reads. Migrating the application's important data that will be accessed in the near future to the uppermost level therefore reduces the application's I/O waiting time and, in turn, its elapsed execution time. In this research, we implement a trace-driven, two-level parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify the application's data in order to determine its near-future data accesses in parallel with its on-demand requests. The important data (i.e., the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach, integrated with data mining techniques, reduces the application's elapsed execution time by at least 22% across a variety of traces.
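The migration idea described in the abstract can be illustrated with a minimal trace-driven sketch. It is not the authors' prototype. The latency constants, the `simulate` function, and the SSD eviction policy are hypothetical, and a simple successor-frequency predictor stands in for the data mining classifiers named in the keywords. Migration is assumed to overlap fully with on-demand requests, so it adds no latency here.

```python
from collections import Counter, OrderedDict, defaultdict

HDD_READ_MS = 10.0  # assumed per-block HDD read latency (illustrative only)
SSD_READ_MS = 0.1   # assumed per-block SSD read latency (illustrative only)


def simulate(trace, ssd_capacity, migrate=True):
    """Replay a block trace on a two-level HDD/SSD model; return total read time in ms."""
    successors = defaultdict(Counter)  # block -> counts of the blocks observed right after it
    ssd = OrderedDict()                # blocks currently on the SSD level, oldest first
    total_ms = 0.0
    prev = None
    for block in trace:
        # Serve the on-demand request from the fastest level that holds the block.
        if block in ssd:
            total_ms += SSD_READ_MS
            ssd.move_to_end(block)
        else:
            total_ms += HDD_READ_MS
        # Learn from the trace: record that `block` followed `prev`.
        if prev is not None:
            successors[prev][block] += 1
        # Stand-in predictor: migrate the most frequent successor of the current block.
        if migrate and successors[block]:
            predicted = successors[block].most_common(1)[0][0]
            ssd[predicted] = True
            ssd.move_to_end(predicted)
            # Hypothetical eviction policy: drop the least recently used SSD block.
            while len(ssd) > ssd_capacity:
                ssd.popitem(last=False)
        prev = block
    return total_ms


if __name__ == "__main__":
    trace = [1, 2, 3, 4] * 50  # synthetic repeating access pattern, not a real trace
    print("HDD only:      ", simulate(trace, ssd_capacity=2, migrate=False), "ms")
    print("with migration:", simulate(trace, ssd_capacity=2, migrate=True), "ms")
```

On a repeating pattern like the synthetic trace above, most requests are served from the SSD level once the predictor has seen each transition. Real traces and real classifiers would behave differently.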

Keywords: Data mining, hybrid storage system, recurrent neural network, support vector machine.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1112183

