Practical Method for Digital Music Matching Robust to Various Sound Qualities
Authors: Bokyung Sung, Jungsoo Kim, Jinman Kwun, Junhyung Park, Jihye Ryeo, Ilju Ko
Abstract:
In this paper, we propose a practical digital music matching system that is robust to variations in sound quality. The proposed system is divided into two parts: a client and a server. The client consists of input, preprocessing, and feature extraction modules. The preprocessing module, which includes a music onset module, compensates for the time-axis gap that occurs between identical songs stored in different formats. The proposed method uses delta-grouped Mel-frequency cepstral coefficients (MFCCs) to extract music features that are robust to changes in sound quality. The music server is constructed with a feature database (FD) that contains one sub-feature database (SFD) for each sound quality format (SQF) used. When the system receives a music file, the selection module selects the appropriate SFD from the feature database, and the matching module then searches the selected SFD. In this study, we ran matching experiments with 3,000 queries over three cases with different FDs; in each case, we used 1,000 queries constructed from 125 songs encoded in 8 SQFs. The success rate of music matching improved from 88.6% when using a single SFD to 93.2% when using quadruple SFDs. These experiments demonstrate that the proposed method is robust to various sound qualities.
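As a concrete illustration of the client-side pipeline sketched above, the following is a minimal Python sketch of delta-grouped MFCC extraction. It assumes the librosa library; the function name extract_delta_mfcc, the parameter values, and the use of silence trimming as a stand-in for the paper's onset/preprocessing module are illustrative assumptions, not the authors' implementation.

import librosa
import numpy as np

def extract_delta_mfcc(path, n_mfcc=13):
    """Decode a music file and return delta-grouped MFCC features."""
    # Decode to a common sample rate and mono so files in different
    # sound quality formats (SQFs) become comparable.
    y, sr = librosa.load(path, sr=22050, mono=True)
    # Trim leading/trailing silence so identical songs stored in different
    # formats align on the time axis (a stand-in for the onset module).
    y, _ = librosa.effects.trim(y, top_db=30)
    # Static MFCCs plus first-order deltas, grouped into one feature matrix.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    return np.vstack([mfcc, delta])

On the server side, one such feature matrix would be stored per song and per SQF to form the SFDs described above, with the selection module routing each query to the SFD whose SQF matches the query file.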
Keywords: Digital Music, Music Matching, Variation in Sound Qualities, Robust Matching Method.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1072587