A Non-Parametric Based Mapping Algorithm for Use in Audio Fingerprinting
Authors: Analise Borg, Paul Micallef
Abstract:
Online multimedia collections have grown rapidly over the past few years. Several companies have shown interest in ways of organising this volume of audio information without relying on human intervention to generate metadata, and many applications capable of identifying a piece of music within a short time have emerged on the market. Audio effects and degradation make the unknown piece much harder to identify. This paper presents an audio fingerprinting system that makes use of a non-parametric based algorithm. Parametric analysis is also performed using Gaussian Mixture Models (GMMs). The feature extraction methods employed are the Mel Spectrum Coefficients and the MPEG-7 basic descriptors. During the non-parametric modelling, the extracted feature coefficients are replaced by bin numbers. The results show that non-parametric analysis is promising, offering results comparable to those reported in the literature.
Keywords: Audio fingerprinting, mapping algorithm, Gaussian Mixture Models, MFCC, MPEG-7.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1100034
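Illustrative note: the abstract states that bin numbers replaced the extracted feature coefficients during non-parametric modelling, but it does not describe the binning scheme. The following is only a minimal sketch of one way such a mapping could look, assuming equal-width bins computed separately for each coefficient. The function name `coefficients_to_bins`, the `(n_frames, n_coeffs)` array layout, and the default of 16 bins are all hypothetical and are not taken from the paper.

```python
import numpy as np


def coefficients_to_bins(features, n_bins=16):
    """Replace each coefficient with the index of the equal-width bin it falls in.

    features: array of shape (n_frames, n_coeffs); this layout is an assumption.
    Bin edges are derived per coefficient (column) from that column's own range.
    """
    features = np.asarray(features, dtype=float)
    bins = np.empty(features.shape, dtype=int)
    for j in range(features.shape[1]):
        column = features[:, j]
        # n_bins + 1 equally spaced edges spanning [min, max] of this column
        edges = np.histogram_bin_edges(column, bins=n_bins)
        # Digitising against the interior edges keeps indices in 0..n_bins-1,
        # so the column maximum lands in the last bin rather than overflowing
        bins[:, j] = np.digitize(column, edges[1:-1])
    return bins


# Toy input: 4 frames, 2 coefficients per frame (made-up values)
toy = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]])
toy_bins = coefficients_to_bins(toy, n_bins=4)
print(toy_bins)
```

Each row of the result is a vector of small integers that stands in for the original real-valued coefficients. How the paper then matches these bin sequences against a database is not stated in the abstract and is not sketched here.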