Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System
Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu
Abstract:
Unified Speech Audio Coding (USAC), the latest MPEG standard for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. Owing to shortcomings of this system, introducing a well-designed orchestra/percussion classification and modifying the subsequent processing can considerably increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system that extracts only 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than the traditional 13, and uses an Iterative Dichotomiser 3 (ID3) decision tree rather than a more complex learning method; the proposed algorithm therefore has lower computational complexity than most existing algorithms. Because frequent changes of the classified attribute may degrade the quality of the recovered audio signal, this paper also designs a modified subsequent process that helps the whole classification system reach an accuracy as high as 97%, comparable to the 99% of classical methods.
Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1086545
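The two ideas named in the abstract, an ID3 split criterion over a small set of MFCC-derived attributes and a post-processing step that suppresses frequent label changes, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the attributes are taken to be three MFCC values already discretized into "low"/"high" bins, the toy data is invented, and the smoothing step is a plain sliding-window majority vote standing in for the paper's modified subsequent process, whose details the abstract does not give.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """ID3 split criterion: entropy reduction from splitting on rows[i][attr]."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])),
               key=lambda a: information_gain(rows, labels, a))


def smooth_labels(labels, window=3):
    """Sliding-window majority vote (an assumed stand-in for the paper's
    post-processing) that removes isolated label flips."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        segment = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(segment).most_common(1)[0][0])
    return smoothed


# Hypothetical toy data: each row holds three discretized MFCC attributes.
rows = [("low", "high", "low"), ("low", "low", "high"),
        ("high", "high", "high"), ("high", "low", "low")]
labels = ["orchestra", "orchestra", "percussion", "percussion"]
root = best_attribute(rows, labels)

# Hypothetical frame-level decisions ("O" = orchestra, "P" = percussion).
frames = ["O", "O", "P", "O", "O", "O", "P", "P", "P", "P"]
stable = smooth_labels(frames, window=3)
```

In this toy data only the first attribute separates the two classes, so ID3 would place it at the root of the tree. The majority vote removes the single-frame "P" flip at the third frame while keeping the sustained change to "P" later in the sequence.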