The Influence of Audio on Perceived Quality of Segmentation

Authors: Silvio R. R. Sanches, Bianca C. Barbosa, Beatriz R. Brum, Cléber G. Corrêa


To evaluate the quality of a segmentation algorithm, researchers use subjective or objective metrics. Although subjective metrics are more accurate than objective ones, objective metrics do not require user feedback to test an algorithm; they require subjective experiments only during their development. Subjective experiments typically show users videos, generated from frames containing segmentation errors, that simulate the environment of an application domain. The resulting user feedback is crucial information for defining the metric. In the subjective experiments used to develop some state-of-the-art metrics for testing segmentation algorithms, the videos displayed did not contain audio. Audio, however, is an essential component of applications such as videoconferencing and augmented reality. If audio influences the user’s perception, using only silent videos in subjective experiments can compromise the effectiveness of an objective metric built from data collected in these experiments. This work aims to determine whether audio influences the user’s perception of segmentation quality in background substitution applications that include audio. The proposed approach used a subjective method based on formal video quality assessment methods. The results showed that audio influences the segmentation quality perceived by users.

Keywords: Background substitution, influence of audio, segmentation evaluation, segmentation quality.
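Subjective experiments of the kind described in the abstract usually summarize the viewers' ratings for each test condition (for example, a segmented video shown with and without audio) as a mean opinion score (MOS) with a confidence interval. The following is a minimal sketch, not the authors' analysis: it assumes each rating is a number on a fixed scale (e.g. 1 to 5) and uses the half-width z · S / √N common in ITU-R BT.500-style analyses, where S is the sample standard deviation. The function name `mos_with_ci` and the ratings are hypothetical and made up for illustration only.

```python
import math
from statistics import mean, stdev


def mos_with_ci(scores, z=1.96):
    """Return (MOS, half-width of the ~95% confidence interval) for one condition.

    Hypothetical helper: assumes each score is one viewer's rating of the same
    video on a numeric scale. The half-width is z * S / sqrt(N), with S the
    sample standard deviation (N - 1 in the denominator).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    mos = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(n)
    return mos, half_width


# Made-up ratings for illustration only; NOT data from the study.
ratings = {
    "video_only": [4, 3, 4, 5, 3, 4, 4, 3],
    "video_with_audio": [3, 4, 3, 4, 2, 3, 4, 3],
}

for condition, scores in ratings.items():
    mos, delta = mos_with_ci(scores)
    print(f"{condition}: MOS = {mos:.2f} +/- {delta:.2f}")
```

Comparing the intervals of the two conditions gives only a first, informal indication of whether audio shifts the perceived quality; a formal conclusion would require an appropriate statistical test.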

