Quality of Experience assessment methodologies in next-generation video compression standards
Jing Li, University of Nantes, France
3D viewing experience: depth rendering, visual discomfort
Ultra-HD viewing experience: 480p, 720p, 1080p, 4K. Image quality, naturalness, immersiveness
Outline: What is QoE (Quality of Experience)? How do we measure QoE? Why do traditional quality assessment methods fall short? What are the solutions? International efforts towards standardization
Definition: Quality of Experience (QoE) is "the degree of delight or annoyance of the user of an application or service. It results from the fulfillment of his or her expectations with respect to the utility and/or enjoyment of the application or service in the light of the user's personality and current state." (Qualinet White Paper, 2012)
P. Le Callet, S. Möller, A. Perkis, "Qualinet White Paper on Definitions of Quality of Experience (2012)", European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Version 1.1, 2012.
Quality of Experience (QoE) in 3DTV/UHDTV: the visual experience [Chen2012] involves naturalness, depth rendering (2D image quality, depth quantity), and visual discomfort.
Chen, W., Fournier, J., Barkowsky, M., & Le Callet, P. (2012). Exploration of Quality of Experience of Stereoscopic Images: Binocular Depth. VPQM.
Subjective assessment methodology for QoE
2D image quality assessment [P.910][BT.500]: ACR (Absolute Category Rating), DSCQS (Double-Stimulus Continuous Quality Scale), SSCQE (Single Stimulus Continuous Quality Evaluation)
Visual discomfort assessment in 3D [BT.2021]
ACR quality scale: 5 excellent, 4 good, 3 fair, 2 poor, 1 bad
Comfort scale: 5 very comfortable, 4 comfortable, 3 mildly uncomfortable, 2 uncomfortable, 1 extremely uncomfortable
Example: scale interpretation and observer variability. A co-joint ACR experiment for visual comfort and image quality in 3DTV [Engelke2011].
Engelke, Ulrich, Yohann Pitrey, and Patrick Le Callet. "Towards an inter-observer analysis framework for multimedia quality assessment." Third International Workshop on Quality of Multimedia Experience (QoMEX), IEEE, 2011.
Subjective assessment methodology for 3D quality: subjects are not always capable of expressing their perceptions or impressions by means of an exact numerical value. ACR scale: 5 excellent, 4 good, 3 fair, 2 poor, 1 bad
Observer context dependency: judgment must be linked with experience. If you have never watched a 3D movie, it is hard to judge the quality of 3D videos. We are not yet used to 3D or Ultra HD.
Language interpretation: on the quality scale, does "good" in English equal "bon" in French?
Subjective assessment methodology for QoE: a good alternative. Subjects are not always capable of expressing their perceptions or impressions by means of an exact numerical value.
ACR: 5 excellent, 4 good, 3 fair, 2 poor, 1 bad
Alternative: Pair Comparison. Given A and B, which one do you prefer? Easier to understand and implement.
Standardized pair comparison: to obtain accurate estimates of mean scores, all possible pairs are compared; N stimuli yield N(N-1)/2 pairs [ITU-T P.910]. All observers compare the four stimuli A1 to A4, and their choices are aggregated into a pair comparison matrix:

      A1  A2  A3  A4
  A1   -   5   3   2
  A2   5   -   4   1
  A3   7   6   -   5
  A4   8   9   5   -

e.g., for pair (A1, A3): 3 out of 10 observers chose A1, 7 out of 10 chose A3.
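To make the aggregation step concrete, here is a small sketch (assuming a hypothetical trial-record format of (stimulus A, stimulus B, winner); this format is an illustration, not prescribed by the slides) that tallies raw observer choices into such a matrix:

```python
# Hypothetical raw data: each trial records (stimulus_a, stimulus_b, winner).
trials = [
    ("A1", "A3", "A3"), ("A1", "A3", "A1"), ("A1", "A3", "A3"),
    ("A2", "A4", "A4"), ("A2", "A4", "A4"),
]

def tally(trials, stimuli):
    """Aggregate raw preference trials into a pair comparison matrix:
    M[i][j] = number of observers who preferred stimuli[i] over stimuli[j]."""
    idx = {s: k for k, s in enumerate(stimuli)}
    n = len(stimuli)
    M = [[0] * n for _ in range(n)]
    for a, b, winner in trials:
        loser = b if winner == a else a
        M[idx[winner]][idx[loser]] += 1
    return M

M = tally(trials, ["A1", "A2", "A3", "A4"])
# With the sample trials above: M[2][0] == 2 (A3 preferred over A1 twice)
```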
Paired comparison model: converting paired comparison data to scale values.

      A1  A2  A3  A4
  A1   -   5   3   2
  A2   5   -   4   1
  A3   7   6   -   5
  A4   8   9   5   -

The pair comparison matrix is converted to scale values with the Thurstone model or the Bradley-Terry (B-T) model, here yielding a scale with A4 highest and A1 lowest. The B-T model also provides: confidence intervals, goodness of model fit, and hypothesis tests.
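The B-T fit itself can be sketched with the classic Zermelo fixed-point (minorization-maximization) update; this is an illustrative implementation under that standard formulation, not necessarily the exact tool used in the original study:

```python
def bradley_terry(wins, n_iter=500, tol=1e-9):
    """Fit Bradley-Terry worth parameters from a pair comparison matrix.

    wins[i][j] = number of observers who preferred stimulus i over stimulus j.
    Returns worths p (normalized to sum to 1); scale values are often
    reported as log(p). Uses the Zermelo/MM fixed-point update.
    """
    n = len(wins)
    total_wins = [sum(wins[i]) for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p_new = []
        for i in range(n):
            # Comparisons involving i, weighted by current worth estimates
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            p_new.append(total_wins[i] / denom)
        s = sum(p_new)
        p_new = [v / s for v in p_new]
        if max(abs(a - b) for a, b in zip(p, p_new)) < tol:
            return p_new
        p = p_new
    return p

# Pair comparison matrix from the slide (10 observations per pair)
C = [[0, 5, 3, 2],
     [5, 0, 4, 1],
     [7, 6, 0, 5],
     [8, 9, 5, 0]]
p = bradley_terry(C)
ranking = sorted(range(4), key=lambda i: -p[i])  # best to worst; A4 comes first
```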
The number of trials: a limitation of pair comparison in real applications. All pairs have to be compared, so the number of trials grows as y = x(x-1)/2 for x stimuli. Each trial takes A1 (10 s) + gray (2 s) + A2 (10 s) + voting (5 s) = 27 s. For 40 stimuli that means 780 trials = 351 minutes, whereas an ACR test needs only 40 stimuli × 15 s = 10 minutes!
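The arithmetic on this slide can be reproduced directly (a minimal sketch; the timing constants are the ones given above):

```python
def full_design_trials(n):
    """Number of pairs in a full pair comparison design: n(n-1)/2."""
    return n * (n - 1) // 2

def session_minutes(n, trial_s=27):
    """Session length when every trial takes trial_s seconds
    (10 s stimulus A + 2 s gray + 10 s stimulus B + 5 s voting = 27 s)."""
    return full_design_trials(n) * trial_s / 60

# 40 stimuli: 780 trials, 351 minutes, versus 40 x 15 s = 10 minutes for ACR
```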
Boosting pair comparison: select a subset of all pairs for comparison.
Efficient: the selected pairs should provide more information about the final scale values than other pairs.
Balanced: the occurrence frequency of each stimulus should be equal, to avoid any bias from the presentation frequency of a particular stimulus.
Robust: the selection of pairs should be robust to observation errors that happen in a subjective test [LiICIP12].
Jing Li, Marcus Barkowsky, Patrick Le Callet, "Analysis and improvement of a paired comparison method in the application of 3DTV subjective experiment", ICIP, 2012.
Adaptive Rectangular Design (ARD), illustrated with nine stimuli A1 to A9.
Initiation: place the stimulus indices into a rectangular matrix randomly; only stimuli in the same column or the same row are compared.
Then iterate: run the pair comparisons, fit the B-T model, and rearrange the matrix according to the estimated scale values, until the final result is obtained.
Jing Li, Marcus Barkowsky, Patrick Le Callet, "Subjective assessment methodology for Preference of Experience in 3DTV", IEEE IVMSP, 2013.
Adaptive Rectangular Design (ARD): the number of comparisons drops from order N² to order N√N, while the design remains balanced, efficient, and robust.
Jing Li, Marcus Barkowsky, Patrick Le Callet, "Subjective assessment methodology for Preference of Experience in 3DTV", IEEE IVMSP, 2013.
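The non-adaptive core of the design can be sketched as follows: which pairs are compared in one round when the stimuli are laid out on a square grid. This is a simplified illustration assuming a perfect-square number of stimuli; the adaptive rearrangement between rounds, driven by the fitted B-T scale, is omitted:

```python
import math
from itertools import combinations

def rectangular_pairs(stimuli):
    """Pairs compared in one round of a rectangular design: only stimuli
    sharing a row or a column of the grid are compared."""
    n = len(stimuli)
    cols = math.isqrt(n)  # assumes n is a perfect square, for simplicity
    rows = [stimuli[i * cols:(i + 1) * cols] for i in range(cols)]
    pairs = set()
    for row in rows:
        pairs.update(combinations(row, 2))
    for col in zip(*rows):
        pairs.update(combinations(col, 2))
    return pairs

pairs = rectangular_pairs([f"A{i}" for i in range(1, 10)])  # 9 stimuli, 3x3 grid
# 18 pairs per round instead of the 36 pairs of a full design
```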
Subjective assessment methodology - Questionnaire. Simulator Sickness Questionnaire (SSQ) for visual discomfort [Kennedy et al., 1993], administered pre-exposure and post-exposure. Each item is rated none / slight / moderate / severe: 1. general discomfort, 2. fatigue, 3. boredom, 4. drowsiness, 5. headache, 6. eyestrain, 7. difficulty focusing.
Kennedy, Robert S., et al. "Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness." The International Journal of Aviation Psychology 3.3 (1993): 203-220.
Subjective assessment methodology - Questionnaire. Descriptive quality evaluation methods: 1) in the training session, develop the vocabulary (attribute list) used in the subjective test; 2) in the main test session, evaluate (rate) the test stimuli based on the attributes in the vocabulary. The contribution of each attribute to the quality of the test stimuli can then be analyzed.
D. Strohmeier, S. Jumisko-Pyykkö, K. Kunze, "Open profiling of quality: a mixed method approach to understanding multimodal quality perception," Advances in Multimedia, vol. 2010, pp. 1-28, 2010.
Objective psychophysical measurement: EEG (electroencephalography), EMG (electromyography), fMRI (functional magnetic resonance imaging).
EEG: in [Kim2011], the power of the EEG signals in the beta frequency band was significantly higher when watching 3D content than 2D content.
fMRI: strong activation at the frontal eye field (FEF) [Kim2011dsp].
Eye blinking: visual discomfort is not always proportional to eye blinking rate [LiVPQM13].
- Y.J. Kim et al., "EEG Based Comparative Measurement of Visual Fatigue Caused by 2D and 3D Displays", HCI International 2011 Posters Extended Abstracts.
- D.C. Kim et al., "Human brain response to visual fatigue caused by stereoscopic depth perception", Digital Signal Processing (DSP), 2011.
- J. Li et al., "Visual discomfort is not always proportional to eye blinking rate: exploring some effects of planar and in-depth motion on 3D QoE", VPQM, 2013.
Efforts towards international standards:
IEEE P3333.1 WG - Quality Assessment of Three Dimensional (3D) Contents based on Psychophysical Studies Working Group.
ITU and VQEG - recommendations on display requirements, visual fatigue, and subjective assessment methodology for 3DTV.
Qualinet - task forces for different working groups focusing on QoE-related issues.
Thanks for your attention!