Perceptual wideband speech and audio quality measurement. Dr Antony Rix Psytechnics Limited



Agenda
- Background
- Perceptual models
  - BS.1387 PEAQ
  - P.862 PESQ
- Scope
- Extension to wideband
- Performance of wideband PESQ
  - Results for speech
  - Results for audio
- Next steps discussion
- AMR-WB case study

Psytechnics background
- Solutions for measuring/monitoring speech, audio, video quality
- Extensive subjective testing background
- Main products are objective quality models (software)
  - Intrusive (P.862 PESQ, ) for testing
  - Non-intrusive (P.VTQ/psyVoIP, P.563 SEAM/NiQA, P.562 CCI) for monitoring
- Experience in wideband in both subjective testing and objective models (PAMS, PESQ).

BS.1387 PEAQ
- High-quality audio model for small impairments
- Comparable with BS.1116 subjective tests
- General audio model, not designed or optimised for wideband speech
- Mobile/IP multimedia is at edge of or outside scope
- Some issues with accuracy (see BS.1387 for results).
- Not currently applicable to 16kHz wideband speech

P.862 PESQ
- Speech quality model for telephony applications
- Comparable with P.800 subjective tests
- Assumes listening through narrowband IRS handset
- Was not extensively tested on perceptual waveform codecs (e.g. MP3, AAC) or with non-speech signals
- Not currently applicable to 16kHz wideband speech or audio

P.862 PESQ scope
[Block diagram of the PESQ model. The reference signal and the degraded signal (the output of the system under test) are each level-aligned and passed through the input filter, then time-aligned and equalised. Both go through the auditory transform, followed by disturbance processing and cognitive modelling, which give the prediction of perceived speech quality. Bad intervals are identified and re-aligned.]

P.862 PESQ scope
- Scope assumes narrowband telephone handset listening, and speech signals
[Figure: PESQ block diagram as on the previous slide, with an inset plot of the PESQ input filter: gain (dB) against frequency, 0 to 4000 Hz.]

Extending PESQ for wideband speech & audio
- Modification proposed in COM12-D7:
  - Input filter replaced by 100Hz highpass with 9dB additional gain.
  - No other changes (e.g. same psychoacoustic model).
[Figure: PESQ block diagram as before, with an inset plot of the PESQ wideband input filter: gain (dB) against frequency, 0 to 8000 Hz.]
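
To make the "100Hz highpass with 9dB additional gain" concrete, here is a minimal sketch of such a filter's frequency response. The slide does not give the filter order or design, so a first-order Butterworth highpass (bilinear transform) is an assumption made purely for illustration; the function names are hypothetical and this is not the filter specified in COM12-D7.

```python
import cmath
import math


def highpass_coeffs(cutoff_hz, sample_rate_hz):
    """First-order highpass via the bilinear transform (assumed design).

    Returns (b0, b1, a1) for H(z) = (b0 + b1*z^-1) / (1 + a1*z^-1).
    """
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    b0 = 1.0 / (1.0 + k)
    b1 = -b0
    a1 = (k - 1.0) / (1.0 + k)
    return b0, b1, a1


def gain_db(freq_hz, sample_rate_hz=16000, cutoff_hz=100.0, boost_db=9.0):
    """Gain in dB at freq_hz: highpass response plus a flat boost."""
    b0, b1, a1 = highpass_coeffs(cutoff_hz, sample_rate_hz)
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    h = (b0 + b1 * z_inv) / (1.0 + a1 * z_inv)
    return boost_db + 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    # Print the response at a few frequencies for a 16 kHz sample rate.
    for f in (10, 50, 100, 1000, 4000, 7000):
        print(f"{f:5d} Hz: {gain_db(f):6.2f} dB")
```

Under these assumptions the passband sits close to +9 dB and the response is 3 dB below that at the 100 Hz cutoff. Passing `sample_rate_hz=8000` gives nearly the same response over the shared frequency range.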

Use of WPESQ
- Select wideband mode whenever headphone listening is used
  - Also operates at 8kHz sampling rate (same filter frequency response)
- Be careful about mixing narrowband and wideband PESQ
  - Binaural headphone listening is more sensitive, so the results are different
- Reference signal should normally be full bandwidth

WPESQ results: speech
[Figure: P.905 PESQ vs. subjective quality, exp1 (Eurescom P905 exp1, multiple audio bandwidths), ρ=95.2%. Legend: wideband codec, narrowband codec, wideband MNRU, narrowband MNRU. Axes: subjective condition MOS against mapped condition-average WPESQ, both 1 to 5.]
[Figure: P.905 PESQ vs. subjective quality, exp2a (Eurescom P905 exp2a, 8kHz conditions only), ρ=98.1%. Legend: Codec A error-free, Codec A packet loss, Codec B error-free, Codec B packet loss, narrowband MNRU. Axes as above.]

WPESQ results: speech
[Figure: P.905 PESQ vs. subjective quality, exp2b (Eurescom P905 exp2b, 16kHz conditions only), ρ=97.7%. Legend: Codec C error-free, Codec C packet loss, Codec D error-free, Codec D packet loss, wideband MNRU. Axes: subjective condition MOS against mapped condition-average WPESQ, both 1 to 5.]
[Figure: BT AES experiment (multiple audio bandwidths), all conditions, ρ=94.9%. Axes as above.]
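
The ρ figures on these plots compare per-condition model scores with per-condition subjective MOS. The sketch below shows one way such a figure could be computed, assuming ρ is the Pearson correlation of condition averages (the slides do not state the exact procedure). The function names and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def condition_means(rows):
    """Average scores per condition. rows is an iterable of (condition, score)."""
    totals = defaultdict(lambda: [0.0, 0])
    for condition, score in rows:
        totals[condition][0] += score
        totals[condition][1] += 1
    return {c: s / n for c, (s, n) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-file scores: (condition label, score).
    subjective = [("c1", 4.2), ("c1", 4.0), ("c2", 2.9), ("c2", 3.1), ("c3", 1.8)]
    objective = [("c1", 4.0), ("c1", 4.1), ("c2", 3.2), ("c2", 3.0), ("c3", 2.0)]
    mos = condition_means(subjective)
    model = condition_means(objective)
    conditions = sorted(mos)
    rho = pearson([mos[c] for c in conditions], [model[c] for c in conditions])
    print(f"correlation over {len(conditions)} conditions: {100 * rho:.1f}%")
```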

WPESQ results: NTT
- Morioka & Takahashi have published an independent evaluation of wideband PESQ
- Wideband results: 91.2% correlation
  - Main issue is a slight offset between G.722.1 and other conditions; this will be investigated further
- Problem with the analysis: narrowband PESQ was used for the 8kHz (wideband headphone) conditions, although WPESQ should be used for these.
  - This caused an offset between the 8kHz and 16kHz conditions
  - Wideband PESQ is more critical than narrowband
- 8kHz and overall results are not included here.

WPESQ results: audio
- New subjective test by Psytechnics using:
  - 8 audio signals representative of PC and mobile multimedia (advertisement, movies, news documentary, pop music, speech, sports), of duration 8-12sec
  - 20 conditions
  - Range of codecs (AAC, AMR, G.711, G.722, and direct)
  - Range of bandwidths (8, 11.025, 12, 16kHz sample rates)
  - Presented to subjects and model at 16kHz, mono
  - Wideband binaural free field equalised headphones at 76dB SPL
  - Bit-rates from 4.75-256kbit/s

WPESQ results: audio

WPESQ results: overall

Test                                       R (%)
P905 exp 1 (speech)                        95.2
P905 exp 2a (speech)                       98.1
P905 exp 2b (speech)                       97.7
AES107 (speech)                            94.9
NTT wideband results (speech)              91.2
Psytechnics multimedia (16kHz mono audio)  95.2
Overall mean                               95.4
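
As a quick arithmetic check, the overall mean can be reproduced from the six per-test correlations listed on this slide. This sketch assumes the overall figure is a plain unweighted average; the dictionary keys are the test names from the slide.

```python
def overall_mean(correlations):
    """Unweighted mean of the per-test correlation percentages."""
    return sum(correlations.values()) / len(correlations)


# Per-test correlations (%) as listed on the slide.
results = {
    "P905 exp 1 (speech)": 95.2,
    "P905 exp 2a (speech)": 98.1,
    "P905 exp 2b (speech)": 97.7,
    "AES107 (speech)": 94.9,
    "NTT wideband results (speech)": 91.2,
    "Psytechnics multimedia (16kHz mono audio)": 95.2,
}

if __name__ == "__main__":
    print(f"overall mean R = {overall_mean(results):.1f}%")
```

The unweighted mean rounds to 95.4, matching the slide.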

WPESQ discussion
- WPESQ shows excellent correlation with MOS, comparing favourably with narrowband PESQ.
- Explore issues identified in P905 exp1 and NTT test:
  - Bandwidth and context effect
  - G.722.1 codec
- Can be used for both wideband speech and 16kHz mono audio, e.g. mobile multimedia applications
- Mapping between WPESQ and subjective MOS is required (like P.862.1 MOS-LQO).
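
For orientation, the sketch below shows the kind of mapping meant by "like P.862.1 MOS-LQO": a logistic function from raw model score to a MOS-like scale. The coefficients are those of the narrowband P.862.1 mapping as commonly quoted and should be checked against the Recommendation; they are not a WPESQ mapping, which would need coefficients fitted to wideband subjective data. The function name is hypothetical.

```python
import math


def mos_lqo_narrowband(raw_pesq):
    """Map a raw narrowband PESQ score to MOS-LQO (P.862.1-style logistic).

    Coefficients are the narrowband ones, shown only to illustrate the
    shape of such a mapping; a WPESQ mapping would use different values.
    """
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


if __name__ == "__main__":
    # Raw PESQ scores lie roughly between -0.5 and 4.5.
    for raw in (-0.5, 1.0, 2.0, 3.0, 4.0, 4.5):
        print(f"raw {raw:4.1f} -> MOS-LQO {mos_lqo_narrowband(raw):.2f}")
```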

Case study: validation of AMR-WB (G.722.2) floating-point codec
- Fixed-point AMR-WB codec had been approved; needed to validate non-bit-exact floating-point version
- Used WPESQ to compare speech quality of codecs over 1280 test cases.
  - Identified bug in fixed-point codec mode-switching
  - Showed bug was corrected in floating-point and modified fixed-point codecs
  - Found no significant difference in quality between (corrected) fixed-point and floating-point codecs.
- Took just 2 days of processing and analysis.
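
The slide does not say how "no significant difference" was established. One plausible approach, sketched below as an assumption, is to score both codec versions on the same test cases and look at the mean paired difference with an approximate 95% confidence interval (normal approximation, reasonable for a large number of cases such as 1280). The function name and the scores are hypothetical.

```python
import math
import statistics


def paired_difference(scores_a, scores_b, z=1.96):
    """Mean paired difference and approximate 95% CI half-width.

    scores_a and scores_b hold model scores for the same test cases,
    in the same order. Uses a normal approximation to the interval.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, z * std_err


if __name__ == "__main__":
    # Hypothetical WPESQ scores for two codec builds on the same cases.
    fixed_point = [3.10, 2.90, 3.40, 3.00, 2.70, 3.30]
    floating_point = [3.00, 3.00, 3.30, 3.10, 2.80, 3.20]
    mean_diff, half_width = paired_difference(fixed_point, floating_point)
    verdict = "no significant difference" if abs(mean_diff) <= half_width else "significant difference"
    print(f"mean difference {mean_diff:+.3f} +/- {half_width:.3f}: {verdict}")
```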

Conclusions
- BS.1387 PEAQ and P.862 PESQ were not originally designed for wideband speech quality measurement
- By changing PESQ to use an appropriate input filter, WPESQ is able to make accurate quality measurements of wideband speech and 16kHz audio
- WPESQ allows interesting new applications in wideband speech and 16kHz audio quality testing, such as codec development and multimedia quality
- Some issues with subjective tests remain to be explored and further testing is desirable.

References
- ITU-T P.800. Methods for subjective determination of transmission quality. Aug 1996.
- Rix, A. W. and Hollier, M. P. Perceptual speech quality assessment from narrowband telephony to wideband audio. 107th AES Convention, New York, preprint 5018, September 1999.
- ITU-R BS.1387. Method for objective measurements of perceived audio quality. January 1999.
- ITU-T P.862. Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Feb 2001.
- Eurescom P905. AQUAVIT - Assessment of Quality for Audio-Visual signals over Internet and UMTS.
- Rix, A. W. et al. Proposed modification to draft P.862 to allow PESQ to be used for quality assessment of wideband speech. ITU-T COM12-D007, Feb 2001.
- Morioka, C. and Takahashi, A. Performance evaluation of the wideband PESQ algorithm. ITU-T COM12-D187, April 2004.
- Barrett, P. A. and Rix, A. W. Verification of floating-point implementation of AMR-WB using Wideband-PESQ. 3GPP Tdoc S4 (02)0049r1 and S4 (02)0124, Feb 2002.

Dr Antony Rix Psytechnics Limited antony.rix@psytechnics.com www.psytechnics.com