HISTOGRAM BASED APPROACH FOR NON-INTRUSIVE SPEECH QUALITY MEASUREMENT IN NETWORKS

Jan Křenek*, Jan Holub*

* Ing. Jan Krenek, Department of Measurement, CTU in Prague, FEL, Technicka 2, 162 27 Prague, tel.: +420 2 2435 2187, fax: +420 2 3119 929, e-mail: krenekj@fel.cvut.cz
* Doc. Ing. Jan Holub, Department of Measurement, CTU in Prague, FEL, Technicka 2, 162 27 Prague, tel.: +420 2 2435 2131, fax: +420 2 3119 929, e-mail: holubjan@fel.cvut.cz

Abstract
This article describes the use of histograms for non-intrusive assessment of speech transmission quality in GSM and other networks.

Introduction
Networks such as GSM, UMTS, Tetra or a Local Area Network (with suitable VoIP software) are used for speech transmission. On its way through the communication channel from the transmitter to the receiver, the voice signal is subjected to a range of distortions. Each type of distortion (e.g. attenuation, noise, delay, echo, packet loss, clipping, jitter) can cause considerable degradation of voice quality. Improving quality also requires a methodology for measuring it. Two measurement methods are described below, followed by a closer look at histogram-based non-intrusive voice quality measurement.

Voice quality measurement
There are two different methods for voice quality measurement: intrusive and non-intrusive.

Intrusive method
The intrusive method compares the original and the transmitted signal using a suitable algorithm, e.g. ITU-T P.862 (PESQ). It gives more accurate results, i.e. results closer to the quality perceived by the average listener as obtained from listening tests. Its high cost and time consumption, however, leave room for the second approach, the non-intrusive method.

Non-intrusive method
The non-intrusive method estimates quality from the transmitted sample alone, which makes it considerably harder for such an algorithm to give reliable results. The main advantages of the non-intrusive method are cost efficiency (an unlimited number of speech samples can be assessed, which improves the overall accuracy) and the fact that the measurement is carried out on real network data and states, since, unlike in the intrusive method, no call has to be established. Nevertheless, some distortion types, such as the harmonic distortion introduced by some codecs (e.g. ADPCM), cannot be detected because the original sample is not available. A combination of both the intrusive and the non-intrusive methods is therefore recommended.

MOS scale
When discussing the quality of voice transmission, it is useful to clarify the term quality, which, perhaps surprisingly, is not defined unambiguously. From the signal-transmission point of view, quality is the degree of similarity between the transmitted and the received sample. From the point of view of human perception, quality indicators include clarity, delay, noise, level, drop-outs, etc. For quality assessment, the MOS (Mean Opinion Score) scale is widely used; it corresponds to the opinion of the average listener and is therefore a subjective measure.

Tab. 1 MOS scale
MOS   Quality
5     Excellent
4     Good
3     Fair
2     Poor
1     Bad

Note that, contrary to some common grading customs, the best grade (Excellent) is 5 and the worst (Bad) is 1.

Histogram based non-intrusive algorithm for voice transmission quality measurement
A suitable library of speech samples is essential for developing the algorithm, and such a library was available to us. It consists of four original speech samples recorded by two male and female professional speakers. These samples were distorted by clipping (simulating the distortion caused by the Voice Activity Detector used in mobile phones), jitter, noise and filtering. In total, the library contains 40 samples (4 original and 36 distorted). The quality of the distorted samples was assessed in listening tests, which serve as the reference (etalon) that every quality-assessment algorithm ultimately aims to match. The first step is to obtain the histograms of all samples.
The histogram computation can proceed as shown in Fig. 1. The examined sample is first normalized in amplitude and, if necessary, resampled to 8 kHz. The whole sample is then divided into 16 ms packets, and the amplitude spectrum of every packet is computed using the FFT (with a Hann window, as in Fig. 1). Depending on the type of each packet (active speech, i.e. voice, or pause, i.e. noise), as determined by a Voice Activity Detector, the packet's amplitude spectrum is added to the corresponding sum. The sum over voice packets (HIST_voice), the sum over noise packets (HIST_noise) and the sum over both (HIST) were examined to derive parameters related to quality.
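The pipeline above can be sketched in Python with NumPy. This is a minimal illustration under assumptions that the paper does not state: the amplitude normalization is taken to be peak normalization, a simple frame-energy threshold stands in for the unspecified Voice Activity Detector, and spectral amplitudes are expressed in dB and quantized into a hypothetical number of levels. The flowchart writes the update as a sum of amplitude spectra; the sketch instead counts how often each frequency bin falls into each amplitude level, which is one reading consistent with the plotted histograms (frequency on the horizontal axis, amplitude on the vertical axis, counts as gray level). The names `amplitude_histograms`, `n_levels`, `floor_db` and `vad_threshold_db` are hypothetical.

```python
import math

import numpy as np

FS = 8000                  # target sample rate used in the paper
FRAME = FS * 16 // 1000    # 16 ms packet -> 128 samples at 8 kHz


def _count(levels, n_levels):
    """Count how often each frequency bin (column) falls into each amplitude level (row)."""
    n_bins = levels.shape[1]
    hist = np.zeros((n_levels, n_bins), dtype=int)
    cols = np.broadcast_to(np.arange(n_bins), levels.shape)
    np.add.at(hist, (levels, cols), 1)
    return hist


def amplitude_histograms(x, fs, n_levels=32, floor_db=-100.0, vad_threshold_db=-30.0):
    """Hypothetical sketch of the Fig. 1 pipeline returning HIST, HIST_voice and HIST_noise."""
    x = np.asarray(x, dtype=float)

    # Amplitude normalization (peak normalization is an assumption).
    peak = np.max(np.abs(x), initial=0.0)
    if peak > 0:
        x = x / peak

    # Resample to 8 kHz only when needed (SciPy is imported lazily for that case).
    if fs != FS:
        from scipy.signal import resample_poly
        g = math.gcd(int(fs), FS)
        x = resample_poly(x, FS // g, int(fs) // g)

    # Split into non-overlapping 16 ms packets and drop the incomplete tail.
    n_frames = len(x) // FRAME
    if n_frames == 0:
        raise ValueError("sample is shorter than one 16 ms packet")
    frames = x[: n_frames * FRAME].reshape(n_frames, FRAME)

    # Hann-windowed amplitude spectrum of every packet, scaled by the window sum, in dB.
    win = np.hanning(FRAME)
    spec = np.abs(np.fft.rfft(frames * win, axis=1)) / win.sum()
    spec_db = 20.0 * np.log10(np.maximum(spec, 1e-12))

    # Stand-in VAD (assumption): a packet is "voice" if its energy lies within
    # vad_threshold_db of the loudest packet, otherwise it is "noise".
    energy_db = 10.0 * np.log10(np.maximum(np.mean(frames ** 2, axis=1), 1e-20))
    is_voice = energy_db > energy_db.max() + vad_threshold_db

    # Quantize each spectral amplitude into one of n_levels rows (row 0 = lowest level).
    edges = np.linspace(floor_db, 0.0, n_levels + 1)
    levels = np.clip(np.digitize(spec_db, edges) - 1, 0, n_levels - 1)

    return {
        "HIST": _count(levels, n_levels),
        "HIST_voice": _count(levels[is_voice], n_levels),
        "HIST_noise": _count(levels[~is_voice], n_levels),
    }
```

Each returned array has one row per amplitude level and one column per FFT bin (65 bins for a 128-point real FFT), so row 0 corresponds to the bottom row of the plotted histograms. A production implementation would replace the energy threshold with a proper VAD.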

Fig. 1 Simplified flowchart for histogram creation: load and normalize the sample; if the sample rate is not 8 kHz, resample; then, until the end of the sample, take the first or next 16 ms packet, classify it by Voice Activity Detection as voice or noise, update HIST_voice = HIST_voice + ABS(FFT(packet*hann)) or HIST_noise = HIST_noise + ABS(FFT(packet*hann)) accordingly, and update HIST = HIST + ABS(FFT(packet*hann)); finally compute the noise power of the sample and plot HIST, HIST_voice and HIST_noise.

The actual implementation was more complex than the simplified flowchart in Fig. 1, because it creates further histograms. The expanded flowchart produces four more histograms: HIST_voice_delta, HIST_voice_delta_delta, HIST_noise_delta and HIST_noise_delta_delta. Delta (Δ) indicates that the histogram was computed from the difference between the amplitude spectrum of the current packet and that of the next packet (with 50% overlap). In the double-delta variant, the differences from the previous step are differenced again, also with 50% overlap. These 50%-overlap differences are used because they can increase resolution and highlight potential parameters.

Fig. 2 Histogram of the clean sample (MOS = 5)
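The delta and double-delta histograms described above can be sketched as follows, again as an illustration rather than the authors' implementation. Assumptions not taken from the paper: spectra are computed in dB on 16 ms packets with a hop of half a packet (50% overlap), the signed differences are quantized into a hypothetical 32 levels spanning ±40 dB, and the input is already at 8 kHz. The names `overlapped_spectra_db`, `delta_histograms`, `n_levels` and `span_db` are hypothetical.

```python
import numpy as np

FRAME = 128        # 16 ms packet at 8 kHz
HOP = FRAME // 2   # 50 % overlap between consecutive packets


def overlapped_spectra_db(x):
    """Hann-windowed amplitude spectra (dB) of 16 ms packets taken with 50 % overlap."""
    x = np.asarray(x, dtype=float)
    if len(x) < FRAME:
        raise ValueError("sample is shorter than one 16 ms packet")
    n_frames = 1 + (len(x) - FRAME) // HOP
    # Row k holds the sample indices of packet k, which starts HOP samples after packet k-1.
    idx = np.arange(FRAME)[None, :] + HOP * np.arange(n_frames)[:, None]
    win = np.hanning(FRAME)
    spec = np.abs(np.fft.rfft(x[idx] * win, axis=1)) / win.sum()
    return 20.0 * np.log10(np.maximum(spec, 1e-12))


def _count(values, edges):
    """Count per frequency bin (column) how often values fall into each level (row)."""
    n_levels = len(edges) - 1
    levels = np.clip(np.digitize(values, edges) - 1, 0, n_levels - 1)
    hist = np.zeros((n_levels, values.shape[1]), dtype=int)
    cols = np.broadcast_to(np.arange(values.shape[1]), values.shape)
    np.add.at(hist, (levels, cols), 1)
    return hist


def delta_histograms(spec_db, n_levels=32, span_db=40.0):
    """Return (delta, double-delta) histograms of consecutive-packet spectral differences."""
    delta = np.diff(spec_db, n=1, axis=0)   # packet k+1 minus packet k
    delta2 = np.diff(spec_db, n=2, axis=0)  # difference of the differences
    # Symmetric level edges, because the differences are signed.
    edges = np.linspace(-span_db, span_db, n_levels + 1)
    return _count(delta, edges), _count(delta2, edges)
```

To approximate HIST_voice_delta or HIST_noise_delta, `spec_db` would first be restricted to the packets classified as voice or as noise; how the original implementation pairs packets across voice/noise boundaries is not stated in the text, so that step is left open here.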

Fig. 3 Histogram of a clipping-distorted sample (MOS = 4.88)
Fig. 4 Histogram of a clipping-distorted sample (MOS = 2.76)

In these histograms, the horizontal axis shows frequency bins (frequency increases to the right) and the vertical axis shows amplitude. Counts are represented by a color scale from white to black, so each histogram is a three-dimensional quantity displayed in two dimensions. Figs. 2 to 4 show sample histograms. Fig. 2 is the histogram of the sample with the best quality (MOS = 5). In Fig. 3, the sample is only slightly distorted by clipping (MOS = 4.88), while Fig. 4 shows the histogram of a sample significantly distorted by clipping (MOS = 2.76). Even at first glance, the number of counts in the first row (from the bottom) of the histograms increases as quality decreases. On closer inspection, the ratio between, for example, the counts in the five middle frequency bins of the second row and those in the five middle bins of the first row is greater than 1 for relatively high quality and decreases as quality decreases.

Applying this search for parameters to all speech samples and all seven histograms revealed many promising parameters. These parameters could be used as inputs to a neural network in order to determine which of them are useful and how each should affect the result of the algorithm. However, the high computational cost of creating all the histograms made the algorithm unsuitable for real-time quality assessment, even though speed is a primary advantage of the non-intrusive method. Work on speeding up the computation is in progress; in particular, we are considering whether the Voice Activity Detector and the computation of all seven histograms for every sample are necessary.

Conclusion
This paper presented a novel approach to non-intrusive assessment of voice transmission quality. After a brief introduction to VTQoS (Voice Transmission Quality of Service) measurement, the principle of the proposed histogram-based algorithm was described. Although the approach has shown its functionality, further research is necessary to improve its performance.