Enhanced Waveform Interpolative Coding at 4 kbps

1 Enhanced Waveform Interpolative Coding at 4 kbps
Oded Gottesman and Allen Gersho
Signal Compression Lab., University of California, Santa Barbara
[oded, gersho]@scl.ece.ucsb.edu

2 Outline
Description of the waveform-interpolative coder
Our WI coder's novel techniques:
  AbS SEW VQ
  AbS dispersion-phase VQ
  Pitch search for transitions
  AbS switched-predictive gain VQ
Subjective tests
Summary
Demo

4 Waveform Interpolation (WI)
Voiced speech is nearly periodic.
Successive pitch cycles have a slowly evolving shape.
A continuously evolving sequence of pitch-cycle waveforms can be generated.
A subsequence of these waveforms is extracted ("subsampling") for quantization; speech is then synthesized by interpolating the missing waveforms ("upsampling").

5 Waveform Decomposition
Slowly Evolving Waveform (SEW), the voiced component, is coded (1) at low temporal resolution, (2) at high spectral resolution, (3) with spectral masking distortion.
Rapidly Evolving Waveform (REW), the unvoiced component, is coded (1) at high temporal resolution, (2) at low spectral resolution, (3) with spectral and temporal masking.
The system is universal for all speech sounds (no V/UV classification needed).
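A minimal sketch of the SEW/REW split, assuming the characteristic waveforms are stored as rows of a NumPy array (one aligned, length-normalized pitch cycle per extraction instant). A short moving-average filter along the time axis stands in for the decomposition low-pass filter; the function name `decompose_cw`, the filter length, and the synthetic test data are illustrative assumptions, not values from the coder.

```python
import numpy as np

def decompose_cw(cw, taps=9):
    """Split characteristic waveforms into SEW and REW.

    cw   : array (num_waveforms, cycle_len), aligned pitch cycles over time
    taps : odd length of the moving-average low-pass filter (assumed value)
    """
    h = np.ones(taps) / taps  # stand-in low-pass filter along the time axis
    pad = taps // 2
    # Edge-pad so the filtered surface keeps the same number of waveforms.
    padded = np.pad(cw, ((pad, pad), (0, 0)), mode="edge")
    sew = np.empty(cw.shape, dtype=float)
    for n in range(cw.shape[1]):
        # Filter the evolution of each cycle-phase position separately.
        sew[:, n] = np.convolve(padded[:, n], h, mode="valid")
    rew = cw - sew  # the part that evolves too fast for the low-pass filter
    return sew, rew

rng = np.random.default_rng(0)
phase = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
pulse = np.cos(phase) + 0.5 * np.cos(2.0 * phase)          # fixed "voiced" shape
cw = pulse[None, :] + 0.3 * rng.standard_normal((40, 64))  # plus noise-like part
sew, rew = decompose_cw(cw)
print(np.mean((sew - pulse) ** 2), np.mean((cw - pulse) ** 2))
```

In this toy example the SEW stays close to the fixed pulse shape while the noise-like part ends up mostly in the REW, and the two components always sum back to the input.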

6 Waveform Decomposition (cont'd)
A speech segment typically contains both voiced and unvoiced attributes.
The different perceived character of the voiced and unvoiced components suggests separating them and coding each with its own perceptually based method.

7 Waveform Extraction
[Figure: extracted waveforms plotted against time and pitch-normalized cycle phase. From Kleijn and Haagen, ICASSP 95.]

8 Waveform Decomposition
[Figure from Kleijn and Haagen, ICASSP 95.]

9 Waveform Synthesis
Instantaneous phase contour:

\[ \phi(t) = \phi(t_m) + \int_{t_m}^{t} \frac{2\pi}{p(\tau)}\,d\tau \]

[Figure: waveforms at instants t_{m-2}, t_{m-1}, t_m, t_{m+1}, t_{m+2} plotted against time and pitch-normalized cycle phase, with the instantaneous phase contour. From Kleijn and Haagen, ICASSP 95.]

10 WI Encoder
[Block diagram. Speech s(n) is filtered by A(z) to give the residual r(n). Blocks shown: LPC analysis, LPC quantizer, LPC interpolator; pitch extraction, pitch quantizer, pitch interpolator; waveform extraction, waveform alignment, waveform normalization, DFT; waveform power and gain quantizer; decomposition of the characteristic waveform (CW) into REW and SEW, with REW quantizer and SEW quantizer; local decoder, AbS, AGC, and delay.]

11 WI Decoder
[Block diagram. Blocks shown: SEW decoder, REW decoder, SEW phase, REW phase, delay, waveform alignment, waveform interpolation, summation of SEW and REW; power decoder, power normalization and scaling; pitch decoder and pitch interpolator giving p(n); phase contour φ(n); IDFT with overlap-add interpolation giving the reconstructed residual; LPC decoder and LPC interpolator driving the synthesis filter 1/A(z), which outputs the reconstructed speech.]

12 Continuous Waveform Interpolation
Over the interpolation interval t_m ≤ t ≤ t_{m+1}, the continuous reconstructed excitation signal is given by the time-dependent Fourier series

\[ r(t) = \sum_{k=0}^{K(t)} \Big\{ \big[(1-\alpha(t))\,A_k(t_m) + \alpha(t)\,A_k(t_{m+1})\big]\cos\big(k\,\phi(t)\big) + \big[(1-\alpha(t))\,B_k(t_m) + \alpha(t)\,B_k(t_{m+1})\big]\sin\big(k\,\phi(t)\big) \Big\} \]

\[ \phi(t) = \phi(t_m) + \int_{t_m}^{t} \frac{2\pi}{p(\tau)}\,d\tau \]

where φ(t) is the instantaneous phase contour and α(t) is some (increasing) interpolation function in the range 0 ≤ α(t) ≤ 1, which can be a simple linear function of the time t or of the pitch-normalized phase φ.
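The interpolation on this slide can be sketched in discrete time as follows. This is only an illustration under stated assumptions: the number of harmonics is held fixed over the interval (the slide allows K(t) to vary), α is taken as linear in time, the phase integral is approximated by a running sum, and the function name `interpolate_excitation` and the 8 kHz sampling rate are hypothetical choices.

```python
import numpy as np

def interpolate_excitation(A0, B0, A1, B1, pitch, phi0=0.0, fs=8000.0):
    """Synthesize r(t) between the waveforms at t_m and t_{m+1}.

    A0, B0 : cosine/sine Fourier coefficients at t_m (index k = 0..K)
    A1, B1 : coefficients at t_{m+1}
    pitch  : pitch period p(t) in seconds, one value per output sample
    phi0   : phase at t_m
    fs     : sampling rate in Hz (assumed)
    """
    pitch = np.asarray(pitch, dtype=float)
    n = len(pitch)
    # phi(t) = phi(t_m) + integral of 2*pi/p(tau), as a rectangle-rule sum.
    steps = 2.0 * np.pi / (pitch * fs)
    phi = phi0 + np.concatenate(([0.0], np.cumsum(steps[:-1])))
    alpha = np.arange(n) / n                  # linear interpolation function
    k = np.arange(len(A0))[:, None]           # harmonic index as a column
    A = (1.0 - alpha) * np.asarray(A0)[:, None] + alpha * np.asarray(A1)[:, None]
    B = (1.0 - alpha) * np.asarray(B0)[:, None] + alpha * np.asarray(B1)[:, None]
    r = np.sum(A * np.cos(k * phi) + B * np.sin(k * phi), axis=0)
    return r, phi

# One harmonic whose amplitude fades from 1.0 to 0.5 at a constant 5 ms pitch.
pitch = np.full(80, 0.005)
r, phi = interpolate_excitation([0.0, 1.0], [0.0, 0.0], [0.0, 0.5], [0.0, 0.0], pitch)
```

With a constant pitch the phase contour is a straight line, so the output is a cosine whose amplitude decays linearly across the interval.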

14 SEW Quantization
The SEW is downsampled, quantized, and upsampled.
Non-ideal filters are used, which introduces aliasing and mirroring.
The distortion is most noticeable in transitions.
How can we improve the quantization?

15 SEW Filter Bank
[Block diagram. Each CW coefficient C_0(t), C_1(t), ..., C_K(t) passes through an anti-aliasing (decomposition) low-pass filter h(t) and a downsampler by M to give the SEW, which is vector quantized (SEW VQ). Each quantized SEW coefficient is upsampled by M and passed through an interpolation low-pass filter; branch k is multiplied by e^{j ω_0 k t}, and the branches are summed to form the SEW excitation.]

16 Waveform-Based SEW AbS
[Block diagram. Analysis side: LPC analysis and LPC interpolation, A(z) producing the residual, pitch extraction, and waveform extraction + alignment + decomposition, giving waveforms r_1 ... r_{M+L-1} and filters A_1 ... A_{M+L-1}. Synthesis side: waveform codebooks, a delay z^{-1}, quantized waveforms r̂_0 and r̂_M, and a waveform synthesizer (interpolation over M waveforms plus lookahead extrapolation) producing the reconstructed waveforms r̃. Each error is weighted by W_i(z)/A_i(z) and its squared norm is taken; the diagram also shows a factor [1 - α(t_{M+L-1})]^2. The terms are summed and the distortion D_wi is minimized.]
M: number of waveforms per frame. L: number of lookahead waveforms.

17 SEW Optimization Examples
[Figure: two examples, each with three panels (original, optimized, non-optimized); axes: amplitude versus time (sec).]

19 Observations: Phase Quantization
Phase is of secondary perceptual significance.
No efficient phase quantization scheme is known.
Two extremes: waveform coders (CELP) implicitly allocate a perceptually excessive number of bits to the phase, while parametric coders commonly do not transmit the phase information at all.
How can the phase be quantized efficiently?

20 Dispersion Phase ϕ
Dispersion phase: a pitch cycle extracted from the residual signal is cyclically shifted so that its pulse is located at position zero; the resulting DFT phase is ϕ.
ϕ determines (along with the magnitude) the waveform's pulse shape.
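The definition above can be illustrated with a short sketch. It assumes the pulse is simply the largest-magnitude sample of the cycle, which is a stand-in for the crude and refined linear-phase alignment steps of the actual coder; the function name `dispersion_phase` is hypothetical.

```python
import numpy as np

def dispersion_phase(cycle):
    """Return (magnitude, dispersion phase) of one residual pitch cycle."""
    cycle = np.asarray(cycle, dtype=float)
    # Assumed pulse locator: the largest-magnitude sample of the cycle.
    pulse_pos = int(np.argmax(np.abs(cycle)))
    # Cyclic shift so that the pulse lands at position zero.
    shifted = np.roll(cycle, -pulse_pos)
    spectrum = np.fft.rfft(shifted)
    return np.abs(spectrum), np.angle(spectrum)

# The same pulse shape at two different positions inside the cycle.
cycle_a = np.zeros(40)
cycle_a[5:8] = [1.0, 0.5, -0.3]
cycle_b = np.roll(cycle_a, 11)
mag_a, phase_a = dispersion_phase(cycle_a)
mag_b, phase_b = dispersion_phase(cycle_b)
```

Because the shift removes the pulse position, both cycles yield the same magnitude and the same dispersion phase; only the pulse shape is left in ϕ.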

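21 AbS Phase Dispersion Quantization
[Block diagram. The DFT of the pitch-cycle waveform goes through crude and then refined linear-phase alignment to give the target r. Its magnitude is multiplied by e^{jϕ̂}, with ϕ̂ taken from the phase codebook selected by the pitch, to give the reconstruction r̂. The error r - r̂ is weighted by W(z)/A(z), and the code-vector minimizing the squared norm of the weighted error is chosen.]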
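A minimal sketch of the analysis-by-synthesis search in this diagram, carried out per harmonic in the DFT domain. The per-harmonic weight array stands in for the W(z)/A(z) weighting, and the codebook, sizes, and the name `abs_phase_search` are illustrative assumptions rather than the coder's actual design.

```python
import numpy as np

def abs_phase_search(target_dft, magnitude, phase_codebook, weight):
    """Pick the phase code-vector whose synthesized spectrum best matches the target.

    target_dft     : complex DFT of the aligned input pitch cycle, shape (K,)
    magnitude      : magnitude spectrum used for synthesis, shape (K,)
    phase_codebook : candidate dispersion phases, shape (N, K)
    weight         : per-harmonic perceptual weight, shape (K,) (assumed form)
    """
    candidates = magnitude * np.exp(1j * phase_codebook)  # (N, K) synthesized spectra
    err = weight * (target_dft - candidates)              # weighted error per harmonic
    dist = np.sum(np.abs(err) ** 2, axis=1)               # weighted squared error
    best = int(np.argmin(dist))
    return best, float(dist[best])

rng = np.random.default_rng(1)
K, N = 10, 16                      # 10 harmonics, 4-bit codebook (illustrative sizes)
codebook = rng.uniform(-np.pi, np.pi, size=(N, K))
magnitude = rng.uniform(0.5, 1.5, size=K)
weight = np.ones(K)                # flat weighting for the demo
true_phase = codebook[5] + 0.01 * rng.standard_normal(K)
target = magnitude * np.exp(1j * true_phase)
best, dist = abs_phase_search(target, magnitude, codebook, weight)
```

The search compares waveforms (here, their spectra) rather than phase values directly, so phase wrap-around needs no special handling.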

22 Vector Quantization Design
Variable-dimension VQ.
Eight pitch-range dependent codebooks.
Pitch changes over time cause the quantizer to switch among the pitch-range codebooks.
Solution: overlapped training clusters are used to achieve smooth phase variations whenever such a switch occurs.
Codebooks of 0 to 6 bits were designed and tested.
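The codebook switching and the overlapped training clusters might be organized as in the sketch below. The range boundaries, the overlap width, and both function names are hypothetical; the slides do not give the actual ranges.

```python
# Hypothetical boundaries on the number of harmonics K, defining eight ranges.
EDGES = [8, 16, 24, 32, 40, 48, 56, 64, 72]
OVERLAP = 2  # hypothetical: how far each training cluster extends past its range

def select_codebook(num_harmonics, edges=EDGES):
    """Return the index of the pitch-range codebook used for quantization."""
    for i in range(len(edges) - 1):
        if edges[i] <= num_harmonics < edges[i + 1]:
            return i
    raise ValueError("number of harmonics outside the covered range")

def training_clusters(dims, edges=EDGES, overlap=OVERLAP):
    """Assign training vectors (given by their dimension K) to overlapped clusters."""
    clusters = [[] for _ in range(len(edges) - 1)]
    for idx, k in enumerate(dims):
        for i in range(len(edges) - 1):
            # A vector near a boundary joins the clusters on both sides of it.
            if edges[i] - overlap <= k < edges[i + 1] + overlap:
                clusters[i].append(idx)
    return clusters

dims = [10, 15, 23, 24, 40, 71]
clusters = training_clusters(dims)
```

Vectors near a boundary train both neighbouring codebooks, which is what lets adjacent codebooks hold similar code-vectors and keeps the phase smooth when the quantizer switches.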

23 Vector Quantization Design (cont'd)
[Figure: the ranges covered by the pitch-range codebooks and the overlapped training clusters 1 to 8, laid out along K, the number of harmonics (vector dimension).]

24 Phase Code-Vector Examples
[Figure: example phase code-vectors, one panel per pitch-range codebook.]

25 Segmental Weighted SNR of Phase VQ
[Figure: segmental weighted SNR (dB) versus number of phase bits, for non-MIRS (flat) and MIRS speech.]

26 Subjective Results
[Figure: bar chart of subjective scores (0% to 50%) for female and male speech; categories: 4-bit VQ, male extracted, no preference.]
Results of a subjective A/B test comparing the 4-bit phase VQ with a male-extracted fixed phase.

28 Pitch Search
Pitch search robustness is needed for:
  high and low pitch transitions
  segments with rapidly varying pitch
Our suggested solutions:
  Combined temporal/spectral domain search
  Higher-rate temporal domain search
  Varying-boundaries based search
  Average pitch, using normalized correlation as weighting
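The last item, averaging the pitch with normalized correlation as the weight, could look like the sketch below. It assumes one candidate lag is found per sub-segment by a plain time-domain search and that negative correlations are clipped to zero; the function names, segment lengths, and lag range are illustrative.

```python
import numpy as np

def normalized_correlation(x, lag):
    """Normalized correlation between x and its copy delayed by `lag` samples."""
    a, b = x[lag:], x[:-lag]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0.0 else 0.0

def weighted_average_pitch(segments, min_lag, max_lag):
    """Average the best lag of each segment, weighted by its correlation."""
    best_lags, weights = [], []
    for seg in segments:
        scores = [normalized_correlation(seg, lag)
                  for lag in range(min_lag, max_lag + 1)]
        i = int(np.argmax(scores))
        best_lags.append(min_lag + i)
        weights.append(max(scores[i], 0.0))  # unreliable estimates get low weight
    weights = np.asarray(weights)
    if weights.sum() == 0.0:
        return float(np.mean(best_lags))
    return float(np.dot(weights, best_lags) / weights.sum())

# Synthetic voiced signal with a 40-sample period, split into two segments.
n = np.arange(320)
x = np.sin(2.0 * np.pi * n / 40) + 0.5 * np.sin(4.0 * np.pi * n / 40)
pitch_lag = weighted_average_pitch([x[:160], x[160:]], min_lag=20, max_lag=60)
```

Weighting by the correlation means segments with a weak or ambiguous periodicity contribute little to the final pitch value.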

29 Pitch Search Algorithm
[Flow chart. Elements shown: speech input; spectral domain pitch search + tracker (100 Hz); decision "Good pitch?"; temporal domain pitch refinement on the weighted speech (500 Hz); temporal domain pitch search (500 Hz); decisions "Good pitches?"; fallback "Use 4 ms waveform length"; output "Weighted-average pitch".]

31 Switched-Predictive Gain VQ Using Temporal Weighting
VQ of the log-gain (no gain down/up-sampling).
Temporal weighting used for quantization avoids smearing of plosives and onsets.
Switched-predictive VQ allows different gain predictors and reduces outliers.
Switched DC levels improve performance.

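32 Switched-Predictive AbS Gain VQ
[Block diagram. The log-gain g(m) forms the target t(m). A code-vector c_ij(m) from the vector quantizer codebook is passed through the synthesis filter 1/(1 - P_i z^{-1}), with P_i taken from the predictor codebook, and combined with the DC level D_i from the DC codebook to give the quantized value t̂(m). The error between t(m) and t̂(m) is temporally weighted, and the indices minimizing its squared norm are chosen.]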
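One possible reading of this diagram is sketched below: for every switch index i (predictor P_i with DC level D_i) and every code-vector c_ij, the log-gain is synthesized through 1/(1 - P_i z^{-1}) and compared with the target under a temporal weighting. The exact placement of the DC level, the codebook sizes, and the function names are assumptions made for illustration.

```python
import numpy as np

def synthesize_gain(code, predictor, dc, prev_hat):
    """Run c(m) through 1 / (1 - P z^-1) and add the DC level (assumed structure)."""
    y_prev = prev_hat - dc            # filter state from the previous frame
    out = np.empty(len(code))
    for m, c in enumerate(code):
        y_prev = c + predictor * y_prev
        out[m] = y_prev + dc
    return out

def search_gain_vq(target, weights, predictors, dcs, codebooks, prev_hat):
    """Exhaustive AbS search over switch index i and code-vector index j."""
    best = None
    for i, (p, d) in enumerate(zip(predictors, dcs)):
        for j, code in enumerate(codebooks[i]):
            t_hat = synthesize_gain(code, p, d, prev_hat)
            dist = float(np.sum(weights * (target - t_hat) ** 2))
            if best is None or dist < best[0]:
                best = (dist, i, j, t_hat)
    return best

rng = np.random.default_rng(2)
predictors, dcs = [0.9, 0.3], [2.0, 4.0]     # hypothetical switched parameters
codebooks = rng.standard_normal((2, 8, 5))   # 2 switches x 8 vectors x 5 gains
weights = np.ones(5)                         # flat temporal weighting for the demo
target = synthesize_gain(codebooks[1][2], 0.3, 4.0, prev_hat=3.5)
dist, i_best, j_best, t_hat = search_gain_vq(
    target, weights, predictors, dcs, codebooks, prev_hat=3.5)
```

Raising the weights on samples around an onset or plosive would make the search favour code-vectors that track those samples closely, which is the role the slides give to the temporal weighting.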

33 Bit Allocation of EWI Coder
Parameter        Bits / Frame    Bits / second
LPC
Pitch            2x6 = 12        600
Gain             2x6 = 12        600
REW
SEW magnitude
SEW phase
Total
Frame size: 20 ms. Two subframes per frame.
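The frame arithmetic behind this table can be checked with a few lines. Only the 4 kbps rate, the 20 ms frame, and the 2x6-bit pitch and gain entries come from the slides; the split of the remaining bits among LPC, REW, SEW magnitude, and SEW phase is not recoverable here and is deliberately left out.

```python
TOTAL_BPS = 4000          # coder bit rate from the title
FRAME_MS = 20             # frame size from the slide

frames_per_second = 1000 // FRAME_MS                 # 50 frames per second
bits_per_frame = TOTAL_BPS // frames_per_second      # total budget per frame

pitch_bits = 2 * 6        # two subframes, 6 bits each
gain_bits = 2 * 6

pitch_bps = pitch_bits * frames_per_second
gain_bps = gain_bits * frames_per_second

# Bits left per frame for LPC, REW, SEW magnitude, and SEW phase together.
remaining_bits = bits_per_frame - pitch_bits - gain_bits

print(bits_per_frame, pitch_bps, gain_bps, remaining_bits)
```

At 50 frames per second each bit per frame costs 50 bits per second, so the whole budget is 80 bits per frame, of which pitch and gain take 24.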

35 A/B Subjective Test
14 listeners. 12 male + 12 female sentences. MIRS-filtered speech.
Test      4 kbps WI    4 kbps MPEG-4
Female    65%          35%
Male      62%          38%
Total     64%          36%
With 95% certainty the WI preference lies in [59%, 69%]. WI is preferred in % certainty.

36 A/B Subjective Test (cont'd)
Test      4 kbps WI    5.3 kbps G
Female    58%          42%
Male      61%          39%
Total     60%          40%
With 95% certainty the WI preference lies in [54%, 65%]. WI is preferred in 99.98% certainty.

Test      4 kbps WI    6.3 kbps G
Female    55%          45%
Male      53%          47%
Total     54%          46%
With 95% certainty the WI preference lies in [49%, 59%]. WI is preferred in 92.21% certainty.
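The confidence figures in these A/B tests are consistent with a normal approximation to the binomial distribution. The sketch below assumes that each of the 14 listeners judged each of the 24 sentences once (336 votes per test), which the slides do not state explicitly, and it starts from the rounded totals; the function name `preference_stats` is hypothetical.

```python
import math

def preference_stats(p_hat, n, z=1.96):
    """95% interval for a preference share and confidence that it exceeds 50%."""
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    interval = (p_hat - z * se, p_hat + z * se)
    # One-sided test against "no preference" (p = 0.5).
    z0 = (p_hat - 0.5) / math.sqrt(0.25 / n)
    confidence = 0.5 * (1.0 + math.erf(z0 / math.sqrt(2.0)))
    return interval, confidence

n_votes = 14 * 24  # assumption: every listener heard every sentence once
results = {}
for label, share in [("vs 4 kbps MPEG-4", 0.64),
                     ("vs 5.3 kbps G", 0.60),
                     ("vs 6.3 kbps G", 0.54)]:
    results[label] = preference_stats(share, n_votes)
    (low, high), conf = results[label]
    print(f"{label}: [{100 * low:.1f}%, {100 * high:.1f}%], preferred with {100 * conf:.2f}%")
```

Under these assumptions the first and third intervals match the slides to the nearest percent; the second interval and the "preferred in" figures come out close but not identical, presumably because the percentages on the slides are rounded.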

37 Summary
AbS VQ for the SEW enhances coding efficiency.
AbS VQ for the dispersion phase improves speech naturalness and waveform matching.
The new pitch search improves accuracy and robustness.
Switched-predictive gain VQ avoids gain smear and reduces outliers.
The enhanced WI coder slightly exceeds the quality of G at 6.3 kbps.

38 References
[1] O. Gottesman, "Dispersion Phase Vector Quantization for Enhancement of Waveform Interpolative Coder", IEEE ICASSP 99, vol. 1.
[2] O. Gottesman and A. Gersho, "Enhanced Waveform Interpolative Coding at 4 kbps", IEEE Workshop on Speech Coding Proceedings, 1999, Finland.
[3] O. Gottesman and A. Gersho, "Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps", EUROSPEECH 99, 1999, Hungary.

39 SCW 99 Best Paper Award
