MELP Vocoder


Outline
- Introduction
- MELP Vocoder Features
- Algorithm Description
- Parameters & Comparison

Introduction
- Traditional pitch-excited LPC vocoders use either a periodic pulse train or white noise to excite the synthesis filter.
- They produce intelligible speech at very low bit rates.
- The result, however, sometimes sounds mechanical or buzzy and is prone to tonal noise.

Introduction (cont'd)
- These problems arise from the inability of a simple pulse train to reproduce all kinds of voiced speech.
- The MELP vocoder uses a mixed-excitation model, which represents a richer ensemble of speech characteristics.
- It therefore produces more natural-sounding speech.

MELP vocoder
- Robust in background-noise environments
- Based on the traditional LPC model, with additional features:
  - Mixed excitation
  - Aperiodic pulses
  - Pulse dispersion
  - Adaptive spectral enhancement

MELP vocoder: encoder
[Figure: encoder block diagram. Recoverable labels: LPC, LSF, MSVQ.]


MELP vocoder: computing the Fourier magnitudes (FFT)

MELP vocoder: computing the voicing strengths and setting the aperiodic flag
L = 40, 41, ..., 160

MELP vocoder: peakiness

$$p = \frac{\sqrt{\frac{1}{160}\sum_{n=-80}^{79} e^{2}[n]}}{\frac{1}{160}\sum_{n=-80}^{79} \left|e[n]\right|}$$

[Figure: example residual waveforms with peakiness values P = 12.64, P = 6.77, P = 1.16 and P = 1.1.]

MELP vocoder: bit allocation table
[Table residue. Recoverable entries (voiced / unvoiced): LSF 25 / 25; total 54 / 54.]

Mixed Excitation
- Mixed excitation is implemented using a multi-band mixing model.
- This model can simulate frequency-dependent voicing strength.
- The excitation is a mixture of periodic or aperiodic pulses and white noise.
- The primary effect of this unit is to reduce the buzz in broadband acoustic noise.
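As a rough illustration of the multi-band mixing idea, the Python sketch below combines pulse and noise excitation band by band. It assumes both signals have already been split into bands by some bandpass filterbank (not shown). The linear weights v and 1 - v are an assumption made for illustration, not the coefficients of the MELP standard, and `mix_excitation` is a hypothetical name.

```python
def mix_excitation(pulse_bands, noise_bands, voicing):
    """Mix band-split pulse and noise signals sample by sample.

    pulse_bands[k] and noise_bands[k] hold band k of each excitation;
    voicing[k] in [0, 1] is the voicing strength of band k.
    The weights v and (1 - v) are an illustrative assumption.
    """
    n = len(pulse_bands[0])
    out = [0.0] * n
    for pulse, noise, v in zip(pulse_bands, noise_bands, voicing):
        for i in range(n):
            out[i] += v * pulse[i] + (1.0 - v) * noise[i]
    return out

# Two bands: the low band fully voiced, the high band fully unvoiced.
pulse_bands = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0]]
noise_bands = [[0.1, -0.2, 0.3, -0.1], [0.2, 0.1, -0.3, 0.4]]
excitation = mix_excitation(pulse_bands, noise_bands, [1.0, 0.0])
```

With these inputs the low band contributes only its pulse and the high band only its noise, which is the frequency-dependent voicing behaviour the slide describes.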

Aperiodic pulses
- When the input signal is voiced, the MELP vocoder can synthesize speech using either aperiodic or periodic pulses.
- Aperiodic pulses are used during transition regions between voiced and unvoiced segments of the speech signal.
- They produce erratic glottal pulses without tonal noise.
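One way to picture aperiodic pulses is as a pulse train whose period is perturbed randomly from pulse to pulse. The Python sketch below is a minimal illustration under that assumption. The function name `pulse_positions`, the uniform perturbation and the 25% jitter used in the example are assumptions, not values taken from these slides.

```python
import random

def pulse_positions(pitch_period, n_samples, jitter=0.0, seed=0):
    """Return pulse start indices within n_samples.

    jitter is the maximum fractional perturbation of each period
    (0.0 gives a strictly periodic train). Illustrative only.
    """
    rng = random.Random(seed)
    positions, t = [], 0.0
    while t < n_samples:
        positions.append(int(round(t)))
        # Perturb each period by up to +/- jitter * pitch_period.
        t += pitch_period * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    return positions

periodic = pulse_positions(40, 160)               # evenly spaced pulses
aperiodic = pulse_positions(40, 160, jitter=0.25)  # erratic spacing
```

With `jitter=0.0` the spacing is exactly the pitch period. With a nonzero value the spacing varies, which breaks up the strict periodicity that would otherwise be heard as a tone.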

Pulse Dispersion
- Pulse dispersion is implemented using a fixed pulse dispersion filter based on a flattened triangle pulse.
- The filter improves the match between bandpass-filtered synthetic and natural speech waveforms in frequency bands that do not contain a formant resonance.
- It spreads the excitation energy within a pitch period.
- It reduces the harsh quality of the synthetic speech.

Adaptive spectral enhancement filter
- Based on the poles of the vocal tract filter
- Used to enhance the formant structure in the synthetic speech
- Improves the match between synthetic and natural bandpass waveforms, giving more natural speech output
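A filter "based on the poles of the vocal tract filter" is commonly built by bandwidth-expanding the LPC polynomial A(z), giving a pole/zero filter of the form A(z/alpha) / A(z/beta). The Python sketch below illustrates that construction only. The constants alpha = 0.5 and beta = 0.8 and the function names are assumptions, and any tilt-compensation term is omitted.

```python
def bandwidth_expand(a, gamma):
    """Return the coefficients of A(z/gamma): a[k] scaled by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]

def pole_zero_filter(b, a, x):
    """Direct-form filter: y[n] = sum b[k] x[n-k] - sum_{k>=1} a[k] y[n-k].

    Assumes a[0] == 1.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def enhance(a, x, alpha=0.5, beta=0.8):
    """Apply A(z/alpha) / A(z/beta); alpha and beta are illustrative."""
    return pole_zero_filter(bandwidth_expand(a, alpha),
                            bandwidth_expand(a, beta), x)

# First-order example: A(z) = 1 - 0.9 z^-1, filtered unit impulse.
response = enhance([1.0, -0.9], [1.0, 0.0, 0.0])
```

Because beta is larger than alpha, the poles sit closer to the unit circle than the zeros, so the filter emphasises the same resonances as the LPC synthesis filter, only more gently.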

MELP Algorithm Description (Encoder)
1. Filter out any low-frequency noise from the input speech.
2. Filter this filtered speech again to perform the initial pitch search for the pitch estimate.
3. Perform the bandpass voicing analysis. This step decides whether to use a periodic or aperiodic pulse train or the white-noise model.

MELP Algorithm Description (Encoder) cont'd
- In this stage a voicing degree parameter is estimated in each band, based on the normalized correlation function of the speech signal and of the smoothed rectified signal in the non-DC bands.
- Let s_k(n) denote the speech signal in band k and u_k(n) the DC-removed, smoothed, rectified version of s_k(n).
- The correlation function is

$$R_x(p) = \frac{\sum_{n=0}^{N-1} x(n)\,x(n+p)}{\left[\sum_{n=0}^{N-1} x^{2}(n)\,\sum_{n=0}^{N-1} x^{2}(n+p)\right]^{1/2}}$$

where
  P is the pitch of the current frame,
  N is the frame length,
  v_k is the voicing strength for band k, defined as max(R_{s_k}(P), R_{u_k}(P)).
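The normalized correlation and the per-band voicing strength defined above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation. The function names are hypothetical, and the input buffer is assumed to hold at least N + p samples.

```python
import math

def normalized_correlation(x, p, n_frame):
    """R_x(p) over n_frame samples; requires len(x) >= n_frame + p."""
    num = sum(x[n] * x[n + p] for n in range(n_frame))
    e0 = sum(x[n] ** 2 for n in range(n_frame))
    ep = sum(x[n + p] ** 2 for n in range(n_frame))
    den = math.sqrt(e0 * ep)
    return num / den if den > 0 else 0.0

def band_voicing_strength(s_k, u_k, pitch, n_frame):
    """max(R_{s_k}(P), R_{u_k}(P)) for one band."""
    return max(normalized_correlation(s_k, pitch, n_frame),
               normalized_correlation(u_k, pitch, n_frame))

# A sinusoid with a 40-sample period correlates fully at lag 40
# and is in antiphase at lag 20.
x = [math.sin(2 * math.pi * n / 40) for n in range(200)]
r_at_pitch = normalized_correlation(x, 40, 160)
r_at_half = normalized_correlation(x, 20, 160)
```

Taking the maximum over the band signal and its envelope means a band counts as voiced if either one is periodic at the pitch lag.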

MELP Algorithm Description (Encoder) cont'd
- The jittery state is determined by the peakiness of the full-wave rectified LP residual e(n):

$$\text{Peakiness} = \frac{\left[\frac{1}{N}\sum_{n=0}^{N-1} e^{2}(n)\right]^{1/2}}{\frac{1}{N}\sum_{n=0}^{N-1} \left|e(n)\right|}$$

- If the peakiness is greater than some threshold, the speech frame is flagged as jittered (the aperiodic flag is set).
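The peakiness measure is the ratio of the RMS value of the residual to its mean absolute value, so it is easy to check numerically. The Python sketch below is illustrative. The function names are hypothetical, and the threshold is left as a caller-supplied argument because the slides do not give its value.

```python
import math

def peakiness(e):
    """RMS of e divided by the mean absolute value of e."""
    n = len(e)
    rms = math.sqrt(sum(v * v for v in e) / n)
    mean_abs = sum(abs(v) for v in e) / n
    return rms / mean_abs if mean_abs > 0 else 0.0

def is_jittery(e, threshold):
    """Flag the frame when peakiness exceeds a caller-chosen threshold."""
    return peakiness(e) > threshold

# A constant-magnitude residual has the minimum peakiness of 1.
flat = [1.0, -1.0] * 80
# A single impulse in 160 samples gives sqrt(160), about 12.65.
impulse = [0.0] * 160
impulse[80] = 1.0
p_flat = peakiness(flat)
p_impulse = peakiness(impulse)
```

The two test signals bracket the range for a 160-sample window: energy spread evenly gives 1, and energy concentrated in one sample gives the square root of the window length.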

MELP Algorithm Description (Encoder) cont'd
4. Apply an LPC analysis.
5. Calculate the final pitch estimate.
6. Calculate the gain estimate.
7. Quantize the LPC coefficients, pitch, gain and bandpass voicing.
8. Determine and quantize the Fourier magnitudes. The information in these coefficients improves the accuracy of the speech production model at the perceptually important lower frequencies.

MELP Encoder
[Block diagram. Blocks, as labelled: Input signal; Pre-filter; Pitch Search; Bandpass Voicing Decision; Gain Calculator; LPC Analysis Filter; Final Pitch and Voicing Decision; LSF Quantization; Quantize Gain, Pitch, Voicing, Jitter; Fourier Magnitude Calculation; Apply Forward Error Correction; Transmitted Bitstream.]

MELP Algorithm (Decoder)
1. Decode the pitch.
2. Apply gain attenuation.
3. Linearly interpolate all of the synthesis parameters pitch-synchronously.
4. Generate the mixed excitation.

MELP Algorithm (Decoder) cont'd
5. Apply the adaptive spectral enhancement filter.
6. Perform LPC synthesis and apply the gain factor.
7. Apply dispersion filtering.
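The pitch-synchronous linear interpolation in the decoder can be pictured as follows: each pitch pulse starts at some offset inside the frame, and every synthesis parameter is interpolated between the previous and current frame values at that offset. The Python sketch below is a simplified illustration. The function names are hypothetical, and the 180-sample frame assumes 8 kHz sampling of the 22.5 ms frame, which these slides do not state.

```python
FRAME_LEN = 180  # assumption: 22.5 ms frame at 8 kHz sampling

def interpolate(prev, cur, t0, frame_len=FRAME_LEN):
    """Linearly interpolate a parameter at sample offset t0 in the frame."""
    w = t0 / frame_len
    return (1.0 - w) * prev + w * cur

def pitch_track(prev_pitch, cur_pitch, frame_len=FRAME_LEN):
    """Walk the frame pulse by pulse, interpolating the period each time."""
    t, track = 0, []
    while t < frame_len:
        period = interpolate(prev_pitch, cur_pitch, t, frame_len)
        track.append((t, period))
        t += max(1, int(round(period)))  # next pulse starts one period later
    return track

midpoint = interpolate(40.0, 60.0, 90)  # halfway between the frame values
steady = pitch_track(40.0, 40.0)        # constant pitch across the frame
```

The same `interpolate` call would be reused for the gain, the spectral parameters and the other per-frame values, always evaluated at the start of the current pitch period.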

MELP Decoder
[Block diagram. Blocks, as labelled: Received Bitstream; Decode Parameters; Noise Generator; Noise Shaping Filter; Pulse Generator; Pulse Position Jitter; Pulse Shaping Filter; summing node (+); Adaptive Spectral Enhancement; LPC Synthesis Filter; Gain; Pulse Dispersion Filter; Synthesized Speech.]

Parameter Quantization

Parameters                    Voiced   Unvoiced
LSF parameters                25       25
Fourier magnitudes            8        -
Gain (2 per frame)            8        8
Pitch, overall voicing        7        7
Bandpass voicing              4        -
Aperiodic flag                1        -
Error protection              -        13
Sync bit                      1        1
Total bits / 22.5 ms frame    54       54
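The totals in the table can be checked directly, and they tie the frame size to the 2400 bps rate: 54 bits every 22.5 ms is 2400 bits per second. The short Python check below uses only the numbers from the table. The variable names are illustrative.

```python
# (voiced bits, unvoiced bits) per 22.5 ms frame, from the table above.
BITS = {
    "LSF parameters": (25, 25),
    "Fourier magnitudes": (8, 0),
    "Gain (2 per frame)": (8, 8),
    "Pitch, overall voicing": (7, 7),
    "Bandpass voicing": (4, 0),
    "Aperiodic flag": (1, 0),
    "Error protection": (0, 13),
    "Sync bit": (1, 1),
}
FRAME_MS = 22.5

voiced_total = sum(v for v, _ in BITS.values())
unvoiced_total = sum(u for _, u in BITS.values())
bit_rate = voiced_total * 1000 / FRAME_MS  # bits per second
```

In unvoiced frames the 13 bits that would carry the Fourier magnitudes, bandpass voicing and aperiodic flag are spent on error protection, so both frame types come to the same 54 bits.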

Bit transmission order

Comparison of the 2400 bps MELP with Other Standard Coders
Diagnostic Acceptability Measure, two conditions: Quiet and Office
- Continuously Variable Slope Delta Modulation (CVSD), 16,000 bps
- Code Excited Linear Prediction (CELP), 4800 bps, FS1016
- Mixed Excitation Linear Prediction (MELP), 2400 bps
- Linear Predictive Coding (LPC), 2400 bps, FIPS Publication 137

Comparison of the 2400 bps MELP with Other Standard Coders (cont'd)
Mean Opinion Score in six conditions:
- Quiet: anechoic sound chamber, dynamic microphone
- Quiet, H250: anechoic sound chamber, H250 microphone
- 1% random bit errors: anechoic sound chamber, dynamic microphone
- 0.5% random block errors: anechoic sound chamber, dynamic microphone, 50% errors within a 35 ms block
- Office: modern office environment, dynamic microphone
- Mobile command environment: field shelter, EV M87 microphone

Comparison of the 2400 bps MELP with Other Standard Coders (cont'd)
Complexity, with three measurements: RAM, ROM and MIPS

Voice samples
- LPC 10

Voice samples
- Original sound
- MELP 1800
- MELP 2000
- MELP 2200