Implementation of attractive Speech Quality for Mixed Excited Linear Prediction


IOSR Journal of Electrical and Electronics Engineering (IOSR-JEEE), e-ISSN: 2278-1676, p-ISSN: 2320-3331, Volume 9, Issue 2 Ver. I (Mar.-Apr. 2014), PP 07-12

Haresh Miyani(1), Aalay Mehta(2), Pratik Nai(2), Harshad Patel(2)
(1) Assistant Professor of Electronics Engineering, Bhagwan Mahavir College of Engineering and Technology, Surat-395017, Gujarat, India.
(2) Student of Electronics Engineering, Bhagwan Mahavir College of Engineering and Technology, Surat-395017, Gujarat, India.

Abstract: The number of mobile subscribers is increasing all over the world, so communication systems have to be improved. The Mixed Excited Linear Prediction (MELP) algorithm was developed to reduce the bandwidth of the speech signal so that more data can be transmitted on a single channel. This increases channel capacity and therefore the number of users a channel can serve. MELP is a speech coding method built around a speech encoder and a speech decoder. The MELP encoder reduces the redundancy of the speech signal and compresses it into the MELP code. The speech decoder includes a Linear Predictive Coding (LPC) filter that produces synthesized speech at its output in response to voiced and unvoiced excitation. MELP also models jittery voiced speech. At its low bit rate, MELP also reduces codebook storage and computational complexity. This paper describes how the bit rate of the MELP coder can be reduced to as low as 2.4 kbps without apparent damage to speech quality.

Keywords: MELP, Channel capacity, Speech coder, Bandwidth, Synthetic speech.

I. Introduction
As of June 2012, more than 2.4 billion people, over a third of the world's population, had used the Internet, approximately 100 times more people than were using it in 1995. This growth creates a need for Mixed Excited Linear Prediction (MELP), which reduces the number of bits needed to store or transmit information.
MELP reduces the bandwidth needed by each speech signal, so more data can be transmitted over a single channel. The effective capacity of a digital communication link can be increased by compressing the signal at the transmitter side and decompressing it at the receiver side; if the data are compressed effectively, the receiver obtains more information over the same channel. Speech is the most basic form of human communication. MELP coders can produce high-quality speech at 2.4 kb/s, and distortion appears when the bit rate is decreased below 2.4 kb/s.

Voice compression serves two complementary aims:
1. Increasing the amount of data that can be stored in a given domain, such as space, time or frequency, or contained in a given message length. [1]
2. Reducing the amount of storage space required to store a given amount of data, or reducing the length of message required to transfer a given amount of information. [1]

Voice compression is done by two techniques:
1) Lossy compression, used for waveform signals such as speech, where the reconstruction need not be exact.
2) Lossless compression, used for database records and word-processing files, where the input data must be recovered exactly.

II. Speech Coding
The term speech coding refers to the process of compressing and decompressing human speech. Speech coding is a digital technique, and digital transmission of human speech is increasingly common. Modern communication technologies include mobile and satellite telephony, the Internet, landlines, voice mailboxes, etc. In 1994 only 3% of American classrooms had the Internet, while by 2002 92% did. A variety of speech coding methods exist for compressing (coding) and decompressing (decoding) speech so that it is reproduced efficiently at the receiver side. Compression is typically done by extracting parameters from successive sets of samples of the original speech waveform, referred to as frames, and representing those parameters as a digital signal.
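The frame-based processing described above can be sketched as follows. This is a minimal sketch, assuming 8 kHz input (so MELP's 180-sample frame lasts 22.5 ms) and the 200-sample analysis window mentioned under Frame Segmentation, centred on the last sample of each frame; the window centred at the beginning of a frame is then simply the previous frame's end-of-frame window. The function names are hypothetical, and zero-padding at the signal edges is an illustrative choice, not the standard's rule.

```python
FRAME_LEN = 180      # samples per MELP frame
WINDOW_LEN = 200     # analysis window length quoted for the Fourier magnitude analysis
SAMPLE_RATE = 8000   # assumed sampling rate in Hz (gives 22.5 ms frames)


def split_frames(signal, frame_len=FRAME_LEN):
    """Split a list of samples into consecutive, non-overlapping frames.

    A trailing partial frame is dropped in this sketch.
    """
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def analysis_window(signal, frame_index, frame_len=FRAME_LEN, window_len=WINDOW_LEN):
    """Return window_len samples centred on the last sample of a frame.

    The frame's last sample lands at index window_len // 2 of the result;
    positions outside the signal are zero-padded (an assumption of this sketch).
    """
    centre = (frame_index + 1) * frame_len - 1
    start = centre - window_len // 2
    return [signal[n] if 0 <= n < len(signal) else 0.0
            for n in range(start, start + window_len)]


if __name__ == "__main__":
    speech = [float(n % 50) for n in range(SAMPLE_RATE)]   # 1 s toy signal
    frames = split_frames(speech)
    print(len(frames), "frames of", FRAME_LEN * 1000 / SAMPLE_RATE, "ms each")
    print("window length:", len(analysis_window(speech, 0)))
```

Because each window extends 100 samples past the end of its frame, the encoder needs a short look-ahead into the next frame before it can analyse the current one.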
Decompression is typically done by decoding the transmitted or stored digital signal. [6] The signal being decoded consists of the encoded versions of the extracted parameters. Speech frames are classified into three types: 1) voiced, 2) unvoiced, 3) jittery voiced. [1][3] Accordingly, a need exists for a speech encoder and method for rapidly, efficiently and accurately characterizing speech signals in a fashion lending itself to compact digital representation. Further, a need exists for a speech decoder and method for providing high-quality speech signals from the compact digital representations. The problem of providing high-fidelity speech while conserving digital bandwidth and minimizing both computational complexity and power requirements has been addressed by Mixed Excited Linear Prediction (MELP).

Fig. 1 shows the block diagram of a speech communication system. Speech coding is mainly divided into two parts: 1) the speech encoder and 2) the speech decoder.

[Fig. 1: Block Diagram of Speech Communication System (speech source, sampler, A/D converter, source and channel coding, channel, channel and source decoding, D/A converter)]

Speech Encoder: The speech source is the input of the speech encoder, which is divided into the following stages:
Filter: removes noise from the speech and frames the signal.
Sampler: takes samples of the speech from which the receiver side can regenerate it.
A/D converter: an analog-to-digital converter, which converts the speech into digital code.
Source encoder: this is where MELP operates. The MELP encoder compresses the speech to 2.4 kb/s and passes it on to the channel. [2][3]

Speech Decoder: This is exactly the reverse process of the encoder. The data received from the channel are decoded and passed to the source decoder, which is the MELP decoder; it regenerates the digital speech code sent from the transmitter side. The digital-to-analog converter converts the digital speech to analog, and after filtering, the reconstructed speech is obtained at the receiver. [2][3]

III. Melp Encoder
An audio encoder captures audio, converts it to a digital signal, and compresses it. Fig. 2 shows the Mixed Excited Linear Prediction (MELP) encoder. It contains: A) Frame Segmentation, B) Pitch Detection, C) Fourier Magnitude, D) Shaping Filter, E) Gain Computation, F) LP Analysis.

[Fig. 2: MELP Encoder [1] (input PCM speech, frame segmentation, first-stage pitch estimation, Fourier magnitude calculation, LP analysis, voicing strength calculation, bandpass voicing strength computation, aperiodic flag decision, packing into the MELP bit stream)]

A. Frame Segmentation: Frame segmentation is also called the positioning of windows. The MELP coder takes 180 samples in each frame for compression. Window positioning is required to facilitate interpolation, because the parameters of a given frame are interpolated between two different parameter sets, computed from windows centred at the beginning of the frame and at the end of the frame. Some analyses use longer windows; for instance, the Fourier magnitude analysis uses 200 samples per frame. [2][3]

B. Pitch Detection: The pitch period plays a major role in generating synthetic speech, and it is related to the voicing strength. For detection of the pitch period, fractional refinement is used throughout the encoding process.

The input signal is separated into five bands, and a voicing strength is found in every band. First, the speech is filtered by the bandpass filter with passband 0 to 500 Hz. [7] The normalized autocorrelation [2] is given by

$$ r[l] = \frac{c_l(0,l)}{\sqrt{c_l(0,0)\,c_l(l,l)}}, \qquad c_l(m,k) = \sum_{n=-\lfloor l/2 \rfloor - 80}^{-\lfloor l/2 \rfloor + 79} s[n+m]\,s[n+k] $$

where l = 40, 41, ..., 160. The lag T that maximizes r[l] is the integer pitch period, and the fractional pitch period is denoted by $T^{(1)} = T + \Delta$, where $\Delta$ is the fractional offset found by the refinement.

C. Fourier Magnitude: The MELP coder uses the Fourier magnitudes of the prediction error signal to generate the shape of the excitation pulse. These values are quantized and transmitted for each frame, so that the decoder can generate synthetic speech that is as close as possible to the original signal. The Discrete Fourier Transform (DFT) [6][3] is defined by

$$ X[k] = \sum_{n=0}^{N_0-1} x[n]\, e^{-j 2\pi k n / N_0}, \qquad k = 0, \ldots, N_0 - 1 $$

being the analysis equation of the transform, and

$$ x[n] = \frac{1}{N_0} \sum_{k=0}^{N_0-1} X[k]\, e^{j 2\pi k n / N_0}, \qquad n = 0, \ldots, N_0 - 1 $$

being the synthesis equation, where N_0 = 512 is the length of the sequences. The Fourier magnitude sequence is denoted by Fmag[i]. It is normalized by

$$ \mathrm{Fmag}[i] \leftarrow \frac{\mathrm{Fmag}[i]}{\alpha}, \qquad \alpha = \sqrt{\frac{1}{10} \sum_{i=1}^{10} \left(\mathrm{Fmag}[i]\right)^2 } $$

D. Shaping Filter: In MELP, the shaping filters are built from five filters, called synthesis filters, which are used to synthesize the mixed excitation signal during decoding. Each synthesis filter works on a particular frequency band: 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz. The synthesis filters are connected in parallel to define the frequency responses of the shaping filters. [7] With $h_i[n]$ the impulse response of the i-th synthesis filter and $vs_i$ the voicing strength of the i-th band, the total responses of the pulse and noise shaping filters are given by

$$ h_p[n] = \sum_{i=1}^{5} vs_i\, h_i[n], \qquad h_n[n] = \sum_{i=1}^{5} (1 - vs_i)\, h_i[n] $$

These two shaping filters are used only during decoding, where the mixed-excitation synthetic speech is generated.

E. Gain Computation: This step calculates the gain of the speech. Using a pitch-adaptive window length, the gain of the input speech signal is measured twice per frame.
1) If the low-band voicing strength vs_1 > 0.6, the window length is the shortest multiple of the first-stage pitch period $T^{(1)}$ that is longer than 120 samples. If this length exceeds 320 samples, it is divided by 2.
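This case corresponds to strongly voiced frames; making the window a multiple of the pitch period minimizes the variation of the measured gain with window position.
2) If vs_1 ≤ 0.6, the default window length of 120 samples is used; this covers frames without strong periodic voicing, such as unvoiced or jittery voiced frames. [7][8]
The first gain, g_1, is computed from a window centred 90 samples before the last sample of the current frame, and the second gain, g_2, from a window centred on the last sample of the current frame. Each gain is computed as

$$ g = 10 \log_{10}\left(0.01 + \frac{1}{L} \sum_{n=0}^{L-1} s^2[n]\right) $$

where s[n] are the L samples in the window.

F. LP Analysis: This is a 10th-order linear prediction, in which each sample is predicted from the ten previous samples. The normal equations form a Toeplitz matrix and are solved with the Levinson-Durbin method. LP analysis is performed on the input speech signal using 200 samples. The final coefficients are bandwidth-expanded with a constant of 0.994 (the i-th coefficient is multiplied by 0.994^i). They are then quantized and used to find the prediction error signal. [2]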
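The LP analysis steps in F can be sketched as below. This is a minimal sketch, not the standard's reference code: a 200-sample segment is windowed (a Hamming window is an assumption here), its autocorrelation r[0..10] is computed, the Toeplitz normal equations are solved with the Levinson-Durbin recursion, and coefficient a_i is multiplied by 0.994^i. Quantization is omitted, and all function names are hypothetical.

```python
import math

LP_ORDER = 10          # 10th-order linear prediction
BW_EXPANSION = 0.994   # bandwidth-expansion constant quoted in the text


def hamming(length):
    """Hamming window (the window shape is an assumption of this sketch)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the windowed segment x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for a[1..order].

    Convention: x_hat[n] = sum_i a[i] * x[n - i]. Returns (coefficients, final error).
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break             # degenerate segment: leave remaining coefficients at zero
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err         # reflection coefficient
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def bandwidth_expand(a, gamma=BW_EXPANSION):
    """Multiply the i-th coefficient (i starting at 1) by gamma**i."""
    return [coef * gamma ** (i + 1) for i, coef in enumerate(a)]


def lp_analysis(segment, order=LP_ORDER):
    """Window the segment, then autocorrelation + Levinson-Durbin + bandwidth expansion."""
    windowed = [s * w for s, w in zip(segment, hamming(len(segment)))]
    a, _ = levinson_durbin(autocorrelation(windowed, order), order)
    return bandwidth_expand(a)


def prediction_error(x, a):
    """e[n] = x[n] - sum_i a[i] * x[n-i], treating samples before x[0] as zero."""
    return [x[n] - sum(a[i] * x[n - i - 1] for i in range(len(a)) if n - i - 1 >= 0)
            for n in range(len(x))]
```

For example, the autocorrelation r[k] = 0.5^k of a first-order process yields a_1 = 0.5 with all higher coefficients zero, which bandwidth expansion turns into 0.497. The residual from `prediction_error` is the signal whose Fourier magnitudes section C describes.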

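The pitch-adaptive gain rule in E can be sketched in the same way. This is a minimal sketch, assuming the reading of E given in this paper: g_1 comes from a window centred 90 samples before the frame's last sample, and g_2 from a window centred on that sample. Each gain is 10·log10(0.01 + mean energy) in dB. Rounding a fractional window length to whole samples and zero-padding outside the signal are assumptions, and the function names are hypothetical.

```python
import math

DEFAULT_GAIN_WINDOW = 120   # samples, used when vs1 <= 0.6
MAX_GAIN_WINDOW = 320       # longer pitch-multiple windows are halved


def gain_window_length(vs1, pitch):
    """Pitch-adaptive gain window length.

    vs1   -- low-band voicing strength
    pitch -- first-stage (possibly fractional) pitch period T(1) in samples
    """
    if vs1 <= 0.6:
        return DEFAULT_GAIN_WINDOW
    multiple = math.floor(DEFAULT_GAIN_WINDOW / pitch) + 1   # shortest multiple longer than 120
    length = multiple * pitch
    if length > MAX_GAIN_WINDOW:
        length /= 2
    return int(round(length))   # rounding to whole samples is an assumption


def gain_db(samples):
    """g = 10*log10(0.01 + mean(s^2)) in dB."""
    energy = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(0.01 + energy)


def frame_gains(signal, frame_end, vs1, pitch):
    """Return (g1, g2) for the frame whose last sample index is frame_end."""
    length = gain_window_length(vs1, pitch)
    gains = []
    for centre in (frame_end - 90, frame_end):
        start = centre - length // 2
        window = [signal[n] if 0 <= n < len(signal) else 0.0   # zero-pad outside the signal
                  for n in range(start, start + length)]
        gains.append(gain_db(window))
    return tuple(gains)
```

With a pitch of 50 samples and vs_1 = 0.8, the window is 150 samples (3 × 50, the first multiple above 120). The 0.01 offset keeps silent windows at a floor of -20 dB instead of minus infinity.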
G. Bit Allocation: A total of 54 bits are transmitted per frame at a frame length of 22.5 ms, which results in a bit rate of 2400 bps (54 bits / 22.5 ms). In unvoiced frames, 13 bits per frame are provided for error protection; these replace the bits that voiced frames spend on the bandpass voicing strengths, the aperiodic flag, and the Fourier magnitudes (4 + 1 + 8 = 13). The synchronization bit follows an alternating one/zero pattern.

Table 1 Bit Allocation in MELP
PARAMETER                                  VOICED   UNVOICED
LSF parameters                                 25         25
Pitch period / low-band voicing strength        7          7
Bandpass voicing strength                       4          -
Gain (first)                                    3          3
Gain (second)                                   5          5
Aperiodic flag                                  1          -
Fourier magnitudes                              8          -
Synchronization                                 1          1
Error protection                                -         13
TOTAL                                          54         54

IV. Melp Decoder
The decoder regenerates the original data from the packed data received at the receiver. Fig. 3 shows the MELP decoder, in which the received bit stream is first unpacked into its parameters. Two additional filters are added to the processing path: 1) the spectral enhancement filter, which takes the mixed excitation as input, and 2) the pulse dispersion filter at the end of the processing chain.

[Fig. 3: MELP Decoder [1] (bit-stream unpacking, parameter decoding, pulse generation with jitter, pulse shaping, white noise generator with noise shaping, spectral enhancement, synthesis, gain scale factor calculation, pulse dispersion, synthetic speech output)]

For unvoiced frames, default parameter values are used in the MELP decoder: a pitch period of 50 samples, a jitter of 0.25, all 0s for the voicing strengths, and all 1s for the Fourier magnitudes. For voiced frames, the jitter is 0.25 if the aperiodic flag is equal to one, so the pitch period can be distorted by up to about ±25%. The parameters are interpolated pitch-synchronously with the interpolation factor [2][3]

$$ \alpha = \frac{n_0}{180} $$

where n_0 is the starting sample of the current pitch period within the 180-sample frame, and each parameter is taken as (1 - α) times its previous-frame value plus α times its current-frame value. Interpolation gives the line spectral frequencies (LSF), pitch period, jitter, Fourier magnitudes, and shaping filter coefficients. By putting all these parameters together, the final speech is generated. This speech is approximately the same as the input speech.
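The mixed excitation that the decoder builds from the five band filters of section D can be sketched as below. A pulse train is shaped by h_p = Σ vs_i h_i and white noise by h_n = Σ (1 - vs_i) h_i, and the two are added. This is a minimal sketch under simplifying assumptions: the 31-tap windowed-sinc band filters, Gaussian noise, and unnormalized pulse amplitudes are illustrative choices, and jitter, Fourier-magnitude pulse shaping, gain scaling, and the enhancement and dispersion filters are left out. The function names are hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000
BAND_EDGES_HZ = [0, 500, 1000, 2000, 3000, 4000]   # the five MELP bands
NUM_TAPS = 31                                        # assumed FIR length (odd)


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def band_filter(f_low, f_high, num_taps=NUM_TAPS, fs=SAMPLE_RATE):
    """Windowed-sinc bandpass FIR, built as lowpass(f_high) - lowpass(f_low)."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        lp_high = 2 * f_high / fs * sinc(2 * f_high / fs * (n - centre))
        lp_low = 2 * f_low / fs * sinc(2 * f_low / fs * (n - centre))
        taps.append((lp_high - lp_low) * window)
    return taps


BAND_FILTERS = [band_filter(lo, hi) for lo, hi in zip(BAND_EDGES_HZ, BAND_EDGES_HZ[1:])]


def shaping_filters(voicing):
    """Return (h_p, h_n) with h_p = sum vs_i h_i and h_n = sum (1 - vs_i) h_i."""
    h_p = [sum(vs * h[n] for vs, h in zip(voicing, BAND_FILTERS)) for n in range(NUM_TAPS)]
    h_n = [sum((1 - vs) * h[n] for vs, h in zip(voicing, BAND_FILTERS)) for n in range(NUM_TAPS)]
    return h_p, h_n


def fir(x, h):
    """Direct-form FIR filtering; output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]


def mixed_excitation(pitch, voicing, length, seed=0):
    """Pulse train shaped by h_p plus white noise shaped by h_n (no jitter, no gain)."""
    rng = random.Random(seed)
    pulses = [1.0 if n % pitch == 0 else 0.0 for n in range(length)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    h_p, h_n = shaping_filters(voicing)
    return [p + q for p, q in zip(fir(pulses, h_p), fir(noise, h_n))]
```

Because the band filters telescope to a full-band impulse, h_p + h_n is, up to rounding, a delayed unit impulse. Fully voiced bands therefore pass only the pulses, fully unvoiced bands pass only the noise, and partially voiced bands receive a weighted mix of both.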
The decoded speech is generated from codebooks of different amplitudes and the decoded parameters rather than from the original waveform samples, so it is synthetic (artificial rather than natural) speech; this synthetic speech is the final output at the receiver.

V. Result
Input and output for five.wav

Table 2 Five.wav Coefficients
Wave file   Duration   MSE           Absolute      SNR
five.wav    4329       7.0410e+006   1.6810e+003   -0.8683

[Fig. 3: Input Output Waveform of Five.wav]

VI. Conclusion
In this project, speech was compressed, transmitted, and received with good quality while using little channel capacity. At its low bit rate, the MELP speech codec satisfies the requirements of a good-quality speech coder. The lower the bit rate at which a coder can deliver toll-quality speech, the more speech channels can be accommodated within a given bandwidth. Speech was received at a bit rate of 2.4 kb/s without apparent damage to the original speech. The results show that a low SNR still yields good-quality speech at the same compression rate, while an increase in SNR gives slightly lower-quality speech. Because MELP is a parametric coder that does not reproduce the input waveform sample by sample, waveform SNR is a poor indicator of its perceived quality.

Acknowledgement
Many people have contributed to the successful preparation of our project, and we would like to place on record our grateful thanks to each of them. We wish to express our gratitude to our guide, Mr. Haresh Miyani, for his demonstrative and undaunted guidance and support during our project work and for giving us the opportunity to present our project. We would like to thank our parents for their kind support and for inspiring us. Lastly, we thank each and every person who has helped us directly or indirectly.

References
[1] Wai C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders, John Wiley & Sons, Inc.
[2] John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd edition, Prentice Hall India.
[3] Sanjit K. Mitra, Digital Signal Processing: A Computer-Based Approach, 2nd edition, TMH.
[4] Theodore S. Rappaport, Wireless Communications: Principles and Practice, 3rd edition, Pearson Education.
[5] A. Antoniou, Digital Filters: Analysis, Design, and Applications, McGraw-Hill, New York.
[6] C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms, John Wiley & Sons, Hoboken, NJ, 1985.
[7] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, 4th printing, Kluwer Academic Publishers, Norwell, MA, 1995.
[8] Richard Louis Zinser, Mark Lewis Grabb, Steven Robert Koch, and William Brooksby, "Speech coding system and method including harmonic generator having an adaptive phase off-setter," United States patent.
[9] Wayne Tomasi, Advanced Electronic Communications Systems, 5th edition.

AUTHORS
Haresh Miyani was born in Bhavnagar, Gujarat, India, in 1987. He received his B.E. from C.K. Pithawala Engg. & Tech., Surat, and his M.Tech from VJTI in 2012. He has qualified GATE. His areas of interest are Digital Signal Processing, Image Processing, and MATLAB. He is an Assistant Professor at Bhagwan Mahavir College of Engg. & Tech., Surat. He completed one year of training at the IIT Bombay campus.

Aalay Mehta was born in Mehsana, Gujarat, India, in 1993. He studied Electronics Engineering (B.E.) at Bhagwan Mahavir College of Engineering and Technology (Gujarat Technological University), Surat. His areas of interest are Digital Signal Processing, Power Electronics, Embedded Systems, and software (MATLAB).

Pratik Nai was born in Bharuch, Gujarat, India, in 1992. He studied Electronics Engineering (B.E.) at Bhagwan Mahavir College of Engineering and Technology (Gujarat Technological University), Surat. His areas of interest are Digital Signal Processing, Power Electronics, Embedded Systems and software (MATLAB), Digital Logic, and Analog Systems.

Harshad Patel was born in Surat, Gujarat, India, in 1992. He studied Electronics Engineering (B.E.) at Bhagwan Mahavir College of Engineering and Technology (Gujarat Technological University), Surat. His areas of interest are Digital Signal Processing, Embedded Systems and software (MATLAB), and Digital Communication.