An Approach to Very Low Bit Rate Speech Coding


Computing For Nation Development, February 26-27, 2009
Bharati Vidyapeeth's Institute of Computer Applications and Management, New Delhi

Hari Kumar Singh
Department of Electronics & Communication, Institute of Engineering and Technology, M.J.P. Rohilkhand University, Bareilly
harimtech2000@rediffmail.com

Sanjeev Sharma and Yash Vir Singh
Department of Electronics & Communication, College of Engineering and Rural Technology, Meerut
Sanjeev_vats1@yahoo.co.in

ABSTRACT

Speech coding is the process of coding speech signals for efficient transmission. The problem of reducing the bit rate of a speech representation, while preserving the quality of the speech reconstructed from it, has received continuous attention over the past five decades. Speech coded at 64 kilobits per second (kbit/s) using logarithmic PCM is considered "non-compressed" and is often used as a reference for comparison. The term medium-rate is used for coding in the range of 8-16 kbit/s, low-rate for systems working below 8 kbit/s and down to 2.4 kbit/s, and very-low-rate for coders operating below 2.4 kbit/s.

KEYWORDS

Speech, Quantization, Codebook, Clusters, Signals, Compression

1. INTRODUCTION: TRADITIONAL SPEECH CODING

Natural speech waveforms are continuous in time and amplitude. Periodically sampling an analog waveform at the Nyquist rate (twice the highest frequency) converts it to a discrete-time signal. The signal amplitude at each time instant is quantized to one of a set of L amplitude values, where B = log2(L) is the number of bits used to digitally code each value. Digital communication of an analog amplitude X consists of A/D conversion, transmission of binary information over a digital channel, and D/A conversion to reconstruct the analog value. Even if the channel is noise-free, the output value differs from the input by an amount known as the quantization noise.
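The two relations above can be checked with a short sketch. The 8 kHz sampling rate and 8-bit samples are the standard telephony figures behind the 64 kbit/s reference, assumed here for illustration rather than stated in the text:

```python
import math

def bit_rate(sample_rate_hz, bits_per_sample):
    # Bit rate = F_s * B (samples/second times bits/sample)
    return sample_rate_hz * bits_per_sample

# Logarithmic PCM telephone speech: 8000 samples/s at 8 bits/sample
assert bit_rate(8000, 8) == 64_000   # the 64 kbit/s "non-compressed" reference

def bits_for_levels(levels):
    # B = log2(L): bits needed to index one of L quantization levels
    return math.ceil(math.log2(levels))

assert bits_for_levels(256) == 8
```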
The bit rate for a signal is the product of its sampling rate Fs (in samples/second) and the number of bits B used to code each sample. Speech analysis is the extraction from a speech signal of the properties or features that are important for communication. This involves a transformation of the speech signal into another signal, a set of signals, or a set of features, with the objective of simplification and data reduction. The standard model of speech production (a source exciting a vocal-tract filter) is implicit in many analysis methods, including LPC. Most methods operate in the frequency domain, as it offers the most useful parameters for speech processing: human hearing pays much more attention to spectral aspects of speech (e.g., the distribution of amplitude across frequency) than to phase or timing aspects.

2. TRADITIONAL APPROACH

2.1 LPC-BASED CODING

LPC is one of the most common techniques for low-bit-rate speech coding. Its popularity derives from its compact yet precise representation of the speech spectral magnitude, as well as its relatively simple computation. LPC analysis produces a vector of real-valued features that estimate the spectrum of the windowed speech signal. The LPC vector for a signal frame typically consists of about 8-12 spectral coefficients at 5-6 bits per coefficient. The gain level and pitch are coded with 2-4 bits each, and a binary voiced/unvoiced decision is also transmitted. Thus, a 2400 bit/s vocoder might send 60 bits/frame every 25 ms.

2.2 VECTOR QUANTIZATION

Most speech coders transmit time or frequency samples as independent (scalar) parameters, but coding efficiency can be improved by eliminating redundant information within blocks of parameters and transmitting a single index code to represent an entire block. This is Vector Quantization (VQ). During the coding phase, basic analysis yields a vector v of k scalar parameters (features) for each frame.
Then the k-dimensional vector, among a set of M vectors stored in a codebook, that corresponds most closely to v is chosen, and a log2(M)-bit code (the index of the chosen vector) is sent in place of the k scalar parameters. The decoder must hold a codebook identical to the coder's; to synthesize the output speech, it uses the parameters stored under the codebook index corresponding to the received code. The key issues in VQ are the design and search of the codebook. In coders with scalar quantization, coding distortion comes from the finite precision used to represent each parameter. VQ distortion comes instead from synthesizing with a codebook entry whose parameters differ from those determined by the speech analyzer.
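This encode/decode loop can be sketched in a few lines. The codebook here is random and its dimensions are illustrative; a real codebook would be trained on speech data:

```python
import numpy as np

def vq_encode(v, codebook):
    # Full search: return the index of the codebook vector nearest to v
    dists = np.sum((codebook - v) ** 2, axis=1)   # squared Euclidean distances
    return int(np.argmin(dists))

def vq_decode(index, codebook):
    # The decoder simply looks up the same entry in its identical codebook
    return codebook[index]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 10))  # M = 1024 entries, k = 10 (10-bit VQ)
v = rng.standard_normal(10)
i = vq_encode(v, codebook)                  # only log2(1024) = 10 bits are sent
assert np.array_equal(vq_decode(i, codebook), codebook[i])
```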

The size of the codebook, M, should be large enough that each possible input vector corresponds to a codebook entry whose substitution for v yields output speech close to the original. However, efficient search procedures and storage considerations limit M to smaller values. The greater the degree of correlation among the vector elements, the more the bit rate can be lowered. LPC typically sends 50 bits/frame (10 coefficients, 5 bits each) with scalar quantization, but VQ succeeds with about 10 bits: a well-chosen set of 1024 spectra (2^10 for 10-bit VQ) can adequately represent most possible speech sounds.

2.3 SPEECH CODING BASED UPON VECTOR QUANTIZATION

One of the first experimental comparisons between optimized scalar quantization and vector quantization is presented in [2]. In that work, the gain parameter is treated separately from the rest of the information. The signal is segmented into frames of N samples, for each of which one gain parameter is sent along with the N-sample vector. This approach is called gain separation: the gain and spectral codebooks are separate, and each entry is decoded as a scalar gain times a waveform vector. Since the two codebooks can be searched separately, only about L + M entries must be examined instead of the L · M of the joint method, with M spectral codebook entries and L gain possibilities. Furthermore, the gain codebook search is much simpler, since it involves scalars rather than the k-dimensional vectors of the spectral codebook.

Another sub-optimal technique, binary tree search, is used for searching the codebook. In a full codebook search, the vector for each frame is compared with each of the M codebook entries, requiring M distance calculations. A binary tree search replaces the M comparisons with only 2·log2(M) comparisons. The M codebook entries form the lowest nodes (leaves) of the tree, and each higher node is represented by the centroid of all entries below it.
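The tree search can be sketched as follows. For illustration the tree is built by a naive split on index order; a real design would cluster the entries so that each subtree holds similar vectors:

```python
import numpy as np

def build_tree(entries, indices):
    # Binary tree over codebook entries; each internal node stores the
    # centroid of all leaves below it.
    if len(indices) == 1:
        return {"leaf": indices[0]}
    mid = len(indices) // 2
    left, right = indices[:mid], indices[mid:]
    return {
        "centroid_l": entries[left].mean(axis=0),
        "centroid_r": entries[right].mean(axis=0),
        "left": build_tree(entries, left),
        "right": build_tree(entries, right),
    }

def tree_search(v, node):
    # Descend toward the nearer centroid at each level:
    # 2*log2(M) distance computations instead of M (sub-optimal but fast).
    while "leaf" not in node:
        dl = np.sum((v - node["centroid_l"]) ** 2)
        dr = np.sum((v - node["centroid_r"]) ** 2)
        node = node["left"] if dl <= dr else node["right"]
    return node["leaf"]
```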
Note that this approach doubles the memory requirement for the codebook. The benefits of VQ are made quite apparent by the results of [2]. For a 10-bits/frame full-search vector quantizer, the measured distortion is approximately 1.8 dB; the equivalent distortion for a scalar quantizer occurs at approximately 37 bits/frame, a difference of 27 bits/frame, or a 73 percent reduction in bit rate at the same distortion. With binary tree search, the distortion was slightly greater and the improvement about 66 percent.

In [3], VQ is applied to modify a 2400 bit/s LPC vocoder to operate at 800 bit/s while retaining acceptable intelligibility and naturalness. Nothing in the LPC design is changed other than the quantization algorithms for the LPC parameters. One modification is the separation of pitch and voicing, in addition to gain. Pitch and gain are quantized as scalars, with one value per 3 frames. Voiced and unvoiced speech spectra are in most cases very different, so separate codebooks are employed for the two. The tree search is also modified to reduce the distortion of the binary tree approach: a 32-branches/node tree has been found to be a good compromise, requiring only 1/16 the computation of the full search while achieving an average distortion very close to that of the full-search codebook.

Some techniques for reducing the bit rate are:

Frame-repeat: Every other frame is not transmitted; instead, a 1-bit code is sent specifying whether the missing frame should be interpreted as the same as the preceding or the following spectrum. The determination is made based on whichever of the two spectra is closer to the omitted spectrum.

Variable-frame-rate (VFR) transmission: To economize and avoid transients, the LP vectors are often smoothed (parameters interpolated in time) before use in the synthesizer stage.
When the speech signal changes rapidly, LP vectors might be sent every 10 ms, while during steady vowels (or silence) a much lower rate suffices. Data is thus buffered during rapid changes for later transmission during periods of lower speech dynamics. VFR vocoders can often reduce the bit rate significantly without loss of speech quality, at the cost of extra complexity and delay. The decision of when to transmit a frame of data normally depends on a distance measure comparing the last frame sent with the current analysis frame; when the distance exceeds a threshold (indicating a large enough speech change), 1-2 frames are sent.

Gain coding: In a typical LPC vocoder, a 5-bit code is used to quantize the gain. In this method, the average gain for each spectral template in the codebook is stored; then, rather than coding the absolute gain level, only the difference between the input gain and the average gain for the codebook entry is transmitted.

2.4 ACHIEVING VERY LOW RATES

A new method in which input speech is modeled as a sequence of variable-length segments is introduced in [5] and further optimized in [6]. An automatic segmentation algorithm is used to obtain segments with an average duration comparable to that of a phoneme, and each such segment is quantized as a single block. For segmentation, speech is considered a succession of steady states separated by transitions. Spectral time-derivatives are thresholded to determine the middle of transitions, with the threshold chosen so that approximately 11 segments/s are obtained (equal to the expected phoneme rate). The distance between two segments is calculated using an "equi-spaced" sampled representation of the segment (spatial sampling in 14 LPC spectral dimensions): the Euclidean distances between corresponding equi-spaced points on the two segments are summed to arrive at a distance value. Each segment in this approach is thus a 140-dimensional vector (14 spectral values × 10 spatial samples).
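This segment distance can be sketched as follows. Simple nearest-frame resampling stands in here for the paper's spatial sampling, which is more refined:

```python
import numpy as np

def equi_spaced(segment, n_points=10):
    # Resample a (frames x 14) LPC-spectral trajectory at n_points
    # equally spaced positions along its length.
    idx = np.linspace(0, len(segment) - 1, n_points).round().astype(int)
    return segment[idx]              # shape (10, 14): a 140-dimensional vector

def segment_distance(seg_a, seg_b):
    # Sum of Euclidean distances between corresponding sampled points
    a, b = equi_spaced(seg_a), equi_spaced(seg_b)
    return float(np.sum(np.linalg.norm(a - b, axis=1)))
```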
Usually, a clustering algorithm is used to obtain an optimal set of segment templates for the codebook. At the large dimensionality of the segment vocoder, however, the expected quantization error of a properly chosen random quantizer is nearly equal to the distortion-rate bound; therefore a computationally intensive clustering algorithm was not used, and a random set of segments served as the codebook. Approximately 8000 segment templates (13 bits) were used for coding, and a further 8 bits were used for gain, pitch and timing information. The bit rate obtained was thus 231 bits/s at 11 segments/s.

In [6], a further decrease in bit rate was achieved by using a segment network that restricts the number of segment templates that can follow a given template: if the current input segment is quantized to a given template, then only those segment templates that follow it in the network can be used to quantize the following input segment. The network was implemented as follows. Suppose the current input segment is quantized to a given template. The last spectrum of this best segment template determines the subset of templates allowed in quantizing the following input segment: those whose first spectrum is nearest (in Euclidean distance) to the last spectrum of the template just used. A comparison was made with the unconstrained case, using a total of 1024 segment templates (10 bits). With the segment network, the choice of segment templates is restricted to 256, so only 8 bits/segment are needed to code a segment. Almost no difference in quantization error was found between the two approaches.

2.5 PHONETIC VOCODER

Coding speech at very low data rates of about 100 bits/s implies approaching the true information rate of the communication (an inventory of about 60 phones, produced at roughly the 11 phones/second phoneme rate noted above). One approach is to extract detailed linguistic knowledge of the information embedded in the speech signal.
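A back-of-the-envelope check on that figure, combining the ~60-phone inventory with the ~11 phones/second rate used for segmentation (an assumption of this sketch, since the paper does not spell out the arithmetic):

```python
import math

phone_bits = math.ceil(math.log2(60))   # 6 bits to index one of ~60 phones
rate = phone_bits * 11                  # at ~11 phones/second
assert phone_bits == 6 and rate == 66   # 66 bits/s for phone identity alone
# Pitch, gain and duration side-information plausibly accounts for the
# remainder of the ~100 bits/s budget.
```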
In [7], a coding technique based on phonetic speech recognition is explored. The motivation stems from the observation that Hidden Markov Model (HMM) based speech recognition systems (working on LPC features) tend to do an excellent job of modeling the acoustic data. A basic premise of the paper is that the quality of the acoustic match will be good even if the phone recognition accuracy is poor: a phone may be recognized as another phone by the HMM system, but this does not significantly degrade speech quality, because the two phones will be very close acoustically. A large acoustic-phonetic database is used, phonetically transcribed and labeled with a set of 60 phones. A basic phone recognition system is implemented with a null grammar (any phone can follow any phone), and a contextually rich sample of exemplars of each of the 60 phones is clustered. The major way in which the phonetic vocoder distinguishes itself from a vector quantization system is the manner in which spectral information is transmitted: rather than transmitting an index into a VQ codebook for each spectral vector, a phone index is transmitted along with auxiliary information describing the path through the model. Good overall synthesized speech quality was achieved with 8 clusters per phone (i.e., 480 clusters). A simple inventory of phones is thus shown to be sufficient for capturing the bulk of the acoustic information.

3. CONCATENATIVE SYNTHESIS OF WAVEFORMS

Speech coding at medium rates and below is achieved by an analysis-synthesis process. In the analysis stage, speech is represented by a compact set of parameters, which are encoded efficiently; in the synthesis stage, these parameters are decoded and used with a reconstruction mechanism to form speech. In this section, a different method is discussed, in which the original waveform corresponding to the nearest template segment (codebook entry) is used for synthesis.
The primary difference from conventional coders is that no speech-generation framework such as the source-filter model is used. Instead, it is assumed that any speech signal can be reconstructed by concatenating short segment waveforms suitably chosen from a database. Such an approach is reported to give speech with better perceptual quality than LPC synthesis using pulse/noise excitation.

3.1 WAVEFORM SEGMENT VOCODER

The first foray into waveform-based synthesis was made in [8]. Here, the decoding stage works with the waveforms of the nearest templates, not their spectral representations. The pitch, energy and duration of each template are independently modified to match those of the input segment, and the modified segments are then concatenated to produce the output waveform. The paper describes the time-scale and pitch-scale modification algorithms applied to the template waveforms to match the time and pitch characteristics of the input segment. Several sentences from a single male speaker were vocoded using his own templates (a speaker-dependent paradigm). The speech has a choppy quality, presumably due to segment-boundary discontinuities. The authors view this work as a modification of their original LPC vocoder [5]; hence, phoneme-like segments were the basic elements of their waveform approach. These segments undergo considerable modification, which may in fact reduce the naturalness of the waveform.

3.2 CONTEMPORARY TECHNIQUES

The waveform concatenation approach is investigated in some detail in [9]. Here, frames are used instead of segments as the units for selection. One advantage is that, since frame selection does not require a time-warping process, the speech signal is synthesized without the time-scale modification needed in [8].
On the downside, the bit rate of the frame-based approach is higher, because the longer units of segmental coders are what give them their high compression ratios. Mel-frequency cepstrum coefficients (MFCCs) are used as the feature parameters for unit selection. The database comprises about 76 minutes of speech, corresponding to some 460,000 frames of 10 ms each, and is held in two parts: the first contains the MFCCs of the 460,000 frames obtained by the feature-extraction process, and the second contains the speech waveforms used in generating the output waveform. The raw speech from which the MFCCs of the first part are computed is the same as that in the second. In addition to the unit indices, pitch (F0) and gain parameters are also transmitted. The unit index gives the position of the selected unit in the database.

Figure 3.1: Block diagram of the coder

UNIT SELECTION

For unit selection, a novel approach proposed in [10] is used. Units in the synthesis database are treated as a state-transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search selects the best units for synthesis from the database.

CODING THE SELECTED UNIT SEQUENCE

Because consecutive input frames are often matched to consecutive frames in the database, a run-length coding technique is employed to compress the unit sequence: a series of consecutive frames is represented by the start frame and the number of consecutive frames that follow, so a whole run is encoded in only two variables.

CODING PITCH

Accurate coding of the pitch (F0) contour plays an important role in a very low rate coder, since a correct pitch contour increases naturalness. Piecewise linear approximation is used to implement contour-wise F0 coding. This method offers high compression, as only a small number of sampled points need be transmitted instead of every individual sample; of course, the intervals between the sampled points must also be transmitted for proper interpolation. Piecewise linear approximation presumes some degree of smoothness in the function being approximated, so the F0 contour is smoothed before compression.
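One way such a piecewise-linear F0 coder could work is a greedy fit that keeps extending each linear piece until the interpolation error grows too large. This is a sketch under assumed parameters (the 2 Hz tolerance is illustrative), not the specific algorithm of [9]:

```python
import numpy as np

def pwl_code(f0, max_err=2.0):
    # Greedy piecewise-linear approximation of a (smoothed) F0 contour.
    # Returns the indices (intervals) and values of the points to transmit;
    # the decoder reconstructs the contour by linear interpolation.
    knots = [0]
    i = 0
    while i < len(f0) - 1:
        j = i + 1
        while j + 1 < len(f0):
            # Try extending the piece from i to j+1; measure the worst error
            seg = np.interp(np.arange(i, j + 2), [i, j + 1], [f0[i], f0[j + 1]])
            if np.max(np.abs(seg - f0[i:j + 2])) > max_err:
                break
            j += 1
        knots.append(j)
        i = j
    return knots, [float(f0[k]) for k in knots]

# A linear contour collapses to just its two endpoints:
f0 = np.linspace(100.0, 150.0, 20)
knots, vals = pwl_code(f0)
assert knots == [0, 19]
```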
The methods used for finding the locations of the F0 points are discussed in [9].

4. CONCLUSION AND FUTURE SCOPE

Recent advances in computer technology allow a wide variety of applications for speech coding. Transmission can be real-time, as in normal telephone conversations, or off-line, as in storing speech for electronic forwarding of voice messages. In either case, the transmission bit rate (or storage requirement) is crucial in evaluating the practicality of different coding schemes. Low-bit-rate concatenative coders can be very useful when large amounts of pre-recorded speech must be stored. A talking book, the spoken equivalent of its printed version, requires enormous space for storing speech waveforms unless a high-compression coding scheme is applied. Similarly, very low bit rate speech coders have potential applications in a wide variety of multimedia systems, such as language-learning assistance and electronic dictionaries and encyclopedias. Interest in exchanging voice messages across the Internet is also increasing, and such coders could play a large role in saving bandwidth: where two persons (or a small set of persons) frequently exchange voice messages, concatenative synthesis could be employed.

REFERENCES

1. Douglas O'Shaughnessy. Speech Communications: Human and Machine. Universities Press.
2. A. Buzo, A. H. Gray, Jr., R. M. Gray, and J. D. Markel. Speech coding based upon vector quantization. IEEE International Conference on Acoustics, Speech and Signal Processing.
3. D. Y. Wong, B. H. Juang, and A. H. Gray, Jr. Recent developments in vector quantization for speech processing. IEEE International Conference on Acoustics, Speech and Signal Processing.
4. Richard M. Schwartz and Salim E. Roucos. A comparison of methods for b/s vocoders. IEEE International Conference on Acoustics, Speech and Signal Processing.
5. S. Roucos, R. Schwartz, and J. Makhoul. Segment quantization for very-low-rate speech coding. IEEE International Conference on Acoustics, Speech and Signal Processing.
6. S. Roucos, R. M. Schwartz, and J. Makhoul. A segment vocoder at 150 b/s. IEEE International Conference on Acoustics, Speech and Signal Processing.
7. Joseph Picone and George R. Doddington. A phonetic vocoder. IEEE International Conference on Acoustics, Speech and Signal Processing.
8. Salim Roucos and Alexander M. Wilgus. The waveform segment vocoder: A new approach for very-low-rate speech coding. IEEE International Conference on Acoustics, Speech and Signal Processing, 1985.
9. Ki-Seung Lee and Richard V. Cox. A very low bit rate speech coder based on a recognition/synthesis paradigm. IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 5, 2001.
10. Andrew J. Hunt and Alan W. Black. Unit selection in a concatenative speech synthesis system using a large speech database. IEEE International Conference on Acoustics, Speech and Signal Processing.
11. Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, Vol. 77, No. 2, 1989.


More information

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

A 600 BPS MELP VOCODER FOR USE ON HF CHANNELS

A 600 BPS MELP VOCODER FOR USE ON HF CHANNELS A 600 BPS MELP VOCODER FOR USE ON HF CHANNELS Mark W. Chamberlain Harris Corporation, RF Communications Division 1680 University Avenue Rochester, New York 14610 ABSTRACT The U.S. government has developed

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 ECE 556 BASICS OF DIGITAL SPEECH PROCESSING Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 Analog Sound to Digital Sound Characteristics of Sound Amplitude Wavelength (w) Frequency ( ) Timbre

More information

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

EE 225D LECTURE ON MEDIUM AND HIGH RATE CODING. University of California Berkeley

EE 225D LECTURE ON MEDIUM AND HIGH RATE CODING. University of California Berkeley University of California Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences Professors : N.Morgan / B.Gold EE225D Spring,1999 Medium & High Rate Coding Lecture 26

More information

L19: Prosodic modification of speech

L19: Prosodic modification of speech L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture

More information

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2

Signal Processing for Speech Applications - Part 2-1. Signal Processing For Speech Applications - Part 2 Signal Processing for Speech Applications - Part 2-1 Signal Processing For Speech Applications - Part 2 May 14, 2013 Signal Processing for Speech Applications - Part 2-2 References Huang et al., Chapter

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations

More information

HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM

HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM HIGH QUALITY AUDIO CODING AT LOW BIT RATE USING WAVELET AND WAVELET PACKET TRANSFORM DR. D.C. DHUBKARYA AND SONAM DUBEY 2 Email at: sonamdubey2000@gmail.com, Electronic and communication department Bundelkhand

More information

Low Bit Rate Speech Coding

Low Bit Rate Speech Coding Low Bit Rate Speech Coding Jaspreet Singh 1, Mayank Kumar 2 1 Asst. Prof.ECE, RIMT Bareilly, 2 Asst. Prof.ECE, RIMT Bareilly ABSTRACT Despite enormous advances in digital communication, the voice is still

More information

Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP

Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP Speech Coding Technique And Analysis Of Speech Codec Using CS-ACELP Monika S.Yadav Vidarbha Institute of Technology Rashtrasant Tukdoji Maharaj Nagpur University, Nagpur, India monika.yadav@rediffmail.com

More information

EEE 309 Communication Theory

EEE 309 Communication Theory EEE 309 Communication Theory Semester: January 2016 Dr. Md. Farhad Hossain Associate Professor Department of EEE, BUET Email: mfarhadhossain@eee.buet.ac.bd Office: ECE 331, ECE Building Part 05 Pulse Code

More information

Comparison of CELP speech coder with a wavelet method

Comparison of CELP speech coder with a wavelet method University of Kentucky UKnowledge University of Kentucky Master's Theses Graduate School 2006 Comparison of CELP speech coder with a wavelet method Sriram Nagaswamy University of Kentucky, sriramn@gmail.com

More information

Applications of Music Processing

Applications of Music Processing Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

ARTIFICIAL BANDWIDTH EXTENSION OF NARROW-BAND SPEECH SIGNALS VIA HIGH-BAND ENERGY ESTIMATION

ARTIFICIAL BANDWIDTH EXTENSION OF NARROW-BAND SPEECH SIGNALS VIA HIGH-BAND ENERGY ESTIMATION ARTIFICIAL BANDWIDTH EXTENSION OF NARROW-BAND SPEECH SIGNALS VIA HIGH-BAND ENERGY ESTIMATION Tenkasi Ramabadran and Mark Jasiuk Motorola Labs, Motorola Inc., 1301 East Algonquin Road, Schaumburg, IL 60196,

More information

Simulation of Conjugate Structure Algebraic Code Excited Linear Prediction Speech Coder

Simulation of Conjugate Structure Algebraic Code Excited Linear Prediction Speech Coder COMPUSOFT, An international journal of advanced computer technology, 3 (3), March-204 (Volume-III, Issue-III) ISSN:2320-0790 Simulation of Conjugate Structure Algebraic Code Excited Linear Prediction Speech

More information

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey Workshop on Spoken Language Processing - 2003, TIFR, Mumbai, India, January 9-11, 2003 149 IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES P. K. Lehana and P. C. Pandey Department of Electrical

More information

TE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION

TE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION TE 302 DISCRETE SIGNALS AND SYSTEMS Study on the behavior and processing of information bearing functions as they are currently used in human communication and the systems involved. Chapter 1: INTRODUCTION

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Cellular systems & GSM Wireless Systems, a.a. 2014/2015

Cellular systems & GSM Wireless Systems, a.a. 2014/2015 Cellular systems & GSM Wireless Systems, a.a. 2014/2015 Un. of Rome La Sapienza Chiara Petrioli Department of Computer Science University of Rome Sapienza Italy 2 Voice Coding 3 Speech signals Voice coding:

More information

Pulse Code Modulation

Pulse Code Modulation Pulse Code Modulation EE 44 Spring Semester Lecture 9 Analog signal Pulse Amplitude Modulation Pulse Width Modulation Pulse Position Modulation Pulse Code Modulation (3-bit coding) 1 Advantages of Digital

More information

Hungarian Speech Synthesis Using a Phase Exact HNM Approach

Hungarian Speech Synthesis Using a Phase Exact HNM Approach Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University

More information

Robust Linear Prediction Analysis for Low Bit-Rate Speech Coding

Robust Linear Prediction Analysis for Low Bit-Rate Speech Coding Robust Linear Prediction Analysis for Low Bit-Rate Speech Coding Nanda Prasetiyo Koestoer B. Eng (Hon) (1998) School of Microelectronic Engineering Faculty of Engineering and Information Technology Griffith

More information

Chapter 8. Representing Multimedia Digitally

Chapter 8. Representing Multimedia Digitally Chapter 8 Representing Multimedia Digitally Learning Objectives Explain how RGB color is represented in bytes Explain the difference between bits and binary numbers Change an RGB color by binary addition

More information

Block diagram of proposed general approach to automatic reduction of speech wave to lowinformation-rate signals.

Block diagram of proposed general approach to automatic reduction of speech wave to lowinformation-rate signals. XIV. SPEECH COMMUNICATION Prof. M. Halle G. W. Hughes J. M. Heinz Prof. K. N. Stevens Jane B. Arnold C. I. Malme Dr. T. T. Sandel P. T. Brady F. Poza C. G. Bell O. Fujimura G. Rosen A. AUTOMATIC RESOLUTION

More information

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment

Performance analysis of voice activity detection algorithm for robust speech recognition system under different noisy environment BABU et al: VOICE ACTIVITY DETECTION ALGORITHM FOR ROBUST SPEECH RECOGNITION SYSTEM Journal of Scientific & Industrial Research Vol. 69, July 2010, pp. 515-522 515 Performance analysis of voice activity

More information

Pulse Code Modulation

Pulse Code Modulation Pulse Code Modulation Modulation is the process of varying one or more parameters of a carrier signal in accordance with the instantaneous values of the message signal. The message signal is the signal

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics

More information

Isolated Digit Recognition Using MFCC AND DTW

Isolated Digit Recognition Using MFCC AND DTW MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

Distributed Speech Recognition Standardization Activity

Distributed Speech Recognition Standardization Activity Distributed Speech Recognition Standardization Activity Alex Sorin, Ron Hoory, Dan Chazan Telecom and Media Systems Group June 30, 2003 IBM Research Lab in Haifa Advanced Speech Enabled Services ASR App

More information

Audio Similarity. Mark Zadel MUMT 611 March 8, Audio Similarity p.1/23

Audio Similarity. Mark Zadel MUMT 611 March 8, Audio Similarity p.1/23 Audio Similarity Mark Zadel MUMT 611 March 8, 2004 Audio Similarity p.1/23 Overview MFCCs Foote Content-Based Retrieval of Music and Audio (1997) Logan, Salomon A Music Similarity Function Based On Signal

More information

Adaptive Forward-Backward Quantizer for Low Bit Rate. High Quality Speech Coding. University of Missouri-Columbia. Columbia, MO 65211

Adaptive Forward-Backward Quantizer for Low Bit Rate. High Quality Speech Coding. University of Missouri-Columbia. Columbia, MO 65211 Adaptive Forward-Backward Quantizer for Low Bit Rate High Quality Speech Coding Jozsef Vass Yunxin Zhao y Xinhua Zhuang Department of Computer Engineering & Computer Science University of Missouri-Columbia

More information

Transcoding of Narrowband to Wideband Speech

Transcoding of Narrowband to Wideband Speech University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2005 Transcoding of Narrowband to Wideband Speech Christian H. Ritz University

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

Using RASTA in task independent TANDEM feature extraction

Using RASTA in task independent TANDEM feature extraction R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t

More information

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.

Reading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday. L105/205 Phonetics Scarborough Handout 7 10/18/05 Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday Spectral Analysis 1. There are

More information

CS 188: Artificial Intelligence Spring Speech in an Hour

CS 188: Artificial Intelligence Spring Speech in an Hour CS 188: Artificial Intelligence Spring 2006 Lecture 19: Speech Recognition 3/23/2006 Dan Klein UC Berkeley Many slides from Dan Jurafsky Speech in an Hour Speech input is an acoustic wave form s p ee ch

More information

DEPARTMENT OF DEFENSE TELECOMMUNICATIONS SYSTEMS STANDARD

DEPARTMENT OF DEFENSE TELECOMMUNICATIONS SYSTEMS STANDARD NOT MEASUREMENT SENSITIVE 20 December 1999 DEPARTMENT OF DEFENSE TELECOMMUNICATIONS SYSTEMS STANDARD ANALOG-TO-DIGITAL CONVERSION OF VOICE BY 2,400 BIT/SECOND MIXED EXCITATION LINEAR PREDICTION (MELP)

More information

Audio Compression using the MLT and SPIHT

Audio Compression using the MLT and SPIHT Audio Compression using the MLT and SPIHT Mohammed Raad, Alfred Mertins and Ian Burnett School of Electrical, Computer and Telecommunications Engineering University Of Wollongong Northfields Ave Wollongong

More information

General outline of HF digital radiotelephone systems

General outline of HF digital radiotelephone systems Rec. ITU-R F.111-1 1 RECOMMENDATION ITU-R F.111-1* DIGITIZED SPEECH TRANSMISSIONS FOR SYSTEMS OPERATING BELOW ABOUT 30 MHz (Question ITU-R 164/9) Rec. ITU-R F.111-1 (1994-1995) The ITU Radiocommunication

More information

EEE 309 Communication Theory

EEE 309 Communication Theory EEE 309 Communication Theory Semester: January 2017 Dr. Md. Farhad Hossain Associate Professor Department of EEE, BUET Email: mfarhadhossain@eee.buet.ac.bd Office: ECE 331, ECE Building Types of Modulation

More information

Automatic Transcription of Monophonic Audio to MIDI

Automatic Transcription of Monophonic Audio to MIDI Automatic Transcription of Monophonic Audio to MIDI Jiří Vass 1 and Hadas Ofir 2 1 Czech Technical University in Prague, Faculty of Electrical Engineering Department of Measurement vassj@fel.cvut.cz 2

More information

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,

More information

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University

Rhythmic Similarity -- a quick paper review. Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Rhythmic Similarity -- a quick paper review Presented by: Shi Yong March 15, 2007 Music Technology, McGill University Contents Introduction Three examples J. Foote 2001, 2002 J. Paulus 2002 S. Dixon 2004

More information

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) Proceedings of the 2 nd International Conference on Current Trends in Engineering and Management ICCTEM -214 ISSN

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Department of Electronics and Communication Engineering 1

Department of Electronics and Communication Engineering 1 UNIT I SAMPLING AND QUANTIZATION Pulse Modulation 1. Explain in detail the generation of PWM and PPM signals (16) (M/J 2011) 2. Explain in detail the concept of PWM and PAM (16) (N/D 2012) 3. What is the

More information

Objective Evaluation of Edge Blur and Ringing Artefacts: Application to JPEG and JPEG 2000 Image Codecs

Objective Evaluation of Edge Blur and Ringing Artefacts: Application to JPEG and JPEG 2000 Image Codecs Objective Evaluation of Edge Blur and Artefacts: Application to JPEG and JPEG 2 Image Codecs G. A. D. Punchihewa, D. G. Bailey, and R. M. Hodgson Institute of Information Sciences and Technology, Massey

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

3GPP TS V8.0.0 ( )

3GPP TS V8.0.0 ( ) TS 46.022 V8.0.0 (2008-12) Technical Specification 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; Half rate speech; Comfort noise aspects for the half rate

More information

Speech Signal Analysis

Speech Signal Analysis Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for

More information

Pitch Period of Speech Signals Preface, Determination and Transformation

Pitch Period of Speech Signals Preface, Determination and Transformation Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com

More information

Synthesis Algorithms and Validation

Synthesis Algorithms and Validation Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Analysis/synthesis coding

Analysis/synthesis coding TSBK06 speech coding p.1/32 Analysis/synthesis coding Many speech coders are based on a principle called analysis/synthesis coding. Instead of coding a waveform, as is normally done in general audio coders

More information

The Channel Vocoder (analyzer):

The Channel Vocoder (analyzer): Vocoders 1 The Channel Vocoder (analyzer): The channel vocoder employs a bank of bandpass filters, Each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear phase FIR filter are used.

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information