A Simple Hardware Pitch Extractor*


ENGINEERING REPORTS

BERNARD A. HUTCHINS, JR., AND WALTER H. KU

Cornell University, School of Electrical Engineering, Ithaca, NY 14853, USA

The need exists for a simple, hardware, real-time pitch extractor for speech research, training, and similar applications requiring less than perfect accuracy. A simple combination of a band-pass filter and a fast amplitude detector formed from a tapped analog delay line provides useful and reliable pitch contours from a live speech input.

0 INTRODUCTION

While a list of publications on pitch extraction would run to many pages, most are concerned with computer software, and only a few of these address real-time operation [1], [2]. Still fewer consider hardware in any way [3]-[6], and those involving relatively simple hardware are quite rare [7]-[9]. Such simple, hardware, real-time pitch extractors are useful in a number of applications, such as linguistic studies, language training, and speech therapy aids for the deaf. Depending on the accuracy of the extractor and the range of inputs to be processed, simple devices may also suffice for speech encoding systems and for electronic music applications.

The pitch extractor, or pitch-to-voltage (p/v) converter, considered here is a device which takes ordinary speech as an input and gives a voltage contour proportional to the pitch as the output. This contour is available for display on an oscilloscope, for storage, or for transmission as needed.

As always, the distinction between pitch and frequency must be made, and this is probably easiest to understand if we consider pitch extraction in basic terms, appropriate to the actual device to be discussed. The pitched (or voiced) speech signal is produced in the throat by pulses of air passing through the vocal cords, and this excitation is filtered (convolved) by the vocal tract.
The result is that the actual speech signal is a mixture of excitation features and resonant features. The waveform actually evolves on a cycle-to-cycle basis, and it is virtually assured that there will be no continuously available single prominent peak in the waveform corresponding to the excitation. Thus a simple pitch extraction scheme is to preprocess the speech signal so that features corresponding to excitation are enhanced, while features resulting from resonance are reduced. Once the peak is isolated, its point of occurrence can be identified, and its repetition frequency can be reported with a much simpler frequency-to-voltage (f/v) converter.

* Manuscript received 1981 July 6.
J. Audio Eng. Soc., Vol. 30, No. 3, 1982 March. © 1982 Audio Engineering Society, Inc.

1 THEORY OF OPERATION

1.1 Peak Amplitude Detector

In the pitch extractor or p/v converter described here, the isolation of a single strong feature is achieved in two steps. First, a simple band-pass filter, chosen experimentally, serves to strengthen the fundamental frequency in a region where resonances (formants) tend to emphasize upper harmonics. The second element in the process is a very rapid peak amplitude detector formed from a tapped delay line, which serves to detect the height of the maximum peaks as well as their point of occurrence.

The need for some sort of peak amplitude detector can be understood since it is necessary to know the maximum amplitude in order to detect the moment at which this maximum is achieved. Traditional amplitude detectors such as full-wave rectifiers and peak detectors are often unsatisfactory in this application because they require some sort of averaging time to determine the amplitude level, and the actual parameters of speech can change significantly over the few pitch cycles it takes to get this average. The parallel of this situation with the energy-time uncertainty relationships of physics becomes apparent, and it is indeed interesting that human speech has evolved to approach this limit in a significant way.

The ideal peak amplitude detector would be a device that looked at exactly one cycle of a speech waveform (one pitch period) and searched this waveform instantaneously for a maximum. This is analogous to determining the amplitude of a waveform by scanning it by eye as it is displayed on the face of an oscilloscope. In a practical case, sampled analog values of a speech waveform can be stored, and then the maximum of these stored values can be determined. In the present pitch extractor a fixed-delay tapped analog delay line is used to make available samples of a preprocessed speech waveform in this manner. Fig. 1 illustrates the basic idea, where the preprocessed speech is first half-wave rectified, and where parallel diodes then select the maximum of the stored values. The delay time is set so that even at the lowest expected pitch, a full cycle will be represented on the delay line. While this means that there are several cycles on the line in the upper pitch regions (as in Fig. 1), this is not a serious problem because significantly higher pitches tend to have simpler waveforms, being above one or more of the formant frequencies, and vary less on a cycle-to-cycle basis.

1.2 Block Diagram

Fig. 2 shows a block diagram of the pitch extractor. The topmost portion consists of four preprocessing sections. First there is a set of input amplifiers to accept various input signal levels. Second there is a gain adjustment section which works to equalize middle amplitudes, reducing amplitudes that are very low or very high.
Third there is a filter, which is the single most important part of the whole extractor. It is a voltage-controlled band-pass filter, but is generally used as a fixed band-pass filter (typically with a center frequency of 130 Hz and a Q of 3 for a male talker). The fourth element is a half-wave rectifier, which is used here mainly to take better advantage of the available dynamic range of the delay line that follows.

The tapped delay line in the middle of the diagram is the peak amplitude detector as described briefly above. A comparator below the delay line serves to detect the actual occurrence of a peak feature to within a threshold voltage. The bottom of the diagram shows the f/v converter and the output control logic, which makes a voiced/unvoiced decision and outputs the pitch for a voiced input.

[Fig. 1. Fast amplitude extractor based on tapped delay line.]
[Fig. 2. Block diagram of pitch extractor.]

2 CIRCUITRY OF PITCH EXTRACTOR

2.1 Preprocessing Unit

Fig. 3 shows the circuitry of the preprocessing unit. The input amplifiers at the top and the half-wave rectifier at the bottom are well-known circuits and need no further discussion. The filter is a standard transconductance-controlled state-variable filter [10] used as a band-pass with peak response independent of Q. Voltage control is used here mainly as a way of avoiding the use of a dual pot, but also because some users may wish to experiment with feedback arrangements to change the filter frequency slightly when an initial pitch readout is obtained.

The automatic gain adjustment unit is neither a true automatic gain control nor a compressor. The actual output amplitude versus input amplitude curve for the unit is shown in Fig. 4, and it can be seen that it is the middle amplitudes that are favored, helping the overall unit to ignore low background noise and to avoid clipping for high levels.

[Fig. 3. Preprocessors of pitch extractor.]
[Fig. 4. Response of automatic gain adjustment section.]

2.2 Peak Amplitude Detector Circuitry

The circuitry of the peak amplitude detector is shown in Fig. 5. A simple two-phase clock for the delay line is formed around timer IC-17 and flip-flop IC-18, and these clock the tapped delay line IC-19. IC-56 is the comparator (here actually a Schmitt trigger) indicated in Fig. 2. The remainder of the circuitry, IC-20 through IC-55, serves to accommodate the needs of the delay line, and could perhaps be simplified if a more suitable charge-transfer device for this application becomes available.

Any actual signal that is to be carried on the TAD-32 delay line must enter through pin 21 and be within a relatively small range of voltage, typically about +6 V with an ac excursion of about ±1 V allowed, with any excessive ac voltage being distorted or clipped. IC-4 serves to adapt the input signal to this requirement, attenuating it (input gain trim) and adjusting the dc level (input dc trim). Since the input signal here is unipolar (coming from a half-wave rectifier), it would be a waste of available dynamic range if dc zero were set at the center of the TAD-32 range. Therefore the input dc zero is set at the bottom of the range. Consequently the half-wave rectifier is redundant, since negative signals would be clipped off in any case, but it seems good practice to include it to avoid all possible problems.

The signal, thus properly applied to the TAD-32, appears at each of the taps with approximately the same dc and ac levels, roughly the same as the input ac and dc levels at pin 21. To properly complete the design, each of these taps must be amplified, level shifted, and buffered by its own operational amplifier, of which IC-21 is typical. The tap numbers and pin numbers are given in Table 1.
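The peak-selection principle of Section 1.1 (stored samples on a tapped line, the largest tap winning through the parallel diodes, and a comparator watching one tap against a fraction of that peak) can be tried out in software. The following is only a minimal sketch under stated assumptions, not the authors' circuit: the function name `tapped_delay_peak_detector`, the 3200-Hz sample rate, the choice of the middle tap for comparison, and the 200-Hz test tone are all hypothetical choices made for illustration.

```python
import math
from collections import deque


def tapped_delay_peak_detector(samples, n_taps=32, threshold=0.8, compare_tap=None):
    """Software stand-in (assumed, illustrative) for a tapped-delay-line peak detector.

    Returns (peaks, pulses): the running peak amplitude and the sample
    indices at which the comparator output goes high.
    """
    if compare_tap is None:
        compare_tap = n_taps // 2              # assumption: watch the middle tap
    line = deque([0.0] * n_taps, maxlen=n_taps)  # line[0] is the newest sample
    peaks, pulses = [], []
    was_high = False
    for i, x in enumerate(samples):
        line.appendleft(max(x, 0.0))           # half-wave rectify, then shift in
        peak = max(line)                       # diode selection: largest tap wins
        peaks.append(peak)
        # Comparator: chosen tap versus a fraction of the current peak.
        is_high = peak > 0.0 and line[compare_tap] >= threshold * peak
        if is_high and not was_high:
            pulses.append(i)                   # one rising edge per detected peak
        was_high = is_high
    return peaks, pulses


# Demonstration on a pure 200-Hz tone sampled at an assumed 3200 Hz.
fs = 3200.0
tone = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(320)]
_, pulse_idx = tapped_delay_peak_detector(tone)
periods = [(b - a) / fs for a, b in zip(pulse_idx, pulse_idx[1:])]
print([round(1.0 / p) for p in periods[:3]])
```

For this test tone the comparator pulses are spaced 16 samples apart, one per cycle, even though two full cycles sit on the simulated line at once. Real speech would of course need the preprocessing stages described above before such a detector behaves this cleanly.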
It is fairly important that all the taps be equally dc trimmed. This is difficult to do directly because of thermal drift of the dc level on the TAD-32 itself. While this drift is significant, the relative drift between taps is very small. In this application, once the taps are all set, any modest absolute drift is of little importance. The problem is thus one of holding the line steady while relative trimming is done. One approach is to trim the first tap to zero and then trim all the other taps relative to the first one by measuring the voltage difference between the taps with a digital multimeter. Another approach is to set the first tap near zero, and connect it to the (-) input of an extra operational amplifier, with the (+) input grounded and the output fed back to the input of the delay line. This extra operational amplifier makes the first tap a virtual ground, and the other taps can then be trimmed to zero dc, after which the extra operational amplifier can be removed. Dc trimming is done through resistors of which R is typical. If done with trim pots connected between +15 and -15 V, R should be about 1.5 MΩ. It may be nearly as easy, and much less expensive, to calculate a value of R that will zero the dc, and then solder the free end to the +15-V or -15-V supply as needed.

Table 1. TAD-32 tap pinout.

TAP   1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31
PIN  23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38

TAP   2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32
PIN  18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3

[Fig. 5. Peak amplitude detector.]

With these adjustments finished, IC-19 through IC-52 can be considered a good approximation to an ideal tapped delay line for the purposes of this application. The delay line then consists of 32 samples of the input waveform. Whichever of these samples (taps) is the largest will pass through its associated diode and back bias all the other diodes. This voltage represents the peak amplitude and is buffered by IC-53, and adjusted slightly by IC-54 and IC-55. A fraction of this voltage is set by threshold control R_th, and any tap can be chosen to be compared with this threshold voltage by IC-56. Thus the output of IC-56 changes ideally once per pitch period, and this is fed to the f/v converter. An amplitude readout is available from IC-55 if desired.

3 OUTPUT SECTION

3.1 Frequency-to-Voltage Converter

The frequency-to-voltage converter is shown in Fig. 6, with the upper portion being a period-to-voltage converter and the lower portion an analog divider. The period-to-voltage converter works by sampling, holding, and resetting a ramp voltage each time a pulse comes out of the peak amplitude detector. The frequency-to-voltage converter is described in more detail elsewhere [11] and has no measurable error in the region from 70 Hz through 500 Hz.

[Fig. 6. Frequency-to-voltage converter.]

3.2 Voiced/Unvoiced Logic Control

The output control section is shown in Fig. 7. It is essentially an analog switch (IC-70) which passes the output of the frequency-to-voltage converter when a voiced decision is arrived at. The voiced/unvoiced logic is as follows. First, a signal is determined to be unvoiced if the energy passing through a 4-kHz high-pass filter (IC-67) exceeds a certain threshold (as with fricative wide-band noise, for example). Second, a signal is "possibly voiced" if the ramp voltage of the frequency-to-voltage converter remains below a +5-V level (as for periodic triggering, or for random triggering due to transients). The logic voiced condition is then an AND function (IC-71) of a "not unvoiced" (IC-68) and a "possibly voiced" (IC-69), with these logic inputs defined as above.

Other voiced/unvoiced logic schemes can of course be considered [12] and will work. In fact, there seems to be a degree of latitude in this decision, since linguistic researchers are not wholly comfortable with a voiced/unvoiced dichotomy in the first place (preferring that a transition phase between the two be included), and in the second place, they soon learn to mentally discard portions of the readout that they consider to be in error. For speech encoding, small errors in voiced/unvoiced determination have a larger effect on the naturalness of the speech than they do on its intelligibility.

[Fig. 7. Voiced/unvoiced logic output and control.]

4 OPERATION AND RESULTS

As with any new and experimental device, there is often a tendency to make as many parameters user controllable as possible, with the result that there are so many panel controls that the device is confusing. In the present extractor the indicated switches should be panel features, along with the "audio gain," "filter frequency," and "filter Q" controls from Fig. 3, the "delay time" and "peak amplitude threshold" controls from Fig. 5, and the "unvoiced threshold" control from Fig. 7. A certain amount of calibration and experimentation will be required. Some suggested initial settings of controls are given in Table 2.
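Looking back at the output section, the chain of period-to-voltage conversion, analog division, and voiced/unvoiced gating (Figs. 6 and 7) is easy to mimic numerically when choosing settings. The sketch below is an assumption-laden illustration, not the circuit of [11]: the function names, the ramp rate of 350 V/s (chosen here so that the +5-V limit is reached at a 70-Hz period), and the idea of supplying one high-pass energy reading per pitch period are all hypothetical.

```python
def period_to_voltage(pulse_times, ramp_rate=350.0):
    """Sample, hold, and reset a linear ramp at each detector pulse.

    pulse_times are in seconds; ramp_rate (V/s) is an assumed value.
    Returns one held ramp voltage per pitch period.
    """
    return [ramp_rate * (t1 - t0) for t0, t1 in zip(pulse_times, pulse_times[1:])]


def divide(ramp_voltage, ramp_rate=350.0):
    """Analog-divider stage: frequency is the reciprocal of the period."""
    return ramp_rate / ramp_voltage


def pitch_output(pulse_times, hf_energy, unvoiced_threshold,
                 ramp_rate=350.0, ramp_limit=5.0):
    """Gate the divider output with the voiced/unvoiced logic.

    hf_energy holds one (assumed) high-pass energy reading per period.
    Returns a frequency in Hz for voiced periods and None otherwise.
    """
    out = []
    for v, e in zip(period_to_voltage(pulse_times, ramp_rate), hf_energy):
        not_unvoiced = e <= unvoiced_threshold   # little high-frequency noise
        possibly_voiced = 0.0 < v < ramp_limit   # ramp stayed below the +5-V level
        voiced = not_unvoiced and possibly_voiced  # the AND function
        out.append(divide(v, ramp_rate) if voiced else None)
    return out


# Two clean 5-ms periods, one noisy period, and one long gap between pulses.
times = [0.0, 0.005, 0.010, 0.110, 0.115]
energy = [0.1, 0.9, 0.1, 0.1]
print(pitch_output(times, energy, unvoiced_threshold=0.5))
```

In this toy run the noisy period and the long gap are both suppressed, while the clean 5-ms periods come out near 200 Hz. A hardware ramp would saturate rather than grow without bound, which this sketch does not model.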

Table 2. Suggested initial settings.

Parameter                  Male Talker      Female Talker
Polarity                   No preference    No preference
Automatic gain             Auto             Auto
Audio gain                 Medium           Medium
Filter frequency           130 Hz           150 Hz
Filter Q                   3                1.5
Delay line time            Maximum          3/4 maximum
Peak amplitude threshold   80%              80%
Unvoiced threshold         2/3 maximum      1/3 maximum

[Fig. 8. Example pitch output curves.]

A sample pitch contour readout is shown in Fig. 8. In general, fairly good results are obtained for a given user after about 5 min of parameter adjustment. Nearly all users will have to make at least a few adjustments. If the very best results are desired for a given word or phrase, it is best to record it and play it into the extractor using a tape loop. Then the exact effects of the various controls can be studied under controlled conditions, and the optimum settings can be determined.

5 ACKNOWLEDGMENT

The authors wish to express their thanks to a number of persons at Cornell University who have aided in this project. These include S. Zwolinski and T. Nolan, who worked in the School of Electrical Engineering, and J. Grimes, S. Hertz, D. Walter, and J. Gale of the Department of Linguistics and Modern Languages. This work was supported by Rome Air Development Center (RADC), Deputy for Electronic Technology, Hanscom Air Force Base, MA, under Contract F49620-77-C-0069 (Dr. William Ewing, Technical Monitor).

6 REFERENCES

[1] J. N. Maksym, "Real-Time Pitch Extraction by Adaptive Prediction of the Speech Waveform," IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 149-154 (1973 June).
[2] S. Seneff, "Real-Time Harmonic Pitch Extractor," IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-26, pp. 358-365 (1978 Aug.).
[3] M. M. Sondhi, "New Methods of Pitch Extraction," IEEE Trans. Audio Electroacoust., vol. AU-16, pp. 262-266 (1968 June).
[4] J. J. Dubnowski, R. W. Schafer, and L. R. Rabiner, "Real-Time Digital Hardware Pitch Extractor," IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-24, pp. 2-8 (1976 Feb.).
[5] R. O. Hamm, "Fast Pitch Detection," presented at the 58th Convention of the Audio Engineering Society, New York, 1977 Nov. 4-7, preprint no. 1265.
[6] W. H. Tucker and R. H. T. Bates, "A Pitch Estimation Algorithm for Speech and Music," IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-26, pp. 597-604 (1978 Dec.).
[7] B. A. Hutchins, "Pitch Extraction, Part 3: The Complete Experimental Device," Electronotes, vol. 7, pp. 3-11 (1975 July).
[8] I. Fritz, "Simple Pitch Extractor for Clarinet," Electronotes, vol. 9, pp. 3-7 (1977 Sept.).
[9] D. Wills, "Pitch Extractor for Guitar and Microphone," Electronotes, vol. 10, pp. 15-2 (1978 Apr.).
[10] B. A. Hutchins, Musical Engineer's Handbook (Electronotes, 1975), chap. 5d.
[11] B. A. Hutchins, "A Frequency-to-Voltage Converter," Electronotes Application Note 114, 1978 Dec.
[12] S. G. Knorr, "Reliable Voiced/Unvoiced Decision," IEEE Trans. Acoust., Speech, Signal Proc., vol. ASSP-27, pp. 263-267 (1979 June).

The biographies of Messrs. Hutchins and Ku were published in the Jan./Feb. issue.