Digitized signals. Notes on the perils of low sample resolution and inappropriate sampling rates.




Georgiana Nash


Analog to Digital Conversion

Sampling an analog waveform: a sample is a measurement of waveform amplitude at a specific instant in time, and samples should be evenly spaced in time. Topics: effects of sampling rate; effects of sample resolution.

Digitized waveforms are discrete approximations to continuous functions of time. They are described in terms of their sampling rate in hertz (Hz) and their sample resolution in bits per sample. Although Hz normally means cycles per second, it is generalized to samples per second when describing a sampling rate. Samples are uniformly spaced in time, and each sample is a measurement of the amplitude of an analog waveform at an instant in time. The sampling rate specifies how many measurements of the analog signal amplitude are made per second by the computer's Analog to Digital Converter (ADC) hardware. Similarly, when digital waveforms are converted to analog signals by a Digital to Analog Converter (DAC), the sampling rate is how many times per second the DAC hardware must be updated with a new sample value. Sampling rate has a significant impact on the range of frequencies that can be represented in a digitized waveform.

Sample resolution specifies the accuracy of each amplitude measurement made by the ADC hardware: the more bits of resolution, the greater the accuracy. Using a large number of bits per sample allows the full amplitude range of an ADC to be partitioned into a large number of very small steps, and this in turn determines the effective signal to noise ratio of a digitized waveform.
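The two parameters above can be illustrated with a minimal Python sketch that samples an ideal sine wave at evenly spaced instants and rounds each measurement to a signed integer code. It assumes an ADC input range of -1 to +1 and round-to-nearest quantization; `digitize` and its parameters are illustrative names, not part of any program used in this class.

```python
import math

def digitize(freq_hz, sample_rate_hz, duration_s, bits):
    """Sample a unit-amplitude sine wave at evenly spaced instants and
    quantize each sample to a signed integer code of the given width.

    Assumes the ADC input range is -1..+1 and that each measurement is
    rounded to the nearest code, clipped to the representable range.
    """
    max_code = 2 ** (bits - 1) - 1     # +32,767 for 16 bits
    min_code = -(2 ** (bits - 1))      # -32,768 for 16 bits
    n_samples = int(round(duration_s * sample_rate_hz))
    times, codes = [], []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # uniform spacing of 1/fs seconds
        x = math.sin(2 * math.pi * freq_hz * t)     # analog amplitude at instant t
        code = round(x * max_code)                  # nearest integer step
        codes.append(max(min_code, min(max_code, code)))
        times.append(t)
    return times, codes

# 10 ms of a 100 Hz tone at 8,000 Hz with 16-bit samples.
times, codes = digitize(freq_hz=100, sample_rate_hz=8000, duration_s=0.01, bits=16)
print(len(codes))            # 80
print(codes[20], codes[60])  # 32767 -32767 (the positive and negative peaks)
```

Raising `sample_rate_hz` puts more samples in each cycle; lowering `bits` makes the steps between representable codes coarser.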

Digitized Waveform

In this figure, the X axis is time in msec and the Y axis is amplitude in arbitrary units. The figure shows an analog sine wave (solid line) sampled (circles) at a high rate relative to its frequency and with good sample resolution.

Aliasing

Sampling rate determines the bandwidth of a digitized signal, that is, the frequency range that can be represented in the signal. The maximum frequency representable in a sampled waveform is termed its Nyquist frequency, and is equal to one half the sampling rate. Thus, for example, a waveform sampled at 16,000 Hz can represent all frequencies up to its Nyquist frequency of 8,000 Hz. Aliasing occurs when the signal being sampled contains energy at frequencies above the Nyquist frequency. This is illustrated in the figure, wherein the solid curve represents the input analog signal and circles show where samples were taken (at a low rate relative to the frequency of the input signal). The dotted line illustrates the apparent frequency of the sampled waveform, which completes about two cycles in the time the original signal completes 20 cycles. Aliasing generates components of lower frequency from components that are higher in frequency than the Nyquist frequency, and it is impossible to distinguish a component generated by aliasing from one that was actually present in the input signal. This effect is one of the most common sources of distortion in digitized waveforms. Fortunately, most modern computer hardware for digitizing sound has built-in filters tuned to remove energy at frequencies above the Nyquist frequency for whatever sampling rate is being used.
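The folding of high frequencies into lower ones can be sketched in a few lines of Python. This is a minimal illustration for a single pure tone, assuming ideal sampling with no anti-aliasing filter; `alias_frequency` is an illustrative name. It also checks that a 9,000 Hz cosine and a 7,000 Hz cosine sampled at 16,000 Hz produce the same sample values, which is why the two cannot be told apart afterwards.

```python
import math

def alias_frequency(f_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after ideal sampling.

    Frequencies fold around multiples of the sampling rate, so the result
    always lies between 0 and the Nyquist frequency (sample_rate_hz / 2).
    """
    f = f_hz % sample_rate_hz             # fold into 0 .. fs
    return min(f, sample_rate_hz - f)     # reflect into 0 .. fs/2

fs = 16000
print(fs / 2)                     # 8000.0 (Nyquist frequency)
print(alias_frequency(7000, fs))  # 7000 (below Nyquist, unchanged)
print(alias_frequency(9000, fs))  # 7000 (1,000 Hz above Nyquist folds to 1,000 Hz below)

# The sampled values of a 9,000 Hz and a 7,000 Hz cosine are indistinguishable.
same = all(
    math.isclose(math.cos(2 * math.pi * 9000 * n / fs),
                 math.cos(2 * math.pi * 7000 * n / fs), abs_tol=1e-9)
    for n in range(100)
)
print(same)  # True
```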

Sample resolution

Sample resolution (the number of bits per sample) determines how many gradations of amplitude (corresponding to loudness) can be represented in the digital waveform. The range of amplitudes from the most negative possible sample value to the most positive is referred to as the dynamic range, and is normally expressed in decibels relative to the smallest non-zero digital value. For example, with 16-bit resolution, the largest positive value representable is +32,767 and the most negative is -32,768, giving a total range of 65,536 values, or about 96 dB. That is:

20.0 * log10(65536 / 1) = 96.3

This figure illustrates the effects of varying sample resolution. An analog sine wave is sampled at moderate resolution (circles correspond to about 6 bits of resolution) and at very coarse resolution (asterisks correspond to about 3 bits of resolution). While the higher resolution samples approximate the shape of the waveform fairly closely, the low resolution samples give a coarse, stair-step approximation to the original waveform shape.

Typically, the ADC hardware has some noise floor, such that the conversion is accurate only to within plus or minus some number of least significant bits. In that case, the signal to noise ratio of the conversion in decibels (assuming 16-bit samples) is:

20.0 * log10(65536 / noisefloor)

where noisefloor is the number of least significant bits of error.
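The two decibel formulas above can be checked numerically. This Python sketch simply evaluates them for an N-bit converter; the function names are illustrative, and the noise floor of 4 least significant bits in the example is an arbitrary assumed value. Each added bit doubles the number of codes and so adds about 6.02 dB.

```python
import math

def dynamic_range_db(bits):
    """Dynamic range of an N-bit converter: total number of codes
    relative to one least significant bit, in decibels."""
    return 20.0 * math.log10(2 ** bits)

def snr_db(bits, noise_floor_lsb):
    """Signal to noise ratio when the conversion is accurate only to
    within noise_floor_lsb least significant bits."""
    return 20.0 * math.log10(2 ** bits / noise_floor_lsb)

print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(6), 1))   # 36.1 (the circles in the figure)
print(round(dynamic_range_db(3), 1))   # 18.1 (the asterisks in the figure)
print(round(snr_db(16, 4), 1))         # 84.3 (16 bits, assumed 4-LSB noise floor)
```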

Waveform Display and Editing

WEDW program: download. Rules for labeling segments. Homework: download sentences and transcriptions from the class website; label each phoneme in each file; use getseg (runs in a DOS window) to save segments from each file and mail them to me.

There is a link on our class website to Wedw. The zip file you download contains Wedw and its user manual, some support programs, and an example waveform. Verify that you can display and play back the example, and use it to experiment with the program if you wish. The Wedw manual is also available separately on the class website. In class we'll go over some of the basics of how to segment waveforms, that is, where to put the boundary markers between adjacent segments. There is also a homework packet on the class website. It contains five waveforms for you to practice labeling. These are sentences from the TIMIT database. Each waveform file also has an associated .txt file that lists the English gloss for the sentence and the phoneme codes we'll use to label them. When you've labeled the waveforms, use the program getseg (see documentation at to read out the locations of the segment labels into files, and mail the files to me.

Labeling Rules 1

Obstruent - Vowel: higher formants rule. Nasal - Vowel: abrupt spectral shift. Approximant - Vowel: higher formants rule; maximum slope rule.

This and the next group of slides provide some sketchy rules for phonetic labeling. These are similar to rules described by Seneff et al. that were used in labeling the TIMIT database.
1) Higher formants rule: place the boundary at the onset of the first pitch period in which the higher formants (F2 & F3) are excited by voicing (CV sequence), or just after the last pitch period showing voiced excitation of the higher formants (VC sequence).
2) Abrupt spectral shift: particularly for sequences involving nasals, the onset of the nasal closure interval is well marked by a very sudden change in the speech spectrum.
3) Maximum slope rule: where voicing excites the higher formants throughout the transition region, place the segment boundary where the formant transitions (especially F2) are steepest.
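The maximum slope rule lends itself to a simple numerical sketch. This assumes F2 has already been measured frame by frame (for example by a formant tracker); the track below is made-up example data, not a measurement, and `max_slope_boundary` is a hypothetical helper, not part of Wedw. The boundary is placed midway between the two frames with the largest absolute F2 slope.

```python
def max_slope_boundary(times_ms, f2_hz):
    """Return the time (ms) at which an F2 track changes fastest.

    times_ms and f2_hz are hypothetical per-frame measurements. The
    boundary is the midpoint of the frame pair with the steepest slope.
    """
    if len(times_ms) != len(f2_hz) or len(times_ms) < 2:
        raise ValueError("need at least two frames with matching lengths")
    best_i, best_slope = 0, -1.0
    for i in range(len(times_ms) - 1):
        slope = abs(f2_hz[i + 1] - f2_hz[i]) / (times_ms[i + 1] - times_ms[i])
        if slope > best_slope:
            best_i, best_slope = i, slope
    return (times_ms[best_i] + times_ms[best_i + 1]) / 2

# Hypothetical F2 transition from an approximant into a vowel (5 ms frames).
times = [0, 5, 10, 15, 20, 25, 30]
f2 = [900, 920, 980, 1150, 1400, 1480, 1500]
print(max_slope_boundary(times, f2))  # 17.5
```

Real formant tracks are noisy, so in practice some smoothing before taking differences would probably be needed, and the result should still be checked against the spectrogram.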

Labeling Rules 2

/.[lrw]/ clusters (slap, trod, quad, etc.): voiceless, use voicing onset (not phonetically correct!); voiced, use the maximum slope rule between /C/ and /[lrw]/. /s[ptk]/ clusters (spot, stack, scat): label the silence between /s/ and the vowel as the stop. Final unreleased stop: label the silence as the stop.

1) It is convenient to mark the onset of approximants in voiceless stop clusters at the point where voicing begins, because this can almost always be reliably located. In reality, the first part of the approximant is devoiced and begins during the aspiration interval following the stop release.
2) Voiced stop-approximant clusters generally show formant transitions from the stop release into the approximant and from the approximant into the vowel. In this case, the approximant is delimited by the points of maximum slope in the transitions, or by the higher formants rule if there is a steady-state approximant and the onset of F2 & F3 is very abrupt (this rarely happens).
3) In /s/-stop sequences, there is almost never a release burst or aspiration, and the transitions from stop to vowel are extremely fast.
4) Syllable-final stops are often unreleased. In a phrase like "cat fur", the /t/ is assigned the silence between /ae/ and /f/. In a phrase like "cat tail", there will normally be only one silent interval and one /t/ burst even though there are two adjacent /t/s. The silent interval is normally a little longer than it would be in a word like "attack", where there actually is only one /t/. In this case, we sometimes label it as only one /t/, and sometimes label the first /t/ as part of the silence, with the second /t/ being the rest of the silence plus the release, aspiration, etc. Note that this depends entirely on how we decide to handle the phonology (i.e., was one /t/ deleted by a rule, or reduced by a rule?).
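For the /s[ptk]/ and unreleased-stop cases, the interval to be labeled as the stop is a stretch of near-silence. One rough way to locate such a stretch automatically is short-time energy thresholding. The Python sketch below illustrates this under assumed parameters (5 ms non-overlapping frames, a -40 dB threshold relative to full scale); it is not how the TIMIT labels or Wedw were produced, and the test signal is synthetic.

```python
import math

def longest_silence(samples, sample_rate_hz, frame_ms=5.0, threshold_db=-40.0):
    """Return (start_s, end_s) of the longest run of quiet frames, or None.

    Assumptions: samples are floats scaled to -1..+1, frames do not overlap,
    and a frame is "quiet" when its RMS level is below threshold_db
    relative to full scale.
    """
    frame_len = max(1, int(sample_rate_hz * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    best, run_start = None, None
    for k in range(n_frames + 1):          # k == n_frames closes a trailing run
        quiet = False
        if k < n_frames:
            frame = samples[k * frame_len:(k + 1) * frame_len]
            rms = math.sqrt(sum(x * x for x in frame) / frame_len)
            quiet = rms == 0 or 20 * math.log10(rms) < threshold_db
        if quiet and run_start is None:
            run_start = k
        elif not quiet and run_start is not None:
            if best is None or k - run_start > best[1] - best[0]:
                best = (run_start, k)
            run_start = None
    if best is None:
        return None
    return (best[0] * frame_len / sample_rate_hz,
            best[1] * frame_len / sample_rate_hz)

# Synthetic stand-in for /s/ + closure + vowel: tone, 60 ms of silence, tone.
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
signal = tone + [0.0] * 480 + tone
print(longest_silence(signal, fs))  # (0.1, 0.16)
```

Real closures are rarely perfectly silent (room noise, voicing bars), so the threshold would need tuning per recording.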

Labeling Rules 3

/V/ - /V/, /[lrwj]/ - /[lrwj]/, /V/ - /[lrwj]/: use the maximum slope rule; use the presence of glottalization; listen! Flaps: a) higher formants rule if possible; b) maximum slope rule if not a); c) point of spectral movement if neither a) nor b).

These can often be tricky combinations.
1) Words like piano may be handled with the maximum slope rule. When the maximum slope rule is hard to apply, there are sometimes cases (especially between two adjacent vowels from different words, as in "three inch") where talkers produce some glottal fry to mark the word boundary, and this then marks the inter-vowel boundary as well. If all else fails, or even just to check yourself, try listening to the segments to find the best location for the boundary marker.
2) Flaps can range from things like short /d/s with an identifiable closure interval, to cases with clear transitions into and out of the flap but no steady-state closure interval, to cases with just a slight disturbance in the spectrum. In the first case, use the higher formants rule if possible. If that's not possible, use the maximum slope rule as it applies to the /d/ transitions. Sometimes we label the flap as consisting of only the one pitch period in which the spectral disturbance is seen.
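Both the "abrupt spectral shift" rule and the "point of spectral movement" fallback for flaps amount to finding where the spectrum changes most between adjacent short frames. The Python sketch below is one way to do that, assuming non-overlapping Hann-windowed frames and a Euclidean distance between log-magnitude spectra (a spectral-flux style measure chosen here for illustration, not the method used to label TIMIT). The plain DFT keeps it dependency-free; the test signal is synthetic.

```python
import cmath
import math

def log_spectrum(frame):
    """Log-magnitude spectrum (dB) of one Hann-windowed frame, via a plain DFT."""
    n = len(frame)
    w = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(w[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spec.append(20 * math.log10(abs(s) + 1e-6))   # small floor avoids log10(0)
    return spec

def spectral_change_point(samples, sample_rate_hz, frame_len=64):
    """Time (s) of the frame boundary across which the spectrum changes most."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    spectra = [log_spectrum(f) for f in frames]
    distances = [math.dist(spectra[j], spectra[j + 1]) for j in range(len(spectra) - 1)]
    j = max(range(len(distances)), key=distances.__getitem__)
    return (j + 1) * frame_len / sample_rate_hz   # boundary between frames j and j+1

fs = 8000
low = [math.sin(2 * math.pi * 250 * n / fs) for n in range(1280)]    # 160 ms
high = [math.sin(2 * math.pi * 2000 * n / fs) for n in range(1280)]  # 160 ms
print(spectral_change_point(low + high, fs))  # 0.16
```

With real speech the largest distance may come from a burst or noise rather than the boundary of interest, so a measure like this is at best a guide to where to look and listen.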

Obstruent - Vowel

This is a very clear-cut case. The final /t/ even has a release burst.

Approximant - Vowel

In this case, the boundary between the /b/ and /l/ is located at the onset of voicing, and the /l/ to vowel boundary at the point of abrupt spectral change. Note that there is a very brief moment of voicing at the release of the final /b/. This can be weak enough that it is simply marked as part of the /b/ release, as in this case. It can also be strong enough to sound like a true schwa vowel. In that case, the schwa is called an epenthetic vowel and should be marked with a special symbol. The TIMIT database has such markings; we do not generally use them in the SRL symbol set, but we do have a diacritic for marking this: we would label the vowel AX-e, with the appended -e indicating that it is an epenthetic insertion.
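Voicing onset is used as a boundary here and in the voiceless-cluster rule, so it may help to see how a program could estimate it. The Python sketch below is a crude autocorrelation-based periodicity detector: a frame counts as voiced when its normalized autocorrelation peaks above a threshold at some lag in the expected pitch-period range. The 30 ms frame, 5 ms hop, 75 to 400 Hz pitch range, and 0.5 threshold are all assumptions chosen for illustration, and the test signal is synthetic. Because it reports the start of the first frame judged voiced, the estimate can fall somewhat before the true onset.

```python
import math
import random

def is_voiced(frame, sample_rate_hz, min_f0=75.0, max_f0=400.0, threshold=0.5):
    """Crude periodicity test: does the normalized autocorrelation exceed
    `threshold` at some lag within the expected pitch-period range?"""
    energy = sum(x * x for x in frame)
    if energy == 0:
        return False
    lo = int(sample_rate_hz / max_f0)                       # shortest period (samples)
    hi = min(int(sample_rate_hz / min_f0), len(frame) - 1)  # longest period (samples)
    best = max(
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        for lag in range(lo, hi + 1)
    )
    return best > threshold

def voicing_onset(samples, sample_rate_hz, frame_ms=30.0, hop_ms=5.0):
    """Start time (s) of the first analysis frame judged voiced, or None."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    hop = int(sample_rate_hz * hop_ms / 1000)
    for start in range(0, len(samples) - frame_len + 1, hop):
        if is_voiced(samples[start:start + frame_len], sample_rate_hz):
            return start / sample_rate_hz
    return None

# Synthetic test: 100 ms of weak noise (standing in for aspiration), then a 100 Hz tone.
random.seed(0)
fs = 8000
noise = [random.gauss(0.0, 0.02) for _ in range(800)]
voiced_part = [0.5 * math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
print(voicing_onset(noise + voiced_part, fs))  # slightly before the true onset at 0.1 s
```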

Approximants

1) The boundary from AA to RX is marked per the maximum slope rule.
2) The boundary from RX to WW is marked per the maximum slope rule.
3) The boundary from WW to AH appears to use the higher formants rule. It's a judgment call in this case whether to use the maximum slope rule or the higher formants rule.


Fricative - Affricate

1) The AH to SS boundary is placed slightly later than the point at which F2 & F3 appear to lose voicing information; F1 was still very strong there, and the boundary coincides exactly with the offset of voicing.
2) The SS to CH boundary is placed at the onset of what appears to be the closure for the affricate.

Nasal or Liquid in Coda

The main points of interest here are the EH to LX boundary (using the maximum slope rule) and the LX to VV boundary (using the higher formants rule).

Dark /l/

The person who labeled this (a) looked for the change in formant slope from the steady-state vowel AO to the fairly constant-slope transition, and (b) probably did a lot of listening to the different regions of the word to help decide where the best boundary point was.

Nasal - Nasal

1) Notice the glottalized onset of the AH, which is considered part of the vowel.
2) Note the abrupt change in the spectrum of the nasal segment when the NN changes to an MM. It is also possible here to see the shift in the oral cavity zero from the frequency for NN to the lower frequency we expect for MM.
