Location of sound source and transfer functions
1 Location of sound source and transfer functions
Sounds produced with a source at the larynx, either voiced or voiceless (aspiration), are filtered by the entire vocal tract. The transfer function is well modeled by multiple resonances in cascade: the overall transfer function is the product of the transfer functions of the individual resonances. The number of resonances below the Nyquist frequency is roughly the same regardless of tube shape; it is determined by the length of the tube.
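The cascade idea on this slide (overall transfer function = product of the individual resonance transfer functions) can be sketched with the two-pole digital resonator form used in Klatt-style synthesizers; the function names and the formant frequencies and bandwidths below are illustrative assumptions, not values from these slides.

```python
import math

def klatt_resonator(F, BW, fs):
    """Two-pole digital resonator in the form used by Klatt (1980):
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], normalized to unity gain at 0 Hz."""
    C = -math.exp(-2 * math.pi * BW / fs)
    B = 2 * math.exp(-math.pi * BW / fs) * math.cos(2 * math.pi * F / fs)
    A = 1 - B - C
    return A, B, C

def apply_resonator(coeffs, x):
    """Run the second-order recursion over a signal."""
    A, B, C = coeffs
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = A * s + B * y1 + C * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def cascade(sections, x):
    """Cascade = product of transfer functions: each resonator
    filters the output of the previous one in series."""
    for c in sections:
        x = apply_resonator(c, x)
    return x

fs = 10000
formants = [(500, 60), (1500, 90), (2500, 150)]   # illustrative (F, BW) pairs
sections = [klatt_resonator(F, BW, fs) for F, BW in formants]
impulse = [1.0] + [0.0] * 499
response = cascade(sections, impulse)             # cascade impulse response
```

Because each section is normalized to unity gain at 0 Hz, the cascade's gain at 0 Hz is the product of ones; only the relative formant peaks depend on the chosen F and BW values.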
2 Noise sources within the vocal tract
Fricatives, stop releases. The noise is effectively filtered only by the cavity anterior to the source. There are fewer resonances of that (smaller) tube below the Nyquist frequency than of the vocal tract as a whole, and the main resonance of this cavity will often be similar to one of the full vocal tract resonances. This can be modeled by filtering the noise source through all of the vocal tract resonances in parallel and setting specific amplitudes for each resonance: vocal tract resonances that correspond to resonances of the cavity anterior to the noise source get high amplitude; the others get zero amplitude.
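A minimal sketch of the parallel branch just described, assuming the Klatt-style two-pole resonator form; the formant frequencies, bandwidths, and amplitude settings are invented for illustration. Here only the two highest resonances, standing in for front-cavity resonances, get nonzero amplitude, so only they shape the noise.

```python
import math, random

def resonator(F, BW, fs):
    """Klatt-style two-pole resonator coefficients (A, B, C)."""
    C = -math.exp(-2 * math.pi * BW / fs)
    B = 2 * math.exp(-math.pi * BW / fs) * math.cos(2 * math.pi * F / fs)
    return 1 - B - C, B, C

def apply_resonator(coeffs, x):
    A, B, C = coeffs
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = A * s + B * y1 + C * y2
        y.append(out)
        y1, y2 = out, y1
    return y

fs = 10000
# Full vocal-tract formants; pretend only the two highest ones match
# resonances of the cavity in front of the noise source.
formants = [(500, 60), (1500, 90), (2500, 150), (3500, 200)]
amps     = [0.0,       0.0,       1.0,         0.7]   # parallel amplitudes

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(2000)]
# Parallel model: filter the same noise through every resonator
# separately, scale each output by its amplitude, and sum the branches.
branches = [apply_resonator(resonator(F, BW, fs), noise) for F, BW in formants]
out = [sum(a * b[n] for a, b in zip(amps, branches)) for n in range(len(noise))]
```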
3 Cascade vs. Parallel Formant synthesis
4 Klatt (1980) synthesizer
[block diagram: F0 controls the VOICING SOURCE; the VOICING SOURCE and NOISE SOURCE feed the VOCAL TRACT TRANSFER FUNCTION (cascade and PARALLEL branches), followed by the FIRST DIFF. RADIATION CHARACTERISTIC]
5 syn4 New Parameters
AF  Amplitude of frication noise
A2  Amplitude of parallel F2
A3  Amplitude of parallel F3
A4  Amplitude of parallel F4
A5  Amplitude of parallel F5
A6  Amplitude of parallel F6 (= 4900 Hz)
AB  Amplitude of bypass path (no formant filtering)
6 syn4.m
function signal = syn4 (srate, frame_dur, nf, ftable)
% syn4.m
% Louis Goldstein
% November 2009
% formant synthesizer
%
% usage:
%   signal = syn4 (srate, frame_dur, nf, ftable)
%
% input arguments:
%   srate      sampling rate (in Hz)
%   frame_dur  duration of each frame in milliseconds
%   nf         number of formants in the cascade
%   ftable     character string containing filename of F table
%                Row 1: AV
%                Row 2: f0
%                Row 3: AH
%                Row 4 to Row 4+nf-1:      formant frequencies
%                Row 4+nf to Row 4+2*nf-1: formant bandwidths
%                Row 4+2*nf: AF
%                then rows for A2-A6, then AB
%
% returned arguments:
%   signal     vector with synthesized waveform samples
7 syn4.m
% location of parameters in table
iav = 1; if0 = 2; iah = 3;
if1 = 4;
ib1 = if1 + nf;
iaf = ib1 + nf;
AV_gain = 100;    % voiced gain factor
AH_gain = .01;    % voiceless (aspiration) gain factor
AF_gain = .01;    % frication gain factor
Again = [ ];      % per-formant gain factors for the parallel branch
ABgain = .1;      % bypass gain factor
FBW = get_fbw(ftable);
nframes = size(FBW, 2);
dur = nframes * (frame_dur / 1000);    % duration in seconds
samps_per_frame = floor(srate * (frame_dur / 1000));
8 syn4.m
% generate sources
% voiced source
f0 = FBW(if0, :);
AV = FBW(iav, :) * AV_gain;
[voiced, mod_pulse] = make_pulses(f0, srate, frame_dur, AV);
nframes = min([floor(length(voiced) ./ samps_per_frame) nframes]);
tot_samples = nframes * samps_per_frame;
voiced = voiced(1:tot_samples);
RG = 0;      % RG is the frequency of the glottal resonator
BWG = 100;   % BWG is the bandwidth of the glottal resonator
[b_glo, a_glo] = resonance(srate, RG, BWG);
% filter impulse train through low-pass filter
% to get approximation to shape of glottal pulse
voiced = filter(b_glo, a_glo, voiced);
9 syn4.m
% noise source
AH = FBW(iah, 1:nframes) * AH_gain;
noise = randn(1, tot_samples);    % Gaussian noise
% calculate velocity source from pressure source
noise = filter([.5 .5], 1, noise);
mod_pulse = mod_pulse(1:tot_samples);
noise = noise .* mod_pulse;
AH_int = interp(AH, samps_per_frame);
AH_int = AH_int(1:tot_samples);
% compute composite source
in = voiced + (noise .* AH_int);
10 syn4.m
% filter successive frames of source through VT cascade
for i = nf:-1:1
    beg_sample = 1;
    z = [];
    for iframe = 1:nframes
        F  = FBW(if1:if1+nf-1, iframe);
        BW = FBW(ib1:ib1+nf-1, iframe);
        [b, a] = resonance(srate, F(i), BW(i));
        [out, z] = filter(b, a, in(beg_sample:beg_sample+samps_per_frame-1), z);
        in(beg_sample:beg_sample+samps_per_frame-1) = out;
        beg_sample = beg_sample + samps_per_frame;
    end
end
signal = in(1:nframes*samps_per_frame);
11 syn4.m
% parallel noise branch: do only if data found in appropriate rows in file
if size(FBW, 1) >= iaf
    % filter thru formants in parallel, starting with F2 and going to F(nf+1)
    noise_in = AF_gain * noise;
    noise_out = zeros(1, length(signal));
    for i = 2:nf+1
        beg_sample = 1;
        z = [];
        f_out = [];
        for iframe = 1:nframes
            if i <= nf
                F  = FBW(if1+i-1, iframe);
                BW = FBW(ib1+i-1, iframe);
            else
                F  = (srate/2) - 100;
                BW = 100;
            end
            [b, a] = resonance(srate, F, BW);
            [out, z] = filter(b, a, noise_in(beg_sample:beg_sample+samps_per_frame-1), z);
            % amplitude factor = A(i) * Again(i) * AF
            Famp = FBW(iaf+i-1, iframe) .* Again(i) .* FBW(iaf, iframe);
            f_out = [f_out out .* Famp];
            beg_sample = beg_sample + samps_per_frame;
        end
        noise_out = noise_out + f_out;
    end
12 syn4.m
    % bypass the formant resonators for noise produced at the lips
    beg_sample = 1;
    By_out = [];
    for iframe = 1:nframes
        Famp = FBW(iaf+nf+1, iframe) .* ABgain .* FBW(iaf, iframe);
        By_out = [By_out noise_in(beg_sample:beg_sample+samps_per_frame-1) .* Famp];
        beg_sample = beg_sample + samps_per_frame;
    end
    % add parallel noise output and bypass output to cascade output
    signal = signal + noise_out + By_out;
end
% filter through high-pass radiation characteristic;
% this approximates the sound radiated at a distance from the mouth
signal = filter([1 -1], 1, signal);
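The radiation characteristic applied at the end of syn4 is just a first difference, y[n] = x[n] - x[n-1] (filter([1 -1], 1, x) in MATLAB); a small sketch of its behavior, with a function name of our own choosing:

```python
def first_difference(x):
    """What filter([1 -1], 1, x) computes: y[n] = x[n] - x[n-1]."""
    prev, y = 0.0, []
    for s in x:
        y.append(s - prev)
        prev = s
    return y

# A constant (DC) signal is cancelled after the first sample, while
# sample-to-sample changes pass through: the filter emphasizes high
# frequencies, approximating radiation from the lips.
dc = [1.0] * 8
out = first_difference(dc)
```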
13 syn4.m
soundsc(signal, srate);
% plot the values of F1-F4 as a function of frame in the upper panel
% plot the synthesized signal as a function of t in ms in the lower panel
figure(1)
frames = 1:nframes;
subplot(2,1,1), plot(frames, FBW(if1:if1+3, :), '-o')
xlabel('Frame No.')
make_spect2(signal', srate, 6);
% subplot(2,1,2), plot([1:length(signal)]*1000/srate, signal);
% xlabel('Time in milliseconds');
14 asa.txt (from analysis of /ada/)
[parameter table: rows for AV, f0, AH, F1-F5, B1-B5, AF, A2-A6, and AB; numeric values not preserved in the transcription]
15 asa output
16 Voiced Fricatives
Employ voicing (AV) and noise (AF) sources. The amplitude of the noise source should be modulated by the laryngeal source: noise is mostly restricted to the open phase of the glottal cycle. Approximate this by setting the noise source to 0 during half of each glottal cycle.
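The half-cycle approximation above can be sketched directly, as a simplified stand-in for the mod output of make_pulses; the sampling rate and f0 are illustrative:

```python
import random

fs, f0 = 10000, 100          # illustrative sampling rate and fundamental (Hz)
n = fs // 10                 # 100 ms of samples
random.seed(1)
noise = [random.gauss(0, 1) for _ in range(n)]

# Phase within the glottal cycle as a fraction in [0, 1), computed with
# integer arithmetic so the half-cycle boundary is exact.
phase = [((i * f0) % fs) / fs for i in range(n)]

# Gate the noise: keep it only during the first half of each cycle,
# standing in for the open phase of the glottis.
gated = [s if p < 0.5 else 0.0 for s, p in zip(noise, phase)]
```

Exactly half of each 100-sample cycle passes the gate, so half the output samples are zeroed.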
17 make_pulses
function [pulses, mod] = make_pulses(f0, srate, frame_dur, av)
%
% Louis Goldstein
% November 2009
% calculate sequence of impulses based on an f0 vector
%
% Input parameters:
%   f0         vector of f0 values
%   srate      sampling rate (Hz)
%   frame_dur  duration of each f0 frame (corresponds to slide in get_f0)
%   av         vector of voicing amplitudes

frame_length = floor(frame_dur * srate / 1000);   % frame length in samples
length_f0 = length(f0);
% interpolate f0 so it has a value for every sample, scaled in cycles/sample
cont_freq = interp(f0/srate, frame_length);
cont_av = interp(av, frame_length);
% calculate elapsed cycles for every sample
elapsed_cycles = cumsum(cont_freq);
% calculate percentage of the way through the current cycle
cycle_percent = rem(elapsed_cycles, 1);
mod = double(cycle_percent < .5);   % mod is 1 for first half of each cycle
mod(cont_av == 0) = 1;              % leave noise unmodulated where AV == 0
shift = [0 cycle_percent(1:end-1)];
% set pulses (1s) where a cycle boundary is crossed, 0s elsewhere
pulses = cycle_percent < shift;     % true only when cycle boundary is crossed
pulses = cont_av .* double(pulses);
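The cycle-boundary logic of make_pulses can be illustrated with a running-phase loop, in place of the cumsum/rem formulation, holding each frame's f0 constant instead of interpolating; all numbers are illustrative:

```python
fs = 10000
frame_dur_ms = 10
f0_frames = [100, 100, 125, 125]        # one f0 value per frame (Hz)
spf = fs * frame_dur_ms // 1000         # samples per frame

# Hold f0 constant within each frame (make_pulses interpolates instead).
f0_per_sample = [f for f in f0_frames for _ in range(spf)]

pulses, phase = [], 0.0
for f in f0_per_sample:
    phase += f / fs                      # elapsed cycles, as in cumsum
    if phase >= 1.0:                     # a cycle boundary was crossed
        phase -= 1.0
        pulses.append(1.0)
    else:
        pulses.append(0.0)
```

Here 200 samples at 100 Hz contribute 2 cycles and 200 samples at 125 Hz contribute 2.5, so the train contains 4 complete cycle boundaries and hence 4 pulses.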
18 aza.txt (from analysis of /ada/)
[parameter table: rows for AV, f0, AH, F1-F5, B1-B5, AF, A2-A6, and AB; numeric values not preserved in the transcription]
19 aza output
20 afa.txt (from analysis of /aba/)
[parameter table: rows for AV, f0, AH, F1-F5, B1-B5, AF, A2-A6, and AB; numeric values not preserved in the transcription]
21 afa output
22 Stop Releases
23 Frication at release: a short fricative excites the same resonators as the homorganic fricative.
24 ata_burst.txt
[parameter table: rows for AV, f0, AH, F1-F5, B1-B5, AF, A2-A6, and AB; numeric values not preserved in the transcription]
25 ata_burst output
26 ata.txt
[parameter table, cascade branch only: rows for AV, f0, AH, F1-F5, and B1-B5; numeric values not preserved in the transcription]
27 ata output
28 new version of ftime2
% plot short-time spectrum (6 ms) of each frame superimposed
% on LPC transfer function estimate
plot(freq(1:L), 10*log(abs(h(1:L)))+170)
grid
xlabel(['Frame number is: ' num2str(iframe)])
% make F, BW vectors to display in title
for i = 1:5
    Fdisp(i) = fix(f(i, iframe));
    BWdisp(i) = fix(bw(i, iframe));
end
title(['Frame: ' num2str(iframe) '; F: ' num2str(Fdisp) ...
       '; BW: ' num2str(BWdisp) '; f0: ' num2str(fix(f0(iframe)))]);
beg_sample = 1 + (iframe-1)*samps_per_frame;
nsamps = floor((winsize/1000)*sr);   % winsize = 6 ms for spectrum
hold on
spectrum(signal(beg_sample:beg_sample+nsamps-1), sr, 1024);
grid
hold off
29 [figures: /a/ in ada; release burst in ada]
30 Stop Releases
31 Transfer Functions: For source at constriction
32 Labial Releases
33 Coronals Dorsals
34 Spectral shape and place of articulation
Stevens & Blumstein (1981): overall spectral shape at stop release tends to be invariant across vowels, even though the resonances are not. [figure labels: Labial, Dorsal, Coronal]
35 Klatt (1987) [figure labels: Labial, Coronal, Dorsal; Front, Back, Back Rounded]
36 For synthesis, we need to set the amplitudes of particular frequencies for the burst.
Labials: no anterior cavity, so only set AB.
Coronals: mostly A5-A6.
Dorsals: differ by vowel context.
[Table III from Klatt (1980): "Parameter values for the synthesis of selected components of English consonants before front vowels (see text for source amplitude values)." Columns F1, F2, F3, B1, B2, B3, A2, A3, A4, A5, A6, AB; rows for sonorants (e.g. [w], [y]), fricatives (e.g. [v], [s], [z]), affricates, and plosives (e.g. [p], [b], [d], [k], [g]); symbols and values not fully preserved in the transcription]
Nature of the noise source: soundsc(noise, 10000);
Noise Sources
Voiceless aspiration can be produced with a noise source at the glottis (also for voiceless sonorants, including vowels). This noise source is filtered through the whole VT cascade, so the output shows the vocal tract resonances.
More informationCMPT 468: Frequency Modulation (FM) Synthesis
CMPT 468: Frequency Modulation (FM) Synthesis Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University October 6, 23 Linear Frequency Modulation (FM) Till now we ve seen signals
More informationTransforming High-Effort Voices Into Breathy Voices Using Adaptive Pre-Emphasis Linear Prediction
Transforming High-Effort Voices Into Breathy Voices Using Adaptive Pre-Emphasis Linear Prediction by Karl Ingram Nordstrom B.Eng., University of Victoria, 1995 M.A.Sc., University of Victoria, 2000 A Dissertation
More informationCHAPTER 3. ACOUSTIC MEASURES OF GLOTTAL CHARACTERISTICS 39 and from periodic glottal sources (Shadle, 1985; Stevens, 1993). The ratio of the amplitude of the harmonics at 3 khz to the noise amplitude in
More informationWideband Speech Coding & Its Application
Wideband Speech Coding & Its Application Apeksha B. landge. M.E. [student] Aditya Engineering College Beed Prof. Amir Lodhi. Guide & HOD, Aditya Engineering College Beed ABSTRACT: Increasing the bandwidth
More informationInternational Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha
More informationA Physiologically Produced Impulsive UWB signal: Speech
A Physiologically Produced Impulsive UWB signal: Speech Maria-Gabriella Di Benedetto University of Rome La Sapienza Faculty of Engineering Rome, Italy gaby@acts.ing.uniroma1.it http://acts.ing.uniroma1.it
More informationEpoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE
1602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 8, NOVEMBER 2008 Epoch Extraction From Speech Signals K. Sri Rama Murty and B. Yegnanarayana, Senior Member, IEEE Abstract
More informationDetermination of instants of significant excitation in speech using Hilbert envelope and group delay function
Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,
More informationAnalysis/synthesis coding
TSBK06 speech coding p.1/32 Analysis/synthesis coding Many speech coders are based on a principle called analysis/synthesis coding. Instead of coding a waveform, as is normally done in general audio coders
More informationthe 99th Convention 1995 October 6-9 NewYork
Tunable Bandpass Filters in Music Synthesis 4098 (L-2) Robert C. Maher University of Nebraska-Lincoln Lincoln, NE 68588-0511, USA Presented at the 99th Convention 1995 October 6-9 NewYork ^ ud,o Thispreprinthas
More informationUniversity of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis
More informationSPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph
XII. SPEECH ANALYSIS* Prof. M. Halle G. W. Hughes A. R. Adolph A. STUDIES OF PITCH PERIODICITY In the past a number of devices have been built to extract pitch-period information from speech. These efforts
More informationDSP First. Laboratory Exercise #2. Introduction to Complex Exponentials
DSP First Laboratory Exercise #2 Introduction to Complex Exponentials The goal of this laboratory is gain familiarity with complex numbers and their use in representing sinusoidal signals as complex exponentials.
More informationHMM-based Speech Synthesis Using an Acoustic Glottal Source Model
HMM-based Speech Synthesis Using an Acoustic Glottal Source Model João Paulo Serrasqueiro Robalo Cabral E H U N I V E R S I T Y T O H F R G E D I N B U Doctor of Philosophy The Centre for Speech Technology
More informationWaveshaping Synthesis. Indexing. Waveshaper. CMPT 468: Waveshaping Synthesis
Waveshaping Synthesis CMPT 468: Waveshaping Synthesis Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University October 8, 23 In waveshaping, it is possible to change the spectrum
More information
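The cascade and parallel configurations described above can be sketched with the second-order digital resonator from Klatt (1980), y[n] = A·x[n] + B·y[n-1] + C·y[n-2], whose coefficients are set from a center frequency and bandwidth and whose gain is normalized to 1 at 0 Hz. This is an illustrative Python sketch, not the syn4.m implementation; the function names, the formant values, and the schwa-like F1-F3 settings are assumptions chosen for the example.

```python
import math

def resonator(x, f, bw, srate):
    """Second-order resonator (Klatt 1980): y[n] = A*x[n] + B*y[n-1] + C*y[n-2].
    A = 1 - B - C normalizes the gain to exactly 1 at 0 Hz."""
    T = 1.0 / srate
    C = -math.exp(-2.0 * math.pi * bw * T)
    B = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * f * T)
    A = 1.0 - B - C
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        y[n] = (A * xn
                + B * (y[n - 1] if n >= 1 else 0.0)
                + C * (y[n - 2] if n >= 2 else 0.0))
    return y

def cascade(x, formants, bws, srate):
    """Transfer function = product of resonances: apply resonators in series."""
    for f, bw in zip(formants, bws):
        x = resonator(x, f, bw, srate)
    return x

def parallel(x, formants, bws, amps, srate):
    """Sum of amplitude-weighted resonator outputs. A zero amplitude silences
    a branch, as for formants behind a fricative noise source."""
    y = [0.0] * len(x)
    for f, bw, a in zip(formants, bws, amps):
        branch = resonator(x, f, bw, srate)
        y = [yi + a * bi for yi, bi in zip(y, branch)]
    return y

# Impulse response of a cascade vocal-tract model (schwa-like F1-F3).
srate = 10000
impulse = [1.0] + [0.0] * (srate - 1)
out = cascade(impulse, [500, 1500, 2500], [60, 90, 120], srate)
```

Because each resonator has unity gain at 0 Hz, the cascade does too, so the impulse response of `out` sums to approximately 1; in the parallel configuration, setting an amplitude to zero removes that formant's contribution entirely, which is how the low-amplitude back-cavity resonances of a fricative are suppressed.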