LAB 2 Machine Perception of Music Computer Science 395, Winter Quarter 2005

1.0 Lab overview and objectives

This lab will introduce you to displaying and analyzing sounds with spectrograms, with an emphasis on getting a feel for the relationship between harmonicity, pitch, and frequency.

Lab due time/date: 2PM, January 24, 2005

What to hand in: A text file containing short answers (at least a sentence, no more than a paragraph, except where noted) to each question in each section of this lab.

Format of required submission: an email attachment to the course email address (cs395-23@cs.northwestern.edu). DON'T ZIP YOUR ATTACHMENT! For some reason, the mail server is giving us problems with zipped attachments.

2.0 The spectrogram

The MATLAB Signal Processing Toolbox provides a function, specgram, that returns the time-dependent Fourier transform for a sequence, or displays this information as a spectrogram. For your convenience, I have appended the MATLAB help page for specgram to the end of this document. You will need to consult this page.

The time-dependent Fourier transform is the discrete-time Fourier transform for a sequence, computed using a sliding window. This form of the Fourier transform, also known as the short-time Fourier transform (STFT), has numerous applications in speech, sonar, and radar processing. You will explore some of these applications in this lab.

The specgram function calculates the spectrogram for a given signal as follows (yes, this is drawn from the help page):

1. It splits the signal into overlapping sections and applies the window specified by the window parameter to each section.
2. It computes the discrete-time Fourier transform of each section with a length nfft FFT to produce an estimate of the short-term frequency content of the signal; these transforms make up the columns of B (see the MATLAB help page for specgram). The quantity (length(window) - numoverlap) specifies by how many samples specgram shifts the window.
3. For real input, specgram truncates the spectrogram to the first nfft/2 + 1 points for nfft even and (nfft + 1)/2 for nfft odd. All audio signals are real (as opposed to real + imaginary).

As a first step to using specgram, copy the following into a simple MATLAB file. It doesn't have to be a function; just save this as a .m file, put it on the MATLAB path, and then run the file by typing its name (without the .m extension).

    % first, make the pitch using the function you made in lab 1
    duration = 0.5;    % the duration, in seconds, of the sound
    Fs = 8000;         % the sample frequency, in Hz, of the sound
    soundfreq = 440;   % the frequency, in Hz, of the pitch
    y = makepitch(duration, Fs, soundfreq);

    % now create a spectrogram and display it
    numberoffrequencybins = 500;
    windowfunction = hanning(numberoffrequencybins);
    specgram(y, numberoffrequencybins, Fs, windowfunction);

When you run the file, you will see an image like the one below. In this image, the frequency of the signal is indicated by the strong red line.

[Figure: spectrogram of the 440 Hz tone; Frequency (0 to 4000 Hz) versus Time (0 to 0.45 s)]

MATLAB has a number of built-in audio signals that you can play with. Load the laughter signal by typing the following:

    load laughter

Try playing it and displaying it with the spectrogram. Once you have done that, try loading the chirp signal in the same way, and seeing what its spectrogram looks like.

The spectrogram shows the estimate of the relative amplitudes of a set of sinusoids used to approximate the waveform analyzed. In order to see the information about the set of sinusoids used to approximate the wave, add the following to your .m file. (Note: the ... indicates a line continuation)

    % now get the values corresponding to the image
    [B,frequencies,times] = specgram(y, numberoffrequencybins, ...
        Fs, windowfunction);
    amps = 20*log10(abs(B));

Once you run the file, you will have four new MATLAB variables in your environment. The variable B contains a two-dimensional array of values, each of which has a real and an imaginary part. The columns of B correspond to analysis windows and the rows to sinusoids, so the value B(i,j) contains phase and amplitude information for the ith sinusoid in the jth window. Typically, we are interested only in the amplitude, so for our purposes we will ignore the phase. The amplitude, in decibels, is contained in amps. The variable frequencies contains the frequencies of the sinusoids used to analyze the sound, and frequencies(i) contains the frequency of the ith sinusoid. Similarly, times contains the times of the analysis windows. Thus, times(j) gives the time of the jth window.

QUESTION 2.1 If you look at the contents of frequencies, you will notice that 440 Hz is NOT among the frequencies used to analyze the sound you created. The spacing between the analysis frequencies determines the frequency resolution of the spectrogram. What is the frequency resolution of the spectrogram?

Answer: The bins are spaced 16 Hz apart (Fs divided by the number of bins: 8000/500 = 16).

QUESTION 2.2 Modify the number of frequency bins by modifying the MATLAB script you were given. Include the modified script as your answer to this question.

Answer: Students need to change numberoffrequencybins to some other value. An example would be:

    numberoffrequencybins = 1000;

QUESTION 2.3 What number of bins seems sufficient to determine the frequency of the signal within 2 Hz?

Answer: Change numberoffrequencybins to 4000, because this makes the spacing between bin centers 8000/4000 = 2 Hz. Another acceptable answer is 2000 bins, because that gives you plus or minus 2 Hz.

3.0 Window functions

The code for spectrogram generation uses a window function.
You can plot the window function by typing the following:

    plot(windowfunction)

QUESTION 3.1 What shape is this window function?

Answer: Any answer that sounded like "bell curve" or "Gaussian" is good.

QUESTION 3.2 There are a number of other window functions that could be used. The most obvious is the boxcar window. In your MATLAB script, replace hanning with boxcar. Now run the script again. What happens to the spectrogram?

[Figure: spectrogram using the boxcar window; Frequency (0 to 4000 Hz) versus Time (0 to 0.45 s)]

Answer: Above is the spectrogram when you replace hanning with rectwin or boxcar in the script from section 2. Students should indicate that they notice energy at frequencies where they know the signal has none. Notice the strong energy on alternating windows, at frequencies where we know there shouldn't be any energy.

QUESTION 3.3 Does the spectrogram using the boxcar window look more or less accurate than the one with the hanning window?

Answer: The right answer is "less."

QUESTION 3.4 What does the boxcar window function look like?

Answer: A rectangle. A flat line. Something like that is the right answer.

QUESTION 3.5 Now replace boxcar with triang and re-run your script. What happens to the spectrogram? What does the triang window function look like?

Answer: It looks like a triangle and the spectrum gets more accurate. In other words, the strong vertical bands you see in the boxcar spectrogram get much weaker... but I won't be a big stickler for "more accurate." Here is what the spectrogram looks like with the triang window.

[Figure: spectrogram using the triang window; Frequency (0 to 4000 Hz) versus Time (0 to 0.45 s)]

QUESTION 3.6 Which window function seemed to produce the best result? Which produced the worst?

Answer: Best = Hanning. Worst = boxcar (rectangle).

4.0 Pitch and the missing fundamental frequency

Create a signal of frequency 262 Hz using the makepitch function. Now listen to it, using soundsc. You should hear Middle C on the piano. Display it with the spectrogram function, to double check that you have created a signal of the frequency you expect.

Now create two new signals at the frequencies 262*3 = 786 Hz and 262*5 = 1310 Hz. These are multiples of 262, but not related by a power of two to the frequency 262. Thus, they are harmonics of Middle C, but are NOT frequencies associated with the pitch class of C. Play each of these harmonics individually, using soundsc.

Now, create a composite signal by adding the two harmonics together and play the resulting signal. Compare its pitch to the pitch for a sine wave of 262 Hz. Compare the pitch of the composite signal to the pitch of each of the signals used to create it.

QUESTION 4.1 Do you hear a single pitch?

Answer: Actually, either Yes or No is an OK answer here. It depends on the person.

QUESTION 4.2 Do you hear a pitch that is not contained in either of the harmonics you just added together?

Answer: Either answer is also OK, as people vary on this point.

QUESTION 4.3 If you do hear an additional pitch, is this pitch higher or lower than the 786 Hz tone?

Answer: The right answer is "lower." If you claimed not to hear an additional pitch, then you should say this question does not apply, or something like that. If students want to argue "higher," they need to talk to me, as MANY perceptual tests are on my side for this.

QUESTION 4.3 Look at the composite sound with a spectrogram. How many frequencies with strong components are displayed in the spectrogram?

Answer: The answer is 2.

QUESTION 4.4 If you add additional harmonics, what happens to the pitch? (Don't guess, TRY it)

Answer: Those who didn't hear an additional pitch may now hear the additional pitch. Others are probably going to talk about timbre. Anything that sounds like they are trying is fine.

5.0 Non harmonic tones

Harmonic sounds are generally considered to be sounds whose primary frequency components are all integer multiples of a fundamental frequency within the range of human hearing (20 to 20,000 Hz). Create several sinusoids, using makepitch, at the following frequencies: 262, 500, 607, 714. Add these together and listen to the result.

QUESTION 5.1 Does the resulting sound seem to have a single pitch?

Answer: The answer is NO.

Now create a random signal (also known as "white noise") by using the rand function.

    y = rand(8000,1);

QUESTION 5.2 Play the rand signal. What sounds does it remind you of?

Answer: Any answer like white noise or rain or the ocean is good. These are all random sounds.

QUESTION 5.3 Does the rand signal sound like it has a pitch?

Answer: The answer is NO.

QUESTION 5.4 Now display the rand signal with a spectrogram. What does the spectrogram look like? Describe what you see.

Answer: It has energy all across the spectrum with no clear band.

QUESTION 5.5 Now display the laughter signal from section 2. What does the spectrogram look like?

Answer: There is vertical striping. There is more energy below 2000 Hz. There are some (not very clear) horizontal bands that move up and down a bit.

QUESTION 5.6 Does the laughter signal sound like it has a pitch?

Answer: The answer is NO.
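The composite signals from Sections 4 and 5 can also be checked numerically. The following is a minimal sketch, not part of the original lab: it builds each signal from plain sin calls (a stand-in for makepitch, whose Lab 1 implementation is not shown here), takes a single FFT over one second of signal instead of calling specgram, and lists the frequencies that carry strong energy. The names sinesum, signals, and strong, the one-second duration, and the half-of-maximum threshold are all assumptions chosen for illustration.

```matlab
Fs = 8000;                  % sample frequency in Hz, as in Section 2
t  = (0:Fs-1)' / Fs;        % one second of sample times (assumed duration)
N  = numel(t);

% Hypothetical stand-in for makepitch: a sum of unit-amplitude sinusoids.
% t is a column and freqs is a row, so t*freqs has one column per frequency.
sinesum = @(freqs) sum(sin(2*pi*t*freqs), 2);

signals = {sinesum([786 1310]), ...          % 3rd and 5th harmonics of 262 Hz
           sinesum([262 500 607 714])};      % the Section 5 mixture

% With one second of signal the FFT bins are Fs/N = 1 Hz apart.
binfreqs = (0:N/2)' * Fs / N;

strong = cell(1, numel(signals));
for k = 1:numel(signals)
    mags = abs(fft(signals{k}));
    mags = mags(1:N/2 + 1);                  % real input: keep 0 Hz up to Fs/2
    % Keep the bins whose magnitude is at least half of the largest one.
    strong{k} = binfreqs(mags >= 0.5 * max(mags));
end

disp(strong{1}');   % strong frequencies in the two-harmonic composite
disp(strong{2}');   % strong frequencies in the Section 5 mixture
```

The composite from Section 4 shows exactly two strong frequencies and nothing at 262 Hz, which is the point of the missing fundamental: any 262 Hz pitch you hear is not present as energy in the signal.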

6.0 Finding the pitch

This section requires several .wav files, downloaded from the web. Go to the course home page and click on the link labeled "Lab 2 audio files" to reach these files. These will be made available on Tuesday, January 18, 2005.

QUESTION 6.1 Sounds that have a pitch are harmonic sounds. Harmonic sounds consist primarily of energy concentrated at frequencies that are simple integer multiples of a fundamental frequency, f, in the range of human hearing. Write out a series of instructions to explain to another person how to determine whether a sound is harmonic or not. Assume they know what a spectrogram is. This explanation should be fairly detailed, taking about ½ a page.

Answer: Here is an example answer.

1) Look at the spectrogram by running specgram on the sound.
2) Look to see if there are regularly spaced (spacing in the vertical dimension) horizontal lines of strong energy (coded as red in the default spectrogram).
3) In a section of the sound where there are obvious horizontal bands, find the frequency of each of these horizontal bands and record the value in Hertz.
3B) OPTIONALLY: calculate the distances between all the frequencies found in step 3.
4) If all the values from step 3 (or step 3B) are multiples of some value in the range of human hearing (20-20,000 Hertz), the sound is probably harmonic.

CAVEATS: You might need to round the values before applying step 4, perhaps to the nearest 5 or 10 Hz. If you have multiple harmonic sounds happening at the same time, step 4 won't work very well.

QUESTION 6.2 For each of the Lab 2 audio files, say whether or not it is a harmonic sound. Explain how you can determine whether each sound is harmonic, using the approach described in QUESTION 6.1.

Answer: Anything that sounds fairly reasonable along the lines of what I laid out in class is fine. The answers I was looking for were:

Clarinet: Harmonic; it shows obvious strong, evenly spaced bands.
Saxophone: Harmonic; it also shows strong, evenly spaced bands.
Piano: Depends on your method of describing "harmonic" from question 6.1. I think the sound IS harmonic, but won't fault you if you said otherwise.
Snare Drum: This would generally be thought of as non-harmonic, although I'll accept either answer if it is well backed up.
Drum: This could go either way, as well. For me, I'll call it barely harmonic.

QUESTION 6.3 One method of estimating the fundamental frequency of a sound is to look at the rate of repetition of the waveform. Another is to look at the spacing (in the frequency domain) between harmonics to infer the frequency of the fundamental. Assuming a sound is harmonic, write out a series of instructions to explain to another person how to determine the fundamental frequency of the sound, based on one of these two methods (or a third basic approach, if you come up with one). This should be as detailed as the answer to QUESTION 6.1.

Answer:

METHOD 1: Perform steps 1-3 in question 6.1 and then find the LARGEST common divisor of the values from step 3 (or 3B). This should be the fundamental frequency. Often, this value is the lowest strong frequency found in the spectrogram... but NOT ALWAYS. The missing fundamental you experimented with in Section 4 shows you why this is.

METHOD 2:
1) Look at a time-amplitude display of the acoustic signal.
2) Zoom in until you can see the shape of the waveform.
3) Find the period of the signal by looking to see where one complete repetition of the waveform happens.
4) Measure the period of the waveform.
5) The frequency is 1 divided by the period.

QUESTION 6.4 What are the strengths and weaknesses of the method you describe in the answer to QUESTION 6.3? In other words: What kind of sound that has a pitch would cause your system to fail? Give an example.

Answer: Something that counts the spaces between harmonics will fail when you get a sine wave, because there is only the one harmonic (the fundamental). For METHOD 2, a missing fundamental is not a problem in principle, since the waveform still repeats at the period of the fundamental, but a complicated waveform can make it hard to see by eye where one complete repetition ends. Both methods may also suffer depending on the resolution (in time or frequency) of your measurement. When two harmonic sounds are happening at the same time, both methods may fail.
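METHOD 1 can be sketched in a few lines once the band frequencies have been read off the spectrogram. This is an illustrative sketch rather than the lab's required solution: the variable names are hypothetical, the band frequencies are assumed to be already rounded to whole Hz, and the built-in gcd function stands in for "largest common divisor." The two example lists are the harmonics from Section 4 and the mixture from Section 5.

```matlab
% Band frequencies in Hz, as they might be read off a spectrogram (assumed,
% already rounded to whole numbers).
examples = {[786 1310], ...            % the two harmonics from Section 4
            [262 500 607 714]};        % the mixture from Section 5

f0 = zeros(1, numel(examples));
for e = 1:numel(examples)
    bands = examples{e};
    % Fold gcd over the list to get the largest common divisor of all bands.
    d = bands(1);
    for k = 2:numel(bands)
        d = gcd(d, bands(k));
    end
    f0(e) = d;
end

% Section 5 definition: the fundamental must lie in the range of human
% hearing (20 to 20,000 Hz) for the sound to count as harmonic.
isharmonic = (f0 >= 20) & (f0 <= 20000);

for e = 1:numel(examples)
    fprintf('common divisor %d Hz, harmonic: %d\n', f0(e), isharmonic(e));
end
```

An exact common divisor only works when the measured values are exact multiples of the fundamental. With real measurements, the rounding described in the CAVEATS of QUESTION 6.1 can easily break this, so treat a very small result (such as 1 Hz) as "no common fundamental found" rather than as a pitch.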

QUESTION 6.4B Find and report the fundamental frequency (or frequencies) for each harmonic sound in the Lab 2 audio files using the method you describe in the answer to QUESTION 6.3.

Answer:

Alto Sax: 506 Hz = B4 or 530 Hz = C5
Clarinet: 196 Hz = G3, 262 Hz = C4, 330 Hz = E4
Drum: maybe 165 Hz = E3, but really there is no pitch
Piano: Too many pitches to say. Anything is fine.
Snare: Not harmonic.

QUESTION 6.5 Given your estimate from QUESTION 6.4B, what is the distance from A=440, in semitones, of each harmonic sound? If a sound has multiple fundamental frequencies, calculate the distance for each fundamental frequency.

Answer:

Alto Sax: B4 is 2 steps up. C5 is 3 steps up.
Clarinet: G3 is 14 steps down. C4 is 9 steps down. E4 is 5 steps down.
Drum: not harmonic
Piano: not applicable (even though it is harmonic)
Snare: not harmonic

Now that you know how many semitones each sound is from A=440, you can determine the pitch class. The wheel on the following page will help you determine this. It shows the numerical name assigned to each pitch class by music theorists. It also shows the common letter names given to each pitch class. Simply start at your reference pitch class and count around the wheel the appropriate number of semitones. Each step around the wheel is a semitone. If your pitch is higher than the reference pitch, count clockwise around the wheel. If the pitch is lower than the reference, go counter-clockwise around the wheel.

QUESTION 6.6 For each pitch calculated in QUESTION 6.5, give the pitch class (use the letter name).

Answer:

Alto Sax: 506 Hz = B4 or 530 Hz = C5
Clarinet: 196 Hz = G3, 262 Hz = C4, 330 Hz = E4
Drum: maybe 165 Hz = E3, but really there is no pitch
Piano: Too many pitches to say. Anything is fine.
Snare: Not harmonic.

PITCH & PITCH CLASS

[Figure: pitch class wheel. Caption: Every G has the same pitch class. Higher = clockwise.]

Pitch classes around the wheel, clockwise:

0 C
1 Db/C#
2 D
3 Eb/D#
4 E
5 F
6 Gb/F#
7 G
8 Ab/G#
9 A
10 Bb/A#
11 B
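Counting around the wheel can be cross-checked with a short calculation. The sketch below is not part of the original lab. It assumes twelve-tone equal temperament (so a semitone is a frequency ratio of 2^(1/12), which the lab does not state explicitly), rounds to the nearest whole semitone, and uses the wheel's numbering with A as pitch class 9. The variable names are hypothetical; the frequencies are the alto sax and clarinet estimates from QUESTION 6.4B.

```matlab
freqs = [506 530 196 262 330];   % fundamental estimates in Hz (QUESTION 6.4B)

% Distance from A = 440 Hz in semitones, assuming equal temperament
% (12 semitones per doubling of frequency), rounded to the nearest semitone.
semitones = round(12 * log2(freqs / 440));

% Pitch class number on the wheel: start at A (9) and step once per semitone.
% mod keeps the result in 0..11 even for negative distances.
pitchclass = mod(9 + semitones, 12);

names = {'C','Db/C#','D','Eb/D#','E','F','Gb/F#','G','Ab/G#','A','Bb/A#','B'};
for k = 1:numel(freqs)
    fprintf('%4d Hz: %+3d semitones, pitch class %2d (%s)\n', ...
        freqs(k), semitones(k), pitchclass(k), names{pitchclass(k) + 1});
end
```

Positive distances correspond to counting clockwise on the wheel and negative distances to counting counter-clockwise.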
