Synthasaurus: An Animal Vocalization Synthesizer. Robert Martino Master's Project Music Technology Program Advisor: Gary Kendall June 6, 2000


Introduction

A compelling area of exploration in the domain of physical modeling and vocal synthesis is the production of non-human, expressive, animal-like vocalizations. Animal sounds can convey a wide variety of emotional states, and synthesizing life-like vocalizations would allow for interesting applications in video games, film, music, and artificial intelligence systems. This paper describes Synthasaurus, a synthesis engine prototype developed in Opcode MAX/MSP, which enables one to create emotive animal-like calls and provides enough flexibility to synthesize a variety of organisms resembling different mammals, birds, reptiles and amphibians. Alien, robotic, and other imaginary creatures can also be conceived. Synthasaurus builds on research and technology developed for human speech synthesis, with special kinds of control added for creating more animal-like sounds.

Basics of Animal Communication

Animal Vocal Systems

Sound production systems in typical vertebrates (mammals, anurans, and birds) share a similar basic mechanism. Air flowing through a tube causes one or more membranes in the path of the flow to vibrate. These vibrations can then be modified (for example, by a resonating chamber) and are then coupled to the propagating medium (Bradbury 1998).

[Figure 1: Mammalian Larynx (labels: trachea, muscles, air flow, glottis, vocal cords)]

In the larynx of mammals (figure 1), two vocal cords (which make up the glottis) block the airflow from the respiratory system. Enough air pressure will push the glottis open, releasing a burst of air, and a Bernoulli force is generated which pushes the vocal cords back together. The result is a series of periodic air fronts of a non-sinusoidal nature. This harmonically rich signal can then be filtered via a resonant chamber that ends with the mouth and nose.

[Figure 2: Anuran Larynx (labels: trachea, cartilage, air flow, vocal cord, glottis)]

Anurans (frogs and toads) also have a larynx system (figure 2), but in this case a second pair of membranes upstream from the glottis can oscillate at a frequency independent from the glottis. Thus amplitude modulation occurs. Air then passes into a throat sac rather than escaping through a mouth or nose, and this air can also be recycled back into the lungs.
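The amplitude modulation described for the anuran larynx can be illustrated numerically. The following is a minimal sketch, not the Synthasaurus implementation: the raised-cosine pulse shape, the function names, and the default sample rate are all assumptions chosen for illustration. It multiplies a fast pulse train (standing in for the glottis) by a slower, independent pulse train (standing in for the second pair of membranes).

```python
import math


def smooth_pulse(phase, width=0.5):
    """Raised-cosine pulse filling `width` of each cycle (assumed shape).

    `phase` is the position within the cycle, from 0.0 up to 1.0.
    """
    if phase < width:
        return 0.5 * (1.0 - math.cos(2.0 * math.pi * phase / width))
    return 0.0


def am_pulse_train(carrier_hz, mod_hz, depth, duration_s, sample_rate=44100):
    """Pulse train at carrier_hz, amplitude-modulated by a pulse at mod_hz.

    depth runs from 0.0 (no modulation) to 1.0 (full modulation).
    """
    n = int(round(duration_s * sample_rate))
    out = []
    for i in range(n):
        t = i / sample_rate
        carrier = smooth_pulse((carrier_hz * t) % 1.0)
        modulator = smooth_pulse((mod_hz * t) % 1.0)
        # Blend between constant gain and the modulator according to depth.
        gain = (1.0 - depth) + depth * modulator
        out.append(carrier * gain)
    return out
```

For example, `am_pulse_train(200, 20, 1.0, 0.5)` produces half a second of a 200 Hz pulse train whose amplitude is shaped 20 times per second by the slower oscillator.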

[Figure 3: Two Views of an Avian Syrinx (labels: air sac pressure; one side of syrinx; muscles controlling membrane tension)]

Birds have a bronchial-tracheal junction called a syrinx (figure 3), whereby two bronchial paths join with a single trachea. Membranes either in the trachea or on each side of the bronchial passages vibrate when air passes over them. The tension of these membranes can be modified to modulate the frequency and amplitude of sounds. When these membranes occur in the two bronchial passageways, they can sometimes be controlled independently, thus creating two independently controlled sounds (Bradbury 1998).

Communicating Animal Emotion

While one important goal of this synthesis model is the ability to create sounds with physically realistic timbres, another is to communicate emotion, possibly evoking a particular "mood". Despite the variety of animal species and the sound production systems they employ, some generalizations have been made about the intention or emotional message of an animal's auditory signals. This kind of information is helpful in relating the emotional states of an organism to the physical properties of the sounds it might make in those contexts.

Darwin suggested that the size of an animal determines the pitch of its voice, and that larger individuals are generally more dominant than smaller ones (Darwin 1965). Using this reasoning he argued that aggressive vocalizations tend to be characterized by lower pitch, and that submissive vocalizations are relatively higher.

Morton (1992) developed a more comprehensive model that related the structure of many mammal and bird vocalizations to motivational states, which he called the Motivational/Structural rules. As an animal gets more aggressive, its vocalizations tend to become more broadband (harsh) and lower in pitch, and as an animal becomes more fearful, the pitch of its vocalization tends to rise and become tonal. Combinations of various degrees of aggression and fear reflect more ambiguous motivational states that combine sonic properties from both (figure 4).

In figure 4, each block represents a basic sonogram, with the thickness of the line representing bandwidth and the height of the figure representing frequency. Arrows suggest shapes that can vary in pitch, and dotted lines represent degrees of change in slope. Tones in the upper left corner of the chart show nonaggressive, friendly sounds that are tonal and vary in pitch. Fear is indicated by increasingly higher pitch. Aggression is expressed through harsher sounds that are lower in frequency, and can be mixed with fear characteristics. The "neutral" chevron shape in the middle can express a sense of general alarm or excitement and, depending on the frequency and length, is characteristic of a "bark"-like sound in many species (Morton 1992).

[Figure 4: Morton's Motivational/Structural Rules (axes: increasing fear or appeasement; increasing aggression (size))]

The Synthesis Model

[Figure 5: Synthesis Model (labels: Pulse Osc, Freq env, Amp env; AM Section, Pulse Osc, Freq env, AM amount; FM Section, Sine Osc, % of carr. freq. (Smoothness), FM amount, Amp env (Dist); Noise, Biquad filter, Turbulence; Vocal Tract, Size, Length; Fear; Aggression; Gain; Control envelopes)]

The basic structure of Synthasaurus (figure 5) is a glottal pulse oscillator (which can be modified in a variety of ways) that passes through a waveguide model of a vocal tract. This model most closely reflects a mammalian vocal tract, although anuran and bird-like calls are also possible because of the amplitude and frequency modulation possibilities incorporated into the model. The waveguide model of the vocal tract is similar to the one used in Perry Cook's SPASM. In this implementation, a simplified, six-section straight tube is used (as opposed to the three-way system used in Cook's model, with throat, mouth and nose passageways).

The glottal oscillator is a custom MSP object designed for this project (developed in C with the MSP Software Development Kit). It provides a "smoothed" curve pulse wave that can be lengthened or shortened with a slider for different timbral qualities (this can also be set to modulate randomly). The user can specify a pitch envelope for this oscillator and a frequency range within which this envelope works (as well as a base frequency). The user can also specify an overall amplitude envelope function.

This oscillator can be amplitude modulated with a relatively low frequency (0-100 Hz) oscillator of the same smooth pulse type. This not only enables one to simulate the second pair of membranes upstream from the glottis in anuran vocal tracts, but also provides an effective way to create rapid "stuttering" effects, which help in the creation of purring and growling type sounds. The pulse width of this oscillator can be controlled, as well as the strength of the amplitude modulation (0-100%). A configurable envelope function controls the modulation frequency.

A frequency modulation section provides for further signal modification. Low frequency modulation of the carrier waveform with a sine wave creates sidebands that contribute to the "harshness" of the sound (which in turn often relates to the degree of aggressiveness in an animal call, as described by Morton). An envelope control is provided for controlling the depth of frequency modulation (which can be further strengthened by the "Aggression" parameter described later), as well as a slider for controlling FM frequency (which is calculated as a percentage of carrier frequency). Filtered noise can also be injected into the modulating oscillator's signal to simulate air turbulence in the vocal tract.

The vocal tract (figure 6) is also a custom MSP object developed for this project. It consists of a six-section waveguide model, divided by junctions which reflect or transmit signal energy depending on the radius of each tract section, as described in Cook's model (Cook 1993). Envelopes can be defined to control the radii of the sections, and are input into the tract object as sample-rate signals (for smooth-sounding transitions).

[Figure 6: Waveguide Model (labels: vocal cords, air flow through vocal tract, mouth, scattering junction). The scattering junction paths are labeled with the gains 1 + k, 1 - k, k and -k, where

$$k = \frac{\mathrm{radius}_l - \mathrm{radius}_r}{\mathrm{radius}_r + \mathrm{radius}_l}$$]
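The scattering junction of figure 6 can be sketched in a few lines. This is an illustrative reading of the figure, not the code of the custom MSP object: the function names are hypothetical, and the assignment of the gains 1 + k and 1 - k to the transmitted waves and k and -k to the reflected waves is the usual Kelly-Lochbaum arrangement, assumed here because the figure labels alone do not spell it out. The coefficient uses radii, as in the figure; many textbook formulations use cross-sectional areas instead.

```python
def reflection_coefficient(radius_left, radius_right):
    """k as given in figure 6, computed from the radii of adjacent sections."""
    return (radius_left - radius_right) / (radius_right + radius_left)


def scatter(k, right_going_in, left_going_in):
    """One scattering junction (assumed Kelly-Lochbaum arrangement).

    right_going_in arrives from the section nearer the vocal cords,
    left_going_in arrives from the section nearer the mouth.
    Returns (right_going_out, left_going_out).
    """
    right_going_out = (1.0 + k) * right_going_in - k * left_going_in
    left_going_out = k * right_going_in + (1.0 - k) * left_going_in
    return right_going_out, left_going_out


def junction_coefficients(radii):
    """Coefficients for every junction between consecutive tract sections."""
    return [reflection_coefficient(a, b) for a, b in zip(radii, radii[1:])]
```

A six-section tube has five internal junctions, so `junction_coefficients([1.0, 1.0, 1.5, 2.0, 1.5, 1.0])` returns five values. Where neighbouring radii are equal, k is 0 and the waves pass through unchanged.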

At the end of the tract model, where the "mouth" of the animal would be present, a simple crossover filter system controls the reflection characteristics of the vocal tract: higher frequencies escape the tract and lower frequencies are reflected back. The cutoff frequency of this filter can be controlled. By allowing more low frequencies to escape the tract, the impression of a larger tract (and thus a larger animal) is created. The Vocal Tract Size slider represents this cutoff parameter. The time of the waveguide sections can also be increased to allow for the lengthening of the vocal tract.

The User Interface

[Figure 7: Synthasaurus Screenshot]

[Fig. 8: Frequency Controls]

Several presets are provided in this version of Synthasaurus, which demonstrate its ability to create a variety of emotive sounds. The most compelling characteristic of a given sound in conveying emotion is the pitch envelope (figure 8), which is a good place to start when designing new sounds. Recordings of real animal calls in spectrograph format (frequency vs. time) are useful examples for designing pitch envelopes. Any of the envelopes on the screen can be edited by dragging existing points, clicking outside a point to add a new one, or shift-clicking to remove a point. A randomize feature is included in the frequency section; it provides variation on each playback by offsetting the frequency envelope by a constrained random value.

[Fig. 9: Amplitude/Pulse Controls]
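The pitch envelope and its randomize feature, as described above, can be sketched as a breakpoint envelope with a per-playback offset. This is a hedged illustration only: the linear interpolation between points, the normalized 0 to 1 time and value ranges, and all names (`envelope_value`, `make_pitch_curve`, `max_offset_hz`) are assumptions, not details taken from the Synthasaurus patch.

```python
import random


def envelope_value(points, t):
    """Linearly interpolate sorted (time, value) breakpoints at time t."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def make_pitch_curve(points, base_hz, range_hz, max_offset_hz=0.0, rng=None):
    """Build a pitch function for one playback.

    The envelope sweeps from base_hz up to base_hz + range_hz. A single
    random offset, constrained to +/- max_offset_hz, is drawn once per
    playback and applied to the whole curve.
    """
    rng = rng or random.Random()
    offset = rng.uniform(-max_offset_hz, max_offset_hz)

    def curve(t):
        return base_hz + offset + range_hz * envelope_value(points, t)

    return curve


# A chevron-shaped envelope: rise to the peak at mid-call, then fall.
chevron = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
bark_pitch = make_pitch_curve(chevron, base_hz=200.0, range_hz=300.0)
```

Calling `make_pitch_curve` again with a nonzero `max_offset_hz` yields a slightly shifted copy of the same contour, which is one simple way to make repeated playbacks differ.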

An overall amplitude envelope (figure 9) provides the volume contour for the sound, and sliders control pulse width (the narrower the pulse, the brighter the sound).

The frequency modulation section (figure 10) is useful for creating some aggressive distortion in the signal. At a low enough "smoothness" setting, frequency modulation becomes audible and is useful for bird-like calls. The turbulence setting adds a degree of "breathiness" to the signal. Amplitude modulation (figure 11) enables one to create some interesting audible "pulsing" or "stuttering" effects, useful for feline purring and growling simulations.

[Fig. 10: AM Controls]

[Fig. 11: FM Controls]

In the vocal tract section (figure 12), the size of the animal and the articulation of the vocalization are easily specified. The Size slider controls the cutoff frequency of the crossover filter at the mouth, and the Length slider increases the amount of delay in the waveguide model. The "Vocal Tract Movement" slider moves through a series of predefined vocal section movements, generally with the movement occurring more towards the base of the larynx when the slider is to the left and more towards the mouth when it is to the right. These variations of vocal tract movement create different kinds of articulations and formants during the course of a sound, and are sometimes effective in simulating a primitive "talking" effect. Enabling the Randomize feature causes a different tract movement to occur on each playback.

[Fig. 12: Vocal Model Controls]

Sometimes the vocal tract model can overload due to its feedback nature. Thus two volume controls are provided (figure 14), one for pre-vocal tract gain and one for post-vocal tract gain. The pre-vocal tract slider should be set as high as possible without the system clipping.

The "Fear" and "Aggression" sliders (figure 13) attempt to map more emotive qualities to control changes consistent with Morton's Motivational/Structural rules. Increasing "Fear" simply increases the base frequency of the sound, while more "Aggression" increases both the FM amount (ratio of dry to FM signal) and the strength of the Distortion (modulation depth) envelope, which effectively increases the "harshness" of the signal in most cases.
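The mapping from the "Fear" and "Aggression" sliders to synthesis parameters can be sketched as follows. This is an assumption-laden illustration rather than the patch's actual scaling: the slider range of 0 to 1, the `fear_span_hz` constant, the way aggression scales the mix and depth, and the use of a sine carrier (the source uses a pulse oscillator) are all hypothetical choices. The modulator frequency is expressed as a fraction of the carrier frequency, as the text describes.

```python
import math


def apply_emotion(base_hz, fm_mix, fm_depth, fear, aggression,
                  fear_span_hz=400.0):
    """Adjust parameters from fear and aggression, each in 0.0..1.0.

    Fear raises the base frequency; aggression raises both the FM mix
    (share of FM signal versus dry signal) and the modulation depth.
    The scaling constants are illustrative, not taken from the source.
    """
    return {
        "base_hz": base_hz + fear * fear_span_hz,
        "fm_mix": min(1.0, fm_mix + aggression * (1.0 - fm_mix)),
        "fm_depth": fm_depth * (1.0 + aggression),
    }


def fm_voice(params, mod_ratio, duration_s, sample_rate=44100):
    """Mix a dry sine carrier with a frequency-modulated copy of itself.

    mod_ratio sets the modulator frequency as a fraction of the carrier.
    """
    n = int(round(duration_s * sample_rate))
    carrier_hz = params["base_hz"]
    mod_hz = mod_ratio * carrier_hz
    mix = params["fm_mix"]
    out = []
    for i in range(n):
        t = i / sample_rate
        phase = 2.0 * math.pi * carrier_hz * t
        dry = math.sin(phase)
        # Sine modulator added to the carrier phase; fm_depth acts as
        # the modulation index.
        wet = math.sin(phase + params["fm_depth"]
                       * math.sin(2.0 * math.pi * mod_hz * t))
        out.append((1.0 - mix) * dry + mix * wet)
    return out
```

With `aggression` at 0 and `fm_mix` at 0 the output is the plain carrier; raising `aggression` blends in more of the modulated signal and deepens the modulation, which adds sidebands around the carrier.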

[Fig. 13: Emotive Controls]

An overall duration slider (figure 14) provides control for the length of the vocalization. This duration can also be randomized to a limited degree for each playback. Play controls (figure 15) are simple and include a repeat function so that sounds can be heard continuously while editing. The "Variations" toggle activates the random feature of the pulse width, vocal movement, and frequency sections so that each playback of a sound is a bit different. Presets are saved by shift-clicking in the preset box, and recalled by double-clicking.

[Fig. 14: Duration and Volume Controls]

[Fig. 15: Play, Record and Preset Controls]

Considerations for Future Development

This incarnation of Synthasaurus attempts to make a useful step in the synthesis of emotive, easily controlled animal sounds. There are many ways in which this design could be further developed.

This model focuses on the creation of relatively short, one-oscillator timbres. A useful method of working with these sounds would be a "compositing" environment that enables both sequential and simultaneous mixing of voices to create more complex vocalizations.

Currently the user can only choose from a pre-defined set of envelopes for vocal tract movement. Enabling the user to draw custom vocal tract envelopes on the user interface screen would be a useful feature, so that studies of actual animal mouth movements could be incorporated into sound design. Custom envelopes can be drawn if the user owns the development version of MAX (rather than just the stand-alone MAXPlay application), since the envelopes reside a couple of patch layers underneath the user interface screen.

More realism could be incorporated by adding a feedback feature in the vocal tract model, which simulates the effect of reflected air influencing the nature of vocal cord oscillation, especially at higher air pressures. This may provide a more realistic distortion or harshness to the signal.

The approach in this project has been to simulate a general-purpose animal tract that could create a wide variety of textures. Further nuances related to specific kinds of animals could be modeled with a more configurable oscillator/vocal tract system (allowing one to model the dual bronchial passages in birds, with independently controlled oscillators, for example), or an altogether different kind of mechanism that couples the animal's vocal cords to the surrounding air (such as air sacs in frogs). Species such as arthropods employ quite different kinds of sound production mechanisms that may be interesting to model.

This synthesis engine might be a useful resource within a larger artificial intelligence environment, whereby a created "organism" interacting in an environment (either an abstract being in a virtual world, or a physical robot) may make life-like, emotive vocalizations in response to various stimuli. The pure synthesis approach used in this model (as opposed to using modifications of sampled audio) lends itself to more flexible control of sound parameters.

Conclusion

Hopefully this project encourages further research in synthetic animal-like vocalizations. There are many applications for such a synthesis engine in the game and film industries, as well as in robotics and other kinds of artificially created "lifeforms". The unique nature of these kinds of sounds offers a new domain of timbrally rich and expressive qualities that could be used to interesting musical effect as well.

Acknowledgements

Northwestern University professors Gary Kendall (Music Technology), Ian Horswill (Computer Science) and Charles Larson (Communication Sciences) were extremely helpful in the design and implementation of this project. Collaboration with these three professors brought much insight into the relationships between the fields of sound synthesis, robotics, artificial intelligence, and animal vocal tract physiology and anatomy.

References

Bradbury, J. W. and Vehrencamp, S. L. Principles of Animal Communication. Sunderland, Massachusetts: Sinauer Associates, Inc., 1998.

Cook, P. "SPASM, a Real-Time Vocal Tract Physical Controller." Computer Music Journal, 17(1), 1993.

Darwin, C. The Expression of Emotions in Man and Animals. Chicago: University of Chicago Press, 1965.

Morton, E. S. and Page, J. Animal Talk. New York: Random House, 1992.