Live multi-track audio recording

Joao Luiz Azevedo de Carvalho
EE522 Project - Spring 2007 - University of Southern California
Corresponding author: jcarvalh@usc.edu

Abstract

In live multi-track audio recording, each microphone picks up sound coming from multiple sources. This degrades the overall sound quality. This work aims to resolve the dry sound produced by each source using a system inversion method that effectively implements room equalization and crosstalk cancelation. The proposed scheme is demonstrated in a 4-channel experiment. No audible differences were observed when comparing the original dry signals to those recovered from the live recordings. We show that the proposed method is more SNR efficient than direct equalization, and that it can be further improved with better room response measurements. Potential limitations are discussed.

1 Introduction

In music recording, each sound track is typically recorded separately, as each musician listens to a previously recorded guide track. Live recording, in which all tracks are recorded simultaneously, is commonly used for recording concerts or band practices. However, if multiple microphones at different locations are used to record different instruments and singers, each microphone may pick up sound coming from multiple sources. This degrades the overall sound quality when these tracks are combined during mixing.

This work aims to resolve the dry sound produced by each source in a live recording. This is achieved by using one microphone for each sound source, and the dry sounds are obtained through system inversion. This effectively implements room equalization and crosstalk cancelation.

The proposed method is demonstrated in a 4-channel experiment. The results show that the proposed method is consistently more SNR (signal-to-noise ratio) efficient than simply equalizing each channel before mixing.

Cross-correlation evaluation reveals that the method effectively eliminates crosstalk. Simulations suggest that the method's SNR efficiency could be considerably improved if room response measurements with higher SNR were obtained.

2 Theory

In a single-source/single-track application, the relation between the produced sound $x(t)$ and the recorded sound $y(t)$ can be modeled as a linear system $y(t) = x(t) * h(t)$, where $*$ denotes convolution and $h(t)$ is the impulse response of the room (Figure 1). In the frequency (Fourier) domain, this relation becomes a multiplication: $Y(\omega) = X(\omega) H(\omega)$. The produced dry sound can be obtained by measuring the room response, solving for $X(\omega) = Y(\omega)/H(\omega)$, and inverse Fourier transforming back to the time domain.

Figure 1: Linear system model of the relation between the produced sound $x(t)$ and the recorded sound $y(t)$. $H$ is a linear system that models the room response.

In a multi-source/multi-track application where the number of microphones is equal to the number of sound sources, each microphone records the sound produced by its corresponding source (Figure 2a), but also the sound produced by all the other sources (Figure 2b). As each source ($x_i$) and each microphone ($j$) is positioned at a different location, there is a different room response $H_{ij}$ associated with each source-microphone pair.

Figure 2: System model used in the proposed method. Each microphone records the sound produced by its corresponding source (a), as well as the sound produced by all the other sources (b). A different room response $H_{ij}$ is associated with each source-microphone pair.

The system in Figure 2b can be generalized for any number of sound sources, and can be modeled for each associated microphone as $y_j(t) = \sum_i x_i(t) * h_{ij}(t)$. In the frequency domain, this system model becomes $Y_j(\omega) = \sum_i X_i(\omega) H_{ij}(\omega)$. This can be represented in matrix form as:

$$
\begin{bmatrix} Y_A \\ Y_B \\ \vdots \\ Y_N \end{bmatrix}
=
\begin{bmatrix}
H_{1A} & H_{2A} & \cdots & H_{nA} \\
H_{1B} & H_{2B} & \cdots & H_{nB} \\
\vdots & \vdots & \ddots & \vdots \\
H_{1N} & H_{2N} & \cdots & H_{nN}
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix},
$$

or simply $Y = HX$. The $Y$ vector is formed by the Fourier transforms of the recordings obtained from each of the microphones. The $H$ matrix is formed from the set of frequency responses associated with each source-microphone pair. These room responses can be measured using the log-sweep technique [1], for example. The $X$ vector contains the unknowns, as each element corresponds to one of the $n$ different sound sources in the recording. If the number of recordings is equal to the number of sound sources, this system can be solved directly by matrix inversion: $\hat{X} = H^{-1} Y$.

The system is solved independently for each frequency component in the Fourier domain. Then, each $\hat{X}_i(\omega)$ is inverse Fourier transformed back to the time domain to obtain the set $\{\hat{x}_i(t)\}$, corresponding to the $n$ dry recordings associated with each sound source.

3 Methods

3.1 Room response measurements

We used the log-sweep technique [1] to measure the room responses $h_{ij}(t)$ associated with each source-microphone pair. For each measurement, a speaker was placed at the $i$-th location, and a microphone was placed at the $j$-th location. The following waveform was played and simultaneously recorded:

$$
a(t) = \sin\left( \frac{2\pi f_1 T}{\log(f_2/f_1)} \left[ e^{(t/T)\log(f_2/f_1)} - 1 \right] \right),
$$

where $\log$ denotes the natural logarithm, the length $T$ of the log-sweep signal was 3 seconds, and the frequency range covered from $f_1 = 1$ Hz to $f_2 = 22050$ Hz (Figure 3). Each measurement was repeated 10 times and averaged in order to achieve higher SNR. The Fourier transform of the log-sweep signal, $A(\omega)$, was obtained, as well as the Fourier transform of the averaged recordings, $B_{ij}(\omega)$. The impulse responses $h_{ij}(t)$ were obtained by inverse Fourier transforming $B_{ij}(\omega)/A(\omega)$ and selecting the first 372 ms. We observed that this length was enough to capture the $T_{60}$ of the room. The frequency responses $H_{ij}(\omega)$ were obtained by Fourier transforming each impulse response.

Figure 3: Log-sweep signal used to measure the room responses: (a) time domain (amplitude vs. time in s); (b) frequency domain (magnitude in dB vs. frequency in Hz).

3.2 Recordings

The proposed method was demonstrated in a 4-channel experiment. The sound recordings and room response measurements were performed as discussed next.

We used 10-second segments of four different tracks from a music CD as our set of signals $\{x_i(t)\}$. Each segment was played from a different location. A microphone was placed at each of these locations, capturing not only the direct sound from its corresponding source, but also its reflections and the sound coming from all the other sources.

Due to hardware limitations, we could not record or play back multiple sound tracks simultaneously. Instead, we recorded each source-microphone pair separately, producing 16 recordings $y_{ij}(t)$. These recordings were synchronized and combined to form 4 different recordings $y_j(t)$, one for each microphone location. Due to significant background noise, each recording was repeated 10 times and averaged. The log-sweep signals discussed in Section 3.1 were recorded for each source-microphone pair immediately before the corresponding $y_{ij}(t)$ signal was obtained. The room setup is illustrated in Figure 4.

Figure 4: Room setup. The speaker was moved through locations 1 to 4, and for each location of the speaker, the microphone was also rotated through locations 1 to 4. For each speaker-microphone pair of locations, a log-sweep measurement was obtained, and the sound track corresponding to the current speaker location (represented by different colors) was played and recorded.

3.3 SNR comparison

The reconstructed signals $\hat{x}_i(t)$ were evaluated in terms of SNR relative to the original dry signals $x_i(t)$. For comparison, we also evaluated the SNR of the signals $\tilde{x}_i(t)$, obtained by simply equalizing the corresponding recorded signal $y_j(t)$ with the corresponding room response $h_{ij}(t)$. These were obtained by inverse Fourier transforming $\tilde{X}_i(\omega) = Y_j(\omega)/H_{ij}(\omega)$, where $j$ is the microphone associated with source $i$.

3.4 Crosstalk cancelation evaluation

In order to evaluate the effectiveness of the proposed method in terms of crosstalk cancelation, we analyzed the cross-correlation of each of the reconstructed signals $\hat{x}_i(t)$ with all the original dry signals $x_i(t)$. For comparison, we also calculated the cross-correlation of the dry signals with themselves, and the cross-correlation of each equalized signal $\tilde{x}_i(t)$ with the dry signals.

3.5 SNR efficiency simulation

In order to evaluate the influence of the accuracy of the room response measurements on the SNR efficiency of the proposed method, we performed the following simulation:

The four $y_j(t)$ recordings were synthesized, rather than actually recorded, by convolving the four dry sound signals $x_i(t)$ with the corresponding room response measurements, i.e. $y_j(t) = \sum_i x_i(t) * h_{ij}(t)$.

White noise $\eta(t)$ was added to the room responses, i.e. $\hat{h}_{ij}(t) = h_{ij}(t) + \eta(t)$.

Using the sets of synthesized recordings $y_j(t)$ and noisy room response measurements $\hat{h}_{ij}(t)$, we used the proposed method to obtain noisy reconstructed signals $\hat{x}_i(t)$, and used equalization to obtain equalized signals $\tilde{x}_i(t)$.

The average SNR between $\hat{h}_{ij}(t)$ and $h_{ij}(t)$ was calculated.

The average SNR (and standard deviation) between $\hat{x}_i(t)$ and $x_i(t)$ was calculated.

The average SNR (and standard deviation) between $\tilde{x}_i(t)$ and $x_i(t)$ was calculated.

The SNR of the reconstructed signals was compared to the SNR of the room response measurements.

The amplitude of $\eta(t)$ was varied, and the experiment was repeated.

4 Results

A total of 16 log-sweep measurements $h_{ij}(t)$ were obtained. Representative room responses measured at locations near and far from the speaker are shown in Figure 5. The results indicate that components below 100 Hz and above 11 kHz are considerably attenuated. This is because the room response measurements actually represent the speaker-room-microphone system. The speaker and/or microphone we used do not have a flat response outside this range, and act as bandpass filters.

Because of the high attenuation outside the 0.1-11 kHz frequency range (dashed lines), the filters obtained by inverting the matrix $H$ present extremely high gains at those frequencies. Such gains would cause loss of SNR due to noise amplification. Therefore, we focused our analysis on the 0.1-11 kHz frequency range only. Components outside this range in all signals ($x_i$, $\hat{x}_i$, $\tilde{x}_i$) were nulled when evaluating the method's performance quantitatively (SNR, cross-correlation) and subjectively (listening). A wider bandwidth can be used for the reconstructed signals if equipment with a flatter frequency response is available for the impulse response measurements.

Figure 5: Representative room responses (gain in dB vs. frequency in Hz) measured at locations (a) near and (b) far from the speaker. Components below 100 Hz and above 11 kHz (dashed lines) are considerably attenuated.

The SNR efficiency comparison between the proposed method and direct equalization is shown in Table 1. The results show that the proposed method is consistently more SNR efficient than equalization. On average, we observed a 5.7 dB increase in SNR using the proposed method when compared to equalization. The SNR improvement was clearly audible. Background noise was clearly heard in the equalized signals, but was not audible in the signals recovered using the proposed method.

The results of the crosstalk cancelation evaluation are shown in Figure 6. These results show that the cross-correlation between different signals increases in the equalized results (b) when compared to the reference cross-correlations (a). This is because each microphone captures not only the direct sound from its respective speaker (and its reflections), but also sound from all the other sources (speakers in different locations).
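
One way the maximum cross-correlation values of Figure 6 could be computed is sketched below in Python with NumPy. This is an illustrative sketch, not the code used in the experiment: the function name `max_xcorr_db`, the normalization by the signal norms, and the `20 log10` decibel convention are assumptions, and white-noise sequences stand in for the audio tracks.

```python
import numpy as np

def max_xcorr_db(a, b):
    """Peak of the normalized cross-correlation between a and b, in dB.

    The normalization (division by the product of the signal norms) and the
    20*log10 convention are assumptions made for this sketch.
    """
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()  # zero-pad so all lags are covered
    # Cross-correlation over all lags, computed through the FFT
    xc = np.fft.irfft(np.fft.rfft(a, nfft) * np.conj(np.fft.rfft(b, nfft)), nfft)
    peak = np.max(np.abs(xc)) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 20 * np.log10(peak)

# Stand-in signals (assumption): two independent "tracks" and a leaky recording
rng = np.random.default_rng(0)
x1 = rng.standard_normal(4096)
x2 = rng.standard_normal(4096)
leaky = x1 + 0.3 * x2  # track 1 contaminated by crosstalk from track 2

print("x1 vs x1   :", max_xcorr_db(x1, x1))
print("x1 vs x2   :", max_xcorr_db(x1, x2))
print("leaky vs x2:", max_xcorr_db(leaky, x2))
```

In this toy setting, the value for the leaky signal against the other track should sit well above the value for the two independent tracks, which mirrors the increase seen for the equalized signals in Figure 6b.
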
Equalization compensates for the room response, effectively eliminating reflections of the corresponding sound. However, it does not address crosstalk, and background noise from all the other sources is heard in the equalized signals. This background noise is clearly audible in the equalized recording. If the error waveform $x_i(t) - \tilde{x}_i(t)$ is played, it becomes clear that this background noise is composed of the sum of all the other sound tracks.

The results in (c) show that the proposed method effectively reduces crosstalk, and the original cross-correlations are recovered. No significant background noise was heard in the reconstructed signals. By playing the error waveform $x_i(t) - \hat{x}_i(t)$, we notice that the error consists of a considerably attenuated version of the corresponding sound track. Other sound tracks are not audible in the error waveform, indicating that crosstalk was effectively reduced.

Table 1: SNR efficiency comparison between the proposed method and direct equalization (dB).

               Track 1   Track 2   Track 3   Track 4   Average
Equalization       9.4       6.4      -0.4       8.5       6.0
Proposed          12.0      11.2      11.6      12.0      11.7
Improvement        2.6       4.8      12.0       3.5       5.7

Figure 6: Crosstalk cancelation evaluation. The maximum cross-correlation value for each pair of signals is shown (color scale from -20 dB to 0 dB). (a) dry signals (reference); (b) equalization; (c) proposed method.

Figure 7 shows the results of the SNR efficiency simulation. The simulation results show that the SNR efficiency of the proposed method can be greatly improved if more accurate room response measurements can be obtained. The maximum SNR for direct equalization was limited to approximately 10 dB, as crosstalk noise is not reduced using this approach. Improving the SNR of the room response measurements above 10 dB did not improve the equalization performance in this 4-channel simulation.
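
The simulation of Section 3.5 can be sketched as follows in Python with NumPy. This is a toy illustration under stated assumptions rather than the code behind Figure 7: white-noise sequences stand in for the dry tracks, the room responses are synthetic decaying-noise filters with a strong direct path, and the names `simulate` and `snr_db` are hypothetical.

```python
import numpy as np

def snr_db(ref, est):
    """SNR of est relative to ref, in dB (the error is ref - est)."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

def simulate(noise_std, n=4, sig_len=4096, ir_len=64, seed=0):
    """Average SNR (dB) of: noisy room responses, proposed method, equalization."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, sig_len))       # stand-ins for dry signals x_i(t)
    # Synthetic room responses h[i, j], source i -> microphone j (assumption)
    decay = np.exp(-np.arange(ir_len) / 8.0)
    h = 0.05 * rng.standard_normal((n, n, ir_len)) * decay
    idx = np.arange(n)
    h[idx, idx, 0] += 1.0                       # strong direct path to own microphone

    # FFT length chosen so that circular convolution equals linear convolution
    nfft = 1 << (sig_len + ir_len - 2).bit_length()
    X = np.fft.rfft(x, nfft)                    # shape (n, F)
    H = np.fft.rfft(h, nfft)                    # shape (n, n, F)
    Y = np.einsum("if,ijf->jf", X, H)           # Y_j = sum_i X_i H_ij

    # Noisy room response measurements: h_hat = h + eta
    h_noisy = h + noise_std * rng.standard_normal(h.shape)
    Hn = np.fft.rfft(h_noisy, nfft)

    # Proposed method: solve Y = M X at every frequency, with M[j, i] = H_ij
    M = np.transpose(Hn, (2, 1, 0))             # shape (F, n, n)
    X_hat = np.linalg.solve(M, Y.T[:, :, None])[:, :, 0].T

    # Direct equalization: X~_i = Y_i / H_ii, which leaves crosstalk in place
    X_eq = Y / Hn[idx, idx, :]

    x_hat = np.fft.irfft(X_hat, nfft)[:, :sig_len]
    x_eq = np.fft.irfft(X_eq, nfft)[:, :sig_len]
    snr_h = snr_db(h, h_noisy)
    snr_prop = np.mean([snr_db(x[i], x_hat[i]) for i in range(n)])
    snr_eq = np.mean([snr_db(x[i], x_eq[i]) for i in range(n)])
    return snr_h, snr_prop, snr_eq

# Vary the amplitude of the noise added to the room responses
for std in (1e-2, 1e-3, 1e-4):
    snr_h, snr_prop, snr_eq = simulate(std)
    print(f"response SNR {snr_h:6.1f} dB | proposed {snr_prop:6.1f} dB | "
          f"equalization {snr_eq:6.1f} dB")
```

With these toy settings, the recovered SNR of the proposed method should rise as the room response noise is reduced, while the equalization SNR should level off because crosstalk is not removed. The absolute numbers depend on the synthetic responses and are not comparable to the values reported for the experiment.
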

Figure 7: SNR efficiency as a function of room response measurement accuracy in a 4-channel live recording (recovered audio SNR in dB vs. room response SNR in dB, for no post-processing, equalization, and the proposed method). These simulation results show that the SNR efficiency of the proposed method improves linearly as more accurate room response measurements are used. With equalization, the maximum SNR is limited by crosstalk noise.

5 Discussion

Equalization compensates for the effects of the room, and successfully resolves the dry sound in a single-track recording. However, it is not efficient in a multi-track recording. Equalization does not deal with crosstalk from the other sound sources, which appears as background noise and degrades the SNR. The proposed method for multi-track audio recording not only achieves equalization in all tracks, but also improves the SNR by effectively reducing crosstalk. High SNR can be achieved if the room responses can be accurately measured.

One limitation of the proposed method is the requirement of multiple measurements of the room response. This might be laborious and time-consuming. Ideally, these measurements would have to be obtained with the band already inside the recording studio, and the technician performing these measurements should not be inside the room during the acquisitions. In a concert recording, the measurements should ideally be obtained with the audience already present and silent, which in practice cannot be achieved.

Another limitation is the need for one microphone for each sound source. Some instruments are typically not recorded using microphones, being connected directly to the mixing/recording equipment. Also, sound sources that typically would not be recorded (e.g., monitor speakers) may need to be taken into account.

Furthermore, additional speakers might be needed for playing the log-sweep signals, because some of the sound sources may not be produced by speakers (e.g., singers, audience, acoustic instruments). Also, musicians may prefer the frequency response of their amplifiers not to be equalized. In these cases, speakers with a flat response would have to be used to play the log-sweep signals. These additional speakers would have to be placed very close to their corresponding sound sources.

The proposed method is computationally intensive and should be used as a post-processing stage. The content may be segmented into short blocks (e.g., 10 seconds) to reduce the computational load associated with the Fourier transforms. In this case, some overlap between these blocks could be used to avoid edge artifacts. The computational complexity associated with the matrix inversions increases at least quadratically with the number of audio tracks (cubically for standard dense inversion algorithms).

6 Conclusions

We addressed the issue of multi-track live recording. The proposed method resolves the dry sound produced by each source by using one microphone for each sound source and inverting a linear system that models the recording environment. The matrix describing this system is obtained through multiple log-sweep measurements. The proposed method was demonstrated in a 4-channel experiment. No audible differences were observed when comparing the original and recovered signals. Quantitative results showed that this scheme is more SNR efficient than direct equalization. A cross-correlation analysis showed that the method effectively eliminates crosstalk. Simulation results showed that the SNR efficiency can be significantly improved if room responses can be more accurately measured. Potential limitations were discussed.

References

[1] Farina A. Simultaneous measurement of impulse response and distortion with a swept-sine technique. In: Proc 108th Conv Audio Eng Soc, Paris, France, 2000.