SOUND FIELD REPRODUCTION OF MICROPHONE ARRAY RECORDINGS USING THE LASSO AND THE ELASTIC-NET: THEORY, APPLICATION EXAMPLES AND ARTISTIC POTENTIALS

Philippe-Aubert Gauthier
GAUS, Groupe d'Acoustique de l'Université de Sherbrooke
2500, boul. de l'Université, Sherbrooke, J1K 2R1, Canada
philippe_aubert_gauthier@hotmail.com

Alain Berry
GAUS, Groupe d'Acoustique de l'Université de Sherbrooke
2500, boul. de l'Université, Sherbrooke, J1K 2R1, Canada
Alain.Berry@USherbrooke.ca

Proc. of inSONIC2015, Aesthetics of Spatial Audio in Sound, Music and Sound Art

ABSTRACT

Wave Field Synthesis (WFS) aims at sound field reproduction using a distribution of secondary sources. The most common WFS applications recreate target sound fields defined by primary sources (spherical, planar, etc.) in a virtual space. These primary sources are typically positioned visually in a graphical interface by the user or the composer. In fewer cases, WFS is aimed at the reproduction of a sound field captured by a microphone array (such as in sound field composition in music), where the microphone array may or may not be conformal to the loudspeaker array. In the latter, more common, case, one of the challenges is to convert the microphone array recordings to WFS or loudspeaker signals. This can typically be achieved using a spatial transformation (and sound field extrapolation) or by solving an inverse problem. With the inverse problem approach, it is known that although the reproduced sound field may be accurate at the microphone array, this typically comes at the expense of a non-sparse activation of the reproduction sources, which may lead to a blurry spatial image. In this paper, the capacity of the lasso and elastic-net algorithms to provide a highly spatially-sparse inverse solution is investigated in order to provide a sharply localized reproduction. In light of the reported simulation results, it is concluded that both algorithms have great potential to strictly limit the number of active reproduction sources. Application examples and artistic potentials are provided and discussed.

1. INTRODUCTION

Sound Field Reproduction (SFR) based on Wave Field Synthesis (WFS) relies on the Kirchhoff-Helmholtz integral, which relates the sound pressure field inside a given volume to the sound pressure and sound pressure gradient on the surface enclosing the volume [1]. Using a discretization of the boundary, WFS then aims at SFR using a discrete set of reproduction sources known as the secondary sources. For most current WFS applications in music and audio contexts, the virtual sources (also known as primary sources) are simple acoustical sources (e.g. spherical waves, plane waves, spherical waves with directivity patterns) for which one can derive a simple, closed-form expression of the secondary source driving signals. This is also a convenient procedure for composers and creators since they can easily create primary sources on a graphical user interface and drive them with monophonic signals.

In some applications, it is more relevant to achieve WFS or SFR of a sound field that has been recorded using one or more microphone arrays. As an example, the authors have worked on the spatial capture and reproduction (in laboratory conditions) of various industrial noise environments in order to study the audibility of alarms in safe and controlled conditions. This is illustrated in Fig. 1. In Fig. 1(a), various industrial sound environments have been recorded using microphone arrays. At a later stage, the aim is to recreate the measured sound field using a standard WFS layout as shown in Fig. 1(b), where the microphone array is reinstalled in order to physically assess the reproduced sound fields. The work presented in this paper is targeted at this specific application context, which somehow mirrors musical works and sound arts based on field recordings. One of the challenges for these applications is to preserve or even enhance the sharpness of the spatial image. To this end, one looks for a sparse representation of the measured sound fields.

Figure 1: Application example: laboratory reproduction of industrial sound environments recorded using microphone arrays. (a) On-site microphone array measurement. (b) University of Sherbrooke WFS room. The picture on the right shows the WFS room with the microphone array for physical evaluation of the reproduced sound environments.

It is theoretically possible to achieve WFS of sound fields recorded using microphone arrays. The most straightforward implementation is that of a conformal microphone array [1], i.e. a microphone array that matches the loudspeaker array. However, this is not a viable solution when the WFS loudspeaker layout varies. One solution, proposed by Hulsebos et al. [2], is to rely on spatial transformations in order to achieve sound field extrapolation from the microphone array to the loudspeaker array. This is only possible for regular microphone array geometries, such as uniform rectangular, circular or spherical arrays, for which the spatial transform can be performed directly. In the case of irregular microphone arrays, one then relies on an inverse solution to reconstruct secondary source distributions that could have created the measured target sound field. One of the issues with inverse problems using classical regularization methods such as Tikhonov regularization or truncated singular value decomposition is that all the secondary sources are typically active. In some cases, this is not a desirable feature since it may lead to a blurry spatial image. One interesting avenue is related to algorithms that allow for a sparse solution, i.e. that perform automatic, strict and limited source selection. Examples of such algorithms are the lasso (Least Absolute Shrinkage and Selection Operator) [3] and the elastic-net [4].

Recently, Lilis et al. [5] investigated the lasso algorithm to perform SFR. To this end, they developed a complex-valued implementation of the lasso, which was originally developed for real-valued quantities, based on a complex implementation of the coordinate-descent algorithm. In [5], the aim of the authors was twofold: 1) perform optimal loudspeaker selection from simulation in order to reduce the actual loudspeaker count in practical implementation, 2) achieve accurate SFR of under-sampled sound fields. Other works in SFR looking for solution sparsity are found in the literature [6, 7]. Sparsity has also been investigated for acoustical holography [8].

Although the lasso can efficiently perform variable selection, it can also suffer from different issues that the elastic-net is supposed to solve [4]. To the authors' knowledge, there is no known work on SFR using the complex elastic-net. Therefore, the aim of this paper is to achieve preliminary verifications of the capacity of the elastic-net to perform at least as efficiently as the lasso for SFR. In this paper, we further adapt the complex coordinate-descent algorithm to the elastic-net, which, with proper parametrization, includes both the lasso and classical Tikhonov regularization [4]. Simulation results are provided for two different types of SFR: 1) using plane waves or 2) using spherical waves (theoretical loudspeakers). The actual extension to the elastic-net and the practical investigation of these two scenarios are the main distinctions with the work of Lilis et al. [5]. Furthermore, some spatial snapping effects are observed and discussed in the case of a small number of active sources. Related artistic and spatial composition potentials and ideas are discussed.

2. PROBLEM DEFINITION

First, one starts with the direct problem, which describes the reproduced sound pressures at the microphone array for a source distribution with a known transfer function matrix. Note that the sources can represent either real reproduction sources, such as loudspeakers, or virtual sources, such as plane waves, that will recreate the sound field in a least-mean-square (LMS) sense at the microphone array. Knowing the direct problem, the aim of the inverse problem is to find the potential source distributions that might have created the measured sound pressures at the microphone array. Without regularization, most inverse problems lead to unreliable solutions since they are typically very sensitive to measurement noise. To solve this issue, a classical approach is to regularize the inverse problem by including a penalization of any strong source distribution. However, for SFR applications, classical regularization methods will typically activate all sources. Both the lasso and the elastic-net can potentially solve these issues.

2.1. The lasso algorithm

The lasso algorithm, the Least Absolute Shrinkage and Selection Operator, is aimed at solving an inverse problem, i.e. finding the sources that minimize the reproduction error at the microphone array, given the measured sound field, while favoring sparsity of the resulting solution (i.e. a small number of active plane or spherical wave sources). For the lasso, this is performed through the combination of reproduction error minimization and a penalization of the solution 1-norm. The solution 1-norm is simply the sum of the absolute values of the solution coefficients. The 1-norm penalization is responsible for variable selection, i.e. sparsity of the solution. Details are provided in the Appendix.

2.2. The elastic-net

The elastic-net retains the capacity of the lasso to achieve strict source selection while also introducing the more classical vector 2-norm penalization. In fact, the elastic-net performs a linear blend of the vector 1-norm regularization and classical (vector 2-norm) regularization. The elastic-net was originally introduced to solve a few limitations of the lasso (herein adapted to the SFR context): 1) its inability to select more active sources than the number of microphones when there are more sources than microphones, 2) its reduced efficiency when the effects of different sources at the microphone array are correlated. Details of the elastic-net cost function and corresponding algorithm are summarized in the Appendix.
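As an illustration of Secs. 2.1 and 2.2, the following is a minimal sketch of a complex-valued coordinate-descent solver. It assumes one common parametrization of the elastic-net cost, (1/(2M))‖p − Gs‖² + λ[α‖s‖₁ + ((1 − α)/2)‖s‖₂²], so that α = 1 gives the lasso and α = 0 gives a Tikhonov-like penalty; the exact scaling used by the authors may differ. The function names, the fixed iteration count and the parameter `lam` are hypothetical choices made for this example only.

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding: shrink the magnitude of z by t, keep its phase."""
    mag = abs(z)
    if mag <= t:
        return 0j
    return z * (1.0 - t / mag)

def complex_elastic_net(G, p, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for the assumed cost
    (1/(2M))||p - G s||^2 + lam*(alpha*||s||_1 + (1 - alpha)/2*||s||_2^2).
    G: (M, L) complex transfer matrix, p: (M,) complex target pressures."""
    M, L = G.shape
    s = np.zeros(L, dtype=complex)
    r = p.astype(complex)                          # residual p - G s (s starts at 0)
    col_energy = np.sum(np.abs(G) ** 2, axis=0) / M
    for _ in range(n_iter):
        for l in range(L):
            g = G[:, l]
            # Correlation of column l with the residual that excludes source l
            z = np.vdot(g, r) / M + col_energy[l] * s[l]
            s_new = soft_threshold(z, lam * alpha) / (col_energy[l] + lam * (1.0 - alpha))
            if s_new != s[l]:
                r -= g * (s_new - s[l])            # keep the residual up to date
                s[l] = s_new
    return s
```

In this sketch, increasing `lam` (with `alpha` close to 1) sets more coefficients exactly to zero, which is the strict source selection discussed above; with `alpha = 0` no coefficient is thresholded and all sources generally remain active.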

[Block diagram: analysis of multichannel target signals (time-domain windowing (WOLA), Short-Term Fourier Transform); processing (lasso or elastic-net, per frame, per frequency); synthesis of multichannel source signals (inverse Short-Term Fourier Transform, time-domain windowing (WOLA)).]

Figure 2: Block diagram of the lasso/elastic-net implementation based on the weighted overlap-add (WOLA) method. From left to right, the multichannel target signals are windowed (the window must respect the constant-overlap-add (COLA) power complementarity property). The Short-Term Fourier Transform (STFT) is then applied to each target signal. The lasso/elastic-net algorithm then processes these spectrograms to turn them into a sparse multichannel representation. The synthesis stage includes the inverse STFT and the synthesis window.

Figure 3: Real parts of target sound fields at 800 Hz. (a) Target sound field for a single spherical source. (b) Target sound field for two spherical sources. The microphone array and the loudspeaker array are indicated by markers.

2.3. Signal processing

A block diagram of the signal processing chain for the time-domain implementation of the algorithms is presented in Fig. 2. This implementation is used to produce the time-domain simulation reported in Sec. 3.

3. SIMULATION RESULTS

Simulation results are provided for two scenarios based on the equations provided in the Appendix. The first one is a target sound field created by a single spherical wave at 800 Hz originating from x_1 = 0 m and x_2 = 4 m (see Fig. 3(a)). The second one is the same with an additional spherical wave originating from x_1 = 4 m and x_2 = 4 m (see Fig. 3(b)). Both sources have the same amplitude. The microphone array with M = 85 microphones and the loudspeaker array with L = 96 loudspeakers are shown in Fig. 3. For the reported results, the elastic-net parameter is α = 0.8 (see the Appendix for α). Indeed, it is known that the useful range for α is either around 0 or around 1 [4]. Since α = 0 will not provide a sparse solution, α = 0.8 was used.
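The block processing chain of Sec. 2.3 (Fig. 2) can be sketched as follows. This is only an illustrative sketch: the square-root Hann window, the 50 % overlap, the frame length and the `process` callback (a placeholder for the per-frame, per-frequency lasso or elastic-net step) are assumptions, not the authors' reported settings. A square-root Hann window applied at both analysis and synthesis satisfies the power complementarity property, so an identity `process` returns the input signal away from the signal edges.

```python
import numpy as np

def sqrt_hann(n):
    """Periodic square-root Hann window; its square overlap-adds to 1 at 50 % overlap."""
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n))

def wola_process(x, frame_len=256, process=lambda spec: spec):
    """Weighted overlap-add processing of multichannel signals.
    x: (channels, samples) real array.
    process: maps one frame spectrum (in_channels, bins) to (out_channels, bins),
             e.g. a hypothetical per-frequency sparse solver."""
    hop = frame_len // 2
    w = sqrt_hann(frame_len)
    n = x.shape[1]
    y = None
    for start in range(0, n - frame_len + 1, hop):
        frame = x[:, start:start + frame_len] * w            # analysis window
        spec = np.fft.rfft(frame, axis=1)                     # STFT of the current frame
        spec = process(spec)                                  # per frame, per frequency
        out = np.fft.irfft(spec, n=frame_len, axis=1) * w     # inverse STFT, synthesis window
        if y is None:
            y = np.zeros((out.shape[0], n))                   # output channel count set by process
        y[:, start:start + frame_len] += out                  # overlap-add
    return y
```

In an SFR setting, `process` would map the M microphone spectra of a frame to L source spectra, one frequency bin at a time.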
Formal selection of α for SFR is a topic of current research.

In the case of sources defined as plane waves, the microphone array recording is converted to a spherical distribution of incoming plane waves (with L = 162 plane waves) for subsequent WFS of each plane wave (an example is found in [7]). The plane-wave reconstructed sound fields using the lasso and elastic-net algorithms with a maximum number of active sources L_max = 2 and L_max = 8, respectively, are shown in Fig. 4. The resulting mean squared reproduction error E at the microphone array is included in the legend. The plane waves that are active in the solution are identified by their propagation direction using arrows in Fig. 4. Clearly, the lasso replaces the spherical target sound field by a single plane wave emanating from the correct direction (from the top of the figure) with the correct amplitude. For the sound field recreated by the elastic-net, since more plane waves are allowed (larger L_max), the curvature of the target sound field is partly recreated, at least around the microphone array. The elastic-net with a larger L_max provides a smaller resulting error while preserving sparsity.

Figure 4: Real parts of recreated sound fields using plane wave conversion for the target shown in Fig. 3(a). (a) Lasso, penalization selected for L_max = 2, E = 0.005. (b) Elastic-net, penalization selected for L_max = 8, E = 4.13 × 10^-4. The microphone array is indicated by markers. Selected plane waves and corresponding amplitudes are shown as arrows indicating the selected propagation directions.

Next, the lasso and elastic-net are tested for the same target sound pressure field with sources corresponding to a loudspeaker array that emits spherical waves. Results are shown in Fig. 5. Clearly, the lasso replaces the target sound field by a reproduced sound field that only stems from the two central loudspeakers (center top of the figure). Therefore, for a listener in the microphone array region, the incoming direction should be properly reproduced, while outside of the microphone array, the virtual source should be perceived as much closer than the one used to create the simulated target sound field. For the elastic-net, since L_max is larger, more loudspeakers are active and the curvature of the wavefront in the microphone array region is much closer to the target. However, the reproduced sound field is much less smooth and inconsistent sound perception may occur outside the microphone array region. This is not the case for the results of the lasso with L_max = 2, since the reproduced sound field is consistent over the entire shown area because of the extreme sparsity of the solution.

Figure 5: Real parts of recreated sound fields using spherical waves (theoretical loudspeakers) for the target shown in Fig. 3(a). (a) Lasso, penalization selected for L_max = 2, E = 0.005. (b) Elastic-net, penalization selected for L_max = 8, E = 2 × 10^-4. The microphone array and the loudspeaker array are indicated by markers. Active loudspeakers are shown as circles (sizes are proportional to source amplitudes).

The case of a target sound field with two sources (Fig. 3(b)) is now investigated. Results of the lasso with sources as plane waves (i.e. conversion of the microphone array recording to a plane wave representation) and with L_max = 4 are shown in Fig. 6(a). The lasso reproduces the two correct incoming directions (top and bottom-right) but, as for the previous plane wave cases, the curvatures of the target wavefronts are not reproduced since very few plane waves are activated, i.e. one from the top and one from the bottom-right of the figure. Results for the lasso while using the theoretical loudspeakers as potential sources are shown in Fig. 6(b). In this case, the reproduced sound field is created by a very limited set of loudspeakers, where the virtual sources somehow snap to the closest loudspeakers.
As expected, and as in the previous cases, the solution is sparse and sharply localized, and the energy is limited to a small set of sources defined by L_max = 4. The sparsity of the solution should typically ensure a much sharper spatial image, as very few loudspeakers are active for each of the two virtual sources that created the target sound pressure field. For some applications, although it does not exactly reproduce the target sound field in an extended area, this is a desirable feature to avoid a blurry spatial image.

Figure 6: Real parts of recreated sound fields using plane waves and spherical waves (theoretical loudspeakers) using the lasso with L_max = 4 for the target sound field shown in Fig. 3(b). (a) Reproduction sources are plane waves, E = 0.0072. (b) Reproduction sources are spherical waves, E = 0.0056. The microphone array and the loudspeaker array are indicated by markers. Selected plane waves and corresponding amplitudes are shown as arrows indicating the selected propagation directions. Active loudspeakers are shown as circle markers. Marker sizes are proportional to source amplitudes.

The same scenario is solved by the elastic-net for both plane wave and spherical wave sources with L_max = 16. Results are shown in Fig. 7. The results of the elastic-net with the loudspeakers as sources provide an excellent approximation of the target sound field in the microphone array region. The fact that up to 16 sources can be active concurrently helps in recreating the curvature of the target wavefronts.

Figure 7: Real parts of recreated sound fields using plane waves and spherical waves (theoretical loudspeakers) using the elastic-net with L_max = 16 for the target sound field shown in Fig. 3(b). (a) Reproduction sources are plane waves, E = 0.0014. (b) Reproduction sources are spherical waves, E = 5 × 10^-4. The microphone array and the loudspeaker array are indicated by markers. Selected plane waves and corresponding amplitudes are shown as arrows indicating the selected propagation directions. Active loudspeakers are shown as circle markers. Marker sizes are proportional to source amplitudes.

In order to illustrate the validity of the method and of the actual implementation (Fig. 2), a final example is provided for the lasso in a time-domain simulation using the block implementation described earlier. Results are shown in Fig. 8(a) for the case of three virtual sources (one of which is focused (top right)), with L_max = 8 (for each frequency and each block) and with spherical sources as reproduction sources. Clearly, the lasso only activates a few of the loudspeakers and the incoming directions are recreated.

4. DISCUSSION: ARTISTIC POTENTIALS AND APPLICATION EXAMPLES

In this section, the reported simulation results are discussed from the vantage point of potential applications and of the composer working in a music or sound arts context that relies on sound field recording (or, in short, field recording, as it is known in the sound art community).

4.1. Sharp and sparse localization versus diffuse sources

In this paper, it was shown that the lasso and elastic-net algorithms can convert microphone array signals to spatially-sharp, or sparse, source signals. From a composer's viewpoint, this corresponds to enhancing spatial contrast, the amount of which can be controlled by L_max. However, one question that arises is what happens for target fields that are diffuse or created by distributed sources. In this case, we expect that the snapping will change rapidly (at frame rate) and will be different for each frequency, hence creating a potentially diffuse effect. Further investigations are needed, as this would also depend on L_max and on the elastic-net parameter α (see Appendix).

Figure 8: (a) Time domain simulation: two snapshots of the time-domain simulation of the lasso algorithm for a three-source scene using a weighted overlap-add method (top row: target sound field, bottom row: reproduced sound field). Instantaneous absolute values of non-zero source signals are shown as black and green markers (sizes are proportional to amplitude). (b) Distance snapping: schematic illustration of distance snapping. Actual sources are shown as filled circles. Snapped sources are shown as white-filled circles. Recreated wavefronts are schematically illustrated inside the loudspeaker array. Labels in panel (b): infinite radius circle (plane wave sources); a real source snaps to infinity; microphone array; loudspeaker array; a real source snaps to the loudspeaker array.

4.2. Distance snapping

As should now be clear from the comparison of the cases with plane wave versus spherical wave (theoretical loudspeaker) sources, the strict selection ability of the lasso and elastic-net (manifest in the sparsity of the resulting solution) opens up one application potential: an effortless distance snapping in which the incoming directions of the target wave field are preserved but the distances are simply replaced by the distances of the reproduction sources. It is important to keep in mind that this would not be possible using the inverse solution with Tikhonov regularization (regularization of the solution 2-norm only, elastic-net parameter α = 0) since, in this case, many plane waves or spherical waves would be combined and fused into a sound field that strictly (in the LMS sense) approaches the target sound field at the microphone array.
In order to clarify the resulting application potential, two practical examples related to the two reported cases are stated in simple words. For plane wave reproduction sources, any real source that created part of the target sound field will be snapped to an equivalent source at infinity producing plane waves. If the resulting plane wave signals are reproduced using classical WFS of plane waves, the resulting effect is schematically shown in Fig. 8(b) by grey markers. The recreated wavefronts will be flat, as shown in grey in the figure. For spherical wave sources, such as loudspeakers, any real source that contributed to the measured sound field will be replaced by the closest loudspeaker (or group of loudspeakers) in the reproduction array. This is also shown in Fig. 8(b) by black markers. The recreated sound field will emerge from the loudspeaker that is the closest to the actual source that created the target sound field as measured by the microphone array. However, it is important to understand that such snapping can only be effective for a relatively small L_max. This then corresponds to a complex target sound field, for a given time frame and frequency, being replaced by a few single plane wave or spherical wave sources.

4.3. Combined distance snapping

For a composer working with field recordings using microphone arrays of any channel count, such a snapping capacity is interesting if one wants to take a given recording and push it to the far background, as plane waves, and mix it with another recording forced into the foreground using spherical waves or loudspeakers at a given distance as reproduction sources. By mixing down the two results, one can then combine and recompose the overall distances of several recordings for artistic and compositional purposes. One could then imagine various ways to achieve combined distance snapping. This is a single example among many other aesthetic, spatial composition and artistic possibilities that arise if one relies on other distributions of sources combined with the lasso or elastic-net algorithms.

5. CONCLUSION

This paper presented theoretical investigations on the capability of the lasso and elastic-net algorithms to provide a sparse recreation of a target sound field captured using a microphone array for artistic or compositional purposes. Both algorithms were applied to two different scenarios: 1) conversion of the microphone array recordings to plane waves for subsequent reproduction by WFS using classical plane wave operators and 2) reproduction of the microphone array recording while directly looking for a sparse solution at the loudspeakers. Based on the reported simulation results, it was shown that both the lasso and elastic-net can operate on complex-valued quantities and provide sparse solutions (while keeping the maximum number of active sources relatively small) with a sharp spatial resolution. Depending on the application context, this is a desirable feature that the standard LMS approach cannot achieve, since classical vector 2-norm regularization cannot perform strict source selection. Based on the reported results, one can conclude that the elastic-net can perform at least as well as the lasso. Thorough investigations of the behavioral differences between the lasso and elastic-net are now required in order to further identify the potential interest of the elastic-net over the lasso for SFR. On this matter, based on the literature [4], the elastic-net has been shown to have several advantages over the lasso: 1) group selection, 2) capacity to handle high correlations between the column vectors of the transfer function matrix. These should be investigated for the specific context of SFR.

Finally, a snapping effect was observed. Indeed, while limiting the maximum number of active sources in the sparse solution, it was observed that the actual sound source that created the target sound field tends to snap to the closest reproduction source.
For the reported scenarios, this was called distance snapping: in the plane wave conversion case, all target sources are replaced by equivalent plane waves (sources at infinite distance), while in the scenario with spherical waves (theoretical loudspeakers), the actual sources snapped to the closest loudspeakers. This is a promising feature for sound environment reproduction or musical works based on microphone array recordings. The simulation results were provided for simple illustrative scenarios; further investigations and simulations of more complex target sound fields, such as diffuse sound fields or sound created by moving sources, could be an interesting research avenue. The search for solution sparsity should also be further investigated in order to illustrate and quantify its interest for SFR in relation to spatial sound perception. Experimental reproduction of actual recordings and experimental validation of both the lasso and elastic-net solutions are the topics of current work.

6. ACKNOWLEDGMENT

The authors acknowledge the financial support of the IRSST (Institut de recherche Robert-Sauvé en santé et sécurité du travail).

7. REFERENCES

[1] J. Ahrens, Analytical Methods of Sound Field Synthesis, Springer, 2012.
[2] E. Hulsebos, D. de Vries, E. Bourdillat, Improved Microphone Array Configuration for Auralization of Sound Fields by Wave-Field Synthesis, J. Audio Eng. Soc., vol. 50, no. 10, pp. 779-790, October 2002.
[3] R. Tibshirani, Regression Shrinkage and Selection via the Lasso, J. Roy. Stat. Soc., vol. 58, no. 1, pp. 267-288, 1996.
[4] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, J. Roy. Stat. Soc., vol. 67, pp. 301-320, 2005.
[5] G.N. Lilis, D. Angelosante, G.B. Giannakis, Sound Field Reproduction using the Lasso, IEEE Trans. Audio Speech Lang. Process., vol. 18, no. 8, pp. 1902-1912, Nov. 2010.
[6] A. Wabnitz, N. Epain, A. Van Schaik, C. Jin, Time domain reconstruction of spatial sound fields using compressed sensing, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, pp. 465-468, 2011.
[7] M. Cobos, S. Spors, J. Ahrens, J.J. Lopez, On the use of small microphone arrays for wave field synthesis auralization, in Proceedings of the AES 45th International Conference, Helsinki, Finland, March 2012.
[8] P. Schmidt, Improvements in Localization of Planar Acoustic Holography, Master's thesis, University of Music and Performing Arts, Graz, Austria.

8. APPENDIX: PROBLEM DEFINITION

Since the described computations are done for each frequency $f$ and for each time frame $t$, the dependency on frequency and time frame is omitted. The direct problem that provides the complex reproduced sound pressures $\mathbf{p} \in \mathbb{C}^{M \times 1}$ for a source distribution $\mathbf{s} \in \mathbb{C}^{L \times 1}$ is given by

$$\mathbf{p} = \mathbf{G}\mathbf{s} \qquad (1)$$

with the complex transfer function matrix $\mathbf{G} \in \mathbb{C}^{M \times L}$. For plane wave sources and spherical wave sources, the coefficients of $\mathbf{G}$ are given by

$$G_{ml} = e^{j \mathbf{k}_l \cdot \mathbf{r}_m} \ \text{for plane waves}, \qquad G_{ml} = e^{j k r_{ml}} / r_{ml} \ \text{for spherical waves} \qquad (2)$$

where $\mathbf{r}_m$ is the position vector of microphone $m$ and $\mathbf{k}_l$ is the wavenumber vector for source plane wave $l$: $\mathbf{k}_l = k[\cos(\theta_l)\cos(\beta_l) \;\; \sin(\theta_l)\cos(\beta_l) \;\; \sin(\beta_l)]$ (with azimuth $\theta_l$ and elevation $\beta_l$). The wavenumber is $k = 2\pi f/c$ with sound speed $c$. For spherical waves, $r_{ml}$ is the distance from the $l$-th source to the $m$-th microphone.

8.1. Cost function of the lasso

The lasso algorithm is aimed at solving an inverse problem (i.e. given Eq. (1), find $\mathbf{s}$ that minimizes the reproduction error $\mathbf{e} = \mathbf{d} - \mathbf{p}$, with $\mathbf{d}$ being the measured sound field) while favoring sparsity of the resulting solution $\mathbf{s}$ (i.e. a small number of active plane or spherical wave sources). For the lasso, this is achieved by combining reproduction error minimization with a penalization of the solution 1-norm. The cost function $J_\lambda$ is [3]

$$J_\lambda = \frac{1}{2}\|\mathbf{d} - \mathbf{G}\mathbf{s}\|_2^2 + \lambda\|\mathbf{s}\|_1 \qquad (3)$$

where $\|\cdot\|_p$ is the $p$-norm of the argument. Recall that the vector 1-norm is defined as the sum of the magnitudes of the coefficients, $\|\mathbf{s}\|_1 = \sum_{l=1}^{L} |s_l|$. In Eq. (3), $\lambda > 0$ is a penalization parameter.

8.2. Cost function of the elastic-net

The elastic-net retains the lasso's capacity to achieve strict source selection while also introducing the more classical vector 2-norm penalization. The cost function is given by

$$J_{\lambda,\alpha} = \frac{1}{2}\|\mathbf{d} - \mathbf{G}\mathbf{s}\|_2^2 + \lambda\alpha\|\mathbf{s}\|_1 + \lambda\frac{(1-\alpha)}{2}\|\mathbf{s}\|_2^2 \qquad (4)$$

with the elastic-net parameter $\alpha \in [0, 1]$. In effect, $\alpha$ performs a linear blend of the vector 1-norm regularization and classical Tikhonov (vector 2-norm) regularization. With $\alpha = 0$, the elastic-net is simply a least-mean-square problem with Tikhonov regularization (provided that $\lambda > 0$). With $\alpha = 1$, the elastic-net exactly corresponds to the lasso. Accordingly, the elastic-net includes the lasso. Therefore, in the next sections, the developments are only presented for the elastic-net.

9. APPENDIX: COMPLEX COORDINATE-DESCENT ALGORITHM FOR THE ELASTIC-NET

Originally, the lasso and elastic-net algorithms were developed for real-valued quantities ($\mathbf{d}$, $\mathbf{G}$, $\mathbf{s}$, and $\mathbf{p}$) [3, 4].
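In this paper, a development similar to the one found in [5] is adapted to cover both the elastic-net and the lasso.

9.1. Adaptation of the coordinate-descent algorithm

The inverse problem of finding the sources $\mathbf{s}$ given a measured pressure field $\mathbf{d}$ is $\mathbf{s} = \operatorname{argmin}\{J_{\lambda,\alpha}\}$. The complex coordinate-descent is an iterative process that works as follows for the $\ell$-th coefficient in $\mathbf{s}$. The iteration index is denoted by $(i)$. The preceding source coefficients ($l < \ell$) are known to be $[s_1^{(i)} \cdots s_{\ell-1}^{(i)}]$ from the current iteration, and the following source coefficients ($l > \ell$) are known from the previous iteration to be $[s_{\ell+1}^{(i-1)} \cdots s_L^{(i-1)}]$. The algorithm starts with $s_l^{(1)} = 0$. Using these $L-1$ known coefficients, it is possible to compute the reproduction error $\mathbf{e}_\ell^{(i)}$ with the $\ell$-th source missing for the current iteration. The elastic-net problem can then be rewritten using this temporary error $\mathbf{e}_\ell^{(i)}$ for the $s_\ell$ coefficient only:

$$s_\ell^{(i)} = \operatorname{argmin}\{J^{(i)}_{\lambda,\alpha,\ell}\} \qquad (5)$$

with a temporary cost function for source $\ell$ and current iteration

$$J^{(i)}_{\lambda,\alpha,\ell} = \frac{1}{2}\|\mathbf{e}_\ell^{(i)} - \mathbf{g}_\ell s_\ell\|_2^2 + \lambda\alpha|s_\ell| + \frac{(1-\alpha)}{2}\lambda s_\ell^{*} s_\ell \qquad (6)$$

where complex conjugation is denoted by $^{*}$ and where $\mathbf{g}_\ell$ is the $\ell$-th column of $\mathbf{G}$. It turns out that the last equation is differentiable with respect to $\rho_\ell$ and $\phi_\ell$ if one uses a polar representation $s_\ell = \rho_\ell e^{j\phi_\ell}$. In the previous equation and from now on, the iteration superscript of $s_\ell^{(i)}$ is dropped for the sake of simplicity. Therefore, $s_\ell$ is $s_\ell^{(i)}$, $\rho_\ell$ is $\rho_\ell^{(i)}$, and $\phi_\ell$ is $\phi_\ell^{(i)}$. By first requiring $\partial J^{(i)}_{\lambda,\alpha,\ell}/\partial\phi_\ell = 0$ with $\rho_\ell > 0$, one finds that

$$\phi_\ell = \angle\left(\mathbf{g}_\ell^{H}\mathbf{e}_\ell^{(i)}\right) \qquad (7)$$

where $\angle$ designates the phase of the argument. Finally, requiring $\partial J^{(i)}_{\lambda,\alpha,\ell}/\partial\rho_\ell = 0$ with $\rho_\ell > 0$ gives, after some developments,

$$\rho_\ell = \begin{cases} 0 & \text{if } |\mathbf{g}_\ell^{H}\mathbf{e}_\ell^{(i)}| \le \lambda\alpha \\ \left(\|\mathbf{g}_\ell\|_2^2 + \lambda(1-\alpha)\right)^{-1}\left(|\mathbf{g}_\ell^{H}\mathbf{e}_\ell^{(i)}| - \lambda\alpha\right) & \text{if } |\mathbf{g}_\ell^{H}\mathbf{e}_\ell^{(i)}| > \lambda\alpha \end{cases} \qquad (8)$$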
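The phase and magnitude updates of Eqs. (7) and (8) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `plane_wave_matrix` and `elastic_net_cd` are hypothetical, only horizontal plane waves (zero elevation) are built, a fixed number of sweeps `n_sweeps` replaces any convergence test, and the example geometry and parameter values are arbitrary. The sweep order follows the description above: when a coefficient is updated, the preceding coefficients come from the current sweep and the following ones from the previous sweep.

```python
import cmath
import math


def plane_wave_matrix(mic_positions, azimuths, k):
    """Plane-wave part of Eq. (2), G[m][l] = exp(j k_l . r_m), for zero elevation."""
    return [[cmath.exp(1j * k * (math.cos(th) * x + math.sin(th) * y))
             for th in azimuths]
            for (x, y) in mic_positions]


def elastic_net_cd(G, d, lam, alpha, n_sweeps=200):
    """Complex coordinate descent for the elastic-net cost of Eq. (4).

    G is a list of M rows of L complex numbers, d a list of M complex numbers.
    """
    M, L = len(G), len(G[0])
    s = [0j] * L                                   # all-zero starting solution
    col_norm2 = [sum(abs(G[m][l]) ** 2 for m in range(M)) for l in range(L)]
    r = list(d)                                    # running residual d - G s
    for _ in range(n_sweeps):
        for l in range(L):
            # g_l^H e, where e = r + g_l s_l is the error with source l removed
            c = sum(G[m][l].conjugate() * (r[m] + G[m][l] * s[l])
                    for m in range(M))
            mag = abs(c)
            if mag <= lam * alpha:                 # first case of Eq. (8)
                new = 0j
            else:                                  # second case of Eq. (8)
                rho = (mag - lam * alpha) / (col_norm2[l] + lam * (1 - alpha))
                new = rho * c / mag                # rho * exp(j * phase), Eq. (7)
            delta = new - s[l]
            if delta != 0:
                for m in range(M):                 # keep r equal to d - G s
                    r[m] -= G[m][l] * delta
                s[l] = new
    return s


if __name__ == "__main__":
    # Hypothetical example: 8 microphones on a 0.5 m circle, 16 candidate plane waves.
    k = 2 * math.pi * 500.0 / 343.0                # k = 2 pi f / c
    mics = [(0.5 * math.cos(2 * math.pi * m / 8),
             0.5 * math.sin(2 * math.pi * m / 8)) for m in range(8)]
    azimuths = [2 * math.pi * l / 16 for l in range(16)]
    G = plane_wave_matrix(mics, azimuths, k)
    d = [row[3] for row in G]                      # target: unit plane wave, candidate 3
    s = elastic_net_cd(G, d, lam=0.5, alpha=0.9)
    print([l for l, sl in enumerate(s) if sl != 0])  # indices of active sources
```

Setting `alpha=1.0` gives the lasso update (complex soft thresholding), while `alpha=0.0` removes the threshold and leaves a Tikhonov-type shrinkage, in line with the limiting cases discussed in Section 8.2. Keeping the running residual `r` avoids recomputing the full error vector for every coefficient.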