FROM BLIND SOURCE SEPARATION TO BLIND SOURCE CANCELLATION IN THE UNDERDETERMINED CASE: A NEW APPROACH BASED ON TIME-FREQUENCY ANALYSIS

Frédéric Abrard and Yannick Deville
Laboratoire d'Acoustique, de Métrologie et d'Instrumentation
Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse cedex, France
abrard@cict.fr, ydeville@cict.fr

Paul White
Signal Processing and Control Group, Institute of Sound and Vibration Research
University of Southampton, Highfield, SO17 1BJ, England
prw@isvr.soton.ac.uk

ABSTRACT

Many source separation methods are restricted to non-Gaussian, stationary and independent sources. This raises problems in real applications, where the sources often do not match these hypotheses. Moreover, in some cases we are dealing with more sources than available observations, which is critical for most classical separation approaches. In this paper, we propose a new, simple separation method which uses time-frequency information to cancel one source signal from two observations in linear instantaneous mixtures. This efficient method is directly designed for non-stationary sources and applies to various dependent or Gaussian signals which have different time-frequency representations. Its other attractive feature is that it performs cancellation when the two considered mixtures contain more than two sources. Detailed results concerning mixtures of speech and music signals are presented in this paper.

1. INTRODUCTION

We first consider the following mixture model:

$$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t), \qquad x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) \tag{1}$$

where the coefficients a_{ij} are real and constant. Our goal is to find a method for separating the two source signals s_1(t) and s_2(t) from the two observations x_1(t) and x_2(t) without knowing the mixing coefficients or the sources. This problem is called Blind Source Separation (BSS) and is well known in the signal processing community. Writing Eq. (1) in matrix notation, X(t) = A S(t), this problem is equivalent to finding an inverse matrix C such that

$$C A = P D$$

where P is a permutation matrix and D is a diagonal matrix [1]. One can find a review of many methods for achieving this separation in [1]. Most of them are statistics-based methods including an adaptive part, and they can only be applied to specific signals, such as stationary and non-Gaussian signals. Moreover, these methods need the signals to be independent and often fail when more sources than observations are present.

In particular, we recently proposed an approach based on fourth-order normalized cumulants (i.e. kurtosis) [2], which solves the problem when the number of sources is equal to the number of observations. This method consists in finding a linear combination of the two observations

$$y(t) = x_1(t) - c\, x_2(t) \tag{2}$$

which extracts one source up to a scale factor. The proper separating coefficients c for extracting s_1 or s_2 by means of Eq. (2) are respectively:

$$c_1 = \frac{a_{12}}{a_{22}}, \qquad c_2 = \frac{a_{11}}{a_{21}} \tag{3}$$

We also proposed a related solution for the underdetermined case, which cancels the influence of the stationary sources during the adaptation step in order to achieve a partial separation [3], [4], [5]. These methods are efficient, but the sources must be non-Gaussian and independent, and must exhibit specific stationarity properties. We show in this paper that these restrictions can be relaxed if we use the time and frequency information of the signals. A few authors [6], [7] proposed solutions using time-frequency information, but their approaches are complex and require a high computational load. With the same separation structure as in Eq. (2), we propose here a new, simple method for cancelling one source, with fewer restrictions than classical methods.
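The role of Eq. (3) can be checked numerically once A is known. The following is a minimal sketch, assuming NumPy, an arbitrary illustrative mixing matrix (not one used in this paper) and a hypothetical helper `separating_coefficients`: each coefficient, plugged into Eq. (2), removes one source exactly and leaves the other up to a scale factor, whatever the statistics of the sources (Gaussian sources are used here on purpose).

```python
import numpy as np

def separating_coefficients(A):
    """Coefficients of Eq. (3) for the combination y = x1 - c * x2 of Eq. (2)."""
    c1 = A[0, 1] / A[1, 1]   # cancels s2, so y is proportional to s1
    c2 = A[0, 0] / A[1, 0]   # cancels s1, so y is proportional to s2
    return c1, c2

rng = np.random.default_rng(0)
S = rng.standard_normal((2, 1000))       # two arbitrary (here Gaussian) source signals
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])               # illustrative mixing matrix (assumed values)
X = A @ S                                # Eq. (1) in matrix form: X = A S

c1, c2 = separating_coefficients(A)
y1 = X[0] - c1 * X[1]                    # equals 0.76 * s1
y2 = X[0] - c2 * X[1]                    # equals -1.9 * s2
```

In practice A is unknown; the rest of the paper is about estimating c_1 and c_2 from the observations alone.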

2. PRELIMINARY IDEA: TEMPORAL ANALYSIS

If we can find sections in the time domain where x_1 and x_2 contain only the contribution of one source, we can easily find the separating coefficient values c that we introduced in Eq. (3). For example, if we can find a time t_0 such that s_2(t_0) = 0, then (1) yields:

$$x_1(t_0) = a_{11} s_1(t_0), \qquad x_2(t_0) = a_{21} s_1(t_0) \tag{4}$$

By computing the ratio

$$\frac{x_1(t_0)}{x_2(t_0)} = \frac{a_{11}}{a_{21}} \tag{5}$$

we directly obtain the value c_2, which extracts the source s_2. This means that, in theory, we only need one source to disappear at a single time t_0 to find a separating coefficient. This is a really simple separation method but, unfortunately, it is usually hard to find an instant or interval where only one source occurs. To overcome this problem, we propose a new approach exploiting the time-frequency domain.

3. TIME-FREQUENCY ANALYSIS

In the previous section, we presented a technique for finding the separating coefficient if one source disappears over a known short time interval. We now need a more general method, which solves this problem when both sources are simultaneously present or when one does not know when these sources disappear. To this end, we make the following assumptions:

Assumption 1. The time-frequency transform of each source must be different for time-adjacent time-frequency windows.¹
Assumption 2. There must exist some time-frequency windows where only one source is present.²

¹ Due to statistical fluctuations, even white noise signals with theoretically constant power spectral densities satisfy this assumption for short windows in practice.
² This situation is really common in speech or music, for example: the formants of the same or different speakers/instruments are located in different time-frequency areas depending on the produced sound.

Many powerful time-frequency methods have been developed during the last fifty years for different application fields. One can find most of them, with detailed references, in [8], [9], [10], [11]. To avoid the interference areas present in existing quadratic and higher-order methods, the most relevant starting point for our problem is the simple short-time Fourier transform of the observations, as defined in [10].
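Before moving to the time-frequency domain, the temporal idea of Section 2 and its limitation can be checked on synthetic data. This is a minimal sketch, assuming NumPy, an illustrative mixing matrix and a known interval where s_2 is silent (all values are hypothetical): on that interval the sample-wise ratio x_1/x_2 of Eq. (5) is constant and equal to c_2 = a_{11}/a_{21}, whereas it fluctuates wherever both sources are active.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
s1 = rng.standard_normal(n)
s2 = rng.standard_normal(n)
s2[500:700] = 0.0                          # assumption: s2 is silent on a known interval

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                 # illustrative mixing matrix (assumed values)
x1 = A[0, 0] * s1 + A[0, 1] * s2           # Eq. (1)
x2 = A[1, 0] * s1 + A[1, 1] * s2

ratio = x1 / x2                            # sample-wise ratio of Eq. (5)
silent = ratio[500:700]                    # only s1 present: constant, equal to a11/a21
active = ratio[:500]                       # both sources present: fluctuates strongly
print(silent.min(), silent.max(), active.std())
```

The sketch needs the silent interval to be known in advance, which is exactly what the time-frequency approach below avoids.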
We first multiply each mixed signal x_i(τ) by a shifted Hanning window function h(τ − t), centered at time t, to produce the modified signal:

$$x_{i,t}(\tau) = x_i(\tau)\, h(\tau - t) \tag{6}$$

This new function is a function of two times: the fixed time we are interested in, t, and the running time, τ. We then compute the short-time Fourier transform of each x_{i,t}(τ), i.e.:

$$X_i(t,\omega) = \int x_i(\tau)\, h(\tau - t)\, e^{-j\omega\tau}\, d\tau \tag{7}$$

Our goal is now to find some time-frequency domains where only one source occurs. To this end we introduce the complex ratio:

$$\alpha(t,\omega) = \frac{X_1(t,\omega)}{X_2(t,\omega)} \tag{8}$$

This ratio is computed for each time window and angular frequency window. With Eq. (1), and denoting by S_j(t,ω) the short-time Fourier transform of s_j defined as in (7), the linearity of (7) leads to:

$$\alpha(t,\omega) = \frac{a_{11} S_1(t,\omega) + a_{12} S_2(t,\omega)}{a_{21} S_1(t,\omega) + a_{22} S_2(t,\omega)} \tag{9}$$

One can easily see that if one source does not have any component at (t, ω), i.e. in the Hanning window and frequency window respectively centered on t and ω, then α(t,ω) is real and equal to the value of the separating coefficient for extracting this source. For example, if S_2 is missing, α(t,ω) becomes:

$$\alpha(t,\omega) = \frac{a_{11}}{a_{21}} \tag{10}$$

which is the correct coefficient c_2 to extract s_2 with Eq. (2). This situation, where the sources have slightly different time-frequency representations, is more frequent than the case where one source disappears during a time period. For example, the time-frequency representations of two people speaking at the same time are different.

Now the remaining question is: how can we find the time-frequency domains where only one source occurs? Our idea is that, in these domains, each α value is ideally equal to c_1 or c_2, whereas α takes different values in all the other regions. In particular, if only S_1 is present in several successive time windows, then α is constant and equal to c_2 over these windows, whereas it successively takes different values if both sources are present AND if their time-frequency representations are not constant. To exploit this, we compute the statistical variance of α on a limited series T of M short, half-overlapping time windows, and this for each frequency window.
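Eqs. (8)–(10) can be illustrated on a discrete short-time Fourier transform. The sketch below assumes NumPy, a hypothetical `stft` helper with half-overlapping Hanning windows (standing in for Eq. (7)), and toy amplitude-modulated tones whose spectra do not overlap; it only shows that α equals a_{11}/a_{21} in the bins occupied by s_1 alone and a_{12}/a_{22} in the bins occupied by s_2 alone.

```python
import numpy as np

def stft(x, nperseg=128):
    """STFT with half-overlapping Hanning windows.
    Rows index the window centre t, columns the frequency bin omega."""
    hop = nperseg // 2
    win = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)
    frames = np.stack([x[s:s + nperseg] * win for s in starts])
    return np.fft.rfft(frames, axis=1)

n = np.arange(8192)
# Assumed toy sources with disjoint spectra: s1 lives near bin 10, s2 near bin 40.
s1 = (1.0 + 0.5 * np.sin(2 * np.pi * n / 900)) * np.cos(2 * np.pi * 10 * n / 128)
s2 = (1.0 + 0.5 * np.cos(2 * np.pi * n / 700)) * np.cos(2 * np.pi * 40 * n / 128)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                 # illustrative mixing matrix (assumed values)
x1, x2 = A @ np.vstack([s1, s2])           # Eq. (1)

X1, X2 = stft(x1), stft(x2)
alpha = X1 / X2                            # complex ratio of Eq. (8)
alpha_s1_only = alpha[:, 10]               # only s1 here: close to a11/a21 = c2
alpha_s2_only = alpha[:, 40]               # only s2 here: close to a12/a22 = c1
```

The values are only approximately constant because the Hanning window leaks a very small amount of each tone into distant bins.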

We respectively define the mean and variance of α over these M windows, centered at times t_1, ..., t_M, by:

$$\bar{\alpha}_T(\omega) = \frac{1}{M} \sum_{m=1}^{M} \alpha(t_m,\omega) \tag{11}$$

$$\operatorname{var}_T(\omega) = \frac{1}{M} \sum_{m=1}^{M} \left| \alpha(t_m,\omega) - \bar{\alpha}_T(\omega) \right|^2 \tag{12}$$

If, e.g., S_2(t_m, ω) = 0 for these M windows, then Eq. (9) shows that α is constant over them, so that its variance var_T(ω) is equal to zero. Conversely, if both S_1 and S_2 are different from zero AND take non-constant values over T, then var_T(ω) is significantly different from zero. So, by searching for the lowest value of expression (12) over all the available series of windows T and all frequencies ω, we directly find a time-frequency domain (T, ω) where only one source is present. The corresponding value of c to cancel this source is then given by the mean computed in Eq. (11). To find the second separating coefficient, we just have to look for the next lowest value of expression (12) over (T, ω) which gives a significantly different mean. A difference of […] is a good practical value, which handles hard mixtures, where both separating coefficients c_1 and c_2 are in a similar range. We thus obtain the two best estimates of the separating coefficients given in Eq. (3).

4. EXTENSION TO THE UNDERDETERMINED CASE

The previous criterion allows one to cancel one source in the observations if there exists a time-frequency window where only this source occurs. This criterion may be extended to the case where we have two observations of N sources. In this case, the observed signals become:

$$x_1(t) = \sum_{j=1}^{N} a_{1j} s_j(t), \qquad x_2(t) = \sum_{j=1}^{N} a_{2j} s_j(t) \tag{13}$$

The complex ratio of Eq. (8) here reads:

$$\alpha(t,\omega) = \frac{\sum_{j=1}^{N} a_{1j} S_j(t,\omega)}{\sum_{j=1}^{N} a_{2j} S_j(t,\omega)} \tag{14}$$

One can see in Eq. (14) that if only S_k exists in a time-frequency window, we obtain the same kind of expression as in Eq. (10), i.e.:

$$\alpha(t,\omega) = \frac{a_{1k}}{a_{2k}} \tag{15}$$

This value gives the exact coefficient to cancel the contribution of s_k in the observations by using (2): with c = a_{1k}/a_{2k}, Eq. (2) yields $y(t) = \sum_{j \neq k} (a_{1j} - c\, a_{2j})\, s_j(t)$. The only restriction is, once again, that there must exist a time-frequency window where only this source occurs and that the time-frequency transform of each source is not constant over T.

This solution is perfectly suited to noise reduction, for example. By determining a time-frequency window where only the noise occurs, this method gives an efficient solution to cancel it, under the assumption that the signal considered as noise is the same in both observations, up to a scale factor. This method also applies to karaoke-like applications. Using the stereo observation of a recorded song, we are able, under Assumptions 1 and 2, to cancel the contribution of a singer or an instrument.
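The variance criterion of Eqs. (11)–(12), applied to the underdetermined model of Eq. (13), can be sketched as follows. This is an illustration under stated assumptions, not the exact implementation used in this paper: it assumes NumPy, hypothetical helpers `stft` and `low_variance_coefficients`, consecutive non-overlapping series T of M = 8 windows, and a 10 % relative difference as the "significantly different" threshold (the practical value used in the paper is not reproduced here). The toy mixture has three sources and two observations; each source owns one frequency bin, all three overlap in another bin, and weak independent noise is added to each observation.

```python
import numpy as np

def stft(x, nperseg=128):
    """STFT with half-overlapping Hanning windows (rows: time windows, columns: frequency bins)."""
    hop = nperseg // 2
    win = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)
    return np.fft.rfft(np.stack([x[s:s + nperseg] * win for s in starts]), axis=1)

def low_variance_coefficients(x1, x2, n_coeffs, M=8, nperseg=128, min_rel_diff=0.1):
    """Hypothetical helper: cancelling coefficients taken from the lowest-variance (T, omega) cells."""
    alpha = stft(x1, nperseg) / stft(x2, nperseg)            # complex ratio, Eqs. (8)/(14)
    n_series = alpha.shape[0] // M
    # Assumption: consecutive, non-overlapping series T of M windows each.
    a = alpha[:n_series * M].reshape(n_series, M, -1)         # (series T, window, frequency)
    mean = a.mean(axis=1)                                     # Eq. (11)
    var = (np.abs(a - mean[:, None, :]) ** 2).mean(axis=1)    # Eq. (12)
    coeffs = []
    for idx in np.argsort(var, axis=None):                    # lowest variance first
        c = mean.flat[idx].real                               # alpha is ideally real there
        # Keep a value only if it is "significantly different" from those already found.
        if all(abs(c - k) > min_rel_diff * abs(k) for k in coeffs):
            coeffs.append(c)
            if len(coeffs) == n_coeffs:
                break
    return coeffs

# Toy underdetermined mixture: N = 3 sources, 2 observations (all values assumed).
rng = np.random.default_rng(0)
n = np.arange(16384)

def tone(k):
    """Cosine centred on FFT bin k of a 128-point frame."""
    return np.cos(2 * np.pi * k * n / 128)

envelopes = [1 + 0.9 * np.sin(2 * np.pi * n / p) for p in (900, 700, 1100)]
# Each source owns one bin (10, 30 or 50); all three overlap in bin 60.
S = np.vstack([e * (tone(k) + tone(60)) for e, k in zip(envelopes, (10, 30, 50))])
A = np.array([[1.0, 0.6, 1.0],
              [0.4, 1.0, 1.0]])               # third source "in the middle": a13 = a23
x1, x2 = A @ S + 1e-4 * rng.standard_normal((2, n.size))   # weak independent sensor noise

coeffs = sorted(low_variance_coefficients(x1, x2, n_coeffs=3))
# Pick the estimate of a13 / a23 (known to be 1 here only because the toy mixture is ours).
c3 = min(coeffs, key=lambda c: abs(c - 1.0))
y = x1 - c3 * x2                              # Eq. (2): the third source is cancelled
```

In the bin shared by all sources the envelopes vary differently, so α fluctuates and its variance stays high, while the single-source bins give near-zero variance and means close to a_{1k}/a_{2k}.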
This performs perfect cancellation if no global stereo reverberation is added to the song, since reverberation would transform the instantaneous mixture³ into a convolutive mixture. Moreover, experimental tests show that even in this latter case we cancel an important part of one source, because the reverberation normally has a lower level than the instantaneous contribution. The main drawback for such applications is that the linear combination of the observations performed by our method, as shown in (2), changes the balance between the instruments and gives a mono output.

³ Usually, all the instruments are recorded one by one and then artificially mixed using linear instantaneous mixing devices.

5. EXPERIMENTAL RESULTS

5.1. Configuration with two mixtures of two sources

We choose the mixing matrix as:

$$A = [\ldots] \tag{16}$$

The two theoretical separating coefficients are then, according to Eq. (3), c_1 = […] and c_2 = […]. This first test was performed using two different voice signals recorded from the radio at a sampling rate of 8000 Hz. We compute the short-time Fourier transform on 128-sample half-overlapping windows, which corresponds to 16 ms. The series T used for the variance analysis consists of M of these windows, which means that a source only needs to occur alone in one time-frequency window during […] ms to be cancelled. With these settings, our method yields c_1 = […] and c_2 = […], which are quite close to the target values. The respective observed variances are […] and […]. Figures 1 to 6 show the time-domain representations of the sources, mixtures and output signals. Figures 7 to 10 show the time-frequency analysis of these source and mixture signals. One can see that the time-frequency representations of the sources in Figures 7 and 8 are only slightly different. These signals can be considered
as a difficult configuration, because the formants of both voices are present in nearly the same time-frequency areas. The two mixtures in Figures 9 and 10 are very similar, and the plain ratio shown in Figure 11 does not allow one to localize the domains of constant values, which shows the need to compute the variance of this ratio as described in (12). For better legibility, the inverse of the variance is presented in Figure 12, which enhances the domains where the variance is low. One can easily see which time-frequency domains provide the proper solutions for the separating coefficients. Figures 5 and 6 show that the separation is achieved with high resolution. On listening to these signals, the difference between the original and separated signals is not perceptible.

5.2. Configuration with two mixtures of three sources

We recorded a stereo song with a continuous voice and two guitars which play nearly the same instrumental part. The purpose here is to show the ability of the proposed approach to cancel the voice from the mixtures, although the guitars are playing continuously. All these sources were recorded one by one on a multitrack tape recorder with an SNR of around […] dB. We sampled the signals from the console at 44.1 kHz with 16-bit resolution and then artificially mixed them with the following mixing matrix:

$$A = [\ldots] \tag{17}$$

The coefficients of one column of A correspond to the voice, whereas the other coefficients are for the guitars. We chose to put the voice in the middle of the stereo image, as in a regular mix. Thus the theoretical separating coefficient for the voice is c = 1. With Eq. (15), one can also see that the two separating coefficients which separately cancel each guitar are […] and […]. The length of the windows for the Fourier transform is set to 256 samples, which corresponds to 5.8 ms. The variance is then computed on 10 of these windows, which corresponds to 58 ms. We used […] seconds of the song, during which all three sources are present, to compute the separating coefficients. Using the method proposed in the previous subsection, we obtained the two coefficients c = […], with a variance of […], and c = […], with a variance of […]. Thus the voice cancellation is
nearly perfect. The obtained output is a mono signal with the new mixing coefficients given by (17) and (2): […], which gives an ideal karaoke playback. As we have nearly half the power of the guitars in the output as compared to any input, an approximate value of the voice attenuation is given by […]. Figures 13, 14 and 15 show the time-frequency representations of the first guitar, the second guitar and the voice. One can see that one of the two guitars has a richer frequency content than the other one, which is confirmed by listening to their respective sounds. Thus, even if both guitars play the same instrumental part, there exist some differences in the time-frequency representations of their signals. We notice in Figure 15 that, unlike the guitars, the voice includes high-medium and high-frequency components, situated between 7 kHz and 15 kHz. Thus only the voice exists in this band. None of these three sources contains frequencies above 15 kHz, so the remaining signal between 15 kHz and 22 kHz is noise. Figures 16 and 17 show the time-frequency representations of the left and right sides of the stereo input, which look very similar. The inverse variance graph in Figure 18 is of particular interest: most low-variance points lie in a frequency band, i.e. 7 to 15 kHz, where only the voice is present. No low-variance point exists for frequencies higher than 15 kHz, because no source occurs in these regions and the respective noises added to each source do not produce constant time-frequency values, i.e. they are not short-time stationary. Only a few low-variance points exist for frequencies lower than 7 kHz, because both guitars occur there, they play the same chords, and the voice has the same fundamental tone. It is therefore hard to find areas with only one source below 7 kHz. Our method performs voice cancellation by self-focusing on the time-frequency domains where only the voice is present. It also gives a separating coefficient to cancel a guitar, which might be hard to find because of the similarity of the produced sounds. We have demonstrated here that time-frequency information allows one to perform a nearly perfect cancellation. We obtained similar results on mixtures realised on a studio mixing console.
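How the new mixing coefficients of the mono output follow from Eqs. (17) and (2) can be shown with a few lines of arithmetic. This is a minimal sketch, assuming NumPy and an illustrative 2 × 3 matrix (the numerical entries of Eq. (17) are not reproduced here), with the voice panned to the centre so that its two coefficients are equal: each source j ends up weighted by a_{1j} − c a_{2j}, and the voice weight vanishes for c = 1.

```python
import numpy as np

# Illustrative 2 x 3 mixing matrix (assumed values, not the ones of Eq. (17)).
# Columns: guitar 1, guitar 2, voice; the voice is panned to the centre (a_1v = a_2v).
A = np.array([[0.8, 0.4, 0.7],
              [0.3, 0.9, 0.7]])

c_voice = A[0, 2] / A[1, 2]            # Eq. (15): a centred voice gives c = 1
# Mono output y = x1 - c * x2: source j now has the weight a_1j - c * a_2j.
output_weights = A[0] - c_voice * A[1]
print("c =", c_voice, "output weights:", output_weights)   # voice weight is exactly 0
```

With these assumed values both guitars survive in the mono output, but with weights that differ from their original stereo gains, which is the balance change mentioned in Section 4.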
6. CONCLUSION

We proposed here an efficient method for solving the linear instantaneous blind source separation problem with two mixtures of two sources. This method also performs very well in karaoke-like applications, when only two observations of more than two sources are available. Unlike classical methods [1], this new approach based on time-frequency analysis only needs the sources to be non-stationary and to have some differences in their time-frequency representations. Thus no assumption is made about the Gaussianity, coloration or independence of the sources. This makes it possible to separate some signals which are often excluded by other methods. Moreover, this method directly achieves cancellation without any convergence issues and is much simpler than the few time-frequency methods that were previously reported [6], [7]. Many tests have been performed on speech and music samples and show the
robustness of this approach.

7. REFERENCES

[1] J. F. Cardoso, "Blind signal separation: statistical principles," Proceedings of the IEEE, vol. 86, no. 10, pp. 2009–2025, October 1998.
[2] Y. Deville, "A source separation criterion based on signed normalized kurtosis," in Proceedings of the International Workshop on Electronics, Control, Measurement and Signals (ECMS 99), Liberec, Czech Republic, May–June 1999.
[3] Y. Deville, F. Abrard, and M. Benali, "A new source separation concept and its validation on a preliminary speech enhancement configuration," in Proceedings of CFA 2000, Lausanne, Switzerland, September 2000.
[4] Y. Deville and M. Benali, "Differential source separation: concept and application to a criterion based on differential normalized kurtosis," in Proceedings of EUSIPCO 2000, Tampere, Finland, September 4–8, 2000.
[5] F. Abrard, Y. Deville, and M. Benali, "Numerical and analytical solution to the differential source separation problem," in Proceedings of EUSIPCO 2000, Tampere, Finland, September 4–8, 2000.
[6] A. Belouchrani and M. G. Amin, "Blind source separation based on time-frequency signal representations," IEEE Transactions on Signal Processing, vol. 46, no. 11, pp. 2888–2897, November 1998.
[7] M. Zibulevsky and B. A. Pearlmutter, "Blind source separation by sparse decomposition in a signal dictionary," in Independent Component Analysis: Principles and Practice, S. J. Roberts and R. M. Everson, Eds., Cambridge University Press, 2001.
[8] J. K. Hammond and P. R. White, "The analysis of non-stationary signals using time-frequency methods," Journal of Sound and Vibration, pp. 419–447, 1996.
[9] F. Hlawatsch and G. F. Boudreaux-Bartels, "Linear and quadratic time-frequency signal representations," IEEE Signal Processing Magazine, vol. 9, pp. 21–67, April 1992.
[10] L. Cohen, Time-Frequency Analysis, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1995.
[11] L. Cohen, "Time-frequency distributions: a review," Proceedings of the IEEE, vol. 77, no. 7, pp. 941–979, July 1989.

Fig. 1. Source s_1 in the time domain.
Fig. 2. Source s_2 in the time domain.
Fig. 3. Mixed signal x_1 in the time domain.
Fig. 4. Mixed signal x_2 in the time domain.
Fig. 5. Output signal y_1 in the time domain.
Fig. 6. Output signal y_2 in the time domain.
Fig. 7. Time-frequency representation of s_1.
Fig. 8. Time-frequency representation of s_2.
Fig. 9. Time-frequency representation of x_1.
Fig. 10. Time-frequency representation of x_2.
Fig. 11. Time-frequency map of the ratio X_1(t,f)/X_2(t,f).
Fig. 12. Time-frequency map of 1/variance (axis units: window indices).
Fig. 13. Time-frequency representation of guitar 1.
Fig. 14. Time-frequency representation of guitar 2.
Fig. 15. Time-frequency representation of the voice.
Fig. 16. Time-frequency representation of the left input.
Fig. 17. Time-frequency representation of the right input.
Fig. 18. Time-frequency map of 1/variance (axis units: window indices).