I-Hao Hsiao, Chun-Tang Chao*, and Chi-Jo Wang (2016). A HHT-Based Music Synthesizer. Intelligent Technologies and Engineering Systems, Lecture Notes in Electrical Engineering (LNEE), Vol.345, pp.523-528. (EI Journal, ISBN: 978-3-319-17313-9).
Chapter 67
A HHT-Based Music Synthesizer

I-Hao Hsiao, Chun-Tang Chao (*), and Chi-Jo Wang

I.-H. Hsiao, C.-T. Chao (*), C.-J. Wang: Department of Electrical Engineering, Southern Taiwan University of Science and Technology, Tainan, Taiwan. e-mail: tang@mail.stust.edu.tw

Springer International Publishing Switzerland 2016. J. Juang (ed.), Proceedings of the 3rd International Conference on Intelligent Technologies and Engineering Systems (ICITES2014), Lecture Notes in Electrical Engineering 345, DOI 10.1007/978-3-319-17314-6_67

Abstract

Synthesizing musical sound plays an important role in modern music composition. With a good music synthesis method, composers can use powerful and user-friendly personal computers to produce the desired musical sound. In this chapter, the Hilbert-Huang Transform (HHT) time-frequency analysis method is employed in an attempt to implement a new, efficient music synthesizer. By applying the HHT technique, the original varying-pitch music signal can be decomposed into several intrinsic mode functions (IMFs) by means of empirical mode decomposition (EMD). The instantaneous amplitude and frequency of each IMF can then be obtained by the Hilbert transform. By extracting the main spectrum coefficients of the instantaneous amplitude and frequency of the IMFs, the original musical signal can be reconstructed with little error. Experimental results indicate the feasibility of the proposed method.

Keywords: Music synthesis · Hilbert-Huang transform (HHT) · Empirical mode decomposition (EMD) · Intrinsic mode function (IMF)

67.1 Introduction

Among conventional music synthesis methods, the two most popular are probably wavetable synthesis [1] and FM synthesis [2]. A good synthesis method allows music creators to synthesize a sound signal accurately and quickly. However, these two methods have been unable to provide satisfactory quality for high-performance applications. In recent years, the trend for musical-tone generators has been toward physical modeling of sound production mechanisms [3]. The digital waveguide filter [4, 5] can be applied to simulate a wide class of musical instruments. Figure 67.1 shows the nonlinear predictive model of an instrument. The excitation unit (Exciter) is the nonlinear part, responsible for generating an oscillatory signal source, and the resonance unit (Resonator) is the linear filter part, responsible for shaping that source into the sound signal.

Fig. 67.1 The nonlinear predictive model of an instrument

Figure 67.2 shows a simple model-based structure implemented by IIR (infinite impulse response) synthesis, consisting of a prediction filter and a delay line, to synthesize tones produced by instruments [6]. The coefficients of the IIR synthesizer are designed with a neural network (NN)-based training algorithm, and a recurrent NN (RNN) is applied for the prediction filter design. However, such design approaches can be time-consuming during the training process.

Fig. 67.2 The nonlinear predictive model of an instrument

In 1998, Huang et al. [7] developed a new method called the Hilbert-Huang Transform (HHT) for analyzing nonlinear and nonstationary data. The HHT should be more powerful and more suitable for timbre analysis than the traditional Short-Time Fourier Transform (STFT). Building on the HHT method, this chapter proposes a more efficient HHT-based music synthesizer.

67.2 The HHT and EMD

The HHT was pioneered by Huang et al. for adaptively representing nonstationary signals as sums of zero-mean amplitude-modulated and frequency-modulated (AM-FM) components. The Fourier Transform views the signal as a combination of many fixed-frequency and fixed-amplitude sinusoids. The HHT regards the signal as a combination of many intrinsic mode functions (IMFs), which have time-varying frequency (instantaneous frequency) and time-varying amplitude (instantaneous amplitude) [8]. Thus, the HHT provides a more powerful analysis and synthesis tool for the pitch and timbre of a musical sound. In this section, the HHT and EMD are briefly introduced.

There are two steps in the HHT: (1) for a given signal $x(t)$, extract the IMFs by means of empirical mode decomposition (EMD); and (2) apply the Hilbert Transform to each IMF to get the corresponding instantaneous frequency and amplitude. Step 1 is repeated until the residue becomes a monotonic function, or a function with only one cycle, from which no more IMFs can be extracted. Equation (67.1) shows the decomposition of $x(t)$ into $N$ empirical modes, where $c_j(t)$ is the $j$th IMF and $r_N(t)$ is the final residue.

$$x(t) = \sum_{j=1}^{N} c_j(t) + r_N(t) \tag{67.1}$$

In Step 2, the Hilbert Transform is used to obtain an analytic complex representation $z(t)$ for each IMF $c(t)$, as shown in (67.2), where $d(t)$ is the Hilbert Transform of $c(t)$. The instantaneous amplitude and instantaneous phase are denoted as $a(t)$ and $\theta(t)$, respectively; that is, $a(t) = \sqrt{c^2(t) + d^2(t)}$ and $\theta(t) = \arctan\left(d(t)/c(t)\right)$.

$$z(t) = c(t) + i\,d(t) = a(t)\,e^{i\theta(t)} \tag{67.2}$$

Then the original signal $x(t)$ can be represented as

$$x(t) = \mathrm{Re}\left\{ \sum_{j=1}^{N} a_j(t)\, e^{\,i \int \omega_j(\tau)\, d\tau} \right\} \tag{67.3}$$

where $\omega(t) = \dfrac{d\theta(t)}{dt}$ is the instantaneous frequency and Re denotes the real part. Equation (67.3) shows the difference between the HHT and the discrete Fourier Transform: in the HHT, each component is a sinusoid with time-varying amplitude and time-varying frequency. For brevity, the instantaneous frequency and the instantaneous amplitude are referred to as IF and IA in the following text. For each IMF, its corresponding IF and IA can be calculated. Conversely, the IMF can be reconstructed from its corresponding IF and IA.
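Step 2 and the reconstruction in (67.3) can be illustrated with a minimal sketch for a single IMF. It is not the chapter's MATLAB implementation: the analytic signal is built here with NumPy's FFT, and the function names (`analytic_signal`, `instantaneous_attributes`, `reconstruct_imf`), the sampling rate, and the amplitude-modulated 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def analytic_signal(c):
    """Return z = c + i*d, where d is the Hilbert transform of the real signal c."""
    n = len(c)
    spectrum = np.fft.fft(c)
    # One-sided weighting: keep DC (and Nyquist for even n) once,
    # double the positive frequencies, zero the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_attributes(c, fs):
    """Return IA, IF (in Hz) and the initial phase of one IMF sampled at fs."""
    z = analytic_signal(c)
    ia = np.abs(z)                        # a(t) = |z(t)|
    theta = np.unwrap(np.angle(z))        # continuous phase theta(t)
    inst_freq = np.gradient(theta) * fs / (2.0 * np.pi)  # (dtheta/dt) / (2*pi)
    return ia, inst_freq, theta[0]

def reconstruct_imf(ia, inst_freq, fs, theta0=0.0):
    """Rebuild an IMF as a(t) * cos(theta(t)), integrating IF numerically."""
    # Trapezoidal integration of the IF approximates the phase integral in (67.3).
    steps = 0.5 * (inst_freq[1:] + inst_freq[:-1]) / fs
    theta = theta0 + 2.0 * np.pi * np.concatenate(([0.0], np.cumsum(steps)))
    return ia * np.cos(theta)

# Hypothetical stand-in for an IMF: a 440 Hz tone with a slow amplitude envelope.
fs = 8000
t = np.arange(fs) / fs
test_imf = (1.0 + 0.3 * np.cos(2 * np.pi * 2 * t)) * np.cos(2 * np.pi * 440 * t)

ia, inst_freq, theta0 = instantaneous_attributes(test_imf, fs)
rebuilt = reconstruct_imf(ia, inst_freq, fs, theta0)
```

For this test tone the IA follows the slow envelope and the IF stays near 440 Hz. On real IMFs the IF estimate is usually noisier, particularly near the signal boundaries.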
In this chapter, the FFT is applied to the IF and IA rather than directly to the IMF. With this approach, fewer FFT coefficients are needed, which yields better synthesis performance.
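The idea of storing only a few FFT coefficients of the IF and IA can be sketched as follows. The chapter does not say how the "main" coefficients are chosen, so this example assumes they are the largest-magnitude coefficients of a real FFT. The helper names (`keep_main_coefficients`, `synthesize_imf`) and the synthetic IA and IF tracks are hypothetical.

```python
import numpy as np

def keep_main_coefficients(x, k):
    """Zero all but the k largest-magnitude rfft coefficients of a real signal x.

    Assumption: "main" means largest magnitude; the source does not specify this.
    """
    spectrum = np.fft.rfft(x)
    if k < len(spectrum):
        # Indices of the smallest magnitudes, which are discarded.
        drop = np.argsort(np.abs(spectrum))[:len(spectrum) - k]
        spectrum[drop] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def synthesize_imf(ia, inst_freq, fs):
    """Rebuild an IMF from IA and IF (Hz) as a(t) * cos(2*pi * integral of IF)."""
    steps = 0.5 * (inst_freq[1:] + inst_freq[:-1]) / fs   # trapezoidal rule
    theta = 2.0 * np.pi * np.concatenate(([0.0], np.cumsum(steps)))
    return ia * np.cos(theta)

# Hypothetical smooth IA and IF tracks standing in for those of one IMF.
fs = 8000
t = np.arange(fs) / fs
ia = 1.0 + 0.3 * np.cos(2 * np.pi * 2 * t)            # slow envelope
inst_freq = 440.0 + 5.0 * np.sin(2 * np.pi * 3 * t)   # vibrato-like IF in Hz
target = synthesize_imf(ia, inst_freq, fs)

# Keep 4 coefficients per track, then resynthesize from the compressed tracks.
ia_hat = keep_main_coefficients(ia, 4)
if_hat = keep_main_coefficients(inst_freq, 4)
approx = synthesize_imf(ia_hat, if_hat, fs)
error = float(np.sqrt(np.sum((target - approx) ** 2)))
```

Because the IA and IF tracks in this toy case vary slowly, a handful of coefficients per track reproduces them almost exactly, even though the resulting waveform is frequency modulated and has a wider spectrum of its own.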
67.3 Simulation Results

The simulation was implemented in the MATLAB environment. Different sound signals were tested, including piano, trumpet, violin, and bird chirps. Figure 67.3 shows the EMD analysis of trumpet music (pitch A4, 440 Hz), including the original signal, IMF1–IMF8, and the final residue. For each IMF, the corresponding IF and IA can be obtained. Figure 67.4 shows the IA analysis for each IMF.

Fig. 67.3 EMD analysis of trumpet music (A4) (panel labels: signal, imf1 to imf11, res.)

In the proposed method, only the first four IMFs (IMF1–IMF4) are considered and the later IMFs (IMF5–IMF11) are omitted. The FFT is applied to the IF and the IA of each retained IMF, and the 128 main coefficients are selected for each. Thus, for the first four IMFs, a total of 4 × 2 × 128 = 1,024 FFT coefficients are stored. To demonstrate its efficiency and feasibility, the proposed synthesis method is compared with a direct 1,024-point FFT analysis of the original sound. Table 67.1 compares the synthesis error of the two methods for different instruments under the same number of coefficients (1,024), where the error is measured by the Euclidean distance defined in (67.4).

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{67.4}$$
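The coefficient budget and the error measure in (67.4) amount to a few lines of arithmetic, sketched below. The variable and function names are illustrative only, and any scaling or normalization of the signals before the distance is taken is not stated in the chapter, so none is applied here.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences, as in (67.4)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Coefficient budget of the proposed method, using the figures from the text:
# 4 retained IMFs, 2 tracks per IMF (IF and IA), 128 FFT coefficients per track.
n_imfs = 4
tracks_per_imf = 2
coeffs_per_track = 128
total_coefficients = n_imfs * tracks_per_imf * coeffs_per_track  # 1024

# Small sanity check on a 3-4-5 right triangle.
example = euclidean_distance([0.0, 0.0], [3.0, 4.0])  # 5.0
```

The total of 1,024 matches the size of the direct FFT used for comparison, so both methods are judged with the same number of stored coefficients.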
Fig. 67.4 Instantaneous amplitude (IA) analysis for each IMF (panel titles: imf1 IA to imf8 IA)
Table 67.1 Synthesis error comparison

Sound         The proposed method   Direct FFT
Piano         0.3407                0.8267
Trumpet       0.5202                0.8229
Violin        0.2152                0.8219
Bird chirps   0.0596                0.2952

67.4 Conclusion

This chapter presents a music synthesizer based on the HHT. For some advanced model-based approaches, parameter learning can be tedious and time-consuming. Since most practical musical sounds are not stationary, especially at the beginning of a tone, the conventional Fourier Transform cannot be expected to synthesize them realistically. The HHT is an advanced signal-processing technique for analyzing nonlinear and nonstationary time series data. The signal is first separated into narrowband components, the IMFs, by performing EMD. The Hilbert transform is then applied to each mode to obtain its instantaneous frequency and amplitude. By extracting the main FFT coefficients of the instantaneous frequency and amplitude of each IMF, the original signal can be restored with little error. Simulation results show the feasibility of the proposed synthesis method. Further improvement is needed for practical applications.

References

1. Bristow-Johnson, R.: Wavetable synthesis 101, a fundamental perspective. In: Proceedings of the 101st Convention of the Audio Engineering Society, Los Angeles (1996)
2. Chowning, J.M.: The synthesis of complex audio spectra by means of frequency modulation. Comput. Music J. 1(2), 46–54 (1977)
3. Drioli, C., Rocchesso, D.: A generalized musical-tone generator with application to sound compression and synthesis. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, vol. 1, pp. 431–434 (1997)
4. Smith, J.O.: Physical modeling using digital waveguides. Comput. Music J. 16(4), 74–87 (1992)
5. Smith, J.O.: Efficient synthesis of stringed musical instruments. In: Proceedings of the 1993 International Computer Music Conference, pp. 64–71, Computer Music Association, Tokyo (1993)
6. Su, A.W.Y., Liang, S.F.: A new automatic IIR analysis/synthesis technique for plucked-string instruments. IEEE Trans. Speech Audio Process. 9(7), 747–754 (2001)
7. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A 454(1971), 903–995 (1998)
8. Gloersen, P., Huang, N.E.: Comparison of interannual intrinsic modes in hemispheric sea ice covers and other geophysical parameters. IEEE Trans. Geosci. Remote Sens. 41(5), 1062–1074 (2003)