Singing Expression Transfer from One Voice to Another for a Given Song

Size: px

Start display at page:

Download "Singing Expression Transfer from One Voice to Another for a Given Song"

Phillip Booth
5 years ago
Views:

1 Singing Expression Transfer from One Voice to Another for a Given Song Korea Advanced Institute of Science and Technology Sangeon Yong, Juhan Nam MACLab Music and Audio Computing

2 Introduction

3 Introduction source target

4 Related Works Antares Autotune 8 graphical mode Steinberg Variaudio

Related Works Cano et al. (ICMC, 2000) Voice morphing system with source and target voice Score information is used for temporal alignment Nakano et al.

5 Related Works Cano et al. (ICMC, 2000) Voice morphing system with source and target voice Score information is used for temporal alignment Nakano et al. (SMC, 2009) Similar with above but using a singing synthesizer instead of the source voice (i.e. Vocaloid) Tune synthesizer parameter with the lyric information of the song However, they require additional score information!

6 Research Goal Voice color? Rhythm, Pitch, Dynamics Transfer musical expressions without any additional information

7 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE Gain Modified

8 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE Gain Modified

9 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE Gain Modified

10 Temporal Alignment Singer A Lyrics Let it go let it go Singer B

11 Temporal Alignment Dynamic Time Warping

12 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE Gain Modified

13 Temporal Alignment Feature Extraction Spectrogram of Source Spectrogram of Target

14 Temporal Alignment Feature Extraction Similarity matrix with spectrogram

15 Temporal Alignment Feature Extraction Spectrogram of Source Spectrogram of Target

16 Feature Extraction Strategy Preserving common elements Note-level melody Lyrics Suppressing different characteristics Vibrato or other pitch-related articulations Singer timbre

17 Proposed Features Max-filtered Constant-Q transform Semi-tone pitch resolution: vibrato with less than one semi-tone Frequency-wise max-filtering: vibrato with more than one semi-tone Constant-Q Transform Const-Q Trans with Maximum Filtering

18 Phonemes Proposed Features Phoneme score (phoneme classifier posteriorgram) Frame-level features for accurate temporal alignment Singer invariant lyrical features

19 Temporal Alignment Feature Comparison Spectrogram Max-filtered Constant-Q Transform

20 Temporal Alignment Feature Comparison Spectrogram phoneme score

21 Temporal Alignment Feature Comparison Spectrogram Phoneme Score +Const-Q Trans

22 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE Gain Modified

23 Temporal Alignment Path Smoothing

24 Temporal Alignment Path Smoothing Savitzky, Abraham, and Marcel JE Golay. "Smoothing and differentiation of data by simplified least squares procedures." Analytical chemistry 36.8 (1964):

25 Temporal Alignment Path Smoothing

26 Temporal Alignment Path Smoothing

27 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE WSOLA Gain Modified

28 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE Gain Modified

29 Pitch Alignment Harmonic-Percussion Source Separation (HPSS) Pre-processing of pitch detection to increase detection accuracy Median filter (IEEE Signal Processing Letters 2014) Pitch Detector YIN Pitch shifting Pitch-Synchronous Overlap-Add (PSOLA) Formant preservation

30 Pitch Alignment source target result

31 System Structure Temporal Alignment Pitch Alignment Dynamics Alignment Target Feature Extraction DTW Smoothing HPSS Pitch Detector Envelope Detector stretching ratio harmonic signal smoothed stretching ratio pitch ratio gain ratio Source Time-Scale Modification Pitch Shifting s s T s TP s TPE Gain Modified

32 Dynamics Alignment source target result

33 Evaluation Datasets 4 recordings for each of 4 songs (total 16 recordings) One of 4 recordings is a target singing voice (professional or skilled) Totally 12 pairs of source-target singing voice Song 1 Song 2 Song 3 Song 4 Gender female male male male No. of source Remarks high pitch English low pitch English swing rhythm Korean swing rhythm Korean

34 Evaluation Temporal alignment Better alignment has less fluctuation of the DTW slope Standard deviation of slope angle θ = arctan(slope) Song 1 Song 2 Song 3 Song 4 Gender female male male male No. of source Remarks high pitch English low pitch English swing rhythm Korean swing rhythm Korean song 1 song 2 song 3 song 4

35 Evaluation Pitch alignment Song 1 Song 2 Song 3 Song 4 Gender female male male male No. of source Remarks high pitch English low pitch English swing rhythm Korean swing rhythm Korean

36 Evaluation Dynamics alignment Song 1 Song 2 Song 3 Song 4 Gender female male male male No. of source Remarks high pitch English low pitch English swing rhythm Korean swing rhythm Korean

37 Audio Examples let it go source target result cherry blossom ending More examples are available on

38 Summary Proposed a method to transfer vocal expressions from one voice to another in terms of tempo, pitch and dynamics without any additional information Showed the proposed method effectively transformed the source voices so that they mimic singing skills from the target voice

39 Future Plan The limitation of this work is that the target voice must be available A possible solution is to model a target singer model (e.g. singing synthesizer with natural expressions) and generate a target example using melody and lyrics information extracted from the source voice Improve the audio quality using other time-scale/pitch modification algorithms

SPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester

SPEECH TO SINGING SYNTHESIS SYSTEM Mingqing Yun, Yoon mo Yang, Yufei Zhang Department of Electrical and Computer Engineering University of Rochester ABSTRACT This paper describes a speech-to-singing synthesis