Singing Expression Transfer from One Voice to Another for a Given Song


Sangeon Yong, Juhan Nam
Music and Audio Computing Lab (MACLab), Korea Advanced Institute of Science and Technology

Introduction
[Figure: source and target voices]

Related Works
- Manual pitch-editing tools: Antares Auto-Tune 8 (graphical mode), Steinberg VariAudio.
- Cano et al. (ICMC 2000): a voice morphing system that uses a source and a target voice; score information is used for temporal alignment.
- Nakano et al. (SMC 2009): similar to the above, but uses a singing synthesizer (Vocaloid) instead of the source voice; the synthesizer parameters are tuned using the lyrics of the song.
However, both approaches require additional information about the song (the score or the lyrics).

Research Goal
Transfer musical expressions (rhythm, pitch, and dynamics) from a target voice to a source voice without any additional information. Voice color is not among the transferred expressions.

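System Structure
The source voice s is modified in three stages, each guided by the target voice:
1. Temporal alignment: features are extracted from the source and the target and aligned with DTW. The resulting stretching ratio is smoothed and used for time-scale modification of s, giving s_T.
2. Pitch alignment: HPSS extracts the harmonic signal, a pitch detector estimates the pitch, and the resulting pitch ratio drives pitch shifting of s_T, giving s_TP.
3. Dynamics alignment: an envelope detector yields a gain ratio, which is applied as a gain to s_TP, giving s_TPE, the modified output.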


Temporal Alignment
[Figure: Singer A and Singer B singing the same lyrics, "Let it go, let it go"]

Temporal Alignment: Dynamic Time Warping (DTW)
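The alignment step can be illustrated with textbook DTW. The sketch below is a minimal Python/NumPy version, not the authors' implementation. It assumes a precomputed cost matrix with source frames as rows and target frames as columns (for example, one minus a frame-wise feature similarity). The step set (1,0), (0,1), (1,1) without step weights is also an assumption, since the slides do not specify the DTW variant. The function name `dtw_path` is hypothetical.

```python
import numpy as np

def dtw_path(cost):
    """Textbook DTW on a precomputed cost matrix (hypothetical helper).

    cost[i, j]: local distance between source frame i and target frame j.
    Returns the optimal warping path as (source_frame, target_frame) pairs
    from (0, 0) to (n-1, m-1), using steps (1,0), (0,1), (1,1).
    """
    n, m = cost.shape
    # Accumulated cost with an extra row/column of +inf as the boundary.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor
    # (ties prefer the diagonal step).
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]

# Toy usage: the target holds its first value one frame longer.
x = np.array([0.0, 1.0, 2.0, 3.0])       # "source" feature per frame
y = np.array([0.0, 0.0, 1.0, 2.0, 3.0])  # "target" feature per frame
path = dtw_path(np.abs(x[:, None] - y[None, :]))
# path == [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4)]
```

The double loop is O(nm) in pure Python. For full songs, a compiled implementation such as `librosa.sequence.dtw` is more practical.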


Temporal Alignment: Feature Extraction
[Figure: spectrogram of the source and spectrogram of the target]

Temporal Alignment: Feature Extraction
[Figure: similarity matrix between source and target computed from spectrograms]
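A similarity matrix like the one on this slide can be computed as frame-wise cosine similarity between the two feature sequences. This sketch assumes cosine similarity is the measure, which the slides do not state, and the function name is hypothetical. The same code applies to spectrogram, constant-Q, or phoneme-score features.

```python
import numpy as np

def cosine_similarity_matrix(src_feats, tgt_feats, eps=1e-10):
    """Frame-wise cosine similarity between two feature sequences (hypothetical helper).

    src_feats: array (n_src_frames, n_dims); tgt_feats: array (n_tgt_frames, n_dims).
    Returns an (n_src_frames, n_tgt_frames) matrix; entries lie in [0, 1] for
    non-negative features such as magnitude spectrograms.
    """
    # Normalize every frame to unit length (eps guards silent frames).
    src = src_feats / (np.linalg.norm(src_feats, axis=1, keepdims=True) + eps)
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + eps)
    return src @ tgt.T

# A DTW cost matrix can then be taken as 1 - similarity.
```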


Feature Extraction Strategy
- Preserve the elements the two singers share: note-level melody and lyrics.
- Suppress the characteristics that differ between singers: vibrato and other pitch-related articulations, and singer timbre.

Proposed Features: Max-filtered Constant-Q Transform
- Semitone pitch resolution absorbs vibrato with an extent of less than one semitone.
- Frequency-wise max filtering absorbs vibrato with an extent of more than one semitone.
[Figure: constant-Q transform vs. constant-Q transform with max filtering]
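The frequency-wise max filter can be sketched as follows. The sketch assumes a non-negative constant-Q magnitude with one bin per semitone (for instance from `librosa.cqt` with `bins_per_octave=12`) and an odd filter width. The width of 3 bins and the function name are hypothetical, since the slides do not give them.

```python
import numpy as np

def max_filter_frequency(cqt_mag, width=3):
    """Max filter along the frequency axis of a CQT magnitude (hypothetical helper).

    cqt_mag: non-negative array (n_bins, n_frames), assumed one bin per semitone.
    width:   odd window size in bins. Each output bin holds the maximum over
             +/- width // 2 neighbouring bins in the same frame, so a partial
             that wobbles across adjacent semitones keeps a stable footprint.
    """
    half = width // 2
    n_bins = cqt_mag.shape[0]
    # Zero padding is safe because magnitudes are non-negative.
    padded = np.pad(cqt_mag, ((half, half), (0, 0)), mode="constant")
    # Stack the shifted copies and take the element-wise maximum.
    windows = np.stack([padded[k:k + n_bins] for k in range(width)])
    return windows.max(axis=0)
```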

Proposed Features: Phoneme Score
- Phoneme score: the posteriorgram of a phoneme classifier.
- Frame-level features, which allow accurate temporal alignment.
- Singer-invariant lyrical features.

Temporal Alignment: Feature Comparison
[Figure: spectrogram vs. max-filtered constant-Q transform]
[Figure: spectrogram vs. phoneme score]
[Figure: spectrogram vs. phoneme score combined with constant-Q transform]


Temporal Alignment: Path Smoothing
The DTW path, and hence the stretching ratio derived from it, is smoothed with a Savitzky-Golay filter.
Reference: A. Savitzky and M. J. E. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, no. 8, pp. 1627-1639, 1964.
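One way to realise this step is sketched below. It rests on assumptions the slides do not confirm: the DTW path is first reduced to one source position per target frame, that mapping is smoothed with a Savitzky-Golay filter, and the stretching ratio is taken as its local slope. The window length, polynomial order, and function names are hypothetical. The coefficients are computed directly with NumPy. `scipy.signal.savgol_filter(x, window_length, polyorder)` gives the same interior smoothing but handles the edges differently.

```python
import numpy as np

def savgol_coeffs(window, order):
    """Savitzky-Golay weights returning the least-squares fit value at the window centre."""
    half = window // 2
    x = np.arange(-half, half + 1)
    vander = np.vander(x, order + 1, increasing=True)  # columns 1, x, x^2, ...
    # Row 0 of the pseudo-inverse maps samples to the fitted constant term,
    # i.e. the fitted polynomial evaluated at x = 0.
    return np.linalg.pinv(vander)[0]

def smooth_stretch_ratio(path, window=9, order=2):
    """Per-target-frame stretching ratio from a DTW path (hypothetical helper).

    path: (source_frame, target_frame) pairs. window must be odd and > order.
    Returns d(source position) / d(target frame) after smoothing.
    """
    path = np.asarray(path, dtype=float)
    tgt = np.unique(path[:, 1])
    # Average the source frames matched to each target frame.
    src = np.array([path[path[:, 1] == t, 0].mean() for t in tgt])
    half = window // 2
    padded = np.pad(src, half, mode="edge")  # simple edge handling
    # SG smoothing weights are symmetric, so convolution equals correlation.
    smoothed = np.convolve(padded, savgol_coeffs(window, order), mode="valid")
    return np.gradient(smoothed)
```

A polynomial of degree `order` or lower passes through the filter unchanged. A steady tempo difference therefore yields a constant ratio, while short DTW zig-zags are averaged out.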


Time-Scale Modification
The source s is time-scale modified with WSOLA (Waveform Similarity Overlap-Add), driven by the smoothed stretching ratio, giving s_T.
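A minimal WSOLA is sketched below to show the mechanism. Each output frame copies an input segment whose start may shift within a tolerance region. The shift is chosen so that the segment best matches, by cross-correlation, the natural continuation of the previously copied segment. For brevity the sketch uses a constant stretch factor `alpha` (output duration over input duration), whereas the system applies a time-varying smoothed stretching ratio. The frame length, hop, and tolerance are hypothetical defaults.

```python
import numpy as np

def wsola(x, alpha, frame_len=1024, hop_out=256, tol=128):
    """Minimal WSOLA time-scale modification with a constant factor (sketch).

    x:     mono signal as a 1-D array
    alpha: output duration / input duration (alpha > 1 slows the signal down)
    frame_len, hop_out, tol: hypothetical defaults, all in samples.
    """
    x = np.asarray(x, dtype=float)
    win = np.hanning(frame_len)
    hop_in = hop_out / alpha
    out_len = int(len(x) * alpha)
    n_frames = out_len // hop_out + 1
    # Zero-pad so every candidate and continuation segment stays in range.
    xp = np.pad(x, (tol, frame_len + 2 * tol + hop_out))
    y = np.zeros(n_frames * hop_out + frame_len)
    norm = np.zeros_like(y)
    prev = None
    for m in range(n_frames):
        nominal = int(round(m * hop_in)) + tol  # analysis position in xp
        if prev is None:
            start = nominal
        else:
            # Natural continuation of the previously copied segment.
            cont = xp[prev + hop_out: prev + hop_out + frame_len]
            region = xp[nominal - tol: nominal + tol + frame_len]
            # Cross-correlation for every shift in [-tol, tol].
            corr = np.correlate(region, cont, mode="valid")
            start = nominal - tol + int(np.argmax(corr))
        seg = slice(m * hop_out, m * hop_out + frame_len)
        y[seg] += win * xp[start: start + frame_len]
        norm[seg] += win
        prev = start
    # Normalize by the summed windows to undo the overlap-add gain.
    y /= np.maximum(norm, 1e-8)
    return y[:out_len]
```

A time-varying version would advance the analysis position by the local ratio for each frame instead of by the fixed `hop_out / alpha`.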


Pitch Alignment
- Harmonic-percussive source separation (HPSS): pre-processing that increases pitch detection accuracy; based on median filtering (IEEE Signal Processing Letters, 2014).
- Pitch detector: YIN.
- Pitch shifting: Pitch-Synchronous Overlap-Add (PSOLA) with formant preservation.
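The core of the YIN detector (difference function, cumulative mean normalized difference, absolute threshold) can be sketched for a single frame as below. This is a simplified illustration, not the configuration used in the system. YIN's parabolic interpolation and best-local-estimate steps are omitted, HPSS pre-processing is not included, and `fmin`, `fmax`, and `threshold` are hypothetical defaults.

```python
import numpy as np

def yin_f0(frame, sr, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Single-frame f0 estimate following the core steps of YIN (sketch).

    Returns f0 in Hz, or 0.0 when no lag passes the threshold (unvoiced).
    The frame must be longer than sr / fmin samples.
    """
    frame = np.asarray(frame, dtype=float)
    if not np.any(frame):
        return 0.0  # silence
    tau_min, tau_max = int(sr / fmax), int(sr / fmin)
    w = len(frame) - tau_max  # integration window length
    # Difference function d(tau) for tau = 0 .. tau_max.
    d = np.array([np.sum((frame[:w] - frame[tau:tau + w]) ** 2)
                  for tau in range(tau_max + 1)])
    # Cumulative mean normalized difference d'(tau), with d'(0) = 1.
    cmnd = np.ones_like(d)
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)
    # Absolute threshold: first lag whose d' falls below it, then descend
    # to the bottom of that dip.
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau
    return 0.0
```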

[Figure: pitch alignment, source vs. target vs. result]
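Given frame-aligned f0 tracks of the time-aligned source and of the target, the pitch ratio that drives pitch shifting can be sketched as below. Treating unvoiced frames (f0 = 0) as "no shift" is an assumption of this sketch, as is the function name. The equivalent shift in semitones is 12 * log2(ratio).

```python
import numpy as np

def pitch_shift_ratio(f0_src, f0_tgt):
    """Per-frame pitch ratio that moves the source onto the target pitch (hypothetical helper).

    f0 arrays: Hz, frame-aligned, 0 marks unvoiced frames.
    Returns (ratio, semitones); frames where either voice is unvoiced get
    ratio 1.0, i.e. no shift.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    f0_tgt = np.asarray(f0_tgt, dtype=float)
    voiced = (f0_src > 0) & (f0_tgt > 0)
    ratio = np.ones_like(f0_src)
    ratio[voiced] = f0_tgt[voiced] / f0_src[voiced]
    semitones = 12.0 * np.log2(ratio)
    return ratio, semitones
```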


Dynamics Alignment
[Figure: source, target, and result]
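Dynamics alignment can be sketched with a frame-wise RMS envelope standing in for the envelope detector, which the slides do not specify. The per-frame gain ratio (target envelope over source envelope) is interpolated to a per-sample gain and applied to the already time- and pitch-aligned source. The frame sizes, the `eps` guard, the `max_gain` clip, and the function names are hypothetical.

```python
import numpy as np

def frame_rms(x, frame_len=1024, hop=256):
    """RMS envelope, one value per frame (a simple stand-in envelope detector)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + max(0, len(x) - frame_len) // hop
    return np.array([np.sqrt(np.mean(x[i * hop:i * hop + frame_len] ** 2))
                     for i in range(n_frames)])

def match_dynamics(src, tgt, frame_len=1024, hop=256, eps=1e-8, max_gain=10.0):
    """Scale an aligned source so its envelope follows the target's (sketch)."""
    env_s = frame_rms(src, frame_len, hop)
    env_t = frame_rms(tgt, frame_len, hop)
    n = min(len(env_s), len(env_t))
    # Gain ratio per frame, clipped so near-silent source frames are not blown up.
    gain = np.minimum((env_t[:n] + eps) / (env_s[:n] + eps), max_gain)
    # Interpolate frame gains to a per-sample curve (held constant at the edges).
    centres = np.arange(n) * hop + frame_len / 2
    return np.asarray(src, dtype=float) * np.interp(np.arange(len(src)), centres, gain)
```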

Evaluation: Datasets
Four recordings were collected for each of four songs (16 recordings in total). For each song, one recording, sung by a professional or skilled singer, is the target and the other three are sources, giving 12 source-target pairs in total.

Song     Gender          No. of sources   Remarks
Song 1   female          3                high pitch, English
Song 2   male            3                low pitch, English
Song 3   male            3                swing rhythm, Korean
Song 4   male            3                swing rhythm, Korean

Evaluation: Temporal Alignment
A better alignment shows less fluctuation in the slope of the DTW path. Temporal alignment is therefore measured by the standard deviation of the slope angle θ = arctan(slope); lower is better.
[Figure: standard deviation of θ for songs 1 to 4]
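The temporal-alignment metric can be sketched as follows. The slides do not state how the local slope is measured, so this sketch assumes it is taken over a fixed number of path steps (`win`, a hypothetical parameter). It uses `arctan2` so that vertical segments (infinite slope) map to π/2 instead of overflowing. Path points are (source frame, target frame) pairs, and the function name is hypothetical.

```python
import numpy as np

def slope_angle_std(path, win=5):
    """Standard deviation of the DTW path slope angle theta = arctan(slope).

    path: (source_frame, target_frame) pairs; slope = d(source) / d(target)
    measured between path points `win` steps apart (an assumption).
    """
    p = np.asarray(path, dtype=float)
    d = p[win:] - p[:-win]                # displacement over `win` path steps
    theta = np.arctan2(d[:, 0], d[:, 1])  # arctan(d_source / d_target)
    return float(np.std(theta))
```

Any path with a constant slope scores 0, including one uniformly steeper than the diagonal. The metric measures fluctuation of the alignment rather than its deviation from the diagonal.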

Evaluation: Pitch Alignment

Evaluation: Dynamics Alignment

Audio Examples
- "Let It Go": source, target, result
- "Cherry Blossom Ending"
More examples are available on

Summary
- Proposed a method that transfers vocal expressions (tempo, pitch, and dynamics) from one voice to another without any additional information.
- Showed that the method transforms the source voices so that they mimic the singing skills of the target voice.

Future Plan
- Limitation: a recording of the target voice must be available. A possible solution is to build a target singer model (e.g., a singing synthesizer with natural expressions) and generate a target example from melody and lyrics information extracted from the source voice.
- Improve audio quality using other time-scale and pitch modification algorithms.