Slovak University of Technology and Planned Research in Voice De-Identification. Anna Pribilova

1 Slovak University of Technology and Planned Research in Voice De-Identification Anna Pribilova

2 SLOVAK UNIVERSITY OF TECHNOLOGY IN BRATISLAVA
The oldest and largest university of technology in Slovakia
Founded: 1938

3 SLOVAK UNIVERSITY OF TECHNOLOGY IN BRATISLAVA
Academic year 2012/
- 1st degree: Bachelor study, 3 years, 52 study programs
- 2nd degree: Master study, 2 years, 65 study programs
- 3rd degree: Doctoral study, 3 years, 69 study programs, 299 graduates

4 SLOVAK UNIVERSITY OF TECHNOLOGY IN BRATISLAVA
- Faculty of Civil Engineering
- Faculty of Mechanical Engineering
- Faculty of Electrical Engineering and Information Technology
- Faculty of Chemical and Food Technology
- Faculty of Architecture
- Faculty of Material Sciences and Technology
- Faculty of Informatics and Information Technologies

5 FACULTY OF ELECTRICAL ENGINEERING AND INFORMATION TECHNOLOGY
- 1940: Department of Mechanical and Electrical Engineering
- 1941: Department of Electrical Engineering
- Faculty of Electrical Engineering
- 1994: Faculty of Electrical Engineering & Information Technology

6 FACULTY OF ELECTRICAL ENGINEERING AND INFORMATION TECHNOLOGY
- Institute of Automobile Mechatronics
- Institute of Computer Science and Mathematics
- Institute of Control and Industrial Informatics
- Institute of Electrical Engineering
- Institute of Electronics and Photonics
- Institute of Nuclear and Physical Engineering
- Institute of Power and Applied Electrical Engineering
- Institute of Telecommunications

7 INSTITUTE OF ELECTRONICS AND PHOTONICS
- Department of Electronic Devices
- Department of Electronic Systems
- Department of Integrated Circuit Design and Test
- Department of Optoelectronics and Laser Technique
- Department of Sensors and Microsystem Technique
- Department of Surface, Interface, and Nanostructure Analysis
- Department of Wireless Communication and Multimedia

8 Department of Wireless Communication and Multimedia
MRI of the vocal tract during phonation
Cooperation with: Institute of Measurement Science, Slovak Academy of Sciences
- design, realization, and testing of a new head scanning coil for the Esaote E-scan Opera MRI scanner, for 3-D modelling of the human vocal tract
[Figures: simulation of the field of a coil (Bz over r and z); practical realization; results: MRI of the vocal tract]


9 Department of Wireless Communication and Multimedia
MRI of the vocal tract during phonation
Cooperation with: Institute of Thermomechanics, Academy of Sciences of the Czech Republic
- involved in COST Action 2103 Advanced Voice Function Assessment
[Figure: 3-D model of the human vocal tract]

10 Department of Wireless Communication and Multimedia
Evaluation of Synthetic Speech Quality by the GMM Classifier
Cooperation with: Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Plzeň, Czech Republic

Type      Model     Specification
1 (h48)   harmonic  Spectral envelope smoothed by B-splines
2 (i80)   cepstral  Impulse response of the real cepstrum
3 (k64)   cepstral  Minimum phase of real cepstrum, mixed excitation
4 (s64)   cepstral  Real cepstrum, excitation by Hilbert impulse
5 (o50)   cepstral  Optimized structure of cascade approximation filter

Processing chain:
- Resynthesized speech with different types of synthesis and parameterization
- Speech features analysis
- Features vector F(1) ... F(8) ... F(16)
- GMM classifier with male/female GMM models, trained on the original speech
- N-level score discriminator
- N output classes
[Figure: score [-] per synthesis type (h48, i80, k64, s64, o50); male voice, N gmix = 1]

11 Department of Wireless Communication and Multimedia
Detection and Localization of Artifacts in Synthetic Speech based on ANOVA Statistics
Cooperation with: Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Plzeň, Czech Republic
Speech features: Cener, R0ener, RE, SHE, F0DIFF, ZCR, F0K, Jitter, Shimmer, HNR, SFM
[Figures: histograms and statistics of HNR, SHE, and Shimmer for synthetic speech (Synt, Synt1) and artifacts (Artf); "Artifact Classification per Speech Features" (occurrence [%], speech feature set 2, female); "Accuracy of Artifact Detection per Speech Features" (correctness [%], speech feature set 2, female, per feature and for all features)]
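
The slide names ANOVA statistics as the basis for artifact detection but does not spell out the procedure. As a rough illustration only, the following sketch computes a one-way ANOVA F-statistic for a single speech feature, comparing frames from clean synthetic speech with frames from an artifact region. The function name and the shimmer values are hypothetical and are not taken from the presentation.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of samples around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical per-frame shimmer values (illustrative numbers, not measured data)
shimmer_clean = [0.04, 0.05, 0.06, 0.05]
shimmer_artifact = [0.10, 0.12, 0.11, 0.13]
f_value = one_way_anova_f([shimmer_clean, shimmer_artifact])
print(f"F = {f_value:.2f}")
```

A large F value suggests that the feature separates the two groups well, which is one plausible way to rank features such as Jitter, Shimmer, or HNR for artifact detection.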

12 De-identification for privacy protection in multimedia content
Our area of interest:
- de-identification of voice and face
- de-identification of personal health information
Following on cooperation with the Czech group:
- COST 258 The Naturalness of Synthetic Speech ( )
- COST 277 Non-Linear Speech Processing ( )
Our participation:
- COST 2102 Cross-Modal Analysis of Verbal and Non-verbal Communication ( )

13 De-identification of voice as a behavioural and soft-biometric identifier
(speaker de-identification by voice conversion)
Investigation of reversibility of voice conversion
- inverse voice transformation of converted speech (using parameters reciprocal to voice conversion)
- utilization of our former work (male voice conversion to female, young male, child)
Evaluation of identification of converted voice
- speaker identification based on GMMs
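
The idea of inverting a conversion with reciprocal parameters can be illustrated with a deliberately simplified sketch. It assumes the conversion is nothing more than constant scaling of the F0 contour and of formant frequencies; real voice conversion is considerably more involved, and the ratios and frame values below are hypothetical.

```python
def convert(f0_hz, formants_hz, f0_ratio, formant_ratio):
    """Scale an F0 contour and formant frequencies by constant ratios."""
    new_f0 = [f * f0_ratio for f in f0_hz]
    new_formants = [f * formant_ratio for f in formants_hz]
    return new_f0, new_formants

def invert(f0_hz, formants_hz, f0_ratio, formant_ratio):
    """Undo convert() by applying the reciprocal ratios."""
    return convert(f0_hz, formants_hz, 1.0 / f0_ratio, 1.0 / formant_ratio)

# Hypothetical male voice values and male-to-female ratios (illustrative only)
f0 = [110.0, 115.0, 120.0]
formants = [500.0, 1500.0, 2500.0]
conv_f0, conv_formants = convert(f0, formants, f0_ratio=1.8, formant_ratio=1.2)
back_f0, back_formants = invert(conv_f0, conv_formants, f0_ratio=1.8, formant_ratio=1.2)

# Largest deviation between the original and the round-trip values
max_error = max(abs(a - b) for a, b in zip(f0 + formants, back_f0 + back_formants))
print(max_error)
```

Under this toy model the round trip is exact up to floating-point error, which is why reversibility matters for de-identification: anyone who knows or can estimate the conversion parameters could recover the original voice characteristics.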

14 De-identification of voice as a behavioural and soft-biometric identifier
(age and gender de-identification by voice conversion)
Voice conversion with preserved naturalness and intelligibility
- change of apparent age of a speaker
- change of apparent gender of a speaker
Evaluation of degree of similarity between converted and original voice
- choice of segmental and suprasegmental features
- choice of a classifier

15 De-identification of voice as a behavioural and soft-biometric identifier
(speaker de-identification by emotional voice conversion)
Emotional voice conversion with preserved naturalness and intelligibility
- change of apparent emotional state of a speaker (our involvement in the former COST Action 2102)
- possible change of apparent speaker identity
Evaluation of speaker identification after emotional conversion of his/her voice
- choice of segmental and suprasegmental features for GMM speaker identification

16 De-identification of voice as a behavioural and soft-biometric identifier
(speaker de-identification by combination of gender conversion and emotional voice conversion)
Male to female voice transformation (and vice versa)
- spectrum envelope modification
- F0 modification
At the same time: emotional voice conversion
- spectral modification
- prosodic modification

17 Current work: Emotion conversion and speaker re-identification
Neutral-to-emotional voice conversion
[Block diagram]
- Input sentence (in neutral style), male / female voice
- Speech analysis and parameterization (cepstral / harmonic speech model)
- Spectral properties modification: {F n}, SFM
- Modification of prosodic parameters: F0, En, DUR
- Target emotions: Joy, Surprise, Sadness, Anger
- Speech resynthesis
- Target sentences: joyous style, surprised style, sad style, angry style

18 Emotion conversion and speaker re-identification
Determination of feature vectors of emotional speech
[Block diagram]
- Input sentence
- Speech signal analysis: segmentation, F0 determination, smoothed spectral envelope E(f), power spectral density {PSD dB}
- Suprasegmental parameters: {F0 DIFF}, {Jitter}, {Shimmer}, etc.
- Basic spectral properties: {F 1,2,3}, {F 12 ratio}, {Tilt}, etc.
- Supplementary spectral properties: {HNR}, {SFM}, {SC}, etc.
- Statistical processing: mean, median, min, std, rel. min, rel. max, skewness, kurtosis
- Output feature vector F(1) ... F(8) ... F(N F), N feat
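
The statistical processing step turns a per-frame feature contour into a handful of scalar descriptors. A minimal sketch of that idea follows, assuming population (not sample) moments and the common "local jitter" definition. The slide does not define rel. min and rel. max, so they are left out here, and the function names are hypothetical.

```python
from statistics import mean, median, pstdev

def describe(contour):
    """Summarize a per-frame feature contour; assumes the contour is not constant."""
    m = mean(contour)
    s = pstdev(contour)  # population standard deviation
    n = len(contour)
    # Standardized third and fourth central moments
    skewness = sum(((x - m) / s) ** 3 for x in contour) / n
    kurtosis = sum(((x - m) / s) ** 4 for x in contour) / n
    return {
        "mean": m,
        "median": median(contour),
        "min": min(contour),
        "std": s,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods over the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods[1:], periods[:-1])]
    return mean(diffs) / mean(periods)

# Hypothetical F0 contour in Hz and pitch periods in ms (illustrative only)
stats = describe([118.0, 121.0, 125.0, 130.0, 127.0, 120.0])
jitter = local_jitter([10.0, 10.2, 9.8, 10.0])
print(stats, jitter)
```

Concatenating such descriptors over several contours (F0 DIFF, HNR, SFM, and so on) would give a fixed-length feature vector regardless of sentence duration, which is what a GMM classifier needs as input.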

19 Emotion conversion and speaker re-identification
GMM emotion classification of emotionally converted speech, trained on real emotional speech
[Block diagram]
- Input: tested sentence with transformed emotional speech styles (plus a resynthesized sentence in neutral speech style, for comparison only)
- Speech features analysis
- Features vector F(1) ... F(8) ... F(16)
- GMM emotion classifier with GMM models for male/female, trained on the original speech in neutral and emotional styles
- Output emotion classes: Neutral, Joy, Surprise, Sadness, Anger

20 Emotion conversion and speaker re-identification
GMM speaker identification of emotionally converted speech, trained on real neutral speech
[Block diagram]
- Input: tested sentence with transformed emotional speech styles
- Speech features analysis
- Features vector F(1) ... F(8) ... F(N F)
- GMM speaker recognizer with male / female GMM speaker models (N gmix mixtures), trained on the original speech in neutral style
- N-level score discriminator
- N output speaker classes: Speaker 1, Speaker 2, ..., Speaker N-1, Speaker N
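
The recognizer-plus-discriminator chain can be sketched in a few lines. The example assumes the simplest case, N gmix = 1, where each speaker's GMM reduces to a single diagonal-covariance Gaussian, and it assumes the score discriminator simply picks the model with the highest total log-likelihood. The feature values, speaker names, and variance floor are hypothetical.

```python
import math
from statistics import mean, pvariance

def train_gaussian(vectors):
    """Fit a diagonal Gaussian (a GMM with one mixture) to feature vectors."""
    dims = list(zip(*vectors))
    means = [mean(d) for d in dims]
    # Small variance floor avoids division by zero (the value is an assumption)
    variances = [max(pvariance(d), 1e-6) for d in dims]
    return means, variances

def log_likelihood(model, vector):
    """Log-density of one feature vector under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(vector, means, variances)
    )

def identify(models, vectors):
    """Return the best-scoring speaker and the per-speaker total scores."""
    scores = {
        name: sum(log_likelihood(model, v) for v in vectors)
        for name, model in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical 2-D feature vectors from neutral speech of two speakers
training = {
    "speaker_1": [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9]],
    "speaker_2": [[3.0, 2.0], [3.1, 2.2], [2.9, 1.8]],
}
models = {name: train_gaussian(vecs) for name, vecs in training.items()}

# Hypothetical feature vectors from an emotionally converted test sentence
best, scores = identify(models, [[1.1, 5.0], [0.9, 5.05]])
print(best)
```

If emotional conversion shifts the features far enough from the speaker's neutral model, the highest score goes to a different speaker or the margin shrinks, which is the effect the re-identification experiment is meant to measure.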

21 Thank you for your attention.
