AN AUDIO SEPARATION SYSTEM BASED ON THE NEURAL ICA METHOD

MICHAL BRÁT, MIROSLAV ŠNOREK
Czech Technical University in Prague
Faculty of Electrical Engineering
Department of Computer Science and Engineering
Karlovo náměstí 13, 121 35 Praha 2
Email: bratm@fel.cvut.cz, snorek@cslab.felk.cvut.cz

KEYWORDS
Data Mining, Signal Mining, Blind Signal Separation (BSS), Independent Component Analysis (ICA), Fast Fourier Transformation (FFT), Principal Component Analysis (PCA), Self-Organizing Map (SOM), Learning Vector Quantization (LVQ).

ABSTRACT
This contribution deals with data mining problems, in particular signal mining. The main representative of signal mining is Blind Signal Separation. Problems of this kind can be solved by traditional (mathematical) methods or by less traditional techniques that use artificial intelligence, such as neural networks. Neural networks cannot be used on their own, so this contribution also covers the pre-processing of input signals. Finally, we present the system we have developed, which is based on a self-organizing neural network, together with several experiments.

1. INTRODUCTION
The amount of data held in electronic form is growing in many academic and other disciplines. One of the many problems with such large data sets is that they are hard to comprehend and little is known about the information they contain. These problems belong to the discipline called data mining. The part of data mining that concentrates on signals is known as data stream mining or signal mining. These problems can be solved by traditional methods such as mathematical, especially statistical, algorithms, or by less traditional techniques based on artificial intelligence, such as neural networks.

2. PROBLEM DEFINITION: BLIND SIGNAL SEPARATION
Imagine a group of people sitting in a room and speaking simultaneously (see Figure 1).
We are members of this group and want to follow only the one person whose speech carries information that is important to us. To do so we must concentrate on that person. Human hearing can focus on the speech of one person and suppress the other voices as noise. We want to give a computer the same ability. This signal separation problem is well known as the cocktail party problem. It is one instance of Blind Signal Separation (BSS). The separation is called blind because we know almost nothing about the environment in which the signals are mixed. BSS is the part of signal mining that focuses on separating signals with minimal information about the inputs, and it is among the hard problems of data mining.

The BSS problem also arises in other signal processing tasks. One is economic data stream mining, which aims to extract knowledge from a data stream. Another is the separation of corrupted medical signals such as EEG or MEG. These problems are mostly solved by traditional techniques. The main representative of these techniques is Independent Component Analysis (ICA) [1]. Other techniques based on adaptive filters, decision rules and similar approaches could also be used.

Figure 1: A typical situation in a cocktail party problem

3. STANDARD TECHNIQUES BASED ON MATHEMATICAL ALGORITHMS
The traditional methods for problems that arise from BSS are mostly based on complex mathematical algorithms. The main representative is the ICA method. Its basic idea is a transformation of the signals into a new co-ordinate system whose axes are rotated to give a better view of the signals. First, the co-ordinates are rotated in the direction of maximal variance (variance is the second statistical moment; this step is only a linear transformation). Then a non-linear transformation is applied: the co-ordinates are rotated in the direction of maximal kurtosis (a fourth-order statistic). More details are in [3].

The ICA method is very useful, but computing it with mathematical algorithms is quite complex. It can be implemented more simply with artificial intelligence, especially neural networks. Neural networks are suitable for many applications and for hard, non-algorithmic problems. The neural ICA method starts from the same mathematical idea, but its implementation is completely different.

4. IMPLEMENTATION OF THE ICA METHOD BASED ON NEURAL NETWORKS
The first idea for a neural solution of the ICA method was inspired by the article Nonlinear Blind Source Separation by Self-Organizing Maps [4]. That approach was not entirely satisfactory, because it used a SOM alone, without any other method for modifying the input signals. We therefore prepared the first version of a system that improves on the idea from that promising article. This system (ExNeurICA_PS) is based on neural networks with pre-processing of the input signals. Its structure is shown in Figure 2.
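The rotation towards maximal variance described in Section 3 can be illustrated with a small sketch. It is not taken from the ExNeurICA_PS system. It assumes exactly two mixed signals, uses the closed-form rotation angle of a 2x2 covariance matrix, and only measures kurtosis rather than performing the second, kurtosis-maximizing rotation. The function names `pca_rotate` and `excess_kurtosis`, the test tones and the mixing coefficients are hypothetical choices made for illustration.

```python
import math


def pca_rotate(x1, x2):
    """Rotate two signals so that the first output has maximal variance.

    Illustrative helper (not from the paper): centres both signals,
    builds the 2x2 covariance matrix and rotates by its principal angle.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    cxx = sum(u * u for u in a) / n
    cyy = sum(v * v for v in b) / n
    cxy = sum(u * v for u, v in zip(a, b)) / n
    # Principal-axis angle of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * u + s * v for u, v in zip(a, b)]
    y2 = [-s * u + c * v for u, v in zip(a, b)]
    return y1, y2


def excess_kurtosis(y):
    """Fourth central moment divided by squared variance, minus 3."""
    n = len(y)
    m = sum(y) / n
    m2 = sum((v - m) ** 2 for v in y) / n
    m4 = sum((v - m) ** 4 for v in y) / n
    return m4 / (m2 * m2) - 3.0


# Assumed toy data: two sine tones mixed with arbitrary coefficients.
N = 200
s1 = [math.sin(2 * math.pi * 3 * i / N) for i in range(N)]
s2 = [math.sin(2 * math.pi * 7 * i / N) for i in range(N)]
x1 = [u + 0.6 * v for u, v in zip(s1, s2)]
x2 = [0.4 * u + v for u, v in zip(s1, s2)]
y1, y2 = pca_rotate(x1, x2)
```

After the rotation the two outputs are uncorrelated and the first one carries the larger variance, which is all that PCA guarantees. Separating the sources would additionally require the kurtosis-based step, which this sketch leaves out.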
The system consists of a pre-processing stage for the input signals and a core built on neural networks. Pre-processing is done by the PCA method. It is in fact the same pre-processing as in the mathematical solution of the ICA method: the co-ordinates are rotated in the direction of maximal variance.

Figure 2: The structure of our systems

The second part of the system uses a neural network, namely Kohonen's self-organizing map (SOM). Because of its architecture, this network appears suitable for the non-linear transformation as well. We prepared several experiments with this system [3]. The results were not satisfactory, so we prepared a new system. More details about the previous system are in [2].

5. METHOD IMPROVEMENT: FREQUENCY DOMAIN APPROACH
The structure of the new system is the same as that of the previous one, but the idea behind it is completely different. An audio signal in the time domain is not well suited to this task because it depends on the quality and level of the signal. Most audio signals are therefore processed in the frequency domain, where they are easier to handle and, in general, retain their features better. The same idea applies to the implementation of the ICA method. The developed system is based on frequency-domain pre-processing and clustering by self-organizing neural networks. The transformation from the time domain to the frequency domain is performed by the fast Fourier transformation (FFT).

We can now define the variables used by the system. The input signals are x(t). They are the corrupted (mixed) signals that are to be separated. The separated signals are denoted s'(t). The original signals s(t) serve as the reference for judging the quality of the results; in a real application they are not available. These are the basic variables; the only internal variable of the system is the Fourier image X(k).

5.1. Fast Fourier Transformation - FFT
This transformation has been known for a long time, but before computers were available it was little used. Today it is widely used, mainly in audio processing. The signal is transformed by the equation

$$X(k) = \sum_{i=0}^{N-1} x(i)\, e^{-j 2\pi k i / N}, \qquad k = 0, 1, 2, \ldots, N-1$$

where x(i) is the mixed signal (in the time domain) and X(k) is the Fourier image of the mixed signal (in the frequency domain). This is the forward transformation; we also need the inverse FFT (ifft), defined by the equation

$$s'(i) = \frac{1}{N} \sum_{k=0}^{N-1} S'(k)\, e^{j 2\pi k i / N}, \qquad i = 0, 1, 2, \ldots, N-1$$

where S'(k) is the Fourier image of the estimated signal (in the frequency domain) and s'(i) is the estimated signal (in the time domain).

5.2. Neural Networks SOM and LVQ
We use the same neural networks as in the first system. The first network, SOM, is used as a classifier [5] because of its ability to perform a non-linear transformation. It works by changing the positions of the neurons (in fact, only their weights change), so that the neurons are attracted towards the clusters. The use of SOM in the developed system is shown in Figure 3. Spectral lines that lie very close to one another are clustered. Each cluster represents one audio signal in the frequency domain. After the ifft, these signals are separated in the time domain. SOM is an unsupervised neural network, so we cannot set the number of clusters exactly. This matters because the number of clusters must equal the number of signals, and with SOM this condition cannot be guaranteed.

Figure 3: The basic idea based on SOM (The number of clusters cannot be set, so it may differ from the number of signals.)

For this reason we have also used LVQ, which is similar to SOM. Put simply, this network is a SOM with supervised learning [5]. The idea of the system is the same as with SOM, but the number of clusters can be set exactly. The use of LVQ is shown in Figure 4.

Figure 4: The basic idea based on LVQ (The number of clusters can be set exactly. It has to equal the number of signals.)

After clustering (by SOM or LVQ), we transform the frequency clusters back into time-domain signals. In Figure 4, for example, Cluster 1 is the first separated signal and Cluster 2 is the second. We describe only the situation with two signals for ease of explanation, but the same idea applies to more signals.

The system was programmed in the Java programming language, so it is independent of the operating system. It will be available at http://cs.felk.cvut.cz/~bratm.

6. EXPERIMENTS
We describe several experiments with the developed system. Some experiments use simple audio signals (e.g. a mixture of audio signals with exactly fixed frequencies), others use songs or human speech. The experiments were performed on a PC with a 600 MHz Intel processor and 256 MB of memory, running Windows 2000. We prepared more experiments, but here we show only those with audio signals. The input (mixed) signals are shown in Figure 5 a) and b). There are two mixed, in fact corrupted, signals that have to be repaired. This simulates a cocktail party.

Figure 5: The input mixed signals (speech /one, two, / & song simultaneously): a) First damaged audio signal, b) Second damaged audio signal

The mixed signals were pre-processed by FFT. After that we used only SOM, because the results are quite good. The results are shown in Figure 6 a) and b).

Figure 6: The separated signals: a) First separated audio signal, b) Second separated audio signal

We would like to compare the quality of the separated signals. Quality can be judged from the joint density. The plot of the joint density should be square, although it may be rotated. Figure 7 shows the joint density of the mixed signals. It is non-orthogonal (not square). Figure 8 shows the joint density after separation. It is much better than for the mixed signals. Looking at the audio signals or listening to the songs, the result is quite good. We summarize our results in the conclusion.

Figure 7: Joint density of mixed signals
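The simplest experiment type mentioned above, a mixture of fixed-frequency tones, gives a compact way to illustrate the frequency-domain approach of Section 5. The sketch below is not the authors' Java system. It assumes a single mixed channel, it replaces SOM and LVQ with a plain one-dimensional k-means over spectral line indices as a stand-in clustering step, and it uses a direct discrete Fourier transform instead of a fast algorithm. All names (`dft`, `idft`, `cluster_bins`, `separate`), the relative threshold and the test tones are hypothetical.

```python
import cmath
import math


def dft(x):
    """X(k) = sum_i x(i) * exp(-j*2*pi*k*i/N), computed directly."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


def idft(spec):
    """s'(i) = (1/N) * sum_k S'(k) * exp(j*2*pi*k*i/N), real part only."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * i / n)
                 for k in range(n)) / n).real
            for i in range(n)]


def cluster_bins(bins, n_clusters=2, iters=20):
    """1-D k-means over bin indices (stand-in for SOM/LVQ, needs n_clusters >= 2)."""
    lo, hi = min(bins), max(bins)
    centers = [lo + (hi - lo) * c / (n_clusters - 1) for c in range(n_clusters)]
    labels = [0] * len(bins)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda c: abs(b - centers[c]))
                  for b in bins]
        for c in range(n_clusters):
            members = [b for b, lab in zip(bins, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def separate(x, n_sources=2, rel_threshold=0.1):
    """Cluster the strong spectral lines of x and invert each cluster."""
    n = len(x)
    spec = dft(x)
    half = range(n // 2 + 1)
    peak = max(abs(spec[k]) for k in half)
    active = [k for k in half if abs(spec[k]) > rel_threshold * peak]
    labels = cluster_bins(active, n_sources)
    outputs = []
    for c in range(n_sources):
        part = [0j] * n
        for k, lab in zip(active, labels):
            if lab == c:
                part[k] = spec[k]
                # Keep the mirrored line so the inverse transform is real.
                part[(n - k) % n] = spec[(n - k) % n]
        outputs.append(idft(part))
    return outputs


# Assumed toy data: a low-frequency and a high-frequency pair of tones.
N = 64
s1 = [math.sin(2 * math.pi * 3 * i / N) + 0.5 * math.sin(2 * math.pi * 5 * i / N)
      for i in range(N)]
s2 = [math.sin(2 * math.pi * 20 * i / N) + 0.5 * math.sin(2 * math.pi * 22 * i / N)
      for i in range(N)]
x = [u + v for u, v in zip(s1, s2)]
est1, est2 = separate(x)
```

This toy case works because the two sources occupy well-separated groups of spectral lines. With speech and music the spectra overlap, so the quality of the clustering step decides how much crosstalk remains.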

Figure 8: Joint density of separated signals

7. CONCLUSION
The system based on FFT and SOM appears to be well suited to audio separation problems. We conclude that this idea can be used to solve the BSS problem, especially the cocktail party problem. We also know that the system is not perfect. At first we thought that a system based on clustering with SOM would be inapplicable. During the experiments, however, we found that it works quite well, possibly better than a system that clusters by LVQ. We are extending the system with the LVQ neural network and will then compare the results. We would like to present those results at a future conference.

REFERENCES
[1] Hyvärinen, A., Karhunen, J., Oja, E. 2001. Independent Component Analysis. Canada. ISBN 0-471-40540-X.
[2] Brát, M., Šnorek, M. 2002. Extended Neural ICA for Blind Signal Separation. MOSIS, pages 125-132. ISBN 80-85988-71-2.
[3] Brát, M. 2003. Blind Signal Separation Data Streams Mining Using Neural Network. Postgraduate Study Report DC-PSR-2002-11. CTU.
[4] Pajunen, P., Hyvärinen, A., Karhunen, J. 2000. Non-linear Blind Source Separation by Self-Organizing Maps. Helsinki University of Technology. Espoo.
[5] Šíma, J., Neruda, R. 1996. Teoretické otázky neuronových sítí. MATFYZPRESS. ISBN 80-85863-18-9.

AUTHOR BIOGRAPHIES
MICHAL BRÁT was born in Počátky, South Bohemia, Czech Republic, in 1977. He studied Computer Science and Engineering at the Czech Technical University. He is currently a Ph.D. student at the Department of Computer Science and Engineering of the same university (CTU). His interests include signal processing, audio and video processing, and artificial intelligence, especially neural networks.

MIROSLAV ŠNOREK was born in the South Bohemian town of Písek, Czech Republic, in 1947. He studied Technical Cybernetics at the Czech Technical University in Prague and graduated in 1970. He is currently an Associate Professor at the Department of Computer Science and Engineering of the Faculty of Electrical Engineering of the same university (CTU). He is the head of the Neural Network Group. His research interests include unsupervised clustering, the GMDH algorithm and neural network applications in modelling and interfacing computers to the real world.