Pitch estimation using spiking neurons


Pitch estimation using spiking neurons

K. Voutsas, Research Assistant, Control Theory and Robotics Lab, Institute of Automatic Control, Darmstadt University of Technology, Petersenstr. 2, D Darmstadt, kvoutsas@rtr.tu-darmstadt.de

J. Adamy, Head of Control Theory and Robotics Lab, Institute of Automatic Control, Darmstadt University of Technology, Landgraf-Georg-Str. 4, D Darmstadt, adamy@rtr.tu-darmstadt.de

ABSTRACT

The paper introduces a brain-like neuronal model for sound processing. The Periodicity Analyzing Network (PAN) is a bio-inspired neural network of spiking neurons simulating certain parts of nuclei of the auditory system in detail. The PAN consists of complex neuron models, which can be used for understanding the dynamics of individual neurons and the mechanisms of structured neural networks of the auditory system. Because of the cochlear frequency analysis, a neuron responds strongest at its characteristic frequency (CF). In addition to its CF, a coincidence neuron is tuned to a certain periodicity, i.e. a certain modulation frequency of an AM signal, also called the best modulation frequency (BMF). Following the cochlear filtering, each PAN neuron responds to the encoded carrier and modulation information according to its BMF and CF, thus forming a spatial structure in which the representations of CF and BMF, encoding carrier and modulation frequency respectively, are roughly orthogonal. On a technical level, the network is able to process fundamental frequency characteristics of harmonic sound signals. The PAN model may therefore be used in audio signal processing applications such as periodicity analysis, pitch extraction and the cocktail party problem.

Introduction

Most common pitch estimation algorithms are based upon time domain (temporal) or frequency domain (spatial) methods.
Autocorrelation, zero-crossing or maximum likelihood methods in the time domain [1, 2, 3], or the cepstrum method and the harmonic product spectrum method in the frequency domain [4, 5], provide a wide range of algorithms for pitch estimation and extraction. Most of these methods are mathematical models and only vaguely based on physiological models of hearing. Some other methods combine the advantages of spatial and temporal processing methods [6], and so far only one biologically inspired spatiotemporal method for pitch estimation is widely known [7]. A new biologically based spatiotemporal approach to the pitch estimation problem is introduced in this paper. The Periodicity Analyzing Network (PAN) is a spiking neural network based on neural mechanisms, utilizing complex neuron models and attempting to simulate certain parts of nuclei of the auditory system in detail. It can be used both for

purposes of understanding the mechanisms of a structured neural network of the auditory system and for periodicity analysis and sound source localization tasks with amplitude modulation (AM) signals in technical applications.

1. Physiological fundamentals

1.1 The auditory pathway

The external ear (pinna) bundles the arriving sound, which is then directed through the outer auditory canal to the ear drum and on to the inner ear. The basilar membrane in the cochlea is tonotopically organized. This feature of the cochlea enables a decomposition of the traveling wave generated by the incoming acoustic stimulus at different points of the membrane, thus implementing a frequency filtering mechanism that filters higher frequencies at the beginning of the cochlea and lower ones at its end. The basilar membrane is lined with sensitive hair cells, which trigger the generation of nerve signals that are sent through the auditory nerve (AN) to the central nervous system (CNS). The AN transfers spike-encoded sound signals to the three centers of the cochlear nucleus (CN) (Fig. 1(a)). The first neural processing levels of periodicity analysis occur in the CN [8]. The DCN and PVCN nuclei forward the signal to the nucleus of the lateral lemniscus (NLL) and to the inferior colliculus (IC), the next processing levels of periodicity analysis [8]. The resulting information is transferred to the auditory cortex (AC) via the medial geniculate body (MGB). A spiking neural network was developed which makes use of the described interconnections in the auditory pathway. The neural network is able to perform periodicity analysis tasks as described in the following section, along with biological evidence from electrophysiological experiments supporting this model.

1.2 Physiological structure of the periodicity analyzing network

The neural network described in this section is a correlation network of spiking neurons. The basic structure of the periodicity analysis model (Fig.
(b)) consists of a trigger neuron, an oscillator neuron, an integrator complex, and a coincidence neuron. Exemplary neuronal potentials describing the function of the four modules of the network driven with an optimal stimulus are shown in the right part of Fig. 1(b). The function of the network is based upon the correlation of delayed and undelayed neuronal responses of the depicted neurons to envelopes of AM signals. These responses finally converge at neurons acting as coincidence detectors [8]. Each modulation period of an AM signal activates the trigger neuron (Fig. 1(b)), which triggers a rapid oscillation (oscillator potential in Fig. 1(b)) with a predefined frequency. Parallel to that process, the integrator neuron responds to the same cycle, only with a longer delay (the integration period of the integrator). The coincidence neuron will be activated, despite the different delay times of the two previous units, provided that the integration period equals the period of the AM signal. A coincidence neuron will respond more often when its inputs are synchronized, i.e. when the oscillation and integration delay periods of its inputs have approximately the same duration. Thus, the modulation periods m·τ_m, with m = 1, 2, ..., which activate the oscillations and drive the coincidence unit can be computed from the following linear equation:

Figure 1. (a) The auditory pathway: cochlea, cochlear nucleus (DCN, PVCN, AVCN), superior olive (MSO, MNTB, LSO), NLL, inferior colliculus IC, medial geniculate body MGB, and auditory cortex. (b) The periodicity analyzing neural model (trigger, oscillator, integrator with flip-flop neurons FF1 and FF2, and coincidence neuron with inhibition) and some exemplary neuronal potentials of a PAN module. The model is driven with a stimulus generating equal oscillation and integration delay periods and therefore a coincidence for the specific module.

    m·τ_m = n·τ_c − k·τ_k                                    (1.1)

where m, n are small integers and k = 0, 1, ..., k_max. Here n·τ_c is the integration period, which consists of n carrier periods and is the time the integrated input signal needs to reach a certain threshold; 1/τ_c is the carrier frequency of the AM signal, 1/τ_k the frequency of the oscillations, and k_max the number of oscillations triggered by the modulation of the AM signal which are required for the synchronization of the two inputs of the coincidence unit. The parameter m takes into account the fact that coincidence neurons respond also to harmonics (m > 1) of the modulation frequency of the AM signal, which implies an ambiguity of IC neurons with respect to harmonically related signals. A solution to this problem based on electrophysiological results is proposed in [9] and is also tested in the present model. Because of the cochlear frequency analysis, a neuron responds strongest at its characteristic frequency (CF). In addition to its CF, a coincidence neuron is tuned to a certain periodicity, i.e. a certain modulation frequency of an AM signal, also called the best modulation frequency (BMF). Therefore, different trigger, oscillator, integrator, and coincidence units are needed to cover the range of periodicities of AM signals. The biological evidence supporting the hypothesis of such a periodicity analysis in the auditory system is described in detail in [10].
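To make the coincidence condition concrete, the following sketch enumerates the modulation periods τ_m satisfying the coincidence equation m·τ_m = n·τ_c − k·τ_k for small integers m, n, k. The carrier and oscillation periods used in the example are illustrative values, not parameters taken from the PAN model.

```python
# Sketch: enumerate modulation periods tau_m satisfying the coincidence
# equation m*tau_m = n*tau_c - k*tau_k for small integers m, n, k.
# tau_c, tau_k and the search ranges are hypothetical example values.

def coincidence_periods(tau_c, tau_k, n_max=5, k_max=3, m_max=2):
    """Return a sorted list of (tau_m, m, n, k) tuples with tau_m > 0."""
    hits = []
    for m in range(1, m_max + 1):
        for n in range(1, n_max + 1):
            for k in range(0, k_max + 1):
                tau_m = (n * tau_c - k * tau_k) / m
                if tau_m > 0:
                    hits.append((tau_m, m, n, k))
    return sorted(hits)

# Example: 1 kHz carrier (tau_c = 1 ms), 2.5 kHz oscillation (tau_k = 0.4 ms)
for tau_m, m, n, k in coincidence_periods(1e-3, 0.4e-3)[:5]:
    print(f"tau_m = {tau_m * 1e3:.2f} ms  (m={m}, n={n}, k={k})")
```

Each listed τ_m is a modulation period at which a coincidence unit with these (hypothetical) delay parameters would synchronize; the m > 1 entries illustrate the harmonic ambiguity discussed above.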
The periodicity analysis model explains the selectivity of the neurons of the midbrain for a specific BMF. Utilizing a model of cochlear filtering, a mechanism encoding the carrier and modulation information of an AM signal, and numerous PANs in parallel, tuned differently for various CFs and BMFs, we can simulate the response of the IC to AM signals (Fig. 2(a)). We can therefore perform periodicity analysis and pitch extraction. The implementation of the modules up to the input of the PAN and the simulation of the PAN model are described in the following section.
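As a minimal illustration of such a front end — band-pass filtering followed by spike generation at positive zero-crossings, as in the cochlea and hair-cell models of Section 2.1 — the sketch below passes a tone through a single second-order band-pass channel standing in for one filter of the cochlear filterbank. The biquad design (RBJ cookbook form), the Q value and all frequencies are illustrative assumptions, not the ERB filters of the actual model.

```python
import math

def bandpass(x, fs, f0, q=8.0):
    """One biquad band-pass channel (RBJ cookbook, unity peak gain).
    Illustrative stand-in for a single ERB filter of the filterbank."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x1, x2, y1, y2 = xn, x1, yn, y1
    return y

def spike_train(y):
    """Unit spike at every positive zero-crossing of the filtered signal."""
    return [1 if a <= 0 < b else 0 for a, b in zip(y, y[1:])]

fs = 16000
x = [math.sin(2 * math.pi * 800 * n / fs) for n in range(1600)]  # 100 ms, 800 Hz
spikes = spike_train(bandpass(x, fs, 800))
print(sum(spikes), "spikes in 100 ms")  # roughly one spike per 800 Hz cycle
```

The resulting spike train carries the carrier frequency in its spike rate, which is the encoded information a PAN unit receives at its inputs.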

Figure 2. (a) A highly simplified scheme of the tonotopic and periodotopic organization of the auditory brainstem [11]: cochlear band-pass filter and rectifier, DCN integrator and VCN envelope coder in the CN, and coincidence detectors in the ICC spanning a frequency axis and a periodicity axis. Following the cochlear filtering, the modules of the PAN respond to the encoded carrier and modulation information according to their BMF and CF, thus forming a spatial structure in which the tonotopic and periodotopic axes of the IC neurons are roughly orthogonal. (b) Block diagram of the PAN model implementation (trigger, oscillator, inhibition, integrator, coincidence and flip-flop neurons with modulation and carrier inputs) corresponding to the physiological model of Fig. 1(b).

2. Simulation of the PAN model

2.1 Models of the cochlea and of the inner hair-cells

A model of the cochlear filtering mechanism is used to simulate the band-pass decomposition of a sound signal and the tonotopic organization of the cochlea. A corresponding band-pass filterbank is used, consisting of a series of band-pass filters, the so-called ERB filters [12]. The equivalent rectangular bandwidth (ERB) corresponds to the bandwidth of each filter of the human cochlea at various points along the basilar membrane, based on psychoacoustic measurements. The decomposition of the AM signal in the cochlea is followed by a simulation of the inner hair-cells, which transform the mechanical response of each filter into electrical pulses [13]. At every positive zero-crossing of the filtered signal a spike is triggered; the amplitude of each spike equals 1. A spike train for each filter is thus generated, which is then used as encoded information about the modulation and the carrier frequency of the AM signal. A more detailed description of the cochlea and inner hair-cell models can be found in [10].

2.2 Simulating neurons

The functional structure of the chemical synapse model can be seen in Fig. 3.
An incoming spike from the presynaptic neuron releases synaptic vesicles containing neurotransmitters. The vesicle emission mechanism is simulated with a look-up table providing a certain predefined amount of vesicles each time the subsystem is enabled by an incoming spike, as seen in Fig. 3. The transmitter molecules diffuse to the postsynaptic neuron through the synaptic cleft. The decay of the transmitter

concentration is simulated by a leaky integrator. The amount of transmitter at the postsynaptic neuron changes its permeability to certain ions. Ion channels are thus gradually opened, admitting ever more ions and forming a current moving towards the soma of the neuron; a resistance mechanism shapes this into a gradually increasing postsynaptic current (PSC). PSCs can be either excitatory or inhibitory (EPSC or IPSC), depending on the ions rushing through the postsynaptic membrane. This mechanism is simulated by the weight function of the synapse model. The overall time needed for the diffusion of the transmitters and the transmission of the PSCs to the soma is modelled with a predefined time delay for each synapse. A soma model based on an integrate-and-fire model [14] was developed here especially for the PAN simulation. A leaky integrate-and-fire neuron consists of a leak resistance R in parallel to a capacitance C, driven by an external current I. The neuron will fire only if the excitatory input is strong enough to overcome the leak. The voltage u across the capacitor can be interpreted as the membrane potential of the neuron. The voltage u starts from zero and increases or decreases depending on the synaptic input. When the voltage u reaches a threshold ϑ, the neuron instantly fires a spike and returns to the initial value u = v. After an absolute refractory period, during which the neuron cannot fire due to hyperpolarization of the membrane, and a relative refractory period, during which the neuron can fire only for a very strong input, the cell is ready to fire again. A detailed description of the models, the tunable parameters and their value ranges can be found in [10].
Figure 3. Block diagrams of the chemical synapse model and of the leaky integrate-and-fire soma model implemented in MATLAB SIMULINK.

2.3 Simulating the PAN model

Based upon the biological model of Fig. 1(b), a simulation model utilizing the neuron model described above was developed (Fig. 2(b)). The implemented PAN unit is functionally similar to its biological analogon described in Section 1.2, including also a third, inhibitory connection to the coincidence neuron. Furthermore, a new function of the PAN is proposed here to cover stimuli at higher frequencies. The trigger and the integrator neurons receive the two PAN inputs, one encoding the modulation and the other the carrier frequency of the acoustic signal. The trigger neuron is synchronised to the incoming signal from the inner hair-cell model and drives the oscillator, which is implemented by a single oscillating neuron in our model. One spike (AP) of the trigger neuron is sufficient for the oscillator to release a series of spikes with a predefined frequency, thus providing the coincidence time window needed for the periodicity analysis. The flip-flop neurons

synchronize the accumulation of spikes in the integrator with the output of the trigger neuron, and the integrator provides spikes to the coincidence neuron, which also has a third input simulating the modulation-coupled inhibition of the coincidence mechanism. This inhibition mechanism suppresses reactions of the coincidence neuron to harmonics of the preferred BMF of a specific PAN unit. Depending on the frequency of the incoming signals, we propose a dual-function-mode scheme for the PAN model. When receiving low frequency stimuli (< 1 kHz), the response of the integrator is coupled to each modulation period of the stimulus [10], while for high frequency stimuli (above 1 kHz), the integrator and thus the flip-flop structure respond every two modulation periods of the stimulus (Fig. 4). The advantage of the second mode is that the integrator and the flip-flop neurons are still able to respond phase-coupled to higher frequency stimuli; in the first mode this would not be the case, and a population of neurons would be needed to encode higher frequency stimuli. Therefore, system simplicity and robustness (higher frequencies can be encoded with fewer neurons) and model execution time are positively affected by the introduction of the proposed dual-mode scheme.

Figure 4.
AP and PSP plots of a PAN unit tuned for a Hz to 6 Hz (modulation/carrier frequency) signal and tested with this signal. (a) APs of the encoded carrier frequency as received from the cochlear filterbank, (b) PSP of the integrator neuron to the incoming APs of (a), (c) resulting APs of the integrator, (d) APs of the trigger neuron, which receives various cochlear filterbank channels and decodes the modulation frequency of the signal, (e) oscillator APs generated at each incoming AP of the trigger seen in (d), and (f) coincidence APs, resulting from the temporal coincidence of (c) and (e) and thus encoding the specific carrier to modulation frequency ratio of the incoming signal.
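The soma model underlying these plots, the leaky integrate-and-fire neuron of Section 2.2, can be sketched as follows. Euler integration is used, and the threshold, leak time constant and refractory period are illustrative values, not the tuned parameters of a real PAN unit.

```python
def lif_run(input_current, dt=1e-4, tau=10e-3, r=1.0,
            threshold=1.0, v_reset=0.0, t_refract=2e-3):
    """Leaky integrate-and-fire soma: the membrane voltage u follows
    du/dt = (-u + R*I) / tau; when u crosses the threshold the neuron
    fires, u resets, and the neuron stays silent for an absolute
    refractory period. All parameter values are illustrative."""
    u, spikes, refract_left = v_reset, [], 0.0
    for i_ext in input_current:
        if refract_left > 0:          # absolute refractory period
            refract_left -= dt
            spikes.append(0)
            continue
        u += dt * (-u + r * i_ext) / tau   # forward-Euler leak + drive
        if u >= threshold:                  # fire and reset
            spikes.append(1)
            u = v_reset
            refract_left = t_refract
        else:
            spikes.append(0)
    return spikes

# Constant suprathreshold drive: the neuron fires regularly.
spikes = lif_run([1.5] * 1000)   # 100 ms of constant input
print("spike count:", sum(spikes))
```

With a constant input above threshold the sketch produces a regular spike train whose rate is set by the charging time to threshold plus the refractory period, which is the behaviour the trigger, integrator and coincidence neurons inherit from this soma model.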

Each block of the model consists of a neuron as described in Section 2.2, with the trigger and the oscillator neurons having one, the integrator and the flip-flop neurons having two, and the coincidence neuron having three synaptic inputs. Numerous parameters of each neuron can be tuned according to the CF and the BMF that one PAN unit should maximally react to. Among these parameters are the amount of transmitters, the time delay and the weight of each synapse model. The threshold, the leakage current and the refractory period of each soma model can be optimised for every PAN unit. Adjusting the parameters of a PAN unit can be done using optimization algorithms and is a challenging task for further research.

3. An example of pitch estimation

The tests presented in Fig. 5 show one aspect of the evaluation of a PAN unit. One PAN unit, tuned for a specific modulation to carrier frequency ratio of an arbitrary incoming stimulus and for a specific CF, is tested with a wide range of SAM stimuli: 15 modulation frequencies ranging from 6 to 2 Hz and 15 carrier frequencies ranging from 3 to 4 Hz were tested. As seen in both exemplary cases, the maximum response of the PAN unit is correctly placed at the tuned (desired) ratio. Remaining responses in the neighbourhood of the maximum response can be suppressed using a winner-take-all neural network at the output layer of a complete PAN array, thus providing increased efficiency of the model.

Figure 5. Simulation results of two PAN units, the one on the left tuned to react to a Hz modulation to 6 Hz carrier frequency AM signal and the one on the right to a 5 Hz modulation to 8 Hz carrier frequency AM signal. The PAN units were tested with 225 SAM signals with different combinations of modulation (6 to 2 Hz) and carrier (3 to 4 Hz) frequencies.

4.
Summary and conclusions

The simulation results of the complete auditory spatial tonotopic and periodotopic structure consisting of PAN units show that it is possible to combine processing tasks with detailed models of spiking neurons and neural networks based on neuronal mechanisms, obtaining technical applications that perform comparably to the auditory system.

Furthermore, an accurate periodicity analysis mechanism providing pitch estimation can be implemented using the PAN unit. The tonotopic and periodotopic structure proposed in this paper can therefore be used for distinguishing one among many simultaneously speaking persons. A further improvement is proposed with a dual-mode function scheme to cover a wide range of frequencies of incoming stimuli.

5. Literature

[1] A. E. Rosenberg, M. R. Sambur: New techniques for automatic speaker verification, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. , 1975.

[2] N. J. Miller: Pitch detection by data reduction, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, pp. , 1975.

[3] J. J. Dubnowski, R. W. Schafer, L. R. Rabiner: Real-time digital hardware pitch detector, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 2-8, 1976.

[4] R. W. Schafer, L. R. Rabiner: System for automatic formant analysis of voiced speech, J. Acoust. Soc. Am., vol. 47, pp. , 1970.

[5] A. M. Noll: Cepstrum pitch determination, J. Acoust. Soc. Am., vol. 41, no. 2, pp. , 1967.

[6] T. Tolonen, M. Karjalainen: A computationally efficient multipitch analysis model, IEEE Trans. Speech Audio Processing, vol. 8(6), pp. , 2000.

[7] M. Slaney, R. F. Lyon: A perceptual pitch detector, in Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 1, pp. , 1990.

[8] G. Langner: Neuronal periodicity coding and pitch effects, in Central Auditory Processing and Neural Modeling (Poon and Brugge, Eds.), New York: Plenum Press, pp. 3-4, 1998.

[9] M. Ochse, G. Langner: Modulation tuning in the auditory midbrain of gerbils: bandpasses are formed by inhibition, Proc. 5th Meet. of the German Neurosci. Soc., pp. , 2003.

[10] K. Voutsas, G. Langner, J. Adamy, M. Ochse: A brain-like neural network for periodicity analysis, IEEE Trans. Systems, Man, and Cybernetics, Part B, submitted November 2003, accepted as a regular paper July 2004.

[11] G. Langner, M. Sams, P. Heil, H.
Schulze: Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: evidence from magnetoencephalography, J. Comp. Physiol., vol. 181, pp. , 1997.

[12] R. Patterson, I. Nimmo-Smith, J. Holdsworth, P. Rice: Spiral VOS final report: Part A, the auditory filterbank, Internal Report, University of Cambridge, England, 1988.

[13] R. Meddis, M. J. Hewitt, T. M. Shackleton: Implementation details of a computational model of the inner hair-cell/auditory-nerve synapse, J. Acoust. Soc. Am., vol. 87(4), pp. , 1990.

[14] C. Koch, C. H. Mo, W. Softky: Single-Cell Models, in The Handbook of Brain Theory and Neural Networks (M. A. Arbib, Ed.), 2nd ed., Cambridge, MA: MIT Press, 2003, pp. .


More information

Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts

Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts POSTER 25, PRAGUE MAY 4 Testing of Objective Audio Quality Assessment Models on Archive Recordings Artifacts Bc. Martin Zalabák Department of Radioelectronics, Czech Technical University in Prague, Technická

More information

Coding and computing with balanced spiking networks. Sophie Deneve Ecole Normale Supérieure, Paris

Coding and computing with balanced spiking networks. Sophie Deneve Ecole Normale Supérieure, Paris Coding and computing with balanced spiking networks Sophie Deneve Ecole Normale Supérieure, Paris Cortical spike trains are highly variable From Churchland et al, Nature neuroscience 2010 Cortical spike

More information

John Lazzaro and Carver Mead Department of Computer Science California Institute of Technology Pasadena, California, 91125

John Lazzaro and Carver Mead Department of Computer Science California Institute of Technology Pasadena, California, 91125 Lazzaro and Mead Circuit Models of Sensory Transduction in the Cochlea CIRCUIT MODELS OF SENSORY TRANSDUCTION IN THE COCHLEA John Lazzaro and Carver Mead Department of Computer Science California Institute

More information

Figure 1. Artificial Neural Network structure. B. Spiking Neural Networks Spiking Neural networks (SNNs) fall into the third generation of neural netw

Figure 1. Artificial Neural Network structure. B. Spiking Neural Networks Spiking Neural networks (SNNs) fall into the third generation of neural netw Review Analysis of Pattern Recognition by Neural Network Soni Chaturvedi A.A.Khurshid Meftah Boudjelal Electronics & Comm Engg Electronics & Comm Engg Dept. of Computer Science P.I.E.T, Nagpur RCOEM, Nagpur

More information

CN510: Principles and Methods of Cognitive and Neural Modeling. Neural Oscillations. Lecture 24

CN510: Principles and Methods of Cognitive and Neural Modeling. Neural Oscillations. Lecture 24 CN510: Principles and Methods of Cognitive and Neural Modeling Neural Oscillations Lecture 24 Instructor: Anatoli Gorchetchnikov Teaching Fellow: Rob Law It Is Much

More information

CMOS Architecture of Synchronous Pulse-Coupled Neural Network and Its Application to Image Processing

CMOS Architecture of Synchronous Pulse-Coupled Neural Network and Its Application to Image Processing CMOS Architecture of Synchronous Pulse-Coupled Neural Network and Its Application to Image Processing Yasuhiro Ota Bogdan M. Wilamowski Image Information Products Hdqrs. College of Engineering MINOLTA

More information

PERFORMANCE COMPARISON BETWEEN STEREAUSIS AND INCOHERENT WIDEBAND MUSIC FOR LOCALIZATION OF GROUND VEHICLES ABSTRACT

PERFORMANCE COMPARISON BETWEEN STEREAUSIS AND INCOHERENT WIDEBAND MUSIC FOR LOCALIZATION OF GROUND VEHICLES ABSTRACT Approved for public release; distribution is unlimited. PERFORMANCE COMPARISON BETWEEN STEREAUSIS AND INCOHERENT WIDEBAND MUSIC FOR LOCALIZATION OF GROUND VEHICLES September 1999 Tien Pham U.S. Army Research

More information

SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS SUMMARY INTRODUCTION

SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS SUMMARY INTRODUCTION SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS Roland SOTTEK, Klaus GENUIT HEAD acoustics GmbH, Ebertstr. 30a 52134 Herzogenrath, GERMANY SUMMARY Sound quality evaluation of

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Supplementary Material

Supplementary Material Supplementary Material Orthogonal representation of sound dimensions in the primate midbrain Simon Baumann, Timothy D. Griffiths, Li Sun, Christopher I. Petkov, Alex Thiele & Adrian Rees Methods: Animals

More information

Introduction to cochlear implants Philipos C. Loizou Figure Captions

Introduction to cochlear implants Philipos C. Loizou Figure Captions http://www.utdallas.edu/~loizou/cimplants/tutorial/ Introduction to cochlear implants Philipos C. Loizou Figure Captions Figure 1. The top panel shows the time waveform of a 30-msec segment of the vowel

More information

A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking

A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking A cat's cocktail party: Psychophysical, neurophysiological, and computational studies of spatial release from masking Courtney C. Lane 1, Norbert Kopco 2, Bertrand Delgutte 1, Barbara G. Shinn- Cunningham

More information

Binaural Sound Localization Systems Based on Neural Approaches. Nick Rossenbach June 17, 2016

Binaural Sound Localization Systems Based on Neural Approaches. Nick Rossenbach June 17, 2016 Binaural Sound Localization Systems Based on Neural Approaches Nick Rossenbach June 17, 2016 Introduction Barn Owl as Biological Example Neural Audio Processing Jeffress model Spence & Pearson Artifical

More information

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution

AUDL GS08/GAV1 Signals, systems, acoustics and the ear. Loudness & Temporal resolution AUDL GS08/GAV1 Signals, systems, acoustics and the ear Loudness & Temporal resolution Absolute thresholds & Loudness Name some ways these concepts are crucial to audiologists Sivian & White (1933) JASA

More information

You know about adding up waves, e.g. from two loudspeakers. AUDL 4007 Auditory Perception. Week 2½. Mathematical prelude: Adding up levels

You know about adding up waves, e.g. from two loudspeakers. AUDL 4007 Auditory Perception. Week 2½. Mathematical prelude: Adding up levels AUDL 47 Auditory Perception You know about adding up waves, e.g. from two loudspeakers Week 2½ Mathematical prelude: Adding up levels 2 But how do you get the total rms from the rms values of two signals

More information

Human Auditory Periphery (HAP)

Human Auditory Periphery (HAP) Human Auditory Periphery (HAP) Ray Meddis Department of Human Sciences, University of Essex Colchester, CO4 3SQ, UK. rmeddis@essex.ac.uk A demonstrator for a human auditory modelling approach. 23/11/2003

More information

A unitary model of pitch perception Ray Meddis and Lowel O Mard Department of Psychology, Essex University, Colchester CO4 3SQ, United Kingdom

A unitary model of pitch perception Ray Meddis and Lowel O Mard Department of Psychology, Essex University, Colchester CO4 3SQ, United Kingdom A unitary model of pitch perception Ray Meddis and Lowel O Mard Department of Psychology, Essex University, Colchester CO4 3SQ, United Kingdom Received 15 March 1996; revised 22 April 1997; accepted 12

More information

Chapter 2 A Silicon Model of Auditory-Nerve Response

Chapter 2 A Silicon Model of Auditory-Nerve Response 5 Chapter 2 A Silicon Model of Auditory-Nerve Response Nonlinear signal processing is an integral part of sensory transduction in the nervous system. Sensory inputs are analog, continuous-time signals

More information

Lecture 4 Foundations and Cognitive Processes in Visual Perception From the Retina to the Visual Cortex

Lecture 4 Foundations and Cognitive Processes in Visual Perception From the Retina to the Visual Cortex Lecture 4 Foundations and Cognitive Processes in Visual Perception From the Retina to the Visual Cortex 1.Vision Science 2.Visual Performance 3.The Human Visual System 4.The Retina 5.The Visual Field and

More information

TIME encoding of a band-limited function,,

TIME encoding of a band-limited function,, 672 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 8, AUGUST 2006 Time Encoding Machines With Multiplicative Coupling, Feedforward, and Feedback Aurel A. Lazar, Fellow, IEEE

More information

Neuromorphic VLSI Event-Based devices and systems

Neuromorphic VLSI Event-Based devices and systems Neuromorphic VLSI Event-Based devices and systems Giacomo Indiveri Institute of Neuroinformatics University of Zurich and ETH Zurich LTU, Lulea May 28, 2012 G.Indiveri (http://ncs.ethz.ch/) Neuromorphic

More information

Detection of external stimuli Response to the stimuli Transmission of the response to the brain

Detection of external stimuli Response to the stimuli Transmission of the response to the brain Sensation Detection of external stimuli Response to the stimuli Transmission of the response to the brain Perception Processing, organizing and interpreting sensory signals Internal representation of the

More information

COM325 Computer Speech and Hearing

COM325 Computer Speech and Hearing COM325 Computer Speech and Hearing Part III : Theories and Models of Pitch Perception Dr. Guy Brown Room 145 Regent Court Department of Computer Science University of Sheffield Email: g.brown@dcs.shef.ac.uk

More information

IN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation

IN a natural environment, speech often occurs simultaneously. Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 15, NO. 5, SEPTEMBER 2004 1135 Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation Guoning Hu and DeLiang Wang, Fellow, IEEE Abstract

More information

Effects of Firing Synchrony on Signal Propagation in Layered Networks

Effects of Firing Synchrony on Signal Propagation in Layered Networks Effects of Firing Synchrony on Signal Propagation in Layered Networks 141 Effects of Firing Synchrony on Signal Propagation in Layered Networks G. T. Kenyon,l E. E. Fetz,2 R. D. Puffl 1 Department of Physics

More information

An Auditory Localization and Coordinate Transform Chip

An Auditory Localization and Coordinate Transform Chip An Auditory Localization and Coordinate Transform Chip Timothy K. Horiuchi timmer@cns.caltech.edu Computation and Neural Systems Program California Institute of Technology Pasadena, CA 91125 Abstract The

More information

Binaural Mechanisms that Emphasize Consistent Interaural Timing Information over Frequency

Binaural Mechanisms that Emphasize Consistent Interaural Timing Information over Frequency Binaural Mechanisms that Emphasize Consistent Interaural Timing Information over Frequency Richard M. Stern 1 and Constantine Trahiotis 2 1 Department of Electrical and Computer Engineering and Biomedical

More information

The Human Auditory System

The Human Auditory System medial geniculate nucleus primary auditory cortex inferior colliculus cochlea superior olivary complex The Human Auditory System Prominent Features of Binaural Hearing Localization Formation of positions

More information

AES London 2010 Workshop W6

AES London 2010 Workshop W6 AES London 2010 Workshop W6 Sunday, May 23, 14:00 15:45 (Room C2) W6 - How Do We Evaluate High Resolution Formats for Digital Audio? Chair: Hans van Maanen, Temporal Coherence - The Netherlands Panelists:

More information

Exploiting envelope fluctuations to achieve robust extraction and intelligent integration of binaural cues

Exploiting envelope fluctuations to achieve robust extraction and intelligent integration of binaural cues The Technology of Binaural Listening & Understanding: Paper ICA216-445 Exploiting envelope fluctuations to achieve robust extraction and intelligent integration of binaural cues G. Christopher Stecker

More information

IN practically all listening situations, the acoustic waveform

IN practically all listening situations, the acoustic waveform 684 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 3, MAY 1999 Separation of Speech from Interfering Sounds Based on Oscillatory Correlation DeLiang L. Wang, Associate Member, IEEE, and Guy J. Brown

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

A learning, biologically-inspired sound localization model

A learning, biologically-inspired sound localization model A learning, biologically-inspired sound localization model Elena Grassi Neural Systems Lab Institute for Systems Research University of Maryland ITR meeting Oct 12/00 1 Overview HRTF s cues for sound localization.

More information

Retina. last updated: 23 rd Jan, c Michael Langer

Retina. last updated: 23 rd Jan, c Michael Langer Retina We didn t quite finish up the discussion of photoreceptors last lecture, so let s do that now. Let s consider why we see better in the direction in which we are looking than we do in the periphery.

More information

SOUND 1 -- ACOUSTICS 1

SOUND 1 -- ACOUSTICS 1 SOUND 1 -- ACOUSTICS 1 SOUND 1 ACOUSTICS AND PSYCHOACOUSTICS SOUND 1 -- ACOUSTICS 2 The Ear: SOUND 1 -- ACOUSTICS 3 The Ear: The ear is the organ of hearing. SOUND 1 -- ACOUSTICS 4 The Ear: The outer ear

More information

Limulus eye: a filter cascade. Limulus 9/23/2011. Dynamic Response to Step Increase in Light Intensity

Limulus eye: a filter cascade. Limulus 9/23/2011. Dynamic Response to Step Increase in Light Intensity Crab cam (Barlow et al., 2001) self inhibition recurrent inhibition lateral inhibition - L17. Neural processing in Linear Systems 2: Spatial Filtering C. D. Hopkins Sept. 23, 2011 Limulus Limulus eye:

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

Ian C. Bruce Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21205

Ian C. Bruce Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21205 A phenomenological model for the responses of auditory-nerve fibers: I. Nonlinear tuning with compression and suppression Xuedong Zhang Hearing Research Center and Department of Biomedical Engineering,

More information

Sensation and Perception

Sensation and Perception Page 94 Check syllabus! We are starting with Section 6-7 in book. Sensation and Perception Our Link With the World Shorter wavelengths give us blue experience Longer wavelengths give us red experience

More information

SWITCHED CAPACITOR BASED IMPLEMENTATION OF INTEGRATE AND FIRE NEURAL NETWORKS

SWITCHED CAPACITOR BASED IMPLEMENTATION OF INTEGRATE AND FIRE NEURAL NETWORKS Journal of ELECTRICAL ENGINEERING, VOL. 54, NO. 7-8, 23, 28 212 SWITCHED CAPACITOR BASED IMPLEMENTATION OF INTEGRATE AND FIRE NEURAL NETWORKS Daniel Hajtáš Daniela Ďuračková This paper is dealing with

More information

Supplementary Materials for

Supplementary Materials for advances.sciencemag.org/cgi/content/full/2/6/e1501326/dc1 Supplementary Materials for Organic core-sheath nanowire artificial synapses with femtojoule energy consumption Wentao Xu, Sung-Yong Min, Hyunsang

More information

A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise

A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise A Neural Edge-etection odel for Enhanced Auditory Sensitivity in odulated Noise Alon Fishbach and Bradford J. ay epartment of Biomedical Engineering and Otolaryngology-HNS Johns Hopkins University Baltimore,

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

A GENERAL SYSTEM DESIGN & IMPLEMENTATION OF SOFTWARE DEFINED RADIO SYSTEM

A GENERAL SYSTEM DESIGN & IMPLEMENTATION OF SOFTWARE DEFINED RADIO SYSTEM A GENERAL SYSTEM DESIGN & IMPLEMENTATION OF SOFTWARE DEFINED RADIO SYSTEM 1 J. H.VARDE, 2 N.B.GOHIL, 3 J.H.SHAH 1 Electronics & Communication Department, Gujarat Technological University, Ahmadabad, India

More information

CME312- LAB Manual DSB-SC Modulation and Demodulation Experiment 6. Experiment 6. Experiment. DSB-SC Modulation and Demodulation

CME312- LAB Manual DSB-SC Modulation and Demodulation Experiment 6. Experiment 6. Experiment. DSB-SC Modulation and Demodulation Experiment 6 Experiment DSB-SC Modulation and Demodulation Objectives : By the end of this experiment, the student should be able to: 1. Demonstrate the modulation and demodulation process of DSB-SC. 2.

More information

Feasibility of Vocal Emotion Conversion on Modulation Spectrogram for Simulated Cochlear Implants

Feasibility of Vocal Emotion Conversion on Modulation Spectrogram for Simulated Cochlear Implants Feasibility of Vocal Emotion Conversion on Modulation Spectrogram for Simulated Cochlear Implants Zhi Zhu, Ryota Miyauchi, Yukiko Araki, and Masashi Unoki School of Information Science, Japan Advanced

More information

MULTIPLE F0 ESTIMATION IN THE TRANSFORM DOMAIN

MULTIPLE F0 ESTIMATION IN THE TRANSFORM DOMAIN 10th International Society for Music Information Retrieval Conference (ISMIR 2009 MULTIPLE F0 ESTIMATION IN THE TRANSFORM DOMAIN Christopher A. Santoro +* Corey I. Cheng *# + LSB Audio Tampa, FL 33610

More information

A102 Signals and Systems for Hearing and Speech: Final exam answers

A102 Signals and Systems for Hearing and Speech: Final exam answers A12 Signals and Systems for Hearing and Speech: Final exam answers 1) Take two sinusoids of 4 khz, both with a phase of. One has a peak level of.8 Pa while the other has a peak level of. Pa. Draw the spectrum

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

VERY LARGE SCALE INTEGRATION signal processing

VERY LARGE SCALE INTEGRATION signal processing IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 44, NO. 9, SEPTEMBER 1997 723 Auditory Feature Extraction Using Self-Timed, Continuous-Time Discrete-Signal Processing

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

A binaural auditory model and applications to spatial sound evaluation

A binaural auditory model and applications to spatial sound evaluation A binaural auditory model and applications to spatial sound evaluation Ma r k o Ta k a n e n 1, Ga ë ta n Lo r h o 2, a n d Mat t i Ka r ja l a i n e n 1 1 Helsinki University of Technology, Dept. of Signal

More information

AUDL GS08/GAV1 Auditory Perception. Envelope and temporal fine structure (TFS)

AUDL GS08/GAV1 Auditory Perception. Envelope and temporal fine structure (TFS) AUDL GS08/GAV1 Auditory Perception Envelope and temporal fine structure (TFS) Envelope and TFS arise from a method of decomposing waveforms The classic decomposition of waveforms Spectral analysis... Decomposes

More information

The EarSpring Model for the Loudness Response in Unimpaired Human Hearing

The EarSpring Model for the Loudness Response in Unimpaired Human Hearing The EarSpring Model for the Loudness Response in Unimpaired Human Hearing David McClain, Refined Audiometrics Laboratory, LLC December 2006 Abstract We describe a simple nonlinear differential equation

More information

An auditory model that can account for frequency selectivity and phase effects on masking

An auditory model that can account for frequency selectivity and phase effects on masking Acoust. Sci. & Tech. 2, (24) PAPER An auditory model that can account for frequency selectivity and phase effects on masking Akira Nishimura 1; 1 Department of Media and Cultural Studies, Faculty of Informatics,

More information

Across frequency processing with time varying spectra

Across frequency processing with time varying spectra Bachelor thesis Across frequency processing with time varying spectra Handed in by Hendrike Heidemann Study course: Engineering Physics First supervisor: Prof. Dr. Jesko Verhey Second supervisor: Prof.

More information

Spectral and temporal processing in the human auditory system

Spectral and temporal processing in the human auditory system Spectral and temporal processing in the human auditory system To r s t e n Da u 1, Mo rt e n L. Jepsen 1, a n d St e p h a n D. Ew e r t 2 1Centre for Applied Hearing Research, Ørsted DTU, Technical University

More information

Complex Sounds. Reading: Yost Ch. 4

Complex Sounds. Reading: Yost Ch. 4 Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency

More information

Neural Coding of Multiple Stimulus Features in Auditory Cortex

Neural Coding of Multiple Stimulus Features in Auditory Cortex Neural Coding of Multiple Stimulus Features in Auditory Cortex Jonathan Z. Simon Neuroscience and Cognitive Sciences Biology / Electrical & Computer Engineering University of Maryland, College Park Computational

More information

Time-frequency computational model for echo-delay resolution in sonar images of the big brown bat, Eptesicus fuscus

Time-frequency computational model for echo-delay resolution in sonar images of the big brown bat, Eptesicus fuscus Time-frequency computational model for echo-delay resolution in sonar images of the big brown bat, Eptesicus fuscus Nicola Neretti 1,2, Mark I. Sanderson 3, James A. Simmons 3, Nathan Intrator 2,4 1 Brain

More information

14 fasttest. Multitone Audio Analyzer. Multitone and Synchronous FFT Concepts

14 fasttest. Multitone Audio Analyzer. Multitone and Synchronous FFT Concepts Multitone Audio Analyzer The Multitone Audio Analyzer (FASTTEST.AZ2) is an FFT-based analysis program furnished with System Two for use with both analog and digital audio signals. Multitone and Synchronous

More information

Computing with Biologically Inspired Neural Oscillators: Application to Color Image Segmentation

Computing with Biologically Inspired Neural Oscillators: Application to Color Image Segmentation Computing with Biologically Inspired Neural Oscillators: Application to Color Image Segmentation Authors: Ammar Belatreche, Liam Maguire, Martin McGinnity, Liam McDaid and Arfan Ghani Published: Advances

More information

40 Hz Event Related Auditory Potential

40 Hz Event Related Auditory Potential 40 Hz Event Related Auditory Potential Ivana Andjelkovic Advanced Biophysics Lab Class, 2012 Abstract Main focus of this paper is an EEG experiment on observing frequency of event related auditory potential

More information

Gammatone Cepstral Coefficient for Speaker Identification

Gammatone Cepstral Coefficient for Speaker Identification Gammatone Cepstral Coefficient for Speaker Identification Rahana Fathima 1, Raseena P E 2 M. Tech Student, Ilahia college of Engineering and Technology, Muvattupuzha, Kerala, India 1 Asst. Professor, Ilahia

More information

Effect of filter spacing and correct tonotopic representation on melody recognition: Implications for cochlear implants

Effect of filter spacing and correct tonotopic representation on melody recognition: Implications for cochlear implants Effect of filter spacing and correct tonotopic representation on melody recognition: Implications for cochlear implants Kalyan S. Kasturi and Philipos C. Loizou Dept. of Electrical Engineering The University

More information

Signal detection in the auditory midbrain: Neural correlates and mechanisms of spatial release from masking

Signal detection in the auditory midbrain: Neural correlates and mechanisms of spatial release from masking Signal detection in the auditory midbrain: Neural correlates and mechanisms of spatial release from masking by Courtney C. Lane B. S., Electrical Engineering Rice University, 1996 SUBMITTED TO THE HARVARD-MIT

More information