Submission: International Conference on Acoustics, Speech, and Signal Processing (ICASSP )

PARAMETRIC AND NON-PARAMETRIC SIGNAL ANALYSIS FOR MAPPING AIR FLOW IN THE EAR-CANAL TO TONGUE MOVEMENT: A NEW STRATEGY FOR HANDS-FREE HUMAN-MACHINE INTERFACE

Ravi Vaidyanathan, Hyunseok Kook, Lalit Gupta, & James West

Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH
Department of Electrical and Computer Engineering, Southern Illinois University, Carbondale, IL
Think-A-Move, Ltd., Beachwood, OH

ABSTRACT

A complete signal processing strategy is presented to detect and precisely recognize tongue movement by monitoring changes in airflow that occur in the ear canal. Tongue movements within the human oral cavity create unique, subtle pressure signals in the ear that can be processed to produce command signals in response to that movement. Once recognized, these movements can in turn be used in human-machine interface applications such as communicating with a computer and controlling mechanical devices. The processing strategy includes pressure signal acquisition using a microphone inserted into the ear canal, power spectral density (PSD) analysis to design bandpass filters that reject pressure changes due to sources other than tongue movements, start- and end-point detection in the waveforms through cross-correlation, signal estimation, and the design and evaluation of parametric and non-parametric signal classifiers. The non-parametric signal classifiers include non-linear alignment classifiers and matched filters, while the parametric classification involves a multivariate Gaussian classifier using AR model parameters. The complete strategy is tested on four tongue actions: touching the tongue to the left and right corners of the mouth, and to the top and bottom center of the mouth. Through extensive experiments, it is shown that the pressure signals due to tongue movements are distinct and can be detected with over 97% accuracy.
It is thus concluded that the unique strategy will make hands-free control of devices using tongue movements a practical reality.

This work is supported by NIH Phase I SBIR Research grant RHD7-A.

1. INTRODUCTION

Although there is a well-recognized need in society for effective machine interface mechanisms that will enable the physically impaired to be more independent, much of the technology developed towards this goal still fails to meet their specific needs. At present, the majority of existing systems may be classified as mechanical-input devices; i.e., the user physically moves a part of a device in order to generate a control input signal. Examples of such systems include hand-operated joysticks and the use of head or chin movements to move a lever whose motion is translated into control commands. Systems of this nature require constant bodily movement that can be tiring and uncomfortable to the user, while regular use can also cause repetitive motion injuries and skin irritation. Furthermore, the single-lever design limits the commands that can be generated, and is especially restrictive for people with limited upper extremity function.

Given that most patients with limited extremity function (such as victims of spinal cord injury (SCI) and arthritis) possess the ability to move their tongue and/or mouth effectively, the potential of the human oral cavity has been exploited as a source for machine control signals. Contemporary examples include inserting a track-ball, joystick, plastic palate, or sip-and-puff straw into the mouth of an individual, with the tongue or lips providing control input. These devices, however, are extremely intrusive, irritate the mouth, impair verbal communication, present hygiene issues, and are also limited in signal generation capacity. The goal of our work is to develop a patient-generated control strategy which can overcome the deficits of these existing systems.
Specifically, it is our intention to develop a non-intrusive tongue-movement based machine interface without the insertion of any device within the oral cavity. We introduce a unique strategy for detecting tongue movement through the monitoring of air pressure changes within the ear canal.

The focus of this paper is a new means of detecting tongue movement in order to generate input signals that can be used for hands-free control of devices (such as a wheelchair) in human-machine interface applications. Our ongoing investigations have shown that various movements within the oral cavity create unique, traceable pressure changes in airflow within the ear canal that can be measured with a simple pressure sensor (e.g., a microphone) placed in the ear. Individuals with limited upper extremity control can use the output of the microphone as an effective means to communicate with a computer and/or to control electro-mechanical assist devices (e.g., a power wheelchair). Patients suffering from spinal cord injuries (SCI), repetitive strain injuries (RSI), severe arthritis, loss of motion due to stroke, and central nervous system (CNS) disorders would all benefit greatly from this concept.

The success of this strategy will clearly depend on the accurate classification of tongue movements based on air flow measured in the ear. The scope of the paper, therefore, is to demonstrate that pressure signals in the ear corresponding to tongue movement are distinct, and can be classified accurately. The detection of the tongue movement pressure signals in the ear canal is formulated as an $M$-class pattern classification problem in which the classes correspond to $M$ distinct tongue movements. Pressure signals resulting from $M = 4$ tongue movements (left, right, up, and down) are used to demonstrate the effectiveness of the strategy.
2. SIGNAL ACQUISITION

Figure 1 illustrates a pressure sensor inserted partially into the ear of an individual (within the cavity defined by the pinna, if not deeper within the ear such as within the concha, at the opening of the ear canal). The sensor includes a shielding housing and an internal microphone. The internal microphone resides on the interior portion of the housing within the ear canal at a depth of . to . measured from the opening of the ear canal. Insertion of the microphone into the ear canal shields pressure signals from environmental noise. The external microphone (not used in the present study) will be used in future studies to monitor and exclude signals from external sources.

Figure 1: Earpiece Housing

Figure 2 shows examples of pressure signals in the ear (sampled at KHz), when a subject was asked to move their tongue lightly to the left, right, top, and bottom of the mouth, respectively. Each movement was repeated times, thus each figure has superimposed signals corresponding to the same tongue movement.

Figure 2: Raw Data from Tongue Movements

3. SIGNAL ANALYSIS AND PROCESSING

Conventional signal processing techniques are generally inadequate to recognize the subtle pressure variations in the ear canal resulting from tongue movement. The ear canal itself is an interference-ridden, noise-amplified environment for acoustic recording. Furthermore, external noise (environmental sounds) can easily obscure the slight pressure deviations accompanying tongue movement. The following two sections enumerate the steps
in our current processing and classification strategy.

Bandpass Filtering and Normalization

The first step in the analysis of the signals in Figure 2 is to identify the frequency range of interest in the signals. The averaged PSD of the signals is shown in Figure 3. It is observed that pressure signal activity is approximately in the band to Hz. Therefore, in the first step of processing, the signals are bandpass filtered using and as the lower and upper cutoff frequencies, respectively. By examining the signals in Figure 2, it is clear that the signals have amplitude differences within the same class and are not aligned in time. The signals can be easily amplitude normalized by dividing each sample of a signal by the standard deviation of the samples in the signal [].

In the generalized formulation to follow, let $h_{m,i}(k)$, $k = 1, 2, \ldots, N$; $i = 1, 2, \ldots, L$, be the $i$th filtered and amplitude normalized signal of class $m$, $m = 1, 2, \ldots, M$, where $M$ is the number of signal classes, $N$ is the number of samples, and $L$ is the number of signals in each class (assumed equal for convenience).

Figure 3: Average PSD of signals in Figure 2 (one curve per action; vertical axis: average of PSD)

Signal Estimation

Signal averaging is one of the most frequently used operations to estimate signals from the outcomes of a random process [,] and can, therefore, be used to estimate the underlying signal of each pressure signal class from the amplitude normalized outcomes. However, directly averaging the signals $h_{m,i}(k)$, $i = 1, 2, \ldots, L$, will result in a poor, time-smeared estimate because the signals are not aligned in time. The accuracy of the estimate can be improved if the signals are first aligned in time with a template of each class and then averaged. The problem, however, is that the templates are not available because the true pressure signals are unknown.
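As a minimal illustration of the amplitude normalization and of the time-smearing problem described above, the following Python sketch divides each signal by its standard deviation and compares a direct average with an average taken after alignment. The Gaussian test pulse, its delays and gains, and the names `amplitude_normalize` and `pulse` are hypothetical and are not taken from the recorded data; the known delays stand in for an alignment step that in practice has to be estimated.

```python
import numpy as np

def amplitude_normalize(x):
    """Divide every sample by the standard deviation of the signal."""
    x = np.asarray(x, dtype=float)
    return x / np.std(x)

# Hypothetical test data: one pulse shape with different gains and delays.
k = np.arange(256)

def pulse(delay, gain):
    """Gaussian pulse centred at sample 128 + delay (illustrative only)."""
    return gain * np.exp(-0.5 * ((k - 128 - delay) / 8.0) ** 2)

delays = [-30, -10, 10, 30]
gains = [0.5, 1.0, 2.0, 4.0]
signals = [amplitude_normalize(pulse(d, g)) for d, g in zip(delays, gains)]

# Averaging without alignment smears the pulse over time.
direct_average = np.mean(signals, axis=0)

# Undoing the (here known) delays before averaging preserves the pulse.
aligned = [np.roll(s, -d) for s, d in zip(signals, delays)]
aligned_average = np.mean(aligned, axis=0)
```

After normalization all four signals have unit standard deviation, so the gain differences disappear; the peak of `direct_average` is much lower than the peak of `aligned_average`, which is the smearing effect that motivates aligning before averaging.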
A pairwise cross-correlation based averaging procedure is introduced to first generate an initial signal template for each class and then use the initial template to align signals and estimate the signal of each class. If $L$ is assumed to be an integer power of 2, the average $\bar{h}_{m,L}(k)$, $m = 1, 2, \ldots, M$, of the $L$ signals can be computed as:

\[ \bar{h}_{m,L}(k) = (1/2)\,[\,\bar{h}_{m;1:(L/2)}(k) + \bar{h}_{m;(L/2)+1:L}(k)\,], \]

where

\[ \bar{h}_{m;1:(L/2)}(k) = [1/(L/2)] \sum_{i=1}^{L/2} h_{m,i}(k), \quad k = 1, 2, \ldots, N \]

is the mean of the first half of the $L$ signals and

\[ \bar{h}_{m;(L/2)+1:L}(k) = [1/(L/2)] \sum_{i=(L/2)+1}^{L} h_{m,i}(k), \quad k = 1, 2, \ldots, N \]

is the mean of the second half of the $L$ signals. By further decomposing the first half and second half of the signals into equally sized sets of size $(L/4)$, the means can be computed as

\[ \bar{h}_{m;1:(L/2)}(k) = (1/2)\,[\,\bar{h}_{m;1:(L/4)}(k) + \bar{h}_{m;(L/4)+1:(L/2)}(k)\,] \]

\[ \bar{h}_{m;(L/2)+1:L}(k) = (1/2)\,[\,\bar{h}_{m;(L/2)+1:(3L/4)}(k) + \bar{h}_{m;(3L/4)+1:L}(k)\,] \]

The $L$ signals can be decomposed into successively smaller sets until pairs of signals are left. The signals in each pair are averaged by aligning the sequences in the position of maximum cross-correlation. The means of the pairs are combined in a pairwise fashion according to the steps outlined above to determine $\bar{h}_{m,L}(k)$, $m = 1, 2, \ldots, M$. The initial template for each class is formed by identifying the start- and end-points of the tongue action in $\bar{h}_{m,L}(k)$ and extracting the signal segment between these two points. If the start- and end-points in the initial template are denoted by $a$
and $b$, respectively, each signal $h_{m,i}(k)$, $i = 1, 2, \ldots, L$, is segmented by aligning it with the initial template in the maximum cross-correlation position and multiplying it with a rectangular window $R_{a,b}(k)$. That is, the segmented signals are given by $h_{m,i}(k)\,R_{a,b}(k)$, $i = 1, 2, \ldots, L$; $m = 1, 2, \ldots, M$. If $N = (b - a + 1)$, then the $N$ samples of the segmented signals are re-ordered and represented by $v_{m,i}(k)$, $k = 1, \ldots, N$. The final estimate $\hat{h}_m(k)$, $m = 1, 2, \ldots, M$, of the signal for each action class is obtained by averaging the segmented signals $v_{m,i}(k)$, $k = 1, \ldots, N$. Figure 4 shows estimates of the signals of the action classes computed using L =.

Figure 4: Estimates of the Pressure Signals

4. CLASSIFICATION STRATEGIES

Given the segmented signals belonging to the $M$ classes, different classification methodologies can be applied to detect the classes of the signals. In this study, matched filtering, autoregressive modeling, and non-linear alignment methods are developed to determine the signal classes.

Matched Filter

A matched filter can be designed to detect a signal buried in noise under the conditions that the signal is known and the noise is stationary. The matched filter is designed to maximize the output signal-to-noise ratio at the time instant $k = N$. If it is assumed that the noise is white, then the matched filter $\tilde{h}_m(k)$, $m = 1, 2, \ldots, M$, is given by

\[ \tilde{h}_m(k) = \hat{h}_m(N - k), \quad k = 1, 2, \ldots, N \]

That is, the unit sample response is the signal reversed in time and delayed by $N$ samples. The response of each matched filter to an input test signal represented by $T = t(k)$, $k = 1, 2, \ldots, N$, is computed and the test signal is assigned to the class of the matched filter that yields the maximum value at time $N$. That is, if $y_m(N)$, $m = 1, 2, \ldots, M$, is the response of $\tilde{h}_m(k)$ to $T$ at $k = N$, then $T$ is assigned to the class given by

\[ m^{*} = \arg\max_{m}\,[\,y_m(N)\,] \]

Autoregressive (AR) Modeling

The underlying generation of the pressure signals of each action can be modeled by an AR process of the form

\[ v_{m,i}(k) = \alpha_{m,i} + \sum_{j=1}^{p} \theta_{m,i,j}\, v_{m,i}(k - j) + \beta_{m,i}\,\omega(k) \]

and the model parameters $(\theta_{m,i,j},\ \alpha_{m,i}/\beta_{m,i})$ can be used as features for signal classification. If it is assumed that the class conditional density functions of the AR feature vector are Gaussian with mean vector $\mu_m$ and covariance matrix $\Psi_m$, the discriminant function of the resulting Gaussian classifier for class $m$, assuming equal prior probabilities, is given by

\[ D_m(T) = -(1/2)\,\{\ln|\Psi_m| + (T - \mu_m)^{\top}\,\Psi_m^{-1}\,(T - \mu_m)\} + \ln P(m) \]

where $P(m)$ is the class prior probability. For this case, the test signal $T$ is assigned to the class given by

\[ m^{*} = \arg\max_{m}\,[\,D_m(T)\,] \]

Non-linear Alignment

Various alignment-based methods can also be formulated to determine the similarity of a test signal and a template of a signal [,]. Non-linear alignment, also called dynamic alignment, optimally aligns two signals to compensate for non-linear expansions and compressions in signal segments and also to compensate for
duration differences. In the design of non-linear alignment classifiers, the goal is to determine a mapping $W$ between the time-index $p$ of a test signal $t(p)$ and the time-index $q$ of a reference signal $\hat{h}_m(q)$ such that the best alignment between the two sequences is obtained. The mapping

\[ W = [\,w(1), w(2), \ldots, w(Z)\,] \]

where $w(z) = [\,i(z), j(z)\,]$; $p = i(z)$, $z = 1, 2, \ldots, Z$; $q = j(z)$, $z = 1, 2, \ldots, Z$, defines a piecewise linear alignment path in the $(p, q)$ plane. Both time axes are transformed into a common time axis $z$ of length $Z$. When there is no timing difference between the sequences, the warping path coincides with the diagonal line ($p = q$). The best alignment path is given by determining $W$ that minimizes

\[ D = \sum_{z=1}^{Z} d[\,t(i(z)),\ \hat{h}_m(j(z))\,] \]

where $D$ is the total accumulated distance between $t(p)$ and $\hat{h}_m(q)$ along $W$ and $d[x, y]$ is the local distance between the samples $x$ and $y$. Examples of local distance metrics include the absolute difference and the difference-squared norm. In order to restrict $W$ in a meaningful manner in the $(p, q)$ plane, end-point, continuity, and slope constraints are imposed on $W$ [,]. If $D_m(T)$ is the aligned distance between a test sequence $T$ and a reference sequence $\hat{h}_m(q)$, $m = 1, 2, \ldots, M$, then the test sequence is assigned to the class given by

\[ m^{*} = \arg\min_{m}\,[\,D_m(T)\,] \]

5. EXPERIMENTS AND RESULTS

Pressure data corresponding to four tongue movements (touching the tongue lightly to the left, right, top, and bottom of the mouth) were recorded to design and evaluate the strategy. Each movement was repeated times; therefore, each tongue movement class had pressure signals. Each signal was bandpass filtered and segmented ( N =8) as described in Section 3. The signals were randomly partitioned into mutually exclusive and equal-sized sets to generate a design (training) set and a test set for each class. For each signal class, the signal estimated from the training set was used as the reference template for non-linear alignment and to determine the unit sample response of the matched filter.
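The non-linear alignment classifier that uses these reference templates can be sketched in Python as follows. This is only an illustrative dynamic-programming version under stated assumptions: it uses the absolute difference as the local distance, enforces the end-point and continuity constraints through a simple three-neighbour step pattern, and omits the slope constraints mentioned above. The function names `aligned_distance` and `classify` and the toy templates are hypothetical, and classes are indexed from 0 in the code.

```python
import numpy as np

def aligned_distance(test, ref):
    """Total accumulated distance D along the best alignment path.

    Local distance: absolute difference. The path is forced to start at the
    first samples and end at the last samples of both sequences (end-point
    constraint) and advances by at most one sample per axis per step
    (continuity constraint). No slope constraint is applied in this sketch.
    """
    P, Q = len(test), len(ref)
    acc = np.full((P + 1, Q + 1), np.inf)
    acc[0, 0] = 0.0
    for p in range(1, P + 1):
        for q in range(1, Q + 1):
            local = abs(test[p - 1] - ref[q - 1])
            acc[p, q] = local + min(acc[p - 1, q],      # test index advances
                                    acc[p, q - 1],      # reference index advances
                                    acc[p - 1, q - 1])  # both advance
    return float(acc[P, Q])

def classify(test, templates):
    """Assign the test sequence to the template with minimum aligned distance."""
    distances = [aligned_distance(test, h) for h in templates]
    return int(np.argmin(distances))

# Toy example: the test signal is a time-stretched copy of template 0.
templates = [[0, 1, 2, 1, 0], [0, -1, -2, -1, 0]]
stretched = [0, 0, 1, 1, 2, 2, 1, 0]
label = classify(stretched, templates)
```

In the toy example the stretched sequence aligns with template 0 at zero accumulated distance, which is the kind of duration difference a plain cross-correlator cannot absorb.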
The AR model parameters for each signal class were determined from the signals in the respective training set using the Yule-Walker autocorrelation method. The model order p = was determined empirically. The random resampling approach described in [,] was used to generate $J$ design and test set pairs. Each pair is referred to as a trial and the classification accuracies were estimated over J = trials. Each trial consisted of testing test signals from each class; therefore, the classification accuracy was estimated by testing (xx)=, signals. For convenience, the pressure signal classes left, right, up, and down are represented by $m = 1, 2, 3,$ and $4$, respectively.

The matched filter, AR model, and non-linear alignment classification results, assuming equal prior probabilities, are presented in Tables 1, 2, and 3, respectively. The tables show confusion matrices as well as the classification accuracies. The confusion matrix part of the results can be interpreted by examining the first row of Table 2, which shows that out of the tests conducted with signal vectors drawn from class 1, 89.8% were classified correctly as belonging to class 1, .% were misclassified as class 2, 9.98% were misclassified as class 3, and .7% were misclassified as class 4. The results show that an average classification accuracy of 9.8%, 8.%, and 97.7% can be achieved by the matched filter, AR model classifier, and the non-linear alignment classifier, respectively.

The performance of the non-linear alignment classifier, which can compensate for non-linear variations, is superior to that of the matched filter, which is essentially a cross-correlator. Cross-correlators are not capable of accommodating duration and non-linear variations. The performances of the non-linear alignment classifier and the matched filter are superior to that of the AR model classifier, which compresses the 8 samples into a small set of model parameters that are used as features.
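For readers unfamiliar with the Yule-Walker autocorrelation method used for the AR features, a minimal Python sketch is given below. It is an illustration under assumptions rather than the procedure used in the experiments: the signal mean is removed instead of estimating the constant term explicitly, biased autocorrelation estimates are used, and the function name `yule_walker`, the order `p=2`, and the simulated AR(2) test signal are all hypothetical.

```python
import numpy as np

def yule_walker(x, p):
    """Estimate AR(p) coefficients and driving-noise variance.

    Solves the Yule-Walker equations R * theta = r, where R is the p-by-p
    Toeplitz matrix of autocorrelations r(0..p-1) and r holds r(1..p).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # assumption: constant term handled by mean removal
    n = len(x)
    # Biased autocorrelation estimates for lags 0..p.
    r = np.array([np.dot(x[: n - lag], x[lag:]) / n for lag in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    theta = np.linalg.solve(R, r[1:])
    noise_var = r[0] - np.dot(theta, r[1:])
    return theta, noise_var

# Simulated AR(2) signal with known coefficients (illustrative data only).
rng = np.random.default_rng(0)
true_theta = np.array([0.6, -0.3])
w = rng.standard_normal(20000)
x = np.zeros(w.size)
for k in range(2, x.size):
    x[k] = true_theta[0] * x[k - 1] + true_theta[1] * x[k - 2] + w[k]

theta_hat, noise_var = yule_walker(x, p=2)
```

On the simulated signal the estimated coefficients come out close to the true values; in a classifier of the kind described above, such estimates would be collected into the feature vector for each segmented signal.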
.. 8.8.. 98.99
Class. Accuracy = 9.8%
Table 1: Matched Filter

89.8. 9.98.7 77..8.7 7..9 78.... 9.7
Class. Accuracy = 8.%
Table 2: AR Gaussian

. 9.9. 97.97... 97.97
Class. Accuracy = 97.7%
Table 3: Non-linear Alignment

6. CONCLUSIONS

The goal of this paper was to develop a signal processing strategy to demonstrate that the pressure changes in air flow that occur in the ear canal due to tongue movement are distinct and that they can be detected accurately. PSD analysis was conducted to determine the frequency range of the pressure signals in order to design bandpass filters. A pairwise cross-correlation based averaging procedure was developed to obtain initial estimates of the pressure signals corresponding to the tongue movements. Start- and end-points in the initial template were identified and the signals were segmented between the end-points in the position of maximum cross-correlation with the initial template. A final estimate of the signal of each class was obtained by averaging the segmented signals.

Three different classification methods were implemented to classify the signals. The matched filter and non-linear alignment classifier made use of the signal estimates for the unit sample responses and the reference templates, respectively. The parameters of the AR-Gaussian parametric classifier were estimated directly from the segmented signals in the training set. The results from experiments conducted on four tongue movements show that all three classifiers yield good results. The best results were obtained using non-linear alignment, which yielded classification accuracies of over 97%.

7. FUTURE WORK

Current investigations are focused on the analysis and classification of a wider range of tongue actions and issues related to the practical application of this strategy.
These issues include: (a) real-time detection of the onset of the tongue movement in the pressure signals, (b) filtering to isolate pressure signals from other bodily signals and external noise, and (c) determining the most suitable classification strategy, in terms of accuracy and speed, for real-time applications. We are presently targeting commercial applications for this technology, including wheelchair control for quadriplegic users [] and robotic interface and control.

In summary, the results for the four tongue actions are highly encouraging. Based on these results as well as the results from our on-going investigations, it is concluded that the unique signal processing strategy developed for classifying air flow pressure signals in the ear canal will make hands-free control of devices using tongue movements a practical reality.

REFERENCES

1. L. Gupta and S. Ma, "Gesture-based interaction and communication: automated classification of hand gesture contours," IEEE Transactions on Systems, Man, & Cybernetics C, vol. , No. , .
2. L. Gupta, D. L. Molfese, R. Tammana, and P. G. Simos, "Non-linear alignment and averaging for estimating the evoked potential," IEEE Transactions on Biomedical Engineering, vol. , No. , 199.
3. L. Gupta, J. Phegley, and D. L. Molfese, "Parametric classification of multichannel averaged event-related potentials," IEEE Transactions on Biomedical Engineering, Vol. 9, No. 8, .
4. G. Nemirovski, "System and method for detecting an action of the head and generating an output in response thereto," U.S. Patent number ,,97, issued