IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. XX, NO. X, DECEMBER XXXX 1


The final version of record is available at http://dx.doi.org/10.1109/JSTSP.2014.2374574

Perceptual and Bitrate-scalable Coding of Haptic Surface Texture Signals

Rahul Chaudhari, Student Member, IEEE, Clemens Schuwerk, Student Member, IEEE, Mojtaba Danaei, Student Member, IEEE, and Eckehard Steinbach, Senior Member, IEEE

Abstract: Applications involving indirect interpersonal communication, such as collaborative design, assembly, or exploration of physical objects, can benefit strongly from the transmission of contact-based haptic media in addition to the more traditional audiovisual media. Inclusion of haptic media has been shown to improve immersiveness, task performance, and the overall experience of task execution. While several decades of research have been dedicated to the acquisition, processing, coding, and display of audio and video streams, similar aspects for haptic streams have been addressed only recently. Simultaneous masking is a perceptual phenomenon widely exploited in the compression of audio data. In the first part of this paper we present what is, to the best of our knowledge, the first empirical evidence for masking in the perception of wideband vibrotactile signals. Our results show that this phenomenon for haptics is very similar to its auditory analog. Signals closer in frequency to a powerful masker (25 dB above detection threshold) are masked more strongly (peak threshold shifts of up to 28 dB) than those away from the masker (threshold shifts of 15-20 dB). The masking curves approximately follow the masker's spectral profile. In the second part of this paper, we present a bitrate-scalable haptic texture codec that incorporates the masking model, and describe its subjective and objective performance evaluation. Experiments show that we can drive down the codec output bitrate to a very low value of kbps, without the subjects being able to reliably discriminate between the codec input and the distorted output texture signals.

Index Terms: haptic textures, perceptual coding, data compression.

Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

R. Chaudhari, C. Schuwerk, M. Danaei, and E. Steinbach are all with the Chair of Media Technology, Technische Universität München, 80290 Munich, Germany ({rahul.chaudhari,clemens.schuwerk,eckehard.steinbach}@tum.de, m.danaei@mytum.de).

Manuscript received March 28, 2014; revised November 2, 2014.

I. INTRODUCTION AND BACKGROUND

Haptics, in addition to the traditional audiovisual media, has brought physical interactivity to multimedia applications. Several high-impact applications, e.g., satellite repair and telesurgery, that require the ability to interact bidirectionally with remote real or virtual surroundings have driven haptics research worldwide. Multimedia research has also recognized the potential of haptics for indirect interpersonal communication in collaborative haptic assembly tasks [1], or for remote training of contact-intensive tasks such as surgery [2].

Haptic teleoperation technology [3], as depicted in Fig. 1, allows for physical interaction with objects in remote or virtual environments, and thus supports the subjective feeling of being present and immersed in them. In the most common teleoperation architecture, a human operator with a MASTER haptic device commands the position/velocity of a remote SLAVE robot (teleoperator) over a communication channel. The haptic (H) signals (forces, vibrations, etc.) that are generated at the interface of the robot and the remote environment are captured and sent back over the communication channel for display to the operator through a haptic device. Audio-visual (AV) signals sensed at the remote site are also fed back to her/him. The AVH data is typically multiplexed together before transmission from the remote teleoperator side to the operator side [4].

Fig. 1. A haptic (kinesthetic + vibrotactile) teleoperation system. The operator (MASTER) controls the activities of the remote SLAVE robot during interactions with the remote environment. The accelerometer on the SLAVE robot picks up wideband acceleration signals arising in the interactions, which encode the surface texture of the remote objects. These signals are then transmitted to the MASTER and displayed to the human operator's hand through a vibrotactile actuator (figure reproduced from [5]).

Efficient data transmission is achieved by (lossy) media coding schemes. Such schemes ensure low bitrate consumption per modality, while keeping the associated coding distortion in the signal stream as low as possible (ideally below human perceptual thresholds). Thus the perceptual transparency of the system is maximized, given the available transmission capacity.

Research in haptics is focused on two primary areas: kinesthetic and tactile haptics. Kinesthetic feedback involves the display of large-amplitude, low-frequency forces, such as those elicited by pressing against an object with our hands. However, purely kinesthetic technical systems lack the realistic feel supplied by high-frequency vibrations occurring, for instance, when tapping on hard surfaces [6] or scanning small-scale surface details on textured surfaces [7]. It has been demonstrated that sensing high-frequency acceleration signals during haptic interaction and displaying them as vibrotactile feedback to the human leads to a significant improvement in the perceived realism of haptic interaction [6], [7], [8].

In addition to the improved immersion, task performance gains have also been achieved through tactile feedback in inspection, exploration and manipulation tasks in general [8] and in specific applications like telesurgery [9].

A. Motivation for vibrotactile texture compression

Haptic data compression leads to cost-efficient transmission and storage of digitized haptic data. The need for haptic data compression is further emphasized by the trend towards multi-channel haptic feedback for bimanual [9] and multi-finger interaction [10]. Furthermore, distributed sensors that capture haptic signals over a wide area on the human body, e.g. the artificial skin presented in [11], multiply the haptic data volume. In teleoperation, texture signals (number of channels × 32 kbps, which is the uncompressed texture signal bitrate [5]) will be multiplexed together with other potentially bitrate-hungry modalities such as video (1 Mbps 4 Mbps) and audio (22 kbps 2 kbps). Hence they should consume as low a bitrate as possible out of the overall bitrate available, e.g. for a point-to-point teleoperation link (imagine a constant-bitrate satellite channel). Such multiplexing further motivates the development of techniques for haptic data compression. Finally, transmission of haptic data over limited-capacity channels like mobile radio would also benefit from compression.

Kinesthetic coding schemes have been developed and refined considerably over the past decade [12], [13]. However, very little attention has been paid to the tactile modality in this respect. This is the gap we seek to fill with our work.

B. Contributions of this paper

We approach the problem of developing an efficient bitrate-scalable compression algorithm for the transmission of haptic texture data over a packet-switched network in a teleoperation system.
We have previously presented a perceptually transparent compression algorithm in [5], which operates at a constant bitrate of 3.55 kbps (the uncompressed bitrate being 32 kbps). With this work, we wish to make two main contributions. The first is to provide, to the best of our knowledge, the first empirical evidence for the wideband simultaneous perceptual masking phenomenon for vibrotactile signals (Section III). The masking model is a critical part of the texture codec: it is used to shape the coding noise in such a way that it becomes imperceptible to the human. The second is to present a bitrate-scalable version of the haptic texture codec from [5] (Section IV), enabling us to operate at a range of different bitrates under varied network traffic conditions and quality demands.

C. Related work

To the best of our knowledge, the only other work on tactile data compression is [14]. It describes a frequency-domain texture compression algorithm that exploits human vibrotactile perceptual limitations for compression. Therein, textures are represented as surface height profiles captured by an ultra-high-resolution laser scanner. They are reconstructed as waveforms representing the height of a surface point as a function of lateral distance on the object surface. These waveforms are transformed to the temporal frequency domain using the Discrete Cosine Transform (DCT), assuming a constant scan velocity. The DCT coefficients below human perceptual detection thresholds are set to zero. The remaining coefficients are quantized with step sizes determined from perceptual difference thresholds [15]. Finally, this lossy-coded frequency-domain information is transformed back to the temporal domain, and further to the distance domain assuming the same constant scan velocity as before. This height waveform is then used for position-based texture rendering. The perceptual quality of this algorithm was evaluated in subjective tests, where subjects scanned the height profile with a haptic device [14].
It was shown that the proposed algorithm is perceptually transparent (i.e., the coding distortion goes unnoticed) up to a quantization as coarse as 12 levels. The algorithm achieves a compression ratio of 4:1. However, when it comes to realtime haptic teleoperation, there are two limitations to the applicability of this approach. Firstly, the algorithm is offline, meaning that the entire surface height profile is required in advance as an input to the algorithm. The second and more severe limitation is the assumption of a constant (and predetermined) texture scan velocity, which is necessary to transform the spatial-domain data to the temporal-frequency domain.

In [5], we take a completely different approach to texture compression. In [14], textures are represented by height profiles, whereas in [5] they are represented by acceleration signals sensed by Microelectromechanical Systems (MEMS)-based acceleration sensors. We use MEMS sensors since they are inexpensive compared to position-sensing systems with comparable resolution, and small and lightweight, which makes them easy to mount on the teleoperator end-effector. We draw an analogy between the production of texture signals and speech signals, and adapt speech coding techniques for texture compression. We also show that, for a similar level of subjective quality, the compression performance of our algorithm is better than that of the algorithm in [14]. It should be noted that compression is achieved in [14] by discarding information based solely on a model of human perception (the sink for texture signals). Our approach, on the other hand, is based on a model of texture signal production (the source) in addition to that of human perception (the sink).

II. ANALYSIS-BY-SYNTHESIS CODING OF HAPTIC TEXTURES

In [5], we propose a low-bitrate compression algorithm for vibrotactile (haptic texture) data, and show it to be perceptually transparent through rigorous psychophysical tests.
Therein, we develop a code-excited linear prediction (CELP)-based algorithm for haptic texture compression, inspired by the qualitative waveform similarity between speech and haptic texture signals. Fig. 2 shows abstract block diagrams of the compression algorithm. On the encoder side, a linear-predictive coding (LPC) synthesis filter p[n] excited by an input signal produces a synthesized texture signal at its output. This synthesized signal is an approximation of the input texture signal. The synthesis filter parameters, which capture the frequency characteristics of the texture signal, are determined a priori from the input signal via LPC analysis. The excitation parameters, which capture the long-term time-domain redundancy and the short-term time-domain features, are determined iteratively in a closed-loop manner, as explained in the next paragraph. The quantized excitation and synthesis filter parameters are transmitted to the decoder side, where a texture signal identical to the one at the encoder is synthesized. Since the decoder is itself a part of the encoder, this method of coding is referred to as Analysis-by-Synthesis coding.

[Block diagram. Encoder: input texture signal s[n], LPC analysis and quantization (a_i to â_i), excitation construction from the parameters {i, G, T, β}, LPC synthesis filter p[n], synthesized signal ŝ[n], error signal, perceptual weighting filter w[n], MSE minimization that tunes the excitation parameters. Decoder: excitation construction, LPC synthesis filter p[n], synthesized (decoded) signal ŝ[n].]

Fig. 2. Encoder structure inspired by typical speech CELP (code-excited linear prediction) codecs [16]. The LPC filters capture the spectral envelope of the signal, while the filter excitation captures salient time-domain features and exploits temporal redundancy for compression. All the codec parameters (i, G, T, β and a_i) are quantized to î, Ĝ, T̂, β̂ and â_i respectively, before transmission to the decoder. In this paper, specifically for vibrotactile perception, we identify the form of w[n] and tune the vector quantizer for the LPC parameters.

The closed-loop structure (Fig. 2, bottom left) mentioned in the paragraph above minimizes the coding error between the input texture signal and the synthesized signal. Here, a suitable measure of coding error is required. The Mean Square Error (MSE) is frequently used in signal processing applications due to its simplicity and acceptable performance [17], [18]. However, especially for low-bitrate codecs like the one in [5], it is important to judge the perceptual significance of the coding error (noise). A model of vibrotactile perception must therefore be accounted for in the error criterion in order to minimize the perceived error. The perceptual masking phenomenon, an important limitation of human perception, may be used for this purpose. This phenomenon implies that the perception of a weaker frequency component is masked by the presence of a stronger one. To exploit the masking phenomenon, the coding noise must be shaped in the frequency domain according to the vibrotactile masking characteristics in such a way that the error is perceived with the lowest intensity possible, or not at all. Therefore, we incorporate the perceptual weighting filter w[n] in the encoder. The perceptually weighted MSE that must then be minimized is given as:

\varepsilon = \int_{0}^{F_s/2} \left| S(f) - \hat{S}(f) \right|^2 W(f) \, df \qquad (1)

where F_s is the sampling rate, S(f) and \hat{S}(f) are the spectra of the input s[n] and reconstructed ŝ[n] texture signals, and W(f) is the frequency response of the weighting filter w[n]. Fig. 3 shows example plots for the codec input and output signals.

This codec (for a sampling frequency of 2 kHz, a frame size of 40 samples, an input bitrate of 32 kbps, and an output bitrate of 3.55 kbps) works with a compression ratio of approx. 8:1, without introducing any perceptual degradation. A constant output bitrate ensures a deterministic packet rate, considerably simplifying audio-video-haptic multiplexer design. Frame-by-frame processing ensures that free, natural scanning of textured surfaces in real time is possible.
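As a rough illustration of the error criterion in (1), the integral can be approximated by a Riemann sum over the DFT bins of one frame. The sketch below is not the codec's implementation: the function names, the unnormalized DFT scaling, and the step-shaped weight in the usage note are assumptions made purely for illustration.

```python
import cmath
import math


def dft_half(x):
    """DFT bins 0..N/2 of a real sequence (direct O(N^2) sum, fine for one frame)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def weighted_mse(s, s_hat, weight, fs_hz):
    """Riemann-sum approximation of Eq. (1) on the frequency grid k * fs / N.

    `weight` is any callable W(f) >= 0 (hypothetical here); the DFT is left
    unnormalized, so only relative comparisons between codings are meaningful.
    """
    n = len(s)
    df = fs_hz / n  # width of one frequency bin in Hz
    spec, spec_hat = dft_half(s), dft_half(s_hat)
    return sum(
        abs(a - b) ** 2 * weight(k * df) * df
        for k, (a, b) in enumerate(zip(spec, spec_hat))
    )


# Illustrative frame: 40 samples at 2 kHz, a 200 Hz texture component,
# and a small coding error concentrated at 400 Hz.
FS_HZ, N = 2000.0, 40
s = [math.sin(2 * math.pi * 200 * t / FS_HZ) for t in range(N)]
s_hat = [v + 0.1 * math.sin(2 * math.pi * 400 * t / FS_HZ) for t, v in enumerate(s)]

flat = weighted_mse(s, s_hat, lambda f: 1.0, FS_HZ)
shaped = weighted_mse(s, s_hat, lambda f: 1.0 if f < 300 else 0.1, FS_HZ)
```

With the flat weight the 400 Hz error counts fully; with the step-shaped weight the same error contributes ten times less, which is the sense in which W(f) decides where coding noise is tolerated.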
We established the perceptual transparency of the compression algorithm with rigorous subjective tests in [5]. To match this performance, the codec described in [14] would have to generate 80 bits for every frame of 40 samples. That is, the corresponding 40 DCT coefficients would have to be quantized with a 2-bit (4-level) linear quantizer. However, [14] reports a compression ratio of approximately 4:1 with a 12-level quantizer (fewer than 12 levels trigger significant perceptual differences between the original and quantized textures). This comparison shows that, in terms of bitrate reduction, the CELP-based codec described above performs better by at least a factor of 2. Given the novelty of this research area, no work other than [14] could be found for comparison. The SIMULINK (Mathworks, Inc.) model of our codec can be downloaded from [19] for study and research purposes.

Fig. 3. Time- and frequency-domain plots for a segment of a texture signal (codec input and codec output; acceleration in m/s^2 over time in milliseconds, and spectrum ACC(f) over frequency in Hz).

III. MASKING MODEL FOR VIBROTACTILE PERCEPTION

A. Introduction and background

For coding, the masking phenomenon implies that humans can tolerate larger errors in high-energy frequency bands, and smaller ones in low-energy frequency bands. In order to make the coding error (noise) imperceptible (or to reduce its perceptual intensity), it should be weighted more in low-energy frequency bands than in high-energy ones. To achieve this, the transfer function of the perceptual weighting filter is made a function of the LP synthesis filter response as:

W(z) = \frac{P(z/\alpha_2)}{P(z/\alpha_1)} \qquad (2)

where 0 < \alpha_1, \alpha_2 < 1. Thus, the poles of W(z) lie at the same angles as the poles of the LPC synthesis filter P(z), but at radii \alpha_2 times those of P(z)'s poles. The zeros of W(z), on the other hand, lie at the same angles as the poles of P(z), but at radii \alpha_1 times those of P(z)'s poles. If \alpha_1 > \alpha_2, the frequency response of W(z) is like a controlled inverse filter of P(z) [20]. Fig. 4 illustrates the nature of W(z) in comparison to the LPC synthesis filter P(z), which is estimated from the segment of acceleration shown in the top pane. The bottom pane also shows the coding error (labeled "shaped noise") that is admitted for this signal segment as a result of such a choice of W(z).

During the development of the texture codec in [5], we carried over the auditory masking model used for speech compression as an assumption for the vibrotactile modality.
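The pole and zero placement described for (2) can be sketched with the usual bandwidth-expansion trick. The sketch assumes the all-pole form P(z) = 1/A(z) with A(z) = 1 + a_1 z^{-1} + ... + a_p z^{-p}, so that W(z) = A(z/\alpha_1) / A(z/\alpha_2). The LPC coefficients and the values of \alpha_1 and \alpha_2 used below are illustrative assumptions, not the tuned values of the codec.

```python
import cmath
import math


def bandwidth_expand(a, alpha):
    """Coefficients of A(z/alpha): the k-th coefficient is scaled by alpha**k,
    which moves every root of A to alpha times its original radius."""
    return [c * alpha ** k for k, c in enumerate(a)]


def weighting_filter(a, alpha1, alpha2):
    """Numerator and denominator of W(z) = A(z/alpha1) / A(z/alpha2)."""
    return bandwidth_expand(a, alpha1), bandwidth_expand(a, alpha2)


def gain_db(num, den, f_hz, fs_hz):
    """Magnitude response in dB of num(z)/den(z) at frequency f_hz."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    n = sum(c * z_inv ** k for k, c in enumerate(num))
    d = sum(c * z_inv ** k for k, c in enumerate(den))
    return 20.0 * math.log10(abs(n / d))


# Hypothetical second-order LPC polynomial with one resonance at 200 Hz
# (pole radius 0.95) for a 2 kHz sampling rate.
FS_HZ = 2000.0
r, theta = 0.95, 2 * math.pi * 200.0 / FS_HZ
a = [1.0, -2.0 * r * math.cos(theta), r * r]

num, den = weighting_filter(a, alpha1=0.9, alpha2=0.5)  # illustrative alphas
at_resonance = gain_db(num, den, 200.0, FS_HZ)
away = gain_db(num, den, 800.0, FS_HZ)
```

For this example W(z) attenuates at the 200 Hz resonance and boosts away from it, so the error is weighted less where the signal is strong and more where it is weak, in line with the "controlled inverse filter" description.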
The decision to carry over the auditory masking model was motivated by some (limited) evidence for the existence of simultaneous masking found in the vibrotactile literature. The perceptual transparency of the compression algorithm, established through subjective tests in [5], justified this assumption. With the present work, we wish to provide empirical evidence to support it further.

Fig. 4. (Top) An acceleration signal segment (acceleration in m/s^2 over time in s). (Bottom) The frequency response (gain in dB) of the LPC synthesis filter P(z) estimated from the segment, and of the corresponding perceptual weighting filter W(z), together with the input response and the shaped noise.

B. Related work on masking models

Auditory masking models: A vast number of studies on auditory masking have been carried out since the 1950s. Psychophysical findings from these studies have been applied to auditory signal/device design, and to the compression of speech and audio signals. For the sake of brevity, we refrain from a comprehensive review of the auditory masking literature and focus on a select few studies that are suitable for application to speech/audio compression techniques.

Pioneering studies on auditory masking have been reported by Feldtkeller and Zwicker in [21]. Fig. 5 shows simultaneous masking patterns for pure tones ranging from approximately 25 Hz to 8 kHz, with a narrowband noise centered at 1 kHz as the masker stimulus. Similar results were also reported by Egan and Hake in [22] for a masker noise center frequency of about 410 Hz. Finally, in [23], Atal et al. proposed a speech coding algorithm exploiting the simultaneous masking properties of the human ear. Despite the availability of previous masking results, further masking studies are reported in [23] with the roles of the masker and maskee reversed (pure tones as masker stimuli and narrowband noise as maskee). However, the authors made the same qualitative observations about masking as the previous studies.
It should be emphasized here that the masking model deployed in speech coding is also qualitative rather than quantitative, and the model parameters are tuned a posteriori in subjective listening tests. The authors of [23] also found that the model is rather insensitive to the model parameters, so that laborious tuning is not required.

Fig. 5. Wideband auditory masking characteristics reproduced from [21] (sound pressure level in dB SPL over frequency in kHz; detection thresholds and masking thresholds). The increasing peaks represent masking curves for increasing masker loudness. The tone-detection curve is shown at the bottom. It can be seen that the detection thresholds for the pure tones in the presence of the masker rise well above their absolute detection thresholds. Moreover, in general, the pure tones further away in frequency from the masker are increasingly easier to detect than the ones close to it. This behavior is seen to hold for a range of masker noise intensity from 0 dB to 60 dB.

Vibrotactile masking models: Vibrotactile perception uses sensory information received by cutaneous mechanoreceptors embedded in the human skin. Of the four kinds of mechanoreceptors, our research focuses on the perception mediated by Pacinian corpuscles (PC) found in the deep subcutaneous tissue. PCs respond to vibrations in the frequency range of 50-1000 Hz, with highest sensitivity at 200-300 Hz. Please refer to [24] or [25] for a comprehensive treatment of haptic perceptual mechanisms and psychophysics.

Some of the earliest vibrotactile masking results were reported in [26]. Therein, the thenar eminence of the right hand was excited through a 2.9 cm^2 vibrating contactor. A narrowband noise centered at 275 Hz (thus located in the frequency region where vibrotactile detection is dominated by Pacinian receptors) acted as the masker. Pure sinusoidal tones of frequencies 15 Hz (non-Pacinian region), 50 Hz, 80 Hz, and 300 Hz (all three in the Pacinian region) acted as maskees. Detection threshold shifts for each of the maskees were obtained for masker levels ranging from 0 to 52 dB SL (sensation level). It was found that, in general, threshold shifts increased monotonically with increasing masker level. Maskees located far away in frequency from the masker were found to have lower threshold shifts than the one close to it. Also, from the case of the 15 Hz maskee, it was shown that cross-channel masking did not occur in the duplex (Pacinian and non-Pacinian) model for vibrotactile perception.

In-channel and cross-channel masking situations with both sinusoidal and noise maskers were explored in [27]. For in-channel masking, in both the Pacinian and non-Pacinian regions, threshold shifts for the pure-tone maskees increased with increasing masker levels. This was true irrespective of whether the masker was a pure tone or narrowband noise.
It was once again shown that signals did not mask each other across channels, corroborating evidence for a critical-band-like structure in the tactile mechanoreceptor system, similar to that in audition. This conclusion was further consolidated in [28].

The influence of stimulus onset asynchrony (SOA) between the masker and the maskee on threshold shifts was studied in [29]. SOA was varied over a range of -1 to 15 ms, to generate backward, simultaneous, and forward masking. Again, both sinusoidal and narrowband noise signals were used as maskers, and in-channel masking was explored for the Pacinian and non-Pacinian channels separately. In general, higher threshold shifts were observed for simultaneous masking than for backward and forward masking.

Most of the studies above used relatively small, localized areas on the human hand (the index finger pad or the thenar eminence) to deliver their stimuli directly to the skin in a direction perpendicular to it. The haptic devices used today in teleoperation, however, commonly employ a stylus-like interface attached to a robotic arm, so that the user's movements can be easily mapped to those of the remote robot and appropriate reaction forces can be displayed during contact with remote objects. The human operator usually holds this stylus in a way similar to how we would hold a pen while writing (see Fig. 1). The rich array of haptic cues, like shape, stiffness, friction, and texture, that we perceive through the tool in such a setup are displayed to a large area on the hand, tangential to the surface of the skin. To the best of our knowledge, no masking data are available in the literature for such tool-mediated haptic display. Moreover, the range and number of masker and maskee frequencies tested in most masking studies are rather limited; hence a wideband masking threshold pattern cannot be derived from these results.
Thus, none of the results summarized above could be applied directly in our case to judge the perceptual quality of the synthesized vibrotactile texture signal with respect to the original one. To overcome the limitations mentioned above, we conduct new psychophysical experiments. In the following, we describe the vibrotactile stimuli used for these experiments, the signal processing chain used to generate the stimuli, the experimental method, and the results.

C. Stimuli

The masker and maskee stimuli are displayed to the subjects through a K2004E01 electrodynamic minishaker (The Modal Shop, Inc., USA). In order to avoid the beats phenomenon, which occurs when two pure tones slightly different in frequency are added together [30], we use narrowband noise (NBN) as the masker and pure-tone sinusoids as maskees. The frequencies of the stimuli have been chosen to lie in the range 80-380 Hz, which covers the sensitivity bandwidth of the Pacinian mechanoreceptors. Three NBN center frequencies were spread out over this range: 120 Hz, 200 Hz, and 280 Hz. To generate narrowband noise, we employ the second-order bandpass filter (BPF) given in (3) at the output of a white noise generator. The bandwidth of the BPF is controlled by the parameter \alpha, and its center frequency by \beta.

H_{BP}(z) = \frac{1-\alpha}{2} \cdot \frac{1 - z^{-2}}{1 - \beta(1+\alpha) z^{-1} + \alpha z^{-2}} \qquad (3)

The NBN masker intensity was fixed at 25 dB above the detection threshold. For every masker frequency f_c, the maskee frequency was varied from (f_c - 3\Delta f) to (f_c + 3\Delta f) in steps of \Delta f (= 10% of f_c), resulting in 7 maskees per masker. The duration of both stimuli is fixed at 1 second, and both of them start and stop at the same time instant. Both stimuli are windowed with a Hanning window (symmetric 1 ms rise and fall times) to reduce transient effects. Five maskees centered around the 200 Hz masker were also tested for a masker intensity of 4 dB above detection threshold. Fig. 6 shows the block diagram for generating the masker and maskee stimuli described above.
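The stimulus construction above can be sketched as follows. The section does not state how \alpha and \beta are derived from the desired bandwidth and center frequency, so the sketch assumes the textbook mapping for this filter structure, \beta = \cos(\omega_0) and \alpha = (1 - \tan(B/2)) / (1 + \tan(B/2)), with \omega_0 and B in radians per sample. The 800 Hz sampling rate, the 20 Hz bandwidth, and all function names are likewise assumptions for illustration.

```python
import cmath
import math
import random

FS_HZ = 800.0  # assumed stimulus sampling rate


def bandpass_params(fc_hz, bw_hz, fs_hz=FS_HZ):
    """Assumed textbook mapping from center frequency and 3 dB bandwidth
    to the (alpha, beta) parameters of the second-order BPF in Eq. (3)."""
    beta = math.cos(2.0 * math.pi * fc_hz / fs_hz)
    t = math.tan(math.pi * bw_hz / fs_hz)
    alpha = (1.0 - t) / (1.0 + t)
    return alpha, beta


def narrowband_noise(n, alpha, beta, seed=0):
    """Filter white Gaussian noise with the difference equation of Eq. (3):
    y[n] = (1-alpha)/2 * (x[n] - x[n-2]) + beta*(1+alpha)*y[n-1] - alpha*y[n-2]."""
    rng = random.Random(seed)
    g = (1.0 - alpha) / 2.0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)
        y0 = g * (x0 - x2) + beta * (1.0 + alpha) * y1 - alpha * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def bpf_gain(alpha, beta, f_hz, fs_hz=FS_HZ):
    """Magnitude of H_BP at frequency f_hz."""
    zi = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    num = (1.0 - alpha) / 2.0 * (1.0 - zi ** 2)
    den = 1.0 - beta * (1.0 + alpha) * zi + alpha * zi ** 2
    return abs(num / den)


def maskee_frequencies(fc_hz):
    """Seven maskee frequencies from fc - 3*df to fc + 3*df with df = 10% of fc."""
    return [fc_hz * (10 + k) / 10 for k in range(-3, 4)]


alpha, beta = bandpass_params(fc_hz=200.0, bw_hz=20.0)
masker = narrowband_noise(int(FS_HZ), alpha, beta)  # one second of NBN
maskees = maskee_frequencies(200.0)
```

Under the assumed mapping the filter has unity gain at the center frequency and zero gain at DC, so the masker level can be set independently by a scalar gain; the maskee grid for the 200 Hz masker runs from 140 Hz to 260 Hz in 20 Hz steps.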
In the perceptual tests, subjects determine the amplitude at which the maskee is detected in the presence of the masker. This particular maskee amplitude is the masking threshold for that masker-maskee combination.

D. Signal processing chain

a) Stimuli up-sampling: As mentioned before, the stimuli are generated at a low sampling rate of 800 Hz. Most precise investigations of vibrotactile perception, however, prefer to deliver better-quality stimuli at higher sampling rates of 8-10 kHz. Another reason for a high sampling rate is to avoid the adverse effect of the sinc frequency roll-off due to the sample-and-hold nature of digital-to-analog converters.
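The benefit of a higher output rate can be checked with the ideal zero-order-hold model of a DAC, whose magnitude response is |sin(\pi f / F_s) / (\pi f / F_s)|. The sketch below evaluates this droop at the upper end of the stimulus band; the two sampling rates and the 380 Hz test frequency are assumptions chosen for illustration, and the ideal model ignores any analog filtering in the real chain.

```python
import math


def zoh_droop_db(f_hz, fs_hz):
    """Gain (in dB) of an ideal zero-order hold at f_hz for output rate fs_hz.

    Valid for 0 <= f_hz < fs_hz; the response is exactly zero at multiples of fs_hz.
    """
    x = math.pi * f_hz / fs_hz
    mag = 1.0 if x == 0.0 else abs(math.sin(x) / x)
    return 20.0 * math.log10(mag)


low_rate = zoh_droop_db(380.0, 800.0)    # assumed low generation rate
high_rate = zoh_droop_db(380.0, 8000.0)  # assumed oversampled output rate
```

At the low rate the hold attenuates the top of the band by several dB, whereas at the oversampled rate the droop is a small fraction of a dB, which is why oversampling leaves only a mild residual roll-off to equalize.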


[Block diagram. Experiment GUI (amplitude A, frequency f) and Psylab MATLAB software; maskee A sin(2πft); masker from bandlimited white noise (5 kHz) through cascaded tunable BPFs (bandwidth, center frequency); Simulink/RTAI/Ubuntu Linux realtime model with the inverted (minishaker + hand) transfer function and the pre-equalizer for the DAC; DAC sample-and-hold; analog RC filter for anti-imaging; K2004E01 minishaker.]

Fig. 6. Block schematic for generating the masker and maskee stimuli.

b) Hand-device equalizer: To ensure that a user indeed feels stimuli as close as possible to the ones we command, the non-flat hand-device system response was neutralized. The hand-device acceleration response was measured for a chirp input signal. The users maintained a consistent grip force by monitoring visual feedback from a FlexiForce® sensor. MATLAB® functions were used to determine a linear transfer function based on this I/O data (see Fig. 7):

H_hd(s) = 3896 s s s + 2.9e4.2128s s (4)

The influence of these dynamics was canceled out by incorporating the inverse model into the signal processing chain. Fig. 7 shows the transfer function estimated for averaged data, which was collected two days before the psychophysical experiments began. Before each subject session during the experiments, new hand-device data was collected and, if required, the parameters of the above model were tuned to fit the new data.

Fig. 7. Frequency response functions (FRFs; magnitude and phase in degrees) plotted for the hand-device input-output data recorded for five users. The thick black curves show the linear transfer function approximation for these FRFs.

c) Digital-to-Analog conversion: The signal processing chain shown in Fig. 6 is implemented in the form of a Simulink block diagram.
A realtime executable is generated using the RTAI realtime interface for the Linux target operating system running on a standard PC. The realtime process sends signal samples to the minishaker through the digital-to-analog converter (DAC) on an NI PCI-6221 data acquisition (DAQ) card. The time-domain sample-and-hold response of the DAC creates a sinc-function roll-off in the frequency domain, which extends indefinitely along the frequency axis. This undesirable effect is alleviated to some extent, but not eliminated completely, by operating at an oversampled rate of 8 kHz. Therefore, a 5-tap FIR pre-equalizer is used to further cancel out the sinc roll-off and to obtain a flat frequency response in the bandwidth of interest.

E. Psychophysical experiments

1) Subjects: Five subjects participated in the psychophysical experiments: 1 female and 4 male. Their ages ranged from 24 to 29, with an average of 26 years. All of them were right-handed. Four of them had participated in haptic psychophysical experiments before, and one was inexperienced. Their self-reported experience with haptic devices ranged from none to extensive. None of them reported having any ailments that would affect their sensorimotor performance.

2) Experimental setup: A custom-made stylus-like handle similar to that of the PHANToM Omni (Sensable, Inc., USA) haptic device was mounted on the K2004E01 minishaker (see Fig. 8). The subjects were instructed to hold the stylus like a pen in their dominant hand in a standardized 3-finger grip. Their elbow and forearm rested on a wooden plank, which supported a neutral wrist position. When the stylus was held as instructed, the vibrations transmitted through the stylus were mostly tangential to the skin in contact. The subjects wore acoustic noise-canceling headphones (QuietComfort 15, Bose Corp., USA) that played pink noise to mask out auditory cues from the experimental apparatus. They interacted with the experiment's GUI through the keyboard.
Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.


Fig. 8. Experimental setup. The minishaker device is shown as an inset.

3) Method: For determining the masking thresholds, we chose a standard psychophysical method similar to the one used for characterizing the force and position absolute detection thresholds in [31]. In the literature [32], this method is referred to as the three-alternative forced-choice (3AFC) 1-up 3-down (1U-3D) adaptive staircase method. Under this method, on each trial of the experiment (for a given masker-maskee combination), the subject was presented with three consecutive stimulus intervals (hence 3A) separated by pauses. The masker was present in each of the three stimulus intervals, whereas the maskee stimulus was present in only one randomly selected interval. The currently active stimulus interval was indicated visually on the GUI. At the end of the trial, the subject had to respond by indicating the one interval in which he detected the maskee. A new trial started after the subject's response.

The stimulus duration was chosen to be 1 s, since haptic adaptation to the stimulus is known to set in after that. The pause duration was chosen as 6 ms, so that haptic enhancement and summation effects [33] were alleviated. Haptic enhancement is the increase in the subjective magnitude of the second stimulus caused by the presentation of the first one. Haptic summation, on the other hand, is the increase in the overall subjective magnitude of two successive stimuli. Each trial thus lasted about 5.2 s, which falls within the limit of several seconds imposed by haptic working memory [34].

Three consecutive correct responses led to a reduction in the amplitude of the maskee (3D), making it more difficult to detect. Conversely, a single incorrect response led to an increase (1U), making it easier to detect.
Such a staircase guides the subject towards the maskee level at which a downward step (three correct responses in a row) is as likely as an upward step. Thresholds obtained this way correspond to the 79.4% point on the psychometric function, since the proportion of correct responses p then satisfies p^3 = 0.5; this point is considered quite reliable in psychophysics [35]. The higher this percentage, the lower the chance that the subject merely guessed his way to a final value that is not his real threshold. The Matlab-based Psylab library [36] implements this procedure and was used to control the GUI as well as the stimulus parameters during the experiment.

To make our experiment even more rigorous, each masker-maskee combination was evaluated by the subjects twice: once tracing an ascending staircase and once tracing a descending one (see Fig. 9). For the ascending staircase, the initial maskee amplitude was chosen to be well below the expected masking threshold level, so that the maskee was not detectable at all at the beginning. For the descending staircase, the initial maskee amplitude was chosen to be well above the expected detection threshold level, so that the maskee was easily detectable at the beginning. The final threshold was calculated by averaging the thresholds obtained from the two staircases; such averaging is standard practice for staircase methods used in psychophysics [24].

The step size by which the intensity of the maskee was increased or decreased was initially set to 8 dB for faster convergence towards the threshold. With each reversal of the staircase, it was halved until it reached a minimum of 0.5 dB (see Fig. 9), where it stayed fixed for the rest of the staircase. The minimum step size of 0.5 dB was found to be a good tradeoff between the precision of the masking threshold and the experiment size. A staircase was terminated after three direction reversals (change of intensity from increasing to decreasing) had occurred at the minimum step size.
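The 1-up 3-down rule and the step-size schedule described above can be sketched as a small simulation. This is a minimal sketch and not the Psylab implementation: the simulated observer (a logistic psychometric function with a 1/3 guessing floor), its slope, the function names, the choice to count every change of direction as a reversal, and the way the final estimate is averaged are all assumptions made for illustration. The helper `convergence_probability` shows where the 79.4% figure comes from: a 1-up n-down track settles where n consecutive correct responses are as likely as not, that is, where p^n = 0.5.

```python
import math
import random

STEPS_DB = (8.0, 4.0, 2.0, 1.0, 0.5)  # halving schedule from the text


def convergence_probability(n_down):
    """Proportion correct targeted by a 1-up n-down staircase: p**n = 0.5."""
    return 0.5 ** (1.0 / n_down)


def p_correct(level_db, threshold_db, slope_db=2.0):
    """Hypothetical 3AFC observer: 1/3 guessing floor plus a logistic rise."""
    detect = 1.0 / (1.0 + math.exp(-(level_db - threshold_db) / slope_db))
    return 1.0 / 3.0 + (2.0 / 3.0) * detect


def run_staircase(threshold_db, start_db, seed=0, max_trials=2000):
    """Simulate one 1U-3D track; return (threshold estimate, presented levels)."""
    rng = random.Random(seed)
    level, step = start_db, STEPS_DB[0]
    min_step = STEPS_DB[-1]
    correct_run = 0          # consecutive correct responses so far
    direction = 0            # last move: +1 up, -1 down, 0 none yet
    min_step_reversals = 0   # reversals seen at the minimum step size
    first_counted = None     # trial index of the first such reversal
    levels = []
    while min_step_reversals < 3 and len(levels) < max_trials:
        levels.append(level)
        if rng.random() < p_correct(level, threshold_db):
            correct_run += 1
            move = -1 if correct_run == 3 else 0  # three correct: step down
        else:
            move = +1                             # one error: step up
        if move == 0:
            continue
        correct_run = 0
        if direction != 0 and move != direction:  # direction reversal
            if step > min_step:
                step /= 2.0
            else:
                min_step_reversals += 1
                if first_counted is None:
                    first_counted = len(levels) - 1
        direction = move
        level += move * step
    # Assumed estimate: mean level from the first minimum-step reversal onward.
    tail = levels if first_counted is None else levels[first_counted:]
    return sum(tail) / len(tail), levels


if __name__ == "__main__":
    print(round(convergence_probability(3), 4))  # 0.7937
    for start_db in (-20.0, 20.0):  # ascending and descending tracks
        estimate, levels = run_staircase(0.0, start_db, seed=1)
        print(start_db, len(levels), round(estimate, 2))
```

Averaging the estimates of an ascending and a descending run, as in the usage loop, mirrors the two-staircase procedure of the experiment.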
The masking threshold was calculated as the mean of all the staircase points across the last three reversals. Every staircase was visually inspected by the experimenter. If the data did not appear to have converged, the subject repeated that staircase.

Each staircase took about 10 minutes. To avoid fatigue, subjects took a break of 1-2 minutes after every staircase. In all, each subject completed 3 maskers x 7 maskees per masker x 2 staircases per maskee = 42 staircases. This amounted to about 7 hours per subject, excluding breaks. The whole experiment was scheduled over two weeks. Each subject performed his experiments over two days, in 6 disjoint sessions each day (corresponding to 3 maskers, 2 staircases each), with 7 staircases (corresponding to the 7 maskee tones) per session.

Fig. 9. Ascending and descending staircases corresponding to a masker-maskee combination (maskee intensity [dB] versus trial number).

Fig. 10. Results of the simultaneous masking experiments. (a) Sine-detection thresholds (with 95% confidence intervals) are shown at the bottom of the figure, in comparison with those from [31]. The upper part shows the masking thresholds for 3 masker frequencies and 5 subjects. (b) Average threshold shifts (= masking thresholds - corresponding sine-detection thresholds), and the corresponding 95% confidence intervals. (Axes: thresholds and threshold shifts in dB re 1 m/s^2.)

Fig. 11. Threshold shifts for sinusoids around the masker frequency at 12 Hz (a), at 2 Hz (b), and at 28 Hz (c).

F. Results

Fig. 10(a) shows the masking thresholds in comparison to the corresponding sine-detection thresholds. Fig. 10(b) shows the average threshold shifts, whereas Fig. 11 shows the threshold shifts for individual subjects for each of the maskers. It is evident that the sine-detection thresholds shift upwards in the presence of the masker for each of the three maskers. It can also be seen that the masking thresholds are highest when the sinusoid's frequency coincides with the masker center frequency, and that they roll off gradually on both sides as the sinusoid's frequency moves away from the masker center frequency. This phenomenon is very similar to auditory simultaneous masking (see Fig. 5) and substantiates the masking assumption we made in the development of our haptic texture codec presented in [5]. From Fig. 10(b), it can be seen that for a masker intensity of 25 dB above threshold, the central maskee has a threshold shift of approximately 25 dB for all maskers.
For a 2 Hz masker intensity of 40 dB above threshold, the threshold shifts are all approximately 15 dB above the 25 dB results. Thus, the 40 dB threshold shifts are higher in proportion to the masker intensity, while running more or less parallel to the 25 dB ones in the tested frequency range. From the figure, it can be observed that while the masking thresholds corresponding to the 2 Hz masker are more or less symmetric, the ones corresponding to the 12 Hz and 28 Hz maskers are less so. This asymmetry may be attributed to the asymmetric behavior of the band-pass filter that is used to generate the masker stimuli.

IV. BITRATE SCALABILITY OF THE TEXTURE CODEC

The haptic texture codec from [5], which is described in Section IV, has a constant output bitrate of 3.55 kbps. In this section, we present work towards making this codec bitrate-scalable, enabling it to operate at a range of bitrates under varying network traffic conditions and quality demands. The LPC coefficients and the fixed-codebook (FCB) excitation codevector account for 18 bits (0.9 kbps) and 26 bits (1.3 kbps) per frame, respectively, out of the total 71 bits per frame (3.55 kbps). Hence, we target these two parameters for bitrate reduction, since together they contribute more than 50% of the total bitrate.

A. Scalable Vector-Quantization (VQ) for the LPC coefficients

A scalable two-stage split VQ has been implemented for the 10-coefficient LPC vectors. Stage 1 coarsely quantizes the entire input vector. Stage 2 separately and finely quantizes the lower and upper halves of the quantization error vector from stage 1. Such splitting reduces the memory requirements for the same number of total allocated bits. We use the mean-square error as the error criterion to minimize during the codebook searches. VQ codebooks corresponding to different VQ resolutions have been generated from a pre-recorded texture dataset. Anywhere from 17 to 0 bits can be allocated to the LPC VQ (see Table I) to achieve bitrate scalability. There are multiple ways in which bits can be allocated to the various stages of the VQ, based on the total available number of bits. Scheme 1 shares the available bits (almost) equally between the lower and higher splits, whereas Scheme 2 allocates more bits to the lower split, thus favoring lower frequencies over higher ones.

TABLE I
LPC BIT ALLOCATION.

LPC bitrate      Scheme 1                      Scheme 2
(bits/frame)     (1st, 2nd low, 2nd high)      (1st, 2nd low, 2nd high)
17               (7,5,5)                       (7,5,5)
16               (7,5,4)                       (7,5,4)
15               (7,4,4)                       (7,5,3)
14               (7,4,3)                       (7,5,2)
13               (7,3,3)                       (7,5,1)
12               (7,3,2)                       (7,5,0)
11               (7,2,2)                       (7,4,0)
10               (7,2,1)                       (7,3,0)
9                (7,1,1)                       (7,2,0)
8                (7,1,0)                       (7,1,0)
7                (7,0,0)                       (7,0,0)
6                (6,0,0)                       (6,0,0)
5                (5,0,0)                       (5,0,0)
4                (4,0,0)                       (4,0,0)
3                (3,0,0)                       (3,0,0)
2                (2,0,0)                       (2,0,0)
1                (1,0,0)                       (1,0,0)
0                (0,0,0)                       (0,0,0)

B. Fixed-codebook (FCB)

The texture codec contains a fixed excitation codebook holding codevectors that serve as input to the LPC synthesis filter. A specially structured algebraic codebook is used as the FCB. It is made up of four single-pulse interleaved permutation codes (4 tracks). The key advantage of an algebraic codebook is that there is no need to store the codebook. It also enables the use of large codebooks and offers good texture quality. At the encoder, the fixed codebook is searched to determine the best codevector for a particular texture subframe.
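To make the "no stored codebook" property concrete, the following sketch builds an excitation codevector directly from pulse signs and position indices and counts the bits needed to index it. The track layouts are the 4-pulse and 2-pulse layouts tabulated in the two subsections that follow; the frame rate of 50 frames per second is inferred from 71 bits per frame at 3.55 kbps, and the function names are hypothetical. The encoder's codebook search is not included.

```python
SUBFRAME_LEN = 20          # samples per subframe
SUBFRAMES_PER_FRAME = 2
FRAMES_PER_SECOND = 50     # assumption: 71 bits/frame <-> 3.55 kbps

# Allowed pulse positions per track (one pulse per track).
TRACKS_4_PULSE = [
    [0, 5, 10, 15],
    [1, 6, 11, 16],
    [2, 7, 12, 17],
    [3, 8, 13, 18, 4, 9, 14, 19],
]
TRACKS_2_PULSE = [
    [1, 3, 6, 8, 11, 13, 16, 18],
    [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15, 16, 17, 19],
]


def bits_per_subframe(tracks):
    """One sign bit per pulse plus log2(number of positions) per track."""
    return sum(1 + (len(track) - 1).bit_length() for track in tracks)


def fcb_bitrate_bps(tracks):
    """FCB share of the bitrate under the assumed frame rate."""
    return bits_per_subframe(tracks) * SUBFRAMES_PER_FRAME * FRAMES_PER_SECOND


def build_codevector(tracks, position_indices, signs):
    """Sum of signed unit pulses; nothing is looked up in a stored table."""
    v = [0] * SUBFRAME_LEN
    for track, index, sign in zip(tracks, position_indices, signs):
        v[track[index]] += sign  # overlapping tracks may stack two pulses
    return v


if __name__ == "__main__":
    print(bits_per_subframe(TRACKS_4_PULSE), fcb_bitrate_bps(TRACKS_4_PULSE))
    print(bits_per_subframe(TRACKS_2_PULSE), fcb_bitrate_bps(TRACKS_2_PULSE))
    print(build_codevector(TRACKS_4_PULSE, [0, 1, 2, 7], [1, -1, 1, -1]))
```

With these layouts the sketch gives 13 and 9 bits per subframe, that is, the 26 and 18 bits per frame stated in the text for the 4-pulse and 2-pulse structures.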
1) 4-Pulse FCB Structure: Each codevector of length 20 samples (the length of a subframe), which is selected using the algebraic codebook structure, contains four nonzero pulses. Each pulse can have an amplitude of either +1 or -1 and can assume the positions indicated in the table below:

Pulse    Sign         Positions
p_0      s_0: ±1      m_0: 0, 5, 10, 15
p_1      s_1: ±1      m_1: 1, 6, 11, 16
p_2      s_2: ±1      m_2: 2, 7, 12, 17
p_3      s_3: ±1      m_3: 3, 8, 13, 18, 4, 9, 14, 19

The excitation codevector v[n] is constructed by summing the four pulses as follows:

\[ v[n] = \sum_{i=0}^{3} p_i[n] = \sum_{i=0}^{3} s_i \, \delta[n - m_i], \qquad n = 0, \ldots, 19 \tag{5} \]

Hence, each pulse requires 1 bit for its sign, 2 bits for each of the positions of p_0, p_1 and p_2, and 3 bits for the position of p_3. Thus, a total of 13 bits are required to index the entire codebook for each subframe in the reference texture codec (4 bits for the signs and 9 bits for the positions). Since a frame consists of two subframes, 26 bits are necessary to encode each frame.

2) 2-Pulse FCB Structure: Here, instead of four pulses per subframe, we encode only two pulses per subframe. The two signed pulses are searched in two overlapping tracks, which are listed in the table below. The search of pulse positions is an exhaustive, but computationally efficient, search over all vectors. As in the previous case, pulses can have an amplitude of either +1 or -1. Thus, 2 bits are required for the sign index. For the position of p_0, 3 bits are needed, and for the position of p_1, 4 bits. Hence, a total of 9 bits are required to index the entire codebook for a subframe, and 18 bits for a frame.

Pulse    Sign         Positions
p_0      s_0: ±1      m_0: 1, 3, 6, 8, 11, 13, 16, 18
p_1      s_1: ±1      m_1: 0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 14, 15, 16, 17, 19

V. CODEC PERFORMANCE EVALUATION

A. Objective evaluation of the bitrate-scalable codec

There is a lack of reliable and robust objective quality metrics for wideband vibrotactile signals in the literature.
Hence, we employ log-spectral distortion and segmental SNR metrics to give an idea of the monotonic change in distortion as the codec bitrate is scaled down. The LPC VQ performance is evaluated in the frequency domain with the standard average log-spectral distortion (SD):

\[ SD(i)\,[\mathrm{dB}] = \sqrt{ \frac{1}{f_h - f_l} \int_{f_l}^{f_h} \left[ 10 \log_{10} \frac{S_i(f)}{\hat{S}_i(f)} \right]^2 df } \tag{6} \]

where f_l = 4 Hz and f_h = 45 Hz are the lower and upper frequency limits, respectively, corresponding to the sensitivity bandwidth of mechanoreceptors in the human hand, and S_i(f) and \hat{S}_i(f) denote the LP power spectra for the i-th texture frame corresponding to the unquantized and quantized LPC polynomials, respectively. Fig. 12 shows the averaged SD results across all texture signals in our test dataset. Since no haptic perceptual thresholds for spectral distortion exist in the literature, it is not possible to comment on the significance of a given SD value at this point. However, the figure illustrates how SD decreases monotonically as the LPC bitrate increases.

In the time domain, we employ the segmental signal-to-noise ratio metric (also used in [37]):

\[ SSNR\,(\mathrm{dB}) = \frac{10}{N_f} \sum_{i=0}^{N_f-1} \log_{10} \left( \frac{\sum_{j=0}^{N_s-1} s^2(k)}{\sum_{j=0}^{N_s-1} \left[ s(k) - \hat{s}(k) \right]^2} \right) \tag{7} \]

where k = i N_s + j, N_f is the number of frames in a texture signal, N_s = 40 is the number of samples per frame, and s(n) and \hat{s}(n) are the codec input and output signals, respectively.
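The two objective metrics can be computed along the following lines. This is a minimal sketch rather than the evaluation code behind Fig. 12 and Fig. 13: it assumes the power spectra are already available as uniformly spaced samples between f_l and f_h (so the integral in (6) becomes a mean, taken here in the usual root-mean-square form), that every frame has nonzero signal and error energy, and that a trailing partial frame is ignored. The function names are hypothetical.

```python
import math


def log_spectral_distortion_db(power_spec, power_spec_quantized):
    """RMS of 10*log10(S/S_hat) over uniformly spaced frequency samples."""
    diffs_db = [10.0 * math.log10(s / s_hat)
                for s, s_hat in zip(power_spec, power_spec_quantized)]
    return math.sqrt(sum(d * d for d in diffs_db) / len(diffs_db))


def segmental_snr_db(signal, decoded, frame_len=40):
    """Mean over frames of the per-frame SNR in dB, with k = i*frame_len + j."""
    n_frames = len(signal) // frame_len  # trailing partial frame is dropped
    if n_frames == 0:
        raise ValueError("need at least one full frame")
    total = 0.0
    for i in range(n_frames):
        frame = range(i * frame_len, (i + 1) * frame_len)
        signal_energy = sum(signal[k] ** 2 for k in frame)
        error_energy = sum((signal[k] - decoded[k]) ** 2 for k in frame)
        if signal_energy == 0.0 or error_energy == 0.0:
            raise ValueError("frame with zero signal or error energy")
        total += math.log10(signal_energy / error_energy)
    return 10.0 * total / n_frames


if __name__ == "__main__":
    s = [1.0] * 80        # two frames of a toy input signal
    s_hat = [0.9] * 80    # toy decoded signal with a constant 10% error
    print(round(segmental_snr_db(s, s_hat), 3))
    print(round(log_spectral_distortion_db([10.0] * 3, [1.0] * 3), 3))
```

Averaging the per-frame values in dB, rather than computing one SNR over the whole signal, keeps loud frames from dominating the result.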

Fig. 12. Average spectral distortion (SD_avg, in dB) for the LPC VQ, evaluated for the two bit allocation schemes over the LPC bitrates ranging from 17 to 0 bits per frame.

Fig. 13 shows the monotonic rise of the SSNR for a 2-pulse codebook. Similar curves were obtained for the 4-pulse fixed codebook, but with generally higher SSNR values than for the 2-pulse fixed codebook.

B. Subjective evaluation of the bitrate-scalable codec

1) Subjects: Seven male subjects from the university, aged 21 to 44 years, participated in the experiment designed to evaluate the perceptual performance of the codec. None of them reported having any ailments that would affect their sensorimotor performance. Their experience with haptic devices ranged from none to extensive. All of them were right-handed. Two had participated in haptic psychophysical experiments before, while the rest were inexperienced.

2) Textures: The five different textures shown in Fig. 14 were used to perform the experiment. Textures (a), (b) and (c) in the figure had been used before in generating the training set to design the codebooks of the VQ encoder, while textures (d) and (e) had not been used previously in this work.

3) Method: Pilot testing was conducted with five subjects to get an informal first impression of the codec quality, before a formal subjective experiment was performed using a standard psychophysical method.
Subjects explored each of the five textures with the teleoperation setup shown in Fig. 1. The view of the remote side was blocked with a screen, and pink noise played through the active headphones worn by the subjects masked any noise from the remote side. This was done so that the subjects judged the stimuli based only on haptic feedback. During exploration, the codec was sometimes turned OFF, and at other times it was turned ON and set to the maximum compression strength. When interviewed, the subjects reported that none of them could perceive any distortion introduced by the codec, while all of them could clearly distinguish between different textures.

Fig. 13. SSNR (dB) when a 2-pulse FCB is used, plotted against the LPC bitrate (bits/frame) for Scheme 1 and Scheme 2.

Fig. 14. Textures used for haptic interaction. (a) quasi-sinusoidal aluminium grating, (b) random-structured plastic, (c) random-structured fine vinyl, (d) peaky metal case, (e) circular sinusoidal plastic grating.

With this outcome, we designed a simple psychophysical experiment based on the classical psychophysical method of 1-up-1-down (1U-1D) adaptive staircases [35]. A higher-order adaptive staircase method was not deemed necessary, given the clear impressions obtained from the pilot tests above. The stimuli used for the experiment consisted of the codec's highest rate setting (3.55 kbps), the lowest rate setting (2.3 kbps), and several settings corresponding to compression levels in between. A total of 26 stimuli corresponding to 26 different bitrate settings (50 bps steps) were chosen by making various combinations of the LPC and the FCB bit allocation. In each experimental run, the subject compared two settings of the codec while exploring a remote texture, judging them to feel the same or different. One of the two settings was fixed at the best-case (reference) bitrate of 3.55 kbps, while the comparison bitrate corresponded to one of the remaining 25 settings.
To begin with, the comparison bitrate was set to the lowest value of 2.3 kbps. Depending upon whether the subject responded with "same" or "different", the comparison bitrate was decreased or increased, respectively, by a step size of 50 bps. The adaptive staircase thus guided the subject towards a perceptual threshold, where he could just perceive a difference between the two stimuli. Note that if the subject responds with "same" when the current bitrate setting is 2.3 kbps, or with "different" when it is 3.55 kbps, the current bitrate setting is simply repeated, since it is currently not possible to go beyond these settings. An experimental run was terminated when the staircase had undergone three reversals (up to down) of direction. It was also terminated if the comparison bitrate had saturated at the highest bitrate of 3.55 kbps or the lowest bitrate of 2.3 kbps over the past three steps of the staircase. In case of reversal-based termination, the threshold bitrate was calculated as the mean of all the staircase points across the reversals, while for saturation-based termination, the threshold bitrate was selected to be the saturation value.

4) Results: Fig. 15 shows staircase patterns for all 7 subjects for Texture (a) from Fig. 14. It can be seen that the top six subjects could not reliably distinguish between the reference bitrate stimulus and the lowest comparison bitrate stimulus of 2.3 kbps, while the last subject flipped between 2.3 and 2.35 kbps three times in a row. In total, 35 experimental runs (7 subjects and 5 textures) were performed. Thirty-one of them showed that the subjects could not distinguish between the reference bitrate (3.55 kbps) and the highest compression setting (2.3 kbps). All these 31 runs ended after receiving three "same" responses from the subjects. The remaining 4 runs stopped after six reversals of direction between 2.3 and 2.35 kbps, giving a threshold of 2.325 kbps. Overall, these experiments show that subjects could not reliably detect coding distortion even at the lowest bitrate currently possible in our real-time teleoperation setup.

VI. CONCLUSION

We have presented psychophysical experiments aimed at showing the existence of a wideband simultaneous masking phenomenon for vibrotactile perception. The results provide substance to our assumption of perceptual masking used in the development of the haptic texture codec [5]. In addition, we believe that they fill an important gap in the science of vibrotactile perception. Future work in this area may address the characterization of the shape of the masking curves in the presence of more than one wideband masker. This is an interesting direction for further research, since the presence of multiple frequency-domain peaks in a haptic texture signal is a common occurrence in a real-world texture system. Secondly, we have introduced a bitrate-scalable version of the constant-bitrate codec from [5] and evaluated its performance.
Our results show that it is possible to drive down the codec output bitrate to as low as 2.3 kbps without the subjects being able to reliably discriminate between the codec input and the distorted output texture signals.

This paper studies the deterioration of perceptual performance for a given texture as the coding distortion rises. However, in the future it is also necessary to answer the following question: can subjects still discriminate between two different textures as the coding distortion rises? Such a task-performance evaluation will give a more comprehensive picture of the codec performance. Audio-video research also usually progresses along the same lines. Future work will also address the further reduction of bitrate in order to observe how the perceived quality degrades as a function of the bitrate. Furthermore, objective quality metrics tailored to vibrotactile perception should be developed to be able to algorithmically predict perceived quality variations.

Fig. 15. Staircase patterns for all 7 subjects for Texture (a) in Fig. 14 (bitrate in kbps versus trial number).

ACKNOWLEDGMENT

This work has been supported, in part, by the German Research Foundation (DFG) under the project STE 193/4-2 and, in part, by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC Grant agreement no

REFERENCES

[1] R. Iglesias, S. Casado, T. Gutiérrez, A. García-Alonso, W. Yu, and A. Marshall, "Simultaneous remote haptic collaboration for assembling tasks," Multimedia Systems, vol. 13, no. 4, Oct. 2007.
[2] B. Chebbi, D. Lazaroff, F. Bogsany, P. Liu, L. Niy, and M. Rossi, "Design and implementation of a collaborative virtual haptic surgical training system," Mechatronics and Automation, IEEE International Conference on, vol. 1, Jul. 2005.
[3] W. R. Ferrell and T. B. Sheridan, "Supervisory control of remote manipulation," IEEE Spectrum, vol. 4, no. 10, Oct.
[4] B. Cizmeci, R. Chaudhari, X. Xu, N. Alt, and E. Steinbach, "A visual-haptic multiplexing scheme for teleoperation over constant-bitrate communication links," in Haptics: Neuroscience, Devices, Modeling, and Applications, ser. Lecture Notes in Computer Science, M. Auvray and C. Duriez, Eds. Springer Berlin Heidelberg, 2014.
[5] R. Chaudhari, B. Cizmeci, K. J. Kuchenbecker, S. Choi, and E. Steinbach, "Low bitrate source-filter model based compression of vibrotactile texture signals in haptic teleoperation," Proceedings of the 20th ACM International Conference on Multimedia, Nov.
[6] K. J. Kuchenbecker, J. P. Fiene, and G. Niemeyer, "Improving contact realism through event-based haptic feedback," IEEE Trans. on Visualization and Computer Graphics, vol. 12, no. 2, Mar. 2006.
[7] W. McMahan, J. M. Romano, A. M. A. Rahuman, and K. J. Kuchenbecker, "High frequency acceleration feedback significantly increases the realism of haptically rendered textured surfaces," Haptics Symposium, 2010 IEEE, Mar. 2010.
[8] D. A. Kontarinis and R. D. Howe, "Tactile display of vibratory information in teleoperation and virtual environments," Presence, vol. 4.
[9] W. McMahan, J. Gewirtz, D. Standish, P. Martin, J. A. Kunkel, M. Lilavois, A. Wedmid, D. I. Lee, and K. J. Kuchenbecker, "Tool contact acceleration feedback for telerobotic surgery," IEEE Trans. on Haptics, vol. 4, no. 3, Jul.
[10] J. Fishel, V. Santos, and G. Loeb, "A robust micro-vibration sensor for biomimetic fingertips," Biomedical Robotics and Biomechatronics, 2008. 2nd IEEE RAS International Conference on, Oct. 2008.
[11] M. Strohmayr, H. Worn, and G. Hirzinger, "The DLR artificial skin step I: Uniting sensitivity and collision tolerance," Robotics and Automation (ICRA), 2013 IEEE Int. Conf. on, May 2013.
[12] P. Hinterseer, S. Hirche, S. Chaudhuri, E. Steinbach, and M. Buss, "Perception-based data reduction and transmission of haptic data in telepresence and teleaction systems," IEEE Trans. on Signal Processing, vol. 56, no. 2, Feb. 2008.
[13] E. Steinbach, S. Hirche, M. Ernst, F. Brandi, R. Chaudhari, J. Kammerl, and I. Vittorias, "Haptic communications," Proceedings of the IEEE, vol. 100, no. 4, 2012.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.


12 This is the author s version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. XX, NO. X, DECEMBER XXXX [14] S. Okamoto and Y. Yamada, Perceptual properties of vibrotactile material texture: Effects of amplitude changes and stimuli beneath detection thresholds, IEEE/SICE Int. Symp. on System Integration, pp , Dec. 21. [15] J. Craig, Difference threshold for intensity of tactile stimuli, Attention, Perception & Psychophysics, vol. 11, pp , [16] J. Chen and J. Thyssen, Analysis-by-Synthesis speech coding. In J. Benesty, M. M. Sondhi, Y. Huang (Eds.), Springer Handbook of Speech Processing, Springer Berlin-Heidelberg, 28. [17] W. McMahan and K. Kuchenbecker, Haptic display of realistic tool contact via dynamically compensated control of a dedicated actuator, in Intelligent Robots and Systems, 29. IROS 29. IEEE/RSJ International Conference on, Oct 29, pp [18] A. El Saddik, M. Orozco, M. Eid, and J. Cha, Haptics Technologies. Springer Verlag, 211, ch. 5. Computer Haptics. [19] M. Danaei. (213) Simulink model for a celpbased haptic texture codec. [Online]. Available: [2] L. Rabiner and R. Schafer, Introduction to Digital Speech Processing, ser. Foundations and Trends in Signal Processing. Now Publishers, 27, vol. 1, no [21] R. Feldtkeller and E. Zwicker, Das Ohr als Nachrichtenempfa nger. S. Hirzel Verlag Stuttgart (Monographien der elektrischen Nachrichtentechnik), [22] J. P. Egan and H. W. Hake, On the masking pattern of a simple auditory stimulus, J. Acoust. Soc. Am., vol. 22, no. 5, pp , 195. [23] M. R. Schroeder, B. S. Atal, and J. L. Hall, Optimizing digital speech coders by exploiting masking properties of the human ear, J. Acoust. Soc. Am., vol. 66, no. 6, Dec [24] G. A. Gescheider, Psychophysics: the fundamentals. Psychology Press, 213. [25] L. A. Jones and S. J. 
Lederman, Human hand function. Oxford University Press, 26. [26] G. A. Gescheider, R. T. Verrillo, and C. L. V. Doren, Prediction of vibrotactile masking functions, J. Acoust. Soc. Am., vol. 72, no. 5, Nov [27] R. D. Hamer, R. T. Verrillo, and J. J. Zwislocki, Vibrotactile masking of pacinian and non-pacinian channels, J. Acoust. Soc. Am., vol. 73, no. 4, pp , Apr [28] J. Makous, R. Friedman, and C. J. Vierck, A critical band filter in touch, The Journal of Neuroscience, vol. 15, no. 4, pp , [29] G. Gescheider, J. S.J. Bolanowski, and R. Verrillo, Vibrotactile masking: Effects of stimulus onset asynchrony and stimulus frequency, J. Acoust. Soc. Am., vol. 85, no. 5, pp , May [3] S.-C. Lim, K.-U. Kyung, and D.-S. Kwon, Effect of frequency difference on sensitivity of beats perception, Experimental Brain Research, vol. 216, pp , 212. [31] A. Israr, S. Choi, and H. Z. Tan, Mechanical impedance of the hand holding a spherical tool at threshold and suprathreshold stimulation levels, In Proc. of Symp. on Haptic Interfaces for Virtual Environment and Teleoperator Systems.World Haptics 7, pp. 56 6, 27. [32] L. Jones and H. Tan, Application of psychophysical techniques to haptic research, Haptics, IEEE Trans. on, vol. 6, no. 3, pp , Jul [33] R. T. Verrillo and G. A. Gescheider, Enhancement and summation in the perception of two successive vibrotactile stimuli, Perception & Psychophysics, vol. 8, no. 2, pp , Mar [34] A. L. Kaas, M. C. Stoeckel, and R. Goebel, The neural bases of haptic working memory. In Human Haptic Perception: Basics and Applications, Springer Verlag, 28. [35] H. Levitt, Transformed updown methods in psychoacoustics, J. Acoust. Soc. Am., vol. 49, no. 2B, pp , [Online]. Available: [36] M. Hansen, Lehre und Ausbildung in Psychoakustik mit psylab: freie Software fu r psychoakustische Experimente, Fortschritte der Akustik DAGA 6, pp , 26. [37] W. McMahan and K. 
Kuchenbecker, Spectral subtraction of robot motion noise for improved event detection in tactile acceleration signals, in Haptics: Perception, Devices, Mobility, and Communication, ser. Lecture 12 Notes in Computer Science, P. Isokoski and J. Springare, Eds. Springer Berlin Heidelberg, 212, vol. 7282, pp Rahul Chaudhari received a master s degree in Communication Systems from the Technische Universita t Mu nchen (TUM), Germany in 29. His master s thesis focused on haptic signal processing in particular, compression/reduction of data for kinesthetic haptic communication. He received an undergraduate degree (B. Eng.) in Electronics and Telecommunications from the University of Pune, India, graduating in June 26 as the top student in class. In January 21, he joined the Chair of Media Technology at TUM, where he is now working as a member of the research and teaching staff. His current research focuses on perceptually transparent compression of haptic (vibrotactile) texture signals, and the objective evaluation of the perceptual quality of compressed haptic signals. Clemens Schuwerk studied electrical engineering at the Technische Universita t Mu nchen (TUM), Germany, and the University of Edinburgh, Scotland. He received the degree Dipl.-Ing. (Univ) in Electrical Engineering from TUM in November 21. He joined the Chair of Media Technology, TUM, in March 211, where he is currently a member of the research staff. His current research interests lie in the area of haptic data communication and signal processing, especially in Shared Haptic Virtual Environments. Mojtaba Danaei received a master s degree in Communication Systems from the Technische Universita t Mu nchen in 213. For his master s thesis, he worked on the topic of optimization of a haptic texture codec. He obtained a B.Sc. degree in Electrical and Electronics Engineering from the Eastern Mediterranean University in 211. 

Eckehard Steinbach (SM'08) studied electrical engineering at the University of Karlsruhe, Karlsruhe, Germany, the University of Essex, Essex, U.K., and ESIEE, Paris, France. He received the Engineering Doctorate from the University of Erlangen-Nuremberg, Germany. From 1994 to 2000, he was a member of the research staff of the Image Communication Group, University of Erlangen-Nuremberg. From February 2000 to December 2001, he was a Postdoctoral Fellow with the Information Systems Laboratory, Stanford University, Stanford, CA. In February 2002, he joined the Department of Electrical Engineering and Information Technology, Technische Universität München, Munich, Germany, where he is currently a Full Professor for Media Technology. His research interests are in the area of audiovisual-haptic information processing and communication as well as networked and interactive multimedia systems.

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
