Watermarked Movie Soundtrack Finds the Position of the Camcorder in a Theater

Yuta Nakashima, Ryuki Tachibana, and Noboru Babaguchi, Senior Member, IEEE

Abstract—In recent years, the problem of camcorder piracy in theaters has become more serious due to technical advances in camcorders. In this paper, as a new deterrent to camcorder piracy, we propose a system for estimating the recording position from which a camcorder recording is made. The system is based on spread-spectrum audio watermarking of the multichannel movie soundtrack. It utilizes a stochastic model of the detection strength, which is calculated in the watermark detection process. Our experimental results show that the system estimates recording positions in an actual theater with a mean estimation error of 0.44 m. The results of our MUSHRA subjective listening tests show that the method does not significantly spoil the subjective acoustic quality of the soundtrack. These results indicate that the proposed system is applicable for practical use.

Index Terms—Audio watermarking, recording position estimation, movie soundtrack, prevention of movie piracy

Fig. 1. A scenario for identifying a pirate.

I. INTRODUCTION

CAMCORDER piracy in theaters is movie theft by persons who bring a camcorder into a theater and record a movie from the screen. Recently, camcorder piracy has become a serious problem due to technical advances in camcorders. The Motion Picture Association claims that the annual loss caused by pirated movies is 6.1 billion dollars, and that over 90% of the pirated copies of newly released titles are illegal recordings made by camcorder piracy [1], [2].

Camcorder piracy in theaters is explicitly banned by law in many countries. For instance, in the United States, the Family Entertainment and Copyright Act, which became law in 2005, bans the use of recording devices in theaters. The law also imposes a strict penalty on any person who makes pre-release works (not only movies) publicly available. In Japan, in response to the significant loss of box-office revenues, an anti-camcorder law has been enforced since 2007. This law prohibits recording movies even for private use, which was permitted by the previous copyright law. The law also encourages the movie industry to prevent any person from making illegal recordings.

As a deterrent against camcorder piracy in theaters, several watermarking techniques have been proposed [3], [4], [5], [6], [7]. The main idea of these techniques is to embed a secret message into the movie, where the message indicates where and when the movie was shown. If movies are pirated and the illegal recordings are made available via the Internet or some other route, then the secret message can be extracted to determine where and when the illegal recordings were made. This sort of technique is very effective since it can help to specify the theater and showtime at which the illegal recordings were made, for further surveillance. However, the previously proposed techniques cannot identify the pirate who made the illegal recordings.

Manuscript received xxx xx, 20xx; revised xxx xx, 20xx. This work was partly supported by a research grant of the Okawa Foundation. Y. Nakashima and N. Babaguchi are with the Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka, Japan, {nakashima, babaguchi}@nanase.comm.eng.osaka-u.ac.jp. R. Tachibana is with the Tokyo Research Laboratory, IBM Japan, Shimotsuruma, Yamato, Kanagawa, Japan, ryuki@jp.ibm.com.
We consider the following scenario for identifying the pirate: (1) The pirate illegally records watermarked movies and uploads the illegal recordings to the Internet. (2) A conventional watermarking system such as [5] finds the illegal recordings on the Internet and analyzes the embedded message to determine the theater and the showtime at which the illegal recordings were made. (3) The position estimation system estimates the position in the theater where the pirate was, precisely enough to specify the seat. (4) A person identification system identifies the pirate by matching the seat to the person who occupied it. A ticketing system or a video surveillance system may be used as the person identification system. This scenario is illustrated in Fig. 1. This paper focuses on the position estimation system enclosed by the thick lines in Fig. 1, which is a key component of this scenario.

The position estimation system uses an audio watermark signal embedded into the movie soundtrack to estimate the recording position. It is not easy to embed audio watermark signals into movie soundtracks; in fact, most of the watermarking methods that have been proposed for movies are video watermarking methods. This difficulty comes from the nature of movie soundtracks. They are composed of several types of audio such as music, sound effects, voice, and silent portions. In the voice and silent portions, which seem to dominate large parts of a soundtrack, the watermark embedder cannot embed a strong watermark signal without degrading the acoustic quality. We call this the sparseness problem of movie soundtracks. However, we can overcome this problem by maximum-likelihood analysis using the entire recorded signal, and achieve precise recording position estimation by watermarking the multiple-channel soundtrack.

Fig. 2. An overview of the proposed position estimation system.

An overview of our system is shown in Fig. 2. We now explain how the position estimation system works. We call each channel of the soundtrack a host signal (HS). The watermark signal for each HS is generated using a spread-spectrum (SS) technique with a different SS code. The watermark embedder generates a watermark signal for each HS and adds it to the HS to generate a watermarked host signal (WHS). Each WHS is emitted into the air from a separate loudspeaker. If the movie is recorded with a camcorder, the monaural recorded signal (RS) of the audio will be a mixture of all of the WHSs. In the RS, the signal from each loudspeaker is delayed in proportion to the distance from that loudspeaker to the microphone of the camcorder. Our main idea is to utilize these delays for the position estimation. The watermark detector calculates detection strengths, which are defined as the correlations between the SS codes and the RS. Therefore, the detection strength of each watermark signal will have a peak at a particular time, dependent on the delay times. Taking this into account, we construct a stochastic model of the detection strength. The system calculates the probability of obtaining the detection strengths based on the model and finds an optimal recording position from this probability using maximum-likelihood analysis.

The main contributions of this paper are as follows. We demonstrate that digital watermarking of multiple-channel audio signals can be used for finding recording positions precisely enough to specify a particular seat in a large auditorium; this is a brand-new application of the digital watermarking technique. We present a recording position estimation method that is usable even for sparse movie soundtracks; the problem of unreliable watermark signals in the silent portions of the movie soundtracks is addressed by a one-step approach utilizing the detection strength model and the entire RS. We also present the results of subjective listening tests assessing the acoustic quality of the watermarked multichannel movie soundtracks; as far as we know, this is the first effort to assess the acoustic quality of audio watermarking in an environment with more than two speakers.

The rest of this paper is organized as follows. In Section II, related work is introduced. Section III describes our watermarking algorithm, and Section IV describes the position estimator. Experimental evaluations of our system and a discussion are given in Section V. We conclude this paper in Section VI.

II. RELATED WORK

A. Copy Prevention Using Watermarking Techniques

For music, Tachibana et al. [8] proposed sonic watermarking, which allows us to search for illegal recordings made available on the Internet by embedding a secret message into the audio signals.
The most distinctive characteristic of this sonic watermarking is that it is applicable even to unplugged live performances. For digital cinema, some studies reveal and classify the sources of pirated movies and assert the importance of copy prevention using digital watermarking techniques [3], [4]. As copy prevention methods, several watermarking techniques have been proposed. Haitsma et al. [5] developed a video watermarking method for detecting illegal recordings. Watermark detection from such illegal recordings is a very tough problem because of their geometric distortion. They overcame this problem by relying only on the time axis of the movie, and their system allows us to identify the theater, presentation time, and other characteristics. Another way to overcome the geometric distortion was proposed by Nguyen et al. [6], who cancel the geometric distortion by considering a model of the geometric deformations that occur according to the positions of the projector and the camcorder. Gohshi et al. [9] presented a watermarking method designed to detect watermarks in illegally recorded footage made from CRT screens. Manually canceling the geometric distortion of the footage, they achieved accurate detection. Lubin et al. [7] proposed a video watermarking method aimed at digital cinema applications. This method includes a scheme to cancel the geometric deformations. All of these methods can identify illegal recordings made available on the Internet, and are effective in deterring camcorder piracy. However, they are not able to specify the recording locations where the illegal recordings were made.

B. Digital Watermarking Algorithms for Audio Signals

There are many digital watermarking algorithms for audio signals. A watermarking algorithm that exploits a psychoacoustic model to maintain the inaudibility of the watermark signal was presented by Swanson et al. [10]. Their psychoacoustic model takes the temporal and frequency masking effects of the human auditory system (HAS) into account. The watermarking algorithm proposed by Kirovski and Malvar uses an SS technique [11]. They improved the robustness against distortion by arranging the SS code on the time-frequency plane of the HS. For watermarking algorithms that use SS techniques, desynchronization attacks are a serious problem because they make watermark detection impossible; Kirovski and Malvar overcame this problem by searching exhaustively for the synchronization position. The algorithm proposed by Tachibana et al. [12] also uses the time-frequency plane of the HS to embed the watermark signal. Another interesting algorithm, called echo hiding, was presented by Gruhl et al. [13]. Echo hiding embeds a watermark by adding an echo to the HS. The inaudibility of the watermark relies on the temporal masking effect of the HAS. However, since this algorithm uses only one echo to embed a watermark, anyone can detect the watermark, and the algorithm is not capable of embedding multiple watermarks. To overcome these problems, Ko et al. [14] proposed a time-spread echo method, which spreads the echoes in the time domain with a pseudo-random sequence.

Fig. 3. (a) A pattern block consisting of W_B × H_B tiles. (b) A tile comprised of the H_T amplitude spectra of two consecutive frames. (c) Repeated pattern blocks on the time-frequency plane of an HS.

C. Position Estimation Using Information Hiding

For position estimation using information hiding techniques, only a few methods have been proposed. Lazic and Aarabi [15] presented a data hiding method that uses an audio signal as a communication channel between a loudspeaker and a microphone, and they applied the method to a position estimation system. Their position estimation system exploits a property of their SS-based data hiding method: the detection strength decreases with the distance between the loudspeaker and the microphone. They reported that their system is able to specify the loudspeaker nearest to the microphone. However, since this is done based only on a comparison among the detection strengths of the watermarked signals from the loudspeakers, it cannot give the precise position of the recording. Nakashima et al. [16], [17], [18] proposed a position estimation system using an audio watermarking technique to specify the recording position of an illegal recording. This system uses delays of the watermark signals embedded in a multi-channel piece of music and is able to estimate the recording position with a mean estimation error of 1.21 m in a 6 × 6 m² room. Our position estimation system is based on [18] and is extended to be applicable to movie soundtracks, by using a stochastic model of the detection strength, so that the system can be used as a deterrent to camcorder piracy.

The existing method [18] has a problem when it is applied to sparse soundtracks: it often fails to estimate the recording position from such soundtracks because it takes a two-step approach, first calculating the delays of the watermark signals and then estimating the recording position. For accurate position estimation, the delays of at least two channels are required.
This is a tough condition for soundtracks, since the insufficient energy of the watermark signal in the silent portions causes a large error in the delay calculation, and hence the position estimation resulted in a large error. For this reason, the system of [18] targeted only multiple-channel music pieces. In contrast, the proposed method employs a one-step approach utilizing the detection strength model and the whole RS. This allows accurate estimation of the recording positions even for movie soundtracks.

III. WATERMARKING ALGORITHM

Our algorithm is based on [12], which can detect watermark signals in recorded signals, and we modify [12] to improve the estimation accuracy. In this section, we describe the basic concepts of the watermark embedding and the watermark detection, and then describe them in more detail.

A. Basic Concepts

1) Pattern Block and Tile: The watermark embedder constructs the time-frequency plane of the HS by using the discrete Fourier transform (DFT), and modifies the amplitudes of segmented areas called pattern blocks, as shown in Fig. 3 (a). A pattern block has W_B × H_B tiles, each of which consists of the H_T amplitude spectra of two consecutive DFT frames. The tile in the wth column and hth row is referred to as the tile at (w, h).

2) Pseudo-Random Array: The amplitude spectra in each tile are modified according to the pseudo-random number in {+1, -1} assigned to the tile. The pseudo-random numbers of the tiles in a pattern block form a two-dimensional pseudo-random array (PRA), as shown in Fig. 4 (a). The pseudo-random number for the tile at (w, h) is denoted by ω(w, h).

3) Multiple Watermark Detection: For the recording position estimation, we need to detect multiple watermark signals in an RS. This is achieved by using a different PRA for the watermark signal of each channel of the soundtrack. The value of ω(w, h) for the watermark signal of the cth channel (c = 1, 2, ..., N_C) is denoted by ω_c(w, h).
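As a minimal illustration of Sections III-A.2 and III-A.3, the sketch below generates one such ±1 pseudo-random array per channel. The seeds, sizes, and helper name are our own illustrative choices; the paper additionally selects PRAs with low mutual cross-correlation (Section V-A), which this sketch omits.

```python
import numpy as np

def generate_pra(seed: int, w_b: int = 20, h_b: int = 24) -> np.ndarray:
    """Generate a W_B x H_B pseudo-random array of {+1, -1} values.

    Each channel is given a different seed (i.e., a different SS code),
    so that its watermark can be detected independently in the monaural
    recorded signal.
    """
    rng = np.random.default_rng(seed)
    return rng.choice([+1, -1], size=(w_b, h_b))

# One PRA per soundtrack channel (N_C = 3 in the experiments of Section V).
pras = {c: generate_pra(seed=c) for c in range(1, 4)}
print(pras[1].shape)  # (20, 24): W_B tile columns by H_B tile rows
```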

Fig. 4. (a) Pseudo-random numbers assigned to the tiles in a pattern block. The pseudo-random numbers form a PRA. (b) A tile assigned with +1. In this case, in the two consecutive frames of the tile, the amplitudes in the first frame are increased (represented by +) and those in the second frame are decreased (represented by -).

4) Fine Detection: Since the position estimation system requires accurate delay times, we need to calculate the detection strengths at a fine resolution. We call this fine detection. To achieve the fine detection, the detection strength, which is basically a normalized cross-correlation between the RS and each PRA, is repeatedly calculated while shifting the PRA by Δ samples. The detection shift Δ determines the resolution of the detection strengths. A sufficiently small Δ should be used so as not to spoil the sharpness of the peaks of the detection strengths.

5) Modulus Operator: A modulus operator and the pseudo-random number assigned to a tile determine how the amplitude spectra in the tile are modified. The modulus operator m is defined by

m = (m_0, m_1) = (+1, -1).   (1)

The signs of the amplitude modifications for the first and the second frames in the tile at (w, h) of the cth channel are determined by ω_c(w, h)·m_0 and ω_c(w, h)·m_1, respectively. This means that ω_c(w, h) = +1 increases the amplitude spectra in the first frame of the tile and decreases the spectra in the second frame. In the opposite case, ω_c(w, h) = -1 decreases the amplitude spectra in the first frame and increases the spectra in the second frame. Figure 4 (b) shows a tile which is assigned +1. In the watermark detection, taking the difference of the adjacent frames in a tile enhances the watermark signal and reduces the influence of the HS. In [12], the modulus operator is defined by

m = (m_0, m_1, m_2, m_3) = (+1, +1, -1, -1),   (2)

and a tile consists of four consecutive frames so that the watermarks can be detected even when the starting positions of the frames in the WHS and in the RS are different. In other words, modifying the amplitudes of two consecutive frames with the same sign broadens the peaks of the detection strengths, enabling us to detect watermarks without knowing the exact starting position of the PRA. However, since the broadened peak degrades the accuracy of the delay times, we use the modulus operator defined by (1).

6) Psychoacoustic Model: To make the watermark signals inaudible, we use a psychoacoustic model to decide the amount of the amplitude modifications. There are several kinds of psychoacoustic effects of the human auditory system, such as the absolute threshold of hearing, the temporal masking, and the frequency masking [19]. We use the ISO/MPEG-1 audio psychoacoustic model 2 for Layer 3 [20] as the basis of our psychoacoustic model, and alter it as described in [8].

B. Watermark Embedder

The watermark embedder generates a WHS. The energy of the watermark signal is spread over the pattern block using the PRA. The WHS for the cth channel, y_c(t), is generated by the following steps.

1) The HS in the time domain, x_c(t), is divided into frames, each of which consists of N samples, using the sine window. Adjacent frames overlap each other by N/2 samples to avoid discontinuities. The tth sample of the fth frame is

x̃_c(f, t) = x_c(t + fN/2)·win(t),   (3)

where win(t) is the sine window defined as

win(t) = sin(πt/N) for 0 ≤ t ≤ N - 1.   (4)

2) The frames are transformed into the frequency domain using the DFT.
The kth complex spectrum of the fth frame, X_c(f, k), is obtained as

X_c(f, k) = DFT[x̃_c(f, t)](k).   (5)

The amplitude spectrum, X_A^c(f, k), and the phase spectrum, X_P^c(f, k), are given by

X_A^c(f, k) = |X_c(f, k)|,   (6)
X_P^c(f, k) = arg X_c(f, k).   (7)

3) The psychoacoustic model determines the inaudible amount of amplitude modification, A_c(f, k).

4) The amplitude modification sign, Sign_c(f, k), for an amplitude spectrum in the tile at (w, h) is calculated as

Sign_c(f, k) = ω_c(w, h)·m_(f mod 2).   (8)

5) The amplitude spectrum of the WHS, Y_A^c(f, k), is obtained as

Y_A^c(f, k) = X_A^c(f, k) + α·A_c(f, k)·Sign_c(f, k),   (9)

where α is the watermarking rate, which controls the trade-off between the acoustic quality of the WHS and the position estimation accuracy.

6) The time-domain representation of the WHS in each frame is constructed with the inverse DFT (IDFT), using the original phases of the HS:

ỹ_c(f, t) = IDFT[Y_A^c(f, k)·exp{√(-1)·X_P^c(f, k)}](t).   (10)

7) The final WHS in the time domain, y_c(t), is generated by the overlap-and-add technique using the sine window as follows:

y_c(t) = Σ_{f=0}^{F-1} ỹ_c(f, t - fN/2)·win(t - fN/2),   (11)

where F is the number of frames in the HS.
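A minimal sketch of embedding steps 1)-7) is given below, assuming that the host channel x is a 1-D NumPy array, that the inaudible modification amounts A_c(f, k) from the psychoacoustic model are passed in as a precomputed array A, and that the channel's PRA is given. The function name, the frame-to-tile mapping, and the clipping of negative amplitudes are our own reading of the algorithm, so treat this as an approximation rather than the authors' implementation.

```python
import numpy as np

def embed_channel(x, pra, A, alpha=0.1, N=512, H_T=6):
    """Embed a spread-spectrum watermark into one host channel.

    x     : host signal x_c(t), 1-D float array
    pra   : (W_B, H_B) array of +/-1, one pseudo-random number per tile
    A     : inaudible modification amounts A_c(f, k), shape (frames, N//2 + 1)
    alpha : watermarking rate (quality vs. estimation accuracy trade-off)
    """
    W_B, H_B = pra.shape
    win = np.sin(np.pi * np.arange(N) / N)          # sine window, eq. (4)
    modulus = (+1, -1)                              # modulus operator, eq. (1)
    hop = N // 2
    F = (len(x) - N) // hop + 1                     # number of frames
    y = np.zeros(len(x))

    for f in range(F):
        frame = x[f * hop : f * hop + N] * win      # eq. (3)
        X = np.fft.rfft(frame)                      # eq. (5)
        amp, phase = np.abs(X), np.angle(X)         # eqs. (6) and (7)
        w = (f % (2 * W_B)) // 2                    # tile column of this frame
        for h in range(H_B):                        # tile rows along frequency
            band = slice(h * H_T, (h + 1) * H_T)
            sign = pra[w, h] * modulus[f % 2]       # eq. (8)
            # eq. (9), clipped at zero to keep amplitudes nonnegative
            amp[band] = np.maximum(amp[band] + alpha * A[f, band] * sign, 0.0)
        wm_frame = np.fft.irfft(amp * np.exp(1j * phase), n=N)   # eq. (10)
        y[f * hop : f * hop + N] += wm_frame * win  # eq. (11), overlap-and-add
    return y
```

With the parameters of Table III (N = 512, W_B = 20, H_B = 24, H_T = 6), one pattern block spans W_B·N = 10,240 samples in time and the lowest H_B·H_T = 144 frequency bins of each frame.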

Fig. 5. Frames in the watermark detection.

C. Watermark Detector

The watermark detector calculates the detection strengths using the fine detection; it detects multiple watermark signals with different SS codes and gives the detection strengths at a fine resolution. The detection strength of the cth channel with an i-sample time delay, s_c(i), is calculated from the RS by the following steps.

1) The RS, z(t), is divided into frames by the sine window. Each frame is comprised of N samples and overlaps with its neighbors by N/2 samples. The first frame starts at the ith sample, as shown in Fig. 5. That is,

z̃_i(f, t) = z(t + i + fN/2)·win(t).   (12)

2) The frames are transformed into the frequency domain by the DFT. The kth amplitude spectrum of the fth frame, Z_i(f, k), is computed by

Z_i(f, k) = |DFT[z̃_i(f, t)](k)|.   (13)

3) The amplitudes are normalized as

Z̄_i(f, k) = Z_i(f, k) / [(1/(N/2)) Σ_{k=0}^{N/2-1} Z_i(f, k)].   (14)

4) The difference between the logarithmic amplitudes of the two frames in the tile at (w, h), D_i(w, k), is calculated as

D_i(w, k) = log Z̄_i(2w, k) - log Z̄_i(2w + 1, k).   (15)

This alleviates the influence of the HS, because the amplitudes of the consecutive frames have close values, while the watermark signal is enhanced by the modulus operator.

5) The amplitude of the tile at (w, h), ρ_i(w, h), is given by

ρ_i(w, h) = Σ_k D_i(w, k),   (16)

where the summation is computed over the k included in the tile at (w, h).

6) The ith detection strength of the cth channel, s_c(i), is calculated as

s_c(i) = [Σ_{w=0}^{W_B-1} Σ_{h=0}^{H_B-1} ω_c(w, h)·(ρ_i(w, h) - ρ̄_i)] / sqrt(Σ_{w=0}^{W_B-1} Σ_{h=0}^{H_B-1} {ω_c(w, h)·(ρ_i(w, h) - ρ̄_i)}²),   (17)

where

ρ̄_i = (1/(W_B·H_B)) Σ_{w=0}^{W_B-1} Σ_{h=0}^{H_B-1} ρ_i(w, h).   (18)

From the central limit theorem, s_c(i) follows a normal distribution. If the RS is not watermarked, since the standard deviation of the numerator of (17) is given by the denominator, s_c(i) asymptotically follows the standard normal distribution.

Fig. 6. (a) An RS containing multiple watermark signals. The pseudo-random number assigned to each tile in the pattern blocks is also shown. (b) The detection strengths calculated from the RS. (c) The detection strength blocks.

IV. POSITION ESTIMATOR

In this section, we describe the maximum-likelihood position estimator in detail. An algorithm which reduces the computational cost of finding the maximum of the likelihood function is also presented.

A. Basic Concepts

1) Detection Strength Model: As described above, the detection strengths asymptotically follow the normal distribution, with unknown mean and variance. Hence, we model a detection strength as a random value which follows the normal distribution. We call this model the detection strength model. The mean and variance of the distribution are determined as follows. The watermark signal in an RS is shifted by a time delay proportional to the distance from the loudspeaker to the microphone. Therefore, when we view the sequence of detection strengths along the time axis, it forms a pattern with peaks corresponding to the time delays, as shown in Fig. 6 (b). Based on this, we assume that the mean of the distribution can be determined by a function of the time delay of the peaks. The variance is assumed to be 1 to maintain simplicity.
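As an illustration of detection steps (12)-(18), the sketch below computes the detection strength of one channel at a single shift i; the fine detection simply repeats it for i = 0, Δ, 2Δ, .... The function name and the small constants added before the division and the logarithm are our own, so this is a sketch under those assumptions rather than the authors' detector.

```python
import numpy as np

def detection_strength(z, pra, i, N=512, H_T=6):
    """Detection strength s_c(i) of one channel's watermark at shift i.

    z   : monaural recorded signal (1-D array)
    pra : (W_B, H_B) +/-1 pseudo-random array of the channel
    i   : starting sample of the first frame (fine-detection shift)
    """
    W_B, H_B = pra.shape
    win = np.sin(np.pi * np.arange(N) / N)
    hop = N // 2
    rho = np.zeros((W_B, H_B))

    for w in range(W_B):                       # one tile column = frames 2w and 2w+1
        log_amp = []
        for f in (2 * w, 2 * w + 1):
            frame = z[i + f * hop : i + f * hop + N] * win        # eq. (12)
            Z = np.abs(np.fft.rfft(frame))[: N // 2]              # eq. (13)
            Z = Z / (Z.mean() + 1e-12)                            # eq. (14)
            log_amp.append(np.log(Z + 1e-12))
        D = log_amp[0] - log_amp[1]                               # eq. (15)
        for h in range(H_B):
            rho[w, h] = D[h * H_T : (h + 1) * H_T].sum()          # eq. (16)

    v = pra * (rho - rho.mean())               # eq. (17) numerator terms, using eq. (18)
    return float(v.sum() / np.sqrt((v ** 2).sum() + 1e-12))       # eq. (17)
```

If z carries no watermark generated with this PRA, the returned value is approximately standard normal, which is exactly the property the detection strength model of this section builds on.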
Since the recording position theoretically determines the time delay, we can calculate the likelihood of the detection strengths given the recording position. The position estimator finds the recording position which maximizes this likelihood.

2) Fast Maximization Using an Upper Bound: Our position estimation is a maximization problem of the likelihood function. Since deriving an analytical solution of the maximization is too difficult or even impossible, we must maximize the likelihood function by exhaustively searching for the best value from a set of possible parameter values, which is computationally expensive. To reduce the computational cost, we introduce pruning using an upper bound of the likelihood function. If the upper bound is lower than the maximum value that has been obtained so far, we need not search further with that value of the parameter.

B. Derivation of the Detection Strength Model

We model the detection strength by a normal distribution whose mean is determined by the recording position and the recording conditions. In this section, we determine the mean of the distribution from the position and shape of the peaks. The shape of a peak is determined by the watermarking algorithm, the recording conditions, and the HS. For the fine detection, the correlation of the PRA and the RS, s_c(i), is calculated every Δ samples in the RS; Δ must be smaller than the length of a tile, N. Therefore, strong correlation values are obtained not only at the exact starting position of the pattern block, but also around that time position, as in Fig. 6 (b). Furthermore, the recording conditions (i.e., the volume, the bandwidth of the recording device, and noise) and the HS affect the shape; these factors mainly alter the height of the peak.

Taking these into account, we compute the averaged shape of the detection strength peak as follows. First, a watermark signal with a single pattern block is generated using a PRA, and the watermark detector is applied to the signal. In the calculation of (17), since the pattern block is arranged repeatedly in the actual embedding process, we assume that the signal is periodic, and the detection strength is calculated for i = 0 to I - 1, where I is the repetition period. Since the pattern block in the signal starts at the beginning of the signal, the peak is at i = 0. This process is repeated using different PRAs. Then, for each i, the average of the detection strengths over the PRAs is calculated. We denote this averaged shape of the detection strength peak by g(i). Using g(i) and assuming that a pattern block starts at i = i′, we obtain the mean of the distribution as

μ(β, i′) = β·g(i - i′),   (19)

where β is a parameter which determines the height of the peak, dependent on the recording conditions and the HS.

As mentioned in Section III-C, the variance of the detection strength is asymptotically 1 for time positions i away from the peaks, because the mean of the numerator of (17) is 0 and thus the denominator can serve as a sample standard deviation of the numerator. On the other hand, the variance is not 1 for time positions close to the peaks, because the watermark signal shifts the mean of the distribution of the numerator. However, we ignore this to maintain simplicity.

C. Position of the Peak

We formulate the relationship between the recording position and the peaks of the detection strengths as follows.
We calculate the relative time delay of the peak time position of the cth channel, relative to a reference channel r, theoretically. Let x_m and x_sp^c denote the recording position and the position of the loudspeaker for the cth channel, respectively. The relative time delay of the cth channel is given, as a function of x_m, by

ι_c(x_m) = F_S·(‖x_sp^c - x_m‖ - ‖x_sp^r - x_m‖)/V_S,   (20)

where F_S is the sampling frequency and V_S is the speed of sound. From this equation, the time position of the peak of the cth channel is given as i_c(x_m, i_r) = i_r + ι_c(x_m), where i_r is the time position of the peak of the reference channel. To simplify the notation, we omit the argument (x_m, i_r), which is common for every c, unless it is ambiguous.

D. Derivation of the Position Estimator

Since the PRA is arranged repeatedly on the time-frequency plane, the detection strengths form a peak at the beginning of each pattern block. We segment the detection strengths into J detection strength blocks, as shown in Fig. 6 (c), so that each detection strength block contains a single peak. The length of a detection strength block is equal to that of a pattern block, I. A pattern block consists of W_B·N samples: there are W_B tiles in each row, and each tile occupies two consecutive frames overlapping each other by N/2 samples. Since the detection strength is calculated every Δ samples, the length of a detection strength block is I = W_B·N/Δ. The jth detection strength block of the cth channel, o_j^c, is represented as

o_j^c = (o_{j,0}^c, o_{j,1}^c, ..., o_{j,I-1}^c),   (21)

where o_{j,i}^c = s_c(jI + i). The detection strengths of the cth channel are denoted as

O^c = {o_0^c, o_1^c, ..., o_{J-1}^c}.   (22)

The value of J depends on the duration of the RS.

Now, we derive the maximum-likelihood estimator of the recording position. First, we calculate the probability of O = {O^1, O^2, ..., O^{N_C}}. Since the peak in o_j^c is at i_c, o_{j,i}^c follows the normal distribution N(μ(β_j^c, i_c), 1), where β_j^c is the height of the peak. Therefore, the conditional probability of o_{j,i}^c is

Pr[o_{j,i}^c | x_m, i_r, β_j^c] = (1/√(2π))·exp[-{o_{j,i}^c - μ(β_j^c, i_c)}²/2].   (23)

Thus, the conditional probability of O is given as

Pr[O | Θ] = Π_{c=1}^{N_C} Pr[O^c | Θ] = Π_{c=1}^{N_C} Π_{j=0}^{J-1} Pr[o_j^c | Θ]   (24)
          = Π_{c=1}^{N_C} Π_{j=0}^{J-1} Π_{i=0}^{I-1} Pr[o_{j,i}^c | Θ],   (25)
where Θ = {x_m, i_r, B} and B = {β_j^c | c = 1, 2, ..., N_C; j = 0, 1, ..., J - 1}. We define the log-likelihood function, L(Θ), as

L(Θ) = -Σ_{c=1}^{N_C} Σ_{j=0}^{J-1} Σ_{i=0}^{I-1} {o_{j,i}^c - μ(β_j^c, i_c)}²/2.   (26)

Eliminating β_j^c by setting ∂L(Θ)/∂β_j^c = 0, and ignoring the irrelevant terms, we obtain the following maximization criterion equivalent to (26):

L′(Θ′) = Σ_{c=1}^{N_C} Σ_{j=0}^{J-1} [Σ_{i=0}^{I-1} o_{j,i}^c·g(i - i_c)]²,   (27)

where Θ′ = {x_m, i_r}. The recording position is estimated by finding the parameters which maximize this criterion. That is, the maximum-likelihood estimator of Θ′ is

Θ̂′ = arg max_{Θ′} L′(Θ′),   (28)

and its element x̂_m is the maximum-likelihood estimator of x_m. The simplest solution for this maximization problem is an exhaustive search over the set of possible values of Θ′.

E. Maximization Algorithm to Reduce the Computational Cost

Finding the maximum of L′(Θ′) by exhaustive search is computationally too expensive, since the parameter space is three-dimensional when x_m is two-dimensional, and each possible Θ′ requires the calculation of (27). In this section, we propose an algorithm which can drastically reduce the computational cost by using an upper bound of L′(Θ′). We calculate the upper bound for each value of i_r. If the upper bound is lower than the maximum that has been obtained so far in the search, further search over x_m for that i_r is unnecessary.

We define the following function:

λ_c(i_c) = Σ_{j=0}^{J-1} [Σ_{i=0}^{I-1} o_{j,i}^c·g(i - i_c)]².   (29)

Since i_c can be calculated from Θ′, the maximization criterion (27) can be rewritten as

L′(Θ′) = Σ_{c=1}^{N_C} λ_c(i_c) = λ_r(i_r) + Σ_{c=1, c≠r}^{N_C} λ_c(i_c)   (30)

by separating out λ_r(i_r), which is irrelevant to x_m. In the exhaustive search, the summation on the rightmost side of the equation is maximized for each given i_r. That is,

max_{x_m} [λ_r(i_r) + Σ_{c≠r} λ_c(i_c)] = λ_r(i_r) + max_{x_m} Σ_{c≠r} λ_c(i_c).   (31)

Since the maximum of the last term is less than or equal to the sum of the maxima of λ_c(i_c), we obtain the following inequality:

max_{x_m} Σ_{c≠r} λ_c(i_c) ≤ Σ_{c≠r} max_{i_c} λ_c(i_c) = M.   (32)

The maximization of λ_c(i_c) is not too computationally expensive, since it is a maximization problem involving only one parameter, i_c. In other words, although i_c is determined by x_m and i_r, the maximization of λ_c(i_c) simply finds the best value of i_c regardless of x_m and i_r. Moreover, this maximization is done only once because it is irrelevant to the value of i_r. Thus, we obtain an upper bound u(i_r) of L′(Θ′) given i_r as

L′(Θ′) ≤ u(i_r) ≡ λ_r(i_r) + M.   (33)

Now that we have the upper bound, we can prune the search over x_m for a given i_r if u(i_r) is less than the maximum value that has already been computed for a different value of i_r. Figure 7 shows the maximization algorithm using the upper bound.

Fig. 7. Maximization algorithm:
  ĩ_c ← arg max_{i_c} λ_c(i_c) for each channel c
  r ← arg max_{c=1,...,N_C} λ_c(ĩ_c)
  current_maximum ← 0
  for i_r = ĩ_r to I - 1 and i_r = 0 to ĩ_r - 1 do
    if current_maximum < λ_r(i_r) + M then
      x̌_m ← search possible x_m exhaustively
      Θ̌′ ← {i_r, x̌_m}
      if L′(Θ̌′) > current_maximum then
        current_maximum ← L′(Θ̌′)
        Θ′_cand ← Θ̌′
      end if
    end if
  end for
  return Θ′_cand

This algorithm can drastically reduce the number of candidate i_r values, whereas the exhaustive search must find the x_m that maximizes L′(Θ′) for every i_r. Furthermore, the earlier we obtain large values, the more effective the pruning of the algorithm becomes.
Therefore, we choose the reference channel as

r = arg max_{c=1,...,N_C} λ_c(ĩ_c),   (34)

where

ĩ_c = arg max_{i_c} λ_c(i_c),   (35)

and the search begins from ĩ_r, where we can expect L′(Θ′) to be large.

V. EXPERIMENTAL RESULTS

To evaluate the estimation accuracy of our position estimation system, we conducted experiments in a circular auditorium with 250 seats. The effect of the watermarking rate α, which controls the volume of the watermark signals, on the estimation accuracy was investigated by simulation experiments. We also subjectively assessed the acoustic quality of the WHSs by using MUSHRA listening tests [21].
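As a concrete summary of the estimator in Section IV, the sketch below evaluates the relative-delay model of (20) and the criterion (27) over a grid of candidate positions, using the upper-bound pruning of Fig. 7. The sampling rate, loudspeaker coordinates, candidate grid, detection strength blocks O, and peak shape g are placeholder inputs of our own; this is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

F_S = 48000     # sampling frequency [Hz]; assumed value, not recovered from Table III
V_S = 340.0     # speed of sound [m/s], as in Table III

def relative_delay(x_m, spk, r, c):
    """Relative time delay iota_c(x_m) in samples, eq. (20)."""
    return F_S * (np.linalg.norm(spk[c] - x_m) - np.linalg.norm(spk[r] - x_m)) / V_S

def lam(c, i_c, O, g):
    """lambda_c(i_c) of eq. (29): per-block correlations with the peak shape g,
    shifted periodically to position i_c, squared and summed over the blocks."""
    I = O.shape[2]
    shifted = np.roll(g, int(round(i_c)) % I)
    return float(((O[c] * shifted).sum(axis=1) ** 2).sum())

def estimate_position(O, g, spk, grid):
    """Maximum-likelihood position search with upper-bound pruning (Fig. 7).

    O    : detection strength blocks, shape (N_C, J, I)
    g    : averaged detection strength peak shape, length I, peak at index 0
    spk  : loudspeaker positions, shape (N_C, 2)
    grid : iterable of candidate recording positions x_m (2-D points)
    """
    N_C, J, I = O.shape
    tilde_i = [int(np.argmax([lam(c, i, O, g) for i in range(I)])) for c in range(N_C)]
    r = int(np.argmax([lam(c, tilde_i[c], O, g) for c in range(N_C)]))   # eq. (34)
    M = sum(lam(c, tilde_i[c], O, g) for c in range(N_C) if c != r)      # eq. (32)

    best, best_pos = -np.inf, None
    order = list(range(tilde_i[r], I)) + list(range(0, tilde_i[r]))      # start near the peak
    for i_r in order:
        if lam(r, i_r, O, g) + M <= best:      # upper bound u(i_r), eq. (33): prune this i_r
            continue
        for x_m in grid:
            x_m = np.asarray(x_m, dtype=float)
            L = sum(lam(c, i_r + relative_delay(x_m, spk, r, c), O, g)
                    for c in range(N_C))       # criterion (27)/(30)
            if L > best:
                best, best_pos = L, x_m
    return best_pos
```

In the paper, this pruning reduced the execution time by 99.7% compared with the exhaustive search (Section V-B), without any loss of accuracy.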

TABLE I. THE TEST SAMPLES USED IN THE EXPERIMENTS (label, title, starting position, and per-channel RMS; DS1: Saw, DS2: Pretty Woman, DS3: The Bourne Identity, DS4: Harry Potter and the Goblet of Fire, DS5: RENT).

TABLE II. ROOT MEAN SQUARE (RMS) VALUES OF THE TEST SAMPLES (per channel, c = 1, 2, 3, in dB).

TABLE III. EXPERIMENTAL PARAMETERS.
  Number of tiles in a column of a pattern block  W_B   20
  Number of tiles in a row of a pattern block     H_B   24
  Height of a tile                                H_T   6
  Number of channels                              N_C   3
  Frame length [samples]                          N     512
  Detection shift [samples]                       Δ     16
  Sampling frequency [Hz]                         F_S
  Sound velocity [m/s]                            V_S   340

TABLE IV. ROOT MEAN SQUARE (RMS) VALUES OF THE WATERMARK SIGNALS (per channel, c = 1, 2, 3, in dB).

Fig. 8. The experimental environment for the estimation accuracy evaluation (loudspeaker and microphone positions in the x-y plane, in meters).

A. Estimation Accuracy Evaluation

To evaluate the estimation accuracy of our system in a semi-realistic environment, we conducted experiments in the Hankyu Sanwa Conference Hall in the Alumnus Union Building of the Osaka University Medical School. This is a circular auditorium with a radius of 8.8 m and 250 seats. Three loudspeakers and 16 microphones (represented by the dots in Fig. 8) were arranged in the same plane, as shown in Fig. 8. The experimental setup is shown in Fig. 9. We recorded the sound with all 16 microphones simultaneously. The volume of the two powered mixers was manually adjusted to be the same.

The test samples used in these experiments are listed in Table I. They are excerpts from the right (c = 1), center (c = 2), and left (c = 3) channels of the original movie soundtracks, and the starting positions were randomly chosen. The duration of each test sample is 1,800 seconds (30 minutes). The root mean square (RMS) values of each test sample are listed in Table II. The parameters used in the experiments are listed in Table III. To reduce the cross-correlation effects among the PRAs, we generated the PRAs for each test sample by exhaustively searching a set of PRAs for pairs that had low cross-correlation values. The watermarking rate α was set to 1.0. The RMS values of the watermark signals are listed in Table IV.

Figure 10 shows the estimation errors for each microphone position. Almost all microphone positions were accurately estimated, except for microphone positions (3, 4), (1, 4), and (-1, 4) for DS2, where the estimation errors were large. One of the reasons is that there were not enough watermark signals of the first and third channels in the RS to form peaks in the detection strengths: since the energy of the first and third channels of DS2 was low, the watermark embedder could not embed the watermark signals with sufficient energy. The directional characteristics of the loudspeakers and the distances from the loudspeakers to these microphones enhanced this energy imbalance. Furthermore, the effect of cross-correlation among the three PRAs enlarged the error.
If the first and second channels are correlated, the strong watermark signal of the second channel forms a false peak in the detection strength of the first channel, even if the correlation is weak. If the false peak is larger than the actual peak, the estimator cannot give the correct estimate. Therefore, in practical use, some technique that controls the volume of the watermark signals to balance the energy among the channels may be necessary.

The mean and standard deviation of the estimation errors over all of the microphone positions were 0.40 m and 1.33 m, respectively. Although the standard deviation is large due to the large errors of DS2, this is good enough to reduce the number of suspected seats to a few.

B. Watermarking Rate versus Estimation Accuracy

In the previous section, we showed that our system accurately estimated the recording positions for α = 1.0. However, the acoustic quality was heavily degraded because α was too large. To maintain the acoustic quality, the watermarking rate should be small so as to keep the energy of the watermark signals at a low level, which may cause larger estimation errors. We therefore investigated the relationship between α and the estimation errors by simulation experiments.

Fig. 9. The experimental setup (PCs, EDIROL UA-101 audio interfaces, audio-technica AT-MA2 microphone amplifiers, YAMAHA EMX66M and EMX312SC powered mixers, YAMAHA HS-50M loudspeakers, and SHURE SM63L microphones).

Fig. 10. Estimation errors of the experiment in the auditorium, per test sample (DS1-DS5) and microphone position.

Fig. 11. The relationship between the watermarking rate and the estimation error. The means and the standard deviations of the estimation errors are calculated for each α value.

First, we model the RS received by the microphone at position x as

z_x(t) = Σ_{c=1}^{N_C} y_c(t) * h_x^c(t) + n_B(t),   (36)

where y_c(t) is the WHS of the cth channel, h_x^c(t) is the impulse response of the path from the loudspeaker for the cth channel to the microphone at x, n_B(t) is the noise (including background noise and thermal noise), and * is the convolution operator. We measured the impulse responses h_x^c(t) by the time-stretched pulse method [22] under the same experimental setup as in Section V-A. The noise n_B(t) is assumed to follow the normal distribution N(0, σ_B²), and its variance σ_B² is determined from an RS recorded with no sound coming from the loudspeakers. Although the impulse response characterizes the linear aspects of the experimental system (i.e., the powered mixers, the loudspeakers, the microphones, and so forth), the experimental system actually has nonlinear aspects (e.g., amplifier clipping). However, the effect of the nonlinearity is small in general, and thus we consider that the difference between the detection strengths of the simulated RS and of the actually recorded RS can be neglected.

Applying (36) to the WHSs with various values of α, we generated simulated versions of the RSs. The other parameter values were the same as in Section V-A. The mean and standard deviation of the estimation errors were calculated for each α. The result is shown in Fig. 11. The mean and standard deviation of the estimation error for α = 1.0 are 0.41 m and 1.26 m, respectively. These are close enough to the result in Section V-A, so we consider the result of this simulation experiment to be reliable. The mean of the estimation errors was large for α < 0.1. Meanwhile, the microphone positions were estimated with small errors for α ≥ 0.1, although the standard deviations were relatively large due to the large estimation errors of DS2, as mentioned in Section V-A. The mean of the estimation error for α = 0.1 was 0.44 m. This result indicates that, in this experimental environment, the peaks of the detection strengths are buried in the noise for α < 0.1. In other words, we can reduce the value of α to as small as 0.1 without significant estimation errors. Note that the appropriate value of α may depend on the frequency response of the acoustic system of the auditorium, including the background noise.
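As a rough illustration of the simulation model (36), the sketch below convolves each WHS with a measured impulse response and adds Gaussian background noise. How the impulse responses and the noise variance are obtained is outside this snippet; they are simply assumed to be given, so this is a sketch of the model rather than the authors' measurement procedure.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_recording(whs, irs, sigma_b, seed=0):
    """Simulated RS at one microphone position, following eq. (36).

    whs     : list of N_C watermarked host signals y_c(t)
    irs     : list of N_C impulse responses h_x^c(t) for that position
    sigma_b : standard deviation of the background/thermal noise n_B(t)
    """
    length = max(len(y) + len(h) - 1 for y, h in zip(whs, irs))
    z = np.zeros(length)
    for y, h in zip(whs, irs):
        conv = fftconvolve(y, h)                 # y_c(t) * h_x^c(t)
        z[: len(conv)] += conv
    z += np.random.default_rng(seed).normal(0.0, sigma_b, size=length)  # n_B(t)
    return z
```

In the paper, the impulse responses were measured with the time-stretched pulse method and σ_B was estimated from a recording made with no sound from the loudspeakers; sweeping α in the embedder and feeding the resulting WHSs through this model corresponds to the simulation procedure behind Fig. 11.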
To show the effectiveness of the algorithm in reducing the computational cost, we also measured, in this experiment, the time needed to estimate the positions. A PC with an Intel Core 2 Duo processor running at 1.6 GHz, using Windows XP (Service Pack 2) with 1 GB of memory, was used in these experiments. The average time was 596 seconds to process a 1,800-second RS with three embedded watermark signals. For comparison, we also measured the time to estimate the positions with the exhaustive-search version. However, since this was very time consuming, the estimation was executed only twice; the average time for these two estimations was 179,573 seconds. Hence, the proposed algorithm achieved a 99.7% reduction in execution time compared to the exhaustive search, without any loss of accuracy.

TABLE V. THE SAMPLES USED IN THE SUBJECTIVE ASSESSMENT OF THE ACOUSTIC QUALITY.
  Label  Excerpt from  Starts at  Ends at
  SUB1   DS2           454 [s]    473 [s]
  SUB2   DS3           111 [s]    129 [s]
  SUB3   DS4           1,229 [s]  1,248 [s]
  SUB4   DS5           326 [s]    349 [s]
  SUB5   DS2           1,229 [s]  1,046 [s]

TABLE VI. THE DESCRIPTIONS OF THE TEST SIGNALS USED IN THE SUBJECTIVE ASSESSMENT OF THE ACOUSTIC QUALITY.
  Label  Description
  REF    Reference signal
  HREF   Hidden reference
  ALPF   Low-pass filtered signal as an anchor
  AM48   Compressed signal using MP3 at 48 kbps as an anchor
  AM32   Compressed signal using MP3 at 32 kbps as an anchor
  WR01   Watermarked signal with α = 0.1
  WR03   Watermarked signal with α = 0.3
  WR05   Watermarked signal with α = 0.5

C. Subjective Evaluation of Acoustic Quality

We subjectively assessed the acoustic quality of the WHSs by using MUSHRA listening tests [21]. MUSHRA is a method for assessing the acoustic quality of audio signals that have undergone some audio signal processing, such as encoding and decoding. A subject listens to multiple audio signals, including not only the signal processed by the system under test but also the original signal, called the hidden reference, and signals processed by other systems, called anchors, for comparison, and is required to grade all of the signals relative to one another.

In this assessment, we used the test samples listed in Table V, which are excerpts from the samples used in Section V-A. Each of the test samples was processed as described in Table VI. For each test sample, 17 inexperienced listeners, who underwent training sessions in which they were exposed in advance to all of the signals used in the tests, graded the processed signals.

Fig. 12. Listening position in the room for the (a) condition (loudspeakers at (0, 0), (3, 0), and (0, 3), and the listening position at (3, 3), in meters).

TABLE VII. SUMMARY OF THE CONDITIONS UNDER WHICH THE ACOUSTIC QUALITY WAS ASSESSED.
  Listening method   Loudspeaker   Headphone
  Office room        (a)           (b)
  Auditorium                       (c)

Since MUSHRA listening tests take a long time, we could not conduct the subjective listening tests in the auditorium with the loudspeakers. Instead, the subjects assessed the test signals under the following conditions.
(a) Assessment in a small office with three loudspeakers. The subjects were at the listening position corresponding to (3, 3) in a 6 × 6 m² office, as shown in Fig. 12, and assessed the test signals coming from the three loudspeakers.
(b) Assessment of signals simulating listening in the office, using headphones. The test signals were convolved with the impulse responses measured by a dummy head at the listening position in the same room as used for the (a) condition, and the subjects listened to the simulated signals with headphones.
(c) Assessment of signals simulating listening in the auditorium, using headphones.
This condition is almost the same as (b), but the impulse responses were measured at (0, 6) in the auditorium of Fig. 8.

These conditions are summarized in Table VII. Since the test signals for (b) were generated using the impulse responses measured in the same room as used for (a), the results of (a) and (b) should be similar. If this is satisfied, the results of (c) can be considered similar to those of a subjective assessment in which the subjects listen to the test signals from the loudspeakers in the auditorium.

Fig. 13. The means and the 95% confidence intervals for the acoustic quality of the test signals under (a) for all of the subjects.

Fig. 14. The means and the 95% confidence intervals for the acoustic quality of the test signals under (b) for all of the subjects.

Fig. 15. The means and the 95% confidence intervals for the acoustic quality of the test signals under (c) for all of the subjects.

The means and 95% confidence intervals for the acoustic quality of the test signals under (a) and (b) are shown in Figs. 13 and 14, respectively. The degradation of the acoustic quality for WR01 and WR03 was almost imperceptible, and that for WR05 was perceptible, though still acceptably low. We can say that the subjective acoustic qualities under (a) and (b) were almost the same. Therefore, the results under (c) should be similar to the results that would have been obtained had the subjects assessed the acoustic quality in the auditorium. Figure 15 shows the means and 95% confidence intervals for the acoustic quality of the test signals under (c). Although the watermark signals were relatively audible compared to (a) or (b), the subjective acoustic qualities of WR01 and WR03 were still good enough for practical use.

D. Discussion

From the results of Sections V-B and V-C, with α = 0.1, our system was able to estimate the recording position with a mean estimation error of 0.44 m while the subjective acoustic quality remained in the range of excellent quality. By increasing α to 0.3, the estimation error can be reduced to 0.34 m at the expense of acoustic quality degradation down to the range of good quality. Therefore, we successfully showed that the proposed system is able to estimate the recording position without significantly spoiling the acoustic quality of movie soundtracks. However, the difference between the results of (b) and (c) indicates that the acoustic quality depends largely on the environment in which the system is used. It is also supposed that the estimation accuracy depends on the frequency response of the auditorium, the background noise, and so forth. Hence, a preliminary experiment in the actual environment is needed before practical use to determine the appropriate value of α.

VI. CONCLUSION

In this paper, we have presented a position estimation system to prevent camcorder piracy in theaters as a new application of the audio watermarking technique. The core idea of our system is to utilize the delays of the multiple-channel watermark signals in a recorded signal. Our system consists of a watermarking algorithm and a position estimator. The presented watermarking algorithm is designed to obtain the delay times accurately. To implement the position estimation from recorded movie soundtracks, we have developed a position estimator using a stochastic model of the detection strengths. The long duration of the recorded movie soundtrack enables us to improve the estimation accuracy. Our experimental results showed that the system was able to estimate the recording position with a mean estimation error of 0.44 m without significantly spoiling the acoustic quality, as assessed by MUSHRA listening tests. However, the acoustic quality seemed to depend on the environment in which the system is used. To clarify the effect of environmental factors (i.e., the frequency response of the auditorium, the background noise, and so forth) on the acoustic quality and the estimation accuracy, we need more experiments in various environments. Furthermore, the robustness of our system against attacks such as pitch shifting and lossy compression should be investigated. Pitch shifting is especially serious since it may be caused by a slight difference between the sampling rates of the playback device and the recording device. A collusion attack is also a tough problem: our system may estimate an irrelevant position if two or more recordings made at different positions are mixed.
However, it would be possible to detect the collusion attack by examining the detection strength.

REFERENCES

[1] 2005 US Piracy Fact Sheet. Motion Picture Association of America. [Online]. Available:
[2] Anti-Piracy Fact Sheet, Asia-Pacific Region. Motion Picture Association. [Online]. Available:
[3] J. A. Bloom and C. Polyzois, "Watermarking to track motion picture theft," in Proc. of Signals, Systems and Computers, Conference Record of the Thirty-Eighth Asilomar Conference on, vol. 1, 2004.
[4] S. Byers, L. Cranor, E. Cronin, D. Kormann, and P. McDaniel, "Analysis of security vulnerabilities in the movie production and distribution process," Telecommunications Policy, vol. 28, no. 7-8, August-September.
[5] J. Haitsma and T. Kalker, "A watermarking scheme for digital cinema," in Proc. of the International Conference on Image Processing, vol. 2, October 2001.
[6] P. Nguyen, R. Balter, N. Montfort, and S. Baudry, "Registration methods for non-blind watermark detection in digital cinema applications," in Proc. of Security and Watermarking of Multimedia Contents V, SPIE vol. 5020, June 2003.
[7] J. Lubin, J. A. Bloom, and H. Cheng, "Robust, content-dependent, high-fidelity watermark for tracking in digital cinema," in Security and Watermarking of Multimedia Contents V, Proc. of SPIE, vol. 5020, January.
[8] R. Tachibana, "Sonic watermarking," EURASIP Journal on Applied Signal Processing, vol. 13.
[9] S. Gohshi, H. Nakamura, H. Ito, R. Fujii, M. Suzuki, S. Takai, and Y. Tani, "A new watermark surviving after re-shooting the images displayed on a screen," KES (2), vol. 3682.
[10] M. D. Swanson, B. Zhu, A. H. Tewfik, and L. Boney, "Robust audio watermarking using perceptual masking," Signal Processing, vol. 66.
[11] D. Kirovski and H. S. Malvar, "Spread-spectrum watermarking of audio signals," IEEE Transactions on Signal Processing, vol. 51, no. 4.
[12] R. Tachibana, S. Shimizu, S. Kobayashi, and T. Nakamura, "An audio watermarking method using a two-dimensional pseudo-random array," Signal Processing, vol. 82.
[13] D. Gruhl, A. Lu, and W. Bender, "Echo hiding," in Proc. of the First International Workshop on Information Hiding, vol. 1174, 1996.
[14] B.-S. Ko, R. Nishimura, and Y. Suzuki, "Time-spread echo method for digital audio watermarking," IEEE Transactions on Multimedia, vol. 7, no. 2, April.
[15] N. Lazic and P. Aarabi, "Communication over an acoustic channel using data hiding techniques," IEEE Transactions on Multimedia, vol. 8, no. 5.
[16] Y. Nakashima, R. Tachibana, M. Nishimura, and N. Babaguchi, "Estimation of recording location using audio watermarking," in Proc. of ACM Multimedia and Security Workshop 2006, Geneva, September 2006.
[17] Y. Nakashima, R. Tachibana, M. Nishimura, and N. Babaguchi, "Determining recording location based on synchronization positions of audio watermarking," in Proc. of the International Conference on Acoustics, Speech, and Signal Processing 2007, Hawaii, April 2007, pp. II-253 to II-256.
[18] Y. Nakashima, R. Tachibana, M. Nishimura, and N. Babaguchi, "Maximum-likelihood estimation of recording position based on audio watermarking," in Proc. of the Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing.
[19] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models. Springer-Verlag.
[20] Information technology - Coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s - Part 3: Audio, ISO/IEC Std., 1993.
[21] Method for the subjective assessment of intermediate quality levels of coding systems, ITU Std. BS.1534.
[22] Y. Suzuki, F. Asano, H.-Y. Kim, and T.
Sone, "An optimum computer-generated pulse signal suitable for the measurement of very long impulse responses," The Journal of the Acoustical Society of America, vol. 97, no. 2, February.

Yuta Nakashima received the B.E. and M.E. degrees in communication engineering from Osaka University, Osaka, Japan, in 2006 and 2008, respectively. He was with Texas Instruments Japan Limited, where he engaged in research and development on audio signal processing. He is currently pursuing the doctoral degree at Osaka University.

Ryuki Tachibana is a researcher at the Tokyo Research Laboratory of IBM Japan. He received his B.E. and M.E. degrees in aerospace engineering from the University of Tokyo, Japan, in 1996 and 1998, and his Dr. Eng. degree from Osaka University. Since he joined IBM Japan in 1998, his main research interests have been in the fields of digital audio watermarking and text-to-speech synthesis. In 2003, he was awarded the Digital Watermarking Industry Gathering Event's Best Paper Award at Security and Watermarking of Multimedia Contents V of Electronic Imaging. He is a member of the ASJ and the IEICE.

Noboru Babaguchi (M'90, SM'07) received the B.E., M.E., and Ph.D. degrees in communication engineering from Osaka University in 1979, 1981, and 1984, respectively. He is currently a Professor in the Department of Communication Engineering, Osaka University. From 1996 to 1997, he was a Visiting Scholar at the University of California, San Diego. His research interests include image analysis, multimedia computing, and intelligent systems, currently content-based video indexing and summarization. He has published over 100 journal and conference papers and several textbooks. Dr. Babaguchi received the Best Paper Award of the 2006 Pacific-Rim Conference on Multimedia (PCM2006). He is on the editorial boards of Multimedia Tools and Applications and New Generation Computing. He served as a Workshop Co-chair of the 3rd International Workshop on Multimedia Information Retrieval (MIR2001), a Track Co-chair of the 2006 IEEE International Conference on Multimedia & Expo (ICME2006), and a General Co-chair of the 14th International MultiMedia Modeling Conference (MMM2008). He has also served on the program committees of international conferences in these fields. He is a senior member of the IEEE, and a member of the ACM, the IEICE, the IPSJ, the ITE, and the JSAI.


More information

ROBUST echo cancellation requires a method for adjusting

ROBUST echo cancellation requires a method for adjusting 1030 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 3, MARCH 2007 On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk Jean-Marc Valin, Member,

More information

Omnidirectional Sound Source Tracking Based on Sequential Updating Histogram

Omnidirectional Sound Source Tracking Based on Sequential Updating Histogram Proceedings of APSIPA Annual Summit and Conference 5 6-9 December 5 Omnidirectional Sound Source Tracking Based on Sequential Updating Histogram Yusuke SHIIKI and Kenji SUYAMA School of Engineering, Tokyo

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 5: 12 Feb 2009. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY

WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY INTER-NOISE 216 WIND SPEED ESTIMATION AND WIND-INDUCED NOISE REDUCTION USING A 2-CHANNEL SMALL MICROPHONE ARRAY Shumpei SAKAI 1 ; Tetsuro MURAKAMI 2 ; Naoto SAKATA 3 ; Hirohumi NAKAJIMA 4 ; Kazuhiro NAKADAI

More information

Sound Processing Technologies for Realistic Sensations in Teleworking

Sound Processing Technologies for Realistic Sensations in Teleworking Sound Processing Technologies for Realistic Sensations in Teleworking Takashi Yazu Makoto Morito In an office environment we usually acquire a large amount of information without any particular effort

More information

Method to Improve Watermark Reliability. Adam Brickman. EE381K - Multidimensional Signal Processing. May 08, 2003 ABSTRACT

Method to Improve Watermark Reliability. Adam Brickman. EE381K - Multidimensional Signal Processing. May 08, 2003 ABSTRACT Method to Improve Watermark Reliability Adam Brickman EE381K - Multidimensional Signal Processing May 08, 2003 ABSTRACT This paper presents a methodology for increasing audio watermark robustness. The

More information

Multiple Sound Sources Localization Using Energetic Analysis Method

Multiple Sound Sources Localization Using Energetic Analysis Method VOL.3, NO.4, DECEMBER 1 Multiple Sound Sources Localization Using Energetic Analysis Method Hasan Khaddour, Jiří Schimmel Department of Telecommunications FEEC, Brno University of Technology Purkyňova

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb 2008. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum,

More information

Encoding a Hidden Digital Signature onto an Audio Signal Using Psychoacoustic Masking

Encoding a Hidden Digital Signature onto an Audio Signal Using Psychoacoustic Masking The 7th International Conference on Signal Processing Applications & Technology, Boston MA, pp. 476-480, 7-10 October 1996. Encoding a Hidden Digital Signature onto an Audio Signal Using Psychoacoustic

More information

High capacity robust audio watermarking scheme based on DWT transform

High capacity robust audio watermarking scheme based on DWT transform High capacity robust audio watermarking scheme based on DWT transform Davod Zangene * (Sama technical and vocational training college, Islamic Azad University, Mahshahr Branch, Mahshahr, Iran) davodzangene@mail.com

More information

SIGNALS AND SYSTEMS LABORATORY 13: Digital Communication

SIGNALS AND SYSTEMS LABORATORY 13: Digital Communication SIGNALS AND SYSTEMS LABORATORY 13: Digital Communication INTRODUCTION Digital Communication refers to the transmission of binary, or digital, information over analog channels. In this laboratory you will

More information

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution

Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,

More information

Live multi-track audio recording

Live multi-track audio recording Live multi-track audio recording Joao Luiz Azevedo de Carvalho EE522 Project - Spring 2007 - University of Southern California Abstract In live multi-track audio recording, each microphone perceives sound

More information

Audio Watermarking Based on Multiple Echoes Hiding for FM Radio

Audio Watermarking Based on Multiple Echoes Hiding for FM Radio INTERSPEECH 2014 Audio Watermarking Based on Multiple Echoes Hiding for FM Radio Xuejun Zhang, Xiang Xie Beijing Institute of Technology Zhangxuejun0910@163.com,xiexiang@bit.edu.cn Abstract An audio watermarking

More information

Lecture 3 Concepts for the Data Communications and Computer Interconnection

Lecture 3 Concepts for the Data Communications and Computer Interconnection Lecture 3 Concepts for the Data Communications and Computer Interconnection Aim: overview of existing methods and techniques Terms used: -Data entities conveying meaning (of information) -Signals data

More information

BEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor

BEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor BEAT DETECTION BY DYNAMIC PROGRAMMING Racquel Ivy Awuor University of Rochester Department of Electrical and Computer Engineering Rochester, NY 14627 rawuor@ur.rochester.edu ABSTRACT A beat is a salient

More information

Performance Evaluation of STBC-OFDM System for Wireless Communication

Performance Evaluation of STBC-OFDM System for Wireless Communication Performance Evaluation of STBC-OFDM System for Wireless Communication Apeksha Deshmukh, Prof. Dr. M. D. Kokate Department of E&TC, K.K.W.I.E.R. College, Nasik, apeksha19may@gmail.com Abstract In this paper

More information

THE STATISTICAL ANALYSIS OF AUDIO WATERMARKING USING THE DISCRETE WAVELETS TRANSFORM AND SINGULAR VALUE DECOMPOSITION

THE STATISTICAL ANALYSIS OF AUDIO WATERMARKING USING THE DISCRETE WAVELETS TRANSFORM AND SINGULAR VALUE DECOMPOSITION THE STATISTICAL ANALYSIS OF AUDIO WATERMARKING USING THE DISCRETE WAVELETS TRANSFORM AND SINGULAR VALUE DECOMPOSITION Mr. Jaykumar. S. Dhage Assistant Professor, Department of Computer Science & Engineering

More information

Background Dirty Paper Coding Codeword Binning Code construction Remaining problems. Information Hiding. Phil Regalia

Background Dirty Paper Coding Codeword Binning Code construction Remaining problems. Information Hiding. Phil Regalia Information Hiding Phil Regalia Department of Electrical Engineering and Computer Science Catholic University of America Washington, DC 20064 regalia@cua.edu Baltimore IEEE Signal Processing Society Chapter,

More information

THE problem of acoustic echo cancellation (AEC) was

THE problem of acoustic echo cancellation (AEC) was IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 6, NOVEMBER 2005 1231 Acoustic Echo Cancellation and Doubletalk Detection Using Estimated Loudspeaker Impulse Responses Per Åhgren Abstract

More information

Localized Robust Audio Watermarking in Regions of Interest

Localized Robust Audio Watermarking in Regions of Interest Localized Robust Audio Watermarking in Regions of Interest W Li; X Y Xue; X Q Li Department of Computer Science and Engineering University of Fudan, Shanghai 200433, P. R. China E-mail: weili_fd@yahoo.com

More information

Surround: The Current Technological Situation. David Griesinger Lexicon 3 Oak Park Bedford, MA

Surround: The Current Technological Situation. David Griesinger Lexicon 3 Oak Park Bedford, MA Surround: The Current Technological Situation David Griesinger Lexicon 3 Oak Park Bedford, MA 01730 www.world.std.com/~griesngr There are many open questions 1. What is surround sound 2. Who will listen

More information

DWT based high capacity audio watermarking

DWT based high capacity audio watermarking LETTER DWT based high capacity audio watermarking M. Fallahpour, student member and D. Megias Summary This letter suggests a novel high capacity robust audio watermarking algorithm by using the high frequency

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Lecture 9: Spread Spectrum Modulation Techniques

Lecture 9: Spread Spectrum Modulation Techniques Lecture 9: Spread Spectrum Modulation Techniques Spread spectrum (SS) modulation techniques employ a transmission bandwidth which is several orders of magnitude greater than the minimum required bandwidth

More information

Chapter 2: Digitization of Sound

Chapter 2: Digitization of Sound Chapter 2: Digitization of Sound Acoustics pressure waves are converted to electrical signals by use of a microphone. The output signal from the microphone is an analog signal, i.e., a continuous-valued

More information

ECMA-108. Measurement of Highfrequency. emitted by Information Technology and Telecommunications Equipment. 4 th Edition / December 2008

ECMA-108. Measurement of Highfrequency. emitted by Information Technology and Telecommunications Equipment. 4 th Edition / December 2008 ECMA-108 4 th Edition / December 2008 Measurement of Highfrequency Noise emitted by Information Technology and Telecommunications Equipment COPYRIGHT PROTECTED DOCUMENT Ecma International 2008 Standard

More information

Spread Spectrum Techniques

Spread Spectrum Techniques 0 Spread Spectrum Techniques Contents 1 1. Overview 2. Pseudonoise Sequences 3. Direct Sequence Spread Spectrum Systems 4. Frequency Hopping Systems 5. Synchronization 6. Applications 2 1. Overview Basic

More information

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback PURPOSE This lab will introduce you to the laboratory equipment and the software that allows you to link your computer to the hardware.

More information

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading ECE 476/ECE 501C/CS 513 - Wireless Communication Systems Winter 2005 Lecture 6: Fading Last lecture: Large scale propagation properties of wireless systems - slowly varying properties that depend primarily

More information

Perception of pitch. Importance of pitch: 2. mother hemp horse. scold. Definitions. Why is pitch important? AUDL4007: 11 Feb A. Faulkner.

Perception of pitch. Importance of pitch: 2. mother hemp horse. scold. Definitions. Why is pitch important? AUDL4007: 11 Feb A. Faulkner. Perception of pitch AUDL4007: 11 Feb 2010. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum, 2005 Chapter 7 1 Definitions

More information

A Comparison of the Convolutive Model and Real Recording for Using in Acoustic Echo Cancellation

A Comparison of the Convolutive Model and Real Recording for Using in Acoustic Echo Cancellation A Comparison of the Convolutive Model and Real Recording for Using in Acoustic Echo Cancellation SEPTIMIU MISCHIE Faculty of Electronics and Telecommunications Politehnica University of Timisoara Vasile

More information

A Soft-Limiting Receiver Structure for Time-Hopping UWB in Multiple Access Interference

A Soft-Limiting Receiver Structure for Time-Hopping UWB in Multiple Access Interference 2006 IEEE Ninth International Symposium on Spread Spectrum Techniques and Applications A Soft-Limiting Receiver Structure for Time-Hopping UWB in Multiple Access Interference Norman C. Beaulieu, Fellow,

More information

A Computational Efficient Method for Assuring Full Duplex Feeling in Hands-free Communication

A Computational Efficient Method for Assuring Full Duplex Feeling in Hands-free Communication A Computational Efficient Method for Assuring Full Duplex Feeling in Hands-free Communication FREDRIC LINDSTRÖM 1, MATTIAS DAHL, INGVAR CLAESSON Department of Signal Processing Blekinge Institute of Technology

More information

DESIGN OF VOICE ALARM SYSTEMS FOR TRAFFIC TUNNELS: OPTIMISATION OF SPEECH INTELLIGIBILITY

DESIGN OF VOICE ALARM SYSTEMS FOR TRAFFIC TUNNELS: OPTIMISATION OF SPEECH INTELLIGIBILITY DESIGN OF VOICE ALARM SYSTEMS FOR TRAFFIC TUNNELS: OPTIMISATION OF SPEECH INTELLIGIBILITY Dr.ir. Evert Start Duran Audio BV, Zaltbommel, The Netherlands The design and optimisation of voice alarm (VA)

More information

FOURIER analysis is a well-known method for nonparametric

FOURIER analysis is a well-known method for nonparametric 386 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 54, NO. 1, FEBRUARY 2005 Resonator-Based Nonparametric Identification of Linear Systems László Sujbert, Member, IEEE, Gábor Péceli, Fellow,

More information

Introduction to More Advanced Steganography. John Ortiz. Crucial Security Inc. San Antonio

Introduction to More Advanced Steganography. John Ortiz. Crucial Security Inc. San Antonio Introduction to More Advanced Steganography John Ortiz Crucial Security Inc. San Antonio John.Ortiz@Harris.com 210 977-6615 11/17/2011 Advanced Steganography 1 Can YOU See the Difference? Which one of

More information

EWGAE 2010 Vienna, 8th to 10th September

EWGAE 2010 Vienna, 8th to 10th September EWGAE 2010 Vienna, 8th to 10th September Frequencies and Amplitudes of AE Signals in a Plate as a Function of Source Rise Time M. A. HAMSTAD University of Denver, Department of Mechanical and Materials

More information

Acoustic Communication System Using Mobile Terminal Microphones

Acoustic Communication System Using Mobile Terminal Microphones Acoustic Communication System Using Mobile Terminal Microphones Hosei Matsuoka, Yusuke Nakashima and Takeshi Yoshimura DoCoMo has developed a data transmission technology called Acoustic OFDM that embeds

More information

Performance Analysis of Parallel Acoustic Communication in OFDM-based System

Performance Analysis of Parallel Acoustic Communication in OFDM-based System Performance Analysis of Parallel Acoustic Communication in OFDM-based System Junyeong Bok, Heung-Gyoon Ryu Department of Electronic Engineering, Chungbuk ational University, Korea 36-763 bjy84@nate.com,

More information

Proceedings of the 5th WSEAS Int. Conf. on SIGNAL, SPEECH and IMAGE PROCESSING, Corfu, Greece, August 17-19, 2005 (pp17-21)

Proceedings of the 5th WSEAS Int. Conf. on SIGNAL, SPEECH and IMAGE PROCESSING, Corfu, Greece, August 17-19, 2005 (pp17-21) Ambiguity Function Computation Using Over-Sampled DFT Filter Banks ENNETH P. BENTZ The Aerospace Corporation 5049 Conference Center Dr. Chantilly, VA, USA 90245-469 Abstract: - This paper will demonstrate

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

EFFECTS OF PHASE AND AMPLITUDE ERRORS ON QAM SYSTEMS WITH ERROR- CONTROL CODING AND SOFT DECISION DECODING

EFFECTS OF PHASE AND AMPLITUDE ERRORS ON QAM SYSTEMS WITH ERROR- CONTROL CODING AND SOFT DECISION DECODING Clemson University TigerPrints All Theses Theses 8-2009 EFFECTS OF PHASE AND AMPLITUDE ERRORS ON QAM SYSTEMS WITH ERROR- CONTROL CODING AND SOFT DECISION DECODING Jason Ellis Clemson University, jellis@clemson.edu

More information

On Event Signal Reconstruction in Wireless Sensor Networks

On Event Signal Reconstruction in Wireless Sensor Networks On Event Signal Reconstruction in Wireless Sensor Networks Barış Atakan and Özgür B. Akan Next Generation Wireless Communications Laboratory Department of Electrical and Electronics Engineering Middle

More information

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm

Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,

More information

EE 791 EEG-5 Measures of EEG Dynamic Properties

EE 791 EEG-5 Measures of EEG Dynamic Properties EE 791 EEG-5 Measures of EEG Dynamic Properties Computer analysis of EEG EEG scientists must be especially wary of mathematics in search of applications after all the number of ways to transform data is

More information

1.Discuss the frequency domain techniques of image enhancement in detail.

1.Discuss the frequency domain techniques of image enhancement in detail. 1.Discuss the frequency domain techniques of image enhancement in detail. Enhancement In Frequency Domain: The frequency domain methods of image enhancement are based on convolution theorem. This is represented

More information

Digital Audio Watermarking With Discrete Wavelet Transform Using Fibonacci Numbers

Digital Audio Watermarking With Discrete Wavelet Transform Using Fibonacci Numbers Digital Audio Watermarking With Discrete Wavelet Transform Using Fibonacci Numbers P. Mohan Kumar 1, Dr. M. Sailaja 2 M. Tech scholar, Dept. of E.C.E, Jawaharlal Nehru Technological University Kakinada,

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Audio Watermark Detection Improvement by Using Noise Modelling

Audio Watermark Detection Improvement by Using Noise Modelling Audio Watermark Detection Improvement by Using Noise Modelling NEDELJKO CVEJIC, TAPIO SEPPÄNEN*, DAVID BULL Dept. of Electrical and Electronic Engineering University of Bristol Merchant Venturers Building,

More information

CHAPTER. delta-sigma modulators 1.0

CHAPTER. delta-sigma modulators 1.0 CHAPTER 1 CHAPTER Conventional delta-sigma modulators 1.0 This Chapter presents the traditional first- and second-order DSM. The main sources for non-ideal operation are described together with some commonly

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

Maximum Likelihood Sequence Detection (MLSD) and the utilization of the Viterbi Algorithm

Maximum Likelihood Sequence Detection (MLSD) and the utilization of the Viterbi Algorithm Maximum Likelihood Sequence Detection (MLSD) and the utilization of the Viterbi Algorithm Presented to Dr. Tareq Al-Naffouri By Mohamed Samir Mazloum Omar Diaa Shawky Abstract Signaling schemes with memory

More information

Analysis of Processing Parameters of GPS Signal Acquisition Scheme

Analysis of Processing Parameters of GPS Signal Acquisition Scheme Analysis of Processing Parameters of GPS Signal Acquisition Scheme Prof. Vrushali Bhatt, Nithin Krishnan Department of Electronics and Telecommunication Thakur College of Engineering and Technology Mumbai-400101,

More information

DIGITAL IMAGE PROCESSING Quiz exercises preparation for the midterm exam

DIGITAL IMAGE PROCESSING Quiz exercises preparation for the midterm exam DIGITAL IMAGE PROCESSING Quiz exercises preparation for the midterm exam In the following set of questions, there are, possibly, multiple correct answers (1, 2, 3 or 4). Mark the answers you consider correct.

More information

FPGA implementation of DWT for Audio Watermarking Application

FPGA implementation of DWT for Audio Watermarking Application FPGA implementation of DWT for Audio Watermarking Application Naveen.S.Hampannavar 1, Sajeevan Joseph 2, C.B.Bidhul 3, Arunachalam V 4 1, 2, 3 M.Tech VLSI Students, 4 Assistant Professor Selection Grade

More information

Advanced Digital Signal Processing Part 2: Digital Processing of Continuous-Time Signals

Advanced Digital Signal Processing Part 2: Digital Processing of Continuous-Time Signals Advanced Digital Signal Processing Part 2: Digital Processing of Continuous-Time Signals Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical Engineering

More information

Subband Analysis of Time Delay Estimation in STFT Domain

Subband Analysis of Time Delay Estimation in STFT Domain PAGE 211 Subband Analysis of Time Delay Estimation in STFT Domain S. Wang, D. Sen and W. Lu School of Electrical Engineering & Telecommunications University of ew South Wales, Sydney, Australia sh.wang@student.unsw.edu.au,

More information

IMPROVING AUDIO WATERMARK DETECTION USING NOISE MODELLING AND TURBO CODING

IMPROVING AUDIO WATERMARK DETECTION USING NOISE MODELLING AND TURBO CODING IMPROVING AUDIO WATERMARK DETECTION USING NOISE MODELLING AND TURBO CODING Nedeljko Cvejic, Tapio Seppänen MediaTeam Oulu, Information Processing Laboratory, University of Oulu P.O. Box 4500, 4STOINF,

More information

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar

Biomedical Signals. Signals and Images in Medicine Dr Nabeel Anwar Biomedical Signals Signals and Images in Medicine Dr Nabeel Anwar Noise Removal: Time Domain Techniques 1. Synchronized Averaging (covered in lecture 1) 2. Moving Average Filters (today s topic) 3. Derivative

More information

EE228 Applications of Course Concepts. DePiero

EE228 Applications of Course Concepts. DePiero EE228 Applications of Course Concepts DePiero Purpose Describe applications of concepts in EE228. Applications may help students recall and synthesize concepts. Also discuss: Some advanced concepts Highlight

More information

CHAPTER 3 ADAPTIVE MODULATION TECHNIQUE WITH CFO CORRECTION FOR OFDM SYSTEMS

CHAPTER 3 ADAPTIVE MODULATION TECHNIQUE WITH CFO CORRECTION FOR OFDM SYSTEMS 44 CHAPTER 3 ADAPTIVE MODULATION TECHNIQUE WITH CFO CORRECTION FOR OFDM SYSTEMS 3.1 INTRODUCTION A unique feature of the OFDM communication scheme is that, due to the IFFT at the transmitter and the FFT

More information

BLIND DETECTION OF PSK SIGNALS. Yong Jin, Shuichi Ohno and Masayoshi Nakamoto. Received March 2011; revised July 2011

BLIND DETECTION OF PSK SIGNALS. Yong Jin, Shuichi Ohno and Masayoshi Nakamoto. Received March 2011; revised July 2011 International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 3(B), March 2012 pp. 2329 2337 BLIND DETECTION OF PSK SIGNALS Yong Jin,

More information

Distributed Collaborative Path Planning in Sensor Networks with Multiple Mobile Sensor Nodes

Distributed Collaborative Path Planning in Sensor Networks with Multiple Mobile Sensor Nodes 7th Mediterranean Conference on Control & Automation Makedonia Palace, Thessaloniki, Greece June 4-6, 009 Distributed Collaborative Path Planning in Sensor Networks with Multiple Mobile Sensor Nodes Theofanis

More information

SOPA version 2. Revised July SOPA project. September 21, Introduction 2. 2 Basic concept 3. 3 Capturing spatial audio 4

SOPA version 2. Revised July SOPA project. September 21, Introduction 2. 2 Basic concept 3. 3 Capturing spatial audio 4 SOPA version 2 Revised July 7 2014 SOPA project September 21, 2014 Contents 1 Introduction 2 2 Basic concept 3 3 Capturing spatial audio 4 4 Sphere around your head 5 5 Reproduction 7 5.1 Binaural reproduction......................

More information

TE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION

TE 302 DISCRETE SIGNALS AND SYSTEMS. Chapter 1: INTRODUCTION TE 302 DISCRETE SIGNALS AND SYSTEMS Study on the behavior and processing of information bearing functions as they are currently used in human communication and the systems involved. Chapter 1: INTRODUCTION

More information

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE

inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering August 2000, Nice, FRANCE Copyright SFA - InterNoise 2000 1 inter.noise 2000 The 29th International Congress and Exhibition on Noise Control Engineering 27-30 August 2000, Nice, FRANCE I-INCE Classification: 6.1 AUDIBILITY OF COMPLEX

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Complex Sounds. Reading: Yost Ch. 4

Complex Sounds. Reading: Yost Ch. 4 Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading

ECE 476/ECE 501C/CS Wireless Communication Systems Winter Lecture 6: Fading ECE 476/ECE 501C/CS 513 - Wireless Communication Systems Winter 2004 Lecture 6: Fading Last lecture: Large scale propagation properties of wireless systems - slowly varying properties that depend primarily

More information

Digitally controlled Active Noise Reduction with integrated Speech Communication

Digitally controlled Active Noise Reduction with integrated Speech Communication Digitally controlled Active Noise Reduction with integrated Speech Communication Herman J.M. Steeneken and Jan Verhave TNO Human Factors, Soesterberg, The Netherlands herman@steeneken.com ABSTRACT Active

More information

Sound Source Localization using HRTF database

Sound Source Localization using HRTF database ICCAS June -, KINTEX, Gyeonggi-Do, Korea Sound Source Localization using HRTF database Sungmok Hwang*, Youngjin Park and Younsik Park * Center for Noise and Vibration Control, Dept. of Mech. Eng., KAIST,

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

FREQUENCY RESPONSE AND LATENCY OF MEMS MICROPHONES: THEORY AND PRACTICE

FREQUENCY RESPONSE AND LATENCY OF MEMS MICROPHONES: THEORY AND PRACTICE APPLICATION NOTE AN22 FREQUENCY RESPONSE AND LATENCY OF MEMS MICROPHONES: THEORY AND PRACTICE This application note covers engineering details behind the latency of MEMS microphones. Major components of

More information

TRANSFORMS / WAVELETS

TRANSFORMS / WAVELETS RANSFORMS / WAVELES ransform Analysis Signal processing using a transform analysis for calculations is a technique used to simplify or accelerate problem solution. For example, instead of dividing two

More information

Since the advent of the sine wave oscillator

Since the advent of the sine wave oscillator Advanced Distortion Analysis Methods Discover modern test equipment that has the memory and post-processing capability to analyze complex signals and ascertain real-world performance. By Dan Foley European

More information

Physical Layer: Outline

Physical Layer: Outline 18-345: Introduction to Telecommunication Networks Lectures 3: Physical Layer Peter Steenkiste Spring 2015 www.cs.cmu.edu/~prs/nets-ece Physical Layer: Outline Digital networking Modulation Characterization

More information

Signals, Sound, and Sensation

Signals, Sound, and Sensation Signals, Sound, and Sensation William M. Hartmann Department of Physics and Astronomy Michigan State University East Lansing, Michigan Л1Р Contents Preface xv Chapter 1: Pure Tones 1 Mathematics of the

More information

Matched filter. Contents. Derivation of the matched filter

Matched filter. Contents. Derivation of the matched filter Matched filter From Wikipedia, the free encyclopedia In telecommunications, a matched filter (originally known as a North filter [1] ) is obtained by correlating a known signal, or template, with an unknown

More information

Signal Processing for Digitizers

Signal Processing for Digitizers Signal Processing for Digitizers Modular digitizers allow accurate, high resolution data acquisition that can be quickly transferred to a host computer. Signal processing functions, applied in the digitizer

More information

The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D.

The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D. The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D. Home The Book by Chapters About the Book Steven W. Smith Blog Contact Book Search Download this chapter in PDF

More information

REAL-TIME BROADBAND NOISE REDUCTION

REAL-TIME BROADBAND NOISE REDUCTION REAL-TIME BROADBAND NOISE REDUCTION Robert Hoeldrich and Markus Lorber Institute of Electronic Music Graz Jakoministrasse 3-5, A-8010 Graz, Austria email: robert.hoeldrich@mhsg.ac.at Abstract A real-time

More information

Statistical Pulse Measurements using USB Power Sensors

Statistical Pulse Measurements using USB Power Sensors Statistical Pulse Measurements using USB Power Sensors Today s modern USB Power Sensors are capable of many advanced power measurements. These Power Sensors are capable of demodulating the signal and processing

More information

A COMPARISON OF SITE-AMPLIFICATION ESTIMATED FROM DIFFERENT METHODS USING A STRONG MOTION OBSERVATION ARRAY IN TANGSHAN, CHINA

A COMPARISON OF SITE-AMPLIFICATION ESTIMATED FROM DIFFERENT METHODS USING A STRONG MOTION OBSERVATION ARRAY IN TANGSHAN, CHINA A COMPARISON OF SITE-AMPLIFICATION ESTIMATED FROM DIFFERENT METHODS USING A STRONG MOTION OBSERVATION ARRAY IN TANGSHAN, CHINA Wenbo ZHANG 1 And Koji MATSUNAMI 2 SUMMARY A seismic observation array for

More information

DSRC using OFDM for roadside-vehicle communication systems

DSRC using OFDM for roadside-vehicle communication systems DSRC using OFDM for roadside-vehicle communication systems Akihiro Kamemura, Takashi Maehata SUMITOMO ELECTRIC INDUSTRIES, LTD. Phone: +81 6 6466 5644, Fax: +81 6 6462 4586 e-mail:kamemura@rrad.sei.co.jp,

More information

Measuring impulse responses containing complete spatial information ABSTRACT

Measuring impulse responses containing complete spatial information ABSTRACT Measuring impulse responses containing complete spatial information Angelo Farina, Paolo Martignon, Andrea Capra, Simone Fontana University of Parma, Industrial Eng. Dept., via delle Scienze 181/A, 43100

More information