RNN-SM: Fast Steganalysis of VoIP Streams Using Recurrent Neural Network
Zinan Lin, Yongfeng Huang, Senior Member, IEEE, and Jilong Wang


IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 13, NO. 7, JULY 2018

Abstract: Quantization index modulation (QIM) steganography makes it possible to hide secret information in voice-over-IP (VoIP) streams, which could be utilized by unauthorized entities to set up covert channels for malicious purposes. Detecting short QIM steganography samples, as required in real circumstances, remains an unsolved challenge. In this paper, we propose an effective online steganalysis method to detect QIM steganography. We find four strong codeword correlation patterns in VoIP streams, which are distorted after embedding hidden data. To extract those correlation features, we propose the codeword correlation model, which is based on a recurrent neural network (RNN). Furthermore, we propose the feature classification model to classify those correlation features into cover speech and stego speech categories. The whole RNN-based steganalysis model (RNN-SM) is trained in a supervised learning framework. Experiments show that, on full embedding rate samples, RNN-SM achieves high detection accuracy, which remains above 90% even when the sample is as short as 0.1 s, and is significantly higher than that of other state-of-the-art methods. For the challenging task of conducting steganalysis on low embedding rate samples, RNN-SM also achieves high accuracy. The average testing time for each sample is below 0.15% of the sample length. These results show that RNN-SM meets the short-sample detection demand and is a state-of-the-art algorithm for online VoIP steganalysis.

Index Terms: Steganalysis, steganography, information hiding, covert channel, recurrent neural network.

I. INTRODUCTION

STEGANOGRAPHY is the technique that hides secret information in digital carriers in undetectable ways.
It can be used for setting up covert channels and sending concealed information over the Internet between two parties whose connection is being restricted or monitored. The carriers could be any kind of data stream transferred over the Internet, such as images [1], texts [2], [3], and protocols [4].

Manuscript received August 4, 2017; revised November 27, 2017 and January 28, 2018; accepted February 8. Date of publication February 15, 2018; date of current version March 27. This work was supported in part by the National Key Research and Development Program of China under Grant 2016YFB and in part by the National Natural Science Foundation of China under Grant U , Grant U , Grant U , and Grant U . The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Tomas Pevny. (Corresponding author: Yongfeng Huang.) Z. Lin is with the Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA, USA. Y. Huang is with the Electronic Engineering Department, Tsinghua University, Beijing, China (e-mail: yfhuang@mail.tsinghua.edu.cn). J. Wang is with the Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing, China. Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /TIFS

In recent years, Voice-over-IP (VoIP) [5], a protocol for making high-quality calls via the Internet, has facilitated the popularity of a number of voice-based applications such as mobile VoIP (mVoIP) and voice over instant messenger (VoIM), which has driven much research on VoIP-based steganography [6]-[13]. Compared with traditional carriers, VoIP has many essential advantages. Its massive payload provides great information hiding capacity and high covert bandwidth. Its instantaneity enables real-time steganography. And its widespread popularity makes it possible to deploy steganography in many different scenarios.
Therefore, VoIP-based steganography turns out to be a good option for secure communication. However, hackers, terrorists, and other lawbreakers may use this technique with malicious intent. For example, they can smuggle unauthorized data or send virus control instructions without being detected by network surveillance. Hence, it is important to develop countermeasures to effectively detect steganography; this technique is called steganalysis.

There are two types of speech coders in VoIP scenarios: waveform coders (e.g., G.711, G.726) and vocoders (e.g., G.723, G.729, iLBC). Compared with waveform coders, which are based on quantization values of the original speech signal, vocoders try to minimize the decoding error through an analysis-by-synthesis (AbS) framework and can achieve a high compression ratio while preserving superb voice quality. Therefore, vocoders have been widely used in VoIP applications, and their related steganography techniques are a major research focus. For example, based on quantization index modulation (QIM) [14], researchers proposed algorithms to embed secret information in vocoder streams by changing the process of vector quantization in linear predictive coding (LPC) [11], [12]. The resultant error is theoretically bounded, and experiments show that QIM-based steganography can achieve state-of-the-art results [11], [12]. In this paper, we focus on detecting QIM-based steganography.

The classic VoIP steganalysis scenario is shown in Figure 1. Two suspect entities are communicating through a VoIP channel (e.g., making a VoIP phone call). We set up a traffic monitor on the router that the communication must go through. The collected network packets are assembled into VoIP streams in real time. At the same time, we use a sliding window algorithm [15] with a window of length l and step s to sample the latest segment, which is sent to the pre-trained classifier to get the online detection results.
The online detection results are sent to the monitor for further actions (e.g., reporting to administrators and cutting off the connection).
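The sliding-window sampling just described can be sketched as follows; the function name and the frame representation are illustrative, not taken from the paper's implementation.

```python
def sliding_windows(frames, window_len, step):
    """Yield successive segments of length window_len, advancing by step,
    mimicking the sliding-window sampling of a VoIP frame stream."""
    for start in range(0, len(frames) - window_len + 1, step):
        yield frames[start:start + window_len]

# example: a stream of 10 frames, window l = 4 frames, step s = 2 frames
segments = list(sliding_windows(list(range(10)), window_len=4, step=2))
```

In the online setting, each new segment would be handed to the pre-trained classifier as soon as it is complete.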

Fig. 1. VoIP Steganalysis Scenario.

All the above steganalysis actions must be done in real time for the following reasons. First, to minimize losses from potential malicious actions, we need to cut off the covert channel as soon as possible if it exists. The essential step is to know whether steganography is happening, and the detection delay determines how soon we can react. Online detection is therefore a must. Second, because of the popularity of VoIP applications, there is a large volume of VoIP connections on the Internet. For each connection, the size of the whole VoIP stream is unpredictable. Therefore, it is impractical to cache the data streams and do offline detection. By deploying online steganalysis, we can not only react to malicious steganography more quickly, but also save memory resources. To enable online detection, the time for classifying a sample of length l needs to be shorter than the step s. Taking overheads into account, the classification time must be as short as possible. This is the first requirement for VoIP steganalysis algorithms. We should also notice that, to avoid being detected, steganography applications do not embed secret data into VoIP streams all the time. Instead, in many circumstances, they only embed information in short periods and keep inactive most of the time. If the sample we extract for classification is too long, it will be filled with a mixture of embedding and non-embedding frames, which impairs detection accuracy. To achieve successful detection, the window length l must be as short as possible. This poses the second requirement for VoIP steganalysis: it must be able to detect short samples. However, existing steganalysis methods towards QIM-based steganography [16], [17] cannot achieve effective detection results when samples are short.
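The online constraint stated above, that classifying one window must finish within the step s, can be checked with a small timing helper; `meets_online_budget` and the toy classifier are hypothetical names invented for this sketch.

```python
import time

def meets_online_budget(classify, segment, step_seconds):
    """Return True if classifying one window finishes within the
    sliding-window step s, i.e. the online-detection requirement holds."""
    start = time.perf_counter()
    classify(segment)
    return (time.perf_counter() - start) < step_seconds

# a trivial stand-in classifier easily meets a 1-second step
ok = meets_online_budget(lambda seg: sum(seg) >= 0, list(range(100)), 1.0)
```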
In this paper, we design a recurrent neural network (RNN) based model for steganalysis tasks. The contributions of this work are:

- We conduct a detailed analysis of codeword correlation in VoIP streams by summarizing correlations into four categories and proposing a metric to evaluate their existence and importance, which provides helpful evidence for steganalysis.
- To the best of our knowledge, we are the first to introduce RNN into the VoIP steganalysis task. Experimental results verify the practicability of this mechanism and indicate that RNN is a powerful alternative to traditional methods when solving similar problems.
- The detection accuracy of our proposed steganalysis method is above 90% even if the sample is as short as 0.1 s, and its accuracy is significantly higher than that of other state-of-the-art methods on short samples. In addition, the average detection time for each sample is below 0.15% of the sample length. These features indicate that our method can be effectively deployed for online VoIP steganalysis.

The rest of the paper is structured as follows. In Section II, we introduce some background knowledge. Related work is introduced in Section III. In Section IV, our proposed steganalysis method is presented. Experiments and discussions are shown in Section V. Finally, we give the conclusion and future work in Section VI.

II. BACKGROUND

In this section, we introduce some preliminary knowledge for our algorithm: QIM-based steganography and LPC.

A. QIM Based Steganography

QIM was first proposed by Chen and Wornell [14]. It embeds data by changing the quantization process when encoding a digital medium such as an image, text, audio, or video. During the encoding process, there are many coefficients that need to be quantized. In the normal procedure, for a coefficient vector x, we choose the closest vector from a codebook D as its representative:

$Q(x) = \arg\min_{y \in D} \|x - y\|$   (1)

QIM modifies this procedure.
It first divides the codebook D into sub-codebooks $C = \{C_1, C_2, \ldots, C_n\}$, which satisfy

$D = \bigcup_{i=1}^{n} C_i$ and $\forall i \neq j,\ C_i \cap C_j = \emptyset$

Assume that the secret information we want to transfer is from the set $S = \{s_1, s_2, \ldots, s_n\}$. We further define an embedding projection function f as a one-to-one mapping from S to C, and $f^{-1}$ is its inverse function. When we want to quantize the coefficient vector x and hide the secret information $s_k$ at the same time, we simply use the sub-codebook $f(s_k)$ instead of the whole codebook D:

$Q'(x, s_k) = \arg\min_{y \in f(s_k)} \|x - y\|$   (2)

The receiver can recover the secret information by judging to which sub-codebook the quantized vector belongs:

$R(y) = f^{-1}(C_k)$, where $y \in C_k$   (3)

The core problem of QIM-based steganography is the codebook partitioning strategy. The simplest way is to divide the codebook randomly. However, this leads to large additional quantization distortion. Xiao et al. [11] proposed the Complementary Neighbor Vertices (CNV) algorithm. It can guarantee that

every codeword and its nearest neighbor are in different sub-codebooks, so the additional quantization distortion can be bounded. In this paper, we take the CNV algorithm as our test target, although our method can be directly applied to other QIM steganography algorithms.

B. Linear Predictive Coding

LPC [18] has been widely used to model speech signals and is the essential part of vocoders such as G.723 and G.729. It is based on the physical process of speech signal generation. Speech signals are generated by organs in the respiratory tract: the lungs, the glottis, and the vocal tract. When passing through the glottis, the exhaled breath from the lungs turns into a periodic excitation signal. The excitation signal then goes through the vocal tract. We can divide the vocal tract into cascaded segments, whose functions can be modeled as one-pole filters. Therefore, the function of the vocal tract can be modeled as an all-pole filter, i.e., the LPC filter:

$H(z) = \frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{n} a_i z^{-i}}$   (4)

where $a_i$ is the i-th order coefficient of the LPC filter. Because speech signals have short-time stationarity, we can assume that the LPC coefficients $a_i$ do not change within a short time. Therefore, we can divide the speech into short frames and compute the LPC coefficients for each frame. Vocoders only encode the deduced LPC coefficients and excitation signals to achieve a high compression ratio. In LPC encoding, the LPC coefficients are first converted into Line Spectrum Frequency (LSF) coefficients, and the LSFs are encoded by vector quantization. Specifically, G.729 and G.723 quantize the LSFs into three codewords $l_1$, $l_2$, and $l_3$ using codebooks $L_1$, $L_2$, and $L_3$ respectively. QIM steganography can be performed while quantizing the LSFs [11]. Since the LSF quantization vectors are altered by QIM steganography, they serve as clues for steganalysis. In this paper, we propose an algorithm to detect QIM steganography on LSFs.
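A toy scalar sketch of QIM embedding and extraction, together with a check of the CNV property (every codeword and its nearest neighbor in different sub-codebooks). The codebook values and labeling below are illustrative; real QIM operates on the LSF vector codebooks.

```python
def qim_quantize(x, subcodebooks, bit):
    # eq. (2): quantize with the sub-codebook selected by the secret bit
    return min(subcodebooks[bit], key=lambda y: abs(x - y))

def qim_recover(y, subcodebooks):
    # eq. (3): the receiver recovers the bit from y's sub-codebook
    for bit, cb in enumerate(subcodebooks):
        if y in cb:
            return bit
    raise ValueError("codeword not in any sub-codebook")

def satisfies_cnv(codebook, labels):
    # CNV property: every codeword and its nearest neighbor carry
    # different labels, which bounds the extra quantization distortion
    def nearest(i):
        return min((j for j in range(len(codebook)) if j != i),
                   key=lambda j: abs(codebook[i] - codebook[j]))
    return all(labels[i] != labels[nearest(i)] for i in range(len(codebook)))

codebook = [0.0, 1.0, 2.1, 3.0]
labels = [0, 1, 0, 1]                      # a CNV-style 2-coloring
subs = [[c for c, b in zip(codebook, labels) if b == k] for k in (0, 1)]
y = qim_quantize(1.8, subs, 1)             # forced to pick from sub-codebook 1
bit = qim_recover(y, subs)
```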
Moreover, it is also possible to apply our algorithm to steganography on other quantization processes, such as pitch period prediction [13], since pitch period prediction-based steganography uses a similar way to hide data (changing quantization vectors).

III. RELATED WORK

There has been some effort in the steganalysis of digital audio. The most common way is to directly extract statistical features from the audio and then conduct classification. Mel-cepstrum is one of the statistical features that steganalysis algorithms have used [19], [20]. Liu et al. [21] improved this method by discovering that high-frequency components are more effective for classification. The three papers above used a Support Vector Machine (SVM) classifier. Other statistical features have also been used. For example, Dittmann et al. [22] combined features such as mean value, variance, LSB ratio, and histogram to classify the audio. Avcibas [23] used a series of audio quality measures, such as signal-to-noise ratio (SNR) and log-likelihood ratio (LLR), to detect steganography. These two papers used a threshold classifier. Based on the observation that marginal distortion decreases under repeated embedding, Altun et al. [24] watermarked the audio sample two more times and fed the additional distortion into a neural network classifier. Similarly, Ru et al. [25] discovered that the variations of statistical features such as mean, variance, skewness, and kurtosis differ when conducting steganography on a stego object versus a cover object. Therefore, they embedded a random message in the audio sample and put the increments of the statistical features into a kernel SVM classifier [25]. Huang et al. [26] applied a second steganography on compressed speech to estimate the embedding rate. Neural network models have also been introduced into speech steganalysis tasks. Paulin et al. [27] employed deep belief networks to solve this problem.
They calculated Mel Frequency Cepstrum Coefficients (MFCC), and deep belief networks (DBN) served as the classifier. In another work, Paulin et al. [28] used Evolutionary Algorithms (EAs) to train Restricted Boltzmann Machines (RBMs), which classified stego and cover speech. The input to the RBMs was still MFCC features. Rekik et al. [29] first introduced Time Delay Neural Networks (TDNN) to detect stego-speech. They extracted LSFs from the original audio and did the classification with a TDNN. Those methods were partly inspired by the good performance of Artificial Neural Networks (ANN) in other fields. However, they all first extracted hand-crafted features and then used an ANN as the classifier, which could not fully exploit the ANN's capability for feature extraction. Chen et al. [30] used a Convolutional Neural Network (CNN) for steganalysis tasks, with raw audio streams serving as input. The above speech steganalysis algorithms are universal: they extract features from the original audio streams and can therefore be applied to almost all kinds of steganography algorithms. The weakness is that their accuracy on a specific steganography is usually lower than that of targeted steganalysis algorithms, for example, steganalysis towards QIM-based steganography. QIM steganography algorithms only modify specific codewords to achieve information hiding. Extracting only those modified bits, instead of the whole audio stream, certainly benefits detection accuracy. The QIM steganalysis algorithms [16], [17] utilize this intuition. Li et al. [17] extracted the modified codewords into a data stream and used a Markov chain to model the transition pattern between successive codewords. Li et al. [16] further took the transition probability within a frame into consideration. Those two steganalysis algorithms achieved state-of-the-art detection results. However, in the codeword sequence, there are other correlation relationships that those two methods did not consider.
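The transition-probability features of [16] and [17], and the longer-distance correlations this paper targets, can both be estimated empirically from a codeword sequence. A minimal single-track sketch (function names and the toy sequences are illustrative):

```python
from collections import Counter, defaultdict

def transition_probabilities(codewords):
    """Empirical P(next == v | current == u), the Markov-chain feature
    used by prior QIM steganalysis methods."""
    pair_counts = Counter(zip(codewords, codewords[1:]))
    from_counts = Counter(codewords[:-1])
    probs = defaultdict(float)
    for (u, v), c in pair_counts.items():
        probs[(u, v)] = c / from_counts[u]
    return probs

def correlation_ratio(seq, u, v, delta):
    """Joint probability of (u, v) at frame distance delta over the product
    of the marginals; a ratio far from 1 signals correlation."""
    n = len(seq) - delta
    joint = sum(1 for j in range(n) if seq[j] == u and seq[j + delta] == v) / n
    p_u = sum(1 for j in range(n) if seq[j] == u) / n
    p_v = sum(1 for l in range(delta, len(seq)) if seq[l] == v) / n
    return joint / (p_u * p_v) if p_u > 0 and p_v > 0 else None

p = transition_probabilities([0, 1, 0, 1, 1])
r = correlation_ratio([0, 1, 0, 1, 0, 1], 0, 1, 1)
```

Here the alternating toy sequence gives a ratio above 1 for the pair (0, 1) at distance 1, i.e. they co-occur more often than independence would predict.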
The algorithm proposed in this paper has a better ability to model correlation patterns by utilizing an RNN model, and it achieves better results. Since our proposed speech steganalysis method involves neural network models, we are also interested in image steganalysis algorithms that use neural networks. In fact, there is a long history of utilizing neural networks

for image steganalysis. However, earlier works all used hand-crafted features, and neural networks only served as classifiers [31]-[33], which could not make full use of the power of neural networks. Qian et al. [34] first utilized CNN for image steganalysis and proposed a unified neural network model for both feature extraction and classification. Xu et al. [35] later proposed another CNN-based image steganalysis model by incorporating more domain knowledge. Chen et al. [36] extended this work from the spatial domain to the JPEG domain. Ye et al. further proposed a new CNN-based image steganalysis model with some novel ideas: they used precomputed weights in the first layer for faster convergence, introduced the truncated linear unit (TLU) in the network, and used the selection channel in training. The proposed method achieved state-of-the-art results.

IV. STEGANALYSIS USING RECURRENT NEURAL NETWORK

For normal speech encoding, there exist strong correlation patterns in the codewords. These correlation patterns are likely weakened when the original codewords are embedded with hidden data. Correlation patterns are consequently regarded as an indicator of steganography and can be extracted for steganalysis. RNN should be capable of exploiting codeword correlations, since its current output always takes earlier inputs into account. Our solution for steganalysis is to apply RNN to detecting the disparities in codeword correlations. It takes advantage of the fact that RNN can not only capture temporal behavior, but also integrate a variety of correlation patterns drawn from our analysis (Section IV-A). We propose a Codeword Correlation Model (CCM) to delineate the correlations in codewords (Section IV-B). We then put forward a Feature Classification Model (FCM) to decide between cover speech and stego speech (Section IV-C).
Finally, we suggest how the two models above should be cascaded to construct our RNN-Based Steganalysis Model (RNN-SM) (Section IV-D).

A. Codeword Correlation Analysis

First we clarify what codeword correlation is. We define $x_{i,j}$ as the i-th codeword at frame j, where $j \in [1, T]$ and T is the number of frames. For G.729 and G.723, $i \in [1, 3]$, and the three codewords are from codebooks $L_1$, $L_2$, and $L_3$ respectively. When all codewords are uncorrelated, their appearances are independent. Therefore, we have

$P(x_{i,j} = u \text{ and } x_{k,l} = v) = P(x_{i,j} = u) \cdot P(x_{k,l} = v),\ \forall i, k \in [1, 3],\ j, l \in [1, T],\ u \in L_i,\ v \in L_k$   (5)

When the two sides of the equation are not equal, a certain correlation pattern exists. For example, when the left side is larger than the right, u and v are more likely to appear as a pair in the given positions; otherwise, they are less likely to appear as a pair. A larger imbalance between the two sides indicates a stronger correlation.

Fig. 2. Correlations Between Codewords.

However, given only one codeword sequence, we cannot estimate the three probability terms involved. More observations are required to estimate them accurately. One solution is to consider the probabilities over all frame pairs where j and l have a fixed distance $\delta$, instead of taking j and l as fixed frames. Specifically, we need to estimate the following three probability terms:

$P(x_{i,j} = u \text{ and } x_{k,l} = v \mid l - j = \delta)$   (6)

$P(x_{i,j} = u \mid l - j = \delta) = P(x_{i,j} = u \mid j \le T - \delta)$   (7)

$P(x_{k,l} = v \mid l - j = \delta) = P(x_{k,l} = v \mid l \ge \delta + 1)$   (8)

We denote a probability estimated from observations by $\hat{P}$. Thus, the following ratio can be used to evaluate correlation:

$\frac{\hat{P}(x_{i,j} = u \text{ and } x_{k,l} = v \mid l - j = \delta)}{\hat{P}(x_{i,j} = u \mid j \le T - \delta) \cdot \hat{P}(x_{k,l} = v \mid l \ge \delta + 1)}$   (9)

The state-of-the-art steganalysis algorithms [16], [17] share the same pattern: extracting correlation features from the codewords and then feeding the features to SVM classifiers. Li et al.
[17] modeled the sequence of codewords as a Markov chain, and the transition probability from one codeword to the codeword most likely to appear immediately after it was selected as the feature in this model. Li et al. [16] extended the method by taking the transition probabilities between $l_1$, $l_2$, and $l_3$ in one frame into consideration, and the features were selected by principal component analysis (PCA). These feature selection strategies have limitations: they only consider the codeword connections within one frame and between two successive frames. However, speech signals are highly correlated over long time intervals. The current codeword is not only determined by the previous codeword, but is also influenced by codewords that appeared long before. Figure 2 illustrates the four kinds of correlations between codewords:

Successive frame correlation: Each codeword is computed on a short time frame (10 ms for G.729, 30 ms for G.723), which is comparable to the length of a phoneme in a word. The successive phonemes in a word are correlated, so the successive codewords in the coding stream are correlated. We name this kind of correlation successive frame correlation. To model

successive frame correlation, Li et al. [16], [17] used features deduced from the transition probabilities between any two codewords, i.e.,

$\frac{\hat{P}(x_{i,j} = u \text{ and } x_{i,l} = v \mid l - j = 1)}{\hat{P}(x_{i,j} = u \mid j \le T - 1)}$ for all i, u, and v.

Intra-frame correlation: In each frame, there are three codewords: $l_1$, $l_2$, and $l_3$. $l_1$ and $l_2$ together compose the first five LSFs, while $l_1$ and $l_3$ together compose the last five LSFs. Therefore, $l_1$, $l_2$, and $l_3$ are also correlated within a frame. We name the correlations between $l_1$, $l_2$, and $l_3$ intra-frame correlation. Li et al. [16] used the transition probabilities of $l_1 \to l_2$, $l_1 \to l_3$, and $l_2 \to l_3$ to model intra-frame correlation, i.e.,

$\frac{\hat{P}(x_{i,j} = u \text{ and } x_{k,j} = v)}{\hat{P}(x_{i,j} = u)}$ for all u, v, and j, with $(i, k) \in \{(1, 2), (1, 3), (2, 3)\}$.

Cross frame correlation: There are multiple phonemes in a word, and different words have different phoneme transition patterns. Therefore, the current phoneme cannot be fully determined by the previous phoneme alone; all previously appeared phonemes in the word should be taken into consideration. Cross frame correlation means the correlations between nonadjacent codewords within a word.

Cross word correlation: Codeword streams are essentially generated from sentences, and words are highly correlated with each other at the sentence level. Therefore, their corresponding codewords are also correlated. In other words, a codeword from a word is determined not only by other codewords from the same word, but also by codewords from other words in the whole context. We name the correlation of codewords from different words cross word correlation.

The first two correlations explain local features, while the last two describe global features. Li et al. [16], [17] simplified the problem by only keeping local features, i.e.
successive frame correlation and intra-frame correlation, and omitting the global ones, i.e., cross frame correlation and cross word correlation, which harms detection accuracy to some extent.

In recent years, stimulated by big data, ANNs have been successfully used in many pattern recognition and artificial intelligence tasks. An ANN is composed of a network of neuron-like units. At any time step, each non-input neuron computes its current output as a nonlinear function of the weighted sum of the activations of all units from which it receives inputs. Many ANNs, like CNN and the multi-layer perceptron (MLP), have a feedforward structure, which means the output at a time is determined only by the current input. RNN, on the other hand, is able to memorize past inputs through an internal state in the neuron, as shown in Figure 3. This memory ability makes RNN very suitable for modeling long time series like audio. RNN has been widely and successfully used in many audio-related tasks, such as speech recognition [37], natural language processing [38], and phoneme classification [39]. But to the best of our knowledge, RNN has never been used in audio steganalysis tasks.

Fig. 3. The Structure of RNN Unit.

Because RNN can generate outputs using not only the information of the latest two frames, but also the information of all past frames, it is possible for RNN to consider all four kinds of correlations at the same time. Long Short-Term Memory (LSTM) [40] is a refined version of RNN. It is capable of learning long-term dependencies in time series, which suits our task well. We use it to model the correlations of speech codewords. The model is explained in the next subsection.

B. Codeword Correlation Model

For simplicity, we first introduce some notations. Assume M is a matrix and $m_{i,j}$ is its element. We define $M_{i,a:b}$ as the row vector composed of the elements at row i, columns a to b of M, i.e.
$M_{i,a:b} = [m_{i,a}, m_{i,a+1}, \ldots, m_{i,b}]$

and $M_{a:b,i}$ as the column vector composed of the elements at column i, rows a to b of M, i.e.,

$M_{a:b,i} = [m_{a,i}, m_{a+1,i}, \ldots, m_{b,i}]^T$

and $M_{a:b,c:d}$ as the matrix composed of the elements at rows a to b and columns c to d of M, i.e.,

$M_{a:b,c:d} = [M_{a:b,c}, M_{a:b,c+1}, \ldots, M_{a:b,d}]$

Assume V is a vector and $v_i$ is its element. We define $V_{a:b}$ as the row vector composed of the a-th to b-th elements of V, i.e.,

$V_{a:b} = [v_a, v_{a+1}, \ldots, v_b]$

We pack all codewords of a speech sample of T frames into a codeword matrix X:

$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,T} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,T} \\ x_{3,1} & x_{3,2} & \cdots & x_{3,T} \end{bmatrix}$   (10)

where $x_{1,i}$, $x_{2,i}$, $x_{3,i}$ stand for the $l_1$, $l_2$, $l_3$ coefficients of the i-th frame respectively. For the G.729 vocoder, $x_{1,i}$, $x_{2,i}$, and $x_{3,i}$ have 7 bits, 5 bits, and 5 bits respectively. For the G.723 vocoder, they all have 8 bits. Because steganography only changes $l_1$, $l_2$, and $l_3$, X contains the full information for steganalysis. It serves as the input of our CCM. As stated before, LSTM has a good ability to model time series, so we use LSTM to build our CCM. We denote the transfer function of LSTM units by f: when the input sequence is $Q = [q_1, q_2, \ldots, q_t]$, the output sequence $R = [r_1, r_2, \ldots, r_t]$ satisfies $r_i = f(Q_{1:i})$. The whole structure of CCM is shown in Figure 4. CCM contains two layers of LSTM units. The first layer has $n_1$

LSTM units and the second layer has $n_2$ LSTM units. We name the set of LSTM units in the first layer $U_1 = \{u_{1,1}, u_{1,2}, \ldots, u_{1,n_1}\}$ and the set of LSTM units in the second layer $U_2 = \{u_{2,1}, u_{2,2}, \ldots, u_{2,n_2}\}$.

Fig. 4. Codeword Correlation Model.

Between the input codewords and the LSTM units in the first layer, there are Input Weights (IW), which define how much we should value each codeword. IW is presented as a $3 \times n_1$ matrix A:

$A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n_1} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n_1} \\ a_{3,1} & a_{3,2} & \cdots & a_{3,n_1} \end{bmatrix}$   (11)

For each LSTM unit $u_{1,i}$, there are three associated weights $a_{1,i}$, $a_{2,i}$, and $a_{3,i}$, which are multiplied with the three input codewords respectively to form the input value at each time step. More specifically, the input value for $u_{1,i}$ at time t is

$e^1_{i,t} = a_{1,i} x_{1,t} + a_{2,i} x_{2,t} + a_{3,i} x_{3,t}$   (12)

We define $E^1$ as the matrix packing all $e^1_{i,t}$ together:

$E^1 = \begin{bmatrix} e^1_{1,1} & e^1_{1,2} & \cdots & e^1_{1,T} \\ e^1_{2,1} & e^1_{2,2} & \cdots & e^1_{2,T} \\ \vdots & \vdots & & \vdots \\ e^1_{n_1,1} & e^1_{n_1,2} & \cdots & e^1_{n_1,T} \end{bmatrix}$   (13)

Then the output value of $u_{1,i}$ at time t is

$o^1_{i,t} = f(E^1_{i,1:t}) = f(a_{1,i} X_{1,1:t} + a_{2,i} X_{2,1:t} + a_{3,i} X_{3,1:t})$   (14)

And we define $O^1$ as the matrix gathering all first-layer outputs from start to end, i.e.

$O^1 = \begin{bmatrix} o^1_{1,1} & o^1_{1,2} & \cdots & o^1_{1,T} \\ o^1_{2,1} & o^1_{2,2} & \cdots & o^1_{2,T} \\ \vdots & \vdots & & \vdots \\ o^1_{n_1,1} & o^1_{n_1,2} & \cdots & o^1_{n_1,T} \end{bmatrix}$   (15)

At every time step, each unit gives a separate output based on all codewords in the past. The first layer thus serves as the step of extracting the preliminary features $O^1$. Inspired by the common sense that a deeper network usually yields better modeling ability, we stack the network with another layer of LSTM units. Between the two layers of LSTM units, there are Connection Weights (CW), which recompose the preliminary features. CW is represented as an $n_1 \times n_2$ matrix B:

$B = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n_2} \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n_2} \\ \vdots & \vdots & & \vdots \\ b_{n_1,1} & b_{n_1,2} & \cdots & b_{n_1,n_2} \end{bmatrix}$   (16)

For each LSTM unit $u_{2,i}$, there are $n_1$ associated weights $b_{1,i}, b_{2,i}, \ldots, b_{n_1,i}$, which are multiplied with the outputs of the previous layer to form its input. More specifically, the input value for $u_{2,i}$ at time t is

$e^2_{i,t} = \sum_{j=1}^{n_1} o^1_{j,t} b_{j,i} = (O^1_{1:n_1,t})^T B_{1:n_1,i}$   (17)

We define $E^2$ as the matrix packing all $e^2_{i,t}$ together:

$E^2 = \begin{bmatrix} e^2_{1,1} & e^2_{1,2} & \cdots & e^2_{1,T} \\ e^2_{2,1} & e^2_{2,2} & \cdots & e^2_{2,T} \\ \vdots & \vdots & & \vdots \\ e^2_{n_2,1} & e^2_{n_2,2} & \cdots & e^2_{n_2,T} \end{bmatrix}$   (18)

Then the output of $u_{2,i}$ at time t is

$o^2_{i,t} = f(E^2_{i,1:t}) = f((B_{1:n_1,i})^T O^1_{1:n_1,1:t})$   (19)

The final output matrix

$O^2 = \begin{bmatrix} o^2_{1,1} & o^2_{1,2} & \cdots & o^2_{1,T} \\ o^2_{2,1} & o^2_{2,2} & \cdots & o^2_{2,T} \\ \vdots & \vdots & & \vdots \\ o^2_{n_2,1} & o^2_{n_2,2} & \cdots & o^2_{n_2,T} \end{bmatrix}$   (20)

contains the final correlation features. CCM has the potential of modeling all four types of correlations for the following reasons. First, IW combines $l_1$, $l_2$, and $l_3$ into a value that is propagated through the whole network. Different weights on $l_1$, $l_2$, and $l_3$ indirectly determine which combinations of $l_1$, $l_2$, and $l_3$ activate the LSTM units; intra-frame correlation is therefore taken into account. Second, with LSTM's ability to memorize the past, every output is deduced from all past codewords. The LSTM units in the first layer directly memorize the original codewords, and the units in the second layer can further memorize more complicated past features by receiving information from the first layer. Thus, CCM has a strong ability to model patterns over time. Successive frame correlation, cross frame correlation, and cross word correlation are simply correlations over different time spans, so they can all be modeled by CCM.

C. Feature Classification Model

We can use the features collected in $O^2$ to classify whether the original speech has hidden data. A basic idea is to compute a linear combination of all features. More specifically, we define the Detection Weight (DW) as a matrix C of size $n_2 \times T$, and the linear combination is calculated as

$y = \sum_{i=1}^{n_2} \sum_{j=1}^{T} O^2_{i,j} C_{i,j}$   (21)
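The data flow up to this point, packing codewords into X as in (10), forming the first-layer inputs of (12) (which amount to the matrix product $A^T X$), and the full-FCM combination (21), can be sketched with toy values; the real weight matrices A and C are learned, not hand-set.

```python
def pack_codewords(frames):
    """Pack per-frame (l1, l2, l3) triples into the 3 x T matrix X of eq. (10)."""
    return [[frame[i] for frame in frames] for i in range(3)]

def first_layer_inputs(A, X):
    """eq. (12): e1[i][t] = sum_k A[k][i] * X[k][t], i.e. E1 = A^T X."""
    n1, T = len(A[0]), len(X[0])
    return [[sum(A[k][i] * X[k][t] for k in range(3)) for t in range(T)]
            for i in range(n1)]

def full_fcm_score(O2, C):
    """eq. (21): linear combination of every feature in O2 with DW matrix C."""
    return sum(o * c for row_o, row_c in zip(O2, C)
               for o, c in zip(row_o, row_c))

X = pack_codewords([(2, 3, 1), (4, 5, 1)])   # T = 2 frames of toy codewords
A = [[1, 0], [0, 1], [1, 1]]                 # 3 x n1 input weights, n1 = 2
E1 = first_layer_inputs(A, X)
y = full_fcm_score([[1, 2], [3, 4]], [[1, 0], [0, 1]])
```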

Fig. 5. Feature Classification Models. (a) Full Model. (b) Pruned Model.

To get a normalized output in [0, 1], we pass the value through the sigmoid function

$S(x) = \frac{1}{1 + e^{-x}}$

and the final output is

$O^3 = S(y) = S\left(\sum_{i=1}^{n_2} \sum_{j=1}^{T} O^2_{i,j} C_{i,j}\right)$   (22)

If we set the detection threshold at 0.5, the final detection result can be expressed as

Detection Result = Stego Speech if $O^3 \ge 0.5$; Normal Speech if $O^3 < 0.5$.   (23)

In other words, the model tries to predict the label (0 for normal, 1 for stego) for a given speech sample. In Section V-D, we further discuss how the threshold influences the results. We name this model the full FCM; its structure is shown in Figure 5(a). However, when the speech sequence is long, the DW matrix grows large, which slows down the training and testing of the model. In addition, too many coefficients raise the possibility of overfitting. Moreover, the size of the model depends on the length of the input sequence, which severely limits its practicability. To solve these problems, we propose a pruned FCM, shown in Figure 5(b). Notice that, because of LSTM's memorizing ability, the final outputs at the end time T already include the information of all time steps from the first layer. Therefore, it is fair to use only $O^2_{1:n_2,T}$ for detection and cast away all past outputs $O^2_{1:n_2,1:T-1}$. DW now shrinks to an $n_2$-dimensional vector, and the size of the model becomes independent of the length of the input sequence. More specifically, we define DW as a vector C containing $n_2$ coefficients:

$C = [c_1, c_2, \ldots, c_{n_2}]^T$   (24)

Fig. 6. RNN Based Steganalysis Model. (a) Full Model. (b) Pruned Model.

The final output is

$O^3 = S\left(\sum_{i=1}^{n_2} O^2_{i,T} c_i\right) = S\left((O^2_{1:n_2,T})^T C\right)$   (25)

We compare the full and the pruned model in Section V-C.

D.
RNN Based Steganalysis Model

The final RNN-SM is constructed by cascading CCM and FCM. The full RNN-SM and the pruned RNN-SM are shown in Figure 6(a) and Figure 6(b), respectively. At each time step, we input the new l_1, l_2, and l_3 coefficients to the network. From left to right, each LSTM unit updates its internal state according to the current input and outputs a new value. For pruned RNN-SM, at the end of the sequence, the outputs from the second LSTM layer are forwarded to the final output node; for full RNN-SM, the outputs from the second LSTM layer at all time steps are forwarded to the final output node. The output node gives the final detection value, which lies in [0, 1]. The final detection result can then be decided according to (23).

In RNN-SM, there are three sets of undetermined weights: IW, CW, and DW, represented by matrix A, matrix B, and matrix/vector C, respectively. They need to be determined before the model can be used for steganalysis. To determine the weights, we follow a supervised learning framework, as shown in Figure 7. First, we collect a number of normal speech samples, which make up the cover speech set. Each sample is then encoded with the G.729 vocoder, with or without QIM steganography, and LSF codewords are extracted from the speech coding streams. We assign label 1 to codeword segments with secret information and label 0 to codeword segments without secret information. Those segments will be randomly

grouped into mini-batches. Each mini-batch is input to RNN-SM, whose weights are randomly initialized, and the deviations between RNN-SM's outputs and the true labels are back-propagated to optimize the weights using the Adam algorithm [41]. During the testing stage, untested samples are processed by a similar procedure: G.729 encoding, LSF coefficient extraction, and input to RNN-SM. The final detection result is given according to (23). Our implementation of RNN-SM, which is based on the Keras library, can be found online.

Fig. 7. Steganalysis Framework.

V. EXPERIMENTS AND DISCUSSION

In this section, we run experiments to demonstrate the high accuracy and efficiency of RNN-SM. As discussed in Section IV-C, pruned RNN-SM is more efficient and has better usability than full RNN-SM. In Section V-C, we compare their performance; in the other sections, RNN-SM stands for pruned RNN-SM. In Section V-A, we introduce the dataset and the performance evaluation metrics. In Section V-B, we explain how we determine the model size parameters, i.e., n_1 and n_2. In Section V-C, we compare the performance of full RNN-SM and pruned RNN-SM. In Section V-D, we discuss how the classification threshold influences the results. In Section V-E, we evaluate the importance of the four kinds of codeword correlations. In Section V-F, we present the accuracy testing results of RNN-SM and compare them with other state-of-the-art methods. In Section V-G, we test the time consumption of RNN-SM and of the other state-of-the-art methods.

A. Dataset and Metrics

To the best of our knowledge, there is no public steganography/steganalysis dataset available for our evaluation. To test our algorithm, we construct our own dataset, which includes a cover speech dataset and a stego speech dataset.
We publish the speech dataset online. We collected 41 hours of Chinese speech and 72 hours of English speech in PCM format, with 16 bits per sample, from the Internet. The speech samples are from different male and female speakers, and together they make up the cover speech dataset. For each sample in the cover speech dataset, we embed random 0/1 bit streams using the CNV-QIM steganography proposed in [11]. The embedding rate is defined as the ratio of the number of embedded bits to the whole embedding capacity. A lower embedding rate means fewer changes to the original data streams, so low embedding rate steganography is harder to detect. CNV-QIM is a 100% embedding algorithm: it embeds data in every frame. To further test the ability of our algorithm, we extend CNV-QIM to enable low embedding rates. When conducting a% embedding rate steganography, we embed each frame with probability a%. We apply 10%, 20%, ..., 100% embedding rate CNV-QIM to each sample in the cover speech dataset, and the generated speech samples make up the stego speech dataset.

In addition to the embedding rate, sample length is another factor that influences detection accuracy. Usually, detection accuracy decreases as sample length decreases. However, as explained in Section I, a steganalysis algorithm should be able to detect short samples. Therefore, we test the algorithms' performance on samples of different lengths. We cut the samples in the cover speech dataset and the stego speech dataset into 0.1 s, 0.2 s, ..., 10 s segments. Segments of the same length are successive and non-overlapping. These segments make up the cover segment dataset and the stego segment dataset, respectively. For each test on RNN-SM, we pick the positive and negative samples from the stego segment dataset and the cover segment dataset according to the required language, embedding rate, and sample length. The ratio of the number of positive samples to the number of negative samples is 1 to 1.
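The low embedding rate extension and the segmentation above are simple to state precisely: embedding is an independent per-frame Bernoulli decision, and segments are consecutive, non-overlapping frame windows. A minimal sketch (names ours, not from the paper's code; 10 frames correspond to 0.1 s of G.729 speech):

```python
import random

def embed_mask(num_frames, rate, rng=None):
    """Per-frame embedding decisions for an a% embedding rate: each frame
    independently carries hidden bits with probability `rate`."""
    rng = rng or random.Random(0)   # fixed seed only to make the sketch reproducible
    return [rng.random() < rate for _ in range(num_frames)]

def cut_segments(frames, seg_len):
    """Cut a frame sequence into successive, non-overlapping segments of
    seg_len frames; a trailing partial segment is dropped."""
    return [frames[i:i + seg_len]
            for i in range(0, len(frames) - seg_len + 1, seg_len)]

mask = embed_mask(1000, 0.3)                  # ~30% of 1000 frames carry data
segments = cut_segments(list(range(25)), 10)  # two full 10-frame segments
```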
We randomly pick four fifths of the samples as the training set and use the rest as the testing set. To compare RNN-SM with other methods, we also conduct tests on two state-of-the-art methods: IDC [17] and SS-QCCN [16]. Both methods are based on SVM, which has quadratic time complexity, so it is impractical to use all samples in the stego segment dataset and the cover segment dataset when evaluating IDC and SS-QCCN. Following the experimental settings in [16], for each test on IDC and SS-QCCN we randomly pick 2000 samples from the stego segment dataset and 2000 samples from the cover segment dataset to form the training set, and we randomly pick another 1000 samples from each of the two datasets to form the testing set of 2000 samples.

We use three metrics to evaluate performance. The first is classification accuracy, defined as the ratio of the number of correctly classified samples to the total number of samples. The second is the false positive rate, defined as the ratio of cover segments that are classified as stego segments. The third metric we use

is the false negative rate, defined as the ratio of stego segments that are classified as cover segments.

TABLE I: GRID SEARCH FOR MODEL SIZE (100% EMBEDDING RATE, 0.1 s CHINESE SAMPLES)

TABLE II: COMPARING FULL RNN-SM AND PRUNED RNN-SM

B. Determining Model Size

Two parameters in RNN-SM are not yet determined: n_1 and n_2, the numbers of RNN units in the first and second layers. Generally, increasing the number of RNN units enhances the network's representation ability, but it may also increase the possibility of overfitting and slow down training and testing. To determine how n_1 and n_2 influence accuracy, training time, and prediction time, we enumerate n_1 and n_2 over 25, 50, and 75, and test all 9 combinations on pruned RNN-SM. The tests are done on all 0.1 s, 100% embedding rate Chinese samples in the cover segment dataset and the stego segment dataset. Specifically, the training set contains 1,243,240 stego segments and 1,243,240 cover segments; the testing set contains 310,810 stego segments and 310,810 cover segments. We run each test for 30 epochs and report (1) the accuracy on the testing set, (2) the average training time per epoch, and (3) the total prediction time over all samples in the training and testing sets. Training was done on a single GeForce GTX 1080 GPU and prediction on an Intel(R) Xeon(R) CPU. Table I shows the results.¹ As we can see, when the model size increases from n_1 = 25 and n_2 = 25 to n_1 = 50 and n_2 = 50, the accuracy increases from 89.11% to 92.00%, but the training and prediction times also increase. When n_1 = 50 and n_2 = 50, the training and prediction times are reasonable and the accuracy is satisfactory. In the following tests, we empirically set n_1 = 50 and n_2 = 50.
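With n_1 = n_2 = 50 fixed, the pruned RNN-SM can be sketched in Keras. The paper states its implementation is Keras-based, but the code below is our reconstruction, not the authors' code; in particular, the binary cross-entropy loss is our assumption.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_pruned_rnn_sm(n1=50, n2=50):
    # Sketch of pruned RNN-SM as we read the paper:
    # two stacked LSTM layers (the CCM) feeding one sigmoid unit (pruned FCM).
    model = keras.Sequential([
        layers.LSTM(n1, return_sequences=True),  # first layer: output at every frame
        layers.LSTM(n2),                         # second layer: keep only step T
        layers.Dense(1, activation="sigmoid"),   # dot product + sigmoid, eq. (25)
    ])
    # The paper trains by back-propagation with the Adam optimizer [41];
    # the binary cross-entropy loss is our assumption.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Each input segment is a (frames x 3) array of (l1, l2, l3) codeword coefficients.
model = build_pruned_rnn_sm()
probs = model.predict(np.random.rand(4, 10, 3), verbose=0)  # 4 segments, 10 frames
```

Training would then call `model.fit` on the labeled codeword segments (label 1 for stego, 0 for cover) in random mini-batches, matching the framework of Figure 7.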
It should be noted that n_1 and n_2 could be tuned further when one wants a different balance between accuracy and time cost.

C. Comparing Pruned RNN-SM and Full RNN-SM

Following the same experimental settings as Section V-B, with n_1 = 50 and n_2 = 50, we test the accuracy and efficiency of pruned RNN-SM and full RNN-SM. The results are shown in Table II.

¹The results in Table I are based on a different run from those in Table III, so the accuracy for n_1 = 50 and n_2 = 50 in Table I (91.29%) is slightly different from the one in Table III (90.91%).

Fig. 8. ROC curve of RNN-SM (tested on 100% embedding rate, 0.1 s Chinese samples).

Compared with pruned RNN-SM, full RNN-SM's accuracy is slightly higher, but its training time is significantly longer, even longer than that of the largest model (n_1 = n_2 = 75) in Table I. Considering this efficiency gap, as well as the other benefits of pruned RNN-SM mentioned in Section IV-C, we use pruned RNN-SM in the following tests.

D. Discussion of Classification Threshold

In (23), we set the detection threshold to 0.5: if the output of FCM is larger than 0.5, we classify the speech as stego speech; otherwise, we classify it as normal speech. In fact, this threshold can be adjusted to reach a desired balance between the true positive rate and the true negative rate. We vary the threshold and plot an example ROC curve in Figure 8. In this test case, when the threshold is 0.5, the true positive rate and the true negative rate are very close. If we want to decrease the false positive rate at some sacrifice of the true positive rate, we can increase the threshold. RNN-SM thus provides a very easy way for users to adjust their desired operating point by simply changing the threshold. For simplicity, we set the threshold to 0.5 in the following tests.

E.
Codeword Correlation Testing

Four kinds of codeword correlations are discussed in this paper: successive frame correlation, intra-frame correlation, cross frame correlation, and cross word correlation. To show their importance, we perform the following analyses. We collect a G.729 coding stream with 180,000 frames and evaluate the codeword correlations according to (9). We fix u = 15 and enumerate the reference codeword v from 0 to 31. The other parameters are set as follows: (1) For successive frame

correlation, we set δ = 1, i = 2, k = 2; (2) for intra-frame correlation, we set δ = 0, i = 2, k = 3; (3) for cross frame correlation, we set δ = 2, i = 2, k = 2; (4) for cross word correlation, we set δ = 100, i = 2, k = 2. For each type of correlation, we take the absolute values of the results and rank them in descending order. The results are presented in Figure 9(a); a larger value indicates a stronger correlation. As the figure shows, in this example successive frame correlation is the strongest, intra-frame correlation and cross frame correlation are comparable, and cross word correlation is the weakest.

To further evaluate how the four kinds of correlations change after embedding hidden data, we embed the speech coding stream with hidden data (100% embedding rate) and rank the absolute values of the correlation change for all v from 0 to 31 in descending order, as shown in Figure 9(b). A correlation with a larger change is a better indicator for steganalysis. As the figure shows, the importance of the four correlations in this example can be roughly ranked as: successive frame correlation > cross frame correlation > intra-frame correlation > cross word correlation.

Fig. 9. Evaluation of the Four Correlations. (a) Ranked Absolute Correlation Values. (b) Ranked Absolute Correlation Change.

The method proposed in [17] only considered successive frame correlation. The method proposed in [16] only considered successive frame correlation and intra-frame correlation. Cross frame correlation and cross word correlation were omitted in those two methods. However, in the example we present, cross frame correlation is more important than intra-frame correlation. Moreover, even though cross word correlation is the weakest, it can still provide classification clues. RNN-SM has the potential to consider all four correlations at the same time, and is therefore more likely to achieve better results.

F. Accuracy Testing

In this section, we test RNN-SM's accuracy and compare it with other state-of-the-art methods: IDC [17] and SS-QCCN [16]. For each embedding rate, sample length, and language, we train a separate model for all three algorithms. The code of RNN-SM and our implementations of IDC and SS-QCCN can be found online.

Fig. 10. RNN-SM's Detection Accuracy of 100% Embedding Rate Samples at Different Lengths.

1) Influence of Sample Length: Detecting short steganography samples is challenging. To test the performance of RNN-SM on samples of different sizes, we fix the embedding rate at 100%. For the sample length, we first test 10 lengths equally spaced between 0.1 s and 1 s; we then increase the step size to 1 s and test another 5 lengths between 2 s and 6 s. English and Chinese speech are tested separately. The results are shown in Table III and Figure 10. As we can see, accuracy increases with sample length. This phenomenon is easy to explain: a longer sequence provides more observations of the codeword correlations, which can therefore be modeled more accurately, so the difference between the codeword correlation patterns of stego speech and cover speech becomes more distinct and classification becomes easier. Moreover, when the sample length is small, increasing it significantly benefits accuracy, but this benefit diminishes as the sample length grows. When the sample length exceeds 2 s, accuracy stabilizes at around 99%. This observation indicates that a sample length as short as 2 s is entirely sufficient

for RNN-SM in the full embedding scenario. We should also notice that even when the sample is only 0.1 s long (10 frames), the detection accuracy is above 90%, which is acceptable for a steganalysis task. These clues indicate that RNN-SM can effectively detect both short and long samples.

TABLE III: DETECTION ACCURACY OF 100% EMBEDDING RATE SAMPLES UNDER DIFFERENT LENGTHS

We also notice that the accuracies on English and Chinese speech are very close. Although the accuracy on Chinese speech becomes slightly higher than that on English speech when the sample length exceeds 0.8 s, the difference is still smaller than 1%. This means that the characteristic difference between the two languages has little effect in full embedding situations. We can also see that the accuracy on Chinese speech does not increase monotonically with sample length; there are some peaks in the results (e.g., at 0.9 s). This may be due to variance resulting from randomness during training (e.g., randomly initialized neural network parameters, random mini-batches).

We also compare the results with IDC and SS-QCCN; full results are shown in Table III. When the sample length is longer than 2 s, all three methods almost converge to their own saturation accuracy. SS-QCCN and RNN-SM have similar saturation accuracy, slightly higher than IDC's. However, when the sample length is shorter than 2 s, their accuracies differ considerably. To further compare their performance on short samples, we plot their accuracy for sample lengths between 0.1 s and 2 s in Figure 11 (Chinese) and Figure 12 (English). RNN-SM clearly outperforms the other two methods on short samples. This is easy to explain: SS-QCCN and IDC are based on intra-frame correlation and successive frame correlation, and when the sample is short, the information from those two correlations is limited.
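The short-sample limitation is statistical: a 0.1 s segment contains only 10 frames from which any correlation statistic must be estimated. Equation (9) appears in an earlier section and is not reproduced here; as an illustrative stand-in (our sketch, not the paper's statistic), a lag-δ co-occurrence frequency between codewords can be estimated as below, and with only a few frames the estimate rests on very little evidence:

```python
def lag_cooccurrence(codewords, u, v, delta):
    """Illustrative stand-in for a codeword correlation statistic (not the
    paper's equation (9)): the frequency with which codeword u at time t
    is followed by codeword v at time t + delta."""
    hits = total = 0
    for t in range(len(codewords) - delta):
        if codewords[t] == u:
            total += 1
            hits += codewords[t + delta] == v
    return hits / total if total else 0.0

stream = [15, 7, 15, 7, 15, 7, 15, 3]               # toy codeword stream
long_est = lag_cooccurrence(stream, 15, 7, 1)       # 8 frames of evidence
short_est = lag_cooccurrence(stream[:3], 15, 7, 1)  # 3 frames: far less evidence
```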
RNN-SM can additionally exploit correlations between frames at longer distances, and can therefore detect short samples better.

2) Influence of Embedding Rate: To avoid easy detection, steganography algorithms often adopt a low embedding rate, which poses a challenge to steganalysis.

Fig. 11. Comparison on Detection Accuracy of 100% Embedding Rate Chinese Samples at Different Lengths.

In this test, we fix the sample length at 10 s and change the embedding rate from 10% to 100% in steps of 10%. English and Chinese speech are tested separately. The results for RNN-SM are shown in Table IV and Figure 13. As the figure shows, when the embedding rate is low, accuracy increases remarkably with the embedding rate. When the embedding rate is above 30%, the detection accuracies for English and Chinese speech samples are both above 90%. We also notice that when the embedding rate is low, the accuracy on English speech is higher than that on Chinese speech, whereas at high embedding rates the accuracies of the two languages are close. This phenomenon may be explained by the different characteristics of the two languages. English is composed of 20 vowels and 28 consonants, whereas Chinese has 412 kinds of syllables. This diversity makes the correlation model for the Chinese language more complicated, and it is therefore more difficult to detect steganography in Chinese speech, especially when

the embedding rate is low. When the embedding rate increases, the detection difficulty decreases and the impact of language characteristics diminishes; therefore, the two accuracy curves converge to the same high level.

TABLE IV: DETECTION ACCURACY OF 10 s SAMPLES UNDER DIFFERENT EMBEDDING RATES

Fig. 12. Comparison on Detection Accuracy of 100% Embedding Rate English Samples at Different Lengths.

Fig. 13. RNN-SM's Detection Accuracy of 10s Samples at Different Embedding Rates.

We also compare the results with IDC and SS-QCCN. Full results are shown in Table IV, and the results on Chinese and English speech are plotted in Figure 14 and Figure 15, respectively. For Chinese speech, RNN-SM and SS-QCCN have very close accuracy, which is much better than IDC's. For English speech, when the embedding rate is smaller than 30%, RNN-SM is more accurate than SS-QCCN; when the embedding rate is greater than 40%, RNN-SM and SS-QCCN have close accuracy, which is still better than IDC's. These results indicate that, compared with other state-of-the-art methods, RNN-SM provides competitive accuracy on low embedding rate samples.

3) Simultaneous Influence of Sample Length and Embedding Rate: To further evaluate how sample length and embedding rate jointly influence detection accuracy, we test a set of samples with multiple lengths and multiple embedding rates. Specifically, we test 3 sample lengths (0.5 s, 2 s, and 6 s) and 5 embedding rates from 20% to 100% in steps of 20%, determining the detection accuracy for all 15 combinations. English and Chinese speech are tested separately. The results are listed in Table V. We first look at the results of RNN-SM. We plotted its results in Figure 16.
As the figure shows, the accuracy surface is convex: decreasing the embedding rate or the sample length results in more detection errors, and the impact is larger when the embedding rate and sample length are small. When the sample is longer than 2 s and the embedding rate is higher than 40%, the accuracies for Chinese and English speech are both above 90%. We also notice that the accuracy on English speech is slightly higher than that on Chinese speech at most of the

points. This observation accords with what we discovered in the previous test and can be explained in the same way.

TABLE V: DETECTION ACCURACY UNDER DIFFERENT SAMPLE LENGTHS AND DIFFERENT EMBEDDING RATES

Fig. 14. Comparison on Detection Accuracy of 10s Chinese Samples at Different Embedding Rate.

Fig. 15. Comparison on Detection Accuracy of 10s English Samples at Different Embedding Rate.

We now compare the results with IDC and SS-QCCN. As Table V shows, RNN-SM outperforms the other two methods in all 0.5 s tasks, most of the 2 s tasks, and half of the 6 s tasks. For the tasks in which RNN-SM does not have the best accuracy, its results are very close to the best. Again, these results show that RNN-SM can effectively detect samples of various lengths and various embedding rates.

G. Efficiency Testing

a) Testing time: To enable online steganalysis, the time for testing each sample must be as short as possible. We measure the average detection time for samples of 0.1 s and 0.5 s, and for samples whose lengths lie between 1 s and 10 s with a step of 1 s. This experiment is conducted on a computer with an Intel(R) Xeon(R) CPU. Figure 17 shows the testing time of RNN-SM. As the figure shows, the testing time increases approximately linearly with the sample length and is below 0.15% of the sample length. This result demonstrates that RNN-SM is highly efficient and can readily be deployed in online steganalysis tasks. We also compare the testing time with IDC and SS-QCCN; the results are shown in Table VI. Because SS-QCCN computes a high-dimensional feature vector and needs to perform PCA reduction, its overhead is distinctly higher than that of the other two methods.

b) Training time: SS-QCCN and IDC depend on the SVM algorithm, whose training time is quadratic in the number of training samples, whereas RNN-SM's training time is linear with respect

to the number of training samples. Therefore, RNN-SM can scale up to large datasets, whereas the other two methods cannot. In practice, we can generate a large training dataset, and a large training dataset usually covers more data modes and improves the classifier's generalization capability.

Fig. 16. RNN-SM's Detection Accuracy under Different Sample Lengths and Different Embedding Rates.

Fig. 17. Time to Perform RNN-SM.

TABLE VI: TESTING TIME COMPARISON

VI. CONCLUSION AND FUTURE WORK

In this paper, we design a novel VoIP steganalysis algorithm called RNN-SM, which can effectively detect QIM steganography in VoIP streams. Compared with previous state-of-the-art algorithms, our method has higher accuracy for short sample steganography detection and achieves accuracy above 90% even when the sample is only 0.1 s long. The average testing time for each sample is only 0.15% of the sample length. These features demonstrate that RNN-SM is a state-of-the-art algorithm for the short sample detection problem and can be effectively used for online VoIP steganalysis. Moreover, we are the first to introduce RNN into the VoIP steganalysis field, and our work shows its practicability. In the future, we will further exploit the advantages of RNN and work on tasks that traditional steganalysis methods cannot yet solve, such as predicting the positions of embedded bits.

ACKNOWLEDGEMENTS

The authors thank Yubo Luo, Wenhui Que, and Huaizhou Tao for helpful discussions on the algorithm, and Wenyu Wang for useful suggestions on the paper.

REFERENCES

[1] A. Cheddad, J. Condell, K. Curran, and P. M. Kevitt, "Digital image steganography: Survey and analysis of current methods," Signal Process., vol. 90, no. 3, Mar. 2010.
[2] M. H. Shirali-Shahreza and M. Shirali-Shahreza, "A new approach to Persian/Arabic text steganography," in Proc. 1st IEEE/ACIS Int. Workshop Compon.-Based Softw. Eng., Comput. Inf.
Sci., 5th IEEE/ACIS Int. Conf. Softw. Archit. Reuse (ICIS-COMSAR), Jul. 2006.
[3] Y. Luo and Y. Huang, "Text steganography with high embedding rate: Using recurrent neural networks to generate Chinese classic poetry," in Proc. 5th ACM Workshop Inf. Hiding Multimedia Secur., 2017.
[4] N. B. Lucena, J. Pease, P. Yadollahpour, and S. J. Chapin, "Syntax and semantics-preserving application-layer protocol steganography," in Proc. Int. Workshop Inf. Hiding, 2004.
[5] B. Goode, "Voice over Internet protocol (VoIP)," Proc. IEEE, vol. 90, no. 9, Sep. 2002.
[6] M. Hamdaqa and L. Tahvildari, "ReLACK: A reliable VoIP steganography approach," in Proc. 5th Int. Conf. Secure Softw. Integr. Rel. Improvement (SSIRI), Jun. 2011.
[7] H. Tian, K. Zhou, H. Jiang, Y. Huang, J. Liu, and D. Feng, "An adaptive steganography scheme for voice over IP," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2009.
[8] E. Xu, B. Liu, L. Xu, Z. Wei, B. Zhao, and J. Su, "Adaptive VoIP steganography for information hiding within network audio streams," in Proc. 14th Int. Conf. Netw.-Based Inf. Syst. (NBiS), 2011.
[9] D. M. L. Ballesteros and J. M. A. Moreno, "Highly transparent steganography model of speech signals using efficient wavelet masking," Expert Syst. Appl., vol. 39, no. 10, 2012.
[10] Y. F. Huang, S. Tang, and J. Yuan, "Steganography in inactive frames of VoIP streams encoded by source codec," IEEE Trans. Inf. Forensics Security, vol. 6, no. 2, Jun. 2011.
[11] B. Xiao, Y. Huang, and S. Tang, "An approach to information hiding in low bit-rate speech stream," in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Nov. 2008.
[12] H. Tian, J. Liu, and S. Li, "Improving security of quantization-index-modulation steganography in low bit-rate speech streams," Multimedia Syst., vol. 20, no. 2, 2014.
[13] Y. Huang, C. Liu, S. Tang, and S. Bai, "Steganography integration into a low-bit rate speech codec," IEEE Trans. Inf. Forensics Security, vol. 7, no. 6, Dec. 2012.
[14] B. Chen and G. W. Wornell, "Quantization index modulation: A class of provably good methods for digital watermarking and information embedding," IEEE Trans. Inf. Theory, vol. 47, no. 4, May 2001.
[15] Y. F. Huang, S. Tang, and Y. Zhang, "Detection of covert voice-over-Internet-protocol communications using sliding window-based steganalysis," IET Commun., vol. 5, no. 7, May 2011.
[16] S. Li, Y. Jia, and C.-C. J. Kuo, "Steganalysis of QIM steganography in low-bit-rate speech signals," IEEE/ACM Trans. Audio, Speech, Language Process., vol. 25, no. 5, May 2017.

[17] S.-B. Li, H.-Z. Tao, and Y.-F. Huang, "Detection of quantization index modulation steganography in G.723.1 bit stream based on quantization index sequence analysis," J. Zhejiang Univ. SCI. C, vol. 13, no. 8, 2012.
[18] D. O'Shaughnessy, "Linear predictive coding," IEEE Potentials, vol. 7, no. 1, Feb. 1988.
[19] C. Kraetzer and J. Dittmann, "Mel-cepstrum-based steganalysis for VoIP steganography," Proc. SPIE, vol. 6505, Mar. 2007.
[20] C. Kraetzer and J. Dittmann, "Pros and cons of mel-cepstrum based audio steganalysis using SVM classification," in Proc. Int. Workshop Inf. Hiding, 2007.
[21] Q. Liu, A. H. Sung, and M. Qiao, "Temporal derivative-based spectrum and mel-cepstrum audio steganalysis," IEEE Trans. Inf. Forensics Security, vol. 4, no. 3, Sep. 2009.
[22] J. Dittmann, D. Hesse, and R. Hillert, "Steganography and steganalysis in voice-over IP scenarios: Operational aspects and first experiences with a new steganalysis tool set," Proc. SPIE, vol. 5681, Mar. 2005.
[23] I. Avcıbaş, "Audio steganalysis with content-independent distortion measures," IEEE Signal Process. Lett., vol. 13, no. 2, Feb. 2006.
[24] O. Altun, G. Sharma, M. U. Celik, M. Sterling, E. L. Titlebaum, and M. Bocko, "Morphological steganalysis of audio signals and the principle of diminishing marginal distortions," in Proc. ICASSP, Mar. 2005.
[25] X.-M. Ru, Y.-T. Zhuang, and F. Wu, "Audio steganalysis based on negative resonance phenomenon caused by steganographic tools," J. Zhejiang Univ.-SCI A, vol. 7, no. 4, 2006.
[26] Y. Huang, S. Tang, C. Bao, and Y. J. Yip, "Steganalysis of compressed speech to detect covert voice over Internet protocol channels," IET Inf. Secur., vol. 5, no. 1, Mar. 2011.
[27] C. Paulin, S.-A. Selouani, and E. Hervet, "Audio steganalysis using deep belief networks," Int. J. Speech Technol., vol. 19, no. 3, 2016.
[28] C. Paulin, S.-A. Selouani, and É.
Hervet, "Speech steganalysis using evolutionary restricted Boltzmann machines," in Proc. IEEE Congr. Evol. Comput. (CEC), Jul. 2016.
[29] S. Rekik, S. Selouani, D. Guerchi, and H. Hamam, "An autoregressive time delay neural network for speech steganalysis," in Proc. 11th Int. Conf. Inf. Sci. Signal Process. Appl. (ISSPA), Jul. 2012.
[30] B. Chen, W. Luo, and H. Li, "Audio steganalysis with convolutional neural network," in Proc. 5th ACM Workshop Inf. Hiding Multimedia Secur., 2017.
[31] L. Shaohui, Y. Hongxun, and G. Wen, "Neural network based steganalysis in still images," in Proc. Int. Conf. Multimedia Expo (ICME), pp. II-509–II-512.
[32] Y. Q. Shi et al., "Image steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jun. 2005, p. 4.
[33] V. Sabeti, S. Samavi, M. Mahdavi, and S. Shirani, "Steganalysis and payload estimation of embedding in pixel differences using neural networks," Pattern Recognit., vol. 43, no. 1, 2010.
[34] Y. Qian, J. Dong, W. Wang, and T. Tan, "Deep learning for steganalysis via convolutional neural networks," Media Watermarking, Secur., Forensics, vol. 9409, Mar. 2015.
[35] G. Xu, H.-Z. Wu, and Y.-Q. Shi, "Structural design of convolutional neural networks for steganalysis," IEEE Signal Process. Lett., vol. 23, no. 5, May 2016.
[36] M. Chen, V. Sedighi, M. Boroumand, and J. Fridrich, "JPEG-phase-aware convolutional neural network for steganalysis of JPEG images," in Proc. 5th ACM Workshop Inf. Hiding Multimedia Secur., 2017.
[37] A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., May 2013.
[38] R. Socher, C. C. Lin, C. Manning, and A. Y. Ng, "Parsing natural scenes and natural language with recursive neural networks," in Proc. 28th Int. Conf. Mach. Learn. (ICML), 2011.
[39] A. Graves and J.
Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Netw., vol. 18, no. 5, 2005.
[40] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, 1997.
[41] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014. [Online].

Zinan Lin received the B.E. degree in electronic engineering from Tsinghua University, Beijing, China. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Carnegie Mellon University. He has broad interests in machine learning and information security.

Yongfeng Huang (SM'11) received the Ph.D. degree in computer science and engineering from the Huazhong University of Science and Technology. He is currently a Professor with the Department of Electronic Engineering, Tsinghua University, Beijing, China. His research interests include cloud computing, data mining, and network security.

Jilong Wang received the Ph.D. degree in computer science from Tsinghua University, Beijing, China. He is currently a Professor with the Institute for Network Sciences and Cyberspace, Tsinghua University. His research interests include network architecture and network management.


More information

Chapter 2 Distributed Consensus Estimation of Wireless Sensor Networks

Chapter 2 Distributed Consensus Estimation of Wireless Sensor Networks Chapter 2 Distributed Consensus Estimation of Wireless Sensor Networks Recently, consensus based distributed estimation has attracted considerable attention from various fields to estimate deterministic

More information

CHAPTER 4 LINK ADAPTATION USING NEURAL NETWORK

CHAPTER 4 LINK ADAPTATION USING NEURAL NETWORK CHAPTER 4 LINK ADAPTATION USING NEURAL NETWORK 4.1 INTRODUCTION For accurate system level simulator performance, link level modeling and prediction [103] must be reliable and fast so as to improve the

More information

Speech Synthesis using Mel-Cepstral Coefficient Feature

Speech Synthesis using Mel-Cepstral Coefficient Feature Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Deep Learning Barnabás Póczos Credits Many of the pictures, results, and other materials are taken from: Ruslan Salakhutdinov Joshua Bengio Geoffrey Hinton Yann LeCun 2

More information

Generating an appropriate sound for a video using WaveNet.

Generating an appropriate sound for a video using WaveNet. Australian National University College of Engineering and Computer Science Master of Computing Generating an appropriate sound for a video using WaveNet. COMP 8715 Individual Computing Project Taku Ueki

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

Introduction to Video Forgery Detection: Part I

Introduction to Video Forgery Detection: Part I Introduction to Video Forgery Detection: Part I Detecting Forgery From Static-Scene Video Based on Inconsistency in Noise Level Functions IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 5,

More information

Chapter 3 LEAST SIGNIFICANT BIT STEGANOGRAPHY TECHNIQUE FOR HIDING COMPRESSED ENCRYPTED DATA USING VARIOUS FILE FORMATS

Chapter 3 LEAST SIGNIFICANT BIT STEGANOGRAPHY TECHNIQUE FOR HIDING COMPRESSED ENCRYPTED DATA USING VARIOUS FILE FORMATS 44 Chapter 3 LEAST SIGNIFICANT BIT STEGANOGRAPHY TECHNIQUE FOR HIDING COMPRESSED ENCRYPTED DATA USING VARIOUS FILE FORMATS 45 CHAPTER 3 Chapter 3: LEAST SIGNIFICANT BIT STEGANOGRAPHY TECHNIQUE FOR HIDING

More information

11/13/18. Introduction to RNNs for NLP. About Me. Overview SHANG GAO

11/13/18. Introduction to RNNs for NLP. About Me. Overview SHANG GAO Introduction to RNNs for NLP SHANG GAO About Me PhD student in the Data Science and Engineering program Took Deep Learning last year Work in the Biomedical Sciences, Engineering, and Computing group at

More information

Improving reverberant speech separation with binaural cues using temporal context and convolutional neural networks

Improving reverberant speech separation with binaural cues using temporal context and convolutional neural networks Improving reverberant speech separation with binaural cues using temporal context and convolutional neural networks Alfredo Zermini, Qiuqiang Kong, Yong Xu, Mark D. Plumbley, Wenwu Wang Centre for Vision,

More information

An Integrated Image Steganography System. with Improved Image Quality

An Integrated Image Steganography System. with Improved Image Quality Applied Mathematical Sciences, Vol. 7, 2013, no. 71, 3545-3553 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.34236 An Integrated Image Steganography System with Improved Image Quality

More information

A TWO-PART PREDICTIVE CODER FOR MULTITASK SIGNAL COMPRESSION. Scott Deeann Chen and Pierre Moulin

A TWO-PART PREDICTIVE CODER FOR MULTITASK SIGNAL COMPRESSION. Scott Deeann Chen and Pierre Moulin A TWO-PART PREDICTIVE CODER FOR MULTITASK SIGNAL COMPRESSION Scott Deeann Chen and Pierre Moulin University of Illinois at Urbana-Champaign Department of Electrical and Computer Engineering 5 North Mathews

More information

Improved Detection by Peak Shape Recognition Using Artificial Neural Networks

Improved Detection by Peak Shape Recognition Using Artificial Neural Networks Improved Detection by Peak Shape Recognition Using Artificial Neural Networks Stefan Wunsch, Johannes Fink, Friedrich K. Jondral Communications Engineering Lab, Karlsruhe Institute of Technology Stefan.Wunsch@student.kit.edu,

More information

Iterative Joint Source/Channel Decoding for JPEG2000

Iterative Joint Source/Channel Decoding for JPEG2000 Iterative Joint Source/Channel Decoding for JPEG Lingling Pu, Zhenyu Wu, Ali Bilgin, Michael W. Marcellin, and Bane Vasic Dept. of Electrical and Computer Engineering The University of Arizona, Tucson,

More information

Nonuniform multi level crossing for signal reconstruction

Nonuniform multi level crossing for signal reconstruction 6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven

More information

Artificial Neural Networks. Artificial Intelligence Santa Clara, 2016

Artificial Neural Networks. Artificial Intelligence Santa Clara, 2016 Artificial Neural Networks Artificial Intelligence Santa Clara, 2016 Simulate the functioning of the brain Can simulate actual neurons: Computational neuroscience Can introduce simplified neurons: Neural

More information

Image Manipulation Detection using Convolutional Neural Network

Image Manipulation Detection using Convolutional Neural Network Image Manipulation Detection using Convolutional Neural Network Dong-Hyun Kim 1 and Hae-Yeoun Lee 2,* 1 Graduate Student, 2 PhD, Professor 1,2 Department of Computer Software Engineering, Kumoh National

More information

Lossy Compression of Permutations

Lossy Compression of Permutations 204 IEEE International Symposium on Information Theory Lossy Compression of Permutations Da Wang EECS Dept., MIT Cambridge, MA, USA Email: dawang@mit.edu Arya Mazumdar ECE Dept., Univ. of Minnesota Twin

More information

Applications of Music Processing

Applications of Music Processing Lecture Music Processing Applications of Music Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Singing Voice Detection Important pre-requisite

More information

Introduction to More Advanced Steganography. John Ortiz. Crucial Security Inc. San Antonio

Introduction to More Advanced Steganography. John Ortiz. Crucial Security Inc. San Antonio Introduction to More Advanced Steganography John Ortiz Crucial Security Inc. San Antonio John.Ortiz@Harris.com 210 977-6615 11/17/2011 Advanced Steganography 1 Can YOU See the Difference? Which one of

More information

HYBRID MATRIX CODING AND ERROR-CORRECTION CODING SCHEME FOR REVERSIBLE DATA HIDING IN BINARY VQ INDEX CODESTREAM

HYBRID MATRIX CODING AND ERROR-CORRECTION CODING SCHEME FOR REVERSIBLE DATA HIDING IN BINARY VQ INDEX CODESTREAM International Journal of Innovative Computing, Information and Control ICIC International c 2013 ISSN 1349-4198 Volume 9, Number 6, June 2013 pp. 2521 2531 HYBRID MATRIX CODING AND ERROR-CORRECTION CODING

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

CROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS. Kuan-Chuan Peng and Tsuhan Chen

CROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS. Kuan-Chuan Peng and Tsuhan Chen CROSS-LAYER FEATURES IN CONVOLUTIONAL NEURAL NETWORKS FOR GENERIC CLASSIFICATION TASKS Kuan-Chuan Peng and Tsuhan Chen Cornell University School of Electrical and Computer Engineering Ithaca, NY 14850

More information

Current Harmonic Estimation in Power Transmission Lines Using Multi-layer Perceptron Learning Strategies

Current Harmonic Estimation in Power Transmission Lines Using Multi-layer Perceptron Learning Strategies Journal of Electrical Engineering 5 (27) 29-23 doi:.7265/2328-2223/27.5. D DAVID PUBLISHING Current Harmonic Estimation in Power Transmission Lines Using Multi-layer Patrice Wira and Thien Minh Nguyen

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying

More information

Digital Television Lecture 5

Digital Television Lecture 5 Digital Television Lecture 5 Forward Error Correction (FEC) Åbo Akademi University Domkyrkotorget 5 Åbo 8.4. Error Correction in Transmissions Need for error correction in transmissions Loss of data during

More information

A New Steganographic Method for Palette-Based Images

A New Steganographic Method for Palette-Based Images A New Steganographic Method for Palette-Based Images Jiri Fridrich Center for Intelligent Systems, SUNY Binghamton, Binghamton, NY 13902-6000 Abstract In this paper, we present a new steganographic technique

More information

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

Background Dirty Paper Coding Codeword Binning Code construction Remaining problems. Information Hiding. Phil Regalia

Background Dirty Paper Coding Codeword Binning Code construction Remaining problems. Information Hiding. Phil Regalia Information Hiding Phil Regalia Department of Electrical Engineering and Computer Science Catholic University of America Washington, DC 20064 regalia@cua.edu Baltimore IEEE Signal Processing Society Chapter,

More information

A variable step-size LMS adaptive filtering algorithm for speech denoising in VoIP

A variable step-size LMS adaptive filtering algorithm for speech denoising in VoIP 7 3rd International Conference on Computational Systems and Communications (ICCSC 7) A variable step-size LMS adaptive filtering algorithm for speech denoising in VoIP Hongyu Chen College of Information

More information

Steganography & Steganalysis of Images. Mr C Rafferty Msc Comms Sys Theory 2005

Steganography & Steganalysis of Images. Mr C Rafferty Msc Comms Sys Theory 2005 Steganography & Steganalysis of Images Mr C Rafferty Msc Comms Sys Theory 2005 Definitions Steganography is hiding a message in an image so the manner that the very existence of the message is unknown.

More information

IDENTIFICATION OF SIGNATURES TRANSMITTED OVER RAYLEIGH FADING CHANNEL BY USING HMM AND RLE

IDENTIFICATION OF SIGNATURES TRANSMITTED OVER RAYLEIGH FADING CHANNEL BY USING HMM AND RLE International Journal of Technology (2011) 1: 56 64 ISSN 2086 9614 IJTech 2011 IDENTIFICATION OF SIGNATURES TRANSMITTED OVER RAYLEIGH FADING CHANNEL BY USING HMM AND RLE Djamhari Sirat 1, Arman D. Diponegoro

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

Drum Transcription Based on Independent Subspace Analysis

Drum Transcription Based on Independent Subspace Analysis Report for EE 391 Special Studies and Reports for Electrical Engineering Drum Transcription Based on Independent Subspace Analysis Yinyi Guo Center for Computer Research in Music and Acoustics, Stanford,

More information

3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007

3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 10, OCTOBER 2007 3432 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 53, NO 10, OCTOBER 2007 Resource Allocation for Wireless Fading Relay Channels: Max-Min Solution Yingbin Liang, Member, IEEE, Venugopal V Veeravalli, Fellow,

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

An Approach to Very Low Bit Rate Speech Coding

An Approach to Very Low Bit Rate Speech Coding Computing For Nation Development, February 26 27, 2009 Bharati Vidyapeeth s Institute of Computer Applications and Management, New Delhi An Approach to Very Low Bit Rate Speech Coding Hari Kumar Singh

More information

Speech Enhancement Based On Noise Reduction

Speech Enhancement Based On Noise Reduction Speech Enhancement Based On Noise Reduction Kundan Kumar Singh Electrical Engineering Department University Of Rochester ksingh11@z.rochester.edu ABSTRACT This paper addresses the problem of signal distortion

More information

A Novel Approach of Compressing Images and Assessment on Quality with Scaling Factor

A Novel Approach of Compressing Images and Assessment on Quality with Scaling Factor A Novel Approach of Compressing Images and Assessment on Quality with Scaling Factor Umesh 1,Mr. Suraj Rana 2 1 M.Tech Student, 2 Associate Professor (ECE) Department of Electronic and Communication Engineering

More information

CHAPTER 6 BACK PROPAGATED ARTIFICIAL NEURAL NETWORK TRAINED ARHF

CHAPTER 6 BACK PROPAGATED ARTIFICIAL NEURAL NETWORK TRAINED ARHF 95 CHAPTER 6 BACK PROPAGATED ARTIFICIAL NEURAL NETWORK TRAINED ARHF 6.1 INTRODUCTION An artificial neural network (ANN) is an information processing model that is inspired by biological nervous systems

More information

Medium Access Control via Nearest-Neighbor Interactions for Regular Wireless Networks

Medium Access Control via Nearest-Neighbor Interactions for Regular Wireless Networks Medium Access Control via Nearest-Neighbor Interactions for Regular Wireless Networks Ka Hung Hui, Dongning Guo and Randall A. Berry Department of Electrical Engineering and Computer Science Northwestern

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

STEGANALYSIS OF IMAGES CREATED IN WAVELET DOMAIN USING QUANTIZATION MODULATION

STEGANALYSIS OF IMAGES CREATED IN WAVELET DOMAIN USING QUANTIZATION MODULATION STEGANALYSIS OF IMAGES CREATED IN WAVELET DOMAIN USING QUANTIZATION MODULATION SHAOHUI LIU, HONGXUN YAO, XIAOPENG FAN,WEN GAO Vilab, Computer College, Harbin Institute of Technology, Harbin, China, 150001

More information

Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes

Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes 216 7th International Conference on Intelligent Systems, Modelling and Simulation Radar Signal Classification Based on Cascade of STFT, PCA and Naïve Bayes Yuanyuan Guo Department of Electronic Engineering

More information

A Novel Fuzzy Neural Network Based Distance Relaying Scheme

A Novel Fuzzy Neural Network Based Distance Relaying Scheme 902 IEEE TRANSACTIONS ON POWER DELIVERY, VOL. 15, NO. 3, JULY 2000 A Novel Fuzzy Neural Network Based Distance Relaying Scheme P. K. Dash, A. K. Pradhan, and G. Panda Abstract This paper presents a new

More information

신경망기반자동번역기술. Konkuk University Computational Intelligence Lab. 김강일

신경망기반자동번역기술. Konkuk University Computational Intelligence Lab.  김강일 신경망기반자동번역기술 Konkuk University Computational Intelligence Lab. http://ci.konkuk.ac.kr kikim01@kunkuk.ac.kr 김강일 Index Issues in AI and Deep Learning Overview of Machine Translation Advanced Techniques in

More information

PRIOR IMAGE JPEG-COMPRESSION DETECTION

PRIOR IMAGE JPEG-COMPRESSION DETECTION Applied Computer Science, vol. 12, no. 3, pp. 17 28 Submitted: 2016-07-27 Revised: 2016-09-05 Accepted: 2016-09-09 Compression detection, Image quality, JPEG Grzegorz KOZIEL * PRIOR IMAGE JPEG-COMPRESSION

More information

REVERSIBLE data hiding, or lossless data hiding, hides

REVERSIBLE data hiding, or lossless data hiding, hides IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 10, OCTOBER 2006 1301 A Reversible Data Hiding Scheme Based on Side Match Vector Quantization Chin-Chen Chang, Fellow, IEEE,

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM

KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,

More information

Laser Printer Source Forensics for Arbitrary Chinese Characters

Laser Printer Source Forensics for Arbitrary Chinese Characters Laser Printer Source Forensics for Arbitrary Chinese Characters Xiangwei Kong, Xin gang You,, Bo Wang, Shize Shang and Linjie Shen Information Security Research Center, Dalian University of Technology,

More information

MAGNT Research Report (ISSN ) Vol.6(1). PP , Controlling Cost and Time of Construction Projects Using Neural Network

MAGNT Research Report (ISSN ) Vol.6(1). PP , Controlling Cost and Time of Construction Projects Using Neural Network Controlling Cost and Time of Construction Projects Using Neural Network Li Ping Lo Faculty of Computer Science and Engineering Beijing University China Abstract In order to achieve optimized management,

More information

arxiv: v1 [cs.ce] 9 Jan 2018

arxiv: v1 [cs.ce] 9 Jan 2018 Predict Forex Trend via Convolutional Neural Networks Yun-Cheng Tsai, 1 Jun-Hao Chen, 2 Jun-Jie Wang 3 arxiv:1801.03018v1 [cs.ce] 9 Jan 2018 1 Center for General Education 2,3 Department of Computer Science

More information

6. FUNDAMENTALS OF CHANNEL CODER

6. FUNDAMENTALS OF CHANNEL CODER 82 6. FUNDAMENTALS OF CHANNEL CODER 6.1 INTRODUCTION The digital information can be transmitted over the channel using different signaling schemes. The type of the signal scheme chosen mainly depends on

More information

Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems

Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems Emeric Stéphane Boigné eboigne@stanford.edu Jan Felix Heyse heyse@stanford.edu Abstract Scaling

More information

Lane Detection in Automotive

Lane Detection in Automotive Lane Detection in Automotive Contents Introduction... 2 Image Processing... 2 Reading an image... 3 RGB to Gray... 3 Mean and Gaussian filtering... 5 Defining our Region of Interest... 6 BirdsEyeView Transformation...

More information

SUPERVISED SIGNAL PROCESSING FOR SEPARATION AND INDEPENDENT GAIN CONTROL OF DIFFERENT PERCUSSION INSTRUMENTS USING A LIMITED NUMBER OF MICROPHONES

SUPERVISED SIGNAL PROCESSING FOR SEPARATION AND INDEPENDENT GAIN CONTROL OF DIFFERENT PERCUSSION INSTRUMENTS USING A LIMITED NUMBER OF MICROPHONES SUPERVISED SIGNAL PROCESSING FOR SEPARATION AND INDEPENDENT GAIN CONTROL OF DIFFERENT PERCUSSION INSTRUMENTS USING A LIMITED NUMBER OF MICROPHONES SF Minhas A Barton P Gaydecki School of Electrical and

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

Mikko Myllymäki and Tuomas Virtanen

Mikko Myllymäki and Tuomas Virtanen NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Global Contrast Enhancement Detection via Deep Multi-Path Network

Global Contrast Enhancement Detection via Deep Multi-Path Network Global Contrast Enhancement Detection via Deep Multi-Path Network Cong Zhang, Dawei Du, Lipeng Ke, Honggang Qi School of Computer and Control Engineering University of Chinese Academy of Sciences, Beijing,

More information

Watermarking-based Image Authentication with Recovery Capability using Halftoning and IWT

Watermarking-based Image Authentication with Recovery Capability using Halftoning and IWT Watermarking-based Image Authentication with Recovery Capability using Halftoning and IWT Luis Rosales-Roldan, Manuel Cedillo-Hernández, Mariko Nakano-Miyatake, Héctor Pérez-Meana Postgraduate Section,

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

Analysis of Secure Text Embedding using Steganography

Analysis of Secure Text Embedding using Steganography Analysis of Secure Text Embedding using Steganography Rupinder Kaur Department of Computer Science and Engineering BBSBEC, Fatehgarh Sahib, Punjab, India Deepak Aggarwal Department of Computer Science

More information

A SECURE IMAGE STEGANOGRAPHY USING LEAST SIGNIFICANT BIT TECHNIQUE

A SECURE IMAGE STEGANOGRAPHY USING LEAST SIGNIFICANT BIT TECHNIQUE Int. J. Engg. Res. & Sci. & Tech. 2014 Amit and Jyoti Pruthi, 2014 Research Paper A SECURE IMAGE STEGANOGRAPHY USING LEAST SIGNIFICANT BIT TECHNIQUE Amit 1 * and Jyoti Pruthi 1 *Corresponding Author: Amit

More information

Hash Function Learning via Codewords

Hash Function Learning via Codewords Hash Function Learning via Codewords 2015 ECML/PKDD, Porto, Portugal, September 7 11, 2015. Yinjie Huang 1 Michael Georgiopoulos 1 Georgios C. Anagnostopoulos 2 1 Machine Learning Laboratory, University

More information

MLP for Adaptive Postprocessing Block-Coded Images

MLP for Adaptive Postprocessing Block-Coded Images 1450 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 10, NO. 8, DECEMBER 2000 MLP for Adaptive Postprocessing Block-Coded Images Guoping Qiu, Member, IEEE Abstract A new technique

More information

Attention-based Multi-Encoder-Decoder Recurrent Neural Networks

Attention-based Multi-Encoder-Decoder Recurrent Neural Networks Attention-based Multi-Encoder-Decoder Recurrent Neural Networks Stephan Baier 1, Sigurd Spieckermann 2 and Volker Tresp 1,2 1- Ludwig Maximilian University Oettingenstr. 67, Munich, Germany 2- Siemens

More information

Improved Spread Spectrum: A New Modulation Technique for Robust Watermarking

Improved Spread Spectrum: A New Modulation Technique for Robust Watermarking 898 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 4, APRIL 2003 Improved Spread Spectrum: A New Modulation Technique for Robust Watermarking Henrique S. Malvar, Fellow, IEEE, and Dinei A. F. Florêncio,

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

Research on Hand Gesture Recognition Using Convolutional Neural Network

Research on Hand Gesture Recognition Using Convolutional Neural Network Research on Hand Gesture Recognition Using Convolutional Neural Network Tian Zhaoyang a, Cheng Lee Lung b a Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China E-mail address:

More information

Scale estimation in two-band filter attacks on QIM watermarks

Scale estimation in two-band filter attacks on QIM watermarks Scale estimation in two-band filter attacks on QM watermarks Jinshen Wang a,b, vo D. Shterev a, and Reginald L. Lagendijk a a Delft University of Technology, 8 CD Delft, etherlands; b anjing University

More information

Audio Signal Compression using DCT and LPC Techniques
