EXPLORING MULTIDIMENSIONAL LSTMS FOR LARGE VOCABULARY ASR


Jinyu Li, Abdelrahman Mohamed, Geoffrey Zweig, and Yifan Gong
Microsoft Corporation, One Microsoft Way, Redmond, WA
{jinyi, asamir, gzweig,

ABSTRACT

Long short-term memory (LSTM) recurrent neural networks (RNNs) have recently shown significant performance improvements over deep feed-forward neural networks. A key aspect of these models is the use of time recurrence, combined with a gating architecture that allows them to track the long-term dynamics of speech. Inspired by human spectrogram reading, we recently proposed the frequency LSTM (F-LSTM) that performs 1-D recurrence over the frequency axis and then performs 1-D recurrence over the time axis. In this study, we further improve the acoustic model by proposing a 2-D, time-frequency (TF) LSTM. The TF-LSTM jointly scans the input over the time and frequency axes to model spectro-temporal warping, and then uses the output activations as the input to a time LSTM (T-LSTM). The joint time-frequency modeling better normalizes the features for the upper layer T-LSTMs. Evaluated on a 375-hour short message dictation task, the proposed TF-LSTM obtained a 3.4% relative WER reduction over the best T-LSTM. The invariance property achieved by joint time-frequency analysis is demonstrated on a mismatched test set, where the TF-LSTM achieves a 14.2% relative WER reduction over the best T-LSTM.

Index Terms: LSTM, RNN, time and frequency, multidimensional

1. INTRODUCTION

Recently, significant progress has been made in automatic speech recognition (ASR) thanks to the application of deep neural networks (DNNs) [1][2][3][4][5][6]. DNNs, however, only consider information in a fixed-length sliding window of frames and thus cannot exploit long-range correlations in the signal. Recurrent neural networks (RNNs), on the other hand, can encode sequence history in their internal state, and thus have the potential to predict phonemes based on all the speech features observed up to the current frame.
Unfortunately, simple RNNs, depending on the largest eigenvalue of the state-update matrix, may have gradients which either increase or decrease exponentially over time. Thus, the basic RNN is difficult to train, and in practice can only model short-range effects. Long short-term memory (LSTM) RNNs [7][8] were developed to overcome these problems. LSTM-RNNs use input, output and forget gates to achieve a network that can maintain state and propagate gradients in a stable fashion over long spans of time. These networks have been shown to outperform DNNs on a variety of ASR tasks [9][10][11][12][13][14]. All previously proposed LSTMs use a recurrence along the time axis to model the temporal patterns of speech signals, and we call them T-LSTMs in this paper.

In common practice, log-filter-bank features are often used as the input to the neural-network-based acoustic model [15]. In standard systems, the log-filter-bank features are independent of one another, i.e., switching the positions of two filter-banks won't affect the performance of the DNN or LSTM. However, this is not the case when a human reads a spectrogram: a human relies on patterns that evolve in both time and frequency to predict phonemes. Switching the positions of two filter-banks will destroy the frequency-wise patterns. Meanwhile, switching the positions of two frames will destroy the time-wise patterns. Inspired by the way people read spectrograms, we recently proposed the frequency LSTM (F-LSTM) in [16], which performs recurrence along the frequency axis to summarize the frequency-evolving patterns as the feature for the upper level T-LSTMs. All the LSTM operations in [16] are one-dimensional, either along the frequency axis or the time axis. However, both time-wise and frequency-wise patterns are important to human spectrogram reading. Hence, it may be better to extract features that capture both patterns.
Further, the concept of multidimensional processing has proved very successful in handwriting recognition tasks [17][18] and computer vision tasks [19], and it outperformed the traditional handwriting systems that use convolutional neural networks (CNNs) [20][21] as the feature extractor.

The main contribution of this paper is the proposal to use a multidimensional LSTM to model both time and frequency dynamics for speech recognition. We further propose a method for doing this joint time-frequency analysis in a highly efficient way. We term the proposed method the time-frequency LSTM or TF-LSTM. Evaluated on a 375-hour Microsoft short message dictation (SMD) task, the TF-LSTM consistently outperformed the F-LSTM and obtained a 3.4% relative word error rate (WER) reduction from the T-LSTM on the SMD test set, and a 14.2% relative WER reduction on a mismatched test set.

The rest of the paper is organized as follows. In Section 2, we briefly introduce LSTMs and then we present the proposed time-frequency LSTM in Section 3. We differentiate the proposed method from the convolutional LSTM DNN (CLDNN) [14] and multi-dimensional RNN [17][18] in Section 4. Experimental evaluation of the algorithm is provided in Section 5. We summarize our study and draw conclusions in Section 6.

2. THE LSTM-RNN

An RNN is fundamentally different from the feed-forward DNN in that the RNN does not operate on a fixed window of frames; instead, it maintains a hidden state vector, which is recursively updated after seeing each time frame. This allows RNNs to be resilient to arbitrary input warping along the recurrence dimension, leading to better generalization abilities. Stacking multiple layers of RNNs allows the network to discover relationships between frames on progressively higher levels of abstraction. During learning, the simple RNN suffers from the vanishing/exploding gradient problem [22]. This problem is well handled in the LSTM-RNNs through the use of the following four components:

Memory units: these store the temporal state of the network;
Input gates: these modulate the input activations into the cells;
Output gates: these modulate the output activations of the cells;
Forget gates: these adaptively reset the cell's memory.

Taken together as in Figure 1 below, these four components are termed an LSTM cell.

Figure 1: Architecture of LSTM-RNNs with one recurrent layer. $Z^{-1}$ is a time-delay node.

Figure 1 depicts the architecture of an LSTM-RNN with one recurrent layer. In LSTM-RNNs, in addition to the past hidden-layer output $h_{t-1}^l$, the past memory activation $c_{t-1}^l$ is also an input to the LSTM cell. This model can be described as:

\begin{align}
i_t^l &= \sigma(W_{xi}^l x_t^l + W_{hi}^l h_{t-1}^l + W_{ci}^l c_{t-1}^l + b_i^l), \tag{1} \\
f_t^l &= \sigma(W_{xf}^l x_t^l + W_{hf}^l h_{t-1}^l + W_{cf}^l c_{t-1}^l + b_f^l), \tag{2} \\
c_t^l &= f_t^l \odot c_{t-1}^l + i_t^l \odot \tanh(W_{xc}^l x_t^l + W_{hc}^l h_{t-1}^l + b_c^l), \tag{3} \\
o_t^l &= \sigma(W_{xo}^l x_t^l + W_{ho}^l h_{t-1}^l + W_{co}^l c_t^l + b_o^l), \tag{4} \\
h_t^l &= o_t^l \odot \tanh(c_t^l), \tag{5}
\end{align}

where $i_t^l$, $o_t^l$, $f_t^l$, and $c_t^l$ denote the activation vectors of the input gate, output gate, forget gate, and memory cell at the $l$-th layer and time $t$, respectively. $h_t^l$ is the output of the LSTM cells at layer $l$ and time $t$. The $W$ terms denote different weight matrices. For example, $W_{xi}^l$ is the weight matrix from the cell input to the input gate at the $l$-th layer. The $b$ terms are the bias terms (e.g., $b_i^l$ is the bias of the input gate at layer $l$). $\odot$ denotes element-wise multiplication.

In [11], an LSTM with an additional projection layer prior to the output was proposed to reduce the computational complexity of the LSTM. A projection layer is applied to $h_t^l$ as $r_t^l = W_{hr}^l h_t^l$, and then $h_{t-1}^l$ in Eqs (1)-(4) is replaced by $r_{t-1}^l$. In this study, we adopt this structure for T-LSTM modeling.

3. JOINT TIME-FREQUENCY ANALYSIS VIA MULTIDIMENSIONAL LSTM

In this section, we propose a time-frequency LSTM (TF-LSTM) as shown in Figure 2. In contrast to the frequency LSTM (F-LSTM) in our previous work [16], which scans the frequency bands so that frequency-evolving information is summarized by the output of the F-LSTM, the new method scans both the time and frequency axes jointly to perform the time-frequency analysis.

Figure 2: An example of a time-frequency LSTM-RNN which scans both the time and frequency axes at the bottom layer using TF-LSTM, and then scans the time axis at the upper layers using T-LSTM. Note that the outputs of all TF-LSTM cells are fed into the upper layer T-LSTM.

The formulation of the TF-LSTM is as follows.

\begin{align}
i_{k,t}^l &= \sigma(W_{xi}^l x_{k,t}^l + W_{hi1}^l h_{k,t-1}^l + W_{hi2}^l h_{k-1,t}^l + W_{ci}^l c_{k,t-1}^l + b_i^l), \tag{6} \\
f_{k,t}^l &= \sigma(W_{xf}^l x_{k,t}^l + W_{hf1}^l h_{k,t-1}^l + W_{hf2}^l h_{k-1,t}^l + W_{cf}^l c_{k,t-1}^l + b_f^l), \tag{7} \\
c_{k,t}^l &= f_{k,t}^l \odot c_{k,t-1}^l + i_{k,t}^l \odot \tanh(W_{xc}^l x_{k,t}^l + W_{hc1}^l h_{k,t-1}^l + W_{hc2}^l h_{k-1,t}^l + b_c^l), \tag{8} \\
o_{k,t}^l &= \sigma(W_{xo}^l x_{k,t}^l + W_{ho1}^l h_{k,t-1}^l + W_{ho2}^l h_{k-1,t}^l + W_{co}^l c_{k,t}^l + b_o^l), \tag{9} \\
h_{k,t}^l &= o_{k,t}^l \odot \tanh(c_{k,t}^l), \tag{10}
\end{align}

In this formulation, every gate now has three indices: layer $l$, frequency band $k$, and time $t$. For example, $f_{k,t}^l$ denotes the activation vector of the forget gate at layer $l$, frequency band $k$, and time $t$. Different from Eqs (1)-(4), now we have both the time-delay input $h_{k,t-1}^l$ and the frequency-delay input $h_{k-1,t}^l$. The $W_{h\cdot 1}^l$ and $W_{h\cdot 2}^l$ matrices denote the weight matrices connecting $h_{k,t-1}^l$ and $h_{k-1,t}^l$, respectively. The structure of a TF-LSTM cell is plotted in Figure 3, where $\phi$ denotes the tanh function.

Figure 3: A TF-LSTM cell at frequency band $k$ and time $t$.
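As a rough illustration of Eqs (6)-(10), the following NumPy sketch applies one TF-LSTM cell update and then scans a single frame across its frequency chunks. It is a minimal sketch under stated assumptions, not the authors' CNTK implementation: the function names (`init_params`, `tf_lstm_step`), the random initialization, the diagonal peephole weights, the use of the time-delayed memory state $c_{k,t-1}$, and the zero vector used for the missing neighbour of the first chunk are all choices made here for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_dim, cell_dim, rng):
    """Hypothetical initializer: small random weights, zero biases."""
    p = {}
    for gate in "ifco":
        p["W_x" + gate] = rng.normal(0.0, 0.1, (cell_dim, input_dim))
        # "1" matrices act on the time-delay input, "2" on the frequency-delay input.
        p["W_h" + gate + "1"] = rng.normal(0.0, 0.1, (cell_dim, cell_dim))
        p["W_h" + gate + "2"] = rng.normal(0.0, 0.1, (cell_dim, cell_dim))
        p["b_" + gate] = np.zeros(cell_dim)
    for gate in "ifo":
        # Peephole weights, assumed diagonal and therefore stored as vectors.
        p["w_c" + gate] = rng.normal(0.0, 0.1, cell_dim)
    return p


def tf_lstm_step(p, x, h_time, h_freq, c_time):
    """One TF-LSTM cell update at frequency chunk k and time t.

    x      : input chunk x_{k,t}
    h_time : h_{k,t-1}, output of the same chunk at the previous frame
    h_freq : h_{k-1,t}, output of the previous chunk at the current frame
    c_time : c_{k,t-1}, memory state of the same chunk at the previous frame
    """
    def pre(gate):
        return (p["W_x" + gate] @ x
                + p["W_h" + gate + "1"] @ h_time
                + p["W_h" + gate + "2"] @ h_freq
                + p["b_" + gate])

    i = sigmoid(pre("i") + p["w_ci"] * c_time)   # Eq. (6)
    f = sigmoid(pre("f") + p["w_cf"] * c_time)   # Eq. (7)
    c = f * c_time + i * np.tanh(pre("c"))       # Eq. (8)
    o = sigmoid(pre("o") + p["w_co"] * c)        # Eq. (9)
    h = o * np.tanh(c)                           # Eq. (10)
    return h, c


# Scan one frame: 22 chunks of 8 filter-bank values, 24 memory cells per chunk.
rng = np.random.default_rng(0)
num_chunks, chunk_dim, cell_dim = 22, 8, 24
params = init_params(chunk_dim, cell_dim, rng)
chunks = [rng.normal(size=chunk_dim) for _ in range(num_chunks)]  # placeholder input
h_prev = [np.zeros(cell_dim) for _ in range(num_chunks)]  # h_{k,t-1}, first frame
c_prev = [np.zeros(cell_dim) for _ in range(num_chunks)]  # c_{k,t-1}, first frame

h_freq = np.zeros(cell_dim)  # assumed value for the neighbour of the first chunk
outputs = []
for k in range(num_chunks):
    h, c = tf_lstm_step(params, chunks[k], h_prev[k], h_freq, c_prev[k])
    outputs.append(h)
    h_freq = h          # becomes h_{k-1,t} for the next chunk
    c_prev[k] = c       # becomes c_{k,t-1} at the next frame

super_vector = np.concatenate(outputs)  # input to the upper layer T-LSTM
```

Dropping the `"2"` terms (and the chunk loop) leaves a plain time-recurrent update, which mirrors how the formulation is said to specialize to a T-LSTM.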

The proposed TF-LSTM in Eqs (6)-(10) is a general case of the T-LSTM or F-LSTM. When all the frequency bands are concatenated together as a single unit, the frequency index $k$ and all the items associated with $W_{h\cdot 2}^l$ are removed. Then the TF-LSTM reduces to the T-LSTM of Eqs (1)-(5). In contrast, if all the items associated with $W_{h\cdot 1}^l$ are removed, the TF-LSTM reduces to an F-LSTM, which can be viewed as removing the connections to $h_{k,t-1}^l$ in Figure 3.

The detailed TF-LSTM processing is described as follows.

At each time step, divide the N log-filter-banks at the current time into M overlapped chunks, shifting by C log-filter-banks between adjacent chunks. They are denoted as $x_{k,t}^l$, $k = 1, \dots, M$.

Using the hidden activations at each frequency chunk from the previous time step $h_{k,t-1}^l$, the hidden activations at each time step from the previous frequency chunk $h_{k-1,t}^l$, and the input at the current frequency chunk and time step $x_{k,t}^l$, go through Eqs (6)-(10) to generate the output $h_{k,t}^l$, $k = 1, \dots, M$. Note that we use log-filter-banks as the input, which means the time-frequency analysis is in the first layer, so $l$ is set as 1 in Eqs (6)-(10).

Merge $h_{k,t}^l$, $k = 1, \dots, M$ into a super-vector $h_t^l$ which can be considered as a trajectory of time-frequency patterns. Then use $h_t^l$ as the input to the upper layer T-LSTM.

It is also worthwhile to investigate the stacking of multiple TF-LSTM layers. This can be easily done by replacing $x_{k,t}^l$ with the hidden activations from the previous layer $h_{k,t}^{l-1}$ in Eqs (6)-(9). Again, the output of the last TF-LSTM layer is merged into a super-vector as the input to the upper layer T-LSTM. A sample of two stacked TF-LSTM layers is shown in Figure 4.

Figure 4: An example of stacked TF-LSTM layers.

4. RELATION TO PRIOR WORK

In this section, we first discuss the difference between our proposed TF-LSTM and the convolutional LSTM DNN (CLDNN) [14], which combines CNNs, LSTMs, and DNNs together. The CLDNN first uses a CNN to reduce the spectral variation, and then the output of the CNN layer is fed into a multi-layer LSTM to learn the temporal patterns. Finally, the output of the last LSTM layer is fed into several fully connected DNN layers for the purpose of classification.

The key difference between the TF-LSTM and the CLDNN is that the TF-LSTM uses joint time-frequency recurrence, whereas the CLDNN uses a sliding convolutional window for pattern detection. While the sliding window achieves some local invariance, it is not the same as a joint two-dimensional recurrent network which scans the whole time and frequency axis. The two approaches both aim to achieve invariance to input distortions, but the pattern detectors in the CNN maintain a constant dimensionality, while the TF-LSTM can perform a general time-frequency warping.

The proposed method is similar to the multidimensional LSTM [17][18] which is used for handwriting recognition. Multidimensional LSTM has been used in [23] on a very small phone recognition task, TIMIT [24], using connectionist temporal classification (CTC) [25] as the training criterion. However, there is no accuracy comparison with T-LSTM in [23]. In contrast, we will show the advantage of our proposed TF-LSTM over T-LSTM with the cross-entropy training criterion on a large scale speech recognition task in the next section.

Although using similar concepts, the proposed TF-LSTM has a different formulation from the multidimensional LSTM in [17][18]. The proposed TF-LSTM has only a single memory unit and a single forget gate, while the multidimensional LSTM in [17][18] has multiple forget gates, each handling the information of one dimension. Thus we achieve a significant reduction in complexity.

We are currently building a strong CLDNN baseline to compare with, and it will be reported in the future. We will also implement the multidimensional LSTM with multiple forget gates [17][18] and compare with our proposed method.

5. EXPERIMENTS AND DISCUSSIONS

The proposed methods are evaluated on a Microsoft Windows Phone short message dictation task. The transcribed training data contain 375 hours of US-English audio. The test set is from the same Windows Phone task, and has 25k words. This large test set guarantees the significance of reported improvement.

The 87-dimensional feature used in the DNN and T-LSTM experiments consists of the 29-dimensional static log-filter-bank outputs and their first- and second-order derivatives [26]. For the F-LSTM and TF-LSTM experiments, we only use the static log-filter-banks as the feature. All models evaluated in this study use 5976 tied-triphone states (senones), determined by a baseline CD-GMM-HMM system, and were trained to minimize the frame-level cross-entropy criterion. All experiments were conducted using the Computational Network Toolkit (CNTK) [27], which allows us to build and evaluate various network structures efficiently without deriving and implementing complicated training algorithms.

To build the baseline DNN, we augment the 87-dimensional feature vectors with 5 frames of context on either side (5-1-5). The DNN has 5 hidden layers, each with 2048 sigmoid units. The baseline T-LSTM is modeled after that in [11]. Each T-LSTM layer has 1024 hidden units and the output size of each T-LSTM layer is reduced to 512 using a linear projection layer. There is no frame stacking, and the output HMM state label is delayed by 5 frames as in [11]. When training the T-LSTM, the backpropagation through time (BPTT) [28] step is 20. We use a 4-layer T-LSTM as our baseline.
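The context window used for the baseline DNN input (5 frames on either side of the current frame, so 11 frames of 87 dimensions each) can be sketched in plain Python as below. The function name `splice_frames` and the rule of repeating the first or last frame at utterance edges are assumptions made for this illustration; the text does not say how edges are handled.

```python
def splice_frames(frames, left=5, right=5):
    """Concatenate each frame with `left` past and `right` future frames.

    Edge frames are padded by repeating the first or last frame; that padding
    rule is an assumption, not something specified in the text.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to the valid range
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy utterance: 20 frames of 87-dimensional features (placeholder values,
# every element of frame t is simply t so the window content is easy to read).
utterance = [[float(t)] * 87 for t in range(20)]
dnn_input = splice_frames(utterance)
input_dim = len(dnn_input[0])  # 11 frames x 87 dimensions
```

With these settings each spliced vector has 11 x 87 = 957 dimensions, whereas the T-LSTM described above consumes the 87-dimensional frames one at a time without stacking.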

The 4-layer T-LSTM baseline has a WER of 15.35%. It outperforms the baseline DNN with a 10.39% relative WER reduction. This setup is better than the models with three or five T-LSTM layers, as shown in Table 1. There is a 4.3% relative WER reduction when adding one layer to go from the 3-layer T-LSTM to the 4-layer T-LSTM. However, a 5-layer T-LSTM does not outperform a 4-layer T-LSTM.

Table 1: WER and model size comparison of DNN and T-LSTM. M denotes million in the column of number of parameters. (Cells left empty did not survive in this transcription.)
Model | WER (%) | Number of parameters (M)
DNN | |
3-layer T-LSTM | |
4-layer T-LSTM | 15.35 |
5-layer T-LSTM | |

In Table 2, we compare the performance of the F-LSTM and TF-LSTM models. The F-LSTM model uses a single LSTM to scan the log-filter-banks, while the TF-LSTM uses a single LSTM to scan both the time and log-filter-bank axes. The generated time-frequency evolving summary or frequency evolving summary is then passed into 3 or 4 layers of T-LSTMs.

At each time step, the 29 log-filter-bank channels are divided into 22 overlapped chunks with each chunk containing 8 log-filter-banks, which means the frequency shift is 1 log-filter-bank. This log-filter-bank grouping strategy follows our previous experience with CNNs [29]. Then these 22 chunks are fed into the F-LSTM. The input to the TF-LSTM cells includes not only the previous frequency chunks but also the output of this TF-LSTM cell in the previous time frame. Both the F-LSTM and TF-LSTM have 24 memory cells, introducing small computational cost. The upper layer T-LSTMs have the same structure as the baseline T-LSTMs, with 1024 hidden units in each layer, and the output size is reduced to 512 using a projection.

All the setups in Table 2 outperform the baseline 4-layer T-LSTM. With a 3-layer T-LSTM on top, the F-LSTM and TF-LSTM perform almost the same. However, with a 4-layer T-LSTM on top, the TF-LSTM is much better than the F-LSTM, and gets 14.83% WER, a 3.4% relative WER reduction from the baseline 4-layer T-LSTM. The joint time-frequency modeling provides a better feature for the upper layer T-LSTMs to consume. As shown in Table 1, simply increasing the number of layers from 4 to 5 doesn't give any gain.
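The chunking arithmetic above (29 channels, chunks of 8, shift of 1, giving 22 chunks) can be checked with a short plain-Python sketch. The function name `split_into_chunks` and the placeholder channel values are illustrative only; the super-vector size is derived here under the assumption that each chunk contributes its full 24-dimensional output with no projection.

```python
def split_into_chunks(frame, chunk_size=8, shift=1):
    """Split one frame of log-filter-bank values into overlapping chunks."""
    last_start = len(frame) - chunk_size
    return [frame[s:s + chunk_size] for s in range(0, last_start + 1, shift)]


frame = [float(b) for b in range(29)]   # placeholder values for 29 channels
chunks = split_into_chunks(frame)
num_chunks = len(chunks)                # (29 - 8) / 1 + 1

# Assumed size of the merged super-vector: one 24-dim output per chunk.
super_vector_dim = num_chunks * 24
```

Each chunk shares 7 of its 8 channels with its neighbour, so the frequency recurrence moves along the spectrum one filter-bank at a time.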
Table 2: Comparison of F-LSTM and TF-LSTM. (Cells left empty did not survive in this transcription.)
Model | WER (%) | Number of parameters (M)
F-LSTM + 3-layer T-LSTM | |
F-LSTM + 4-layer T-LSTM | |
TF-LSTM + 3-layer T-LSTM | |
TF-LSTM + 4-layer T-LSTM | 14.83 |

We further investigate the performance of stacked F-LSTM and TF-LSTM in Table 3. To have the same number of layers as the TF-LSTM + 4-layer T-LSTM setup in Table 2, we tried to use either a 2-layer F-LSTM or a 2-layer TF-LSTM, followed by a 3-layer T-LSTM. Again, the setup using TF-LSTM outperformed the setup with F-LSTM. However, none outperformed the TF-LSTM + 4-layer T-LSTM setup. Note that going from the TF-LSTM + 3-layer T-LSTM setup in Table 2 to the 2-layer F-LSTM + 3-layer T-LSTM setup in Table 3 only introduces 0.1M additional parameters, and this brings a very slight improvement. This is because the TF-LSTM itself has a very small number of parameters, since the cell size is only 24. In the future, we can use a 2-layer TF-LSTM followed by a 4-layer T-LSTM to get some further gains.

Table 3: The stacking of F-LSTM and TF-LSTM. (Cells left empty did not survive in this transcription.)
Model | WER (%) | Number of parameters (M)
2-layer F-LSTM + 3-layer T-LSTM | |
2-layer TF-LSTM + 3-layer T-LSTM | |

In a final set of experiments, we evaluated the invariance properties of the TF-LSTM model by testing the models trained with Windows Phone data on the Aurora 4 [30] test sets. Two clean evaluation sets (A and C) are recorded with the Sennheiser microphone and the secondary microphone, respectively. The remaining two groups (B and D) are recorded with the two types of microphone respectively, and 6 types of noise are added with randomly chosen SNRs between 5 and 15 dB for each of the microphone types. Therefore, these test sets have totally mismatched acoustic environments from the Windows Phone training set. We used the baseline 4-layer T-LSTM model in Table 1 and the TF-LSTM model in Table 2 for the evaluation. The language model is a bigram provided by Aurora 4. As shown in Table 4, the TF-LSTM performs much better than the T-LSTM in all test conditions, and reduced the average WER from 17.46% to 15.0%, a 14.2% relative WER reduction.
This confirms the robustness [31] of the joint time-frequency analysis of the TF-LSTM.

Table 4: WER comparison of T-LSTM and TF-LSTM models on the mismatched Aurora 4 test sets. Models are trained with Windows Phone short message dictation data. (Cells left empty did not survive in this transcription.)
Model | A | B | C | D | Avg.
4-layer T-LSTM | | | | | 17.46
TF-LSTM + 4-layer T-LSTM | | | | | 15.0

6. CONCLUSIONS

In this paper, we have presented a two-dimensional TF-LSTM architecture that scans both the time and frequency axes to model the evolving patterns of the spectrogram. The TF-LSTM uses an LSTM to perform a joint time-frequency recurrence that summarizes spectro-temporal patterns. The summarized patterns are then fed into upper level T-LSTMs. The proposed TF-LSTM obtained a 3.4% relative WER reduction over the traditional T-LSTM on a 375-hour short message dictation task. We further investigated the effectiveness of stacking multiple TF-LSTM layers, and found that the additional accuracy gain is marginal. This indicates that a one-layer TF-LSTM is good enough to extract the patterns relevant to speech recognition. When evaluated with a totally mismatched Aurora 4 test set, the TF-LSTM demonstrates much better resistance to the distortion, giving a 14.2% relative WER reduction over a T-LSTM.
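The relative WER reductions quoted in the paper follow the usual definition, the absolute WER drop divided by the baseline WER. The small sketch below applies it to the short message dictation result; the function name `relative_reduction` is illustrative, and the two WER values (15.35% for the 4-layer T-LSTM baseline and 14.83% for TF-LSTM + 4-layer T-LSTM) are the readings assumed here because they are the ones consistent with the quoted 3.4%.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Assumed readings of the two WERs on the short message dictation test set.
smd_gain = relative_reduction(15.35, 14.83)
smd_gain_rounded = round(smd_gain, 1)
```

A relative figure of this kind depends on the baseline, so the same absolute drop counts for more on a stronger baseline.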

REFERENCES

[1] F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. Interspeech, 2011.
[2] N. Jaitly, P. Nguyen, A. Senior, and V. Vanhoucke, "An application of pretrained deep neural networks to large vocabulary conversational speech recognition," in Proc. Interspeech, 2012.
[3] T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak, and A.-R. Mohamed, "Making deep belief networks effective for large vocabulary continuous speech recognition," in Proc. ASRU, 2011.
[4] G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Large vocabulary continuous speech recognition with context-dependent DBN-HMMs," in Proc. ICASSP, 2011.
[5] A. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio Speech and Language Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
[6] L. Deng, J. Li, J.-T. Huang, et al., "Recent advances in deep learning for speech research at Microsoft," in Proc. ICASSP, 2013.
[7] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, 1997.
[8] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, 2000.
[9] A. Graves, A. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. ICASSP, 2013.
[10] A. Graves, N. Jaitly, and A. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in Proc. ASRU, 2013.
[11] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Proc. Interspeech, 2014.
[12] H. Sak, O. Vinyals, G. Heigold, A. Senior, E. McDermott, R. Monga, and M. Mao, "Sequence discriminative distributed training of long short-term memory recurrent neural networks," in Proc. Interspeech, 2014.
[13] X. Li and X. Wu, "Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition," in Proc. ICASSP, 2015.
[14] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Proc. ICASSP, 2015.
[15] A. Mohamed, G. Hinton, and G. Penn, "Understanding how deep belief networks perform acoustic modeling," in Proc. ICASSP, 2012.
[16] J. Li, A. Mohamed, G. Zweig, and Y. Gong, "LSTM time and frequency recurrence for automatic speech recognition," in Proc. ASRU, 2015.
[17] A. Graves, S. Fernández, and J. Schmidhuber, "Multi-dimensional recurrent neural networks," in Proc. ICANN, 2007.
[18] A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in Advances in Neural Information Processing Systems, 2009.
[19] W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki, "Scene labeling with LSTM recurrent neural networks," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[20] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in Proc. ICASSP, 2013.
[21] O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, 2014.
[22] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, 1994.
[23] A. Graves, "Practical variational inference for neural networks," in Advances in Neural Information Processing Systems, 2011.
[24] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, "DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus," U.S. Dept. of Commerce, NIST, Gaithersburg, MD, February 1993.
[25] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006.
[26] J. Li, D. Yu, J.-T. Huang, and Y. Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc. IEEE Spoken Language Technology Workshop, pp. 131-136, 2012.
[27] D. Yu, A. Eversole, M. Seltzer, et al., "An introduction to computational networks and the computational network toolkit," Microsoft Technical Report MSR-TR-2014-112, 2014.
[28] H. Jaeger, "Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the echo state network approach," GMD Report 159, GMD German National Research Institute for Computer Science, 2002.
[29] J.-T. Huang, J. Li, and Y. Gong, "An analysis of convolutional neural networks for speech recognition," in Proc. ICASSP, 2015.
[30] N. Parihar and J. Picone, "Aurora working group: DSR front end LVCSR evaluation AU/384/02," Tech. Rep., Institute for Signal and Information Processing, Mississippi State Univ., 2002.
[31] J. Li, L. Deng, Y. Gong, and R. Haeb-Umbach, Robust Automatic Speech Recognition: A Bridge to Practical Applications, Elsevier Press, 2015.

LSTM TIME AND FREQUENCY RECURRENCE FOR AUTOMATIC SPEECH RECOGNITION

LSTM TIME AND FREQUENCY RECURRENCE FOR AUTOMATIC SPEECH RECOGNITION LSTM TIME AND FREQUENCY RECURRENCE FOR AUTOMATIC SPEECH RECOGNITION Jinyu Li, Abderahman Mohamed, Geoffrey Zweig, and Yifan Gong Microsoft Corporation, One Microsoft Way, Redmond, WA 98052 { jinyi, asamir,

More information

arxiv: v1 [cs.ne] 5 Feb 2014

arxiv: v1 [cs.ne] 5 Feb 2014 LONG SHORT-TERM MEMORY BASED RECURRENT NEURAL NETWORK ARCHITECTURES FOR LARGE VOCABULARY SPEECH RECOGNITION Haşim Sak, Andrew Senior, Françoise Beaufays Google {hasim,andrewsenior,fsb@google.com} arxiv:12.1128v1

More information

IMPROVING WIDEBAND SPEECH RECOGNITION USING MIXED-BANDWIDTH TRAINING DATA IN CD-DNN-HMM

IMPROVING WIDEBAND SPEECH RECOGNITION USING MIXED-BANDWIDTH TRAINING DATA IN CD-DNN-HMM IMPROVING WIDEBAND SPEECH RECOGNITION USING MIXED-BANDWIDTH TRAINING DATA IN CD-DNN-HMM Jinyu Li, Dong Yu, Jui-Ting Huang, and Yifan Gong Microsoft Corporation, One Microsoft Way, Redmond, WA 98052 ABSTRACT

More information

Endpoint Detection using Grid Long Short-Term Memory Networks for Streaming Speech Recognition

Endpoint Detection using Grid Long Short-Term Memory Networks for Streaming Speech Recognition INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden Endpoint Detection using Grid Long Short-Term Memory Networks for Streaming Speech Recognition Shuo-Yiin Chang, Bo Li, Tara N. Sainath, Gabor Simko,

More information

Improving the Active Power Filter Performance with a Prediction Based Reference Generation

Improving the Active Power Filter Performance with a Prediction Based Reference Generation Improving the Active Power Fiter Performance with a Prediction Based Reference Generation M. Routimo, M. Sao and H. Tuusa Abstract In this paper a current reference generation method for a votage source

More information

BER Performance Analysis of Cognitive Radio Physical Layer over Rayleigh fading Channel

BER Performance Analysis of Cognitive Radio Physical Layer over Rayleigh fading Channel Internationa Journa of Computer ppications (0975 8887) Voume 5 No.11, Juy 011 BER Performance naysis of Cognitive Radio Physica Layer over Rayeigh fading mandeep Kaur Virk Dr. B R mbedkar Nationa Institute

More information

Resource Allocation via Linear Programming for Multi-Source, Multi-Relay Wireless Networks

Resource Allocation via Linear Programming for Multi-Source, Multi-Relay Wireless Networks Resource Aocation via Linear Programming for Muti-Source, Muti-Reay Wireess Networs Nariman Farsad and Andrew W Ecford Dept of Computer Science and Engineering, Yor University 4700 Keee Street, Toronto,

More information

Using RASTA in task independent TANDEM feature extraction

Using RASTA in task independent TANDEM feature extraction R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t

More information

Performance Measures of a UWB Multiple-Access System: DS/CDMA versus TH/PPM

Performance Measures of a UWB Multiple-Access System: DS/CDMA versus TH/PPM Performance Measures of a UWB Mutipe-Access System: DS/CDMA versus TH/PPM Aravind Kaias and John A. Gubner Dept. of Eectrica Engineering University of Wisconsin-Madison Madison, WI 53706 akaias@wisc.edu,

More information

Learning the Speech Front-end With Raw Waveform CLDNNs

Learning the Speech Front-end With Raw Waveform CLDNNs INTERSPEECH 2015 Learning the Speech Front-end With Raw Waveform CLDNNs Tara N. Sainath, Ron J. Weiss, Andrew Senior, Kevin W. Wilson, Oriol Vinyals Google, Inc. New York, NY, U.S.A {tsainath, ronw, andrewsenior,

More information

Rate-Allocation Strategies for Closed-Loop MIMO-OFDM

Rate-Allocation Strategies for Closed-Loop MIMO-OFDM Rate-Aocation Strategies for Cosed-Loop MIMO-OFDM Joon Hyun Sung and John R. Barry Schoo of Eectrica and Computer Engineering Georgia Institute of Technoogy, Atanta, Georgia 30332 0250, USA Emai: {jhsung,barry}@ece.gatech.edu

More information

Fuzzy Model Predictive Control Applied to Piecewise Linear Systems

Fuzzy Model Predictive Control Applied to Piecewise Linear Systems 10th Internationa Symposium on Process Systems Engineering - PSE2009 Rita Maria de Brito Aves, Caudio Augusto Oer do Nascimento and Evaristo Chabaud Biscaia Jr. (Editors) 2009 Esevier B.V. A rights reserved.

More information

Channel Division Multiple Access Based on High UWB Channel Temporal Resolution

Channel Division Multiple Access Based on High UWB Channel Temporal Resolution Channe Division Mutipe Access Based on High UWB Channe Tempora Resoution Rau L. de Lacerda Neto, Aawatif Menouni Hayar and Mérouane Debbah Institut Eurecom B.P. 93 694 Sophia-Antipois Cedex - France Emai:

More information

Convolutional Neural Networks for Small-footprint Keyword Spotting

Convolutional Neural Networks for Small-footprint Keyword Spotting INTERSPEECH 2015 Convolutional Neural Networks for Small-footprint Keyword Spotting Tara N. Sainath, Carolina Parada Google, Inc. New York, NY, U.S.A {tsainath, carolinap}@google.com Abstract We explore

More information

A BAG-OF-FEATURES APPROACH TO ACOUSTIC EVENT DETECTION. Department of Computer Science, TU Dortmund University, Dortmund, Germany

A BAG-OF-FEATURES APPROACH TO ACOUSTIC EVENT DETECTION. Department of Computer Science, TU Dortmund University, Dortmund, Germany A BAG-OF-FEATURES APPROACH TO ACOUSTIC EVENT DETECTION Axe Pinge, René Grzeszick, and Gernot A. Fink Department of Computer Science, TU Dortmund University, Dortmund, Germany ABSTRACT The cassification

More information
