Text Emotion Detection using Neural Network


International Journal of Engineering Research and Technology. ISSN 0974-3154, Volume 7, Number 2 (2014), pp. 153-159. International Research Publication House http://www.irphouse.com

Text Emotion Detection using Neural Network

Rupinder Singh, Veenu Mangat and Mandeep Kaur
University Institute of Engineering and Technology, Panjab University, Chandigarh
rupinder0590@gmail.com, veenumangat@yahoo.com, mandeep@pu.ac.in

Abstract

Text emotion detection refers to identifying the type of emotion expressed in a text. The process involves two phases: training and testing. The training phase trains the classifier on labelled text, and the testing phase identifies the emotion class of unseen text. The accuracy of the classifier is then measured by counting how many text labels it identifies correctly. This paper focuses on enhancing text emotion detection using a back-propagation neural network. The classification results improved by 5 to 10 percent.

KEYWORDS: Text emotion, Feature Extraction, Classification, Neural Networks

INTRODUCTION

Detecting the emotional state of a person by analyzing a text document he or she has written appears challenging, yet is often essential, because textual expressions are frequently not direct uses of emotion words but instead result from interpreting the meaning of concepts, and the interaction of concepts, described in the document. Recognizing the emotion of text plays a key role in human-computer interaction. Emotions may be expressed through a person's speech, facial expression, or written text, giving rise to speech-based, facial, and text-based emotion recognition respectively. A substantial amount of work has been done on speech and facial emotion recognition, but text-based emotion recognition still needs the attention of researchers [1].
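Before turning to the method, the overall train-then-test idea can be made concrete with a toy sketch. The lexicon, sentences, and function names below are invented for illustration only; they are not the system described in this paper:

```python
# Toy sketch of a train/test emotion classification loop.
# The emotion lexicon and the test sentences are made up for illustration.
LEXICON = {
    "happy": {"joy", "glad", "smile", "wonderful"},
    "sad":   {"cry", "loss", "alone", "miserable"},
    "angry": {"hate", "furious", "rage", "annoyed"},
}

def classify(text):
    """Return the emotion whose keywords occur most often in the text."""
    tokens = text.lower().split()
    scores = {emo: sum(t in words for t in tokens)
              for emo, words in LEXICON.items()}
    return max(scores, key=scores.get)

def accuracy(samples):
    """Fraction of (text, label) pairs the classifier gets right."""
    correct = sum(classify(text) == label for text, label in samples)
    return correct / len(samples)

test_set = [
    ("i am so glad and full of joy", "happy"),
    ("left alone to cry over the loss", "sad"),
    ("i hate this and i am furious", "angry"),
]
print(accuracy(test_set))  # 1.0 on this toy set
```

A real system replaces the fixed lexicon with learned features and a trained classifier, but the evaluation loop is the same: classify held-out texts and count correct labels.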
In computational linguistics, the detection of human emotions in text is becoming increasingly important from an applicative point of view. Emotion is expressed as joy, sadness, anger, surprise, hate, fear and so on. Since there is no standard emotion word hierarchy, the focus is on related research about emotion in the cognitive psychology domain. One notable work is Emotions In Social Psychology, in which the author explained the emotion system and formally classified human emotions through an emotion hierarchy with six classes at the primary level: Love, Joy, Anger, Sadness, Fear and Surprise. Other emotion words fall into secondary and tertiary levels [2].

Keyword Spotting Technique

The keyword pattern matching problem can be described as the problem of finding occurrences of keywords from a given set as substrings of a given string [4]. This problem has been studied in the past, and algorithms have been suggested for solving it. In the context of emotion detection, this method is based on certain predefined keywords, classified into categories such as disgusted, sad, happy, angry, fearful, surprised, etc. The process of the keyword spotting method is shown in figure 1. The technique consists of five steps, where a text document is taken as input and an emotion class is generated as output. First, the input text is converted into tokens. Next, emotion words are identified among these tokens, after which the intensity of the emotion words is analyzed. The sentence is then checked for negation, and finally an emotion class is produced as the required output.

Design of Neural Networks for Emotion Recognition

General description

With all the background knowledge in place, we can start the design of neural networks for emotion recognition. The 12 features obtained from the face tracker are used as the input to the 12 input nodes of a neural network. The output layer contains 2 to 7 nodes representing the emotion categories, depending on the network. There are 1 or 2 hidden layers, and the number of hidden nodes ranges from 1 to 29x29. The learning rate, momentum number, and the parameter of the sigmoid activation function are automatically adjusted during the training procedure. In some networks Powell's method is used, while in others a set of empirical techniques is combined, e.g. taking the peak frames of the emotion data sequence, sorting the training set, deleting some of the emotions, normalizing the output, and setting a threshold on the weights. The test results are based on the Cohn-Kanade database and on the authentic database separately. The activation function used is the sigmoid function. In the following sections, we present a description of all the parameters and the experimental results of their combinations [3].

Weights

Back-propagation is a gradient descent search, so it can easily stop at a local minimum; randomly selected weights help to avoid this. If the weights are too large, the network tends to saturate. The solution is to ensure that, after weight initialization and before learning, the output of every neuron is a small value in [-0.5, 0.5]. We initialize the weights with a random function and discard those weights larger than a specific threshold, which can also be adjusted as one of the parameters of the network. One question is whether we should constrain the weights during the training procedure as well. This partially depends on how large the input is, because the sigmoid function is very close to one when the input is greater than 10.
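This saturation claim is easy to verify numerically. The following small check is illustrative only and is not part of the original experiments:

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (0.5, 2.0, 10.0, 20.0):
    # The sigmoid's derivative is s*(1-s); near zero means learning stalls.
    s = sigmoid(x)
    print(f"x={x:5.1f}  sigmoid={s:.6f}  gradient={s * (1 - s):.2e}")
```

At x = 10 the output is already within 5e-5 of 1 and the gradient is on the order of 1e-5, which is why inputs of that magnitude effectively freeze a back-propagation update.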
Since our input feature data are very small, usually smaller than 2, we set a threshold on the weights, discarding those that exceed this constraint during training. It turned out that this did not make much difference to the hit rate. On the other hand, when we tried a very strict threshold in a 2-hidden-layer neural network during the training procedure, it sometimes led to awful performance of the 2-hidden-layer network. This is because the activation parameters we had set for that network were not appropriate for that threshold and caused the neurons to saturate. Hence, after initialization we gave the weights a very large threshold to avoid similar problems. Another thing to take care of is the starting point, which also affects the search direction towards a good local minimum (figure 2). If we start at point A, we obtain the global minimum, while from C we get only a local minimum. So we should try different starting points by initializing the weights with different random values. We tested this in some of the networks and found that the accuracy curve fluctuates, but not too much.
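The initialization scheme described above, i.e. small random weights with values beyond an adjustable threshold discarded, can be sketched as follows. The function name, dimensions, and threshold value are illustrative assumptions, not the paper's code:

```python
import random

def init_weights(n_in, n_out, threshold=0.5, seed=None):
    """Draw weights uniformly in [-0.5, 0.5], redrawing any value
    whose magnitude exceeds the given (adjustable) threshold."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_in):
        row = []
        for _ in range(n_out):
            w = rng.uniform(-0.5, 0.5)
            while abs(w) > threshold:   # discard out-of-range draws
                w = rng.uniform(-0.5, 0.5)
            row.append(w)
        weights.append(row)
    return weights

# 12 input nodes to a 7-node output layer, as in the network sizes above.
w = init_weights(12, 7, threshold=0.4, seed=1)
print(max(abs(x) for row in w for x in row))  # never exceeds 0.4
```

Keeping the initial weights small in this way keeps every neuron's pre-activation near zero, where the sigmoid's gradient is largest.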

Figure: Local minimum and global minimum

Key parameters and their combination

In the design of neural networks there are some critical parameters that need to be set, namely the learning rate α, the momentum number λ, and the activation function parameter σ. The speed of learning is governed by the learning rate α. The momentum number carries along the weight change and thus tends to smooth the error-weight space. An improper value of σ will cause the neurons to saturate. In general, the performance of a neural network will be very poor if these values are not chosen correctly. Unfortunately, there are no precise rules or mathematical definitions for when one should choose which particular value of these parameters; normally the setting is done empirically. Does it help in finding better combinations if we let the computer do part of the job? We tried this in the following way. First, we defined three different categories. The increase or decrease step size of α, λ, and σ is given by input or by macro definition, depending on how frequently these categories should be changed during the training procedure; for categories that need little interference, we use a macro definition for training efficiency. When better accuracy occurs, the rates together with the parameters that led to this accuracy are recorded in a file. We repeat the training until the accuracy stops improving for some turns. When testing, we construct a neural network by reading these parameters from the file. Since training with the complete set of combinations of these parameters takes quite a long time, we only tried a subset of the combinations of these three parameters, so it is possible that some better combinations were missed.

Feature Extraction

To achieve a successful classification, it is extremely important to extract the relevant features from the processed speech data.
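The parameter-search loop described in the key-parameters section above can be sketched as below. `train_and_score` is a hypothetical stand-in for one full training run, and the candidate values are invented for illustration:

```python
import itertools, json

def train_and_score(alpha, momentum, sigma):
    """Placeholder for one full training run returning test accuracy.
    A made-up formula stands in for the real network here."""
    return 1.0 - abs(alpha - 0.1) - abs(momentum - 0.5) - abs(sigma - 1.0)

best = {"accuracy": -1.0}
for alpha, momentum, sigma in itertools.product(
        (0.05, 0.1, 0.2), (0.3, 0.5, 0.7), (0.5, 1.0, 2.0)):
    acc = train_and_score(alpha, momentum, sigma)
    if acc > best["accuracy"]:   # record only strict improvements
        best = {"alpha": alpha, "momentum": momentum,
                "sigma": sigma, "accuracy": acc}

# Persist the winning combination so a test-time network can reload it,
# mirroring the record-to-file step described in the text.
print(json.dumps(best))
```

As the text notes, a coarse grid like this can miss better combinations; it simply automates the bookkeeping of which (α, λ, σ) setting produced the best accuracy so far.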
The most important features for emotion classification are summarized below.

Pitch

Pitch is the most distinctive difference between male and female voices. A person's pitch originates in the vocal cords/folds, and the rate at which the vocal folds vibrate is the frequency of the pitch. Various Pitch Detection Algorithms (PDAs) have been developed: the autocorrelation method, Harmonic Product Spectrum (HPS), the Average Magnitude Difference Function (AMDF) method, Cepstrum Pitch Determination (CPD), Simplified Inverse Filtering Tracking (SIFT), and Direct Time Domain Fundamental Frequency Estimation (DFE). Most of them have very high accuracy for voiced pitch estimation, but the error rate of the voicing decision is still quite high. Moreover, PDA performance degrades significantly as signal conditions deteriorate. Automatic glottal inverse filtering and iterative adaptive inverse filtering (IAIF) were used as computational tools to obtain an accurate pitch estimate.

Formants

The formants are one of the quantitative characteristics of the vocal tract. In the frequency domain, the locations of the vocal tract resonances depend on the shape and the physical dimensions of the vocal tract. Each formant is characterized by its center frequency and its bandwidth, as in [4]. The formants can be used to discriminate carefully articulated speech from slackened speech: the formant bandwidth during slackened articulation is gradual, whereas during careful articulation it is narrow with steep flanks. A simple method to estimate formant frequencies and formant bandwidths relies on linear prediction analysis.

Energy

Energy is one of the most important features because it carries good information about the emotion. The long-term energy of a signal is defined as in (1):

Energy = Σ_m [x_normalised(m)]^2   (1)

This definition is of little or no utility for time-varying signals such as speech.
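Of the pitch detection algorithms listed in the Pitch section above, the autocorrelation method admits a compact sketch: the fundamental period shows up as the lag of the strongest autocorrelation peak. The synthetic test tone and the lag bounds here are illustrative assumptions:

```python
import math

def autocorr_pitch(signal, sample_rate, f_lo=50.0, f_hi=500.0):
    """Estimate pitch as the lag maximizing the autocorrelation,
    searched over a plausible range of speaking fundamentals."""
    lag_min = int(sample_rate / f_hi)
    lag_max = int(sample_rate / f_lo)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag

# A 200 Hz sine sampled at 8 kHz: the estimate should land near 200 Hz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(round(autocorr_pitch(tone, sr)))
```

On real speech, the voicing decision and noisy conditions make this much harder, which is the weakness of PDAs the text points out.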
Because of this, the short-term energy contour is evaluated instead, since it is related to the arousal level of emotions, as in (2):

Energy_n = Σ_{m=n-N+1}^{n} [x(m) w(n-m)]^2   (2)

where w(n-m) is the window, n is the sample on which the analysis window is centered, and N is the window size.

ALGORITHM

1. Upload files for all categories (SAD, HAPPY, ANGRY)
2. Store the values in the database
3. Target (1:4 for every category)
4. Y = Net.train(Uploaded_Values_files, targets, epochs)
5. Upload a value for testing
6. Test Sample = Input Sample
7. G = Newff(Y, Uploaded Set, 10), where 10 is the number of neurons
8. If G <= 1
9. CATEGORY SAD
10. Else if 1 < G < 2
11. CATEGORY HAPPY
12. Else if 2 < G < 3
13. CATEGORY ANGRY

RESULT OF CLASSIFICATION

The results are categorized as below. The figure represents the main working window where training and testing can be performed: the left side contains the training part, whereas the right side contains the testing part. The training section involves uploading the files, and the testing section performs the classification. The figure above represents the configuration of the classification using a neural network, in which a back-propagation neural network has been invoked. The input contains two layers: the first layer is the input set of data which the user has uploaded to be tested, and the second input is the stored database values. The output is computed according to the estimation values described in the algorithm. The results can be classified according to the following matrix:

FILE CONTENT LENGTH | CATEGORY SAD | CATEGORY HAPPY | CATEGORY ANGRY
100 words           | 85 %         | 92 %           | 93 %
200 words           | 89 %         | 94 %           | 95 %
250 words           | 89.2 %       | 93.7 %         | 96.3 %

The figure above represents the false positive ratio of the contents when plotted in the real-time scenario, in which Sad has the maximum occurrence, followed by Happy and then Angry. The current research work opens many doors for future researchers. The current system does not handle mixed-emotion data, and the classifiers can also be upgraded to BFO.

REFERENCES

[1] Nicu Sebe, Michael S. Lew, Ira Cohen, Ashutosh Garg and Thomas S. Huang, "Emotion Recognition Using a Cauchy Naive Bayes Classifier", ICPR, 2002.
[2] G. Littlewort, I. Fasel, M. Stewart Bartlett and J. Movellan, "Fully automatic coding of basic expressions from video", University of California, San Diego.
[3] C. Maaoui, A. Pruski and F. Abdat, "Emotion recognition for human machine communication", Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 08), IEEE Computer Society, Sep. 2008, pp. 1210-1215, doi: 10.1109/IROS.2008.4650870.
[4] Chun-Chieh Liu, Ting-Hao Yang, Chang-Tai Hsieh and Von-Wun Soo, "Towards Text-based Emotion Detection: A Survey and Possible Improvements", International Conference on Information Management and Engineering, 2009.
[5] N. Fragopanagos and J. G. Taylor, "Emotion recognition in human computer interaction", Neural Networks 18 (2005), pp. 389-405, March 2005.
