Evaluation Model of Microblog Information Confidence Based on BP Neural Network

Yuguang Ye
Quanzhou Normal University; Quanzhou, 36, China

Abstract: As a carrier of social media, the microblog has become an important broadcasting tool for news. However, the characteristics of microblog platforms mean that they cannot provide effective mechanisms to prevent the transmission of rumours or false information. We therefore take the information surrounding the main text as features for microblog classification and integrate it with the text to form a mixed feature for classification-based feature extraction. A BP neural network is then established to evaluate the confidence of news from the mixed features. In the simulations, we use real data from certain large websites to analyse in detail the features and the model proposed in this paper. The results show that the improved evaluation model performs better at distinguishing the authenticity of news. The mixed features offer better discriminative power, so the model can effectively address confidence evaluation and rumour detection.

Keywords: microblog; rumour; comments; confidence; BP neural network; mixed feature.

1. Introduction

In recent years, with the rapid development of social media and the Internet, social networking has become an important tool for people to communicate and exchange information with each other. As an important form of social network, the microblog [1-4] plays an increasingly important role in the dissemination of news. In social networks, information spreads as fast as a virus. Once a piece of news enters a microblog platform, it is usually forwarded by thousands of users in a short time. However, most microblog users are ordinary users who, while disseminating news, usually lack the ability to tell true news from false, so rumours or false information are spread by an unsuspecting public.
This misleads users and can even have a great negative impact on social media as a whole [5,6]. Furthermore, if a social network is full of false information, users will stop trusting it, which fundamentally harms the healthy development of social media and social networks. Stopping rumours and false information at the source, rapidly and effectively, is an important and difficult problem.

Recently, more and more researchers have studied information dissemination on the Internet, mostly through data analysis of social network platforms. Traditional confidence evaluation and rumour detection mostly start from news dissemination, news content and news sources to analyse the confidence of news qualitatively, and there exist few quantitative models or methods. Some studies [7-9] use the broadcasting mechanism and the message content to establish quantitative confidence evaluation models. These studies all exploit the wide participation that characterises news broadcasting, but they neglect the comments and opinions that users add when they take part in the broadcasting process. Such comments and opinions are valuable for evaluating the confidence of news and for distinguishing the authenticity of a message. It is therefore necessary to establish an effective evaluation model based on the comments.

Based on the analysis above, this paper analyses the characteristics of the social network and uses mixed feature extraction to detect false news or rumours in microblogs rapidly and automatically through a confidence evaluation algorithm. It then provides an accurate judgment of rumours and false information so that their broadcasting can be prevented without affecting the operation of social media. In the algorithm design, we focus on the feature extraction of information and the selection of the discriminant model, trying to find a method that is better suited to distinguishing true from false information.
Compared with other information broadcasting models, social media messages have specific characteristics. They have a certain organizational structure, and more information about users' communication can be obtained, which makes it easier to judge whether information is true or false.

In the first section, we provide the relevant definitions for microblog comments. A classification method based on mixed features is then introduced. The third section uses the resulting classification to establish a microblog news confidence evaluation model based on a BP neural network, which is used to determine the authenticity of a message. The fourth section designs an experiment for the proposed scheme and analyses the results. Finally, we summarize our research and propose future work.

2. Microblog Information Classification Based on Mixed Features

If we ignore the pictures, video and similar content in a microblog and focus only on the words, the microblog can be regarded as a special short text. We analyse the similarities and differences between microblogs and other common short texts. We then define three quantitative characteristics of a microblog comment, namely supportability, content relevance and confidence, by means of natural language processing, statistical learning methods and so on. They are used to characterize the linguistic relationship between the source microblog and the comment content. These features represent different views of the comment content: supportability derives from the support analysis of the comment content and reflects users' preliminary judgment of the source microblog news; content correlation derives from the correlation analysis between the comment content and the source microblog news content; confidence derives from the confidence analysis of the comment content and reflects the uncertainty of the opinions that users express themselves. The analysis of the relevant information is shown in Table 1.
Table 1. Related information analysis of microblog

Feature              Relevant information of microblog                                Must appear  Relation with classification  Changed with time
Supportability       Name                                                             Yes          medium                        No
                     The number of concerns                                           Yes          low                           No
                     The number of fans                                               Yes          low                           Yes
                     The person being cited                                           No           low                           Yes
                     The number of news items that are forwarded                      Yes          low                           Yes
Content correlation  Title in brackets                                                Yes          low                           No
                     Content of microblog                                             Yes          low                           No
                     Related links                                                    Yes          high                          No
                     Publisher name                                                   Yes          high                          No
Confidence           Authentication                                                   Yes          high                          Yes
                     Personal information                                             No           high                          No
                     The information in comment content sent by authenticated users   No           medium                        No
                     Labeled tags                                                     No           low                           Yes

Journal of Residuals Science & Technology, Vol. 3, No. 5, 6 34. 6 DEStech Publications, Inc. doi:.783/issn.544-853/3/5/34

The body of a microblog is not as self-sufficient as traditional news [10]. Therefore, we have to introduce microblog-specific features to fill in the missing semantics, such as the publisher's personal information, the related links in the content, the comments that follow the microblog, etc. These contents can be introduced along with the microblog text and are called the mixed features of the microblog. For example, we may define the mixed feature as a quadruple $T = (T_1, T_2, T_3, T_4)$: $T_1$ includes the personal information of the author; $T_2$ is the text part; $T_3$ is the web content pointed to by the related links in the content; $T_4$ is the comment content of authenticated users. These four parts are integrated to fill in the missing information and to determine which class the microblog belongs to.

[Figure 1 diagram: T1 (Personal information), T2 (Body part), T3 (Related links) and T4 (Comment contents) are combined into T (Category).]
Figure 1. Classification method based on mixed characteristics of micro-blog

3. Confidence Evaluation Model

3.1 Principle Idea

Intuitively, users' support for rumours is weak, the relevance between the comment content and the content of the source microblog news is also weak, and the uncertainty of the comment content is strong. As mentioned above, the functions a microblog serves are very diverse. Although the microblog has become an important platform for disseminating news and information, the proportion of news among all microblogs is in fact not high [11]. Most microblog information falls into the category of gossip. To evaluate and predict confidence, we must first find the types of news that are useful for evaluation. First, feature extraction is applied to extract the content and user information. Next, the microblog enters the confidence evaluation system. With the mixed feature we can determine whether a microblog is gossip or useful news.
If it is the latter, it goes on to the next classification and evaluation mechanism, which computes the final confidence with a BP neural network. We thus design an algorithm framework for confidence evaluation, depicted in Figure 2.

[Figure 2 diagram: microblogs to be detected; supportability, content correlation and reliability; mixed characteristics; BP neural network; classification of true comments and false comments; detecting results.]
Figure 2. Microblog information confidence degree evaluation algorithm framework

3.2 Algorithm Description

This paper describes three features, supportability, content relevance and confidence, which measure the relation between the source microblog and the linguistic features of the comment content from different points of view. Compared with true microblogs, rumours attract more low-confidence comments, more negative comments and comments with lower content relevance [12]. The purpose of defining these features is therefore to distinguish the authenticity of the source microblog news and to assess its confidence.

[Figure 3 diagram: input layer (x1, x2, ..., xn), hidden layer and output layer (outputs y).]
Figure 3. The structure of BP neural network model

We define a news confidence evaluation model based on a BP neural network. It takes the three comment features as the input variables of the model and the authenticity of the source microblog message as the output variable. Since the input variables derive from microblog comments, there is no obvious or direct correspondence between the input and output variables, which may cause a loss of accuracy when a machine learning algorithm is used to construct the model. A BP neural network consists of an input layer, a hidden layer and an output layer [13]. Hidden variables lie between the input and output variables, and they can capture the indirect and implicit causality between them, which matches the application scenario of this paper. We therefore adopt the BP neural network algorithm to establish the model.
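The two-stage flow described above (mixed feature first, BP-based confidence second) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `MixedFeature` class, the `evaluate` function and the two callables passed to it are hypothetical names that do not appear in the paper, and both the gossip/news classifier and the confidence model are left as placeholders supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MixedFeature:
    """Hypothetical container for the quadruple T = (T1, T2, T3, T4)."""
    personal_info: str                 # T1: personal information of the author
    body: str                          # T2: text part of the microblog
    linked_content: str                # T3: web content behind the related links
    verified_comments: List[str] = field(default_factory=list)  # T4

    def as_text(self) -> str:
        """Concatenate the four parts into one text for the first-stage classifier."""
        return " ".join([self.personal_info, self.body,
                         self.linked_content, *self.verified_comments])


def evaluate(post: MixedFeature,
             is_news: Callable[[str], bool],
             confidence_model: Callable[[MixedFeature], float]) -> Optional[float]:
    """Stage 1 filters out gossip; stage 2 scores the remaining news.

    Returns None for gossip, otherwise the confidence from the second stage.
    Both callables are placeholders, not components defined in the paper.
    """
    if not is_news(post.as_text()):
        return None
    return confidence_model(post)
```

Keeping the two stages as separate callables mirrors the framework of Figure 2: the first-stage classifier can be replaced without touching the confidence model, and vice versa.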

Generally, we take the three comment features as the input variables of the model and the authenticity of the source microblog as the output variable. The structure of the model is depicted in Figure 3, which shows a three-layer BP neural network. The principle is as follows: the input signal acts on the output units through the hidden-layer units and, after a non-linear transformation, generates the output signal. Each training sample consists of an input vector and an expected output vector. The deviation between the network output and the expected output is used to adjust the connection weights between the input units and the hidden-layer units, the weights between the hidden-layer units and the output units, and the threshold of each neuron, so that the error descends along the gradient direction. Through repeated training iterations, we determine the network parameters that correspond to the minimum error.

We define the input vector as $x = (x_1, x_2, \ldots, x_n)$; the input vector of the hidden layer as $hi = (hi_1, hi_2, \ldots, hi_p)$; the output vector of the hidden layer as $ho = (ho_1, ho_2, \ldots, ho_p)$; the output vector of the output layer as $yo = (yo_1, yo_2, \ldots, yo_q)$; the expected output vector as $d = (d_1, d_2, \ldots, d_q)$; the connection weight between the input layer and the hidden layer as $w_{ih}$; the connection weight between the hidden layer and the output layer as $w_{ho}$; the threshold of each hidden-layer neuron as $b_h$; and the threshold of each output-layer neuron as $b_o$. The samples are indexed by $k = 1, 2, \ldots, m$, $f(\cdot)$ is the activation function, and the error function is

$$e = \frac{1}{2} \sum_{o=1}^{q} \big(d_o(k) - yo_o(k)\big)^2$$

Step 1: Variable and parameter initialization. Assign a random number in $(-1, 1)$ to each connection weight. Set the error function $e$, the computation accuracy $\varepsilon$ and the maximum number of learning iterations $M$.

Step 2: Choose the $k$-th sample and its expected output:

$$x(k) = \big(x_1(k), x_2(k), \ldots, x_n(k)\big), \qquad d(k) = \big(d_1(k), d_2(k), \ldots, d_q(k)\big)$$

Step 3: Compute the input and output of each neuron in the hidden layer and the output layer:

$$hi_h(k) = \sum_{i=1}^{n} w_{ih}\, x_i(k) - b_h, \qquad h = 1, 2, \ldots, p \tag{1}$$

$$ho_h(k) = f\big(hi_h(k)\big), \qquad h = 1, 2, \ldots, p \tag{2}$$

$$yi_o(k) = \sum_{h=1}^{p} w_{ho}\, ho_h(k) - b_o, \qquad o = 1, 2, \ldots, q \tag{3}$$

$$yo_o(k) = f\big(yi_o(k)\big), \qquad o = 1, 2, \ldots, q \tag{4}$$

Step 4: Use the expected output $d(k) = (d_1, d_2, \ldots, d_q)$ and the actual output $yo(k) = (yo_1, yo_2, \ldots, yo_q)$ of the network to compute the partial derivative $\delta_o(k)$ of the error function $e$ with respect to each neuron in the output layer:

$$\frac{\partial e}{\partial w_{ho}} = \frac{\partial e}{\partial yi_o}\,\frac{\partial yi_o}{\partial w_{ho}}, \qquad \frac{\partial yi_o(k)}{\partial w_{ho}} = ho_h(k), \qquad \frac{\partial e}{\partial yi_o} = -\big(d_o(k) - yo_o(k)\big)\, f'\big(yi_o(k)\big) \triangleq -\delta_o(k) \tag{5}$$

Step 5: Use the connection weight $w_{ho}$ from the hidden layer to the output layer, $\delta_o(k)$ of the output layer and the output $ho = (ho_1, ho_2, \ldots, ho_p)$ of the hidden layer to compute the partial derivative $\delta_h(k)$ of $e$ with respect to each neuron in the hidden layer:

$$\frac{\partial e}{\partial hi_h(k)} = -\Big(\sum_{o=1}^{q} \big(d_o(k) - yo_o(k)\big)\, f'\big(yi_o(k)\big)\, w_{ho}\Big) f'\big(hi_h(k)\big) = -\Big(\sum_{o=1}^{q} \delta_o(k)\, w_{ho}\Big) f'\big(hi_h(k)\big) \triangleq -\delta_h(k) \tag{6}$$

Step 6: Use $\delta_o(k)$ of each neuron in the output layer and the output of each neuron in the hidden layer to correct the connection weight $w_{ho}(k)$:

$$\Delta w_{ho}(k) = -\mu \frac{\partial e}{\partial w_{ho}} = \mu\, \delta_o(k)\, ho_h(k) \tag{7}$$

$$w_{ho}^{N+1} = w_{ho}^{N} + \eta\, \delta_o(k)\, ho_h(k) \tag{8}$$

Step 7: Use $\delta_h(k)$ of each neuron in the hidden layer and the input of each neuron in the input layer to correct the connection weight $w_{ih}(k)$:

$$\Delta w_{ih}(k) = -\mu \frac{\partial e}{\partial w_{ih}} = \mu\, \delta_h(k)\, x_i(k) \tag{9}$$

$$w_{ih}^{N+1} = w_{ih}^{N} + \eta\, \delta_h(k)\, x_i(k) \tag{10}$$

Here $\mu$ and $\eta$ are learning-rate coefficients and $N$ indexes the iteration.

Step 8: Compute the global error:

$$E = \frac{1}{2m} \sum_{k=1}^{m} \sum_{o=1}^{q} \big(d_o(k) - yo_o(k)\big)^2 \tag{11}$$

Step 9: Determine whether the learning error of the network meets the requirement. When the error reaches the preset precision or the number of learning iterations exceeds the preset maximum, learning ends; otherwise, choose the next learning sample and return to Step 3 for the next round of learning.

Once the model has estimated each parameter through training on the samples, the output for a test sample can be computed following the process in Step 3, which gives the classification result for that test sample. After the microblog comments are analysed by the BP neural network, there are two different use cases:

(1) Using the analysis results to identify the authenticity of microblog news, recognize false comments and establish a simple model. The classification results are treated as statistics: if the proportion of false comments is larger than 5 percent, the source microblog is a rumour; otherwise it is a true microblog.

(2) Using the analysis results to evaluate the confidence of microblog news. The model is constructed from the proportions of false and true comments. If the proportion of false comments is $r$ and the proportion of true comments is $t$, the confidence of microblog news can be calculated as:

$$\text{credibility} = t(1 - r) \tag{12}$$

4. Experiment and Results Analysis

To verify the performance of the confidence evaluation designed in this paper, we extracted 386 microblogs and the related user information from QQ.com. After filtering out context with meta-information such as forwarded messages, the final dataset was generated and the corresponding labelling work was completed.
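Returning briefly to the algorithm of Section 3.2, Steps 1 to 9 and the two use cases can be sketched in plain Python as below. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions that the paper does not state: a sigmoid activation $f$, a single learning rate `eta` used for every update, threshold updates derived from the same gradient (the paper only lists the weight updates), the credibility formula read as $t(1 - r)$, and a caller-supplied decision `threshold`. The names `BPNetwork`, `train` and `source_confidence` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Three-layer network with n inputs, p hidden units and q outputs."""

    def __init__(self, n, p, q, seed=0):
        rng = random.Random(seed)
        # Step 1: random initial weights and thresholds in (-1, 1).
        self.w_ih = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
        self.w_ho = [[rng.uniform(-1, 1) for _ in range(q)] for _ in range(p)]
        self.b_h = [rng.uniform(-1, 1) for _ in range(p)]
        self.b_o = [rng.uniform(-1, 1) for _ in range(q)]

    def forward(self, x):
        # Step 3: ho_h = f(sum_i w_ih x_i - b_h), yo_o = f(sum_h w_ho ho_h - b_o).
        ho = [sigmoid(sum(w[h] * xi for w, xi in zip(self.w_ih, x)) - b)
              for h, b in enumerate(self.b_h)]
        yo = [sigmoid(sum(w[o] * hh for w, hh in zip(self.w_ho, ho)) - b)
              for o, b in enumerate(self.b_o)]
        return ho, yo

    def train_step(self, x, d, eta):
        ho, yo = self.forward(x)
        # Step 4: delta_o = (d_o - yo_o) f'(yi_o); for a sigmoid f' = yo (1 - yo).
        delta_o = [(dk - y) * y * (1 - y) for dk, y in zip(d, yo)]
        # Step 5: delta_h = (sum_o delta_o w_ho) f'(hi_h), using the old w_ho.
        delta_h = [sum(do * w for do, w in zip(delta_o, self.w_ho[h])) * hh * (1 - hh)
                   for h, hh in enumerate(ho)]
        # Step 6: w_ho += eta * delta_o * ho_h.
        for h, hh in enumerate(ho):
            for o, do in enumerate(delta_o):
                self.w_ho[h][o] += eta * do * hh
        # Step 7: w_ih += eta * delta_h * x_i.
        for i, xi in enumerate(x):
            for h, dh in enumerate(delta_h):
                self.w_ih[i][h] += eta * dh * xi
        # Assumption: thresholds are subtracted, so they move the opposite way.
        self.b_o = [b - eta * do for b, do in zip(self.b_o, delta_o)]
        self.b_h = [b - eta * dh for b, dh in zip(self.b_h, delta_h)]

    def global_error(self, samples):
        # Step 8: E = 1/(2m) * sum_k sum_o (d_o(k) - yo_o(k))^2.
        total = sum((dk - y) ** 2
                    for x, d in samples
                    for dk, y in zip(d, self.forward(x)[1]))
        return total / (2 * len(samples))


def train(net, samples, eta=0.5, eps=1e-3, max_epochs=2000):
    """Step 9: repeat until the error is below eps or max_epochs is reached."""
    for _ in range(max_epochs):
        for x, d in samples:
            net.train_step(x, d, eta)
        if net.global_error(samples) < eps:
            break
    return net.global_error(samples)


def source_confidence(comment_is_true, threshold):
    """Apply both use cases to the per-comment classification results."""
    r = comment_is_true.count(False) / len(comment_is_true)  # share of false comments
    t = 1.0 - r                                              # share of true comments
    return {"rumour": r > threshold, "credibility": t * (1.0 - r)}
```

For example, `source_confidence([True, True, True, False], threshold=0.5)` treats one comment in four as false, so the source is not flagged as a rumour under that threshold.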
For this dataset, we obtain the mixed features with the method described in Section 2 to measure the content of the microblog comments. We then compute the mean value of these features over rumour comments and true microblog comments. The statistical results are shown in Table 2.

Table 2. Mean statistical results of mixed features in rumours and true microblog

                  Supportability   Content correlation   Confidence
True microblog    .355             .98                   .694
Rumours           -.65             .63                   .55

From Table 2 we can see that each mixed feature discriminates well between rumours and true comments. The mean supportability is only -.65 for false comments, showing that users commenting on rumours can effectively provide an initial judgment of the source microblog message. The average supportability of true comments is .355, which differs clearly from that of rumour comments. The content correlation and confidence features also differ between the comments on rumours and those on true microblogs, and the statistical results show that each feature is effective for discrimination. In addition, to observe the characteristics of rumour and true microblog comments more accurately, we compare the distribution of each feature in the two groups. The distributions are compared by probability density and the results are shown in Figure 4.
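The per-class means and the probability density comparison described above can be sketched as follows. This is an illustrative sketch only: the paper does not say how its densities were estimated, so a simple normalized histogram is assumed here, and `class_means`, `histogram_density` and the row layout `(label, (supportability, correlation, confidence))` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def class_means(rows):
    """rows: iterable of (label, (supportability, correlation, confidence))."""
    grouped = defaultdict(list)
    for label, features in rows:
        grouped[label].append(features)
    # zip(*vectors) regroups the values feature by feature.
    return {label: tuple(mean(col) for col in zip(*vectors))
            for label, vectors in grouped.items()}


def histogram_density(values, low, high, bins):
    """Piecewise-constant density estimate on [low, high] that integrates to 1."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            # Values equal to `high` fall into the last bin.
            counts[min(int((v - low) / width), bins - 1)] += 1
    n = sum(counts)
    return [c / (n * width) if n else 0.0 for c in counts]
```

Calling `histogram_density` once for the rumour comments and once for the true comments, with the same `low`, `high` and `bins`, gives two curves that can be compared bin by bin.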

[Figure 4 plots: probability density of real comments and false comments for (a) Supportability, (b) Content correlation and (c) Reliability.]
Figure 4. The probability density comparison of features in false comment and true microblog comments

Figure 4(a) indicates that the supportability feature has an obvious effect in distinguishing rumour comments from true microblog comments. Comments with a low support degree are widely distributed, and the participating users can make a timely initial identification of the authenticity of the source microblog. In Figure 4(b), the content correlation feature is polarized; that is, rumour source microblogs attract few comments related to the topic. The confidence feature has an approximately similar distribution in rumour and true comments. Comparing the two distributions, the confidence-degree density for rumour comments is larger than that for true microblog comments.

Increasing the number of hidden layers dramatically increases the number of model parameters and, with it, the complexity of the model. To simplify the model, a single hidden layer is used in this experiment. On the other hand, in view of the universal function approximation property, a sufficient number of hidden units gives the network a better fitting ability. However, too many hidden units lead to over-fitting of the training samples, which reduces the generalization ability of the network. The number of hidden-layer units should therefore not be too large, and there must be an optimal value. Through analysis of the relevant literature, combined with the actual application environment and the model structure, we set the number of hidden units to 9 after weighing the training and test errors. In the resulting model, the number of input units is 3 and the number of hidden-layer units is 9. The training curve of the model is depicted in Figure 5.

[Figure 5 plot: MSE of the training and test sets against the number of training steps.]
Figure 5. Model training graph when the number of hidden layer units is 9

Figure 5 shows that during the first 3 training iterations the error decreases markedly; after more than 3 iterations, the descent slows and the total error gets smaller as the number of iterations increases. When the iteration count approaches 4, the model has converged and the error reaches a relative minimum. The ROC curve of the model's classification results is depicted in Figure 6. The ROC curve shows that the model achieves high classification accuracy on both the training and the test samples. The accuracy of learning and of prediction is equivalent, which also indicates a good fit to the test samples and good generalization ability. The detailed classification results are shown in Table 3.

Table 3. Classification results statistical table

                  False comments   True comments
Rumours           445              94
True microblog    375              993
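Recall and precision for each comment class can be derived from a confusion table laid out like Table 3 (rows: actual class, columns: predicted class). The sketch below is illustrative only: the function names are hypothetical, the counts in the usage note are made up rather than taken from Table 3, and averaging the per-class values with equal weight is an assumption, since the paper does not say how its average rates were obtained.

```python
def per_class_metrics(table):
    """table[actual][predicted] holds the comment count for each class pair."""
    classes = list(table)
    metrics = {}
    for c in classes:
        true_pos = table[c][c]
        actual_total = sum(table[c].values())                # row sum
        predicted_total = sum(table[a][c] for a in classes)  # column sum
        metrics[c] = {
            "recall": true_pos / actual_total if actual_total else 0.0,
            "precision": true_pos / predicted_total if predicted_total else 0.0,
        }
    return metrics


def macro_average(metrics, key):
    """Unweighted mean of one metric over all classes (an assumed averaging rule)."""
    return sum(m[key] for m in metrics.values()) / len(metrics)
```

With the made-up table `{"false": {"false": 80, "true": 20}, "true": {"false": 10, "true": 90}}`, the recall of the false class is 80 out of the 100 comments in its row, and its precision is 80 out of the 90 comments in its column.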

From the table above, the recall rate of false comments is 76.5% and the precision rate is 79.8%; the recall rate of true comments is 87.% and the precision rate is 85.3%. Combining the classification results gives an average recall rate of 8. and an average precision rate of 8.%, which indicates high accuracy. The model therefore has good discriminability and can distinguish the comments on rumours from those on true microblogs.

[Figure 6 plot: ROC curve of the classification results.]
Figure 6. Model classification ROC curve when the number of hidden layer units is 9

5. Conclusion

This paper introduces a method of analysing and distinguishing information for confidence evaluation in social media. To address the problems in rumour detection and confidence evaluation on microblog platforms, we start from the comments on news and define three features to measure rumour and true comments, which breaks with the traditional view based on news broadcasting. We establish a confidence evaluation model based on a BP neural network. Real data are then used for a detailed experimental analysis of the effectiveness, discriminability and accuracy of the model, and for a comparison with traditional machine learning methods. Future work will focus on optimizing the evaluation algorithm using the correlated information of users, with the aim of obtaining more accurate results when detecting false comments or rumours.

Acknowledgements

The authors acknowledge the National Natural Science Foundation of China (Grant: 55789) and the National Natural Science Foundation of China (Grant: 55).

References

[1] Carlos Castillo, Marcelo Mendoza, Barbara Poblete. Information confidence on twitter. Proceedings of the International Conference on World Wide Web, : 675-684
[2] Liu Zitao, Yu Wenchao, Chen Wei, et al. Short text feature selection for microblog mining. Proceedings of International Conference on Computational Intelligence and Software Engineering, :-4
[3] Wang Dong, Al Amin Md Tanvir, Abdelzaher Tarek. Provenance-assisted classification in social networks. IEEE Journal on Selected Topics in Signal Processing, 4, 8(4): 64-637
[4] Wu TengFei, Zhou ChangLe, Wang XiaoHua. Microblog propagation network model based on mean-field theory. Acta Physica Sinica, 4, 63(4): 4-3
[5] Duan Dagao, Wang Changsheng, Han Zhongming. A Rumor Detection Model Based on Weibo Reviews. Computer Simulation, 6, 33(): 386-39
[6] Wang Chenxu, Qin Tao, Guan Xiaohong. A fast mining algorithm for interest community in directed networks and its application to detection of zombie fans. Journal of Xi'an Jiaotong University, 4, 48(6): 7-
[7] Montejo-Ráez A., Díaz-Galiano M.C., Martínez-Santiago F. Crowd explicit sentiment analysis. Knowledge-Based Systems, 4, 69: 34-39
[8] Huang Jiajia, Su Qiang. A rumor spreading model based on user browsing behavior analysis in microblog. Proceedings of International Conference on Service Systems and Service Management, 3: 7-73
[9] Bright Laura F., Kleiser Susan Bardi, Grau Stacy Landreth. Too much Facebook? An exploratory examination of social media fatigue. Computers in Human Behavior, 5, 44: 48-55
[10] Wu Ke, Yang Song, Zhu Kenny Q. False rumors detection on Sina Weibo by propagation structures. Proceedings of International Conference on Data Engineering, 5: 65-66
[11] Qazvinian Vahed, Rosengren Emily, Radev Dragomir R. Rumor has it: Identifying misinformation in microblogs. Proceedings of Conference on Empirical Methods in Natural Language Processing, : 589-599
[12] Ou Gaoyan, Chen Wei, Li Binyang. CLUSM: An unsupervised model for microblog sentiment analysis incorporating link information. Lecture Notes in Computer Science, 4, 84(): 48-494
[13] Ma Jing, Gao Wei, Wei Zhongyu. Detect rumors using time series of social context information on microblogging websites. Proceedings of International Conference on Information and Knowledge Management, 5: 75-754