Predicting User Activity Level in Social Networks

Yin Zhu, Erheng Zhong, Sinno Jialin Pan, Xiao Wang, Minzhe Zhou, Qiang Yang

Hong Kong University of Science and Technology, Hong Kong, China; Institute for Infocomm Research, Singapore; Renren Inc., Beijing, China; Huawei Noah's Ark Lab, Hong Kong, China

{yinz,ezhong,qyang}@cse.ust.hk, jspan@i2r.a-star.edu.sg, {xiao.wang,minzhe.zhou}@renren-inc.com

ABSTRACT

Social media such as Facebook, Renren and Twitter provide an ideal ground to study how to predict users' future activities based on their past social behavior. An important measure of the behavior is activity level, such as users' level of weekly activeness, or binary classifications in terms of active or inactive. This prediction problem is closely related to Social Customer Relationship Management (Social CRM). Compared to traditional CRM, Social CRM exhibits some special characteristics, in terms of user diversity, social influence, and the dynamic nature of social networks. Users' social diversity property implies that a global predictive model may not be precise for all users. However, the historical data of individual users are too sparse to enable high-quality personalized models. The social influence property suggests that relationships between users can be embedded to further boost the prediction results on individual users. Finally, the dynamic nature of social networks means that users' behaviors change over time. To address these challenges, we develop a personalized and socially regularized time-decay model for accurate user activity level prediction. We conduct experiments on the social media platform Renren to validate the effectiveness of our proposed model and to demonstrate its superior performance when compared with traditional supervised learning methods as well as node classification methods in social networks.

Categories and Subject Descriptors

H.2.8 [Database Applications]: [Data mining]

Keywords

Social Network Analysis, User Activity, Prediction

1. INTRODUCTION

The number of active users in a social network is a critical measure of its popularity, which can be used as a signal of investment value for investors.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. CIKM'13, Oct. 27-Nov. 1, 2013, San Francisco, CA, USA. Copyright 2013 ACM 978-1-4503-2263-8/13/10...$15.00.

[Figure 1: The weekly active days of three Renren users over 25 weeks. Axes: Week (1 to 25, 18 Apr 2012 to 9 Oct 2012) vs. Weekly Active Days (0 to 7); curves: Alice, Bob, Candy.]

In many social network companies' quarterly reports, such as those of Facebook (FB)¹ and Renren (RENN)², the number of Monthly Active Users (MAUs) and other user activeness measures are published. Because these numbers are strong indicators of popularity and investment value, social network companies adopt various strategies to attract new users and retain old users, e.g., building faster and more stable services, providing better recommendations, developing innovative UIs, and supporting personalized services. An important strategy for increasing the number of active users is to give incentives to users who are inactive or about to become inactive. But if a user has already been inactive for a long time (i.e., a lost user), it is much harder to activate the user again than when he/she only shows a sign of becoming inactive. This fact motivates us to explore how to predict a user's future activeness, either in terms of the level of activity or as a binary classification into active or inactive. By accurately predicting the future activity levels of users, we can track potentially lost users at an early stage and give them incentives to stay active. The prediction model can also shed light on which user behaviors correlate with their future activity levels.
These insights can help improve user-maintenance strategies.

Figure 1 shows the weekly online days of three users over 25 weeks from Apr. 18, 2012 to Oct. 9, 2012 in Renren (the details of the dataset are described in the experiment section). Among the three users, Alice and Bob are close friends and exhibit similar patterns of weekly online days. Candy becomes relatively inactive after the 11th week, and has no more online actions since the 18th week. If we were able to identify a user like Candy who was still active at some time point but was about to reduce his/her online activities afterwards, we could try to give them incentives to make them remain active in the social network, e.g., by providing them new services, free e-gifts, gaming points, etc. If we define a weekly active user as a user who is online for at least three days during one week, then Alice and Bob are mostly weekly active while Candy is inactive after the 11th week. Our goal is to make an early prediction in the 11th week to recognize that Candy would be potentially inactive in the coming week based on her online behaviors during the first 11 weeks.

¹ Reports are available at http://investor.fb.com
² Reports are available at http://ir.renren-inc.com

This problem has also been studied in traditional Customer Relationship Management (CRM) for many years [5, 16]. For example, the three data mining tasks in KDD Cup 2009 organized by Orange [17] were to make predictions on mobile phone users, including switching providers, buying new products, and upgrading services. In particular, predicting users who have strong tendencies to leave a service provider is called Churn Prediction [10, 21, 22]. These prediction tasks are usually solved by constructing useful features and building a good classifier, or an ensemble of a few classifiers, with the features.

However, in the context of social networks, new challenges arise. First, in a social network, users may be diverse, and their behavior patterns can be very different from each other. As a result, a global prediction model trained on all users may fail to generalize well on individuals. On the other hand, historical data of individual users are too sparse to train personalized models precisely. Second, the dynamic nature of social network services and individual behaviors cannot be captured well in a static model. As a result, predictive models which contain a factor to model behavior changes are more desirable. Last but not least, in a social network, users are more or less influenced by their friends, and close social friends may exhibit similar activity patterns. Therefore, how to leverage the social network structure is critical to making good predictions on user activeness.
Note that the above three challenges are not specific to the activity level prediction problem. One or two of them are common to various prediction tasks. For example, the tasks of email spam detection [4] and email importance ranking [1] share the same problem of personalization, typical time series modeling needs to deal with the dynamics in the temporal space [31], and recent social recommender systems and node classification in social networks usually need to encode social structures into model learning [25, 38]. However, dealing with the three challenges simultaneously makes our problem unique, and we propose a learning model to overcome the three challenges in a unified learning framework.

Our proposed solution starts with a simple model based on logistic regression. We then extend the base model by adding different terms to capture the three properties of the user activity level prediction task, resulting in a unified optimization problem. Specifically, to address the user diversity issue, we propose to decompose the model into two parts: the common part, which is for global optimization over all users, and the user-specific part, which is for personalized optimization on specific users. However, as described above, historical data of each individual user are too sparse to optimize personalized models independently. Therefore, we propose to jointly learn personalized models for individual users by making use of the common part of the models as a bridge. To model the dynamics in user behavior, we introduce a time-decay term to penalize out-of-date training data. To model social influence, we deploy a social regularization term for smooth predictions over close friends and groups of users whose activity levels are also close to each other.

In summary, the main contributions of this paper include:

- We propose a learning model that encodes users' personalization, social influence and dynamic behaviors into a unified optimization framework for the user activity level prediction task.
- We validate the effectiveness of our proposed model by comparing it with several baselines on a large-scale real-world social media platform, Renren.
- Furthermore, for this specific social media platform, we constructed three categories of useful features and show their effectiveness in activity level prediction.

Table 1: Definition of Notations

| Group | Notation | Description |
|---|---|---|
| Data | $G$ | Social network |
| Data | $N$ | Number of users |
| Data | $T$ | Number of time periods |
| Data | $x_i^t$ | Feature vector of the $i$-th user in period $t$ |
| Data | $y_i^t \in \{+1, -1\}$ | Activeness status (inactive vs. active) |
| Data | $p$ | Number of features |
| Data | $S^t$ | Users' social tie matrix during time period $t$ |
| Model | $w_0$ | Common model coefficients |
| Model | $w_i$ | Model coefficients of the $i$-th user |
| Model | $\gamma, \gamma_0, \beta$ | Regularization parameters |
| Model | $\alpha$ | Time-decay parameter |

The rest of the paper is organized as follows: In Section 2, we define the activity level prediction task as a classification problem and summarize the notations used throughout the paper. In Section 3, we start by introducing a base logistic regression model, then propose a unified framework to encode three components to capture the three properties of the task, and finally show how to optimize the unified model. After that we conduct extensive experiments in Section 4. We finally review some related work and conclude this paper in Sections 5 and 6 respectively.

2. PROBLEM FORMULATION

We formally define the problem of user activity level prediction in this section. The definition of notations can be found in Table 1. In a social network $G$, suppose there are $N$ users, and each user $i$ in the time period $t$ can be represented by a $p$-dimensional feature vector, $x_i^t \in \mathbb{R}^p$. This feature vector represents various user activity information up to the time period $t$. The corresponding label of $x_i^t$, denoted by $y_i^t \in \{+1, -1\}$, is the activeness status (inactive vs. active) in the next period $t + 1$. The goal is to learn a prediction function $f$ that takes the features $x_i^t$ of user $i$ in the time period $t$ as inputs and predicts his/her activeness status $y_i^t$ in the next time period $t + 1$. We use a matrix $S^t$ to represent the social ties between users during the time period $t$.
If user $i$ and user $j$ are not friends during the time period $t$, then $S^t_{ij} = 0$; otherwise, $S^t_{ij}$ is defined as

$$S^t_{ij} = \frac{I(i, j)}{\sum_{k \in \mathcal{N}_i} I(i, k)}, \tag{1}$$

where $I(i, j)$ denotes the number of social interactions (e.g., leaving messages, visiting homepages, etc.) between user $i$ and user $j$, and $\mathcal{N}_i$ denotes the set of friends of user $i$. Here we assume that a larger value of $S^t_{ij}$ suggests a stronger friendship between users $i$ and $j$ in time period $t$.

3. PERSONALIZED TIME-DECAY LOGISTIC REGRESSION WITH SOCIAL REGULARIZATION

As stated in the previous section, we formulate the activity level prediction task as a binary classification problem. In this section, we propose a model to address it based on logistic regression. Generally, logistic regression builds a linear function on input features, and predicts target labels using the sigmoid function as follows,

$$\hat{y} = \sigma(w^\top x) = \frac{1}{1 + \exp(-w^\top x)},$$

where $x$ is a data instance, $\hat{y}$ is the corresponding prediction, and $w$ is the coefficient vector to be learned. By using logistic regression, a base model for activity level prediction can be formulated as learning an optimal solution $w_0$ by solving the following minimization problem,

$$\min_{w_0} J = \sum_{i=1}^{N} \sum_{t=1}^{T} l\big(y_i^t, w_0^\top x_i^t\big) + \gamma_0 \|w_0\|_2^2, \tag{2}$$

where $T$ is the number of time periods observed so far, $\gamma_0$ is a parameter on the regularization term $\|w_0\|_2^2$ penalizing the model complexity, and the loss function $l(y, w_0^\top x)$ is defined as

$$l\big(y_i^t, w_0^\top x_i^t\big) = \log\big(1 + \exp(-y_i^t \, w_0^\top x_i^t)\big). \tag{3}$$

An advantage of using logistic regression as the base classifier is that it can generate a probabilistic output of a user being inactive or active. This is important in real-world applications, as users can be ranked according to their probabilities of being active or inactive so that different marketing strategies can be adopted based on the ranking.

However, this base model fails to embed the three important properties of the activity level prediction task in learning: user diversity, dynamic behaviors and social influence. As stated in Section 1, different users may have very different activities, and their activities may be further influenced by their friends. Furthermore, users' activities can change over time. Therefore, in the following, we show how to extend the base logistic regression model to capture these three specific characteristics.
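The base objective in (2) with the loss in (3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the array layout (one row per user-period pair) and the toy data are assumptions made here for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def base_objective(w0, X, y, gamma0):
    """Eq. 2: summed logistic loss (Eq. 3) plus gamma0 * ||w0||_2^2.

    X: (n_samples, p) array, one row per (user, period) pair (assumed layout).
    y: (n_samples,) array with entries in {+1, -1}.
    """
    margins = y * (X @ w0)
    # log(1 + exp(-m)) evaluated stably as logaddexp(0, -m)
    loss = np.logaddexp(0.0, -margins).sum()
    return loss + gamma0 * np.dot(w0, w0)

# Toy data, made up for illustration only.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([+1.0, -1.0, +1.0])
w0 = np.zeros(2)
print(base_objective(w0, X, y, gamma0=0.1))  # equals 3 * log(2) at w0 = 0
```

At `w0 = 0` every margin is zero, so each sample contributes `log 2` and the regularizer vanishes, which gives a quick sanity check on the loss.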
3.1 Personalization

As mentioned, different users may have different behaviors, and the correlations between users' behaviors and their activity levels may be different. For example, some users may prefer to leave messages on their friends' homepages while others may prefer to talk with them directly using web chat. In addition, among those who log in to a social media system frequently, some may only visit their friends' homepages, but others may have lots of interactions with their friends. This implies that for different users, the prediction models on activity levels should be different. Therefore, a global model, in particular the $w_0$ learned in (2), cannot make precise predictions on all users.

Inspired by multi-task learning [15], which aims to learn a set of different but related tasks jointly by exploring the commonality across tasks, we consider building a personalized model for an individual user as a task. We further assume that for each task, the predictive model can be decomposed into two parts. One is referred to as a common part shared by multiple tasks and the other is referred to as a specific part for individual tasks. Therefore, we introduce a user-specific parameter $w_i$ for each user $i$, and plug it into (2) as follows,

$$\min_{w_0, \{w_i\}_{i=1}^{N}} J = \sum_{i=1}^{N} \sum_{t=1}^{T} l\big(y_i^t, (w_0 + w_i)^\top x_i^t\big) + \gamma_0 \|w_0\|_2^2 + \gamma \sum_{i=1}^{N} \|w_i\|_2^2, \tag{4}$$

where $\gamma$ is a parameter on the regularization terms for individual users. Note that the global knowledge across users can be modeled through the global parameter $w_0$, and users' specific patterns can be captured by the user-specific parameters $\{w_i\}$. The tradeoff between commonality across users and extreme personalization lies in the ratio between the values of $\gamma_0$ and $\gamma$. For instance, we can set $\gamma$ to a relatively small value to allow more personalization; this, however, may cause overfitting to individual users' training data.

3.2 Dynamical Modeling

Another important property in user activity level prediction is the dynamic nature of users' behaviors over time.
For example, a user may interact with his/her friends frequently when he/she is still a student, but may only visit his/her friends' homepages after graduation due to the lack of time. Another example is that a user may be active when a new application or game is launched on a social media platform, but may become inactive when he/she loses interest in it. In summary, a user's activity levels may be similar over a short time span but may become more and more different over a long one. Therefore, motivated by the network dynamic model [31], we further introduce a term, $e^{-\alpha(T-t)}$, to model time decay in the personalized model in (4) and reduce the impact of out-of-date training data,

$$\min_{w_0, \{w_i\}_{i=1}^{N}} J = \sum_{i=1}^{N} \sum_{t=1}^{T} e^{-\alpha(T-t)} \, l\big(y_i^t, (w_0 + w_i)^\top x_i^t\big) + \gamma_0 \|w_0\|_2^2 + \gamma \sum_{i=1}^{N} \|w_i\|_2^2, \tag{5}$$

where $\alpha$ is a parameter to control the decay rate. Note that the weights of the training data decrease exponentially with their age $T - t$. Based on the objective in (5), more recent training data play more important roles in model learning.

3.3 Social Regularization

The major difference between social and traditional user activity level prediction is social influence. In social networks, users usually interact with their friends, and thus their activity levels tend to be influenced by their friends, and vice versa. Intuitively, a user may be active if a few of his/her friends are active, while he/she may become inactive if most of his/her friends are inactive. Furthermore, the impact of the social influence may depend on the strength of the relationships, as users may only be influenced by their close friends. Formally, we introduce the following regularization term to smooth the prediction results so that the prediction of a user's activity level is similar to that of his/her close friends,

$$\sum_{i=1}^{N} \sum_{t=1}^{T} \sum_{j \in C_i^t} \big(w_i^\top x_i^t - w_j^\top x_j^t\big)^2, \tag{6}$$

where $C_i^t$ is the set of user $i$'s close friends based on social interaction counts in the $t$-th time period. The subset $C_i^t$ is chosen from the full list of user $i$'s friends based on (1), as will be discussed in the experiments. By adding this regularization term, knowledge in users' social relations can be encoded into the model. We notice that the social regularization term is only applied to users' close friends, which is consistent with real-world applications, as most users are influenced by only a few close friends. This also brings two advantages: (1) it accelerates the model computation as less data are considered, and (2) it boosts the model performance in activity level prediction as most irrelevant data are eliminated, which will be verified in the experiments.

3.4 Overall Optimization Problem

We now embed all components described in Section 3 into a unified optimization problem for user activity level prediction as follows,

$$\min_{w_0, \{w_i\}_{i=1}^{N}} J = \sum_{i=1}^{N} \sum_{t=1}^{T} e^{-\alpha(T-t)} \, l\big(y_i^t, (w_0 + w_i)^\top x_i^t\big) + \beta \sum_{i=1}^{N} \sum_{t=1}^{T} \sum_{j \in C_i^t} \big(w_i^\top x_i^t - w_j^\top x_j^t\big)^2 + \gamma_0 \|w_0\|_2^2 + \gamma \sum_{j=1}^{N} \|w_j\|_2^2. \tag{7}$$

We call this model Personalized Time-Decay Logistic Regression with Social Regularization (SocTPerLR). By learning model coefficients with multiple regularization terms collectively, personalization, social ties and dynamical knowledge can be encoded to build a more accurate model.

To learn the optimal solutions of $w_0$ and $\{w_i\}_{i=1}^{N}$, we propose to use gradient descent methods. With the loss in (3), and with constant factors absorbed into $\beta$, $\gamma_0$ and $\gamma$ and the friends' predictions $w_j^\top x_j^t$ held fixed while $w_i$ is updated, the derivatives of the objective $J$ with respect to $w_0$ and each $w_i$ can be computed as

$$\frac{\partial J}{\partial w_0} = -\sum_{i=1}^{N} \sum_{t=1}^{T} e^{-\alpha(T-t)} \, y_i^t \, \sigma\big(-y_i^t (w_0 + w_i)^\top x_i^t\big) \, x_i^t + \gamma_0 w_0,$$

$$\frac{\partial J}{\partial w_i} = -\sum_{t=1}^{T} e^{-\alpha(T-t)} \, y_i^t \, \sigma\big(-y_i^t (w_0 + w_i)^\top x_i^t\big) \, x_i^t + \beta \sum_{t=1}^{T} \sum_{j \in C_i^t} \big(w_i^\top x_i^t - w_j^\top x_j^t\big) \, x_i^t + \gamma w_i.$$

Based on the above derivatives, we can update $w_0$ and $w_i$ alternately by using the following rules until the solutions converge,

$$w_0 \leftarrow w_0 - \eta \frac{\partial J}{\partial w_0}, \tag{8}$$

$$w_i \leftarrow w_i - \eta \frac{\partial J}{\partial w_i}, \tag{9}$$

where $\eta$ is the learning rate. That is, in each iteration, we first fix all $w_i$ ($i = 1, \dots, N$) and optimize $w_0$, and then we fix $w_0$ and optimize each $w_i$. The overall algorithm for SocTPerLR is summarized in Algorithm 1.

3.5 Computational Analysis

The computational cost in each iteration (Steps 5-8 in Algorithm 1) is approximately the cost of running two plain logistic regressions [27] on the whole dataset. In our experiments, we show the convergence speed empirically.
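As a rough illustration of the objective in (7) and the alternating updates in (8) and (9), the following NumPy sketch may help. It is not the authors' code and makes several assumptions: features are stored as an array `X` of shape `(N, T, p)`, the close-friend sets `close[i]` are taken to be the same in every period, all user-specific vectors are updated in one vectorized step rather than one user at a time, and the gradient is the exact derivative of (7) (factor 2 kept explicit, including the contribution from users who list `i` as a close friend). All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decay_weights(T, alpha):
    """Weights e^{-alpha (T - t)} for t = 1..T; the newest period gets weight 1."""
    return np.exp(-alpha * (T - np.arange(1, T + 1)))

def objective(w0, W, X, y, close, alpha, beta, gamma0, gamma):
    """Eq. 7. X: (N, T, p), y: (N, T) in {+1, -1}, W: (N, p), w0: (p,)."""
    N, T, _ = X.shape
    d = decay_weights(T, alpha)
    s = np.einsum("itp,ip->it", X, W + w0)   # (w0 + w_i)^T x_i^t
    u = np.einsum("itp,ip->it", X, W)        # w_i^T x_i^t
    loss = (d * np.logaddexp(0.0, -y * s)).sum()
    social = sum(((u[i] - u[j]) ** 2).sum() for i in range(N) for j in close[i])
    return loss + beta * social + gamma0 * w0 @ w0 + gamma * (W * W).sum()

def gradients(w0, W, X, y, close, alpha, beta, gamma0, gamma):
    """Exact gradients of Eq. 7 with respect to w0 and every w_i."""
    N, T, _ = X.shape
    d = decay_weights(T, alpha)
    s = np.einsum("itp,ip->it", X, W + w0)
    u = np.einsum("itp,ip->it", X, W)
    g = -d * y * sigmoid(-y * s)             # derivative of the weighted loss w.r.t. s
    g_w0 = np.einsum("it,itp->p", g, X) + 2.0 * gamma0 * w0
    g_W = np.einsum("it,itp->ip", g, X) + 2.0 * gamma * W
    for i in range(N):
        for j in close[i]:
            diff = u[i] - u[j]               # shape (T,)
            g_W[i] += 2.0 * beta * diff @ X[i]
            g_W[j] -= 2.0 * beta * diff @ X[j]
    return g_w0, g_W

def fit(X, y, close, alpha=0.01, beta=0.1, gamma0=1.0, gamma=1.0,
        eta=0.01, max_iter=200, seed=0):
    """Alternating gradient descent: step on w0, then on all w_i."""
    rng = np.random.default_rng(seed)
    N, _, p = X.shape
    w0 = 0.01 * rng.standard_normal(p)
    W = 0.01 * rng.standard_normal((N, p))
    args = (X, y, close, alpha, beta, gamma0, gamma)
    for _ in range(max_iter):
        g_w0, _ = gradients(w0, W, *args)    # fix all w_i, update w0 (Eq. 8)
        w0 = w0 - eta * g_w0
        _, g_W = gradients(w0, W, *args)     # fix w0, update all w_i (Eq. 9)
        W = W - eta * g_W
    return w0, W

# Toy data, randomly generated for illustration only.
rng = np.random.default_rng(1)
N, T, p = 4, 6, 3
X = rng.standard_normal((N, T, p))
y = np.where(rng.random((N, T)) < 0.5, 1.0, -1.0)
close = [[1], [0], [3], [2]]
w0, W = fit(X, y, close)
print(objective(w0, W, X, y, close, 0.01, 0.1, 1.0, 1.0))
```

A fixed iteration count stands in for the convergence test; in practice one would stop when the change in the objective falls below a tolerance.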
Notice that Steps 6-8 in Algorithm 1 can be massively parallelized because the $N$ user-specific models can be trained independently. The speedup factor of such data-parallel tasks [34] is proportional to the number of CPU cores the program is given. Furthermore, the optimization of the global model (Step 5) can also be done in parallel using techniques presented in [24]. Therefore, the implementation of Algorithm 1 is fast and can scale up to distributed environments.

Algorithm 1: Gradient Descent Optimization for SocTPerLR
1: Input: user features $X = \{X^t = \{x_i^t\}_{i=1}^{N}\}_{t=1}^{T}$, labeled data $Y$, regularization parameters $\gamma_0$, $\gamma$, $\beta$, time-decay parameter $\alpha$, learning rate $\eta$ and maximal number of iterations $I$
2: Output: common model $w_0$ and specific models for all users $\{w_i\}_{i=1}^{N}$
3: Generate $w_0$ and all $w_i$ randomly
4: for iter = 1 to $I$ do
5:   Fix other parameters, and update $w_0 \leftarrow w_0 - \eta \, \partial J / \partial w_0$
6:   for $i$ = 1 to $N$ do
7:     Fix other parameters, and update $w_i \leftarrow w_i - \eta \, \partial J / \partial w_i$
8:   end for
9:   if converged, break
10: end for
11: Return $w_0$ and $\{w_i\}_{i=1}^{N}$

4. EXPERIMENTS

In this section, we compare the proposed activity level prediction method with several baselines on a real-world data set. Through extensive experimental results, we demonstrate the effectiveness of our model, and the impacts of the three components in our model: user personalization, modeling dynamic behaviors and social regularization. Furthermore, we also report the features designed for social activity level prediction in detail.

4.1 Data Description

The data set used for evaluation is collected from a real-world social media platform, Renren.com, which is one of the largest online social media platforms in China and has over 170 million registered users. Similar to Facebook, Renren is an undirected friendship network with a mature application platform to support various social services. On the platform, users can perform various actions such as social messaging and gaming. To prepare the evaluation data set, we extract a subnetwork from the whole social network in Renren by applying a community detection algorithm [7].
The subnetwork contains 26,418 users in total after removing those who have no activity between 18 April 2012 and 9 October 2012 (25 weeks in total).

A user in the social network can perform many actions, from updating statuses, sending messages, and posting photos to playing various social games. Their activity levels are expressed by these actions. We extract the action log over the 25 weeks for these users. The user action log, which is summarized from the raw HTTP requests, is a content-less³ log of users' online activities, including updating statuses, posting/replying on walls, commenting on photos, checking notifications, etc. These actions can be classified into two categories. The first category of actions does not involve other users, e.g., uploading a photo, posting a message, etc. The other category, which constitutes most of the actions of a user, consists of interactions with other users, e.g., a message is sent from user a to user b, or user a's photos are viewed by user b. Actions in the latter category are used to calculate the strength of friendship between user $i$ and user $j$ during week $t$, $S^t_{ij}$ (Eq. 1).

³ Though all the user names have been anonymized, to protect user privacy we avoid using any actual content, e.g., the actual message content.

4.2 Feature Construction

Table 2: Summary of Features

| Category | Feature description for $x_i^t$ |
|---|---|
| Action features | Number of statuses posted |
| | Number of photos posted |
| | Number of searches |
| | Number of friend applications |
| | Number of accepted friend applications |
| | Number of denied applications |
| | Number of replied messages |
| | Number of likes |
| | Number of videos played |
| | Number of songs played |
| | Number of blogs visited |
| | Number of photos visited |
| | Number of notification checks |
| | Number of forwarded statuses |
| | ... (more actions) |
| Time series features | Number of active days of week $t$ |
| | Mean number of active days over the weeks in the window |
| | Standard deviation of the number of active days over the weeks |
| | Ratio of active weeks in the window |
| | Mean number of days among the active weeks |
| | If the last week is active, count backward until an inactive week occurs |
| | If the last week is inactive, count backward until an active week occurs |
| Social features | Number of social friends |
| | Number of active social friends in week $t$ |
| | Number of social friends with interactions in week $t$ |

Recall that for a user $i$ in the $t$-th week, we need to generate a feature vector $x_i^t$ to represent him/her. In practice, many machine learning tasks rely on feature engineering rather than complex modeling to boost classification accuracy [12]. For example, feature engineering has proven to be effective in recent data mining competitions [20, 39] and in social churn prediction [21]. In this section, before conducting experiments to verify our proposed model, we first introduce the features designed for the user activity level prediction task on social media. For a user $i$ in the $t$-th week, the feature vector $x_i^t$ consists of three different groups of features as follows, which are also summarized in Table 2.

1. The first group of features is referred to as action features. From the action log, we use 31 frequent actions and count the number of times user $i$ has performed each action during the $t$-th week. We have also calculated different versions of the counts conditioned on time, e.g., splitting time into Weekdays or Weekend, and splitting time into Morning, Afternoon, Evening or Night.
These conditioned features have proven to be useful in sensor-based activity recognition [39].

2. The second group of features is referred to as time series features on activity level. For the $t$-th week, we take the past $k$ weeks into consideration and construct an active-or-not series of length $k$. We then extract some statistics from the series as our features. These statistical features include the average length of continuous active weeks, the last inactive week, the ratio between active weeks and inactive weeks, etc. The parameter $k$ is set to 5 and 10 to generate two versions of the features. Note that a similar set of features has been used for detecting Internet path changes [8].

3. The third group of features is referred to as social features. We have also extracted several features from the structure of the social network among users, which have proven to be useful in churn prediction [22, 28]. Such features include the number of social friends of user $i$, the number of social friends with interactions with user $i$ during week $t$, and the number of social friends who are active during week $t$. For the latter two features, we also calculate the normalized versions, i.e., dividing them by the number of social friends of user $i$.

Table 3: Sample Statistics for Evaluation Weeks

| Label | #21 | #22 | #23 | #24 | #25 |
|---|---|---|---|---|---|
| active → inactive | 1732 | 1733 | 1709 | 1846 | 2112 |
| active → active | 13418 | 13476 | 13291 | 13088 | 12481 |

4.3 Evaluation Methods

We hold out the data from the 21st week to the 25th week for testing. Specifically, we use the first 20 weeks of data for training and the subsequent week after the 20th week for testing. For example, to evaluate the predictions on week 21, we use data from the 1st week to the 20th week to build the model, and apply the model to the feature vectors of the 21st week, $\{x_i^{21}\}_{i=1}^{N}$. The data sample counts for the five testing weeks are shown in Table 3. As mentioned in Section 1, if a user has online actions on at least three days during week $t$, we label him/her as active for week $t$; otherwise, inactive.
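The labeling rule and the two sample types counted in Table 3 can be sketched as follows. This is an illustrative reading of the text, assuming that evaluation samples are users who were active in the preceding week; the helper names and the toy active-day counts are made up for the example.

```python
import numpy as np

ACTIVE_DAYS_THRESHOLD = 3  # "at least three days" per the text

def weekly_labels(active_days):
    """Map weekly active-day counts (0..7) to True (active) / False (inactive)."""
    return np.asarray(active_days) >= ACTIVE_DAYS_THRESHOLD

def transition_counts(days_prev, days_next):
    """Count (active -> inactive, active -> active) users between two weeks."""
    prev, nxt = weekly_labels(days_prev), weekly_labels(days_next)
    return int(np.sum(prev & ~nxt)), int(np.sum(prev & nxt))

# Made-up active-day counts for five users in two consecutive weeks.
week_a = [7, 5, 3, 1, 4]
week_b = [6, 2, 0, 0, 3]
print(transition_counts(week_a, week_b))  # (2, 2)
```

The fourth toy user is already inactive in the first week and is therefore excluded from both counts.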
We observe that the distribution is unbalanced over the two classes and thus we need to adjust the ratio between the weights of active and inactive instances [11]. This can be simply done by adding different weights or costs for different classes to the loss function in (3). We denote by $b_{active}$ and $b_{inactive}$ the weights on the two classes, active and inactive, respectively.

We compare the proposed algorithm, Personalized Time-Decay Logistic Regression with Social Regularization (SocTPerLR), with two types of baselines:

1. Logistic Regression (LR) and Random Forest (RF). These two classifiers, one of which is linear and the other non-linear, are commonly used in various classification tasks. Using these two classifiers we can first explore the discriminative power of the proposed features presented in Table 2, and further verify how much improvement our proposed model can bring.

2. A state-of-the-art node classification algorithm (Node) proposed by [38]. In this algorithm, a user can only be represented by one feature vector instead of time-series feature vectors. We thus choose the latest feature vector to represent each user. This model considers the similarity among friends, but still uses a single global model to make predictions on all users.

Precision, recall and F1-score are often used as evaluation criteria in churn prediction tasks [21, 22, 28]. Following this common practice, we use them as evaluation criteria in this paper. These measures are defined for the active-to-inactive users, i.e., users whose activity level declines from active to inactive in the coming week:

$$\text{recall} = \frac{\#\,\text{correctly classified inactive users}}{\#\,\text{true inactive users}}, \qquad \text{precision} = \frac{\#\,\text{correctly classified inactive users}}{\#\,\text{users classified as inactive}}, \qquad \text{F1-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}.$$

We tune the parameters of all models by considering users' activeness in the 20th week as targets and generating feature vectors from data observed in the first 19 weeks, and fix the parameters in all experiments for making predictions on weeks 21 to 25.

4.4 Strong Social Ties Construction

Social regularization is a critical component in the proposed model, and the question is which of a user's friends are supposed to be similar to him/her with respect to weekly online days.
A previous study of Facebook's social interactions [36] shows that most of a user's social interactions take place with only a few close friends. We verify this finding in the Renren network as shown in Figure 2. Each curve shows the Cumulative Distribution Function (CDF) of an interaction distribution. For instance, the solid line shows the proportion of a user's interactions with the top friend over those with all friends. From this CDF curve, we observe that about 10% of users have more than 50% of their interactions with a single friend.

[Figure 2: Interaction Distribution for Top 1/5/10 Friends. Axes: Interaction Ratio (Top Friends/All Friends) vs. Cumulative Distribution Function (CDF); curves: Top 1, Top 5, Top 10.]

To draw the further conclusion that a user's activity level is closely related to that of the friends with whom this user interacts most, we plot the distribution of the following ratio:

$$r_i^t = \frac{d_i^t}{\frac{1}{|C_i^t|} \sum_{j \in C_i^t} d_j^t}, \quad i = 1, 2, \dots, N, \tag{10}$$

where $d_i^t$ is the number of user $i$'s active days during week $t-3$ to week $t$ (to obtain stable statistics, we use one month rather than one week). We use different methods to construct $C_i^t$: (1) friends with whom the user has at least 3 interactions and whose interactions account for at least 15% of his/her total social interactions, i.e., for user $i$, select all $j$'s with $S^t_{ij} \geq 15\%$ as the close friends $C_i^t$; (2) all friends of user $i$, i.e., $C_i^t = \mathcal{N}_i$; and (3) a random set of users with a size equal to the number of user $i$'s friends.

[Figure 3: Ratio Distribution for Online Days. Axes: Online Days Ratio (see Eq. 10) vs. Probability Density Function (PDF); curves: all friends, random, close friends.]

We plot the distribution of $r_i$⁴ in Figure 3. We can find that the ratio distribution is centered for close friends; however, the distribution for all friends is far from that of the close friends and is even close to that of random users. This fact shows that only knowledge from close friends may be helpful to regularize the model building.
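The tie strength in (1), the close-friend selection rule (at least 3 interactions and at least a 15% share) and the online-days ratio can be sketched together. This is a hedged illustration: the interaction matrix `I`, the helper names and the toy numbers are assumptions made here, and the ratio is taken to be a user's active days divided by the mean active days of his/her close friends, which is one reading of the ratio plotted in Figure 3.

```python
import numpy as np

def tie_strength(I):
    """Eq. 1: row-normalise an N x N interaction-count matrix.

    I[i, j] is the number of interactions between users i and j in one period,
    with zeros for non-friends. Rows without any interaction stay all-zero.
    """
    I = np.asarray(I, dtype=float)
    totals = I.sum(axis=1, keepdims=True)
    return np.divide(I, totals, out=np.zeros_like(I), where=totals > 0)

def close_friends(I, min_count=3, min_share=0.15):
    """Friends with >= min_count interactions and tie strength >= min_share."""
    S = tie_strength(I)
    mask = (np.asarray(I) >= min_count) & (S >= min_share)
    return [np.flatnonzero(row).tolist() for row in mask]

def online_ratio(d, close):
    """r_i = d_i / mean of d_j over close friends j; NaN where undefined."""
    d = np.asarray(d, dtype=float)
    r = np.full(len(d), np.nan)
    for i, c in enumerate(close):
        if c and d[c].mean() > 0:
            r[i] = d[i] / d[c].mean()
    return r

# Made-up interaction counts and four-week active-day totals for four users.
I = [[0, 8, 1, 1],
     [8, 0, 2, 0],
     [1, 2, 0, 3],
     [1, 0, 3, 0]]
d = [20, 10, 28, 14]
close = close_friends(I)
print(close)                   # [[1], [0], [3], [2]]
print(online_ratio(d, close))  # [2.  0.5 2.  0.5]
```

Users without any close friend get `NaN` rather than an arbitrary ratio, so they can be dropped before plotting a density.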
The two peaks in the density arise because users' online days concentrate at the two extremes, i.e., during a week, more users are active for 7 or 0 days than for 3 or 4 days.

4.5 Performance Comparison

The performance of the model proposed in this paper and the baselines is presented in Table 4. We can find that LR, which considers only users' features, is among the weakest methods. With its non-linear prediction ability, RF performs slightly better than LR but still worse than our method, as RF does not take the three important factors into account. Node can only take one sample from each user, and though it uses social regularization, it does not model user variety and time decay. The improvement of our proposed model SocTPerLR over the three baselines is clear. We also show the precision and recall in Tables 5 and 6. Taking Week 21 as an example, the 11.8% recall improvement over LR means that our model can find 204 more active-to-inactive users out of the total 1732. Through the comparison, we can also find that though we have three features that encode the social information, the two baselines LR and RF cannot use them to reach the performance of our proposed method.
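The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses the week-21 precision and recall reported for SocTPerLR and LR; the helper names and the confusion counts in the last call are made up for illustration.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Metrics for the positive class (here: active-to-inactive users)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Week-21 values quoted in the text and tables.
print(round(f1_from(0.452, 0.819), 3))   # 0.583, the SocTPerLR F1-score
print(round((0.819 - 0.701) * 1732))     # 204 additional users recalled

# Made-up confusion counts, for illustration only.
print(precision_recall_f1(true_pos=80, false_pos=120, false_neg=20))
```

With 80 true positives, 120 false positives and 20 false negatives, precision is 0.4 and recall is 0.8, a pattern of high recall and modest precision similar to the one reported.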

Table 4: Performance Comparison of Different Methods in terms of F1-score

| Method | #21 | #22 | #23 | #24 | #25 |
|---|---|---|---|---|---|
| LR | 0.512 | 0.498 | 0.521 | 0.492 | 0.523 |
| RF | 0.529 | 0.507 | 0.525 | 0.495 | 0.533 |
| Node | 0.510 | 0.503 | 0.516 | 0.489 | 0.531 |
| SocTPerLR | 0.583 | 0.553 | 0.551 | 0.542 | 0.561 |

Table 5: Performance Comparison of Different Methods in terms of Precision

| Method | #21 | #22 | #23 | #24 | #25 |
|---|---|---|---|---|---|
| LR | 0.403 | 0.392 | 0.411 | 0.379 | 0.533 |
| RF | 0.422 | 0.398 | 0.414 | 0.386 | 0.430 |
| Node | 0.403 | 0.394 | 0.417 | 0.377 | 0.429 |
| SocTPerLR | 0.452 | 0.441 | 0.431 | 0.419 | 0.436 |

Table 6: Performance Comparison of Different Methods in terms of Recall

| Method | #21 | #22 | #23 | #24 | #25 |
|---|---|---|---|---|---|
| LR | 0.701 | 0.681 | 0.712 | 0.700 | 0.685 |
| RF | 0.708 | 0.699 | 0.716 | 0.689 | 0.701 |
| Node | 0.693 | 0.695 | 0.675 | 0.693 | 0.697 |
| SocTPerLR | 0.819 | 0.742 | 0.761 | 0.768 | 0.786 |

Table 7: Performance Comparison on Different Components in terms of F1-score

| Method | #21 | #22 | #23 | #24 | #25 |
|---|---|---|---|---|---|
| PerLR | 0.553 | 0.533 | 0.542 | 0.533 | 0.547 |
| TPerLR | 0.563 | 0.542 | 0.545 | 0.535 | 0.550 |
| SocTPerLR | 0.583 | 0.553 | 0.551 | 0.542 | 0.561 |

[Figure 4: Change the label weight between two classes. Axes: Classifier imbalance weight $b_{inactive}$ (1 to 10) vs. Value (0.4 to 1); curves: Precision, Recall, F1 score.]

Our next experiment is to compare two reductions of SocTPerLR: PerLR, which only considers the personalization factor in (4), and TPerLR, which has personalization and time decay as in Eq. 5 but ignores the social factor. The comparison results are shown in Table 7. The improvement of PerLR over the basic logistic regression (LR) in Table 4 is significant, ranging from about 0.02 to 0.04 on F1-score across the five weeks. This confirms our observation that users differ in terms of their behaviors and feature descriptions. Due to the limited number of instances for each user (20) and the short time range (20 weeks), decaying the importance of the training instances temporally may not bring significant improvements. However, adding time decay to the personalized logistic regression still outperforms PerLR by 0.01 on F1-score in the 21st week.
The proposed method SocTPerLR performs best among all the algorithms over all weeks and improves the F1-score of TPerLR by 0.01 on average. Empirically, this supports the necessity of considering the three important factors.

4.6 Sensitivity Analysis

In the following we study how the model parameters affect the performance of SocTPerLR and look in depth at the contribution of each component of our model. To study the effect of the class weight on the recall/precision/F1-score, we set the weight of active samples b_active to 1 and change the weight for inactive samples b_inactive from 1 to 10, and show the precision/recall/F1-score trend in Figure 4. For all weights, our model generally obtains a good recall, though the precision is not high. When the weight for inactive samples is 4, the model has the best F1-score. Considering that among the users who are active this week only a fraction will fall inactive next week, and that online user behavior is also heavily affected by offline activity, which is beyond our knowledge, the 0.78 recall is actually not bad, while the precision is also acceptable.

Next we study the time decay parameter α, the social regularization parameter β and the personalized regularization parameter γ. While changing one parameter, we fix all other parameters to see how changing the value affects the F1-score of the prediction on week 21. Figure 5 shows the performance of two models, TPerLR and SocTPerLR, on week 21. We find that a small decay parameter, α = 0.01 (the weight for the first-week samples is 0.82), works better than large ones. This is because when α is large, the penalization of earlier samples is too strong and the 20 training samples are too few to provide enough information.

Figure 5: Change the time decay parameter α. (F1 score of SocTPerLR and TPerLR plotted against the time decay parameter α.)

Figure 6 shows how the social regularization coefficient β affects the model performance. When β is too large, the regularization can hurt the performance and the F1-score can drop below that of TPerLR.
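The class weights and the time decay studied above can both be read as per-sample weights in a logistic loss. The following is a minimal sketch rather than the exact formulation of the model: it assumes an exponential decay exp(-α·Δ), where Δ is the number of weeks between a training sample and the predicted week (Δ = 20 for first-week samples when predicting week 21), and it assumes that the class weight and the decay weight are simply multiplied. The decay form is chosen because it reproduces the 0.82 first-week weight quoted for α = 0.01; all function names are hypothetical.

```python
import math

def time_decay_weight(alpha: float, weeks_before_target: int) -> float:
    """Assumed decay form: exp(-alpha * weeks_before_target)."""
    return math.exp(-alpha * weeks_before_target)

def sample_weight(is_inactive: bool, weeks_before_target: int, alpha: float,
                  b_active: float = 1.0, b_inactive: float = 4.0) -> float:
    """Assumed combination: class weight times time-decay weight."""
    class_weight = b_inactive if is_inactive else b_active
    return class_weight * time_decay_weight(alpha, weeks_before_target)

def weighted_log_loss(probs, labels, weights) -> float:
    """Weighted binary cross-entropy; label 1 = inactive, 0 = active."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total -= w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)

# Weight of a first-week sample (20 weeks before week 21) for several alphas.
first_week = {a: round(time_decay_weight(a, 20), 2) for a in (0.001, 0.01, 0.1)}
print(first_week)  # {0.001: 0.98, 0.01: 0.82, 0.1: 0.14}
```

Under this assumed form, α = 0.1 leaves a first-week sample with only about 14% of its weight, which is consistent with the observation that a large α discards too much of an already short 20-week history.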
To study the effectiveness of personalized models, we fix the global regularization parameter γ_0 = 1 and vary the parameter λ from 0.01 to 100; the performance trend of the three models on week 21 is shown in Figure 7.

Figure 6: Change the social regularization parameter β. (F1 score of PerLR, TPerLR and SocTPerLR plotted against the social regularization parameter β.)

Figure 7: Change the personalized regularization parameter γ. (F1 score of PerLR, TPerLR and SocTPerLR plotted against the personalized regularization parameter γ.)

As we have shown in the previous subsection, the choice of the close-friend set C may be critical to the performance of the social regularization. For user i, we use different thresholds to select his or her close friends by filtering S^t_ij with the threshold. Table 8 shows the precision/recall/F1-score for different thresholds from 5% to 50% when the other parameters are fixed. We find that performance is best when the top 15% of close friends are used for social regularization.

Table 8: Social Ties Construction Using Different Thresholds

Threshold   Precision  Recall  F1-score
50%         0.433      0.772   0.554
30%         0.441      0.773   0.561
15%         0.452      0.819   0.583
10%         0.437      0.801   0.566
5%          0.411      0.755   0.532

The iterative optimization of SocTPerLR generally takes fewer than 15 iterations to converge; Figure 8 shows the convergence curve when training the model using data from weeks 1 to 20. The second point in the figure shows the objective value after the first iteration. The gap between this value and the final converged objective value verifies that personalization reduces the training error significantly more than a global model does.

Figure 8: The convergence speed of Algorithm 1. (Objective value of Eq. 7, on a scale of 10^5, plotted against the number of iterations.)

4.7 Discussions

Our experiments show that, when predicting users' future activity levels in a social network, it is generally quite hard to achieve high precision and high recall at the same time. This is not uncommon in activity level prediction in other applications, e.g., churn prediction in online chat rooms [28] and online games [22]. In practice, this precision and recall are already good for social CRM purposes in the Renren network.
Using our prediction results, we can reduce the number of update emails significantly, because we do not need to send update emails to users who are predicted to be active in the future. For such users, it is much better to let them check the updates on the social network. The difficulty of accurate prediction is partially because the user activity level is also affected by users' offline schedules and cannot be predicted using their online traces alone. How to incorporate offline behavior to predict the online activity level is an interesting direction for future work. In this paper, by modeling the three aspects (personalization, social regularization, and time decay) in a unified model, we achieve much better results than a single prediction model, which is a common practice in some previous works [13, 21, 22].

Table 9 shows the top five features ranked by a Random Forest classifier (we use the varImpPlot command in the randomForest R package to perform feature ranking; the variable importance measure is the mean decrease in Gini index). The importance has been normalized by that of the most discriminative feature. From this table, we find that two actions, notification checks and photo uploading, are strong indicators of activeness in the future. This confirms our observations of some existing functionalities in Renren, e.g., a recent feature, "today's photo in history" (on the right bar of the user home page, Renren displays photos that were uploaded on the same date in previous years), brings quite a lot of inactive users back to activity. The user will stay active in the coming week to see whether his or her photos, especially the newly uploaded ones, are being visited and commented on. The fact that notification checks strongly indicate user activeness can also guide us in improving the current notification check user experience.

Table 9: Feature Importance

Feature                               Relative Importance
Number of notification checks         100
Number of active days of week t       77
Ratio of active weeks in the window   69
Number of active social friends       53
Number of photos posted               41

5. RELATED WORK

Social CRM and Social Activity Analysis. The main goal of Social CRM is to keep active users in a social network [16]. This is a very broad research topic, and researchers and practitioners use quite different approaches, e.g., innovative UI [32], social games [2], personalized news/notifications [9], and offline promotion [33]. Our approach is more focused on predicting users whose activity levels tend to decline. Previous literature calls this problem churn prediction. Social user behavior has been studied recently, e.g., analysis of user interactions in Facebook [36, 35], activity recommendation [23] and user activity level [3]. [3] works on data similar to ours; however, it mainly focuses on statistics and analysis rather than prediction.

Churn Prediction. Churn prediction aims to find users who will leave a network or a service; by finding such users, the service provider can analyze the reasons and devise strategies to retain them. Our problem of activity level prediction is similar to social churn prediction in that both aim to predict the future activity level of a user. In the past, social churn prediction has been studied in many different application areas, including telecommunications [10, 17, 30], online social games [22], and QA forums [13, 14, 37]. The users in these applications usually do not have complicated user behaviors, e.g., in telecommunications there are only two main activities, short messaging and calls. Our work enriches the application area by analyzing a real-name social network with complicated user behaviors. Most of these research projects fit the feature engineering theme: encoding user behavior logs and social structures as features and building a classifier using these features. The unified model proposed in this paper works in an orthogonal direction by considering the similarity and difference among users.
In particular, we allow each user to have a unique prediction model [19], and to improve the generalization ability of each model, we require all the models to have a common part, while the user-specific part is also regularized by strong social ties.

Modeling Techniques. User difference has been studied previously in Multi-task Learning [15] and Transfer Learning [29]. Our personalized logistic regression model can be viewed as a special multi-task learning model. A similar personalized model is used in the Gmail importance ranking system [1]. Model personalization has also been used recently in ad prediction [6]. Time decay is a typical modeling technique used in time series [18]. Our use of social regularization is inspired by the recent advances in personalization, especially in Collaborative Filtering [25]. Although the three main techniques are not new, we take a simple yet practical approach to unify them into the logistic regression classifier, and our final model is still quite easy to implement. The computational complexity of our model is also acceptable, as it is only a small multiple (the number of iterations) of that of a single logistic regression.

6. CONCLUSION AND FUTURE WORK

This study is motivated by the need to increase the number of daily active users in a social network. In this paper, we have studied users' activity level prediction, which is an important task in Social Customer Relationship Management (Social CRM). Different from traditional CRM, social CRM has some special properties: user diversity, dynamic behaviors and social influence. These issues make the problem more challenging. By taking these factors into account, we have designed a unified learning framework that can predict the future activity level of a user in the social network more accurately than baseline methods. These activity predictions not only reduce the cost of user maintenance, but also avoid disturbing normal users, e.g., by sending notification and update emails only to users who tend to drop their activity level in Renren. For our future work, we will test our models on social networks other than Renren.
Another important direction for future work is to use more information: not only the online user behavior but also the offline user activities. In the past, such offline information was hard to obtain; nowadays social networks have mobile apps that users run on their smartphones, and such apps can sense physical activities provided that the user has granted permission. By using the extra information from the offline/physical world and exploiting the online and offline knowledge collectively, the online activity level can be predicted more precisely. This future work fits a general research theme: learning from both the physical world and the virtual world.

7. ACKNOWLEDGMENTS

We thank the support of Hong Kong RGC GRF Projects 621211 and 620812.

8. REFERENCES

[1] D. Aberdeen, O. Pacovsky, and A. Slater. The learning behind Gmail priority inbox. In NIPS 2010 Workshop on Learning on Cores, Clusters and Clouds, 2010.
[2] G. Abramson and M. Kuperman. Social games in a social network. Physical Review E, 63(3):030901, 2001.
[3] F. Benevenuto, T. Rodrigues, M. Cha, and V. A. F. Almeida. Characterizing user behavior in online social networks. In Internet Measurement Conference, pages 49-62, 2009.
[4] E. Blanzieri and A. Bryl. A survey of learning-based techniques of email spam filtering. Artif. Intell. Rev., 29(1):63-92, 2008.
[5] F. Buttle. Customer Relationship Management. Butterworth-Heinemann, 1st edition, 2003.
[6] H. Cheng and E. Cantú-Paz. Personalized click prediction in sponsored search. In WSDM, pages 351-360, 2010.
[7] A. Clauset, M. E. J. Newman, and C. Moore. Finding community structure in very large networks. http://www.arxiv.org/abs/condmat/0408187, 2004.
[8] I. Cunha, R. Teixeira, D. Veitch, and C. Diot. Predicting and tracking internet path changes. In SIGCOMM, pages 122-133, 2011.
[9] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 271-280, New York, NY, USA, 2007. ACM.
[10] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. A. Nanavati, and A. Joshi. Social ties and their relevance to churn in mobile telecom networks. In EDBT, pages 668-677, 2008.
[11] P. Domingos. MetaCost: a general method for making classifiers cost-sensitive. In KDD, pages 155-164, 1999.
[12] P. Domingos. A few useful things to know about machine learning. Commun. ACM, 55(10):78-87, 2012.
[13] G. Dror, D. Pelleg, O. Rokhlenko, and I. Szpektor. Churn prediction in new users of Yahoo! Answers. In Mille et al. [26], pages 829-834.
[14] G. Dror, D. Pelleg, O. Rokhlenko, and I. Szpektor. Churn prediction in new users of Yahoo! Answers. In Mille et al. [26], pages 829-834.
[15] T. Evgeniou and M. Pontil. Regularized multi-task learning. In KDD, pages 109-117, 2004.
[16] P. Greenberg. CRM at the speed of light: Capturing and keeping customers in Internet real time. McGraw-Hill, 4th edition, 2009.
[17] I. Guyon, V. Lemaire, M. Boullé, G. Dror, and D. Vogel. Analysis of the KDD Cup 2009: Fast scoring on a large Orange customer database. Journal of Machine Learning Research - Proceedings Track, 7:1-22, 2009.
[18] J. Hamilton. Time series analysis, volume 2. Cambridge Univ Press, 1994.
[19] S. Huang, M. Chen, B. Luo, and D. Lee. Predicting aggregate social activities using continuous-time stochastic process. In CIKM, pages 982-991, 2012.
[20] M. Jahrer, A. Toscher, J.-Y. Lee, J. Deng, H. Zhang, and J. Spoelstra. Ensemble of collaborative filtering and feature engineered models for click through rate prediction. In KDDCup Workshop, 2012.
[21] M. Karnstedt, M. Rowe, J. Chan, H. Alani, and C. Hayes. The effect of user features on churn in social networks. 2011.
[22] J. Kawale, A. Pal, and J. Srivastava. Churn prediction in MMORPGs: A social influence based approach. In CSE (4), pages 423-428, 2009.
[23] K. Lerman, S. Intagorn, J.-H. Kang, and R. Ghosh. Using proximity to predict activity in social networks. In Mille et al. [26], pages 555-556.
[24] J. Lin and A. Kolcz. Large-scale machine learning at Twitter. In SIGMOD Conference, pages 793-804, 2012.
[25] H. Ma, T. C. Zhou, M. R. Lyu, and I. King. Improving recommender systems by incorporating social contextual information. ACM Trans. Inf. Syst., 29(2):9, 2011.
[26] A. Mille, F. L. Gandon, J. Misselis, M. Rabinovich, and S. Staab, editors. Proceedings of the 21st World Wide Web Conference, WWW 2012, Lyon, France, April 16-20, 2012 (Companion Volume). ACM, 2012.
[27] T. P. Minka. A comparison of numerical optimizers for logistic regression. Technical Report, 2003.
[28] R. J. Oentaryo, E.-P. Lim, D. Lo, F. Zhu, and P. K. Prasetyo. Collective churn prediction in social network. In ASONAM, pages 210-214, 2012.
[29] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, October 2010.
[30] Y. Richter, E. Yom-Tov, and N. Slonim. Predicting customer churn in mobile networks through analysis of social groups. In SDM, pages 732-741, 2010.
[31] P. Sarkar, D. Chakrabarti, and M. I. Jordan. Nonparametric link prediction in dynamic networks. In ICML, 2012.
[32] A. Sittig and M. Zuckerberg. Managing information about relationships in a social network via a social timeline, 2010. US Patent 7,725,492.
[33] P. Smith and Z. Zook. Marketing communications: integrating offline and online with social media. Kogan Page, 2011.
[34] J. Subhlok, J. M. Stichnoth, D. R. O'Hallaron, and T. Gross. Exploiting task and data parallelism on a multicomputer. ACM SIGPLAN Notices, 28(7):13-22, 1993.
[35] B. Viswanath, A. Mislove, M. Cha, and P. K. Gummadi. On the evolution of user interaction in Facebook. In WOSN, pages 37-42, 2009.
[36] C. Wilson, B. Boe, A. Sala, K. P. N. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In EuroSys, pages 205-218, 2009.
[37] J. Yang, X. Wei, M. S. Ackerman, and L. A. Adamic. Activity lifespan: An analysis of user survival patterns in online knowledge sharing communities. In ICWSM, 2010.
[38] T. Yang, R. Jin, Y. Chi, and S. Zhu. Combining link and content for community detection: a discriminative approach. In KDD, pages 927-936, 2009.
[39] Y. Zhu, E. Zhong, Z. Lu, and Q. Yang. Feature engineering for place category classification. In Nokia Mobile Data Challenge Workshop, in conjunction with Pervasive '12, 2012.