Your Neighbors Affect Your Ratings: On Geographical Neighborhood Influence to Rating Prediction


Your Neighbors Affect Your Ratings: On Geographical Neighborhood Influence to Rating Prediction
Longke Hu, Aixin Sun, Yong Liu
Nanyang Technological University, Singapore

Outline
1. Introduction
2. Data analysis and observations
3. Related work
4. Business rating prediction
5. Experiments
6. Conclusion

Longke Hu, Aixin Sun, Yong Liu. Your Neighbors Affect Your Ratings. SIGIR '14, Gold Coast.

The problem: business rating prediction
Rating prediction is to predict the preference rating of a user for a product or service (i.e., an item) that she has not rated before.
- A well-defined research problem in recommender systems
- An array of widely studied solutions, e.g., collaborative filtering
- Users rate items: songs, movies, books...
A business is an item in our problem setting.
- A business can be a restaurant, shopping mall, beauty salon...
- A business physically exists at a specific geo-location with latitude/longitude coordinates
- Most businesses are not geographically isolated from others

A business physically exists at a geo-location
When a user visits a business, there is a good chance that:
- She walks by its neighbors if they are located within walking distance.
- The overall environment of that region might affect her rating of the business.
Questions
1. Is it true that most businesses have neighbors within walking distance?
2. Is there any correlation between a business's rating and its neighbors' average rating?
3. Is the category of a business a factor here?

The Yelp dataset
- Was used in the ACM RecSys Challenge 2013
- Sampled from the greater Phoenix, AZ metropolitan area from March 2005 to January 2013
- 11,537 businesses, 229,907 reviews by 43,873 users, and 8,282 check-in sets
More details
- A business has an id, name, latitude, longitude, categories...
- A review contains business id, user id, rating from 1 to 5 stars, date, review text, and votes.
- A check-in set for a business contains the aggregated number of check-ins in every hour from Monday to Sunday.

Geographical neighbors within walking distance?
Observation 1: Most businesses have neighbors within a short geographical distance from their locations.
[Figure: percentage of businesses having at least 1, 3, 6, and 10 neighbors within a distance threshold of 20 to 2000 meters. x-axis: distance threshold (meters); y-axis: percentage.]
- More than 44% of businesses have at least one neighbor within 20 meters.
- About 95% of businesses have at least one neighbor within 500 meters.
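The neighbor count behind Observation 1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes great-circle (haversine) distance between latitude/longitude pairs and a brute-force pairwise scan, and the function names `haversine_m` and `share_with_neighbors` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_with_neighbors(coords, k, threshold_m):
    """Fraction of businesses with at least k other businesses within threshold_m.

    coords is a list of (latitude, longitude) tuples, one per business.
    The scan is O(n^2); a spatial index would be needed for large datasets.
    """
    n = len(coords)
    hits = 0
    for i, (lat_i, lon_i) in enumerate(coords):
        close = sum(
            1
            for j, (lat_j, lon_j) in enumerate(coords)
            if j != i and haversine_m(lat_i, lon_i, lat_j, lon_j) <= threshold_m
        )
        if close >= k:
            hits += 1
    return hits / n if n else 0.0
```

Calling `share_with_neighbors(coords, 1, 20)` for each threshold from 20 to 2000 meters would give one curve of the figure described above.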

Business rating correlation?
Observation 2: The average rating of a business is weakly positively correlated with the average rating of its neighbors.
[Figure: Pearson's correlation coefficient between a business's rating and the average rating of its 1, 3, 6, and 10 nearest neighbors (1NN, 3NN, 6NN, 10NN) and of random businesses, at distance thresholds from 20 to 2000 meters. x-axis: distance threshold (meters); y-axis: correlation coefficient.]
- Pearson's correlation coefficient is in the range of 0.109 to 0.173.
- The correlation is relatively stronger within a smaller distance.
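A sketch of the correlation measured in Observation 2, under the assumption that each business's average rating and its neighbor list are already available as dictionaries (the names `avg_rating`, `neighbors`, and both function names are illustrative, not from the slides):

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def neighbor_rating_correlation(avg_rating, neighbors):
    """Correlate each business's rating with the mean rating of its neighbors.

    avg_rating: dict mapping business id -> average rating
    neighbors:  dict mapping business id -> list of neighbor business ids
    Businesses without neighbors are skipped.
    """
    own, nbr = [], []
    for b, ns in neighbors.items():
        if not ns:
            continue
        own.append(avg_rating[b])
        nbr.append(sum(avg_rating[n] for n in ns) / len(ns))
    return pearson(own, nbr)
```

The "Random" series in the figure would correspond to replacing each neighbor list with randomly drawn businesses.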

Is business category a factor?
[Figure, four panels, each plotted against the distance threshold (meters, 20 to 2000) for the categories Restaurants, Shopping, Food, Beauty & Spas, and Nightlife:
(a) Rating correlation of 1NN (with a Random series)
(b) Rating correlation of 6NN (with a Random series)
(c) Percentage of 1NN in the same category
(d) Percentage of 6NN in the same category]

Questions and observations
1. Is it true that most businesses have neighbors within walking distance?
   Observation 1: Most businesses have neighbors within a short geographical distance from their locations.
2. Is there any correlation between a business's rating and its neighbors' rating?
   Observation 2: The average rating of a business is weakly positively correlated with the average rating of its neighbors.
3. Is the category of a business a factor here?
   Observation 3: The weak positive correlation in ratings is independent of the categories of the businesses and/or their neighbors.

Data analysis: a summary
Intrinsic characteristics
- The rating of a business should mainly depend on the characteristics of the business itself, e.g., quality of products or services, not its neighbors.
Extrinsic characteristics
- "Things of one kind come together": a business is not geographically independent from its neighbors.
- These neighbors give a user a sense of the surrounding environment of the business, e.g., hygiene standard.
Business rating prediction
- Both the intrinsic and extrinsic characteristics of a business shall be modeled in rating prediction.

Related work: collaborative filtering
Collaborative filtering (CF): similar users rate items similarly, or similar items receive similar ratings from users.
Memory-based CF
- Finding similar users or items by using similarity measures
- UserKNN, ItemKNN, Pearson's correlation, cosine similarity
- Similar users or items are also known as neighbors
Model-based CF
- Building models from the observed user-item ratings
- Latent factor model: users and items are jointly mapped into a shared latent space of low dimensionality
- Matrix factorization models: Biased MF, SVD++, Social MF...
Evaluated on: Yahoo! Music, Last.fm, Netflix, Douban...

Related work: POI recommendation and prediction
POI recommendation is to recommend unvisited POIs to users.
- Geographical influence: users tend to visit POIs near their home/office locations, and POIs near the ones they favor
- Temporal influence: users check in at different types of POIs at different time slots of a day
- Social influence among friends
POI prediction is to predict which POI a user would visit next.
- Based on the user's current location/time, predict the next POI to visit
- Both geographical and temporal influence have been considered.
Neighborhood influence: key differences
- User's point of view vs. business's point of view
- User's cost of travel (time, monetary)

Business rating prediction: Biased Matrix Factorization
The basic idea of Biased MF
- Each user and each item is represented by latent factors $p_u$ and $q_i$
- The predicted rating $\hat{r}_{ui}$ is the inner product of the two, with biases:
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} q_i$$
Parameter estimation
- Optimization: minimize the regularized squared error on the set $K$ of observed ratings
- Algorithm: stochastic gradient descent (SGD) and alternating least squares (ALS)
$$\min_{p_*, q_*, b_*} \sum_{(u,i) \in K} (r_{ui} - \hat{r}_{ui})^2 + \lambda_1 \left( \|p_u\|^2 + \|q_i\|^2 \right) + \lambda_2 \left( b_u^2 + b_i^2 \right)$$
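The SGD option named above can be sketched in plain Python. This is an illustrative implementation under assumed hyperparameters (factor count `k`, learning rate `lr`, epoch count, Gaussian initialization); none of these values come from the slides, and the function names are hypothetical.

```python
import random


def train_biased_mf(ratings, n_users, n_items, k=8, lr=0.01,
                    lam1=0.05, lam2=0.05, epochs=50, seed=0):
    """Fit Biased MF by SGD on (user, item, rating) triples.

    lam1 regularizes the latent factors p_u and q_i, lam2 the biases,
    mirroring lambda_1 and lambda_2 in the objective. Defaults are assumptions.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pred = mu + bu[u] + bi[i] + sum(a * b for a, b in zip(P[u], Q[i]))
            e = r - pred  # prediction error for this rating
            bu[u] += lr * (e - lam2 * bu[u])
            bi[i] += lr * (e - lam2 * bi[i])
            for f in range(k):
                pf, qf = P[u][f], Q[i][f]  # use pre-update values for both
                P[u][f] += lr * (e * qf - lam1 * pf)
                Q[i][f] += lr * (e * pf - lam1 * qf)
    return mu, bu, bi, P, Q


def predict(model, u, i):
    """Predicted rating mu + b_u + b_i + p_u . q_i for a trained model."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(P[u], Q[i]))
```

The constant factor 2 from differentiating the squared error is absorbed into the learning rate, a common convention.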

Incorporating neighborhood influence
Two kinds of factors of a business
- Intrinsic characteristics: latent factors $q_i$
- Extrinsic characteristics: latent factors $v_i$
[Diagram: a business, with intrinsic factors $q_i$ and extrinsic factors $v_i$, receives influence from the extrinsic factors of its neighbors $N_i$.]
With influence from the neighborhood, the predicted rating $\hat{r}_{ui}$ is:
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} \left( q_i + \frac{\alpha_1}{|N_i|} \sum_{n \in N_i} v_n \right)$$
The objective function is updated with regularization components for $v_n$.
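For illustration, the SGD updates implied by this prediction rule can be worked out as below. This is a sketch, not the authors' stated procedure: it assumes a learning rate $\gamma$ (not in the slides) and assumes that $v_n$ is regularized with the same weight $\lambda_1$ as the other latent factors, which the slides do not specify.

```latex
% Error on one observed rating, and the combined business representation
e_{ui} = r_{ui} - \hat{r}_{ui}, \qquad
s_i = q_i + \frac{\alpha_1}{|N_i|} \sum_{n \in N_i} v_n

% SGD updates (factor 2 absorbed into \gamma; \gamma and the use of
% \lambda_1 for v_n are assumptions)
p_u \leftarrow p_u + \gamma \left( e_{ui}\, s_i - \lambda_1 p_u \right)
q_i \leftarrow q_i + \gamma \left( e_{ui}\, p_u - \lambda_1 q_i \right)
v_n \leftarrow v_n + \gamma \left( e_{ui}\, \frac{\alpha_1}{|N_i|}\, p_u - \lambda_1 v_n \right)
  \quad \text{for each } n \in N_i
```

Each rating of business $i$ therefore also nudges the extrinsic factors of all of its neighbors, scaled by $\alpha_1 / |N_i|$.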

Incorporating category influence
Why category influence?
- The category of a business reflects the characteristics of the business
- Users may use different criteria in different categories
- POI recommendation achieves better accuracy by considering the categories of the POIs
Approach: each category is modeled by a latent factor vector $d_c$.
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} \left( q_i + \frac{\alpha_1}{|N_i|} \sum_{n \in N_i} v_n + \frac{\alpha_2}{|C_i|} \sum_{c \in C_i} d_c \right)$$
The objective function is updated with regularization components for $d_c$.

Incorporating review content
A user rating usually comes with a textual review.
- The review elaborates the reason behind the rating
- It partially reflects the characteristics of the business
Approach: map the review words to the same latent factor space.
- Decompose $q_i$ into a combination of the latent factors of the review words $R_i$, i.e., the business latent factors $q_i$ are replaced by $\frac{1}{|R_i|} \sum_{w \in R_i} q_w$
$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} \left( \frac{1}{|R_i|} \sum_{w \in R_i} q_w + \frac{\alpha_1}{|N_i|} \sum_{n \in N_i} v_n + \frac{\alpha_2}{|C_i|} \sum_{c \in C_i} d_c \right)$$

Popularity and geo-distance influences
Both are distinctive features in POI recommendation.
- Businesses in the downtown area likely receive more visits
- Users tend to visit nearby POIs
Approach: model region popularity and geo-distance as biases.
- Business popularity $\rho_i$: number of reviews + number of check-ins
- Geo-distance $\tau_{u,i}$: estimate a user's home location by a recursive grid search algorithm, then compute the distance to the business
- Rating bias $z$ with two parameters $\beta_i$ and $\beta_u$:
$$z = \beta_i \rho_i + \beta_u \tau_{u,i}$$
$$\hat{r}_{ui} = \mu + b_u + b_i + z + p_u^{\top} \left( \frac{1}{|R_i|} \sum_{w \in R_i} q_w + \frac{\alpha_1}{|N_i|} \sum_{n \in N_i} v_n + \frac{\alpha_2}{|C_i|} \sum_{c \in C_i} d_c \right)$$

Five factors in business rating prediction
- Neighborhood influence
- Category influence
- Review content
- Popularity bias
- Geo-distance bias
$$\hat{r}_{ui} = \mu + b_u + b_i + z + p_u^{\top} \left( \frac{1}{|R_i|} \sum_{w \in R_i} q_w + \frac{\alpha_1}{|N_i|} \sum_{n \in N_i} v_n + \frac{\alpha_2}{|C_i|} \sum_{c \in C_i} d_c \right)$$
$$z = \beta_i \rho_i + \beta_u \tau_{u,i}$$
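To make the five-factor prediction rule concrete, here is a small sketch that evaluates it for one user-business pair. It only scores with given parameters (no training), and it assumes that an empty set of words, neighbors, or categories contributes a zero vector, which the slides do not state. The names `mean_vector` and `predict_full` are hypothetical.

```python
def mean_vector(vectors, k):
    """Component-wise mean of k-dimensional vectors; zero vector if empty."""
    if not vectors:
        return [0.0] * k
    return [sum(v[f] for v in vectors) / len(vectors) for f in range(k)]


def predict_full(mu, b_u, b_i, p_u, word_factors, neighbor_factors,
                 category_factors, alpha1, alpha2,
                 beta_i, rho_i, beta_u, tau_ui):
    """Rating prediction with review, neighborhood, category, popularity,
    and geo-distance terms.

    word_factors:     list of q_w for the words in the business's reviews R_i
    neighbor_factors: list of v_n for the geographical neighbors N_i
    category_factors: list of d_c for the business's categories C_i
    """
    k = len(p_u)
    q = mean_vector(word_factors, k)      # (1/|R_i|) * sum of q_w
    v = mean_vector(neighbor_factors, k)  # (1/|N_i|) * sum of v_n
    d = mean_vector(category_factors, k)  # (1/|C_i|) * sum of d_c
    combined = [q[f] + alpha1 * v[f] + alpha2 * d[f] for f in range(k)]
    z = beta_i * rho_i + beta_u * tau_ui  # popularity and distance bias
    return mu + b_u + b_i + z + sum(p * c for p, c in zip(p_u, combined))
```

Setting `alpha2 = 0` and `beta_i = beta_u = 0` reduces this to a neighborhood-plus-review variant, which is one way to read the method names such as N-MF or NCR-MF as switching individual terms on and off.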

Experiment setting
Yelp dataset
- Removal of businesses and users having fewer than 10 reviews
- Stopword removal and stemming in reviews
- 113,514 ratings by 3,965 users to 3,760 businesses
- For each user, 70% of ratings used for training, 30% for testing
Evaluation metrics (computed over the set $T$ of test ratings)
- Mean Absolute Error:
$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} |r_{ui} - \hat{r}_{ui}|$$
- Root Mean Square Error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} (r_{ui} - \hat{r}_{ui})^2}$$
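Both metrics are straightforward to compute from paired lists of true and predicted ratings; the following sketch assumes the test set is given as two equal-length sequences (the function names are illustrative).

```python
import math


def mae(actual, predicted):
    """Mean absolute error over paired true and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error over paired true and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two metrics can rank methods differently.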

Experimental results

Method            MAE      RMSE
Baseline methods (8)
Global Mean (µ)   0.8854   1.0962
Item Mean         0.8369   1.0939
User Mean         0.8599   1.0838
Item KNN          0.8208   1.0574
User KNN          0.8110   1.0429
Biased MF         0.8237   1.0483
SVD++             0.8120   1.0352
Social MF         0.8123   1.0303
Proposed methods (7)
N-MF              0.7952   1.0110
NC-MF             0.7929   1.0096
NCR-MF            0.7923   1.0078
NCRP-MF           0.7920   1.0072
NCRPD-MF          0.7958   1.0132
CRP-MF            0.7956   1.0138
CRPD-MF           0.8062   1.0191

Method name legend: N = Neighborhood influence, C = Category influence, R = Review content, P = Popularity bias, D = Distance bias

Experimental results: observations
1. Methods with geographical neighborhood influence outperform all baseline methods.
2. The best prediction accuracy is achieved by NCRP-MF; NCRPD-MF is poorer than N-MF.
   Factors: geographical neighborhood (N), business category (C), review content (R), business popularity (P), geo-distance (D)
3. SVD++, Social MF, and User KNN are the three best methods among the baselines.

Impact of neighborhood size
[Figure, four panels:
(e) MAE with neighbors selected by distance threshold (20 to 2000 meters)
(f) MAE by neighborhood size (1 to 10 nearest neighbors)
(g) RMSE with neighbors selected by distance threshold (20 to 2000 meters)
(h) RMSE by neighborhood size (1 to 10 nearest neighbors)
Plotted MAE values lie between 0.7950 and 0.7960; RMSE values between 1.0110 and 1.0120.]

Cold-start business rating prediction
Predict ratings of existing users to new businesses.
- Users: appear in our training data ($p_u$ and $b_u$ are known)
- Businesses: removed in data pre-processing for having fewer than 10 reviews ($q_i$ and $b_i$ are unknown)
- 20,395 ratings made by 3,319 existing users to 6,939 new businesses
Known factors: global mean $\mu$, user mean $\mu_u$, user latent factors $p_u$, user bias $b_u$, neighbor latent factors $v_n$, category latent factors $d_c$

Method        MAE      RMSE
Global Mean   1.0319   1.2749
User Mean     0.9963   1.2566
Biased MF     1.0020   1.2539
N-MF          0.9956   1.2538
NC-MF         0.9936   1.2535
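One plausible way to score a new business from only the known factors is sketched below. The slides do not give the exact cold-start formula, so this is an assumption: the unknown $q_i$ and $b_i$ are simply dropped (treated as zero) and the prediction relies on the neighbors' and categories' latent factors. The function name `predict_cold_start` is hypothetical.

```python
def predict_cold_start(mu, b_u, p_u, neighbor_factors, category_factors,
                       alpha1, alpha2):
    """Rating estimate for a business with no learned q_i or b_i.

    Assumption: the missing intrinsic factors and item bias contribute zero,
    so only the neighbor (v_n) and category (d_c) factors describe the item.
    """
    k = len(p_u)

    def mean(vectors):
        # Component-wise mean; an empty list contributes a zero vector.
        if not vectors:
            return [0.0] * k
        return [sum(v[f] for v in vectors) / len(vectors) for f in range(k)]

    v = mean(neighbor_factors)
    d = mean(category_factors)
    combined = [alpha1 * v[f] + alpha2 * d[f] for f in range(k)]
    return mu + b_u + sum(p * c for p, c in zip(p_u, combined))
```

With no neighbors and no categories this falls back to `mu + b_u`, i.e., a user-bias-only prediction.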

Conclusion
1. A business has a physical location and a business has neighbors.
2. A business's rating is weakly positively correlated with its geographical neighbors' rating.
3. We extend the Biased MF model to include both intrinsic characteristics and extrinsic characteristics of a business.
4. We show that geographical neighborhood influence, business category, popularity, and review content improve rating prediction accuracy.
5. We show that geographical distance between a user and a business adversely affects the prediction accuracy.
Open question: which neighbors to consider?

Dr. Aixin SUN axsun@ntu.edu.sg http://www.ntu.edu.sg/home/axsun/