Crowdsourcing and Its Applications on Scientific Research. Sheng Wei (Kuan Ta) Chen, Institute of Information Science, Academia Sinica. PNC 2009

Crowdsourcing = Crowd + Outsourcing: soliciting solutions via open calls to large-scale communities. PNC 2009 / Kuan Ta Chen 2

Examples
- Calls for professional help: awards of 50,000 to 1,000,000 for each task
- Office work platform
- Microtask platform: over 30,000 tasks available at the same time

What tasks are crowdsourceable?

Software Development. Reward: 25,000 USD

Data Entry. Reward: 4.4 USD/hour

Image Tagging. Reward: 0.04 USD

General Questions. Reward: points on Yahoo! Answers

Applications in Scientific Research

Image Understanding: 0.01 USD/task

0.02 USD/task

Human Action Recognition: 0.01 USD/task

0.01 USD/task

Linguistic Annotations: Word similarity (Snow et al. 2008). USD 0.2 for labeling 30 word pairs

Linguistic Annotations: Affect recognition (Snow et al. 2008). USD 0.4 to label 20 headlines (140 labels)

Linguistic Annotations
- Textual entailment: given "Microsoft was established in Italy in 1985", was Microsoft established in 1985?
- Word sense disambiguation: "a bass on the line" vs. "a funky bass line"
- Temporal annotation: "ran" happens before "fell"

More Examples
- Document relevance evaluation: Alonso et al. (2008)
- User rating collection: Kittur et al. (2008)
- Noun compound paraphrasing: Nakov (2008)
- Name resolution: Su et al. (2007)

Talk Progress
- Introduction
- Crowdsourcing Applications
- Crowdsourcing and Scientific Research
- Crowdsourcing in Multimedia QoE Assessment
- Conclusion

What is QoE? Quality of Experience = users' subjective satisfaction with a service (multimedia content in this context)

Motivation
- To provide a satisfying end-user experience, we need to measure the QoE of multimedia content efficiently and reliably. But how?
- Common approaches
  - objective evaluation methodology
  - subjective evaluation methodology

Objective Methodologies
- Image: PSNR, SSIM
- Voice: PESQ
- Video: VQM, PEVQ
- Problems
  - cannot capture all the QoE dimensions that may affect users' experiences
  - cannot include external factors, such as the quality of the headsets or the distance between the viewer and the display
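To make concrete what an objective metric measures, here is a minimal sketch of PSNR, the first image metric listed above, using the standard definition 10 * log10(MAX^2 / MSE). The function name `psnr`, the flat-list pixel representation, and the 8-bit default peak value are assumptions for illustration; they are not taken from the talk.

```python
import math

def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    max_value is the peak pixel value (255 assumes 8-bit samples).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel positions.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)

# Toy 4-pixel "images" (made-up values, for illustration only).
print(psnr([100, 100, 100, 100], [101, 99, 100, 100]))
```

The score depends only on pixel differences, which is why it cannot account for external factors such as the display or the viewing distance.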

Subjective Methodology
MOS (Mean Opinion Score)
  MOS  Quality    Impairment
  5    Excellent  Imperceptible
  4    Good       Perceptible but not annoying
  3    Fair       Slightly annoying
  2    Poor       Annoying
  1    Bad        Very annoying
Issues
- The concepts of the five scales cannot be concretely defined
- Dissimilar interpretations of the scale among users
- The MOS is only on an ordinal scale
- No methodology for verifying users' scoring results

Drawbacks of Subjective Evaluation
- High economic cost: participant payment
- High labor cost: supervision labor
- Physical space/time requirement
  - Transportation cost
  - Laboratory space (cannot run a 1,000-person experiment unless extremely well resourced)
  - Difficult to find participants willing to do experiments at 3 am

Crowdsourcing Challenges
- Not every Internet user is trustworthy
- Experiments run without supervision: users may give erroneous feedback perfunctorily, carelessly, or dishonestly
- This increases the variance of the evaluation results and leads to biased conclusions
- Need to find a way to detect problematic inputs!

Our Contributions
We propose a crowdsourceable framework to quantify the QoE of multimedia content. The framework
- supports systematic verification of participants' inputs;
- uses a rating procedure simpler than that of MOS, so there is less burden on participants;
- derives interval-scale scores that enable subsequent quantitative analysis and QoE provisioning.

Paired Comparison Test: Stimulus A vs. Stimulus B. Which one is better? Vote: Stimulus A

Features of Paired Comparison
- Generalizable across a variety of multimedia applications
- Simple comparative judgment
- Interval-scale QoE scores can be calculated
- The users' feedback can be verified
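The slides do not say how interval-scale scores are derived from the votes. One standard option for paired comparison data is the Bradley-Terry model, sketched below with the classic iterative update; treat the model choice, the function name `bradley_terry_scores`, and the vote counts as assumptions for illustration, not as the method used in the talk.

```python
import math

def bradley_terry_scores(win_counts, iterations=200):
    """Fit Bradley-Terry strengths and return their logarithms as scores.

    win_counts[(i, j)] = number of times stimulus i was preferred over j.
    Differences between the returned log-strengths are meaningful
    (interval scale); the zero point is arbitrary.
    """
    items = sorted({x for pair in win_counts for x in pair})
    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            wins = sum(win_counts.get((i, j), 0) for j in items if j != i)
            # Comparisons involving i, weighted by the current strengths.
            denom = sum(
                (win_counts.get((i, j), 0) + win_counts.get((j, i), 0))
                / (strength[i] + strength[j])
                for j in items if j != i
            )
            updated[i] = wins / denom if denom else strength[i]
        total = sum(updated.values())
        strength = {i: v / total for i, v in updated.items()}
    # A stimulus that never wins gets strength 0, reported as -inf.
    return {i: (math.log(s) if s > 0 else -math.inf) for i, s in strength.items()}

# Hypothetical vote counts for three stimuli (not data from the talk).
votes = {("A", "B"): 8, ("B", "A"): 2,
         ("B", "C"): 8, ("C", "B"): 2,
         ("A", "C"): 9, ("C", "A"): 1}
print(bradley_terry_scores(votes))
```

Unlike a mean of ordinal ratings, the gaps between these scores can be compared with one another, which is what subsequent quantitative analysis needs.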

Verification of Users' Inputs
- Transitivity property: if A > B and B > C, then A should be > C
- Transitivity Satisfaction Rate (TSR) = (# of triples that satisfy the transitivity rule) / (# of triples the transitivity rule may apply to)
- Detect inconsistent judgments from problematic users
  - TSR = 1: perfect consistency
  - TSR >= 0.8: generally consistent
  - TSR < 0.8: judgments are inconsistent
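A minimal sketch of how the TSR could be computed for one participant follows. It assumes each pair is judged at most once and counts unordered triples in which all three pairs were judged; the slide does not spell out the exact counting convention, so that choice and the function name are assumptions.

```python
from itertools import combinations

def transitivity_satisfaction_rate(wins):
    """TSR for one participant.

    wins: set of (better, worse) tuples, one per judged pair.
    Returns None when no triple has all three of its pairs judged.
    """
    items = sorted({x for pair in wins for x in pair})
    applicable = satisfied = 0
    for a, b, c in combinations(items, 3):
        # The rule can only be checked when all three pairs were judged.
        judged = all((x, y) in wins or (y, x) in wins
                     for x, y in ((a, b), (b, c), (a, c)))
        if not judged:
            continue
        applicable += 1
        # A triple violates transitivity exactly when it forms a cycle.
        cyclic = ((a, b) in wins and (b, c) in wins and (c, a) in wins) or \
                 ((b, a) in wins and (c, b) in wins and (a, c) in wins)
        if not cyclic:
            satisfied += 1
    return satisfied / applicable if applicable else None

# Hypothetical judgments: A > B > C > A is a cycle, the other triples are fine.
judgments = {("A", "B"), ("B", "C"), ("C", "A"),
             ("A", "D"), ("B", "D"), ("C", "D")}
print(transitivity_satisfaction_rate(judgments))  # 0.75
```

With the 0.8 threshold from the slide, this hypothetical participant (TSR = 0.75) would be flagged as inconsistent.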

Experiment Design
Suppose our task is to evaluate the effect of n audio processing algorithms (e.g., audio encoding)
1. Select an audio clip (source clip) as the evaluation target
2. Apply the n algorithms to the source clip and generate n different versions of the clip (test clips)
3. Create an Adobe Flash based system for users to evaluate the n test clips
4. A user needs to perform C(n, 2) = n(n-1)/2 paired comparisons
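The pairing step can be sketched as follows: enumerate every unordered pair of test clips, which gives n(n-1)/2 comparisons. Shuffling the pair order and the left/right position is an added assumption here (the slides do not describe the presentation order), and the function name and clip labels are hypothetical.

```python
import random
from itertools import combinations
from math import comb

def paired_comparison_schedule(test_clips, seed=None):
    """Return every unordered pair of test clips exactly once, in random order."""
    rng = random.Random(seed)
    pairs = [list(pair) for pair in combinations(test_clips, 2)]
    for pair in pairs:
        rng.shuffle(pair)  # assumption: randomize which clip is shown first
    rng.shuffle(pairs)     # assumption: randomize the order of the comparisons
    return [tuple(pair) for pair in pairs]

# Six test clips, labeled by bit rate in Kbps as an example.
clips = ["32", "48", "64", "80", "96", "128"]
schedule = paired_comparison_schedule(clips, seed=1)
print(len(schedule), comb(len(clips), 2))  # 15 15
```

The quadratic growth in n is the price of the simple judgment: 6 test clips need 15 comparisons, while 12 would need 66.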

Concept Flow of Acoustic QoE Evaluation

Acoustic QoE Evaluation: Which one is better? Simple pair comparison

Optical QoE Evaluation: Which one is better? Simple pair comparison

Acoustic QoE Evaluation
- MP3 compression level
  - Source clips: one fast-paced and one slow-paced song
  - MP3 CBR format with 6 bit rate levels: 32, 48, 64, 80, 96, and 128 Kbps
  - 127 participants and 3,660 paired comparisons
- Effect of packet loss rate on VoIP
  - Two speech codecs: G.722.1 and G.728
  - Packet loss rate: 0%, 4%, and 8%
  - 62 participants and 1,545 paired comparisons

Evaluation Results (figure panels: MP3 Compression Level; VoIP Packet Loss Rate)

Optical QoE Evaluation
- Video codec
  - Source clips: one fast-paced and one slow-paced video clip
  - Three codecs: H.264, WMV3, and XVID
  - Two bit rates: 400 and 800 Kbps
  - 121 participants and 3,345 paired comparisons

Optical QoE Evaluation
- Loss concealment scheme
  - Source clips: one fast-paced and one slow-paced video clip
  - Two concealment schemes
    - Frame copy (FC): conceal errors in a video frame by replacing a corrupted block with the block in the corresponding position in the previous frame
    - Frame copy with frame skip (FCFS): drop a frame if the percentage of corrupted slices in it exceeds 10%; otherwise apply the FC method to conceal the errors
  - Packet loss rate: 1%, 5%, and 8%
  - 91 participants and 2,745 paired comparisons

Evaluation Results (figure panels: Video Codec; Concealment Scheme)

Participant Source
- Laboratory: recruit part-time workers at an hourly rate of 8 USD
- MTurk: post experiments on the Mechanical Turk web site; pay the participant 0.15 USD for each qualified experiment
- Community: seek participants on the website of an Internet community with 1.5 million members; pay the participant an amount of virtual currency equivalent to one US cent for each qualified experiment

Evaluation of Proposed Framework
- Three participant sources, each with a different cost structure
  - Laboratory
  - Amazon Mechanical Turk (MTurk)
  - Community
- We compare the cost required by each participant source and the data quality it produces

Summary
- The first crowdsourceable QoE evaluation framework
- Users' inputs can be verified
  - the transitivity property: A > B and B > C imply A > C
  - detect inconsistent judgments from problematic users
- Experiments can thus be outsourced to the Internet crowd
  - lower monetary cost
  - wider participant diversity
  - while maintaining the quality of the evaluation results
Chen et al., "A Crowdsourceable QoE Evaluation Framework for Multimedia Content," Proceedings of ACM Multimedia 2009.

Quadrant of Euphoria http://mmnet.iis.sinica.edu.tw/link/qoe

Conclusion
- Crowdsourcing provides a new paradigm and a new platform for scientific research
- New applications, new methodologies, and new businesses are emerging with the aid of crowdsourcing

Thank You! Sheng Wei (Kuan Ta) Chen http://www.iis.sinica.edu.tw/~swc PNC 2009