Crowdsourcing and Its Applications on Scientific Research. Sheng Wei (Kuan Ta) Chen Institute of Information Science, Academia Sinica


Crowdsourcing and Its Applications on Scientific Research Sheng Wei (Kuan Ta) Chen Institute of Information Science, Academia Sinica PNC 2009

Crowdsourcing = Crowd + Outsourcing
- Soliciting solutions via open calls to large-scale communities
PNC 2009 / Kuan Ta Chen

Examples
- Call for professional help: awards of 50,000 to 1,000,000 for each task
- Office work platform
- Microtask platform: over 30,000 tasks at the same time

What tasks are crowdsourceable?

Software Development
- Reward: 25,000 USD

Data Entry
- Reward: 4.4 USD/hour

Image Tagging
- Reward: 0.04 USD

General Questions
- Reward: points on Yahoo! Answers

Applications in Scientific Research

Image Understanding
- 0.01 USD/task

0.02 USD/task

Human Action Recognition
- 0.01 USD/task

0.01 USD/task

Linguistic Annotations
- Word similarity (Snow et al. 2008)
- USD 0.2 for labeling 30 word pairs

Linguistic Annotations
- Affect recognition (Snow et al. 2008)
- USD 0.4 to label 20 headlines (140 labels)

Linguistic Annotations
- Textual entailment: given "Microsoft was established in Italy in 1985", does it follow that "Microsoft was established in 1985"?
- Word sense disambiguation: "a bass on the line" vs. "a funky bass line"
- Temporal annotation: "ran" happens before "fell"

More Examples
- Document relevance evaluation: Alonso et al. (2008)
- User rating collection: Kittur et al. (2008)
- Noun compound paraphrasing: Nakov (2008)
- Name resolution: Su et al. (2007)

Talk Progress
- Introduction
- Crowdsourcing Applications
- Crowdsourcing and Scientific Research
- Crowdsourcing in Multimedia QoE Assessment
- Conclusion

What is QoE?
- Quality of Experience = users' subjective satisfaction with a service (multimedia content in this context)

Motivation
- To provide a satisfying end-user experience, we need to measure the QoE of multimedia content efficiently and reliably. But how?
- Common approaches
  - Objective evaluation methodology
  - Subjective evaluation methodology

Objective Methodologies
- Image: PSNR, SSIM
- Voice: PESQ
- Video: VQM, PEVQ
- Problems
  - Cannot capture all the QoE dimensions that may affect users' experiences
  - Cannot include external factors
    - The quality of the headsets
    - The distance between the viewer and the display
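As an illustration of what an objective metric computes, here is a minimal sketch of PSNR using its standard definition, 10 * log10(MAX^2 / MSE). The function name `psnr`, the flat-list pixel representation, and the 8-bit default peak value are assumptions made for this example; they do not come from the talk.

```python
import math

def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumption: pixels are given as flat sequences of numbers and
    max_value is the peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(distorted) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the distorted signal.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)

identical = psnr([10, 20, 30], [10, 20, 30])
slightly_off = psnr([100, 100], [110, 90])
```

The score depends only on pixel differences, which is why such a metric cannot reflect external factors like headset quality or viewing distance.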

Subjective Methodology: MOS (Mean Opinion Score)

MOS  Quality    Impairment
5    Excellent  Imperceptible
4    Good       Perceptible but not annoying
3    Fair       Slightly annoying
2    Poor       Annoying
1    Bad        Very annoying

Issues
- The concepts of the five scale levels cannot be concretely defined
- Dissimilar interpretations of the scale among users
- The MOS is only on an ordinal scale
- No methodology for verifying users' scoring results

Drawbacks of Subjective Evaluation
- High economic cost
  - Participant payment
- High labor cost
  - Supervision labor
- Physical space/time requirement
  - Transportation cost
  - Laboratory space (cannot run a 1,000-person experiment unless extremely resourceful)
  - Difficult to find participants willing to do experiments at 3 am

Crowdsourcing Challenges
- Not every Internet user is trustworthy
- Experiments run without supervision
  - Users may give erroneous feedback perfunctorily, carelessly, or dishonestly
  - This increases the variance of the evaluation results and leads to biased conclusions
- Need to find a way to detect problematic inputs!

Our Contributions
- We propose a crowdsourceable framework to quantify the QoE of multimedia content. The framework:
  - supports systematic verification of participants' inputs;
  - uses a procedure simpler than that of MOS, so there is less burden on participants;
  - derives interval-scale scores that enable subsequent quantitative analysis and QoE provisioning.

Paired Comparison Test
- The participant is presented with Stimulus A and Stimulus B and asked "Which one is better?"
- The participant votes for one of the two (in the slide's example, Stimulus A)

Features of Paired Comparison
- Generalizable across a variety of multimedia applications
- Simple comparative judgment
- Interval-scale QoE scores can be calculated
- The users' feedback can be verified
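The slides state that interval-scale scores can be calculated from paired comparisons but do not say how. One common choice, shown here purely as an assumption, is the Bradley-Terry model, fitted with the standard iterative (minorization-maximization) update. The function name `bradley_terry`, the `wins` matrix layout, and the vote counts are hypothetical.

```python
import math

def bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a win-count matrix.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    Assumption: every item wins at least once, so no strength collapses to 0.
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes (comparisons between i and j) / (p_i + p_j).
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            updated.append(total_wins / denom if denom > 0 else p[i])
        total = sum(updated)
        p = [x / total for x in updated]
    return p

# Hypothetical votes for three test clips, 10 comparisons per pair.
wins = [
    [0, 3, 1],
    [7, 0, 4],
    [9, 6, 0],
]
strengths = bradley_terry(wins)
# Log-strengths lie on an interval scale; shift so the worst clip scores 0.
log_strengths = [math.log(s) for s in strengths]
scores = [v - min(log_strengths) for v in log_strengths]
```

Under this model only differences between scores are meaningful, which is what makes the scale an interval scale rather than an ordinal one.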

Verification of Users' Inputs
- Transitivity property: if A > B and B > C, then A should be > C
- Transitivity Satisfaction Rate (TSR):
  $$\mathrm{TSR} = \frac{\text{number of triples that satisfy the transitivity rule}}{\text{number of triples the transitivity rule may apply to}}$$
- Detect inconsistent judgments from problematic users
  - TSR = 1: perfect consistency
  - TSR >= 0.8: generally consistent
  - TSR < 0.8: judgments are inconsistent
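A minimal sketch of how the TSR could be computed for one participant follows. It rests on an interpretation that the slide does not spell out: a triple counts as "applicable" when all three of its pairs were judged, and it satisfies the rule unless the three judgments form a cycle. The function name and the input format are hypothetical.

```python
from itertools import combinations

def transitivity_satisfaction_rate(items, better):
    """Compute the TSR for one participant.

    items:  list of stimuli identifiers.
    better: set of (winner, loser) tuples, assumed to hold one judgment per pair.
    Returns None when no triple has all three of its pairs judged.
    """
    applicable = 0
    satisfied = 0
    for a, b, c in combinations(items, 3):
        pairs = [(a, b), (b, c), (a, c)]
        # Skip triples with a missing judgment (assumed not applicable).
        if not all((x, y) in better or (y, x) in better for x, y in pairs):
            continue
        applicable += 1
        wins = {a: 0, b: 0, c: 0}
        for x, y in pairs:
            winner = x if (x, y) in better else y
            wins[winner] += 1
        # A transitive triple has win counts 2, 1, 0; a cycle has 1, 1, 1.
        if sorted(wins.values()) == [0, 1, 2]:
            satisfied += 1
    return satisfied / applicable if applicable else None

consistent = {("A", "B"), ("B", "C"), ("A", "C")}
one_cycle = {("A", "B"), ("B", "C"), ("C", "A"),
             ("A", "D"), ("B", "D"), ("C", "D")}
tsr_consistent = transitivity_satisfaction_rate(["A", "B", "C"], consistent)
tsr_one_cycle = transitivity_satisfaction_rate(["A", "B", "C", "D"], one_cycle)
```

In the second example one of the four triples is cyclic, so the TSR is 0.75, which falls below the 0.8 threshold given on the slide.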

Experiment Design
Suppose our task is to evaluate the effect of n audio processing algorithms (e.g., audio encoding):
1. Select an audio clip (source clip) as the evaluation target
2. Apply the n algorithms to the source clip and generate n different versions of the clip (test clips)
3. Create an Adobe Flash based system for users to evaluate the n test clips
4. A user needs to perform $\binom{n}{2} = n(n-1)/2$ paired comparisons
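The number of comparisons in step 4 can be checked by enumerating the pairs directly. This is a minimal sketch; the function name is hypothetical, and the shuffling of pair order and of left/right position is an added assumption, not something the slide specifies.

```python
import random
from itertools import combinations
from math import comb

def paired_comparison_schedule(test_clips, seed=None):
    """Return every unordered pair of test clips once, in random order.

    Assumption: presentation order and left/right position are randomized;
    the slide only says that each user performs C(n, 2) comparisons.
    """
    rng = random.Random(seed)
    pairs = [list(pair) for pair in combinations(test_clips, 2)]
    for pair in pairs:
        rng.shuffle(pair)   # randomize which clip is shown as A or B
    rng.shuffle(pairs)      # randomize the order of the comparisons
    return [tuple(pair) for pair in pairs]

# Six test clips, e.g. one source clip encoded at six bit rates (Kbps).
bitrates = [32, 48, 64, 80, 96, 128]
schedule = paired_comparison_schedule(bitrates, seed=0)
num_comparisons = len(schedule)
```

With n = 6 test clips this gives 15 comparisons per source clip, and the count grows quadratically with n.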

Concept Flow of Acoustic QoE Evaluation

Acoustic QoE Evaluation
- Which one is better?
- Simple pair comparison

Optical QoE Evaluation
- Which one is better?
- Simple pair comparison

Acoustic QoE Evaluation
- MP3 compression level
  - Source clips: one fast-paced and one slow-paced song
  - MP3 CBR format with 6 bit rate levels: 32, 48, 64, 80, 96, and 128 Kbps
  - 127 participants and 3,660 paired comparisons
- Effect of packet loss rate on VoIP
  - Two speech codecs: G.722.1 and G.728
  - Packet loss rate: 0%, 4%, and 8%
  - 62 participants and 1,545 paired comparisons

Evaluation Results (figure panels: MP3 Compression Level; VoIP Packet Loss Rate)

Optical QoE Evaluation
- Video codec
  - Source clips: one fast-paced and one slow-paced video clip
  - Three codecs: H.264, WMV3, and XVID
  - Two bit rates: 400 and 800 Kbps
  - 121 participants and 3,345 paired comparisons

Optical QoE Evaluation
- Loss concealment scheme
  - Source clips: one fast-paced and one slow-paced video clip
  - Two concealment schemes
    - Frame copy (FC): conceal errors in a video frame by replacing a corrupted block with the block in the corresponding position in the previous frame
    - Frame copy with frame skip (FCFS): a frame is dropped if the percentage of corrupted slices in it exceeds 10%; otherwise the FC method is applied to conceal the errors
  - Packet loss rate: 1%, 5%, and 8%
  - 91 participants and 2,745 paired comparisons

Evaluation Results (figure panels: Video Codec; Concealment Scheme)

Participant Source
- Laboratory
  - Recruit part-time workers at an hourly rate of 8 USD
- MTurk
  - Post experiments on the Mechanical Turk web site
  - Pay the participant 0.15 USD for each qualified experiment
- Community
  - Seek participants on the website of an Internet community with 1.5 million members
  - Pay the participant an amount of virtual currency equivalent to one US cent for each qualified experiment

Evaluation of Proposed Framework
- Three participant sources, each with a different cost structure
  - Laboratory
  - Amazon Mechanical Turk (MTurk)
  - Community
- We compare the cost required by each participant source and the data quality it produces

Summary
- The first crowdsourceable QoE evaluation framework
- Users' inputs can be verified
  - The transitivity property: A > B and B > C imply A > C
  - Detect inconsistent judgments from problematic users
- Experiments can thus be outsourced to the Internet crowd
  - Lower monetary cost
  - Wider participant diversity
  - Maintaining the quality of the evaluation results
Chen et al., "A Crowdsourceable QoE Evaluation Framework for Multimedia Content," Proceedings of ACM Multimedia 2009.

Quadrant of Euphoria http://mmnet.iis.sinica.edu.tw/link/qoe

Conclusion
- Crowdsourcing provides a new paradigm and a new platform for scientific research
- New applications, new methodologies, and new businesses are emerging with the aid of crowdsourcing

Thank You! Sheng Wei (Kuan Ta) Chen http://www.iis.sinica.edu.tw/~swc PNC 2009