Bayesian Nonparametrics and DPMM


Bayesian Nonparametrics and DPMM
Machine Learning: Jordan Boyd-Graber, University of Colorado Boulder
Lecture 17

Clustering as Probabilistic Inference

GMM is a probabilistic model (unlike k-means).
There are several latent variables: means, assignments, (variances).
Before, we were doing EM.
Today: new models and new methods.

Nonparametric Clustering

What if the number of clusters is not fixed?
Nonparametric: the number of clusters can grow if the data need it.
This gives a probabilistic distribution over the number of clusters.

Dirichlet Process

A distribution over distributions.
Parameterized by $\alpha$ (the concentration parameter) and $G$ (the base distribution).
You can then draw observations $x \sim \mathrm{DP}(\alpha, G)$.

Defining a DP

Break off sticks.
Draw atoms.
Merge into a complete distribution.
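
This three-step recipe is the stick-breaking construction. Below is a minimal Python sketch (names like `stick_breaking_dp` are my own, not from the lecture), assuming a standard-normal base distribution and truncating once the unbroken stick is negligible:

```python
import numpy as np

def stick_breaking_dp(alpha, base_draw, tol=1e-6, rng=None):
    """Draw a truncated sample G ~ DP(alpha, G0) by stick-breaking."""
    rng = np.random.default_rng() if rng is None else rng
    weights, atoms = [], []
    remaining = 1.0                       # stick length not yet broken off
    while remaining > tol:
        frac = rng.beta(1.0, alpha)       # break off a Beta(1, alpha) fraction
        weights.append(remaining * frac)  # weight of this atom
        atoms.append(base_draw(rng))      # draw the atom from the base G0
        remaining *= 1.0 - frac
    return np.array(atoms), np.array(weights)

# Example: base distribution G0 = N(0, 1)
atoms, weights = stick_breaking_dp(alpha=1.0, base_draw=lambda r: r.normal(0.0, 1.0))
print(f"{len(weights)} atoms; top weights: {np.sort(weights)[::-1][:5]}")
```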

Properties of a DPMM

The expected value is the same as that of the base distribution:
$$\mathbb{E}_{\mathrm{DP}(\alpha, G)}[x] = \mathbb{E}_G[x] \qquad (1)$$
As $\alpha \to \infty$, $\mathrm{DP}(\alpha, G) \to G$.
The number of components is unbounded, so a DP is impossible to represent fully on a computer (truncation).
You can nest DPs.
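
A quick Monte Carlo check of property (1), a sketch assuming base $G = \mathcal{N}(0, 1)$ and the truncated stick-breaking draw from above; the average mean of many sampled DPs should be close to the base mean of zero:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_dps, means = 1.0, 2000, []
for _ in range(n_dps):
    weights, atoms, remaining = [], [], 1.0
    while remaining > 1e-6:               # truncated stick-breaking draw of G
        frac = rng.beta(1.0, alpha)
        weights.append(remaining * frac)
        atoms.append(rng.normal(0.0, 1.0))
        remaining *= 1.0 - frac
    means.append(np.dot(weights, atoms))  # mean of this sampled G
print(np.mean(means))                     # should be close to E_G[x] = 0
```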

Effect of scaling parameter α

[Figure: draws from the DP for different values of α.]
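
Without the figure, a small simulation gives the same intuition. This sketch (same stick-breaking assumptions as above) counts how many atoms are needed to cover 99% of the probability mass; larger α spreads mass over more atoms:

```python
import numpy as np

rng = np.random.default_rng(1)

def sticks_for_mass(alpha, mass=0.99, trials=500):
    """Average number of stick-breaking atoms needed to cover `mass`."""
    counts = []
    for _ in range(trials):
        covered, k, remaining = 0.0, 0, 1.0
        while covered < mass:
            frac = rng.beta(1.0, alpha)
            covered += remaining * frac
            remaining *= 1.0 - frac
            k += 1
        counts.append(k)
    return np.mean(counts)

for alpha in (0.1, 1.0, 10.0):
    print(f"alpha={alpha}: ~{sticks_for_mass(alpha):.1f} atoms for 99% of mass")
```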

DP as Mixture Model

[Figure: the DP used as a mixture model.] In a DP mixture model, a distribution $G$ is drawn from the DP, each observation's parameter $\theta_i$ is drawn from $G$, and $x_i$ is drawn from a likelihood with parameter $\theta_i$.
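
A sketch of that generative process, assuming a truncated stick-breaking draw of $G$, a $\mathcal{N}(0, 1)$ base over cluster means, and a unit-variance Gaussian likelihood (the same setup the derivation below uses):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n_obs = 1.0, 100

# 1. Draw G ~ DP(alpha, N(0, 1)) by truncated stick-breaking
weights, atoms, remaining = [], [], 1.0
while remaining > 1e-6:
    frac = rng.beta(1.0, alpha)
    weights.append(remaining * frac)
    atoms.append(rng.normal(0.0, 1.0))         # atom = a cluster mean
    remaining *= 1.0 - frac
weights = np.array(weights) / np.sum(weights)  # renormalize after truncation

# 2. For each observation: theta_i ~ G, then x_i ~ N(theta_i, 1)
ks = rng.choice(len(atoms), size=n_obs, p=weights)
x = rng.normal(np.array(atoms)[ks], 1.0)
print(len(np.unique(ks)), "clusters used among", n_obs, "observations")
```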

The Chinese Restaurant as a Distribution

To generate an observation, you first sit down at a table. You sit down at a table with probability proportional to the number of people already sitting at it.
[Figure: three tables seating 2, 3, and 2 of 7 customers, chosen with probabilities 2/7, 3/7, and 2/7; each table $k$ has a parameter $\mu_k$ from which its observations $x$ are drawn.]
But this is just maximum likelihood. Why are we talking about Chinese restaurants?

Always can squeeze in one more table...

The posterior of a DP is a CRP.
A new observation starts a new table / cluster with probability proportional to α.
But this must be balanced against the probability of the observation given a cluster.
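
A sketch of the CRP seating process with the new-table option (the function name `crp` is mine, not the lecture's; the number of occupied tables grows roughly as $\alpha \log n$):

```python
import numpy as np

def crp(n, alpha, rng=None):
    """Sample table assignments for n customers from a CRP(alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    counts, assignments = [], []       # customers per table, table per customer
    for i in range(n):                 # customer i sees i people already seated
        probs = np.array(counts + [alpha], dtype=float)
        probs /= i + alpha             # existing: n_k/(i+alpha); new: alpha/(i+alpha)
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):       # chose the new table
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

assignments, counts = crp(n=1000, alpha=1.0)
print(len(counts), "tables; largest:", sorted(counts, reverse=True)[:5])
```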

Gibbs Sampling

We want to know the cluster assignment (table) of each observation.
Take a random guess initially.
This provides a mean for each cluster.
Let the number of clusters grow.

Gibbs Sampling

We want to know $z$.
Compute $p(z_i \mid z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_m, x, \alpha, G)$.
Update $z_i$ by sampling from that distribution.
Keep going...

Notation:
$$p(z_i = k \mid z_{-i}) \equiv p(z_i \mid z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_m) \qquad (2)$$

Gibbs Sampling for DPMM

$$p(z_i = k \mid z_{-i}, x, \{\theta_k\}, \alpha) \qquad (3)$$
$$= p(z_i = k \mid z_{-i}, x_i, x_{-i}, \theta_k, \alpha) \qquad (4)$$
(dropping irrelevant terms)
$$\propto p(z_i = k \mid z_{-i}, \alpha)\, p(x_i \mid \theta_k, x_{-i}) \qquad (5)$$
(chain rule)
$$= \begin{cases} \dfrac{n_k}{n + \alpha} \displaystyle\int_\theta p(x_i \mid \theta)\, p(\theta \mid G, x)\, d\theta & \text{existing cluster} \\[2ex] \dfrac{\alpha}{n + \alpha} \displaystyle\int_\theta p(x_i \mid \theta)\, p(\theta \mid G)\, d\theta & \text{new cluster} \end{cases} \qquad (6)$$
(applying the CRP)
$$= \begin{cases} \dfrac{n_k}{n + \alpha}\, \mathcal{N}\!\left(x_i;\ \dfrac{n_k \bar{x}_k}{n_k + 1},\ 1\right) & \text{existing cluster} \\[2ex] \dfrac{\alpha}{n + \alpha}\, \mathcal{N}(x_i;\ 0,\ 1) & \text{new cluster} \end{cases} \qquad (7)$$
Scary integrals, assuming $G$ is a normal distribution with mean zero and unit variance. (Derived in the optional reading.)
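
Equation (7) translates directly into code. A sketch with a hypothetical helper `assignment_probs`, keeping the slide's simplified unit-variance predictive (an exact treatment would also inflate the predictive variance to $1 + 1/(n_k + 1)$):

```python
import numpy as np
from scipy.stats import norm

def assignment_probs(x_i, clusters, alpha):
    """Unnormalized p(z_i = k | ...) per equation (7).

    `clusters` maps cluster id -> array of the *other* observations
    currently assigned to it. Base G = N(0, 1), unit-variance likelihood.
    """
    n = sum(len(xs) for xs in clusters.values())
    probs = {}
    for k, xs in clusters.items():
        n_k, xbar = len(xs), np.mean(xs)
        probs[k] = n_k / (n + alpha) * norm.pdf(x_i, loc=n_k * xbar / (n_k + 1), scale=1.0)
    probs["new"] = alpha / (n + alpha) * norm.pdf(x_i, loc=0.0, scale=1.0)
    return probs
```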

Algorithm for Gibbs Sampling

1. Randomly assign observations to clusters.
2. For each iteration, and for each observation n:
   2.1 Unassign observation n.
   2.2 Sample a new cluster for observation n from the distribution above (a code sketch follows).
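
Putting the pieces together, a minimal sketch of the whole sampler, assuming 1-D data, base $G = \mathcal{N}(0, 1)$, a unit-variance likelihood, and the simplified predictive of equation (7); `dpmm_gibbs` and its details are illustrative, not the lecture's reference implementation:

```python
import numpy as np
from scipy.stats import norm

def dpmm_gibbs(x, alpha=1.0, n_iters=30, rng=None):
    """Collapsed Gibbs sampling for a 1-D DP mixture of unit-variance Gaussians."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(x)
    z = np.zeros(m, dtype=int)                   # step 1: start with one cluster
    for _ in range(n_iters):
        for i in range(m):                       # step 2: sweep over observations
            z[i] = -1                            # step 2.1: unassign observation i
            labels = [k for k in np.unique(z) if k != -1]
            n = m - 1                            # number of other observations
            probs, options = [], []
            for k in labels:                     # existing clusters, eq. (7)
                xs = x[z == k]
                n_k, xbar = len(xs), xs.mean()
                probs.append(n_k / (n + alpha) *
                             norm.pdf(x[i], loc=n_k * xbar / (n_k + 1), scale=1.0))
                options.append(k)
            probs.append(alpha / (n + alpha) *   # new cluster, eq. (7)
                         norm.pdf(x[i], loc=0.0, scale=1.0))
            options.append(max(labels, default=-1) + 1)
            probs = np.array(probs) / np.sum(probs)
            z[i] = options[rng.choice(len(options), p=probs)]  # step 2.2
    return z

# Toy data: two well-separated groups
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])
z = dpmm_gibbs(x, alpha=1.0, rng=rng)
print("clusters found:", len(np.unique(z)))
```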

Toy Example

[Sequence of figures stepping through Gibbs sweeps on a small dataset: observations are reassigned one at a time; at one step a new cluster is created; and the sweep repeats...]

Differences between EM and Gibbs

Gibbs is often faster to implement.
EM's convergence is easier to diagnose.
EM can be parallelized.
Gibbs is more widely applicable.

In class and next week

Walking through DPMM clustering.
Clustering discrete data with more than one cluster per observation.