Conditional Distributions


X, Y discrete: the conditional pmf of X given $Y = y$ is defined to be

$$p_{X|Y}(x \mid y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p(x, y)}{p_Y(y)}, \quad p_Y(y) > 0.$$

Given $Y = y$, the randomness of X is described by $p(x, y)$, but $p(x, y)$ is NOT a pmf wrt (with respect to) x, since in general $\sum_{\text{all } x} p(x, y) \neq 1$. We need the normalizing constant $p_Y(y)$ to make it a valid pmf.

X, Y continuous: the conditional pdf of X given $Y = y$ is defined to be

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}, \quad f_Y(y) > 0.$$

Given $Y = y$, $f(x, y)$ is NOT a pdf wrt x, since $\int f(x, y)\,dx = f_Y(y) \neq 1$. So we need $f_Y(y)$ in the denominator to make it a legit pdf. Check the Wolfram Demo.

If X and Y are independent, then

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{f_X(x) f_Y(y)}{f_Y(y)} = f_X(x) \quad \text{(due to independence)},$$

$$p_{X|Y}(x \mid y) = p_X(x).$$

In general, X and Y are dependent, and then $f_{X|Y}(x \mid y) \neq f_X(x)$: given the extra information that $Y = y$, the distribution of X is no longer the same as the marginal $f_X(x)$.

Example 1 (2.1.1 on p.74, Revisit)

Joint pmf $p(x, y)$ with marginals:

X \ Y      0      1      2      3      p_X(x)
0          1/8    1/8    0      0      2/8
1          0      2/8    2/8    0      4/8
2          0      0      1/8    1/8    2/8
p_Y(y)     1/8    3/8    3/8    1/8

a) Find the conditional pmf $p_{Y|X}(y \mid x)$, the conditional expectation $E(Y \mid X = x)$ and the conditional variance $\mathrm{Var}(Y \mid X = x)$.

y                  0      1      2      3      E(Y | X = x)    Var(Y | X = x)
p_{Y|X}(y | 0)     0.5    0.5    0      0      0.5             1/4
p_{Y|X}(y | 1)     0      0.5    0.5    0      1.5             1/4
p_{Y|X}(y | 2)     0      0      0.5    0.5    2.5             1/4
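The table in part a) can be reproduced mechanically. The following is a minimal sketch, assuming the joint pmf is stored as a list of rows; the helper name `conditional_summary` is made up for illustration and is not part of the course material. It uses exact fractions so the results can be compared with the table directly.

```python
from fractions import Fraction as F

# Joint pmf p(x, y) from Example 1: rows are x = 0, 1, 2; columns are y = 0, 1, 2, 3.
joint = [
    [F(1, 8), F(1, 8), F(0), F(0)],
    [F(0), F(2, 8), F(2, 8), F(0)],
    [F(0), F(0), F(1, 8), F(1, 8)],
]
y_values = [0, 1, 2, 3]


def conditional_summary(row):
    """Return (conditional pmf, mean, variance) of Y for one row of the joint table.

    Hypothetical helper, written only for this illustration.
    """
    p_x = sum(row)                        # marginal p_X(x), the normalizing constant
    pmf = [p / p_x for p in row]          # p_{Y|X}(y | x) = p(x, y) / p_X(x)
    mean = sum(y * q for y, q in zip(y_values, pmf))
    second = sum(y * y * q for y, q in zip(y_values, pmf))
    return pmf, mean, second - mean ** 2  # variance = 2nd moment - mean^2


summaries = [conditional_summary(row) for row in joint]
for x, (pmf, mean, var) in enumerate(summaries):
    print(x, [str(q) for q in pmf], mean, var)
```

Each row of the joint table is divided by its own row sum, which is exactly the normalization by $p_X(x)$ in the definition of the conditional pmf.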

Conditional Expectations

Discrete case:

$$E(X \mid Y = y) = \sum_x x\,P(X = x \mid Y = y) = \sum_x x\,p_{X|Y}(x \mid y).$$

Continuous case:

$$E(X \mid Y = y) = \int x\,f_{X|Y}(x \mid y)\,dx.$$

What does the symbol $E(X \mid Y)$ mean? You can view it as a function of Y, i.e., $E(X \mid Y) = g(Y)$ with its value at $Y = y$ given by $g(y) = E(X \mid Y = y)$. Therefore $E(X \mid Y)$ is a random variable. We can talk about its distribution (HW1, p7) and compute its mean and variance.

Example 1 (2.1.1 on p.74, Revisit)

$E(Y \mid X)$ is a r.v., which equals $E(Y \mid X = x)$ with probability $p_X(x)$. That is,

$E(Y \mid X) = 0.5$ with prob $p_X(0) = 1/4$,
$E(Y \mid X) = 1.5$ with prob $p_X(1) = 1/2$,
$E(Y \mid X) = 2.5$ with prob $p_X(2) = 1/4$.

What's the expectation of the r.v. $E(Y \mid X)$?

$$E\big[E(Y \mid X)\big] = (0.5)\tfrac{1}{4} + (1.5)\tfrac{1}{2} + (2.5)\tfrac{1}{4} = \frac{0.5 + (1.5)(2) + 2.5}{4} = \frac{6}{4} = 1.5,$$

which is the same as

$$EY = (0)\tfrac{1}{8} + (1)\tfrac{3}{8} + (2)\tfrac{3}{8} + (3)\tfrac{1}{8} = \frac{12}{8} = 1.5.$$

This is true for any joint dist, $E\,E(Y \mid X) = EY$, due to the iterative rule for expectation.
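The equality of the two averages can also be checked by machine. This is a small sketch under the assumption that the joint pmf of Example 1 is stored as a dictionary keyed by $(x, y)$; the variable names are illustrative only.

```python
from fractions import Fraction as F

# Joint pmf p(x, y) of Example 1, keyed by (x, y); omitted pairs have probability 0.
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 1): F(2, 8), (1, 2): F(2, 8),
    (2, 2): F(1, 8), (2, 3): F(1, 8),
}

# Marginal pmf of X: p_X(x) = sum over y of p(x, y).
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p

# g(x) = E(Y | X = x) = sum over y of y * p(x, y) / p_X(x).
cond_mean = {
    x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]
    for x in p_x
}

# Outer expectation over X: E[E(Y | X)] = sum over x of g(x) * p_X(x).
outer = sum(cond_mean[x] * p_x[x] for x in p_x)

# Direct expectation: EY = sum over (x, y) of y * p(x, y).
direct = sum(y * p for (x, y), p in joint.items())

print(cond_mean, outer, direct)
```

Both `outer` and `direct` come out as 3/2, matching the hand computation.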

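Iterative Rule

$$E\big[E(X \mid Y)\big] = EX.$$

Proof (continuous case):

$$E\big[E(X \mid Y)\big] = \int E(X \mid Y = y)\,f_Y(y)\,dy = \int \Big[\int x\,f_{X|Y}(x \mid y)\,dx\Big] f_Y(y)\,dy$$

$$= \iint x\,\frac{f(x, y)}{f_Y(y)}\,f_Y(y)\,dx\,dy = \iint x\,f(x, y)\,dx\,dy = \int x \Big[\int f(x, y)\,dy\Big] dx = \int x\,f_X(x)\,dx = EX.$$

Similarly we have $E\big[E(g(X) \mid Y)\big] = E\,g(X)$.

Note: what's more useful is the form $E_X X = E_Y E(X \mid Y)$.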

Sometimes, I'll write the conditional expectation $E(\,\cdot \mid Y)$ as $E_{X|Y}[\,\cdot\,]$, especially when the argument has a lengthy expression. Here $E_{X|Y}$ just means taking the expectation of X with respect to the conditional distribution of X given Y. Note that $E_{X|Y}$ only averages over X and treats Y as a constant.

I also use notations like $E_Y$ in the slides, to remind you that this expectation is over Y only, wrt the marginal distribution $f_Y(y)$. Similarly, $E_X$ refers to the expectation over X wrt $f_X(x)$.

Usually the meaning of the expectation is clear from the context, e.g., $E\,g(X)$ must be $E_X g(X)$, so you don't need to write subscripts in your homework/exam.

The general Iterative Rule

$$E\,g(X, Y) = E_Y E_{X|Y}\,g(X, Y).$$

$$\text{LHS} = E\,g(X, Y) = \iint f(x, y)\,g(x, y)\,dx\,dy.$$

$$\text{RHS} = E_Y\big[E_{X|Y}\,g(X, Y)\big] = \int f_Y(y) \Big[\int f_{X|Y}(x \mid y)\,g(x, y)\,dx\Big] dy = \iint f_Y(y)\,\frac{f(x, y)}{f_Y(y)}\,g(x, y)\,dx\,dy = \iint f(x, y)\,g(x, y)\,dx\,dy.$$

This is essentially the same as the Chain Rule of probability.

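Useful Properties

- Linearity: $E(aX_1 + bX_2 \mid Y) = a\,E(X_1 \mid Y) + b\,E(X_2 \mid Y)$.
- Take constants outside an expectation: $E[g(Y) \mid Y] = g(Y)$ and $E[g(Y) X \mid Y] = g(Y)\,E(X \mid Y)$. In particular, $E[E(X \mid Y) \mid Y] = E(X \mid Y)$.
- Iterative rule: $E_Y E(X \mid Y) = E(X)$ and $E_Y E(g(X) \mid Y) = E\,g(X)$.
- $E\big[(X - E(X \mid Y))\,g(Y)\big] = 0$, since

$$E\big[(X - E(X \mid Y))\,g(Y)\big] = E_Y E_{X|Y}\big[(X - E(X \mid Y))\,g(Y)\big] = E_Y\big[g(Y)\,E_{X|Y}(X - E(X \mid Y))\big] = E_Y\big[g(Y)\,(E(X \mid Y) - E(X \mid Y))\big] = 0.$$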
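The last property says that the residual $X - E(X \mid Y)$ averages to zero against any function of the conditioning variable. A quick sketch on the Example 1 table illustrates it, with the roles swapped because that example conditions on X: it checks $E[(Y - E(Y \mid X))\,g(X)] = 0$. The test functions in `test_functions` are arbitrary choices made for this illustration.

```python
from fractions import Fraction as F

# Joint pmf p(x, y) of Example 1, keyed by (x, y); omitted pairs have probability 0.
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 1): F(2, 8), (1, 2): F(2, 8),
    (2, 2): F(1, 8), (2, 3): F(1, 8),
}


def marginal_x(x):
    """p_X(x) = sum over y of p(x, y)."""
    return sum(p for (xx, _), p in joint.items() if xx == x)


def cond_mean_y(x):
    """E(Y | X = x) = sum over y of y * p(x, y) / p_X(x)."""
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / marginal_x(x)


def residual_product(g):
    """E[(Y - E(Y | X)) g(X)], computed directly from the joint pmf."""
    return sum((y - cond_mean_y(x)) * g(x) * p for (x, y), p in joint.items())


# Arbitrary functions of X, chosen only for illustration.
test_functions = [
    lambda x: 1,
    lambda x: x,
    lambda x: x ** 2 - 3 * x,
    lambda x: F(1, x + 1),
]
results = [residual_product(g) for g in test_functions]
print(results)
```

Every entry of `results` is exactly zero, whatever function of X is plugged in.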

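Similarly, we can define the conditional variance $\mathrm{Var}(X \mid Y)$, the variance of X under the conditional distribution of X given Y. Like $g(Y) = E(X \mid Y)$, it is a r.v. (check the calculation for Example 1).

$$\mathrm{Var}(X \mid Y) = E\big[(X - E(X \mid Y))^2 \mid Y\big]$$

$$= E\big[X^2 - 2X\,E(X \mid Y) + E(X \mid Y)^2 \mid Y\big]$$

$$= E[X^2 \mid Y] - 2E\big[X\,E(X \mid Y) \mid Y\big] + E\big[E(X \mid Y)^2 \mid Y\big]$$

$$= E[X^2 \mid Y] - 2E(X \mid Y)\,E(X \mid Y) + E(X \mid Y)^2$$

$$= E(X^2 \mid Y) - \big[E(X \mid Y)\big]^2 = (\text{Conditional 2nd Moment}) - (\text{Conditional Mean})^2.$$

Note that (shown on the next slide)

$$\mathrm{Var}(X) = E(\mathrm{Var}(X \mid Y)) + \mathrm{Var}(E(X \mid Y)).$$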

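$$\mathrm{Var}(X) = E(X - \mu_X)^2 = E\big(X - E(X \mid Y) + E(X \mid Y) - \mu_X\big)^2$$

$$= E\big[X - E(X \mid Y)\big]^2 + E\big[E(X \mid Y) - \mu_X\big]^2 + 2E\big[(X - E(X \mid Y))(E(X \mid Y) - \mu_X)\big]$$

$$= E_Y\big\{E_{X|Y}\big[X - E(X \mid Y)\big]^2\big\} + E_Y E_{X|Y}\big[E(X \mid Y) - \mu_X\big]^2 + 2E_Y\big\{E_{X|Y}\big[(X - E(X \mid Y))(E(X \mid Y) - \mu_X)\big]\big\}$$

$$= E(\mathrm{Var}(X \mid Y)) + \mathrm{Var}(E(X \mid Y)).$$

The cross term vanishes because, given Y, the factor $E(X \mid Y) - \mu_X$ is a constant and $E_{X|Y}\big[X - E(X \mid Y)\big] = 0$.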

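You may get confused by the expression $E(X \mid Y)$ on the previous slide. Let's go through the proof again with the notation $g(Y) = E(X \mid Y)$, and note that $E\,g(Y) = EX = \mu_X$.

$$\mathrm{Var}(X) = E(X - \mu_X)^2 = E\big(X - g(Y) + g(Y) - \mu_X\big)^2$$

$$= E_{X,Y}\big[X - g(Y)\big]^2 + E_Y\big[g(Y) - \mu_X\big]^2 + 2E_{X,Y}\big[(X - g(Y))(g(Y) - \mu_X)\big]$$

$$= E_Y\big\{E_{X|Y}\big[X - g(Y)\big]^2\big\} + E\big[g(Y) - \mu_X\big]^2 + 2E_Y\big\{E_{X|Y}\big[(X - g(Y))(g(Y) - \mu_X)\big]\big\}$$

$$= E(\mathrm{Var}(X \mid Y)) + \mathrm{Var}(E(X \mid Y)).$$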

How to understand $\mathrm{Var}(X) = E(\mathrm{Var}(X \mid Y)) + \mathrm{Var}(E(X \mid Y))$

Let X denote the height of a randomly chosen student from stat410. Suppose students can be divided into several sub-populations (r.v. Y). The height (r.v. X) variation comes from two sources:

- Variation within each sub-population (variation of X given Y)
- Variation among the mean heights of the sub-populations (variation of $E(X \mid Y)$)

The total variation is the sum of these two.
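The decomposition can be verified exactly on the Example 1 table. The sketch below conditions on X, as in that example, so the sub-populations are the three values of X and the identity reads $\mathrm{Var}(Y) = E(\mathrm{Var}(Y \mid X)) + \mathrm{Var}(E(Y \mid X))$. The names `within` and `between` are illustrative labels for the two sources of variation, not notation from the slides.

```python
from fractions import Fraction as F

# Joint pmf p(x, y) of Example 1, keyed by (x, y); omitted pairs have probability 0.
joint = {
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 1): F(2, 8), (1, 2): F(2, 8),
    (2, 2): F(1, 8), (2, 3): F(1, 8),
}

# Marginal pmf of X.
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p


def cond_moment(x, k):
    """E(Y^k | X = x) under the conditional pmf p(x, y) / p_X(x)."""
    return sum(y ** k * p for (xx, y), p in joint.items() if xx == x) / p_x[x]


cond_mean = {x: cond_moment(x, 1) for x in p_x}
cond_var = {x: cond_moment(x, 2) - cond_mean[x] ** 2 for x in p_x}

# Total variance of Y from its marginal distribution.
mean_y = sum(y * p for (_, y), p in joint.items())
var_y = sum(y ** 2 * p for (_, y), p in joint.items()) - mean_y ** 2

# Within-group part E(Var(Y | X)) and between-group part Var(E(Y | X)).
within = sum(cond_var[x] * p_x[x] for x in p_x)
between = sum((cond_mean[x] - mean_y) ** 2 * p_x[x] for x in p_x)

print(var_y, within, between)
```

Here the total variance 3/4 splits into 1/4 from variation within each value of X and 1/2 from variation of the conditional means.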

Example 2: The joint pdf is

$$f(x, y) = 60x^2 y, \quad 0 \le x,\ y \le 1,\ x + y \le 1,$$

and zero elsewhere. (JointDistributions.pdf, ConditionalDistributions.pdf)

We have computed the marginal pdfs

$$f_X(x) = 30x^2(1 - x)^2, \quad 0 < x < 1, \quad E(X) = \tfrac{1}{2},$$

$$f_Y(y) = 20y(1 - y)^3, \quad 0 < y < 1, \quad E(Y) = \tfrac{1}{3}.$$

a) Find the conditional pdf $f_{X|Y}(x \mid y)$ of X given $Y = y$, $0 < y < 1$.

$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{3x^2}{(1 - y)^3}, \quad 0 < x < 1 - y.$$

Check that $f_{X|Y}(x \mid y)$ is a valid pdf wrt x: clearly $f_{X|Y}(x \mid y) \ge 0$, and

$$\int f_{X|Y}(x \mid y)\,dx = \int_0^{1-y} \frac{3x^2}{(1 - y)^3}\,dx = 1.$$

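b) Find $P(X > \tfrac{1}{2} \mid Y = \tfrac{1}{3})$.

Calculate this conditional probability using the conditional pdf:

$$f_{X|Y}(x \mid 1/3) = \frac{3x^2}{(1 - 1/3)^3} = \frac{3x^2}{(2/3)^3}, \quad 0 < x < 2/3.$$

$$P\Big(X > \tfrac{1}{2} \,\Big|\, Y = \tfrac{1}{3}\Big) = \int_{1/2}^{2/3} \frac{3x^2}{(2/3)^3}\,dx = \frac{x^3}{(2/3)^3}\bigg|_{1/2}^{2/3} = 1 - \frac{27}{64} = \frac{37}{64}.$$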
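Both the validity check in part a) and the answer to part b) can be confirmed numerically. The following is a rough sketch using a hand-written midpoint rule; the helper `midpoint_integral` and the grid size `n` are choices made for this illustration, and any standard quadrature routine would serve equally well.

```python
def cond_pdf(x, y):
    """Conditional pdf f_{X|Y}(x | y) = 3x^2 / (1 - y)^3 on 0 < x < 1 - y, else 0."""
    return 3 * x ** 2 / (1 - y) ** 3 if 0 < x < 1 - y else 0.0


def midpoint_integral(func, lo, hi, n=20_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule.

    Illustrative helper; n is an arbitrary grid size.
    """
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


y = 1 / 3

# Part a): the conditional pdf should integrate to 1 over 0 < x < 1 - y.
total = midpoint_integral(lambda x: cond_pdf(x, y), 0.0, 1 - y)

# Part b): P(X > 1/2 | Y = 1/3) is the integral from 1/2 to 2/3.
prob = midpoint_integral(lambda x: cond_pdf(x, y), 0.5, 1 - y)

print(total, prob, 37 / 64)
```

The numerical value of `prob` agrees with the exact answer $37/64 = 0.578125$ to well within the accuracy of the midpoint rule.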

How to use the conditional pmf/pdf to evaluate $P(a < X < b \mid Y = y)$?

For discrete rvs, you can either use the conditional pmf,

$$\sum_{a<x<b} p_{X|Y}(x \mid y) = \sum_{a<x<b} \frac{p(x, y)}{p_Y(y)} = \frac{\sum_{a<x<b} p(x, y)}{p_Y(y)},$$

or just follow the definition of conditional probability,

$$\frac{P(a < X < b,\ Y = y)}{P(Y = y)} = \frac{\sum_{a<x<b} p(x, y)}{\sum_{\text{all } x} p(x, y)}.$$

For continuous rvs, we CANNOT evaluate this probability via $P(a < X < b,\ Y = y)/P(Y = y)$ as in the discrete case, since $P(Y = y) = 0$. Instead we need to use the conditional pdf:

$$P(a < X < b \mid Y = y) = \int_a^b f_{X|Y}(x \mid y)\,dx.$$

Go through other examples from ConditionalDistributions.pdf.