Conditional Distributions

X, Y discrete: the conditional pmf of X given Y = y is defined to be
$$p_{X|Y}(x|y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p(x, y)}{p_Y(y)}, \qquad p_Y(y) > 0.$$
Given Y = y, the randomness of X is described by p(x, y), but p(x, y) is NOT a pmf wrt* x since $\sum_{\text{all } x} p(x, y) \ne 1$. We need the normalizing constant $p_Y(y)$ to make it a valid pmf.

*wrt = with respect to
X, Y continuous: the conditional pdf of X given Y = y is defined to be
$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}, \qquad f_Y(y) > 0.$$
Given Y = y, f(x, y) is NOT a pdf wrt x, since $\int f(x, y)\, dx = f_Y(y) \ne 1$. So we need $f_Y(y)$ in the denominator to make it a legitimate pdf.

Check the Wolfram Demo.
If X and Y are independent,
$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)} = \frac{f_X(x) f_Y(y)}{f_Y(y)} = f_X(x), \qquad p_{X|Y}(x|y) = p_X(x),$$
due to independence. In general, X and Y are dependent, and then $f_{X|Y}(x|y) \ne f_X(x)$: given the extra information that Y = y, the distribution of X is no longer the same as the marginal $f_X(x)$.
Example 1 (2.1.1 on p.74, Revisit)

Joint pmf p(x, y) with marginals:

  x \ y  |  0    1    2    3   | p_X(x)
  -------+---------------------+-------
    0    | 1/8  1/8   0    0   |  2/8
    1    |  0   2/8  2/8   0   |  4/8
    2    |  0    0   1/8  1/8  |  2/8
  -------+---------------------+-------
  p_Y(y) | 1/8  3/8  3/8  1/8  |

a) Find the conditional pmf $p_{Y|X}(y|x)$, conditional expectation $E(Y \mid X = x)$ and conditional variance $\mathrm{Var}(Y \mid X = x)$.

  y                  |  0    1    2    3   | E(Y|X=x) | Var(Y|X=x)
  -------------------+---------------------+----------+-----------
  p_{Y|X}(y | x = 0) | 1/2  1/2   0    0   |   0.5    |   1/4
  p_{Y|X}(y | x = 1) |  0   1/2  1/2   0   |   1.5    |   1/4
  p_{Y|X}(y | x = 2) |  0    0   1/2  1/2  |   2.5    |   1/4
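As a quick sanity check (not part of the original slides), the conditional pmf, mean, and variance from the table above can be computed directly from the joint pmf; the `fractions` module keeps the arithmetic exact:

```python
from fractions import Fraction as F

# Joint pmf p(x, y) from Example 1; cells not listed are zero.
p = {
    (0, 0): F(1, 8), (0, 1): F(1, 8),
    (1, 1): F(2, 8), (1, 2): F(2, 8),
    (2, 2): F(1, 8), (2, 3): F(1, 8),
}

def p_X(x):
    """Marginal pmf of X: sum the joint pmf over y."""
    return sum(v for (xx, y), v in p.items() if xx == x)

def p_Y_given_X(y, x):
    """Conditional pmf p_{Y|X}(y|x) = p(x, y) / p_X(x)."""
    return p.get((x, y), F(0)) / p_X(x)

def E_Y_given_X(x):
    """Conditional expectation E(Y | X = x)."""
    return sum(y * p_Y_given_X(y, x) for y in range(4))

def Var_Y_given_X(x):
    """Conditional variance E(Y^2 | X = x) - [E(Y | X = x)]^2."""
    m = E_Y_given_X(x)
    return sum(y**2 * p_Y_given_X(y, x) for y in range(4)) - m**2
```

Running this reproduces each row of the table, e.g. `E_Y_given_X(0)` is 1/2 and every conditional variance is 1/4.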
Conditional Expectations

$$E(X \mid Y = y) = \sum_x x\, P(X = x \mid Y = y) = \sum_x x\, p_{X|Y}(x|y) \quad \text{(discrete)},$$
$$E(X \mid Y = y) = \int x\, f_{X|Y}(x|y)\, dx \quad \text{(continuous)}.$$

What does the symbol E(X|Y) mean? You can view it as a function of Y, i.e., $E(X|Y) = g(Y)$, with its value at Y = y given by $g(y) = E(X \mid Y = y)$. Therefore E(X|Y) is a random variable. We can talk about its distribution (HW1, p7) and compute its mean and variance.
Example 1 (2.1.1 on p.74, Revisit)

E(Y|X) is a r.v., which equals $E(Y \mid X = x)$ with probability $p_X(x)$. That is,

  E(Y|X) = 0.5 with prob p_X(0) = 1/4,
  E(Y|X) = 1.5 with prob p_X(1) = 1/2,
  E(Y|X) = 2.5 with prob p_X(2) = 1/4.

What's the expectation of the r.v. E(Y|X)?
$$E[E(Y|X)] = (0.5)\tfrac{1}{4} + (1.5)\tfrac{1}{2} + (2.5)\tfrac{1}{4} = \frac{0.5 + (1.5)(2) + 2.5}{4} = \frac{6}{4} = 1.5,$$
which is the same as
$$EY = (0)\tfrac{1}{8} + (1)\tfrac{3}{8} + (2)\tfrac{3}{8} + (3)\tfrac{1}{8} = \tfrac{12}{8} = 1.5.$$
This is true for any joint distribution: $E[E(Y|X)] = EY$, due to the iterative rule for expectation.
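The two computations above can be checked mechanically (a small verification sketch, not part of the slides), again with exact fractions:

```python
from fractions import Fraction as F

# Distribution of the r.v. E(Y|X) from Example 1:
# value 1/2 w.p. p_X(0)=1/4, value 3/2 w.p. p_X(1)=1/2, value 5/2 w.p. p_X(2)=1/4.
cond_mean_dist = {F(1, 2): F(1, 4), F(3, 2): F(1, 2), F(5, 2): F(1, 4)}
EE = sum(v * prob for v, prob in cond_mean_dist.items())   # E[E(Y|X)]

# Marginal mean of Y: p_Y = (1/8, 3/8, 3/8, 1/8) on y = 0, 1, 2, 3.
p_Y = [F(1, 8), F(3, 8), F(3, 8), F(1, 8)]
EY = sum(y * prob for y, prob in enumerate(p_Y))           # EY
```

Both sums come out to exactly 3/2, matching the slide.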
Iterative Rule

$$E[E(X|Y)] = EX.^*$$

$$\begin{aligned}
E[E(X|Y)] &= \int E(X \mid Y = y)\, f_Y(y)\, dy
= \int \Big[ \int x\, f_{X|Y}(x|y)\, dx \Big] f_Y(y)\, dy \\
&= \int\!\!\int x\, \frac{f(x, y)}{f_Y(y)}\, f_Y(y)\, dx\, dy
= \int x \Big[ \int f(x, y)\, dy \Big] dx \\
&= \int x f_X(x)\, dx = EX.
\end{aligned}$$

Similarly we have $E[E(g(X)|Y)] = Eg(X)$.

*What's more useful is the form $E_X X = E_Y E(X|Y)$.
Sometimes, I'll write the conditional expectation $E(\cdot \mid Y)$ as $E_{X|Y}(\cdot)$, especially when $\cdot$ has a lengthy expression; $E_{X|Y}$ just means taking the expectation with respect to the conditional distribution of X given Y.* I also use notations like $E_Y$ in the slides, to remind you that this expectation is over Y only, wrt the marginal distribution $f_Y(y)$. Similarly, $E_X$ refers to the expectation over X wrt $f_X(x)$.

Usually the meaning of an expectation is clear from the context, e.g., $Eg(X)$ must be $E_X g(X)$, so you don't need to write subscripts in your homework/exam.

*Note that $E_{X|Y}$ would only average over X but treat Y as a constant.
The General Iterative Rule

$$Eg(X, Y) = E_Y\big[ E_{X|Y}\, g(X, Y) \big].$$

$$\text{LHS} = Eg(X, Y) = \int\!\!\int f(x, y)\, g(x, y)\, dx\, dy,$$
$$\begin{aligned}
\text{RHS} = E_Y\big[ E_{X|Y}\, g(X, Y) \big]
&= \int f_Y(y) \Big[ \int f_{X|Y}(x|y)\, g(x, y)\, dx \Big] dy \\
&= \int\!\!\int f_Y(y)\, \frac{f(x, y)}{f_Y(y)}\, g(x, y)\, dx\, dy
= \int\!\!\int f(x, y)\, g(x, y)\, dx\, dy.
\end{aligned}$$

This is essentially the same as the Chain Rule of probability.
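In the discrete case the same two-step computation is a pair of weighted sums. Below is a minimal sketch (not from the slides) checking LHS = RHS on the Example 1 table with the hypothetical choice $g(x, y) = xy$:

```python
from fractions import Fraction as F

# Joint pmf from Example 1; cells not listed are zero.
p = {(0, 0): F(1, 8), (0, 1): F(1, 8), (1, 1): F(2, 8),
     (1, 2): F(2, 8), (2, 2): F(1, 8), (2, 3): F(1, 8)}
g = lambda x, y: x * y                      # any function of (x, y) works

# LHS: direct expectation over the joint pmf.
lhs = sum(g(x, y) * v for (x, y), v in p.items())

# RHS: inner expectation over x given Y = y, then outer expectation over y.
p_Y = {y: sum(v for (x, yy), v in p.items() if yy == y) for y in range(4)}
inner = {y: sum(g(x, yy) * v / p_Y[y] for (x, yy), v in p.items() if yy == y)
         for y in range(4)}
rhs = sum(inner[y] * p_Y[y] for y in range(4))
```

Both sides equal 2 here, and they agree for any choice of `g`, which is exactly the general iterative rule.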
Useful Properties

Linearity: $E(aX_1 + bX_2 \mid Y) = aE(X_1|Y) + bE(X_2|Y)$.

Take "constants" outside an expectation: $E(g(Y) \mid Y) = g(Y)$ and $E(g(Y)X \mid Y) = g(Y)\,E(X|Y)$. In particular, $E(E(X|Y) \mid Y) = E(X|Y)$.

Iterative rule: $E_Y E(X|Y) = E(X)$, and $E_Y E(g(X)|Y) = Eg(X)$.

Also, $E\big[\big(X - E(X|Y)\big) g(Y)\big] = 0$ for any function g, since
$$E_Y E_{X|Y}\big[\big(X - E(X|Y)\big) g(Y)\big]
= E_Y\Big\{ g(Y)\, E_{X|Y}\big[X - E(X|Y)\big] \Big\}
= E_Y\Big\{ g(Y)\big[E(X|Y) - E(X|Y)\big] \Big\} = 0.$$
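The last property says the "residual" $X - E(X|Y)$ is uncorrelated with every function of Y. A small check on the Example 1 table (my own illustration, with the hypothetical choice $g(y) = y^2$):

```python
from fractions import Fraction as F

# Joint pmf from Example 1; cells not listed are zero.
p = {(0, 0): F(1, 8), (0, 1): F(1, 8), (1, 1): F(2, 8),
     (1, 2): F(2, 8), (2, 2): F(1, 8), (2, 3): F(1, 8)}

# Conditional means E(X | Y = y) computed from the table.
p_Y = {y: sum(v for (x, yy), v in p.items() if yy == y) for y in range(4)}
E_X_given_Y = {y: sum(x * v / p_Y[y] for (x, yy), v in p.items() if yy == y)
               for y in range(4)}

g = lambda y: y ** 2                        # any function of y alone
# E[(X - E(X|Y)) g(Y)] should be exactly 0.
resid = sum((x - E_X_given_Y[y]) * g(y) * v for (x, y), v in p.items())
```

With exact fractions the sum is exactly 0, as the property predicts, no matter which `g` you pick.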
Similarly, we can define the conditional variance Var(X|Y); like $g(Y) = E(X|Y)$, it is a random variable, a function of Y (check the calculation for Example 1).

$$\begin{aligned}
\mathrm{Var}(X|Y) &= E\Big[\big(X - E(X|Y)\big)^2 \,\Big|\, Y\Big] \\
&= E\big[X^2 - 2X\,E(X|Y) + E(X|Y)^2 \,\big|\, Y\big] \\
&= E(X^2|Y) - 2E(X|Y)\,E(X|Y) + E(X|Y)^2 \\
&= E(X^2|Y) - \big[E(X|Y)\big]^2 \\
&= \text{(Conditional 2nd Moment)} - \text{(Conditional Mean)}^2.
\end{aligned}$$

Note that (shown on the next slide)
$$\mathrm{Var}(X) = E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big[E(X|Y)\big].$$
$$\begin{aligned}
\mathrm{Var}(X) &= E(X - \mu_X)^2 = E\big(X - E(X|Y) + E(X|Y) - \mu_X\big)^2 \\
&= E\big[X - E(X|Y)\big]^2 + E\big[E(X|Y) - \mu_X\big]^2
   + 2E\big[\big(X - E(X|Y)\big)\big(E(X|Y) - \mu_X\big)\big] \\
&= E_Y\Big\{E_{X|Y}\big[X - E(X|Y)\big]^2\Big\}
   + E_Y\big[E(X|Y) - \mu_X\big]^2
   + 2E_Y\Big\{E_{X|Y}\big[\big(X - E(X|Y)\big)\big(E(X|Y) - \mu_X\big)\big]\Big\} \\
&= E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big[E(X|Y)\big],
\end{aligned}$$
where the cross term vanishes because $E_{X|Y}\big[X - E(X|Y)\big] = 0$.
You may get confused by the expression E(X|Y) on the previous slide. Let's go through the proof again with the notation $g(Y) = E(X|Y)$, and note that $Eg(Y) = EX = \mu_X$.

$$\begin{aligned}
\mathrm{Var}(X) &= E(X - \mu_X)^2 = E\big(X - g(Y) + g(Y) - \mu_X\big)^2 \\
&= E_{X,Y}\big[X - g(Y)\big]^2 + E_Y\big[g(Y) - \mu_X\big]^2
   + 2E_{X,Y}\big[\big(X - g(Y)\big)\big(g(Y) - \mu_X\big)\big] \\
&= E_Y\Big\{E_{X|Y}\big[X - g(Y)\big]^2\Big\}
   + E_Y\big[g(Y) - \mu_X\big]^2
   + 2E_Y\Big\{E_{X|Y}\big[\big(X - g(Y)\big)\big(g(Y) - \mu_X\big)\big]\Big\} \\
&= E\big[\mathrm{Var}(X|Y)\big] + \mathrm{Var}\big[E(X|Y)\big].
\end{aligned}$$
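The decomposition can also be verified numerically on Example 1 (a check I'm adding, not part of the slides), here conditioning on X, i.e. $\mathrm{Var}(Y) = E[\mathrm{Var}(Y|X)] + \mathrm{Var}[E(Y|X)]$:

```python
from fractions import Fraction as F

# Joint pmf from Example 1; cells not listed are zero.
p = {(0, 0): F(1, 8), (0, 1): F(1, 8), (1, 1): F(2, 8),
     (1, 2): F(2, 8), (2, 2): F(1, 8), (2, 3): F(1, 8)}
p_X = {x: sum(v for (xx, y), v in p.items() if xx == x) for x in range(3)}

def cond_mean(x):
    """E(Y | X = x)."""
    return sum(y * v / p_X[x] for (xx, y), v in p.items() if xx == x)

def cond_var(x):
    """Var(Y | X = x)."""
    m = cond_mean(x)
    return sum((y - m)**2 * v / p_X[x] for (xx, y), v in p.items() if xx == x)

E_cond_var = sum(cond_var(x) * p_X[x] for x in range(3))        # E[Var(Y|X)]
mu = sum(cond_mean(x) * p_X[x] for x in range(3))               # E[E(Y|X)] = EY
var_cond_mean = sum((cond_mean(x) - mu)**2 * p_X[x] for x in range(3))

# Direct Var(Y) from the joint pmf, for comparison.
var_Y = sum((y - mu)**2 * v for (x, y), v in p.items())
```

Here $E[\mathrm{Var}(Y|X)] = 1/4$ (within-group) and $\mathrm{Var}[E(Y|X)] = 1/2$ (between-group), which add up to $\mathrm{Var}(Y) = 3/4$ exactly.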
How to understand $\mathrm{Var}(X) = E[\mathrm{Var}(X|Y)] + \mathrm{Var}[E(X|Y)]$?

Let X denote the height of a randomly chosen student from stat410. Suppose students can be divided into several sub-populations (r.v. Y). The variation in height (r.v. X) comes from two sources:

  - variation within each sub-population (variation of X given Y);
  - variation among the mean heights of the sub-populations (variation of E(X|Y)).

The total variation is the sum of these two.
Example 2: The joint pdf is
$$f(x, y) = 60x^2 y, \quad 0 \le x,\ 0 \le y,\ x + y \le 1,$$
and zero elsewhere. (JointDistributions.pdf, ConditionalDistributions.pdf)

We have computed the marginal pdfs
$$f_X(x) = 30x^2(1 - x)^2,\ 0 < x < 1,\quad E(X) = \tfrac{1}{2}; \qquad
f_Y(y) = 20y(1 - y)^3,\ 0 < y < 1,\quad E(Y) = \tfrac{1}{3}.$$

a) Find the conditional pdf $f_{X|Y}(x|y)$ of X given Y = y, 0 < y < 1.
$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)} = \frac{3x^2}{(1 - y)^3}, \quad 0 < x < 1 - y.$$
Check that $f_{X|Y}(x|y)$ is a valid pdf wrt x: apparently $f_{X|Y}(x|y) \ge 0$, and
$$\int f_{X|Y}(x|y)\, dx = \int_0^{1-y} \frac{3x^2}{(1 - y)^3}\, dx = 1.$$
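The normalization check can also be done numerically. This is a sketch of my own (midpoint-rule integration, not the course's method) confirming that $f_{X|Y}(\cdot|y)$ integrates to 1 over its support for any fixed y:

```python
def f_X_given_Y(x, y):
    """Conditional pdf f_{X|Y}(x|y) = 3x^2/(1-y)^3 on 0 < x < 1 - y."""
    return 3 * x**2 / (1 - y)**3 if 0 < x < 1 - y else 0.0

def integral(y, n=10000):
    """Midpoint-rule integral of f_{X|Y}(.|y) over (0, 1 - y)."""
    h = (1 - y) / n
    return sum(f_X_given_Y((i + 0.5) * h, y) * h for i in range(n))
```

For instance `integral(1/3)` and `integral(0.5)` are both 1 up to numerical error, as the exact calculation on the slide predicts.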
b) Find $P(X > \tfrac{1}{2} \mid Y = \tfrac{1}{3})$.

Calculate this conditional probability using the conditional pdf:
$$f_{X|Y}(x \mid 1/3) = \frac{3x^2}{(1 - 1/3)^3} = \frac{3x^2}{(2/3)^3}, \quad 0 < x < \tfrac{2}{3}.$$
$$P\Big(X > \frac{1}{2} \,\Big|\, Y = \frac{1}{3}\Big)
= \int_{1/2}^{2/3} \frac{3x^2}{(2/3)^3}\, dx
= \frac{x^3}{(2/3)^3}\bigg|_{1/2}^{2/3}
= 1 - \frac{27}{64} = \frac{37}{64}.$$
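The antiderivative step above amounts to one line of exact arithmetic; here is a quick check (my addition, not part of the slides):

```python
from fractions import Fraction as F

# P(X > 1/2 | Y = 1/3) = x^3/(2/3)^3 evaluated from 1/2 to 2/3
#                      = 1 - (1/2)^3 / (2/3)^3.
prob = 1 - F(1, 2)**3 / F(2, 3)**3
```

This evaluates to exactly 37/64, matching the slide.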
How do we use a conditional pmf/pdf to evaluate $P(a < X < b \mid Y = y)$?

For discrete rvs, you can either use the conditional pmf,
$$\sum_{a < x < b} p_{X|Y}(x|y) = \sum_{a < x < b} \frac{p(x, y)}{p_Y(y)},$$
or just follow the definition of conditional probability,
$$\frac{P(a < X < b,\, Y = y)}{P(Y = y)}
= \frac{\sum_{a < x < b} p(x, y)}{\sum_{\text{all } x} p(x, y)}
= \frac{\sum_{a < x < b} p(x, y)}{p_Y(y)}.$$

For continuous rvs, we CANNOT evaluate this probability via $P(a < X < b,\, Y = y)/P(Y = y)$ as in the discrete case, since $P(Y = y) = 0$; instead we need to use the conditional pdf:
$$P(a < X < b \mid Y = y) = \int_a^b f_{X|Y}(x|y)\, dx.$$
Go through other examples from ConditionalDistributions.pdf.