Comparison of Two Measurement Devices I. Fundamental Ideas.

ASQ-RS Quality Conference, March 16, 2005
Joseph G. Voelkel, COE, RIT
Bruce Siskowski, Reichert, Inc.
Rev: 03/15/05, ASQ-RS 2005

Topics
- The Problem, Example, Mathematical Model
- One Solution: Bland-Altman Plots
- Better Data
- A Comparison to Gage R&R
- Mandel's Estimate
- Our Method, Models: Structural Equations, Path Diagrams
- Our Method, Analysis: Informal Graphs and Background Checks; Formal Likelihood Methods

The Problem
Two measuring devices need to be compared:
- You make these and are designing a new version or a new model. Is it better than the old one?
- You use these and a new one has been added to the lab. How does it compare to the current one?
(This can be extended to more than two devices.)

No Standard
- No standard exists for what is the right answer.
- A standard exists but is hard to come by (costly).
- A standard exists but is not realistic.

Examples
- Blood pressure
- Cardiac output: Fick method, dye dilution, thermal dilution

The Problem
- The correct answer is hard to come by.
- Even the gold standard has measurement error.

Tonometer
A medical screening device that measures the intraocular pressure of the human eye. The pressure acts on the retina and optic nerve. Increased sustained pressures above 23 mm Hg can lead to the vision-loss condition glaucoma. If the tonometer indicates a possible risk, an M.D. in ophthalmology runs other, more detailed tests for a more accurate diagnosis.

Tonometer
Problems with tonometer calibration:
- It is difficult to put pressure sensors inside the human eye to measure exact values.
- Sensor-insertion surgery exists (!) but would change the eye anyway.
- The original gold standard is the Goldmann Applanation Tonometer (GAT), which touches the eye.
[Figure: Reichert example of a contact tonometer]

Tonometer
Reichert has invented several non-contact "air-puff" versions since 1972 that:
- do not require anesthetic eye drops, and
- reduce operator variation through computerized automation.
Reichert's goal is to use or create better statistical tests to prove that Reichert tonometers have less measurement repeatability variation than the GAT. Prevalent techniques only check agreement and bias across a sample population.

Tonometer Example
- Two tonometers (different models). The reference device is called MDx and the device under test is called MDy.
- The example is slightly simplified from the original study.
- Only measurements of the left eye, in mm Hg (coded).
- The study was performed by selecting a sample of subjects. Each subject was measured with MDx and then with MDy.

Data
[Figure: scatter plot of MDx vs. MDy]
Note: This is based on only one reading from each eye. We will later consider averages based on multiple readings per eye (more common).

Data
[Figure: MDx vs. MDy, based on only one reading from each eye, with the two highest MDx values set aside]

Are the Two Devices Equivalent? And Other Questions
What does it mean to say "equivalent"? And if they are not equivalent, in what way are they not equivalent?

A (Tentative) Mathematical Model
For subjects 1, 2, ..., N:
- MDx: long-term average ("true?" value right now) $x_1, x_2, \dots, x_N$; observed readings $X_1, X_2, \dots, X_N$.
- MDy: long-term average $y_1, y_2, \dots, y_N$; observed readings $Y_1, Y_2, \dots, Y_N$.
What does it mean to say "equivalent"? (Tentative: we will try to use the data to see if this is reasonable.)

A Mathematical Model
1. Where did these subjects come from? A random sample (r.s.) of size N from a population.
2. What do the x's look like in the population? $x_i \overset{\text{ind}}{\sim} N(\mu, \sigma_x^2)$; our $x_i$'s are a sample from this distribution.
3. What do we observe? $X_i = x_i + e_i$, $e_i \overset{\text{ind}}{\sim} N(0, \sigma_e^2)$, where $e_i$ is the measurement error.

A Mathematical Model
[Figure: density of the x distribution, $N(\mu, \sigma_x^2)$, with one subject's value, say $x_1$, marked]
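To make steps 1 to 3 concrete, here is a minimal Python simulation sketch of the model for MDx alone. The helper name `simulate_x_and_X` and the parameter values ($\mu = 15$, $\sigma_x = 2.5$, $\sigma_e = 1.5$) are hypothetical illustrations, not estimates from the study. The sketch checks one consequence of the model: the observed readings are more spread out than the true values, with $\operatorname{Var}(X) = \sigma_x^2 + \sigma_e^2$.

```python
import random
import statistics


def simulate_x_and_X(n, mu, sigma_x, sigma_e, seed=0):
    """Hypothetical helper: draw true values x_i ~ N(mu, sigma_x^2) for a random
    sample of n subjects, and one observed reading X_i = x_i + e_i per subject."""
    rng = random.Random(seed)
    x = [rng.gauss(mu, sigma_x) for _ in range(n)]   # true (long-term average) values
    X = [xi + rng.gauss(0.0, sigma_e) for xi in x]   # readings with error e_i ~ N(0, sigma_e^2)
    return x, X


x, X = simulate_x_and_X(n=20000, mu=15.0, sigma_x=2.5, sigma_e=1.5)
# The model implies Var(x) = 6.25 and Var(X) = sigma_x^2 + sigma_e^2 = 6.25 + 2.25 = 8.5;
# the sample variances should land close to those values.
print(statistics.variance(x), statistics.variance(X))
```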

A Mathematical Model
[Figure: the x distribution with, say, $x_1$ marked, and the distribution of $X_1$ about $x_1$, where $X_i = x_i + e_i$, $e_i \overset{\text{ind}}{\sim} N(0, \sigma_e^2)$]

A Mathematical Model, under Equivalency
4. What about the y's? They should have some connection to the x's!
Equivalency Model (Model 1):
$y_i = x_i$
$Y_i = y_i + u_i$, $u_i \overset{\text{ind}}{\sim} N(0, \sigma_u^2)$
$\sigma_u = \sigma_e$

A Mathematical Model, under Equivalency
[Figure: the x distribution with the distributions of X and Y about the same true value, y = x]

Analysis: Is the Model Reasonable?
What medical researchers typically compute:
- Regression of X on Y? $\hat\beta = 0.86$ (s.e. = 0.07)
- Regression of Y on X? $\hat\beta = 0.70$ (s.e. = 0.06)
- Correlation of Y and X? $\hat\rho_{XY} = 0.78$
[Figure: MDx vs. MDy and MDy vs. MDx, based on only one reading from each eye]

Bland-Altman
Bland and Altman's response to this state of affairs: instead of plotting Y vs. X, plot Y − X vs. the average of X and Y. This is a difference-mean plot (used before, e.g. by Tukey). Then look for agreement.
[Figure: Data, all 93 subjects, based on only one reading from each eye. Left: MDx vs. MDy. Right: difference of X and Y vs. Aver(X & Y).]
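The three quantities on this slide can be computed directly. Below is a minimal sketch, assuming plain Python lists of paired single readings; `regression_summaries` is a hypothetical helper name and the toy data are not the study data. It illustrates a useful fact: the two slopes multiply to $r^2$ (on the slide, $0.86 \times 0.70 \approx 0.60$, close to $0.78^2 \approx 0.61$). Under Model 1 both slopes estimate the same quantity, $\sigma_x^2 / (\sigma_x^2 + \sigma_e^2)$, so clearly different slopes hint that the two devices' measurement s.d.'s differ.

```python
import statistics


def regression_summaries(X, Y):
    """Hypothetical helper: least-squares slopes of X on Y and of Y on X, and the correlation r."""
    mx, my = statistics.mean(X), statistics.mean(Y)
    n = len(X)
    cov = sum((a - mx) * (b - my) for a, b in zip(X, Y)) / (n - 1)
    var_x, var_y = statistics.variance(X), statistics.variance(Y)
    b_x_on_y = cov / var_y              # slope when X is regressed on Y
    b_y_on_x = cov / var_x              # slope when Y is regressed on X
    r = cov / (var_x * var_y) ** 0.5    # sample correlation
    return b_x_on_y, b_y_on_x, r


# Toy data (not the study data)
b_xy, b_yx, r = regression_summaries([1, 2, 3, 4, 5], [1, 3, 2, 5, 9])
print(b_xy, b_yx, r, b_xy * b_yx, r * r)   # the last two agree up to rounding
```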

Bland-Altman
Use the graph to check for:
- Outliers. If present, decide what to do.
- Linear trends. If present, use them to examine the amount of bias.
- More spread at higher Aver(X & Y) values. If present, try a log transformation.
If all is OK, summarize agreement by s.e.(X − Y). Here, if we use only N = 91, we get s.e. ≈ 2.

Bland-Altman
This became a very popular method; Bland-Altman became "the voice" for ensuring good studies.

Back to Model Thinking
So far, we have just defined equivalent devices. More generally, consider a model with possible linear bias.
Model 2:
$x_i \overset{\text{ind}}{\sim} N(\mu, \sigma_x^2)$
$y_i = \beta_0 + \beta_1 x_i$
$X_i = x_i + e_i$, $e_i \overset{\text{ind}}{\sim} N(0, \sigma_e^2)$
$Y_i = y_i + u_i$, $u_i \overset{\text{ind}}{\sim} N(0, \sigma_u^2)$
$\sigma_u = \sigma_e$
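A minimal sketch of the difference-mean computation behind a Bland-Altman plot, assuming paired single readings stored in two lists. The helper name `bland_altman` is hypothetical, and the 95% limits of agreement (mean difference ± 1.96 s.d.) are the form usually quoted by Bland and Altman rather than something stated on these slides. The sign convention (X − Y here) does not affect the s.d. of the differences.

```python
import statistics


def bland_altman(X, Y):
    """Hypothetical helper: the quantities plotted and summarized in a Bland-Altman analysis."""
    averages = [(x + y) / 2 for x, y in zip(X, Y)]   # horizontal axis: Aver(X & Y)
    diffs = [x - y for x, y in zip(X, Y)]            # vertical axis: X - Y
    bias = statistics.mean(diffs)                    # average difference (overall bias)
    sd = statistics.stdev(diffs)                     # s.d. of the differences (agreement summary)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)    # usual 95% limits of agreement
    return averages, diffs, bias, sd, limits


# Toy data (not the study data)
averages, diffs, bias, sd, limits = bland_altman([10, 12, 14], [11, 12, 13])
print(bias, sd, limits)
```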

Another Model
The last model allowed possible linear bias but the same measurement s.d.'s. This model has no linear bias but possibly different measurement s.d.'s.
Model 2':
$x_i \overset{\text{ind}}{\sim} N(\mu, \sigma_x^2)$
$y_i = x_i$
$X_i = x_i + e_i$, $e_i \overset{\text{ind}}{\sim} N(0, \sigma_e^2)$
$Y_i = y_i + u_i$, $u_i \overset{\text{ind}}{\sim} N(0, \sigma_u^2)$

And Another Model
A model with possible linear bias and different measurement s.d.'s.
Model 3:
$x_i \overset{\text{ind}}{\sim} N(\mu, \sigma_x^2)$
$y_i = \beta_0 + \beta_1 x_i$
$X_i = x_i + e_i$, $e_i \overset{\text{ind}}{\sim} N(0, \sigma_e^2)$
$Y_i = y_i + u_i$, $u_i \overset{\text{ind}}{\sim} N(0, \sigma_u^2)$
Very reasonable! MDx and MDy measure the same feature, but are possibly un-calibrated and possibly of different precision.

Information in the Data for Model 3
Under the Model 3 assumptions, it is well known that all the information in the data can be summarized with 5 numbers: $\bar X, \bar Y, s_X, s_Y, r_{X,Y}$ (or $\operatorname{Cov}(X, Y)$).
A slight problem: there are 6 parameters that must be estimated in the model ($\mu, \sigma_x, \beta_0, \beta_1, \sigma_e, \sigma_u$)!

Model 3 Problem
Model 3 is said to be unidentifiable with the data available:
$E(\bar X) = \mu$
$E(\bar Y) = \beta_0 + \beta_1 \mu$
$E(s_X^2) = \sigma_x^2 + \sigma_e^2$
$E(s_Y^2) = \beta_1^2 \sigma_x^2 + \sigma_u^2$
$E[\operatorname{Cov}(X, Y)] = \beta_1 \sigma_x^2$

Model 3 Problem
Model 3 is unidentifiable with the data available, yet Bland and Altman still advocate their method. What does it give?
- It does not allow the bias to be estimated cleanly.
- It does not give a pure estimated measure of agreement, but it does give a lower bound of it:
$E(s_{X-Y}^2) = (\beta_1 - 1)^2 \sigma_x^2 + \sigma_e^2 + \sigma_u^2$
So our s.e. ≈ 2 is a lower bound estimate of the s.d. of the differences.

Bland and Altman: A Question
Is agreement really what we want to examine? If there is a lack of agreement, do we know why? Which device, if either, is better?
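The identifiability problem can be checked numerically. The sketch below, assuming Model 3 with one reading per device per subject, maps a parameter set to the expected values of the five summaries. `model3_moments`, `var_diff` and both parameter sets are hypothetical illustrations, not study estimates. Two different parameter sets give identical expected summaries, so data of this kind cannot tell them apart; the expected variance of X − Y, $(\beta_1 - 1)^2\sigma_x^2 + \sigma_e^2 + \sigma_u^2$, comes out the same (4.0) for both.

```python
def model3_moments(mu, sigma_x2, beta0, beta1, sigma_e2, sigma_u2):
    """Hypothetical helper: expected values of the five Model 3 summaries
    (variances passed as sigma^2 values)."""
    return {
        "mean_X": mu,
        "mean_Y": beta0 + beta1 * mu,
        "var_X": sigma_x2 + sigma_e2,
        "var_Y": beta1 ** 2 * sigma_x2 + sigma_u2,
        "cov_XY": beta1 * sigma_x2,
    }


def var_diff(m):
    """Expected variance of X - Y implied by the expected summaries."""
    return m["var_X"] + m["var_Y"] - 2 * m["cov_XY"]


# Two hypothetical parameter sets: no bias vs. linear bias with different error variances
A = model3_moments(mu=15.0, sigma_x2=6.0, beta0=0.0, beta1=1.0, sigma_e2=3.0, sigma_u2=1.0)
B = model3_moments(mu=15.0, sigma_x2=20 / 3, beta0=1.5, beta1=0.9, sigma_e2=7 / 3, sigma_u2=1.6)

same = all(abs(A[k] - B[k]) < 1e-9 for k in A)
print(same, var_diff(A), var_diff(B))   # identical summaries, so A and B are indistinguishable
```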

This is simple. Better data: collect more than one observation for each subject!

Better Data
MDx (our total data): for each subject i there is a long-term average ("right now") value $x_i$ and several observed readings, $X_{ij}$, $i = 1, \dots, N$, $j = 1, \dots, J$ (here three readings per subject).

Better Data
The additional information: the repeat readings give within-subject s.d.'s $s_{e1}, s_{e2}, \dots, s_{eN}$ for MDx, which pool into $s_e$, and similarly $s_u$ for MDy.
Now: 7 summaries to estimate 6 parameters.
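With repeat readings, each device's measurement-error variance can be estimated from within-subject variation alone, without reference to the other device. A minimal sketch, assuming each subject's readings are stored as a list; `pooled_within_variance` is a hypothetical helper and the toy readings are not study data. Pooling the within-subject variances (weighted by their degrees of freedom) estimates $\sigma_e^2$ for MDx; the same function applied to the MDy readings estimates $\sigma_u^2$.

```python
import statistics


def pooled_within_variance(readings):
    """Hypothetical helper: pooled within-subject variance.

    readings: one list of repeat readings per subject (each with at least 2 readings).
    Each subject's sample variance is weighted by its degrees of freedom (J_i - 1).
    """
    num = sum((len(r) - 1) * statistics.variance(r) for r in readings)
    den = sum(len(r) - 1 for r in readings)
    return num / den


# Toy example: three subjects, three readings each (not the study data)
X_readings = [[10, 12, 14], [20, 20, 20], [5, 6, 7]]
s_e2 = pooled_within_variance(X_readings)   # (2*4 + 2*0 + 2*1) / 6 = 10/6
print(s_e2, s_e2 ** 0.5)                     # pooled variance and pooled s.d. s_e
```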

A Bigger Model
With 7 summaries to estimate 6 parameters, let's consider an even bigger (= more realistic) model. What if the two measuring devices are not quite measuring the same feature?
Model 4:
$x_i \overset{\text{ind}}{\sim} N(\mu, \sigma_x^2)$
$y_i = \beta_0 + \beta_1 x_i + \delta_i$, $\delta_i \overset{\text{ind}}{\sim} N(0, \sigma_\delta^2)$
$X_{ij} = x_i + e_{ij}$, $e_{ij} \overset{\text{ind}}{\sim} N(0, \sigma_e^2)$
$Y_{ij} = y_i + u_{ij}$, $u_{ij} \overset{\text{ind}}{\sim} N(0, \sigma_u^2)$

[Figure: Models 1 to 4 with hypothetical data, true values x and y, with the y = x line]
[Figure: Models 1 to 4 (and Model 2') with hypothetical data, observed X and Y; Y values graphed at x, X values graphed at y]
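Model 4 is also easy to simulate, which is a handy way to see what data from each model look like, in the spirit of the hypothetical-data panels. The sketch below is an illustration under assumed parameter values; `simulate_model4` is a hypothetical helper. Setting `sigma_delta = 0` gives Model 3; additionally setting `beta0 = 0` and `beta1 = 1` gives Model 2', and so on.

```python
import random


def simulate_model4(n, J, mu, sigma_x, beta0, beta1, sigma_delta, sigma_e, sigma_u, seed=1):
    """Hypothetical helper: simulate Model 4 with J readings per device per subject.

    Returns X, Y as lists of per-subject lists of readings.
    """
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        x = rng.gauss(mu, sigma_x)                                  # MDx true value x_i
        y = beta0 + beta1 * x + rng.gauss(0.0, sigma_delta)         # MDy true value y_i, off the line by delta_i
        X.append([x + rng.gauss(0.0, sigma_e) for _ in range(J)])   # X_ij = x_i + e_ij
        Y.append([y + rng.gauss(0.0, sigma_u) for _ in range(J)])   # Y_ij = y_i + u_ij
    return X, Y


# Illustrative parameter values only (not estimates from the study)
X, Y = simulate_model4(n=5000, J=3, mu=15.0, sigma_x=2.5, beta0=0.5, beta1=0.95,
                       sigma_delta=0.6, sigma_e=1.8, sigma_u=1.0)
print(X[0], Y[0])
```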

Comparison to Gage R&R
Two devices vs. one device but several operators (operators as devices):
- Gage R&R looks at general operator differences (vs. specific linear-trend differences and deviations from it).
- It assumes each operator's measurement error is equal (vs. looking for different device precision).
- It is typically a small study, with poor estimates (vs. more data and better estimates).

Mandel's Estimate and the Regression Problem
Mandel (1984, JQT) considered our Model 3 (possibly un-calibrated and of different precision, but measuring the same feature). He found the rule for finding the best-fitting line, that is, for estimating the relation between x and y, not X and Y.
[Figure: MDx (X) vs. MDy (Y), based on only one reading from each eye, with four least-squares lines]
- All measurement error in X: least squares based on the regression of X on Y.
- All measurement error in Y: least squares based on the regression of Y on X.
- Equal measurement error in X and Y: least squares based on the 45° line.
- General case: least squares based on the k line.
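When the ratio of the two measurement-error variances is known (for example, estimated from repeat readings), a best-fitting line for x and y can be computed in closed form. The sketch below uses the standard Deming (errors-in-variables) regression formula; it is offered as a closely related illustration, not as a transcription of Mandel's paper. `deming_line` and the ratio `lam` $= \sigma_u^2/\sigma_e^2$ are assumptions of this sketch. As `lam` grows (all error in Y) the slope approaches the Y-on-X least-squares slope; as `lam` shrinks toward 0 (all error in X) it approaches the reciprocal of the X-on-Y slope; `lam = 1` gives orthogonal regression.

```python
import math
import statistics


def deming_line(X, Y, lam):
    """Hypothetical helper: Deming fit of Y on X, returning (intercept, slope).

    lam = sigma_u^2 / sigma_e^2, the ratio of the Y-error variance to the X-error
    variance, assumed known. Assumes the sample covariance of X and Y is nonzero.
    """
    mx, my = statistics.mean(X), statistics.mean(Y)
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    d = syy - lam * sxx
    slope = (d + math.sqrt(d * d + 4 * lam * sxy * sxy)) / (2 * sxy)
    return my - slope * mx, slope


# Toy data (not the study data)
X = [1, 2, 3, 4, 5]
Y = [1, 3, 2, 5, 9]
for lam in (1e-6, 1.0, 1e6):
    print(lam, deming_line(X, Y, lam))
# Slopes run from about 40/18 = 2.22 (reciprocal of the X-on-Y slope) down to about 1.8 (Y-on-X slope)
```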

Data Analysis: Informal Methods
The largest model we want to fit is Model 4. But what if even this isn't right? Can the data tell us? Yes, up to a point. Examples of informal analysis:
- Does measurement variability increase as the values increase?
- Is there a trend in three consecutive readings?
- Is the bias, if any, linear?

Does Measurement Variability Increase as the Values Increase?
Consider MDy only here: for each subject, plot $s_Y$ (the s.d. of $Y_1, Y_2, Y_3$) vs. $\bar Y$.
[Figure: $s_Y$ vs. Ybar]

Is There a Trend in Three Consecutive Readings?
Look at $Y_3 - Y_1$ for each subject.
[Figure: distribution of $Y_3 - Y_1$]
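The quantities behind these two informal plots are simple per-subject summaries. A minimal sketch, assuming each subject's consecutive MDy readings are stored in order as a list; `informal_checks` is a hypothetical helper and the toy readings are not study data. Plotting s against the subject mean addresses the first question; the distribution of the last-minus-first differences ($Y_3 - Y_1$) addresses the second, where an average difference far from 0 would suggest a trend across readings.

```python
import statistics


def informal_checks(Y):
    """Hypothetical helper. Y: per-subject lists of consecutive readings from one device.

    Returns per-subject means, per-subject s.d.'s, last-minus-first differences,
    and the average of those differences.
    """
    ybar = [statistics.mean(r) for r in Y]    # horizontal axis of the s vs. Ybar plot
    s = [statistics.stdev(r) for r in Y]      # vertical axis: within-subject s.d.
    drift = [r[-1] - r[0] for r in Y]         # Y3 - Y1 when there are three readings
    return ybar, s, drift, statistics.mean(drift)


# Toy data (not the study data)
ybar, s, drift, mean_drift = informal_checks([[10, 11, 12], [20, 20, 20], [15, 14, 16]])
print(ybar, s, drift, mean_drift)
```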

Is the Bias, if any, Linear?
[Figure: X vs. Y, with the two largest X values set aside. Solid lines: linear and quadratic fits to all the data. Dashed lines: linear and quadratic fits without the two largest X values.]

Another Lack of Fit?
[Figure: X vs. Y. Note the "boundary" of X at X ≈ 20; the 7 lowest values are set aside.]
Both the high and the low X features are to be investigated.

Formal Data Analysis: Comparison of Models
Start at the largest model and work down, to find the smallest model consistent with the data:
Model 4 → Model 3 → Model 2 and Model 2' → Model 1
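One informal way to ask whether the bias is linear is to compare a straight-line fit with a quadratic fit, with and without the suspect points, as in the plot. The sketch below fits ordinary least-squares polynomials by solving the normal equations in plain Python; `polyfit_rss` is a hypothetical helper and the toy data are not study data. Like the informal plot, it treats the regressor as error-free, so it is a screening tool rather than a fit of Model 3 or 4. A quadratic fit that reduces the residual sum of squares substantially hints at curvature.

```python
def polyfit_rss(u, v, degree):
    """Hypothetical helper: least-squares polynomial fit of v on u.

    Returns (coefficients c0, c1, ..., residual sum of squares).
    """
    p = degree + 1
    # Normal equations A c = b, with A[j][k] = sum u^(j+k) and b[j] = sum v * u^j
    A = [[sum(ui ** (j + k) for ui in u) for k in range(p)] for j in range(p)]
    b = [sum(vi * ui ** j for ui, vi in zip(u, v)) for j in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for k in range(col, p):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution
    c = [0.0] * p
    for j in reversed(range(p)):
        c[j] = (b[j] - sum(A[j][k] * c[k] for k in range(j + 1, p))) / A[j][j]
    rss = sum((vi - sum(c[k] * ui ** k for k in range(p))) ** 2 for ui, vi in zip(u, v))
    return c, rss


u = [0, 1, 2, 3, 4]
v = [1 + 2 * t + 0.5 * t * t for t in u]   # exactly quadratic toy data
_, rss_lin = polyfit_rss(u, v, 1)
coef_quad, rss_quad = polyfit_rss(u, v, 2)
print(rss_lin, rss_quad, coef_quad)        # the quadratic fit leaves essentially no residual
```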

Formal Data Analysis
- How do we get estimates? Maximum likelihood.
- How do we compare models? Likelihood ratio tests.
Both are common and powerful statistical techniques.
Maximum likelihood: for a given model, find the parameter estimates most likely to have generated the data.
Likelihood ratio tests: if the smaller model is almost as likely to have generated the data as the larger model, accept it. Otherwise, reject it in favor of the larger model.

Maximum Likelihood Estimates $\hat\theta$ for Model k
Parameter                         k = 4     k = 3     k = 2     k = 2'    k = 1
$\mu$                             14.805    14.806    14.806    14.834    14.84
$\beta_0$                         1.48      0.504     1.653     0.000     0.000
$\beta_1$                         0.918     0.968     0.891     1.000     1.000
$\sigma_x^2$                      6.43      6.153     6.690     5.849     5.971
$\sigma_\delta^2$                 0.370     0.000     0.000     0.000     0.000
$\sigma_e^2$                      3.119     3.398     2.115     3.4       2.139
$\sigma_u^2$                      0.9       0.933     2.115     0.97      2.139
$\mu_y = \beta_0 + \beta_1\mu$    14.84     14.84     14.84     14.834    14.84
$-2\ln L(\hat\theta)$             6.70      65.01     114.45    65.41     118.97

Likelihood ratio tests (difference in $-2\ln L$; the 5% critical values are 3.84 for 1 df and 5.99 for 2 df):
- Model 4 vs. Model 3: 1.31
- Model 3 vs. Model 2: 59.44
- Model 3 vs. Model 2': 0.40
- Model 2 vs. Model 1: 4.5
- Model 2' vs. Model 1: 63.56
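A sketch of the likelihood ratio test as described, assuming each model has already been fitted by maximum likelihood and its $-2\ln L(\hat\theta)$ is available. The statistic is the smaller model's $-2\ln L$ minus the larger model's, referred to a chi-square distribution whose degrees of freedom equal the number of constrained parameters; 3.84 and 5.99 are the 5% critical values for 1 and 2 df. `chi2_sf` and `lr_test` are hypothetical helpers, and the closed-form tail probabilities used here cover only 1 and 2 df, which are the cases in this comparison.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))   # P(|Z| > sqrt(stat))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def lr_test(m2ll_small, m2ll_large, df, alpha=0.05):
    """Hypothetical helper: LR test from the -2 ln L values of two nested models.

    Returns (statistic, p-value, True if the smaller model is kept).
    """
    stat = m2ll_small - m2ll_large
    p = chi2_sf(stat, df)
    return stat, p, p > alpha


# Model 3 (larger) vs. Model 2' (smaller: beta0 = 0 and beta1 = 1 imposed, so 2 df),
# using the reported -2 ln L values 65.01 (Model 3) and 65.41 (Model 2')
stat, p, keep_smaller = lr_test(m2ll_small=65.41, m2ll_large=65.01, df=2)
print(round(stat, 2), round(p, 3), keep_smaller)
```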

MDx Conclusions
- Some unusual behavior at the lowest and highest readings.
- Round-off error (seen in individual values).

MDy vs. MDx
- Both MDs are measuring the same feature.
- No evidence of linear bias.
- MDy is about 1.9 times more precise than MDx.
(MDy is the test device and MDx is the reference. Bland-Altman plots would simply have noted a lack of agreement.)

Next session: fit the models in Excel (Minitab used to get starting values quickly).