Anna T. Waggener, Ph.D.
Institutional Assessment, United States Army War College
Principles of Graphical Excellence
Best Paper: ALAIR, April 5-6, 2001; AIR: June 2-5, 2002, Toronto; Focus-IR, February 21, 2003
The Visual Display of Quantitative Information Leading authority: Edward R. Tufte
History of Graphical Development
  First geographic maps were drawn on clay tablets.
  17th century: map skills and statistical skills were combined to construct maps.
  Examples: trade winds and monsoons on a world map; charted patterns of disease.
  Later sophistication showed the distribution of 1.3 million galaxies.
Graphical excellence consists of the efficient communication of complex quantitative ideas.
Presentation Topics
  Organizing Numerical Data: The Ordered Array and Stem-leaf Display
  Tabulating and Graphing Numerical Data
    Frequency Distributions: Tables, Histograms, Polygons
    Cumulative Distributions: Tables, the Ogive
Presentation Topics (continued)
  Tabulating and Graphing Univariate Categorical Data
    The Summary Table
    Bar and Pie Charts, the Pareto Diagram
  Tabulating and Graphing Bivariate Categorical Data
    Contingency Tables
    Side by Side Bar Charts
  Graphical Excellence and Common Errors in Presenting Data
At their best, graphics are instruments for reasoning about quantitative information.
Organizing Numerical Data
  Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
  Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  Stem and Leaf Display:
    2 | 144677
    3 | 028
    4 | 1
  Frequency Distributions and Cumulative Distributions: Tables, Histograms, Polygons, Ogive
Organizing Numerical Data
  Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
  Data ordered from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  Stem and leaf display:
    2 | 1 4 4 6 7 7
    3 | 0 2 8
    4 | 1
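The steps on this slide (sort the raw data, then split each value into a stem and a leaf) can be sketched in a few lines of Python. This is a minimal illustration that assumes two-digit integers, with the tens digit as the stem; the function name `stem_and_leaf` is hypothetical and not part of the presentation.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return (stem, leaves) rows for non-negative integers.

    Assumption: the tens digit is the stem and the units digit is the leaf.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return [(stem, "".join(str(leaf) for leaf in leaves))
            for stem, leaves in sorted(groups.items())]

raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]  # raw data from the slide
ordered = sorted(raw)                            # the ordered array
display = stem_and_leaf(raw)

for stem, leaves in display:
    print(stem, "|", leaves)
```

The printed rows should match the stem and leaf display on the slide.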
Design is choice.
Tabulating and Graphing Numerical Data
  Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21
  Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  Stem and Leaf Display:
    2 | 144677
    3 | 028
    4 | 1
  Frequency Distributions and Cumulative Distributions: Tables, Histograms, Polygons, Ogive
  [Figures: thumbnail histogram, frequency polygon, and ogive]
Tabulating Numerical Data: Frequency Distributions (continued)
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class              Frequency   Relative Frequency   Percentage
10 but under 20        3             .15                15
20 but under 30        6             .30                30
30 but under 40        5             .25                25
40 but under 50        4             .20                20
50 but under 60        2             .10                10
Total                 20            1                  100
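The table can be reproduced by counting how many values fall into each class. The sketch below assumes half-open classes ("10 but under 20" means 10 <= x < 20) of equal width; the helper name `frequency_table` is hypothetical.

```python
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

def frequency_table(values, start, stop, width):
    """Count values in half-open classes [lower, lower + width)."""
    n = len(values)
    rows = []
    for lower in range(start, stop, width):
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        # (lower, upper, frequency, relative frequency, percentage)
        rows.append((lower, upper, freq, freq / n, 100 * freq / n))
    return rows

table = frequency_table(data, start=10, stop=60, width=10)

for lower, upper, freq, rel, pct in table:
    print(f"{lower} but under {upper}: {freq:2d}  {rel:.2f}  {pct:.0f}")
```

The same class frequencies are the bar heights of the histogram and the vertices of the frequency polygon.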
Graphing Numerical Data: The Histogram
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: histogram of Frequency (vertical axis, 0 to 7) against Class Midpoints 5, 15, 25, 35, 45, 55, More (horizontal axis); bar heights 0, 3, 6, 5, 4, 2, 0. No gaps between bars.]
Graphing Numerical Data: The Frequency Polygon
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: frequency polygon of Frequency (vertical axis, 0 to 7) against Class Midpoints 5, 15, 25, 35, 45, 55, More (horizontal axis)]
Tabulating Numerical Data: Cumulative Frequency
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class              Cumulative Frequency   Cumulative % Frequency
10 but under 20             3                      15
20 but under 30             9                      45
30 but under 40            14                      70
40 but under 50            18                      90
50 but under 60            20                     100
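Cumulative frequencies are running totals of the class frequencies. A minimal sketch, assuming the class frequencies 3, 6, 5, 4, 2 for the same 20 observations; the name `ogive_points` is hypothetical and only illustrates which points a cumulative % polygon would connect.

```python
from itertools import accumulate

classes = ["10 but under 20", "20 but under 30", "30 but under 40",
           "40 but under 50", "50 but under 60"]
frequencies = [3, 6, 5, 4, 2]  # class frequencies for the 20 observations

cumulative = list(accumulate(frequencies))  # running totals
total = cumulative[-1]
cumulative_pct = [100 * c / total for c in cumulative]

for label, c, p in zip(classes, cumulative, cumulative_pct):
    print(f"{label}: {c:2d}  {p:.0f}")

# Points for an ogive: cumulative % at each class boundary,
# starting at 0% at the lower boundary of the first class.
ogive_points = list(zip([10, 20, 30, 40, 50, 60], [0.0] + cumulative_pct))
```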
Graphing Numerical Data: The Ogive (Cumulative % Polygon)
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
[Figure: ogive of cumulative % (vertical axis, 0 to 120) against data values (horizontal axis, 10 to 60)]
Tabulating and Graphing Categorical Data: Univariate Data
  Categorical Data
    Tabulating Data: The Summary Table
    Graphing Data: Pie Charts, Bar Charts, Pareto Diagram
Summary Table (University Revenues)

Revenue Category     Amount (in thousands $)   Percentage
Patient Services              46.5                42.27
Tuition/fees                  32                  29.09
Appropriations                15.5                14.09
Grants/Contracts              16                  14.55
Total                        110                 100

Variables are categorical.
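The percentage column is each amount divided by the total. A short sketch using the amounts from the slide; the variable names are illustrative only.

```python
revenues = {  # amounts in thousands of dollars, from the slide
    "Patient Services": 46.5,
    "Tuition/fees": 32.0,
    "Appropriations": 15.5,
    "Grants/Contracts": 16.0,
}

total = sum(revenues.values())
percentages = {name: round(100 * amount / total, 2)
               for name, amount in revenues.items()}

for name, pct in percentages.items():
    print(f"{name:<18}{revenues[name]:>6.1f}{pct:>8.2f}")
print(f"{'Total':<18}{total:>6.1f}")
```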
Graphing Categorical Data: Univariate Data
  Categorical Data
    Tabulating Data: The Summary Table
    Graphing Data: Pie Charts, Bar Charts, Pareto Diagram
  [Figures: thumbnail pie chart, bar chart, and Pareto diagram for the categories Stocks, Bonds, Savings, CD]
Bar Chart: Enrollment Summary
[Figure: horizontal bar chart of enrollment (0 to 3000) by classification: Freshmen, Sophomores, Juniors, Seniors, Unclass, Grad, 1st Prof]
Pie Chart (for a factbook)
Students by Classification
  Seniors      15%
  Sophomores   14%
  Freshmen     42%
  Juniors      29%
Percentages are rounded to the nearest percent.
Pareto Diagram
[Figure: Pareto diagram for four categories (1 to 4)]
  Axis for bar chart shows % in each category (0 to 45).
  Axis for line graph shows cumulative % (0 to 120).
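A Pareto diagram orders the categories from largest to smallest and overlays the cumulative percentage. The slide gives no underlying data, so the sketch below reuses the University Revenues amounts from the Summary Table slide; that pairing is an assumption made for illustration, and `pareto_table` is a hypothetical helper.

```python
def pareto_table(amounts):
    """Sort categories by descending amount; attach % and cumulative %."""
    total = sum(amounts.values())
    rows, running = [], 0.0
    ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
    for name, amount in ordered:
        running += amount
        rows.append((name,
                     round(100 * amount / total, 2),    # bar height (%)
                     round(100 * running / total, 2)))  # line value (cumulative %)
    return rows

revenues = {  # thousands of dollars, reused from the Summary Table slide
    "Patient Services": 46.5,
    "Tuition/fees": 32.0,
    "Appropriations": 15.5,
    "Grants/Contracts": 16.0,
}

pareto = pareto_table(revenues)
for name, pct, cum_pct in pareto:
    print(f"{name:<18}{pct:>7.2f}{cum_pct:>8.2f}")
```

The bars would use the middle column and the line graph the last column, which always ends at 100.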
Tabulating and Graphing Bivariate Categorical Data
  Contingency Tables
  Side by Side Charts
Tabulating Categorical Data: Bivariate Data
Contingency Table: Enrollment by College

Enrollment Category    A&S    BUS    NRS    Total
Freshmen                46     55     27      128
Sophomores              32     44     19       95
Juniors                 15     20     13       48
Seniors                 16     28      7       51
Total                  109    147     66      322
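The row and column totals of a contingency table can be checked by summing the cell counts in each direction. A minimal sketch using the cell counts from the slide; the variable names are illustrative.

```python
colleges = ["A&S", "BUS", "NRS"]
enrollment = {  # cell counts from the slide, one list per enrollment category
    "Freshmen":   [46, 55, 27],
    "Sophomores": [32, 44, 19],
    "Juniors":    [15, 20, 13],
    "Seniors":    [16, 28, 7],
}

row_totals = {category: sum(counts) for category, counts in enrollment.items()}
column_totals = [sum(column) for column in zip(*enrollment.values())]
grand_total = sum(row_totals.values())

for category, counts in enrollment.items():
    print(f"{category:<12}", *counts, row_totals[category])
print(f"{'Total':<12}", *column_totals, grand_total)
```

Both sets of marginal totals must add up to the same grand total, which is a quick consistency check on any contingency table.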
Graphing Categorical Data: Bivariate Data
Side by Side Chart
[Figure: side-by-side horizontal bar chart of enrollment (0 to 1200) by classification: Freshmen, Sophomores, Juniors, Seniors, Unclass, Grad, 1st Prof; legend: A&S, AH, NS]
Principles of Graphical Excellence
  Well-designed presentation of data that provides:
    Substance
    Statistics
    Design
  Communicates complex ideas with clarity, precision, and efficiency
  Gives the largest number of ideas in the most efficient manner
  Almost always involves several dimensions
  Requires telling the truth about the data
Data-Ink Ratio
  Data-ink ratio = ink used for data information / total ink used to print the graphic
Much of twentieth-century thinking about statistical graphics has been preoccupied with the question of how some amateurish chart might fool a naive viewer.
Errors in Presenting Data
  Using chart junk
  No relative basis in comparing data batches
  Compressing the vertical axis
  No zero point on the vertical axis
Chart Junk
  Bad Presentation: Minimum Wage, labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80
  Good Presentation: [Figure: Minimum Wage in $ (vertical axis, 0 to 4) by year: 1960, 1970, 1980, 1990]
Lie Factor
  Lie factor = size of effect shown in graphic / size of effect in data
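The ratio can be worked through numerically. The sketch below assumes that "size of effect" is measured as the relative change between two values, which is one common reading of the definition; the numbers are hypothetical and do not come from the presentation.

```python
def relative_change(first, second):
    """Size of effect, measured here (an assumption) as relative change."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return (relative_change(graphic_first, graphic_second)
            / relative_change(data_first, data_second))

# Hypothetical example: the data rise from 100 to 150 (a 50% increase),
# but the bars are drawn 1.0 cm and 3.0 cm tall (a 200% increase).
distorted = lie_factor(1.0, 3.0, 100.0, 150.0)

# The same data drawn to scale: bars 2.0 cm and 3.0 cm tall.
honest = lie_factor(2.0, 3.0, 100.0, 150.0)

print(distorted, honest)
```

A value near 1 means the graphic represents the data faithfully; values well above or below 1 indicate distortion.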
No Relative Basis
  Bad Presentation: [Figure: frequency (axis ticks 0, 200, 300) of A's received by students, by class: FR, SO, JR, SR]
  Good Presentation: [Figure: percentage (axis ticks 10%, 30%) of A's received by students, by class: FR, SO, JR, SR]
  FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior
Compressing Vertical Axis
  Bad Presentation: [Figure: Quarterly Income in $ for Q1 to Q4, vertical axis ticks 0, 100, 200]
  Good Presentation: [Figure: Quarterly Income in $ for Q1 to Q4, vertical axis ticks 0, 25, 50]
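No Zero Point on Vertical Axis
  Bad Presentation: [Figure: Monthly Expenses in $ for J, F, M, A, M, J; vertical axis ticks 36, 39, 42, 45, with no zero point]
  Good Presentation: [Figure: Monthly Expenses in $ for J, F, M, A, M, J; vertical axis ticks 0, 36, 39, 42, 45]
  Graphing the first six months of sales.
No Zero Point on Vertical Axis
  Bad Presentation: [Figure: Monthly Expenses in $ for J, F, M, A, M, J; vertical axis ticks 36, 39, 42, 45, with no zero point]
  Good Presentation: [Figure: Monthly Expenses in $ for J, F, M, A, M, J; vertical axis ticks 0, 20, 40, 60]
  Graphing the first six months of sales.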
Main defense of the lying graphic: "Well, at least it was approximately correct; we were just trying to show the general direction of change."
Presentation Summary
  Organized Numerical Data: The Ordered Array and Stem-leaf Display
  Tabulated and Graphed Numerical Data
    Frequency Distributions: Tables, Histograms, Polygons
    Cumulative Distributions: Tables, the Ogive
Presentation Summary (continued)
  Tabulated and Graphed Univariate Categorical Data
    The Summary Table
    Bar and Pie Charts, the Pareto Diagram
  Tabulated and Graphed Bivariate Categorical Data
    Contingency Tables
    Side by Side Charts
  Discussed Graphical Excellence and Common Errors in Presenting Data
There remain, however, many other considerations in the design of statistical graphics: not only of efficiency, but also of complexity, structure, density, and even beauty.
Without data, it is anyone's opinion. (Author unknown)