Chapter 10: Graphs, Good and Bad

Distribution
- Definition: tells what values a variable takes and how often it takes these values
- Can be a table, graph, or function

Categorical Variables
- Places an individual into one of several groups or categories
- Display distribution with a bar graph or a pie chart

Quantitative (Numerical) Variables
- Takes numerical values for which arithmetic operations such as adding and averaging make sense
- Displayed with a stemplot, histogram, etc.

Variable Types
- Quantitative: age, weight
- Categorical: smoking status, gender, eye color

Data Table: Exam 1, Spring 07

Grade    Count    Percent
A           49        33%
B           62        42%
C           32        22%
D            4         3%
Total      147       100%
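
The Percent row of the Exam 1 table can be reproduced from the Count row. The sketch below is only an illustration: the counts are copied from the table, while the function name `percent_table` and the choice to round to whole percents are assumptions made here.

```python
# Counts copied from the Exam 1 table; everything else is illustrative.
counts = {"A": 49, "B": 62, "C": 32, "D": 4}

def percent_table(counts):
    """Return (total, {category: percent rounded to a whole number})."""
    total = sum(counts.values())
    percents = {k: round(100 * v / total) for k, v in counts.items()}
    return total, percents

total, percents = percent_table(counts)

# Print one row per grade, then the totals row.
for grade, count in counts.items():
    print(f"{grade:<6}{count:>5}{percents[grade]:>5}%")
print(f"{'Total':<6}{total:>5}{sum(percents.values()):>5}%")
```

Rounded percents happen to sum to 100 for these counts; in general, rounding can leave the total at 99 or 101.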

Line Graphs
- A line graph shows behavior over time.
- Time is always on the horizontal axis.
- Look for an overall pattern (trend).
- Look for patterns that repeat at known regular intervals (seasonal variations).
- Look for any striking deviations that might indicate unusual occurrences.

Line Graphs (example)
- Displays a variable over time.
- [Figure: line graph of winning times for men's 500-meter speed skating in the Winter Olympics, 1924 to 2002]
- Source: http://sportsillustrated.cnn.com

Pictograms
- Bar graph that uses pictures related to the topic.
- Example: percentage of Ph.D.s earned by women.
- Left pictogram: misleading because the eye focuses on area rather than just height.
- Right pictogram: visually more accurate, but less appealing.
- Source: Science (vol. 260, 16 April 1993, p. 409).
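
A rough way to see why area misleads in a pictogram: assuming the picture is enlarged proportionally (width scaled along with height), its area grows with the square of the height ratio. The function name and the two values below are hypothetical and are not taken from the Science graphic.

```python
def pictogram_ratios(value_small, value_large):
    """Return (height_ratio, area_ratio) for a proportionally scaled picture.

    Assumption: the picture's width is scaled by the same factor as its height.
    """
    height_ratio = value_large / value_small
    area_ratio = height_ratio ** 2  # area = width * height, both scaled
    return height_ratio, area_ratio

# Hypothetical data values: the second is twice the first.
height_ratio, area_ratio = pictogram_ratios(10, 20)
print(f"height ratio: {height_ratio}, area ratio: {area_ratio}")
```

With a value that doubles, the picture is twice as tall but covers four times the area, which is what the eye tends to compare.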

Making Good Graphs
- Title your graph.
- Make sure labels and legends describe variables and their measurement units.
- Be careful with the scales used.
- Make the data stand out. Avoid distracting grids, artwork, etc.
- Pay attention to what the eye sees. Avoid pictograms and tacky effects.

No Labeling on One or More Axes
- Example: graph with no labeling (a) and possible interpretations (b and c).
- Source: Insert in the California Aggie (UC Davis), 30 May 1993.

Not Starting at Zero
- Example: winning times for Olympic speed skating data with vertical axis starting at 0.
- Note: for some variables, graphs should not start at zero, e.g. SAT scores with range from 350 to 800.
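
To see how much the starting point of the vertical axis can change what the eye sees, compare the ratio of two bar heights as drawn with the ratio of the underlying values. This is a minimal sketch; the function name `apparent_ratio` and the values 4.5, 5.0, and the baseline 4.0 are hypothetical.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights (b relative to a) when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# Hypothetical values 4.5 and 5.0 on two differently scaled axes.
true_ratio = apparent_ratio(4.5, 5.0)        # axis starts at 0
drawn_ratio = apparent_ratio(4.5, 5.0, 4.0)  # axis starts at 4.0

print(f"axis from 0:   second bar is {true_ratio:.2f} times as tall")
print(f"axis from 4.0: second bar is {drawn_ratio:.2f} times as tall")
```

On the truncated axis the second bar looks twice as tall even though the value is only about 11% larger.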

Changes in Labeling on One or More Axes
- Example: a bar graph with a gap in labeling.
- At first look, it seems the vertical axis starts at 0, but the bottom of the graph actually corresponds to 4.0%.
- Source: Davis (CA) Enterprise, 4 March 1994, p. A-7.

Misleading Units of Measurement
- Fine print: "In 1971 dollars, the price of a 32-cent stamp in February 1995 would be 8.4 cents."
- More truthful picture: show the changing price of a first-class stamp adjusted for inflation.
- Source: USA Today, 7 March 1994, p. 13A.

Using Poor Information
- Graph appears to show very few deaths from solvent abuse before the late 1970s.
- Article quote: "It's only since we have started collecting accurate data since 1982 that we have begun to discover the real scale of the problem" (p. 5).
- Source: The Independent on Sunday (London), 13 March 1994.
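
The fine print in the stamp example implies a conversion factor between 1995 cents and 1971 cents. The sketch below derives that factor from the two numbers quoted on the slide (32 cents and 8.4 cents); the function name `to_1971_cents` is hypothetical, and in practice such a factor would come from a published price index rather than from a single quoted pair.

```python
nominal_cents = 32.0  # stamp price in February 1995, from the slide
real_cents = 8.4      # the same stamp in 1971 dollars, from the slide

# Conversion factor implied by the quoted pair of prices.
factor = real_cents / nominal_cents

def to_1971_cents(price_cents_1995, factor=factor):
    """Convert a 1995 price (in cents) to 1971 cents using the implied factor."""
    return price_cents_1995 * factor

print(f"implied factor: {factor:.4f}")
print(f"32-cent stamp in 1971 cents: {to_1971_cents(32.0):.1f}")
```

Plotting every year's stamp price after this kind of adjustment is what the slide means by showing the price adjusted for inflation.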

Chapter 11: Displaying Distributions with Graphs

Turning quantitative data into information: things to know
- Center of the data
- Spread of the data (variability)
- Shape of the data

Stemplots (Stem-and-Leaf Plots)
- For quantitative variables
- Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number)
- Write the stems in a vertical column; draw a vertical line to the right of the stems
- Write each leaf in the row to the right of its stem; order leaves if desired
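
The stemplot steps above can be sketched in a few lines. This is an illustration under stated assumptions, not a standard routine: it handles non-negative whole numbers only, uses the tens as stems and the ones as leaves, and the function name `stemplot` and its `split` option (leaves 0-4 and 5-9 on separate rows) are choices made here. The data are the values read off the Pulse Rates stemplot that appears later in these notes.

```python
from collections import defaultdict

def stemplot(values, split=False):
    """Return stemplot rows as strings (stems = tens, leaves = ones).

    With split=True each stem gets two rows: leaves 0-4, then leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):               # sorting orders the leaves
        stem, leaf = divmod(v, 10)         # separate stem and leaf
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    keys = sorted(rows)
    first, last = keys[0], keys[-1]
    halves = (0, 1) if split else (0,)
    out = []
    for stem in range(first[0], last[0] + 1):   # include empty stems
        for half in halves:
            key = (stem, half)
            if key < first or key > last:
                continue
            leaves = "".join(str(d) for d in rows.get(key, []))
            out.append(f"{stem} | {leaves}".rstrip())
    return out

pulse = [54, 57, 58, 59, 60, 62, 63, 63, 64, 64, 65, 65, 65, 66,
         67, 67, 68, 69, 70, 70, 71, 72, 74, 75, 78]

print("\n".join(stemplot(pulse, split=True)))
```

Rows with no leaves are kept so that gaps in the data stay visible, which matters when looking for outliers.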

Weight Data: Stemplot (Stem-and-Leaf Plot)
- Key: 20 | 3 means 203 pounds
- Stems = 10s, leaves = 1s
- [Stemplot with stems 10 through 26; leaves not shown]

Weight Data: Frequency Table

Weight Group    Frequency
100-120              7
120-140             12
140-160              7
160-180              9
180-200             12
200-220              4
220-240              1
240-260              0
260-280              1

* Left endpoint is included in the group, right endpoint is not.

Weight Data: Histogram
- [Histogram of weight; horizontal axis labeled Weight, running from 100 to 280 in steps of 20]
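
The footnote rule (left endpoint included, right endpoint excluded) decides where a boundary value such as 120 is counted. A minimal sketch of that rule follows; the function name `weight_group` is hypothetical, and the five weights are made-up values chosen to hit the boundaries, not the class data.

```python
from collections import Counter

def weight_group(weight, start=100, width=20):
    """Return the group label for a weight; left endpoint in, right endpoint out."""
    lower = start + width * int((weight - start) // width)
    return f"{lower}-{lower + width}"

# Hypothetical weights, including values that sit exactly on a boundary.
weights = [118, 120, 139.5, 140, 203]
freq = Counter(weight_group(w) for w in weights)

for group, count in sorted(freq.items()):
    print(f"{group:<10}{count}")
```

A weight of exactly 120 lands in 120-140, not 100-120, so no observation is counted twice.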

Shape
- Symmetric: if you draw a line through the center, the picture on one side would be a mirror image of the picture on the other side. Example: a bell-shaped data set.
- Unimodal: single prominent peak
- Bimodal: two prominent peaks
- Skewed to the right: higher values are more spread out than lower values
- Skewed to the left: lower values are more spread out and higher ones tend to be clumped

Outliers
- Extreme values, far from the rest of the data
- May occur naturally
- May occur due to error in recording
- May occur due to error in measuring
- Observational unit may be fundamentally different

Obtaining Info from the Stemplot
Determine shape, identify outliers, locate center.

Pulse Rates:
5 | 4
5 | 789
6 | 023344
6 | 55567789
7 | 00124
7 | 58
Bell-shaped, centered in the mid 60s, no outliers.

Test Scores:
3 | 2
4 |
5 | 5
6 | 024418
7 | 56598398
8 | 5430820
9 | 53208
Outlier of 32. Apart from 55, the rest are uniform from the 60s to the 90s.

Median Incomes:
4 | 66789
5 | 11344
5 | 56666688899999
6 | 011112334
6 | 556666789
7 | 01223
7 |
8 | 0022
Wide range with 4 unusually high values. The rest are bell-shaped around the high $50,000s.
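
One common way to locate the center is the median (the middle value of the ordered data), which these slides do not define. As a sketch, the Pulse Rates stemplot can be decoded back into values and checked against the "centered in the mid 60s" reading. The rows are typed here with a `|` between stem and leaves, and the helper name `decode` is hypothetical.

```python
from statistics import median

# Pulse Rates stemplot, with "|" separating each stem from its leaves.
pulse_rows = ["5 | 4", "5 | 789", "6 | 023344",
              "6 | 55567789", "7 | 00124", "7 | 58"]

def decode(rows):
    """Turn stemplot rows (stems = tens, leaves = ones) back into values."""
    values = []
    for row in rows:
        stem, _, leaves = row.partition("|")
        for leaf in leaves.strip():
            values.append(int(stem) * 10 + int(leaf))
    return values

pulse = decode(pulse_rows)
print(f"n = {len(pulse)}, min = {min(pulse)}, max = {max(pulse)}")
print(f"median = {median(pulse)}")
```

The 25 values run from 54 to 78 with the middle value at 65, consistent with the description on the slide.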

Today's Concepts
- Categorical and quantitative variables
- Distributions
- Pie charts, bar graphs
- Line graphs
- Good vs. bad graphs
- Stemplots and histograms
- Shapes: symmetric, skewed
- Outliers

Don't forget to collect your data before lab next week!