Descriptive Statistics II. Graphical summary of the distribution of a numerical variable. Boxplot

MAT 2379 (Spring 2012)

We will present two types of graphs that can be used to describe the distribution of a numerical variable: a boxplot and a histogram.

Boxplot

The boxplot is a useful visual tool, popularized by John Tukey in 1977, for describing the distribution of a numerical variable. The diagram helps us describe the central tendency and the dispersion of the distribution. It is useful for comparing the central tendency and the dispersion of several groups. It also helps us identify outlying values (i.e. atypical values).

We draw a box from the first quartile Q1 to the third quartile Q3, cut by a line at the median. The distance from Q1 to Q3 is the interquartile range (IQR), which is a measure of dispersion. The median is a measure of central tendency. To the left of the box, we draw a stem (also called a whisker) down to the smallest value that is within 1.5 times the interquartile range of Q1. To the right, we draw a stem up to the largest value that is within 1.5 times the interquartile range of Q3.
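The whisker rule can be written compactly. In this sketch, x_i denotes the observations, and the symbols L and U for the two whisker ends are notation introduced here for illustration; they do not come from the course notes.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1,\\
L &= \min\{\, x_i : x_i \ge Q_1 - 1.5\,\mathrm{IQR} \,\},\\
U &= \max\{\, x_i : x_i \le Q_3 + 1.5\,\mathrm{IQR} \,\}.
\end{aligned}
```

The left whisker runs from Q1 down to L and the right whisker runs from Q3 up to U.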

Each value that lies more than 1.5 × IQR to the left or to the right of the box is plotted as an individual point. We call these values outliers or atypical values.

[Figure: schematic boxplot marking Q1, the median and Q3, with the IQR spanning the box and a distance of 1.5 IQR on either side of it.]

Example 3: Consider the radish growth from Example 2. The data are in the file RadishGrowth.txt (see the update of May 24 on the course Web page). To construct the boxplot, use the following command in Minitab (here we suppose that the data are in column C1):

MTB > boxplot c1.

Here is the output:

[Figure: boxplot of radish growth.]

Now suppose that we would like to compare this distribution of growth for radishes kept in darkness with the distribution of growth for radishes that were given 12 hours of light per day. Here are the data:

3 10 15 17 18 18 18 20 20 25 25 25 28 29

The data can also be found in the file radishii.txt (see the update of May 29 on the Web page). We will assume that the variables are in columns C1 and C2 of Minitab. Here are the commands to produce side-by-side boxplots:

MTB > boxplot c1 c2;
SUBC> overlay.

Here is the graph:

[Figure: side-by-side boxplots of radish growth in darkness and with 12 hours of light per day.]
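For readers who want to check the boxplot quantities by hand, here is a minimal Python sketch applied to the 12-hours-of-light data listed above. The helper name `boxplot_summary` is hypothetical, and the quartiles use the default convention of Python's `statistics.quantiles`; this choice is an assumption, since statistical packages differ in how they interpolate quartiles and may report slightly different values.

```python
from statistics import quantiles

# Growth (mm) of the radishes given 12 hours of light per day (data from the notes).
light = [3, 10, 15, 17, 18, 18, 18, 20, 20, 25, 25, 25, 28, 29]


def boxplot_summary(values, k=1.5):
    """Return the quantities needed to draw a boxplot (hypothetical helper)."""
    data = sorted(values)
    # Quartile convention is an assumption: the default ('exclusive') method.
    q1, median, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Values between the fences; the whiskers stop at the extremes of these.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


summary = boxplot_summary(light)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under this convention, the growth of 3 mm is the only value that falls outside the fences, so the left whisker stops at 10 mm and the right whisker at 29 mm.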

Discussion: The central tendencies of growth are similar for both groups. However, the growths of the radishes that received light are less variable (i.e. less dispersed) than the growths of the radishes kept in the dark. In addition, one growth in the light group is an outlier compared with the other growths in its group: this growth of 3 mm is atypical.

Remarks: This is an example of data from an experiment. The response variable is the growth after 3 days (in mm). The factor (or explanatory variable) is access to light. The light factor has two levels: access to light 12 hours a day and total darkness. We call the levels of the factor treatments. The radish seedlings are the experimental units. We randomly assign treatments to these basic units of study. If the conditional distribution of the response varies from treatment to treatment, then we say that there is a treatment effect.

We have data from one experiment; however, these data can vary from experiment to experiment or from sample to sample. We will learn in this course how to determine whether a treatment effect is significant by taking into account the sample-to-sample variability.
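The random assignment of treatments to experimental units mentioned in the remarks can be sketched as follows. This is only an illustration: the helper `assign_treatments`, the seedling labels, the treatment names and the count of 28 seedlings are all hypothetical, and the notes do not say how the assignment was actually carried out.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Randomly split units into treatment groups of (nearly) equal size."""
    rng = random.Random(seed)      # seeded generator so the split can be reproduced
    shuffled = list(units)
    rng.shuffle(shuffled)          # random order of the experimental units
    groups = {t: [] for t in treatments}
    # Deal the shuffled units to the treatments in turn.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical experimental units: 28 labelled seedlings.
seedlings = [f"seedling_{i}" for i in range(1, 29)]
groups = assign_treatments(seedlings, ["light_12h", "darkness"], seed=2012)

for treatment, members in groups.items():
    print(treatment, len(members))
```

Because the seedlings are shuffled before being dealt out, each seedling has the same chance of ending up in either treatment group.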

Histogram

The frequency distribution of a numerical variable can be displayed with a histogram.

Construction of a histogram:

1. Divide the horizontal axis into sub-intervals (preferably of equal length). Each sub-interval represents a range of values for the variable. It is often suggested to use between 5 and 20 classes. Often, number of classes = √n works well, where n is the number of observations.
2. Different statistical packages use different techniques to determine the number of sub-intervals. However, the default often works well.
3. Terminology: a sub-interval is often called a bin.
4. For each bin, erect a rectangle whose height is equal to the frequency, the relative frequency or the density.
5. If you use the density, that is, density = relative frequency / length of bin, then the area of the rectangle, which is density × length of bin, is equal to the relative frequency (i.e. probability).

A histogram is used to describe the shape of the distribution of a numerical variable. Here are some examples of histograms that are, respectively, approximately symmetric, skewed to the right and skewed to the left. The asymmetry is in the direction of the atypical values.

[Figure: three histograms, labelled approximately symmetric, skewed to the right, and skewed to the left.]
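The three possible rectangle heights from the construction steps (frequency, relative frequency and density) can be computed with a short Python sketch. It uses the 12-hours-of-light radish data from Example 3. The helper `histogram_table` is hypothetical, and the bin edges (width 5, from 0 to 30) are an assumption chosen for illustration, not the bins that Minitab would pick.

```python
# Growth (mm) of the radishes given 12 hours of light per day (Example 3).
light = [3, 10, 15, 17, 18, 18, 18, 20, 20, 25, 25, 25, 28, 29]


def histogram_table(values, edges):
    """Rows of (left, right, frequency, relative frequency, density) per bin."""
    n = len(values)
    rows = []
    for left, right in zip(edges, edges[1:]):
        # Bins are [left, right); the last bin also includes its right end.
        is_last = right == edges[-1]
        freq = sum(
            1 for x in values
            if left <= x < right or (is_last and x == right)
        )
        rel = freq / n
        density = rel / (right - left)   # relative frequency / length of bin
        rows.append((left, right, freq, rel, density))
    return rows


edges = [0, 5, 10, 15, 20, 25, 30]   # assumed bins of width 5
rows = histogram_table(light, edges)

for left, right, freq, rel, density in rows:
    print(f"[{left:2d}, {right:2d})  freq={freq}  rel={rel:.3f}  density={density:.4f}")

# With density heights, the rectangle areas add up to 1.
total_area = sum(density * (right - left) for left, right, _, _, density in rows)
print(f"total area = {total_area:.3f}")
```

The three scales give histograms of the same shape when all bins have equal length; only the density scale makes the area of each rectangle equal to the relative frequency of its bin.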

Example 4: Consider the following histograms. Identify the distributions that are skewed to the right or skewed to the left. Describe the skewness as being weak or strong. Also identify the histograms that are approximately symmetric.

[Figure: histograms for Example 4.]

Example 5: Consider the radishes that received 12 hours of light per day from Example 3. The data can also be found in the file radishii.txt (see the update of May 29 on the Web page). We will construct the histogram for the growth of these radishes. Here we assume that the data are in column C2. Here is the command:

MTB > histogram c2.

Here are the results:

[Figure: frequency histogram of radish growth with 12 hours of light per day.]

This histogram is a histogram of frequencies. We can also produce a probability histogram, with heights given in percent, with the following commands:

MTB > histogram c2;
SUBC> percent.

Here is the graph:

[Figure: percent histogram of radish growth with 12 hours of light per day.]

We can also produce a density histogram with the following commands:

MTB > histogram c2;
SUBC> density.

Here is the graph:

[Figure: density histogram of radish growth with 12 hours of light per day.]