DESCRIBING DATA Frequency Tables, Frequency Distributions, and Graphic Presentation
Raw Data
Raw data are data as originally obtained, before they have been processed or arranged.
Example: Raw Score
A raw score is the score obtained by a particular student in a particular test, before it has been processed or arranged.
The Raw Score is a Variable
The raw scores for 20 students in a test:
  78, 74, 65, 74, 74, 67, 63, 67, 80, 58
  74, 50, 65, 74, 86, 78, 63, 65, 80, 89
Organizing and Presenting Data Graphically
Data in raw form are usually not easy to use for decision making. Some type of organization is needed:
  - Table
  - Graph
Techniques reviewed here:
  - Ordered Array
  - Stem-and-Leaf Display
  - Frequency Distributions and Histograms
  - Bar charts and pie charts
  - Contingency tables
Tables and Charts for Interval Data
Interval Data:
  - Ordered Array
  - Stem-and-Leaf Display
  - Frequency Distributions and Cumulative Distributions
      - Histogram
      - Polygon
      - Ogive
The Ordered Array
A sorted list of data:
  - Shows range (minimum to maximum)
  - Provides some signals about variability within the range
  - May help identify outliers (unusual observations)
  - If the data set is large, the ordered array is less useful
The Ordered Array (continued)
Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem-and-Leaf Diagram
A simple way to see distribution details in a data set.
METHOD: Separate the sorted data series into leading digits (the stem) and trailing digits (the leaves).
Stem-and-Leaf
A stem-and-leaf display is a statistical technique for presenting a set of data. Each numerical value is divided into two parts: the leading digit(s) become the stem and the trailing digit the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis. The major advantage of organizing the data into a stem-and-leaf display is that we get a quick visual picture of the shape of the distribution. Its advantage over a frequency distribution is that the identity of each observation is not lost.
Example
Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Here, use the 10s digit for the stem unit:
  Value    Stem | Leaf
  21          2 | 1
  38          3 | 8
Example
Suppose the seven observations in the 90 up to 100 class are: 96, 94, 93, 94, 95, 96, and 97. The stem value is the leading digit or digits, in this case 9. The leaves are the trailing digits. The stem is placed to the left of a vertical line and the leaf values to the right. The values in the 90 up to 100 class would appear as
  9 | 6 4 3 4 5 6 7
Then, we sort the values within each stem from smallest to largest. Thus, this row of the stem-and-leaf display would appear as follows:
  9 | 3 4 4 5 6 6 7
Example (continued)
Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Completed stem-and-leaf diagram:
  Stem | Leaves
     2 | 1 4 4 6 7 7
     3 | 0 2 8
     4 | 1
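The method above can be sketched in a few lines of Python. This is only an illustration of the tens-digit stem used in the example; the function name `stem_and_leaf` is an assumption made for this sketch, not something the slides define.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); the units digit is the leaf."""
    display = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves in order
        stem, leaf = divmod(v, 10)    # e.g. 21 -> stem 2, leaf 1
        display[stem].append(leaf)
    return dict(display)

# Raw data from the ordered-array example
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
display = stem_and_leaf(data)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```

Because every leaf is kept, the original observations can be read back from the display, which is the advantage over a frequency distribution noted earlier.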
Using other stem units
Using the 100s digit as the stem, round off the 10s digit to form the leaves:
  Value    Stem | Leaf
  613         6 | 1
  776         7 | 8
  ...
  1224       12 | 2
Using other stem units (continued)
Data: 613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224
The completed stem-and-leaf display, using the 100s digit as the stem:
  Stem | Leaves
     6 | 1 3 6
     7 | 2 2 5 8
     8 | 3 4 6 6 9 9
     9 | 1 3 3 6 8
    10 | 3 5 6
    11 | 4 7
    12 | 2
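A minimal sketch of the rounded-leaf variant follows. It assumes that "round off" means rounding each value to the nearest ten with halves rounded up, which reproduces the display above; the function name is hypothetical.

```python
def rounded_stem_and_leaf(values):
    """Stem = hundreds digit(s); leaf = tens digit after rounding to the nearest ten."""
    rows = {}
    for v in sorted(values):
        tens = (v + 5) // 10            # value in tens, rounded half up (assumption)
        stem, leaf = divmod(tens, 10)   # e.g. 776 -> 78 tens -> stem 7, leaf 8
        rows.setdefault(stem, []).append(leaf)
    return rows

data = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
        894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]
rows = rounded_stem_and_leaf(data)

for stem in sorted(rows):
    print(f"{stem:>2} | {' '.join(map(str, rows[stem]))}")
```

Note that rounding discards the units digit, so this form of the display no longer preserves the exact identity of each observation.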
Stem-and-leaf: Another Example
Listed in Table 4-1 is the number of 30-second radio advertising spots purchased by each of the 45 members of the Greater Buffalo Automobile Dealers Association last year. Organize the data into a stem-and-leaf display. Around what values does the number of advertising spots tend to cluster? What is the smallest number of spots purchased by a dealer? The largest number purchased?
Stem-and-leaf: Another Example 17
Tabulating Numerical Data: Frequency Distributions
What is a Frequency Distribution?
A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or category.
Why Use Frequency Distributions?
  - A frequency distribution is a way to summarize data
  - The distribution condenses the raw data into a more useful form
  - It allows for a quick visual interpretation of the data
Frequency Distribution (ungrouped data)
Frequency distribution table for ungrouped data:
  Score (X)   Frequency (f)
  50          1
  58          1
  63          2
  65          3
  67          2
  74          5
  78          2
  80          2
  86          1
  89          1
  Total       20
Frequency Distribution (grouped data) A Frequency Distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class. 21
EXAMPLE Creating a Frequency Distribution Table Ms. Kathryn Ball of AutoUSA wants to develop tables, charts, and graphs to show the typical selling price on various dealer lots. The table on the right reports only the price of the 80 vehicles sold last month at Whitner Autoplex. 22
Constructing a Frequency Table - Example
Step 1: Decide on the number of classes. A useful recipe to determine the number of classes (k) is the "2 to the k rule": select the smallest k such that 2^k > n. There were 80 vehicles sold, so n = 80. If we try k = 6, which means we would use 6 classes, then 2^6 = 64, somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then 2^7 = 128, which is greater than 80. So the recommended number of classes is 7.
Step 2: Determine the class interval or width. The formula is i >= (H - L)/k, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes. Here ($35,925 - $15,546)/7 = $2,911. Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000.
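Steps 1 and 2 can be checked with a short Python sketch. The function name is hypothetical, and rounding the width up to the next multiple of 1,000 is an assumption chosen only to match the $3,000 used in the example.

```python
import math

def classes_by_2k_rule(n):
    """Smallest k with 2**k > n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

n = 80                                # vehicles sold
k = classes_by_2k_rule(n)             # number of classes

high, low = 35925, 15546              # highest and lowest selling price
min_width = (high - low) / k          # lower bound on the class interval i

# Assumption: round up to a convenient multiple of 1,000
width = math.ceil(min_width / 1000) * 1000

print(k, round(min_width), width)
```

With seven classes of width $3,000 the classes span $21,000, which covers the observed range of $20,379.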
Or Use: Number of classes = 1 + 3.3 log(n)
where n is the number of observations.
Guideline for the number of classes:
  Number of observations   Number of classes
  Less than 50             5-7
  50-200                   7-9
  200-500                  9-10
  500-1,000                10-11
  1,000-5,000              11-13
  5,000-50,000             13-17
  More than 50,000         17-20
Class width = [Range] / [Number of classes] = [Largest observation - Smallest observation] / [Number of classes]
Example: for 200 telephone bills (42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05, ...) with 8 classes, class width = [119.63 - 0] / 8 = 14.95, rounded to 15.
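As a rough check of the formula, the sketch below evaluates 1 + 3.3 log(n) for the 200 bills, taking the logarithm as base 10 (an assumption, though it is the usual reading of this rule). The function name is hypothetical.

```python
import math

def classes_by_log_rule(n):
    """Number of classes suggested by 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

n = 200
k_suggested = classes_by_log_rule(n)   # between 8 and 9 for n = 200

k = 8                                  # the number of classes used in the example
largest, smallest = 119.63, 0.0
width = (largest - smallest) / k       # rounded to 15 in the example

print(f"suggested classes: {k_suggested:.2f}, width with {k} classes: {width:.2f}")
```

The formula gives a non-integer value, so the final choice (8 here) remains a judgment call within the guideline ranges.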
Constructing a Frequency Table - Example Step 3: Set the individual class limits 25
Constructing a Frequency Table - Example Step 4: Tally the vehicle selling prices into the classes. Step 5: Count the number of items in each class. 26
Frequency Distribution Characteristics
  - Class midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
  - Class frequency: The number of observations in each class.
  - Class interval: The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class.
Relative Frequencies Class frequencies can be converted to relative class frequencies to show the fraction of the total number of observations in each class. A relative frequency captures the relationship between a class total and the total number of observations. 28
Relative Frequency Distribution To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations. 29
Example
  Score (X)   Frequency (f)   Relative Frequency   Percentage (%)
  50          1               0.05                 5
  58          1               0.05                 5
  63          2               0.10                 10
  65          3               0.15                 15
  67          2               0.10                 10
  74          5               0.25                 25
  78          2               0.10                 10
  80          2               0.10                 10
  86          1               0.05                 5
  89          1               0.05                 5
  Total       Σf = 20
Relative Frequencies
Definition:
  Relative frequency = (Number of students at a particular score) / (Total number of students) = f / Σf
  Example: at score 65, relative frequency = 3/20 = 0.15
  Percentage = Relative frequency × 100
  Example: at score 65, percentage = 0.15 × 100 = 15%
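The ungrouped frequency, relative frequency, and percentage columns can be reproduced from the 20 raw scores with a small Python sketch. It uses `collections.Counter` from the standard library; the variable names are illustrative only.

```python
from collections import Counter

# The 20 raw test scores
scores = [78, 74, 65, 74, 74, 67, 63, 67, 80, 58,
          74, 50, 65, 74, 86, 78, 63, 65, 80, 89]

freq = Counter(scores)          # frequency f of each distinct score
n = len(scores)                 # total number of students (sum of f)

# One row per score: (score, f, relative frequency, percentage)
table = [(x, f, f / n, 100 * f / n) for x, f in sorted(freq.items())]

print("Score    f  Rel.freq  Percent")
for x, f, rel, pct in table:
    print(f"{x:>5} {f:>4} {rel:>9.2f} {pct:>8.0f}")
```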
Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are:
  - Histograms
  - Frequency polygons
  - Cumulative frequency distributions (ogives)
Histogram
A histogram for a frequency distribution based on quantitative data is very similar to the bar chart showing the distribution of qualitative data. The classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars.
Histogram Using Excel 34
Frequency Polygon A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies. 35
Example: Frequency Polygon
  Class                  Class Midpoint   Frequency
  10 but less than 20    15               3
  20 but less than 30    25               6
  30 but less than 40    35               5
  40 but less than 50    45               4
  50 but less than 60    55               2
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class.)
[Figure: Frequency Polygon: Daily High Temperature. Horizontal axis: Class Midpoints; vertical axis: Frequency]
Example: Frequency Polygon
[Figure: frequency polygon of the test scores. Horizontal axis: Score; vertical axis: Frequency]
Example: Relative Frequency Polygon
[Figure: relative frequency polygon of the test scores. Horizontal axis: Score; vertical axis: Relative Frequency]
Example: Percent Graph
[Figure: percent graph of the test scores. Horizontal axis: Score; vertical axis: Percent (%)]
Frequency Polygon and Histogram 40
Cumulative Frequency Distribution 41
Cumulative Frequency Distribution 42
Cumulative Frequency Distribution
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
  Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
  10 but less than 20    3           15           3                      15
  20 but less than 30    6           30           9                      45
  30 but less than 40    5           25           14                     70
  40 but less than 50    4           20           18                     90
  50 but less than 60    2           10           20                     100
  Total                  20          100
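The table above can be rebuilt from the ordered array with a short Python sketch. The class edges follow the "10 but less than 20" convention; `itertools.accumulate` produces the running totals. Variable names are illustrative.

```python
from itertools import accumulate

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]
edges = [10, 20, 30, 40, 50, 60]       # class boundaries

# Count observations with lower <= x < upper for each class
freqs = [sum(lo <= x < hi for x in data) for lo, hi in zip(edges, edges[1:])]

cum_freqs = list(accumulate(freqs))                   # running total of frequencies
cum_pcts = [100 * c / len(data) for c in cum_freqs]   # as a percentage of n

for (lo, hi), f, cf, cp in zip(zip(edges, edges[1:]), freqs, cum_freqs, cum_pcts):
    print(f"{lo} but less than {hi}: f={f}, cum f={cf}, cum %={cp:.0f}")
```

Plotting `cum_pcts` against the upper class boundaries (20, 30, ..., 60), with 0% at 10, gives the points of the ogive.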
The Ogive (Cumulative % Polygon)
  Class                  Upper class boundary   Cumulative Percentage
  Less than 10           10                     0
  10 but less than 20    20                     15
  20 but less than 30    30                     45
  30 but less than 40    40                     70
  40 but less than 50    50                     90
  50 but less than 60    60                     100
[Figure: Ogive: Daily High Temperature. Horizontal axis: Class Boundaries (Not Midpoints); vertical axis: Cumulative Percentage]
Cumulative Relative Frequency Distribution
  Score (X)   Frequency (f)   Cumulative Frequency (cf)   Cumulative Relative Frequency (crf)   Cumulative Percent (cp)
  50          1               1                           0.05                                  5
  58          1               2                           0.10                                  10
  63          2               4                           0.20                                  20
  65          3               7                           0.35                                  35
  67          2               9                           0.45                                  45
  74          5               14                          0.70                                  70
  78          2               16                          0.80                                  80
  80          2               18                          0.90                                  90
  86          1               19                          0.95                                  95
  89          1               20                          1.00                                  100
  Total       Σf = 20
Cumulative Frequency Curve
[Figure: cumulative frequency curve of the test scores. Horizontal axis: Score; vertical axis: Cumulative Frequency. Annotation: 18 students obtain a score of 85 or less]
Grouped Data: Cumulative Frequency Distribution and Cumulative Percent
CI = class interval (score X); CL = class limits (score X); m = class midpoint; f = frequency; cf = cumulative frequency; crf = cumulative relative frequency; cp = cumulative percent. The cumulative columns count observations less than the upper class limit (UCL).
  CI      CL           m    f   cf   crf    cp
  50-54   49.5-54.5    52   1   1    0.05   5
  55-59   54.5-59.5    57   1   2    0.10   10
  60-64   59.5-64.5    62   2   4    0.20   20
  65-69   64.5-69.5    67   5   9    0.45   45
  70-74   69.5-74.5    72   5   14   0.70   70
  75-79   74.5-79.5    77   2   16   0.80   80
  80-84   79.5-84.5    82   2   18   0.90   90
  85-89   84.5-89.5    87   2   20   1.00   100
Cumulative Frequency Curve and Cumulative Percent for Grouped Data
[Figure: cumulative frequency curve for the grouped scores. Horizontal axis: Score (49.5 to 89.5); left vertical axis: Cumulative Frequency (0 to 20); right vertical axis: Cumulative Percent (0% to 100%)]
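The grouped table can be derived from the 20 raw scores as sketched below. The class width of 5 and starting limit of 50 are taken from the table; the variable names are illustrative.

```python
scores = [78, 74, 65, 74, 74, 67, 63, 67, 80, 58,
          74, 50, 65, 74, 86, 78, 63, 65, 80, 89]
n = len(scores)

rows = []
running = 0
for lower in range(50, 90, 5):           # lower class limits 50, 55, ..., 85
    upper = lower + 4                    # upper class limit, e.g. 54
    midpoint = (lower + upper) / 2       # average of the two limits
    f = sum(lower <= s <= upper for s in scores)
    running += f                         # cumulative frequency up to this class
    rows.append((f"{lower}-{upper}", midpoint, f, running, running / n))

for label, m, f, cf, crf in rows:
    print(f"{label}  m={m:.0f}  f={f}  cf={cf}  crf={crf:.2f}  cp={100 * crf:.0f}")
```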
Ogive 49
Ogive
An ogive is a smooth cumulative frequency curve. The curve rises smoothly from left to right. Because it never decreases, it is called monotonic.
Tables and Charts for Categorical Data
Categorical Data:
  - Tabulating Data
      - Summary Table
  - Graphing Data
      - Bar Charts
      - Pie Charts
      - Pareto Diagram
The Summary Table
Summarize data by category (variables are categorical):
  Investment Type   Amount (in thousands $)   Percentage (%)
  Stocks            46.5                      42.27
  Bonds             32.0                      29.09
  CD                15.5                      14.09
  Savings           16.0                      14.55
  Total             110.0                     100.0
Bar and Pie Charts
  - Bar charts and pie charts are often used for qualitative (category/nominal) data
  - Height of bar or size of pie slice shows the frequency or percentage for each category
Bar Charts 54
Bar Chart: Example
Current Investment Portfolio
  Investment Type   Amount (in thousands $)   Percentage (%)
  Stocks            46.5                      42.27
  Bonds             32.0                      29.09
  CD                15.5                      14.09
  Savings           16.0                      14.55
  Total             110.0                     100.0
[Figure: horizontal bar chart "Investor's Portfolio" with bars for Savings, CD, Bonds, and Stocks. Horizontal axis: Amount in $1000's (0 to 50)]
Pie Charts 56
Pie Charts: Example
Current Investment Portfolio
  Investment Type   Amount (in thousands $)   Percentage (%)
  Stocks            46.5                      42.27
  Bonds             32.0                      29.09
  CD                15.5                      14.09
  Savings           16.0                      14.55
  Total             110.0                     100.0
[Figure: pie chart with slices Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.
PIE CHART USING EXCEL 58
Pareto Diagram
  - Used to portray categorical data
  - A bar chart, where categories are shown in descending order of frequency
  - A cumulative polygon is often shown in the same graph
  - Used to separate the "vital few" from the "trivial many"
Pareto Diagram: Example
Current Investment Portfolio
[Figure: Pareto diagram with categories in the order Stocks, Bonds, Savings, CD. Left vertical axis: % invested in each category (bar graph, 0% to 45%); right vertical axis: cumulative % invested (line graph, 0% to 100%)]
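The numbers behind a Pareto diagram for the investment portfolio can be computed as follows. This is a sketch of the ordering and cumulative-percentage steps only (no plotting); the variable names are illustrative.

```python
# Amounts in thousands of dollars, from the summary table
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

# Pareto ordering: categories sorted by descending amount
ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)

pareto = []
running = 0.0
for name, amount in ordered:
    pct = 100 * amount / total       # height of the bar
    running += pct                   # height of the cumulative line
    pareto.append((name, pct, running))

for name, pct, cum in pareto:
    print(f"{name:<8} {pct:6.2f}%  cumulative {cum:6.2f}%")
```

Sorting puts Savings ahead of CD, which is why the diagram lists the categories as Stocks, Bonds, Savings, CD rather than in the order of the summary table.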
Contingency Tables A scatter diagram requires that both of the variables be at least interval scale. What if we wish to study the relationship between two variables when one or both are nominal or ordinal scale? In this case we tally the results in a contingency table. 61
Contingency Tables Example A manufacturer of preassembled windows produced 50 windows yesterday. This morning the quality assurance inspector reviewed each window for all quality aspects. Each was classified as acceptable or unacceptable and by the shift on which it was produced. Thus we reported two variables on a single item. The two variables are shift and quality. The results are reported in the following table. 62
Contingency Table
Contingency Table for Investment Choices ($1000s)
  Investment Category   Investor A   Investor B   Investor C   Total
  Stocks                46.5         55           27.5         129
  Bonds                 32.0         44           19.0         95
  CD                    15.5         20           13.5         49
  Savings               16.0         28           7.0          51
  Total                 110.0        147          67.0         324
(Individual values could also be expressed as percentages of the overall total, percentages of the row totals, or percentages of the column totals.)
Contingency Table: Example
To conduct an efficient advertising campaign, the relationship between occupation and newspaper readership is studied. The following table was created:
  Newspaper   Blue collar   White collar   Professional
  G&M         27            29             33
  Post        18            43             51
  Star        38            15             24
  Sun         37            21             18
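The three ways of expressing the investment table as percentages (of the overall total, of row totals, of column totals) can be sketched in Python as below. The dictionary layout and helper name `pct` are assumptions made for this illustration.

```python
investors = ["Investor A", "Investor B", "Investor C"]
table = {
    "Stocks":  [46.5, 55.0, 27.5],
    "Bonds":   [32.0, 44.0, 19.0],
    "CD":      [15.5, 20.0, 13.5],
    "Savings": [16.0, 28.0, 7.0],
}

row_totals = {cat: sum(vals) for cat, vals in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)

of_overall = {c: [pct(v, grand_total) for v in vals] for c, vals in table.items()}
of_row = {c: [pct(v, row_totals[c]) for v in vals] for c, vals in table.items()}
of_column = {c: [pct(v, t) for v, t in zip(vals, col_totals)] for c, vals in table.items()}

print("Stocks, Investor A:",
      of_overall["Stocks"][0], "% of overall,",
      of_row["Stocks"][0], "% of row,",
      of_column["Stocks"][0], "% of column")
```

Which denominator to use depends on the question: column percentages describe each investor's portfolio mix, while row percentages describe who holds each type of investment.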
Contingency Table: Example Solution If there is no relationship between occupation and newspaper read, the bar charts describing the frequency of readership of newspapers should look similar across occupations. 65
Contingency Table: Example
[Figure: three bar charts of newspaper readership (newspapers numbered 1 to 4), one each for blue-collar workers, white-collar workers, and professionals]
  - Blue-collar workers prefer the Star and the Sun.
  - White-collar workers and professionals mostly read the Post and the Globe and Mail.
Graphing the Relationship Between Two Nominal Variables
We create a contingency table. This table lists the frequency for each combination of values of the two variables. We can create a bar chart that represents the frequency of occurrence of each combination of values.
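The comparison across occupations can be reproduced numerically. The sketch below converts each occupation's column of the readership table to shares and ranks the newspapers; the variable names are illustrative.

```python
occupations = ["Blue collar", "White collar", "Professional"]
readership = {
    "G&M":  [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 15, 24],
    "Sun":  [37, 21, 18],
}

top_two = {}
for j, occupation in enumerate(occupations):
    column_total = sum(counts[j] for counts in readership.values())
    # Share of this occupation's readers for each newspaper
    shares = {paper: counts[j] / column_total for paper, counts in readership.items()}
    ranked = sorted(shares, key=shares.get, reverse=True)
    top_two[occupation] = ranked[:2]
    summary = ", ".join(f"{paper} {shares[paper]:.0%}" for paper in ranked)
    print(f"{occupation}: {summary}")
```

If occupation and newspaper were unrelated, the three rows of shares would look similar; here they differ noticeably.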
Describing Time-Series Data Data can be classified according to the time it is collected. Cross-sectional data are all collected at the same time. Time-series data are collected at successive points in time. Time-series data is often depicted on a line chart (a plot of the variable over time). 68
Line Chart Example
The total amounts of income tax paid by individuals from 1987 through 1999 are listed below. Draw a graph of these data and describe the information produced.
Line Chart
[Figure: line chart of total income tax paid, 1987 to 1999. Horizontal axis: year (87 to 99); vertical axis: 0 to 1,200,000]
  - For the first five years, total tax was relatively flat.
  - From 1993 there was a rapid increase in tax revenues.
  - Line charts can be used to describe nominal data time series.
Principles of Graphical Excellence
  - Present data in a way that provides substance, statistics and design
  - Communicate complex ideas with clarity, precision and efficiency
  - Give the largest number of ideas in the most efficient manner
  - Excellence almost always involves several dimensions
  - Tell the truth about the data
APPLICATION EXAMPLE
Providing information concerning the monthly bills of new subscribers in the first month after signing on with a telephone company. (Refer to file.)
  - Collect data
  - Prepare a frequency distribution
  - Draw a histogram
Data of Bill 42.19 103.15 39.21 89.5 75.71 2.42 8.37 77.21 1.62 109.08 28.77 104.4 35.32 115.78 13.9 6.95 38.45 94.52 48.54 13.36 88.62 1.08 7.18 72.47 91.1 2.45 9.12 2.88 117.69 0.98 9.22 6.48 29.23 26.84 93.31 44.16 99.5 76.69 11.07 0 10.88 21.97 118.75 65.9 106.84 19.45 109.94 11.64 89.35 93.93 104.88 92.97 85 13.62 1.47 5.64 30.62 17.12 0 20.55 8.4 0 10.7 83.26 118.04 90.26 30.61 99.56 0 88.51 26.4 6.48 100.05 19.7 13.95 3.43 90.04 27.21 0 15.42 110.46 72.78 22.57 92.62 8.41 55.99 13.26 6.95 26.97 6.93 14.34 10.44 3.85 89.27 11.27 24.49 0 101.36 63.7 78.89 70.48 12.24 21.13 19.6 15.43 10.05 79.52 21.36 91.56 14.49 72.02 89.13 72.88 104.8 104.84 87.71 92.88 119.63 95.03 8.11 29.25 99.03 2.72 24.42 10.13 92.17 7.74 111.14 83.05 74.01 6.45 93.57 3.2 23.31 29.04 9.01 1.88 29.24 9.63 95.52 5.72 21 5.04 92.64 95.73 56.01 16.47 0 115.5 11.05 5.42 84.77 16.44 15.21 21.34 6.72 33.69 106.59 33.4 53.9 114.67 19.34 15.3 112.94 27.57 13.54 75.49 20.12 64.78 18.89 68.69 53.21 45.81 1.57 35 15.3 56.04 0 9.12 49.24 20.39 5.2 18.49 9.44 31.77 2.8 84.12 2.67 94.67 5.1 13.68 4.69 44.32 3.03 20.84 41.38 3.69 9.16 100.04 45.77 73
Preparing Frequency Distribution
Collect data: Bills 42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05, ... (there are 200 data points)
Prepare a frequency distribution. How many classes to use?
  - Guideline: Number of classes = 1 + 3.3 log(n), where n is the number of observations (see the guideline table of observations versus classes)
  - Class width = [Range] / [Number of classes] = [Largest observation - Smallest observation] / [Number of classes] = [119.63 - 0] / 8 = 14.95, rounded to 15
Draw a Histogram
  Bills (upper class limit)   Frequency
  15                          71
  30                          37
  45                          13
  60                          9
  75                          10
  90                          18
  105                         28
  120                         14
[Figure: histogram of the bills. Horizontal axis: Bills (15 to 120); vertical axis: Frequency (0 to 80)]
Extracting Information
What information can we extract from this histogram?
  - About half of all bills are small: 71 + 37 = 108
  - A few bills are in the middle range: 13 + 9 + 10 = 32
  - A relatively large number of bills are large: 18 + 28 + 14 = 60
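The three groupings read off the histogram can be verified from the frequency table with a short Python sketch. The cut-offs at 30 and 75 are taken from the groupings used on the slide; the text histogram is only a rough stand-in for the chart.

```python
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]   # class upper limits
freqs = [71, 37, 13, 9, 10, 18, 28, 14]             # bills per class

total = sum(freqs)
small = sum(f for u, f in zip(upper_limits, freqs) if u <= 30)
middle = sum(f for u, f in zip(upper_limits, freqs) if 30 < u <= 75)
large = sum(f for u, f in zip(upper_limits, freqs) if u > 75)

# Rough text histogram: one '#' per two bills
for u, f in zip(upper_limits, freqs):
    print(f"{u:>3} | {'#' * (f // 2)} {f}")

print(f"small: {small} ({small / total:.0%}), "
      f"middle: {middle} ({middle / total:.0%}), "
      f"large: {large} ({large / total:.0%})")
```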
Shapes of Histograms There are four typical shape characteristics 77
Shapes of Histograms
Skewness: a skewed histogram is one with a long tail extending to either the right or the left side.
  - Positively skewed: the long tail extends to the right
  - Negatively skewed: the long tail extends to the left
Modal classes
A modal class is the one with the largest number of observations.
[Figure: a unimodal histogram, with the modal class marked]
Modal classes
[Figure: a bimodal histogram, with its two modal classes marked]
Bell shaped histograms
  - A special type of symmetric unimodal histogram is bell shaped.
  - Many statistical techniques require that the population be bell shaped.
  - Drawing the histogram helps verify the shape of the population in question.
Interpreting Histograms
Example 2: Comparing students' performance
Students' performance in two statistics classes was compared. The two classes differed in their teaching emphasis:
  - Class A: mathematical analysis and development of theory.
  - Class B: applications and computer based analysis.
The final mark for each student in each course was recorded. Draw histograms and interpret the results.
Data Marks (Manual) Marks (Computer) 77 59 75 60 65 81 72 59 74 83 71 50 71 53 85 66 75 77 75 52 66 70 72 71 75 74 74 47 79 76 77 68 67 78 53 46 65 73 64 72 72 67 49 50 82 73 77 75 81 82 56 51 80 85 89 74 76 55 61 44 86 83 87 77 79 73 61 52 67 80 78 69 73 92 54 53 64 67 79 60 75 71 44 56 62 78 59 92 52 53 54 53 74 68 63 69 72 75 78 76 67 67 84 69 72 70 73 82 72 62 74 73 83 59 81 82 68 83 74 65 83
Interpreting Histograms
[Figure: two histograms, Marks (Manual) and Marks (Computer). Horizontal axis: marks (50 to 100); vertical axis: Frequency (0 to 40)]
The mathematical emphasis creates two groups, and a larger spread.