3|FREQUENCY

Overview

Statistics

A bunch of numbers looking for an argument. 1

  • Mathematical procedures used to collect, organize, summarize & interpret information
    • Provide standardized evaluation procedures
    • Tell us about patterns of interest in data
  • Etymology
    • Statistics comes from status, meaning state
    • The state of the state
    • Census, birth/death rates, incomes, unemployment, etc.

Frequency

E.g. opinion polling

Frequency distributions

  • A frequency distribution…
    • Organizes and displays data
    • Conveys how scores are distributed
    • Can be either a table or a graph
  • Shows:
    • Categories that make up the scale
    • Frequency, or number of observations, in each category
    • And/or proportion/percentage/cumulative percent of scores in each category

Simple example

  • From raw scores
2 1 2 3 4 3 3 2 2 1 5 3 4 1 2
  • To this
\(X\) \(f\)
1 3
2 5
3 4
4 2
5 1
  • Or to a graph of the same frequencies
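
As a minimal sketch, the tally from raw scores to a frequency table can be reproduced in Python with `collections.Counter`. The variable names `raw_scores` and `freq` are illustrative and not part of the original slides.

```python
from collections import Counter

# Raw scores from the simple example.
raw_scores = [2, 1, 2, 3, 4, 3, 3, 2, 2, 1, 5, 3, 4, 1, 2]

# Count how many times each score (X) occurs; that count is the frequency (f).
freq = Counter(raw_scores)

# Print the table with the scores in ascending order.
print("X  f")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")
```

The frequencies should sum to the number of raw scores, which is a quick check that no score was missed.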

Frequency tables

  • Categories in a column labeled \(X\) (or a more meaningful name)
  • Frequency count in column labeled \(f\)
  • Can have columns for…
    • Proportion \((f / N)\)
    • Percentage (proportion * 100)
    • Cumulative percentage
\(X\) \(f\) Proportion Percent Cumulative Percent
1 3 0.20 20.00 20.00
2 5 0.33 33.33 53.33
3 4 0.27 26.67 80.00
4 2 0.13 13.33 93.33
5 1 0.07 6.67 100.00
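
The extra columns follow directly from \(f\) and \(N\). The sketch below recomputes them for the table above, assuming the frequencies are held in a plain dictionary; the names `freq`, `rows`, and `cumulative` are illustrative.

```python
# Frequency table from the simple example: score (X) -> frequency (f).
freq = {1: 3, 2: 5, 3: 4, 4: 2, 5: 1}

n = sum(freq.values())  # N, the total number of scores

rows = []
cumulative = 0  # running count of scores at or below the current X
for x in sorted(freq):
    f = freq[x]
    cumulative += f
    proportion = f / n                         # f / N
    percent = proportion * 100                 # proportion * 100
    cumulative_percent = cumulative / n * 100  # percent of scores at or below X
    rows.append((x, f, proportion, percent, cumulative_percent))

print("X  f  Proportion  Percent  Cumulative Percent")
for x, f, p, pct, cum in rows:
    print(f"{x}  {f}  {p:.2f}  {pct:.2f}  {cum:.2f}")
```

Accumulating the counts first and converting to a percentage afterwards avoids adding up already-rounded percentages.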

Frequency tables

Midterm scores: 41 43 44 45 48 50 51 51 52 52 52 53 53 53 54 54 55 55 55 55 55 56 56 56 56 56 57 57 57 57 58 58 58 59 59 59 59 59 59 59
  • Regular frequency table not always appropriate
    • Large number of scores, low frequencies
  • Solution: Grouped frequency tables
    • Easier to understand
    • But lose information
\(X\) \(f\)
41 1
42 0
43 1
44 1
45 1
46 0
47 0
48 1
49 0
50 1
51 2
52 3
53 3
54 2
55 5
56 5
57 4
58 3
59 7

Grouped frequency table

  1. What’s the range of scores?
  2. How can you turn that into about 10 groups? (using a simple number, e.g. 2, 5, 10…)
  3. What should we make the bottom score of each interval? (so that the bottom is a multiple of the width, e.g., start at 10, not 11)
  4. List intervals in \(X\) column, frequencies in \(f\) column
  5. (optional) Create columns for proportion, percent, cumulative percent
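
Steps 1 to 3 can be sketched as a small helper. This is one possible reading of the procedure, not a fixed rule: the function name `choose_width`, the list of candidate widths, and the "closest to 10 groups" criterion are assumptions made for illustration, and the scores are assumed to be whole numbers.

```python
def choose_width(lowest, highest, target_groups=10,
                 simple_widths=(1, 2, 5, 10, 20, 50, 100)):
    """Return (width, bottom, groups) for the simple width whose number of
    intervals is closest to target_groups (hypothetical helper)."""
    best = None
    for width in simple_widths:
        # Step 3: bottom of the first interval is a multiple of the width.
        bottom = (lowest // width) * width
        # Number of intervals needed to reach the highest score.
        groups = (highest - bottom) // width + 1
        if best is None or abs(groups - target_groups) < abs(best[2] - target_groups):
            best = (width, bottom, groups)
    return best

# Midterm scores run from 41 to 59.
width, bottom, groups = choose_width(41, 59)
print(width, bottom, groups)
```

For the midterm scores this picks a width of 2 starting at 40, which gives 10 intervals.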

Grouped frequency table

  • Grouped frequency table
    • Bins all same width
    • Width is a simple number (2)
    • Bottom score is a multiple of the width (i.e., divisible by 2)
    • Produces good number of bins
\(X\) \(f\)
40-41 1
42-43 1
44-45 2
46-47 0
48-49 1
50-51 3
52-53 6
54-55 7
56-57 9
58-59 10
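
The grouped table above can be rebuilt from the midterm scores with integer division. This is a sketch that assumes whole-number scores and a width of 2; the names `midterm`, `counts`, and `grouped` are illustrative.

```python
from collections import Counter

midterm = [41, 43, 44, 45, 48, 50, 51, 51, 52, 52, 52, 53, 53, 53, 54, 54,
           55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 57, 57, 57, 57,
           58, 58, 58, 59, 59, 59, 59, 59, 59, 59]

width = 2

# Map each score to the bottom of its interval (a multiple of the width).
counts = Counter((score // width) * width for score in midterm)

# List every interval from lowest to highest, including empty ones.
bottoms = range(min(counts), max(counts) + width, width)
grouped = [(f"{b}-{b + width - 1}", counts[b]) for b in bottoms]

for label, f in grouped:
    print(label, f)
```

Empty intervals such as 46-47 are kept with a frequency of 0 so that all bins have the same width and no gap is hidden.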

Data visualization

[Interactive map]

Frequency graphs

  • Histogram, frequency polygon, bar chart, curve
  • Appropriate type depends on:
    • Level of measurement (nominal; ordinal; interval; ratio)
    • Describing sample or population?
    • Want to show more than one group?

Bar graph

  • For nominal or ordinal data
  • Categories on \(x\)-axis, frequency on \(y\)-axis
  • Spaces between adjacent bars indicate separate categories

Histogram

  • For interval or ratio data
  • Scores/bins on \(x\)-axis, frequency on \(y\)-axis
  • Height corresponds to frequency
  • Bars centered on category

Grouped histogram

\(X\) \(f\)
40-41 1
42-43 1
44-45 2
46-47 0
48-49 1
50-51 3
52-53 6
54-55 7
56-57 9
58-59 10
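
As a rough stand-in for a plotted figure, the grouped table can be drawn as a text histogram. The sketch below turns the plot on its side, so intervals run down the page and bar length, rather than bar height, corresponds to frequency. The helper `text_histogram` is hypothetical.

```python
# Grouped frequency table for the midterm scores: (interval, frequency).
grouped = [("40-41", 1), ("42-43", 1), ("44-45", 2), ("46-47", 0),
           ("48-49", 1), ("50-51", 3), ("52-53", 6), ("54-55", 7),
           ("56-57", 9), ("58-59", 10)]

def text_histogram(table, mark="#"):
    """Return one line per interval; the bar has one mark per observation."""
    return [f"{label} | {mark * f}" for label, f in table]

lines = text_histogram(grouped)
print("\n".join(lines))
```

A plotting library would normally be used instead; the text version only shows how each frequency maps to a bar.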

Frequency polygon

  • Basically the same as a histogram
    • Scores on the \(X\)-axis
    • Frequency on \(Y\)-axis
    • Dot above the center of each interval
    • Connect dots with a line
    • Close the polygon with lines to the \(Y = 0\) point
    • Can also be used with grouped frequency distribution data
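
The points of a frequency polygon can be computed before any plotting. The sketch below uses the grouped midterm table and assumes whole-number scores, so the center of the interval 40-41 is 40.5. It also assumes the polygon is closed one interval width beyond each end, which is a common convention rather than something the slides specify. The helper `polygon_points` is hypothetical.

```python
def polygon_points(table, width):
    """Return (x, y) vertices: a dot above the center of each interval,
    plus a zero-frequency point one width beyond each end."""
    points = [(bottom + (width - 1) / 2, f) for bottom, f in table]
    first_x = points[0][0]
    last_x = points[-1][0]
    return [(first_x - width, 0)] + points + [(last_x + width, 0)]

# Grouped midterm table as (bottom of interval, frequency).
table = [(40, 1), (42, 1), (44, 2), (46, 0), (48, 1),
         (50, 3), (52, 6), (54, 7), (56, 9), (58, 10)]

points = polygon_points(table, width=2)
for x, y in points:
    print(x, y)
```

Connecting these points in order with straight lines gives the polygon; the first and last points sit on the \(Y = 0\) line.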

Frequency polygon

  • Useful for comparing distributions

Frequency polygon

  • …where overlapping histograms are harder to understand

Population curve

  • Used for population distributions
    • When population is large, scores for each individual are usually not known
    • Smooth curve indicates exact scores were not used
    • Conveys relative frequency

Learning checks

  1. Use the frequency table below to determine how many participants were in the study.
  2. Which graph (histogram / frequency polygon / bar chart / curve) is appropriate for showing:
    • Marital status (single/married/divorced)
    • Letter grades (A+, A, A-, B+, etc.)
    • Time spent watching Netflix
  3. A grouped frequency distribution table has categories 0-9, 10-19, 20-29, and 30-39. What is the width of the interval 20-29?
\(X\) \(f\)
5 2
4 4
3 1
2 0
1 3