8|SAMPLING

Overview

Roadmap

  • So far…
    • \(z\)-scores describe the location of a single score in a sample or in a population
    • Normal distributions: precisely quantify probability of obtaining certain scores
  • Moving forward…
    • Quantifying probability of obtaining certain sample statistics

Sampling error

Sampling error

  • Error: Discrepancy between a sample statistic and the population parameter

Sampling error: IQ

Distribution of Sample Means

  • “Distribution of Sample Means” / “Sampling distribution of the mean”
    • Distribution of sample means obtained by selecting all possible samples of size \(n\) from a population
    • The number of possible samples is often huge
    • But the distribution forms a simple & predictable pattern

Characteristics

  • Shape
    • The distribution will be approximately normal (when the population is normal or \(n\) is large enough)
    • Sample \(M\)s tend to be representative of the population \(\mu\)
    • Most means will be close to \(\mu\); means far from \(\mu\) are rare
  • Center
    • The center/average of the distribution will be close to \(\mu\)
    • \(M\) is an unbiased statistic
    • On average, \(M = \mu\)
  • Variability
    • Related to sample size, \(n\)
    • The larger the sample, the smaller the variability
    • Larger samples are more representative
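These characteristics can be checked exactly on a tiny population. The sketch below assumes the four heights from the height example (60, 62, 64, 66) and sampling with replacement. It enumerates every possible sample for \(n = 1\) to \(4\) and prints how many samples there are, the mean of the sample means, and their SD. The function name `sampling_distribution` is hypothetical, chosen for illustration.

```python
from itertools import product
from statistics import fmean, pstdev

# Assumed population: the four heights used in the height example.
population = [60, 62, 64, 66]

def sampling_distribution(pop, n):
    """Means of every possible ordered sample of size n drawn with replacement."""
    return [fmean(sample) for sample in product(pop, repeat=n)]

for n in (1, 2, 3, 4):
    means = sampling_distribution(population, n)
    # Center stays at mu = 63 while the spread shrinks as n grows.
    print(f"n = {n}: {len(means):>3} samples, "
          f"mean of M = {fmean(means):.2f}, SD of M = {pstdev(means):.2f}")
```

The mean of \(M\) stays at 63 for every \(n\), while the SD falls from about 2.24 (\(n = 1\)) to 1.58, 1.29, and 1.12. With only four distinct scores, the shape is only a rough approach to normal.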

Example: height distribution

Sample  \(X_1\)  \(X_2\)  \(M\)
1       60       60       60
2       62       60       61
3       64       60       62
4       66       60       63
5       60       62       61
6       62       62       62
7       64       62       63
8       66       62       64
9       60       64       62
10      62       64       63
11      64       64       64
12      66       64       65
13      60       66       63
14      62       66       64
15      64       66       65
16      66       66       66

Sampling distribution (\(n = 2\))

\(p(M < 61) =\ ?\)

\(p(62 \le M \le 64) =\ ?\)

\(p(M > 65) =\ ?\)
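Every possible sample is listed, so each probability is just the proportion of the 16 sample means that meet the condition. The minimal sketch below assumes the population consists of the four heights 60, 62, 64, 66 and that samples are drawn with replacement, with order mattering. That gives 4 × 4 = 16 samples. The helper `p` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product
from statistics import fmean

# Assumed population of heights; n = 2 with replacement gives the 16 samples in the table.
population = [60, 62, 64, 66]
means = [fmean(sample) for sample in product(population, repeat=2)]

def p(condition):
    """Exact proportion of the sample means that satisfy `condition`."""
    return Fraction(sum(1 for m in means if condition(m)), len(means))

print(p(lambda m: m < 61))          # p(M < 61)        -> 1/16
print(p(lambda m: 62 <= m <= 64))   # p(62 <= M <= 64) -> 5/8
print(p(lambda m: m > 65))          # p(M > 65)        -> 1/16
```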

Example: height distribution

  • Now we can calculate the variability of the sample means
    • Because we have every possible sample mean
    • Use the population SD formula
\(X\)    \(X-M\)    \((X-M)^2\)
60       -3         9
61       -2         4
62       -1         1
63       0          0
61       -2         4
62       -1         1
63       0          0
64       1          1
62       -1         1
63       0          0
64       1          1
65       2          4
63       0          0
64       1          1
65       2          4
66       3          9
\(M = 63.00\)    \(SS = 40.00\)
\(\sigma^2 = SS/N = 40/16 = 2.50\)
\(\sigma = \sqrt{2.50} = 1.58\)
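The same calculation can be reproduced in a few lines. The sketch assumes the population is the four heights 60, 62, 64, 66, sampled with replacement, which yields the 16 samples above. It treats the 16 sample means as a complete population, so \(SS\) is divided by \(N = 16\).

```python
from itertools import product
from math import sqrt
from statistics import fmean

# Assumed population of heights; n = 2 with replacement gives the 16 sample means.
population = [60, 62, 64, 66]
means = [fmean(sample) for sample in product(population, repeat=2)]

grand_mean = fmean(means)                        # mean of the 16 sample means
ss = sum((m - grand_mean) ** 2 for m in means)   # sum of squared deviations, SS
variance = ss / len(means)                       # population formula: SS / N
sd = sqrt(variance)

print(f"M = {grand_mean:.2f}, SS = {ss:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
# prints: M = 63.00, SS = 40.00, variance = 2.50, SD = 1.58
```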

Central Limit Theorem

  • Sampling & the Central Limit Theorem
    • Building the distribution of sample means from all possible samples is not feasible in most realistic situations
    • But we can mathematically predict shape, mean, & variability for any sample size & population

Central Limit Theorem

  • For any population with mean \(\mu\) and standard deviation \(\sigma\), the distribution of sample means for sample size \(n\) will have…
    • An expected value (mean) \(\mu_M\) equal to \(\mu\)
    • A standard deviation of \(\dfrac{\sigma} {\sqrt{n}}\)
    • And will approach a normal distribution as \(n\) approaches infinity
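A quick simulation can check the theorem's predictions. This sketch assumes a hypothetical non-normal population: the six faces of a fair die. That is a flat distribution with \(\mu = 3.5\) and \(\sigma \approx 1.71\). The code draws 20,000 random samples of \(n = 30\) and compares the mean and SD of the sample means with \(\mu\) and \(\sigma/\sqrt{n} \approx 0.31\). The simulated values will differ slightly from the predictions because only a finite number of samples is drawn.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

# Hypothetical non-normal population: the faces of a fair die (a flat distribution).
faces = [1, 2, 3, 4, 5, 6]
mu = fmean(faces)        # 3.5
sigma = pstdev(faces)    # about 1.71

def simulate_means(n, reps=20_000, seed=0):
    """Means of `reps` random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [fmean(rng.choices(faces, k=n)) for _ in range(reps)]

n = 30
means = simulate_means(n)
print(f"mean of M = {fmean(means):.3f}  (predicted mu = {mu:.3f})")
print(f"SD of M   = {pstdev(means):.3f}  (predicted sigma/sqrt(n) = {sigma / sqrt(n):.3f})")
```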

Shape

  • Almost perfectly normal in either of two conditions
    • The population from which the samples are selected is a normal distribution
    • Or…
    • Sample \(n\)s are relatively large
  • …what is relatively large?
    • As \(n\) approaches infinity, distribution of sample means approaches a normal distribution
    • But by about \(n = 30\), the sample means pile up nearly symmetrically around \(\mu\)
    • Population distribution does not need to be normal; can be skewed, flat, bimodal, whatever
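A strongly right-skewed population illustrates the claim that the population's shape stops mattering as \(n\) grows. The sketch assumes an exponential population with mean 1, a hypothetical choice. It measures the skewness of 20,000 simulated sample means for \(n = 1\), 5, and 30. A skewness near 0 indicates a symmetric pile. For this population the theoretical skewness of \(M\) is \(2/\sqrt{n}\) (about 2.0, 0.89, and 0.37), so the simulated values should land near those and shrink toward 0.

```python
import random
from statistics import fmean, pstdev

def skewness(xs):
    """Population skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = fmean(xs)
    s = pstdev(xs, m)
    return fmean([((x - m) / s) ** 3 for x in xs])

def sample_means_from_skewed(n, reps=20_000, seed=42):
    """Means of `reps` samples of size n from a hypothetical right-skewed
    (exponential, mean 1) population."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

for n in (1, 5, 30):
    print(f"n = {n:>2}: skewness of sample means = {skewness(sample_means_from_skewed(n)):.2f}")
```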

Mean

  • The mean of the distribution of sample means is called the expected value of \(M\) (\(\mu_M\))
    • On average, \(M = \mu_M = \mu\)
    • \(M\) is unbiased
    • If we only have a single sample \(M\), our best guess at the (unknown) population mean should always be the (known) sample mean
    • But we can acknowledge variability…

Variability

  • Standard deviation of the sample means
    • “Standard error of the mean”; \(\sigma_M\)
    • Measure of how well a sample mean estimates its population mean
    • How much sampling error we can expect; how much distance is expected on average between \(M\) and \(\mu\)

\(\sigma_M = \dfrac{\sigma}{\sqrt{n}}\) or \(\dfrac{\sqrt{\sigma^2}}{\sqrt{n}}\) or \(\sqrt{\dfrac{\sigma^2}{n}}\)
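The three forms of the formula are algebraically identical. The short sketch below checks them numerically with hypothetical values \(\sigma = 15\) and \(n = 25\). It also shows a direct consequence of the \(\sqrt{n}\) in the denominator: quadrupling the sample size halves the standard error. The function name `standard_error` is illustrative.

```python
from math import isclose, sqrt

def standard_error(sigma, n):
    """Standard error of the mean, sigma_M = sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Hypothetical values for illustration.
sigma, n = 15, 25

# The three equivalent forms give the same answer.
form1 = sigma / sqrt(n)
form2 = sqrt(sigma**2) / sqrt(n)
form3 = sqrt(sigma**2 / n)
assert isclose(form1, form2) and isclose(form1, form3)

print(standard_error(sigma, n))        # 3.0
# Quadrupling n halves the standard error: 15 / sqrt(100)
print(standard_error(sigma, 4 * n))    # 1.5
```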

Variability: heights sampling distribution

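\(\sigma_M = \dfrac{\sigma}{\sqrt{n}} = \dfrac{2.24}{\sqrt{2}} = 1.58\)

where \(\sigma = \sqrt{5} \approx 2.24\) is the SD of the population scores (60, 62, 64, 66). The result matches the SD of all 16 sample means computed directly.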

Summary

  • Summary
    • Distribution of sample means for samples of size \(n\) will have…
      • a mean (expected value) \(\mu_M = \mu\)
      • standard deviation \(\sigma_M = \sigma / \sqrt{n}\)
      • Shape will be approximately normal if the population is normally distributed or \(n\) is around 30 or more

Learning checks

  1. True or False?
    • The mean of a sample is always equal to the population mean
    • The shape of a distribution of sample means is always normal
    • As sample size increases, the value of the standard error always decreases
  2. Describe the distribution of sample means (shape, expected value of the mean, and standard error) for samples of \(n = 100\) selected from a population with \(\mu = 40\) and \(\sigma = 10\).

Galton board
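A Galton board shows the same principle physically. Each single bounce off a peg is a 50/50 coin flip, but a ball's final slot is the sum of many independent left/right bounces. As a result, the middle slots fill up and the pile takes on a bell shape. The minimal simulation below assumes 12 rows of pegs, 10,000 balls, and fair bounces, all hypothetical choices. It prints a rough text histogram.

```python
import random
from collections import Counter

def galton_board(balls=10_000, rows=12, seed=7):
    """Simulate a Galton board. At each of `rows` pegs a ball bounces right
    with probability 0.5; its final slot is the number of right bounces."""
    rng = random.Random(seed)
    return Counter(sum(rng.random() < 0.5 for _ in range(rows)) for _ in range(balls))

rows = 12
bins = galton_board(rows=rows)
for slot in range(rows + 1):
    # One '#' per 50 balls gives a rough text histogram.
    print(f"{slot:>2} {'#' * (bins[slot] // 50)}")
```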