10|HYPOTHESIS TESTING

part 2

Overview

Inferential errors

  • Hypothesis testing is an inferential process
  • Incorrect inferences are possible
                         Actual situation
Researcher’s decision    \(H_0\) true                 \(H_0\) false
Reject \(H_0\)           Type 1 error (\(\alpha\))    Correct
Fail to reject \(H_0\)   Correct                      Type 2 error (\(\beta\))

Type 1 error

  • If \(H_0\) is true…

Inferential errors

  • Boy who cried wolf
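    • First the villagers make a Type 1 error (false positive): they believe there is a wolf when there is none
    • Then they make a Type 2 error (false negative): they believe there is no wolf when there is one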

Effect size

  • Significant effects are not always substantial
    • As sample size increases, standard error of the mean decreases
    • Tiny treatment effect might come out as “statistically significant”
  • Need to consider effect size
    • How big is the treatment effect?
    • Quantifies the absolute magnitude of a treatment effect, independent of sample size

Quantifying effect size

  • One measure: Cohen’s \(d\)
    • Quantifies the absolute magnitude of a treatment effect, independent of sample size
    • Measures effect size in terms of standard deviation
    • \(d = 1.00\): treatment changed \(\mu\) by 1 SD

\[\text{Cohen's } d = \dfrac{\text{mean difference}}{\text{standard deviation}} = \dfrac{\mu_{treatment} - \mu_{no \ treatment}}{\sigma}\]

For \(z\)-tests:

\[\text{Estimated Cohen's }d = \dfrac{\text{mean difference}}{\text{standard deviation}} = \dfrac{M - \mu}{\sigma}\]

Interpreting Cohen’s \(d\)

  • Cohen’s rules of thumb
\(d\)     Interpretation
0.2       Small
0.5       Medium
0.8       Large
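
The formula for estimated Cohen's \(d\) and the rules of thumb can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function names `cohens_d` and `label_effect` are hypothetical, the input values are made up, and treating the rules of thumb as lower cut-offs (with "below small" for anything under 0.2) is an assumption.

```python
def cohens_d(sample_mean, population_mean, population_sd):
    """Estimated Cohen's d for a z-test: (M - mu) / sigma."""
    return (sample_mean - population_mean) / population_sd


def label_effect(d):
    """Label |d| using Cohen's rules of thumb as lower cut-offs (an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Made-up illustrative values: mu = 100, sigma = 10, M = 105.
d = cohens_d(105, 100, 10)
print(d, label_effect(d))
```

The sign of \(d\) shows the direction of the difference, so the label is based on its absolute value.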

Effect size & sample size

  • SAT scores: \(\mu = 500; \sigma = 100\)
    • Administer treatment (banana); \(M = 501\)
    • Significant? (\(\alpha = .05\), two-tailed; critical values \(z = \pm 1.96\))
    • Substantial? (effect size)

With 50 participants…

\[z = \dfrac{501 - 500}{100 / \sqrt{50}} = 0.07 \\ d = \dfrac{501 - 500}{100} = 0.01\]

With 50,000 participants…

\[z = \dfrac{501 - 500}{100 / \sqrt{50000}} = 2.24 \\ d = \dfrac{501 - 500}{100} = 0.01\]
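
The same arithmetic can be checked with a short Python sketch. The numbers come from the SAT example; the helper name `z_and_d` and the `results` dictionary are hypothetical names introduced only for this illustration.

```python
import math

MU, SIGMA, M = 500, 100, 501   # values from the SAT example
Z_CRITICAL = 1.96              # two-tailed, alpha = .05


def z_and_d(n):
    """Return the z statistic and estimated Cohen's d for a sample of size n."""
    standard_error = SIGMA / math.sqrt(n)   # shrinks as n grows
    z = (M - MU) / standard_error           # depends on n
    d = (M - MU) / SIGMA                    # does not depend on n
    return z, d


results = {}
for n in (50, 50_000):
    z, d = z_and_d(n)
    significant = abs(z) > Z_CRITICAL
    results[n] = (z, d, significant)
    print(f"n = {n}: z = {z:.2f}, d = {d:.2f}, significant = {significant}")
```

Only the larger sample pushes \(z\) past the critical value, while \(d\) is the same in both cases: the effect becomes significant without becoming any more substantial.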

Statistical power

  • Power: Probability of correctly rejecting a false null hypothesis
    • Power = \(1 - \beta\)
                         Actual situation
Researcher’s decision    \(H_0\) true                 \(H_0\) false
Reject \(H_0\)           Type 1 error (\(\alpha\))    Correct (\(1-\beta\))
Fail to reject \(H_0\)   Correct                      Type 2 error (\(\beta\))
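
One way to see where \(\alpha\), \(\beta\), and \(1-\beta\) come from is to simulate many studies and count the decisions. The sketch below is illustrative only: the true mean of 550, the sample size of 25, the number of trials, and the name `rejection_rate` are all assumptions, and sample means are drawn directly from their sampling distribution rather than from individual scores.

```python
import math
import random

MU_0, SIGMA = 500, 100      # population under the null hypothesis
MU_TRUE = 550               # assumed true mean when H0 is false (hypothetical)
N = 25                      # assumed sample size (hypothetical)
Z_CRITICAL = 1.96           # two-tailed, alpha = .05
TRIALS = 20_000

rng = random.Random(1)      # fixed seed so the run is repeatable
standard_error = SIGMA / math.sqrt(N)


def rejection_rate(actual_mean):
    """Proportion of simulated studies in which H0 is rejected."""
    rejections = 0
    for _ in range(TRIALS):
        # Draw one sample mean from its sampling distribution.
        sample_mean = rng.gauss(actual_mean, standard_error)
        z = (sample_mean - MU_0) / standard_error
        if abs(z) > Z_CRITICAL:
            rejections += 1
    return rejections / TRIALS


type_1_rate = rejection_rate(MU_0)    # H0 true: every rejection is a Type 1 error
power = rejection_rate(MU_TRUE)       # H0 false: every rejection is correct
type_2_rate = 1 - power               # H0 false: failing to reject is a Type 2 error

print(f"Type 1 error rate: {type_1_rate:.3f}")
print(f"Power: {power:.3f}")
print(f"Type 2 error rate: {type_2_rate:.3f}")
```

The Type 1 error rate should land close to the chosen \(\alpha\), while power and the Type 2 error rate depend on the assumed true mean and sample size.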

Power interactive

[Interactive diagram omitted. Controls: population characteristics, experiment parameters, and diagram options (\(X\)-axis). Readouts: \(\sigma_M\), \(\beta\), and power.]

Influences

  • Factors that influence power
  • Effect size
    • Larger effect size; greater power
  • Sample size
    • Larger sample size; greater power
  • Alpha level
    • Lowering alpha (making the test more stringent) reduces power
  • Directional hypothesis
    • Using a one-tailed (directional) test increases power (relative to a two-tailed test)
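
The four influences listed above can be checked numerically. The following is a sketch for a one-sample \(z\)-test, assuming the true effect lies in the predicted direction (\(d > 0\)); the function name `z_test_power` and the baseline values (\(d = 0.5\), \(n = 25\)) are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def z_test_power(d, n, alpha=0.05, two_tailed=True):
    """Power of a one-sample z-test for a true effect of size d (assumes d > 0)."""
    # Distance between the null and true sampling distributions, in standard errors.
    shift = d * math.sqrt(n)
    if two_tailed:
        z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
        upper = 1 - STANDARD_NORMAL.cdf(z_crit - shift)   # rejections in the upper tail
        lower = STANDARD_NORMAL.cdf(-z_crit - shift)      # rejections in the lower tail
        return upper + lower
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STANDARD_NORMAL.cdf(z_crit - shift)


baseline = z_test_power(d=0.5, n=25)
larger_effect = z_test_power(d=0.8, n=25)
larger_sample = z_test_power(d=0.5, n=50)
stricter_alpha = z_test_power(d=0.5, n=25, alpha=0.01)
one_tailed = z_test_power(d=0.5, n=25, two_tailed=False)

print(f"baseline:       {baseline:.3f}")
print(f"larger effect:  {larger_effect:.3f}")
print(f"larger sample:  {larger_sample:.3f}")
print(f"stricter alpha: {stricter_alpha:.3f}")
print(f"one-tailed:     {one_tailed:.3f}")
```

Relative to the baseline, power rises with a larger effect, a larger sample, or a one-tailed test, and falls when alpha is lowered.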

Using statistical power

  • Power should be estimated before starting study
    • Using known quantities
    • Or, more often, making assumptions about factors that influence power
  • Determining whether a research study is likely to be successful
    • Specify effect size, \(n\), \(\alpha\); calculate power
  • Figuring out how many participants you need
    • Specify desired power (e.g. .8), expected effect size, \(\alpha\)
    • Calculate required sample size
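
For the second use, a rough sample-size calculation can be sketched as follows. It assumes a one-sample, two-tailed \(z\)-test and uses the common normal approximation \(n = \left((z_{1-\alpha/2} + z_{\text{power}}) / d\right)^2\), which ignores the negligible chance of rejecting in the wrong tail. The function name `required_n` is hypothetical, and studies that compare two groups or use \(t\)-tests would need different numbers.

```python
import math
from statistics import NormalDist


def required_n(d, power=0.8, alpha=0.05):
    """Approximate n for a one-sample, two-tailed z-test (normal approximation)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = standard_normal.inv_cdf(power)           # z-score matching the desired power
    return math.ceil(((z_alpha + z_power) / d) ** 2)   # round up to a whole participant


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n = {required_n(d)}")
```

Because \(d\) is squared in the denominator, halving the expected effect size roughly quadruples the number of participants needed.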

Power & sample sizes

Grouping variable               Dependent variable                                                         \(d\)   Required \(n\)
Gender                          Height                                                                     1.85    6
Liberal / Conservative          How important is social equality?                                          0.69    34
Do you like eggs? [yes / no]    How often do you eat egg salad?                                            0.58    48
Are you a smoker? [yes / no]    What is the likelihood of a smoker dying from a smoking-related illness?   0.33    144
Do you prefer science or art?   How many planets can you name correctly?                                   0.07    3669

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013, January). Life after p-hacking. In Meeting of the Society for Personality and Social Psychology, New Orleans, LA (pp. 17-19). http://dx.doi.org/10.2139/ssrn.2205186

Low power


Running a study with low statistical power is like setting out to look for distant galaxies with a pair of binoculars: even if what you’re looking for is definitely out there, you have essentially no chance of seeing it.

Stuart Ritchie, Science Fictions

Learning checks

  • True/False
    • Larger differences between the sample and population mean increase effect size
    • Increasing the sample size increases the effect size
    • An effect that exists is more likely to be detected if \(n\) is large
    • An effect that exists is less likely to be detected if \(\sigma\) is large
    • A Type I error is like convicting an innocent person in a jury trial
    • A Type II error is like convicting a guilty person in a jury trial