19|CORRELATION

Overview

Correlational research designs
Correlation statistic
Hypothesis test
Correlation & effect size
Learning checks

Correlational research designs

Correlational designs versus experimental designs
- No independent variable
- Measure two (or more) variables per participant
- Examine relationship between variables
Example research questions
- Is income related to happiness?
- Does practice make perfect?
- Class attendance & final grades

Limitations of correlation

Correlations are not usually useful for demonstrating causation
- Establishing causation requires experiments in which something is manipulated
E.g. coffee consumption & intelligence
- Perhaps…
  - A causes B
  - B causes A
  - Something else (C) causes A and B

Corley, J., et al. (2010). Caffeine consumption and cognitive function at age 70: The Lothian Birth Cohort 1936 study. Psychosomatic medicine 72(2), 206-214. https://doi.org/10.1097/PSY.0b013e3181c92a9c

Correlation statistic

Correlation coefficient
- A number quantifying the association between two variables
Three characteristics
- Strength or consistency (varies from 0 to 1)
- Direction (negative or positive)
- Form (e.g. linear)
Characteristics are independent
- Can be positive & weak, negative & strong, etc

Logic

Participant	Sleep	Score
A	4	5
B	5	8
C	7	8
D	8	10
E	11	9

Pearson’s \(r\)

Most widely used correlation statistic
- Measures the degree and the direction of the linear relationship between two variables
Variability & covariability
- Variability: How much each variable varies
- Covariability: How much \(X\) and \(Y\) vary in tandem
- Are changes in \(X\) associated with corresponding changes in \(Y\)?
- E.g. as \(X\) increases, \(Y\) tends to increase (positive)
- Or, as \(X\) increases, \(Y\) tends to decrease (negative)

Pearson’s \(r\)

\(r = \dfrac{SP}{\sqrt{SS_X SS_Y}} = \dfrac{\textrm{covariability of X and Y}}{\textrm{variability of X and Y separately}}\)

Perfect linear relationship…
- If every change in \(X\) has a corresponding change in \(Y\), variability separately = covariability
- Correlation will be \(–1.00\) or \(+1.00\)

Hypothesis test

Step 1. State hypotheses

Correlation coefficient \(r\) is computed for sample data
Hypotheses concerns relationship in the population
Greek letter \(\rho\) (rho) represents population correlation
Non-directional:
- \(H_0: \rho = 0\)
- \(H_1: \rho \ne 0\)
Directional:
- \(H_0: \rho \le 0\); \(H_1: ρ > 0\)
- Or…
- \(H_0: \rho \ge 0\); \(H_1: \rho < 0\)

Step 2. Critical region

Critical value for \(r\) is a type of \(t\) statistic
- \(df = n - 2\)

Proportion in 1 tail	0.1	0.05	0.025	0.01	0.005
Proportion in 2 tails	0.2	0.1	0.05	0.02	0.01
1	3.078	6.314	12.706	31.821	63.657
2	1.886	2.920	4.303	6.965	9.925
3	1.638	2.353	3.182	4.541	5.841
4	1.533	2.132	2.776	3.747	4.604
5	1.476	2.015	2.571	3.365	4.032
6	1.440	1.943	2.447	3.143	3.707
7	1.415	1.895	2.365	2.998	3.499
\(df\) 8	1.397	1.860	2.306	2.896	3.355
9	1.383	1.833	2.262	2.821	3.250
10	1.372	1.812	2.228	2.764	3.169
11	1.363	1.796	2.201	2.718	3.106
12	1.356	1.782	2.179	2.681	3.055
13	1.350	1.771	2.160	2.650	3.012
14	1.345	1.761	2.145	2.624	2.977
15	1.341	1.753	2.131	2.602	2.947
...	...	...	...	...	...

Step 3. Calculate statistic

\(r = \dfrac{SP}{\sqrt{SS_X SS_Y}}\)

Numerator: Sum of Products, \(SP\)
- Multiply deviations by one another

Definitional formula

\(SP = \Sigma(X - M_X)(Y - M_Y)\)

Denominator: Sum of Squares, \(SS\)
- Multiply deviations by themselves

Definitional formula

\[\begin{align} SS &= \Sigma(X - M_X)^2 \\ &= \Sigma(X - M_X)(X - M_X) \end{align}\]

Computational formula

\(SP = \Sigma XY - \dfrac{\Sigma X \Sigma Y}{n}\)

Computational formula

\[\begin{align} SS &= \Sigma X^2 - \dfrac{(\Sigma X)^2}{n}\\ &= \Sigma X X - \dfrac{(\Sigma X)(\Sigma X)}{n} \end{align}\]

Calculating \(r\)

\(X\)	\(X-M_X\)	\((X-M_X)^2\)	\(Y\)	\(Y-M_Y\)	\((Y-M_Y)^2\)	\(P\)
4	-3	9	5	-3	9	9
5	-2	4	8	0	0	0
7	0	0	8	0	0	0
8	1	1	10	2	4	2
11	4	16	9	1	1	4
\(M_X = 7.00\)		\(SS_X = 30.00\)	\(M_Y = 8.00\)		\(SS_Y = 14.00\)	\(SP = 15.00\)
		\(s^2_X = 7.50\)			\(s^2_Y = 3.50\)
		\(s_X = 2.74\)			\(s_Y = 1.87\)

\(r = \dfrac{SP}{\sqrt{SS_X SS_Y}}\)

Calculating \(r\)

\(X\)	\(X-M_X\)	\((X-M_X)^2\)	\(Y\)	\(Y-M_Y\)	\((Y-M_Y)^2\)	\(P\)
4	-3	9	5	-3	9	9
5	-2	4	8	0	0	0
7	0	0	8	0	0	0
8	1	1	10	2	4	2
11	4	16	9	1	1	4
\(M_X = 7.00\)		\(SS_X = 30.00\)	\(M_Y = 8.00\)		\(SS_Y = 14.00\)	\(SP = 15.00\)
		\(s^2_X = 7.50\)			\(s^2_Y = 3.50\)
		\(s_X = 2.74\)			\(s_Y = 1.87\)

\(r = \dfrac{SP}{\sqrt{SS_X SS_Y}}\)

Calculating \(r\)

\(X\)	\(X-M_X\)	\((X-M_X)^2\)	\(Y\)	\(Y-M_Y\)	\((Y-M_Y)^2\)	\(P\)
4	-3	9	5	-3	9	9
5	-2	4	8	0	0	0
7	0	0	8	0	0	0
8	1	1	10	2	4	2
11	4	16	9	1	1	4
\(M_X = 7.00\)		\(SS_X = 30.00\)	\(M_Y = 8.00\)		\(SS_Y = 14.00\)	\(SP = 15.00\)
		\(s^2_X = 7.50\)			\(s^2_Y = 3.50\)
		\(s_X = 2.74\)			\(s_Y = 1.87\)

\(r = \dfrac{SP}{\sqrt{SS_X SS_Y}} = \dfrac{15}{\sqrt{30*14}} = 0.73\)

Calculating \(r\): computation approach

\(X\)	\(X^2\)	\(Y\)	\(Y^2\)	\(XY\)
4	16	5	25	20
5	25	8	64	40
7	49	8	64	56
8	64	10	100	80
11	121	9	81	99
\(\Sigma X = 35\)	\(\Sigma X^2 = 275\)	\(\Sigma Y = 40\)	\(\Sigma Y^2 = 334\)	\(\Sigma XY = 295\)

\(SS_X = \Sigma X^2 - \dfrac{(\Sigma X)^2}{n} = 275 - \dfrac{35^2}{5} = 30\)

\(SS_Y = \Sigma Y^2 - \dfrac{(\Sigma Y)^2}{n} = 334 - \dfrac{40^2}{5} = 14\)

\(SP = \Sigma XY - \dfrac{\Sigma X \Sigma Y}{n} = 295 - \dfrac{35 \cdot 40}{5} = 15\)

\(r = \dfrac{SP}{\sqrt{SS_X SS_Y}} = \dfrac{15}{\sqrt{30 \cdot 14}} = 0.73\)

Step 4. Make decision

Hypothesis test for \(r\) is a type of \(t\) test

\(t = \dfrac{\text{sample statistic} - \text{population parameter}}{\text{standard error}}\)

\(t = \dfrac{r - \rho}{s_r}\)

\(s_r = \sqrt{\dfrac{1 - r^2}{n-2}}\)

\(t = \dfrac{0.73}{\sqrt{\dfrac{1-0.73^2}{5 - 2}}} = 1.86\)

Step 5. Report results

Value of correlation, \(r\)
\(df\) (or sample size)
\(p\)-value or \(\alpha\) level

“More sleep was associated with higher test scores; however, the correlation did not reach statistical significance; \(r (3) = .73, p > .05\).”

Correlation & effect size

Pearson’s \(r\) correlation coefficient is a standardized measure of effect size
- Quantifies degree of association on a scale from \(0\) to \(1\)
- Independent of sample size
- Related to \(r^2\)
- Even related to Cohen’s \(d\)

\(d = \dfrac{2r}{\sqrt{1-r^2}}\)

Interpreting effect size

Cohen (1977)

\(d\)	\(r\)	Description
0.3	0.1	Small
0.5	0.3	Medium
0.8	0.5	Large

Funder & Ozer (2019)

\(d\)	\(r\)	Description
0.10	0.05	Very small for the explanation of single events but potentially consequential in longer run
0.20	0.10	Still small at the level of single events but potentially more ultimately consequential
0.40	0.20	Medium effect of some explanatory and practical use even in the short run
0.60	0.30	Large effect that is potentially powerful in both the short and the long run
0.80	0.40	A very large effect in the context of psychological research; likely to be a gross overestimate

Funder, D. C., & Ozer, D. J. (2019). Evaluating Effect Size in Psychological Research: Sense and Nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168. https://doi.org/10.1177/2515245919847202

Learning checks

True/False
1. The numerator of Pearson’s \(r\) cannot be negative
2. The denominator of Pearson’s \(r\) cannot be negative
3. In a non-directional significance test of a correlation, the null hypothesis states that the population correlation is zero

19|CORRELATION

Overview

Correlational research designs

Limitations of correlation

Correlation statistic

Logic

Pearson’s \(r\)

Pearson’s \(r\)

Hypothesis test

Step 1. State hypotheses

Step 2. Critical region

Step 3. Calculate statistic

Calculating \(r\)

Calculating \(r\)

Calculating \(r\)

Calculating \(r\): computation approach

Step 4. Make decision

Step 5. Report results

Correlation & effect size

Interpreting effect size

See also…

Learning checks