BSTA 511/611
OHSU-PSU School of Public Health
2024-11-20
Analysis of Variance (ANOVA)
When to use an ANOVA
Hypotheses
ANOVA table
Different sources of variation in ANOVA
ANOVA conditions
F-distribution
Post-hoc testing of differences in means
Running an ANOVA in R
.txt
file.txt
(text) files are usually tab-deliminated files
.csv
files are comma-separated filesread_delim
is from the readr
package, just like read_csv
, and loads with other tidyverse
packagestrim_ws
: specify whether leading and trailing white space should be trimmed from each field before parsing it
Rows: 70
Columns: 2
$ disability <chr> "none", "none", "none", "none", "none", "none", "none", "no…
$ score <dbl> 1.9, 2.5, 3.0, 3.6, 4.1, 4.2, 4.9, 5.1, 5.4, 5.9, 6.1, 6.7,…
Read OHSU’s Inclusive Language Guide (below is from pgs. 22-25)
“… an evolving tool to help OHSU members learn about and use inclusive language…”
Sections on: Race and ethnicity, Immigration status, Gender and sexual orientation, and Ability (including physical, mental and chronological attributes)
disability
a factor variableMake disability
a factor variable:
What’s different now?
What are the current level names and order?
What changes are being made below?
fct_relevel()
and fct_recode()
are from the forcats
package: https://forcats.tidyverse.org/index.html.forcats
is loaded with library(tidyverse)
.New order & names:
score
distribution shapes within each group?score
measures of center and spread between the groupsTo test for a difference in means across k groups:
\[\begin{align} H_0 &: \mu_1 = \mu_2 = ... = \mu_k\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
Hypothetical examples:
In which set (A or B) do you believe the evidence will be stronger that at least one population differs from the others?
Whether or not two means are significantly different depends on:
Questions:
lm
and aov
lm
= linear model; will be using frequently in BSTA 512Analysis of Variance Table
Response: score
Df Sum Sq Mean Sq F value Pr(>F)
disability 4 30.521 7.6304 2.8616 0.03013 *
Residuals 65 173.321 2.6665
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Df Sum Sq Mean Sq F value Pr(>F)
disability 4 30.52 7.630 2.862 0.0301 *
Residuals 65 173.32 2.666
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hypotheses:
\[\begin{align} H_0 &: \mu_{none} = \mu_{amputation} = \mu_{crutches} = \mu_{hearing} = \mu_{wheelchair}\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
Do we reject or fail to reject \(H_0\)?
Disability example ANOVA table from R:
Generic ANOVA table:
ANOVA compares the variability between groups to the variability within groups
Analysis of Variance (ANOVA) compares the variability between groups to the variability within groups
\[\sum_{i = 1}^k \sum_{j = 1}^{n_i}(x_{ij} -\bar{x})^2 \ \ = \ \sum_{i = 1}^k n_i(\bar{x}_{i}-\bar{x})^2 \ \ + \ \ \sum_{i = 1}^k\sum_{j = 1}^{n_i}(x_{ij}-\bar{x}_{i})^2\]
Observation | i = 1 | i = 2 | i = 3 | \(\ldots\) | i = k | overall |
---|---|---|---|---|---|---|
j = 1 | \(x_{11}\) | \(x_{21}\) | \(x_{31}\) | \(\ldots\) | \(x_{k1}\) | |
j = 2 | \(x_{12}\) | \(x_{22}\) | \(x_{32}\) | \(\ldots\) | \(x_{k2}\) | |
j = 3 | \(x_{13}\) | \(x_{23}\) | \(x_{33}\) | \(\ldots\) | \(x_{k3}\) | |
j = 4 | \(x_{14}\) | \(x_{24}\) | \(x_{34}\) | \(\ldots\) | \(x_{k4}\) | |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | |
j = \(n_i\) | \(x_{1n_1}\) | \(x_{2n_2}\) | \(x_{3n_3}\) | \(\ldots\) | \(x_{kn_k}\) | |
Means | \(\bar{x}_{1}\) | \(\bar{x}_{2}\) | \(\bar{x}_{3}\) | \(\ldots\) | \(\bar{x}_{k}\) | \(\bar{x}\) |
Variance | \({s}^2_{1}\) | \({s}^2_{2}\) | \({s}^2_{3}\) | \(\ldots\) | \({s}^2_{k}\) | \({s}^2\) |
Total Sums of Squares:
\[SST = \sum_{i = 1}^k \sum_{j = 1}^{n_i}(x_{ij} -\bar{x})^2 = (N-1)s^2\]
where
This is the sum of the squared differences between each observed \(x_{ij}\) value and the grand mean, \(\bar{x}\).
That is, it is the total deviation of the \(x_{ij}\)’s from the grand mean.
Total Sums of Squares:
\[SST = \sum_{i = 1}^k \sum_{j = 1}^{n_i}(x_{ij} -\bar{x})^2 = (N-1)s^2\]
ANOVA compares the variability between groups to the variability within groups
Sums of Squares due to Groups:
\[SSG = \sum_{i = 1}^k n_i(\bar{x}_{i}-\bar{x})^2\]
This is the sum of the squared differences between each group mean, \(\bar{x}_{i}\), and the grand mean, \(\bar{x}\).
That is, it is the deviation of the group means from the grand mean.
Also called the Model SS, or \(SS_{model}.\)
\[SSG = \sum_{i = 1}^k n_i(\bar{x}_{i}-\bar{x})^2\]
Calculate means \(\bar{x}_i\) for each group:
# A tibble: 5 × 2
disability mean
<fct> <dbl>
1 none 4.9
2 amputation 4.43
3 crutches 5.92
4 hearing 4.05
5 wheelchair 5.34
Calculate \(SSG\):
ANOVA compares the variability between groups to the variability within groups
Sums of Squares Error:
\[SSE = \sum_{i = 1}^k\sum_{j = 1}^{n_i}(x_{ij}-\bar{x}_{i})^2 = \sum_{i = 1}^k(n_i-1)s_{i}^2\] where \(s_{i}\) is the standard deviation of the \(i^{th}\) group
This is the sum of the squared differences between each observed \(x_{ij}\) value and its group mean \(\bar{x}_{i}\).
That is, it is the deviation of the \(x_{ij}\)’s from the predicted score by group.
Also called the residual sums of squares, or \(SS_{residual}.\)
\[SSE = \sum_{i = 1}^k\sum_{j = 1}^{n_i}(x_{ij}-\bar{x}_{i})^2 = \sum_{i = 1}^k(n_i-1)s_{i}^2\] where \(s_{i}\) is the standard deviation of the \(i^{th}\) group
Calculate sd’s \(s_i\) for each group:
# A tibble: 5 × 2
disability SD
<fct> <dbl>
1 none 1.79
2 amputation 1.59
3 crutches 1.48
4 hearing 1.53
5 wheelchair 1.75
Calculate \(SSE\):
ANOVA compares the variability between groups to the variability within groups
\[\sum_{i = 1}^k \sum_{j = 1}^{n_i}(x_{ij} -\bar{x})^2 \ \ = \ \ n_i\sum_{i = 1}^k(\bar{x}_{i}-\bar{x})^2 \ \ + \ \ \sum_{i = 1}^k\sum_{j = 1}^{n_i}(x_{ij}-\bar{x}_{i})^2\]
\[(N-1)s^2 \ \ = \ \sum_{i = 1}^k n_i(\bar{x}_{i}-\bar{x})^2 \ \ + \ \ \sum_{i = 1}^k(n_i-1)s_{i}^2\]
If the groups are actually different, then which of these is more accurate?
If there really is a difference between the groups, we would expect the F-statistic to be which of these:
\[F = \frac{MSG}{MSE}\]
# Note that I'm saving the tidy anova table
# Will be pulling p-value from this on future slide
empl_lm <- lm(score ~ disability, data = employ) %>%
anova() %>%
tidy()
empl_lm %>% gt()
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
disability | 4 | 30.52143 | 7.630357 | 2.86158 | 0.03012686 |
Residuals | 65 | 173.32143 | 2.666484 | NA | NA |
Hypotheses:
\[\begin{align} H_0 &: \mu_{none} = \mu_{amputation} = \mu_{crutches} = \mu_{hearing} = \mu_{wheelchair}\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
Do we reject or fail to reject \(H_0\)?
\[\begin{align} H_0 &: \mu_{none} = \mu_{amputation} = \mu_{crutches} = \mu_{hearing} = \mu_{wheelchair}\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
# A tibble: 2 × 6
term df sumsq meansq statistic p.value
<chr> <int> <dbl> <dbl> <dbl> <dbl>
1 disability 4 30.5 7.63 2.86 0.0301
2 Residuals 65 173. 2.67 NA NA
[1] 0.03012686 NA
Pull the p-value using base R:
Pull the p-value using tidyverse:
Conclusion statement:
IF ALL of the following conditions hold:
THEN the sampling distribution of the F-statistic is an F-distribution
Bartlett’s test for equal variances
Note: \(H_A\) is same as saying that at least one of the group levels has a different variance
Caution
Bartlett test of homogeneity of variances
data: score by disability
Bartlett's K-squared = 0.7016, df = 4, p-value = 0.9511
Tip
Levene’s test for equality of variances is not as restrictive: see https://www.statology.org/levenes-test-r/
determining which groups are statistically different
Problem:
\[\begin{align} P(\text{making an error}) = & \alpha\\ P(\text{not making an error}) = & 1-\alpha\\ P(\text{not making an error in m tests}) = & (1-\alpha)^m\\ P(\text{making at least 1 error in m tests}) = & 1-(1-\alpha)^m \end{align}\]
A very conservative (but very popular) approach is to divide the \(\alpha\) level by how many tests \(m\) are being done:
\[\alpha_{Bonf} = \frac{\alpha}{m}\]
\[p\textrm{-value} < \alpha_{Bonf} = \frac{\alpha}{m}\] is the same as \[m \cdot (p\textrm{-value}) < \alpha\] The Bonferroni correction is popular since it’s very easy to implement.
Pairwise t-tests without any p-value adjustments:
Pairwise comparisons using t tests with pooled SD
data: employ$score and employ$disability
none amputation crutches hearing
amputation 0.4477 - - -
crutches 0.1028 0.0184 - -
hearing 0.1732 0.5418 0.0035 -
wheelchair 0.4756 0.1433 0.3520 0.0401
P value adjustment method: none
Pairwise t-tests with Bonferroni p-value adjustments:
Pairwise comparisons using t tests with pooled SD
data: employ$score and employ$disability
none amputation crutches hearing
amputation 1.000 - - -
crutches 1.000 0.184 - -
hearing 1.000 1.000 0.035 -
wheelchair 1.000 1.000 1.000 0.401
P value adjustment method: bonferroni
TukeyHSD()
creates a set of confidence intervals of the differences between means with the specified family-wise probability of coverage.# need to run the model using `aov` instead of `lm`
empl_aov <- aov(score ~ disability, data = employ)
TukeyHSD(x=empl_aov, conf.level = 0.95)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = score ~ disability, data = employ)
$disability
diff lwr upr p adj
amputation-none -0.4714286 -2.2031613 1.2603042 0.9399911
crutches-none 1.0214286 -0.7103042 2.7531613 0.4686233
hearing-none -0.8500000 -2.5817328 0.8817328 0.6442517
wheelchair-none 0.4428571 -1.2888756 2.1745899 0.9517374
crutches-amputation 1.4928571 -0.2388756 3.2245899 0.1232819
hearing-amputation -0.3785714 -2.1103042 1.3531613 0.9724743
wheelchair-amputation 0.9142857 -0.8174470 2.6460185 0.5781165
hearing-crutches -1.8714286 -3.6031613 -0.1396958 0.0277842
wheelchair-crutches -0.5785714 -2.3103042 1.1531613 0.8812293
wheelchair-hearing 1.2928571 -0.4388756 3.0245899 0.2348141
Pairwise comparisons using t tests with pooled SD
data: employ$score and employ$disability
none amputation crutches hearing
amputation 1.000 - - -
crutches 0.719 0.165 - -
hearing 0.866 1.000 0.035 -
wheelchair 1.000 0.860 1.000 0.321
P value adjustment method: holm
Pairwise comparisons using t tests with pooled SD
data: employ$score and employ$disability
none amputation crutches hearing
amputation 0.528 - - -
crutches 0.257 0.092 - -
hearing 0.289 0.542 0.035 -
wheelchair 0.528 0.287 0.503 0.134
P value adjustment method: fdr
post-hoc testing vs. testing many outcomes
Problem:
\[\begin{align} P(\text{making an error}) = & \alpha\\ P(\text{not making an error}) = & 1-\alpha\\ P(\text{not making an error in m tests}) = & (1-\alpha)^m\\ P(\text{making at least 1 error in m tests}) = & 1-(1-\alpha)^m \end{align}\]
\[\begin{align} H_0 &: \mu_1 = \mu_2 = ... = \mu_k\\ \text{vs. } H_A&: \text{At least one pair } \mu_i \neq \mu_j \text{ for } i \neq j \end{align}\]
ANOVA table in R:
ANOVA table
Post-hoc testing
F-distribution & p-value