Rows: 20
Columns: 2
$ Taps <dbl> 246, 248, 250, 252, 248, 250, 246, 248, 245, 250, 242, 245, 244,…
$ Group <chr> "Caffeine", "Caffeine", "Caffeine", "Caffeine", "Caffeine", "Caf…
BSTA 511/611
OHSU-PSU School of Public Health
2024-11-11
Add tabbed sections to your html file using tabsets.

Add ::: panel-tabset right above the subsection ### First tab (see the code file).

Close the ::: panel-tabset command with ::: at the end.

Add ::: at the end of the ### Read up on tabsets tab.

If you are reading the source code file, the next line contains :::, which closes the tabsets.
CI’s and hypothesis tests for different scenarios:
\[\text{point estimate} \pm z^*(or~t^*)\cdot SE,~~\text{test stat} = \frac{\text{point estimate}-\text{null value}}{SE}\]
| Day | Book | Population parameter | Symbol | Point estimate | Symbol | SE |
|---|---|---|---|---|---|---|
| 10 | 5.1 | Pop mean | \(\mu\) | Sample mean | \(\bar{x}\) | \(\frac{s}{\sqrt{n}}\) |
| 10 | 5.2 | Pop mean of paired diff | \(\mu_d\) or \(\delta\) | Sample mean of paired diff | \(\bar{x}_{d}\) | \(\frac{s_d}{\sqrt{n}}\) |
| 11 | 5.3 | Diff in pop means | \(\mu_1-\mu_2\) | Diff in sample means | \(\bar{x}_1 - \bar{x}_2\) | ??? |
| 12 | 8.1 | Pop proportion | \(p\) | Sample prop | \(\widehat{p}\) | |
| 12 | 8.2 | Diff in pop proportions | \(p_1-p_2\) | Diff in sample proportions | \(\widehat{p}_1-\widehat{p}_2\) | |
What are \(H_0\) and \(H_a\)?
What is the SE for \(\bar{x}_1 - \bar{x}_2\)?
Hypothesis test
Confidence Interval
Run test in R - using long vs. wide data
Satterthwaite’s df
Pooled SD
Any study where participants are randomized to a control and treatment group
Study where two groups are created based on whether or not they were exposed to some condition (can be observational)
Book: “Does treatment using embryonic stem cells (ESCs) help improve heart function following a heart attack?”
Book: “Is there evidence that newborns from mothers who smoke have a different average birth weight than newborns from mothers who do not smoke?”
The key is that the data from the two groups are independent of each other.
Set the level of significance \(\alpha\)
Specify the null ( \(H_0\) ) and alternative ( \(H_A\) ) hypotheses
Calculate the test statistic.
Calculate the p-value based on the observed test statistic and its sampling distribution
Write a conclusion to the hypothesis test
Study Design:
Hand, David J.; Daly, Fergus; McConway, K.; Lunn, D. and Ostrowski, E. (1993). A handbook of small data sets. London, U.K.: Chapman and Hall.
Load the CaffeineTaps.csv data that is in your R project folder (your working directory).

Dotplot of taps/minute stratified by group:
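A minimal sketch of loading the data and making the dotplot. The file name and the object name CaffTaps match the code later in the section, but the package choices (readr, ggplot2) and the plotting details are assumptions:

library(readr)
library(ggplot2)

# read the csv from the R project folder (working directory)
CaffTaps <- read_csv("CaffeineTaps.csv")

# dotplot of taps/minute, stratified by group
ggplot(CaffTaps, aes(x = Taps)) +
  geom_dotplot(binwidth = 1) +
  facet_wrap(~ Group, ncol = 1)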
Null and alternative hypotheses in words
Include as much context as possible
\(H_0\): The population difference in mean finger taps/min between the caffeine and control groups is …
\(H_A\): The population difference in mean finger taps/min between the caffeine and control groups is …
Null and alternative hypotheses in symbols
\[\begin{align} H_0:& \mu_{caff} - \mu_{ctrl} = \\ H_A:& \mu_{caff} - \mu_{ctrl} \\ \end{align}\]
Recall that in general the test statistic has the form:
\[\text{test stat} = \frac{\text{point estimate}-\text{null value}}{SE}\] Thus, for a two sample independent means test, we have:
\[\text{test statistic} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{SE_{\bar{x}_1 - \bar{x}_2}}\]
Let \(\bar{X}_1\) and \(\bar{X}_2\) be the means of random samples from two independent groups, with parameters shown in table:
| | Group 1 | Group 2 |
|---|---|---|
| sample size | \(n_1\) | \(n_2\) |
| pop mean | \(\mu_1\) | \(\mu_2\) |
| pop sd | \(\sigma_1\) | \(\sigma_2\) |
Some theoretical statistics:
\[E[\bar{X}_1 - \bar{X}_2] = E[\bar{X}_1] - E[\bar{X}_2] = \mu_1-\mu_2\]
\[\begin{align} Var(\bar{X}_1 - \bar{X}_2) &= Var(\bar{X}_1) + Var(\bar{X}_2) = \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2} \\ SD(\bar{X}_1 - \bar{X}_2) &= \sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \end{align}\]
\[ t_{\bar{x}_1 - \bar{x}_2} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
Assumptions:
| Group | variable | n | mean | sd |
|---|---|---|---|---|
| Caffeine | Taps | 10 | 248.3 | 2.214 |
| NoCaffeine | Taps | 10 | 244.8 | 2.394 |
\[ \text{test statistic} = t_{\bar{x}_1 - \bar{x}_2} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
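A quick sketch that plugs the summary statistics from the table above into this formula:

# test statistic from the summary statistics
xbar_caff <- 248.3; s_caff <- 2.214; n_caff <- 10
xbar_ctrl <- 244.8; s_ctrl <- 2.394; n_ctrl <- 10

se_diff <- sqrt(s_caff^2/n_caff + s_ctrl^2/n_ctrl)
(xbar_caff - xbar_ctrl - 0) / se_diff   # about 3.39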
Based on the value of the test statistic, do you think we are going to reject or fail to reject \(H_0\)?
Assumptions:
The p-value is the probability of obtaining a test statistic just as extreme or more extreme than the observed test statistic assuming the null hypothesis \(H_0\) is true.
Calculate the p-value:
\[\begin{align} H_0:& \mu_{caff} - \mu_{ctrl} = 0\\ H_A:& \mu_{caff} - \mu_{ctrl} > 0\\ \end{align}\]
Conclusion statement:
| Group | variable | n | mean | sd |
|---|---|---|---|---|
| Caffeine | Taps | 10 | 248.3 | 2.214 |
| NoCaffeine | Taps | 10 | 244.8 | 2.394 |
CI for \(\mu_{caff} - \mu_{ctrl}\):
\[\bar{x}_{caff} - \bar{x}_{ctrl} \pm t^* \cdot \sqrt{\frac{s_{caff}^2}{n_{caff}}+\frac{s_{ctrl}^2}{n_{ctrl}}}\]
Interpretation:
We are 95% confident that the (population) difference in mean finger taps/min between the caffeine and control groups is between 1.167 and 5.833 taps/min.
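A sketch of this CI calculation from the summary statistics. It reproduces the interval above when the conservative df = min(n1, n2) - 1 = 9 is used for \(t^*\); that df choice is an assumption, since the slide does not state it:

# 95% CI for the difference in means (summary statistics from the table above)
xbar_diff <- 248.3 - 244.8
se_diff   <- sqrt(2.214^2/10 + 2.394^2/10)
t_star    <- qt(0.975, df = 9)   # conservative df = min(n1, n2) - 1 (assumed)
xbar_diff + c(-1, 1) * t_star * se_diff   # roughly (1.17, 5.83)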
The CaffTaps data are in a long format, meaning that each row is one observation (one tap count), with a separate column identifying the group.
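The Welch t-test output below comes from a call of this form, using the formula interface on the long-format data (the same call appears later in the section; var.equal is left at its default of FALSE):

t.test(formula = Taps ~ Group,
       alternative = "greater",
       data = CaffTaps)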
Welch Two Sample t-test
data: Taps by Group
t = 3.3942, df = 17.89, p-value = 0.001628
alternative hypothesis: true difference in means between group Caffeine and group NoCaffeine is greater than 0
95 percent confidence interval:
1.711272 Inf
sample estimates:
mean in group Caffeine mean in group NoCaffeine
248.3 244.8
tidy() the t.test output:

| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| 3.5 | 248.3 | 244.8 | 3.394168 | 0.001627703 | 17.89012 | 1.711272 | Inf | Welch Two Sample t-test | greater |
# make CaffTaps data wide: pivot_wider needs an ID column so that it
# knows how to "match" values from the Caffeine and NoCaffeine groups
CaffTaps_wide <- CaffTaps %>%
mutate(id = rep(1:10, 2)) %>% # "fake" IDs for pivot_wider step
pivot_wider(names_from = "Group",
values_from = "Taps")
glimpse(CaffTaps_wide)
Rows: 10
Columns: 3
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ Caffeine <dbl> 246, 248, 250, 252, 248, 250, 246, 248, 245, 250
$ NoCaffeine <dbl> 242, 245, 244, 248, 247, 248, 242, 244, 246, 242
t.test(x = CaffTaps_wide$Caffeine, y = CaffTaps_wide$NoCaffeine, alternative = "greater") %>%
tidy() %>% gt()
| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| 3.5 | 248.3 | 244.8 | 3.394168 | 0.001627703 | 17.89012 | 1.711272 | Inf | Welch Two Sample t-test | greater |
From many slides ago:
The actual degrees of freedom are calculated using Satterthwaite’s method:
\[\nu = \frac{[ (s_1^2/n_1) + (s_2^2/n_2) ]^2} {(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2-1) } = \frac{ [ SE_1^2 + SE_2^2 ]^2}{ SE_1^4/df_1 + SE_2^4/df_2 }\]
Verify the p-value in the R output using \(\nu\) = 17.89012:
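A sketch of both calculations using the rounded summary statistics, so the results only approximately match the R output:

# Satterthwaite's df from the summary statistics
se1_sq <- 2.214^2/10
se2_sq <- 2.394^2/10
(se1_sq + se2_sq)^2 / (se1_sq^2/9 + se2_sq^2/9)   # about 17.9

# one-sided p-value for the observed test statistic
pt(3.394168, df = 17.89012, lower.tail = FALSE)   # about 0.00163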
\[s_{pooled}^2 = \frac{s_1^2 (n_1-1) + s_2^2 (n_2-1)}{n_1 + n_2 - 2}\]
\[SE = \sqrt{\frac{s_{pooled}^2}{n_1} + \frac{s_{pooled}^2}{n_2}}= s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]
Test statistic with pooled SD:
\[t_{\bar{x}_1 - \bar{x}_2} = \frac{\bar{x}_1 - \bar{x}_2 -0}{s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]
CI with pooled SD:
\[(\bar{x}_1 - \bar{x}_2) \pm t^{\star} \cdot s_{pooled} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]
\[df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2.\]
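For comparison, a by-hand sketch of the pooled calculation using the same summary statistics:

# pooled SD, test statistic, and one-sided p-value
s_pooled <- sqrt((2.214^2 * 9 + 2.394^2 * 9) / (10 + 10 - 2))
t_stat   <- (248.3 - 244.8) / (s_pooled * sqrt(1/10 + 1/10))
t_stat                                      # about 3.39
pt(t_stat, df = 18, lower.tail = FALSE)     # about 0.0016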
# t-test with pooled SD
t.test(formula = Taps ~ Group,
alternative = "greater",
var.equal = TRUE, # pooled SD
data = CaffTaps) %>%
tidy() %>%
gt()
| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| 3.5 | 248.3 | 244.8 | 3.394168 | 0.001616497 | 18 | 1.711867 | Inf | Two Sample t-test | greater |
# t-test without pooled SD
t.test(formula = Taps ~ Group,
alternative = "greater",
var.equal = FALSE, # default, NOT pooled SD
data = CaffTaps) %>%
tidy() %>%
gt()
| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| 3.5 | 248.3 | 244.8 | 3.394168 | 0.001627703 | 17.89012 | 1.711272 | Inf | Welch Two Sample t-test | greater |
Similar output in this case - why??
CI’s and hypothesis tests for different scenarios:
\[\text{point estimate} \pm z^*(or~t^*)\cdot SE,~~\text{test stat} = \frac{\text{point estimate}-\text{null value}}{SE}\]
| Day | Book | Population parameter | Symbol | Point estimate | Symbol | SE |
|---|---|---|---|---|---|---|
| 10 | 5.1 | Pop mean | \(\mu\) | Sample mean | \(\bar{x}\) | \(\frac{s}{\sqrt{n}}\) |
| 10 | 5.2 | Pop mean of paired diff | \(\mu_d\) or \(\delta\) | Sample mean of paired diff | \(\bar{x}_{d}\) | \(\frac{s_d}{\sqrt{n}}\) |
| 11 | 5.3 | Diff in pop means | \(\mu_1-\mu_2\) | Diff in sample means | \(\bar{x}_1 - \bar{x}_2\) | \(\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) or pooled |
| 12 | 8.1 | Pop proportion | \(p\) | Sample prop | \(\widehat{p}\) | ??? |
| 12 | 8.2 | Diff in pop proportions | \(p_1-p_2\) | Diff in sample proportions | \(\widehat{p}_1-\widehat{p}_2\) | ??? |
Critical values & rejection region
Type I & II errors
Power
How to calculate sample size needed for a study?
Type I and Type II Errors - Making Mistakes in the Justice System
Applet for visualizing Type I & II errors and power: https://rpsychologist.com/d3/NHST/
From the applet at https://rpsychologist.com/d3/NHST/
Power vs. Type II error
Power = 1 - P(Type II error) = 1 - \(\beta\)
Thus as \(\beta\) = P(Type II error) decreases, the power increases
P(Type II error) decreases as the mean of the alternative population shifts further away from the mean of the null population (effect size gets bigger).
Typically want at least 80% power; 90% power is good
The left tail probability pnorm(-1.96, mean=3, sd=1, lower.tail=TRUE)
is essentially 0 in this case.
\[n=\left(s\frac{z_{1-\alpha/2}+z_{1-\beta}}{\mu-\mu_0}\right)^2\]
We would only need a sample size of 35 for 80% power!
However, this is an under-estimate since we used the normal instead of t-distribution.
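A sketch of this calculation in R, assuming the body-temperature example values from earlier slides (null mean 98.6, observed mean about 98.25, SD about 0.73); those numbers are assumptions here, not stated on this slide:

# sample size for 80% power, two-sided alpha = 0.05 (normal approximation)
s   <- 0.73    # assumed SD from the body temperature example
mu  <- 98.25   # assumed alternative (observed) mean
mu0 <- 98.6    # null value
(s * (qnorm(0.975) + qnorm(0.80)) / (mu - mu0))^2   # about 34; round up to 35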
Conversely, we can calculate how much power we had in our body temperature one-sample test, given the sample size of 130.
\[1-\beta= \Phi\left(z-z_{1-\alpha/2}\right)+\Phi\left(-z-z_{1-\alpha/2}\right) \quad ,\quad \text{where } z=\frac{\mu-\mu_0}{s/\sqrt{n}}\]
\(\Phi\) is the probability for a standard normal distribution
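The two output values below were presumably produced by code along these lines (a sketch, again assuming the body-temperature values mu0 = 98.6, sample mean about 98.25, s about 0.73, n = 130):

# z and power for the one-sample body temperature test (assumed values)
z <- (98.25 - 98.6) / (0.73 / sqrt(130))
z                                                    # about -5.47
pnorm(z - qnorm(0.975)) + pnorm(-z - qnorm(0.975))   # about 0.9998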
[1] -5.466595
[1] 0.9997731
If the population mean is 98.2 instead of 98.6, we have a 99.98% chance of correctly rejecting \(H_0\) when the sample size is 130.
The pwr package for power analyses has the function pwr.t.test() for both one- and two-sample t-tests.

pwr.t.test(n = NULL, d = NULL, sig.level = 0.05, power = NULL,
           type = c("two.sample", "one.sample", "paired"),
           alternative = c("two.sided", "less", "greater"))

d is Cohen’s d effect size: small = 0.2, medium = 0.5, large = 0.8
One-sample test (or paired t-test):
\[d = \frac{\mu-\mu_0}{s}\]
Two-sample test (independent):
\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\]
\(\bar{x}_1 - \bar{x}_2\) is the difference in means between the two groups that one would want to be able to detect as being significant,
\(s_{pooled}\) is the pooled SD of the two groups; often we assume the SD is the same in each group
R package pwr for basic statistical tests
pwr: sample size for one mean test

pwr.t.test(n = NULL, d = NULL, sig.level = 0.05, power = NULL,
           type = c("two.sample", "one.sample", "paired"),
           alternative = c("two.sided", "less", "greater"))

d is Cohen’s d effect size: \(d = \frac{\mu-\mu_0}{s}\)

Specify all parameters except for the sample size:
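A sketch of the call, assuming the body-temperature effect size d = (98.25 - 98.6)/0.73, about 0.48 in absolute value (the sign does not matter for a two-sided test):

library(pwr)

# solve for n: leave n = NULL and supply d, sig.level, and power
pwr.t.test(d = 0.48, sig.level = 0.05, power = 0.80,
           type = "one.sample", alternative = "two.sided")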
pwr: power for one mean test

pwr.t.test(n = NULL, d = NULL, sig.level = 0.05, power = NULL,
           type = c("two.sample", "one.sample", "paired"),
           alternative = c("two.sided", "less", "greater"))

d is Cohen’s d effect size: \(d = \frac{\mu-\mu_0}{s}\)

Specify all parameters except for the power:
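A sketch of the corresponding power calculation, assuming the same effect size and the body-temperature sample size of n = 130:

library(pwr)

# solve for power: leave power = NULL and supply n, d, and sig.level
pwr.t.test(n = 130, d = 0.48, sig.level = 0.05,
           type = "one.sample", alternative = "two.sided")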
pwr: two-sample t-test: sample size

pwr.t.test(n = NULL, d = NULL, sig.level = 0.05, power = NULL,
           type = c("two.sample", "one.sample", "paired"),
           alternative = c("two.sided", "less", "greater"))

d is Cohen’s d effect size: \(d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\)

Example: Suppose the data collected for the caffeine taps study were pilot data for a larger study. Investigators want to know what sample size they would need to detect a 2 point difference between the two groups. Assume the SD in both groups is 2.3.
Specify all parameters except for the sample size:
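A sketch of the call for this example, with d = 2/2.3 (about 0.87). The 80% power and two-sided alternative are assumptions, since the slide does not state them; pwr.t.test() returns the required sample size per group:

library(pwr)

# solve for n per group: d = (difference to detect) / (assumed common SD)
pwr.t.test(d = 2/2.3, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")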
pwr: two-sample t-test: power

pwr.t.test(n = NULL, d = NULL, sig.level = 0.05, power = NULL,
           type = c("two.sample", "one.sample", "paired"),
           alternative = c("two.sided", "less", "greater"))

d is Cohen’s d effect size: \(d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\)

Example: Suppose the data collected for the caffeine taps study were pilot data for a larger study. Investigators want to know what sample size they would need to detect a 2 point difference between the two groups. Assume the SD in both groups is 2.3.
Specify all parameters except for the power:
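A sketch of the corresponding power calculation, assuming n = 10 per group as in the pilot data and a two-sided test:

library(pwr)

# solve for power: supply n per group, d, and sig.level; leave power = NULL
pwr.t.test(n = 10, d = 2/2.3, sig.level = 0.05,
           type = "two.sample", alternative = "two.sided")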
There are 4 pieces of information: sample size (n), effect size (d), significance level, and power.
Given any 3 pieces of information, we can solve for the 4th.