BSTA 511/611
OHSU-PSU School of Public Health
2024-11-06
tidy() the test output using the broom package
MoRitz loves using R projects to
Other bonuses include
We will discuss how to use projects later in today’s slides when loading a dataset.
See file Projects in RStudio for more information.
Question: based on the 1992 JAMA data, is there evidence to support that the population mean body temperature is different from 98.6°F?
Two approaches to answer this question:
This does not give us a range of plausible values for the population mean \(\mu\).
Instead, we calculate a test statistic and p-value
\[\bar{x} = 98.25,~s=0.733,~n=130\]
CI for \(\mu\):
\[\begin{align} \bar{x} &\pm t^*\cdot\frac{s}{\sqrt{n}}\\ 98.25 &\pm 1.979\cdot\frac{0.733}{\sqrt{130}}\\ 98.25 &\pm 0.127\\ (98.123&, 98.377) \end{align}\]
Used \(t^*\) = qt(.975, df=129)
Conclusion:
We are 95% confident that the (population) mean body temperature is between 98.123°F and 98.377°F.
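The interval above can be checked directly in R. This is a minimal sketch that uses only the rounded summary statistics from the slide; the variable names are illustrative, not from the original slides.

```r
# Summary statistics reported above (rounded)
xbar <- 98.25   # sample mean
s    <- 0.733   # sample standard deviation
n    <- 130     # sample size

# Critical value t* for a 95% CI with n - 1 degrees of freedom
t_star <- qt(0.975, df = n - 1)

# Standard error and margin of error
se     <- s / sqrt(n)
margin <- t_star * se

# Lower and upper limits of the confidence interval
ci <- c(lower = xbar - margin, upper = xbar + margin)
round(ci, 3)
```

Up to rounding, this should reproduce the limits 98.123 and 98.377 shown above.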
From before:
This does not give us a range of plausible values for the population mean \(\mu\).
Instead, we calculate a test statistic and p-value
How do we calculate a test statistic and p-value?
From the Central Limit Theorem (CLT), we know that
\[\bar{X}\sim N\Big(\mu_{\bar{X}} = \mu, \sigma_{\bar{X}}= \frac{\sigma}{\sqrt{n}}\Big)\]
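A small simulation can make the CLT statement concrete. This is only a sketch: the population values \(\mu = 98.6\) and \(\sigma = 0.733\), the normal population, and the number of replications are assumptions chosen for illustration, not values from the study.

```r
set.seed(511)  # arbitrary seed so the sketch is reproducible

# Assumed (hypothetical) population values, for illustration only
mu    <- 98.6
sigma <- 0.733
n     <- 130

# Draw many samples of size n and record each sample mean
xbars <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

# Compare the simulated sampling distribution with the CLT values
c(mean_of_xbars = mean(xbars),
  sd_of_xbars   = sd(xbars),
  theory_sd     = sigma / sqrt(n))
```

The mean of the simulated sample means should be close to \(\mu\), and their standard deviation close to \(\sigma/\sqrt{n}\).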
Set the level of significance \(\alpha\)
Specify the null ( \(H_0\) ) and alternative ( \(H_A\) ) hypotheses
Calculate the test statistic.
Calculate the p-value based on the observed test statistic and its sampling distribution
Write a conclusion to the hypothesis test
In statistics, a hypothesis is a statement about the value of an unknown population parameter.
A hypothesis test consists of a test between two competing hypotheses:
Example of hypotheses in words:
\[\begin{aligned} H_0 &: \text{The population mean body temperature is 98.6°F}\\ \text{vs. } H_A &: \text{The population mean body temperature is not 98.6°F} \end{aligned}\]
Notation for hypotheses:
\[\begin{aligned} H_0 &: \mu = \mu_0\\ \text{vs. } H_A&: \mu \neq, <, \textrm{or}, > \mu_0 \end{aligned}\]
We call \(\mu_0\) the null value
\(H_A: \mu \neq \mu_0\)
\(H_A: \mu < \mu_0\)
\(H_A: \mu > \mu_0\)
Example:
\[\begin{aligned} H_0 &: \mu = 98.6\\ \text{vs. } H_A&: \mu \neq 98.6 \end{aligned}\]
Case 1: know population sd \(\sigma\)
\[ \text{test statistic} = z_{\bar{x}} = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}} \]
Case 2: don’t know population sd \(\sigma\)
\[ \text{test statistic} = t_{\bar{x}} = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} \]
\(\bar{x}\) = sample mean, \(\mu_0\) = hypothesized population mean from \(H_0\),
\(\sigma\) = population standard deviation, \(s\) = sample standard deviation,
\(n\) = sample size
Assumptions: same as CLT
Recall that \(\bar{x} = 98.25\), \(s=0.733\), and \(n=130.\)
The test statistic is:
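\[t_{\bar{x}} = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} = \frac{98.25 - 98.6}{\frac{0.733}{\sqrt{130}}} \approx -5.45\]
The value \(-5.45\) comes from the unrounded sample mean and standard deviation; plugging in the rounded values shown gives about \(-5.44\).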
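The same calculation can be sketched in R from the rounded summary statistics. The variable names are illustrative.

```r
# Rounded summary statistics and the null value from H_0
xbar <- 98.25   # sample mean
s    <- 0.733   # sample standard deviation
n    <- 130     # sample size
mu0  <- 98.6    # hypothesized population mean

# Test statistic: (sample mean - null value) / standard error
se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se
t_stat
```

Because the inputs here are rounded, the result differs slightly in the second decimal place from the statistic computed on the raw data.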
Assumptions met?
The p-value is the probability of obtaining a test statistic just as extreme or more extreme than the observed test statistic assuming the null hypothesis \(H_0\) is true.
Calculate the p-value using the Student’s t-distribution with \(d.f. = n-1 = 129\):
\[\text{p-value}=P(T \leq -5.45) + P(T \geq 5.45) = 2.410889 \times 10^{-7}\]
[1] 2.410889e-07
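The R call that produced the output above is not shown here. One way to compute a two-sided p-value of this kind is with pt(), the CDF of the t-distribution. This is a sketch, assuming the test statistic is taken to four decimals (-5.4548), so the last digits may not match the output above exactly.

```r
# Observed test statistic (assumed to four decimals) and degrees of freedom
t_stat <- -5.4548
df     <- 129

# Lower tail P(T <= -|t|) and upper tail P(T >= |t|)
p_lower <- pt(-abs(t_stat), df = df)
p_upper <- pt(abs(t_stat), df = df, lower.tail = FALSE)

# Two-sided p-value: sum of both tails
p_value <- p_lower + p_upper
p_value
```

By symmetry of the t-distribution the two tails are equal, so the p-value can also be written as 2 * pt(-abs(t_stat), df = df).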
Important
Conclusion statement:
tidy() the test output using the broom package
getwd() function.
[1] "/Users/niederha/Library/CloudStorage/OneDrive-OregonHealth&ScienceUniversity/teaching/BSTA 511/F24/0_webpage/BSTA_511_F24"
Note
BodyTemperatures.csv
here() function from the here package: here::here().
library(tidyverse) # provides read_csv() and glimpse(); load it if not already attached
library(here) # first install this package
# read_csv() is a function from the readr package that is a part of the tidyverse
BodyTemps <- read_csv(here::here("data", "BodyTemperatures.csv"))
# location: look in "data" folder
# for the file "BodyTemperatures.csv"
glimpse(BodyTemps)
Rows: 130
Columns: 3
$ Temperature <dbl> 96.3, 96.7, 96.9, 97.0, 97.1, 97.1, 97.1, 97.2, 97.3, 97.4…
$ Gender <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ HeartRate <dbl> 70, 71, 74, 80, 73, 75, 82, 64, 69, 70, 68, 72, 78, 70, 75…
here::here()
General use of here::here()
here::here("folder_name", "filename")
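As a quick illustration of the general form, here::here() only builds a file path from the project root; it does not read anything. The sketch below assumes the here package is installed and reuses the folder and file names from the example above.

```r
library(here)

# Build the path <project root>/data/BodyTemperatures.csv
data_path <- here::here("data", "BodyTemperatures.csv")
data_path

# The path is just a character string; check whether the file is actually there
file.exists(data_path)
```

Printing the path before reading the file is a simple way to confirm that R is looking where you expect.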
Resources for here::here():
here package (Jenny Richmond)
Project-oriented workflow (Jenny Bryan)
t.test: base R’s function for testing one mean
We named the dataset BodyTemps when we loaded it
Rows: 130
Columns: 3
$ Temperature <dbl> 96.3, 96.7, 96.9, 97.0, 97.1, 97.1, 97.1, 97.2, 97.3, 97.4…
$ Gender <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ HeartRate <dbl> 70, 71, 74, 80, 73, 75, 82, 64, 69, 70, 68, 72, 78, 70, 75…
(temps_ttest <- t.test(x = BodyTemps$Temperature,
                       # alternative = "two.sided", # default
                       mu = 98.6))
One Sample t-test
data: BodyTemps$Temperature
t = -5.4548, df = 129, p-value = 2.411e-07
alternative hypothesis: true mean is not equal to 98.6
95 percent confidence interval:
98.12200 98.37646
sample estimates:
mean of x
98.24923
Note that the test output also gives the 95% CI using the t-distribution.
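The object returned by t.test() is a list, so the statistic, p-value, and CI can be extracted by name. The sketch below uses simulated temperatures as a stand-in for the real dataset (the seed, mean, and sd are assumptions for illustration), so its numbers will not match the output above.

```r
set.seed(511)  # arbitrary seed for reproducibility

# Simulated stand-in data (NOT the real BodyTemps data)
temps_sim <- rnorm(130, mean = 98.25, sd = 0.733)

# One-sample two-sided t-test of H_0: mu = 98.6
sim_ttest <- t.test(x = temps_sim, mu = 98.6)

# Components of the test output
sim_ttest$statistic   # t statistic
sim_ttest$p.value     # two-sided p-value
sim_ttest$conf.int    # 95% CI for mu

# The statistic agrees with the formula (xbar - mu0) / (s / sqrt(n))
manual_t <- (mean(temps_sim) - 98.6) / (sd(temps_sim) / sqrt(length(temps_sim)))
all.equal(unname(sim_ttest$statistic), manual_t)

# For a two-sided test at alpha = 0.05, p < 0.05 exactly when
# the null value lies outside the 95% CI
null_outside_ci <- 98.6 < sim_ttest$conf.int[1] || 98.6 > sim_ttest$conf.int[2]
(sim_ttest$p.value < 0.05) == null_outside_ci
```

The last comparison shows why the two approaches (CI and hypothesis test) lead to the same decision for a two-sided test at the matching level.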
tidy() the t.test output
Use the tidy() function from the broom package for briefer output in table format that’s stored as a tibble
Combined with the gt() function from the gt package, we get a nice table
estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|
98.24923 | -5.454823 | 2.410632e-07 | 129 | 98.122 | 98.37646 | One Sample t-test | two.sided |
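A sketch of how a table like this can be produced and queried, assuming the broom and dplyr packages are installed. It uses simulated stand-in data (an assumption for illustration, not the real BodyTemps data), so the values differ from the table above.

```r
library(broom)  # tidy()
library(dplyr)  # pull() and the pipe %>%

set.seed(511)  # arbitrary seed for reproducibility

# Simulated stand-in data (NOT the real BodyTemps data)
temps_sim <- rnorm(130, mean = 98.25, sd = 0.733)

# Run the test and convert the output to a one-row tibble
sim_tidy <- t.test(x = temps_sim, mu = 98.6) %>% tidy()
sim_tidy

# Pull single values out of the tibble by column name
p_val   <- sim_tidy %>% pull(p.value)
ci_low  <- sim_tidy %>% pull(conf.low)
ci_high <- sim_tidy %>% pull(conf.high)
c(p_val = p_val, ci_low = ci_low, ci_high = ci_high)
```

Pulling values this way is convenient for inline reporting, since each result is a plain numeric value rather than a printed test summary.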
Since the tidy() output is a tibble, we can easily pull() specific values from it:
CIs and hypothesis testing for different scenarios:
Day | Section | Population parameter | Parameter symbol | Point estimate | Estimate symbol |
---|---|---|---|---|---|
10 | 5.1 | Pop mean | \(\mu\) | Sample mean | \(\bar{x}\) |
10 | 5.2 | Pop mean of paired diff | \(\mu_d\) or \(\delta\) | Sample mean of paired diff | \(\bar{x}_{d}\) |
11 | 5.3 | Diff in pop means | \(\mu_1-\mu_2\) | Diff in sample means | \(\bar{x}_1 - \bar{x}_2\) |
12 | 8.1 | Pop proportion | \(p\) | Sample prop | \(\widehat{p}\) |
12 | 8.2 | Diff in pop prop’s | \(p_1-p_2\) | Diff in sample prop’s | \(\widehat{p}_1-\widehat{p}_2\) |