HW 4: BSTA 511/611 F24
Due 11/9/24
Download the .qmd file for this assignment from https://github.com/niederhausen/BSTA_511_F24/blob/main/homework/HW_4_F24_bsta511.qmd
Graded exercises
The exercises listed below will be graded for this assignment. You are strongly encouraged to complete the entire assignment. You will receive feedback on exercises you turn in that are not being graded.
- Book exercises
- 5.6, 5.12
- Non-book exercise
- NB1: The Ethan Allen
- R exercises
- R1: Youth weights - Part 1
- R2: Youth weights - Part 2
- R3: Swim times
Directions
- Complete all exercises in this assignment using Quarto.
- I highly recommend using LaTeX to format equations.
- See the .qmd files from class notes for LaTeX code to make it easier to show your work in computations.
- For instructions on creating equations in the Visual editor, check out https://quarto.org/docs/get-started/authoring/rstudio.html#equations.
- Also check out examples of LaTeX formatting for statistics created by recent biostats alum Ariel Weingarten.
- If you have difficulty rendering the LaTeX equations, I recommend installing and running the R package tinytex. See this website for instructions.
- Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file.
- Use the assignment .qmd file linked to above as a template for your own assignment.
- Please always use the following naming convention for submitting your files:
- Lastname_Firstname_HWx.qmd, such as Niederhausen_Meike_HW2.qmd
- Lastname_Firstname_HWx.html, such as Niederhausen_Meike_HW2.html
- For each question, make sure to show all of your work.
- This includes all code and resulting output in the html file to support your answers for exercises requiring work done in R (including any arithmetic calculations).
- For non-calculation questions, this includes an explanation of your answer (why did you choose your answer?).
- For each question, include a sentence summarizing the answer for that question in the context of the research question.
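As a minimal sketch of the LaTeX formatting recommended above, the display equation below shows the work for a z-score computation. The numbers (an observation of 72, a mean of 65, and a standard deviation of 3.5) are made up for illustration and do not come from any exercise in this assignment.

```latex
$$
z = \frac{x - \mu}{\sigma} = \frac{72 - 65}{3.5} = 2
$$
```

In a .qmd file, text between the double dollar signs renders as a centered equation; single dollar signs give inline math.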
It is a good idea to try rendering your document from time to time as you go along! Note that rendering automatically saves your .qmd file, and rendering frequently helps you catch errors more quickly.
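If rendering fails on the LaTeX equations, the tinytex setup mentioned in the directions might look like the sketch below. It assumes you want the TinyTeX distribution installed through the tinytex R package; if the linked instructions differ, follow those instead. Run it once in the Console rather than inside your homework file.

```r
# One-time setup (an assumption about your system; skip if LaTeX already works)
install.packages("tinytex")   # install the tinytex R package from CRAN
tinytex::install_tinytex()    # download and install the TinyTeX distribution
```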
Book exercises
4.2 Heights of adults
4.4 Mental health, Part I
4.6 Thanksgiving spending, Part I
4.8 Age at first marriage, Part I
5.6 Working backwards, Part II
5.10 t⋆ vs. z⋆
5.12 Auto exhaust and lead exposure
5.16 Paired or not, Part II
5.22 DDT exposure
Non-book exercises
NB1: The Ethan Allen
On October 5, 2005, a tour boat named the Ethan Allen capsized on Lake George in New York with 47 passengers aboard. In the inquiries that followed, it was suggested that the tour operators should have realized that the combined weight of so many passengers was likely to exceed the weight capacity of the boat. Could they have predicted this?
- The maximum weight capacity of passengers that the Ethan Allen could accommodate was estimated to be 7500 pounds.
- Data from the Centers for Disease Control and Prevention indicate that weights of American adults in 2005 had a mean of 167 pounds and a standard deviation of 35 pounds.
If the tour boat company consistently accepted 47 passengers, what we want to know is the probability that the combined weight of the 47 passengers would exceed this capacity.
(a) Maximum average weight
With 47 passengers on board, what is the maximum average weight that the Ethan Allen could accommodate?
(b) Probability of an individual
Assuming that the weights of American adults in 2005 can be modeled with a normal distribution, find the probability that an individual weighs more than the maximum average weight the Ethan Allen can accommodate.
(c) Probability a random sample
Calculate the probability that a random sample of 47 American adults has an average weight greater than the maximum average weight the Ethan Allen can accommodate.
(d) Theorem used?
What theorem did you use in the previous part, and why were you able to apply it to this problem?
(e) Could this have been predicted?
Could the tour operators have predicted that the combined weight of so many passengers was likely to exceed the weight capacity of the Ethan Allen?
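The sketch below illustrates the kind of R calculation that parts (b) and (c) call for: a normal tail probability for one individual and for a sample mean. All values are hypothetical and deliberately differ from the Ethan Allen numbers, and the object names are chosen only for illustration.

```r
# Hypothetical values, NOT the Ethan Allen numbers
mu     <- 100   # population mean
sigma  <- 15    # population standard deviation
n      <- 25    # sample size
cutoff <- 105   # threshold of interest

# P(X > cutoff) for a single individual from a normal population
p_individual <- pnorm(cutoff, mean = mu, sd = sigma, lower.tail = FALSE)

# P(Xbar > cutoff): the sample mean has standard error sigma / sqrt(n)
se <- sigma / sqrt(n)
p_sample_mean <- pnorm(cutoff, mean = mu, sd = se, lower.tail = FALSE)

p_individual
p_sample_mean
```

Setting lower.tail = FALSE returns the upper-tail probability directly, which avoids computing 1 minus the pnorm() value by hand.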
R exercises
Load packages
Load all the packages you need in the first code chunk of the file, the one that starts with #| label: "setup".
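A setup chunk for this assignment might look like the sketch below. The package list is an assumption based on the packages named in the R exercises; keep only the ones your own answers use. In the .qmd file, this code goes inside an R chunk (one fenced with {r}).

```r
#| label: "setup"
#| message: false

# Assumed package list; adjust to match what you actually use
library(tidyverse)   # tibbles, dplyr verbs, and ggplot2
library(moderndive)  # rep_sample_n() for R1
library(oibiostats)  # swim data for R3
```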
R1: Youth weights - Part 1
In this exercise you will use the YRBSS dataset we used in class on Day 8 to simulate the distribution of mean weights from repeated samples. Use the code from class where we simulated mean heights, and apply it to the weights (in pounds) as directed below.
You will need to install and load the moderndive R package to use the rep_sample_n() command from the class notes.
(a) set.seed()
Use the set.seed() command to set a randomization seed. Use whatever number you want for the seed.
(b) 1000 random samples of size 10
Take 1000 random samples of size 10 and save the tibble with the random samples. Show the first 20 lines of this tibble.
(c) Mean weights from the 1000 random samples
Create a tibble with mean weights from the 1000 random samples. Show the first 10 rows of this tibble.
(d) Histogram of the 1000 mean weights
Make a histogram of the 1000 mean weights. What do we call this distribution? Describe the shape of the distribution.
(e) Mean and standard deviation of the 1000 sample mean weights
Calculate the mean and standard deviation of the 1000 sample mean weights. What is another name for this standard deviation?
(f) Theoretical values for mean and standard deviation
What are the theoretical values for mean and standard deviation of the sampling distribution from the CLT, and how do your simulated values compare to the theoretical values?
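The general workflow for R1 can be sketched on a toy population, as below. The tibble toy_pop and its column value are hypothetical stand-ins for the class data, and the seed is arbitrary; the sketch assumes the dplyr and moderndive packages are installed.

```r
library(dplyr)
library(moderndive)

set.seed(511)  # arbitrary seed; any number works

# Toy "population" standing in for the class data (hypothetical values)
toy_pop <- tibble(value = rnorm(5000, mean = 50, sd = 10))

# 1000 samples of size 10; the result gains a `replicate` column
samples <- toy_pop %>%
  rep_sample_n(size = 10, reps = 1000, replace = FALSE)

# One mean per sample
sample_means <- samples %>%
  group_by(replicate) %>%
  summarize(mean_value = mean(value))

# Simulated center and spread of the sample means
sim_summary <- sample_means %>%
  summarize(mean_of_means = mean(mean_value),
            sd_of_means   = sd(mean_value))

# Theoretical standard error: population SD divided by sqrt(n)
theory_se <- sd(toy_pop$value) / sqrt(10)
```

Comparing sim_summary$sd_of_means with theory_se shows how close the simulation comes to the theoretical value for this toy population.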
R2: Youth weights - Part 2
In this exercise you will again use the YRBSS dataset that we used in class on Days 8-9, to simulate the distribution of mean weights from repeated samples.
(a) CI
Suppose you took a random sample of size 50 from the YRBSS data, and that the sample has a mean weight of 130 pounds. Calculate and interpret a 90% confidence interval using the standard deviation of weights from the YRBSS “population.”
(b) Another CI
Calculate and interpret a 90% confidence interval assuming the standard deviation of weights from the random sample is 40.
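The two parts of R2 differ in whether the standard deviation comes from the population or from the sample. The sketch below contrasts the two calculations using hypothetical inputs (a different mean, sample size, and confidence level than in this exercise); all object names are illustrative.

```r
# Hypothetical inputs, not the values from this exercise
xbar  <- 20     # sample mean
n     <- 25     # sample size
level <- 0.95   # confidence level
alpha <- 1 - level

# Population SD known: use a z critical value
sigma  <- 5
z_star <- qnorm(1 - alpha / 2)
ci_z   <- xbar + c(-1, 1) * z_star * sigma / sqrt(n)

# SD estimated from the sample: use a t critical value with n - 1 df
s      <- 5
t_star <- qt(1 - alpha / 2, df = n - 1)
ci_t   <- xbar + c(-1, 1) * t_star * s / sqrt(n)

ci_z
ci_t
```

With the same standard deviation, the t-based interval is slightly wider because the t critical value exceeds the z critical value at any finite degrees of freedom.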
R3: Swim times
- In these exercises you will use R to work through the swim times example from Section 5.2 in the textbook.
- The data are in the oibiostats package and are called swim.
(a) Mean & SD of differences
Calculate the mean and standard deviation for the differences in swim times, and compare them to the ones in the book. In which order were the differences calculated: wet suit - swim suit, or the opposite? Were all the differences positive?
(b) Dot plot of differences
Create a dot plot of the differences in swim times and comment on the distribution shape.
(c) Hypothesis test
Run the appropriate statistical test in R as both a one-sample t-test and a paired t-test to verify the test statistic, p-value, and CI in the text. Use inline R code to pull these values from the test output when writing up your comparison of these values to the book’s values.
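The equivalence of the two tests in part (c), and the extraction of values for inline code, can be sketched as below. The measurements are hypothetical and are not the swim data; the vector names cond_a and cond_b are invented for illustration.

```r
# Hypothetical paired measurements (not the swim data)
cond_a <- c(10.2, 11.5, 9.8, 12.1, 10.9, 11.3)
cond_b <- c(10.0, 11.1, 9.9, 11.6, 10.5, 11.0)
d <- cond_a - cond_b   # differences, in the order a - b

# Paired t-test on the two vectors
paired_res <- t.test(cond_a, cond_b, paired = TRUE)

# One-sample t-test on the differences against mu = 0
one_samp_res <- t.test(d, mu = 0)

# Pieces you can reference inline in Quarto text, e.g. `r round(t_stat, 2)`
t_stat <- unname(paired_res$statistic)
p_val  <- paired_res$p.value
ci     <- paired_res$conf.int
```

Both calls return the same test statistic, p-value, and confidence interval, because a paired t-test is a one-sample t-test on the differences.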