HW 4: BSTA 511/611 F24

Author: Your name here - update this!!!!

Published: November 9, 2024

Due 11/9/24

Download the .qmd file for this assignment from https://github.com/niederhausen/BSTA_511_F24/blob/main/homework/HW_4_F24_bsta511.qmd

Graded exercises

The exercises listed below will be graded for this assignment. You are strongly encouraged to complete the entire assignment. You will receive feedback on exercises you turn in that are not being graded.

Book exercises
- 5.6, 5.12
Non-book exercise
- NB1: The Ethan Allen
R exercises
- R1: Youth weights - Part 1
- R2: Youth weights - Part 2
- R3: Swim times

Directions

Important

Complete all exercises in this assignment using Quarto.
I highly recommend using LaTeX to format equations.
- See the .qmd files from class notes for LaTeX code to make it easier to show your work in computations.
- For instructions on creating equations in the Visual editor, check out https://quarto.org/docs/get-started/authoring/rstudio.html#equations
- Also check out examples of LaTeX formatting for statistics created by recent biostats alum Ariel Weingarten.
- If you have difficulty rendering the LaTeX equations, I recommend installing and running the R package tinytex. See this website for instructions.
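Here is a sketch of what this can look like. The display-math block below uses `aligned` to put one step per line, which makes multi-step work easy to follow. The formulas for a sample mean $\bar{x}$ and sample standard deviation $s$ are generic placeholders and do not answer any exercise.

$$
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
\end{aligned}
$$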

Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file.
- Use the assignment .qmd file linked to above as a template for your own assignment.
Please always use the following naming convention for submitting your files:
- Lastname_Firstname_HWx.qmd, such as Niederhausen_Meike_HW2.qmd
- Lastname_Firstname_HWx.html, such as Niederhausen_Meike_HW2.html
For each question, make sure to show all of your work.
- This includes all code and resulting output in the html file to support your answers for exercises requiring work done in R (including any arithmetic calculations).
- For non-calculation questions, this includes an explanation of your answer (why did you choose your answer?).
For each question, include a sentence summarizing the answer for that question in the context of the research question.

Tip

It is a good idea to render your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors more quickly.

Book exercises

4.2 Heights of adults

4.4 Mental health, Part I

4.6 Thanksgiving spending, Part I

4.8 Age at first marriage, Part I

5.6 Working backwards, Part II

5.10 t⋆ vs. z⋆

5.12 Auto exhaust and lead exposure

5.16 Paired or not, Part II

5.22 DDT exposure

Non-book exercises

NB1: The Ethan Allen

On October 2, 2005, a tour boat named the Ethan Allen capsized on Lake George in New York with 47 passengers aboard. In the inquiries that followed, it was suggested that the tour operators should have realized that the combined weight of so many passengers was likely to exceed the weight capacity of the boat. Could they have predicted this?

The maximum weight capacity of passengers that the Ethan Allen could accommodate was estimated to be 7500 pounds.
Data from the Centers for Disease Control and Prevention indicate that weights of American adults in 2005 had a mean of 167 pounds and a standard deviation of 35 pounds.

If the tour boat company consistently accepted 47 passengers, we want to know the probability that the combined weight of the 47 passengers would exceed this capacity.
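Dividing both sides of the capacity condition by 47 turns a statement about the total weight into an equivalent statement about the average weight. That is why the parts below focus on averages. The notation $X_i$, the weight of passenger $i$, is introduced only for this sketch.

$$
\sum_{i=1}^{47} X_i > 7500
\quad\Longleftrightarrow\quad
\bar{X} = \frac{1}{47}\sum_{i=1}^{47} X_i > \frac{7500}{47}
$$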

(a) Maximum average weight

With 47 passengers on board, what is the maximum average weight that the Ethan Allen could accommodate?

(b) Probability for an individual

Assuming that the weights of American adults in 2005 can be modeled with a normal distribution, find the probability that an individual weighs more than the maximum average weight the Ethan Allen can accommodate.
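Under the normal model, this probability is found by standardizing. In the sketch below, $c$ stands for the maximum average weight from part (a); this placeholder symbol is introduced only for illustration. The model uses $\mu = 167$ and $\sigma = 35$ pounds.

$$
P(X > c) = P\left(Z > \frac{c - \mu}{\sigma}\right) = P\left(Z > \frac{c - 167}{35}\right)
$$

In R, the same upper-tail probability is `pnorm(q, mean = 167, sd = 35, lower.tail = FALSE)`, with `q` set to the value from part (a).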

(c) Probability for a random sample

Calculate the probability that a random sample of 47 American adults has an average weight greater than the maximum average weight the Ethan Allen can accommodate.
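For the sample mean, the standard deviation $\sigma$ is replaced by the standard error $\sigma/\sqrt{n}$, with $n = 47$. In the sketch below, $c$ is a placeholder for the maximum average weight from part (a). The sketch assumes $\bar{X}$ is approximately normal with mean $\mu = 167$; justifying that assumption is what part (d) asks about.

$$
P(\bar{X} > c) = P\left(Z > \frac{c - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z > \frac{c - 167}{35/\sqrt{47}}\right)
$$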

(d) Theorem used?

What theorem did you use in the previous part, and why were you able to apply it to this problem?

(e) Could this have been predicted?

Could the tour operators have predicted that the combined weight of so many passengers was likely to exceed the weight capacity of the Ethan Allen?

R exercises

Load packages

Load all the packages you need in the first code chunk of the file, which starts with `#| label: "setup"`.
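A setup chunk might look like the sketch below. The packages moderndive and oibiostats are named in this assignment. Including tidyverse is an assumption about what else you may want, so adjust the list to match what your code actually uses.

```{r}
#| label: "setup"
# Packages used across the assignment
library(tidyverse)   # assumed: data wrangling and plotting
library(moderndive)  # rep_sample_n() for R1
library(oibiostats)  # swim data for R3
```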

R1: Youth weights - Part 1

In this exercise, you will use the YRBSS dataset from class on Day 8 to simulate the distribution of mean weights from repeated samples. Use the code from class where we simulated mean heights, and apply it to the weights (in pounds) as directed below.

Important

You will need to install and load the moderndive R package to use the rep_sample_n() command from the class notes.
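The resampling idea can be sketched in base R as shown below. This is not the class code. It swaps in `replicate()` and `sample()` in place of `rep_sample_n()`. It also draws from a made-up normal "population" with mean 150 and SD 35 (both values hypothetical) instead of the YRBSS weights, so its numbers say nothing about the exercise.

```r
# Hypothetical "population" of 5000 weights (NOT the YRBSS data)
set.seed(511)  # any number works; it makes the random draws reproducible
population <- rnorm(5000, mean = 150, sd = 35)

# 1000 random samples of size 10 (without replacement), keeping each mean
sample_means <- replicate(1000, mean(sample(population, size = 10)))

# Summaries of the simulated sampling distribution of the mean
mean(sample_means)  # should be close to mean(population)
sd(sample_means)    # should be close to sd(population) / sqrt(10)
```

Calling `hist(sample_means)` would then draw a histogram of the simulated means.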

(a) `set.seed()`

Use the set.seed() command to set a randomization seed. Use whatever number you want for the seed.

(b) 1000 random samples of size 10

Take 1000 random samples of size 10 and save the tibble with the random samples. Show the first 20 rows of this tibble.

(c) Mean weights from the 1000 random samples

Create a tibble with mean weights from the 1000 random samples. Show the first 10 rows of this tibble.

(d) Histogram of the 1000 mean weights

Make a histogram of the 1000 mean weights. What do we call this distribution? Describe the shape of the distribution.

(e) Mean and standard deviation of the 1000 sample mean weights

Calculate the mean and standard deviation of the 1000 sample mean weights. What is another name for this standard deviation?

(f) Theoretical values for mean and standard deviation

According to the CLT, what are the theoretical values of the mean and standard deviation of the sampling distribution, and how do your simulated values compare to them?
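For reference, the CLT expresses the mean and standard deviation of the sampling distribution of $\bar{X}$ in terms of the population mean $\mu$ and standard deviation $\sigma$. Here $n = 10$, and $\mu$ and $\sigma$ would be the mean and SD of weights in the YRBSS "population."

$$
\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\sigma}{\sqrt{10}}
$$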

R2: Youth weights - Part 2

In this exercise, you will again use the YRBSS dataset from class on Days 8-9, this time to calculate and interpret confidence intervals for the mean weight.

(a) CI

Suppose you took a random sample of size 50 from the YRBSS data, and the sample has a mean weight of 130 pounds. Calculate and interpret a 90% confidence interval for the mean weight using the standard deviation of weights from the YRBSS "population."
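When the population standard deviation $\sigma$ is treated as known, a 90% confidence interval for $\mu$ uses the standard normal critical value $z^{*}$. This is the 95th percentile of the standard normal distribution, `qnorm(0.95)` in R, which is about 1.645. Here $n = 50$ and $\bar{x} = 130$:

$$
\bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}} = 130 \pm 1.645\,\frac{\sigma}{\sqrt{50}}
$$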

(b) Another CI

Calculate and interpret a 90% confidence interval for the same sample, this time assuming the standard deviation of weights from the random sample is 40 pounds.
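When the standard deviation is estimated from the sample, the critical value comes from a $t$ distribution with $n - 1 = 49$ degrees of freedom rather than from the standard normal distribution. In R, this critical value is `qt(0.95, df = 49)`.

$$
\bar{x} \pm t^{*}_{49}\,\frac{s}{\sqrt{n}} = 130 \pm t^{*}_{49}\,\frac{40}{\sqrt{50}}
$$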

R3: Swim times

In this exercise, you will use R to work through the swim times example from Section 5.2 in the textbook.
The data are in the oibiostats package, in the dataset called `swim`.

(a) Mean & SD of differences

Calculate the mean and standard deviation of the differences in swim times, and compare them to the values in the book. In which order were the differences calculated: wet suit minus swim suit, or the opposite? Were all the differences positive?

(b) Dot plot of differences

Create a dot plot of the differences in swim times and comment on the distribution shape.

(c) Hypothesis test

Run the appropriate statistical test in R as both a one-sample t-test and a paired t-test to verify the test statistic, p-value, and CI reported in the text. When writing up your comparison with the book's values, use inline R code to pull these values from the test output.
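The equivalence between the two tests can be checked on any paired data. The sketch below uses eight made-up pairs, which are hypothetical and not the `swim` data. It shows that a one-sample t-test on the differences and a paired t-test on the two columns give the same statistic, degrees of freedom, p-value, and CI.

```r
# Made-up paired measurements (hypothetical; NOT the swim data)
before <- c(1.52, 1.60, 1.48, 1.71, 1.55, 1.63, 1.58, 1.66)
after  <- c(1.49, 1.55, 1.47, 1.66, 1.50, 1.61, 1.52, 1.63)

# One-sample t-test on the within-pair differences (H0: mean difference = 0)
one_sample <- t.test(before - after, mu = 0)

# Paired t-test on the two vectors; R computes before - after internally
paired <- t.test(before, after, paired = TRUE)

# Pieces of the htest object you might cite in a write-up
one_sample$statistic   # t statistic
one_sample$parameter   # degrees of freedom (n - 1)
one_sample$p.value     # two-sided p-value
one_sample$conf.int    # 95% CI for the mean difference
```

In the Quarto write-up, inline code such as `` `r round(one_sample$statistic, 2)` `` inserts a value from the saved test object directly into a sentence.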
