HW 3 v2: BSTA 511/611 F25
Due 11/1/25
Download the .qmd file for this assignment from https://github.com/niederhausen/BSTA_511_F25/blob/main/homework/HW_3_F25_bsta511_v2.qmd
Graded exercises
The exercises listed below will be graded for this assignment. You are strongly encouraged to complete the entire assignment. You will receive feedback on exercises you turn in that are not being graded.
- Book exercises
- 3.4, 3.8, 3.22, 3.32, 3.40, 5.6
- Non-book exercises
- NB 1 - I recommend completing this problem after completing book questions 3.4-3.6.
- NB 2: The Ethan Allen
- R exercises
- R1: Youth weights - Part 1
- R2: Youth weights - Part 2
Directions
*Starred exercises may be completed by hand (such as on paper or using a tablet) instead of using Quarto.
- Some of these exercises require some R code. If you do these exercises by hand, make sure to include the R code written out by hand as well as the output from R (as done in the class notes), so that the TA only needs to grade one file for those exercises.
- Make sure to check out the calculating probabilities in R code file: qmd, html
- For questions you are submitting in Quarto:
- Use LaTeX to format the equations.
- For instructions on creating equations in the Visual editor, check out https://quarto.org/docs/get-started/authoring/rstudio.html#equations.
- You can see examples of LaTeX formatting in the calculating probabilities in R code file: qmd, html
- Also check out examples of LaTeX formatting for statistics created by recent biostats alum Ariel Weingarten.
- If you have difficulty rendering the LaTeX equations, I recommend installing and running the R package tinytex. See this website for instructions.
- Day 6 special instructions (questions 3.4-3.6 and NB 1)
- Make sure to show your work algebraically. You can do the arithmetic in R to get the final answer, but the more important part is showing how you derived the expected value, variance, and standard deviation.
- First, define every random variable you use.
- Second, write out the mathematical model for the situation using the random variables you defined.
- Think of the model as how you would calculate the outcome if you were given the individual data points that the random variables are representing.
- This step does not involve any means or expected values!!!
- Then proceed to derive the expected value and variance of the model.
- See examples 3.14 and 3.16 in the notes for examples following these steps.
- Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file (or just your pdf if completing the assignment by hand).
- Use the assignment .qmd file linked to above as a template for your own assignment.
- Please always use the following naming convention for submitting your files:
- Lastname_Firstname_HWx.qmd, such as Niederhausen_Meike_HW2.qmd
- Lastname_Firstname_HWx.html, such as Niederhausen_Meike_HW2.html
- For each question, make sure to show all of your work.
- This includes all code and resulting output in the html file to support your answers for exercises requiring work done in R (including any arithmetic calculations).
- For non-calculation questions, this includes an explanation of your answer (why did you choose your answer?).
- For each question, include a sentence summarizing the answer for that question in the context of the research question.
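As an illustration of the Day 6 steps listed above (define the random variables, write the model, then derive the expected value and variance), here is a minimal sketch in LaTeX. The commute scenario and all of its numbers are made up for illustration and are not taken from any exercise; independence of the two variables is an assumption that the variance step relies on.

```latex
Let $X$ = morning commute time (minutes) and $Y$ = evening commute time (minutes)
for one day, with $E[X] = 25$, $SD(X) = 4$, $E[Y] = 30$, $SD(Y) = 3$,
and $X$ and $Y$ independent (all values hypothetical).

Model (no means or expected values yet):
$$T = X + Y$$

$$
\begin{aligned}
E[T] &= E[X + Y] = E[X] + E[Y] = 25 + 30 = 55 \\
\text{Var}(T) &= \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) = 4^2 + 3^2 = 25 \\
SD(T) &= \sqrt{\text{Var}(T)} = \sqrt{25} = 5
\end{aligned}
$$
```

Note that the variances add, not the standard deviations, and that this step is only valid because the variables are assumed to be independent.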
It is a good idea to render your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors more quickly.
Book exercises
*3.4 Baggage fees
- Note: Part (a) asks you to “build a probability model.” See Figure 3.2 on the bottom of pg. 140 in the textbook for an example of a probability model.
*3.5 Gull clutch size
Review the solution in the back of the book for exercise 3.5.
(a)
For part (a), the answer is correct, but there is an error in the work and the notation is sloppy. Correct the error and rewrite the solution with proper notation.
(b)
For part (b), the answer is not correct, as a result of the error in the work with sloppy notation. Give a correct solution with proper notation.
*3.6 Scooping ice cream
*3.8 Chickenpox, Part I.
For #3.8, calculate the binomial probabilities two ways:
(1) using the formula (can use R to calculate factorials or the choose function) and
(2) using R functions for binomial distribution probabilities.
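The two approaches can be compared side by side. The following is a minimal sketch with hypothetical values of n, p, and k that are not taken from the exercise; it only shows that the formula and `dbinom()` agree.

```r
# Hypothetical values chosen only for illustration (not from any exercise)
n <- 10    # number of independent trials
p <- 0.3   # probability of success on each trial
k <- 4     # number of successes of interest

# (1) Binomial formula: choose(n, k) * p^k * (1 - p)^(n - k)
prob_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# (2) Built-in binomial probability function
prob_dbinom <- dbinom(k, size = n, prob = p)

prob_formula
prob_dbinom

# Check that the two approaches give the same probability
all.equal(prob_formula, prob_dbinom)
```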
*3.10 Chickenpox, Part II.
For #3.10, you can use R functions to calculate the binomial probabilities instead of directly using the formula. However, include the mathematical formulas that would be used to calculate the probabilities.
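For probabilities that cover a range of outcomes, `pbinom()` gives the cumulative probability directly. A minimal sketch, again with hypothetical n and p rather than the values from the exercise, with the corresponding formula written in the comments:

```r
# Hypothetical values chosen only for illustration (not from any exercise)
n <- 10
p <- 0.3

# P(X <= 2) = sum over k = 0, 1, 2 of choose(n, k) * p^k * (1 - p)^(n - k)
p_at_most_2 <- pbinom(2, size = n, prob = p)
p_at_most_2_sum <- sum(dbinom(0:2, size = n, prob = p))  # same value, term by term

# P(X >= 3) = 1 - P(X <= 2)
p_at_least_3 <- pbinom(2, size = n, prob = p, lower.tail = FALSE)

p_at_most_2
p_at_least_3
```

Be careful with the endpoint: `lower.tail = FALSE` returns P(X > 2), which for a discrete variable is the same as P(X >= 3).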
!!! Instructions for Normal probability exercises 3.20 - 3.40 !!!
Additional Instructions - IMPORTANT!!!
- For ALL normal distribution exercises:
- Make a sketch of the normal distribution curve with the mean and 1 sd away from the mean clearly labeled, and the area representing probability of interest shaded in.
- If you are using R for these questions, see the example of how to create a normal curve with a specified area shaded in from the Probability distributions in R file.
- Calculate probabilities using both
- z-table
- R
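The steps above can be sketched in R as follows. The mean, standard deviation, and cutoff are hypothetical, and the plot uses base R graphics as an assumption; the Probability distributions in R file from class may use a different plotting approach, which you should follow if it differs.

```r
# Hypothetical values chosen only for illustration (not from any exercise)
mu <- 70      # mean
sigma <- 10   # standard deviation
x0 <- 85      # cutoff: we want P(X > 85)

# z-score, which is the value you would look up in a z-table
z <- (x0 - mu) / sigma

# Probability using the z-score (mirrors the z-table calculation)
p_z <- 1 - pnorm(z)

# Probability directly from R on the original scale
p_r <- pnorm(x0, mean = mu, sd = sigma, lower.tail = FALSE)

# Sketch of the curve with the mean and 1 SD on either side labeled
curve(dnorm(x, mean = mu, sd = sigma),
      from = mu - 4 * sigma, to = mu + 4 * sigma,
      xlab = "x", ylab = "density", xaxt = "n")
axis(1, at = c(mu - sigma, mu, mu + sigma))

# Shade the area to the right of the cutoff
xs <- seq(x0, mu + 4 * sigma, length.out = 200)
polygon(c(x0, xs, mu + 4 * sigma),
        c(0, dnorm(xs, mean = mu, sd = sigma), 0),
        col = "grey70", border = NA)

p_z
p_r
```

A z-table value is rounded, so the table answer may differ from the R answer in the third or fourth decimal place.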
*3.20 Area under the curve, Part II
*3.22 Triathlon times
*3.28 Arsenic poisoning
*3.30 Find the SD
*3.32 Chickenpox, Part III
*3.38 Stenographer’s typos
*3.40 Osteosarcoma in NYC
*4.4 Mental health, Part I
*4.6 Thanksgiving spending, Part I
*4.8 Age at first marriage, Part I
*5.6 Working backwards, Part II
*5.10 t⋆ vs. z⋆
Non-book exercises
*NB 1: Clinician time with patients
Suppose a clinician schedules 20 minutes to spend with each of their patients. However, they sometimes run over or end earlier. Based on past data, the mean “extra” time they spend with a patient is 3 minutes with a standard deviation of 2 minutes. Suppose they see 13 patients today and the extra times they spend with patients are independent from patient to patient.
(a) Expected total time
Find the expected total time they will spend with all of their patients today.
(b) SD of total time
Find the standard deviation of the total time they will spend with all of their patients today.
*NB 2: The Ethan Allen
On October 5, 2005, a tour boat named the Ethan Allen capsized on Lake George in New York with 47 passengers aboard. In the inquiries that followed, it was suggested that the tour operators should have realized that the combined weight of so many passengers was likely to exceed the weight capacity of the boat. Could they have predicted this?
- The maximum weight capacity of passengers that the Ethan Allen could accommodate was estimated to be 7500 pounds.
- Data from the Centers for Disease Control and Prevention indicate that weights of American adults in 2005 had a mean of 167 pounds and a standard deviation of 35 pounds.
If the tour boat company consistently accepted 47 passengers, what we want to know is the probability that the combined weight of the 47 passengers would exceed this capacity.
(a) Maximum average weight
With 47 passengers on board, what is the maximum average weight that the Ethan Allen could accommodate?
(b) Probability of an individual
Assuming that the weights of American adults in 2005 can be modeled with a normal distribution, find the probability that an individual weighs more than the maximum average weight the Ethan Allen can accommodate.
(c) Probability a random sample
Calculate the probability that a random sample of 47 American adults has an average weight greater than the maximum average weight the Ethan Allen can accommodate.
(d) Theorem used?
What theorem did you use in the previous part, and why were you able to apply it to this problem?
(e) Could this have been predicted?
Could the tour operators have predicted that the combined weight of so many passengers was likely to exceed the weight capacity of the Ethan Allen?
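Parts (b) and (c) contrast a probability for one individual with a probability for a sample mean. The following is a minimal sketch of that contrast using made-up numbers that are unrelated to the Ethan Allen; it assumes the sample mean is approximately normal with standard deviation sigma / sqrt(n).

```r
# Hypothetical values chosen only for illustration (not the Ethan Allen numbers)
mu <- 100        # population mean
sigma <- 15      # population standard deviation
n <- 36          # sample size
cutoff <- 104    # value of interest

# Probability that ONE individual exceeds the cutoff
p_individual <- pnorm(cutoff, mean = mu, sd = sigma, lower.tail = FALSE)

# Probability that the MEAN of n individuals exceeds the cutoff:
# same mean, but the spread shrinks to sigma / sqrt(n)
se <- sigma / sqrt(n)
p_sample_mean <- pnorm(cutoff, mean = mu, sd = se, lower.tail = FALSE)

p_individual
p_sample_mean
```

With these numbers the sample mean is far less likely than an individual to exceed the cutoff, because the cutoff is above the mean and averages vary less than individual values.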
R exercises
Load packages
Load all the packages you need in the first code chunk of the file, which is the chunk that starts with #| label: "setup".
R1: Youth weights - Part 1
In this exercise you will use the YRBSS dataset we used in class on Day 8 to simulate the distribution of mean weights from repeated samples. Use the code from class where we simulated mean heights, and apply it to the weights (in pounds) as directed below.
You will need to install and load the moderndive R package to use the rep_sample_n() command from the class notes.
(a) set.seed()
Use the set.seed() command to set a randomization seed. Use whatever number you want for the seed.
(b) 1000 random samples of size 10
Take 1000 random samples of size 10 and save the tibble with the random samples. Show the first 20 rows of this tibble.
(c) Mean weights from the 1000 random samples
Create a tibble with mean weights from the 1000 random samples. Show the first 10 rows of this tibble.
(d) Histogram of the 1000 mean weights
Make a histogram of the 1000 mean weights. What do we call this distribution? Describe the shape of the distribution.
(e) Mean and standard deviation of the 1000 sample mean weights
Calculate the mean and standard deviation of the 1000 sample mean weights. What is another name for this standard deviation?
(f) Theoretical values for mean and standard deviation
What are the theoretical values for mean and standard deviation of the sampling distribution from the CLT, and how do your simulated values compare to the theoretical values?
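The overall workflow in parts (a) through (f) can be sketched on a toy example. The sketch below deliberately swaps in base R (`replicate()` and `sample()`) and a made-up, right-skewed "population" so that it runs without any packages or data files; for the assignment itself, use `rep_sample_n()` and the YRBSS weights as in the class notes.

```r
set.seed(511)  # any number works; the seed makes the random samples reproducible

# Hypothetical "population" of 5000 values (a stand-in, NOT the YRBSS data)
population <- rgamma(5000, shape = 4, scale = 10)

n_reps <- 1000  # number of repeated samples
n <- 10         # size of each sample

# Draw 1000 samples of size 10 and keep the mean of each sample
sample_means <- replicate(n_reps, mean(sample(population, size = n)))

# Histogram of the 1000 sample means
hist(sample_means, main = "Means of 1000 samples of size 10", xlab = "Sample mean")

# Simulated mean and SD of the sample means
sim_mean <- mean(sample_means)
sim_sd <- sd(sample_means)

# Values to compare against: population mean and sigma / sqrt(n)
theory_mean <- mean(population)
theory_sd <- sd(population) / sqrt(n)

c(sim_mean = sim_mean, theory_mean = theory_mean)
c(sim_sd = sim_sd, theory_sd = theory_sd)
```

The simulated and theoretical values should be close but not identical, since the simulation uses only 1000 samples.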
R2: Youth weights - Part 2
In this exercise you will again use the YRBSS dataset that we used in class on Days 8-9 to simulate the distribution of mean weights from repeated samples.
(a) CI
Suppose you took a random sample of size 50 from the YRBSS data, and the sample has a mean weight of 130 pounds. Calculate and interpret a 90% confidence interval using the standard deviation of weights from the YRBSS “population.”
(b) Another CI
Calculate and interpret a 90% confidence interval assuming the standard deviation of weights from the random sample is 40.
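The two parts differ in which standard deviation is available. Here is a minimal sketch of both calculations with hypothetical summary values and a different confidence level than the exercise; it assumes a z critical value when the population standard deviation is known and a t critical value with n - 1 degrees of freedom when only the sample standard deviation is available.

```r
# Hypothetical summary values chosen only for illustration
xbar <- 50      # sample mean
n <- 25         # sample size
conf <- 0.95    # confidence level
alpha <- 1 - conf

# Population SD known: use a z critical value
sigma <- 10
z_star <- qnorm(1 - alpha / 2)
ci_z <- xbar + c(-1, 1) * z_star * sigma / sqrt(n)

# Only the sample SD available: use a t critical value with n - 1 df
s <- 10
t_star <- qt(1 - alpha / 2, df = n - 1)
ci_t <- xbar + c(-1, 1) * t_star * s / sqrt(n)

ci_z
ci_t
```

With the same standard deviation value, the t interval is slightly wider than the z interval because the t critical value is larger.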