HW 6: BSTA 511/611 F25

Author

Your name here - update this!!!!

Published

November 29, 2025

Due Sat 11/29/25

Download the .qmd file for this assignment from https://github.com/niederhausen/BSTA_511_F25/blob/main/homework/HW_6_F25_bsta511.qmd

Graded exercises

The exercises listed below will be graded for this assignment. You are strongly encouraged to complete the entire assignment. You will receive feedback on exercises you turn in that are not being graded.

  • Book exercises
    • 5.44, 6.10
  • R exercises
    • R1: Palmer Penguins ANOVA
    • R2: Palmer Penguins SLR

Directions

Important
  • Complete ALL exercises in this assignment using Quarto.
  • Use LaTeX to format equations.

Hypothesis test instructions

Important
  • For book exercises, make sure to include all steps in a hypothesis test (where applicable) as outlined in the class notes.

  • Do not forget to include a discussion on whether you think the test (or CI) assumptions have been satisfied. Are there assumptions you need to make in order for them to be satisfied? Whether you believe they are satisfied or not, continue to run the hypothesis test (or CI) as instructed.

  • Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file (or just your PDF if completing the assignment by hand).
    • Use the assignment .qmd file linked above as a template for your own assignment.
  • Please always use the following naming convention for submitting your files:
    • Lastname_Firstname_HWx.qmd, such as Niederhausen_Meike_HW2.qmd
    • Lastname_Firstname_HWx.html, such as Niederhausen_Meike_HW2.html
  • For each question, make sure to show all of your work.
    • This includes all code and resulting output in the html file to support your answers for exercises requiring work done in R (including any arithmetic calculations).
    • For non-calculation questions, this includes an explanation of your answer (why did you choose your answer?).
  • For each question, include a sentence summarizing the answer for that question in the context of the research question.
Tip

It is a good idea to render your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors more quickly.

Book exercises

5.44 Work hours and education

6.10 Guppies, Part I

R exercises

Load packages

Load all the packages you need in the first code chunk of the file, the one that starts with #| label: "setup".

R1: Palmer Penguins ANOVA

  • Use the penguins data from the palmerpenguins package.
    • Don’t forget to first install the palmerpenguins package.
  • You can learn more about the Palmer penguins data at https://allisonhorst.github.io/palmerpenguins/
  • We will test whether there are differences in penguins’ mean bill depths when comparing different species.
library(palmerpenguins)
data(penguins)

(a) Dotplots

Make a dotplot of the penguins’ bill depths stratified by species type. Include points for the mean of each species type as well as a horizontal dashed line for the overall mean. See example from class for the plot I’m describing.
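
One possible way to build this kind of plot is sketched below. It is only an illustration: it assumes the ggplot2 package and uses the built-in iris data (sepal width by species) rather than the penguins data, and the plot from class may be constructed differently.

```r
library(ggplot2)

# Illustration on the built-in iris data, NOT the penguins data
overall_mean <- mean(iris$Sepal.Width)

dotplot <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  # One point per observation, jittered sideways to reduce overlap
  geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
  # Group means drawn as larger red points
  stat_summary(fun = mean, geom = "point", color = "red", size = 3) +
  # Overall mean drawn as a horizontal dashed line
  geom_hline(yintercept = overall_mean, linetype = "dashed")

dotplot
```

If the variable you are plotting has missing values, mean() returns NA unless you add na.rm = TRUE.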

(b) Which groups are significantly different?

Based on the figure, which pairs of species look like they have significantly different mean bill depths?

(c) Hypotheses in words

Write out in words the null and alternative hypotheses.

(d) Hypotheses in symbols

Write out in symbols the null and alternative hypotheses.

(e) Run ANOVA in R

Using R, run the hypothesis test and display the output.
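
As a reminder of the general syntax, here is a minimal sketch of a one-way ANOVA in base R. It uses the built-in iris data as a stand-in, so the variable names are not the ones you need for the penguins.

```r
# One-way ANOVA of sepal width by species (illustration on iris)
fit <- aov(Sepal.Width ~ Species, data = iris)

# ANOVA table with columns Df, Sum Sq, Mean Sq, F value, Pr(>F)
anova_table <- summary(fit)
anova_table
```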

(f) SST

Using the values from the ANOVA table, calculate the value of the SST (total sum of squares).
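
For reference, the total sum of squares decomposes into the two "Sum Sq" entries of the ANOVA table. Here \(SSG\) denotes the sum of squares for the grouping variable row and \(SSE\) the sum of squares for the residuals row; this notation is assumed to match the class notes.

\[
SST = SSG + SSE
\]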

(g) MSG & MSE

Using the values from the ANOVA table, verify (calculate) the values of the MSG (mean square groups) and MSE (mean square error).

(h) F statistic

Using the values from the ANOVA table, verify (calculate) the value of the F statistic.
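
For reference, the mean squares and the F statistic are related to the table entries as follows, where \(k\) is the number of groups and \(n\) is the total number of observations used in the analysis. The symbols \(df_G\) and \(df_E\) are assumed notation for the two "Df" entries.

\[
MSG = \frac{SSG}{df_G} = \frac{SSG}{k-1}, \qquad
MSE = \frac{SSE}{df_E} = \frac{SSE}{n-k}, \qquad
F = \frac{MSG}{MSE}
\]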

(i) p-value

Using the values from the ANOVA table, verify (calculate) the p-value.
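
The p-value is the area to the right of the observed F statistic under the F distribution with \(k-1\) and \(n-k\) degrees of freedom. A minimal sketch using pf() is below; the numbers are hypothetical and are not taken from the penguins ANOVA table.

```r
# Hypothetical values, NOT from the penguins ANOVA table
f_stat    <- 4.5
df_groups <- 2    # k - 1
df_error  <- 30   # n - k

# Upper-tail probability P(F > f_stat)
p_value <- pf(f_stat, df1 = df_groups, df2 = df_error, lower.tail = FALSE)
p_value
```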

(j) Decision?

Based on the p-value, will we reject or fail to reject the null hypothesis? Why?

(k) Conclusion

Write a conclusion to the hypothesis test in the context of the problem.

(l) Technical conditions

Investigate whether the technical conditions for using an ANOVA have been satisfied.

(m) Post-hoc pairwise t-tests: no correction

Run post-hoc pairwise t-tests using NO p-value correction. Which pairs of species have significantly different bill depths?

(n) Post-hoc pairwise t-tests: Bonferroni correction

Run post-hoc pairwise t-tests using a Bonferroni correction. Which pairs of species have significantly different bill depths?
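
The two versions of the pairwise tests differ only in the p.adjust.method argument of pairwise.t.test(). The sketch below shows both on the built-in iris data, used here purely as a stand-in for the penguins.

```r
# Pairwise t-tests with no p-value correction (illustration on iris)
unadjusted <- pairwise.t.test(iris$Sepal.Width, iris$Species,
                              p.adjust.method = "none")

# Same tests with a Bonferroni correction
bonferroni <- pairwise.t.test(iris$Sepal.Width, iris$Species,
                              p.adjust.method = "bonferroni")

unadjusted
bonferroni
```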

(o) Hypothetical Bonferroni correction

If hypothetically the p-value comparing the mean bill depths of the Adelie and Chinstrap species were 0.03 without any p-value adjustment, what would the p-value be after running the post-hoc pairwise t-tests using a Bonferroni correction?
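
To see what a Bonferroni correction does to a set of p-values, you can experiment with p.adjust(). The sketch below uses four made-up p-values (a different number of comparisons than in this exercise): each one is multiplied by the number of comparisons and capped at 1.

```r
# Four made-up unadjusted p-values, NOT from the penguins data
raw_p <- c(0.01, 0.2, 0.04, 0.5)

# Bonferroni: multiply each p-value by the number of comparisons (4), cap at 1
adjusted_p <- p.adjust(raw_p, method = "bonferroni")
adjusted_p
```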

(p) Post-hoc pairwise comparisons: Tukey’s Honest Significant Difference (HSD) correction

Run post-hoc pairwise comparisons using Tukey’s Honest Significant Difference (HSD) correction. Which pairs of species have significantly different bill depths?

(q) Tukey confidence intervals

Make a plot showing the 95% family-wise Tukey confidence intervals. How does this plot visually confirm which pairs of species have significantly different bill depths?
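
A minimal sketch of the Tukey workflow in base R is below, again on the built-in iris data rather than the penguins. TukeyHSD() takes a fitted aov object, and its plot() method draws one interval per pair of groups with a dashed reference line at zero.

```r
# Fit the one-way ANOVA first (illustration on iris)
fit <- aov(Sepal.Width ~ Species, data = iris)

# Pairwise differences with 95% family-wise confidence intervals
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey

# One horizontal interval per pair of groups
plot(tukey)
```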

R2: Palmer Penguins SLR

Important

Below I frequently use the terminology variable1 vs. variable2. When we write this, the first variable is \(y\) (vertical axis) and the second is \(x\) (horizontal axis). Thus it’s always \(y\) vs. \(x\) (NOT \(x\) vs. \(y\)).

(a) Scatterplots

  • For each of the following pairs of variables, make a scatterplot showing the best fit line and describe the relationship between the variables.
  • In particular, address
    • whether the association is linear,
    • how strong it is (based purely on the plot), and
    • what direction it has (positive, negative, or neither).
  1. body mass vs. flipper length

  2. bill depth vs. flipper length

  3. bill depth vs. bill length
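
A scatterplot with a best fit line can be sketched as follows. This assumes the ggplot2 package and uses the built-in mtcars data (mpg vs. wt) as a stand-in; note that the first variable in "mpg vs. wt" goes on the vertical axis.

```r
library(ggplot2)

# mpg vs. wt: mpg is y (vertical axis), wt is x (horizontal axis)
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Least-squares line without the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

scatter
```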

(b) Correlations

  • For each of the following pairs of variables, find the correlation coefficient \(r\).
  1. body mass vs. flipper length

  2. bill depth vs. flipper length

  3. bill depth vs. bill length
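
The correlation coefficient can be computed with cor(). The sketch below uses the built-in mtcars data as a stand-in; the use = "complete.obs" argument drops rows with missing values, which matters for data sets that contain NAs.

```r
# Correlation between mpg and wt (illustration on mtcars)
r <- cor(mtcars$mpg, mtcars$wt, use = "complete.obs")
r
```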

(c) Compare associations

Which pair of variables has the strongest association? Which has the weakest? Explain how you determined this.

(d) Body mass vs. flipper length SLR

Run the simple linear regression model for body mass vs. flipper length, and display the regression table output.
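
As a syntax reminder, a simple linear regression for "y vs. x" is fit with lm(y ~ x). The sketch below uses the built-in mtcars data and base R's summary(); the class notes may display the regression table with a different function.

```r
# Simple linear regression of mpg (y) on wt (x), illustration on mtcars
model <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: estimates, standard errors, t values, p-values
summary(model)$coefficients
```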

(e) Regression equation

Write out the regression equation for this model, using the variable names instead of the generic \(x\) and \(y\), and inserting the regression coefficient values.

(f) \(b_1\) calculation

Verify that the formula \(b_1 = r \cdot \frac{s_y}{s_x}\) holds for this example using the values of the statistics.
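
The same kind of check can be sketched on a different data set. The example below uses the built-in mtcars data (mpg vs. wt) and compares the slope from lm() with \(r \cdot s_y / s_x\).

```r
# Illustration on mtcars: y = mpg, x = wt
model <- lm(mpg ~ wt, data = mtcars)
b1_model <- unname(coef(model)["wt"])

# Slope computed from the correlation and the two standard deviations
r   <- cor(mtcars$mpg, mtcars$wt)
s_y <- sd(mtcars$mpg)
s_x <- sd(mtcars$wt)
b1_formula <- r * s_y / s_x

# Both values should agree up to rounding error
c(b1_model = b1_model, b1_formula = b1_formula)
```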

(g) Interpret intercept

Write a sentence interpreting the intercept for this example. Is it meaningful in this example?

(h) Interpret slope

Write a sentence interpreting the slope for this example.

(i) Prediction

What is the expected body mass of a penguin with flipper length 210 mm based on the model?
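
Predictions from a fitted model can be obtained either by plugging the value into the regression equation or with predict(). The sketch below does both on the built-in mtcars data, with a hypothetical new value of wt = 3.

```r
# Illustration on mtcars: predict mpg for a car with wt = 3
model <- lm(mpg ~ wt, data = mtcars)

# predict() needs a data frame whose column name matches the x variable
pred <- predict(model, newdata = data.frame(wt = 3))

# Same value computed by hand from the coefficients
by_hand <- coef(model)[["(Intercept)"]] + coef(model)[["wt"]] * 3

c(predict = unname(pred), by_hand = by_hand)
```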