library(palmerpenguins)
data(penguins)
HW 8: BSTA 511-611 F23
Due Monday 11/27/23
Download the .qmd file for this assignment from https://github.com/niederhausen/BSTA_511_F23/blob/main/homework/HW_8_F23_bsta511.qmd
Directions
- Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file.
- For each question, make sure to include all code and resulting output in the html file to support your answers.
R & LaTeX code
- See the .qmd files with the code from class notes for LaTeX and R code.
- The LaTeX code will make it easier to show your work in computations.
Render your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors sooner.
Book exercises
5.44 Work hours and education
5.46 Child care hours
5.48 True/False: ANOVA, Part II
6.2 Identify relationships, Part II
6.6 Over-under, Part II
6.10 Guppies, Part I
1 R exercises
1.1 Load all the packages you need below here.
1.2 R1: Palmer Penguins ANOVA
- Use the penguins data from the palmerpenguins package.
  - Don’t forget to first install the palmerpenguins package.
- You can learn more about the Palmer penguins data at https://allisonhorst.github.io/palmerpenguins/
- We will test whether there are differences in penguins’ mean bill depths when comparing different species.
1.2.1 Dotplots
Make a dotplot of the penguins’ bill depths stratified by species type. Include points for the mean of each species type as well as a horizontal dashed line for the overall mean. See example from class for the plot I’m describing.
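As a rough sketch of the plot layout described above, the following uses base R graphics on the built-in PlantGrowth data rather than the penguins data. The class example may well use ggplot2 instead; the object names group_means and overall_mean are illustrative assumptions, not names from the course code.

```r
# Illustration on the built-in PlantGrowth data (not the penguins data).
group_means  <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
overall_mean <- mean(PlantGrowth$weight)

# One column of jittered points per group
stripchart(weight ~ group, data = PlantGrowth,
           vertical = TRUE, method = "jitter", pch = 1,
           xlab = "Group", ylab = "Weight")

# Group means drawn as large filled diamonds at x = 1, 2, 3
points(seq_along(group_means), group_means, pch = 18, cex = 2, col = "red")

# Overall mean drawn as a horizontal dashed line
abline(h = overall_mean, lty = 2)
```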
1.2.2 Which groups are significantly different?
Based on the figure, which pairs of species look like they have significantly different mean bill depths?
1.2.3 Hypotheses in words
Write out in words the null and alternative hypotheses.
1.2.4 Hypotheses in symbols
Write out in symbols the null and alternative hypotheses.
1.2.5 Run ANOVA in R
Using R, run the hypothesis test and display the output.
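For the syntax only, here is a minimal sketch of a one-way ANOVA in base R, run on the built-in PlantGrowth data rather than the penguins data. The object name fit is an assumption for illustration; the class notes may use a different function or workflow.

```r
# Illustration on the built-in PlantGrowth data (not the penguins data).
# One-way ANOVA: numeric response on the left of ~, grouping factor on the right.
fit <- aov(weight ~ group, data = PlantGrowth)

# ANOVA table with columns Df, Sum Sq, Mean Sq, F value, Pr(>F)
summary(fit)
```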
1.2.6 SST
Using the values from the ANOVA table, calculate the value of the SST (total sum of squares).
1.2.7 MSG & MSE
Using the values from the ANOVA table, verify (calculate) the values of the MSG (mean square groups) and MSE (mean square error).
1.2.8 F statistic
Using the values from the ANOVA table, verify (calculate) the value of the F statistic.
1.2.9 p-value
Using the values from the ANOVA table, verify (calculate) the p-value.
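The calculations in 1.2.6 through 1.2.9 all follow from the Df and Sum Sq columns of the ANOVA table. The sketch below shows one way to pull those values out and recompute the rest, again on the built-in PlantGrowth data rather than the penguins data. All object names are illustrative assumptions.

```r
# Illustration on the built-in PlantGrowth data (not the penguins data).
fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]          # the ANOVA table as a data frame

ssg  <- tab[["Sum Sq"]][1]        # sum of squares, groups
sse  <- tab[["Sum Sq"]][2]        # sum of squares, error (residuals)
df_g <- tab[["Df"]][1]            # degrees of freedom, groups
df_e <- tab[["Df"]][2]            # degrees of freedom, error

sst    <- ssg + sse               # total sum of squares
msg    <- ssg / df_g              # mean square groups
mse    <- sse / df_e              # mean square error
f_stat <- msg / mse               # F statistic
p_val  <- pf(f_stat, df_g, df_e, lower.tail = FALSE)  # upper-tail area of F

c(SST = sst, MSG = msg, MSE = mse, F = f_stat, p = p_val)
```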
1.2.10 Decision?
Based on the p-value, will we reject or fail to reject the null hypothesis? Why?
1.2.11 Conclusion
Write a conclusion to the hypothesis test in the context of the problem.
1.2.12 Technical conditions
Investigate whether the technical conditions for using an ANOVA have been satisfied.
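A possible starting point is sketched below, assuming the conditions are the usual ones for a one-way ANOVA (independent observations, roughly normal data within each group, similar variability across groups); the class notes give the exact criteria to apply. It uses the built-in PlantGrowth data rather than the penguins data, and the object names are illustrative.

```r
# Illustration on the built-in PlantGrowth data (not the penguins data).
# Sample size and standard deviation within each group
group_n  <- tapply(PlantGrowth$weight, PlantGrowth$group, length)
group_sd <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
rbind(n = group_n, sd = group_sd)

# Side-by-side boxplots to compare shape and spread across groups
boxplot(weight ~ group, data = PlantGrowth,
        xlab = "Group", ylab = "Weight")
```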
1.2.13 Post-hoc pairwise t-tests: no correction
Run post-hoc pairwise t-tests using NO p-value correction. Which pairs of species have significantly different bill depths?
1.2.14 Post-hoc pairwise t-tests: Bonferroni correction
Run post-hoc pairwise t-tests using a Bonferroni correction. Which pairs of species have significantly different bill depths?
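For the syntax of the two pairwise analyses above, a minimal sketch with base R's pairwise.t.test() follows, run on the built-in PlantGrowth data rather than the penguins data. The object names pw_none and pw_bonf are illustrative assumptions.

```r
# Illustration on the built-in PlantGrowth data (not the penguins data).
# Pairwise t-tests with no p-value adjustment
pw_none <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "none")

# Same comparisons with a Bonferroni adjustment
pw_bonf <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "bonferroni")

pw_none$p.value   # matrix of unadjusted p-values, one per pair
pw_bonf$p.value   # matrix of Bonferroni-adjusted p-values
```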
1.2.15 Hypothetical Bonferroni correction
If hypothetically the p-value comparing the mean bill depths of the Adelie and Chinstrap species were 0.03 without any p-value adjustment, what would the p-value be after running the post-hoc pairwise t-tests using a Bonferroni correction?
1.2.16 Post-hoc pairwise t-tests: Tukey’s Honest Significant Difference (HSD) correction
Run post-hoc pairwise t-tests using Tukey’s Honest Significant Difference (HSD) correction. Which pairs of species have significantly different bill depths?
1.2.17 Tukey confidence intervals
Make a plot showing the 95% family-wise Tukey confidence intervals. How does this plot visually confirm which pairs of species have significantly different bill depths?
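One way to obtain both the Tukey-adjusted comparisons and the interval plot is base R's TukeyHSD(), sketched here on the built-in PlantGrowth data rather than the penguins data. The object names are illustrative assumptions, and the class notes may use a different approach.

```r
# Illustration on the built-in PlantGrowth data (not the penguins data).
fit   <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair: difference in means, lower and upper CI bounds, adjusted p-value
tukey

# Plot of the family-wise confidence intervals, one per pair
plot(tukey)
```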
1.3 R2: Palmer Penguins SLR
Below I frequently use the terminology variable1 vs. variable2. When we write this, the first variable is \(y\) (vertical axis) and the second is \(x\) (horizontal axis). Thus it’s always \(y\) vs. \(x\) (NOT \(x\) vs. \(y\)).
1.3.1 Scatterplots
- For each of the following pairs of variables, make a scatterplot showing the best fit line and describe the relationship between the variables.
  - In particular, address
    - whether the association is linear,
    - how strong it is (based purely on the plot), and
    - what direction it has (positive, negative, or neither).
body mass vs. flipper length
bill depth vs. flipper length
bill depth vs. bill length
1.3.2 Correlations
- For each of the following pairs of variables, find the correlation coefficient \(r\).
body mass vs. flipper length
bill depth vs. flipper length
bill depth vs. bill length
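The penguins data contain some missing measurements, and cor() returns NA by default when either variable has a missing value. The sketch below shows this behavior on a small made-up data frame (toy, with columns x and y, is hypothetical and not part of the assignment).

```r
# Hypothetical toy data with one missing x value
toy <- data.frame(x = c(1, 2, 3, 4, NA),
                  y = c(2.1, 3.9, 6.2, 8.1, 9.8))

# Default behavior: any missing value makes the result NA
r_default <- cor(toy$y, toy$x)

# Use only the rows where both variables are observed
r_complete <- cor(toy$y, toy$x, use = "complete.obs")

c(default = r_default, complete = r_complete)
```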
1.3.3 Compare associations
Which pair of variables has the strongest association? Which has the weakest? Explain how you determined this.
1.3.4 Body mass vs. flipper length SLR
Run the simple linear regression model for body mass vs. flipper length, and display the regression table output.
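For the syntax only, a minimal sketch of a simple linear regression in base R follows, using the built-in mtcars data rather than the penguins data. In the formula, "y vs. x" is written as y ~ x. The object name fit is an illustrative assumption.

```r
# Illustration on the built-in mtcars data (not the penguins data).
# "mpg vs. wt": response (y) on the left of ~, predictor (x) on the right.
fit <- lm(mpg ~ wt, data = mtcars)

# Regression table: estimates, standard errors, t values, p-values
summary(fit)
```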
1.3.5 Regression equation
Write out the regression equation for this model, using the variable names instead of the generic \(x\) and \(y\), and inserting the regression coefficient values.
1.3.6 \(b_1\) calculation
Verify that the formula \(b_1 = r \cdot \frac{s_y}{s_x}\) holds for this example using the values of the statistics.
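The same check can be sketched on another data set to show the mechanics. The code below uses the built-in mtcars data rather than the penguins data, and all object names are illustrative assumptions.

```r
# Illustration on the built-in mtcars data (not the penguins data).
r   <- cor(mtcars$mpg, mtcars$wt)   # correlation between y and x
s_y <- sd(mtcars$mpg)               # standard deviation of y
s_x <- sd(mtcars$wt)                # standard deviation of x

b1_formula <- r * s_y / s_x                               # slope from the formula
b1_lm      <- unname(coef(lm(mpg ~ wt, data = mtcars))["wt"])  # slope from lm()

all.equal(b1_formula, b1_lm)  # TRUE
```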
1.3.7 Interpret intercept
Write a sentence interpreting the intercept for this example. Is it meaningful in this example?
1.3.8 Interpret slope
Write a sentence interpreting the slope for this example.
1.3.9 Prediction
What is the expected body mass of a penguin with flipper length 210 mm based on the model?
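A prediction like this can be computed either by plugging the value into the regression equation or with predict(). The sketch below shows both on the built-in mtcars data rather than the penguins data; the names new_car, pred, and by_hand are illustrative assumptions.

```r
# Illustration on the built-in mtcars data (not the penguins data).
fit <- lm(mpg ~ wt, data = mtcars)

# predict() needs a data frame whose column name matches the predictor
new_car <- data.frame(wt = 3)
pred    <- predict(fit, newdata = new_car)

# The same value by hand: intercept + slope * x
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["wt"]] * 3

c(predict = unname(pred), by_hand = by_hand)
```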