HW 6: BSTA 511-611 F23
Updated 11/6/23: moved Day 12 questions to HW 7
Due 11/11/23
Download the .qmd file for this assignment from https://github.com/niederhausen/BSTA_511_F23/blob/main/homework/HW_6_F23_bsta511.qmd
Directions
- Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file.
- For each question, make sure to include all code and resulting output in the html file to support your answers.
R & LaTeX code
- See the .qmd files with the code from class notes for LaTeX and R code.
- The LaTeX code will make it easier to show your work in computations.
It is a good idea to render your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors sooner.
Hypothesis test & CI instructions
For book exercises, make sure to include all steps in a hypothesis test (where applicable) as outlined in the class notes.
Do not forget to include a discussion on whether you think the test (or CI) conditions have been satisfied. Are there assumptions you need to make in order for them to be satisfied? Whether you believe they are satisfied or not, continue to run the hypothesis test (or CI) as instructed.
Book exercises
5.26 Egg volume
5.34 Placebos without deception
1 PSS
1.1 PSS1: 4.22 Testing for food safety.
Do exercise 4.22 from textbook.
1.2 PSS2: Auto exhaust and lead exposure revisited.
1.2.1 Power
In exercise 5.12, we tested whether police officers appear to have been exposed to a higher concentration of lead than 35. Calculate the power for the hypothesis test and include an interpretation of the power in the context of the research question. Was it sufficiently powered?
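As a rough sketch of how a power calculation for a one-sample, one-sided t-test can be set up, the example below uses base R's power.t.test(). This is an assumption for illustration: the class notes may use a different function or package. All numeric inputs are placeholders, not the values from exercise 5.12.

```r
# Placeholder inputs (hypothetical): substitute the values from the exercise.
n    <- 40    # sample size
xbar <- 110   # sample mean
s    <- 30    # sample standard deviation
mu0  <- 100   # null value of the mean

# Power of a one-sample, one-sided t-test to detect a shift of xbar - mu0.
pow <- power.t.test(
  n = n,
  delta = xbar - mu0,
  sd = s,
  sig.level = 0.05,
  type = "one.sample",
  alternative = "one.sided"
)

pow$power  # estimated power as a proportion between 0 and 1
```

Leaving out exactly one of n, delta, or power asks power.t.test() to solve for that quantity.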
1.2.2 Sample size
For the same test, what sample size would be needed for 80% power? How about 90% power? Would it be reasonable to conduct the study with these sample sizes? Why or why not?
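A sample size calculation can be sketched the same way by supplying the target power and leaving n unspecified. The helper n_for_power() is a hypothetical name introduced here, and the difference and standard deviation are placeholders rather than the exercise's values.

```r
# Placeholder inputs (hypothetical): substitute the values from the exercise.
s     <- 30   # standard deviation
delta <- 10   # difference between the assumed true mean and the null value

# Hypothetical helper: smallest whole-number n reaching the target power
# for a one-sample, one-sided t-test at alpha = 0.05.
n_for_power <- function(target_power) {
  res <- power.t.test(
    delta = delta,
    sd = s,
    sig.level = 0.05,
    power = target_power,
    type = "one.sample",
    alternative = "one.sided"
  )
  ceiling(res$n)  # round up, since a fraction of a person cannot be sampled
}

n80 <- n_for_power(0.80)
n90 <- n_for_power(0.90)
c(n80 = n80, n90 = n90)
```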
1.2.3 Effect size
Suppose the study has resources to include 30 people. What minimum effect size would they be able to detect with 85% power, assuming the same sample mean and standard deviation? Use \(\alpha = 0.05\).
1.2.4 2-sided vs. 1-sided
Continuing with the previous question, what happens to the effect size they can detect if the test is two-sided instead of one-sided?
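For the two questions above, one possible approach is to leave delta unspecified so that power.t.test() solves for it. In this sketch sd is set to 1, so the result is in standard deviation units (a standardized effect size); that choice is an assumption made here, and supplying the sample standard deviation instead would return the difference in the original units. The helper name detectable_effect() is hypothetical.

```r
# Hypothetical helper: smallest detectable effect (in SD units, since sd = 1)
# for n = 30, 85% power, and alpha = 0.05 in a one-sample t-test.
detectable_effect <- function(alternative) {
  power.t.test(
    n = 30,
    sd = 1,
    sig.level = 0.05,
    power = 0.85,
    type = "one.sample",
    alternative = alternative
  )$delta
}

d_one_sided <- detectable_effect("one.sided")
d_two_sided <- detectable_effect("two.sided")

c(one_sided = d_one_sided, two_sided = d_two_sided)
```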
2 R exercises
2.1 Load all the packages you need below here.
2.2 R1: DDS expenditures by ethnicity
- In these exercises you will use R to work through the discrimination in developmental disability support example from Section 5.3.4 (pg. 253) in the textbook.
- The data are in the oibiostats package and are called dds.discr.
2.2.1 New dataset
Create a new dataset that only includes the White (non-Hispanic) and Hispanic ethnicities. Use this new dataset for the following questions.
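A minimal sketch of restricting a data frame to two levels of a categorical variable, using base R and a small made-up data frame. The data frame toy, its column names, and the group labels are all hypothetical; check the actual column names and level labels in the real dataset before adapting this.

```r
# Made-up toy data (hypothetical names and values), not the real dataset.
toy <- data.frame(
  ethnicity = factor(c("Group A", "Group B", "Group C",
                       "Group A", "Group B", "Group C")),
  expenditures = c(1200, 3400, 560, 8900, 410, 2300)
)

keep <- c("Group A", "Group B")

# Keep only rows in the two groups, then drop the now-empty factor levels
# so that later tables, plots, and tests see exactly two groups.
toy_two <- droplevels(subset(toy, ethnicity %in% keep))

table(toy_two$ethnicity)
```

Dropping unused levels matters because a two-sample t-test with a formula requires a grouping factor with exactly two levels.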
2.2.2 Data viz
Create density plots and box plots of the expenditures stratified by ethnicity. Comment on the distribution shapes. Are there any outliers?
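As one way to sketch these plots, the example below uses base R graphics on simulated right-skewed data. Base graphics are used only to keep the sketch free of package dependencies; the class notes may use ggplot2 instead. The data frame toy and its columns are hypothetical.

```r
# Simulated right-skewed toy data (hypothetical), not the real dataset.
set.seed(511)
toy <- data.frame(
  group = rep(c("A", "B"), each = 50),
  amount = c(rlnorm(50, meanlog = 8, sdlog = 1),
             rlnorm(50, meanlog = 9, sdlog = 1))
)

# Box plots by group; the returned object also lists the points drawn
# individually (beyond 1.5 * IQR from the box).
bp <- boxplot(amount ~ group, data = toy, main = "Amount by group")
bp$out

# Overlaid density curves, one per group.
dens_a <- density(toy$amount[toy$group == "A"])
dens_b <- density(toy$amount[toy$group == "B"])
plot(dens_a,
     xlim = range(toy$amount),
     ylim = c(0, max(dens_a$y, dens_b$y)),
     main = "Density by group")
lines(dens_b, lty = 2)
legend("topright", legend = c("A", "B"), lty = c(1, 2))
```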
2.2.3 t-test conditions
Are the conditions for a t-test comparing the mean expenditures of the two ethnicities satisfied?
2.2.4 Log-transformation
The book recommends log-transforming the expenditure values before testing. Create a new column in the dataset with the transformed values. The R command for the natural logarithm is log().
2.2.5 Data viz: log-transformed expenditures
Create density plots and box plots of the log-transformed expenditures stratified by ethnicity. Comment on the distribution shapes. Are there any outliers?
2.2.6 t-test conditions: log-transformed expenditures
Are the conditions for a t-test comparing the mean log-transformed expenditures of the two ethnicities satisfied?
2.2.7 Summary stats: log-transformed expenditures
Calculate the means, standard deviations, and sample sizes for the log-transformed expenditures stratified by ethnicity, and compare them to the ones in the book. Which group had a larger mean?
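Grouped summary statistics can be sketched in base R with tapply(), as below. The simulated data frame toy and its column names are hypothetical; a dplyr group_by() and summarize() pipeline would be an equally reasonable choice if that is what the class notes use.

```r
# Simulated toy data (hypothetical), not the real dataset.
set.seed(511)
toy <- data.frame(
  group = rep(c("A", "B"), times = c(30, 45)),
  amount = rlnorm(75, meanlog = 9, sdlog = 1)
)
toy$log_amount <- log(toy$amount)  # natural log of the original values

# Mean, standard deviation, and sample size of the log values per group.
means <- tapply(toy$log_amount, toy$group, mean)
sds   <- tapply(toy$log_amount, toy$group, sd)
ns    <- tapply(toy$log_amount, toy$group, length)

stats_by_group <- data.frame(
  group = names(means),
  mean  = as.numeric(means),
  sd    = as.numeric(sds),
  n     = as.numeric(ns)
)
stats_by_group
```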
2.2.8 Test
Run the appropriate statistical test in R to verify the test statistic in the text and get the actual p-value. In which order was the difference in means calculated, and is this the same as in the book? Use inline R code to pull these values from the test output when writing up your comparison of these values to the book’s values.
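The sketch below shows how values can be pulled out of a t.test() result and how the order of the difference follows the factor level order. It runs on simulated data with hypothetical names, so the numbers it produces have nothing to do with the book's.

```r
# Simulated toy data (hypothetical), not the real dataset.
set.seed(511)
toy <- data.frame(
  group = factor(rep(c("A", "B"), times = c(30, 45))),
  log_amount = c(rnorm(30, mean = 9.0, sd = 1.0),
                 rnorm(45, mean = 9.5, sd = 1.2))
)

# Two-sample t-test; t.test() uses the Welch (unequal variance) version by default.
res <- t.test(log_amount ~ group, data = toy)

res$estimate    # group means, listed in factor-level order
res$statistic   # t statistic
res$parameter   # degrees of freedom
res$p.value     # p-value

# With a formula, the difference is (first level) - (second level).
diff_means <- unname(res$estimate[1] - res$estimate[2])

# In the text of a .qmd file, these values can be inserted inline, e.g.:
#   The test statistic is `r round(res$statistic, 2)` with
#   `r round(res$parameter, 1)` degrees of freedom.
```

The levels() function shows the current level order, and relevel() can change which group comes first if the opposite order is wanted.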
2.2.9 df
How do the degrees of freedom (df) from the hypothesis test compare to the df used by the book? Why are they different? Which df (book vs. test output) leads to a bigger p-value?
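To explore how the choice of df affects the p-value, the sketch below recomputes a two-sided p-value from the same t statistic under two different df values. It assumes the hand calculation uses the conservative min(n1 - 1, n2 - 1) rule; check the textbook for the rule it actually applies. The data are simulated and hypothetical.

```r
# Simulated toy samples (hypothetical), not the real dataset.
set.seed(511)
x <- rnorm(30, mean = 9.0, sd = 1.0)
y <- rnorm(45, mean = 9.5, sd = 1.2)

res <- t.test(x, y)  # Welch two-sample t-test by default

t_stat          <- unname(res$statistic)
df_welch        <- unname(res$parameter)        # Welch-Satterthwaite df
df_conservative <- min(length(x), length(y)) - 1  # assumed hand-calculation rule

# Two-sided p-values from the same t statistic under each df.
p_welch        <- 2 * pt(-abs(t_stat), df = df_welch)
p_conservative <- 2 * pt(-abs(t_stat), df = df_conservative)

c(df_welch = df_welch, df_conservative = df_conservative,
  p_welch = p_welch, p_conservative = p_conservative)
```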
2.2.10 CI
What is the 95% CI? Write an interpretation of the CI in the context of the research question.
2.2.11 Test original expenditure values
Run the appropriate statistical test in R using the original expenditure values. What are the test statistic and p-value? Does the conclusion of the test change?
2.2.12 CI using original expenditure values
What is the 95% CI? Write an interpretation of the CI in the context of the research question. Which of the CIs (log-transformed vs. not) is easier to interpret?
2.2.13 Age groups
The book’s example goes on to analyze the data stratified by age groups, since age is a confounder in expenditure amounts. Create two new datasets restricted to the age groups 13-17 and 22-50, respectively.
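One way to sketch creating per-age-group datasets in base R is with split(), which returns a list of data frames keyed by group label. The toy data frame and the column name age_group are hypothetical; check the real dataset for the actual column name and level labels.

```r
# Made-up toy data (hypothetical names and values), not the real dataset.
toy <- data.frame(
  age_group = c("13-17", "22-50", "13-17", "51+", "22-50", "0-5"),
  amount = c(3900, 40000, 4100, 52000, 39500, 1400)
)

# Split into a named list of data frames, one per age group.
by_age <- split(toy, toy$age_group)

# Pull out the two groups of interest by label.
age_13_17 <- by_age[["13-17"]]
age_22_50 <- by_age[["22-50"]]

nrow(age_13_17)
nrow(age_22_50)
```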
2.2.14 Data viz by age groups
Create density plots and box plots of the expenditures stratified by ethnicity for each of the age groups separately. Comment on the distribution shapes. Are there any outliers?
2.2.15 t-test conditions: age groups
Are the conditions for a t-test comparing the mean expenditures of the two ethnicities satisfied for either or both of the age groups?
2.2.16 Summary stats: age groups
Calculate the means, standard deviations, and sample sizes for the expenditures stratified by ethnicity and the age groups, and compare them to the ones in the book. Which group had a larger mean?
2.2.17 t-test: age groups
Run the appropriate statistical tests for both age groups in R to verify the test statistics, df’s, and p-values in the text. In which order were the differences in means calculated, and are they the same as in the book? Use inline R code to pull these values from the test output when writing up your comparison of these values to the book’s values.
2.2.18 CI: age groups
What are the 95% CIs for each of the age groups? Write interpretations of the CIs in the context of the research question. Do they suggest there are differences in expenditures between the two ethnicities? Why or why not?
2.2.19 Discrimination in DDS expenditures?
Even though the p-values for the age-stratified tests were not significant, is it possible that there was discrimination in DDS expenditures?