HW 6: BSTA 511-611 F23

Author

Your name here - update this!!!!

Published

November 11, 2023

Updated 11/6/23: moved Day 12 questions to HW 7

Due 11/11/23

Download the .qmd file for this assignment from https://github.com/niederhausen/BSTA_511_F23/blob/main/homework/HW_6_F23_bsta511.qmd

Directions

Please upload your homework to Sakai. Upload both your .qmd code file and the rendered .html file.
For each question, make sure to include all code and resulting output in the html file to support your answers.

R & LaTeX code

See the .qmd files with the code from class notes for LaTeX and R code.
The LaTeX code will make it easier to show your work in computations.

Tip

It is a good idea to render your document from time to time as you go along! Rendering automatically saves your .qmd file, and rendering frequently helps you catch errors more quickly.

Hypothesis test & CI instructions

Important

For book exercises, make sure to include all steps in a hypothesis test (where applicable) as outlined in the class notes.
Do not forget to include a discussion on whether you think the test (or CI) conditions have been satisfied. Are there assumptions you need to make in order for them to be satisfied? Whether you believe they are satisfied or not, continue to run the hypothesis test (or CI) as instructed.

Book exercises

5.26 Egg volume

5.34 Placebos without deception

1 PSS

1.1 PSS1: 4.22 Testing for food safety.

Do exercise 4.22 from textbook.

1.2 PSS2: Auto exhaust and lead exposure revisited.

1.2.1 Power

In exercise 5.12, we tested whether police officers appear to have been exposed to a higher concentration of lead than 35. Calculate the power for the hypothesis test and include an interpretation of the power in the context of the research question. Was it sufficiently powered?
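As a minimal sketch of a power calculation for a one-sample, one-sided t-test, the block below uses base R's `power.t.test()`. All input values are hypothetical placeholders, not the values from exercise 5.12, and the class notes may use a different function or package.

```r
# Hypothetical inputs for illustration only; replace with the exercise's values.
n_obs <- 40   # sample size (assumed)
xbar  <- 52   # sample mean (assumed)
mu0   <- 50   # null hypothesis value (assumed)
s     <- 10   # sample standard deviation (assumed)

# Power of a one-sample, one-sided t-test at alpha = 0.05
pow <- power.t.test(
  n = n_obs,
  delta = xbar - mu0,        # difference the test should detect
  sd = s,
  sig.level = 0.05,
  type = "one.sample",
  alternative = "one.sided"
)

pow$power  # estimated probability of rejecting H0 when the true difference is delta
```

Leaving out exactly one of `n`, `delta`, `sd`, `power`, or `sig.level` tells `power.t.test()` which quantity to solve for.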

1.2.2 Sample size

For the same test, what sample size would be needed for 80% power? How about 90% power? Would it be reasonable to conduct the study with these sample sizes? Why or why not?
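One way to solve for sample size is to give `power.t.test()` a target power and leave `n` unspecified. The sketch below assumes a hypothetical difference and standard deviation; the helper name `n_for_power` is made up for this illustration.

```r
# Hypothetical inputs for illustration only; replace with the exercise's values.
delta_assumed <- 2    # difference between the assumed true mean and the null value
sd_assumed    <- 10   # assumed standard deviation

# Hypothetical helper: smallest whole-number n that reaches the target power
n_for_power <- function(target_power) {
  res <- power.t.test(
    power = target_power,
    delta = delta_assumed,
    sd = sd_assumed,
    sig.level = 0.05,
    type = "one.sample",
    alternative = "one.sided"
  )
  ceiling(res$n)  # round up, since sample sizes must be whole numbers
}

n_80 <- n_for_power(0.80)
n_90 <- n_for_power(0.90)
c(power_80 = n_80, power_90 = n_90)
```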

1.2.3 Effect size

Suppose the study has resources to include 30 people. What minimum effect size would they be able to detect with 85% power, assuming the same sample mean and standard deviation? Use \(\alpha = 0.05\).

1.2.4 2-sided vs. 1-sided

Continuing with the previous question, what happens to the effect size they can detect if the test is two-sided instead of one-sided?
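To compare detectable differences under one-sided and two-sided alternatives, `power.t.test()` can solve for `delta` when `n` and `power` are supplied. This is a sketch with an assumed standard deviation; the helper name `detectable_delta` is hypothetical, and the result is a raw difference that can be divided by the standard deviation to get a standardized effect size.

```r
# Hypothetical standard deviation for illustration only.
sd_assumed <- 10

# Hypothetical helper: smallest difference detectable with n = 30 and 85% power
detectable_delta <- function(alt) {
  power.t.test(
    n = 30,
    power = 0.85,
    sd = sd_assumed,
    sig.level = 0.05,
    type = "one.sample",
    alternative = alt
  )$delta
}

delta_one_sided <- detectable_delta("one.sided")
delta_two_sided <- detectable_delta("two.sided")

# Standardized effect sizes (difference divided by the standard deviation)
c(one_sided = delta_one_sided, two_sided = delta_two_sided) / sd_assumed
```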

2 R exercises

2.1 Load all the packages you need below here.

2.2 R1: DDS expenditures by ethnicity

In these exercises you will use R to work through the discrimination in developmental disability support example from Section 5.3.4 (pg. 253) in the textbook.
The data are in the oibiostats package and are called dds.discr.

2.2.1 New dataset

Create a new dataset that only includes the White (non-Hispanic) and Hispanic ethnicities. Use this new dataset for the following questions.
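A general pattern for restricting a data frame to selected factor levels is sketched below in base R on a small made-up data frame. The column names and group labels are placeholders, not the actual names in dds.discr, and a dplyr `filter()` would work equally well.

```r
# Made-up data for illustration only; not the dds.discr data.
toy <- data.frame(
  ethnicity = factor(c("Group A", "Group B", "Group C",
                       "Group A", "Group B", "Group C")),
  expenditures = c(1200, 3400, 560, 8900, 15000, 430)
)

keep <- c("Group A", "Group B")

# Keep only the rows in the chosen groups, then drop the now-empty factor level
toy_two <- droplevels(subset(toy, ethnicity %in% keep))

table(toy_two$ethnicity)
```

Dropping unused levels keeps the removed groups from showing up as empty categories in later tables and plots.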

2.2.2 Data viz

Create density plots and box plots of the expenditures stratified by ethnicity. Comment on the distribution shapes. Are there any outliers?

2.2.3 t-test conditions

Are the conditions for a t-test comparing the mean expenditures of the two ethnicities satisfied?

2.2.4 Log-transformation

The book recommends log-transforming the expenditure values before testing. Create a new column in the dataset with the transformed values. The R command for the natural logarithm is log().

2.2.5 Data viz: log-transformed expenditures

Create density plots and box plots of the log-transformed expenditures stratified by ethnicity. Comment on the distribution shapes. Are there any outliers?

2.2.6 t-test conditions: log-transformed expenditures

Are the conditions for a t-test comparing the mean log-transformed expenditures of the two ethnicities satisfied?

2.2.7 Summary stats: log-transformed expenditures

Calculate the means, standard deviations, and sample sizes for the log-transformed expenditures stratified by ethnicity, and compare them to the ones in the book. Which group had a larger mean?
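The log-transform and grouped-summary steps can be sketched in base R as follows. The data frame and its column names are made up for illustration; the class notes may use dplyr's `group_by()` and `summarize()` instead.

```r
# Made-up data for illustration only; not the dds.discr data.
toy <- data.frame(
  group = rep(c("A", "B"), each = 4),
  value = c(120, 450, 3000, 800, 95, 15000, 2200, 640)
)

# New column with natural-log-transformed values
toy$log_value <- log(toy$value)

# Mean, standard deviation, and sample size of the log values within each group
summary_tbl <- data.frame(
  group = sort(unique(toy$group)),
  mean  = as.vector(tapply(toy$log_value, toy$group, mean)),
  sd    = as.vector(tapply(toy$log_value, toy$group, sd)),
  n     = as.vector(tapply(toy$log_value, toy$group, length))
)

summary_tbl
```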

2.2.8 Test

Run the appropriate statistical test in R to verify the test statistic in the text and get the actual p-value. In which order was the difference in means calculated, and is this the same as in the book? Use inline R code to pull these values from the test output when writing up your comparison of these values to the book’s values.
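The sketch below shows how values can be pulled out of a `t.test()` result for use in inline R code. It runs on simulated data with made-up group labels, and it assumes the default Welch (unequal variances) two-sample test is the intended one.

```r
# Simulated data for illustration only; not the dds.discr data.
set.seed(511)
toy <- data.frame(
  group = factor(rep(c("A", "B"), each = 20)),
  y = c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 11, sd = 3))
)

# Welch two-sample t-test (R's default when var.equal is not set)
res <- t.test(y ~ group, data = toy)

t_stat   <- unname(res$statistic)  # test statistic
df_welch <- unname(res$parameter)  # degrees of freedom
p_val    <- res$p.value            # p-value
ci       <- res$conf.int           # 95% confidence interval (default level)

# With the formula interface, the difference is (first factor level) - (second level)
diff_means <- unname(res$estimate[1] - res$estimate[2])

levels(toy$group)  # shows which group comes first
```

In the write-up, an expression such as `round(t_stat, 2)` can then be placed in inline R code so the reported value always matches the test output.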

2.2.9 df

How do the degrees of freedom (df) from the hypothesis test compare to the df used by the book? Why are they different? Which df (book vs. test output) leads to a larger p-value?
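To see how the choice of df affects a p-value, the same test statistic can be evaluated against t distributions with different df using `pt()`. The numbers below are hypothetical placeholders; substitute the test statistic and the two df values from your own output and from the book.

```r
# Hypothetical values for illustration only.
t_stat   <- 2.1    # test statistic (assumed)
df_small <- 19     # e.g., a conservative df (assumed)
df_large <- 35.4   # e.g., a non-integer df from software output (assumed)

# Two-sided p-values for the same statistic under each df
p_small_df <- 2 * pt(-abs(t_stat), df = df_small)
p_large_df <- 2 * pt(-abs(t_stat), df = df_large)

c(df_small = p_small_df, df_large = p_large_df)
```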

2.2.10 CI

What is the 95% CI? Write an interpretation of the CI in the context of the research question.

2.2.11 Test original expenditure values

Run the appropriate statistical test in R using the original expenditure values. What are the test statistic and p-value? Does the conclusion of the test change?

2.2.12 CI using original expenditure values

What is the 95% CI? Write an interpretation of the CI in the context of the research question. Which of the CIs (log-transformed vs. not) is easier to interpret?

2.2.13 Age groups

The book’s example goes on to analyze the data stratified by age groups, since age is a confounder in expenditure amounts. Create two new datasets restricted to the age groups 13-17 and 22-50, respectively.

2.2.14 Data viz by age groups

Create density plots and box plots of the expenditures stratified by ethnicity for each of the age groups separately. Comment on the distribution shapes. Are there any outliers?

2.2.15 t-test conditions: age groups

Are the conditions for a t-test comparing the mean expenditures of the two ethnicities satisfied for either or both of the age groups?

2.2.16 Summary stats: age groups

Calculate the means, standard deviations, and sample sizes for the expenditures stratified by ethnicity and the age groups, and compare them to the ones in the book. Which group had a larger mean?

2.2.17 t-test: age groups

Run the appropriate statistical tests for both age groups in R to verify the test statistics, df’s, and p-values in the text. In which order were the differences in means calculated, and are they the same as in the book? Use inline R code to pull these values from the test output when writing up your comparison of these values to the book’s values.

2.2.18 CI: age groups

What are the 95% CIs for each of the age groups? Write interpretations of the CIs in the context of the research question. Do they suggest there are differences in expenditures between the two ethnicities? Why or why not?

2.2.19 Discrimination in DDS expenditures?

Even though the p-values for the age-stratified tests were not significant, is it possible that there was discrimination in DDS expenditures?