Compare means among groups

Remember you should

Overview

This practice reviews the Compare means among groups lecture.

Examples

We will run ANOVA’s using the lm function to connect them to other test. First, build the model

iris_anova <- lm(Sepal.Length~Species, iris)

Then use the object it created to test assumptions

par(mfrow = c(2,2))
plot(iris_anova)

If assumptions are met, check the p-value using the summary or Anova function.

summary(iris_anova)

Call:
lm(formula = Sepal.Length ~ Species, data = iris)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6880 -0.3285 -0.0060  0.3120  1.3120 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         5.0060     0.0728  68.762  < 2e-16 ***
Speciesversicolor   0.9300     0.1030   9.033 8.77e-16 ***
Speciesvirginica    1.5820     0.1030  15.366  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5148 on 147 degrees of freedom
Multiple R-squared:  0.6187,    Adjusted R-squared:  0.6135 
F-statistic: 119.3 on 2 and 147 DF,  p-value: < 2.2e-16
library(car)
Warning: package 'car' was built under R version 4.4.1
Loading required package: carData
Anova(iris_anova, type = "III")
Anova Table (Type III tests)

Response: Sepal.Length
             Sum Sq  Df F value    Pr(>F)    
(Intercept) 1253.00   1 4728.16 < 2.2e-16 ***
Species       63.21   2  119.26 < 2.2e-16 ***
Residuals     38.96 147                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If the overall test is significant, carry out post hoc tests (Tukey shown here for all pairs, as most common)

library(multcomp)
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data
Loading required package: MASS

Attaching package: 'TH.data'
The following object is masked from 'package:MASS':

    geyser
compare_cont_tukey <- glht(iris_anova, linfct = mcp(Species = "Tukey"))
summary(compare_cont_tukey)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = Sepal.Length ~ Species, data = iris)

Linear Hypotheses:
                            Estimate Std. Error t value Pr(>|t|)    
versicolor - setosa == 0       0.930      0.103   9.033   <1e-08 ***
virginica - setosa == 0        1.582      0.103  15.366   <1e-08 ***
virginica - versicolor == 0    0.652      0.103   6.333   <1e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Note a special case of an ANOVA (which is a special case of a linear model) is when we only are comparing 2 populations. When this happens, we have a t-test. For example, we can compare only I. virginica and I. setosa.

not_versicolor <- iris[iris$Species != "versicolor",]
t.test(Sepal.Length ~ Species, not_versicolor)

    Welch Two Sample t-test

data:  Sepal.Length by Species
t = -15.386, df = 76.516, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group virginica is not equal to 0
95 percent confidence interval:
 -1.78676 -1.37724
sample estimates:
   mean in group setosa mean in group virginica 
                  5.006                   6.588 

If you compare lm and ANOVA output, it may look slightly different.

summary(lm(Sepal.Length ~ Species, not_versicolor))

Call:
lm(formula = Sepal.Length ~ Species, data = not_versicolor)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6880 -0.2880 -0.0060  0.2985  1.3120 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)        5.0060     0.0727   68.85   <2e-16 ***
Speciesvirginica   1.5820     0.1028   15.39   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5141 on 98 degrees of freedom
Multiple R-squared:  0.7072,    Adjusted R-squared:  0.7042 
F-statistic: 236.7 on 1 and 98 DF,  p-value: < 2.2e-16

That’s because by default the t-test does not assume equal variances. We can fix that.

t.test(Sepal.Length ~ Species, not_versicolor, var.equal=T)

    Two Sample t-test

data:  Sepal.Length by Species
t = -15.386, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group virginica is not equal to 0
95 percent confidence interval:
 -1.786042 -1.377958
sample estimates:
   mean in group setosa mean in group virginica 
                  5.006                   6.588 

but in this case it has minimal impact.

If assumptions are not met, we can use the Kruskal Wallis non-parametric test and associated post hoc tests.

kruskal.test(Sepal.Length ~ Species, data = iris)

    Kruskal-Wallis rank sum test

data:  Sepal.Length by Species
Kruskal-Wallis chi-squared = 96.937, df = 2, p-value < 2.2e-16
pairwise.wilcox.test(iris$Sepal.Length, 
                          iris$Species, 
                          p.adjust.method="holm")

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  iris$Sepal.Length and iris$Species 

           setosa  versicolor
versicolor 1.7e-13 -         
virginica  < 2e-16 5.9e-07   

P value adjustment method: holm 

The 2-sample connected test here is a Wilcoxon test (also known as the Mann-Whitney test or Mann-Whitney-Wilcox test) (and why we used pairwise.wilcox.test earrlier).

wilcox.test(Sepal.Length ~ Species, not_versicolor, var.equal=T)

    Wilcoxon rank sum test with continuity correction

data:  Sepal.Length by Species
W = 38.5, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

We can also use a bootstrap alternative

library(WRS2)
t1waybt(Sepal.Length~Species, iris)
Call:
t1waybt(formula = Sepal.Length ~ Species, data = iris)

Effective number of bootstrap samples was 599.

Test statistic: 111.9502 
p-value: 0 
Variance explained: 0.716 
Effect size: 0.846 
bootstrap_post_hoc <- mcppb20(Sepal.Length~Species, iris)
p.adjust(as.numeric(bootstrap_post_hoc$comp[,6]), "holm")
[1] 0 0 0

For 2 groups, the boot.t.test function in the MKinfer package is also an option.

library(MKinfer)
boot.t.test(Sepal.Length ~ Species, not_versicolor)

    Bootstrap Welch Two Sample t-test

data:  Sepal.Length by Species
number of bootstrap samples:  9999
bootstrap p-value < 1e-04 
bootstrap difference of means (SE) = -1.582342 (0.10144) 
95 percent bootstrap percentile confidence interval:
 -1.778 -1.382

Results without bootstrap:
t = -15.386, df = 76.516, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.78676 -1.37724
sample estimates:
   mean in group setosa mean in group virginica 
                  5.006                   6.588 

A final option (shown here for 2 groups) is to use permutation tests. These tests work by rearranging the data in all (theoretically) possible ways and calculating signals for each permutation. You can then compare what we actually found to that to obtain p-values. Often times we can’t find all the combinations there are too many!), but a good sample (10,000+) should work. The function that we’ll use for this is independence_test from the coin package. It uses the same arguments as the t-test (including the formula interface).

library(coin)
independence_test(Sepal.Length ~ Species, not_versicolor)

    Asymptotic General Independence Test

data:  Sepal.Length by Species (setosa, virginica)
Z = -8.3675, p-value < 2.2e-16
alternative hypothesis: two.sided

Swirl lesson

Swirl is an R package that provides guided lessons to help you learn and review material. These lessons should serve as a bridge between all the code provided in the slides and background reading and the key functions and concepts from each lesson. A full course lesson (all lessons combined) can also be downloaded using the following instructions.

THIS IS ONE OF THE FEW TIMES I RECOMMEND WORKING DIRECTLY IN THE CONSOLE! THERE IS NO NEED TO DEVELOP A SCRIPT FOR THESE INTERACTIVE SESSIONS, THOUGH YOU CAN!

  • install the “swirl” package

  • run the following code once on the computer to install a new course

    library(swirl)
    install_course_github("jsgosnell", "JSG_swirl_lessons")
  • start swirl!

    swirl()
    • swirl()
  • then follow the on-screen prompts to select the JSG_swirl_lessons course and the lessons you want

    • Here we will focus on the Compare means among groups lesson
  • TIP: If you are seeing duplicate courses (or odd versions of each), you can clear all courses and then re-download the courses by

    • exiting swirl using escape key or bye() function

      bye()
    • uninstalling and reinstalling courses

      uninstall_all_courses()
      install_course_github("jsgosnell", "JSG_swirl_lessons")
    • when you restart swirl with swirl(), you may need to select

      • No. Let me start something new

Just for practice

For the following questions, you will use multiple methods to analyze a single dataset. This is for practice and so you can compare! For actual analysis you should only use the best method!

1

Use the iris dataset in R to determine if petal length differs among species. Do this problem using the following methods

  • ANOVA

  • Kruskal-Wallis

  • bootstrapping

Make sure you can plot the data and carry out multiple comparison methods as needed. Also be sure to understand the use of coefficients and adjusted R2 values and where to find them.

2

Data on plant heights (in cm) for plants grown with a new and old formulation of fertilizer can be found at

https://docs.google.com/spreadsheets/d/e/2PACX-1vSUVowOKlmTic4ekL7LSbwDcqrsDSXv5K_c4Qyfcvz1lLE1_iINmGzy0zMGxY7z5DImlUErK4S2wY7Y/pub?gid=0&single=true&output=csv.

Analyze this data using the t.test function and the lm function to convince yourself that t-tests are special cases of ANOVAs, which are special cases of linear models!

For the following questions, pick the appropriate method for analyzing the question. Use a plot of the data and/or model analysis to justify your decision. Make sure you can carry out multiple comparison methods as needed. Also be sure to understand the use of coefficients and adjusted R2 values and where to find them.

For the following questions, pick the appropriate method for analyzing the question. Use a plot of the data and/or model analysis to justify your decision. Make sure you can carry out multiple comparison methods as needed. Also be sure to understand the use of coefficients and adjusted R2 values and where to find them.

3

Data on sugar cane yield for multiple fields is available using

read.table(“https://docs.google.com/spreadsheets/d/e/2PACX-1vRjstKreIM6UknyKFQCtw2_Q6itY9iOAVWO1hUNZkBFL8mwVssvTevqgzV22YDKCUeJq0HBDrsBrf5O/pub?gid=971470377&single=true&output=tsv”, header = T, stringsAsFactors = T)

More info on the data can be found at http://www.statsci.org/data/oz/cane.html. Is there evidence that location (DistrictPosition column) impacts yield (Tonn.Hect column)? If so, which areas are driving this distance?

4

Data on FEV (forced expiratory volume), a measure of lung function, can be found at

http://www.statsci.org/data/general/fev.txt

More information on the dataset is available at

http://www.statsci.org/data/general/fev.html.

Is there evidence that FEV depends on gender? If so, which gender has the higher FEV score? How much variance does gender explain?

5

The following data are human blood clotting times (in minutes) of individuals given one of two different drugs.

Drug B Drug G
8.8 9.9
8.4 9.0
7.9 11.1
8.7 9.6
9.1 8.7
9.6 10.4
9.5

Test the hypothesis that the mean clotting times are equal for the two groups

  • Estimating the variance from the data
  • Using rank transform analysis
  • Using a permutation test
  • Using a bootstrap test

Test the hypothesis that the mean clotting times are equal for the two groups

  • Estimating the variance from the data
  • Using rank transform analysis
  • Using a permutation test
  • Using a bootstrap test

6

(Example from Handbook on Biological Statistics) Odd (stunted, short, new) feathers were compared in color to typical feathers in Northern Flickers (Colaptes auratus) (Wiebe and Bortolotti 2002) . Data is at

https://raw.githubusercontent.com/jsgosnell/CUNY-BioStats/master/datasets/wiebe_2002_example.csv

Test the hypothesis that odd and typical feathers did not differ using

  • a Student’s t test and/or lm
  • a rank test
  • bootstrapping

Note we will return to this question next week!

7

Find an example of a one-way ANOVA from a paper that is related to your research or a field of interest (hint: use Google Scholar or another search engine to search for a keyword of interest plus ANOVA). Make sure you understand the connections between the methods, results, and graphs. Briefly answer the following questions

  • What was the dependent variable?
  • What were the independent variables?
  • Was there a difference in mean values among the groups? If so, how did the paper address or explain this finding.

References

Wiebe, Karen L., and Gary R. Bortolotti. 2002. “Variation in Carotenoid-Based Color in Northern Flickers in a Hybrid Zone.” The Wilson Bulletin 114 (3): 393–400. http://www.jstor.org/stable/4164474.