More ANOVAs

Dealing with multiple group membership and interactions

In the last chapter we introduced the idea of comparing means among populations (one-way ANOVAs, our first linear models). However, the units that we measure may belong to multiple groups. We will extend our analysis of variance to consider multiple group membership and interactions in this chapter. As a starting point, consider that group membership may be an inherent property of the unit we measure or we may assign it.

Example: Back to the birds

One of the last chapters practice problems focused bird feathers. While studying feather color in Northern flickers (Colaptes auratus), Wiebe and Bortolotti (2002) noted that ~25% of birds had one or more “odd” tail feathers. They decided to compare the color of these odd and “typical” feathers.

Northern Flicker. Mike’s Birds, CC BY-SA 2.0 <https://creativecommons.org/licenses/by-sa/2.0>, via Wikimedia Commons

Example and data provided by McDonald (2014).

library(rmarkdown)
paged_table(feather)

How do we analyze this data?

We may first note that we have a continuous measurement (feather color, measured using color hues from a digital camera and another statistical technique that we will not go into here) and a categorical variable (feather type, with levels “typical” and “odd”). This hopefully reminds you of an ANOVA/t-test!

We could plot the data

library(ggplot2)
ggplot(feather, aes(x=Feather_type, y= Color_index, color=Feather_type))+
  geom_jitter()+
  labs(y= "Color index",
       x= "Feather type",
       title="Comparing odd and typical feathers in Northern flickers")+
  guides(color=F)
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.

Develop a set of hypotheses:

\[ \begin{split} H_O: \mu_{\textrm{odd feather color}} = \mu_{\textrm{typical feather color}}\\ H_A: \mu_{\textrm{odd feather color}} \neq \mu_{\textrm{typical feather color}}\\ \end{split} \]

and test them using a t-test:

t.test(Color_index ~ Feather_type, data=feather)

    Welch Two Sample t-test

data:  Color_index by Feather_type
t = -3.56, df = 29.971, p-value = 0.00126
alternative hypothesis: true difference in means between group Odd and group Typical is not equal to 0
95 percent confidence interval:
 -0.21579254 -0.05845746
sample estimates:
    mean in group Odd mean in group Typical 
            -0.176125             -0.039000 

or, using more generalizable functions, a linear model:

library(car)
Anova(lm(Color_index ~ Feather_type, data=feather), type = "III")
Anova Table (Type III tests)

Response: Color_index
              Sum Sq Df F value    Pr(>F)    
(Intercept)  0.49632  1  41.816 3.814e-07 ***
Feather_type 0.15043  1  12.674  0.001259 ** 
Residuals    0.35607 30                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We find a significant p value, but we did not check assumptions. For linear models (remember, $i.i.d. N(,)$, we could use our visual checks

plot(lm(Color_index ~ Feather_type, data=feather))

Which appears ok, but there is a problem.

Our data are not independent!

Lack of Independence

Odd and typical feathers were measured on a single bird (note the Bird column) in the dataset. We might assume feathers on a given bird are more closely related in color than feathers on different birds. This could be due to diet or other factors making all feathers on a given bird brighter or darker than those on another. Regardless of reason (and “good” p value), we know the measurements are linked in some way. Note we could “connect” individual observations.

ggplot(feather, aes(x=Feather_type, y= Color_index, color=Feather_type, group=Bird))+
  geom_line(position = position_dodge(0.4), color="black") +
  geom_point(position = position_dodge(0.4)) +  
  labs(y= "Color index",
       x= "Feather type",
       title="Comparing odd and typical feathers in Northern flickers")+
  guides(color=F)

This may also occur if we measure outcomes with-in a single unit (e.g., a study of fertilizer impacts using multiple fields) or over time (e.g., before/after studies). Regardless of the reason, when our experimental design has led to measurements being connected/not independent, we need to consider these connections in order to properly note (and sometimes even observe) impacts of focal variables.

Blocking, two-way ANOVAs,and paired t-tests

In this case, the connections may be considered artifacts of the data. We didn’t assign birds. We also made a choice to compare odd and typical feathers from the same bird - why? In general, accounting for extra variation in the data will give you a better answer about how a given variable influences outcomes. This may be called blocking. Although the motivation might therefore be to get a “better” p value, it should be driven by experimental design (and thus we started with an example where we didn’t “need” to account for it to achieve significance).

In order to consider how color differs by bird and feather type, we need to add both variables to our linear model. For each variable we add, we also add a null (and corresponding alternative) hypothesis. So we retain our focus on feather type:

\[ \begin{split} H_O: \mu_{\textrm{odd feather color}} = \mu_{\textrm{typical feather color}}\\ H_A: \mu_{\textrm{odd feather color}} \neq \mu_{\textrm{typical feather color}}\\ \end{split} \]

but also add a set of hypotheses focused on birds:

\[ \begin{split} H_O: \mu_{\textrm{color of bird A}} = \mu_{\textrm{color of bird B}}....\textrm{for all birds}\\ H_A: \mu_{\textrm{color of bird A}} \neq \mu_{\textrm{color of bird B}}....\textrm{for all birds}\\ \end{split} \]

We can analyze this using our linear model approach. This is possible because, as we noted earlier, we can subdivide variance among multiple levels. Under the hood, the linear model approach build a model matrix that considers the impact of feather type and bird on outcomes. Since both variables are categorical, this is often called a two-way ANOVA. First, let’s make the object

two_way_anova_example <- lm(Color_index ~ Feather_type + Bird, data=feather)
You can see the new model matrix and coefficients if you want

Note the model matrix now includes columns for feather type and bird (lots of dummy variables, and now intercept is Bird A’s odd feather!). The \(\beta\) matrix of coefficients has corresponding estimates.

library(rmarkdown)
paged_table(data.frame(model.matrix(two_way_anova_example)))
matrix(as.numeric(two_way_anova_example$coefficients), ncol=1)
            [,1]
 [1,] -0.3580625
 [2,]  0.1371250
 [3,]  0.0905000
 [4,]  0.0450000
 [5,]  0.1250000
 [6,]  0.2535000
 [7,]  0.2575000
 [8,]  0.1500000
 [9,]  0.2525000
[10,]  0.2885000
[11,]  0.2150000
[12,]  0.2530000
[13,]  0.1445000
[14,]  0.1365000
[15,]  0.2190000
[16,]  0.2530000
[17,]  0.2275000

So for our first observation, which is

feather[1,]
  Bird Feather_type Color_index
1    A      Typical      -0.255

Our estimate is the intercept (since it’s bird A) and the typical feather:

model.matrix(two_way_anova_example)[1,] %*% matrix(as.numeric(two_way_anova_example$coefficients), ncol=1)
           [,1]
[1,] -0.2209375

and thus our residual is

two_way_anova_example$residuals[1]
         1 
-0.0340625 

which is the same as

feather[1,]$Color_index-model.matrix(two_way_anova_example)[1,] %*% matrix(as.numeric(two_way_anova_example$coefficients), ncol=1)
           [,1]
[1,] -0.0340625

Then check the assumptions

plot(two_way_anova_example)

Note, visually speaking, the residuals do appear to be closer to normal now. Since assumptions look ok, we can analyze the outcome

summary(two_way_anova_example)

Call:
lm(formula = Color_index ~ Feather_type + Bird, data = feather)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.12444 -0.05209  0.00000  0.05209  0.12444 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -0.35806    0.06955  -5.148 0.000119 ***
Feather_typeTypical  0.13712    0.03374   4.065 0.001017 ** 
BirdB                0.09050    0.09542   0.948 0.357936    
BirdC                0.04500    0.09542   0.472 0.643998    
BirdD                0.12500    0.09542   1.310 0.209903    
BirdE                0.25350    0.09542   2.657 0.017950 *  
BirdF                0.25750    0.09542   2.699 0.016505 *  
BirdG                0.15000    0.09542   1.572 0.136802    
BirdH                0.25250    0.09542   2.646 0.018330 *  
BirdI                0.28850    0.09542   3.023 0.008554 ** 
BirdJ                0.21500    0.09542   2.253 0.039643 *  
BirdK                0.25300    0.09542   2.651 0.018139 *  
BirdL                0.14450    0.09542   1.514 0.150719    
BirdM                0.13650    0.09542   1.431 0.173069    
BirdN                0.21900    0.09542   2.295 0.036567 *  
BirdO                0.25300    0.09542   2.651 0.018139 *  
BirdP                0.22750    0.09542   2.384 0.030759 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09542 on 15 degrees of freedom
Multiple R-squared:  0.7304,    Adjusted R-squared:  0.4427 
F-statistic: 2.539 on 16 and 15 DF,  p-value: 0.03923
Anova(two_way_anova_example, type= "III")
Anova Table (Type III tests)

Response: Color_index
              Sum Sq Df F value   Pr(>F)    
(Intercept)  0.24133  1 26.5059 0.000119 ***
Feather_type 0.15043  1 16.5214 0.001017 ** 
Bird         0.21950 15  1.6072 0.184180    
Residuals    0.13657 15                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note we see a significant difference in color among birds and feather type. Although we may be tempted to (and could) use post-hoc tests to consider which birds are different than which others, this is typically not done for blocked variables. We did not assign these pairings and it is not the focus of our efforts.

Since we only had 2 types of feathers, we also don’t need post-hoc tests. A significant p value means they differ from each other, and the estimates provided by the summary command indicate the typical feathers have a higher color index.

t-test connections

When we have only two measurements per group (e.g., odd and typical feathers from each bird), we can use a t-test approach to achieve similar goals. This approach is known as a paired t-test. Instead of focusing on the difference in means (like a 2-sample t-test), the test focuses on the mean difference between paired measurements (which would be 0 under the null hypothesis!). In this way, it is effectively a one-sample test that is pairing the data to reduce variation (blocking). We can do carry out the test:

t.test(Color_index ~ Feather_type, data=feather, paired=TRUE)

    Paired t-test

data:  Color_index by Feather_type
t = -4.0647, df = 15, p-value = 0.001017
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -0.20903152 -0.06521848
sample estimates:
mean difference 
      -0.137125 

and get the same results as above (note we don’t even have to consider corrections like the Welch approach since this a one-sample test). Common examples of paired t-tests include before-after and twin studies.

In an earlier chapters we considered options for one- and two-sample tests when t-tests assumptions were not met. For two-sample tests, one of these approaches, the sign or binary test, is only valid for paired data. The differences in paired observations are compared to a set value (typically 0). Under the null hypothesis, half should be below the proposed median and half should be above. Differences matching the proposed value are ignored, thus reducing the sample size and making it harder to reject the null hypothesis; this is actually an odd way of accounting for them. The proportion of values below the proposed median is then evaluated using a binomial test. For two sample, the SIGN.test function in the BSDA package requires 2 columns of data and assumes the order of the column represents paired data.

library(BSDA)
SIGN.test(feather[feather$Feather_type == "Odd", "Color_index"], 
          feather[feather$Feather_type == "Typical", "Color_index"],
          md = 0)

    Dependent-samples Sign-Test

data:  feather[feather$Feather_type == "Odd", "Color_index"] and feather[feather$Feather_type == "Typical", "Color_index"]
S = 3, p-value = 0.02127
alternative hypothesis: true median difference is not equal to 0
95 percent confidence interval:
 -0.24048275 -0.02331055
sample estimates:
median of x-y 
       -0.114 

Achieved and Interpolated Confidence Intervals: 

                  Conf.Level  L.E.pt  U.E.pt
Lower Achieved CI     0.9232 -0.2400 -0.0320
Interpolated CI       0.9500 -0.2405 -0.0233
Upper Achieved CI     0.9787 -0.2410 -0.0140

More than 2 measurements? Back to the linear model

We can also block for variation when we take more than 2 measurements per unit. For example, imagine if these birds also had a special, long tail feather.

set.seed(25)
special <- data.frame(Bird = LETTERS[1:16], Feather_type = "Special", 
                      Color_index= feather[feather$Feather_type == "Typical", "Color_index"] +
                        .3 +runif(16,1,1)*.01)
feather_extra <- merge(feather, special, all = T)
feather_extra$Feather_type <- factor(feather_extra$Feather_type)

We could still block for variation using the linear model/ANOVA, but not the t-test, approach. As another review, we create the model

more_blocks <-lm(Color_index ~ Feather_type + Bird, data=feather_extra)

Check assumptions

plot(more_blocks)

Check outcome (this time focusing on Anova output)

Anova(more_blocks, type="III")
Anova Table (Type III tests)

Response: Color_index
              Sum Sq Df  F value    Pr(>F)    
(Intercept)  0.36392  1  59.9538 1.224e-08 ***
Feather_type 1.67906  2 138.3093 7.208e-16 ***
Bird         0.34649 15   3.8055 0.0008969 ***
Residuals    0.18210 30                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We still see feather type has a significant impact on color, but since we have more than 2 groups we need to follow up this finding with a post-hoc test.

library(multcomp)
compare <- glht(more_blocks, linfct = mcp(Feather_type = "Tukey"))
summary(compare)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = Color_index ~ Feather_type + Bird, data = feather_extra)

Linear Hypotheses:
                       Estimate Std. Error t value Pr(>|t|)    
Special - Odd == 0      0.44712    0.02755  16.232   <1e-04 ***
Typical - Odd == 0      0.13712    0.02755   4.978   <1e-04 ***
Typical - Special == 0 -0.31000    0.02755 -11.254   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Other ways to be in multiple groups

In the bird example, one of our categories (bird) was un-intential. We chose to measure odd and typical feathers, and accounting for variation among birds was an appropriate step given lack of independence in measurements. However, we can also assign units to multiple groups (instead of making multiple measurements within one unit).

Consider if we ran an experiment focused on the impact of factors A and B on some trait. We can fully cross the factors in an experiment. Doing so can let us consider the main effects of multiple variables and potential interactions among them in what is often called a factorial ANOVA. For starters, let each factor have only 2 levels, and let the levels be the absence or presence of the factor.

Factor A
Factor B Absent Present
Absent Control Impact of A only
Present Impact of B only Combined impact of A+B
Experimental design notes

For a factorial ANOVA, we need to assign each unit randomly to a level of factor A. Then each level of factor B is randomly assigned to subjects at each level of factor A. This is different than randomly assigning treatments of A and B, as that could lead to outcomes where some level of factor B is not represented in some level of factor A.

We also need multiple units (3+) assigned to each combination.

When both are absent we have a classic control outcome. When one is present and the other absent we see main effects impacts of only one factor. Note we previously analyzed experiments that considered only one factor using ANOVAs or t-tests (linear models), but now we have multiple factors. We should not analyze the main effects of each using 2 one-way ANOVAs. Doing so cuts our data in half, meaning our estimates of variances are less precise and we increase our chance of making a type 1 error. More importantly, we wouldn’t be able to properly consider the combined impacts of A + B. What could these be?

example_interaction <- data.frame(Treatment = c(rep("Control",5),
                                                rep("Impact of A only",5),
                                                rep("Impact of B only",5),
                                                rep("A+B Additive",5), 
                                                rep("A+B Synergistic", 5),
                                                rep("A+B Antagonistic", 5)), 
                                  Cause= rep(c("Control","Factor A","Factor B", "Synergistic", "Antagonistic"), 6),
                                  Impact = c(5,0,0,0,0,
                                             5,2,0,0,0,
                                             5,0,3,0,0,
                                             5,2,3,0,0,
                                             0,0,0,20,0,
                                             0,0,0,0,6))
example_interaction$Treatment <- factor(example_interaction$Treatment, levels=c("Control","Impact of A only","Impact of B only", "A+B Additive", "A+B Synergistic", "A+B Antagonistic"))
example_interaction$Cause <- factor(example_interaction$Cause, levels=c("Control","Factor A","Factor B", "Synergistic", "Antagonistic"))
ggplot(example_interaction, aes(x=Treatment, y= Impact, fill= Cause))+
  geom_col(position = position_stack(reverse = TRUE))+
  theme(axis.text.x = element_text(angle = -45))

As shown in the graph (Inspired by (Fong, Bittick, and Fong 2017)), A and B could have additive effects (where they simply stack), synergistic effects (the combined impact is more than the sum of the two), or antagonistic effects (the combined impacts is less than the sum of the two). Synergistic and antagonistic impacts are both examples of interactions. Interactions occur when the impact of one variable depends on the level of another.

Example: Impacts of grazing and fertilization

We can extend this example to consider more than 2 levels for one or more factors. For example, Valdez et al. (2023) wanted to consider the impact of top-down (snail grazing) and bottom- up (nutrient availability) on marsh plant (Spartina alterniflora) growth. To do this, they assigned plots to one of 3 grazer treatments and one of 2 nitrogen treatments.

Map of study site on Hog Island, Virginia, USA and conceptual illustration of experimental design with the following treatments: 1) Nitrogen addition with ambient snails, 2) nitrogen addition with three times ambient snails, 3) nitrogen addition without snails, 4) ambient nitrogen with ambient snails, 5) ambient nitrogen with three times ambient snails, and 6) ambient nitrogen without snails. The figure also depicts cage controls and uncaged plots used to assess caging effects on marsh plants. The map in the figure was created in R using ggmap [33] from ©OpenStreetMap under a ODb license, with permission from OpenStreetMapFoundation, original copyright 2018. https://www.openstreetmap.org/copyright.

Fig 1 from Valdez et al. 2003. Map and conceptual illustration of experimental design.

This design is different from the bird example. No two measurements for a given trait were taken on the same plot. In this case, we likely care about the main effects, or impacts, of both variables. However, we may also need to consider interactions among the variables. Interactions occur when the impact of one variable depends on the level of another. For example, snail removal might have major impacts on nitrogen-enriched plots while having no impact on ambient plots. Due to this, we now have even more hypotheses:

\[ \begin{split} H_O: \mu_\textrm{plant growth, no fertilizer} = \mu_\textrm{plant growth, fertilizer}\\ H_O: \mu_\textrm{plant growth, snails removed} = \mu_\textrm{plant growth, control snails}= \mu_\textrm{plant growth, snails added}\\ H_O: \textrm{impact of snail grazing does not depend on nitrogen level}\\ \end{split} \]

Fortunately, these are easy to consider in our linear model framework. While not shown here, the model matrix adds columns to note our new interaction terms, and the coefficient matrix estimates them. From an R standpoint, we can include the interaction between two variables using the “:” notation. We’ll focus on below-ground biomass (standardized to m2) for this example (the paper measured 9 response variables!)

valdez_2023 <- read.csv("data/Spartina_alterniflora_traits.csv", stringsAsFactors = T)
bgb_model <-lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023)

For shorthand, note that if we put main effect * main effect in a model, it automatically adds the interaction term. You can see the model summary is the same.

bgb_model_shorthand <-lm(below.biomass.g.meter.sq..m2..~Snail.Level * Nitrogen.level, valdez_2023)
summary(bgb_model)

Call:
lm(formula = below.biomass.g.meter.sq..m2.. ~ Snail.Level + Nitrogen.level + 
    Snail.Level:Nitrogen.level, data = valdez_2023)

Residuals:
     Min       1Q   Median       3Q      Max 
-103.003  -50.933    1.507   28.297  153.937 

Coefficients: (1 not defined because of singularities)
                                                Estimate Std. Error t value
(Intercept)                                       542.16      40.72  13.315
Snail.Levelremoval                                 33.58      57.58   0.583
Snail.Levelsnail addition                         -88.85      57.58  -1.543
Snail.Leveluncaged                                 45.09      57.58   0.783
Nitrogen.levelwithout                            -111.87      57.58  -1.943
Snail.Levelremoval:Nitrogen.levelwithout           43.39      81.44   0.533
Snail.Levelsnail addition:Nitrogen.levelwithout    60.18      81.44   0.739
Snail.Leveluncaged:Nitrogen.levelwithout              NA         NA      NA
                                                Pr(>|t|)    
(Intercept)                                     2.44e-09 ***
Snail.Levelremoval                                0.5691    
Snail.Levelsnail addition                         0.1451    
Snail.Leveluncaged                                0.4467    
Nitrogen.levelwithout                             0.0724 .  
Snail.Levelremoval:Nitrogen.levelwithout          0.6025    
Snail.Levelsnail addition:Nitrogen.levelwithout   0.4721    
Snail.Leveluncaged:Nitrogen.levelwithout              NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 70.53 on 14 degrees of freedom
Multiple R-squared:  0.498, Adjusted R-squared:  0.2829 
F-statistic: 2.315 on 6 and 14 DF,  p-value: 0.09183
summary(bgb_model_shorthand)

Call:
lm(formula = below.biomass.g.meter.sq..m2.. ~ Snail.Level * Nitrogen.level, 
    data = valdez_2023)

Residuals:
     Min       1Q   Median       3Q      Max 
-103.003  -50.933    1.507   28.297  153.937 

Coefficients: (1 not defined because of singularities)
                                                Estimate Std. Error t value
(Intercept)                                       542.16      40.72  13.315
Snail.Levelremoval                                 33.58      57.58   0.583
Snail.Levelsnail addition                         -88.85      57.58  -1.543
Snail.Leveluncaged                                 45.09      57.58   0.783
Nitrogen.levelwithout                            -111.87      57.58  -1.943
Snail.Levelremoval:Nitrogen.levelwithout           43.39      81.44   0.533
Snail.Levelsnail addition:Nitrogen.levelwithout    60.18      81.44   0.739
Snail.Leveluncaged:Nitrogen.levelwithout              NA         NA      NA
                                                Pr(>|t|)    
(Intercept)                                     2.44e-09 ***
Snail.Levelremoval                                0.5691    
Snail.Levelsnail addition                         0.1451    
Snail.Leveluncaged                                0.4467    
Nitrogen.levelwithout                             0.0724 .  
Snail.Levelremoval:Nitrogen.levelwithout          0.6025    
Snail.Levelsnail addition:Nitrogen.levelwithout   0.4721    
Snail.Leveluncaged:Nitrogen.levelwithout              NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 70.53 on 14 degrees of freedom
Multiple R-squared:  0.498, Adjusted R-squared:  0.2829 
F-statistic: 2.315 on 6 and 14 DF,  p-value: 0.09183

You may note a weird NA here (we’ll come back to it), but remember we should really check model assumptions before looking at output.

plot(bgb_model)

These look ok. There may be a slight increase in variance with fitted values, but we can work with this. Let’s build an ANOVA table.

Anova(bgb_model, type="III")
Error in Anova.III.lm(mod, error, singular.ok = singular.ok, ...): there are aliased coefficients in the model

But we got an error! What happened? Let’s look at the data

paged_table(valdez_2023)

A summary may help more. Note we can summarize across multiple factors.

library(Rmisc)
paged_table(summarySE(valdez_2023, measurevar = "below.biomass.g.meter.sq..m2..", groupvars = c("Snail.Level", "Nitrogen.level")))

Note the uncaged treatment only has without for the nitrogen impact. It was a control! While we often need these in experiments, they can create analysis problems. This is because we can’t consider how nutrient level depends on snail treatment for the control level! In other words, interactions can not be calculated for some levels.

This is also why we saw the NA and warnings in our model summary

summary(bgb_model)

Call:
lm(formula = below.biomass.g.meter.sq..m2.. ~ Snail.Level + Nitrogen.level + 
    Snail.Level:Nitrogen.level, data = valdez_2023)

Residuals:
     Min       1Q   Median       3Q      Max 
-103.003  -50.933    1.507   28.297  153.937 

Coefficients: (1 not defined because of singularities)
                                                Estimate Std. Error t value
(Intercept)                                       542.16      40.72  13.315
Snail.Levelremoval                                 33.58      57.58   0.583
Snail.Levelsnail addition                         -88.85      57.58  -1.543
Snail.Leveluncaged                                 45.09      57.58   0.783
Nitrogen.levelwithout                            -111.87      57.58  -1.943
Snail.Levelremoval:Nitrogen.levelwithout           43.39      81.44   0.533
Snail.Levelsnail addition:Nitrogen.levelwithout    60.18      81.44   0.739
Snail.Leveluncaged:Nitrogen.levelwithout              NA         NA      NA
                                                Pr(>|t|)    
(Intercept)                                     2.44e-09 ***
Snail.Levelremoval                                0.5691    
Snail.Levelsnail addition                         0.1451    
Snail.Leveluncaged                                0.4467    
Nitrogen.levelwithout                             0.0724 .  
Snail.Levelremoval:Nitrogen.levelwithout          0.6025    
Snail.Levelsnail addition:Nitrogen.levelwithout   0.4721    
Snail.Leveluncaged:Nitrogen.levelwithout              NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 70.53 on 14 degrees of freedom
Multiple R-squared:  0.498, Adjusted R-squared:  0.2829 
F-statistic: 2.315 on 6 and 14 DF,  p-value: 0.09183

You could note we have the same issue for our initial bird analysis:

two_way_anova_example_int <- lm(Color_index ~ Feather_type * Bird, data=feather)
Anova(two_way_anova_example_int, type="III")
Error in Anova.lm(two_way_anova_example_int, type = "III"): residual df = 0

On a positive note, this means R will typically not consider interactions when it shouldn’t, but you need to know why in order to fix it.

Dealing with controls and missing interactions

To fix this (and deal with controls), we need to consider the data. Valdez et al. (2023) used t-tests (why?) to consider differences between cage and cage control plots (note %in% and the fact they did not focus on above-ground biomass (maybe because uncaged plots had little..). %in% allows you to subset data by matching items to list. Remember you can always get help on functions using something like (we need the quotations for operators!)

?'%in%'
t.test(below.biomass.g.meter.sq..m2..~Snail.Level, valdez_2023[valdez_2023$Snail.Level %in% c("uncaged","control snails") & valdez_2023$Nitrogen.level == "without",])

    Welch Two Sample t-test

data:  below.biomass.g.meter.sq..m2.. by Snail.Level
t = -0.82961, df = 3.9666, p-value = 0.4538
alternative hypothesis: true difference in means between group control snails and group uncaged is not equal to 0
95 percent confidence interval:
 -196.4785  106.3052
sample estimates:
mean in group control snails        mean in group uncaged 
                    430.2967                     475.3833 

If interactions are missing for other reasons (e.g., a set of units failed/died/data was lost), we can either ignore interactions or combine factor levels into a single new treatment variable and analyze using one-way ANOVAs.

Considering interactions

To consider interactions, we can remove the controls

bgb_model_cont_removed <-lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",])

We can consider the assumptions

plot(bgb_model_cont_removed)

and now note …

Anova(bgb_model_cont_removed, type="III")
Anova Table (Type III tests)

Response: below.biomass.g.meter.sq..m2..
                           Sum Sq Df  F value    Pr(>F)    
(Intercept)                881823  1 176.4820 1.545e-08 ***
Snail.Level                 24012  2   2.4028   0.13254    
Nitrogen.level              18771  1   3.7567   0.07648 .  
Snail.Level:Nitrogen.level   2893  2   0.2895   0.75371    
Residuals                   59960 12                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

…that the ANOVA table works.

Before we can interpret this, we need to return to our discussion about different sums of squares.

Back to type = III

Much of this follows an excellent post by . A paper by Venables (n.d.) may also be helpful.

We introduced the various types of sums of squares in the previous chapter, but they didn’t matter much there. However, here they do. This returns to the idea that we can partition the variance of a given point among various factors.In general, we can think about determining the impact by comparing sums of squares from models with and without a factor of interest (using incremental sums of squares). So, for a model

lm(response ~ A + B + A:B)

We could consider the impact of all factors and interactions by comparing:

SS(AB | A, B) = SS(A, B, A:B) – SS(A, B)
SS(A | B, AB) = SS(A, B, A:B) – SS(B, A:B)
SS(B | A, AB) = SS(A, B, A:B) – SS(A, A:B)
SS(A | B) = SS(A, B) – SS(B)
SS(B | A) = SS(A, B) – SS(A)

However, the variance we attribute to each factor can be calculated in multiple ways. First, we can consider type 1 sums of squares. Using this approach, the order factors are entered into the model matters. Following above,

SS(A) for factor A. 
SS(B | A) for factor B. 
SS(A:B | B, A) for interaction A:B

This approach leads to different main effects (overall, general impacts of a factor) for factors A and B depending on the order. Note

summary(aov(bgb_model_cont_removed))
                           Df Sum Sq Mean Sq F value Pr(>F)  
Snail.Level                 2  39024   19512   3.905 0.0494 *
Nitrogen.level              1  26919   26919   5.387 0.0387 *
Snail.Level:Nitrogen.level  2   2893    1447   0.290 0.7537  
Residuals                  12  59960    4997                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

is different from

summary(aov(lm(below.biomass.g.meter.sq..m2..~Nitrogen.level + Snail.Level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",])))
                           Df Sum Sq Mean Sq F value Pr(>F)  
Nitrogen.level              1  26919   26919   5.387 0.0387 *
Snail.Level                 2  39024   19512   3.905 0.0494 *
Nitrogen.level:Snail.Level  2   2893    1447   0.290 0.7537  
Residuals                  12  59960    4997                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

..Wait, it’s the same. That’s because differences only arise when the data are unbalanced! ANOVAs were designed for use in perfectly balanced datasets. We’ll come back to this.

Another way is to consider the impact of main effects, such as

SS(A | B) for factor A. 
SS(B | A) for factor B.

This approach is known as type II. It does not consider interactions (or assumes they are equal to 0)

Anova(bgb_model_cont_removed, type="II")
Anova Table (Type II tests)

Response: below.biomass.g.meter.sq..m2..
                           Sum Sq Df F value  Pr(>F)  
Snail.Level                 39024  2  3.9050 0.04941 *
Nitrogen.level              26919  1  5.3874 0.03869 *
Snail.Level:Nitrogen.level   2893  2  0.2895 0.75371  
Residuals                   59960 12                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

so it works ok here since the interaction is not significant, but it shouldn’t be used when interactions are significant/important. Compare it to

Anova(bgb_model_cont_removed, type="III")
Anova Table (Type III tests)

Response: below.biomass.g.meter.sq..m2..
                           Sum Sq Df  F value    Pr(>F)    
(Intercept)                881823  1 176.4820 1.545e-08 ***
Snail.Level                 24012  2   2.4028   0.13254    
Nitrogen.level              18771  1   3.7567   0.07648 .  
Snail.Level:Nitrogen.level   2893  2   0.2895   0.75371    
Residuals                   59960 12                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

to see the difference. Type III considers interactions.

To make things, more complicated, when looking at type III output, we need to consider how R computes main effects. The default contrast in R looks at the main effect for factor B using only the first level of factor A (specified by the treatment contrasts). If there is an interaction, this obviously doesn’t make sense - B may matter for some levels of A but not others. Given this, we can (and maybe should) instead look at the average impact of B. We can do this by updating how contrasts are calculated. We can specify overall changes to how R does this, or we can include in the model as shown here:

Anova(lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",], contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum")), type = "III")
Anova Table (Type III tests)

Response: below.biomass.g.meter.sq..m2..
                            Sum Sq Df  F value   Pr(>F)    
(Intercept)                4235224  1 847.6086 1.68e-12 ***
Snail.Level                  39024  2   3.9050  0.04941 *  
Nitrogen.level               26919  1   5.3874  0.03869 *  
Snail.Level:Nitrogen.level    2893  2   0.2895  0.75371    
Residuals                    59960 12                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This time we get the same results for the main effects as we did for Type 1 and Type 2 approaches. To be clear, this is because we used the same type of contrasts and had a balanced design. Note this matches the above,

Anova(lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",], contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum")), type = "II")
Anova Table (Type II tests)

Response: below.biomass.g.meter.sq..m2..
                           Sum Sq Df F value  Pr(>F)  
Snail.Level                 39024  2  3.9050 0.04941 *
Nitrogen.level              26919  1  5.3874 0.03869 *
Snail.Level:Nitrogen.level   2893  2  0.2895 0.75371  
Residuals                   59960 12                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and this does not.

Anova(lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",], contrasts = list(Snail.Level = "contr.treatment", Nitrogen.level = "contr.treatment")), type = "III")
Anova Table (Type III tests)

Response: below.biomass.g.meter.sq..m2..
                           Sum Sq Df  F value    Pr(>F)    
(Intercept)                881823  1 176.4820 1.545e-08 ***
Snail.Level                 24012  2   2.4028   0.13254    
Nitrogen.level              18771  1   3.7567   0.07648 .  
Snail.Level:Nitrogen.level   2893  2   0.2895   0.75371    
Residuals                   59960 12                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In doing this, however, note our coefficients also change. Note

data.frame(coef(lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",], contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum"))))
                             coef.lm.below.biomass.g.meter.sq..m2.....Snail.Level...Nitrogen.level...
(Intercept)                                                                                485.067222
Snail.Level1                                                                                 1.162778
Snail.Level2                                                                                56.436111
Nitrogen.level1                                                                             38.671667
Snail.Level1:Nitrogen.level1                                                                17.261667
Snail.Level2:Nitrogen.level1                                                                -4.431667

is different than

data.frame(coef(lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",])))
                                                coef.lm.below.biomass.g.meter.sq..m2.....Snail.Level...Nitrogen.level...
(Intercept)                                                                                                    542.16333
Snail.Levelremoval                                                                                              33.58000
Snail.Levelsnail addition                                                                                      -88.85333
Nitrogen.levelwithout                                                                                         -111.86667
Snail.Levelremoval:Nitrogen.levelwithout                                                                        43.38667
Snail.Levelsnail addition:Nitrogen.levelwithout                                                                 60.18333

The contrasts sum approach sets the intercept as the overall average and then puts all other levels relative to that (meaning it includes all levels). Our default treatment approach lets the first factor level (typically alphabetical, here technically a grouped level (Snail Level (ambient): Nitrogen level (with) ) be an intercept for the linear model, and then considers the other factor levels as deviations from that.

Finally, note than unbalanced designs (when you have a different number of means for one grouping than others), the different approaches lead to different outcomes. We can consider this (and reinforce the above points) using a built-in dataset, ToothGrowth. If you run this code

?ToothGrowth

in R, you’ll see it notes the dataset contains “the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).”

You can view the data using:

head(ToothGrowth)
   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

and we can see we have a balanced design here:

xtabs(~supp+dose, ToothGrowth)
    dose
supp 0.5  1  2
  OJ  10 10 10
  VC  10 10 10

Given that, our type 1, 2, 3 approaches are equivalent (we need to make does a factor for ANOVA purposes)

ToothGrowth$dose <- factor(ToothGrowth$dose) 
summary(aov(lm(len~supp*dose, ToothGrowth)))
            Df Sum Sq Mean Sq F value   Pr(>F)    
supp         1  205.4   205.4  15.572 0.000231 ***
dose         2 2426.4  1213.2  92.000  < 2e-16 ***
supp:dose    2  108.3    54.2   4.107 0.021860 *  
Residuals   54  712.1    13.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(len~supp*dose, ToothGrowth), type = "II") 
Anova Table (Type II tests)

Response: len
           Sum Sq Df F value    Pr(>F)    
supp       205.35  1  15.572 0.0002312 ***
dose      2426.43  2  92.000 < 2.2e-16 ***
supp:dose  108.32  2   4.107 0.0218603 *  
Residuals  712.11 54                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(len~supp*dose, ToothGrowth, contrasts = list(supp = "contr.sum", dose = "contr.sum")), type = "III")
Anova Table (Type III tests)

Response: len
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 21236.5  1 1610.393 < 2.2e-16 ***
supp          205.4  1   15.572 0.0002312 ***
dose         2426.4  2   92.000 < 2.2e-16 ***
supp:dose     108.3  2    4.107 0.0218603 *  
Residuals     712.1 54                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

but not if we use a different contrast (treatment is default used here)

Anova(lm(len~supp*dose, ToothGrowth), type = "III")
Anova Table (Type III tests)

Response: len
             Sum Sq Df F value    Pr(>F)    
(Intercept) 1750.33  1 132.730 3.603e-16 ***
supp         137.81  1  10.450  0.002092 ** 
dose         885.26  2  33.565 3.363e-10 ***
supp:dose    108.32  2   4.107  0.021860 *  
Residuals    712.11 54                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Also, if we have an unbalanced design

library(purrr) 
library(tidyr) 
set.seed(19) 
ToothGrowth_unbalanced <- ToothGrowth %>%   group_by(supp, dose) %>%    nest() %>%               ungroup() %>%    mutate(n = c(10,8,7,5,10,3)) %>%    mutate(samp = map2(data, n, sample_n)) %>%    dplyr::select(-data) %>%   unnest(samp) 
ToothGrowth_unbalanced_anova <- lm(len~supp*dose, ToothGrowth_unbalanced, contrasts = list(supp = "contr.sum", dose = "contr.sum"))

Then we get different answers from each approach:

summary(aov(ToothGrowth_unbalanced_anova))
            Df Sum Sq Mean Sq F value   Pr(>F)    
supp         1  194.6   194.6  16.636 0.000231 ***
dose         2 1696.3   848.1  72.489 1.59e-13 ***
supp:dose    2  103.7    51.9   4.432 0.018818 *  
Residuals   37  432.9    11.7                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(ToothGrowth_unbalanced_anova, type= "II")
Anova Table (Type II tests)

Response: len
           Sum Sq Df F value    Pr(>F)    
supp       153.04  1 13.0798 0.0008853 ***
dose      1696.26  2 72.4885  1.59e-13 ***
supp:dose  103.71  2  4.4318 0.0188183 *  
Residuals  432.91 37                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(ToothGrowth_unbalanced_anova, type= "III")
Anova Table (Type III tests)

Response: len
             Sum Sq Df   F value    Pr(>F)    
(Intercept) 12583.8  1 1075.5215 < 2.2e-16 ***
supp           79.2  1    6.7654   0.01328 *  
dose         1250.9  2   53.4579 1.221e-11 ***
supp:dose     103.7  2    4.4318   0.01882 *  
Residuals     432.9 37                        
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we see the different ways of calculating sums of squares leads to different outcomes when the sample sizes are unequal. In general, this supports the use of the type III approach (but see Langsrud (2003)) and for considering contrasts as needed.

Interpreting interactions

An important note is that all approaches above found the same values (p, F, etc.) for the highest order interaction. In general, interactions make interpreting main effects difficult. So, we have a few ways of handling them.

When not significant

If interactions are not significant, they can be handled in 2 ways.

  1. We can leave the interaction in the model and interpret main effects immediately

    1. If doing this, we should make sure we set the contrasts to use an average approach (usually using the sum contrasts as shown above)
        Anova(lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level + Snail.Level:Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",], contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum")), type = "III")
    Anova Table (Type III tests)
    
    Response: below.biomass.g.meter.sq..m2..
                                Sum Sq Df  F value   Pr(>F)    
    (Intercept)                4235224  1 847.6086 1.68e-12 ***
    Snail.Level                  39024  2   3.9050  0.04941 *  
    Nitrogen.level               26919  1   5.3874  0.03869 *  
    Snail.Level:Nitrogen.level    2893  2   0.2895  0.75371    
    Residuals                    59960 12                      
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  2. We can remove the interaction from the model, re-run it, and interpret main effects of the factors.

    bgb_model_cont_removed_int_removed <- update(bgb_model_cont_removed, .~.-Snail.Level:Nitrogen.level)
    plot(bgb_model_cont_removed_int_removed)

    Anova(bgb_model_cont_removed_int_removed, type="III")
    Anova Table (Type III tests)
    
    Response: below.biomass.g.meter.sq..m2..
                    Sum Sq Df  F value    Pr(>F)    
    (Intercept)    1239848  1 276.1645 1.303e-10 ***
    Snail.Level      39024  2   4.3461   0.03402 *  
    Nitrogen.level   26919  1   5.9959   0.02812 *  
    Residuals        62853 14                       
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Which is the same as…

Anova(bgb_model_cont_removed_int_removed, type="II")
Anova Table (Type II tests)

Response: below.biomass.g.meter.sq..m2..
               Sum Sq Df F value  Pr(>F)  
Snail.Level     39024  2  4.3461 0.03402 *
Nitrogen.level  26919  1  5.9959 0.02812 *
Residuals       62853 14                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

When we do this, the contrasts methods matter less,

bgb_model_cont_removed_int_removed <- update(bgb_model_cont_removed, .~.-Snail.Level:Nitrogen.level, contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum"))
Anova(bgb_model_cont_removed_int_removed, type="III")
Anova Table (Type III tests)

Response: below.biomass.g.meter.sq..m2..
                Sum Sq Df  F value    Pr(>F)    
(Intercept)    4235224  1 943.3564 3.015e-14 ***
Snail.Level      39024  2   4.3461   0.03402 *  
Nitrogen.level   26919  1   5.9959   0.02812 *  
Residuals        62853 14                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

but you shouldn’t report main effects from the model with the interaction.

Compare to these to what we actually see (always a good idea!)

Plotting outcomes

Results from two-way ANOVAs are often plotted similarly to one-way ANOVAs, but with colors or other aesthetics representing the additional group.

sdm_summary <- summarySE(valdez_2023[valdez_2023$Snail.Level != "uncaged",], measurevar = "below.biomass.g.meter.sq..m2..", groupvars = c("Snail.Level", "Nitrogen.level")) 
sdm_summary$Snail.Level <- relevel(sdm_summary$Snail.Level, "removal")  
sdm_summary$Nitrogen.level <- revalue(sdm_summary$Nitrogen.level, c("Fertilized" = "Yes",                                                                     "without"= "No")) 
ggplot(sdm_summary, aes(x=Snail.Level,                            
                        y=below.biomass.g.meter.sq..m2..,  
                        fill=Nitrogen.level)) +   
  geom_col(color="black", position=position_dodge()) +   
  geom_errorbar(aes(ymin=below.biomass.g.meter.sq..m2.., ymax=below.biomass.g.meter.sq..m2..+ci), 
                position = position_dodge()) +   
  labs(title="Grazing impacts depend on nitrogen levels",        
       x= "Grazing level",        
       y= expression(paste("Standing dry mass (" , g^{-1}, m^{-2}, ")")),        
       fill = "Fertilized?")

In general, an interaction is not apparent, fertilized plots tend to do better than unfertilized, and there might be less dry mass as snails increase.

Regardless, we can interpret main effects (though with possibly different outcomes). The benefit of approach 2 (removing the interaction) is we “increase” the degrees of freedom associated with the residuals, which ends up reducing the the MST. This is because in 2-way ANOVAs we allocate degrees of freedom to calculating main effects and interactions. This approach was likely used in the original manuscript.

Simply using the the provided output (approach 1) and not performing another series of tests, however, reduces the chances for a Type 1 error. We will return to this discussion when we get to model selection.

If we see significant effects of a factor that has more than 2 levels (like we do when using the approach that drops insignificant interactions), we can consider the general impact of grazing levels using post-hoc tests:

summary(glht(bgb_model_cont_removed_int_removed, linfct = mcp(Snail.Level = "Tukey")))

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = below.biomass.g.meter.sq..m2.. ~ Snail.Level + Nitrogen.level, 
    data = valdez_2023[valdez_2023$Snail.Level != "uncaged", 
        ], contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum"))

Linear Hypotheses:
                                     Estimate Std. Error t value Pr(>|t|)  
removal - control snails == 0           55.27      38.68   1.429   0.3534  
snail addition - control snails == 0   -58.76      38.68  -1.519   0.3124  
snail addition - removal == 0         -114.03      38.68  -2.948   0.0272 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Note the contrast approach has minimal impact here

summary(glht(lm(below.biomass.g.meter.sq..m2..~Snail.Level + Nitrogen.level
                , valdez_2023[valdez_2023$Snail.Level != "uncaged",], contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum")), linfct = mcp(Snail.Level = "Tukey")))

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = below.biomass.g.meter.sq..m2.. ~ Snail.Level + Nitrogen.level, 
    data = valdez_2023[valdez_2023$Snail.Level != "uncaged", 
        ], contrasts = list(Snail.Level = "contr.sum", Nitrogen.level = "contr.sum"))

Linear Hypotheses:
                                     Estimate Std. Error t value Pr(>|t|)  
removal - control snails == 0           55.27      38.68   1.429    0.354  
snail addition - control snails == 0   -58.76      38.68  -1.519    0.312  
snail addition - removal == 0         -114.03      38.68  -2.948    0.027 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

When significant

If the interaction term is significant, it means the main effects can not be interpreted. This is because the impact of a given variable depends on the level of another. When this happens, the data is typically divided into subset and analyzed using one-way ANOVAs.

For example, when Valdez et al. (2023) analyzed standing dead mass, they found a significant interaction term:

sdm_model <-lm(Standing.Dead..dry..m2.~Snail.Level * Nitrogen.level, valdez_2023[valdez_2023$Snail.Level != "uncaged",])
plot(sdm_model)

Anova(sdm_model, type="III")
Anova Table (Type III tests)

Response: Standing.Dead..dry..m2.
                            Sum Sq Df F value    Pr(>F)    
(Intercept)                1050.19  1 328.463 4.392e-10 ***
Snail.Level                 300.82  2  47.042 2.095e-06 ***
Nitrogen.level              348.39  1 108.963 2.248e-07 ***
Snail.Level:Nitrogen.level  200.90  2  31.417 1.700e-05 ***
Residuals                    38.37 12                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Following this, you could investigate impacts in plots with nitrogen, where you find snail manipulation had a significant impact (and considered post-hoc which ones!)

sdm_model_fertilized <-lm(Standing.Dead..dry..m2.~Snail.Level, valdez_2023[valdez_2023$Snail.Level != "uncaged" & valdez_2023$Nitrogen.level == "Fertilized",])
plot(sdm_model_fertilized)

Anova(sdm_model_fertilized, type= "III")
Anova Table (Type III tests)

Response: Standing.Dead..dry..m2.
             Sum Sq Df F value    Pr(>F)    
(Intercept) 1050.19  1 206.027 7.158e-06 ***
Snail.Level  300.82  2  29.507  0.000786 ***
Residuals     30.58  6                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(glht(sdm_model_fertilized, linfct = mcp(Snail.Level= "Tukey")))

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = Standing.Dead..dry..m2. ~ Snail.Level, data = valdez_2023[valdez_2023$Snail.Level != 
    "uncaged" & valdez_2023$Nitrogen.level == "Fertilized", ])

Linear Hypotheses:
                                     Estimate Std. Error t value Pr(>|t|)    
removal - control snails == 0         -14.150      1.843  -7.676   <0.001 ***
snail addition - control snails == 0   -7.567      1.843  -4.105   0.0150 *  
snail addition - removal == 0           6.583      1.843   3.571   0.0272 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

and plots without added nutrients, where you find snail addition did not

sdm_model_not_fertilized <-lm(Standing.Dead..dry..m2.~Snail.Level, valdez_2023[valdez_2023$Snail.Level != "uncaged" & valdez_2023$Nitrogen.level == "without",])
plot(sdm_model_not_fertilized)

Anova(sdm_model_not_fertilized, type= "III")
Anova Table (Type III tests)

Response: Standing.Dead..dry..m2.
            Sum Sq Df F value   Pr(>F)   
(Intercept) 36.123  1 27.8457 0.001871 **
Snail.Level 12.416  2  4.7856 0.057211 . 
Residuals    7.783  6                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Other approaches for dealing with significant interactions include directly interpreting interaction terms (which we can do given our understanding of linear model coefficients), but this is rarely used. They are somewhat messy

coef(sdm_model)
                                    (Intercept) 
                                      18.710000 
                             Snail.Levelremoval 
                                     -14.150000 
                      Snail.Levelsnail addition 
                                      -7.566667 
                          Nitrogen.levelwithout 
                                     -15.240000 
       Snail.Levelremoval:Nitrogen.levelwithout 
                                      16.153333 
Snail.Levelsnail addition:Nitrogen.levelwithout 
                                      10.356667 

Another approach when interactions are significant is to compare all group means (somewhat like a Tukey design for combined treatment levels). The emmeans package offers this approach.

library(emmeans)
emmeans(sdm_model, pairwise ~ Snail.Level*Nitrogen.level)
$emmeans
 Snail.Level    Nitrogen.level emmean   SE df lower.CL upper.CL
 control snails Fertilized      18.71 1.03 12    16.46    20.96
 removal        Fertilized       4.56 1.03 12     2.31     6.81
 snail addition Fertilized      11.14 1.03 12     8.89    13.39
 control snails without          3.47 1.03 12     1.22     5.72
 removal        without          5.47 1.03 12     3.22     7.72
 snail addition without          6.26 1.03 12     4.01     8.51

Confidence level used: 0.95 

$contrasts
 contrast                                              estimate   SE df t.ratio
 control snails Fertilized - removal Fertilized          14.150 1.46 12   9.692
 control snails Fertilized - snail addition Fertilized    7.567 1.46 12   5.183
 control snails Fertilized - control snails without      15.240 1.46 12  10.439
 control snails Fertilized - removal without             13.237 1.46 12   9.066
 control snails Fertilized - snail addition without      12.450 1.46 12   8.528
 removal Fertilized - snail addition Fertilized          -6.583 1.46 12  -4.509
 removal Fertilized - control snails without              1.090 1.46 12   0.747
 removal Fertilized - removal without                    -0.913 1.46 12  -0.626
 removal Fertilized - snail addition without             -1.700 1.46 12  -1.164
 snail addition Fertilized - control snails without       7.673 1.46 12   5.256
 snail addition Fertilized - removal without              5.670 1.46 12   3.884
 snail addition Fertilized - snail addition without       4.883 1.46 12   3.345
 control snails without - removal without                -2.003 1.46 12  -1.372
 control snails without - snail addition without         -2.790 1.46 12  -1.911
 removal without - snail addition without                -0.787 1.46 12  -0.539
 p.value
  <.0001
  0.0024
  <.0001
  <.0001
  <.0001
  0.0072
  0.9716
  0.9868
  0.8450
  0.0021
  0.0206
  0.0512
  0.7419
  0.4406
  0.9932

P value adjustment: tukey method for comparing a family of 6 estimates 

This was likely the approach used in the Valdez et al. 2023 paper.

paged_table(data.frame(emmeans(sdm_model, pairwise ~ Snail.Level*Nitrogen.level)$contrasts))
paged_table(data.frame(emmeans(sdm_model, pairwise ~ Snail.Level)$contrasts))
NOTE: Results may be misleading due to involvement in interactions
paged_table(data.frame(emmeans(sdm_model, pairwise ~ Nitrogen.level)$contrasts))
NOTE: Results may be misleading due to involvement in interactions

Note the warning; if interactions are significant comparing main effects may be inappropriate (which is why other approaches include subsetting the data).

Other options

Bootstrapping and permutation tests options may also be used for two-way ANOVAs when assumptions are not met, though there is implementation is more complicated than single-sample designs due to the need to randomize/permute interaction impacts.

Plotting outcomes

Results from two-way ANOVAs are often plotted similarly to one-way ANOVAs, but with colors or other aesthetics representing the additional group.

sdm_summary <- summarySE(valdez_2023[valdez_2023$Snail.Level != "uncaged",], measurevar = "Standing.Dead..dry..m2.", groupvars = c("Snail.Level", "Nitrogen.level"))
sdm_summary
     Snail.Level Nitrogen.level N Standing.Dead..dry..m2.        sd        se
1 control snails     Fertilized 3               18.710000 3.6626220 2.1146158
2 control snails        without 3                3.470000 1.0013491 0.5781292
3        removal     Fertilized 3                4.560000 0.4250882 0.2454248
4        removal        without 3                5.473333 1.4150029 0.8169523
5 snail addition     Fertilized 3               11.143333 1.3025104 0.7520047
6 snail addition        without 3                6.260000 0.9417006 0.5436911
        ci
1 9.098457
2 2.487489
3 1.055978
4 3.515062
5 3.235615
6 2.339314
sdm_summary$Snail.Level <- relevel(sdm_summary$Snail.Level, "removal")

sdm_summary$Nitrogen.level <- revalue(sdm_summary$Nitrogen.level, c("Fertilized" = "Yes",
                                                                    "without"= "No"))

ggplot(sdm_summary, aes(x=Snail.Level, 
                           y=Standing.Dead..dry..m2.,
                           fill=Nitrogen.level)) +
  geom_col(color="black", position=position_dodge()) +
  geom_errorbar(aes(ymin=Standing.Dead..dry..m2., ymax=Standing.Dead..dry..m2.+ci), position = position_dodge()) +
  labs(title="Grazing impacts depend on nitrogen levels",
       x= "Grazing level",
       y= expression(paste("Standing dry mass (" , g^{-1}, m^{-2}, ")")),
       fill = "Fertilized?")

Next steps

In the next chapters we will carry our linear model approach to consider the relationship between continuous outcomes and continuous predictor variables.

References

Fong, Caitlin, Sarah Bittick, and Peggy Fong. 2017. “Simultaneous Synergist, Antagonistic, and Additive Interactions Between Multiple Local Stressors All Degrade Algal Turf Communities on Coral Reefs.” Journal of Ecology 106 (December). https://doi.org/10.1111/1365-2745.12914.
Langsrud, Øyvind. 2003. “ANOVA for Unbalanced Data: Use Type II Instead of Type III Sums of Squares.” Statistics and Computing 13 (2): 163–67. https://doi.org/10.1023/A:1023260610025.
McDonald, J. H. 2014. Handbook of Biological Statistics. 3rd ed. Baltimore, MD: Sparky House Publishing.
Valdez, Stephanie R., Pedro Daleo, David S. DeLaMater Iii, and Brian R. Silliman. 2023. “Variable Responses to Top-down and Bottom-up Control on Multiple Traits in the Foundational Plant, Spartina Alterniflora.” PLOS ONE 18 (5): e0286327. https://doi.org/10.1371/journal.pone.0286327.
Venables, Bill. n.d. “Coding Matrices, Contrast Matrices and Linear Models.”
Wiebe, Karen L., and Gary R. Bortolotti. 2002. “Variation in Carotenoid-Based Color in Northern Flickers in a Hybrid Zone.” The Wilson Bulletin 114 (3): 393–400. http://www.jstor.org/stable/4164474.