More ANOVAs

Remember you should

Overview

This practice reviews the More ANOVAs lecuture.

Examples

If interaction is significant

Following the memory example from class, read in and check data

memory <- read.table("http://www.statsci.org/data/general/eysenck.txt", header = T,
                     stringsAsFactors = T)
str(memory)
'data.frame':   100 obs. of  3 variables:
 $ Age    : Factor w/ 2 levels "Older","Younger": 2 2 2 2 2 2 2 2 2 2 ...
 $ Process: Factor w/ 5 levels "Adjective","Counting",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ Words  : num  8 6 4 6 7 6 5 7 9 7 ...

Let’s put younger level first

library(plyr)
memory$Age <- relevel(memory$Age, "Younger")

and graph

library(Rmisc)
Loading required package: lattice
function_output <- summarySE(memory, measurevar="Words", groupvars =
                               c("Age", "Process"), na.rm = T)
library(ggplot2)
ggplot(function_output, aes(x=Age, y=Words,color=Process, 
                                   shape = Process)) +
  geom_line(aes(group=Process, linetype = Process), size=2) +
    geom_point(size = 5) +
  ylab("Words remembered")+ 
  xlab("Age") + 
  ggtitle("Process type interacts with \n age to impact memory")+
  theme(axis.title.x = element_text(face="bold", size=28), 
        axis.title.y = element_text(face="bold", size=28), 
        axis.text.y  = element_text(size=20),
        axis.text.x  = element_text(size=20), 
        legend.text =element_text(size=20),
        legend.title = element_text(size=20, face="bold"),
        plot.title = element_text(hjust = 0.5, face="bold", size=32))
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

There appears to be some interactions. Let’ build a model

memory_interactions <- lm(Words ~ Age * Process, memory)

and check assumptions.

par(mfrow=c(2,2))
plot(memory_interactions)

These appear to be met, so look at output

library(car)
Warning: package 'car' was built under R version 4.4.1
Loading required package: carData
Anova(memory_interactions, type = "III")
Anova Table (Type III tests)

Response: Words
            Sum Sq Df  F value    Pr(>F)    
(Intercept) 2190.4  1 272.9281 < 2.2e-16 ***
Age           72.2  1   8.9963 0.0034984 ** 
Process     1353.7  4  42.1690 < 2.2e-16 ***
Age:Process  190.3  4   5.9279 0.0002793 ***
Residuals    722.3 90                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since interaction is significant, analyze subsets. For example,

memory_interactions_young <- lm(Words ~ Process, memory[memory$Age == "Younger",])
plot(memory_interactions_young)

Anova(memory_interactions_young, type = "III")
Anova Table (Type III tests)

Response: Words
            Sum Sq Df F value    Pr(>F)    
(Intercept) 2190.4  1 343.442 < 2.2e-16 ***
Process     1353.7  4  53.064 < 2.2e-16 ***
Residuals    287.0 45                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a significant difference in words recalled based on process, but why? Investigate with post-hoc tests.

library(multcomp)
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data
Loading required package: MASS

Attaching package: 'TH.data'
The following object is masked from 'package:MASS':

    geyser
comp_young <- glht(memory_interactions_young, linfct = mcp(Process = "Tukey"))
summary(comp_young)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = Words ~ Process, data = memory[memory$Age == "Younger", 
    ])

Linear Hypotheses:
                             Estimate Std. Error t value Pr(>|t|)    
Counting - Adjective == 0      -8.300      1.129  -7.349  < 1e-04 ***
Imagery - Adjective == 0        2.800      1.129   2.479  0.11350    
Intentional - Adjective == 0    4.500      1.129   3.984  0.00219 ** 
Rhyming - Adjective == 0       -7.200      1.129  -6.375  < 1e-04 ***
Imagery - Counting == 0        11.100      1.129   9.828  < 1e-04 ***
Intentional - Counting == 0    12.800      1.129  11.333  < 1e-04 ***
Rhyming - Counting == 0         1.100      1.129   0.974  0.86545    
Intentional - Imagery == 0      1.700      1.129   1.505  0.56457    
Rhyming - Imagery == 0        -10.000      1.129  -8.854  < 1e-04 ***
Rhyming - Intentional == 0    -11.700      1.129 -10.359  < 1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Blocking example

Following feather color example from class:

# more than 2? ####
feather <-  read.csv("https://raw.githubusercontent.com/jsgosnell/CUNY-BioStats/master/datasets/wiebe_2002_example.csv", stringsAsFactors = T)
str(feather)
'data.frame':   32 obs. of  3 variables:
 $ Bird       : Factor w/ 16 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Feather    : Factor w/ 2 levels "Odd","Typical": 2 2 2 2 2 2 2 2 2 2 ...
 $ Color_index: num  -0.255 -0.213 -0.19 -0.185 -0.045 -0.025 -0.015 0.003 0.015 0.02 ...
set.seed(25)
special <- data.frame(Bird = LETTERS[1:16], Feather = "Special", 
                      Color_index= feather[feather$Feather == "Typical", "Color_index"] +
                        .3 +runif(16,1,1)*.01)
feather <- merge(feather, special, all = T)


Anova(lm(Color_index ~ Feather + Bird, data=feather), type= "III")
Anova Table (Type III tests)

Response: Color_index
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 0.36392  1  59.9538 1.224e-08 ***
Feather     1.67906  2 138.3093 7.208e-16 ***
Bird        0.34649 15   3.8055 0.0008969 ***
Residuals   0.18210 30                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(multcomp)
compare <- glht(lm(Color_index ~ Feather + Bird, data=feather), linfct = mcp("Feather" = "Tukey"))
summary(compare)

     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = Color_index ~ Feather + Bird, data = feather)

Linear Hypotheses:
                       Estimate Std. Error t value Pr(>|t|)    
Typical - Odd == 0      0.13712    0.02755   4.978   <1e-04 ***
Special - Odd == 0      0.44712    0.02755  16.232   <1e-04 ***
Special - Typical == 0  0.31000    0.02755  11.254   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
#note comparison doesn't work
Anova(lm(Color_index ~ Feather * Bird, data=feather), type= "III")
Error in Anova.lm(lm(Color_index ~ Feather * Bird, data = feather), type = "III"): residual df = 0

Swirl lesson

Swirl is an R package that provides guided lessons to help you learn and review material. These lessons should serve as a bridge between all the code provided in the slides and background reading and the key functions and concepts from each lesson. A full course lesson (all lessons combined) can also be downloaded using the following instructions.

THIS IS ONE OF THE FEW TIMES I RECOMMEND WORKING DIRECTLY IN THE CONSOLE! THERE IS NO NEED TO DEVELOP A SCRIPT FOR THESE INTERACTIVE SESSIONS, THOUGH YOU CAN!

  • install the “swirl” package

  • run the following code once on the computer to install a new course

    library(swirl)
    install_course_github("jsgosnell", "JSG_swirl_lessons")
  • start swirl!

    swirl()
    • swirl()
  • then follow the on-screen prompts to select the JSG_swirl_lessons course and the lessons you want

    • Here we will focus on the More ANOVAs lesson
  • TIP: If you are seeing duplicate courses (or odd versions of each), you can clear all courses and then re-download the courses by

    • exiting swirl using escape key or bye() function

      bye()
    • uninstalling and reinstalling courses

      uninstall_all_courses()
      install_course_github("jsgosnell", "JSG_swirl_lessons")
    • when you restart swirl with swirl(), you may need to select

      • No. Let me start something new

Practice

1

A survey was conducted to see if athletes and non-athletes deal with anger in the same way. Data is @

angry <- read.csv(“https://docs.google.com/spreadsheets/d/e/2PACX-1vSaawG37o1ZUEs1B4keIJpZAY2c5tuljf29dWnzqQ0tHNCzfbz85AlWobYzBQ3nPPXJBLP-FWe4BNZB/pub?gid=1784556512&single=true&output=csv”, stringsAsFactors = T)

and more information is at

http://onlinestatbook.com/case_studies/angry_moods.html.

Focus on the following variables:

Sports 1 = athletes, 2 = non-athletes Gender 1 = males, 2 = females Expression (AE) index of general anger expression: (Anger-Out) + (Anger-In) - (Control-Out) - (Control-In) + 48

Is there any evidence that gender or athlete status impact how anger is expressed?

2

A professor carried out a long-term study to see how various factors impacted pulse rate before and after exercise. Data can be found at http://www.statsci.org/data/oz/ms212.txt With more info at http://www.statsci.org/data/oz/ms212.html. Is there evidence that frequency of exercise (Exercise column) and gender impact change in pulse rate for students who ran (Ran column = 1)?

3

Data from Valdez et al 2023 is available @ https://docs.google.com/spreadsheets/d/e/2PACX-1vT2gaLu6pyRMlcbzarn3ej4bFmT_iHvrlNWJYSdrsLdUWIjcJi7rU11-ipvYpGnqD9qLDnbhNd2sDUW/pub?gid=1707080634&single=true&output=csv.

Import it into to R and

  • determine how the snail grazing and nitrogen levels impact number of flowering shoots ( Shoot.density..m2)
  • construct a plot to showcase your analysis

4

Find an example of a factorial ANOVA from a paper that is related to your research or a field of interest. Make sure you understand the connections between the methods, results, and graphs. Briefly answer the following questions

  • What was the dependent variable?
  • What were the independent variables?
  • Was the interaction significant?
    • If so, how did they interpret findings
    • If not, were the main effects significant?