Getting used to R

Overview

The focus of this overview is to get you used to tools we will be using in class. Before completing it you should have a basic understanding of using R. We will do an introduction in class (download help file). You should also be comfortable using Rstudio and github (see help file).

qmd basics

qmd files differ from R files in that they combine regular text with code chunks. This is a code chunk

print("this is a chunk")
[1] "this is a chunk"

Code chunks combine code with output. When combined with regular text/prose, this makes it easier to produce a range of documents. You set the output in the YAML header (the stuff between the 3 dashes you see at top of this document).

After you write the file, you Render it to turn the qmd file into the selected output. Try it now. Note the first time you do this in a project you may be prompted to install a number of packages! If you are using a webservice you may also need to allow pop-ups in your browser. Don’t be surprised if a new window pops up (it should).

Note: qmd is a newer form of rmd. Rmd files used to be knit. Once qmd files were established, both those and Rmd files were rendered. Some files in the book are still rmd but are being transitioned over time. This has little impact on the user.

The render button turns your .qmd file into other products

The Render button saves the .qmd file and renders a new version whose output depends on what you selected in the header. Here we have html_document, so if everything works a preview of a webpage like document should appear. The file also produces a github friendly .md file. This means you should only edit the qmd file! Everything else is automatically produced and any changes you make will be overwritten by your next render.

When you Render a file, it runs in a totally new R instance. this means anything you only added in your instance (like working in the console) won’t be available. In other words, its the best way to see what a “new” user gets when they use your code.

However, you can also work with the file interactively. To do this, press the green button next to an R chunk.

The green arrows just runs the chunk in the console and shows the output
print("this is a chunk")
[1] "this is a chunk"

Now we’ll start changing the file to show you how rmarkdown works. First, amend the file by replacing the NAME and DATE spots in the header (top of the file between the — markers) with your name and the real date. Then Knit the file again. You should see your name in the new preview.

Rstudio has a Markdown Quick Reference guide (look under the help tab), but some general notes.

  • Pound/Hashtag signs denote headers
  • you can surround something double asterisks for bold or single asterisks for italics
  • lists are denoted by numbers or asterisks at beginning of line (followed by space!)
    • and can be indented for sublevels
  • R code can be done inline, but is generally placed in stand-alone chunks
    • these will, by default, show the code and output
  • lots of other options exist!

You can also switch to visual mode to more easily work with formatting.

Selecting visual mode will show you what your markdown code actually looks like

The main idea is qmd files allow you to combine code, text, graphs, etc into multiple outputs that you can share (including with coding illiterate colleagues who just want output).

To practice working with qmd files and R, work through the questions below. You can also get more help with this video

Practice in R

1

Let x be defined by

x <- 5:15

Try executing this chunk (in R studio, not the webview) by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

This will run the code in the Console. You may need to switch to Console (from Rmarkdown) in the lower right window area to see this. The executed code is also displayed in your processed file (hit Knit again to see this!).

Note running this chunk has added an object named x to the Environment tab area (top right area of screen). But nothing was “returned” in the console. You prove this by typing x in the console. What does it return?

Determine what the “:” does! Complete the following sentence:

The : means FILL THIS IN.

2

Now try to guess the output of these commands

  • length(x)
  • max(x)
  • x[x < 10]
  • x^2
  • x[ x < 12 & x > 7]

INSERT AN R CHUNK HERE AND RUN EACH OF THESE COMMANDS. Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I. Then state what each of these does.

3

Is -1:2 the same as (-1):2 or -(1:2)? INSERT AN R CHUNK HERE AND RUN EACH OF THESE COMMANDS. Then state what each of these does.

Data input, plotting, and tests

You can read in a dataset from the internet following this protocol.

sleep <- read.csv("http://raw.githubusercontent.com/jsgosnell/CUNY-BioStats/master/datasets/sleep.csv", stringsAsFactors = T)

Run this chunk and note it has added an object named sleep to the environment.

Info on the dataset is viewable @ http://www.statsci.org/data/general/sleep.html.

4

How many rows does the sleep data set have (hint: ?dim)? What kind of data is stored in each variable?

ENTER ANSWERS HERE. ADD ANY R CHUNKS YOU USED TO FIND THE ANSWER.

5

Change the column named BodyWt to Body_weight”* in the sleep dataset.

ADD ANY R CHUNKS YOU USED TO COMPLETE THE TASK.

6

Produce a plot of how TotalSleep differs between primates and other species. What is this plot showing?

Note, as of early 2020 R no longer reads in strings as factors! This means the Primate column, which is full of “Yes”s and “No”s, reads in as words and R doesn’t know how to plot them. There are many ways to handle this. You can modify the read.csv command (add stringsAsFactors = T option), eg

sleep <- read.csv("http://raw.githubusercontent.com/jsgosnell/CUNY-BioStats/master/datasets/sleep.csv", stringsAsFactors = T)

If you do this, you’ll need to rechange anything you previously updated to the object (like renaming the BodyWt column).

You can also modify a single column for the actual object

sleep$Primate <- factor (sleep$Primate)

or for a single command, eg (plot not actually shown!)

plot(BodyWt ~ factor(Primate), data = sleep)

NOTE YOU CAN ADD A PLOT TO THE DOCUMENT TOO! AMEND THE BELOW AS NEEDED.

plot(cars)

7

The sleep dataset begs to have a linear model fit for it. Let’s consider. First plot how TotalSleep is explained by BrainWt. Are there any issues with the data? Exclude any outlier and fit a linear model to obtain the p-value for the model (hint: summary()). What does this imply?

ENTER ANSWERS HERE. ADD ANY R CHUNKS YOU USED TO FIND THE ANSWER.

EXTRA QUESTIONS

not required

alt text Dow Puffin Matthew Zalewski / CC BY (https://creativecommons.org/licenses/by/3.0)

8

Sometimes data doesn’t have headers (column names),so you have to add them. Download a dataset on alcids (birds like puffins and auklets) from https://raw.githubusercontent.com/jsgosnell/CUNY-BioStats/master/datasets/alcids55.csv.
You’ll need to modify the read.csv function by specifying header = False, then use the names function to name the columns [“year”, “a1_abund”, “NAO”, “a2_abund”, “a3_abund”, “a4_abund”, “a5_abund”, “a6_abund”]. Try it and check your input using the head command.

ENTER ANSWERS HERE. ADD ANY R CHUNKS YOU USED TO FIND THE ANSWER.

9

Here’s a sample dataset:

Date greenness Richness habitat
12-25-2009 13766 46 forest
01-01-2010 50513 60 forest
01-15-2010 25084 60 grassland

Enter it into R (manually or via a .csv). (Hint: you have a piece of this in the code already). Check your input using the head() command.

ENTER ANSWERS HERE. ADD ANY R CHUNKS YOU USED TO FIND THE ANSWER.