Introduction
Why are we forced to take this class?
“Why is statistics a required course for someone who wants to be a dentist/doctor/ nurse?”
This is a common question (or at least thought) for many students. I hope to convince you this semester you at least need to understand statistics as part of the scientific method (and you should realize the scientific process informs all those jobs - in fact it can inform any job or task where you are searching for an answer or better method).
For example, doctors prescribe medicine to patients, but how do they know these medicines work? Some doctors carry out research, but many rely on published guidelines, which themselves rely on research. So a new drug or treatment is proposed- but who decides if it should be used? Researchers carry out trials to determine the efficacy of the treatment. In doing this they have to consider how to design an experiment (what do they collect? from whom?) and analyze the resulting data so they can trust the results.
Other students in our class may be interested in a career focused on resource management or environmental issues (e.g., wildlife rehabilitation, carbon mitigation expert, researcher). Regardless of your goal, any question should be informed by this approach. For example,
- Does an environmental factor cause cancer?
- Do potential toxins really harm the enviroment?
- Is organic food really healthier?
- Does exposing organisms reared in captivity to predator cues lead to more successful releases?
- Zhu et al. (2023)
At its heart, statistics is about turning data into information that we can use to make decisions or better understand the world around us. Data can come from experiments we are running. This offers a clear connection to field and lab science, and its what we will focus on for most of this class. Data and theories can also be used to develop models that produce output ; this isn’t real-world data, but it offers very useful insight on what we think will happen if something occurs (and something we can test with other field data!). For example, restoration projects may focus on small-scale plots that undergo different restoration protocols. Data produced from monitoring these plots may be used to develop models to predict large-scale impacts (and maybe benefits and costs) of different restoration scenarios for larger regions.
Above I used words like know (how do they know these medicines work? )and predict (develop models to predict large-scale impacts (and maybe benefits and costs) of different restoration scenarios for larger regions). While we may use words like these to discuss our findings and results, its important to note the are not totally correct. Statistics (and related models) generally give us estimates about how the real world works. Put another way, if we knew everything about the world, we wouldn’t need to use statistics because we wouldn’t need estimates.
The reasons we don’t usually know everything include
- the world is complicated (some questions can’t be directly tested)
- it’s not possible to measure everything
Because of this, statistics is also focused on trying to describe populations of interest or find signals (impacts of treatments, medicines, or restoration practices, for example) amidst the noise (variation in outcomes that are always common!). When considering relationships among variables, noise may occur because there are lots of things impacting the outcome of interest. For example, restoration protocol may impact the trajectory of an oyster reef, but so too may local factors like temperature an and salinity. Noise can also occur because of sampling error - since we don’t measure everything, our estimate of relationship or population traits may be imperfect.
In the next session we’ll start to discuss how we can use data to make estimates about a population (and answer questions like what is a population and what are we trying to estimate). However, a final aside to finish this section - we often think about statistics happening after an experiment, survey, or other thing we get data from is finished. However, part of statistics is experimental design! Statistics should inform how you setup an experiment. In fact, the best idea (which seldom happens!) is that you simulate the type of data you expect to get from your experiment and then analyze that before you actually run the experiment. This ensures you are measuring what you need to measure and setting things up correctly! As the famous (to statisticians) quote states,
To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of. -Ronald Fisher