Call:
lm(formula = Sepal.Length ~ Petal.Length, data = iris)
Residuals:
     Min       1Q   Median       3Q      Max
-1.24675 -0.29657 -0.01515  0.27676  1.00269
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.30660    0.07839   54.94   <2e-16 ***
Petal.Length  0.40892    0.01889   21.65   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4071 on 148 degrees of freedom
Multiple R-squared: 0.76, Adjusted R-squared: 0.7583
F-statistic: 468.6 on 1 and 148 DF, p-value: < 2.2e-16
We find that F(1, 148) = 468.6, p < 0.001, so we reject the null hypothesis that the slope is equal to 0. The estimated slope is 0.409, so sepal length increases by about 0.41 units for each one-unit increase in petal length. An R² value of 0.76 indicates petal length explains about 76% of the variation in sepal length.
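The quantities quoted above can also be pulled directly from the fitted model instead of being read off the printed summary. A minimal sketch follows; the object names `fit` and `fit_summary` are our own and do not appear in the original.

```r
# Refit the model shown above and keep its summary object
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
fit_summary <- summary(fit)

# F statistic together with its numerator and denominator degrees of freedom
fit_summary$fstatistic

# Estimated slope for Petal.Length
coef(fit)["Petal.Length"]

# Proportion of variation in sepal length explained by petal length
fit_summary$r.squared

# 95% confidence interval for the slope
confint(fit, "Petal.Length")
```

Working from the stored objects avoids transcription slips when reporting the F statistic, slope, and R².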
Alternatively, we could test whether the correlation between the two variables is equal to 0.
cor.test(~ Sepal.Length + Petal.Length, data = iris)
Pearson's product-moment correlation
data: Sepal.Length and Petal.Length
t = 21.646, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8270363 0.9055080
sample estimates:
cor
0.8717538
We find the same p-value (using a t-distribution; the t statistic, 21.65, matches the one for the slope above), and see that the estimated linear correlation coefficient, 0.87, is the square root of our R² value (0.87² ≈ 0.76).
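The link between the correlation coefficient and R² can be checked numerically: in simple linear regression with one predictor, squaring the Pearson correlation gives the R² of the fitted line, and the t statistic from `cor.test` equals the t statistic for the slope. A short sketch (the object names `fit`, `r`, and `ct` are ours):

```r
# Fit the regression and compute the Pearson correlation on the same data
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
r <- cor(iris$Sepal.Length, iris$Petal.Length)

# Squared correlation and regression R-squared should agree
r^2
summary(fit)$r.squared
all.equal(r^2, summary(fit)$r.squared)

# The correlation test and the slope test share the same t statistic
ct <- cor.test(~ Sepal.Length + Petal.Length, data = iris)
unname(ct$statistic)
summary(fit)$coefficients["Petal.Length", "t value"]
```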
If we prefer a rank-based test, we can update the code:
cor.test(~ Sepal.Length + Petal.Length, data = iris, method = "spearman")
Warning in cor.test.default(x = mf[[1L]], y = mf[[2L]], ...): Cannot compute
exact p-value with ties
Spearman's rank correlation rho
data: Sepal.Length and Petal.Length
S = 66429, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.8818981
This again leads us to reject the null hypothesis. Finally, a bootstrap option can be used; the resulting estimates indicate that the confidence interval for the slope does not contain 0.
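The bootstrap code itself is not shown here, so the routine that was actually used is unknown. As an illustration only, the sketch below assumes a simple case-resampling percentile bootstrap written in base R; the seed, the number of resamples, and the names `n_boot`, `boot_slopes`, and `boot_ci` are arbitrary choices of ours.

```r
set.seed(42)      # arbitrary seed, only so the resampling is reproducible
n_boot <- 2000    # number of bootstrap resamples (an arbitrary choice)
n <- nrow(iris)

# Resample rows with replacement, refit the model, and keep the slope each time
boot_slopes <- replicate(n_boot, {
  rows <- sample(n, replace = TRUE)
  coef(lm(Sepal.Length ~ Petal.Length, data = iris[rows, ]))[["Petal.Length"]]
})

# Percentile 95% confidence interval for the slope
boot_ci <- quantile(boot_slopes, c(0.025, 0.975))
boot_ci
```

If the interval excludes 0, the bootstrap agrees with the parametric test in rejecting a slope of 0. Packages such as `boot` offer other interval types, but the percentile version keeps the logic visible.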
Practice
1
A professor carried out a long-term study to see how various factors impacted pulse rate before and after exercise. Data can be found at
http://www.statsci.org/data/oz/ms212.txt
With more info at
http://www.statsci.org/data/oz/ms212.html.
Is there evidence that age, height, or weight impact change in pulse rate for students who ran (Ran column = 1)? For each of these, how much variation in pulse rate do they explain?
2
(from OZDASL repository, http://www.statsci.org/data/general/stature.html; reference for more information)
When anthropologists analyze human skeletal remains, an important piece of information is living stature. Since skeletons are commonly incomplete, inferences about stature are commonly based on statistical methods that utilize measurements on small bones. The following data were presented in a paper in the American Journal of Physical Anthropology to validate one such method. Data are available at
http://www.statsci.org/data/general/stature.txt
as a tab-delimited file (you will need to use read.table!). Is there evidence that metacarpal bone length is a good predictor of stature? If so, how much variation does it account for in the response variable?
3
Data on medals won by various countries in the 1992 and 1994 Olympics is available in a tab-delimited file at
http://www.statsci.org/data/oz/medals.txt
More information on the data can be found at:
http://www.statsci.org/data/oz/medals.html
Is there any relationship between a country’s population and the total number of medals they win?
4
Continuing with the Olympic data, is there a relationship between the latitude of a country and the number of medals won in summer or winter Olympics?
5
Data on FEV (forced expiratory volume), a measure of lung function, can be found at