Read Chapter 6 to help you complete the questions in this exercise.
Although this short course is primarily focussed on introducing you to R, it wouldn’t be complete without a quick peek at some of R’s statistical roots. This will be a very brief overview with very little theory, so don’t worry if you get a little lost: this is just a taster, and the main course is still to come!
data directory. Import these data into R and assign them to a variable with an appropriate name. These data were collected from an experiment to investigate the difference in growth rate of the giant tiger prawn (Penaeus monodon) fed either an artificial or a natural diet. Have a quick look at the structure of this dataset and plot growth rate against diet using an appropriate plot. How many observations are there in each diet treatment?
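As a rough sketch of this first look at the data, the code below builds a simulated stand-in for the prawn dataset, because the real file is not reproduced here. The values, the group size of 30 and the data frame name prawns are all assumptions made for illustration; only the column names GRate and diet come from the exercise. With the real file you would import it first (for example with read.csv()) instead of simulating.

```r
# Simulated stand-in for the prawn data (made-up values, NOT the real dataset)
set.seed(1)
prawns <- data.frame(
  GRate = c(rnorm(30, mean = 10, sd = 1), rnorm(30, mean = 10.5, sd = 1)),
  diet  = factor(rep(c("Artificial", "Natural"), each = 30))
)

# Structure of the data frame: 60 observations of 2 variables
str(prawns)

# A boxplot is one reasonable way to plot a continuous response against a factor
boxplot(GRate ~ diet, data = prawns, xlab = "Diet", ylab = "Growth rate")

# Number of observations in each diet treatment
obs_per_diet <- table(prawns$diet)
obs_per_diet
```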
Use the function shapiro.test() to assess normality of growth rate for each diet separately (Hint: use your indexing skills to extract the growth rate for each diet first, e.g. GRate[diet=='Natural']). Use the function var.test() to test for equal variance (see ?var.test for more information, or Section 6.1 of the book for more details). Are your data normally distributed, and do they have equal variances?
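The two assumption checks might be sketched as follows. The data are simulated (made-up values, an assumed 30 observations per diet and an assumed data frame name prawns), so the p values printed here say nothing about the real experiment.

```r
# Simulated stand-in for the prawn data (made-up values, NOT the real dataset)
set.seed(1)
prawns <- data.frame(
  GRate = c(rnorm(30, mean = 10, sd = 1), rnorm(30, mean = 10.5, sd = 1)),
  diet  = factor(rep(c("Artificial", "Natural"), each = 30))
)

# Extract the growth rate for each diet using logical indexing
natural    <- prawns$GRate[prawns$diet == "Natural"]
artificial <- prawns$GRate[prawns$diet == "Artificial"]

# Shapiro-Wilk test of normality, one diet at a time
sw_natural    <- shapiro.test(natural)
sw_artificial <- shapiro.test(artificial)

# F test for equality of the two group variances
vt <- var.test(GRate ~ diet, data = prawns)

# Collect the three p values in one named vector
c(natural = sw_natural$p.value,
  artificial = sw_artificial$p.value,
  variance = vt$p.value)
```

In each test the null hypothesis is the assumption itself (normality, or equal variances), so a large p value means there is no evidence against it.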
Use the t.test() function (Section 6.1 of the book). Use the argument var.equal = TRUE to perform the t-test assuming equal variances. What is the null hypothesis you want to test? Do you reject or fail to reject the null hypothesis? What are the values of the t statistic, degrees of freedom and p value? How would you summarise these statistics in a report?
Use the lm() function to fit a linear model with GRate as the response variable and diet as an explanatory variable (see Section 6.3 for a very brief introduction to linear modelling). Assign (<-) the results of the linear model to a variable with an appropriate name (i.e. growth.lm).
Produce the ANOVA table for the fitted model with anova(growth.lm). Compare the ANOVA p value to the p value obtained using a t-test. What do you notice? What are the values of the F statistic and degrees of freedom? How would you summarise these results in a report?
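The comparison between the two approaches can be sketched on simulated data (made-up values and an assumed data frame name prawns, not the real dataset). With two groups and equal variances assumed, the F statistic from the linear model equals the square of the t statistic, and the two p values are identical.

```r
# Simulated stand-in for the prawn data (made-up values, NOT the real dataset)
set.seed(1)
prawns <- data.frame(
  GRate = c(rnorm(30, mean = 10, sd = 1), rnorm(30, mean = 10.5, sd = 1)),
  diet  = factor(rep(c("Artificial", "Natural"), each = 30))
)

# Two-sample t-test assuming equal variances
tt <- t.test(GRate ~ diet, data = prawns, var.equal = TRUE)

# The same comparison expressed as a linear model
growth.lm  <- lm(GRate ~ diet, data = prawns)
growth.aov <- anova(growth.lm)

# Pull out the F statistic and p value for the diet term
f_value <- growth.aov[["F value"]][1]
p_anova <- growth.aov[["Pr(>F)"]][1]

# t squared matches F, and the p values agree
c(t_squared = unname(tt$statistic)^2, F = f_value)
c(p_ttest = tt$p.value, p_anova = p_anova)
```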
Use par(mfrow=c(2,2)) so you can fit four plots on a single device. Use the plot() function on your fitted model (plot(growth.lm)) to plot the graphs. Discuss with an instructor how to interpret these plots. What are your conclusions?
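A minimal sketch of the diagnostic plots, again on simulated data (made-up values, assumed data frame name prawns). Saving the old graphical parameters and restoring them afterwards is a convenience added here, not something the exercise asks for.

```r
# Simulated stand-in for the prawn data (made-up values, NOT the real dataset)
set.seed(1)
prawns <- data.frame(
  GRate = c(rnorm(30, mean = 10, sd = 1), rnorm(30, mean = 10.5, sd = 1)),
  diet  = factor(rep(c("Artificial", "Natural"), each = 30))
)
growth.lm <- lm(GRate ~ diet, data = prawns)

# Split the plotting device into 2 rows and 2 columns, keeping the old settings
old_par <- par(mfrow = c(2, 2))

# plot() on an lm object draws the standard residual diagnostic plots
plot(growth.lm)

# Restore the previous layout
par(old_par)
```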
data directory. Import the dataset into R and assign the dataframe an appropriate name. These data were collected from a study to examine the change in diameter of spores of the red alga Mastocarpus stellatus grown in three different diatom cultures and a control group grown in artificial seawater (diatom.treat variable). Use the function str() to examine the dataframe. How many replicates are there per diatom treatment? Use an appropriate plot to examine whether there are any obvious differences in diameter between the treatments.
Sstat) using a one-way analysis of variance (ANOVA). What is your null hypothesis?
Use the function lm() once again. Make sure you know which of the variables is your response variable and which is your explanatory variable (ask an instructor if in doubt). Fit the linear model and assign the model output to a variable with an appropriate name (i.e. gigartina.lm).
Produce the ANOVA table using the anova() function. What is the p value? Do you reject or fail to reject the null hypothesis? What are the values of the F statistic and degrees of freedom? How would you report these summary statistics in a report?
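A one-way ANOVA fitted through lm() could look like the sketch below. The data are simulated: the treatment labels, the 10 replicates per treatment, the diameters and the data frame name gigartina are all assumptions for illustration, and only the variable names diameter and diatom.treat come from the exercise.

```r
# Simulated stand-in for the spore data (made-up values, NOT the real dataset)
# Treatment labels and replicate numbers are placeholders
set.seed(2)
gigartina <- data.frame(
  diameter     = rnorm(40, mean = rep(c(100, 105, 110, 120), each = 10), sd = 5),
  diatom.treat = factor(rep(c("Control", "TreatA", "TreatB", "TreatC"), each = 10))
)

# Replicates per treatment and a first look at group differences
table(gigartina$diatom.treat)
boxplot(diameter ~ diatom.treat, data = gigartina,
        xlab = "Diatom treatment", ylab = "Spore diameter")

# diameter is the response, diatom.treat the explanatory variable
gigartina.lm <- lm(diameter ~ diatom.treat, data = gigartina)

# ANOVA table: F statistic, degrees of freedom and p value
gigartina.aov <- anova(gigartina.lm)
gigartina.aov
```

With four groups of 10, the table reports 3 degrees of freedom for the treatment and 36 for the residuals.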
Use the mosaic package to perform these comparisons (you will need to install this package first and then use library(mosaic) to make the function available). Which groups are different from each other if we use a p value cutoff (alpha) of p < 0.05?
Use the plot() function with TukeyHSD.lm(gigartina.lm) to plot these comparisons. Ask if you get stuck (or Google it!).
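For reference, here is a sketch of the same kind of pairwise comparison using only base R. Note the substitution: it calls TukeyHSD() from the stats package on an aov() fit rather than the mosaic interface mentioned in the exercise, so no extra package is needed. The data are simulated, with placeholder treatment labels, replicate numbers and data frame name.

```r
# Simulated stand-in for the spore data (made-up values, NOT the real dataset)
set.seed(2)
gigartina <- data.frame(
  diameter     = rnorm(40, mean = rep(c(100, 105, 110, 120), each = 10), sd = 5),
  diatom.treat = factor(rep(c("Control", "TreatA", "TreatB", "TreatC"), each = 10))
)

# Base R's TukeyHSD() works on aov objects, so fit the model with aov()
gigartina.fit <- aov(diameter ~ diatom.treat, data = gigartina)

# All pairwise comparisons with family-wise 95% confidence intervals
tukey <- TukeyHSD(gigartina.fit, conf.level = 0.95)
tukey

# Intervals that do not cross zero indicate a difference at alpha = 0.05
plot(tukey)

# Names of the pairs whose adjusted p value is below 0.05
pairs_tab <- tukey$diatom.treat
sig_pairs <- rownames(pairs_tab)[pairs_tab[, "p adj"] < 0.05]
sig_pairs
```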
data directory. Import the dataset into R and, as usual, assign it to a variable. These data are from an experiment conducted to investigate the relationship between temperature (temp) and the beat rate in Hz (beat_rate) of the copepod Temora longicornis, which had been acclimatised at three different temperature regimes (acclimitisation_temp). Examine the structure of the dataset. How many variables are there? What type of variables are they? Which is the response (dependent) variable, and which are the explanatory (independent) variables?
What type of variable is acclimitisation_temp? Is it a factor? Convert acclimitisation_temp to a factor and store the result in a new column in your dataframe called Facclimitisation_temp. Hint: use the function factor(). Use an appropriate plot to visualise these data (perhaps a coplot or similar?).
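The conversion and a conditioning plot might be sketched as below. The data are simulated: the temperatures, the three acclimatisation regimes (5, 10 and 20), the beat rates and the data frame name temora are assumptions for illustration, while the column names follow the exercise.

```r
# Simulated stand-in for the copepod data (made-up values, NOT the real dataset)
set.seed(3)
temora <- data.frame(
  temp                 = rep(seq(5, 25, by = 2.5), times = 3),
  acclimitisation_temp = rep(c(5, 10, 20), each = 9)
)
temora$beat_rate <- 5 + 1.2 * temora$temp + rnorm(nrow(temora), sd = 2)

# The regime is stored as a number, not a factor
class(temora$acclimitisation_temp)      # "numeric"
is.factor(temora$acclimitisation_temp)  # FALSE

# Convert to a factor and store it in a new column
temora$Facclimitisation_temp <- factor(temora$acclimitisation_temp)
levels(temora$Facclimitisation_temp)

# One panel of beat_rate against temp per acclimatisation regime
coplot(beat_rate ~ temp | Facclimitisation_temp, data = temora)
```

Keeping the original numeric column alongside the new factor makes it easy to check that the conversion did what you expected.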
temp for each level of Facclimitisation_temp. Take a look at the plot you produced in Q16. Do you think the relationships are different?
As usual we will fit the model using the lm() function. You will need to fit the main effects of temp and Facclimitisation_temp and the interaction between temp and Facclimitisation_temp. You can do this using either of the equivalent specifications:
temp + Facclimitisation_temp + temp:Facclimitisation_temp or
temp * Facclimitisation_temp
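To illustrate that the two specifications describe the same model, the sketch below fits both to simulated data (made-up values, hypothetical regimes 5, 10 and 20, and an assumed data frame name temora) and compares the coefficients. It uses the factor column name Facclimitisation_temp, as created earlier in the exercise.

```r
# Simulated stand-in for the copepod data (made-up values, NOT the real dataset)
set.seed(3)
temora <- data.frame(
  temp                 = rep(seq(5, 25, by = 2.5), times = 3),
  acclimitisation_temp = rep(c(5, 10, 20), each = 9)
)
temora$beat_rate <- 5 + 1.2 * temora$temp + rnorm(nrow(temora), sd = 2)
temora$Facclimitisation_temp <- factor(temora$acclimitisation_temp)

# Long form: both main effects plus their interaction
m_long <- lm(beat_rate ~ temp + Facclimitisation_temp + temp:Facclimitisation_temp,
             data = temora)

# Short form: the * operator expands to exactly the same terms
m_short <- lm(beat_rate ~ temp * Facclimitisation_temp, data = temora)

# The two fits have identical coefficients
all.equal(coef(m_long), coef(m_short))

# Sequential ANOVA table: temp, Facclimitisation_temp, interaction, residuals
temora.aov <- anova(m_short)
temora.aov
```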
Is the interaction between temp and Facclimitisation_temp significant? What is the interpretation of the interaction term? Should we interpret the main effects of temp and Facclimitisation_temp? Add a comment (#) describing the interpretation of this model. Report the appropriate summary statistics such as F values, degrees of freedom and p values.
beat_rate as the response variable. Does the interpretation of the model change? Do the validation plots of the residuals look better?
End of Exercise 5