During this exercise you will practice adding code chunks and inline
R code to the R Markdown document you created during the previous exercise. The solutions to each
section of the exercise can be revealed by clicking on the Solution button
(don’t be too quick to do this though!).
Once you have finished you can find my version of the .Rmd
file here and the rendered HTML document here. A PDF version of the
document is here.
So, the first thing you will need to do is open your `squid_analysis.Rmd`
file in RStudio. You will already have a nicely marked up description of the
data and the morphological variables measured during the study, so we will
continue with data import, exploration and some simple visualisations. You
will need to download the `squid1.xlsx` file from the Data link, open it in
Excel and save it as a tab-delimited file called `squid1.txt`
(as we did during the R course).
Create a code chunk named `data-import` and include some R code to import
the `squid1.txt` file into R using the `read.table()` function and assign it
to a variable called `squid`. Within the code chunk get R to print out the
structure of the `squid` variable. Add some text below the
code chunk to explain (to your future self) what you are doing. Also
write some text with inline R code to state how many observations are
present in the dataset (hint: use the `nrow()` function) and
how many variables were measured (hint: use the `ncol()` function).
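
As a rough illustration of the import step, here is a minimal sketch that can be run on its own. It writes a tiny tab-delimited file to a temporary location and reads it back, so the values, the file `toy_file` and the object `squid_toy` are all made up for the example. For the real data the call would presumably be `read.table("squid1.txt", header = TRUE)`, with the path adjusted to wherever you saved the file.

```r
# Toy stand-in for squid1.txt (hypothetical values, not the real data)
toy_file <- tempfile(fileext = ".txt")
writeLines(c("sample.no\tDML\tweight",
             "1\t136\t67.0",
             "2\t144\t95.3",
             "3\t108\t37.2"),
           toy_file)

# header = TRUE uses the first row as the column names;
# sep = "\t" tells read.table() that the columns are separated by tabs
squid_toy <- read.table(toy_file, header = TRUE, sep = "\t")

str(squid_toy)    # structure: one line per variable
nrow(squid_toy)   # number of observations (rows)
ncol(squid_toy)   # number of variables (columns)
```

In the .Rmd text the same functions can be used inline, for example `r nrow(squid)` to report the number of observations.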
You might remember from Exercise 4 that the `year`, `month` and
`maturity.stage` variables were coded as integers in the original dataset
(see the output from `str(squid)` above to confirm) but we want to recode
them as factors. Create a new code chunk called `data-recode`
and include R code to create a new variable for each of these variables
in the `squid` dataframe and recode them as factors. Change
the chunk option to hide this code in the rendered document. Write some
text to describe that you have recoded the variables to let the reader
know you’ve done this (even though you don’t show the code in the final
document).
```{r, data-recode, echo=FALSE}
# convert variables to factors
squid$Fmaturity <- factor(squid$maturity.stage)
squid$Fmonth <- factor(squid$month)
squid$Fyear <- factor(squid$year)
```
The variables `maturity.stage`, `month` and `year` were converted from integers to factors in the dataframe
`squid`. These recoded variables were named `Fmaturity`, `Fmonth` and `Fyear`.
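
If you want to check that a recoding like this has worked, something along the following lines should do it. This is only a sketch on a made-up data frame (the object `toy` and its values are hypothetical); the same checks could be run on `squid`.

```r
# Hypothetical data frame with integer coded year and month
toy <- data.frame(year  = c(1989L, 1990L, 1990L, 1991L),
                  month = c(12L, 1L, 3L, 8L))

# Create new factor versions and leave the original columns untouched
toy$Fyear  <- factor(toy$year)
toy$Fmonth <- factor(toy$month)

str(toy)             # year and month are int, Fyear and Fmonth are Factor
is.factor(toy$Fyear) # TRUE if the recoding worked
levels(toy$Fyear)    # the distinct years, stored as character labels
```

Creating new variables rather than overwriting the originals means the integer versions are still available if you need them later.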
Next create a code chunk (give it a suitable name) and write some
code to create a table of the number of observations for each year and
month combination (hint: remember the `table()` function?).
Don’t forget to use the factor recoded versions of these variables. Use
the `kable()` function from the `knitr` package to
nicely render the table (remember you will need to use
`library(knitr)` to load the package first). You might also want
to include the argument `row.names = TRUE` when you use
the `kable()` function so the table contains the month
numbers. Another argument that is often useful to include is
`format = 'markdown'`, which will ensure the table renders
nicely in HTML and PDF formats. Write some text to highlight which year
has the fewest observations and which year and month
combinations have no observations.
Next, let's take a look at the number of observations across years and months.
```{r, data-obs}
library(knitr)
kable(table(squid$Fmonth, squid$Fyear), row.names = TRUE, format = 'markdown')
```
Or, if you want a fancy table with the variable names (month and year) then use the `pander()` function from the
`pander` package. You will also have to provide the dimnames to the table and use the `ftable()` function to
'flatten' the table.
```{r, data-obs2}
library(pander)
mytab <- table(squid$Fmonth, squid$Fyear)
names(dimnames(mytab)) <- c("Month", "Year")
pander(ftable(mytab))
```
In 1989 data were only collected during December and in 1991 data collection stopped in August.
During 1990, no data were collected in either February or June. There are also some months that
have very few observations (May 1990 and July 1991 for example) so care must be taken when
modelling these data.
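
Rather than reading the gaps off the table by eye, you could also ask R for them. The sketch below is one possible approach using a made-up data frame (`toy` and its counts are hypothetical and much smaller than the squid data). It finds the year with the fewest observations and lists the month and year combinations with none.

```r
# Hypothetical factor coded month and year columns
toy <- data.frame(
  Fmonth = factor(c(1, 1, 2, 3, 3, 3)),
  Fyear  = factor(c(1990, 1991, 1991, 1991, 1990, 1991))
)

# Cross tabulate; naming the arguments labels the table dimensions
tab <- table(Month = toy$Fmonth, Year = toy$Fyear)

# Total number of observations in each year, and the year with the fewest
year_totals <- colSums(tab)
fewest_year <- names(year_totals)[which.min(year_totals)]
fewest_year

# Convert to long format (one row per combination) and keep the empty cells
long <- as.data.frame(tab)
empty_cells <- long[long$Freq == 0, c("Month", "Year")]
empty_cells
```

Note that `which.min()` only returns the first minimum, so ties between years would need checking separately.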
We should also create a summary table of the number of observations
for each level of maturity stage for each month. Create a code chunk and
include code to do this but hide the code in the rendered document using
the appropriate chunk option. If you feel like it, experiment with the
`kableExtra` package to alter some of the formatting (text
size etc). You may need to install the package before you can use it.
Again, write some text to summarise your findings.
The number of observations in each month for each of the squid maturity stages is given in the table below.
```{r, maturity-obs, echo=FALSE}
# using just kable
kable(table(squid$Fmaturity, squid$Fmonth), row.names = TRUE)
# using kableExtra (good for html output)
library(kableExtra)
kable(table(squid$Fmaturity, squid$Fmonth), row.names = TRUE) %>%
  kable_styling(bootstrap_options = "striped", font_size = 14)
```
Not all maturity stages were observed in all months. Very few squid of maturity stage 1, 2 or 3
were caught in the months February to May whereas maturity stages 4 and 5 were
predominantly caught during these months.
OK, let’s produce some exploratory plots in our document. The first
thing we would like to know is whether there are any unusual
observations in the variables `DML`, `weight`, `nid.length` and
`ovary.weight`. Create a code
chunk containing code to plot Cleveland dotplots of these variables. You
can either plot one after the other or split the plotting device into 2
rows and 2 columns to plot them all together. This time we would like to
show the code used to create these plots. Describe what you see and come
up with a plausible explanation (refer back to Exercise 4 Question 7 for
a hint).
Now let's check for any unusual observations in the variables `DML`, `weight`,
`nid.length` and `ovary.weight`.
```{r, dotplot}
par(mfrow = c(2, 2))
dotchart(squid$DML, main = "DML")
dotchart(squid$weight, main = "weight")
dotchart(squid$nid.length, main = "nid length")
dotchart(squid$ovary.weight, main = "ovary weight")
```
It looks like the variable `nid.length` contains an **unusually large** value. Actually, this value
is biologically implausible and clearly an error. I went back and checked my field notebook and
sure enough it's a typo. I was knackered at the time and accidentally inserted a zero
when transcribing these data. **Doh!** This squid was identified as observation number
`r which(squid$nid.length > 400)` with a sample number
`r squid$sample.no[which(squid$nid.length > 400)]`. This observation was
subsequently removed from the data set.
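
The removal step itself is not shown above. Here is a minimal sketch of one way it could be done, using a made-up data frame (the object `toy` and its values are hypothetical; only the threshold of 400 and the column names `nid.length` and `sample.no` come from the text).

```r
# Hypothetical data with one implausibly large nid.length and one missing value
toy <- data.frame(sample.no  = c(105, 106, 107, 108),
                  nid.length = c(40.2, 430.2, NA, 55.0))

# Row number(s) of the suspect observation; which() ignores NA comparisons
bad <- which(toy$nid.length > 400)
bad
toy$sample.no[bad]

# Keep every row except the flagged one(s). Using setdiff() rather than
# toy[-bad, ] avoids dropping all rows if nothing happens to be flagged.
toy_clean <- toy[setdiff(seq_len(nrow(toy)), bad), ]
nrow(toy_clean)
```

Rows with a missing `nid.length` are kept here, because only the value known to be a transcription error should be dropped.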
Next, produce a boxplot of the `DML` variable against `Fmaturity`
to examine whether the size of the squid changes
with maturity stage. Change the chunk option to suppress the R code in
the final document and only display the plot. Write some text to
summarise the main conclusions from the plot. Also include some inline R
code to report the mean value of DML for each of the maturity
stages.
Let's take a look at whether DML changes with maturity stage.
```{r, maturity-dml, echo=FALSE}
boxplot(DML ~ Fmaturity, data = squid, xlab = "maturity stage", ylab = "DML")
```
DML was lowest for maturity stage 1 with a mean length of
`r round(mean(squid$DML[squid$Fmaturity == 1]), digits = 2)` mm. DML
increased until maturity stage 3
(mean `r round(mean(squid$DML[squid$Fmaturity == 3]), digits = 2)` mm)
after which it remained reasonably consistent for maturity stages 4
(mean `r round(mean(squid$DML[squid$Fmaturity == 4]), digits = 2)` mm)
and 5 (mean `r round(mean(squid$DML[squid$Fmaturity == 5]), digits = 2)` mm).
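
Repeating the same subsetting expression for every maturity stage works, but the group means can also be computed in one go with `tapply()` and then picked out by name. The following is a sketch on made-up data (the object `toy` and its values are hypothetical).

```r
# Hypothetical DML measurements for three maturity stages
toy <- data.frame(DML       = c(100, 110, 150, 170, 200, 210),
                  Fmaturity = factor(c(1, 1, 3, 3, 5, 5)))

# Mean DML for each level of Fmaturity, rounded to 2 decimal places.
# Add na.rm = TRUE after mean if the variable contains missing values.
dml_means <- round(tapply(toy$DML, toy$Fmaturity, mean), digits = 2)
dml_means

# Individual means are indexed by the factor level (as a character string)
dml_means["3"]
```

An inline expression such as `r dml_means["3"]` could then replace the longer subsetting code in the text, assuming `dml_means` was created in an earlier chunk.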
It’s always good practice to include a summary of the version of R
you have been using as well as a list of packages loaded. An easy way to
do this is to include `sessionInfo()` in a code chunk at the end
of your document.
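
For example, the final chunk could contain something like the sketch below. Calling `sessionInfo()` on its own prints everything; storing the result (here in an object called `si`, a name chosen just for this example) also lets you pull out individual pieces.

```r
# Print the R version, platform and attached packages
si <- sessionInfo()
si

# The result is a list, so individual elements can be extracted
si$R.version$version.string   # e.g. for reporting the R version inline
names(si$otherPkgs)           # attached non-base packages (NULL if none)
```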