This is a first attempt at an R version of the Randomized Complete Block example.
A Randomized Complete Block Design is one in which there are two factors, or effects,, with one fixed effect (typically the 'treatments' of interest) and one random, block effect. I am not attempting to present all of the theory and background to RCBD designs and models; there are many statistical texts which discuss this in substantial detail (plus I am not a professional statistician). For statistical details I suggest: Steel, Torrie and Dickey, or Cochran and Cox). Suffice to say that we are looking here at a two-way, mixed model (fixed and random effects) in a cross-classified arrangement. Here are links to sections for a two-way fixed effects model (classical ANOVA) and a two factor nested model.
An RCBD model is relatively straightforward in the completely balanced scenario, especially if one is simply looking at treatment differences, and not wanting lsmreans/emmeans; any textbook will provide the classical formulae. For unbalanced data and when we wish to obtain lsmeans/emmeans the classical, balanced formulae are insufficient and we need to turn to matrix-based approaches, and computer software. Here we are looking at the use of R; one of the nice features of such matrix-based approaches is that the same formulae work equally well for both balanced and unbalanced data.
This example considers a situation where we want to test the effect of four different diets to be fed to just-weaned piglets. This is a study to simulate feeding of young infant humans, our Research Ethics Board would not allow us to carry out this early-stage trial using humans! We know/believe, from previous studies, that there is likely to be an effect of the sow, even post-weaning. So, we decide to use a Randomized Complete Block Design: we shall have 12 sows, and we shall take 4 male piglets from each sow, and assign one piglet to each of the four treatment diets. Thus, we shall need to consider both Sow and Diet as factors in our statistical model. We shall wish to consider Diet as a fixed effect; we are interested in these 4 specific diets. We shall wish to consider the sows not as fixed effects, but rather as a random effect; we want our conclusions not to be limited to these 12 sows, but to be generalisable across sows in general. This would be a completely balanced design. Unfortunately, 1 piglet died before the start of the experiment, and hence the analysis is unbalanced; one missing observation, for reasons unrelated to the experiment (i.e. Missing At Random, MAR). We wish to obtain an F-test for the fixed effect of Diets, estimates of the differences amongst Diets (accounting for the multiple comparisons issue), and various linear combinations amongst the diets, as well as estimates of the random effects (sigma squared e and sigma squared sow (aka block)), as well as a test of significance of the random sow effect. We would also like to obtain BLUP (Best Linear Unbiased Preidictons) of the sow effects and their PEV (Prediction Error Variances) and hence standard errors (and hence accuracy) of these BLUP's.
Here are the data from this trial
Diet | Sow | Gain | |
---|---|---|---|
1 | 1 | 1 | 51.90 |
2 | 2 | 1 | 53.70 |
3 | 3 | 1 | 54.90 |
4 | 4 | 1 | 55.60 |
5 | 1 | 2 | 51.60 |
6 | 3 | 2 | 52.20 |
7 | 4 | 2 | 53.80 |
8 | 1 | 3 | 44.00 |
9 | 2 | 3 | 46.00 |
10 | 3 | 3 | 47.30 |
11 | 4 | 3 | 48.40 |
12 | 1 | 4 | 52.60 |
13 | 2 | 4 | 52.80 |
14 | 3 | 4 | 54.00 |
15 | 4 | 4 | 56.30 |
16 | 1 | 5 | 56.50 |
17 | 2 | 5 | 57.50 |
18 | 3 | 5 | 58.20 |
19 | 4 | 5 | 61.70 |
20 | 1 | 6 | 48.50 |
21 | 2 | 6 | 50.80 |
22 | 3 | 6 | 51.60 |
23 | 4 | 6 | 52.00 |
24 | 1 | 7 | 52.10 |
25 | 2 | 7 | 53.40 |
26 | 3 | 7 | 55.50 |
27 | 4 | 7 | 56.10 |
28 | 1 | 8 | 52.60 |
29 | 2 | 8 | 52.40 |
30 | 3 | 8 | 53.60 |
31 | 4 | 8 | 55.80 |
32 | 1 | 9 | 49.30 |
33 | 2 | 9 | 49.40 |
34 | 3 | 9 | 52.10 |
35 | 4 | 9 | 51.60 |
36 | 1 | 10 | 51.10 |
37 | 2 | 10 | 52.50 |
38 | 3 | 10 | 52.90 |
39 | 4 | 10 | 57.10 |
40 | 1 | 11 | 58.60 |
41 | 2 | 11 | 57.60 |
42 | 3 | 11 | 58.90 |
43 | 4 | 11 | 61.00 |
44 | 1 | 12 | 45.10 |
45 | 2 | 12 | 46.70 |
46 | 3 | 12 | 47.90 |
47 | 4 | 12 | 48.00 |
NOTE: the piglets are all kept in individual pens, so they are all independent of one another (approved by the University's Animal Care Committee). Piglets were randomly assigned, within litters, one to each of the four diets. Prior to the start of the experiment one piglet on Diet 2 was lost, for reasons completely unrelated to the treatments (hence it is legitimate to consider it to be Missing At Random, MAR).
Thus, this is an RCBD, with sows as the 'blocks', and diets as our treatment.
The statistical model will be :
Yij = µ + Dieti + Sowj + eij
We read the data, from a plain ASCII text file, into R, we then use the as.factor() function to define Sow and Diet as factors, i.e. as qualitative levels, with no quantitative meaning. We use the lmerTest package, to fit a linear mixed model, and then esticon() and emmeans() to obtain estimated marginal means (aka least squares means/lsmeans) and differences.
Below I show code interspersed with output and some interpretation, comments and explanations, so that these codes and ideas can be extended to other models.
Recap: the basic model has the fixed effect of Diet and the random effect of Sow. Both effects are factors, hnence we define them as such. Note my style of defining the Filename for the read.table function; this is to separate the FilePath from the filename per se, to aid in portability and/or moving the complete subdirectory to another location or system.
# # R statements, for a mixed effects 2-way model library(doBy) library(lmerTest) # read data, fit Y = mu + diet + sow + e # use a FilePath + filename, to make it easier to move/change # a subdirectory FilePath <- "your path to data subdirectory/" FileName <- "blocks2.txt" filen <- paste(FilePath,FileName,sep="") filen ds<-read.table(filen,header=TRUE) summary(ds) ds$Diet<-as.factor(ds$Diet) ds$Sow<-as.factor(ds$Sow) summary(ds) # mixed model, Diet is a fixed effect, Sow is a Random effect lm2 <- lmer(Gain~Diet+(1|Sow),data=ds) # generate an ANOVA for the fixed effects (ONLY) anova(lm2)
Sum Sq | Mean Sq | NumDF | DenDF | F value | Pr(>F) | |
---|---|---|---|---|---|---|
Diet | 88.46 | 29.49 | 3.00 | 32.01 | 47.91 | 0.0000 |
We see that we conclude that there is a significant effet of Diets, i.e. there are (some) differences amongst these Diets. Which differences are significant is a subsiduary and subsequent question.
Here we show the code and results for testing the significance for the random effect(s), via a log-likelihood test:
# test the significance of the random effect(s), via Log Likelihood and Chi-squared ranova(lm2)
npar | logLik | AIC | LRT | Df | Pr(>Chisq) | |
---|---|---|---|---|---|---|
<none> | 6.00 | -80.43 | 172.87 | |||
(1 | Sow) | 5.00 | -124.83 | 259.66 | 88.79 | 1 | 0.0000 |
Having tested the significance of the random effect(s), it is going to be sensible to want to know what are the numerical magnitude of the variances. These may/should be needed for publishing (eithere a thesis or manuscripts) and are also needed for any power/sample size calculations. Note how we use an extractor function VarCorr(). Note also we use the xtable package to produce the HTML code for this web-page; xtble can also produce LaTeX code if the output is to be incorporated into a LaTeX document (this is the approach that I use for building the course notes for this graduate course)
lm2 str(lm2) # show how we can output the random effects variance-covariance components VarCorr(lm2) (vc <- VarCorr(lm2)) ## default print method: standard dev and corr ## both variance and std.dev. print(vc,comp=c("Variance","Std.Dev."),digits=4) # generate html code displaying the variance components, as shown below vc.xtable <- xtable(as.data.frame(print(vc,comp=c("Variance","Std.Dev."),digits=4))) print(vc.xtable,type="html")
Group | Variance | Standard Deviation | |
---|---|---|---|
1 | Sow (Intercept) | 14.53 | 3.81 |
2 | Residual | 0.62 | 0.78 |
Diet | emmean(/th> | SE | df | lower.CL | upper.CL |
---|---|---|---|---|---|
1 | 51.2 | 1.12 | 11.7 | 48.7 | 53.6 |
2 | 52.0 | 1.13 | 11.8 | 49.6 | 54.5 |
3 | 53.3 | 1.12 | 11.7 | 50.8 | 55.7 |
4 | 54.8 | 1.12 | 11.7 | 52.3 | 57.2 |
Note that the denominator d.f. are non necessarily integers, this is a consequence of the unbalanced data.
Least Squares Means table:
Difference | Estimate | Std. Error | df | t value | lower | upper | Pr(>|t|) |
---|---|---|---|---|---|---|---|
Diet1 - Diet2 | -0.86658 | 0.32981 | 32 | -2.6276 | -1.53836 | -0.19481 | 0.0130916 * |
Diet1 - Diet3 | -2.10000 | 0.32028 | 32 | -6.5568 | -2.75238 | -1.44762 | 2.185e-07 *** |
Diet1 - Diet4 | -3.62500 | 0.32028 | 32 | -11.3184 | -4.27738 | -2.97262 | 1.004e-12 *** |
Diet2 - Diet3 | -1.23342 | 0.32981 | 32 | -3.7398 | -1.90519 | -0.56164 | 0.0007226 *** |
Diet2 - Diet4 | -2.75842 | 0.32981 | 32 | -8.3638 | -3.43019 | -2.08664 | 1.472e-09 *** |
Diet3 - Diet4 | -1.52500 | 0.32028 | 32 | -4.7615 | -2.17738 | -0.87262 | 3.968e-05 *** |
> # > # R statements, for a mixed effects 2-way model > > library(doBy) Loading required package: survival > library(lmerTest) Loading required package: Matrix Loading required package: lme4 Attaching package: lmerTest The following object is masked from package:lme4: lmer The following object is masked from package:stats: step > # see below for use of lmerTest (probably better) > > # read data, fit Y = mu + sow + trt + e > > ds<-read.table("complete path and file name",header=TRUE) > summary(ds) Diet Sow Gain Min. :1.000 Min. : 1.000 Min. :44.00 1st Qu.:1.500 1st Qu.: 4.000 1st Qu.:50.95 Median :3.000 Median : 7.000 Median :52.60 Mean :2.511 Mean : 6.596 Mean :52.83 3rd Qu.:3.500 3rd Qu.: 9.500 3rd Qu.:55.70 Max. :4.000 Max. :12.000 Max. :61.70 > ds$Diet<-as.factor(ds$Diet) > ds$Sow<-as.factor(ds$Sow) > summary(ds) Diet Sow Gain 1:12 1 : 4 Min. :44.00 2:11 3 : 4 1st Qu.:50.95 3:12 4 : 4 Median :52.60 4:12 5 : 4 Mean :52.83 6 : 4 3rd Qu.:55.70 7 : 4 Max. :61.70 (Other):23 > > # mixed model > lm2 <- lmer(Gain~Diet+(1|Sow),data=ds) > anova(lm2) Analysis of Variance Table of type III with Satterthwaite approximation for degrees of freedom Sum Sq Mean Sq NumDF DenDF F.value Pr(>F) Diet 88.461 29.487 3 32.013 47.91 6.11e-12 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 > rand(lm2) Analysis of Random effects Table: Chi.sq Chi.DF p.value Sow 88.8 1 <2e-16 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 > lm2 Linear mixed model fit by REML ['merModLmerTest'] Formula: Gain ~ Diet + (1 | Sow) Data: ds REML criterion at convergence: 160.8686 Random effects: Groups Name Std.Dev. Sow (Intercept) 3.8119 Residual 0.7845 Number of obs: 47, groups: Sow, 12 Fixed Effects: (Intercept) Diet2 Diet3 Diet4 51.1583 0.8666 2.1000 3.6250 >
From the ANOVA for the fixed effects test we can see that the F-value is 47.91, with ndf=3 and ddf=32.01. The non-integer degrees of freedom for the denominator are because of the missing observation.
For the test of significance of the random effect of Sow, we see that the calculated Chi-squared value is 88.8, with 1 d.f., i.e. statistically significant, hence validating our choice of an RCBD.
Sow variance = 14.530
Residual variance = 0.6155