Two-Way mixed effects model, RCBD, R version

Randomized Complete Block Design

This is a first attempt at an R version of the Randomized Complete Block example.

A Randomized Complete Block Design is one in which there are two factors, or effects,, with one fixed effect (typically the 'treatments' of interest) and one random, block effect. I am not attempting to present all of the theory and background to RCBD designs and models; there are many statistical texts which discuss this in substantial detail (plus I am not a professional statistician). For statistical details I suggest: Steel, Torrie and Dickey, or Cochran and Cox). Suffice to say that we are looking here at a two-way, mixed model (fixed and random effects) in a cross-classified arrangement. Here are links to sections for a two-way fixed effects model (classical ANOVA) and a two factor nested model.

An RCBD model is relatively straightforward in the completely balanced scenario, especially if one is simply looking at treatment differences, and not wanting lsmreans/emmeans; any textbook will provide the classical formulae. For unbalanced data and when we wish to obtain lsmeans/emmeans the classical, balanced formulae are insufficient and we need to turn to matrix-based approaches, and computer software. Here we are looking at the use of R; one of the nice features of such matrix-based approaches is that the same formulae work equally well for both balanced and unbalanced data.

This example considers a situation where we want to test the effect of four different diets to be fed to just-weaned piglets. This is a study to simulate feeding of young infant humans, our Research Ethics Board would not allow us to carry out this early-stage trial using humans! We know/believe, from previous studies, that there is likely to be an effect of the sow, even post-weaning. So, we decide to use a Randomized Complete Block Design: we shall have 12 sows, and we shall take 4 male piglets from each sow, and assign one piglet to each of the four treatment diets. Thus, we shall need to consider both Sow and Diet as factors in our statistical model. We shall wish to consider Diet as a fixed effect; we are interested in these 4 specific diets. We shall wish to consider the sows not as fixed effects, but rather as a random effect; we want our conclusions not to be limited to these 12 sows, but to be generalisable across sows in general. This would be a completely balanced design. Unfortunately, 1 piglet died before the start of the experiment, and hence the analysis is unbalanced; one missing observation, for reasons unrelated to the experiment (i.e. Missing At Random, MAR). We wish to obtain an F-test for the fixed effect of Diets, estimates of the differences amongst Diets (accounting for the multiple comparisons issue), and various linear combinations amongst the diets, as well as estimates of the random effects (sigma squared e and sigma squared sow (aka block)), as well as a test of significance of the random sow effect. We would also like to obtain BLUP (Best Linear Unbiased Preidictons) of the sow effects and their PEV (Prediction Error Variances) and hence standard errors (and hence accuracy) of these BLUP's.

Here are the data from this trial

	Diet	Sow	Gain
1	1	1	51.90
2	2	1	53.70
3	3	1	54.90
4	4	1	55.60
5	1	2	51.60
6	3	2	52.20
7	4	2	53.80
8	1	3	44.00
9	2	3	46.00
10	3	3	47.30
11	4	3	48.40
12	1	4	52.60
13	2	4	52.80
14	3	4	54.00
15	4	4	56.30
16	1	5	56.50
17	2	5	57.50
18	3	5	58.20
19	4	5	61.70
20	1	6	48.50
21	2	6	50.80
22	3	6	51.60
23	4	6	52.00
24	1	7	52.10
25	2	7	53.40
26	3	7	55.50
27	4	7	56.10
28	1	8	52.60
29	2	8	52.40
30	3	8	53.60
31	4	8	55.80
32	1	9	49.30
33	2	9	49.40
34	3	9	52.10
35	4	9	51.60
36	1	10	51.10
37	2	10	52.50
38	3	10	52.90
39	4	10	57.10
40	1	11	58.60
41	2	11	57.60
42	3	11	58.90
43	4	11	61.00
44	1	12	45.10
45	2	12	46.70
46	3	12	47.90
47	4	12	48.00

NOTE: the piglets are all kept in individual pens, so they are all independent of one another (approved by the University's Animal Care Committee). Piglets were randomly assigned, within litters, one to each of the four diets. Prior to the start of the experiment one piglet on Diet 2 was lost, for reasons completely unrelated to the treatments (hence it is legitimate to consider it to be Missing At Random, MAR).

Thus, this is an RCBD, with sows as the 'blocks', and diets as our treatment.

The statistical model will be :

Y_ij = µ + Diet_i + Sow_j + e_ij

We read the data, from a plain ASCII text file, into R, we then use the as.factor() function to define Sow and Diet as factors, i.e. as qualitative levels, with no quantitative meaning. We use the lmerTest package, to fit a linear mixed model, and then esticon() and emmeans() to obtain estimated marginal means (aka least squares means/lsmeans) and differences.

Below I show code interspersed with output and some interpretation, comments and explanations, so that these codes and ideas can be extended to other models.

Recap: the basic model has the fixed effect of Diet and the random effect of Sow. Both effects are factors, hnence we define them as such. Note my style of defining the Filename for the read.table function; this is to separate the FilePath from the filename per se, to aid in portability and/or moving the complete subdirectory to another location or system.


#
# R statements, for a mixed effects 2-way model

library(doBy)
library(lmerTest)

# read data, fit Y = mu + diet + sow + e
 
# use a FilePath + filename, to make it easier to move/change
# a subdirectory

FilePath <- "your path to data subdirectory/"
FileName <- "blocks2.txt"

filen <- paste(FilePath,FileName,sep="")
filen

ds<-read.table(filen,header=TRUE)

summary(ds)
ds$Diet<-as.factor(ds$Diet)
ds$Sow<-as.factor(ds$Sow)
summary(ds)

# mixed model, Diet is a fixed effect, Sow is a Random effect
lm2 <- lmer(Gain~Diet+(1|Sow),data=ds)

# generate an ANOVA for the fixed effects (ONLY)
anova(lm2)

ANOVA, fixed effect(s)

Sum Sq Mean Sq NumDF DenDF F value Pr(>F)

Diet 88.46 29.49 3.00 32.01 47.91 0.0000

	Sum Sq	Mean Sq	NumDF	DenDF	F value	Pr(>F)
Diet	88.46	29.49	3.00	32.01	47.91	0.0000

We see that we conclude that there is a significant effet of Diets, i.e. there are (some) differences amongst these Diets. Which differences are significant is a subsiduary and subsequent question.

Random effect(s)

Here we show the code and results for testing the significance for the random effect(s), via a log-likelihood test:


# test the significance of the random effect(s), via Log Likelihood and Chi-squared
ranova(lm2)

npar logLik AIC LRT Df Pr(>Chisq)

<none> 6.00 -80.43 172.87

(1 | Sow) 5.00 -124.83 259.66 88.79 1 0.0000

	npar	logLik	AIC	LRT	Df	Pr(>Chisq)
<none>	6.00	-80.43	172.87
(1 \| Sow)	5.00	-124.83	259.66	88.79	1	0.0000

Having tested the significance of the random effect(s), it is going to be sensible to want to know what are the numerical magnitude of the variances. These may/should be needed for publishing (eithere a thesis or manuscripts) and are also needed for any power/sample size calculations. Note how we use an extractor function VarCorr(). Note also we use the xtable package to produce the HTML code for this web-page; xtble can also produce LaTeX code if the output is to be incorporated into a LaTeX document (this is the approach that I use for building the course notes for this graduate course)

lm2
str(lm2)

# show how we can output the random effects variance-covariance components

VarCorr(lm2)

(vc <- VarCorr(lm2))  ## default print method: standard dev and corr

## both variance and std.dev.
print(vc,comp=c("Variance","Std.Dev."),digits=4)

# generate html code displaying the variance components, as shown
below
vc.xtable <- xtable(as.data.frame(print(vc,comp=c("Variance","Std.Dev."),digits=4)))
print(vc.xtable,type="html")

Results, variance components

Group Variance Standard Deviation

1 Sow (Intercept) 14.53 3.81

2 Residual 0.62 0.78

	Group	Variance	Standard Deviation
1	Sow (Intercept)	14.53	3.81
2	Residual	0.62	0.78

Estimated marginal means (emmeans/lsmeans)

Diet emmean(/th> SE df lower.CL upper.CL

1 51.2 1.12 11.7 48.7 53.6

2 52.0 1.13 11.8 49.6 54.5

3 53.3 1.12 11.7 50.8 55.7

4 54.8 1.12 11.7 52.3 57.2

Degrees-of-freedom method: Kenward-Roger Confidence level used: 0.95

Diet	emmean(/th>	SE	df	lower.CL	upper.CL
1	51.2	1.12	11.7	48.7	53.6
2	52.0	1.13	11.8	49.6	54.5
3	53.3	1.12	11.7	50.8	55.7
4	54.8	1.12	11.7	52.3	57.2

Note that the denominator d.f. are non necessarily integers, this is a consequence of the unbalanced data.

Differences amonst Estimated marginal means (emmeans/lsmeans)

Least Squares Means table:

Difference Estimate Std. Error df t value lower upper Pr(>|t|)

Diet1 - Diet2 -0.86658 0.32981 32 -2.6276 -1.53836 -0.19481 0.0130916 *

Diet1 - Diet3 -2.10000 0.32028 32 -6.5568 -2.75238 -1.44762 2.185e-07 ***

Diet1 - Diet4 -3.62500 0.32028 32 -11.3184 -4.27738 -2.97262 1.004e-12 ***

Diet2 - Diet3 -1.23342 0.32981 32 -3.7398 -1.90519 -0.56164 0.0007226 ***

Diet2 - Diet4 -2.75842 0.32981 32 -8.3638 -3.43019 -2.08664 1.472e-09 ***

Diet3 - Diet4 -1.52500 0.32028 32 -4.7615 -2.17738 -0.87262 3.968e-05 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Confidence level: 95%
Degrees of freedom method: Satterthwaite

Difference	Estimate	Std. Error	df	t value	lower	upper	Pr(>\|t\|)
Diet1 - Diet2	-0.86658	0.32981	32	-2.6276	-1.53836	-0.19481	0.0130916 *
Diet1 - Diet3	-2.10000	0.32028	32	-6.5568	-2.75238	-1.44762	2.185e-07 ***
Diet1 - Diet4	-3.62500	0.32028	32	-11.3184	-4.27738	-2.97262	1.004e-12 ***
Diet2 - Diet3	-1.23342	0.32981	32	-3.7398	-1.90519	-0.56164	0.0007226 ***
Diet2 - Diet4	-2.75842	0.32981	32	-8.3638	-3.43019	-2.08664	1.472e-09 ***
Diet3 - Diet4	-1.52500	0.32028	32	-4.7615	-2.17738	-0.87262	3.968e-05 ***



> #
> # R statements, for a mixed effects 2-way model
> 
> library(doBy)
Loading required package: survival
> library(lmerTest)
Loading required package: Matrix
Loading required package: lme4

Attaching package: ‘lmerTest’

The following object is masked from ‘package:lme4’:

    lmer

The following object is masked from ‘package:stats’:

    step

> # see below for use of lmerTest (probably better)
> 
> # read data, fit Y = mu + sow + trt + e
>  
> ds<-read.table("complete path and file name",header=TRUE)
> summary(ds)
      Diet            Sow              Gain      
 Min.   :1.000   Min.   : 1.000   Min.   :44.00  
 1st Qu.:1.500   1st Qu.: 4.000   1st Qu.:50.95  
 Median :3.000   Median : 7.000   Median :52.60  
 Mean   :2.511   Mean   : 6.596   Mean   :52.83  
 3rd Qu.:3.500   3rd Qu.: 9.500   3rd Qu.:55.70  
 Max.   :4.000   Max.   :12.000   Max.   :61.70  
> ds$Diet<-as.factor(ds$Diet)
> ds$Sow<-as.factor(ds$Sow)
> summary(ds)
 Diet        Sow          Gain      
 1:12   1      : 4   Min.   :44.00  
 2:11   3      : 4   1st Qu.:50.95  
 3:12   4      : 4   Median :52.60  
 4:12   5      : 4   Mean   :52.83  
        6      : 4   3rd Qu.:55.70  
        7      : 4   Max.   :61.70  
        (Other):23                  
> 
> # mixed model
> lm2 <- lmer(Gain~Diet+(1|Sow),data=ds)
> anova(lm2)
Analysis of Variance Table of type III  with  Satterthwaite 
approximation for degrees of freedom
     Sum Sq Mean Sq NumDF  DenDF F.value   Pr(>F)    
Diet 88.461  29.487     3 32.013   47.91 6.11e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> rand(lm2)
Analysis of Random effects Table:
    Chi.sq Chi.DF p.value    
Sow   88.8      1  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> lm2
Linear mixed model fit by REML ['merModLmerTest']
Formula: Gain ~ Diet + (1 | Sow)
   Data: ds
REML criterion at convergence: 160.8686
Random effects:
 Groups   Name        Std.Dev.
 Sow      (Intercept) 3.8119  
 Residual             0.7845  
Number of obs: 47, groups:  Sow, 12
Fixed Effects:
(Intercept)        Diet2        Diet3        Diet4  
    51.1583       0.8666       2.1000       3.6250  
>

From the ANOVA for the fixed effects test we can see that the F-value is 47.91, with ndf=3 and ddf=32.01. The non-integer degrees of freedom for the denominator are because of the missing observation.

For the test of significance of the random effect of Sow, we see that the calculated Chi-squared value is 88.8, with 1 d.f., i.e. statistically significant, hence validating our choice of an RCBD.

Sow variance = 14.530

Residual variance = 0.6155