Nested Design, take 2

Nested, Subsampling Design, take 2 (aka follow-up)

In many cases we have a model with subsamples, as we have seen earlier. This arises when the experimental unit and the sampling unit are not the same. For example, imagine that we have a group of 54 pigs which we divide into groups of 6 and put in 9 pens. We have 3 diets and we randomly assign the diets to the pens, so that we have 3 pens on each diet. In each pen there is a trough into which we put the feed, so that all pigs have free and equal access. We weigh each pig at the start of the experiment and at the end and then calculate the difference; it is that difference (the weight gain) that we shall be looking at as our dependent variable (outcome). We have 54 pigs and hence 54 weight gains.

BUT the experimental unit, to which the treatment was applied, was the pen and NOT the pig; pigs are the subsampling unit. It does not matter that it was the pigs that ate the feed and not the 'pen'; the animals were a 'group' and this MUST be accounted for. If one ignored this elementary fact and analysed the data as though the 54 pigs were independent, one would in all likelihood come up with overly optimistic results; i.e. rubbish. Additionally, we may well have a mixture of male and female pigs.

The statistical model will be:
Y_ijk = mu + diet_i + pen_ij + ...

The data, together with the SAS data step used to read them, are given below.


data subsamp2;
input diet pen pig gain sex;
cards;
   1  1  1   241.47   2
   1  1  2   266.13   1
   1  1  3   201.13   2
   1  1  4   314.11   1
   1  1  5   234.54   1
   1  1  6   256.76   2
   1  2  1   343.85   1
   1  2  2   278.91   1
   1  2  3   263.21   2
   1  2  4   299.79   2
   1  2  5   329.00   2
   1  2  6   343.61   1
   1  3  1   257.04   2
   1  3  2   340.12   1
   1  3  3   276.31   2
   1  3  4   293.03   1
   1  3  5   322.97   1
   1  3  6   313.31   1
   2  1  1   268.52   1
   2  1  2   271.46   2
   2  1  3   269.41   2
   2  1  4   242.54   2
   2  1  5   265.53   2
   2  1  6   281.51   1
   2  2  1   355.27   1
   2  2  2   291.54   1
   2  2  3   308.28   1
   2  2  4   275.09   1
   2  2  5   301.42   2
   2  2  6   312.66   2
   2  3  1   287.78   1
   2  3  2   356.63   1
   2  3  3   347.70   1
   2  3  4   339.20   1
   2  3  5   334.41   2
   2  3  6   259.72   2
   3  1  1   316.33   2
   3  1  2   354.47   1
   3  1  3   340.59   2
   3  1  4   395.03   2
   3  1  5   372.19   2
   3  1  6   375.58   1
   3  2  1   317.48   1
   3  2  2   287.57   2
   3  2  3   348.24   2
   3  2  4   336.53   2
   3  2  5   283.04   2
   3  2  6   310.32   1
   3  3  1   341.39   2
   3  3  2   350.02   1
   3  3  3   380.93   1
   3  3  4   395.15   1
   3  3  5   365.06   1
   3  3  6   364.22   1
;
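
Before fitting anything it can be worth checking the layout of the data. The sketch below is one possible check and is not part of the original analysis: it tabulates the pigs by diet and pen, and by diet and sex. Note that the pen labels 1 to 3 are reused within each diet, so pen 1 on diet 1 is a different pen from pen 1 on diet 2; this is why pen has to be treated as nested within diet.

```sas
/* Layout check (illustrative): counts of pigs per diet-by-pen */
/* and per diet-by-sex cell, with percentages suppressed.       */
proc freq data=subsamp2;
tables diet*pen diet*sex / norow nocol nopercent;
run;
```

Each diet-by-pen cell should contain 6 pigs; the diet-by-sex counts show how evenly the sexes are spread across the diets.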

Anything else to our model?

Well, if (and only if) all our pigs were the same sex then no, our model would be complete (apart from the residual error, i.e. the variation amongst pigs within pen).

As described, there IS more; we have piglets of different sexes, males and females. We cannot and should not ignore this. How do we consider the effect of sex?

WELL. We should recognise that this is an example of a 'split-plot'. We have sub-divided our group of piglets (pen, aka 'plot') into 2 sub-groups (males and females). So, we shall add to our model an effect of sex and a diet-by-sex interaction.

Our model is now:
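Y_ijkl = mu + diet_i + pen_ij + sex_k + diet*sex_ik + e_ijkl
where i indexes diet, j indexes pen within diet, k indexes sex and l indexes pig; pen_ij is the random effect of pen within diet and e_ijkl is the residual (the variation amongst pigs within pen).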

and our SAS code is:


proc mixed data=subsamp2 lognote;
class diet pen pig sex;
model gain = diet sex diet*sex/ddfm=kr;
random pen(diet);
lsmeans diet sex diet*sex;
run;

SAS output

The Mixed Procedure

Model Information
Data Set	WORK.SUBSAMP2
Dependent Variable	gain
Covariance Structure	Variance Components
Estimation Method	REML
Residual Variance Method	Profile
Fixed Effects SE Method	Kenward-Roger
Degrees of Freedom Method	Kenward-Roger

Class Level Information
Class	Levels	Values
diet	3	1 2 3
pen	3	1 2 3
pig	6	1 2 3 4 5 6
sex	2	1 2

Dimensions
Covariance Parameters	2
Columns in X	12
Columns in Z	9
Subjects	1
Max Obs per Subject	54

Number of Observations
Number of Observations Read	54
Number of Observations Used	54
Number of Observations Not Used	0

Iteration History
Iteration	Evaluations	-2 Res Log Like	Criterion
0	1	488.99068327
1	2	477.15648448	0.00000003
2	1	477.15647939	0.00000000

Convergence criteria met.

Covariance Parameter Estimates
Cov Parm	Estimate
pen(diet)	614.06
Residual	744.20

Fit Statistics
-2 Res Log Likelihood	477.2
AIC (Smaller is Better)	481.2
AICC (Smaller is Better)	481.4
BIC (Smaller is Better)	481.6

Type 3 Tests of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr > F
diet	2	5.91	4.24	0.0721
sex	1	44	8.47	0.0057
diet*sex	2	43.9	0.94	0.4000

Least Squares Means
Effect	diet	sex	Estimate	Standard Error	DF	t Value	Pr > |t|
diet	1		285.41	15.7023	5.92	18.18	<.0001
diet	2		297.22	15.7036	5.92	18.93	<.0001
diet	3		346.34	15.6854	5.89	22.08	<.0001
sex		1	321.20	9.7810	7.88	32.84	<.0001
sex		2	298.12	10.0036	8.59	29.80	<.0001
diet*sex	1	1	304.33	16.7309	7.6	18.19	<.0001
diet*sex	1	2	266.50	17.2912	8.63	15.41	<.0001
diet*sex	2	1	306.56	16.8111	7.69	18.24	<.0001
diet*sex	2	2	287.89	17.4123	8.77	16.53	<.0001
diet*sex	3	1	352.72	17.2763	8.37	20.42	<.0001
diet*sex	3	2	339.97	17.2763	8.37	19.68	<.0001

This provides us with our basic analysis, and provides the appropriate tests of the various fixed effects. However, it does not tell us whether the effect of pen can be considered statistically significant. For that we need to re-fit the model WITHOUT the random effect of pen, so that we can compare the Fit Statistics (we shall use the BIC values to compare the models with and without pen).
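
The output for the reduced model follows. The exact statements used are not shown here; a minimal sketch that should give an equivalent fit, assuming the same data set and the same fixed effects, simply drops the RANDOM statement (and with it the Kenward-Roger option, since the output reports residual degrees of freedom):

```sas
/* Reduced model: same fixed effects, no random pen(diet) term. */
/* Pigs are (wrongly) treated as independent observations.      */
proc mixed data=subsamp2;
class diet pen pig sex;
model gain = diet sex diet*sex;
run;
```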

The Mixed Procedure, model without random pen effect

Model Information
Data Set	WORK.SUBSAMP2
Dependent Variable	gain
Covariance Structure	Diagonal
Estimation Method	REML
Residual Variance Method	Profile
Fixed Effects SE Method	Model-Based
Degrees of Freedom Method	Residual

Class Level Information
Class	Levels	Values
diet	3	1 2 3
pen	3	1 2 3
pig	6	1 2 3 4 5 6
sex	2	1 2

Dimensions
Covariance Parameters	1
Columns in X	12
Columns in Z	0
Subjects	1
Max Obs per Subject	54

Number of Observations
Number of Observations Read	54
Number of Observations Used	54
Number of Observations Not Used	0

Covariance Parameter Estimates
Cov Parm	Estimate
Residual	1182.39

Fit Statistics
-2 Res Log Likelihood	489.0
AIC (Smaller is Better)	491.0
AICC (Smaller is Better)	491.1
BIC (Smaller is Better)	492.9

Type 3 Tests of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr > F
diet	2	48	15.97	<.0001
sex	1	48	10.16	0.0025
diet*sex	2	48	0.31	0.7334

Deciding whether the effect of pen, nested within diet, is significant

The classical approach to comparing (RE)Maximum Likelihood models has been to compare the log likelihood values: minus twice the difference in the log likelihood (LnL) of two nested models has, asymptotically, a chi-squared distribution (a result usually credited to Wilks). With REML this comparison is valid only when, as here, the two models have the same fixed effects. The BIC statistics can also be used to compare the models. We shall present both comparisons, but will use the BIC approach for making our decision.

Fit Statistics to compare models
	-2 Res LnL	BIC
With pen	477.2	481.6	<- smaller
Without pen	489.0	492.9
Difference	11.8	11.3

Using the chi-squared approach, the difference in -2 LnL is 11.8. Comparing this with the tabulated chi-squared value for 1 d.f. (3.84 at the 5% level), we can consider the difference statistically significant. Likewise, the model with the random effect of pen has a smaller BIC than the model without it: the difference is 11.3. A model with a smaller BIC is preferred, and as a rule of thumb the difference should be greater than 5 for us to have a reasonable measure of confidence that the model IS better. Therefore we conclude that there is an effect of pen, and that the model with the random effect of pen fits better than the one without.
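
The chi-squared comparison can be checked with a short data step. This is only a sketch: the two -2 Res Log Likelihood values are typed in by hand from the Fit Statistics tables, and the variable names are arbitrary.

```sas
data _null_;
  m2ll_with    = 477.2;             /* -2 Res Log Likelihood, model with pen(diet)    */
  m2ll_without = 489.0;             /* -2 Res Log Likelihood, model without pen(diet) */
  lrt  = m2ll_without - m2ll_with;  /* likelihood ratio statistic, about 11.8         */
  crit = cinv(0.95, 1);             /* 5% critical value of chi-squared, 1 d.f.       */
  p    = 1 - probchi(lrt, 1);       /* upper-tail probability                         */
  put lrt= crit= p=;                /* results are written to the SAS log             */
run;
```

Because a variance cannot be negative, the null value of zero lies on the boundary of the parameter space, and the 1 d.f. chi-squared test is generally regarded as conservative in this situation; that does not change the conclusion here.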

GLM analysis, WRONG

The GLM Procedure, Wrong analysis, using GLM, ignoring pen

Class Level Information
Class	Levels	Values
diet	3	1 2 3
pen	3	1 2 3
pig	6	1 2 3 4 5 6
sex	2	1 2

Number of Observations Read	54
Number of Observations Used	54

The GLM Procedure

Dependent Variable: gain

Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	5	48044.2681	9608.8536	8.13	<.0001
Error	48	56754.7356	1182.3903
Corrected Total	53	104799.0037

R-Square	Coeff Var	Root MSE	gain Mean
0.458442	11.06704	34.38590	310.7056

Source	DF	Type I SS	Mean Square	F Value	Pr > F
diet	2	35325.87903	17662.93952	14.94	<.0001
sex	1	11980.36755	11980.36755	10.13	0.0026
diet*sex	2	738.02153	369.01076	0.31	0.7334

Source	DF	Type III SS	Mean Square	F Value	Pr > F
diet	2	37758.13169	18879.06584	15.97	<.0001
sex	1	12008.48965	12008.48965	10.16	0.0025
diet*sex	2	738.02153	369.01076	0.31	0.7334

The GLM Procedure

Least Squares Means

diet	gain LSMEAN	Standard Error	Pr > |t|
1	285.323375	8.155333	<.0001
2	296.647875	8.155333	<.0001
3	346.341111	8.104835	<.0001
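
The statements behind this GLM output are not shown; a call along the following lines, which is an assumption on our part, would give an analysis of this form. There is no pen term anywhere in the model, so every effect is tested against the pig-to-pig residual with 48 d.f.

```sas
/* WRONG analysis, shown for comparison only: pen is ignored. */
proc glm data=subsamp2;
class diet pen pig sex;
model gain = diet sex diet*sex / ss1 ss3;  /* Type I and Type III SS */
lsmeans diet / stderr;                     /* LS means with standard errors */
run;
quit;
```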

Conclusions?

We have concluded that pen is significant, so the analysis shown in the first model is the most appropriate one. We can see that there is no significant interaction between Diet and Sex (Prob = 0.4000), a significant effect of Sex (Prob = 0.0057), and no significant effect of Diet at the 5% level (Prob = 0.0721).

However, if we had erroneously ignored the pen effect, had thought that the pig was the experimental unit and had looked at the GLM analysis, we would think that there was a highly significant effect of Diet (Prob < 0.0001), i.e. that there were real differences between the diets, whereas the correct analysis gives no convincing evidence of such differences. So, if we got our model wrong then we would come to completely the wrong conclusions, possibly to our detriment if our research results and manuscript are rejected!

In general, if we have a model with a nested (aka subsampling) structure and we ignore that structure and test the various effects against the residual, then we shall overestimate the significance of the effects.
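
If one does want to stay with GLM, the nested term has to be put in the model and diet tested against it rather than against the residual. The following is a sketch of one way to do that, using the TEST statement; it is not the analysis used above, and with unequal numbers of males and females per pen its F test is only approximate, so the mixed model remains the preferred analysis.

```sas
proc glm data=subsamp2;
class diet pen sex;
model gain = diet pen(diet) sex diet*sex;
/* Test diet against pens within diet (the experimental unit), */
/* not against the residual (pigs within pen).                 */
test h=diet e=pen(diet);
run;
quit;
```

The denominator for the diet test then has 6 d.f. (3 diets, each with 3 - 1 = 2 d.f. amongst its pens) instead of the 48 residual d.f. used in the wrong analysis.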