Here are the results from running the Randomised Complete Block analysis using SAS PROC MIXED.
The MIXED Procedure

Class Level Information

Class | Levels | Values |
---|---|---|
DIET | 4 | 1 2 3 4 |
MONTH | 12 | 1 2 3 4 5 6 7 8 9 10 11 12 |
REML Estimation Iteration History

Iteration | Evaluations | Objective | Criterion |
---|---|---|---|
0 | 1 | 173.52884464 | |
1 | 1 | 82.12223052 | 0.00000000 |

Convergence criteria met.
Covariance Parameter Estimates (REML)

Cov Parm | Estimate |
---|---|
MONTH | 14.54488636 |
Residual | 0.60414773 |
What does the above tell us?
Model Fitting Information for GAIN

Description | Value |
---|---|
Observations | 48.0000 |
Res Log Likelihood | -81.4944 |
Akaike's Information Criterion | -83.4944 |
Schwarz's Bayesian Criterion | -85.2786 |
-2 Res Log Likelihood | 162.9888 |
Tests of Fixed Effects

Source | NDF | DDF | Type III F | Pr > F |
---|---|---|---|---|
DIET | 3 | 33 | 49.78 | 0.0001 |
ESTIMATE Statement Results

Parameter | Estimate | Std Error | DF | t | Pr > \|t\| |
---|---|---|---|---|---|
mean + diet 1 | 51.15833333 | 1.12357443 | 33 | 45.53 | 0.0001 |
mean + diet 2 | 51.97500000 | 1.12357443 | 33 | 46.26 | 0.0001 |
mean + diet 3 | 53.25833333 | 1.12357443 | 33 | 47.40 | 0.0001 |
mean + diet 4 | 54.78333333 | 1.12357443 | 33 | 48.76 | 0.0001 |
diet 1 - 2 | -0.81666667 | 0.31731891 | 33 | -2.57 | 0.0147 |
diet 1 - 3 | -2.10000000 | 0.31731891 | 33 | -6.62 | 0.0001 |
We can see the information that SAS uses in fitting the model, specifically the (Restricted) Log Likelihood (-81.4944), as well as Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC). These describe the fit of the model, but on their own they are of little value; they are relative measures, useful only for comparing one model with another.
-2 (Res) Log Likelihood is an interesting and useful number. If we are comparing 2 models, for example if we had also run these data with a model without Month, the difference in -2 Log Likelihood between the two models has (approximately) a chi-squared (χ²) distribution, so we could test whether the effect of Month (the Month variance, σ²_month) is in fact statistically significant.
The tests of the fixed effects provide us with a synopsis of what would be an Analysis of Variance table. For the fixed effect of Diet (the only fixed effect in this model) we have 3 degrees of freedom for the numerator (NDF, 4 levels - 1 = 3) and 33 degrees of freedom for the denominator (DDF), which in this case is simply the residual degrees of freedom (DDF = N - r(X) = 48 - 15 = 33, where the 15 is made up of 1 for µ, 3 for Diets and 11 for Month). The Type III F is the F-ratio: the Marginal SS for Diet (as we compute using our k' matrix) divided by the degrees of freedom for Diet gives the Mean Square for Diet, which is then divided by the appropriate Error Mean Square, in this case the Residual, or MSE. Finally, Pr > F indicates the probability of obtaining such a large F-ratio simply by random chance when there is no effect of Diet (our Null Hypothesis, H0). We can see that this is very unlikely, less than 1 in 10000, so I shall reject H0 and conclude that there are indeed statistically significant differences between Diets.
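As a quick arithmetic check of the degrees of freedom and the F-ratio described above, here is a minimal Python sketch. The Sum of Squares for Diet (90.2306250) and the residual Mean Square (0.6041477) are taken from the PROC GLM ANOVA shown further down this page; the variable names are purely illustrative and are not part of any SAS output.

```python
# Denominator degrees of freedom: N - r(X)
n_obs = 48
rank_x = 1 + 3 + 11          # 1 for mu, 3 for Diets, 11 for Months
ddf = n_obs - rank_x

# Numerator degrees of freedom: number of Diet levels - 1
ndf = 4 - 1

# Mean Square for Diet = Marginal (Type III) SS / its degrees of freedom
ss_diet = 90.2306250
ms_diet = ss_diet / ndf

# F-ratio = Mean Square for Diet / Error Mean Square (MSE)
mse = 0.6041477
f_diet = ms_diet / mse

print(ndf, ddf, round(ms_diet, 7), round(f_diet, 2))
```

The rounded F-ratio agrees with the Type III F that PROC MIXED reports for DIET.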
Least Squares Means

Effect | DIET | LSMEAN | Std Error | DF | t | Pr > \|t\| |
---|---|---|---|---|---|---|
DIET | 1 | 51.15833333 | 1.12357443 | 33 | 45.53 | 0.0001 |
DIET | 2 | 51.97500000 | 1.12357443 | 33 | 46.26 | 0.0001 |
DIET | 3 | 53.25833333 | 1.12357443 | 33 | 47.40 | 0.0001 |
DIET | 4 | 54.78333333 | 1.12357443 | 33 | 48.76 | 0.0001 |
The LSMEANS are Least Squares Means (n.b. most journals and supervisors love LSMeans!); the LSMean for Diet 1 is simply the average of the 12 fitted values for Diet 1, i.e.
( µ + diet1 + month1 + µ + diet1 + month2 + µ + diet1 + month3 + µ + diet1 + month4 + µ + diet1 + month5 + µ + diet1 + month6 + µ + diet1 + month7 + µ + diet1 + month8 + µ + diet1 + month9 + µ + diet1 + month10 + µ + diet1 + month11 + µ + diet1 + month12 ) / 12
You should note that this LSMean is a linear function of fitted values (and hence estimable). A suitable k' matrix would be
µ | Diets | Months |
---|---|---|
1 | 1 0 0 0 | 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 |
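The same LSMean can be written more compactly. This is only a restatement of the sum above; the symbol $b$, for the vector of solutions for µ, the diets and the months, is introduced here purely for illustration.

```latex
\mathrm{LSMean}(\text{Diet 1})
  = \frac{1}{12}\sum_{j=1}^{12}\left(\mu + \mathrm{diet}_1 + \mathrm{month}_j\right)
  = \mu + \mathrm{diet}_1 + \frac{1}{12}\sum_{j=1}^{12}\mathrm{month}_j
  = k'b
```

Each coefficient on the right-hand side corresponds to one element of the k' matrix: 1 for µ, 1 for diet 1, 0 for the other diets, and 1/12 for each month.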
The Sampling Variance of our LSMean estimate is simply k'(X'X)⁻k × phenotypic variance, where (X'X)⁻ denotes a generalised inverse of X'X. The phenotypic variance = residual variance + month variance,
i.e. σ²_p = σ²_e + σ²_m
σ²_p = 0.60414773 + 14.54488636 = 15.149034
Because this is a simple, completely balanced experiment with each diet having 12 observations (1 per month), the multiplier is just 1/12, so the Sampling Variance = phenotypic variance / 12 = 15.149 / 12 = 1.2624.
The standard error of our estimate is the square root of this, which gives us the value of 1.124, as reported by PROC MIXED.
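The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the balanced-design shortcut described in the text (phenotypic variance divided by 12); the variable names are illustrative only.

```python
import math

var_residual = 0.60414773    # Residual estimate from PROC MIXED
var_month = 14.54488636      # MONTH estimate from PROC MIXED
n_per_diet = 12              # one observation per month for each diet

# Phenotypic variance = residual variance + month variance
var_phenotypic = var_residual + var_month

# Balanced design: sampling variance of the LSMean = phenotypic variance / 12
sampling_variance = var_phenotypic / n_per_diet
std_error = math.sqrt(sampling_variance)

print(round(var_phenotypic, 6), round(sampling_variance, 4), round(std_error, 4))
```

The resulting standard error matches the Std Error column of the Least Squares Means table to the precision printed there.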
We can test the statistical significance of the random Month effect by re-running the analysis, but dropping out the effect of MONTH.
This gives the following results:
The MIXED Procedure

Class Level Information

Class | Levels | Values |
---|---|---|
DIET | 4 | 1 2 3 4 |
Covariance Parameter Estimates (REML)

Cov Parm | Estimate |
---|---|
Residual | 15.14903409 |
Model Fitting Information for GAIN

Description | Value |
---|---|
Observations | 48.0000 |
Res Log Likelihood | -127.198 |
Akaike's Information Criterion | -128.198 |
Schwarz's Bayesian Criterion | -129.090 |
-2 Res Log Likelihood | 254.3954 |
Tests of Fixed Effects

Source | NDF | DDF | Type III F | Pr > F |
---|---|---|---|---|
DIET | 3 | 44 | 1.99 | 0.1300 |
What we are interested in is the -2 (Res) Log Likelihood, which is 254.3954. What does this mean and what does it tell us?
Model | -2 LnL |
---|---|
µ + Diet (fixed) | 254.3954 |
µ + Diet (fixed) + Month (random) | 162.9888 |
Difference | 91.4066 |
Thus we have a χ² of 91.4 with 1 degree of freedom (for our 1 extra parameter, σ²_month). The critical tabulated value for a χ² with 1 d.f. at the 5% level is 3.84. Thus we can conclude that the effect of Month is highly significant and should be retained in the model; therefore the first analysis is the one that we should use.
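The comparison of the two -2 Log Likelihoods can be sketched in Python using only the standard library. The helper `chi_sq_1df_upper_tail` is a hypothetical name introduced here; it relies on the fact that a chi-squared variable with 1 d.f. is the square of a standard normal variable, so it is valid only for 1 degree of freedom.

```python
import math

neg2_lnl_without_month = 254.3954   # mu + Diet (fixed)
neg2_lnl_with_month = 162.9888      # mu + Diet (fixed) + Month (random)

# Test statistic: difference in -2 (Res) Log Likelihood between the two models
chi_sq = neg2_lnl_without_month - neg2_lnl_with_month


def chi_sq_1df_upper_tail(x):
    """P(chi-squared with 1 d.f. > x), via chi-squared(1) = Z**2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))


critical_5pct = 3.84                # tabulated value quoted in the text
p_value = chi_sq_1df_upper_tail(chi_sq)

print(round(chi_sq, 4), chi_sq > critical_5pct, p_value)
```

The statistic is far beyond the 5% critical value, which is the basis for retaining Month in the model.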
Here we look at the results that we would have got if we had mistakenly used a Fixed Effects model.
General Linear Models Procedure

Class Level Information

Class | Levels | Values |
---|---|---|
DIET | 4 | 1 2 3 4 |
MONTH | 12 | 1 2 3 4 5 6 7 8 9 10 11 12 |

Number of observations in data set = 48
Much as for PROC MIXED, PROC GLM tells us that DIET and MONTH were class variables with 4 and 12 levels respectively.
General Linear Models Procedure

Dependent Variable: GAIN

Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 14 | 736.8512500 | 52.6322321 | 87.12 | 0.0001 |
Error | 33 | 19.9368750 | 0.6041477 | | |
Corrected Total | 47 | 756.7881250 | | | |
R-Square | C.V. | Root MSE | GAIN Mean |
---|---|---|---|
0.973656 | 1.472275 | 0.777269 | 52.79375 |
Source | DF | Type I SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
DIET | 3 | 90.2306250 | 30.0768750 | 49.78 | 0.0001 |
MONTH | 11 | 646.6206250 | 58.7836932 | 97.30 | 0.0001 |
Source | DF | Type III SS | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
DIET | 3 | 90.2306250 | 30.0768750 | 49.78 | 0.0001 |
MONTH | 11 | 646.6206250 | 58.7836932 | 97.30 | 0.0001 |
The GLM procedure gives an Analysis of Variance in which the Sources of Variation are the Model and the Residual (Error).
The Model is in fact the Model corrected for the Mean, or the Model over and above the Mean, R(Diet, Month | µ).
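The summary line printed beneath the ANOVA (R-Square, C.V., Root MSE) follows directly from the ANOVA quantities. The sketch below assumes the usual definitions: R-Square = Model SS / Corrected Total SS, Root MSE = square root of the Error Mean Square, and C.V. = 100 × Root MSE / mean of GAIN. Variable names are illustrative.

```python
import math

ss_model = 736.8512500       # Model SS, i.e. R(Diet, Month | mu)
ss_total = 756.7881250       # Corrected Total SS
mse = 0.6041477              # Error Mean Square
gain_mean = 52.79375         # overall mean of GAIN

r_square = ss_model / ss_total
root_mse = math.sqrt(mse)
cv = 100.0 * root_mse / gain_mean

print(round(r_square, 6), round(root_mse, 6), round(cv, 6))
```

Under those assumed definitions the three values agree with the printed R-Square, Root MSE and C.V.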
The Type I Sums of Squares are the Sequential Sums of Squares (due to fitting the effects in the order specified in the SAS model statement). The Type III Sums of Squares are the Marginal Sums of Squares and are therefore independent of the order in which they are fitted. In this balanced design the two sets are identical, as the tables above show.
General Linear Models Procedure

Source | Type III Expected Mean Square |
---|---|
DIET | Var(Error) + Q(DIET) |
MONTH | Var(Error) + 4 Var(MONTH) |
Having declared MONTH as a Random effect, we obtain a table of the Expectations of the various Mean Squares in the ANOVA, E(MS). This is in spite of the fact that PROC GLM fits a fixed effects model and treats all effects as Fixed Effects. Anyway, if we use the E(MS) we can estimate the variance due to Month by equating the observed Mean Squares to their expectations: Var(MONTH) = (MS(MONTH) - MS(Error)) / 4 = (58.7836932 - 0.6041477) / 4 = 14.5449.
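A minimal sketch of that calculation, assuming the usual approach of equating the observed Mean Squares to their expectations and solving for Var(MONTH), and comparing the result with the REML estimate from PROC MIXED. Variable names are illustrative.

```python
ms_month = 58.7836932        # Mean Square for MONTH (PROC GLM)
ms_error = 0.6041477         # Mean Square for Error (PROC GLM)
coef_month = 4               # coefficient of Var(MONTH) in E(MS) for MONTH

# E(MS for MONTH) = Var(Error) + 4 Var(MONTH), so solve for Var(MONTH)
var_month_anova = (ms_month - ms_error) / coef_month

var_month_reml = 14.54488636 # MONTH estimate reported by PROC MIXED

print(round(var_month_anova, 6), round(var_month_reml, 6))
```

In this balanced design the two estimates agree to the precision printed.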
General Linear Models Procedure

Least Squares Means

DIET | GAIN LSMEAN | Std Err LSMEAN | Pr > \|T\| H0:LSMEAN=0 |
---|---|---|---|
1 | 51.1583333 | 0.2243783 | 0.0001 |
2 | 51.9750000 | 0.2243783 | 0.0001 |
3 | 53.2583333 | 0.2243783 | 0.0001 |
4 | 54.7833333 | 0.2243783 | 0.0001 |
Just as with PROC MIXED, the LSMean for Diet 1 is the average of the 12 fitted values for Diet 1, i.e.
( µ + diet1 + month1 + µ + diet1 + month2 + µ + diet1 + month3 + µ + diet1 + month4 + µ + diet1 + month5 + µ + diet1 + month6 + µ + diet1 + month7 + µ + diet1 + month8 + µ + diet1 + month9 + µ + diet1 + month10 + µ + diet1 + month11 + µ + diet1 + month12 ) / 12
In fact, because the design is balanced, we get the same estimate for our LSMean for Diet 1 as we had obtained with PROC MIXED. BUT, the standard error is quite different. The standard error is computed by PROC GLM using only the residual variance (because in a purely fixed effects analysis that is the only variance) in our (by now traditional) formula: standard error = sqrt( k'(X'X)⁻k × residual variance ) = sqrt( 0.6041477 / 12 ) = 0.2244.
Note that this gives the standard error, as computed by PROC GLM, of .224, whereas the correct standard error, computed by PROC MIXED, is 1.12; 5 times larger!
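The two standard errors can be compared side by side. This sketch assumes the balanced-design shortcut (variance divided by the 12 observations per diet) for both procedures; variable names are illustrative.

```python
import math

var_residual = 0.6041477     # Error Mean Square from PROC GLM
var_month = 14.54488636      # MONTH variance from PROC MIXED
n_per_diet = 12

# PROC GLM: only the residual variance enters the standard error
se_glm = math.sqrt(var_residual / n_per_diet)

# PROC MIXED: residual + month variance (the phenotypic variance)
se_mixed = math.sqrt((var_residual + var_month) / n_per_diet)

ratio = se_mixed / se_glm
print(round(se_glm, 4), round(se_mixed, 4), round(ratio, 2))
```

The ratio of the two standard errors is the square root of (phenotypic variance / residual variance), which is why ignoring the Month variance understates the standard error so badly here.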
General Linear Models Procedure

Dependent Variable: GAIN

Parameter | Estimate | T for H0: Parameter=0 | Pr > \|T\| | Std Error of Estimate |
---|---|---|---|---|
mean diet 1 | 51.1583333 | 228.00 | 0.0001 | 0.22437835 |
mean diet 2 | 51.9750000 | 231.64 | 0.0001 | 0.22437835 |
d1 - d2 | -0.8166667 | -2.57 | 0.0147 | 0.31731891 |
d1 - d3 | -2.1000000 | -6.62 | 0.0001 | 0.31731891 |
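Although the standard errors of the diet means differ between the two procedures, the standard error of a difference between two diets (0.31731891) is the same in both outputs. One way to see why, offered as a sketch rather than as part of the original analysis: in this balanced design every month contributes equally to each diet, so the Month effects cancel in the difference and only the residual variance remains, giving sqrt(2 × residual variance / 12). Variable names are illustrative.

```python
import math

var_residual = 0.6041477     # Error Mean Square
n_per_diet = 12

# Variance of a difference of two diet means: 2 * residual variance / 12
se_diff = math.sqrt(2.0 * var_residual / n_per_diet)

# t statistics for the two contrasts shown in the output
t_d1_d2 = -0.8166667 / se_diff
t_d1_d3 = -2.1000000 / se_diff

print(round(se_diff, 6), round(t_d1_d2, 2), round(t_d1_d3, 2))
```

So for comparisons between diets the two analyses agree; it is the standard errors of the diet means themselves that PROC GLM understates.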
R.I. Cue,
last updated : 2010 April 27