Results/Output from RCB analysis

Here are the results from running the Randomised Complete Block (RCB) analysis using SAS PROC MIXED.


                             The MIXED Procedure

                           Class Level Information

                Class     Levels  Values

                DIET           4  1 2 3 4
                MONTH         12  1 2 3 4 5 6 7 8 9 10 11 12


                      REML Estimation Iteration History

              Iteration  Evaluations     Objective     Criterion

                      0            1  173.52884464
                      1            1   82.12223052    0.00000000

                          Convergence criteria met.


                    Covariance Parameter Estimates (REML)

                    Cov Parm       Estimate

                    MONTH       14.54488636
                    Residual     0.60414773

What does the above tell us?

In the data set that we are analysing with PROC MIXED, the factor DIET has been fitted as a CLASS variable with 4 levels (1, 2, 3, 4).
The factor MONTH has also been fitted as a CLASS variable, with 12 levels (1 to 12).
The iterative Restricted Maximum Likelihood (REML) procedure was used to estimate the 2 variance components (MONTH and Residual), and the convergence criteria were met after 2 iterations (Iteration 0, the starting point, and Iteration 1).
The estimates of the (co)variance components were: month variance = 14.544886 and residual variance = 0.6041477.


                      Model Fitting Information for GAIN

                   Description                        Value

                   Observations                     48.0000
                   Res Log Likelihood              -81.4944
                   Akaike's Information Criterion  -83.4944
                   Schwarz's Bayesian Criterion    -85.2786
                   -2 Res Log Likelihood           162.9888


                           Tests of Fixed Effects

                  Source      NDF   DDF  Type III F  Pr > F

                  DIET          3    33       49.78  0.0001


                          ESTIMATE Statement Results

   Parameter                 Estimate     Std Error    DF       t  Pr > |t|

   mean + diet 1          51.15833333    1.12357443    33   45.53    0.0001
   mean + diet 2          51.97500000    1.12357443    33   46.26    0.0001
   mean + diet 3          53.25833333    1.12357443    33   47.40    0.0001
   mean + diet 4          54.78333333    1.12357443    33   48.76    0.0001
   diet 1 - 2             -0.81666667    0.31731891    33   -2.57    0.0147
   diet 1 - 3             -2.10000000    0.31731891    33   -6.62    0.0001

We can see the information that SAS uses in fitting the model, specifically the (Restricted) Log Likelihood (-81.4944), as well as Akaike's Information Criterion (AIC) and Schwarz's Bayesian Criterion (SBC). These describe the fit of the model, but on their own they are of no particular value; they are relative values, useful only for comparing one model with another.

-2 (Res) Log Likelihood is an interesting and useful number. If we are comparing 2 models, for example if we had also run these data with a model without Month, the difference in the -2 Log Likelihoods between the two models has (approximately) a chi-squared (χ²) distribution, so we could test whether the effect of Month (σ²_month) was in fact statistically significant.

The tests of the fixed effects provide us with a synopsis of what would be an Analysis of Variance table. For the fixed effect of Diet (the only fixed effect in this model) we have 3 degrees of freedom for the numerator (NDF, 4 levels - 1 = 3) and 33 degrees of freedom for the denominator (DDF), which in this case is simply the residual degrees of freedom (DDF = N - r(X) = 48 - 15 = 33, where the 15 is 1 for µ, 3 for Diets and 11 for Months). The Type III F is the F-ratio: the Marginal SS (as we compute using our k' matrix) divided by the degrees of freedom for Diet gives the Mean Square for Diet, which is then divided by the appropriate Error Mean Square, in this case the Residual Mean Square, or MSE. Finally, Pr > F is the probability of obtaining such a large F-ratio simply by random chance when there is no effect of Diet (our Null Hypothesis, H₀). We can see that this is very unlikely, no more than 1 in 10000, so I shall reject H₀ and instead accept that there are indeed statistically significant differences between Diets.
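The degrees of freedom and the F-ratio described above can be checked by hand. The following is a minimal sketch in Python, not part of the SAS run; the variable names are illustrative, and the Diet Sum of Squares (90.2306250) is taken from the PROC GLM Type III table shown later in these notes, on the assumption that it is the same Marginal SS that PROC MIXED uses here.

```python
# Counts taken from the Class Level Information and Model Fitting tables.
n_obs = 48      # observations
n_diets = 4     # levels of DIET
n_months = 12   # levels of MONTH

# Numerator and denominator degrees of freedom.
ndf = n_diets - 1                              # 3
rank_x = 1 + (n_diets - 1) + (n_months - 1)    # 1 for mu + 3 + 11 = 15
ddf = n_obs - rank_x                           # 33

# F-ratio = Mean Square for Diet / Residual Mean Square.
ss_diet = 90.2306250       # Type III SS for DIET (from the PROC GLM table)
ms_residual = 0.60414773   # Residual variance estimate from PROC MIXED
ms_diet = ss_diet / ndf
f_ratio = ms_diet / ms_residual

print(f"NDF = {ndf}, DDF = {ddf}, F = {f_ratio:.2f}")
```

The printed values should agree with the NDF, DDF and Type III F columns of the Tests of Fixed Effects table.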


                             Least Squares Means

       Effect  DIET        LSMEAN     Std Error    DF       t  Pr > |t|

       DIET    1      51.15833333    1.12357443    33   45.53    0.0001
       DIET    2      51.97500000    1.12357443    33   46.26    0.0001
       DIET    3      53.25833333    1.12357443    33   47.40    0.0001
       DIET    4      54.78333333    1.12357443    33   48.76    0.0001

The LSMEANS are Least Squares Means (n.b. most journals and supervisors love LSMeans!). The LSMean for Diet 1 is simply the average of the 12 fitted values for Diet 1, i.e.

  (  µ + diet₁ + month₁
 +   µ + diet₁ + month₂
 +   µ + diet₁ + month₃
 +   µ + diet₁ + month₄
 +   µ + diet₁ + month₅
 +   µ + diet₁ + month₆
 +   µ + diet₁ + month₇
 +   µ + diet₁ + month₈
 +   µ + diet₁ + month₉
 +   µ + diet₁ + month₁₀
 +   µ + diet₁ + month₁₁
 +   µ + diet₁ + month₁₂ ) / 12

You should note that this LSMean is a linear function of fitted values (and hence estimable). A suitable k' matrix would be

  µ    Diets                  Months
      -------  -----------------------------------------------------------
( 1   1 0 0 0  1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 1/12 )

The Sampling Variance of our LSMean estimate is simply k'(X'X)⁻k × phenotypic variance, where (X'X)⁻ denotes a generalised inverse of X'X. The phenotypic variance = residual variance + month variance,

i.e. σ²_p = σ²_e + σ²_m

σ²_p = 0.60414773 + 14.54488636 = 15.149034

Because this is a simple, completely balanced experiment with each diet having 12 observations (1 per month), the multiplier k'(X'X)⁻k reduces to 1/12, so the Sampling Variance = σ²_p / 12 = 15.149 / 12 = 1.2624.

The standard error of our estimate is the square root of this, which gives us the value of 1.124.
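The arithmetic for this standard error can be reproduced directly from the two variance component estimates. This is a small sketch in Python with illustrative variable names; it assumes the balanced design described here (12 observations per diet, one per month).

```python
import math

# REML estimates from the Covariance Parameter Estimates table.
var_month = 14.54488636
var_residual = 0.60414773
n_per_diet = 12   # one observation per month for each diet

# Phenotypic variance is the sum of the two components.
var_phenotypic = var_residual + var_month

# In this balanced design the sampling variance of an LSMean
# is the phenotypic variance divided by the number of observations.
sampling_variance = var_phenotypic / n_per_diet
std_error = math.sqrt(sampling_variance)

print(f"phenotypic variance = {var_phenotypic:.6f}")
print(f"sampling variance   = {sampling_variance:.4f}")
print(f"standard error      = {std_error:.4f}")
```

The resulting standard error should match the Std Error column of the Least Squares Means table (1.12357443) to within rounding.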

Testing whether the Effect of Month is Significant

We can test the statistical significance of the random Month effect by re-running the analysis, but dropping out the effect of MONTH.

This gives the following results:


                             The MIXED Procedure

                           Class Level Information

                          Class     Levels  Values

                          DIET           4  1 2 3 4


                    Covariance Parameter Estimates (REML)

                    Cov Parm       Estimate

                    Residual    15.14903409


                      Model Fitting Information for GAIN

                   Description                        Value

                   Observations                     48.0000
                   Res Log Likelihood              -127.198
                   Akaike's Information Criterion  -128.198
                   Schwarz's Bayesian Criterion    -129.090
                   -2 Res Log Likelihood           254.3954


                           Tests of Fixed Effects

                  Source      NDF   DDF  Type III F  Pr > F

                  DIET          3    44        1.99  0.1300

What we are interested in is the -2 (Res) Log Likelihood, which is 254.3954. What does this mean and what does it tell us?

-2 Log Likelihoods, for Chi-squared test

  Model                                    -2 LnL

    ( µ + Diet_fixed )                     254.3954
  - ( µ + Diet_fixed + Month_random )      162.9888
                                           --------
  =                                         91.4066

Thus we have a χ² of 91.4 with 1 degree of freedom (for our 1 extra parameter, σ²_month). The critical tabulated value for a χ² with 1 d.f. at the 5% level is 3.84. We can therefore conclude that the effect of Month is highly significant and should be retained in the model; the first analysis is the one that we should use.
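The likelihood ratio test above can also be computed rather than looked up in tables. The sketch below is in Python using only the standard library; the helper name chi2_sf_1df is hypothetical, and it relies on the identity that, for 1 degree of freedom, the upper-tail probability of a chi-squared value x equals erfc(sqrt(x / 2)).

```python
import math

# -2 Res Log Likelihood values from the two Model Fitting tables.
neg2ll_reduced = 254.3954   # mu + Diet (fixed)
neg2ll_full = 162.9888      # mu + Diet (fixed) + Month (random)

# Test statistic: difference in -2 log likelihoods.
lrt_statistic = neg2ll_reduced - neg2ll_full


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variate with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


p_value = chi2_sf_1df(lrt_statistic)

print(f"chi-squared = {lrt_statistic:.4f} on 1 d.f.")
print(f"Pr > chi-squared = {p_value:.3g}")
print(f"tail area beyond 3.84 = {chi2_sf_1df(3.84):.4f}")
```

The last line is a sanity check on the helper: the tail area beyond the tabulated 5% critical value of 3.84 should come out very close to 0.05.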

If we had used PROC GLM (Fixed effects model)

Here we look at the results that we would have got if we had mistakenly used a Fixed Effects model.

                       General Linear Models Procedure
                           Class Level Information

                Class    Levels    Values

                DIET          4    1 2 3 4

                MONTH        12    1 2 3 4 5 6 7 8 9 10 11 12


                   Number of observations in data set = 48

Much as for PROC MIXED, PROC GLM tells us that DIET and MONTH were class variables with 4 and 12 levels respectively.

                       General Linear Models Procedure

Dependent Variable: GAIN
                                    Sum of           Mean
Source                  DF         Squares         Square   F Value     Pr > F

Model                   14     736.8512500     52.6322321     87.12     0.0001

Error                   33      19.9368750      0.6041477

Corrected Total         47     756.7881250


                  R-Square            C.V.       Root MSE            GAIN Mean

                  0.973656        1.472275       0.777269             52.79375


Source                  DF       Type I SS    Mean Square   F Value     Pr > F

DIET                     3      90.2306250     30.0768750     49.78     0.0001
MONTH                   11     646.6206250     58.7836932     97.30     0.0001


Source                  DF     Type III SS    Mean Square   F Value     Pr > F

DIET                     3      90.2306250     30.0768750     49.78     0.0001
MONTH                   11     646.6206250     58.7836932     97.30     0.0001

The GLM procedure gives an Analysis of Variance showing the Sources of Variation being the Model and the Residual.

The Model is in fact the Model corrected for the Mean, or the Model over and above the Mean, R(Diet, Month | µ).

The Type I Sums of Squares are the Sequential Sums of Squares (due to fitting the effects in the order specified in the SAS model statement). The Type III Sums of Squares are the Marginal Sums of Squares and are therefore independent of the order in which they are fitted.
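As a check on the ANOVA table, the Diet and Month Sums of Squares should add up to the Model Sum of Squares, and the summary statistics (R-Square, Root MSE, C.V.) follow from the same numbers. The sketch below is in Python with illustrative variable names; all input values are copied from the PROC GLM output above.

```python
import math

# Sums of squares from the PROC GLM ANOVA tables.
ss_diet = 90.2306250
ss_month = 646.6206250
ss_error = 19.9368750
ss_total = 756.7881250   # Corrected Total
df_error = 33
gain_mean = 52.79375

# Model SS (corrected for the mean) is the sum of the two effects;
# in this balanced design Type I and Type III SS coincide.
ss_model = ss_diet + ss_month

r_square = ss_model / ss_total
mse = ss_error / df_error
root_mse = math.sqrt(mse)
cv_percent = 100.0 * root_mse / gain_mean

print(f"Model SS = {ss_model:.7f}")
print(f"R-Square = {r_square:.6f}")
print(f"Root MSE = {root_mse:.6f}")
print(f"C.V.     = {cv_percent:.6f}")
```

That the Type I and Type III Sums of Squares are identical here is a consequence of the balanced design; with unequal numbers per cell they would generally differ.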

                       General Linear Models Procedure

Source      Type III Expected Mean Square

DIET        Var(Error) + Q(DIET)

MONTH       Var(Error) + 4 Var(MONTH)

Having declared MONTH as a Random effect we obtain a table of the Expectations of the various Mean Squares in the ANOVA, E(MS). This is in spite of the fact that PROC GLM is a fixed effects model and fits all effects as Fixed Effects. Anyway, if we use the E(MS) we can estimate the variance due to Month:

(MS_month - MS_residual) / 4 = 14.54
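The same calculation, written out as a short Python sketch (variable names are illustrative). The divisor 4 is the coefficient of Var(MONTH) in the Expected Mean Square table, i.e. the number of diets observed in each month.

```python
# Mean squares from the PROC GLM Type III table.
ms_month = 58.7836932
ms_residual = 0.6041477
coef_var_month = 4   # coefficient of Var(MONTH) in E(MS) for MONTH

# Solve E(MS_month) = Var(Error) + 4 Var(MONTH) for Var(MONTH).
var_month = (ms_month - ms_residual) / coef_var_month

print(f"estimated month variance = {var_month:.6f}")
```

In this balanced design the estimate agrees with the REML estimate of the MONTH variance reported by PROC MIXED (14.54488636).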

                       General Linear Models Procedure
                             Least Squares Means

                DIET          GAIN       Std Err     Pr > |T|
                            LSMEAN        LSMEAN   H0:LSMEAN=0

                1       51.1583333     0.2243783        0.0001
                2       51.9750000     0.2243783        0.0001
                3       53.2583333     0.2243783        0.0001
                4       54.7833333     0.2243783        0.0001

Just as with PROC MIXED, the LSMean for Diet 1 is the average of the 12 fitted values for Diet 1, i.e.

  (  µ + diet₁ + month₁
 +   µ + diet₁ + month₂
 +   µ + diet₁ + month₃
 +   µ + diet₁ + month₄
 +   µ + diet₁ + month₅
 +   µ + diet₁ + month₆
 +   µ + diet₁ + month₇
 +   µ + diet₁ + month₈
 +   µ + diet₁ + month₉
 +   µ + diet₁ + month₁₀
 +   µ + diet₁ + month₁₁
 +   µ + diet₁ + month₁₂ ) / 12

In fact, because the design is balanced, we get the same estimate for our LSMean for Diet 1 as we obtained with PROC MIXED. BUT the standard error is quite different. PROC GLM computes the standard error using only the residual variance (because in a purely fixed effects analysis that is the only variance) in our (by now traditional) formula:

sampling variance = k'(X'X)⁻k × σ²_e

Note that this gives a standard error, as computed by PROC GLM, of 0.224, whereas the correct standard error, computed by PROC MIXED, is 1.12; 5 times larger!
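The two standard errors can be put side by side with a few lines of arithmetic. This is a sketch in Python with illustrative names; it assumes, as in the balanced design here, that k'(X'X)⁻k reduces to 1/12 for a Diet LSMean.

```python
import math

var_residual = 0.6041477    # Error Mean Square from PROC GLM
var_month = 14.5448864      # month variance estimated by PROC MIXED
n_per_diet = 12

# PROC GLM: only the residual variance enters the sampling variance.
se_glm = math.sqrt(var_residual / n_per_diet)

# PROC MIXED: residual plus month variance (the phenotypic variance).
se_mixed = math.sqrt((var_residual + var_month) / n_per_diet)

ratio = se_mixed / se_glm

print(f"SE from PROC GLM   = {se_glm:.4f}")
print(f"SE from PROC MIXED = {se_mixed:.4f}")
print(f"ratio              = {ratio:.2f}")
```

The ratio depends only on the two variance components, since the 1/12 cancels: it is the square root of (σ²_e + σ²_m) / σ²_e.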

                       General Linear Models Procedure

Dependent Variable: GAIN

                                        T for H0:    Pr > |T|   Std Error of
Parameter                  Estimate    Parameter=0                Estimate

mean diet 1              51.1583333         228.00     0.0001     0.22437835
mean diet 2              51.9750000         231.64     0.0001     0.22437835
d1 - d2                  -0.8166667          -2.57     0.0147     0.31731891
d1 - d3                  -2.1000000          -6.62     0.0001     0.31731891

R.I. Cue,
last updated : 2010 April 27