For a good general overview of ANOVA procedures, the four types of estimable functions and their associated Sums of Squares, see the introductory chapters of the SAS/STAT guide.
As a general rule we want the Type III (Marginal) Sums of Squares for a factor, i.e. the Sums of Squares corrected for as many of the other factors in the model as possible. Type III Sums of Squares also give estimates that are not a function of the frequency of observations in any group: with unbalanced data, where groups contain unequal numbers of observations, the groups with more observations do not per se carry more weight than the groups with fewer. For purely nested designs, some polynomial regressions, and some models involving balanced data fitted in the right order, we may sometimes need the Type I (Sequential) Sums of Squares; more often, however, we should be using a nested or mixed models procedure in such cases.
Type I (Sequential) Sums of Squares are the Sums of Squares obtained by fitting effects in the order specified in the model. Consider the model
$$Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + b_3 X_{i3} + e_i$$
The Type I Sums of Squares for b1 are the Sums of Squares obtained from fitting b1 over and above the mean, i.e. R(b1 | µ). They are the 'marginal' Sums of Squares for b1 if one fitted the model
$$Y_i = b_0 + b_1 X_{i1} + e_i$$
The Type I Sums of Squares for b2 are the Sums of Squares obtained from fitting b2 after b1, i.e. R(b2 | b1, µ). Thus we have 'corrected' for the effect of b1, but not for any other factors we may have measured and be including in our model. They are the 'marginal' Sums of Squares for b2 if one fitted the model
$$Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + e_i$$
Note that in the above model
$$R(b_1, b_2 \mid \mu) = R(b_1 \mid \mu) + R(b_2 \mid b_1, \mu),$$
i.e. the Sequential Sums of Squares sum to the Sums of Squares for the model corrected for the mean.
Similarly, the Type I Sums of Squares for b3 are the Sums of Squares obtained from fitting b3 after b2 and b1, i.e. R(b3 | b2, b1, µ). Thus we have 'corrected' for the effects of b1 and b2. They are the marginal Sums of Squares for b3 if one fitted the model
$$Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + b_3 X_{i3} + e_i$$
Thus
$$R(b_1, b_2, b_3 \mid \mu) = R(b_1 \mid \mu) + R(b_2 \mid b_1, \mu) + R(b_3 \mid b_2, b_1, \mu)$$
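Writing SSE(·) for the residual sum of squares of the model containing the listed terms (notation introduced here for illustration, not taken from SAS), each reduction R(· | ·) is the difference between two residual sums of squares, and the decomposition telescopes:

$$
\begin{aligned}
R(b_1, b_2, b_3 \mid \mu) &= \mathrm{SSE}(\mu) - \mathrm{SSE}(\mu, b_1, b_2, b_3) \\
&= [\mathrm{SSE}(\mu) - \mathrm{SSE}(\mu, b_1)] + [\mathrm{SSE}(\mu, b_1) - \mathrm{SSE}(\mu, b_1, b_2)] \\
&\quad + [\mathrm{SSE}(\mu, b_1, b_2) - \mathrm{SSE}(\mu, b_1, b_2, b_3)] \\
&= R(b_1 \mid \mu) + R(b_2 \mid b_1, \mu) + R(b_3 \mid b_2, b_1, \mu).
\end{aligned}
$$

The intermediate residual sums of squares cancel whatever the order of fitting, so the total is fixed, while each individual term depends on which effects were fitted before it.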
The Type I (Sequential) Sums of Squares for each effect will generally change if the order of the effects in the model is changed!
If one has a 'balanced' experiment, in which each amount of X1 is combined with every amount of X2 and X3 equally often, then the Type I (Sequential) Sums of Squares for each effect will equal the Type III (Marginal) Sums of Squares, whatever the order of the effects.
As an example, consider plots of maize given Nitrogen, Phosphorus and Potassium fertilisers, each at three rates, in a complete 3 × 3 × 3 factorial with one plot per combination:
Nitrogen | Phosphorus | Potassium | Maize Yield
---|---|---|---
10 | 10 | 10 | 65
10 | 10 | 20 | 80
10 | 10 | 30 | 104
10 | 20 | 10 | 87
10 | 20 | 20 | 108
10 | 20 | 30 | 126
10 | 30 | 10 | 107
10 | 30 | 20 | 126
10 | 30 | 30 | 148
20 | 10 | 10 | 86
20 | 10 | 20 | 107
20 | 10 | 30 | 129
20 | 20 | 10 | 107
20 | 20 | 20 | 126
20 | 20 | 30 | 148
20 | 30 | 10 | 125
20 | 30 | 20 | 144
20 | 30 | 30 | 168
30 | 10 | 10 | 108
30 | 10 | 20 | 129
30 | 10 | 30 | 141
30 | 20 | 10 | 125
30 | 20 | 20 | 143
30 | 20 | 30 | 168
30 | 30 | 10 | 149
30 | 30 | 20 | 163
30 | 30 | 30 | 184
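The claim that a balanced layout makes the Type I and Type III Sums of Squares coincide can be checked on the maize data without SAS. The following is a minimal plain-Python sketch, assuming the three fertiliser rates enter as continuous regressors (X1 = Nitrogen, X2 = Phosphorus, X3 = Potassium), as in the regression model above; the accompanying SAS code may instead treat them as classification factors. The helpers `rss`, `type1` and `type3` are hypothetical names introduced here: `rss` returns the residual sum of squares of a least-squares fit, so every R(· | ·) is a difference of two `rss` values.

```python
from itertools import product
from math import isclose

# Maize yields in table order: Nitrogen varies slowest, Potassium fastest.
LEVELS = (10, 20, 30)
YIELDS = [65, 80, 104, 87, 108, 126, 107, 126, 148,
          86, 107, 129, 107, 126, 148, 125, 144, 168,
          108, 129, 141, 125, 143, 168, 149, 163, 184]
DESIGN = list(product(LEVELS, repeat=3))  # (N, P, K) rates for each plot
N = [rates[0] for rates in DESIGN]
P = [rates[1] for rates in DESIGN]
K = [rates[2] for rates in DESIGN]


def rss(y, columns):
    """Residual SS of a least-squares fit of y on an intercept plus `columns`.

    Builds an orthonormal basis by modified Gram-Schmidt, then removes the
    projection of y onto each basis vector.
    """
    basis = []
    for col in [[1.0] * len(y)] + [[float(x) for x in c] for c in columns]:
        v = col
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:  # skip a column already in the span of earlier ones
            basis.append([a / norm for a in v])
    r = [float(x) for x in y]
    for q in basis:
        dot = sum(a * b for a, b in zip(r, q))
        r = [a - dot * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def type1(y, effects):
    """Sequential SS: each effect is fitted after the effects listed before it."""
    ss, fitted, previous = {}, [], rss(y, [])
    for name, col in effects:
        fitted.append(col)
        current = rss(y, fitted)
        ss[name] = previous - current
        previous = current
    return ss


def type3(y, effects):
    """Marginal SS: each effect is fitted after all the other effects."""
    full = rss(y, [col for _, col in effects])
    return {name: rss(y, [c for other, c in effects if other != name]) - full
            for name, _ in effects}


effects = [("N", N), ("P", P), ("K", K)]
model_ss = rss(YIELDS, []) - rss(YIELDS, [N, P, K])  # R(b1, b2, b3 | mu)
t1_npk = type1(YIELDS, effects)        # order N, P, K
t1_kpn = type1(YIELDS, effects[::-1])  # order K, P, N
t3 = type3(YIELDS, effects)

for name, _ in effects:
    print(f"{name}: Type I (N,P,K) = {t1_npk[name]:.2f}, "
          f"Type I (K,P,N) = {t1_kpn[name]:.2f}, Type III = {t3[name]:.2f}")
print(f"R(b1, b2, b3 | mu) = {model_ss:.2f}")
```

Because the 27 plots form a complete factorial with one plot per combination, the centred rate columns are mutually orthogonal. Each effect therefore gets the same Type I value in either order, equal to its Type III value, and both sets add up to R(b1, b2, b3 | µ).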
SAS code, balanced experiment
SAS code, unbalanced experiment
Run both data sets and their SAS code through SAS. Examine the output, paying particular attention to the Type I and Type III Sums of Squares in each of the PROC GLM analyses.
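If SAS is not to hand, the order dependence of the Type I Sums of Squares can be sketched in plain Python. The unbalanced data set used with the SAS code is not reproduced in this text, so the sketch below builds a hypothetical unbalanced version of the maize data by dropping the last plot (N = P = K = 30). It assumes the rates enter as continuous regressors, as in the regression model given earlier; `rss` and `type1` are illustrative helper names, not SAS procedures.

```python
from itertools import product
from math import isclose

# Maize yields in table order: Nitrogen varies slowest, Potassium fastest.
LEVELS = (10, 20, 30)
YIELDS = [65, 80, 104, 87, 108, 126, 107, 126, 148,
          86, 107, 129, 107, 126, 148, 125, 144, 168,
          108, 129, 141, 125, 143, 168, 149, 163, 184]
# Hypothetical unbalanced data: the last plot (N = P = K = 30) is dropped.
PLOTS = list(zip(product(LEVELS, repeat=3), YIELDS))[:-1]
N = [rates[0] for rates, _ in PLOTS]
P = [rates[1] for rates, _ in PLOTS]
K = [rates[2] for rates, _ in PLOTS]
Y = [y for _, y in PLOTS]


def rss(y, columns):
    """Residual SS of a least-squares fit of y on an intercept plus `columns`,
    computed by projecting y off a modified Gram-Schmidt orthonormal basis."""
    basis = []
    for col in [[1.0] * len(y)] + [[float(x) for x in c] for c in columns]:
        v = col
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:  # skip a column already in the span of earlier ones
            basis.append([a / norm for a in v])
    r = [float(x) for x in y]
    for q in basis:
        dot = sum(a * b for a, b in zip(r, q))
        r = [a - dot * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def type1(y, effects):
    """Sequential SS: each effect is fitted after the effects listed before it."""
    ss, fitted, previous = {}, [], rss(y, [])
    for name, col in effects:
        fitted.append(col)
        current = rss(y, fitted)
        ss[name] = previous - current
        previous = current
    return ss


effects = [("N", N), ("P", P), ("K", K)]
model_ss = rss(Y, []) - rss(Y, [N, P, K])  # R(b1, b2, b3 | mu)
t1_npk = type1(Y, effects)        # order N, P, K
t1_kpn = type1(Y, effects[::-1])  # order K, P, N

for name, _ in effects:
    print(f"{name}: Type I (N,P,K) = {t1_npk[name]:.2f}, "
          f"Type I (K,P,N) = {t1_kpn[name]:.2f}")
print(f"Totals: {sum(t1_npk.values()):.2f} and {sum(t1_kpn.values()):.2f}; "
      f"R(b1, b2, b3 | mu) = {model_ss:.2f}")
```

With one plot missing, the rate columns are no longer orthogonal, so the Type I value for each effect depends on its position in the model; in both orders the three values still add up to R(b1, b2, b3 | µ), and the value for the effect fitted last is its Type III value.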
Type III (Marginal) Sums of Squares are the Sums of Squares obtained by fitting each effect after all the other terms in the model, i.e. the Sums of Squares for each effect corrected for the other terms in the model. The marginal (Type III) Sums of Squares do not depend upon the order in which effects are specified in the model.
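For the model
$$Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + b_3 X_{i3} + e_i$$
the marginal Sums of Squares do NOT, unless the design is balanced, sum to the Sums of Squares for the model corrected for the mean, $\mathrm{SSR}_m = R(b_1, b_2, b_3 \mid \mu)$, i.e. in general
$$R(b_1, b_2, b_3 \mid \mu) \neq R(b_1 \mid \mu, b_2, b_3) + R(b_2 \mid \mu, b_1, b_3) + R(b_3 \mid \mu, b_1, b_2)$$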
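The inequality can be illustrated with a plain-Python sketch that computes each R(bj | µ, other effects) as the increase in residual sum of squares when bj alone is dropped from the full model. It assumes the maize rates enter as continuous regressors, and, because the unbalanced data set used with the SAS code is not reproduced here, it uses a hypothetical unbalanced version of the maize data with the last plot (N = P = K = 30) removed. `rss` and `marginal_vs_model` are illustrative helper names.

```python
from itertools import product
from math import isclose

# Maize yields in table order: Nitrogen varies slowest, Potassium fastest.
LEVELS = (10, 20, 30)
YIELDS = [65, 80, 104, 87, 108, 126, 107, 126, 148,
          86, 107, 129, 107, 126, 148, 125, 144, 168,
          108, 129, 141, 125, 143, 168, 149, 163, 184]
ROWS = [(n, p, k, y) for (n, p, k), y in zip(product(LEVELS, repeat=3), YIELDS)]


def rss(y, columns):
    """Residual SS of a least-squares fit of y on an intercept plus `columns`,
    computed by projecting y off a modified Gram-Schmidt orthonormal basis."""
    basis = []
    for col in [[1.0] * len(y)] + [[float(x) for x in c] for c in columns]:
        v = col
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:  # skip a column already in the span of earlier ones
            basis.append([a / norm for a in v])
    r = [float(x) for x in y]
    for q in basis:
        dot = sum(a * b for a, b in zip(r, q))
        r = [a - dot * b for a, b in zip(r, q)]
    return sum(a * a for a in r)


def marginal_vs_model(rows):
    """Return (sum of Type III SS, R(b1, b2, b3 | mu)) for rows of (N, P, K, y)."""
    y = [row[3] for row in rows]
    cols = [[row[j] for row in rows] for j in range(3)]  # N, P, K columns
    full = rss(y, cols)
    # Type III SS for effect j: rise in residual SS when only effect j is dropped.
    type3 = [rss(y, cols[:j] + cols[j + 1:]) - full for j in range(3)]
    return sum(type3), rss(y, []) - full


balanced = marginal_vs_model(ROWS)
unbalanced = marginal_vs_model(ROWS[:-1])  # hypothetical: last plot dropped

for label, (type3_total, model_ss) in [("balanced", balanced),
                                       ("unbalanced", unbalanced)]:
    print(f"{label}: sum of Type III = {type3_total:.2f}, "
          f"R(b1, b2, b3 | mu) = {model_ss:.2f}")
```

In the complete factorial the two totals agree; once a single plot is missing, the rate columns are correlated and the Type III Sums of Squares no longer add up to the model Sums of Squares.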
The marginal (Type III) Sums of Squares are preferable in most cases, since they measure the variation attributable to an effect after correcting for all the other effects in the model, and they are not weighted by the frequency of observations in each group.
A case where they are not preferable is a purely nested design. There, the main effect within which another effect is nested should be assessed with the Type I Sums of Squares for that main effect, in a model where any other effects precede the main effect, which in turn precedes the nested effect.
For example, consider a nested design in which 3 treatments are applied to apple trees and 6 apples are then weighed from each tree. There are 12 trees, 4 per treatment, and the tree is the experimental unit. The model is
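$$Y_{ijk} = \mu + \text{trt}_i + \text{tree}_{ij} + \text{apple}_{ijk}$$
where i = 1 to 3 indexes treatments, j = 1 to 4 indexes trees within a treatment, and k = 1 to 6 indexes apples within a tree, so that $\text{apple}_{ijk}$ is the residual apple-to-apple variation.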
The Sums of Squares we then compute are
$$R(\text{trt}, \text{tree\_within\_trt} \mid \mu), \quad R(\text{tree\_within\_trt} \mid \mu, \text{trt}) \quad \text{and} \quad R(\text{trt} \mid \mu).$$
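Thus we need the Type I Sums of Squares for treatment (over and above the mean) and for trees within treatments (over and above the effect of treatments). The Type III Sums of Squares for treatment, R(trt | µ, tree_within_trt), would be of no use here: every difference between treatments is also a difference between trees, so once tree_within_trt is in the model this reduction is zero, on zero degrees of freedom. Other than in a purely nested design of this kind, we should stick to Type III Sums of Squares.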
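With the balanced layout stated above (3 treatments, 4 trees per treatment, 6 apples per tree), the two Type I reductions take the standard closed forms below, where a bar with a dot in place of a subscript denotes the mean over that subscript. Because trees are the experimental unit, this sketch tests treatments against the tree-within-treatment mean square rather than against the apple-to-apple residual (which has 3 × 4 × (6 - 1) = 60 degrees of freedom):

$$
\begin{aligned}
R(\text{trt} \mid \mu) &= 4 \times 6 \sum_{i=1}^{3} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 && \text{on } 3 - 1 = 2 \text{ df} \\
R(\text{tree\_within\_trt} \mid \mu, \text{trt}) &= 6 \sum_{i=1}^{3} \sum_{j=1}^{4} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2 && \text{on } 3(4 - 1) = 9 \text{ df} \\
F_{\text{trt}} &= \frac{R(\text{trt} \mid \mu) / 2}{R(\text{tree\_within\_trt} \mid \mu, \text{trt}) / 9} && \text{on } (2, 9) \text{ df}
\end{aligned}
$$

The degrees of freedom 2 + 9 + 60 account for all 72 - 1 = 71 degrees of freedom about the mean.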