Ignoring an effect from a model


Suppose we have 6 different diets/treatments which we wish to compare for their efficacy in alleviating problems during pregnancy in humans. We believe that there may well be differences between parities in the levels of problems associated with pregnancy, so we arrange to minimise these effects on our experiment by having equal numbers of women of first, second, third and fourth parity on each treatment; i.e. from the women of each parity (first, second, third and fourth) we choose 6 at random. Thus the women are a random sample from within each of the 4 parity groups, and within each parity group the assignment of a woman (the experimental unit/subject) to one of the 6 treatments is at random (a sketch of how such a randomisation might be generated in SAS follows the data). The following data were obtained:

                  Parity
Treatment      1      2      3      4
    1         4.4    5.9    7.0    4.1
    2         3.3    1.9    5.9    7.1
    3         4.4    4.0    5.5    3.1
    4         6.8    6.6    8.0    6.4
    5         6.3    4.9    6.9    7.1
    6         6.4    7.3    8.7    6.7
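
As an aside, the within-parity randomisation described above could be generated in SAS with PROC PLAN; a minimal sketch, with an arbitrary seed:

proc plan seed=20231;      /* arbitrary seed; change it for a new layout */
  factors parity=4 ordered /* the 4 parity groups, listed in order       */
          trt=6;           /* a random permutation of treatments 1-6
                              within each parity group                   */
  output out=layout;       /* save the generated design as a data set    */
run;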

What model do we use for this analysis?


Possible models:

  • Two-way, RCB
  • One-way, CRD
  • Comparison of both models

  • Two-way, Randomised Complete Block design

    We could analyse these data as a two-way Analysis of Variance, with the factor 'Trt' at 6 levels and the factor 'Parity' at 4 levels. The model would therefore be:

    Y_ij = µ + Trt_i + Parity_j + e_ij

    where µ is the overall mean, Trt_i is the effect of the i-th treatment (i = 1,...,6), Parity_j is the effect of the j-th parity group (j = 1,...,4), and e_ij is the random error.

    The parameters of this model can be considered under 3 different types of model: Fixed effects, Mixed (Fixed and Random) effects, and Random effects. In this case we would probably consider both treatment and parity to be fixed effects. The SAS code would therefore be:

    data rcb;
    input trt parity y;
    cards;
    1 1 4.4
    1 2 5.9
    1 3 7.0
    1 4 4.1
    2 1 3.3
    2 2 1.9
    2 3 5.9
    2 4 7.1
    3 1 4.4
    3 2 4.0
    3 3 5.5
    3 4 3.1
    4 1 6.8
    4 2 6.6
    4 3 8.0
    4 4 6.4
    5 1 6.3
    5 2 4.9
    5 3 6.9
    5 4 7.1
    6 1 6.4
    6 2 7.3
    6 3 8.7
    6 4 6.7
    ;
    proc glm data=rcb;                    /* two-way ANOVA: the RCB model       */
      class trt parity;                   /* both trt and parity are factors    */
      model y = trt parity;               /* additive model: one obs per cell,
                                             so no interaction can be fitted    */
      lsmeans trt parity / stderr pdiff;  /* means, standard errors, pairwise p */
    run;
    

    Note the various Sums of Squares. Write down the Expected Mean Squares. With both Treatments and Parity (Blocks) fixed, they would be:

     Source of variation     Expected Mean Square
     Treatment               σ²e + Q(Trt), a quadratic in the treatment effects
     Parity (Block)          σ²e + Q(Parity), a quadratic in the parity effects
     Residual                σ²e

    This table of the expectations of the Mean Squares shows us very clearly which effects to test against which. The general rule is to test a Mean Square against the Mean Square whose expectation differs only by NOT containing the effect that we are interested in. Thus Treatment is tested against the Residual, since the Residual expectation contains only the Mean Square Error (σ²e), while the Treatment Mean Square expectation contains the Mean Square Error plus a quadratic in the treatment effects. Similarly for Blocks (here, Parity). If Blocks and/or Treatments were random we would still test them against the Residual, but we would be interested in their variance components and not the Block effects per se.
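
    As a numerical check, using the Mean Squares from the ANOVA table in the comparison below:

     F(Treatment) = 6.3304 / 1.3144 = 4.82
     F(Block)     = 4.4304 / 1.3144 = 3.37

    If Parity were treated as random rather than fixed, a minimal PROC MIXED sketch (same data set rcb; REML estimation is the default) would estimate its variance component directly:

    proc mixed data=rcb;
      class trt parity;
      model y = trt;       /* trt remains a fixed effect                  */
      random parity;       /* parity now contributes a variance component */
      lsmeans trt / pdiff; /* treatment means and pairwise comparisons    */
    run;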


    One-way, Completely Randomised Design

    If we thought that, since the data are 'balanced' for parities across treatments, we could safely ignore parity, we would arrive at analysing these data as a One-way ANOVA with the single factor 'Trt' at 6 levels. This would not be correct, but if we were to do it we would be postulating the following model:

    Y_ij = µ + Trt_i + e_ij

    and the SAS code would therefore be:

    data crd;
    input trt parity y;
    cards;
    1 1 4.4
    1 2 5.9
    1 3 7.0
    1 4 4.1
    2 1 3.3
    2 2 1.9
    2 3 5.9
    2 4 7.1
    3 1 4.4
    3 2 4.0
    3 3 5.5
    3 4 3.1
    4 1 6.8
    4 2 6.6
    4 3 8.0
    4 4 6.4
    5 1 6.3
    5 2 4.9
    5 3 6.9
    5 4 7.1
    6 1 6.4
    6 2 7.3
    6 3 8.7
    6 4 6.7
    ;
    proc glm data=crd;             /* one-way ANOVA: CRD model                  */
      class trt;                   /* only trt is declared as a factor          */
      model y = trt;               /* parity is read but omitted from the model */
      lsmeans trt / stderr pdiff;  /* treatment means, std errors, pairwise p   */
    run;
    

    Note the various Sums of Squares and compare them with the previous analysis. What do you think the Expected Mean Squares are now?

    In fact what happens is that, since the experimental design is 'balanced', the Mean Square for Treatments remains unchanged, but the residual is no longer simply the Mean Square Error: it also contains the effect that we have so conveniently 'forgotten', or 'ignored'. The expectations of the Mean Squares are now:

     Source of variation     Expected Mean Square
     Treatment               σ²e + Q(Trt), a quadratic in the treatment effects
     Residual                σ²e + a function of the parity (block) effects

    We can see that the Expected Mean Square for Treatments remains the same, but that the expectation for the residual has changed; it now contains not just the Mean Square Error, but also the block effect. Thus if we test Treatments against what we have computed as the residual, it is no longer a valid or sensible F-test.
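
    Numerically, this residual simply pools the Block and Residual lines of the first analysis (see the ANOVA tables below):

     MS_Residual = (SS_Block + SS_Residual) / (3 + 15)
                 = (13.29125 + 19.71625) / 18
                 = 33.0075 / 18
                 = 1.8338

    which is inflated relative to the true Mean Square Error of 1.3144.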


    Comparison of both models

    ANOVA, including Treatments and Blocks
     Source of variation    d.f.   Sums of Squares   Mean Squares   F-ratio      Pr
     SSRm (Model)             8        44.9433          5.6179        4.27     0.0075
     Treatment                5        31.6521          6.3304        4.82     0.0080
     Block (Parity)           3        13.29125         4.4304        3.37     0.0467
     Residual                15        19.71625         1.3144


    ANOVA, including Treatments, ignoring Blocks
     Source of variation    d.f.   Sums of Squares   Mean Squares   F-ratio      Pr
     SSRm = Treatment         5        31.6521          6.3304        3.45     0.0231
     Residual                18        33.0075          1.8338
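
    As a quick check, the tail probabilities in these tables can be reproduced with the SAS PROBF function; a minimal sketch:

    data pvals;
      p_rcb = 1 - probf(4.82, 5, 15); /* Treatment F-test, blocks in the model */
      p_crd = 1 - probf(3.45, 5, 18); /* Treatment F-test, blocks ignored      */
    run;
    proc print data=pvals;
    run;

    This returns p ≈ 0.0080 and p ≈ 0.0231 respectively.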


    Note: not only is such an analysis (ignoring blocks) wrong, but if we had decided to be conservative and required the F-ratio to exceed that for the 1% probability level, then in the second analysis (Pr = 0.0231) we would come to the erroneous conclusion that there are no statistically significant differences between treatments, when the correct analysis (Pr = 0.0080) shows that there are.

    Ignoring an effect from a model when it was in fact part of the initial experimental design is hazardous to your "Degree" of knowledge!

