Multiple Comparisons


See Steel, Torrie and Dickey, Chapter 8, and Westfall et al., Multiple Comparisons and Multiple Tests Using the SAS System.

Pre-planned comparisons that are linearly independent of one another, and hence involve no more contrasts than the appropriate degrees of freedom, can be made using either t-tests or F-tests (although it should be noted that this does not, of itself, address the issue of multiple comparisons). If we want to test all the possible differences, or if we wish to make tests suggested by the data, then simple t-tests or F-tests are no longer appropriate, since the overall probability of a Type I error will likely be too high (the risk of false positives). We need tests appropriate to multiple comparisons; such tests are characterised by taking account of the number of tests that could be made.

There are a number of multiple comparison tests:

Scheffé's method

Here we look at Scheffé's test, because it is a valid, fairly conservative test that is sufficiently general to be applicable to unbalanced designs (unequal numbers of observations per treatment). It is very general: all possible contrasts can be tested for significance, or confidence intervals constructed for the corresponding linear functions of the parameters.


Comparison of treatment means from a Completely Randomised Design

Consider the example from the Completely Randomised Design, One-way ANOVA.

Treatments
         3DOk1    3DOk5    3DOk4    3DOk7    3DOk13   composite
means    28.82    23.90    14.35    19.92    13.26    18.70

We shall look at the multiple comparison test due to Scheffé. We can use it to compute a Confidence Interval, and also to compute a Critical Difference, which allows us to determine whether a difference can be considered statistically significant or not.

First of all we need to decide on our hypotheses: our Null Hypothesis (Ho) and our Alternative Hypothesis (HA).

Our Null Hypothesis (Ho) will be that there is no difference, i.e. that the difference = Zero.

Our Alternative Hypothesis (HA) will be that there is a difference, i.e. that the difference is not equal to Zero.

Basically the method involves estimating any particular desired contrast (difference), and its standard error, using the appropriate methodology, i.e. using our matrix k`!
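Recall, from the earlier material where the matrix k` was introduced, that for a contrast k`b the estimate is computed from the solutions to the Normal Equations, and its standard error is estimated as

s.e.(k`b) = sqrt( k` (X'X)⁻ k × MSE )

where (X'X)⁻ denotes a generalised inverse and MSE is the Residual Mean Square from the analysis of variance.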

We then need to determine the tabulated F value, which will depend on our probability level and on the numerator and denominator degrees of freedom; hence the importance of being able to compute the correct tabulated F value for any given numerator and denominator degrees of freedom (see the section on computing F values).

Then we compute the critical difference as:

critical difference = s.e. × sqrt( (s - 1) × F )

where s is the number of treatments (so that s - 1 is the treatment degrees of freedom), F is the tabulated F value for the chosen probability level with s - 1 numerator degrees of freedom and the residual (error) denominator degrees of freedom, and s.e. is the standard error of the estimated contrast.

Then, if the absolute value of the estimated difference is greater than the critical difference, we can declare that the difference is statistically significant, i.e. we reject Ho, where Ho is that the difference is zero.

Thus, for example, suppose that we have analysed these data and see that there appears to be a difference between the 5 straight inoculants and the mixture. We therefore wish to compare the average of the 5 inoculants vs. the 6th treatment (which is the combined mixture of the 5 inoculants); the difference is:

(trt 1 + trt 2 + trt 3 + trt 4 + trt 5)/5 - trt 6 = 1.35

k` = (0  .2  .2  .2  .2  .2  -1)

s.e. = 1.77

SAS statement (in PROC GLM):
estimate 'trt1-5 vs 6' trt 1 1 1 1 1 -5 / divisor=5;
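As a quick check, using the treatment means from the table above:

(28.82 + 23.90 + 14.35 + 19.92 + 13.26)/5 - 18.70 = 20.05 - 18.70 = 1.35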

The tabulated t value for 22 residual degrees of freedom at the 5% level is 2.074. Thus, using a simple (here inappropriate) t statistic, the critical difference would be t × s.e., i.e. 2.074 × 1.77 = 3.67, and the estimated difference would have to exceed 3.67 to be declared significant.

If we did not have this as a pre-planned comparison, but rather noted this difference after our analysis and wanted to know whether it was significant, then we should be using a multiple comparison test, e.g. Scheffé's test. The calculations of the estimate of the difference and of its standard error are as above. We have 6 treatments, so s = 6 and s - 1 = 5, the degrees of freedom for treatments.

F (5%; 5 numerator d.f., 22 denominator d.f.) = 2.66

sqrt( (s - 1) × F ) = sqrt( 5 × 2.66 ) = 3.6469

Then the critical difference = 1.77 × 3.6469 = 6.45. Thus the estimated difference would have to exceed 6.45 to be considered statistically significant at the 5% level! Therefore in this case we accept (fail to reject) the null hypothesis: the average of the 5 straight inoculants does not differ significantly from the composite.


We can also use Scheffé's multiple comparison method to compute a Confidence Interval, which will simply be our estimate ± the Critical Difference! This applies equally well to an estimate of a difference or to a Least Squares mean.
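For the contrast above this would be 1.35 ± 6.45, i.e. approximately (-5.10, 7.80); since the interval includes Zero, we reach the same conclusion as with the critical difference.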


How do we do this with SAS (or any other statistical package)?
With SAS we can use the estimate statement, after the model statement, to compute the estimate of the difference between two 'treatments' or levels, together with the standard error of the estimate. Then a little bit of work by hand with a calculator (or with a short DATA step, as sketched below) will give us the critical difference.
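For example, a minimal DATA step sketch of that calculation for the contrast above (s = 6 treatments, 22 residual degrees of freedom, s.e. = 1.77 from the estimate statement; the data set name scheffe_cd is arbitrary):

data scheffe_cd;
   s   = 6;                      /* number of treatments                          */
   dfe = 22;                     /* residual (error) degrees of freedom           */
   se  = 1.77;                   /* s.e. of the contrast, from estimate statement */
   f   = finv(0.95, s-1, dfe);   /* tabulated F value at the 5% level             */
   cd  = se * sqrt((s-1)*f);     /* Scheffe's critical difference                 */
   put f= 8.4 cd= 8.4;
run;

The finv function returns the tabulated F value, so this step reproduces, up to rounding, the 2.66 and 6.45 obtained by hand above.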

In addition, if we have carried out an experiment, made the statistical analysis, and then want to compare all the treatments (for example in the above experiment), we can have SAS carry out Scheffé's test for us when we compare the Least Squares Means. We shall not obtain the estimates of the differences, nor a critical difference; rather, SAS will provide us with a table of the probabilities of the differences, adjusted for the multiple comparisons using the method of Scheffé. The SAS statements for the CRD would be:


proc glm data=crd1;
class trt;
model y = trt;
lsmeans trt/stderr pdiff;
lsmeans trt/stderr pdiff adjust=scheffe;  /* Scheffe's test  */
lsmeans trt/stderr pdiff adjust=bon;      /* Bonferroni's test  */
estimate 'trt1-trt2' trt 1 -1 0 0 0 0;
estimate 'trt1-trt3' trt 1 0 -1 0 0 0;
estimate 'trt1-trt4' trt 1 0 0 -1 0 0;
estimate 'trt1-trt5' trt 1 0 0 0 -1 0;
estimate 'trt1-trt6' trt 1 0 0 0 0 -1;
estimate 'trt2-trt3' trt 0 1 -1 0 0 0;
estimate 'trt2-trt4' trt 0 1 0 -1 0 0;
estimate 'trt2-trt5' trt 0 1 0 0 -1 0;
estimate 'trt2-trt6' trt 0 1 0 0 0 -1;
estimate 'trt3-trt4' trt 0 0 1 -1 0 0;
estimate 'trt3-trt5' trt 0 0 1 0 -1 0;
estimate 'trt3-trt6' trt 0 0 1 0 0 -1;
estimate 'trt4-trt5' trt 0 0 0 1 -1 0;
estimate 'trt4-trt6' trt 0 0 0 1 0 -1;
estimate 'trt5-trt6' trt 0 0 0 0 1 -1;
estimate 'trt1-5 vs 6' trt .2 .2 .2 .2 .2 -1;
run;

The output produced from the above analysis is shown here:


 

The SAS System

The GLM Procedure

Class Level Information
Class Levels Values
trt 6 1 2 3 4 5 6
 
Number of observations 28

 


 
The SAS System

The GLM Procedure
Dependent Variable: y

Source DF Sum of Squares Mean Square F Value Pr > F
Model 5 812.674500 162.534900 12.72 <.0001
Error 22 281.118000 12.778091    
Corrected Total 27 1093.792500      
 
R-Square Coeff Var Root MSE y Mean
0.742988 17.98564 3.574646 19.87500
 
Source DF Type I SS Mean Square F Value Pr > F
trt 5 812.6745000 162.5349000 12.72 <.0001
 
Source DF Type III SS Mean Square F Value Pr > F
trt 5 812.6745000 162.5349000 12.72 <.0001

 


 

The GLM Procedure
Least Squares Means
LSMeans = fitted values, mu + trt(i)
Pr > |t| = probability value for testing H0: mu + trt(i) = Zero

trt y LSMEAN Standard Error Pr > |t| LSMEAN Number
1 28.8200000 1.5986301 <.0001 1
2 23.9000000 1.7873228 <.0001 2
3 14.3500000 1.7873228 <.0001 3
4 19.9200000 1.5986301 <.0001 4
5 13.2600000 1.5986301 <.0001 5
6 18.7000000 1.5986301 <.0001 6


 
Probabilities of differences amongst treatments
Least Squares Means for effect trt
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: y
i/j 1 2 3 4 5 6
1   0.0523 <.0001 0.0007 <.0001 0.0002
2 0.0523   0.0010 0.1112 0.0002 0.0412
3 <.0001 0.0010   0.0298 0.6539 0.0833
4 0.0007 0.1112 0.0298   0.0075 0.5949
5 <.0001 0.0002 0.6539 0.0075   0.0250
6 0.0002 0.0412 0.0833 0.5949 0.0250  

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
NOTE: These are simple t-tests, no consideration for multiple comparisons.

 


 

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Scheffe
Least Squares Means for effect trt
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: y
i/j 1 2 3 4 5 6
1   0.5345 0.0004 0.0288 <.0001 0.0098
2 0.5345   0.0391 0.7360 0.0106 0.4745
3 0.0004 0.0391   0.3990 0.9989 0.6587
4 0.0288 0.7360 0.3990   0.1683 0.9975
5 <.0001 0.0106 0.9989 0.1683   0.3607
6 0.0098 0.4745 0.6587 0.9975 0.3607  

 


 

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Bonferroni
Least Squares Means for effect trt
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: y
i/j 1 2 3 4 5 6
1   0.7843 <.0001 0.0106 <.0001 0.0028
2 0.7843   0.0155 1.0000 0.0031 0.6181
3 <.0001 0.0155   0.4475 1.0000 1.0000
4 0.0106 1.0000 0.4475   0.1121 1.0000
5 <.0001 0.0031 1.0000 0.1121   0.3744
6 0.0028 0.6181 1.0000 1.0000 0.3744  

 


 

The GLM Procedure
Dependent Variable: y

Parameter Estimate Standard Error t Value Pr > |t|
trt1-trt2 4.9200000 2.39794514 2.05 0.0523
trt1-trt3 14.4700000 2.39794514 6.03 <.0001
trt1-trt4 8.9000000 2.26080436 3.94 0.0007
trt1-trt5 15.5600000 2.26080436 6.88 <.0001
trt1-trt6 10.1200000 2.26080436 4.48 0.0002
trt2-trt3 9.5500000 2.52765612 3.78 0.0010
trt2-trt4 3.9800000 2.39794514 1.66 0.1112
trt2-trt5 10.6400000 2.39794514 4.44 0.0002
trt2-trt6 5.2000000 2.39794514 2.17 0.0412
trt3-trt4 -5.5700000 2.39794514 -2.32 0.0298
trt3-trt5 1.0900000 2.39794514 0.45 0.6539
trt3-trt6 -4.3500000 2.39794514 -1.81 0.0833
trt4-trt5 6.6600000 2.26080436 2.95 0.0075
trt4-trt6 1.2200000 2.26080436 0.54 0.5949
trt5-trt6 -5.4400000 2.26080436 -2.41 0.0250
trt1-5 vs 6 1.3500000 1.76574465 0.76 0.4527

Comparison of treatment means from a Factorial Design

Consider the example from the Factorial Design.

We have a simple Factorial Design, with both factors being fixed effects, Diet with 3 levels and Sex with 2 levels, so that the effects are all tested against the Residual Mean Square. The interaction effect was statistically significant (Pr < .0034). Thus the main effects lose their importance and we should be looking at the 'Simple Effects'. From this 3 X 2 Factorial we therefore have 6 Diet X Sex combinations, or 'Simple Effects'. Therefore, in terms of 'Simple Effects', if we wish to make multiple comparisons, or post hoc (a posteriori) tests, we have 6 'levels', and hence 5 degrees of freedom. These 5 degrees of freedom are equal to the 2 d.f. for Diet + 1 d.f. for Sex + 2 d.f. for the Diet X Sex interaction.

Simple effect             Estimate ± s.e.
µ + a1 + b1 + ab11 105.25 ± 1.40
µ + a1 + b2 + ab12 95.00 ± 1.40
µ + a1 + b3 + ab13 91.25 ± 1.40
µ + a2 + b1 + ab21 102.25 ± 1.40
µ + a2 + b2 + ab22 102.25 ± 1.40
µ + a2 + b3 + ab23 89.50 ± 1.40

Using the same principles as above, for a multiple comparison test using Scheffé's test, we have 6 'treatments' (combinations); therefore s = 6, and s-1 = 5. The residual degrees of freedom are 18.

F (5%; 5 numerator d.f., 18 denominator d.f.) = 2.77

sqrt( (s - 1) × F ) = sqrt( 5 × 2.77 ) = 3.7216

Consider the difference between A1B2 ( 95.00) and A2B2 (102.25).

trt A1B2 - trt A2B2 = 95.00 - 102.25 = -7.25

k` = (0  1  -1  0  0  0  0  1  0  0  -1  0)

s.e. = 1.98

SAS statement:
estimate 'A1B2 - A2B2' a 1 -1 A*B 0 1 0 0 -1 0;
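The standard error follows from the two cell means being independently estimated, each with a standard error of 1.40 (from the table of simple effects above): s.e. = sqrt(1.40² + 1.40²) = 1.98.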

Then the critical difference is 1.98*3.7216 = 7.369

Thus an a posteriori test, such as Scheffé's test, would accept (fail to reject) the null hypothesis that there is no difference, since the absolute difference (7.25) is less than the critical difference (7.369); whereas a simple t-test would reject the null hypothesis:

t-calculated = 7.25/1.98 = 3.66

and the tabulated t value for 5% and 18 d.f. is 2.101, which is less than our computed t value.

How do we do this using SAS? The approach is much the same as that described above for the CRD: we use the SAS estimate statement to have SAS compute the estimate and standard error of the particular contrast that we are interested in; the rest we do by hand! A sketch of the statements is given below.
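For completeness, a minimal sketch of such a PROC GLM run for this factorial; the data set name fact1 and the variable names a, b and y are illustrative assumptions, not taken from the original example:

proc glm data=fact1;                                 /* fact1, a, b, y are assumed names        */
  class a b;
  model y = a b a*b;
  lsmeans a*b / stderr pdiff adjust=scheffe;         /* Scheffe-adjusted comparisons of 6 cells */
  estimate 'A1B2 - A2B2' a 1 -1 a*b 0 1 0 0 -1 0;    /* the simple-effect contrast shown above  */
run;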


Multiple Comparisons in a Nested Analysis

The above approaches can be extended to Nested Designs and Analyses; effectively the only difference is that in a Nested (Subsampling) Analysis the Residual Mean Square (Error) is replaced by the appropriate Mean Square, the same one as used in the Analysis of Variance (as sketched below).
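For instance, a minimal sketch for a nested (subsampling) analysis, assuming pens nested within treatments; the data set name nest1 and the variable names trt, pen and y are illustrative assumptions. The e= options tell GLM which Mean Square to use as the error term:

proc glm data=nest1;                                      /* nest1, trt, pen, y are assumed names      */
  class trt pen;
  model y = trt pen(trt);
  test h=trt e=pen(trt);                                  /* test trt against the pen(trt) Mean Square */
  lsmeans trt / stderr pdiff adjust=scheffe e=pen(trt);   /* Scheffe's test using the same error term  */
run;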


R. I. Cue ©
Department of Animal Science, McGill University
last updated : 2010 May 1