Multiple Comparisons


See Steel, Torrie and Dickey, Chapter 8, and Westfall et al., Multiple Comparisons and Multiple Tests Using the SAS System.

Pre-planned comparisons that are linearly independent of one another, and hence involve no more contrasts than the appropriate degrees of freedom, can be made using either t-tests or F-tests (although it should be noted that this does not, of itself, address the issue of multiple comparisons). If we want to test all the possible differences, or if we wish to make tests suggested by the data, then simple t-tests or F-tests are no longer appropriate, since the overall probability of a Type I error will likely be too high (the risk of false positives). We need tests appropriate to multiple comparisons; such tests are characterised by taking account of the number of tests that could be made.

There are a number of multiple comparison tests:

Scheffé's method

Here we look at Scheffé's test, because it is a valid, fairly conservative test that is sufficiently general to be applicable to unbalanced designs (unequal numbers of observations per treatment). It is very general: all possible contrasts can be tested for significance, or confidence intervals constructed for the corresponding linear functions of the parameters.


Comparison of treatment means from a Completely Randomised Design

Consider the example from the Completely Randomised Design, One-way ANOVA.

Treatments
         3DOk1    3DOk5    3DOk4    3DOk7    3DOk13   composite
means    28.82    23.90    14.35    19.92    13.26    18.70

We shall look at the multiple comparison test due to Scheffé. We can use it to compute a Confidence Interval, and also to compute a Critical Difference, which allows us to determine whether a difference can be considered statistically significant or not.

First of all we need to decide on our hypotheses: our Null Hypothesis (Ho) and our Alternative Hypothesis (HA).

Our Null Hypothesis (Ho) will be that there is no difference, i.e. that the difference = Zero.

Our Alternative Hypothesis (HA) will be that there is a difference, i.e. that the difference is not equal to Zero.

Basically the method involves estimating any particular desired contrast (difference), and its standard error, using the appropriate methodology, i.e. using our matrix k`!
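Recall, from the earlier material where the matrix k` was introduced, that for a contrast k`b the estimate is computed from the solutions to the Normal Equations, and its standard error is estimated as

s.e.(k`b) = sqrt( k` (X'X)⁻ k × MSE )

where (X'X)⁻ denotes a generalised inverse and MSE is the Residual Mean Square from the analysis of variance.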

We then need to determine the tabulated F value, which will depend on our probability level and on the numerator and denominator degrees of freedom; hence the importance of being able to compute the correct tabulated F value for any given numerator and denominator degrees of freedom (see the section on computing F values).

Then we compute the critical difference as:

critical difference = s.e. × sqrt( (s - 1) × F )

where s is the number of treatments (so that s - 1 is the treatment degrees of freedom), F is the tabulated F value for the chosen probability level with s - 1 numerator degrees of freedom and the residual (error) denominator degrees of freedom, and s.e. is the standard error of the estimated contrast.

Then, if the absolute value of the estimated difference is greater than the critical difference, we can declare that the difference is statistically significant, i.e. we reject Ho, where Ho is that the difference is zero.

Thus, for example, suppose that we have analysed these data and see that there appears to be a difference between the 5 straight inoculants and the mixture. We therefore wish to compare the average of the 5 inoculants vs. the 6th treatment (which is the combined mixture of the 5 inoculants); the difference is:

(trt 1 + trt 2 + trt 3 + trt 4 + trt 5)/5 - trt 6 = 1.35

k` = (0  .2  .2  .2  .2  .2  -1)

s.e. = 1.77

SAS statement (in PROC GLM):
estimate 'trt1-5 vs 6' trt 1 1 1 1 1 -5 / divisor=5;
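As a quick check, using the treatment means from the table above:

(28.82 + 23.90 + 14.35 + 19.92 + 13.26)/5 - 18.70 = 20.05 - 18.70 = 1.35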

The tabulated t value for 22 residual degrees of freedom at the 5% level is 2.074. Thus, using a simple (here inappropriate) t statistic, the critical difference would be t × s.e., i.e. 2.074 × 1.77 = 3.67, and the estimated difference would have to exceed 3.67 to be declared significant.

If we did not have this as a pre-planned comparison, but rather noted this difference after our analysis and wanted to know whether it was significant, then we should be using a multiple comparison test, e.g. Scheffé's test. The calculations of the estimate of the difference and of its standard error are as above. We have 6 treatments, so s = 6 and s - 1 = 5, the degrees of freedom for treatments.

F (5%; 5 numerator d.f., 22 denominator d.f.) = 2.66

sqrt( (s - 1) × F ) = sqrt( 5 × 2.66 ) = 3.6469

Then the critical difference = 1.77 × 3.6469 = 6.45. Thus the estimated difference would have to exceed 6.45 to be considered statistically significant at the 5% level! Therefore in this case we accept (fail to reject) the null hypothesis: the average of the 5 straight inoculants does not differ significantly from the composite.


We can also use Scheffé's multiple comparison method to compute a Confidence Interval, which will simply be our estimate ± the Critical Difference! This applies equally well to an estimate of a difference or to a Least Squares mean.
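For the contrast above this would be 1.35 ± 6.45, i.e. approximately (-5.10, 7.80); since the interval includes Zero, we reach the same conclusion as with the critical difference.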


How do we do this with SAS (or any other statistical package)?
With SAS we can use the estimate statement, after the model statement, to compute the estimate of the difference between two 'treatments' or levels, together with the standard error of the estimate. Then a little bit of work by hand with a calculator (or with a short DATA step, as sketched below) will give us the critical difference.
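For example, a minimal DATA step sketch of that calculation for the contrast above (s = 6 treatments, 22 residual degrees of freedom, s.e. = 1.77 from the estimate statement; the data set name scheffe_cd is arbitrary):

data scheffe_cd;
   s   = 6;                      /* number of treatments                          */
   dfe = 22;                     /* residual (error) degrees of freedom           */
   se  = 1.77;                   /* s.e. of the contrast, from estimate statement */
   f   = finv(0.95, s-1, dfe);   /* tabulated F value at the 5% level             */
   cd  = se * sqrt((s-1)*f);     /* Scheffe's critical difference                 */
   put f= 8.4 cd= 8.4;
run;

The finv function returns the tabulated F value, so this step reproduces, up to rounding, the 2.66 and 6.45 obtained by hand above.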

In addition, if we have carried out an experiment, made the statistical analysis, and then want to compare all the treatments (for example in the above experiment), we can have SAS carry out Scheffé's test for us when we compare the Least Squares Means. We shall not obtain the estimates of the differences, nor a critical difference; rather, SAS will provide us with a table of the probabilities of the differences, adjusted for the multiple comparisons using the method of Scheffé. The SAS statements for the CRD would be:


proc glm data=crd1;
class trt;
model y = trt;
lsmeans trt/stderr pdiff;
lsmeans trt/stderr pdiff adjust=scheffe;  /* Scheffe's test  */
lsmeans trt/stderr pdiff adjust=bon;      /* Bonferroni's test  */
estimate 'trt1-trt2' trt 1 -1 0 0 0 0;
estimate 'trt1-trt3' trt 1 0 -1 0 0 0;
estimate 'trt1-trt4' trt 1 0 0 -1 0 0;
estimate 'trt1-trt5' trt 1 0 0 0 -1 0;
estimate 'trt1-trt6' trt 1 0 0 0 0 -1;
estimate 'trt2-trt3' trt 0 1 -1 0 0 0;
estimate 'trt2-trt4' trt 0 1 0 -1 0 0;
estimate 'trt2-trt5' trt 0 1 0 0 -1 0;
estimate 'trt2-trt6' trt 0 1 0 0 0 -1;
estimate 'trt3-trt4' trt 0 0 1 -1 0 0;
estimate 'trt3-trt5' trt 0 0 1 0 -1 0;
estimate 'trt3-trt6' trt 0 0 1 0 0 -1;
estimate 'trt4-trt5' trt 0 0 0 1 -1 0;
estimate 'trt4-trt6' trt 0 0 0 1 0 -1;
estimate 'trt5-trt6' trt 0 0 0 0 1 -1;
estimate 'trt1-5 vs 6' trt .2 .2 .2 .2 .2 -1;
run;

The output produced from the above analysis is shown here:


 

The SAS System

The GLM Procedure

Class Level Information
Class Levels Values
trt 6 1 2 3 4 5 6
 
Number of observations 28

 


 
The SAS System

The GLM Procedure
Dependent Variable: y

Source DF Sum of Squares Mean Square F Value Pr > F
Model 5 812.674500 162.534900 12.72 <.0001
Error 22 281.118000 12.778091    
Corrected Total 27 1093.792500      
 
R-Square Coeff Var Root MSE y Mean
0.742988 17.98564 3.574646 19.87500
 
Source DF Type I SS Mean Square F Value Pr > F
trt 5 812.6745000 162.5349000 12.72 <.0001
 
Source DF Type III SS Mean Square F Value Pr > F
trt 5 812.6745000 162.5349000 12.72 <.0001

 


 

The GLM Procedure
Least Squares Means
LSMeans = fitted values, mu + trt(i)
Pr > |t| = probability value for testing H0: mu + trt(i) = Zero

trt y LSMEAN Standard Error Pr > |t| LSMEAN Number
1 28.8200000 1.5986301 <.0001 1
2 23.9000000 1.7873228 <.0001 2
3 14.3500000 1.7873228 <.0001 3
4 19.9200000 1.5986301 <.0001 4
5 13.2600000 1.5986301 <.0001 5
6 18.7000000 1.5986301 <.0001 6


 
Probabilities of differences amongst treatments
Least Squares Means for effect trt
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: y
i/j 1 2 3 4 5 6
1   0.0523 <.0001 0.0007 <.0001 0.0002
2 0.0523   0.0010 0.1112 0.0002 0.0412
3 <.0001 0.0010   0.0298 0.6539 0.0833
4 0.0007 0.1112 0.0298   0.0075 0.5949
5 <.0001 0.0002 0.6539 0.0075   0.0250
6 0.0002 0.0412 0.0833 0.5949 0.0250  

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
NOTE: These are simple t-tests, no consideration for multiple comparisons.

 


 

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Scheffe
Least Squares Means for effect trt
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: y
i/j 1 2 3 4 5 6
1   0.5345 0.0004 0.0288 <.0001 0.0098
2 0.5345   0.0391 0.7360 0.0106 0.4745
3 0.0004 0.0391   0.3990 0.9989 0.6587
4 0.0288 0.7360 0.3990   0.1683 0.9975
5 <.0001 0.0106 0.9989 0.1683   0.3607
6 0.0098 0.4745 0.6587 0.9975 0.3607  

 


 

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Bonferroni
Least Squares Means for effect trt
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: y
i/j 1 2 3 4 5 6
1   0.7843 <.0001 0.0106 <.0001 0.0028
2 0.7843   0.0155 1.0000 0.0031 0.6181
3 <.0001 0.0155   0.4475 1.0000 1.0000
4 0.0106 1.0000 0.4475   0.1121 1.0000
5 <.0001 0.0031 1.0000 0.1121   0.3744
6 0.0028 0.6181 1.0000 1.0000 0.3744  

 


 

The GLM Procedure
Dependent Variable: y

Parameter Estimate Standard Error t Value Pr > |t|
trt1-trt2 4.9200000 2.39794514 2.05 0.0523
trt1-trt3 14.4700000 2.39794514 6.03 <.0001
trt1-trt4 8.9000000 2.26080436 3.94 0.0007
trt1-trt5 15.5600000 2.26080436 6.88 <.0001
trt1-trt6 10.1200000 2.26080436 4.48 0.0002
trt2-trt3 9.5500000 2.52765612 3.78 0.0010
trt2-trt4 3.9800000 2.39794514 1.66 0.1112
trt2-trt5 10.6400000 2.39794514 4.44 0.0002
trt2-trt6 5.2000000 2.39794514 2.17 0.0412
trt3-trt4 -5.5700000 2.39794514 -2.32 0.0298
trt3-trt5 1.0900000 2.39794514 0.45 0.6539
trt3-trt6 -4.3500000 2.39794514 -1.81 0.0833
trt4-trt5 6.6600000 2.26080436 2.95 0.0075
trt4-trt6 1.2200000 2.26080436 0.54 0.5949
trt5-trt6 -5.4400000 2.26080436 -2.41 0.0250
trt1-5 vs 6 1.3500000 1.76574465 0.76 0.4527

Comparison of treatment means from a Factorial Design

Consider the example from the Factorial Design.

We have a simple Factorial Design, with both factors being fixed effects, Diet with 3 levels and Sex with 2 levels, so that the effects are all tested against the Residual Mean Square. The interaction effect was statistically significant (Pr < .0034). Thus the main effects lose their importance and we should be looking at the 'Simple Effects'. From this 3 X 2 Factorial we therefore have 6 Diet X Sex combinations, or 'Simple Effects'. Therefore, in terms of 'Simple Effects', if we wish to make multiple comparisons, or post hoc (a posteriori) tests, we have 6 'levels', and hence 5 degrees of freedom. These 5 degrees of freedom are equal to the 2 d.f. for Diet + 1 d.f. for Sex + 2 d.f. for the Diet X Sex interaction.

Simple effect             Estimate ± s.e.
µ + a1 + b1 + ab11 105.25 ± 1.40
µ + a1 + b2 + ab12 95.00 ± 1.40
µ + a1 + b3 + ab13 91.25 ± 1.40
µ + a2 + b1 + ab21 102.25 ± 1.40
µ + a2 + b2 + ab22 102.25 ± 1.40
µ + a2 + b3 + ab23 89.50 ± 1.40

Using the same principles as above, for a multiple comparison test using Scheffé's test, we have 6 'treatments' (combinations); therefore s = 6, and s-1 = 5. The residual degrees of freedom are 18.

F (5%; 5 numerator d.f., 18 denominator d.f.) = 2.77

sqrt( (s - 1) × F ) = sqrt( 5 × 2.77 ) = 3.7216

Consider the difference between A1B2 ( 95.00) and A2B2 (102.25).

trt A1B2 - trt A2B2 = 95.00 - 102.25 = -7.25

k` = (0  1  -1  0  0  0  0  1  0  0  -1  0)

s.e. = 1.98

SAS statement:
estimate 'A1B2 - A2B2' a 1 -1 A*B 0 1 0 0 -1 0;
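The standard error follows from the two cell means being independently estimated, each with a standard error of 1.40 (from the table of simple effects above): s.e. = sqrt(1.40² + 1.40²) = 1.98.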

Then the critical difference is 1.98*3.7216 = 7.369

Thus an a posteriori test, such as Scheffé's test, would accept (fail to reject) the null hypothesis that there is no difference, since the absolute difference (7.25) is less than the critical difference (7.369); whereas a simple t-test would reject the null hypothesis:

t-calculated = 7.25/1.98 = 3.66

and the tabulated t value for 5% and 18 d.f. is 2.101, which is less than our computed t value.

How do we do this using SAS? The approach is much the same as that described above for the CRD: we use the SAS estimate statement to have SAS compute the estimate and standard error of the particular contrast that we are interested in; the rest we do by hand! A sketch of the statements is given below.
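For completeness, a minimal sketch of such a PROC GLM run for this factorial; the data set name fact1 and the variable names a, b and y are illustrative assumptions, not taken from the original example:

proc glm data=fact1;                                 /* fact1, a, b, y are assumed names        */
  class a b;
  model y = a b a*b;
  lsmeans a*b / stderr pdiff adjust=scheffe;         /* Scheffe-adjusted comparisons of 6 cells */
  estimate 'A1B2 - A2B2' a 1 -1 a*b 0 1 0 0 -1 0;    /* the simple-effect contrast shown above  */
run;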


Multiple Comparisons in a Nested Analysis

The above approaches can be extended to Nested Designs and Analyses; effectively the only difference is that in a Nested (Subsampling) Analysis the Residual Mean Square (Error) is replaced by the appropriate Mean Square, the same one as used in the Analysis of Variance (as sketched below).
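For instance, a minimal sketch for a nested (subsampling) analysis, assuming pens nested within treatments; the data set name nest1 and the variable names trt, pen and y are illustrative assumptions. The e= options tell GLM which Mean Square to use as the error term:

proc glm data=nest1;                                      /* nest1, trt, pen, y are assumed names      */
  class trt pen;
  model y = trt pen(trt);
  test h=trt e=pen(trt);                                  /* test trt against the pen(trt) Mean Square */
  lsmeans trt / stderr pdiff adjust=scheffe e=pen(trt);   /* Scheffe's test using the same error term  */
run;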


R. I. Cue ©
Department of Animal Science, McGill University
last updated : 2010 May 1