Factorial Analyses, Fixed effects

Factorial ANOVA, Fixed Effects

Factorial experiments arise when we have 2 or more factors with multiple measurements for each combination of factors. See Steel, Torrie and Dickey, Chapter 15.

Example with animals

Consider a study of the effects of various diets on the growth of rats. We have 2 sexes (1 = Male and 2 = Female), factor A, and 3 diets (1, 2 and 3), factor B. We think that the effects of the 3 diets may not be the same in males and in females, i.e. that there is an interaction between Diet and Sex. This is a 2 x 3 factorial design, similar to the example given below for plants. For each of the 6 combinations we decide to use 4 experimental units (i.e. 4 animals, each in a separate cage). Therefore we require 24 cages (12 males and 12 females; 1 rat per cage). The 12 males must be a random sample from the available males and must be assigned to the 3 diets (1, 2 and 3) at random; similarly for the females.

Note that in the table below the first subscript refers to Sex (1 = Male, 2 = Female), the second subscript refers to Diet (1, 2 or 3), and the third indicates whether it is the first, second, third or fourth experimental unit for the given combination. Thus the 3-character subscript designates any particular observation.

Random assignment of treatment combinations
Y₁₁₂, Y=104	Y₂₂₃, Y=103	Y₁₂₃, Y=91	Y₁₃₁, Y=94	Y₁₃₃, Y=92	Y₂₂₂, Y=104
Y₁₁₁, Y=104	Y₁₃₄, Y=93	Y₂₃₃, Y=92	Y₂₁₃, Y=101	Y₂₂₄, Y=104	Y₂₃₄, Y=89
Y₂₁₄, Y=105	Y₁₁₃, Y=107	Y₁₂₄, Y=93	Y₁₂₁, Y=99	Y₂₁₂, Y=102	Y₂₂₁, Y=98
Y₂₃₂, Y=91	Y₁₃₂, Y=86	Y₂₁₁, Y=101	Y₁₂₂, Y=97	Y₁₁₄, Y=106	Y₂₃₁, Y=86
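The within-sex randomisation described above can be sketched in a few lines of Python. This is only an illustration: the animal labels (M01, F01, ...), the function name assign_diets and the seeds are hypothetical and are not part of the original study.

```python
import random

def assign_diets(animal_ids, diets=(1, 2, 3), per_diet=4, seed=None):
    """Randomly allocate the animals of one sex to diets, per_diet animals each."""
    if len(animal_ids) != len(diets) * per_diet:
        raise ValueError("need exactly len(diets) * per_diet animals")
    rng = random.Random(seed)
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    # Consecutive blocks of the shuffled list go to diets 1, 2 and 3.
    return {
        diet: sorted(shuffled[i * per_diet:(i + 1) * per_diet])
        for i, diet in enumerate(diets)
    }

# Hypothetical animal labels; randomise separately within each sex.
males = [f"M{k:02d}" for k in range(1, 13)]
females = [f"F{k:02d}" for k in range(1, 13)]
plan = {
    "male": assign_diets(males, seed=2024),
    "female": assign_diets(females, seed=2025),
}
for sex, allocation in plan.items():
    for diet, animals in allocation.items():
        print(sex, "diet", diet, animals)
```

Randomising within each sex, rather than over all 24 animals at once, guarantees that every Sex x Diet combination receives exactly 4 animals.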

Example with crops/plants

An example with crops will serve to illustrate the basic concept. Suppose that we are interested in 2 factors (Phosphorus and Nitrogen) and their effect on maize yield. We are going to use 2 'levels' of Phosphorus (1 = High = 150 kg, and 2 = Low = 50 kg), factor A, and 3 'levels' of Nitrogen (1 = High = 200 kg, 2 = Medium = 150 kg, and 3 = Low = 100 kg), factor B. This is a 2 x 3 factorial. For each of the 6 combinations we decide to use 4 experimental plots, for a total of 24 (2 x 3 x 4) plots. The plots are the experimental units and must be assigned to the treatment combinations at random.

Note that the first subscript refers to Phosphorus (1 = High, 2 = Low) and the second refers to Nitrogen (1 = High, 2 = Medium, 3 = Low); the third indicates whether it is the first, second, third or fourth experimental unit for the given combination. Thus the 3-character subscript designates any particular observation.

Random assignment of treatment combinations
Y₁₁₂, Y=104	Y₂₂₃, Y=103	Y₁₂₃, Y=91	Y₁₃₁, Y=94	Y₁₃₃, Y=92	Y₂₂₂, Y=104
Y₁₁₁, Y=104	Y₁₃₄, Y=93	Y₂₃₃, Y=92	Y₂₁₃, Y=101	Y₂₂₄, Y=104	Y₂₃₄, Y=89
Y₂₁₄, Y=105	Y₁₁₃, Y=107	Y₁₂₄, Y=93	Y₁₂₁, Y=99	Y₂₁₂, Y=102	Y₂₂₁, Y=98
Y₂₃₂, Y=91	Y₁₃₂, Y=86	Y₂₁₁, Y=101	Y₁₂₂, Y=97	Y₁₁₄, Y=106	Y₂₃₁, Y=86

Then the linear model will be

Y_ijk = µ + A_i + B_j + AB_ij + e_ijk

This is a fixed effects model; we are interested in the fixed, specific levels of the factors A and B that we chose to consider. The results cannot be extrapolated to other 'levels' of A or B, or even interpolated to intermediate amounts, e.g. 105 kg Nitrogen.

Then the Expected Mean Squares from the Analysis of Variance are :
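For reference, the standard textbook form of the expected mean squares for a balanced two-factor fixed effects model is sketched below. It assumes the usual sum-to-zero constraints on A_i, B_j and AB_ij; the symbols a, b (numbers of levels of A and B), n (observations per combination) and σ²_e (residual variance) are notation introduced here. In this example a = 2, b = 3 and n = 4.

```latex
\begin{aligned}
E(MS_A)    &= \sigma^2_e + \frac{bn}{a-1}\sum_{i=1}^{a} A_i^2 \\
E(MS_B)    &= \sigma^2_e + \frac{an}{b-1}\sum_{j=1}^{b} B_j^2 \\
E(MS_{AB}) &= \sigma^2_e + \frac{n}{(a-1)(b-1)}\sum_{i=1}^{a}\sum_{j=1}^{b} (AB)_{ij}^2 \\
E(MS_{\text{Residual}}) &= \sigma^2_e
\end{aligned}
```

Each mean square for an effect therefore estimates the residual variance plus a non-negative term that is zero only when that effect is absent, which is why every F-ratio uses the Residual Mean Square as its denominator.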

SAS code, PROC GLM

Run this model and examine the output. Verify the Normal Equations, the fitted values and the contrast statements (specifically the implicit k' matrix printed by the /e option).
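As a cross-check outside SAS, the Normal Equations X'X b = X'y can be built by hand. The sketch below is plain Python, not the PROC GLM code referred to above. It assumes the parameter order µ, a₁, a₂, b₁, b₂, b₃, ab₁₁, ..., ab₂₃ and picks one convenient solution (µ, a and b set to zero, each ab_ij set to its cell mean); other solutions, including the one PROC GLM prints, give the same fitted values.

```python
# Observations keyed by (i, j): i = level of A, j = level of B.
data = {
    (1, 1): [104, 104, 107, 106],
    (1, 2): [99, 97, 91, 93],
    (1, 3): [94, 86, 92, 93],
    (2, 1): [101, 102, 101, 105],
    (2, 2): [98, 104, 103, 104],
    (2, 3): [86, 91, 92, 89],
}
cells = sorted(data)  # (1,1), (1,2), (1,3), (2,1), (2,2), (2,3)

def design_row(i, j):
    """Indicator row for mu, a1, a2, b1, b2, b3, ab11, ..., ab23."""
    return ([1]
            + [int(i == r) for r in (1, 2)]
            + [int(j == c) for c in (1, 2, 3)]
            + [int((i, j) == cell) for cell in cells])

X, y = [], []
for (i, j), values in data.items():
    for value in values:
        X.append(design_row(i, j))
        y.append(value)

p = len(X[0])
XtX = [[sum(row[r] * row[c] for row in X) for c in range(p)] for r in range(p)]
Xty = [sum(row[r] * obs for row, obs in zip(X, y)) for r in range(p)]

# One of infinitely many solutions: set mu, a and b to zero and let
# each ab_ij equal its cell mean.
solution = [0.0] * 6 + [sum(data[c]) / len(data[c]) for c in cells]
lhs = [sum(XtX[r][c] * solution[c] for c in range(p)) for r in range(p)]

print("X'y    =", Xty)
print("X'X b  =", lhs)
fitted = {c: sum(k * b for k, b in zip(design_row(*c), solution)) for c in cells}
print("fitted =", fitted)
```

The design row of a cell is exactly the k' matrix of its fitted value, so the last lines reproduce the fitted values as k'b.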

Fitted values, Contrasts and Sums of Squares

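Take the parameters in the order µ, a₁, a₂, b₁, b₂, b₃, ab₁₁, ab₁₂, ab₁₃, ab₂₁, ab₂₂, ab₂₃, so that each k' matrix below has 12 columns.

The fitted value for A₁, B₁ is:

µ + a₁ + b₁ + ab₁₁

With k' matrix

k' = [ 1 1 0 1 0 0 1 0 0 0 0 0 ]

Thus the fitted values for each level of factor A, at each level of factor B, and their k' matrices are:

Fitted value = µ + a₁ + b₁ + ab₁₁

k' = [ 1 1 0 1 0 0 1 0 0 0 0 0 ]

Fitted value = µ + a₁ + b₂ + ab₁₂

k' = [ 1 1 0 0 1 0 0 1 0 0 0 0 ]

Fitted value = µ + a₁ + b₃ + ab₁₃

k' = [ 1 1 0 0 0 1 0 0 1 0 0 0 ]

Fitted value = µ + a₂ + b₁ + ab₂₁

k' = [ 1 0 1 1 0 0 0 0 0 1 0 0 ]

Fitted value = µ + a₂ + b₂ + ab₂₂

k' = [ 1 0 1 0 1 0 0 0 0 0 1 0 ]

Fitted value = µ + a₂ + b₃ + ab₂₃

k' = [ 1 0 1 0 0 1 0 0 0 0 0 1 ]

Thus the contrast between factor A₁ and A₂, averaging the fitted values over the 3 levels of factor B, is

a₁ - a₂ + (ab₁₁ + ab₁₂ + ab₁₃)/3 - (ab₂₁ + ab₂₂ + ab₂₃)/3

With k' matrix =

k' = [ 0 1 -1 0 0 0 1/3 1/3 1/3 -1/3 -1/3 -1/3 ]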

Verify these algebraic computations and the numerical results. Compare them to the SAS output and check that with the k' matrix you obtain the Type III, Marginal, Sums of Squares for factor A (Phosphorus or Sex).
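A minimal numerical check of the Sums of Squares for factor A is sketched below in plain Python. It is an illustration, not the SAS computation itself. It assumes the contrast that averages the fitted values over the 3 levels of B within each level of A, and it works directly on the cell means, using the fact that for an estimable function with coefficients c_ij on the cell means, k'(X'X)⁻k equals the sum of c_ij² / n_ij.

```python
# Cell means and cell sizes in the order (1,1), (1,2), (1,3), (2,1), (2,2), (2,3).
cell_means = [105.25, 95.00, 91.25, 102.25, 102.25, 89.50]
cell_n = [4, 4, 4, 4, 4, 4]

# Coefficients of the A1 - A2 contrast expressed on the cell means:
# +1/3 on each A1 cell and -1/3 on each A2 cell.
coef = [1 / 3, 1 / 3, 1 / 3, -1 / 3, -1 / 3, -1 / 3]

estimate = sum(c * m for c, m in zip(coef, cell_means))
# k'(X'X)^- k reduces to sum(c^2 / n) over the cells.
variance_factor = sum(c * c / n for c, n in zip(coef, cell_n))
ss_a = estimate ** 2 / variance_factor

print(f"A1 - A2 estimate = {estimate:.4f}")
print(f"SS for A         = {ss_a:.3f}")
```

Changing the entries of cell_n shows how the same contrast accommodates unequal subclass frequencies: the coefficients stay the same and only the variance factor changes.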

Note particularly that the Sums of Squares for factor A is thus adjusted for factor B, and for the (un)equal frequency of observations in each AB subclass, but does include interaction effects, i.e. SS_A = R( A | µ, B)

Repeat this same exercise for factor B (Nitrogen or Diet) fitted values and differences to ensure that again the k' matrix generated gives the Type III, Marginal, Sums of Squares for the effect of factor B (Nitrogen or Diets).

Again, note that the Sums of Squares for factor B = R( B | µ, A).

Note that although we are computing Sums of Squares, as well as linear functions of fitted values, for factor A adjusted for factor B, and vice versa, we are not able to adjust out the interaction components completely! We can account for unequal frequencies, but we cannot completely remove the interaction effects, as shown by their presence in the k' matrices.

For the Sums of Squares for the Interaction we have to remember that the Interaction measures whether the differences between factor A levels are the same at each level of factor B; i.e. the differences of the differences.

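(µ + a₁ + b₁ + ab₁₁) - (µ + a₂ + b₁ + ab₂₁) = a₁ - a₂ + ab₁₁ - ab₂₁ measures the difference between levels 1 and 2 of factor A at level 1 of factor B.

Similarly, a₁ - a₂ + ab₁₂ - ab₂₂ measures the difference between levels 1 and 2 of factor A at level 2 of factor B.

Thus the difference between these 2 differences measures the difference in the differences between levels 1 and 2 of factor A at level 1 and level 2 of factor B:

ab₁₁ - ab₂₁ - ab₁₂ + ab₂₂

A suitable k' matrix (with the parameters in the order µ, a₁, a₂, b₁, b₂, b₃, ab₁₁, ab₁₂, ab₁₃, ab₂₁, ab₂₂, ab₂₃) would be:

k' = [ 0 0 0 0 0 0 1 -1 0 -1 1 0 ]

Similarly, a₁ - a₂ + ab₁₃ - ab₂₃ measures the difference between levels 1 and 2 of factor A at level 3 of factor B.

Thus the difference between these 2 differences measures the difference in the differences between levels 1 and 2 of factor A at level 1 and level 3 of factor B:

ab₁₁ - ab₂₁ - ab₁₃ + ab₂₃

A suitable k' matrix would be:

k' = [ 0 0 0 0 0 0 1 0 -1 -1 0 1 ]

Therefore we have 2 contrasts,

ab₁₁ - ab₁₂ - ab₂₁ + ab₂₂ and ab₁₁ - ab₁₃ - ab₂₁ + ab₂₃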

This measures the Interaction component. Note that the 2 contrasts involve only interaction components, thus the interaction component is free of main effects, i.e. it is R(AB | µ, A, B).

Construct a suitable k' matrix (of 2 rows) involving these 2 contrasts that will enable us to compute the Sums of Squares for the interaction effect.

Use this contrast in PROC GLM to compute the Sums of Squares and verify that they equal those in the following ANOVA table.
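The same check can be sketched by hand in plain Python before turning to PROC GLM. The block below assumes the two 'difference of differences' contrasts, written as coefficients on the 6 cell means, and evaluates the 2 d.f. quadratic form d'V⁻¹d, where d holds the two contrast estimates and V = K'(X'X)⁻K. The variable names are illustrative only.

```python
# Cell means and cell sizes in the order (1,1), (1,2), (1,3), (2,1), (2,2), (2,3).
cell_means = [105.25, 95.00, 91.25, 102.25, 102.25, 89.50]
cell_n = [4, 4, 4, 4, 4, 4]

# Each row acts on the cell means in the order given above.
contrasts = [
    [1, -1, 0, -1, 1, 0],   # (A1 - A2 at B1) - (A1 - A2 at B2)
    [1, 0, -1, -1, 0, 1],   # (A1 - A2 at B1) - (A1 - A2 at B3)
]

d1, d2 = (sum(c * m for c, m in zip(row, cell_means)) for row in contrasts)

def cross(row_a, row_b):
    """Element of K'(X'X)^- K for two contrast rows on the cell means."""
    return sum(a * b / n for a, b, n in zip(row_a, row_b, cell_n))

v11 = cross(contrasts[0], contrasts[0])
v12 = cross(contrasts[0], contrasts[1])
v22 = cross(contrasts[1], contrasts[1])

# Quadratic form d' V^{-1} d, with the 2 x 2 inverse written out explicitly.
det = v11 * v22 - v12 * v12
ss_ab = (v22 * d1 * d1 - 2 * v12 * d1 * d2 + v11 * d2 * d2) / det

print(f"contrast estimates  = {d1:.2f}, {d2:.2f}")
print(f"SS for A*B (2 d.f.) = {ss_ab:.3f}")
```

Any other pair of linearly independent interaction contrasts (for example B2 versus B3 in place of B1 versus B3) gives the same Sums of Squares, because the two rows span the same space.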

ANOVA for factorial design
Source of variation	d.f.	Sums of Squares	Mean Squares	F-ratio	Pr > F
SSR_m	5	857.833	171.567	21.75	.0001
A	1	4.167	4.167	0.53	.47
B	2	728.583	364.292	46.18	.0001
A * B	2	125.083	62.542	7.93	.0034
Residual	18	142.000	7.889
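Because this data set is balanced (4 observations in every subclass), the whole table can also be rebuilt from totals. The sketch below is plain Python and relies on that balance: the shortcut formulas do not give the Type III Sums of Squares when the subclass numbers are unequal. The names cf, rows and f_ratio are illustrative.

```python
# Observations keyed by (i, j): i = level of A, j = level of B.
data = {
    (1, 1): [104, 104, 107, 106],
    (1, 2): [99, 97, 91, 93],
    (1, 3): [94, 86, 92, 93],
    (2, 1): [101, 102, 101, 105],
    (2, 2): [98, 104, 103, 104],
    (2, 3): [86, 91, 92, 89],
}
a_levels, b_levels, n = (1, 2), (1, 2, 3), 4
N = len(a_levels) * len(b_levels) * n

grand_total = sum(sum(v) for v in data.values())
cf = grand_total ** 2 / N  # correction for the mean

# Balanced-data shortcuts: squared totals divided by observations per total.
ss_model = sum(sum(v) ** 2 for v in data.values()) / n - cf
ss_a = sum(sum(sum(data[i, j]) for j in b_levels) ** 2
           for i in a_levels) / (len(b_levels) * n) - cf
ss_b = sum(sum(sum(data[i, j]) for i in a_levels) ** 2
           for j in b_levels) / (len(a_levels) * n) - cf
ss_ab = ss_model - ss_a - ss_b
ss_total = sum(obs * obs for v in data.values() for obs in v) - cf
ss_res = ss_total - ss_model

df_res = N - len(data)
mse = ss_res / df_res
rows = [("Model", len(data) - 1, ss_model), ("A", 1, ss_a),
        ("B", 2, ss_b), ("A * B", 2, ss_ab)]
f_ratio = {}
for source, df, ss in rows:
    f_ratio[source] = (ss / df) / mse
    print(f"{source:9s}{df:3d}{ss:10.3f}{ss / df:10.3f}{f_ratio[source]:8.2f}")
print(f"{'Residual':9s}{df_res:3d}{ss_res:10.3f}{mse:10.3f}")
```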

If the interaction term is statistically significant what does this mean?

It means that the interaction term AB is presumed to be real and to exist; the model cannot be explained simply in terms of µ, A and B (see Ch 15.1 and Ch 15.2 of Steel, Torrie and Dickey). Thus our fitted values are: µ + a_i + b_j + ab_ij, for i = 1, 2 and j = 1, 2, 3.

Looking at these estimable functions we can see that there is no way to obtain only differences between levels of a main effect completely free of other effects; i.e. (A₁ - A₂) is not estimable.

Why? An exercise. Write down algebraically each and every fitted value (estimated value).

We must consider what are sometimes called 'Simple Effects'.

Simple Effects

Simple effects are 'simply' (sic) the fitted values, i.e. µ + a_i + b_j + ab_ij for each combination of i and j.

There are 6 fitted values in this example for the 'simple effects'.

Simple effects	Estimate ± s.e.
µ + a₁ + b₁ + ab₁₁	105.25 ± 1.40
µ + a₁ + b₂ + ab₁₂	95.00 ± 1.40
µ + a₁ + b₃ + ab₁₃	91.25 ± 1.40
µ + a₂ + b₁ + ab₂₁	102.25 ± 1.40
µ + a₂ + b₂ + ab₂₂	102.25 ± 1.40
µ + a₂ + b₃ + ab₂₃	89.50 ± 1.40

An exercise. Plot the fitted values (on the Y-axis) against diet (as 1, 2 and 3 on the X-axis) for both males and females. Estimate (± s.e.) the differences between males and females on each of the 3 diets.
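One way to check your answers to the second part of this exercise is sketched below in plain Python. It assumes that factor A is Sex and factor B is Diet, uses the fitted values from the table of simple effects and the Residual Mean Square from the ANOVA table (142.00 / 18), and treats the two cell means in each difference as independent, each based on 4 observations.

```python
import math

# Fitted values (cell means) keyed by (sex, diet).
cell_means = {
    (1, 1): 105.25, (1, 2): 95.00, (1, 3): 91.25,
    (2, 1): 102.25, (2, 2): 102.25, (2, 3): 89.50,
}
n_per_cell = 4
mse = 142.0 / 18  # Residual Mean Square, 18 d.f.

# Standard error of the difference between two independent cell means.
se_diff = math.sqrt(mse * (1 / n_per_cell + 1 / n_per_cell))

differences = {diet: cell_means[1, diet] - cell_means[2, diet] for diet in (1, 2, 3)}
for diet, diff in differences.items():
    print(f"Diet {diet}: male - female = {diff:6.2f} +/- {se_diff:.2f}")
```

The three differences share the same standard error only because every subclass has the same number of observations; with unequal numbers each difference needs its own 1/n terms.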

What if we think that the variability differs between groups, i.e. that the variances in the various subclasses (Factor A * Factor B subgroups) are not homogeneous? Check back in the section Normality and Homogeneity of Variance. Does this give you any ideas about how to tackle and answer this question? Jump to that section for more details on the actual SAS code for this problem, the answer and a discussion.

Here is another example, with unequal numbers of observations. Repeat the above exercises and construct the Analysis of Variance table and compute the fitted values.

SAS code, PROC GLM, example with unequal numbers of observations.