Factorial models with Regressions or Classification


We have examined the situation where we have a One-Way Analysis of Variance and the single factor in our model can be considered to be either qualitative (classification model) or quantitative (regression model).

It is also quite common to have a similar situation in multi-way and factorial models. For multi-way models where there is no interaction between the factor which may or may not be considered quantitative and the other factor(s), we can proceed, with respect to the qualitative/quantitative factor, exactly as before. For a factorial-type model we have a more interesting case.

Consider a factorial model where we have 2 factors: soil type (Loam, Loam Peat, and Structured Soil) and amount of salt applied (0, 10, 15, 20 and 25 units), with 4 experimental units per combination. This is a 3*5 Factorial. We could use the following SAS code to read in the data (salt is recorded as a level code, 1 to 5, and converted to the amount applied) and fit a Factorial model to test for the interaction.


data factor2;
input soil $ salt y;
/* convert the salt level code (1 to 5) to the amount of salt applied */
if (salt eq 1) then amount=0;
else if (salt eq 2) then amount=10;
else if (salt eq 3) then amount=15;
else if (salt eq 4) then amount=20;
else if (salt eq 5) then amount=25;
cards;
L  1 23.5
LP 1 27.0
SS 1 26.8
.
.
;

proc glm data=factor2;
class soil salt;
model y = soil salt soil*salt;
run;

We can analyse these data as a classification model for both factors (soil and salt) with the above factorial model. Could salt levels be considered as a quantitative factor? To address this it is helpful to remember that we can fit an equivalent model for the qualitative/quantitative factor, namely one with as many polynomial regression terms as there are degrees of freedom for the classification effect. Salt has 5 levels and hence 4 degrees of freedom, so the equivalent regression has Linear, Quadratic, Cubic and Quartic terms.

R(salt | ....) = R(Lin,Quad,Cub,Quart | ....)
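A quick way to convince ourselves of this equivalence is sketched below, using main effects only (no interaction) to keep it short. This is an illustration added here, not part of the original analysis; it assumes the factor2 data set created above. The two fits should give the same error sum of squares, and the sequential (Type I) sums of squares for the four polynomial terms should add up to the sum of squares for salt.

```sas
/* salt as a classification factor: 4 df for salt */
proc glm data=factor2;
class soil salt;
model y = soil salt;
run;

/* salt as a quartic polynomial in amount: 4 single-df regressions */
proc glm data=factor2;
class soil;
model y = soil amount amount*amount amount*amount*amount
          amount*amount*amount*amount;
run;
```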

We might have the situation that the effects of salt levels (amount) are specific to the types of soil; this implies that a statistically significant soil*salt interaction exists, over and above the main effects of soil and salt. We might also have the situation that Linear and Quadratic regressions specific to each soil type are adequate, and that the additional fit from a qualitative consideration of the soil*salt interaction is not statistically significant.

R(soil, salt, soil*salt) = R(soil, Lin,Quad,Cub,Quart, soil*[Lin,Quad,Cub,Quart])

R(soil, Lin,Quad,Cub,Quart, soil*[Lin,Quad,Cub,Quart])

= R(soil, Lin,Quad,Cub,Quart, soil*Lin,soil*Quad,soil*Cub,soil*Quart)


proc glm data=factor2;
class soil salt;
model y = soil salt soil*amount soil*amount*amount soil*salt;
run;

Since we do not really have an interpretation for Cubic and Quartic regressions, they serve only as an equivalent way of checking whether salt needs to be included as a classification factor or whether Linear and Quadratic regressions might suffice. In the above GLM code we note that the soil*salt interaction, because the model is over-parameterized, represents the soil*Cubic and soil*Quartic effects, i.e. the Marginal, Type III effect of the part of the soil*salt interaction that is not accounted for by the Linear and Quadratic regressions.

If R(soil*Cub, soil*Quart | ....) is statistically significant then a qualitative (classification) interpretation is most appropriate; we go back to our 3*5 Factorial.

If R(soil*Cub, soil*Quart | ....) is not statistically significant then a qualitative interpretation is not needed; a more appropriate and parsimonious quantitative interpretation can be contemplated.
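As a sketch of the test involved, we assume the usual F-ratio against the residual mean square of the full factorial model, with all 3*5*4 = 60 observations present. The symbols F and SSE are our notation, not from the SAS output. The reduction has 4 degrees of freedom (2 each for soil*Cub and soil*Quart, i.e. the 8 interaction degrees of freedom less the 4 used by soil*Lin and soil*Quad), and the residual has 60 - 15 = 45:

```latex
F \;=\; \frac{R(\text{soil*Cub},\ \text{soil*Quart} \mid \ldots)\,/\,4}{SSE\,/\,45}
\qquad \text{on } (4,\ 45) \text{ degrees of freedom}
```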


proc glm data=factor2;
class soil salt;
model y = soil salt soil*amount soil*amount*amount;
run;

This fits linear and quadratic interactions with soil type, and also fits salt as a main effect as a classification factor. Within this model we can consider sub-dividing the main effect of amount of salt.


proc glm data=factor2;
class soil salt;
model y = soil amount amount*amount salt soil*amount soil*amount*amount;
run;

This will allow us to test whether the Reduction due to the Cubic and Quartic main effects of salt amount is a statistically significant improvement in the goodness of fit of the model over and above the Linear and Quadratic regressions. If the Cubic+Quartic effect is significant then we need to remain with our model with the main effect of salt as a classification effect. If the Cubic+Quartic effect is not statistically significant then we can use a simpler model with Linear and Quadratic regressions, i.e.
R(Soil, Linear, Quadratic, Soil*Linear, Soil*Quadratic)


proc glm data=factor2;
class soil salt;
model y = soil amount amount*amount soil*amount soil*amount*amount;
run;

If R(Soil*Quad | ....) is significant, then it tells us that a quadratic regression effect of salt specific to each soil type is needed over and above a common quadratic effect of salt. If this is the case then we can re-write our model as:

R(Soil, Soil*Lin, Soil*Quad)


proc glm data=factor2;
class soil salt;
model y = soil soil*amount soil*amount*amount;
run;

If R(Soil*Quad | ....) is not significant, then we can drop it and re-run with a simpler model.


proc glm data=factor2;
class soil salt;
model y = soil amount amount*amount soil*amount;
run;


R.I. Cue,
Department of Animal Science, McGill University ©
last update : 2010 May 23