Number of sub-samples and Sample Size

When we are planning an experiment we are often faced with deciding upon the number of sub-samples per experimental unit and the number of experimental units required. There is little point in carrying out an experiment if there is only a small chance of being able to detect any differences even if they are real and do exist. We might as well save ourselves the time and effort and instead go and relax on the beach; at least we will get a nice tan!

The two factors, number of sub-samples and sample size, are usually not independent of one another. Most textbooks (and Steel, Torrie and Dickey is no exception) deal only with the simple question of sample size when there are no sub-samples. This is just one particular case, and it often causes confusion when planning experiments that involve sub-samples (nested designs). However, with a few basic rules and principles it is relatively easy: the key step is to determine the correct variance of our measurements. We will look at some formulae and specific examples.

Number of sub-samples

Cochran (xx) (STD ...) provides a formula for deciding upon the optimum number of sub-samples per experimental unit; it depends upon the relative variabilities of the experimental units and sub-samples, and their relative costs.

$$ n = \sqrt{\frac{c_1\, s_e^2}{c_2\, s_{\mathrm{exp}}^2}} $$

where

c₁ = the cost per experimental unit

c₂ = the cost per sampling unit

s²_e = the variance amongst sampling units within an experimental unit

s²_exp = the variance amongst experimental units
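The formula is simple enough to compute by hand, but a small function makes it easy to try different cost and variance scenarios while planning. This is a minimal Python sketch; the function name `optimal_subsamples` and its argument names are our own (hypothetical), not taken from Cochran or from Steel, Torrie and Dickey.

```python
import math


def optimal_subsamples(c1: float, c2: float, var_e: float, var_exp: float) -> float:
    """Cochran's optimum number of sub-samples per experimental unit.

    c1      -- cost per experimental unit (e.g. per tree)
    c2      -- cost per sampling unit (e.g. per apple measured)
    var_e   -- variance amongst sampling units within an experimental unit (s²_e)
    var_exp -- variance amongst experimental units (s²_exp)
    """
    if min(c1, c2, var_e, var_exp) <= 0:
        # A zero variance among experimental units would make n unbounded.
        raise ValueError("costs and variance components must be positive")
    return math.sqrt((c1 * var_e) / (c2 * var_exp))
```

With the apple-tree figures used in the example that follows, `optimal_subsamples(20, 0.12, 12, 32)` returns about 7.91.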

For example, using the data from section (Random effect) on apple trees, we found that the variance between trees (s²_tree, which plays the role of s²_exp here) was 32 and that the variance amongst apples within trees (s²_e) was 12. Suppose that we are going to carry out an experiment and we estimate that it will cost 20 dollars per tree and 0.12 dollars per apple measured. Then the optimum number of apples per tree (sub-samples per experimental unit; n.b. the tree is the experimental unit) is

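$$ n = \sqrt{\frac{20 \times 12}{0.12 \times 32}} = \sqrt{62.5} \approx 7.91 $$

Thus the optimum number of sub-samples per experimental unit (apples per tree) is about 8. Precision per unit cost changes very little near the optimum, so 8 and 9 are practically equivalent; let us decide to measure 9 apples per tree.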
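Why is rounding up to 9 harmless? For a fixed total budget the number of trees we can afford is the budget divided by (c₁ + c₂n), so the variance of a treatment mean is proportional to (s²_exp + s²_e/n)(c₁ + c₂n); Cochran's formula is exactly the n that minimises this product. The sketch below, with a hypothetical helper `variance_cost_product`, evaluates the product for a few values of n using the apple-tree figures.

```python
def variance_cost_product(n: float, c1: float, c2: float, var_e: float, var_exp: float) -> float:
    """Variance of one experimental-unit mean multiplied by the cost of that unit.

    For a fixed total budget, the variance of a treatment mean is proportional
    to this product, so smaller values are better.
    """
    unit_mean_variance = var_exp + var_e / n  # variance of the mean of n sub-samples
    unit_cost = c1 + c2 * n                   # cost of one unit with n sub-samples
    return unit_mean_variance * unit_cost


# Apple-tree example: 20 dollars per tree, 0.12 dollars per apple, s²_e = 12, s²_tree = 32.
for n in (4, 8, 9, 16):
    print(n, round(variance_cost_product(n, 20, 0.12, 12, 32), 2))
```

The products for n = 8 and n = 9 are about 702.2 and 702.7, whereas n = 4 and n = 16 both give roughly 717 to 718, so the choice between 8 and 9 apples per tree costs almost nothing in precision.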

Sample size

How do we calculate sample size? Most textbooks on statistics give the classical formula for determining the necessary sample size:

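$$ n \ge \left(Z_{\alpha/2} + Z_{\beta}\right)^2 \left(\frac{s}{d}\right)^2 $$

where n is the number of experimental units, Z_α/2 and Z_β are standard normal quantiles for a two-sided significance level α and power 1 - β, s is the standard deviation of the measurement on an experimental unit, and d is the smallest difference we wish to detect. (As written, this is the form for a single mean or a paired difference; to compare two independent treatment means, each based on n experimental units, the right-hand side is doubled.)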
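A minimal Python sketch of the inequality exactly as written, using the standard library's `statistics.NormalDist` for the Z quantiles. The function name `classical_sample_size`, its defaults (α = 0.05, power = 0.80) and the example values are illustrative assumptions; the result is only as good as the s that is plugged in, which is the point of the discussion that follows.

```python
import math
from statistics import NormalDist


def classical_sample_size(s: float, d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest whole n with n >= (Z_{alpha/2} + Z_beta)^2 * (s / d)^2.

    s     -- standard deviation of a measurement on one experimental unit
    d     -- smallest difference we want to be able to detect
    alpha -- two-sided significance level
    power -- desired power, 1 - beta
    """
    z_alpha_2 = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.960 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)              # about 0.842 for power = 0.80
    n = (z_alpha_2 + z_beta) ** 2 * (s / d) ** 2
    return math.ceil(n)  # round up: n must be at least this large


# Hypothetical example: s = 6, and we want to detect a difference of d = 3.
print(classical_sample_size(6.0, 3.0))
```

With s/d = 2 the bound is about 31.4, so 32 experimental units are required.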

This formula is correct, but it is only appropriate as it stands when we use the correct s. s² is often assumed to be simply the residual variance s²_e, which is only true if we are dealing with a design where there is NO sub-sampling, i.e. a 'fixed effects' only model! What if we have a design where we cannot measure the experimental unit directly, but instead take measurements on [sub-]samples? The apple-tree example above is such a case: the experimental unit to which we apply a treatment is the tree, but we measure the weight of the apples on the tree. We concluded in the section above that 9 apples per tree would be sensible.

What we have to do is determine s², the variance of our measurement of the experimental unit. Recall that the measurements are made on the apples, so effectively the measure for the tree is the average of its 9 apples. What is the variance of the mean of these 9 apples? As you will undoubtedly remember from Statistical Methods I (or its equivalent), the variance of a mean is the variance of a single measurement divided by the number of measurements. What is the variance (Mean Square) amongst trees? For an experiment with 9 apples per tree, the expected Mean Square amongst trees would be:

$$ E(\mathrm{MS}_{\mathrm{trees}}) = s_e^2 + 9\, s_{\mathrm{tree}}^2 $$

Dividing by the 9 apples that make up each tree mean, the variance (our s²) of a tree mean will be

$$ s^2 = \frac{s_e^2}{9} + s_{\mathrm{tree}}^2 $$
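The two expressions above can be checked by simulation. This sketch assumes normally distributed tree effects and apple errors, a balanced layout and a single treatment; the function name `simulate_nested`, the overall mean of 100 and the seed are illustrative choices, not part of the original example.

```python
import random
from statistics import fmean, variance


def simulate_nested(n_trees: int, n_apples: int, var_tree: float, var_e: float, seed: int = 1):
    """Simulate apple weights y = mu + tree effect + apple error.

    Returns (MS among trees, MS among apples within trees, variance of tree means).
    """
    rng = random.Random(seed)
    mu = 100.0  # arbitrary overall mean (hypothetical)
    tree_means = []
    ss_within = 0.0
    for _ in range(n_trees):
        tree_effect = rng.gauss(0.0, var_tree ** 0.5)
        apples = [mu + tree_effect + rng.gauss(0.0, var_e ** 0.5) for _ in range(n_apples)]
        tree_mean = fmean(apples)
        tree_means.append(tree_mean)
        ss_within += sum((y - tree_mean) ** 2 for y in apples)
    var_tree_means = variance(tree_means)   # sample variance of the tree means
    ms_trees = n_apples * var_tree_means    # MS among trees = n * variance of tree means
    ms_within = ss_within / (n_trees * (n_apples - 1))
    return ms_trees, ms_within, var_tree_means


ms_trees, ms_within, var_tree_means = simulate_nested(5000, 9, var_tree=32.0, var_e=12.0)
print(round(ms_trees, 1), round(ms_within, 2), round(var_tree_means, 2))
```

With many simulated trees, MS among trees should come out close to 12 + 9 × 32 = 300, the within-tree MS close to 12, and the variance of the tree means close to 12/9 + 32 ≈ 33.3.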

Note that as the number of sub-samples (apples) n increases, s²_e/n will decrease, but even with a very large number of apples per tree the component of variance due to trees, s²_tree, remains and does not decline! If we compute s² in this way, we will arrive at the correct size for our experiment. Note also that this is entirely consistent with the classical formula: in the classical case with no sub-sampling, the number of measurements contributing to each experimental unit is 1, and the variance of a mean of 1 number is simply the variance of that number, which will be the residual variance.
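Putting the pieces together: a minimal sketch that computes s² for a tree mean and feeds it into the classical formula as written. The helper names, α = 0.05, power = 0.80 and the detectable difference d = 5 are hypothetical choices for illustration only.

```python
import math
from statistics import NormalDist


def tree_mean_variance(var_e: float, var_tree: float, n_sub: int) -> float:
    """Variance (s²) of an experimental-unit mean based on n_sub sub-samples."""
    return var_e / n_sub + var_tree


def units_needed_from_variance(s2: float, d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Classical formula n >= (Z_{alpha/2} + Z_beta)^2 * s² / d², rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 * s2 / d ** 2)


d = 5.0  # hypothetical smallest difference worth detecting (units of apple weight)
for n_sub in (1, 3, 9, 100):
    s2 = tree_mean_variance(12.0, 32.0, n_sub)
    print(n_sub, round(s2, 2), units_needed_from_variance(s2, d))

# The mistake warned about above: using the residual variance s²_e alone.
print("residual only:", units_needed_from_variance(12.0, d))
```

With these illustrative numbers, 9 apples per tree calls for 11 trees, and going to 100 apples per tree still needs 11, because s²_tree = 32 sets a floor; plugging in s²_e = 12 alone would suggest only 4 trees, a badly under-sized experiment.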

Assessing Power, Precision and Sample Size

Stroup, a practicing statistician at the University of Nebraska, Lincoln, has written a very nice article about calculating the number of experimental units one needs. The web link to the article is given below:

http://statistics.unl.edu/faculty/steve/802/2001/power_sas.pdf

Another paper, published in the [American] Journal of Dairy Science in 2006, is "Estimating statistical power of mixed models used in dairy nutrition experiments" by Kononoff, R. J., and Hanford, K. J. (J. Dairy Sci. 89:3968-3971). This paper is available electronically through the McGill Library system. Although the title refers to mixed models and dairy nutrition, it is quite general and applicable to any field (dairy cattle, humans, soils, etc.) and to any class of model, whether fixed, random or mixed. Recommended.

Steel, Torrie and Dickey, Section 14.6 and Chapter 11