Analysis of Variance
In the experimental designs described in this paper, line-fishing treatments are applied to reefs that are grouped into clusters. Within each cluster there are n (n = 1 or 2) replicates of each line-fishing treatment. Each reef is then measured either annually or every six months, although we used only annual sampling for our simulations. The resulting experimental design can be described as a three-factor experiment (Fishing Treatment * Cluster * Time) with repeated measures on the last factor. Groups of reefs are observed under all levels of the Time factor (years), but each group is assigned to only one combination of the other two factors. Winer et al. (1992) and Milliken and Johnson (1984) provide detailed discussions of repeated measures designs and their analyses.
The model used to describe the observed abundance of coral trout $y_{ijky}$ on reef $i$ in cluster $j$, subject to treatment $k$, in year $y$ was as follows:
$$y_{ijky} = m + C_j + F_k + (CF)_{jk} + h_{ijk} + T_y + (CT)_{jy} + (FT)_{ky} + (CFT)_{jky} + e_{ijky}$$
where $m$ denotes the population mean for all reefs in all years. The error terms are $h_{ijk}$, which denotes the reef within cluster error, and $e_{ijky}$, which denotes the time interval within reef error. We treated all factors as fixed effects since (1) fishing treatments are clearly limited in number and imposed by us, (2) years are sequential (not randomly sampled), and (3) clusters are limited in number by the availability of neighboring reefs with a history of closure to fishing and are in specific regions rather than chosen at random.
In addition to the usual assumptions of ANOVA, the validity of the F tests for terms involving the repeated measurements (i.e., the terms involving time) rests on two conditions: (1) the variance/covariance matrices of the within-reef residuals over all times are homogeneous, and (2) the pooled error variance/covariance matrix over all treatments is spherical. Effectively, these conditions imply that the correlations between residuals from repeated measurements of the same experimental units are homogeneous among experimental units and across all paired combinations of times (Huynh and Feldt 1970, Winer et al. 1992). Strictly, if any of these assumptions is incorrect for a data set, then the ratios of mean squares will not follow the expected F distribution, and the results of the statistical analyses are likely to be misleading. Typically, this would mean that the actual probabilities of Type I error would be greater than indicated by the analyses. ANOVA models are more robust to violation of some assumptions than to violation of others, with homogeneity of variances and covariances (and independence of errors) being more critical than normality of errors.
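One informal way to inspect the sphericity condition is to note that, under sphericity, the variance of the difference between residuals at any two sampling times is the same for every pair of times. The following is a minimal sketch of that check, not a formal test and not part of the REEF program; the function name, the data layout, and the residual values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import variance

def pairwise_difference_variances(residuals):
    """Sample variance of the difference between each pair of sampling times.

    `residuals` is a list of rows, one per reef, each row holding that
    reef's within-reef residuals in time order (hypothetical layout).
    Under sphericity these variances are all equal in the population.
    """
    n_times = len(residuals[0])
    result = {}
    for a, b in combinations(range(n_times), 2):
        diffs = [row[a] - row[b] for row in residuals]
        result[(a, b)] = variance(diffs)
    return result

# Made-up residuals for four reefs over three years (illustration only).
example = [
    [1.0, 0.5, -0.2],
    [-0.8, -0.1, 0.4],
    [0.3, -0.6, 0.1],
    [-0.5, 0.2, -0.3],
]
diff_vars = pairwise_difference_variances(example)
```

Large disparities among the values in `diff_vars` would suggest that sphericity is doubtful, although with few reefs the sample variances are themselves highly variable.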
ANOVA after modelling temporal error structure
We expect that, for the sequential data for each reef, the residuals in time are likely to have an autocorrelated error structure rather than an error structure satisfying the Huynh-Feldt condition above. This might arise from a combination of biological mechanisms that cause random disturbances to have persistent effects on abundances (e.g., survival over many years of unusually strong year-classes) and the non-randomization of the time treatment. In such cases the "between reef" analysis (C, F, C*F) can be carried out as usual, using the (observed) means over years for each reef (Milliken and Johnson 1984, Winer et al. 1992). The between reef analysis is always exact, irrespective of the presence or absence of autocorrelated errors in time, provided the usual assumptions apply to the behavior of the reef-means data.
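The between reef analysis on reef means can be sketched as a balanced two-way fixed-effects ANOVA (Cluster x Fishing). The code below is an illustration under stated assumptions rather than the procedure used in REEF: the function name, the dictionary layout, and the abundance values are hypothetical, and the design is assumed balanced with n = 2 reefs per cell (with n = 1 no degrees of freedom remain for the reef within cluster error).

```python
from statistics import mean

def between_reef_anova(data):
    """Balanced two-way ANOVA (Cluster x Fishing) on reef means over years.

    `data[(cluster, treatment)]` is a list of reefs; each reef is a list
    of yearly abundances (hypothetical layout). Returns the F ratios,
    their numerator degrees of freedom, and the error degrees of freedom.
    """
    # Collapse the time dimension: one mean per reef.
    cell = {key: [mean(reef) for reef in reefs] for key, reefs in data.items()}
    clusters = sorted({c for c, _ in cell})
    treatments = sorted({f for _, f in cell})
    n = len(next(iter(cell.values())))  # reefs per cell (balanced design)
    nc, nf = len(clusters), len(treatments)

    grand = mean(v for vals in cell.values() for v in vals)
    c_mean = {c: mean(v for f in treatments for v in cell[(c, f)]) for c in clusters}
    f_mean = {f: mean(v for c in clusters for v in cell[(c, f)]) for f in treatments}
    cf_mean = {key: mean(vals) for key, vals in cell.items()}

    ss = {
        "C": nf * n * sum((c_mean[c] - grand) ** 2 for c in clusters),
        "F": nc * n * sum((f_mean[f] - grand) ** 2 for f in treatments),
        "C*F": n * sum((cf_mean[(c, f)] - c_mean[c] - f_mean[f] + grand) ** 2
                       for c in clusters for f in treatments),
    }
    # Reef-within-cell error sum of squares.
    ss_err = sum((v - cf_mean[key]) ** 2 for key, vals in cell.items() for v in vals)
    df = {"C": nc - 1, "F": nf - 1, "C*F": (nc - 1) * (nf - 1)}
    df_err = nc * nf * (n - 1)
    ms_err = ss_err / df_err
    f_ratios = {term: (ss[term] / df[term]) / ms_err for term in ss}
    return f_ratios, df, df_err

# Made-up abundances: 2 clusters x 2 treatments x 2 reefs x 3 years.
example = {
    ("c1", "open"): [[10, 11, 12], [12, 13, 14]],
    ("c1", "closed"): [[20, 21, 22], [22, 23, 24]],
    ("c2", "open"): [[14, 15, 16], [16, 17, 18]],
    ("c2", "closed"): [[24, 25, 26], [26, 27, 28]],
}
f_ratios, df, df_err = between_reef_anova(example)
```

Because every reef contributes a single mean, the temporal correlation of its residuals does not enter these F ratios, which is why this part of the analysis is unaffected by autocorrelation.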
There is no exact within-reef analysis (T, T*C, T*F, T*C*F), however, when the temporally separate residuals are not uniformly correlated. Valid approximate tests can be derived for such data by transforming the observations to remove their correlations through time, and analyzing the transformed data. While several different types of time series models might be appropriate for describing the within-subject (temporal) error structure and transforming the observations (SAS 1992), we chose only a first-order autoregressive [AR(1)] model to remove potential autocorrelation from our simulation data, although we recommend exploration of alternative models for future (real) data sets. The AR(1) model assumes that the error terms ($e_{ijky}$) can be expressed as:
$$e_{ijky} = r\,e_{ijk(y-1)} + w_{ijky}$$
where the $w_{ijky}$ are independent random normal variates with mean 0 and variance $\sigma_w^2$ for all $i$, $j$, $k$, and $y$; $r$ is the correlation between successive residuals for each experimental unit; and the $e_{ijk0}$ are independent random normal variates with mean 0 and variance $\sigma_w^2/(1-r^2)$ for all $i$, $j$, and $k$ (Milliken and Johnson 1984). Given this model, Albohali (1983) proposed an approximate transformation to remove the correlation between successive observations. The method uses the maximum likelihood estimate of $r$ ($\hat{r}$) and estimates of the $w_{ijky}$ ($\hat{w}_{ijky}$) to "filter" the data as follows:
where
.
The within-reefs analysis described above is then done on the filtered data. (Note: in the description of Albohali's calculation of $\hat{r}$ given on page 367 of Milliken and Johnson (1984), a minus sign is missing from the first term in the definition of $A_1$; see Hasza 1980.)
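The general idea of filtering AR(1) errors can be illustrated with a short simulation. The sketch below is not Albohali's transformation and does not use the maximum likelihood estimate of r: it assumes a pooled lag-1 moment estimator as a stand-in and applies the generic Prais-Winsten form of AR(1) prewhitening directly to simulated error series. All function names and parameter values are hypothetical.

```python
import math
import random

def simulate_ar1(r, sigma_w, n_years, rng):
    """One reef's AR(1) errors: e_y = r * e_{y-1} + w_y, stationary start."""
    # e_0 has variance sigma_w^2 / (1 - r^2), as in the AR(1) model.
    e = rng.gauss(0.0, sigma_w / math.sqrt(1.0 - r * r))
    series = []
    for _ in range(n_years):
        e = r * e + rng.gauss(0.0, sigma_w)
        series.append(e)
    return series

def estimate_r(rows):
    """Pooled lag-1 moment estimator of r (a stand-in for the ML estimate)."""
    num = sum(row[t] * row[t - 1] for row in rows for t in range(1, len(row)))
    den = sum(row[t - 1] ** 2 for row in rows for t in range(1, len(row)))
    return num / den

def prewhiten(row, r_hat):
    """Generic AR(1) filter: rescale the first value, difference the rest."""
    first = math.sqrt(1.0 - r_hat * r_hat) * row[0]
    return [first] + [row[t] - r_hat * row[t - 1] for t in range(1, len(row))]

# Illustration with made-up settings: 200 reefs, 20 years, r = 0.6.
rng = random.Random(1)
rows = [simulate_ar1(0.6, 1.0, 20, rng) for _ in range(200)]
r_hat = estimate_r(rows)
filtered = [prewhiten(row, r_hat) for row in rows]
r_after = estimate_r(filtered)  # lag-1 correlation remaining after filtering
```

With these settings `r_hat` should lie close to the simulated value of 0.6 and `r_after` close to zero, which is the property the within-reef analysis of filtered data relies on. In an application the filter would be driven by residuals from the fitted model rather than by known error series.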
Calculation of contrasts
To compare the different line-fishing treatments at the same or different years, a between-subject comparison must be combined with a within-subject comparison. An approximate least significant difference (LSD) value for comparing treatments at the same time or different times is:
where $n$ reefs subject to $f$ fishing treatments in $c$ clusters are sampled in each of $y$ years. The value of $t^*$ is an approximate $t$ value given by the following expression (Milliken and Johnson 1984, section 26.2):
and is expected to be approximately distributed as Student's $t$ with $cf(n-1)$ degrees of freedom.
Parameter estimates from a Simplified Linear Model
The REEF program includes a procedure for fitting simplified General Linear Models (GLM) to the simulated data and estimating the magnitudes of contrasts between treatment effects of interest in each "year". As Walters and Sainsbury noted, "the GLM model involves statistical assumptions that are more difficult to justify than simple ANOVA/MANOVA models", and we review those assumptions here. Detailed treatments of general linear modelling can be found in Searle (1971) and Graybill (1976).
Let $y_{ijky}$ be the observed abundance on the $i$-th reef in cluster $j$ subjected to the $k$-th treatment regime in year $y$, and assume that this response can be written as an additive sum of cluster, fishing, and local (reef) effects:
$$y_{ijky} = C_j + (CT)_{jy} + (FT)_{ky} + e_{ijky}$$
where $C_j$ is the time-averaged abundance in the absence of line fishing for reefs in cluster $j$; $(CT)_{jy}$ is a time-dependent departure from this mean that is shared by all reefs in cluster $j$; $(FT)_{ky}$ is a time-dependent departure from $C_j$ due to the $k$-th fishing treatment regime as expressed in year $y$; and $e_{ijky}$ is a reef-specific variation due to location and time effects not explained by fishing or shared with other reefs in the same cluster. As with the ANOVA, the "residuals" $e_{ijky}$ cannot be assumed to be independent and identically distributed random effects, nor can they be forced to be so through any randomization process used in the selection of experimental reefs. Again, the $e_{ijky}$ are likely to be autocorrelated, so, as before, we "filtered" the data in an attempt to remove the autocorrelated error. The linear model was then fitted again to the "filtered" data.
From this model of the experimental layout, the REEF program then estimates the magnitude of those contrasts stipulated by the user. In each case the estimated difference between treatment and control was compared with zero (the null hypothesis, $H_0$), using a t test based on the standard error of the contrast and having $ncft - p - 1$ degrees of freedom, where $n$ = the number of reefs per treatment per cluster, $c$ = number of clusters, $f$ = number of fishing treatments, $t$ = number of years sampled, and $p$ = number of parameters estimated.
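The arithmetic of this test can be sketched as follows. This is an illustration only, assuming the contrast estimate and its standard error have already been obtained from the fitted model; the function name and all numerical values (including the value of p) are made up and do not come from REEF or from the simulations.

```python
def contrast_t(estimate, std_error, n, c, f, t, p):
    """t statistic and degrees of freedom for H0: contrast = 0.

    Degrees of freedom follow the expression in the text: n*c*f*t - p - 1.
    """
    df = n * c * f * t - p - 1
    if df <= 0:
        raise ValueError("no residual degrees of freedom for the t test")
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / std_error, df

# Made-up example: 2 reefs per treatment per cluster, 4 clusters,
# 3 fishing treatments, 6 years, and 40 estimated parameters.
t_stat, df = contrast_t(estimate=-12.5, std_error=5.0, n=2, c=4, f=3, t=6, p=40)
```

The resulting statistic would then be referred to a Student's t distribution with `df` degrees of freedom.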
This approach to the experimental design encapsulates several assumptions that are examined rather than assumed in the ANOVA approach. Specifically, the simplified model above assumes:
(1) the fishing treatments will affect all reefs in all clusters similarly at any nominated time after depletion; i.e., there will not be a Cluster * Treatment * Time interaction.
(2) interactions between fishing treatments and clusters (when averaged over all years) are insignificant.
These assumptions were generally supported by the results of the ANOVAs of the simulated data.
LITERATURE CITED
Albohali, M. N. 1983. A time series approach to the analysis of repeated measures designs. Dissertation. Kansas State University, Manhattan, Kansas, USA.
Graybill, F. A. 1976. Theory and application of the linear model. Duxbury Press, North Scituate, Massachusetts, USA.
Hasza, D. P. 1980. A note on maximum likelihood estimation for the first-order autoregressive process. Communications in Statistics - Theory and Methods A9:1411-1415.
Huynh, H., and L. S. Feldt. 1970. Conditions under which mean square ratios in repeated measures designs have exact F-distributions. Journal of the American Statistical Association 65:1582-1589.
Milliken, G. A., and D. E. Johnson. 1984. Analysis of messy data. Van Nostrand Reinhold, New York, New York, USA.
SAS. 1992. SAS/STAT Software: Changes and enhancements. SAS Technical Report P-229. SAS Institute, Cary, North Carolina, USA.
Searle, S. R. 1971. Linear Models. Wiley, New York, New York, USA.
Winer, B. J., D. R. Brown, and K. M. Michels. 1992. Statistical principles in experimental design. Third edition. McGraw-Hill, New York, New York, USA.