Ecological Archives E084-011-A1

Marti J. Anderson and Trevor J. Willis. 2003. Canonical analysis of principal coordinates: a useful method of constrained ordination for ecology. Ecology 84:511–525.

Appendix A. Statistical details of the canonical analysis of principal coordinates (CAP). A pdf version of this appendix is also available.

We provide here a detailed description of the canonical analysis of principal coordinate axes (CAP). First, let  be a matrix of i = 1,…, N real-valued observations on each of k = 1,…, p variables. In an ecological context, these are commonly counts of individuals of each of p species. Consider also that the observations are classified a priori into g groups with sample sizes  and . Let  be an  square symmetric matrix of distances or dissimilarities calculated between every pair of observation units.
 

The first step is to calculate principal coordinates from the distance matrix. These are orthogonal axes that place the points into Euclidean space as well as possible, while preserving the original distances or dissimilarities among the points (e.g., Gower 1966).

 
For principal coordinate analysis, let matrix , then center this matrix on its rows and columns to produce matrix G as follows:

(A.1)

where  is the average for row i is the average for column j and  is the average value for the entire matrix A. To get principal coordinate axes (Gower 1966), we do a spectral decomposition (eigenanalysis) of matrix G to obtain

(A.2)

where  is a diagonal matrix of ordered eigenvalues  and  is an  matrix of corresponding orthonormal eigenvectors. At least one eigenvalue will be zero, as only N – 1 axes are needed to describe the position of N points in Euclidean space. The column vectors of Q are the orthonormal vectors we require. Commonly, these axes are each normalized so that their variance is equal to their corresponding eigenvalue. Then, the first two (or more) axes may be plotted for an unconstrained ordination. Our purpose here, however, is to use principal coordinate analysis in order to obtain an orthonormal Euclidean (i.e., Mahalanobis) representation of the data cloud. Thus, no such normalization is done.

The next step is to do a traditional canonical analysis on these orthonormal axes. Here is where our hypothesis comes into the analysis. In the case of a hypothesis concerning a priori groups, let X be an  design matrix containing orthogonal variables coding for group contrasts (e.g., see Appendix C of Legendre and Anderson 1999). In the case of a hypothesis concerning relationships of species data with another set of quantitative variables (such as environmental variables), then X contains this set of variables. Then let  be the  projection matrix of rank r (note that  if X specifies groups). This is the usual idempotent “hat” matrix used for linear models (e.g., Neter et al. 1996).

To do the canonical analysis, we need to use only a subset, say the first m axes, of matrix Q. The  matrix that contains the first m columns of Q shall be denoted . The canonical analysis is accomplished by spectral decomposition (eigenanalysis) of the  matrix , thus:

(A.3)

where  is a diagonal matrix of ordered eigenvalues  which are the squared canonical correlations, and U is an  matrix of corresponding orthonormal eigenvectors. The number of canonical axes will be . The canonical variable scores for ordination are then obtained as . For plotting, the canonical variable scores are standardized by the square root of their corresponding eigenvalue (i.e., ).

The following test statistics for differences among groups, based on the CAP approach, have been suggested (Anderson and Robinson, in press). First, we have the trace statistic

,
(A.4)

which is equal to the sum of squared canonical correlations. Second, we have the maximum root statistic

,
(A.5)

which is the first squared canonical correlation. Each of these can be tested by permutation of the observation vectors of Y (or, equivalently, by permutation of the row vectors of ), the only assumption being that the observation vectors are exchangeable under the null hypothesis of no relationship with X or no differences among the groups (Anderson 2001). Note that in the case of Euclidean distances being used to calculate D and when m = p < N, then these two test statistics correspond to the trace and maximum root statistics for traditional CDA or CCorA (Anderson and Robinson, in press).

Formulation using singular value decomposition

An alternative formulation is obtained using singular value decomposition. A Cholesky decomposition of matrix H gives matrix B where . Consider the singular value decomposition of matrix  as follows:

(A.6)

where U and V are semi-orthogonal matrices, , with  and  is a diagonal matrix of positive eigenvalues , which are the canonical correlations. Then  are the canonical variables, which, for plotting, are standardized by their corresponding eigenvalue (i.e., ), as described above.

Classification

The classification of a new multivariate observation can be achieved by first considering the distances (or dissimilarities) of the new (N + 1)th observation point from each of the N previously observed points. Then, using Eq. A.1, the values it would take in the G matrix would be

.
(A.7)

The eigenvectors Q obtained from the spectral decomposition of G (shown in Eq. A.2 above) can also be written as

(A.8)

and rearranging this in terms of the principal coordinate eigenvector for a new (N + 1)th observation gives

.
(A.9)

Then, the position of the new observation in the canonical space is obtained by multiplying the values of   for each of the  principal coordinates by the appropriate values in the matrix of canonical eigenvectors U. That is,

(A.10)

for  canonical axes. We can then classify the observation into one of the groups by observing which group centroid is the closest to it in the canonical space, as measured by Euclidean distance and where the canonical axes have been standardized in the usual way according to the canonical correlations.

Goodness of fit

Consider that  and are N × s arrays with correlation . So, a least-squares estimate of  is  and we can estimate B by  or by . Using a “leave-one-out” approach (e.g., Lachenbruch and Mickey 1968), we can calculate the position in the principal coordinate space of the ith observation that has been left out, using only its distances from each other point, as described in the previous section on classification. This will be a 1 × m vector which we will call , whose corresponding row in matrix B is the 1 × r vector . Next, consider a prediction of  using  and matrices B and  which have been calculated from the remaining N – 1 observations, which we shall call  and , respectively. The residual sum of squares for a given choice of m is then

(A.11)

and a choice of m can be made where this criterion is minimized. In the case of a priori groups, the choice of m can also be made by calculating the “leave-one-out” misclassification error for increasing values of m, as described in the text. In practice, these two criteria will generally indicate the same or similar values for m, but note that they may differ for reasons given in the discussions by Williams (1983, p. 1290) and Seber (1984, Section 6.10, pp. 337-343).
 

Literature Cited

Anderson, M. J. 2001. Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences 58:626–639.

Anderson, M. J., and J. Robinson. In press. Generalised discriminant analysis based on distances. Australian and New Zealand Journal of Statistics.

Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325–338.

Lachenbruch, P. A., and M. R. Mickey. 1968. Estimation of error rates in discriminant analysis. Technometrics 10:1–11.

Legendre, P., and M. J. Anderson. 1999. Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69:1–24.

Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. 1996. Applied linear statistical models, Fourth edition. Irwin, Chicago, Illinois, USA.

Seber, G. A. F. 1984. Multivariate observations. John Wiley and Sons, New York, New York, USA.

Williams, B. K. 1983. Some observations of the use of discriminant analysis in ecology. Ecology 64:1283–1291.


[Back to E084-011]