We provide here a detailed description
of the canonical analysis of principal coordinate axes (CAP). First, let
be a matrix of i = 1,…, N real-valued observations on each of
k = 1,…, p variables. In an ecological context, these are commonly
counts of individuals of each of p species. Consider also that the observations
are classified a priori into g groups with sample sizes
and
. Let
be an
square symmetric
matrix of distances or dissimilarities calculated between every pair of observation
units.
The first step is to calculate principal coordinates from the distance matrix. These are orthogonal axes that place the points into Euclidean space as well as possible, while preserving the original distances or dissimilarities among the points (e.g., Gower 1966).
For principal coordinate analysis, let matrix
,
then center this matrix on its rows and columns to produce matrix G as
follows:
![]() |
(A.1)
|
where
is the average for row i,
is the average for column j and
is the average value for the entire matrix A. To get principal coordinate
axes (Gower 1966), we do a spectral decomposition (eigenanalysis)
of matrix G to obtain
![]() |
(A.2)
|
where
is a diagonal matrix of ordered eigenvalues
and
is an
matrix of corresponding orthonormal eigenvectors. At least one eigenvalue will
be zero, as only N – 1 axes are needed to describe the position of N
points in Euclidean space. The column vectors of Q are the orthonormal
vectors we require. Commonly, these axes are each normalized so that their variance
is equal to their corresponding eigenvalue. Then, the first two (or more) axes
may be plotted for an unconstrained ordination. Our purpose here, however, is
to use principal coordinate analysis in order to obtain an orthonormal Euclidean
(i.e., Mahalanobis) representation of the data cloud. Thus, no such normalization
is done.
The next step is to do a traditional
canonical analysis on these orthonormal axes. Here is where our hypothesis comes
into the analysis. In the case of a hypothesis concerning a priori groups, let
X be an
design matrix
containing orthogonal variables coding for group contrasts (e.g., see Appendix
C of Legendre and Anderson 1999). In the case of a hypothesis
concerning relationships of species data with another set of quantitative variables
(such as environmental variables), then X contains this set of variables.
Then let
be the
projection matrix of rank r (note that
if X specifies groups). This is the usual idempotent “hat” matrix used
for linear models (e.g., Neter et al. 1996).
To do the canonical analysis, we
need to use only a subset, say the first m axes, of matrix Q.
The
matrix that contains
the first m columns of Q shall be denoted
.
The canonical analysis is accomplished by spectral decomposition (eigenanalysis)
of the
matrix
,
thus:
![]() |
(A.3)
|
where
is a diagonal matrix of ordered eigenvalues
which are the squared canonical correlations, and U is an
matrix of corresponding orthonormal eigenvectors. The number of canonical axes
will be
. The canonical
variable scores for ordination are then obtained as
.
For plotting, the canonical variable scores are standardized by the square root
of their corresponding eigenvalue (i.e.,
).
The following test statistics for differences among groups, based on the CAP approach, have been suggested (Anderson and Robinson, in press). First, we have the trace statistic
, |
(A.4)
|
which is equal to the sum of squared canonical correlations. Second, we have the maximum root statistic
, |
(A.5)
|
which is the first squared canonical
correlation. Each of these can be tested by permutation of the observation vectors
of Y (or, equivalently, by permutation of the row vectors of
),
the only assumption being that the observation vectors are exchangeable under
the null hypothesis of no relationship with X or no differences among the
groups (Anderson 2001). Note that in the case of Euclidean
distances being used to calculate D and when m = p < N,
then these two test statistics correspond to the trace and maximum root statistics
for traditional CDA or CCorA (Anderson and Robinson, in
press).
Formulation using singular value decomposition
An alternative formulation is obtained
using singular value decomposition. A Cholesky decomposition of matrix H
gives matrix B where
. Consider the
singular value decomposition of matrix
as follows:
![]() |
(A.6)
|
where U and V are
semi-orthogonal matrices,
,
with
and
is a diagonal matrix of positive eigenvalues
,
which are the canonical correlations. Then
are the canonical variables, which, for plotting, are standardized by their
corresponding eigenvalue (i.e.,
),
as described above.
Classification
The classification of a new multivariate observation can be achieved by first considering the distances (or dissimilarities) of the new (N + 1)th observation point from each of the N previously observed points. Then, using Eq. A.1, the values it would take in the G matrix would be
. |
(A.7)
|
The eigenvectors Q obtained from the spectral decomposition of G (shown in Eq. A.2 above) can also be written as
![]() |
(A.8)
|
and rearranging this in terms of the principal coordinate eigenvector for a new (N + 1)th observation gives
. |
(A.9)
|
Then, the position of the new observation
in the canonical space is obtained by multiplying the values of
for each of the
principal
coordinates by the appropriate values in the matrix of canonical eigenvectors
U. That is,
![]() |
(A.10)
|
for
canonical axes. We can then classify the observation into one of the groups
by observing which group centroid is the closest to it in the canonical space,
as measured by Euclidean distance and where the canonical axes have been standardized
in the usual way according to the canonical correlations.
Goodness of fit
Consider that
and
are N ×
s arrays with correlation
.
So, a least-squares estimate of
is
and we can estimate
B by
or by
.
Using a “leave-one-out” approach (e.g., Lachenbruch and
Mickey 1968), we can calculate the position in the principal coordinate
space of the ith observation that has been left out, using only its distances
from each other point, as described in the previous section on classification.
This will be a 1 × m vector which we will call
,
whose corresponding row in matrix B is the 1 × r vector
.
Next, consider a prediction of
using
and matrices B
and
which have been calculated
from the remaining N – 1 observations, which we shall call
and
, respectively. The
residual sum of squares for a given choice of m is then
![]() |
(A.11)
|
and a choice of m can be
made where this criterion is minimized. In the case of a priori groups, the
choice of m can also be made by calculating the “leave-one-out” misclassification
error for increasing values of m, as described in the text. In practice,
these two criteria will generally indicate the same or similar values for m,
but note that they may differ for reasons given in the discussions by Williams
(1983, p. 1290) and Seber (1984, Section 6.10, pp. 337-343).
Anderson, M. J. 2001. Permutation tests for univariate or multivariate analysis of variance and regression. Canadian Journal of Fisheries and Aquatic Sciences 58:626639.
Anderson, M. J., and J. Robinson. In press. Generalised discriminant analysis based on distances. Australian and New Zealand Journal of Statistics.
Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325338.
Lachenbruch, P. A., and M. R. Mickey. 1968. Estimation of error rates in discriminant analysis. Technometrics 10:111.
Legendre, P., and M. J. Anderson. 1999. Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69:124.
Neter, J., M. H. Kutner, C. J. Nachtsheim, and W. Wasserman. 1996. Applied linear statistical models, Fourth edition. Irwin, Chicago, Illinois, USA.
Seber, G. A. F. 1984. Multivariate observations. John Wiley and Sons, New York, New York, USA.
Williams, B. K. 1983. Some observations of the use of discriminant analysis in ecology. Ecology 64:12831291.