Appendix B. Additional methods and results for multiple imputation analyses.
FIG. B1. Flowchart depicting the process of multiple imputation and subsequent analyses.
FIG. B2. Example of six imputed datasets for the freshwater species. Each panel shows Log10 CP vs. Log10 CN for one imputed dataset. The fitted slope and standard error for the effect of Log10 CN on Log10 CP are listed in the upper right of each panel. Across these six datasets, the mean slope is -0.67, the mean within-imputation variance in the slope estimate is 0.0054, and the variance in slope estimates across imputations is 0.031. Using the formula in “Methods – Multiple imputation”, the resulting total standard error for the slope is
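The pooling arithmetic in Fig. B2 can be sketched in code. This sketch assumes that the formula in “Methods – Multiple imputation” is Rubin's standard combining rule, T = W̄ + (1 + 1/m)B. Here W̄ is the mean within-imputation variance, B is the between-imputation variance of the estimates (using an m - 1 denominator), and m is the number of imputed datasets. The function names are hypothetical illustrations, not code from the original analysis.

```python
import math


def pooled_se(within_variances, estimates):
    """Pool per-dataset results with Rubin's rules (assumed to match the Methods formula).

    within_variances: squared standard errors of the slope, one per imputed dataset.
    estimates: slope estimates, one per imputed dataset.
    Returns the pooled slope and its total standard error.
    """
    m = len(estimates)
    w_bar = sum(within_variances) / m                       # mean within-imputation variance
    q_bar = sum(estimates) / m                              # pooled point estimate (mean slope)
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total_variance = w_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_variance)


def pooled_se_from_summaries(w_bar, b, m):
    """Same rule, starting from already-summarized W-bar and B values."""
    return math.sqrt(w_bar + (1 + 1 / m) * b)


# Summaries reported in the Fig. B2 caption (m = 6 imputed datasets)
se = pooled_se_from_summaries(w_bar=0.0054, b=0.031, m=6)
print(round(se, 3))
```

The between-imputation term is inflated by (1 + 1/m) to account for using a finite number of imputations. With only six datasets, this factor adds about 17% to B.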
FIG. B3. Patterns in CN, CP, and volume using only the 25 species for which all seven parameters were observed. Loadings of CN, CP, and cell volume on the first two axes of a principal component analysis (PCA) of these three variables.
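Loadings like those in Fig. B3 can be computed as sketched below. This minimal example assumes the three variables are log10-transformed and standardized before the PCA, so the decomposition is of the correlation matrix. The caption does not say which scaling was used, so that choice is an assumption. Here "loadings" means the eigenvector coefficients, whose signs are arbitrary. The function name `pca_loadings` is hypothetical.

```python
import numpy as np


def pca_loadings(data, n_axes=2):
    """Return the variance fractions and loadings (eigenvector coefficients) of the first PCs.

    data: array of shape (n_species, n_vars), e.g. columns log10 CN, log10 CP, log10 volume.
    Columns are standardized first (assumption), so this is a PCA of the correlation matrix.
    """
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)              # covariance of standardized data = correlation
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder so PC1 has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return explained[:n_axes], eigvecs[:, :n_axes]


# Illustrative synthetic data only (25 "species"); not the real measurements
rng = np.random.default_rng(0)
vol = rng.normal(size=25)
data = np.column_stack([
    0.6 * vol + rng.normal(scale=0.5, size=25),   # stand-in for log10 CN
    -0.4 * vol + rng.normal(scale=0.5, size=25),  # stand-in for log10 CP
    vol,                                          # stand-in for log10 volume
])
explained, loadings = pca_loadings(data)
```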
FIG. B4. Power analysis of the ability to detect a CP-CN tradeoff in the freshwater data. Using the multivariate normal covariance matrix fitted to the freshwater data, we used multivariate random draws to generate 100 completely observed simulated datasets of the same size as the real dataset (69 species). For each dataset, we used regression to test the effect of Log10 CP on Log10 CN and saved the coefficient estimate. We then imposed on each simulated dataset the same pattern of missingness that occurs for the freshwater species (Table B1), performed multiple imputation on each dataset as described in the Methods, and saved the coefficient estimate and P value for the effect of Log10 CP on Log10 CN. We also performed the same regression using only the complete-case data (N = 12). (A) Histogram of the proportional error in the coefficient estimate due to missing data, for the regressions fit with multiple imputation. On average, missing data alter the coefficient estimate by 12% relative to the coefficient fitted to the completely observed dataset. (B) Histogram of P values for the effect of Log10 CP on Log10 CN, for the regressions fit with multiple imputation; 95% of P values are < 0.05 and 88% are < 0.01. (C) Histogram of the proportional error in the coefficient estimate due to missing data, for the regressions fit to complete-case data. On average, missing data alter the coefficient estimate by 17% relative to the coefficient fitted to the completely observed dataset. (D) Histogram of P values for the effect of Log10 CP on Log10 CN, for the regressions fit to complete-case data; 80% of P values are < 0.05 and 61% are < 0.01. The coefficients for the completely observed datasets were all significant at P < 1.0E-06.
FIG. B5. Relationships between nutrient parameters and volume for the marine species. (A) Log10 vs. Log10 volume. (B) Log10 vs. Log10 volume. (C) Log10 vs. Log10 volume. (D) Log10 vs. Log10 volume. (E) Log10 vs. Log10 volume. (F) Log10 vs. Log10 volume. (G) Log10 CN vs. Log10 volume. (H) Log10 CP vs. Log10 volume. In all panels, filled circles denote species for which all data were measured, and triangles denote mean imputed values for all other marine species.
TABLE B1. Cross-validation results for the imputation model in which freshwater and marine data were imputed with a single shared model. For each of the six nutrient parameters, mean squared prediction error is reported for the imputation model (‘Imputation MSPE’) and for a null model in which missing values are imputed using the mean of the observed values (‘Null MSPE’). The percent reduction in MSPE achieved by the imputation model is also listed.
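The quantities in Table B1 can be sketched for a single parameter. MSPE is the mean of the squared differences between held-out observed values and their predictions. The null model predicts every held-out value with the mean of the remaining observed values. The percent reduction is 100 × (Null MSPE - Imputation MSPE) / Null MSPE.

The cross-validation scheme (which values are held out, how many folds) is not described here, so the fold handling is an assumption. The imputation predictions below are made-up numbers for illustration, and the function names are hypothetical.

```python
import statistics


def mspe(predicted, observed):
    """Mean squared prediction error over held-out observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def null_predictions(train_values, n_heldout):
    """Null model: predict each held-out value with the mean of the observed (training) values."""
    m = statistics.fmean(train_values)
    return [m] * n_heldout


def percent_reduction(imputation_mspe, null_mspe):
    """Percent reduction in MSPE achieved by the imputation model relative to the null model."""
    return 100.0 * (null_mspe - imputation_mspe) / null_mspe


# Illustrative numbers only: three held-out values of one parameter
held_out = [1.0, 2.0, 3.0]
train = [2.0, 2.0]
imputed = [1.2, 2.1, 2.7]   # hypothetical predictions from an imputation model

null_mspe = mspe(null_predictions(train, len(held_out)), held_out)
imp_mspe = mspe(imputed, held_out)
print(f"{percent_reduction(imp_mspe, null_mspe):.1f}% reduction")
```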
| No. of observations | 14 | 26 | 17 | 47 | 43 | 47 |
| No. of observations | 27 | 35 | 25 | 30 | 30 | 30 |