Erle C. Ellis, Rong Gang Li, Lin Zhang Yang and Xu Cheng. 2000. Long-term Change in Village-Scale Ecosystems in China Using Landscape and Statistical Methods. Ecological Applications 10: 1057-1073.

Ecological Archives A-010-006

Manual for Calculating Data Quality Pedigrees using the Microsoft Excel® Visual Basic Program in pedigree.xls

1. Overview
2. Calculating
Data Quality Pedigrees using pedigree.xls
3. Using the pedigree calculation demonstration worksheets.
4. Using the pedigree calculation program for your own variables.
Literature Cited
Useful References
Figures

 

1. Overview
This document supports the calculation of data quality pedigrees using the Microsoft Excel® Visual Basic program referenced in Ellis et al. (2000) and supplied in the Microsoft Excel® 97 workbook file: pedigree.xls.  Data quality pedigrees represent the statistical (stat), empirical (emp), and methodological (meth) quality of a measured or estimated variable using a vector of three ordinal scores from 0 to 4, as described in Table 6 of Ellis et al. (2000).  Pedigrees are expressed using the notation {stat,emp,meth} or as a grade, normalized from zero to one, equal to: (stat + emp + meth)/(4 + 4 + 4).  Pedigrees for variables that have been measured or estimated directly (“original variables”) are scored directly by subjective appraisal based on Table 6 of Ellis et al. (2000).  Pedigrees for variables calculated as a function of other variables (“derived variables”) are calculated using the pedigree calculation program in pedigree.xls.  The pedigree.xls workbook includes an interactive demonstration using example worksheets to demonstrate the use of the pedigree calculation program and a blank standardized worksheet for entering and calculating pedigrees of one’s own.

Section 2, “Calculating Data Quality Pedigrees using pedigree.xls ”, provides a general description of the use of pedigree.xls.  Section 3, “Using the pedigree calculation demonstration worksheets” offers step-by-step instructions for an interactive demonstration of pedigree calculation using pedigree.xls.  The reader is encouraged to skim over section 2 to get a general idea of pedigree calculations, and then proceed to section 3, where the interactive demonstration will clarify the concepts in section 2.  Section 2 is also a reference for making one’s own pedigree calculations, as described in section 4: ”Using the pedigree calculation program for your own variables”.  We recommend printing this manual before using pedigree.xls, so that the pedigree.xls file can be expanded to occupy the entire screen.  This document assumes that the user is experienced with using named cells in Excel formulas, a standard practice explained in the Excel help files.  For more details on the algorithms used in the pedigree calculation program, see the documentation in pedigree_notes.

2. Calculating Data Quality Pedigrees using pedigree.xls

Requirements.  The pedigree.xls file is designed to run on computers with MS Windows® and Excel® 97, 2000, or higher.  It should also run on Macintosh® computers with Microsoft Excel® 98, but this has not been tested.  The demonstration works best on computers with screen settings of 1024 × 768 or greater and color is required.  When opening pedigree.xls in Excel, make sure to enable macros if queried about this.  On smaller screens, we advise the user to adjust the zoom of spreadsheets in Excel using the View>zoom feature.  To calculate variables using probability distributions (PDFs) and Monte Carlo simulation, @RISK® software for Excel (Palisade Corporation 1996) must be installed.  This demonstration assumes that @RISK® is not installed, and that variables will be represented by their mean values.

File Description.  pedigree.xls is an Excel® 97 workbook file consisting of six worksheets (see below) and a set of embedded Visual Basic programs used for calculating data quality pedigrees.  If desired, all subroutines, forms, and variables used by the pedigree calculation program can be viewed using the Visual Basic editor in Excel, however, this is not necessary for their use. 

Worksheets in pedigree.xls and their use:

Sheet name

Contents

Notes

Title page.

Demo_blank

Standardized worksheet provided for a demonstration exercise in calculation pedigrees.

Demo_calculated

Correct results of the demonstration exercise.  All formulas have been removed- use only for reference.

Demo_w_PDFs

Standardized worksheet provided for a demonstration of pedigree calculation with PDFs (requires @RISK® software).

blank_worksheet

Blank standardized worksheet for entering and calculating pedigrees.

Weights

Normalized inverse variance weight calculator

 

General notes on calculating pedigrees using pedigree.xls.

The pedigree calculation program in pedigree.xls is designed to work with standardized spreadsheets with tables formatted as illustrated on sheets Demo_blank (FIG 1), Demo_w_PDFs, and blank_worksheet.  These spreadsheet tables contain the following fields:

Fields (columns) in standard pedigree calculation worksheets:

Field (column)

Description

Variable_Name

Name of each variable.  Do not use any spaces or special characters besides periods (“.”) or underscores (“_”) in variable names..

PDF

The mean, PDF, or formula for the variable.  Enter the mean for original variables and the formula for derived variables using named variables in standard Excel equations.  If @RISK® software is available, enter PDFs for original variables using RISK functions.

Units

The units for the variable (optional)

Variable_Desc

A description of the variable.

o_d

Enter “o” if the variable is an original variable, or “d” if it is a derived variable.

Ped_Stat

For original variables, enter the Statistical Quality score as an ordinal from 0 to 4.  Leave blank for derived variables- these are calculated using the pedigree calculation algorithm.

Ped_Emp

Empirical Quality score, entered same as above.

Ped_Meth

Methodological Quality score, entered same as above.

Grade

Grade: calculated automatically from the pedigree data using a formula, do not enter or erase this cell.

pedigree

Data quality pedigree vector in text form, calculated automatically from the pedigree data using a formula, do not enter or erase this cell.  This cell can be copied and pasted to use the final pedigrees in text format.

CV

Coefficient of variation: calculated automatically from the standard deviation and mean data using a formula, do not enter or erase this cell.

Mean

Mean, entered into cell.

Median

Median, entered into cell.

Std

Standard deviation or Standard error, entered into cell.

Variance

Variance, entered into cell.

 

The program will fail if any of these columns (“fields”) are renamed, missing, or if blank rows or columns are added within the tables on the standardized worksheets.  It is possible to enter new fields in the table, as long as the standard fields remain and there are no blanks added in the top row.

Data are entered for each variable using a single row for each variable.  The row begins with the name of each variable in the Variable_name column (“field”).  Variable names must conform to the rules listed in the table above.  Data must be entered into each colored cell in the table, while white, uncolored cells are calculated, either by the pedigree calculation algorithm or by standard Excel formulas and should be left as is.  For original variables, data are entered for each column except for the grade, pedigree and CV.  The PDF (“probability distribution function”) column represents the “best estimate” for variables.  If the user has @RISK® software for Excel (Palisade Corporation 1996), PDFs representing each original variable can be entered into this column, as in Ellis et al. (2000).  However, the results of pedigree calculations are the same when the mean estimate for each variable is entered into this cell- this is the data entered for this demonstration.

For derived variables, data are entered for each variable as for original variables, except that formulas based on original variables (named cells) are entered into the PDF column and the Ped_Stat, Ped_Emp, & Ped_Meth columns are left blank, to be calculated using the pedigree calculation algorithm.  To enter formulas for derived variables using named cells, it is first necessary to enter the data for each original variable.  The purple Name Variables button on the worksheet (FIG 1) is then pressed to name the cells containing the original variables so that the names can be used in Excel formulas.  Note that by placing the cursor in a cell and pressing the 'F3' key, the named variables available in the workbook are displayed and can be entered into formulas.  When derived variables are calculated as a function of other derived variables, it is sometimes necessary to press the Name Variables button more than once.  If it is necessary to rename the variables, click on the black Delete Names button (FIG 1), and then click the Name Variables button again.  The descriptive statistics needed for derived variables (mean, median, standard deviation, and variance) can be calculated from original variables using Taylor series (Taylor 1997), or by Monte Carlo simulation of PDFs using @RISK, if available.

Once the necessary data have been entered for each variable, pedigrees for derived variables can be calculated by placing the cursor in the row containing the variable and clicking on the green Calc Pedigree button, which opens the Main Calculation form for pedigree calculation (FIG 2).  When pedigree calculations are simple, they are performed automatically.  This is indicated by a green ENTER? button in the Main Calculation form (FIG 2).  When this is the case, the calculated pedigree can be entered into the worksheet by clicking on the ENTER? button or by pressing return.  Note that the Main Calculation form provides a great deal of useful information for pedigree calculations, including the number and type of mathematical operations, variables, weights and constants, and provides a simplified formula with components renamed using a standard system in the formula window (“V1”, “V2”… for variables, C1, C2… for constants, etc., FIG 2).

When pedigree calculations cannot be made automatically, this is indicated by a red EXIT button (FIG 3).  If the pedigree cannot be calculated, the reason is described in the red notes window, and the EXIT button should be pressed, so that the problem can be fixed in the spreadsheet (missing data is usually the problem).  If the pedigree can be calculated, as indicated by the text “CAN CALCULATE” in the red notes window (FIG 3), the pedigree must be calculated using the custom calculation system.  This is accomplished by conceptualizing the calculation as a set of simply calculable components, selecting the variables within each component and making the component calculations with each set of variables separately. 

To make the component calculations, the variables for one component are selected by clicking the checkboxes next to the variable code in the variables window at the bottom of the Main Calculation form (FIG 3, variable codes are presented in the the formula window) and pressing the Custom Calculate button.  This opens the Custom Calculation form displaying the selected variables (FIG 4).  Clicking on the ADD, SUBTRACT, Multiply, or Divide, buttons then makes the calculation, or a selected variable can be entered for multiple uses by clicking the ENTER button (FIG 4).  Clicking any of these buttons opens the Calculation Progress form (FIG 5), which displays the component calculations.  To calculate another component, click on the View Original Data button to return to the Main Calculation form, and make another calculation using the procedure described above (FIG 5).  To use the results of component calculations to in further calculations, including the final pedigree calculation, select the component calculations in the Calculation Progress form by clicking the checkboxes next to each component and then clicking on the Calculate Selected button (FIG 6).

Note that it is possible to maneuver between the three calculation forms (Main Calculation, Custom Calculation, & Calculation Progress) as follows:

When the final calculation has been made from the component calculations, the calculated pedigree is selected for entry into the worksheet by double clicking on the final calculation in the yellow window at the bottom of the Calculation Progress form (FIG 6).  After double clicking the final calculation, the calculated pedigree should appear in the green Selected Pedigree window of the Calculation Progress form (FIG 6).  Pressing the “ENTER Pedigree” button and confirming the popup windows enters the pedigree for the variable into the worksheet (FIG 6).  A demonstration of these methods is presented in the following section.

 

3. Using the pedigree calculation demonstration worksheets.

This section presents a demonstration of pedigree calculation using the Demo_blank worksheet in pedigree.xls.  The Demo_blank worksheet consists of the variables needed to calculate the total amounts, percentage and difference of an element in two hypothetical land types (A & B) from original variables for the area, depth and concentration of the element in each land type.  To demonstrate the averaging of multiple independent estimates using normalized inverse variance weights (Eq 1, Ellis et al. 2000; Weights worksheet of pedigree.xls), estimates of element concentration made by two hypothetical methods (#1 & #2) are included in the calculation.  The general equation for the total amount in each land type = area × depth × concentration.  The formulas for derived variables can be observed by clicking in the PDF column of each variable.  Each variable occupies one row, with a “d” in the “o_d” column indicating derived variables and an “o” for original variables.  All variables have already been named using the Name Variables button, and all of the needed data including descriptive statistics have been entered (these were generated using Monte Carlo simulation from the PDFs in the Demo_w_PDFs worksheet). 

To calculate pedigrees for each derived variable, place the cursor in the Variable_name column in each variable row and press the green Calc Pedigree button at left (FIG 1).  Start at the top of the derived variables (conc_B_mean) and work down - this will take care of dependent variables.  For all variables except Total_A_and_B, the pedigree will be calculated automatically, yielding a green ENTER? button that can be pressed to enter the calculated pedigree directly into the worksheet (FIG 2). 

Calculating the pedigree for the Total_A_and_B variable requires the use of the custom pedigree calculation system described in general in the section above and detailed for this calculation as follows.  Once the Main Calculation form for the Total_A_and_B variable has opened (after clicking the Calc Pedigree button), a red EXIT button is observed, and a simplified version of the formula is provided in the formula window with variables renamed using a standard system in (V1 – V5, note that the depth variable, now named “V2” is used twice; FIG 3).  The pedigree for Total_A_and_B can be calculated by breaking the formula into two simple components, (V1 × V2 × V3) and (V4 × V2 × V5), which can then be added to make the final pedigree.  To calculate the first component, select variables V1, V2, & V3 by clicking the checkboxes next to each in the variables window at the bottom of the form (FIG 3) and then press the Custom Calculate button.  The Custom Calculation form will then open displaying the selected variables (FIG 4).  Click on the Multiply button to multiply the variables, thereby opening the Calculation Progress form (FIG 5), which displays the calculation (1).  Click on the View Original Data button to return to the Main Calculation form, unselect the previous variables by clicking on their checkboxes, select variables V2, V4, & V5, and press the Custom Calculate button, and finally the Multiply button on the Custom Calculation form, bringing you back to the Calculation Progress form.  Select component calculations (1) and (2) in the Calculation Progress form by clicking the checkboxes next to each component and then click the Calculate Selected button (FIG 6).  When the Custom Calculation form opens, click on the ADD button to add the components.  To enter the final calculated pedigree into the worksheet, select calculation (3) by double clicking it in the yellow window at the bottom of the Calculation Progress form (FIG 6).  After double clicking the final calculation, the calculated pedigree should appear in the green Selected Pedigree window of the Calculation Progress form (FIG 6).  Pressing the ENTER Pedigree button and confirming the popup windows enters the pedigree for the variable into the worksheet (FIG 6).

Pedigrees for the remaining variables can be calculated automatically as for those previous to Total_A_and_B. Once complete, the pedigree calculations can be checked by reference with the Demo_calculated worksheet.

 

 4. Using the pedigree calculation program for your own variables.

To use the pedigree calculation program in pedigree.xls for your own Excel workbooks, it is necessary to save the entire file under a new name and use worksheets formatted the same as sheets Demo_blank  (FIG 1), Demo_w_PDFs, and blank_worksheet.  This is because the pedigree calculation program is comprised of multiple Visual Basic components embedded in the file and is best copied as a whole.  The easiest way to accomplish this is to copy and rename the pedigree.xls file and delete all but the blank_worksheet sheet.  This sheet can then be copied and renamed and the top two rows can be copied below the two example rows to make rows for new variables, as necessary.  Note that it is usually necessary to use the Delete Names button (FIG 1), to remove existing variable names and then click the Name Variables button to name the newly entered variables.  Do not forget to include the buttons on the standard worksheets- these are necessary for calculations.

 

Literature Cited

Ellis, E.C., R.G. Li, L.Z. Yang, and X. Cheng.  2000.  Long-term change in village-scale ecosystems in China using landscape and statistical methods. Ecological Applications 10: 1057-1073.

Palisade Corporation.  1996.  @RISK advanced risk analysis software for spreadsheets. Version 3.5.2. Palisades Corporation, Newfield, New York, USA.

Taylor, J.R.  1997.  An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, 2nd edition. University Science Books, Mill Valley, Calif.

 

Useful References

Costanza, R., S.O. Funtowicz, and J.R. Ravetz.  1992.  Assessing and communicating data quality in policy-relevant research. Environmental Management 16:121-131.

Funtowicz, S.O., and J.R. Ravetz.  1991.  Uncertainty and Quality in Science for Policy, Theory and Decision Library Series A: Philosophy and Methodology of the Social Sciences 15. Kluwer Academic Publishers, Dordrecht, The Netherlands.

Palisade Corporation. <http://www.palisade.com/html/risk.html>

Microsoft.  1997.  Visual Basic for Applications. Version 5.0. Microsoft, Redmond, Washington, USA.

 

 

FIG 1: Blank Pedigree worksheet before calculating pedigrees (the “Demo_blank” worksheet in pedigree.xls).


 

FIG 2: Main Calculation Form, automatic calculation.  After pressing the “Calc Pedigree” button with cursor in cell with “conc_B_mean” variable.  This is a simple pedigree calculation that can be calculated automatically.  To enter the calculated pedigree into the spreadsheet, press the green “ENTER?” button.

 

FIG 3: Main Calculation Form, custom calculation required.  After pressing the “Calc Pedigree” button with cursor in cell with “Total_A_and_B” variable.  Note that a red “EXIT” button replaces the green “ENTER?” button.  The text in the notes window however, says “CAN CALCULATE”, so that the data and variables are available to perform the calculation using the Custom Calculation system.  To begin the custom calculation, variables V1, V2 & V3 have been selected by clicking the checkboxes next to their names in the “Variables” window.  Next, the “Custom Calculate” button is pressed.


 

FIG 4: Custom Calculation Form after pressing the “Custom Calculate” button on the Main Calculation Form as described in FIG 3.  To multiply the selected variables (V1, V2 & V3), click on the “Multiply” button.



 

FIG 5: Calculation Progress Form after pressing the “Multiply” button on the Custom Calculation Form (FIG 4).  When this screen appears, press the “View Original Data” button to return to the Main Calculation Form, select variables V2, V4 & V5, and multiply them using the same procedure as in FIGs 3-4.


 

FIG 6: Calculation Progress Form after multiplying V2, V4, & V5 as described in FIG 5 and selecting calculations (1) and (2) in the Calculation Progress Form by clicking their checkboxes, pressing the “Calculate Selected” button, and adding them using the “ADD” button on the Custom Calculation Form (FIG 4).  This completes the custom pedigree calculation for the variable “Total_A_and_B” (FIG 3).  To enter the pedigree into the worksheet, double click the final output (calculation (3)) and press the “Enter Pedigree” button (press the “Enter” key when asked for confirmation).  The pedigree should now be entered into the worksheet.