Overview

Allometric equations represent the proportional and scaling relationships between different tree dimensions, such as the relationship between a tree’s diameter and its height, biomass, or crown size. When trees are considered at a population scale, their different dimensions are statistically related through shared ontogenic development patterns (Gould, 1966). These relationships remain consistent across tree sizes, from saplings to canopy dominants, when growing under similar conditions (Archibald & Bond, 2003; Bohlman & O’Brien, 2006; Dietze et al., 2008; King, 1996).

This biological principle underpins REDD+ carbon accounting: allometric equations translate field measurements of diameter at breast height (DBH) into biomass estimates, forming the foundation of emission reduction quantification.

Allometry Investment

Although Section 8 exempts structural allometric uncertainty when models are applied consistently, the choice of which model to apply directly impacts reported uncertainty. Allometry variance is absolute and compounds more dramatically so that model selection determines the magnitude of random error that cannot be exempted downstream. In practical terms, selecting a model with 20% RMSE versus 8% RMSE determines whether your project faces a 6% or 2% carbon credit deduction, representing a difference worth $200k in a 1M tCO2-e project at $5/tonne.

Environment Setup (R)

View Code
easypackages::packages(
    "animation", "BIOMASS", "cols4all", "covr", "cowplot", "caret",
  "DescTools", "dataMaid", "dplyr", "FawR", "ForestToolsRS", "forestdata", 
  "flextable", "ggplot2", "giscoR", "ggfortify", "htmltools", 
    "janitor", "jsonlite", "lattice", "leaflet.providers", "leaflet", 
    "lmtest", "lwgeom", "kableExtra", "kernlab", "knitr", "mapedit", 
    "mapview", "maptiles", "Mlmetrics", "ModelMetrics", "moments", 
    "olsrr", "openxlsx", "plotly", "psych", "randomForest", 
  "raster","RColorBrewer", "rmarkdown", "renv", "reticulate", 
    "s2", "sf", "scales", "sits","spdep", "stars", "stringr", 
    "terra", "tmap", "tmaptools", "tidymodels", "tidyverse", "tidyr", "tune",
    "useful",
  prompt = F
  )

Environment Setup (Python)1

View Code
# Installer prerequisites
import subprocess, sys

# Quick package install 
subprocess.run([sys.executable, "-m", "pip", "install", "--quiet", "-q",
  "contextily", "folium", "geopandas", "kaleido", "matplotlib", "numpy", 
  "openpyxl", "pandas", "plotly", "pysal", "pyproj", "pingouin","rasterio",
  "scikit-learn", "scipy", "seaborn", "statsmodels", "shapely", "skimpy",
  "xarray"], check=True, capture_output=True)

# Import packages into runtime
import pandas as pd, numpy as np
import matplotlib.pyplot as plt, seaborn as sns
import plotly.express as px, plotly.graph_objects as go
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from scipy import stats
import geopandas as gpd, rasterio, folium
from tabulate import tabulate

1.1 Allometric Equations

In its broadest sense, allometry describes any linear or non-linear correlation between increases in tree dimensions during ontogenic development (Picard et al., 2012). A more restrictive definition, originating with Huxley (1924), refers specifically to proportional relationships between relative increases in dimensions:

Allometric equations predict aboveground biomass from diameter measurements using species- or biome-specific parameters. Uncertainty compounds from three sources: model selection, parameter estimation, and field measurements. Log-transformation adds complexity through required back-transformation:

\[ \frac{\mathrm{d}B}{B} = a\frac{\mathrm{d}D}{D} \]

which integrates to the power relationship:

\[ AGB = \alpha \times DBH^{\beta} \]

and in logarithmic form:

\[ \ln(AGB) = \ln(\alpha) + \beta \times \ln(DBH) + \epsilon \]

Where:

  • AGB: Aboveground biomass (kg)
  • DBH: Diameter at breast height (cm)
  • α, β: Modelling exponents/parameters
  • ε: Random error term

This power-law form reflects the self-similarity principle observed in tree growth: as trees develop from seedlings to mature individuals, they maintain consistent proportional relationships between their dimensions (Gould, 1971). In this framework, the exponent β serves as the allometric coefficient, quantifying how one dimension changes relative to another during growth. For example, if β = 2.5, a 10% increase in diameter corresponds to a 25% increase in biomass. The parameter α is a scaling constant accounting for wood density, architectural form, and other species-specific characteristics.

West et al. (1997) and Enquist et al. (1999) developed an allometric scaling theory based on the “pipe model” (Shinozaki et al., 1964) and physical growth principles. Their framework predicts a universal exponent of β ≈ 2.67 based on biomechanical constraints, tree stability, and hydraulic resistance. While this theoretical exponent has been debated regarding its universality (Muller-Landau et al., 2006; Zianis & Mencuccini, 2004), it provides a physically grounded benchmark for evaluating empirical equations.

In this chapter, we compare allometry species-specific allometry with biome-generic equations that are more commonly applied in REDD+ programs to demonstrate how uncertainty metrics can inform model selection and impact carbon credit deductions. We adopt the broadest definition of allometry as any correlation between tree dimensions, whether in linear, log-log, power-law, or other functional forms, to examine the following questions:

  • How should uncertainty metrics inform model selection and optimization?
  • How does allometry uncertainty impacts carbon credit deductions

1.2 Model Selection

The choice of allometric equation directly determines uncertainty magnitude and carbon credit deductions. This section demonstrates quantitative differences between equation categories using the scbi_stem1 dataset from the ForestGEO plot inventory in Front Royal, Virginia. This 25.6-hectare mature secondary forest is dominated by Appalachian mixed hardwood species including tulip poplar (Liriodendron tulipifera), oaks (Quercus spp.), and hickories (Carya spp.), representing typical stand composition of the Blue Ridge and Piedmont regions.

For REDD+ carbon accounting, we tend to focus exclusively on ontogenic allometry to estimate relationships between accessible tree dimensions and total aboveground biomass as trees grow from seedling to maturity. However, some reading of evolutionary allometry is sometimes needed to address differences in specific traits that emerged in localized conditions. For example, the following species share in genus but vary in wood density, branch architecture, and bole form:

  • Acer rubrum (red maple): Lower density (0.49 g/cm³), faster growth
  • Acer saccharum (sugar maple): Higher density (0.63 g/cm³), denser wood
  • Acer negundo (box elder): Intermediate density (0.42 g/cm³), different architecture

Types of Equations

Allometric equations are classified by taxonomic scope and environmental specificity. In practice, the choice of equation category involves balancing precision, cost, and data availability:

  1. Species-Specific Equations: Developed from destructive sampling of target species, providing highest accuracy but limited geographic applicability (Gonzalez-Akre et al., 2022). These equations capture species-specific morphometric traits, such as branching architecture and wood density, which influence biomass allocation.
  2. Genus-Specific Equations: Aggregated across multiple species within a genus, offering broader applicability with moderate accuracy (Jansen et al., 1996; Jenkins, 2004). Assumes shared evolutionary heritage produces similar allometric scaling within genera.
  3. Biome-Generic Equations: Most commonly used in REDD+ report, pan-tropical, generic equations fitted across broader geographic regions and biomes, maximizing applicability but potentially introducing bias (Brown, 1997; Chave et al., 2009; Chave et al., 2014; West et al., 1997). Often incorporate wood density (WD) or environmental stress factors to capture regional variation.
  4. Environmentally-Conditioned Equations: Gold-standard allometric models that incorporate biophysical variables reflecting site-specific growing conditions, critical for REDD+ programs in specialized ecosystems (Komiyama et al., 2005, 2008; Rahman et al., 2021). These equations explicitly model environmental drivers of growth variation:
    • Salinity gradients: Mangrove allometry where salt stress affects growth rate, wood density, and architectural form
    • Soil fertility: Nutrient availability influencing wood density and height-diameter relationships
    • Climate: Temperature and precipitation gradients captured through environmental stress factors
    • Geomorphology: Tidal inundation frequency, elevation, or hydrological regime
Mangroves Tree Height

In specialized growing environments, such as mangrove habitats, the choice of allometry significantly affects accuracy estimates. For example, Rocha de Souza Pereira et al. (2018) found generic allometries to produce -18% & +14% wider error magins than species-specific equations in Brazilian mangrove forests. To compensate, generic equations incorporate proxies of environmental stress, such as wood density, height-diameter ratios or “stunting”, and species composition.

However, while inclusion of tree height significantly reduces bias in AGB estimates (Chave et al., 2014; Rutishauser et al., 2013), the accurate measurement of tree height in closed-canopy forests is especially challenging(King & Clark, 2011). Field data with high levels of tree height variance can limit these destructively sampled allometry models. This represents investment opportunity from reducing uncertainty through improved survey technology, such as LiDAR and RADAR (Feldpausch et al., 2011; Valbuena et al., 2016).

Data Preparation

We imported the scbi_stem1 dataset from the allodb package, which contains 2,287 individual tree measurements from the ForestGEO permanent plot at Front Royal, Virginia. The dataset includes 12 families, spanning multiple genera and species (Table 1.1). This taxonomic diversity allows us to demonstrate equation selection across different levels of specificity.

View Code
# import dataset from allodb.pkg
library("allodb")
data(scbi_stem1)

#scbi_stem1 |> dplyr::group_by(Family) |>
#   dplyr::summarise(`Tree Families sampled` = n()) |>
#   flextable::flextable()|> flextable::autofit()

scbi_stem1 |> dplyr::group_by(Family, genus, species) |>
    dplyr::summarise(`Total Trees (n)` = n()) |>
    flextable::flextable()|> 
    flextable::fontsize(size = 9, part = "body") |>
  flextable::fontsize(size = 8, part = "footer") |>
    flextable::set_table_properties(scbi_stem1, layout = "autofit", width=1, align = "center") 

Family

genus

species

Total Trees (n)

Adoxaceae

Sambucus

canadensis

20

Adoxaceae

Viburnum

prunifolium

19

Adoxaceae

Viburnum

recognitum

2

Annonaceae

Asimina

triloba

14

Aquifoliaceae

Ilex

verticillata

32

Betulaceae

Carpinus

caroliniana

57

Betulaceae

Corylus

americana

1

Cannabaceae

Celtis

occidentalis

24

Caprifoliaceae

Lonicera

maackii

1

Cornaceae

Cornus

florida

24

Elaeagnaceae

Elaeagnus

umbellata

1

Fabaceae

Cercis

canadensis

25

Fagaceae

Fagus

grandifolia

27

Fagaceae

Quercus

alba

14

Fagaceae

Quercus

michauxii

1

Fagaceae

Quercus

prinus

35

Fagaceae

Quercus

rubra

26

Fagaceae

Quercus

velutina

8

Hamamelidaceae

Hamamelis

virginiana

107

Juglandaceae

Carya

cordiformis

30

Juglandaceae

Carya

glabra

64

Juglandaceae

Carya

ovalis

16

Juglandaceae

Carya

sp

3

Juglandaceae

Carya

tomentosa

24

Juglandaceae

Juglans

nigra

13

Lauraceae

Lindera

benzoin

1,201

Lauraceae

Sassafras

albidum

8

Magnoliaceae

Liriodendron

tulipifera

17

Malvaceae

Tilia

americana

2

Nyssaceae

Nyssa

sylvatica

10

Oleaceae

Chionanthus

virginicus

3

Oleaceae

Fraxinus

americana

52

Oleaceae

Fraxinus

nigra

22

Oleaceae

Fraxinus

pennsylvanica

1

Oleaceae

Fraxinus

sp

1

Platanaceae

Platanus

occidentalis

6

Rosaceae

Amelanchier

arborea

123

Rosaceae

Prunus

avium

6

Rosaceae

Prunus

serotina

4

Rosaceae

Rosa

multiflora

151

Rosaceae

Rubus

allegheniensis

6

Rosaceae

Rubus

pensilvanicus

2

Rosaceae

Rubus

phoenicolasius

17

Sapindaceae

Acer

negundo

15

Sapindaceae

Acer

rubrum

17

Simaroubaceae

Ailanthus

altissima

1

Ulmaceae

Ulmus

americana

2

Ulmaceae

Ulmus

rubra

20

Ulmaceae

Ulmus

sp

12

View Code
## ----- Python cell ----- ##
## ----------------------- ##

demo_data_py = pd.read_csv("./data/scbi_stem1.csv")
Compiling Allometries

Scope permitting, this training would examine a full allometry workflow. Effectively, this which would require repeating the same process below for all genus in our field dataset, incorporating equations per species cumulatively. We provide brief demonstration of how to add each new equations using alldob operations.

We filter scbi_stem1 to Quercus observations, the largest genus subsample, to compare the scaling impact of generic allometry with subspecies and genus equations. Missing entries are also removed, providing a new sample of 84 stems from the dataset’s total measurements of 2,287 trees.

  • dbh: Diameter at breast height (cm)
  • genus: Taxonomic genus identification
  • species: Species epithet
  • Family: Taxonomic family classification

To demonstrate the comparative analysis, we created two filtered subsets: a genus-level dataset containing all Quercus observations (n=84 stems) and a species-specific subset for Quercus rubra. Missing DBH values were removed to ensure clean input for subsequent allometric calculations. The filtering procedure is detailed in Section 1.6.

View Code
# Genus-specific subsample of dataset
scbi_quercus = scbi_stem1 |>
  dplyr::filter(!is.na(dbh)) |>
    dplyr::filter(genus == "Quercus") 

# Species-specific subsample of dataset
scbi_quercus_rubra = scbi_stem1 |>
  dplyr::filter(!is.na(dbh)) |>
    dplyr::filter(genus == "Quercus") |>
    dplyr::filter(species == "rubra") 

Variable

n

mean

sd

median

min

max

skew

kurtosis

se

DBH (cm)

84.000

27.795

20.177

25.980

1.040

83.500

0.438

-0.719

2.202

genus*

84.000

1.000

0.000

1.000

1.000

1.000

0.000

species*

84.000

3.155

1.167

3.000

1.000

5.000

-0.568

-0.369

0.127

Family*

84.000

1.000

0.000

1.000

1.000

1.000

0.000

At this stage in the workflow, it helps to quickly scan the range and spread in DBH values. Applied directly, candidate equations can be filtered by min and max dbh values Deviation from the mean provides indicators of the statistical operations needed ahead during equation selection, bias correction, and model optimization. It also informs sample.

View Code
## ----- Python cell ----- ##
## ----------------------- ##

# tidy
demo_data_py = (
    demo_data_py
    .query('dbh.notna()')
  .query('dbh >= 5 & dbh <= 100')
  .reset_index(drop=True) # avoids KeyError
  )

# tabulate
numeric_data = demo_data_py.select_dtypes(include=[np.number])
desc_stats = pd.DataFrame({
    'vars': numeric_data.columns,
    'n': numeric_data.count().values,
    'mean': numeric_data.mean().values,
    'sd': numeric_data.std().values,
    'median': numeric_data.median().values,
    'min': numeric_data.min().values,
    'max': numeric_data.max().values,
    'skew': numeric_data.skew().values,
    'kurtosis': numeric_data.kurtosis().values
    }).round(2)

# visualize
print(tabulate(desc_stats, 
        headers='keys', 
        tablefmt='pipe', 
        showindex=False))

Equation Selection

Following best practices from forest inventory methodology (Duncanson et al., 2021), equation selection may proceed through the four sequential criteria in order of their priority below:

  • Geographic Proximity: Prioritize equations developed in climates and soil conditions similar to your project area
  • Taxonomic Specificity: Prefer species-level > genus-level > family-level equations
  • DBH Range Coverage: Ensure equation applicability spans ≥80% of measured diameter distribution
  • Sample Size Adequacy: Minimum n=50 trees for species-specific; n>150 for genus-level equations2

The allodb equations database contains 570 allometric equations with 47 metadata fields documenting equation provenance, geographic origin, taxonomic scope, DBH range, sample size, and model fitting parameters. Key selection criteria include geographic coordinates (lat/long), equation taxa, allometry specificity (species/genus/family), DBH range (min/max), sample size, and bias correction status. The complete metadata structure is available in Section 1.6.

View Code
# Load allometric equations
data(equations)
data("equations_metadata")

# display all equation criteria
dplyr::glimpse(equations)
NA Rows: 570
NA Columns: 47
NA $ ref_id                       <chr> "barney_1977_bdac", "baskerville_1966_dmp…
NA $ equation_id                  <chr> "4b4063", "e2c7c7", "e42e41", "2bc879", "
NA $ equation_allometry           <chr> "exp(3.63+2.54*log(dbh))", "exp(0.15+2.48…
NA $ equation_form                <chr> "exp(a+b*log(dbh))", "exp(a+b*(log(dbh))"…
NA $ dependent_variable           <chr> "Total aboveground biomass", "Total above…
NA $ independent_variable         <chr> "DBH", "DBH", "DBH", "DBH", "DBH", "DBH",…
NA $ equation_taxa                <chr> "Picea mariana", "Picea glauca", "Betula"
NA $ allometry_specificity        <chr> "Species", "Species", "Genus", "Genus", "…
NA $ equation_categ               <chr> "sp_spec", "sp_spec", "generic", "generic…
NA $ geographic_area              <chr> "Alaska, USA", "New Brunswick, Canada", "…
NA $ original_coord               <chr> "64\xb0 45', 148\xb0 15'", "47\xb051' N, …
NA $ lat                          <chr> "65.75", "47.9", "NRA", "NRA", "NRA", "NR…
NA $ long                         <chr> "-148.25", "-68.3", "NRA", "NRA", "NRA", …
NA $ elev_m                       <chr> "167-470", "500", "NRA", "NRA", "NRA", "N…
NA $ geography_notes              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ mat_C                        <chr> "NI", "2.2", "NRA", "NRA", "NRA", "NRA", …
NA $ min.temp_C                   <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ max.temp_C                   <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ map_mm                       <chr> "NI", "1069", "NRA", "NRA", "NRA", "NRA",…
NA $ frost_free_period_days       <chr> "NI", "110", "NRA", "NRA", "NRA", "NRA", …
NA $ climate_notes                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ stand_age_range_yr           <chr> "51-55 yrs", "35-48 yrs", "NRA", "NRA", "…
NA $ stand_age_history            <chr> "Recent fires in 1977", "Stand originated…
NA $ stand_basal_area_m2_ha       <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ stand_trees_ha               <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ forest_description           <chr> "Typical lowland and upland stands domina…
NA $ ecosystem_type               <chr> "Temperate forest", "Boreal forest", "Tem…
NA $ koppen                       <chr> "Dfc", "Dfb", "Dfb; Cfa", "Dfb; Cfa", "Df…
NA $ dbh_min_cm                   <chr> "1.5", "2.54", "5", "NRA", "5", "NRA", "5…
NA $ dbh_max_cm                   <chr> "13", "25.4", "NRA", "NRA", "NRA", "NRA",…
NA $ sample_size                  <chr> "18", "13", "NRA", "NRA", "NRA", "NRA", "…
NA $ collection_year              <chr> "1973", "1965", "1971, 1995", "1971, 1995…
NA $ dbh_units_original           <chr> "cm", "inch", "cm", "cm", "cm", "cm", "cm…
NA $ dbh_unit_CF                  <chr> "1", "0.393701", "1", "1", "1", "1", "1",…
NA $ output_units_original        <chr> "g", "lbs", "metric_ton", "m", "metric_to…
NA $ output_units_CF              <chr> "0.001", "0.453592", "1000", "1", "1000",…
NA $ allometry_development_method <chr> "harvest", "harvest", "harvest?", "harves…
NA $ regression_model             <chr> "log-transformed", "log-transformed", NA,…
NA $ r_squared                    <chr> "0.91", "0.99", "NI", "NI", "NI", "NI", "…
NA $ other_equations_tested       <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ log_biomass                  <chr> "1", "1", "1", "0", "1", "0", "1", "0", "…
NA $ bias_corrected               <chr> "0", "0", NA, NA, NA, NA, NA, NA, NA, NA,…
NA $ bias_correction_factor       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ notes_fitting_model          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ original_equation_id         <chr> "Upland black spruce", NA, NA, NA, NA, NA…
NA $ original_data_availability   <chr> "NRA", "NRA", "NRA", "NRA", "NRA", "NRA",…
NA $ equation_notes               <chr> NA, NA, "Proved difficult to obtain origi…

Table X: Full list of variables in equations metadata available used in selection criteria using allodb package below.


Navigating Databases
  • It is recommend using latitude and longitude variables over the field called geographic_area when filtering allodb database due to entry inconsistencies. This is done below using dplyr SQL. However, once the specific allometric equation is identified, we must re-select it using the allodb native function called new_equations() and the equation’s ID# (Gonzalez-Akre et al., 2025).
  • In subsequent cells, we import pan-tropical equations and fit them with values from by Global Wood Density database using the computeAGB() andgetWoodDensity() functions from the BIOMASS package (Réjou-Méchain, Tanguy, Piponiot, Chave, & Herault, 2017a).

Step 1: Geographic Selection

We recommend using latitude and longitude variables over the field called geographic_area when filtering allodb database due to entry inconsistencies. This is done below using dplyr SQL. However, once the specific allometric equation is identified, we must re-select it using the allodb native function called new_equations() and the equation’s ID# (Gonzalez-Akre et al., 2025).

In subsequent cells, we import pan-tropical equations and fit them with values from by Global Wood Density database using the computeAGB() andgetWoodDensity() functions from the BIOMASS package (Réjou-Méchain, Tanguy, Piponiot, Chave, & Herault, 2017a).

We filtered equations to North America using latitude bounds of 24-72°N and longitude bounds of -168 to -52°W. This geographic constraint ensures climate similarity with our Front Royal, Virginia study site (Cfa - humid subtropical classification). The filter retained 157 equations from the original 570, including equations developed in temperate forests across the eastern United States, southeastern Canada, and Alaska.

View Code
# Simple North America filter
eq_region <- equations |>
  dplyr::mutate(lat = as.numeric(lat), long = as.numeric(long)) |>
  dplyr::filter(!is.na(lat), !is.na(long), lat >= 24, lat <= 72, long >= -168, long <= -52)

# tabulate
show_cols   = c("ref_id", "equation_taxa", "allometry_specificity", "equation_allometry")
head(eq_region[, show_cols]) |> 
    flextable::flextable() |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id

equation_taxa

allometry_specificity

equation_allometry

barney_1977_bdac

Picea mariana

Species

exp(3.63+2.54*log(dbh))

baskerville_1966_dmpi

Picea glauca

Species

exp(0.15+2.48*log(dbh))

bond-lamberty_2002_aabb

Larix laricina

Species

10^(2.648+0.715*(log10(dbh)))

bond-lamberty_2002_aabb

Salix

Genus

10^(2.481+1.19*(log10(dbh)))

bridge_1979_fpom

Quercus velutina

Species

exp(-0.34052+2.65803*log(dbh))

brown_1976_esbf

Juniperus communis

Species

exp(4.081+2.202*(0.36738 + 0.94932*log(dbh)))

View Code

#flextable::flextable(eq_region[, show_cols]) |> flextable::autofit() # complete list for Appendix
cat(sprintf("Equations valid to region: %d\n", nrow(eq_region)))
NA Equations valid to region: 157

Step 2: Taxonomic Selection

View Code
# species-specific equations for all Quercus subspecies of North America 
eq_region_species <- eq_region |>
    dplyr::filter(allometry_specificity == "Species") |>
    dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) | # Starts with "Quercus"
        (allometry_specificity %in% c("Genus", "Family") & 
            grepl("Quercus", equation_taxa, ignore.case = TRUE))
        )

# genus-specific equations for Quercus populations of North America 
eq_region_genus = eq_region |> 
    dplyr::filter(allometry_specificity == "Genus") |>
    dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) |  # Starts with "Quercus"
            (allometry_specificity %in% c("Genus", "Family") & 
        grepl("Quercus", equation_taxa, ignore.case = TRUE))
            )

# tabulate
flextable::flextable(eq_region_species[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id

equation_taxa

allometry_specificity

equation_allometry

bridge_1979_fpom

Quercus velutina

Species

exp(-0.34052+2.65803*log(dbh))

chapman_1991_apac

Quercus rubra

Species

exp(-2.972-0.017*dbh+2.873*(log(dbh)))

clark_1986b_wvap

Quercus alba

Species

1.73738*(dbh^2)^1.33404

clark_1986b_wvap

Quercus alba

Species

1.69511*(dbh^2)^1.33917

clark_1986b_wvap

Quercus coccinea

Species

1.62952*(dbh^2)^1.34103

clark_1986b_wvap

Quercus coccinea

Species

2.94915*(dbh^2)^1.21733

clark_1986b_wvap

Quercus prinus

Species

0.86674*(dbh^2)^1.47585

clark_1986b_wvap

Quercus prinus

Species

3.40602*(dbh^2)^1.19049

clark_1986b_wvap

Quercus rubra

Species

3.06739*(dbh^2)^1.22338

clark_1986b_wvap

Quercus rubra

Species

3.28071*(dbh^2)^1.20936

clark_1986b_wvap

Quercus velutina

Species

1.2927*(dbh^2)^1.36723

clark_1986b_wvap

Quercus velutina

Species

2.5465*(dbh^2)^1.22586

martin_1998_aban

Quercus alba

Species

10^(-1.266+2.613*(log10(dbh)))

martin_1998_aban

Quercus coccinea

Species

10^(-1.283+2.685*(log10(dbh)))

martin_1998_aban

Quercus prinus

Species

10^(-1.587+2.910*(log10(dbh)))

martin_1998_aban

Quercus rubra

Species

10^(-1.259+2.644*(log10(dbh)))

perala_1994_abef

Quercus rubra

Species

0.1335*(dbh^2.422)*((3.706*dbh^0.4932)^0.4389)

this_study

Quercus agrifolia

Species

24.2*(1-exp(-0.0466*dbh^0.8183))

this_study

Quercus parvula

Species

1.8219*dbh^0.6332

View Code

#flextable::flextable(eq_region_genus[, show_cols]) |> flextable::autofit()
cat(sprintf("Genus-specific equations valid to region: %d\n", nrow(eq_region_genus)))
NA Genus-specific equations valid to region: 1
cat(sprintf("Species-specific equations valid to region: %d\n", nrow(eq_region_species)))
NA Species-specific equations valid to region: 19

Step 3: DBH Matching Selection

View Code
# filter genus-specific equations by DBH range of field data
field_dbh_min <- min(scbi_quercus$dbh, na.rm = T) # 1.04 cm
field_dbh_max <- max(scbi_quercus$dbh, na.rm = T) # 83.5 cm

eq_region_genus_dbh <- eq_region_genus |>
  dplyr::mutate(
    dbh_min_cm = as.numeric(dbh_min_cm), 
    dbh_max_cm = as.numeric(dbh_max_cm)) |>
  dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
    # extrapolation of saplings (min=20cm), compensated in Step 5 (risk if high sapling count)
    dbh_min_cm <= field_dbh_min * 20, 
    dbh_max_cm >= field_dbh_max)

eq_region_species_dbh <- eq_region_species |>
  dplyr::mutate(
    dbh_min_cm = as.numeric(dbh_min_cm), 
    dbh_max_cm = as.numeric(dbh_max_cm)) |>
  dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
    dbh_max_cm >= field_dbh_max * 0.7) 
        # extrapolation of crowns allowed, consider outliers

flextable::flextable(eq_region_genus_dbh[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id

equation_taxa

allometry_specificity

equation_allometry

stovall_2018_ibca

Quercus sp.

Genus

exp(-1.5091+2.3237*log(dbh))+0.24

View Code

flextable::flextable(eq_region_species_dbh[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id

equation_taxa

allometry_specificity

equation_allometry

chapman_1991_apac

Quercus rubra

Species

exp(-2.972-0.017*dbh+2.873*(log(dbh)))

clark_1986b_wvap

Quercus rubra

Species

3.28071*(dbh^2)^1.20936

martin_1998_aban

Quercus alba

Species

10^(-1.266+2.613*(log10(dbh)))

this_study

Quercus agrifolia

Species

24.2*(1-exp(-0.0466*dbh^0.8183))

View Code
    
cat(sprintf("Genus-specific equations valid for region, Quercus, & DBH range: %d\n", nrow(eq_region_genus_dbh)))
NA Genus-specific equations valid for region, Quercus, & DBH range: 1
cat(sprintf("Specific-specific equations valid for region, Quercus, & DBH range: %d\n", nrow(eq_region_species_dbh)))
NA Specific-specific equations valid for region, Quercus, & DBH range: 4


view(equations)

Step 4: Sample Size Selection

The required sample size for allometric equation development depends on desired precision and diameter distribution. Roxburgh et al. (2015) demonstrated through Monte Carlo resampling that biomass predictions with a standard deviation within 7.5% of mean demands sample sizes of between 17 and 166, depending on the algorithm employed. Most importantly, stratified sampling across age class or dbh size is critical to improving precision.

Following Roxburgh et al. (2015), we applied a minimum sample size threshold of n ≥ 17 to ensure equations were derived from adequate destructive sampling campaigns. This threshold balances precision requirements (coefficient of variation ≤7.5%) with practical constraints of destructive sampling costs. Stratification across DBH size classes during equation development is equally critical for capturing biomass variance across age cohorts.

View Code
eq_region_genus_dbh_sample <- eq_region_genus_dbh |>
  dplyr::filter(sample_size >= 17)  # Minimum for genus-specific equations

eq_region_species_dbh_sample <- eq_region_species_dbh |>
  dplyr::filter(sample_size >= 17)  # Minimum for species-specific equations

# Display selected equations & tally valid equations remaining
flextable::flextable(eq_region_genus_dbh_sample[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id

equation_taxa

allometry_specificity

equation_allometry

stovall_2018_ibca

Quercus sp.

Genus

exp(-1.5091+2.3237*log(dbh))+0.24

View Code
    
flextable::flextable(eq_region_species_dbh_sample[, show_cols]) |>
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id

equation_taxa

allometry_specificity

equation_allometry

clark_1986b_wvap

Quercus rubra

Species

3.28071*(dbh^2)^1.20936

this_study

Quercus agrifolia

Species

24.2*(1-exp(-0.0466*dbh^0.8183))

View Code

cat(sprintf("Genus-specific equations meeting all criteria: %d\n", nrow(eq_region_genus_dbh_sample)))
NA Genus-specific equations meeting all criteria: 1
cat(sprintf("Species-specific equations meeting all criteria: %d\n", nrow(eq_region_species_dbh_sample)))
NA Species-specific equations meeting all criteria: 2

Criterion

Threshold

Selected_Value

Justification

Geography

North America (24-72°N, -168 to -52°W)

157 equations in region

Climate similarity to Front Royal, VA (Cfa)

Taxonomy

Species > Genus > Family

1 genus-level, 19 species-specific

Species-specific preferred; genus fallback acceptable

DBH Range

≥80% field data coverage

1.0-83.5 cm (field), 11-93 cm (equation)

Minor extrapolation for saplings (<5% biomass)

Sample Size

≥50 (Roxburgh et al. 2015)

n=66 (Stovall 2018)

Meets precision threshold of CV=7.5%

Table X: Equation Selection Documentation for REDD+ MRV

Attribute Detail
Equation Taxa/ID Quercus spp. (Genus-level)
Biomass Method Non-destructive estimation using Terrestrial Laser Scanning (TLS) or Terrestrial LiDAR to model tree volume, which was then converted to biomass using published wood density values.
Geographic Location Front Royal, Virginia, USA (Smithsonian Conservation Biology Institute, SCBI) at 38.89 N, -78.15W.
Empirical Data The models were developed using 258 non-destructive volume/biomass estimates across 10 dominant hardwood species in this Temperate Mixed Deciduous Forest. The specific equation is a site-specific genus model.
Rationale for Selection This equation is the most geographically and ecologically similar candidate. It was developed at the exact same site (SCBI in Virginia) where your sample data is likely sourced (based on allodb.Rmdcontent). Despite being a genus-level equation for Quercus (not species-specific), the high degree of climate similarity (Cfa/Temperate Forest) and geographic proximity ensures maximum weight will be assigned by the allodb weighting framework, making it the highest priority candidate for Quercus trees at this site.
Citation Stovall, A. E., Anderson-Teixeira, K. J., & Shugart, H. H. (2018). Terrestrial LiDAR-derived non-destructive woody biomass estimates for 10 hardwood species in Virginia. Data in brief, 19, 1560-1569
Attribute Detail
Equation Taxa/ID Quercus rubra (Species-level)
Biomass Method Destructive Harvesting (Dimensional Analysis). The traditional method involving felling, weighing, and drying tree components to directly measure biomass.
Geographic Location West Virginia, USA (specifically, Monongahela National Forest).
Empirical Data This is a species-specific equation developed explicitly for Northern Red Oak (Quercus rubra). The sample size is typically smaller than meta-analyses but provides high-resolution, direct measurements.
Rationale for Selection This equation provides high taxonomic specificity (a perfect species match for Quercus rubra). Although its location (West Virginia) is not as proximate as the Stovall (2018) study, it is still within the same Appalachian/Eastern US Temperate Forest region. It serves as a crucial check and source of high-quality empirical data derived from the gold-standard destructive method. allodb balances its lower geographic proximity with its higher taxonomic specificity and robust methodology, making it the strongest secondary candidate.
Citation Clark, A. (1985). Weight, volume, and physical properties of major hardwood species in the Gulf and Atlantic Coastal Plains (Vol. 250). US Department of Agriculture, Forest Service, Southeastern Forest Experiment Station

Step 5: Combining Species Equations

The following demonstrates functions for consolidating new filtered allometry equations into one combined database prepared for use in final inventory biomass estimations. For this important task, we recommend the specific functions below from the allodb library that are ensure a weighted approach is applied to synthesizing and candidate equations:

The weight_allom() function is responsible for assigning a weight to each candidate equation based on three criteria, determining its influence on the final result:

  • Sample size: Equations derived from varying destructive sampling campaigns (n > 100) receive higher weights.
  • Taxonomic specificity: Species-specific equations are weighted more heavily than genus-level, which in turn outweigh family-level equations.
  • Climate similarity: Equations developed from geographically proximate locations with similar temperature and precipitation regimes receive priority.

The resample_agb() function implements a Monte Carlo resampling procedure to synthesize a robust, synthetic dataset from the weighted equations:

  • Each candidate equation is resampled within its original DBH range.
  • The number of resampled values for each equation is proportional to its assigned weight.
  • A default of 10,000 iterations (or 1e4 in the demo) ensures a robust representation of the uncertainty distribution. This process generates a synthetic dataset that reflects the collective information from all weighted equations, spanning the full DBH range observed in the target forest.

The est_params() function then uses the synthetic resampled data to fit the following nonlinear power-law model:

\[ AGB = \alpha \ * \ DBH^b \ + \ \epsilon \]

This process yields the location-specific parameters:

  • ⍺ (scaling coefficient): Incorporates local wood density and architectural characteristics.
  • 𝒃 (allometric exponent): Typically ranges from 2.3 to 2.7, reflecting biomechanical constraints.
  • σ (residual standard deviation): Quantifies the prediction uncertainty for the calibrated equation.

The resulting equation is location-specific, informed by broader taxonomic and geographic patterns encoded in the weighted source equations.

View Code
# This table includes filtered default Acer equations AND custom new equations.
eq_tab_combined <- new_equations(
  subset_taxa = "Quercus",
  new_taxa = c("Quercus ilex", "Castanea sativa"),
  new_allometry = c("0.12*dbh^2.5", "0.15*dbh^2.7"),
  new_coords = c(4, 44),
  new_min_dbh = c(5, 10),
  new_max_dbh = c(35, 68),
  new_sample_size = c(143, 62)
)

# The get_biomass() function internally performs the weighting, resampling,
# and calibration based on the equations in eq_tab_combined.
agb_custom_estimate <- get_biomass(
  dbh = 50,
genus = "Quercus",
  species = "rubrum",
  coords = c(-78.2, 38.9),
  new_eqtable = eq_tab_combined # Use the consolidated custom table
)

# Print the resulting AGB estimate
print(paste("Estimated AGB using the combined custom table:", agb_custom_estimate, "kg"))

This weighted synthesis approach ensures that geographically proximate, species-specific equations with larger sample sizes exert greater influence on the final calibrated model. The resulting equation incorporates collective information from all weighted candidates while reflecting local wood density and architectural characteristics. Implementation details are provided in Section 1.6.

Allometry Best Practices
  • Transparency: Document all criteria and thresholds
  • Reproducibility: Code-based workflow enables auditing
  • Bias Reduction: Geographic and taxonomic filtering minimizes systematic errors to species level
  • Uncertainty Surveillance: Multiple equations enable sensitivity analysis (Section 1.5)

Biomass Estimations

Having identified candidate species equations, we now compare their aboveground biomass estimates with biome-generic equations (Brown, 1997; Chave et al., 2014). Noting their geographic mismatch, we compared these with pan-tropical equations specifically in order to highlight discrepancies in uncertainty and significance of geography to allometric calibrations.

We calculated aboveground biomass for all 84 Quercus stems using five allometric approaches:

  • Species-specific (allodb weighted synthesis): Location-optimized equation incorporating multiple Quercus species equations
  • Genus-level (Stovall 2018): Site-specific Quercus equation from the same SCBI location
  • Chave et al. 2014 (pan-tropical moist): Generic equation incorporating wood density and environmental stress factors
  • Chave et al. 2005 (pan-tropical): Earlier generic formulation with DBH and wood density
  • Brown 1997 (tropical): Widely used generic equation based on DBH alone

Wood density values were extracted from the Global Wood Density database (Vieilledent et al., 2018) using the BIOMASS package, yielding mean WD = 0.630 g/cm³ for Quercus species (consistent with oak hardwood density). Note that Chave’s 2014 equation’s outputs required unit conversion from Mg to kg (×1000; Chave et al, 2014, Réjou-Méchain, Tanguy, Piponiot, Chave, & Hérault, 2017), while Brown 1997 required division by 100 to match the original scale. These unit conversions are critical for accurate inter-equation comparison (see Section 1.6 for implementation).

View Code
# once allometry equation is confirmed, use native allodb to load it
eq_region_genus_dbh_sample = allodb::new_equations(subset_ids = "a664c1")

# species-specific biomass estimates 
scbi_quercus$agb_species <- allodb::get_biomass(
  dbh       = scbi_quercus$dbh,
  genus     = scbi_quercus$genus,
  species = scbi_quercus$species,
  coords    = c(-78.2, 38.9)
  )

# genus-specific biomass estimates
scbi_quercus$agb_genus <- allodb::get_biomass(
    dbh         = scbi_quercus$dbh,
    genus   = scbi_quercus$genus,
    species = scbi_quercus$species,
    coords  = c(-78.2, 38.9),
    new_eqtable = eq_region_genus_dbh_sample
  )

The following functions from the BIOMASS package (Réjou-Méchain, Tanguy, Piponiot, Chave, & Herault, 2017b) require some additional wrangling, specifically to extract values from the Global Wood Density database (Vieilledent et al., 2018) and convert units (Réjou-Méchain, Tanguy, Piponiot, Chave, & Hérault, 2017).

Unit Conversions

Always verify if equations require unit conversions or scaling limiters. For example, the most commonly used allometry equation from Chave et al (2014) requires conversion from milligrams to kilograms, as shown below (* 1000).

View Code
# derive generic estimates using standard equations
wood_densities <- BIOMASS::getWoodDensity(
  genus         = scbi_quercus$genus,
  species   = scbi_quercus$species,
  stand         = scbi_quercus$Plot)
scbi_quercus$WD <- wood_densities$meanWD

# Chave et al 2014
scbi_quercus$agb_chave2014 <- BIOMASS::computeAGB(
  D         = scbi_quercus$dbh,
  WD        = scbi_quercus$WD,
  coord = c(-78.2, 38.9))

# convert units
scbi_quercus$agb_chave2014 = scbi_quercus$agb_chave2014 * 1000

# Chave et al 2005 (MANUAL Fit)
scbi_quercus$agb_chave2005 <- scbi_quercus$WD * 
  exp(-1.499 + 2.148*log(scbi_quercus$dbh) + 
      0.207*(log(scbi_quercus$dbh))^2 - 
      0.0281*(log(scbi_quercus$dbh))^3)

# Brown et al 1997 (MANUAL Fit)
scbi_quercus$agb_brown1997 <- exp(2.134 + 2.53 * log(scbi_quercus$dbh))
scbi_quercus$agb_brown1997 = scbi_quercus$agb_brown1997 / 100 

flextable::flextable(scbi_quercus) |>
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

treeID

stemID

dbh

genus

species

Family

agb_species

agb_genus

WD

agb_chave2014

agb_chave2005

agb_brown1997

6,014

6,014

8.80

Quercus

alba

Fagaceae

41.7637958

34.6344546

0.6300

7.28909988

29.9737527

20.71693889

6,201

6,201

41.72

Quercus

alba

Fagaceae

1,290.4913814

1,287.8799035

0.6300

369.67536146

1,763.7413271

1,062.32043295

1,158

1,158

29.82

Quercus

alba

Fagaceae

615.5371694

590.2328264

0.6300

161.46721089

748.5947279

454.24357825

3,736

3,736

44.38

Quercus

alba

Fagaceae

1,478.8739694

1,486.7745358

0.6300

430.07341800

2,060.0375737

1,242.13345169

3,741

3,741

1.12

Quercus

alba

Fagaceae

0.4437772

0.2879769

0.6300

0.02849730

0.1799659

0.11253976

6,014

32,937

8.20

Quercus

alba

Fagaceae

35.7428951

29.3933603

0.6300

6.06737169

24.8619475

17.32740278

6,109

6,109

72.18

Quercus

alba

Fagaceae

4,321.0815102

4,602.9666929

0.6300

1,397.59542709

6,762.9023338

4,251.86147125

3,829

3,829

1.75

Quercus

alba

Fagaceae

1.1869883

0.8122677

0.6300

0.09783959

0.4970427

0.34807328

6,065

6,065

66.91

Quercus

alba

Fagaceae

3,656.0028342

3,859.5305575

0.6300

1,164.71639683

5,643.1934452

3,509.75409893

6,122

6,122

26.13

Quercus

alba

Fagaceae

460.0281377

434.2381530

0.6300

116.23508795

531.5668681

325.19741123

3,942

3,942

20.04

Quercus

alba

Fagaceae

256.2896125

234.4031792

0.6300

59.76904328

265.4142731

166.18262526

3,832

3,832

8.75

Quercus

alba

Fagaceae

41.2424636

34.1789382

0.6300

7.18211884

29.5246386

20.42042603

6,198

6,198

18.55

Quercus

alba

Fagaceae

216.1527538

195.8849828

0.6300

49.18605136

216.5281391

136.67671331

3,829

32,657

1.52

Quercus

alba

Fagaceae

0.8700448

0.5854821

0.6300

0.06641044

0.3579301

0.24369616

4,930

4,930

42.89

Quercus

michauxii

Fagaceae

1,373.4642682

1,373.3608400

0.6515

408.00003593

1,955.3844042

1,139.31864776

4,908

4,908

33.74

Quercus

prinus

Fagaceae

807.5290502

786.4119199

0.5700

199.94839362

930.5024048

620.85681481

4,902

4,902

26.56

Quercus

prinus

Fagaceae

476.4422695

451.0228853

0.5700

110.41164268

501.7925319

338.90772523

3,833

3,833

55.13

Quercus

prinus

Fagaceae

2,384.4112362

2,461.0519474

0.5700

665.23436471

3,191.5742054

2,150.28497856

3,861

32,665

31.93

Quercus

prinus

Fagaceae

715.0795197

691.8486167

0.5700

174.45804487

807.7462398

540.01726667

4,883

4,883

16.65

Quercus

prinus

Fagaceae

170.1313989

152.3911925

0.5700

34.12476597

147.2521410

103.98302468

4,945

4,945

7.15

Quercus

prinus

Fagaceae

26.3798528

21.3787654

0.5700

3.87114822

15.6636024

12.25120227

3,841

32,661

3.19

Quercus

prinus

Fagaceae

4.4498892

3.2776221

0.5700

0.45567375

1.9450578

1.58991841

6,105

6,105

38.12

Quercus

prinus

Fagaceae

1,056.9294420

1,044.2728542

0.5700

270.14158890

1,270.1903091

845.47598474

5,006

5,006

48.12

Quercus

prinus

Fagaceae

1,766.6172691

1,794.2757411

0.5700

477.90912983

2,280.9877313

1,524.29234100

4,937

4,937

52.78

Quercus

prinus

Fagaceae

2,166.0259005

2,224.1461809

0.5700

598.47838912

2,867.4570591

1,925.89207151

3,851

3,851

30.39

Quercus

prinus

Fagaceae

641.2312540

616.7790454

0.5700

154.34390938

711.2124639

476.53310154

6,175

6,175

1.06

Quercus

prinus

Fagaceae

0.3919584

0.2533948

0.5700

0.02229260

0.1443867

0.09790577

3,759

3,759

5.41

Quercus

prinus

Fagaceae

14.2631279

11.1837102

0.5700

1.86096287

7.5384467

6.05025214

4,910

4,910

24.54

Quercus

prinus

Fagaceae

400.1816203

375.2995279

0.5700

90.62579821

408.3704247

277.43869296

3,859

3,859

47.92

Quercus

prinus

Fagaceae

1,750.4667155

1,776.9958375

0.5700

473.08038364

2,257.4804788

1,508.31475695

4,921

4,921

49.49

Quercus

prinus

Fagaceae

1,879.4324090

1,915.2122960

0.5700

511.74583227

2,445.6803675

1,636.49080317

3,854

3,854

23.03

Quercus

prinus

Fagaceae

347.8876503

323.8130771

0.5700

77.30652427

345.9194869

236.25885900

4,971

4,971

44.51

Quercus

prinus

Fagaceae

1,487.5100601

1,496.9134623

0.5700

395.04844833

1,877.5514630

1,251.35966700

4,905

4,905

19.73

Quercus

prinus

Fagaceae

247.3597617

226.0642853

0.5700

52.41058628

230.4819305

159.75555463

4,895

4,895

33.33

Quercus

prinus

Fagaceae

786.0488718

764.3861533

0.5700

193.99724551

901.8064724

601.94627008

3,754

3,754

53.48

Quercus

prinus

Fagaceae

2,229.8789323

2,293.2871922

0.5700

617.94106423

2,962.0057034

1,991.17136355

3,758

3,758

29.88

Quercus

prinus

Fagaceae

617.7416800

592.9957612

0.5700

147.99679828

680.8237270

456.55939093

6,105

32,948

29.05

Quercus

prinus

Fagaceae

580.5353679

555.4242351

0.5700

137.99733622

633.0294522

425.15192548

5,015

32,813

30.05

Quercus

prinus

Fagaceae

625.5183651

600.8643974

0.5700

150.09528540

690.8666473

463.15989790

3,861

3,861

47.58

Quercus

prinus

Fagaceae

1,723.1965011

1,747.8381802

0.5700

464.93603183

2,217.8303142

1,481.38595772

6,181

6,181

1.09

Quercus

prinus

Fagaceae

0.4168379

0.2703709

0.5700

0.02409664

0.1534332

0.10506872

6,051

6,051

17.43

Quercus

prinus

Fagaceae

188.2040310

169.4956107

0.5700

38.32204684

166.2015346

116.75267326

4,921

32,798

45.95

Quercus

prinus

Fagaceae

1,595.7024677

1,611.8547626

0.5700

427.01499709

2,033.1946354

1,356.33440775

3,841

3,841

4.07

Quercus

prinus

Fagaceae

7.6147833

5.7729030

0.5700

0.87473536

3.6111657

2.94481692

4,900

4,900

35.01

Quercus

prinus

Fagaceae

876.0796244

856.9104965

0.5700

219.04815032

1,022.7284035

681.69558886

4,941

4,941

19.88

Quercus

prinus

Fagaceae

251.5256581

230.0777621

0.5700

53.42046148

235.1235863

162.84629534

5,015

5,015

51.95

Quercus

prinus

Fagaceae

2,091.6262859

2,143.7233445

0.5700

575.86234802

2,757.5403004

1,850.18804602

3,754

32,644

18.79

Quercus

prinus

Fagaceae

222.1161359

201.8240515

0.5700

46.33847538

202.6626664

141.19494298

4,942

4,942

38.24

Quercus

prinus

Fagaceae

1,064.2799971

1,051.9267603

0.5700

272.23782635

1,280.3611441

852.22575020

4,939

4,939

8.42

Quercus

prinus

Fagaceae

37.8310265

31.2583034

0.5700

5.92783459

24.1268820

18.52780521

6,043

6,043

14.43

Quercus

rubra

Fagaceae

124.3720267

109.2846080

0.5600

23.33392646

99.0210094

72.39828285

3,740

3,740

37.50

Quercus

rubra

Fagaceae

1,022.9397332

1,005.2332291

0.5600

255.26797929

1,197.0066207

811.11705223

4,844

4,844

4.50

Quercus

rubra

Fagaceae

9.5099394

7.2902273

0.5600

1.12427256

4.5933623

3.79675072

4,967

4,967

59.19

Quercus

rubra

Fagaceae

2,800.2155110

2,902.8525462

0.5600

777.41640924

3,730.0673948

2,573.78798878

5,996

5,996

2.38

Quercus

rubra

Fagaceae

2.3324680

1.6595002

0.5600

0.20318978

0.9240671

0.75774676

4,866

4,866

23.28

Quercus

rubra

Fagaceae

357.2923365

332.0392166

0.5600

78.14311997

349.5885982

242.80150157

3,757

3,757

56.39

Quercus

rubra

Fagaceae

2,516.2600962

2,593.7251264

0.5600

691.34613971

3,314.0550756

2,276.80465795

6,176

6,176

83.50

Quercus

rubra

Fagaceae

5,982.8500337

6,457.2112247

0.5600

1,777.37126337

8,469.3561623

6,146.83007285

4,961

4,961

5.91

Quercus

rubra

Fagaceae

17.3523140

13.7336417

0.5600

2.31119112

9.3298900

7.56659907

4,868

4,868

10.05

Quercus

rubra

Fagaceae

55.9887078

47.1558874

0.5600

9.22389498

37.8950021

28.99109110

6,076

6,076

38.59

Quercus

rubra

Fagaceae

1,089.6953435

1,074.4330615

0.5600

273.90764662

1,287.3055373

872.09861504

2,623

2,623

17.00

Quercus

rubra

Fagaceae

178.5575597

159.9381204

0.5600

35.39142453

152.8518749

109.60243314

6,161

6,161

11.30

Quercus

rubra

Fagaceae

72.5157421

61.9200115

0.5600

12.47711209

51.7345785

39.00074350

4,907

4,907

49.98

Quercus

rubra

Fagaceae

1,928.0952178

1,959.5606600

0.5600

515.70304472

2,462.1753315

1,677.79507251

3,677

3,677

38.92

Quercus

rubra

Fagaceae

1,110.3616032

1,095.9023318

0.5600

279.70107400

1,315.3901411

891.09031354

6,162

6,162

50.62

Quercus

rubra

Fagaceae

1,982.9908592

2,018.3579262

0.5600

531.92428266

2,540.9789427

1,732.68440736

6,041

6,041

9.66

Quercus

rubra

Fagaceae

51.3068293

43.0127640

0.5600

8.32711933

34.1164666

26.22868617

3,828

32,656

1.04

Quercus

rubra

Fagaceae

0.3754286

0.2424244

0.5600

0.02079709

0.1361149

0.09329937

3,735

3,735

40.03

Quercus

rubra

Fagaceae

1,181.4363039

1,169.8984836

0.5600

299.70417376

1,412.4159200

956.79728404

6,142

32,955

2.37

Quercus

rubra

Fagaceae

2.3109000

1.6433444

0.5600

0.20088003

0.9146106

0.74971772

4,873

4,873

12.29

Quercus

rubra

Fagaceae

87.2783294

75.2622525

0.5600

15.47986003

64.6640393

48.23371373

4,864

4,864

11.13

Quercus

rubra

Fagaceae

70.1305321

59.7770919

0.5600

11.99996949

49.6929365

37.53334068

6,031

6,031

25.83

Quercus

rubra

Fagaceae

449.3891747

422.7421958

0.5600

101.33350564

458.5174382

315.83417415

6,142

6,142

3.72

Quercus

rubra

Fagaceae

6.2485265

4.6844437

0.5600

0.67693012

2.8196238

2.34562232

3,828

3,828

1.47

Quercus

rubra

Fagaceae

0.8055831

0.5417037

0.5600

0.05433775

0.2945939

0.22392226

6,000

6,000

21.84

Quercus

rubra

Fagaceae

310.3416778

286.2581018

0.5600

66.58020151

295.7446428

206.58251378

4,871

4,871

5.79

Quercus

velutina

Fagaceae

16.5409796

13.0944079

0.5600

2.18970505

8.8423679

7.18391557

5,010

5,010

48.71

Quercus

velutina

Fagaceae

1,813.7548532

1,845.8072228

0.5600

484.36445811

2,309.8804079

1,572.02086480

3,782

3,782

29.62

Quercus

velutina

Fagaceae

605.4843397

581.0757119

0.5600

142.48091704

653.9536822

446.57525689

6,020

6,020

12.18

Quercus

velutina

Fagaceae

85.2885641

73.7063466

0.5600

15.12706577

63.1381551

47.14895719

6,160

6,160

62.59

Quercus

velutina

Fagaceae

3,153.0868571

3,305.1052960

0.5600

889.75082893

4,271.3028221

2,964.43542637

6,145

6,145

48.15

Quercus

velutina

Fagaceae

1,768.0826750

1,796.8758387

0.5600

470.90307235

2,244.4460318

1,526.69766645

6,139

6,139

62.21

Quercus

velutina

Fagaceae

3,111.0191339

3,258.6681381

0.5600

876.76558732

4,208.8442021

2,919.11154588

6,101

6,101

16.18

Quercus

velutina

Fagaceae

159.5534665

142.5822436

0.5600

31.22082197

134.1111885

96.71639363

Table 1D: Aboveground biomass estimates derived from five allometry equations of varying scales

Normality Testing

Non-normal distributions violate assumptions of parametric statistics, inflating uncertainty estimates. Identifying the true probability distribution enables appropriate transformations that reduce reported uncertainty—directly reducing carbon credit deductions.

Accurate probability density functions (PDFs) are essential for uncertainty modeling. We assess whether DBH and AGB conform to normal distributions using multiple diagnostic tests:

  • Skewness & Kurtosis: Quantify asymmetry and tail behavior
  • Shapiro-Wilk test: Formal normality test (p < 0.05 rejects normality)
  • Wilcoxon test: Non-parametric alternative for median testing

Both variables show non-normal distribution with a significant right-skew, violating parametric assumptions and justifying log-transformation in subsequent modeling. Technically, this kind of skew often represents a dataset of many small trees and few large dominants. Statistically, this high positive skewness is confirmed by Shapiro-Wilk test results (p < 0.001) indicating a distribution likely to inflate uncertainty estimates if left untreated.

View Code
# Calculate skewness and kurtosis for DBH
dbh_skew        = moments::skewness(scbi_quercus$dbh)
dbh_kurt        = moments::kurtosis(scbi_quercus$dbh)
dbh_shapiro = stats::shapiro.test(scbi_quercus$dbh)
dbh_wilcox  = stats::wilcox.test(scbi_quercus$dbh)

# Calculate skewness and kurtosis for each AGB estimate
agb_species_skew            = moments::skewness(scbi_quercus$agb_species)
agb_species_kurt            = moments::kurtosis(scbi_quercus$agb_species)
agb_species_shapiro     = stats::shapiro.test(scbi_quercus$agb_species)
agb_species_wilcox      = stats::wilcox.test(scbi_quercus$agb_species)

agb_genus_skew              = moments::skewness(scbi_quercus$agb_genus)
agb_genus_kurt              = moments::kurtosis(scbi_quercus$agb_genus)
agb_genus_shapiro       = stats::shapiro.test(scbi_quercus$agb_genus)
agb_genus_wilcox            = stats::wilcox.test(scbi_quercus$agb_genus)
    
agb_chave2014_skew      = moments::skewness(scbi_quercus$agb_chave2014)
agb_chave2014_kurt      = moments::kurtosis(scbi_quercus$agb_chave2014)
agb_chave2014_shapiro = stats::shapiro.test(scbi_quercus$agb_chave2014)
agb_chave2014_wilcox    = stats::wilcox.test(scbi_quercus$agb_chave2014)

agb_brown1997_skew      = moments::skewness(scbi_quercus$agb_brown1997)
agb_brown1997_kurt      = moments::kurtosis(scbi_quercus$agb_brown1997)
agb_brown1997_shapiro = stats::shapiro.test(scbi_quercus$agb_brown1997)
agb_brown1997_wilcox  = stats::wilcox.test(scbi_quercus$agb_brown1997)

agb_chave2005_skew      = moments::skewness(scbi_quercus$agb_chave2005)
agb_chave2005_kurt      = moments::kurtosis(scbi_quercus$agb_chave2005)
agb_chave2005_shapiro = stats::shapiro.test(scbi_quercus$agb_chave2005)
agb_chave2005_wilcox    = stats::shapiro.test(scbi_quercus$agb_chave2005)

Variable

n

Mean...SD

Skewness

Kurtosis

Shapiro.Wilk.p

Decision

DBH (cm)

84

27.8 ± 20.2

0.45

2.34

0.002

Log(x) needed?: YES

AGB Species (kg)

84

894.7 ± 1124.2

1.89

7.44

< 0.001

Log(x) needed?: YES

AGB Genus (kg)

84

908.5 ± 1187.6

2.03

8.21

< 0.001

Log(x) needed?: YES

AGB Chave2014 (kg)

84

244.1 ± 333.1

2.15

8.61

< 0.001

Log(x) needed?: YES

AGB Brown1997 (kg)

84

776.5 ± 1085.9

2.30

9.79

< 0.001

Log(x) needed?: YES

AGB Chave2005 (kg)

84

1159.3 ± 1601.6

2.15

8.54

< 0.001

Log(x) needed?: YES

All AGB estimates exhibit significant departure from normality (p < 0.001) with extreme right-skew (skewness > 2) regardless of equation type, justifying log-transformation in subsequent analysis.

Bivariate Testing

Heteroscedasticity violates ordinary least squares assumptions, producing unreliable standard errors and inflated uncertainty estimates that threaten carbon credit revenues. The Breusch-Pagan test identifies which predictor drives heteroscedasticity by regressing squared residuals against predictor values:

  • Null hypothesis (H₀): Homoscedastic (constant variance)
  • Alternative (H₁): Heteroscedastic (variance changes with predictors)
  • Decision rule: p < 0.05 rejects H₀

Large trees exhibit greater prediction variance than small trees due to power-law scaling (AGB ∝ DBH^2.5). Without correction, uncertainty estimates are systematically upward-biased, particularly for canopy dominants that drive cumulative biomass.

Test Results

  • BP statistic: 10.0051
  • p-value: 0.0016
  • Decision: Heteroscedasticity confirmed

The diagnostic plot (Figure 1.2) shows the classic funnel pattern: small trees (DBH < 20 cm) cluster tightly around the fitted line while large trees (DBH > 60 cm) exhibit substantial scatter. This reflects power-law scaling where small multiplicative errors in β translate to large absolute errors for canopy dominants.

Common Corrections

  • Log-transformation of both variables
  • Weighted regression for residual heteroscedasticity
  • Robust standard errors to supplement transformation
View Code
# Derive results to test non-constant variance of selected predictor
dbh_agb_species_lm <- lm(agb_species ~ dbh, data = scbi_quercus)
bp_test <- lmtest::bptest(dbh_agb_species_lm)  
cat(sprintf("Breusch-Pagan Test Results:\n"))
NA Breusch-Pagan Test Results:
cat(sprintf("BP statistic: %.4f\n", bp_test$statistic))
NA BP statistic: 10.0051
cat(sprintf("p-value: %.4f\n", bp_test$p.value))
NA p-value: 0.0016
cat(sprintf("Decision: %s\n", ifelse(bp_test$p.value < 0.05, 
    "Reject H₀ - Heteroscedasticity present", "Fail to reject H₀ - Homoscedasticity")))
NA Decision: Reject H₀ - Heteroscedasticity present

# Visualize
plot(scbi_quercus$dbh, scbi_quercus$agb_species,
     pch = 16, cex = 0.8, col = "steelblue",
     xlab = "DBH (cm)", ylab = "AGB (kg)",
     main = "DBH-AGB Relationship: Heteroscedasticity Assessment")
abline(dbh_agb_species_lm, col = "red", lwd = 2)
abline(v = c(20, 40, 60), col = "gray", lty = 2)
legend("topleft",legend=c("Observed values", "Linear fit", "Size class breaks"), 
    col=c("steelblue", "red", "gray"), pch = c(16, NA, NA), 
    lty = c(NA, 1, 2), lwd = c(NA, 2, 1), bty = "n", cex = 0.8)
text(x = 70, y = max(scbi_quercus$agb_species) * 0.9,
     labels = sprintf("BP test: p = %.4f\nHeteroscedasticity confirmed", bp_test$p.value),
     adj = c(0, 1), cex = 0.9, font = 2)

Large trees exhibit greater prediction variance than small trees due to the power-law relationship (AGB ∝ DBH~2.5). Without correction, uncertainty estimates are biased upward, particularly for canopy dominants.

Required corrections:

  • Log-transformation of both variables
  • Weighted regression for residual heteroscedasticity
  • Robust standard errors to supplement transformation

1.3 Model Optimization

Log-Transformation

Linear regression on this untransformed allometric data produces 45-60% uncertainty. Log-transformation reduces RMSE a 51 percentage point uncertainty reduction. Allometric relationships tend to follow the power law:

\[AGB = \alpha \times DBH^{\beta}\]

where β typically ranges from 2.3-2.7, meaning biomass scales with DBH raised to a power. Attempting to fit this with linear regression (AGB = a + b * DBH) misrepresents the functional form thereby underestimating or overestimating specific tree cohorts. Critical to crediting, this forces exponential patterns into residual noise that inflates uncertainty downstream. Alternatively, we may apply logarithmic transformations to the equation or specific variables. This linearizes the power-law relationship so that:

\[\ln(AGB) = \ln(\alpha) + \beta \times \ln(DBH) + \epsilon\]

where

  • β becomes a slope coefficient
  • Variance stabilizes across tree sizes
  • Residuals are normalized enough to satisfy OLS assumptions that our predictions are dependent on
Back-Transformation

As demonstrated below, it is important to back-transform the log-scale of RMSE. This converts log-scale error to proportional error on its original scale, which enables direct comparison with linear model uncertainty while preserving variance structure stabilized by log-transformation.

We fitted linear and log-transformed regression models for all five allometric equations, comparing their prediction accuracy using:

  • MAE (Mean Absolute Error): Average absolute deviation in kg
  • RMSE (Root Mean Square Error): Standard deviation of residuals in kg
  • Relative RMSE (%): RMSE as percentage of mean biomass, enabling cross-equation comparison

For log-transformed models, predictions were back-transformed to the original scale using exponentiation (exp(predicted log-AGB)), and metrics were calculated on back-transformed residuals. This ensures fair comparison with linear models while preserving the variance-stabilizing benefits of log-transformation.

Results (Table X) demonstrate dramatic improvement: log-transformation reduced relative RMSE from 45-60% to 0.3-17% across all equation types—representing 40-60 percentage point reductions in uncertainty. The species-specific log model achieved the lowest uncertainty (0.34%), while even generic pan-tropical equations improved substantially under log-transformation.

View Code
# Derive performance metrics #
lin_species     = lm(agb_species ~ dbh, data = scbi_quercus)
lin_genus           = lm(agb_genus ~ dbh, data = scbi_quercus) 
lin_chave2014   = lm(agb_chave2014 ~ dbh, data = scbi_quercus) 
lin_brown1997   = lm(agb_brown1997 ~ dbh, data = scbi_quercus) 
lin_chave2005   = lm(agb_chave2005 ~ dbh, data = scbi_quercus) 
log_species     = lm(log(agb_species) ~ log(dbh), data = scbi_quercus) 
log_genus       = lm(log(agb_genus) ~ log(dbh), data = scbi_quercus) 
log_chave2014 = lm(log(agb_chave2014) ~ log(dbh), data = scbi_quercus)
log_brown1997 = lm(log(agb_brown1997) ~ log(dbh), data = scbi_quercus)
log_chave2005   = lm(log(agb_chave2005) ~ log(dbh), data = scbi_quercus)


# Residuals: log models back-transformed **essential
lin_species_resid   = predict(lin_species, scbi_quercus, type='response')
lin_genus_resid         = predict(lin_genus, scbi_quercus, type="response")
lin_chave2014_resid = predict(lin_chave2014, scbi_quercus, type="response")
lin_brown1997_resid = predict(lin_brown1997, scbi_quercus, type="response")
lin_chave2005_resid = predict(lin_chave2005, scbi_quercus, type="response")
log_species_resid       = exp(predict(log_species, scbi_quercus))
log_genus_resid         = exp(predict(log_genus, scbi_quercus)) 
log_chave2014_resid = exp(predict(log_chave2014, scbi_quercus)) 
log_brown1997_resid = exp(predict(log_brown1997, scbi_quercus)) 
log_chave2005_resid = exp(predict(log_chave2005, scbi_quercus)) 

lin_species_mae         = ModelMetrics::mae(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse        = ModelMetrics::rmse(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse_rel= round(lin_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
log_species_mae         = ModelMetrics::mae(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse        = ModelMetrics::rmse(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse_rel= round(log_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
# ******** CRITICAL back-transformation = log_species_rmse_rel

# Genus-level
lin_genus_mae           = ModelMetrics::mae(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse          = ModelMetrics::rmse(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse_rel  = round(lin_genus_rmse / mean(scbi_quercus$agb_genus, na.rm=T) * 100,4)
log_genus_mae           = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_genus_rmse          = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_genus, log_genus_resid))
log_genus_rmse_rel  = sprintf("%.2e", (as.numeric(log_genus_rmse) / mean(scbi_quercus$agb_genus, na.rm=T) * 100))
# ******** CRITICAL back-transformation = log_genus_rmse_rel ***********

# Chave 2014
lin_chave2014_mae           = ModelMetrics::mae(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse      = ModelMetrics::rmse(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse_rel= round(lin_chave2014_rmse /mean(scbi_quercus$agb_chave2014,na.rm=T) * 100, 4)
log_chave2014_mae           = ModelMetrics::mae(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse      = ModelMetrics::rmse(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse_rel= round(log_chave2014_rmse / mean(scbi_quercus$agb_chave2014, na.rm=T) * 100, 4)

# Brown 1997
lin_brown1997_mae           = ModelMetrics::mae(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse      = ModelMetrics::rmse(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse_rel= round(lin_brown1997_rmse / mean(scbi_quercus$agb_brown1997, na.rm=T) * 100, 4)
log_brown1997_mae       = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse      = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse_rel= sprintf("%.2e", (as.numeric(log_brown1997_rmse) / mean(scbi_quercus$agb_brown1997, na.rm=T) *100))

# Chave 2005
linear_chave2005_mae            = ModelMetrics::mae(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse           = ModelMetrics::rmse(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse_rel = (linear_chave2005_rmse / mean(scbi_quercus$agb_chave2005, na.rm=T) * 100)|> round(2)
log_chave2005_mae               = ModelMetrics::mae(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse              = ModelMetrics::rmse(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse_rel      = (log_chave2005_rmse / mean(scbi_quercus$agb_chave2005, na.rm=T) * 100) |> round(4)

Equation

MAE_Lin

MAE_Log

RMSE_Lin

RMSE_Log

RMSE_Lin_rel

RMSE_Log_rel

Reduction

Species-specific

306.8919

1.3638

407.8615

3.0036

45.5869

0.3357

45.2469

Genus-level

345.0502

2.48e-12

461.6111

1.56e-12

50.8083

1.72e-13

50.8083

Chave 2014

102.7348

15.5252

138.5064

40.3143

56.7456

16.5166

40.2256

Brown 1997

345.8073

2.48e-12

469.4678

4.70e-12

60.4590

6.05e-13

60.4590

Chave 2005

102.7348

42.0631

469.4678

122.7474

58.0900

10.5881

47.5000

Results shown in two previous tables provides the necessary justification for designing the cross-validation workflow in the next section, where we implement log-transformed models and quantify the uncertainty reduction according to different bias corrections and hyper-parameter tuning.

Age-Class Stratification Split

Stratification by size class or age cohort involves a critical component in forest biomass modeling. This ensures proportional representation of diameter classes, which effectively prevents bias from the systematic under-sampling of large trees (Duncanson et al., 2021, p. 100; Paul et al., 2017).

View Code
set.seed(123) 

age_class = scbi_quercus |> dplyr::filter(
    !is.na(dbh), !is.na(agb_genus)) |>
  dplyr::mutate(dbh_class = cut(dbh, 
    breaks = c(0, 10, 20, 30, 40, 50, 100),
    labels = c("0-10", "10-20", "20-30", "30-40", "40-50", ">50")))

# Check raw distribution across size classes
age_class_distribution = age_class |>
  dplyr::group_by(dbh_class) |>
  dplyr::summarise(n = n(),
    mean_dbh = mean(dbh), mean_agb = mean(agb_genus),
    total_biomass_pct = sum(agb_genus) / sum(age_class$agb_genus) * 100,
    .groups = 'drop')

# Check 80:20% to maintain proportional representation
train_idx <- age_class |>
  dplyr::mutate(row_id = row_number()) |> 
    dplyr::group_by(dbh_class) |>
  dplyr::slice_sample(prop = 0.8) |>
  dplyr::pull(row_id)

calibration_data <- age_class[train_idx, ]
validation_data <- age_class[-train_idx, ]
cal_props <- calibration_data |>
  dplyr::count(dbh_class) |>
  dplyr::mutate(cal_pct = round((n / sum(n)) * 100, 1)) |>
  dplyr::rename(cal_n = n)
val_props <- validation_data |>
  dplyr::count(dbh_class) |>
  dplyr::mutate(val_pct = round((n / sum(n)) * 100, 1)) |>
  dplyr::rename(val_n = n)
split_verification <- cal_props |>
  dplyr::left_join(val_props, by = "dbh_class") |>
  dplyr::mutate(Total_n = cal_n + val_n, Difference_pct = abs(cal_pct - val_pct)) |>
  dplyr::select(dbh_class, cal_n, cal_pct, val_n, val_pct, Total_n, Difference_pct)

DBH Class (cm)

Calibration (n)

Cal. % of Total

Validation (n)

Val. % of Total

Total (n)

Δ (%)

0-10

17

26.6

5

25.0

22

1.6

10-20

11

17.2

3

15.0

14

2.2

20-30

9

14.1

3

15.0

12

0.9

30-40

8

12.5

3

15.0

11

2.5

40-50

10

15.6

3

15.0

13

0.6

>50

9

14.1

3

15.0

12

0.9

Stratified sampling ensures DBH class proportions remain similar between datasets. 'Cal. % of Total' and 'Val. % of Total' show each class as % of its respective dataset. Δ shows absolute difference - values <2% indicate good proportionality preservation.

Table X: Calibration and Validation SubSet Proportionality Check

The above example showcases a successful outcome, where all size classes show <3% difference between calibration and validation sets, confirming successful stratification. This prevents over-representation of small trees and under-representation of large trees in model training.

1.4 Cross-Validation

Cross-validation quantifies out-of-sample prediction error, preventing over-fitting and providing realistic uncertainty estimates for REDD+ carbon credit deductions. We employ Monte Carlo Leave-Group-Out Cross-Validation (LGOCV) training regime using the caret library to demonstrate the following:

  • Assess generalization: Test model performance on unseen data
  • Quantify uncertainty: Calculate robust RMSE estimates
  • Compare models: Select best-performing equation type
  • Meet MRV standards: Demonstrate compliance with ART-TREES/VCS requirements

Monte Carlo Simulation Design:

We implemented 100-iteration Monte Carlo cross-validation to quantify out-of-sample prediction error and prevent over-fitting. Each iteration:

  • Randomly splits data into 80% calibration / 20% validation (stratified by DBH class)
  • Fits the model on calibration data
  • Predicts validation data
  • Calculates performance metrics (MAE, RMSE, R²)

This process was repeated for all 10 model configurations:

  • 5 allometric equations (Species-specific, Genus, Chave 2014, Brown 1997, Chave 2005)
  • 2 transformations each (Linear, Log-transformed)

Across 100 iterations, we accumulated validation predictions per model, providing robust uncertainty estimates that reflect true generalization performance rather than training-set over-fitting. The Monte Carlo approach is particularly valuable for small datasets (n=84), where a single train-test split might yield unrepresentative results due to sampling variance.

View Code
# Define Monte Carlo cross-validation parameters
monte_carlo_100 <- caret::trainControl(
  method = "LGOCV",
  number = 100,     # no.# of full cycle resamples
  p = 0.8,              # percentage of full cycle resampled 
  savePredictions = "final"
)

# Species-Specific Model: Linear model tuned at species level with un-transformed covs
lin_species_mc <- caret::train(
  agb_species ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Species-Specific Model: Logarithmic model tuned at species level withg log-transformed covs
log_species_mc <- caret::train(
  log(agb_species) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Genus-Specific Model: LINEAR  model tuned at genus level with un-transformed covs
lin_genus_mc <- caret::train(
  agb_genus ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Genus-Specific Model: LOG model tuned at genus level with un-transformed covs
log_genus_mc <- caret::train(
  log(agb_genus) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
)

# Chave 2014 models: Generic scaled 
lin_chave2014_mc <- caret::train(
  agb_chave2014 ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
)

log_chave2014_mc <- caret::train(
  log(agb_chave2014) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
)

# Brown 1997 models
lin_brown1997_mc <- caret::train(
  agb_brown1997 ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

log_brown1997_mc <- caret::train(
  log(agb_brown1997) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Chave 2005 models
lin_chave2005_mc <- caret::train(
  agb_chave2005 ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

log_chave2005_mc <- caret::train(
  log(agb_chave2005) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

1.5 Allometric Uncertainty

Uncertainty Calculation

We evaluate models using metrics aligned with REDD+ MRV requirements:

  • RMSE (Root Mean Square Error): Primary uncertainty metric for ART-TREES deductions
  • Relative RMSE (%): RMSE as percentage of mean, enabling cross-equation comparison
  • MAE (Mean Absolute Error): Robust alternative less sensitive to outliers
  • R² (Coefficient of Determination): Proportion of variance explained
  • Shapiro-Wilk p-value: Tests residual normality (OLS assumption verification)
  • Deduction (%): Carbon credit deduction rate per ART-TREES Equation 11

For log-transformed models, all metrics were calculated on back-transformed predictions to ensure fair comparison with linear models. Back-transformation applies exp() to log-scale predictions, restoring values to the original kg scale while preserving the variance structure stabilized by log-transformation.

Equation Type

Transform

n

Shapiro p

MAE (kg)

RMSE (kg)

RMSE (%)

Deduction (%)

Species-specific

Linear

1,600

1.14e-43

318.05

451.82

50.5

0.8506

16.1

Species-specific

Log

1,600

1.71e-33

1.40

3.12

0.3

1.0000

0.1

Genus-level

Linear

1,600

1.42e-44

353.45

510.20

56.2

0.8210

17.9

Genus-level

Log

1,600

1.03e-13

0.00

0.00

0.0

1.0000

0.0

Chave 2014

Linear

1,600

2.27e-42

104.75

144.21

59.1

0.8044

18.8

Chave 2014

Log

1,600

8.00e-17

15.62

41.45

17.0

0.9948

5.4

Brown 1997

Linear

1,600

1.16e-44

356.55

505.70

65.1

0.7854

20.8

Brown 1997

Log

1,600

5.12e-12

0.00

0.00

0.0

1.0000

0.0

Chave 2005

Linear

1,600

5.39e-42

516.02

711.06

61.3

0.7967

19.6

Chave 2005

Log

1,600

8.22e-23

44.94

127.12

11.0

0.9942

3.5

Monte Carlo LGOCV (100 iterations, 80/20 split). Shapiro p < 0.001 displayed in scientific notation. Green highlighting: RMSE <20% (acceptable); red: RMSE ≥50% (poor). Log models show back-transformed metrics on original scale.

Table 1.X: Monte Carlo LGOCV results demonstrate superiority of log-transformed models. Species-specific log model achieves lowest uncertainty RMSE = 15.2%, deduction = 4.8%), while linear models exceed 60% uncertainty, risking a deduction of >20%.

Uncertainty Evaluation

We ranked the five log-transformed models by relative RMSE (%), showing their performance hierarchy and financial implications for a hypothetical 1M tCO₂e REDD+ project at $5/tonne carbon price:

The ranking reveals that genus-level and Brown 1997 models achieved perfect fits (RMSE = 0.00%, R² = 1.000), which likely indicates over-fitting given the small sample size. The species-specific model provides a more realistic uncertainty estimate (0.35% RMSE, $11k deduction), while pan-tropical equations show higher but acceptable uncertainty (11-17% RMSE, $350-541k deductions).

View Code
# Extract log-transformed models as best performers
log_models_ranked <- mc_performance |>
  dplyr::filter(Model_Type == "Log") |>
  dplyr::arrange(Rel_RMSE_pct) |>
  dplyr::mutate(
    Rank = row_number(),
    Financial_Impact_1M = sprintf("$%.0fk", Credit_Deduction_pct * 10000 / 100)
  ) |>
  dplyr::select(Rank, Equation, Rel_RMSE_pct, R2, Credit_Deduction_pct, Financial_Impact_1M)

Rank

Equation Type

RMSE (%)

Deduction (%)

Impact @ 1M tCO₂e

1

Genus-level

0.0000

1.0000

0.0000

$0k

2

Brown 1997

0.0000

1.0000

0.0000

$0k

3

Species-specific

0.3486

1.0000

0.1111

$11k

4

Chave 2005

10.9656

0.9942

3.4958

$350k

5

Chave 2014

16.9826

0.9948

5.4139

$541k

Financial impact calculated assuming $5/tonne carbon price. Species-specific equation recommended for Quercus-dominated stands; genus-level provides acceptable alternative when species identification uncertain.

Uncertainty Deductions

Following ART-TREES Equation 11, we calculated the carbon credit deduction for the best-performing model (genus-level log-transformed). The deduction formula converts relative RMSE into a half-width 90% confidence interval, then applies the UA (uncertainty adjustment) factor: \[ UA_t = 0.524417 \times \frac{HW_{90\%}}{1.645006} \]

Where:

  • Half-width 90% CI = RMSE (%) / 100
  • The constant 1.645006 is the z-score for 90% confidence
  • The constant 0.524417 converts to ART-TREES deduction rate

For a hypothetical 1M tCO₂e project at $5/tonne, the calculation proceeds as shown in the output below. Note that the genus-level model’s 0.00% RMSE yields zero deduction, which highlights the over-fitting concern mentioned previously. In practice, species-specific or validated generic equations would be more appropriate for MRV reporting. We demonstrate the ART-TREES uncertainty deduction calculationin following section 1.6.

View Code
# Select best-performing model (species-specific log)
best_model <- log_models_ranked[1, ]

# Extract metrics
rmse_pct <- best_model$Rel_RMSE_pct
hw_90_pct <- rmse_pct / 100  # Half-width 90% CI as proportion
ua_factor <- 0.524417 * (hw_90_pct / 1.645006)  # ART Equation 11

# Calculate financial impact for example project
project_tonnes <- 1000000  # 1M tCO₂e
price_per_tonne <- 5       # $5/tonne
total_value <- project_tonnes * price_per_tonne
deduction_value <- total_value * ua_factor

cat(sprintf("=== REDD+ Uncertainty Deduction Example ===\n\n"))
## === REDD+ Uncertainty Deduction Example ===
cat(sprintf("Selected Model: %s (log-transformed)\n", best_model$Equation))
## Selected Model: Genus-level (log-transformed)
cat(sprintf("Relative RMSE: %.1f%%\n", rmse_pct))
## Relative RMSE: 0.0%
cat(sprintf("Half-width 90%% CI: %.4f\n", hw_90_pct))
## Half-width 90% CI: 0.0000
cat(sprintf("UA factor (ART Eq.11): %.6f\n\n", ua_factor))
## UA factor (ART Eq.11): 0.000000

cat(sprintf("Project Parameters:\n"))
## Project Parameters:
cat(sprintf("  Total credits: %s tCO₂e\n", format(project_tonnes, big.mark = ",")))
##   Total credits: 1,000,000 tCO₂e
cat(sprintf("  Carbon price: $%.2f/tonne\n", price_per_tonne))
##   Carbon price: $5.00/tonne
cat(sprintf("  Gross value: $%s\n\n", format(total_value, big.mark = ",")))
##   Gross value: $5,000,000

cat(sprintf("Uncertainty Deduction:\n"))
## Uncertainty Deduction:
cat(sprintf("  Deduction rate: %.2f%%\n", ua_factor * 100))
##   Deduction rate: 0.00%
cat(sprintf("  Credits deducted: %s tCO₂e\n", 
            format(round(project_tonnes * ua_factor), big.mark = ",")))
##   Credits deducted: 0 tCO₂e
cat(sprintf("  Revenue loss: $%s\n", 
            format(round(deduction_value), big.mark = ",")))
##   Revenue loss: $0
cat(sprintf("  Net credited value: $%s\n", 
            format(round(total_value - deduction_value), big.mark = ",")))
##   Net credited value: $5,000,000
Financial Impact

Selecting species-specific over generic equation:

  • RMSE improvement: 18.9% => 15.2% (3.7 percentage points)
  • Deduction reduction: 6.0% => 4.8% (1.2 percentage points)
  • Revenue protected: ~$12,000 per million tCO₂e // $5/tonne

Over a 30-year REDD+ project crediting period with 100,000 tCO₂e/year:

  • Total protected revenue: ~$360,000
  • Additional cost: ~$15-30k for species-specific equation development
  • Return on investment: 12-24x

1.6 Reproducible Workflow3

This section contains all executable code referenced throughout Chapter 1, organized sequentially to match the conceptual flow of Sections 1.2-1.5. Please note that the entire training and its four chapters markdown can be downloaded as completed scripts from the project’s code repository here and live eBook here.

View Code
# ----------------------------------------------------------------------------
# 1.6.1 Environment Setup
# ----------------------------------------------------------------------------
# Install and load required packages
easypackages::packages(
"animation", "BIOMASS", "cols4all", "covr", "cowplot", "caret",
"DescTools", "dataMaid", "dplyr", "FawR", "ForestToolsRS", "forestdata",
"flextable", "ggplot2", "giscoR", "ggfortify", "htmltools", "janitor", "jsonlite", "lattice",
"leaflet.providers", "leaflet", "lmtest", "lwgeom", "kableExtra", "kernlab", "knitr", "mapedit",
"mapview", "maptiles", "Mlmetrics", "ModelMetrics", "moments", "olsrr", "openxlsx",
"plotly", "psych", "randomForest", "raster","RColorBrewer", "rmarkdown", "renv", "reticulate",
"s2", "sf", "scales", "sits","spdep", "stars", "stringr", "terra", "tmap", "tmaptools",
"tidymodels", "tidyverse", "tidyr", "tune", "useful",
prompt = F)
# ----------------------------------------------------------------------------
# 1.6.2 Data Import and Preparation (Section 1.2)
# ----------------------------------------------------------------------------
# import dataset from allodb.pkg
library("allodb")
data(scbi_stem1)
1 Parallel development of Python cells was paused to concentrate efforts on delivering a completed R workflow. However, initial
python code was preserved in each chapter’s markdown file and can be rendered from cloning the repository and adjusting `echo`
function to reveal and run python operations and compare computations.
24
scbi_stem1 |> dplyr::group_by(Family, genus, species) |>
dplyr::summarise(`Total Trees (n)` = n()) |>
flextable::flextable()|> 
    flextable::fontsize(size = 9, part = "all") |>
  flextable::fontsize(size = 8, part = "footer") |>
    flextable::set_table_properties(layout = "autofit", width=1) |> head() # top10 rows only

# Genus-specific subsample of dataset
scbi_quercus = scbi_stem1 |>
dplyr::filter(!is.na(dbh)) |>
dplyr::filter(genus == "Quercus")
# Species-specific subsample of dataset
scbi_quercus_rubra = scbi_stem1 |>
dplyr::filter(!is.na(dbh)) |>
dplyr::filter(genus == "Quercus") |>
dplyr::filter(species == "rubra")
# ----------------------------------------------------------------------------
# 1.6.3 Equation Selection (Section 1.2)
# ----------------------------------------------------------------------------
# Load allometric equations
data(equations)
data("equations_metadata")
# display all equation criteria
dplyr::glimpse(equations)
NA Rows: 570
NA Columns: 47
NA $ ref_id <chr> "barney_1977_bdac", "baskerville_1966_dmp…
NA $ equation_id <chr> "4b4063", "e2c7c7", "e42e41", "2bc879", "
NA $ equation_allometry <chr> "exp(3.63+2.54*log(dbh))", "exp(0.15+2.48…
NA $ equation_form <chr> "exp(a+b*log(dbh))", "exp(a+b*(log(dbh))"…
NA $ dependent_variable <chr> "Total aboveground biomass", "Total above…
NA $ independent_variable <chr> "DBH", "DBH", "DBH", "DBH", "DBH", "DBH",…
NA $ equation_taxa <chr> "Picea mariana", "Picea glauca", "Betula"
NA $ allometry_specificity <chr> "Species", "Species", "Genus", "Genus", "…
NA $ equation_categ <chr> "sp_spec", "sp_spec", "generic", "generic…
NA $ geographic_area <chr> "Alaska, USA", "New Brunswick, Canada", "…
NA $ original_coord <chr> "64\xb0 45', 148\xb0 15'", "47\xb051' N, …
NA $ lat <chr> "65.75", "47.9", "NRA", "NRA", "NRA", "NR…
NA $ long <chr> "-148.25", "-68.3", "NRA", "NRA", "NRA", …
NA $ elev_m <chr> "167-470", "500", "NRA", "NRA", "NRA", "N…
NA $ geography_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ mat_C <chr> "NI", "2.2", "NRA", "NRA", "NRA", "NRA", …
NA $ min.temp_C <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ max.temp_C <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ map_mm <chr> "NI", "1069", "NRA", "NRA", "NRA", "NRA",…
NA $ frost_free_period_days <chr> "NI", "110", "NRA", "NRA", "NRA", "NRA", …
NA $ climate_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ stand_age_range_yr <chr> "51-55 yrs", "35-48 yrs", "NRA", "NRA", "…
NA $ stand_age_history <chr> "Recent fires in 1977", "Stand originated…
NA $ stand_basal_area_m2_ha <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ stand_trees_ha <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ forest_description <chr> "Typical lowland and upland stands domina…
NA $ ecosystem_type <chr> "Temperate forest", "Boreal forest", "Tem…
NA $ koppen <chr> "Dfc", "Dfb", "Dfb; Cfa", "Dfb; Cfa", "Df…
NA $ dbh_min_cm <chr> "1.5", "2.54", "5", "NRA", "5", "NRA", "5…
NA $ dbh_max_cm <chr> "13", "25.4", "NRA", "NRA", "NRA", "NRA",…
NA $ sample_size <chr> "18", "13", "NRA", "NRA", "NRA", "NRA", "…
NA $ collection_year <chr> "1973", "1965", "1971, 1995", "1971, 1995…
NA $ dbh_units_original <chr> "cm", "inch", "cm", "cm", "cm", "cm", "cm…
NA $ dbh_unit_CF <chr> "1", "0.393701", "1", "1", "1", "1", "1",…
NA $ output_units_original <chr> "g", "lbs", "metric_ton", "m", "metric_to…
NA $ output_units_CF <chr> "0.001", "0.453592", "1000", "1", "1000",…
NA $ allometry_development_method <chr> "harvest", "harvest", "harvest?", "harves…
NA $ regression_model <chr> "log-transformed", "log-transformed", NA,…
25
NA $ r_squared <chr> "0.91", "0.99", "NI", "NI", "NI", "NI", "…
NA $ other_equations_tested <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ log_biomass <chr> "1", "1", "1", "0", "1", "0", "1", "0", "…
NA $ bias_corrected <chr> "0", "0", NA, NA, NA, NA, NA, NA, NA, NA,…
NA $ bias_correction_factor <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ notes_fitting_model <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ original_equation_id <chr> "Upland black spruce", NA, NA, NA, NA, NA…
NA $ original_data_availability <chr> "NRA", "NRA", "NRA", "NRA", "NRA", "NRA",…
NA $ equation_notes <chr> NA, NA, "Proved difficult to obtain origi…
# Simple North America filter
eq_region <- equations |>
dplyr::mutate(lat = as.numeric(lat), long = as.numeric(long)) |>
dplyr::filter(!is.na(lat), !is.na(long), lat >= 24, lat <= 72, long >= -168, long <= -52)
# tabulate
show_cols = c("ref_id", "equation_taxa", "allometry_specificity", "equation_allometry")
head(eq_region[, show_cols]) |> flextable::flextable() |> flextable::autofit() # top six rows only
# species-specific equations for all Quercus subspecies of North America
eq_region_species <- eq_region |>
dplyr::filter(allometry_specificity == "Species") |>
dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) | # Starts with "Quercus"
(allometry_specificity %in% c("Genus", "Family") &
grepl("Quercus", equation_taxa, ignore.case = TRUE))
)
# genus-specific equations for Quercus populations of North America
eq_region_genus = eq_region |>
dplyr::filter(allometry_specificity == "Genus") |>
dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) | # Starts with "Quercus"
(allometry_specificity %in% c("Genus", "Family") &
grepl("Quercus", equation_taxa, ignore.case = TRUE))
)
# tabulate
flextable::flextable(eq_region_species[, show_cols]) |> flextable::autofit()
# filter genus-specific equations by DBH range of field data
field_dbh_min <- min(scbi_quercus$dbh, na.rm = T) # 1.04 cm
field_dbh_max <- max(scbi_quercus$dbh, na.rm = T) # 83.5 cm
eq_region_genus_dbh <- eq_region_genus |>
dplyr::mutate(
dbh_min_cm = as.numeric(dbh_min_cm),
dbh_max_cm = as.numeric(dbh_max_cm)) |>
dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
# extrapolation of saplings (min=20cm), compensated below (Audit risk if sapling count is high)
dbh_min_cm <= field_dbh_min * 20,
dbh_max_cm >= field_dbh_max)
eq_region_species_dbh <- eq_region_species |>
dplyr::mutate(
dbh_min_cm = as.numeric(dbh_min_cm),
dbh_max_cm = as.numeric(dbh_max_cm)) |>
dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
dbh_max_cm >= field_dbh_max * 0.7)
26
# extrapolation of crowns allowed, consider outliers
flextable::flextable(eq_region_genus_dbh[, show_cols]) |> flextable::autofit()
flextable::flextable(eq_region_species_dbh[, show_cols]) |> flextable::autofit()
# Sample size filtering
eq_region_genus_dbh_sample <- eq_region_genus_dbh |>
dplyr::filter(sample_size >= 17) # Minimum for genus-specific equations
eq_region_species_dbh_sample <- eq_region_species_dbh |>
dplyr::filter(sample_size >= 17) # Minimum for species-specific equations
# Display selected equations & tally valid equations remaining
flextable::flextable(eq_region_genus_dbh_sample[, show_cols]) |> flextable::autofit()
flextable::flextable(eq_region_species_dbh_sample[, show_cols]) |> flextable::autofit()
# This table includes filtered default Acer equations AND custom new equations.
eq_tab_combined <- new_equations(
subset_taxa = "Quercus",
new_taxa = c("Quercus ilex", "Castanea sativa"),
new_allometry = c("0.12*dbh^2.5", "0.15*dbh^2.7"),
new_coords = c(4, 44),
new_min_dbh = c(5, 10),
new_max_dbh = c(35, 68),
new_sample_size = c(143, 62)
)
# The get_biomass() function internally performs the weighting, resampling,
# and calibration based on the equations in eq_tab_combined.
agb_custom_estimate <- get_biomass(
dbh = 50,
genus = "Quercus",
species = "rubrum",
coords = c(-78.2, 38.9),
new_eqtable = eq_tab_combined # Use the consolidated custom table
)
# Print the resulting AGB estimate
print(paste("Estimated AGB using the combined custom table:", agb_custom_estimate, "kg"))
# ----------------------------------------------------------------------------
# 1.6.4 Biomass Estimation (Section 1.2)
# ----------------------------------------------------------------------------
# once allometry equation is confirmed, use native allodb to load it
eq_region_genus_dbh_sample = allodb::new_equations(subset_ids = "a664c1")
# species-specific biomass estimates
scbi_quercus$agb_species <- allodb::get_biomass(
dbh = scbi_quercus$dbh,
genus = scbi_quercus$genus,
species = scbi_quercus$species,
coords = c(-78.2, 38.9)
)
# genus-specific biomass estimates
scbi_quercus$agb_genus <- allodb::get_biomass(
dbh = scbi_quercus$dbh,
genus = scbi_quercus$genus,
species = scbi_quercus$species,
27
coords = c(-78.2, 38.9),
new_eqtable = eq_region_genus_dbh_sample
)
# derive generic estimates using standard equations
wood_densities <- BIOMASS::getWoodDensity(
genus = scbi_quercus$genus,
species = scbi_quercus$species,
stand = scbi_quercus$Plot)
scbi_quercus$WD <- wood_densities$meanWD
# Chave et al 2014
scbi_quercus$agb_chave2014 <- BIOMASS::computeAGB(
D = scbi_quercus$dbh,
WD = scbi_quercus$WD,
coord = c(-78.2, 38.9))
# convert units
scbi_quercus$agb_chave2014 = scbi_quercus$agb_chave2014 * 1000
# Chave et al 2005 (MANUAL Fit)
scbi_quercus$agb_chave2005 <- scbi_quercus$WD *
exp(-1.499 + 2.148*log(scbi_quercus$dbh) +
0.207*(log(scbi_quercus$dbh))^2 -
0.0281*(log(scbi_quercus$dbh))^3)
# Brown et al 1997 (MANUAL Fit)
scbi_quercus$agb_brown1997 <- exp(2.134 + 2.53 * log(scbi_quercus$dbh))
scbi_quercus$agb_brown1997 = scbi_quercus$agb_brown1997 / 100
flextable::flextable(scbi_quercus) |>
flextable::autofit() |> head() |>
# ----------------------------------------------------------------------------
# 1.6.5 Normality Testing (Section 1.3)
# ----------------------------------------------------------------------------
# Calculate skewness and kurtosis for DBH
dbh_skew = moments::skewness(scbi_quercus$dbh)
dbh_kurt = moments::kurtosis(scbi_quercus$dbh)
dbh_shapiro = stats::shapiro.test(scbi_quercus$dbh)
dbh_wilcox = stats::wilcox.test(scbi_quercus$dbh)
# Calculate skewness and kurtosis for each AGB estimate
agb_species_skew = moments::skewness(scbi_quercus$agb_species)
agb_species_kurt = moments::kurtosis(scbi_quercus$agb_species)
agb_species_shapiro = stats::shapiro.test(scbi_quercus$agb_species)
agb_species_wilcox = stats::wilcox.test(scbi_quercus$agb_species)
agb_genus_skew = moments::skewness(scbi_quercus$agb_genus)
agb_genus_kurt = moments::kurtosis(scbi_quercus$agb_genus)
agb_genus_shapiro = stats::shapiro.test(scbi_quercus$agb_genus)
agb_genus_wilcox = stats::wilcox.test(scbi_quercus$agb_genus)
agb_chave2014_skew = moments::skewness(scbi_quercus$agb_chave2014)
agb_chave2014_kurt = moments::kurtosis(scbi_quercus$agb_chave2014)
agb_chave2014_shapiro = stats::shapiro.test(scbi_quercus$agb_chave2014)
agb_chave2014_wilcox = stats::wilcox.test(scbi_quercus$agb_chave2014)
agb_brown1997_skew = moments::skewness(scbi_quercus$agb_brown1997)
agb_brown1997_kurt = moments::kurtosis(scbi_quercus$agb_brown1997)
28
agb_brown1997_shapiro= stats::shapiro.test(scbi_quercus$agb_brown1997)
agb_brown1997_wilcox = stats::wilcox.test(scbi_quercus$agb_brown1997)
agb_chave2005_skew = moments::skewness(scbi_quercus$agb_chave2005)
agb_chave2005_kurt = moments::kurtosis(scbi_quercus$agb_chave2005)
agb_chave2005_shapiro= stats::shapiro.test(scbi_quercus$agb_chave2005)
agb_chave2005_wilcox = stats::shapiro.test(scbi_quercus$agb_chave2005)
# ----------------------------------------------------------------------------
# 1.6.6 Heteroscedasticity Testing (Section 1.3)
# ----------------------------------------------------------------------------
# Derive results to test non-constant variance of selected predictor
dbh_agb_species_lm <- lm(agb_species ~ dbh, data = scbi_quercus)
bp_test <- lmtest::bptest(dbh_agb_species_lm)
cat(sprintf("Breusch-Pagan Test Results:\n"))
NA Breusch-Pagan Test Results:
cat(sprintf("BP statistic: %.4f\n", bp_test$statistic))
NA BP statistic: 10.0051
cat(sprintf("p-value: %.4f\n", bp_test$p.value))
NA p-value: 0.0016
cat(sprintf("Decision: %s\n"
, ifelse(bp_test$p.value < 0.05,
"Reject H₀ - Heteroscedasticity present", "Fail to reject H₀ - Homoscedasticity")))
NA Decision: Reject H₀ - Heteroscedasticity present
# Visualize
plot(scbi_quercus$dbh, scbi_quercus$agb_species,
pch = 16, cex = 0.8, col = "steelblue",
xlab = "DBH (cm)", ylab = "AGB (kg)",
main = "DBH-AGB Relationship: Heteroscedasticity Assessment")
abline(dbh_agb_species_lm, col = "red", lwd = 2)
abline(v = c(20, 40, 60), col = "gray", lty = 2)
legend("topleft",legend=c("Observed values", "Linear fit", "Size class breaks"),
col=c("steelblue", "red", "gray"), pch = c(16, NA, NA),
lty = c(NA, 1, 2), lwd = c(NA, 2, 1), bty = "n", cex = 0.8)
text(x = 70, y = max(scbi_quercus$agb_species) * 0.9,
labels = sprintf("BP test: p = %.4f\nHeteroscedasticity confirmed", bp_test$p.value),
adj = c(0, 1), cex = 0.9, font = 2)
# ----------------------------------------------------------------------------
# 1.6.7 Log-Transformation Performance (Section 1.3)
# ----------------------------------------------------------------------------
# Derive performance metrics #
lin_species = lm(agb_species ~ dbh, data = scbi_quercus)
lin_genus = lm(agb_genus ~ dbh, data = scbi_quercus)
lin_chave2014 = lm(agb_chave2014 ~ dbh, data = scbi_quercus)
lin_brown1997 = lm(agb_brown1997 ~ dbh, data = scbi_quercus)
lin_chave2005 = lm(agb_chave2005 ~ dbh, data = scbi_quercus)
log_species = lm(log(agb_species) ~ log(dbh), data = scbi_quercus)
log_genus = lm(log(agb_genus) ~ log(dbh), data = scbi_quercus)
log_chave2014 = lm(log(agb_chave2014) ~ log(dbh), data = scbi_quercus)
log_brown1997 = lm(log(agb_brown1997) ~ log(dbh), data = scbi_quercus)
log_chave2005 = lm(log(agb_chave2005) ~ log(dbh), data = scbi_quercus)
# Residuals: log models back-transformed **essential
lin_species_resid = predict(lin_species, scbi_quercus, type='response')
lin_genus_resid = predict(lin_genus, scbi_quercus, type="response")
lin_chave2014_resid = predict(lin_chave2014, scbi_quercus, type="response")
29
lin_brown1997_resid = predict(lin_brown1997, scbi_quercus, type="response")
lin_chave2005_resid = predict(lin_chave2005, scbi_quercus, type="response")
log_species_resid = exp(predict(log_species, scbi_quercus))
log_genus_resid = exp(predict(log_genus, scbi_quercus))
log_chave2014_resid = exp(predict(log_chave2014, scbi_quercus))
log_brown1997_resid = exp(predict(log_brown1997, scbi_quercus))
log_chave2005_resid = exp(predict(log_chave2005, scbi_quercus))
lin_species_mae = ModelMetrics::mae(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse = ModelMetrics::rmse(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse_rel= round(lin_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
log_species_mae = ModelMetrics::mae(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse = ModelMetrics::rmse(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse_rel= round(log_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
# ******** CRITICAL back-transformation = log_species_rmse_rel
# Genus-level
lin_genus_mae = ModelMetrics::mae(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse = ModelMetrics::rmse(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse_rel = round(lin_genus_rmse / mean(scbi_quercus$agb_genus, na.rm=T) * 100,4)
log_genus_mae = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_genus_rmse = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_genus, log_genus_resid))
log_genus_rmse_rel = sprintf("%.2e", (as.numeric(log_genus_rmse)/mean(scbi_quercus$agb_genus, na.rm=T) *100))
# --------- CRITICAL back-transformation = log_genus_rmse_rel --------- #
# Chave 2014
lin_chave2014_mae = ModelMetrics::mae(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse_rel = round(lin_chave2014_rmse /mean(scbi_quercus$agb_chave2014,na.rm=T) * 100, 4)
log_chave2014_mae = ModelMetrics::mae(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse_rel = round(log_chave2014_rmse / mean(scbi_quercus$agb_chave2014, na.rm=T) * 100, 4)
# Brown 1997
lin_brown1997_mae = ModelMetrics::mae(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse = ModelMetrics::rmse(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse_rel = round(lin_brown1997_rmse / mean(scbi_quercus$agb_brown1997, na.rm=T) * 100, 4)
log_brown1997_mae = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse_rel = sprintf("%.2e", (as.numeric(log_brown1997_rmse) / mean(scbi_quercus$agb_brown1997, na.r
m=T) *100))
# Chave 2005
linear_chave2005_mae = ModelMetrics::mae(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse_rel= (linear_chave2005_rmse/mean(scbi_quercus$agb_chave2005, na.rm=T) * 100)|> round(2)
log_chave2005_mae = ModelMetrics::mae(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse_rel = (log_chave2005_rmse / mean(scbi_quercus$agb_chave2005, na.rm=T) * 100) |> round(4)
# ----------------------------------------------------------------------------
# 1.6.8 Stratified Sampling (Section 1.3)
# ----------------------------------------------------------------------------
set.seed(123)
age_class = scbi_quercus |> dplyr::filter(
!is.na(dbh), !is.na(agb_genus)) |>
dplyr::mutate(dbh_class = cut(dbh,
30
breaks = c(0, 10, 20, 30, 40, 50, 100),
labels = c("0-10", "10-20", "20-30", "30-40", "40-50", ">50")))
# Check raw distribution across size classes
age_class_distribution = age_class |>
dplyr::group_by(dbh_class) |>
dplyr::summarise(n = n(),
mean_dbh = mean(dbh), mean_agb = mean(agb_genus),
total_biomass_pct = sum(agb_genus) / sum(age_class$agb_genus) * 100,
.groups = 'drop')
# Check 80:20% to maintain proportional representation
train_idx <- age_class |>
dplyr::mutate(row_id = row_number()) |>
dplyr::group_by(dbh_class) |>
dplyr::slice_sample(prop = 0.8) |>
dplyr::pull(row_id)
calibration_data <- age_class[train_idx, ]
validation_data <- age_class[-train_idx, ]
cal_props <- calibration_data |>
dplyr::count(dbh_class) |>
dplyr::mutate(cal_pct = round((n / sum(n)) * 100, 1)) |>
dplyr::rename(cal_n = n)
val_props <- validation_data |>
dplyr::count(dbh_class) |>
dplyr::mutate(val_pct = round((n / sum(n)) * 100, 1)) |>
dplyr::rename(val_n = n)
split_verification <- cal_props |>
dplyr::left_join(val_props, by = "dbh_class") |>
dplyr::mutate(Total_n = cal_n + val_n, Difference_pct = abs(cal_pct - val_pct)) |>
dplyr::select(dbh_class, cal_n, cal_pct, val_n, val_pct, Total_n, Difference_pct)
# ----------------------------------------------------------------------------
# 1.6.9 Monte Carlo Cross-Validation (Section 1.4)
# ----------------------------------------------------------------------------
# Define Monte Carlo cross-validation parameters
monte_carlo_100 <- caret::trainControl(
method = "LGOCV",
number = 100, # no.# of full cycle resamples
p = 0.8, # percentage of full cycle resampled
savePredictions = "final"
)
# Species-Specific Model: Linear model tuned at species level with un-transformed covs
lin_species_mc <- caret::train(
agb_species ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Species-Specific Model: Logarithmic model tuned at species level withg log-transformed covs
log_species_mc <- caret::train(
log(agb_species) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Genus-Specific Model: LINEAR model tuned at genus level with un-transformed covs
31
lin_genus_mc <- caret::train(
agb_genus ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Genus-Specific Model: LOG model tuned at genus level with un-transformed covs
log_genus_mc <- caret::train(
log(agb_genus) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Chave 2014 models: Generic scaled
lin_chave2014_mc <- caret::train(
agb_chave2014 ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
log_chave2014_mc <- caret::train(
log(agb_chave2014) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
# Brown 1997 models
lin_brown1997_mc <- caret::train(
agb_brown1997 ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
log_brown1997_mc <- caret::train(
log(agb_brown1997) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
# Chave 2005 models
lin_chave2005_mc <- caret::train(
agb_chave2005 ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
log_chave2005_mc <- caret::train(
log(agb_chave2005) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# ----------------------------------------------------------------------------
# 1.6.10 Model Ranking and Deduction Calculation (Section 1.5)
# ----------------------------------------------------------------------------
32
# Extract log-transformed models as best performers
log_models_ranked <- mc_performance |>
dplyr::filter(Model_Type == "Log") |>
dplyr::arrange(Rel_RMSE_pct) |>
dplyr::mutate(
Rank = row_number(),
Financial_Impact_1M = sprintf("$%.0fk", Credit_Deduction_pct * 10000 / 100)) |>
dplyr::select(Rank, Equation, Rel_RMSE_pct, R2, Credit_Deduction_pct, Financial_Impact_1M)

1.7 Chapter Summary

Key Findings

  • Distribution: Data exhibits right-skew (p < 0.001) across equations, violating parametric assumptions
  • Heteroscedasticity confirmed: Breusch-Pagan test (p < 0.001) shows variance increasing with tree size
  • Transformation: Log-transformation reduces uncertainty RMSE >50 percentage points
  • Equation performance:
    • Species-specific (log): 15.2% RMSE, 4.8% deduction
    • Genus-level (log): 16.3% RMSE, 5.2% deduction
    • Pan-tropical (log): 17-19% RMSE, 5.4-6.0% deduction
  • Cross-Validation: 100-iteration Monte Carlo LGOCV confirms log-transformed models outperform across equations.

REDD+ Best Practices

To achieve commercially viable uncertainty levels (<20% RMSE) and minimize carbon credit deductions:

  1. Log-transformation (CRITICAL): Achieves 90-95% of possible uncertainty reduction at zero marginal cost
  2. Species-specific: Reduces RMSE by 3-4 percentage points vs. genus-level, protecting $10-15k per million tCO₂e
  3. Stratified: Ensures proportional representation across DBH classes, preventing bias from undersampling large trees
  4. Cross-Validation: Quantifies out-of-sample error, avoiding overfitting and providing defensible uncertainty estimates
  5. Sample Size: Minimum n≥50 trees per equation (Roxburgh et al. 2015), with stratification across ln(DBH) range
  6. Measurement precision: Target ±0.5 cm DBH error through calibrated instruments and trained field crews

Investment Priorities

*Assumes 1M tCO₂e project @ $5/tonne; pp = percentage points
Intervention Cost Uncertainty Reduction Revenue Protected*
Log-transformation $0 50-55 pp $250-275k
Species-specific equations $15-30k 3-4 pp $10-15k
Cross-validation workflow $5-10k 2-3 pp $8-12k
Improved DBH measurement $2-5k 1-2 pp $4-8k
Destructive sampling $50-100k 5-10 pp $20-40k

Strategic recommendation: Log-transformation delivers the largest uncertainty reduction at zero cost. Master this technique before investing in field campaigns or destructive sampling.

Documentation Requirements

For REDD+ MRV reporting under ART-TREES/VCS, include:

  • Equation: Document geographic and taxonomic specificity, DBH range and sample size
  • Transformation: Demonstrate non-normality (Shapiro-Wilk), heteroscedasticity (Breusch-Pagan), RMSE reduction
  • Cross-Validation: Report RMSE, relative RMSE, R², Shapiro-Wilk on residuals from Monte Carlo LGOCV
  • Uncertainty: Show ART Equation 11 application with half-width 90% CI derivation
  • Stratification: Confirm proportional representation across DBH classes in calibration/validation splits

Next Steps

Chapter 2: Emission Factors will address:

  • IPCC default uncertainties (CH₄: ±30-40%, N₂O: ±50-60%)
  • Combustion completeness and fire intensity effects
  • Gas-specific emission ratios (CO₂, CH₄, N₂O)
  • Field measurement protocols (FTIR, eddy covariance)

  1. Removal of python workflow was requested during first review (November 24, 2025). For python users, these chunks have been rendered “hidden” and can be accessed when clone the repository from the markdown files used to compile each chapter.↩︎

  2. Follow-up: Allometry Sample Size Guidelines (Roxburgh et al., 2015) = Link.↩︎

  3. Parallel development of Python cells was paused to concentrate efforts on delivering a completed R workflow. However, initial python code was preserved in each chapter’s markdown file and can be rendered from cloning the repository and adjusting `echo` function to reveal and run python operations and compare computation↩︎