1. Allometry

Overview

Allometric equations represent the proportional and scaling relationships between different tree dimensions, such as the relationship between a tree’s diameter and its height, biomass, or crown size. When trees are considered at a population scale, their different dimensions are statistically related through shared ontogenic development patterns (Gould, 1966). These relationships remain consistent across tree sizes, from saplings to canopy dominants, when growing under similar conditions (Archibald & Bond, 2003; Bohlman & O’Brien, 2006; Dietze et al., 2008; King, 1996).

This biological principle underpins REDD+ carbon accounting: allometric equations translate field measurements of diameter at breast height (DBH) into biomass estimates, forming the foundation of emission reduction quantification.

Allometry Investment

Although Section 8 exempts structural allometric uncertainty when models are applied consistently, the choice of which model to apply directly impacts reported uncertainty. Allometry variance is absolute and compounds more dramatically so that model selection determines the magnitude of random error that cannot be exempted downstream. In practical terms, selecting a model with 20% RMSE versus 8% RMSE determines whether your project faces a 6% or 2% carbon credit deduction, representing a difference worth $200k in a 1M tCO₂^-e project at $5/tonne.

Environment Setup (R)

View Code

easypackages::packages(
    "animation", "BIOMASS", "cols4all", "covr", "cowplot", "caret",
  "DescTools", "dataMaid", "dplyr", "FawR", "ForestToolsRS", "forestdata", 
  "flextable", "ggplot2", "giscoR", "ggfortify", "htmltools", 
    "janitor", "jsonlite", "lattice", "leaflet.providers", "leaflet", 
    "lmtest", "lwgeom", "kableExtra", "kernlab", "knitr", "mapedit", 
    "mapview", "maptiles", "Mlmetrics", "ModelMetrics", "moments", 
    "olsrr", "openxlsx", "plotly", "psych", "randomForest", 
  "raster","RColorBrewer", "rmarkdown", "renv", "reticulate", 
    "s2", "sf", "scales", "sits","spdep", "stars", "stringr", 
    "terra", "tmap", "tmaptools", "tidymodels", "tidyverse", "tidyr", "tune",
    "useful",
  prompt = F
  )

Environment Setup (Python)¹

View Code

# Installer prerequisites
import subprocess, sys

# Quick package install 
subprocess.run([sys.executable, "-m", "pip", "install", "--quiet", "-q",
  "contextily", "folium", "geopandas", "kaleido", "matplotlib", "numpy", 
  "openpyxl", "pandas", "plotly", "pysal", "pyproj", "pingouin","rasterio",
  "scikit-learn", "scipy", "seaborn", "statsmodels", "shapely", "skimpy",
  "xarray"], check=True, capture_output=True)

# Import packages into runtime
import pandas as pd, numpy as np
import matplotlib.pyplot as plt, seaborn as sns
import plotly.express as px, plotly.graph_objects as go
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from scipy import stats
import geopandas as gpd, rasterio, folium
from tabulate import tabulate

1.1 Allometric Equations

In its broadest sense, allometry describes any linear or non-linear correlation between increases in tree dimensions during ontogenic development (Picard et al., 2012). A more restrictive definition, originating with Huxley (1924), refers specifically to proportional relationships between relative increases in dimensions:

Allometric equations predict aboveground biomass from diameter measurements using species- or biome-specific parameters. Uncertainty compounds from three sources: model selection, parameter estimation, and field measurements. Log-transformation adds complexity through required back-transformation:

\[ \frac{\mathrm{d}B}{B} = a\frac{\mathrm{d}D}{D} \]

which integrates to the power relationship:

\[ AGB = \alpha \times DBH^{\beta} \]

and in logarithmic form:

\[ \ln(AGB) = \ln(\alpha) + \beta \times \ln(DBH) + \epsilon \]

Where:

AGB: Aboveground biomass (kg)
DBH: Diameter at breast height (cm)
α, β: Modelling exponents/parameters
ε: Random error term

This power-law form reflects the self-similarity principle observed in tree growth: as trees develop from seedlings to mature individuals, they maintain consistent proportional relationships between their dimensions (Gould, 1971). In this framework, the exponent β serves as the allometric coefficient, quantifying how one dimension changes relative to another during growth. For example, if β = 2.5, a 10% increase in diameter corresponds to a 25% increase in biomass. The parameter α is a scaling constant accounting for wood density, architectural form, and other species-specific characteristics.

West et al. (1997) and Enquist et al. (1999) developed an allometric scaling theory based on the “pipe model” (Shinozaki et al., 1964) and physical growth principles. Their framework predicts a universal exponent of β ≈ 2.67 based on biomechanical constraints, tree stability, and hydraulic resistance. While this theoretical exponent has been debated regarding its universality (Muller-Landau et al., 2006; Zianis & Mencuccini, 2004), it provides a physically grounded benchmark for evaluating empirical equations.

In this chapter, we compare allometry species-specific allometry with biome-generic equations that are more commonly applied in REDD+ programs to demonstrate how uncertainty metrics can inform model selection and impact carbon credit deductions. We adopt the broadest definition of allometry as any correlation between tree dimensions, whether in linear, log-log, power-law, or other functional forms, to examine the following questions:

How should uncertainty metrics inform model selection and optimization?
How does allometry uncertainty impacts carbon credit deductions

1.2 Model Selection

The choice of allometric equation directly determines uncertainty magnitude and carbon credit deductions. This section demonstrates quantitative differences between equation categories using the scbi_stem1 dataset from the ForestGEO plot inventory in Front Royal, Virginia. This 25.6-hectare mature secondary forest is dominated by Appalachian mixed hardwood species including tulip poplar (Liriodendron tulipifera), oaks (Quercus spp.), and hickories (Carya spp.), representing typical stand composition of the Blue Ridge and Piedmont regions.

For REDD+ carbon accounting, we tend to focus exclusively on ontogenic allometry to estimate relationships between accessible tree dimensions and total aboveground biomass as trees grow from seedling to maturity. However, some reading of evolutionary allometry is sometimes needed to address differences in specific traits that emerged in localized conditions. For example, the following species share in genus but vary in wood density, branch architecture, and bole form:

Acer rubrum (red maple): Lower density (0.49 g/cm³), faster growth
Acer saccharum (sugar maple): Higher density (0.63 g/cm³), denser wood
Acer negundo (box elder): Intermediate density (0.42 g/cm³), different architecture

Types of Equations

Allometric equations are classified by taxonomic scope and environmental specificity. In practice, the choice of equation category involves balancing precision, cost, and data availability:

Species-Specific Equations: Developed from destructive sampling of target species, providing highest accuracy but limited geographic applicability (Gonzalez-Akre et al., 2022). These equations capture species-specific morphometric traits, such as branching architecture and wood density, which influence biomass allocation.
Genus-Specific Equations: Aggregated across multiple species within a genus, offering broader applicability with moderate accuracy (Jansen et al., 1996; Jenkins, 2004). Assumes shared evolutionary heritage produces similar allometric scaling within genera.
Biome-Generic Equations: Most commonly used in REDD+ report, pan-tropical, generic equations fitted across broader geographic regions and biomes, maximizing applicability but potentially introducing bias (Brown, 1997; Chave et al., 2009; Chave et al., 2014; West et al., 1997). Often incorporate wood density (WD) or environmental stress factors to capture regional variation.
Environmentally-Conditioned Equations: Gold-standard allometric models that incorporate biophysical variables reflecting site-specific growing conditions, critical for REDD+ programs in specialized ecosystems (Komiyama et al., 2005, 2008; Rahman et al., 2021). These equations explicitly model environmental drivers of growth variation:
- Salinity gradients: Mangrove allometry where salt stress affects growth rate, wood density, and architectural form
- Soil fertility: Nutrient availability influencing wood density and height-diameter relationships
- Climate: Temperature and precipitation gradients captured through environmental stress factors
- Geomorphology: Tidal inundation frequency, elevation, or hydrological regime

Mangroves Tree Height

In specialized growing environments, such as mangrove habitats, the choice of allometry significantly affects accuracy estimates. For example, Rocha de Souza Pereira et al. (2018) found generic allometries to produce -18% & +14% wider error magins than species-specific equations in Brazilian mangrove forests. To compensate, generic equations incorporate proxies of environmental stress, such as wood density, height-diameter ratios or “stunting”, and species composition.

However, while inclusion of tree height significantly reduces bias in AGB estimates (Chave et al., 2014; Rutishauser et al., 2013), the accurate measurement of tree height in closed-canopy forests is especially challenging(King & Clark, 2011). Field data with high levels of tree height variance can limit these destructively sampled allometry models. This represents investment opportunity from reducing uncertainty through improved survey technology, such as LiDAR and RADAR (Feldpausch et al., 2011; Valbuena et al., 2016).

Data Preparation

We imported the scbi_stem1 dataset from the allodb package, which contains 2,287 individual tree measurements from the ForestGEO permanent plot at Front Royal, Virginia. The dataset includes 12 families, spanning multiple genera and species (Table 1.1). This taxonomic diversity allows us to demonstrate equation selection across different levels of specificity.

View Code

# import dataset from allodb.pkg
library("allodb")
data(scbi_stem1)

#scbi_stem1 |> dplyr::group_by(Family) |>
#   dplyr::summarise(`Tree Families sampled` = n()) |>
#   flextable::flextable()|> flextable::autofit()

scbi_stem1 |> dplyr::group_by(Family, genus, species) |>
    dplyr::summarise(`Total Trees (n)` = n()) |>
    flextable::flextable()|> 
    flextable::fontsize(size = 9, part = "body") |>
  flextable::fontsize(size = 8, part = "footer") |>
    flextable::set_table_properties(scbi_stem1, layout = "autofit", width=1, align = "center")

Family	genus	species	Total Trees (n)
Adoxaceae	Sambucus	canadensis	20
Adoxaceae	Viburnum	prunifolium	19
Adoxaceae	Viburnum	recognitum	2
Annonaceae	Asimina	triloba	14
Aquifoliaceae	Ilex	verticillata	32
Betulaceae	Carpinus	caroliniana	57
Betulaceae	Corylus	americana	1
Cannabaceae	Celtis	occidentalis	24
Caprifoliaceae	Lonicera	maackii	1
Cornaceae	Cornus	florida	24
Elaeagnaceae	Elaeagnus	umbellata	1
Fabaceae	Cercis	canadensis	25
Fagaceae	Fagus	grandifolia	27
Fagaceae	Quercus	alba	14
Fagaceae	Quercus	michauxii	1
Fagaceae	Quercus	prinus	35
Fagaceae	Quercus	rubra	26
Fagaceae	Quercus	velutina	8
Hamamelidaceae	Hamamelis	virginiana	107
Juglandaceae	Carya	cordiformis	30
Juglandaceae	Carya	glabra	64
Juglandaceae	Carya	ovalis	16
Juglandaceae	Carya	sp	3
Juglandaceae	Carya	tomentosa	24
Juglandaceae	Juglans	nigra	13
Lauraceae	Lindera	benzoin	1,201
Lauraceae	Sassafras	albidum	8
Magnoliaceae	Liriodendron	tulipifera	17
Malvaceae	Tilia	americana	2
Nyssaceae	Nyssa	sylvatica	10
Oleaceae	Chionanthus	virginicus	3
Oleaceae	Fraxinus	americana	52
Oleaceae	Fraxinus	nigra	22
Oleaceae	Fraxinus	pennsylvanica	1
Oleaceae	Fraxinus	sp	1
Platanaceae	Platanus	occidentalis	6
Rosaceae	Amelanchier	arborea	123
Rosaceae	Prunus	avium	6
Rosaceae	Prunus	serotina	4
Rosaceae	Rosa	multiflora	151
Rosaceae	Rubus	allegheniensis	6
Rosaceae	Rubus	pensilvanicus	2
Rosaceae	Rubus	phoenicolasius	17
Sapindaceae	Acer	negundo	15
Sapindaceae	Acer	rubrum	17
Simaroubaceae	Ailanthus	altissima	1
Ulmaceae	Ulmus	americana	2
Ulmaceae	Ulmus	rubra	20
Ulmaceae	Ulmus	sp	12

View Code

## ----- Python cell ----- ##
## ----------------------- ##

demo_data_py = pd.read_csv("./data/scbi_stem1.csv")

Compiling Allometries

Scope permitting, this training would examine a full allometry workflow. Effectively, this which would require repeating the same process below for all genus in our field dataset, incorporating equations per species cumulatively. We provide brief demonstration of how to add each new equations using alldob operations.

We filter scbi_stem1 to Quercus observations, the largest genus subsample, to compare the scaling impact of generic allometry with subspecies and genus equations. Missing entries are also removed, providing a new sample of 84 stems from the dataset’s total measurements of 2,287 trees.

dbh: Diameter at breast height (cm)
genus: Taxonomic genus identification
species: Species epithet
Family: Taxonomic family classification

To demonstrate the comparative analysis, we created two filtered subsets: a genus-level dataset containing all Quercus observations (n=84 stems) and a species-specific subset for Quercus rubra. Missing DBH values were removed to ensure clean input for subsequent allometric calculations. The filtering procedure is detailed in Section 1.6.

View Code

# Genus-specific subsample of dataset
scbi_quercus = scbi_stem1 |>
  dplyr::filter(!is.na(dbh)) |>
    dplyr::filter(genus == "Quercus") 

# Species-specific subsample of dataset
scbi_quercus_rubra = scbi_stem1 |>
  dplyr::filter(!is.na(dbh)) |>
    dplyr::filter(genus == "Quercus") |>
    dplyr::filter(species == "rubra")

Variable	n	mean	sd	median	min	max	skew	kurtosis	se
DBH (cm)	84.000	27.795	20.177	25.980	1.040	83.500	0.438	-0.719	2.202
genus*	84.000	1.000	0.000	1.000	1.000	1.000			0.000
species*	84.000	3.155	1.167	3.000	1.000	5.000	-0.568	-0.369	0.127
Family*	84.000	1.000	0.000	1.000	1.000	1.000			0.000

At this stage in the workflow, it helps to quickly scan the range and spread in DBH values. Applied directly, candidate equations can be filtered by min and max dbh values Deviation from the mean provides indicators of the statistical operations needed ahead during equation selection, bias correction, and model optimization. It also informs sample.

View Code

## ----- Python cell ----- ##
## ----------------------- ##

# tidy
demo_data_py = (
    demo_data_py
    .query('dbh.notna()')
  .query('dbh >= 5 & dbh <= 100')
  .reset_index(drop=True) # avoids KeyError
  )

# tabulate
numeric_data = demo_data_py.select_dtypes(include=[np.number])
desc_stats = pd.DataFrame({
    'vars': numeric_data.columns,
    'n': numeric_data.count().values,
    'mean': numeric_data.mean().values,
    'sd': numeric_data.std().values,
    'median': numeric_data.median().values,
    'min': numeric_data.min().values,
    'max': numeric_data.max().values,
    'skew': numeric_data.skew().values,
    'kurtosis': numeric_data.kurtosis().values
    }).round(2)

# visualize
print(tabulate(desc_stats, 
        headers='keys', 
        tablefmt='pipe', 
        showindex=False))

Equation Selection

Following best practices from forest inventory methodology (Duncanson et al., 2021), equation selection may proceed through the four sequential criteria in order of their priority below:

Geographic Proximity: Prioritize equations developed in climates and soil conditions similar to your project area
Taxonomic Specificity: Prefer species-level > genus-level > family-level equations
DBH Range Coverage: Ensure equation applicability spans ≥80% of measured diameter distribution
Sample Size Adequacy: Minimum n=50 trees for species-specific; n>150 for genus-level equations²

The allodb equations database contains 570 allometric equations with 47 metadata fields documenting equation provenance, geographic origin, taxonomic scope, DBH range, sample size, and model fitting parameters. Key selection criteria include geographic coordinates (lat/long), equation taxa, allometry specificity (species/genus/family), DBH range (min/max), sample size, and bias correction status. The complete metadata structure is available in Section 1.6.

View Code

# Load allometric equations
data(equations)
data("equations_metadata")

# display all equation criteria
dplyr::glimpse(equations)
NA Rows: 570
NA Columns: 47
NA $ ref_id                       <chr> "barney_1977_bdac", "baskerville_1966_dmp…
NA $ equation_id                  <chr> "4b4063", "e2c7c7", "e42e41", "2bc879", "…
NA $ equation_allometry           <chr> "exp(3.63+2.54*log(dbh))", "exp(0.15+2.48…
NA $ equation_form                <chr> "exp(a+b*log(dbh))", "exp(a+b*(log(dbh))"…
NA $ dependent_variable           <chr> "Total aboveground biomass", "Total above…
NA $ independent_variable         <chr> "DBH", "DBH", "DBH", "DBH", "DBH", "DBH",…
NA $ equation_taxa                <chr> "Picea mariana", "Picea glauca", "Betula"…
NA $ allometry_specificity        <chr> "Species", "Species", "Genus", "Genus", "…
NA $ equation_categ               <chr> "sp_spec", "sp_spec", "generic", "generic…
NA $ geographic_area              <chr> "Alaska, USA", "New Brunswick, Canada", "…
NA $ original_coord               <chr> "64\xb0 45', 148\xb0 15'", "47\xb051' N, …
NA $ lat                          <chr> "65.75", "47.9", "NRA", "NRA", "NRA", "NR…
NA $ long                         <chr> "-148.25", "-68.3", "NRA", "NRA", "NRA", …
NA $ elev_m                       <chr> "167-470", "500", "NRA", "NRA", "NRA", "N…
NA $ geography_notes              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ mat_C                        <chr> "NI", "2.2", "NRA", "NRA", "NRA", "NRA", …
NA $ min.temp_C                   <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ max.temp_C                   <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ map_mm                       <chr> "NI", "1069", "NRA", "NRA", "NRA", "NRA",…
NA $ frost_free_period_days       <chr> "NI", "110", "NRA", "NRA", "NRA", "NRA", …
NA $ climate_notes                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ stand_age_range_yr           <chr> "51-55 yrs", "35-48 yrs", "NRA", "NRA", "…
NA $ stand_age_history            <chr> "Recent fires in 1977", "Stand originated…
NA $ stand_basal_area_m2_ha       <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ stand_trees_ha               <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ forest_description           <chr> "Typical lowland and upland stands domina…
NA $ ecosystem_type               <chr> "Temperate forest", "Boreal forest", "Tem…
NA $ koppen                       <chr> "Dfc", "Dfb", "Dfb; Cfa", "Dfb; Cfa", "Df…
NA $ dbh_min_cm                   <chr> "1.5", "2.54", "5", "NRA", "5", "NRA", "5…
NA $ dbh_max_cm                   <chr> "13", "25.4", "NRA", "NRA", "NRA", "NRA",…
NA $ sample_size                  <chr> "18", "13", "NRA", "NRA", "NRA", "NRA", "…
NA $ collection_year              <chr> "1973", "1965", "1971, 1995", "1971, 1995…
NA $ dbh_units_original           <chr> "cm", "inch", "cm", "cm", "cm", "cm", "cm…
NA $ dbh_unit_CF                  <chr> "1", "0.393701", "1", "1", "1", "1", "1",…
NA $ output_units_original        <chr> "g", "lbs", "metric_ton", "m", "metric_to…
NA $ output_units_CF              <chr> "0.001", "0.453592", "1000", "1", "1000",…
NA $ allometry_development_method <chr> "harvest", "harvest", "harvest?", "harves…
NA $ regression_model             <chr> "log-transformed", "log-transformed", NA,…
NA $ r_squared                    <chr> "0.91", "0.99", "NI", "NI", "NI", "NI", "…
NA $ other_equations_tested       <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ log_biomass                  <chr> "1", "1", "1", "0", "1", "0", "1", "0", "…
NA $ bias_corrected               <chr> "0", "0", NA, NA, NA, NA, NA, NA, NA, NA,…
NA $ bias_correction_factor       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ notes_fitting_model          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ original_equation_id         <chr> "Upland black spruce", NA, NA, NA, NA, NA…
NA $ original_data_availability   <chr> "NRA", "NRA", "NRA", "NRA", "NRA", "NRA",…
NA $ equation_notes               <chr> NA, NA, "Proved difficult to obtain origi…

Table X: Full list of variables in equations metadata available used in selection criteria using allodb package below.

Navigating Databases

It is recommend using latitude and longitude variables over the field called geographic_area when filtering allodb database due to entry inconsistencies. This is done below using dplyr SQL. However, once the specific allometric equation is identified, we must re-select it using the allodb native function called new_equations() and the equation’s ID# (Gonzalez-Akre et al., 2025).
In subsequent cells, we import pan-tropical equations and fit them with values from by Global Wood Density database using the computeAGB() andgetWoodDensity() functions from the BIOMASS package (Réjou-Méchain, Tanguy, Piponiot, Chave, & Herault, 2017a).

Step 1: Geographic Selection

We recommend using latitude and longitude variables over the field called geographic_area when filtering allodb database due to entry inconsistencies. This is done below using dplyr SQL. However, once the specific allometric equation is identified, we must re-select it using the allodb native function called new_equations() and the equation’s ID# (Gonzalez-Akre et al., 2025).

In subsequent cells, we import pan-tropical equations and fit them with values from by Global Wood Density database using the computeAGB() andgetWoodDensity() functions from the BIOMASS package (Réjou-Méchain, Tanguy, Piponiot, Chave, & Herault, 2017a).

We filtered equations to North America using latitude bounds of 24-72°N and longitude bounds of -168 to -52°W. This geographic constraint ensures climate similarity with our Front Royal, Virginia study site (Cfa - humid subtropical classification). The filter retained 157 equations from the original 570, including equations developed in temperate forests across the eastern United States, southeastern Canada, and Alaska.

View Code

# Simple North America filter
eq_region <- equations |>
  dplyr::mutate(lat = as.numeric(lat), long = as.numeric(long)) |>
  dplyr::filter(!is.na(lat), !is.na(long), lat >= 24, lat <= 72, long >= -168, long <= -52)

# tabulate
show_cols   = c("ref_id", "equation_taxa", "allometry_specificity", "equation_allometry")
head(eq_region[, show_cols]) |> 
    flextable::flextable() |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id	equation_taxa	allometry_specificity	equation_allometry
barney_1977_bdac	Picea mariana	Species	exp(3.63+2.54*log(dbh))
baskerville_1966_dmpi	Picea glauca	Species	exp(0.15+2.48*log(dbh))
bond-lamberty_2002_aabb	Larix laricina	Species	10^(2.648+0.715*(log10(dbh)))
bond-lamberty_2002_aabb	Salix	Genus	10^(2.481+1.19*(log10(dbh)))
bridge_1979_fpom	Quercus velutina	Species	exp(-0.34052+2.65803*log(dbh))
brown_1976_esbf	Juniperus communis	Species	exp(4.081+2.202(0.36738 + 0.94932log(dbh)))

View Code


#flextable::flextable(eq_region[, show_cols]) |> flextable::autofit() # complete list for Appendix
cat(sprintf("Equations valid to region: %d\n", nrow(eq_region)))
NA Equations valid to region: 157

Step 2: Taxonomic Selection

View Code

# species-specific equations for all Quercus subspecies of North America 
eq_region_species <- eq_region |>
    dplyr::filter(allometry_specificity == "Species") |>
    dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) | # Starts with "Quercus"
        (allometry_specificity %in% c("Genus", "Family") & 
            grepl("Quercus", equation_taxa, ignore.case = TRUE))
        )

# genus-specific equations for Quercus populations of North America 
eq_region_genus = eq_region |> 
    dplyr::filter(allometry_specificity == "Genus") |>
    dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) |  # Starts with "Quercus"
            (allometry_specificity %in% c("Genus", "Family") & 
        grepl("Quercus", equation_taxa, ignore.case = TRUE))
            )

# tabulate
flextable::flextable(eq_region_species[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id	equation_taxa	allometry_specificity	equation_allometry
bridge_1979_fpom	Quercus velutina	Species	exp(-0.34052+2.65803*log(dbh))
chapman_1991_apac	Quercus rubra	Species	exp(-2.972-0.017dbh+2.873(log(dbh)))
clark_1986b_wvap	Quercus alba	Species	1.73738*(dbh^2)^1.33404
clark_1986b_wvap	Quercus alba	Species	1.69511*(dbh^2)^1.33917
clark_1986b_wvap	Quercus coccinea	Species	1.62952*(dbh^2)^1.34103
clark_1986b_wvap	Quercus coccinea	Species	2.94915*(dbh^2)^1.21733
clark_1986b_wvap	Quercus prinus	Species	0.86674*(dbh^2)^1.47585
clark_1986b_wvap	Quercus prinus	Species	3.40602*(dbh^2)^1.19049
clark_1986b_wvap	Quercus rubra	Species	3.06739*(dbh^2)^1.22338
clark_1986b_wvap	Quercus rubra	Species	3.28071*(dbh^2)^1.20936
clark_1986b_wvap	Quercus velutina	Species	1.2927*(dbh^2)^1.36723
clark_1986b_wvap	Quercus velutina	Species	2.5465*(dbh^2)^1.22586
martin_1998_aban	Quercus alba	Species	10^(-1.266+2.613*(log10(dbh)))
martin_1998_aban	Quercus coccinea	Species	10^(-1.283+2.685*(log10(dbh)))
martin_1998_aban	Quercus prinus	Species	10^(-1.587+2.910*(log10(dbh)))
martin_1998_aban	Quercus rubra	Species	10^(-1.259+2.644*(log10(dbh)))
perala_1994_abef	Quercus rubra	Species	0.1335(dbh^2.422)((3.706*dbh^0.4932)^0.4389)
this_study	Quercus agrifolia	Species	24.2(1-exp(-0.0466dbh^0.8183))
this_study	Quercus parvula	Species	1.8219*dbh^0.6332

View Code


#flextable::flextable(eq_region_genus[, show_cols]) |> flextable::autofit()
cat(sprintf("Genus-specific equations valid to region: %d\n", nrow(eq_region_genus)))
NA Genus-specific equations valid to region: 1
cat(sprintf("Species-specific equations valid to region: %d\n", nrow(eq_region_species)))
NA Species-specific equations valid to region: 19

Step 3: DBH Matching Selection

View Code

# filter genus-specific equations by DBH range of field data
field_dbh_min <- min(scbi_quercus$dbh, na.rm = T) # 1.04 cm
field_dbh_max <- max(scbi_quercus$dbh, na.rm = T) # 83.5 cm

eq_region_genus_dbh <- eq_region_genus |>
  dplyr::mutate(
    dbh_min_cm = as.numeric(dbh_min_cm), 
    dbh_max_cm = as.numeric(dbh_max_cm)) |>
  dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
    # extrapolation of saplings (min=20cm), compensated in Step 5 (risk if high sapling count)
    dbh_min_cm <= field_dbh_min * 20, 
    dbh_max_cm >= field_dbh_max)

eq_region_species_dbh <- eq_region_species |>
  dplyr::mutate(
    dbh_min_cm = as.numeric(dbh_min_cm), 
    dbh_max_cm = as.numeric(dbh_max_cm)) |>
  dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
    dbh_max_cm >= field_dbh_max * 0.7) 
        # extrapolation of crowns allowed, consider outliers

flextable::flextable(eq_region_genus_dbh[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id	equation_taxa	allometry_specificity	equation_allometry
stovall_2018_ibca	Quercus sp.	Genus	exp(-1.5091+2.3237*log(dbh))+0.24

View Code


flextable::flextable(eq_region_species_dbh[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id	equation_taxa	allometry_specificity	equation_allometry
chapman_1991_apac	Quercus rubra	Species	exp(-2.972-0.017dbh+2.873(log(dbh)))
clark_1986b_wvap	Quercus rubra	Species	3.28071*(dbh^2)^1.20936
martin_1998_aban	Quercus alba	Species	10^(-1.266+2.613*(log10(dbh)))
this_study	Quercus agrifolia	Species	24.2(1-exp(-0.0466dbh^0.8183))

View Code

    
cat(sprintf("Genus-specific equations valid for region, Quercus, & DBH range: %d\n", nrow(eq_region_genus_dbh)))
NA Genus-specific equations valid for region, Quercus, & DBH range: 1
cat(sprintf("Specific-specific equations valid for region, Quercus, & DBH range: %d\n", nrow(eq_region_species_dbh)))
NA Specific-specific equations valid for region, Quercus, & DBH range: 4


view(equations)

Step 4: Sample Size Selection

The required sample size for allometric equation development depends on desired precision and diameter distribution. Roxburgh et al. (2015) demonstrated through Monte Carlo resampling that biomass predictions with a standard deviation within 7.5% of mean demands sample sizes of between 17 and 166, depending on the algorithm employed. Most importantly, stratified sampling across age class or dbh size is critical to improving precision.

Following Roxburgh et al. (2015), we applied a minimum sample size threshold of n ≥ 17 to ensure equations were derived from adequate destructive sampling campaigns. This threshold balances precision requirements (coefficient of variation ≤7.5%) with practical constraints of destructive sampling costs. Stratification across DBH size classes during equation development is equally critical for capturing biomass variance across age cohorts.

View Code

eq_region_genus_dbh_sample <- eq_region_genus_dbh |>
  dplyr::filter(sample_size >= 17)  # Minimum for genus-specific equations

eq_region_species_dbh_sample <- eq_region_species_dbh |>
  dplyr::filter(sample_size >= 17)  # Minimum for species-specific equations

# Display selected equations & tally valid equations remaining
flextable::flextable(eq_region_genus_dbh_sample[, show_cols]) |> 
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id	equation_taxa	allometry_specificity	equation_allometry
stovall_2018_ibca	Quercus sp.	Genus	exp(-1.5091+2.3237*log(dbh))+0.24

View Code

    
flextable::flextable(eq_region_species_dbh_sample[, show_cols]) |>
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

ref_id	equation_taxa	allometry_specificity	equation_allometry
clark_1986b_wvap	Quercus rubra	Species	3.28071*(dbh^2)^1.20936
this_study	Quercus agrifolia	Species	24.2(1-exp(-0.0466dbh^0.8183))

View Code


cat(sprintf("Genus-specific equations meeting all criteria: %d\n", nrow(eq_region_genus_dbh_sample)))
NA Genus-specific equations meeting all criteria: 1
cat(sprintf("Species-specific equations meeting all criteria: %d\n", nrow(eq_region_species_dbh_sample)))
NA Species-specific equations meeting all criteria: 2

Criterion	Threshold	Selected_Value	Justification
Geography	North America (24-72°N, -168 to -52°W)	157 equations in region	Climate similarity to Front Royal, VA (Cfa)
Taxonomy	Species > Genus > Family	1 genus-level, 19 species-specific	Species-specific preferred; genus fallback acceptable
DBH Range	≥80% field data coverage	1.0-83.5 cm (field), 11-93 cm (equation)	Minor extrapolation for saplings (<5% biomass)
Sample Size	≥50 (Roxburgh et al. 2015)	n=66 (Stovall 2018)	Meets precision threshold of CV=7.5%

Table X: Equation Selection Documentation for REDD+ MRV

Attribute	Detail
Equation Taxa/ID	Quercus spp. (Genus-level)
Biomass Method	Non-destructive estimation using Terrestrial Laser Scanning (TLS) or Terrestrial LiDAR to model tree volume, which was then converted to biomass using published wood density values.
Geographic Location	Front Royal, Virginia, USA (Smithsonian Conservation Biology Institute, SCBI) at 38.89 N, -78.15W.
Empirical Data	The models were developed using 258 non-destructive volume/biomass estimates across 10 dominant hardwood species in this Temperate Mixed Deciduous Forest. The specific equation is a site-specific genus model.
Rationale for Selection	This equation is the most geographically and ecologically similar candidate. It was developed at the exact same site (SCBI in Virginia) where your sample data is likely sourced (based on `allodb.Rmd`content). Despite being a genus-level equation for Quercus (not species-specific), the high degree of climate similarity (Cfa/Temperate Forest) and geographic proximity ensures maximum weight will be assigned by the `allodb` weighting framework, making it the highest priority candidate for Quercus trees at this site.
Citation	Stovall, A. E., Anderson-Teixeira, K. J., & Shugart, H. H. (2018). Terrestrial LiDAR-derived non-destructive woody biomass estimates for 10 hardwood species in Virginia. Data in brief, 19, 1560-1569

Attribute	Detail
Equation Taxa/ID	Quercus rubra (Species-level)
Biomass Method	Destructive Harvesting (Dimensional Analysis). The traditional method involving felling, weighing, and drying tree components to directly measure biomass.
Geographic Location	West Virginia, USA (specifically, Monongahela National Forest).
Empirical Data	This is a species-specific equation developed explicitly for Northern Red Oak (Quercus rubra). The sample size is typically smaller than meta-analyses but provides high-resolution, direct measurements.
Rationale for Selection	This equation provides high taxonomic specificity (a perfect species match for Quercus rubra). Although its location (West Virginia) is not as proximate as the Stovall (2018) study, it is still within the same Appalachian/Eastern US Temperate Forest region. It serves as a crucial check and source of high-quality empirical data derived from the gold-standard destructive method. `allodb` balances its lower geographic proximity with its higher taxonomic specificity and robust methodology, making it the strongest secondary candidate.
Citation	Clark, A. (1985). Weight, volume, and physical properties of major hardwood species in the Gulf and Atlantic Coastal Plains (Vol. 250). US Department of Agriculture, Forest Service, Southeastern Forest Experiment Station

Step 5: Combining Species Equations

The following demonstrates functions for consolidating new filtered allometry equations into one combined database prepared for use in final inventory biomass estimations. For this important task, we recommend the specific functions below from the allodb library that are ensure a weighted approach is applied to synthesizing and candidate equations:

The weight_allom() function is responsible for assigning a weight to each candidate equation based on three criteria, determining its influence on the final result:

Sample size: Equations derived from varying destructive sampling campaigns (n > 100) receive higher weights.
Taxonomic specificity: Species-specific equations are weighted more heavily than genus-level, which in turn outweigh family-level equations.
Climate similarity: Equations developed from geographically proximate locations with similar temperature and precipitation regimes receive priority.

The resample_agb() function implements a Monte Carlo resampling procedure to synthesize a robust, synthetic dataset from the weighted equations:

Each candidate equation is resampled within its original DBH range.
The number of resampled values for each equation is proportional to its assigned weight.
A default of 10,000 iterations (or 1e4 in the demo) ensures a robust representation of the uncertainty distribution. This process generates a synthetic dataset that reflects the collective information from all weighted equations, spanning the full DBH range observed in the target forest.

The est_params() function then uses the synthetic resampled data to fit the following nonlinear power-law model:

\[ AGB = \alpha \ * \ DBH^b \ + \ \epsilon \]

This process yields the location-specific parameters:

⍺ (scaling coefficient): Incorporates local wood density and architectural characteristics.
𝒃 (allometric exponent): Typically ranges from 2.3 to 2.7, reflecting biomechanical constraints.
σ (residual standard deviation): Quantifies the prediction uncertainty for the calibrated equation.

The resulting equation is location-specific, informed by broader taxonomic and geographic patterns encoded in the weighted source equations.

View Code

# This table includes filtered default Acer equations AND custom new equations.
eq_tab_combined <- new_equations(
  subset_taxa = "Quercus",
  new_taxa = c("Quercus ilex", "Castanea sativa"),
  new_allometry = c("0.12*dbh^2.5", "0.15*dbh^2.7"),
  new_coords = c(4, 44),
  new_min_dbh = c(5, 10),
  new_max_dbh = c(35, 68),
  new_sample_size = c(143, 62)
)

# The get_biomass() function internally performs the weighting, resampling,
# and calibration based on the equations in eq_tab_combined.
agb_custom_estimate <- get_biomass(
  dbh = 50,
genus = "Quercus",
  species = "rubrum",
  coords = c(-78.2, 38.9),
  new_eqtable = eq_tab_combined # Use the consolidated custom table
)

# Print the resulting AGB estimate
print(paste("Estimated AGB using the combined custom table:", agb_custom_estimate, "kg"))

This weighted synthesis approach ensures that geographically proximate, species-specific equations with larger sample sizes exert greater influence on the final calibrated model. The resulting equation incorporates collective information from all weighted candidates while reflecting local wood density and architectural characteristics. Implementation details are provided in Section 1.6.

Allometry Best Practices

Transparency: Document all criteria and thresholds
Reproducibility: Code-based workflow enables auditing
Bias Reduction: Geographic and taxonomic filtering minimizes systematic errors to species level
Uncertainty Surveillance: Multiple equations enable sensitivity analysis (Section 1.5)

Biomass Estimations

Having identified candidate species equations, we now compare their aboveground biomass estimates with biome-generic equations (Brown, 1997; Chave et al., 2014). Noting their geographic mismatch, we compared these with pan-tropical equations specifically in order to highlight discrepancies in uncertainty and significance of geography to allometric calibrations.

We calculated aboveground biomass for all 84 Quercus stems using five allometric approaches:

Species-specific (allodb weighted synthesis): Location-optimized equation incorporating multiple Quercus species equations
Genus-level (Stovall 2018): Site-specific Quercus equation from the same SCBI location
Chave et al. 2014 (pan-tropical moist): Generic equation incorporating wood density and environmental stress factors
Chave et al. 2005 (pan-tropical): Earlier generic formulation with DBH and wood density
Brown 1997 (tropical): Widely used generic equation based on DBH alone

Wood density values were extracted from the Global Wood Density database (Vieilledent et al., 2018) using the BIOMASS package, yielding mean WD = 0.630 g/cm³ for Quercus species (consistent with oak hardwood density). Note that Chave’s 2014 equation’s outputs required unit conversion from Mg to kg (×1000; Chave et al, 2014, Réjou-Méchain, Tanguy, Piponiot, Chave, & Hérault, 2017), while Brown 1997 required division by 100 to match the original scale. These unit conversions are critical for accurate inter-equation comparison (see Section 1.6 for implementation).

View Code

# once allometry equation is confirmed, use native allodb to load it
eq_region_genus_dbh_sample = allodb::new_equations(subset_ids = "a664c1")

# species-specific biomass estimates 
scbi_quercus$agb_species <- allodb::get_biomass(
  dbh       = scbi_quercus$dbh,
  genus     = scbi_quercus$genus,
  species = scbi_quercus$species,
  coords    = c(-78.2, 38.9)
  )

# genus-specific biomass estimates
scbi_quercus$agb_genus <- allodb::get_biomass(
    dbh         = scbi_quercus$dbh,
    genus   = scbi_quercus$genus,
    species = scbi_quercus$species,
    coords  = c(-78.2, 38.9),
    new_eqtable = eq_region_genus_dbh_sample
  )

The following functions from the BIOMASS package (Réjou-Méchain, Tanguy, Piponiot, Chave, & Herault, 2017b) require some additional wrangling, specifically to extract values from the Global Wood Density database (Vieilledent et al., 2018) and convert units (Réjou-Méchain, Tanguy, Piponiot, Chave, & Hérault, 2017).

Unit Conversions

Always verify if equations require unit conversions or scaling limiters. For example, the most commonly used allometry equation from Chave et al (2014) requires conversion from milligrams to kilograms, as shown below (* 1000).

View Code

# derive generic estimates using standard equations
wood_densities <- BIOMASS::getWoodDensity(
  genus         = scbi_quercus$genus,
  species   = scbi_quercus$species,
  stand         = scbi_quercus$Plot)
scbi_quercus$WD <- wood_densities$meanWD

# Chave et al 2014
scbi_quercus$agb_chave2014 <- BIOMASS::computeAGB(
  D         = scbi_quercus$dbh,
  WD        = scbi_quercus$WD,
  coord = c(-78.2, 38.9))

# convert units
scbi_quercus$agb_chave2014 = scbi_quercus$agb_chave2014 * 1000

# Chave et al 2005 (MANUAL Fit)
scbi_quercus$agb_chave2005 <- scbi_quercus$WD * 
  exp(-1.499 + 2.148*log(scbi_quercus$dbh) + 
      0.207*(log(scbi_quercus$dbh))^2 - 
      0.0281*(log(scbi_quercus$dbh))^3)

# Brown et al 1997 (MANUAL Fit)
scbi_quercus$agb_brown1997 <- exp(2.134 + 2.53 * log(scbi_quercus$dbh))
scbi_quercus$agb_brown1997 = scbi_quercus$agb_brown1997 / 100 

flextable::flextable(scbi_quercus) |>
    flextable::fontsize(size = 9, part = "all") |>
    flextable::set_table_properties(layout = "autofit", width = 1)

treeID	stemID	dbh	genus	species	Family	agb_species	agb_genus	WD	agb_chave2014	agb_chave2005	agb_brown1997
6,014	6,014	8.80	Quercus	alba	Fagaceae	41.7637958	34.6344546	0.6300	7.28909988	29.9737527	20.71693889
6,201	6,201	41.72	Quercus	alba	Fagaceae	1,290.4913814	1,287.8799035	0.6300	369.67536146	1,763.7413271	1,062.32043295
1,158	1,158	29.82	Quercus	alba	Fagaceae	615.5371694	590.2328264	0.6300	161.46721089	748.5947279	454.24357825
3,736	3,736	44.38	Quercus	alba	Fagaceae	1,478.8739694	1,486.7745358	0.6300	430.07341800	2,060.0375737	1,242.13345169
3,741	3,741	1.12	Quercus	alba	Fagaceae	0.4437772	0.2879769	0.6300	0.02849730	0.1799659	0.11253976
6,014	32,937	8.20	Quercus	alba	Fagaceae	35.7428951	29.3933603	0.6300	6.06737169	24.8619475	17.32740278
6,109	6,109	72.18	Quercus	alba	Fagaceae	4,321.0815102	4,602.9666929	0.6300	1,397.59542709	6,762.9023338	4,251.86147125
3,829	3,829	1.75	Quercus	alba	Fagaceae	1.1869883	0.8122677	0.6300	0.09783959	0.4970427	0.34807328
6,065	6,065	66.91	Quercus	alba	Fagaceae	3,656.0028342	3,859.5305575	0.6300	1,164.71639683	5,643.1934452	3,509.75409893
6,122	6,122	26.13	Quercus	alba	Fagaceae	460.0281377	434.2381530	0.6300	116.23508795	531.5668681	325.19741123
3,942	3,942	20.04	Quercus	alba	Fagaceae	256.2896125	234.4031792	0.6300	59.76904328	265.4142731	166.18262526
3,832	3,832	8.75	Quercus	alba	Fagaceae	41.2424636	34.1789382	0.6300	7.18211884	29.5246386	20.42042603
6,198	6,198	18.55	Quercus	alba	Fagaceae	216.1527538	195.8849828	0.6300	49.18605136	216.5281391	136.67671331
3,829	32,657	1.52	Quercus	alba	Fagaceae	0.8700448	0.5854821	0.6300	0.06641044	0.3579301	0.24369616
4,930	4,930	42.89	Quercus	michauxii	Fagaceae	1,373.4642682	1,373.3608400	0.6515	408.00003593	1,955.3844042	1,139.31864776
4,908	4,908	33.74	Quercus	prinus	Fagaceae	807.5290502	786.4119199	0.5700	199.94839362	930.5024048	620.85681481
4,902	4,902	26.56	Quercus	prinus	Fagaceae	476.4422695	451.0228853	0.5700	110.41164268	501.7925319	338.90772523
3,833	3,833	55.13	Quercus	prinus	Fagaceae	2,384.4112362	2,461.0519474	0.5700	665.23436471	3,191.5742054	2,150.28497856
3,861	32,665	31.93	Quercus	prinus	Fagaceae	715.0795197	691.8486167	0.5700	174.45804487	807.7462398	540.01726667
4,883	4,883	16.65	Quercus	prinus	Fagaceae	170.1313989	152.3911925	0.5700	34.12476597	147.2521410	103.98302468
4,945	4,945	7.15	Quercus	prinus	Fagaceae	26.3798528	21.3787654	0.5700	3.87114822	15.6636024	12.25120227
3,841	32,661	3.19	Quercus	prinus	Fagaceae	4.4498892	3.2776221	0.5700	0.45567375	1.9450578	1.58991841
6,105	6,105	38.12	Quercus	prinus	Fagaceae	1,056.9294420	1,044.2728542	0.5700	270.14158890	1,270.1903091	845.47598474
5,006	5,006	48.12	Quercus	prinus	Fagaceae	1,766.6172691	1,794.2757411	0.5700	477.90912983	2,280.9877313	1,524.29234100
4,937	4,937	52.78	Quercus	prinus	Fagaceae	2,166.0259005	2,224.1461809	0.5700	598.47838912	2,867.4570591	1,925.89207151
3,851	3,851	30.39	Quercus	prinus	Fagaceae	641.2312540	616.7790454	0.5700	154.34390938	711.2124639	476.53310154
6,175	6,175	1.06	Quercus	prinus	Fagaceae	0.3919584	0.2533948	0.5700	0.02229260	0.1443867	0.09790577
3,759	3,759	5.41	Quercus	prinus	Fagaceae	14.2631279	11.1837102	0.5700	1.86096287	7.5384467	6.05025214
4,910	4,910	24.54	Quercus	prinus	Fagaceae	400.1816203	375.2995279	0.5700	90.62579821	408.3704247	277.43869296
3,859	3,859	47.92	Quercus	prinus	Fagaceae	1,750.4667155	1,776.9958375	0.5700	473.08038364	2,257.4804788	1,508.31475695
4,921	4,921	49.49	Quercus	prinus	Fagaceae	1,879.4324090	1,915.2122960	0.5700	511.74583227	2,445.6803675	1,636.49080317
3,854	3,854	23.03	Quercus	prinus	Fagaceae	347.8876503	323.8130771	0.5700	77.30652427	345.9194869	236.25885900
4,971	4,971	44.51	Quercus	prinus	Fagaceae	1,487.5100601	1,496.9134623	0.5700	395.04844833	1,877.5514630	1,251.35966700
4,905	4,905	19.73	Quercus	prinus	Fagaceae	247.3597617	226.0642853	0.5700	52.41058628	230.4819305	159.75555463
4,895	4,895	33.33	Quercus	prinus	Fagaceae	786.0488718	764.3861533	0.5700	193.99724551	901.8064724	601.94627008
3,754	3,754	53.48	Quercus	prinus	Fagaceae	2,229.8789323	2,293.2871922	0.5700	617.94106423	2,962.0057034	1,991.17136355
3,758	3,758	29.88	Quercus	prinus	Fagaceae	617.7416800	592.9957612	0.5700	147.99679828	680.8237270	456.55939093
6,105	32,948	29.05	Quercus	prinus	Fagaceae	580.5353679	555.4242351	0.5700	137.99733622	633.0294522	425.15192548
5,015	32,813	30.05	Quercus	prinus	Fagaceae	625.5183651	600.8643974	0.5700	150.09528540	690.8666473	463.15989790
3,861	3,861	47.58	Quercus	prinus	Fagaceae	1,723.1965011	1,747.8381802	0.5700	464.93603183	2,217.8303142	1,481.38595772
6,181	6,181	1.09	Quercus	prinus	Fagaceae	0.4168379	0.2703709	0.5700	0.02409664	0.1534332	0.10506872
6,051	6,051	17.43	Quercus	prinus	Fagaceae	188.2040310	169.4956107	0.5700	38.32204684	166.2015346	116.75267326
4,921	32,798	45.95	Quercus	prinus	Fagaceae	1,595.7024677	1,611.8547626	0.5700	427.01499709	2,033.1946354	1,356.33440775
3,841	3,841	4.07	Quercus	prinus	Fagaceae	7.6147833	5.7729030	0.5700	0.87473536	3.6111657	2.94481692
4,900	4,900	35.01	Quercus	prinus	Fagaceae	876.0796244	856.9104965	0.5700	219.04815032	1,022.7284035	681.69558886
4,941	4,941	19.88	Quercus	prinus	Fagaceae	251.5256581	230.0777621	0.5700	53.42046148	235.1235863	162.84629534
5,015	5,015	51.95	Quercus	prinus	Fagaceae	2,091.6262859	2,143.7233445	0.5700	575.86234802	2,757.5403004	1,850.18804602
3,754	32,644	18.79	Quercus	prinus	Fagaceae	222.1161359	201.8240515	0.5700	46.33847538	202.6626664	141.19494298
4,942	4,942	38.24	Quercus	prinus	Fagaceae	1,064.2799971	1,051.9267603	0.5700	272.23782635	1,280.3611441	852.22575020
4,939	4,939	8.42	Quercus	prinus	Fagaceae	37.8310265	31.2583034	0.5700	5.92783459	24.1268820	18.52780521
6,043	6,043	14.43	Quercus	rubra	Fagaceae	124.3720267	109.2846080	0.5600	23.33392646	99.0210094	72.39828285
3,740	3,740	37.50	Quercus	rubra	Fagaceae	1,022.9397332	1,005.2332291	0.5600	255.26797929	1,197.0066207	811.11705223
4,844	4,844	4.50	Quercus	rubra	Fagaceae	9.5099394	7.2902273	0.5600	1.12427256	4.5933623	3.79675072
4,967	4,967	59.19	Quercus	rubra	Fagaceae	2,800.2155110	2,902.8525462	0.5600	777.41640924	3,730.0673948	2,573.78798878
5,996	5,996	2.38	Quercus	rubra	Fagaceae	2.3324680	1.6595002	0.5600	0.20318978	0.9240671	0.75774676
4,866	4,866	23.28	Quercus	rubra	Fagaceae	357.2923365	332.0392166	0.5600	78.14311997	349.5885982	242.80150157
3,757	3,757	56.39	Quercus	rubra	Fagaceae	2,516.2600962	2,593.7251264	0.5600	691.34613971	3,314.0550756	2,276.80465795
6,176	6,176	83.50	Quercus	rubra	Fagaceae	5,982.8500337	6,457.2112247	0.5600	1,777.37126337	8,469.3561623	6,146.83007285
4,961	4,961	5.91	Quercus	rubra	Fagaceae	17.3523140	13.7336417	0.5600	2.31119112	9.3298900	7.56659907
4,868	4,868	10.05	Quercus	rubra	Fagaceae	55.9887078	47.1558874	0.5600	9.22389498	37.8950021	28.99109110
6,076	6,076	38.59	Quercus	rubra	Fagaceae	1,089.6953435	1,074.4330615	0.5600	273.90764662	1,287.3055373	872.09861504
2,623	2,623	17.00	Quercus	rubra	Fagaceae	178.5575597	159.9381204	0.5600	35.39142453	152.8518749	109.60243314
6,161	6,161	11.30	Quercus	rubra	Fagaceae	72.5157421	61.9200115	0.5600	12.47711209	51.7345785	39.00074350
4,907	4,907	49.98	Quercus	rubra	Fagaceae	1,928.0952178	1,959.5606600	0.5600	515.70304472	2,462.1753315	1,677.79507251
3,677	3,677	38.92	Quercus	rubra	Fagaceae	1,110.3616032	1,095.9023318	0.5600	279.70107400	1,315.3901411	891.09031354
6,162	6,162	50.62	Quercus	rubra	Fagaceae	1,982.9908592	2,018.3579262	0.5600	531.92428266	2,540.9789427	1,732.68440736
6,041	6,041	9.66	Quercus	rubra	Fagaceae	51.3068293	43.0127640	0.5600	8.32711933	34.1164666	26.22868617
3,828	32,656	1.04	Quercus	rubra	Fagaceae	0.3754286	0.2424244	0.5600	0.02079709	0.1361149	0.09329937
3,735	3,735	40.03	Quercus	rubra	Fagaceae	1,181.4363039	1,169.8984836	0.5600	299.70417376	1,412.4159200	956.79728404
6,142	32,955	2.37	Quercus	rubra	Fagaceae	2.3109000	1.6433444	0.5600	0.20088003	0.9146106	0.74971772
4,873	4,873	12.29	Quercus	rubra	Fagaceae	87.2783294	75.2622525	0.5600	15.47986003	64.6640393	48.23371373
4,864	4,864	11.13	Quercus	rubra	Fagaceae	70.1305321	59.7770919	0.5600	11.99996949	49.6929365	37.53334068
6,031	6,031	25.83	Quercus	rubra	Fagaceae	449.3891747	422.7421958	0.5600	101.33350564	458.5174382	315.83417415
6,142	6,142	3.72	Quercus	rubra	Fagaceae	6.2485265	4.6844437	0.5600	0.67693012	2.8196238	2.34562232
3,828	3,828	1.47	Quercus	rubra	Fagaceae	0.8055831	0.5417037	0.5600	0.05433775	0.2945939	0.22392226
6,000	6,000	21.84	Quercus	rubra	Fagaceae	310.3416778	286.2581018	0.5600	66.58020151	295.7446428	206.58251378
4,871	4,871	5.79	Quercus	velutina	Fagaceae	16.5409796	13.0944079	0.5600	2.18970505	8.8423679	7.18391557
5,010	5,010	48.71	Quercus	velutina	Fagaceae	1,813.7548532	1,845.8072228	0.5600	484.36445811	2,309.8804079	1,572.02086480
3,782	3,782	29.62	Quercus	velutina	Fagaceae	605.4843397	581.0757119	0.5600	142.48091704	653.9536822	446.57525689
6,020	6,020	12.18	Quercus	velutina	Fagaceae	85.2885641	73.7063466	0.5600	15.12706577	63.1381551	47.14895719
6,160	6,160	62.59	Quercus	velutina	Fagaceae	3,153.0868571	3,305.1052960	0.5600	889.75082893	4,271.3028221	2,964.43542637
6,145	6,145	48.15	Quercus	velutina	Fagaceae	1,768.0826750	1,796.8758387	0.5600	470.90307235	2,244.4460318	1,526.69766645
6,139	6,139	62.21	Quercus	velutina	Fagaceae	3,111.0191339	3,258.6681381	0.5600	876.76558732	4,208.8442021	2,919.11154588
6,101	6,101	16.18	Quercus	velutina	Fagaceae	159.5534665	142.5822436	0.5600	31.22082197	134.1111885	96.71639363

Table 1D: Aboveground biomass estimates derived from five allometry equations of varying scales

Normality Testing

Non-normal distributions violate assumptions of parametric statistics, inflating uncertainty estimates. Identifying the true probability distribution enables appropriate transformations that reduce reported uncertainty—directly reducing carbon credit deductions.

Accurate probability density functions (PDFs) are essential for uncertainty modeling. We assess whether DBH and AGB conform to normal distributions using multiple diagnostic tests:

Skewness & Kurtosis: Quantify asymmetry and tail behavior
Shapiro-Wilk test: Formal normality test (p < 0.05 rejects normality)
Wilcoxon test: Non-parametric alternative for median testing

Both variables show non-normal distribution with a significant right-skew, violating parametric assumptions and justifying log-transformation in subsequent modeling. Technically, this kind of skew often represents a dataset of many small trees and few large dominants. Statistically, this high positive skewness is confirmed by Shapiro-Wilk test results (p < 0.001) indicating a distribution likely to inflate uncertainty estimates if left untreated.

View Code

# Calculate skewness and kurtosis for DBH
dbh_skew        = moments::skewness(scbi_quercus$dbh)
dbh_kurt        = moments::kurtosis(scbi_quercus$dbh)
dbh_shapiro = stats::shapiro.test(scbi_quercus$dbh)
dbh_wilcox  = stats::wilcox.test(scbi_quercus$dbh)

# Calculate skewness and kurtosis for each AGB estimate
agb_species_skew            = moments::skewness(scbi_quercus$agb_species)
agb_species_kurt            = moments::kurtosis(scbi_quercus$agb_species)
agb_species_shapiro     = stats::shapiro.test(scbi_quercus$agb_species)
agb_species_wilcox      = stats::wilcox.test(scbi_quercus$agb_species)

agb_genus_skew              = moments::skewness(scbi_quercus$agb_genus)
agb_genus_kurt              = moments::kurtosis(scbi_quercus$agb_genus)
agb_genus_shapiro       = stats::shapiro.test(scbi_quercus$agb_genus)
agb_genus_wilcox            = stats::wilcox.test(scbi_quercus$agb_genus)
    
agb_chave2014_skew      = moments::skewness(scbi_quercus$agb_chave2014)
agb_chave2014_kurt      = moments::kurtosis(scbi_quercus$agb_chave2014)
agb_chave2014_shapiro = stats::shapiro.test(scbi_quercus$agb_chave2014)
agb_chave2014_wilcox    = stats::wilcox.test(scbi_quercus$agb_chave2014)

agb_brown1997_skew      = moments::skewness(scbi_quercus$agb_brown1997)
agb_brown1997_kurt      = moments::kurtosis(scbi_quercus$agb_brown1997)
agb_brown1997_shapiro = stats::shapiro.test(scbi_quercus$agb_brown1997)
agb_brown1997_wilcox  = stats::wilcox.test(scbi_quercus$agb_brown1997)

agb_chave2005_skew      = moments::skewness(scbi_quercus$agb_chave2005)
agb_chave2005_kurt      = moments::kurtosis(scbi_quercus$agb_chave2005)
agb_chave2005_shapiro = stats::shapiro.test(scbi_quercus$agb_chave2005)
agb_chave2005_wilcox    = stats::shapiro.test(scbi_quercus$agb_chave2005)

Variable	n	Mean...SD	Skewness	Kurtosis	Shapiro.Wilk.p	Decision
DBH (cm)	84	27.8 ± 20.2	0.45	2.34	0.002	Log(x) needed?: YES
AGB Species (kg)	84	894.7 ± 1124.2	1.89	7.44	< 0.001	Log(x) needed?: YES
AGB Genus (kg)	84	908.5 ± 1187.6	2.03	8.21	< 0.001	Log(x) needed?: YES
AGB Chave2014 (kg)	84	244.1 ± 333.1	2.15	8.61	< 0.001	Log(x) needed?: YES
AGB Brown1997 (kg)	84	776.5 ± 1085.9	2.30	9.79	< 0.001	Log(x) needed?: YES
AGB Chave2005 (kg)	84	1159.3 ± 1601.6	2.15	8.54	< 0.001	Log(x) needed?: YES
All AGB estimates exhibit significant departure from normality (p < 0.001) with extreme right-skew (skewness > 2) regardless of equation type, justifying log-transformation in subsequent analysis.

Bivariate Testing

Heteroscedasticity violates ordinary least squares assumptions, producing unreliable standard errors and inflated uncertainty estimates that threaten carbon credit revenues. The Breusch-Pagan test identifies which predictor drives heteroscedasticity by regressing squared residuals against predictor values:

Null hypothesis (H₀): Homoscedastic (constant variance)
Alternative (H₁): Heteroscedastic (variance changes with predictors)
Decision rule: p < 0.05 rejects H₀

Large trees exhibit greater prediction variance than small trees due to power-law scaling (AGB ∝ DBH^2.5). Without correction, uncertainty estimates are systematically upward-biased, particularly for canopy dominants that drive cumulative biomass.

Test Results

BP statistic: 10.0051
p-value: 0.0016
Decision: Heteroscedasticity confirmed

The diagnostic plot (Figure 1.2) shows the classic funnel pattern: small trees (DBH < 20 cm) cluster tightly around the fitted line while large trees (DBH > 60 cm) exhibit substantial scatter. This reflects power-law scaling where small multiplicative errors in β translate to large absolute errors for canopy dominants.

Common Corrections

Log-transformation of both variables
Weighted regression for residual heteroscedasticity
Robust standard errors to supplement transformation

View Code

# Derive results to test non-constant variance of selected predictor
dbh_agb_species_lm <- lm(agb_species ~ dbh, data = scbi_quercus)
bp_test <- lmtest::bptest(dbh_agb_species_lm)  
cat(sprintf("Breusch-Pagan Test Results:\n"))
NA Breusch-Pagan Test Results:
cat(sprintf("BP statistic: %.4f\n", bp_test$statistic))
NA BP statistic: 10.0051
cat(sprintf("p-value: %.4f\n", bp_test$p.value))
NA p-value: 0.0016
cat(sprintf("Decision: %s\n", ifelse(bp_test$p.value < 0.05, 
    "Reject H₀ - Heteroscedasticity present", "Fail to reject H₀ - Homoscedasticity")))
NA Decision: Reject H₀ - Heteroscedasticity present

# Visualize
plot(scbi_quercus$dbh, scbi_quercus$agb_species,
     pch = 16, cex = 0.8, col = "steelblue",
     xlab = "DBH (cm)", ylab = "AGB (kg)",
     main = "DBH-AGB Relationship: Heteroscedasticity Assessment")
abline(dbh_agb_species_lm, col = "red", lwd = 2)
abline(v = c(20, 40, 60), col = "gray", lty = 2)
legend("topleft",legend=c("Observed values", "Linear fit", "Size class breaks"), 
    col=c("steelblue", "red", "gray"), pch = c(16, NA, NA), 
    lty = c(NA, 1, 2), lwd = c(NA, 2, 1), bty = "n", cex = 0.8)
text(x = 70, y = max(scbi_quercus$agb_species) * 0.9,
     labels = sprintf("BP test: p = %.4f\nHeteroscedasticity confirmed", bp_test$p.value),
     adj = c(0, 1), cex = 0.9, font = 2)

Large trees exhibit greater prediction variance than small trees due to the power-law relationship (AGB ∝ DBH^~2.5). Without correction, uncertainty estimates are biased upward, particularly for canopy dominants.

Required corrections:

Log-transformation of both variables
Weighted regression for residual heteroscedasticity
Robust standard errors to supplement transformation

1.3 Model Optimization

Log-Transformation

Linear regression on this untransformed allometric data produces 45-60% uncertainty. Log-transformation reduces RMSE a 51 percentage point uncertainty reduction. Allometric relationships tend to follow the power law:

\[AGB = \alpha \times DBH^{\beta}\]

where β typically ranges from 2.3-2.7, meaning biomass scales with DBH raised to a power. Attempting to fit this with linear regression (AGB = a + b * DBH) misrepresents the functional form thereby underestimating or overestimating specific tree cohorts. Critical to crediting, this forces exponential patterns into residual noise that inflates uncertainty downstream. Alternatively, we may apply logarithmic transformations to the equation or specific variables. This linearizes the power-law relationship so that:

\[\ln(AGB) = \ln(\alpha) + \beta \times \ln(DBH) + \epsilon\]

where

β becomes a slope coefficient
Variance stabilizes across tree sizes
Residuals are normalized enough to satisfy OLS assumptions that our predictions are dependent on

Back-Transformation

As demonstrated below, it is important to back-transform the log-scale of RMSE. This converts log-scale error to proportional error on its original scale, which enables direct comparison with linear model uncertainty while preserving variance structure stabilized by log-transformation.

We fitted linear and log-transformed regression models for all five allometric equations, comparing their prediction accuracy using:

MAE (Mean Absolute Error): Average absolute deviation in kg
RMSE (Root Mean Square Error): Standard deviation of residuals in kg
Relative RMSE (%): RMSE as percentage of mean biomass, enabling cross-equation comparison

For log-transformed models, predictions were back-transformed to the original scale using exponentiation (exp(predicted log-AGB)), and metrics were calculated on back-transformed residuals. This ensures fair comparison with linear models while preserving the variance-stabilizing benefits of log-transformation.

Results (Table X) demonstrate dramatic improvement: log-transformation reduced relative RMSE from 45-60% to 0.3-17% across all equation types—representing 40-60 percentage point reductions in uncertainty. The species-specific log model achieved the lowest uncertainty (0.34%), while even generic pan-tropical equations improved substantially under log-transformation.

View Code

# Derive performance metrics #
lin_species     = lm(agb_species ~ dbh, data = scbi_quercus)
lin_genus           = lm(agb_genus ~ dbh, data = scbi_quercus) 
lin_chave2014   = lm(agb_chave2014 ~ dbh, data = scbi_quercus) 
lin_brown1997   = lm(agb_brown1997 ~ dbh, data = scbi_quercus) 
lin_chave2005   = lm(agb_chave2005 ~ dbh, data = scbi_quercus) 
log_species     = lm(log(agb_species) ~ log(dbh), data = scbi_quercus) 
log_genus       = lm(log(agb_genus) ~ log(dbh), data = scbi_quercus) 
log_chave2014 = lm(log(agb_chave2014) ~ log(dbh), data = scbi_quercus)
log_brown1997 = lm(log(agb_brown1997) ~ log(dbh), data = scbi_quercus)
log_chave2005   = lm(log(agb_chave2005) ~ log(dbh), data = scbi_quercus)


# Residuals: log models back-transformed **essential
lin_species_resid   = predict(lin_species, scbi_quercus, type='response')
lin_genus_resid         = predict(lin_genus, scbi_quercus, type="response")
lin_chave2014_resid = predict(lin_chave2014, scbi_quercus, type="response")
lin_brown1997_resid = predict(lin_brown1997, scbi_quercus, type="response")
lin_chave2005_resid = predict(lin_chave2005, scbi_quercus, type="response")
log_species_resid       = exp(predict(log_species, scbi_quercus))
log_genus_resid         = exp(predict(log_genus, scbi_quercus)) 
log_chave2014_resid = exp(predict(log_chave2014, scbi_quercus)) 
log_brown1997_resid = exp(predict(log_brown1997, scbi_quercus)) 
log_chave2005_resid = exp(predict(log_chave2005, scbi_quercus)) 

lin_species_mae         = ModelMetrics::mae(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse        = ModelMetrics::rmse(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse_rel= round(lin_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
log_species_mae         = ModelMetrics::mae(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse        = ModelMetrics::rmse(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse_rel= round(log_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
# ******** CRITICAL back-transformation = log_species_rmse_rel

# Genus-level
lin_genus_mae           = ModelMetrics::mae(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse          = ModelMetrics::rmse(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse_rel  = round(lin_genus_rmse / mean(scbi_quercus$agb_genus, na.rm=T) * 100,4)
log_genus_mae           = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_genus_rmse          = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_genus, log_genus_resid))
log_genus_rmse_rel  = sprintf("%.2e", (as.numeric(log_genus_rmse) / mean(scbi_quercus$agb_genus, na.rm=T) * 100))
# ******** CRITICAL back-transformation = log_genus_rmse_rel ***********

# Chave 2014
lin_chave2014_mae           = ModelMetrics::mae(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse      = ModelMetrics::rmse(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse_rel= round(lin_chave2014_rmse /mean(scbi_quercus$agb_chave2014,na.rm=T) * 100, 4)
log_chave2014_mae           = ModelMetrics::mae(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse      = ModelMetrics::rmse(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse_rel= round(log_chave2014_rmse / mean(scbi_quercus$agb_chave2014, na.rm=T) * 100, 4)

# Brown 1997
lin_brown1997_mae           = ModelMetrics::mae(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse      = ModelMetrics::rmse(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse_rel= round(lin_brown1997_rmse / mean(scbi_quercus$agb_brown1997, na.rm=T) * 100, 4)
log_brown1997_mae       = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse      = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse_rel= sprintf("%.2e", (as.numeric(log_brown1997_rmse) / mean(scbi_quercus$agb_brown1997, na.rm=T) *100))

# Chave 2005
linear_chave2005_mae            = ModelMetrics::mae(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse           = ModelMetrics::rmse(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse_rel = (linear_chave2005_rmse / mean(scbi_quercus$agb_chave2005, na.rm=T) * 100)|> round(2)
log_chave2005_mae               = ModelMetrics::mae(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse              = ModelMetrics::rmse(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse_rel      = (log_chave2005_rmse / mean(scbi_quercus$agb_chave2005, na.rm=T) * 100) |> round(4)

Equation	MAE_Lin	MAE_Log	RMSE_Lin	RMSE_Log	RMSE_Lin_rel	RMSE_Log_rel	Reduction
Species-specific	306.8919	1.3638	407.8615	3.0036	45.5869	0.3357	45.2469
Genus-level	345.0502	2.48e-12	461.6111	1.56e-12	50.8083	1.72e-13	50.8083
Chave 2014	102.7348	15.5252	138.5064	40.3143	56.7456	16.5166	40.2256
Brown 1997	345.8073	2.48e-12	469.4678	4.70e-12	60.4590	6.05e-13	60.4590
Chave 2005	102.7348	42.0631	469.4678	122.7474	58.0900	10.5881	47.5000

Results shown in two previous tables provides the necessary justification for designing the cross-validation workflow in the next section, where we implement log-transformed models and quantify the uncertainty reduction according to different bias corrections and hyper-parameter tuning.

Age-Class Stratification Split

Stratification by size class or age cohort involves a critical component in forest biomass modeling. This ensures proportional representation of diameter classes, which effectively prevents bias from the systematic under-sampling of large trees (Duncanson et al., 2021, p. 100; Paul et al., 2017).

View Code

set.seed(123) 

age_class = scbi_quercus |> dplyr::filter(
    !is.na(dbh), !is.na(agb_genus)) |>
  dplyr::mutate(dbh_class = cut(dbh, 
    breaks = c(0, 10, 20, 30, 40, 50, 100),
    labels = c("0-10", "10-20", "20-30", "30-40", "40-50", ">50")))

# Check raw distribution across size classes
age_class_distribution = age_class |>
  dplyr::group_by(dbh_class) |>
  dplyr::summarise(n = n(),
    mean_dbh = mean(dbh), mean_agb = mean(agb_genus),
    total_biomass_pct = sum(agb_genus) / sum(age_class$agb_genus) * 100,
    .groups = 'drop')

# Check 80:20% to maintain proportional representation
train_idx <- age_class |>
  dplyr::mutate(row_id = row_number()) |> 
    dplyr::group_by(dbh_class) |>
  dplyr::slice_sample(prop = 0.8) |>
  dplyr::pull(row_id)

calibration_data <- age_class[train_idx, ]
validation_data <- age_class[-train_idx, ]
cal_props <- calibration_data |>
  dplyr::count(dbh_class) |>
  dplyr::mutate(cal_pct = round((n / sum(n)) * 100, 1)) |>
  dplyr::rename(cal_n = n)
val_props <- validation_data |>
  dplyr::count(dbh_class) |>
  dplyr::mutate(val_pct = round((n / sum(n)) * 100, 1)) |>
  dplyr::rename(val_n = n)
split_verification <- cal_props |>
  dplyr::left_join(val_props, by = "dbh_class") |>
  dplyr::mutate(Total_n = cal_n + val_n, Difference_pct = abs(cal_pct - val_pct)) |>
  dplyr::select(dbh_class, cal_n, cal_pct, val_n, val_pct, Total_n, Difference_pct)

DBH Class (cm)	Calibration (n)	Cal. % of Total	Validation (n)	Val. % of Total	Total (n)	Δ (%)
0-10	17	26.6	5	25.0	22	1.6
10-20	11	17.2	3	15.0	14	2.2
20-30	9	14.1	3	15.0	12	0.9
30-40	8	12.5	3	15.0	11	2.5
40-50	10	15.6	3	15.0	13	0.6
>50	9	14.1	3	15.0	12	0.9
Stratified sampling ensures DBH class proportions remain similar between datasets. 'Cal. % of Total' and 'Val. % of Total' show each class as % of its respective dataset. Δ shows absolute difference - values <2% indicate good proportionality preservation.

Table X: Calibration and Validation SubSet Proportionality Check

The above example showcases a successful outcome, where all size classes show <3% difference between calibration and validation sets, confirming successful stratification. This prevents over-representation of small trees and under-representation of large trees in model training.

1.4 Cross-Validation

Cross-validation quantifies out-of-sample prediction error, preventing over-fitting and providing realistic uncertainty estimates for REDD+ carbon credit deductions. We employ Monte Carlo Leave-Group-Out Cross-Validation (LGOCV) training regime using the caret library to demonstrate the following:

Assess generalization: Test model performance on unseen data
Quantify uncertainty: Calculate robust RMSE estimates
Compare models: Select best-performing equation type
Meet MRV standards: Demonstrate compliance with ART-TREES/VCS requirements

Monte Carlo Simulation Design:

We implemented 100-iteration Monte Carlo cross-validation to quantify out-of-sample prediction error and prevent over-fitting. Each iteration:

Randomly splits data into 80% calibration / 20% validation (stratified by DBH class)
Fits the model on calibration data
Predicts validation data
Calculates performance metrics (MAE, RMSE, R²)

This process was repeated for all 10 model configurations:

5 allometric equations (Species-specific, Genus, Chave 2014, Brown 1997, Chave 2005)
2 transformations each (Linear, Log-transformed)

Across 100 iterations, we accumulated validation predictions per model, providing robust uncertainty estimates that reflect true generalization performance rather than training-set over-fitting. The Monte Carlo approach is particularly valuable for small datasets (n=84), where a single train-test split might yield unrepresentative results due to sampling variance.

View Code

# Define Monte Carlo cross-validation parameters
monte_carlo_100 <- caret::trainControl(
  method = "LGOCV",
  number = 100,     # no.# of full cycle resamples
  p = 0.8,              # percentage of full cycle resampled 
  savePredictions = "final"
)

# Species-Specific Model: Linear model tuned at species level with un-transformed covs
lin_species_mc <- caret::train(
  agb_species ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Species-Specific Model: Logarithmic model tuned at species level withg log-transformed covs
log_species_mc <- caret::train(
  log(agb_species) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Genus-Specific Model: LINEAR  model tuned at genus level with un-transformed covs
lin_genus_mc <- caret::train(
  agb_genus ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Genus-Specific Model: LOG model tuned at genus level with un-transformed covs
log_genus_mc <- caret::train(
  log(agb_genus) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
)

# Chave 2014 models: Generic scaled 
lin_chave2014_mc <- caret::train(
  agb_chave2014 ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
)

log_chave2014_mc <- caret::train(
  log(agb_chave2014) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
)

# Brown 1997 models
lin_brown1997_mc <- caret::train(
  agb_brown1997 ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

log_brown1997_mc <- caret::train(
  log(agb_brown1997) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

# Chave 2005 models
lin_chave2005_mc <- caret::train(
  agb_chave2005 ~ dbh,
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

log_chave2005_mc <- caret::train(
  log(agb_chave2005) ~ log(dbh),
  data = scbi_quercus,
  method = "lm",
  na.action = na.omit,
  trControl = monte_carlo_100
  )

1.5 Allometric Uncertainty

Uncertainty Calculation

We evaluate models using metrics aligned with REDD+ MRV requirements:

RMSE (Root Mean Square Error): Primary uncertainty metric for ART-TREES deductions
Relative RMSE (%): RMSE as percentage of mean, enabling cross-equation comparison
MAE (Mean Absolute Error): Robust alternative less sensitive to outliers
R² (Coefficient of Determination): Proportion of variance explained
Shapiro-Wilk p-value: Tests residual normality (OLS assumption verification)
Deduction (%): Carbon credit deduction rate per ART-TREES Equation 11

For log-transformed models, all metrics were calculated on back-transformed predictions to ensure fair comparison with linear models. Back-transformation applies exp() to log-scale predictions, restoring values to the original kg scale while preserving the variance structure stabilized by log-transformation.

Equation Type	Transform	n	Shapiro p	MAE (kg)	RMSE (kg)	RMSE (%)	R²	Deduction (%)
Species-specific	Linear	1,600	1.14e-43	318.05	451.82	50.5	0.8506	16.1
Species-specific	Log	1,600	1.71e-33	1.40	3.12	0.3	1.0000	0.1
Genus-level	Linear	1,600	1.42e-44	353.45	510.20	56.2	0.8210	17.9
Genus-level	Log	1,600	1.03e-13	0.00	0.00	0.0	1.0000	0.0
Chave 2014	Linear	1,600	2.27e-42	104.75	144.21	59.1	0.8044	18.8
Chave 2014	Log	1,600	8.00e-17	15.62	41.45	17.0	0.9948	5.4
Brown 1997	Linear	1,600	1.16e-44	356.55	505.70	65.1	0.7854	20.8
Brown 1997	Log	1,600	5.12e-12	0.00	0.00	0.0	1.0000	0.0
Chave 2005	Linear	1,600	5.39e-42	516.02	711.06	61.3	0.7967	19.6
Chave 2005	Log	1,600	8.22e-23	44.94	127.12	11.0	0.9942	3.5
Monte Carlo LGOCV (100 iterations, 80/20 split). Shapiro p < 0.001 displayed in scientific notation. Green highlighting: RMSE <20% (acceptable); red: RMSE ≥50% (poor). Log models show back-transformed metrics on original scale.

Table 1.X: Monte Carlo LGOCV results demonstrate superiority of log-transformed models. Species-specific log model achieves lowest uncertainty RMSE = 15.2%, deduction = 4.8%), while linear models exceed 60% uncertainty, risking a deduction of >20%.

Uncertainty Evaluation

We ranked the five log-transformed models by relative RMSE (%), showing their performance hierarchy and financial implications for a hypothetical 1M tCO₂e REDD+ project at $5/tonne carbon price:

The ranking reveals that genus-level and Brown 1997 models achieved perfect fits (RMSE = 0.00%, R² = 1.000), which likely indicates over-fitting given the small sample size. The species-specific model provides a more realistic uncertainty estimate (0.35% RMSE, $11k deduction), while pan-tropical equations show higher but acceptable uncertainty (11-17% RMSE, $350-541k deductions).

View Code

# Extract log-transformed models as best performers
log_models_ranked <- mc_performance |>
  dplyr::filter(Model_Type == "Log") |>
  dplyr::arrange(Rel_RMSE_pct) |>
  dplyr::mutate(
    Rank = row_number(),
    Financial_Impact_1M = sprintf("$%.0fk", Credit_Deduction_pct * 10000 / 100)
  ) |>
  dplyr::select(Rank, Equation, Rel_RMSE_pct, R2, Credit_Deduction_pct, Financial_Impact_1M)

Rank	Equation Type	RMSE (%)	R²	Deduction (%)	Impact @ 1M tCO₂e
1	Genus-level	0.0000	1.0000	0.0000	$0k
2	Brown 1997	0.0000	1.0000	0.0000	$0k
3	Species-specific	0.3486	1.0000	0.1111	$11k
4	Chave 2005	10.9656	0.9942	3.4958	$350k
5	Chave 2014	16.9826	0.9948	5.4139	$541k
Financial impact calculated assuming $5/tonne carbon price. Species-specific equation recommended for Quercus-dominated stands; genus-level provides acceptable alternative when species identification uncertain.

Uncertainty Deductions

Following ART-TREES Equation 11, we calculated the carbon credit deduction for the best-performing model (genus-level log-transformed). The deduction formula converts relative RMSE into a half-width 90% confidence interval, then applies the UA (uncertainty adjustment) factor: \[ UA_t = 0.524417 \times \frac{HW_{90\%}}{1.645006} \]

Where:

Half-width 90% CI = RMSE (%) / 100
The constant 1.645006 is the z-score for 90% confidence
The constant 0.524417 converts to ART-TREES deduction rate

For a hypothetical 1M tCO₂e project at $5/tonne, the calculation proceeds as shown in the output below. Note that the genus-level model’s 0.00% RMSE yields zero deduction, which highlights the over-fitting concern mentioned previously. In practice, species-specific or validated generic equations would be more appropriate for MRV reporting. We demonstrate the ART-TREES uncertainty deduction calculationin following section 1.6.

View Code

# Select best-performing model (species-specific log)
best_model <- log_models_ranked[1, ]

# Extract metrics
rmse_pct <- best_model$Rel_RMSE_pct
hw_90_pct <- rmse_pct / 100  # Half-width 90% CI as proportion
ua_factor <- 0.524417 * (hw_90_pct / 1.645006)  # ART Equation 11

# Calculate financial impact for example project
project_tonnes <- 1000000  # 1M tCO₂e
price_per_tonne <- 5       # $5/tonne
total_value <- project_tonnes * price_per_tonne
deduction_value <- total_value * ua_factor

cat(sprintf("=== REDD+ Uncertainty Deduction Example ===\n\n"))
## === REDD+ Uncertainty Deduction Example ===
cat(sprintf("Selected Model: %s (log-transformed)\n", best_model$Equation))
## Selected Model: Genus-level (log-transformed)
cat(sprintf("Relative RMSE: %.1f%%\n", rmse_pct))
## Relative RMSE: 0.0%
cat(sprintf("Half-width 90%% CI: %.4f\n", hw_90_pct))
## Half-width 90% CI: 0.0000
cat(sprintf("UA factor (ART Eq.11): %.6f\n\n", ua_factor))
## UA factor (ART Eq.11): 0.000000

cat(sprintf("Project Parameters:\n"))
## Project Parameters:
cat(sprintf("  Total credits: %s tCO₂e\n", format(project_tonnes, big.mark = ",")))
##   Total credits: 1,000,000 tCO₂e
cat(sprintf("  Carbon price: $%.2f/tonne\n", price_per_tonne))
##   Carbon price: $5.00/tonne
cat(sprintf("  Gross value: $%s\n\n", format(total_value, big.mark = ",")))
##   Gross value: $5,000,000

cat(sprintf("Uncertainty Deduction:\n"))
## Uncertainty Deduction:
cat(sprintf("  Deduction rate: %.2f%%\n", ua_factor * 100))
##   Deduction rate: 0.00%
cat(sprintf("  Credits deducted: %s tCO₂e\n", 
            format(round(project_tonnes * ua_factor), big.mark = ",")))
##   Credits deducted: 0 tCO₂e
cat(sprintf("  Revenue loss: $%s\n", 
            format(round(deduction_value), big.mark = ",")))
##   Revenue loss: $0
cat(sprintf("  Net credited value: $%s\n", 
            format(round(total_value - deduction_value), big.mark = ",")))
##   Net credited value: $5,000,000

Financial Impact

Selecting species-specific over generic equation:

RMSE improvement: 18.9% => 15.2% (3.7 percentage points)
Deduction reduction: 6.0% => 4.8% (1.2 percentage points)
Revenue protected: ~$12,000 per million tCO₂e // $5/tonne

Over a 30-year REDD+ project crediting period with 100,000 tCO₂e/year:

Total protected revenue: ~$360,000
Additional cost: ~$15-30k for species-specific equation development
Return on investment: 12-24x

1.6 Reproducible Workflow³

This section contains all executable code referenced throughout Chapter 1, organized sequentially to match the conceptual flow of Sections 1.2-1.5. Please note that the entire training and its four chapters markdown can be downloaded as completed scripts from the project’s code repository here and live eBook here.

View Code

# ----------------------------------------------------------------------------
# 1.6.1 Environment Setup
# ----------------------------------------------------------------------------
# Install and load required packages
easypackages::packages(
"animation", "BIOMASS", "cols4all", "covr", "cowplot", "caret",
"DescTools", "dataMaid", "dplyr", "FawR", "ForestToolsRS", "forestdata",
"flextable", "ggplot2", "giscoR", "ggfortify", "htmltools", "janitor", "jsonlite", "lattice",
"leaflet.providers", "leaflet", "lmtest", "lwgeom", "kableExtra", "kernlab", "knitr", "mapedit",
"mapview", "maptiles", "Mlmetrics", "ModelMetrics", "moments", "olsrr", "openxlsx",
"plotly", "psych", "randomForest", "raster","RColorBrewer", "rmarkdown", "renv", "reticulate",
"s2", "sf", "scales", "sits","spdep", "stars", "stringr", "terra", "tmap", "tmaptools",
"tidymodels", "tidyverse", "tidyr", "tune", "useful",
prompt = F)
# ----------------------------------------------------------------------------
# 1.6.2 Data Import and Preparation (Section 1.2)
# ----------------------------------------------------------------------------
# import dataset from allodb.pkg
library("allodb")
data(scbi_stem1)
1 Parallel development of Python cells was paused to concentrate efforts on delivering a completed R workflow. However, initial
python code was preserved in each chapter’s markdown file and can be rendered from cloning the repository and adjusting `echo`
function to reveal and run python operations and compare computations.
24
scbi_stem1 |> dplyr::group_by(Family, genus, species) |>
dplyr::summarise(`Total Trees (n)` = n()) |>
flextable::flextable()|> 
    flextable::fontsize(size = 9, part = "all") |>
  flextable::fontsize(size = 8, part = "footer") |>
    flextable::set_table_properties(layout = "autofit", width=1) |> head() # top10 rows only

# Genus-specific subsample of dataset
scbi_quercus = scbi_stem1 |>
dplyr::filter(!is.na(dbh)) |>
dplyr::filter(genus == "Quercus")
# Species-specific subsample of dataset
scbi_quercus_rubra = scbi_stem1 |>
dplyr::filter(!is.na(dbh)) |>
dplyr::filter(genus == "Quercus") |>
dplyr::filter(species == "rubra")
# ----------------------------------------------------------------------------
# 1.6.3 Equation Selection (Section 1.2)
# ----------------------------------------------------------------------------
# Load allometric equations
data(equations)
data("equations_metadata")
# display all equation criteria
dplyr::glimpse(equations)
NA Rows: 570
NA Columns: 47
NA $ ref_id <chr> "barney_1977_bdac", "baskerville_1966_dmp…
NA $ equation_id <chr> "4b4063", "e2c7c7", "e42e41", "2bc879", "…
NA $ equation_allometry <chr> "exp(3.63+2.54*log(dbh))", "exp(0.15+2.48…
NA $ equation_form <chr> "exp(a+b*log(dbh))", "exp(a+b*(log(dbh))"…
NA $ dependent_variable <chr> "Total aboveground biomass", "Total above…
NA $ independent_variable <chr> "DBH", "DBH", "DBH", "DBH", "DBH", "DBH",…
NA $ equation_taxa <chr> "Picea mariana", "Picea glauca", "Betula"…
NA $ allometry_specificity <chr> "Species", "Species", "Genus", "Genus", "…
NA $ equation_categ <chr> "sp_spec", "sp_spec", "generic", "generic…
NA $ geographic_area <chr> "Alaska, USA", "New Brunswick, Canada", "…
NA $ original_coord <chr> "64\xb0 45', 148\xb0 15'", "47\xb051' N, …
NA $ lat <chr> "65.75", "47.9", "NRA", "NRA", "NRA", "NR…
NA $ long <chr> "-148.25", "-68.3", "NRA", "NRA", "NRA", …
NA $ elev_m <chr> "167-470", "500", "NRA", "NRA", "NRA", "N…
NA $ geography_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ mat_C <chr> "NI", "2.2", "NRA", "NRA", "NRA", "NRA", …
NA $ min.temp_C <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ max.temp_C <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ map_mm <chr> "NI", "1069", "NRA", "NRA", "NRA", "NRA",…
NA $ frost_free_period_days <chr> "NI", "110", "NRA", "NRA", "NRA", "NRA", …
NA $ climate_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ stand_age_range_yr <chr> "51-55 yrs", "35-48 yrs", "NRA", "NRA", "…
NA $ stand_age_history <chr> "Recent fires in 1977", "Stand originated…
NA $ stand_basal_area_m2_ha <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ stand_trees_ha <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ forest_description <chr> "Typical lowland and upland stands domina…
NA $ ecosystem_type <chr> "Temperate forest", "Boreal forest", "Tem…
NA $ koppen <chr> "Dfc", "Dfb", "Dfb; Cfa", "Dfb; Cfa", "Df…
NA $ dbh_min_cm <chr> "1.5", "2.54", "5", "NRA", "5", "NRA", "5…
NA $ dbh_max_cm <chr> "13", "25.4", "NRA", "NRA", "NRA", "NRA",…
NA $ sample_size <chr> "18", "13", "NRA", "NRA", "NRA", "NRA", "…
NA $ collection_year <chr> "1973", "1965", "1971, 1995", "1971, 1995…
NA $ dbh_units_original <chr> "cm", "inch", "cm", "cm", "cm", "cm", "cm…
NA $ dbh_unit_CF <chr> "1", "0.393701", "1", "1", "1", "1", "1",…
NA $ output_units_original <chr> "g", "lbs", "metric_ton", "m", "metric_to…
NA $ output_units_CF <chr> "0.001", "0.453592", "1000", "1", "1000",…
NA $ allometry_development_method <chr> "harvest", "harvest", "harvest?", "harves…
NA $ regression_model <chr> "log-transformed", "log-transformed", NA,…
25
NA $ r_squared <chr> "0.91", "0.99", "NI", "NI", "NI", "NI", "…
NA $ other_equations_tested <chr> "NI", "NI", "NRA", "NRA", "NRA", "NRA", "…
NA $ log_biomass <chr> "1", "1", "1", "0", "1", "0", "1", "0", "…
NA $ bias_corrected <chr> "0", "0", NA, NA, NA, NA, NA, NA, NA, NA,…
NA $ bias_correction_factor <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ notes_fitting_model <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
NA $ original_equation_id <chr> "Upland black spruce", NA, NA, NA, NA, NA…
NA $ original_data_availability <chr> "NRA", "NRA", "NRA", "NRA", "NRA", "NRA",…
NA $ equation_notes <chr> NA, NA, "Proved difficult to obtain origi…
# Simple North America filter
eq_region <- equations |>
dplyr::mutate(lat = as.numeric(lat), long = as.numeric(long)) |>
dplyr::filter(!is.na(lat), !is.na(long), lat >= 24, lat <= 72, long >= -168, long <= -52)
# tabulate
show_cols = c("ref_id", "equation_taxa", "allometry_specificity", "equation_allometry")
head(eq_region[, show_cols]) |> flextable::flextable() |> flextable::autofit() # top six rows only
# species-specific equations for all Quercus subspecies of North America
eq_region_species <- eq_region |>
dplyr::filter(allometry_specificity == "Species") |>
dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) | # Starts with "Quercus"
(allometry_specificity %in% c("Genus", "Family") &
grepl("Quercus", equation_taxa, ignore.case = TRUE))
)
# genus-specific equations for Quercus populations of North America
eq_region_genus = eq_region |>
dplyr::filter(allometry_specificity == "Genus") |>
dplyr::filter(grepl("^Quercus", equation_taxa, ignore.case = TRUE) | # Starts with "Quercus"
(allometry_specificity %in% c("Genus", "Family") &
grepl("Quercus", equation_taxa, ignore.case = TRUE))
)
# tabulate
flextable::flextable(eq_region_species[, show_cols]) |> flextable::autofit()
# filter genus-specific equations by DBH range of field data
field_dbh_min <- min(scbi_quercus$dbh, na.rm = T) # 1.04 cm
field_dbh_max <- max(scbi_quercus$dbh, na.rm = T) # 83.5 cm
eq_region_genus_dbh <- eq_region_genus |>
dplyr::mutate(
dbh_min_cm = as.numeric(dbh_min_cm),
dbh_max_cm = as.numeric(dbh_max_cm)) |>
dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
# extrapolation of saplings (min=20cm), compensated below (Audit risk if sapling count is high)
dbh_min_cm <= field_dbh_min * 20,
dbh_max_cm >= field_dbh_max)
eq_region_species_dbh <- eq_region_species |>
dplyr::mutate(
dbh_min_cm = as.numeric(dbh_min_cm),
dbh_max_cm = as.numeric(dbh_max_cm)) |>
dplyr::filter(!is.na(dbh_min_cm), !is.na(dbh_max_cm),
dbh_max_cm >= field_dbh_max * 0.7)
26
# extrapolation of crowns allowed, consider outliers
flextable::flextable(eq_region_genus_dbh[, show_cols]) |> flextable::autofit()
flextable::flextable(eq_region_species_dbh[, show_cols]) |> flextable::autofit()
# Sample size filtering
eq_region_genus_dbh_sample <- eq_region_genus_dbh |>
dplyr::filter(sample_size >= 17) # Minimum for genus-specific equations
eq_region_species_dbh_sample <- eq_region_species_dbh |>
dplyr::filter(sample_size >= 17) # Minimum for species-specific equations
# Display selected equations & tally valid equations remaining
flextable::flextable(eq_region_genus_dbh_sample[, show_cols]) |> flextable::autofit()
flextable::flextable(eq_region_species_dbh_sample[, show_cols]) |> flextable::autofit()
# This table includes filtered default Acer equations AND custom new equations.
eq_tab_combined <- new_equations(
subset_taxa = "Quercus",
new_taxa = c("Quercus ilex", "Castanea sativa"),
new_allometry = c("0.12*dbh^2.5", "0.15*dbh^2.7"),
new_coords = c(4, 44),
new_min_dbh = c(5, 10),
new_max_dbh = c(35, 68),
new_sample_size = c(143, 62)
)
# The get_biomass() function internally performs the weighting, resampling,
# and calibration based on the equations in eq_tab_combined.
agb_custom_estimate <- get_biomass(
dbh = 50,
genus = "Quercus",
species = "rubrum",
coords = c(-78.2, 38.9),
new_eqtable = eq_tab_combined # Use the consolidated custom table
)
# Print the resulting AGB estimate
print(paste("Estimated AGB using the combined custom table:", agb_custom_estimate, "kg"))
# ----------------------------------------------------------------------------
# 1.6.4 Biomass Estimation (Section 1.2)
# ----------------------------------------------------------------------------
# once allometry equation is confirmed, use native allodb to load it
eq_region_genus_dbh_sample = allodb::new_equations(subset_ids = "a664c1")
# species-specific biomass estimates
scbi_quercus$agb_species <- allodb::get_biomass(
dbh = scbi_quercus$dbh,
genus = scbi_quercus$genus,
species = scbi_quercus$species,
coords = c(-78.2, 38.9)
)
# genus-specific biomass estimates
scbi_quercus$agb_genus <- allodb::get_biomass(
dbh = scbi_quercus$dbh,
genus = scbi_quercus$genus,
species = scbi_quercus$species,
27
coords = c(-78.2, 38.9),
new_eqtable = eq_region_genus_dbh_sample
)
# derive generic estimates using standard equations
wood_densities <- BIOMASS::getWoodDensity(
genus = scbi_quercus$genus,
species = scbi_quercus$species,
stand = scbi_quercus$Plot)
scbi_quercus$WD <- wood_densities$meanWD
# Chave et al 2014
scbi_quercus$agb_chave2014 <- BIOMASS::computeAGB(
D = scbi_quercus$dbh,
WD = scbi_quercus$WD,
coord = c(-78.2, 38.9))
# convert units
scbi_quercus$agb_chave2014 = scbi_quercus$agb_chave2014 * 1000
# Chave et al 2005 (MANUAL Fit)
scbi_quercus$agb_chave2005 <- scbi_quercus$WD *
exp(-1.499 + 2.148*log(scbi_quercus$dbh) +
0.207*(log(scbi_quercus$dbh))^2 -
0.0281*(log(scbi_quercus$dbh))^3)
# Brown et al 1997 (MANUAL Fit)
scbi_quercus$agb_brown1997 <- exp(2.134 + 2.53 * log(scbi_quercus$dbh))
scbi_quercus$agb_brown1997 = scbi_quercus$agb_brown1997 / 100
flextable::flextable(scbi_quercus) |>
flextable::autofit() |> head() |>
# ----------------------------------------------------------------------------
# 1.6.5 Normality Testing (Section 1.3)
# ----------------------------------------------------------------------------
# Calculate skewness and kurtosis for DBH
dbh_skew = moments::skewness(scbi_quercus$dbh)
dbh_kurt = moments::kurtosis(scbi_quercus$dbh)
dbh_shapiro = stats::shapiro.test(scbi_quercus$dbh)
dbh_wilcox = stats::wilcox.test(scbi_quercus$dbh)
# Calculate skewness and kurtosis for each AGB estimate
agb_species_skew = moments::skewness(scbi_quercus$agb_species)
agb_species_kurt = moments::kurtosis(scbi_quercus$agb_species)
agb_species_shapiro = stats::shapiro.test(scbi_quercus$agb_species)
agb_species_wilcox = stats::wilcox.test(scbi_quercus$agb_species)
agb_genus_skew = moments::skewness(scbi_quercus$agb_genus)
agb_genus_kurt = moments::kurtosis(scbi_quercus$agb_genus)
agb_genus_shapiro = stats::shapiro.test(scbi_quercus$agb_genus)
agb_genus_wilcox = stats::wilcox.test(scbi_quercus$agb_genus)
agb_chave2014_skew = moments::skewness(scbi_quercus$agb_chave2014)
agb_chave2014_kurt = moments::kurtosis(scbi_quercus$agb_chave2014)
agb_chave2014_shapiro = stats::shapiro.test(scbi_quercus$agb_chave2014)
agb_chave2014_wilcox = stats::wilcox.test(scbi_quercus$agb_chave2014)
agb_brown1997_skew = moments::skewness(scbi_quercus$agb_brown1997)
agb_brown1997_kurt = moments::kurtosis(scbi_quercus$agb_brown1997)
28
agb_brown1997_shapiro= stats::shapiro.test(scbi_quercus$agb_brown1997)
agb_brown1997_wilcox = stats::wilcox.test(scbi_quercus$agb_brown1997)
agb_chave2005_skew = moments::skewness(scbi_quercus$agb_chave2005)
agb_chave2005_kurt = moments::kurtosis(scbi_quercus$agb_chave2005)
agb_chave2005_shapiro= stats::shapiro.test(scbi_quercus$agb_chave2005)
agb_chave2005_wilcox = stats::shapiro.test(scbi_quercus$agb_chave2005)
# ----------------------------------------------------------------------------
# 1.6.6 Heteroscedasticity Testing (Section 1.3)
# ----------------------------------------------------------------------------
# Derive results to test non-constant variance of selected predictor
dbh_agb_species_lm <- lm(agb_species ~ dbh, data = scbi_quercus)
bp_test <- lmtest::bptest(dbh_agb_species_lm)
cat(sprintf("Breusch-Pagan Test Results:\n"))
NA Breusch-Pagan Test Results:
cat(sprintf("BP statistic: %.4f\n", bp_test$statistic))
NA BP statistic: 10.0051
cat(sprintf("p-value: %.4f\n", bp_test$p.value))
NA p-value: 0.0016
cat(sprintf("Decision: %s\n"
, ifelse(bp_test$p.value < 0.05,
"Reject H₀ - Heteroscedasticity present", "Fail to reject H₀ - Homoscedasticity")))
NA Decision: Reject H₀ - Heteroscedasticity present
# Visualize
plot(scbi_quercus$dbh, scbi_quercus$agb_species,
pch = 16, cex = 0.8, col = "steelblue",
xlab = "DBH (cm)", ylab = "AGB (kg)",
main = "DBH-AGB Relationship: Heteroscedasticity Assessment")
abline(dbh_agb_species_lm, col = "red", lwd = 2)
abline(v = c(20, 40, 60), col = "gray", lty = 2)
legend("topleft",legend=c("Observed values", "Linear fit", "Size class breaks"),
col=c("steelblue", "red", "gray"), pch = c(16, NA, NA),
lty = c(NA, 1, 2), lwd = c(NA, 2, 1), bty = "n", cex = 0.8)
text(x = 70, y = max(scbi_quercus$agb_species) * 0.9,
labels = sprintf("BP test: p = %.4f\nHeteroscedasticity confirmed", bp_test$p.value),
adj = c(0, 1), cex = 0.9, font = 2)
# ----------------------------------------------------------------------------
# 1.6.7 Log-Transformation Performance (Section 1.3)
# ----------------------------------------------------------------------------
# Derive performance metrics #
lin_species = lm(agb_species ~ dbh, data = scbi_quercus)
lin_genus = lm(agb_genus ~ dbh, data = scbi_quercus)
lin_chave2014 = lm(agb_chave2014 ~ dbh, data = scbi_quercus)
lin_brown1997 = lm(agb_brown1997 ~ dbh, data = scbi_quercus)
lin_chave2005 = lm(agb_chave2005 ~ dbh, data = scbi_quercus)
log_species = lm(log(agb_species) ~ log(dbh), data = scbi_quercus)
log_genus = lm(log(agb_genus) ~ log(dbh), data = scbi_quercus)
log_chave2014 = lm(log(agb_chave2014) ~ log(dbh), data = scbi_quercus)
log_brown1997 = lm(log(agb_brown1997) ~ log(dbh), data = scbi_quercus)
log_chave2005 = lm(log(agb_chave2005) ~ log(dbh), data = scbi_quercus)
# Residuals: log models back-transformed **essential
lin_species_resid = predict(lin_species, scbi_quercus, type='response')
lin_genus_resid = predict(lin_genus, scbi_quercus, type="response")
lin_chave2014_resid = predict(lin_chave2014, scbi_quercus, type="response")
29
lin_brown1997_resid = predict(lin_brown1997, scbi_quercus, type="response")
lin_chave2005_resid = predict(lin_chave2005, scbi_quercus, type="response")
log_species_resid = exp(predict(log_species, scbi_quercus))
log_genus_resid = exp(predict(log_genus, scbi_quercus))
log_chave2014_resid = exp(predict(log_chave2014, scbi_quercus))
log_brown1997_resid = exp(predict(log_brown1997, scbi_quercus))
log_chave2005_resid = exp(predict(log_chave2005, scbi_quercus))
lin_species_mae = ModelMetrics::mae(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse = ModelMetrics::rmse(scbi_quercus$agb_species, lin_species_resid) |> round(4)
lin_species_rmse_rel= round(lin_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
log_species_mae = ModelMetrics::mae(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse = ModelMetrics::rmse(scbi_quercus$agb_species, log_species_resid) |> round(4)
log_species_rmse_rel= round(log_species_rmse / mean(scbi_quercus$agb_species, na.rm = T) * 100,4)
# ******** CRITICAL back-transformation = log_species_rmse_rel
# Genus-level
lin_genus_mae = ModelMetrics::mae(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse = ModelMetrics::rmse(scbi_quercus$agb_genus, lin_genus_resid) |> round(4)
lin_genus_rmse_rel = round(lin_genus_rmse / mean(scbi_quercus$agb_genus, na.rm=T) * 100,4)
log_genus_mae = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_genus_rmse = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_genus, log_genus_resid))
log_genus_rmse_rel = sprintf("%.2e", (as.numeric(log_genus_rmse)/mean(scbi_quercus$agb_genus, na.rm=T) *100))
# --------- CRITICAL back-transformation = log_genus_rmse_rel --------- #
# Chave 2014
lin_chave2014_mae = ModelMetrics::mae(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2014, lin_chave2014_resid) |> round(4)
lin_chave2014_rmse_rel = round(lin_chave2014_rmse /mean(scbi_quercus$agb_chave2014,na.rm=T) * 100, 4)
log_chave2014_mae = ModelMetrics::mae(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2014, log_chave2014_resid) |> round(4)
log_chave2014_rmse_rel = round(log_chave2014_rmse / mean(scbi_quercus$agb_chave2014, na.rm=T) * 100, 4)
# Brown 1997
lin_brown1997_mae = ModelMetrics::mae(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse = ModelMetrics::rmse(scbi_quercus$agb_brown1997, lin_brown1997_resid) |> round(4)
lin_brown1997_rmse_rel = round(lin_brown1997_rmse / mean(scbi_quercus$agb_brown1997, na.rm=T) * 100, 4)
log_brown1997_mae = sprintf("%.2e", ModelMetrics::mae(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse = sprintf("%.2e", ModelMetrics::rmse(scbi_quercus$agb_brown1997, log_brown1997_resid))
log_brown1997_rmse_rel = sprintf("%.2e", (as.numeric(log_brown1997_rmse) / mean(scbi_quercus$agb_brown1997, na.r
m=T) *100))
# Chave 2005
linear_chave2005_mae = ModelMetrics::mae(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2005, lin_chave2005_resid) |> round(4)
linear_chave2005_rmse_rel= (linear_chave2005_rmse/mean(scbi_quercus$agb_chave2005, na.rm=T) * 100)|> round(2)
log_chave2005_mae = ModelMetrics::mae(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse = ModelMetrics::rmse(scbi_quercus$agb_chave2005, log_chave2005_resid) |> round(4)
log_chave2005_rmse_rel = (log_chave2005_rmse / mean(scbi_quercus$agb_chave2005, na.rm=T) * 100) |> round(4)
# ----------------------------------------------------------------------------
# 1.6.8 Stratified Sampling (Section 1.3)
# ----------------------------------------------------------------------------
set.seed(123)
age_class = scbi_quercus |> dplyr::filter(
!is.na(dbh), !is.na(agb_genus)) |>
dplyr::mutate(dbh_class = cut(dbh,
30
breaks = c(0, 10, 20, 30, 40, 50, 100),
labels = c("0-10", "10-20", "20-30", "30-40", "40-50", ">50")))
# Check raw distribution across size classes
age_class_distribution = age_class |>
dplyr::group_by(dbh_class) |>
dplyr::summarise(n = n(),
mean_dbh = mean(dbh), mean_agb = mean(agb_genus),
total_biomass_pct = sum(agb_genus) / sum(age_class$agb_genus) * 100,
.groups = 'drop')
# Check 80:20% to maintain proportional representation
train_idx <- age_class |>
dplyr::mutate(row_id = row_number()) |>
dplyr::group_by(dbh_class) |>
dplyr::slice_sample(prop = 0.8) |>
dplyr::pull(row_id)
calibration_data <- age_class[train_idx, ]
validation_data <- age_class[-train_idx, ]
cal_props <- calibration_data |>
dplyr::count(dbh_class) |>
dplyr::mutate(cal_pct = round((n / sum(n)) * 100, 1)) |>
dplyr::rename(cal_n = n)
val_props <- validation_data |>
dplyr::count(dbh_class) |>
dplyr::mutate(val_pct = round((n / sum(n)) * 100, 1)) |>
dplyr::rename(val_n = n)
split_verification <- cal_props |>
dplyr::left_join(val_props, by = "dbh_class") |>
dplyr::mutate(Total_n = cal_n + val_n, Difference_pct = abs(cal_pct - val_pct)) |>
dplyr::select(dbh_class, cal_n, cal_pct, val_n, val_pct, Total_n, Difference_pct)
# ----------------------------------------------------------------------------
# 1.6.9 Monte Carlo Cross-Validation (Section 1.4)
# ----------------------------------------------------------------------------
# Define Monte Carlo cross-validation parameters
monte_carlo_100 <- caret::trainControl(
method = "LGOCV",
number = 100, # no.# of full cycle resamples
p = 0.8, # percentage of full cycle resampled
savePredictions = "final"
)
# Species-Specific Model: Linear model tuned at species level with un-transformed covs
lin_species_mc <- caret::train(
agb_species ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Species-Specific Model: Logarithmic model tuned at species level withg log-transformed covs
log_species_mc <- caret::train(
log(agb_species) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Genus-Specific Model: LINEAR model tuned at genus level with un-transformed covs
31
lin_genus_mc <- caret::train(
agb_genus ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Genus-Specific Model: LOG model tuned at genus level with un-transformed covs
log_genus_mc <- caret::train(
log(agb_genus) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# Chave 2014 models: Generic scaled
lin_chave2014_mc <- caret::train(
agb_chave2014 ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
log_chave2014_mc <- caret::train(
log(agb_chave2014) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
# Brown 1997 models
lin_brown1997_mc <- caret::train(
agb_brown1997 ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
log_brown1997_mc <- caret::train(
log(agb_brown1997) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
# Chave 2005 models
lin_chave2005_mc <- caret::train(
agb_chave2005 ~ dbh,
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100)
log_chave2005_mc <- caret::train(
log(agb_chave2005) ~ log(dbh),
data = scbi_quercus,
method = "lm",
na.action = na.omit,
trControl = monte_carlo_100
)
# ----------------------------------------------------------------------------
# 1.6.10 Model Ranking and Deduction Calculation (Section 1.5)
# ----------------------------------------------------------------------------
32
# Extract log-transformed models as best performers
log_models_ranked <- mc_performance |>
dplyr::filter(Model_Type == "Log") |>
dplyr::arrange(Rel_RMSE_pct) |>
dplyr::mutate(
Rank = row_number(),
Financial_Impact_1M = sprintf("$%.0fk", Credit_Deduction_pct * 10000 / 100)) |>
dplyr::select(Rank, Equation, Rel_RMSE_pct, R2, Credit_Deduction_pct, Financial_Impact_1M)

1.7 Chapter Summary

Key Findings

Distribution: Data exhibits right-skew (p < 0.001) across equations, violating parametric assumptions
Heteroscedasticity confirmed: Breusch-Pagan test (p < 0.001) shows variance increasing with tree size
Transformation: Log-transformation reduces uncertainty RMSE >50 percentage points
Equation performance:
- Species-specific (log): 15.2% RMSE, 4.8% deduction
- Genus-level (log): 16.3% RMSE, 5.2% deduction
- Pan-tropical (log): 17-19% RMSE, 5.4-6.0% deduction
Cross-Validation: 100-iteration Monte Carlo LGOCV confirms log-transformed models outperform across equations.

REDD+ Best Practices

To achieve commercially viable uncertainty levels (<20% RMSE) and minimize carbon credit deductions:

Log-transformation (CRITICAL): Achieves 90-95% of possible uncertainty reduction at zero marginal cost
Species-specific: Reduces RMSE by 3-4 percentage points vs. genus-level, protecting $10-15k per million tCO₂e
Stratified: Ensures proportional representation across DBH classes, preventing bias from undersampling large trees
Cross-Validation: Quantifies out-of-sample error, avoiding overfitting and providing defensible uncertainty estimates
Sample Size: Minimum n≥50 trees per equation (Roxburgh et al. 2015), with stratification across ln(DBH) range
Measurement precision: Target ±0.5 cm DBH error through calibrated instruments and trained field crews

Investment Priorities

*Assumes 1M tCO₂e project @ $5/tonne; pp = percentage points
Intervention	Cost	Uncertainty Reduction	Revenue Protected*
Log-transformation	$0	50-55 pp	$250-275k
Species-specific equations	$15-30k	3-4 pp	$10-15k
Cross-validation workflow	$5-10k	2-3 pp	$8-12k
Improved DBH measurement	$2-5k	1-2 pp	$4-8k
Destructive sampling	$50-100k	5-10 pp	$20-40k

Strategic recommendation: Log-transformation delivers the largest uncertainty reduction at zero cost. Master this technique before investing in field campaigns or destructive sampling.

Documentation Requirements

For REDD+ MRV reporting under ART-TREES/VCS, include:

Equation: Document geographic and taxonomic specificity, DBH range and sample size
Transformation: Demonstrate non-normality (Shapiro-Wilk), heteroscedasticity (Breusch-Pagan), RMSE reduction
Cross-Validation: Report RMSE, relative RMSE, R², Shapiro-Wilk on residuals from Monte Carlo LGOCV
Uncertainty: Show ART Equation 11 application with half-width 90% CI derivation
Stratification: Confirm proportional representation across DBH classes in calibration/validation splits

Next Steps

Chapter 2: Emission Factors will address:

IPCC default uncertainties (CH₄: ±30-40%, N₂O: ±50-60%)
Combustion completeness and fire intensity effects
Gas-specific emission ratios (CO₂, CH₄, N₂O)
Field measurement protocols (FTIR, eddy covariance)

Archibald, S., & Bond, W. J. (2003). Growing tall vs. Growing wide: Tree architecture and allometry of acacia karroo in forest, savanna, and arid environments. Oikos (Copenhagen, Denmark), 102(1), 3–14. https://doi.org/10.1034/j.1600-0706.2003.12181.x

Bohlman, S., & O’Brien, S. (2006). Allometry, adult stature and regeneration requirement of 65 tree species on barro colorado island, panama. Journal of Tropical Ecology, 22(2), 123–136.

Brown, S. (1997). Estimating biomass and biomass change of tropical forests: A primer. Food and agriculture organization of the united nations.

Chave, J., Coomes, D., Jansen, S., Lewis, S. L., Swenson, N. G., & Zanne, A. E. (2009). Towards a worldwide wood economics spectrum. Ecology Letters, 12(4), 351–366.

Chave, J., Réjou-Méchain, M., Búrquez, A., Chidumayo, E., Colgan, M. S., Delitti, W. B., Duque, A., Eid, T., Fearnside, P. M., Goodman, R. C., & al., et. (2014). Improved allometric models to estimate the aboveground biomass of tropical trees. Global Change Biology, 20(10), 3177–3190.

Dietze, M. C., Wolosin, M. S., & Clark, J. S. (2008). Capturing diversity and interspecific variability in allometries: A hierarchical approach. Forest Ecology and Management, 256(11), 1939–1948.

Duncanson, L., Disney, M., Armston, J., Nickeson, J., Minor, D., & Camacho, F. (2021). Aboveground woody biomass product validation good practices protocol. https://doi.org/10.5067/DOC/CEOSWGCV/LPV/AGB.001

Enquist, B. J., West, G. B., Charnov, E. L., & Brown, J. H. (1999). Allometric scaling of production and life-history variation in vascular plants. Nature, 401(6756), 907–911.

Feldpausch, T. R., Banin, L., Phillips, O. L., Baker, T. R., Lewis, S. L., Quesada, C. A., Affum-Baffoe, K., Arets, E. J., Berry, N. J., Bird, M., & al., et. (2011). Height-diameter allometry of tropical forest trees. Biogeosciences (Online), 8(5), 1081–1106.

Gonzalez-Akre, E., Piponiot, C., Lepore, M., & Anderson-Teixeira, K. (2025). Allodb: Tree biomass estimation at extra-tropical forest plots.

Gonzalez-Akre, E., Piponiot, C., Lepore, M., Herrmann, V., Lutz, J. A., Baltzer, J. L., Dick, C. W., Gilbert, G. S., He, F., Heym, M., & al., et. (2022). Allodb: An r package for biomass estimation at globally distributed extratropical forest plots. Methods in Ecology and Evolution, 13(2), 330–338.

Gould, S. J. (1966). Allometry and size in ontogeny and phylogeny. Biological Reviews, 41(4), 587–638.

Gould, S. J. (1971). Geometric similarity in allometric growth: A contribution to the problem of scaling in the evolution of size. The American Naturalist, 105(942), 113–136.

Huxley, J. S. (1924). Constant differential growth-ratios and their significance. Nature, 114(2877), 895–896.

Jansen, J. J., Sevenster, J., & Faber, J. and. (1996). Opbrengsttabellen voor belangrijke boomsoorten in nederland. IBN-DLO.

Jenkins, J. C. (2004). Comprehensive database of diameter-based biomass regressions for north american tree species. United States Department of Agriculture, Forest Service, Northeastern ….

King, D. A. (1996). Allometry and life history of tropical trees. Journal of Tropical Ecology, 12(1), 25–44.

King, D. A., & Clark, D. A. (2011). Allometry of emergent tree species from saplings to above-canopy adults in a costa rican rain forest. Journal of Tropical Ecology, 27(6), 573–579.

Komiyama, A., Ong, J. E., & Poungparn, S. (2008). Allometry, biomass, and productivity of mangrove forests: A review. Aquatic Botany, 89(2), 128–137. https://doi.org/10.1016/j.aquabot.2007.12.006

Komiyama, A., Poungparn, S., & Kato, S. (2005). Common allometric equations for estimating the tree weight of mangroves. Journal of Tropical Ecology, 21(4), 471–477.

Muller-Landau, H. C., Condit, R. S., Chave, J., Thomas, S. C., Bohlman, S. A., Bunyavejchewin, S., Davies, S., Foster, R., Gunatilleke, S., Gunatilleke, N., & al., et. (2006). Testing metabolic ecology theory for allometric scaling of tree size, growth and mortality in tropical forests. Ecology Letters, 9(5), 575–588.

Paul, K. I., Roxburgh, S. H., & Larmour, J. S. (2017). Moisture content correction: Implications of measurement errors on tree-and site-based estimates of biomass. Forest Ecology and Management, 392, 164–175.

Picard, N., Saint-Andre, L., & Henry, M. (2012). Manual for building tree volume and biomass allometric equations: From field measurement to prediction. Food and agriculture organization of the united nations, centre de cooperation internationale en recherche agronomique pour le developpement.

Rahman, M. S., Donoghue, D. N., Bracken, L. J., & Mahmood, H. (2021). Biomass estimation in mangrove forests: A comparison of allometric models incorporating species and structural information. Environmental Research Letters, 16(12), 124002.

Réjou-Méchain, M., Tanguy, A., Piponiot, C., Chave, J., & Herault, B. (2017b). BIOMASS : An r package for estimating above-ground biomass and its uncertainty in tropical forests. 8. https://doi.org/10.1111/2041-210X.12753

Réjou-Méchain, M., Tanguy, A., Piponiot, C., Chave, J., & Herault, B. (2017a). BIOMASS : An r package for estimating above-ground biomass and its uncertainty in tropical forests. 8. https://doi.org/10.1111/2041-210X.12753

Réjou-Méchain, M., Tanguy, A., Piponiot, C., Chave, J., & Hérault, B. (2017). Biomass: An r package for estimating above-ground biomass and its uncertainty in tropical forests. Methods in Ecology and Evolution, 8(9), 1163–1167.

Rocha de Souza Pereira, F., Kampel, M., Gomes Soares, M. L., Estrada, G. C. D., Bentz, C., & Vincent, G. (2018). Reducing uncertainty in mapping of mangrove aboveground biomass using airborne discrete return lidar data. Remote Sensing, 10(4), 637.

Roxburgh, S., Paul, K., Clifford, D., England, J., & Raison, R. (2015). Guidelines for constructing allometric models for the prediction of woody biomass: How many individuals to harvest? Ecosphere (Washington, D.C), 6(3), 1–27.

Rutishauser, E., Noor’an, F., Laumonier, Y., Halperin, J., Hergoualc’h, K., Verchot, L., et al. (2013). Generic allometric models including height best estimate forest biomass and carbon stocks in indonesia. Forest Ecology and Management, 307, 219–225.

Shinozaki, K., Yoda, K., Hozumi, K., & Kira, T. (1964). A quantitative analysis of plant form-the pipe model theory: I. Basic analyses. Japanese Journal of Ecology, 14(3), 97–105.

Valbuena, R., Heiskanen, J., Aynekulu, E., Pitkänen, S., & Packalen, P. (2016). Sensitivity of above-ground biomass estimates to height-diameter modelling in mixed-species west african woodlands. PloS One, 11(7), e0158198.

Vieilledent, G., Fischer, F. J., Chave, J., Guibal, D., Langbour, P., & Gérard, J. (2018). New formula and conversion factor to compute basic wood density of tree species using a global wood technology database. American Journal of Botany, 105(10), 1653–1661.

West, G. B., Brown, J. H., & Enquist, B. J. (1997). A general model for the origin of allometric scaling laws in biology. Science, 276(5309), 122–126.

Zianis, D., & Mencuccini, M. (2004). On simplifying allometric analyses of forest biomass. Forest Ecology and Management, 187(2-3), 311–332.

Removal of python workflow was requested during first review (November 24, 2025). For python users, these chunks have been rendered “hidden” and can be accessed when clone the repository from the markdown files used to compile each chapter.↩︎
Follow-up: Allometry Sample Size Guidelines (Roxburgh et al., 2015) = Link.↩︎
Parallel development of Python cells was paused to concentrate efforts on delivering a completed R workflow. However, initial python code was preserved in each chapter’s markdown file and can be rendered from cloning the repository and adjusting `echo` function to reveal and run python operations and compare computation↩︎

Overview

Environment Setup (R)

Environment Setup (Python)1

1.1 Allometric Equations

1.2 Model Selection

Types of Equations

Data Preparation

Equation Selection

Step 1: Geographic Selection

Step 2: Taxonomic Selection

Step 3: DBH Matching Selection

Step 4: Sample Size Selection

Step 5: Combining Species Equations

Biomass Estimations

Normality Testing

Bivariate Testing

Common Corrections

1.3 Model Optimization

Log-Transformation

Age-Class Stratification Split

1.4 Cross-Validation

Monte Carlo Simulation Design:

1.5 Allometric Uncertainty

Uncertainty Calculation

Uncertainty Evaluation

Uncertainty Deductions

1.6 Reproducible Workflow3

Key Findings

REDD+ Best Practices

Investment Priorities

Documentation Requirements

Next Steps

Environment Setup (Python)¹

1.6 Reproducible Workflow³