Performance

bext ships with a built-in performance measurement and regression prevention system. Every PR is benchmarked, every metric has a budget, and regressions are caught before they reach production.

Budget Framework

Sites declare [perf.budget] in bext.config.toml to set per-metric thresholds. When a workload run exceeds a budget, the regression gate fails the build.

[perf.budget]
p50_ms = 8
p99_ms = 25
rss_mb = 128

[perf.budget.features]
waf = { p50_add_ms = 1.5 }
compression = { p50_add_ms = 0.5 }

Budgets are enforced per workload. If a workload does not declare its own budget for a metric, the global [perf.budget] value for that metric applies.
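The fallback rule can be sketched in a few lines of Python. This is a minimal illustration, not bext's implementation: the per-workload override table, the 20 ms limit for ssr_dynamic, and the helper names resolve_budget and over_budget are assumptions made for the example. Only the global values come from the config shown above.

```python
# Global thresholds, copied from the [perf.budget] example above.
GLOBAL_BUDGET = {"p50_ms": 8, "p99_ms": 25, "rss_mb": 128}

# Hypothetical per-workload overrides (not taken from bext): a workload
# lists only the metrics it wants to constrain differently.
WORKLOAD_BUDGETS = {
    "ssr_dynamic": {"p99_ms": 20},
}


def resolve_budget(workload, metric):
    """Return the workload's own budget for a metric, else the global one."""
    return WORKLOAD_BUDGETS.get(workload, {}).get(metric, GLOBAL_BUDGET[metric])


def over_budget(workload, results):
    """List the metrics whose measured value exceeds the resolved budget."""
    return [m for m, value in results.items() if value > resolve_budget(workload, m)]


print(over_budget("ssr_dynamic", {"p50_ms": 7.8, "p99_ms": 22.1}))  # ['p99_ms']
```

With these hypothetical overrides, ssr_dynamic is held to 20 ms for p99_ms but inherits the global 8 ms for p50_ms.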

History Store

bext bench history <workload> shows the timeline of a metric across commits. The history store persists results to .bext/bench-history/ by default, with optional S3 upload for team-wide visibility.

$ bext bench history blog_read_heavy --metric p99_ms --last 20
commit    date        p99_ms
a1b2c3d   2026-04-01  12.3
d4e5f6g   2026-04-02  12.1
h7i8j9k   2026-04-03  14.8  << regression
...

Use bext bench blame <workload> --metric p99_ms to identify the commit that introduced a regression. The command runs a binary search over the stored history.
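As a rough picture of what a binary search over the history involves, here is a small Python sketch. It is not the code behind bext bench blame: the function name first_regressed, the tuple layout, and the 13.0 ms cut-off are assumptions chosen for illustration. It also relies on the usual bisection assumption that a regression, once introduced, persists in every later commit.

```python
def first_regressed(history, is_regressed):
    """Return the index of the first regressed entry, or None if there is none.

    `history` must be ordered oldest to newest, and `is_regressed` must stay
    true for every entry after the first one where it is true.
    """
    lo, hi = 0, len(history)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_regressed(history[mid]):
            hi = mid        # the first bad commit is at mid or earlier
        else:
            lo = mid + 1    # the first bad commit is after mid
    return lo if lo < len(history) else None


# The three rows from the sample output above, as (commit, p99_ms) pairs.
history = [("a1b2c3d", 12.3), ("d4e5f6g", 12.1), ("h7i8j9k", 14.8)]

# Hypothetical cut-off; a real check would use the gate's MAD band instead.
index = first_regressed(history, lambda entry: entry[1] > 13.0)
print(history[index][0])  # h7i8j9k
```

Each step halves the remaining range, so a history of N commits needs about log2(N) checks rather than N.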

Regression Gate

CI runs perf-gate on every PR. The gate compares the PR's workload results against the baseline from the main branch and fails if any metric regresses beyond the MAD (median absolute deviation) band.

The gate uses a statistical model rather than a fixed threshold so that natural variance does not cause false positives. A regression must exceed 2x the MAD of the last 10 baseline runs to trigger a failure (MAD_MULTIPLIER = 2.0 in bext-bench-history/src/band.rs).

$ perf-gate --baseline main --compare HEAD
PASS  blog_read_heavy      p50_ms   7.8 vs 8.1  (-3.7%, within 2x MAD)
FAIL  ssr_dynamic           p99_ms  22.1 vs 18.4 (+20.1%, exceeds 2x MAD of 1.2)
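The band check can be sketched in Python as follows. This is an illustration under stated assumptions, not the contents of band.rs: it assumes the band is centred on the median of the baseline window and that only increases count as regressions (all metrics here are lower-is-better). The helper names and the baseline numbers are made up for the example.

```python
from statistics import median

MAD_MULTIPLIER = 2.0   # value quoted above
BASELINE_WINDOW = 10   # "the last 10 baseline runs"


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    center = median(values)
    return median([abs(v - center) for v in values])


def exceeds_band(baseline_runs, candidate):
    """True if candidate is more than MAD_MULTIPLIER * MAD above the median.

    Assumption: the comparison point is the median of the most recent
    BASELINE_WINDOW runs; the real gate may centre the band differently.
    """
    window = baseline_runs[-BASELINE_WINDOW:]
    return candidate - median(window) > MAD_MULTIPLIER * mad(window)


# Hypothetical p50_ms values from ten baseline runs (median 10.0, MAD ~0.1).
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0, 10.1]

print(exceeds_band(baseline, 10.1))  # False: +0.1 is inside the 0.2 band
print(exceeds_band(baseline, 10.5))  # True: +0.5 is outside the 0.2 band
```

Because both the centre and the spread are medians, one unusually slow baseline run changes the band very little, which is the property that keeps ordinary noise from failing the gate.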

Per-Feature Cost

bext bench cost shows the cost table: how much each compiled feature adds to key metrics. This is measured by building with and without each feature and differencing the workload results.

$ bext bench cost
feature           p50_add_ms   rss_add_mb   binary_add_kb
waf               +1.2         +3.4         +820
tls               +0.1         +1.8         +1200
react-compiler    +0.0         +0.5         +340
v8                +0.3         +12.1        +4800
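The differencing step amounts to subtracting one result set from another. The Python sketch below shows the arithmetic only. The function name feature_cost and the two result dictionaries are hypothetical, with numbers picked so that the difference matches the waf row above.

```python
def feature_cost(with_feature, without_feature):
    """Per-metric overhead: result with the feature minus result without it."""
    return {
        metric: round(with_feature[metric] - without_feature[metric], 3)
        for metric in with_feature
        if metric in without_feature
    }


# Hypothetical workload results for two builds of the same commit.
without_waf = {"p50_ms": 6.6, "rss_mb": 100.0}
with_waf = {"p50_ms": 7.8, "rss_mb": 103.4}

print(feature_cost(with_waf, without_waf))  # {'p50_ms': 1.2, 'rss_mb': 3.4}
```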

Interpreting Failures

When the gate fails, the error shows:

1. Which workload regressed (e.g. ssr_dynamic).

2. Which metric exceeded the band (e.g. p99_ms).

3. The delta between the PR and the baseline, expressed as both an absolute value and a percentage.

4. The MAD band so you can see how far outside normal variance the result falls.
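To make the numbers in a failure line concrete, the Python sketch below recomputes them for the ssr_dynamic example from the gate output. The function describe_failure and its output format are hypothetical, and the sketch reads the 1.2 in that sample line as the MAD itself, which is an interpretation rather than something the output states.

```python
def describe_failure(workload, metric, pr_value, baseline_value, mad):
    """Summarise a regression as absolute delta, percentage, and multiples of MAD."""
    delta = pr_value - baseline_value
    percent = 100.0 * delta / baseline_value
    return f"{workload} {metric}: {delta:+.1f} ({percent:+.1f}%), {delta / mad:.1f}x MAD"


print(describe_failure("ssr_dynamic", "p99_ms", 22.1, 18.4, 1.2))
# ssr_dynamic p99_ms: +3.7 (+20.1%), 3.1x MAD
```

Under that reading, the result sits roughly three MADs above the baseline, well past the 2x limit.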

Common causes of regressions:

- Adding a synchronous operation to the request hot path.

- A dependency update that increases binary size or startup time.

- A cache configuration change that lowers hit rates.

- An accidental clone() or allocation in a per-request path.

Run bext bench compare <workload> --a <base> --b <head> locally to reproduce the regression and iterate on fixes before pushing again.

Tip

The gate uses a statistical MAD band, not a fixed threshold, so minor noise does not trigger failures. A result must exceed 2x the MAD of the last 10 baseline runs. Because the MAD is a median-based statistic, a single outlier among those baseline runs barely shifts the band.

Note

The bext bench cost feature table shows per-feature overhead measured by building with and without each feature. If you want to understand why enabling v8 adds RSS, or whether the WAF is the bottleneck, start there before tuning budgets.

Related

- Monitoring — Prometheus metrics for live P50/P95/P99 latency

- Troubleshooting — high CPU, transform pipeline, and image cache issues

- Architecture: V8 Render Pool — render worker pool topology that drives SSR latency

- Guides: Caching — TTL and stale-while-revalidate mechanics that affect cache hit rates

- Configuration: Build Flags — compile-time feature flags reflected in the cost table