# Monitoring
bext exposes a Prometheus-compatible metrics endpoint and a health check endpoint, giving you full observability with your existing monitoring stack.
## Configuration

```toml
# bext.config.toml
[metrics]
enabled = true            # expose /__bext/metrics
path = "/__bext/metrics"  # customize the endpoint path
include_process = true    # include process_* metrics (RSS, CPU, FDs)

[health]
enabled = true            # expose /__bext/health
path = "/__bext/health"   # customize the endpoint path
include_details = true    # include component-level health in the response
```
Both endpoints are excluded from access logs and rate limiting by default.
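To verify both endpoints against a running instance, a quick check with curl works; the port below is an assumption carried over from the scrape example later in this page, so substitute your own:

```bash
# Sample the first few metric lines (3061 is an assumed port)
curl -s http://localhost:3061/__bext/metrics | head -n 5

# Print only the health check's HTTP status code (200 healthy, 503 degraded)
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3061/__bext/health
```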
## Metrics Endpoint

Scrape `/__bext/metrics` with Prometheus. The response uses the standard Prometheus text exposition format.
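The body reads like any other exposition payload. The sample below is purely illustrative (the values and the `path_group` label values are made up):

```text
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200",path_group="/blog/:slug"} 10423
http_requests_total{method="POST",status="500",path_group="/api/submit"} 7

# HELP render_pool_active Render workers currently rendering
# TYPE render_pool_active gauge
render_pool_active 3
```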
### Available Metrics

#### HTTP Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `http_requests_total` | counter | `method`, `status`, `path_group` | Total HTTP requests |
| `http_request_duration_seconds` | histogram | `method`, `path_group` | Request latency (buckets: 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 5s) |
| `http_request_size_bytes` | histogram | `method` | Request body size |
| `http_response_size_bytes` | histogram | `method` | Response body size |
| `http_active_connections` | gauge | — | Currently open connections |
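Because the response format is standard, the size histograms should expose the usual `_sum` and `_count` series, which makes average payload size a one-liner (a sketch; tune the window to your scrape interval):

```promql
# Average request and response body size over the last 5 minutes
rate(http_request_size_bytes_sum[5m])  / rate(http_request_size_bytes_count[5m])
rate(http_response_size_bytes_sum[5m]) / rate(http_response_size_bytes_count[5m])
```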
#### Cache Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `cache_hits_total` | counter | `layer` | Cache hits (`l1` = in-memory, `l2` = Redis) |
| `cache_misses_total` | counter | `layer` | Cache misses |
| `cache_entries` | gauge | `layer` | Current cache entry count |
| `cache_evictions_total` | counter | `layer` | Entries evicted |
| `cache_stampede_coalesced_total` | counter | — | Requests coalesced by the stampede guard |
| `isr_revalidations_total` | counter | `status` | ISR background revalidations (`success`, `error`) |
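A per-layer hit ratio separates in-memory behavior from Redis behavior, and failed revalidations deserve their own series; both queries are sketches built only from the metrics above:

```promql
# Hit ratio per cache layer (l1 = in-memory, l2 = Redis)
sum by (layer) (rate(cache_hits_total[5m]))
  / (sum by (layer) (rate(cache_hits_total[5m])) + sum by (layer) (rate(cache_misses_total[5m])))

# Failed ISR background revalidations per second
rate(isr_revalidations_total{status="error"}[5m])
```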
#### SSR / V8 Pool Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `render_pool_active` | gauge | — | Render workers currently rendering |
| `render_pool_idle` | gauge | — | Render workers available |
| `render_pool_total` | gauge | — | Total render workers configured |
| `render_duration_seconds` | histogram | — | SSR render time |
| `render_pool_wait_seconds` | histogram | — | Time spent waiting for an available worker |
| `render_oom_kills_total` | counter | — | Workers killed due to memory limits |
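Saturation and queueing are the two signals worth charting from this group, for example:

```promql
# Pool saturation: 1.0 means every render worker is busy and SSR requests queue
render_pool_active / render_pool_total

# P95 time a request spends waiting for a free worker
histogram_quantile(0.95, rate(render_pool_wait_seconds_bucket[5m]))
```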
#### Plugin Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| `plugin_invocations_total` | counter | `plugin`, `hook` | Plugin hook invocations |
| `plugin_duration_seconds` | histogram | `plugin`, `hook` | Plugin execution time |
| `plugin_errors_total` | counter | `plugin` | Plugin errors and timeouts |
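To spot slow or failing plugins, break the histogram and error counter down by the `plugin` label (illustrative queries):

```promql
# P95 execution time per plugin hook
histogram_quantile(0.95, sum by (plugin, hook, le) (rate(plugin_duration_seconds_bucket[5m])))

# Top five plugins by error/timeout rate
topk(5, rate(plugin_errors_total[5m]))
```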
#### Process Metrics (when `include_process = true`)

| Metric | Type | Description |
|---|---|---|
| `process_resident_memory_bytes` | gauge | RSS memory usage |
| `process_cpu_seconds_total` | counter | Total CPU time consumed |
| `process_open_fds` | gauge | Open file descriptor count |
| `process_uptime_seconds` | gauge | Server uptime |
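Since CPU time is exported as a counter of seconds, its rate is the fraction of one core in use; memory is a plain gauge:

```promql
# CPU utilization as a fraction of one core
rate(process_cpu_seconds_total[5m])

# Resident memory in MiB
process_resident_memory_bytes / 1024 / 1024
```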
### Example Prometheus Scrape Config

```yaml
# prometheus.yml
scrape_configs:
  - job_name: bext
    scrape_interval: 15s
    static_configs:
      - targets: ["bext-host:3061"]
    metrics_path: /__bext/metrics
```
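If you run Prometheus via the Prometheus Operator on Kubernetes, the equivalent scrape can be declared as a ServiceMonitor; the `app: bext` label and `http` port name below are assumptions about your Service, not bext conventions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: bext
spec:
  selector:
    matchLabels:
      app: bext          # assumed Service label
  endpoints:
    - port: http         # assumed port name on the Service
      path: /__bext/metrics
      interval: 15s
```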
## Health Check Endpoint

`GET /__bext/health` returns a JSON health report:

```json
{
  "status": "healthy",
  "uptime_secs": 86412,
  "checks": {
    "render_pool": "ok",
    "isr_cache": "ok",
    "redis": "ok",
    "tls_certs": "ok",
    "disk_space": "ok"
  }
}
```
Returns `200` when healthy and `503` when any component reports degraded. Use this for load balancer health checks and Kubernetes liveness probes.
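A minimal liveness/readiness configuration for a bext container might look like this (port 3061 is assumed from the scrape example above):

```yaml
livenessProbe:
  httpGet:
    path: /__bext/health
    port: 3061           # assumed container port
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /__bext/health
    port: 3061
  periodSeconds: 5
  failureThreshold: 3
```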
## Grafana Dashboard

A sample Grafana dashboard JSON is available in the bext repository at `contrib/grafana-dashboard.json`. Import it into Grafana and set the Prometheus data source.

Key panels include:

- Request rate: `rate(http_requests_total[5m])`
- P50/P95/P99 latency, e.g. `histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))`
- Error rate: `rate(http_requests_total{status=~"5.."}[5m])`
- Cache hit ratio: `rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))`
- V8 pool saturation: `render_pool_active / render_pool_total`
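The quantile and ratio panels get cheaper if their expressions are precomputed as Prometheus recording rules; a sketch (the rule names are illustrative, not part of the shipped dashboard):

```yaml
# recording-rules.yml
groups:
  - name: bext-dashboard
    interval: 30s
    rules:
      - record: bext:http_request_duration_seconds:p99
        expr: histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
      - record: bext:cache_hit_ratio:5m
        expr: rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))
```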
## Alerting Rules

Recommended Prometheus alerting rules:

```yaml
# alerts.yml
groups:
  - name: bext
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "bext error rate above 5%"

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "bext P95 latency above 1 second"

      - alert: RenderPoolExhausted
        expr: render_pool_idle == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "All render workers busy; SSR requests are queuing"

      - alert: CacheHitRateLow
        expr: rate(cache_hits_total[10m]) / (rate(cache_hits_total[10m]) + rate(cache_misses_total[10m])) < 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cache hit rate below 50%; check ISR TTLs and cache capacity"
```
## OpenTelemetry (Pro Feature)

With a Pro or Enterprise license, bext exports traces and metrics via OTLP:

```toml
# bext.config.toml
[telemetry]
otlp_endpoint = "http://otel-collector:4317"
otlp_protocol = "grpc"            # grpc | http
sample_rate = 0.1                 # sample 10% of traces
service_name = "bext-production"
```

Each request creates a span carrying `request_id`, `path`, `status`, and `duration`. Child spans are created for the SSR render, cache lookups, plugin hooks, and upstream proxy calls, giving you a full distributed trace for every request.
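On the receiving side, a minimal OpenTelemetry Collector configuration that accepts this OTLP/gRPC traffic could look like the following; the `debug` exporter is only for verifying that data arrives, so swap in your tracing backend's exporter:

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch: {}
exporters:
  debug: {}              # replace with your backend's exporter
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```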