07 — Observability
Metrics, logging, tracing, and health monitoring for the bext platform.
Current State
- Logging: tracing crate with JSON output, env-based log level filter
- Metrics: Analytics plugin (per-request counters, Prometheus or JSON export)
- Health: /health endpoint with cache stats
- Monitoring: /metrics endpoint with JSC pool + cache metrics
Target State
Comprehensive observability covering:
- Structured metrics with dimensional labels (per-app, per-route, per-plugin)
- Distributed tracing with OpenTelemetry
- Health checks with dependency monitoring
- Alerting hooks for anomaly detection
Metrics
Core Metrics
# Request metrics
bext_requests_total{app, method, status, route}
bext_request_duration_ms{app, route, quantile}
bext_request_size_bytes{app, direction} # direction: request | response
# Cache metrics
bext_cache_hits_total{app, layer} # layer: isr | fragment | layout | tenant
bext_cache_misses_total{app, layer}
bext_cache_entries{app, layer}
bext_cache_bytes{app, layer}
bext_cache_evictions_total{app, layer}
bext_cache_invalidations_total{app, method} # method: tag | path | gc
# Isolate metrics
bext_isolate_count{app}
bext_isolate_memory_bytes{app}
bext_isolate_render_duration_ms{app, quantile}
bext_isolate_errors_total{app, error_type}
# Compression metrics
bext_compression_ratio{app, encoding} # encoding: gzip | brotli
bext_compression_duration_us{app, encoding}
# Plugin metrics
bext_plugin_duration_us{plugin, hook} # hook: on_request | on_response | etc.
bext_plugin_errors_total{plugin, hook}
bext_plugin_fuel_consumed{plugin}
# Deploy metrics
bext_deploys_total{app, status} # status: success | failed | rolled_back
bext_deploy_duration_ms{app}
# Flow engine metrics
bext_flow_active_runs
bext_flow_completed_total
bext_flow_failed_total
bext_flow_step_duration_ms{flow_name}
Prometheus Endpoint
GET /metrics
Content-Type: text/plain; version=0.0.4
bext_requests_total{app="marketing",method="GET",status="200",route="/about"} 42531
bext_request_duration_ms{app="marketing",route="/about",quantile="0.5"} 2.1
bext_request_duration_ms{app="marketing",route="/about",quantile="0.99"} 48.3
bext_cache_hits_total{app="marketing",layer="isr"} 39842
bext_cache_misses_total{app="marketing",layer="isr"} 2689
...
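As a rough illustration of how samples like the ones above could be serialized, here is a minimal sketch of a Prometheus text-format renderer using only the standard library. The `Sample` type and the `render` and `escape_label` functions are hypothetical names chosen for this example, not part of bext. The escaping rules (backslash, double quote, newline) follow the Prometheus text exposition format.

```rust
use std::fmt::Write;

/// One labeled sample (hypothetical type, for illustration only).
pub struct Sample {
    pub name: &'static str,
    /// Label pairs in the order they should be printed.
    pub labels: Vec<(&'static str, String)>,
    pub value: f64,
}

/// Escape a label value: backslash, double quote, and newline
/// must be escaped in the Prometheus text format.
pub fn escape_label(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Render samples as one `name{k="v",...} value` line each.
pub fn render(samples: &[Sample]) -> String {
    let mut out = String::new();
    for s in samples {
        out.push_str(s.name);
        if !s.labels.is_empty() {
            let parts: Vec<String> = s
                .labels
                .iter()
                .map(|(k, v)| format!("{}=\"{}\"", k, escape_label(v)))
                .collect();
            // Writing to a String cannot fail, so the result is ignored.
            let _ = write!(out, "{{{}}}", parts.join(","));
        }
        let _ = writeln!(out, " {}", s.value);
    }
    out
}
```

A real exporter would also emit `# HELP` and `# TYPE` lines per metric family; they are omitted here to keep the sketch short.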
Structured JSON Metrics
GET /metrics?format=json
{
  "timestamp": "2026-03-28T15:30:00Z",
  "uptime_secs": 86400,
  "apps": {
    "marketing": {
      "requests": { "total": 42531, "rps": 12.4 },
      "cache": { "hit_rate": 0.937, "entries": 2481, "bytes": 155189248 },
      "isolate": { "workers": 4, "memory_mb": 48, "avg_render_ms": 3.2 },
      "errors": { "total": 12, "rate": 0.0003 }
    }
  },
  "plugins": {
    "analytics": { "calls": 42531, "avg_us": 12 },
    "security-headers": { "calls": 42531, "avg_us": 3 }
  },
  "system": {
    "memory_mb": 256,
    "cpu_percent": 15.2,
    "open_fds": 128
  }
}
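The derived fields in this payload (`hit_rate`, error `rate`) are presumably ratios of the underlying counters. The sketch below assumes `hit_rate = hits / (hits + misses)`, which is an assumption and not something the spec states. The helper name `ratio` is hypothetical. With the counter values from the Prometheus example (39842 hits, 2689 misses) this yields about 0.937, matching the JSON sample.

```rust
/// Ratio of `part` to `part + rest`, or None when there is no data yet.
/// Returning None avoids reporting a misleading 0.0 or NaN for an idle app.
pub fn ratio(part: u64, rest: u64) -> Option<f64> {
    let total = part.checked_add(rest)?;
    if total == 0 {
        return None;
    }
    Some(part as f64 / total as f64)
}

/// Cache hit rate under the assumption hits / (hits + misses).
pub fn hit_rate(hits: u64, misses: u64) -> Option<f64> {
    ratio(hits, misses)
}
```

An exporter could map `None` to JSON `null` so dashboards can tell "no traffic yet" apart from "every lookup missed".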
Logging
Log Format
Structured JSON logs with trace context:
{
  "timestamp": "2026-03-28T15:30:42.123Z",
  "level": "info",
  "target": "bext_server::handler",
  "message": "request completed",
  "app": "marketing",
  "method": "GET",
  "path": "/about",
  "status": 200,
  "duration_ms": 2.1,
  "cache": "hit",
  "trace_id": "abc123def456",
  "span_id": "789ghi"
}
Log Levels
| Level | What gets logged |
|---|---|
| error | Request failures, isolate crashes, plugin errors |
| warn | Cache evictions, slow renders (>100ms), config issues |
| info | Request completions, deploys, cache invalidations |
| debug | Cache hits/misses, transform timing, plugin calls |
| trace | Full request/response bodies, WASM fuel consumption |
Per-App Log Filtering
[apps.marketing]
log_level = "info"
[apps.api]
log_level = "debug" # More verbose for API debugging
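One way per-app filtering could be resolved at runtime is sketched below: each app may override a global default, and an event is emitted only if its level is no more verbose than the effective level for that app. The `Level` and `LogFilter` types are hypothetical and standard-library only. An actual implementation would more likely build on the filtering facilities of the tracing ecosystem.

```rust
use std::collections::HashMap;

/// Log levels ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

impl Level {
    /// Parse a config value such as "info" (case-insensitive).
    pub fn parse(s: &str) -> Option<Level> {
        match s.to_ascii_lowercase().as_str() {
            "error" => Some(Level::Error),
            "warn" => Some(Level::Warn),
            "info" => Some(Level::Info),
            "debug" => Some(Level::Debug),
            "trace" => Some(Level::Trace),
            _ => None,
        }
    }
}

/// Hypothetical filter: a global default plus per-app overrides.
pub struct LogFilter {
    default: Level,
    per_app: HashMap<String, Level>,
}

impl LogFilter {
    pub fn new(default: Level) -> Self {
        Self { default, per_app: HashMap::new() }
    }

    pub fn set_app_level(&mut self, app: &str, level: Level) {
        self.per_app.insert(app.to_string(), level);
    }

    /// True if an event at `level` for `app` should be emitted.
    pub fn enabled(&self, app: &str, level: Level) -> bool {
        let max = self.per_app.get(app).copied().unwrap_or(self.default);
        level <= max
    }
}
```

With the config above, `marketing` would drop debug events while `api` keeps them; apps without an override fall back to the default.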
Health Checks
/health Endpoint
{
  "status": "healthy",
  "version": "0.5.0",
  "uptime_secs": 86400,
  "checks": {
    "database": { "status": "healthy", "latency_ms": 2 },
    "cache": { "status": "healthy", "entries": 4573, "hit_rate": 0.93 },
    "isolates": { "status": "healthy", "active": 8, "max": 100 },
    "flow_engine": { "status": "healthy", "active_runs": 3 },
    "plugins": { "status": "healthy", "loaded": 4 },
    "disk": { "status": "healthy", "data_dir_mb": 512, "free_mb": 10240 }
  },
  "apps": {
    "marketing": { "status": "running", "version": "abc1234" },
    "dashboard": { "status": "running", "version": "def5678" },
    "api": { "status": "running", "version": "ghi9012" }
  }
}
Liveness vs Readiness
| Endpoint | Purpose | Checks |
|---|---|---|
| /health/live | Is the process alive? | Always 200 |
| /health/ready | Can it serve traffic? | DB connected, at least 1 isolate ready |
| /health | Full health check | All subsystems |
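The three endpoints could map onto status codes roughly as sketched here. All type and function names are hypothetical. The use of 503 for "not ready" is an assumption; the table only specifies that liveness always returns 200. The full check is modeled as unhealthy if any subsystem is unhealthy, which is also an assumption.

```rust
/// Per-subsystem status (hypothetical; the spec only shows "healthy").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Healthy,
    Unhealthy,
}

/// Inputs for the readiness probe, as listed in the table.
pub struct Readiness {
    pub db_connected: bool,
    pub ready_isolates: usize,
}

/// /health/live: always 200 while the process can answer at all.
pub fn live_status_code() -> u16 {
    200
}

/// /health/ready: 200 only if the DB is connected and at least one
/// isolate is ready. 503 on failure is an assumption of this sketch.
pub fn ready_status_code(r: &Readiness) -> u16 {
    if r.db_connected && r.ready_isolates >= 1 {
        200
    } else {
        503
    }
}

/// /health: overall status is healthy only if every check is healthy.
pub fn overall(checks: &[(&str, Status)]) -> Status {
    if checks.iter().all(|(_, s)| *s == Status::Healthy) {
        Status::Healthy
    } else {
        Status::Unhealthy
    }
}
```

Keeping liveness free of dependency checks means an orchestrator does not restart the process just because the database is briefly unreachable; readiness takes the instance out of rotation instead.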
Implementation Tasks
OB-1: Metrics System
Tasks:
- Create bext-core/src/metrics.rs with metric registry
- Counter, Gauge, Histogram types (lock-free, atomic)
- Per-app metric labels
- Prometheus text format export
- JSON format export
- Request middleware that records metrics
- Cache operation metrics (already have hit/miss, add latency)
- Isolate metrics (render time, memory)
- Plugin metrics (execution time, errors)
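The lock-free metric types from the task list could look something like the following sketch, built on `std::sync::atomic`. The names and the fixed-bucket histogram layout are assumptions for illustration, not the planned contents of metrics.rs. `Ordering::Relaxed` is used because each metric is an independent value that does not guard other memory.

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

/// Monotonically increasing counter.
#[derive(Default)]
pub struct Counter(AtomicU64);

impl Counter {
    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Value that can go up and down (e.g. active isolates).
#[derive(Default)]
pub struct Gauge(AtomicI64);

impl Gauge {
    pub fn set(&self, v: i64) {
        self.0.store(v, Ordering::Relaxed);
    }
    pub fn add(&self, delta: i64) {
        self.0.fetch_add(delta, Ordering::Relaxed);
    }
    pub fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Fixed-bucket histogram over integer observations (e.g. microseconds).
pub struct Histogram {
    bounds: Vec<u64>,        // inclusive upper bounds, ascending
    buckets: Vec<AtomicU64>, // bounds.len() + 1; the last one is overflow
    sum: AtomicU64,
    count: AtomicU64,
}

impl Histogram {
    pub fn new(mut bounds: Vec<u64>) -> Self {
        bounds.sort_unstable();
        bounds.dedup();
        let buckets = (0..=bounds.len()).map(|_| AtomicU64::new(0)).collect();
        Self { bounds, buckets, sum: AtomicU64::new(0), count: AtomicU64::new(0) }
    }

    /// Record one observation in the first bucket whose bound is >= v.
    pub fn observe(&self, v: u64) {
        let idx = self.bounds.partition_point(|&b| b < v);
        self.buckets[idx].fetch_add(1, Ordering::Relaxed);
        self.sum.fetch_add(v, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    /// Non-cumulative per-bucket counts, overflow bucket last.
    pub fn bucket_counts(&self) -> Vec<u64> {
        self.buckets.iter().map(|b| b.load(Ordering::Relaxed)).collect()
    }
    pub fn sum(&self) -> u64 {
        self.sum.load(Ordering::Relaxed)
    }
    pub fn count(&self) -> u64 {
        self.count.load(Ordering::Relaxed)
    }
}
```

Note that the core metrics above use `quantile` labels; with a bucketed histogram like this one, quantiles would have to be estimated from bucket counts at export time, and a snapshot taken during concurrent updates may see `sum` and `count` slightly out of step.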
OB-2: Structured Logging
Tasks:
- Per-app log level configuration
- Request context in all log entries (app, trace_id), via tracing crate structured fields
- Log rotation / output to file (optional)
- Access log format (combined or JSON)
- Slow request logging (threshold configurable)
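Slow request logging with a configurable threshold might be decided as in this sketch, which picks the log level for a completed request. The `AccessLogPolicy` type is hypothetical. The 100 ms default is borrowed from the "slow renders (>100ms)" entry in the log level table and is an assumption for requests. Treating 5xx responses as request failures is likewise an assumption.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Error,
    Warn,
    Info,
}

/// Hypothetical policy holding the configurable slow-request threshold.
pub struct AccessLogPolicy {
    pub slow_threshold: Duration,
}

impl Default for AccessLogPolicy {
    fn default() -> Self {
        // Assumed default, taken from the >100ms slow-render rule.
        Self { slow_threshold: Duration::from_millis(100) }
    }
}

impl AccessLogPolicy {
    /// Level at which a completed request should be logged.
    pub fn level_for(&self, status: u16, elapsed: Duration) -> Level {
        if status >= 500 {
            Level::Error // request failure
        } else if elapsed > self.slow_threshold {
            Level::Warn // slow request
        } else {
            Level::Info // normal completion
        }
    }
}
```

Failures take precedence over slowness here, so a slow 500 is logged once at error level and not twice.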
OB-3: Health Check System
Tasks:
- /health/live endpoint (always 200)
- /health/ready endpoint (checks critical dependencies)
- /health full health with per-subsystem status
- Configurable health check thresholds
- Per-app health status
OB-4: OpenTelemetry Integration (Future)
Note: Config scaffolding only (TelemetryConfig struct in config.rs); no opentelemetry crate dependency yet.
Tasks:
- OTLP exporter for traces
- Trace context propagation (incoming W3C trace headers)
- Span creation for key operations (render, cache, plugin)
- Integration with tracing crate (already used)
- Config: [telemetry] otlp_endpoint = "http://..."
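For the trace context propagation task, incoming W3C `traceparent` headers have the form `version-traceid-parentid-flags` (2, 32, 16, and 2 lowercase hex characters). The sketch below parses version `00` only and rejects everything else, which is stricter than the specification's forward-compatibility rules. The `TraceContext` type and `parse_traceparent` function are hypothetical names; an eventual implementation would likely rely on an OpenTelemetry propagator instead.

```rust
/// Parsed fields of a version-00 traceparent header (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct TraceContext {
    pub trace_id: String,  // 32 lowercase hex chars
    pub parent_id: String, // 16 lowercase hex chars
    pub sampled: bool,     // bit 0 of the trace flags
}

/// True if `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse a `traceparent` header value; returns None if it is malformed.
pub fn parse_traceparent(header: &str) -> Option<TraceContext> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let parent_id = parts.next()?;
    let flags = parts.next()?;

    // This sketch accepts version 00 only, which has exactly four fields.
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace and parent IDs are invalid.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceContext {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: flags & 0x01 != 0,
    })
}
```

When parsing fails, a server would typically start a fresh trace and not reject the request, so the parsed `trace_id` (or a newly generated one) can feed the `trace_id` field in the log format.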