Phase 7: Reverse Proxy & Load Balancing
Goal
bext can proxy to upstream backends (Bun API, external services) and load-balance across multiple instances — replacing nginx's upstream + proxy_pass entirely.
Current State
- Multi-app routing dispatches requests to JSC render pool (single process)
- No upstream connection pooling
- No proxy_pass to external backends
- No round-robin / least-conn / failover across identical servers
- Canary manager does weighted traffic splitting but not classic load balancing
- No active health checking of upstreams
Why This Matters
Production bext deployments will need to:
- Proxy API routes to backend services (Node.js, Bun, Rust APIs)
- Load balance across multiple bext instances or backend replicas
- Failover when an upstream is unhealthy
- Pool connections to avoid TCP handshake overhead per request
Currently nginx handles all of this. Absorbing it means one less process to manage.
Design
Upstream Definition
```toml
# Named upstream groups
[upstreams.api]
servers = [
  { url = "http://127.0.0.1:3001", weight = 5 },
  { url = "http://127.0.0.1:3002", weight = 5 },
  { url = "http://127.0.0.1:3003", weight = 1, backup = true },
]
strategy = "least-conn"      # round-robin | least-conn | ip-hash | random
keepalive = 32               # Persistent connections per server
keepalive_timeout_ms = 60000
max_connections = 100        # Per-server connection limit
connect_timeout_ms = 3000
read_timeout_ms = 30000
write_timeout_ms = 10000

[upstreams.api.health]
enabled = true
path = "/health"
interval_ms = 10000          # Check every 10 seconds
timeout_ms = 3000
healthy_threshold = 2        # N successes to mark healthy
unhealthy_threshold = 3      # N failures to mark unhealthy

# Route rules proxy to upstream
[[route_rules]]
pattern = "/api/**"
proxy = "api"                # Proxy to upstream group "api"
strip_prefix = "/api"        # Optional: strip /api before forwarding
```
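The `strip_prefix` rewrite is easy to get subtly wrong (empty paths, false prefix matches like `/apiv2`). A minimal sketch of the intended behavior, using a hypothetical helper name `strip_prefix_path`:

```rust
/// Rewrite a request path by removing a configured prefix before
/// forwarding upstream. Falls back to "/" so the upstream never
/// receives an empty path; a non-boundary match like "/apiv2"
/// is left untouched. (Illustrative helper, not the final API.)
fn strip_prefix_path(path: &str, prefix: &str) -> String {
    match path.strip_prefix(prefix) {
        Some("") => "/".to_string(),                          // "/api" -> "/"
        Some(rest) if rest.starts_with('/') => rest.to_string(), // "/api/users" -> "/users"
        _ => path.to_string(),                                // no prefix, or mid-segment match
    }
}
```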
Load Balancing Strategies
| Strategy | Algorithm | Use case |
|---|---|---|
| `round-robin` | Rotate through servers sequentially | Equal servers, stateless APIs |
| `least-conn` | Pick server with fewest active connections | Varying request durations |
| `ip-hash` | Hash client IP to select server | Session affinity without cookies |
| `random` | Random selection with weighted probability | Simple, good distribution |
Connection Pooling
Maintain persistent HTTP connections to each upstream server:
```rust
struct UpstreamPool {
    name: String,
    servers: Vec<UpstreamServer>,
    strategy: LoadBalanceStrategy,
    pool: ConnectionPool, // hyper-backed connection pool (see connection.rs)
    health_states: DashMap<String, HealthState>,
}

struct UpstreamServer {
    url: Url,
    weight: u32,
    backup: bool,
    max_connections: u32,
    active_connections: AtomicU32,
}
```
The pool reuses TCP connections (Connection: keep-alive) across requests, avoiding per-request handshake overhead.
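The checkout/checkin cycle behind that reuse can be sketched with a toy std-only pool; a real implementation (hyper's) additionally expires idle connections after `keepalive_timeout_ms` and enforces `max_connections` per server:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Toy connection pool keyed by upstream origin. `C` stands in for
/// a live HTTP connection. (Illustrative only.)
struct IdlePool<C> {
    idle: Mutex<HashMap<String, Vec<C>>>,
}

impl<C> IdlePool<C> {
    fn new() -> Self {
        Self { idle: Mutex::new(HashMap::new()) }
    }

    /// Reuse an idle connection for `origin` if one exists,
    /// avoiding a fresh TCP (and TLS) handshake.
    fn checkout(&self, origin: &str) -> Option<C> {
        self.idle.lock().unwrap().get_mut(origin)?.pop()
    }

    /// Return a connection to the idle list once the response
    /// body has been fully read.
    fn checkin(&self, origin: &str, conn: C) {
        self.idle.lock().unwrap().entry(origin.to_string()).or_default().push(conn);
    }
}
```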
Health Checking
Background task periodically probes each upstream:
```rust
async fn health_checker(pool: &UpstreamPool, client: &reqwest::Client, config: &HealthConfig) {
    let mut interval = tokio::time::interval(config.interval);
    loop {
        interval.tick().await;
        for server in &pool.servers {
            let result = probe_health(client, &server.url, &config.path, config.timeout).await;
            pool.update_health(&server.url, result);
        }
    }
}

async fn probe_health(
    client: &reqwest::Client,
    url: &Url,
    path: &str,
    timeout: Duration,
) -> HealthResult {
    let target = match url.join(path) {
        Ok(u) => u,
        Err(e) => return HealthResult::Unhealthy(e.to_string()),
    };
    match tokio::time::timeout(timeout, client.get(target).send()).await {
        Ok(Ok(resp)) if resp.status().is_success() => HealthResult::Healthy,
        Ok(Ok(resp)) => HealthResult::Unhealthy(format!("status {}", resp.status())),
        Ok(Err(e)) => HealthResult::Unhealthy(e.to_string()),
        Err(_) => HealthResult::Unhealthy("timeout".to_string()),
    }
}
```
Servers transition between states:
```text
HEALTHY ──── unhealthy_threshold failures ──▶ UNHEALTHY
UNHEALTHY ── healthy_threshold successes ──▶ HEALTHY
```
Unhealthy servers are skipped during selection. If all primary servers are unhealthy, backup servers are used. If everything is down, return 502 Bad Gateway.
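The threshold logic above is a small state machine; a std-only sketch (illustrative field and type names, not the final `health.rs` API):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Health { Healthy, Unhealthy }

struct HealthState {
    status: Health,
    consecutive_ok: u32,
    consecutive_fail: u32,
}

impl HealthState {
    fn new() -> Self {
        Self { status: Health::Healthy, consecutive_ok: 0, consecutive_fail: 0 }
    }

    /// Record one probe result; flip state only at the configured
    /// thresholds so a single flaky probe doesn't cause churn.
    fn record(&mut self, ok: bool, healthy_threshold: u32, unhealthy_threshold: u32) {
        if ok {
            self.consecutive_fail = 0;
            self.consecutive_ok += 1;
            if self.consecutive_ok >= healthy_threshold {
                self.status = Health::Healthy;
            }
        } else {
            self.consecutive_ok = 0;
            self.consecutive_fail += 1;
            if self.consecutive_fail >= unhealthy_threshold {
                self.status = Health::Unhealthy;
            }
        }
    }
}
```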
Proxy Headers
When forwarding requests, set standard proxy headers:
```text
X-Forwarded-For: {client_ip}
X-Forwarded-Proto: {http|https}
X-Forwarded-Host: {original_host}
X-Real-IP: {client_ip}
X-Request-ID: {generated_uuid}
Via: 1.1 bext
```
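One detail worth pinning down: `X-Forwarded-For` is a chain, so when a request arrives through another proxy, bext should append rather than overwrite. A minimal sketch (hypothetical helper name):

```rust
/// Append the client IP to an existing X-Forwarded-For chain, or
/// start one. Each proxy hop appends, so the upstream sees the
/// full path: "client, proxy1, proxy2".
fn forwarded_for(existing: Option<&str>, client_ip: &str) -> String {
    match existing {
        Some(chain) if !chain.is_empty() => format!("{chain}, {client_ip}"),
        _ => client_ip.to_string(),
    }
}
```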
Request/Response Manipulation
```toml
[[route_rules]]
pattern = "/api/**"
proxy = "api"

# Add headers before forwarding to upstream
[route_rules.proxy_headers]
X-Tenant-ID = "{tenant_id}"   # Template variables
X-App-ID = "{app_id}"
Authorization = ""            # Empty = remove header

# Modify response headers from upstream
[route_rules.response_headers]
Server = "bext"               # Override
X-Powered-By = ""             # Remove
```
Retry Policy
```toml
[upstreams.api]
retry_count = 2      # Retry on failure
retry_on = ["502", "503", "504", "connect_error", "timeout"]
retry_delay_ms = 100 # Wait between retries

# Idempotent methods only by default (GET, HEAD, OPTIONS);
# set retry_unsafe = true to retry POST/PUT/DELETE
retry_unsafe = false
```
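The retry decision itself reduces to a pure predicate over this config. A sketch, assuming `reason` uses the same strings as `retry_on` (a status code like `"502"`, or `"connect_error"` / `"timeout"`):

```rust
/// Decide whether a failed attempt should be retried.
/// `attempt` counts attempts already made (1 = the first try),
/// so `retry_count = 2` allows at most two retries after it.
fn should_retry(
    method: &str,
    reason: &str,
    attempt: u32,
    retry_count: u32,
    retry_on: &[&str],
    retry_unsafe: bool,
) -> bool {
    // Only idempotent methods retry by default; retry_unsafe opts in
    // POST/PUT/DELETE at the operator's risk.
    let idempotent = matches!(method, "GET" | "HEAD" | "OPTIONS");
    (idempotent || retry_unsafe)
        && attempt <= retry_count
        && retry_on.contains(&reason)
}
```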
Circuit Breaker
Per-upstream circuit breaker (Traefik-inspired):
```toml
[upstreams.api.circuit_breaker]
enabled = true
failure_threshold = 5 # Open after N failures
success_threshold = 3 # Close after N successes (half-open)
timeout_ms = 30000    # Time in open state before half-open
```
States:
```text
CLOSED ── failure_threshold failures ──▶ OPEN ── timeout ──▶ HALF_OPEN
   ▲                                      ▲                      │
   │                                      └──── any failure ─────┤
   └───────────── success_threshold successes ───────────────────┘
```
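That state machine can be sketched with std only; passing `now` explicitly keeps it deterministic and testable (illustrative shape, not the final `circuit_breaker.rs` API):

```rust
use std::time::{Duration, Instant};

#[derive(PartialEq, Debug)]
enum Circuit { Closed, Open, HalfOpen }

struct CircuitBreaker {
    state: Circuit,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
    failure_threshold: u32,
    success_threshold: u32,
    timeout: Duration,
}

impl CircuitBreaker {
    /// Gate a request: Closed and HalfOpen admit traffic; Open
    /// rejects until `timeout` elapses, then moves to HalfOpen.
    fn allow(&mut self, now: Instant) -> bool {
        if self.state == Circuit::Open
            && self.opened_at.map_or(false, |t| now - t >= self.timeout)
        {
            self.state = Circuit::HalfOpen;
            self.successes = 0;
        }
        self.state != Circuit::Open
    }

    fn on_result(&mut self, ok: bool, now: Instant) {
        match (ok, &self.state) {
            (true, Circuit::HalfOpen) => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = Circuit::Closed;
                    self.failures = 0;
                }
            }
            (true, _) => self.failures = 0,
            (false, Circuit::HalfOpen) => {
                // Trial request failed: reopen immediately.
                self.state = Circuit::Open;
                self.opened_at = Some(now);
            }
            (false, _) => {
                self.failures += 1;
                if self.failures >= self.failure_threshold {
                    self.state = Circuit::Open;
                    self.opened_at = Some(now);
                }
            }
        }
    }
}
```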
WebSocket Proxying
When an upstream speaks WebSocket, bext must detect the HTTP Upgrade handshake and switch to a bidirectional byte proxy:

```rust
let upgrade = req.headers().get("upgrade").and_then(|v| v.to_str().ok());
if upgrade.map_or(false, |v| v.eq_ignore_ascii_case("websocket")) {
    // Bidirectional proxy: client <-> bext <-> upstream
    let upstream_ws = connect_ws(&upstream_url).await?;
    return proxy_websocket(client_ws, upstream_ws).await;
}
```
```toml
[[route_rules]]
pattern = "/ws/**"
proxy = "api"
websocket = true                # Enable WS upgrade proxying
websocket_timeout_ms = 86400000 # 24h idle timeout for WS connections
```
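Header values arrive in arbitrary case, and `Connection` is a comma-separated token list (`keep-alive, Upgrade`), so the detection check deserves care. A std-only sketch of the full predicate (hypothetical helper name):

```rust
/// Case-insensitive check for an HTTP/1.1 WebSocket upgrade, given
/// the raw Upgrade and Connection header values. The Connection
/// header is a token list, so "keep-alive, Upgrade" must match too.
fn is_websocket_upgrade(upgrade: Option<&str>, connection: Option<&str>) -> bool {
    let wants_ws = upgrade.map_or(false, |v| v.eq_ignore_ascii_case("websocket"));
    let has_upgrade_token = connection.map_or(false, |v| {
        v.split(',').any(|t| t.trim().eq_ignore_ascii_case("upgrade"))
    });
    wants_ws && has_upgrade_token
}
```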
Implementation
New Module: bext-server/src/proxy/
```text
bext-server/src/proxy/
  mod.rs              # Proxy middleware entry point
  upstream.rs         # UpstreamPool, server selection
  health.rs           # Background health checker
  circuit_breaker.rs  # Per-upstream circuit breaker
  connection.rs       # Connection pool (hyper Client)
  headers.rs          # Header manipulation (add/remove/template)
  retry.rs            # Retry policy with backoff
  websocket.rs        # WebSocket upgrade proxy
```
Key Dependencies
| Crate | Purpose |
|---|---|
| `reqwest` | HTTP client with connection pooling (already in deps) |
| `hyper` | Low-level HTTP client for connection pool management |
| `tokio-tungstenite` | WebSocket proxy (bidirectional stream) |
Integration with Route Rules
Extend the route rules engine:
```rust
enum RouteAction {
    Render(RenderConfig),     // Existing: JSC render
    Proxy(ProxyConfig),       // New: forward to upstream
    Static(StaticConfig),     // Existing: serve static file
    Redirect(RedirectConfig), // Existing: redirect
}

struct ProxyConfig {
    upstream: String,         // Upstream group name
    strip_prefix: Option<String>,
    proxy_headers: HashMap<String, String>,
    response_headers: HashMap<String, String>,
    websocket: bool,
    retry: RetryConfig,
}
```
Testing Plan
| Test | Type | What it validates |
|---|---|---|
| Round-robin selection | Unit | Requests distributed evenly |
| Least-conn selection | Unit | Picks server with fewest connections |
| IP-hash selection | Unit | Same IP always routes to same server |
| Weighted selection | Unit | Weights respected in distribution |
| Connection pool reuse | Integration | TCP connections reused across requests |
| Health check probe | Unit | Healthy/unhealthy transitions at threshold |
| Backup server failover | Unit | Backup used when all primaries unhealthy |
| All-down 502 | Unit | Returns 502 when everything is down |
| Proxy headers | Unit | X-Forwarded-* headers set correctly |
| Header manipulation | Unit | Add/remove/template headers |
| Strip prefix | Unit | /api/users → /users when strip_prefix = "/api" |
| Retry on 502 | Integration | Retries on configured status codes |
| Retry idempotent only | Unit | POST not retried by default |
| Circuit breaker open | Unit | Requests rejected when circuit open |
| Circuit breaker half-open | Unit | Trial request after timeout |
| WebSocket proxy | Integration | Bidirectional WS frames forwarded |
| Connection timeout | Unit | Returns 504 on upstream connect timeout |
| Read timeout | Unit | Returns 504 on upstream read timeout |
| Keepalive | Integration | Idle connections closed at timeout |
Done Criteria
- Upstream groups defined in config with multiple servers
- Round-robin, least-conn, ip-hash, random strategies
- Connection pooling with configurable keepalive
- Active health checking with healthy/unhealthy thresholds
- Backup server failover
- Proxy headers (X-Forwarded-*, X-Real-IP, X-Request-ID)
- Header add/remove/template for proxy and response
- Strip prefix for path rewriting
- Retry policy (configurable status codes, idempotent only)
- Circuit breaker per upstream
- WebSocket upgrade proxying
- Connection and read timeouts
- 502 response when all upstreams down
- All tests passing