Phase 7: Reverse Proxy & Load Balancing

Goal

bext can proxy to upstream backends (Bun API, external services) and load-balance across multiple instances — replacing nginx's upstream + proxy_pass entirely.

Current State

  • Multi-app routing dispatches requests to JSC render pool (single process)
  • No upstream connection pooling
  • No proxy_pass to external backends
  • No round-robin / least-conn / failover across identical servers
  • Canary manager does weighted traffic splitting but not classic load balancing
  • No active health checking of upstreams

Why This Matters

Production bext deployments will need to:

  1. Proxy API routes to backend services (Node.js, Bun, Rust APIs)
  2. Load balance across multiple bext instances or backend replicas
  3. Failover when an upstream is unhealthy
  4. Pool connections to avoid TCP handshake overhead per request

Currently nginx handles all of this. Absorbing it means one less process to manage.

Design

Upstream Definition

# Named upstream groups
[upstreams.api]
servers = [
    { url = "http://127.0.0.1:3001", weight = 5 },
    { url = "http://127.0.0.1:3002", weight = 5 },
    { url = "http://127.0.0.1:3003", weight = 1, backup = true },
]
strategy = "least-conn"           # round-robin | least-conn | ip-hash | random
keepalive = 32                    # Persistent connections per server
keepalive_timeout_ms = 60000
max_connections = 100             # Per-server connection limit
connect_timeout_ms = 3000
read_timeout_ms = 30000
write_timeout_ms = 10000

[upstreams.api.health]
enabled = true
path = "/health"
interval_ms = 10000               # Check every 10 seconds
timeout_ms = 3000
healthy_threshold = 2              # N successes to mark healthy
unhealthy_threshold = 3            # N failures to mark unhealthy

# Route rules proxy to upstream
[[route_rules]]
pattern = "/api/**"
proxy = "api"                      # Proxy to upstream group "api"
strip_prefix = "/api"              # Optional: strip /api before forwarding
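
The strip_prefix rewrite is a plain path transform. A minimal sketch (function name is illustrative, not the actual bext API):

```rust
/// Strip a configured prefix from the request path before forwarding.
/// Falls back to "/" when stripping would leave an empty path.
fn rewrite_path(path: &str, strip_prefix: Option<&str>) -> String {
    match strip_prefix {
        Some(prefix) => match path.strip_prefix(prefix) {
            Some("") => "/".to_string(),      // "/api" -> "/"
            Some(rest) => rest.to_string(),   // "/api/users" -> "/users"
            None => path.to_string(),         // prefix not present: unchanged
        },
        None => path.to_string(),
    }
}
```

Paths that don't carry the prefix pass through unchanged, so one upstream can back several route rules with different rewrites.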

Load Balancing Strategies

Strategy      Algorithm                                    Use case
round-robin   Rotate through servers sequentially          Equal servers, stateless APIs
least-conn    Pick server with fewest active connections   Varying request durations
ip-hash       Hash client IP to select server              Session affinity without cookies
random        Random selection with weighted probability   Simple, good distribution
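
As a rough illustration, three of these strategies reduce to pure functions over the server list (a simplified sketch; the real pool would also skip unhealthy servers and apply weights):

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

struct Server {
    url: String,
    active_connections: AtomicU32,
}

/// round-robin: advance a shared counter and take it modulo the server count.
fn pick_round_robin<'a>(servers: &'a [Server], counter: &AtomicUsize) -> &'a Server {
    let i = counter.fetch_add(1, Ordering::Relaxed) % servers.len();
    &servers[i]
}

/// least-conn: choose the server with the fewest in-flight requests.
fn pick_least_conn(servers: &[Server]) -> &Server {
    servers
        .iter()
        .min_by_key(|s| s.active_connections.load(Ordering::Relaxed))
        .expect("server list must be non-empty")
}

/// ip-hash: hash the client IP so the same client lands on the same server.
fn pick_ip_hash<'a>(servers: &'a [Server], client_ip: &str) -> &'a Server {
    use std::hash::{Hash, Hasher};
    let mut h = std::collections::hash_map::DefaultHasher::new();
    client_ip.hash(&mut h);
    &servers[(h.finish() as usize) % servers.len()]
}
```

Note that ip-hash redistributes clients when the server list changes size; consistent hashing would avoid that, at the cost of complexity.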

Connection Pooling

Maintain persistent HTTP connections to each upstream server:

struct UpstreamPool {
    name: String,
    servers: Vec<UpstreamServer>,
    strategy: LoadBalanceStrategy,
    pool: ConnectionPool,            // hyper-backed connection pool (connection.rs)
    health_states: DashMap<String, HealthState>,
}

struct UpstreamServer {
    url: Url,
    weight: u32,
    backup: bool,
    max_connections: u32,
    active_connections: AtomicU32,
}

The pool reuses TCP connections (Connection: keep-alive) across requests, avoiding per-request handshake overhead.
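
In practice bext would lean on hyper's built-in pooling, but the checkout/check-in mechanics look roughly like this std-only sketch (the Conn type stands in for a real TCP connection; names are illustrative):

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Stand-in for a live upstream connection.
struct Conn {
    id: u32,
}

/// Minimal keepalive pool: reuse idle connections, cap how many are retained.
struct ConnPool {
    idle: Mutex<VecDeque<Conn>>,
    max_idle: usize,
    next_id: Mutex<u32>,
}

impl ConnPool {
    fn new(max_idle: usize) -> Self {
        ConnPool { idle: Mutex::new(VecDeque::new()), max_idle, next_id: Mutex::new(0) }
    }

    /// Prefer an idle connection; "dial" a new one only when none is available.
    fn checkout(&self) -> Conn {
        if let Some(conn) = self.idle.lock().unwrap().pop_front() {
            return conn; // reuse: no TCP handshake
        }
        let mut id = self.next_id.lock().unwrap();
        *id += 1;
        Conn { id: *id }
    }

    /// Return a connection for reuse; drop it if the idle set is full.
    fn checkin(&self, conn: Conn) {
        let mut idle = self.idle.lock().unwrap();
        if idle.len() < self.max_idle {
            idle.push_back(conn);
        } // else: conn is dropped (connection closed)
    }
}
```

A production pool additionally evicts connections idle past keepalive_timeout_ms and enforces max_connections per server.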

Health Checking

Background task periodically probes each upstream:

async fn health_checker(client: reqwest::Client, pool: &UpstreamPool, config: &HealthConfig) {
    let mut interval = tokio::time::interval(config.interval);
    loop {
        interval.tick().await;
        for server in &pool.servers {
            let result = probe_health(&client, &server.url, &config.path, config.timeout).await;
            pool.update_health(&server.url, result);
        }
    }
}

async fn probe_health(client: &reqwest::Client, url: &Url, path: &str, timeout: Duration) -> HealthResult {
    // A malformed probe URL counts as an unhealthy result rather than a panic.
    let probe_url = match url.join(path) {
        Ok(u) => u,
        Err(e) => return HealthResult::Unhealthy(e.to_string()),
    };
    match tokio::time::timeout(timeout, client.get(probe_url).send()).await {
        Ok(Ok(resp)) if resp.status().is_success() => HealthResult::Healthy,
        Ok(Ok(resp)) => HealthResult::Unhealthy(format!("status {}", resp.status())),
        Ok(Err(e)) => HealthResult::Unhealthy(e.to_string()),
        Err(_) => HealthResult::Unhealthy("timeout".to_string()),
    }
}

Servers transition between states:

HEALTHY ──── unhealthy_threshold failures ──▶ UNHEALTHY
UNHEALTHY ── healthy_threshold successes ──▶ HEALTHY

Unhealthy servers are skipped during selection. If all primary servers are unhealthy, backup servers are used. If everything is down, return 502 Bad Gateway.
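
The threshold-based transitions can be sketched with a small per-server state tracker (illustrative, not the actual bext type):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Health {
    Healthy,
    Unhealthy,
}

/// Tracks consecutive probe results and flips state at the configured thresholds.
struct HealthState {
    state: Health,
    consecutive_ok: u32,
    consecutive_fail: u32,
    healthy_threshold: u32,
    unhealthy_threshold: u32,
}

impl HealthState {
    fn new(healthy_threshold: u32, unhealthy_threshold: u32) -> Self {
        HealthState {
            state: Health::Healthy, // servers start healthy until proven otherwise
            consecutive_ok: 0,
            consecutive_fail: 0,
            healthy_threshold,
            unhealthy_threshold,
        }
    }

    /// Record one probe result; a success resets the failure streak and vice versa.
    fn record(&mut self, ok: bool) -> Health {
        if ok {
            self.consecutive_ok += 1;
            self.consecutive_fail = 0;
            if self.consecutive_ok >= self.healthy_threshold {
                self.state = Health::Healthy;
            }
        } else {
            self.consecutive_fail += 1;
            self.consecutive_ok = 0;
            if self.consecutive_fail >= self.unhealthy_threshold {
                self.state = Health::Unhealthy;
            }
        }
        self.state
    }
}
```

Requiring consecutive results in both directions prevents a single flaky probe from flapping a server in and out of rotation.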

Proxy Headers

When forwarding requests, set standard proxy headers:

X-Forwarded-For: {client_ip}
X-Forwarded-Proto: {http|https}
X-Forwarded-Host: {original_host}
X-Real-IP: {client_ip}
X-Request-ID: {generated_uuid}
Via: 1.1 bext
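
Building these as a plain map might look like the following (hypothetical helper; UUID generation elided, request_id passed in). Note that X-Forwarded-For must append to any value set by an earlier proxy rather than overwrite it:

```rust
use std::collections::HashMap;

/// Append the client IP to an existing X-Forwarded-For chain, or start one.
fn forwarded_for(existing: Option<&str>, client_ip: &str) -> String {
    match existing {
        Some(prev) => format!("{prev}, {client_ip}"),
        None => client_ip.to_string(),
    }
}

/// Compute the standard proxy headers for a forwarded request.
fn proxy_headers(client_ip: &str, proto: &str, host: &str, request_id: &str) -> HashMap<String, String> {
    let mut h = HashMap::new();
    h.insert("X-Forwarded-For".into(), client_ip.into());
    h.insert("X-Forwarded-Proto".into(), proto.into());
    h.insert("X-Forwarded-Host".into(), host.into());
    h.insert("X-Real-IP".into(), client_ip.into());
    h.insert("X-Request-ID".into(), request_id.into());
    h.insert("Via".into(), "1.1 bext".into());
    h
}
```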

Request/Response Manipulation

[[route_rules]]
pattern = "/api/**"
proxy = "api"

# Add headers before forwarding to upstream
[route_rules.proxy_headers]
X-Tenant-ID = "{tenant_id}"      # Template variables
X-App-ID = "{app_id}"
Authorization = ""                # Empty = remove header

# Modify response headers from upstream
[route_rules.response_headers]
Server = "bext"                   # Override
X-Powered-By = ""                # Remove

Retry Policy

[upstreams.api]
retry_count = 2                   # Retry on failure
retry_on = ["502", "503", "504", "connect_error", "timeout"]
retry_delay_ms = 100              # Wait between retries

# Idempotent methods only by default (GET, HEAD, OPTIONS)
# Set retry_unsafe = true to retry POST/PUT/DELETE
retry_unsafe = false
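
The retry decision under this policy reduces to a small predicate (a sketch; the retry_on entries are matched as strings against status codes and error kinds, mirroring the config above):

```rust
/// Outcome of one proxied attempt, as seen by the retry policy.
enum AttemptError {
    Status(u16),
    ConnectError,
    Timeout,
}

struct RetryPolicy {
    retry_count: u32,
    retry_on: Vec<String>,
    retry_unsafe: bool,
}

/// Decide whether a failed attempt should be retried.
fn should_retry(policy: &RetryPolicy, method: &str, attempt: u32, err: &AttemptError) -> bool {
    if attempt >= policy.retry_count {
        return false; // retry budget exhausted
    }
    // Only idempotent methods unless retry_unsafe is set.
    let idempotent = matches!(method, "GET" | "HEAD" | "OPTIONS");
    if !idempotent && !policy.retry_unsafe {
        return false;
    }
    // Normalize the error into the same keys used in retry_on.
    let key = match err {
        AttemptError::Status(code) => code.to_string(),
        AttemptError::ConnectError => "connect_error".to_string(),
        AttemptError::Timeout => "timeout".to_string(),
    };
    policy.retry_on.iter().any(|r| r == &key)
}
```

Each retried attempt should also re-run server selection so the retry lands on a different upstream where possible.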

Circuit Breaker

Per-upstream circuit breaker (Traefik-inspired):

[upstreams.api.circuit_breaker]
enabled = true
failure_threshold = 5             # Open after N failures
success_threshold = 3             # Close after N successes (half-open)
timeout_ms = 30000                # Time in open state before half-open

States:

CLOSED ──── failure_threshold failures ──▶ OPEN ──── timeout ──▶ HALF_OPEN
HALF_OPEN ── success_threshold successes ──▶ CLOSED
HALF_OPEN ── any failure ──▶ OPEN
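
A minimal, single-threaded sketch of these transitions (the real breaker would need to be shared across request handlers, e.g. behind atomics or a mutex):

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Clone, Copy)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

struct CircuitBreaker {
    state: CircuitState,
    failures: u32,
    successes: u32,
    failure_threshold: u32,
    success_threshold: u32,
    open_timeout: Duration,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, success_threshold: u32, open_timeout: Duration) -> Self {
        CircuitBreaker {
            state: CircuitState::Closed,
            failures: 0,
            successes: 0,
            failure_threshold,
            success_threshold,
            open_timeout,
            opened_at: None,
        }
    }

    /// May this request pass? An expired open timer moves us to half-open.
    fn allow(&mut self) -> bool {
        if self.state == CircuitState::Open {
            if self.opened_at.map_or(false, |t| t.elapsed() >= self.open_timeout) {
                self.state = CircuitState::HalfOpen;
                self.successes = 0;
            }
        }
        self.state != CircuitState::Open
    }

    fn on_success(&mut self) {
        match self.state {
            CircuitState::HalfOpen => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = CircuitState::Closed;
                    self.failures = 0;
                }
            }
            _ => self.failures = 0,
        }
    }

    fn on_failure(&mut self) {
        match self.state {
            CircuitState::HalfOpen => self.trip(), // trial failed: reopen
            CircuitState::Closed => {
                self.failures += 1;
                if self.failures >= self.failure_threshold {
                    self.trip();
                }
            }
            CircuitState::Open => {}
        }
    }

    fn trip(&mut self) {
        self.state = CircuitState::Open;
        self.opened_at = Some(Instant::now());
        self.failures = 0;
    }
}
```

When allow() returns false the proxy should fail fast with a 502/503 instead of queueing requests behind a known-bad upstream.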

WebSocket Proxying

When an upstream serves WebSocket, bext must handle the Upgrade header:

if req
    .headers()
    .get("upgrade")
    .map_or(false, |v| v.as_bytes().eq_ignore_ascii_case(b"websocket"))
{
    // Bidirectional proxy: client <-> bext <-> upstream
    let upstream_ws = connect_ws(&upstream_url).await?;
    return proxy_websocket(client_ws, upstream_ws).await;
}

[[route_rules]]
pattern = "/ws/**"
proxy = "api"
websocket = true                  # Enable WS upgrade proxying
websocket_timeout_ms = 86400000   # 24h idle timeout for WS connections

Implementation

New Module: bext-server/src/proxy/

bext-server/src/proxy/
  mod.rs              # Proxy middleware entry point
  upstream.rs         # UpstreamPool, server selection
  health.rs           # Background health checker
  circuit_breaker.rs  # Per-upstream circuit breaker
  connection.rs       # Connection pool (hyper Client)
  headers.rs          # Header manipulation (add/remove/template)
  retry.rs            # Retry policy with backoff
  websocket.rs        # WebSocket upgrade proxy

Key Dependencies

Crate              Purpose
reqwest            HTTP client with connection pooling (already in deps)
hyper              Low-level HTTP client for connection pool management
tokio-tungstenite  WebSocket proxy (bidirectional stream)

Integration with Route Rules

Extend the route rules engine:

enum RouteAction {
    Render(RenderConfig),          // Existing: JSC render
    Proxy(ProxyConfig),            // New: forward to upstream
    Static(StaticConfig),          // Existing: serve static file
    Redirect(RedirectConfig),      // Existing: redirect
}

struct ProxyConfig {
    upstream: String,              // Upstream group name
    strip_prefix: Option<String>,
    proxy_headers: HashMap<String, String>,
    response_headers: HashMap<String, String>,
    websocket: bool,
    retry: RetryConfig,
}

Testing Plan

Test                       Type         What it validates
Round-robin selection      Unit         Requests distributed evenly
Least-conn selection       Unit         Picks server with fewest connections
IP-hash selection          Unit         Same IP always routes to same server
Weighted selection         Unit         Weights respected in distribution
Connection pool reuse      Integration  TCP connections reused across requests
Health check probe         Unit         Healthy/unhealthy transitions at threshold
Backup server failover     Unit         Backup used when all primaries unhealthy
All-down 502               Unit         Returns 502 when everything is down
Proxy headers              Unit         X-Forwarded-* headers set correctly
Header manipulation        Unit         Add/remove/template headers
Strip prefix               Unit         /api/users forwarded as /users when strip_prefix = "/api"
Retry on 502               Integration  Retries on configured status codes
Retry idempotent only      Unit         POST not retried by default
Circuit breaker open       Unit         Requests rejected when circuit open
Circuit breaker half-open  Unit         Trial request after timeout
WebSocket proxy            Integration  Bidirectional WS frames forwarded
Connection timeout         Unit         Returns 504 on upstream connect timeout
Read timeout               Unit         Returns 504 on upstream read timeout
Keepalive                  Integration  Idle connections closed at timeout

Done Criteria

  • Upstream groups defined in config with multiple servers
  • Round-robin, least-conn, ip-hash, random strategies
  • Connection pooling with configurable keepalive
  • Active health checking with healthy/unhealthy thresholds
  • Backup server failover
  • Proxy headers (X-Forwarded-*, X-Real-IP, X-Request-ID)
  • Header add/remove/template for proxy and response
  • Strip prefix for path rewriting
  • Retry policy (configurable status codes, idempotent only)
  • Circuit breaker per upstream
  • WebSocket upgrade proxying
  • Connection and read timeouts
  • 502 response when all upstreams down
  • All tests passing