Scaling

A single bext instance handles thousands of requests per second. When you need more capacity, scale horizontally by running multiple instances behind a load balancer with Redis for cross-instance coordination.

Configuration

# bext.config.toml
[server]
listen = "0.0.0.0:3061"
workers = 0                      # 0 = auto (one per CPU core)

[redis]
url = "redis://redis.internal:6379/0"
pool_size = 16                   # connection pool size per instance

[scaling]
health_path = "/__bext/health"   # health check endpoint
graceful_shutdown_timeout_secs = 30
session_affinity = "none"        # none | cookie | ip_hash
session_cookie_name = "BEXT_SID" # cookie name when session_affinity = "cookie"

[rate_limit]
enabled = true
requests_per_minute = 600
distributed = true               # share counters across instances via Redis

Redis L2 Cache Sync

When multiple bext instances serve the same app, each has its own in-memory L1 cache (DashMap). Without coordination, cache invalidation on one instance leaves stale content on others.

Enable Redis as the L2 cache layer to solve this:

[redis]
url = "redis://redis.internal:6379/0"

[cache.isr]
max_entries = 10_000
default_ttl_ms = 60_000
default_swr_ms = 3_600_000
l2_enabled = true                # write-through to Redis
l2_ttl_ms = 300_000              # Redis entries live longer than L1

Cache flow with L2 enabled:

1. L1 HIT -- Serve from in-memory DashMap (sub-millisecond)
2. L1 MISS, L2 HIT -- Fetch from Redis, promote to L1, serve
3. L1 MISS, L2 MISS -- SSR render, write to both L1 and L2, serve
4. Invalidation -- Purge from L1 and L2; other instances see an L2 miss on the next request and re-render

ISR revalidation results are written to L2, so other instances pick up fresh content without rendering again.
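The lookup order above can be sketched as a read-through cache. This is a simplified model, not bext's internals: plain dicts stand in for the DashMap L1 and Redis L2, and `render` is a placeholder for an SSR render call.

```python
# Sketch of the L1/L2 read-through flow described above.
# `l1` stands in for the in-memory DashMap; `l2` stands in for Redis.

def cache_get(key, l1, l2, render):
    # 1. L1 HIT: serve from memory
    if key in l1:
        return l1[key], "l1_hit"
    # 2. L1 MISS, L2 HIT: promote to L1, then serve
    if key in l2:
        l1[key] = l2[key]
        return l1[key], "l2_hit"
    # 3. L1 MISS, L2 MISS: render, write through to both layers
    html = render(key)
    l1[key] = html
    l2[key] = html
    return html, "render"

def invalidate(key, l1, l2):
    # 4. Invalidation: purge both layers; other instances see an
    #    L2 miss on their next request and re-render.
    l1.pop(key, None)
    l2.pop(key, None)
```

Note how a second instance with a cold L1 but a shared L2 serves the same content without re-rendering, which is exactly the cross-instance win.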

Distributed Rate Limiting

With distributed = true, rate-limit counters are stored in Redis and enforced with a sliding window. All instances share the same budget per client IP:

[rate_limit]
enabled = true
requests_per_minute = 600
distributed = true

Without Redis, each instance tracks its own counters, meaning a client could hit N * 600 requests/minute across N instances.
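A minimal sliding-window counter over a shared store illustrates why sharing closes that N * 600 gap. This is a sketch, not bext's implementation: the dict stands in for Redis, and the key format is an assumption.

```python
import time

# Sliding-window rate limiter over a shared store. `store` stands in
# for Redis: every instance consulting the same store shares one budget.

def allow(store, client_ip, limit=600, window_secs=60, now=None):
    now = time.time() if now is None else now
    key = f"rl:{client_ip}"  # illustrative key format
    # Drop timestamps that fell out of the window, then count what's left.
    hits = [t for t in store.get(key, []) if t > now - window_secs]
    if len(hits) >= limit:
        store[key] = hits
        return False
    hits.append(now)
    store[key] = hits
    return True
```

With one shared store, N instances together enforce a single 600/minute budget; with per-instance stores, the effective limit is N times larger.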

Sticky Sessions

If your app stores session state in memory (not recommended for scaled deployments), enable session affinity:

[scaling]
session_affinity = "cookie"       # route same user to same instance
session_cookie_name = "BEXT_SID"

With cookie affinity, bext sets a cookie containing an instance identifier on the first response. The load balancer uses this cookie for routing. With ip_hash, routing is determined by a hash of the client IP.
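The ip_hash strategy amounts to a stable hash of the client IP modulo the instance count. A sketch (the hash choice is illustrative; real load balancers use their own hash functions):

```python
import hashlib

# ip_hash routing sketch: the same client IP always maps to the same
# backend as long as the instance list is unchanged.

def pick_instance(client_ip, instances):
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]
```

Note that adding or removing an instance reshuffles most assignments, which is one more reason stateless instances are preferred.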

For most applications, stateless instances + Redis is preferred over sticky sessions.

Load Balancer Configuration

nginx as Load Balancer

upstream bext_cluster {
    least_conn;
    server 10.0.1.10:3061;
    server 10.0.1.11:3061;
    server 10.0.1.12:3061;
}

server {
    listen 443 ssl http2;
    server_name app.example.com;

    location / {
        proxy_pass http://bext_cluster;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

HAProxy

frontend web
    bind *:443 ssl crt /etc/ssl/app.pem
    default_backend bext_servers

backend bext_servers
    balance leastconn
    option httpchk GET /__bext/health
    http-check expect status 200

    server bext1 10.0.1.10:3061 check inter 5s fall 3 rise 2
    server bext2 10.0.1.11:3061 check inter 5s fall 3 rise 2
    server bext3 10.0.1.12:3061 check inter 5s fall 3 rise 2

Cloud Load Balancers (AWS ALB / GCP LB)

Point the target group or backend service at port 3061 on each instance. Set the health check to GET /__bext/health with expected status 200. Enable HTTP/2 between the LB and bext for best performance.

Health Check Endpoint

GET /__bext/health returns 200 when healthy, 503 when degraded. Load balancers should use this for backend health probes:

curl -s http://localhost:3061/__bext/health | jq .
{
  "status": "healthy",
  "uptime_secs": 172800,
  "checks": {
    "render_pool": "ok",
    "isr_cache": "ok",
    "redis": "ok"
  }
}
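A load balancer only needs the status code, but scripts can inspect the body. A small sketch that parses the payload shown above and decides whether the instance should stay in rotation (the field names match the example; the helper itself is illustrative, and a real probe would also treat a non-200 status or a timeout as unhealthy):

```python
import json

# Decide whether an instance is servable from the health payload above.

def is_servable(body: str) -> bool:
    payload = json.loads(body)
    return payload.get("status") == "healthy" and all(
        state == "ok" for state in payload.get("checks", {}).values()
    )
```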

Worker Count Tuning

The workers setting controls how many OS threads handle incoming connections (via Actix worker threads). It defaults to one per CPU core.

[server]
workers = 0    # auto = num_cpus

Guidelines:

- CPU-bound workloads (heavy SSR, image transforms): keep at num_cpus or slightly below to leave headroom for OS tasks.

- I/O-bound workloads (mostly static files, proxy): can go up to 2 * num_cpus since threads spend time waiting on I/O.

- Mixed workloads: stick with the default. The V8 pool has its own worker count ([render].workers) separate from server workers.
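The guidelines above condense into a small helper. The workload labels here are this sketch's own, not bext configuration values:

```python
import os

# Suggested server worker counts per the guidelines above.
# Workload labels are illustrative, not bext config values.

def suggest_workers(workload: str, cpus: int = 0) -> int:
    cpus = cpus or os.cpu_count() or 1
    if workload == "cpu_bound":   # heavy SSR, image transforms
        return max(1, cpus - 1)   # leave headroom for OS tasks
    if workload == "io_bound":    # mostly static files, proxying
        return 2 * cpus           # threads spend time waiting on I/O
    return cpus                   # mixed: match the default
```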

Graceful Shutdown

When bext receives SIGTERM or SIGQUIT, it stops accepting new connections and drains in-flight requests for up to graceful_shutdown_timeout_secs:

[scaling]
graceful_shutdown_timeout_secs = 30

During drain:

- New TCP connections are refused (load balancer routes to healthy instances)

- In-flight HTTP requests complete normally

- WebSocket and SSE connections receive a close frame

- After the timeout, remaining connections are forcefully terminated

This integrates cleanly with Kubernetes rolling updates, systemd restarts, and the bext upgrade zero-downtime flow.
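The drain sequence can be modeled as a SIGTERM handler that flips a flag: the accept loop checks it and stops taking new connections, while in-flight work gets up to the timeout to finish. A simplified model, not bext's internals:

```python
import signal
import threading

# Simplified drain model: SIGTERM flips a flag; the accept loop stops
# taking new connections; in-flight requests get up to `timeout_secs`.

class Drainer:
    def __init__(self, timeout_secs=30):
        self.timeout_secs = timeout_secs
        self.draining = threading.Event()
        signal.signal(signal.SIGTERM, self._on_term)

    def _on_term(self, signum, frame):
        self.draining.set()  # accept loop sees this and refuses new conns

    def accepting(self):
        return not self.draining.is_set()

    def wait_for_inflight(self, inflight_done: threading.Event):
        # Block until in-flight requests finish or the timeout lapses;
        # past the timeout, remaining connections would be force-closed.
        return inflight_done.wait(self.timeout_secs)
```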