Scaling
A single bext instance handles thousands of requests per second. When you need more capacity, scale horizontally by running multiple instances behind a load balancer with Redis for cross-instance coordination.
Configuration
# bext.config.toml
[server]
listen = "0.0.0.0:3061"
workers = 0 # 0 = auto (one per CPU core)
[redis]
url = "redis://redis.internal:6379/0"
pool_size = 16 # connection pool size per instance
[scaling]
health_path = "/__bext/health" # health check endpoint
graceful_shutdown_timeout_secs = 30
session_affinity = "none" # none | cookie | ip_hash
session_cookie_name = "BEXT_SID" # cookie name when session_affinity = "cookie"
[rate_limit]
enabled = true
requests_per_minute = 600
distributed = true # share counters across instances via Redis
Redis L2 Cache Sync
When multiple bext instances serve the same app, each has its own in-memory L1 cache (DashMap). Without coordination, cache invalidation on one instance leaves stale content on others.
Enable Redis as the L2 cache layer to solve this:
[redis]
url = "redis://redis.internal:6379/0"
[cache.isr]
max_entries = 10_000
default_ttl_ms = 60_000
default_swr_ms = 3_600_000
l2_enabled = true # write-through to Redis
l2_ttl_ms = 300_000 # Redis entries live longer than L1
Cache flow with L2 enabled:
1. L1 HIT -- Serve from in-memory DashMap (sub-millisecond)
2. L1 MISS, L2 HIT -- Fetch from Redis, promote to L1, serve
3. L1 MISS, L2 MISS -- SSR render, write to both L1 and L2, serve
4. Invalidation -- Purge from L1 and L2; other instances see an L2 miss on their next request and re-render
ISR revalidation results are written to L2, so other instances pick up fresh content without rendering again.
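A minimal sketch of that lookup path in Rust, assuming the redis and dashmap crates and a hypothetical "isr:" key scheme; this is illustrative only, not bext's internal code:

// Sketch of the L1 -> L2 -> render flow described above. The key prefix,
// the TTL handling, and render_page() are assumptions for the example.
use dashmap::DashMap;
use redis::Commands;

/// Stand-in for an SSR render on a full miss.
fn render_page(path: &str) -> String {
    format!("<html><body>rendered {path}</body></html>")
}

fn get_page(
    l1: &DashMap<String, String>,
    con: &mut redis::Connection,
    path: &str,
) -> redis::RedisResult<String> {
    // 1. L1 HIT: serve straight from the in-memory map.
    if let Some(html) = l1.get(path) {
        return Ok(html.value().clone());
    }

    // 2. L1 MISS, L2 HIT: fetch from Redis and promote to L1.
    let key = format!("isr:{path}"); // hypothetical key scheme
    let cached: Option<String> = con.get(&key)?;
    if let Some(html) = cached {
        l1.insert(path.to_string(), html.clone());
        return Ok(html);
    }

    // 3. L1 MISS, L2 MISS: render once, then write through to both layers.
    let html = render_page(path);
    l1.insert(path.to_string(), html.clone());
    let _: () = con.set_ex(&key, &html, 300)?; // l2_ttl_ms = 300_000 -> 300 s
    Ok(html)
}

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://redis.internal:6379/0")?;
    let mut con = client.get_connection()?;
    let l1: DashMap<String, String> = DashMap::new();
    println!("{}", get_page(&l1, &mut con, "/pricing")?);
    Ok(())
}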
Distributed Rate Limiting
With distributed = true, rate limit counters are kept in Redis as sliding windows. All instances share the same budget per client IP:
[rate_limit]
enabled = true
requests_per_minute = 600
distributed = true
Without Redis, each instance tracks its own counters, meaning a client could hit N * 600 requests/minute across N instances.
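A sketch of how a shared sliding-window counter can work, again using the redis crate with an assumed "ratelimit:" key layout rather than bext's actual schema:

// One sorted set per client IP, scored by request timestamp, so every
// instance draws on the same per-minute budget. Illustrative only.
use std::time::{SystemTime, UNIX_EPOCH};

fn allow_request(con: &mut redis::Connection, ip: &str, limit: u64) -> redis::RedisResult<bool> {
    let key = format!("ratelimit:{ip}"); // hypothetical key layout
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as u64;

    // Drop hits that have aged out of the 60-second window.
    redis::cmd("ZREMRANGEBYSCORE")
        .arg(&key)
        .arg(0)
        .arg(now_ms.saturating_sub(60_000))
        .query::<()>(&mut *con)?;
    // Record this request (score = timestamp; the member only needs to be
    // unique per hit, so a real implementation would add a random suffix).
    redis::cmd("ZADD").arg(&key).arg(now_ms).arg(now_ms).query::<()>(&mut *con)?;
    // Let idle keys expire so Redis does not accumulate dead client entries.
    redis::cmd("EXPIRE").arg(&key).arg(120).query::<()>(&mut *con)?;
    // Count what remains in the window and compare against the shared budget.
    let hits: u64 = redis::cmd("ZCARD").arg(&key).query(&mut *con)?;
    Ok(hits <= limit)
}

fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://redis.internal:6379/0")?;
    let mut con = client.get_connection()?;
    let ok = allow_request(&mut con, "203.0.113.7", 600)?;
    println!("allowed: {ok}");
    Ok(())
}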
Sticky Sessions
If your app stores session state in memory (not recommended for scaled deployments), enable session affinity:
[scaling]
session_affinity = "cookie" # route same user to same instance
session_cookie_name = "BEXT_SID"
With cookie affinity, bext sets a cookie containing an instance identifier on the first response, and the load balancer uses that cookie for routing. With ip_hash, routing is determined by the client's IP address.
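Cookie-based routing depends on your balancer supporting application-cookie stickiness. For the ip_hash mode, the hashing typically lives in the balancer itself; with nginx, for example, replacing least_conn with ip_hash in the upstream block (see the full configuration below) pins each client IP to one instance:

upstream bext_cluster {
    ip_hash;                  # same client IP -> same upstream instance
    server 10.0.1.10:3061;
    server 10.0.1.11:3061;
    server 10.0.1.12:3061;
}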
For most applications, stateless instances + Redis is preferred over sticky sessions.
Load Balancer Configuration
nginx as Load Balancer
upstream bext_cluster {
least_conn;
server 10.0.1.10:3061;
server 10.0.1.11:3061;
server 10.0.1.12:3061;
}
server {
listen 443 ssl http2;
server_name app.example.com;
location / {
proxy_pass http://bext_cluster;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
HAProxy
frontend web
bind *:443 ssl crt /etc/ssl/app.pem
default_backend bext_servers
backend bext_servers
balance leastconn
option httpchk GET /__bext/health
http-check expect status 200
server bext1 10.0.1.10:3061 check inter 5s fall 3 rise 2
server bext2 10.0.1.11:3061 check inter 5s fall 3 rise 2
server bext3 10.0.1.12:3061 check inter 5s fall 3 rise 2
Cloud Load Balancers (AWS ALB / GCP LB)
Point the target group or backend service at port 3061 on each instance. Set the health check to GET /__bext/health with expected status 200. Enable HTTP/2 between the LB and bext for best performance.
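For AWS, a matching target group might be created like this (the name and VPC ID are placeholders; check the flags against your AWS CLI version):

aws elbv2 create-target-group \
  --name bext-cluster \
  --protocol HTTP \
  --port 3061 \
  --vpc-id vpc-0123456789abcdef0 \
  --health-check-protocol HTTP \
  --health-check-path /__bext/health \
  --health-check-interval-seconds 10 \
  --matcher HttpCode=200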
Health Check Endpoint
GET /__bext/health returns 200 when healthy, 503 when degraded. Load balancers should use this for backend health probes:
curl -s http://localhost:3061/__bext/health | jq .
{
"status": "healthy",
"uptime_secs": 172800,
"checks": {
"render_pool": "ok",
"isr_cache": "ok",
"redis": "ok"
}
}
Worker Count Tuning
The workers setting controls how many OS threads handle incoming connections (Actix worker threads). It defaults to one per CPU core.
[server]
workers = 0 # auto = num_cpus
Guidelines:
- CPU-bound workloads (heavy SSR, image transforms): keep at num_cpus or slightly below to leave headroom for OS tasks.
- I/O-bound workloads (mostly static files, proxy): can go up to 2 * num_cpus since threads spend time waiting on I/O.
- Mixed workloads: stick with the default. The V8 pool has its own worker count ([render].workers) separate from server workers.
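For example, an I/O-heavy node on an 8-core machine might oversubscribe server workers while keeping the render pool at core count (illustrative values, not a universal recommendation):

[server]
workers = 16      # ~2 * num_cpus on an 8-core, I/O-bound proxy node

[render]
workers = 8       # V8 render pool sized separately from server workers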
Graceful Shutdown
When bext receives SIGTERM or SIGQUIT, it stops accepting new connections and drains in-flight requests for up to graceful_shutdown_timeout_secs:
[scaling]
graceful_shutdown_timeout_secs = 30
During drain:
- New TCP connections are refused (load balancer routes to healthy instances)
- In-flight HTTP requests complete normally
- WebSocket connections receive a close frame and SSE streams are ended
- After the timeout, remaining connections are forcefully terminated
This integrates cleanly with Kubernetes rolling updates, systemd restarts, and the bext upgrade zero-downtime flow.
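For example, under systemd the stop timeout should allow slightly more than the drain window before escalation to SIGKILL (a minimal sketch; the unit name is an assumption):

# /etc/systemd/system/bext.service.d/override.conf
[Service]
KillSignal=SIGTERM
# bext drains for up to 30 s on SIGTERM; give it a small margin before
# systemd escalates to SIGKILL.
TimeoutStopSec=35s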