Locking

The Locking capability is how a plugin says "only one caller at a time, please." A LockingPlugin implements three operations — try_lock, renew, release — against a shared backend, and any other plugin or the bext runtime itself can take a lock on a string key and know that nobody else on the cluster is running the same work.

It is the E2 shape that turns E1's Scheduled capability from "almost correct" into "actually safe on multi-instance deployments." If you run two bext nodes behind a load balancer and your scheduled job declares LockingHint::RequireGlobal, the scheduler acquires a lock on schedule_id before calling run() — the node that fails to acquire simply skips the tick. Without Locking, both nodes would fire the same job at the same time.

The trait

pub trait LockingPlugin: Send + Sync {
    fn name(&self) -> &str;

    fn try_lock(&self, key: &str, ttl_ms: u64) -> Result<Option<LockHandle>, String>;
    fn renew(&self, handle: &LockHandle) -> Result<(), String>;
    fn release(&self, handle: LockHandle) -> Result<(), String>;

    fn cleanup(&self) -> Result<(), String> { Ok(()) }
}

Three things are worth calling out about the shape.

Contention is not failure. try_lock returns Ok(Some(handle)) when you acquired the lock, Ok(None) when someone else holds it, and Err(..) only when the backend itself is broken (Redis unreachable, Postgres connection dropped). Callers almost always branch on that difference — retry in a moment on contention, page oncall on backend failure — so encoding it in the outer Result is cleaner than a typed LockError::Held variant every call site has to match. This is the load-bearing shape choice and the reason the trait does not use a typed error enum.
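
Here is a minimal sketch of the calling pattern this shape encourages; the key, TTL, and guarded work are illustrative, and only the three-way branch matters:

fn run_exclusive(lock: &dyn LockingPlugin) {
    match lock.try_lock("reindex:tenant-42", 30_000) {
        Ok(Some(handle)) => {
            // We won the lock: do the work, then hand the opaque handle back unchanged.
            // ... guarded work ...
            let _ = lock.release(handle);
        }
        Ok(None) => {
            // Contention, not failure: another caller holds the lock.
            // Back off and retry in a moment, or simply skip this round.
        }
        Err(e) => {
            // The backend itself is broken. This is the case worth alerting on.
            eprintln!("locking backend error: {e}");
        }
    }
}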

Everything else is Result<_, String>. renew and release cannot meaningfully distinguish "your lock expired out from under you" from "the backend is broken." In both cases the right response is "you no longer own the lock; stop holding it." Folding them into Err(String) keeps the trait lean and WASM-ABI friendly.

Opaque LockHandle. The handle carries a random token minted by the plugin at acquisition time. Backends use it to implement CAS-style release so a stale caller cannot clobber a newer lock holder — Redlock's SET NX PX with a random value and a Lua script checking it before DEL is the canonical example. Callers must treat the handle as opaque and pass it back unchanged to renew and release.
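
To make the token concrete, one plausible shape for the handle is sketched below. The fields are an assumption for illustration, not the published definition, and callers should never construct or inspect one themselves:

pub struct LockHandle {
    /// The key the lock was taken on.
    pub key: String,
    /// Random token minted by the backend at acquisition time; checked on renew and
    /// release so a stale caller cannot clobber a newer holder.
    pub token: String,
}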

The three reference backends

Three reference plugins ship alongside the trait. Pick the one that matches your deployment shape.

@bext/locking-memory

A Mutex<HashMap<String, (lock_id, expires_at)>> inside a single process. Not distributed — two processes on the same machine, let alone two nodes, will not see each other's locks. It exists for local development, single-instance deployments, and as the always-correct fallback when LockingHint::PreferGlobal is declared but no distributed backend is configured.

Expired entries are lazily swept on every access so a long-lived dead lock never wedges a key.
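
A sketch of what that core might look like, assuming the trait and the hypothetical LockHandle shape above; a real backend would mint a random token rather than use a counter:

use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

struct MemoryLocking {
    // key -> (token, expires_at); expired entries are swept lazily on access.
    locks: Mutex<HashMap<String, (String, Instant)>>,
    next_token: AtomicU64, // stand-in for a random token source
}

impl MemoryLocking {
    fn try_lock(&self, key: &str, ttl_ms: u64) -> Result<Option<LockHandle>, String> {
        let now = Instant::now();
        let mut locks = self.locks.lock().map_err(|e| e.to_string())?;

        // Lazy sweep: a crashed holder's entry disappears the next time anyone touches
        // the map, so a long-lived dead lock never wedges a key.
        locks.retain(|_, (_, expires_at)| *expires_at > now);

        if locks.contains_key(key) {
            return Ok(None); // contention, not failure
        }
        let token = self.next_token.fetch_add(1, Ordering::Relaxed).to_string();
        locks.insert(key.to_string(), (token.clone(), now + Duration::from_millis(ttl_ms)));
        Ok(Some(LockHandle { key: key.to_string(), token }))
    }
}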

@bext/locking-redis

Redlock-style Redis locks. The algorithm is straightforward:

- Acquire — SET key value NX PX ttl_ms. NX ensures the key is set only if it doesn't already exist; PX applies a millisecond-precision TTL so the lock auto-expires if the holder crashes. value is a random UUIDv4 minted per call.

- Renew — a Lua script that checks the stored value matches the caller's token, then PEXPIREs the key. Atomic server-side.

- Release — a Lua script that DELs the key only if the stored value matches the caller's token. Also atomic.

This is the single-node Redlock variant. Full Redlock against a majority of N independent Redis instances is out of scope for the reference plugin — projects that need the multi-primary story can drive a proper redlock crate from their own wrapper and still satisfy the trait.
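
A sketch of those three calls using the redis crate; the exact wiring in the shipped plugin may differ, but the Lua bodies are the standard single-node Redlock scripts:

// Delete the key only if the stored value still matches our token (atomic, server-side).
const RELEASE_LUA: &str = r#"
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end"#;

// Extend the TTL only if the stored value still matches our token.
const RENEW_LUA: &str = r#"
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("pexpire", KEYS[1], ARGV[2])
else
  return 0
end"#;

fn acquire(con: &mut redis::Connection, key: &str, token: &str, ttl_ms: u64) -> redis::RedisResult<bool> {
    // SET key token NX PX ttl_ms: "OK" if we got the lock, nil if someone else holds it.
    let reply: Option<String> = redis::cmd("SET")
        .arg(key).arg(token).arg("NX").arg("PX").arg(ttl_ms)
        .query(con)?;
    Ok(reply.is_some())
}

fn renew(con: &mut redis::Connection, key: &str, token: &str, ttl_ms: u64) -> redis::RedisResult<bool> {
    let extended: i64 = redis::Script::new(RENEW_LUA).key(key).arg(token).arg(ttl_ms).invoke(con)?;
    Ok(extended == 1)
}

fn release(con: &mut redis::Connection, key: &str, token: &str) -> redis::RedisResult<bool> {
    let deleted: i64 = redis::Script::new(RELEASE_LUA).key(key).arg(token).invoke(con)?;
    Ok(deleted == 1)
}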

@bext/locking-pg

Postgres session-level advisory locks via pg_try_advisory_lock(hashtext(key)::bigint). hashtext maps arbitrary string keys into a 32-bit hash, and the cast widens it to the bigint that advisory locks require; collisions are unlikely for reasonable key sets, though the effective key space is only 32 bits.

Unlike Redis locks, Postgres advisory locks have no TTL — they live until explicitly released or until the connection closes. This pushes an unusual constraint onto the plugin: each live lock must hold its own Postgres connection open, keyed internally by the minted lock_id. release looks the connection up, calls pg_advisory_unlock, and drops it; dropping the connection is a belt-and-braces fallback because Postgres releases every advisory lock on session termination.

renew on the Postgres backend is a liveness ping on the held connection — there's nothing to actually extend, but verifying the connection still works before telling the caller "you still own this" is useful.
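
A sketch of acquire and release against that model using the postgres crate; connection handling is simplified and the URL and keying details are illustrative:

use postgres::{Client, NoTls};

// One dedicated connection per live lock: the advisory lock lives exactly as long as this session.
fn acquire(url: &str, key: &str) -> Result<Option<Client>, String> {
    let mut client = Client::connect(url, NoTls).map_err(|e| e.to_string())?;
    let row = client
        .query_one("SELECT pg_try_advisory_lock(hashtext($1)::bigint)", &[&key])
        .map_err(|e| e.to_string())?;
    let acquired: bool = row.get(0);
    // The connection itself is what the plugin keeps alive; dropping it releases the lock.
    Ok(if acquired { Some(client) } else { None })
}

fn release(mut client: Client, key: &str) -> Result<(), String> {
    client
        .execute("SELECT pg_advisory_unlock(hashtext($1)::bigint)", &[&key])
        .map_err(|e| e.to_string())?;
    Ok(()) // `client` is dropped here; closing the session would release the lock anyway
}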

Redlock vs. pg-advisory

The two distributed backends have different shapes, and they fail differently under partition. Knowing the tradeoffs helps you pick.

Redlock is the right choice when you already have Redis on the path — session store, cache, rate limiter — and you want locks to be as close to free as distributed locks can be. The operations are constant-time, run in a single round trip, and handle thousands of acquires per second without breaking a sweat. The TTL is a real timer: if a holder crashes, the lock releases automatically after ttl_ms, which means clock skew between the holder and Redis matters — a clock that runs fast on the holder will release the lock early from its own perspective while Redis still considers it held. Keep TTLs generous and clocks synced.

Postgres advisory locks are the right choice when you already have Postgres and don't want another moving part. They're strongly consistent with respect to everything else in the database — if your job reads rows that the lock protects, you get a single serializability boundary for free. The downside is the connection-per-lock model: a thousand concurrent locks means a thousand open Postgres connections, which most deployments cannot afford. Use advisory locks when the number of concurrently-held locks is small (tens, not thousands) and you value consistency over throughput.

The memory backend is neither of the above — it's a single-node fallback. Do not ship production multi-node traffic on it.

How Scheduled uses it

The E1 Scheduled capability already had a LockingHint field on every schedule with three values — NodeLocal, RequireGlobal, PreferGlobal. Before E2 that field was a promise with no implementation behind it; now the promise is real:

- NodeLocal — the scheduler skips locking entirely. Every node may fire its own copy. Use for per-node maintenance like local cache GC.

- RequireGlobal — at each tick the scheduler calls LockingPlugin::try_lock keyed on schedule_id before invoking run(). On Ok(None) another node is running it; skip this tick. On Err(..) log and skip.

- PreferGlobal — try the lock as above, but if no LockingPlugin is configured, fall back to NodeLocal behavior rather than failing the install. This lets a plugin work correctly in both single-node and multi-node deployments without hard-failing on operators who haven't configured a locking backend.

The TTL the scheduler uses is the expected runtime of the job plus a safety margin; run() taking longer than the TTL is a bug in the schedule declaration, not the locking plugin. The plugin declares what it expects via the schedule's jitter_ms and the operator's config.
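
A sketch of the per-tick decision, with a stand-in for E1's LockingHint and the scheduler's internals reduced to a single function; in the real scheduler the handle is kept so the lock can be renewed and released around run():

enum LockingHint { NodeLocal, RequireGlobal, PreferGlobal }

fn should_run(
    hint: &LockingHint,
    schedule_id: &str,
    ttl_ms: u64,
    locking: Option<&dyn LockingPlugin>,
) -> bool {
    match (hint, locking) {
        // Every node fires its own copy.
        (LockingHint::NodeLocal, _) => true,
        // No backend configured: PreferGlobal degrades to node-local behavior.
        (LockingHint::PreferGlobal, None) => true,
        // RequireGlobal without a backend should already have been rejected; skip defensively.
        (LockingHint::RequireGlobal, None) => false,
        // Otherwise, whichever node wins the lock on schedule_id runs this tick.
        (_, Some(lock)) => match lock.try_lock(schedule_id, ttl_ms) {
            Ok(Some(_handle)) => true, // we own it; run() goes ahead
            Ok(None) => false,         // another node is running it; skip this tick
            Err(e) => {
                eprintln!("locking backend error, skipping tick: {e}");
                false
            }
        },
    }
}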

Configuration

# bext.config.toml

[locking]
backend = "redis"   # one of: "memory", "redis", "pg"

[locking.redis]
url = "redis://127.0.0.1:6379"
key_prefix = "bext:lock:"

[locking.pg]
url = "postgres://postgres@localhost/bext"

# scheduled + locking together
[[scheduled]]
plugin = "@bext/cron"
hint = "require_global"    # or "prefer_global" / "node_local"

Swapping between redis and pg is a one-line config change; your code that calls lock.try(key, ttl) through the host-function API does not change.

Host functions (future)

The trait is available for plugins to implement. A future bext-core change will wire it into the host-function table so other plugins can take locks through a lock.try(key, ttl) / lock.release(handle) API without depending on any specific backend. Until then, plugins that need locking can embed a LockingPlugin trait object directly, the way the scheduler does.
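
Embedding looks roughly like this until the host functions land; the struct, field name, and TTL are illustrative:

use std::sync::Arc;

struct MyScheduledPlugin {
    // Any configured backend satisfies the trait; the plugin doesn't care which one it got.
    locking: Arc<dyn LockingPlugin>,
}

impl MyScheduledPlugin {
    fn run_once(&self, job_key: &str) -> Result<(), String> {
        let Some(handle) = self.locking.try_lock(job_key, 60_000)? else {
            return Ok(()); // someone else has it; nothing to do
        };
        // ... do the guarded work ...
        self.locking.release(handle)
    }
}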

Where to go next

- The Scheduled capability is the canonical consumer — its LockingHint field is how you opt into the locking path from a cron plugin.

- The capabilities overview covers the promotion ladder, anti-fragmentation rules, and how new capabilities graduate to first-party stable.