If you're sick of hearing about our performance work, we're sorry. We're also not sorry, because we're proud of it. Over the last few months we've shipped a 200x speedup for guardrails, 3.4x faster AI reviews, 2-second guardrail runs, a brand new Go CLI, and a 7.3x storage upgrade by moving to bbolt. Rather than spread the next round of wins across three more posts, we've bundled them all into this one.
AI reviews, 12x faster
Not only did we add KV caching (see below), but we also increased the parallelism of our AI reviews by 12x, so you can now run larger agent sessions, full repository sweeps, and big codebase evaluations without waiting in line.
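For a sense of what that fan-out looks like mechanically, here's a minimal Go sketch of bounding concurrent review calls with errgroup. The reviewFile function and the worker count are illustrative stand-ins, not our actual review pipeline.

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

// reviewFile is a stand-in for a single AI review call (hypothetical).
func reviewFile(ctx context.Context, path string) error {
	// ... call the model, collect findings ...
	fmt.Println("reviewed", path)
	return nil
}

// reviewAll fans review calls out across a bounded worker pool so a
// full-repository sweep isn't serialized behind one request at a time.
func reviewAll(ctx context.Context, paths []string, workers int) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(workers) // e.g. 12 concurrent reviews instead of 1
	for _, p := range paths {
		p := p
		g.Go(func() error { return reviewFile(ctx, p) })
	}
	return g.Wait()
}

func main() {
	paths := []string{"a.go", "b.go", "c.go"}
	if err := reviewAll(context.Background(), paths, 12); err != nil {
		fmt.Println("review sweep failed:", err)
	}
}
```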
Context caching
If you're familiar with transformer internals, you know attention scales quadratically with context length -- every new token has to attend to every prior token, so doubling the prompt roughly quadruples the work. The standard mitigation, covered well in Pope et al.'s "Efficiently Scaling Transformer Inference", is KV caching: the key and value projections computed for earlier tokens are deterministic, so calls that share a prompt prefix can reuse them instead of recomputing them, and identical prompt chunks become nearly free.
Not every model supports prompt caching, but for every model we route to that does (including all of the Claude models), we now cache. Since April 3rd, every Zenable inference path -- the Zenable CLI, IDE hooks, GitHub PR reviews, GitLab MR reviews, and the MCP server -- reuses cached prefix attention across requests. The user-visible result is faster responses on repeated context, plus a meaningful cut to our inference cost that we get to reinvest in the platform.
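To make "caching against" a model concrete, here's a hedged sketch of what a prefix-cached request can look like with the Anthropic Messages API, which lets you mark a stable prompt prefix with a cache_control block. The policy text, model alias, and payload shown are illustrative, not our production request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// A long, stable prefix (review policy, guardrail definitions) that is
	// identical across requests; marking it with cache_control lets the
	// provider reuse the K/V state for that prefix instead of recomputing it.
	stablePolicy := "You are a code reviewer. Apply the following guardrails..." // illustrative

	body, _ := json.Marshal(map[string]any{
		"model":      "claude-3-5-sonnet-latest", // model alias illustrative
		"max_tokens": 1024,
		"system": []map[string]any{
			{
				"type":          "text",
				"text":          stablePolicy,
				"cache_control": map[string]string{"type": "ephemeral"},
			},
		},
		"messages": []map[string]any{
			// Only this part changes per request, so only it is paid for in full.
			{"role": "user", "content": "Review this diff:\n...patch contents..."},
		},
	})

	req, _ := http.NewRequest("POST", "https://api.anthropic.com/v1/messages", bytes.NewReader(body))
	req.Header.Set("x-api-key", os.Getenv("ANTHROPIC_API_KEY"))
	req.Header.Set("anthropic-version", "2023-06-01")
	req.Header.Set("content-type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```

The long, unchanging prefix (policies, guardrails, repository context) is what gets amortized; only the per-request suffix costs full price each time.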
bbolt tuning
After migrating the CLI's three local databases to bbolt, we kept tuning. By accepting the right durability tradeoff for short-lived hook subprocesses -- most notably skipping fdatasync() on every commit -- we picked up roughly another 7x on single writes on top of the migration win. And since our CLI is invoked hundreds of times per coding session, that speedup compounds significantly.
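If you want the same tradeoff in your own tooling, this is roughly the knob in go.etcd.io/bbolt: setting DB.NoSync so commits skip the per-transaction flush. The database path, bucket, and key names below are illustrative; the real consideration is that a crash can drop the most recent writes, which is fine for a short-lived hook whose state can be rebuilt.

```go
package main

import (
	"log"
	"time"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// Open the local cache DB; path and timeout are illustrative.
	db, err := bolt.Open("cache.db", 0o600, &bolt.Options{Timeout: time.Second})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Skip fdatasync on every commit. For a short-lived hook subprocess the
	// durability loss on crash is a few recent writes that can be recomputed,
	// and the win is avoiding a synchronous disk flush per transaction.
	db.NoSync = true

	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("results"))
		if err != nil {
			return err
		}
		return b.Put([]byte("last-run"), []byte(time.Now().Format(time.RFC3339)))
	})
	if err != nil {
		log.Fatal(err)
	}

	// If you need the data flushed before exit, call db.Sync() once here
	// rather than paying for a sync on every commit.
}
```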
Pro tip: if you've been holding back from running Zenable across an existing legacy codebase because of throughput, today is the day to let it rip. Drop us a line at hello@zenable.io if you want help scoping a sweep, and as always, send us feedback on what to speed up next.