Making the CLI up to 7.3x faster (again) by moving to bbolt

Zenable's CLI can run as an agentic IDE hook handler on every tool use, prompt submit, and session stop. A typical coding session fires hundreds of invocations, each opening and closing local databases for checkpoint state, guardrails cache, and OAuth tokens. Our workloads are read-heavy, our runtimes are independent short-lived processes (not a daemon), and everything is local to the developer's machine to maximize CPU cache hits and minimize guardrails overhead.

We recently migrated all three databases from SQLite to bbolt. Here's why, what we evaluated, and the performance we measured.

What we evaluated

We looked at six embedded storage options for Go before landing on bbolt.

| Engine | Pure Go | Single file | Fit for short-lived CLI | Why we passed |
| --- | --- | --- | --- | --- |
| bbolt | Yes | Yes | Excellent | (selected) |
| modernc.org/sqlite | Yes | Yes (+ WAL/SHM) | Good | SQL parsing overhead for KV workloads; WAL companion files; lock contention requiring retry logic |
| BadgerDB | Yes | No (directory of SSTs + vlogs) | Poor | Spawns background compaction goroutines; directory-level lock blocks concurrent hooks; designed for long-running servers |
| BuntDB | Yes | Yes (AOF) | Moderate | Full in-memory replay on every open; maintenance stalled (no commits since March 2024) |
| Pebble | CGo recommended | No (LSM directory) | Poor | CockroachDB's storage engine; multi-file, high memory baseline; scoped to server workloads |
| DuckDB | CGo required | Yes | Poor | OLAP columnar engine; complete category mismatch for KV operations |

The decision came down to our constraints: pure Go (no CGo, clean cross-compilation), single file per database (no companion files to orphan or corrupt), and minimal overhead for open-close-per-invocation patterns. bbolt was the only candidate that checked every box.

Why not SQLite

SQLite is a great database. We weren't leaving because it's bad; we were leaving because our access patterns don't need it. All three databases do the same thing: put a key, get a key, prefix-scan, delete stale entries. No JOINs, no indexes beyond the primary key, no relational queries. SQL parsing and query planning are overhead we pay on every operation for features we never use.

The operational pain was more concrete. SQLite in WAL mode creates -wal and -shm companion files. When hook processes crash (which happens when IDEs kill subprocesses), those files can get orphaned and corrupt state. We'd built retry/backoff infrastructure (busy_timeout, isSQLiteLocked, cacheRetryConfig) specifically to handle concurrent hook processes fighting over the same database. bbolt's file-level locking with a simple timeout eliminates all of that.

Since our CLI doesn't run as a service or daemon, connection pooling is only useful within a single short-lived invocation. Both SQLite and bbolt face the same constraint: open the file, do the work, close it. No persistent pool survives across hook invocations. This levels the playing field and lets the raw operation performance speak.

Benchmark results

All benchmarks simulate the production pattern: open database, perform operation, close database. No connection pooling, no persistent handles. Exactly how the CLI runs as a hook subprocess.

| Operation | bbolt | SQLite | Speedup | Allocations (bbolt vs SQLite) |
| --- | --- | --- | --- | --- |
| Open + close | 25 µs | 79 µs | 3.2x | 21 vs 52 (60% fewer) |
| Single key write | 51 µs | 352 µs | 6.9x | 71 vs 75 |
| Single key read | 27 µs | 90 µs | 3.3x | 30 vs 74 (59% fewer) |
| Write + read round trip | 53 µs | 386 µs | 7.3x | 80 vs 97 |
| Prefix scan (20 entries) | 20 µs | 127 µs | 6.4x | 19 vs 339 (94% fewer) |
| Concurrent reads (14 threads) | 15 µs | 85 µs | 5.7x | 18 vs 74 (76% fewer) |

bbolt is faster across every operation. The prefix scan result is notable: 94% fewer allocations, which directly reduces GC pressure in a process that starts and stops hundreds of times per session.

While tuning, we found that accepting a slight integrity tradeoff during catastrophic failures (a process crash mid-write) let us skip fdatasync() on each commit. Since all three databases are transient caches that rebuild on the next run, this tradeoff carries very little practical risk for our use case, and it accounts for the bulk of the write speedup (6.9x for single writes, 7.3x for round trips).

Migration

The new version migrates from SQLite seamlessly, with no user action required. On first run after the upgrade, the CLI detects existing SQLite files by their magic bytes, reads the data, writes it into a new bbolt database, and cleans up the old files.

What this means in practice

A typical coding session generates 80-100 database operations across hook invocations, roughly 80% reads. Faster hook response times, zero lock contention errors, no orphaned WAL files, and one less thing between the developer and their coding agent.

This is the storage layer half of the story. We also optimized guardrail runtime batching from groups of 5 to all-at-once (58x faster), moved guardrails to local sync and on-device execution for 200x faster checks, shipped 3.4x faster AI reviews with 9.6x more consistent latency, and rewrote the CLI as a native Go binary that set the stage for this bbolt migration.