Engineering · 6 minute read

How we built a virtual filesystem for our Assistant

March 24, 2026


Dens Sumesh


RAG is great, until it isn't.

Powering documentation for tens of thousands of customers means we see every edge case imaginable. When a user asks a basic question, a simple search setup works fine, but the second a question spans multiple pages or requires exact syntax, the AI starts hallucinating or handing back incomplete answers.

The core problem was that our assistant had no way to explore. It could only retrieve isolated chunks of text that matched a query. If the answer lived across multiple pages, or if the user needed exact syntax that didn't happen to land in a top-K result, the assistant was stuck. We wanted the agent to actively navigate the docs the way you'd navigate a codebase, building up context incrementally until it had the full picture before answering.

Modern LLMs are already post-trained to use bash tools like grep, cat, ls, and find, so the most natural interface for this kind of exploration is a filesystem. If each doc page is a file and each section of the docs is a directory, the agent can use the tools it already knows to search for exact strings, read full pages, and traverse the doc structure on its own.

We just needed a filesystem that mirrored the page structure of the live docs site.

The Container Bottleneck

The obvious way to do this is to just give the agent a real filesystem. Most harnesses solve this by spinning up an isolated sandbox and cloning the repo. We already use sandboxes for asynchronous background agents where latency is an afterthought, but for a frontend assistant where a user is staring at a loading spinner, the approach falls apart. Our p90 session creation time (including GitHub clone and other setup) was ~46 seconds.

Beyond latency, dedicated micro-VMs for reading static documentation would introduce a serious infrastructure bill:

[Chart: additional annual compute cost ($0 to $200k) vs. average session duration (0 to 15 minutes), Sandbox vs. ChromaFs]

At 850,000 conversations a month, even a minimal setup (1 vCPU, 2 GiB RAM, 5-minute session lifetime) would put us north of $70,000 a year based on Daytona's per-second sandbox pricing ($0.0504/h per vCPU, $0.0162/h per GiB RAM). Longer session times double that. (This assumes a purely naive approach; a real production workflow would likely use warm pools and container sharing, but the point still stands.)
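The back-of-envelope math behind that figure, using the rates quoted above:

```typescript
// Annual sandbox cost estimate from the Daytona rates quoted in the text,
// assuming the minimal setup: 1 vCPU, 2 GiB RAM, 5-minute sessions.
const vcpuPerHour = 0.0504; // $/h per vCPU
const gibPerHour = 0.0162;  // $/h per GiB RAM

const hourlyRate = 1 * vcpuPerHour + 2 * gibPerHour; // ~$0.0828/h per sandbox
const perSession = hourlyRate * (5 / 60);            // ~$0.0069 per conversation
const annualCost = perSession * 850_000 * 12;        // ~$70,380 per year
```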

We needed the filesystem workflow to be instant and cheap, which meant rethinking the filesystem itself.

Faking a Shell

The agent doesn't need a real filesystem; it just needs the illusion of one. Our documentation was already indexed, chunked, and stored in a Chroma database to power our search, so we built ChromaFs: a virtual filesystem that intercepts UNIX commands and translates them into queries against that same database. Session creation dropped from ~46 seconds to ~100 milliseconds, and since ChromaFs reuses infrastructure we already pay for, the marginal per-conversation compute cost is zero.

ChromaFs Architecture

| Metric | Sandbox | ChromaFs |
| --- | --- | --- |
| P90 boot time | ~46 seconds | ~100 milliseconds |
| Marginal compute cost | ~$0.0137 per conversation | ~$0 (reuses existing DB) |
| Search mechanism | Linear disk scan (syscalls) | DB metadata query |
| Infrastructure | Daytona or similar providers | Provisioned DB |

ChromaFs is built on just-bash by Vercel Labs (shoutout Malte!), a TypeScript reimplementation of bash that supports grep, cat, ls, find, and cd. just-bash exposes a pluggable IFileSystem interface, so it handles all the parsing, piping, and flag logic while ChromaFs translates every underlying filesystem call into a Chroma query.

How it works

Bootstrapping the Directory Tree

ChromaFs needs to know what files exist before the agent runs a single command. We store the entire file tree as a gzipped JSON document (__path_tree__) inside the Chroma collection:

{
  "auth/oauth": { "isPublic": true, "groups": [] },
  "auth/api-keys": { "isPublic": true, "groups": [] },
  "internal/billing": { "isPublic": false, "groups": ["admin", "billing"] },
  "api-reference/endpoints/users": { "isPublic": true, "groups": [] }
}

On init, the server fetches and decompresses this document into two in-memory structures: a Set<string> of file paths and a Map<string, string[]> mapping directories to children.

Once built, ls, cd, and find resolve in local memory with no network calls. The tree is cached, so subsequent sessions for the same site skip the Chroma fetch entirely.
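A minimal sketch of that expansion, assuming the path-tree shape shown above (the function name and the `.mdx` suffix are illustrative):

```typescript
// Expand the flat path-tree document into the two in-memory structures
// that answer ls/cd/find locally: a set of file paths and a map from
// each directory to its children.
type PathEntry = { isPublic: boolean; groups: string[] };

function buildFileTree(pathTree: Record<string, PathEntry>) {
  const files = new Set<string>();
  const dirs = new Map<string, string[]>();

  for (const slug of Object.keys(pathTree)) {
    const path = `/${slug}.mdx`;
    files.add(path);

    // Register the file, then each ancestor directory, under its parent.
    let child = path;
    let parent = child.slice(0, child.lastIndexOf("/")) || "/";
    while (true) {
      const children = dirs.get(parent) ?? [];
      if (!children.includes(child)) children.push(child);
      dirs.set(parent, children);
      if (parent === "/") break;
      child = parent;
      parent = child.slice(0, child.lastIndexOf("/")) || "/";
    }
  }
  return { files, dirs };
}
```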

Access Control

Notice the isPublic and groups fields in the path tree. Before building the file tree, ChromaFs prunes slugs using the current user's session token and applies a matching filter to all subsequent Chroma queries. If a user lacks access to a file, that file is excluded from the tree entirely, so the agent can't access or even reference a path that was pruned.

In a real sandbox, this level of per-user access control would require managing Linux user groups, chmod permissions, or maintaining isolated container images per customer tier. In ChromaFs it's a few lines of filtering before buildFileTree runs.
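A sketch of those few lines, assuming the path-tree entry shape from earlier (the function name is illustrative):

```typescript
// Drop every entry the current user's groups don't unlock, before the
// file tree is ever built. Pruned paths never exist as far as the agent
// can tell.
type PathEntry = { isPublic: boolean; groups: string[] };

function pruneForUser(
  pathTree: Record<string, PathEntry>,
  userGroups: string[],
): Record<string, PathEntry> {
  return Object.fromEntries(
    Object.entries(pathTree).filter(
      ([, entry]) =>
        entry.isPublic || entry.groups.some((g) => userGroups.includes(g)),
    ),
  );
}
```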

Groups: none

| Path | Access | Visible |
| --- | --- | --- |
| /auth/oauth.mdx | public | ✓ |
| /auth/api-keys.mdx | public | ✓ |
| /internal/billing.mdx | admin, billing | ✗ |
| /internal/audit-log.mdx | admin | ✗ |
| /api-reference/users.mdx | public | ✓ |
| /api-reference/payments.mdx | billing | ✗ |

Reassembling Pages from Chunks

Pages in Chroma are split into chunks for embedding, so when the agent runs cat /auth/oauth.mdx, ChromaFs fetches all chunks with a matching page slug, sorts by chunk_index, and joins them into the full page. Results are cached so repeated reads during grep workflows never hit the database twice.
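The reassembly step is simple once the chunks are fetched; a sketch, assuming chunk metadata shaped like the fields named above (the join separator is illustrative):

```typescript
// Chunks come back from Chroma in arbitrary order; keep only the target
// page's chunks, order them by chunk_index, and concatenate.
type Chunk = { page_slug: string; chunk_index: number; text: string };

function assemblePage(chunks: Chunk[], slug: string): string {
  return chunks
    .filter((c) => c.page_slug === slug)
    .sort((a, b) => a.chunk_index - b.chunk_index)
    .map((c) => c.text)
    .join("\n");
}
```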

Not every file needs to exist in Chroma. For large OpenAPI specs stored in customers' S3 buckets, we register lazy file pointers that resolve on access. The agent sees v2.json in ls /api-specs/, but the content only fetches when it runs cat.
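A sketch of the lazy-pointer idea (the shape is hypothetical, and the real fetch would be an async S3 call):

```typescript
// A lazy pointer shows up in ls immediately, but its content is only
// fetched on the first read, then cached for the rest of the session.
type LazyFile = { fetch: () => string; cached?: string };

function readLazy(file: LazyFile): string {
  if (file.cached === undefined) file.cached = file.fetch();
  return file.cached;
}
```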

Every write operation throws an EROFS (Read-Only File System) error. The agent explores freely but can never mutate documentation, which makes the system stateless with no session cleanup and no risk of one agent corrupting another's view.
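In sketch form (the names are illustrative, not just-bash's actual interface):

```typescript
// Every mutating entry point rejects with EROFS, the same error code a
// real read-only mount would return, so the agent's writes fail loudly
// and the virtual filesystem stays stateless.
class ReadOnlyError extends Error {
  code = "EROFS";
  constructor(path: string) {
    super(`EROFS: read-only file system, write '${path}'`);
  }
}

function writeFile(path: string, _content: string): never {
  throw new ReadOnlyError(path);
}
```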

Optimizing Grep

cat and ls are straightforward to virtualize, but grep -r would be far too slow if it naively scanned every file over the network. We intercept just-bash’s grep, parse the flags with yargs-parser, and translate them into a Chroma query ($contains for fixed strings, $regex for patterns).
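A sketch of that translation, using the $contains and $regex operators named above (the crude argv scan stands in for yargs-parser, and the real flag handling is richer):

```typescript
// Map a grep invocation onto a Chroma document filter: $contains for
// fixed-string searches (grep -F), $regex for pattern searches.
function grepToWhereDocument(argv: string[]): Record<string, string> {
  const fixedString = argv.includes("-F");
  const pattern = argv.find((a) => !a.startsWith("-"));
  if (pattern === undefined) throw new Error("grep: missing pattern");
  return fixedString ? { $contains: pattern } : { $regex: pattern };
}
```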

Chroma acts as a coarse filter that identifies which files might contain the hit, and we bulkPrefetch those matching chunks into a Redis cache. From there, we rewrite the grep command to target only the matched files and hand it back to just-bash for fine filtering in memory, which means large recursive queries complete in milliseconds.
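The rewrite step can be sketched like this (quoting and preserved flags are illustrative):

```typescript
// Re-target the agent's recursive grep at only the candidate files the
// coarse Chroma pass returned, then hand the rewritten command back to
// just-bash for exact in-memory filtering.
function narrowGrep(pattern: string, matchedFiles: string[]): string {
  const quoted = matchedFiles.map((f) => `'${f}'`).join(" ");
  return `grep -n '${pattern}' ${quoted}`;
}
```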

1. Coarse filter (Chroma): 3 of 6 candidate files match.

   Matched: /auth/oauth.mdx, /api-reference/users.mdx, /guides/quickstart.mdx
   No hit: /auth/api-keys.mdx, /api-reference/payments.mdx, /guides/webhooks.mdx

2. Fine filter (in-memory regex): matching lines only.

   /auth/oauth.mdx: "Use the access_token from the OAuth flow to authenticate API requests."
   /api-reference/users.mdx: "The GET /users endpoint returns a list of users. Requires access_token in the Authorization header."
   /guides/quickstart.mdx: "Get started by generating an access_token using the OAuth guide."

Conclusion

ChromaFs powers the documentation assistant for hundreds of thousands of users across 30,000+ conversations a day. By replacing sandboxes with a virtual filesystem over our existing Chroma database, we got instant session creation, zero marginal compute cost, and built-in RBAC without any new infrastructure.

Try it on any Mintlify docs site, or at mintlify.com/docs.