The docbot index command scans your documentation and codebase, creates embeddings, and stores them in Qdrant for fast semantic search. It tracks file changes and re-indexes only new or modified files.
Basic usage
Running docbot index will:
- Connect to your Qdrant instance
- Scan documentation files for changes
- Create embeddings for new or modified files
- Update the embedding manifest with file hashes
- Optionally index codebase files if --codebase is provided
The AI_GATEWAY_API_KEY environment variable is required for indexing. Docbot uses this to generate embeddings through your AI gateway.
Options
- --docs - Path to the documentation directory. Can also be set via paths.docs in your config file. If both are omitted, the command fails with an error.
- --codebase - Comma-separated paths or globs to codebase directories. Falls back to paths.codebase in your config file. Examples: apps/helm,packages/* or src/**. If omitted, only documentation is indexed.
- Config file (alias: -c) - Path to the Docbot config file. Defaults to docbot.config.jsonc in your project root.
- Qdrant URL - Qdrant server URL. Overrides the URL in your config file. Default: http://127.0.0.1:6333
- --force (alias: -f) - Force a full re-index, ignoring the manifest. This re-creates embeddings for all files even if they haven’t changed.
Examples
How indexing works
Incremental indexing
Docbot maintains an embedding manifest at .docbot/manifest.json that tracks file hashes. On each run, it:
- Scans files - Walks the documentation and codebase directories
- Computes hashes - Calculates content hashes for each file
- Detects changes - Compares hashes against the manifest
- Syncs embeddings - Only processes files that are new, changed, or removed
File categorization
During scanning, files are categorized as:
- Added - New files not in the manifest
- Changed - Files with different content hashes
- Removed - Files in manifest but no longer on disk
- Unchanged - Files with matching hashes (skipped)
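The four categories fall out of a comparison between the hashes stored in the manifest and the hashes of the files currently on disk. The following is a minimal Python sketch of that comparison, not Docbot's actual implementation: the SHA-256 hash, the flat path-to-hash manifest shape, and the names hash_file and categorize are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def hash_file(path: Path) -> str:
    """Return a hex digest of the file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def categorize(manifest: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare stored hashes (manifest) with freshly computed ones (current)."""
    result: Dict[str, List[str]] = {
        "added": [], "changed": [], "removed": [], "unchanged": [],
    }
    for path, digest in current.items():
        if path not in manifest:
            result["added"].append(path)
        elif manifest[path] != digest:
            result["changed"].append(path)
        else:
            result["unchanged"].append(path)
    # Anything the manifest remembers that is no longer on disk was removed.
    result["removed"] = [p for p in manifest if p not in current]
    return {name: sorted(paths) for name, paths in result.items()}
```

Only the added, changed, and removed lists would need any embedding work; the unchanged list is what makes repeat runs cheap.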
Chunking strategy
Docbot splits files into chunks for better embedding quality:
- Documentation - Split by headings and semantic boundaries
- Code - Split by function/class definitions and logical blocks
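To make heading-based splitting concrete, here is a rough Python sketch that starts a new chunk at every heading line. It assumes Markdown-style "#" headings and ignores the semantic-boundary part entirely. The function name split_by_headings is hypothetical, and Docbot's real chunker may behave differently.

```python
import re
from typing import List

# Assumption: headings are Markdown ATX headings such as "# Title" or "## Section".
HEADING = re.compile(r"^#{1,6}\s+\S")


def split_by_headings(text: str) -> List[str]:
    """Start a new chunk at each heading; text before the first heading is its own chunk."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            chunks.append(current)
            current = []
        current.append(line)
    if current:
        chunks.append(current)
    # Drop chunks that contain only whitespace.
    joined = ["\n".join(lines).strip() for lines in chunks]
    return [chunk for chunk in joined if chunk]
```

A known limitation of this sketch: a line starting with "# " inside a code sample (a shell comment, for instance) would be mistaken for a heading.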
Expected output
First-time indexing
Incremental update
No changes detected
Configuration
You can set paths.docs and paths.codebase in docbot.config.jsonc to avoid passing flags.
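The precedence described in the options (flag first, then the config file, then an error for the docs path) can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the config is assumed to be already parsed into a dictionary, and accepting either a string or a list for paths.codebase is a guess about the config shape.

```python
from typing import List, Optional


def resolve_docs_path(flag_value: Optional[str], config: dict) -> str:
    """Prefer the --docs flag, fall back to paths.docs, otherwise fail."""
    path = flag_value or config.get("paths", {}).get("docs")
    if not path:
        raise ValueError("docs path is required")
    return path


def resolve_codebase(flag_value: Optional[str], config: dict) -> List[str]:
    """Split a comma-separated --codebase value; fall back to paths.codebase."""
    if flag_value:
        return [p.strip() for p in flag_value.split(",") if p.strip()]
    configured = config.get("paths", {}).get("codebase") or []
    # Assumption: the config value may be a comma-separated string or a list.
    if isinstance(configured, str):
        return [p.strip() for p in configured.split(",") if p.strip()]
    return list(configured)
```

An empty list from resolve_codebase corresponds to the documentation-only case.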
Monitoring progress
The indexing process shows real-time progress:
- Scanning phase - Shows file counts and scan duration
- Syncing phase - Shows chunk counts as embeddings are created
- Manifest saves - Periodically saves progress (automatic on SIGINT/SIGTERM)
If you interrupt indexing with Ctrl+C, the manifest is saved automatically. Rerunning the command will resume from where it left off.
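Saving on interrupt is a common pattern: a signal handler writes the in-memory manifest to disk before the process exits, so the next run sees the already-embedded files as unchanged and skips them. The Python sketch below shows one way this could work. It is not Docbot's code; the names save_manifest, make_stop_handler, and install_handlers are hypothetical, as is the write-then-rename step.

```python
import json
import signal
import sys
from pathlib import Path
from typing import Callable, Dict

# In-memory manifest (path -> content hash), updated as files are embedded.
manifest: Dict[str, str] = {}


def save_manifest(path: Path) -> None:
    """Write to a temp file first, then rename, so a crash never leaves a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(manifest, indent=2))
    tmp.replace(path)  # atomic on POSIX when both paths share a filesystem


def make_stop_handler(path: Path) -> Callable:
    """Build a signal handler that persists progress and then exits."""
    def handle_stop(signum, frame):
        save_manifest(path)
        sys.exit(128 + signum)  # conventional exit code for "killed by signal"
    return handle_stop


def install_handlers(path: Path = Path(".docbot/manifest.json")) -> None:
    """Register the handler for Ctrl+C (SIGINT) and SIGTERM."""
    handler = make_stop_handler(path)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```

Calling install_handlers() once at startup would be enough; the periodic saves mentioned above could reuse save_manifest directly.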
Performance considerations
Large codebases
For projects with thousands of files:
- Use specific paths - Instead of indexing the entire repo, target specific directories (for example, --codebase apps/helm,packages/*)
- Exclude build artifacts - Ensure your codebase paths don’t include node_modules, dist, or other generated directories
- First run takes time - Initial indexing can take several minutes for large projects. Subsequent runs are much faster because only new or changed files are processed.
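Before pointing Docbot at a large directory, it can help to see which files a path would actually pull in. The following Python helper is a hypothetical pre-flight check, not part of Docbot: it walks a directory and skips generated folders. The exclusion set (including .git) is an assumption you would adjust for your project.

```python
import os
from typing import Iterator

# Assumption: directory names to skip; node_modules and dist come from the text above.
EXCLUDED_DIRS = {"node_modules", "dist", ".git"}


def iter_source_files(root: str) -> Iterator[str]:
    """Yield file paths under root without descending into excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those directories.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Counting the results, for example with sum(1 for _ in iter_source_files("packages")), gives a quick sense of whether a path is narrow enough.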
Embedding costs
Each chunk generates an API call to create embeddings. To minimize costs:
- Use incremental indexing (don’t use --force unnecessarily)
- Be selective with codebase paths
- Focus on documentation and source directories only
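Because each chunk corresponds to one API call, the cost of a run is the sum of the chunk counts of the files being processed. The small Python example below works through that arithmetic for a made-up project. The file names and chunk counts are invented for illustration.

```python
from typing import Dict, Iterable


def embedding_calls(chunks_per_file: Dict[str, int], files: Iterable[str]) -> int:
    """One API call per chunk, summed over the files that will be processed."""
    return sum(chunks_per_file[path] for path in files)


# Hypothetical project: three files and their chunk counts.
chunks_per_file = {"docs/intro.md": 4, "docs/cli.md": 10, "src/index.ts": 6}

incremental = embedding_calls(chunks_per_file, ["docs/cli.md"])  # one changed file
forced = embedding_calls(chunks_per_file, chunks_per_file)       # --force: every file
print(incremental, forced)  # prints "10 20"
```

In this example a forced re-index doubles the number of calls even though only one file changed.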
Troubleshooting
Error: AI_GATEWAY_API_KEY environment variable is required
Set the AI_GATEWAY_API_KEY environment variable before running docbot index.
Error: docs path is required
Pass --docs or configure paths.docs in your config file.
Connection errors
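If Qdrant isn’t accessible, confirm that the server is running and that the configured URL is correct (default: http://127.0.0.1:6333).
Slow indexing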
If indexing is slower than expected:
- Check network latency - Embedding API calls depend on network speed
- Verify file counts - Ensure you’re not accidentally indexing node_modules or other large directories
- Monitor Qdrant - Check Qdrant logs for performance issues
Next steps
After indexing:
- Test search - Use docbot search "query" to verify embeddings
- Run tasks - Start the agent with docbot run "task"
- Keep updated - Re-run indexing when documentation changes