Overview
Hinbox’s Privacy Mode ensures that no data from your historical documents leaves your local machine. This is critical for:
- Sensitive archival research (unpublished manuscripts, confidential sources)
- GDPR/compliance requirements in academic institutions
- Embargo periods before publication
- Proprietary historical collections with usage restrictions
How It Works
The --local flag triggers three privacy-critical behaviors (from src/process_and_extract.py:786-794):
1. Disables LLM Telemetry
By default, Hinbox uses Braintrust for optional LLM telemetry. The disable_llm_callbacks() function (src/constants.py:62-75) clears all callbacks.
What gets disabled: All LiteLLM callbacks including Braintrust, Langfuse, OpenTelemetry, and custom loggers.
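The callback-clearing step above can be sketched as follows. This is a hypothetical reconstruction, not the code from src/constants.py:62-75; the attribute names mirror LiteLLM's documented callback hooks, and the stand-in object exists only so the sketch runs without litellm installed.

```python
# Hypothetical sketch of disable_llm_callbacks() (real code: src/constants.py:62-75).
def disable_llm_callbacks(litellm_module) -> None:
    """Clear every callback hook so no telemetry (Braintrust, Langfuse,
    OpenTelemetry, custom loggers) fires on LLM calls."""
    for attr in ("callbacks", "input_callback",
                 "success_callback", "failure_callback"):
        if hasattr(litellm_module, attr):
            setattr(litellm_module, attr, [])

# Stand-in object so this sketch is runnable without litellm installed:
class FakeLiteLLM:
    def __init__(self):
        self.callbacks = ["braintrust"]
        self.success_callback = ["langfuse"]

mod = FakeLiteLLM()
disable_llm_callbacks(mod)
print(mod.callbacks, mod.success_callback)  # → [] []
```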
2. Forces Local Embeddings
Cloud embeddings (Jina AI by default) are replaced with local models:
- Cloud: jina_ai/jina-embeddings-v3 (data sent to the Jina API)
- Local: sentence-transformers/all-MiniLM-L6-v2 (runs on your GPU/CPU)
See src/constants.py:11-12 for the model constants.
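The choice above reduces to a single selector. A minimal sketch, assuming constant and function names (only the model identifiers come from this page):

```python
# Model identifiers from this page; constant and function names are illustrative.
CLOUD_EMBEDDING_MODEL = "jina_ai/jina-embeddings-v3"              # text leaves your machine
LOCAL_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # runs on your GPU/CPU

def select_embedding_model(local: bool) -> str:
    """Privacy mode always wins: --local forces the on-device model."""
    return LOCAL_EMBEDDING_MODEL if local else CLOUD_EMBEDDING_MODEL

print(select_embedding_model(local=True))
# → sentence-transformers/all-MiniLM-L6-v2
```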
3. Uses Local LLM
The --local flag sets model_type = "ollama", which routes all extraction through your local Ollama server.
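The routing can be pictured like this. The config keys and function name here are assumptions for illustration; only model_type = "ollama" and Ollama's default port come from this page and Ollama's standard setup.

```python
# Sketch (key names assumed) of how --local could override the configured backend.
def build_llm_config(local: bool, cloud_model: str = "gemini") -> dict:
    if local:
        # Ollama's default HTTP endpoint: every extraction call stays on-machine.
        return {"model_type": "ollama", "api_base": "http://localhost:11434"}
    return {"model_type": "cloud", "model": cloud_model}

print(build_llm_config(local=True))
# → {'model_type': 'ollama', 'api_base': 'http://localhost:11434'}
```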
Running in Privacy Mode
Set Up Ollama
See Local Models setup for full instructions:
Verification
Check Network Activity
Verify that no external API calls occur during processing: you should see only connections to localhost:11434 (Ollama) and no external IPs.
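One way to audit this is to scan the output of `lsof -i -P -n` (run while processing is active) for established connections that aren't local Ollama traffic. This helper and its sample data are hypothetical, shown only to make the check concrete:

```python
import re

def external_connections(lsof_output: str) -> list[str]:
    """Flag ESTABLISHED connections that are not to localhost:11434 (Ollama).
    Hypothetical helper for auditing `lsof -i -P -n` output."""
    flagged = []
    for line in lsof_output.splitlines():
        m = re.search(r"->([\d.]+):(\d+)\s+\(ESTABLISHED\)", line)
        if m and not (m.group(1).startswith("127.") and m.group(2) == "11434"):
            flagged.append(line)
    return flagged

sample = (
    "python3  42 TCP 127.0.0.1:52000->127.0.0.1:11434 (ESTABLISHED)\n"      # Ollama: fine
    "python3  42 TCP 192.168.1.5:52001->142.250.80.10:443 (ESTABLISHED)\n"  # external: flagged
)
print(len(external_connections(sample)))  # → 1
```

An empty result during a full processing run is the signal you want: nothing but Ollama traffic.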
Inspect Logs
Privacy mode logs show local-only operations.
Test Without Internet
The ultimate verification: disconnect from the internet and run processing end to end; it should complete using only local models.
What Gets Processed Locally
Entity Extraction
All LLM calls use local Ollama
- People, organizations, locations, events
- Relevance filtering
- Profile generation
Embeddings
Similarity search runs locally
- Entity deduplication
- Semantic matching
- Name variant detection
Quality Controls
All QC is deterministic and local
- Required field validation
- Name normalization
- Within-article dedup
Processing Status
Metadata stored in local files
- Sidecar JSON tracking
- Extraction cache
- Parquet entity storage
Privacy Guarantees
What Stays Local
✅ Article text content - Never sent to cloud APIs
✅ Extracted entities - All processing on your machine
✅ Embedding vectors - Computed locally
✅ Processing metadata - Stored in local Parquet/JSON
✅ Cache data - Persistent cache in data/{domain}/entities/cache/
What Gets Disabled
🚫 Braintrust telemetry - No LLM call logging to external service
🚫 Cloud API calls - Zero network requests to Gemini, Jina, etc.
🚫 Error reporting - Crashes stay local (check logs manually)
🚫 Usage analytics - No phone-home behavior
Configuration Reference
Telemetry Control
See src/constants.py:56-85 for the telemetry controls.
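A plausible shape for that gate is sketched below. This is an assumption, not the code from src/constants.py:56-85; BRAINTRUST_API_KEY is Braintrust's standard environment variable, and the key point is that privacy mode short-circuits telemetry regardless of any keys present.

```python
import os

# Hypothetical telemetry gate (the real controls live in src/constants.py:56-85).
def telemetry_enabled(local: bool) -> bool:
    # Privacy mode wins unconditionally; otherwise telemetry needs an opt-in key.
    return (not local) and bool(os.environ.get("BRAINTRUST_API_KEY"))

print(telemetry_enabled(local=True))  # → False, even if an API key is set
```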
Embedding Mode
The embedding selection logic lives in src/utils/embeddings/manager.py.
Hybrid Mode (Not Recommended)
You can technically run a local LLM with cloud embeddings, but sending text to a cloud embedding API defeats the privacy guarantees of --local.
Performance Impact
| Operation | Cloud (Gemini + Jina) | Local (Ollama + transformers) |
|---|---|---|
| Extraction | ~2-5 docs/sec | ~0.5-2 docs/sec (GPU) |
| Embeddings | ~100 texts/sec | ~20-50 texts/sec (GPU) |
| Deduplication | Fast (cloud batching) | Moderate (local batching) |
| Overall Speed | 100% | ~30-50% (depends on GPU) |
Compliance Scenarios
GDPR Research (EU)
Unpublished Manuscripts
Institutional Review Board (IRB)
For research involving human subjects with privacy restrictions, run all processing with --local so no data leaves the machine.
Troubleshooting
“Local embeddings not available”
Network Calls Still Happening
If you see external connections:
- Check for API keys in the environment: remove GEMINI_API_KEY and JINA_API_KEY
- Verify the --local flag: it must be passed to just process
- Check config.yaml: ensure embeddings.mode isn’t overriding the local setting
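The first check can be done in one line. A small sketch using the two key names from this list; the wording of the messages is illustrative:

```shell
# Audit the environment for the cloud API keys named above; their mere
# presence can let a misconfigured run reach cloud services.
if env | grep -qE '^(GEMINI_API_KEY|JINA_API_KEY)='; then
  status="cloud keys present - unset them for strict privacy mode"
else
  status="no cloud keys found"
fi
echo "$status"
```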
Slow Local Performance
See Performance Tuning for:
- GPU memory optimization
- Concurrency settings for local mode
- Batch size configuration
Next Steps
Local Models
Complete Ollama setup and model selection guide
Performance
Optimize concurrency for local GPU processing
Quality Controls
Understand local QC and validation logic
Caching
Local extraction cache for faster re-runs