Hinbox requires Python 3.12+ and uses uv for dependency management. This guide will walk you through the complete installation process.

Prerequisites

Before installing Hinbox, ensure you have the following:
1. Python 3.12 or higher

Check your Python version:
python --version
If you need to install or upgrade Python, visit python.org or use your system package manager.
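If you want to script the check, the comparison can be done in plain shell. This is a sketch with a hypothetical `version_ok` helper, not part of Hinbox:

```shell
#!/bin/sh
# Sketch: check a "major.minor.patch" version string against the 3.12 floor.
version_ok() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -gt 3 ] 2>/dev/null || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
}

if version_ok "$(python3 --version 2>&1 | awk '{print $2}')"; then
  echo "Python is new enough"
else
  echo "Python is too old: Hinbox needs 3.12+"
fi
```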
2. uv package manager

Install uv for fast dependency management:
curl -LsSf https://astral.sh/uv/install.sh | sh
Verify the installation:
uv --version
3. just task runner

Install just for running project commands:
brew install just
Verify the installation:
just --version
See the just documentation for more installation options.
4. Optional: Ollama for local models

If you want to run local AI models (useful for privacy-sensitive research), install Ollama: visit ollama.com and follow the installation instructions for your platform.
Verify the installation:
ollama --version
Local models require significant resources. The default Qwen 2.5 32B model requires ~23GB download and runs best with at least 32GB RAM.

Install Hinbox

1. Clone the repository

git clone https://github.com/strickvl/hinbox.git
cd hinbox

2. Install dependencies

Hinbox supports two installation modes:
# Standard installation (cloud embeddings)
uv sync

# With local embeddings (required for the --local privacy mode; not supported on Intel Macs)
uv sync --extra local-embeddings

3. Set up environment variables

Create a .env file in the project root to store your API keys and configuration:
# Create .env file with Gemini API key (required for cloud models)
echo 'GEMINI_API_KEY=your-gemini-api-key' > .env
Get your Gemini API key from Google AI Studio.
The just task runner automatically loads environment variables from .env files.
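To confirm the key is present without printing its value, something like the following works. It is a sketch; `check_env_key` is a hypothetical helper, not part of Hinbox:

```shell
#!/bin/sh
# Sketch: verify a key exists in an env file without echoing its value.
check_env_key() {
  # $1 = variable name, $2 = env file path
  grep -q "^$1=" "$2" 2>/dev/null
}

if check_env_key GEMINI_API_KEY .env; then
  echo "GEMINI_API_KEY is set in .env"
else
  echo "GEMINI_API_KEY is missing from .env"
fi
```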

4. Optional: Set up local models

If you installed Ollama and want to use local models:
1. Configure Ollama URL

Add the Ollama API URL to your .env file:
echo 'OLLAMA_API_URL=http://localhost:11434/v1' >> .env
2. Pull the default model

Download the Qwen 2.5 32B model (~23GB):
ollama pull qwen2.5:32b-instruct-q5_K_M
This is a powerful instruction-tuned model that works well for entity extraction. The download may take some time depending on your connection.
3. Set context window

Configure a realistic context window for Ollama. Qwen 2.5 supports up to 131K tokens, but more context requires more VRAM.
Add to your shell profile (.bashrc, .zshrc, etc.) or systemd unit:
export OLLAMA_CONTEXT_LENGTH=32768
Then restart your shell or the Ollama service.
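If Ollama runs as a systemd service (common on Linux installs), an export in your shell profile won't reach it. A drop-in override would look roughly like this; the unit name and path are a sketch and may differ on your system:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=32768"
```

After editing, run systemctl daemon-reload and restart the service.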
4. Optional: Override default models

You can override the default models without editing code. Add to your .env file:
# Override the local model (default: ollama/qwen2.5:32b-instruct-q5_K_M)
echo 'HINBOX_OLLAMA_MODEL=ollama/gemma2:27b' >> .env

# Override the cloud model (default: gemini/gemini-2.0-flash)
echo 'HINBOX_CLOUD_MODEL=gemini/gemini-2.5-flash' >> .env
Hinbox uses LiteLLM under the hood, so you can use any model supported by LiteLLM.
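LiteLLM model strings follow a provider/model convention, so a quick shape check before editing .env can catch typos. This is a sketch with a hypothetical `valid_model_string` helper, not part of Hinbox:

```shell
#!/bin/sh
# Sketch: LiteLLM model strings take the form "provider/model-name".
valid_model_string() {
  case "$1" in
    ?*/?*) return 0 ;;  # at least one character on each side of the slash
    *)     return 1 ;;
  esac
}

valid_model_string "ollama/qwen2.5:32b-instruct-q5_K_M" && echo "model string looks valid"
```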

5. Verify installation

Check that everything is working:
just domains
You should see a list of available research domains (at minimum, guantanamo and template).

Configuration files

After installation, your project structure should look like this:
hinbox/
├── .env                    # Your API keys and environment variables
├── configs/               # Domain configurations
│   ├── guantanamo/       # Example domain
│   └── template/         # Template for new domains
├── src/                  # Application code
├── tests/                # Test suite
├── scripts/              # Utility scripts
├── data/                 # Data directory (created when you process sources)
└── justfile              # Task runner commands

Next steps

  • Quick start: Create your first domain and process historical sources
  • Configuration: Learn how to configure domains for your research

Common installation issues

uv sync fails

Try updating uv to the latest version:
uv self update
Then retry:
uv sync
If the issue persists, check that you’re using Python 3.12 or higher:
python --version

Local embeddings fail on Intel Macs

Intel Macs are not supported for local embeddings due to PyTorch limitations. Use cloud embeddings instead:
uv sync  # Without --extra local-embeddings

Ollama connection errors

Make sure:
  1. Ollama service is running: ollama serve
  2. Model is pulled: ollama list should show your model
  3. Context length is set: echo $OLLAMA_CONTEXT_LENGTH
Try pulling the model again:
ollama pull qwen2.5:32b-instruct-q5_K_M
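You can also probe the Ollama API directly; its `/api/tags` endpoint returns JSON listing the pulled models. This sketch wraps the call in a hypothetical helper:

```shell
#!/bin/sh
# Sketch: probe an Ollama server; /api/tags returns JSON listing pulled models.
check_ollama() {
  # $1 = base URL of the Ollama server
  curl -sf --max-time 2 "$1/api/tags" >/dev/null
}

if check_ollama http://localhost:11434; then
  echo "Ollama is reachable"
else
  echo "Ollama is not reachable: is 'ollama serve' running?"
fi
```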

API key not found

Make sure you created the .env file in the project root:
# Check if .env exists
cat .env

# Should show:
# GEMINI_API_KEY=your-key-here
The justfile automatically loads .env variables using set dotenv-load.

just: command not found

Ensure just is installed and in your PATH:
which just
If not found, reinstall following the just installation guide.
Alternatively, you can run commands directly with uv:
# Instead of: just domains
uv run python scripts/list_domains.py

# Instead of: just process-domain mydata
uv run python -m src.process_and_extract --domain mydata

Development installation

If you’re contributing to Hinbox development, install dev dependencies:
uv sync --all-extras
This installs additional tools for testing and code quality:
# Run tests
just test

# Format code
just format

# Lint code
just lint

# Run CI checks (exactly what GitHub Actions runs)
just ci
The CI checks include linting with Ruff and running the test suite. Always run just ci before submitting pull requests.

System requirements

Minimum requirements (cloud mode)

  • CPU: Any modern multi-core processor
  • RAM: 8GB minimum, 16GB recommended
  • Disk: 2GB for installation + space for your data
  • Network: Internet connection for cloud API calls

Recommended requirements (local mode)

  • CPU: 8+ cores recommended
  • RAM: 32GB minimum for Qwen 2.5 32B
  • GPU: Optional but recommended (16GB+ VRAM)
  • Disk: 50GB+ for models and data

Platform support

Platform               | Cloud embeddings | Local embeddings | Notes
Linux (x86_64)         | Yes              | Yes              | Full support
macOS (Apple Silicon)  | Yes              | Yes              | Full support
macOS (Intel)          | Yes              | No               | PyTorch limitations
Windows                | Yes              | Yes              | Full support

Privacy and security

Privacy mode

For sensitive historical research, use the --local flag to enforce local-only processing:
just process-domain sensitive_research --local
This:
  • Forces local embeddings (no cloud API calls)
  • Disables all LLM telemetry callbacks
  • Requires local model setup (Ollama)
  • Requires local embeddings installation (--extra local-embeddings)

API key security

Never commit your .env file or API keys to version control. The .gitignore file excludes .env by default.
Best practices:
  • Use environment-specific .env files
  • Rotate API keys regularly
  • Use separate keys for development and production
  • Monitor API usage in the provider dashboard
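
Before your first commit, you can verify the exclusion with `git check-ignore` (a sketch; run it from the repository root):

```shell
#!/bin/sh
# Sketch: confirm .env is ignored by git before committing anything.
if git check-ignore -q .env 2>/dev/null; then
  echo ".env is ignored by git"
else
  echo "WARNING: .env is NOT ignored -- check your .gitignore"
fi
```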

Data storage

All extracted data is stored locally in the data/ directory:
data/
└── {domain}/
    ├── raw_sources/           # Your input documents
    ├── entities/              # Extracted entities (Parquet)
    └── entities/cache/        # Extraction cache (JSON)
No data is sent to external services except:
  • Cloud API calls for embeddings and LLM inference (can be disabled with --local)
  • Optional telemetry (can be disabled with --local)
