Overview
ollama-plan reads your installed Ollama models and hardware profile, then computes safe values for OLLAMA_NUM_CTX, OLLAMA_NUM_PARALLEL, and OLLAMA_MAX_LOADED_MODELS. It prevents out-of-memory crashes by planning memory usage before you start Ollama.
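At its core, the planning step is budget arithmetic: subtract the OS reserve from total RAM, then fit model footprints into what remains. A minimal sketch with hypothetical numbers (the real tool detects RAM and model sizes itself):

```shell
# Hypothetical inputs -- the real tool detects these from your system.
total_gb=16      # installed RAM
reserve_gb=2     # OS/background reserve (the tool's default)
model_gb=4       # assumed resident size of one loaded model

# Usable budget after the OS reserve.
budget_gb=$(( total_gb - reserve_gb ))

# A rough ceiling on OLLAMA_MAX_LOADED_MODELS: how many models fit the budget.
max_loaded=$(( budget_gb / model_gb ))

echo "budget=${budget_gb}GB max_loaded=${max_loaded}"
```

The real planner also weighs context size and parallelism, which multiply each model's KV-cache footprint; this sketch shows only the top-level budget split.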
Example Output
Flags
- Model tags or family names to include in the plan. Matches against installed Ollama models by exact name, prefix, family, or substring. If omitted, all local models are included.
- Target context window in tokens. Default: `8192`
- Target number of parallel requests to support. Default: `2`
- Optimization objective. Accepted values: `latency`, `balanced`, `throughput`. Default: `balanced`
- Memory to reserve for the OS and background processes (GB). Default: `2`
- Output the full capacity plan as JSON.
Usage Examples
Plan Output Fields
| Field | Description |
|---|---|
| Memory budget | Usable RAM after subtracting the OS reserve |
| Context | Recommended OLLAMA_NUM_CTX value |
| Parallel | Recommended OLLAMA_NUM_PARALLEL value |
| Loaded models | Recommended OLLAMA_MAX_LOADED_MODELS value |
| Estimated memory | Projected memory usage and utilization percentage |
| Risk | Risk level (LOW, MEDIUM, HIGH) and score (0–100) |
| Fallback profile | Conservative fallback values for constrained environments |
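The utilization percentage reported in the Estimated memory field is simply projected use over the memory budget. A sketch with made-up numbers:

```shell
budget_gb=14      # memory budget from a hypothetical plan
estimated_gb=10   # projected memory use from the same plan
# Integer utilization percentage, as reported in the plan output.
util_pct=$(( estimated_gb * 100 / budget_gb ))
echo "Estimated memory: ${estimated_gb}GB / ${budget_gb}GB (${util_pct}%)"
```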
Applying the Plan
Copy the `export` lines into your shell profile or set them before starting Ollama:
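For example, using the documented defaults for context and parallelism (substitute the values from your own plan; the loaded-models value here is illustrative, since its default is not stated above):

```shell
# Context and parallelism use the tool's documented defaults;
# the loaded-models count below is a hypothetical example value.
export OLLAMA_NUM_CTX=8192
export OLLAMA_NUM_PARALLEL=2
export OLLAMA_MAX_LOADED_MODELS=1
```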
Ollama must be running and you must have at least one model installed before running `ollama-plan`. Install a model with `ollama pull llama3.2:3b` if needed.
