DMR runs models locally — your data never leaves your machine. It is ideal for development, sensitive data, or offline use.
Prerequisites
- Install Docker Desktop and enable the Model Runner feature in Docker Desktop settings.
- Verify it is running:
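A quick check, using the `docker model` CLI that ships with Docker Desktop's Model Runner:

```shell
# Confirm the Model Runner engine is up
docker model status
```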
Pulling models
Pull models from Docker Hub before running them:

Configuration
- Inline
- Named model
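A sketch of the two styles. The YAML layout below is an assumption (docker-agent's actual schema may differ); the model names come from Docker Hub's `ai/` namespace:

```yaml
# Inline: reference the DMR model directly where it is used (hypothetical shape)
model: dmr/ai/qwen3

# Named model: declare once, reference by name elsewhere (hypothetical shape)
models:
  local-qwen:
    provider: dmr
    model: ai/qwen3
```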
Available models
Any model available through Docker Model Runner can be used. Common options:

| Model | Description |
|---|---|
| `ai/qwen3` | Qwen 3 — versatile, good for coding and general tasks |
| `ai/llama3.2` | Llama 3.2 — Meta's open-source model |
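Any of the models above can be pulled by name, for example:

```shell
docker model pull ai/qwen3
```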
Runtime flags
Pass flags directly to the underlying llama.cpp inference runtime using `provider_opts.runtime_flags`:
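For example, raising GPU offload and thread count. The flags are standard llama.cpp options; the YAML shape is an assumption:

```yaml
provider_opts:
  runtime_flags:
    - "--ngl"      # number of model layers to offload to the GPU
    - "33"
    - "--threads"
    - "8"          # CPU threads for inference
```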
Parameter mapping
docker-agent model config fields map automatically to llama.cpp flags. On conflict, `runtime_flags` entries take priority over the derived flags.
| Config field | llama.cpp flag |
|---|---|
| `temperature` | `--temp` |
| `top_p` | `--top-p` |
| `frequency_penalty` | `--frequency-penalty` |
| `presence_penalty` | `--presence-penalty` |
| `max_tokens` | `--context-size` |
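For instance, when both a config field and an explicit flag target the same llama.cpp option, the explicit flag wins. The YAML shape here is an assumption:

```yaml
temperature: 0.7           # would normally derive --temp 0.7
provider_opts:
  runtime_flags:
    - "--temp"
    - "0.2"                # takes priority: the runtime receives --temp 0.2
```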
Speculative decoding
Use a smaller draft model to predict tokens ahead for faster inference:

Custom endpoint
By default, docker-agent auto-discovers the DMR endpoint. Set `base_url` manually if needed:
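A minimal sketch, assuming `base_url` lives under `provider_opts` (the config shape is an assumption; 12434 is Model Runner's default host TCP port when TCP access is enabled):

```yaml
provider_opts:
  base_url: "http://localhost:12434/engines/v1"  # host-side Model Runner endpoint
```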
If you are running docker-agent itself inside a Docker container, use `http://model-runner.docker.internal/engines/v1` as the base URL.

Local inference benefits
- No API costs — models run on your hardware.
- Data privacy — data stays on your machine and never reaches external servers.
- Offline capable — works without an internet connection after the model is pulled.
- Consistent performance — no rate limits or external service outages.
Troubleshooting
Plugin not found: Ensure Docker Model Runner is enabled in Docker Desktop settings; docker-agent will attempt to fall back to the default URL.
Endpoint empty: Verify the Model Runner is running with `docker model status --json`.
Slow performance: Use runtime_flags to tune GPU layers (--ngl) and thread count (--threads).