ai.ollama | Version: 1.1.0 | Tools: 3
The ai.ollama plugin exposes GenieHelper’s local Ollama inference layer to the agent as MCP tools. All inference runs on the local VPS — no tokens leave the server, no API keys are required, and there is no content moderation layer that would reject adult content requests.
Ollama runs as a systemd service on port 11434 (not as a Docker container). The plugin connects to http://127.0.0.1:11434 by default, configurable via OLLAMA_URL.
Tools
generate — single-turn completion
Sends a single prompt to Ollama and returns the completion. This is the right tool for stateless, one-shot generation tasks.
Parameters:
- prompt (required) — the user prompt
- system (optional) — system instruction injected before the prompt
- model (optional) — overrides OLLAMA_MODEL for this call
- temperature (optional) — sampling temperature (default 0.7)
- max_tokens (optional) — maximum tokens to generate
When to use generate:
- Generating captions for a single media asset
- Classifying a piece of content against a schema
- Summarising scraped platform data
- Drafting a single-message push notification
- Any task where conversation history is irrelevant
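For illustration, the generate parameters map onto Ollama's /api/generate endpoint roughly as follows. This is a sketch using Python's stdlib, not the plugin's actual code; the build_generate_payload helper and the mapping of max_tokens to Ollama's num_predict option are assumptions.

```python
import json
import os
import urllib.request

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434")

def build_generate_payload(prompt, system=None, model=None,
                           temperature=0.7, max_tokens=None):
    """Assemble a single-turn /api/generate request body."""
    payload = {
        "model": model or os.environ.get("OLLAMA_MODEL", "qwen-2.5:latest"),
        "prompt": prompt,
        "stream": False,  # one-shot: wait for the full completion
        "options": {"temperature": temperature},
    }
    if system:
        payload["system"] = system  # injected before the prompt
    if max_tokens:
        payload["options"]["num_predict"] = max_tokens  # Ollama's max-token knob
    return payload

def generate(**kwargs):
    """POST the payload to the local Ollama server and return the completion text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_generate_payload(**kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```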
chat — multi-turn conversation
Sends a full messages array ([{role, content}]) to the Ollama chat endpoint. Maintains conversational coherence across multiple exchanges by including prior turns.
Parameters:
- messages (required) — JSON array of {role: "user"|"assistant"|"system", content: string}
- system (optional) — system prompt prepended to the messages array
- model (optional) — overrides OLLAMA_MODEL for this call
- temperature (optional) — sampling temperature
When to use chat:
- Drafting a back-and-forth fan message conversation where previous replies shape tone
- Multi-step creative writing where each generation builds on the last
- Iterative refinement loops where the agent critiques and improves its own output
- Simulating a creator’s voice across multiple platform reply threads
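A multi-turn call targets Ollama's /api/chat endpoint. The sketch below shows how a system prompt can be prepended to the messages array, as the parameter description states; the helper names are illustrative, not the plugin's internals.

```python
import json
import os
import urllib.request

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434")

def build_chat_payload(messages, system=None, model=None, temperature=None):
    """Assemble the /api/chat body; a system prompt becomes the first message."""
    msgs = list(messages)
    if system:
        msgs = [{"role": "system", "content": system}] + msgs
    payload = {
        "model": model or os.environ.get("OLLAMA_MODEL", "qwen-2.5:latest"),
        "messages": msgs,
        "stream": False,
    }
    if temperature is not None:
        payload["options"] = {"temperature": temperature}
    return payload

def chat(messages, **kwargs):
    """Send the conversation and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps(build_chat_payload(messages, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Because the endpoint is stateless, the agent must resend prior turns on every call; conversational coherence comes entirely from the messages array it maintains.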
list-models — available models
Lists all Ollama models currently pulled on the local server. Returns model name, disk size, and last modified timestamp.
When to use list-models:
- Verifying a required model is available before starting a generation task
- Selecting the most appropriate model for a task based on what is currently loaded
- Debugging inference failures when the default model may have been removed
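The pre-flight check described above can be sketched against Ollama's /api/tags endpoint, which returns each pulled model's name, size, and modified_at timestamp. The require_model guard is a hypothetical helper, not part of the plugin.

```python
import json
import os
import urllib.request

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://127.0.0.1:11434")

def list_models():
    """Fetch pulled models from /api/tags, keeping the fields the tool reports."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=120) as resp:
        data = json.loads(resp.read())
    return [
        {"name": m["name"], "size": m["size"], "modified_at": m["modified_at"]}
        for m in data.get("models", [])
    ]

def require_model(name, available):
    """Fail fast before a generation task if the model is not pulled."""
    if not any(m["name"] == name for m in available):
        raise RuntimeError(f"model {name!r} is not pulled on this server")
```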
Model selection
GenieHelper runs three primary models, each with a specific role:

dolphin3:8b
Role: Orchestrator
Default model (OLLAMA_MODEL env). Used for tool planning, ACTION emission, and agent coordination. Llama 3.1 base with Dolphin fine-tune for instruction following.

dolphin-mistral:7b
Role: Content writer
Uncensored Mistral fine-tune. Used for caption writing, fan message drafting, and any task requiring creative adult content generation without content policy restrictions.

qwen-2.5:latest
Role: Agent / code
Primary AnythingLLM agent model. Used for JSON generation, code tasks, and structured output. Also the default STAGEHAND_MODEL for browser automation.

Custom selection
Per-call override: both generate and chat accept a model parameter to override OLLAMA_MODEL for a single call. Use this to route content writing tasks to dolphin-mistral:7b without changing the global default.
The default model is set by the OLLAMA_MODEL environment variable. If the variable is not set, the plugin falls back to qwen-2.5:latest. The orchestrator model (dolphin3:8b-llama3.1-q4_K_M) must be set explicitly via the env var.
Performance characteristics
All inference is CPU-bound on the IONOS dedicated server (16 GB RAM).
| Model | Approx. latency | RAM usage |
|---|---|---|
| qwen-2.5:7b (q4) | 2–5 s/call | ~4.8 GB |
| dolphin-mistral:7b | 3–6 s/call | ~5.2 GB |
| dolphin3:8b | 4–8 s/call | ~5.8 GB |
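The per-call model override described under Custom selection can be sketched as a small routing helper. The task-to-model table below is a hypothetical illustration of routing by role, not the plugin's actual dispatch logic.

```python
import os

# Hypothetical routing table mapping task types to the roles described above.
MODEL_BY_TASK = {
    "caption": "dolphin-mistral:7b",      # content writer
    "fan_message": "dolphin-mistral:7b",  # content writer
    "json": "qwen-2.5:latest",            # structured output / code
    "plan": "dolphin3:8b",                # orchestrator
}

def pick_model(task, explicit=None):
    """An explicit per-call model wins; else route by task; else fall back
    to OLLAMA_MODEL the way the plugin's default resolution does."""
    if explicit:
        return explicit
    return MODEL_BY_TASK.get(task) or os.environ.get("OLLAMA_MODEL", "qwen-2.5:latest")
```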
Configuration
| Setting | Value |
|---|---|
| Base URL | http://127.0.0.1:11434 (configurable via OLLAMA_URL) |
| Default model | OLLAMA_MODEL env (fallback: qwen-2.5:latest) |
| Timeout | 120,000 ms (2 minutes) |
| Concurrency | 2 |
| Auth | None (localhost only) |