Ollama is free to use. You pay nothing per request, but you need a machine capable of running the model you choose. For qwen3:32b, a GPU with at least 24 GB of VRAM is recommended.

## Setup
### Install Ollama
Download and install Ollama from ollama.com. It runs as a background service on your machine.
### Pull a model
Open a terminal and pull the default model. You can substitute any model name from the Ollama model library.
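For the default model referenced throughout this page, the pull command is:

```shell
ollama pull qwen3:32b
```

The same command works for any other tag from the Ollama library, e.g. `ollama pull llama3.1:8b` for a smaller model.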
### Confirm Ollama is running
Ollama starts automatically on install and listens on http://localhost:11434 by default. You can verify it's running by visiting that address in a browser; you should see "Ollama is running."
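The browser check can also be scripted. A minimal sketch using only Python's standard library, assuming the default endpoint described above:

```python
import urllib.request
import urllib.error


def is_ollama_running(endpoint: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an Ollama server responds at the given endpoint.

    The root endpoint of a running Ollama server returns HTTP 200 with
    the plain-text body "Ollama is running".
    """
    try:
        with urllib.request.urlopen(endpoint, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: treat as not running.
        return False


print(is_ollama_running())
```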
## Configure in Unreal Engine

Open Edit → Project Settings → Plugins → Node to Code → LLM Services → Ollama. Set the Model Name to the model you pulled (e.g., qwen3:32b).

## Configuration
All Ollama settings are under Node to Code → LLM Services → Ollama in Project Settings.

### Connection
| Setting | Default | Description |
|---|---|---|
| Ollama Endpoint | http://localhost:11434 | Base URL for the Ollama API. Change this if Ollama is running on a different host or port. |
| Model Name | qwen3:32b | The name of the model to use, exactly as it appears after ollama pull. |
### Behavior
| Setting | Default | Description |
|---|---|---|
| Use System Prompts | false | Enable if your model supports system prompts. When disabled, the system prompt is merged into the first user message. |
| Prepended Model Command | (empty) | Text prepended to every user message. Use model-specific commands like /no_think to disable extended thinking on reasoning models. |
| Keep Alive Duration | 3600 | Seconds to keep the model loaded in memory after a request. Set to -1 to keep it loaded indefinitely. |
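The two message-shaping settings interact. The sketch below illustrates the behavior described in the table; `build_messages` is a hypothetical helper for illustration, not the plugin's actual code:

```python
def build_messages(user_text: str, system_prompt: str = "",
                   use_system_prompts: bool = False,
                   prepended_command: str = "") -> list[dict]:
    """Shape a chat request per the Behavior settings above (illustrative)."""
    if prepended_command:
        # Prepended Model Command is added to the front of every user message.
        user_text = f"{prepended_command} {user_text}"
    if use_system_prompts and system_prompt:
        # Model supports system prompts: send a separate system message.
        return [{"role": "system", "content": system_prompt},
                {"role": "user", "content": user_text}]
    if system_prompt:
        # Use System Prompts disabled: merge the system prompt into the
        # first user message instead.
        user_text = f"{system_prompt}\n\n{user_text}"
    return [{"role": "user", "content": user_text}]
```

For example, passing `prepended_command="/no_think"` yields a user message that starts with `/no_think`, which reasoning models such as the Qwen3 family interpret as "skip extended thinking."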
### Generation
| Setting | Default | Description |
|---|---|---|
| Temperature | 0.0 | Controls output randomness. 0.0 is fully deterministic. |
| Max Tokens | 8192 | Maximum tokens to generate. Set to -1 for unlimited. |
| Top P | 0.5 | Nucleus sampling threshold. |
| Top K | 40 | Limits tokens considered per step. |
| Min P | 0.05 | Minimum probability threshold relative to the top token. |
| Repeat Penalty | 1.1 | Penalty for repeating tokens. |
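Min P is the least familiar of these samplers: a token survives only if its probability is at least Min P times the probability of the most likely token, so the cutoff scales with the model's confidence. A small illustrative sketch (not the sampler's actual implementation):

```python
def min_p_keep(probs: list[float], min_p: float = 0.05) -> list[int]:
    """Indices of tokens that survive Min P filtering.

    A token is kept when its probability is at least
    min_p * (probability of the top token).
    """
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


# With a confident top token the cutoff is high and the tail is pruned:
# threshold = 0.05 * 0.80 = 0.04, so 0.03 is dropped.
print(min_p_keep([0.80, 0.15, 0.03], min_p=0.05))  # [0, 1]
```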
### Context
| Setting | Default | Description |
|---|---|---|
| Context Window | 8192 | Token limit for the context window. For larger Blueprint graphs or when using Translation Depth, increase this to 16k–32k or higher. |
### Advanced
| Setting | Default | Description |
|---|---|---|
| Mirostat Mode | 0 | Mirostat sampling mode. 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0. |
| Mirostat Eta | 0.1 | Mirostat learning rate. Lower values adjust the sampler more slowly. |
| Mirostat Tau | 5.0 | Mirostat target entropy. Lower values increase focus. |
| Random Seed | 0 | Seed for generation. 0 = random; set a fixed value for reproducible output. |
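Under the hood, these settings correspond to fields in Ollama's REST API. The sketch below shows how the defaults above would map onto an `/api/chat` request body; the option names follow the Ollama API, but the payload shape is an illustration of the mapping, not the plugin's actual request code:

```python
# Mapping of the Project Settings defaults onto an Ollama /api/chat payload.
payload = {
    "model": "qwen3:32b",            # Model Name
    "messages": [
        {"role": "user", "content": "Explain this Blueprint graph."},
    ],
    "stream": False,
    "keep_alive": 3600,              # Keep Alive Duration (seconds; -1 = forever)
    "options": {
        "temperature": 0.0,          # Temperature
        "num_predict": 8192,         # Max Tokens (-1 = unlimited)
        "top_p": 0.5,                # Top P
        "top_k": 40,                 # Top K
        "min_p": 0.05,               # Min P
        "repeat_penalty": 1.1,       # Repeat Penalty
        "num_ctx": 8192,             # Context Window
        "mirostat": 0,               # Mirostat Mode
        "mirostat_eta": 0.1,         # Mirostat Eta
        "mirostat_tau": 5.0,         # Mirostat Tau
        "seed": 0,                   # Random Seed (0 = random)
    },
}
```

Note that Max Tokens becomes `num_predict` and Context Window becomes `num_ctx`; the two are independent, which is why raising the context window for large graphs does not by itself raise the generation limit.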