Ollama lets you run open-source LLMs locally. Because all inference happens on your machine, your Blueprint code and source files never leave your system — making it a strong choice for proprietary projects or air-gapped environments.
Ollama is free to use. You pay nothing per request, but you need a machine capable of running the model you choose. For qwen3:32b, a GPU with at least 24 GB of VRAM is recommended.

Setup

1. Install Ollama

Download and install Ollama from ollama.com. It runs as a background service on your machine.
2. Pull a model

Open a terminal and pull the default model:

```shell
ollama pull qwen3:32b
```

You can substitute any model name from the Ollama model library.
3. Confirm Ollama is running

Ollama starts automatically on install and listens on http://localhost:11434 by default. To verify it's running, visit that address in a browser; you should see the message "Ollama is running".
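Equivalently, you can check from a terminal. This sketch probes the default endpoint and prints either the server's status line or a fallback hint:

```shell
# Probe the default Ollama endpoint. On success this prints
# "Ollama is running"; otherwise the fallback message is shown.
curl -s http://localhost:11434 || echo "Ollama is not reachable on port 11434"
```

If Ollama is bound to a different host or port, probe that address instead and mirror it in the Ollama Endpoint setting below.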
4. Configure in Unreal Engine

Open Edit → Project Settings → Plugins → Node to Code → LLM Services → Ollama. Set the Model Name to the model you pulled (e.g., qwen3:32b).
5. Select Ollama as your provider

Under Node to Code | LLM Provider, set Provider to Ollama.

Configuration

All Ollama settings are under Node to Code | LLM Services | Ollama in Project Settings.

Connection

| Setting | Default | Description |
| --- | --- | --- |
| Ollama Endpoint | `http://localhost:11434` | Base URL for the Ollama API. Change this if Ollama is running on a different host or port. |
| Model Name | `qwen3:32b` | The name of the model to use, exactly as it appears after `ollama pull`. |

Behavior

| Setting | Default | Description |
| --- | --- | --- |
| Use System Prompts | false | Enable if your model supports system prompts. When disabled, the system prompt is merged into the first user message. |
| Prepended Model Command | (empty) | Text prepended to every user message. Use model-specific commands like `/no_think` to disable extended thinking on reasoning models. |
| Keep Alive Duration | 3600 | Seconds to keep the model loaded in memory after a request. Set to -1 to keep it loaded indefinitely. |
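To see how these settings surface at the API level, here is a sketch of an Ollama `/api/generate` request body with a prepended `/no_think` command and a one-hour `keep_alive`. The field names come from Ollama's HTTP API; that the plugin sends exactly this shape is an assumption:

```shell
# Build a request body that mirrors the Behavior settings above:
# the Prepended Model Command lands at the front of the user message,
# and keep_alive matches Keep Alive Duration (in seconds).
cat <<'EOF' > request.json
{
  "model": "qwen3:32b",
  "prompt": "/no_think Translate this Blueprint graph to C++.",
  "keep_alive": 3600,
  "stream": false
}
EOF
cat request.json
# Once Ollama is running, send it with:
#   curl -s http://localhost:11434/api/generate -d @request.json
```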

Generation

| Setting | Default | Description |
| --- | --- | --- |
| Temperature | 0.0 | Controls output randomness. 0.0 is fully deterministic. |
| Max Tokens | 8192 | Maximum tokens to generate. Set to -1 for unlimited. |
| Top P | 0.5 | Nucleus sampling threshold. |
| Top K | 40 | Limits the number of tokens considered at each step. |
| Min P | 0.05 | Minimum probability threshold relative to the top token. |
| Repeat Penalty | 1.1 | Penalty for repeating tokens. |
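In Ollama's HTTP API these generation settings correspond to fields of the `options` object (`num_predict` is Ollama's name for the max-token limit). The sketch below expresses the defaults listed above; the exact request the plugin assembles is an assumption:

```shell
# The generation defaults expressed as Ollama API options.
cat <<'EOF' > options.json
{
  "model": "qwen3:32b",
  "stream": false,
  "options": {
    "temperature": 0.0,
    "num_predict": 8192,
    "top_p": 0.5,
    "top_k": 40,
    "min_p": 0.05,
    "repeat_penalty": 1.1
  }
}
EOF
cat options.json
```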

Context

| Setting | Default | Description |
| --- | --- | --- |
| Context Window | 8192 | Token limit for the context window. For larger Blueprint graphs or when using Translation Depth, increase this to 16k–32k or higher. |
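One way to raise the limit on the Ollama side is to bake a larger `num_ctx` (Ollama's context-window parameter) into a derived model via a Modelfile; the `qwen3-32k` name here is just an example:

```shell
# Create a Modelfile that raises the context window to 32k tokens.
cat <<'EOF' > Modelfile
FROM qwen3:32b
PARAMETER num_ctx 32768
EOF
cat Modelfile
# Build the derived model, then point Model Name at it in Project Settings:
#   ollama create qwen3-32k -f Modelfile
```

Alternatively, `num_ctx` can be passed per request in the API's `options` object.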

Advanced

| Setting | Default | Description |
| --- | --- | --- |
| Mirostat Mode | 0 | Mirostat sampling mode. 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0. |
| Mirostat Eta | 0.1 | Mirostat learning rate. Lower values adjust more slowly. |
| Mirostat Tau | 5.0 | Mirostat target entropy. Lower values increase focus. |
| Random Seed | 0 | Seed for generation. 0 = random; set a fixed value for reproducible output. |
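A fixed seed combined with temperature 0.0 gives reproducible completions. In Ollama's API these map to the `seed` and `mirostat` fields of the `options` object; this payload is a sketch mirroring the settings above:

```shell
# Two requests with the same seed and temperature 0.0 should produce
# identical completions; seed 0 restores random behavior.
cat <<'EOF' > seeded.json
{
  "model": "qwen3:32b",
  "stream": false,
  "options": { "seed": 42, "temperature": 0.0, "mirostat": 0 }
}
EOF
cat seeded.json
```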
If you’re translating large or deeply nested Blueprint graphs, increase the Context Window setting significantly. The default of 8192 tokens may be too small for complex graphs, causing truncated or incomplete translations.
