Switch inference providers

NemoClaw routes all agent inference through the OpenShell gateway, which means you can change the active model or provider at any time without restarting the sandbox. The change takes effect immediately.

Prerequisites

  • A running NemoClaw sandbox.
  • The OpenShell CLI on your PATH.
  • An NVIDIA API key for cloud providers. The nemoclaw onboard wizard stores this in ~/.nemoclaw/credentials.json on first run.
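The prerequisites above can be checked from a script before attempting a switch. This is a hedged sketch: the CLI name and credentials path come from this page, but the `check_prereqs` helper itself is illustrative and not part of NemoClaw.

```shell
# Illustrative preflight check for the prerequisites listed above.
check_prereqs() {
  cli="$1"
  creds="$2"
  # The CLI must be on PATH.
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "missing CLI: $cli"
    return 1
  fi
  # The onboard wizard stores credentials here on first run.
  if [ ! -f "$creds" ]; then
    echo "missing credentials: $creds"
    return 1
  fi
  echo "prerequisites ok"
}

check_prereqs openshell "$HOME/.nemoclaw/credentials.json"
```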

Switch provider interactively

Re-run the onboard wizard to select a new provider and model through guided prompts:
openclaw nemoclaw onboard
The wizard shows your existing configuration and prompts whether to reconfigure. Confirm, then select the new endpoint and model.

Switch provider non-interactively

Pass --endpoint and --model flags to skip the interactive prompts:
openclaw nemoclaw onboard --endpoint build --model nvidia/llama-3.3-nemotron-super-49b-v1.5
The build endpoint is the default provider; it routes inference to build.nvidia.com and requires NVIDIA_API_KEY.

Switch the model at runtime

You can change only the model, without re-running the full onboard wizard, by using openshell inference set directly. If you set up with openclaw nemoclaw launch or openclaw nemoclaw migrate, the blueprint creates a provider named nvidia-inference:
openshell inference set --provider nvidia-inference --model nvidia/nemotron-3-super-120b-a12b
To use a different model from the catalog, pass its model ID:
openshell inference set --provider nvidia-inference --model nvidia/llama-3.1-nemotron-ultra-253b-v1
openshell inference set --provider nvidia-inference --model nvidia/nemotron-3-nano-30b-a3b
The change takes effect immediately. No sandbox restart is needed.
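The switch-and-verify steps can be wrapped in a single helper. This is a hedged sketch using only the commands shown on this page; it assumes the model ID appears verbatim in the openclaw nemoclaw status output.

```shell
# Illustrative helper: switch the model, then confirm the new model ID
# shows up in the status output. Assumes `status` prints the ID verbatim.
switch_model() {
  model="$1"
  openshell inference set --provider nvidia-inference --model "$model" || return 1
  openclaw nemoclaw status | grep -F "$model"
}

switch_model nvidia/nemotron-3-nano-30b-a3b
```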

Verify the active provider

Confirm the provider, model, and endpoint after switching:
openclaw nemoclaw status
For machine-readable output:
openclaw nemoclaw status --json
The output includes the active provider name, model ID, and endpoint URL.
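For scripting, the JSON output can be parsed with standard tools. A hedged sketch follows: the top-level "model" field name is an assumption, since the exact JSON schema is not shown on this page.

```shell
# Illustrative: extract the active model ID from `status --json`.
# The "model" field name is an assumption about the output schema.
active_model() {
  openclaw nemoclaw status --json |
    sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

active_model
```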

Available models

The following models are available through the nvidia-inference provider on build.nvidia.com:
Model ID                                    Label                      Context window   Max output
nvidia/nemotron-3-super-120b-a12b           Nemotron 3 Super 120B      131,072          8,192
nvidia/llama-3.1-nemotron-ultra-253b-v1     Nemotron Ultra 253B        131,072          4,096
nvidia/llama-3.3-nemotron-super-49b-v1.5    Nemotron Super 49B v1.5    131,072          4,096
nvidia/nemotron-3-nano-30b-a3b              Nemotron 3 Nano 30B        131,072          4,096
The default profile uses Nemotron 3 Super 120B. The Nano 30B model is used by default for local vLLM deployments.
API keys are validated against the endpoint before onboarding completes. For local endpoints (vLLM, Ollama, local NIM), validation is best-effort — the service does not need to be running at onboard time.
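Scripts that switch models frequently can mirror the table above in a small lookup helper. This is a hedged sketch: the short labels (nano-30b and so on) are invented here for illustration and are not recognized by any NemoClaw command.

```shell
# Illustrative: map short, made-up labels to the full model IDs
# listed in the table above.
model_id() {
  case "$1" in
    super-120b) echo "nvidia/nemotron-3-super-120b-a12b" ;;
    ultra-253b) echo "nvidia/llama-3.1-nemotron-ultra-253b-v1" ;;
    super-49b)  echo "nvidia/llama-3.3-nemotron-super-49b-v1.5" ;;
    nano-30b)   echo "nvidia/nemotron-3-nano-30b-a3b" ;;
    *) echo "unknown label: $1" >&2; return 1 ;;
  esac
}

model_id nano-30b
```

A script could then switch with, for example, openshell inference set --provider nvidia-inference --model "$(model_id nano-30b)".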

Inference profiles

Full profile configuration reference, including blueprint.yaml fields and provider types.

Monitor sandbox activity

Check the active provider and model with openclaw nemoclaw status.