NeMo Guardrails supports multiple LLM providers including OpenAI, NVIDIA NIM, Google Vertex AI, and HuggingFace. Each provider requires specific configuration in your config.yml file.
Supported LLM Providers
- OpenAI: GPT-3.5, GPT-4, and other OpenAI models
- NVIDIA NIM: Llama, Nemotron, and other NVIDIA-optimized models
- Vertex AI: Google's Gemini and PaLM models
- HuggingFace: open-source models via pipelines or endpoints
OpenAI Configuration
Basic OpenAI Setup
From examples/bots/hello_world/config.yml:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
```
OpenAI with Parameters
From examples/configs/sample/config.yml:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
    parameters:
      temperature: 0.7
      max_tokens: 256
      top_p: 1.0
      frequency_penalty: 0.0
      presence_penalty: 0.0
```
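Before committing values like these to a config, it can help to sanity-check them against the ranges the OpenAI API accepts. The helper below is a hypothetical sketch (not part of NeMo Guardrails); the ranges reflect the OpenAI API documentation.

```python
# Hypothetical sanity check for common OpenAI sampling parameters.
# Ranges follow the OpenAI API docs; the helper name is illustrative.
def validate_openai_params(params: dict) -> list:
    """Return a list of human-readable problems; empty means the values look sane."""
    problems = []
    if not 0.0 <= params.get("temperature", 1.0) <= 2.0:
        problems.append("temperature must be between 0 and 2")
    if not 0.0 <= params.get("top_p", 1.0) <= 1.0:
        problems.append("top_p must be between 0 and 1")
    if params.get("max_tokens", 1) < 1:
        problems.append("max_tokens must be a positive integer")
    for key in ("frequency_penalty", "presence_penalty"):
        if not -2.0 <= params.get(key, 0.0) <= 2.0:
            problems.append(f"{key} must be between -2 and 2")
    return problems

params = {"temperature": 0.7, "max_tokens": 256, "top_p": 1.0,
          "frequency_penalty": 0.0, "presence_penalty": 0.0}
print(validate_openai_params(params))  # []
```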
Supported OpenAI Models
- `gpt-4o` - latest GPT-4 Omni model
- `gpt-4o-mini` - smaller, faster GPT-4 Omni
- `gpt-4-turbo` - GPT-4 Turbo
- `gpt-4` - GPT-4 base model
- `gpt-3.5-turbo` - GPT-3.5 Turbo (chat)
- `gpt-3.5-turbo-instruct` - GPT-3.5 Instruct (completion)
Environment Variables
Set your OpenAI API key:
```shell
export OPENAI_API_KEY="sk-..."
```
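A missing API key typically surfaces as an authentication error only at request time, so a pre-flight check can fail faster. This is a hypothetical helper, not part of NeMo Guardrails:

```python
import os

# Hypothetical pre-flight helper: report which required environment
# variables are missing before starting the app.
def missing_env_vars(names):
    return [name for name in names if not os.environ.get(name)]

os.environ.setdefault("OPENAI_API_KEY", "sk-example")  # placeholder for the demo
print(missing_env_vars(["OPENAI_API_KEY"]))  # []
```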
NVIDIA NIM Configuration
Basic NIM Setup
From examples/configs/llm/nim/config.yml:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    parameters:
      base_url: http://localhost:7331/v1
```
NIM with Cloud API
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
    parameters:
      base_url: https://integrate.api.nvidia.com/v1
      api_key: ${NVIDIA_API_KEY}
```
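The `${NVIDIA_API_KEY}` reference is substituted with the environment variable's value when the config is loaded. As an illustration of that substitution style (NeMo Guardrails performs its own resolution internally), the same expansion can be reproduced with the standard library:

```python
import os

# Illustration only: resolving a ${VAR}-style reference the way a config
# loader might. The key value here is a placeholder, not a real credential.
os.environ["NVIDIA_API_KEY"] = "nvapi-example-key"
raw_value = "${NVIDIA_API_KEY}"
resolved = os.path.expandvars(raw_value)
print(resolved)  # nvapi-example-key
```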
Multiple NIM Models
From examples/configs/content_safety/config.yml:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
```
Popular NIM Models
Llama models (pick one):

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    # or
    # model: meta/llama-3.3-70b-instruct
    # model: meta/llama-3.1-405b-instruct
```

Nemotron models, from examples/configs/nemotron/config.yml:

```yaml
models:
  - type: main
    engine: nim
    model: nvidia/nemotron-4-340b-instruct
```

Safety models:

```yaml
models:
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
  - type: jailbreak_detection
    engine: nim
    model: nvidia/nemoguard-jailbreak-detection
```

DeepSeek, from examples/configs/llm/deepseek-r1/config.yml:

```yaml
models:
  - type: main
    engine: nim
    model: deepseek/deepseek-r1
```
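For quick reference, the NIM model IDs above can be collected into a small purpose-keyed lookup. The dict and helper below are illustrative; confirm model availability against the NVIDIA API catalog.

```python
# Illustrative lookup of the NIM model IDs mentioned above, keyed by purpose.
NIM_MODELS = {
    "main": "meta/llama-3.3-70b-instruct",
    "content_safety": "nvidia/llama-3.1-nemoguard-8b-content-safety",
    "jailbreak_detection": "nvidia/nemoguard-jailbreak-detection",
    "reasoning": "deepseek/deepseek-r1",
}

def model_for(purpose: str) -> str:
    """Return the example NIM model ID for a given purpose, or raise."""
    try:
        return NIM_MODELS[purpose]
    except KeyError:
        raise ValueError(f"No NIM model registered for purpose: {purpose}")

print(model_for("content_safety"))  # nvidia/llama-3.1-nemoguard-8b-content-safety
```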
Environment Variables
For NVIDIA NIM cloud:
```shell
export NVIDIA_API_KEY="nvapi-..."
```
Google Vertex AI Configuration
From examples/configs/llm/vertexai/config.yml:
```yaml
models:
  - type: main
    engine: vertexai
    model: gemini-1.0-pro
```
Vertex AI with Parameters
```yaml
models:
  - type: main
    engine: vertexai
    model: gemini-1.5-pro
    parameters:
      temperature: 0.7
      max_output_tokens: 1024
      top_p: 0.95
      top_k: 40
```
Supported Vertex AI Models
- `gemini-1.5-pro` - latest Gemini Pro
- `gemini-1.0-pro` - Gemini Pro 1.0
- `gemini-1.5-flash` - fast Gemini model
Authentication
Vertex AI requires Google Cloud authentication:
```shell
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"
```
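Because `GOOGLE_APPLICATION_CREDENTIALS` must point at an actual service-account file, a quick pre-flight check can catch a bad path before any Vertex AI request is made. This is a hypothetical helper, not part of NeMo Guardrails or the Google SDK:

```python
import os
from pathlib import Path

# Hypothetical pre-flight check: both variables must be set, and the
# credentials path must point at an existing file.
def vertexai_credentials_ok() -> bool:
    cred_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    return bool(project) and bool(cred_path) and Path(cred_path).is_file()
```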
HuggingFace Configuration
HuggingFace Pipeline
From examples/configs/llm/hf_pipeline_llama2/config.yml:
```yaml
models:
  - type: main
    engine: hf_pipeline_llama2
    model: meta-llama/Llama-2-7b-chat-hf
    parameters:
      device_map: auto
      torch_dtype: float16
```
Other HuggingFace Pipeline Examples
Vicuna, from examples/configs/llm/hf_pipeline_vicuna/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_vicuna
    model: lmsys/vicuna-7b-v1.5
```

Falcon, from examples/configs/llm/hf_pipeline_falcon/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_falcon
    model: tiiuae/falcon-7b-instruct
```

Mosaic, from examples/configs/llm/hf_pipeline_mosaic/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_mosaic
    model: mosaicml/mpt-7b-chat
```

Dolly, from examples/configs/llm/hf_pipeline_dolly/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_dolly
    model: databricks/dolly-v2-3b
```
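The pipeline engines above can be summarized in a small engine-to-model mapping for quick reference. The dict is illustrative, drawn only from the example configs listed in this section:

```python
# Quick-reference mapping of the HuggingFace pipeline engines shown above
# to the example model IDs used in the bundled configs (illustrative only).
HF_PIPELINE_EXAMPLES = {
    "hf_pipeline_llama2": "meta-llama/Llama-2-7b-chat-hf",
    "hf_pipeline_vicuna": "lmsys/vicuna-7b-v1.5",
    "hf_pipeline_falcon": "tiiuae/falcon-7b-instruct",
    "hf_pipeline_mosaic": "mosaicml/mpt-7b-chat",
    "hf_pipeline_dolly": "databricks/dolly-v2-3b",
}
print(HF_PIPELINE_EXAMPLES["hf_pipeline_falcon"])  # tiiuae/falcon-7b-instruct
```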
HuggingFace Endpoint
From examples/configs/llm/hf_endpoint/config.yml:
```yaml
models:
  - type: main
    engine: hf_endpoint
    model: your-model-endpoint
    parameters:
      endpoint_url: https://your-endpoint.huggingface.cloud
```
Environment Variables
```shell
export HUGGINGFACEHUB_API_TOKEN="hf_..."
```
Model Parameters
Common parameters across providers:
- `temperature`: Controls randomness; lower values make output more deterministic.
- `max_tokens`: Maximum number of tokens to generate.
- `top_p`: Nucleus sampling parameter.
- `frequency_penalty`: Penalizes repeated tokens (OpenAI).
- `presence_penalty`: Penalizes tokens based on presence (OpenAI).
- `base_url`: Custom API endpoint URL (NIM, custom deployments).
- `api_key`: API key for authentication; use environment variables for security.
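One way to reason about these parameters is as provider defaults with per-config overrides layered on top. The dataclass below is an illustrative sketch of that merge, not a structure NeMo Guardrails defines:

```python
from dataclasses import dataclass, asdict

# Illustrative typed view of the common parameters; field names mirror the
# list above, and the defaults here are placeholders, not provider defaults.
@dataclass
class GenerationParams:
    temperature: float = 1.0
    max_tokens: int = 256
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0

def merged_params(overrides: dict) -> dict:
    """Overlay config-file overrides on the baseline parameter values."""
    params = asdict(GenerationParams())
    params.update(overrides)
    return params

print(merged_params({"temperature": 0.7})["temperature"])  # 0.7
```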
Multiple Model Types
You can configure different models for different purposes:
```yaml
models:
  # Main conversation model
  - type: main
    engine: openai
    model: gpt-4o
  # Content safety checking
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
  # Jailbreak detection
  - type: jailbreak_detection
    engine: nim
    model: nvidia/nemoguard-jailbreak-detection
  # Embedding model for retrieval
  - type: embeddings
    engine: openai
    model: text-embedding-ada-002
```
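Conceptually, each rail looks up the model configured for its `type`. A minimal sketch of that lookup over a parsed `models` list (plain dicts; the indexing helper is illustrative, not the library's internal code):

```python
# Sketch: index a parsed `models` list by its `type` field so each rail can
# find the model configured for it.
models = [
    {"type": "main", "engine": "openai", "model": "gpt-4o"},
    {"type": "content_safety", "engine": "nim",
     "model": "nvidia/llama-3.1-nemoguard-8b-content-safety"},
    {"type": "embeddings", "engine": "openai",
     "model": "text-embedding-ada-002"},
]

by_type = {m["type"]: m for m in models}
print(by_type["content_safety"]["model"])
```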
Streaming Support
Enable streaming for real-time responses. From examples/configs/streaming/config.yml:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4

streaming: True

rails:
  dialog:
    single_call:
      enabled: True
```
Use streaming in Python:
```python
import asyncio

from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

async def main():
    # Streaming response
    async for chunk in rails.stream_async(
        messages=[{"role": "user", "content": "Tell me a story"}]
    ):
        print(chunk, end="", flush=True)

asyncio.run(main())
```
Custom LLM Providers
You can add custom LLM providers by implementing the LLM interface:
```python
# config/config.py
from nemoguardrails.llm.providers import register_llm_provider
from nemoguardrails.llm.base import BaseLLM

class CustomLLM(BaseLLM):
    """Custom LLM implementation."""

    async def generate(self, prompt: str, **kwargs):
        # Your custom generation logic
        ...

# Register the provider
register_llm_provider("custom_engine", CustomLLM)
```
Then use in config.yml:
```yaml
models:
  - type: main
    engine: custom_engine
    model: your-custom-model
```
Best Practices
Use Environment Variables
Never hardcode API keys in config files.

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      api_key: ${OPENAI_API_KEY}  # Good
      # api_key: sk-hardcoded-key  # Bad!
```
Choose Appropriate Models
- Use smaller models (e.g., gpt-4o-mini) for simple tasks
- Use larger models (e.g., gpt-4o) for complex reasoning
- Use specialized models for specific tasks (safety, embeddings)
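The guidance above amounts to routing each request to the cheapest model that can handle it. A toy sketch of such a router (tier names and the fallback choice are made up for illustration):

```python
# Illustrative routing helper: cheaper model for simple tasks, a larger one
# for complex reasoning. Tier names and fallback are hypothetical.
def pick_model(task_complexity: str) -> str:
    tiers = {"simple": "gpt-4o-mini", "complex": "gpt-4o"}
    return tiers.get(task_complexity, "gpt-4o-mini")

print(pick_model("complex"))  # gpt-4o
```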
Configure Timeouts
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      timeout: 30  # seconds
```
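Conceptually, a request timeout abandons a call that does not finish within its budget. The sketch below illustrates that semantics with `asyncio.wait_for`; `slow_llm_call` is a stand-in for a real model request, not an actual API call:

```python
import asyncio

# Conceptual sketch of request-timeout semantics: the call is abandoned
# if it does not complete within the budget.
async def slow_llm_call() -> str:
    await asyncio.sleep(0.5)  # simulate a slow model response
    return "response"

async def call_with_timeout(timeout_s: float):
    try:
        return await asyncio.wait_for(slow_llm_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # timed out; caller can retry or fail gracefully

print(asyncio.run(call_with_timeout(0.1)))  # None
```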
Test Locally First
Use local NIM deployments for development before switching to cloud APIs
Testing LLM Configuration
Test interactively from the command line:

```shell
nemoguardrails chat --config ./config
```

Or programmatically in Python:

```python
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Test generation
response = rails.generate(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```
Troubleshooting
Ensure environment variables are set:

```shell
echo $OPENAI_API_KEY
echo $NVIDIA_API_KEY
```

If empty, export them before running:

```shell
export OPENAI_API_KEY="your-key-here"
```

For NIM local deployments, verify the server is running:

```shell
curl http://localhost:7331/v1/models
```

Also verify that the model name in config.yml matches one of the provider's available models.
Next Steps
- config.yml Schema: learn about all configuration options
- Guardrails Library: explore built-in guardrails for different LLMs