NeMo Guardrails supports multiple LLM providers including OpenAI, NVIDIA NIM, Google Vertex AI, and HuggingFace. Each provider requires specific configuration in your config.yml file.
Supported LLM Providers
- OpenAI: GPT-3.5, GPT-4, and other OpenAI models
- NVIDIA NIM: Llama, Nemotron, and other NVIDIA-optimized models
- Vertex AI: Google's Gemini and PaLM models
- HuggingFace: open-source models via pipelines or endpoints
OpenAI Configuration
Basic OpenAI Setup
From examples/bots/hello_world/config.yml:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
```
OpenAI with Parameters
From examples/configs/sample/config.yml:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
    parameters:
      temperature: 0.7
      max_tokens: 256
      top_p: 1.0
      frequency_penalty: 0.0
      presence_penalty: 0.0
```
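Before committing values like these to a config, it can help to sanity-check them against the ranges the OpenAI API accepts. The helper below is a hypothetical sketch (not part of NeMo Guardrails); the ranges reflect the OpenAI API documentation.

```python
# Hypothetical sanity check for common OpenAI sampling parameters.
# Ranges follow the OpenAI API docs; the helper name is illustrative.
def validate_openai_params(params: dict) -> list:
    """Return a list of human-readable problems; empty means the values look sane."""
    problems = []
    if not 0.0 <= params.get("temperature", 1.0) <= 2.0:
        problems.append("temperature must be between 0 and 2")
    if not 0.0 <= params.get("top_p", 1.0) <= 1.0:
        problems.append("top_p must be between 0 and 1")
    if params.get("max_tokens", 1) < 1:
        problems.append("max_tokens must be a positive integer")
    for key in ("frequency_penalty", "presence_penalty"):
        if not -2.0 <= params.get(key, 0.0) <= 2.0:
            problems.append(f"{key} must be between -2 and 2")
    return problems

params = {"temperature": 0.7, "max_tokens": 256, "top_p": 1.0,
          "frequency_penalty": 0.0, "presence_penalty": 0.0}
print(validate_openai_params(params))  # []
```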
Supported OpenAI Models
- `gpt-4o` - latest GPT-4 Omni model
- `gpt-4o-mini` - smaller, faster GPT-4 Omni
- `gpt-4-turbo` - GPT-4 Turbo
- `gpt-4` - GPT-4 base model
- `gpt-3.5-turbo` - GPT-3.5 Turbo (chat)
- `gpt-3.5-turbo-instruct` - GPT-3.5 Instruct (completion)
Environment Variables
Set your OpenAI API key:
```shell
export OPENAI_API_KEY="sk-..."
```
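A missing API key typically surfaces as an authentication error only at request time, so a pre-flight check can fail faster. This is a hypothetical helper, not part of NeMo Guardrails:

```python
import os

# Hypothetical pre-flight helper: report which required environment
# variables are missing before starting the app.
def missing_env_vars(names):
    return [name for name in names if not os.environ.get(name)]

os.environ.setdefault("OPENAI_API_KEY", "sk-example")  # placeholder for the demo
print(missing_env_vars(["OPENAI_API_KEY"]))  # []
```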
NVIDIA NIM Configuration
Basic NIM Setup
From examples/configs/llm/nim/config.yml:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    parameters:
      base_url: http://localhost:7331/v1
```
NIM with Cloud API
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
    parameters:
      base_url: https://integrate.api.nvidia.com/v1
      api_key: ${NVIDIA_API_KEY}
```
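The `${NVIDIA_API_KEY}` reference is substituted with the environment variable's value when the config is loaded. As an illustration of that substitution style (NeMo Guardrails performs its own resolution internally), the same expansion can be reproduced with the standard library:

```python
import os

# Illustration only: resolving a ${VAR}-style reference the way a config
# loader might. The key value here is a placeholder, not a real credential.
os.environ["NVIDIA_API_KEY"] = "nvapi-example-key"
raw_value = "${NVIDIA_API_KEY}"
resolved = os.path.expandvars(raw_value)
print(resolved)  # nvapi-example-key
```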
Multiple NIM Models
From examples/configs/content_safety/config.yml:
```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
```
Popular NIM Models
Llama models (pick one):

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama3-8b-instruct
    # or
    # model: meta/llama-3.3-70b-instruct
    # model: meta/llama-3.1-405b-instruct
```

Nemotron models, from examples/configs/nemotron/config.yml:

```yaml
models:
  - type: main
    engine: nim
    model: nvidia/nemotron-4-340b-instruct
```

Safety models:

```yaml
models:
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
  - type: jailbreak_detection
    engine: nim
    model: nvidia/nemoguard-jailbreak-detection
```

DeepSeek, from examples/configs/llm/deepseek-r1/config.yml:

```yaml
models:
  - type: main
    engine: nim
    model: deepseek/deepseek-r1
```
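For quick reference, the NIM model IDs above can be collected into a small purpose-keyed lookup. The dict and helper below are illustrative; confirm model availability against the NVIDIA API catalog.

```python
# Illustrative lookup of the NIM model IDs mentioned above, keyed by purpose.
NIM_MODELS = {
    "main": "meta/llama-3.3-70b-instruct",
    "content_safety": "nvidia/llama-3.1-nemoguard-8b-content-safety",
    "jailbreak_detection": "nvidia/nemoguard-jailbreak-detection",
    "reasoning": "deepseek/deepseek-r1",
}

def model_for(purpose: str) -> str:
    """Return the example NIM model ID for a given purpose, or raise."""
    try:
        return NIM_MODELS[purpose]
    except KeyError:
        raise ValueError(f"No NIM model registered for purpose: {purpose}")

print(model_for("content_safety"))  # nvidia/llama-3.1-nemoguard-8b-content-safety
```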
Environment Variables
For NVIDIA NIM cloud:
```shell
export NVIDIA_API_KEY="nvapi-..."
```
Google Vertex AI Configuration
From examples/configs/llm/vertexai/config.yml:
```yaml
models:
  - type: main
    engine: vertexai
    model: gemini-1.0-pro
```
Vertex AI with Parameters
```yaml
models:
  - type: main
    engine: vertexai
    model: gemini-1.5-pro
    parameters:
      temperature: 0.7
      max_output_tokens: 1024
      top_p: 0.95
      top_k: 40
```
Supported Vertex AI Models
- `gemini-1.5-pro` - latest Gemini Pro
- `gemini-1.0-pro` - Gemini Pro 1.0
- `gemini-1.5-flash` - fast Gemini model
Authentication
Vertex AI requires Google Cloud authentication:
```shell
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"
```
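Because `GOOGLE_APPLICATION_CREDENTIALS` must point at an actual service-account file, a quick pre-flight check can catch a bad path before any Vertex AI request is made. This is a hypothetical helper, not part of NeMo Guardrails or the Google SDK:

```python
import os
from pathlib import Path

# Hypothetical pre-flight check: both variables must be set, and the
# credentials path must point at an existing file.
def vertexai_credentials_ok() -> bool:
    cred_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    project = os.environ.get("GOOGLE_CLOUD_PROJECT")
    return bool(project) and bool(cred_path) and Path(cred_path).is_file()
```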
HuggingFace Configuration
HuggingFace Pipeline
From examples/configs/llm/hf_pipeline_llama2/config.yml:
```yaml
models:
  - type: main
    engine: hf_pipeline_llama2
    model: meta-llama/Llama-2-7b-chat-hf
    parameters:
      device_map: auto
      torch_dtype: float16
```
Other HuggingFace Pipeline Examples
Vicuna, from examples/configs/llm/hf_pipeline_vicuna/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_vicuna
    model: lmsys/vicuna-7b-v1.5
```

Falcon, from examples/configs/llm/hf_pipeline_falcon/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_falcon
    model: tiiuae/falcon-7b-instruct
```

Mosaic, from examples/configs/llm/hf_pipeline_mosaic/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_mosaic
    model: mosaicml/mpt-7b-chat
```

Dolly, from examples/configs/llm/hf_pipeline_dolly/config.yml:

```yaml
models:
  - type: main
    engine: hf_pipeline_dolly
    model: databricks/dolly-v2-3b
```
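The pipeline engines above can be summarized in a small engine-to-model mapping for quick reference. The dict is illustrative, drawn only from the example configs listed in this section:

```python
# Quick-reference mapping of the HuggingFace pipeline engines shown above
# to the example model IDs used in the bundled configs (illustrative only).
HF_PIPELINE_EXAMPLES = {
    "hf_pipeline_llama2": "meta-llama/Llama-2-7b-chat-hf",
    "hf_pipeline_vicuna": "lmsys/vicuna-7b-v1.5",
    "hf_pipeline_falcon": "tiiuae/falcon-7b-instruct",
    "hf_pipeline_mosaic": "mosaicml/mpt-7b-chat",
    "hf_pipeline_dolly": "databricks/dolly-v2-3b",
}
print(HF_PIPELINE_EXAMPLES["hf_pipeline_falcon"])  # tiiuae/falcon-7b-instruct
```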
HuggingFace Endpoint
From examples/configs/llm/hf_endpoint/config.yml:
```yaml
models:
  - type: main
    engine: hf_endpoint
    model: your-model-endpoint
    parameters:
      endpoint_url: https://your-endpoint.huggingface.cloud
```
Environment Variables
```shell
export HUGGINGFACEHUB_API_TOKEN="hf_..."
```
Model Parameters
Common parameters across providers:
- `temperature`: Controls randomness; lower values make output more deterministic.
- `max_tokens`: Maximum number of tokens to generate.
- `top_p`: Nucleus sampling parameter.
- `frequency_penalty`: Penalizes repeated tokens (OpenAI).
- `presence_penalty`: Penalizes tokens based on presence (OpenAI).
- `base_url`: Custom API endpoint URL (NIM, custom deployments).
- `api_key`: API key for authentication; use environment variables for security.
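One way to reason about these parameters is as provider defaults with per-config overrides layered on top. The dataclass below is an illustrative sketch of that merge, not a structure NeMo Guardrails defines:

```python
from dataclasses import dataclass, asdict

# Illustrative typed view of the common parameters; field names mirror the
# list above, and the defaults here are placeholders, not provider defaults.
@dataclass
class GenerationParams:
    temperature: float = 1.0
    max_tokens: int = 256
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0

def merged_params(overrides: dict) -> dict:
    """Overlay config-file overrides on the baseline parameter values."""
    params = asdict(GenerationParams())
    params.update(overrides)
    return params

print(merged_params({"temperature": 0.7})["temperature"])  # 0.7
```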
Multiple Model Types
You can configure different models for different purposes:
```yaml
models:
  # Main conversation model
  - type: main
    engine: openai
    model: gpt-4o
  # Content safety checking
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
  # Jailbreak detection
  - type: jailbreak_detection
    engine: nim
    model: nvidia/nemoguard-jailbreak-detection
  # Embedding model for retrieval
  - type: embeddings
    engine: openai
    model: text-embedding-ada-002
```
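Conceptually, each rail looks up the model configured for its `type`. A minimal sketch of that lookup over a parsed `models` list (plain dicts; the indexing helper is illustrative, not the library's internal code):

```python
# Sketch: index a parsed `models` list by its `type` field so each rail can
# find the model configured for it.
models = [
    {"type": "main", "engine": "openai", "model": "gpt-4o"},
    {"type": "content_safety", "engine": "nim",
     "model": "nvidia/llama-3.1-nemoguard-8b-content-safety"},
    {"type": "embeddings", "engine": "openai",
     "model": "text-embedding-ada-002"},
]

by_type = {m["type"]: m for m in models}
print(by_type["content_safety"]["model"])
```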
Streaming Support
Enable streaming for real-time responses. From examples/configs/streaming/config.yml:
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4

streaming: True

rails:
  dialog:
    single_call:
      enabled: True
```
Use streaming in Python:
```python
import asyncio

from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

async def main():
    # Streaming response
    async for chunk in rails.stream_async(
        messages=[{"role": "user", "content": "Tell me a story"}]
    ):
        print(chunk, end="", flush=True)

asyncio.run(main())
```
Custom LLM Providers
You can add custom LLM providers by implementing the LLM interface:
```python
# config/config.py
from nemoguardrails.llm.providers import register_llm_provider
from nemoguardrails.llm.base import BaseLLM

class CustomLLM(BaseLLM):
    """Custom LLM implementation."""

    async def generate(self, prompt: str, **kwargs):
        # Your custom generation logic
        ...

# Register the provider
register_llm_provider("custom_engine", CustomLLM)
```
Then use in config.yml:
```yaml
models:
  - type: main
    engine: custom_engine
    model: your-custom-model
```
Best Practices
Use Environment Variables
Never hardcode API keys in config files.

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      api_key: ${OPENAI_API_KEY}  # Good
      # api_key: sk-hardcoded-key  # Bad!
```
Choose Appropriate Models
- Use smaller models (e.g., gpt-4o-mini) for simple tasks
- Use larger models (e.g., gpt-4o) for complex reasoning
- Use specialized models for specific tasks (safety, embeddings)
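The guidance above amounts to routing each request to the cheapest model that can handle it. A toy sketch of such a router (tier names and the fallback choice are made up for illustration):

```python
# Illustrative routing helper: cheaper model for simple tasks, a larger one
# for complex reasoning. Tier names and fallback are hypothetical.
def pick_model(task_complexity: str) -> str:
    tiers = {"simple": "gpt-4o-mini", "complex": "gpt-4o"}
    return tiers.get(task_complexity, "gpt-4o-mini")

print(pick_model("complex"))  # gpt-4o
```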
Configure Timeouts
```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
    parameters:
      timeout: 30  # seconds
```
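Conceptually, a request timeout abandons a call that does not finish within its budget. The sketch below illustrates that semantics with `asyncio.wait_for`; `slow_llm_call` is a stand-in for a real model request, not an actual API call:

```python
import asyncio

# Conceptual sketch of request-timeout semantics: the call is abandoned
# if it does not complete within the budget.
async def slow_llm_call() -> str:
    await asyncio.sleep(0.5)  # simulate a slow model response
    return "response"

async def call_with_timeout(timeout_s: float):
    try:
        return await asyncio.wait_for(slow_llm_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # timed out; caller can retry or fail gracefully

print(asyncio.run(call_with_timeout(0.1)))  # None
```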
Test Locally First
Use local NIM deployments for development before switching to cloud APIs
Testing LLM Configuration
Test interactively from the command line:

```shell
nemoguardrails chat --config ./config
```

Or programmatically in Python:

```python
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Test generation
response = rails.generate(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response)
```
Troubleshooting
Ensure environment variables are set:

```shell
echo $OPENAI_API_KEY
echo $NVIDIA_API_KEY
```

If empty, export them before running:

```shell
export OPENAI_API_KEY="your-key-here"
```

For NIM local deployments, verify the server is running:

```shell
curl http://localhost:7331/v1/models
```

Also verify that the model name in config.yml matches one of the provider's available models.
Next Steps
- config.yml Schema: learn about all configuration options
- Guardrails Library: explore built-in guardrails for different LLMs