The NeMo Guardrails server provides a REST API for adding guardrails to your LLM applications. It’s compatible with OpenAI’s Chat Completions API, making integration straightforward.

Starting the Server

Basic Usage

Start the server pointing to a directory containing guardrails configurations:
nemoguardrails server --config=/path/to/configs
The server will:
  • Start on port 8000 by default
  • Load all valid configurations from subdirectories
  • Expose the Chat UI at http://localhost:8000
  • Expose the API at http://localhost:8000/v1/

Command Options

nemoguardrails server --config=./configs
Available Options:
  • --config (string, default: "./config"): Path to a directory containing multiple configuration sub-folders, or to a single configuration directory.
  • --port (integer, default: 8000): The port the server should listen on.
  • --default-config-id (string): The default configuration to use when no config_id is specified in a request.
  • --verbose (boolean, default: false): Enable verbose logging, including prompts and completions.
  • --disable-chat-ui (boolean, default: false): Disable the web-based Chat UI.
  • --auto-reload (boolean, default: false): Enable automatic reloading of configurations when files change.
  • --prefix (string, default: ""): A prefix to add to all server paths (must start with '/').

Configuration Directory Structure

Multiple Configurations

For multiple guardrails configurations:
configs/
├── customer_service/
│   ├── config.yml
│   ├── rails.co
│   └── actions.py
├── content_moderation/
│   ├── config.yml
│   └── rails.co
└── qa_bot/
    ├── config.yml
    └── rails.co
Each subdirectory represents a separate configuration accessible via its name.

Single Configuration

For a single configuration:
my_config/
├── config.yml
├── rails.co
├── actions.py
└── kb/
    └── documents.md
If the directory passed to --config contains a config.yml directly, the server runs in single-configuration mode.
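Every configuration needs at least a config.yml that declares the main model. A minimal sketch (the engine and model names below are placeholders; substitute your own provider and model):

```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo
```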

API Endpoints

Chat Completions

The primary endpoint for generating responses with guardrails.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": "customer_service",
    "messages": [
      {"role": "user", "content": "Hello! How can I reset my password?"}
    ]
  }'
Request Body:
  • config_id (string): The ID of the guardrails configuration to use. Corresponds to the subdirectory name.
  • messages (array, required): Array of message objects with role and content fields.
  • model (string): Optional model override. If specified, overrides the main model in the configuration.
  • stream (boolean, default: false): Enable streaming responses.
  • max_tokens (integer): Maximum number of tokens to generate.
  • temperature (number): Sampling temperature (0-2).
  • top_p (number): Nucleus sampling parameter.
  • stop (array): Stop sequences.
Response Format:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ]
}
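The same request can be issued from Python. The helper below only assembles the JSON body described above; build_chat_request is a name invented for this sketch, not part of any SDK:

```python
def build_chat_request(config_id, messages, **options):
    """Assemble a Chat Completions request body for the guardrails server.

    Optional fields such as stream, max_tokens, temperature, top_p, and stop
    are included only when explicitly provided.
    """
    body = {"config_id": config_id, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body

payload = build_chat_request(
    "customer_service",
    [{"role": "user", "content": "Hello! How can I reset my password?"}],
    temperature=0.2,
)

# With the server running, send it with any HTTP client, e.g.:
# resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
# print(resp.json()["choices"][0]["message"]["content"])
```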

Streaming Responses

Enable streaming to receive responses token-by-token:
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        decoded = line.decode('utf-8')
        if decoded.startswith('data: '):
            chunk = decoded[6:]
            if chunk != '[DONE]':
                print(chunk, end='', flush=True)
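For reuse, the SSE handling above can be factored into a small generator. This assumes the event format shown in the loop: each event line is prefixed with `data: ` and the stream ends with `data: [DONE]` (parse_sse_lines is a name made up for this sketch):

```python
def parse_sse_lines(lines):
    """Yield the payload of each 'data: ' SSE line, stopping at [DONE].

    `lines` is an iterable of byte strings, such as response.iter_lines().
    """
    for line in lines:
        if not line:
            continue  # skip SSE keep-alive blank lines
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: "):
            continue
        chunk = decoded[len("data: "):]
        if chunk == "[DONE]":
            return
        yield chunk

# Canned lines standing in for a live stream:
canned = [b"data: Hello", b"", b"data:  world", b"data: [DONE]"]
tokens = list(parse_sse_lines(canned))
# tokens == ["Hello", " world"]
```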

List Configurations

Get all available guardrails configurations:
curl http://localhost:8000/v1/rails/configs
Response:
[
  {"id": "customer_service"},
  {"id": "content_moderation"},
  {"id": "qa_bot"}
]

List Models

Get available models from the configured provider:
curl http://localhost:8000/v1/models
Response:
{
  "data": [
    {"id": "gpt-3.5-turbo", "object": "model"},
    {"id": "gpt-4", "object": "model"}
  ]
}

Advanced Features

Context and State Management

Include context variables in your requests:
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "messages": [
            {"role": "user", "content": "What's my account status?"}
        ],
        "context": {
            "user_id": "12345",
            "account_type": "premium",
            "user_name": "Alice"
        }
    }
)
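Context variables sent this way become available to the configuration's flows. A hypothetical rails.co sketch in Colang 1.0 syntax (the flow and message names are illustrative, not part of any shipped configuration) that branches on the account_type variable sent above:

```
define user ask account status
  "What's my account status?"

define flow account status
  user ask account status
  if $account_type == "premium"
    bot inform premium status
  else
    bot inform standard status
```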

Thread Support

Maintain conversation threads across multiple requests:
import requests
import uuid

# Generate a unique thread ID (minimum 16 characters)
thread_id = str(uuid.uuid4())

# First message in thread
response1 = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": "My name is Alice"}]
    }
)

# Continue the thread
response2 = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": "What's my name?"}]
    }
)
# Response will include context from previous messages
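The thread bookkeeping can be wrapped in a small helper that also enforces the 16-character minimum mentioned above (threaded_payload is a name invented for this example):

```python
import uuid

def threaded_payload(thread_id, content, config_id="customer_service"):
    """Build a request body that continues a server-side conversation thread.

    The server requires thread IDs of at least 16 characters.
    """
    if len(thread_id) < 16:
        raise ValueError("thread_id must be at least 16 characters")
    return {
        "config_id": config_id,
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": content}],
    }

thread_id = str(uuid.uuid4())  # 36 characters, comfortably above the minimum
first = threaded_payload(thread_id, "My name is Alice")
follow_up = threaded_payload(thread_id, "What's my name?")
```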

Model Override

Override the configured model for specific requests:
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "model": "gpt-4",  # Override config model
        "messages": [{"role": "user", "content": "Complex question"}]
    }
)

Environment Variables

CORS Configuration

Enable Cross-Origin Resource Sharing:
export NEMO_GUARDRAILS_SERVER_ENABLE_CORS=true
export NEMO_GUARDRAILS_SERVER_ALLOWED_ORIGINS="http://localhost:3000,https://example.com"

nemoguardrails server --config=./configs

Model Configuration

Set the main model engine and base URL:
export MAIN_MODEL_ENGINE=openai
export MAIN_MODEL_BASE_URL=http://localhost:8080/v1

nemoguardrails server --config=./configs

Docker Deployment

Using Docker

# Build the image
docker build -t nemoguardrails .

# Run the server
docker run -p 8000:8000 \
  -v $(pwd)/configs:/configs \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  nemoguardrails \
  nemoguardrails server --config=/configs

Docker Compose

docker-compose.yml
version: '3.8'

services:
  guardrails:
    image: nemoguardrails
    ports:
      - "8000:8000"
    volumes:
      - ./configs:/configs
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    command: nemoguardrails server --config=/configs --verbose
Run with:
docker-compose up

Chat UI

The built-in Chat UI is available at http://localhost:8000 when the server is running. Features:
  • Interactive chat interface
  • Configuration selection
  • Message history
  • Real-time streaming
Disable the UI:
nemoguardrails server --config=./configs --disable-chat-ui
When disabled, the root endpoint returns:
{"status": "ok"}

Health Checks and Monitoring

Health Check

Check if the server is running:
curl http://localhost:8000/v1/rails/configs
A successful response indicates the server is healthy.

Logging

Enable verbose logging to monitor requests:
nemoguardrails server --config=./configs --verbose
Logs will include:
  • Request details
  • Configuration loading
  • LLM calls (if verbose)
  • Rail activations
  • Error traces

Integration Examples

OpenAI SDK

Use the OpenAI Python SDK with the guardrails server:
from openai import OpenAI

# Point to the guardrails server
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key handled by guardrails config
)

response = client.chat.completions.create(
    model="customer_service",  # This is the config_id
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

LangChain

Integrate with LangChain:
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    model="customer_service",  # config_id
    api_key="not-needed"
)

response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)

Production Deployment

For production deployments:
1. Use a process manager: Use systemd, supervisord, or PM2 to manage the server process.
2. Enable auto-reload: Use --auto-reload to reload configurations automatically without restarting the server.
3. Set up a reverse proxy: Use Nginx or Apache as a reverse proxy for SSL/TLS termination and load balancing.
4. Configure CORS: Set appropriate CORS headers for your frontend applications.
5. Monitor logs: Set up log aggregation and monitoring with tools like the ELK stack or Datadog.
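For the reverse proxy step, a minimal Nginx sketch (the hostname and certificate paths are placeholders; proxy buffering is disabled so token-by-token streaming responses are not held back by the proxy):

```nginx
server {
    listen 443 ssl;
    server_name guardrails.example.com;

    ssl_certificate     /etc/ssl/certs/guardrails.pem;
    ssl_certificate_key /etc/ssl/private/guardrails.key;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_buffering off;  # keep SSE streaming responsive
    }
}
```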

Troubleshooting

Configuration Not Loading

Ensure your configuration directory has valid config.yml files:
# Check structure
ls -la configs/

# Validate YAML
python -c "import yaml; yaml.safe_load(open('configs/my_config/config.yml'))"

Port Already in Use

Change the port:
nemoguardrails server --config=./configs --port=8080

Model Connection Issues

Verify environment variables:
echo $OPENAI_API_KEY
echo $MAIN_MODEL_ENGINE
echo $MAIN_MODEL_BASE_URL

Next Steps

  • Python API: Use guardrails programmatically in your code
  • CLI Tools: Interactive chat and testing tools
  • Configuration: Configure your guardrails
  • Docker Guide: Deploy with Docker
