The NeMo Guardrails server provides a REST API for adding guardrails to your LLM applications. It’s compatible with OpenAI’s Chat Completions API, making integration straightforward.

Starting the Server

Basic Usage

Start the server pointing to a directory containing guardrails configurations:
nemoguardrails server --config=/path/to/configs
The server will:
  • Start on port 8000 by default
  • Load all valid configurations from subdirectories
  • Expose the Chat UI at http://localhost:8000
  • Expose the API at http://localhost:8000/v1/

Command Options

nemoguardrails server --config=./configs
Available Options:
  • --config (string, default: "./config"): Path to a directory containing multiple configuration sub-folders, or to a single configuration directory.
  • --port (integer, default: 8000): The port the server should listen on.
  • --default-config-id (string): The default configuration to use when no config_id is specified in a request.
  • --verbose (boolean, default: false): Enable verbose logging, including prompts and completions.
  • --disable-chat-ui (boolean, default: false): Disable the web-based Chat UI.
  • --auto-reload (boolean, default: false): Enable automatic reloading of configurations when files change.
  • --prefix (string, default: ""): A prefix to add to all server paths (must start with '/').

Configuration Directory Structure

Multiple Configurations

For multiple guardrails configurations:
configs/
├── customer_service/
│   ├── config.yml
│   ├── rails.co
│   └── actions.py
├── content_moderation/
│   ├── config.yml
│   └── rails.co
└── qa_bot/
    ├── config.yml
    └── rails.co
Each subdirectory represents a separate configuration accessible via its name.

Single Configuration

For a single configuration:
my_config/
├── config.yml
├── rails.co
├── actions.py
└── kb/
    └── documents.md
If the directory passed to --config contains a config.yml directly, the server runs in single-configuration mode.
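Every configuration needs at least a config.yml that declares the main model. A minimal sketch (the engine and model names below are placeholders; substitute your own provider and model):

```yaml
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo
```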

API Endpoints

Chat Completions

The primary endpoint for generating responses with guardrails.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": "customer_service",
    "messages": [
      {"role": "user", "content": "Hello! How can I reset my password?"}
    ]
  }'
Request Body:
  • config_id (string): The ID of the guardrails configuration to use. Corresponds to the subdirectory name.
  • messages (array, required): Array of message objects with role and content fields.
  • model (string): Optional model override. If specified, overrides the main model in the configuration.
  • stream (boolean, default: false): Enable streaming responses.
  • max_tokens (integer): Maximum number of tokens to generate.
  • temperature (number): Sampling temperature (0-2).
  • top_p (number): Nucleus sampling parameter.
  • stop (array): Stop sequences.
Response Format:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ]
}
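The same request can be issued from Python. The helper below only assembles the JSON body described above; build_chat_request is a name invented for this sketch, not part of any SDK:

```python
def build_chat_request(config_id, messages, **options):
    """Assemble a Chat Completions request body for the guardrails server.

    Optional fields such as stream, max_tokens, temperature, top_p, and stop
    are included only when explicitly provided.
    """
    body = {"config_id": config_id, "messages": messages}
    body.update({k: v for k, v in options.items() if v is not None})
    return body

payload = build_chat_request(
    "customer_service",
    [{"role": "user", "content": "Hello! How can I reset my password?"}],
    temperature=0.2,
)

# With the server running, send it with any HTTP client, e.g.:
# resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
# print(resp.json()["choices"][0]["message"]["content"])
```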

Streaming Responses

Enable streaming to receive responses token-by-token:
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        decoded = line.decode('utf-8')
        if decoded.startswith('data: '):
            chunk = decoded[6:]
            if chunk != '[DONE]':
                print(chunk, end='', flush=True)
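For reuse, the SSE handling above can be factored into a small generator. This assumes the event format shown in the loop: each event line is prefixed with `data: ` and the stream ends with `data: [DONE]` (parse_sse_lines is a name made up for this sketch):

```python
def parse_sse_lines(lines):
    """Yield the payload of each 'data: ' SSE line, stopping at [DONE].

    `lines` is an iterable of byte strings, such as response.iter_lines().
    """
    for line in lines:
        if not line:
            continue  # skip SSE keep-alive blank lines
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: "):
            continue
        chunk = decoded[len("data: "):]
        if chunk == "[DONE]":
            return
        yield chunk

# Canned lines standing in for a live stream:
canned = [b"data: Hello", b"", b"data:  world", b"data: [DONE]"]
tokens = list(parse_sse_lines(canned))
# tokens == ["Hello", " world"]
```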

List Configurations

Get all available guardrails configurations:
curl http://localhost:8000/v1/rails/configs
Response:
[
  {"id": "customer_service"},
  {"id": "content_moderation"},
  {"id": "qa_bot"}
]

List Models

Get available models from the configured provider:
curl http://localhost:8000/v1/models
Response:
{
  "data": [
    {"id": "gpt-3.5-turbo", "object": "model"},
    {"id": "gpt-4", "object": "model"}
  ]
}

Advanced Features

Context and State Management

Include context variables in your requests:
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "messages": [
            {"role": "user", "content": "What's my account status?"}
        ],
        "context": {
            "user_id": "12345",
            "account_type": "premium",
            "user_name": "Alice"
        }
    }
)
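Context variables sent this way become available to the configuration's flows. A hypothetical rails.co sketch in Colang 1.0 syntax (the flow and message names are illustrative, not part of any shipped configuration) that branches on the account_type variable sent above:

```
define user ask account status
  "What's my account status?"

define flow account status
  user ask account status
  if $account_type == "premium"
    bot inform premium status
  else
    bot inform standard status
```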

Thread Support

Maintain conversation threads across multiple requests:
import requests
import uuid

# Generate a unique thread ID (minimum 16 characters)
thread_id = str(uuid.uuid4())

# First message in thread
response1 = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": "My name is Alice"}]
    }
)

# Continue the thread
response2 = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": "What's my name?"}]
    }
)
# Response will include context from previous messages
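The thread bookkeeping can be wrapped in a small helper that also enforces the 16-character minimum mentioned above (threaded_payload is a name invented for this example):

```python
import uuid

def threaded_payload(thread_id, content, config_id="customer_service"):
    """Build a request body that continues a server-side conversation thread.

    The server requires thread IDs of at least 16 characters.
    """
    if len(thread_id) < 16:
        raise ValueError("thread_id must be at least 16 characters")
    return {
        "config_id": config_id,
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": content}],
    }

thread_id = str(uuid.uuid4())  # 36 characters, comfortably above the minimum
first = threaded_payload(thread_id, "My name is Alice")
follow_up = threaded_payload(thread_id, "What's my name?")
```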

Model Override

Override the configured model for specific requests:
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "model": "gpt-4",  # Override config model
        "messages": [{"role": "user", "content": "Complex question"}]
    }
)

Environment Variables

CORS Configuration

Enable Cross-Origin Resource Sharing:
export NEMO_GUARDRAILS_SERVER_ENABLE_CORS=true
export NEMO_GUARDRAILS_SERVER_ALLOWED_ORIGINS="http://localhost:3000,https://example.com"

nemoguardrails server --config=./configs

Model Configuration

Set the main model engine and base URL:
export MAIN_MODEL_ENGINE=openai
export MAIN_MODEL_BASE_URL=http://localhost:8080/v1

nemoguardrails server --config=./configs

Docker Deployment

Using Docker

# Build the image
docker build -t nemoguardrails .

# Run the server
docker run -p 8000:8000 \
  -v $(pwd)/configs:/configs \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  nemoguardrails \
  nemoguardrails server --config=/configs

Docker Compose

docker-compose.yml
version: '3.8'

services:
  guardrails:
    image: nemoguardrails
    ports:
      - "8000:8000"
    volumes:
      - ./configs:/configs
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    command: nemoguardrails server --config=/configs --verbose
Run with:
docker-compose up

Chat UI

The built-in Chat UI is available at http://localhost:8000 when the server is running. Features:
  • Interactive chat interface
  • Configuration selection
  • Message history
  • Real-time streaming
Disable the UI:
nemoguardrails server --config=./configs --disable-chat-ui
When disabled, the root endpoint returns:
{"status": "ok"}

Health Checks and Monitoring

Health Check

Check if the server is running:
curl http://localhost:8000/v1/rails/configs
A successful response indicates the server is healthy.

Logging

Enable verbose logging to monitor requests:
nemoguardrails server --config=./configs --verbose
Logs will include:
  • Request details
  • Configuration loading
  • LLM calls (if verbose)
  • Rail activations
  • Error traces

Integration Examples

OpenAI SDK

Use the OpenAI Python SDK with the guardrails server:
from openai import OpenAI

# Point to the guardrails server
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key handled by guardrails config
)

response = client.chat.completions.create(
    model="customer_service",  # This is the config_id
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

LangChain

Integrate with LangChain:
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    model="customer_service",  # config_id
    api_key="not-needed"
)

response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)

Production Deployment

For production deployments:
1. Use a process manager: Use systemd, supervisord, or PM2 to manage the server process.
2. Enable auto-reload: Use --auto-reload to reload configurations automatically without restarting the server.
3. Set up a reverse proxy: Use Nginx or Apache as a reverse proxy for SSL/TLS termination and load balancing.
4. Configure CORS: Set appropriate CORS headers for your frontend applications.
5. Monitor logs: Set up log aggregation and monitoring with tools like the ELK stack or Datadog.
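For the reverse proxy step, a minimal Nginx sketch (the hostname and certificate paths are placeholders; proxy buffering is disabled so token-by-token streaming responses are not held back by the proxy):

```nginx
server {
    listen 443 ssl;
    server_name guardrails.example.com;

    ssl_certificate     /etc/ssl/certs/guardrails.pem;
    ssl_certificate_key /etc/ssl/private/guardrails.key;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_buffering off;  # keep SSE streaming responsive
    }
}
```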

Troubleshooting

Configuration Not Loading

Ensure your configuration directory has valid config.yml files:
# Check structure
ls -la configs/

# Validate YAML
python -c "import yaml; yaml.safe_load(open('configs/my_config/config.yml'))"

Port Already in Use

Change the port:
nemoguardrails server --config=./configs --port=8080

Model Connection Issues

Verify environment variables:
echo $OPENAI_API_KEY
echo $MAIN_MODEL_ENGINE
echo $MAIN_MODEL_BASE_URL

Next Steps

  • Python API: Use guardrails programmatically in your code
  • CLI Tools: Interactive chat and testing tools
  • Configuration: Configure your guardrails
  • Docker Guide: Deploy with Docker
