Server Modes

The NeMo Guardrails server supports two modes:

Multi-Config Mode

In multi-config mode, the server can serve multiple guardrails configurations:
nemoguardrails server --config=/path/to/configs
Directory structure:
configs/
├── config1/
│   ├── config.yml
│   └── rails.co
├── config2/
│   ├── config.yml
│   └── rails.co
└── config3/
    ├── config.yml
    └── rails.co
Clients specify which config to use:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[...],
    extra_body={
        "guardrails": {
            "config_id": "config1"
        }
    }
)

Single-Config Mode

In single-config mode, the server serves a single configuration:
nemoguardrails server --config=/path/to/my-config
Directory structure:
my-config/
├── config.yml
├── rails.co
└── actions.py
Clients don’t need to specify a config_id.
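Because only one configuration is loaded, the request body can omit the guardrails object entirely. A minimal sketch of the wire-level payload (the endpoint and model name mirror the examples above; the surrounding HTTP call is left out):

```python
import json

# In single-config mode the request body needs no "guardrails.config_id":
# the server applies its single loaded configuration to every request.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
}

# POST this body to http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```

The same payload sent in multi-config mode would fall back to the server's default config, if one was set with `--default-config-id`.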

Server Options

Command-Line Options

nemoguardrails server [OPTIONS]
--port (integer, default: 8000)
  The port that the server should listen on.
--config (path)
  Path to a directory containing configuration sub-folders (multi-config mode) or a single configuration directory (single-config mode).
--default-config-id (string)
  The default configuration to use when no config is specified in requests.
--verbose (boolean, default: false)
  Enable verbose logging, including prompts and LLM calls.
--disable-chat-ui (boolean, default: false)
  Disable the built-in chat UI.
--auto-reload (boolean, default: false)
  Enable automatic reloading when configuration files change.
--prefix (string, default: "")
  A prefix that should be added to all server paths (must start with '/').
nemoguardrails server --config=./configs --port=8000

Environment Variables

CORS Configuration

NEMO_GUARDRAILS_SERVER_ENABLE_CORS (string, default: "false")
  Enable Cross-Origin Resource Sharing (CORS).
NEMO_GUARDRAILS_SERVER_ALLOWED_ORIGINS (string, default: "*")
  Comma-separated list of allowed origins. Use "*" to allow all origins.
export NEMO_GUARDRAILS_SERVER_ENABLE_CORS=true
export NEMO_GUARDRAILS_SERVER_ALLOWED_ORIGINS="http://localhost:3000,https://myapp.com"
nemoguardrails server --config=./configs

Model Configuration

MAIN_MODEL_ENGINE (string, default: "openai")
  The default LLM provider to use when a model is specified in the request.
MAIN_MODEL_BASE_URL (string)
  Base URL for the LLM provider API.
DEFAULT_CONFIG_ID (string)
  Default configuration ID to use when requests do not specify one.
export MAIN_MODEL_ENGINE=nvidia_ai_endpoints
export MAIN_MODEL_BASE_URL=https://integrate.api.nvidia.com/v1
nemoguardrails server --config=./configs

Server Configuration File

You can create a config.py file in your configs directory to customize server behavior:
config.py
from fastapi import FastAPI
from nemoguardrails.server.datastore import MemoryStore, register_datastore

def init(app: FastAPI):
    """Initialize server with custom configuration."""
    
    # Register a custom datastore for threads
    datastore = MemoryStore()
    register_datastore(datastore)
    
    # Register custom loggers
    def custom_logger(data: dict):
        print(f"Request logged: {data['endpoint']}")
    
    from nemoguardrails.server.api import register_logger
    register_logger(custom_logger)
    
    # Set default config
    from nemoguardrails.server.api import set_default_config_id
    set_default_config_id("my-default-config")

API Endpoints

GET /v1/rails/configs

List available guardrails configurations.
curl http://localhost:8000/v1/rails/configs

GET /v1/models

List available LLM models from the configured provider.
curl http://localhost:8000/v1/models

POST /v1/chat/completions

See Chat Completions API for details.
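As a quick sketch of the raw HTTP request (field names taken from the client examples on this page; see the linked reference for the full schema), note that the OpenAI client's `extra_body` fields are merged into the top-level JSON object:

```python
import json
import urllib.request

# Request body as sent over the wire: "guardrails" sits at the top level
# of the JSON object, alongside "model" and "messages".
body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "guardrails": {"config_id": "config1"},
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # requires a running server
```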

Thread Management

The server supports conversation threads for maintaining state across requests.

Using Threads

import uuid
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

# Generate a unique thread ID (minimum 16 characters)
thread_id = str(uuid.uuid4())

# First message in thread
response1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "My name is Alice"}],
    extra_body={
        "guardrails": {
            "config_id": "my-config",
            "thread_id": thread_id
        }
    }
)

# Continue thread
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is my name?"}],
    extra_body={
        "guardrails": {
            "config_id": "my-config",
            "thread_id": thread_id
        }
    }
)

print(response2.choices[0].message.content)  # "Your name is Alice"
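The 16-character minimum noted in the example above is comfortably met by a uuid4 string, which is 36 characters long. A small illustrative sanity check before sending:

```python
import uuid

thread_id = str(uuid.uuid4())

# The server expects thread IDs of at least 16 characters;
# a canonical UUID4 string ("xxxxxxxx-xxxx-...") is 36 characters.
assert len(thread_id) >= 16
```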

Custom Datastore

By default, threads are stored in memory. You can configure a custom datastore:
config.py
from nemoguardrails.server.datastore import RedisStore, register_datastore

def init(app):
    # Use Redis for persistent thread storage
    datastore = RedisStore(
        host="localhost",
        port=6379,
        db=0
    )
    register_datastore(datastore)

Auto-Reload

Enable auto-reload to automatically reload configurations when files change:
nemoguardrails server --config=./configs --auto-reload
Requires the watchdog package:
pip install watchdog
When enabled, the server monitors configuration files and reloads them automatically when changes are detected.

Chat UI

The server includes a built-in chat UI accessible at http://localhost:8000. To disable the chat UI:
nemoguardrails server --config=./configs --disable-chat-ui

Production Deployment

Using Gunicorn

pip install gunicorn
gunicorn nemoguardrails.server.api:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000

Using Docker

Dockerfile
FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY configs/ /app/configs/

EXPOSE 8000

CMD ["nemoguardrails", "server", "--config=/app/configs", "--port=8000"]
docker build -t guardrails-server .
docker run -p 8000:8000 guardrails-server

Environment Variables

export OPENAI_API_KEY=sk-...
export MAIN_MODEL_ENGINE=openai
export NEMO_GUARDRAILS_SERVER_ENABLE_CORS=true
nemoguardrails server --config=./configs