The NeMo Guardrails server provides a REST API for adding guardrails to your LLM applications. It’s compatible with OpenAI’s Chat Completions API, making integration straightforward.
Starting the Server
Basic Usage
Start the server pointing to a directory containing guardrails configurations:
nemoguardrails server --config=/path/to/configs
The server will:
Start on port 8000 by default
Load all valid configurations from subdirectories
Expose the Chat UI at http://localhost:8000
Expose the API at http://localhost:8000/v1/
Command Options
Basic:
nemoguardrails server --config=./configs
Custom Port:
nemoguardrails server --config=./configs --port=9000
Verbose Mode:
nemoguardrails server --config=./configs --verbose
No Chat UI:
nemoguardrails server --config=./configs --disable-chat-ui
Auto-Reload:
nemoguardrails server --config=./configs --auto-reload
With Prefix:
nemoguardrails server --config=./configs --prefix=/guardrails
Available Options:
--config: Path to a directory containing multiple configuration sub-folders, or a single configuration directory.
--port: The port that the server should listen on. Defaults to 8000.
--default-config-id: The default configuration to use when no config is specified in requests.
--verbose: Enable verbose logging, including prompts and completions.
--disable-chat-ui: Disable the web-based Chat UI.
--auto-reload: Enable automatic reloading of configurations when files change.
--prefix: A prefix to add to all server paths (must start with '/').
Configuration Directory Structure
Multiple Configurations
For multiple guardrails configurations:
configs/
├── customer_service/
│   ├── config.yml
│   ├── rails.co
│   └── actions.py
├── content_moderation/
│   ├── config.yml
│   └── rails.co
└── qa_bot/
    ├── config.yml
    └── rails.co
Each subdirectory represents a separate configuration accessible via its name.
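The config IDs the server exposes map directly to these subdirectory names. As a local sanity check, a sketch like the following can list which IDs a directory would yield (the discover_config_ids helper is illustrative, not part of NeMo Guardrails):

```python
import os
import tempfile

def discover_config_ids(configs_dir):
    """Return the names of subdirectories that contain a config.yml,
    i.e. the config_id values the server would expose."""
    ids = []
    for name in sorted(os.listdir(configs_dir)):
        sub = os.path.join(configs_dir, name)
        if os.path.isdir(sub) and os.path.isfile(os.path.join(sub, "config.yml")):
            ids.append(name)
    return ids

# Build a throwaway directory tree mirroring the example above
with tempfile.TemporaryDirectory() as root:
    for cfg in ("customer_service", "content_moderation", "qa_bot"):
        os.makedirs(os.path.join(root, cfg))
        open(os.path.join(root, cfg, "config.yml"), "w").close()
    print(discover_config_ids(root))
    # → ['content_moderation', 'customer_service', 'qa_bot']
```

A directory without a config.yml is skipped, which matches the server's behavior of loading only valid configurations.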
Single Configuration
For a single configuration:
my_config/
├── config.yml
├── rails.co
├── actions.py
└── kb/
    └── documents.md
When the config path points directly to a directory containing a config.yml, the server runs in single-configuration mode.
API Endpoints
Chat Completions
The primary endpoint for generating responses with guardrails.
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": "customer_service",
    "messages": [
      {"role": "user", "content": "Hello! How can I reset my password?"}
    ]
  }'
Request Body:
config_id: The ID of the guardrails configuration to use. Corresponds to the subdirectory name.
messages: Array of message objects with role and content fields.
model: Optional model override. If specified, overrides the main model in the configuration.
stream: Enable streaming responses.
max_tokens: Maximum number of tokens to generate.
temperature: Sampling temperature (0-2).
top_p: Nucleus sampling parameter.
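Assembled in Python, a request body using these fields might look like the sketch below (the build_request helper is illustrative, not part of any SDK):

```python
def build_request(config_id, messages, model=None, stream=False,
                  max_tokens=None, temperature=None, top_p=None):
    """Assemble a /v1/chat/completions request body, omitting unset
    optional fields so the server falls back to its configured defaults."""
    body = {"config_id": config_id, "messages": messages, "stream": stream}
    for key, value in (("model", model), ("max_tokens", max_tokens),
                       ("temperature", temperature), ("top_p", top_p)):
        if value is not None:
            body[key] = value
    return body

payload = build_request(
    "customer_service",
    [{"role": "user", "content": "Hello! How can I reset my password?"}],
    temperature=0.2,
)
print(payload["config_id"])  # → customer_service
```

Leaving optional fields out of the body, rather than sending nulls, lets the configuration's own defaults apply.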
Response Format:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ]
}
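Pulling the assistant text out of that response shape is worth a small helper (illustrative, not part of any SDK; the sample string mirrors the response above):

```python
import json

sample = '''{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
     "finish_reason": "stop"}
  ]
}'''

def first_message_content(response_json):
    """Extract the assistant message text from a chat.completion response."""
    return json.loads(response_json)["choices"][0]["message"]["content"]

print(first_message_content(sample))  # → Hello! How can I help you today?
```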
Streaming Responses
Enable streaming to receive responses token-by-token:
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        decoded = line.decode('utf-8')
        if decoded.startswith('data: '):
            chunk = decoded[6:]
            if chunk != '[DONE]':
                print(chunk, end='', flush=True)
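The chunks arrive as Server-Sent Events: each payload line is prefixed with "data: " and the stream ends with "[DONE]". That framing can be exercised without a live server, as in this sketch (the sample lines are illustrative):

```python
def parse_sse_chunks(lines):
    """Extract the payloads from 'data: ...' lines, stopping at [DONE]."""
    chunks = []
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunks.append(payload)
    return chunks

sample = ["data: Once", "data:  upon", "data:  a time", "data: [DONE]"]
print("".join(parse_sse_chunks(sample)))  # → Once upon a time
```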
List Configurations
Get all available guardrails configurations:
curl http://localhost:8000/v1/rails/configs
Response:
[
  { "id": "customer_service" },
  { "id": "content_moderation" },
  { "id": "qa_bot" }
]
List Models
Get available models from the configured provider:
curl http://localhost:8000/v1/models
Response:
{
  "data": [
    { "id": "gpt-3.5-turbo", "object": "model" },
    { "id": "gpt-4", "object": "model" }
  ]
}
Advanced Features
Context and State Management
Include context variables in your requests:
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "messages": [
            {"role": "user", "content": "What's my account status?"}
        ],
        "context": {
            "user_id": "12345",
            "account_type": "premium",
            "user_name": "Alice"
        }
    }
)
Thread Support
Maintain conversation threads across multiple requests:
import requests
import uuid

# Generate a unique thread ID (minimum 16 characters)
thread_id = str(uuid.uuid4())

# First message in thread
response1 = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": "My name is Alice"}]
    }
)

# Continue the thread
response2 = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "thread_id": thread_id,
        "messages": [{"role": "user", "content": "What's my name?"}]
    }
)
# Response will include context from previous messages
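Since thread IDs must be at least 16 characters, a small guard (hypothetical helper, not part of NeMo Guardrails) can validate IDs before sending them; a stringified uuid4 is 36 characters, comfortably above the limit:

```python
import uuid

def make_thread_id():
    """Generate a thread ID that satisfies the 16-character minimum."""
    thread_id = str(uuid.uuid4())
    if len(thread_id) < 16:
        raise ValueError("thread_id must be at least 16 characters")
    return thread_id

tid = make_thread_id()
print(len(tid))  # → 36
```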
Model Override
Override the configured model for specific requests:
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "config_id": "customer_service",
        "model": "gpt-4",  # Override config model
        "messages": [{"role": "user", "content": "Complex question"}]
    }
)
Environment Variables
CORS Configuration
Enable Cross-Origin Resource Sharing:
export NEMO_GUARDRAILS_SERVER_ENABLE_CORS=true
export NEMO_GUARDRAILS_SERVER_ALLOWED_ORIGINS="http://localhost:3000,https://example.com"
nemoguardrails server --config=./configs
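The allowed-origins value is a comma-separated string. A sketch of how such a value can be split into a clean list (this parsing is illustrative, not the server's internal code):

```python
import os

def parse_allowed_origins(raw):
    """Split a comma-separated origins string into a clean list,
    dropping surrounding whitespace and empty entries."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]

os.environ["NEMO_GUARDRAILS_SERVER_ALLOWED_ORIGINS"] = (
    "http://localhost:3000,https://example.com"
)
origins = parse_allowed_origins(os.environ["NEMO_GUARDRAILS_SERVER_ALLOWED_ORIGINS"])
print(origins)  # → ['http://localhost:3000', 'https://example.com']
```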
Model Configuration
Set the main model engine and base URL:
export MAIN_MODEL_ENGINE=openai
export MAIN_MODEL_BASE_URL=http://localhost:8080/v1
nemoguardrails server --config=./configs
Docker Deployment
Using Docker
# Build the image
docker build -t nemoguardrails .
# Run the server
docker run -p 8000:8000 \
  -v $(pwd)/configs:/configs \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  nemoguardrails \
  nemoguardrails server --config=/configs
Docker Compose
version: '3.8'
services:
  guardrails:
    image: nemoguardrails
    ports:
      - "8000:8000"
    volumes:
      - ./configs:/configs
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    command: nemoguardrails server --config=/configs --verbose
Run with:
docker compose up
Chat UI
The built-in Chat UI is available at http://localhost:8000 when the server is running.
Features:
Interactive chat interface
Configuration selection
Message history
Real-time streaming
Disable the UI:
nemoguardrails server --config=./configs --disable-chat-ui
When disabled, the root endpoint no longer serves the Chat UI.
Health Checks and Monitoring
Health Check
Check if the server is running:
curl http://localhost:8000/v1/rails/configs
A successful response indicates the server is healthy.
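The same check can be scripted. In this sketch the fetcher is injected so the logic stays testable without a running server (a hypothetical wrapper, not an official client):

```python
import json

def check_health(fetch, url="http://localhost:8000/v1/rails/configs"):
    """Return True when the configs endpoint answers with a JSON list."""
    try:
        return isinstance(json.loads(fetch(url)), list)
    except Exception:
        return False

# Simulate a healthy server response
print(check_health(lambda url: '[{"id": "customer_service"}]'))  # → True

# Simulate an unreachable server
def down(url):
    raise ConnectionError("server not running")

print(check_health(down))  # → False
```

In a real deployment the fetch argument would be a thin wrapper around an HTTP GET with a short timeout.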
Logging
Enable verbose logging to monitor requests:
nemoguardrails server --config=./configs --verbose
Logs will include:
Request details
Configuration loading
LLM calls (if verbose)
Rail activations
Error traces
Integration Examples
OpenAI SDK
Use the OpenAI Python SDK with the guardrails server:
from openai import OpenAI

# Point to the guardrails server
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key handled by guardrails config
)

response = client.chat.completions.create(
    model="customer_service",  # This is the config_id
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
LangChain
Integrate with LangChain:
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    model="customer_service",  # config_id
    api_key="not-needed"
)

response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
Production Deployment
For production deployments:
Use a process manager
Use systemd, supervisord, or PM2 to manage the server process.
Enable auto-reload
Use --auto-reload to automatically reload configurations without server restart.
Set up reverse proxy
Use Nginx or Apache as a reverse proxy for SSL/TLS termination and load balancing.
Configure CORS
Set appropriate CORS headers for your frontend applications.
Monitor logs
Set up log aggregation and monitoring with tools like ELK stack or Datadog.
Troubleshooting
Configuration Not Loading
Ensure your configuration directory has valid config.yml files:
# Check structure
ls -la configs/
# Validate YAML
python -c "import yaml; yaml.safe_load(open('configs/my_config/config.yml'))"
Port Already in Use
Change the port:
nemoguardrails server --config=./configs --port=8080
Model Connection Issues
Verify environment variables:
echo $OPENAI_API_KEY
echo $MAIN_MODEL_ENGINE
echo $MAIN_MODEL_BASE_URL
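A quick scripted version of the same check (the variable names match the ones above; the helper itself is illustrative):

```python
import os

def missing_vars(names):
    """Return the subset of environment variable names that are unset or empty."""
    return [name for name in names if not os.environ.get(name)]

os.environ["MAIN_MODEL_ENGINE"] = "openai"
os.environ.pop("MAIN_MODEL_BASE_URL", None)
print(missing_vars(["MAIN_MODEL_ENGINE", "MAIN_MODEL_BASE_URL"]))
# → ['MAIN_MODEL_BASE_URL']
```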
Next Steps
Python API Use guardrails programmatically in your code
CLI Tools Interactive chat and testing tools
Configuration Configure your guardrails
Docker Guide Deploy with Docker