Core Concepts Overview
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational applications. Guardrails (or “rails” for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more.

What Are Programmable Guardrails?
Programmable guardrails sit between your application code and the LLM, providing a flexible layer of control over how the LLM behaves. Rather than relying solely on prompts or post-processing, guardrails enable you to define explicit rules and flows that govern the conversation.
Key Benefits
Programmable guardrails provide several critical advantages:

Build Trustworthy Applications
Define rails to guide and safeguard conversations: constrain the behavior of your LLM-based application to specific topics and prevent it from engaging in discussions on unwanted ones.
Connect Services Securely
Connect an LLM to other services (tools) seamlessly and securely. Validate tool inputs and outputs with execution rails.
Controllable Dialog
Steer the LLM to follow pre-defined conversational paths, allowing you to design the interaction following conversation design best practices and enforce standard operating procedures.
Multi-Stage Protection
Apply different types of guardrails at five distinct stages: input, retrieval, dialog, execution, and output.
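The staged design above can be sketched as a simple pipeline. This is an illustrative stand-in, not the real NeMo Guardrails runtime: the stage names come from the text, while `apply_rails` and the per-stage callables are hypothetical.

```python
def apply_rails(message, stages):
    """Run a message through guardrail stages in order.

    `stages` maps a stage name to a callable; each callable may
    transform the message or raise an exception to block it.
    Stages without a callable pass the message through unchanged.
    """
    for name in ("input", "retrieval", "dialog", "execution", "output"):
        message = stages.get(name, lambda m: m)(message)
    return message

# Example: an input rail that normalizes whitespace and an output
# rail that adjusts capitalization.
result = apply_rails("  hello  ", {"input": str.strip, "output": str.capitalize})
# → "Hello"
```

The point of the sketch is only that each stage sees the result of the previous one, so a rail at any stage can rewrite or reject what flows through it.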
Core Framework Components
The NeMo Guardrails framework consists of several key components that work together.

RailsConfig

The RailsConfig class defines the complete configuration for your guardrails, including:
- LLM Models: Specify which language models to use (main, embeddings, etc.)
- Rails: Configure which guardrails are active and how they operate
- Colang Definitions: Load dialog flows and message definitions from .co files
- Custom Actions: Register Python functions as callable actions
- Instructions: Provide context and guidelines to the LLM
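A custom action is an ordinary Python function (sync or async) that the runtime can invoke from a flow. The action below is a hedged sketch: the function body and context keys are invented for illustration, and in a real application it would be registered with the rails instance (e.g. via `register_action`) rather than called directly.

```python
import asyncio

async def check_blocked_terms(context: dict) -> bool:
    """Illustrative custom action: flag bot messages containing blocked terms.

    In a real configuration this function would be registered as an
    action and invoked from a Colang flow; here it is called directly.
    """
    bot_message = context.get("bot_message", "")
    blocked = ("proprietary", "confidential")
    return any(term in bot_message.lower() for term in blocked)

flagged = asyncio.run(check_blocked_terms({"bot_message": "That is confidential."}))
# flagged is True
```

Because the core runtime is async (see below), writing actions as `async def` lets them await I/O without blocking other conversations.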
LLMRails
The LLMRails class is the main entry point for using guardrails. It wraps your LLM with the configured guardrails. The generate method uses the same message format as the OpenAI Chat Completions API, making it easy to integrate with existing applications.

Event-Driven Runtime
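The documented entry point is `from nemoguardrails import RailsConfig, LLMRails`, with a config loaded via `RailsConfig.from_path` and calls made through `rails.generate(messages=...)`. Since that requires a configured LLM, the runnable stub below only illustrates the Chat Completions message shape; its echo logic is invented for the example.

```python
# Real usage (sketch, requires the nemoguardrails package and an LLM):
#   from nemoguardrails import RailsConfig, LLMRails
#   config = RailsConfig.from_path("./config")
#   rails = LLMRails(config)
#   response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
#
# Stub standing in for rails.generate to show the message format:
def generate(messages):
    # A real call would run input, dialog, and output rails around the
    # underlying LLM; this stub just echoes the last user turn.
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    return {"role": "assistant", "content": f"You said: {last_user['content']}"}

reply = generate([{"role": "user", "content": "Hello!"}])
# reply → {"role": "assistant", "content": "You said: Hello!"}
```

Both the input and the return value are plain `{"role": ..., "content": ...}` dictionaries, which is what makes drop-in integration with existing Chat Completions code straightforward.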
NeMo Guardrails uses an event-driven runtime to process conversations. Every interaction generates events that flow through the system:

- User utterance → UtteranceUserActionFinished event
- Canonical form generation → UserIntent event
- Next step decision → BotIntent or action events
- Bot response generation → StartUtteranceBotAction event
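The event chain above can be traced with a minimal sketch. The event names match the text; the dispatch table and `run_turn` helper are illustrative only, not the real runtime (which can also branch into action events at the next-step decision).

```python
# Each processing step consumes one event and emits the next.
PIPELINE = {
    "UtteranceUserActionFinished": "UserIntent",    # canonical form generation
    "UserIntent": "BotIntent",                      # next step decision
    "BotIntent": "StartUtteranceBotAction",         # bot response generation
}

def run_turn(event: str) -> list[str]:
    """Follow one user turn through the event chain and record the trace."""
    trace = [event]
    while event in PIPELINE:
        event = PIPELINE[event]
        trace.append(event)
    return trace

run_turn("UtteranceUserActionFinished")
# → ['UtteranceUserActionFinished', 'UserIntent', 'BotIntent',
#    'StartUtteranceBotAction']
```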
Async-First Architecture
NeMo Guardrails is built with an async-first design. The core mechanics are implemented using Python’s async model, providing several advantages:

Better Concurrency
Multiple users can be served concurrently without blocking. When one request waits for an LLM response, others can continue processing.
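The concurrency benefit comes directly from Python's asyncio event loop: while one coroutine awaits an LLM response, the loop runs the others. A minimal sketch, with `handle_request` and the sleep standing in for a real LLM call:

```python
import asyncio

async def handle_request(user: str) -> str:
    # Simulate waiting on an LLM response; while this request awaits,
    # the event loop is free to serve the other requests.
    await asyncio.sleep(0.01)
    return f"reply for {user}"

async def main() -> list[str]:
    # Three "users" served concurrently on a single event loop.
    return await asyncio.gather(*(handle_request(u) for u in ("a", "b", "c")))

replies = asyncio.run(main())
# → ['reply for a', 'reply for b', 'reply for c']
```

All three simulated requests overlap their waits, so the total time is roughly one sleep rather than three.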
Dual API Support
Both synchronous and asynchronous versions of methods are available:
- Sync: rails.generate(messages)
- Async: await rails.generate_async(messages)
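A common way to offer such a dual API over an async-first core is a thin synchronous facade that drives the async implementation. The sketch below uses hypothetical stand-ins, not the library's actual internals:

```python
import asyncio

async def generate_async(messages):
    # Async core: in an async-first design, the real work lives here.
    await asyncio.sleep(0)  # stand-in for awaiting the LLM
    return {"role": "assistant", "content": "ok"}

def generate(messages):
    # Sync facade: one standard pattern is to drive the async core with
    # asyncio.run() when no event loop is already running in this thread.
    return asyncio.run(generate_async(messages))

generate([{"role": "user", "content": "hi"}])
```

Async callers await the core directly; sync callers get the same behavior without touching the event loop themselves.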
Efficient Resource Usage
Actions and LLM calls run asynchronously, making better use of system resources during I/O operations.
Configuration Structure
A typical guardrails configuration follows this structure:

Sample config.yml
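The original sample did not survive extraction; the fragment below is an illustrative reconstruction based on the documented layout. The engine, model name, and rail flow names are placeholders — check the NeMo Guardrails documentation for values valid in your setup.

```yaml
models:
  - type: main
    engine: openai          # placeholder engine
    model: gpt-4o           # placeholder model name

instructions:
  - type: general
    content: |
      You are a helpful assistant. Do not discuss politics.

rails:
  input:
    flows:
      - self check input    # an input rail
  output:
    flows:
      - self check output   # an output rail
```

The `models` section selects the LLMs, `instructions` supplies context to the model, and the `rails` section activates flows at each stage — mirroring the RailsConfig fields described above.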
Use Cases
You can use programmable guardrails in different types of applications:

- Question Answering: Enforce fact-checking and output moderation over a set of documents (RAG).
- Domain Assistants
- LLM Endpoints
Next Steps

- Guardrail Types: Learn about the five types of rails and when to use them.
- Colang DSL: Understand the Colang language for defining flows and rails.
- Architecture: Deep dive into the runtime and processing pipeline.
- Get Started: Start building your first guardrails configuration.