# Output Rails
Output rails execute after the LLM generates a response but before it’s delivered to the user. They validate, filter, and post-process bot messages to prevent hallucinations, policy violations, and sensitive data leaks.
## When Output Rails Execute

Output rails run immediately after the LLM generates a response:

```
LLM Response → Output Rails → User Delivery
                     ↓
            Block/Allow/Modify
```

Blocked outputs trigger fallback responses. The user never sees the original unsafe content.
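Conceptually, each output rail is a check that can allow, block, or modify the bot message before delivery. The following is a minimal illustrative sketch of that pipeline; the names (`OutputRailResult`, `apply_output_rails`, `no_secrets`) are hypothetical and not the NeMo Guardrails API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OutputRailResult:
    allowed: bool
    message: str  # possibly modified bot message


def apply_output_rails(
    bot_message: str,
    rails: List[Callable[[str], OutputRailResult]],
    fallback: str = "I'm sorry, I can't share that response.",
) -> str:
    """Run each rail in order; the first blocking rail wins."""
    current = bot_message
    for rail in rails:
        result = rail(current)
        if not result.allowed:
            return fallback  # the user never sees the unsafe content
        current = result.message  # rails may modify the message
    return current


# Example rail: block messages containing a banned phrase.
def no_secrets(msg: str) -> OutputRailResult:
    return OutputRailResult(allowed="password" not in msg.lower(), message=msg)


assert apply_output_rails("Hello there!", [no_secrets]) == "Hello there!"
assert apply_output_rails(
    "The password is hunter2", [no_secrets]
) == "I'm sorry, I can't share that response."
```

The key design point carries over to the real framework: a blocked output is replaced by a fallback response rather than being surfaced to the user.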
## Built-in Output Rails

### Content Safety Check Output

Validates bot responses against content policies.

#### Configuration

```yaml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety

rails:
  output:
    flows:
      - content safety check output $model=content_safety
```
#### Action Implementation

From `nemoguardrails/library/content_safety/actions.py:143`:

```python
@action(output_mapping=content_safety_check_output_mapping)
async def content_safety_check_output(
    llms: Dict[str, BaseLLM],
    llm_task_manager: LLMTaskManager,
    model_name: Optional[str] = None,
    context: Optional[dict] = None,
    model_caches: Optional[Dict[str, CacheInterface]] = None,
    **kwargs,
) -> dict:
    _MAX_TOKENS = 3
    user_input: str = ""
    bot_response: str = ""

    if context is not None:
        user_input = context.get("user_message", "")
        bot_response = context.get("bot_message", "")
        model_name = model_name or context.get("model", None)

    # ... safety validation
    return {"allowed": is_safe, "policy_violations": violated_policies}
```

The output mapping:

```python
def content_safety_check_output_mapping(result: dict) -> bool:
    """Returns True if content should be blocked (allowed is False)."""
    allowed = result.get("allowed", True)
    return not allowed
```
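The mapping's behavior is worth pinning down, since it decides whether the rail blocks. A quick self-contained check of the function shown above:

```python
def content_safety_check_output_mapping(result: dict) -> bool:
    """Returns True if content should be blocked (allowed is False)."""
    allowed = result.get("allowed", True)
    return not allowed


# A result with allowed=False means the rail blocks the response.
assert content_safety_check_output_mapping(
    {"allowed": False, "policy_violations": ["S1"]}
) is True

# A missing "allowed" key defaults to True, i.e. the response is not blocked.
assert content_safety_check_output_mapping({}) is False
```

Note the fail-open default: if the action's result dict omits `allowed`, the response passes through.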
### Self Check Output

Uses the main LLM to validate its own responses.

#### Configuration

```yaml
rails:
  output:
    flows:
      - self check output
```

#### Action Implementation

From `nemoguardrails/library/self_check/output_check/actions.py:32`:

```python
@action(is_system_action=True, output_mapping=lambda value: not value)
async def self_check_output(
    llm_task_manager: LLMTaskManager,
    context: Optional[dict] = None,
    llm: Optional[BaseLLM] = None,
    config: Optional[RailsConfig] = None,
    **kwargs,
):
    """Checks the output from the bot.

    Prompts the LLM, using the `self_check_output` task prompt, to determine
    whether the output from the bot should be allowed.

    The LLM call should return "yes" if the output is bad and should be blocked
    (this is consistent with the self_check_input prompt).

    Returns:
        True if the output should be allowed, False otherwise.
    """
    _MAX_TOKENS = 3
    bot_response = context.get("bot_message")
    user_input = context.get("user_message")
    bot_thinking = context.get("bot_thinking")

    task = Task.SELF_CHECK_OUTPUT

    if bot_response:
        prompt = llm_task_manager.render_task_prompt(
            task=task,
            context={
                "user_input": user_input,
                "bot_response": bot_response,
                "bot_thinking": bot_thinking,
            },
        )
        # ... LLM validation
```

The output mapping `lambda value: not value` inverts the result: `self_check_output` returns `True` when the response is safe, but rails expect `True` to mean "block".
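The inversion is easy to get backwards, so here is the convention spelled out as runnable assertions (the `output_mapping` name is just a local variable for illustration):

```python
# self_check_output returns True when the response is safe/allowed;
# the output mapping inverts that so True means "block this response".
output_mapping = lambda value: not value

action_says_safe = True
assert output_mapping(action_says_safe) is False   # do not block

action_says_unsafe = False
assert output_mapping(action_says_unsafe) is True  # block
```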
### Hallucination Detection

Detects hallucinations by checking self-consistency across multiple completions.

#### Configuration

```yaml
rails:
  output:
    flows:
      - self check hallucination
```

#### How It Works

From `nemoguardrails/library/hallucination/actions.py:40`:

```python
@action(output_mapping=lambda value: value)
async def self_check_hallucination(
    llm: BaseLLM,
    llm_task_manager: LLMTaskManager,
    context: Optional[dict] = None,
    use_llm_checking: bool = True,
    config: Optional[RailsConfig] = None,
    **kwargs,
):
    """Checks if the last bot response is a hallucination by checking
    multiple completions for self-consistency.

    :return: True if hallucination is detected, False otherwise.
    """
    bot_response = context.get("bot_message")
    last_bot_prompt_string = context.get("_last_bot_prompt")

    if bot_response and last_bot_prompt_string:
        num_responses = HALLUCINATION_NUM_EXTRA_RESPONSES  # 2 extra responses

        # Generate multiple responses with temperature 1.0
        llm_with_config = llm.bind(temperature=1.0, n=num_responses)
        extra_llm_response = await llm_with_config.agenerate(
            [formatted_prompt],
            callbacks=logging_callback_manager_for_chain.handlers,
        )
        extra_llm_completions = extra_llm_response.generations[0]

        # Extract responses
        extra_responses = []
        for i in range(num_responses):
            result = extra_llm_completions[i].text
            result = get_multiline_response(result)
            result = strip_quotes(result)
            extra_responses.append(result)

        # Check agreement using LLM
        if use_llm_checking:
            prompt = llm_task_manager.render_task_prompt(
                task=Task.SELF_CHECK_HALLUCINATION,
                context={
                    "statement": bot_response,
                    "paragraph": ". ".join(extra_responses),
                },
            )
            agreement = await llm_call(llm, prompt, stop=stop)
            return "no" in agreement.lower().strip()
```

Hallucination detection works best with OpenAI models that support the `n` parameter for multiple completions; other providers may not support this feature.

- Generates 2 extra completions per response
- Adds significant latency (~3x normal response time)
- Best for high-stakes applications where accuracy is critical
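The core idea, agreement across resampled completions, can be approximated without an LLM judge using simple text similarity. This is an illustrative stand-in only: the threshold, the `difflib` metric, and the function name are assumptions, not what the library does (it delegates the agreement judgment to the LLM, as shown above).

```python
from difflib import SequenceMatcher
from typing import List


def looks_like_hallucination(
    bot_response: str, extra_responses: List[str], threshold: float = 0.5
) -> bool:
    """Flag the response when it disagrees with the extra sampled completions."""
    if not extra_responses:
        return False
    similarities = [
        SequenceMatcher(None, bot_response.lower(), r.lower()).ratio()
        for r in extra_responses
    ]
    # Low average agreement across resamples suggests the claim is unstable.
    return sum(similarities) / len(similarities) < threshold


# Consistent resamples: not flagged.
assert looks_like_hallucination(
    "The Eiffel Tower is 330 metres tall.",
    ["The Eiffel Tower is 330 metres tall.", "It is 330 metres tall."],
) is False
```

The real LLM-based check is more robust because it compares claims semantically rather than lexically, which is why it costs an extra model call.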
### Llama Guard Check Output

Uses Meta's Llama Guard model for output validation.

#### Configuration

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4
  - type: llama_guard
    engine: nim
    model: meta/llama-guard-3-8b

rails:
  output:
    flows:
      - llama guard check output
```

#### Action Implementation

From `nemoguardrails/library/llama_guard/actions.py:100`:

```python
@action(output_mapping=llama_guard_check_output_mapping)
async def llama_guard_check_output(
    llm_task_manager: LLMTaskManager,
    context: Optional[dict] = None,
    llama_guard_llm: Optional[BaseLLM] = None,
) -> dict:
    """
    Check the bot response using the configured Llama Guard model
    and the configured prompt containing the safety guidelines.
    """
    user_input = context.get("user_message")
    bot_response = context.get("bot_message")

    check_output_prompt = llm_task_manager.render_task_prompt(
        task=Task.LLAMA_GUARD_CHECK_OUTPUT,
        context={
            "user_input": user_input,
            "bot_response": bot_response,
        },
    )
    # ... returns {"allowed": bool, "policy_violations": list}
```

The mapping function:

```python
def llama_guard_check_output_mapping(result: dict) -> bool:
    """Returns True if response should be blocked (allowed is False)."""
    allowed = result.get("allowed", True)
    return not allowed
```
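Llama Guard models conventionally reply with `safe`, or `unsafe` followed by a line of violated category codes. A hedged sketch of turning such a verdict into the `{"allowed": ..., "policy_violations": ...}` shape the action returns; the exact parsing in the library may differ, and `parse_llama_guard_verdict` is an illustrative name:

```python
def parse_llama_guard_verdict(text: str) -> dict:
    """Parse a 'safe' / 'unsafe\\nS1,S9'-style verdict into an action result."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or lines[0].lower() == "safe":
        return {"allowed": True, "policy_violations": []}
    # First line is "unsafe"; an optional second line lists category codes.
    codes = lines[1].split(",") if len(lines) > 1 else []
    return {"allowed": False, "policy_violations": [c.strip() for c in codes]}


assert parse_llama_guard_verdict("safe") == {
    "allowed": True, "policy_violations": []
}
assert parse_llama_guard_verdict("unsafe\nS1,S9") == {
    "allowed": False, "policy_violations": ["S1", "S9"]
}
```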
### Sensitive Data Masking

Removes PII from bot responses before delivery.

#### Configuration

```yaml
rails:
  config:
    sensitive_data_detection:
      output:
        entities:
          - PERSON
          - EMAIL_ADDRESS
          - PHONE_NUMBER
          - CREDIT_CARD
        score_threshold: 0.6
  output:
    flows:
      - mask sensitive data output
```

#### Usage in Flows

```colang
define flow sanitize output
  bot ...
  $bot_message = execute mask_sensitive_data(source="output", text=$bot_message)
```

The action replaces detected entities with placeholders:

```
Original: "Contact John Doe at john.doe@example.com"
Masked:   "Contact <PERSON> at <EMAIL_ADDRESS>"
```
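As a minimal stand-in for the substitution step, here is a regex-based sketch. It only illustrates the entity-to-placeholder replacement; real detection uses an NER-based analyzer (entities like `PERSON` cannot be caught by regex), and the patterns below are illustrative assumptions.

```python
import re

# Hypothetical patterns for two of the configured entity types.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def mask_sensitive_data(text: str) -> str:
    """Replace each detected entity span with a <ENTITY_TYPE> placeholder."""
    for entity, pattern in PATTERNS.items():
        text = pattern.sub(f"<{entity}>", text)
    return text


assert mask_sensitive_data(
    "Contact John Doe at john.doe@example.com"
) == "Contact John Doe at <EMAIL_ADDRESS>"
```

Note that "John Doe" passes through untouched here, which is exactly why the real rail relies on a scored NER detector with the `score_threshold` shown in the configuration.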
## Usage Examples

### Combining Multiple Output Rails

```yaml
rails:
  output:
    flows:
      - content safety check output $model=content_safety
      - self check hallucination
      - mask sensitive data output
```

### Conditional Output Checking

Only check outputs for specific topics:

```colang
define flow answer medical question
  user ask medical question
  bot provide medical answer

  # Extra validation for medical advice
  execute content_safety_check_output(model="medical_safety")
  execute self_check_hallucination
```

### Custom Fallback Messages

```colang
define bot refuse unsafe response
  "I apologize, but I cannot provide that response."
  "Let me try to help you in a different way."

define flow handle blocked output
  bot ...
  if $output_blocked
    bot refuse unsafe response
```

### Parallel Output Rails

Enable parallel execution for better performance:

```yaml
rails:
  config:
    parallel_rails:
      output: true
  output:
    flows:
      - content safety check output $model=content_safety
      - llama guard check output
```
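The performance benefit comes from running independent checks concurrently, so total latency approaches that of the slowest rail rather than the sum. A minimal sketch of the pattern with stand-in check functions (the sleeps simulate the ~50ms model calls; none of these names are the library's):

```python
import asyncio


async def content_safety_check(msg: str) -> bool:
    await asyncio.sleep(0.05)  # simulate a ~50ms NIM call
    return True  # allowed


async def llama_guard_check(msg: str) -> bool:
    await asyncio.sleep(0.05)  # simulate a ~50ms Llama Guard call
    return True  # allowed


async def run_output_rails(msg: str) -> bool:
    # Both checks run concurrently; total wait ≈ max, not sum, of latencies.
    results = await asyncio.gather(
        content_safety_check(msg), llama_guard_check(msg)
    )
    # Block if any rail blocks.
    return all(results)


assert asyncio.run(run_output_rails("hello")) is True
```

Parallel execution only helps when the rails are independent; a masking rail that rewrites the message still has to run before or after the checks that read it.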
## Advanced Configurations

### Reasoning-Enabled Content Safety

For advanced models, enable reasoning in safety checks:

```yaml
rails:
  config:
    content_safety:
      reasoning:
        enabled: true
```

This provides explainable safety decisions with reasoning chains.

### Caching Output Checks

Enable model caching to speed up repeated checks:

```yaml
rails:
  config:
    model_caches:
      content_safety:
        type: memory
        max_size: 1000
```

From `nemoguardrails/library/content_safety/actions.py:196`:

```python
if cache:
    cache_key = create_normalized_cache_key(check_output_prompt)
    cached_result = get_from_cache_and_restore_stats(cache, cache_key)
    if cached_result is not None:
        log.debug(f"Content safety output cache hit for model '{model_name}'")
        return cached_result
```
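The pattern above combines a normalized cache key with a bounded in-memory cache. A hypothetical sketch of both pieces; `normalized_cache_key` and `LRUCache` are illustrative stand-ins, not the library's `create_normalized_cache_key` or cache interface:

```python
import hashlib
from collections import OrderedDict


def normalized_cache_key(prompt: str) -> str:
    # Collapse whitespace so trivially different prompts share one cache entry.
    return hashlib.sha256(" ".join(prompt.split()).encode()).hexdigest()


class LRUCache:
    """Bounded in-memory cache evicting the least recently used entry."""

    def __init__(self, max_size: int = 1000):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key: str):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used


cache = LRUCache(max_size=1000)
key = normalized_cache_key("Check this   response")
cache.put(key, {"allowed": True, "policy_violations": []})

# Whitespace differences normalize to the same key, so this is a cache hit.
assert cache.get(normalized_cache_key("Check this response")) == {
    "allowed": True, "policy_violations": []
}
```

Normalizing the key matters because the safety verdict for a prompt should not depend on incidental formatting of the rendered template.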
## Best Practices

- **Prioritize fast rails first** - run lightweight checks before expensive ones
- **Use specialized models** - content safety models are faster than LLM self-checks
- **Cache results** - reduce latency for similar outputs
- **Layer defenses** - combine multiple complementary rails
- **Test fallback messages** - ensure blocked outputs provide helpful alternatives
- **Monitor false positives** - track and tune thresholds to minimize over-blocking

| Rail Type | Latency Impact | Accuracy |
|---|---|---|
| Content Safety (NIM) | Low (~50-100ms) | High |
| Llama Guard | Low (~50-100ms) | High |
| Self Check Output | Medium (~200-500ms) | Medium |
| Hallucination Detection | High (~3x response time) | High |
| Sensitive Data Masking | Low (~10-50ms) | High |

## See Also