
From Development to Production

Moving an agent system to production requires careful planning around observability, evaluation, and scalability. The transition from local development to production involves:
  • Trace Management: Systematically capturing and uploading agent execution traces
  • Online Evaluation: Continuous monitoring of agent performance in production
  • Scale Considerations: Handling increased volume and operational requirements
Production agents require different instrumentation than development agents. Ensure trace upload, error handling, and evaluation pipelines are in place before deploying.

Production Readiness Checklist

1. Establish Observability

Configure LangSmith or equivalent tracing to capture all agent interactions. Ensure traces include:
  • Input/output data
  • Execution timing
  • Error states
  • Custom metadata (user IDs, session IDs, etc.)
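As a sketch, the custom metadata above can be attached to each run record before upload. `annotate_trace` is a hypothetical helper, not a LangSmith API; the `extra.metadata` layout matches the trace format used elsewhere on this page:

```python
import uuid

def annotate_trace(run: dict, *, user_id: str, session_id: str) -> dict:
    """Attach the identifying metadata production traces should carry."""
    meta = run.setdefault("extra", {}).setdefault("metadata", {})
    meta["user_id"] = user_id
    meta["session_id"] = session_id
    # Tags make sessions filterable in the tracing UI
    run.setdefault("tags", []).append(f"session:{session_id}")
    return run

run = annotate_trace(
    {"id": str(uuid.uuid4()), "inputs": {"question": "Where is my order?"}},
    user_id="user-123",
    session_id="sess-456",
)
```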
2. Set Up Evaluation Pipeline

Deploy online evaluators that run automatically on production traces:
  • Response quality metrics
  • Latency thresholds
  • Error rate monitoring
  • Custom business logic validators
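A minimal online evaluator over a single run record might look like the following. `evaluate_run` and its score fields are illustrative, not a LangSmith API; it assumes runs carry ISO-8601 `start_time`/`end_time` strings, as in the trace format shown later on this page:

```python
from datetime import datetime

def evaluate_run(run: dict, latency_budget_s: float = 5.0) -> dict:
    """Score one production run against simple online checks."""
    start = datetime.fromisoformat(run["start_time"])
    end = datetime.fromisoformat(run["end_time"])
    latency = (end - start).total_seconds()
    return {
        "latency_s": latency,
        "within_latency_budget": latency <= latency_budget_s,
        "errored": run.get("error") is not None,
        "answered": bool(run.get("outputs", {}).get("answer")),
    }

result = evaluate_run({
    "start_time": "2024-01-01T00:00:00",
    "end_time": "2024-01-01T00:00:02",
    "outputs": {"answer": "Your order ships today."},
})
```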
3. Implement Trace Upload

Build a robust system for uploading traces at scale. See Trace Upload for implementation details.
4. Configure Monitoring & Alerts

Set up dashboards and alerts for:
  • Agent success/failure rates
  • Evaluation score distributions
  • System latency and throughput
  • Cost per interaction
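The alerting rules above reduce to threshold checks over a window of evaluation records. `should_alert` is a sketch assuming each record carries an `errored` flag and a `latency_s` measurement; the thresholds are placeholders to tune for your workload:

```python
def should_alert(results: list, max_error_rate: float = 0.05,
                 max_p95_latency_s: float = 5.0) -> bool:
    """Trip an alert when error rate or p95 latency exceeds its threshold."""
    error_rate = sum(1 for r in results if r["errored"]) / len(results)
    latencies = sorted(r["latency_s"] for r in results)
    # Nearest-rank p95 over the window
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return error_rate > max_error_rate or p95 > max_p95_latency_s
```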

Key Production Patterns

Synthetic Data for Testing

Before deploying to production, generate synthetic traces that mirror expected production patterns:
# Generate realistic test traces
import uuid
from datetime import datetime, timedelta

def generate_synthetic_trace(category: str, user_query: str):
    """Create a synthetic trace for testing."""
    return {
        "id": str(uuid.uuid4()),
        "trace_id": str(uuid.uuid4()),
        "name": "CustomerSupportAgent",
        "run_type": "chain",
        "inputs": {"question": user_query},
        "outputs": {"answer": "..."},
        "start_time": datetime.utcnow().isoformat(),
        "end_time": (datetime.utcnow() + timedelta(seconds=2)).isoformat(),
        "tags": [category],
        "extra": {"metadata": {"category": category}}
    }

Time-Shifted Trace Upload

When uploading historical or synthetic traces, shift timestamps to appear recent:
from datetime import datetime, timezone

# ISO-8601 timestamp strings (like those generated above) parse with fromisoformat
parse_dt = datetime.fromisoformat

# Calculate time delta to make traces appear current
latest = max(parse_dt(r["start_time"]) for r in runs if r["start_time"])
time_delta = datetime.now(timezone.utc).replace(tzinfo=None) - latest

# Apply shift to all timestamps
for run in runs:
    run["start_time"] = parse_dt(run["start_time"]) + time_delta
    if run.get("end_time"):
        run["end_time"] = parse_dt(run["end_time"]) + time_delta

ID Regeneration

When re-uploading traces, generate fresh IDs while preserving parent-child relationships:
from langsmith import uuid7

# Build ID mapping (uuid7 preserves time-ordering)
id_map = {}
for run in runs:
    for field in ("id", "trace_id", "parent_run_id"):
        old_id = run.get(field)
        if old_id and old_id not in id_map:
            id_map[old_id] = str(uuid7())

# Remap all IDs, including trace_id, so the trace stays internally consistent
for run in runs:
    run["id"] = id_map[run["id"]]
    run["trace_id"] = id_map[run["trace_id"]]
    if run.get("parent_run_id"):
        run["parent_run_id"] = id_map[run["parent_run_id"]]

Operational Excellence

Batching and Flushing

Always batch trace uploads and explicitly flush when complete:
from langsmith import Client

client = Client()

# Upload traces in batches
for i, trace_runs in enumerate(traces.values()):
    # ... upload logic ...
    
    if (i + 1) % 10 == 0:
        print(f"Uploaded {i + 1}/{len(traces)} traces")

# Critical: flush before exit
client.flush()

Failing to call client.flush() may result in lost traces. Always flush before your application exits.

Error Handling

Production agents must gracefully handle failures:
import logging

logger = logging.getLogger(__name__)
failed_traces = []

try:
    root_tree.post(exclude_child_runs=False)
except Exception as e:
    logger.error(f"Failed to upload trace {root_tree.id}: {e}")
    # Implement retry logic or dead-letter queue
    failed_traces.append(root_tree)
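The retry branch can be factored into a helper with exponential backoff. `post_with_retry` is a hypothetical wrapper around the `post(exclude_child_runs=False)` call shown above; a `False` return signals the caller to dead-letter the trace:

```python
import time

def post_with_retry(run_tree, attempts: int = 3, base_delay_s: float = 1.0) -> bool:
    """Retry a trace upload with exponential backoff; False means dead-letter it."""
    for attempt in range(attempts):
        try:
            run_tree.post(exclude_child_runs=False)
            return True
        except Exception:
            if attempt == attempts - 1:
                return False
            # Wait base_delay_s, 2x, 4x, ... between attempts
            time.sleep(base_delay_s * 2 ** attempt)
    return False
```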

Production Deployment Strategies

Blue-Green Deployment

  1. Deploy new agent version alongside existing version
  2. Route small percentage of traffic to new version
  3. Monitor evaluation metrics for regressions
  4. Gradually increase traffic or rollback if issues detected
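The traffic split in step 2 is often implemented with deterministic hashing so each user stays pinned to one version across requests, keeping per-user metrics comparable as the rollout ramps. A sketch (the helper name and bucket count are illustrative):

```python
import hashlib

def use_new_version(user_id: str, rollout_fraction: float) -> bool:
    """Deterministically bucket a user into the new (green) deployment."""
    # Hash to a stable bucket in [0, 10000); same user -> same bucket every time
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < rollout_fraction * 10_000
```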

Canary Releases

  1. Deploy to single region or customer segment
  2. Run online evaluations continuously
  3. Compare metrics against baseline
  4. Full rollout only after validation period
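The baseline comparison in step 3 can be as simple as a mean-score guardrail over the online evaluation results. `canary_healthy` and the `max_drop` tolerance are illustrative:

```python
def canary_healthy(canary_scores: list, baseline_scores: list,
                   max_drop: float = 0.05) -> bool:
    """Pass the canary only if its mean score stays within max_drop of baseline."""
    canary_mean = sum(canary_scores) / len(canary_scores)
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    return canary_mean >= baseline_mean - max_drop
```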

Shadow Mode

  1. Run new agent version in parallel without serving responses
  2. Compare outputs against production agent
  3. Evaluate differences and edge cases
  4. Promote once confidence is established
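A sketch of the shadow pattern, with agents modeled as plain callables; the helper and log format are illustrative. The key property is that a shadow failure never affects the user-facing response:

```python
def run_in_shadow(prod_agent, shadow_agent, query: str, diff_log: list) -> str:
    """Serve the production answer; run the shadow agent only for comparison."""
    answer = prod_agent(query)  # this is what the user sees
    try:
        shadow_answer = shadow_agent(query)  # never served
        diff_log.append({"query": query, "match": shadow_answer == answer})
    except Exception as exc:
        # Shadow errors are recorded for evaluation, not surfaced to the user
        diff_log.append({"query": query, "shadow_error": str(exc)})
    return answer
```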

Next Steps

Trace Upload

Implement scalable trace upload systems

Online Evaluation

Deploy continuous evaluation pipelines
