The AI agent uses LangGraph’s PostgreSQL checkpointing to maintain conversation state across sessions. This enables multi-turn conversations with full context preservation.

Overview

Checkpointing is implemented using langgraph.checkpoint.postgres.PostgresSaver (see src/copilot/graph.py:417-429):
def create_agent_graph():
    """Create and compile the agent graph with PostgreSQL checkpointing.
    
    Returns:
        Compiled LangGraph workflow ready for invocation
    
    Note:
        The connection to PostgreSQL is managed internally. For production,
        consider using a connection pool for better resource management.
    """
    conn = Connection.connect(config.VECTOR_DATABASE_URL, **_connection_kwargs)
    checkpointer = PostgresSaver(conn)
    checkpointer.setup()
    
    # ... workflow definition ...
    
    return workflow.compile(checkpointer=checkpointer)

Environment Variables

Configure the PostgreSQL connection for checkpointing:
# Required: PostgreSQL database URL
VECTOR_DATABASE_URL=postgresql://user:password@localhost:5432/dbname

# Alternative variable name (for compatibility)
DATABASE_URL=postgresql://user:password@localhost:5432/dbname

Connection String Format

postgresql://[user[:password]@][host][:port][/database][?param1=value1&...]
Examples:
# Local development
VECTOR_DATABASE_URL=postgresql://postgres:password@localhost:5432/copilot_dev

# Production with SSL
VECTOR_DATABASE_URL=postgresql://user:[email protected]:5432/copilot?sslmode=require

# Cloud provider (e.g., Neon, Supabase)
VECTOR_DATABASE_URL=postgresql://user:[email protected]:5432/main?sslmode=require

Connection Configuration

The checkpointer uses specific connection settings (defined in src/copilot/graph.py:32-35):
_connection_kwargs = {
    "prepare_threshold": 0,
    "autocommit": True,
}

Connection Parameters

prepare_threshold (integer, default: 0)
Disables prepared statements. Set to 0 to avoid issues with connection poolers such as PgBouncer.

autocommit (boolean, default: true)
Enables autocommit mode so checkpoint data is persisted immediately.

Database Schema

The PostgresSaver automatically creates the required tables when checkpointer.setup() is called:
-- Checkpoints table (simplified view; created automatically by LangGraph,
-- which may add further columns depending on the library version)
CREATE TABLE IF NOT EXISTS checkpoints (
    thread_id TEXT NOT NULL,
    checkpoint_id TEXT NOT NULL,
    parent_id TEXT,
    checkpoint JSONB NOT NULL,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (thread_id, checkpoint_id)
);

-- Writes table for pending operations
CREATE TABLE IF NOT EXISTS checkpoint_writes (
    thread_id TEXT NOT NULL,
    checkpoint_id TEXT NOT NULL,
    task_id TEXT NOT NULL,
    idx INTEGER NOT NULL,
    channel TEXT NOT NULL,
    value JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Note: These tables are managed by LangGraph. You don’t need to create them manually.

Session Management

Conversations are organized by thread_id. Each thread maintains its own checkpoint history:
from src.copilot.graph import create_agent_graph

app = create_agent_graph()

# Start a new conversation
result = app.invoke(
    input={"messages": [("user", "What is incident INC-2025-08-24-001?")]},
    config={"configurable": {"thread_id": "user-123-session-1"}}
)

# Continue the conversation (context is preserved)
result = app.invoke(
    input={"messages": [("user", "What was the resolution?")]},
    config={"configurable": {"thread_id": "user-123-session-1"}}
)

Thread ID Conventions

Recommended patterns for thread_id:
import uuid
from datetime import datetime

# User-based sessions
thread_id = f"user-{user_id}-session-{session_id}"

# Timestamp-based
thread_id = f"{user_id}-{datetime.now().isoformat()}"

# UUID-based (for anonymous sessions)
thread_id = str(uuid.uuid4())

Agent State Schema

The checkpointed state includes (defined in src/copilot/graph.py:74-82):
class AgentState(TypedDict):
    """State schema for the agent graph."""
    messages: Annotated[Sequence[BaseMessage], add_messages]
    title: Optional[str]
    session_id: Optional[str]
    user_id: Optional[str]
    langfuse_enabled: Optional[bool]
    generate_title: Optional[bool]

State Fields

messages (Sequence[BaseMessage])
Conversation history; updates are merged and deduplicated by message ID via the add_messages reducer.

title (string)
Generated title for the conversation.

session_id (string)
Session identifier for tracking and analytics.

user_id (string)
User identifier for tracking and analytics.

langfuse_enabled (boolean)
Enables Langfuse observability callbacks.

generate_title (boolean)
Whether to generate a conversation title automatically.

Checkpoint Lifecycle

1. Checkpoint Creation

A checkpoint is created after each node execution:
User Message → Support Bot → Tool Call → Tool Execution → Support Bot → Response
     ↓              ↓            ↓             ↓              ↓            ↓
  Checkpoint    Checkpoint   Checkpoint    Checkpoint    Checkpoint   Checkpoint

2. Checkpoint Retrieval

When continuing a conversation, LangGraph:
  1. Loads the latest checkpoint for the thread_id
  2. Restores the full agent state
  3. Continues execution from that point

3. Checkpoint History

You can access historical checkpoints:
from langgraph.checkpoint.postgres import PostgresSaver

checkpointer = PostgresSaver(conn)

# Get checkpoint history for a thread (most recent first)
config = {"configurable": {"thread_id": "user-123-session-1"}}
for ckpt_tuple in checkpointer.list(config):
    # list() yields CheckpointTuple objects; the checkpoint itself is a dict
    checkpoint = ckpt_tuple.checkpoint
    print(f"Checkpoint: {checkpoint['id']}")
    print(f"Created: {checkpoint['ts']}")
    print(f"Messages: {len(checkpoint['channel_values'].get('messages', []))}")

Streaming with Checkpoints

Checkpoints work seamlessly with streaming:
for mode, chunk in app.stream(
    config={"configurable": {"thread_id": "user-123-session-1"}},
    input={"messages": [("user", "Tell me about recent incidents")]},
    stream_mode=["messages", "custom"]
):
    if mode == "messages":
        token_chunk, metadata = chunk
        if token_chunk.content:
            print(token_chunk.content, end="", flush=True)
    elif mode == "custom":
        # Handle status updates
        print(f"Update: {chunk}")
Checkpoint Timing:
  • Checkpoints are saved after each node completes
  • Streaming tokens are not checkpointed (only final messages)
  • If streaming is interrupted, the last complete checkpoint is preserved

Production Considerations

Connection Pooling

For production deployments, use a connection pool:
from psycopg_pool import ConnectionPool
from langgraph.checkpoint.postgres import PostgresSaver
import src.copilot.config as config

# Create connection pool
pool = ConnectionPool(
    conninfo=config.VECTOR_DATABASE_URL,
    min_size=5,
    max_size=20,
    kwargs={
        "prepare_threshold": 0,
        "autocommit": True,
    }
)

# Pass the pool to the checkpointer directly: connections are borrowed
# and returned per operation instead of one being held open for the
# application's lifetime (a `with pool.connection()` block would close
# the connection as soon as the block exits)
checkpointer = PostgresSaver(pool)
checkpointer.setup()
app = workflow.compile(checkpointer=checkpointer)

Database Maintenance

Cleanup Old Checkpoints

Implement a cleanup strategy to avoid unbounded growth:
-- Delete checkpoints older than 30 days
DELETE FROM checkpoints 
WHERE created_at < NOW() - INTERVAL '30 days';

-- Delete checkpoint writes for deleted checkpoints
DELETE FROM checkpoint_writes 
WHERE (thread_id, checkpoint_id) NOT IN (
    SELECT thread_id, checkpoint_id FROM checkpoints
);
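One way to schedule this cleanup is a crontab entry (illustrative; assumes psql is installed and VECTOR_DATABASE_URL is available in the cron environment):

```shell
# Run daily at 03:00: purge checkpoints older than 30 days, then orphaned writes
0 3 * * * psql "$VECTOR_DATABASE_URL" -c "DELETE FROM checkpoints WHERE created_at < NOW() - INTERVAL '30 days'; DELETE FROM checkpoint_writes WHERE (thread_id, checkpoint_id) NOT IN (SELECT thread_id, checkpoint_id FROM checkpoints);"
```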

Vacuum and Analyze

Regularly vacuum the checkpoint tables:
VACUUM ANALYZE checkpoints;
VACUUM ANALYZE checkpoint_writes;

Indexing

Create indexes for better query performance:
-- Index on thread_id for faster lookups
CREATE INDEX IF NOT EXISTS idx_checkpoints_thread_id 
ON checkpoints(thread_id);

-- Index on created_at for cleanup queries
CREATE INDEX IF NOT EXISTS idx_checkpoints_created_at 
ON checkpoints(created_at);

-- Composite index for writes
CREATE INDEX IF NOT EXISTS idx_checkpoint_writes_composite 
ON checkpoint_writes(thread_id, checkpoint_id);

Troubleshooting

Connection Errors

Issue: ValueError: Required environment variable VECTOR_DATABASE_URL is not set
Solution: Set the environment variable in your .env file:
VECTOR_DATABASE_URL=postgresql://user:password@localhost:5432/dbname

Permission Errors

Issue: permission denied to create table
Solution: Ensure the database user has CREATE privileges:
GRANT CREATE ON DATABASE dbname TO your_user;

PgBouncer Compatibility

Issue: Errors with prepared statements when using PgBouncer
Solution: The connection is already configured with prepare_threshold: 0 (see src/copilot/graph.py:33). Ensure PgBouncer runs in session mode:
# pgbouncer.ini
pool_mode = session

Checkpoint Size

Issue: Checkpoints growing too large in long conversations
Solution: Implement message pruning in your application logic:
def prune_messages(state: AgentState, max_messages: int = 20) -> AgentState:
    """Keep only the most recent messages to limit checkpoint size."""
    if len(state["messages"]) > max_messages:
        # Keep system messages, then the most recent non-system messages
        # (slicing only non-system messages avoids duplicating any system
        # message that falls inside the recent window)
        system_msgs = [m for m in state["messages"] if m.type == "system"]
        other_msgs = [m for m in state["messages"] if m.type != "system"]
        state["messages"] = system_msgs + other_msgs[-max_messages:]
    return state

Migration from Memory-Based Checkpointing

If migrating from MemorySaver:
# Before (memory-based, not persistent)
from langgraph.checkpoint.memory import MemorySaver
checkpointer = MemorySaver()

# After (PostgreSQL-based, persistent)
from langgraph.checkpoint.postgres import PostgresSaver
from psycopg import Connection
import src.copilot.config as config

conn = Connection.connect(
    config.VECTOR_DATABASE_URL,
    prepare_threshold=0,
    autocommit=True
)
checkpointer = PostgresSaver(conn)
checkpointer.setup()
Benefits:
  • Conversations persist across server restarts
  • Scales to multiple instances (shared state)
  • Full conversation history available for analytics

Monitoring

Monitor checkpoint health:
-- Count checkpoints per thread
SELECT thread_id, COUNT(*) as checkpoint_count
FROM checkpoints
GROUP BY thread_id
ORDER BY checkpoint_count DESC
LIMIT 10;

-- Average checkpoint size
SELECT AVG(pg_column_size(checkpoint)) as avg_size_bytes
FROM checkpoints;

-- Most active threads (by checkpoint count)
SELECT thread_id, COUNT(*) as activity
FROM checkpoints
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY thread_id
ORDER BY activity DESC
LIMIT 10;

Best Practices

  1. Use Descriptive Thread IDs: Include user/session identifiers for easier debugging
  2. Implement Cleanup: Regularly delete old checkpoints to manage database size
  3. Monitor Size: Track checkpoint sizes to detect conversation bloat
  4. Use Connection Pools: Essential for production deployments with multiple workers
  5. Enable SSL: Use sslmode=require for production database connections
  6. Backup Regularly: Checkpoint data is valuable conversation history
