Chat models are the core interface for working with conversational AI. They take messages as input and return AI-generated messages as output.

Basic Usage

All LangChain chat models implement a common interface:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize a chat model
model = ChatOpenAI(model="gpt-4")

# Single message
response = model.invoke("Tell me a joke")
print(response.content)

# Multiple messages with system prompt
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="How do I reverse a list in Python?")
]

response = model.invoke(messages)
print(response.content)

Message Types

LangChain supports several message types, including SystemMessage, HumanMessage, and AIMessage:
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

# SystemMessage sets the behavior and context for the AI
system_msg = SystemMessage(content="You are an expert in Python programming.")

# HumanMessage carries user input
human_msg = HumanMessage(content="How do I sort a dictionary by value?")

# AIMessage represents a model response
ai_msg = AIMessage(content="Use sorted() with a key argument.")

Model Configuration

Configure model parameters to control output:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,        # Creativity (0-2, default 1)
    max_tokens=500,         # Maximum response length
    timeout=30,             # Request timeout in seconds
    max_retries=2,          # Number of retries on failure
    api_key="your-key",     # Or set OPENAI_API_KEY env var
)

Temperature Settings

Use a low temperature for factual, deterministic outputs and a higher one for creative tasks:
model = ChatOpenAI(temperature=0.0)
# Good for: code generation, data extraction, factual QA

model = ChatOpenAI(temperature=0.9)
# Good for: brainstorming, storytelling, creative writing

Multi-Provider Support

Swap between providers easily:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

model = ChatOpenAI(
    model="gpt-4o",
    api_key="your-openai-key"
)

# The same interface works with other providers
model = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    api_key="your-anthropic-key"
)

Conversation History

Maintain context across multiple turns:
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

# Initialize conversation with system prompt
messages = [
    SystemMessage(content="You are a helpful assistant.")
]

# First turn
messages.append(HumanMessage(content="My name is Alice"))
response = model.invoke(messages)
messages.append(response)

# Second turn - model remembers context
messages.append(HumanMessage(content="What's my name?"))
response = model.invoke(messages)
print(response.content)  # "Your name is Alice"

# Continue the conversation
messages.append(response)

Function/Tool Calling

Chat models can call functions to access external data:
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def get_current_weather(location: str) -> str:
    """Get the current weather for a location.
    
    Args:
        location: The city name.
    """
    # Mock implementation
    return f"Sunny, 72°F in {location}"

# Bind tools to model
model = ChatOpenAI(model="gpt-4")
model_with_tools = model.bind_tools([get_current_weather])

# The model emits a tool call when appropriate
response = model_with_tools.invoke(
    "What's the weather in San Francisco?"
)

if response.tool_calls:
    tool_call = response.tool_calls[0]
    print(f"Tool: {tool_call['name']}")
    print(f"Args: {tool_call['args']}")
    # Execute the tool and continue conversation

Structured Output

Generate structured data using Pydantic models:
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class Person(BaseModel):
    """Information about a person."""
    name: str = Field(description="Person's full name")
    age: int = Field(description="Person's age in years")
    occupation: str = Field(description="Person's job or profession")

model = ChatOpenAI(model="gpt-4")
structured_model = model.with_structured_output(Person)

# Returns a Pydantic model instance
person = structured_model.invoke(
    "John Doe is a 35 year old software engineer."
)

print(person.name)        # "John Doe"
print(person.age)         # 35
print(person.occupation)  # "Software Engineer"

JSON Mode

For flexible JSON output (note: OpenAI's JSON mode requires the word "JSON" to appear somewhere in the prompt):
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4")
json_model = model.with_structured_output(method="json_mode")

response = json_model.invoke(
    "Extract the key entities as JSON from: Alice works at Acme Corp in NYC"
)

print(response)
# {"person": "Alice", "organization": "Acme Corp", "location": "NYC"}

Batch Processing

Process multiple inputs efficiently:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4")

# Batch invoke
inputs = [
    "What is 2+2?",
    "What is the capital of France?",
    "Who wrote Hamlet?"
]

responses = model.batch(inputs)
for response in responses:
    print(response.content)

Async Execution

Use async for better performance with concurrent requests:
import asyncio
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4")

async def process_queries():
    queries = [
        "What is Python?",
        "What is JavaScript?",
        "What is Rust?"
    ]
    
    # Process all queries concurrently
    tasks = [model.ainvoke(query) for query in queries]
    responses = await asyncio.gather(*tasks)
    
    for query, response in zip(queries, responses):
        print(f"Q: {query}")
        print(f"A: {response.content}\n")

# Run the coroutine (in a notebook you can `await process_queries()` directly)
asyncio.run(process_queries())

Async Batch

Batch processing also has an async counterpart:
async def batch_process():
    inputs = ["Question 1", "Question 2", "Question 3"]
    
    # Async batch processing
    responses = await model.abatch(inputs)
    return responses

results = asyncio.run(batch_process())

Caching Responses

Cache model responses to reduce costs and latency:
from langchain_openai import ChatOpenAI
from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

# Enable caching
set_llm_cache(InMemoryCache())

model = ChatOpenAI(model="gpt-4")

# First call - hits the API
response1 = model.invoke("What is LangChain?")

# Second identical call - uses cache (much faster)
response2 = model.invoke("What is LangChain?")

Fallbacks and Retry

Handle failures gracefully:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Primary model
primary = ChatOpenAI(model="gpt-4")

# Fallback model if primary fails
fallback = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# Create model with fallback
model_with_fallback = primary.with_fallbacks([fallback])

# Will try primary first, then fallback on error
response = model_with_fallback.invoke("Hello!")

Usage Metadata

Track token usage and costs:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4")
response = model.invoke("Explain quantum computing")

# Access usage metadata
if response.usage_metadata:
    print(f"Input tokens: {response.usage_metadata['input_tokens']}")
    print(f"Output tokens: {response.usage_metadata['output_tokens']}")
    print(f"Total tokens: {response.usage_metadata['total_tokens']}")

Best Practices

1. Set appropriate temperature: use low temperature (0-0.3) for deterministic tasks, higher (0.7-1.0) for creative tasks.
2. Manage conversation history: trim old messages to stay within context limits and reduce costs.
3. Use streaming for long outputs: stream responses for better UX. See the Streaming guide.
4. Handle errors properly: use fallbacks and retries for production reliability.
5. Monitor token usage: track usage metadata to optimize costs and performance.
