
Getting Structured Output

In this lesson, you'll learn one of the most powerful features of the Strands SDK: extracting structured data (like JSON) from unstructured text. This is essential for any application that needs to reliably pull specific pieces of information out of an LLM's response. We'll use a Pydantic model to define the exact data schema we want, and the agent's structured_output_model parameter will do the heavy lifting of forcing the LLM's output into that schema.

Why Structured Output Matters

Data Extraction

Extract specific fields from natural language text

Type Safety

Get validated, type-safe data instead of raw strings

Integration

Easily integrate LLM output with databases and APIs

Reliability

Automatic validation and error handling

Use Cases

Extract structured information from resumes, applications, or forms:
class Resume(BaseModel):
    name: str
    email: str
    phone: str
    years_experience: int
    skills: list[str]
Identify and extract entities from documents:
class DocumentEntities(BaseModel):
    people: list[str]
    organizations: list[str]
    locations: list[str]
    dates: list[str]
Analyze and categorize text:
class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    key_phrases: list[str]
Convert free-form text into standardized formats:
class ContactInfo(BaseModel):
    name: str
    email: EmailStr  # Validates email format
    phone: str
    country_code: str
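Any of these schemas can be exercised on their own, before wiring up an agent, by validating a dict of the shape an LLM would return. A minimal sketch using the SentimentAnalysis model from above (plain Pydantic, no agent involved; the sample data is made up):

```python
from typing import Literal

from pydantic import BaseModel


class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    key_phrases: list[str]


# A dict of the shape an LLM might produce; note confidence arrives as a string.
raw = {"sentiment": "positive", "confidence": "0.92", "key_phrases": ["great service"]}

result = SentimentAnalysis.model_validate(raw)
print(result.confidence)  # 0.92 (coerced from str to float)
```

Pydantic's lax mode coerces compatible types automatically, which is exactly what you want when parsing model output.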

Key Concepts

1. Pydantic Schema

We define a PersonInfo class using Pydantic. This class acts as a blueprint for the data we want. By defining fields like name: str and age: int, we tell the agent the exact keys and data types it must return.

2. Field Descriptions

Adding a description to each field helps the LLM understand what to extract. This is especially important for ambiguous or complex fields.
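Under the hood, these descriptions end up in the JSON Schema derived from the model, which is what guides the LLM. You can inspect that schema yourself with Pydantic's model_json_schema() (a plain Pydantic sketch, no agent required):

```python
from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person.")


# The field descriptions are carried into the generated JSON Schema.
schema = PersonInfo.model_json_schema()
print(schema["properties"]["age"]["description"])  # The age of the person.
```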

3. structured_output_model Parameter

Instead of getting a string response, we pass a Pydantic model to the agent. The agent then returns a validated instance of that model.

4. Validated Output

The method returns a Pydantic object, not just a string. The data is already:
  • Parsed
  • Validated
  • Type-checked
  • Ready to use in your application
If the LLM fails to return data that matches the schema, Strands will automatically handle the error, often by retrying or raising an exception.
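You can see the validation half of this locally with plain Pydantic: when data does not match the schema, model_validate raises a ValidationError. This sketch leaves the agent out entirely:

```python
from pydantic import BaseModel, ValidationError


class PersonInfo(BaseModel):
    name: str
    age: int


err = None
try:
    # "thirty" cannot be parsed as an int, so validation fails.
    PersonInfo.model_validate({"name": "John", "age": "thirty"})
except ValidationError as e:
    err = e

print(err.errors()[0]["type"])  # int_parsing
```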

Implementation

Step 1: Import Dependencies

import os
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from strands import Agent
from strands.models.litellm import LiteLLMModel

# Load environment variables from a .env file
load_dotenv()

Step 2: Define the Pydantic Schema

class PersonInfo(BaseModel):
    """A Pydantic model to represent structured information about a person."""
    
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person.")
    occupation: str = Field(..., description="The current occupation of the person.")
Use descriptive field names and detailed descriptions. The LLM uses these to understand what information to extract!

Step 3: Create the Agent

def main():
    """
    Main function to demonstrate structured data extraction.
    """
    # Configure the language model
    model = LiteLLMModel(
        client_args={"api_key": os.getenv("NEBIUS_API_KEY")},
        model_id="nebius/zai-org/GLM-4.5",
    )
    
    # Create the data extraction agent
    agent = Agent(
        model=model,
        system_prompt="You are an expert assistant that extracts structured information about people from text based on the provided schema.",
        structured_output_model=PersonInfo,
    )
The structured_output_model parameter tells the agent to always return data in the PersonInfo format.

Step 4: Extract Structured Data

    # Unstructured text containing the information we want to extract
    text_to_process = (
        "John Smith is a 30-year-old software engineer living in San Francisco."
    )
    
    print(f"--- Extracting information from text ---\n")
    print(f'Input Text: "{text_to_process}"\n')
    
    # Use the agent to extract the data
    try:
        result = agent(text_to_process)
        person_info: PersonInfo = result.structured_output
        
        print("--- Extraction Successful ---")
        print(f"Name: {person_info.name}")
        print(f"Age: {person_info.age}")
        print(f"Occupation: {person_info.occupation}")
        
    except Exception as e:
        print("--- Extraction Failed ---")
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

Running the Example

1. Set up environment

Create a .env file with your API key:
NEBIUS_API_KEY=your_api_key_here

2. Install dependencies

pip install 'strands-agents[litellm]' pydantic python-dotenv

3. Run the script

python main.py

Expected Output

--- Extracting information from text ---

Input Text: "John Smith is a 30-year-old software engineer living in San Francisco."

--- Extraction Successful ---
Name: John Smith
Age: 30
Occupation: software engineer
The agent successfully extracted all three fields and returned a validated PersonInfo object!

Advanced Pydantic Features

Optional Fields

from typing import Optional

class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: Optional[int] = Field(None, description="The age of the person if mentioned.")
    occupation: str = Field(..., description="The current occupation of the person.")

Default Values

class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(0, description="The age of the person.")
    country: str = Field("Unknown", description="The country of residence.")

Nested Models

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class PersonInfo(BaseModel):
    name: str
    age: int
    address: Address  # Nested model!
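When the agent returns nested JSON, Pydantic validates the inner object recursively. A self-contained sketch (the sample data is made up):

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class PersonInfo(BaseModel):
    name: str
    age: int
    address: Address  # Nested model


data = {
    "name": "Jane",
    "age": 34,
    "address": {
        "street": "1 Main St",
        "city": "Springfield",
        "state": "IL",
        "zip_code": "62701",
    },
}

# The inner dict is validated into an Address instance automatically.
person = PersonInfo.model_validate(data)
print(person.address.city)  # Springfield
```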

Lists and Complex Types

from typing import List, Literal

class PersonInfo(BaseModel):
    name: str
    age: int
    skills: List[str] = Field(default_factory=list, description="List of professional skills")
    employment_status: Literal["employed", "unemployed", "self-employed"]
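Literal fields give you enum-style validation for free: any value outside the allowed set is rejected. A quick check (plain Pydantic; default_factory is used so the list default isn't shared between instances):

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class PersonInfo(BaseModel):
    name: str
    age: int
    skills: List[str] = Field(default_factory=list, description="List of professional skills")
    employment_status: Literal["employed", "unemployed", "self-employed"]


err = None
try:
    # "retired" is not one of the allowed literals, so validation fails.
    PersonInfo(name="Bob", age=35, employment_status="retired")
except ValidationError as e:
    err = e

print(err.errors()[0]["type"])  # literal_error
```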

Try It Yourself

Create a model for multiple people:
class PeopleList(BaseModel):
    people: List[PersonInfo]

text = """In our team, we have Alice (28, designer), Bob (35, developer), 
and Carol (42, project manager)."""
Extend the PersonInfo model:
class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str
    location: str = Field(..., description="City or country of residence")
    years_experience: Optional[int] = Field(None, description="Years in current role")
Try extracting from a resume or LinkedIn profile:
resume_text = """
Jane Doe
Senior Data Scientist
jane.doe@example.com | (555) 123-4567

Experience: 8 years in machine learning and data analysis
Skills: Python, TensorFlow, SQL, AWS
Education: PhD in Computer Science, MIT
"""
Test with incomplete information:
text = "John is a developer"  # Missing age
# The agent will try to extract what it can

Validation and Error Handling

Pydantic Validators

from pydantic import BaseModel, Field, field_validator

class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str
    
    @field_validator('age')
    @classmethod
    def validate_age(cls, v):
        if v < 0 or v > 120:
            raise ValueError('Age must be between 0 and 120')
        return v
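The validator runs on every construction, including data coming back from the agent, so out-of-range values are caught immediately. A self-contained check using the model above:

```python
from pydantic import BaseModel, ValidationError, field_validator


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str

    @field_validator("age")
    @classmethod
    def validate_age(cls, v):
        if v < 0 or v > 120:
            raise ValueError("Age must be between 0 and 120")
        return v


err = None
try:
    # 969 is outside the allowed range, so the validator rejects it.
    PersonInfo(name="Methuselah", age=969, occupation="shepherd")
except ValidationError as e:
    err = e

print("Age must be between 0 and 120" in str(err))  # True
```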

Handling Extraction Failures

try:
    result = agent(text_to_process)
    person_info = result.structured_output
    # Use the extracted data
except ValueError as e:
    print(f"Validation error: {e}")
except Exception as e:
    print(f"Extraction failed: {e}")

Production Tips

For production applications:
  • Always validate extracted data before using it in critical systems
  • Add retry logic for extraction failures
  • Log extraction attempts for debugging
  • Consider using confidence scores for important fields
  • Test with edge cases and malformed input
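The retry advice above can be sketched as a small wrapper around the agent call. Everything here is illustrative scaffolding: the _FlakyAgent stub stands in for a real Strands agent (failing once, then succeeding) so the sketch is runnable on its own.

```python
import time


class _FlakyAgent:
    """Stub standing in for a real Strands agent: fails once, then succeeds."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        if self.calls < 2:
            raise RuntimeError("transient model error")

        class Result:
            structured_output = {"name": "John"}

        return Result()


def extract_with_retry(agent, text, retries=3, delay=0.0):
    """Call the agent up to `retries` times with linear backoff between attempts."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return agent(text).structured_output
        except Exception as e:
            last_error = e
            time.sleep(delay * attempt)  # back off before the next attempt
    raise last_error


data = extract_with_retry(_FlakyAgent(), "John is a developer")
print(data)  # {'name': 'John'}
```

In a real deployment you would also log each failed attempt and consider distinguishing validation errors (likely a schema problem) from transient network errors (worth retrying).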

What You Learned

  • How to define Pydantic models for structured data extraction
  • How to use structured_output_model with AWS Strands agents
  • How to extract and validate structured data from unstructured text
  • How to use Pydantic features like optional fields and validators
  • How to handle extraction errors gracefully

Next Steps

You can now extract structured data from text! But what if you want to give your agent access to external tools and services? In the next lesson, you’ll learn how to integrate MCP (Model Context Protocol) servers to dramatically expand your agent’s capabilities.

Lesson 04: MCP Agent

Learn how to connect your agent to external tools using the Model Context Protocol

Resources

Video Tutorial

Watch Lesson 03 on YouTube

Pydantic Documentation

Learn more about Pydantic
