Getting Structured Output

In this lesson, you’ll learn one of the most powerful features of the Strands SDK: extracting structured data (like JSON) from unstructured text. This is essential for any application that needs to reliably get specific pieces of information from an LLM. We’ll use a Pydantic model to define the exact data schema we want, pass it to the agent through the structured_output_model parameter, and read the validated result from the response’s structured_output attribute.

Why Structured Output Matters

Data Extraction

Extract specific fields from natural language text

Type Safety

Get validated, type-safe data instead of raw strings

Integration

Easily integrate LLM output with databases and APIs

Reliability

Automatic validation and error handling

Use Cases

Form Extraction

Extract structured information from resumes, applications, or forms:

class Resume(BaseModel):
    name: str
    email: str
    phone: str
    years_experience: int
    skills: list[str]

Entity Recognition

Identify and extract entities from documents:

class DocumentEntities(BaseModel):
    people: list[str]
    organizations: list[str]
    locations: list[str]
    dates: list[str]

Sentiment Analysis

Analyze and categorize text:

from typing import Literal
from pydantic import BaseModel

class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    key_phrases: list[str]
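
To see what the Literal annotation buys you, the sketch below checks hand-written payloads (not real model output) against the sentiment schema. The `ge`/`le` bounds on `confidence` and the helper `is_valid_sentiment` are illustrative assumptions, not part of the lesson's model.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    # Assumption: confidence is a probability, so it is bounded to [0, 1].
    confidence: float = Field(..., ge=0.0, le=1.0)
    key_phrases: list[str]


def is_valid_sentiment(payload: dict) -> bool:
    """Return True if the payload satisfies the SentimentAnalysis schema."""
    try:
        SentimentAnalysis.model_validate(payload)
        return True
    except ValidationError:
        return False


# Hand-written payloads standing in for LLM responses.
good = {"sentiment": "positive", "confidence": 0.92, "key_phrases": ["great battery"]}
bad = {"sentiment": "mixed", "confidence": 0.92, "key_phrases": []}

print(is_valid_sentiment(good))
print(is_valid_sentiment(bad))
```

A label outside the three allowed values is rejected at validation time, so downstream code never has to handle an unexpected category.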

Data Normalization

Convert free-form text into standardized formats:

from pydantic import BaseModel, EmailStr  # EmailStr needs the email-validator package

class ContactInfo(BaseModel):
    name: str
    email: EmailStr  # Validates email format
    phone: str
    country_code: str

Key Concepts

1. Pydantic Schema

We define a PersonInfo class using Pydantic. This class acts as a blueprint for the data we want. By defining fields like name: str and age: int, we tell the agent the exact keys and data types it must return.

2. Field Descriptions

Adding a description to each field helps the LLM understand what to extract. This is especially important for ambiguous or complex fields.
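
How Strands hands the schema to the model is not covered in this lesson, so treat the following as a sketch of the Pydantic side only: it shows that field descriptions are carried into the JSON Schema generated from the model, which is the kind of artifact an LLM integration typically works from. The wording "in years" in the age description is an illustrative tweak.

```python
from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    """Structured information about a person."""

    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person in years.")


# Generate the JSON Schema and list each field with its description.
schema = PersonInfo.model_json_schema()
for field_name, spec in schema["properties"].items():
    print(f"{field_name}: {spec['description']}")
```

If a field is being extracted poorly, inspecting the generated schema like this is a quick way to check what guidance the description actually provides.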

3. `structured_output_model` Parameter

Instead of asking for a free-form string response, we pass a Pydantic model to the agent through structured_output_model. The agent’s result then carries a validated instance of that model in its structured_output attribute.

4. Validated Output

The result’s structured_output attribute holds a Pydantic object, not just a string. The data is already:

Parsed
Validated
Type-checked
Ready to use in your application

If the LLM fails to return data that matches the schema, the mismatch does not pass silently. Depending on the Strands version and configuration, the agent may retry or raise an exception, so wrap extraction calls in a try/except block.
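
The validation step itself can be illustrated with Pydantic alone. The sketch below feeds a hand-written, deliberately malformed payload (a stand-in for a bad LLM response) to the schema and collects the names of the fields that failed; it does not show Strands' own retry behavior.

```python
from pydantic import BaseModel, ValidationError


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str


# Hand-written payload standing in for a malformed LLM response:
# "thirty" cannot be parsed as an integer.
malformed = {"name": "John Smith", "age": "thirty", "occupation": "software engineer"}

try:
    PersonInfo.model_validate(malformed)
    failed_fields = []
except ValidationError as exc:
    # Each error entry records the location of the offending field.
    failed_fields = [err["loc"][0] for err in exc.errors()]

print(failed_fields)
```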

Implementation

Step 1: Import Dependencies

import os
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from strands import Agent
from strands.models.litellm import LiteLLMModel

# Load environment variables from a .env file
load_dotenv()

Step 2: Define the Pydantic Schema

class PersonInfo(BaseModel):
    """A Pydantic model to represent structured information about a person."""
    
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person.")
    occupation: str = Field(..., description="The current occupation of the person.")

Use descriptive field names and detailed descriptions. The LLM uses these to understand what information to extract!

Step 3: Create the Agent

def main():
    """
    Main function to demonstrate structured data extraction.
    """
    # Configure the language model
    model = LiteLLMModel(
        client_args={"api_key": os.getenv("NEBIUS_API_KEY")},
        model_id="nebius/zai-org/GLM-4.5",
    )
    
    # Create the data extraction agent
    agent = Agent(
        model=model,
        system_prompt="You are an expert assistant that extracts structured information about people from text based on the provided schema.",
        structured_output_model=PersonInfo,
    )

The structured_output_model parameter tells the agent to always return data in the PersonInfo format.

Step 4: Extract Structured Data

    # Unstructured text containing the information we want to extract
    text_to_process = (
        "John Smith is a 30-year-old software engineer living in San Francisco."
    )
    
    print("--- Extracting information from text ---\n")
    print(f'Input Text: "{text_to_process}"\n')
    
    # Use the agent to extract the data
    try:
        result = agent(text_to_process)
        person_info: PersonInfo = result.structured_output
        
        print("--- Extraction Successful ---")
        print(f"Name: {person_info.name}")
        print(f"Age: {person_info.age}")
        print(f"Occupation: {person_info.occupation}")
        
    except Exception as e:
        print("--- Extraction Failed ---")
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

Running the Example

Set up environment

Create a .env file with your API key:

NEBIUS_API_KEY=your_api_key_here

Install dependencies

pip install "strands-agents[litellm]" pydantic python-dotenv

Run the script

python main.py

Expected Output

--- Extracting information from text ---

Input Text: "John Smith is a 30-year-old software engineer living in San Francisco."

--- Extraction Successful ---
Name: John Smith
Age: 30
Occupation: software engineer

The agent successfully extracted all three fields and returned a validated PersonInfo object!
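
Once you hold a validated object, handing it to a database or an API is mostly a matter of serialization. The sketch below builds a PersonInfo by hand as a stand-in for result.structured_output and converts it with Pydantic's own `model_dump` and `model_dump_json`; the variable names `record` and `payload` are illustrative.

```python
from pydantic import BaseModel


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str


# Stand-in for result.structured_output from the script above.
person_info = PersonInfo(name="John Smith", age=30, occupation="software engineer")

record = person_info.model_dump()        # plain dict, e.g. for a database row
payload = person_info.model_dump_json()  # JSON string, e.g. for an HTTP request body

print(record)
print(payload)
```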

Advanced Pydantic Features

Optional Fields

from typing import Optional

class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: Optional[int] = Field(None, description="The age of the person if mentioned.")
    occupation: str = Field(..., description="The current occupation of the person.")
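
As a quick check of how an optional field behaves, the sketch below validates a hand-written payload that has no age (the kind of result an extraction from "John is a developer" might produce) and branches on the missing value.

```python
from typing import Optional

from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: Optional[int] = Field(None, description="The age of the person if mentioned.")
    occupation: str = Field(..., description="The current occupation of the person.")


# Hand-written payload with no age; validation succeeds and age is None.
person = PersonInfo.model_validate({"name": "John", "occupation": "developer"})

if person.age is None:
    print(f"{person.name}: age not mentioned")
else:
    print(f"{person.name}: {person.age} years old")
```

Using None rather than a placeholder number keeps "unknown" distinguishable from a real value in downstream code.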

Default Values

class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(0, description="The age of the person.")
    country: str = Field("Unknown", description="The country of residence.")

Nested Models

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class PersonInfo(BaseModel):
    name: str
    age: int
    address: Address  # Nested model!
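
The sketch below shows what nesting means in practice: a nested dictionary is validated into a nested Address instance, so inner fields are reachable by attribute. The address values are made up for illustration.

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class PersonInfo(BaseModel):
    name: str
    age: int
    address: Address


# Hand-written payload; the inner dict becomes an Address object.
data = {
    "name": "John Smith",
    "age": 30,
    "address": {
        "street": "1 Example St",
        "city": "San Francisco",
        "state": "CA",
        "zip_code": "94105",
    },
}

person = PersonInfo.model_validate(data)
print(type(person.address).__name__, person.address.city)
```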

Lists and Complex Types

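from typing import List, Literal
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    name: str
    age: int
    # default_factory gives each instance its own empty list
    skills: List[str] = Field(default_factory=list, description="List of professional skills")
    employment_status: Literal["employed", "unemployed", "self-employed"]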

Try It Yourself

Experiment 1: Extract Multiple People

Create a model for multiple people:

class PeopleList(BaseModel):
    people: List[PersonInfo]

text = """In our team, we have Alice (28, designer), Bob (35, developer), 
and Carol (42, project manager)."""
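
Before running this experiment against a model, it can help to write down the shape you expect back. The sketch below validates a hand-written stand-in for a correct extraction of the team text; it is not actual agent output.

```python
from typing import List

from pydantic import BaseModel


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str


class PeopleList(BaseModel):
    people: List[PersonInfo]


# Hand-written stand-in for what a correct extraction would contain.
expected = PeopleList.model_validate(
    {
        "people": [
            {"name": "Alice", "age": 28, "occupation": "designer"},
            {"name": "Bob", "age": 35, "occupation": "developer"},
            {"name": "Carol", "age": 42, "occupation": "project manager"},
        ]
    }
)

for person in expected.people:
    print(f"{person.name} ({person.age}): {person.occupation}")
```

To try it with the agent, pass PeopleList as structured_output_model in place of PersonInfo and compare the result against this expected object.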

Experiment 2: Add More Fields

Extend the PersonInfo model:

class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str
    location: str = Field(..., description="City or country of residence")
    years_experience: Optional[int] = Field(None, description="Years in current role")

Experiment 3: Extract from Real Data

Try extracting from a resume or LinkedIn profile:

resume_text = """
Jane Doe
Senior Data Scientist
[email protected] | (555) 123-4567

Experience: 8 years in machine learning and data analysis
Skills: Python, TensorFlow, SQL, AWS
Education: PhD in Computer Science, MIT
"""

Experiment 4: Handle Missing Data

Test with incomplete information:

text = "John is a developer"  # Missing age
# With a required `age: int`, validation may fail or the model may guess a value.
# Make the field Optional if the information can be absent.
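
Whether a model omits the field or guesses a value depends on the model, but how Pydantic treats a payload with a required field absent is deterministic. The sketch below validates a hand-written payload without age and collects the reported problems.

```python
from pydantic import BaseModel, ValidationError


class PersonInfo(BaseModel):
    name: str
    age: int  # required: no default value
    occupation: str


try:
    PersonInfo.model_validate({"name": "John", "occupation": "developer"})
    problems = []
except ValidationError as exc:
    # Keep the field location and the error type for each problem.
    problems = [(err["loc"], err["type"]) for err in exc.errors()]

print(problems)
```

An error of type "missing" on a field is a signal that the field should probably be optional for this kind of input.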

Validation and Error Handling

Pydantic Validators

from pydantic import BaseModel, Field, field_validator

class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str
    
    @field_validator('age')
    @classmethod
    def validate_age(cls, v):
        if v < 0 or v > 120:
            raise ValueError('Age must be between 0 and 120')
        return v
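
To see the validator in action without calling a model, the sketch below runs a few ages through the same schema. The helper `age_error` is hypothetical and exists only for this illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, field_validator


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        if v < 0 or v > 120:
            raise ValueError("Age must be between 0 and 120")
        return v


def age_error(age: int) -> Optional[str]:
    """Return the validation message for `age`, or None if it is accepted."""
    try:
        PersonInfo(name="Test", age=age, occupation="tester")
        return None
    except ValidationError as exc:
        return exc.errors()[0]["msg"]


print(age_error(30))
print(age_error(150))
```

For simple range checks like this one, `Field(ge=0, le=120)` is an alternative that needs no custom validator.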

Handling Extraction Failures

try:
    result = agent(text_to_process)
    person_info = result.structured_output
    # Use the extracted data
except ValueError as e:
    print(f"Validation error: {e}")
except Exception as e:
    print(f"Extraction failed: {e}")

Production Tips

For production applications:

Always validate extracted data before using it in critical systems
Add retry logic for extraction failures
Log extraction attempts for debugging
Consider using confidence scores for important fields
Test with edge cases and malformed input
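
The retry and logging tips above can be sketched with the standard library alone. Everything here is an assumption for illustration: `extract_with_retry`, its parameters, and the linear backoff policy are not part of Strands, and `extract` is any callable you supply.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("extraction")


def extract_with_retry(
    extract: Callable[[str], T],
    text: str,
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
) -> T:
    """Call extract(text), retrying on any exception with linear backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = extract(text)
            logger.info("Extraction succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(backoff_seconds * attempt)
    # Only reached when the loop never ran.
    raise ValueError("max_attempts must be at least 1")
```

With the agent from this lesson, a call might look like `extract_with_retry(lambda t: agent(t).structured_output, text_to_process)`. Retrying on every exception is a deliberately blunt choice; in a real system you would likely narrow it to the errors you consider transient.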

What You Learned

How to define Pydantic models for structured data extraction
How to use structured_output_model with AWS Strands agents
How to extract and validate structured data from unstructured text
How to use Pydantic features like optional fields and validators
How to handle extraction errors gracefully

Next Steps

You can now extract structured data from text! But what if you want to give your agent access to external tools and services? In the next lesson, you’ll learn how to integrate MCP (Model Context Protocol) servers to dramatically expand your agent’s capabilities.

Lesson 04: MCP Agent

Learn how to connect your agent to external tools using the Model Context Protocol
