In this lesson, you’ll learn one of the most powerful features of the Strands SDK: extracting structured data (like JSON) from unstructured text. This is essential for any application that needs to reliably get specific pieces of information from an LLM. We’ll use a Pydantic model to define the exact data schema we want, and the agent’s structured output feature (the structured_output_model parameter) will do the heavy lifting of forcing the LLM’s output into that schema and validating it.
We define a PersonInfo class using Pydantic. This class acts as a blueprint for the data we want. By defining fields like name: str and age: int, we tell the agent the exact keys and data types it must return.
from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    """A Pydantic model to represent structured information about a person."""

    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person.")
    occupation: str = Field(..., description="The current occupation of the person.")
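Because PersonInfo is an ordinary Pydantic model, you can check what the schema accepts without calling an LLM at all. This sketch assumes Pydantic v2 (the version that provides field_validator, used later in this lesson) and shows that a numeric string is coerced to an int in Pydantic's default lax mode, while a non-numeric age is rejected.

```python
from pydantic import BaseModel, Field, ValidationError


class PersonInfo(BaseModel):
    """A Pydantic model to represent structured information about a person."""

    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person.")
    occupation: str = Field(..., description="The current occupation of the person.")


# Lax mode coerces the numeric string "30" to the int 30
person = PersonInfo(name="John Smith", age="30", occupation="software engineer")
print(person.age)  # 30

# A value that cannot be parsed as an int raises ValidationError
try:
    PersonInfo(name="John Smith", age="thirty", occupation="software engineer")
    rejected = False
except ValidationError as err:
    rejected = True
    print(err.errors()[0]["loc"])  # ('age',)
```

This is the same validation that guards the agent's output: a type mismatch surfaces as an error instead of silently reaching your code.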
Use descriptive field names and detailed descriptions. The LLM uses these to understand what information to extract!
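The descriptions matter because they end up in the JSON Schema generated from the model. Strands derives the definition it sends to the LLM from the Pydantic model; the exact wire format is internal to the SDK, but Pydantic v2's model_json_schema() shows the kind of information available to the model.

```python
from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    """A Pydantic model to represent structured information about a person."""

    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person.")
    occupation: str = Field(..., description="The current occupation of the person.")


schema = PersonInfo.model_json_schema()

# Each property carries its JSON type and the description written in Field(...)
for field_name, prop in schema["properties"].items():
    print(f"{field_name}: {prop['type']} - {prop['description']}")
```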
import os

from strands import Agent
from strands.models.litellm import LiteLLMModel


def main():
    """
    Main function to demonstrate structured data extraction.
    """
    # Configure the language model (the API key is read from the environment)
    model = LiteLLMModel(
        client_args={"api_key": os.getenv("NEBIUS_API_KEY")},
        model_id="nebius/zai-org/GLM-4.5",
    )

    # Create the data extraction agent
    agent = Agent(
        model=model,
        system_prompt="You are an expert assistant that extracts structured information about people from text based on the provided schema.",
        structured_output_model=PersonInfo,
    )
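The structured_output_model parameter tells the agent to return its answer in the PersonInfo format: the result of each call exposes a PersonInfo instance through its structured_output attribute, as the rest of main() shows.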
    # Unstructured text containing the information we want to extract
    text_to_process = (
        "John Smith is a 30-year-old software engineer living in San Francisco."
    )

    print("--- Extracting information from text ---\n")
    print(f'Input Text: "{text_to_process}"\n')

    # Use the agent to extract the data
    try:
        result = agent(text_to_process)
        person_info: PersonInfo = result.structured_output

        print("--- Extraction Successful ---")
        print(f"Name: {person_info.name}")
        print(f"Age: {person_info.age}")
        print(f"Occupation: {person_info.occupation}")
    except Exception as e:
        print("--- Extraction Failed ---")
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    main()
--- Extracting information from text ---

Input Text: "John Smith is a 30-year-old software engineer living in San Francisco."

--- Extraction Successful ---
Name: John Smith
Age: 30
Occupation: software engineer
The agent successfully extracted all three fields and returned a validated PersonInfo object!
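Since the result is a regular Pydantic object, turning it into the JSON mentioned at the start of the lesson takes one call. The sketch below builds the same PersonInfo by hand (no LLM involved) as a stand-in for result.structured_output, and uses Pydantic v2's model_dump() and model_dump_json().

```python
from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(..., description="The age of the person.")
    occupation: str = Field(..., description="The current occupation of the person.")


# Hand-built stand-in for result.structured_output from the agent run above
person_info = PersonInfo(name="John Smith", age=30, occupation="software engineer")

as_dict = person_info.model_dump()        # plain Python dict
as_json = person_info.model_dump_json()   # compact JSON string
print(as_json)  # {"name":"John Smith","age":30,"occupation":"software engineer"}
```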
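To make a field optional, wrap its type in Optional and give it a default of None, so the model can leave it empty when the text doesn't mention it:

from typing import Optional

from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: Optional[int] = Field(None, description="The age of the person if mentioned.")
    occupation: str = Field(..., description="The current occupation of the person.")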
You can also give fields default values, which Pydantic fills in when the model omits them:

class PersonInfo(BaseModel):
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(0, description="The age of the person.")
    country: str = Field("Unknown", description="The country of residence.")
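To see what each variant yields when a value is missing, you can validate a payload that omits the field, roughly what happens when the LLM leaves it out. The class names OptionalAgePerson and DefaultsPerson are hypothetical, chosen only so both variants fit in one file; the payloads are hand-written, not model output.

```python
from typing import Optional

from pydantic import BaseModel, Field


class OptionalAgePerson(BaseModel):  # hypothetical name for the Optional variant
    name: str = Field(..., description="The full name of the person.")
    age: Optional[int] = Field(None, description="The age of the person if mentioned.")
    occupation: str = Field(..., description="The current occupation of the person.")


class DefaultsPerson(BaseModel):  # hypothetical name for the defaults variant
    name: str = Field(..., description="The full name of the person.")
    age: int = Field(0, description="The age of the person.")
    country: str = Field("Unknown", description="The country of residence.")


# Payloads that omit the optional / defaulted fields
a = OptionalAgePerson.model_validate({"name": "John", "occupation": "developer"})
b = DefaultsPerson.model_validate({"name": "John"})

print(a.age)             # None
print(b.age, b.country)  # 0 Unknown
```

None keeps "not mentioned" distinguishable from a real value, whereas a default of 0 can be mistaken for an actual age downstream.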
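To extract several people at once, wrap the model in a container model with a list field:

from typing import List


class PeopleList(BaseModel):
    people: List[PersonInfo]


text = """In our team, we have Alice (28, designer), Bob (35, developer), and Carol (42, project manager)."""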
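To check the shape the agent must produce for this text, you can validate a hand-written payload against PeopleList. With the lesson's setup you would pass structured_output_model=PeopleList to the Agent instead of PersonInfo; the payload below is typed in by hand, not real model output, and the Field descriptions are omitted for brevity.

```python
from typing import List

from pydantic import BaseModel


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str


class PeopleList(BaseModel):
    people: List[PersonInfo]


# Hand-written stand-in for what the LLM should return for the team text
payload = {
    "people": [
        {"name": "Alice", "age": 28, "occupation": "designer"},
        {"name": "Bob", "age": 35, "occupation": "developer"},
        {"name": "Carol", "age": 42, "occupation": "project manager"},
    ]
}

team = PeopleList.model_validate(payload)
for person in team.people:
    print(f"{person.name} ({person.age}): {person.occupation}")
```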
Experiment 2: Add More Fields
Extend the PersonInfo model:
from typing import Optional

from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str
    location: str = Field(..., description="City or country of residence")
    years_experience: Optional[int] = Field(None, description="Years in current role")
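Mixing required and optional fields changes which keys the LLM must fill. In Pydantic v2's generated schema, only fields without a default appear under "required", so location must be extracted while years_experience may be left out.

```python
from typing import Optional

from pydantic import BaseModel, Field


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str
    location: str = Field(..., description="City or country of residence")
    years_experience: Optional[int] = Field(None, description="Years in current role")


required = PersonInfo.model_json_schema()["required"]
print(required)  # ['name', 'age', 'occupation', 'location']
```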
Experiment 3: Extract from Real Data
Try extracting from a resume or LinkedIn profile:
resume_text = """
Jane Doe
Senior Data Scientist
[email protected] | (555) 123-4567
Experience: 8 years in machine learning and data analysis
Skills: Python, TensorFlow, SQL, AWS
Education: PhD in Computer Science, MIT
"""
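A resume contains list-like data (skills) that PersonInfo cannot hold, so it needs its own schema. One possible shape is sketched below; ResumeInfo and its field names are hypothetical choices for this experiment, and the payload is hand-written to show what the agent would need to produce. To try it with the agent, pass structured_output_model=ResumeInfo.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class ResumeInfo(BaseModel):  # hypothetical schema for the resume experiment
    name: str = Field(..., description="Full name of the candidate.")
    title: str = Field(..., description="Current job title.")
    years_experience: Optional[int] = Field(None, description="Total years of experience.")
    skills: List[str] = Field(default_factory=list, description="Technical skills listed.")
    education: Optional[str] = Field(None, description="Highest degree and institution.")


# Hand-written payload matching the resume text above (not real model output)
resume = ResumeInfo.model_validate(
    {
        "name": "Jane Doe",
        "title": "Senior Data Scientist",
        "years_experience": 8,
        "skills": ["Python", "TensorFlow", "SQL", "AWS"],
        "education": "PhD in Computer Science, MIT",
    }
)
print(resume.skills)
```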
Experiment 4: Handle Missing Data
Test with incomplete information:
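text = "John is a developer"  # Missing age
# The agent will try to extract what it can. With a required `age: int`, the model
# may guess a value or the extraction may fail; declaring age as Optional[int]
# lets it return None instead.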
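With a required age: int, an honest answer that leaves age out does not validate. This offline sketch (Pydantic v2, no LLM call, hand-written payload) shows the error Pydantic reports, which is why the Optional[int] variant from earlier suits text that may lack an age.

```python
from pydantic import BaseModel, ValidationError


class PersonInfo(BaseModel):
    name: str
    age: int  # required, as in the lesson's main example
    occupation: str


# What the LLM might return for "John is a developer" if it omits age
payload = {"name": "John", "occupation": "developer"}

try:
    PersonInfo.model_validate(payload)
    error_type = None
except ValidationError as err:
    error_type = err.errors()[0]["type"]

print(error_type)  # missing
```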
For stricter checks, add a Pydantic validator to the model:

from pydantic import BaseModel, Field, field_validator


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        # Reject ages outside a plausible human range
        if v < 0 or v > 120:
            raise ValueError("Age must be between 0 and 120")
        return v
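You can exercise the validator without an LLM. In this Pydantic v2 sketch, a valid age passes and 150 is rejected with the validator's message. How Strands reacts when the LLM itself returns such a value is SDK behavior this lesson doesn't cover, so check the Strands documentation for your version.

```python
from pydantic import BaseModel, ValidationError, field_validator


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str

    @field_validator("age")
    @classmethod
    def validate_age(cls, v: int) -> int:
        # Reject ages outside a plausible human range
        if v < 0 or v > 120:
            raise ValueError("Age must be between 0 and 120")
        return v


ok = PersonInfo(name="John Smith", age=30, occupation="software engineer")

try:
    PersonInfo(name="John Smith", age=150, occupation="software engineer")
    message = None
except ValidationError as err:
    message = err.errors()[0]["msg"]

print(message)  # Value error, Age must be between 0 and 120
```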
Then handle extraction errors explicitly:

try:
    result = agent(text_to_process)
    person_info = result.structured_output
    # Use the extracted data
except ValueError as e:
    print(f"Validation error: {e}")
except Exception as e:
    print(f"Extraction failed: {e}")
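The except ValueError branch works for Pydantic failures because, in Pydantic v2, ValidationError is a subclass of ValueError. Whether Strands re-raises Pydantic's error directly or wraps it in its own exception type is not covered here, so the broad except Exception fallback is worth keeping.

```python
from pydantic import BaseModel, ValidationError


class PersonInfo(BaseModel):
    name: str
    age: int
    occupation: str


# ValidationError inherits from ValueError, so a ValueError handler catches it
print(issubclass(ValidationError, ValueError))  # True

try:
    PersonInfo(name="John", age="unknown", occupation="developer")
except ValueError as e:
    caught = type(e).__name__

print(caught)  # ValidationError
```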
You can now extract structured data from text! But what if you want to give your agent access to external tools and services? In the next lesson, you’ll learn how to integrate MCP (Model Context Protocol) servers to dramatically expand your agent’s capabilities.