
Overview

The Summary Agent generates 1-2 sentence summaries of emergency calls to help dispatchers quickly understand the situation. It uses OpenAI's GPT models when an API key is available and falls back to a simple extractive heuristic otherwise.

Source Code

Location: app/agents/summary.py

Public API

generate_summary()

async def generate_summary(
    transcript: str, 
    category: str, 
    tags: list[str]
) -> str:
    """
    Generate a concise dispatcher-friendly summary.
    Uses OpenAI API if available, falls back to heuristic summary otherwise.
    
    Args:
        transcript: Full text from speech-to-text
        category: Service category (EMS, FIRE, POLICE, OTHER)
        tags: Semantic tags like ["CARDIAC_EVENT", "TRAUMA"]
        
    Returns:
        A 1-2 sentence summary of the emergency situation
    """
See source code at lines 41-47.

Usage Examples

Basic Usage

from app.agents.summary import generate_summary

summary = await generate_summary(
    transcript="My dad is clutching his chest and saying he can't breathe. He's 65 and has a history of heart problems.",
    category="EMS",
    tags=["CARDIAC_EVENT", "BREATHING_DIFFICULTY"]
)

print(summary)
# "Caller reports 65-year-old father experiencing chest pain and difficulty breathing with cardiac history."

Fire Emergency

summary = await generate_summary(
    transcript="There's smoke coming from the apartment above mine. I can smell it and hear the alarm going off. I don't know if anyone's up there.",
    category="FIRE",
    tags=["SMOKE", "FIRE"]
)

print(summary)
# "Caller reports smoke and fire alarm from upstairs apartment with unknown occupancy."

Police Emergency

summary = await generate_summary(
    transcript="Someone's breaking into my neighbor's house. I can see them through the window. They just broke the glass on the back door.",
    category="POLICE",
    tags=["BURGLARY"]
)

print(summary)
# "Caller reports burglary in progress at neighbor's residence with suspect visible breaking glass."

Integration with Pipeline

from app.agents.service_classify import classify_service_and_tags
from app.agents.summary import generate_summary

async def analyze_call(transcript: str, distress: float):
    # Get service classification first
    service = classify_service_and_tags(transcript, distress)
    
    # Generate summary using classification results
    summary = await generate_summary(
        transcript=transcript,
        category=service['category'],
        tags=service['tags']
    )
    
    return {
        "service": service['category'],
        "tags": service['tags'],
        "summary": summary
    }
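
Invoking the pipeline end to end looks like the following. This is a self-contained sketch: the two agent functions are stubbed with fixed behavior so it runs without the app package, and the stub bodies are illustrative, not the real logic.

```python
import asyncio

# Stubs standing in for app.agents; the real functions do the actual work.
def classify_service_and_tags(transcript: str, distress: float) -> dict:
    return {"category": "FIRE", "tags": ["SMOKE", "FIRE"]}

async def generate_summary(transcript: str, category: str, tags: list[str]) -> str:
    return transcript.split(".")[0] + "."

async def analyze_call(transcript: str, distress: float) -> dict:
    service = classify_service_and_tags(transcript, distress)
    summary = await generate_summary(transcript, service["category"], service["tags"])
    return {"service": service["category"], "tags": service["tags"], "summary": summary}

result = asyncio.run(analyze_call("Smoke is filling the kitchen. Please hurry.", distress=0.9))
print(result)
# {'service': 'FIRE', 'tags': ['SMOKE', 'FIRE'], 'summary': 'Smoke is filling the kitchen.'}
```

With the real imports in place, only the two stub definitions change; `analyze_call` stays the same.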

OpenAI Implementation

Prompt Engineering

prompt = (
    "You are an emergency dispatcher assistant. "
    "Summarize the caller's situation in 1–2 clear, factual sentences. "
    "Avoid speculation. Include critical details. "
    f"Category: {category}. Tags: {', '.join(tags)}.\n\n"
    f"Transcript:\n{transcript}"
)
See lines 56-62.

API Request

resp = await _client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=80,      # Keep summaries concise
    temperature=0,      # Deterministic output
)

content = resp.choices[0].message.content
return content.strip()
See lines 65-72.

Model Configuration

# Default model
model="gpt-4o-mini"  # Fast and cost-effective

# Alternative models
# model="gpt-4o"      # More accurate but slower/expensive
# model="gpt-5-nano"  # Fastest, lower quality
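
If the model should be selectable at deploy time, one option (an assumption on our part; the shipped code hardcodes `gpt-4o-mini`, and `SUMMARY_MODEL` is a hypothetical variable) is to read it from the environment:

```python
import os

# Read the model name from the environment, defaulting to the documented
# gpt-4o-mini. SUMMARY_MODEL is illustrative, not part of the app's config.
SUMMARY_MODEL = os.getenv("SUMMARY_MODEL", "gpt-4o-mini")
```

The `chat.completions.create` call would then pass `model=SUMMARY_MODEL` instead of the literal.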

Heuristic Fallback

When OpenAI is unavailable, a simple extractive summary is used:

import re

def heuristic_summary(transcript: str) -> str:
    """
    Safe fallback when no OpenAI API key is available.
    Extracts a short clean first-sentence summary.
    """
    if not transcript:
        return "No transcript available."
    
    text = transcript.strip()
    
    # normalize whitespace
    text = re.sub(r"\s+", " ", text)
    
    # find first sentence
    sentences = re.split(r"[.?!]", text)
    first_sentence = sentences[0].strip()
    
    if not first_sentence:
        return text[:120] + "..."
    
    # cap length
    if len(first_sentence) > 200:
        return first_sentence[:200] + "..."
    
    return first_sentence
See lines 14-38. Example:
# Input transcript
"My mom fell down the stairs and she's not responding. She hit her head really hard. There's some blood. I don't know what to do."

# Heuristic output (first sentence)
"My mom fell down the stairs and she's not responding"

Configuration

Environment Variables

# OpenAI API Key (required for AI summaries)
OPENAI_API_KEY=sk-...

Client Initialization

import os
from typing import Optional

from openai import AsyncOpenAI

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Create client once if API key exists
_client: Optional[AsyncOpenAI] = None
if OPENAI_API_KEY:
    _client = AsyncOpenAI(api_key=OPENAI_API_KEY)
See lines 6-11.

Runtime Refresh

def refresh_client() -> bool:
    """
    Refresh the OpenAI client if API key is now available.
    
    Useful for runtime configuration changes.
    
    Returns:
        True if client was successfully created, False otherwise
    """
    global _client
    api_key = os.getenv("OPENAI_API_KEY")
    if api_key and not _client:
        _client = AsyncOpenAI(api_key=api_key)
        return True
    return False
See lines 86-100.

Error Handling

Gracefully falls back to heuristic on any failure:
try:
    resp = await _client.chat.completions.create(...)
    content = resp.choices[0].message.content
    
    # Handle unlikely edge case of None content
    if content is None:
        print("[summary] OpenAI returned None content, using heuristic")
        return heuristic_summary(transcript)
    
    return content.strip()

except Exception as e:
    print(f"[summary] OpenAI failed: {e}")
    return heuristic_summary(transcript)
See lines 74-83.

Fallback Behavior

  1. No API Key: Use heuristic from the start
  2. API Error: Catch exception, use heuristic
  3. Timeout: After 10s with no response, use heuristic
  4. Empty Response: Use heuristic
  5. Malformed JSON: Use heuristic
The system always returns a summary, even if it’s just the first sentence of the transcript. Dispatchers never see empty states.

Summary Quality

OpenAI Summaries

Strengths:
  • Extracts key details (age, symptoms, location)
  • Rephrases for clarity
  • Removes filler words and repetition
  • Focuses on actionable information
Example:
Transcript: "Um, so like, my neighbor, I think he's having some kind of emergency. He's like clutching his chest and he looks really pale and sweaty. He's probably in his 60s I'd say."

Summary: "Caller reports 60-year-old neighbor experiencing chest pain with pale appearance and sweating."

Heuristic Summaries

Strengths:
  • Fast (< 1ms)
  • Free (no API costs)
  • Reliable (no network dependency)
Limitations:
  • May include filler words
  • Doesn’t extract key details
  • Sometimes cuts off mid-thought
Example:
Transcript: "Um, so like, my neighbor, I think he's having some kind of emergency."

Summary: "Um, so like, my neighbor, I think he's having some kind of emergency"
For development and testing, the heuristic fallback is sufficient. For production with real dispatchers, use OpenAI for higher quality summaries.

Performance

Latency

| Method    | Typical     | Max   |
|-----------|-------------|-------|
| OpenAI    | 500-1500 ms | 2-3 s |
| Heuristic | < 1 ms      | 5 ms  |

Cost

OpenAI pricing (gpt-4o-mini):
  • Input: ~$0.00015 per call (100 tokens)
  • Output: ~$0.00006 per summary (40 tokens)
  • Total: ~$0.0002 per call (~$0.20 per 1,000 calls)

Token Usage

max_tokens=80  # Typical summary is 30-60 tokens
Example token counts:
  • “Caller reports shooting victim with severe bleeding.” = ~8 tokens
  • “Caller reports 65-year-old father experiencing chest pain and difficulty breathing with cardiac history.” = ~16 tokens
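
For rough budgeting without calling the tokenizer, a common approximation is about four characters per token for English prose. It overestimates short dispatcher summaries (tiktoken gives the exact counts quoted above), but it is fine for capacity planning:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

print(estimate_tokens("Caller reports shooting victim with severe bleeding."))
# 13 by this rule; the exact tokenizer count quoted above is ~8
```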

Testing

Unit Tests

import pytest
from app.agents.summary import generate_summary, heuristic_summary

@pytest.mark.asyncio
async def test_generate_summary_medical():
    summary = await generate_summary(
        transcript="My dad is having chest pain and can't breathe",
        category="EMS",
        tags=["CARDIAC_EVENT", "BREATHING_DIFFICULTY"]
    )
    assert len(summary) > 0
    assert len(summary) < 300  # Should be concise

def test_heuristic_summary():
    summary = heuristic_summary(
        "Someone's been shot. They're bleeding badly. We need help now!"
    )
    assert summary == "Someone's been shot"
    assert len(summary) <= 200

def test_heuristic_summary_long():
    long_text = "This is a very long sentence that goes on and on and on for a really long time with lots of unnecessary details about the situation and it just keeps going and going without stopping for quite a while"
    summary = heuristic_summary(long_text)
    assert len(summary) <= 203  # 200 + "..."
    assert summary.endswith("...")

def test_heuristic_summary_empty():
    summary = heuristic_summary("")
    assert summary == "No transcript available."

Integration Tests

@pytest.mark.asyncio
async def test_summary_with_service_classify():
    """Test summary generation using real service classification"""
    from app.agents.service_classify import classify_service_and_tags
    
    transcript = "There's a fire in my kitchen and smoke everywhere"
    
    # Classify first
    service = classify_service_and_tags(transcript, distress=0.9)
    
    # Generate summary with classification context
    summary = await generate_summary(
        transcript=transcript,
        category=service['category'],
        tags=service['tags']
    )
    
    assert service['category'] == 'FIRE'
    assert 'fire' in summary.lower() or 'smoke' in summary.lower()
    assert len(summary) > 0

Mocking OpenAI

from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio
async def test_summary_openai_failure():
    """Test fallback when OpenAI fails"""
    with patch('app.agents.summary._client') as mock_client:
        mock_client.chat.completions.create = AsyncMock(
            side_effect=Exception("API error")
        )
        
        summary = await generate_summary(
            transcript="Emergency situation here",
            category="EMS",
            tags=[]
        )
        
        # Should fall back to heuristic
        assert summary == "Emergency situation here"

Best Practices

1. Provide Classification Context

Always pass category and tags for better summaries:
# Good
summary = await generate_summary(
    transcript=text,
    category="EMS",
    tags=["CARDIAC_EVENT"]
)

# Bad (missing context)
summary = await generate_summary(
    transcript=text,
    category="OTHER",
    tags=[]
)

2. Handle Empty Transcripts

if not transcript or len(transcript) < 10:
    return "Transcript too short for summary"

3. Cache Summaries

import hashlib

# functools.lru_cache does not work with async functions (it would cache the
# coroutine object, which can only be awaited once), so use an explicit dict.
_summary_cache: dict[tuple[str, str, str], str] = {}

async def cached_summary(transcript: str, category: str, tags: list[str]) -> str:
    key = (
        hashlib.md5(transcript.encode()).hexdigest(),
        category,
        ",".join(sorted(tags)),
    )
    if key not in _summary_cache:
        _summary_cache[key] = await generate_summary(transcript, category, tags)
    return _summary_cache[key]

# Usage
summary = await cached_summary(transcript, category, tags)

4. Monitor API Usage

import time

start = time.time()
summary = await generate_summary(...)
latency = time.time() - start

if latency > 3.0:
    print(f"[summary] Slow API response: {latency:.2f}s")

Next Steps

  • Service Classification: get category and tags for summaries
  • NLP Track: the complete text-processing pipeline
  • Pipeline: how agents work together
