
Overview

This tutorial walks you through building a complete TypeAgent application in two parts:
  1. Ingesting messages into a conversation database
  2. Querying the indexed content using natural language
By the end, you’ll understand the core TypeAgent workflow and be ready to build more complex applications.

Prerequisites

Before starting, ensure you have:
  • Python 3.12 or later installed
  • An OpenAI API key
  • TypeAgent installed (pip install typeagent)
If you need help with installation, see the Installation Guide.

Part 1: Ingesting Messages

Let’s build a simple program that ingests conversation messages and indexes them for querying.
Step 1: Create Sample Data

Create a file named testdata.txt with sample conversation content:
testdata.txt
STEVE We should really make a Python library for Structured RAG.
UMESH Who would be a good person to do the Python library?
GUIDO I volunteer to do the Python library. Give me a few months.
Each line follows the format: SPEAKER message text
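As a quick sanity check, this is the split the ingestion script below performs on each line (plain Python, no TypeAgent required):

```python
# Each line is split once on whitespace: the first token is the speaker,
# the remainder is the message text.
line = "GUIDO I volunteer to do the Python library. Give me a few months."
speaker, text = line.split(None, 1)
print(speaker)  # GUIDO
print(text)     # I volunteer to do the Python library. Give me a few months.
```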
Step 2: Create Ingestion Script

Create a file named ingest.py:
ingest.py
from dotenv import load_dotenv

from typeagent import create_conversation
from typeagent.transcripts.transcript import (
    TranscriptMessage,
    TranscriptMessageMeta,
)

load_dotenv()  # Load API keys from .env file


def read_messages(filename: str) -> list[TranscriptMessage]:
    """Parse text file into TranscriptMessage objects."""
    messages: list[TranscriptMessage] = []
    with open(filename, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # Skip blank lines
            # Split line into speaker and text
            speaker, text_chunk = line.split(None, 1)
            message = TranscriptMessage(
                text_chunks=[text_chunk],
                metadata=TranscriptMessageMeta(speaker=speaker),
            )
            messages.append(message)
    return messages


async def main():
    # Create a conversation with SQLite storage
    conversation = await create_conversation("demo.db", TranscriptMessage)
    
    # Read and index messages
    messages = read_messages("testdata.txt")
    print(f"Indexing {len(messages)} messages...")
    
    # Add messages with automatic knowledge extraction and indexing
    results = await conversation.add_messages_with_indexing(messages)
    
    print(f"Indexed {results.messages_added} messages.")
    print(f"Got {results.semrefs_added} semantic refs.")


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Step 3: Set Up Environment

Create a .env file with your OpenAI credentials:
.env
OPENAI_API_KEY=your-api-key-here
OPENAI_MODEL=gpt-4o
Never commit your .env file to version control! Add it to .gitignore.
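It can also help to fail fast when the key is missing before any indexing work starts. A minimal sketch (the `check_env` helper is hypothetical, not part of TypeAgent):

```python
# Hypothetical helper: verify the API key is configured.
# In a real script, pass os.environ after load_dotenv() has run.
def check_env(env: dict) -> bool:
    """Return True when OPENAI_API_KEY is present and non-empty."""
    return bool(env.get("OPENAI_API_KEY"))

print(check_env({"OPENAI_API_KEY": "sk-test"}))  # True
print(check_env({}))                             # False
```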
Step 4: Run Ingestion

Execute the ingestion script:
python ingest.py
Expected output:
0.027s -- Using OpenAI
Indexing 3 messages...
Indexed 3 messages.
Got 24 semantic refs.
The demo.db file now contains your indexed conversation! TypeAgent extracted entities, topics, and relationships automatically.

Understanding the Ingestion Code

Let’s break down the key components:
# Create conversation with SQLite persistence
conversation = await create_conversation("demo.db", TranscriptMessage)

# For in-memory testing (no persistence):
conversation = await create_conversation(None, TranscriptMessage)

Part 2: Querying the Conversation

Now let’s query the indexed content using natural language.
Step 1: Create Query Script

Create a file named query.py:
query.py
from dotenv import load_dotenv

from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage

load_dotenv()


async def main():
    # Connect to existing conversation database
    conversation = await create_conversation("demo.db", TranscriptMessage)
    
    # Ask a question
    question = "Who volunteered to do the python library?"
    print("Q:", question)
    
    # Query using natural language
    answer = await conversation.query(question)
    print("A:", answer)


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Step 2: Run Query

Execute the query script:
python query.py
Expected output:
0.019s -- Using OpenAI
Q: Who volunteered to do the python library?
A: Guido volunteered to do the Python library.
The answer is generated from your indexed content, not from the LLM’s training data!

Understanding Query Results

The query() method:
  • Translates your natural language question into structured searches
  • Queries multiple indexes in parallel (entities, topics, semantic similarity)
  • Fuses results and ranks them by relevance
  • Generates an answer grounded in your indexed content
If no relevant content is found, the response starts with "No answer found:"
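You can branch on that prefix in your own code; a minimal sketch (the `has_answer` helper is illustrative, not part of TypeAgent):

```python
def has_answer(response: str) -> bool:
    """True unless the response carries the 'No answer found:' prefix."""
    return not response.startswith("No answer found:")

print(has_answer("Guido volunteered to do the Python library."))  # True
print(has_answer("No answer found: no relevant content."))        # False
```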

Complete Working Example

Here’s an interactive demo that combines ingestion and querying:
demo.py
#!/usr/bin/env python3
import asyncio

from dotenv import load_dotenv

from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage, TranscriptMessageMeta

load_dotenv()


async def main():
    """Interactive TypeAgent demo."""
    print("Creating conversation...")
    conv = await create_conversation(
        None,  # In-memory for this demo
        TranscriptMessage,
        name="Demo Conversation",
    )

    # Add sample messages about Python
    messages = [
        TranscriptMessage(
            text_chunks=["Welcome to the Python programming tutorial."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=["Today we'll learn about async/await in Python."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=["Python is a great language for beginners and experts alike."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=["The async keyword is used to define asynchronous functions."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=[
                "You use await to wait for asynchronous operations to complete."
            ],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
    ]

    print("Adding messages and building indexes...")
    result = await conv.add_messages_with_indexing(messages)
    print(f"Conversation ready with {await conv.messages.size()} messages.")
    print(
        f"Added {result.messages_added} messages, {result.semrefs_added} semantic refs"
    )
    print()

    # Interactive query loop
    print("You can now ask questions about the conversation.")
    print("Type 'quit' or 'exit' to stop.\n")

    while True:
        try:
            question: str = input("typeagent> ")
            if not question.strip():
                continue
            if question.strip().lower() in ("quit", "exit", "q"):
                break

            # Query the conversation
            answer: str = await conv.query(question)
            print(answer)
            print()

        except EOFError:
            print()
            break
        except KeyboardInterrupt:
            print("\nExiting...")
            break


if __name__ == "__main__":
    asyncio.run(main())
Run it:
python demo.py
Try questions like:
  • “What is this tutorial about?”
  • “What does the async keyword do?”
  • “Who is the speaker?”

Advanced: Working with Different Message Types

TypeAgent supports message types beyond simple transcripts. ConversationMessage is the base type, with full flexibility:
from typeagent.knowpro.universal_message import (
    ConversationMessage,
    ConversationMessageMeta,
)

message = ConversationMessage(
    text_chunks=["First part", "Second part"],
    tags=["important", "meeting"],
    timestamp="2026-03-06T15:30:00Z",
    metadata=ConversationMessageMeta(
        speaker="Alice",
        recipients=["Bob", "Charlie"],
    ),
)
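The timestamp in the example above is an ISO 8601 string; Python's standard library parses it directly (this is an illustration, not a TypeAgent requirement):

```python
from datetime import datetime

# ISO 8601 timestamps parse with datetime.fromisoformat.
# Python 3.11+ also accepts a trailing "Z"; "+00:00" works everywhere.
ts = datetime.fromisoformat("2026-03-06T15:30:00+00:00")
print(ts.year, ts.hour)  # 2026 15
```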

Database Storage Options

TypeAgent offers flexible storage backends:
# Data persists across program runs
conv = await create_conversation("my_data.db", TranscriptMessage)

# In-memory only (no file written; data is lost when the program exits)
conv = await create_conversation(None, TranscriptMessage)

Query Patterns

Here are common query patterns that work well with TypeAgent:
# Find what specific people said or did
await conv.query("What did Guido say about Python?")
await conv.query("Who volunteered for the project?")

# Search by subject or theme
await conv.query("What was discussed about async programming?")
await conv.query("Tell me about the budget conversation")

# Find decisions, actions, or commitments
await conv.query("What decisions were made?")
await conv.query("Who agreed to lead the initiative?")

# Time-based filtering (requires timestamp metadata)
await conv.query("What happened in March?")
await conv.query("Recent discussions about the API")

Error Handling

Handle common scenarios gracefully:
async def safe_query(conversation, question: str) -> str | None:
    """Query with error handling; returns None when no answer is available."""
    try:
        answer = await conversation.query(question)

        # Check for "no answer" responses
        if answer.startswith("No answer found:"):
            print(f"Could not find relevant information for: {question}")
            return None

        return answer

    except Exception as e:
        print(f"Query failed: {e}")
        return None

Performance Tips

Batch Indexing: Add multiple messages at once for better performance:
# Good: Single batch
results = await conv.add_messages_with_indexing(all_messages)

# Avoid: Multiple single calls
for msg in all_messages:
    await conv.add_messages_with_indexing([msg])  # Slower!
Reuse Conversations: Create the conversation once and reuse it:
# Good: Reuse conversation object
conv = await create_conversation("demo.db", TranscriptMessage)
for question in questions:
    await conv.query(question)

# Avoid: Recreating unnecessarily
for question in questions:
    conv = await create_conversation("demo.db", TranscriptMessage)  # Wasteful!
    await conv.query(question)

Troubleshooting

Queries return "No answer found"
Cause: Insufficient indexed content or poor message structure.
Solution:
  • Ensure messages have clear speaker metadata
  • Add more context in text_chunks
  • Try rephrasing the query
Indexing is slow
Cause: LLM API latency or large batch sizes.
Solution:
  • Process messages in smaller batches (100-500 at a time)
  • Use faster models like gpt-4o-mini for testing
  • Consider parallel processing for large datasets
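The chunking itself is straightforward; in this sketch add_messages_with_indexing is stubbed so it runs standalone, and in real code you would call it on your conversation object:

```python
import asyncio

# Stub standing in for conversation.add_messages_with_indexing;
# it just reports how many messages it received.
async def add_messages_with_indexing(batch: list) -> int:
    return len(batch)

async def ingest_in_chunks(messages: list, chunk_size: int = 200) -> int:
    """Index messages in fixed-size chunks instead of one huge batch."""
    total = 0
    for i in range(0, len(messages), chunk_size):
        total += await add_messages_with_indexing(messages[i:i + chunk_size])
    return total

print(asyncio.run(ingest_in_chunks(list(range(450)))))  # 450
```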
"Database is locked" errors
Cause: Multiple processes accessing the same SQLite database.
Solution:
  • Use in-memory storage (None) for parallel testing
  • Implement file locking if needed
  • Consider a client-server database for production

Next Steps

Now that you’ve built your first TypeAgent application, explore more advanced features:
  • API Reference: dive deep into the complete API documentation
  • Email Ingestion: learn how to index email conversations
  • Knowledge Extraction: understand how AI extracts structured knowledge
  • Advanced Queries: master complex query patterns

You’re now ready to build powerful knowledge processing applications with TypeAgent!
