
Overview

This tutorial walks you through building a complete TypeAgent application in two parts:
  1. Ingesting messages into a conversation database
  2. Querying the indexed content using natural language
By the end, you’ll understand the core TypeAgent workflow and be ready to build more complex applications.

Prerequisites

Before starting, ensure you have:
  • Python 3.12 or later installed
  • An OpenAI API key
  • TypeAgent installed (pip install typeagent)
If you need help with installation, see the Installation Guide.

Part 1: Ingesting Messages

Let’s build a simple program that ingests conversation messages and indexes them for querying.
Step 1: Create Sample Data

Create a file named testdata.txt with sample conversation content:
testdata.txt
STEVE We should really make a Python library for Structured RAG.
UMESH Who would be a good person to do the Python library?
GUIDO I volunteer to do the Python library. Give me a few months.
Each line follows the format: SPEAKER message text
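As a quick sanity check, this is the split the ingestion script below performs on each line (plain Python, no TypeAgent required):

```python
# Each line is split once on whitespace: the first token is the speaker,
# the remainder is the message text.
line = "GUIDO I volunteer to do the Python library. Give me a few months."
speaker, text = line.split(None, 1)
print(speaker)  # GUIDO
print(text)     # I volunteer to do the Python library. Give me a few months.
```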
Step 2: Create Ingestion Script

Create a file named ingest.py:
ingest.py
from dotenv import load_dotenv

from typeagent import create_conversation
from typeagent.transcripts.transcript import (
    TranscriptMessage,
    TranscriptMessageMeta,
)

load_dotenv()  # Load API keys from .env file


def read_messages(filename: str) -> list[TranscriptMessage]:
    """Parse text file into TranscriptMessage objects."""
    messages: list[TranscriptMessage] = []
    with open(filename, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # Skip blank lines
            # Split line into speaker and text
            speaker, text_chunk = line.split(None, 1)
            message = TranscriptMessage(
                text_chunks=[text_chunk],
                metadata=TranscriptMessageMeta(speaker=speaker),
            )
            messages.append(message)
    return messages


async def main():
    # Create a conversation with SQLite storage
    conversation = await create_conversation("demo.db", TranscriptMessage)
    
    # Read and index messages
    messages = read_messages("testdata.txt")
    print(f"Indexing {len(messages)} messages...")
    
    # Add messages with automatic knowledge extraction and indexing
    results = await conversation.add_messages_with_indexing(messages)
    
    print(f"Indexed {results.messages_added} messages.")
    print(f"Got {results.semrefs_added} semantic refs.")


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Step 3: Set Up Environment

Create a .env file with your OpenAI credentials:
.env
OPENAI_API_KEY=your-api-key-here
OPENAI_MODEL=gpt-4o
Never commit your .env file to version control! Add it to .gitignore.
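It can also help to fail fast when the key is missing before any indexing work starts. A minimal sketch (the `check_env` helper is hypothetical, not part of TypeAgent):

```python
# Hypothetical helper: verify the API key is configured.
# In a real script, pass os.environ after load_dotenv() has run.
def check_env(env: dict) -> bool:
    """Return True when OPENAI_API_KEY is present and non-empty."""
    return bool(env.get("OPENAI_API_KEY"))

print(check_env({"OPENAI_API_KEY": "sk-test"}))  # True
print(check_env({}))                             # False
```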
Step 4: Run Ingestion

Execute the ingestion script:
python ingest.py
Expected output:
0.027s -- Using OpenAI
Indexing 3 messages...
Indexed 3 messages.
Got 24 semantic refs.
The demo.db file now contains your indexed conversation! TypeAgent extracted entities, topics, and relationships automatically.

Understanding the Ingestion Code

Let’s break down the key components:
# Create conversation with SQLite persistence
conversation = await create_conversation("demo.db", TranscriptMessage)

# For in-memory testing (no persistence):
conversation = await create_conversation(None, TranscriptMessage)

Part 2: Querying the Conversation

Now let’s query the indexed content using natural language.
Step 1: Create Query Script

Create a file named query.py:
query.py
from dotenv import load_dotenv

from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage

load_dotenv()


async def main():
    # Connect to existing conversation database
    conversation = await create_conversation("demo.db", TranscriptMessage)
    
    # Ask a question
    question = "Who volunteered to do the python library?"
    print("Q:", question)
    
    # Query using natural language
    answer = await conversation.query(question)
    print("A:", answer)


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
Step 2: Run Query

Execute the query script:
python query.py
Expected output:
0.019s -- Using OpenAI
Q: Who volunteered to do the python library?
A: Guido volunteered to do the Python library.
The answer is generated from your indexed content, not from the LLM’s training data!

Understanding Query Results

The query() method:
  • Translates your natural language question into structured searches
  • Queries multiple indexes in parallel (entities, topics, semantic similarity)
  • Fuses results and ranks them by relevance
  • Generates an answer grounded in your indexed content
If no relevant content is found, the response starts with "No answer found:"
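You can branch on that prefix in your own code; a minimal sketch (the `has_answer` helper is illustrative, not part of TypeAgent):

```python
def has_answer(response: str) -> bool:
    """True unless the response carries the 'No answer found:' prefix."""
    return not response.startswith("No answer found:")

print(has_answer("Guido volunteered to do the Python library."))  # True
print(has_answer("No answer found: no relevant content."))        # False
```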

Complete Working Example

Here’s an interactive demo that combines ingestion and querying:
demo.py
#!/usr/bin/env python3
import asyncio

from dotenv import load_dotenv

from typeagent import create_conversation
from typeagent.transcripts.transcript import TranscriptMessage, TranscriptMessageMeta

load_dotenv()


async def main():
    """Interactive TypeAgent demo."""
    print("Creating conversation...")
    conv = await create_conversation(
        None,  # In-memory for this demo
        TranscriptMessage,
        name="Demo Conversation",
    )

    # Add sample messages about Python
    messages = [
        TranscriptMessage(
            text_chunks=["Welcome to the Python programming tutorial."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=["Today we'll learn about async/await in Python."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=["Python is a great language for beginners and experts alike."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=["The async keyword is used to define asynchronous functions."],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
        TranscriptMessage(
            text_chunks=[
                "You use await to wait for asynchronous operations to complete."
            ],
            metadata=TranscriptMessageMeta(speaker="Instructor"),
        ),
    ]

    print("Adding messages and building indexes...")
    result = await conv.add_messages_with_indexing(messages)
    print(f"Conversation ready with {await conv.messages.size()} messages.")
    print(
        f"Added {result.messages_added} messages, {result.semrefs_added} semantic refs"
    )
    print()

    # Interactive query loop
    print("You can now ask questions about the conversation.")
    print("Type 'quit' or 'exit' to stop.\n")

    while True:
        try:
            question: str = input("typeagent> ")
            if not question.strip():
                continue
            if question.strip().lower() in ("quit", "exit", "q"):
                break

            # Query the conversation
            answer: str = await conv.query(question)
            print(answer)
            print()

        except EOFError:
            print()
            break
        except KeyboardInterrupt:
            print("\nExiting...")
            break


if __name__ == "__main__":
    asyncio.run(main())
Run it:
python demo.py
Try questions like:
  • “What is this tutorial about?”
  • “What does the async keyword do?”
  • “Who is the speaker?”

Advanced: Working with Different Message Types

TypeAgent supports message types beyond simple transcripts. ConversationMessage is the base type, with full flexibility:
from typeagent.knowpro.universal_message import (
    ConversationMessage,
    ConversationMessageMeta,
)

message = ConversationMessage(
    text_chunks=["First part", "Second part"],
    tags=["important", "meeting"],
    timestamp="2026-03-06T15:30:00Z",
    metadata=ConversationMessageMeta(
        speaker="Alice",
        recipients=["Bob", "Charlie"],
    ),
)
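The timestamp in the example above is an ISO 8601 string; Python's standard library parses it directly (this is an illustration, not a TypeAgent requirement):

```python
from datetime import datetime

# ISO 8601 timestamps parse with datetime.fromisoformat.
# Python 3.11+ also accepts a trailing "Z"; "+00:00" works everywhere.
ts = datetime.fromisoformat("2026-03-06T15:30:00+00:00")
print(ts.year, ts.hour)  # 2026 15
```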

Database Storage Options

TypeAgent offers flexible storage backends:
# Data persists across program runs
conv = await create_conversation("my_data.db", TranscriptMessage)

# In-memory only (no file written; data is lost when the program exits)
conv = await create_conversation(None, TranscriptMessage)

Query Patterns

Here are common query patterns that work well with TypeAgent:
# Find what specific people said or did
await conv.query("What did Guido say about Python?")
await conv.query("Who volunteered for the project?")

# Search by subject or theme
await conv.query("What was discussed about async programming?")
await conv.query("Tell me about the budget conversation")

# Find decisions, actions, or commitments
await conv.query("What decisions were made?")
await conv.query("Who agreed to lead the initiative?")

# Time-based filtering (requires timestamp metadata)
await conv.query("What happened in March?")
await conv.query("Recent discussions about the API")

Error Handling

Handle common scenarios gracefully:
async def safe_query(conversation, question: str) -> str | None:
    """Query with error handling; returns None when no answer is available."""
    try:
        answer = await conversation.query(question)

        # Check for "no answer" responses
        if answer.startswith("No answer found:"):
            print(f"Could not find relevant information for: {question}")
            return None

        return answer

    except Exception as e:
        print(f"Query failed: {e}")
        return None

Performance Tips

Batch Indexing: Add multiple messages at once for better performance:
# Good: Single batch
results = await conv.add_messages_with_indexing(all_messages)

# Avoid: Multiple single calls
for msg in all_messages:
    await conv.add_messages_with_indexing([msg])  # Slower!
Reuse Conversations: Create the conversation once and reuse it:
# Good: Reuse conversation object
conv = await create_conversation("demo.db", TranscriptMessage)
for question in questions:
    await conv.query(question)

# Avoid: Recreating unnecessarily
for question in questions:
    conv = await create_conversation("demo.db", TranscriptMessage)  # Wasteful!
    await conv.query(question)

Troubleshooting

Queries return "No answer found"
Cause: Insufficient indexed content or poor message structure.
Solution:
  • Ensure messages have clear speaker metadata
  • Add more context in text_chunks
  • Try rephrasing the query
Indexing is slow
Cause: LLM API latency or large batch sizes.
Solution:
  • Process messages in smaller batches (100-500 at a time)
  • Use faster models like gpt-4o-mini for testing
  • Consider parallel processing for large datasets
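The chunking itself is straightforward; in this sketch add_messages_with_indexing is stubbed so it runs standalone, and in real code you would call it on your conversation object:

```python
import asyncio

# Stub standing in for conversation.add_messages_with_indexing;
# it just reports how many messages it received.
async def add_messages_with_indexing(batch: list) -> int:
    return len(batch)

async def ingest_in_chunks(messages: list, chunk_size: int = 200) -> int:
    """Index messages in fixed-size chunks instead of one huge batch."""
    total = 0
    for i in range(0, len(messages), chunk_size):
        total += await add_messages_with_indexing(messages[i:i + chunk_size])
    return total

print(asyncio.run(ingest_in_chunks(list(range(450)))))  # 450
```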
"Database is locked" errors
Cause: Multiple processes accessing the same SQLite database.
Solution:
  • Use in-memory storage (None) for parallel testing
  • Implement file locking if needed
  • Consider a client-server database for production

Next Steps

Now that you’ve built your first TypeAgent application, explore more advanced features:
  • API Reference: dive deep into the complete API documentation
  • Email Ingestion: learn how to index email conversations
  • Knowledge Extraction: understand how AI extracts structured knowledge
  • Advanced Queries: master complex query patterns

You’re now ready to build powerful knowledge processing applications with TypeAgent!
