
Overview

Collections are append-only data structures for storing messages and semantic references. They provide both iteration and random access by ordinal number.

Base Protocols

IReadonlyCollection

class IReadonlyCollection[T, TOrdinal](AsyncIterable[T], Protocol)
Base protocol for read-only access to ordered collections.

size

async def size(self) -> int
Get the total number of items in the collection.
Returns:
  • count (int): Number of items currently in the collection.

get_item

async def get_item(self, arg: TOrdinal) -> T
Retrieve a single item by its ordinal number (0-based index).
Parameters:
  • arg (TOrdinal, required): The ordinal/index of the item to retrieve.
Returns:
  • item (T): The item at the specified ordinal.
Raises: IndexError if ordinal is out of range.
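A minimal sketch of handling the IndexError described above. ListCollection and get_or_default are hypothetical names introduced for illustration (a list-backed stand-in, not the library's classes), and rejecting negative ordinals is an assumption rather than documented behavior.

```python
import asyncio


class ListCollection:
    """Hypothetical list-backed stand-in, not the library's implementation."""

    def __init__(self, items):
        self._items = list(items)

    async def get_item(self, arg: int):
        # Assumption: negative ordinals are rejected instead of wrapping around.
        if not 0 <= arg < len(self._items):
            raise IndexError(f"ordinal {arg} out of range")
        return self._items[arg]


async def get_or_default(collection, ordinal, default=None):
    """Return the item at `ordinal`, or `default` when it is out of range."""
    try:
        return await collection.get_item(ordinal)
    except IndexError:
        return default


async def main():
    collection = ListCollection(["a", "b", "c"])
    found = await get_or_default(collection, 1)
    fallback = await get_or_default(collection, 99, default="missing")
    return found, fallback


found, fallback = asyncio.run(main())
print(found, fallback)
```

Checking size() first avoids the exception entirely; catching IndexError is the simpler option when out-of-range lookups are rare.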

get_slice

async def get_slice(self, start: int, stop: int) -> list[T]
Retrieve a range of items by ordinal (Python slice semantics).
Parameters:
  • start (int, required): Starting ordinal (inclusive).
  • stop (int, required): Ending ordinal (exclusive).
Returns:
  • items (list[T]): List of items in the specified range.
Example:
# Get first 10 items
items = await collection.get_slice(0, 10)

# Get items 100-109
items = await collection.get_slice(100, 110)

# Get all items
size = await collection.size()
items = await collection.get_slice(0, size)
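Because get_slice takes an inclusive start and an exclusive stop, it can be used to walk a collection in fixed-size pages. The sketch below assumes a hypothetical list-backed ListCollection and a hypothetical helper iter_pages; neither is part of the library.

```python
import asyncio


class ListCollection:
    """Hypothetical list-backed stand-in exposing size() and get_slice()."""

    def __init__(self, items):
        self._items = list(items)

    async def size(self) -> int:
        return len(self._items)

    async def get_slice(self, start: int, stop: int) -> list:
        return self._items[start:stop]


async def iter_pages(collection, page_size: int):
    """Yield successive lists of at most `page_size` items."""
    total = await collection.size()
    for start in range(0, total, page_size):
        # Clamp the exclusive stop so the last page never runs past the end.
        yield await collection.get_slice(start, min(start + page_size, total))


async def main():
    collection = ListCollection(range(25))
    return [page async for page in iter_pages(collection, 10)]


pages = asyncio.run(main())
print([len(page) for page in pages])
```

Paging keeps at most page_size items in memory at a time, which may matter more than the extra calls when the collection is large.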

get_multiple

async def get_multiple(self, arg: list[TOrdinal]) -> list[T]
Retrieve multiple items by their ordinals.
Parameters:
  • arg (list[TOrdinal], required): List of ordinals to retrieve.
Returns:
  • items (list[T]): List of items in the same order as the input ordinals.
Example:
# Get specific messages
messages = await collection.get_multiple([0, 5, 10, 15])
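One way to picture the ordering guarantee: the result follows the order of the requested ordinals, not storage order. The list-backed ListCollection below is a hypothetical stand-in, and implementing get_multiple as repeated get_item calls is an assumption made for illustration only.

```python
import asyncio


class ListCollection:
    """Hypothetical list-backed stand-in, not the library's implementation."""

    def __init__(self, items):
        self._items = list(items)

    async def get_item(self, arg: int):
        return self._items[arg]

    async def get_multiple(self, arg: list) -> list:
        # Look up each ordinal in the order it was requested.
        return [await self.get_item(ordinal) for ordinal in arg]


async def main():
    collection = ListCollection(["m0", "m1", "m2", "m3"])
    return await collection.get_multiple([3, 0, 2])


selected = asyncio.run(main())
print(selected)
```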

Async Iteration

Collections support async iteration:
async for item in collection:
    print(item)
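For readers implementing their own collection, a minimal sketch of how a class can support async for: defining __aiter__ as an async generator method is one common approach. ListCollection is a hypothetical stand-in, not one of the library's classes.

```python
import asyncio


class ListCollection:
    """Hypothetical list-backed stand-in that supports `async for`."""

    def __init__(self, items):
        self._items = list(items)

    async def __aiter__(self):
        # An async generator method: calling it returns an async iterator.
        for item in self._items:
            yield item


async def main():
    collection = ListCollection(["first", "second", "third"])
    seen = []
    async for item in collection:
        seen.append(item.upper())
    return seen


seen = asyncio.run(main())
print(seen)
```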

ICollection

class ICollection[T, TOrdinal](IReadonlyCollection[T, TOrdinal], Protocol)
Extends IReadonlyCollection with append operations. Collections are append-only: items cannot be deleted or modified once added.

is_persistent

@property
def is_persistent(self) -> bool
Indicates whether the collection persists across process restarts.
Returns:
  • persistent (bool): True for SQLite storage, False for in-memory storage.
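A short sketch of branching on is_persistent, for example to tell a user whether their data will survive a restart. MemoryLike, SqliteLike and describe_durability are hypothetical names used only for this illustration.

```python
class MemoryLike:
    """Hypothetical stand-in for an in-memory collection."""

    @property
    def is_persistent(self) -> bool:
        return False


class SqliteLike:
    """Hypothetical stand-in for a SQLite-backed collection."""

    @property
    def is_persistent(self) -> bool:
        return True


def describe_durability(collection) -> str:
    """Summarize what happens to the collection's data on process exit."""
    if collection.is_persistent:
        return "data survives a process restart"
    return "data is lost when the process exits"


memory_note = describe_durability(MemoryLike())
sqlite_note = describe_durability(SqliteLike())
print(memory_note)
print(sqlite_note)
```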

append

async def append(self, item: T) -> None
Append a single item to the collection.
Parameters:
  • item (T, required): The item to append.
Example:
msg = ConversationMessage(
    text_chunks=["Hello world"],
    metadata=ConversationMessageMeta(speaker="Alice")
)
await messages.append(msg)

extend

async def extend(self, items: Iterable[T]) -> None
Append multiple items to the collection.
Parameters:
  • items (Iterable[T], required): The items to append.
Default implementation: calls append() for each item. SQLite implementations override it for batch efficiency.
Example:
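# Metadata for each message (speaker names are illustrative)
meta1 = ConversationMessageMeta(speaker="Alice")
meta2 = ConversationMessageMeta(speaker="Bob")
meta3 = ConversationMessageMeta(speaker="Alice")
messages = [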
    ConversationMessage(text_chunks=["Hello"], metadata=meta1),
    ConversationMessage(text_chunks=["World"], metadata=meta2),
    ConversationMessage(text_chunks=["!"], metadata=meta3),
]
await collection.extend(messages)
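The difference between the default extend() and a batching override can be sketched as follows. BaseCollection, BatchingCollection and the write_calls counter are hypothetical; the counter merely simulates storage round trips and says nothing about how the SQLite classes are actually written.

```python
import asyncio
from typing import Iterable


class BaseCollection:
    """Hypothetical collection whose extend() falls back to append()."""

    def __init__(self):
        self._items = []
        self.write_calls = 0  # simulated number of storage writes

    async def append(self, item) -> None:
        self.write_calls += 1
        self._items.append(item)

    async def extend(self, items: Iterable) -> None:
        # Default behavior: one append() per item.
        for item in items:
            await self.append(item)


class BatchingCollection(BaseCollection):
    """Hypothetical override that stores the whole batch in one write."""

    async def extend(self, items: Iterable) -> None:
        batch = list(items)
        self.write_calls += 1
        self._items.extend(batch)


async def main():
    base, batching = BaseCollection(), BatchingCollection()
    await base.extend(["a", "b", "c"])
    await batching.extend(["a", "b", "c"])
    return base, batching


base, batching = asyncio.run(main())
print(base.write_calls, batching.write_calls)
```

Both collections end up with the same contents; only the number of simulated writes differs.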

IMessageCollection

class IMessageCollection[TMessage: IMessage](
    ICollection[TMessage, MessageOrdinal],
    Protocol
)
Collection interface for conversation messages. Messages are identified by ordinal numbers (MessageOrdinal = int).

Type Parameters

  • TMessage (bound to IMessage): The message type (e.g., ConversationMessage, TranscriptMessage).

Usage

from typeagent import create_conversation
from typeagent.knowpro.universal_message import ConversationMessage

conv = await create_conversation(
    dbname="chat.db",
    message_type=ConversationMessage
)

messages: IMessageCollection[ConversationMessage] = conv.messages

# Get collection size
count = await messages.size()
print(f"Total messages: {count}")

# Get first message
if count > 0:
    first_msg = await messages.get_item(0)
    print(f"First message: {first_msg.text_chunks[0]}")

# Get recent messages
recent = await messages.get_slice(max(0, count - 10), count)
for msg in recent:
    print(f"{msg.metadata.speaker}: {msg.text_chunks[0]}")

# Iterate all messages
async for msg in messages:
    print(f"[{msg.timestamp}] {msg.text_chunks[0]}")

Complete Example

from typeagent import create_conversation
from typeagent.knowpro.universal_message import (
    ConversationMessage,
    ConversationMessageMeta
)

# Create conversation
conv = await create_conversation(
    dbname="example.db",
    message_type=ConversationMessage
)

# Access message collection
messages = conv.messages

# Add single message
msg = ConversationMessage(
    text_chunks=["Let's discuss the roadmap."],
    metadata=ConversationMessageMeta(speaker="Alice"),
    tags=["planning"]
)
await messages.append(msg)

# Add multiple messages
new_messages = [
    ConversationMessage(
        text_chunks=["I think we should prioritize performance."],
        metadata=ConversationMessageMeta(speaker="Bob")
    ),
    ConversationMessage(
        text_chunks=["Agreed. Let's profile the hot paths first."],
        metadata=ConversationMessageMeta(speaker="Alice")
    )
]
await messages.extend(new_messages)

# Query by ordinal
msg_0 = await messages.get_item(0)
print(f"Message 0: {msg_0.text_chunks[0]}")

# Get slice
first_three = await messages.get_slice(0, 3)
for i, msg in enumerate(first_three):
    speaker = msg.metadata.speaker or "Unknown"
    print(f"{i}: [{speaker}] {msg.text_chunks[0]}")

# Get specific messages
selected = await messages.get_multiple([0, 2])
print(f"Got {len(selected)} messages")

ISemanticRefCollection

class ISemanticRefCollection(
    ICollection[SemanticRef, SemanticRefOrdinal],
    Protocol
)
Collection interface for semantic references (extracted knowledge). Semantic references are identified by ordinal numbers (SemanticRefOrdinal = int).

What are Semantic References?

Semantic references link text locations to extracted knowledge:
  • Entities - Named entities (people, places, things)
  • Actions - Verb phrases with subject/object
  • Topics - Subject matter categories
  • Tags - User-defined labels
Each semantic reference contains:
  1. Ordinal - Unique sequential ID
  2. Range - Text location (message and chunk ordinals)
  3. Knowledge - The extracted knowledge object
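The three parts listed above can be pictured with simplified stand-in types. Location, Range, Topic and SemRef below are hypothetical dataclasses that only mirror the attribute names used in this page's examples (semantic_ref_ordinal, range.start.message_ordinal, knowledge); the chunk_ordinal field name is an assumption, and the library's real classes may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    message_ordinal: int
    chunk_ordinal: int = 0  # assumed field name


@dataclass(frozen=True)
class Range:
    start: Location
    end: Optional[Location] = None


@dataclass(frozen=True)
class Topic:
    text: str
    knowledge_type: str = "topic"


@dataclass(frozen=True)
class SemRef:
    semantic_ref_ordinal: int  # 1. Ordinal
    range: Range               # 2. Range
    knowledge: object          # 3. Knowledge


semref = SemRef(0, Range(Location(message_ordinal=5)), Topic("travel"))
summary = (
    f"#{semref.semantic_ref_ordinal} "
    f"message {semref.range.start.message_ordinal}, "
    f"chunk {semref.range.start.chunk_ordinal}: "
    f"{semref.knowledge.knowledge_type} {semref.knowledge.text!r}"
)
print(summary)
```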

Usage

from typeagent import create_conversation
from typeagent.knowpro.universal_message import ConversationMessage

conv = await create_conversation(
    dbname="chat.db",
    message_type=ConversationMessage
)

semrefs = conv.semantic_refs

# Get collection size
count = await semrefs.size()
print(f"Total semantic references: {count}")

# Get specific semantic reference
if count > 0:
    semref = await semrefs.get_item(0)
    print(f"Knowledge type: {semref.knowledge.knowledge_type}")
    print(f"Text range: {semref.range}")
    
    # Access knowledge details
    if hasattr(semref.knowledge, 'name'):
        print(f"Entity name: {semref.knowledge.name}")
    elif hasattr(semref.knowledge, 'text'):
        print(f"Topic text: {semref.knowledge.text}")

# Iterate all semantic references
async for semref in semrefs:
    knowledge_type = semref.knowledge.knowledge_type
    print(f"[{semref.semantic_ref_ordinal}] {knowledge_type}: {semref.knowledge}")

Example: Exploring Extracted Knowledge

from typeagent import create_conversation
from typeagent.knowpro.universal_message import (
    ConversationMessage,
    ConversationMessageMeta
)

# Create and populate conversation
conv = await create_conversation(
    dbname="knowledge_demo.db",
    message_type=ConversationMessage
)

messages = [
    ConversationMessage(
        text_chunks=["Alice visited Paris last summer."],
        metadata=ConversationMessageMeta(speaker="Bob")
    ),
    ConversationMessage(
        text_chunks=["She really enjoyed the Louvre museum."],
        metadata=ConversationMessageMeta(speaker="Bob")
    )
]

# Add messages with knowledge extraction
result = await conv.add_messages_with_indexing(messages)
print(f"Extracted {result.semrefs_added} semantic references")

# Explore extracted knowledge
semrefs = conv.semantic_refs

entity_count = 0
action_count = 0
topic_count = 0

async for semref in semrefs:
    ktype = semref.knowledge.knowledge_type
    
    if ktype == "entity":
        entity_count += 1
        entity = semref.knowledge
        print(f"Entity: {entity.name} (types: {entity.type})")
    
    elif ktype == "action":
        action_count += 1
        action = semref.knowledge
        print(f"Action: {action.verbs} - {action.subject_entity_name} -> {action.object_entity_name}")
    
    elif ktype == "topic":
        topic_count += 1
        topic = semref.knowledge
        print(f"Topic: {topic.text}")

print(f"\nSummary: {entity_count} entities, {action_count} actions, {topic_count} topics")

Example: Finding Knowledge by Range

from typeagent.knowpro.interfaces import TextLocation, TextRange

# Get all semantic refs for a specific message
# (assumes `semrefs` is conv.semantic_refs, as in the previous examples)
message_ordinal = 5

size = await semrefs.size()
all_semrefs = await semrefs.get_slice(0, size)

# Filter by message
message_knowledge = [
    semref for semref in all_semrefs
    if semref.range.start.message_ordinal == message_ordinal
]

print(f"Message {message_ordinal} has {len(message_knowledge)} knowledge items:")
for semref in message_knowledge:
    print(f"  - {semref.knowledge.knowledge_type}: {semref.knowledge}")
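When several messages need to be looked up, filtering the full list once per message repeats work. A possible alternative is to group the semantic references by message ordinal in a single pass. The sketch below uses hypothetical stand-in objects built with SimpleNamespace that expose only the attributes used above; make_semref and group_by_message are illustrative helpers, not library functions.

```python
from collections import defaultdict
from types import SimpleNamespace


def make_semref(ordinal, message_ordinal, knowledge):
    """Build a hypothetical stand-in with the attributes used above."""
    start = SimpleNamespace(message_ordinal=message_ordinal)
    return SimpleNamespace(
        semantic_ref_ordinal=ordinal,
        range=SimpleNamespace(start=start),
        knowledge=knowledge,
    )


def group_by_message(semrefs):
    """Map each message ordinal to the semantic refs that start in it."""
    by_message = defaultdict(list)
    for semref in semrefs:
        by_message[semref.range.start.message_ordinal].append(semref)
    return by_message


all_semrefs = [
    make_semref(0, 5, "Alice"),
    make_semref(1, 6, "Paris"),
    make_semref(2, 5, "travel"),
]
by_message = group_by_message(all_semrefs)
print([semref.knowledge for semref in by_message[5]])
```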

Implementation Classes

MemoryMessageCollection

from typeagent.storage.memory import MemoryMessageCollection
In-memory implementation of IMessageCollection. Fast, non-persistent.

MemorySemanticRefCollection

from typeagent.storage.memory import MemorySemanticRefCollection
In-memory implementation of ISemanticRefCollection. Fast, non-persistent.

SqliteMessageCollection

from typeagent.storage.sqlite import SqliteMessageCollection
SQLite-backed implementation of IMessageCollection. Persistent, transactional.

SqliteSemanticRefCollection

from typeagent.storage.sqlite import SqliteSemanticRefCollection
SQLite-backed implementation of ISemanticRefCollection. Persistent, transactional.
Best Practice: Access collections through the storage provider or conversation object rather than instantiating directly.

Type Aliases

type MessageOrdinal = int
type SemanticRefOrdinal = int
Ordinal numbers are 0-based sequential integers used to reference items in collections.

Performance Tips

Batch Operations

Use extend() instead of multiple append() calls for better performance, especially with SQLite.

Range Queries

Use get_slice() for contiguous ranges rather than get_multiple() with sequential ordinals.
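A sketch of applying this tip automatically: a helper that checks whether the requested ordinals form a contiguous ascending run and picks the method accordingly. ListCollection (with a calls log for demonstration) and fetch are hypothetical names, not part of the library.

```python
import asyncio


class ListCollection:
    """Hypothetical stand-in that records which method served a request."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = []

    async def get_slice(self, start: int, stop: int) -> list:
        self.calls.append("get_slice")
        return self._items[start:stop]

    async def get_multiple(self, arg: list) -> list:
        self.calls.append("get_multiple")
        return [self._items[ordinal] for ordinal in arg]


async def fetch(collection, ordinals: list) -> list:
    """Use get_slice() for contiguous ascending runs, else get_multiple()."""
    if ordinals:
        first, last = ordinals[0], ordinals[-1]
        if list(ordinals) == list(range(first, last + 1)):
            return await collection.get_slice(first, last + 1)
    return await collection.get_multiple(ordinals)


async def main():
    collection = ListCollection("abcdefgh")
    run = await fetch(collection, [2, 3, 4])
    scattered = await fetch(collection, [0, 5, 7])
    return run, scattered, collection.calls


run, scattered, calls = asyncio.run(main())
print(run, scattered, calls)
```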

Async Iteration

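For large collections, prefer async iteration over loading every item at once with get_slice(0, size): iteration can process items one at a time instead of holding the full list in memory.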
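Iteration also makes it possible to stop as soon as the answer is known. The sketch below searches for the first matching item and counts how many items were actually produced; ListCollection, its yielded counter and find_first are hypothetical names used for illustration.

```python
import asyncio


class ListCollection:
    """Hypothetical stand-in that counts how many items it has yielded."""

    def __init__(self, items):
        self._items = list(items)
        self.yielded = 0

    async def __aiter__(self):
        for item in self._items:
            self.yielded += 1
            yield item


async def find_first(collection, predicate):
    """Return the first item satisfying `predicate`, or None."""
    async for item in collection:
        if predicate(item):
            return item
    return None


async def main():
    collection = ListCollection(range(1_000))
    match = await find_first(collection, lambda n: n > 0 and n % 7 == 0)
    return match, collection.yielded


match, yielded = asyncio.run(main())
print(match, yielded)
```

Only the items up to and including the first match are produced; the remaining items are never touched.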

Size Checks

Call size() once and reuse the result if you need it several times within a transaction; refresh it after any append() or extend().
