TypeAgent maintains six specialized indexes that enable different query patterns and access methods. Each index serves a specific purpose and is updated incrementally as new knowledge is extracted.

Index Overview

All six indexes are managed through the ConversationSecondaryIndexes class:
from typeagent.knowpro.secindex import ConversationSecondaryIndexes

# Accessed through conversation
secondary_indexes = conversation.secondary_indexes

# Six indexes available:
# 1. semantic_ref_index         - Term → SemanticRef mappings
# 2. property_to_semantic_ref_index - Property → SemanticRef mappings  
# 3. timestamp_index             - Timestamp → Message mappings
# 4. message_index              - Message embedding search
# 5. term_to_related_terms_index - Fuzzy term matching
# 6. threads                    - Conversation threading

1. SemanticRef Index

Purpose: Fast term-based lookup of semantic references (entities, actions, topics).

Structure

class TermToSemanticRefIndex(ITermToSemanticRefIndex):
    _map: dict[str, list[ScoredSemanticRefOrdinal]]
    # Maps lowercase terms to semantic reference ordinals with scores

Operations

# Add entity name to index
await semantic_ref_index.add_term(
    "Alice",                # Term
    semantic_ref_ordinal    # Reference to semantic ref
)

# Stored as:
# _map["alice"] = [ScoredSemanticRefOrdinal(42, 1.0)]

What Gets Indexed

  • Entity names: entity.name
  • Entity types: Each string in entity.type
  • Facet names and values: facet.name and str(facet.value)
  • Action verbs: " ".join(action.verbs)
  • Action entities: Subject, object, indirect object names
  • Topics: topic.text

Storage Backends

In-memory:

from typeagent.storage.memory.semrefindex import TermToSemanticRefIndex

class TermToSemanticRefIndex:
    _map: dict[str, list[ScoredSemanticRefOrdinal]]

    # In-memory dictionary
    # Fast lookups: O(1)
    # No persistence

SQLite:

from typeagent.storage.sqlite.semrefindex import SqliteTermToSemanticRefIndex

# Table: SemanticRefIndex
# Columns: term (text), semantic_ref_ordinal (int), score (real)
# Index: CREATE INDEX idx_semref_term ON SemanticRefIndex(term)

# Persistent storage
# Indexed queries
# Transaction support
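The SQLite layout in the comments above can be reproduced with the standard `sqlite3` module. This is an illustrative sketch using the column names and index DDL listed, not the library's own code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SemanticRefIndex ("
    "  term TEXT NOT NULL,"
    "  semantic_ref_ordinal INTEGER NOT NULL,"
    "  score REAL NOT NULL DEFAULT 1.0)"
)
conn.execute("CREATE INDEX idx_semref_term ON SemanticRefIndex(term)")

# Terms are stored lowercase, so lookups normalize the same way.
conn.execute("INSERT INTO SemanticRefIndex VALUES (?, ?, ?)", ("alice", 42, 1.0))
rows = conn.execute(
    "SELECT semantic_ref_ordinal, score FROM SemanticRefIndex WHERE term = ?",
    ("alice",),
).fetchall()
print(rows)  # [(42, 1.0)]
```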

2. Property Index

Purpose: Structured property queries with name-value pairs.

Structure

class PropertyIndex(IPropertyToSemanticRefIndex):
    _map: dict[str, list[ScoredSemanticRefOrdinal]]
    # Maps "prop.{name}@@{value}" to semantic ref ordinals

Property Names

from typeagent.storage.memory.propindex import PropertyNames

class PropertyNames(enum.Enum):
    EntityName = "name"              # Entity names
    EntityType = "type"              # Entity types  
    FacetName = "facet.name"         # Facet names
    FacetValue = "facet.value"       # Facet values
    Verb = "verb"                    # Action verbs
    Subject = "subject"              # Action subjects
    Object = "object"                # Action objects
    IndirectObject = "indirectObject" # Indirect objects
    Tag = "tag"                      # Message tags
    Topic = "topic"                  # Topics

Operations

# Add entity name property
await property_index.add_property(
    PropertyNames.EntityName.value,  # "name"
    "Alice",
    semantic_ref_ordinal
)

# Stored as:
# _map["prop.name@@alice"] = [ScoredSemanticRefOrdinal(42, 1.0)]

# Add facet property
await property_index.add_property(
    "color",       # Facet name
    "blue",        # Facet value
    semantic_ref_ordinal
)

# Stored as:
# _map["prop.color@@blue"] = [ScoredSemanticRefOrdinal(43, 1.0)]
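The `"prop.{name}@@{value}"` key scheme shown in the stored examples can be sketched in a few lines. The helper name is hypothetical, and the exact normalization the library applies may differ; the sketch lowercases the value to match the stored examples:

```python
def make_property_key(name: str, value: str) -> str:
    """Build the combined key used by the property index map."""
    return f"prop.{name}@@{value.lower()}"

print(make_property_key("name", "Alice"))   # prop.name@@alice
print(make_property_key("color", "blue"))   # prop.color@@blue
```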

Why Separate from SemanticRef Index?

The PropertyIndex enables structured queries that the SemanticRef index cannot:
# SemanticRef index: "What mentions 'blue'?"
results = await semantic_ref_index.lookup_term("blue")
# Returns all semantic refs with "blue" anywhere

# Property index: "What entities have color=blue facet?"
results = await property_index.lookup_property(
    PropertyNames.FacetValue.value,
    "blue"
)
# Returns only entities with blue as a facet value

# Property index: "What actions did Alice perform?"
results = await property_index.lookup_property(
    PropertyNames.Subject.value,
    "Alice"
)
# Returns only actions where Alice is the subject

3. Timestamp Index

Purpose: Temporal navigation and time-based filtering.

Structure

class TimestampToTextRangeIndex(ITimestampToTextRangeIndex):
    _timestamp_to_ordinals: dict[str, list[MessageOrdinal]]
    # Maps ISO timestamp strings to message ordinals

Operations

# Add message timestamps
timestamp_data = [
    (message_ordinal_0, "2024-01-15T10:30:00Z"),
    (message_ordinal_1, "2024-01-15T11:45:00Z"),
    (message_ordinal_2, "2024-01-16T09:00:00Z")
]

await timestamp_index.add_timestamps(timestamp_data)
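Because ISO-8601 UTC strings sort lexicographically, a sorted list of (timestamp, ordinal) pairs supports O(log n) range lookups with binary search. A minimal sketch of that idea (not the library's implementation):

```python
import bisect

# Hypothetical mini-index: (timestamp, message_ordinal) pairs kept sorted.
entries = sorted([
    ("2024-01-15T10:30:00Z", 0),
    ("2024-01-15T11:45:00Z", 1),
    ("2024-01-16T09:00:00Z", 2),
])

def ordinals_in_range(start: str, end: str) -> list[int]:
    """ISO-8601 UTC strings sort lexicographically, so bisect works directly."""
    lo = bisect.bisect_left(entries, (start, -1))
    hi = bisect.bisect_right(entries, (end, float("inf")))
    return [ordinal for _, ordinal in entries[lo:hi]]

print(ordinals_in_range("2024-01-15T00:00:00Z", "2024-01-15T23:59:59Z"))
# [0, 1]
```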

Temporal Scoping

The timestamp index enables time-based search filtering:
from typeagent.knowpro.interfaces import DateRange, WhenFilter

# Create temporal filter
when = WhenFilter(
    date_range=DateRange(
        start=datetime(2024, 1, 15, tzinfo=timezone.utc),
        end=datetime(2024, 1, 16, tzinfo=timezone.utc)
    )
)

# Apply to search
search_expr = SearchSelectExpr(
    search_term_group=term_group,
    when=when  # Restrict to this time range
)

4. Message Text Index

Purpose: Embedding-based semantic similarity search.

Structure

class MessageTextIndex(IMessageTextIndex):
    _embeddings: list[tuple[MessageOrdinal, np.ndarray]]
    # Message ordinals with their embedding vectors
    
    _embedding_model: IEmbeddingModel
    # Model for generating embeddings

Operations

from typeagent.storage.memory.messageindex import MessageTextIndex
from typeagent.knowpro.convsettings import MessageTextIndexSettings

# Create index with embedding settings
settings = MessageTextIndexSettings(
    embedding_index_settings=TextEmbeddingIndexSettings(
        embedding_model=create_embedding_model("openai:text-embedding-3-small")
    )
)

message_index = MessageTextIndex(settings)

# Add messages (automatically generates embeddings)
await message_index.add_messages([
    message1,
    message2,
    message3
])

Embedding Models

TypeAgent supports multiple embedding providers:
  • OpenAI: "openai:text-embedding-3-small", "openai:text-embedding-3-large"
  • Azure: "azure:text-embedding-ada-002"
  • Local models via custom implementations

SQLite Storage

# SQLite table: MessageTextIndex
# Columns:
#   message_ordinal (int)
#   embedding (blob)  - Serialized numpy array
#   embedding_model (text)

# Embeddings are stored as compressed binary blobs
# Retrieved and deserialized for similarity calculations
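The blob round-trip described above can be sketched with the stdlib `struct` module (the real code works with numpy arrays, and any compression step is omitted here): a float32 vector is packed to bytes for the BLOB column and unpacked on read.

```python
import struct

embedding = [0.5, -0.25, 0.75]  # float32-representable values

blob = struct.pack(f"<{len(embedding)}f", *embedding)        # store as BLOB
restored = list(struct.unpack(f"<{len(blob) // 4}f", blob))  # read back

print(restored)  # [0.5, -0.25, 0.75]
```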
5. Related Terms Index

Purpose: Fuzzy term matching and synonym resolution.

Structure

class RelatedTermsIndex(ITermToRelatedTermsIndex):
    fuzzy_index: FuzzyTermIndex | None
    # Embedding-based term similarity
    
    aliases: dict[str, list[str]]
    # Manual synonym mappings
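The manual alias side of the index is just a term-to-synonyms mapping used to widen queries. A minimal sketch with a plain dict (the `expand` helper is hypothetical):

```python
aliases: dict[str, list[str]] = {
    "talk": ["discuss", "chat"],
    "meeting": ["call", "sync"],
}

def expand(term: str) -> list[str]:
    """Return the term plus any configured aliases."""
    return [term] + aliases.get(term, [])

print(expand("talk"))     # ['talk', 'discuss', 'chat']
print(expand("project"))  # ['project']
```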

Operations

# Add terms for fuzzy matching
terms = ["discuss", "talk", "speak", "converse", "chat"]

await related_terms_index.fuzzy_index.add_terms(terms)
# Each term is embedded and stored

Use in Queries

Related terms expand search coverage:
# User searches for "discuss"
original_term = "discuss"

# Find related terms
related = await related_terms_index.find_related(
    original_term,
    max_distance=0.3
)

# Search for original term AND related terms
all_terms = [original_term] + [term for term, _ in related]
# ["discuss", "talk", "speak", "converse"]

# Query all variations
for term in all_terms:
    results = await semantic_ref_index.lookup_term(term)
    # Combine results

6. Conversation Threads

Purpose: Thread organization and context grouping.

Structure

class ConversationThreads(IConversationThreads):
    _threads: dict[str, list[MessageOrdinal]]
    # Thread ID → message ordinals
    
    _message_to_thread: dict[MessageOrdinal, str]
    # Message ordinal → thread ID
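The two maps above must be updated together so that thread-to-messages and message-to-thread lookups stay consistent. A sketch of that invariant; the method names mirror the interface, but this is an illustration, not the library's implementation:

```python
class MiniThreads:
    def __init__(self) -> None:
        self._threads: dict[str, list[int]] = {}
        self._message_to_thread: dict[int, str] = {}

    def create_thread(self, thread_id: str, initial_ordinal: int) -> None:
        self._threads[thread_id] = [initial_ordinal]
        self._message_to_thread[initial_ordinal] = thread_id

    def add_to_thread(self, thread_id: str, ordinals: list[int]) -> None:
        # Update both directions together so lookups stay in sync.
        self._threads[thread_id].extend(ordinals)
        for ordinal in ordinals:
            self._message_to_thread[ordinal] = thread_id

threads = MiniThreads()
threads.create_thread("project_discussion", 0)
threads.add_to_thread("project_discussion", [1, 2, 3])
print(threads._threads["project_discussion"])  # [0, 1, 2, 3]
print(threads._message_to_thread[2])           # project_discussion
```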

Operations

# Create new thread
thread_id = await threads.create_thread(
    name="Project Discussion",
    initial_message_ordinal=0
)

# Add messages to thread
await threads.add_to_thread(
    thread_id,
    message_ordinals=[1, 2, 3]
)
Thread-Scoped Search

# Search only within a specific thread
thread_id = "project_discussion"
message_ordinals = await threads.get_thread_messages(thread_id)

# Create scope filter
ranges_in_scope = TextRangesInScope()
for ordinal in message_ordinals:
    ranges_in_scope.add_message_ordinal(ordinal)

# Apply to search
scored_refs = await lookup_property_in_property_index(
    property_index,
    PropertyNames.EntityName.value,
    "Alice",
    semantic_refs,
    ranges_in_scope  # Only search thread messages
)

Index Update Flow

All six indexes are updated together during message ingestion:
# From conversation_base.py
async def add_messages_with_indexing(
    self,
    messages: list[TMessage]
) -> AddMessagesResult:
    async with storage:  # Transaction start
        # 1. Add messages to collection
        await self.messages.extend(messages)
        
        # 2. Metadata extraction → SemanticRefIndex + PropertyIndex
        await self._add_metadata_knowledge_incremental(...)
        
        # 3. LLM extraction → SemanticRefIndex + PropertyIndex
        if settings.auto_extract_knowledge:
            await self._add_llm_knowledge_incremental(...)
        
        # 4. Update secondary indexes
        await self._update_secondary_indexes_incremental(...)
        #    - PropertyIndex (from semantic refs)
        #    - TimestampIndex (from message timestamps)
        #    - RelatedTermsIndex (from extracted terms)
        #    - MessageTextIndex (from message text)
        
    # Transaction commit (SQLite) or complete (memory)
For SQLite storage, all six indexes are updated within a single transaction. If any update fails, all changes are rolled back atomically.
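The all-or-nothing behavior can be demonstrated with `sqlite3` directly: the connection's context manager wraps the batch in one transaction, so if any index update raises, every insert rolls back. A sketch with hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SemanticRefIndex (term TEXT, ordinal INT)")
conn.execute("CREATE TABLE PropertyIndex (key TEXT, ordinal INT)")

try:
    with conn:  # one transaction for every index update
        conn.execute("INSERT INTO SemanticRefIndex VALUES ('alice', 42)")
        conn.execute("INSERT INTO PropertyIndex VALUES ('prop.name@@alice', 42)")
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass

# Both inserts were rolled back together.
count = conn.execute("SELECT COUNT(*) FROM SemanticRefIndex").fetchone()[0]
print(count)  # 0
```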

Index Persistence

With the in-memory storage provider, all indexes live in process memory:
  • Fast access
  • No disk I/O
  • Lost on process exit
  • Good for:
    • Testing
    • Temporary conversations
    • Performance-critical applications

With the SQLite storage provider, all six indexes are persisted to disk inside the database file, so they survive process restarts.

Performance Characteristics

| Index        | Lookup Complexity | Add Complexity | Storage              |
|--------------|-------------------|----------------|----------------------|
| SemanticRef  | O(1)              | O(1)           | O(n terms × refs)    |
| Property     | O(1)              | O(1)           | O(n props × refs)    |
| Timestamp    | O(log n)          | O(1)           | O(n messages)        |
| MessageText  | O(n) similarity   | O(1)           | O(n messages × dim)  |
| RelatedTerms | O(n) similarity   | O(1)           | O(n terms × dim)     |
| Threads      | O(1)              | O(1)           | O(n messages)        |
MessageText and RelatedTerms use linear similarity search. For large datasets, consider using approximate nearest neighbor (ANN) indexes like FAISS or Annoy.
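The O(n) similarity cost is just one pass over every stored vector. A stdlib-only sketch of that linear scan (toy 2-D vectors; real embeddings have hundreds of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

stored = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}
query = [1.0, 0.1]

# One pass over every embedding -- the linear cost an ANN index avoids.
top = sorted(stored, key=lambda o: cosine(query, stored[o]), reverse=True)[:2]
print(top)  # [0, 2]
```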

Next Steps

Architecture

See how indexes fit in the overall architecture

Structured RAG

Learn how indexes power multi-stage queries

Knowledge Extraction

Understand what gets indexed

Storage API

Explore the storage provider API
