This quickstart guide walks you through creating a semantic search application with TopK. You’ll learn how to create a collection, add documents, and perform searches.
Before starting, make sure you have a TopK API key (available from the console) and the TopK Python SDK (topk_sdk) installed.

What you’ll build

By the end of this guide, you’ll have a working search application that can:
  1. Create a collection - Set up a collection with semantic search enabled
  2. Insert documents - Add sample book data to your collection
  3. Search semantically - Query your collection using natural language
  4. Get ranked results - Retrieve and optionally rerank the most relevant results

Step 1: Initialize the client

First, create a TopK client with your API key and chosen region.
from topk_sdk import Client

client = Client(
    api_key="YOUR_TOPK_API_KEY",
    region="aws-us-east-1-elastica"
)
Replace YOUR_TOPK_API_KEY with your actual API key from the console.
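Hard-coding the key is fine for a quick test, but a common alternative is to keep it out of source code and read it from the environment. The sketch below assumes environment variables named TOPK_API_KEY and TOPK_REGION. Those names are our own choice, not something the SDK reads automatically, and load_topk_settings is a hypothetical helper.

```python
import os
from typing import Mapping

# Region used throughout this guide; override it with TOPK_REGION if needed.
DEFAULT_REGION = "aws-us-east-1-elastica"


def load_topk_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return keyword arguments for Client(api_key=..., region=...).

    TOPK_API_KEY and TOPK_REGION are hypothetical variable names chosen
    for this sketch; the SDK does not look them up on its own.
    """
    api_key = env.get("TOPK_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError("Set the TOPK_API_KEY environment variable first")
    region = env.get("TOPK_REGION", DEFAULT_REGION)
    return {"api_key": api_key, "region": region}
```

You would then pass the result to the client as client = Client(**load_topk_settings()).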

Step 2: Create a collection

Create a collection called books with a semantic index on the title field. This enables both semantic and keyword search.
from topk_sdk.schema import text, semantic_index

client.collections().create(
    "books",
    schema={
        "title": text().required().index(semantic_index()),
    },
)

What’s happening here?

  • text() - Defines a text field
  • .required() - Makes the field mandatory for all documents
  • .index(semantic_index()) - Creates a semantic index with automatic embeddings
Fields not defined in the schema can still be added to documents. The schema only enforces types and indexes for specified fields.
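The sketch below makes these rules concrete by checking a document against this step's schema on your own machine before you upsert it. It only mirrors the behavior described above:

  • _id is present
  • required fields are present
  • text fields hold strings
  • extra fields are allowed

check_document and the extra author field are hypothetical. The real enforcement happens in TopK.

```python
from typing import Any, Iterable


def check_document(
    doc: dict[str, Any],
    required: Iterable[str] = ("title",),
    text_fields: Iterable[str] = ("title",),
) -> None:
    """Raise ValueError if doc breaks the schema rules described above.

    A local illustration only; this is not how TopK validates documents.
    """
    # Every document needs an _id.
    if "_id" not in doc:
        raise ValueError("document is missing _id")
    # Fields declared with .required() must be present and non-null.
    for name in required:
        if doc.get(name) is None:
            raise ValueError(f"missing required field {name!r}")
    # Fields declared with text() must hold strings when a value is given.
    for name in text_fields:
        value = doc.get(name)
        if value is not None and not isinstance(value, str):
            raise ValueError(f"field {name!r} must be a string")
    # Fields not in the schema (for example "author") pass through unchecked.


# Passes: "author" is not in the schema, so it is accepted as-is.
check_document({"_id": "gatsby", "title": "The Great Gatsby", "author": "F. Scott Fitzgerald"})
```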

Step 3: Add documents

Insert some sample book documents into your collection. Each document must have an _id field.
client.collection("books").upsert([
    {"_id": "gatsby", "title": "The Great Gatsby"},
    {"_id": "1984", "title": "1984"},
    {"_id": "catcher", "title": "The Catcher in the Rye"},
])
The upsert() method creates new documents or updates existing ones if a document with the same _id already exists.
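You can model the upsert behavior with an in-memory dictionary keyed by _id. This only illustrates the semantics and is not how TopK stores data. It also assumes that an upsert replaces an existing document as a whole rather than merging its fields. upsert_local is a hypothetical name.

```python
def upsert_local(store: dict[str, dict], docs: list[dict]) -> None:
    """Insert or replace documents in an in-memory store keyed by _id."""
    for doc in docs:
        # Assumption: an existing document is replaced entirely, not merged.
        store[doc["_id"]] = dict(doc)


store: dict[str, dict] = {}
upsert_local(store, [
    {"_id": "gatsby", "title": "The Great Gatsby"},
    {"_id": "1984", "title": "1984"},
])
# Upserting an existing _id updates that document instead of adding a new one.
upsert_local(store, [{"_id": "1984", "title": "Nineteen Eighty-Four"}])
print(len(store), store["1984"]["title"])  # 2 Nineteen Eighty-Four
```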

Step 4: Search your collection

Now perform a semantic search to find books related to “classic American novel”.
from topk_sdk.query import select, fn, field

results = client.collection("books").query(
    select(
        "title",
        # Calculate semantic similarity between title and query
        title_similarity=fn.semantic_similarity(
            "title", 
            "classic American novel"
        ),
    )
    # Sort by similarity and return top 10 results
    .topk(field("title_similarity"), 10)
)

for doc in results:
    print(f"{doc['title']}: {doc['title_similarity']:.4f}")

Understanding the query

Let’s break down the query:
  1. select() - Defines which fields to return and any computed expressions (here, title_similarity)
  2. fn.semantic_similarity() - Computes a semantic similarity score between the field's value and the query text
  3. .topk() - Sorts by the similarity score and keeps only the top 10 results
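Conceptually, select() and .topk() work like the sketch below: score every document locally, then keep the best k. The sketch uses cosine similarity over tiny hand-made vectors. The vectors are invented for illustration, and the similarity function behind fn.semantic_similarity() may differ. local_topk is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def local_topk(docs: list[dict], query_vec: list[float], k: int) -> list[dict]:
    """Compute a score per document (the select step), then sort and truncate (the topk step)."""
    scored = [
        {"title": d["title"], "title_similarity": cosine(d["vec"], query_vec)}
        for d in docs
    ]
    scored.sort(key=lambda d: d["title_similarity"], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real ones come from the model
# behind semantic_index() and have far more dimensions.
docs = [
    {"title": "The Great Gatsby", "vec": [0.9, 0.1, 0.2]},
    {"title": "1984", "vec": [0.2, 0.9, 0.1]},
    {"title": "The Catcher in the Rye", "vec": [0.7, 0.3, 0.4]},
]
query = [1.0, 0.0, 0.3]
for doc in local_topk(docs, query, k=2):
    print(doc["title"])  # prints The Great Gatsby, then The Catcher in the Rye
```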

Expected output

The Great Gatsby: 0.8532
The Catcher in the Rye: 0.7891
1984: 0.6234
Scores are similarity values - higher means more relevant. The exact values may vary based on the embedding model.
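Because scores depend on the embedding model, treat them as relative rankings rather than fixed values. One way to drop weak matches is a client-side cutoff on the returned documents. The helper name above_threshold and the 0.75 threshold are hypothetical, and you should tune any cutoff for your own data. The sample rows reuse the illustrative scores from the expected output.

```python
def above_threshold(results, min_score: float, key: str = "title_similarity") -> list[dict]:
    """Keep only documents whose score is at least min_score, preserving order."""
    return [doc for doc in results if doc[key] >= min_score]


# Sample rows copied from the illustrative output; real scores will differ.
sample = [
    {"title": "The Great Gatsby", "title_similarity": 0.8532},
    {"title": "The Catcher in the Rye", "title_similarity": 0.7891},
    {"title": "1984", "title_similarity": 0.6234},
]
print([doc["title"] for doc in above_threshold(sample, 0.75)])
# ['The Great Gatsby', 'The Catcher in the Rye']
```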

Step 5: Add reranking (optional)

Improve result relevance by adding reranking to your query:
from topk_sdk.query import select, fn, field

results = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity(
            "title", 
            "classic American novel"
        ),
    )
    .topk(field("title_similarity"), 10)
    # Add reranking to improve relevance
    .rerank()
)
The .rerank() method uses TopK's built-in reranking model to refine the order of the retrieved results, improving precision at the top of the list.
Learn more about reranking options in the Reranking guide.
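Conceptually, reranking is a two-stage pattern. A fast score selects a shortlist, and a more precise (and more expensive) score then reorders only that shortlist. The sketch below shows the pattern with hand-written scores. It is a generic illustration, not TopK's reranking model, and both retrieve_then_rerank and the rerank_score values are invented.

```python
from typing import Callable


def retrieve_then_rerank(
    candidates: list[dict],
    first_stage: Callable[[dict], float],
    second_stage: Callable[[dict], float],
    k: int,
) -> list[dict]:
    """Pick the top k by a cheap score, then reorder only those k by a precise score."""
    # Stage 1: shortlist by the first-stage score (the role .topk() plays).
    shortlist = sorted(candidates, key=first_stage, reverse=True)[:k]
    # Stage 2: reorder the shortlist by the second-stage score (the role .rerank() plays).
    return sorted(shortlist, key=second_stage, reverse=True)


# Invented scores for illustration only.
docs = [
    {"title": "The Great Gatsby", "title_similarity": 0.85, "rerank_score": 0.70},
    {"title": "The Catcher in the Rye", "title_similarity": 0.79, "rerank_score": 0.90},
    {"title": "1984", "title_similarity": 0.62, "rerank_score": 0.95},
]
ranked = retrieve_then_rerank(
    docs,
    first_stage=lambda d: d["title_similarity"],
    second_stage=lambda d: d["rerank_score"],
    k=2,
)
print([d["title"] for d in ranked])  # ['The Catcher in the Rye', 'The Great Gatsby']
```

1984 has the highest rerank_score but is not returned, because it never made the first-stage shortlist. In this pattern the second stage only reorders what the first stage retrieved.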

Step 6: Clean up (optional)

When you’re done experimenting, you can delete the collection:
client.collections().delete("books")
Deleting a collection is irreversible. All documents and data in the collection will be permanently deleted.
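For experiments and tests, you may want the collection removed even if your code fails partway through. The minimal sketch below uses a context manager for this. temporary_collection is a hypothetical helper built only from the create() and delete() calls used in this guide. Given the warning above, use it only with throwaway collections.

```python
from contextlib import contextmanager


@contextmanager
def temporary_collection(client, name: str, schema: dict):
    """Create a collection, yield its name, and always delete it afterwards."""
    client.collections().create(name, schema=schema)
    try:
        yield name
    finally:
        # Runs even if the body raises, so experimental collections are not left behind.
        client.collections().delete(name)
```

Inside a with temporary_collection(client, "books", schema): block, you can upsert and query as in Steps 3 to 5. The collection is deleted when the block exits.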

Complete example

Here’s the full code in one place:
from topk_sdk import Client
from topk_sdk.schema import text, semantic_index
from topk_sdk.query import select, fn, field

# 1. Initialize client
client = Client(
    api_key="YOUR_TOPK_API_KEY",
    region="aws-us-east-1-elastica"
)

# 2. Create collection
client.collections().create(
    "books",
    schema={
        "title": text().required().index(semantic_index()),
    },
)

# 3. Add documents
client.collection("books").upsert([
    {"_id": "gatsby", "title": "The Great Gatsby"},
    {"_id": "1984", "title": "1984"},
    {"_id": "catcher", "title": "The Catcher in the Rye"},
])

# 4. Search with semantic similarity
results = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity(
            "title", 
            "classic American novel"
        ),
    )
    .topk(field("title_similarity"), 10)
    .rerank()
)

# 5. Display results
for doc in results:
    print(f"{doc['title']}: {doc['title_similarity']:.4f}")

# 6. Clean up (optional)
client.collections().delete("books")

Next steps

Now that you’ve built your first search application, explore more advanced features:

  • Schema design - Learn about field types, indexes, and schema validation
  • Advanced queries - Combine filters, keyword search, and semantic search
  • Vector search - Use custom embeddings for vector similarity search
  • Keyword search - Perform traditional text matching with BM25 scoring
