
delete()

Delete documents by ID. This operation permanently removes documents from the collection.

Signature

def delete(self, ids: Union[str, list[str]]) -> Union[Status, list[Status]]

Parameters

ids
Union[str, list[str]]
required
One or more document IDs to delete.

Returns

Status
Union[Status, list[Status]]
  • If a single ID is provided: returns a single Status object
  • If a list is provided: returns a list[Status] with one status per ID
Each Status indicates success or failure for that deletion.
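
Because the return type depends on the argument type, code that accepts either a single ID or a list may find it convenient to normalize the result first. The following is a minimal sketch, not part of zvec: `as_status_list` is a hypothetical helper, and `FakeStatus` is a stand-in used only so the snippet runs without a real collection.

```python
from typing import Any, List, Union


def as_status_list(result: Union[Any, List[Any]]) -> List[Any]:
    """Wrap a single Status in a list; copy a list result unchanged."""
    if isinstance(result, list):
        return list(result)
    return [result]


class FakeStatus:
    """Stand-in for zvec's Status, for demonstration only."""

    def __init__(self, ok: bool) -> None:
        self._ok = ok

    def ok(self) -> bool:
        return self._ok


# Shape of delete("doc_1"): a single Status
single = as_status_list(FakeStatus(True))
# Shape of delete(["doc_1", "doc_2"]): a list of Status objects
batch = as_status_list([FakeStatus(True), FakeStatus(False)])
print(len(single), len(batch))  # 1 2
```

With a real collection, the call would look like `as_status_list(collection.delete(ids))`, after which the caller can always iterate over the result.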

Basic Example

import zvec

# `collection` is an already opened zvec collection
# Delete a single document
status = collection.delete("doc_123")
if status.ok():
    print("Document deleted successfully")
else:
    print(f"Delete failed: {status.message()}")

Batch Deletion

Delete multiple documents efficiently:
# Delete multiple documents by ID
ids_to_delete = ["doc_1", "doc_2", "doc_3", "doc_4", "doc_5"]
statuses = collection.delete(ids_to_delete)

# Check results
success_count = sum(1 for s in statuses if s.ok())
print(f"Successfully deleted {success_count}/{len(ids_to_delete)} documents")

Delete from Query Results

Delete documents found by a query:
from zvec import VectorQuery

# Find similar documents
results = collection.query(
    vectors=VectorQuery("embedding", id="doc_spam"),
    topk=100,
    filter="category == 'spam'"
)

# Delete them
ids_to_delete = [doc.id for doc in results]
statuses = collection.delete(ids_to_delete)

delete_by_filter()

Delete all documents matching a filter expression. This is more efficient than fetching IDs and calling delete() for large-scale deletions.

Signature

def delete_by_filter(self, filter: str) -> None

Parameters

filter
str
required
Boolean expression defining which documents to delete. Examples:
  • "status == 'archived'"
  • "created_date < '2023-01-01'"
  • "category IN ['spam', 'test'] AND verified == false"

Returns

This method does not return a value. It raises an exception if the operation fails.
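
Since failures surface as exceptions rather than as a Status, callers that prefer a boolean result can wrap the call. This is a sketch under assumptions: this page does not name the exception type, so the wrapper catches `Exception` broadly; `try_delete_by_filter` is a hypothetical helper, and `FakeCollection` is a stand-in so the snippet runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


def try_delete_by_filter(collection, filter_expr: str) -> bool:
    """Run delete_by_filter() and report success as a boolean."""
    try:
        collection.delete_by_filter(filter_expr)
    except Exception as exc:  # assumption: the concrete exception type is not documented here
        logger.error("delete_by_filter(%r) failed: %s", filter_expr, exc)
        return False
    return True


class FakeCollection:
    """Stand-in for a zvec collection, for demonstration only."""

    def delete_by_filter(self, filter: str) -> None:
        if not filter:
            raise ValueError("empty filter expression")


demo = FakeCollection()
print(try_delete_by_filter(demo, "category == 'spam'"))  # True
print(try_delete_by_filter(demo, ""))                    # False
```

Narrowing the `except` clause to the library's actual exception class, once known, avoids hiding unrelated bugs.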

Example

# Delete all documents in a category
collection.delete_by_filter("category == 'spam'")

# Delete old documents
collection.delete_by_filter("created_timestamp < 1640995200")

# Complex filter
collection.delete_by_filter(
    "(status == 'draft' OR status == 'archived') AND views < 10"
)

delete_by_filter() is irreversible and can delete many documents at once. Test your filter expression carefully before running it:
# Preview the filter by querying first. query() returns at most topk
# results, so this is a sample of matches, not a total count.
results = collection.query(
    vectors=VectorQuery("embedding", vector=[0] * 128),
    topk=10,
    filter="status == 'draft'"
)
print(f"Filter matches at least {len(results)} documents (capped at topk=10)")

# If satisfied, proceed with deletion
collection.delete_by_filter("status == 'draft'")

Error Handling

Common Delete Errors

from zvec import StatusCode

status = collection.delete("doc_123")

if not status.ok():
    if status.code() == StatusCode.NOT_FOUND:
        print("Document does not exist (may already be deleted)")
    else:
        print(f"Error: {status.message()}")

Deleting a nonexistent document typically returns a NOT_FOUND status. In most use cases this is not an error, because the end result is the same: the document does not exist.
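
One way to encode this is an idempotent helper that treats NOT_FOUND as "already absent" and raises only on other failures. The sketch below is illustrative: `ensure_deleted` is a hypothetical helper, and the `Fake*` classes are stand-ins for zvec's `StatusCode`, `Status`, and collection so the snippet runs on its own. With zvec you would pass `StatusCode.NOT_FOUND` as `not_found_code`.

```python
from enum import Enum


class FakeStatusCode(Enum):
    """Stand-in for zvec.StatusCode, for demonstration only."""
    OK = 0
    NOT_FOUND = 1


class FakeStatus:
    """Stand-in for zvec's Status, for demonstration only."""

    def __init__(self, code: FakeStatusCode, message: str = "") -> None:
        self._code, self._message = code, message

    def ok(self) -> bool:
        return self._code is FakeStatusCode.OK

    def code(self) -> FakeStatusCode:
        return self._code

    def message(self) -> str:
        return self._message


def ensure_deleted(collection, doc_id: str, not_found_code) -> bool:
    """Return True if this call removed the document, False if it was
    already absent. Raise RuntimeError on any other failure."""
    status = collection.delete(doc_id)
    if status.ok():
        return True
    if status.code() == not_found_code:
        return False
    raise RuntimeError(f"Failed to delete {doc_id}: {status.message()}")


class FakeCollection:
    """Stand-in collection backed by a set of IDs."""

    def __init__(self, ids) -> None:
        self._ids = set(ids)

    def delete(self, doc_id: str) -> FakeStatus:
        if doc_id in self._ids:
            self._ids.remove(doc_id)
            return FakeStatus(FakeStatusCode.OK)
        return FakeStatus(FakeStatusCode.NOT_FOUND, "not found")


demo = FakeCollection(["doc_123"])
print(ensure_deleted(demo, "doc_123", FakeStatusCode.NOT_FOUND))  # True
print(ensure_deleted(demo, "doc_123", FakeStatusCode.NOT_FOUND))  # False
```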

Handling Batch Delete Failures

ids = ["doc_1", "doc_2", "doc_3"]
statuses = collection.delete(ids)

# Find which deletions failed
failed = [
    (doc_id, status) for doc_id, status in zip(ids, statuses) if not status.ok()
]

if failed:
    print(f"{len(failed)} deletions failed:")
    for doc_id, status in failed:
        print(f"  ID {doc_id}: {status.message()}")
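
If failures may be transient, the failed IDs can be resubmitted on their own. The following is a sketch, not a documented zvec feature: `delete_with_retry` is a hypothetical helper that retries every non-ok status a bounded number of times, and `FlakyCollection` is a stand-in that simulates failures. In practice you would likely drop NOT_FOUND results before retrying, since retrying them cannot succeed.

```python
from typing import Dict, List


def delete_with_retry(collection, ids: List[str], max_attempts: int = 3) -> Dict[str, object]:
    """Delete ids as a batch, then retry only the IDs that failed.

    Returns a dict mapping each still-failing ID to its last Status.
    """
    pending = list(ids)
    failed: Dict[str, object] = {}
    for _ in range(max_attempts):
        if not pending:
            break
        statuses = collection.delete(pending)
        failed = {i: s for i, s in zip(pending, statuses) if not s.ok()}
        pending = list(failed)
    return failed


class FakeStatus:
    """Stand-in for zvec's Status, for demonstration only."""

    def __init__(self, ok: bool, message: str = "") -> None:
        self._ok, self._message = ok, message

    def ok(self) -> bool:
        return self._ok

    def message(self) -> str:
        return self._message


class FlakyCollection:
    """Stand-in: "doc_2" fails on the first call only, "doc_3" always fails."""

    def __init__(self) -> None:
        self.calls = 0

    def delete(self, ids: List[str]) -> List[FakeStatus]:
        self.calls += 1
        out = []
        for doc_id in ids:
            fails = doc_id == "doc_3" or (doc_id == "doc_2" and self.calls == 1)
            out.append(FakeStatus(not fails, "simulated failure" if fails else ""))
        return out


demo = FlakyCollection()
remaining = delete_with_retry(demo, ["doc_1", "doc_2", "doc_3"])
print(sorted(remaining), demo.calls)  # ['doc_3'] 3
```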

Reclaiming Space

Deleted documents don’t immediately free up disk space. To reclaim storage:
from zvec import OptimizeOption

# Delete documents
collection.delete_by_filter("category == 'spam'")

# Optimize to compact segments and free space
collection.optimize(OptimizeOption())
collection.flush()

Run optimize() periodically after large deletions to:
  • Reclaim disk space
  • Improve query performance
  • Reduce memory usage

Performance Tips

Use delete_by_filter() for Bulk Deletions

# Good: Direct filter-based deletion
collection.delete_by_filter("status == 'archived'")

# Bad: Fetch IDs then delete (extra round trip, and capped at topk results)
results = collection.query(
    vectors=VectorQuery("embedding", vector=[0] * 128),
    topk=10000,
    filter="status == 'archived'"
)
ids = [doc.id for doc in results]
collection.delete(ids)

Batch Delete Operations

# Good: Delete in a single batch
ids = get_ids_to_delete()  # Placeholder for your own helper; here it returns 10,000 IDs
collection.delete(ids)

# Bad: Individual deletions
for doc_id in ids:
    collection.delete(doc_id)
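
When an ID list is very large, a middle ground between one huge call and per-document calls is to delete in fixed-size chunks, which also allows progress reporting between batches. This is a sketch under assumptions: this page does not state a maximum batch size, so `chunk_size=1000` is an arbitrary illustrative value; `delete_in_chunks` is a hypothetical helper, and `RecordingCollection` is a stand-in so the snippet runs on its own.

```python
from typing import Iterator, List


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of ids with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_in_chunks(collection, ids: List[str], chunk_size: int = 1000) -> list:
    """Delete ids in fixed-size batches and return all statuses in input order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    statuses = []
    for chunk in chunked(ids, chunk_size):
        statuses.extend(collection.delete(chunk))
    return statuses


class FakeStatus:
    """Stand-in for zvec's Status, for demonstration only."""

    def ok(self) -> bool:
        return True


class RecordingCollection:
    """Stand-in that records the size of each delete() batch."""

    def __init__(self) -> None:
        self.batch_sizes: List[int] = []

    def delete(self, ids: List[str]) -> List[FakeStatus]:
        self.batch_sizes.append(len(ids))
        return [FakeStatus() for _ in ids]


demo = RecordingCollection()
all_ids = [f"doc_{n}" for n in range(2500)]
statuses = delete_in_chunks(demo, all_ids)
print(demo.batch_sizes, len(statuses))  # [1000, 1000, 500] 2500
```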

Optimize After Large Deletions

# Delete 50% of documents
collection.delete_by_filter("created_year < 2020")

# Optimize to reclaim space and rebuild indices
collection.optimize()

Deletion Strategies

Soft Delete Pattern

Instead of permanently deleting, mark documents as deleted:
import zvec
from zvec import Doc

# Add a "deleted" field to your schema
schema = zvec.CollectionSchema(
    name="articles",
    fields=[
        zvec.FieldSchema("deleted", zvec.DataType.BOOL, nullable=False),
        # ... other fields
    ]
)

# Soft delete: mark as deleted
collection.update(Doc(id="doc_123", fields={"deleted": True}))

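# Queries must exclude soft-deleted documents explicitly.
# `query` is a placeholder for your own VectorQuery.
results = collection.query(
    vectors=query,
    filter="deleted == false",
    topk=10
)

# Permanently delete later. This assumes the schema also has a
# `deleted_date` field that is set when a document is soft-deleted.
collection.delete_by_filter("deleted == true AND deleted_date < '2024-01-01'")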
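
The purge cutoff can be derived from a retention period instead of being hard-coded. A minimal sketch, assuming `deleted_date` is stored as an ISO 8601 date string (as the example filter above suggests); `purge_filter` is a hypothetical helper, not part of zvec.

```python
from datetime import date, timedelta


def purge_filter(today: date, retention_days: int) -> str:
    """Build a filter matching documents soft-deleted before the cutoff."""
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    cutoff = today - timedelta(days=retention_days)
    return f"deleted == true AND deleted_date < '{cutoff.isoformat()}'"


print(purge_filter(date(2024, 1, 31), 30))
# deleted == true AND deleted_date < '2024-01-01'
```

The resulting string can be passed to `collection.delete_by_filter(...)`. Taking `today` as a parameter keeps the function deterministic and easy to test.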

Archival Pattern

Move old documents to an archive collection before deletion:
import zvec
from zvec import VectorQuery

# Query old documents (returns at most topk results per call)
old_docs = collection.query(
    vectors=VectorQuery("embedding", vector=[0] * 128),
    topk=10000,
    filter="created_year < 2020",
    include_vector=True
)

# Move to archive
archive = zvec.open("./archive_collection")
archive.insert(old_docs)

# Delete from main collection
ids = [doc.id for doc in old_docs]
collection.delete(ids)
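
The snippet above deletes every queried document even if some inserts into the archive failed. A more defensive variant deletes only what was archived successfully. This is a sketch under assumptions: it supposes `archive.insert(docs)` returns one Status per document for list input, mirroring `delete()`, which this page does not confirm. `archive_then_delete` is a hypothetical helper, and the `Fake*` classes are stand-ins so the snippet runs on its own.

```python
from typing import List, Tuple


def archive_then_delete(collection, archive, docs: list) -> Tuple[List[str], List[str]]:
    """Insert docs into the archive, then delete only those that were archived.

    Returns (deleted_ids, kept_ids).
    """
    if not docs:
        return [], []
    insert_statuses = archive.insert(docs)  # assumption: one Status per doc
    archived = [d.id for d, s in zip(docs, insert_statuses) if s.ok()]
    kept = [d.id for d, s in zip(docs, insert_statuses) if not s.ok()]
    deleted: List[str] = []
    if archived:
        delete_statuses = collection.delete(archived)
        deleted = [i for i, s in zip(archived, delete_statuses) if s.ok()]
        kept += [i for i, s in zip(archived, delete_statuses) if not s.ok()]
    return deleted, kept


class FakeStatus:
    def __init__(self, ok: bool) -> None:
        self._ok = ok

    def ok(self) -> bool:
        return self._ok


class FakeDoc:
    def __init__(self, id: str) -> None:
        self.id = id


class FakeArchive:
    """Stand-in archive that rejects the document with ID "doc_b"."""

    def insert(self, docs: list) -> List[FakeStatus]:
        return [FakeStatus(d.id != "doc_b") for d in docs]


class FakeCollection:
    """Stand-in collection whose deletions always succeed."""

    def delete(self, ids: List[str]) -> List[FakeStatus]:
        return [FakeStatus(True) for _ in ids]


docs = [FakeDoc("doc_a"), FakeDoc("doc_b"), FakeDoc("doc_c")]
deleted, kept = archive_then_delete(FakeCollection(), FakeArchive(), docs)
print(deleted, kept)  # ['doc_a', 'doc_c'] ['doc_b']
```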

Atomic Guarantees

Single document deletions are atomic, but batch deletions are not transactional:
  • If you delete ["doc_1", "doc_2", "doc_3"] and it fails on doc_2, then doc_1 may be deleted while doc_3 remains.
  • Always check the returned Status list to verify which deletions succeeded.
ids = ["doc_1", "doc_2", "doc_3"]
statuses = collection.delete(ids)

# Check which succeeded
for doc_id, status in zip(ids, statuses):
    if status.ok():
        print(f"✓ Deleted {doc_id}")
    else:
        print(f"✗ Failed to delete {doc_id}: {status.message()}")
