Knowledge Base

The Knowledge Base is Support Bot’s memory: a vector-powered repository of historical incidents that enables semantic search across millions of records. It uses the Qdrant vector database with embeddings to find similar incidents based on meaning, not just keywords.

How It Works

The Knowledge Base transforms unstructured incident data into searchable, semantically-indexed records:
1. Ingestion

Incident data is uploaded via the API or the ServiceNow integration. The system validates the schema, maps fields, and prepares documents for processing.
2. Embedding Generation

Each incident is converted into vector embeddings using the all-MiniLM-L6-v2 model:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True}
)
3. Vector Storage

Embeddings are stored in Qdrant with rich metadata for filtering and retrieval:
# LangChain format with metadata
metadata = {
    "incident_id": "INC-2025-001",
    "incident_title": "Payment Gateway Timeout",
    "impacted_application": "PaymentAPI",
    "root_cause": "Database connection pool exhausted",
    "mitigation": "Increased pool size to 50",
    "source_system": "ServiceNow"
}
4. Semantic Search

When you search, your query is embedded and compared against stored vectors using cosine similarity to find the most relevant incidents.
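The comparison step can be sketched in plain Python. This is a minimal illustration, not the service's implementation: the helper names (`cosine_similarity`, `rank_incidents`) and the toy 3-dimensional vectors are assumptions made for the example, and in production Qdrant performs this ranking internally.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_incidents(
    query_vec: List[float],
    stored: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the best top_k."""
    scored = [(iid, cosine_similarity(query_vec, vec)) for iid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors for illustration; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
stored_vectors = {
    "INC-A": [0.9, 0.1, 0.0],
    "INC-B": [0.0, 1.0, 0.0],
    "INC-C": [0.7, 0.3, 0.1],
}
query_vector = [1.0, 0.0, 0.0]
top_matches = rank_incidents(query_vector, stored_vectors, top_k=2)
```

Here `INC-A` and `INC-C` rank highest because their vectors point in nearly the same direction as the query, while `INC-B` is orthogonal to it and scores zero.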

Key Features

Unlike keyword search, semantic search understands meaning. Take the query "database timeout":
  • Keyword search matches only records containing exactly "database" AND "timeout".
  • Semantic search matches on meaning, so it helps you find relevant incidents even when they use different terminology.

Dataset Versioning

The Knowledge Base supports multiple versions of your incident dataset:
{
  "version_id": "550e8400-e29b-41d4-a716-446655440000",
  "version_number": 3,
  "collection_name": "past_issues_v3",
  "status": "active",
  "incident_count": 1247,
  "created_at": "2025-03-01T10:30:00Z",
  "activated_at": "2025-03-01T11:00:00Z"
}
You can:
  • Create new versions during ingestion
  • Rollback to previous versions
  • Compare performance across versions
  • Delete inactive versions

Batch Ingestion with Progress

Large incident datasets are ingested in batches with real-time progress tracking:
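import math
from typing import List


async def ingest_incidents_to_qdrant(
    incidents: List[dict],
    batch_size: int = 5,  # incidents per batch
    progress_callback=None
):
    total_batches = math.ceil(len(incidents) / batch_size)

    for batch_idx in range(total_batches):
        # Slice out the incidents that belong to this batch
        start = batch_idx * batch_size
        batch = incidents[start:start + batch_size]
        incident_ids = [inc.get("incident_id") for inc in batch]

        # Process batch (embedding and upsert into Qdrant are elided here)

        # Report progress only if a callback was supplied
        if progress_callback is not None:
            await progress_callback(
                batch_idx + 1,
                total_batches,
                incident_ids
            )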
This prevents timeouts and provides visibility into long-running ingestion jobs.
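As a rough illustration of how a caller might consume the progress callback, the sketch below uses a self-contained stand-in for the ingestion loop. The names `run_in_batches` and `record_progress` are hypothetical, and the embedding and Qdrant upsert work is replaced by a placeholder comment.

```python
import asyncio
import math
from typing import Awaitable, Callable, List, Optional

ProgressCallback = Callable[[int, int, List[str]], Awaitable[None]]


async def run_in_batches(
    incidents: List[dict],
    batch_size: int = 5,
    progress_callback: Optional[ProgressCallback] = None,
) -> int:
    """Walk the incidents batch by batch and report progress after each one."""
    total_batches = math.ceil(len(incidents) / batch_size)
    for batch_idx in range(total_batches):
        start = batch_idx * batch_size
        batch = incidents[start:start + batch_size]
        # Placeholder for the real work (embedding + upsert into Qdrant).
        incident_ids = [inc["incident_id"] for inc in batch]
        if progress_callback is not None:
            await progress_callback(batch_idx + 1, total_batches, incident_ids)
    return total_batches


progress_log: List[str] = []


async def record_progress(batch: int, total: int, incident_ids: List[str]) -> None:
    """Example callback: collect one human-readable line per finished batch."""
    progress_log.append(
        f"Batch {batch} of {total} processed ({len(incident_ids)} incidents)"
    )


sample = [{"incident_id": f"INC-{i:03d}"} for i in range(12)]
batches_done = asyncio.run(
    run_in_batches(sample, batch_size=5, progress_callback=record_progress)
)
```

With 12 incidents and a batch size of 5, the callback fires three times, and the last batch carries only the two remaining incidents.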

Document Preparation

Incidents are split into chunks for better retrieval:
# Newer LangChain releases ship the splitters in the langchain-text-splitters package
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # characters per chunk
    chunk_overlap=200,    # overlap for context preservation
    length_function=len
)

# Each incident becomes multiple chunks
source_text = f"""
Incident Title: {title}
Incident Description: {description}
Action Taken and Resolution: {action_taken}
"""

chunks = text_splitter.split_text(source_text)
Each chunk maintains metadata linking back to the original incident.
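To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is a sketch only: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, whereas this version cuts at fixed character offsets. The helper names and the `chunk_index` metadata field are assumptions for illustration.

```python
from typing import Dict, List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def chunks_with_metadata(
    incident_id: str,
    text: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
) -> List[Dict]:
    """Attach the parent incident ID to every chunk so results can be traced back."""
    return [
        {"text": chunk, "metadata": {"incident_id": incident_id, "chunk_index": i}}
        for i, chunk in enumerate(chunk_with_overlap(text, chunk_size, chunk_overlap))
    ]


# 2,500 characters of sample text made of repeating digits.
sample_text = "".join(str(i % 10) for i in range(2500))
records = chunks_with_metadata("INC-2025-001", sample_text)
```

With the settings shown, a 2,500-character incident yields three chunks, and the last 200 characters of each chunk reappear at the start of the next one.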

Metadata Extraction

The system automatically extracts structured metadata from ServiceNow incidents:
from typing import Dict


def _parse_description_metadata(description: str) -> Dict[str, str]:
    """Extract structured fields from incident descriptions."""
    metadata = {}

    # Parse fields like:
    # - impactedApplication: PaymentAPI
    # - rootCause: Database timeout
    # - mitigation: Increased connection pool
    # - accountableParty: DevOps Team

    for line in description.split('\n'):
        if ':' in line:
            # Split on the first colon only, so values may contain colons
            key, value = line.split(':', 1)
            if key.strip():  # ignore lines that start with a colon
                metadata[key.strip()] = value.strip()

    return metadata
This enables filtering by application, root cause, team, and more.
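Once fields are extracted, filtering reduces to matching key/value pairs. The sketch below shows the idea on already-parsed records; `filter_by_metadata` is a hypothetical helper and the sample records are made up for illustration. In the running system, the equivalent filtering would be applied to the metadata stored alongside each vector.

```python
from typing import Dict, List


def filter_by_metadata(
    records: List[Dict[str, str]], **criteria: str
) -> List[Dict[str, str]]:
    """Keep records whose metadata equals every given key/value (case-insensitive)."""

    def matches(record: Dict[str, str]) -> bool:
        return all(
            record.get(key, "").lower() == value.lower()
            for key, value in criteria.items()
        )

    return [record for record in records if matches(record)]


# Illustrative records using the field names from the parser's comments.
parsed_records = [
    {"incident_id": "INC-2025-001", "impactedApplication": "PaymentAPI",
     "accountableParty": "DevOps Team"},
    {"incident_id": "INC-2025-002", "impactedApplication": "LoanAPI",
     "accountableParty": "DevOps Team"},
    {"incident_id": "INC-2025-003", "impactedApplication": "PaymentAPI",
     "accountableParty": "Platform Team"},
]

payment_incidents = filter_by_metadata(parsed_records, impactedApplication="paymentapi")
devops_payment = filter_by_metadata(
    parsed_records, impactedApplication="PaymentAPI", accountableParty="DevOps Team"
)
```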

Managing Your Knowledge Base

Viewing Incidents

List all incidents in the knowledge base:
GET /api/knowledge-base/incidents
{
  "success": true,
  "incidents": [
    {
      "incident_id": "INC-2025-08-24-001",
      "title": "Payment Gateway Timeout",
      "description": "Users unable to complete payments...",
      "action_taken": "Increased database connection pool...",
      "opened_at": "2025-08-24T10:30:00Z",
      "source": "servicenow"
    }
  ]
}

Uploading Incidents

1. Upload Files

Send your incident data (CSV, JSON, or Excel):
POST /api/knowledge-base/upload
{
  "files": [
    {
      "filename": "incidents_2025.csv",
      "size": 2048576,
      "content": "base64-encoded-content"
    }
  ]
}
The system creates an upload session and returns a preview:
{
  "success": true,
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending_validation",
  "incident_count": 1500,
  "preview": [/* first 10 records */]
}
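A client might assemble the upload body along these lines. This is a sketch under assumptions: `build_upload_payload` is a hypothetical helper, and `size` is taken to mean the raw byte count before Base64 encoding, which the API description above does not state explicitly.

```python
import base64
import json
from typing import Dict, List


def build_upload_payload(files: Dict[str, bytes]) -> dict:
    """Build the JSON body for the upload endpoint from filename -> raw bytes."""
    entries: List[dict] = []
    for filename, raw in files.items():
        entries.append({
            "filename": filename,
            "size": len(raw),  # assumption: size of the raw file, not the encoded text
            "content": base64.b64encode(raw).decode("ascii"),
        })
    return {"files": entries}


csv_bytes = b"incident_id,title\nINC-2025-001,Payment Gateway Timeout\n"
payload = build_upload_payload({"incidents_2025.csv": csv_bytes})
request_body = json.dumps(payload)  # string to send as the POST body
```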
2. Validate Schema

Run validation to check for required fields:
POST /api/knowledge-base/validate/{session_id}
{
  "success": true,
  "is_valid": false,
  "errors": [
    {
      "field": "incident_id",
      "message": "Missing required field",
      "affected_rows": [1, 5, 12]
    }
  ],
  "suggested_mapping": {
    "id": "incident_id",
    "summary": "title"
  }
}
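Running a similar check locally before uploading can catch missing fields early. The sketch below mirrors the shape of the `errors` array shown above; `find_missing_fields` is a hypothetical helper, the required-field list is taken from the Best Practices section of this page, and the 1-based row numbering is an assumption.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("incident_id", "title", "description", "action_taken")


def find_missing_fields(records: List[Dict[str, str]]) -> List[dict]:
    """Report, per required field, which rows lack a non-empty value."""
    errors: List[dict] = []
    for field in REQUIRED_FIELDS:
        # Rows are numbered from 1 here; the API's numbering is not documented.
        rows = [
            row_number
            for row_number, record in enumerate(records, start=1)
            if not str(record.get(field) or "").strip()
        ]
        if rows:
            errors.append({
                "field": field,
                "message": "Missing required field",
                "affected_rows": rows,
            })
    return errors


sample_records = [
    {"incident_id": "INC-1", "title": "A", "description": "d", "action_taken": "x"},
    {"id": "INC-2", "title": "B", "description": "d", "action_taken": "x"},
    {"incident_id": "INC-3", "title": "", "description": "d", "action_taken": "x"},
]
local_errors = find_missing_fields(sample_records)
```

In this sample, row 2 is flagged for `incident_id` because its identifier sits under `id`, which is exactly the situation a field mapping resolves.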
3. Apply Field Mapping (If Needed)

If your fields don’t match the schema, apply a mapping:
POST /api/knowledge-base/validate/{session_id}/map-fields
{
  "mapping": {
    "id": "incident_id",
    "summary": "title",
    "details": "description",
    "resolution": "action_taken"
  }
}
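Conceptually, a mapping renames source columns to schema fields. The following sketch shows that transformation on in-memory records; `apply_field_mapping` is a hypothetical helper, and the server-side behavior (for example, how name collisions are handled) may differ.

```python
from typing import Dict, List


def apply_field_mapping(
    records: List[Dict[str, str]], mapping: Dict[str, str]
) -> List[Dict[str, str]]:
    """Return copies of the records with source keys renamed to schema keys.

    Keys that are not in the mapping are kept unchanged.
    """
    return [
        {mapping.get(key, key): value for key, value in record.items()}
        for record in records
    ]


mapping = {
    "id": "incident_id",
    "summary": "title",
    "details": "description",
    "resolution": "action_taken",
}
raw_records = [
    {"id": "INC-2025-001", "summary": "Payment Gateway Timeout",
     "details": "Users unable to complete payments...",
     "resolution": "Increased database connection pool...",
     "priority": "1"},
]
mapped_records = apply_field_mapping(raw_records, mapping)
```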
4. Confirm Ingestion

Start the ingestion process:
POST /api/knowledge-base/ingest
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "notes": "Q4 2024 incidents from ServiceNow export"
}
This returns a Server-Sent Events (SSE) stream with progress:
event: progress
data: {"batch": 1, "totalBatches": 300, "message": "Batch 1 of 300 processed"}

event: progress
data: {"batch": 2, "totalBatches": 300, "message": "Batch 2 of 300 processed"}

event: complete
data: {"success": true, "version_id": "...", "incident_count": 1500}
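A client needs to split this stream into events and decode each `data:` payload. The parser below is a minimal sketch that handles only the `event:` and `data:` fields shown above; `parse_sse` is a hypothetical helper, and it assumes every payload is JSON. For convenience it also flushes a final event that is not followed by a blank line, which a strict SSE client would discard.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, decoded_data) pairs from the lines of an SSE stream."""
    event_name = "message"  # SSE default when no "event:" line is sent
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # stream ended without a trailing blank line
        yield event_name, json.loads("\n".join(data_lines))


sample_stream = """event: progress
data: {"batch": 1, "totalBatches": 300, "message": "Batch 1 of 300 processed"}

event: progress
data: {"batch": 2, "totalBatches": 300, "message": "Batch 2 of 300 processed"}

event: complete
data: {"success": true, "version_id": "...", "incident_count": 1500}
"""
events = list(parse_sse(sample_stream.splitlines()))
```

Each `progress` event carries the batch counters, so a UI can compute a percentage as `batch / totalBatches`.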

Deleting Incidents

Remove specific incidents from the knowledge base:
DELETE /api/knowledge-base/incidents/{incident_id}
This removes the incident from both Qdrant and the JSON cache.

Version Management

Listing Versions

View all dataset versions:
GET /api/knowledge-base/versions?limit=20&offset=0
{
  "success": true,
  "versions": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "version_number": 3,
      "collection_name": "past_issues_v3",
      "status": "active",
      "is_active": true,
      "incident_count": 1247,
      "source": "upload",
      "created_at": "2025-03-01T10:30:00Z"
    }
  ],
  "total": 5,
  "limit": 20,
  "offset": 0
}
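To walk through every version, a client can step `offset` forward by `limit` until it reaches `total`. The sketch below only builds the request paths; `version_page_urls` is a hypothetical helper, and in practice `total` comes from the first response.

```python
from typing import List
from urllib.parse import urlencode


def version_page_urls(
    total: int,
    limit: int = 20,
    base_path: str = "/api/knowledge-base/versions",
) -> List[str]:
    """Return one request path per page needed to cover `total` versions."""
    return [
        f"{base_path}?{urlencode({'limit': limit, 'offset': offset})}"
        for offset in range(0, total, limit)
    ]


single_page = version_page_urls(total=5)
three_pages = version_page_urls(total=45)
```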

Rolling Back

Revert to a previous version:
POST /api/knowledge-base/versions/{version_id}/rollback
{
  "notes": "Reverting due to bad data in v3"
}
This:
  1. Deactivates the current version
  2. Activates the specified version
  3. Restores the Qdrant collection from snapshot
  4. Updates the AI copilot to use the restored collection
Rolling back doesn’t delete the current version; it simply makes a different version active. You can always roll forward again.
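The status changes in steps 1 and 2 can be modeled with a small in-memory registry. This is an illustrative sketch, not the service's code: `DatasetVersion` and `VersionRegistry` are hypothetical names, the `"inactive"` status label is an assumption (this page only shows `"active"`), and the snapshot restore and copilot update from steps 3 and 4 are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DatasetVersion:
    version_number: int
    collection_name: str
    status: str = "inactive"  # assumed label for any version that is not active


class VersionRegistry:
    """In-memory stand-in for the version store (hypothetical)."""

    def __init__(self) -> None:
        self.versions: Dict[int, DatasetVersion] = {}

    def add(self, version_number: int) -> DatasetVersion:
        version = DatasetVersion(version_number, f"past_issues_v{version_number}")
        self.versions[version_number] = version
        return version

    def active(self) -> Optional[DatasetVersion]:
        return next((v for v in self.versions.values() if v.status == "active"), None)

    def activate(self, version_number: int) -> DatasetVersion:
        target = self.versions[version_number]  # KeyError if the version is unknown
        current = self.active()
        if current is not None:
            current.status = "inactive"  # step 1: deactivate the current version
        target.status = "active"         # step 2: activate the requested version
        return target


registry = VersionRegistry()
for number in (1, 2, 3):
    registry.add(number)
registry.activate(3)
registry.activate(2)  # roll back from v3 to v2; v3 stays in the registry
```

Because nothing is deleted, calling `registry.activate(3)` afterwards rolls forward again.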

Deleting Versions

Remove inactive versions to free up storage:
DELETE /api/knowledge-base/versions/{version_id}
You cannot delete the currently active version. Activate a different version first.

ServiceNow Integration

The Knowledge Base has special handling for ServiceNow incident exports:
# ServiceNow format detection
if "servicenow" in source.lower():
    # Parse ServiceNow-specific fields
    metadata = {
        "incident_id": inc.get("number"),
        "incident_title": inc.get("short_description"),
        "impacted_application": parse_field("impactedApplication"),
        "root_cause": parse_field("rootCause"),
        "mitigation": parse_field("mitigation"),
        "category": inc.get("category"),
        "priority": inc.get("priority"),
        "source_system": "ServiceNow"
    }
This ensures ServiceNow exports are automatically structured and searchable.

Search Performance

The Knowledge Base uses COSINE distance for vector similarity:
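from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

# Assumes a local Qdrant instance on the default port; adjust the URL for your deployment
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="past_issues_v2",
    vectors_config=VectorParams(
        size=384,              # all-MiniLM-L6-v2 embedding size
        distance=Distance.COSINE
    )
)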

Search Tips

The semantic search works best with full sentences:
  • ✅ “Payment gateway timing out during checkout”
  • ❌ “payment, gateway, timeout”

More context leads to better results:
  • ✅ “Users in EU region unable to complete Swift bank transfers”
  • ❌ “Swift errors”

Mention specific systems for filtered results:
  • ✅ “Database connection issues in the LoanAPI service”
  • ❌ “Database problems”

Monitoring Ingestion

Track incident ingestion with logs:
GET /api/knowledge-base/logs?limit=10
{
  "success": true,
  "logs": [
    {
      "id": "log-001",
      "incident_id": "INC-2025-001",
      "title": "Payment Gateway Timeout",
      "source": "servicenow",
      "created_at": "2025-03-01T10:30:00Z"
    }
  ]
}
This helps debug ingestion issues and audit what’s been added.

Best Practices

Data Quality

Complete Fields

Ensure all required fields are populated:
  • incident_id (unique identifier)
  • title (short description)
  • description (detailed info)
  • action_taken (resolution)

Consistent Format

Use consistent date formats, naming conventions, and terminology across all incidents

Rich Metadata

Include structured metadata like application names, teams, and categories

Regular Updates

Keep your knowledge base current by ingesting new incidents regularly

Performance Optimization

  1. Batch Size: Use smaller batches (5-10 incidents) for better progress tracking
  2. Embeddings Cache: The embedding model is loaded once and reused across batches for efficiency
  3. Collection Naming: Each version gets its own collection (e.g., past_issues_v2)

Troubleshooting

Ingestion Failing

If ingestion fails with validation errors:
{
  "success": false,
  "message": "Some records are missing required fields (e.g. incident_id, title, or description)"
}
Solution: Apply field mapping or clean your source data.

Search Not Finding Results

If searches return no results:
  1. Check active version: Verify a version is active
  2. Verify embeddings: Ensure incidents were actually ingested
  3. Test with broader queries: Try more general search terms

Slow Search Performance

If searches are slow:
  1. Check Qdrant: Ensure the Qdrant service is running and healthy
  2. Monitor collection size: Very large collections (>1M vectors) may need optimization
  3. Use metadata filters: Pre-filter by application or date before vector search

Next Steps

AI Copilot

Learn how the copilot uses the knowledge base for intelligent search

ServiceNow Integration

Set up automatic sync from ServiceNow to your knowledge base

Vector Search Deep Dive

Understand the technical details of vector embeddings and search

API Reference

Complete API documentation for knowledge base management
