Indexers in Azure AI Search

Indexers are crawlers that extract searchable data from supported Azure data sources and populate search indexes automatically.

What are Indexers?

Indexers provide:
  • Automated ingestion: Pull data from supported sources
  • Field mapping: Map source fields to index fields
  • Change detection: Incremental updates
  • Scheduling: Periodic refresh (as frequently as every 5 minutes)
  • AI enrichment: Apply skillsets for transformation

Supported Data Sources

Azure Blob Storage

Index documents from blob containers

Azure Cosmos DB

Index from the NoSQL, MongoDB, and Gremlin APIs

Azure SQL

Index from SQL Database and Managed Instance

SharePoint Online

Index documents and sites (preview)

OneLake

Index from Microsoft Fabric lakehouses

Azure Table Storage

Index from Table Storage

Indexer Workflow

Stages

  1. Document cracking: Open files and extract content
  2. Field mapping: Map source to destination fields
  3. Skillset execution: Apply AI skills (optional)
  4. Output field mapping: Map skill outputs to index fields
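
Conceptually, the four stages form a simple pipeline. The sketch below illustrates them with hypothetical helper names (none of these functions are part of the Azure AI Search API):

```python
# Illustrative sketch of the indexer pipeline stages; function names
# are invented for this example, not part of any Azure SDK.

def crack_document(blob: bytes) -> dict:
    # Stage 1: open the file and extract raw content plus metadata.
    return {"content": blob.decode("utf-8"), "metadata_storage_name": "doc.txt"}

def map_fields(source: dict, mappings: dict) -> dict:
    # Stage 2: rename source fields to their target index fields.
    return {mappings.get(k, k): v for k, v in source.items()}

def run_skills(doc: dict) -> dict:
    # Stage 3 (optional): enrich the document, e.g. split text into chunks.
    doc["chunks"] = [doc["content"][i:i + 4000]
                     for i in range(0, len(doc["content"]), 4000)]
    return doc

def map_outputs(doc: dict, output_mappings: dict) -> dict:
    # Stage 4: map skill outputs to index fields.
    for src, target in output_mappings.items():
        doc[target] = doc.pop(src)
    return doc

doc = crack_document(b"hello world")
doc = map_fields(doc, {"metadata_storage_name": "title"})
doc = run_skills(doc)
doc = map_outputs(doc, {"chunks": "pages"})
```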

Create an Indexer

1. Create Data Source

{
  "name": "my-blob-datasource",
  "type": "azureblob",
  "credentials": {
    "connectionString": "DefaultEndpointsProtocol=https;..."
  },
  "container": {
    "name": "documents"
  }
}
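
The data source definition above is created with an HTTP PUT against the service's REST API. A minimal Python sketch, where the service name, API version, and key are placeholders (the request is constructed but not sent):

```python
import json
import urllib.request

SERVICE = "my-service"          # placeholder search service name
API_VERSION = "2024-07-01"      # assumed stable REST API version
API_KEY = "<admin-api-key>"     # placeholder admin key

datasource = {
    "name": "my-blob-datasource",
    "type": "azureblob",
    "credentials": {"connectionString": "DefaultEndpointsProtocol=https;..."},
    "container": {"name": "documents"},
}

# PUT creates or updates the data source definition by name.
req = urllib.request.Request(
    url=f"https://{SERVICE}.search.windows.net/datasources/{datasource['name']}"
        f"?api-version={API_VERSION}",
    data=json.dumps(datasource).encode("utf-8"),
    headers={"Content-Type": "application/json", "api-key": API_KEY},
    method="PUT",
)
# urllib.request.urlopen(req) would send it against a real service.
```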

2. Create Indexer

{
  "name": "my-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-index",
  "schedule": {
    "interval": "PT2H"
  },
  "parameters": {
    "maxFailedItems": 10,
    "maxFailedItemsPerBatch": 5
  }
}
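
The two tolerances work at different scopes: maxFailedItems caps failures across the whole run, while maxFailedItemsPerBatch caps them within a single batch. A small sketch of that check (illustrative logic, not the service's actual implementation):

```python
def should_abort(run_failures: int, batch_failures: int,
                 max_failed_items: int = 10,
                 max_failed_items_per_batch: int = 5) -> bool:
    """Abort the run when either the per-run or the per-batch
    failure cap from the indexer parameters is exceeded."""
    return (run_failures > max_failed_items
            or batch_failures > max_failed_items_per_batch)
```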

Scheduling

Run indexers on a schedule:
{
  "schedule": {
    "interval": "PT2H",
    "startTime": "2024-01-01T00:00:00Z"
  }
}
Intervals:
  • Minimum: PT5M (5 minutes)
  • Maximum: P1D (1 day)
  • Format: ISO 8601 duration
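
A schedule interval is valid when it is an ISO 8601 duration between five minutes and one day. A small validator sketch for the common day/hour/minute forms (not part of any Azure SDK):

```python
import re
from datetime import timedelta

def parse_interval(value: str) -> timedelta:
    """Parse a subset of ISO 8601 durations, e.g. 'P1D', 'PT2H', 'PT30M'."""
    m = re.fullmatch(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?", value)
    if not m or not any(m.groups()):
        raise ValueError(f"not a supported duration: {value!r}")
    days, hours, minutes = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)

def is_valid_schedule(value: str) -> bool:
    # Azure AI Search accepts intervals from PT5M up to P1D.
    return timedelta(minutes=5) <= parse_interval(value) <= timedelta(days=1)
```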

Field Mappings

Map source fields to index fields:
{
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "id",
      "mappingFunction": {
        "name": "base64Encode"
      }
    }
  ]
}
Mapping functions:
  • base64Encode/base64Decode
  • extractTokenAtPosition
  • jsonArrayToStringCollection
  • urlEncode/urlDecode
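
The base64Encode mapping above exists because document keys only allow letters, digits, underscores, dashes, and equals signs, while blob paths are URLs. The exact variant Azure applies depends on the api-version and the function's parameters, but standard URL-safe base64 is a rough analogue:

```python
import base64

def encode_key(storage_path: str) -> str:
    """Rough analogue of the base64Encode mapping function: turn a blob
    URL into characters that are legal in a document key."""
    return base64.urlsafe_b64encode(storage_path.encode("utf-8")).decode("ascii")

def decode_key(key: str) -> str:
    """Rough analogue of base64Decode: recover the original path."""
    return base64.urlsafe_b64decode(key.encode("ascii")).decode("utf-8")

path = "https://account.blob.core.windows.net/documents/report.pdf"
key = encode_key(path)
```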

AI Enrichment with Skillsets

Apply AI transformations during indexing:
{
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "textSplitMode": "pages",
      "maximumPageLength": 4000,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "chunks"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "resourceUri": "https://<your-openai-resource>.openai.azure.com",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/chunks/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "vector"
        }
      ]
    }
  ]
}
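
Because the embedding skill's input source is /document/chunks/*, it runs once per chunk emitted by the split skill. The sketch below shows that fan-out with a stand-in embedding function (a real run calls the deployed model; SplitSkill also prefers sentence boundaries rather than the fixed-size windows used here for simplicity):

```python
def split_skill(text: str, maximum_page_length: int = 4000) -> list[str]:
    # Simplified analogue of SplitSkill in "pages" mode:
    # fixed-size character windows.
    return [text[i:i + maximum_page_length]
            for i in range(0, len(text), maximum_page_length)]

def embedding_skill(chunk: str) -> list[float]:
    # Stand-in for AzureOpenAIEmbeddingSkill; a real skill would call
    # the deployed embedding model and return its vector.
    return [float(len(chunk))]

document = {"content": "x" * 9000}
document["chunks"] = split_skill(document["content"])
document["vectors"] = [embedding_skill(c) for c in document["chunks"]]
```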

Monitoring

Track indexer execution:
  • Status: Success, Failed, InProgress
  • Execution history: Past runs and outcomes
  • Error details: Failed document information
  • Metrics: Documents processed, latency
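
All of this is exposed by the indexer status endpoint, which returns the last result and the execution history. A hedged urllib sketch (service name, API version, and key are placeholders; the request is constructed but not sent):

```python
import urllib.request

SERVICE = "my-service"       # placeholder search service name
API_VERSION = "2024-07-01"   # assumed stable REST API version

# GET /indexers/{name}/status returns status, lastResult,
# and executionHistory for the named indexer.
req = urllib.request.Request(
    url=f"https://{SERVICE}.search.windows.net/indexers/my-indexer/status"
        f"?api-version={API_VERSION}",
    headers={"api-key": "<admin-api-key>"},
    method="GET",
)
# urllib.request.urlopen(req) would fetch it from a real service.
```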

Change Detection

Indexers detect and process only changed documents:
  • Azure SQL: SQL integrated change tracking or high water mark change detection
  • Cosmos DB: _ts timestamp
  • Blob Storage: Last modified date
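
High water mark detection boils down to remembering the largest change-tracking value seen so far and processing only rows above it on the next run. A minimal in-memory sketch, using Cosmos DB's _ts as the tracking column:

```python
def incremental_rows(rows: list[dict], high_water_mark: int) -> tuple[list[dict], int]:
    """Return rows changed since the last run plus the new high water mark.
    '_ts' stands in for the change-tracking column (Cosmos DB's timestamp)."""
    changed = [r for r in rows if r["_ts"] > high_water_mark]
    new_mark = max((r["_ts"] for r in changed), default=high_water_mark)
    return changed, new_mark

rows = [{"id": "a", "_ts": 100}, {"id": "b", "_ts": 200}, {"id": "c", "_ts": 300}]
first_pass, mark = incremental_rows(rows, 0)      # initial run: all rows
second_pass, mark = incremental_rows(rows, mark)  # nothing changed since
```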

Best Practices

  • Adjust batch size based on document complexity and size
  • Configure maxFailedItems and maxFailedItemsPerBatch tolerances
  • Balance freshness needs with resource utilization
  • Set up alerts for indexer failures

Next Steps

Skillsets

Add AI enrichment

Blob Indexing

Index from blob storage

Build docs developers (and LLMs) love