Indexers in Azure AI Search

Indexers are crawlers that extract searchable data from supported Azure data sources and populate search indexes automatically.

What are Indexers?

Indexers provide:
  • Automated ingestion: Pull data from supported sources
  • Field mapping: Map source fields to index fields
  • Change detection: Incremental updates
  • Scheduling: Periodic refresh (as frequently as every 5 minutes)
  • AI enrichment: Apply skillsets for transformation

Supported Data Sources

Azure Blob Storage

Index documents from blob containers

Azure Cosmos DB

Index from the NoSQL, MongoDB, and Gremlin APIs

Azure SQL

Index from SQL Database and Managed Instance

SharePoint Online

Index documents and sites (preview)

OneLake

Index from Microsoft Fabric lakehouses

Azure Table Storage

Index from Table Storage

Indexer Workflow

Stages

  1. Document cracking: Open files and extract content
  2. Field mapping: Map source to destination fields
  3. Skillset execution: Apply AI skills (optional)
  4. Output field mapping: Map skill outputs to index fields
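
Conceptually, the four stages form a simple pipeline. The sketch below illustrates them with hypothetical helper names (none of these functions are part of the Azure AI Search API):

```python
# Illustrative sketch of the indexer pipeline stages; function names
# are invented for this example, not part of any Azure SDK.

def crack_document(blob: bytes) -> dict:
    # Stage 1: open the file and extract raw content plus metadata.
    return {"content": blob.decode("utf-8"), "metadata_storage_name": "doc.txt"}

def map_fields(source: dict, mappings: dict) -> dict:
    # Stage 2: rename source fields to their target index fields.
    return {mappings.get(k, k): v for k, v in source.items()}

def run_skills(doc: dict) -> dict:
    # Stage 3 (optional): enrich the document, e.g. split text into chunks.
    doc["chunks"] = [doc["content"][i:i + 4000]
                     for i in range(0, len(doc["content"]), 4000)]
    return doc

def map_outputs(doc: dict, output_mappings: dict) -> dict:
    # Stage 4: map skill outputs to index fields.
    for src, target in output_mappings.items():
        doc[target] = doc.pop(src)
    return doc

doc = crack_document(b"hello world")
doc = map_fields(doc, {"metadata_storage_name": "title"})
doc = run_skills(doc)
doc = map_outputs(doc, {"chunks": "pages"})
```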

Create an Indexer

1. Create Data Source

{
  "name": "my-blob-datasource",
  "type": "azureblob",
  "credentials": {
    "connectionString": "DefaultEndpointsProtocol=https;..."
  },
  "container": {
    "name": "documents"
  }
}
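
The data source definition above is created with an HTTP PUT against the service's REST API. A minimal Python sketch, where the service name, API version, and key are placeholders (the request is constructed but not sent):

```python
import json
import urllib.request

SERVICE = "my-service"          # placeholder search service name
API_VERSION = "2024-07-01"      # assumed stable REST API version
API_KEY = "<admin-api-key>"     # placeholder admin key

datasource = {
    "name": "my-blob-datasource",
    "type": "azureblob",
    "credentials": {"connectionString": "DefaultEndpointsProtocol=https;..."},
    "container": {"name": "documents"},
}

# PUT creates or updates the data source definition by name.
req = urllib.request.Request(
    url=f"https://{SERVICE}.search.windows.net/datasources/{datasource['name']}"
        f"?api-version={API_VERSION}",
    data=json.dumps(datasource).encode("utf-8"),
    headers={"Content-Type": "application/json", "api-key": API_KEY},
    method="PUT",
)
# urllib.request.urlopen(req) would send it against a real service.
```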

2. Create Indexer

{
  "name": "my-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-index",
  "schedule": {
    "interval": "PT2H"
  },
  "parameters": {
    "maxFailedItems": 10,
    "maxFailedItemsPerBatch": 5
  }
}
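
The two tolerances work at different scopes: maxFailedItems caps failures across the whole run, while maxFailedItemsPerBatch caps them within a single batch. A small sketch of that check (illustrative logic, not the service's actual implementation):

```python
def should_abort(run_failures: int, batch_failures: int,
                 max_failed_items: int = 10,
                 max_failed_items_per_batch: int = 5) -> bool:
    """Abort the run when either the per-run or the per-batch
    failure cap from the indexer parameters is exceeded."""
    return (run_failures > max_failed_items
            or batch_failures > max_failed_items_per_batch)
```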

Scheduling

Run indexers on a schedule:
{
  "schedule": {
    "interval": "PT2H",
    "startTime": "2024-01-01T00:00:00Z"
  }
}
Intervals:
  • Minimum: PT5M (5 minutes)
  • Maximum: P1D (1 day)
  • Format: ISO 8601 duration
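
A schedule interval is valid when it is an ISO 8601 duration between five minutes and one day. A small validator sketch for the common day/hour/minute forms (not part of any Azure SDK):

```python
import re
from datetime import timedelta

def parse_interval(value: str) -> timedelta:
    """Parse a subset of ISO 8601 durations, e.g. 'P1D', 'PT2H', 'PT30M'."""
    m = re.fullmatch(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?)?", value)
    if not m or not any(m.groups()):
        raise ValueError(f"not a supported duration: {value!r}")
    days, hours, minutes = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)

def is_valid_schedule(value: str) -> bool:
    # Azure AI Search accepts intervals from PT5M up to P1D.
    return timedelta(minutes=5) <= parse_interval(value) <= timedelta(days=1)
```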

Field Mappings

Map source fields to index fields:
{
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_path",
      "targetFieldName": "id",
      "mappingFunction": {
        "name": "base64Encode"
      }
    }
  ]
}
Mapping functions:
  • base64Encode/base64Decode
  • extractTokenAtPosition
  • jsonArrayToStringCollection
  • urlEncode/urlDecode
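
The base64Encode mapping above exists because document keys only allow letters, digits, underscores, dashes, and equals signs, while blob paths are URLs. The exact variant Azure applies depends on the api-version and the function's parameters, but standard URL-safe base64 is a rough analogue:

```python
import base64

def encode_key(storage_path: str) -> str:
    """Rough analogue of the base64Encode mapping function: turn a blob
    URL into characters that are legal in a document key."""
    return base64.urlsafe_b64encode(storage_path.encode("utf-8")).decode("ascii")

def decode_key(key: str) -> str:
    """Rough analogue of base64Decode: recover the original path."""
    return base64.urlsafe_b64decode(key.encode("ascii")).decode("utf-8")

path = "https://account.blob.core.windows.net/documents/report.pdf"
key = encode_key(path)
```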

AI Enrichment with Skillsets

Apply AI transformations during indexing:
{
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "textSplitMode": "pages",
      "maximumPageLength": 4000,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "chunks"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "resourceUri": "https://<your-openai-resource>.openai.azure.com",
      "deploymentId": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/chunks/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "vector"
        }
      ]
    }
  ]
}
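
Because the embedding skill's input source is /document/chunks/*, it runs once per chunk emitted by the split skill. The sketch below shows that fan-out with a stand-in embedding function (a real run calls the deployed model; SplitSkill also prefers sentence boundaries rather than the fixed-size windows used here for simplicity):

```python
def split_skill(text: str, maximum_page_length: int = 4000) -> list[str]:
    # Simplified analogue of SplitSkill in "pages" mode:
    # fixed-size character windows.
    return [text[i:i + maximum_page_length]
            for i in range(0, len(text), maximum_page_length)]

def embedding_skill(chunk: str) -> list[float]:
    # Stand-in for AzureOpenAIEmbeddingSkill; a real skill would call
    # the deployed embedding model and return its vector.
    return [float(len(chunk))]

document = {"content": "x" * 9000}
document["chunks"] = split_skill(document["content"])
document["vectors"] = [embedding_skill(c) for c in document["chunks"]]
```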

Monitoring

Track indexer execution:
  • Status: Success, Failed, InProgress
  • Execution history: Past runs and outcomes
  • Error details: Failed document information
  • Metrics: Documents processed, latency
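
All of this is exposed by the indexer status endpoint, which returns the last result and the execution history. A hedged urllib sketch (service name, API version, and key are placeholders; the request is constructed but not sent):

```python
import urllib.request

SERVICE = "my-service"       # placeholder search service name
API_VERSION = "2024-07-01"   # assumed stable REST API version

# GET /indexers/{name}/status returns status, lastResult,
# and executionHistory for the named indexer.
req = urllib.request.Request(
    url=f"https://{SERVICE}.search.windows.net/indexers/my-indexer/status"
        f"?api-version={API_VERSION}",
    headers={"api-key": "<admin-api-key>"},
    method="GET",
)
# urllib.request.urlopen(req) would fetch it from a real service.
```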

Change Detection

Indexers detect and process only changed documents:
  • Azure SQL: SQL integrated change tracking or high water mark change detection
  • Cosmos DB: _ts timestamp
  • Blob Storage: Last modified date
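
High water mark detection boils down to remembering the largest change-tracking value seen so far and processing only rows above it on the next run. A minimal in-memory sketch, using Cosmos DB's _ts as the tracking column:

```python
def incremental_rows(rows: list[dict], high_water_mark: int) -> tuple[list[dict], int]:
    """Return rows changed since the last run plus the new high water mark.
    '_ts' stands in for the change-tracking column (Cosmos DB's timestamp)."""
    changed = [r for r in rows if r["_ts"] > high_water_mark]
    new_mark = max((r["_ts"] for r in changed), default=high_water_mark)
    return changed, new_mark

rows = [{"id": "a", "_ts": 100}, {"id": "b", "_ts": 200}, {"id": "c", "_ts": 300}]
first_pass, mark = incremental_rows(rows, 0)      # initial run: all rows
second_pass, mark = incremental_rows(rows, mark)  # nothing changed since
```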

Best Practices

  • Adjust batch size based on document complexity and size
  • Configure maxFailedItems and maxFailedItemsPerBatch tolerances
  • Balance freshness needs with resource utilization
  • Set up alerts for indexer failures

Next Steps

Skillsets

Add AI enrichment

Blob Indexing

Index from blob storage

Build docs developers (and LLMs) love