Search on docs.github.com is powered by Elasticsearch. When a user types a query, the server calls Elasticsearch and returns ranked results. This page explains how the search index is built, how to run the pipeline locally, and how the search API works.

Search types

The site supports two search modes:

General search

Returns docs pages matching the query, sorted by popularity. Served from the /api/search/v1 endpoint. Example: query clone returns URLs to docs pages about cloning repositories.

AI search autocomplete

Returns human-readable full-sentence questions that best match the query. Based on previous searches and popular pages. Served from the /api/search/ai-search-autocomplete/v1 endpoint. Example: query How do I clone returns How do I clone a repository?
You can query the general search endpoint directly:
https://docs.github.com/api/search/v1?version=<VERSION>&language=<LANGUAGE>&query=<QUERY>
  • VERSION: a numbered GHES version (e.g. 3.12), ghec, or dotcom
  • LANGUAGE: one of en, es, ja, pt, zh, ru, fr, ko, de
  • QUERY: any alphanumeric string
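As a minimal sketch, a request URL for the endpoint can be assembled from the three parameters above. The helper name is an illustration, not part of the docs codebase:

```typescript
// Sketch: build a general-search request URL from the documented parameters
// (version, language, query). `buildSearchUrl` is a hypothetical helper.
function buildSearchUrl(version: string, language: string, query: string): string {
  const params = new URLSearchParams({ version, language, query });
  return `https://docs.github.com/api/search/v1?${params.toString()}`;
}

const url = buildSearchUrl("dotcom", "en", "clone");
console.log(url);
// https://docs.github.com/api/search/v1?version=dotcom&language=en&query=clone
```

`URLSearchParams` handles percent-encoding, so queries containing spaces or quotes are safe to pass through unmodified.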

Architecture

Elasticsearch stores pre-built indexes that the server queries at runtime. Indexes are populated through a two-step pipeline:
  1. Scrape — fetch each page’s content via the Article API and write structured JSON records to disk
  2. Index — upload those JSON records into Elasticsearch
The scrape step calls the Article API (/api/article?pathname=<path>) on a locally running server for each indexable page. Each record includes title, intro, breadcrumbs, headings, content (plain text, not HTML), and a unique objectID (the page permalink).
{
  "objectID": "/en/actions/creating-actions/about-custom-actions",
  "breadcrumbs": "GitHub Actions / Creating actions",
  "title": "About custom actions",
  "headings": "About custom actions\nTypes of actions\n...",
  "content": "Actions are individual tasks that you can combine...",
  "intro": "Actions are individual tasks that you can combine...",
  "toplevel": "GitHub Actions",
  "popularity": 0
}
The objectID is set explicitly to the page permalink. This guarantees that subsequent indexing runs overwrite existing records rather than creating duplicates.
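The overwrite behavior can be pictured with an in-memory stand-in: an index keyed by document ID acts like a map, so writing the same ID replaces the record rather than appending a duplicate. This is an illustration of the idea, not the real Elasticsearch client code, and the record fields are a subset of the schema above:

```typescript
// Sketch: stable objectIDs make re-indexing idempotent. A Map keyed by
// objectID stands in for the Elasticsearch index here.
type DocRecord = { objectID: string; title: string; popularity: number };

const index = new Map<string, DocRecord>();

function upsert(record: DocRecord): void {
  index.set(record.objectID, record); // same permalink -> same slot
}

const permalink = "/en/actions/creating-actions/about-custom-actions";
upsert({ objectID: permalink, title: "About custom actions", popularity: 0 });
// A later indexing run scrapes the same page again, perhaps with fresher data:
upsert({ objectID: permalink, title: "About custom actions", popularity: 42 });

console.log(index.size); // 1 -- overwritten, not duplicated
```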

Environment configuration

| Variable | Description |
| --- | --- |
| ELASTICSEARCH_URL | URL of the Elasticsearch cluster. Required for search tests and manual indexing. Example: http://localhost:9200/ |
Set this in your .env file for local development:
ELASTICSEARCH_URL=http://localhost:9200/
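A script that talks to the cluster can validate the variable up front and fall back to the local default shown above. The helper name is hypothetical, a sketch rather than code from the repo:

```typescript
// Sketch: resolve and validate ELASTICSEARCH_URL before connecting,
// defaulting to the local development cluster. `resolveElasticsearchUrl`
// is a hypothetical helper.
function resolveElasticsearchUrl(env: Record<string, string | undefined> = process.env): string {
  const raw = env.ELASTICSEARCH_URL ?? "http://localhost:9200/";
  try {
    return new URL(raw).toString(); // throws on a malformed URL
  } catch {
    throw new Error(`ELASTICSEARCH_URL is not a valid URL: ${raw}`);
  }
}

console.log(resolveElasticsearchUrl({ ELASTICSEARCH_URL: "http://localhost:9200/" }));
// http://localhost:9200/
```

Failing fast on a bad URL gives a clearer error than a connection timeout deep inside an indexing run.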

Running the pipeline manually

Run the scrape and index steps separately, or together using the combined command.

Step 1: Start the scrape server

The scrape server is a production-mode instance of the docs app running on port 4002 with minimal rendering enabled:
npm run general-search-scrape-server
This sets MINIMAL_RENDER=true and CHANGELOG_DISABLED=true to reduce memory usage during scraping.

Step 2: Scrape page content

In a separate terminal, run the scrape script against the running server:
npm run general-search-scrape -- <scrape-directory>
To scrape a specific language and version only:
npx tsx src/search/scripts/scrape/scrape-cli.ts -l en -V fpt <scrape-directory>
The script writes one JSON file per page into the target directory.
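The per-page transform can be sketched as follows. The Article API response fields here are assumptions for illustration; the output matches the record shape documented above (objectID set to the permalink, content as plain text):

```typescript
// Sketch: turn one Article API response into one search record.
// `ArticleResponse` is an assumed shape, not the API's actual contract.
interface ArticleResponse {
  title: string;
  intro: string;
  breadcrumbs: string;
  toplevel: string;
  headings: string[];
  body: string; // plain text, already stripped of HTML
}

interface SearchRecord {
  objectID: string;
  breadcrumbs: string;
  title: string;
  headings: string;
  content: string;
  intro: string;
  toplevel: string;
  popularity: number;
}

function buildRecord(permalink: string, article: ArticleResponse): SearchRecord {
  return {
    objectID: permalink, // stable ID so re-runs overwrite, not duplicate
    breadcrumbs: article.breadcrumbs,
    title: article.title,
    headings: article.headings.join("\n"),
    content: article.body,
    intro: article.intro,
    toplevel: article.toplevel,
    popularity: 0, // default, as in the example record above
  };
}
```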

Step 3: Index scraped records into Elasticsearch

Upload the scraped records to Elasticsearch:
npm run index-general-search -- <scrape-directory>
Combined command — starts the scrape server automatically, waits for it to be ready, runs the scrape, then exits:
npm run general-search-scrape-with-server
The combined command sets --max_old_space_size=8192. Full scrapes across all versions and languages can take 40+ minutes and require significant memory.
For GHES release scraping specifically:
npm run ghes-release-scrape-with-server

AI search autocomplete

AI search autocomplete data comes from an internal data repository, not from scraping. Clone github/docs-internal-data to the root of the docs directory, then index:
npm run index-ai-search-autocomplete -- docs-internal-data

Text analysis

To analyze how Elasticsearch processes text (useful for debugging relevance issues):
npm run analyze-text

Running search tests

Search tests require a running Elasticsearch instance:
npm run test:search
This sets ELASTICSEARCH_URL=http://localhost:9200/ automatically via the test script. Language tests that involve search also need ELASTICSEARCH_URL set:
npm run test:languages

Production workflow

In production, search indexes are rebuilt automatically by GitHub Actions:
| Workflow | Schedule | Scope |
| --- | --- | --- |
| index-general-search.yml | Every 4 hours | All versions and languages |
| index-autocomplete-search.yml | Daily | AI autocomplete data |
You can manually trigger either workflow from the Actions tab. For urgent index updates after merging to main, trigger index-general-search.yml with a specific version and language to reduce run time (a single version/language takes 5–10 minutes versus ~40 minutes for all).
When manually triggering, run only free-pro-team@latest + en first, then expand to other versions and languages as needed.

Key files

| Path | Description |
| --- | --- |
| src/search/components/Search.tsx | Browser-side search input component |
| src/search/components/SearchResults.tsx | Browser-side search results rendering |
| src/search/middleware/general-search-middleware.ts | Server-side entrypoint for the /search page |
| src/search/middleware/search-routes/ | API route handlers for search endpoints |
| src/search/scripts/scrape/ | Scrape scripts and lib/build-records-from-api.ts |
| src/search/scripts/index/ | Indexing scripts for general search and autocomplete |
| src/search/scripts/analyze-text.ts | Text analysis utility |
| src/search/tests/ | Search tests (require ELASTICSEARCH_URL) |

Search features

  • Typo tolerance — Elasticsearch returns results even for misspelled queries.
  • Advanced query syntax — Supports exact matching with quotes ("exact phrase") and term exclusion with a minus sign (-excluded). Enabled in the browser client.
  • Multilingual — Indexes exist for each supported language. Search respects the language of the current docs URL.
  • Weighted attributes — Title is ranked higher than body content.
  • Version-scoped — Each query targets the index for the requested GitHub product version.
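The advanced query syntax above can be pictured as a small tokenizer that separates exact phrases, excluded terms, and ordinary terms. This is an illustration of the syntax, not the site's actual query parser:

```typescript
// Sketch: split a query into exact phrases ("..."), excluded terms (-term),
// and ordinary fuzzy-matched terms. Illustrative only.
interface ParsedQuery {
  phrases: string[];  // quoted: must match exactly
  excluded: string[]; // minus-prefixed: must not appear
  terms: string[];    // everything else
}

function parseQuery(query: string): ParsedQuery {
  const result: ParsedQuery = { phrases: [], excluded: [], terms: [] };
  // Pull out quoted phrases first, then tokenize the remainder.
  const rest = query.replace(/"([^"]+)"/g, (_, phrase: string) => {
    result.phrases.push(phrase);
    return " ";
  });
  for (const token of rest.split(/\s+/).filter(Boolean)) {
    if (token.startsWith("-") && token.length > 1) {
      result.excluded.push(token.slice(1));
    } else {
      result.terms.push(token);
    }
  }
  return result;
}

console.log(parseQuery('"exact phrase" clone -fork'));
// { phrases: [ 'exact phrase' ], excluded: [ 'fork' ], terms: [ 'clone' ] }
```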
There is a lag of up to 4 hours between content changes merging to main and those changes appearing in search results, due to the indexing schedule.
