
Overview

The robots.txt file tells search engine crawlers which paths they may crawl. It’s automatically generated and served at /robots.txt.

How It Works

The robots.txt file:
  • Allows all public pages to be crawled
  • Disallows API routes and private dashboard pages
  • References the sitemap location
  • Is typed with the Next.js MetadataRoute.Robots type

Access Points

The robots.txt file is available at:
  • /robots.txt - Crawler instructions in plain text format

Implementation

The robots.txt is implemented in web/next/src/app/robots.ts:
import { MetadataRoute } from "next"
import { config } from "@/lib/config"

export default function robots(): MetadataRoute.Robots {
  const baseUrl = config.app.url

  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/dashboard/"],
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
  }
}

Rules

The current configuration:

User Agent

  • *: Applies to all search engines and crawlers

Allowed

  • /: All public pages under root (public content)

Disallowed

  • /api/: API endpoints (internal use only)
  • /dashboard/: Dashboard routes (authentication required)

Sitemap

  • Points crawlers to /sitemap.xml for efficient page discovery

Generated Output

The route generates the following robots.txt output:
User-Agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/

Sitemap: https://yourdomain.com/sitemap.xml
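
The mapping from the rules object to this plain-text output can be sketched with a small serializer. (serializeRobots and the Rule type below are illustrative helpers, not part of Next.js; the framework performs this serialization for you.)

```typescript
// Minimal sketch: turn a rules array shaped like the one returned from
// robots.ts into the plain-text robots.txt that crawlers actually see.
type Rule = {
  userAgent: string
  allow?: string | string[]
  disallow?: string | string[]
  crawlDelay?: number
}

function serializeRobots(rules: Rule[], sitemap: string): string {
  const lines: string[] = []
  for (const rule of rules) {
    lines.push(`User-Agent: ${rule.userAgent}`)
    // allow/disallow accept a single path or an array; normalize to an array.
    for (const path of [rule.allow ?? []].flat()) lines.push(`Allow: ${path}`)
    for (const path of [rule.disallow ?? []].flat()) lines.push(`Disallow: ${path}`)
    if (rule.crawlDelay !== undefined) lines.push(`Crawl-delay: ${rule.crawlDelay}`)
    lines.push("") // blank line between rule groups
  }
  lines.push(`Sitemap: ${sitemap}`)
  return lines.join("\n")
}

const output = serializeRobots(
  [{ userAgent: "*", allow: "/", disallow: ["/api/", "/dashboard/"] }],
  "https://yourdomain.com/sitemap.xml",
)
console.log(output)
```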

Customization

To customize the robots.txt:

1. Edit the rules

Modify web/next/src/app/robots.ts to add or remove rules:

return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/", "/admin/"],
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}

2. Target specific bots

Add rules for specific user agents:

return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/"],
    },
    {
      userAgent: "Googlebot",
      allow: "/",
      disallow: ["/api/"],
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}

3. Add crawl delays

Add a crawlDelay for rate limiting (note that Googlebot ignores the Crawl-delay directive; Bing and some other crawlers respect it):

return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/"],
      crawlDelay: 10,
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}
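
The per-bot rules can also be used to block a single crawler entirely: a rule with disallow: "/" matches every path for that user agent while everyone else keeps the defaults. (The object below has the same shape as MetadataRoute.Robots but is shown as a plain object so it stands alone; "SomeBot" is a placeholder name, not a real crawler.)

```typescript
// Sketch: default rules for everyone, plus one bot blocked site-wide.
const robots = {
  rules: [
    // Default: all crawlers may crawl public pages.
    { userAgent: "*", allow: "/", disallow: ["/api/", "/dashboard/"] },
    // "SomeBot" (placeholder) is disallowed everywhere: "/" matches all paths.
    { userAgent: "SomeBot", disallow: "/" },
  ],
  sitemap: "https://yourdomain.com/sitemap.xml",
}
console.log(robots.rules.length)
```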

Common User Agents

  • *: All crawlers
  • Googlebot: Google search crawler
  • Bingbot: Bing search crawler
  • Slurp: Yahoo search crawler
  • DuckDuckBot: DuckDuckGo crawler

SEO Best Practices

  1. Block private content: Prevent indexing of sensitive pages
  2. Allow public content: Ensure public pages are crawlable
  3. Reference sitemap: Help crawlers discover all pages
  4. Use specific rules: Target specific bots when needed
  5. Avoid blocking assets: Don’t block CSS, JS, or images needed for rendering
The robots.txt file is a suggestion, not a security measure. Don’t rely on it to protect sensitive content. Use proper authentication instead.

Verification

Verify your robots.txt:
  1. Visit /robots.txt on your site
  2. Check that rules are correctly formatted
  3. Test with Google Search Console’s robots.txt tester
  4. Verify sitemap URL is correct

Troubleshooting

If public pages aren’t being crawled:
  1. Check robots.txt doesn’t block them
  2. Verify the allow rule includes the pages
  3. Check for conflicting disallow rules
  4. Test with Search Console
If private pages are being indexed:
  1. Add them to the disallow array
  2. Use noindex meta tags on those pages
  3. Implement proper authentication
  4. Request removal in Search Console
If the sitemap link is broken:
  1. Check config.app.url is set correctly
  2. Verify /sitemap.xml is accessible
  3. Ensure sitemap.ts is implemented
