
Overview

The robots.txt file tells search engine crawlers which paths they may crawl. It’s automatically generated and served at /robots.txt.

How It Works

The robots.txt file:
  • Allows all public pages to be crawled
  • Disallows API routes and private dashboard pages
  • References the sitemap location
  • Is typed with the Next.js MetadataRoute.Robots type

Access Points

The robots.txt file is available at:
  • /robots.txt - Crawler instructions in plain text format

Implementation

The robots.txt is implemented in web/next/src/app/robots.ts:
import { MetadataRoute } from "next"
import { config } from "@/lib/config"

export default function robots(): MetadataRoute.Robots {
  const baseUrl = config.app.url

  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/dashboard/"],
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`,
  }
}

Rules

The current configuration:

User Agent

  • *: Applies to all search engines and crawlers

Allowed

  • /: All public pages under root (public content)

Disallowed

  • /api/: API endpoints (internal use only)
  • /dashboard/: Dashboard routes (authentication required)

Sitemap

  • Points crawlers to /sitemap.xml for efficient page discovery

Generated Output

The route generates the following robots.txt output:
User-Agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/

Sitemap: https://yourdomain.com/sitemap.xml
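
The mapping from the rules object to this plain-text output can be sketched with a small serializer. (serializeRobots and the Rule type below are illustrative helpers, not part of Next.js; the framework performs this serialization for you.)

```typescript
// Minimal sketch: turn a rules array shaped like the one returned from
// robots.ts into the plain-text robots.txt that crawlers actually see.
type Rule = {
  userAgent: string
  allow?: string | string[]
  disallow?: string | string[]
  crawlDelay?: number
}

function serializeRobots(rules: Rule[], sitemap: string): string {
  const lines: string[] = []
  for (const rule of rules) {
    lines.push(`User-Agent: ${rule.userAgent}`)
    // allow/disallow accept a single path or an array; normalize to an array.
    for (const path of [rule.allow ?? []].flat()) lines.push(`Allow: ${path}`)
    for (const path of [rule.disallow ?? []].flat()) lines.push(`Disallow: ${path}`)
    if (rule.crawlDelay !== undefined) lines.push(`Crawl-delay: ${rule.crawlDelay}`)
    lines.push("") // blank line between rule groups
  }
  lines.push(`Sitemap: ${sitemap}`)
  return lines.join("\n")
}

const output = serializeRobots(
  [{ userAgent: "*", allow: "/", disallow: ["/api/", "/dashboard/"] }],
  "https://yourdomain.com/sitemap.xml",
)
console.log(output)
```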

Customization

To customize the robots.txt:

1. Edit the rules

Modify web/next/src/app/robots.ts to add or remove rules:

return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/", "/admin/"],
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}

2. Target specific bots

Add rules for specific user agents:

return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/"],
    },
    {
      userAgent: "Googlebot",
      allow: "/",
      disallow: ["/api/"],
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}

3. Add crawl delays

Add a crawlDelay for rate limiting (note that Googlebot ignores the Crawl-delay directive; Bing and some other crawlers respect it):

return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/"],
      crawlDelay: 10,
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}
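
The per-bot rules can also be used to block a single crawler entirely: a rule with disallow: "/" matches every path for that user agent while everyone else keeps the defaults. (The object below has the same shape as MetadataRoute.Robots but is shown as a plain object so it stands alone; "SomeBot" is a placeholder name, not a real crawler.)

```typescript
// Sketch: default rules for everyone, plus one bot blocked site-wide.
const robots = {
  rules: [
    // Default: all crawlers may crawl public pages.
    { userAgent: "*", allow: "/", disallow: ["/api/", "/dashboard/"] },
    // "SomeBot" (placeholder) is disallowed everywhere: "/" matches all paths.
    { userAgent: "SomeBot", disallow: "/" },
  ],
  sitemap: "https://yourdomain.com/sitemap.xml",
}
console.log(robots.rules.length)
```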

Common User Agents

  • *: All crawlers
  • Googlebot: Google search crawler
  • Bingbot: Bing search crawler
  • Slurp: Yahoo search crawler
  • DuckDuckBot: DuckDuckGo crawler

SEO Best Practices

  1. Block private content: Prevent indexing of sensitive pages
  2. Allow public content: Ensure public pages are crawlable
  3. Reference sitemap: Help crawlers discover all pages
  4. Use specific rules: Target specific bots when needed
  5. Avoid blocking assets: Don’t block CSS, JS, or images needed for rendering
The robots.txt file is a suggestion, not a security measure. Don’t rely on it to protect sensitive content. Use proper authentication instead.

Verification

Verify your robots.txt:
  1. Visit /robots.txt on your site
  2. Check that rules are correctly formatted
  3. Test with Google Search Console’s robots.txt tester
  4. Verify sitemap URL is correct

Troubleshooting

If public pages aren’t being crawled:
  1. Check robots.txt doesn’t block them
  2. Verify the allow rule includes the pages
  3. Check for conflicting disallow rules
  4. Test with Search Console
If private pages are being indexed:
  1. Add them to the disallow array
  2. Use noindex meta tags on those pages
  3. Implement proper authentication
  4. Request removal in Search Console
If the sitemap link is broken:
  1. Check config.app.url is set correctly
  2. Verify /sitemap.xml is accessible
  3. Ensure sitemap.ts is implemented
