## Overview

The `robots.txt` file provides instructions to search engine crawlers about which pages should be indexed. It’s automatically generated and accessible at `/robots.txt`.
## How It Works

The robots.txt file:

- Allows all public pages to be crawled
- Disallows API routes and private dashboard pages
- References the sitemap location
- Uses the Next.js `MetadataRoute.Robots` type
## Access Points

The robots.txt file is available at:

- `/robots.txt`: crawler instructions in plain text format
## Implementation

The robots.txt is implemented in `web/next/src/app/robots.ts`.
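The file body is not reproduced here; below is a minimal sketch of what such a route typically contains, based on the rules described in this document. The inline `Robots` type stands in for Next.js’s `MetadataRoute.Robots` (imported from `"next"` in a real project), and the hard-coded `baseUrl` stands in for the project’s configured site URL:

```typescript
// Sketch of a Next.js App Router robots route (robots.ts).
// In the real file the return type would be `MetadataRoute.Robots`;
// an inline type is used here so the sketch is self-contained.
type Robots = {
  rules: Array<{ userAgent: string; allow: string; disallow: string[] }>;
  sitemap: string;
};

const baseUrl = "https://example.com"; // stand-in for the configured site URL

export default function robots(): Robots {
  return {
    rules: [
      {
        userAgent: "*", // applies to every crawler
        allow: "/", // all public pages
        disallow: ["/api/", "/dashboard/"], // private routes
      },
    ],
    sitemap: `${baseUrl}/sitemap.xml`, // point crawlers at the sitemap
  };
}
```

Next.js serves the returned object at `/robots.txt` as plain text.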
## Rules

The current configuration:

**User Agent**

- `*`: Applies to all search engines and crawlers

**Allowed**

- `/`: All public pages under root (public content)

**Disallowed**

- `/api/`: API endpoints (internal use only)
- `/dashboard/`: Dashboard routes (authentication required)

**Sitemap**

- Points crawlers to `/sitemap.xml` for efficient page discovery
## Generated Output
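Assuming the rules above and an example base URL of `https://example.com` (a stand-in, not the project’s actual URL), the served file would look roughly like:

```text
User-Agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/

Sitemap: https://example.com/sitemap.xml
```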
The route serves these rules as a plain-text response.

## Customization

To customize the robots.txt, edit the object returned in `robots.ts`. For example, to also disallow an `/admin/` section:

```typescript
return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/", "/admin/"],
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}
```
To set different rules for specific crawlers:

```typescript
return {
  rules: [
    {
      userAgent: "*",
      allow: "/",
      disallow: ["/api/", "/dashboard/"],
    },
    {
      userAgent: "Googlebot",
      allow: "/",
      disallow: ["/api/"],
    },
  ],
  sitemap: `${baseUrl}/sitemap.xml`,
}
```
## Common User Agents

| User Agent | Description |
|---|---|
| `*` | All crawlers |
| `Googlebot` | Google search crawler |
| `Bingbot` | Bing search crawler |
| `Slurp` | Yahoo search crawler |
| `DuckDuckBot` | DuckDuckGo crawler |
## SEO Best Practices
- Block private content: Prevent indexing of sensitive pages
- Allow public content: Ensure public pages are crawlable
- Reference sitemap: Help crawlers discover all pages
- Use specific rules: Target specific bots when needed
- Avoid blocking assets: Don’t block CSS, JS, or images needed for rendering
## Verification

Verify your robots.txt:

- Visit `/robots.txt` on your site
- Check that rules are correctly formatted
- Test with Google Search Console’s robots.txt tester
- Verify the sitemap URL is correct
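Beyond manual checks, a quick programmatic sanity check can catch malformed directives before deploying. This is an illustrative sketch, not part of the project:

```typescript
// Minimal robots.txt sanity check: every non-blank, non-comment line
// must use a known field name, and a Sitemap directive must be present.
const KNOWN_FIELDS = ["user-agent", "allow", "disallow", "sitemap", "crawl-delay"];

function validateRobotsTxt(body: string): string[] {
  const errors: string[] = [];
  let sawSitemap = false;
  const lines = body.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "" || line.startsWith("#")) continue; // blanks and comments are fine
    const colon = line.indexOf(":");
    if (colon === -1) {
      errors.push(`line ${i + 1}: missing ":" separator`);
      continue;
    }
    const field = line.slice(0, colon).trim().toLowerCase();
    if (!KNOWN_FIELDS.includes(field)) {
      errors.push(`line ${i + 1}: unknown field "${field}"`);
    }
    if (field === "sitemap") sawSitemap = true;
  }
  if (!sawSitemap) errors.push("no Sitemap directive found");
  return errors;
}
```

Fetch `/robots.txt` from your deployment and run its body through a check like this as part of a smoke test.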
## Troubleshooting

### Pages not being crawled

If public pages aren’t being crawled:

- Check that robots.txt doesn’t block them
- Verify the `allow` rule includes the pages
- Check for conflicting `disallow` rules
- Test with Search Console
### Private pages being indexed

If private pages are being indexed:

- Add them to the `disallow` array
- Use `noindex` meta tags on those pages
- Implement proper authentication
- Request removal in Search Console
### Sitemap not found

If the sitemap link is broken:

- Check that `config.app.url` is set correctly
- Verify `/sitemap.xml` is accessible
- Ensure `sitemap.ts` is implemented