Overview

Technical SEO rules ensure search engines can crawl, index, and understand your website properly. These rules cover infrastructure, crawlability, and server configuration.

Sitemap Rules

Rule: crawl/sitemap-valid

What it checks:
  • XML sitemap exists and is accessible
  • Valid XML format
  • URLs are absolute (not relative)
  • No errors in sitemap structure
  • Sitemap is discoverable in robots.txt
Why it matters: Sitemaps help search engines discover all your pages, especially new or deep content that might not be easily found through normal crawling.
Issue: 8 sitemaps return unknown format errors
# ❌ Bad: Multiple broken sitemaps
/sitemap.xml Unknown format error
/sitemap_index.xml Unknown format error
/sitemap-index.xml Unknown format error
/sitemaps.xml Unknown format error
Fix:
  1. Generate a valid XML sitemap
  2. Serve it at /sitemap.xml only
  3. Remove or redirect all other sitemap URLs
  4. Submit to Google Search Console
<!-- ✅ Good: Valid XML sitemap -->
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-02-09</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2026-02-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
For large sites (>50,000 URLs), use a sitemap index file that references multiple smaller sitemaps.
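The validation steps above can be automated. As a sketch (function name and error handling are illustrative, not part of any specific tool), this checks the two failure modes flagged in this rule — unparseable XML and relative URLs:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def validate_sitemap(xml_text: str) -> list[str]:
    """Parse a sitemap and return its <loc> URLs.

    Raises ValueError if the XML is malformed ("Unknown format error")
    or if any URL is relative instead of absolute.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"Unknown format: not valid XML ({exc})")
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"Relative URL in sitemap: {url}")
    return urls
```

Run it against the body you serve at /sitemap.xml; a clean return value means the file parses and every URL is absolute.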

Rule: crawl/sitemap-size

What it checks:
  • Sitemap file size under 50MB (uncompressed)
  • No more than 50,000 URLs per sitemap
  • Proper compression (gzip recommended)
<!-- ✅ Good: Sitemap index for large sites -->
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2026-02-09</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-02-08</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-02-07</lastmod>
  </sitemap>
</sitemapindex>
Best practices:
  • Split by content type (pages, posts, products)
  • Keep each sitemap under 50,000 URLs
  • Compress with gzip to reduce bandwidth
  • Update lastmod dates when content changes
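A minimal sketch of the splitting and compression steps (helper names are illustrative; the 50,000-URL limit comes from the sitemap protocol):

```python
import gzip

MAX_URLS = 50_000  # per-sitemap limit from the sitemap protocol

def split_sitemaps(urls, max_urls=MAX_URLS):
    """Split a URL list into chunks that each fit in one sitemap file."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]

def write_gzipped_sitemap(path, urls):
    """Write one sitemap chunk as a gzipped XML file."""
    entries = "".join(f"  <url><loc>{u}</loc></url>\n" for u in urls)
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(xml)
```

Each chunk becomes one file (sitemap-1.xml.gz, sitemap-2.xml.gz, ...) referenced from the sitemap index shown above.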

Robots.txt

Rule: crawl/robots-txt

What it checks:
  • robots.txt exists at /robots.txt
  • Valid syntax
  • Sitemap reference included
  • No accidental disallows
Why it matters: robots.txt controls which parts of your site search engines can crawl. Mistakes here can accidentally block your entire site from search engines.
# ❌ Bad: Blocking entire site
User-agent: *
Disallow: /

# ❌ Bad: No sitemap reference
User-agent: *
Disallow: /admin/

# ✅ Good: Proper robots.txt
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Allow: /api/public/

Sitemap: https://example.com/sitemap.xml
Never use robots.txt as a security measure. It only asks bots to avoid certain areas—it doesn’t enforce access control.
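The "accidental disallow" check can be sketched with the standard library's robots.txt parser — feed it your file and a few sample public paths, and anything it reports as blocked for a wildcard crawler needs a second look (the sample paths here are placeholders for your own URLs):

```python
from urllib import robotparser

def audit_robots(robots_txt: str, sample_paths=("/", "/about", "/blog/post")):
    """Return the sample public paths that a wildcard (*) crawler cannot fetch."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in sample_paths if not parser.can_fetch("*", p)]
```

A site-wide `Disallow: /` shows up immediately: every sample path comes back blocked.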

Redirects & Status Codes

Rule: links/redirect-chains

What it checks:
  • No redirect chains (A → B → C)
  • Redirects use 301 (permanent) or 302 (temporary) correctly
  • No redirect loops
  • Minimal redirect hops
Why it matters: Redirect chains waste crawl budget and slow down page load times. Each redirect adds latency and can dilute PageRank.
# ❌ Bad: Redirect chain (3 hops)
http://example.com → https://example.com → https://www.example.com → https://www.example.com/home

# ✅ Good: Direct redirect (1 hop)
http://example.com → https://www.example.com/home
How to fix:
  1. Audit all redirects with a crawler
  2. Update redirects to point directly to final destination
  3. Use 301 for permanent moves, 302 for temporary
  4. Avoid meta refresh redirects (use server-side 301/302)
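Step 1's audit can be sketched offline: export your redirect rules as a {source: target} map and follow each chain, flagging anything over one hop or any loop (function name and hop limit are illustrative):

```python
def redirect_hops(redirect_map: dict[str, str], start: str, max_hops: int = 10):
    """Follow redirects through a {source: target} map and return the hop list.

    Raises ValueError on a redirect loop or when max_hops is exceeded.
    """
    hops = [start]
    seen = {start}
    url = start
    while url in redirect_map:
        url = redirect_map[url]
        if url in seen:
            raise ValueError(f"Redirect loop at {url}")
        seen.add(url)
        hops.append(url)
        if len(hops) - 1 > max_hops:
            raise ValueError("Too many redirect hops")
    return hops
```

Any result longer than two entries (source plus final destination) is a chain to collapse in step 2.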
What it checks:
  • No 404 errors on internal links
  • No 500 server errors
  • All linked resources are accessible
Code  Meaning                Impact
200   OK                     ✅ Page loads successfully
301   Permanent Redirect     ⚠️ Should point to final URL
302   Temporary Redirect     ⚠️ Okay for short-term moves
404   Not Found              ❌ Broken link, bad UX
410   Gone (Permanent)       ⚠️ Better than 404 for removed content
500   Server Error           ❌ Critical issue, fix immediately
503   Service Unavailable    ❌ Temporary outage
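In an audit script, that table maps naturally onto a severity classifier — a sketch (the "ok"/"warn"/"error" labels are illustrative, not a standard):

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to an audit severity, following the table above."""
    if 200 <= code < 300:
        return "ok"
    if code == 410 or 300 <= code < 400:
        # Redirects are fine but should point straight at the final URL;
        # 410 is an intentional removal, preferable to 404.
        return "warn"
    # Remaining 4xx/5xx: broken links and server failures.
    return "error"
```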

URL Structure

Rule: url/length

What it checks:
  • URLs under 100 characters (optimal)
  • No excessive parameters
  • Clean, readable structure
Why it matters: Short, descriptive URLs are easier to share and remember, and they tend to rank better in search results.
# ❌ Bad: Long, unclear URL
https://example.com/products.php?id=12345&category=electronics&filter=new&sort=price&page=2

# ✅ Good: Clean, semantic URL
https://example.com/products/electronics/laptops

# ❌ Bad: Query parameters for core content
https://example.com/article?id=42

# ✅ Good: Descriptive path
https://example.com/blog/10-sales-automation-tips
Best practices:
  • Use hyphens (not underscores) to separate words
  • Keep URLs under 100 characters when possible
  • Use lowercase letters only
  • Include target keywords
  • Avoid special characters and spaces
  • Use semantic hierarchy: /category/subcategory/page
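The first three practices (hyphens, lowercase, no special characters) are exactly what a slug generator enforces. A minimal sketch:

```python
import re
import unicodedata

def slugify(text: str) -> str:
    """Turn arbitrary text into a lowercase, hyphen-separated URL segment."""
    # Strip accents and anything non-ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")
```

For example, a post titled "10 Sales Automation Tips!" becomes the path segment 10-sales-automation-tips.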

Rule: url/parameters

What it checks:
  • Minimal use of query parameters for content
  • Tracking parameters don’t create duplicate content
  • Proper use of canonical tags with parameters
Google Search Console: URL Parameters Tool
Note: Google retired this tool in 2022; canonical tags are now the primary way to control parameter handling. The underlying distinction still matters:
  • sort, filter, page: changes content → crawl every URL
  • utm_source, sessionid: doesn't change content → ignore parameter
Canonical tags for duplicate parameters:
<!-- On: /products?sort=price&filter=new -->
<link rel="canonical" href="https://example.com/products">

<!-- On: /products?page=2 -->
<link rel="canonical" href="https://example.com/products?page=2">
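Server-side, the same distinction can drive URL normalization: strip tracking parameters but keep content-changing ones. A sketch (the TRACKING_PARAMS list is a hypothetical starting set — extend it with whatever analytics tags your site uses):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical starting set; extend for your own analytics tags.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "gclid", "fbclid", "sessionid"}

def canonical_url(url: str) -> str:
    """Drop tracking parameters; keep content-changing ones (sort, filter, page)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

The result is what belongs in the canonical tag: /products?page=2 keeps its page parameter, while utm_source disappears.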

Crawlability

Rule: crawl/depth

What it checks:
  • Important pages are within 3 clicks of homepage
  • No orphaned pages (pages with no internal links)
  • Proper internal linking structure
Why it matters: Search engines prioritize pages that are easily accessible from your homepage. Deep pages get crawled less frequently.
Symptoms:
  • Important pages not ranking
  • Low crawl frequency
  • Pages not appearing in search results
Solutions:
  1. Add links from homepage to important pages
  2. Create category/hub pages that link to related content
  3. Use breadcrumbs for navigation
  4. Add internal links within content
  5. Create an HTML sitemap
  6. Fix orphaned pages (add at least one internal link)
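Click depth and orphaned pages both fall out of a breadth-first search over your internal link graph. A sketch, assuming you have already crawled the site into a {page: [linked pages]} map:

```python
from collections import deque

def crawl_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first click depth from the homepage.

    Pages missing from the result are unreachable via internal links (orphans).
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

Pages with a depth above 3 need a shortcut link; pages absent from the result entirely are the orphans from step 6.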

Index Status

Rule: crawl/indexability

What it checks:
  • Pages aren’t blocked by robots.txt
  • No noindex meta tags on important pages
  • Pages are accessible to crawlers
  • No authentication walls for public content
<!-- ❌ Bad: Accidentally blocking important page -->
<meta name="robots" content="noindex, nofollow">

<!-- ✅ Good: Allow indexing -->
<meta name="robots" content="index, follow">
<!-- Or omit the tag entirely (default is index, follow) -->

<!-- ✅ Good: Block admin pages only -->
<!-- In robots.txt: -->
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
When to use noindex:
  • Thank you pages
  • Search results pages
  • Filtering/sorting variants
  • Login/logout pages
  • Draft/preview content
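Catching a stray noindex on an important page can be sketched with the standard library's HTML parser (class and function names are illustrative):

```python
from html.parser import HTMLParser

class RobotsMetaScanner(HTMLParser):
    """Collect the content values of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives.append(a.get("content", "").lower())

def is_indexable(html: str) -> bool:
    """True unless any robots meta tag on the page contains noindex."""
    scanner = RobotsMetaScanner()
    scanner.feed(html)
    return not any("noindex" in d for d in scanner.directives)
```

A page with no robots meta tag at all comes back indexable, matching the default behavior noted above.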

JavaScript & Rendering

Rule: crawl/rendering

What it checks:
  • Content visible without JavaScript
  • Critical content server-side rendered
  • No infinite scroll issues
  • Lazy loading implemented correctly
Why it matters: While Google can execute JavaScript, server-side rendering ensures content is immediately visible to all crawlers and users.
Problem: Thin content (0 words detected)
This is likely a Single Page Application (SPA) rendering issue: crawlers see empty content.
Solutions:
  1. Server-Side Rendering (SSR): Render HTML on server
  2. Static Site Generation (SSG): Pre-render pages at build time
  3. Dynamic Rendering: Serve static HTML to bots, JS to users
  4. Prerendering Service: Use service like Prerender.io
// Example: Next.js SSR
export async function getServerSideProps() {
  const data = await fetchData();
  return { props: { data } };
}

// Example: Next.js SSG
export async function getStaticProps() {
  const data = await fetchData();
  return { props: { data }, revalidate: 3600 };
}
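A rough way to reproduce the "0 words detected" check is to count visible words in the raw HTML (what a non-rendering crawler sees), skipping script and style bodies. A sketch, assuming raw-HTML word count is an acceptable proxy:

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Count visible words, skipping <script>, <style>, and <noscript> bodies."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting inside skipped tags
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.words += len(re.findall(r"\w+", data))

def rendered_word_count(html: str) -> int:
    parser = TextExtractor()
    parser.feed(html)
    return parser.words
```

An SPA shell like an empty root div plus a script bundle scores zero; a server-rendered page scores its actual word count.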

Canonical URL Chains

Rule: core/canonical-chain

What it checks:
  • No canonical chains (Page A → Page B → Page C)
  • Canonical URLs are directly accessible
  • Self-referencing canonicals on original pages
<!-- ❌ Bad: Canonical chain -->
<!-- Page A: -->
<link rel="canonical" href="/page-b">
<!-- Page B: -->
<link rel="canonical" href="/page-c">

<!-- ✅ Good: Direct, absolute canonicals -->
<!-- Page A: -->
<link rel="canonical" href="https://example.com/page-c">
<!-- Page B: -->
<link rel="canonical" href="https://example.com/page-c">
<!-- Page C (self-referencing): -->
<link rel="canonical" href="https://example.com/page-c">
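Fixing chains at scale means rewriting every canonical to point straight at the end of its chain. A sketch over a crawled {page: canonical target} map (function names are illustrative):

```python
def flatten_canonicals(canonical_of: dict[str, str]) -> dict[str, str]:
    """Rewrite each canonical to point directly at the end of its chain.

    Raises ValueError if the canonicals form a loop.
    """
    def final(url, seen=()):
        target = canonical_of.get(url)
        if target is None or target == url:   # self-canonical or chain end
            return url
        if url in seen:
            raise ValueError(f"Canonical loop at {url}")
        return final(target, seen + (url,))

    return {page: final(page) for page in canonical_of}
```

Applied to the bad example above, both page-a and page-b end up canonicalized directly to page-c, eliminating the chain.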

Core SEO Rules

Title tags, meta descriptions, H1 headings, and Open Graph

Performance Rules

Page speed, compression, caching, and optimization

Running Audits

Learn how to run website audits and identify technical issues

Interpreting Results

Understand health scores and prioritize fixes
