
Overview

The content sync process is the bridge between Notion’s CMS and Astro’s static site generation. It runs before every build via a pre-build hook and transforms Notion databases into local MDX files with optimized assets.

Build Hook Integration

The sync is triggered automatically through npm scripts:
package.json
{
  "scripts": {
    "predev": "jiti scripts/index.ts",
    "dev": "astro dev",
    "prebuild": "jiti scripts/index.ts",
    "build": "astro build"
  }
}
jiti runs TypeScript files directly in Node.js without a separate compilation step, which makes it a good fit for build scripts.

Sync Script Entry Point

The main sync orchestrator is minimal and focused:
scripts/index.ts
import "dotenv/config";
import { downloadPostsAsMdx } from "../src/lib/notion-download";

downloadPostsAsMdx("blog");
downloadPostsAsMdx("projects");

console.log("Finished downloading content.");
  1. Load Environment: dotenv/config loads the .env file with Notion credentials
  2. Sync Blog Posts: downloads all public blog posts from NOTION_BLOG_DB_ID
  3. Sync Projects: downloads all public projects from NOTION_PROJECTS_DB_ID
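
Because downloadPostsAsMdx is declared async, the two calls in scripts/index.ts return promises, and the final console.log can run before the files have been written. The following is a minimal sketch of an entry point that waits for both collections. The names `runSync` and `sync` are hypothetical: `sync` stands in for downloadPostsAsMdx and is injected so the example runs without Notion credentials.

```typescript
type Collection = "blog" | "projects";

// `sync` is an assumption standing in for downloadPostsAsMdx.
async function runSync(
  sync: (collection: Collection) => Promise<unknown>,
  collections: readonly Collection[] = ["blog", "projects"],
): Promise<void> {
  // Start every collection, then wait until all of them have finished.
  await Promise.all(collections.map((collection) => sync(collection)));
  // Only reached after every collection has resolved.
  console.log("Finished downloading content.");
}
```

In the real script this would be called as `runSync(downloadPostsAsMdx)`, assuming the import shown above.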

Core Sync Function

The downloadPostsAsMdx function orchestrates the entire conversion pipeline:
src/lib/notion-download.ts
export async function downloadPostsAsMdx(collection: "projects" | "blog") {
  let databaseId: string;

  if (collection === "projects") {
    databaseId = import.meta.env.NOTION_PROJECTS_DB_ID;
  } else if (collection === "blog") {
    databaseId = import.meta.env.NOTION_BLOG_DB_ID;
  } else {
    throw Error("invalid collection");
  }

  // Query Notion database for public posts
  const posts = await queryNotionDatabase(databaseId, {
    filter: {
      and: [
        {
          property: "public",
          checkbox: {
            equals: true,
          },
        },
      ],
    },
    sorts: [
      {
        property: "published",
        direction: "descending",
      },
    ],
  });

  return Promise.all(
    posts.map(async (post) => {
      const shouldUpdate = await shouldUpdateLocalFile(
        post.last_edited_time,
        collection,
        post.id
      );

      if (shouldUpdate) {
        const postBlocks = await getBlock(post.id);
        const pageProperties = await getPageProperties(post.id);
        const postFrontmatter = pagePropertiesToFrontmatter(
          pageProperties,
          post.last_edited_time
        );

        const postImports = postFrontmatter.concat(
          "import { Image } from 'astro:assets';\n\n"
        );

        const postMdx = postImports.concat(parseBlocks(postBlocks));

        const dest = path
          .join("src", "content", collection, post.id)
          .concat(".mdx");

        console.log("Writing to file:", dest);

        return fsPromises.writeFile(dest, postMdx);
      }
    })
  );
}

Pipeline Stages

1. Query Database

Filters for public posts and sorts by publication date:
{
  filter: {
    and: [
      {
        property: "public",
        checkbox: { equals: true },
      },
    ],
  }
}
Only posts with the public checkbox enabled are synced. This allows drafts to exist in Notion without appearing on the site.

2. Incremental Update Check

The sync implements smart caching to skip unchanged content:
src/lib/notion-download.ts:94-130
async function shouldUpdateLocalFile(
  serverLastEditedTime: string,
  srcContentPath: string,
  postId: string
): Promise<boolean> {
  try {
    const dest = path
      .join(process.cwd(), "src", "content", srcContentPath, postId)
      .concat(".mdx");

    const readStream = fs.createReadStream(dest);

    const rl = readline.createInterface({
      input: readStream,
    });

    let lastEditedTime: string;

    // Read frontmatter to get lastEditedTime
    const lineListener = (line) => {
      if (line.includes("lastEditedTime")) {
        lastEditedTime = line.substring(line.indexOf(": ") + 2);
        rl.close();
        rl.removeListener("line", lineListener);
        readStream.destroy();
      }
    };

    rl.on("line", lineListener);

    await once(rl, "close");
    return `'${serverLastEditedTime}'` > lastEditedTime;
  } catch (err) {
    // File probably doesn't exist, so fetch it
    return true;
  }
}
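This optimization can reduce build times from 60s to under 20s when only a few posts have changed.

The function:
  1. Streams the existing MDX file line by line until it finds lastEditedTime in the frontmatter
  2. Extracts the stored lastEditedTime value
  3. Compares it with Notion’s last_edited_time
  4. Returns true only when Notion’s timestamp is newer; if the timestamps match, it returns false and the post is skipped
  5. Returns true if reading the local file throws (for example, because it does not exist yet), so the post is fetched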
There’s a TODO comment in the code suggesting this could be optimized by storing timestamps in a JSON file rather than reading each MDX file.
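
One way the TODO could be approached is sketched below, assuming a single JSON manifest that maps each Notion page ID to the last_edited_time of its most recent sync. The manifest shape and the names `SyncManifest`, `loadManifest`, `needsUpdate`, and `saveManifest` are hypothetical and do not come from the project.

```typescript
import { promises as fsPromises } from "node:fs";

// Hypothetical manifest shape: Notion page ID -> last synced last_edited_time.
type SyncManifest = Record<string, string>;

// Read the manifest, treating a missing or unreadable file as "nothing synced yet".
async function loadManifest(file: string): Promise<SyncManifest> {
  try {
    return JSON.parse(await fsPromises.readFile(file, "utf8")) as SyncManifest;
  } catch {
    return {};
  }
}

// A page needs syncing when it is unknown or Notion reports a newer edit.
function needsUpdate(
  manifest: SyncManifest,
  postId: string,
  serverLastEditedTime: string,
): boolean {
  const synced = manifest[postId];
  if (synced === undefined) return true;
  return Date.parse(serverLastEditedTime) > Date.parse(synced);
}

// Persist the manifest once the sync has written its files.
async function saveManifest(file: string, manifest: SyncManifest): Promise<void> {
  await fsPromises.writeFile(file, JSON.stringify(manifest, null, 2));
}
```

With this layout the sync would read one file per run instead of opening every MDX file, at the cost of keeping the manifest consistent with the files on disk.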

3. Fetch Blocks

Retrieves all content blocks from the Notion page:
const postBlocks = await getBlock(post.id);
This recursively fetches:
  • All child blocks
  • Nested content (toggles, lists)
  • Media files (images, videos, PDFs)
See Notion CMS Integration for implementation details.

4. Extract Properties

Fetches page metadata for frontmatter:
const pageProperties = await getPageProperties(post.id);
Returns an object like:
{
  "title": "My Blog Post",
  "published": "2024-01-15",
  "description": "A great post about...",
  "path": "my-blog-post",
  "tags": "web,astro,notion",
  "public": "true"
}

5. Generate Frontmatter

Converts properties to YAML frontmatter:
src/lib/notion-download.ts:11-24
function pagePropertiesToFrontmatter(
  pageProperties: any,
  lastEditedTime?: string
) {
  return "---".concat(
    EOL,
    lastEditedTime ? `lastEditedTime: '${lastEditedTime}'${EOL}` : "",
    ...Object.keys(pageProperties).map(
      (key) => `${key}: '${pageProperties[key]}'${EOL}`
    ),
    "---",
    EOL
  );
}
Produces:
---
lastEditedTime: '2024-01-15T10:30:00.000Z'
title: 'My Blog Post'
published: '2024-01-15'
description: 'A great post about...'
path: 'my-blog-post'
tags: 'web,astro,notion'
public: 'true'
---
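
The template in pagePropertiesToFrontmatter wraps each value in single quotes without escaping it, so a title that contains an apostrophe would produce invalid YAML. Below is a minimal sketch of a variant that doubles single quotes, which is how YAML escapes them inside single-quoted scalars. `toYamlSingleQuoted` and `toFrontmatter` are hypothetical names, and multi-line values are not handled.

```typescript
import { EOL } from "node:os";

// YAML single-quoted scalars escape a single quote by doubling it.
function toYamlSingleQuoted(value: unknown): string {
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Builds the same layout as the original: opening fence, one line per
// property (lastEditedTime first when present), closing fence, trailing EOL.
function toFrontmatter(
  properties: Record<string, unknown>,
  lastEditedTime?: string,
): string {
  const lines = Object.entries(properties).map(
    ([key, value]) => `${key}: ${toYamlSingleQuoted(value)}`,
  );
  if (lastEditedTime) {
    lines.unshift(`lastEditedTime: ${toYamlSingleQuoted(lastEditedTime)}`);
  }
  return ["---", ...lines, "---", ""].join(EOL);
}
```

For example, `toYamlSingleQuoted("It's here")` yields `'It''s here'`, which a YAML parser reads back as the original string.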

6. Parse Blocks to Markdown

Converts Notion blocks to MDX-compatible Markdown:
const postMdx = postImports.concat(parseBlocks(postBlocks));
See Block Parsing below for details.

7. Write MDX File

Writes the final MDX to disk:
const dest = path.join("src", "content", collection, post.id).concat(".mdx");
await fsPromises.writeFile(dest, postMdx);
File structure:
src/content/
├── blog/
│   ├── abc123.mdx
│   └── def456.mdx
└── projects/
    ├── ghi789.mdx
    └── jkl012.mdx
Files are named by Notion page ID (e.g., abc123.mdx) to ensure uniqueness. The path property in frontmatter determines the URL.

Block Parsing Details

The parseBlocks function converts Notion’s block structure to Markdown:
src/lib/notion-parse.ts:316-318
export function parseBlocks(blocks: BlockObjectResponse[]): string {
  return blocks.map((block) => parse(block)).join("");
}

Supported Block Types

The parse function handles these Notion blocks:
  • Paragraph: parseRichTextBlock(block.paragraph)
  • Heading 1-3: "#".repeat(level).concat(" ", parseRichTextBlock(...))
  • Quote: "> ".concat(parseRichTextBlock(block.quote))
  • Bulleted List: "- ".concat(parseRichTextBlock(...))
  • Numbered List: "1. ".concat(parseRichTextBlock(...))
  • To-do: "- [x] " or "- [ ] " based on checked state
  • Code Block: Fenced code with language: ```language
  • Equation: Raw LaTeX expression
  • Callout: Rendered as code block
  • Image: <Image> component with width/height
  • Video: <video> tag with controls
  • Audio: <audio> tag
  • PDF: <object> tag for inline PDF viewer
  • Toggle: <details> and <summary> tags
  • Table: Full <table> with headers and column groups
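
To illustrate how such a dispatcher might be organized, here is a simplified sketch covering a few of the block types above. The `SimpleBlock` shapes are hypothetical stand-ins for the Notion API types, and the trailing newlines are an assumption; the real parse function may separate blocks differently.

```typescript
// Simplified, hypothetical block shapes; the real Notion API types are richer.
type SimpleBlock =
  | { type: "paragraph"; text: string }
  | { type: "heading"; level: 1 | 2 | 3; text: string }
  | { type: "quote"; text: string }
  | { type: "to_do"; text: string; checked: boolean }
  | { type: "code"; language: string; text: string };

// Built at runtime so this example does not contain a literal code fence.
const FENCE = "`".repeat(3);

function parseSimpleBlock(block: SimpleBlock): string {
  switch (block.type) {
    case "paragraph":
      return `${block.text}\n\n`;
    case "heading":
      return "#".repeat(block.level).concat(" ", block.text, "\n\n");
    case "quote":
      return "> ".concat(block.text, "\n\n");
    case "to_do":
      return (block.checked ? "- [x] " : "- [ ] ").concat(block.text, "\n");
    case "code":
      return `${FENCE}${block.language}\n${block.text}\n${FENCE}\n\n`;
  }
}

// Mirrors parseBlocks: convert each block and join the results.
function parseSimpleBlocks(blocks: SimpleBlock[]): string {
  return blocks.map(parseSimpleBlock).join("");
}
```

Using a discriminated union lets the compiler check that every block type in the union has a case.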

Rich Text Annotations

The parseRichTextBlock function handles inline formatting:
src/lib/notion-parse.ts:77-104
export function parseRichTextBlock({ rich_text, color = "default" }: RichTextBlock): string {
  return rich_text
    .map((token) => {
      let markdown = token.plain_text;
      markdown = markdown.replace(/</g, "&lt;");  // Escape HTML

      if (token.href) markdown = `[${markdown}](${token.href})`;

      const { bold, italic, strikethrough, underline, code, color } = token.annotations;

      if (code) markdown = `<code>${markdown}</code>`;
      if (bold) markdown = `<b>${markdown}</b>`;
      if (italic) markdown = `<i>${markdown}</i>`;
      if (strikethrough) markdown = `<s>${markdown}</s>`;
      if (underline) markdown = `<u>${markdown}</u>`;
      if (color !== "default") markdown = parseColor(color, markdown);

      return markdown;
    })
    .join("");
}

Markdown Formatting

Links use standard Markdown syntax

HTML Tags

Code, bold, italic, strikethrough, underline, and colors use HTML tags for better control
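
The order of the checks in parseRichTextBlock determines how the wrappers nest. The sketch below is a reduced version that keeps only a few annotations, to show the result for a single token. `SimpleToken` and `renderToken` are hypothetical names, not part of the project.

```typescript
// Simplified, hypothetical token shape covering only the fields used here.
interface SimpleToken {
  plain_text: string;
  href: string | null;
  annotations: { bold: boolean; italic: boolean; code: boolean };
}

function renderToken(token: SimpleToken): string {
  // Escape "<" first so the text cannot open an HTML or JSX tag in MDX.
  let out = token.plain_text.replace(/</g, "&lt;");
  // The link wraps the escaped text, then each annotation wraps the result.
  if (token.href) out = `[${out}](${token.href})`;
  if (token.annotations.code) out = `<code>${out}</code>`;
  if (token.annotations.bold) out = `<b>${out}</b>`;
  if (token.annotations.italic) out = `<i>${out}</i>`;
  return out;
}

const rendered = renderToken({
  plain_text: "a < b",
  href: "https://example.com",
  annotations: { bold: true, italic: false, code: true },
});
// rendered === "<b><code>[a &lt; b](https://example.com)</code></b>"
```

Annotations applied later in the function end up as the outermost tags.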

Color Handling

Notion’s color annotations are converted to Tailwind classes:
src/lib/notion-parse.ts:12-71
export function parseColor(color: ApiColor, text: string): string {
  let className = "";
  switch (color) {
    case "gray":
      className = "!text-gray-500";
      break;
    case "orange":
      className = "!text-orange-500";
      break;
    // ... other colors
    case "orange_background":
      className = "!bg-orange-200";
      break;
    // ...
  }
  return html`<span class="${className}">${text}</span>`;
}

Asset Download Pipeline

During block parsing, media files are downloaded:
src/lib/notion-cms.ts:156-170
if (
  blockType === "image" ||
  blockType === "video" ||
  blockType === "audio" ||
  blockType === "pdf"
) {
  if (block[blockType].type === "file") {
    // Download and remap URL to local path
    block[blockType].file.url = await getAssetUrl(
      block.id,
      block[blockType].file.url,
      blockType === "image",
    );
  }
}
This ensures:
  1. Assets are versioned with block IDs
  2. Notion’s expiring URLs are replaced with local paths
  3. Images get dimension metadata for optimization
See Notion CMS - Asset Management for details.
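
As an illustration of what such a helper might do, the sketch below derives a stable local file name from the block ID and downloads the file only when no local copy exists. The `file.<blockId>.<ext>` naming is an assumption modeled on the asset path in the generated MDX example on this page. `assetFileName` and `ensureAsset` are hypothetical names and not the project's getAssetUrl. The global fetch requires Node.js 18 or later.

```typescript
import { promises as fsPromises } from "node:fs";
import * as path from "node:path";

// Assumed naming scheme: "file.<blockId><extension>".
function assetFileName(blockId: string, remoteUrl: string): string {
  // Use the URL pathname so the signed query string is ignored.
  const extension = path.extname(new URL(remoteUrl).pathname);
  return `file.${blockId}${extension}`;
}

// Download the asset only when no local copy exists; returns the local path.
async function ensureAsset(
  assetsDir: string,
  blockId: string,
  remoteUrl: string,
): Promise<string> {
  const dest = path.join(assetsDir, assetFileName(blockId, remoteUrl));
  try {
    await fsPromises.access(dest); // Already downloaded, nothing to do.
  } catch {
    const response = await fetch(remoteUrl);
    if (!response.ok) throw new Error(`Download failed: ${response.status}`);
    await fsPromises.mkdir(assetsDir, { recursive: true });
    await fsPromises.writeFile(dest, Buffer.from(await response.arrayBuffer()));
  }
  return dest;
}
```

Keying the file name on the block ID means a replaced image in Notion, which gets a new block, also gets a new local file.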

Generated MDX Structure

A complete MDX file looks like:
src/content/blog/abc123.mdx
---
lastEditedTime: '2024-01-15T10:30:00.000Z'
title: 'Building with Astro'
published: '2024-01-15'
description: 'A comprehensive guide to Astro'
path: 'building-with-astro'
tags: 'astro,web,tutorial'
public: 'true'
---
import { Image } from 'astro:assets';

# Building with Astro

Astro is a modern static site generator...

<Image src={import("@assets/file.xyz123.png")}?w=1200&h=800 width="1200" height="800" format="webp" alt="Astro logo" />

## Key Features

- Zero JS by default
- Island architecture
- Multi-framework support

```typescript
import { defineConfig } from 'astro/config';
export default defineConfig({});
```

Content Collection Integration

Astro's content collections automatically validate the frontmatter of the generated MDX:

src/content.config.ts
import { glob } from "astro/loaders";
import { z, defineCollection } from "astro:content";

const blogSchema = z.object({
  lastEditedTime: z.string().transform((str) => new Date(str)),
  published: z.string(),
  description: z.string(),
  path: z.string(),
  tags: z.string(),
  public: z.string(),
  title: z.string(),
});

const blog = defineCollection({
  loader: glob({ pattern: "**/*.mdx", base: "./src/content/blog" }),
  schema: blogSchema,
});
If frontmatter doesn’t match the schema, Astro will fail the build with a validation error.

Performance Optimizations

Incremental Updates

Only syncs changed content by comparing lastEditedTime. Reduces build time by 50-75% for small updates.

Data Source Caching

Caches database → data source mappings. Eliminates redundant API calls.

Asset Deduplication

Checks if an asset exists before downloading. Prevents re-downloading unchanged media.

Parallel Processing

Uses Promise.all to process posts concurrently. Faster than sequential processing.
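
Promise.all starts work on every post at once, which for a large database can produce a burst of requests against the rate-limited Notion API. If that becomes a problem, one option is to cap how many posts are processed at a time. The sketch below shows a hypothetical `mapWithLimit` helper for this; it is not part of the project.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in the same order as the input.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}
```

In downloadPostsAsMdx this could replace the `Promise.all(posts.map(...))` call, for example as `mapWithLimit(posts, 3, async (post) => { ... })`, where the limit of 3 is an arbitrary illustrative value.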

Error Handling

The sync process is resilient to failures:
try {
  await downloadPostsAsMdx("blog");
} catch (error) {
  console.error("Failed to sync blog:", error);
  // Build continues with existing MDX files
}
If the Notion API is unavailable, the build uses the last successfully synced MDX files. This prevents deployment failures.
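
To keep a failure in one collection from preventing the other from syncing, the same idea can be applied per collection. This is a minimal sketch using Promise.allSettled. `syncAllSettled` is a hypothetical helper, and `sync` stands in for downloadPostsAsMdx.

```typescript
// Syncs every collection, logs the ones that failed, and returns their names.
async function syncAllSettled(
  collections: readonly string[],
  sync: (collection: string) => Promise<unknown>,
): Promise<string[]> {
  const results = await Promise.allSettled(collections.map((c) => sync(c)));
  const failed: string[] = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      console.error(`Failed to sync ${collections[i]}:`, result.reason);
      failed.push(collections[i]);
    }
  });
  // The build can continue with the existing MDX files for failed collections.
  return failed;
}
```

The returned list lets the caller decide whether a partial sync should still fail the build, for instance in CI.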

Debugging

To debug the sync process:
# Run sync script directly
npx jiti scripts/index.ts

# Or with verbose logging
NODE_ENV=development npx jiti scripts/index.ts
Watch for console output:
Fetching data_source_id for database: abc123
Resolved data_source_id: xyz789
Fetching pages from Notion data source: xyz789
Writing to file: src/content/blog/def456.mdx
Finished downloading content.

Next Steps

Notion CMS Integration

Understand the Notion API client and querying

Multi-Framework Strategy

Learn how Astro renders the synced MDX content
