## Overview
The content sync process is the bridge between Notion’s CMS and Astro’s static site generation. It runs before every build via a pre-build hook and transforms Notion databases into local MDX files with optimized assets.
## Build Hook Integration
The sync is triggered automatically through npm scripts:
```json
{
  "scripts": {
    "predev": "jiti scripts/index.ts",
    "dev": "astro dev",
    "prebuild": "jiti scripts/index.ts",
    "build": "astro build"
  }
}
```
jiti runs TypeScript files directly in Node.js without a separate compilation step, which makes it a good fit for build scripts.
## Sync Script Entry Point
The main sync orchestrator is minimal and focused:
```typescript
import "dotenv/config";
import { downloadPostsAsMdx } from "../src/lib/notion-download";

// Await each sync so failures surface and the final log line is accurate.
await downloadPostsAsMdx("blog");
await downloadPostsAsMdx("projects");
console.log("Finished downloading content.");
```
1. **Load environment**: `dotenv/config` loads the `.env` file with the Notion credentials.
2. **Sync blog posts**: downloads all public blog posts from the database in `NOTION_BLOG_DB_ID`.
3. **Sync projects**: downloads all public projects from the database in `NOTION_PROJECTS_DB_ID`.
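A minimal `.env` for local development might look like the sketch below. Only the two database ID variables appear in the sync script; the token variable name is an assumption, so check the Notion client setup for the exact name it expects.

```shell
# .env (not committed) — values are placeholders.
# NOTION_TOKEN is an assumed name for the integration token.
NOTION_TOKEN=<your-notion-integration-token>
NOTION_BLOG_DB_ID=<blog-database-id>
NOTION_PROJECTS_DB_ID=<projects-database-id>
```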
## Core Sync Function
The downloadPostsAsMdx function orchestrates the entire conversion pipeline:
src/lib/notion-download.ts
```typescript
export async function downloadPostsAsMdx(collection: "projects" | "blog") {
  let databaseId: string;
  if (collection === "projects") {
    databaseId = import.meta.env.NOTION_PROJECTS_DB_ID;
  } else if (collection === "blog") {
    databaseId = import.meta.env.NOTION_BLOG_DB_ID;
  } else {
    throw Error("invalid collection");
  }

  // Query Notion database for public posts
  const posts = await queryNotionDatabase(databaseId, {
    filter: {
      and: [
        {
          property: "public",
          checkbox: {
            equals: true,
          },
        },
      ],
    },
    sorts: [
      {
        property: "published",
        direction: "descending",
      },
    ],
  });

  return Promise.all(
    posts.map(async (post) => {
      const shouldUpdate = await shouldUpdateLocalFile(
        post.last_edited_time,
        collection,
        post.id
      );
      if (shouldUpdate) {
        const postBlocks = await getBlock(post.id);
        const pageProperties = await getPageProperties(post.id);
        const postFrontmatter = pagePropertiesToFrontmatter(
          pageProperties,
          post.last_edited_time
        );
        const postImports = postFrontmatter.concat(
          "import { Image } from 'astro:assets';\n\n"
        );
        const postMdx = postImports.concat(parseBlocks(postBlocks));
        const dest = path
          .join("src", "content", collection, post.id)
          .concat(".mdx");
        console.log("Writing to file:", dest);
        return fsPromises.writeFile(dest, postMdx);
      }
    })
  );
}
```
## Pipeline Stages

### 1. Query Database
Filters for public posts and sorts by publication date:
```typescript
{
  filter: {
    and: [
      {
        property: "public",
        checkbox: { equals: true },
      },
    ],
  }
}
```
Only posts with the public checkbox enabled are synced. This allows drafts to exist in Notion without appearing on the site.
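The `and` array composes additional conditions. As a sketch, the filter could also require a non-empty `published` date; the checkbox condition is the one the sync actually uses, while the date condition is a hypothetical extension:

```typescript
// Compound Notion query filter. The "public" checkbox condition comes from
// the sync script; the "published" date condition is hypothetical, added
// only to show how extra conditions slot into the `and` array.
const filter = {
  and: [
    { property: "public", checkbox: { equals: true } },
    { property: "published", date: { is_not_empty: true } }, // hypothetical
  ],
};

console.log(filter.and.length); // 2
```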
### 2. Incremental Update Check
The sync implements smart caching to skip unchanged content:
src/lib/notion-download.ts:94-130
```typescript
async function shouldUpdateLocalFile(
  serverLastEditedTime: string,
  srcContentPath: string,
  postId: string
): Promise<boolean> {
  try {
    const dest = path
      .join(process.cwd(), "src", "content", srcContentPath, postId)
      .concat(".mdx");
    const readStream = fs.createReadStream(dest);
    const rl = readline.createInterface({
      input: readStream,
    });

    let lastEditedTime: string;
    // Read frontmatter to get lastEditedTime
    const lineListener = (line) => {
      if (line.includes("lastEditedTime")) {
        lastEditedTime = line.substring(line.indexOf(": ") + 2);
        rl.close();
        rl.removeListener("line", lineListener);
        readStream.destroy();
      }
    };
    rl.on("line", lineListener);
    await once(rl, "close");
    return `'${serverLastEditedTime}'` > lastEditedTime;
  } catch (err) {
    // File probably doesn't exist, so fetch it
    return true;
  }
}
```
There’s a TODO comment in the code suggesting this could be optimized by storing timestamps in a JSON file rather than reading each MDX file.
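That manifest approach could look like the sketch below. The file name and shape are assumptions, not the repo's actual code, and it relies on the same property the current implementation does: ISO-8601 timestamps compare chronologically as plain strings.

```typescript
import fs from "node:fs";

// Hypothetical manifest mapping { [postId]: lastEditedTime }, stored as JSON
// so the sync can decide "should update?" without opening each MDX file.
const MANIFEST = ".notion-sync.json"; // assumed file name

function loadManifest(): Record<string, string> {
  try {
    return JSON.parse(fs.readFileSync(MANIFEST, "utf8"));
  } catch {
    return {}; // first run: nothing cached yet
  }
}

function shouldUpdate(postId: string, serverLastEditedTime: string): boolean {
  const local = loadManifest()[postId];
  // ISO-8601 strings sort chronologically, so string comparison is enough.
  return !local || serverLastEditedTime > local;
}
```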
### 3. Fetch Blocks
Retrieves all content blocks from the Notion page:
```typescript
const postBlocks = await getBlock(post.id);
```
This recursively fetches:
- All child blocks
- Nested content (toggles, lists)
- Media files (images, videos, PDFs)
See Notion CMS Integration for implementation details.
### 4. Get Page Properties

Fetches page metadata for frontmatter:
```typescript
const pageProperties = await getPageProperties(post.id);
```
Returns an object like:
```json
{
  "title": "My Blog Post",
  "published": "2024-01-15",
  "description": "A great post about...",
  "path": "my-blog-post",
  "tags": "web,astro,notion",
  "public": "true"
}
```
### 5. Generate Frontmatter
Converts properties to YAML frontmatter:
src/lib/notion-download.ts:11-24
```typescript
function pagePropertiesToFrontmatter(
  pageProperties: any,
  lastEditedTime?: string
) {
  return "---".concat(
    EOL,
    lastEditedTime ? `lastEditedTime: '${lastEditedTime}'${EOL}` : "",
    ...Object.keys(pageProperties).map(
      (key) => `${key}: '${pageProperties[key]}'${EOL}`
    ),
    "---",
    EOL
  );
}
```
Produces:
```yaml
---
lastEditedTime: '2024-01-15T10:30:00.000Z'
title: 'My Blog Post'
published: '2024-01-15'
description: 'A great post about...'
path: 'my-blog-post'
tags: 'web,astro,notion'
public: 'true'
---
```
### 6. Parse Blocks to Markdown
Converts Notion blocks to MDX-compatible Markdown:
```typescript
const postMdx = postImports.concat(parseBlocks(postBlocks));
```
See Block Parsing below for details.
### 7. Write MDX File
Writes the final MDX to disk:
```typescript
const dest = path.join("src", "content", collection, post.id).concat(".mdx");
await fsPromises.writeFile(dest, postMdx);
```
File structure:
```
src/content/
├── blog/
│   ├── abc123.mdx
│   └── def456.mdx
└── projects/
    ├── ghi789.mdx
    └── jkl012.mdx
```
Files are named by Notion page ID (e.g., `abc123.mdx`) to ensure uniqueness. The `path` property in frontmatter determines the URL.
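Because the file names are opaque IDs, anything that builds URLs has to go through the `path` frontmatter. A sketch of that lookup with plain data (not the site's actual routing code; the entries and URL prefix are illustrative):

```typescript
// Map synced entries (id = file name, data.path = frontmatter) to URLs.
// Entry values here are illustrative; real ones come from the MDX files.
type Entry = { id: string; data: { path: string } };

const entries: Entry[] = [
  { id: "abc123", data: { path: "building-with-astro" } },
  { id: "def456", data: { path: "my-blog-post" } },
];

// Assumed URL scheme: the collection name as prefix, then the path slug.
const urlFor = (e: Entry) => `/blog/${e.data.path}/`;

console.log(urlFor(entries[0])); // /blog/building-with-astro/
```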
## Block Parsing Details
The parseBlocks function converts Notion’s block structure to Markdown:
src/lib/notion-parse.ts:316-318
```typescript
export function parseBlocks(blocks: BlockObjectResponse[]): string {
  return blocks.map((block) => parse(block)).join("");
}
```
### Supported Block Types
The parse function handles these Notion blocks:
- **Paragraph**: `parseRichTextBlock(block.paragraph)`
- **Heading 1-3**: `"#".repeat(level).concat(" ", parseRichTextBlock(...))`
- **Quote**: `"> ".concat(parseRichTextBlock(block.quote))`
- **Bulleted List**: `"- ".concat(parseRichTextBlock(...))`
- **Numbered List**: `"1. ".concat(parseRichTextBlock(...))`
- **To-do**: `"- [x] "` or `"- [ ] "` based on checked state
- **Code Block**: fenced code tagged with the block's language
- **Equation**: raw LaTeX expression
- **Callout**: rendered as a code block
- **Toggle**: `<details>` and `<summary>` tags
- **Table**: full `<table>` with headers and column groups
- **Bookmark**: Markdown link
- **Embed**: link with caption
- **Link Preview**: blockquote with link
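The dispatch can be sketched in miniature. The block shapes below are reduced to the fields needed here; the real `parse` in `src/lib/notion-parse.ts` consumes full `BlockObjectResponse` objects and covers many more types:

```typescript
// Simplified sketch of the block-type dispatch, covering four block types.
type Block =
  | { type: "paragraph"; text: string }
  | { type: "heading"; level: 1 | 2 | 3; text: string }
  | { type: "quote"; text: string }
  | { type: "to_do"; checked: boolean; text: string };

function parseSketch(block: Block): string {
  switch (block.type) {
    case "paragraph":
      return `${block.text}\n\n`;
    case "heading":
      return "#".repeat(block.level).concat(" ", block.text, "\n\n");
    case "quote":
      return "> ".concat(block.text, "\n\n");
    case "to_do":
      return (block.checked ? "- [x] " : "- [ ] ").concat(block.text, "\n");
  }
}

console.log(parseSketch({ type: "heading", level: 2, text: "Key Features" }));
// prints "## Key Features" followed by a blank line
```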
### Rich Text Annotations
The parseRichTextBlock function handles inline formatting:
src/lib/notion-parse.ts:77-104
```typescript
export function parseRichTextBlock({
  rich_text,
  color = "default",
}: RichTextBlock): string {
  return rich_text
    .map((token) => {
      let markdown = token.plain_text;
      markdown = markdown.replace(/</g, "&lt;"); // Escape HTML
      if (token.href) markdown = `[${markdown}](${token.href})`;
      const { bold, italic, strikethrough, underline, code, color } =
        token.annotations;
      if (code) markdown = `<code>${markdown}</code>`;
      if (bold) markdown = `<b>${markdown}</b>`;
      if (italic) markdown = `<i>${markdown}</i>`;
      if (strikethrough) markdown = `<s>${markdown}</s>`;
      if (underline) markdown = `<u>${markdown}</u>`;
      if (color !== "default") markdown = parseColor(color, markdown);
      return markdown;
    })
    .join("");
}
```
- **Markdown formatting**: links use standard Markdown syntax.
- **HTML tags**: code, bold, italic, strikethrough, underline, and colors use HTML tags for better control.
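Because the wrappers are applied in a fixed order (code first, then bold, italic, and so on), a token that is both bold and code comes out with `<code>` innermost. A reduced re-implementation of that wrapping logic, purely to illustrate the nesting (not the project's actual function):

```typescript
// Subset of the annotation flags, wrapped in the same order as above.
type Annotations = { bold?: boolean; italic?: boolean; code?: boolean };

function wrap(text: string, a: Annotations): string {
  let md = text;
  if (a.code) md = `<code>${md}</code>`; // applied first, so innermost
  if (a.bold) md = `<b>${md}</b>`;
  if (a.italic) md = `<i>${md}</i>`;
  return md;
}

console.log(wrap("npm run dev", { bold: true, code: true }));
// <b><code>npm run dev</code></b>
```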
### Color Handling
Notion’s color annotations are converted to Tailwind classes:
src/lib/notion-parse.ts:12-71
```typescript
export function parseColor(color: ApiColor, text: string): string {
  let className = "";
  switch (color) {
    case "gray":
      className = "!text-gray-500";
      break;
    case "orange":
      className = "!text-orange-500";
      break;
    // ... other colors
    case "orange_background":
      className = "!bg-orange-200";
      break;
    // ...
  }
  return html`<span class="${className}">${text}</span>`;
}
```
## Asset Download Pipeline
During block parsing, media files are downloaded:
src/lib/notion-cms.ts:156-170
```typescript
if (
  blockType === "image" ||
  blockType === "video" ||
  blockType === "audio" ||
  blockType === "pdf"
) {
  if (block[blockType].type === "file") {
    // Download and remap URL to local path
    block[blockType].file.url = await getAssetUrl(
      block.id,
      block[blockType].file.url,
      blockType === "image"
    );
  }
}
```
This ensures:
- Assets are versioned with block IDs
- Notion's expiring URLs are replaced with local paths
- Images get dimension metadata for optimization
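The block-ID versioning can be sketched as a pure path mapping. The directory and naming scheme below are assumptions modeled on the `@assets/file.xyz123.png` imports seen in the generated MDX, not the repo's actual `getAssetUrl` implementation:

```typescript
import path from "node:path";

// Derive a stable, block-versioned local file name from the block ID plus
// the remote URL's extension. Directory layout is an assumption.
function localAssetPath(blockId: string, remoteUrl: string): string {
  const ext = path.extname(new URL(remoteUrl).pathname) || ".bin";
  return path.posix.join("src", "assets", `file.${blockId}${ext}`);
}

console.log(
  localAssetPath("xyz123", "https://files.example.com/a/photo.png?expires=1")
);
// src/assets/file.xyz123.png
```

Because the name depends only on the block ID, a re-run can check for the file's existence and skip the download, which is what makes the deduplication described later possible.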
See Notion CMS - Asset Management for details.
## Generated MDX Structure
A complete MDX file looks like:
src/content/blog/abc123.mdx
````mdx
---
lastEditedTime: '2024-01-15T10:30:00.000Z'
title: 'Building with Astro'
published: '2024-01-15'
description: 'A comprehensive guide to Astro'
path: 'building-with-astro'
tags: 'astro,web,tutorial'
public: 'true'
---
import { Image } from 'astro:assets';

# Building with Astro

Astro is a modern static site generator...

<Image src={import("@assets/file.xyz123.png?w=1200&h=800")} width="1200" height="800" format="webp" alt="Astro logo" />

## Key Features

- Zero JS by default
- Island architecture
- Multi-framework support

```typescript
import { defineConfig } from 'astro/config';

export default defineConfig({});
```
````
## Content Collection Integration
Astro's content collections automatically validate the generated MDX:
src/content.config.ts

```typescript
import { glob } from "astro/loaders";
import { z, defineCollection } from "astro:content";

const blogSchema = z.object({
  lastEditedTime: z.string().transform((str) => new Date(str)),
  published: z.string(),
  description: z.string(),
  path: z.string(),
  tags: z.string(),
  public: z.string(),
  title: z.string(),
});

const blog = defineCollection({
  loader: glob({ pattern: "**/*.mdx", base: "./src/content/blog" }),
  schema: blogSchema,
});
```
If frontmatter doesn’t match the schema, Astro will fail the build with a validation error.
## Performance Optimizations

- **Incremental updates**: only syncs content whose `lastEditedTime` changed, reducing build time by 50-75% for small updates.
- **Data source caching**: caches database-to-data-source mappings, eliminating redundant API calls.
- **Asset deduplication**: checks whether an asset already exists before downloading, preventing re-downloads of unchanged media.
- **Parallel processing**: uses `Promise.all` to process posts concurrently rather than sequentially.
## Error Handling
The sync process is resilient to failures:
```typescript
try {
  await downloadPostsAsMdx("blog");
} catch (error) {
  console.error("Failed to sync blog:", error);
  // Build continues with existing MDX files
}
```
If the Notion API is unavailable, the build uses the last successfully synced MDX files. This prevents deployment failures.
## Debugging
To debug the sync process:
```shell
# Run the sync script directly (same command the pre-build hooks use)
npx jiti scripts/index.ts

# Or with verbose logging
NODE_ENV=development npx jiti scripts/index.ts
```
Watch for console output:
```
Fetching data_source_id for database: abc123
Resolved data_source_id: xyz789
Fetching pages from Notion data source: xyz789
Writing to file: src/content/blog/def456.mdx
Finished downloading content.
```
## Next Steps
- **Notion CMS Integration**: understand the Notion API client and querying.
- **Multi-Framework Strategy**: learn how Astro renders the synced MDX content.