The parser package configures the Goldmark markdown parser with custom extensions and AST transformers for URL rewriting, table of contents generation, and server-side rendering of diagrams and math.

Overview

The parser uses Goldmark as the base markdown processor and adds:
  • GFM Support: GitHub Flavored Markdown (tables, strikethrough, task lists)
  • Syntax Highlighting: Nord theme with Chroma, custom code block wrappers
  • Math Rendering: LaTeX passthrough with $...$, $$...$$, \(...\), \[...\]
  • Admonitions: Callout boxes for notes, warnings, tips
  • URL Transformation: Converts .md links to .html, handles versioned docs
  • TOC Generation: Automatic table of contents from headings
  • SSR Support: Pre-renders D2 diagrams and LaTeX math on the server

Creating a Parser

func New(baseURL string, renderer *native.Renderer, diagramCache *sync.Map) goldmark.Markdown
Creates a new Goldmark parser with all extensions and transformers configured. Parameters:
  • baseURL - Base URL for absolute links (empty for relative links)
  • renderer - Native renderer for D2 diagrams and KaTeX math
  • diagramCache - Thread-safe cache for rendered diagrams
Example:
import (
    "bytes"
    "log"
    "sync"

    "github.com/Kush-Singh-26/kosh/builder/parser"
    "github.com/Kush-Singh-26/kosh/builder/renderer/native"
    goldenmark_parser "github.com/yuin/goldmark/parser"
)

renderer := native.New()
diagramCache := &sync.Map{}
md := parser.New("https://example.com", renderer, diagramCache)

// Parse markdown. The context carries per-document state such as the TOC.
var buf bytes.Buffer
ctx := goldenmark_parser.NewContext()
if err := md.Convert([]byte("# Hello World"), &buf, goldenmark_parser.WithContext(ctx)); err != nil {
    log.Fatal(err)
}

Key Types

TOCEntry

type TOCEntry struct {
    ID    string // Heading ID (auto-generated from text)
    Text  string // Heading text content
    Level int    // Heading level (2-6)
}
Represents a table of contents entry extracted from markdown headings.

D2SVGPair

type D2SVGPair struct {
    D2Code string // Original D2 diagram code
    SVG    string // Rendered SVG output
}
Stores D2 diagram code with its rendered SVG.

Core Functions

ExtractPlainText

func ExtractPlainText(node ast.Node, source []byte) string
Walks the AST and extracts all text content (including code blocks) for search indexing. Example:
md := parser.New("", nil, nil)
ctx := goldenmark_parser.NewContext()

// Parse to an AST. Convert only writes HTML and does not expose the document node.
// text is "github.com/yuin/goldmark/text".
doc := md.Parser().Parse(text.NewReader(source), goldenmark_parser.WithContext(ctx))

// Extract plain text for search
plainText := parser.ExtractPlainText(doc, source)
fmt.Println("Search content:", plainText)

GetTOC

func GetTOC(pc parser.Context) []models.TOCEntry
Retrieves the table of contents from the parser context after conversion. Example:
ctx := goldenmark_parser.NewContext()
var buf bytes.Buffer
if err := md.Convert(source, &buf, goldenmark_parser.WithContext(ctx)); err != nil {
    log.Fatal(err)
}

toc := parser.GetTOC(ctx)
for _, entry := range toc {
    fmt.Printf("Level %d: %s (#%s)\n", entry.Level, entry.Text, entry.ID)
}
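
Because the TOC is a flat slice, nesting has to be derived from Level. The sketch below shows one possible way to turn the entries into an indented outline. The renderTOC helper is hypothetical, and the local TOCEntry type only mirrors the documented fields.

```go
package main

import (
	"fmt"
	"strings"
)

// TOCEntry mirrors the struct documented under Key Types.
type TOCEntry struct {
	ID    string
	Text  string
	Level int
}

// renderTOC is a hypothetical helper that prints one line per entry,
// indented two spaces for every level below 2.
func renderTOC(entries []TOCEntry) string {
	var b strings.Builder
	for _, e := range entries {
		depth := e.Level - 2 // level-2 headings sit at the left margin
		if depth < 0 {
			depth = 0
		}
		fmt.Fprintf(&b, "%s- %s (#%s)\n", strings.Repeat("  ", depth), e.Text, e.ID)
	}
	return b.String()
}

func main() {
	toc := []TOCEntry{
		{ID: "install", Text: "Install", Level: 2},
		{ID: "from-source", Text: "From source", Level: 3},
		{ID: "usage", Text: "Usage", Level: 2},
	}
	fmt.Print(renderTOC(toc))
}
```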

GetSSRHashes

func GetSSRHashes(pc parser.Context) []string
Returns all server-side rendered input hashes (D2 diagrams, LaTeX math) for cache tracking.

AddSSRHash

func AddSSRHash(pc parser.Context, hash string)
Adds an SSR input hash to the parser context during transformation.
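
One way the collected hashes could support cache tracking is to compare them against the keys of an on-disk or in-memory cache and drop entries that no page references any more. The sketch below assumes a hex SHA-256 hash over the input kind and source. The hashing scheme and the names hashInput and staleKeys are illustrative assumptions, not the package's API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashInput is an assumed hashing scheme: hex SHA-256 over the input kind
// ("d2", "math", ...) and its source, separated by a NUL byte.
func hashInput(kind, source string) string {
	sum := sha256.Sum256([]byte(kind + "\x00" + source))
	return hex.EncodeToString(sum[:])
}

// staleKeys returns, in sorted order, the cache keys that do not appear
// among the hashes collected during the current build.
func staleKeys(cache map[string]string, used []string) []string {
	live := make(map[string]struct{}, len(used))
	for _, h := range used {
		live[h] = struct{}{}
	}
	var stale []string
	for k := range cache {
		if _, ok := live[k]; !ok {
			stale = append(stale, k)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	current := hashInput("d2", "a -> b")
	cache := map[string]string{current: "<svg/>", "old-hash": "<svg/>"}
	fmt.Println(staleKeys(cache, []string{current}))
}
```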

LaTeX Math Functions

ExtractMathExpressions

func ExtractMathExpressions(html string) []native.MathExpression
Finds all LaTeX expressions in HTML and returns them with metadata. Supported delimiters:
  • Block math: $$...$$, \[...\]
  • Inline math: $...$, \(...\)
Example:
html := "<p>The equation $E = mc^2$ is famous.</p>"
exprs := parser.ExtractMathExpressions(html)

for _, expr := range exprs {
    fmt.Printf("LaTeX: %s, Display: %v, Hash: %s\n", 
        expr.LaTeX, expr.DisplayMode, expr.Hash)
}
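
To make the delimiter rules concrete, the following sketch matches the four delimiter pairs with a single regular expression. It is not the package's implementation: extractMath and mathRe are hypothetical names, the local MathExpression type keeps only the LaTeX and DisplayMode fields used in the example above, and hashing is omitted.

```go
package main

import (
	"fmt"
	"regexp"
)

// MathExpression keeps two of the fields used in the example above;
// the real native.MathExpression may carry more.
type MathExpression struct {
	LaTeX       string
	DisplayMode bool
}

// The alternatives are ordered so that $$...$$ is tried before $...$.
// Groups 1 and 2 are block math, groups 3 and 4 are inline math.
var mathRe = regexp.MustCompile(`(?s)\$\$(.+?)\$\$|\\\[(.+?)\\\]|\$([^$\n]+?)\$|\\\((.+?)\\\)`)

// extractMath returns every expression found in input, in document order.
func extractMath(input string) []MathExpression {
	var out []MathExpression
	for _, m := range mathRe.FindAllStringSubmatch(input, -1) {
		for i := 1; i <= 4; i++ {
			if m[i] != "" {
				out = append(out, MathExpression{LaTeX: m[i], DisplayMode: i <= 2})
				break
			}
		}
	}
	return out
}

func main() {
	for _, e := range extractMath(`<p>Inline $E = mc^2$ and block $$\int_0^1 x^2 dx$$</p>`) {
		fmt.Printf("LaTeX: %s, Display: %v\n", e.LaTeX, e.DisplayMode)
	}
}
```

A plain regular expression like this would also treat two dollar amounts on one line as inline math, so a production extractor needs extra guards.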

RenderMathForHTML

func RenderMathForHTML(html string, renderer *native.Renderer, 
    cache map[string]string, cacheMu *sync.Mutex) (string, []string)
Extracts, renders, and replaces all LaTeX in HTML. Returns the rendered HTML and SSR input hashes. Example:
renderer := native.New()
cache := make(map[string]string)
var cacheMu sync.Mutex

html := "<p>$$\\int_0^1 x^2 dx$$</p>"
rendered, hashes := parser.RenderMathForHTML(html, renderer, cache, &cacheMu)

fmt.Println("Rendered:", rendered)
fmt.Println("SSR hashes:", hashes)

URL Transformation

The urlTransformer handles:
  • Markdown to HTML: page.md → page.html
  • External links: Add target="_blank" and rel="noopener noreferrer"
  • Image optimization: .jpg, .png → .webp
  • Version-aware linking: Handles relative paths in versioned documentation
  • Lazy loading: Adds loading="lazy" to images
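
The external-link and image rules in the list above can be sketched with the standard library alone. The helpers isExternal, linkAttrs, and webpSrc are hypothetical. Treating every http(s) URL with a host as external is an assumption, since the real transformer may compare against baseURL.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// isExternal reports whether href is an http(s) URL with a host.
// Assumption: every such URL counts as external.
func isExternal(href string) bool {
	u, err := url.Parse(href)
	if err != nil {
		return false
	}
	return (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

// linkAttrs returns the extra attributes for an <a> tag.
func linkAttrs(href string) string {
	if isExternal(href) {
		return ` target="_blank" rel="noopener noreferrer"`
	}
	return ""
}

// webpSrc swaps a local .jpg or .png extension for .webp and leaves
// external URLs and other file types untouched.
func webpSrc(src string) string {
	if isExternal(src) {
		return src
	}
	ext := path.Ext(src)
	switch strings.ToLower(ext) {
	case ".jpg", ".png":
		return strings.TrimSuffix(src, ext) + ".webp"
	}
	return src
}

func main() {
	fmt.Printf("<a href=%q%s>Go</a>\n", "https://go.dev", linkAttrs("https://go.dev"))
	fmt.Printf("<img src=%q loading=\"lazy\">\n", webpSrc("images/logo.png"))
}
```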

Version-Aware Linking Rules

<!-- From content/v2.0/guide.md -->
[Same version](./setup.md)       → setup.html
[Different version](../v1.0/old.md) → ../v1.0/old.html
[Root level](../index.md)        → ../index.html
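
The three rules above amount to swapping the .md suffix for .html and normalising the relative path. A minimal sketch that reproduces them, assuming a hypothetical rewriteLink helper (the real urlTransformer also has to account for baseURL and version folders):

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// rewriteLink converts a relative .md link to .html, keeping any fragment
// or query string. Absolute URLs and non-Markdown links are returned as-is.
func rewriteLink(href string) string {
	u, err := url.Parse(href)
	if err != nil || u.IsAbs() || u.Host != "" {
		return href
	}
	if !strings.HasSuffix(u.Path, ".md") {
		return href
	}
	// path.Clean drops a leading "./" but keeps "../" segments.
	u.Path = path.Clean(strings.TrimSuffix(u.Path, ".md") + ".html")
	u.RawPath = ""
	return u.String()
}

func main() {
	for _, href := range []string{"./setup.md", "../v1.0/old.md", "../index.md"} {
		fmt.Printf("%s -> %s\n", href, rewriteLink(href))
	}
}
```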

Code Block Enhancement

Code blocks are wrapped in a custom HTML structure:
<div class="code-block-container">
  <div class="code-header">config.yaml</div>
  <div class="code-wrapper" data-lang="yaml">
    <!-- Syntax-highlighted code -->
  </div>
</div>
Markdown source that produces this structure:
```yaml title="config.yaml"
baseURL: https://example.com
theme: docs
```

AST Transformers

Three transformers run on the parsed AST. Lower priority numbers run first, so they execute in this order:
  1. ssrTransformer (priority 50) - Pre-renders diagrams and math
  2. urlTransformer (priority 100) - Rewrites links and images
  3. tocTransformer (priority 200) - Extracts headings for TOC
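
The ordering rule can be checked with a short sketch that sorts the three transformers by priority. The prioritized type and runOrder function are hypothetical stand-ins for Goldmark's own prioritized values.

```go
package main

import (
	"fmt"
	"sort"
)

// prioritized pairs a transformer name with its priority.
type prioritized struct {
	Name     string
	Priority int
}

// runOrder returns the transformer names sorted by ascending priority,
// which is the order in which they would run.
func runOrder(ts []prioritized) []string {
	sorted := append([]prioritized(nil), ts...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})
	names := make([]string, len(sorted))
	for i, t := range sorted {
		names[i] = t.Name
	}
	return names
}

func main() {
	fmt.Println(runOrder([]prioritized{
		{"urlTransformer", 100},
		{"tocTransformer", 200},
		{"ssrTransformer", 50},
	}))
}
```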

Architecture

Markdown Source
        ↓
Goldmark Parser
        ↓
   AST Node
        ↓
  ┌──────────────────┐
  │ SSR Transformer  │ (D2 diagrams, LaTeX)
  └──────────────────┘
        ↓
  ┌──────────────────┐
  │ URL Transformer  │ (.md → .html, versioning)
  └──────────────────┘
        ↓
  ┌──────────────────┐
  │ TOC Transformer  │ (Extract headings)
  └──────────────────┘
        ↓
  HTML Renderer
        ↓
   HTML Output

Configuration

The parser uses these Goldmark extensions:
  • extension.GFM - GitHub Flavored Markdown
  • meta.Meta - YAML frontmatter parsing
  • highlighting.NewHighlighting - Syntax highlighting with Nord theme
  • passthrough.New - LaTeX math passthrough
  • admonitions.Extender - Callout boxes
Parser and renderer options:
  • html.WithUnsafe() - Renderer option; allows raw HTML in markdown
  • parser.WithAutoHeadingID() - Parser option; auto-generates IDs from heading text
