Overview
The pandocHandler provides document format conversion using Pandoc compiled to WebAssembly. It supports a wide range of document formats, including Markdown variants, office documents, HTML, LaTeX, and many other markup languages.
Supported Formats
pandocHandler queries Pandoc at runtime for its input and output formats. It supports 80+ formats across multiple categories:
Markdown Variants
- Markdown - Pandoc’s Markdown
- GFM - GitHub-Flavored Markdown
- CommonMark - CommonMark Markdown
- CommonMark_x - CommonMark with extensions
- Markdown_strict - Original unextended Markdown
- Markdown_mmd - MultiMarkdown
- Markdown_phpextra - PHP Markdown Extra
Office Documents
- DOCX - Microsoft Word Document
- XLSX - Microsoft Excel Spreadsheet
- PPTX - Microsoft PowerPoint Presentation
- ODT - OpenDocument Text
- RTF - Rich Text Format
Markup Languages
- HTML - Hypertext Markup Language
- HTML5 - HTML5
- LaTeX - LaTeX typesetting
- reStructuredText - RST
- AsciiDoc - AsciiDoc markup
- MediaWiki - MediaWiki markup
- Textile - Textile markup
- Org - Emacs Org mode
Presentation Formats
- Beamer - LaTeX Beamer slides
- DZSlides - DZSlides HTML slides
- Slidy - Slidy HTML slides
- Slideous - Slideous HTML slides
- S5 - S5 HTML slides
Other Formats
- EPUB - Electronic Publication (v2 and v3)
- DocBook - DocBook v4 and v5
- JATS - JATS XML
- TEI - TEI Simple
- Typst - Typst typesetting
- Jupyter - Jupyter notebooks (.ipynb)
- CSV - Comma-Separated Values
- TSV - Tab-Separated Values
- JSON - JSON (CSL bibliography)
- XML - Various XML formats
- MathML - Mathematical Markup Language
Filtered Formats
Initialization
The handler dynamically loads Pandoc and queries the supported formats.
Initialization Process
- Dynamically imports the Pandoc WASM module
- Queries input formats: pandoc --query input-formats
- Queries output formats: pandoc --query output-formats
- Manually adds MathML (supported but not exposed by the query)
- Normalizes format metadata
- Categorizes formats
- Prioritizes common formats
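The initialization steps above can be sketched as a pure function over the two query results. This is a minimal sketch, not the handler's code: `FormatEntry`, `parseList`, and `mergeFormatLists` are hypothetical names, the query results are assumed to arrive as newline-separated text, MathML is assumed to be output-only, and "prioritizing" is reduced to moving HTML to the front.

```typescript
// Hypothetical, reduced metadata shape used only for this sketch.
interface FormatEntry {
  internal: string; // Pandoc's own identifier, e.g. "plain"
  format: string;   // normalized identifier, e.g. "text"
  from: boolean;    // usable as input
  to: boolean;      // usable as output
}

// Split a newline-separated format list, dropping blank lines.
function parseList(raw: string): string[] {
  return raw
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function mergeFormatLists(inputList: string, outputList: string): FormatEntry[] {
  const inputs = parseList(inputList);
  const outputs = parseList(outputList);
  // Assumption: MathML is added as an output because the query omits it.
  if (outputs.indexOf("mathml") === -1) outputs.push("mathml");

  const byName: { [name: string]: FormatEntry } = {};
  const order: string[] = [];
  const touch = (internal: string): FormatEntry => {
    if (!Object.prototype.hasOwnProperty.call(byName, internal)) {
      byName[internal] = {
        internal,
        // Normalize Pandoc's "plain" to "text".
        format: internal === "plain" ? "text" : internal,
        from: false,
        to: false,
      };
      order.push(internal);
    }
    return byName[internal];
  };
  inputs.forEach((name) => { touch(name).from = true; });
  outputs.forEach((name) => { touch(name).to = true; });

  // Prioritize HTML with a stable partition; other formats keep their order.
  const entries = order.map((name) => byName[name]);
  return entries
    .filter((e) => e.format === "html")
    .concat(entries.filter((e) => e.format !== "html"));
}
```

Keeping the merge step free of any WASM calls makes it easy to test against canned query output.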
Format Naming
The handler uses custom format names for better clarity.
Format Extensions
Custom extension mappings are used for formats whose extension differs from the format name.
Format Categorization
Formats are categorized for filtering and organization:
Spreadsheets
Presentations
Text Formats
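One way to picture the categorization is a lookup that returns either a single category or an array of categories. The sketch below is illustrative only: the membership lists are guesses drawn from the format lists earlier in this document, not the handler's actual tables, and `categorize` and `isLossless` are hypothetical names. `isLossless` mirrors the lossless flag described later in this document (false for office documents, true for others).

```typescript
type Category = "text" | "document" | "spreadsheet" | "presentation";

// Illustrative membership lists (assumptions, not the handler's real data).
const SPREADSHEETS = ["xlsx", "csv", "tsv"];
const PRESENTATIONS = ["pptx", "beamer", "dzslides", "slidy", "slideous", "s5"];
const OFFICE = ["docx", "xlsx", "pptx", "odt", "rtf"];

// Returns one category, or an array when a format fits more than one.
function categorize(format: string): Category | Category[] {
  const cats: Category[] = [];
  const office = OFFICE.indexOf(format) !== -1;
  if (SPREADSHEETS.indexOf(format) !== -1) cats.push("spreadsheet");
  if (PRESENTATIONS.indexOf(format) !== -1) cats.push("presentation");
  // Office formats that are neither spreadsheets nor slides are documents.
  if (office && cats.length === 0) cats.push("document");
  // Assumed rule: every non-office format is also a text format.
  if (!office) cats.push("text");
  return cats.length === 1 ? cats[0] : cats;
}

// Office formats are treated as lossy; everything else as lossless.
function isLossless(format: string): boolean {
  return OFFICE.indexOf(format) === -1;
}
```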
Conversion Process
Basic Conversion
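A minimal sketch of a conversion run, one Pandoc invocation per input file. Everything named here is an assumption made for illustration: `ConvertFn` is a synchronous stand-in for the Pandoc WASM entry point, the option keys in `ConvertOptions` are hypothetical, and `outputNameFor` is a caller-supplied naming callback. Resource embedding is switched on only for HTML output.

```typescript
// Hypothetical option names; the handler's real keys are not shown here.
interface ConvertOptions {
  from: string;            // input format identifier
  to: string;              // output format identifier
  inputFiles: string[];    // input filenames in the virtual file system
  outputFile: string;      // output filename in the virtual file system
  embedResources: boolean; // embed images, CSS, etc. in the output
}

interface VirtualFiles { [name: string]: string; }
interface NamedFile { name: string; data: string; }

// Stand-in for the Pandoc WASM entry point (assumed synchronous).
type ConvertFn = (options: ConvertOptions, files: VirtualFiles) => string;

function convertEach(
  files: NamedFile[],
  from: string,
  to: string,
  outputNameFor: (inputName: string) => string,
  convert: ConvertFn
): NamedFile[] {
  // Each file gets its own options object and its own one-file VFS.
  return files.map((file) => {
    const options: ConvertOptions = {
      from,
      to,
      inputFiles: [file.name],
      outputFile: outputNameFor(file.name),
      embedResources: to === "html",
    };
    const vfs: VirtualFiles = {};
    vfs[file.name] = file.data;
    return { name: options.outputFile, data: convert(options, vfs) };
  });
}
```

Passing the converter in as a parameter keeps the loop testable without loading the WASM module.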
Per-File Processing
Unlike other handlers, pandocHandler processes files individually.
Conversion Options
- Input format identifier (e.g., “markdown”, “docx”)
- Output format identifier (e.g., “html”, “pdf”)
- Array of input filenames in the virtual file system
- Output filename in the virtual file system
- Embed all resources (images, CSS, etc.) in the output file
- Method for rendering math in HTML output: "mathjax" or "mathml"
Special Format Handling
MathML Output
MathML is handled specially since Pandoc doesn’t expose it as a format.
Plain Text Normalization
Pandoc’s “plain” format is normalized to “text” for consistency.
Resource Embedding
HTML outputs automatically embed all resources.
Format Prioritization
HTML is prioritized because it can embed resources.
Lossless Detection
Office formats are marked as lossy due to conversion limitations.
Output File Naming
Output files preserve the base name with updated extension:archive.tar.gz).
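The naming rule can be sketched as a small helper. This is an assumption-laden illustration rather than the handler's code: `outputNameFor` is a hypothetical name, and the sketch replaces only the final extension, so multi-dot names keep their inner parts. How the handler itself treats multi-dot names such as archive.tar.gz is not recoverable from this document.

```typescript
// Replace the last extension of a filename, keeping the base name intact.
function outputNameFor(inputName: string, newExtension: string): string {
  const slash = Math.max(inputName.lastIndexOf("/"), inputName.lastIndexOf("\\"));
  const dot = inputName.lastIndexOf(".");
  // Only treat the dot as an extension separator when it sits inside the
  // final path segment and is not that segment's first character.
  const base = dot > slash + 1 ? inputName.slice(0, dot) : inputName;
  return base + "." + newExtension;
}
```

For example, under these assumptions `outputNameFor("notes.md", "html")` yields `notes.html`, and a name with no extension simply gains one.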
Error Handling
Virtual File System
Pandoc uses a virtual file system for I/O.
Format Metadata Structure
- Human-readable format name from the formatNames map
- Normalized format identifier (e.g., “text” instead of “plain”)
- File extension from the formatExtensions map, or the format name
- Normalized MIME type
- Whether the format can be used as input
- Whether the format can be used as output
- Pandoc’s internal format identifier
- Single category or array: "text", "document", "spreadsheet", "presentation"
- Lossless flag: false for office documents, true for others
Properties
- Handler identifier
- Array of supported formats, populated during initialization
- Ready flag: true when initialization is complete and the handler is ready for conversions
Performance Considerations
- Processes files individually (no batch optimization)
- Embeds all resources by default (increases file size)
- Suitable for text-based document conversions
- May be slower than native handlers for large files
Use Cases
Ideal for:
- Markdown to HTML conversion
- Document format interchange (DOCX ↔ ODT ↔ HTML)
- Creating presentations from Markdown
- Converting between markup languages
- Academic writing workflows (LaTeX, EPUB, etc.)
Not suitable for:
- PDF generation (disabled in this configuration)
- RevealJS presentations (hangs indefinitely)
- Large binary office documents
Source Reference
Implementation: ~/workspace/source/src/handlers/pandoc.ts