fc.markdown.*
to_json
Converts a column of Markdown-formatted strings into a hierarchical JSON representation.
Parameters:
- Input column containing Markdown strings.
Returns: A column of JSON-formatted strings representing the structured document tree.
This function parses Markdown into a structured JSON format optimized for document chunking, semantic analysis, and jq queries. The output conforms to a custom schema that organizes content into nested sections based on heading levels. The full JSON schema is documented at docs.fenic.ai/topics/markdown-json.
Supported Markdown Features
- Headings with nested hierarchy (e.g., h2 → h3 → h4)
- Paragraphs with inline formatting (bold, italics, links, code, etc.)
- Lists (ordered, unordered, task lists)
- Tables with header alignment and inline content
- Code blocks with language info
- Blockquotes, horizontal rules, and inline/flow HTML
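To make the nesting rule concrete, here is a minimal plain-Python sketch, not fenic's implementation, that nests each non-blank line under its closest preceding heading. The function name `markdown_to_section_tree` and the keys `heading`, `level`, `content`, and `children` are hypothetical and do not follow the published schema; the sketch also ignores inline formatting, lists, and tables.

```python
import json
import re

# ATX heading: 1-6 '#', whitespace, text, optional closing '#' run.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def markdown_to_section_tree(md: str) -> str:
    """Nest non-blank lines under their nearest heading and return JSON.

    Hypothetical keys: heading, level, content, children.
    """
    root = {"heading": None, "level": 0, "content": [], "children": []}
    stack = [root]  # stack[-1] is the section currently receiving lines
    in_fence = False
    for line in md.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # '#' lines inside code are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close any open section at the same or a deeper level.
            while stack[-1]["level"] >= level:
                stack.pop()
            section = {"heading": match.group(2), "level": level,
                       "content": [], "children": []}
            stack[-1]["children"].append(section)
            stack.append(section)
        elif line.strip():
            stack[-1]["content"].append(line)
    return json.dumps(root)


doc = "# Guide\nIntro\n## Install\nRun it.\n## Usage\nCall it."
tree = json.loads(markdown_to_section_tree(doc))
print([s["heading"] for s in tree["children"][0]["children"]])  # ['Install', 'Usage']
```

Popping the stack until a shallower section is on top is what turns a flat h2 → h3 → h4 sequence into nested sections.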
get_code_blocks
Extracts all code blocks from a column of Markdown-formatted strings.
Parameters:
- Input column containing Markdown strings.
- Optional language filter to extract only code blocks with a specific language. By default, all code blocks are extracted.
Returns: A column of code blocks. The output column type is `ArrayType(StructType([StructField("language", StringType), StructField("code", StringType)]))`.
- Code blocks are parsed from fenced Markdown blocks (e.g., triple backticks).
- Language identifiers are optional and may be null if not provided in the original Markdown.
- Indented code blocks without fences are not currently supported.
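The parsing rules above can be illustrated with a small standalone sketch. It assumes CommonMark-style fences (three or more backticks or tildes) and uses a hypothetical function `find_fenced_code_blocks` whose output mirrors the `language`/`code` struct; it is not how fenic parses Markdown internally.

```python
import re
from typing import Optional

# Opening fence: 3+ backticks or tildes, then an optional language tag.
FENCE_OPEN = re.compile(r"^(`{3,}|~{3,})\s*([^\s`]*)")


def find_fenced_code_blocks(md: str, language: Optional[str] = None) -> list:
    """Return [{"language": ..., "code": ...}] for each closed fenced block.

    Indented (unfenced) code blocks are ignored, and a fence that is never
    closed is dropped in this simplified version.
    """
    blocks = []
    fence, lang, body = None, None, []
    for line in md.splitlines():
        stripped = line.strip()
        if fence is None:
            opening = FENCE_OPEN.match(stripped)
            if opening:
                fence, body = opening.group(1), []
                lang = opening.group(2) or None  # no info string -> None
        elif stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
            # Closing fence: same character, at least as long as the opener.
            if language is None or lang == language:
                blocks.append({"language": lang, "code": "\n".join(body)})
            fence = None
        else:
            body.append(line)
    return blocks
```

Passing a language acts as the filter; a block without an info string reports `None` and is skipped whenever a filter is given.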
generate_toc
Generates a table of contents from Markdown headings.
Parameters:
- Input column containing Markdown strings.
- Maximum heading level to include in the TOC (1-6). Defaults to 6 (all levels).
Returns: A column of Markdown-formatted table-of-contents strings.
- The TOC is written in Markdown heading syntax (`#`, `##`, `###`, etc.)
- Each heading in the source document becomes a line in the TOC
- The heading level is preserved in the output
- The result is a valid Markdown document that can be rendered or processed further
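A rough plain-Python approximation of this behavior follows. The function `toc_from_headings` and its `max_level` argument are hypothetical names for illustration, and only ATX (`#`-style) headings are recognized; setext headings (underlined with `=` or `-`) are not handled.

```python
import re

# ATX heading: 1-6 '#', whitespace, text, optional closing '#' run.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def toc_from_headings(md: str, max_level: int = 6) -> str:
    """Keep each heading up to max_level as one '#'-prefixed line."""
    if not 1 <= max_level <= 6:
        raise ValueError("max_level must be between 1 and 6")
    toc, in_fence = [], False
    for line in md.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # simplified: any fence marker toggles
            continue
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            # Reuse the original '#' run so the heading level is preserved.
            toc.append(f"{match.group(1)} {match.group(2)}")
    return "\n".join(toc)
```

Because every output line keeps its `#` prefix, the result is itself a Markdown document, and lowering `max_level` simply drops the deeper lines.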
extract_header_chunks
Splits Markdown documents into logical chunks based on heading hierarchy.
Parameters:
- Input column containing Markdown strings.
- Heading level to split on (1-6). A new chunk starts at every heading of this level and includes all nested content and subsections.
Returns: A column of arrays containing chunk objects, one array per input document.
Features
- Context-preserving: Each chunk contains all content and subsections under the heading
- Hierarchical awareness: Includes parent heading context for better LLM understanding
- Clean text output: Strips markdown formatting for direct LLM consumption
Chunking Behavior
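With `header_level=2`, a document whose level-2 headings are "Getting Started" (containing a level-3 "Prerequisites" subsection) and "API Reference" is split into two chunks:
- a Getting Started chunk, which includes the Prerequisites subsection
- an API Reference chunk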
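The splitting rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not fenic's implementation: the function `header_chunks` and the chunk keys `heading`, `level`, `parent_heading`, and `content` are hypothetical, formatting removal is deliberately crude, and any text before the first heading at the split level is discarded.

```python
import re

# ATX heading: 1-6 '#', whitespace, text, optional closing '#' run.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def plain(text: str) -> str:
    """Crude cleanup: [text](url) -> text, then drop '*' and '`' markers."""
    return re.sub(r"[*`]", "", LINK.sub(r"\1", text))


def header_chunks(md: str, header_level: int) -> list:
    """Split at every heading of header_level; deeper headings stay inside.

    Text before the first split-level heading is dropped in this sketch.
    """
    chunks, current, parents, in_fence = [], None, {}, False
    for line in md.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # drop fence markers, keep code lines
            continue
        match = None if in_fence else HEADING.match(line)
        level = len(match.group(1)) if match else 0
        if match and level < header_level:
            # A shallower heading ends the open chunk and becomes context.
            current = None
            parents = {k: v for k, v in parents.items() if k < level}
            parents[level] = plain(match.group(2))
        elif match and level == header_level:
            current = {
                "heading": plain(match.group(2)),
                "level": level,
                "parent_heading": parents[max(parents)] if parents else None,
                "content": [],
            }
            chunks.append(current)
        elif current is not None and line.strip():
            # Body text and deeper headings belong to the open chunk.
            current["content"].append(plain(match.group(2) if match else line))
    for chunk in chunks:
        chunk["content"] = "\n".join(chunk["content"])
    return chunks
```

Calling `header_chunks(doc, 2)` on a document shaped like the example above yields one chunk per level-2 heading, with the Prerequisites text folded into the Getting Started chunk and the enclosing level-1 heading recorded as parent context.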
