Overview
Vale transforms markup files into analyzable text while preserving accurate line and column positions for alerts. The transformation system:
- Strips markup syntax but tracks original locations
- Applies ignore patterns (blocks and tokens)
- Handles format-specific quirks (Markdown, AsciiDoc, HTML, etc.)
- Routes code files through tree-sitter parsers
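To report alerts at accurate line and column positions, Vale has to map offsets back to the original file. The minimal Go sketch below shows one way to convert a byte offset in the original content into a 1-based line and column. It assumes byte offsets, and `offsetToPosition` and `Position` are hypothetical names, not Vale's API.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Position is a 1-based line and column (a hypothetical type for this sketch).
type Position struct {
	Line, Col int
}

// offsetToPosition converts a byte offset in src into a 1-based line and
// column. Columns count runes rather than bytes, so non-ASCII text lines up.
func offsetToPosition(src string, offset int) Position {
	// Clamp the offset into the valid range.
	if offset < 0 {
		offset = 0
	}
	if offset > len(src) {
		offset = len(src)
	}
	before := src[:offset]
	line := strings.Count(before, "\n") + 1
	lineStart := strings.LastIndex(before, "\n") + 1 // 0 when on the first line
	col := utf8.RuneCountInString(before[lineStart:]) + 1
	return Position{Line: line, Col: col}
}

func main() {
	src := "# Title\n\nSome *text* here."
	off := strings.Index(src, "text")
	fmt.Println(offsetToPosition(src, off)) // {3 7}
}
```

This conversion only works if every transformation step keeps offsets meaningful. That is why the rest of the pipeline is careful to preserve lengths.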
Format Detection
Vale automatically detects file formats using extension patterns (see internal/core/format.go:60-95). Formats fall into four categories:
- markup
- code
- data
- text
Markup formats include Markdown, AsciiDoc, reStructuredText, and HTML. These are converted to HTML, then parsed to extract prose content while maintaining position tracking. Examples: .md, .adoc, .rst, .html.
You can override format detection using FormatAssociations in .vale.ini.
The Transform Method
Vale exposes the transformation process via linter.Transform() (see internal/lint/lint.go:48-63). It applies:
- Block ignore patterns
- Token ignore patterns
- Built-in replacements
Using the transform Command
View the transformed output with the vale transform command.
Markup Transformations
Vale transforms markup files through a multi-stage pipeline.
Markdown Processing
For Markdown files (see internal/lint/md.go:36-55):
Step 1: Lint Frontmatter
Extract and lint YAML/TOML frontmatter. Frontmatter is linted separately from the document body.
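Linting frontmatter separately means the body must still report its original line numbers. The hypothetical Go helper below, `splitFrontmatter`, splits off a block delimited by `---` (YAML) or `+++` (TOML) and returns how many lines it consumed. It is a simplified sketch, not Vale's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontmatter separates a leading frontmatter block delimited by "---"
// (YAML) or "+++" (TOML) from the body. lineOffset is the number of lines
// consumed, so body line n corresponds to original line n+lineOffset.
func splitFrontmatter(src string) (front, body string, lineOffset int) {
	lines := strings.Split(src, "\n")
	delim := strings.TrimSpace(lines[0])
	if delim != "---" && delim != "+++" {
		return "", src, 0
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == delim {
			front = strings.Join(lines[1:i], "\n")
			body = strings.Join(lines[i+1:], "\n")
			return front, body, i + 1
		}
	}
	// No closing delimiter: treat the whole file as body.
	return "", src, 0
}

func main() {
	src := "---\ntitle: Intro\n---\nFirst line of prose."
	front, body, offset := splitFrontmatter(src)
	fmt.Printf("%q %q %d\n", front, body, offset)
}
```

Here the body's first line is line 4 of the original file (1 + 3). A linter can add the offset back when it reports alerts.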
Step 2: Apply Ignore Patterns
Transform the content using the configured ignore patterns. This replaces ignored content with placeholders.
Step 3: Convert to HTML
Use goldmark to parse the Markdown and render it as HTML. Goldmark supports GitHub Flavored Markdown and footnotes.
Step 4: Prepare Content
Clean up special constructs (see internal/lint/md.go:57-84). This prevents false matches in markup syntax.
Step 5: Parse HTML Tokens
Extract prose from HTML while tracking positions. Vale walks the HTML token stream, analyzing text nodes.
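The idea of walking a token stream and keeping text nodes with their positions can be illustrated without a full HTML parser. The Go sketch below is a deliberately naive stand-in for a real tokenizer. `textNodes` and `TextNode` are hypothetical. The sketch ignores comments, entities, and script bodies, and it reports offsets into the HTML rather than into the original Markdown.

```go
package main

import (
	"fmt"
	"strings"
)

// TextNode is a run of text between tags, with its byte offset in the input.
type TextNode struct {
	Text   string
	Offset int
}

// textNodes skips anything between '<' and '>' and records each remaining
// non-blank text run together with the offset where it started.
func textNodes(src string) []TextNode {
	var nodes []TextNode
	start := 0
	inTag := false
	flush := func(end int) {
		if end > start && strings.TrimSpace(src[start:end]) != "" {
			nodes = append(nodes, TextNode{Text: src[start:end], Offset: start})
		}
	}
	for i := 0; i < len(src); i++ {
		switch {
		case !inTag && src[i] == '<':
			flush(i) // emit the text that preceded this tag
			inTag = true
		case inTag && src[i] == '>':
			inTag = false
			start = i + 1 // text may begin right after the tag
		}
	}
	if !inTag {
		flush(len(src))
	}
	return nodes
}

func main() {
	for _, n := range textNodes("<p>Hello <em>world</em>!</p>") {
		fmt.Printf("%q at %d\n", n.Text, n.Offset)
	}
}
```

A real implementation also has to locate each text node in the original source, because the HTML is a derived representation.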
Other Markup Formats
Vale has specialized processors for other markup formats:
| Format | File | Key Features |
|---|---|---|
| AsciiDoc | internal/lint/adoc.go | Uses Asciidoctor conversion |
| reStructuredText | internal/lint/rst.go | Docutils-based processing |
| HTML | internal/lint/html.go | Direct token parsing |
| DITA | internal/lint/dita.go | XML with semantic understanding |
| Org-mode | internal/lint/org.go | Emacs Org markup |
| MDX | internal/lint/mdx.go | Markdown with JSX |
Code Transformations
For code files, Vale extracts comments using tree-sitter.
Tree-Sitter Integration
Vale uses tree-sitter grammars to parse code (see internal/lint/code/). Supported languages are grouped into compiled, scripting, and other languages.
Compiled languages include:
- Go (.go): line and block comments
- Rust (.rs): //, ///, and /* */ comments
- C/C++ (.c, .cpp, .h): standard C-style comments
- Java (.java): including Javadoc
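A parser matters here because comment syntax is ambiguous once string literals are involved. The Go sketch below is a deliberately naive, hypothetical regex-based extractor for `//`-style line comments (including Rust's `///`) that keeps each comment's line number. It is not Vale's code. It would wrongly treat `//` inside a string literal as a comment, and a real parser such as tree-sitter resolves exactly that kind of case.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Comment is the text of a line comment and the 1-based line it came from.
type Comment struct {
	Line int
	Text string
}

// lineComment matches "//" (or "///") up to the end of a single line and
// captures the comment text without the leading marker and one space.
var lineComment = regexp.MustCompile(`//+\s?(.*)$`)

func extractLineComments(src string) []Comment {
	var out []Comment
	for i, line := range strings.Split(src, "\n") {
		if m := lineComment.FindStringSubmatch(line); m != nil {
			out = append(out, Comment{Line: i + 1, Text: m[1]})
		}
	}
	return out
}

func main() {
	src := "a := 1 // set a\nb := 2\n/// Docs here"
	for _, c := range extractLineComments(src) {
		fmt.Printf("line %d: %q\n", c.Line, c.Text)
	}
}
```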
For languages without tree-sitter support, Vale falls back to regex-based comment extraction (see internal/core/format.go:20-56).
Ignore Patterns
Transformations respect two types of ignore patterns: block ignores and token ignores.
Block Ignores
Block ignores skip entire multi-line regions.
How Ignores Work
During transformation, Vale replaces ignored content with special markers that preserve length, so positions in the rest of the file stay accurate.
XSLT Transformations
For XML-based formats, you can apply custom XSLT transformations before linting. Typical uses include:
- Removing metadata sections
- Flattening nested structures
- Extracting specific elements
Format Overrides
Override Vale’s format detection, for example to lint .txt files as Markdown.
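Conceptually, an override remaps an extension before the format lookup runs. The Go sketch below shows that idea. The extension table is a small hypothetical subset, `detectFormat` is not Vale's function, and the fallback to plain text for unknown extensions is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// builtin is a small, hypothetical subset of an extension-to-format table.
var builtin = map[string]string{
	".md":   "markup",
	".adoc": "markup",
	".rst":  "markup",
	".html": "markup",
	".go":   "code",
	".py":   "code",
	".json": "data",
	".txt":  "text",
}

// detectFormat applies overrides (extension -> extension to treat it as)
// before looking up the format category of a path.
func detectFormat(path string, overrides map[string]string) (ext, format string) {
	ext = strings.ToLower(filepath.Ext(path))
	if target, ok := overrides[ext]; ok {
		ext = target
	}
	if f, ok := builtin[ext]; ok {
		return ext, f
	}
	return ext, "text" // in this sketch, unknown extensions are plain text
}

func main() {
	overrides := map[string]string{".txt": ".md"} // treat .txt as Markdown
	fmt.Println(detectFormat("notes.txt", overrides)) // .md markup
	fmt.Println(detectFormat("notes.txt", nil))       // .txt text
}
```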
Use Cases
- Non-Standard Extensions
- Embedded Markup
- Plain Text
Linting Pipeline
The complete linting flow is defined in internal/lint/lint.go:168-233.
The raw scope check runs on the original content, before any transformations. Use it for rules that need to see markup syntax.
Debugging Transformations
- View Transformed Content
- Check Format Detection
- Test Ignore Patterns
Best Practices
Test Patterns
Use vale transform to verify ignore patterns work as expected before committing.
Preserve Positions
When writing custom transformations, maintain character counts so Vale can report accurate locations.
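One way to check a custom transformation is to compare the input and output directly. Positions survive if both have the same length and the newlines sit at the same offsets. The Go helper below is a hypothetical test aid for that check, not part of Vale.

```go
package main

import (
	"fmt"
	"strings"
)

// preservesPositions reports whether transformed can replace original
// without shifting any alert: equal byte length and newlines at the same
// offsets.
func preservesPositions(original, transformed string) bool {
	if len(original) != len(transformed) {
		return false
	}
	for i := 0; i < len(original); i++ {
		if (original[i] == '\n') != (transformed[i] == '\n') {
			return false
		}
	}
	return true
}

func main() {
	src := "See {{< ref \"a.md\" >}}\nfor details."
	code := `{{< ref "a.md" >}}`
	good := strings.Replace(src, code, strings.Repeat(" ", len(code)), 1) // blanked, same length
	bad := strings.Replace(src, code, "", 1)                              // deleted, shifts text
	fmt.Println(preservesPositions(src, good), preservesPositions(src, bad)) // true false
}
```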
Format-Specific Rules
Use scopes to target specific document parts rather than broad ignore patterns.
XSLT for Complex XML
For complex XML formats, XSLT is more reliable than regex-based ignores.
Common Patterns
Ignore Shortcodes (Hugo, Jekyll)
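Shortcodes are usually handled with a token ignore pattern. The regex below is an illustrative candidate, not a pattern shipped with Vale. It covers Hugo's `{{< ... >}}` and `{{% ... %}}` and Liquid's `{{ ... }}` and `{% ... %}`, which Jekyll uses. The sketch exercises it with Go's standard `regexp` package. Confirm how your Vale version treats the pattern with vale transform before relying on it.

```go
package main

import (
	"fmt"
	"regexp"
)

// shortcode matches {{ ... }} (which also covers {{< >}} and {{% %}}) and
// {% ... %}. Non-greedy .*? keeps two shortcodes on one line from being
// merged into a single match.
var shortcode = regexp.MustCompile(`\{\{.*?\}\}|\{%.*?%\}`)

func main() {
	line := `See {{< ref "setup.md" >}} and {% include note.html %} for {{ page.title }}.`
	fmt.Printf("%q\n", shortcode.FindAllString(line, -1))
}
```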
Ignore Attribute Values
Language-Specific Comments
For Python docstrings as Markdown:
Mixed Content
For files that mix prose and data, ignore the YAML while linting the embedded Markdown.
Transformation Architecture
Each format processor:
- Converts to a common representation (usually HTML)
- Walks the structure extracting text
- Maintains position mappings
- Yields blocks to the linter
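The processor responsibilities listed above can be expressed as a small interface. In the Go sketch below, `Block`, `Processor`, and `plainText` are hypothetical names, not Vale's types. The trivial plain-text processor yields one block per blank-line-separated paragraph and records the line where each block starts.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a unit of prose handed to the linter, with the 1-based line it
// started on in the original file.
type Block struct {
	Text string
	Line int
}

// Processor turns raw file content into blocks while keeping positions.
type Processor interface {
	Blocks(src string) []Block
}

// plainText treats each blank-line-separated paragraph as one block.
type plainText struct{}

func (plainText) Blocks(src string) []Block {
	var blocks []Block
	var para []string
	start := 0
	for i, line := range strings.Split(src, "\n") {
		if strings.TrimSpace(line) == "" {
			if len(para) > 0 { // a blank line ends the current paragraph
				blocks = append(blocks, Block{Text: strings.Join(para, "\n"), Line: start})
				para = nil
			}
			continue
		}
		if len(para) == 0 {
			start = i + 1 // remember where this paragraph begins
		}
		para = append(para, line)
	}
	if len(para) > 0 {
		blocks = append(blocks, Block{Text: strings.Join(para, "\n"), Line: start})
	}
	return blocks
}

func main() {
	var p Processor = plainText{}
	for _, b := range p.Blocks("First para.\nStill first.\n\nSecond para.") {
		fmt.Printf("line %d: %q\n", b.Line, b.Text)
	}
}
```

Richer processors would convert to HTML first and walk its text nodes. The contract stays the same: each yielded block carries enough position data to report alerts in the original file.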
Related Topics
- Scoping - Target specific document parts
- Configuration - Set up ignore patterns
- Format Detection - Format associations reference