
Overview

Vale transforms markup files into analyzable text while preserving accurate line and column positions for alerts. The transformation system:
  • Strips markup syntax but tracks original locations
  • Applies ignore patterns (blocks and tokens)
  • Handles format-specific quirks (Markdown, AsciiDoc, HTML, etc.)
  • Routes code files through tree-sitter parsers
This allows Vale to lint prose content while ignoring markup syntax and code.

Format Detection

Vale automatically detects file formats using extension patterns (see internal/core/format.go:60-95):
var FormatByExtension = map[string][]string{
    `\.(?:md|mdown|markdown|markdn)$`: {".md", "markup"},
    `\.(?:adoc|asciidoc|asc)$`:       {".adoc", "markup"},
    `\.(?:rst|rest)$`:                {".rst", "markup"},
    `\.(?:html|htm|shtml|xhtml)$`:    {".html", "markup"},
    `\.(?:dita)$`:                    {".dita", "markup"},
    `\.(?:org)$`:                     {".org", "markup"},
    `\.(?:xml|xsd)$`:                 {".xml", "markup"},
    `\.(?:go)$`:                      {".go", "code"},
    `\.(?:py[3w]?)$`:                 {".py", "code"},
    `\.(?:rs)$`:                      {".rs", "code"},
    `\.(?:js|jsx)$`:                  {".js", "code"},
    // ... many more
}
Each format is categorized as:
Markup formats like Markdown, AsciiDoc, reStructuredText, HTML. These are converted to HTML, then parsed to extract prose content while maintaining position tracking. Examples: .md, .adoc, .rst, .html
You can override format detection with format associations, declared in the [formats] section of .vale.ini.
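As a rough illustration of the lookup described above, the following sketch matches a path against a trimmed copy of the pattern table. The detectFormat helper and its ok=false fallback are assumptions made for this example; they are not taken from Vale's source.

```go
package main

import (
	"fmt"
	"regexp"
)

// formatByExtension is a trimmed copy of the table shown above.
var formatByExtension = map[string][]string{
	`\.(?:md|mdown|markdown|markdn)$`: {".md", "markup"},
	`\.(?:html|htm|shtml|xhtml)$`:    {".html", "markup"},
	`\.(?:py[3w]?)$`:                 {".py", "code"},
	`\.(?:js|jsx)$`:                  {".js", "code"},
}

// detectFormat is a hypothetical helper: it returns the normalized extension
// and category of the pattern that matches path, or ok=false if none match.
// The patterns are mutually exclusive, so map iteration order does not matter.
func detectFormat(path string) (ext, kind string, ok bool) {
	for pattern, info := range formatByExtension {
		if regexp.MustCompile(pattern).MatchString(path) {
			return info[0], info[1], true
		}
	}
	return "", "", false
}

func main() {
	for _, p := range []string{"README.markdown", "tool.pyw", "notes.txt"} {
		ext, kind, ok := detectFormat(p)
		fmt.Println(p, ext, kind, ok)
	}
}
```

Several real extensions collapse onto one normalized extension (for example .markdown and .mdown both become .md), which is why later stages can switch on a single value.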

The Transform Method

Vale exposes the transformation process via linter.Transform() (see internal/lint/lint.go:48-63):
func (l *Linter) Transform(f *core.File) (string, error) {
    exts := extensionConfig{
        Normed: f.NormedExt,
        Real:   f.RealExt,
    }
    
    return applyPatterns(l.Manager.Config, exts, f.Content)
}
This applies:
  1. Block ignore patterns
  2. Token ignore patterns
  3. Built-in replacements

Using the transform Command

View the transformed output:
vale transform document.md
The transform command shows exactly what text Vale analyzes, useful for debugging why certain content is or isn’t being linted.

Markup Transformations

Vale transforms markup files through a multi-stage pipeline:

Markdown Processing

For Markdown files (see internal/lint/md.go:36-55):
Extract and lint YAML/TOML frontmatter:
err := l.lintMetadata(f)
if err != nil {
    return err
}
Frontmatter is linted separately from document body.
Transform the content using configured patterns:
s, err := l.Transform(f)
if err != nil {
    return err
}
This replaces ignored content with placeholders.
Use goldmark to parse Markdown:
var buf bytes.Buffer
if err = goldMd.Convert([]byte(s), &buf); err != nil {
    return core.NewE100(f.Path, err)
}
Goldmark supports GitHub Flavored Markdown and footnotes.
Clean up special constructs (see internal/lint/md.go:57-84):
// Replace info strings with asterisks
body := reExInfo.ReplaceAllStringFunc(content, func(m string) string {
    parts := strings.Split(m, "`")
    tags := strings.Repeat("`", len(parts)-1)
    span := strings.Repeat("*", nlp.StrLen(parts[len(parts)-1]))
    return tags + span
})

// Replace link references
body = reLinkRef.ReplaceAllStringFunc(body, func(m string) string {
    return "][" + strings.Repeat("*", nlp.StrLen(m)-3) + "]"
})
This prevents false matches in markup syntax.
Extract prose from HTML while tracking positions:
return l.lintHTMLTokens(f, buf.Bytes(), 0)
Vale walks the HTML token stream, analyzing text nodes.
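The link-reference cleanup step above can be tried in isolation. This is a minimal sketch under two assumptions: reLinkRef is a guessed pattern for the "][label]" tail of a reference link (the expression Vale uses may differ), and nlp.StrLen is taken to count runes, so utf8.RuneCountInString stands in for it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// reLinkRef is an assumed pattern for the "][label]" part of a reference link.
var reLinkRef = regexp.MustCompile(`\]\[[^\]]+\]`)

// maskLinkRefs overwrites each reference label with asterisks while keeping
// the three bracket characters, so the text length does not change.
func maskLinkRefs(body string) string {
	return reLinkRef.ReplaceAllStringFunc(body, func(m string) string {
		// m looks like "][label]"; subtract the three brackets kept verbatim.
		return "][" + strings.Repeat("*", utf8.RuneCountInString(m)-3) + "]"
	})
}

func main() {
	in := "See the [style guide][guide-ref] for details."
	out := maskLinkRefs(in)
	fmt.Println(out)
	fmt.Println(utf8.RuneCountInString(in) == utf8.RuneCountInString(out))
}
```

Because the label is masked rather than removed, every character after the link keeps its original column.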

Other Markup Formats

Vale has specialized processors for:
Format            File                    Key Features
AsciiDoc          internal/lint/adoc.go   Uses Asciidoctor conversion
reStructuredText  internal/lint/rst.go    Docutils-based processing
HTML              internal/lint/html.go   Direct token parsing
DITA              internal/lint/dita.go   XML with semantic understanding
Org-mode          internal/lint/org.go    Emacs Org markup
MDX               internal/lint/mdx.go    Markdown with JSX
Each handles format-specific details while producing HTML for analysis.

Code Transformations

For code files, Vale extracts comments using tree-sitter:

Tree-Sitter Integration

Vale uses tree-sitter grammars to parse code (see internal/lint/code/):
// Example: Go comment extraction (internal/lint/code/go.go)
func parseGo(src []byte, file *core.File) error {
    parser := sitter.NewParser()
    parser.SetLanguage(tree_sitter_go.GetLanguage())

    // Surface parse failures instead of discarding them
    tree, err := parser.ParseCtx(context.Background(), nil, src)
    if err != nil {
        return err
    }
    defer tree.Close()

    // Query for comments
    query := `(comment) @comment`
    // ... extract and process comments using query
    return nil
}
Supported languages:
  • Go (.go): Line and block comments
  • Rust (.rs): //, ///, /* */ comments
  • C/C++ (.c, .cpp, .h): Standard C-style
  • Java (.java): Including Javadoc
For languages without tree-sitter support, Vale falls back to regex-based comment extraction (see internal/core/format.go:20-56).
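To make the regex fallback concrete, here is a minimal sketch of line-comment extraction that keeps source line numbers. The comment type, the lineComment pattern, and extractComments are all hypothetical names for this example; Vale's own delimiters and data structures are not shown in this document.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// comment pairs extracted text with the 1-based line it came from.
type comment struct {
	Line int
	Text string
}

// lineComment captures everything after a "//" delimiter (assumed C-style).
var lineComment = regexp.MustCompile(`//\s?(.*)$`)

// extractComments scans src line by line and collects "//" comments.
func extractComments(src string) []comment {
	var out []comment
	for i, line := range strings.Split(src, "\n") {
		if m := lineComment.FindStringSubmatch(line); m != nil {
			out = append(out, comment{Line: i + 1, Text: m[1]})
		}
	}
	return out
}

func main() {
	src := "package main\n// Greet prints a greeting.\nfunc Greet() {} // trailing note\n"
	for _, c := range extractComments(src) {
		fmt.Printf("%d: %s\n", c.Line, c.Text)
	}
}
```

A pattern like this cannot tell a real comment from "//" inside a string literal (a URL, for instance), which is one reason a parser-based approach is preferable where a grammar is available.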

Ignore Patterns

Transformations respect two types of ignore patterns:

Block Ignores

Ignore multi-line regions:
[*.md]
BlockIgnores = (?s) *```.*?```, (?s) *:::.*?:::
Example:
This is linted.

```python
This code block is ignored.
```

This is also linted.

Token Ignores

Ignore inline patterns:
[*.md]
TokenIgnores = \$[^\$]+\$, `[^`]+`
Example:
Check this text, but not `inline code` or $math$.

How Ignores Work

During transformation, Vale replaces ignored content with special markers that preserve length:
func applyPatterns(cfg *Config, exts extensionConfig, content string) (string, error) {
    // Apply block ignores first
    for _, pattern := range cfg.BlockIgnores[exts.Normed] {
        re := regexp.MustCompile(pattern)
        content = re.ReplaceAllStringFunc(content, func(match string) string {
            // Replace with markers of same length
            return strings.Repeat("@", len(match))
        })
    }
    
    // Then token ignores
    for _, pattern := range cfg.TokenIgnores[exts.Normed] {
        // Similar replacement
    }
    
    return content, nil
}
This maintains accurate position tracking for alerts.
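The effect on positions is easy to check with a small, self-contained sketch. maskMatches is a hypothetical helper that applies the same replace-with-equal-length idea to a single pattern; like the excerpt above, it counts bytes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maskMatches overwrites every match of pattern with '@' characters of the
// same byte length, leaving all other text at its original offset.
func maskMatches(pattern, content string) string {
	re := regexp.MustCompile(pattern)
	return re.ReplaceAllStringFunc(content, func(m string) string {
		return strings.Repeat("@", len(m))
	})
}

func main() {
	src := "Run `make all` to build teh docs."
	out := maskMatches("`[^`]+`", src)
	fmt.Println(out)
	// The misspelling sits at the same offset before and after masking.
	fmt.Println(strings.Index(src, "teh") == strings.Index(out, "teh"))
}
```

Had the inline code been deleted instead of masked, every later offset would shift left and alerts would point at the wrong column.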

XSLT Transformations

For XML-based formats, you can apply custom XSLT transformations before linting:
[*.xml]
Transform = transforms/strip-metadata.xsl
Vale picks the stylesheet whose section glob matches the file path, applies it, and then lints the result:
transform := ""
for sec, p := range config.Stylesheets {
    pat, err := glob.Compile(sec)
    if err != nil {
        return err
    } else if pat.Match(path) {
        transform = p
        break
    }
}
Useful for:
  • Removing metadata sections
  • Flattening nested structures
  • Extracting specific elements
XSLT transformations require external dependencies. Make sure xsltproc or similar is installed.
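The stylesheet lookup above can be approximated with the standard library alone. In this sketch, stylesheetFor is a hypothetical helper, and path.Match on the file's base name is swapped in for the glob library used in the excerpt, so its matching rules are simpler than Vale's.

```go
package main

import (
	"fmt"
	"path"
)

// stylesheetFor returns the stylesheet configured for the first section
// pattern that matches file, or an empty string when nothing matches.
func stylesheetFor(file string, stylesheets map[string]string) (string, error) {
	for section, xsl := range stylesheets {
		matched, err := path.Match(section, path.Base(file))
		if err != nil {
			return "", err // malformed section pattern
		}
		if matched {
			return xsl, nil
		}
	}
	return "", nil // no stylesheet configured for this file
}

func main() {
	cfg := map[string]string{"*.xml": "transforms/strip-metadata.xsl"}
	for _, f := range []string{"docs/guide.xml", "docs/guide.md"} {
		xsl, err := stylesheetFor(f, cfg)
		fmt.Printf("%s -> %q (err: %v)\n", f, xsl, err)
	}
}
```

An empty result means the file is linted without any XSLT step.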

Format Overrides

Override Vale’s format detection:
[formats]
txt = md
This tells Vale to treat .txt files as Markdown:
func FormatFromExt(path string, mapping map[string]string) (string, string) {
    base := strings.Trim(filepath.Ext(path), ".")
    kind := getFormat("." + base)
    
    if format, found := mapping[base]; found {
        if kind == "code" && getFormat("."+format) == "markup" {
            return "." + format, "fragment"
        }
        base = format
    }
    // ...
}
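To see how an association changes the outcome, the excerpt can be completed into a runnable sketch. Everything outside the excerpt is an assumption: the kinds table is a tiny stand-in for Vale's extension patterns, kindOf and its "text" fallback are hypothetical, and the final return statement is a guess at the elided part.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// kinds is a small stand-in for the extension table (assumption).
var kinds = map[string]string{".md": "markup", ".rst": "markup", ".py": "code", ".go": "code"}

// kindOf returns the category for ext; the "text" fallback is an assumption.
func kindOf(ext string) string {
	if k, ok := kinds[ext]; ok {
		return k
	}
	return "text"
}

// resolveFormat mirrors the excerpt above: a code file associated with a
// markup format is reported as a "fragment" of that markup format.
func resolveFormat(p string, mapping map[string]string) (string, string) {
	base := strings.Trim(filepath.Ext(p), ".")
	kind := kindOf("." + base)

	if format, found := mapping[base]; found {
		if kind == "code" && kindOf("."+format) == "markup" {
			return "." + format, "fragment"
		}
		base = format
	}
	// Assumed completion of the elided tail: report the resolved extension.
	return "." + base, kindOf("." + base)
}

func main() {
	mapping := map[string]string{"txt": "md", "py": "md"}
	for _, p := range []string{"notes.txt", "tool.py", "main.go"} {
		ext, kind := resolveFormat(p, mapping)
		fmt.Println(p, ext, kind)
	}
}
```

The "fragment" result is what lets comments in a code file be treated as pieces of markup rather than as a complete document.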

Use Cases

[formats]
mdx = md
Treat MDX as Markdown.

Linting Pipeline

The complete linting flow (see internal/lint/lint.go:168-233):
func (l *Linter) lintFile(src string) lintResult {
    file, err := core.NewFile(src, l.Manager.Config)
    
    // Determine format-specific linter
    if file.Format == "markup" && !simple {
        switch file.NormedExt {
        case ".md":
            err = l.lintMarkdown(file)
        case ".adoc":
            err = l.lintADoc(file)
        case ".rst":
            err = l.lintRST(file)
        // ... other formats
        }
    } else if file.Format == "code" && !simple {
        err = l.lintCode(file)
    } else if file.Format == "data" && hasViews {
        err = l.lintData(file)
    } else {
        err = l.lintLines(file)
    }
    
    // Always check raw scope
    raw := nlp.NewBlock("", strings.Join(file.Lines, ""), "raw"+file.RealExt)
    err = l.lintBlock(file, raw, len(file.Lines), 0, true)
    
    return lintResult{file, err}
}
The raw scope check runs on original content, before any transformations. Use it for rules that need to see markup syntax.

Debugging Transformations

vale transform document.md
This prints the exact text Vale analyzes.

Best Practices

Test Patterns

Use vale transform to verify ignore patterns work as expected before committing.

Preserve Positions

When writing custom transformations, maintain character counts so Vale can report accurate locations.
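The reason character counts matter is that a location is derived from an offset into the text. The sketch below uses a hypothetical position helper (byte-based, 1-based line and column) to compare masking a span with deleting it.

```go
package main

import (
	"fmt"
	"strings"
)

// position converts a byte offset into a 1-based line and column.
func position(content string, offset int) (line, col int) {
	before := content[:offset]
	line = strings.Count(before, "\n") + 1
	// LastIndex returns -1 on the first line, which yields a 1-based column.
	col = offset - strings.LastIndex(before, "\n")
	return line, col
}

func main() {
	original := "Intro line.\nUse {{ name }} to greet teh user.\n"
	span := "{{ name }}"

	// Masking keeps the length; deleting shortens the line.
	masked := strings.Replace(original, span, strings.Repeat("@", len(span)), 1)
	deleted := strings.Replace(original, span, "", 1)

	for _, text := range []string{original, masked, deleted} {
		line, col := position(text, strings.Index(text, "teh"))
		fmt.Println(line, col)
	}
}
```

The original and masked texts report the same location for the misspelling, while the deleted variant reports a column ten characters too far left.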

Format-Specific Rules

Use scopes to target specific document parts rather than relying on broad ignore patterns:
scope: heading

XSLT for Complex XML

For complex XML formats, XSLT is more reliable than regex-based ignores.

Common Patterns

For Markdown that contains template syntax:
[*.md]
TokenIgnores = {{.*?}}, {%.*?%}
This prevents false positives from template syntax.
For HTML class and id attributes:
[*.html]
TokenIgnores = (class|id)="[^"]+"
This skips linting of CSS classes and IDs.
For Python docstrings as Markdown:
[formats]
py = md

[*.py]
TokenIgnores = `[^`]+`
For files with both prose and data:
[*.yml]
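BlockIgnores = (?sm)^---$.*?^---$
This ignores the YAML between the --- delimiters while linting embedded Markdown. The m flag is needed so that ^ and $ match at line boundaries rather than only at the start and end of the file.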

Transformation Architecture

Key components:
internal/lint/
├── lint.go          # Core linting logic
├── md.go            # Markdown transformation
├── html.go          # HTML token parsing
├── ast.go           # AST walking for markup
├── code.go          # Code file routing
└── code/
    ├── go.go        # Go tree-sitter parser
    ├── py.go        # Python parser
    ├── rs.go        # Rust parser
    └── ...          # Other languages
Each format handler:
  1. Converts to a common representation (usually HTML)
  2. Walks the structure extracting text
  3. Maintains position mappings
  4. Yields blocks to the linter
