Transformations

Overview

Vale transforms markup files into analyzable text while preserving accurate line and column positions for alerts. The transformation system:

Strips markup syntax but tracks original locations
Applies ignore patterns (blocks and tokens)
Handles format-specific quirks (Markdown, AsciiDoc, HTML, etc.)
Routes code files through tree-sitter parsers

This allows Vale to lint prose content while ignoring markup syntax and code.

Format Detection

Vale automatically detects file formats using extension patterns (see internal/core/format.go:60-95):

var FormatByExtension = map[string][]string{
    `\.(?:md|mdown|markdown|markdn)$`: {".md", "markup"},
    `\.(?:adoc|asciidoc|asc)$`:       {".adoc", "markup"},
    `\.(?:rst|rest)$`:                {".rst", "markup"},
    `\.(?:html|htm|shtml|xhtml)$`:    {".html", "markup"},
    `\.(?:dita)$`:                    {".dita", "markup"},
    `\.(?:org)$`:                     {".org", "markup"},
    `\.(?:xml|xsd)$`:                 {".xml", "markup"},
    `\.(?:go)$`:                      {".go", "code"},
    `\.(?:py[3w]?)$`:                 {".py", "code"},
    `\.(?:rs)$`:                      {".rs", "code"},
    `\.(?:js|jsx)$`:                  {".js", "code"},
    // ... many more
}

Each format is categorized as one of:

markup: Markup formats like Markdown, AsciiDoc, reStructuredText, and HTML. These are converted to HTML, then parsed to extract prose content while maintaining position tracking. Examples: .md, .adoc, .rst, .html

code: Code formats where prose appears in comments. Vale uses tree-sitter grammars to extract comments and docstrings, ignoring actual code. Examples: .go, .py, .js, .rs, .java

data: Data formats like YAML, JSON, and TOML. These can be linted with custom views that extract specific fields. Examples: .yml, .json, .toml

text: Plain text files. These are linted directly with minimal processing. Examples: .txt

You can override format detection with format associations (the [formats] section) in .vale.ini.
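The extension lookup described above can be sketched in a few lines of Go. This is a minimal illustration, not Vale's implementation: `detectFormat` is a hypothetical helper, the table holds only a few of the entries shown earlier, and the plain-text fallback for unmatched files is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// formatByExtension copies a few entries from the table above:
// pattern -> {normalized extension, category}.
var formatByExtension = map[string][]string{
	`\.(?:md|mdown|markdown|markdn)$`: {".md", "markup"},
	`\.(?:rst|rest)$`:                 {".rst", "markup"},
	`\.(?:go)$`:                       {".go", "code"},
	`\.(?:py[3w]?)$`:                  {".py", "code"},
}

// detectFormat is a hypothetical helper (not part of Vale's API). It returns
// the normalized extension and category of the first pattern matching path.
func detectFormat(path string) (normed, category string) {
	for pattern, info := range formatByExtension {
		if regexp.MustCompile(pattern).MatchString(path) {
			return info[0], info[1]
		}
	}
	// Assumption of this sketch: unmatched files are treated as plain text.
	return ".txt", "text"
}

func main() {
	for _, p := range []string{"README.markdown", "tool.pyw", "notes.log"} {
		normed, category := detectFormat(p)
		fmt.Printf("%s -> %s (%s)\n", p, normed, category)
	}
}
```

Because the patterns are anchored with `$` and do not overlap, the random iteration order of a Go map does not change the result.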

The Transform Method

Vale exposes the transformation process via linter.Transform() (see internal/lint/lint.go:48-63):

func (l *Linter) Transform(f *core.File) (string, error) {
    exts := extensionConfig{
        Normed: f.NormedExt,
        Real:   f.RealExt,
    }
    
    return applyPatterns(l.Manager.Config, exts, f.Content)
}

This applies:

Block ignore patterns
Token ignore patterns
Built-in replacements

Using the transform Command

View the transformed output:

vale transform document.md

The transform command shows exactly what text Vale analyzes, useful for debugging why certain content is or isn’t being linted.

Markup Transformations

Vale transforms markup files through a multi-stage pipeline:

Markdown Processing

For Markdown files (see internal/lint/md.go:36-55):

Step 1: Lint Frontmatter

Extract and lint YAML/TOML frontmatter:

err := l.lintMetadata(f)
if err != nil {
    return err
}

Frontmatter is linted separately from document body.
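To illustrate why separating front matter does not have to disturb alert positions, here is a minimal sketch that splits a YAML front matter block from the body and records the line where the body starts. `splitFrontMatter` is a hypothetical helper, not Vale's `lintMetadata`; it assumes a non-empty block delimited by `---` lines and ignores TOML (`+++`) delimiters.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFrontMatter is a hypothetical helper. It returns the front matter,
// the body, and the 1-based line number at which the body begins.
// Assumption: front matter is non-empty and delimited by "---" lines.
func splitFrontMatter(content string) (meta, body string, bodyLine int) {
	const delim = "---\n"
	if !strings.HasPrefix(content, delim) {
		return "", content, 1
	}
	rest := content[len(delim):]
	end := strings.Index(rest, "\n"+delim)
	if end < 0 {
		// No closing delimiter: treat the whole file as body.
		return "", content, 1
	}
	meta = rest[:end+1]
	body = rest[end+1+len(delim):]
	// Opening delimiter + metadata lines + closing delimiter precede the body.
	bodyLine = strings.Count(meta, "\n") + 3
	return meta, body, bodyLine
}

func main() {
	meta, body, line := splitFrontMatter("---\ntitle: Hi\n---\nBody text.\n")
	fmt.Printf("meta=%q body=%q bodyLine=%d\n", meta, body, line)
}
```

Keeping `bodyLine` around lets a caller add it to any line number found in the body, so alerts still point at the original file.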

Step 2: Apply Ignore Patterns

Transform the content using configured patterns:

s, err := l.Transform(f)
if err != nil {
    return err
}

This replaces ignored content with placeholders.

Step 3: Convert to HTML

Use goldmark to parse Markdown:

var buf bytes.Buffer
if err = goldMd.Convert([]byte(s), &buf); err != nil {
    return core.NewE100(f.Path, err)
}

Goldmark handles GitHub Flavored Markdown and footnotes through its built-in extensions.

Step 4: Prepare Content

Clean up special constructs (see internal/lint/md.go:57-84):

// Replace info strings with asterisks
body := reExInfo.ReplaceAllStringFunc(content, func(m string) string {
    parts := strings.Split(m, "`")
    tags := strings.Repeat("`", len(parts)-1)
    span := strings.Repeat("*", nlp.StrLen(parts[len(parts)-1]))
    return tags + span
})

// Replace link references
body = reLinkRef.ReplaceAllStringFunc(body, func(m string) string {
    return "][" + strings.Repeat("*", nlp.StrLen(m)-3) + "]"
})

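This keeps rules from matching inside markup syntax, such as the language name in a code fence's info string or the label of a link reference. Each masked span keeps its original length, so reported positions stay accurate.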
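The info-string masking above can be tried in isolation. The sketch below is self-contained and makes two assumptions: `reInfo` is a stand-in pattern written for this example (the real `reExInfo` is not shown in this document), and `utf8.RuneCountInString` stands in for `nlp.StrLen`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// reInfo is an assumed pattern: a line that opens a fence with three or
// more backticks followed by an info string.
var reInfo = regexp.MustCompile("(?m)^`{3,}.+$")

// maskInfoStrings replaces each info string with asterisks of the same
// length while keeping the opening backticks intact.
func maskInfoStrings(content string) string {
	return reInfo.ReplaceAllStringFunc(content, func(m string) string {
		parts := strings.Split(m, "`")
		tags := strings.Repeat("`", len(parts)-1)
		span := strings.Repeat("*", utf8.RuneCountInString(parts[len(parts)-1]))
		return tags + span
	})
}

func main() {
	fmt.Print(maskInfoStrings("```json\n{\"a\": 1}\n```\n"))
}
```

With this in place, a rule that looks for the word "json" would no longer fire on the fence line, while every later character keeps its original line and column.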

Step 5: Parse HTML Tokens

Extract prose from HTML while tracking positions:

return l.lintHTMLTokens(f, buf.Bytes(), 0)

Vale walks the HTML token stream, analyzing text nodes.

Other Markup Formats

Vale has specialized processors for:

Format	File	Key Features
AsciiDoc	`internal/lint/adoc.go`	Uses Asciidoctor conversion
reStructuredText	`internal/lint/rst.go`	Docutils-based processing
HTML	`internal/lint/html.go`	Direct token parsing
DITA	`internal/lint/dita.go`	XML with semantic understanding
Org-mode	`internal/lint/org.go`	Emacs Org markup
MDX	`internal/lint/mdx.go`	Markdown with JSX

Each handles format-specific details while producing HTML for analysis.

Code Transformations

For code files, Vale extracts comments using tree-sitter:

Tree-Sitter Integration

Vale uses tree-sitter grammars to parse code (see internal/lint/code/):

// Example: Go comment extraction (internal/lint/code/go.go)
func parseGo(src []byte, file *core.File) error {
    parser := sitter.NewParser()
    parser.SetLanguage(tree_sitter_go.GetLanguage())
    
    // Surface parse failures instead of discarding them
    tree, err := parser.ParseCtx(context.Background(), nil, src)
    if err != nil {
        return err
    }
    defer tree.Close()
    
    // Query for comments
    query := `(comment) @comment`
    // ... extract and process comments
    return nil
}

Supported languages:

Compiled Languages

Go (.go): Line and block comments
Rust (.rs): //, ///, /* */ comments
C/C++ (.c, .cpp, .h): Standard C-style
Java (.java): Including Javadoc

Scripting Languages

Python (.py): # comments and docstrings
JavaScript (.js, .jsx): // and /* */
TypeScript (.ts, .tsx): Same as JavaScript
Ruby (.rb): # comments and RDoc

Other Languages

CSS (.css): /* */ comments
YAML (.yml): # comments (via custom parser)
Proto (.proto): Protocol buffer comments
Julia (.jl): # and multi-line comments

For languages without tree-sitter support, Vale falls back to regex-based comment extraction (see internal/core/format.go:20-56).
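As a rough idea of what a regex-based fallback can look like, the sketch below pulls `//` line comments out of source text together with their line numbers. It is an illustration under stated assumptions, not Vale's code: `extractLineComments` and the `comment` type are hypothetical, only `//` comments are handled, and a `//` inside a string literal would be misread as a comment.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// comment records the prose of one line comment and where it was found.
type comment struct {
	Line int    // 1-based line number in the source
	Text string // comment text without the leading "//"
}

// reLine matches a "//" comment running to the end of the line.
var reLine = regexp.MustCompile(`//\s?(.*)$`)

// extractLineComments is a hypothetical helper that scans src line by line.
func extractLineComments(src string) []comment {
	var out []comment
	for i, line := range strings.Split(src, "\n") {
		if m := reLine.FindStringSubmatch(line); m != nil {
			out = append(out, comment{Line: i + 1, Text: m[1]})
		}
	}
	return out
}

func main() {
	src := "// Package demo does things.\npackage demo\n\nvar x = 1 // set x\n"
	for _, c := range extractLineComments(src) {
		fmt.Printf("line %d: %s\n", c.Line, c.Text)
	}
}
```

A parser-based approach avoids the string-literal problem, which is one reason to prefer tree-sitter where a grammar exists.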

Ignore Patterns

Transformations respect two types of ignore patterns:

Block Ignores

Ignore multi-line regions:

[*.md]
BlockIgnores = (?s) *```.*?```, (?s) *:::.*?:::

Example:

This is linted.

```python
This code block is ignored.
```

This is also linted.

Token Ignores

Ignore inline patterns:

[*.md]
TokenIgnores = \$[^\$]+\$, `[^`]+`

Example:

Check this text, but not `inline code` or $math$.

How Ignores Work

During transformation, Vale replaces ignored content with special markers that preserve length:

func applyPatterns(cfg *Config, exts extensionConfig, content string) (string, error) {
    // Apply block ignores first
    for _, pattern := range cfg.BlockIgnores[exts.Normed] {
        re := regexp.MustCompile(pattern)
        content = re.ReplaceAllStringFunc(content, func(match string) string {
            // Replace with markers of same length
            return strings.Repeat("@", len(match))
        })
    }
    
    // Then token ignores
    for _, pattern := range cfg.TokenIgnores[exts.Normed] {
        // Similar replacement
    }
    
    return content, nil
}

This maintains accurate position tracking for alerts.
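A runnable variant of the masking idea is sketched below. It is not Vale's `applyPatterns`: `maskMatches` is a hypothetical helper that takes plain pattern lists instead of a config, returns an error for an invalid pattern rather than panicking, and (as a design choice of this sketch) keeps newlines inside a match so that line numbers after a multi-line block stay aligned.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// maskMatches replaces every match of each pattern with '@' markers,
// one marker per character, leaving newlines in place.
func maskMatches(content string, patterns []string) (string, error) {
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return "", fmt.Errorf("invalid ignore pattern %q: %w", p, err)
		}
		content = re.ReplaceAllStringFunc(content, func(m string) string {
			return strings.Map(func(r rune) rune {
				if r == '\n' {
					return r // keep line structure intact
				}
				return '@'
			}, m)
		})
	}
	return content, nil
}

func main() {
	out, err := maskMatches("Use `foo` and $x$.", []string{"`[^`]+`", `\$[^\$]+\$`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Calling it once with the block patterns and then again with the token patterns reproduces the ordering described above.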

XSLT Transformations

For XML-based formats, you can apply custom XSLT transformations before linting:

[*.xml]
Transform = transforms/strip-metadata.xsl

Vale applies the transformation, then lints the result:

transform := ""
for sec, p := range config.Stylesheets {
    pat, err := glob.Compile(sec)
    if err != nil {
        return err
    } else if pat.Match(path) {
        transform = p
        break
    }
}

Useful for:

Removing metadata sections
Flattening nested structures
Extracting specific elements

XSLT transformations require external dependencies. Make sure xsltproc or similar is installed.
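The stylesheet lookup shown above can be approximated with the standard library alone. In this sketch, `stylesheetFor` is a hypothetical helper, `path.Match` stands in for the glob library used in the excerpt, and matching against the file's base name is an assumption made so that a section such as `*.xml` also applies to files in subdirectories.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// stylesheetFor returns the stylesheet configured for the first section
// glob that matches file, or "" when no section applies.
func stylesheetFor(file string, stylesheets map[string]string) (string, error) {
	sections := make([]string, 0, len(stylesheets))
	for sec := range stylesheets {
		sections = append(sections, sec)
	}
	// Go map iteration order is random; sort for a deterministic result.
	sort.Strings(sections)

	for _, sec := range sections {
		ok, err := path.Match(sec, path.Base(file))
		if err != nil {
			return "", err
		}
		if ok {
			return stylesheets[sec], nil
		}
	}
	return "", nil
}

func main() {
	cfg := map[string]string{"*.xml": "transforms/strip-metadata.xsl"}
	for _, f := range []string{"docs/guide.xml", "docs/guide.md"} {
		xsl, err := stylesheetFor(f, cfg)
		fmt.Printf("%s -> %q (err=%v)\n", f, xsl, err)
	}
}
```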

Format Overrides

Override Vale’s format detection with a [formats] section in .vale.ini:

[formats]
txt = md

This tells Vale to treat .txt files as Markdown:

func FormatFromExt(path string, mapping map[string]string) (string, string) {
    base := strings.Trim(filepath.Ext(path), ".")
    kind := getFormat("." + base)
    
    if format, found := mapping[base]; found {
        if kind == "code" && getFormat("."+format) == "markup" {
            return "." + format, "fragment"
        }
        base = format
    }
    // ...
}
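To see the override logic run end to end, the sketch below fills in the elided parts with stand-ins. Everything outside the excerpt is an assumption: the small `known` table replaces Vale's real format table, `getFormat` is reimplemented on top of it, and the final lookup with its "unknown" fallback is a guess at what follows the `// ...`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// known is a tiny stand-in for the real extension table.
var known = map[string]string{
	".md": "markup", ".rst": "markup",
	".py": "code", ".go": "code",
	".txt": "text",
}

// getFormat returns the category for ext, or "" if it is not known.
func getFormat(ext string) string { return known[ext] }

// formatFromExt follows the excerpt above; the tail is an assumption.
func formatFromExt(path string, mapping map[string]string) (string, string) {
	base := strings.Trim(filepath.Ext(path), ".")
	kind := getFormat("." + base)

	if format, found := mapping[base]; found {
		// Code files associated with a markup format become "fragment".
		if kind == "code" && getFormat("."+format) == "markup" {
			return "." + format, "fragment"
		}
		base = format
	}
	if k := getFormat("." + base); k != "" {
		return "." + base, k
	}
	return "unknown", "unknown" // assumed fallback
}

func main() {
	assoc := map[string]string{"txt": "md", "py": "md"}
	for _, p := range []string{"notes.txt", "app.py", "main.go"} {
		ext, kind := formatFromExt(p, assoc)
		fmt.Printf("%s -> %s (%s)\n", p, ext, kind)
	}
}
```

The mapping keys are bare extensions without a leading dot, which is why `base` is trimmed before the lookup.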

Use Cases

Non-Standard Extensions

[formats]
mdx = md

Treat MDX as Markdown.

Embedded Markup

[formats]
py = md

Treat Python docstrings as Markdown:

def example():
    """This **Markdown** is linted."""

Plain Text

[formats]
log = txt

Lint log files as plain text.

Linting Pipeline

The complete linting flow (see internal/lint/lint.go:168-233):

func (l *Linter) lintFile(src string) lintResult {
    file, err := core.NewFile(src, l.Manager.Config)
    if err != nil {
        return lintResult{file, err}
    }
    
    // Determine format-specific linter
    if file.Format == "markup" && !simple {
        switch file.NormedExt {
        case ".md":
            err = l.lintMarkdown(file)
        case ".adoc":
            err = l.lintADoc(file)
        case ".rst":
            err = l.lintRST(file)
        // ... other formats
        }
    } else if file.Format == "code" && !simple {
        err = l.lintCode(file)
    } else if file.Format == "data" && hasViews {
        err = l.lintData(file)
    } else {
        err = l.lintLines(file)
    }
    
    // Stop here if the format-specific linter failed
    if err != nil {
        return lintResult{file, err}
    }
    
    // Always check raw scope
    raw := nlp.NewBlock("", strings.Join(file.Lines, ""), "raw"+file.RealExt)
    err = l.lintBlock(file, raw, len(file.Lines), 0, true)
    
    return lintResult{file, err}
}

The raw scope check runs against the original content, with no transformations applied. Use it for rules that need to see markup syntax.

Debugging Transformations

View Transformed Content

vale transform document.md

Shows the exact text Vale analyzes.

Check Format Detection

vale ls-config | grep Format

Verify how Vale categorizes your files.

Test Ignore Patterns

Add patterns incrementally and use transform to verify:

[*.md]
BlockIgnores = (?s) *```.*?```

vale transform test.md | grep -v '@@@'

Best Practices

Test Patterns

Use vale transform to verify ignore patterns work as expected before committing.

Preserve Positions

When writing custom transformations, maintain character counts so Vale can report accurate locations.

Format-Specific Rules

Use scopes to target specific document parts:

scope: heading

Rather than broad ignore patterns.

XSLT for Complex XML

For complex XML formats, XSLT is more reliable than regex-based ignores.

Common Patterns

Ignore Shortcodes (Hugo, Jekyll)

[*.md]
TokenIgnores = {{.*?}}, {%.*?%}

Prevents false positives from template syntax.

Ignore Attribute Values

[*.html]
TokenIgnores = (class|id)="[^"]+"

Skip linting CSS classes and IDs.

Language-Specific Comments

For Python docstrings as Markdown:

[formats]
py = md

[*.py]
TokenIgnores = `[^`]+`

Mixed Content

For files with both prose and data:

[*.yml]
BlockIgnores = (?sm)^---$.*?^---$

Ignore YAML while linting embedded Markdown.

Transformation Architecture

Key components:

internal/lint/
├── lint.go          # Core linting logic
├── md.go            # Markdown transformation
├── html.go          # HTML token parsing
├── ast.go           # AST walking for markup
├── code.go          # Code file routing
└── code/
    ├── go.go        # Go tree-sitter parser
    ├── py.go        # Python parser
    ├── rs.go        # Rust parser
    └── ...          # Other languages

Each format handler:

Converts to a common representation (usually HTML)
Walks the structure extracting text
Maintains position mappings
Yields blocks to the linter

Related Topics

Scoping - Target specific document parts
Configuration - Set up ignore patterns
Format Detection - Format associations reference
