
Creating Parsers

Developing Tree-sitter grammars can have a difficult learning curve, but once you get the hang of it, it can be fun and even zen-like. This guide will help you get started and develop a useful mental model for creating parsers.

What You’ll Learn

This section covers everything you need to know about creating Tree-sitter parsers:

Getting Started

Set up your development environment and create your first parser

Grammar DSL

Master the grammar DSL functions and syntax

Writing Grammar

Learn best practices for structuring your grammar rules

External Scanners

Handle complex lexical rules with custom C code

Testing

Write comprehensive tests for your parser

Publishing

Share your parser with the community

Key Concepts

Before diving into parser development, it’s important to understand a few key concepts:

Grammar Structure

Tree-sitter grammars are written in JavaScript using a declarative DSL. Each grammar defines:
  • Rules - The structure of your language’s syntax
  • Tokens - The terminal symbols (keywords, operators, literals)
  • Extras - Tokens that can appear anywhere (whitespace, comments)
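As a rough illustration of these three pieces, here is a minimal grammar.js for a hypothetical toy language. The language name `toy` and every rule name below are made up for this sketch. The file is meant to be processed by `tree-sitter generate`, which supplies `grammar`, `seq`, `choice`, `repeat`, `token`, `field`, and `prec` as globals. Running it directly with Node.js will therefore fail.

```javascript
// grammar.js for a hypothetical "toy" language (illustration only).
// grammar(), seq(), choice(), repeat(), token(), field(), and prec are
// globals provided by the Tree-sitter CLI when it loads this file.
module.exports = grammar({
  name: 'toy',

  // Keyword extraction: keywords such as 'let' are matched as whole words,
  // so input like `letx` is not split into `let` + `x`.
  word: $ => $.identifier,

  // Extras: tokens allowed between any two other tokens.
  // Setting extras replaces the default, so whitespace must be listed too.
  extras: $ => [
    /\s/,       // whitespace
    $.comment,  // line comments
  ],

  rules: {
    // The first rule is the root of every parse tree.
    source_file: $ => repeat($.statement),

    // Rules describe structure; field() labels children for navigation.
    statement: $ => seq(
      'let',
      field('name', $.identifier),
      '=',
      field('value', $._expression),
      ';',
    ),

    // A leading underscore hides this rule's node from the tree.
    _expression: $ => choice(
      $.identifier,
      $.number,
      $.binary_expression,
    ),

    // prec.left makes `a + b + c` group as `(a + b) + c`;
    // the higher number makes `*` bind tighter than `+`.
    binary_expression: $ => choice(
      prec.left(1, seq(field('left', $._expression), '+', field('right', $._expression))),
      prec.left(2, seq(field('left', $._expression), '*', field('right', $._expression))),
    ),

    // Tokens: terminal symbols defined by strings or regular expressions.
    identifier: $ => /[a-zA-Z_]\w*/,
    number: $ => /\d+/,
    comment: $ => token(seq('//', /.*/)),
  },
});
```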

Parse Trees

Tree-sitter produces concrete syntax trees where:
  • Each node corresponds to a grammar symbol
  • The tree structure reflects your grammar’s hierarchy
  • Nodes can have field names for easier navigation
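Tree-sitter’s output is a concrete syntax tree (CST), not an abstract syntax tree (AST). An AST usually discards syntax such as punctuation and keywords. In a CST these remain in the tree as anonymous nodes, and every node records its exact position in the source.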

LR(1) Grammars

Tree-sitter is based on the GLR parsing algorithm but works most efficiently with LR(1) grammars. This means:
  • The parser can look ahead one token to make decisions
  • Most conflicts can be resolved with precedence and associativity
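  • Remaining ambiguities can be declared explicitly in the grammar’s conflicts list, and the GLR algorithm explores the competing interpretations at parse time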
Tree-sitter grammars are similar to Yacc/Bison grammars but different from ANTLR or PEG grammars. You’ll likely need to adjust existing grammars when porting them to Tree-sitter.
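To build intuition for how precedence and associativity settle a conflict, the sketch below mimics the choice a parser faces at a binary operator. It looks at one token of lookahead and decides whether to reduce the expression already on the stack or shift the incoming operator. The operator table is hypothetical. This is a toy operator-precedence loop, not Tree-sitter's implementation: Tree-sitter applies `prec`, `prec.left`, and `prec.right` when it generates its parse tables, much like `%left` and `%right` in Yacc/Bison.

```javascript
// Hypothetical operator table: higher `prec` binds tighter.
const OPERATORS = {
  '+': { prec: 1, assoc: 'left' },
  '-': { prec: 1, assoc: 'left' },
  '*': { prec: 2, assoc: 'left' },
  '^': { prec: 3, assoc: 'right' },
};

const isOperator = (token) => Object.prototype.hasOwnProperty.call(OPERATORS, token);

// Shift/reduce decision: reduce the operator on top of the stack,
// or shift the lookahead operator?
function shouldReduce(stackOp, lookaheadOp) {
  const a = OPERATORS[stackOp];
  const b = OPERATORS[lookaheadOp];
  if (a.prec !== b.prec) return a.prec > b.prec; // precedence decides
  return a.assoc === 'left';                     // equal precedence: associativity decides
}

// Parses a flat token list and returns a fully parenthesized string.
function parse(tokens) {
  const operands = [];
  const ops = [];

  // Reduce: pop one operator and its two operands, push the grouped result.
  const reduce = () => {
    const right = operands.pop();
    const left = operands.pop();
    operands.push(`(${left} ${ops.pop()} ${right})`);
  };

  for (const token of tokens) {
    if (isOperator(token)) {
      // One token of lookahead: compare it with the operator on top of the stack.
      while (ops.length > 0 && shouldReduce(ops[ops.length - 1], token)) {
        reduce();
      }
      ops.push(token); // shift the operator
    } else {
      operands.push(token); // shift an operand
    }
  }
  while (ops.length > 0) reduce(); // end of input: reduce whatever remains
  return operands[0];
}

console.log(parse(['1', '-', '2', '-', '3'])); // → ((1 - 2) - 3)
console.log(parse(['1', '+', '2', '*', '3'])); // → (1 + (2 * 3))
console.log(parse(['2', '^', '3', '^', '2'])); // → (2 ^ (3 ^ 2))
```

Left associativity makes the parser reduce on a tie, while right associativity makes it shift. That is the same choice `prec.left` and `prec.right` encode in a Tree-sitter grammar.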

Development Workflow

A typical workflow for developing a Tree-sitter parser:
1. Set up your project - Use tree-sitter init to create the initial project structure with a grammar.js file.
2. Define basic rules - Start with the top-level structure and gradually add more detailed rules.
3. Write tests - Create tests in test/corpus/ for each rule as you add them.
4. Generate and test - Run tree-sitter generate to create the parser, then tree-sitter test to verify it works.
5. Iterate - Refine your grammar, fix conflicts, and add more features incrementally.

Why Tree-sitter?

Tree-sitter parsers offer several advantages:
  • Incremental parsing - Only re-parse changed portions of the document
  • Error recovery - Continue parsing even with syntax errors
  • Performance - Fast enough for real-time editing in text editors
  • Language agnostic - Generate bindings for multiple programming languages
  • Query system - Powerful pattern matching for syntax highlighting and analysis
Start small and build incrementally. Don’t try to implement the entire language specification at once. Focus on getting a working parser for a subset of the language first.

Prerequisites

Before you begin, you should have:
  • Basic understanding of context-free grammars
  • Familiarity with JavaScript (for writing grammars)
  • Knowledge of C (for external scanners, if needed)
  • A language specification or documentation for the language you’re parsing

Next Steps

Ready to get started? Continue to Getting Started to set up your development environment and create your first parser.
