The parsing stage uses tree-sitter to convert preprocessed C code into a concrete syntax tree (CST), performing syntax validation with error recovery.

Overview

MCC uses tree-sitter with the tree-sitter-c grammar to parse C source into a lossless syntax tree. Type-sitter provides strongly-typed Rust bindings over the raw tree-sitter nodes.

Input: SourceFile (preprocessed C code)
Output: Ast<'_> (concrete syntax tree)
Module: crates/mcc/src/parsing.rs

Entry Point

#[salsa::tracked]
pub fn parse(db: &dyn Db, file: SourceFile) -> Ast<'_>
This tracked function always returns an AST, even for invalid programs. Diagnostics are accumulated separately.

Implementation Details

Tree-sitter Setup

let mut parser = tree_sitter::Parser::new();
parser
    .set_language(&tree_sitter::Language::new(tree_sitter_c::LANGUAGE))
    .unwrap();

let src = file.contents(db);
let tree = Tree::from(parser.parse(src, None).unwrap());
The parser is configured with the C grammar and runs on the source text.

Error Recovery

Tree-sitter has built-in error recovery – it produces a tree even for invalid syntax. The check_tree function walks the tree to identify error nodes:
fn check_tree(db: &dyn Db, tree: &Tree, file: SourceFile) {
    let mut cursor = tree.walk();
    let mut to_check: Vec<TsNode<'_>> = vec![tree.root_node()];

    while let Some(node) = to_check.pop() {
        match check_node(db, node, file) {
            Continuation::Skip => {},
            Continuation::Recurse => { /* push children */ },
            Continuation::Emit(diag) => diag.accumulate(db),
        }
    }
}
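
The worklist traversal above can be tried in isolation with a toy model. The following is a minimal sketch under stated assumptions, not MCC's actual code: `ToyNode` is a hypothetical stand-in for `TsNode`, diagnostics are plain strings returned in a `Vec` rather than accumulated through Salsa, and the rule that childless nodes are skipped is an assumption of this sketch.

```rust
/// Toy stand-in for a tree-sitter node (hypothetical; the real walker uses `TsNode`).
#[derive(Debug, Clone)]
pub struct ToyNode {
    pub kind: &'static str,
    pub text: &'static str,
    pub is_error: bool,
    pub is_missing: bool,
    pub children: Vec<ToyNode>,
}

impl ToyNode {
    /// A well-formed token with no children.
    pub fn leaf(kind: &'static str, text: &'static str) -> Self {
        ToyNode { kind, text, is_error: false, is_missing: false, children: Vec::new() }
    }

    /// A well-formed interior node.
    pub fn branch(kind: &'static str, children: Vec<ToyNode>) -> Self {
        ToyNode { kind, text: "", is_error: false, is_missing: false, children }
    }
}

/// What the walker should do after inspecting one node.
pub enum Continuation {
    /// Nothing to report and nothing below this node to visit.
    Skip,
    /// Nothing to report here, but the children still need checking.
    Recurse,
    /// Report a diagnostic; the subtree is not visited further.
    Emit(String),
}

/// Classify a single node. `parent_kind` plays the role of
/// `node.parent().unwrap().grammar_name()` in this toy model.
pub fn check_node(node: &ToyNode, parent_kind: &str) -> Continuation {
    if node.is_missing {
        Continuation::Emit(format!("Expected a \"{}\"", parent_kind))
    } else if node.is_error {
        Continuation::Emit(format!(
            "Expected a \"{}\", but found \"{}\"",
            parent_kind, node.text
        ))
    } else if node.children.is_empty() {
        Continuation::Skip
    } else {
        Continuation::Recurse
    }
}

/// Walk the tree iteratively with an explicit stack and collect diagnostics.
pub fn check_tree(root: &ToyNode) -> Vec<String> {
    let mut diagnostics = Vec::new();
    // Each work item pairs a node with the kind of its parent.
    let mut to_check: Vec<(&ToyNode, &str)> = vec![(root, root.kind)];

    while let Some((node, parent_kind)) = to_check.pop() {
        match check_node(node, parent_kind) {
            Continuation::Skip => {}
            Continuation::Recurse => {
                // Push in reverse so that children are popped in source order.
                to_check.extend(node.children.iter().rev().map(|child| (child, node.kind)));
            }
            Continuation::Emit(diag) => diagnostics.push(diag),
        }
    }
    diagnostics
}
```

An explicit stack avoids deep recursion on heavily nested input. A node marked missing can be built with `ToyNode { is_missing: true, ..ToyNode::leaf(";", "") }` and placed under a `return_statement` branch to see the first message shape.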

Error Node Types

Missing Nodes

When the parser expects a token that is absent from the input (for example, at EOF), tree-sitter inserts a zero-width missing node during recovery:
if node.is_missing() {
    let diagnostic = Diagnostic::error()
        .with_message(format!(
            "Expected a \"{}\"",
            node.parent().unwrap().grammar_name()
        ))
        .with_code(codes::parse::unexpected_token)
        .with_labels(vec![...]);
}

Error Nodes

Unexpected tokens or malformed syntax:
if node.is_error() {
    let token = node.utf8_text(file.contents(db).as_ref()).unwrap();
    let diagnostic = Diagnostic::error()
        .with_message(format!(
            "Expected a \"{}\", but found \"{}\"",
            node.parent().unwrap().grammar_name(),
            token
        ))
        .with_labels(vec![...]);
}

AST Types

The Ast struct is a Salsa-tracked wrapper around the tree-sitter tree; its root is exposed through the node types generated by type-sitter:
#[salsa::tracked]
pub struct Ast<'db> {
    pub tree: Tree,
}

impl<'db> Ast<'db> {
    pub fn root(&self, db: &dyn Db) -> ast::TranslationUnit<'_> {
        // Returns strongly-typed root node
    }
}

Type-Sitter Integration

Type-sitter provides Rust enums and structs matching the C grammar:
pub enum Statement<'a> {
    ReturnStatement(ReturnStatement<'a>),
    ExpressionStatement(ExpressionStatement<'a>),
    IfStatement(IfStatement<'a>),
    // ...
}

pub enum Expression<'a> {
    NumberLiteral(NumberLiteral<'a>),
    UnaryExpression(UnaryExpression<'a>),
    BinaryExpression(BinaryExpression<'a>),
    // ...
}
This enables exhaustive pattern matching and compile-time guarantees about tree structure.
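
To illustrate what exhaustive matching buys, here is a minimal sketch that uses hand-written stand-ins for the generated types. The enum shapes, field names, and the `describe_*` functions are hypothetical. The real type-sitter nodes borrow from the tree and expose accessor methods rather than owning their children.

```rust
/// Hand-written stand-in for the generated expression type (hypothetical shape).
pub enum Expression {
    NumberLiteral(i64),
    UnaryExpression { operator: char, operand: Box<Expression> },
    BinaryExpression { left: Box<Expression>, operator: char, right: Box<Expression> },
}

/// Hand-written stand-in for the generated statement type (hypothetical shape).
pub enum Statement {
    ReturnStatement(Option<Expression>),
    ExpressionStatement(Expression),
    IfStatement { condition: Expression, consequence: Box<Statement> },
}

/// Render an expression in a prefix, S-expression-like form.
/// There is no wildcard arm, so adding a variant to `Expression`
/// makes this function fail to compile until the new case is handled.
pub fn describe_expression(expr: &Expression) -> String {
    match expr {
        Expression::NumberLiteral(n) => n.to_string(),
        Expression::UnaryExpression { operator, operand } => {
            format!("({} {})", operator, describe_expression(operand))
        }
        Expression::BinaryExpression { left, operator, right } => format!(
            "({} {} {})",
            operator,
            describe_expression(left),
            describe_expression(right)
        ),
    }
}

/// Render a statement; again every variant is matched explicitly.
pub fn describe_statement(stmt: &Statement) -> String {
    match stmt {
        Statement::ReturnStatement(Some(value)) => {
            format!("(return {})", describe_expression(value))
        }
        Statement::ReturnStatement(None) => "(return)".to_string(),
        Statement::ExpressionStatement(expr) => {
            format!("(expr {})", describe_expression(expr))
        }
        Statement::IfStatement { condition, consequence } => format!(
            "(if {} {})",
            describe_expression(condition),
            describe_statement(consequence)
        ),
    }
}
```

Leaving out the `_ =>` arm is the design choice that matters here: later passes that match on node kinds are forced to handle every variant the grammar can produce.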

Semantic vs. Syntactic Checks

Parsing checks ONLY syntax:
  • Tree shape and token structure
  • Parentheses matching
  • Statement terminators
Deferred to typechecking:
  • Return type validation
  • Keyword usage (e.g., int return = 5;)
  • Type specifier validity (e.g., ints vs int)
This separation keeps parsing focused on structure, not semantics.
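
The deferred checks listed above could look roughly like the sketch below. It is a hypothetical illustration, not MCC's typechecker: `Declaration`, `SemanticError`, `check_declaration`, and both word lists are invented for this example. It assumes the parser has already handed over a structurally valid declaration whose type specifier and name are still plain text.

```rust
/// A declaration as a later pass might receive it (hypothetical shape):
/// the tree shape was valid, but neither field has been validated yet.
pub struct Declaration<'a> {
    pub type_specifier: &'a str,
    pub name: &'a str,
}

/// Errors that only a semantic pass reports in this sketch.
#[derive(Debug, PartialEq)]
pub enum SemanticError {
    UnknownType(String),
    KeywordAsIdentifier(String),
}

// Deliberately incomplete lists, just enough for the example.
const PRIMITIVE_TYPES: &[&str] = &["void", "char", "short", "int", "long", "float", "double"];
const KEYWORDS: &[&str] = &["return", "if", "else", "while", "for", "int", "void"];

/// Validate the parts of a declaration that parsing left unchecked.
pub fn check_declaration(decl: &Declaration<'_>) -> Result<(), SemanticError> {
    if !PRIMITIVE_TYPES.contains(&decl.type_specifier) {
        return Err(SemanticError::UnknownType(decl.type_specifier.to_string()));
    }
    if KEYWORDS.contains(&decl.name) {
        return Err(SemanticError::KeywordAsIdentifier(decl.name.to_string()));
    }
    Ok(())
}
```

Under these assumptions, `ints x` and `int return` both survive parsing and are rejected only by `check_declaration`, which mirrors the split described above.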

Example

Input:
int main(void) {
    return 0;
}
AST Structure:
translation_unit
  function_definition
    primitive_type: "int"
    function_declarator
      declarator: identifier "main"
      parameter_list: (void)
    compound_statement
      return_statement
        number_literal: "0"
Invalid Input:
int main(void) {
    return
}
Error Diagnostic:
error: Expected a "return_statement", but found "}"
  ┌─ main.c:3:1
  │
3 │ }
  │ ^ error occurred here

  • Previous: Preprocessing – Expands macros and directives
  • Next: Typechecking – Validates semantics and builds HIR
  • AST Definition: crates/mcc-syntax/src/ast.rs (generated by type-sitter)
