Overview
MCC uses tree-sitter with the tree-sitter-c grammar to parse C source into a lossless syntax tree. Type-sitter provides strongly-typed Rust bindings over the raw tree-sitter nodes.
Input: SourceFile (preprocessed C code)
Output: Ast<'_> (concrete syntax tree)
Module: crates/mcc/src/parsing.rs
Entry Point
Implementation Details
Tree-sitter Setup
Error Recovery
Tree-sitter has built-in error recovery: it produces a tree even for invalid syntax. The check_tree function walks the tree to identify error nodes:
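A minimal sketch of such a walk, written against a simplified stand-in Node type so that it compiles without the tree-sitter crate. Real tree-sitter nodes expose is_error() and is_missing() methods; the struct fields, SyntaxProblem, and collect_syntax_problems below are hypothetical names chosen for illustration and are not MCC's actual check_tree.

```rust
/// Simplified stand-in for a syntax-tree node (not the real tree-sitter type).
#[derive(Debug, Clone)]
struct Node {
    kind: &'static str,
    start_byte: usize,
    end_byte: usize,
    is_error: bool,
    is_missing: bool,
    children: Vec<Node>,
}

impl Node {
    /// Builds a well-formed node; set the flags with struct-update syntax.
    fn new(kind: &'static str, start_byte: usize, end_byte: usize, children: Vec<Node>) -> Self {
        Node { kind, start_byte, end_byte, is_error: false, is_missing: false, children }
    }
}

#[derive(Debug, PartialEq)]
enum SyntaxProblem {
    /// The parser inserted a zero-width node for a token it expected.
    Missing { expected: &'static str, at: usize },
    /// The parser wrapped unexpected input in an error node.
    Unexpected { start: usize, end: usize },
}

/// Depth-first walk that records every missing or error node in source order.
fn collect_syntax_problems(root: &Node) -> Vec<SyntaxProblem> {
    let mut problems = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if node.is_missing {
            problems.push(SyntaxProblem::Missing { expected: node.kind, at: node.start_byte });
        } else if node.is_error {
            problems.push(SyntaxProblem::Unexpected { start: node.start_byte, end: node.end_byte });
        }
        // Push children in reverse so they are popped left to right.
        stack.extend(node.children.iter().rev());
    }
    problems
}
```

An explicit stack is used instead of recursion so that deeply nested input cannot overflow the call stack; a recursive walk would report the same problems.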
Error Node Types
Missing Nodes
When the parser expects a token but reaches EOF:
Error Nodes
Unexpected tokens or malformed syntax:
AST Types
The AST is a typed wrapper around the tree-sitter tree, generated by type-sitter:
Type-Sitter Integration
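To illustrate what a typed wrapper over untyped nodes means in general, the sketch below converts a string-tagged node into a Rust enum and offers a typed accessor. It is a hand-written model, not type-sitter's generated code: RawNode, Statement, ReturnStatement, ExpressionStatement, leaf, and value are hypothetical names, and only the node kind strings (return_statement, expression_statement, number_literal) are taken from the tree-sitter-c grammar.

```rust
use std::convert::TryFrom;

/// Untyped node whose kind is only known as a string at runtime
/// (stand-in for a raw tree-sitter node).
#[derive(Debug, Clone, PartialEq)]
struct RawNode {
    kind: String,
    text: String,
    children: Vec<RawNode>,
}

/// Convenience constructor for a childless node.
fn leaf(kind: &str, text: &str) -> RawNode {
    RawNode { kind: kind.to_string(), text: text.to_string(), children: Vec::new() }
}

/// Typed wrappers borrow the underlying node, hence the lifetime.
#[derive(Debug, PartialEq)]
struct ReturnStatement<'a>(&'a RawNode);

#[derive(Debug, PartialEq)]
struct ExpressionStatement<'a>(&'a RawNode);

/// One enum variant per node kind that may appear in this position.
#[derive(Debug, PartialEq)]
enum Statement<'a> {
    Return(ReturnStatement<'a>),
    Expression(ExpressionStatement<'a>),
}

impl<'a> TryFrom<&'a RawNode> for Statement<'a> {
    type Error = String;

    /// The kind string is checked once here; later code matches on the enum.
    fn try_from(node: &'a RawNode) -> Result<Self, Self::Error> {
        match node.kind.as_str() {
            "return_statement" => Ok(Statement::Return(ReturnStatement(node))),
            "expression_statement" => Ok(Statement::Expression(ExpressionStatement(node))),
            other => Err(format!("`{}` is not a statement kind", other)),
        }
    }
}

impl<'a> ReturnStatement<'a> {
    /// Typed accessor: the returned expression, skipping the `return` and `;` tokens.
    fn value(&self) -> Option<&'a RawNode> {
        self.0.children.iter().find(|c| c.kind != "return" && c.kind != ";")
    }
}
```

The benefit of this shape is that a misspelled or unexpected node kind is rejected at the conversion boundary, and everything downstream can rely on exhaustive match checking.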
Type-sitter provides Rust enums and structs matching the C grammar:
Semantic vs. Syntactic Checks
Parsing checks ONLY syntax:
- Tree shape and token structure
- Parentheses matching
- Statement terminators
Parsing does NOT check semantics, which are left to later stages:
- Return type validation
- Keyword usage (e.g., int return = 5;)
- Type specifier validity (e.g., ints vs int)
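As a toy illustration of the split between syntactic and semantic checks, the sketch below assumes that checks such as type specifier validity and keyword usage run in a later stage than parsing. It does not use tree-sitter: Decl, parse_decl, and check_decl are hypothetical names, and the "parser" only recognises declarations of the shape `<word> <word> = <digits>;`.

```rust
/// Toy declaration of the shape `<word> <word> = <digits>;`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Decl {
    type_name: String,
    name: String,
    value: String,
}

/// True for identifier-shaped tokens: a letter or `_`, then letters, digits, or `_`.
fn is_word(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Syntactic stage: looks only at token shape and the `;` terminator.
fn parse_decl(src: &str) -> Result<Decl, String> {
    let body = src
        .trim()
        .strip_suffix(';')
        .ok_or_else(|| "missing `;` terminator".to_string())?;
    let (lhs, rhs) = body
        .split_once('=')
        .ok_or_else(|| "missing `=`".to_string())?;
    let words: Vec<&str> = lhs.split_whitespace().collect();
    let value = rhs.trim();
    if words.len() != 2 || !words.iter().all(|w| is_word(w)) {
        return Err("expected `<type> <name>` before `=`".to_string());
    }
    if value.is_empty() || !value.chars().all(|c| c.is_ascii_digit()) {
        return Err("expected an integer literal after `=`".to_string());
    }
    Ok(Decl {
        type_name: words[0].to_string(),
        name: words[1].to_string(),
        value: value.to_string(),
    })
}

/// Semantic stage: rejects unknown type specifiers and keywords used as names.
fn check_decl(decl: &Decl) -> Result<(), String> {
    const TYPES: [&str; 4] = ["int", "char", "short", "long"];
    const KEYWORDS: [&str; 3] = ["return", "if", "while"];
    if !TYPES.contains(&decl.type_name.as_str()) {
        return Err(format!("unknown type specifier `{}`", decl.type_name));
    }
    if KEYWORDS.contains(&decl.name.as_str()) {
        return Err(format!("keyword `{}` used as an identifier", decl.name));
    }
    Ok(())
}
```

In this model, `ints x = 5;` and `int return = 5;` both pass parse_decl and are only rejected by check_decl, while `int x = 5` (no terminator) already fails in parse_decl.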
Example
Input:
Related Stages
- Previous: Preprocessing – Expands macros and directives
- Next: Typechecking – Validates semantics and builds HIR
- AST Definition: crates/mcc-syntax/src/ast.rs (generated by type-sitter)