Project Structure
MCC is organized as a Cargo workspace with multiple crates, each with a specific responsibility.
Crate Overview
mcc - Core Compiler Library
The heart of the compiler, containing the main compilation pipeline. Each compilation stage is implemented as a separate module:
- preprocessing - Runs the system C preprocessor (via cc -E -P)
- parsing - Tree-sitter-based parsing with error recovery and validation
- typechecking - Builds the High-Level IR (HIR) from the AST; semantic errors are surfaced here
- lowering - Transforms HIR into Three Address Code (TAC) via lower_program
- codegen - Lowers TAC to a target-agnostic assembly IR (codegen::asm)
- render - Renders the assembly IR to textual assembly, with OS-specific conventions
- assembling - Invokes the system compiler to assemble the emitted assembly file into an executable
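The stage order can also be written down as data. The sketch below is illustrative only: the `Stage` enum and its methods are hypothetical and not part of MCC, and the output descriptions simply restate the list above. It also encodes the rule that a stage may rely only on stages that run before it.

```rust
/// Pipeline stages in execution order.
/// Hypothetical type for illustration; MCC implements stages as modules.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stage {
    Preprocessing,
    Parsing,
    Typechecking,
    Lowering,
    Codegen,
    Render,
    Assembling,
}

impl Stage {
    /// Every stage, first to last.
    pub const ALL: [Stage; 7] = [
        Stage::Preprocessing,
        Stage::Parsing,
        Stage::Typechecking,
        Stage::Lowering,
        Stage::Codegen,
        Stage::Render,
        Stage::Assembling,
    ];

    /// What the stage hands to its successor.
    pub fn output(self) -> &'static str {
        match self {
            Stage::Preprocessing => "preprocessed C source",
            Stage::Parsing => "AST",
            Stage::Typechecking => "HIR",
            Stage::Lowering => "TAC",
            Stage::Codegen => "assembly IR",
            Stage::Render => "textual assembly",
            Stage::Assembling => "executable",
        }
    }

    /// The stage that consumes this stage's output, if there is one.
    pub fn next(self) -> Option<Stage> {
        let index = Stage::ALL.iter().position(|s| *s == self)?;
        Stage::ALL.get(index + 1).copied()
    }

    /// A stage may depend only on stages that run strictly earlier.
    /// The derived ordering follows declaration order.
    pub fn may_depend_on(self, other: Stage) -> bool {
        other < self
    }
}
```

For example, `Stage::Parsing.next()` yields `Some(Stage::Typechecking)`, and `Stage::Parsing.may_depend_on(Stage::Codegen)` is `false`.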
mcc-syntax - Syntax Layer
Provides strongly-typed AST nodes generated from the tree-sitter grammar. This layer is independent of compilation logic and focuses purely on syntax representation.
Key principle: The syntax layer contains no compilation logic.
mcc-driver - Command-Line Interface
Orchestrates the compilation pipeline and handles user interaction. It exposes a Callbacks trait whose hooks are invoked after the corresponding stages:
- after_parse
- after_lower
- after_codegen
- after_render_assembly
- after_compile
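A trait with one hook per stage is commonly written with default no-op methods, so that implementors override only what they need. The sketch below assumes that shape. Only the hook names come from the list above; the signatures, the `ControlFlow` return type, the `Recorder` type, and the `drive` function are all hypothetical and may not match mcc-driver.

```rust
use std::ops::ControlFlow;

/// Hook names follow the documentation; the signatures are assumptions.
pub trait Callbacks {
    fn after_parse(&mut self) -> ControlFlow<()> { ControlFlow::Continue(()) }
    fn after_lower(&mut self) -> ControlFlow<()> { ControlFlow::Continue(()) }
    fn after_codegen(&mut self) -> ControlFlow<()> { ControlFlow::Continue(()) }
    fn after_render_assembly(&mut self) -> ControlFlow<()> { ControlFlow::Continue(()) }
    fn after_compile(&mut self) -> ControlFlow<()> { ControlFlow::Continue(()) }
}

/// Hypothetical implementor: records each hook and optionally stops early.
pub struct Recorder {
    pub fired: Vec<&'static str>,
    pub stop_after: Option<&'static str>,
}

impl Recorder {
    fn record(&mut self, hook: &'static str) -> ControlFlow<()> {
        self.fired.push(hook);
        if self.stop_after == Some(hook) {
            ControlFlow::Break(())
        } else {
            ControlFlow::Continue(())
        }
    }
}

impl Callbacks for Recorder {
    fn after_parse(&mut self) -> ControlFlow<()> { self.record("after_parse") }
    fn after_lower(&mut self) -> ControlFlow<()> { self.record("after_lower") }
    fn after_codegen(&mut self) -> ControlFlow<()> { self.record("after_codegen") }
    fn after_render_assembly(&mut self) -> ControlFlow<()> { self.record("after_render_assembly") }
    fn after_compile(&mut self) -> ControlFlow<()> { self.record("after_compile") }
}

/// Hypothetical driver loop. The real stage work would run between hooks.
/// Returns true if the pipeline ran to completion.
pub fn drive(callbacks: &mut dyn Callbacks) -> bool {
    if callbacks.after_parse().is_break() { return false; }
    if callbacks.after_lower().is_break() { return false; }
    if callbacks.after_codegen().is_break() { return false; }
    if callbacks.after_render_assembly().is_break() { return false; }
    callbacks.after_compile().is_continue()
}
```

Under these assumptions, a `Recorder` with `stop_after: Some("after_codegen")` sees three hooks and `drive` returns `false`. This is one way a driver could support stopping after an intermediate stage.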
mcc-macros - Procedural Macros
Contains procedural macros used throughout the codebase.
xtask - Build-Time Tooling
Development utilities and build-time tools following the xtask pattern.
Data Flow
Compilation follows a linear pipeline in which each stage consumes the output of the previous stage: preprocessing -> parsing -> typechecking -> lowering -> codegen -> render -> assembling.
Key Types and Abstractions
- SourceFile - Represents a source file with path and contents
- Ast - Wraps the tree-sitter parse tree with strongly-typed accessors
- hir::TranslationUnit - High-Level IR produced by typechecking
- tacky::Program - Three Address Code (TAC) IR
- codegen::asm::Program - Assembly IR (prior to textual rendering)
- Database/Db - Salsa database/trait for incremental compilation
- Diagnostics - Salsa accumulator newtype for collecting diagnostics
- Text - Reference-counted string type for efficient memory sharing
- Files - File collection for error reporting and source management
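To illustrate what a reference-counted string buys, here is a minimal sketch built on `Arc<str>` from the standard library. It is an assumption about the general technique, not a copy of MCC's `Text`, whose representation and API may differ; the `shares_storage_with` helper is hypothetical.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted string type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Text(Arc<str>);

impl From<&str> for Text {
    fn from(s: &str) -> Self {
        // Copies the bytes once into a shared allocation.
        Text(Arc::from(s))
    }
}

impl Deref for Text {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

impl Text {
    /// True when both handles point at the same allocation.
    pub fn shares_storage_with(&self, other: &Text) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}
```

Cloning such a value only bumps a reference count, so the same source text can be handed to several stages without copying the underlying bytes.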
Module Boundaries
Syntax/Compilation Boundary
The mcc-syntax crate provides the AST interface, while mcc contains all compilation logic. This boundary ensures that syntax changes don’t require recompiling the entire compiler.
Pipeline Stage Boundaries
Each compilation stage is implemented as a separate module with clear input/output contracts. Stages communicate only through well-defined data structures.
Architectural invariant: No compilation stage depends on later stages in the pipeline.
External Tool Boundary
The compiler delegates preprocessing, assembly, and linking to external tools (typically the system C compiler). This boundary allows the compiler to focus on core compilation logic while leveraging mature external tools.
Error Handling Boundary
All compilation stages accumulate diagnostics rather than failing immediately, allowing the compiler to report all errors in a single pass.
Dependencies
MCC uses several key dependencies:
- Salsa - Incremental compilation framework
- tree-sitter - Parsing library
- type-sitter - Strongly-typed tree-sitter bindings
- codespan-reporting - Diagnostic formatting and error reporting
- clap - Command-line argument parsing
- anyhow - Error handling
- tracing - Structured logging
Finding Your Way Around
Entry Points
- CLI: crates/mcc-driver/src/main.rs
- Pipeline: crates/mcc/src/lib.rs
- AST: crates/mcc-syntax/src/lib.rs
Common Tasks
Adding a new compilation stage module: Modify crates/mcc/src/ and update the pipeline in the core library.
Modifying the AST: Update the tree-sitter grammar source, then regenerate (don’t hand-edit generated files).
Adding CLI options: Modify crates/mcc-driver/src/ using clap’s derive macros.
Adding diagnostics: Use the Diagnostics accumulator in the relevant compilation stage.
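As a rough illustration of accumulating diagnostics instead of stopping at the first error, the sketch below uses a plain `Vec` in place of the Salsa accumulator. Everything in it is hypothetical: the `Severity`, `Diagnostic`, and `Diagnostics` definitions are simplified stand-ins for MCC's types, and `check_statements` is a toy check, not a real compilation stage.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Error,
    Warning,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub severity: Severity,
    pub line: usize,
    pub message: String,
}

/// Simplified stand-in for an accumulator: a growable list of diagnostics.
#[derive(Debug, Default)]
pub struct Diagnostics(Vec<Diagnostic>);

impl Diagnostics {
    pub fn push(&mut self, diagnostic: Diagnostic) {
        self.0.push(diagnostic);
    }

    pub fn has_errors(&self) -> bool {
        self.0.iter().any(|d| d.severity == Severity::Error)
    }

    pub fn as_slice(&self) -> &[Diagnostic] {
        &self.0
    }
}

/// Toy check: every non-empty line must end with a semicolon.
/// It records a diagnostic and keeps going rather than returning early.
pub fn check_statements(src: &str, diagnostics: &mut Diagnostics) {
    for (index, line) in src.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if !line.ends_with(';') {
            diagnostics.push(Diagnostic {
                severity: Severity::Error,
                line: index + 1, // report 1-based line numbers
                message: format!("expected `;` after `{line}`"),
            });
        }
    }
}
```

Because the check never returns early, a source with two faulty lines produces two diagnostics in one run, and the caller decides afterwards whether `has_errors()` should stop the pipeline.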
Testing
The project includes comprehensive testing:
- Unit tests: Located alongside source code in each crate
- Doc tests: Embedded in documentation comments
- Integration tests: Full end-to-end compilation testing against the writing-a-c-compiler-tests suite
Target Support
The compiler targets x86_64 by default but is designed to support multiple architectures through the target-lexicon crate. The renderer applies OS-specific conventions:
- macOS: leading underscore on symbols
- Linux: GNU stack note
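The two conventions above can be sketched as follows. This is a minimal illustration, not MCC's renderer: the `Os` enum, `symbol_name`, and `render_function` are hypothetical, and a hand-rolled enum stands in for the target information that the real code would presumably take from target-lexicon. The `.note.GNU-stack` directive is the usual GNU assembler marker for a non-executable stack.

```rust
/// Hypothetical target OS selector for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Os {
    MacOs,
    Linux,
}

/// macOS prefixes C symbols with an underscore; Linux uses the name as-is.
pub fn symbol_name(name: &str, os: Os) -> String {
    match os {
        Os::MacOs => format!("_{name}"),
        Os::Linux => name.to_owned(),
    }
}

/// Renders one function as AT&T-style assembly text.
pub fn render_function(name: &str, body: &[&str], os: Os) -> String {
    let symbol = symbol_name(name, os);
    let mut out = format!(".globl {symbol}\n{symbol}:\n");
    for instruction in body {
        out.push_str("    ");
        out.push_str(instruction);
        out.push('\n');
    }
    if os == Os::Linux {
        // Marks the object as not needing an executable stack.
        out.push_str(".section .note.GNU-stack,\"\",@progbits\n");
    }
    out
}
```

Calling `render_function("main", &["movl $0, %eax", "ret"], Os::MacOs)` emits the label `_main` and no stack note, while the `Os::Linux` variant keeps `main` and appends the note as the final line.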