Skip to main content
MCC transforms C source code into executable machine code through seven well-defined stages. Each stage consumes the output of the previous stage and is implemented as a Salsa tracked function.

Pipeline overview

The compilation follows a linear pipeline: Each stage is cached by Salsa, enabling incremental compilation when source files haven’t changed.

Stage 1: Preprocessing

Runs the system C preprocessor to handle #include, #define, and other directives.

Implementation

// crates/mcc/src/preprocessing.rs:12
#[salsa::tracked]
pub fn preprocess(db: &dyn Db, cc: OsString, src: SourceFile) 
    -> Result<Text, PreprocessorError>

What it does

  • Invokes the system C compiler with -E -P flags
  • Expands macros and includes
  • Removes comments and directives
  • Returns preprocessed source as Text
SourceFile {
    path: "main.c",
    contents: "#define X 42\nint main() { return X; }"
}
The mcc-driver currently runs preprocessing and writes the result to a temporary file. The parser reads the original SourceFile contents.

Stage 2: Parsing

Transforms source text into a strongly-typed Abstract Syntax Tree (AST) using tree-sitter.

Implementation

// crates/mcc/src/parsing.rs:15
#[salsa::tracked]
pub fn parse(db: &dyn Db, file: SourceFile) -> Ast<'_>

What it does

  • Invokes tree-sitter’s C grammar parser
  • Wraps the parse tree in strongly-typed AST nodes from mcc-syntax
  • Accumulates syntax error diagnostics
  • Returns an Ast even if errors are present (error recovery)
int main(void) { return 0; }

Error recovery

Tree-sitter provides error recovery, allowing parsing to continue even when syntax errors are present:
// crates/mcc/src/parsing.rs:51
fn check_node(db: &dyn Db, node: TsNode<'_>, file: SourceFile) -> Continuation {
    if node.is_missing() {
        Diagnostic::error()
            .with_message(format!("Expected a \"{}\"", 
                node.parent().unwrap().grammar_name()))
            .accumulate(db);
    }
    // ... continue parsing
}

Stage 3: Typechecking (HIR)

Builds the High-Level Intermediate Representation (HIR) from the AST. Semantic errors are surfaced here.

Implementation

// crates/mcc/src/typechecking/mod.rs:23
#[salsa::tracked]
pub fn typecheck(db: &dyn Db, file: SourceFile) -> hir::TranslationUnit<'_>

What it does

  • Validates return types (int, void)
  • Checks for C keywords used as identifiers
  • Validates type specifiers in declarations
  • Builds HIR representation with resolved names
// crates/mcc/src/typechecking/hir.rs:14
#[salsa::tracked(debug)]
pub struct TranslationUnit<'db> {
    pub items: Vec<Item<'db>>,
    pub file: SourceFile,
}

#[derive(Debug, Copy, Clone)]
pub enum Item<'db> {
    Function(FunctionDefinition<'db>),
}

#[salsa::tracked(debug)]
pub struct FunctionDefinition<'db> {
    pub name: Identifier<'db>,
    pub node: ptr::FunctionDefinition<'db>,
}

#[derive(Debug, Copy, Clone)]
pub enum Type {
    Void,
    Int,
    Error,
}

Validation examples

ints main(void) { return 0; }
// Error: invalid type "ints"

Stage 4: Lowering (TAC)

Transforms HIR into Three Address Code (TAC), a simplified intermediate representation.

Implementation

// crates/mcc/src/lowering/mod.rs:716
#[salsa::tracked]
pub fn lower_program(db: &dyn Db, file: SourceFile) -> tacky::Program<'_>

What it does

  • Converts HIR to TAC instructions
  • Breaks complex expressions into simple operations
  • Generates temporary variables
  • Implements control flow (jumps, labels)
  • Validates variable declarations (undeclared, redefinition)
int main(void) {
    int a = 2 + 3;
    return a * 4;
}

TAC instruction types

pub enum Instruction {
    Return(Val),
    Unary { op: UnaryOperator, src: Val, dst: Val },
    Binary { op: BinaryOperator, left_src: Val, right_src: Val, dst: Val },
    Comparison { op: ComparisonOperator, left_src: Val, right_src: Val, dst: Val },
    Copy { src: Val, dst: Val },
    Jump { target: Text },
    JumpIfZero { condition: Val, target: Text },
    JumpIfNotZero { condition: Val, target: Text },
    Label(Text),
}

Short-circuit evaluation

Logical operators implement short-circuit semantics:
// crates/mcc/src/lowering/mod.rs:532
fn lower_logical_and(
    &mut self,
    left: ast::Expression<'_>,
    right: ast::Expression<'_>,
) -> Option<tacky::Val> {
    let left_val = self.lower_expression(left)?;
    let false_label = self.label();
    
    // If left is zero, jump to false case
    self.instructions.push(tacky::Instruction::JumpIfZero {
        condition: left_val,
        target: false_label.clone(),
    });
    
    // Otherwise evaluate right
    let right_val = self.lower_expression(right)?;
    // ...
}

Stage 5: Code generation (ASM IR)

Lowers TAC to a target-agnostic assembly intermediate representation.

Implementation

// crates/mcc/src/codegen/mod.rs:10
#[salsa::tracked]
pub fn generate_assembly(db: &dyn Db, program: tacky::Program<'_>) 
    -> asm::Program<'_>

What it does

  • Converts TAC instructions to assembly operations
  • Allocates stack space for local variables
  • Maps TAC values to operands (immediate, register, stack)
  • Fixes up invalid instruction combinations
// crates/mcc/src/codegen/mod.rs:210
impl StackAllocator {
    fn operand_for(&mut self, val: tacky::Val) -> asm::Operand {
        match val {
            tacky::Val::Constant(c) => asm::Operand::Imm(c),
            tacky::Val::Var(v) => asm::Operand::Stack(self.offset_for(v)),
        }
    }
    
    fn offset_for(&mut self, variable: tacky::Variable) -> u32 {
        (self.index_of(variable) as u32) * 4  // 4 bytes per slot
    }
}

Assembly IR types

pub enum Instruction {
    Mov { src: Operand, dst: Operand },
    Unary { op: UnaryOperator, operand: Operand },
    Binary { op: BinaryOperator, src: Operand, dst: Operand },
    Comparison { op: ComparisonOperator, left: Operand, right: Operand, dst: Operand },
    Idiv { src: Operand },
    Cdq,
    AllocateStack(u32),
    Ret,
    Label(Text),
    Jump { target: Text },
    JumpIfZero { condition: Operand, target: Text },
    JumpIfNotZero { condition: Operand, target: Text },
}

pub enum Operand {
    Imm(i32),
    Register(Register),
    Stack(u32),
}

Instruction fixups

Fixes invalid instruction patterns:
// crates/mcc/src/codegen/mod.rs:247
#[salsa::tracked]
fn fix_up_instructions(db: &dyn Db, function: asm::FunctionDefinition<'_>) 
    -> asm::FunctionDefinition<'_> {
    match instruction {
        // mov from stack to stack is invalid
        Mov { src: Stack(_), dst: Stack(_) } => {
            instructions.push(Mov { src, dst: Register(R10) });
            instructions.push(Mov { src: Register(R10), dst });
        }
        // idiv doesn't accept immediates
        Idiv { src: Imm(_) } => {
            instructions.push(Mov { src, dst: Register(R10) });
            instructions.push(Idiv { src: Register(R10) });
        }
        // ...
    }
}

Stage 6: Rendering (assembly text)

Renders the assembly IR to textual x86_64 assembly with OS-specific conventions.

Implementation

// crates/mcc/src/render.rs:13
#[salsa::tracked]
pub fn render_program(db: &dyn Db, program: asm::Program<'_>, target: Triple) 
    -> Text

What it does

  • Converts assembly IR to AT&T syntax
  • Applies macOS symbol prefixes (leading underscore)
  • Emits GNU stack note on Linux
  • Generates function prologues and epilogues
.globl _main
_main:
pushq %rbp
movq %rsp, %rbp
movl $0, %eax
movq %rbp, %rsp
popq %rbp
ret

OS-specific rendering

// crates/mcc/src/render.rs:46
fn function_name<'a>(&self, name: &'a str) -> Cow<'a, str> {
    if matches!(
        self.target.operating_system,
        OperatingSystem::MacOSX(_) | OperatingSystem::Darwin(_)
    ) {
        format!("_{name}").into()  // macOS: _main
    } else {
        name.into()                // Linux: main
    }
}

Stage 7: Assembling

Invokes the system compiler to assemble the emitted assembly file into an executable.

Implementation

// crates/mcc/src/assembling.rs:10
#[salsa::tracked]
pub fn assemble_and_link(
    _db: &dyn Db,
    cc: OsString,
    assembly: PathBuf,
    dest: PathBuf,
    target: Triple,
) -> Result<(), CommandError>

What it does

  • Invokes system compiler (e.g., cc or gcc)
  • Passes target architecture flags
  • Links standard libraries
  • Produces final executable
// crates/mcc/src/assembling.rs:17
let mut cmd = Command::new(cc);
cmd.arg("-o").arg(dest);

if matches!(target.operating_system, OperatingSystem::Darwin(_)) {
    cmd.arg("-arch").arg(target.architecture.to_string());
}

cmd.arg(assembly);

Data flow summary

1

Source → Preprocessed Text

SourceFilepreprocess()Text
2

Text → AST

SourceFileparse()Ast
3

AST → HIR

SourceFiletypecheck()hir::TranslationUnit
4

HIR → TAC

SourceFilelower_program()tacky::Program
5

TAC → ASM IR

tacky::Programgenerate_assembly()asm::Program
6

ASM IR → Assembly Text

asm::Programrender_program()Text
7

Assembly Text → Executable

PathBufassemble_and_link()Executable
Each stage is a Salsa tracked function, enabling incremental recompilation. See Incremental compilation for details.

Build docs developers (and LLMs) love