AST to IR Transformation

The IR Generator bridges the gap between the language front-end (Phase 1) and backend compilation (Phase 2). It transforms the cognitive AST into a model-agnostic Intermediate Representation (IR).
Cognitive AST → [IR Generator] → Model-Agnostic IR → [Backend] → LLM-Specific Prompts

Why IR Matters

The Problem

Without an IR layer, every backend would need to:
  1. Understand the full AST structure
  2. Resolve cross-references between declarations
  3. Compute data dependencies between steps
  4. Handle tool resolution and validation
This creates tight coupling between language semantics and prompt generation.

The Solution

The IR layer provides:
  • Model Agnosticism: Zero dependencies on Claude, GPT, Gemini, etc.
  • Resolved References: All names are linked to their definitions
  • DAG Ordering: Steps are topologically sorted by dependencies
  • JSON Serializable: Complete programs can be saved and loaded
  • Immutable: Once generated, IR nodes are never mutated

IR Node Design

Base IR Node

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class IRNode:
    """Base class for all AXON IR nodes.
    
    Every IR node carries a node_type string for serialization
    dispatch and source location for error traceability.
    """
    node_type: str = ""
    source_line: int = 0
    source_column: int = 0

    def to_dict(self) -> dict[str, Any]:
        """Convert this IR node to a JSON-serializable dictionary."""
        result: dict[str, Any] = {"node_type": self.node_type}
        for key, value in self.__dict__.items():
            if key == "node_type":
                continue
            result[key] = _serialize_value(value)
        return result
Key differences from AST:
  • frozen=True — IR nodes are immutable
  • to_dict() — Full JSON serialization support
  • node_type — Runtime type identification
  • Tuples instead of lists — Enforces immutability
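These properties can be exercised directly. The following is a minimal, self-contained sketch: the classes are simplified versions of the ones above, and this `_serialize_value` is an illustrative stand-in for the real helper, assumed here to handle nested nodes and tuples.

```python
from dataclasses import dataclass
from typing import Any


def _serialize_value(value: Any) -> Any:
    """Illustrative stand-in: recursively serialize nested nodes and tuples."""
    if isinstance(value, IRNode):
        return value.to_dict()
    if isinstance(value, tuple):
        return [_serialize_value(v) for v in value]
    return value


@dataclass(frozen=True)
class IRNode:
    node_type: str = ""
    source_line: int = 0
    source_column: int = 0

    def to_dict(self) -> dict[str, Any]:
        result: dict[str, Any] = {"node_type": self.node_type}
        for key, value in self.__dict__.items():
            if key == "node_type":
                continue
            result[key] = _serialize_value(value)
        return result


@dataclass(frozen=True)
class IRPersona(IRNode):
    node_type: str = "persona"
    name: str = ""
    domain: tuple[str, ...] = ()


persona = IRPersona(name="LegalExpert", domain=("contract law", "IP"))
print(persona.to_dict())
# {'node_type': 'persona', 'source_line': 0, 'source_column': 0,
#  'name': 'LegalExpert', 'domain': ['contract law', 'IP']}

try:
    persona.name = "Other"  # frozen=True forbids mutation after construction
except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
    print(type(exc).__name__)  # FrozenInstanceError
```

Note that tuples serialize to JSON arrays, so `domain` comes back as a list in the dictionary form.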

IR Program Structure

@dataclass(frozen=True)
class IRProgram(IRNode):
    """Root of the AXON IR — the complete compiled program."""
    node_type: str = "program"
    personas: tuple[IRPersona, ...] = ()
    contexts: tuple[IRContext, ...] = ()
    anchors: tuple[IRAnchor, ...] = ()
    tools: tuple[IRToolSpec, ...] = ()
    memories: tuple[IRMemory, ...] = ()
    types: tuple[IRType, ...] = ()
    flows: tuple[IRFlow, ...] = ()
    runs: tuple[IRRun, ...] = ()
    imports: tuple[IRImport, ...] = ()
All declarations are grouped by category, so a backend can iterate over each kind directly instead of filtering one mixed declaration list.

The IR Generator

Overview

class IRGenerator:
    """Transforms a type-checked AST into an AXON IR program."""

    def __init__(self) -> None:
        # Symbol tables for cross-reference resolution
        self._personas: dict[str, IRPersona] = {}
        self._contexts: dict[str, IRContext] = {}
        self._anchors: dict[str, IRAnchor] = {}
        self._tools: dict[str, IRToolSpec] = {}
        self._memories: dict[str, IRMemory] = {}
        self._types: dict[str, IRType] = {}
        self._flows: dict[str, IRFlow] = {}
        self._imports: list[IRImport] = []
        self._runs: list[IRRun] = []

    def generate(self, program: ast.ProgramNode) -> IRProgram:
        """Generate a complete IR program from a validated AST."""
        self._reset()

        # Phase 1: Lower all declarations into IR
        for declaration in program.declarations:
            self._visit(declaration)

        # Phase 2: Resolve cross-references in run statements
        resolved_runs = tuple(
            self._resolve_run(run) for run in self._runs
        )

        return IRProgram(
            personas=tuple(self._personas.values()),
            contexts=tuple(self._contexts.values()),
            anchors=tuple(self._anchors.values()),
            tools=tuple(self._tools.values()),
            flows=tuple(self._flows.values()),
            runs=resolved_runs,
            # ...
        )

Visitor Pattern

The IR Generator uses an explicit visitor registry:
_VISITOR_MAP: dict[type, str] = {
    ast.PersonaDefinition: "_visit_persona",
    ast.FlowDefinition: "_visit_flow",
    ast.StepNode: "_visit_step",
    ast.ReasonChain: "_visit_reason",
    ast.ProbeDirective: "_visit_probe",
    ast.WeaveNode: "_visit_weave",
    ast.RunStatement: "_visit_run",
    # ...
}

def _visit(self, node: ast.ASTNode) -> IRNode:
    visitor_name = self._VISITOR_MAP.get(type(node))
    if visitor_name is None:
        raise AxonIRError(
            f"No IR visitor for AST node type: {type(node).__name__}"
        )
    visitor = getattr(self, visitor_name)
    return visitor(node)
Why explicit? Clear errors at development time instead of silent failures.
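The failure mode is easy to see with a toy version of the registry. The node classes and visitor below are illustrative stand-ins, not the real `ast.*` types:

```python
from dataclasses import dataclass


@dataclass
class PersonaDefinition:  # toy stand-in for ast.PersonaDefinition
    name: str


@dataclass
class SurpriseNode:  # deliberately left out of the registry
    pass


class AxonIRError(Exception):
    pass


class IRGenerator:
    _VISITOR_MAP: dict[type, str] = {
        PersonaDefinition: "_visit_persona",
    }

    def _visit(self, node) -> str:
        visitor_name = self._VISITOR_MAP.get(type(node))
        if visitor_name is None:
            raise AxonIRError(
                f"No IR visitor for AST node type: {type(node).__name__}"
            )
        return getattr(self, visitor_name)(node)

    def _visit_persona(self, node: PersonaDefinition) -> str:
        return f"lowered persona {node.name}"


gen = IRGenerator()
print(gen._visit(PersonaDefinition(name="LegalExpert")))  # lowered persona LegalExpert

try:
    gen._visit(SurpriseNode())
except AxonIRError as exc:
    print(exc)  # No IR visitor for AST node type: SurpriseNode
```

A `getattr`-by-convention scheme (e.g. `_visit_` + class name) would dispatch implicitly, but a typo or a new AST node would then fall through to a generic handler rather than raising immediately.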

Declaration Lowering

Persona Example

AST → IR:
# AST (mutable, lists)
PersonaDefinition(
    name="LegalExpert",
    domain=["contract law", "IP"],
    tone="precise",
    confidence_threshold=0.85
)

# IR (frozen, tuples)
IRPersona(
    node_type="persona",
    name="LegalExpert",
    domain=("contract law", "IP"),
    tone="precise",
    confidence_threshold=0.85,
    source_line=1,
    source_column=1
)
Visitor implementation:
def _visit_persona(self, node: ast.PersonaDefinition) -> IRPersona:
    ir_persona = IRPersona(
        source_line=node.line,
        source_column=node.column,
        name=node.name,
        domain=tuple(node.domain),  # List → Tuple
        tone=node.tone,
        confidence_threshold=node.confidence_threshold,
        cite_sources=node.cite_sources,
        refuse_if=tuple(node.refuse_if),
    )
    self._personas[node.name] = ir_persona  # Register in symbol table
    return ir_persona

Flow Lowering with DAG Computation

Key Challenge: Steps may reference each other’s outputs. The IR must order them correctly.
def _visit_flow(self, node: ast.FlowDefinition) -> IRFlow:
    # Compile flow body (steps, probes, reasons, etc.)
    raw_steps = tuple(self._visit(child) for child in node.body)

    # Compute execution DAG
    sorted_steps, edges, execution_levels = self._calculate_execution_dag(
        raw_steps, node.line, node.column
    )

    ir_flow = IRFlow(
        name=node.name,
        parameters=parameters,
        return_type_name=node.return_type.name if node.return_type else "",
        steps=sorted_steps,  # Topologically sorted!
        edges=edges,
        execution_levels=execution_levels,
    )
    self._flows[node.name] = ir_flow
    return ir_flow

DAG Algorithm

The _calculate_execution_dag method:
  1. Extract dependencies from step expressions:
    # Extract "Extract" and "Assess" as dependencies
    weave [Extract.output, Assess.output] into Report
    
  2. Build dependency graph:
    edges = [
        IRDataEdge(source_step="Extract", target_step="Weave"),
        IRDataEdge(source_step="Assess", target_step="Weave"),
    ]
    
  3. Topological sort (Kahn’s algorithm):
    sorted_steps = [Extract, Assess, Weave]  # Dependency order
    
  4. Execution levels (for potential parallelism):
    execution_levels = (
        ("Extract", "Assess"),  # Level 0: can run in parallel
        ("Weave",),              # Level 1: depends on level 0
    )
    
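Steps 2–4 can be sketched as a standalone function. This is a simplified model, not the real `_calculate_execution_dag`: it assumes dependencies have already been extracted into a mapping from each step name to the names it depends on.

```python
def execution_dag(steps: dict[str, set[str]]):
    """Kahn's algorithm: topological order plus parallel execution levels.

    `steps` maps each step name to the set of step names it depends on.
    """
    indegree = {name: len(deps) for name, deps in steps.items()}
    dependents: dict[str, list[str]] = {name: [] for name in steps}
    for name, deps in steps.items():
        for dep in deps:
            dependents[dep].append(name)

    # Seed with steps that have no dependencies (sorted for determinism)
    frontier = sorted(n for n, d in indegree.items() if d == 0)
    order: list[str] = []
    levels: list[tuple[str, ...]] = []
    while frontier:
        levels.append(tuple(frontier))  # everything at one level can run in parallel
        order.extend(frontier)
        next_frontier: list[str] = []
        for name in frontier:
            for dep in dependents[name]:
                indegree[dep] -= 1
                if indegree[dep] == 0:
                    next_frontier.append(dep)
        frontier = sorted(next_frontier)

    if len(order) != len(steps):
        # The compiler would raise AxonIRError with source location here
        raise ValueError("Cycle detected among steps")
    return order, levels


order, levels = execution_dag({
    "Extract": set(),
    "Assess": set(),
    "Weave": {"Extract", "Assess"},
})
print(order)   # ['Assess', 'Extract', 'Weave']
print(levels)  # [('Assess', 'Extract'), ('Weave',)]
```

Any topological order is valid as long as every step follows its dependencies; the execution levels fall out of the same traversal for free, since each pass over the frontier is one level.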

Cross-Reference Resolution

The Problem

flow AnalyzeContract(doc: Document) -> ContractAnalysis {
  step Extract { ... }
}

run AnalyzeContract(myContract.pdf)
  as LegalExpert          # ← Must resolve to PersonaDefinition
  within LegalReview      # ← Must resolve to ContextDefinition
  constrained_by [NoHallucination]  # ← Must resolve to AnchorConstraint

The Solution

Phase 2 of IR generation resolves all name references:
def _resolve_run(self, run: IRRun) -> IRRun:
    # Resolve flow
    resolved_flow = self._resolve_ref(
        run.flow_name, self._flows, "flow", run
    )

    # Resolve persona (optional)
    resolved_persona: IRPersona | None = None
    if run.persona_name:
        resolved_persona = self._resolve_ref(
            run.persona_name, self._personas, "persona", run
        )

    # Resolve context (optional)
    resolved_context: IRContext | None = None
    if run.context_name:
        resolved_context = self._resolve_ref(
            run.context_name, self._contexts, "context", run
        )

    # Resolve anchors (Anchor Enforcer)
    resolved_anchors = tuple(
        self._resolve_ref(name, self._anchors, "anchor", run)
        for name in run.anchor_names
    )

    # Produce a new IRRun with all references resolved
    return IRRun(
        flow_name=run.flow_name,
        resolved_flow=resolved_flow,
        resolved_persona=resolved_persona,
        resolved_context=resolved_context,
        resolved_anchors=resolved_anchors,
        # ...
    )
Error handling:
def _resolve_ref(
    self, name: str, table: dict[str, IRNode], kind: str, referrer: IRRun
) -> IRNode:
    if name not in table:
        available = ", ".join(sorted(table.keys())) or "(none)"
        raise AxonIRError(
            f"Run statement references undefined {kind} '{name}'. "
            f"Available {kind}s: {available}",
            line=referrer.source_line,
            column=referrer.source_column,
        )
    return table[name]

IR Data Structures

IRFlow — Compiled Flow

@dataclass(frozen=True)
class IRFlow(IRNode):
    """Compiled flow — an ordered cognitive pipeline."""
    node_type: str = "flow"
    name: str = ""
    parameters: tuple[IRParameter, ...] = ()
    return_type_name: str = ""
    return_type_generic: str = ""
    return_type_optional: bool = False
    steps: tuple[IRNode, ...] = ()  # Topologically sorted!
    edges: tuple[IRDataEdge, ...] = ()
    execution_levels: tuple[tuple[str, ...], ...] = ()

IRReason — Compiled Reasoning

@dataclass(frozen=True)
class IRReason(IRNode):
    """Compiled reason chain — explicit chain-of-thought directive."""
    node_type: str = "reason"
    name: str = ""
    about: str = ""
    given: tuple[str, ...] = ()  # Always normalized to tuple
    depth: int = 1
    show_work: bool = False
    chain_of_thought: bool = False
    ask: str = ""
    output_type: str = ""

IRRun — Resolved Execution

@dataclass(frozen=True)
class IRRun(IRNode):
    """Compiled run statement — the complete execution binding."""
    node_type: str = "run"
    flow_name: str = ""
    arguments: tuple[str, ...] = ()
    persona_name: str = ""
    context_name: str = ""
    anchor_names: tuple[str, ...] = ()
    
    # Resolved references (populated by IRGenerator)
    resolved_flow: IRFlow | None = None
    resolved_persona: IRPersona | None = None
    resolved_context: IRContext | None = None
    resolved_anchors: tuple[IRAnchor, ...] = ()

Tool Resolution

The IR Generator verifies that all tool references are valid:
def _verify_flow_tools(self, flow: IRFlow, run: IRRun) -> None:
    """Verify that all tool references within a flow's steps
    are resolvable against declared tool definitions."""
    for step_node in flow.steps:
        self._verify_step_tools(step_node, run)

def _verify_step_tools(self, node: IRNode, run: IRRun) -> None:
    if isinstance(node, IRStep):
        if node.use_tool is not None:
            tool_name = node.use_tool.tool_name
            if tool_name and tool_name not in self._tools:
                available = ", ".join(sorted(self._tools.keys())) or "(none)"
                raise AxonIRError(
                    f"Step '{node.name}' uses undefined tool '{tool_name}'. "
                    f"Available tools: {available}"
                )

JSON Serialization

The IR is fully JSON-serializable:
ir_program: IRProgram = ir_generator.generate(program_ast)  # type-checked ast.ProgramNode
ir_dict: dict = ir_program.to_dict()

# Save to file
import json
with open("program.ir.json", "w") as f:
    json.dump(ir_dict, f, indent=2)
Example output:
{
  "node_type": "program",
  "personas": [
    {
      "node_type": "persona",
      "name": "LegalExpert",
      "domain": ["contract law", "IP"],
      "tone": "precise",
      "confidence_threshold": 0.85
    }
  ],
  "flows": [
    {
      "node_type": "flow",
      "name": "AnalyzeContract",
      "steps": [...],
      "edges": [...],
      "execution_levels": [["Extract", "Assess"], ["Weave"]]
    }
  ]
}
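Loading works with plain `json.load`/`json.loads`. A small round-trip sketch (the inline JSON below is an abbreviated example, and note the loaded result is plain dicts and lists, since reconstructing IR node objects from JSON is not part of `to_dict`):

```python
import json

ir_json = """
{
  "node_type": "program",
  "personas": [
    {"node_type": "persona", "name": "LegalExpert",
     "domain": ["contract law", "IP"], "confidence_threshold": 0.85}
  ]
}
"""

ir_dict = json.loads(ir_json)

# Index personas by name for direct lookup
personas = {p["name"]: p for p in ir_dict["personas"]}
print(personas["LegalExpert"]["domain"])  # ['contract law', 'IP']
```

Tuples that were serialized as JSON arrays come back as Python lists, so a loader that rebuilds IR nodes would need to convert them back to tuples.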

Next Steps

Type Checker

See how epistemic types are validated before IR generation

Backend Compilation

Learn how the IR is compiled into LLM-specific prompts
