The assembler converts Intel 4004 assembly source code into executable binary format. It supports standard 4004 instructions plus extended syntax for register operations and macros.

Command line usage

The assembler requires exactly two arguments:
./assembler <input_file> <output_file>
  • input_file (string, required) - Path to the assembly source file (.asm)
  • output_file (string, required) - Path where the assembled binary will be written

Input format

Assembly source files use a line-based format with support for labels, instructions, and directives.

Labels

Labels are identifiers followed by a comma:
START,
LOOP,
Labels automatically receive the address type and can be referenced by instructions.

Instructions

Standard Intel 4004 mnemonics are supported. Instructions can take various argument types:
LDM 5        / Load immediate value
ADD 3R       / Add register 3
JUN START    / Jump to label

Comments

Comments begin with / and continue to end of line:
LDM 10  / Load the value 10

Number formats

The assembler supports multiple number literal formats:
LDM 15

Character literals

Character literals support escape sequences:
  • '\n' - Newline
  • '\t' - Tab
  • '\a' - Bell (0x07)
  • '\d' - Delete (0x7F)
  • '\\' - Backslash
  • '\'' - Single quote
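
The escape table above can be read as a lookup from the character after the backslash to a numeric value. The following Python sketch illustrates that reading; `char_literal_value` is a hypothetical helper, not the assembler's own code, and it assumes a literal is a single ASCII character (or one escape) wrapped in single quotes.

```python
# Escape letters listed in the documentation, mapped to their ASCII codes.
ESCAPES = {
    "n": 0x0A,   # newline
    "t": 0x09,   # tab
    "a": 0x07,   # bell
    "d": 0x7F,   # delete
    "\\": 0x5C,  # backslash
    "'": 0x27,   # single quote
}


def char_literal_value(literal: str) -> int:
    """Return the numeric value of a quoted character literal (hypothetical helper)."""
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not a character literal: {literal!r}")
    body = literal[1:-1]
    if len(body) == 2 and body[0] == "\\":
        # Escaped form: look up the character that follows the backslash.
        if body[1] not in ESCAPES:
            raise ValueError(f"unknown escape: {literal!r}")
        return ESCAPES[body[1]]
    if len(body) == 1 and body != "\\":
        # Plain form: the character's own code.
        return ord(body)
    raise ValueError(f"malformed character literal: {literal!r}")
```

Whether the real assembler rejects unknown escapes or passes them through is not stated in the source; the sketch chooses to reject them.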

Expressions

The assembler supports arithmetic expressions with operators:

Addition operator (+)

Add two values together
JUN START+10

Subtraction operator (-)

Subtract second value from first
JUN END-5

Nibble extraction (@)

Extract a specific nibble (4 bits) from a number
LDM 255@1    / Gets nibble 1 (value 15)
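
Nibble extraction amounts to a shift and a mask. The sketch below uses a hypothetical `nibble` function and assumes nibble 0 is the least significant one; the source example (255@1 = 15) holds under either numbering, so the assembler's actual convention should be checked.

```python
def nibble(value: int, index: int) -> int:
    """Return 4-bit group `index` of `value`.

    Assumption: index 0 is the least significant nibble.
    """
    if value < 0 or index < 0:
        raise ValueError("value and index must be non-negative")
    return (value >> (4 * index)) & 0xF
```

Under that assumption, `nibble(255, 1)` gives 15, matching the `LDM 255@1` example, and `nibble(0x1A3, 0)` gives 3.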

Program counter (*)

Current assembly address
*+4          / Four bytes ahead

Directives

Origin directive

Set the assembly address:
= 256        / Start assembling at address 256
Origin expressions cannot contain labels, only numeric literals and the program counter (*).

Equate directive

Define named constants:
STACK_SIZE = 32
MAX_COUNT = 15
Equates can reference labels and use expressions.

Extended syntax

The assembler provides high-level syntax that expands into multiple instructions.

Arrow operator (->)

Load values into registers:
255 -> 0R 1R     / Load 15 into 0R, 15 into 1R
0R 1R -> 2R 3R   / Copy 0R to 2R, 1R to 3R
The arrow operator expands to LDM/LD and XCH instructions.
Use underscore _ as a placeholder to skip destination registers:
255 -> _ 1R      / Only load into 1R
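
As a rough illustration of how an immediate load might expand, the Python sketch below splits a value into nibbles and emits one LDM/XCH pair per destination, skipping `_` placeholders. The function name `expand_immediate_load` is hypothetical, and the sketch assumes the leftmost register receives the most significant nibble; the source does not state the ordering, and the actual expansion may differ.

```python
def expand_immediate_load(value: int, dests: list[str]) -> list[str]:
    """Return instruction lines loading `value` nibble by nibble into `dests`.

    Assumption: the leftmost destination receives the most significant nibble.
    A destination of "_" is a placeholder and produces no instructions.
    """
    lines = []
    count = len(dests)
    for position, dest in enumerate(dests):
        if dest == "_":
            continue  # placeholder: skip this nibble entirely
        shift = 4 * (count - 1 - position)
        nib = (value >> shift) & 0xF
        lines.append(f"LDM {nib}")   # load the nibble into the accumulator
        lines.append(f"XCH {dest}")  # exchange the accumulator with the register
    return lines
```

For `255 -> _ 1R`, this sketch yields only `LDM 15` followed by `XCH 1R`.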

Add-assign operator (+=)

Add values to registers:
1R 2R += 5       / Add 5 to multi-register value
Expands to CLC, LDM/LD, ADD, and XCH instructions.
Placeholders (_) are not allowed in add-assign operations.

Subtract-assign operator (-=)

Subtract values from registers:
1R 2R -= 3       / Subtract 3 from multi-register value
Expands to CLC, LDM/LD, SUB, XCH, and CMC instructions; the CMC after the final register is omitted.
Placeholders (_) are not allowed in subtract-assign operations.
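
The CMC between steps follows from how the 4004 SUB instruction treats the carry flag: it adds the complement of the register and the complement of the carry, and leaves carry set to 1 when no borrow occurred. The Python model below sketches that arithmetic for a multi-nibble subtraction. It is not the macro's actual expansion; the function names are hypothetical, and the nibble lists are assumed to be ordered least significant first, which the source does not specify for register lists.

```python
def sub_with_borrow(acc: int, reg: int, carry: int) -> tuple[int, int]:
    """Model the 4004 SUB on 4-bit values: acc + ~reg + ~carry.

    Returns (result nibble, carry out). Carry out 1 means no borrow.
    """
    total = acc + (~reg & 0xF) + (1 - carry)
    return total & 0xF, total >> 4


def subtract_nibbles(minuend: list[int], subtrahend: list[int]) -> list[int]:
    """Subtract equal-length nibble lists (least significant nibble first)."""
    if len(minuend) != len(subtrahend):
        raise ValueError("operands must have the same number of nibbles")
    carry = 0  # CLC: start with no borrow
    result = []
    last = len(minuend) - 1
    for i, (a, b) in enumerate(zip(minuend, subtrahend)):
        digit, carry = sub_with_borrow(a, b, carry)
        result.append(digit)
        if i != last:
            carry ^= 1  # CMC: convert "no borrow" (1) into borrow-in form (0)
    return result
```

For example, 0x20 minus 0x03 is `subtract_nibbles([0, 2], [3, 0])`, which returns `[13, 1]`, that is 0x1D.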

Macros

CALL macro

Implements a subroutine call by pushing the return address:
#CALL SUBROUTINE
The expansion saves the return address in three parts (using registers 8R) and then jumps to the target.

LJCN macro

Long conditional jump (conditional jump to any address):
#LJCN Z? TARGET      / Jump to TARGET if zero
#LJCN NC? HANDLER    / Jump to HANDLER if not carry
Supported conditions: Z?, NZ?, C?, and NC?.
The macro expands to a conditional jump over an unconditional jump.

Condition codes

Condition arguments use the ? suffix:
  • Z? - Zero (0b0100)
  • NZ? - Not zero (0b1100)
  • C? - Carry (0b0010)
  • NC? - Not carry (0b1010)
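
The four codes above share a pattern: bit 3 (0b1000) inverts the test, bit 2 tests for a zero accumulator, and bit 1 tests the carry. On the 4004, the JCN opcode byte is 0x10 combined with this condition nibble. The Python sketch below illustrates both points; the names `jcn_first_byte` and `inverted` are hypothetical helpers, not part of the assembler.

```python
# Condition nibbles as listed in the documentation.
CONDITIONS = {"Z?": 0b0100, "NZ?": 0b1100, "C?": 0b0010, "NC?": 0b1010}
INVERT_BIT = 0b1000   # bit 3 inverts the sense of the test
JCN_OPCODE = 0x10     # upper nibble of the JCN instruction byte


def jcn_first_byte(condition: str) -> int:
    """Return the first byte of a JCN instruction for the named condition."""
    return JCN_OPCODE | CONDITIONS[condition]


def inverted(condition: str) -> str:
    """Return the name of the opposite condition by toggling the invert bit."""
    target = CONDITIONS[condition] ^ INVERT_BIT
    for name, code in CONDITIONS.items():
        if code == target:
            return name
    raise KeyError(condition)
```

Toggling the invert bit is the kind of operation a jump-over pattern would need, although the source does not say how the assembler implements it.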

Error handling

The assembler validates:
  • Type mismatches between arguments and instruction requirements
  • Label redefinitions
  • 8-bit jumps across page boundaries
  • Expression overflow
  • Invalid mnemonics and conditions
The assembler performs two-pass assembly: the first pass collects label addresses, and the second pass resolves all references and generates code.
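
To show why two passes are needed, here is a deliberately tiny Python sketch that handles only NOP, LDM, and JUN with labels on their own lines. It is an illustration under those assumptions, not the assembler's implementation; `parse` and `assemble` are hypothetical names. The opcode encodings used (NOP = 0x00, LDM = 0xD0 plus the data nibble, JUN = 0x40 plus the high address nibble followed by the low address byte) are the standard 4004 ones.

```python
SIZES = {"NOP": 1, "LDM": 1, "JUN": 2}  # instruction lengths in bytes


def parse(source: str):
    """Yield ('label', name) or ('op', mnemonic, operand) per non-empty line."""
    for raw in source.splitlines():
        line = raw.split("/", 1)[0].strip()  # drop the comment, if any
        if not line:
            continue
        if line.endswith(","):
            yield ("label", line[:-1])
        else:
            parts = line.split()
            yield ("op", parts[0], parts[1] if len(parts) > 1 else None)


def assemble(source: str) -> bytes:
    items = list(parse(source))

    # Pass 1: walk the program, recording the address of every label.
    labels, address = {}, 0
    for item in items:
        if item[0] == "label":
            if item[1] in labels:
                raise ValueError(f"label redefined: {item[1]}")
            labels[item[1]] = address
        else:
            address += SIZES[item[1]]

    # Pass 2: every label is now known, so forward references resolve.
    out = bytearray()
    for item in items:
        if item[0] == "label":
            continue
        _, mnemonic, operand = item
        if mnemonic == "NOP":
            out.append(0x00)
        elif mnemonic == "LDM":
            out.append(0xD0 | (int(operand) & 0xF))
        elif mnemonic == "JUN":
            target = labels[operand]
            out += bytes([0x40 | (target >> 8), target & 0xFF])
    return bytes(out)
```

A `JUN END` that appears before the `END,` label can only be encoded once the first pass has fixed the address of END, which is the situation the second pass exists to handle.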

Output format

The assembled output is a raw binary file of exactly 4096 bytes, representing the full Intel 4004 program memory space. Unwritten addresses are zero-filled.
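
The fixed-size, zero-filled layout can be sketched in a few lines of Python. The function `build_image` and its input shape (a mapping from start address to assembled bytes) are assumptions made for illustration; only the 4096-byte size and the zero fill come from the description above.

```python
ROM_SIZE = 4096  # full 12-bit program address space


def build_image(chunks: dict[int, bytes]) -> bytes:
    """Place each chunk at its start address in a zero-filled 4096-byte image."""
    image = bytearray(ROM_SIZE)  # bytearray starts out filled with zeros
    for start, data in chunks.items():
        end = start + len(data)
        if start < 0 or end > ROM_SIZE:
            raise ValueError(f"chunk at {start} does not fit in program memory")
        image[start:end] = data
    return bytes(image)
```

An image built this way can be compared byte for byte against the assembler's output file when checking where code placed by an origin directive ends up.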
