
Introduction

Go’s assembler is based on the input style of the Plan 9 assemblers, which differs from traditional assembler syntax. This guide explains Go’s assembly language and how to use it effectively.
Go’s assembler is not a direct representation of the underlying machine: it works on a semi-abstract instruction set. Some details map precisely to hardware, but many are abstracted, and the toolchain performs part of the instruction selection after code generation.

Key Concepts

Semi-Abstract Instructions

The assembler works with semi-abstract instructions:
  • A MOV might not generate a move instruction at all; the toolchain may emit a clear, a load, or another operation instead
  • Or it might correspond exactly to the machine instruction with that name
  • Machine-specific operations tend to appear as themselves
  • General concepts (memory move, subroutine call and return) are more abstract

Viewing Assembly Output

To see what assembly your Go code generates:
# Compile and show assembly
go build -gcflags=-S main.go

# Or use compile tool directly
GOOS=linux GOARCH=amd64 go tool compile -S x.go

# Disassemble compiled binary
go build -o program main.go
go tool objdump -s main.main program
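
To try these commands, a small input file helps. The sketch below is a hypothetical main.go; the add function and the //go:noinline directive are illustrative choices, not part of the original text. The directive is there so the compiler keeps add as a separate function that is easy to find in the listing.

```go
package main

import "fmt"

// add is kept out of line so that it appears as its own symbol
// (main.add) in the -S listing and in the objdump output.
//
//go:noinline
func add(x, y int64) int64 {
	return x + y
}

func main() {
	fmt.Println(add(2, 3)) // prints 5
}
```

With this file, go tool objdump -s main.add program narrows the disassembly to that one function.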

Syntax and Structure

Constants

Constant expressions use Go operator precedence:
// This is 4, not 0 (parsed as (3&1)<<2, not 3&(1<<2))
3&1<<2

// Constants are 64-bit unsigned
-2  // Represented as unsigned 64-bit with same bit pattern
Division or right shift where the right operand’s high bit is set is rejected to avoid ambiguity.
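
Because the assembler borrows Go's operator precedence, both rules above can be checked in ordinary Go. This is a minimal sketch; the function names precedence and asUnsigned are made up for illustration.

```go
package main

import "fmt"

// precedence evaluates 3&1<<2 with Go's rules, where & and << share
// one precedence level and associate left to right: (3&1)<<2.
func precedence() uint64 {
	return 3 & 1 << 2
}

// asUnsigned reinterprets a signed value as the unsigned 64-bit
// integer with the same bit pattern, which is how the assembler
// treats a constant such as -2.
func asUnsigned(v int64) uint64 {
	return uint64(v)
}

func main() {
	fmt.Println(precedence())           // 4
	fmt.Printf("%#x\n", asUnsigned(-2)) // 0xfffffffffffffffe
}
```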

Symbols and Pseudo-Registers

Four predeclared pseudo-registers (same on all architectures):
  • FP (Frame Pointer): Arguments and locals
  • PC (Program Counter): Jumps and branches
  • SB (Static Base): Global symbols
  • SP (Stack Pointer): Top of local stack frame

Static Base (SB)

Used for global functions and data:
// Global function
TEXT runtime·profileloop(SB), NOSPLIT, $8

// Global symbol
MOVQ $runtime·profileloop1(SB), CX

// File-local symbol (like static in C)
TEXT foo<>(SB), NOSPLIT, $0

// Offset from symbol
MOVQ foo+4(SB), AX  // 4 bytes past start of foo

Frame Pointer (FP)

Access function arguments:
// Must use names with offsets
MOVQ first_arg+0(FP), AX    // First argument
MOVQ second_arg+8(FP), BX   // Second argument (64-bit)

// On 32-bit systems, 64-bit values split:
MOVL arg_lo+0(FP), AX
MOVL arg_hi+4(FP), DX
Plain 0(FP) is rejected; you must use a name like arg+0(FP). The assembler enforces the form but does not interpret the name: it serves as documentation and is checked by go vet against the Go prototype.

Stack Pointer (SP)

Access local variables and prepare function calls:
// Negative offsets from SP for locals
MOVQ x-8(SP), AX    // Local variable
MOVQ y-16(SP), BX   // Another local

// Range: [-framesize, 0)
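On architectures with a hardware register named SP, the presence of a symbol name distinguishes the two:
  • x-8(SP): virtual stack pointer pseudo-register (symbol name present)
  • -8(SP): hardware SP register (no symbol name)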
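
The range rule for virtual SP offsets is a half-open interval, which is easy to get wrong at the boundaries. The following sketch checks an offset against a frame size; the helper name inFrame and the 16-byte frame are assumptions chosen for illustration.

```go
package main

import "fmt"

// inFrame reports whether a virtual-SP offset, such as the -8 in
// x-8(SP), falls inside a frame of frameSize bytes, that is, in the
// half-open range [-frameSize, 0).
func inFrame(offset, frameSize int64) bool {
	return offset >= -frameSize && offset < 0
}

func main() {
	const frameSize = 16 // hypothetical frame, as in TEXT ..., $16-0
	for _, off := range []int64{-8, -16, 0, -24} {
		fmt.Printf("offset %d valid: %v\n", off, inFrame(off, frameSize))
	}
}
```

Offsets -8 and -16 are accepted here, while 0 and -24 fall outside the frame.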

Labels and Jumps

label:
    MOVW $0, R1
    JMP label  // Jump to label

// Labels are function-local
// Multiple functions can use same label names
Direct jumps and calls can target text symbols, but not offsets from symbols:
CALL name(SB)      // OK
JMP name(SB)       // OK
JMP name+4(SB)     // ERROR: cannot use offset

Directives

TEXT Directive

Declares a function:
TEXT runtime·profileloop(SB), NOSPLIT, $8
    MOVQ $runtime·profileloop1(SB), CX
    MOVQ CX, 0(SP)
    CALL runtime·externalthreadhandler(SB)
    RET
Format: TEXT symbol(SB), flags, $framesize-argsize
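  • framesize: Local stack frame size in bytes
  • argsize: Size in bytes of the arguments and results, which live on the caller’s frame (the minus sign is a separator, not subtraction)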
  • flags: See textflag.h (NOSPLIT, WRAPPER, etc.)
The last instruction in a TEXT block must be a jump (usually RET). The linker will add a jump-to-itself if missing.

Common Flags

NOSPLIT     = 4    // Don't check for stack split
RODATA      = 8    // Read-only data
NOPTR       = 16   // Contains no pointers (GC)
WRAPPER     = 32   // Wrapper function (for recover)
NEEDCTXT    = 64   // Closure, uses context register
NOFRAME     = 512  // No frame allocation (frame must be $0)
TOPFRAME    = 2048 // Outermost frame (stop traceback)
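
Each flag is a single bit, so several flags are combined with a bitwise OR, for example NOSPLIT|NOFRAME in a TEXT directive. The sketch below mirrors the values listed above in Go to show how a combined mask decomposes; the helper describe is hypothetical and not part of the toolchain.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values copied from the list above.
const (
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
	NOFRAME  = 512
	TOPFRAME = 2048
)

var flagNames = []struct {
	bit  int
	name string
}{
	{NOSPLIT, "NOSPLIT"}, {RODATA, "RODATA"}, {NOPTR, "NOPTR"},
	{WRAPPER, "WRAPPER"}, {NEEDCTXT, "NEEDCTXT"},
	{NOFRAME, "NOFRAME"}, {TOPFRAME, "TOPFRAME"},
}

// describe lists the names of the flags set in mask, joined with "|".
func describe(mask int) string {
	var set []string
	for _, f := range flagNames {
		if mask&f.bit != 0 {
			set = append(set, f.name)
		}
	}
	return strings.Join(set, "|")
}

func main() {
	mask := NOSPLIT | NOFRAME
	fmt.Println(mask, describe(mask)) // 516 NOSPLIT|NOFRAME
}
```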

DATA and GLOBL Directives

Define global data:
// Initialize data
DATA divtab<>+0x00(SB)/4, $0xf4f8fcff
DATA divtab<>+0x04(SB)/4, $0xe6eaedf0
DATA divtab<>+0x3c(SB)/4, $0x81828384

// Declare global symbol
GLOBL divtab<>(SB), RODATA, $64

// Implicitly zeroed variable
GLOBL runtime·tlsoffset(SB), NOPTR, $4
Format: DATA symbol+offset(SB)/width, value

Special Instructions

FUNCDATA and PCDATA

Generated by compiler for GC information:
FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
PCDATA $0, $0
These provide stack map information for the garbage collector.

PCALIGN

Align next instruction:
PCALIGN $32
MOVD $2, R0  // Start of MOVD aligned to 32 bytes
Supported on: arm64, amd64, ppc64, loong64, riscv64
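
The amount of padding implied by an alignment request follows from simple modular arithmetic. This sketch computes it for a hypothetical program counter value; the function name padding is an assumption, and how the toolchain actually fills the gap is not modeled here.

```go
package main

import "fmt"

// padding returns how many bytes must be skipped so that pc becomes a
// multiple of align. align must be greater than zero.
func padding(pc, align uint64) uint64 {
	return (align - pc%align) % align
}

func main() {
	fmt.Println(padding(0x1004, 32)) // 28
	fmt.Println(padding(0x1000, 32)) // 0
}
```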

Interacting with Go

go_asm.h Header

When a package has .s files, go build generates go_asm.h:
// Provides constants for Go types
const_bufSize              // Go const values
reader__size              // Struct sizes
reader_buf                // Field offsets  
reader_r
Usage in assembly:
#include "go_asm.h"

// Access field r of the reader whose address is in BX (amd64)
MOVQ reader_r(BX), AX
This keeps assembly robust to changes in Go type layouts. Always use these constants instead of hard-coding offsets.
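
The values behind these constants can be inspected from Go with the unsafe package. The sketch below assumes a hypothetical reader struct and bufSize constant chosen to match the names above; the real definitions are not given in this guide.

```go
package main

import (
	"fmt"
	"unsafe"
)

const bufSize = 1024 // hypothetical constant, exposed as const_bufSize

// reader is a hypothetical struct chosen to match the names above.
type reader struct {
	buf [bufSize]byte
	r   int
}

// layout returns the values that reader__size, reader_buf, and
// reader_r would describe for this definition on the current platform.
func layout() (size, bufOff, rOff uintptr) {
	var v reader
	return unsafe.Sizeof(v), unsafe.Offsetof(v.buf), unsafe.Offsetof(v.r)
}

func main() {
	size, bufOff, rOff := layout()
	fmt.Println("reader__size =", size)
	fmt.Println("reader_buf   =", bufOff)
	fmt.Println("reader_r     =", rOff)
}
```

The size differs between 32-bit and 64-bit platforms because int does, which is the kind of variation the generated constants absorb.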

Runtime Coordination

Assembly functions need pointer information for GC:
  1. Define Go prototype in a .go file (a declaration with no body; no linkname directive is needed):
func asmFunction(arg1 int, arg2 *byte) int
  2. Implement in assembly:
TEXT ·asmFunction(SB), NOSPLIT, $0-24
    MOVQ arg1+0(FP), AX
    MOVQ arg2+8(FP), BX
    // ... implementation ...
    MOVQ AX, ret+16(FP)
    RET
Rules:
  • Assembly name must not include package (use ·Function not pkg·Function)
  • Always provide Go prototype for pointer safety
  • Mark data with NOPTR if it contains no pointers
  • Use NO_LOCAL_POINTERS if local frame has no pointers

Operand Order

Data flows from left to right, so the destination is the last operand:
MOVQ $0, CX    // Clear CX (0 → CX)
ADDQ AX, BX    // Add AX to BX (AX + BX → BX)
This applies even on architectures whose conventional notation uses the opposite direction.

Architecture-Specific Details

x86 (386 and amd64)

Accessing g and m

#include "go_tls.h"
#include "go_asm.h"

get_tls(CX)
MOVQ g(CX), AX        // Move g into AX
MOVQ g_m(AX), BX      // Move g.m into BX  

Addressing Modes

(DI)(BX*2)            // Address DI + BX*2
64(DI)(BX*2)          // Address DI + BX*2 + 64
// Scale factors: 1, 2, 4, 8 only
In -dynlink or -shared modes, any load or store of a fixed memory location such as a global variable must be assumed to overwrite CX. To be safe in these modes, avoid CX except between memory references.
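
The indexed addressing forms above all reduce to one formula: base + index*scale + displacement. The sketch below computes it in Go with made-up register contents; effectiveAddress is a hypothetical helper, not a toolchain API.

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveAddress computes the address named by disp(base)(index*scale).
func effectiveAddress(base, index, scale uint64, disp int64) (uint64, error) {
	switch scale {
	case 1, 2, 4, 8:
		return base + index*scale + uint64(disp), nil
	}
	return 0, errors.New("scale must be 1, 2, 4, or 8")
}

func main() {
	// Hypothetical register contents: DI = 0x1000, BX = 3.
	addr, err := effectiveAddress(0x1000, 3, 2, 64)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("64(DI)(BX*2) = %#x\n", addr) // 0x1046
}
```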

ARM64

Registers:
  • R18: Platform register, reserved on Apple platforms (named R18_PLATFORM to prevent accidental use)
  • R27, R28: Reserved by compiler/linker
  • R29: Frame pointer
  • R30: Link register
Instruction modifiers:
MOVW.P    // Post-increment
MOVW.W    // Pre-increment
Addressing modes:
R0->16        // Arithmetic right shift
R0>>16        // Logical right shift  
R0<<16        // Left shift
R0@>16        // Rotate right

$(8<<12)      // Immediate with shift
8(R0)         // R0 + 8
(R2)(R0)      // R0 + R2

R0.UXTB       // Zero-extend byte
R0.SXTB       // Sign-extend byte
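
The shift and extension operators above have direct Go equivalents, which can help when checking expected values by hand. This sketch models them on 64-bit values; the function names are illustrative and simply echo the ARM64 mnemonics.

```go
package main

import (
	"fmt"
	"math/bits"
)

// asr models R0->n: arithmetic right shift (sign bit is replicated).
func asr(x uint64, n uint) uint64 { return uint64(int64(x) >> n) }

// lsr models R0>>n: logical right shift (zeros are shifted in).
func lsr(x uint64, n uint) uint64 { return x >> n }

// lsl models R0<<n: left shift.
func lsl(x uint64, n uint) uint64 { return x << n }

// ror models R0@>n: rotate right.
func ror(x uint64, n int) uint64 { return bits.RotateLeft64(x, -n) }

// uxtb models R0.UXTB: zero-extend the low byte.
func uxtb(x uint64) uint64 { return uint64(uint8(x)) }

// sxtb models R0.SXTB: sign-extend the low byte.
func sxtb(x uint64) uint64 { return uint64(int64(int8(x))) }

func main() {
	x := uint64(0x80000000000000FF)
	fmt.Printf("%#x\n", asr(x, 16)) // 0xffff800000000000
	fmt.Printf("%#x\n", lsr(x, 16)) // 0x800000000000
	fmt.Printf("%#x\n", lsl(x, 16)) // 0xff0000
	fmt.Printf("%#x\n", ror(x, 16)) // 0xff800000000000
	fmt.Printf("%#x\n", uxtb(x))    // 0xff
	fmt.Printf("%#x\n", sxtb(x))    // 0xffffffffffffffff
}
```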

ARM (32-bit)

Registers:
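  • R10: Points to g (goroutine structure); refer to it as g, since the name R10 is not recognized
  • R11: Reserved for linker temporaries
  • R13: Hardware SP; write R13 for the hardware register, because SP always names the virtual stack pointer
Special:
  • Frame size $-4 tells linker not to save LR (leaf function)
  • Condition codes and modifiers are appended after a period, in any order: MOVW.EQ, MOVM.IA.W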

Writing Assembly Functions

Complete Example

Go declaration:
package main

// add is implemented in assembly; the declaration has no body.
func add(x, y int64) int64
Assembly implementation:
#include "textflag.h"

// func add(x, y int64) int64
TEXT ·add(SB), NOSPLIT, $0-24
    MOVQ x+0(FP), AX
    MOVQ y+8(FP), BX
    ADDQ BX, AX
    MOVQ AX, ret+16(FP)
    RET
Frame and argument size calculation:
  • 2 arguments × 8 bytes = 16 bytes
  • 1 return value × 8 bytes = 8 bytes
  • Total: $0-24 (no locals, 24 byte args+results)
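
The same arithmetic can be sketched as a small layout routine. It handles only scalar values of 1, 2, 4, or 8 bytes and assumes the stack-based convention used in this guide: each value is aligned to its own size (capped at the word size) and results start on a word boundary. The function names are hypothetical, and go vet remains the authority on real offsets.

```go
package main

import "fmt"

// align rounds off up to a multiple of a (a must be a power of two).
func align(off, a int) int { return (off + a - 1) &^ (a - 1) }

// layout assigns FP offsets to scalar arguments and results of the
// given byte sizes and returns them together with the total argsize.
func layout(args, results []int, wordSize int) (argOff, resOff []int, total int) {
	place := func(size int) int {
		a := size
		if a > wordSize {
			a = wordSize // assumption: alignment is capped at the word size
		}
		total = align(total, a)
		off := total
		total += size
		return off
	}
	for _, s := range args {
		argOff = append(argOff, place(s))
	}
	if len(results) > 0 {
		total = align(total, wordSize) // assumption: results start word-aligned
	}
	for _, s := range results {
		resOff = append(resOff, place(s))
	}
	return argOff, resOff, total
}

func main() {
	// func add(x, y int64) int64 on a 64-bit target.
	fmt.Println(layout([]int{8, 8}, []int{8}, 8)) // [0 8] [16] 24
}
```

For add this reproduces x+0(FP), y+8(FP), ret+16(FP) and the 24 in $0-24.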

Using BYTE and WORD

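For opcodes the assembler does not support, lay the encoding down directly in the instruction stream (this example is 386 code):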
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
    MOVL ptr+0(FP), AX
    LEAL ret_lo+4(FP), BX
    
    // MOVQ (%EAX), %MM0
    BYTE $0x0f; BYTE $0x6f; BYTE $0x00
    
    // MOVQ %MM0, 0(%EBX)
    BYTE $0x0f; BYTE $0x7f; BYTE $0x03
    
    // EMMS
    BYTE $0x0F; BYTE $0x77
    RET
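
The third byte of each hand-encoded MOVQ above is an x86 ModRM byte, which selects the operands. As a rough check on such encodings, the sketch below splits a ModRM byte into its fields; the helper modRM and the table reg32 are illustrative names, not part of any Go package.

```go
package main

import "fmt"

// modRM splits an x86 ModRM byte into its mod, reg, and rm fields.
func modRM(b byte) (mod, reg, rm byte) {
	return b >> 6, (b >> 3) & 7, b & 7
}

// reg32 holds the 32-bit general-purpose register names, indexed by
// their hardware encoding.
var reg32 = [8]string{"EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"}

func main() {
	for _, b := range []byte{0x00, 0x03} {
		mod, reg, rm := modRM(b)
		// With mod == 0 and these rm values, the operand is the
		// memory location addressed by the rm register.
		fmt.Printf("ModRM %#x: mod=%d reg=MM%d rm=(%s)\n", b, mod, reg, reg32[rm])
	}
}
```

Byte 0x00 selects MM0 and (EAX), and 0x03 selects MM0 and (EBX), matching the comments in the listing.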

Best Practices

  1. Always provide Go prototypes for pointer safety and go vet checking
  2. Use go_asm.h constants instead of hard-coding offsets
  3. Mark nosplit functions appropriately and keep them small
  4. Document why assembly is needed in comments
  5. Test thoroughly - assembly bypasses safety checks
  6. Use NOPTR for data without pointers to help GC
  7. Avoid architecture-specific code when possible - use Go instead
Assembly code bypasses Go’s type safety and bounds checking. Use only when necessary for performance or to access features not available in Go.
