Introduction
Go’s assembler is based on the Plan 9 assembler syntax, which differs from traditional assemblers. It operates on a semi-abstract instruction set rather than providing direct access to machine instructions. This guide explains Go’s assembly language and how to use it effectively.
Go’s assembler is not a direct representation of the underlying machine. Some details map precisely to hardware, but others do not: the compiler emits this semi-abstract form, and instruction selection happens partly after code generation.
Key Concepts
Semi-Abstract Instructions
The assembler works with semi-abstract instructions:
- A MOV might not generate a move instruction at all; it could become a clear, a load, or another operation
- Machine-specific operations tend to appear as themselves
- General concepts (memory move, calls) are more abstract
Viewing Assembly Output
To see what assembly your Go code generates:
# Compile and show assembly
go build -gcflags=-S main.go
# Or use compile tool directly
GOOS=linux GOARCH=amd64 go tool compile -S x.go
# Disassemble compiled binary
go build -o program main.go
go tool objdump -s main.main program
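A small file to try these commands on might look like the sketch below. The function name add and the //go:noinline directive are choices made here for illustration; the directive only keeps the function visible as its own symbol.

```go
package main

import "fmt"

// add is kept out of line so that it shows up as its own symbol
// (main.add) in the compiler listing and in objdump output.
//
//go:noinline
func add(x, y int64) int64 {
	return x + y
}

func main() {
	fmt.Println(add(2, 3)) // prints 5
}
```

With this saved as main.go, go build -gcflags=-S main.go includes a listing for main.add, and go tool objdump -s main.add program shows its disassembly in the built binary.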
Syntax and Structure
Constants
Constant expressions use Go operator precedence:
// This is 4, not 0 (parsed as (3&1)<<2, not 3&(1<<2))
3&1<<2
// Constants are 64-bit unsigned
-2 // Represented as unsigned 64-bit with same bit pattern
Division or right shift where the right operand’s high bit is set is rejected to avoid ambiguity.
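Because the assembler borrows Go's precedence rules, the two points above can be checked in plain Go. This is a minimal sketch; the helper names goGrouping, cGrouping and asUnsigned64 are made up for the example.

```go
package main

import "fmt"

// goGrouping evaluates 3&1<<2 as Go and the assembler parse it:
// & and << share a precedence level and associate left to right.
func goGrouping() uint64 {
	return 3 & 1 << 2
}

// cGrouping evaluates the same tokens with C-style grouping, where
// << binds more tightly than &.
func cGrouping() uint64 {
	return 3 & (1 << 2)
}

// asUnsigned64 returns the unsigned 64-bit value that has the same
// bit pattern as v, which is how the assembler treats -2.
func asUnsigned64(v int64) uint64 {
	return uint64(v)
}

func main() {
	fmt.Println(goGrouping())             // 4
	fmt.Println(cGrouping())              // 0
	fmt.Printf("%#x\n", asUnsigned64(-2)) // 0xfffffffffffffffe
}
```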
Symbols and Pseudo-Registers
Four predeclared pseudo-registers (same on all architectures):
- FP (Frame Pointer): Arguments and locals
- PC (Program Counter): Jumps and branches
- SB (Static Base): Global symbols
- SP (Stack Pointer): Highest address within the local stack frame
Static Base (SB)
Used for global functions and data:
// Global function
TEXT runtime·profileloop(SB), NOSPLIT, $8
// Global symbol
MOVQ $runtime·profileloop1(SB), CX
// File-local symbol (like static in C)
TEXT foo<>(SB), NOSPLIT, $0
// Offset from symbol
MOVQ foo+4(SB), AX // 4 bytes past start of foo
Frame Pointer (FP)
Access function arguments:
// Must use names with offsets
MOVQ first_arg+0(FP), AX // First argument
MOVQ second_arg+8(FP), BX // Second argument (64-bit)
// On 32-bit systems, 64-bit values split:
MOVL arg_lo+0(FP), AX
MOVL arg_hi+4(FP), DX
Plain 0(FP) is rejected; you must write a name, as in arg+0(FP). The assembler ignores the name itself, but it documents the argument and lets go vet check the reference against the Go prototype.
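The split of a 64-bit argument into arg_lo and arg_hi can be modelled in Go. The sketch assumes a little-endian 32-bit target such as 386, so the low word sits at the lower offset; splitWords and joinWords are hypothetical helper names.

```go
package main

import "fmt"

// splitWords returns the two 32-bit halves of v in the order they
// occupy memory on a little-endian target: lo at +0, hi at +4.
func splitWords(v uint64) (lo, hi uint32) {
	return uint32(v), uint32(v >> 32)
}

// joinWords rebuilds the 64-bit value from its two halves.
func joinWords(lo, hi uint32) uint64 {
	return uint64(hi)<<32 | uint64(lo)
}

func main() {
	lo, hi := splitWords(0x1122334455667788)
	fmt.Printf("arg_lo+0(FP) = %#x\n", lo) // 0x55667788
	fmt.Printf("arg_hi+4(FP) = %#x\n", hi) // 0x11223344
	fmt.Printf("%#x\n", joinWords(lo, hi)) // 0x1122334455667788
}
```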
Stack Pointer (SP)
Access local variables and prepare function calls:
// Negative offsets from SP for locals
MOVQ x-8(SP), AX // Local variable
MOVQ y-16(SP), BX // Another local
// Range: [-framesize, 0)
On architectures with a hardware register named SP, the name prefix distinguishes the two:
x-8(SP) // Virtual stack pointer pseudo-register
-8(SP) // Hardware SP register
Labels and Jumps
label:
MOVW $0, R1
JMP label // Jump to label
// Labels are function-local
// Multiple functions can use same label names
Direct jumps and calls can target text symbols, but not offsets from symbols:
CALL name(SB) // OK
JMP name(SB) // OK
JMP name+4(SB) // ERROR: cannot use offset
Directives
TEXT Directive
Declares a function:
TEXT runtime·profileloop(SB), NOSPLIT, $8
MOVQ $runtime·profileloop1(SB), CX
MOVQ CX, 0(SP)
CALL runtime·externalthreadhandler(SB)
RET
Format: TEXT symbol(SB), flags, $framesize-argsize
- framesize: Local stack frame size
- argsize: Size of the arguments and results, which live on the caller’s frame (the - is a separator, not a subtraction)
- flags: See textflag.h (NOSPLIT, WRAPPER, etc.)
The last instruction in a TEXT block must be a jump (usually RET). The linker will add a jump-to-itself if missing.
Common Flags
NOSPLIT = 4 // Don't check for stack split
RODATA = 8 // Read-only data
NOPTR = 16 // Contains no pointers (GC)
WRAPPER = 32 // Wrapper function (for recover)
NEEDCTXT = 64 // Closure, uses context register
NOFRAME = 512 // No frame allocation (frame must be $0)
TOPFRAME = 2048 // Outermost frame (stop traceback)
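Each flag is a single bit, so several can be combined with | in a TEXT or GLOBL directive (for example NOSPLIT|NOFRAME). The Go sketch below reuses the values listed above to show the arithmetic; flagNames is a hypothetical helper, not part of any toolchain API.

```go
package main

import "fmt"

// Flag values copied from the list above (defined in textflag.h).
const (
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
	NOFRAME  = 512
	TOPFRAME = 2048
)

// flagNames lists the names of the flags set in a combined flag
// word, in ascending bit order.
func flagNames(flags int) []string {
	table := []struct {
		bit  int
		name string
	}{
		{NOSPLIT, "NOSPLIT"},
		{RODATA, "RODATA"},
		{NOPTR, "NOPTR"},
		{WRAPPER, "WRAPPER"},
		{NEEDCTXT, "NEEDCTXT"},
		{NOFRAME, "NOFRAME"},
		{TOPFRAME, "TOPFRAME"},
	}
	var names []string
	for _, entry := range table {
		if flags&entry.bit != 0 {
			names = append(names, entry.name)
		}
	}
	return names
}

func main() {
	combined := NOSPLIT | NOFRAME
	fmt.Println(combined)            // 516
	fmt.Println(flagNames(combined)) // [NOSPLIT NOFRAME]
}
```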
DATA and GLOBL Directives
Define global data:
// Initialize data
DATA divtab<>+0x00(SB)/4, $0xf4f8fcff
DATA divtab<>+0x04(SB)/4, $0xe6eaedf0
DATA divtab<>+0x3c(SB)/4, $0x81828384
// Declare global symbol
GLOBL divtab<>(SB), RODATA, $64
// Implicitly zeroed variable
GLOBL runtime·tlsoffset(SB), NOPTR, $4
Format: DATA symbol+offset(SB)/width, value
Special Instructions
FUNCDATA and PCDATA
Generated by compiler for GC information:
FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
PCDATA $0, $0
These provide stack map information for the garbage collector.
PCALIGN
Align next instruction:
PCALIGN $32
MOVD $2, R0 // Start of MOVD aligned to 32 bytes
Supported on: arm64, amd64, ppc64, loong64, riscv64
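The address arithmetic behind an alignment request can be sketched in Go. This models only where the next instruction would start, assuming a power-of-two alignment; how the assembler fills the gap is not shown, and alignUp is a hypothetical helper.

```go
package main

import "fmt"

// alignUp rounds pc up to the next multiple of align.
// It assumes align is a power of two, as in PCALIGN $32.
func alignUp(pc, align uint64) uint64 {
	return (pc + align - 1) &^ (align - 1)
}

func main() {
	pc := uint64(0x1004)
	next := alignUp(pc, 32)
	fmt.Printf("%#x %d\n", next, next-pc)    // 0x1020 28
	fmt.Printf("%#x\n", alignUp(0x1020, 32)) // 0x1020
}
```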
Interacting with Go
When a package has .s files, go build generates go_asm.h:
// Provides constants for Go types
const_bufSize // Go const values
reader__size // Struct sizes
reader_buf // Field offsets
reader_r
Usage in assembly:
#include "go_asm.h"
// Load field r of the reader whose address is in BX (amd64)
MOVQ reader_r(BX), AX
This keeps assembly robust to changes in Go type layouts. Always use these constants instead of hard-coding offsets.
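The names above would correspond to Go declarations along these lines. The definitions of bufSize and reader are assumed here for illustration, and readerLayout is a hypothetical helper that computes the same numbers with package unsafe.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Assumed Go definitions behind the go_asm.h names shown above.
const bufSize = 1024

type reader struct {
	buf [bufSize]byte
	r   int
}

// readerLayout returns the values that would appear as reader__size,
// reader_buf and reader_r for the struct above.
func readerLayout() (size, bufOff, rOff uintptr) {
	var rd reader
	return unsafe.Sizeof(rd), unsafe.Offsetof(rd.buf), unsafe.Offsetof(rd.r)
}

func main() {
	size, bufOff, rOff := readerLayout()
	fmt.Println("const_bufSize =", bufSize)
	fmt.Println("reader__size  =", size) // 1032 on 64-bit targets
	fmt.Println("reader_buf    =", bufOff)
	fmt.Println("reader_r      =", rOff)
}
```

If a field were later inserted before r, reader_r would change automatically, whereas a hard-coded 1024 in the .s file would silently read the wrong field.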
Runtime Coordination
Assembly functions need pointer information for GC:
- Declare a Go prototype (a function with no body) in a .go file of the same package:
func asmFunction(arg1 int, arg2 *byte) int
- Implement in assembly:
TEXT ·asmFunction(SB), NOSPLIT, $0-24
MOVQ arg1+0(FP), AX
MOVQ arg2+8(FP), BX
// ... implementation ...
MOVQ AX, ret+16(FP)
RET
Rules:
- Assembly name must not include package (use ·Function not pkg·Function)
- Always provide Go prototype for pointer safety
- Mark data with NOPTR if it contains no pointers
- Use NO_LOCAL_POINTERS if local frame has no pointers
Operand Order
Data flow is left to right:
MOVQ $0, CX // Clear CX (0 → CX)
ADDQ AX, BX // Add AX to BX (AX + BX → BX)
This applies even on architectures with opposite conventional notation.
Architecture-Specific Details
x86 (386 and amd64)
Accessing g and m
#include "go_tls.h"
#include "go_asm.h"
get_tls(CX)
MOVQ g(CX), AX // Move g into AX
MOVQ g_m(AX), BX // Move g.m into BX
Addressing Modes
(DI)(BX*2) // Address DI + BX*2
64(DI)(BX*2) // Address DI + BX*2 + 64
// Scale factors: 1, 2, 4, 8 only
In -dynlink or -shared modes, any load or store of a fixed memory location such as a global variable must be assumed to overwrite CX. To stay safe in these modes, avoid CX except between memory references.
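The indexed addressing forms above reduce to simple arithmetic, sketched here in Go. The helper effectiveAddr and the register values are invented for the example; only the base + index*scale + displacement rule and the allowed scale factors come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveAddr computes the address named by disp(base)(index*scale)
// and rejects scale factors other than 1, 2, 4 and 8.
func effectiveAddr(base, index uint64, scale uint8, disp int64) (uint64, error) {
	switch scale {
	case 1, 2, 4, 8:
		return base + index*uint64(scale) + uint64(disp), nil
	default:
		return 0, errors.New("scale must be 1, 2, 4, or 8")
	}
}

func main() {
	// 64(DI)(BX*2) with DI = 0x1000 and BX = 3.
	addr, err := effectiveAddr(0x1000, 3, 2, 64)
	fmt.Printf("%#x %v\n", addr, err) // 0x1046 <nil>

	// A scale of 3 cannot be encoded.
	_, err = effectiveAddr(0x1000, 3, 3, 0)
	fmt.Println(err) // scale must be 1, 2, 4, or 8
}
```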
ARM64
Registers:
- R18: Platform register (reserved on Apple)
- R27, R28: Reserved by compiler/linker
- R29: Frame pointer
- R30: Link register
Instruction modifiers:
MOVW.P // Post-increment
MOVW.W // Pre-increment
Addressing modes:
R0->16 // Arithmetic right shift
R0>>16 // Logical right shift
R0<<16 // Left shift
R0@>16 // Rotate right
$(8<<12) // Immediate with shift
8(R0) // R0 + 8
(R2)(R0) // R0 + R2
R0.UXTB // Zero-extend byte
R0.SXTB // Sign-extend byte
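The register shift and extension operators have direct Go equivalents, which may help when reading such operands. This is a sketch on 64-bit values; the function names are made up to mirror the operators and are not assembler syntax.

```go
package main

import (
	"fmt"
	"math/bits"
)

// asr models R0->n: arithmetic right shift, copying the sign bit.
func asr(x uint64, n uint) uint64 { return uint64(int64(x) >> n) }

// lsr models R0>>n: logical right shift, filling with zeros.
func lsr(x uint64, n uint) uint64 { return x >> n }

// lsl models R0<<n: left shift.
func lsl(x uint64, n uint) uint64 { return x << n }

// ror models R0@>n: rotate right.
func ror(x uint64, n uint) uint64 { return bits.RotateLeft64(x, -int(n)) }

// uxtb models R0.UXTB: zero-extend the low byte.
func uxtb(x uint64) uint64 { return uint64(uint8(x)) }

// sxtb models R0.SXTB: sign-extend the low byte.
func sxtb(x uint64) uint64 { return uint64(int64(int8(x))) }

func main() {
	x := uint64(0x8000000000000180)
	fmt.Printf("%#x\n", asr(x, 16)) // 0xffff800000000000
	fmt.Printf("%#x\n", lsr(x, 16)) // 0x800000000000
	fmt.Printf("%#x\n", lsl(x, 16)) // 0x1800000
	fmt.Printf("%#x\n", ror(x, 16)) // 0x180800000000000
	fmt.Printf("%#x\n", uxtb(x))    // 0x80
	fmt.Printf("%#x\n", sxtb(x))    // 0xffffffffffffff80
}
```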
ARM (32-bit)
Registers:
- R10: Points to g (goroutine structure) - use g not R10
- R11: Reserved for linker temps
- R13: Hardware SP (use R13, not SP)
Special:
- Frame size $-4 tells linker not to save LR (leaf function)
- Condition codes append to instruction: MOVW.EQ, MOVM.IA.W
Writing Assembly Functions
Complete Example
Go declaration:
package main
// add is implemented in assembly; the declaration has no body.
func add(x, y int64) int64
Assembly implementation:
#include "textflag.h"
// func add(x, y int64) int64
TEXT ·add(SB), NOSPLIT, $0-24
MOVQ x+0(FP), AX
MOVQ y+8(FP), BX
ADDQ BX, AX
MOVQ AX, ret+16(FP)
RET
Frame size calculation:
- 2 arguments × 8 bytes = 16 bytes
- 1 return value × 8 bytes = 8 bytes
- Total: $0-24 (no locals, 24 byte args+results)
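The same numbers can be derived mechanically by modelling the argument area as a Go struct, arguments first and result last. This is an illustration of the offsets used in the example, not a description of how the toolchain represents frames; addFrame and addFrameLayout are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame models the argument area of func add(x, y int64) int64.
type addFrame struct {
	x   int64
	y   int64
	ret int64
}

// addFrameLayout returns the FP offsets of x, y and ret, plus the
// total argument size (the number after the hyphen in $0-24).
func addFrameLayout() (xOff, yOff, retOff, argSize uintptr) {
	var f addFrame
	return unsafe.Offsetof(f.x), unsafe.Offsetof(f.y), unsafe.Offsetof(f.ret), unsafe.Sizeof(f)
}

func main() {
	xOff, yOff, retOff, argSize := addFrameLayout()
	fmt.Printf("x+%d(FP) y+%d(FP) ret+%d(FP) $0-%d\n", xOff, yOff, retOff, argSize)
	// prints: x+0(FP) y+8(FP) ret+16(FP) $0-24
}
```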
Using BYTE and WORD
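BYTE and WORD lay down raw data in the instruction stream, which lets you encode opcodes the assembler does not support. This example is for 386: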
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
MOVL ptr+0(FP), AX
LEAL ret_lo+4(FP), BX
// MOVQ (%EAX), %MM0
BYTE $0x0f; BYTE $0x6f; BYTE $0x00
// MOVQ %MM0, 0(%EBX)
BYTE $0x0f; BYTE $0x7f; BYTE $0x03
// EMMS
BYTE $0x0F; BYTE $0x77
RET
Best Practices
- Always provide Go prototypes for pointer safety and go vet checking
- Use go_asm.h constants instead of hard-coding offsets
- Mark nosplit functions appropriately and keep them small
- Document why assembly is needed in comments
- Test thoroughly - assembly bypasses safety checks
- Use NOPTR for data without pointers to help GC
- Avoid architecture-specific code when possible - use Go instead
Assembly code bypasses Go’s type safety and bounds checking. Use only when necessary for performance or to access features not available in Go.
References