Introduction

Welcome to the Performance Challenge

This is Anthropic’s original performance engineering take-home, now open for you to try! The challenge features a custom VLIW (Very Large Instruction Word) SIMD architecture simulator where you optimize a kernel that performs parallel tree traversal with hashing operations.

This repo contains the original 4-hour take-home baseline. After Claude Opus 4.5 started beating humans at the 2-hour version, Anthropic moved to different baselines for time-limited assessments.

Beat Claude Opus 4.5

Now you can compete against Claude’s best performance with unlimited time:

2164 cycles: Claude Opus 4 after many hours
1790 cycles: Claude Opus 4.5 in a casual session (matches the best human 2-hour performance)
1579 cycles: Claude Opus 4.5 after 2 hours
1487 cycles: Claude Opus 4.5 after 11.5 hours
1363 cycles: Claude Opus 4.5 in an improved harness

If you optimize below 1487 cycles, email your code to [email protected] - Anthropic wants to be impressed!

Key Features

VLIW SIMD Simulator

Custom architecture with multiple execution engines and vector operations

Performance Tiers

Multiple benchmark levels from baseline to world-class optimization

Perfetto Tracing

Hot-reloading trace visualization for debugging instruction execution

Frozen Test Harness

Immutable test suite ensures fair performance comparison

The Challenge

The task is to optimize KernelBuilder.build_kernel() to minimize cycle count on a simulated VLIW SIMD machine. The kernel performs:

Tree traversal: Navigate a binary tree based on input values
Hashing: Apply a multi-stage 32-bit hash function
Branch selection: Choose left or right child based on hash parity
Wrapping: Handle out-of-bounds indices
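
To make the four steps above concrete, here is a minimal scalar sketch in Python. It is not the repository's reference kernel: the hash (`toy_hash`), the implicit-array tree layout (children of node `i` at `2*i + 1` and `2*i + 2`), the even/odd mapping to left/right, and the wrap-to-root policy are all assumptions chosen for illustration.

```python
from __future__ import annotations

MASK32 = 0xFFFFFFFF


def toy_hash(x: int) -> int:
    """Placeholder multi-stage 32-bit mix. NOT the challenge's hash function."""
    x &= MASK32
    x = ((x ^ (x >> 16)) * 0x45D9F3B) & MASK32  # stage 1
    x = ((x ^ (x >> 16)) * 0x45D9F3B) & MASK32  # stage 2
    return x ^ (x >> 16)                        # stage 3


def step(idx: int, val: int, tree: list[int]) -> tuple[int, int]:
    """Advance one element by one level of the tree."""
    # Hashing: mix the running value with the current node (assumed combination).
    val = toy_hash(val ^ tree[idx])
    # Branch selection: hash parity picks the child (assumed: even -> left).
    child = 2 * idx + (1 if val % 2 == 0 else 2)
    # Wrapping: an out-of-bounds index restarts at the root (assumed policy).
    if child >= len(tree):
        child = 0
    return child, val


def run(tree: list[int], values: list[int], rounds: int) -> tuple[list[int], list[int]]:
    """Scalar loop over every element and every round."""
    indices = [0] * len(values)
    values = list(values)
    for _ in range(rounds):
        for i in range(len(values)):
            indices[i], values[i] = step(indices[i], values[i], tree)
    return indices, values
```

Each element's traversal is independent of the others, which is what leaves room for the vector and VLIW techniques listed next.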

The baseline implementation uses scalar operations. You can exploit:

VLIW parallelism: Pack multiple operations into instruction slots
SIMD vectorization: Process multiple elements with vector instructions
Memory optimization: Smart use of scratch space and memory layout
Instruction scheduling: Minimize pipeline stalls and dependencies
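
As a rough illustration of VLIW packing and instruction scheduling, the sketch below greedily places operations into per-cycle bundles while respecting slot limits and data dependencies. It is independent of the simulator: the `Op` type, the engine names, and the slot limits are hypothetical. It also assumes each destination is written only once and that a result becomes readable in the cycle after it is produced.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    engine: str                   # hypothetical engine name, e.g. "alu" or "load"
    dest: str                     # name of the value this op produces
    srcs: tuple[str, ...] = ()    # names of the values this op reads


# Hypothetical per-cycle slot limits; the real simulator defines its own.
SLOT_LIMITS = {"alu": 2, "load": 1}


def pack(ops: list[Op], limits: dict[str, int]) -> list[list[Op]]:
    """Place each op, in program order, in the earliest bundle that has a
    free slot on its engine and comes after all of its producers."""
    bundles: list[list[Op]] = []
    ready_at: dict[str, int] = {}  # value name -> first bundle that may read it
    for op in ops:
        b = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        while True:
            if b == len(bundles):
                bundles.append([])
            used = sum(1 for o in bundles[b] if o.engine == op.engine)
            if used < limits[op.engine]:
                break
            b += 1
        bundles[b].append(op)
        ready_at[op.dest] = b + 1
    return bundles


example = [
    Op("load", "a"),
    Op("load", "b"),
    Op("alu", "c", ("a", "b")),
    Op("alu", "d", ("a",)),
    Op("alu", "e", ("c", "d")),
]
bundles = pack(example, SLOT_LIMITS)
for cycle, bundle in enumerate(bundles):
    print(cycle, [op.dest for op in bundle])
```

In this example the five operations fit into four bundles because `d` depends only on `a` and can share a cycle with the second load. A scheduler for the actual challenge would also need to account for whatever hazards and latencies the simulator defines.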

Quick Start

Get the code

Clone the repository from GitHub:

git clone https://github.com/anthropics/original_performance_takehome.git
cd original_performance_takehome

Run the baseline

Test the unoptimized kernel:

python perf_takehome.py Tests.test_kernel_cycles

Expected output: 147734 cycles (baseline)
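
For a sense of scale, the short script below converts the cycle counts listed under "Beat Claude Opus 4.5" into speedup factors over this baseline, computed as baseline cycles divided by optimized cycles. The function and variable names are illustrative only and are not part of the repository.

```python
BASELINE_CYCLES = 147_734  # baseline cycle count quoted above

# Cycle counts quoted in the "Beat Claude Opus 4.5" list.
TARGETS = {
    "Claude Opus 4 after many hours": 2164,
    "Claude Opus 4.5 in a casual session": 1790,
    "Claude Opus 4.5 after 2 hours": 1579,
    "Claude Opus 4.5 after 11.5 hours": 1487,
    "Claude Opus 4.5 in an improved harness": 1363,
}


def speedup(cycles: int, baseline: int = BASELINE_CYCLES) -> float:
    """Return how many times faster a kernel is than the baseline."""
    return baseline / cycles


for label, cycles in TARGETS.items():
    print(f"{cycles:>5} cycles  {speedup(cycles):6.1f}x  {label}")
```

By this measure, getting below 1487 cycles means running roughly 99 times faster than the baseline.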

Validate your optimization

After making changes, verify correctness and check your cycle count:

python tests/submission_tests.py

What’s Next?

Understand the Task

Learn what you need to optimize and the rules

Architecture Overview

Deep dive into the VLIW SIMD simulator design

Kernel Builder API

Reference for building optimized kernels

Debugging Tools

Learn to use Perfetto trace visualization

Warning: Don’t Cheat

LLMs have been known to modify tests to make the problem easier. Do not modify the tests/ folder. Always validate with:

# Verify tests are unchanged
git diff origin/main tests/

# Run submission tests
python tests/submission_tests.py

The challenge is designed to test real optimization skills, not test manipulation!
