Performance tuning

FastrAPI is built for speed from the ground up, leveraging Rust’s performance and Axum’s efficiency. This guide will help you maximize the performance of your FastrAPI applications.

Benchmark results

FastrAPI consistently outperforms traditional Python web frameworks. Here are real-world benchmarks using k6 load testing:

Test environment

Kernel: 6.16.8-arch3-1
CPU: AMD Ryzen 7 7735HS (16 cores, 4.83 GHz)
Memory: 15 GB
Load Test: 20 Virtual Users (VUs), 30 seconds

Performance comparison

Framework	Avg Latency (ms)	Median Latency (ms)	Requests/sec	P95 Latency (ms)	P99 Latency (ms)
FASTRAPI	0.59	0.00	31360	2.39	11.12
FastAPI + Guvicorn (workers: 1)	21.08	19.67	937	38.47	93.42
FastAPI + Guvicorn (workers: 16)	4.84	4.17	3882	10.22	81.20

FastrAPI handles 31,360 requests per second with sub-millisecond latency, making it approximately 33x faster than FastAPI + Guvicorn with 1 worker.

Fast-path optimization

FastrAPI includes an intelligent “fast-path” optimization that detects simple endpoints with no dependencies, validation, or complex parameters. These endpoints skip unnecessary processing entirely.

How it works

When you register a route, FastrAPI analyzes the function signature at decorator time:

from fastrapi import FastrAPI

app = FastrAPI()

# Fast-path: no parameters, no validation
@app.get("/health")
def health_check():
    return {"status": "ok"}

# Slow-path: requires parameter validation
@app.get("/user/{user_id}")
def get_user(user_id: int):
    return {"user_id": user_id}

The health_check endpoint sets is_fast_path = true and bypasses:

Dependency resolution
Parameter validation
Kwargs construction
Type coercion

Optimization tips

Minimize dependencies for hot paths

Keep frequently-accessed endpoints simple. Avoid unnecessary dependencies for routes that need maximum speed.

# Slower: dependency overhead
@app.get("/stats")
def stats(db = Depends(get_db)):
    return get_cached_stats()

# Faster: no dependencies
@app.get("/stats")
def stats():
    return get_cached_stats()

Use appropriate response types

Explicitly declare response types to avoid runtime type detection:

from fastrapi.responses import JSONResponse

@app.get("/data")
def get_data() -> JSONResponse:
    return JSONResponse({"key": "value"})

Leverage concurrent hashmap routing

FastrAPI uses the papaya concurrent hashmap for O(1) route lookups, which scales efficiently even with 10,000+ routes. Don’t hesitate to create many endpoints.

# Scales linearly - no performance penalty
for i in range(10000):
    @app.get(f"/endpoint_{i}")
    def handler():
        return {"id": i}

Runtime configuration

Worker threads

FastrAPI automatically configures the Python handler thread pool based on your CPU:

static PYTHON_RUNTIME: Lazy<tokio::runtime::Runtime> = Lazy::new(|| {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(num_cpus::get().max(4).min(16))
        .thread_name("python-handler")
        .enable_all()
        .build()
        .expect("Failed to create Python runtime")
});

Minimum: 4 threads
Maximum: 16 threads
Default: Number of CPU cores (clamped to range)

Tokio async runtime

All request handling runs on Tokio’s async runtime, maximizing concurrency. Python code is executed in a separate thread pool to maintain the GIL isolation.

Middleware performance

Layer ordering

FastrAPI applies middleware in a specific order for optimal performance:

Sessions - Lightweight cookie management
GZip - Compression (only for responses above threshold)
Python Middleware - Custom user middleware
CORS - Cross-origin handling
Trusted Host - Host validation

GZip compression

Configure compression thresholds to avoid overhead on small responses:

from fastrapi.middleware import GZipMiddleware

app.add_middleware(
    GZipMiddleware,
    minimum_size=500,  # Only compress responses > 500 bytes
    compresslevel=9    # Maximum compression
)

Higher compression levels (7-9) increase CPU usage. Use level 6 for balanced performance.

Memory optimization

Route storage

Routes are stored in a lock-free concurrent hashmap (papaya) with pre-allocated capacity:

pub static ROUTES: Lazy<PapayaHashMap<String, RouteHandler>> =
    Lazy::new(|| PapayaHashMap::with_capacity(128));

Response type detection

FastrAPI determines response types at decorator time, not at request time:

# Response type is analyzed once during decoration
@app.get("/html")
def get_html() -> HTMLResponse:
    return HTMLResponse("<h1>Hello</h1>")

This eliminates per-request type inspection overhead.

Profiling and monitoring

Built-in tracing

FastrAPI uses the tracing crate for structured logging. Enable debug logs to identify bottlenecks:

app = FastrAPI(debug=True)

Request timing

Monitor P95 and P99 latencies to identify slow endpoints. FastrAPI’s median latency of 0.00ms means most requests complete within the measurement precision.

Release build optimizations

FastrAPI’s Rust components are compiled with aggressive optimizations:

[profile.release]
codegen-units = 1    # Single codegen unit for better inlining
lto = "fat"          # Full link-time optimization
panic = "abort"      # Smaller binary, faster panics
strip = true         # Remove debug symbols
opt-level = 3        # Maximum optimization

These settings produce highly optimized binaries with minimal overhead.

Best practices

Minimize dependencies

Use Depends() only when necessary. Each dependency adds resolution overhead.

Declare response types

Explicit type hints enable compile-time optimization and skip runtime detection.

Cache expensive operations

Use dependency caching for database connections and expensive computations:

from fastrapi import Depends

def get_db():
    return expensive_db_connection()

@app.get("/data")
def handler(db = Depends(get_db, use_cache=True)):
    return db.query()

Batch similar endpoints

Group related routes to improve CPU cache locality and reduce context switching.

Comparison with FastAPI

Optimization	FastAPI	FastrAPI
Dependency resolution	Runtime `inspect` + reflection every request	One-time parsing at decorator time
Route lookup	Regex router (O(n))	Concurrent hashmap (O(1))
Fast-path detection	No	Yes - skips unnecessary work
Async runtime	asyncio	Native Tokio
Concurrency	GIL-limited	Lock-free concurrent data structures

For more details on architectural differences, see the comparison with FastAPI page.

Get Started

Core Concepts

Guides

Advanced

Performance tuning

Benchmark results

Test environment

Performance comparison

Fast-path optimization

How it works

Optimization tips

Runtime configuration

Worker threads

Tokio async runtime

Middleware performance

Layer ordering

GZip compression

Memory optimization

Route storage

Response type detection

Profiling and monitoring

Built-in tracing

Request timing

Release build optimizations

Best practices

Comparison with FastAPI

Build docs developers (and LLMs) love

Get Started

Core Concepts

Guides

Advanced

​Benchmark results

​Test environment

​Performance comparison

​Fast-path optimization

​How it works

​Optimization tips

​Runtime configuration

​Worker threads

​Tokio async runtime

​Middleware performance

​Layer ordering

​GZip compression

​Memory optimization

​Route storage

​Response type detection

​Profiling and monitoring

Built-in tracing

Request timing

​Release build optimizations

​Best practices

​Comparison with FastAPI

Build docs developers (and LLMs) love

Benchmark results

Test environment

Performance comparison

Fast-path optimization

How it works

Optimization tips

Runtime configuration

Worker threads

Tokio async runtime

Middleware performance

Layer ordering

GZip compression

Memory optimization

Route storage

Response type detection

Profiling and monitoring

Release build optimizations

Best practices

Comparison with FastAPI