FastrAPI is built for speed from the ground up, leveraging Rust’s performance and Axum’s efficiency. This guide will help you maximize the performance of your FastrAPI applications.

Benchmark results

In the k6 load test below, FastrAPI outperformed FastAPI on every reported metric, against both the single-worker and the 16-worker configuration:

Test environment

  • Kernel: 6.16.8-arch3-1
  • CPU: AMD Ryzen 7 7735HS (16 logical cores, 4.83 GHz)
  • Memory: 15 GB
  • Load Test: 20 Virtual Users (VUs), 30 seconds

Performance comparison

| Framework | Avg Latency (ms) | Median Latency (ms) | Requests/sec | P95 Latency (ms) | P99 Latency (ms) |
|---|---|---|---|---|---|
| FastrAPI | 0.59 | 0.00 | 31360 | 2.39 | 11.12 |
| FastAPI + Guvicorn (workers: 1) | 21.08 | 19.67 | 937 | 38.47 | 93.42 |
| FastAPI + Guvicorn (workers: 16) | 4.84 | 4.17 | 3882 | 10.22 | 81.20 |

FastrAPI handles 31,360 requests per second at a sub-millisecond average latency (0.59 ms). That is approximately 33x the throughput of FastAPI + Guvicorn with 1 worker, and about 8x the throughput of the 16-worker configuration.
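
The throughput ratios follow directly from the Requests/sec column. A minimal sketch of the arithmetic, using only the figures from the benchmark table (the dictionary keys and the `speedup` helper are names made up for this example):

```python
# Requests/sec figures copied from the benchmark table.
rps = {
    "FastrAPI": 31360,
    "FastAPI (1 worker)": 937,
    "FastAPI (16 workers)": 3882,
}

def speedup(baseline: str, candidate: str = "FastrAPI") -> float:
    """Return how many times more requests/sec `candidate` served than `baseline`."""
    return rps[candidate] / rps[baseline]

print(f"vs 1 worker:   {speedup('FastAPI (1 worker)'):.1f}x")    # 33.5x
print(f"vs 16 workers: {speedup('FastAPI (16 workers)'):.1f}x")  # 8.1x
```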

Fast-path optimization

FastrAPI includes an intelligent “fast-path” optimization that detects simple endpoints with no dependencies, validation, or complex parameters. These endpoints skip unnecessary processing entirely.

How it works

When you register a route, FastrAPI analyzes the function signature at decorator time:
from fastrapi import FastrAPI

app = FastrAPI()

# Fast-path: no parameters, no validation
@app.get("/health")
def health_check():
    return {"status": "ok"}

# Slow-path: requires parameter validation
@app.get("/user/{user_id}")
def get_user(user_id: int):
    return {"user_id": user_id}
The health_check endpoint sets is_fast_path = true and bypasses:
  • Dependency resolution
  • Parameter validation
  • Kwargs construction
  • Type coercion
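
To make the idea of decorator-time analysis concrete, here is a minimal Python sketch of how a signature check of this kind could look. It is not FastrAPI's actual implementation (which lives in Rust); the `is_fast_path` function and its "no parameters at all" rule are assumptions made for illustration.

```python
import inspect

def is_fast_path(func) -> bool:
    """Hypothetical check: a handler qualifies when it declares no parameters,
    so there is nothing to resolve, validate, or coerce."""
    return len(inspect.signature(func).parameters) == 0

def health_check():
    return {"status": "ok"}

def get_user(user_id: int):
    return {"user_id": user_id}

print(is_fast_path(health_check))  # True
print(is_fast_path(get_user))      # False
```

Because the check runs once per route at registration, its cost is not paid again on each request.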

Optimization tips

Keep frequently-accessed endpoints simple. Avoid unnecessary dependencies for routes that need maximum speed.
# Slower: dependency overhead
@app.get("/stats")
def stats(db = Depends(get_db)):
    return get_cached_stats()

# Faster: no dependencies
@app.get("/stats")
def stats():
    return get_cached_stats()
Explicitly declare response types to avoid runtime type detection:
from fastrapi.responses import JSONResponse

@app.get("/data")
def get_data() -> JSONResponse:
    return JSONResponse({"key": "value"})
FastrAPI uses the papaya concurrent hashmap for O(1) route lookups, which scales efficiently even with 10,000+ routes. Don’t hesitate to create many endpoints.
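# Lookup cost stays constant as routes are added
def make_handler(i):
    # Bind i per route; a bare closure would see only the loop's final value
    def handler():
        return {"id": i}
    return handler

for i in range(10000):
    app.get(f"/endpoint_{i}")(make_handler(i))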
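
The difference between a hashed lookup and a pattern-by-pattern scan can be illustrated in plain Python. This is only an analogy: a built-in `dict` stands in for the papaya hashmap, a list of compiled regexes stands in for a linear router, and all names are hypothetical.

```python
import re

# Hypothetical handlers keyed by exact path, standing in for the route table.
routes = {f"/endpoint_{i}": (lambda i=i: {"id": i}) for i in range(1000)}

# A linear router keeps an ordered list of patterns and tries them one by one.
patterns = [(re.compile(f"^{re.escape(path)}$"), handler)
            for path, handler in routes.items()]

def lookup_hashed(path):
    # One hash computation, independent of how many routes exist.
    return routes.get(path)

def lookup_linear(path):
    # Worst case touches every registered pattern before matching.
    for pattern, handler in patterns:
        if pattern.match(path):
            return handler
    return None

print(lookup_hashed("/endpoint_999")())  # {'id': 999}
print(lookup_linear("/endpoint_999")())  # {'id': 999}
```

Both functions return the same handler, but the linear version performs up to 1000 regex matches to find the last route, while the hashed version does a single lookup.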

Runtime configuration

Worker threads

FastrAPI automatically configures the Python handler thread pool based on your CPU:
static PYTHON_RUNTIME: Lazy<tokio::runtime::Runtime> = Lazy::new(|| {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(num_cpus::get().max(4).min(16))
        .thread_name("python-handler")
        .enable_all()
        .build()
        .expect("Failed to create Python runtime")
});
  • Minimum: 4 threads
  • Maximum: 16 threads
  • Default: Number of CPU cores (clamped to range)
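
The clamping rule expressed by `.max(4).min(16)` can be checked with a few sample core counts. The Python sketch below mirrors that expression; `handler_thread_count` is a name chosen for this example, not part of FastrAPI.

```python
import os

def handler_thread_count(cpu_count: int) -> int:
    """Mirror of `num_cpus::get().max(4).min(16)`: at least 4, at most 16."""
    return min(max(cpu_count, 4), 16)

for cores in (2, 8, 64):
    print(cores, "->", handler_thread_count(cores))
# 2 -> 4
# 8 -> 8
# 64 -> 16

# Value the rule would give on the current machine.
print(handler_thread_count(os.cpu_count() or 1))
```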

Tokio async runtime

All request handling runs on Tokio’s async runtime, maximizing concurrency. Python handlers are executed in a separate thread pool, so code that holds the GIL does not block the async worker threads.
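
The same division of labor can be sketched in pure Python as an analogy, with `asyncio` standing in for Tokio and a `ThreadPoolExecutor` standing in for the handler pool. FastrAPI's real runtime is Rust; the function names and the pool size of 4 are assumptions for this example only.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool for blocking handler code, kept apart from the event loop.
pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="python-handler")

def blocking_handler(n: int) -> dict:
    time.sleep(0.05)  # stands in for synchronous handler work
    return {"n": n}

async def serve(n: int) -> dict:
    # The event loop hands the call to the pool and stays free for other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, blocking_handler, n)

async def main() -> list:
    return await asyncio.gather(*(serve(i) for i in range(4)))

results = asyncio.run(main())
print(results)  # [{'n': 0}, {'n': 1}, {'n': 2}, {'n': 3}]
```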

Middleware performance

Layer ordering

FastrAPI applies middleware in a specific order for optimal performance:
  1. Sessions - Lightweight cookie management
  2. GZip - Compression (only for responses above threshold)
  3. Python Middleware - Custom user middleware
  4. CORS - Cross-origin handling
  5. Trusted Host - Host validation

GZip compression

Configure compression thresholds to avoid overhead on small responses:
from fastrapi.middleware import GZipMiddleware

app.add_middleware(
    GZipMiddleware,
    minimum_size=500,  # Only compress responses > 500 bytes
    compresslevel=9    # Maximum compression
)
Higher compression levels (7-9) increase CPU usage. Use level 6 for balanced performance.
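
The reason for a size threshold can be seen with Python's standard `gzip` module: every gzip stream carries a fixed header and trailer, so a tiny payload grows when compressed. The payloads below are made up for illustration, and the sketch measures the standard library, not FastrAPI's middleware.

```python
import gzip
import json

# A tiny response and a larger, repetitive one (both hypothetical).
small = json.dumps({"status": "ok"}).encode()
large = json.dumps([{"id": i, "name": f"user-{i}"} for i in range(500)]).encode()

def gzipped_size(payload: bytes, level: int) -> int:
    """Size in bytes of `payload` after gzip compression at `level`."""
    return len(gzip.compress(payload, compresslevel=level))

print("small:", len(small), "->", gzipped_size(small, 6))
for level in (1, 6, 9):
    print(f"large, level {level}:", len(large), "->", gzipped_size(large, level))
```

Running this shows the small payload getting larger after compression, while the large one shrinks substantially at every level.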

Memory optimization

Route storage

Routes are stored in a lock-free concurrent hashmap (papaya) with pre-allocated capacity:
pub static ROUTES: Lazy<PapayaHashMap<String, RouteHandler>> =
    Lazy::new(|| PapayaHashMap::with_capacity(128));

Response type detection

FastrAPI determines response types at decorator time, not at request time:
# Response type is analyzed once during decoration
@app.get("/html")
def get_html() -> HTMLResponse:
    return HTMLResponse("<h1>Hello</h1>")
This eliminates per-request type inspection overhead.
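
As a rough illustration of what "analyzed once during decoration" can mean, the sketch below records a handler's return annotation when the decorator runs. The `route` decorator, the `RESPONSE_TYPES` table, and the stand-in `HTMLResponse` class are all hypothetical and are not FastrAPI's API.

```python
from typing import get_type_hints

class HTMLResponse:
    """Stand-in response class for this example."""
    def __init__(self, body: str):
        self.body = body

# Hypothetical registry: path -> declared response type.
RESPONSE_TYPES = {}

def route(path: str):
    def decorator(func):
        # Inspect the annotation once, at registration time.
        RESPONSE_TYPES[path] = get_type_hints(func).get("return")
        return func
    return decorator

@route("/html")
def get_html() -> HTMLResponse:
    return HTMLResponse("<h1>Hello</h1>")

print(RESPONSE_TYPES["/html"].__name__)  # HTMLResponse
```

At request time a server built this way would only need a dictionary read to know how to serialize the result.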

Profiling and monitoring

Built-in tracing

FastrAPI uses the tracing crate for structured logging. Enable debug logs to identify bottlenecks:
app = FastrAPI(debug=True)

Request timing

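Monitor P95 and P99 latencies to identify slow endpoints. FastrAPI’s median latency of 0.00 ms means at least half of all requests completed faster than the two-decimal precision of the measurement can show, so the tail percentiles are the more informative signal.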
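
If you collect raw latency samples yourself, the percentiles can be computed with a few lines of Python. The sketch below uses the nearest-rank definition and made-up sample data; k6 may use a different interpolation method, so treat the numbers as illustrative.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in ms: mostly fast, with a slow tail.
latencies_ms = [0.1] * 94 + [2.0] * 4 + [10.0, 50.0]

print("median:", percentile(latencies_ms, 50))  # 0.1
print("p95:   ", percentile(latencies_ms, 95))  # 2.0
print("p99:   ", percentile(latencies_ms, 99))  # 10.0
```

The median hides the slow tail entirely here, which is why P95 and P99 are the figures to watch.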

Release build optimizations

FastrAPI’s Rust components are compiled with aggressive optimizations:
[profile.release]
codegen-units = 1    # Single codegen unit for better inlining
lto = "fat"          # Full link-time optimization
panic = "abort"      # Abort on panic instead of unwinding; smaller binary
strip = true         # Remove debug symbols
opt-level = 3        # Maximum optimization
These settings produce highly optimized binaries with minimal overhead.

Best practices

  1. Minimize dependencies - Use Depends() only when necessary. Each dependency adds resolution overhead.
  2. Declare response types - Explicit type hints let FastrAPI resolve the response type once at decorator time and skip runtime detection.
  3. Cache expensive operations - Use dependency caching for database connections and expensive computations:
from fastrapi import Depends

def get_db():
    return expensive_db_connection()

@app.get("/data")
def handler(db = Depends(get_db, use_cache=True)):
    return db.query()
  4. Batch similar endpoints - Group related routes to improve CPU cache locality and reduce context switching.
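
For the dependency caching practice, the sketch below shows one way a resolver could reuse a dependency's result. It assumes, as in FastAPI, that `use_cache` scopes the cache to a single request; whether FastrAPI behaves identically is not confirmed here, and `resolve`, `request_cache`, and the stand-in `get_db` are hypothetical.

```python
calls = []  # records every time the dependency body actually runs

def get_db():
    calls.append("connect")
    return object()  # stands in for an expensive connection

def resolve(dependency, cache: dict, use_cache: bool = True):
    """Return the dependency's value, reusing a cached result when allowed."""
    if use_cache and dependency in cache:
        return cache[dependency]
    value = dependency()
    if use_cache:
        cache[dependency] = value
    return value

request_cache = {}  # assumed to live for one request
first = resolve(get_db, request_cache)
second = resolve(get_db, request_cache)

print(first is second)  # True
print(len(calls))       # 1
```

With `use_cache=False` the same two calls would invoke `get_db` twice and return two distinct objects.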

Comparison with FastAPI

| Optimization | FastAPI | FastrAPI |
|---|---|---|
| Dependency resolution | Runtime inspect + reflection every request | One-time parsing at decorator time |
| Route lookup | Regex router (O(n)) | Concurrent hashmap (O(1)) |
| Fast-path detection | No | Yes - skips unnecessary work |
| Async runtime | asyncio | Native Tokio |
| Concurrency | GIL-limited | Lock-free concurrent data structures |

For more details on architectural differences, see the comparison with FastAPI page.
