LLM Gateway Core uses a layered architecture that separates concerns between routing, caching, rate limiting, and provider communication. The system is built around a central ChatService that orchestrates the entire request lifecycle.
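The internals of ChatService are not shown here, but its role as the orchestrator of the request lifecycle can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the cache is an in-memory dict standing in for the real caching layer, and the provider client is a hypothetical object with an async `complete` method.

```python
from dataclasses import dataclass


@dataclass
class ChatRequest:
    model: str
    prompt: str


@dataclass
class ChatResponse:
    content: str
    cached: bool = False


class ChatService:
    """Orchestrates the request lifecycle: cache lookup -> provider call -> cache store.

    Sketch only: a dict stands in for the real cache, and `provider` is any
    object exposing `async complete(model, prompt) -> str`.
    """

    def __init__(self, provider):
        self.provider = provider
        self._cache: dict = {}

    async def chat(self, request: ChatRequest) -> ChatResponse:
        key = (request.model, request.prompt)
        if key in self._cache:
            # Cache hit: skip the provider entirely.
            return ChatResponse(content=self._cache[key], cached=True)
        # Cache miss: forward to the provider, then store the result.
        content = await self.provider.complete(request.model, request.prompt)
        self._cache[key] = content
        return ChatResponse(content=content)
```

The key design point is that the endpoint layer never talks to providers directly; it hands every request to this single orchestrator, which decides whether a provider call is needed at all.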
The FastAPI endpoint in app/api/v1/chat.py ties everything together:
```python
from fastapi import Depends, HTTPException, Request

# settings, RedisRateLimiter, ChatService, ChatRequest/ChatResponse, and the
# Prometheus counters (RATE_LIMIT_ALLOWED, RATE_LIMIT_BLOCKED) are imported
# from the project's own modules.

rate_limiter = RedisRateLimiter(
    capacity=settings.RATE_LIMITER_CAPACITY,
    refill_rate=settings.RATE_LIMITER_REFILL_RATE,
)


async def rate_limit_dependency(request: Request):
    """
    FastAPI dependency that validates the API key, enforces rate
    limiting, and records metrics. A missing or unknown key is
    rejected before the rate limiter is consulted.
    """
    api_key = request.headers.get("X-API-Key")
    valid_keys = [k.strip() for k in settings.API_KEYS.split(",") if k.strip()]
    if api_key not in valid_keys:
        raise HTTPException(status_code=401, detail="Invalid or missing API Key")

    key = api_key or request.client.host
    if not rate_limiter.allow(key):
        RATE_LIMIT_BLOCKED.inc()
        raise HTTPException(
            status_code=429,
            detail="Too many requests. Please wait before trying again.",
        )
    RATE_LIMIT_ALLOWED.inc()


chat_service = ChatService()


@app.post("", response_model=ChatResponse, dependencies=[Depends(rate_limit_dependency)])
async def chat(request: ChatRequest):
    """
    Entry point for all chat completions. Processes the chat request
    and returns a chat response.
    """
    return await chat_service.chat(request)
```
Rate limiting is enforced at the dependency level, so every request is authenticated and checked against the limiter before it ever reaches the service layer.
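The source doesn't show the internals of `RedisRateLimiter`, but its `allow(key)` interface and the `capacity`/`refill_rate` settings suggest a token bucket. Below is a hedged in-memory sketch of that algorithm (Redis-backed state replaced with a plain dict, and a `now` parameter added purely to make the refill behavior testable); the class name and details are illustrative, not the project's code.

```python
import time


class TokenBucketRateLimiter:
    """In-memory token-bucket sketch: each key holds up to `capacity` tokens,
    refilled continuously at `refill_rate` tokens per second. One token is
    consumed per allowed request.
    """

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        # key -> (tokens remaining, timestamp of last update)
        self._buckets: dict = {}

    def allow(self, key: str, now: float = None) -> bool:
        if now is None:
            now = time.monotonic()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Credit tokens accrued since the last call, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens < 1.0:
            # Not enough budget: record the refill but reject the request.
            self._buckets[key] = (tokens, now)
            return False
        self._buckets[key] = (tokens - 1.0, now)
        return True
```

A real Redis-backed version would keep `(tokens, last)` per key in Redis (typically updated atomically via a Lua script) so that all gateway instances share one budget, but the accounting logic is the same.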