Purpose
The llmfit serve mode exposes node-local model fit analysis over HTTP. It provides the same core data used by the TUI and CLI, optimized for programmatic access by schedulers, controllers, and cluster management systems.
Primary use case:
- Query each node in a cluster for its top runnable models
- Aggregate results externally for intelligent placement decisions
- Enable dynamic model routing based on hardware capabilities
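The aggregation step can be sketched as a small pure function. This is only an illustration: the per-model keys `name` and `score` are hypothetical stand-ins, not confirmed field names from the API, and the "highest score wins" rule is one possible placement policy among many.

```python
def place_models(top_by_node):
    """Map each model name to the node that reports the highest score for it.

    top_by_node: {node_name: [model_dict, ...]}. Each model_dict uses the
    hypothetical keys "name" and "score" (assumed, not taken from the API).
    """
    best = {}
    for node, models in top_by_node.items():
        for model in models:
            name, score = model["name"], model["score"]
            # Strict comparison: on a tie, the first node seen keeps the model.
            if name not in best or score > best[name][1]:
                best[name] = (node, score)
    return {name: node for name, (node, _) in best.items()}


# Example input, as if collected from two nodes by an external aggregator.
sample = {
    "node-a": [{"name": "model-x", "score": 0.9}, {"name": "model-y", "score": 0.4}],
    "node-b": [{"name": "model-x", "score": 0.7}, {"name": "model-y", "score": 0.8}],
}
placement = place_models(sample)
```

A real scheduler would likely also weigh current load and memory headroom; the point here is only that the per-node results are combined outside the nodes themselves.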
Starting the Server
Start the API server using the serve subcommand:
Base URL
Default local base URL:
To accept requests from other machines, bind to 0.0.0.0 and access via node IP or hostname.
API Versioning
The current API version is v1, with endpoints prefixed by /api/v1/.
For long-lived client integrations:
- Pin to /api/v1/ endpoints
- Treat unknown response fields as forward-compatible additions
- Parse only the fields your application requires
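A client following these three guidelines might look like the sketch below. The helper names `endpoint` and `pick_fields` are illustrative, and the field names in the usage example are hypothetical; only the `/api/v1` prefix comes from the text above.

```python
API_PREFIX = "/api/v1"  # version prefix the client pins to


def endpoint(base_url, path):
    """Build a version-pinned URL, e.g. endpoint(base, "models/top")."""
    return f"{base_url.rstrip('/')}{API_PREFIX}/{path.lstrip('/')}"


def pick_fields(record, required):
    """Return only the required keys; any other keys are ignored.

    Unknown fields are never treated as an error, so fields added to the
    API later do not break this client. Missing required fields do raise.
    """
    missing = [key for key in required if key not in record]
    if missing:
        raise KeyError(f"missing required fields: {missing}")
    return {key: record[key] for key in required}


# "name", "score", and "added_later" are hypothetical field names.
record = {"name": "model-x", "score": 0.9, "added_later": True}
slim = pick_fields(record, ["name", "score"])
```

Keeping the prefix in one constant means a future version bump is a single-line change on the client side.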
Authentication
Currently no authentication is required. The API is designed for trusted internal cluster networks. For production deployments:
- Use network-level access controls (firewall rules, VPC policies)
- Consider placing behind a reverse proxy with authentication if exposed beyond trusted networks
Response Format
All endpoints return JSON. Successful responses use HTTP 200 status codes. Error responses include an error field:
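One way a client could turn this convention into exceptions is sketched below. `ApiError` and `parse_response` are illustrative names, and the sketch assumes the `error` value is a human-readable message, which the text above does not specify.

```python
import json


class ApiError(Exception):
    """Raised when the server reports a failure."""


def parse_response(status, body):
    """Decode a JSON body, raising ApiError for non-200 or error payloads.

    status: HTTP status code; body: response text.
    Assumption: the "error" field carries a message suitable for display.
    """
    payload = json.loads(body)
    has_error = isinstance(payload, dict) and "error" in payload
    if status != 200 or has_error:
        message = payload["error"] if has_error else "unknown error"
        raise ApiError(f"HTTP {status}: {message}")
    return payload
```

Checking both the status code and the `error` key is deliberately defensive; a client that trusts the status code alone could drop the second check.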
Common Response Envelope
Most model-listing endpoints (/api/v1/models, /api/v1/models/top, /api/v1/models/{name}) return a common envelope structure:
- Node identity for multi-node aggregation
- System specs for validation and display
- Counts for pagination awareness
- Active filters for audit trails
- Models array with detailed fit analysis
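The five parts of the envelope could be modeled on the client roughly as follows. All key names here (`node`, `system`, `total_models`, `returned_models`, `filters`, `models`) are assumptions chosen to mirror the list above; check them against the Response Schemas before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    node: dict            # node identity, used when merging results across nodes
    system: dict          # hardware specs for validation and display
    total_models: int     # assumed count of models matching the filters
    returned_models: int  # assumed count of models actually in the response
    filters: dict         # filters the server applied to this request
    models: list = field(default_factory=list)


def parse_envelope(payload):
    """Split a decoded JSON envelope into its parts (key names are assumed)."""
    return Envelope(
        node=payload.get("node", {}),
        system=payload.get("system", {}),
        total_models=payload.get("total_models", 0),
        returned_models=payload.get("returned_models", 0),
        filters=payload.get("filters", {}),
        models=payload.get("models", []),
    )


def is_truncated(envelope):
    """True when more models matched than were returned."""
    return envelope.total_models > envelope.returned_models
```

Keys outside this set are ignored by `parse_envelope`, so additional envelope fields would not break the client.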
Quick Start Example
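A minimal client sketch using only the Python standard library is shown below. It assumes a node is already running in serve mode; the base URL is passed in by the caller because the host and port depend on how the server was started. The function names are illustrative.

```python
import json
import sys
import urllib.request


def top_models_url(base_url):
    """URL of the top-models endpoint for one node."""
    return f"{base_url.rstrip('/')}/api/v1/models/top"


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so a
    normal return here means the request succeeded.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python quickstart.py http://<node-host>:<port>
    print(json.dumps(fetch_json(top_models_url(sys.argv[1])), indent=2))
```

Run it once per node and feed the decoded results to whatever aggregator makes the placement decision.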
Next Steps
- See Endpoints for detailed endpoint documentation
- See Query Parameters for filtering options
- See Response Schemas for field definitions
