
Purpose

The llmfit serve mode exposes node-local model fit analysis over HTTP. It provides the same core data used by the TUI and CLI, optimized for programmatic access by schedulers, controllers, and cluster management systems. Primary use cases:
  • Query each node in a cluster for its top runnable models
  • Aggregate results externally for intelligent placement decisions
  • Enable dynamic model routing based on hardware capabilities

Starting the Server

Start the API server using the serve subcommand:
llmfit serve --host 0.0.0.0 --port 8787
Global flags go before the serve subcommand:
llmfit --memory 24G --max-context 8192 serve --host 0.0.0.0 --port 8787
The server will print available endpoints on startup:
llmfit API server listening on http://0.0.0.0:8787
  GET /health
  GET /api/v1/system
  GET /api/v1/models?limit=20&min_fit=marginal&sort=score
  GET /api/v1/models/top?limit=5&use_case=coding&min_fit=good
  GET /api/v1/models/<name>

Base URL

Default local base URL:
http://127.0.0.1:8787
For cluster deployments, bind to 0.0.0.0 and access via node IP or hostname.

API Versioning

The current API version is v1, with endpoints prefixed by /api/v1/. For long-lived client integrations:
  • Pin to /api/v1/ endpoints
  • Treat unknown response fields as forward-compatible additions
  • Parse only the fields your application requires
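As a sketch of that forward-compatible approach, the helper below pulls out only the fields a client needs and silently ignores everything else. The field names (name, fit) are illustrative, not a guaranteed schema:

```python
import json

def parse_model_entry(raw: str) -> dict:
    """Extract only the fields this client uses from a model object.

    Unknown keys are ignored, so fields added by a later v1 server
    do not break parsing. The field names here ('name', 'fit') are
    illustrative assumptions, not a documented schema.
    """
    data = json.loads(raw)
    return {"name": data.get("name"), "fit": data.get("fit")}
```

Because the helper never enumerates the full object, a server that starts sending extra keys tomorrow requires no client change.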

Authentication

The API currently requires no authentication; it is designed for trusted internal cluster networks. For production deployments:
  • Use network-level access controls (firewall rules, VPC policies)
  • Consider placing behind a reverse proxy with authentication if exposed beyond trusted networks
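One way to add that layer is an nginx front end with HTTP basic auth. A minimal sketch only, assuming llmfit is bound to localhost; the hostname, TLS paths, and htpasswd file are placeholders:

```nginx
# Hypothetical reverse-proxy config: TLS + basic auth in front of llmfit.
server {
    listen 443 ssl;
    server_name llmfit.internal.example.com;

    ssl_certificate     /etc/nginx/tls/llmfit.crt;
    ssl_certificate_key /etc/nginx/tls/llmfit.key;

    location / {
        auth_basic           "llmfit API";
        auth_basic_user_file /etc/nginx/htpasswd;
        proxy_pass           http://127.0.0.1:8787;
    }
}
```

With this in place, llmfit itself stays bound to 127.0.0.1 and only the proxy is reachable from the network.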

Response Format

All endpoints return JSON. Successful responses use HTTP status 200; error responses include an error field:
{
  "error": "invalid min_fit value: use perfect|good|marginal|too_tight"
}
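A client can treat any non-200 status or presence of the error field as a failure. A hypothetical helper, sketching that check (only the error-field shape comes from the docs above; transport is up to you):

```python
import json

def parse_llmfit_response(status: int, body: str) -> dict:
    """Return the parsed JSON body, raising on llmfit error responses.

    Hypothetical helper: treats any non-200 status, or a body that
    carries an 'error' field, as a failure.
    """
    data = json.loads(body)
    if status != 200 or "error" in data:
        raise RuntimeError(data.get("error", f"HTTP {status}"))
    return data
```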

Common Response Envelope

Most model-listing endpoints (/api/v1/models, /api/v1/models/top, /api/v1/models/{name}) return a common envelope structure:
{
  "node": {
    "name": "worker-1",
    "os": "linux"
  },
  "system": { /* hardware details */ },
  "total_models": 23,
  "returned_models": 10,
  "filters": { /* query parameters echoed */ },
  "models": [ /* array of model fit objects */ ]
}
This envelope provides:
  • Node identity for multi-node aggregation
  • System specs for validation and display
  • Counts for pagination awareness
  • Active filters for audit trails
  • Models array with detailed fit analysis
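For the multi-node aggregation case, a scheduler might collect one envelope per node and rank nodes for a given model. A sketch under the assumption that each model object exposes a name and a numeric score (the exact model-object schema is not shown above):

```python
def rank_nodes_for_model(envelopes: list, model_name: str) -> list:
    """Rank node names by how well they fit model_name, best first.

    Each envelope is one /api/v1/models response, as documented above.
    Assumes model objects expose 'name' and a numeric 'score'; adapt
    these keys to the actual schema your server returns.
    """
    scored = []
    for env in envelopes:
        for model in env.get("models", []):
            if model.get("name") == model_name:
                scored.append((model.get("score", 0.0), env["node"]["name"]))
    scored.sort(reverse=True)  # highest score first
    return [node for _, node in scored]
```

The node block in each envelope is what makes this possible without tracking which URL produced which response.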

Quick Start Example

# Check server health
curl http://127.0.0.1:8787/health

# Get hardware specs
curl http://127.0.0.1:8787/api/v1/system

# Get top 5 runnable models
curl "http://127.0.0.1:8787/api/v1/models/top?limit=5&min_fit=good"
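The same queries can be issued from code. The helper below builds the /api/v1/models/top URL from the parameters shown in the startup listing; those parameter names are the only ones documented here, and others may exist:

```python
from urllib.parse import urlencode

def top_models_url(base: str, limit: int = 5, min_fit: str = "good",
                   use_case: str = None) -> str:
    """Build a /api/v1/models/top query URL.

    Parameter names (limit, min_fit, use_case) come from the server's
    startup listing above; this is not an exhaustive parameter set.
    """
    params = {"limit": limit, "min_fit": min_fit}
    if use_case is not None:
        params["use_case"] = use_case
    return f"{base}/api/v1/models/top?{urlencode(params)}"
```

Pass the result to any HTTP client, e.g. urllib.request.urlopen(top_models_url("http://127.0.0.1:8787")).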
