GET /health
Liveness probe for health checks and monitoring.
Request
Response
Fields
Always returns "ok" when the server is running.
Use Cases
- Kubernetes liveness/readiness probes
- Load balancer health checks
- Service discovery validation
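A liveness check can be reduced to one small client helper. This is a minimal sketch, assuming the agent listens over plain HTTP; the base URL and timeout are illustrative, not part of the API.

```python
# Minimal liveness check against a node agent's /health endpoint.
# Base URL and timeout are illustrative assumptions.
from urllib.request import urlopen

def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers 200 with the literal body "ok"."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "ok"
    except OSError:
        # Connection refused, DNS failure, timeout, etc. all count as unhealthy.
        return False
```

An unreachable node simply reports unhealthy rather than raising, which suits load-balancer and probe loops.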
GET /api/v1/system
Hardware detection endpoint returning node identity and detected system specifications.
Request
Response
Fields
See Response Schemas - System Object for complete field documentation.
Use Cases
- Cluster inventory and hardware discovery
- Validating hardware requirements before placement
- Displaying node capabilities in dashboards
GET /api/v1/models
Filtered model listing with scoring and fit analysis for the current node.
Request
Query Parameters
All parameters are optional. See Query Parameters for details.
- `limit` - Maximum number of models to return
- `perfect` - Return only perfect fits
- `min_fit` - Minimum fit level (`perfect|good|marginal|too_tight`)
- `runtime` - Filter by inference runtime (`any|mlx|llamacpp`)
- `use_case` - Filter by use case (`coding|reasoning|chat|multimodal|embedding|general`)
- `provider` - Filter by provider substring
- `search` - Free-text search across name/provider/params
- `sort` - Sort column (`score|tps|params|mem|ctx|date|use_case`)
- `include_too_tight` - Include unrunnable models (default: `true`)
- `max_context` - Context length limit for memory estimation
Response
Fields
See Response Schemas for complete field documentation.
Use Cases
- Browsing all models compatible with a node
- Filtering by specific requirements (use case, runtime, fit level)
- Custom sorting and ranking strategies
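Since all filters are plain query parameters, request URLs can be built with the standard library. A sketch, with a placeholder host; unset parameters are simply omitted:

```python
# Build a filtered /api/v1/models URL. Parameter names come from the
# query-parameter list above; the host is a placeholder.
from urllib.parse import urlencode

def models_url(base_url: str, **params) -> str:
    """Build a /api/v1/models URL, dropping parameters left as None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}/api/v1/models" + (f"?{query}" if query else "")

url = models_url("http://node-1:8080", use_case="coding",
                 min_fit="good", sort="score", limit=10)
# e.g. http://node-1:8080/api/v1/models?use_case=coding&min_fit=good&sort=score&limit=10
```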
GET /api/v1/models/top
Top runnable models optimized for scheduling decisions. This is the key endpoint for cluster schedulers.
Request
Query Parameters
Same as `/api/v1/models` with different defaults:
- `limit` defaults to 5 (instead of unlimited)
- `include_too_tight` defaults to false (excludes unrunnable models)
Response
Identical structure to `/api/v1/models`, but optimized for scheduling:
Use Cases
- Scheduler polling: Query each node for top K runnable models
- Fast placement decisions: Get best options without full model list
- Workload-specific routing: Filter by use case for targeted placement
Recommended Scheduler Pattern
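One way to realize this pattern, sketched under assumptions: plain-HTTP node agents, a top-K of 5, and `min_fit=good` as the placement floor. A real scheduler would add retries and concurrency.

```python
# Poll one node agent: verify liveness, fetch hardware specs, then fetch
# the top-K runnable models for scheduling. URLs and K are illustrative.
import json
from urllib.request import urlopen

def poll_node(base_url: str, k: int = 5, timeout: float = 5.0):
    """Return (system, top_models) for a live node, or None if unreachable."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as r:
            if r.read().decode().strip() != "ok":
                return None
        with urlopen(f"{base_url}/api/v1/system", timeout=timeout) as r:
            system = json.load(r)
        with urlopen(f"{base_url}/api/v1/models/top?limit={k}&min_fit=good",
                     timeout=timeout) as r:
            models = json.load(r)
        return system, models
    except OSError:
        return None
```

The caller can then attach node metadata to the result and forward it to the central scheduler.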
GET /api/v1/models/{name}
Model search by name - path-constrained text search.
Request
Path Parameters
`name` - Model name search string (substring match, case-insensitive)
Query Parameters
All query parameters from `/api/v1/models` are supported. The `{name}` path parameter is automatically added as a search filter.
Response
Identical structure to `/api/v1/models`, filtered by name:
Use Cases
- Client-side drilldown after selecting a model family
- Validating if a specific model runs on a node
- Finding all variants of a model (e.g., all “Qwen” models)
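Because the search string travels in the URL path, clients should percent-encode it; model names may contain spaces or slashes. A small sketch (the host is a placeholder):

```python
# Build a /api/v1/models/{name} search URL with the name percent-encoded,
# since model names may contain spaces or slashes.
from urllib.parse import quote

def model_search_url(base_url: str, name: str) -> str:
    return f"{base_url}/api/v1/models/{quote(name, safe='')}"

model_search_url("http://node-1:8080", "Qwen 2.5")
# → http://node-1:8080/api/v1/models/Qwen%202.5
```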
Error Responses
HTTP 400 - Bad Request
Returned for invalid query parameter values:
- Invalid `min_fit` value
- Invalid `runtime` value
- Invalid `use_case` value
- Invalid `sort` column
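Clients can avoid these 400s entirely by validating enum-valued parameters before sending the request. A sketch; the allowed value sets are taken from the query-parameter list above:

```python
# Client-side validation mirroring the server's 400 checks: reject bad
# enum values before issuing the request.
ALLOWED = {
    "min_fit": {"perfect", "good", "marginal", "too_tight"},
    "runtime": {"any", "mlx", "llamacpp"},
    "use_case": {"coding", "reasoning", "chat", "multimodal",
                 "embedding", "general"},
    "sort": {"score", "tps", "params", "mem", "ctx", "date", "use_case"},
}

def validate(params: dict) -> list:
    """Return a list of error messages; an empty list means the request is valid."""
    return [f"invalid {key}: {value!r}"
            for key, value in params.items()
            if key in ALLOWED and value not in ALLOWED[key]]

validate({"min_fit": "good", "sort": "zzz"})
# → ["invalid sort: 'zzz'"]
```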
HTTP 500 - Internal Server Error
Returned for unexpected server errors.
Client Integration Best Practices
1. Polling Pattern for Schedulers
For each node agent:
- Call `/health` to verify availability
- Call `/api/v1/system` to get hardware specs
- Call `/api/v1/models/top?limit=K&min_fit=good` for scheduling candidates
- Attach node metadata and forward to central scheduler
2. Conservative Placement Defaults
For production placement, use conservative defaults: `min_fit=good` and `include_too_tight=false`.
3. Per-Workload Targeting
Examples:
- Coding workloads: `use_case=coding`
- Embedding workloads: `use_case=embedding`
- Runtime-constrained fleet: `runtime=llamacpp`
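The conservative defaults and per-workload filters combine naturally into one hypothetical helper. A sketch; the helper name and the choice to pin `min_fit=good` and `include_too_tight=false` are assumptions following the guidance above:

```python
# Hypothetical helper: every placement query pins conservative defaults
# (min_fit=good, include_too_tight=false) and the caller only selects a
# use case and/or runtime.
from urllib.parse import urlencode

def placement_query(use_case=None, runtime=None, limit=5) -> str:
    params = {"limit": limit, "min_fit": "good", "include_too_tight": "false"}
    if use_case:
        params["use_case"] = use_case
    if runtime:
        params["runtime"] = runtime
    return "/api/v1/models/top?" + urlencode(params)

placement_query(use_case="coding")
# → /api/v1/models/top?limit=5&min_fit=good&include_too_tight=false&use_case=coding
```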
4. Forward-Compatible Parsing
Treat unknown fields as forward-compatible additions:
- Parse only required fields your application depends on
- Ignore unknown fields to support future API versions
- Validate critical fields exist before accessing
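The three rules above can be sketched as one parsing function. The field names (`name`, `score`) are illustrative placeholders, not the authoritative schema:

```python
# Forward-compatible parsing: require only the fields this client depends
# on, and silently ignore everything else so new server-side fields never
# break the client. Field names here are illustrative assumptions.
REQUIRED = ("name", "score")

def parse_model(entry: dict) -> dict:
    missing = [f for f in REQUIRED if f not in entry]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Unknown keys in `entry` are intentionally dropped.
    return {f: entry[f] for f in REQUIRED}

parse_model({"name": "qwen2.5-7b", "score": 0.91, "new_field": 123})
# → {'name': 'qwen2.5-7b', 'score': 0.91}
```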
