Audio endpoints
Create transcription
Transcribe audio to text using Qwen3-ASR models. Compatible with the OpenAI Whisper API format.

Request parameters
file
Audio file to transcribe. Supported formats: WAV, MP3, M4A, FLAC. Multipart upload only; use file_path for JSON requests.

file_path
Path to the audio file on the server filesystem. JSON requests only; use file for multipart uploads.

language
Language of the audio content. Specifying the language improves accuracy. Supported languages: Chinese, English, Spanish, French, German, Japanese, Korean, Arabic, Russian, Hindi, Thai, Vietnamese, and 15+ more. Example: "English", "Chinese"

model
Model identifier. Currently only qwen3-asr is supported.

response_format
Response format. Options: json, text, verbose_json
- json: Returns the transcription text only
- text: Returns a plain-text response
- verbose_json: Includes timestamps and metadata
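The JSON-request variant above can be sketched as follows. The /v1/audio/transcriptions route and the localhost base URL are assumptions based on the stated OpenAI compatibility, and the file path is hypothetical; adjust all three for your deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default; adjust for your deployment

# JSON-request variant: reference an audio file already on the server's filesystem.
payload = {
    "file_path": "/data/audio/sample.wav",  # hypothetical path on the server
    "model": "qwen3-asr",
    "language": "English",       # optional; specifying it improves accuracy
    "response_format": "json",
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/audio/transcriptions",  # assumed OpenAI-compatible route
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     print(json.load(resp)["text"])
```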
Response format
text
Transcribed text from the audio file

duration
Audio duration in seconds (verbose_json only)

language
Detected or specified language (verbose_json only)
Example response
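An illustrative json-format response (the value is a placeholder, not actual model output):

```json
{
  "text": "Hello, this is a test transcription."
}
```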
Verbose response
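An illustrative verbose_json response, using the three documented fields with placeholder values:

```json
{
  "text": "Hello, this is a test transcription.",
  "duration": 3.2,
  "language": "English"
}
```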
Chat endpoints
Create chat completion
Available in MiniCPM-SALA server. Coming to OminiX-API unified server soon.
Request parameters
model
Model identifier to use for completion. Example: "minicpm-sala-9b"

messages
Array of message objects representing the conversation history. Each message contains:
- role: One of system, user, or assistant
- content: Message text content
temperature
Sampling temperature between 0 and 2. Higher values make output more random.
- 0.0: Deterministic (greedy sampling)
- 0.7: Balanced creativity and coherence
- 1.5+: More creative but less focused

max_tokens
Maximum number of tokens to generate

top_p
Nucleus sampling threshold. Alternative to temperature.

stream
Whether to stream response tokens (not yet implemented)
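A minimal sketch of a chat completion request, assuming the standard OpenAI-compatible /v1/chat/completions route and a localhost base URL (neither is confirmed by this page):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default; adjust for your deployment

payload = {
    "model": "minicpm-sala-9b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,  # balanced creativity and coherence
    "max_tokens": 256,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",  # assumed OpenAI-compatible route
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     print(json.load(resp)["choices"][0]["message"]["content"])
```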
Response format
id
Unique completion identifier

object
Object type, always "chat.completion"

model
Model used for completion

choices
Array of completion choices (currently always length 1). Each choice contains:
- index: Choice index (always 0)
- message: Response message with role and content
- finish_reason: Why generation stopped ("stop", "length", etc.)

usage
Token usage statistics. Contains:
- prompt_tokens: Input token count
- completion_tokens: Generated token count
- total_tokens: Sum of prompt and completion tokens
Example response
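An illustrative completion response following the fields documented above (identifier, content, and token counts are placeholders):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "model": "minicpm-sala-9b",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 9, "total_tokens": 27}
}
```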
Model management endpoints
List models
Retrieve the list of available models with metadata including path, size, quantization, and loaded status.
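A minimal sketch of the request; GET /v1/models is the route named later in this page, while the localhost base URL and the "data" response key are assumptions:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default; adjust for your deployment

req = urllib.request.Request(f"{BASE_URL}/v1/models", method="GET")
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     for model in json.load(resp)["data"]:  # "data" key assumed (OpenAI-style list)
#         print(model)
```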
Response format
object
Object type, always "list"

data
Array of model objects
Model object fields
- Model identifier (directory name)
- Object type, always "model"
- Model owner (e.g., "mlx-community", "local")
- Absolute path to the model directory
- Whether this model is currently loaded in the server
- HuggingFace repository ID (if downloaded via the API)
- Quantization configuration. Contains:
  - bits: Quantization bit width (4, 8, etc.)
  - group_size: Group size for quantization
- Total size of model files in bytes
- Unix timestamp when the model was downloaded
Example response
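An illustrative response; the "object" and "data" keys follow the OpenAI-style list shape, and the "loaded" field name inside the model object is a hypothetical guess, since this page does not name the model-object keys:

```json
{
  "object": "list",
  "data": [
    {
      "id": "qwen3-asr",
      "object": "model",
      "loaded": true
    }
  ]
}
```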
Download model
Download a model from HuggingFace Hub to the local models directory.
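A sketch of the download request. The /v1/models/download route is a hypothetical guess (this page does not state the path), and the base URL is an assumed default:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"      # assumed default; adjust for your deployment
DOWNLOAD_PATH = "/v1/models/download"   # hypothetical route; check your server

payload = {"repo_id": "mlx-community/Qwen3-ASR-1.7B-8bit"}

req = urllib.request.Request(
    f"{BASE_URL}{DOWNLOAD_PATH}",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     print(json.load(resp))
```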
Request parameters
repo_id
HuggingFace repository ID in the format owner/model-name. Example: "mlx-community/Qwen3-ASR-1.7B-8bit"

Response format
- Download status, always "downloading"
- Model identifier (extracted from repo_id)
- HuggingFace repository ID
Example response
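An illustrative response; the "status" and "id" key names are hypothetical guesses matching the three documented fields, and the values are placeholders:

```json
{
  "status": "downloading",
  "id": "Qwen3-ASR-1.7B-8bit",
  "repo_id": "mlx-community/Qwen3-ASR-1.7B-8bit"
}
```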
Download runs asynchronously. Use GET /v1/models to check when the model is available.

Downloaded files
The server downloads these essential files:
- config.json - Model configuration
- tokenizer.json - Tokenizer vocabulary
- tokenizer_config.json - Tokenizer settings
- model.safetensors - Model weights (single file)
- model.safetensors.index.json + shards - Model weights (sharded)
Delete model
Delete a downloaded model from the local filesystem.
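A sketch of the delete request, assuming a DELETE /v1/models/{id} route shape (the path parameter is documented below, but the exact route is not stated on this page):

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default; adjust for your deployment
model_id = "qwen3-asr"              # model identifier from the model list

# Hypothetical route shape DELETE /v1/models/{id}; check your server's routes.
req = urllib.request.Request(f"{BASE_URL}/v1/models/{model_id}", method="DELETE")
# urllib.request.urlopen(req)  # requires a running server
```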
Path parameters
Model identifier to delete (from model list)
Response format
Model identifier that was deleted
Always true if successful

Example response
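An illustrative response; the "deleted" key name is a hypothetical guess for the boolean field documented above:

```json
{
  "id": "qwen3-asr",
  "deleted": true
}
```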
Health endpoint
Health check
Check server health and readiness.
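A sketch of the health check; the /health path and the base URL are assumptions not confirmed by this page:

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default; adjust for your deployment
HEALTH_PATH = "/health"             # hypothetical path; check your server

req = urllib.request.Request(f"{BASE_URL}{HEALTH_PATH}", method="GET")
# with urllib.request.urlopen(req) as resp:  # requires a running server
#     print(resp.status)  # 200 when healthy and ready
```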
Response format
200 OK when server is healthy and ready to accept requests.
Error responses
All endpoints return OpenAI-compatible error responses.

Error types
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request or invalid parameters |
| 404 | not_found | Endpoint or resource not found |
| 409 | conflict | Resource conflict (e.g., model already exists) |
| 500 | server_error | Internal server error or inference failure |
Common error codes
| Code | Description |
|---|---|
| invalid_audio_format | Unsupported audio file format |
| invalid_language | Unsupported language specified |
| model_not_found | Requested model not available |
| inference_failed | Model inference error |
| download_failed | Model download error |
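A sketch of handling an error response on the client side. The nested {"error": {...}} envelope is assumed from the stated OpenAI compatibility, and the sample body uses illustrative values from the tables above:

```python
import json
import urllib.error
import urllib.request

# OpenAI-compatible errors typically arrive as a non-2xx HTTP status with a
# JSON body shaped like this (assumed envelope; values illustrative).
sample_body = json.dumps({
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_audio_format",
        "message": "Unsupported audio file format",
    }
})

def describe_error(body: str) -> str:
    """Extract a human-readable summary from an error response body."""
    err = json.loads(body)["error"]
    return f'{err["type"]}/{err["code"]}: {err["message"]}'

# urllib raises urllib.error.HTTPError for 4xx/5xx; read the body from it:
# try:
#     urllib.request.urlopen(req)
# except urllib.error.HTTPError as e:
#     print(describe_error(e.read().decode("utf-8")))

print(describe_error(sample_body))
```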