# Welcome to OminiX-MLX
OminiX-MLX is a comprehensive Rust ecosystem for running machine learning models on Apple Silicon using MLX. It provides safe Rust bindings to Apple’s MLX framework and a collection of production-ready model crates for LLMs, vision-language models, speech recognition, text-to-speech, and image generation. It is built for production use with zero Python dependencies at inference time.

## Key features
- **GPU acceleration**: Metal-optimized inference on M1/M2/M3/M4 chips with unified memory architecture
- **Pure Rust**: no Python runtime required for inference; compile once, deploy anywhere
- **Lazy evaluation**: automatic kernel fusion and memory optimization through MLX’s lazy execution
- **Modular design**: use only what you need; each model family is a separate crate
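The lazy-evaluation idea above can be illustrated with a small self-contained sketch. This is a conceptual toy, not the OminiX-MLX or MLX API: operations are recorded into an expression graph, and no arithmetic happens until an explicit `eval` call, which is the point at which a real runtime like MLX can fuse adjacent operations into a single kernel.

```rust
// Conceptual toy: building an expression records operations;
// nothing is computed until `eval` is called.
enum Expr {
    Value(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn add(self, other: Expr) -> Expr {
        Expr::Add(Box::new(self), Box::new(other))
    }
    fn mul(self, other: Expr) -> Expr {
        Expr::Mul(Box::new(self), Box::new(other))
    }

    // Evaluation walks the recorded graph; a real runtime could fuse
    // adjacent operations into one kernel at this point.
    fn eval(&self) -> f64 {
        match self {
            Expr::Value(v) => *v,
            Expr::Add(a, b) => a.eval() + b.eval(),
            Expr::Mul(a, b) => a.eval() * b.eval(),
        }
    }
}

fn main() {
    // (2 + 3) * 4 — the graph is built eagerly, the arithmetic lazily.
    let graph = Expr::Value(2.0).add(Expr::Value(3.0)).mul(Expr::Value(4.0));
    println!("{}", graph.eval()); // prints 20
}
```

Deferring work this way is what lets MLX see the whole computation before choosing how to schedule it on the GPU.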
## What you can build
- **Language models**: run Qwen, GLM, Mixtral, Mistral, and MiniCPM models with high throughput
- **Vision-language models**: process images and text together with Moxin-7B VLM
- **Speech recognition**: transcribe audio in 30+ languages with Qwen3-ASR, Paraformer, and FunASR
- **Text-to-speech**: clone voices with few-shot learning using GPT-SoVITS
- **Image generation**: generate images from text with FLUX.2-klein and Z-Image
- **API server**: deploy OpenAI-compatible API endpoints for your models
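An OpenAI-compatible server accepts the standard `POST /v1/chat/completions` request shape, so existing OpenAI clients work against it. As a hedged sketch (the model name and server address below are placeholders, not values from OminiX-MLX), this is the JSON body such an endpoint expects:

```rust
// Builds the JSON body an OpenAI-compatible /v1/chat/completions
// endpoint expects. The model name is a placeholder; use whatever
// your server actually serves. Plain string formatting is used here
// to stay dependency-free; a real client would use a JSON library.
fn chat_request_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","messages":[{{"role":"user","content":"{}"}}]}}"#,
        model, prompt
    )
}

fn main() {
    let body = chat_request_body("qwen3-4b", "Hello!");
    println!("{}", body);
    // Send with any HTTP client, e.g.:
    //   curl http://localhost:8080/v1/chat/completions \
    //     -H "Content-Type: application/json" -d "$BODY"
    // (host and port depend on how you start the server)
}
```

Because the wire format matches OpenAI's, client libraries only need their base URL pointed at the local server.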
## Architecture

OminiX-MLX follows a layered architecture: the model crates build on top of the core MLX bindings.

## Performance

Benchmarks on Apple M3 Max (128GB):

| Task | Model | Performance | Memory |
|---|---|---|---|
| LLM | Qwen3-4B | 45 tok/s | 8GB |
| LLM | GLM4-9B-4bit | 35 tok/s | 6GB |
| LLM | MiniCPM-SALA-9B-8bit | 28 tok/s | 9.6GB |
| VLM | Moxin-7B-8bit | 30 tok/s | 10GB |
| ASR | Qwen3-ASR-1.7B-8bit | 30x real-time | 2.5GB |
| ASR | Qwen3-ASR-0.6B-8bit | 50x real-time | 1.0GB |
| TTS | GPT-SoVITS | 4x real-time | 2GB |
| Image | FLUX.2-klein | ~5s/image | 13GB |
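The ASR rows report a real-time factor: a 30x real-time model transcribes 30 seconds of audio per second of compute. A quick sketch of the arithmetic:

```rust
// Real-time factor: seconds of audio processed per wall-clock second.
// At 30x real-time, a 10-minute recording takes 600 / 30 = 20 seconds.
fn transcription_seconds(audio_secs: f64, realtime_factor: f64) -> f64 {
    audio_secs / realtime_factor
}

fn main() {
    println!("{}", transcription_seconds(600.0, 30.0)); // prints 20
    println!("{}", transcription_seconds(600.0, 50.0)); // prints 12
}
```

By the same arithmetic, the 4x real-time TTS row means one minute of speech is synthesized in about 15 seconds.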
## Next steps
- **Installation**: install Rust, Xcode tools, and set up your environment
- **Quick start**: run your first model in under 5 minutes
- **Model guides**: explore detailed guides for each model family
- **API reference**: browse the complete API documentation