
Welcome to OminiX-MLX

OminiX-MLX is a Rust ecosystem for running machine learning models on Apple Silicon with MLX. It provides safe Rust bindings to Apple's MLX framework and a collection of production-ready model crates covering LLMs, vision-language models, speech recognition, text-to-speech, and image generation - all with zero Python dependencies at inference time.

Key features

GPU acceleration

Metal-optimized inference on M1/M2/M3/M4 chips with unified memory architecture

Pure Rust

No Python runtime required for inference - compile once, deploy anywhere

Lazy evaluation

Automatic kernel fusion and memory optimization through MLX’s lazy execution

Modular design

Use only what you need - each model family is a separate crate

What you can build

Language models

Run Qwen, GLM, Mixtral, Mistral, and MiniCPM models with high throughput

Vision-language models

Process images and text together with Moxin-7B VLM

Speech recognition

Transcribe audio in 30+ languages with Qwen3-ASR, Paraformer, and FunASR

Text-to-speech

Clone voices with few-shot learning using GPT-SoVITS

Image generation

Generate images from text with FLUX.2-klein and Z-Image

API server

Deploy OpenAI-compatible API endpoints for your models
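Because the API server speaks the standard OpenAI wire format, existing OpenAI clients and SDKs can point at it unchanged. A hedged sketch of a chat-completion request body (the `/v1/chat/completions` route and the `qwen3-4b` model id are assumptions for illustration, not taken from this page):

```json
{
  "model": "qwen3-4b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize MLX's lazy evaluation in one sentence."}
  ],
  "max_tokens": 128,
  "stream": false
}
```

The same shape works for streaming responses by setting `"stream": true`, as in the upstream OpenAI API.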

Architecture

OminiX-MLX follows a layered architecture:
┌─────────────────────────────────────────────────────────────────────────────┐
│                              User Application                                │
│                    (OminiX-API / Custom Rust Application)                   │
└───────────────────────────────┬─────────────────────────────────────────────┘

        ┌───────────────────────┼───────────────────────────┐
        │                       │                           │
        ▼                       ▼                           ▼
┌───────────────┐       ┌─────────────────┐       ┌─────────────────┐
│  LLM / VLM    │       │   Audio Crates  │       │  Image Crates   │
├───────────────┤       ├─────────────────┤       ├─────────────────┤
│ qwen3-mlx     │       │ funasr-mlx      │       │ flux-klein-mlx  │
│ glm4-mlx      │       │ funasr-nano-mlx │       │ zimage-mlx      │
│ mixtral-mlx   │       │ qwen3-asr-mlx   │       │ qwen-image-mlx  │
│ moxin-vlm-mlx │       │ gpt-sovits-mlx  │       │                 │
└───────┬───────┘       └────────┬────────┘       └────────┬────────┘
        │                        │                         │
        └────────────────────────┼─────────────────────────┘


                  ┌──────────────────────────┐
                  │       mlx-rs-core        │
                  ├──────────────────────────┤
                  │ • KV Cache Management    │
                  │ • RoPE Embeddings        │
                  │ • Attention (SDPA)       │
                  │ • Audio Processing       │
                  │ • Metal Kernels          │
                  └────────────┬─────────────┘


                  ┌──────────────────────────┐
                  │         mlx-rs           │
                  ├──────────────────────────┤
                  │ • Safe Rust API          │
                  │ • Array Operations       │
                  │ • Neural Network Layers  │
                  └────────────┬─────────────┘


                  ┌──────────────────────────┐
                  │      Apple MLX (C++)     │
                  ├──────────────────────────┤
                  │ • Metal GPU Backend      │
                  │ • Unified Memory         │
                  │ • Lazy Evaluation        │
                  └──────────────────────────┘
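Because each model family ships as its own crate, an application depends only on the layers it actually uses. A minimal Cargo.toml sketch using the crate names from the diagram above (the version numbers and registry availability are assumptions, not confirmed by this page):

```toml
[package]
name = "my-asr-app"
version = "0.1.0"
edition = "2021"

[dependencies]
# Speech recognition only - the LLM, VLM, and image crates are never compiled.
qwen3-asr-mlx = "0.1"
# Shared primitives (KV cache, RoPE, SDPA, Metal kernels) arrive
# transitively through mlx-rs-core and mlx-rs.
```

Swapping model families is then a one-line dependency change rather than a rebuild of a monolithic framework.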

Performance

Benchmarks on Apple M3 Max (128GB):
| Task  | Model                | Performance   | Memory  |
|-------|----------------------|---------------|---------|
| LLM   | Qwen3-4B             | 45 tok/s      | 8 GB    |
| LLM   | GLM4-9B-4bit         | 35 tok/s      | 6 GB    |
| LLM   | MiniCPM-SALA-9B-8bit | 28 tok/s      | 9.6 GB  |
| VLM   | Moxin-7B-8bit        | 30 tok/s      | 10 GB   |
| ASR   | Qwen3-ASR-1.7B-8bit  | 30x real-time | 2.5 GB  |
| ASR   | Qwen3-ASR-0.6B-8bit  | 50x real-time | 1.0 GB  |
| TTS   | GPT-SoVITS           | 4x real-time  | 2 GB    |
| Image | FLUX.2-klein         | ~5 s/image    | 13 GB   |

Next steps

Installation

Install Rust, Xcode tools, and set up your environment

Quick start

Run your first model in under 5 minutes

Model guides

Explore detailed guides for each model family

API reference

Browse the complete API documentation
