# Welcome to OminiX-MLX
OminiX-MLX is a comprehensive Rust ecosystem for running machine learning models on Apple Silicon using MLX. It provides safe Rust bindings to Apple’s MLX framework and a collection of production-ready model crates for LLMs, vision-language models, speech recognition, text-to-speech, and image generation. It is built for production use with zero Python dependencies at inference time.

## Key features
- **GPU acceleration**: Metal-optimized inference on M1/M2/M3/M4 chips with unified memory architecture
- **Pure Rust**: no Python runtime required for inference; compile once, deploy anywhere
- **Lazy evaluation**: automatic kernel fusion and memory optimization through MLX’s lazy execution
- **Modular design**: use only what you need; each model family is a separate crate
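The lazy-evaluation idea above can be illustrated with a small self-contained sketch. This is a conceptual toy, not the OminiX-MLX or MLX API: operations are recorded into an expression graph, and no arithmetic happens until an explicit `eval` call, which is the point at which a real runtime like MLX can fuse adjacent operations into a single kernel.

```rust
// Conceptual toy: building an expression records operations;
// nothing is computed until `eval` is called.
enum Expr {
    Value(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn add(self, other: Expr) -> Expr {
        Expr::Add(Box::new(self), Box::new(other))
    }
    fn mul(self, other: Expr) -> Expr {
        Expr::Mul(Box::new(self), Box::new(other))
    }

    // Evaluation walks the recorded graph; a real runtime could fuse
    // adjacent operations into one kernel at this point.
    fn eval(&self) -> f64 {
        match self {
            Expr::Value(v) => *v,
            Expr::Add(a, b) => a.eval() + b.eval(),
            Expr::Mul(a, b) => a.eval() * b.eval(),
        }
    }
}

fn main() {
    // (2 + 3) * 4 — the graph is built eagerly, the arithmetic lazily.
    let graph = Expr::Value(2.0).add(Expr::Value(3.0)).mul(Expr::Value(4.0));
    println!("{}", graph.eval()); // prints 20
}
```

Deferring work this way is what lets MLX see the whole computation before choosing how to schedule it on the GPU.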
## What you can build
- **Language models**: run Qwen, GLM, Mixtral, Mistral, and MiniCPM models with high throughput
- **Vision-language models**: process images and text together with Moxin-7B VLM
- **Speech recognition**: transcribe audio in 30+ languages with Qwen3-ASR, Paraformer, and FunASR
- **Text-to-speech**: clone voices with few-shot learning using GPT-SoVITS
- **Image generation**: generate images from text with FLUX.2-klein and Z-Image
- **API server**: deploy OpenAI-compatible API endpoints for your models
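An OpenAI-compatible server accepts the standard `POST /v1/chat/completions` request shape, so existing OpenAI clients work against it. As a hedged sketch (the model name and server address below are placeholders, not values from OminiX-MLX), this is the JSON body such an endpoint expects:

```rust
// Builds the JSON body an OpenAI-compatible /v1/chat/completions
// endpoint expects. The model name is a placeholder; use whatever
// your server actually serves. Plain string formatting is used here
// to stay dependency-free; a real client would use a JSON library.
fn chat_request_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","messages":[{{"role":"user","content":"{}"}}]}}"#,
        model, prompt
    )
}

fn main() {
    let body = chat_request_body("qwen3-4b", "Hello!");
    println!("{}", body);
    // Send with any HTTP client, e.g.:
    //   curl http://localhost:8080/v1/chat/completions \
    //     -H "Content-Type: application/json" -d "$BODY"
    // (host and port depend on how you start the server)
}
```

Because the wire format matches OpenAI's, client libraries only need their base URL pointed at the local server.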
## Architecture

OminiX-MLX follows a layered architecture: the model crates build on top of the core MLX bindings.

## Performance

Benchmarks on Apple M3 Max (128GB):

| Task | Model | Performance | Memory |
|---|---|---|---|
| LLM | Qwen3-4B | 45 tok/s | 8GB |
| LLM | GLM4-9B-4bit | 35 tok/s | 6GB |
| LLM | MiniCPM-SALA-9B-8bit | 28 tok/s | 9.6GB |
| VLM | Moxin-7B-8bit | 30 tok/s | 10GB |
| ASR | Qwen3-ASR-1.7B-8bit | 30x real-time | 2.5GB |
| ASR | Qwen3-ASR-0.6B-8bit | 50x real-time | 1.0GB |
| TTS | GPT-SoVITS | 4x real-time | 2GB |
| Image | FLUX.2-klein | ~5s/image | 13GB |
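The ASR rows report a real-time factor: a 30x real-time model transcribes 30 seconds of audio per second of compute. A quick sketch of the arithmetic:

```rust
// Real-time factor: seconds of audio processed per wall-clock second.
// At 30x real-time, a 10-minute recording takes 600 / 30 = 20 seconds.
fn transcription_seconds(audio_secs: f64, realtime_factor: f64) -> f64 {
    audio_secs / realtime_factor
}

fn main() {
    println!("{}", transcription_seconds(600.0, 30.0)); // prints 20
    println!("{}", transcription_seconds(600.0, 50.0)); // prints 12
}
```

By the same arithmetic, the 4x real-time TTS row means one minute of speech is synthesized in about 15 seconds.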
## Next steps
- **Installation**: install Rust, Xcode tools, and set up your environment
- **Quick start**: run your first model in under 5 minutes
- **Model guides**: explore detailed guides for each model family
- **API reference**: browse the complete API documentation