Prerequisites

Before installing OminiX-MLX, ensure your system meets these requirements:

- **Operating system:** macOS 14.0 (Sonoma) or newer
- **Hardware:** Apple Silicon (M1, M2, M3, or M4 chip)
- **Rust:** Rust 1.82.0 or newer
- **Development tools:** Xcode Command Line Tools

OminiX-MLX requires Apple Silicon to take advantage of Metal GPU acceleration and unified memory; Intel Macs are not supported.

Step 1: Install Xcode Command Line Tools

The Command Line Tools provide essential build utilities and frameworks:

```shell
xcode-select --install
```

Verify the installation:

```shell
xcode-select -p
# Should output: /Library/Developer/CommandLineTools
```

Step 2: Install Rust

If you don't have Rust installed, use rustup:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

After installation, reload your shell configuration:

```shell
source "$HOME/.cargo/env"
```

Verify Rust is installed and meets the minimum version:

```shell
rustc --version
# Should output: rustc 1.82.0 or newer
```

If you have an older version of Rust, update it with `rustup update`.
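If you want to automate the version check (for example in a setup script), a minimal sketch using `sort -V` for version-aware comparison (available in the `sort` shipped on macOS and Linux):

```shell
# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

current=$(rustc --version 2>/dev/null | awk '{print $2}')
if version_ge "${current:-0}" "1.82.0"; then
  echo "Rust ${current} meets the 1.82.0 minimum"
else
  echo "Rust ${current:-(not found)} is older than 1.82.0; run: rustup update"
fi
```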

Step 3: Install HuggingFace CLI (optional)

The HuggingFace CLI makes it easy to download pre-trained models:

```shell
pip install huggingface-hub
```

Verify the installation:

```shell
huggingface-cli --version
```

While the HuggingFace CLI is optional, it's highly recommended for downloading models. Alternatively, you can download models manually from huggingface.co.
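For example, to fetch a model with the CLI (the repository id below is illustrative; substitute the model you actually want):

```shell
# Download a model snapshot into the local HuggingFace cache.
# The repo id is just an example; pick the model you need.
huggingface-cli download Qwen/Qwen3-4B-MLX-8bit

# Or place the files in a directory of your choice:
huggingface-cli download Qwen/Qwen3-4B-MLX-8bit --local-dir ./models/qwen3-4b-8bit
```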

Step 4: Clone the repository

Clone the OminiX-MLX repository:

```shell
git clone https://github.com/OminiX-ai/OminiX-MLX.git
cd OminiX-MLX
```
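The MLX C bindings live in a git submodule (see Troubleshooting below), so make sure the submodules are present before building:

```shell
# Fetch submodules for an existing clone:
git submodule update --init --recursive

# Or clone with submodules in one step:
git clone --recurse-submodules https://github.com/OminiX-ai/OminiX-MLX.git
```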

Step 5: Build OminiX-MLX

You can build all crates or just the specific ones you need:

```shell
cargo build --release
```

This will compile all model crates and core libraries. The first build may take 10-20 minutes.

Always use the `--release` flag for production builds. Debug builds are significantly slower and use more memory.

Step 6: Verify installation

Verify your installation by checking the built binaries:

```shell
ls -lh target/release/examples/
```

You should see example executables for the models you built.

Available crates

OminiX-MLX is organized into multiple crates. Here are the main ones:

Core libraries

| Crate | Description |
|---|---|
| `mlx-rs` | Safe Rust bindings to Apple's MLX framework |
| `mlx-rs-core` | Shared inference infrastructure (KV cache, RoPE, attention) |

Language models

| Crate | Models | Sizes |
|---|---|---|
| `qwen3-mlx` | Qwen2, Qwen3, Qwen3-MoE | 0.5B - 235B |
| `glm4-mlx` | GLM-4 | 9B |
| `glm4-moe-mlx` | GLM-4-MoE | 9B (45 experts) |
| `mixtral-mlx` | Mixtral | 8x7B, 8x22B |
| `mistral-mlx` | Mistral | 7B |
| `minicpm-sala-mlx` | MiniCPM-SALA | 9B (1M context) |

Vision-language models

| Crate | Models | Features |
|---|---|---|
| `moxin-vlm-mlx` | Moxin-7B VLM | DINOv2 + SigLIP + Mistral-7B |

Speech recognition

| Crate | Models | Languages |
|---|---|---|
| `qwen3-asr-mlx` | Qwen3-ASR | 30+ languages |
| `funasr-mlx` | Paraformer | Chinese, English |
| `funasr-nano-mlx` | FunASR-Nano | Chinese, English |

Text-to-speech

| Crate | Models | Features |
|---|---|---|
| `gpt-sovits-mlx` | GPT-SoVITS | Few-shot voice cloning |

Image generation

| Crate | Models | Notes |
|---|---|---|
| `flux-klein-mlx` | FLUX.2-klein | Fast 4-step generation |
| `zimage-mlx` | Z-Image | Lightweight generation |
| `qwen-image-mlx` | Qwen Image | Qwen-based generation |

API server

| Crate | Description |
|---|---|
| `ominix-api` | Unified OpenAI-compatible API server for all models |
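"OpenAI-compatible" means standard OpenAI-style clients and plain HTTP requests work against the server. A sketch of a chat request, assuming the server is running locally (the port and model name here are placeholders; check the `ominix-api` documentation for the actual defaults):

```shell
# Placeholder port and model name -- adjust to your setup.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-4b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```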

System requirements by model

Different models have different memory requirements:
| Model size | Recommended RAM | Minimum RAM |
|---|---|---|
| 0.5B - 2B | 8GB | 8GB |
| 4B - 7B | 16GB | 12GB |
| 9B | 24GB | 16GB |
| 30B+ | 64GB+ | 32GB |
| VLM (7B) | 24GB | 16GB |
| ASR | 8GB | 8GB |
| Image Gen | 32GB | 16GB |
Use quantized models (8-bit or 4-bit) to reduce memory usage. For example, Qwen3-4B-8bit uses ~4GB instead of ~8GB.
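The savings from quantization can be estimated directly: weight memory is roughly parameter count times bits per weight. A back-of-the-envelope sketch (weights only; it ignores KV cache, activations, and runtime overhead):

```shell
# Approximate weight memory in GB: params (in billions) * bits-per-weight / 8.
weight_gb() { echo $(( $1 * $2 / 8 )); }

echo "4B model at 16-bit: $(weight_gb 4 16) GB"   # 8 GB
echo "4B model at 8-bit:  $(weight_gb 4 8) GB"    # 4 GB
echo "4B model at 4-bit:  $(weight_gb 4 4) GB"    # 2 GB
```

This matches the figures above: an 8-bit 4B model needs roughly half the memory of its 16-bit counterpart.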

Troubleshooting

The MLX C bindings are included as a submodule. If the build fails because they are missing, make sure you cloned with submodules:

```shell
git submodule update --init --recursive
```

Ensure you have the Xcode Command Line Tools installed:

```shell
xcode-select --install
```

If the full build is too slow or memory-hungry, build specific crates instead of all at once:

```shell
cargo build --release -p qwen3-mlx
```

If inference is slow, make sure you're using `--release` builds. Debug builds are 10-100x slower:

```shell
cargo build --release
```

Next steps

- **Quick start:** Run your first model in under 5 minutes
- **LLM guide:** Learn how to use language models
