OminiX-MLX provides three state-of-the-art speech recognition models optimized for Apple Silicon. Depending on the model, transcription runs at roughly 10x to 75x real time through Metal GPU acceleration.

Available models

Qwen3-ASR

Multilingual ASR supporting 30+ languages with 30-50x real-time speed

Paraformer

Non-autoregressive Chinese ASR with 18x+ real-time speed

FunASR-Nano

LLM-based 800M parameter model supporting 31 languages

Performance comparison

| Model | Languages | Speed | Architecture | Parameters |
|---|---|---|---|---|
| Qwen3-ASR-1.7B | 30+ languages | 30x RT | Encoder-decoder | 1.7B |
| Qwen3-ASR-0.6B | 30+ languages | 22x RT | Encoder-decoder | 0.6B |
| Paraformer | Chinese | 18-75x RT | Non-autoregressive | 220M |
| FunASR-Nano | 31 languages | ~10x RT | LLM-based | 800M |
Speed measured on Apple M3/M4 series chips. RT = real time: a speed of 30x RT means that 30 seconds of audio are transcribed per second of processing.
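To make the speed figures concrete, here is a minimal sketch of how a speed multiple relates audio length to processing time. The helper names `speed_multiple` and `estimated_seconds` are hypothetical and not part of OminiX-MLX; the numbers in the usage note come from the table, not from new measurements.

```rust
/// Speed as a multiple of real time: seconds of audio handled per second of compute.
fn speed_multiple(audio_secs: f64, processing_secs: f64) -> f64 {
    audio_secs / processing_secs
}

/// Estimated wall-clock seconds needed to transcribe `audio_secs` of audio
/// at a given speed multiple (for example 30.0 for "30x RT").
fn estimated_seconds(audio_secs: f64, speed: f64) -> f64 {
    audio_secs / speed
}
```

At the 30x figure listed for Qwen3-ASR-1.7B, `estimated_seconds(3600.0, 30.0)` gives 120 seconds for one hour of audio.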

Key features

Pure Rust implementation

All models are implemented in Rust with zero Python dependencies at runtime:
  • Native Metal GPU acceleration via MLX
  • Efficient memory management
  • Cross-platform binary distribution
  • Direct integration into Rust applications

Optimized for Apple Silicon

  • Metal GPU acceleration for neural network operations
  • Accelerate framework for audio processing (FFT, resampling)
  • 8-bit quantization support for reduced memory usage
  • Efficient batch processing for long-form audio
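As a rough illustration of why 8-bit quantization reduces memory usage, the sketch below estimates weight storage from a parameter count and a bit width. `weight_gigabytes` is a hypothetical helper, and the estimate deliberately ignores quantization scales, activations, and decoder caches, so real usage will be somewhat higher.

```rust
/// Approximate weight storage in gigabytes (1 GB = 1e9 bytes) for a model
/// with `params` parameters stored at `bits_per_param` bits each.
/// Assumption: only the weights are counted, nothing else.
fn weight_gigabytes(params: u64, bits_per_param: u32) -> f64 {
    (params as f64) * (bits_per_param as f64) / 8.0 / 1e9
}
```

For a 1.7B-parameter model this gives about 3.4 GB at 16 bits and about 1.7 GB at 8 bits.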

Production-ready API

Unified API server provides OpenAI-compatible endpoints:
# Start API server
cargo run --release -p ominix-api -- \
    --asr-model ~/.OminiX/models/qwen3-asr-1.7b --port 8080

# Transcribe audio (OpenAI Whisper-compatible)
curl http://localhost:8080/v1/audio/transcriptions \
    -F file=@audio.wav -F language=English
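The OpenAI transcription API receives the audio as a multipart/form-data upload in a field named `file`. For callers that cannot shell out to curl, the following standard-library-only sketch builds an equivalent request body. `build_transcription_body` is a hypothetical helper, not part of the OminiX API, and whether the server needs further fields (such as `model`) is not stated here.

```rust
/// Build a multipart/form-data body with a `file` part and a `language` part,
/// mirroring the two form fields of the curl command.
fn build_transcription_body(boundary: &str, filename: &str, audio: &[u8], language: &str) -> Vec<u8> {
    let mut body = Vec::new();
    // Part 1: the audio bytes, sent under the field name `file`.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(
        format!("Content-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\n").as_bytes(),
    );
    body.extend_from_slice(b"Content-Type: application/octet-stream\r\n\r\n");
    body.extend_from_slice(audio);
    body.extend_from_slice(b"\r\n");
    // Part 2: the `language` text field.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(b"Content-Disposition: form-data; name=\"language\"\r\n\r\n");
    body.extend_from_slice(language.as_bytes());
    body.extend_from_slice(b"\r\n");
    // The closing delimiter ends the body.
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    body
}
```

The body would be sent as a POST to /v1/audio/transcriptions with the header `Content-Type: multipart/form-data; boundary=<boundary>`, using whichever HTTP client the application already depends on.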

Architecture overview

Qwen3-ASR architecture

Audio (16kHz) → 128-mel Spectrogram → Conv2d×3 (8× downsample)
             → Transformer Encoder → Linear Projector → Qwen3 Decoder → Text
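The front end of this pipeline determines how many positions the encoder sees. The sketch below works through the arithmetic under two loudly stated assumptions that the diagram does not confirm: a 10 ms mel hop (160 samples at 16 kHz) and ceiling rounding in the 8x downsampling. All names are hypothetical.

```rust
/// ASSUMPTION: 10 ms hop between mel frames (160 samples at 16 kHz).
const HOP_SAMPLES: usize = 160;
/// Total downsampling of the three Conv2d layers, taken from the diagram.
const DOWNSAMPLE: usize = 8;

/// Number of mel-spectrogram frames for a clip of `num_samples` samples.
fn mel_frames(num_samples: usize) -> usize {
    num_samples / HOP_SAMPLES
}

/// Number of encoder positions after 8x downsampling.
/// ASSUMPTION: the convolutions pad so that the frame count rounds up.
fn encoder_positions(num_samples: usize) -> usize {
    (mel_frames(num_samples) + DOWNSAMPLE - 1) / DOWNSAMPLE
}
```

Under these assumptions a 30-second clip (480,000 samples) yields 3,000 mel frames and 375 encoder positions, or 12.5 positions per second of audio.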

Paraformer architecture

Audio (16kHz) → 80-mel Spectrogram → LFR 7/6
             → SAN-M Encoder (50 layers) → CIF Predictor
             → Bidirectional Decoder → Tokens (parallel)
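"LFR 7/6" most likely denotes low-frame-rate stacking with a window of 7 frames and a stride of 6, as used in the FunASR front end; that reading, and the padding scheme below, are assumptions rather than something this page confirms. Under that reading, each output frame concatenates 7 consecutive 80-dimensional mel frames into one 560-dimensional vector, and the sequence becomes 6 times shorter. `apply_lfr` is a hypothetical name.

```rust
/// Stack `m` consecutive frames with stride `n` (low frame rate).
/// The input is padded on the left with (m - 1) / 2 copies of the first frame
/// and on the right with copies of the last frame, so the output has
/// ceil(T / n) frames of dimension m * D.
fn apply_lfr(frames: &[Vec<f32>], m: usize, n: usize) -> Vec<Vec<f32>> {
    assert!(m >= 1 && n >= 1, "window and stride must be positive");
    if frames.is_empty() {
        return Vec::new();
    }
    let t = frames.len();
    let out_len = (t + n - 1) / n;
    let left_pad = (m - 1) / 2;
    // Map an index in the padded sequence back to a real frame index.
    let source_index = |j: usize| j.saturating_sub(left_pad).min(t - 1);
    (0..out_len)
        .map(|i| {
            let mut stacked = Vec::with_capacity(m * frames[0].len());
            for k in 0..m {
                stacked.extend_from_slice(&frames[source_index(i * n + k)]);
            }
            stacked
        })
        .collect()
}
```

Calling `apply_lfr(&frames, 7, 6)` on 12 frames of 80 mel bins returns 2 frames of 560 values each.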

FunASR-Nano architecture

Audio (16kHz) → 80-mel Spectrogram → Whisper Encoder (frozen)
             → Audio Adaptor → Qwen LLM → Text

Supported audio formats

All models support:
  • WAV - Native support (any sample rate, mono/stereo)
  • MP3, M4A, FLAC, OGG, AAC - Automatic conversion via ffmpeg
  • Raw samples - Direct f32 array input at 16kHz
Automatic resampling to 16kHz is handled internally.
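For callers who use the raw-sample input path, the following sketch shows one way to prepare a 16 kHz mono f32 buffer by hand. It uses a plain channel average and linear interpolation for simplicity; this is not the library's internal resampler, which presumably filters more carefully, and both function names are hypothetical.

```rust
/// Average interleaved channels into a single mono channel.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be positive");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample by linear interpolation between neighbouring samples.
/// No anti-aliasing filter is applied, so treat this as a sketch only.
fn resample_linear(samples: &[f32], src_rate: u32, dst_rate: u32) -> Vec<f32> {
    if samples.is_empty() || src_rate == dst_rate {
        return samples.to_vec();
    }
    let out_len = (samples.len() as u64 * dst_rate as u64 / src_rate as u64) as usize;
    let step = src_rate as f64 / dst_rate as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let i0 = (pos.floor() as usize).min(samples.len() - 1);
            let i1 = (i0 + 1).min(samples.len() - 1);
            let frac = (pos - i0 as f64) as f32;
            samples[i0] * (1.0 - frac) + samples[i1] * frac
        })
        .collect()
}
```

A 48 kHz stereo buffer would go through `downmix_to_mono(&buf, 2)` and then `resample_linear(&mono, 48_000, 16_000)` before being passed to a model.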

Model selection guide

Choose Qwen3-ASR when you need:

  • Multilingual support (30+ languages)
  • Best accuracy on Chinese, English, Japanese, Korean
  • Long-form audio transcription (automatic 30s chunking)
  • Production-grade quality and speed balance
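The 30-second chunking mentioned above can be pictured as fixed-size windows over the 16 kHz sample buffer. The sketch below shows only that simplest form; whether the actual implementation overlaps chunks or splits at silence is not stated here, and `chunk_audio` is a hypothetical name.

```rust
const SAMPLE_RATE: usize = 16_000;
const CHUNK_SECS: usize = 30;

/// Split mono 16 kHz samples into consecutive windows of at most 30 seconds.
/// The final window holds whatever audio remains and may be shorter.
fn chunk_audio(samples: &[f32]) -> Vec<&[f32]> {
    samples.chunks(SAMPLE_RATE * CHUNK_SECS).collect()
}
```

A 70-second recording splits into three windows of 30, 30, and 10 seconds.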

Choose Paraformer when you need:

  • Chinese-only transcription
  • Maximum speed (non-autoregressive)
  • Lower memory footprint
  • Extremely fast inference for short audio

Choose FunASR-Nano when you need:

  • 31 language support including dialects
  • Far-field/noisy environment robustness
  • Regional accent recognition
  • LLM-based semantic understanding

Quick start

1. Download a model

Download any ASR model from HuggingFace:
# Qwen3-ASR-1.7B (recommended)
huggingface-cli download mlx-community/Qwen3-ASR-1.7B-8bit \
    --local-dir ~/.OminiX/models/qwen3-asr-1.7b

# Paraformer
git clone https://modelscope.cn/models/damo/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch.git

# FunASR-Nano
huggingface-cli download mlx-community/Fun-ASR-Nano-2512-fp16 \
    --local-dir ~/.OminiX/models/funasr-nano
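The shell expands `~` in the download commands, but Rust path APIs do not, so an application that loads these directories needs to build the absolute path itself. A minimal sketch, assuming the `~/.OminiX/models/<name>` layout used by the commands; `model_dir` and `model_dir_from_env` are hypothetical helpers, not part of the OminiX crates.

```rust
use std::path::{Path, PathBuf};

/// Join a home directory with the `.OminiX/models/<name>` layout
/// used by the download commands.
fn model_dir(home: &Path, name: &str) -> PathBuf {
    home.join(".OminiX").join("models").join(name)
}

/// Resolve the directory from the HOME environment variable, if it is set.
fn model_dir_from_env(name: &str) -> Option<PathBuf> {
    std::env::var_os("HOME").map(|home| model_dir(Path::new(&home), name))
}
```

For example, `model_dir_from_env("qwen3-asr-1.7b")` returns the directory that the first download command writes to.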

2. Transcribe audio

Use the command-line interface:
# Qwen3-ASR
cargo run --release --example transcribe -- audio.wav

# Paraformer (after conversion)
cargo run --release --example transcribe -- audio.wav /path/to/paraformer

# FunASR-Nano
cargo run --release --example transcribe -- audio.wav

3. Integrate into your application

Use the Rust API:
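use qwen3_asr_mlx::{Qwen3ASR, default_model_path};

// Assumes the crate's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the model weights from the default model directory.
    let mut model = Qwen3ASR::load(default_model_path())?;
    // Transcribe a file, telling the model which language to expect.
    let text = model.transcribe_with_language("audio.wav", "English")?;
    println!("Transcription: {}", text);
    Ok(())
}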

Next steps

Qwen3-ASR

Learn about the multilingual Qwen3-ASR models

Paraformer

Explore the high-speed Paraformer Chinese ASR

FunASR-Nano

Discover the LLM-based FunASR-Nano

API Reference

View the unified API documentation
