Qwen3-ASR MLX
Qwen3-ASR speech recognition on Apple Silicon using MLX. Supports all Qwen3-ASR model sizes (0.6B, 1.7B), with the architecture fully driven by config.
Architecture
- Audio Encoder (AuT): Conv2d frontend + Transformer with windowed attention
- Projector: Linear projection from encoder dim to decoder dim
- Text Decoder: Qwen3 LLM with GQA and Q/K RMSNorm
Installation
Quick start
Functions
default_model_path
Get the default model path. Uses the QWEN3_ASR_MODEL_PATH environment variable when set, otherwise ~/.OminiX/models/qwen3-asr-1.7b.
Default model directory path
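The lookup described for default_model_path can be sketched in plain Rust. This is a minimal illustration, not the crate's actual implementation: the helper names resolve_model_path and default_model_path_sketch are hypothetical, and reading the home directory from HOME (with "." as a fallback) is an assumption.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical helper: prefer the override value, otherwise fall back to
/// `<home>/.OminiX/models/qwen3-asr-1.7b`.
fn resolve_model_path(env_value: Option<String>, home: Option<PathBuf>) -> PathBuf {
    if let Some(path) = env_value {
        // An explicit override wins over the built-in default.
        return PathBuf::from(path);
    }
    // Assumption: when no home directory is known, use the current directory.
    let home = home.unwrap_or_else(|| PathBuf::from("."));
    home.join(".OminiX").join("models").join("qwen3-asr-1.7b")
}

/// Hypothetical wrapper that reads the real process environment.
fn default_model_path_sketch() -> PathBuf {
    resolve_model_path(
        env::var("QWEN3_ASR_MODEL_PATH").ok(),
        env::var_os("HOME").map(PathBuf::from),
    )
}
```

Keeping the environment reads in a thin wrapper makes the decision logic easy to test without mutating process state.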
load_model
Load a Qwen3-ASR model from a directory.
Path to model directory containing config.json and safetensors weights
Loaded Qwen3ASR model instance
Qwen3ASR
Main model struct for Qwen3-ASR speech recognition.
Qwen3ASR::load
Load model from directory.
Directory containing config.json and model safetensors files
Loaded model with audio encoder, text decoder, and tokenizer
Qwen3ASR::transcribe
Transcribe audio file.
Path to audio file (WAV format)
Transcribed text in Chinese by default
Qwen3ASR::transcribe_with_language
Transcribe audio file with specified language.
Path to audio file (WAV format)
Language hint (e.g., “Chinese”, “English”, “Japanese”, “Korean”, “French”, “German”, “Spanish”, “Russian”)
Transcribed text in specified language
Qwen3ASR::transcribe_samples
Transcribe audio samples (16kHz mono f32).
Audio samples at 16kHz, mono, f32 format
Language hint for transcription
Transcribed text
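Since transcribe_samples expects 16kHz mono f32 input, audio decoded elsewhere as interleaved 16-bit PCM needs converting first. The following is a minimal sketch of that conversion; the function name pcm16_to_mono_f32 is hypothetical and not part of this crate, and it assumes the audio is already sampled at 16kHz (resampling is out of scope here).

```rust
/// Hypothetical helper: convert interleaved signed 16-bit PCM into mono f32
/// samples in the range [-1.0, 1.0) by averaging the channels of each frame.
fn pcm16_to_mono_f32(interleaved: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        // Each frame holds one sample per channel; a trailing partial frame is dropped.
        .chunks_exact(channels)
        .map(|frame| {
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}
```

Averaging channels is the simplest downmix; it is one reasonable choice, not necessarily what the library does for WAV files internally.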
Qwen3ASR::transcribe_samples_chunked
Transcribe long audio by splitting into chunks.
Audio samples at 16kHz
Language hint
Sampling configuration for generation
Duration of each chunk in seconds (e.g., 30.0)
Concatenated transcription from all chunks
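To make the chunking idea concrete, here is a minimal sketch of splitting 16kHz samples into fixed-duration pieces. It assumes simple non-overlapping, fixed-length chunks; the library may split differently (for example on silence), and the name split_into_chunks is hypothetical.

```rust
/// Sample rate the transcription functions expect, per the descriptions above.
const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical helper: split samples into consecutive chunks of
/// `chunk_seconds` each. The last chunk may be shorter.
fn split_into_chunks(samples: &[f32], chunk_seconds: f32) -> Vec<&[f32]> {
    // Guard against zero, negative, or NaN durations by using at least one sample per chunk.
    let chunk_len = ((chunk_seconds * SAMPLE_RATE_HZ as f32) as usize).max(1);
    samples.chunks(chunk_len).collect()
}
```

With a 30.0 second chunk duration, 70 seconds of audio yields two full chunks of 480,000 samples and a final chunk of 160,000 samples. Each chunk would then be transcribed separately and the results concatenated.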
Qwen3ASR::transcribe_samples_with_config
Transcribe with full configuration.
Audio samples at 16kHz
Language hint
Sampling configuration (temperature, max_tokens)
Transcribed text
Types
Qwen3ASRConfig
Model configuration loaded from config.json.
SamplingConfig
Sampling configuration for text generation. Defaults:
- temperature: 0.0 (greedy decoding)
- max_tokens: 8192
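The sampling values listed for SamplingConfig can be illustrated with a stand-in struct. This is a sketch only: SamplingConfigSketch and greedy_pick are hypothetical names, the field types (f32, usize) are assumptions, and the real SamplingConfig may carry more fields. A temperature of 0.0 corresponds to greedy decoding, meaning the highest-scoring token is always chosen.

```rust
/// Hypothetical mirror of the documented sampling values; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
struct SamplingConfigSketch {
    temperature: f32,
    max_tokens: usize,
}

impl Default for SamplingConfigSketch {
    fn default() -> Self {
        // Values taken from the SamplingConfig description.
        Self { temperature: 0.0, max_tokens: 8192 }
    }
}

/// Greedy decoding step: return the index of the largest logit,
/// or None when there are no logits.
fn greedy_pick(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}
```

Greedy decoding is deterministic, which is usually what a transcription task wants: the same audio produces the same text on every run.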
AudioConfig
Audio preprocessing configuration.
Error
Error type for Qwen3-ASR operations.
Supported languages
- Chinese
- English
- Cantonese
- Japanese
- Korean
- French
- German
- Spanish
- Russian
Model files
Required files in model directory:
- config.json - Model configuration
- model.safetensors or model-00001-of-*.safetensors - Model weights
- tokenizer.json or vocab.json + merges.txt - Tokenizer
- tokenizer_config.json - Tokenizer configuration
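Before loading, it can help to check that a directory actually contains the files listed above. The sketch below is a hypothetical pre-flight check, not something the crate is known to provide: missing_model_files and list_file_names are illustrative names, and matching sharded weights by the model-00001-of- prefix is an assumption based on the file pattern shown.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: given the file names found in a model directory,
/// return a description of each required entry that is absent.
fn missing_model_files(names: &[String]) -> Vec<&'static str> {
    let has = |wanted: &str| names.iter().any(|f| f.as_str() == wanted);
    // Sharded weights: the first shard is enough to detect the layout.
    let has_sharded = names
        .iter()
        .any(|f| f.starts_with("model-00001-of-") && f.ends_with(".safetensors"));

    let mut missing = Vec::new();
    if !has("config.json") {
        missing.push("config.json");
    }
    if !(has("model.safetensors") || has_sharded) {
        missing.push("model.safetensors or model-00001-of-*.safetensors");
    }
    if !(has("tokenizer.json") || (has("vocab.json") && has("merges.txt"))) {
        missing.push("tokenizer.json or vocab.json + merges.txt");
    }
    if !has("tokenizer_config.json") {
        missing.push("tokenizer_config.json");
    }
    missing
}

/// Hypothetical helper: collect the UTF-8 file names directly inside `dir`.
fn list_file_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        let file_name = entry?.file_name();
        if let Some(name) = file_name.to_str() {
            names.push(name.to_string());
        }
    }
    Ok(names)
}
```

Passing the result of list_file_names into missing_model_files gives an empty list when the directory looks complete, and otherwise names what to download.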
Environment variables
QWEN3_ASR_MODEL_PATH - Override default model path