[Header image: collage of on-the-ground photos from the transcription gathering efforts in Pakistan and Liberia]

Welcome to Omnilingual ASR

Omnilingual ASR is an open-source speech recognition system supporting over 1,600 languages — including hundreds never previously covered by any ASR technology. Designed for broad accessibility, it enables new languages to be added with just a few paired examples without requiring specialized expertise or large datasets. By combining scalable zero-shot learning with a flexible model family, Omnilingual ASR aims to make speech technology more inclusive and adaptable for communities and researchers worldwide.

Quick Start

Transcribe audio in minutes with simple Python code

Installation

Install via pip or uv and set up your environment

Supported Languages

Explore the 1,600+ supported languages and the language ID format reference
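Language IDs pair a language code with a script tag (for example, eng_Latn for English in Latin script). A small validation helper might look like the sketch below; the exact ID shape is an assumption based on common multilingual-model conventions, so check the supported-languages reference for the authoritative list.

```python
import re

# Assumed shape: <iso639-3>_<Script>, e.g. "eng_Latn".
# This pattern is illustrative, not the library's own validator.
LANG_ID_RE = re.compile(r"^[a-z]{3}_[A-Z][a-z]{3}$")

def is_valid_lang_id(lang_id: str) -> bool:
    """Return True if lang_id matches the <code>_<Script> shape."""
    return bool(LANG_ID_RE.match(lang_id))

print(is_valid_lang_id("eng_Latn"))  # True
print(is_valid_lang_id("english"))   # False
```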

Model Architecture

Learn about the SSL (wav2vec2), CTC, and LLM model families

Key Features

State-of-the-art performance across 1,600+ languages with character error rates (CER) below 10% for 78% of those languages. Includes hundreds of languages never previously covered by ASR technology.
Choose from three model architectures:
  • CTC Models: Fast parallel generation (hundreds of times faster than real time)
  • LLM Models: Language-conditioned transcription with optional context
  • Zero-Shot Models: Transcribe new languages with just a few examples
Available in 300M, 1B, 3B, and 7B parameter variants to balance accuracy and computational resources.
Add support for new languages with just a few paired audio-text examples using the zero-shot model variant.
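The December 2025 release documents the naming pattern omniASR_LLM_Unlimited_{300M,1B,3B,7B}_v2 for unlimited-length LLM variants. A small helper can assemble such names from a family and size; note that names produced for other combinations are extrapolations from that one documented pattern, not a confirmed model list.

```python
# Sizes and families taken from this page; other generated names are
# assumptions extrapolated from omniASR_LLM_Unlimited_{size}_v2.
SIZES = {"300M", "1B", "3B", "7B"}
FAMILIES = {"CTC", "LLM"}

def model_name(family: str, size: str, unlimited: bool = False,
               version: int = 2) -> str:
    """Build a model card name from the documented naming pattern."""
    if family not in FAMILIES or size not in SIZES:
        raise ValueError(f"unknown family/size: {family}/{size}")
    parts = ["omniASR", family]
    if unlimited:
        parts.append("Unlimited")
    parts += [size, f"v{version}"]
    return "_".join(parts)

print(model_name("LLM", "7B", unlimited=True))
# omniASR_LLM_Unlimited_7B_v2
```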

Performance Highlights

Our 7B-LLM-ASR system achieves state-of-the-art performance across 1,600+ languages, with character error rates (CER) below 10% for 78% of those languages.
Per-language CER results and training hours can be found in the per_language_results_table_7B_llm_asr.csv file.
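The per-language CSV can be inspected with the standard library. The sketch below uses made-up illustrative rows and assumed column names ("language", "cer", "train_hours"); read the real file's header for the actual schema.

```python
import csv
import io

# Hypothetical rows mimicking per_language_results_table_7B_llm_asr.csv.
# Column names and values are illustrative assumptions only.
sample = io.StringIO(
    "language,cer,train_hours\n"
    "lang_a,3.2,1000.0\n"
    "lang_b,12.5,4.0\n"
)

rows = list(csv.DictReader(sample))
# Count languages under the 10% CER threshold the docs cite.
under_10 = [r["language"] for r in rows if float(r["cer"]) < 10.0]
print(under_10)  # ['lang_a']
```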

Model Architecture Overview

Omnilingual ASR provides three distinct model families:

SSL (Self-Supervised Learning) Models

Pre-trained wav2vec2 encoders that learn speech representations without transcriptions. These serve as the foundation for both CTC and LLM models.

CTC (Connectionist Temporal Classification) Models

Fast, parallel transcription models ideal for real-time applications:
  • Real-Time Factor: 0.001-0.006 (roughly 170x to 1000x faster than real time)
  • Use Case: High-throughput batch processing
  • Limitation: No language conditioning support
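Real-time factor (RTF) is processing time divided by audio duration, so values below 1.0 mean faster than real time and the speedup is simply the inverse. The arithmetic below makes that relationship concrete for the RTF range quoted above.

```python
def speedup_from_rtf(rtf: float) -> float:
    """RTF = processing_time / audio_duration; speedup is its inverse."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf

# The CTC range quoted above:
print(round(speedup_from_rtf(0.006)))  # 167
print(round(speedup_from_rtf(0.001)))  # 1000
```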

LLM (Language Model) Models

Autoregressive models with optional language conditioning:
  • Real-Time Factor: ~0.09 (about 11x faster than real time)
  • Language Conditioning: Specify target language for better accuracy
  • Unlimited Variants: Support for audio of any length
  • Zero-Shot Variant: Learn new languages from context examples
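The zero-shot variant conditions on a handful of paired audio/transcription examples in the target language. The sketch below only illustrates the shape of that few-shot context; the class and field names are hypothetical, and the real pipeline's API may differ.

```python
from dataclasses import dataclass

# Hypothetical container for one paired example; not the library's API.
@dataclass
class PairedExample:
    audio_path: str
    transcription: str

# A few paired examples in the new language form the few-shot context.
context = [
    PairedExample("sample1.wav", "first reference transcription"),
    PairedExample("sample2.wav", "second reference transcription"),
]

# A zero-shot call would pass this context alongside the target audio,
# letting the model infer the language's writing conventions from it.
print(len(context))  # 2
```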

December 2025 Update

We released two major improvements:
  1. Improved v2 Models: Enhanced accuracy (CER) for both CTC and LLM-ASR models
  2. Unlimited Audio Length: New LLM variant supporting transcription of unlimited-length audio (omniASR_LLM_Unlimited_{300M,1B,3B,7B}_v2)
Unlimited audio length models have accuracy comparable to the limited-length models; however, fine-tuning recipes for these models are not currently supported.

Resources

Research Paper

Read the full technical paper

Blog Post

Learn about the research and development

HuggingFace Demo

Try the interactive demo

Dataset

Access the multilingual corpus

Next Steps

1. Install the Package: Follow the installation guide to set up Omnilingual ASR in your environment.
2. Run Your First Transcription: Try the quick start guide to transcribe your first audio file.
3. Explore Advanced Features

License

Omnilingual ASR code and models are released under the Apache 2.0 License.
