Create a new file transcribe.py with the following code:
transcribe.py
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Initialize the pipeline with a model
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_Unlimited_7B_v2")

# Transcribe an audio file
audio_files = ["/path/to/your/audio.wav"]
transcriptions = pipeline.transcribe(audio_files, batch_size=1)

# Print the result
print(f"Transcription: {transcriptions[0]}")
3. Run the Script
Execute your script:
python transcribe.py
The model is downloaded automatically on first use (~30 GiB for the 7B model) and cached in ~/.cache/fairseq2/assets/, so subsequent runs load it directly from the cache.
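To see what is already cached, or how much disk space the models occupy, you can inspect that directory directly. A minimal stdlib-only sketch (the cache path is the one noted above; the helper name is illustrative, not part of the library):

```python
from pathlib import Path

def list_cached_assets(cache_dir: Path = Path.home() / ".cache" / "fairseq2" / "assets"):
    """Return (name, size_in_GiB) pairs for cached assets, largest first."""
    if not cache_dir.exists():
        return []
    entries = []
    for item in cache_dir.iterdir():
        # Sum file sizes recursively for directories; take the size directly for files.
        if item.is_dir():
            size = sum(f.stat().st_size for f in item.rglob("*") if f.is_file())
        else:
            size = item.stat().st_size
        entries.append((item.name, size / 1024**3))
    return sorted(entries, key=lambda e: e[1], reverse=True)

for name, gib in list_cached_assets():
    print(f"{name}: {gib:.1f} GiB")
```

Deleting an entry from this directory simply forces a re-download on the next run.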
The pipeline accepts multiple audio input formats:
# Most common: provide file paths
audio_files = [
    "/path/to/audio1.flac",
    "/path/to/audio2.wav",
    "/path/to/audio3.mp3",
]
transcriptions = pipeline.transcribe(audio_files, batch_size=3)
Audio Length Constraint: Currently, only audio files shorter than 40 seconds are accepted for CTC and standard LLM models. Use omniASR_LLM_Unlimited_* models for longer audio.
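If you are unsure whether your files fit under the 40-second limit, you can check before batching. A small sketch using only the standard library's wave module (so it handles WAV files only; for FLAC or MP3 you would need a third-party reader such as soundfile) that partitions inputs against the constraint:

```python
import wave

MAX_SECONDS = 40.0  # limit for CTC and standard LLM models, per the note above

def wav_duration(path: str) -> float:
    """Duration of a WAV file in seconds."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()

def split_by_length(paths):
    """Partition paths into (short_enough, too_long) around the 40 s limit."""
    short, long_ = [], []
    for p in paths:
        (short if wav_duration(p) < MAX_SECONDS else long_).append(p)
    return short, long_
```

Files landing in the second list can then be routed to an omniASR_LLM_Unlimited_* model instead.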
Models are large (1.2 GiB to 30 GiB). The first download may take time depending on your internet connection. Models are cached in ~/.cache/fairseq2/assets/ for future use.
Out of memory errors
Try a smaller model (300M or 1B instead of 3B or 7B), reduce the batch size to 1, or use a GPU with more VRAM.