Overview
ContextExample represents a single context example with paired audio and text, used for zero-shot transcription with the omniASR_LLM_7B_ZS model.
Structure
ContextExample is a dataclass with two fields:
Audio input in one of the following formats:
- `str` or `Path`: Path to an audio file
- `bytes`: Raw audio bytes
- `NDArray[np.int8]`: Audio data as a numpy array
- `dict`: Pre-decoded audio with `'waveform'` and `'sample_rate'` keys
Corresponding text transcription for the audio. This demonstrates the expected transcription style, language, and formatting.
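Pieced together from the field descriptions above, the dataclass can be sketched roughly as follows. The field names `audio` and `text` and the `AudioInput` alias are illustrative assumptions, not taken from the library:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union

import numpy as np
from numpy.typing import NDArray

# Accepted audio formats, per the list above; the alias name is illustrative.
AudioInput = Union[str, Path, bytes, NDArray[np.int8], dict]

@dataclass
class ContextExample:
    audio: AudioInput  # field name assumed; audio in any supported format
    text: str          # field name assumed; reference transcription
```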
Usage
Context examples are used with `ASRInferencePipeline.transcribe_with_context()` to perform zero-shot transcription. They provide the model with reference audio-text pairs from which to learn the transcription style.
Basic Example
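The original code listing for this section did not survive extraction. The sketch below shows the basic construction of context examples; the field names and file names are assumptions, and a stand-in dataclass is defined so the snippet runs on its own (in practice you would import `ContextExample` from the omnilingual_asr package):

```python
from dataclasses import dataclass

# Stand-in for the real ContextExample so this sketch is self-contained.
@dataclass
class ContextExample:
    audio: object  # path, bytes, numpy array, or pre-decoded dict
    text: str      # reference transcription

# Paired audio files and transcriptions (hypothetical file names).
examples = [
    ContextExample(audio="greeting.wav", text="Hello, how are you today?"),
    ContextExample(audio="weather.wav", text="It looks like rain this afternoon."),
]
```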
Using Pre-decoded Audio
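The original listing here was lost as well. A minimal sketch of the pre-decoded dict format described in the Structure section, with a stand-in dataclass so it runs on its own:

```python
from dataclasses import dataclass

import numpy as np

# Stand-in for the real ContextExample so this sketch is self-contained.
@dataclass
class ContextExample:
    audio: object
    text: str

# Pre-decoded audio: a dict with 'waveform' and 'sample_rate' keys.
waveform = np.zeros(16000, dtype=np.int8)  # one second of silence at 16 kHz
example = ContextExample(
    audio={"waveform": waveform, "sample_rate": 16000},
    text="One second of silence.",
)
```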
Using with ASRInferencePipeline
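The listing for this section was also lost. Only the method name `transcribe_with_context` comes from the source; the constructor arguments and exact parameter layout are unknown, so a mock pipeline stands in for `ASRInferencePipeline` to illustrate a plausible calling convention in a runnable form:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ContextExample:  # stand-in so this sketch is self-contained
    audio: object
    text: str

class MockASRPipeline:
    """Mock with the same method name as ASRInferencePipeline; the real
    class performs actual transcription. The (audios, context_examples)
    parameter layout shown here is an assumption."""

    def transcribe_with_context(
        self, audios: List[object], context_examples: List[ContextExample]
    ) -> List[str]:
        # The real model conditions on the context examples; this mock
        # just returns a placeholder per input audio.
        assert context_examples, "at least one context example is required"
        return ["<transcription>" for _ in audios]

pipeline = MockASRPipeline()
context = [ContextExample(audio="ref.wav", text="reference transcription")]
hypotheses = pipeline.transcribe_with_context(["target.wav"], context)
```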
Context Example Requirements
Training Configuration: The zero-shot model was trained with:
- Up to 30 seconds per context example
- Exactly 10 context examples per training sample
Automatic Replication
The pipeline automatically adjusts the context example count:
- Fewer than 10 examples: Automatically replicated to reach 10 total
- More than 10 examples: Automatically cropped to first 10
- At least 1 required: You must provide at least one context example
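The replication and cropping rules above can be sketched as a small helper. This illustrates the described behavior and is not the library's actual implementation:

```python
def adjust_context_examples(examples, target=10):
    """Crop to, or replicate up to, exactly `target` examples,
    mirroring the behavior described above."""
    if not examples:
        raise ValueError("at least one context example is required")
    if len(examples) >= target:
        return examples[:target]        # crop to the first `target`
    reps = -(-target // len(examples))  # ceiling division
    return (examples * reps)[:target]   # replicate to reach `target`
```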
Best Practices
- Quality over quantity: Provide high-quality context examples that closely match your target domain
- Consistent style: Use context examples with consistent transcription style (punctuation, formatting)
- Language matching: Use context examples in the same language as your target audio
- Audio length: Keep context examples under 30 seconds each
- Diversity: If possible, provide varied examples to cover different acoustic conditions
Example: Low-Resource Language
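The source's listing for this section did not survive extraction. The sketch below reconstructs the idea under stated assumptions: context examples in a low-resource language (here, hypothetical Wolof recordings with invented file names) are built exactly like any others, keeping the language and transcription style consistent per the best practices above. A stand-in dataclass is defined so the snippet runs on its own:

```python
from dataclasses import dataclass

@dataclass
class ContextExample:  # stand-in so this sketch is self-contained
    audio: object
    text: str

# Hypothetical Wolof clips; all examples share the target language
# and a consistent transcription style.
wolof_context = [
    ContextExample(audio="wolof_greeting.wav", text="Na nga def?"),
    ContextExample(audio="wolof_reply.wav", text="Maa ngi fi rekk."),
]
```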
Source Reference
See the implementation at `src/omnilingual_asr/models/inference/pipeline.py:64`.