Functions
transcribeAudio
Transcribes an audio file into timestamped text segments using the Faster Whisper model.
Input: Path to the audio file to transcribe (supports WAV, MP3, and other common formats)
Returns: List of transcription segments, where each segment is [text, start_time, end_time]:
- text (str): The transcribed text for this segment
- start_time (float): Start time in seconds
- end_time (float): End time in seconds
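As an illustration of working with the return value, the sketch below formats segments of the documented [text, start_time, end_time] shape into readable lines. The helper names `format_timestamp` and `format_segments` and the sample data are hypothetical and are not part of this component.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: List[list]) -> List[str]:
    """Turn [text, start_time, end_time] segments into printable lines."""
    lines = []
    for text, start, end in segments:
        stamp = f"[{format_timestamp(start)} -> {format_timestamp(end)}]"
        lines.append(f"{stamp} {text.strip()}")
    return lines


# Hypothetical sample in the documented segment shape
sample = [
    ["Hello and welcome.", 0.0, 1.84],
    ["Let's get started.", 1.84, 3.5],
]

for line in format_segments(sample):
    print(line)
```

Because each segment is a plain three-element list, it unpacks directly in a `for` loop without any extra classes.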
Model Configuration
- Model: base.en (English-only for faster processing)
- Device: Automatically selects CUDA if available, otherwise CPU
- Beam Size: 5 (for better accuracy)
- Language: English (en)
- Max New Tokens: 128
- Condition on Previous Text: False (prevents context accumulation)
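The settings listed above could map onto the faster-whisper Python API roughly as follows. This is a minimal sketch, not the component's actual source: the names `MODEL_NAME`, `build_transcribe_options`, and `transcribe_audio` are hypothetical, and the `max_new_tokens` option is assumed to be available in the installed faster-whisper release.

```python
from typing import Any, Dict, List

MODEL_NAME = "base.en"  # English-only model named in the configuration


def build_transcribe_options() -> Dict[str, Any]:
    """Decoding options mirroring the documented configuration."""
    return {
        "beam_size": 5,
        "language": "en",
        "max_new_tokens": 128,
        "condition_on_previous_text": False,
    }


def transcribe_audio(audio_path: str, device: str = "cpu") -> List[list]:
    """Hypothetical stand-in for the component; returns [text, start, end] lists."""
    # Imported here so the options can be inspected without the library installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(MODEL_NAME, device=device)
    # transcribe() yields segments lazily; the list comprehension consumes them.
    segments, _info = model.transcribe(audio_path, **build_transcribe_options())
    return [[seg.text, seg.start, seg.end] for seg in segments]
```

Setting `condition_on_previous_text` to `False` means each window is decoded without the previous window's text as a prompt, which is what prevents context from accumulating across a long file.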
Features
- Automatic GPU Detection: Uses CUDA if available for 5-10x faster transcription
- Timestamped Segments: Returns precise start/end times for each phrase
- Error Handling: Returns empty list on failure with error message
- Progress Indication: Prints device type and completion status
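The error-handling and progress behaviour listed above can be sketched as a small wrapper. The function name `safe_transcribe` and the exact message wording are assumptions made for illustration; the component's real messages may differ.

```python
from typing import Callable, List


def safe_transcribe(
    transcribe: Callable[[str], List[list]], audio_path: str
) -> List[list]:
    """Run a transcription callable, returning [] if it raises."""
    try:
        segments = transcribe(audio_path)
    except Exception as exc:  # broad on purpose: any failure yields an empty list
        print(f"Transcription failed: {exc}")  # hypothetical message format
        return []
    print(f"Transcription complete: {len(segments)} segments")
    return segments


def always_fails(path: str) -> List[list]:
    """Hypothetical callable that simulates a missing file."""
    raise FileNotFoundError(path)


print(safe_transcribe(always_fails, "missing.wav"))
```

Callers can then treat an empty list as the failure signal, though an empty list can also mean the audio contained no speech.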
Output Format
Each transcription segment is a list with three elements: [text, start_time, end_time].
The function automatically detects and uses GPU acceleration if CUDA is available. On CPU, transcription may be significantly slower for long audio files.
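One common way to implement this kind of device selection is shown below. It assumes PyTorch is used for the CUDA check, which this page does not confirm, and falls back to CPU when PyTorch is not installed. The name `pick_device` and the printed message are hypothetical.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA device is usable, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch provides the CUDA check
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```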
Performance
- GPU (CUDA): ~1-2 minutes for a 10-minute audio file
- CPU: ~5-10 minutes for a 10-minute audio file
- Model Size: ~150MB download on first run
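For a rough estimate on files of other lengths, the figures above can be scaled by duration. The sketch below assumes processing time grows linearly with audio length, which is an approximation rather than a measured result; `estimate_minutes` is a hypothetical helper.

```python
from typing import Tuple

# Processing-time ranges (minutes) quoted above for a 10-minute file
BASELINE_AUDIO_MINUTES = 10
BASELINE_RANGE = {"cuda": (1, 2), "cpu": (5, 10)}


def estimate_minutes(audio_minutes: float, device: str) -> Tuple[float, float]:
    """Scale the quoted range linearly to another audio duration."""
    low, high = BASELINE_RANGE[device]
    scale = audio_minutes / BASELINE_AUDIO_MINUTES
    return low * scale, high * scale


print(estimate_minutes(30, "cuda"))
print(estimate_minutes(30, "cpu"))
```

Actual times depend on hardware and audio content, and the first run also includes the model download.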
Device Detection
The component prints the detected device (CUDA or CPU) on execution.
Error Handling
Returns an empty list if transcription fails.
Model Details
- Model Name: base.en
- Language Support: English only (faster than multilingual models)
- Accuracy: Good for clear speech, may struggle with heavy accents or background noise
- Beam Search: Uses beam size of 5 for better accuracy vs. speed tradeoff
