How it works
The transcription system uses a multi-stage pipeline to process meeting videos:
Video submission
Provide a YouTube URL for the council meeting video. The system validates that the meeting exists and checks for existing transcripts.
Voice preparation
The system collects voiceprints from known council members and builds a custom vocabulary with city names, person names, and party names to improve accuracy.
Task processing
A background task downloads the video, extracts audio, and sends it to the transcription service with custom prompts and vocabulary.
Speaker identification
The system matches anonymous speaker segments to known council members using voice biometric matching with confidence scores.
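The four stages above run in order, each consuming the previous stage's output. A minimal sketch of that ordering (stage names and the `runPipeline` helper are illustrative, not the project's actual API):

```typescript
// Hypothetical sketch: each stage stands in for the real work described above.
type Stage = { name: string; run: (meetingId: string) => void };

function runPipeline(meetingId: string, stages: Stage[]): string[] {
  const completed: string[] = [];
  for (const stage of stages) {
    stage.run(meetingId); // download, transcribe, match, etc. in the real system
    completed.push(stage.name);
  }
  return completed;
}

const stages: Stage[] = [
  { name: "video submission", run: () => {} },
  { name: "voice preparation", run: () => {} },
  { name: "task processing", run: () => {} },
  { name: "speaker identification", run: () => {} },
];
```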
Speaker recognition
OpenCouncil uses voice biometrics to automatically identify speakers in meeting recordings:
Voiceprint matching
The system compares speaker audio segments against stored voiceprints for each council member. When a match is found with sufficient confidence, the speaker is automatically tagged with the person’s identity.
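One common way to implement this kind of matching is cosine similarity between speaker embeddings, tagging a speaker only when the best score clears a threshold. A sketch under that assumption (the embedding representation, `matchSpeaker`, and the 0.75 cutoff are all hypothetical, not the project's actual method):

```typescript
// Hypothetical sketch: compare a segment embedding against stored voiceprints
// and return the best match only if its confidence clears the threshold.
interface Voiceprint { personId: string; embedding: number[]; }

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function matchSpeaker(
  segmentEmbedding: number[],
  voiceprints: Voiceprint[],
  threshold = 0.75, // assumed confidence cutoff
): { personId: string; confidence: number } | null {
  let best: { personId: string; confidence: number } | null = null;
  for (const vp of voiceprints) {
    const confidence = cosineSimilarity(segmentEmbedding, vp.embedding);
    if (!best || confidence > best.confidence) best = { personId: vp.personId, confidence };
  }
  return best && best.confidence >= threshold ? best : null;
}
```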
Administrative body filtering
Only voiceprints for people relevant to the meeting’s administrative body are used for matching. This improves accuracy by limiting the search space to expected speakers. The system retrieves people based on the meeting’s administrativeBodyId (src/lib/tasks/transcribe.ts:92).
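The filtering step amounts to narrowing the candidate set before matching. A sketch of that idea (the `Person` shape and `relevantPeople` helper are assumptions; the real query lives in src/lib/tasks/transcribe.ts:92):

```typescript
// Hypothetical record shape: a person may belong to several administrative bodies.
interface Person { id: string; name: string; administrativeBodyIds: string[]; }

// Keep only voiceprint candidates who belong to the meeting's administrative body.
function relevantPeople(people: Person[], administrativeBodyId: string): Person[] {
  return people.filter((p) => p.administrativeBodyIds.includes(administrativeBodyId));
}
```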
Unknown speakers
When a speaker cannot be identified, they are labeled as “Unknown Speaker 1”, “Unknown Speaker 2”, etc. These can be manually corrected later through the UI.
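The labeling scheme can be sketched as a counter keyed by the anonymous diarization identity, so the same unidentified voice always gets the same label (`labelUnknownSpeakers` is illustrative, not the project's actual function):

```typescript
// Hypothetical sketch: assign sequential "Unknown Speaker N" labels, reusing
// the same label for repeated occurrences of the same anonymous speaker id.
function labelUnknownSpeakers(diarizationIds: string[]): Map<string, string> {
  const labels = new Map<string, string>();
  let counter = 0;
  for (const id of diarizationIds) {
    if (!labels.has(id)) {
      counter += 1;
      labels.set(id, `Unknown Speaker ${counter}`);
    }
  }
  return labels;
}
```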
Custom vocabulary and prompts
To improve transcription accuracy for Greek council meetings, the system uses:
Custom vocabulary
A list of city-specific terms including:
- Municipality name
- Council member names
- Political party names
- Local place names
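Assembling such a vocabulary is essentially flattening those sources into one de-duplicated term list. A sketch under assumed record shapes (`City`, `Councillor`, and `buildVocabulary` are illustrative names, not the project's actual types):

```typescript
// Hypothetical shapes for the vocabulary sources listed above.
interface City { name: string; parties: string[]; places: string[]; }
interface Councillor { name: string; }

function buildVocabulary(city: City, members: Councillor[]): string[] {
  const terms = [
    city.name,
    ...members.map((m) => m.name),
    ...city.parties,
    ...city.places,
  ];
  return [...new Set(terms)]; // de-duplicate overlapping names
}
```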
Custom prompts
Context-aware instructions in Greek that describe the meeting:
“Αυτή είναι η απομαγνητοφώνηση της συνεδρίασης του δήμου της [City] που έγινε στις [Date].”
(“This is the transcription of the meeting of the municipality of [City] that took place on [Date].”)
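Filling the template is a simple string interpolation over the meeting's city and date (the `buildPrompt` helper is illustrative, not the project's actual function):

```typescript
// Hypothetical sketch: interpolate the meeting's city and date into the
// Greek prompt template shown above.
function buildPrompt(cityName: string, date: string): string {
  return `Αυτή είναι η απομαγνητοφώνηση της συνεδρίασης του δήμου της ${cityName} που έγινε στις ${date}.`;
}
```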
Data structure
Transcripts are organized hierarchically: speaker segments contain utterances. Each speaker segment represents continuous speech by one person. Segments are split when the speaker changes or when there is more than 5 seconds of silence.
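The splitting rule above can be sketched as a single pass over the utterance stream, starting a new segment on a speaker change or a gap longer than 5 seconds (the `splitSegments` helper and utterance shape are illustrative, not the project's actual code):

```typescript
// Hypothetical sketch of the segment-splitting rule described above.
interface Utt { speaker: string; start: number; end: number; }

function splitSegments(utterances: Utt[], maxSilence = 5): Utt[][] {
  const segments: Utt[][] = [];
  for (const u of utterances) {
    const last = segments[segments.length - 1];
    const prev = last?.[last.length - 1];
    // Continue the current segment only for the same speaker within the gap.
    if (prev && prev.speaker === u.speaker && u.start - prev.end <= maxSilence) {
      last.push(u);
    } else {
      segments.push([u]);
    }
  }
  return segments;
}
```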
Speaker segments
Speaker segments group consecutive utterances from the same speaker (src/lib/tasks/transcribe.ts:318-350).
Utterances
Each utterance represents a complete sentence or phrase with:
- startTimestamp - Start time in seconds
- endTimestamp - End time in seconds
- text - The transcribed text
- drift - Audio sync correction value
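Based on the fields listed above, the hierarchy can be modeled roughly as follows (the interface names and the `speakerLabel` field are assumptions, not the project's actual schema):

```typescript
// Assumed shape of the transcript hierarchy, derived from the fields above.
interface Utterance {
  startTimestamp: number; // start time in seconds
  endTimestamp: number;   // end time in seconds
  text: string;           // the transcribed text
  drift: number;          // audio sync correction value
}

interface SpeakerSegment {
  speakerLabel: string;    // a council member, or e.g. "Unknown Speaker 1"
  utterances: Utterance[]; // consecutive utterances by the same speaker
}
```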
Configuration
Set up transcription in your .env file:
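A hypothetical example of what such a file might contain; the variable names below are placeholders, not the project's actual keys, so check the repository's own configuration reference:

```shell
# Hypothetical .env sketch - variable names are illustrative placeholders.
TRANSCRIPTION_API_KEY=your-api-key-here
TASK_SERVER_URL=https://tasks.example.com
```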
Performance optimization
The system uses several optimizations for large meetings:
- Batch processing
- Pre-computation
- Nested creates
Speaker segments are created in parallel batches of 50 to reduce database round-trips. This provides a 10-50x performance improvement over sequential processing.
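The batching pattern can be sketched as chunking the segment list and issuing each chunk's creates concurrently (`createInBatches` and the `create` callback are illustrative stand-ins for the real database calls):

```typescript
// Hypothetical sketch: process items in parallel batches of `batchSize`,
// awaiting each batch before starting the next.
async function createInBatches<T>(
  items: T[],
  create: (item: T) => Promise<void>,
  batchSize = 50,
): Promise<number> {
  let created = 0;
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    await Promise.all(batch.map(create)); // one parallel round of creates per batch
    created += batch.length;
  }
  return created;
}
```

Awaiting each batch bounds concurrency: at most `batchSize` creates are in flight at once, instead of one (sequential) or all of them (unbounded `Promise.all`).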
Reprocessing transcripts
You can force reprocessing of existing transcripts by passing force: true. With force: false (the default), the system throws an error if speaker segments already exist.
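The guard described above amounts to a simple precondition check (the `assertCanTranscribe` helper is illustrative, not the project's actual function):

```typescript
// Hypothetical sketch of the force-reprocessing guard.
function assertCanTranscribe(existingSegments: number, force: boolean): void {
  if (existingSegments > 0 && !force) {
    throw new Error("Speaker segments already exist; pass force: true to reprocess.");
  }
}
```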
API reference
Key functions from src/lib/tasks/transcribe.ts:
- Initiates transcription for a meeting video
- Processes the transcription result from the task server
Next steps
Search transcripts
Learn how to search across all meeting transcripts
AI summaries
Generate summaries and insights from transcripts
Meeting highlights
Create shareable video clips from transcripts
Notifications
Set up alerts for transcript availability