~/.notewise/.notewise_cache.db (or $NOTEWISE_HOME/.notewise_cache.db if overridden). The cache is local per-user and is never shared, synced, or uploaded anywhere.
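The lookup rule above can be sketched in a few lines of Python. This is only an illustration: `resolve_cache_path` and `CACHE_FILENAME` are hypothetical names, not NoteWise's actual API, and the real resolution code may handle more cases.

```python
import os
from pathlib import Path

# File name taken from the path documented above.
CACHE_FILENAME = ".notewise_cache.db"


def resolve_cache_path(env=None):
    """Return the cache database path, honoring NOTEWISE_HOME when it is set.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    home = env.get("NOTEWISE_HOME")
    # Fall back to ~/.notewise when no override is present (or it is empty).
    base = Path(home) if home else Path.home() / ".notewise"
    return base / CACHE_FILENAME
```

Passing a plain dictionary as `env` makes the override easy to exercise without touching the real environment.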
The database stores:
- Video metadata — title, duration, and when it was last cached
- Transcripts — raw transcript text and language for each video
- Run statistics — token usage, cost, timing, and model for every processing run
- Export records — a log of which transcript files were exported and where
The notewise stats, notewise history, and notewise cache commands all read from this database.
Tables
video
Cached metadata for each YouTube video that has been processed.
YouTube video ID (11-character alphanumeric string). Primary key.
Video title at the time it was last processed.
Video duration in seconds.
Timestamp of when this video’s metadata was last written to the cache. Used by prune_old_entries() and displayed in notewise history.
transcript
One row per video, storing the raw transcript content.
Auto-increment primary key.
Foreign key referencing video.id. Has a UNIQUE constraint, so only one transcript is stored per video. Re-processing a video replaces the existing transcript row in place.
Full transcript text as a single string.
BCP 47 language code of the transcript (e.g., en, es, fr).
runstats
One row per processing run. A video accumulates multiple rows here if it is processed more than once (e.g., with different models or after a --force re-run).
Auto-increment primary key.
Foreign key referencing video.id.
LiteLLM model string used for this run (e.g., gemini/gemini-2.5-flash).
Total tokens consumed (prompt + completion).
Input (prompt) tokens consumed.
Output (completion) tokens generated.
Estimated cost in USD for this run, as reported by LiteLLM.
Wall-clock time in seconds spent fetching the transcript.
Wall-clock time in seconds spent on LLM generation.
When this run was completed.
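The contrast between the two tables (one transcript row per video, replaced in place, versus one runstats row per run) can be sketched with the standard-library sqlite3 module. The schema below is a simplified assumption: the column names `text`, `language`, `model`, and `total_tokens` are illustrative, the storage engine is assumed to be SQLite, and NoteWise's real repository code is not shown here.

```python
import sqlite3

# Simplified, assumed schema; the real tables carry more columns.
SCHEMA = """
CREATE TABLE video (id TEXT PRIMARY KEY, title TEXT, duration INTEGER);
CREATE TABLE transcript (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    video_id TEXT NOT NULL UNIQUE REFERENCES video(id),
    text TEXT NOT NULL,
    language TEXT NOT NULL
);
CREATE TABLE runstats (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    video_id TEXT NOT NULL REFERENCES video(id),
    model TEXT NOT NULL,
    total_tokens INTEGER NOT NULL
);
"""


def record_run(conn, video_id, text, language, model, total_tokens):
    """Upsert the transcript, then append a new runstats row."""
    # The UNIQUE constraint on video_id turns a second insert into an update.
    conn.execute(
        "INSERT INTO transcript (video_id, text, language) VALUES (?, ?, ?) "
        "ON CONFLICT(video_id) DO UPDATE SET "
        "text = excluded.text, language = excluded.language",
        (video_id, text, language),
    )
    # runstats has no such constraint, so every run adds a row.
    conn.execute(
        "INSERT INTO runstats (video_id, model, total_tokens) VALUES (?, ?, ?)",
        (video_id, model, total_tokens),
    )
    conn.commit()


def demo():
    """Process the same placeholder video twice and report the row counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO video VALUES ('abcdefghijk', 'Example', 212)")
    record_run(conn, "abcdefghijk", "first pass", "en", "gemini/gemini-2.5-flash", 1000)
    record_run(conn, "abcdefghijk", "second pass", "en", "gemini/gemini-2.5-flash", 1200)
    transcripts = conn.execute("SELECT COUNT(*) FROM transcript").fetchone()[0]
    runs = conn.execute("SELECT COUNT(*) FROM runstats").fetchone()[0]
    latest_text = conn.execute("SELECT text FROM transcript").fetchone()[0]
    return transcripts, runs, latest_text
```

The `ON CONFLICT ... DO UPDATE` form requires SQLite 3.24 or newer; an ORM-level "fetch then update" would achieve the same effect.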
exportrecord
One row per transcript file exported via --export-transcript. A video may have multiple export records (e.g., both .txt and .json).
Auto-increment primary key.
Foreign key referencing video.id.
Export format identifier: txt or json.
Absolute filesystem path where the exported transcript file was written.
When the export was performed.
schema_version
Internal table used by the migration runner. Contains a single row with the current schema version number.
Current schema version. Currently 2.
Migration system
NoteWise uses a hand-rolled, additive migration runner in storage/migrations.py. Alembic is not used.
When the DatabaseRepository is first opened, it calls run_migrations(connection) which:
Reads the current schema version
Queries schema_version. If the table does not exist, it is created and the version is set to 0.
Runs pending migrations in order
Each migration is registered as a (version_number, function) tuple in the MIGRATIONS tuple. Any migration with a version number greater than the current schema version is executed.
Registered migrations
| Version | Function | Description |
|---|---|---|
| 1 | migration_1_add_runstats_columns | Adds prompt_tokens, completion_tokens, cost_usd, transcript_seconds, and generation_seconds to the runstats table for existing databases that predate these columns. |
| 2 | migration_2_add_video_cached_at | Adds the cached_at column to the video table. Existing rows are backfilled with the current timestamp at migration time. |
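The runner described above can be approximated with plain sqlite3. This is a sketch under stated assumptions, not the contents of storage/migrations.py: the `version` column name, the column types, and the migration bodies are guesses that only mirror the descriptions in the table, and the connection is assumed to be a sqlite3 connection.

```python
import sqlite3


def migration_1_add_runstats_columns(conn):
    # Assumed column types; the names come from the table above.
    for column, sql_type in (
        ("prompt_tokens", "INTEGER"),
        ("completion_tokens", "INTEGER"),
        ("cost_usd", "REAL"),
        ("transcript_seconds", "REAL"),
        ("generation_seconds", "REAL"),
    ):
        conn.execute(f"ALTER TABLE runstats ADD COLUMN {column} {sql_type}")


def migration_2_add_video_cached_at(conn):
    conn.execute("ALTER TABLE video ADD COLUMN cached_at TEXT")
    # Backfill existing rows with the current timestamp.
    conn.execute("UPDATE video SET cached_at = datetime('now')")


MIGRATIONS = (
    (1, migration_1_add_runstats_columns),
    (2, migration_2_add_video_cached_at),
)
LATEST_SCHEMA_VERSION = 2


def run_migrations(conn):
    """Apply every migration newer than the stored schema version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        # Fresh version table: start from version 0.
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        current = 0
    else:
        current = row[0]
    for version, migrate in MIGRATIONS:
        if version > current:
            migrate(conn)
            conn.execute("UPDATE schema_version SET version = ?", (version,))
            current = version
    conn.commit()
    return current
```

Because each step is gated on the stored version, calling `run_migrations` a second time on the same database is a no-op, which is what makes an additive runner like this safe to invoke on every startup.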
To add a new migration, append a new (version, function) entry to MIGRATIONS in migrations.py and update LATEST_SCHEMA_VERSION. Do not modify or reorder existing migration entries.
What commands read from the cache
notewise history
Reads the video and runstats tables via a JOIN to return the most recently processed videos. For each video, it displays the title, duration, the model used in the most recent run, total tokens, estimated cost, and the timestamp of the last run.
The underlying query returns a RecentVideoSchema for each row:
YouTube video ID.
Video title.
Duration in seconds.
When the metadata was last cached.
Timestamp of the most recent run.
Model used in the most recent run.
Cost of the most recent run.
Token count for the most recent run.
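One way to express "each video joined to its most recent run" is a JOIN with a correlated subquery, sketched below with sqlite3. The `RecentVideo` dataclass is a hypothetical stand-in for RecentVideoSchema, and all column names other than `cached_at`, `cost_usd`, and `video_id` are assumptions; the actual query in NoteWise may be written differently.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RecentVideo:
    """Hypothetical stand-in for RecentVideoSchema."""
    video_id: str
    title: str
    duration: int
    cached_at: str
    last_run_at: str
    model: str
    cost_usd: float
    total_tokens: int


# Pick, for every video, the single runstats row with the latest timestamp.
RECENT_SQL = """
SELECT v.id, v.title, v.duration, v.cached_at,
       r.created_at, r.model, r.cost_usd, r.total_tokens
FROM video AS v
JOIN runstats AS r ON r.video_id = v.id
WHERE r.id = (
    SELECT r2.id FROM runstats AS r2
    WHERE r2.video_id = v.id
    ORDER BY r2.created_at DESC, r2.id DESC
    LIMIT 1
)
ORDER BY r.created_at DESC
LIMIT ?
"""


def recent_videos(conn, limit=10):
    rows = conn.execute(RECENT_SQL, (limit,)).fetchall()
    return [RecentVideo(*row) for row in rows]


def build_demo_db():
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE video (
        id TEXT PRIMARY KEY, title TEXT, duration INTEGER, cached_at TEXT
    );
    CREATE TABLE runstats (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        video_id TEXT REFERENCES video(id),
        model TEXT, total_tokens INTEGER, cost_usd REAL, created_at TEXT
    );
    """)
    conn.executemany("INSERT INTO video VALUES (?, ?, ?, ?)", [
        ("aaaaaaaaaaa", "First", 120, "2025-01-03T00:00:00"),
        ("bbbbbbbbbbb", "Second", 300, "2025-01-02T00:00:00"),
    ])
    conn.executemany(
        "INSERT INTO runstats (video_id, model, total_tokens, cost_usd, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("aaaaaaaaaaa", "model-a", 100, 0.25, "2025-01-01T00:00:00"),
            ("bbbbbbbbbbb", "model-a", 300, 0.25, "2025-01-02T00:00:00"),
            ("aaaaaaaaaaa", "model-b", 200, 0.5, "2025-01-03T00:00:00"),
        ],
    )
    conn.commit()
    return conn
```

Calling `recent_videos(build_demo_db())` yields one entry per video, with the re-processed video reporting the model from its later run.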
notewise stats
Aggregates the entire runstats table (or a subset filtered by --days or --model) and returns a StatsSummarySchema with totals and a per-model breakdown:
- Total videos processed (distinct video_id count)
- Total runs
- Total tokens (prompt + completion)
- Total estimated cost in USD
- Total transcript and generation time in seconds
- Per-model breakdown of all of the above
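The totals listed above map naturally onto SQL aggregates. The following sketch assumes a simplified runstats table and a hypothetical `summarize` function returning a plain dictionary rather than StatsSummarySchema; only the model filter is shown, and a day-based filter would add a second condition on the run timestamp.

```python
import sqlite3


def summarize(conn, model=None):
    """Aggregate runstats, optionally restricted to one model (illustrative)."""
    where, params = "", ()
    if model is not None:
        where, params = "WHERE model = ?", (model,)
    videos, runs, tokens, cost = conn.execute(
        "SELECT COUNT(DISTINCT video_id), COUNT(*), "
        "COALESCE(SUM(total_tokens), 0), COALESCE(SUM(cost_usd), 0.0) "
        f"FROM runstats {where}",
        params,
    ).fetchone()
    # Same aggregates again, grouped by model for the breakdown.
    per_model = {
        name: {"runs": n, "total_tokens": t, "cost_usd": c}
        for name, n, t, c in conn.execute(
            "SELECT model, COUNT(*), SUM(total_tokens), SUM(cost_usd) "
            f"FROM runstats {where} GROUP BY model ORDER BY model",
            params,
        )
    }
    return {
        "videos": videos,
        "runs": runs,
        "total_tokens": tokens,
        "cost_usd": cost,
        "per_model": per_model,
    }


def build_stats_db():
    """In-memory runstats table with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runstats (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "video_id TEXT, model TEXT, total_tokens INTEGER, cost_usd REAL)"
    )
    conn.executemany(
        "INSERT INTO runstats (video_id, model, total_tokens, cost_usd) "
        "VALUES (?, ?, ?, ?)",
        [
            ("aaaaaaaaaaa", "model-a", 100, 0.25),
            ("bbbbbbbbbbb", "model-a", 300, 0.25),
            ("aaaaaaaaaaa", "model-b", 200, 0.5),
        ],
    )
    conn.commit()
    return conn
```

Counting `DISTINCT video_id` is what keeps a video that was processed several times from inflating the "videos processed" total.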
notewise cache
The notewise cache info subcommand uses get_cache_summary() to report:
- Total cached videos, transcripts, runs, and exports
- Oldest and newest cached_at timestamps
notewise cache clear deletes all rows from all tables and resets the database.
Thread safety
DatabaseRepository is a singleton per database path, protected by a module-level threading.Lock. All write operations (upsert_video_cache, add_export_record, prune_old_entries) acquire an additional per-instance _write_lock before opening a session. Read operations are lock-free. The engine is created with NullPool to avoid connection-pool issues in async contexts.
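The locking layout described above (one module-level lock guarding the per-path registry, plus a per-instance write lock) can be sketched without any database at all. `RepositorySketch`, `get_repository`, and `hammer` are hypothetical names used only for illustration; the real DatabaseRepository opens sessions against an engine, which this sketch replaces with an in-memory list.

```python
import threading

# Registry of one instance per database path, guarded by a module-level lock.
_instances = {}
_instances_lock = threading.Lock()


class RepositorySketch:
    """Illustrative stand-in for a per-path singleton repository."""

    def __init__(self, path):
        self.path = path
        self._write_lock = threading.Lock()
        self._rows = []

    def add_row(self, row):
        # Writers serialize on the per-instance lock.
        with self._write_lock:
            self._rows.append(row)

    def count(self):
        # Reads take no lock, mirroring the lock-free reads described above.
        return len(self._rows)


def get_repository(path):
    """Return the single repository for this path, creating it if needed."""
    with _instances_lock:
        repo = _instances.get(path)
        if repo is None:
            repo = _instances[path] = RepositorySketch(path)
        return repo


def hammer(repo, threads=8, writes=100):
    """Write from several threads at once and return the final row count."""
    def worker():
        for i in range(writes):
            repo.add_row(i)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return repo.count()
```

Holding the registry lock across both the lookup and the creation is what prevents two threads from constructing separate instances for the same path.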