NoteWise keeps a local SQLite cache at ~/.notewise/.notewise_cache.db (or $NOTEWISE_HOME/.notewise_cache.db if overridden). The cache is local per-user and is never shared, synced, or uploaded anywhere. The database stores:
  • Video metadata — title, duration, and when it was last cached
  • Transcripts — raw transcript text and language for each video
  • Run statistics — token usage, cost, timing, and model for every processing run
  • Export records — a log of which transcript files were exported and where
The notewise stats, notewise history, and notewise cache commands all read from this database.

Tables

video

Cached metadata for each YouTube video that has been processed.
id
string
required
YouTube video ID (11-character alphanumeric string). Primary key.
title
string
required
Video title at the time it was last processed.
duration
integer
required
Video duration in seconds.
cached_at
datetime (UTC)
required
Timestamp of when this video’s metadata was last written to the cache. Used by prune_old_entries() and displayed in notewise history.

transcript

One row per video, storing the raw transcript content.
id
integer
required
Auto-increment primary key.
video_id
string
required
Foreign key referencing video.id. Has a UNIQUE constraint — only one transcript is stored per video. Re-processing a video replaces the existing transcript row in place.
content
string
required
Full transcript text as a single string.
language
string
required
BCP 47 language code of the transcript (e.g., en, es, fr).
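The replace-in-place behavior enforced by the UNIQUE constraint on video_id can be illustrated with SQLite's upsert syntax. This is a sketch against an in-memory database with simplified DDL, not NoteWise's actual code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transcript (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        video_id TEXT NOT NULL UNIQUE,
        content TEXT NOT NULL,
        language TEXT NOT NULL
    )
""")

def upsert_transcript(video_id: str, content: str, language: str) -> None:
    # The UNIQUE constraint on video_id means re-processing a video
    # replaces the existing row rather than adding a second one.
    conn.execute(
        """INSERT INTO transcript (video_id, content, language)
           VALUES (?, ?, ?)
           ON CONFLICT(video_id) DO UPDATE SET
               content = excluded.content,
               language = excluded.language""",
        (video_id, content, language),
    )

upsert_transcript("dQw4w9WgXcQ", "first pass", "en")
upsert_transcript("dQw4w9WgXcQ", "second pass", "en")
rows = conn.execute("SELECT COUNT(*), content FROM transcript").fetchone()
```

After both calls, the table still holds exactly one row for the video, containing the second transcript.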

runstats

One row per processing run. A video accumulates multiple rows here if it is processed more than once (e.g., with different models or after a --force re-run).
id
integer
required
Auto-increment primary key.
video_id
string
required
Foreign key referencing video.id.
model
string
required
LiteLLM model string used for this run (e.g., gemini/gemini-2.5-flash).
tokens_used
integer
required
Total tokens consumed (prompt + completion).
prompt_tokens
integer
default:"0"
Input (prompt) tokens consumed.
completion_tokens
integer
default:"0"
Output (completion) tokens generated.
cost_usd
float
default:"0.0"
Estimated cost in USD for this run, as reported by LiteLLM.
transcript_seconds
float
default:"0.0"
Wall-clock time in seconds spent fetching the transcript.
generation_seconds
float
default:"0.0"
Wall-clock time in seconds spent on LLM generation.
timestamp
datetime (UTC)
required
When this run was completed.
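Recording a run might look like the following sketch. The DDL is simplified but mirrors the columns and defaults documented above; `record_run` is a hypothetical helper, not NoteWise's actual function:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE runstats (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        video_id TEXT NOT NULL,
        model TEXT NOT NULL,
        tokens_used INTEGER NOT NULL,
        prompt_tokens INTEGER DEFAULT 0,
        completion_tokens INTEGER DEFAULT 0,
        cost_usd REAL DEFAULT 0.0,
        transcript_seconds REAL DEFAULT 0.0,
        generation_seconds REAL DEFAULT 0.0,
        timestamp TEXT NOT NULL
    )
""")

def record_run(video_id, model, prompt_tokens, completion_tokens, cost_usd):
    conn.execute(
        """INSERT INTO runstats
           (video_id, model, tokens_used, prompt_tokens,
            completion_tokens, cost_usd, timestamp)
           VALUES (?, ?, ?, ?, ?, ?, ?)""",
        (video_id, model,
         prompt_tokens + completion_tokens,  # tokens_used = prompt + completion
         prompt_tokens, completion_tokens, cost_usd,
         datetime.now(timezone.utc).isoformat()),
    )

record_run("dQw4w9WgXcQ", "gemini/gemini-2.5-flash", 1200, 300, 0.0021)
row = conn.execute("SELECT tokens_used, transcript_seconds FROM runstats").fetchone()
```

Columns not supplied (here transcript_seconds and generation_seconds) fall back to their documented defaults.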

exportrecord

One row per transcript file exported via --export-transcript. A video may have multiple export records (e.g., both .txt and .json).
id
integer
required
Auto-increment primary key.
video_id
string
required
Foreign key referencing video.id.
format
string
required
Export format identifier: txt or json.
output_path
string
required
Absolute filesystem path where the exported transcript file was written.
timestamp
datetime (UTC)
required
When the export was performed.

schema_version

Internal table used by the migration runner. Contains a single row with the current schema version number.
version
integer
required
Current schema version. Currently 2.

Migration system

NoteWise uses a hand-rolled, additive migration runner in storage/migrations.py. Alembic is not used. When the DatabaseRepository is first opened, it calls run_migrations(connection) which:
1. Reads the current schema version. Queries schema_version; if the table does not exist, it is created and the version is set to 0.
2. Runs pending migrations in order. Each migration is registered as a (version_number, function) tuple in the MIGRATIONS tuple; any migration with a version number greater than the current schema version is executed.
3. Updates the schema version. After each migration function completes, schema_version is updated to that migration's version number. The final version is returned.
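Under the behavior described in these three steps, the runner might look roughly like this. It is a sketch against raw sqlite3 with simplified DDL, not the actual storage/migrations.py:

```python
import sqlite3

def migration_1_add_runstats_columns(conn):
    # Additive only: new columns with defaults, nothing dropped or renamed.
    for ddl in (
        "prompt_tokens INTEGER DEFAULT 0",
        "completion_tokens INTEGER DEFAULT 0",
        "cost_usd REAL DEFAULT 0.0",
        "transcript_seconds REAL DEFAULT 0.0",
        "generation_seconds REAL DEFAULT 0.0",
    ):
        conn.execute(f"ALTER TABLE runstats ADD COLUMN {ddl}")

def migration_2_add_video_cached_at(conn):
    conn.execute("ALTER TABLE video ADD COLUMN cached_at TEXT")
    # Backfill existing rows with the current timestamp at migration time.
    conn.execute("UPDATE video SET cached_at = datetime('now')")

MIGRATIONS = (
    (1, migration_1_add_runstats_columns),
    (2, migration_2_add_video_cached_at),
)

def run_migrations(conn):
    # Step 1: read the current version, creating the table at version 0.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        current = 0
    else:
        current = row[0]
    # Steps 2 and 3: run pending migrations in order, bumping the version after each.
    for version, migrate in MIGRATIONS:
        if version > current:
            migrate(conn)
            conn.execute("UPDATE schema_version SET version = ?", (version,))
            current = version
    return current

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (id TEXT PRIMARY KEY, title TEXT, duration INTEGER)")
conn.execute("CREATE TABLE runstats (id INTEGER PRIMARY KEY, video_id TEXT, model TEXT, tokens_used INTEGER)")
final = run_migrations(conn)
```

Calling run_migrations again is a no-op: the stored version already matches the highest registered migration, so no migration function runs twice.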

Registered migrations

  • Version 1 (migration_1_add_runstats_columns): Adds prompt_tokens, completion_tokens, cost_usd, transcript_seconds, and generation_seconds to the runstats table for existing databases that predate these columns.
  • Version 2 (migration_2_add_video_cached_at): Adds the cached_at column to the video table. Existing rows are backfilled with the current timestamp at migration time.
All migrations are additive — they only add columns or tables, never drop or rename. This ensures forward compatibility with older versions of NoteWise reading a newer database.
To add a new migration, append a new (version, function) entry to MIGRATIONS in migrations.py and update LATEST_SCHEMA_VERSION. Do not modify or reorder existing migration entries.

What commands read from the cache

notewise history

Reads the video and runstats tables via a JOIN to return the most recently processed videos. For each video, it displays the title, duration, the model used in the most recent run, total tokens, estimated cost, and the timestamp of the last run. The underlying query returns a RecentVideoSchema for each row:
id
string
YouTube video ID.
title
string
Video title.
duration
integer
Duration in seconds.
cached_at
datetime
When the metadata was last cached.
last_run_at
datetime
Timestamp of the most recent run.
model
string
Model used in the most recent run.
cost_usd
float
Cost of the most recent run.
tokens_used
integer
Token count for the most recent run.
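The "most recent run per video" JOIN described above can be sketched like this. The SQL is illustrative only; the real query and the RecentVideoSchema class live in NoteWise's source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (id TEXT PRIMARY KEY, title TEXT, duration INTEGER, cached_at TEXT);
    CREATE TABLE runstats (id INTEGER PRIMARY KEY, video_id TEXT, model TEXT,
                           tokens_used INTEGER, cost_usd REAL, timestamp TEXT);
    INSERT INTO video VALUES ('abc123XYZ00', 'Demo video', 300, '2024-01-01T00:00:00');
    INSERT INTO runstats VALUES (1, 'abc123XYZ00', 'gemini/gemini-2.5-flash', 1500, 0.002, '2024-01-01T00:05:00');
    INSERT INTO runstats VALUES (2, 'abc123XYZ00', 'gemini/gemini-2.5-pro', 2000, 0.010, '2024-01-02T00:05:00');
""")

# For each video, join only its most recent run to get model/cost/tokens.
rows = conn.execute("""
    SELECT v.id, v.title, v.duration, v.cached_at,
           r.timestamp AS last_run_at, r.model, r.cost_usd, r.tokens_used
    FROM video v
    JOIN runstats r ON r.video_id = v.id
    WHERE r.timestamp = (SELECT MAX(timestamp) FROM runstats WHERE video_id = v.id)
    ORDER BY last_run_at DESC
""").fetchall()
```

Even though the sample video has two runs, only one row comes back, carrying the model, cost, and token count of the latest run.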

notewise stats

Aggregates the entire runstats table (or a filtered subset by --days or --model) and returns a StatsSummarySchema with totals and a per-model breakdown:
  • Total videos processed (distinct video_id count)
  • Total runs
  • Total tokens (prompt + completion)
  • Total estimated cost in USD
  • Total transcript and generation time in seconds
  • Per-model breakdown of all of the above
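The totals and per-model breakdown reduce to standard SQL aggregates. A sketch with simplified DDL (the real query also applies the --days and --model filters and sums the timing columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runstats (id INTEGER PRIMARY KEY, video_id TEXT, model TEXT,
                           tokens_used INTEGER, cost_usd REAL);
    INSERT INTO runstats VALUES (1, 'vidA', 'gemini/gemini-2.5-flash', 1000, 0.001);
    INSERT INTO runstats VALUES (2, 'vidA', 'gemini/gemini-2.5-flash', 2000, 0.002);
    INSERT INTO runstats VALUES (3, 'vidB', 'gemini/gemini-2.5-pro', 500, 0.005);
""")

# Totals: distinct videos, run count, token and cost sums.
totals = conn.execute("""
    SELECT COUNT(DISTINCT video_id), COUNT(*), SUM(tokens_used), SUM(cost_usd)
    FROM runstats
""").fetchone()

# Per-model breakdown of the same aggregates.
per_model = conn.execute("""
    SELECT model, COUNT(*), SUM(tokens_used), SUM(cost_usd)
    FROM runstats GROUP BY model ORDER BY model
""").fetchall()
```

Note that "total videos processed" counts distinct video_id values, so a video re-run under two models contributes one video but two runs.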

notewise cache

The notewise cache info subcommand uses get_cache_summary() to report:
  • Total cached videos, transcripts, runs, and exports
  • Oldest and newest cached_at timestamps
notewise cache clear deletes all rows from all tables and resets the database.

Thread safety

DatabaseRepository is a singleton per database path, protected by a module-level threading.Lock. All write operations (upsert_video_cache, add_export_record, prune_old_entries) acquire an additional per-instance _write_lock before opening a session. Read operations are lock-free. The engine is created with NullPool to avoid connection-pool issues in async contexts.
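The locking pattern described can be sketched as follows. This is a deliberately simplified, hypothetical version; the real DatabaseRepository uses a SQLAlchemy engine with NullPool rather than raw sqlite3:

```python
import sqlite3
import threading

_instances: dict = {}
_instances_lock = threading.Lock()  # module-level lock guarding the registry

class DatabaseRepository:
    def __init__(self, path: str):
        self.path = path
        self._write_lock = threading.Lock()  # per-instance lock for writes

    @classmethod
    def for_path(cls, path: str) -> "DatabaseRepository":
        # Singleton per database path: same path, same instance.
        with _instances_lock:
            if path not in _instances:
                _instances[path] = cls(path)
            return _instances[path]

    def write(self, sql: str, params: tuple = ()) -> None:
        # Writes serialize on _write_lock; reads would skip it.
        with self._write_lock:
            with sqlite3.connect(self.path) as conn:
                conn.execute(sql, params)

repo_a = DatabaseRepository.for_path(":memory:")
repo_b = DatabaseRepository.for_path(":memory:")
```

Requesting the same path twice returns the same instance, so all writers for a given database contend on one _write_lock.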
