NoteWise keeps a local SQLite cache at ~/.notewise/.notewise_cache.db (or $NOTEWISE_HOME/.notewise_cache.db if overridden). The cache is local per-user and is never shared, synced, or uploaded anywhere. The database stores:
  • Video metadata — title, duration, and when it was last cached
  • Transcripts — raw transcript text and language for each video
  • Run statistics — token usage, cost, timing, and model for every processing run
  • Export records — a log of which transcript files were exported and where
The notewise stats, notewise history, and notewise cache commands all read from this database.

Tables

video

Cached metadata for each YouTube video that has been processed.
id
string
required
YouTube video ID (11-character alphanumeric string). Primary key.
title
string
required
Video title at the time it was last processed.
duration
integer
required
Video duration in seconds.
cached_at
datetime (UTC)
required
Timestamp of when this video’s metadata was last written to the cache. Used by prune_old_entries() and displayed in notewise history.

transcript

One row per video, storing the raw transcript content.
id
integer
required
Auto-increment primary key.
video_id
string
required
Foreign key referencing video.id. Has a UNIQUE constraint — only one transcript is stored per video. Re-processing a video replaces the existing transcript row in place.
content
string
required
Full transcript text as a single string.
language
string
required
BCP 47 language code of the transcript (e.g., en, es, fr).
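The replace-in-place behavior enforced by the UNIQUE constraint on video_id can be illustrated with SQLite's upsert syntax. This is a sketch against an in-memory database with simplified DDL, not NoteWise's actual code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transcript (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        video_id TEXT NOT NULL UNIQUE,
        content TEXT NOT NULL,
        language TEXT NOT NULL
    )
""")

def upsert_transcript(video_id: str, content: str, language: str) -> None:
    # The UNIQUE constraint on video_id means re-processing a video
    # replaces the existing row rather than adding a second one.
    conn.execute(
        """INSERT INTO transcript (video_id, content, language)
           VALUES (?, ?, ?)
           ON CONFLICT(video_id) DO UPDATE SET
               content = excluded.content,
               language = excluded.language""",
        (video_id, content, language),
    )

upsert_transcript("dQw4w9WgXcQ", "first pass", "en")
upsert_transcript("dQw4w9WgXcQ", "second pass", "en")
rows = conn.execute("SELECT COUNT(*), content FROM transcript").fetchone()
```

After both calls, the table still holds exactly one row for the video, containing the second transcript.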

runstats

One row per processing run. A video accumulates multiple rows here if it is processed more than once (e.g., with different models or after a --force re-run).
id
integer
required
Auto-increment primary key.
video_id
string
required
Foreign key referencing video.id.
model
string
required
LiteLLM model string used for this run (e.g., gemini/gemini-2.5-flash).
tokens_used
integer
required
Total tokens consumed (prompt + completion).
prompt_tokens
integer
default:"0"
Input (prompt) tokens consumed.
completion_tokens
integer
default:"0"
Output (completion) tokens generated.
cost_usd
float
default:"0.0"
Estimated cost in USD for this run, as reported by LiteLLM.
transcript_seconds
float
default:"0.0"
Wall-clock time in seconds spent fetching the transcript.
generation_seconds
float
default:"0.0"
Wall-clock time in seconds spent on LLM generation.
timestamp
datetime (UTC)
required
When this run was completed.
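Recording a run might look like the following sketch. The DDL is simplified but mirrors the columns and defaults documented above; `record_run` is a hypothetical helper, not NoteWise's actual function:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE runstats (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        video_id TEXT NOT NULL,
        model TEXT NOT NULL,
        tokens_used INTEGER NOT NULL,
        prompt_tokens INTEGER DEFAULT 0,
        completion_tokens INTEGER DEFAULT 0,
        cost_usd REAL DEFAULT 0.0,
        transcript_seconds REAL DEFAULT 0.0,
        generation_seconds REAL DEFAULT 0.0,
        timestamp TEXT NOT NULL
    )
""")

def record_run(video_id, model, prompt_tokens, completion_tokens, cost_usd):
    conn.execute(
        """INSERT INTO runstats
           (video_id, model, tokens_used, prompt_tokens,
            completion_tokens, cost_usd, timestamp)
           VALUES (?, ?, ?, ?, ?, ?, ?)""",
        (video_id, model,
         prompt_tokens + completion_tokens,  # tokens_used = prompt + completion
         prompt_tokens, completion_tokens, cost_usd,
         datetime.now(timezone.utc).isoformat()),
    )

record_run("dQw4w9WgXcQ", "gemini/gemini-2.5-flash", 1200, 300, 0.0021)
row = conn.execute("SELECT tokens_used, transcript_seconds FROM runstats").fetchone()
```

Columns not supplied (here transcript_seconds and generation_seconds) fall back to their documented defaults.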

exportrecord

One row per transcript file exported via --export-transcript. A video may have multiple export records (e.g., both .txt and .json).
id
integer
required
Auto-increment primary key.
video_id
string
required
Foreign key referencing video.id.
format
string
required
Export format identifier: txt or json.
output_path
string
required
Absolute filesystem path where the exported transcript file was written.
timestamp
datetime (UTC)
required
When the export was performed.

schema_version

Internal table used by the migration runner. Contains a single row with the current schema version number.
version
integer
required
Current schema version. Currently 2.

Migration system

NoteWise uses a hand-rolled, additive migration runner in storage/migrations.py. Alembic is not used. When the DatabaseRepository is first opened, it calls run_migrations(connection) which:
1. Reads the current schema version. Queries schema_version; if the table does not exist, it is created and the version is set to 0.
2. Runs pending migrations in order. Each migration is registered as a (version_number, function) tuple in the MIGRATIONS tuple; any migration with a version number greater than the current schema version is executed.
3. Updates the schema version. After each migration function completes, schema_version is updated to that migration's version number. The final version is returned.
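Under the behavior described in these three steps, the runner might look roughly like this. It is a sketch against raw sqlite3 with simplified DDL, not the actual storage/migrations.py:

```python
import sqlite3

def migration_1_add_runstats_columns(conn):
    # Additive only: new columns with defaults, nothing dropped or renamed.
    for ddl in (
        "prompt_tokens INTEGER DEFAULT 0",
        "completion_tokens INTEGER DEFAULT 0",
        "cost_usd REAL DEFAULT 0.0",
        "transcript_seconds REAL DEFAULT 0.0",
        "generation_seconds REAL DEFAULT 0.0",
    ):
        conn.execute(f"ALTER TABLE runstats ADD COLUMN {ddl}")

def migration_2_add_video_cached_at(conn):
    conn.execute("ALTER TABLE video ADD COLUMN cached_at TEXT")
    # Backfill existing rows with the current timestamp at migration time.
    conn.execute("UPDATE video SET cached_at = datetime('now')")

MIGRATIONS = (
    (1, migration_1_add_runstats_columns),
    (2, migration_2_add_video_cached_at),
)

def run_migrations(conn):
    # Step 1: read the current version, creating the table at version 0.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    if row is None:
        conn.execute("INSERT INTO schema_version (version) VALUES (0)")
        current = 0
    else:
        current = row[0]
    # Steps 2 and 3: run pending migrations in order, bumping the version after each.
    for version, migrate in MIGRATIONS:
        if version > current:
            migrate(conn)
            conn.execute("UPDATE schema_version SET version = ?", (version,))
            current = version
    return current

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video (id TEXT PRIMARY KEY, title TEXT, duration INTEGER)")
conn.execute("CREATE TABLE runstats (id INTEGER PRIMARY KEY, video_id TEXT, model TEXT, tokens_used INTEGER)")
final = run_migrations(conn)
```

Calling run_migrations again is a no-op: the stored version already matches the highest registered migration, so no migration function runs twice.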

Registered migrations

  • Version 1 (migration_1_add_runstats_columns): Adds prompt_tokens, completion_tokens, cost_usd, transcript_seconds, and generation_seconds to the runstats table for existing databases that predate these columns.
  • Version 2 (migration_2_add_video_cached_at): Adds the cached_at column to the video table. Existing rows are backfilled with the current timestamp at migration time.
All migrations are additive — they only add columns or tables, never drop or rename. This ensures forward compatibility with older versions of NoteWise reading a newer database.
To add a new migration, append a new (version, function) entry to MIGRATIONS in migrations.py and update LATEST_SCHEMA_VERSION. Do not modify or reorder existing migration entries.

What commands read from the cache

notewise history

Reads the video and runstats tables via a JOIN to return the most recently processed videos. For each video, it displays the title, duration, the model used in the most recent run, total tokens, estimated cost, and the timestamp of the last run. The underlying query returns a RecentVideoSchema for each row:
id
string
YouTube video ID.
title
string
Video title.
duration
integer
Duration in seconds.
cached_at
datetime
When the metadata was last cached.
last_run_at
datetime
Timestamp of the most recent run.
model
string
Model used in the most recent run.
cost_usd
float
Cost of the most recent run.
tokens_used
integer
Token count for the most recent run.
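The "most recent run per video" JOIN described above can be sketched like this. The SQL is illustrative only; the real query and the RecentVideoSchema class live in NoteWise's source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (id TEXT PRIMARY KEY, title TEXT, duration INTEGER, cached_at TEXT);
    CREATE TABLE runstats (id INTEGER PRIMARY KEY, video_id TEXT, model TEXT,
                           tokens_used INTEGER, cost_usd REAL, timestamp TEXT);
    INSERT INTO video VALUES ('abc123XYZ00', 'Demo video', 300, '2024-01-01T00:00:00');
    INSERT INTO runstats VALUES (1, 'abc123XYZ00', 'gemini/gemini-2.5-flash', 1500, 0.002, '2024-01-01T00:05:00');
    INSERT INTO runstats VALUES (2, 'abc123XYZ00', 'gemini/gemini-2.5-pro', 2000, 0.010, '2024-01-02T00:05:00');
""")

# For each video, join only its most recent run to get model/cost/tokens.
rows = conn.execute("""
    SELECT v.id, v.title, v.duration, v.cached_at,
           r.timestamp AS last_run_at, r.model, r.cost_usd, r.tokens_used
    FROM video v
    JOIN runstats r ON r.video_id = v.id
    WHERE r.timestamp = (SELECT MAX(timestamp) FROM runstats WHERE video_id = v.id)
    ORDER BY last_run_at DESC
""").fetchall()
```

Even though the sample video has two runs, only one row comes back, carrying the model, cost, and token count of the latest run.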

notewise stats

Aggregates the entire runstats table (or a filtered subset by --days or --model) and returns a StatsSummarySchema with totals and a per-model breakdown:
  • Total videos processed (distinct video_id count)
  • Total runs
  • Total tokens (prompt + completion)
  • Total estimated cost in USD
  • Total transcript and generation time in seconds
  • Per-model breakdown of all of the above
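The totals and per-model breakdown reduce to standard SQL aggregates. A sketch with simplified DDL (the real query also applies the --days and --model filters and sums the timing columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runstats (id INTEGER PRIMARY KEY, video_id TEXT, model TEXT,
                           tokens_used INTEGER, cost_usd REAL);
    INSERT INTO runstats VALUES (1, 'vidA', 'gemini/gemini-2.5-flash', 1000, 0.001);
    INSERT INTO runstats VALUES (2, 'vidA', 'gemini/gemini-2.5-flash', 2000, 0.002);
    INSERT INTO runstats VALUES (3, 'vidB', 'gemini/gemini-2.5-pro', 500, 0.005);
""")

# Totals: distinct videos, run count, token and cost sums.
totals = conn.execute("""
    SELECT COUNT(DISTINCT video_id), COUNT(*), SUM(tokens_used), SUM(cost_usd)
    FROM runstats
""").fetchone()

# Per-model breakdown of the same aggregates.
per_model = conn.execute("""
    SELECT model, COUNT(*), SUM(tokens_used), SUM(cost_usd)
    FROM runstats GROUP BY model ORDER BY model
""").fetchall()
```

Note that "total videos processed" counts distinct video_id values, so a video re-run under two models contributes one video but two runs.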

notewise cache

The notewise cache info subcommand uses get_cache_summary() to report:
  • Total cached videos, transcripts, runs, and exports
  • Oldest and newest cached_at timestamps
notewise cache clear deletes all rows from all tables and resets the database.

Thread safety

DatabaseRepository is a singleton per database path, protected by a module-level threading.Lock. All write operations (upsert_video_cache, add_export_record, prune_old_entries) acquire an additional per-instance _write_lock before opening a session. Read operations are lock-free. The engine is created with NullPool to avoid connection-pool issues in async contexts.
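The locking pattern described can be sketched as follows. This is a deliberately simplified, hypothetical version; the real DatabaseRepository uses a SQLAlchemy engine with NullPool rather than raw sqlite3:

```python
import sqlite3
import threading

_instances: dict = {}
_instances_lock = threading.Lock()  # module-level lock guarding the registry

class DatabaseRepository:
    def __init__(self, path: str):
        self.path = path
        self._write_lock = threading.Lock()  # per-instance lock for writes

    @classmethod
    def for_path(cls, path: str) -> "DatabaseRepository":
        # Singleton per database path: same path, same instance.
        with _instances_lock:
            if path not in _instances:
                _instances[path] = cls(path)
            return _instances[path]

    def write(self, sql: str, params: tuple = ()) -> None:
        # Writes serialize on _write_lock; reads would skip it.
        with self._write_lock:
            with sqlite3.connect(self.path) as conn:
                conn.execute(sql, params)

repo_a = DatabaseRepository.for_path(":memory:")
repo_b = DatabaseRepository.for_path(":memory:")
```

Requesting the same path twice returns the same instance, so all writers for a given database contend on one _write_lock.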
