NoteWise is a Python CLI built around a linear pipeline: resolve the source, fetch the transcript, chunk the text, call the LLM, and write Markdown files. Each stage is isolated into its own module, and all heavy work runs asynchronously so multiple videos and chapters can be processed concurrently.

Pipeline overview

1. CLI entry point

Every command goes through notewise/__main__.py, which calls main() and hands off to the Typer app defined in cli/app.py. Heavy imports (pipeline, storage, LLM) are loaded lazily inside command bodies so startup time stays fast.
notewise process "https://youtube.com/watch?v=VIDEO_ID"
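The lazy-import pattern can be shown in isolation with a pure-stdlib sketch. Here `fractions` stands in for the heavy pipeline, storage, and LLM modules; the function name and shape are illustrative, not the actual cli/app.py code:

```python
import sys

def process(url: str) -> str:
    """Typer-style command body with its heavy import deferred."""
    # The import is paid only when the command actually runs,
    # so `notewise --help` stays fast.
    from fractions import Fraction  # stands in for pipeline/storage/LLM
    return f"{url}: {Fraction(1, 2)}"

# Before the first invocation the "heavy" module is not loaded...
sys.modules.pop("fractions", None)
assert "fractions" not in sys.modules
process("https://youtube.com/watch?v=demo")
# ...and it appears only afterwards.
assert "fractions" in sys.modules
```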
2. Source resolution

The process command accepts three input forms:
  • Single video URL — a watch?v= link.
  • Playlist URL — a playlist?list= link. NoteWise fetches all video IDs from the playlist before processing.
  • Batch file — a plain .txt file with one URL per line. All URLs are resolved to video IDs and de-duplicated before any processing begins.
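The three-way classification can be sketched with stdlib URL parsing. The function name, return shape, and batch handling here are illustrative (the real resolver also expands playlists into their member video IDs, which is omitted):

```python
from pathlib import Path
from urllib.parse import urlparse, parse_qs

def resolve_source(arg: str) -> tuple[str, list[str]]:
    """Classify an input as a video URL, playlist URL, or batch file."""
    if Path(arg).suffix == ".txt":
        # Batch file: one URL per line; resolve to video IDs and
        # de-duplicate while preserving order.
        urls = [ln.strip() for ln in Path(arg).read_text().splitlines() if ln.strip()]
        ids = list(dict.fromkeys(resolve_source(u)[1][0] for u in urls))
        return "batch", ids
    query = parse_qs(urlparse(arg).query)
    if "list" in query:                     # playlist?list= link
        return "playlist", query["list"]
    return "video", query.get("v", [])      # watch?v= link
```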
3. YouTube metadata and transcript extraction

For each video, NoteWise makes two sequential calls into the youtube/ module:
  1. get_video_metadata() — fetches title, duration, and chapter markers.
  2. fetch_transcript() — downloads the transcript with language fallback and exponential-backoff retries (up to 3 attempts).
All YouTube requests are rate-limited via aiolimiter (default: 10 requests per minute, configurable with YOUTUBE_REQUESTS_PER_MINUTE). Requests are dispatched with asyncio.to_thread so the event loop is never blocked.
Pass a Netscape-format cookie file with --cookie-file (or YOUTUBE_COOKIE_FILE) to access age-gated or login-required videos.
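The dispatch pattern for a single fetch looks roughly like this. A plain asyncio.Semaphore stands in for aiolimiter's AsyncLimiter so the sketch runs without third-party dependencies, and the blocking call is a stub; function names are illustrative:

```python
import asyncio

def fetch_transcript_blocking(video_id: str) -> str:
    # Stands in for the real blocking YouTube client call.
    return f"transcript for {video_id}"

async def fetch_transcript(video_id: str, limiter: asyncio.Semaphore,
                           retries: int = 3) -> str:
    for attempt in range(retries):
        async with limiter:   # caps in-flight YouTube requests
            try:
                # Blocking I/O runs in a worker thread, so the
                # event loop is never blocked.
                return await asyncio.to_thread(fetch_transcript_blocking, video_id)
            except OSError:
                # Exponential backoff between attempts: 1s, 2s, 4s.
                await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"transcript fetch failed: {video_id}")

async def main() -> list[str]:
    limiter = asyncio.Semaphore(10)  # AsyncLimiter(10, 60) in the real code
    return await asyncio.gather(*(fetch_transcript(v, limiter) for v in ["a", "b"]))
```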
4. CorePipeline orchestration

CorePipeline (in pipeline/core.py) owns all shared state: the concurrency semaphore, the output-path reservation set, the SQLite cache handle, and per-run metrics. It delegates the actual per-video work to pipeline/_execution.py.

Concurrency is controlled by two asyncio.Semaphore instances:
Semaphore       Config key              Default
Video-level     MAX_CONCURRENT_VIDEOS   5
Chapter-level   (internal)              3
Each video runs as an independent asyncio.Task. asyncio.gather collects all results and continues processing even when individual videos fail.
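The task-per-video scheme with failure isolation reduces to a few lines of asyncio. This is an illustrative reimplementation, not the actual pipeline code; the placeholder work and failure case are invented for the demo:

```python
import asyncio

MAX_CONCURRENT_VIDEOS = 5

async def process_video(video_id: str, sem: asyncio.Semaphore) -> str:
    async with sem:                 # at most 5 videos in flight
        await asyncio.sleep(0)      # placeholder for real per-video work
        if video_id == "bad":
            raise ValueError(video_id)
        return f"{video_id}: done"

async def run_pipeline(video_ids: list[str]) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT_VIDEOS)
    tasks = [asyncio.create_task(process_video(v, sem)) for v in video_ids]
    # return_exceptions=True keeps the run alive when one video fails:
    # failed tasks come back as exception objects instead of raising.
    return await asyncio.gather(*tasks, return_exceptions=True)
```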
5. Chapter detection and splitting

After fetching metadata, _execution.py checks whether the video qualifies for chapter-mode generation:
use_chapters = duration > DEFAULT_CHAPTER_MIN_DURATION and chapters is not None
DEFAULT_CHAPTER_MIN_DURATION is 3600 seconds (1 hour). Videos shorter than one hour, or videos with no chapter markers, are processed as a single file. Videos that meet both conditions have their transcript split by chapter boundaries using split_transcript_by_chapters().
Chapter splitting is done by aligning transcript segment timestamps against the chapter start/end times reported by YouTube. Each chapter’s text is processed independently and written to its own numbered Markdown file.
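The alignment step can be sketched as follows. The segment and chapter dict shapes are assumptions for the example; the real split_transcript_by_chapters() may use different structures:

```python
def split_by_chapters(segments: list[dict], chapters: list[dict]) -> list[tuple[str, str]]:
    """Group transcript segments into chapters by start timestamp.

    segments: [{'start': float, 'text': str}, ...]
    chapters: [{'start': float, 'title': str}, ...], sorted by start.
    """
    out = []
    for i, ch in enumerate(chapters):
        # A chapter ends where the next one begins; the last runs to the end.
        end = chapters[i + 1]["start"] if i + 1 < len(chapters) else float("inf")
        text = " ".join(s["text"] for s in segments if ch["start"] <= s["start"] < end)
        out.append((ch["title"], text))
    return out
```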
6. StudyMaterialGenerator — chunking and LLM calls

StudyMaterialGenerator (in pipeline/generation.py) handles token counting, text chunking, and the LLM call sequence.

Token counting uses LiteLLM’s token_counter with the active model’s tokenizer, falling back to a 4-chars-per-token estimate when the tokenizer is unavailable.

Chunking strategy (triggered when the transcript exceeds DEFAULT_CHUNK_SIZE):
Parameter               Default       Description
DEFAULT_CHUNK_SIZE      4000 tokens   Maximum tokens per chunk
DEFAULT_CHUNK_OVERLAP   200 tokens    Overlap carried into the next chunk
Split priority: sentence boundaries → newlines → words → hard character limit. When the transcript fits in a single chunk, one LLM call is made (get_single_pass_prompt). For multi-chunk transcripts, each chunk is processed separately and the results are merged in a final combine call (get_combine_prompt).

The same chunked approach applies to individual chapters (generate_single_chapter_notes) and quiz generation (generate_quiz).
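The split-and-overlap behavior can be illustrated with a deliberately simplified character-based chunker. The real code measures tokens and falls back through the full split-priority chain; this sketch splits only on sentence boundaries:

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Greedy sentence-boundary chunking with a character-tail overlap."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) > chunk_size:
            chunks.append(current.strip())
            # Carry the tail of this chunk into the next one so context
            # spans the boundary.
            current = current[-overlap:]
        current += " " + sent
    if current.strip():
        chunks.append(current.strip())
    return chunks
```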
7. LLMProvider — LiteLLM wrapper

LLMProvider (in llm/provider.py) wraps LiteLLM’s acompletion with:
  • Automatic retries with exponential backoff for rate-limit errors (3 retries by default).
  • Per-call token usage tracking via a ContextVar-based UsageTotals collector.
  • Cost estimation using LiteLLM’s model price map.
  • Markdown fence stripping — if the LLM wraps its output in triple backticks, the fences are removed before the content is returned.
The provider is instantiated with a LiteLLM-format model string (e.g., gemini/gemini-2.5-flash) and requires the corresponding API key to be present in the environment.
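The fence-stripping step is easy to show standalone. This matches the behavior described above, not necessarily the exact implementation; the retry and usage-tracking layers are omitted:

```python
import re

def strip_markdown_fences(content: str) -> str:
    """Remove a wrapping ```lang ... ``` fence if the whole reply is fenced."""
    match = re.match(r"^```[\w-]*\n(.*)\n```\s*$", content.strip(), re.DOTALL)
    return match.group(1) if match else content
```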
8. File output

Completed notes are written as UTF-8 Markdown files. The output layout depends on whether chapter mode is active.

Standard (single file):
output/
└── Video Title.md
Chapter mode:
output/
└── Video Title/
    ├── 01_Introduction.md
    ├── 02_Core_Concepts.md
    └── ...
Output paths are reserved atomically with an async lock to prevent two concurrent videos from writing to the same filename.

Optional outputs written alongside the notes:
  • Quiz (--quiz) — a multiple-choice quiz Markdown file in the same directory.
  • Transcript (--export-transcript txt|json) — the raw transcript text or a timestamped JSON file.
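The path-reservation idea mentioned above (a shared set guarded by an asyncio.Lock, with numeric suffixes on collision) can be sketched like this; names and the suffix format are illustrative:

```python
import asyncio
from pathlib import Path

_reserved: set[Path] = set()
_lock = asyncio.Lock()

async def reserve_output_path(directory: Path, title: str) -> Path:
    async with _lock:           # one reservation at a time
        candidate = directory / f"{title}.md"
        n = 1
        # Skip names already claimed by a concurrent video or on disk.
        while candidate in _reserved or candidate.exists():
            candidate = directory / f"{title} ({n}).md"
            n += 1
        _reserved.add(candidate)
        return candidate
```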
9. SQLite cache

After a video is successfully processed, its transcript text, token usage, cost, and timing stats are persisted to a SQLite database via DatabaseRepository (backed by SQLAlchemy). On the next run, cached videos are detected and skipped automatically.

The cache file lives at ~/.notewise/.notewise_cache.db. Use --force to reprocess a cached video, or notewise cache to inspect and manage cache entries.
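The skip-if-cached check reduces to a primary-key lookup. This stand-in uses stdlib sqlite3 with an invented minimal schema; the real DatabaseRepository goes through SQLAlchemy and stores more columns:

```python
import sqlite3

def is_cached(conn: sqlite3.Connection, video_id: str) -> bool:
    """Return True if this video already has a persisted run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos (video_id TEXT PRIMARY KEY, transcript TEXT)"
    )
    row = conn.execute(
        "SELECT 1 FROM videos WHERE video_id = ?", (video_id,)
    ).fetchone()
    return row is not None
```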
10. Rich Live dashboard

When the --no-ui flag is not set, ui/dashboard.py renders a Rich Live table that updates in real time as PipelineEvent objects are emitted. Each event carries a typed EventType enum value and the relevant video ID, so the dashboard can track per-video progress (metadata fetched, transcript downloaded, chunks generating, complete) alongside aggregate token and cost totals.

Use --no-ui for CI pipelines or cron jobs to get plain stdout output instead.
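The event shapes the dashboard consumes look roughly like this. The enum members and dataclass fields are assumptions based on the description above, not the actual definitions, and the Rich rendering layer is omitted:

```python
from dataclasses import dataclass
from enum import Enum, auto

class EventType(Enum):
    METADATA_FETCHED = auto()
    TRANSCRIPT_DOWNLOADED = auto()
    CHUNKS_GENERATING = auto()
    COMPLETE = auto()

@dataclass
class PipelineEvent:
    event_type: EventType
    video_id: str

def latest_status(events: list[PipelineEvent]) -> dict[str, EventType]:
    """Per-video progress as the dashboard tracks it: the newest event wins."""
    return {e.video_id: e.event_type for e in events}
```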

State directory

All persistent state lives under ~/.notewise/ by default:
~/.notewise/
├── config.env            # API keys and settings (written by notewise setup)
├── .notewise_cache.db    # SQLite transcript and run cache
└── logs/
    └── notewise-<date>.log
Override the base directory with NOTEWISE_HOME:
export NOTEWISE_HOME=/path/to/custom/dir

Concurrency model

NoteWise is fully async. The top-level run_pipeline coroutine creates one asyncio.Task per video and gathers them all concurrently, bounded by the video-level semaphore. Within a chapter-mode video, individual chapters are also processed concurrently, bounded by the chapter-level semaphore. YouTube requests go through an aiolimiter rate limiter so NoteWise never hammers the API.
run_pipeline()
 ├── [Task] process_single_video(video_1)
 │    ├── fetch_transcript()         ← rate-limited
 │    └── generate_chapter_notes_concurrent()
 │         ├── [Task] chapter_1 LLM call
 │         ├── [Task] chapter_2 LLM call
 │         └── [Task] chapter_3 LLM call
 └── [Task] process_single_video(video_2)
      └── generate_study_notes()     ← single-file path
Increase MAX_CONCURRENT_VIDEOS in ~/.notewise/config.env, and tune YOUTUBE_REQUESTS_PER_MINUTE alongside it, if you are processing large batches and want to balance throughput against rate-limit risk.
