courses_cache.json, which maps all courses, modules, and classes to their Drive file IDs.
What is the Cache?
The cache (courses_cache.json) is a comprehensive index of your course library:
- Size: ~20 MB for a typical library of 500 courses
- Content: Category/route/course structure with Drive file IDs for all videos, summaries, subtitles, and resources
- Purpose: Enables fast navigation without querying Drive API on every page load
- Validity: Remains valid until you add courses or reorganize your Drive folder structure
The cache only stores metadata and file IDs - actual video files and content remain in Google Drive and are streamed on-demand.
Cache Structure
The cache follows this hierarchy:
Expected Drive Folder Structure
Your Google Drive should be organized like this:
File Naming Conventions
The scanner recognizes files by their extensions and naming patterns:
| File Type | Pattern | Example |
|---|---|---|
| Video | {num}. {name}.mp4 | 1. Introducción.mp4 |
| Summary | {num}. {name}_summary.html | 1. Introducción_summary.html |
| Subtitles | {num}. {name}.vtt | 1. Introducción.vtt |
| Reading | {num}. {name} - Lecturas recomendadas.txt | 1. Intro - Lecturas recomendadas.txt |
| HTML | {num}. {name}.html | 1. Demo.html |
| Resources | {num}. {name}.{ext} | 1. Slides.pdf |
- Numbers must start each filename (e.g., 1., 2., 3.)
- Class name follows the number and period
- Files without numbers are treated as course-level resources
- The scanner automatically groups files by their leading number
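The naming rules above can be sketched in a few lines of Python. This is an illustration only, not the code in rebuild_cache_drive.py: the function names `classify` and `group_by_class` are hypothetical, and the order of the suffix checks is an assumption chosen so that `_summary.html` is tested before the plain `.html` pattern.

```python
import re
from collections import defaultdict

# Leading class number followed by a period, e.g. "1. Introducción.mp4".
NUMBERED = re.compile(r"^(\d+)\.\s*(.+)$")


def classify(filename: str) -> str:
    """Return a file type label based on the naming table (hypothetical helper)."""
    lower = filename.lower()
    if lower.endswith(".mp4"):
        return "video"
    if lower.endswith("_summary.html"):  # must be checked before plain .html
        return "summary"
    if lower.endswith(".vtt"):
        return "subtitles"
    if lower.endswith(" - lecturas recomendadas.txt"):
        return "reading"
    if lower.endswith(".html"):
        return "html"
    return "resource"


def group_by_class(filenames):
    """Group files by leading number; unnumbered files are course-level resources."""
    classes = defaultdict(dict)
    course_resources = []
    for name in filenames:
        match = NUMBERED.match(name)
        if match is None:
            course_resources.append(name)
            continue
        number = int(match.group(1))
        classes[number].setdefault(classify(name), []).append(name)
    return dict(classes), course_resources
```

Calling `group_by_class` on a folder listing returns one dictionary per class number plus a list of unnumbered files, which mirrors how the scanner is described as grouping files.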
Running the Cache Builder
Prerequisites
Before building the cache, ensure:
- ✅ Service account is configured (see Google Drive Setup)
- ✅ Drive folder is shared with service account
- ✅ Dependencies are installed (pip install -r requirements.txt)
- ✅ You have the Drive root folder ID
Build Command
Update Drive root folder ID
Open rebuild_cache_drive.py and update the DRIVE_ROOT_ID constant (rebuild_cache_drive.py:18).
Find your folder ID from the Drive URL:
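A Drive folder URL normally contains the folder ID as the path segment after `/folders/`. If you prefer not to copy it by hand, a small helper like the one below can pull it out. The function name `extract_folder_id` and the example URL are hypothetical and not part of rebuild_cache_drive.py.

```python
import re

# The folder ID is the path segment after "/folders/"; IDs use letters,
# digits, underscores, and hyphens.
FOLDER_URL = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def extract_folder_id(url: str) -> str:
    """Return the folder ID from a Drive folder URL (hypothetical helper)."""
    match = FOLDER_URL.search(url)
    if match is None:
        raise ValueError(f"No folder ID found in URL: {url!r}")
    return match.group(1)


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(extract_folder_id(
        "https://drive.google.com/drive/folders/EXAMPLE_folder-id?usp=sharing"
    ))
```

The value it returns is what you would paste into DRIVE_ROOT_ID.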
Build Output
Once complete, you’ll see:
Understanding the Matching Process
The script matches courses from PlatziRoutes.md to Drive folders using fuzzy matching (rebuild_cache_drive.py:213). Each match is one of three types:
- exact: Exact match after sanitization (removing special chars, lowercasing)
- prefix: One name is a prefix of the other
- fuzzy: High word overlap (≥80% common words)
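The three match types can be illustrated with a short sketch. It is not the code at rebuild_cache_drive.py:213. The names `sanitize` and `match_type` are hypothetical, and measuring overlap against the larger of the two word sets is an assumption, since the text only says "≥80% common words".

```python
import re


def sanitize(name: str) -> str:
    """Lowercase and drop special characters, collapsing runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def match_type(course_name: str, folder_name: str, threshold: float = 0.8):
    """Return "exact", "prefix", "fuzzy", or None (hypothetical helper)."""
    a, b = sanitize(course_name), sanitize(folder_name)
    if not a or not b:
        return None
    if a == b:
        return "exact"
    if a.startswith(b) or b.startswith(a):
        return "prefix"
    words_a, words_b = set(a.split()), set(b.split())
    # Assumption: overlap is measured relative to the larger word set.
    overlap = len(words_a & words_b) / max(len(words_a), len(words_b))
    return "fuzzy" if overlap >= threshold else None
```

Checking the types in this order means the strictest rule wins first, so a folder that matches exactly is never reported as merely fuzzy.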
Courses not matched to Drive folders are still included in the cache with foundInDrive: false and classCount: 0.
Resume Capability
If the scan is interrupted (Ctrl+C, connection loss, etc.):
- Progress is saved to drive_scan_progress.json every 10 courses
- Run the same command again to resume from where it stopped
- Already scanned courses are loaded from the progress file
drive_scan_progress.json
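The checkpoint-and-resume behavior described above can be sketched as follows. This is a minimal illustration, not the script's actual implementation: the progress file is assumed to be a JSON object keyed by course name, and `scan_all`, `load_progress`, `save_progress`, and the `scan_course` callback are hypothetical names.

```python
import json
import os

PROGRESS_FILE = "drive_scan_progress.json"
SAVE_EVERY = 10  # checkpoint interval, in courses


def load_progress(path=PROGRESS_FILE):
    """Return previously scanned courses, or an empty dict on a fresh run."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return {}


def save_progress(scanned, path=PROGRESS_FILE):
    """Write to a temp file, then rename, so an interrupt cannot truncate the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(scanned, fh, ensure_ascii=False)
    os.replace(tmp, path)


def scan_all(course_names, scan_course, path=PROGRESS_FILE):
    """Scan only the courses missing from the checkpoint, saving every SAVE_EVERY."""
    scanned = load_progress(path)
    pending = [name for name in course_names if name not in scanned]
    for count, name in enumerate(pending, start=1):
        scanned[name] = scan_course(name)
        if count % SAVE_EVERY == 0:
            save_progress(scanned, path)
    save_progress(scanned, path)
    return scanned
```

With this shape, an interruption loses at most the courses scanned since the last checkpoint, and rerunning the same call skips everything already recorded.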
Rate Limiting
The script includes automatic throttling to stay within Google Drive API quotas (rebuild_cache_drive.py:26):
- Google Drive API: 12,000 queries/minute for service accounts
- Script throttles to: ~100 calls/second (conservative)
- Total calls for 500 courses: ~5,000-10,000 (depends on folder depth)
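To put the numbers above in perspective: 100 calls/second is 6,000 calls/minute, half of the stated 12,000 queries/minute quota, and 10,000 calls at that rate need at least 100 seconds of throttled time. A throttle with this effect could look like the sketch below. The `Throttle` class is hypothetical and is not the code at rebuild_cache_drive.py:26.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_second: float = 100.0):
        self.min_interval = 1.0 / max_per_second
        self.last_call = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Sleep just long enough to keep calls at or below the target rate."""
        elapsed = time.monotonic() - self.last_call
        remaining = self.min_interval - elapsed
        if remaining > 0:
            time.sleep(remaining)
        self.last_call = time.monotonic()
```

Calling `throttle.wait()` immediately before each Drive API request keeps the request rate at or below `max_per_second`, regardless of how fast the surrounding loop runs.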
Updating the Cache
When you add new courses to Drive or reorganize folders:
The /api/refresh endpoint is restricted to loopback addresses (localhost, 127.0.0.1) for security. Remote clients cannot reload the cache.
Troubleshooting
“Drive service not available”
Problem: Cannot connect to Google Drive API
Solution:
“No courses matched to Drive”
Problem: All courses show foundInDrive: false
Solution:
- Verify DRIVE_ROOT_ID points to the correct folder
- Check folder is shared with service account email
- Ensure course folders exist in Drive (not empty)
- Review folder naming - must loosely match names in PlatziRoutes.md
“Scan is very slow”
Problem: Taking longer than 30 minutes
Causes:
- Many subfolders/files per course
- API rate limiting kicking in
- Network latency
Solutions:
- Let it run - progress is saved every 10 courses
- Use wired connection instead of WiFi for stability
- Avoid running during peak hours
“Invalid Drive file IDs detected”
Problem: Cache validation shows local refs or invalid IDs
Solution: This should not happen with rebuild_cache_drive.py. If you see this:
“Out of memory during scan”
Problem: Python crashes with memory errors
Solution: The script loads everything in memory. For very large libraries (1000+ courses):
Cache File Locations
The application checks for cache files in this order:
1. $PLATZI_DATA_PATH/courses_cache.json (if PLATZI_DATA_PATH is set)
2. $PLATZI_VIEWER_PATH/courses_cache.json (if PLATZI_VIEWER_PATH is set)
3. ./courses_cache.json (current directory)
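The lookup order above can be expressed as a short sketch. It is an illustration of the documented order, not the application's actual code. The function names `candidate_paths` and `find_cache` are hypothetical.

```python
import os
from pathlib import Path

CACHE_NAME = "courses_cache.json"


def candidate_paths(env=None):
    """List cache locations in the documented order (hypothetical helper)."""
    env = os.environ if env is None else env
    paths = []
    for var in ("PLATZI_DATA_PATH", "PLATZI_VIEWER_PATH"):
        base = env.get(var)
        if base:  # unset or empty variables are skipped
            paths.append(Path(base) / CACHE_NAME)
    paths.append(Path.cwd() / CACHE_NAME)
    return paths


def find_cache(env=None):
    """Return the first existing cache file, or None if none is found."""
    for path in candidate_paths(env):
        if path.is_file():
            return path
    return None
```

Printing `candidate_paths()` is a quick way to see which locations would be checked on your machine when the app reports a missing cache.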
| File | Purpose | Safe to Delete? |
|---|---|---|
| courses_cache.json | Main cache - required for app to run | ❌ No (must rebuild) |
| drive_scan_progress.json | Resume checkpoint | ✅ Yes (will rescan) |
| PlatziRoutes.md | Course definitions | ❌ No (required for rebuild) |
Next Steps
Start the Server
With the cache built, you’re ready to launch Platzi Viewer.
Go to Quickstart →
Explore the Application
Learn how to navigate, watch videos, and track your progress.
View User Guide →