Before using Platzi Viewer, you must build the courses cache by scanning your Google Drive folder structure. This process generates courses_cache.json, which maps all courses, modules, and classes to their Drive file IDs.

What is the Cache?

The cache (courses_cache.json) is a comprehensive index of your course library:
  • Size: ~20 MB for a typical library of 500 courses
  • Content: Category/route/course structure with Drive file IDs for all videos, summaries, subtitles, and resources
  • Purpose: Enables fast navigation without querying Drive API on every page load
  • Validity: Remains valid until you add courses or reorganize your Drive folder structure
The cache stores only metadata and file IDs. The actual video files and content remain in Google Drive and are streamed on demand.

Cache Structure

The cache follows this hierarchy:
{
  "categories": [
    {
      "name": "Desarrollo Web",
      "icon": "🌐",
      "routes": [
        {
          "name": "Desarrollo Backend con Node.js",
          "courses": [
            {
              "name": "Curso de Fundamentos de Node.js",
              "id": "1ABC...xyz",  // Drive folder ID
              "modules": [
                {
                  "name": "Introducción",
                  "classes": [
                    {
                      "name": "Bienvenida al curso",
                      "hasVideo": true,
                      "hasSummary": true,
                      "files": {
                        "video": "1OOJ5lrsLfFEnp6AKVKZKYZH5A-NasCjl",
                        "summary": "1WWggG3NLugsK6dZ37wzbNeLAPqFdOVfj",
                        "subtitles": "1ABCdef...",
                        "reading": null,
                        "html": null
                      },
                      "resources": [
                        {
                          "name": "slides.pdf",
                          "file": "1QWE456...",
                          "ext": ".pdf",
                          "viewable": true
                        }
                      ]
                    }
                  ]
                }
              ],
              "moduleCount": 5,
              "classCount": 47
            }
          ]
        }
      ]
    }
  ],
  "stats": {
    "totalCategories": 8,
    "totalRoutes": 120,
    "totalCourses": 500,
    "totalClasses": 20000
  }
}
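To make the hierarchy concrete, here is a minimal sketch that walks a cache dictionary from categories down to classes and lists the classes that have a video file ID. The helper name iter_classes and the tiny inline sample are hypothetical; only the field names come from the structure above.

```python
import json

def iter_classes(cache):
    """Yield (course_name, module_name, class_dict) for every class in the cache."""
    for category in cache.get("categories", []):
        for route in category.get("routes", []):
            for course in route.get("courses", []):
                for module in course.get("modules", []):
                    for cls in module.get("classes", []):
                        yield course["name"], module["name"], cls

# Tiny hand-written sample. A real cache would be loaded with
# json.load(open("courses_cache.json", encoding="utf-8")).
sample_cache = json.loads("""
{
  "categories": [{
    "name": "Desarrollo Web",
    "routes": [{
      "name": "Desarrollo Backend con Node.js",
      "courses": [{
        "name": "Curso de Fundamentos de Node.js",
        "modules": [{
          "name": "Introducción",
          "classes": [
            {"name": "Bienvenida al curso", "files": {"video": "VIDEO_ID_1"}},
            {"name": "Lectura", "files": {"video": null}}
          ]
        }]
      }]
    }]
  }]
}
""")

classes = list(iter_classes(sample_cache))
with_video = [cls["name"] for _, _, cls in classes if cls.get("files", {}).get("video")]
print(len(classes), with_video)
```

Using .get() with defaults keeps the walk tolerant of courses that were not matched in Drive and therefore have no modules.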

Expected Drive Folder Structure

Your Google Drive should be organized like this:
Platzi Courses/  (Root folder shared with service account)
├── Curso de Python/
│   ├── 1. Introducción/  (Module folder)
│   │   ├── 1. Bienvenida.mp4
│   │   ├── 1. Bienvenida_summary.html
│   │   ├── 1. Bienvenida.vtt
│   │   ├── 1. Bienvenida - Lecturas recomendadas.txt
│   │   ├── 2. Instalación de Python.mp4
│   │   ├── 2. Instalación de Python_summary.html
│   │   └── ...
│   ├── 2. Fundamentos/
│   │   ├── 1. Variables y tipos de datos.mp4
│   │   └── ...
│   └── presentation.html  (Optional course presentation)
├── Curso de JavaScript/
│   ├── 1. Primeros Pasos/
│   └── ...
└── ...

File Naming Conventions

The scanner recognizes files by their extensions and naming patterns:
File Type | Pattern                                   | Example
----------|-------------------------------------------|-------------------------------------
Video     | {num}. {name}.mp4                         | 1. Introducción.mp4
Summary   | {num}. {name}_summary.html                | 1. Introducción_summary.html
Subtitles | {num}. {name}.vtt                         | 1. Introducción.vtt
Reading   | {num}. {name} - Lecturas recomendadas.txt | 1. Intro - Lecturas recomendadas.txt
HTML      | {num}. {name}.html                        | 1. Demo.html
Resources | {num}. {name}.{ext}                       | 1. Slides.pdf
  • Numbers must start each filename (e.g., 1., 2., 3.)
  • Class name follows the number and period
  • Files without numbers are treated as course-level resources
  • The scanner automatically groups files by their leading number
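The grouping rules above can be sketched roughly as follows. This is an illustration of the naming conventions, not the scanner's actual code: the regular expression, the classify and group_by_number helpers, and the order of the suffix checks are all assumptions.

```python
import re
from collections import defaultdict

# Assumed pattern: a leading number, a period, then the rest of the filename.
NUMBERED = re.compile(r"^(\d+)\.\s*(.+)$")

def classify(filename):
    """Return a file-type label based on the naming table above."""
    lower = filename.lower()
    if lower.endswith(".mp4"):
        return "video"
    if lower.endswith("_summary.html"):
        return "summary"  # checked before the generic .html rule
    if lower.endswith(".vtt"):
        return "subtitles"
    if lower.endswith(" - lecturas recomendadas.txt"):
        return "reading"
    if lower.endswith(".html"):
        return "html"
    return "resource"

def group_by_number(filenames):
    """Group numbered files by leading number; collect unnumbered files separately."""
    classes = defaultdict(dict)
    unnumbered = []
    for name in filenames:
        match = NUMBERED.match(name)
        if not match:
            unnumbered.append(name)
            continue
        number = int(match.group(1))
        classes[number].setdefault(classify(name), []).append(name)
    return dict(classes), unnumbered

files = [
    "1. Bienvenida.mp4",
    "1. Bienvenida_summary.html",
    "1. Bienvenida.vtt",
    "2. Instalación de Python.mp4",
    "presentation.html",
]
grouped, loose = group_by_number(files)
print(sorted(grouped), loose)
```

Files that share a leading number end up in the same class entry, and presentation.html, which has no number, falls into the unnumbered list.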

Running the Cache Builder

Prerequisites

Before building the cache, ensure:
  1. ✅ Service account is configured (see Google Drive Setup)
  2. ✅ Drive folder is shared with service account
  3. ✅ Dependencies are installed (pip install -r requirements.txt)
  4. ✅ You have the Drive root folder ID

Build Command

1. Navigate to the project directory

cd platzi-viewer
2. Activate the virtual environment (if using one)

# Windows
.venv\Scripts\activate

# Linux/Mac
source .venv/bin/activate
3. Update the Drive root folder ID

Open rebuild_cache_drive.py and update the DRIVE_ROOT_ID constant:
rebuild_cache_drive.py:18
DRIVE_ROOT_ID = "17kPqqPSheDtQ5S1HM6Qvvh2qJ7O3YADm"  # Replace with your folder ID
Find your folder ID from the Drive URL:
https://drive.google.com/drive/folders/17kPqqPSheDtQ5S1HM6Qvvh2qJ7O3YADm
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                         This is your folder ID
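If you prefer not to copy the ID by hand, a small helper can pull it out of the URL. This is only a convenience sketch; folder_id_from_url is a hypothetical name and is not part of rebuild_cache_drive.py.

```python
from urllib.parse import urlparse

def folder_id_from_url(url):
    """Return the path segment that follows 'folders' in a Drive folder URL, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "folders" in parts:
        index = parts.index("folders")
        if index + 1 < len(parts):
            return parts[index + 1]
    return None

url = "https://drive.google.com/drive/folders/17kPqqPSheDtQ5S1HM6Qvvh2qJ7O3YADm?usp=sharing"
print(folder_id_from_url(url))
```

Parsing the path rather than slicing the string means a trailing query string such as ?usp=sharing does not end up in the ID.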
4. Run the rebuild script

python rebuild_cache_drive.py
You’ll see output like:
============================================================
📦 Rebuilding courses_cache.json from Google Drive
============================================================

📖 Parsing PlatziRoutes.md...
   8 categories, 120 routes, 500 course entries

📁 Listing Drive root folder...
   487 course folders found in Drive
   0 courses already scanned (resumable)

🔗 Matching courses to Drive folders and scanning content...

  🌐 Desarrollo Web...
     ✓ 85 courses matched to Drive

  🎨 Diseño y UX...
     ✓ 42 courses matched to Drive

  ...
5. Wait for completion

The scan takes 15-30 minutes for a full library due to API rate limiting. Progress is saved automatically every 10 courses to drive_scan_progress.json. If interrupted, simply run the command again to resume.

Build Output

Once complete, you’ll see:
============================================================
✅ Cache rebuilt from Google Drive!
   Categories:     8
   Routes:         120
   Course entries: 500
   Matched Drive:  487
   With content:   478
   Total classes:  19,847
   File size:      18.7 MB
   Scanned this run: 487
   API calls:      ~5,240
============================================================

Understanding the Matching Process

The script matches courses from PlatziRoutes.md to Drive folders by trying an exact match first, then a prefix match, then a fuzzy word-overlap match:
rebuild_cache_drive.py:213
def match_course_to_drive(md_name, drive_names_san, drive_names_map):
    """Try to match an MD course name to a Drive folder.

    Returns (drive_folder_name, drive_folder_id, match_type) or (None, None, None).
    """
    san = sanitize_for_match(md_name)

    # 1. Exact match
    if san in drive_names_san:
        info = drive_names_map[san]
        return info["name"], info["id"], "exact"

    # 2. MD name starts with Drive name
    for ds, info in drive_names_map.items():
        if san.startswith(ds) and len(ds) > 20:
            return info["name"], info["id"], "prefix"

    # 3. Drive name starts with MD name
    for ds, info in drive_names_map.items():
        if ds.startswith(san) and len(san) > 20:
            return info["name"], info["id"], "prefix"

    # 4. High word overlap (80%+ matching words)
    ...
Match types:
  • exact: Exact match after sanitization (removing special chars, lowercasing)
  • prefix: One name is a prefix of the other
  • fuzzy: High word overlap (≥80% common words)
Courses not matched to Drive folders are still included in the cache with foundInDrive: false and classCount: 0.
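The excerpt above elides the word-overlap step, so the following is only a sketch of what an 80% word-overlap test could look like. The sanitize and word_overlap helpers are hypothetical, and both the sanitization rule and the choice of the smaller word set as the denominator are assumptions rather than the script's confirmed behavior.

```python
import re

def sanitize(name):
    """Lowercase and drop everything except word characters and spaces (assumed rule)."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())

def word_overlap(a, b):
    """Fraction of the smaller word set that also appears in the other name."""
    words_a = set(sanitize(a).split())
    words_b = set(sanitize(b).split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / min(len(words_a), len(words_b))

THRESHOLD = 0.8  # the 80% figure quoted above

close = word_overlap("Curso de Fundamentos de Node.js",
                     "Curso de Fundamentos de NodeJS (2023)")
far = word_overlap("Curso de Python", "Curso de JavaScript")
print(close >= THRESHOLD, far >= THRESHOLD)
```

In this sketch the two Node.js names share every word of the shorter name and pass the threshold, while the Python and JavaScript names share only two of three words and do not.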

Resume Capability

If the scan is interrupted (Ctrl+C, connection loss, etc.):
  1. Progress is saved to drive_scan_progress.json every 10 courses
  2. Run the same command again to resume from where it stopped
  3. Already scanned courses are loaded from the progress file
Progress file structure:
drive_scan_progress.json
{
  "1ABC...xyz": {  // Drive folder ID
    "modules": [...],
    "moduleCount": 5,
    "classCount": 47,
    "hasPresentation": true,
    "presentationId": "1XYZ..."
  },
  ...
}
To start fresh (rescan everything):
rm drive_scan_progress.json
python rebuild_cache_drive.py
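The resume behavior described in this section follows a common checkpointing pattern, sketched below under stated assumptions. The function names, the scan_one callback, and the write-then-rename step are illustrative choices and are not taken from rebuild_cache_drive.py; only the file name and the 10-course interval come from the text.

```python
import json
import os
import tempfile

SAVE_EVERY = 10  # interval taken from the text above

def load_progress(path):
    """Return previously saved results keyed by Drive folder ID, or an empty dict."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return {}

def save_progress(path, progress):
    """Write to a temporary file, then rename, so an interrupt cannot truncate the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(progress, fh, ensure_ascii=False)
    os.replace(tmp, path)

def scan_all(folder_ids, scan_one, path):
    """Scan only folders missing from the checkpoint; persist every SAVE_EVERY scans."""
    progress = load_progress(path)
    scanned = 0
    for folder_id in folder_ids:
        if folder_id in progress:
            continue  # already scanned in an earlier run
        progress[folder_id] = scan_one(folder_id)
        scanned += 1
        if scanned % SAVE_EVERY == 0:
            save_progress(path, progress)
    save_progress(path, progress)
    return progress, scanned

# Demo with a fake scanner: the second run only scans the newly added folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    progress_path = os.path.join(tmp_dir, "drive_scan_progress.json")
    fake_scan = lambda folder_id: {"modules": [], "moduleCount": 0, "classCount": 0}
    _, first_run = scan_all(["A", "B", "C"], fake_scan, progress_path)
    _, second_run = scan_all(["A", "B", "C", "D"], fake_scan, progress_path)
    print(first_run, second_run)
```

Deleting the progress file, as shown above, makes load_progress return an empty dictionary, which is why removing it forces a full rescan.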

Rate Limiting

The script includes automatic throttling to stay within Google Drive API quotas:
rebuild_cache_drive.py:26
def api_call_throttle():
    """Simple rate limiter to avoid hitting Drive API quotas."""
    global API_CALL_COUNT, API_CALL_START
    API_CALL_COUNT += 1
    elapsed = time.time() - API_CALL_START
    # Google Drive API: 12,000 queries per minute for service accounts
    # Be conservative: max ~100 calls per second
    if API_CALL_COUNT % 50 == 0 and elapsed < 1.0:
        wait = 1.0 - elapsed
        time.sleep(wait)
        API_CALL_START = time.time()
        API_CALL_COUNT = 0
Limits:
  • Google Drive API: 12,000 queries/minute for service accounts
  • Script throttle: pauses whenever 50 calls complete in under one second (the code comment targets a conservative ceiling of ~100 calls/second)
  • Total calls for 500 courses: ~5,000-10,000 (depends on folder depth)
If you encounter “User Rate Limit Exceeded” errors, the script will automatically retry with exponential backoff (up to 5 retries). If errors persist, wait a few minutes before running again.
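For readers unfamiliar with the retry strategy mentioned above, here is a generic sketch of exponential backoff with up to 5 retries. RateLimitError is a stand-in exception and call_with_backoff is a hypothetical helper; the script's real retry code, its delays, and the exception it catches are not shown in this guide.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for whatever exception the Drive client raises on a rate-limit response."""

def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RateLimitError wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Demo: a call that fails twice before succeeding. Delays are recorded instead of slept.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("User Rate Limit Exceeded")
    return "ok"

delays = []
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

Each wait roughly doubles (about 1 s, then 2 s, then 4 s, and so on), and the small random jitter keeps several clients from retrying in lockstep.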

Updating the Cache

When you add new courses to Drive or reorganize folders:
# Full rebuild (recommended for major changes)
rm drive_scan_progress.json
python rebuild_cache_drive.py

# Resume scan (if you only added new courses)
python rebuild_cache_drive.py
The server can reload the cache without restarting:
# Rebuild cache
python rebuild_cache_drive.py

# Trigger server reload (from localhost only)
curl http://localhost:8080/api/refresh
The /api/refresh endpoint is restricted to loopback addresses (localhost, 127.0.0.1) for security. Remote clients cannot reload the cache.
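How the server performs the loopback check is not shown here. As a rough idea of what such a check involves, the sketch below uses Python's standard ipaddress module; is_loopback is a hypothetical helper, not the server's actual function.

```python
import ipaddress

def is_loopback(client_host):
    """Return True for 'localhost' or any literal loopback address (127.0.0.0/8 or ::1)."""
    if client_host == "localhost":
        return True
    try:
        return ipaddress.ip_address(client_host).is_loopback
    except ValueError:
        return False  # not a literal IP address, so treat it as remote

for host in ("127.0.0.1", "::1", "192.168.1.10"):
    print(host, is_loopback(host))
```

A request from another machine on your network, such as 192.168.1.10 in this sketch, would be rejected, which is why the curl command has to be run on the same host as the server.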

Troubleshooting

“Drive service not available”

Problem: Cannot connect to the Google Drive API.
Solution:
# Check service account configuration
ls service_account.json

# Test Drive access
python -c "from drive_service import drive_service; print('Drive OK')"

# Verify folder is shared with service account
# (check service_account.json for client_email)
See Google Drive Setup for more details.

“No courses matched to Drive”

Problem: All courses show foundInDrive: false.
Solution:
  1. Verify DRIVE_ROOT_ID points to the correct folder
  2. Check folder is shared with service account email
  3. Ensure course folders exist in Drive (not empty)
  4. Review folder naming - must loosely match names in PlatziRoutes.md

“Scan is very slow”

Problem: The scan takes longer than 30 minutes.
Causes:
  • Many subfolders/files per course
  • API rate limiting kicking in
  • Network latency
Solution:
  • Let it run - progress is saved every 10 courses
  • Use wired connection instead of WiFi for stability
  • Avoid running during peak hours

“Invalid Drive file IDs detected”

Problem: Cache validation shows local refs or invalid IDs.
Solution: This should not happen with rebuild_cache_drive.py. If you see this:
# Check cache integrity
python server.py
# Look at http://localhost:8080/api/health for cache.driveOnlyCheck

# Rebuild cache from scratch
rm courses_cache.json drive_scan_progress.json
python rebuild_cache_drive.py

“Out of memory during scan”

Problem: Python crashes with memory errors.
Solution: The script loads everything in memory. For very large libraries (1000+ courses):
# Set the shell's virtual memory limit (Linux/Mac); the value is in KB
ulimit -v 8388608  # 8 GB

# Or process in chunks (requires code modification)
# Split PlatziRoutes.md into smaller files

Cache File Locations

The application checks for cache files in this order:
  1. $PLATZI_DATA_PATH/courses_cache.json (if PLATZI_DATA_PATH is set)
  2. $PLATZI_VIEWER_PATH/courses_cache.json (if PLATZI_VIEWER_PATH is set)
  3. ./courses_cache.json (current directory)
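The lookup order above can be expressed as a short sketch. The function names are hypothetical and the application's real resolution code may differ; only the two environment variable names and the file name come from this page.

```python
import os

def candidate_cache_paths(env=None, cwd="."):
    """Return candidate cache paths in the lookup order listed above."""
    env = os.environ if env is None else env
    paths = []
    for var in ("PLATZI_DATA_PATH", "PLATZI_VIEWER_PATH"):
        base = env.get(var)
        if base:  # skip variables that are unset or empty
            paths.append(os.path.join(base, "courses_cache.json"))
    paths.append(os.path.join(cwd, "courses_cache.json"))
    return paths

def find_cache(env=None, cwd="."):
    """Return the first candidate path that exists on disk, or None."""
    for path in candidate_cache_paths(env, cwd):
        if os.path.isfile(path):
            return path
    return None

print(candidate_cache_paths({"PLATZI_VIEWER_PATH": "/opt/viewer"}, cwd="/tmp"))
```

Passing the environment as a plain dictionary makes the order easy to check without changing your real environment variables.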
Generated files:
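File                     | Purpose                              | Safe to Delete?
-------------------------|--------------------------------------|-----------------------------
courses_cache.json       | Main cache - required for app to run | ❌ No (must rebuild)
drive_scan_progress.json | Resume checkpoint                    | ✅ Yes (will rescan)
PlatziRoutes.md          | Course definitions                   | ❌ No (required for rebuild)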

Next Steps

1. Start the Server

With the cache built, you’re ready to launch Platzi Viewer. Go to Quickstart →
2. Explore the Application

Learn how to navigate, watch videos, and track your progress. View User Guide →
