
Overview

Platzi Viewer includes several Python utility scripts for cache management, data processing, and deployment. All scripts are located in the project root directory and require the virtual environment to be activated.

Core Scripts

rebuild_cache_drive.py

Purpose: Scans the Google Drive folder structure and builds courses_cache.json with Drive file IDs for all courses, modules, and classes.
Usage:
python rebuild_cache_drive.py
What it does:
  1. Parses PlatziRoutes.md to get the expected course structure
  2. Lists all course folders from the Drive root (ID: 17kPqqPSheDtQ5S1HM6Qvvh2qJ7O3YADm)
  3. Matches course names using fuzzy matching (handles variations, accents, special characters)
  4. Scans each course folder for modules and class files
  5. Stores Drive file IDs for videos, summaries, subtitles, readings, and resources
  6. Generates courses_cache.json (~20MB, ~20,000 classes)
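Step 3 can be illustrated with a minimal sketch that normalizes both names (lowercase, accents stripped, punctuation collapsed) and then compares them with the standard library's difflib. The helper names and the 0.85 cutoff are hypothetical choices for illustration, not taken from rebuild_cache_drive.py.

```python
import difflib
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse punctuation to single spaces."""
    # NFKD splits "á" into "a" + a combining accent; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def best_folder_match(course_name, folder_names, cutoff=0.85):
    """Return the Drive folder name closest to course_name, or None (hypothetical helper)."""
    by_normalized = {normalize_name(f): f for f in folder_names}
    hits = difflib.get_close_matches(
        normalize_name(course_name), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[hits[0]] if hits else None


folders = ["Curso de Programacion Basica", "Curso de Python"]
print(best_folder_match("Curso de Programación Básica", folders))
```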
Features:
  • Resume capability: Saves progress to drive_scan_progress.json, so a scan can be interrupted and resumed
  • Rate limiting: Throttles API calls to stay within Google Drive API quotas (12,000 queries/minute)
  • Retry logic: Automatically retries failed API calls with exponential backoff (up to 5 attempts)
  • Incremental saves: Saves progress every 10 courses scanned
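The retry behaviour listed above follows the usual exponential-backoff pattern. The sketch below is a generic version, assuming a callable that raises on transient failure; `call_with_backoff`, the 1-second base delay, and the random jitter are illustrative choices, not the script's actual code.

```python
import random
import time


def call_with_backoff(func, max_attempts=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**attempt (+ jitter) and retry.

    Re-raises the last error once max_attempts tries have failed.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Waits grow 1s, 2s, 4s, 8s ... plus up to 1s of jitter to spread out retries
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

With the Google client library, `retry_on` would typically be narrowed to something like `(googleapiclient.errors.HttpError,)` plus network errors, so that programming mistakes are not retried.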
Time: The first run takes 15-30 minutes for ~500 courses.
Output:
{
  "categories": [
    {
      "id": "desarrollo-web",
      "name": "Desarrollo Web",
      "icon": "💻",
      "routes": [
        {
          "name": "Fundamentos de Programación",
          "courses": [
            {
              "name": "Curso de Programación Básica",
              "id": "1ABC...XYZ",
              "modules": [
                {
                  "name": "Introducción",
                  "classes": [
                    {
                      "name": "Bienvenida",
                      "files": {
                        "video": "1OOJ5lrs...",
                        "summary": "1WWggG3N...",
                        "subtitles": "1ABCdef..."
                      }
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  "stats": {
    "totalCategories": 17,
    "totalRoutes": 156,
    "totalCourses": 542,
    "totalClasses": 19847
  }
}
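The stats block summarizes the nested hierarchy. Recounting it from courses_cache.json is a quick sanity check after a rebuild. A sketch, assuming the file follows the structure shown above; `count_cache` is a hypothetical helper, and if the same course is listed under several routes these naive counts may differ from the recorded stats (which may deduplicate).

```python
import json
from pathlib import Path


def count_cache(cache: dict) -> dict:
    """Recount categories, routes, courses, and classes in a parsed cache."""
    totals = {"totalCategories": 0, "totalRoutes": 0, "totalCourses": 0, "totalClasses": 0}
    for category in cache.get("categories", []):
        totals["totalCategories"] += 1
        for route in category.get("routes", []):
            totals["totalRoutes"] += 1
            for course in route.get("courses", []):
                totals["totalCourses"] += 1
                for module in course.get("modules", []):
                    totals["totalClasses"] += len(module.get("classes", []))
    return totals


if __name__ == "__main__":
    path = Path("courses_cache.json")
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        print("recounted:", count_cache(data))
        print("recorded: ", data.get("stats"))
```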
If the scan is interrupted, simply run the script again. It will resume from where it left off using drive_scan_progress.json.
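Resuming works by checkpointing which courses are already done. A minimal sketch of that pattern follows, assuming a simple {"done": [...]} layout for the progress file (the real drive_scan_progress.json schema is not documented here) and a hypothetical `scan_course` callback. The checkpoint is written to a temporary file and renamed with os.replace, so an interruption during a save cannot leave a half-written file.

```python
import json
import os
from pathlib import Path

PROGRESS_FILE = Path("drive_scan_progress.json")  # assumed schema: {"done": [...]}
SAVE_EVERY = 10  # mirrors the "every 10 courses" cadence described above


def load_done(path=PROGRESS_FILE) -> set:
    if path.exists():
        return set(json.loads(path.read_text(encoding="utf-8")).get("done", []))
    return set()


def save_done(done: set, path=PROGRESS_FILE) -> None:
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"done": sorted(done)}), encoding="utf-8")
    os.replace(tmp, path)  # atomic rename when both files are on the same filesystem


def scan_all(course_names, scan_course, path=PROGRESS_FILE) -> set:
    done = load_done(path)
    for name in course_names:
        if name in done:
            continue  # finished in a previous run
        scan_course(name)
        done.add(name)
        if len(done) % SAVE_EVERY == 0:
            save_done(done, path)  # periodic checkpoint
    save_done(done, path)  # final checkpoint
    return done
```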

parse_routes.py

Purpose: Parses PlatziRoutes.md into structured categories, routes, and courses.
Usage:
# Parse and print summary
python parse_routes.py

# Import as module
from parse_routes import parse
data = parse()
print(data['stats']['totalCourses'])
What it does:
  • Reads PlatziRoutes.md (Markdown file with course catalog)
  • Extracts categories (schools) with icons
  • Parses routes (learning paths) and courses
  • Sanitizes folder names to match Drive naming conventions
  • Returns structured JSON with hierarchy
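The exact layout of PlatziRoutes.md is not shown here, so the following is only a sketch under an assumed format: `## School` headings, `### Route` headings, and `- Course` bullets. It shows how such a file maps onto the categories → routes → courses hierarchy with stats; `parse_routes_md` is hypothetical and is not the `parse()` function from parse_routes.py.

```python
import re


def parse_routes_md(text: str) -> dict:
    """Build categories -> routes -> courses from heading/bullet Markdown.

    Assumed layout (hypothetical): '## School', '### Route', '- Course'.
    """
    categories = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("### "):
            if categories:  # ignore a route that appears before any school
                categories[-1]["routes"].append({"name": line[4:].strip(), "courses": []})
        elif line.startswith("## "):
            categories.append({"name": line[3:].strip(), "routes": []})
        elif re.match(r"^[-*] ", line) and categories and categories[-1]["routes"]:
            categories[-1]["routes"][-1]["courses"].append({"name": line[2:].strip()})
    stats = {
        "totalCategories": len(categories),
        "totalRoutes": sum(len(c["routes"]) for c in categories),
        "totalCourses": sum(len(r["courses"]) for c in categories for r in c["routes"]),
    }
    return {"categories": categories, "stats": stats}
```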
School Icons Mapping:
SCHOOL_ICONS = {
    "Desarrollo Web": "💻",
    "English Academy": "🇬🇧",
    "Marketing Digital": "📣",
    "Inteligencia Artificial y Data Science": "🤖",
    "Ciberseguridad": "🔒",
    # ... and more
}
Output Example:
============================================================
📊 PlatziRoutes.md Summary
============================================================
  Schools:  17
  Routes:   156
  Courses:  542

  💻 Desarrollo Web 23 routes, 87 courses [development]
  🇬🇧 English Academy 8 routes, 42 courses [english]
  📣 Marketing Digital 12 routes, 38 courses [marketing]
  ...
This script is called by rebuild_cache_drive.py to get the expected course structure before scanning Drive.

server.py

Purpose: HTTP server that serves the frontend and acts as a proxy to the Google Drive API.
Usage:
# Start server (default port 8080)
python server.py

# Custom port and host
export PORT=3000
export HOST=0.0.0.0
python server.py

# Use data directory cache
export PLATZI_DATA_PATH=/path/to/data
python server.py
Key Features:
  • Multi-threaded: Uses ThreadingHTTPServer for concurrent requests
  • Drive proxy: All files served via /drive/files/{fileId} endpoint
  • Range requests: Supports HTTP 206 Partial Content for video seeking
  • Progress sync: Saves/loads user progress to progress.json
  • Cache reload: Automatically detects changes to courses_cache.json
  • Health check: /api/health endpoint for diagnostics
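HTTP 206 support comes down to parsing the Range header and answering with a matching Content-Range header. A simplified sketch for a single byte range (RFC 9110 syntax), assuming the total file size is known, e.g. from Drive file metadata; multi-range requests are deliberately not handled, and `parse_range`/`content_range` are hypothetical helpers. A `None` result would typically map to a 416 Range Not Satisfiable response.

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header: str, size: int):
    """Return an inclusive (start, end) pair for a single-range header, or None."""
    match = _RANGE_RE.match(header.strip())
    if not match or size <= 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix range "bytes=-N" means the final N bytes of the file
        length = int(last)
        if length == 0:
            return None
        return max(size - length, 0), size - 1
    start = int(first)
    end = int(last) if last else size - 1  # open-ended "bytes=N-" runs to EOF
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)  # clamp an end past EOF to the last byte


def content_range(start: int, end: int, size: int) -> str:
    return f"bytes {start}-{end}/{size}"
```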
Environment Variables:
# Server configuration
PORT=8080                          # Server port
HOST=127.0.0.1                     # Bind address
PUBLIC_HOST=localhost              # Display hostname

# Paths
PLATZI_VIEWER_PATH=/path/to/app    # Frontend files location
PLATZI_DATA_PATH=/path/to/data     # Data storage (cache, progress)
PLATZI_PREFER_DATA_CACHE=1         # Prefer data cache over viewer cache

# Drive credentials
GOOGLE_SERVICE_ACCOUNT_FILE=/path/to/service_account.json
GOOGLE_SERVICE_ACCOUNT_JSON={...}  # Inline JSON credentials

# Limits
MAX_PROGRESS_BYTES=2097152         # Max progress file size (2MB)

# FFmpeg
FFMPEG_PATH=/usr/bin/ffmpeg        # Custom FFmpeg path
PLATZI_COMPAT_FORCE_REENCODE=1     # Force video re-encoding for compatibility
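MAX_PROGRESS_BYTES bounds what a POST to /api/progress may write to progress.json. A sketch of such a guard, assuming the payload is a JSON object and that the limit is read from the environment with the documented 2 MB default (2 × 1024 × 1024 = 2,097,152 bytes); `save_progress` is hypothetical, and the write goes through a temporary file plus os.replace so a crash never leaves a truncated progress file.

```python
import json
import os
from pathlib import Path

# Documented default: 2097152 bytes (2 MB)
MAX_PROGRESS_BYTES = int(os.environ.get("MAX_PROGRESS_BYTES", "2097152"))


def save_progress(body: bytes, path: Path, limit: int = MAX_PROGRESS_BYTES) -> None:
    """Validate a POSTed progress payload and write it atomically."""
    if len(body) > limit:
        raise ValueError(f"progress payload too large: {len(body)} > {limit} bytes")
    data = json.loads(body.decode("utf-8"))  # raises ValueError on bad UTF-8 or JSON
    if not isinstance(data, dict):
        raise ValueError("progress payload must be a JSON object")
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    os.replace(tmp, path)
```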
API Endpoints:
Endpoint | Method | Description
--- | --- | ---
/api/courses | GET | Full course cache with all class details
/api/bootstrap | GET | Lightweight cache (module summaries only)
/api/cache-meta | GET | Cache metadata (source, timestamp, stats)
/api/course-detail/{cat}/{route}/{course} | GET | Detailed course data by reference
/api/progress | GET | Load user progress
/api/progress | POST | Save user progress
/api/health | GET | Server health, Drive status, FFmpeg availability
/api/refresh | GET | Reload cache (localhost only)
/api/self-check-drive | GET | Validate that all file IDs are Drive references
/api/video-compatible/{fileId} | GET | FFmpeg-processed video for A/V sync issues
/drive/files/{fileId} | GET | Stream file from Drive (supports Range headers)
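/api/refresh is restricted to localhost. One way to make that check, assuming it is applied to the connecting client's IP (in an http.server handler, `self.client_address[0]`) rather than to the spoofable Host header, is sketched below; `is_local_client` is hypothetical.

```python
import ipaddress


def is_local_client(client_host: str) -> bool:
    """True if the request came from a loopback address (127.0.0.0/8 or ::1)."""
    try:
        addr = ipaddress.ip_address(client_host)
    except ValueError:
        return False  # not an IP literal (e.g. a hostname): treat as remote
    # An IPv4-mapped IPv6 address such as ::ffff:127.0.0.1 also counts as loopback
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return addr.is_loopback
```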
Logging:
[INFO] Cache selected (viewer): /app/courses_cache.json
[OK] Datos cargados: 17 categorías, 156 rutas, 542 cursos, 19847 clases
[INFO] Servidor listo en http://127.0.0.1:8080

[STREAM] 1OOJ5lrs... | Range: bytes=0-1048575 | 1.00 MB in 0.43s (2.33 MB/s)
[COMPAT] 1ABC... | 145.32 MB in 62.18s (2.34 MB/s)
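The figures in these log lines are consistent with megabytes counted as 1,048,576 bytes: the range bytes=0-1048575 covers exactly 1,048,576 bytes and is logged as 1.00 MB, and 1.00 / 0.43 ≈ 2.33 MB/s. A sketch of producing that part of the line (the function name is hypothetical):

```python
MIB = 1024 * 1024  # bytes=0-1048575 is 1,048,576 bytes, logged as "1.00 MB"


def format_throughput(num_bytes: int, seconds: float) -> str:
    """Format a transfer as '<size> MB in <time>s (<rate> MB/s)'."""
    mb = num_bytes / MIB
    rate = mb / seconds if seconds > 0 else 0.0  # avoid dividing by zero
    return f"{mb:.2f} MB in {seconds:.2f}s ({rate:.2f} MB/s)"


print(format_throughput(1048576, 0.43))  # 1.00 MB in 0.43s (2.33 MB/s)
```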

drive_service.py

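Purpose: Google Drive API v3 wrapper with authentication and streaming support.
Usage:
from drive_service import drive_service

folder_id = "17kPqqPSheDtQ5S1HM6Qvvh2qJ7O3YADm"  # e.g. the Drive root folder scanned by rebuild_cache_drive.py
file_id = "YOUR_FILE_ID"                         # placeholder: replace with a real Drive file ID
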
# List files in folder
files = drive_service.list_files(folder_id)

# Get file metadata
meta = drive_service.get_file_metadata(file_id)

# Download file with range
response = drive_service.download_file_range(
    file_id,
    range_header="bytes=0-1048575"
)
Features:
  • Service account auth: Automatic authentication using service_account.json
  • Thread-safe: Uses thread-local storage for API client
  • Retry logic: Exponential backoff for transient failures (up to 5 retries)
  • Streaming: Returns response objects for efficient chunked downloads
  • Validation: Drive ID format validation (regex: ^[A-Za-z0-9_-]{10,}$)
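Validating IDs before they reach the Drive API keeps arbitrary strings from the /drive/files/{fileId} path out of API requests. A sketch using the documented pattern; it uses re.fullmatch rather than `^...$` with re.match, because `$` also matches just before a trailing newline. `is_drive_id` is a hypothetical name.

```python
import re

# Same character class and minimum length as the documented ^[A-Za-z0-9_-]{10,}$
DRIVE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{10,}")


def is_drive_id(value: str) -> bool:
    """True only if the whole string looks like a Drive file/folder ID."""
    return DRIVE_ID_PATTERN.fullmatch(value) is not None
```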
Authentication Flow:
  1. Looks for GOOGLE_SERVICE_ACCOUNT_JSON environment variable (inline JSON)
  2. Falls back to GOOGLE_SERVICE_ACCOUNT_FILE path
  3. Searches in: PyInstaller bundle, current directory, script directory
  4. Refreshes credentials if expired
  5. Creates thread-safe AuthorizedSession for HTTP requests
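Steps 1-3 of this flow amount to trying credential sources in a fixed order. A standard-library sketch of that lookup order follows; `load_service_account_info` is hypothetical, and the "script directory" is approximated from sys.argv[0].

```python
import json
import os
import sys
from pathlib import Path


def load_service_account_info(filename: str = "service_account.json") -> dict:
    """Resolve service account credentials in the documented order (sketch)."""
    # 1. Inline JSON in an environment variable wins.
    inline = os.environ.get("GOOGLE_SERVICE_ACCOUNT_JSON")
    if inline:
        return json.loads(inline)

    candidates = []
    # 2. Explicit file path from the environment.
    env_path = os.environ.get("GOOGLE_SERVICE_ACCOUNT_FILE")
    if env_path:
        candidates.append(Path(env_path))
    # 3. PyInstaller bundle (bundled data files live under sys._MEIPASS),
    #    then the current directory, then the script's directory.
    bundle_dir = getattr(sys, "_MEIPASS", None)
    if bundle_dir:
        candidates.append(Path(bundle_dir) / filename)
    candidates.append(Path.cwd() / filename)
    if sys.argv and sys.argv[0]:
        candidates.append(Path(os.path.abspath(sys.argv[0])).parent / filename)

    for path in candidates:
        if path.is_file():
            return json.loads(path.read_text(encoding="utf-8"))
    raise FileNotFoundError(f"no service account credentials found ({len(candidates)} locations tried)")
```

The resulting dict can then be passed to google-auth's `service_account.Credentials.from_service_account_info(info, scopes=[...])` and wrapped in `google.auth.transport.requests.AuthorizedSession`, which refreshes expired tokens on its own. Which scope the app requests (for example `https://www.googleapis.com/auth/drive.readonly`) is an assumption here.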

Utility Scripts

check_remaining.py

Purpose: Compares DriveCourses.md and PlatziRoutes.md to find missing courses.
Usage:
python check_remaining.py
Output:
Still missing: 23
  Drive: "Curso de Python: Comprehensions, Funciones y Manejo de Errores"
    San: "curso de python comprehensions funciones y manejo de errores"
    Best: "Curso de Python" (san: "curso de python", overlap: 3)
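The San and overlap values in this output suggest a sanitize-then-count-shared-words comparison. The sketch below reproduces the numbers in the sample, assuming "overlap" means the count of distinct shared words. This is an inference from the output, not check_remaining.py's confirmed logic, and the helper names are hypothetical.

```python
import re


def sanitize(name: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with one space."""
    return re.sub(r"[\W_]+", " ", name.lower()).strip()


def overlap(a: str, b: str) -> int:
    """Number of distinct words the two sanitized names share."""
    return len(set(sanitize(a).split()) & set(sanitize(b).split()))


def best_match(drive_name, route_names):
    """Route course name with the largest word overlap (None for an empty list)."""
    return max(route_names, key=lambda r: overlap(drive_name, r), default=None)


drive = "Curso de Python: Comprehensions, Funciones y Manejo de Errores"
print(sanitize(drive))
print(overlap(drive, "Curso de Python"))
```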

desktop_app.py

Purpose: Desktop application entry point using PyQt6 or pywebview.
Usage:
# Run desktop app directly
python desktop_app.py

# Or use the launcher
python app_launcher.py
Features:
  • GPU acceleration: Configures Chromium flags for hardware video decoding
  • Auto-port selection: Finds free port if default is occupied
  • Persistent storage: Saves localStorage/cookies to PlatziData/ folder
  • Embedded server: Launches server.py in a background thread
  • Multiple backends: PyQt6 WebEngine or pywebview
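Auto-port selection is commonly done by trying to bind the preferred port and falling back to port 0, which asks the OS for any free port. A sketch with the 8080 default taken from server.py; `pick_port` is hypothetical. There is a small race: another process can take the port between this check and the server's own bind.

```python
import socket


def pick_port(preferred: int = 8080, host: str = "127.0.0.1") -> int:
    """Return preferred if it can be bound, otherwise an OS-assigned free port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, preferred))
            return preferred
        except OSError:
            pass  # port in use (or not permitted): fall back below
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]
```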

check_drive_runtime.py

Purpose: Diagnostic tool to verify Drive service initialization.
Usage:
python check_drive_runtime.py
Checks:
  • Service account file existence and validity
  • Credentials loading and authentication
  • Drive API connectivity
  • File listing capabilities
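The first check (key file existence and validity) can be done offline, before any Drive call. A sketch of such a check follows; `check_service_account` is hypothetical and does not contact Google. The keys tested are standard fields of a Google service account key file. The `client_email` value is also the address the Drive folder must be shared with.

```python
import json
from pathlib import Path

# Standard service account key fields that google-auth needs to build credentials
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email", "token_uri")


def check_service_account(path: str = "service_account.json") -> list:
    """Return a list of problems with a key file; an empty list means it looks usable."""
    key_file = Path(path)
    if not key_file.is_file():
        return [f"{path} not found"]
    try:
        info = json.loads(key_file.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        return [f"{path} is not readable JSON: {exc}"]
    if not isinstance(info, dict):
        return [f"{path} does not contain a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if not info.get(key)]
    if info.get("type") and info["type"] != "service_account":
        problems.append(f"type is {info['type']!r}, expected 'service_account'")
    return problems


if __name__ == "__main__":
    issues = check_service_account()
    print("\n".join(issues) if issues else "service_account.json looks valid")
```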

Build Scripts (PowerShell)

build_portable_exe.ps1

Purpose: Creates a standalone Windows executable using PyInstaller.
Usage:
powershell -ExecutionPolicy Bypass -File .\build_portable_exe.ps1
Output: dist/PlatziViewer/PlatziViewer.exe
What’s included:
  • All Python scripts and dependencies
  • Frontend files (HTML, CSS, JS)
  • Icon (favicon.ico)
  • Optionally: service_account.json, courses_cache.json

build_desktop_exe.ps1

Purpose: Creates a desktop application with an embedded browser window.
Usage:
powershell -ExecutionPolicy Bypass -File .\build_desktop_exe.ps1
Output: dist/PlatziViewerDesktop.exe
Differences from the portable build:
  • Uses desktop_app.py as entry point
  • Includes PyQt6 WebEngine dependencies
  • Opens as native application window (not browser tab)
  • GPU-accelerated video playback

Script Dependencies

All scripts require:
# Install dependencies
pip install -r requirements.txt
Key packages:
  • google-api-python-client - Drive API client
  • google-auth - Service account authentication
  • google-auth-httplib2 - HTTP transport
Optional (for desktop builds):
  • pyinstaller - Executable packaging
  • PyQt6 + PyQt6-WebEngine - Desktop UI
  • pywebview - Alternative lightweight UI
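Because pip package names and import names differ for most of these packages, a quick environment check has to map one to the other. A sketch using importlib.util.find_spec; the mapping below reflects the usual import names for these packages, and the script itself is hypothetical, not part of the project.

```python
import importlib.util

# pip package name -> import name
REQUIRED = {
    "google-api-python-client": "googleapiclient",
    "google-auth": "google.auth",
    "google-auth-httplib2": "google_auth_httplib2",
}
OPTIONAL = {
    "pyinstaller": "PyInstaller",
    "PyQt6": "PyQt6",
    "PyQt6-WebEngine": "PyQt6.QtWebEngineWidgets",
    "pywebview": "webview",
}


def is_installed(module_name: str) -> bool:
    """Locate a module's spec (parent packages get imported, the module itself does not)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False  # e.g. the parent package "google" is missing entirely


def report() -> dict:
    return {pip_name: is_installed(mod) for pip_name, mod in {**REQUIRED, **OPTIONAL}.items()}


if __name__ == "__main__":
    for pip_name, ok in report().items():
        status = "ok" if ok else ("MISSING" if pip_name in REQUIRED else "not installed (optional)")
        print(f"{pip_name:26} {status}")
```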

Common Tasks

Rebuild cache after adding courses

python rebuild_cache_drive.py

Update PlatziRoutes.md and refresh

# Edit PlatziRoutes.md
vim PlatziRoutes.md

# Rebuild cache
python rebuild_cache_drive.py

# Restart server
python server.py

Test Drive connectivity

python check_drive_runtime.py

Build for distribution

# Windows executable
.\build_portable_exe.ps1

# Desktop app
.\build_desktop_exe.ps1

Force cache reload in running server

# Only works from localhost
curl http://localhost:8080/api/refresh
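/api/refresh forces a reload, while the automatic change detection listed under the server's Key Features can be done by comparing the file's modification time. A sketch, assuming the cache is re-checked on each access; `CacheHolder` is hypothetical (the server may use a different mechanism), and `force=True` plays the role of /api/refresh. The lock matters because ThreadingHTTPServer handles requests concurrently.

```python
import json
import os
import threading


class CacheHolder:
    """Reload a JSON cache file only when its modification time changes."""

    def __init__(self, path: str):
        self.path = path
        self._mtime = None
        self._data = None
        self._lock = threading.Lock()  # requests arrive on several threads

    def get(self, force: bool = False):
        mtime = os.stat(self.path).st_mtime_ns
        with self._lock:
            if force or self._data is None or mtime != self._mtime:
                with open(self.path, encoding="utf-8") as fh:
                    self._data = json.load(fh)
                self._mtime = mtime
            return self._data
```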

Troubleshooting

"Drive service not available"

Check:
  1. service_account.json exists and is valid
  2. Drive folder is shared with service account email
  3. GOOGLE_SERVICE_ACCOUNT_FILE environment variable (if used)
Test:
python check_drive_runtime.py

Cache rebuild fails midway

Solution: Run the script again; it resumes from drive_scan_progress.json.

"Rate limit exceeded"

Solution: The script has built-in throttling. Wait 60 seconds and resume.

FFmpeg not found for video compatibility

Solution:
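# Windows: install FFmpeg (e.g. with Chocolatey) and make sure it is on PATH
choco install ffmpeg

# Or point the server at a specific binary (PowerShell)
$env:FFMPEG_PATH = "C:\Program Files\ffmpeg\bin\ffmpeg.exe"

# Linux/macOS
export FFMPEG_PATH=/usr/bin/ffmpeg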

Script Architecture

Best Practice: Run rebuild_cache_drive.py weekly to sync new courses added to Drive.
