Skip to main content

Overview

The backend is a multi-threaded Python HTTP server that serves static files, provides REST API endpoints, and streams content from Google Drive. Built with standard library components, it requires no external web framework.

Core Components

# server.py:1342-1348
def create_server(host=BIND_HOST, port=PORT):
    """Warm the JSON cache, then build the threaded HTTP server.

    Returns a ThreadingHTTPServer bound to (host, port) that dispatches
    requests to PlatziHandler, one thread per connection.
    """
    init_cache()
    address = (host, port)
    return ThreadingHTTPServer(address, PlatziHandler)

def run_server(server):
    """Serve requests until the server is shut down (blocks the caller)."""
    server.serve_forever()

ThreadingHTTPServer

Built-in Python server that spawns a thread per request for concurrent handling

PlatziHandler

Custom request handler extending SimpleHTTPRequestHandler with API routes

Cache Layer

Pre-loaded JSON cache with thread-safe access and auto-reload on file changes

Drive Integration

Lazy-loaded Drive service with shared authenticated session

Request Handler

Handler Initialization

From server.py:730-745:
class PlatziHandler(SimpleHTTPRequestHandler):
    """Custom HTTP request handler for the viewer app and its API routes."""

    # Extend the stock MIME map so modern web assets are served with a
    # correct Content-Type; text formats carry an explicit UTF-8 charset.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".js": "application/javascript; charset=utf-8",
        ".mjs": "application/javascript; charset=utf-8",
        ".css": "text/css; charset=utf-8",
        ".json": "application/json; charset=utf-8",
        ".svg": "image/svg+xml",
        ".wasm": "application/wasm",
    }

    def __init__(self, *args, **kwargs):
        # Serve static files from VIEWER_PATH instead of the process CWD.
        super().__init__(*args, directory=VIEWER_PATH, **kwargs)

Security Headers

From server.py:747-751:
def end_headers(self):
    """Attach baseline security headers to every response, then let the
    base class terminate the header block."""
    security_headers = (
        ("X-Content-Type-Options", "nosniff"),
        ("Referrer-Policy", "no-referrer"),
        ("Cross-Origin-Resource-Policy", "same-site"),
    )
    for name, value in security_headers:
        self.send_header(name, value)
    super().end_headers()
CORS is only enabled for localhost origins to prevent unauthorized access from remote sites. See server.py:753-766 for origin validation logic.

API Endpoints

Health Check

From server.py:817-835:
if self.path == "/api/health":
    ds = get_drive_service()
    ffmpeg_executable = _get_ffmpeg_executable()
    with compat_stream_lock:
        compat_snapshot = dict(compat_stream_stats)
    payload = {
        "status": "ok",
        "drive": {
            "available": bool(ds),
            "error": None if ds else get_drive_service_error(),
        },
        "ffmpeg": {
            "available": bool(ffmpeg_executable),
            "path": ffmpeg_executable,
        },
        "compatStream": compat_snapshot,
    }
    self._send_json(200, payload)
    return

Bootstrap Endpoint

From server.py:848-856:
if self.path == "/api/bootstrap":
    refresh_cache_if_changed()

    with cache_lock:
        raw_bytes = bootstrap_cache_json_bytes or _payload_to_bytes(_empty_courses_payload())
        gzip_bytes = bootstrap_cache_json_gzip_bytes or None

    self._send_json_bytes(200, raw_bytes, gzip_bytes)
    return
The bootstrap endpoint returns pre-serialized and pre-compressed JSON from memory. No JSON encoding happens per-request, ensuring sub-millisecond response times.

Course Detail Endpoint

From server.py:877-911:
if self.path.startswith("/api/course-detail/"):
    refresh_cache_if_changed()

    parsed = urlparse(self.path)
    parts = [unquote(segment) for segment in parsed.path.split("/") if segment]
    if len(parts) != 5:
        self._send_json(400, {
            "error": "invalid_course_detail_path",
            "expected": "/api/course-detail/<catId>/<routeId>/<courseId>",
        })
        return

    _, _, cat_ref, route_ref, course_ref = parts
    with cache_lock:
        data = courses_cache or _empty_courses_payload()

    match = _resolve_course_detail_refs(data, cat_ref, route_ref, course_ref)
    if not match:
        self._send_json(404, {
            "error": "course_not_found",
            "catRef": cat_ref,
            "routeRef": route_ref,
            "courseRef": course_ref,
        })
        return

    payload = _build_course_detail_payload(match)
    self._send_json(200, payload)
    return

Flexible Reference Matching

From server.py:468-480:
def _ref_matches(ref_value, idx, candidates):
    normalized = str(ref_value or "").strip().lower()
    if not normalized:
        return False

    if normalized == str(idx).lower():
        return True  # Match by index

    for candidate in candidates:
        c = str(candidate or "").strip().lower()
        if c and normalized == c:
            return True  # Match by ID, name, or slug
    return False
Courses can be referenced by:
  • Index: 0, 1, 2
  • ID: curso-react-2024
  • Name: React Course
  • Slug: Extracted from URL path

Progress Endpoints

From server.py:924-935 (GET):
if self.path == "/api/progress":
    try:
        if os.path.exists(PROGRESS_FILE):
            with open(PROGRESS_FILE, "r", encoding="utf-8") as f:
                data = json.load(f)
        else:
            data = {}
    except:
        data = {}

    self._send_json(200, data)
    return
From server.py:218-244 (POST):
if self.path == "/api/progress":
    content_length = int(self.headers.get("Content-Length", 0))

    if content_length <= 0 or content_length > MAX_PROGRESS_BYTES:
        self._send_json(413, {"error": "payload_too_large"})
        return

    post_data = self.rfile.read(content_length)

    try:
        # Validar que es JSON válido
        parsed = json.loads(post_data.decode("utf-8"))
        if not isinstance(parsed, dict):
            raise ValueError("progress payload must be a JSON object")

        # Guardar en archivo
        progress_dir = os.path.dirname(PROGRESS_FILE)
        if progress_dir:
            os.makedirs(progress_dir, exist_ok=True)
        with open(PROGRESS_FILE, "wb") as f:
            f.write(post_data)

        self._send_json(200, {"status": "saved"})
    except Exception as e:
        self._send_json(400, {"error": str(e)})
    return
Progress is stored as a raw JSON file (progress.json) with a 2MB size limit to prevent abuse. The server validates JSON structure but doesn’t interpret the content.

File Streaming

Drive File Streaming

From server.py:1124-1212:
if self.path.startswith("/drive/files/"):
    file_id = unquote(self.path[13:])

    if not file_id or not DRIVE_ID_RE.match(file_id):
        self.send_error(400, "Invalid file ID")
        return

    ds = get_drive_service()
    if not ds:
        self._safe_send_error(503, "Drive service not available...")
        return

    try:
        range_header = self.headers.get("Range")

        if range_header:
            sanitized_range = str(range_header).strip()
            if "," in sanitized_range or not re.match(r"^bytes=\d*-\d*$", sanitized_range):
                self._safe_send_error(416, "Invalid range header")
                return
            resp = ds.download_file_range(file_id, range_header=sanitized_range)
        else:
            resp = ds.download_file_range(file_id)

        status_code = resp.status_code
        mime_type = resp.headers.get("Content-Type", "application/octet-stream")
        is_video = mime_type.startswith("video")
        content_range = resp.headers.get("Content-Range")
        content_length = resp.headers.get("Content-Length")

        self.send_response(status_code)
        self.send_header("Content-Type", mime_type)
        if content_range:
            self.send_header("Content-Range", content_range)
        if content_length:
            self.send_header("Content-Length", content_length)
        self.send_header("Accept-Ranges", "bytes")
        self._set_cors_headers()
        if is_video:
            self.send_header("Cache-Control", "public, max-age=3600")
        self.end_headers()

        start_time = time.time()
        total_bytes = 0
        try:
            # Optimización: Aumentar buffer a 1MB para reducir overhead y mejorar A/V sync
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                if chunk:
                    self.wfile.write(chunk)
                    total_bytes += len(chunk)
        except OSError as error:
            if not self._is_client_disconnect_error(error):
                raise
        finally:
            duration = time.time() - start_time
            # Loguear métricas de streaming para detectar cuellos de botella
            if duration > 0.5:
                speed = (total_bytes / 1024 / 1024) / duration
                print(f"[STREAM] {file_id} | Range: {range_header or 'Full'} | "
                      f"{total_bytes/1024/1024:.2f} MB in {duration:.2f}s ({speed:.2f} MB/s)")
            resp.close()

Range Request Optimization

Larger chunk sizes (1MB vs 64KB default) reduce:
  • System call overhead
  • Network round-trips
  • Audio/video desync issues
Trade-off: Slightly higher memory usage per request.
The server logs streaming performance when duration > 0.5s:
[STREAM] 1a2b3c4d5e | Range: bytes=0-1048575 | 1.00 MB in 0.87s (1.15 MB/s)
This helps identify bottlenecks (slow Drive API, network issues, etc.).

FFmpeg Compatibility Streaming

For browsers/devices with codec issues, the server can remux video through FFmpeg: From server.py:947-1122:
if self.path.startswith("/api/video-compatible/"):
    file_id = unquote(self.path[len("/api/video-compatible/"):])
    
    ffmpeg_executable = _get_ffmpeg_executable()
    if not ffmpeg_executable:
        self._safe_send_error(503, "ffmpeg_not_available")
        return

    # Route through local /drive/files endpoint for auth
    source_url = f"http://127.0.0.1:{PORT}/drive/files/{file_id}"
    compat_force_reencode = os.environ.get("PLATZI_COMPAT_FORCE_REENCODE", "0").strip() == "1"

    ffmpeg_cmd = [
        ffmpeg_executable,
        "-hide_banner",
        "-loglevel", "error",
        "-fflags", "+genpts+discardcorrupt",
        "-avoid_negative_ts", "make_zero",
        "-i", source_url,
        "-map", "0:v:0",
        "-map", "0:a?",
    ]

    if compat_force_reencode:
        ffmpeg_cmd.extend(["-c:v", "libx264", "-preset", "veryfast"])
    else:
        ffmpeg_cmd.extend(["-c:v", "copy"])  # Stream copy (fast)

    ffmpeg_cmd.extend([
        "-c:a", "aac",
        "-ar", "48000",
        "-af", "aresample=async=1:first_pts=0",
        "-movflags", "+frag_keyframe+empty_moov+default_base_moof",
        "-f", "mp4",
        "-",
    ])

    process = subprocess.Popen(
        ffmpeg_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        bufsize=0,
    )

    self.send_response(200)
    self.send_header("Content-Type", "video/mp4")
    self.send_header("Cache-Control", "no-store, max-age=0")
    self.send_header("Accept-Ranges", "none")  # Can't seek in live transcode
    self.end_headers()

    while True:
        chunk = process.stdout.read(1024 * 512)
        if not chunk:
            break
        self.wfile.write(chunk)
        total_bytes += len(chunk)
FFmpeg mode is not seekable (Accept-Ranges: none) because the stream is transcoded on-the-fly. Use this only when direct streaming fails due to codec issues.

Cache Management

Cache Loading

From server.py:583-689:
def init_cache():
    """Inicializa el caché cargando courses_cache.json."""
    global courses_cache, bootstrap_cache, cache_meta
    global cache_mtime, cache_file_path, cache_source
    global full_cache_json_bytes, full_cache_json_gzip_bytes
    global bootstrap_cache_json_bytes, bootstrap_cache_json_gzip_bytes
    global cache_meta_json_bytes, cache_meta_json_gzip_bytes

    selected_cache_file = None
    selected_mtime = None
    selected_data = None
    load_errors = []

    for candidate in _get_cache_preference_order():
        if not os.path.exists(candidate):
            continue

        data, mtime, error = _load_cache_file(candidate)
        if error:
            load_errors.append((candidate, error))
            _mark_cache_invalid(candidate, mtime)
            print(f"[WARN] Cache inválido en {candidate}: {error}")
            continue

        selected_cache_file = candidate
        selected_mtime = mtime
        selected_data = data
        _clear_invalid_cache_mark(candidate)
        break

    if selected_data is None:
        # Fallback to empty cache if all files invalid
        empty_data = _empty_courses_payload()
        # ... initialize with empty data
    else:
        bootstrap_data = _build_bootstrap_payload(selected_data)
        full_bytes = _payload_to_bytes(selected_data)
        bootstrap_bytes = _payload_to_bytes(bootstrap_data)
        
        with cache_lock:
            courses_cache = selected_data
            bootstrap_cache = bootstrap_data
            full_cache_json_bytes = full_bytes
            full_cache_json_gzip_bytes = _gzip_payload(full_bytes)
            bootstrap_cache_json_bytes = bootstrap_bytes
            bootstrap_cache_json_gzip_bytes = _gzip_payload(bootstrap_bytes)

Cache Preference Order

From server.py:261-265:
def _get_cache_preference_order():
    """Return candidate cache files in priority order.

    The data-path copy wins unless PLATZI_PREFER_DATA_CACHE is set to an
    explicitly falsy value (anything outside 1/true/yes/on).
    """
    raw_flag = os.environ.get("PLATZI_PREFER_DATA_CACHE", "1")
    prefer_data = str(raw_flag).strip().lower() in {"1", "true", "yes", "on"}
    order = [DATA_CACHE_FILE, VIEWER_CACHE_FILE]
    if not prefer_data:
        order.reverse()
    return order
  • DATA_CACHE_FILE: $PLATZI_DATA_PATH/courses_cache.json
  • VIEWER_CACHE_FILE: $PLATZI_VIEWER_PATH/courses_cache.json
By default, the data path cache is preferred. This allows separating code (viewer) from data (courses) for easier deployment and updates.

Auto-Reload on Change

From server.py:692-727:
def refresh_cache_if_changed():
    """Reload the cache when courses_cache.json changed on disk.

    Cheap path/mtime snapshots are taken under cache_lock; an actual
    reload is serialized through cache_reload_lock with a re-check inside,
    so concurrent requests trigger at most one init_cache() per change.
    """
    current_cache_file = resolve_cache_file_path()

    # File missing: keep serving whatever is already in memory.
    if not os.path.exists(current_cache_file):
        return

    try:
        current_mtime = os.path.getmtime(current_cache_file)
    except OSError:
        # Raced with deletion/rotation; try again on the next request.
        return

    # Snapshot current cache identity without holding the lock for long.
    with cache_lock:
        previous_mtime = cache_mtime
        previous_cache_file = cache_file_path

    # The preferred cache source itself changed (different file path).
    if previous_cache_file != current_cache_file:
        with cache_reload_lock:
            with cache_lock:
                # Re-check: another thread may have reloaded already.
                if cache_file_path != current_cache_file:
                    print("[INFO] Cambio de origen de caché detectado, recargando...")
                    init_cache()
        return

    # Fast path: same file, unchanged since the last load.
    if previous_mtime is not None and current_mtime <= previous_mtime:
        return

    with cache_reload_lock:
        with cache_lock:
            latest_mtime = cache_mtime

        # Re-check under the reload lock: a concurrent reload may have
        # already picked up this mtime while we waited.
        if latest_mtime is not None and current_mtime <= latest_mtime:
            return

        print("[INFO] Detectado cambio en courses_cache.json, recargando caché...")
        init_cache()
Every API request calls refresh_cache_if_changed() to detect file updates. This enables hot-reloading when you rebuild the cache with rebuild_cache_drive.py.

Bootstrap Data Construction

From server.py:420-441:
def _build_bootstrap_payload(data):
    """Build the lightweight bootstrap payload from the full cache dict:
    category metadata with slimmed-down routes, plus normalized stats.

    Falls back to an empty payload when *data* is not a dict.
    """
    raw = data if isinstance(data, dict) else _empty_courses_payload()

    categories = []
    for category in raw.get("categories", []):
        cat = category or {}
        categories.append({
            "id": cat.get("id", ""),
            "name": cat.get("name", ""),
            "icon": cat.get("icon", ""),
            "description": cat.get("description", ""),
            "type": cat.get("type", ""),
            "routes": [_build_bootstrap_route(route) for route in cat.get("routes", [])],
        })

    return {
        "categories": categories,
        "stats": _normalize_stats(raw.get("stats")),
    }
Bootstrap payload differs from full cache:
  • Full: Includes classes array with all class details
  • Bootstrap: classes field is a number (count) instead of array
This reduces initial payload size by ~70-80%.

Threading Model

Thread Safety

From server.py:42:
cache_lock = threading.Lock()
cache_reload_lock = threading.Lock()
All cache reads/writes are protected:
with cache_lock:
    data = courses_cache  # Thread-safe read

Per-Thread Drive Service

From drive_service.py:97-100:
def get_service(self):
    """Return this thread's Drive v3 client, building it lazily on first use.

    One instance is cached per thread (self._thread_local — presumably a
    threading.local; confirm in the class initializer) so threads never
    share a service object.
    """
    if not hasattr(self._thread_local, "service"):
        self._thread_local.service = build("drive", "v3", credentials=self.creds, cache_discovery=False)
    return self._thread_local.service
Each thread gets its own drive.v3 service instance to avoid conflicts.

Shared Authenticated Session

From drive_service.py:102-110:
def _get_session(self):
    """Return the shared AuthorizedSession, creating it once on first use.

    Check/lock/re-check ensures only one thread builds the session; the
    final attribute assignment publishes a fully initialized object, so
    later unlocked reads see a complete session.
    """
    # Shared session avoids cold-start latency on each per-request thread.
    if self._shared_session is None:
        with self._shared_session_lock:
            if self._shared_session is None:
                session = AuthorizedSession(self.creds)
                # Headers are set once here, before the session is
                # published; presumably identity encoding keeps Range
                # offsets aligned with raw file bytes — confirm.
                session.headers.update({"Accept-Encoding": "identity"})
                self._shared_session = session
    return self._shared_session
requests.Session is mostly thread-safe for reads, but modifying headers isn’t. The headers are set exactly once — under the lock, when the session is first created — and the session is then reused unchanged across threads.

Error Handling

Client Disconnect Detection

From server.py:795-807:
def _is_client_disconnect_error(self, error):
    if isinstance(error, (BrokenPipeError, ConnectionResetError, ConnectionAbortedError)):
        return True

    winerror = getattr(error, "winerror", None)
    if winerror in {10053, 10054}:  # Windows connection errors
        return True

    err_no = getattr(error, "errno", None)
    if err_no in {errno.EPIPE, errno.ECONNRESET, errno.ECONNABORTED}:
        return True

    return False
Streaming errors caused by client disconnect (user closes tab, seeks video) are silently ignored to avoid log spam.

Safe Error Responses

From server.py:809-814:
def _safe_send_error(self, code, message):
    """Send an HTTP error response, swallowing OSErrors caused by a client
    that already disconnected; any other OSError is re-raised."""
    try:
        self.send_error(code, message)
    except OSError as error:
        if not self._is_client_disconnect_error(error):
            raise

Environment Configuration

From server.py:20-29:
# All paths and ports are resolved from the environment with defaults
# relative to this module's directory.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))  # directory of this module
VIEWER_PATH = os.environ.get("PLATZI_VIEWER_PATH", BASE_DIR)  # static frontend root
DATA_PATH = os.environ.get("PLATZI_DATA_PATH", VIEWER_PATH)  # mutable data root (falls back to viewer)
PORT = int(os.environ.get("PORT", "8080"))
BIND_HOST = os.environ.get("HOST", "127.0.0.1")  # loopback-only by default
DISPLAY_HOST = os.environ.get("PUBLIC_HOST", BIND_HOST)  # presumably the host shown to users — confirm
PROGRESS_FILE = os.path.join(DATA_PATH, "progress.json")
VIEWER_CACHE_FILE = os.path.join(VIEWER_PATH, "courses_cache.json")
DATA_CACHE_FILE = os.path.join(DATA_PATH, "courses_cache.json")
MAX_PROGRESS_BYTES = int(os.environ.get("MAX_PROGRESS_BYTES", "2097152"))  # 2MB cap on progress payloads

Next Steps

Google Drive Integration

Learn about Drive API authentication and file streaming

Frontend Architecture

Understand client-side routing and state management

Build docs developers (and LLMs) love