server.py

Overview

FastAPI backend that serves videos, provides predictions, and handles sorting videos into folders. Includes endpoints for listing videos, managing folders, sorting videos, and triggering model retraining.

Location: source/server.py
Server: FastAPI with Uvicorn
Default Port: 8000

Configuration Constants

DATA_DIR (Path, default: "data/Favorites/videos") - Directory containing videos and category subfolders.

ARTIFACTS_DIR (Path, default: "artifacts") - Directory containing predictions and trained models.

INDEX_HTML (Path, default: "index.html") - Path to the frontend HTML file.

Pydantic Models

SortRequest

Request body schema for sorting a video into a folder.

class SortRequest(BaseModel):
    filename: str
    folder: str

filename (string, required) - Video filename (must match pattern \d+\.mp4, e.g., 7234567890123456.mp4)

folder (string, required) - Target folder name (must exist as a subfolder in DATA_DIR)

Example:

{
  "filename": "7234567890123456.mp4",
  "folder": "soccer"
}

Functions

get_folders()

Returns the list of category folders with their video counts.

Returns: list[dict] - List of folders with structure:

[
  {"name": "soccer", "count": 82},
  {"name": "food", "count": 55},
  {"name": "funny", "count": 5}
]
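The folder listing described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: the data_dir parameter is a hypothetical addition for testability, and counting only .mp4 files and sorting folders by name are both assumptions (the source does not state the ordering).

```python
from pathlib import Path

# Default taken from the Configuration Constants section.
DATA_DIR = Path("data/Favorites/videos")


def get_folders(data_dir: Path = DATA_DIR) -> list[dict]:
    """Return one {"name", "count"} dict per category subfolder."""
    if not data_dir.is_dir():
        return []
    folders = []
    # Assumption: folders are returned sorted by name.
    for entry in sorted(data_dir.iterdir()):
        if entry.is_dir():
            # Assumption: only .mp4 files directly inside the folder are counted.
            count = sum(
                1 for f in entry.iterdir() if f.is_file() and f.suffix == ".mp4"
            )
            folders.append({"name": entry.name, "count": count})
    return folders
```

Files in the root of data_dir are ignored here, since they are the unsorted videos rather than members of any category.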

load_predictions()

Startup event handler that loads predictions from artifacts/predictions.json into memory. Populates the global predictions dictionary mapping filename → prediction data.
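A loader along these lines would fill the cache. This is a sketch under assumptions: predictions.json is taken to be a JSON object keyed by filename, and the artifacts_dir parameter and the integer return value are hypothetical conveniences, not part of the documented function.

```python
import json
from pathlib import Path

# Default taken from the Configuration Constants section.
ARTIFACTS_DIR = Path("artifacts")

# In-memory cache: filename -> prediction data.
predictions: dict[str, dict] = {}


def load_predictions(artifacts_dir: Path = ARTIFACTS_DIR) -> int:
    """Replace the in-memory cache with the contents of predictions.json."""
    path = artifacts_dir / "predictions.json"
    predictions.clear()
    if not path.is_file():
        return 0  # nothing has been predicted yet
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the file holds a JSON object keyed by video filename.
    predictions.update(data)
    return len(predictions)
```

Mutating the existing dictionary (clear, then update) rather than rebinding the name keeps any other reference to predictions valid after a reload.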

_run_retrain()

Background function that runs the full ML pipeline in sequence:

extract_features.py - Extract features from all videos
train.py - Train classifier on labeled data
predict.py - Generate predictions for unsorted videos

Runs in a separate thread to avoid blocking API requests and updates the retrain_status global state.

Timeout: 900 seconds (15 minutes) per script

API Endpoints

GET /

Serves the main HTML interface.

Response: HTML file (index.html)

curl http://localhost:8000/

GET /api/videos

Lists all unsorted videos (those in the root directory) with their predictions.

Response:

videos (array, required) - Array of unsorted videos with prediction data

total (integer, required) - Total number of unsorted videos

Video Object Schema:

filename (string, required) - Video filename

predicted_folder (string | null, required) - Predicted category folder, or null if no prediction available

confidence (float, required) - Prediction confidence score (0-1)

top_predictions (array, required) - Array of top predictions with folder names and confidence scores

Example Response:

{
  "videos": [
    {
      "filename": "7234567890123456.mp4",
      "predicted_folder": "soccer",
      "confidence": 0.87,
      "top_predictions": [
        {"folder": "soccer", "confidence": 0.87},
        {"folder": "food", "confidence": 0.10}
      ]
    },
    {
      "filename": "7234567890123457.mp4",
      "predicted_folder": null,
      "confidence": 0,
      "top_predictions": []
    }
  ],
  "total": 2
}

cURL Example:

curl http://localhost:8000/api/videos
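The response shape for this endpoint, including the null/0/[] defaults for videos without a prediction, could be assembled roughly like this. The function name list_unsorted is hypothetical, and two things are assumptions: that cached prediction entries use the same keys as the response, and that only files matching \d+\.mp4 are listed.

```python
import re
from pathlib import Path

FILENAME_RE = re.compile(r"\d+\.mp4")


def list_unsorted(data_dir: Path, predictions: dict[str, dict]) -> dict:
    """Build the /api/videos payload from root-level files and cached predictions."""
    videos = []
    for f in sorted(data_dir.iterdir()):
        # Skip subfolders (already sorted) and anything that is not a video file.
        if not (f.is_file() and FILENAME_RE.fullmatch(f.name)):
            continue
        pred = predictions.get(f.name, {})
        videos.append(
            {
                "filename": f.name,
                "predicted_folder": pred.get("predicted_folder"),  # None -> null
                "confidence": pred.get("confidence", 0),
                "top_predictions": pred.get("top_predictions", []),
            }
        )
    return {"videos": videos, "total": len(videos)}
```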

GET /api/folders

Lists all category folders with video counts.

Response:

folders (array, required) - Array of folder objects

Folder Object Schema:

name (string, required) - Folder name

count (integer, required) - Number of videos in this folder

Example Response:

{
  "folders": [
    {"name": "funny", "count": 5},
    {"name": "food", "count": 55},
    {"name": "soccer", "count": 82}
  ]
}

cURL Example:

curl http://localhost:8000/api/folders

POST /api/sort

Moves a video from the root directory to a category folder.

Request Body: SortRequest

Response:

success (boolean, required) - Whether the sort operation succeeded

filename (string, required) - Filename that was sorted

folder (string, required) - Folder the video was moved to

folders (array, required) - Updated list of all folders with new counts

Example Response:

{
  "success": true,
  "filename": "7234567890123456.mp4",
  "folder": "soccer",
  "folders": [
    {"name": "funny", "count": 5},
    {"name": "food", "count": 55},
    {"name": "soccer", "count": 83}
  ]
}

Error Responses:

400 Bad Request - Invalid filename format or folder name
404 Not Found - Video file not found (may already be sorted)
409 Conflict - File already exists in target folder

cURL Example:

curl -X POST http://localhost:8000/api/sort \
  -H "Content-Type: application/json" \
  -d '{"filename": "7234567890123456.mp4", "folder": "soccer"}'

Security:

Validates filename matches pattern \d+\.mp4
Prevents path traversal (rejects folders containing .. or /)
Checks that destination folder exists and is a directory
Validates source file exists and is a file (not already moved)
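The checks listed above, together with the documented status codes, can be sketched as a framework-free helper. SortError and sort_video are hypothetical names; in a FastAPI handler the error would typically be raised as an HTTPException instead. Returning 400 for a nonexistent folder, rejecting an empty folder name, and using a full-string regex match are assumptions.

```python
import re
import shutil
from pathlib import Path

FILENAME_RE = re.compile(r"\d+\.mp4")


class SortError(Exception):
    """Carries the HTTP status code a handler would return."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def sort_video(data_dir: Path, filename: str, folder: str) -> Path:
    """Validate the request, then move the video into the category folder."""
    # fullmatch rejects anything before or after the digits-plus-.mp4 pattern.
    if not FILENAME_RE.fullmatch(filename):
        raise SortError(400, "Invalid filename format")
    # Path traversal guard: no parent references or path separators.
    if not folder or ".." in folder or "/" in folder:
        raise SortError(400, "Invalid folder name")
    dest_dir = data_dir / folder
    if not dest_dir.is_dir():
        raise SortError(400, "Folder does not exist")  # assumed status code
    src = data_dir / filename
    if not src.is_file():
        raise SortError(404, "Video file not found (may already be sorted)")
    dest = dest_dir / filename
    if dest.exists():
        raise SortError(409, "File already exists in target folder")
    shutil.move(str(src), str(dest))
    return dest
```

Validating before touching the filesystem means a rejected request never leaves a partially moved file behind.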

POST /api/retrain

Triggers the full model retraining pipeline in the background.

Request Body: None

Response:

status (string, required) - Either "started" if retraining began, or "already_running" if a retrain is already in progress

Example Response:

{
  "status": "started"
}

cURL Example:

curl -X POST http://localhost:8000/api/retrain

Process:

Checks if retraining is already in progress
Launches background thread to run pipeline
Returns immediately (doesn’t wait for completion)
Pipeline runs: extract_features.py → train.py → predict.py
Reloads predictions into memory when complete
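The "check, launch, return immediately" steps above amount to a single-flight guard around a background thread. The sketch below shows one way to write it; start_retrain and the job callable are hypothetical, and the use of a lock around the check is an assumption (the source only states that one retrain runs at a time). Here job stands in for the pipeline and returns the result string.

```python
import threading

retrain_status = {"running": False, "last_result": None}
_retrain_lock = threading.Lock()


def start_retrain(job) -> dict:
    """Start job in a daemon thread unless a retrain is already running."""
    # The lock makes the check-and-set atomic, so two simultaneous requests
    # cannot both observe running == False.
    with _retrain_lock:
        if retrain_status["running"]:
            return {"status": "already_running"}
        retrain_status["running"] = True

    def worker():
        try:
            retrain_status["last_result"] = job()
        except Exception as exc:  # record unexpected errors instead of losing them
            retrain_status["last_result"] = f"Failed: {exc}"
        finally:
            retrain_status["running"] = False

    # Daemon thread: does not keep the process alive on shutdown.
    threading.Thread(target=worker, daemon=True).start()
    return {"status": "started"}
```

The caller gets its response as soon as the thread is started, and the outcome becomes visible later through retrain_status.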

GET /api/retrain/status

Checks the status of the retraining pipeline.

Response:

running (boolean, required) - Whether retraining is currently in progress

last_result (string | null, required) - Result of the last retrain attempt:

"success" - Completed successfully
"Failed at [script]: [error]" - Failed with error message
null - No retrain has completed yet

Example Response:

{
  "running": false,
  "last_result": "success"
}

cURL Example:

curl http://localhost:8000/api/retrain/status

Polling Pattern: The frontend can poll this endpoint every few seconds for as long as running is true, then read last_result once it becomes false.
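The polling pattern can be illustrated with a small Python client using only the standard library. This is a sketch: fetch_status and wait_for_retrain are hypothetical names, and the 3-second interval and 400-poll cap are arbitrary choices rather than values from the server.

```python
import json
import time
import urllib.request

STATUS_URL = "http://localhost:8000/api/retrain/status"


def fetch_status(url: str = STATUS_URL) -> dict:
    """GET the status endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_retrain(fetch=fetch_status, interval: float = 3.0, max_polls: int = 400):
    """Poll until running is false, then return last_result."""
    for _ in range(max_polls):
        status = fetch()
        if not status["running"]:
            return status["last_result"]
        time.sleep(interval)
    raise TimeoutError("retrain still running after max_polls checks")
```

Passing fetch as a parameter keeps the loop testable without a running server; a typical call after POST /api/retrain would simply be wait_for_retrain().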

GET /videos/

Serves a video file for playback.

Path Parameters:

filename (string, required) - Video filename (must match pattern \d+\.mp4)

Response: Video file with Content-Type: video/mp4

File Lookup:

First checks root directory (unsorted videos)
If not found, searches all category subfolders
Returns 404 if not found anywhere

Example:

curl http://localhost:8000/videos/7234567890123456.mp4 --output video.mp4

Browser Usage:

<video src="http://localhost:8000/videos/7234567890123456.mp4" controls></video>

Error Responses:

400 Bad Request - Invalid filename format
404 Not Found - Video file not found
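The lookup order described under File Lookup (root first, then each category subfolder) can be sketched as a plain function. find_video is a hypothetical name; a ValueError here corresponds to the 400 response and a None result to the 404. Iterating subfolders in name order is an assumption.

```python
import re
from pathlib import Path
from typing import Optional

FILENAME_RE = re.compile(r"\d+\.mp4")


def find_video(data_dir: Path, filename: str) -> Optional[Path]:
    """Locate a video in the root directory or any category subfolder."""
    if not FILENAME_RE.fullmatch(filename):
        raise ValueError("Invalid filename format")  # handler would return 400
    # 1. Unsorted videos live directly in the root directory.
    candidate = data_dir / filename
    if candidate.is_file():
        return candidate
    # 2. Otherwise look inside each category subfolder.
    for sub in sorted(data_dir.iterdir()):
        if sub.is_dir() and (sub / filename).is_file():
            return sub / filename
    # 3. Not found anywhere: handler would return 404.
    return None
```

Because the filename is validated before any path is built, a value such as ../secret.mp4 is rejected without touching the filesystem.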

Running the Server

Development

python server.py

Starts Uvicorn server on http://0.0.0.0:8000

Production

uvicorn server:app --host 0.0.0.0 --port 8000 --workers 4

With Auto-Reload

uvicorn server:app --reload

State Management

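The server maintains two global state variables. Both live in process memory, so when the server runs with multiple Uvicorn workers, each worker process holds its own copy: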

predictions

predictions: dict[str, dict] = {}

Dictionary mapping video filenames to prediction data. Loaded from artifacts/predictions.json at startup and after retraining.

retrain_status

retrain_status: dict = {
    "running": False,
    "last_result": None
}

Tracks retraining pipeline status:

running: Boolean indicating if retrain is in progress
last_result: String with success/failure message from last retrain

Dependencies

Required Python packages:

fastapi - Web framework
uvicorn - ASGI server
pydantic - Request/response validation

The server also depends on the ML pipeline scripts:

extract_features.py
train.py
predict.py

Architecture Notes

Thread Safety: The retraining pipeline runs in a daemon thread to avoid blocking API requests. Only one retrain can run at a time.

File System: All file moves use shutil.move(), which performs an atomic rename only when source and destination are on the same filesystem; across filesystems it falls back to copying and then deleting the source. Path traversal is prevented through validation.

Caching: Predictions are cached in memory and only reloaded after successful retraining.

Error Handling: Invalid filenames, missing files, and folder conflicts return appropriate HTTP error codes.
