Overview
FastAPI backend that serves videos, provides predictions, and handles sorting videos into folders. Includes endpoints for listing videos, managing folders, sorting videos, and triggering model retraining.
Location: source/server.py
Server: FastAPI with Uvicorn
Default Port: 8000
Configuration Constants
DATA_DIR: Directory containing videos and category subfolders.
Directory containing predictions and trained models.
Path to the frontend HTML file.
Pydantic Models
SortRequest
Request body schema for sorting a video into a folder.
- Video filename (must match the pattern \d+\.mp4, e.g., 7234567890123456.mp4)
- Target folder name (must exist as a subfolder of DATA_DIR)
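The filename rule can be expressed as a full-match regular expression. This is a minimal standard-library sketch; the helper name is_valid_video_filename is hypothetical, and in the real server the check presumably lives in SortRequest or the endpoint.

```python
import re

# Hypothetical helper; the pattern \d+\.mp4 comes from the SortRequest docs.
# re.ASCII restricts \d to 0-9 (otherwise Unicode digits would also match).
VIDEO_NAME_RE = re.compile(r"\d+\.mp4", re.ASCII)


def is_valid_video_filename(name: str) -> bool:
    # fullmatch rejects prefixes/suffixes such as "../1.mp4" or "1.mp4.bak"
    return VIDEO_NAME_RE.fullmatch(name) is not None
```

Using fullmatch rather than match means a name like "../1.mp4" or "1.mp4\n" cannot slip through.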
Functions
get_folders()
Returns the list of category folders with their video counts.
Returns: list[dict] - one entry per category folder, giving the folder name and the number of videos it contains.
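A minimal sketch of this behavior, assuming a "video" is any .mp4 file directly inside a category folder. The dictionary keys "name" and "count" are hypothetical, and the directory is passed in as a parameter for testability, whereas the documented get_folders() reads DATA_DIR.

```python
from pathlib import Path


def get_folders(data_dir: Path) -> list[dict]:
    """Sketch: one entry per category subfolder of data_dir with its .mp4 count.

    Keys "name" and "count" are hypothetical; the real server's keys may differ.
    """
    folders = []
    for entry in sorted(data_dir.iterdir()):
        if entry.is_dir():
            # Count only regular .mp4 files directly inside the folder
            count = sum(1 for f in entry.glob("*.mp4") if f.is_file())
            folders.append({"name": entry.name, "count": count})
    return folders
```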
load_predictions()
Startup event handler that loads predictions from artifacts/predictions.json into memory.
Populates the global predictions dictionary mapping filename → prediction data.
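A sketch of the loading step, assuming predictions.json holds a JSON object keyed by video filename. Returning an empty dict when the file is missing is also an assumption (it lets the server start before any model has been trained). The real handler assigns the result to the global predictions dictionary; here it is returned so it can be tested.

```python
import json
from pathlib import Path


def load_predictions(path: Path) -> dict:
    """Sketch: read predictions.json into a dict of filename -> prediction data.

    Assumes the file is a JSON object keyed by filename; a missing file
    yields an empty dict (assumed behavior).
    """
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```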
_run_retrain()
Background function that runs the full ML pipeline in sequence:
- extract_features.py - Extract features from all videos
- train.py - Train the classifier on labeled data
- predict.py - Generate predictions for unsorted videos
Updates the retrain_status global state with the outcome.
Timeout: 900 seconds (15 minutes) per script
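The sequence above can be sketched with subprocess.run and its timeout argument. Only the script order, the 900-second limit, and the "success" / "Failed at [script]: [error]" result format come from this page. Using stderr (or the exit code) as the error text, the run_retrain name, and the status dict passed as a parameter are assumptions. Reloading predictions afterwards is omitted.

```python
import subprocess
import sys

PIPELINE = ["extract_features.py", "train.py", "predict.py"]
TIMEOUT_SECONDS = 900  # 15 minutes per script, as documented


def run_retrain(status: dict, scripts=PIPELINE, timeout: float = TIMEOUT_SECONDS) -> None:
    """Sketch of _run_retrain: run each script in order, stop at the first failure."""
    status["running"] = True
    try:
        for script in scripts:
            try:
                result = subprocess.run(
                    [sys.executable, script],
                    capture_output=True, text=True, timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                status["last_result"] = f"Failed at {script}: timed out after {timeout}s"
                return
            if result.returncode != 0:
                # Assumed error text: stderr if present, else the exit code
                error = result.stderr.strip() or f"exit code {result.returncode}"
                status["last_result"] = f"Failed at {script}: {error}"
                return
        status["last_result"] = "success"
    finally:
        # Always clear the flag, even on failure or timeout
        status["running"] = False
```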
API Endpoints
GET /
Serves the main HTML interface. Response: HTML file (index.html)
GET /api/videos
Lists all unsorted videos (those in the root directory) with their predictions.
Response:
- Array of unsorted videos with prediction data; each entry contains:
  - Video filename
  - Predicted category folder, or null if no prediction is available
  - Prediction confidence score (0-1)
  - Array of top predictions with folder names and confidence scores
- Total number of unsorted videos
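A sketch of how this response could be assembled from the directory listing and the in-memory predictions. All key names here ("videos", "total", "filename", "predicted_folder", "confidence", "top_predictions", and the prediction-data keys "folder", "confidence", "top") are hypothetical. Limiting the listing to names matching \d+\.mp4 is also an assumption.

```python
import re
from pathlib import Path

VIDEO_NAME_RE = re.compile(r"\d+\.mp4", re.ASCII)


def list_unsorted(data_dir: Path, predictions: dict) -> dict:
    """Sketch of GET /api/videos: files in the root of data_dir plus predictions.

    All key names are hypothetical placeholders.
    """
    videos = []
    for path in sorted(data_dir.iterdir()):
        if path.is_file() and VIDEO_NAME_RE.fullmatch(path.name):
            pred = predictions.get(path.name, {})
            videos.append({
                "filename": path.name,
                "predicted_folder": pred.get("folder"),  # None serializes as JSON null
                "confidence": pred.get("confidence"),
                "top_predictions": pred.get("top", []),
            })
    return {"videos": videos, "total": len(videos)}
```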
GET /api/folders
Lists all category folders with video counts.
Response: Array of folder objects, each containing:
- Folder name
- Number of videos in this folder
POST /api/sort
Moves a video from the root directory to a category folder.
Request Body: SortRequest
Response:
- Whether the sort operation succeeded
- Filename that was sorted
- Folder the video was moved to
- Updated list of all folders with new counts
Errors:
- 400 Bad Request - Invalid filename format or folder name
- 404 Not Found - Video file not found (may already be sorted)
- 409 Conflict - File already exists in target folder
Validation:
- Validates that the filename matches the pattern \d+\.mp4
- Prevents path traversal (rejects folder names containing .. or /)
- Checks that the destination folder exists and is a directory
- Validates that the source file exists and is a file (i.e., has not already been moved)
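The checks above can be sketched in order, mapping each failure to its documented status code. SortError and sort_video are hypothetical names; in FastAPI the endpoint would typically raise fastapi.HTTPException(status_code=..., detail=...) instead. Returning 400 for a nonexistent destination folder is an assumption based on "invalid folder name". The updated folder list in the real response is omitted; a caller could append get_folders().

```python
import re
import shutil
from pathlib import Path

VIDEO_NAME_RE = re.compile(r"\d+\.mp4", re.ASCII)


class SortError(Exception):
    """Hypothetical error carrying the HTTP status code the endpoint would return."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def sort_video(data_dir: Path, filename: str, folder: str) -> dict:
    # 400: filename must be exactly \d+\.mp4
    if not VIDEO_NAME_RE.fullmatch(filename):
        raise SortError(400, "Invalid filename")
    # 400: reject empty names and path traversal in the folder name
    if not folder or ".." in folder or "/" in folder:
        raise SortError(400, "Invalid folder name")
    dest_dir = data_dir / folder
    if not dest_dir.is_dir():
        raise SortError(400, "Folder does not exist")  # status code assumed
    src = data_dir / filename
    # 404: source missing, e.g. already sorted
    if not src.is_file():
        raise SortError(404, "Video not found (may already be sorted)")
    dest = dest_dir / filename
    # 409: never overwrite an existing file
    if dest.exists():
        raise SortError(409, "File already exists in target folder")
    shutil.move(str(src), str(dest))
    return {"success": True, "filename": filename, "folder": folder}
```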
POST /api/retrain
Triggers a full model retraining pipeline in the background.
Request Body: None
Response: Either "started" if retraining began, or "already_running" if a retrain is already in progress.
Behavior:
- Checks whether retraining is already in progress
- Launches a background thread to run the pipeline
- Returns immediately (doesn't wait for completion)
- Pipeline runs: extract_features.py → train.py → predict.py
- Reloads predictions into memory when complete
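The "only one retrain at a time" guarantee needs the check and the flag update to happen atomically, or two near-simultaneous requests could both start a run. The class below is a hypothetical sketch using threading.Lock and a daemon thread. The job callable stands in for the three-script pipeline, and the generic "Failed: ..." fallback message is an assumption.

```python
import threading
from typing import Callable


class RetrainManager:
    """Hypothetical sketch of POST /api/retrain: at most one background run at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.status = {"running": False, "last_result": None}

    def trigger(self, job: Callable[[], str]) -> str:
        # Check-and-set under the lock so concurrent requests can't both start a run
        with self._lock:
            if self.status["running"]:
                return "already_running"
            self.status["running"] = True
        threading.Thread(target=self._run, args=(job,), daemon=True).start()
        return "started"  # returns immediately; the job keeps running

    def _run(self, job: Callable[[], str]) -> None:
        try:
            result = job()  # e.g. "success" or "Failed at <script>: <error>"
        except Exception as exc:
            result = f"Failed: {exc}"  # hypothetical fallback message
        with self._lock:
            self.status["last_result"] = result
            self.status["running"] = False
```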
GET /api/retrain/status
Checks the status of the retraining pipeline.
Response:
- running - Whether retraining is currently in progress
- last_result - Result of the last retrain attempt:
  - "success" - Completed successfully
  - "Failed at [script]: [error]" - Failed with an error message
  - null - No retrain has completed yet
Clients can poll this endpoint while running is true to show progress.
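A polling client could look like the sketch below, using only the standard library. The URL assumes the default host and port. Treating the response as a JSON object with running and last_result keys follows the fields listed above. The names fetch_status and wait_for_retrain are hypothetical. The default max_wait of 2700 s is three scripts times the 900 s per-script timeout.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "http://localhost:8000/api/retrain/status"  # assumes default host/port


def fetch_status(url: str = STATUS_URL) -> dict:
    # Hypothetical helper: GET the status endpoint and decode the JSON body
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_retrain(fetch: Callable[[], dict] = fetch_status,
                     interval: float = 5.0, max_wait: float = 900 * 3) -> dict:
    """Poll until running is false, then return the final status."""
    deadline = time.monotonic() + max_wait
    while True:
        status = fetch()
        if not status.get("running"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("retrain still running")
        time.sleep(interval)
```

Passing the fetch function in keeps the polling loop testable without a running server.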
GET /videos/
Serves a video file for playback.
Path Parameters: Video filename (must match the pattern \d+\.mp4)
Content-Type: video/mp4
File Lookup:
- First checks root directory (unsorted videos)
- If not found, searches all category subfolders
- Returns 404 if not found anywhere
Errors:
- 400 Bad Request - Invalid filename format
- 404 Not Found - Video file not found
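The lookup order above can be sketched as follows, assuming the filename has already passed the \d+\.mp4 check (the 400 case). Searching subfolders in alphabetical order is an assumption, and find_video is a hypothetical name. In FastAPI, a found path would typically be returned with fastapi.responses.FileResponse(path, media_type="video/mp4").

```python
from pathlib import Path
from typing import Optional


def find_video(data_dir: Path, filename: str) -> Optional[Path]:
    """Sketch of the documented lookup: root first, then each category subfolder."""
    candidate = data_dir / filename
    if candidate.is_file():
        return candidate  # unsorted video in the root directory
    for folder in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        candidate = folder / filename
        if candidate.is_file():
            return candidate
    return None  # the endpoint turns this into 404 Not Found
```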
Running the Server
Development
Serves on http://0.0.0.0:8000
Production
With Auto-Reload
State Management
The server maintains two global state variables:
predictions
Dictionary mapping filename → prediction data, loaded from artifacts/predictions.json at startup and after retraining.
retrain_status
- running: Boolean indicating whether a retrain is in progress
- last_result: String with the success/failure message from the last retrain
Dependencies
Required Python packages:
- fastapi - Web framework
- uvicorn - ASGI server
- pydantic - Request/response validation
Pipeline scripts run during retraining:
- extract_features.py
- train.py
- predict.py
Architecture Notes
Thread Safety: The retraining pipeline runs in a daemon thread to avoid blocking API requests. Only one retrain can run at a time.
File System: Videos are moved with shutil.move(), which performs an atomic rename when source and destination are on the same filesystem and falls back to copy-then-delete otherwise. Path traversal is prevented through validation.
Caching: Predictions are cached in memory and only reloaded after successful retraining.
Error Handling: Invalid filenames, missing files, and folder conflicts return appropriate HTTP error codes.