Installation
Run KaggleIngest on your own infrastructure for development, testing, or self-hosting.
Prerequisites
Before you begin, ensure you have:
- Python 3.11+ with the uv package manager (recommended) or pip
- Node.js 18+ and npm for the frontend
- PostgreSQL 14+ running locally or accessible remotely
- Kaggle API credentials (username and API key from kaggle.com/settings)
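The tool prerequisites above can be checked with a short script before you start. This is a minimal sketch, not part of KaggleIngest: the function names are hypothetical, and PostgreSQL is deliberately not checked here because it may run on a remote host.

```python
import shutil
import subprocess
import sys


def parse_major(version_text: str) -> int:
    """Return the major version from strings such as 'v18.17.0' or '14.9'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def check_prerequisites() -> dict[str, bool]:
    """Report which local prerequisites appear to be available."""
    results = {
        "python>=3.11": sys.version_info >= (3, 11),
        # Either uv or pip is enough to install the backend dependencies.
        "uv or pip": bool(shutil.which("uv") or shutil.which("pip")),
        "npm": shutil.which("npm") is not None,
        "node>=18": False,
    }
    node = shutil.which("node")
    if node:
        # `node --version` prints something like "v18.17.0".
        out = subprocess.run([node, "--version"], capture_output=True, text=True)
        try:
            results["node>=18"] = out.returncode == 0 and parse_major(out.stdout) >= 18
        except ValueError:
            # Unexpected version output; leave the check as failed.
            pass
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Run it with the same Python interpreter you plan to use for the backend, since the version check applies to the interpreter executing the script.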
Architecture
KaggleIngest consists of three main components:
- Backend API: FastAPI application (backend/app.py)
- Frontend UI: React application with Vite
- PostgreSQL Database: Caching, search, and user management
Quick installation
Clone the repository and set up both backend and frontend:
Set up the backend
Install dependencies and configure environment variables. The backend will automatically create required tables on first run.
Using uv is recommended for faster dependency resolution. If you don’t have uv, install it with pip install uv or use pip install -r requirements.txt instead.
Configure environment
Copy the example environment file, then edit .env and add your credentials:
Get Kaggle credentials
- Go to kaggle.com/settings
- Scroll to the API section
- Click Create New API Token
- Extract username and key from the downloaded kaggle.json file
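The last step can be scripted. The sketch below reads kaggle.json and prints the two matching .env lines. The download path is an assumption, and `kaggle_env_lines` is a hypothetical helper rather than something KaggleIngest ships.

```python
import json
from pathlib import Path


def kaggle_env_lines(token_path: Path) -> list[str]:
    """Read kaggle.json and return the two matching .env lines."""
    token = json.loads(token_path.read_text(encoding="utf-8"))
    # kaggle.json stores the credentials under "username" and "key".
    return [
        f"KAGGLE_USERNAME={token['username']}",
        f"KAGGLE_KEY={token['key']}",
    ]


if __name__ == "__main__":
    # Assumed download location; adjust to wherever your browser saved the file.
    path = Path.home() / "Downloads" / "kaggle.json"
    if path.exists():
        print("\n".join(kaggle_env_lines(path)))
    else:
        print(f"kaggle.json not found at {path}")
```

Paste the printed lines into .env rather than committing kaggle.json to the repository.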
Set up PostgreSQL
Create a database for KaggleIngest:
Set up the frontend
Install frontend dependencies. The frontend is pre-configured to connect to http://localhost:8000 for the backend API.
Start the services
Verify installation
Test that everything is working correctly.
Health check
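A reachability check can be sketched with the standard library alone. This section does not name the dedicated health endpoint, so the example assumes the /api/metrics path from the configuration reference, which is only served when PROMETHEUS_ENABLED is true. Substitute the real health path if it differs. `probe` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Send a GET request and report whether the server answered with 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server is up but returned an error status (for example 404).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing is listening, or the connection timed out.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    # Assumed path; see the note above about PROMETHEUS_ENABLED.
    ok, detail = probe("http://localhost:8000/api/metrics")
    print("backend reachable" if ok else f"backend check failed ({detail})")
```

An "unreachable" result means nothing answered on port 8000, while an HTTP error status means the backend is running but the assumed path is wrong or disabled.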
Create a test account
Test the API
Production deployment
For production deployments, consider the following:
Environment configuration
Update your .env for production:
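Before deploying, it can help to flag settings that still look development-only. The sketch below uses variable names and defaults from the configuration reference. The checks themselves are illustrative assumptions, not rules that KaggleIngest enforces, and `production_warnings` is a hypothetical helper.

```python
import os


def production_warnings(env: dict[str, str]) -> list[str]:
    """Return warnings for settings that look development-only."""
    warnings = []
    if env.get("ENV", "development") != "production":
        warnings.append("ENV is not set to production")
    # CORS_ORIGINS is a comma-separated list; the default is the Vite dev server.
    origins = env.get("CORS_ORIGINS", "http://localhost:5173").split(",")
    if any("localhost" in o or "127.0.0.1" in o for o in origins):
        warnings.append("CORS_ORIGINS still allows a localhost origin")
    if env.get("LOG_LEVEL", "INFO").upper() == "DEBUG":
        warnings.append("LOG_LEVEL is DEBUG, which is noisy for production")
    return warnings


if __name__ == "__main__":
    for message in production_warnings(dict(os.environ)):
        print("warning:", message)
```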
Run with Gunicorn
For production workloads, use a production ASGI server:
Docker deployment
Create a Dockerfile for containerized deployment:
Frontend build
Build the frontend for production. The build output is written to frontend/dist/. Serve these files with Nginx, Vercel, or any static hosting provider.
Development tips
Code formatting
The backend uses black for formatting:
The frontend uses prettier:
Linting
Backend linting with mypy:
Database migrations
Schema changes should be applied carefully. The application uses raw SQL for schema management. Check backend/core/database.py for the initialization logic.
Configuration reference
Key configuration options from backend/config.py:
| Variable | Default | Description |
|---|---|---|
| ENV | development | Environment mode (development or production) |
| POSTGRES_URL | (required) | PostgreSQL connection string |
| KAGGLE_USERNAME | (required) | Your Kaggle username |
| KAGGLE_KEY | (required) | Your Kaggle API key |
| CORS_ORIGINS | http://localhost:5173 | Comma-separated allowed origins |
| RATE_LIMIT | 20 | Requests per minute per IP |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| MAX_CSV_FILES | 3 | Max CSV files to parse for dataset schemas |
| SENTRY_DSN | (optional) | Sentry error tracking DSN |
| PROMETHEUS_ENABLED | true | Enable Prometheus metrics at /api/metrics |
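To illustrate how the variables and defaults in the table fit together, here is a minimal loader sketch. It is not the contents of backend/config.py. The `Settings` class, `load_settings` function, and the rule that only the string "true" enables Prometheus are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("POSTGRES_URL", "KAGGLE_USERNAME", "KAGGLE_KEY")


@dataclass(frozen=True)
class Settings:
    postgres_url: str
    kaggle_username: str
    kaggle_key: str
    env: str = "development"
    cors_origins: tuple[str, ...] = ("http://localhost:5173",)
    rate_limit: int = 20
    log_level: str = "INFO"
    max_csv_files: int = 3
    sentry_dsn: Optional[str] = None
    prometheus_enabled: bool = True


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the table defaults."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    origins = environ.get("CORS_ORIGINS", "http://localhost:5173")
    return Settings(
        postgres_url=environ["POSTGRES_URL"],
        kaggle_username=environ["KAGGLE_USERNAME"],
        kaggle_key=environ["KAGGLE_KEY"],
        env=environ.get("ENV", "development"),
        # Split the comma-separated list and drop empty entries.
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        rate_limit=int(environ.get("RATE_LIMIT", "20")),
        log_level=environ.get("LOG_LEVEL", "INFO").upper(),
        max_csv_files=int(environ.get("MAX_CSV_FILES", "3")),
        sentry_dsn=environ.get("SENTRY_DSN") or None,
        # Assumption: only "true" (any case) enables the metrics endpoint.
        prometheus_enabled=environ.get("PROMETHEUS_ENABLED", "true").lower() == "true",
    )
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise in tests without touching the real environment.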
Troubleshooting
Backend won’t start
Error: CRITICAL: Missing required environment variables
Solution: Ensure KAGGLE_USERNAME, KAGGLE_KEY, and POSTGRES_URL are set in .env.
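To see which of the three variables is absent, you can scan the .env file directly. The sketch below assumes simple KEY=VALUE lines and is not the parser the backend uses. `parse_env_text` and `missing_keys` are hypothetical helpers.

```python
from pathlib import Path

REQUIRED_KEYS = ("KAGGLE_USERNAME", "KAGGLE_KEY", "POSTGRES_URL")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env_text(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text(encoding="utf-8") if env_file.exists() else ""
    print("missing:", ", ".join(missing_keys(text)) or "none")
```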
Database connection fails
Error: Failed to connect to database
Solution: Verify PostgreSQL is running and credentials are correct:
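One way to separate a network problem from a credentials problem is to test whether the host and port in POSTGRES_URL accept TCP connections at all. This sketch uses only the standard library and does not authenticate, so a successful result says nothing about the username or password. The helper names and the fallback to port 5432 (PostgreSQL's default) are choices made for this example.

```python
import os
import socket
from urllib.parse import urlsplit


def postgres_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from a PostgreSQL connection URL."""
    parts = urlsplit(url)
    # Fall back to localhost and PostgreSQL's default port when omitted.
    return parts.hostname or "localhost", parts.port or 5432


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = postgres_endpoint(os.environ.get("POSTGRES_URL", ""))
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

If the port is reachable but the backend still fails, the remaining suspects are the username, password, or database name in the connection string.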
Frontend can’t reach backend
Error: Network Error or CORS policy
Solution: Ensure:
- Backend is running on http://localhost:8000
- Frontend origin is listed in CORS_ORIGINS
- No firewall blocking port 8000
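The second check is easy to get wrong because browsers compare origins exactly: scheme, host, and port must all match, so http://127.0.0.1:5173 is not the same origin as http://localhost:5173. The sketch below tests a frontend origin against a comma-separated CORS_ORIGINS value. How the backend itself matches origins is not described here, so treat this as an approximation with hypothetical helper names.

```python
from urllib.parse import urlsplit


def normalize_origin(origin: str) -> str:
    """Reduce a URL to scheme://host[:port], lowercased, without a path."""
    parts = urlsplit(origin.strip())
    return f"{parts.scheme}://{parts.netloc}".lower()


def origin_allowed(frontend_origin: str, cors_origins: str) -> bool:
    """Check whether frontend_origin appears in a comma-separated origin list."""
    allowed = {normalize_origin(o) for o in cors_origins.split(",") if o.strip()}
    return normalize_origin(frontend_origin) in allowed


if __name__ == "__main__":
    # Default value from the configuration reference.
    configured = "http://localhost:5173"
    for candidate in ("http://localhost:5173", "http://127.0.0.1:5173"):
        print(candidate, "allowed" if origin_allowed(candidate, configured) else "blocked")
```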
Kaggle API fails
Error: 401 Unauthorized from Kaggle API
Solution: Verify your Kaggle credentials are correct and your Kaggle account has API access enabled.
Next steps
Core concepts
Learn about TOON format and caching architecture
API reference
Explore available endpoints and integration patterns
Authentication
Learn about API key authentication
Quick start
Use the hosted API instead of self-hosting