
Welcome to KaggleIngest

The High-Performance Bridge Between Kaggle Data and LLMs. KaggleIngest transforms complex Kaggle competitions, datasets, and notebooks into token-optimized context for AI assistants. It eliminates noise and intelligently ranks high-signal implementation patterns to help you win competitions faster.

Why KaggleIngest?

Building competitive machine learning solutions requires understanding past winning approaches, effective feature engineering, and optimal model architectures. KaggleIngest automatically extracts and ranks the most valuable insights from thousands of Kaggle notebooks, delivering them in a format optimized for AI assistant consumption.

Quick start

Get from signup to your first API call in under 5 minutes

Installation

Set up KaggleIngest locally for development or self-hosting

API reference

Complete API documentation with examples and schemas

Core concepts

Learn about TOON format and ranking algorithms

Core capabilities

Smart context ranking

Our custom scoring algorithm (Log(Upvotes) * TimeDecay) prioritizes recent, high-quality solution patterns. You get the most relevant notebooks first, not just the most popular ones from years ago.
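
As a rough illustration of the formula above, the sketch below scores notebooks with a logarithm of upvotes multiplied by a decay term. The exponential decay shape, the one-year half-life, and the use of `log1p` (so that zero upvotes scores 0) are all assumptions; the source does not specify them.

```python
import math

HALF_LIFE_DAYS = 365.0  # assumed value; the actual decay rate is not documented


def time_decay(age_days, half_life_days=HALF_LIFE_DAYS):
    """Exponential decay: 1.0 for a brand-new notebook, 0.5 after one half-life."""
    return 0.5 ** (max(age_days, 0.0) / half_life_days)


def score(upvotes, age_days):
    # log1p(x) = log(1 + x) is an assumption: plain log(0) is undefined.
    return math.log1p(max(upvotes, 0)) * time_decay(age_days)


def rank(notebooks):
    """Sort (name, upvotes, age_days) tuples, highest score first."""
    return sorted(notebooks, key=lambda nb: score(nb[1], nb[2]), reverse=True)


notebooks = [
    ("old_popular", 2000, 5 * 365),  # many upvotes, five years old
    ("recent_solid", 150, 30),       # fewer upvotes, one month old
    ("new_unvoted", 0, 1),
]
ranked = rank(notebooks)
print([name for name, _, _ in ranked])
```

Under these assumptions the month-old notebook outranks the five-year-old one despite having far fewer upvotes, which is the behavior the description calls for.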

Token-Oriented Object Notation (TOON)

A proprietary format that reduces token consumption by up to 60% while preserving structural metadata for LLMs. TOON delivers competition metadata, dataset schemas, sample data, and top notebook content in a single, optimized payload.
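
To give a feel for why a tabular notation can use fewer tokens than JSON, here is a minimal sketch that writes field names once in a header instead of repeating them per row. The layout, the `encode_table` helper, and the sample rows are illustrative assumptions, not the actual TOON specification, and character count is used only as a rough proxy for token count.

```python
import json


def encode_table(name, rows):
    """Encode a list of uniform dicts with the field names stated once.

    Illustrative only: values containing commas or newlines are not escaped.
    """
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0])
    for row in rows:
        if list(row) != fields:
            raise ValueError("all rows must share the same keys in the same order")
    cols = ",".join(fields)
    header = f"{name}[{len(rows)}]{{{cols}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


rows = [
    {"id": 1, "title": "EDA baseline", "votes": 120},
    {"id": 2, "title": "LightGBM tuning", "votes": 95},
]
encoded = encode_table("notebooks", rows)
print(encoded)
print(len(encoded), "characters vs", len(json.dumps(rows)), "as JSON")
```

The saving grows with the number of rows, since each additional row carries only values and no keys or braces.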

Dual-track ingestion

Cached competitions return context immediately. New competitions are fetched automatically in the background, so you are not left waiting on ingestion.
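
The two tracks can be pictured with a small asyncio sketch: a cache hit returns at once, while a miss schedules a background task and reports that work is pending. The in-memory dictionaries, the `fetch_and_build` and `get_context` names, and the response shape are hypothetical stand-ins rather than the service's real implementation.

```python
import asyncio

cache = {}    # slug -> context string (stand-in for the real cache)
pending = {}  # slug -> in-flight background task


async def fetch_and_build(slug):
    """Hypothetical slow path: pretend to fetch and process a competition."""
    await asyncio.sleep(0.01)  # placeholder for network and parsing work
    cache[slug] = f"context for {slug}"
    pending.pop(slug, None)


async def get_context(slug):
    """Return cached context at once, or schedule ingestion and report 'pending'."""
    if slug in cache:
        return {"status": "ready", "context": cache[slug]}
    if slug not in pending:  # avoid scheduling the same ingestion twice
        pending[slug] = asyncio.create_task(fetch_and_build(slug))
    return {"status": "pending"}


async def demo():
    first = await get_context("titanic")   # miss: schedules background work
    await pending["titanic"]               # wait here only for the demo
    second = await get_context("titanic")  # hit: served from the cache
    return first, second


first, second = asyncio.run(demo())
print(first, second)
```

Tracking in-flight tasks in `pending` keeps repeated requests for the same new competition from triggering duplicate fetches.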

PostgreSQL-as-Everything architecture

We’ve evolved from SQLite/Redis to a unified PostgreSQL engine for state management, ranked search, and audit-compliant caching. UNLOGGED tables skip write-ahead logging, which raises write throughput during ingestion at the cost of crash durability.

Features at a glance

  • API-first security: Robust X-API-Key authentication with tiered credit management
  • Multi-strategy search: Full-text search (FTS) combined with trigram similarity for typo-tolerant results
  • Robust parsing: Hardened support for legacy nbformat, multi-encoding CSVs, and malformed datasets
  • Free tier: 10 credits per user to get started
  • FastAPI backend: High-performance async Python API
  • React frontend: Modern web interface for exploration and testing
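
To show how trigram similarity makes search typo-tolerant, the sketch below combines an exact word match (a simplified stand-in for full-text search) with a trigram overlap score. It is pure Python for illustration and only approximates what a PostgreSQL-based search would do; the padding scheme, the 0.3 threshold, and the sample slugs are assumptions.

```python
import re


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def trigrams(text):
    """Set of 3-character windows per word, with each word padded as '  word '."""
    grams = set()
    for word in words(text):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search(query, slugs, threshold=0.3):
    """Exact word matches first, then fuzzy matches at or above the threshold."""
    q_words = set(words(query))
    scored = []
    for slug in slugs:
        exact = bool(q_words) and q_words <= set(words(slug))
        sim = similarity(query, slug)
        if exact or sim >= threshold:
            scored.append((exact, sim, slug))
    return [slug for _, _, slug in sorted(scored, reverse=True)]


slugs = ["titanic", "house-prices", "digit-recognizer"]
print(search("titanik", slugs))  # misspelled query still finds "titanic"
```

The misspelled query shares six of ten distinct trigrams with `titanic`, so it clears the threshold even though no word matches exactly.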

Architecture overview

KaggleIngest is built as a production-ready SaaS platform:
  • Backend: FastAPI with asyncpg for high-throughput PostgreSQL access
  • Frontend: React with Vite for fast development and optimal production builds
  • Database: PostgreSQL for caching, search, user management, and audit logs
  • Authentication: Simple API key-based authentication with credit tracking
  • Rate limiting: SlowAPI integration to prevent abuse
  • Observability: Sentry for error tracking, Prometheus for metrics
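
API key authentication with credit tracking can be sketched as a small ledger that is consulted on each request. The `CreditLedger` class, the in-memory storage, and the cost of one credit per request are hypothetical; only the 10-credit free tier comes from the feature list above.

```python
FREE_TIER_CREDITS = 10  # from the feature list: 10 credits per user


class CreditLedger:
    """In-memory stand-in for a per-key credit table (hypothetical)."""

    def __init__(self):
        self._credits = {}

    def register(self, api_key):
        """Give a new key the free-tier allowance."""
        self._credits[api_key] = FREE_TIER_CREDITS

    def charge(self, api_key, cost=1):
        """Deduct credits for one request.

        Returns False if the key is unknown or has too few credits left.
        The cost of 1 per request is an assumption.
        """
        remaining = self._credits.get(api_key)
        if remaining is None or remaining < cost:
            return False
        self._credits[api_key] = remaining - cost
        return True

    def balance(self, api_key):
        return self._credits.get(api_key, 0)


ledger = CreditLedger()
ledger.register("demo-key")
results = [ledger.charge("demo-key") for _ in range(11)]
print(results.count(True), "requests allowed;", ledger.balance("demo-key"), "credits left")
```

In a real deployment the check and the deduction would need to happen atomically in the database so that concurrent requests cannot overspend a key.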

Getting started

Ready to transform how you approach Kaggle competitions?

Quick start guide

Create an account and make your first API call

Local installation

Run KaggleIngest on your own infrastructure
