PostgreSQL-backed caching system with UNLOGGED tables for high-performance write throughput
KaggleIngest uses a sophisticated PostgreSQL-based caching layer to minimize redundant Kaggle API calls and provide sub-second response times for frequently accessed competitions and datasets.
The caching system implements a read-through cache pattern with three-state lifecycle management:
Check cache: Query PostgreSQL for existing data
Return if ready: Serve completed results immediately
Process if missing: Trigger background ingestion and return processing status
The cache uses PostgreSQL UNLOGGED tables for 2-3x faster write throughput at the cost of crash recovery. This is acceptable because cached data can be regenerated from Kaggle’s API.
The competition cache table is defined in backend/migrations/001_initial_schema.sql:40:
CREATE UNLOGGED TABLE IF NOT EXISTS competition_cache ( slug VARCHAR(255) PRIMARY KEY, -- Basic metadata title TEXT, description TEXT, deadline TIMESTAMP WITH TIME ZONE, reward VARCHAR(255), category VARCHAR(100), -- Full Kaggle API response metadata JSONB, -- TOON serialized content toon_content TEXT, -- Can be large (1-10MB) notebook_count INT DEFAULT 0, -- Cache status status VARCHAR(20) DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'failed')), error_message TEXT, -- Timestamps fetched_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), -- Full-text search vector (auto-generated) search_vector tsvector GENERATED ALWAYS AS ( setweight(to_tsvector('english', COALESCE(title, '')), 'A') || setweight(to_tsvector('english', COALESCE(description, '')), 'B') || setweight(to_tsvector('english', COALESCE(slug, '')), 'C') ) STORED);
Key indexes for performance:
-- Fast status lookups for pending/processing queriesCREATE INDEX idx_competition_status ON competition_cache(status, updated_at);-- Recent competitions (cache warming)CREATE INDEX idx_competition_updated ON competition_cache(updated_at DESC);-- Full-text search (GIN index for tsvector)CREATE INDEX idx_competition_search ON competition_cache USING GIN(search_vector);-- Fuzzy search (trigram similarity)CREATE INDEX idx_competition_title_trgm ON competition_cache USING GIN(title gin_trgm_ops);
The search_vector column is a generated column that automatically indexes title, description, and slug for full-text search. This enables fast competition discovery without manual index maintenance.