Configure, monitor, and maintain the local parquet cache for optimal performance
The library caches indicator data locally as parquet files, dramatically improving performance for repeated queries. This guide covers cache configuration, monitoring, and maintenance.
Configure cache behavior when creating the client:
```python
from esios import ESIOSClient, CacheConfig
from pathlib import Path

config = CacheConfig(
    enabled=True,
    cache_dir=Path("/custom/cache/path"),
    recent_ttl_hours=48,   # Re-fetch data newer than this
    meta_ttl_days=7,       # Cache metadata for this long
    catalog_ttl_hours=24,  # Cache indicator list for this long
)
client = ESIOSClient(token="your_api_key", cache_config=config)
```
```python
config = CacheConfig(enabled=False)
client = ESIOSClient(token="your_api_key", cache_config=config)

# All requests hit the API
handle = client.indicators.get(600)
df = handle.historical("2024-01-01", "2024-01-31")
# No caching performed
```
Disabling cache means every request hits the API, which is slower and counts against rate limits.
By default, data within the last 48 hours is always re-fetched because it may be updated:
```python
# Default: 48 hours
config = CacheConfig(recent_ttl_hours=48)

# Custom: only re-fetch data from the last 24 hours
config = CacheConfig(recent_ttl_hours=24)

# Aggressive caching: only re-fetch the last 6 hours
config = CacheConfig(recent_ttl_hours=6)

client = ESIOSClient(token="your_api_key", cache_config=config)
```
This prevents serving stale data for recent dates that might still be updated by REE.
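The recency check can be sketched as a simple cutoff comparison. This is illustrative only; `needs_refetch` is a hypothetical helper, not part of the library's API:

```python
from datetime import datetime, timedelta, timezone

def needs_refetch(timestamp: datetime, recent_ttl_hours: int = 48) -> bool:
    """Return True if a cached value is recent enough that it may still change."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=recent_ttl_hours)
    return timestamp >= cutoff

# Data from last night falls inside the 48h window -> re-fetched
print(needs_refetch(datetime.now(timezone.utc) - timedelta(hours=12)))  # True

# Data from last month is outside the window -> served from cache
print(needs_refetch(datetime.now(timezone.utc) - timedelta(days=30)))  # False
```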
The cache tracks coverage per-column (per-geography):
```python
handle = client.indicators.get(600)

# Request Spain only
df1 = handle.historical(
    "2024-01-01", "2024-01-31",
    geo_ids=[3],  # España
)
# Cache: España column populated for Jan 2024

# Request Portugal only
df2 = handle.historical(
    "2024-01-01", "2024-01-31",
    geo_ids=[8741],  # Portugal
)
# Cache: Portugal column populated, España not re-fetched

# Request both
df3 = handle.historical(
    "2024-01-01", "2024-01-31",
    geo_ids=[3, 8741],
)
# Cache hit: both columns read from cache, no API call
```
This per-column gap detection enables efficient partial caching.
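Conceptually, gap detection boils down to finding requested geo columns that the cached frame does not yet cover. A minimal sketch of the idea (the `missing_geo_columns` helper is hypothetical, not the library's internal code):

```python
import pandas as pd

def missing_geo_columns(cached: pd.DataFrame, requested: list) -> list:
    """Columns that are absent or entirely NaN must be fetched from the API."""
    return [
        col for col in requested
        if col not in cached.columns or cached[col].isna().all()
    ]

# España ("3") already cached for the period; Portugal ("8741") is not
idx = pd.date_range("2024-01-01", periods=3, freq="D")
cached = pd.DataFrame({"3": [1.0, 2.0, 3.0]}, index=idx)

print(missing_geo_columns(cached, ["3", "8741"]))  # ['8741'] -> only Portugal fetched
```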
```python
handle = client.indicators.get(600)

# These parameters bypass cache:

# Time aggregation (API-side)
df = handle.historical(
    "2024-01-01", "2024-01-31",
    time_agg="sum",  # ⚠️ Hits API every time
)

# Geographic aggregation (API-side)
df = handle.historical(
    "2024-01-01", "2024-01-31",
    geo_agg="average",  # ⚠️ Hits API every time
)

# Cache disabled in config
config = CacheConfig(enabled=False)
client = ESIOSClient(token="your_api_key", cache_config=config)
# ⚠️ All requests hit API
```
For aggregated views, fetch the raw data (which uses the cache) and then aggregate locally with pandas.
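For example, a daily sum can be computed locally with `resample` instead of passing `time_agg="sum"` to the API. This sketch uses a synthetic DataFrame; in real use the data would come from `handle.historical(...)`:

```python
import pandas as pd

# Illustrative: two days of hourly values, as the cache might return them
idx = pd.date_range("2024-01-01", periods=48, freq="h")
df = pd.DataFrame({"España": range(48)}, index=idx)

# Aggregate locally instead of asking the API for time_agg="sum"
daily = df.resample("D").sum()
print(daily)
```

Any pandas aggregation (`mean`, `max`, `groupby`, etc.) works the same way, and repeated runs stay cache-hits.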
```python
# First call fetches from API
indicators = client.indicators.list()
# Cached at: ~/.cache/esios/indicators/catalog.json

# Second call within 24h reads from cache
indicators = client.indicators.list()  # Instant

# After 24h, automatically refreshes
```
```python
# First call fetches metadata from API
handle = client.indicators.get(600)
# Cached at: ~/.cache/esios/indicators/600/meta.json

# Second call within 7 days reads from cache
handle = client.indicators.get(600)  # Instant

# After 7 days, automatically refreshes
```
Customize TTLs:
```python
config = CacheConfig(
    catalog_ttl_hours=48,  # Cache indicator list for 48h
    meta_ttl_days=30,      # Cache metadata for 30 days
)
```
The library maintains a global geo_id → geo_name registry:
```python
# Read the global registry
geos = client.cache.read_geos()
print(geos)
# Output: {'3': 'España', '8741': 'Portugal', '8742': 'Francia', ...}

# Automatically populated as you fetch data
handle = client.indicators.get(600)
df = handle.historical("2024-01-01", "2024-01-31")

# New geos discovered in data are merged into the registry
geos = client.cache.read_geos()
print(f"Known geographies: {len(geos)}")
```
The library automatically migrates old cache formats:
```
# Old layout (v0.x):
~/.cache/esios/indicators/600.parquet

# New layout (v1.0+):
~/.cache/esios/indicators/600/data.parquet
~/.cache/esios/indicators/600/meta.json
```
Migration happens transparently on first access. Old files are removed after successful migration.
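The migration amounts to moving each flat `<id>.parquet` file into a per-indicator directory. A sketch of the idea under that assumption (`migrate_legacy_cache` is illustrative, not the library's actual migration routine):

```python
import tempfile
from pathlib import Path

def migrate_legacy_cache(cache_dir: Path) -> None:
    """Move flat <id>.parquet files into <id>/data.parquet (illustrative only)."""
    for old in cache_dir.glob("*.parquet"):
        new_dir = cache_dir / old.stem
        new_dir.mkdir(exist_ok=True)
        old.rename(new_dir / "data.parquet")  # old file removed by the move

# Demonstrate on a throwaway directory with a fake v0.x cache file
tmp = Path(tempfile.mkdtemp())
(tmp / "600.parquet").write_bytes(b"")
migrate_legacy_cache(tmp)

print((tmp / "600" / "data.parquet").exists())  # True
print((tmp / "600.parquet").exists())           # False
```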
- Columns: string geo_ids (e.g., `"3"`, `"8741"`), converted to geo_names on read
- Compression: Snappy (default)
You can read cache files directly:
```python
import pandas as pd

df = pd.read_parquet(
    "/home/user/.cache/esios/indicators/600/data.parquet"
)
print(df.head())
# Columns will be geo_ids ("3", "8741"), not geo_names
```
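If you want geo names on a directly-read file, you can rename the columns yourself. The mapping below is hard-coded for illustration; in practice it would come from `client.cache.read_geos()`:

```python
import pandas as pd

# Illustrative geo_id -> geo_name mapping (normally from the geos registry)
geos = {"3": "España", "8741": "Portugal"}

# Raw cache columns use string geo_ids
df = pd.DataFrame({"3": [1.0], "8741": [2.0]})
df = df.rename(columns=geos)

print(list(df.columns))  # ['España', 'Portugal']
```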