The run_full_pipeline.py script includes two critical configuration flags (lines 64-67) that control which optional data is fetched:
# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True

# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False
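The following Python sketch shows how these two flags could decide which scripts run. The script names come from this page. The `planned_scripts` function and the phase ordering it encodes are illustrative assumptions, not the actual contents of run_full_pipeline.py.

```python
# Hypothetical sketch: how FETCH_OHLCV and FETCH_OPTIONAL could gate which
# scripts run. Script names are from this page; the function is an assumption.

FETCH_OHLCV = True
FETCH_OPTIONAL = False


def planned_scripts(fetch_ohlcv: bool, fetch_optional: bool) -> list[str]:
    """Return the scripts affected by the two flags, in phase order."""
    scripts: list[str] = []
    if fetch_ohlcv:
        # Phase 2.5: lifetime OHLCV history (incremental when data exists)
        scripts += ["fetch_all_ohlcv.py", "fetch_indices_ohlcv.py"]
    # Phase 4 scripts always run; without OHLCV data they skip their calculations
    scripts += ["advanced_metrics_processor.py", "process_earnings_performance.py"]
    if fetch_optional:
        # Phase 6: standalone reference data, never merged into the main JSON
        scripts += ["fetch_all_indices.py", "fetch_etf_data.py"]
    return scripts


if __name__ == "__main__":
    for name in planned_scripts(FETCH_OHLCV, FETCH_OPTIONAL):
        print(name)
```

With the defaults shown above, Phase 2.5 and Phase 4 run and Phase 6 is skipped.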

FETCH_OHLCV Flag

What It Controls

Default: True

Controls whether lifetime historical OHLCV (Open, High, Low, Close, Volume) data is downloaded for all stocks and indices.

Behavior

When set to True, the pipeline executes Phase 2.5:
📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────────────
 Running fetch_all_ohlcv.py...
 fetch_all_ohlcv.py (142.3s)     # ~2-5 min if data exists
 Running fetch_indices_ohlcv.py...
 fetch_indices_ohlcv.py (18.7s)  # ~20-30 sec
Scripts Executed:
  • fetch_all_ohlcv.py → Creates ohlcv_data/ directory (~2,775 CSV files)
  • fetch_indices_ohlcv.py → Creates indices_ohlcv_data/ directory (~194 CSV files)
Time Impact:
  • First Run (Full History): +25-35 minutes
  • Daily Update (Incremental): +2-5 minutes
  • Weekly Update: +3-6 minutes
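The gap between the first run and the daily updates comes from incremental fetching: once a CSV exists, only the missing tail needs to be downloaded. Below is a minimal Python sketch of that decision. It assumes each CSV has a header row with a `Date` column in YYYY-MM-DD format, sorted oldest to newest. The real file layout and the `incremental_start` helper are assumptions.

```python
# Minimal sketch of "smart incremental" fetching. Assumes each CSV in
# ohlcv_data/ has a "Date" column (YYYY-MM-DD), oldest row first.
import csv
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def incremental_start(csv_path: Path) -> Optional[date]:
    """Return the first date still missing, or None if a full fetch is needed."""
    if not csv_path.exists():
        return None  # first run: download the full history
    last_date: Optional[date] = None
    with csv_path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            last_date = date.fromisoformat(row["Date"])
    if last_date is None:
        return None  # header-only file: treat it like a first run
    return last_date + timedelta(days=1)  # fetch only the new tail
```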

When to Use

Set to True

Use Cases:
  • Need volatility metrics (ADR)
  • Need volume analysis (RVOL)
  • Need all-time high tracking
  • Need post-earnings return tracking
  • Running production screener
Time Cost: +2-5 min (daily updates)

Set to False

Use Cases:
  • Quick testing/debugging
  • Only need fundamental data
  • Only need live market snapshot
  • Bandwidth/storage constraints
Time Saved: ~2-5 min per run

Dependencies

These Phase 4 scripts depend on OHLCV data:
Phase 4 Scripts requiring OHLCV:
├── advanced_metrics_processor.py    # Calculates ADR, RVOL, ATH
└── process_earnings_performance.py  # Calculates post-earnings returns
If FETCH_OHLCV = False, these scripts still run but skip the OHLCV-based calculations and print:
⚠️  WARNING: ohlcv_data/ not found. Skipping ADR, RVOL, ATH calculations.
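The Python sketch below shows how a Phase 4 script might guard on ohlcv_data/ and compute the metrics listed above. The formulas use common definitions and are assumptions: ADR as a percentage over 20 sessions, RVOL against the prior 20 sessions' average volume, and ATH over the full history. advanced_metrics_processor.py may define them differently.

```python
# Hedged sketch: guard on ohlcv_data/, then compute ADR %, RVOL, ATH and
# % from ATH. The 20-session windows and formulas are assumed definitions.
from pathlib import Path


def advanced_metrics(ohlcv_dir: Path, rows: list[dict]) -> dict:
    """rows: daily bars (oldest first) with high, low, close, volume keys."""
    zeros = {"adr_pct": 0.0, "rvol": 0.0, "ath": 0.0, "pct_from_ath": 0.0}
    if not ohlcv_dir.is_dir():
        print(f"⚠️  WARNING: {ohlcv_dir.name}/ not found. Skipping ADR, RVOL, ATH calculations.")
        return zeros  # fields stay 0, as with FETCH_OHLCV = False
    if not rows:
        return zeros
    window = rows[-20:]
    # ADR %: mean of the daily high/low ratio minus 1 over the last 20 sessions
    adr_pct = 100 * (sum(r["high"] / r["low"] for r in window) / len(window) - 1)
    # RVOL: latest volume relative to the average of the prior 20 sessions
    prior = rows[-21:-1]
    avg_vol = sum(r["volume"] for r in prior) / len(prior) if prior else 0
    rvol = rows[-1]["volume"] / avg_vol if avg_vol else 0.0
    # ATH over the whole (lifetime) history, then distance of the last close from it
    ath = max(r["high"] for r in rows)
    pct_from_ath = 100 * (rows[-1]["close"] - ath) / ath
    return {"adr_pct": adr_pct, "rvol": rvol, "ath": ath, "pct_from_ath": pct_from_ath}
```

This is why lifetime history matters: ATH and % from ATH need every bar, while ADR and RVOL only need the recent window.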

Storage Impact

| State | Storage Used |
|---|---|
| FETCH_OHLCV = True | ~320-420 MB (2,969 CSV files) |
| FETCH_OHLCV = False | 0 MB |
OHLCV data is preserved during cleanup (it’s not considered intermediate).
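To check these storage figures on your own machine, a small Python sketch like the following counts the CSV files and their total size. It assumes ohlcv_data/ and indices_ohlcv_data/ sit in the current working directory and are flat, with no subfolders. The `csv_usage` helper is hypothetical.

```python
# Sketch: count CSV files and bytes in the OHLCV folders named on this page.
# Assumes flat directories in the current working directory.
from pathlib import Path


def csv_usage(directory: Path) -> tuple[int, int]:
    """Return (number of CSV files, total size in bytes) for one directory."""
    if not directory.is_dir():
        return 0, 0  # e.g. FETCH_OHLCV = False: nothing on disk
    files = list(directory.glob("*.csv"))
    return len(files), sum(f.stat().st_size for f in files)


if __name__ == "__main__":
    total_files = total_bytes = 0
    for name in ("ohlcv_data", "indices_ohlcv_data"):
        count, size = csv_usage(Path(name))
        total_files += count
        total_bytes += size
        print(f"{name}/: {count} CSV files, {size / 1_000_000:.1f} MB")
    print(f"Total: {total_files} CSV files, {total_bytes / 1_000_000:.1f} MB")
```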

FETCH_OPTIONAL Flag

What It Controls

Default: False

Controls whether standalone reference data (Indices, ETFs) is fetched. This data is NOT merged into the main all_stocks_fundamental_analysis.json.

Behavior

When set to True, the pipeline executes Phase 6:
📋 PHASE 6: Optional Standalone Data
────────────────────────────────────
 Running fetch_all_indices.py...
 fetch_all_indices.py (2.3s)
 Running fetch_etf_data.py...
 fetch_etf_data.py (1.8s)
Scripts Executed:
  • fetch_all_indices.py → Creates all_indices_list.json (~194 indices)
  • fetch_etf_data.py → Creates etf_data_response.json (~361 ETFs)
Output Files:
├── all_indices_list.json         (1.2 MB)
└── etf_data_response.json        (2.8 MB)
Time Impact: +3-5 seconds

When to Use

Set to True

Use Cases:
  • Need indices for benchmarking
  • Building ETF screener
  • Analyzing sectoral rotation
  • Creating index correlation charts
Time Cost: +3-5 seconds

Set to False (Default)

Use Cases:
  • Only analyzing individual stocks
  • Don’t need index/ETF reference data
  • Reducing output file count
Time Saved: ~4 seconds

Important Notes

These files are standalone reference datasets. They are NOT merged into all_stocks_fundamental_analysis.json. If you need index/ETF data in your application, you must:
  1. Set FETCH_OPTIONAL = True
  2. Read all_indices_list.json and etf_data_response.json separately
Indices OHLCV vs Indices Metadata:
  • fetch_all_indices.py (controlled by FETCH_OPTIONAL) → Live snapshot + metadata
  • fetch_indices_ohlcv.py (controlled by FETCH_OHLCV) → Historical OHLCV data
To get complete index data, set both flags to True.
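In Python, step 2 might look like the sketch below. Only the two file names come from this page. Their internal JSON structure is not documented here, so the parsed data is returned as-is. The `load_reference_data` helper is hypothetical.

```python
# Minimal sketch: read the standalone reference files separately from
# all_stocks_fundamental_analysis.json. JSON structure is left untouched.
import json
from pathlib import Path
from typing import Any, Optional

REFERENCE_FILES = {
    "indices": "all_indices_list.json",  # written by fetch_all_indices.py
    "etfs": "etf_data_response.json",    # written by fetch_etf_data.py
}


def load_reference_data(base_dir: Path) -> dict[str, Optional[Any]]:
    """Load each reference file, or None if it does not exist."""
    data: dict[str, Optional[Any]] = {}
    for key, filename in REFERENCE_FILES.items():
        path = base_dir / filename
        if path.exists():
            with path.open(encoding="utf-8") as fh:
                data[key] = json.load(fh)
        else:
            data[key] = None  # Phase 6 did not run (FETCH_OPTIONAL = False)
    return data
```

Returning None instead of raising lets an application keep working when the pipeline ran with FETCH_OPTIONAL = False.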

Storage Impact

| State | Storage Used |
|---|---|
| FETCH_OPTIONAL = True | ~4 MB (2 JSON files) |
| FETCH_OPTIONAL = False | 0 MB |

How to Change Flags

Step 1: Edit run_full_pipeline.py

Open the file and navigate to lines 64-67:
# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

FETCH_OHLCV = True     # Change to False to skip OHLCV
FETCH_OPTIONAL = False  # Change to True to fetch Indices/ETFs
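If you switch profiles often, you could flip a flag from a script instead of an editor. The Python helper below is hypothetical and not part of the pipeline. It assumes each flag is written as a top-level `NAME = True` or `NAME = False` line, as shown above. It rewrites only the first such assignment and keeps any trailing comment.

```python
# Hypothetical helper: flip a top-level boolean flag in a Python source file.
# Rewrites only the first "NAME = True/False" line and keeps trailing comments.
import re
from pathlib import Path


def set_flag(source: str, name: str, value: bool) -> str:
    pattern = re.compile(rf"^({re.escape(name)}\s*=\s*)(True|False)\b", re.MULTILINE)
    new_source, count = pattern.subn(rf"\g<1>{value}", source, count=1)
    if count == 0:
        raise ValueError(f"{name} assignment not found")
    return new_source


def set_flag_in_file(path: Path, name: str, value: bool) -> None:
    text = path.read_text(encoding="utf-8")
    path.write_text(set_flag(text, name, value), encoding="utf-8")
```

For example, `set_flag_in_file(Path("run_full_pipeline.py"), "FETCH_OPTIONAL", True)` enables Phase 6. Raising on a missing flag avoids silently running with the wrong configuration.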

Step 2: Save and Run

python3 run_full_pipeline.py

Step 3: Verify Output

Check the console for Phase 2.5 (OHLCV) and Phase 6 (Optional):
# If FETCH_OHLCV = True:
📊 PHASE 2.5: OHLCV History (Smart Incremental)

# If FETCH_OPTIONAL = True:
📋 PHASE 6: Optional Standalone Data
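Step 3 can also be automated, for example in a scheduled job. The Python sketch below runs the pipeline, captures stdout, and reports which flag-controlled phases printed their banner. The banner strings are copied from above. Matching on them is an assumption that holds only while the banners stay unchanged.

```python
# Sketch: run the pipeline and detect which optional phases printed a banner.
# Banner text is copied from this page; the helpers are hypothetical.
import subprocess
import sys

PHASE_BANNERS = {
    "ohlcv": "PHASE 2.5: OHLCV History",
    "optional": "PHASE 6: Optional Standalone Data",
}


def phases_seen(console_output: str) -> dict[str, bool]:
    """Map each flag-controlled phase to whether its banner was printed."""
    return {key: banner in console_output for key, banner in PHASE_BANNERS.items()}


def run_and_check(script: str = "run_full_pipeline.py") -> dict[str, bool]:
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True, check=True
    )
    return phases_seen(result.stdout)
```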

Impact Summary Table

| Flag Configuration | Time (Incremental) | Extra Storage | Fields in JSON | Extra Files |
|---|---|---|---|---|
| Both True | ~4-5 min | ~420 MB | 86/86 (100%) | Indices, ETFs, OHLCV |
| FETCH_OHLCV=True, FETCH_OPTIONAL=False | ~4-5 min | ~420 MB | 86/86 (100%) | OHLCV only |
| FETCH_OHLCV=False, FETCH_OPTIONAL=True | ~2 min | ~4 MB | 72/86 (84%) | Indices, ETFs |
| Both False | ~2 min | 0 MB | 72/86 (84%) | None |

Performance Tips

Set both flags to False during development:
FETCH_OHLCV = False
FETCH_OPTIONAL = False
Time saved: ~2-3 minutes per run
Set both flags to True for production:
FETCH_OHLCV = True
FETCH_OPTIONAL = True
Benefit: All 86 fields + reference datasets
For daily scheduled runs (cron/Task Scheduler):
FETCH_OHLCV = True   # Incremental updates are fast (~2-5 min)
FETCH_OPTIONAL = True  # Adds only ~4 seconds
Schedule: Run once per day at 4:00 PM IST (after market close)
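If you drive the schedule from your own Python loop rather than cron or Task Scheduler, the next run time can be computed as in the sketch below. IST is a fixed UTC+05:30 offset with no daylight saving. Skipping Saturdays and Sundays is an added assumption, and exchange holidays are not handled. The `next_run` helper is hypothetical.

```python
# Sketch: next 4:00 PM IST run time, skipping weekends (an assumption).
# IST has a fixed UTC+05:30 offset, so no timezone database is needed.
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))
RUN_AT = time(16, 0)  # 4:00 PM IST, after market close


def next_run(now: datetime) -> datetime:
    """Return the next weekday 16:00 IST strictly after `now` (timezone-aware)."""
    local = now.astimezone(IST)
    candidate = datetime.combine(local.date(), RUN_AT, tzinfo=IST)
    if candidate <= local:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate
```

On a machine whose clock runs in UTC, 4:00 PM IST corresponds to 10:30 UTC, which is the time to use in a cron entry there.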

Next Steps

  • Cleanup Options: configure the CLEANUP_INTERMEDIATE flag
  • OHLCV Scripts: deep dive into OHLCV data fetching
  • Optional Scripts: understand Indices & ETF data
  • Pipeline Architecture: learn how all scripts work together
