The run_full_pipeline.py script includes two configuration flags (lines 64-67) that control which optional data is fetched:
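Here is a minimal sketch of the flag block, showing the documented defaults. It assumes the flags are plain module-level booleans. The exact lines and comments in the real script may differ.

```python
# Sketch of the flag block in run_full_pipeline.py (around lines 64-67).
# Assumption: the flags are module-level booleans; the real file may word this differently.
FETCH_OHLCV = True      # Phase 2.5: lifetime OHLCV for all stocks and indices
FETCH_OPTIONAL = False  # Phase 6: standalone indices and ETF reference data
```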
FETCH_OHLCV Flag
What It Controls
Default: True
Controls whether lifetime historical OHLCV (Open, High, Low, Close, Volume) data is downloaded for all stocks and indices.
Behavior
FETCH_OHLCV = True: the pipeline executes Phase 2.5.
Scripts executed:
- fetch_all_ohlcv.py → creates the ohlcv_data/ directory (~2,775 CSV files)
- fetch_indices_ohlcv.py → creates the indices_ohlcv_data/ directory (~194 CSV files)
Time impact:
- First Run (Full History): +25-35 minutes
- Daily Update (Incremental): +2-5 minutes
- Weekly Update: +3-6 minutes
FETCH_OHLCV = False: Phase 2.5 is skipped, and no OHLCV files are created.
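Daily updates are labeled incremental: only data newer than what is already on disk needs to be fetched. The sketch below shows one way to decide where such a fetch should resume. It assumes a hypothetical CSV layout (a Date column in YYYY-MM-DD format, sorted oldest first), and incremental_start is a hypothetical name. None of this is taken from fetch_all_ohlcv.py.

```python
import csv
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def incremental_start(csv_path: Path) -> Optional[date]:
    """Return the first date still to fetch, or None if a full-history download is needed.

    Assumption (hypothetical layout): the CSV has a 'Date' column in YYYY-MM-DD
    format, with rows sorted oldest first.
    """
    if not csv_path.exists():
        return None  # first run: no local history, so fetch the full history
    last_date: Optional[date] = None
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            last_date = date.fromisoformat(row["Date"])  # keeps the last row's date
    if last_date is None:
        return None  # header-only file: treat it like a first run
    return last_date + timedelta(days=1)  # resume the day after the newest stored row
```

The sketch resumes from the next calendar day. Non-trading days would simply return no new rows from the data source.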
When to Use
Set to True (Default)
Use Cases:
- Need volatility metrics (ADR)
- Need volume analysis (RVOL)
- Need all-time high tracking
- Need post-earnings return tracking
- Running production screener
Set to False
Use Cases:
- Quick testing/debugging
- Only need fundamental data
- Only need live market snapshot
- Bandwidth/storage constraints
Dependencies
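Several Phase 4 scripts depend on OHLCV data. If FETCH_OHLCV = False, these scripts still run, but they skip their OHLCV-based calculations. The affected fields are left unpopulated, so only 72 of 86 fields are filled (see the Impact Summary Table).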
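The sketch below shows what "run but skip calculations" can look like: an OHLCV-dependent metric that returns None when its input file is missing. Several details are assumptions for illustration, not the pipeline's actual Phase 4 code: the ADR formula (the mean of High/Low minus 1, as a percentage, over the last 20 rows), the High and Low column names, and the function name adr_percent.

```python
import csv
from pathlib import Path
from typing import Optional


def adr_percent(csv_path: Path, window: int = 20) -> Optional[float]:
    """Average daily range (%) over the last `window` rows, or None if OHLCV data is missing.

    Assumptions: hypothetical 'High'/'Low' columns, and ADR defined as
    mean((High / Low - 1) * 100), which is one common definition.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not csv_path.exists():
        return None  # FETCH_OHLCV = False: skip the metric instead of failing
    with csv_path.open(newline="") as f:
        rows = list(csv.DictReader(f))[-window:]  # most recent `window` rows
    if not rows:
        return None  # no price history to compute from
    ranges = [(float(r["High"]) / float(r["Low"]) - 1) * 100 for r in rows]
    return sum(ranges) / len(ranges)
```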
Storage Impact
| State | Storage Used |
|---|---|
| FETCH_OHLCV = True | ~320-420 MB (2,969 CSV files) |
| FETCH_OHLCV = False | 0 MB |
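You can compare these estimates with an actual run by counting and sizing the CSV files in each output directory. A minimal sketch follows. It assumes you run it from the directory that contains ohlcv_data/ and indices_ohlcv_data/, and dir_stats is a hypothetical helper name.

```python
from pathlib import Path


def dir_stats(path: Path) -> tuple[int, float]:
    """Return (CSV file count, total CSV size in MB) for a directory; (0, 0.0) if it is absent.

    MB here means 1024 * 1024 bytes.
    """
    if not path.is_dir():
        return 0, 0.0
    csv_files = [p for p in path.rglob("*.csv") if p.is_file()]
    size_mb = sum(p.stat().st_size for p in csv_files) / (1024 * 1024)
    return len(csv_files), size_mb
```

Usage: `for name in ("ohlcv_data", "indices_ohlcv_data"): print(name, dir_stats(Path(name)))`. The table figures are approximate, so expect some variation.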
FETCH_OPTIONAL Flag
What It Controls
Default: False
Controls whether standalone reference data (Indices, ETFs) is fetched. This data is NOT merged into the main all_stocks_fundamental_analysis.json.
Behavior
FETCH_OPTIONAL = True: the pipeline executes Phase 6.
Scripts executed:
- fetch_all_indices.py → creates all_indices_list.json (~194 indices)
- fetch_etf_data.py → creates etf_data_response.json (~361 ETFs)
Time impact: +3-5 seconds
FETCH_OPTIONAL = False: Phase 6 is skipped, and neither JSON file is created.
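Each flag gates its own phase. The minimal sketch below shows that gating using the script names listed on this page. The helper names optional_scripts and run_scripts are hypothetical, and the sketch leaves out the phases that run between Phase 2.5 and Phase 6 (such as Phase 4).

```python
import subprocess
import sys

# Scripts per optional phase, as listed on this page.
PHASE_2_5_SCRIPTS = ["fetch_all_ohlcv.py", "fetch_indices_ohlcv.py"]  # FETCH_OHLCV
PHASE_6_SCRIPTS = ["fetch_all_indices.py", "fetch_etf_data.py"]      # FETCH_OPTIONAL


def optional_scripts(fetch_ohlcv: bool, fetch_optional: bool) -> list[str]:
    """Return the optional scripts that would run for the given flag values."""
    scripts: list[str] = []
    if fetch_ohlcv:
        scripts.extend(PHASE_2_5_SCRIPTS)
    if fetch_optional:
        scripts.extend(PHASE_6_SCRIPTS)
    return scripts


def run_scripts(scripts: list[str]) -> None:
    """Run each script with the current interpreter; stop on the first failure."""
    for script in scripts:
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run([sys.executable, script], check=True)
```

In a pipeline script, this could be called as `run_scripts(optional_scripts(FETCH_OHLCV, FETCH_OPTIONAL))`. Stopping on the first failing script is a choice made for this sketch, not necessarily how run_full_pipeline.py handles errors.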
When to Use
Set to True
Use Cases:
- Need indices for benchmarking
- Building ETF screener
- Analyzing sectoral rotation
- Creating index correlation charts
Set to False (Default)
Use Cases:
- Only analyzing individual stocks
- Don’t need index/ETF reference data
- Reducing output file count
Important Notes
Indices OHLCV vs Indices Metadata:
- fetch_all_indices.py (controlled by FETCH_OPTIONAL) → live snapshot + metadata
- fetch_indices_ohlcv.py (controlled by FETCH_OHLCV) → historical OHLCV data
Storage Impact
| State | Storage Used |
|---|---|
| FETCH_OPTIONAL = True | ~4 MB (2 JSON files) |
| FETCH_OPTIONAL = False | 0 MB |
Configuration Matrix
- Production (Recommended)
- Full Dataset (Everything)
- Testing/Debug (Fast)
- Fundamentals Only
Production (Recommended) output:
- all_stocks_fundamental_analysis.json.gz (with all 86 fields)
- ohlcv_data/ directory (~2,775 CSV files)
- indices_ohlcv_data/ directory (~194 CSV files)
How to Change Flags
Step 1: Edit run_full_pipeline.py
Open the file, go to lines 64-67, and set FETCH_OHLCV and FETCH_OPTIONAL to True or False.
Step 2: Save and Run
Save the file and run run_full_pipeline.py.
Step 3: Verify Output
Check the console output for Phase 2.5 (OHLCV) and Phase 6 (Optional) to confirm which optional phases ran.
Impact Summary Table
| Flag Configuration | Time (Incremental) | Storage | Fields in JSON | Extra Files |
|---|---|---|---|---|
| Both True | ~4-5 min | ~420 MB | 86/86 (100%) | Indices, ETFs, OHLCV |
| FETCH_OHLCV=True, FETCH_OPTIONAL=False | ~4-5 min | ~420 MB | 86/86 (100%) | OHLCV only |
| FETCH_OHLCV=False, FETCH_OPTIONAL=True | ~2 min | ~4 MB | 72/86 (84%) | Indices, ETFs |
| Both False | ~2 min | 0 MB | 72/86 (84%) | None |
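A small sketch like the one below can check the Extra Files column against what a run actually produced by listing any missing outputs. It assumes all outputs are written side by side in one base directory (for example, the directory run_full_pipeline.py is run from). The function names are hypothetical.

```python
from pathlib import Path


def expected_outputs(fetch_ohlcv: bool, fetch_optional: bool) -> list[str]:
    """Output names each flag should produce, taken from the tables on this page."""
    outputs: list[str] = []
    if fetch_ohlcv:
        outputs += ["ohlcv_data", "indices_ohlcv_data"]
    if fetch_optional:
        outputs += ["all_indices_list.json", "etf_data_response.json"]
    return outputs


def missing_outputs(base: Path, fetch_ohlcv: bool, fetch_optional: bool) -> list[str]:
    """Expected outputs that do not exist under `base` (assumed output layout)."""
    return [
        name
        for name in expected_outputs(fetch_ohlcv, fetch_optional)
        if not (base / name).exists()
    ]
```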
Performance Tips
Optimize for Speed
Set both flags to False during development.
Time saved: ~2-3 minutes per run
Optimize for Completeness
Set both flags to True for production.
Benefit: All 86 fields + reference datasets
Scheduled Runs
For daily scheduled runs (cron/Task Scheduler), run the pipeline once per day at 4:00 PM IST, after market close.
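cron or Task Scheduler should handle the scheduling itself. If a script needs to know when the next 4:00 PM IST run is due (for example, to log it or to sleep until then), the minimal sketch below computes it. IST is a fixed UTC+05:30 offset with no daylight saving, so a fixed timezone is sufficient. The sketch does not skip weekends or market holidays, and next_run is a hypothetical name.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # fixed offset; IST has no daylight saving
RUN_AT = time(16, 0)  # 4:00 PM IST, after market close


def next_run(now: datetime) -> datetime:
    """Return the next 4:00 PM IST strictly after `now`, which must be timezone-aware."""
    if now.tzinfo is None or now.utcoffset() is None:
        raise ValueError("now must be timezone-aware")
    local = now.astimezone(IST)
    candidate = datetime.combine(local.date(), RUN_AT, tzinfo=IST)
    if candidate <= local:
        candidate += timedelta(days=1)  # today's slot has passed: use tomorrow's
    return candidate
```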
Next Steps
- Cleanup Options: Configure the CLEANUP_INTERMEDIATE flag
- OHLCV Scripts: Deep dive into OHLCV data fetching
- Optional Scripts: Understand Indices & ETF data
- Pipeline Architecture: Learn how all scripts work together