Pipeline Configuration Flags

The run_full_pipeline.py script includes two critical configuration flags (lines 64-67) that control which optional data is fetched:

# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True

# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False
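
As a rough illustration of how the two flags gate the pipeline, the sketch below builds the list of optional scripts a run would execute. The `optional_scripts` helper is hypothetical and is not taken from run_full_pipeline.py; only the flag names, script names, and phase numbers come from this page.

```python
# Hypothetical helper for illustration; not part of run_full_pipeline.py.
FETCH_OHLCV = True
FETCH_OPTIONAL = False


def optional_scripts(fetch_ohlcv: bool, fetch_optional: bool) -> list:
    """Return (phase, script) pairs that the two flags would enable."""
    plan = []
    if fetch_ohlcv:
        # Phase 2.5: historical OHLCV for stocks and indices
        plan.append(("2.5", "fetch_all_ohlcv.py"))
        plan.append(("2.5", "fetch_indices_ohlcv.py"))
    if fetch_optional:
        # Phase 6: standalone reference data
        plan.append(("6", "fetch_all_indices.py"))
        plan.append(("6", "fetch_etf_data.py"))
    return plan


if __name__ == "__main__":
    for phase, script in optional_scripts(FETCH_OHLCV, FETCH_OPTIONAL):
        print(f"Phase {phase}: {script}")
```

With the default values, only the two Phase 2.5 scripts are listed.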

FETCH_OHLCV Flag

What It Controls

Default: True

Controls whether lifetime historical OHLCV (Open, High, Low, Close, Volume) data is downloaded for all stocks and indices.

Behavior

When FETCH_OHLCV = True, the pipeline executes Phase 2.5:

📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────────────
  ▶ Running fetch_all_ohlcv.py...
  ✅ fetch_all_ohlcv.py (142.3s)     # ~2-5 min if data exists
  ▶ Running fetch_indices_ohlcv.py...
  ✅ fetch_indices_ohlcv.py (18.7s)  # ~20-30 sec

Scripts Executed:

fetch_all_ohlcv.py → Creates ohlcv_data/ directory (~2,775 CSV files)
fetch_indices_ohlcv.py → Creates indices_ohlcv_data/ directory (~194 CSV files)

Time Impact:

First Run (Full History): +25-35 minutes
Daily Update (Incremental): +2-5 minutes
Weekly Update: +3-6 minutes

When FETCH_OHLCV = False, Phase 2.5 is skipped entirely:

📡 PHASE 2: Data Enrichment (Fetching)
────────────────────────────────────────
  ✅ fetch_complete_price_bands.py (3.2s)

🔬 PHASE 3: Base Analysis (Building Master JSON)
────────────────────────────────────────────────
  ▶ Running bulk_market_analyzer.py...

Scripts Skipped:

fetch_all_ohlcv.py
fetch_indices_ohlcv.py

Impact on Output: The following 14 fields will be set to 0 or N/A in all_stocks_fundamental_analysis.json:

Field	Value When Skipped
`5 Days MA ADR(%)`	`0`
`14 Days MA ADR(%)`	`0`
`20 Days MA ADR(%)`	`0`
`30 Days MA ADR(%)`	`0`
`RVOL`	`0`
`200 Days EMA Volume`	`0`
`% from 52W High 200 Days EMA Volume`	`0`
`Daily Rupee Turnover 20(Cr.)`	`0`
`Daily Rupee Turnover 50(Cr.)`	`0`
`Daily Rupee Turnover 100(Cr.)`	`0`
`30 Days Average Rupee Volume(Cr.)`	`0`
`% from ATH`	`0`
`Returns since Earnings(%)`	`N/A`
`Max Returns since Earnings(%)`	`N/A`
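
Because skipped numeric fields are written as 0 rather than omitted, a consumer may want to detect that case before trusting the values. The following is a minimal sketch, assuming each stock record is a flat dict keyed by the field names above and that placeholders appear as 0 (or the string "0") and "N/A". The record layout and the `ohlcv_fields_skipped` helper are assumptions, not something this page confirms.

```python
# Field names are copied from the table above; the record shape is assumed.
ZEROED_FIELDS = [
    "5 Days MA ADR(%)",
    "14 Days MA ADR(%)",
    "20 Days MA ADR(%)",
    "30 Days MA ADR(%)",
    "RVOL",
    "200 Days EMA Volume",
    "% from 52W High 200 Days EMA Volume",
    "Daily Rupee Turnover 20(Cr.)",
    "Daily Rupee Turnover 50(Cr.)",
    "Daily Rupee Turnover 100(Cr.)",
    "30 Days Average Rupee Volume(Cr.)",
    "% from ATH",
]
NA_FIELDS = [
    "Returns since Earnings(%)",
    "Max Returns since Earnings(%)",
]


def ohlcv_fields_skipped(record: dict) -> bool:
    """True if all 14 OHLCV-derived fields hold their 'skipped' placeholder."""
    zeros = all(record.get(name) in (0, "0") for name in ZEROED_FIELDS)
    nas = all(record.get(name) == "N/A" for name in NA_FIELDS)
    return zeros and nas
```

A single genuine zero (for example, RVOL on an untraded stock) does not trigger the check, since all 14 fields must match.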

When to Use

Set to True

Use Cases:

Need volatility metrics (ADR)
Need volume analysis (RVOL)
Need all-time high tracking
Need post-earnings return tracking
Running production screener

Time Cost: +2-5 min (daily updates)

Set to False

Use Cases:

Quick testing/debugging
Only need fundamental data
Only need live market snapshot
Bandwidth/storage constraints

Time Saved: ~2-5 min per run

Dependencies

These Phase 4 scripts depend on OHLCV data:

Phase 4 Scripts requiring OHLCV:
├── advanced_metrics_processor.py    # Calculates ADR, RVOL, ATH
└── process_earnings_performance.py  # Calculates post-earnings returns

If FETCH_OHLCV = False, these scripts still run, but they skip the OHLCV-dependent calculations and print a warning:

⚠️  WARNING: ohlcv_data/ not found. Skipping ADR, RVOL, ATH calculations.
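
A guard of this kind might look like the following sketch. It is not the code from advanced_metrics_processor.py; the `ohlcv_available` function is an assumption, and only the directory name and warning text come from this page.

```python
from pathlib import Path

WARNING = "⚠️  WARNING: ohlcv_data/ not found. Skipping ADR, RVOL, ATH calculations."


def ohlcv_available(base_dir: str = ".") -> bool:
    """Return True if an ohlcv_data/ directory exists under base_dir."""
    return (Path(base_dir) / "ohlcv_data").is_dir()


if __name__ == "__main__":
    if not ohlcv_available():
        # Mirror the documented behavior: warn and carry on without OHLCV metrics.
        print(WARNING)
```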

Storage Impact

State	Storage Used
`FETCH_OHLCV = True`	~320-420 MB (2,969 CSV files)
`FETCH_OHLCV = False`	0 MB

OHLCV data is preserved during cleanup (it’s not considered intermediate).
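
To check these figures on your own machine, a small sketch like the one below counts the CSV files and sums their sizes. The directory names come from this page; the `csv_footprint` helper is hypothetical, and it assumes the CSV files sit directly inside each directory rather than in subfolders.

```python
from pathlib import Path


def csv_footprint(directory: str) -> tuple:
    """Return (file count, total size in MB) for CSVs directly inside directory."""
    path = Path(directory)
    if not path.is_dir():
        return 0, 0.0
    files = list(path.glob("*.csv"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes / (1024 * 1024)


if __name__ == "__main__":
    for name in ("ohlcv_data", "indices_ohlcv_data"):
        count, size_mb = csv_footprint(name)
        print(f"{name}: {count} files, {size_mb:.1f} MB")
```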

FETCH_OPTIONAL Flag

What It Controls

Default: False

Controls whether standalone reference data (Indices, ETFs) is fetched. This data is NOT merged into the main all_stocks_fundamental_analysis.json.

Behavior

When FETCH_OPTIONAL = True, the pipeline executes Phase 6:

📋 PHASE 6: Optional Standalone Data
────────────────────────────────────
  ▶ Running fetch_all_indices.py...
  ✅ fetch_all_indices.py (2.3s)
  ▶ Running fetch_etf_data.py...
  ✅ fetch_etf_data.py (1.8s)

Scripts Executed:

fetch_all_indices.py → Creates all_indices_list.json (~194 indices)
fetch_etf_data.py → Creates etf_data_response.json (~361 ETFs)

Output Files:

├── all_indices_list.json         (1.2 MB)
└── etf_data_response.json        (2.8 MB)

Time Impact: +3-5 seconds

When FETCH_OPTIONAL = False, Phase 6 is skipped entirely:

📦 PHASE 5: Compression (.json → .json.gz)
────────────────────────────────────────────
  📦 Compressed: 45.2 MB → 6.8 MB (85% reduction)

🧹 CLEANUP: Removing intermediate files...

Scripts Skipped:

fetch_all_indices.py
fetch_etf_data.py

Output Files Not Created:

all_indices_list.json
etf_data_response.json

When to Use

Set to True

Use Cases:

Need indices for benchmarking
Building ETF screener
Analyzing sectoral rotation
Creating index correlation charts

Time Cost: +3-5 seconds

Set to False (Default)

Use Cases:

Only analyzing individual stocks
Don’t need index/ETF reference data
Reducing output file count

Time Saved: ~3-5 seconds

Important Notes

These files are standalone reference datasets. They are NOT merged into all_stocks_fundamental_analysis.json.

If you need index/ETF data in your application, you must:

Set FETCH_OPTIONAL = True
Read all_indices_list.json and etf_data_response.json separately
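
Reading the two files separately could look like the sketch below. This page does not document the internal structure of either file, so the code only parses them with `json.load` and returns whatever is inside. The `load_reference_data` helper is hypothetical.

```python
import json
from pathlib import Path

REFERENCE_FILES = ("all_indices_list.json", "etf_data_response.json")


def load_reference_data(base_dir: str = ".") -> dict:
    """Map each reference file name to its parsed JSON, or None if it is absent."""
    loaded = {}
    for name in REFERENCE_FILES:
        path = Path(base_dir) / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                loaded[name] = json.load(handle)
        else:
            # Likely cause: the pipeline ran with FETCH_OPTIONAL = False.
            loaded[name] = None
    return loaded
```

Returning None for a missing file lets the caller distinguish "not fetched" from "fetched but empty".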

Indices OHLCV vs Indices Metadata:

fetch_all_indices.py (controlled by FETCH_OPTIONAL) → Live snapshot + metadata
fetch_indices_ohlcv.py (controlled by FETCH_OHLCV) → Historical OHLCV data

To get complete index data, set both flags to True.

Storage Impact

State	Storage Used
`FETCH_OPTIONAL = True`	~4 MB (2 JSON files)
`FETCH_OPTIONAL = False`	0 MB

Configuration Matrix

Production (Recommended)

FETCH_OHLCV = True
FETCH_OPTIONAL = False  # Unless you need indices/ETFs

Outputs:

all_stocks_fundamental_analysis.json.gz (with all 86 fields)
ohlcv_data/ directory (~2,775 CSV files)
indices_ohlcv_data/ directory (~194 CSV files)

Total Time: ~4-5 minutes (incremental updates)

Full Dataset (Everything)

FETCH_OHLCV = True
FETCH_OPTIONAL = True

Outputs:

all_stocks_fundamental_analysis.json.gz (with all 86 fields)
ohlcv_data/ directory (~2,775 CSV files)
indices_ohlcv_data/ directory (~194 CSV files)
all_indices_list.json (194 indices)
etf_data_response.json (361 ETFs)

Total Time: ~4-5 minutes (incremental updates)

Testing/Debug (Fast)

FETCH_OHLCV = False
FETCH_OPTIONAL = False

Outputs:

all_stocks_fundamental_analysis.json.gz (14 fields set to 0/N/A)

Total Time: ~2 minutes

Missing Fields: ADR, RVOL, ATH, Earnings Returns, Volume Metrics

Fundamentals Only

FETCH_OHLCV = False
FETCH_OPTIONAL = True   # If you need indices/ETFs

Outputs:

all_stocks_fundamental_analysis.json.gz (14 fields set to 0/N/A)
all_indices_list.json
etf_data_response.json

Total Time: ~2 minutes

How to Change Flags

Step 1: Edit run_full_pipeline.py

Open the file and navigate to lines 64-67:

# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

FETCH_OHLCV = True     # Change to False to skip OHLCV
FETCH_OPTIONAL = False  # Change to True to fetch Indices/ETFs

Step 2: Save and Run

python3 run_full_pipeline.py

Step 3: Verify Output

Check the console for Phase 2.5 (OHLCV) and Phase 6 (Optional):

# If FETCH_OHLCV = True:
📊 PHASE 2.5: OHLCV History (Smart Incremental)

# If FETCH_OPTIONAL = True:
📋 PHASE 6: Optional Standalone Data
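
If the console output is captured to a file, the same check can be scripted. The sketch below only searches for the two phase labels shown above. The `phases_in_log` helper is hypothetical, and capturing the log (for example, by redirecting stdout to a file) is assumed to be handled separately.

```python
def phases_in_log(log_text: str) -> dict:
    """Report whether the Phase 2.5 and Phase 6 headers appear in captured output."""
    return {
        "ohlcv": "PHASE 2.5: OHLCV History" in log_text,
        "optional": "PHASE 6: Optional Standalone Data" in log_text,
    }
```

Comparing the result against the flag values you set is a quick way to confirm that an edit to run_full_pipeline.py took effect.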

Impact Summary Table

Flag Configuration	Time (Incremental)	Storage	Fields in JSON	Extra Files
Both `True`	~4-5 min	~420 MB	86/86 (100%)	Indices, ETFs, OHLCV
`FETCH_OHLCV=True`, `FETCH_OPTIONAL=False`	~4-5 min	~420 MB	86/86 (100%)	OHLCV only
`FETCH_OHLCV=False`, `FETCH_OPTIONAL=True`	~2 min	~4 MB	72/86 (84%)	Indices, ETFs
Both `False`	~2 min	0 MB	72/86 (84%)	None
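
The "Fields in JSON" column follows from simple arithmetic: 86 total fields minus the 14 OHLCV-derived fields leaves 72, and 72/86 rounds to 84%. A quick check, using an illustrative `populated_fields` helper that is not part of the pipeline:

```python
TOTAL_FIELDS = 86   # fields in all_stocks_fundamental_analysis.json
OHLCV_FIELDS = 14   # fields set to 0 or N/A when FETCH_OHLCV = False


def populated_fields(fetch_ohlcv: bool) -> tuple:
    """Return (populated field count, percentage rounded to a whole number)."""
    populated = TOTAL_FIELDS if fetch_ohlcv else TOTAL_FIELDS - OHLCV_FIELDS
    return populated, round(populated / TOTAL_FIELDS * 100)


print(populated_fields(True))   # (86, 100)
print(populated_fields(False))  # (72, 84)
```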

Performance Tips

Optimize for Speed

Set both flags to False during development:

FETCH_OHLCV = False
FETCH_OPTIONAL = False

Time saved: ~2-3 minutes per run

Optimize for Completeness

Set both flags to True for production:

FETCH_OHLCV = True
FETCH_OPTIONAL = True

Benefit: All 86 fields + reference datasets

Scheduled Runs

For daily scheduled runs (cron/Task Scheduler):

FETCH_OHLCV = True   # Incremental updates are fast (~2-5 min)
FETCH_OPTIONAL = True # Only adds 4 seconds

Schedule: Run once per day at 4:00 PM IST (after market close)
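
Schedulers often run on server time rather than IST. The following minimal sketch converts the 4:00 PM IST run time to UTC, assuming the fixed IST offset of UTC+5:30. The `ist_to_utc` helper is illustrative only.

```python
from datetime import date, datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))


def ist_to_utc(run_time: time) -> time:
    """Convert a wall-clock time in IST to the equivalent UTC time."""
    # The date is arbitrary: IST has no daylight saving, so the offset is constant.
    local = datetime.combine(date(2024, 1, 1), run_time, tzinfo=IST)
    return local.astimezone(timezone.utc).time()


print(ist_to_utc(time(16, 0)))  # 10:30:00
```

On a server whose clock is set to UTC, the daily job would therefore be scheduled for 10:30.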

Next Steps

Cleanup Options

Configure CLEANUP_INTERMEDIATE flag

OHLCV Scripts

Deep dive into OHLCV data fetching

Optional Scripts

Understand Indices & ETF data

Pipeline Architecture

Learn how all scripts work together
