Quickstart Guide
Get from zero to a complete market intelligence dataset in under 5 minutes (first run with OHLCV: ~30-40 minutes).
Prerequisites
Python 3.8+
Verify your Python version with python3 --version. Expected output: Python 3.8.x or higher.
Install Dependencies
The pipeline requires only one external package; install it with pip3 before the first run.
Navigate to Source
cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE"
Running the Pipeline
Execute the master runner script:
python3 run_full_pipeline.py
You’ll see output like this:
═══════════════════════════════════════════════════════════
EDL PIPELINE - FULL DATA REFRESH
═══════════════════════════════════════════════════════════
📦 PHASE 1: Core Data (Foundation)
────────────────────────────────────────────
▶ Running fetch_dhan_data.py...
✅ fetch_dhan_data.py (3.2s)
▶ Running fetch_fundamental_data.py...
✅ fetch_fundamental_data.py (45.8s)
▶ Downloading NSE Listing Dates...
✅ NSE Listing Dates downloaded.
📡 PHASE 2: Data Enrichment (Fetching)
────────────────────────────────────────────
▶ Running fetch_company_filings.py...
✅ fetch_company_filings.py (120.5s)
▶ Running fetch_new_announcements.py...
✅ fetch_new_announcements.py (35.2s)
...
📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────────
▶ Running fetch_all_ohlcv.py...
✅ fetch_all_ohlcv.py (180.3s) # First run: ~30 min, subsequent: ~2-5 min
🔬 PHASE 3: Base Analysis (Building Master JSON)
────────────────────────────────────────────
▶ Running bulk_market_analyzer.py...
✅ bulk_market_analyzer.py (8.1s)
✨ PHASE 4: Enrichment (Injecting into Master JSON)
────────────────────────────────────────────
▶ Running advanced_metrics_processor.py...
✅ advanced_metrics_processor.py (12.5s)
▶ Running process_earnings_performance.py...
✅ process_earnings_performance.py (5.2s)
▶ Running enrich_fno_data.py...
✅ enrich_fno_data.py (2.1s)
▶ Running process_market_breadth.py...
✅ process_market_breadth.py (4.3s)
▶ Running add_corporate_events.py...
✅ add_corporate_events.py (6.8s)
📦 PHASE 5: Compression (.json → .json.gz)
────────────────────────────────────────────
📦 Compressed: 68.5 MB → 9.2 MB (87% reduction)
🧹 CLEANUP: Removing intermediate files...
────────────────────────────────────────────
🗑️ Cleaned: 13 files + 2 dirs (58.3 MB freed)
═══════════════════════════════════════════════════════════
PIPELINE COMPLETE
═══════════════════════════════════════════════════════════
Total Time: 245.3s (4.1 min)
Successful: 18/18
Failed: 0/18
📄 Output: all_stocks_fundamental_analysis.json.gz (9.2 MB)
📦 Compression: 68.5 MB → 9.2 MB (87% smaller)
🧹 Only .json.gz + ohlcv_data/ remain. All intermediate data purged.
═══════════════════════════════════════════════════════════
Configuration Options
Edit run_full_pipeline.py to customize pipeline behavior. Open the file and locate the configuration section (lines 57-71):
# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════
# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True
# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False
# Auto-delete intermediate files after pipeline succeeds
# Keeps: all_stocks_fundamental_analysis.json.gz + ohlcv_data/
CLEANUP_INTERMEDIATE = True
Configuration Flags Explained
FETCH_OHLCV
# Default: True
# Controls lifetime OHLCV data fetching
FETCH_OHLCV = True
# ✅ First run: Downloads complete history (~30 min for 2,775 stocks)
# ✅ Subsequent runs: Auto-detects existing data, fetches only new candles (~2-5 min)
# ✅ Enables: ADR, RVOL, ATH, % from ATH, post-earnings returns
FETCH_OHLCV = False
# ⚠️ Skips OHLCV entirely (saves ~30 min on first run)
# ⚠️ Fields will be 0: ADR, RVOL, ATH, % from ATH, Returns since Earnings
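To make the skip behavior concrete, here is a hypothetical sketch of how a runner could gate the OHLCV phase on this flag. The phase names and function are illustrative, not the actual internals of run_full_pipeline.py:

```python
# Hypothetical sketch: gate Phase 2.5 on FETCH_OHLCV. The phase names here
# are illustrative; the real control flow lives in run_full_pipeline.py.
FETCH_OHLCV = True

def run_phases(fetch_ohlcv):
    phases = ["core", "enrichment"]
    if fetch_ohlcv:
        phases.append("ohlcv")  # Phase 2.5: skipped when FETCH_OHLCV = False
    phases += ["base_analysis", "inject", "compress", "cleanup"]
    return phases

print(run_phases(FETCH_OHLCV))
# → ['core', 'enrichment', 'ohlcv', 'base_analysis', 'inject', 'compress', 'cleanup']
```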
Verifying Output
After the pipeline completes, verify the output:
ls -lh all_stocks_fundamental_analysis.json.gz
Expected output:
-rw-r--r-- 1 user user 9.2M Mar 3 18:45 all_stocks_fundamental_analysis.json.gz
Inspecting the Data
Decompress and inspect the JSON:
gunzip -c all_stocks_fundamental_analysis.json.gz | python3 -m json.tool | head -n 50
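The same inspection can be done in Python with the standard gzip and json modules. The sketch below writes and reads back a tiny sample file so it runs anywhere; in practice, point load_stocks at all_stocks_fundamental_analysis.json.gz:

```python
import gzip
import json
import os
import tempfile

def load_stocks(path):
    """Decompress a .json.gz file and return the parsed array of stock records."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Self-contained demo: write a tiny sample file, then read it back.
sample = [{"Symbol": "RELIANCE", "Sector": "Energy"}]
path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump(sample, f)

stocks = load_stocks(path)
print(len(stocks), stocks[0]["Symbol"])  # → 1 RELIANCE
```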
Or use the built-in single stock analyzer:
python3 single_stock_analyzer.py
Enter a symbol (e.g., RELIANCE) to see all 86 fields for that stock.
Understanding the Output Structure
The compressed file contains an array of stock objects; each record's 86 fields fall into six groups: Identity, Fundamentals, Valuation, Technical, Volume & Liquidity, and Events & News. Here is the Identity portion of a sample record:
{
  "Symbol": "RELIANCE",
  "Name": "Reliance Industries Ltd.",
  "Listing Date": "29-Nov-1977",
  "Basic Industry": "Refineries",
  "Sector": "Energy",
  "Index": "NIFTY 50"
}
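Once loaded, the array can be filtered on these fields with an ordinary list comprehension. The records below are made-up illustrations; only the field names ("Symbol", "Sector", "Index") come from the sample record above:

```python
# Illustrative records; in practice this list comes from the decompressed JSON.
stocks = [
    {"Symbol": "RELIANCE", "Sector": "Energy", "Index": "NIFTY 50"},
    {"Symbol": "INFY", "Sector": "Information Technology", "Index": "NIFTY 50"},
    {"Symbol": "ABC", "Sector": "Energy", "Index": "NIFTY 500"},
]

# All NIFTY 50 constituents in the Energy sector.
nifty50_energy = [s["Symbol"] for s in stocks
                  if s["Sector"] == "Energy" and s["Index"] == "NIFTY 50"]
print(nifty50_energy)  # → ['RELIANCE']
```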
Common Workflows
Daily Incremental Update
Run the pipeline once per day after market close. The cron entry below fires at 18:00 on weekdays (Monday-Friday):
# Add to crontab for automated runs
0 18 * * 1-5 cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE" && python3 run_full_pipeline.py
With FETCH_OHLCV = True, subsequent runs complete in ~4-6 minutes (smart incremental updates).
First-Time Setup (Full History)
For initial setup with complete OHLCV history:
# run_full_pipeline.py
FETCH_OHLCV = True # Downloads lifetime data (~30 min)
FETCH_OPTIONAL = True # Include indices/ETFs
CLEANUP_INTERMEDIATE = False # Keep intermediate files for inspection
Quick Refresh (No OHLCV)
For rapid fundamental-only updates:
# run_full_pipeline.py
FETCH_OHLCV = False # Skip OHLCV (~3-4 min total)
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = True
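To switch modes from the command line instead of an editor, the flags can be flipped with sed (GNU sed syntax; assumes each flag appears exactly as NAME = True/False on its own line, as in the config section above). The demo below edits a temporary copy so it is safe to run; point it at run_full_pipeline.py in practice:

```shell
# Demo against a temp copy; substitute run_full_pipeline.py for the real edit.
cfg=$(mktemp)
printf 'FETCH_OHLCV = True\nFETCH_OPTIONAL = False\n' > "$cfg"

# Flip FETCH_OHLCV in place (GNU sed).
sed -i 's/^FETCH_OHLCV = .*/FETCH_OHLCV = False/' "$cfg"
grep FETCH_OHLCV "$cfg"   # → FETCH_OHLCV = False
```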
First Run Duration: With FETCH_OHLCV = True, the first run takes ~30-40 minutes to download lifetime daily candles for 2,775 stocks. Subsequent runs detect existing data and fetch only new candles (~2-5 min).
Troubleshooting
Pipeline Fails at Phase 1
🛑 CRITICAL: fetch_dhan_data.py failed. Cannot continue.
This script produces master_isin_map.json which ALL other scripts need.
Solution: Check your internet connection. fetch_dhan_data.py must succeed before any other scripts run. Retry:
python3 fetch_dhan_data.py
If successful, re-run the full pipeline.
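For flaky connections, a small retry wrapper can save the manual loop (a sketch, not part of the pipeline; substitute the real command, e.g. `retry python3 fetch_dhan_data.py`):

```shell
# retry CMD...: run CMD up to 3 times, pausing between attempts.
retry() {
  local n
  for n in 1 2 3; do
    "$@" && return 0
    echo "Attempt $n failed" >&2
    sleep 1
  done
  return 1
}

retry true && echo "succeeded"
```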
Timeout Errors
⏰ fetch_all_ohlcv.py TIMED OUT (>30 min)
Solution: The default timeout is 1800s (30 min). Edit run_full_pipeline.py line 117 to increase it:
result = subprocess.run(
    [sys.executable, script_path],
    cwd=BASE_DIR,
    text=True,
    timeout=3600,  # Increase to 60 min
)
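Alternatively, the timeout can be handled gracefully: subprocess.run raises subprocess.TimeoutExpired when the limit is hit, and catching it lets a runner report the failure instead of crashing. A self-contained sketch (the real script's error handling may differ):

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout):
    """Run a command; return (ok, reason) instead of raising on timeout."""
    try:
        result = subprocess.run(cmd, text=True, timeout=timeout)
        return result.returncode == 0, "completed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, "timed out"

ok, reason = run_with_timeout([sys.executable, "-c", "pass"], timeout=30)
print(ok, reason)  # → True completed
```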
Script Failures in Phase 4
❌ add_corporate_events.py FAILED (12.3s)
Note: The pipeline continues on enrichment failures (line 126). Check the script output for specific errors. Most Phase 4 failures are non-critical: the affected data will be missing, but the pipeline completes.
Missing OHLCV Data
If the ADR, RVOL, ATH, and % from ATH fields are all 0:
1. Verify FETCH_OHLCV = True in run_full_pipeline.py.
2. Check that the ohlcv_data/ directory exists and contains .csv files (expected: ~2,775 files):
ls ohlcv_data/*.csv | wc -l
3. Re-run fetch_all_ohlcv.py manually:
python3 fetch_all_ohlcv.py
Next Steps
Explore Configuration Configure OHLCV, cleanup, and optional data
Pipeline Architecture Deep dive into 6-phase pipeline execution
API Reference Complete endpoint and field documentation
Output Schema All 86 output fields explained
Auto-Cleanup: By default, CLEANUP_INTERMEDIATE = True deletes all intermediate files after success. Only all_stocks_fundamental_analysis.json.gz and ohlcv_data/ remain. Set to False for debugging.
| Configuration | First Run | Subsequent Runs | Output Size |
| --- | --- | --- | --- |
| Full (OHLCV + Optional) | ~40 min | ~6 min | ~9.2 MB |
| Standard (OHLCV only) | ~35 min | ~4 min | ~9.2 MB |
| Quick (No OHLCV) | ~4 min | ~3 min | ~8.5 MB |
Tested on: 100 Mbps connection, 16 GB RAM, 4-core CPU