Quickstart Guide

Get from zero to a complete market intelligence dataset in under 5 minutes (first run with OHLCV: ~30-40 minutes).

Prerequisites

1. Python 3.8+

Verify your Python version:
python3 --version
Expected output: Python 3.8.x or higher
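The same check can be done programmatically; a minimal sketch that fails fast when the interpreter is too old:

```python
import sys

def check_python_version(minimum=(3, 8)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum

# The pipeline targets Python 3.8+; abort early with a clear message otherwise.
if not check_python_version():
    raise SystemExit(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is too old; "
        "the pipeline requires 3.8+"
    )
```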
2. Install Dependencies

The pipeline requires only one external package:
pip install requests
3. Navigate to Source

cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE"

Running the Pipeline

Execute the master runner script:
python3 run_full_pipeline.py
You’ll see output like this:
═══════════════════════════════════════════════════════════
  EDL PIPELINE - FULL DATA REFRESH
═══════════════════════════════════════════════════════════

📦 PHASE 1: Core Data (Foundation)
────────────────────────────────────────────
 Running fetch_dhan_data.py...
 fetch_dhan_data.py (3.2s)
 Running fetch_fundamental_data.py...
 fetch_fundamental_data.py (45.8s)
 Downloading NSE Listing Dates...
 NSE Listing Dates downloaded.

📡 PHASE 2: Data Enrichment (Fetching)
────────────────────────────────────────────
 Running fetch_company_filings.py...
 fetch_company_filings.py (120.5s)
 Running fetch_new_announcements.py...
 fetch_new_announcements.py (35.2s)
  ...

📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────────
 Running fetch_all_ohlcv.py...
 fetch_all_ohlcv.py (180.3s)  # First run: ~30 min, subsequent: ~2-5 min

🔬 PHASE 3: Base Analysis (Building Master JSON)
────────────────────────────────────────────
 Running bulk_market_analyzer.py...
 bulk_market_analyzer.py (8.1s)

 PHASE 4: Enrichment (Injecting into Master JSON)
────────────────────────────────────────────
 Running advanced_metrics_processor.py...
 advanced_metrics_processor.py (12.5s)
 Running process_earnings_performance.py...
 process_earnings_performance.py (5.2s)
 Running enrich_fno_data.py...
 enrich_fno_data.py (2.1s)
 Running process_market_breadth.py...
 process_market_breadth.py (4.3s)
 Running add_corporate_events.py...
 add_corporate_events.py (6.8s)

📦 PHASE 5: Compression (.json → .json.gz)
────────────────────────────────────────────
  📦 Compressed: 68.5 MB → 9.2 MB (87% reduction)

🧹 CLEANUP: Removing intermediate files...
────────────────────────────────────────────
  🗑️  Cleaned: 13 files + 2 dirs (58.3 MB freed)

═══════════════════════════════════════════════════════════
  PIPELINE COMPLETE
═══════════════════════════════════════════════════════════
  Total Time:  245.3s (4.1 min)
  Successful:  18/18
  Failed:      0/18

  📄 Output: all_stocks_fundamental_analysis.json.gz (9.2 MB)
  📦 Compression: 68.5 MB → 9.2 MB (87% smaller)
  🧹 Only .json.gz + ohlcv_data/ remain. All intermediate data purged.
═══════════════════════════════════════════════════════════

Configuration Options

Edit run_full_pipeline.py to customize pipeline behavior. Open the file and locate the configuration section (lines 57-71):
# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════

# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True

# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False

# Auto-delete intermediate files after pipeline succeeds
# Keeps: all_stocks_fundamental_analysis.json.gz + ohlcv_data/
CLEANUP_INTERMEDIATE = True

Configuration Flags Explained

# Default: True
# Controls lifetime OHLCV data fetching

FETCH_OHLCV = True
# ✅ First run: Downloads complete history (~30 min for 2,775 stocks)
# ✅ Subsequent runs: Auto-detects existing data, fetches only new candles (~2-5 min)
# ✅ Enables: ADR, RVOL, ATH, % from ATH, post-earnings returns

FETCH_OHLCV = False
# ⚠️ Skips OHLCV entirely (saves ~30 min on first run)
# ⚠️ Fields will be 0: ADR, RVOL, ATH, % from ATH, Returns since Earnings
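The incremental behavior can be pictured as reading the date of the newest stored candle and requesting only newer data. The following is a hypothetical sketch, not the actual fetch_all_ohlcv.py implementation; the `last_candle_date` helper and the ISO `date` CSV column are assumptions for illustration:

```python
import csv
from datetime import date, timedelta
from pathlib import Path

def last_candle_date(csv_path):
    """Return the date of the newest candle in a per-stock OHLCV CSV,
    or None when the file does not exist (first run: fetch full history).
    Assumes an ISO-formatted 'date' column, which is an illustration,
    not the pipeline's documented schema."""
    path = Path(csv_path)
    if not path.exists():
        return None
    last_row = None
    with path.open(newline="") as f:
        for last_row in csv.DictReader(f):
            pass  # iterate to the final row
    return date.fromisoformat(last_row["date"]) if last_row else None

def fetch_window(csv_path, today):
    """Decide the date range to request: full history, or only new candles."""
    last = last_candle_date(csv_path)
    start = date(1990, 1, 1) if last is None else last + timedelta(days=1)
    return start, today
```

On a first run `last_candle_date` returns None and the full window is fetched; on subsequent runs only the few days since the last stored candle are requested, which is why daily refreshes finish in minutes.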

Verifying Output

After the pipeline completes, verify the output:
ls -lh all_stocks_fundamental_analysis.json.gz
Expected output:
-rw-r--r-- 1 user user 9.2M Mar  3 18:45 all_stocks_fundamental_analysis.json.gz

Inspecting the Data

Decompress and inspect the JSON:
gunzip -c all_stocks_fundamental_analysis.json.gz | python3 -m json.tool | head -n 50
Or use the built-in single stock analyzer:
python3 single_stock_analyzer.py
Enter a symbol (e.g., RELIANCE) to see all 86 fields for that stock.

Understanding the Output Structure

The compressed file contains an array of stock objects. Here’s a sample record:
{
  "Symbol": "RELIANCE",
  "Name": "Reliance Industries Ltd.",
  "Listing Date": "29-Nov-1977",
  "Basic Industry": "Refineries",
  "Sector": "Energy",
  "Index": "NIFTY 50"
}
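To work with the records programmatically, the .json.gz file can be read directly with the standard library. A minimal sketch; the field names match the sample record above, and the top-level array shape is as described:

```python
import gzip
import json

def load_dataset(path):
    """Load the compressed dataset into a list of stock dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

def by_sector(records, sector):
    """Filter records on the 'Sector' field shown in the sample record."""
    return [r for r in records if r.get("Sector") == sector]
```

For example, `by_sector(load_dataset("all_stocks_fundamental_analysis.json.gz"), "Energy")` returns every Energy-sector stock without decompressing the file to disk.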

Common Workflows

Daily Incremental Update

Run the pipeline once per day after market close:
# Add to crontab for automated runs
0 18 * * 1-5 cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE" && python3 run_full_pipeline.py
With FETCH_OHLCV = True, subsequent runs complete in ~4-6 minutes (smart incremental updates).

First-Time Setup (Full History)

For initial setup with complete OHLCV history:
# run_full_pipeline.py
FETCH_OHLCV = True  # Downloads lifetime data (~30 min)
FETCH_OPTIONAL = True  # Include indices/ETFs
CLEANUP_INTERMEDIATE = False  # Keep intermediate files for inspection

Quick Refresh (No OHLCV)

For rapid fundamental-only updates:
# run_full_pipeline.py
FETCH_OHLCV = False  # Skip OHLCV (~3-4 min total)
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = True
First Run Duration: With FETCH_OHLCV = True, the first run takes ~30-40 minutes to download lifetime daily candles for 2,775 stocks. Subsequent runs detect existing data and fetch only new candles (~2-5 min).

Troubleshooting

Pipeline Fails at Phase 1

🛑 CRITICAL: fetch_dhan_data.py failed. Cannot continue.
   This script produces master_isin_map.json which ALL other scripts need.
Solution: Check your internet connection. fetch_dhan_data.py must succeed before any other scripts run. Retry:
python3 fetch_dhan_data.py
If successful, re-run the full pipeline.
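If the failure is transient (a flaky connection, a brief API outage), a small retry wrapper can save a manual re-run. A sketch, assuming only that the script exits nonzero on failure:

```python
import subprocess
import sys
import time

def run_with_retry(args, attempts=3, delay=5.0):
    """Run a command, retrying on a nonzero exit code with a fixed delay.
    Returns True on the first success, False if every attempt fails."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(args)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False

# Example: run_with_retry([sys.executable, "fetch_dhan_data.py"])
```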

Timeout Errors

 fetch_all_ohlcv.py TIMED OUT (>30 min)
Solution: The default timeout is 1800s (30 min). Edit run_full_pipeline.py line 117 to increase:
result = subprocess.run(
    [sys.executable, script_path],
    cwd=BASE_DIR,
    text=True,
    timeout=3600  # Increase to 60 min
)
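Besides raising the limit, the timeout can be caught so the runner reports it instead of crashing. A sketch using `subprocess.TimeoutExpired` (the surrounding runner code and reporting are assumptions):

```python
import subprocess
import sys

def run_script(args, timeout):
    """Run a script with a hard timeout.
    Returns (succeeded, timed_out); subprocess.run kills the child
    process before raising TimeoutExpired."""
    try:
        result = subprocess.run(args, text=True, timeout=timeout)
        return result.returncode == 0, False
    except subprocess.TimeoutExpired:
        return False, True
```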

Script Failures in Phase 4

 add_corporate_events.py FAILED (12.3s)
Note: The pipeline continues on enrichment failures (line 126). Check the script output for specific errors. Most Phase 4 failures are non-critical (data will be missing but pipeline completes).

Missing OHLCV Data

If ADR, RVOL, ATH fields are all 0:
  1. Verify FETCH_OHLCV = True in run_full_pipeline.py
  2. Check if ohlcv_data/ directory exists and contains .csv files:
    ls ohlcv_data/ | wc -l
    
    Expected: ~2775 files
  3. Re-run fetch_all_ohlcv.py manually:
    python3 fetch_all_ohlcv.py
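The directory check in step 2 can also be scripted; a minimal sketch that counts the per-stock CSVs (the ~2,775 expectation comes from the stock universe described above):

```python
from pathlib import Path

def count_ohlcv_files(directory="ohlcv_data", expected=2775):
    """Count .csv files in the OHLCV directory and flag a shortfall.
    Returns (count, count_meets_expectation)."""
    n = len(list(Path(directory).glob("*.csv")))
    return n, n >= expected

# Example: n, ok = count_ohlcv_files()
```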
    

Next Steps

Explore Configuration

Configure OHLCV, cleanup, and optional data

Pipeline Architecture

Deep dive into 6-phase pipeline execution

API Reference

Complete endpoint and field documentation

Output Schema

All 86 output fields explained

Auto-Cleanup: By default, CLEANUP_INTERMEDIATE = True deletes all intermediate files after success. Only all_stocks_fundamental_analysis.json.gz and ohlcv_data/ remain. Set to False for debugging.

Performance Benchmarks

| Configuration | First Run | Subsequent Runs | Output Size |
| --- | --- | --- | --- |
| Full (OHLCV + Optional) | ~40 min | ~6 min | ~9.2 MB |
| Standard (OHLCV only) | ~35 min | ~4 min | ~9.2 MB |
| Quick (No OHLCV) | ~4 min | ~3 min | ~8.5 MB |
Tested on: 100 Mbps connection, 16 GB RAM, 4-core CPU
