Quickstart Guide
Get from zero to a complete market intelligence dataset in under 5 minutes (first run with OHLCV: ~30-40 minutes).
Prerequisites
Python 3.8+
Verify your Python version with python3 --version. Expected output: Python 3.8.x or higher.
Install Dependencies
The pipeline requires only one external package; install it with pip3 before the first run.
Navigate to Source
cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE"
Running the Pipeline
Execute the master runner script:
python3 run_full_pipeline.py
You’ll see output like this:
═══════════════════════════════════════════════════════════
EDL PIPELINE - FULL DATA REFRESH
═══════════════════════════════════════════════════════════
📦 PHASE 1: Core Data (Foundation)
────────────────────────────────────────────
▶ Running fetch_dhan_data.py...
✅ fetch_dhan_data.py (3.2s)
▶ Running fetch_fundamental_data.py...
✅ fetch_fundamental_data.py (45.8s)
▶ Downloading NSE Listing Dates...
✅ NSE Listing Dates downloaded.
📡 PHASE 2: Data Enrichment (Fetching)
────────────────────────────────────────────
▶ Running fetch_company_filings.py...
✅ fetch_company_filings.py (120.5s)
▶ Running fetch_new_announcements.py...
✅ fetch_new_announcements.py (35.2s)
...
📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────────
▶ Running fetch_all_ohlcv.py...
✅ fetch_all_ohlcv.py (180.3s) # First run: ~30 min, subsequent: ~2-5 min
🔬 PHASE 3: Base Analysis (Building Master JSON)
────────────────────────────────────────────
▶ Running bulk_market_analyzer.py...
✅ bulk_market_analyzer.py (8.1s)
✨ PHASE 4: Enrichment (Injecting into Master JSON)
────────────────────────────────────────────
▶ Running advanced_metrics_processor.py...
✅ advanced_metrics_processor.py (12.5s)
▶ Running process_earnings_performance.py...
✅ process_earnings_performance.py (5.2s)
▶ Running enrich_fno_data.py...
✅ enrich_fno_data.py (2.1s)
▶ Running process_market_breadth.py...
✅ process_market_breadth.py (4.3s)
▶ Running add_corporate_events.py...
✅ add_corporate_events.py (6.8s)
📦 PHASE 5: Compression (.json → .json.gz)
────────────────────────────────────────────
📦 Compressed: 68.5 MB → 9.2 MB (87% reduction)
🧹 CLEANUP: Removing intermediate files...
────────────────────────────────────────────
🗑️ Cleaned: 13 files + 2 dirs (58.3 MB freed)
═══════════════════════════════════════════════════════════
PIPELINE COMPLETE
═══════════════════════════════════════════════════════════
Total Time: 245.3s (4.1 min)
Successful: 18/18
Failed: 0/18
📄 Output: all_stocks_fundamental_analysis.json.gz (9.2 MB)
📦 Compression: 68.5 MB → 9.2 MB (87% smaller)
🧹 Only .json.gz + ohlcv_data/ remain. All intermediate data purged.
═══════════════════════════════════════════════════════════
Configuration Options
Edit run_full_pipeline.py to customize pipeline behavior. Open the file and locate the configuration section (lines 57-71):
# ═══════════════════════════════════════════════════
# Configuration
# ═══════════════════════════════════════════════════
# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists, ~30 min first time)
# False = skip entirely (ADR, RVOL, ATH, % from ATH fields will be 0)
FETCH_OHLCV = True
# Set to True to also fetch standalone data (Indices, ETFs)
FETCH_OPTIONAL = False
# Auto-delete intermediate files after pipeline succeeds
# Keeps: all_stocks_fundamental_analysis.json.gz + ohlcv_data/
CLEANUP_INTERMEDIATE = True
Configuration Flags Explained
FETCH_OHLCV
# Default: True
# Controls lifetime OHLCV data fetching
FETCH_OHLCV = True
# ✅ First run: Downloads complete history (~30 min for 2,775 stocks)
# ✅ Subsequent runs: Auto-detects existing data, fetches only new candles (~2-5 min)
# ✅ Enables: ADR, RVOL, ATH, % from ATH, post-earnings returns
FETCH_OHLCV = False
# ⚠️ Skips OHLCV entirely (saves ~30 min on first run)
# ⚠️ Fields will be 0: ADR, RVOL, ATH, % from ATH, Returns since Earnings
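To make the skip behavior concrete, here is a hypothetical sketch of how a runner could gate the OHLCV phase on this flag. The phase names and function are illustrative, not the actual internals of run_full_pipeline.py:

```python
# Hypothetical sketch: gate Phase 2.5 on FETCH_OHLCV. The phase names here
# are illustrative; the real control flow lives in run_full_pipeline.py.
FETCH_OHLCV = True

def run_phases(fetch_ohlcv):
    phases = ["core", "enrichment"]
    if fetch_ohlcv:
        phases.append("ohlcv")  # Phase 2.5: skipped when FETCH_OHLCV = False
    phases += ["base_analysis", "inject", "compress", "cleanup"]
    return phases

print(run_phases(FETCH_OHLCV))
# → ['core', 'enrichment', 'ohlcv', 'base_analysis', 'inject', 'compress', 'cleanup']
```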
Verifying Output
After the pipeline completes, verify the output:
ls -lh all_stocks_fundamental_analysis.json.gz
Expected output:
-rw-r--r-- 1 user user 9.2M Mar 3 18:45 all_stocks_fundamental_analysis.json.gz
Inspecting the Data
Decompress and inspect the JSON:
gunzip -c all_stocks_fundamental_analysis.json.gz | python3 -m json.tool | head -n 50
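The same inspection can be done in Python with the standard gzip and json modules. The sketch below writes and reads back a tiny sample file so it runs anywhere; in practice, point load_stocks at all_stocks_fundamental_analysis.json.gz:

```python
import gzip
import json
import os
import tempfile

def load_stocks(path):
    """Decompress a .json.gz file and return the parsed array of stock records."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Self-contained demo: write a tiny sample file, then read it back.
sample = [{"Symbol": "RELIANCE", "Sector": "Energy"}]
path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump(sample, f)

stocks = load_stocks(path)
print(len(stocks), stocks[0]["Symbol"])  # → 1 RELIANCE
```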
Or use the built-in single stock analyzer:
python3 single_stock_analyzer.py
Enter a symbol (e.g., RELIANCE) to see all 86 fields for that stock.
Understanding the Output Structure
The compressed file contains an array of stock objects; each record's 86 fields fall into six groups: Identity, Fundamentals, Valuation, Technical, Volume & Liquidity, and Events & News. Here is the Identity portion of a sample record:
{
  "Symbol": "RELIANCE",
  "Name": "Reliance Industries Ltd.",
  "Listing Date": "29-Nov-1977",
  "Basic Industry": "Refineries",
  "Sector": "Energy",
  "Index": "NIFTY 50"
}
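Once loaded, the array can be filtered on these fields with an ordinary list comprehension. The records below are made-up illustrations; only the field names ("Symbol", "Sector", "Index") come from the sample record above:

```python
# Illustrative records; in practice this list comes from the decompressed JSON.
stocks = [
    {"Symbol": "RELIANCE", "Sector": "Energy", "Index": "NIFTY 50"},
    {"Symbol": "INFY", "Sector": "Information Technology", "Index": "NIFTY 50"},
    {"Symbol": "ABC", "Sector": "Energy", "Index": "NIFTY 500"},
]

# All NIFTY 50 constituents in the Energy sector.
nifty50_energy = [s["Symbol"] for s in stocks
                  if s["Sector"] == "Energy" and s["Index"] == "NIFTY 50"]
print(nifty50_energy)  # → ['RELIANCE']
```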
Common Workflows
Daily Incremental Update
Run the pipeline once per day after market close. The cron entry below fires at 18:00 on weekdays (Monday-Friday):
# Add to crontab for automated runs
0 18 * * 1-5 cd ~/workspace/source/"DO NOT DELETE EDL PIPELINE" && python3 run_full_pipeline.py
With FETCH_OHLCV = True, subsequent runs complete in ~4-6 minutes (smart incremental updates).
First-Time Setup (Full History)
For initial setup with complete OHLCV history:
# run_full_pipeline.py
FETCH_OHLCV = True # Downloads lifetime data (~30 min)
FETCH_OPTIONAL = True # Include indices/ETFs
CLEANUP_INTERMEDIATE = False # Keep intermediate files for inspection
Quick Refresh (No OHLCV)
For rapid fundamental-only updates:
# run_full_pipeline.py
FETCH_OHLCV = False # Skip OHLCV (~3-4 min total)
FETCH_OPTIONAL = False
CLEANUP_INTERMEDIATE = True
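To switch modes from the command line instead of an editor, the flags can be flipped with sed (GNU sed syntax; assumes each flag appears exactly as NAME = True/False on its own line, as in the config section above). The demo below edits a temporary copy so it is safe to run; point it at run_full_pipeline.py in practice:

```shell
# Demo against a temp copy; substitute run_full_pipeline.py for the real edit.
cfg=$(mktemp)
printf 'FETCH_OHLCV = True\nFETCH_OPTIONAL = False\n' > "$cfg"

# Flip FETCH_OHLCV in place (GNU sed).
sed -i 's/^FETCH_OHLCV = .*/FETCH_OHLCV = False/' "$cfg"
grep FETCH_OHLCV "$cfg"   # → FETCH_OHLCV = False
```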
First Run Duration: With FETCH_OHLCV = True, the first run takes ~30-40 minutes to download lifetime daily candles for 2,775 stocks. Subsequent runs detect existing data and fetch only new candles (~2-5 min).
Troubleshooting
Pipeline Fails at Phase 1
🛑 CRITICAL: fetch_dhan_data.py failed. Cannot continue.
This script produces master_isin_map.json which ALL other scripts need.
Solution: Check your internet connection. fetch_dhan_data.py must succeed before any other scripts run. Retry:
python3 fetch_dhan_data.py
If successful, re-run the full pipeline.
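For flaky connections, a small retry wrapper can save the manual loop (a sketch, not part of the pipeline; substitute the real command, e.g. `retry python3 fetch_dhan_data.py`):

```shell
# retry CMD...: run CMD up to 3 times, pausing between attempts.
retry() {
  local n
  for n in 1 2 3; do
    "$@" && return 0
    echo "Attempt $n failed" >&2
    sleep 1
  done
  return 1
}

retry true && echo "succeeded"
```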
Timeout Errors
⏰ fetch_all_ohlcv.py TIMED OUT (>30 min)
Solution: The default timeout is 1800s (30 min). Edit run_full_pipeline.py line 117 to increase it:
result = subprocess.run(
    [sys.executable, script_path],
    cwd=BASE_DIR,
    text=True,
    timeout=3600,  # Increase to 60 min
)
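Alternatively, the timeout can be handled gracefully: subprocess.run raises subprocess.TimeoutExpired when the limit is hit, and catching it lets a runner report the failure instead of crashing. A self-contained sketch (the real script's error handling may differ):

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout):
    """Run a command; return (ok, reason) instead of raising on timeout."""
    try:
        result = subprocess.run(cmd, text=True, timeout=timeout)
        return result.returncode == 0, "completed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return False, "timed out"

ok, reason = run_with_timeout([sys.executable, "-c", "pass"], timeout=30)
print(ok, reason)  # → True completed
```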
Script Failures in Phase 4
❌ add_corporate_events.py FAILED (12.3s)
Note: The pipeline continues on enrichment failures (line 126). Check the script output for specific errors. Most Phase 4 failures are non-critical: the affected data will be missing, but the pipeline completes.
Missing OHLCV Data
If the ADR, RVOL, ATH, and % from ATH fields are all 0:
1. Verify FETCH_OHLCV = True in run_full_pipeline.py.
2. Check that the ohlcv_data/ directory exists and contains .csv files (expected: ~2,775 files):
ls ohlcv_data/*.csv | wc -l
3. Re-run fetch_all_ohlcv.py manually:
python3 fetch_all_ohlcv.py
Next Steps
Explore Configuration Configure OHLCV, cleanup, and optional data
Pipeline Architecture Deep dive into 6-phase pipeline execution
API Reference Complete endpoint and field documentation
Output Schema All 86 output fields explained
Auto-Cleanup: By default, CLEANUP_INTERMEDIATE = True deletes all intermediate files after success. Only all_stocks_fundamental_analysis.json.gz and ohlcv_data/ remain. Set to False for debugging.
| Configuration | First Run | Subsequent Runs | Output Size |
| --- | --- | --- | --- |
| Full (OHLCV + Optional) | ~40 min | ~6 min | ~9.2 MB |
| Standard (OHLCV only) | ~35 min | ~4 min | ~9.2 MB |
| Quick (No OHLCV) | ~4 min | ~3 min | ~8.5 MB |
Tested on: 100 Mbps connection, 16 GB RAM, 4-core CPU