Overview
The EDL (Exchange Data Layer) Pipeline is a comprehensive data integration system that processes market data from Dhan/NSE endpoints into a unified 86-field schema. The pipeline executes 16 core scripts in strict dependency order and produces all_stocks_fundamental_analysis.json.gz in approximately 4 minutes.
Single Command Execution:
Architecture Diagram
Phase 1: Core Data (Foundation)
Purpose
Establishes the foundation by fetching the complete market dataset and fundamental financial data for all stocks.
Scripts Executed
1. fetch_dhan_data.py
- Output: dhan_data_response.json + master_isin_map.json
- Records: ~2,775 stocks
- Critical Dependency: Phase 2 scripts that fetch per-stock data depend on master_isin_map.json
2. fetch_fundamental_data.py
- Output: fundamental_data.json (35 MB)
- Contains: Quarterly results, ratios, P&L statements, balance sheets
- Timeout: 30s per ISIN
3. NSE Listing Dates CSV Download
- Source: https://nsearchives.nseindia.com/content/equities/EQUITY_L.csv
- Output: nse_equity_list.csv
- Used by: Phase 3 analysis to populate the “Listing Date” field
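As a rough illustration of how the listing-date lookup could be built from this CSV, the sketch below parses an in-memory sample. The column names (`SYMBOL`, ` DATE OF LISTING` with a leading space), the `DD-MON-YYYY` date format, the sample row, and the function name `load_listing_dates` are assumptions for illustration, not taken from the pipeline itself.

```python
import csv
import io
from datetime import datetime

# Assumed header layout: some column names carry a leading space.
SAMPLE_CSV = """SYMBOL,NAME OF COMPANY, SERIES, DATE OF LISTING
EXAMPLECO,Example Company Limited,EQ,06-OCT-2008
"""


def load_listing_dates(csv_text: str) -> dict:
    """Map each symbol to its listing date as an ISO string (YYYY-MM-DD)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    listing_dates = {}
    for row in reader:
        # Strip stray whitespace from both keys and values.
        clean = {k.strip(): (v or "").strip() for k, v in row.items() if k}
        raw = clean.get("DATE OF LISTING", "")
        if not raw:
            continue  # skip rows without a listing date
        parsed = datetime.strptime(raw, "%d-%b-%Y")
        listing_dates[clean["SYMBOL"]] = parsed.date().isoformat()
    return listing_dates


listing_dates = load_listing_dates(SAMPLE_CSV)
print(listing_dates)
```

In a real run the text would come from nse_equity_list.csv on disk rather than an inline sample.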
Phase 2: Data Enrichment (Fetching)
Purpose
Enriches the core dataset with regulatory filings, news, technical indicators, corporate actions, and market metadata.
Scripts Executed (11 scripts)
| Script | Output | Dependencies |
|---|---|---|
| fetch_company_filings.py | company_filings/{SYMBOL}_filings.json | master_isin_map.json |
| fetch_new_announcements.py | all_company_announcements.json | master_isin_map.json |
| fetch_advanced_indicators.py | advanced_indicator_data.json (8.3 MB) | master_isin_map.json (requires Sid) |
| fetch_market_news.py | market_news/{SYMBOL}_news.json | master_isin_map.json |
| fetch_corporate_actions.py | upcoming/history_corporate_actions.json | None (fetches via date filters) |
| fetch_surveillance_lists.py | nse_asm_list.json, nse_gsm_list.json | None (Google Sheets Gviz) |
| fetch_circuit_stocks.py | upper/lower_circuit_stocks.json | None |
| fetch_bulk_block_deals.py | bulk_block_deals.json | None |
| fetch_incremental_price_bands.py | incremental_price_bands.json | None (NSE CSV) |
| fetch_complete_price_bands.py | complete_price_bands.json | None (NSE CSV) |
| fetch_all_indices.py | all_indices_list.json | None |
Threading & Performance
- Company Filings: 20 threads
- Announcements: 40 threads
- Advanced Indicators: 50 threads
- Market News: 15 threads
All Phase 2 scripts can partially fail without stopping the pipeline. The pipeline continues and marks failed scripts for review in the final report.
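The threaded, failure-tolerant fetching described above might look roughly like the sketch below. It uses the standard-library `ThreadPoolExecutor`; the helper names `fetch_all` and `fake_fetch` are hypothetical, and the real scripts may structure their workers differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(symbols, fetch_one, max_workers=20):
    """Run fetch_one(symbol) on a thread pool; collect results and failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception as exc:
                # One bad symbol is recorded, not allowed to stop the rest.
                failures[symbol] = str(exc)
    return results, failures


def fake_fetch(symbol):
    """Hypothetical stand-in for a network call."""
    if symbol == "BAD":
        raise RuntimeError("simulated timeout")
    return {"symbol": symbol, "filings": []}


# 20 workers mirrors the Company Filings setting listed above.
results, failures = fetch_all(["AAA", "BAD", "CCC"], fake_fetch, max_workers=20)
print(sorted(results), failures)
```

Collecting failures in a dictionary instead of raising keeps partial results usable and gives the final report something to list.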
Phase 2.5: OHLCV History (Smart Incremental)
Purpose
Fetches historical daily OHLCV (Open, High, Low, Close, Volume) data for all stocks and indices. This phase is optional and controlled by the FETCH_OHLCV flag.
Scripts Executed
1. fetch_all_ohlcv.py (Stocks)
- Output: ohlcv_data/{SYMBOL}.csv
- Threads: 15
- Start Date: October 31, 1976 (timestamp: 215634600)
- Duration: ~2-5 min (incremental), ~30 min (first run)
- Interval: Daily candles
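To see how the start timestamp relates to the stated date, the snippet below decodes 215634600 as Unix seconds (an assumption about the unit) in both UTC and Indian Standard Time. The same instant is the evening of October 31, 1976 in UTC and midnight at the start of November 1, 1976 in IST.

```python
from datetime import datetime, timedelta, timezone

START_TS = 215634600  # value quoted in the list above, assumed to be Unix seconds

IST = timezone(timedelta(hours=5, minutes=30))  # Indian Standard Time (UTC+05:30)

start_utc = datetime.fromtimestamp(START_TS, tz=timezone.utc)
start_ist = start_utc.astimezone(IST)

print(start_utc.isoformat())  # 1976-10-31T18:30:00+00:00
print(start_ist.isoformat())  # 1976-11-01T00:00:00+05:30
```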
2. fetch_indices_ohlcv.py (Indices)
- Output: ohlcv_data/indices/{INDEX}.csv
- Optimized: High-speed specialized fetcher
Configuration
Phase 3: Base Analysis (Building Master JSON)
Purpose
Builds the foundational all_stocks_fundamental_analysis.json structure by combining fundamental data, technical indicators, and market metadata.
Script Executed
bulk_market_analyzer.py
- Input Files:
  - fundamental_data.json
  - dhan_data_response.json
  - advanced_indicator_data.json
  - nse_equity_list.csv
- Output: all_stocks_fundamental_analysis.json (base structure)
- Fields Created: ~60 fields (see Output Schema)
Processing Logic
- Load Data Sources
  - Fundamental data (quarterly financials)
  - Technical data (price, returns, RSI)
  - Advanced indicators (Pivots, EMA/SMA)
  - Listing dates from NSE CSV
- Calculate Metrics
  - QoQ/YoY % changes (Net Profit, EPS, Sales, OPM)
  - 5-year Sales CAGR
  - Valuation ratios (P/E, PEG, Forward P/E, D/E)
  - Shareholding changes (FII, DII)
  - Free float calculation
- Assemble JSON
  - 2,775 stock records
  - 60+ fields per stock
  - ~40 MB uncompressed JSON
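The percentage-change and CAGR metrics listed under Calculate Metrics follow standard formulas, sketched below. The function names, the sample figures, and the choice to divide by the absolute value of the base (so a negative prior-period profit still yields a sensible sign) are illustrative assumptions, not the analyzer's confirmed behaviour.

```python
from typing import Optional


def pct_change(current: float, previous: float) -> Optional[float]:
    """Percentage change from previous to current; None if the base is zero."""
    if previous == 0:
        return None
    return round((current - previous) / abs(previous) * 100, 2)


def cagr(start: float, end: float, years: float) -> Optional[float]:
    """Compound annual growth rate in percent; None for non-positive inputs."""
    if start <= 0 or end <= 0 or years <= 0:
        return None
    return round(((end / start) ** (1 / years) - 1) * 100, 2)


# Hypothetical quarterly net profit, oldest to newest (five quarters).
net_profit = [100.0, 110.0, 120.0, 130.0, 125.0]

qoq = pct_change(net_profit[-1], net_profit[-2])  # vs previous quarter
yoy = pct_change(net_profit[-1], net_profit[-5])  # vs same quarter last year
sales_cagr_5y = cagr(1000.0, 1610.51, 5)          # hypothetical sales figures

print(qoq, yoy, sales_cagr_5y)
```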
Phase 4: Field Injection (Order Matters!)
Purpose
Enriches the master JSON with advanced metrics, earnings tracking, F&O data, market breadth, and event markers. Execution order is critical because each script modifies the same JSON file.
Scripts Executed (7 scripts, sequential)
1. advanced_metrics_processor.py
Fields Injected:
- 5/14/20/30 Days MA ADR(%): Average Daily Range
- RVOL: Relative Volume (vs 20-day avg)
- % from ATH: Distance from All-Time High
- Daily Rupee Turnover 20/50/100(Cr.)
- 200 Days EMA Volume
- % from 52W High 200 Days EMA Volume
ohlcv_data/*.csv
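A minimal sketch of three of these metrics, computed from plain lists of daily values, is shown below. The formulas are common definitions rather than the processor's confirmed ones: ADR is taken as the mean of high/low minus one, and RVOL compares the latest volume with the average of the preceding window (excluding the latest day). All function names and sample numbers are hypothetical.

```python
from statistics import mean


def adr_percent(highs, lows, window=20):
    """Average Daily Range: mean of high/low over the last `window` days, in %."""
    ratios = [h / l for h, l in zip(highs[-window:], lows[-window:])]
    return (mean(ratios) - 1) * 100


def rvol(volumes, window=20):
    """Latest volume divided by the mean of the `window` days before it."""
    baseline = mean(volumes[-window - 1:-1])
    return volumes[-1] / baseline


def pct_from_ath(highs, last_close):
    """Distance of the last close from the all-time high, in % (<= 0 at or below ATH)."""
    return (last_close / max(highs) - 1) * 100


# Hypothetical four-day series, oldest to newest.
highs = [102.0, 105.0, 110.0, 108.0]
lows = [100.0, 100.0, 100.0, 100.0]
volumes = [100, 100, 100, 300]

adr_2d = adr_percent(highs, lows, window=2)
rvol_3d = rvol(volumes, window=3)
ath_gap = pct_from_ath(highs, last_close=99.0)

print(round(adr_2d, 2), round(rvol_3d, 2), round(ath_gap, 2))
```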
2. process_earnings_performance.py
Fields Injected:
- Quarterly Results Date
- Returns since Earnings(%)
- Max Returns since Earnings(%)
company_filings/*.json, ohlcv_data/*.csv
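One way the two return fields could be derived from a results date and a daily close series is sketched below. The base price is assumed to be the first close on or after the results date; the actual script may use a different reference (for example the prior close). The function name and sample data are hypothetical.

```python
def returns_since_earnings(closes, earnings_date):
    """closes: list of (ISO date, close) sorted ascending.

    Returns (return_pct, max_return_pct) measured from the first close on or
    after earnings_date, or None if no trading day falls in that range.
    """
    after = [close for day, close in closes if day >= earnings_date]
    if not after:
        return None
    base = after[0]
    current = round((after[-1] / base - 1) * 100, 2)
    peak = round((max(after) / base - 1) * 100, 2)
    return current, peak


# Hypothetical closes around a results date of 2024-01-15.
closes = [
    ("2024-01-10", 95.0),
    ("2024-01-15", 100.0),
    ("2024-01-16", 120.0),
    ("2024-01-17", 110.0),
]

result = returns_since_earnings(closes, "2024-01-15")
print(result)
```

ISO-formatted date strings compare correctly as plain strings, which keeps the sketch free of date parsing.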
3. enrich_fno_data.py
Fields Injected:
- F&O Flag (Yes/No)
- Lot Size
- Next Expiry Date
fno_lot_sizes_cleaned.json, fno_expiry_calendar.json
4. process_market_breadth.py
Output: market_breadth.csv
Generates:
- Sector-wise advance/decline ratios
- 52-week high/low distribution
- SMA 200 Above/Below counts
5. process_historical_market_breadth.py
Output: Historical market breadth line charts
Purpose: Time-series analysis of market breadth metrics
6. add_corporate_events.py (MUST BE LAST)
Fields Injected:
- Event Markers: Event icons with triggers
- Recent Announcements: Top 5 regulatory filings
- News Feed: Top 5 media news items
- Circuit Limit: Current price band
Phase 5: Compression
Purpose
Compresses output files to reduce storage and transfer bandwidth.
Process
Compression Stats
- Algorithm: gzip (level 9)
- Typical Ratio: 70-80% size reduction
- Example: 40 MB → 8-10 MB
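A gzip level 9 compression step can be sketched with the standard library as follows. The helper names `gzip_file` and `reduction_percent` are hypothetical, and the demo compresses a small throwaway file rather than the real 40 MB output.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(src: Path, level: int = 9) -> Path:
    """Write src + '.gz' next to src using the given gzip level."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def reduction_percent(original_size: float, compressed_size: float) -> float:
    """Size reduction in percent, rounded to one decimal place."""
    return round((1 - compressed_size / original_size) * 100, 1)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "all_stocks_fundamental_analysis.json"
    src.write_bytes(b'{"stocks": []}' * 1000)  # repetitive placeholder payload
    dst = gzip_file(src)
    round_trip_ok = gzip.decompress(dst.read_bytes()) == src.read_bytes()
    smaller = dst.stat().st_size < src.stat().st_size

# The 40 MB -> 8-10 MB example above corresponds to an 80-75% reduction.
print(round_trip_ok, smaller, reduction_percent(40, 10), reduction_percent(40, 8))
```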
Phase 6: Optional Standalone Data
Purpose
Fetches additional market data that is NOT included in the master JSON but available for separate analysis.
Scripts (when FETCH_OPTIONAL = True)
| Script | Output | Records |
|---|---|---|
| fetch_all_indices.py | all_indices_list.json | ~194 indices |
| fetch_etf_data.py | etf_data_response.json | ~361 ETFs |
These outputs are standalone and not consumed by the main pipeline. Use them for custom analysis or separate dashboards.
Cleanup & Intermediate Files
Auto-Cleanup (when CLEANUP_INTERMEDIATE = True)
After pipeline success, the following intermediate files are automatically deleted:
Files Removed:
- master_isin_map.json
- dhan_data_response.json
- fundamental_data.json
- advanced_indicator_data.json
- all_company_announcements.json
- upcoming/history_corporate_actions.json
- nse_asm_list.json, nse_gsm_list.json
- bulk_block_deals.json
- upper/lower_circuit_stocks.json
- incremental/complete_price_bands.json
- nse_equity_list.csv
- all_stocks_fundamental_analysis.json (raw, before compression)
- company_filings/
- market_news/

- all_stocks_fundamental_analysis.json.gz ✅
- ohlcv_data/ directory ✅
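The cleanup step could be sketched as below. This is an illustration only: the list holds just the file names that are unambiguous in the list above (the slash-abbreviated entries and the directories are left out), and the `cleanup` function and its `enabled` parameter are hypothetical stand-ins for the CLEANUP_INTERMEDIATE behaviour.

```python
import tempfile
from pathlib import Path

# Subset of the intermediate files named above (abbreviated entries omitted).
INTERMEDIATE_FILES = [
    "master_isin_map.json",
    "dhan_data_response.json",
    "fundamental_data.json",
    "advanced_indicator_data.json",
    "all_company_announcements.json",
    "nse_asm_list.json",
    "nse_gsm_list.json",
    "bulk_block_deals.json",
    "nse_equity_list.csv",
    "all_stocks_fundamental_analysis.json",
]


def cleanup(workdir: Path, enabled: bool = True) -> list:
    """Delete known intermediate files in workdir; return the names removed."""
    removed = []
    if not enabled:
        return removed
    for name in INTERMEDIATE_FILES:
        path = workdir / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    for name in INTERMEDIATE_FILES + ["all_stocks_fundamental_analysis.json.gz"]:
        (workdir / name).write_text("{}")
    (workdir / "ohlcv_data").mkdir()
    removed = cleanup(workdir, enabled=True)
    survivors = sorted(p.name for p in workdir.iterdir())

print(len(removed), survivors)
```

Working from an explicit allow-list of names means the compressed output and the ohlcv_data directory are never candidates for deletion.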
Cleanup Stats
Typically frees 50-100 MB of disk space.
Configuration Flags
FETCH_OHLCV (Default: True)
FETCH_OPTIONAL (Default: False)
CLEANUP_INTERMEDIATE (Default: True)
Execution Timeline
Typical Runtime (with FETCH_OHLCV = True, incremental)
| Phase | Duration | Scripts |
|---|---|---|
| Phase 1 | ~30s | 2 scripts |
| Phase 2 | ~90s | 11 scripts (parallel threading) |
| Phase 2.5 | ~120s | 2 scripts (incremental OHLCV) |
| Phase 3 | ~20s | 1 script |
| Phase 4 | ~60s | 7 scripts (sequential) |
| Phase 5 | ~5s | Compression |
| Total | ~5 min | 23 scripts |
First-Time Run (with FETCH_OHLCV = True, full download)
- Phase 2.5 Duration: ~30 minutes (lifetime OHLCV for 2,775 stocks)
- Total Duration: ~35 minutes
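As a quick arithmetic check, the per-phase estimates from the table can be summed directly. The dictionary below simply restates the table; the variable names and the 30-minute first-run figure converted to seconds are the only additions.

```python
# Approximate per-phase durations in seconds, taken from the table above.
PHASE_SECONDS = {
    "Phase 1": 30,
    "Phase 2": 90,
    "Phase 2.5": 120,
    "Phase 3": 20,
    "Phase 4": 60,
    "Phase 5": 5,
}

incremental_total = sum(PHASE_SECONDS.values())
without_ohlcv = incremental_total - PHASE_SECONDS["Phase 2.5"]
# First run: swap the incremental OHLCV estimate for the ~30 minute full download.
first_run_total = without_ohlcv + 30 * 60

print(incremental_total, round(incremental_total / 60, 1))  # 325 5.4
print(without_ohlcv, round(without_ohlcv / 60, 1))          # 205 3.4
print(first_run_total, round(first_run_total / 60, 1))      # 2005 33.4
```

The sums land near the quoted totals: a little over 5 minutes for an incremental run and a little under 35 minutes for a first run.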
Error Handling & Resilience
Critical Failures (Pipeline Stops)
- fetch_dhan_data.py fails → No master_isin_map.json
- bulk_market_analyzer.py fails → No base JSON for Phase 4
Non-Critical Failures (Pipeline Continues)
- Any Phase 2 enrichment script can fail
- Phase 4 scripts continue on error (fields may be missing)
- Final report shows all failed scripts
Timeout Policy
- Script Timeout: 30 minutes per script
- Request Timeout: Varies by endpoint (10-30s)
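The failure policy above (stop on critical scripts, continue on the rest, 30-minute cap per script) could be expressed with `subprocess.run` roughly as follows. The function names and the demo commands are hypothetical; the demo substitutes trivial Python one-liners for the real scripts so that it runs anywhere.

```python
import subprocess
import sys

SCRIPT_TIMEOUT_SECONDS = 30 * 60  # 30 minutes per script, as stated above
CRITICAL = {"fetch_dhan_data.py", "bulk_market_analyzer.py"}


def run_script(command, timeout=SCRIPT_TIMEOUT_SECONDS) -> bool:
    """Run one command; True on exit code 0, False on failure or timeout."""
    try:
        completed = subprocess.run(command, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return completed.returncode == 0


def run_pipeline(steps, runner=run_script):
    """steps: list of (name, command). Returns (failed_names, aborted)."""
    failed = []
    for name, command in steps:
        if runner(command):
            continue
        failed.append(name)
        if name in CRITICAL:
            return failed, True  # critical failure: stop immediately
    return failed, False


ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "raise SystemExit(1)"]
steps = [
    ("fetch_dhan_data.py", ok),
    ("fetch_market_news.py", bad),     # non-critical: recorded, run continues
    ("bulk_market_analyzer.py", bad),  # critical: run stops here
    ("enrich_fno_data.py", ok),        # never reached in this demo
]

failed, aborted = run_pipeline(steps)
print(failed, aborted)
```

Returning the list of failed names alongside the abort flag gives the final report everything it needs without raising from inside the loop.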
Pipeline Output
Final Artifacts
Final Report Example
Next Steps
Data Sources
Explore all 12+ Dhan/NSE endpoints used in the pipeline
Output Schema
View the complete 86-field output schema structure