These two scripts download complete historical Open, High, Low, Close, Volume (OHLCV) data for all NSE stocks and indices. They support smart incremental updates and are optimized for minimal API load.

Overview

| Script | Output Directory | Records | Update Mode |
| --- | --- | --- | --- |
| fetch_all_ohlcv.py | ohlcv_data/ | ~2,775 CSV files | Incremental (2-5 min) |
| fetch_indices_ohlcv.py | indices_ohlcv_data/ | ~194 CSV files | Incremental (30-60 sec) |

fetch_all_ohlcv.py

What It Does

Fetches lifetime daily OHLCV data for all NSE stocks, starting from the earliest available date (some stocks go back to the 1990s). Uses smart incremental updates to fetch only the missing dates.

API Endpoint

URL: https://openweb-ticks.dhan.co/getDataH
Method: POST

Payload:
{
  "EXCH": "NSE",
  "SYM": "RELIANCE",
  "SEG": "E",
  "INST": "EQUITY",
  "SEC_ID": "2885",
  "EXPCODE": 0,
  "INTERVAL": "D",          // Daily candles
  "START": 215634600,        // Oct 31, 1976 (forces max history)
  "END": 1735689600          // Current timestamp
}
Parameters Explained:
  • START: Unix timestamp (default: 215634600 = Oct 31, 1976)
    • This forces the API to return all available history
  • INTERVAL: D for daily candles (use W for weekly, M for monthly)
  • SEC_ID: Security ID from master_isin_map.json (created by fetch_dhan_data.py)
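The payload above can be assembled with a small helper. This is an illustrative sketch, not a function from the script; in practice the symbol and SEC_ID come from master_isin_map.json:

```python
from datetime import datetime, timezone

def build_history_payload(symbol: str, sec_id: str, start: int = 215634600) -> dict:
    """Build the POST body for the getDataH endpoint (sketch)."""
    return {
        "EXCH": "NSE",
        "SYM": symbol,
        "SEG": "E",
        "INST": "EQUITY",
        "SEC_ID": sec_id,
        "EXPCODE": 0,
        "INTERVAL": "D",  # daily candles; "W"/"M" for weekly/monthly
        "START": start,   # Oct 31, 1976 -> forces max available history
        "END": int(datetime.now(timezone.utc).timestamp()),
    }

payload = build_history_payload("RELIANCE", "2885")
```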

Smart Incremental Logic

1. Check Existing Data

Reads the last date from ohlcv_data/{SYMBOL}.csv and resumes from the day after:
last_date = rows[-1]["Date"]  # e.g., "2024-01-15"
last_dt = datetime.strptime(last_date, "%Y-%m-%d")
target_start = int(last_dt.timestamp()) + 86400  # start of the next day
2. Fetch Missing Chunks

Downloads missing data in 180-day chunks, walking backwards from today to the last saved date:
CHUNK_DAYS = 180
chunk_ptr = int(time.time())  # walk backwards from today
while chunk_ptr > target_start:
    c_start = max(target_start, chunk_ptr - (CHUNK_DAYS * 86400))
    fetch_chunk(c_start, chunk_ptr)
    chunk_ptr = c_start - 86400
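The backward walk effectively schedules a set of (start, end) windows. A standalone sketch of that scheduling (plan_chunks is illustrative, not a function from the script):

```python
def plan_chunks(target_start: int, now: int, chunk_days: int = 180) -> list[tuple[int, int]]:
    """Return the (start, end) fetch windows, newest first."""
    chunks = []
    chunk_ptr = now
    while chunk_ptr > target_start:
        c_start = max(target_start, chunk_ptr - chunk_days * 86400)
        chunks.append((c_start, chunk_ptr))
        chunk_ptr = c_start - 86400  # step past the window just scheduled
    return chunks

# A 30-day gap fits in one window; a year of history needs three 180-day windows.
```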
3. Merge Live Data

Fetches today’s live snapshot from ScanX API:
live_snapshot = get_live_snapshots()  # Real-time OHLC
today_row = {
    'Date': '2024-03-03',
    'Open': snapshot['Open'],
    'High': snapshot['High'],
    'Low': snapshot['Low'],
    'Close': snapshot['Ltp'],
    'Volume': snapshot['Volume']
}
4. Deduplicate & Save

Merges historical + live data, removes duplicates by date:
merged = {r['Date']: r for r in existing_rows + new_rows}
final_rows = sorted(merged.values(), key=lambda x: x['Date'])

Output Format

Directory: ohlcv_data/
Files: one CSV per stock (e.g., RELIANCE.csv, TCS.csv)
Date,Open,High,Low,Close,Volume
2020-01-01,1450.50,1468.30,1442.10,1461.75,8234567
2020-01-02,1463.20,1475.80,1458.40,1472.35,7654321
2020-01-03,1474.10,1489.60,1471.25,1485.90,9123456
...
2024-03-03,2543.80,2567.20,2538.50,2555.40,6789012
Volume = 0 for some older dates (pre-2010) where NSE did not publish volume data.

Configuration

Edit these constants in fetch_all_ohlcv.py (lines 11-16):
CHUNK_DAYS = 180     # Fetch in 180-day chunks (reduce for slower APIs)
MAX_THREADS = 15     # Concurrent downloads (increase if API is fast)
SCANX_URL = "https://ow-scanx-analytics.dhan.co/customscan/fetchdt"
TICK_API_URL = "https://openweb-ticks.dhan.co/getDataH"

Performance Benchmarks

| Scenario | Time Taken | API Calls |
| --- | --- | --- |
| First Run (Full History) | ~25-35 min | ~8,000 chunks |
| Daily Update (Next Day) | ~2-3 min | ~2,775 stocks |
| Weekly Update (7 Days) | ~2-5 min | ~2,775 stocks |
| Monthly Update (30 Days) | ~3-6 min | ~5,500 chunks |
Run this script once per day after market close (3:45 PM IST) to keep data fresh.

Usage

python3 fetch_all_ohlcv.py
Output (First Run):
Fetching live snapshots for stocks (Today's data)...
Syncing OHLCV for 2775 stocks (Hybrid Multi-Chunk Mode)...
Done! Updated: 2753 | UpToDate: 22 | Errors: 0
Output (Incremental Update):
Fetching live snapshots for stocks (Today's data)...
Syncing OHLCV for 2775 stocks (Hybrid Multi-Chunk Mode)...
Done! Updated: 2775 | UpToDate: 0 | Errors: 0

Error Handling

def fetch_history_chunk(payload):
    try:
        response = requests.post(TICK_API_URL, json=payload, headers=get_headers(), timeout=15)
        if response.status_code == 200:
            data = response.json().get("data", {})
            # Parse OHLCV arrays
            return rows
    except Exception:
        pass  # Silent fail; counted in the Errors total
    return []
If a stock has no data (newly listed or delisted), the CSV will not be created. Check Errors count in output.

fetch_indices_ohlcv.py

What It Does

Fetches lifetime daily OHLCV data for all NSE indices (NIFTY 50, NIFTY Bank, etc.) with the same incremental update logic.

Differences from Stock OHLCV

| Feature | fetch_all_ohlcv.py | fetch_indices_ohlcv.py |
| --- | --- | --- |
| Input File | dhan_data_response.json | all_indices_list.json |
| Output Directory | ohlcv_data/ | indices_ohlcv_data/ |
| Threads | 15 | 60 (indices are faster) |
| Chunk Size | 180 days | 120 days |
| Volume Data | Real trading volume | 0 (indices don't have volume) |
| Filename | RELIANCE.csv | Nifty_50.csv (sanitized) |

API Endpoint

URL: https://openweb-ticks.dhan.co/getDataH
Method: POST

Payload:
{
  "EXCH": "IDX",
  "SYM": "Nifty 50",
  "SEG": "IDX",
  "INST": "IDX",
  "SEC_ID": "13",
  "EXPCODE": 0,
  "INTERVAL": "D",
  "START": 215634600,
  "END": 1735689600
}

Filename Sanitization

Index names contain spaces/special chars, so filenames are sanitized:
def get_safe_sym(sym):
    return "".join([c if c.isalnum() else "_" for c in sym])

Examples:
"Nifty 50"           → Nifty_50.csv
"Nifty Bank"         → Nifty_Bank.csv
"NIFTY Alpha 50"NIFTY_Alpha_50.csv

Output Format

Directory: indices_ohlcv_data/
Example: Nifty_50.csv
Date,Open,High,Low,Close,Volume
2015-01-01,8282.70,8328.50,8276.95,8314.20,0
2015-01-02,8314.20,8343.80,8298.40,8334.70,0
2015-01-03,8334.70,8375.15,8312.90,8368.50,0
...
2024-03-03,21450.30,21523.80,21398.50,21487.65,0
Volume is always 0 for indices because NSE does not publish index volume. For constituent volumes, use stock OHLCV data.

Configuration

Edit these constants in fetch_indices_ohlcv.py (lines 18-19):
CHUNK_DAYS = 120     # Smaller chunks for faster APIs
MAX_THREADS = 60     # More threads (indices are lightweight)

Performance Benchmarks

| Scenario | Time Taken | API Calls |
| --- | --- | --- |
| First Run (Full History) | ~30-60 sec | ~600 chunks |
| Daily Update | ~10-15 sec | ~194 indices |
| Weekly Update | ~15-20 sec | ~300 chunks |

Usage

python3 fetch_indices_ohlcv.py
Output:
Checking 194 indices for sync...
Executing 612 API chunks for history...
Merging with Live Snapshots and saving CSVs...
Successfully updated all index CSVs with Today's Live data.

Pipeline Integration

Both scripts are optionally included in the pipeline based on the FETCH_OHLCV flag.

Enable OHLCV in Pipeline

Edit run_full_pipeline.py (line 64):
# OHLCV: Auto-detect mode
# True = always fetch (incremental update: ~2-5 min if data exists)
# False = skip entirely (ADR, RVOL, ATH fields will be 0)
FETCH_OHLCV = True  # Change from False to True

Pipeline Execution (Phase 2.5)

python3 run_full_pipeline.py
Output:
📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────────────
  ▶ Running fetch_all_ohlcv.py...
  ✅ fetch_all_ohlcv.py (142.3s)
  ▶ Running fetch_indices_ohlcv.py...
  ✅ fetch_indices_ohlcv.py (18.7s)
If FETCH_OHLCV = False, Phase 2.5 is skipped, and fields like ADR, RVOL, % from ATH will be 0 in the final JSON.

Why OHLCV Data is Critical

The OHLCV CSVs power 14 advanced fields in the final output:
Calculated from High/Low ranges:
  • 5 Days MA ADR(%)
  • 14 Days MA ADR(%)
  • 20 Days MA ADR(%)
  • 30 Days MA ADR(%)
Formula:
ADR = ((High - Low) / Close) * 100
5-Day MA ADR = Average of last 5 days' ADR
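The ADR formulas above can be sketched as follows (illustrative only; `ma_adr` and the sample rows are not from the pipeline's code):

```python
def ma_adr(rows: list[dict], days: int) -> float:
    """Moving-average ADR (%) over the last `days` rows of OHLCV data."""
    window = rows[-days:]
    adrs = [(r["High"] - r["Low"]) / r["Close"] * 100 for r in window]
    return sum(adrs) / len(adrs)

rows = [
    {"High": 102.0, "Low": 98.0, "Close": 100.0},   # ADR = 4.0%
    {"High": 205.0, "Low": 195.0, "Close": 200.0},  # ADR = 5.0%
]
# 2-day MA ADR = (4.0 + 5.0) / 2 = 4.5
```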
Calculated from Volume column:
  • RVOL (Relative Volume vs 20-day avg)
  • 200 Days EMA Volume
  • % from 52W High 200 Days EMA Volume
  • Daily Rupee Turnover 20/50/100(Cr.)
  • 30 Days Average Rupee Volume(Cr.)
Formula:
RVOL = Today's Volume / Avg(Last 20 Days Volume)
Rupee Turnover = Volume * Close / 10,000,000
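A sketch of the two volume formulas above (function names are illustrative; 1 crore = 1e7 rupees):

```python
def rvol(volumes: list[int]) -> float:
    """Today's volume relative to the average of the prior 20 sessions."""
    today, history = volumes[-1], volumes[-21:-1]
    return today / (sum(history) / len(history))

def rupee_turnover_cr(volume: int, close: float) -> float:
    """Daily rupee turnover expressed in crores."""
    return volume * close / 10_000_000

vols = [1_000_000] * 20 + [2_000_000]  # today is double the 20-day average
```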
  • % from ATH (Distance from all-time high)
  • Implicitly used in Returns since Earnings(%) calculation
Formula:
ATH = max(Close for all dates)
% from ATH = ((Current Price - ATH) / ATH) * 100
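As a runnable sketch of the ATH formula (illustrative, not the processor's code), note the result is negative whenever the current price is below the all-time high:

```python
def pct_from_ath(closes: list[float]) -> float:
    """Distance (%) of the latest close from the all-time-high close."""
    ath = max(closes)
    current = closes[-1]
    return (current - ath) / ath * 100

# A stock that peaked at 200 and now trades at 150 is 25% below its ATH.
```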
Requires OHLCV to calculate post-earnings returns:
  • Returns since Earnings(%)
  • Max Returns since Earnings(%)
Formula:
Earnings Date = "2024-01-20"
Pre-Earnings Close = OHLCV["2024-01-19"]["Close"]
Returns = ((Current Price - Pre-Earnings Close) / Pre-Earnings Close) * 100
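Both earnings-return formulas can be sketched against a date-keyed OHLCV mapping (illustrative helpers, not the pipeline's functions; ISO dates compare correctly as strings):

```python
def returns_since_earnings(ohlcv: dict, pre_date: str, current_price: float) -> float:
    """Return (%) from the pre-earnings close to the current price."""
    base = ohlcv[pre_date]["Close"]
    return (current_price - base) / base * 100

def max_returns_since_earnings(ohlcv: dict, pre_date: str) -> float:
    """Best return (%) from the pre-earnings close to any later close."""
    base = ohlcv[pre_date]["Close"]
    later = [r["Close"] for d, r in ohlcv.items() if d > pre_date]
    return (max(later) - base) / base * 100

ohlcv = {
    "2024-01-19": {"Close": 100.0},  # pre-earnings close
    "2024-01-22": {"Close": 120.0},  # post-earnings peak
    "2024-01-23": {"Close": 110.0},
}
```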
Cross-validates SMA/EMA levels from fetch_advanced_indicators.py
If OHLCV data is missing, advanced_metrics_processor.py and process_earnings_performance.py will set these fields to 0 or N/A.

Data Quality Checks

Both scripts validate data integrity:
# 1. Date Format Validation
dt_str = t if isinstance(t, str) else datetime.fromtimestamp(t).strftime("%Y-%m-%d")

# 2. Volume Sanitization (no negative values)
if isinstance(vol, (int, float)) and vol < 0: 
    vol = 0

# 3. Deduplication by Date
merged = {r['Date']: r for r in existing_rows + new_rows}

# 4. Chronological Sorting
final_rows = sorted(merged.values(), key=lambda x: x['Date'])

Storage Requirements

| Dataset | Files | Avg Size/File | Total Size |
| --- | --- | --- | --- |
| Stocks OHLCV | ~2,775 | 50-150 KB | ~300-400 MB |
| Indices OHLCV | ~194 | 30-100 KB | ~10-20 MB |
| Total | ~2,969 | - | ~320-420 MB |
If using CLEANUP_INTERMEDIATE = True in the pipeline, OHLCV data is preserved after cleanup (it’s not considered intermediate).

Troubleshooting

Cause: fetch_dhan_data.py has not been run. Solution:
python3 fetch_dhan_data.py
python3 fetch_all_ohlcv.py
Cause: fetch_all_indices.py has not been run. Solution:
python3 fetch_all_indices.py
python3 fetch_indices_ohlcv.py
Cause: API rate limiting or network issues. Solution:
  • Reduce MAX_THREADS from 15 to 5
  • Increase timeout from 15s to 30s
  • Re-run script (incremental mode will fill gaps)
Gaps on weekends and holidays are normal: NSE only publishes data for trading days. To forward-fill a continuous daily series, use:
import pandas as pd
df = pd.read_csv('RELIANCE.csv', parse_dates=['Date'])
df = df.set_index('Date').asfreq('D', method='ffill')  # Forward-fill

Next Steps

F&O Data Scripts

Fetch Futures & Options data

Indices & ETF Scripts

Fetch market indices and ETF data

Pipeline Flags

Configure FETCH_OHLCV and FETCH_OPTIONAL

Advanced Metrics

Learn how OHLCV powers ADR, RVOL, ATH
