
Overview

Incremental updates let you refresh market data daily without re-fetching the entire historical dataset. The pipeline detects existing data and fetches only what's new.

Runtime: ~2-5 minutes (vs ~30 minutes for a first-time full fetch)

How Incremental Updates Work

Smart OHLCV Incremental Logic

The fetch_all_ohlcv.py script implements intelligent incremental fetching:
1. Check for existing data

Scans ohlcv_data/ directory for existing CSV files per stock:
ohlcv_data/
├── RELIANCE.csv
├── TCS.csv
├── INFY.csv
└── ... (2,775+ files)
2. Identify last recorded date

Reads the last row of each CSV to find the most recent date:
# Example: RELIANCE.csv last row
Date,Open,High,Low,Close,Volume
...
2026-03-02,2734.50,2756.00,2720.00,2745.30,8523400
Last date: 2026-03-02
3. Fetch only missing dates

Requests data from (last_date + 1 day) to today:
{
  "START": 1772496000,  // 2026-03-03 00:00 UTC
  "END": 1772582400     // 2026-03-04 (today)
}
Only fetches 1-2 days of new data instead of 500+ days.
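In Python, the window computation might look like this sketch (the helper name `incremental_window` and the exact timestamp handling are assumptions, not the script's actual code):

```python
import csv
from datetime import datetime, timedelta, timezone

def incremental_window(csv_path: str) -> dict:
    """Compute the START/END epoch range for an incremental fetch."""
    # Read the last recorded date from the per-stock CSV
    with open(csv_path, newline="") as f:
        last_row = list(csv.DictReader(f))[-1]
    last_date = datetime.strptime(last_row["Date"], "%Y-%m-%d")

    start = last_date + timedelta(days=1)   # day after the last stored row
    end = datetime.now()                    # up to today
    return {
        "START": int(start.replace(tzinfo=timezone.utc).timestamp()),
        "END": int(end.replace(tzinfo=timezone.utc).timestamp()),
    }
```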
4. Append new rows

Appends new data to existing CSV:
2026-03-03,2745.30,2768.00,2738.50,2761.20,7234500
2026-03-04,2761.20,2780.50,2755.00,2772.80,8901200
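A minimal append step, assuming the same six-column CSV layout (the helper is illustrative, not the script's actual code):

```python
import csv

OHLCV_FIELDS = ["Date", "Open", "High", "Low", "Close", "Volume"]

def append_rows(csv_path: str, new_rows: list) -> int:
    """Append freshly fetched rows to an existing per-stock CSV."""
    with open(csv_path, "a", newline="") as f:
        csv.DictWriter(f, fieldnames=OHLCV_FIELDS).writerows(new_rows)
    return len(new_rows)
```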

Other Data Sources

Most other fetchers re-fetch fresh data daily (lightweight):
| Script | Update Strategy | Runtime |
|---|---|---|
| fetch_fundamental_data.py | Full refresh (quarterly data changes slowly) | ~18s |
| fetch_company_filings.py | Fetches last 100 filings (new ones appear daily) | ~45s |
| fetch_market_news.py | Fetches last 50 news items per stock | ~30s |
| fetch_corporate_actions.py | Fetches upcoming + 2yr history | ~8s |
| fetch_bulk_block_deals.py | Fetches last 30 days | ~5s |
| fetch_circuit_stocks.py | Live snapshot (today's circuits) | ~3s |
| fetch_surveillance_lists.py | Current ASM/GSM lists | ~4s |
| fetch_incremental_price_bands.py | Today's band changes CSV | ~2s |
Total Phase 1-2 runtime: ~2 minutes (same as full pipeline)

Running Daily Updates

1. Ensure OHLCV data exists

Verify the ohlcv_data/ directory from the previous run:
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/
ls -lh ohlcv_data/ | wc -l
Should show ~2,775+ CSV files.
2. Run the full pipeline

python3 run_full_pipeline.py
The pipeline automatically detects existing OHLCV data and runs incrementally.
3. Monitor incremental fetch

Watch Phase 2.5 output:
📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────
  ▶ Running fetch_all_ohlcv.py...
  Fetching live snapshots for stocks (Today's data)...
  Processing 2775 stocks with 15 threads...
  [████████████████████████████] 2775/2775 (100%)
  Updated: 2775 | Skipped: 0 | Failed: 0
  ✅ fetch_all_ohlcv.py (142.5s)
First run: ~30 min (fetching 500+ days per stock)
Incremental run: ~2-5 min (fetching 1-2 days per stock)
4. Verify updated output

Check the timestamp of the output file:
ls -lh all_stocks_fundamental_analysis.json.gz
-rw-r--r-- 1 user user 2.1M Mar  4 16:15 all_stocks_fundamental_analysis.json.gz

Performance Optimization

Adjust Thread Count

For faster incremental updates, increase parallelization. Edit fetch_all_ohlcv.py line 14:
MAX_THREADS = 20  # Default: 15
Trade-offs:
  • Higher threads = faster execution
  • Too many threads = rate limiting or connection errors
  • Recommended range: 10-25 threads
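The parallel fetch likely uses a thread pool; here is a minimal sketch with the stdlib `concurrent.futures` (`fetch_one` is a placeholder, not the real fetcher):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 15  # raise toward ~20-25 for faster incremental runs

def fetch_one(symbol: str) -> tuple:
    """Placeholder for the real per-stock fetch; returns (symbol, success)."""
    return symbol, True

def fetch_all(symbols: list) -> dict:
    """Fetch all symbols in parallel, tallying outcomes."""
    results = {"updated": 0, "failed": 0}
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        futures = [pool.submit(fetch_one, s) for s in symbols]
        for fut in as_completed(futures):
            _, ok = fut.result()
            results["updated" if ok else "failed"] += 1
    return results
```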

Skip OHLCV for Quick Refresh

If you only need fundamental/event updates without price data, edit run_full_pipeline.py line 64:
FETCH_OHLCV = False
Result: Runtime drops to ~4 minutes, but these fields will be zero:
  • ADR (Average Daily Range)
  • RVOL (Relative Volume)
  • ATH (All-Time High) and % from ATH
  • All returns calculations (1D, 1W, 1M, 3M, 6M, 1Y)
Run with FETCH_OHLCV = True later to backfill.
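For reference, ADR and RVOL are derived from OHLCV history, which is why they zero out without it. Illustrative textbook-style definitions (the pipeline's exact formulas and lookbacks may differ):

```python
import statistics

def adr_pct(highs, lows, closes, lookback=20):
    """Average Daily Range as a % of close over the lookback window."""
    ranges = [(h - l) / c * 100 for h, l, c in zip(highs, lows, closes)]
    return statistics.mean(ranges[-lookback:])

def rvol(volumes, lookback=20):
    """Today's volume relative to the average of the prior lookback days."""
    avg = statistics.mean(volumes[-lookback - 1:-1])
    return volumes[-1] / avg
```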

Selective Script Execution

If you only need specific data updated, run individual scripts:

Update Only Fundamental Data

python3 fetch_dhan_data.py
python3 fetch_fundamental_data.py
python3 bulk_market_analyzer.py
Runtime: ~1 minute

Update Only Technical Indicators

python3 fetch_dhan_data.py          # For live prices
python3 fetch_advanced_indicators.py
python3 fetch_all_ohlcv.py          # Incremental
python3 advanced_metrics_processor.py
Runtime: ~3-4 minutes

Update Only Events & News

python3 fetch_company_filings.py
python3 fetch_market_news.py
python3 fetch_corporate_actions.py
python3 fetch_bulk_block_deals.py
python3 add_corporate_events.py
Runtime: ~2 minutes

Note: After selective execution, run compression manually:
python3 -c "
import gzip, shutil
with open('all_stocks_fundamental_analysis.json', 'rb') as f_in:
    with gzip.open('all_stocks_fundamental_analysis.json.gz', 'wb', compresslevel=9) as f_out:
        shutil.copyfileobj(f_in, f_out)
"

Automated Daily Updates

Using Cron (Linux/Mac)

Schedule automatic execution after market close:
1. Open the crontab editor

crontab -e
2. Add the pipeline job

Run daily at 4:00 PM IST (after market close):
# Mon-Fri at 4:00 PM
0 16 * * 1-5 cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/ && /usr/bin/python3 run_full_pipeline.py >> ~/pipeline.log 2>&1
3. Verify the job is scheduled

crontab -l
4. Monitor execution

Check log file after 4 PM:
tail -f ~/pipeline.log

Using systemd Timer (Linux)

For more control and better logging:
1. Create the service file

sudo nano /etc/systemd/system/edl-pipeline.service
[Unit]
Description=ChartsMaze EDL Pipeline Daily Update
After=network.target

[Service]
Type=oneshot
User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/workspace/source/DO NOT DELETE EDL PIPELINE
ExecStart=/usr/bin/python3 run_full_pipeline.py
StandardOutput=append:/var/log/edl-pipeline.log
StandardError=append:/var/log/edl-pipeline.log

[Install]
WantedBy=multi-user.target
2. Create the timer file

sudo nano /etc/systemd/system/edl-pipeline.timer
[Unit]
Description=Run EDL Pipeline Daily at 4 PM

[Timer]
OnCalendar=Mon-Fri 16:00:00
Persistent=true

[Install]
WantedBy=timers.target
3. Enable and start the timer

sudo systemctl daemon-reload
sudo systemctl enable edl-pipeline.timer
sudo systemctl start edl-pipeline.timer
4. Check timer status

sudo systemctl status edl-pipeline.timer
systemctl list-timers edl-pipeline.timer

Monitoring & Alerts

Log Analysis

The pipeline outputs structured logs. Parse for key metrics:
# Extract runtime
grep "Total Time:" ~/pipeline.log | tail -1

# Check for failures
grep "Failed:" ~/pipeline.log | tail -1

# List failed scripts
grep "❌" ~/pipeline.log | tail -20

# Verify output file created
grep "Output:" ~/pipeline.log | tail -1

Error Notifications

Send an email if the pipeline fails:
#!/bin/bash
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/

python3 run_full_pipeline.py > /tmp/pipeline_output.log 2>&1

if [ $? -ne 0 ]; then
    mail -s "EDL Pipeline Failed" [email protected] < /tmp/pipeline_output.log
fi

Slack Webhook Integration

Notify Slack on completion:
# Add to end of run_full_pipeline.py
import requests

webhook_url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

if failed == 0:
    message = f"✅ EDL Pipeline completed in {total_time/60:.1f} min"
else:
    message = f"⚠️ EDL Pipeline completed with {failed} failures in {total_time/60:.1f} min"

requests.post(webhook_url, json={"text": message})

Data Validation

Verify Output Integrity

After each update, validate the output:
import gzip
import json
from datetime import datetime

# Decompress and load
with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rb') as f:
    data = json.load(f)

# Basic checks
assert len(data) > 2700, f"Too few stocks: {len(data)}"
assert all('Symbol' in stock for stock in data), "Missing symbols"
assert all('Stock Price(₹)' in stock for stock in data), "Missing prices"

print(f"✅ Validation passed: {len(data)} stocks")

# Check data freshness
sample = data[0]
if 'Latest Quarter' in sample:
    print(f"Latest quarter data: {sample['Latest Quarter']}")

Compare with Previous Run

import gzip
import json

# Load current and previous
with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rb') as f:
    current = json.load(f)

with gzip.open('all_stocks_fundamental_analysis.json.gz.backup', 'rb') as f:
    previous = json.load(f)

# Find stocks with significant changes
# Index the previous run by symbol (list order can differ between runs,
# so a positional zip() would silently misalign stocks)
prev_by_symbol = {stock['Symbol']: stock for stock in previous}

for curr in current:
    prev = prev_by_symbol.get(curr['Symbol'])
    if prev is None:
        continue  # newly listed stock; nothing to compare

    curr_price = curr.get('Stock Price(₹)', 0)
    prev_price = prev.get('Stock Price(₹)', 0)

    if prev_price > 0:
        change_pct = ((curr_price - prev_price) / prev_price) * 100
        if abs(change_pct) > 5:  # 5% threshold
            print(f"{curr['Symbol']}: {prev_price:.2f} → {curr_price:.2f} ({change_pct:+.2f}%)")

Backup Strategy

Archive Previous Versions

Before each update, backup the previous output:
#!/bin/bash
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/

# Create dated backup
if [ -f all_stocks_fundamental_analysis.json.gz ]; then
    cp all_stocks_fundamental_analysis.json.gz \
       "backups/all_stocks_$(date +%Y%m%d_%H%M%S).json.gz"
fi

# Keep only last 7 days
find backups/ -name "all_stocks_*.json.gz" -mtime +7 -delete

# Run pipeline
python3 run_full_pipeline.py

OHLCV Data Backup

The ohlcv_data/ directory grows over time (~200 MB). Back it up weekly:
# Weekly backup (run on Sundays)
tar -czf ohlcv_backup_$(date +%Y%m%d).tar.gz ohlcv_data/
mv ohlcv_backup_*.tar.gz ~/backups/

# Keep only last 4 weeks
find ~/backups/ -name "ohlcv_backup_*.tar.gz" -mtime +28 -delete

Troubleshooting Incremental Updates

OHLCV Not Updating Incrementally

Symptom: Phase 2.5 still takes ~30 minutes instead of 2-5 minutes
Cause: CSV files may be corrupted or have incorrect last dates
Solutions:
  1. Check a sample CSV for integrity:
    tail -5 ohlcv_data/RELIANCE.csv
    
    Last row should have today’s or yesterday’s date.
  2. Verify last date parsing:
    import csv
    with open('ohlcv_data/RELIANCE.csv', 'r') as f:
        rows = list(csv.DictReader(f))
        print(f"Last date: {rows[-1]['Date']}")
    
  3. If corrupted, delete specific CSV to re-fetch:
    rm ohlcv_data/RELIANCE.csv
    python3 fetch_all_ohlcv.py  # Will re-fetch full history for RELIANCE
    

Missing Recent Data

Symptom: Latest quarter or news not showing in output
Cause: Source API may not have published data yet
Solutions:
  • Wait 1-2 hours after market close for data availability
  • Check source manually (Dhan ScanX website)
  • Re-run pipeline after delay

Stale Event Markers

Symptom: Old events still showing (e.g., “Results Recently Out” from 10 days ago)
Cause: Event marker logic uses fixed time windows (7 days for results, 15 days for insider trading)
Solution: This is expected behavior. Events auto-expire after their window:
  • Results: 7 days
  • Insider Trading: 15 days
  • Block Deals: 7 days
If marker persists beyond window, check add_corporate_events.py logic.
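The auto-expiry check can be expressed as a simple window comparison (a sketch of the idea only; `add_corporate_events.py` may implement it differently):

```python
from datetime import date

# Fixed windows from the table above, in days
EVENT_WINDOWS = {"Results": 7, "Insider Trading": 15, "Block Deals": 7}

def marker_active(event_type: str, event_date: date, today: date) -> bool:
    """A marker stays active while the event's age is within its window."""
    window = EVENT_WINDOWS.get(event_type, 0)
    age = (today - event_date).days
    return 0 <= age <= window
```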

Incremental Fetch Skipping Dates

Symptom: Some dates missing in OHLCV (e.g., 2026-03-03 present, 2026-03-04 missing)
Cause: Market holiday or trading halt
Solution: This is normal. OHLCV only contains trading days; non-trading days (weekends, holidays) are automatically skipped.
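To confirm a gap is only non-trading days, you can check that every missing date falls on a weekend (exchange holidays would need a separate calendar; this sketch covers weekends only):

```python
from datetime import date, timedelta

def gap_is_weekend_only(last_present: date, next_present: date) -> bool:
    """True if every date strictly between the two rows is a Saturday or Sunday."""
    d = last_present + timedelta(days=1)
    while d < next_present:
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            return False     # a trading weekday is genuinely missing
        d += timedelta(days=1)
    return True
```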

Best Practices for Incremental Updates

  1. Run once daily after market close (after 3:30 PM IST)
  2. Keep FETCH_OHLCV=True for continuous incremental updates
  3. Monitor first few incremental runs to ensure 2-5 min runtime
  4. Backup before first incremental run to test rollback
  5. Validate output after each run with automated checks
  6. Archive old outputs with date stamps for historical analysis
  7. Set up failure alerts to catch issues immediately
  8. Test manual execution before automating with cron/systemd
