
Overview

Incremental updates let you refresh market data daily without re-fetching the entire historical dataset. The pipeline detects existing data and fetches only what's new.

Runtime: ~2-5 minutes (vs ~30 minutes for a first-time full fetch)

How Incremental Updates Work

Smart OHLCV Incremental Logic

The fetch_all_ohlcv.py script implements intelligent incremental fetching:
1. Check for existing data

Scans ohlcv_data/ directory for existing CSV files per stock:
ohlcv_data/
├── RELIANCE.csv
├── TCS.csv
├── INFY.csv
└── ... (2,775+ files)
2. Identify last recorded date

Reads the last row of each CSV to find the most recent date:
# Example: RELIANCE.csv last row
Date,Open,High,Low,Close,Volume
...
2026-03-02,2734.50,2756.00,2720.00,2745.30,8523400
Last date: 2026-03-02
3. Fetch only missing dates

Requests data from (last_date + 1 day) to today:
{
  "START": 1772496000,  // 2026-03-03 00:00 UTC
  "END": 1772582400     // 2026-03-04 (today)
}
Only fetches 1-2 days of new data instead of 500+ days.
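In Python, the window computation might look like this sketch (the helper name `incremental_window` and the exact timestamp handling are assumptions, not the script's actual code):

```python
import csv
from datetime import datetime, timedelta, timezone

def incremental_window(csv_path: str) -> dict:
    """Compute the START/END epoch range for an incremental fetch."""
    # Read the last recorded date from the per-stock CSV
    with open(csv_path, newline="") as f:
        last_row = list(csv.DictReader(f))[-1]
    last_date = datetime.strptime(last_row["Date"], "%Y-%m-%d")

    start = last_date + timedelta(days=1)   # day after the last stored row
    end = datetime.now()                    # up to today
    return {
        "START": int(start.replace(tzinfo=timezone.utc).timestamp()),
        "END": int(end.replace(tzinfo=timezone.utc).timestamp()),
    }
```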
4. Append new rows

Appends new data to existing CSV:
2026-03-03,2745.30,2768.00,2738.50,2761.20,7234500
2026-03-04,2761.20,2780.50,2755.00,2772.80,8901200
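A minimal append step, assuming the same six-column CSV layout (the helper is illustrative, not the script's actual code):

```python
import csv

OHLCV_FIELDS = ["Date", "Open", "High", "Low", "Close", "Volume"]

def append_rows(csv_path: str, new_rows: list) -> int:
    """Append freshly fetched rows to an existing per-stock CSV."""
    with open(csv_path, "a", newline="") as f:
        csv.DictWriter(f, fieldnames=OHLCV_FIELDS).writerows(new_rows)
    return len(new_rows)
```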

Other Data Sources

Most other fetchers re-fetch fresh data daily (lightweight):
| Script | Update Strategy | Runtime |
|---|---|---|
| fetch_fundamental_data.py | Full refresh (quarterly data changes slowly) | ~18s |
| fetch_company_filings.py | Fetches last 100 filings (new ones appear daily) | ~45s |
| fetch_market_news.py | Fetches last 50 news items per stock | ~30s |
| fetch_corporate_actions.py | Fetches upcoming + 2yr history | ~8s |
| fetch_bulk_block_deals.py | Fetches last 30 days | ~5s |
| fetch_circuit_stocks.py | Live snapshot (today's circuits) | ~3s |
| fetch_surveillance_lists.py | Current ASM/GSM lists | ~4s |
| fetch_incremental_price_bands.py | Today's band changes CSV | ~2s |
Total Phase 1-2 runtime: ~2 minutes (same as full pipeline)

Running Daily Updates

1. Ensure OHLCV data exists

Verify the ohlcv_data/ directory from the previous run:
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/
ls -lh ohlcv_data/ | wc -l
Should show ~2,775+ CSV files.
2. Run the full pipeline

python3 run_full_pipeline.py
The pipeline automatically detects existing OHLCV data and runs incrementally.
3. Monitor incremental fetch

Watch Phase 2.5 output:
📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────
  ▶ Running fetch_all_ohlcv.py...
  Fetching live snapshots for stocks (Today's data)...
  Processing 2775 stocks with 15 threads...
  [████████████████████████████] 2775/2775 (100%)
  Updated: 2775 | Skipped: 0 | Failed: 0
  ✅ fetch_all_ohlcv.py (142.5s)
First run: ~30 min (fetching 500+ days per stock)
Incremental run: ~2-5 min (fetching 1-2 days per stock)
4. Verify updated output

Check the timestamp of the output file:
ls -lh all_stocks_fundamental_analysis.json.gz
-rw-r--r-- 1 user user 2.1M Mar  4 16:15 all_stocks_fundamental_analysis.json.gz

Performance Optimization

Adjust Thread Count

For faster incremental updates, increase parallelization. Edit fetch_all_ohlcv.py line 14:
MAX_THREADS = 20  # Default: 15
Trade-offs:
  • Higher threads = faster execution
  • Too many threads = rate limiting or connection errors
  • Recommended range: 10-25 threads
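The parallel fetch likely uses a thread pool; here is a minimal sketch with the stdlib `concurrent.futures` (`fetch_one` is a placeholder, not the real fetcher):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 15  # raise toward ~20-25 for faster incremental runs

def fetch_one(symbol: str) -> tuple:
    """Placeholder for the real per-stock fetch; returns (symbol, success)."""
    return symbol, True

def fetch_all(symbols: list) -> dict:
    """Fetch all symbols in parallel, tallying outcomes."""
    results = {"updated": 0, "failed": 0}
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        futures = [pool.submit(fetch_one, s) for s in symbols]
        for fut in as_completed(futures):
            _, ok = fut.result()
            results["updated" if ok else "failed"] += 1
    return results
```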

Skip OHLCV for Quick Refresh

If you only need fundamental/event updates without price data, edit run_full_pipeline.py line 64:
FETCH_OHLCV = False
Result: Runtime drops to ~4 minutes, but these fields will be zero:
  • ADR (Average Daily Range)
  • RVOL (Relative Volume)
  • ATH (All-Time High) and % from ATH
  • All returns calculations (1D, 1W, 1M, 3M, 6M, 1Y)
Run with FETCH_OHLCV = True later to backfill.
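For reference, ADR and RVOL are derived from OHLCV history, which is why they zero out without it. Illustrative textbook-style definitions (the pipeline's exact formulas and lookbacks may differ):

```python
import statistics

def adr_pct(highs, lows, closes, lookback=20):
    """Average Daily Range as a % of close over the lookback window."""
    ranges = [(h - l) / c * 100 for h, l, c in zip(highs, lows, closes)]
    return statistics.mean(ranges[-lookback:])

def rvol(volumes, lookback=20):
    """Today's volume relative to the average of the prior lookback days."""
    avg = statistics.mean(volumes[-lookback - 1:-1])
    return volumes[-1] / avg
```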

Selective Script Execution

If you only need specific data updated, run individual scripts:

Update Only Fundamental Data

python3 fetch_dhan_data.py
python3 fetch_fundamental_data.py
python3 bulk_market_analyzer.py
Runtime: ~1 minute

Update Only Technical Indicators

python3 fetch_dhan_data.py          # For live prices
python3 fetch_advanced_indicators.py
python3 fetch_all_ohlcv.py          # Incremental
python3 advanced_metrics_processor.py
Runtime: ~3-4 minutes

Update Only Events & News

python3 fetch_company_filings.py
python3 fetch_market_news.py
python3 fetch_corporate_actions.py
python3 fetch_bulk_block_deals.py
python3 add_corporate_events.py
Runtime: ~2 minutes

Note: After selective execution, run compression manually:
python3 -c "
import gzip, shutil
with open('all_stocks_fundamental_analysis.json', 'rb') as f_in:
    with gzip.open('all_stocks_fundamental_analysis.json.gz', 'wb', compresslevel=9) as f_out:
        shutil.copyfileobj(f_in, f_out)
"

Automated Daily Updates

Using Cron (Linux/Mac)

Schedule automatic execution after market close:
1. Open the crontab editor

crontab -e
2. Add the pipeline job

Run daily at 4:00 PM IST (after market close):
# Mon-Fri at 4:00 PM
0 16 * * 1-5 cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/ && /usr/bin/python3 run_full_pipeline.py >> ~/pipeline.log 2>&1
3. Verify the job is scheduled

crontab -l
4. Monitor execution

Check log file after 4 PM:
tail -f ~/pipeline.log

Using systemd Timer (Linux)

For more control and better logging:
1. Create the service file

sudo nano /etc/systemd/system/edl-pipeline.service
[Unit]
Description=ChartsMaze EDL Pipeline Daily Update
After=network.target

[Service]
Type=oneshot
User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/workspace/source/DO NOT DELETE EDL PIPELINE
ExecStart=/usr/bin/python3 run_full_pipeline.py
StandardOutput=append:/var/log/edl-pipeline.log
StandardError=append:/var/log/edl-pipeline.log

[Install]
WantedBy=multi-user.target
2. Create the timer file

sudo nano /etc/systemd/system/edl-pipeline.timer
[Unit]
Description=Run EDL Pipeline Daily at 4 PM

[Timer]
OnCalendar=Mon-Fri 16:00:00
Persistent=true

[Install]
WantedBy=timers.target
3. Enable and start the timer

sudo systemctl daemon-reload
sudo systemctl enable edl-pipeline.timer
sudo systemctl start edl-pipeline.timer
4. Check timer status

sudo systemctl status edl-pipeline.timer
systemctl list-timers edl-pipeline.timer

Monitoring & Alerts

Log Analysis

The pipeline outputs structured logs. Parse for key metrics:
# Extract runtime
grep "Total Time:" ~/pipeline.log | tail -1

# Check for failures
grep "Failed:" ~/pipeline.log | tail -1

# List failed scripts
grep "❌" ~/pipeline.log | tail -20

# Verify output file created
grep "Output:" ~/pipeline.log | tail -1

Error Notifications

Send an email if the pipeline fails:
#!/bin/bash
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/

python3 run_full_pipeline.py > /tmp/pipeline_output.log 2>&1

if [ $? -ne 0 ]; then
    mail -s "EDL Pipeline Failed" [email protected] < /tmp/pipeline_output.log
fi

Slack Webhook Integration

Notify Slack on completion:
# Add to end of run_full_pipeline.py
import requests

webhook_url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

if failed == 0:
    message = f"✅ EDL Pipeline completed in {total_time/60:.1f} min"
else:
    message = f"⚠️ EDL Pipeline completed with {failed} failures in {total_time/60:.1f} min"

requests.post(webhook_url, json={"text": message})

Data Validation

Verify Output Integrity

After each update, validate the output:
import gzip
import json
from datetime import datetime

# Decompress and load
with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rb') as f:
    data = json.load(f)

# Basic checks
assert len(data) > 2700, f"Too few stocks: {len(data)}"
assert all('Symbol' in stock for stock in data), "Missing symbols"
assert all('Stock Price(₹)' in stock for stock in data), "Missing prices"

print(f"✅ Validation passed: {len(data)} stocks")

# Check data freshness
sample = data[0]
if 'Latest Quarter' in sample:
    print(f"Latest quarter data: {sample['Latest Quarter']}")

Compare with Previous Run

import gzip
import json

# Load current and previous
with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rb') as f:
    current = json.load(f)

with gzip.open('all_stocks_fundamental_analysis.json.gz.backup', 'rb') as f:
    previous = json.load(f)

# Find stocks with significant changes
# Index the previous run by symbol (list order can differ between runs,
# so a positional zip() would silently misalign stocks)
prev_by_symbol = {stock['Symbol']: stock for stock in previous}

for curr in current:
    prev = prev_by_symbol.get(curr['Symbol'])
    if prev is None:
        continue  # newly listed stock; nothing to compare

    curr_price = curr.get('Stock Price(₹)', 0)
    prev_price = prev.get('Stock Price(₹)', 0)

    if prev_price > 0:
        change_pct = ((curr_price - prev_price) / prev_price) * 100
        if abs(change_pct) > 5:  # 5% threshold
            print(f"{curr['Symbol']}: {prev_price:.2f} → {curr_price:.2f} ({change_pct:+.2f}%)")

Backup Strategy

Archive Previous Versions

Before each update, backup the previous output:
#!/bin/bash
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/

# Create dated backup
if [ -f all_stocks_fundamental_analysis.json.gz ]; then
    cp all_stocks_fundamental_analysis.json.gz \
       "backups/all_stocks_$(date +%Y%m%d_%H%M%S).json.gz"
fi

# Keep only last 7 days
find backups/ -name "all_stocks_*.json.gz" -mtime +7 -delete

# Run pipeline
python3 run_full_pipeline.py

OHLCV Data Backup

The ohlcv_data/ directory grows over time (~200 MB). Back it up weekly:
# Weekly backup (run on Sundays)
tar -czf ohlcv_backup_$(date +%Y%m%d).tar.gz ohlcv_data/
mv ohlcv_backup_*.tar.gz ~/backups/

# Keep only last 4 weeks
find ~/backups/ -name "ohlcv_backup_*.tar.gz" -mtime +28 -delete

Troubleshooting Incremental Updates

OHLCV Not Updating Incrementally

Symptom: Phase 2.5 still takes ~30 minutes instead of 2-5 minutes
Cause: CSV files may be corrupted or have incorrect last dates
Solutions:
  1. Check a sample CSV for integrity:
    tail -5 ohlcv_data/RELIANCE.csv
    
    Last row should have today’s or yesterday’s date.
  2. Verify last date parsing:
    import csv
    with open('ohlcv_data/RELIANCE.csv', 'r') as f:
        rows = list(csv.DictReader(f))
        print(f"Last date: {rows[-1]['Date']}")
    
  3. If corrupted, delete specific CSV to re-fetch:
    rm ohlcv_data/RELIANCE.csv
    python3 fetch_all_ohlcv.py  # Will re-fetch full history for RELIANCE
    

Missing Recent Data

Symptom: Latest quarter or news not showing in output
Cause: Source API may not have published data yet
Solutions:
  • Wait 1-2 hours after market close for data availability
  • Check source manually (Dhan ScanX website)
  • Re-run pipeline after delay

Stale Event Markers

Symptom: Old events still showing (e.g., “Results Recently Out” from 10 days ago)
Cause: Event marker logic uses fixed time windows (7 days for results, 15 days for insider trading)
Solution: This is expected behavior. Events auto-expire after their window:
  • Results: 7 days
  • Insider Trading: 15 days
  • Block Deals: 7 days
If marker persists beyond window, check add_corporate_events.py logic.
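The auto-expiry check can be expressed as a simple window comparison (a sketch of the idea only; `add_corporate_events.py` may implement it differently):

```python
from datetime import date

# Fixed windows from the table above, in days
EVENT_WINDOWS = {"Results": 7, "Insider Trading": 15, "Block Deals": 7}

def marker_active(event_type: str, event_date: date, today: date) -> bool:
    """A marker stays active while the event's age is within its window."""
    window = EVENT_WINDOWS.get(event_type, 0)
    age = (today - event_date).days
    return 0 <= age <= window
```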

Incremental Fetch Skipping Dates

Symptom: Some dates missing in OHLCV (e.g., 2026-03-03 present, 2026-03-04 missing)
Cause: Market holiday or trading halt
Solution: This is normal. OHLCV only contains trading days; non-trading days (weekends, holidays) are automatically skipped.
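To confirm a gap is only non-trading days, you can check that every missing date falls on a weekend (exchange holidays would need a separate calendar; this sketch covers weekends only):

```python
from datetime import date, timedelta

def gap_is_weekend_only(last_present: date, next_present: date) -> bool:
    """True if every date strictly between the two rows is a Saturday or Sunday."""
    d = last_present + timedelta(days=1)
    while d < next_present:
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            return False     # a trading weekday is genuinely missing
        d += timedelta(days=1)
    return True
```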

Best Practices for Incremental Updates

  1. Run once daily after market close (after 3:30 PM IST)
  2. Keep FETCH_OHLCV=True for continuous incremental updates
  3. Monitor first few incremental runs to ensure 2-5 min runtime
  4. Backup before first incremental run to test rollback
  5. Validate output after each run with automated checks
  6. Archive old outputs with date stamps for historical analysis
  7. Set up failure alerts to catch issues immediately
  8. Test manual execution before automating with cron/systemd
