
Overview

Phase 2 consists of 11 parallel data enrichment scripts that fetch additional context for each stock. All scripts depend on master_isin_map.json but are independent of each other, enabling concurrent execution.

Script Execution Pattern

All Phase 2 scripts follow a common pattern:
  1. Load master_isin_map.json - Get list of all ISINs/Symbols
  2. Multi-threaded fetching - Use ThreadPoolExecutor for parallel requests
  3. Rate limiting - Respect API limits with delays and retry logic
  4. Output to JSON - Save structured data for Phase 4 processing
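In sketch form, that shared loop looks like the following (function names and the per-stock body are illustrative, not the internals of any one script):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 20  # varies per script (15-50 depending on the API)

def fetch_one(item, attempts=3):
    """Hypothetical per-stock fetcher with simple retry/backoff."""
    for attempt in range(attempts):
        try:
            # real scripts POST to a Dhan endpoint here
            return {"Symbol": item["Symbol"], "ok": True}
        except Exception:
            time.sleep(2 ** attempt)  # rate-limit friendly backoff
    return None

def run(stocks):
    """Fan out one task per stock and collect successful results."""
    results = []
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        futures = [pool.submit(fetch_one, s) for s in stocks]
        for fut in as_completed(futures):
            if (res := fut.result()) is not None:
                results.append(res)
    return results
```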

1. fetch_company_filings.py

Purpose: Fetches regulatory filings (quarterly results, board meetings, announcements) from both legacy and new LODR endpoints. Configuration:
MAX_THREADS = 20     # 20 worker threads keep the full run fast
FORCE_UPDATE = True  # True re-fetches every stock; False skips stocks with existing files
Dual-Endpoint Strategy: This script merges data from TWO API endpoints to ensure complete coverage:
url1 = "https://ow-static-scanx.dhan.co/staticscanx/company_filings"
payload1 = {
    "data": {
        "isin": "INE002A01018",
        "pg_no": 1,
        "count": 100
    }
}
Deduplication Logic:
def fetch_filings(item):
    symbol = item.get("Symbol")
    isin = item.get("ISIN")

    # Fetch from both endpoints (helper functions defined elsewhere in the script)
    data1 = fetch_legacy(isin)
    data2 = fetch_lodr(isin)

    # Merge and deduplicate (either endpoint may return nothing)
    combined = (data1 or []) + (data2 or [])
    unique_map = {}
    
    for entry in combined:
        nid = entry.get("news_id")
        date_str = entry.get("news_date")
        caption = entry.get("caption") or entry.get("descriptor") or "Unknown"
        
        # Create unique key
        key = nid if nid else f"{date_str}_{caption}"
        
        # Keep first occurrence (with file_url preference)
        if key not in unique_map:
            unique_map[key] = entry
        else:
            if entry.get("file_url") and not unique_map[key].get("file_url"):
                unique_map[key] = entry
    
    final_list = list(unique_map.values())
    
    # Sort by date descending (latest first)
    final_list.sort(key=lambda x: x.get("news_date", "1900-01-01"), reverse=True)
    
    # Save to company_filings/SYMBOL_filings.json
    wrapped_data = {"code": 0, "data": final_list}
    with open(f"company_filings/{symbol}_filings.json", "w") as f:
        json.dump(wrapped_data, f, indent=4)
Output Structure:
{
  "code": 0,
  "data": [
    {
      "news_id": "12345",
      "news_date": "2026-02-15 18:30:00",
      "caption": "Financial Results",
      "descriptor": "Financial Results",
      "file_url": "https://...",
      "cat": "Results"
    }
  ]
}
Performance:
  • 20 concurrent threads
  • ~5000 stocks in 60-90 seconds
  • Smart skip logic: only refreshes if FORCE_UPDATE=True or file missing
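The smart skip logic can be sketched as follows (the `force` parameter is an illustrative stand-in for the module-level FORCE_UPDATE flag):

```python
import os

FORCE_UPDATE = True

def should_fetch(symbol, out_dir="company_filings", force=FORCE_UPDATE):
    """Fetch only when a refresh is forced or the per-stock output file is missing."""
    path = os.path.join(out_dir, f"{symbol}_filings.json")
    return force or not os.path.exists(path)
```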

2. fetch_new_announcements.py

Purpose: Fetches live company announcements (events, results updates, material info). Configuration:
MAX_THREADS = 40  # Faster for small payloads
API Endpoint:
api_url = "https://ow-static-scanx.dhan.co/staticscanx/announcements"
payload = {"data": {"isin": "INE002A01018"}}
Core Logic:
def fetch_announcements(item):
    symbol = item.get("Symbol")
    isin = item.get("ISIN")
    name = item.get("Name")

    payload = {"data": {"isin": isin}}
    response = requests.post(api_url, json=payload, headers=headers, timeout=10)
    
    if response.status_code == 200:
        res_json = response.json()
        announcements = res_json.get("data")
        
        if announcements and isinstance(announcements, list):
            return [
                {
                    "Symbol": symbol,
                    "Name": name,
                    "Event": ann.get("events"),
                    "Date": ann.get("date"),
                    "Type": ann.get("type")
                } for ann in announcements
            ]
    return None
Output:
[
  {
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Ltd",
    "Event": "Quarterly Results are out",
    "Date": "2026-02-15 18:45:23",
    "Type": "Results Update"
  }
]
Saved to: all_company_announcements.json (consolidated, sorted by date descending)
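The consolidation step can be sketched as follows (assuming each worker returns a list like the one above, or None on failure):

```python
def consolidate(per_stock_results):
    """Flatten per-stock lists and sort newest-first, as in all_company_announcements.json."""
    all_rows = [row for rows in per_stock_results if rows for row in rows]
    all_rows.sort(key=lambda r: r.get("Date", ""), reverse=True)
    return all_rows
```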

3. fetch_advanced_indicators.py

Purpose: Fetches advanced technical indicators (SMA, EMA, RSI, MACD, Pivot Points). Configuration:
MAX_THREADS = 50  # Fast parallel execution
API Endpoint:
api_url = "https://ow-static-scanx.dhan.co/staticscanx/indicator"
payload = {
    "exchange": "NSE",
    "segment": "E",
    "security_id": "1234",  # Sid from master_isin_map
    "isin": "INE002A01018",
    "symbol": "RELIANCE",
    "minute": "D"  # Daily timeframe
}
Requires Sid (Security ID) from master_isin_map.json. Stocks without Sid are skipped.
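A minimal sketch of the Sid filter and the per-stock payload (helper names here are illustrative):

```python
def eligible(stocks):
    """Stocks without a Sid cannot be queried and are skipped."""
    return [s for s in stocks if s.get("Sid")]

def build_payload(stock):
    """Indicator request payload for one stock, daily timeframe."""
    return {
        "exchange": "NSE",
        "segment": "E",
        "security_id": str(stock["Sid"]),
        "isin": stock.get("ISIN"),
        "symbol": stock.get("Symbol"),
        "minute": "D",
    }
```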
Response Structure:
{
  "data": [
    {
      "EMA": [
        {"Indicator": "20-EMA", "Value": 2680.5, "Action": "Buy"},
        {"Indicator": "50-EMA", "Value": 2620.3, "Action": "Buy"},
        {"Indicator": "200-EMA", "Value": 2450.8, "Action": "Buy"}
      ],
      "SMA": [
        {"Indicator": "20-SMA", "Value": 2685.2, "Action": "Buy"},
        {"Indicator": "50-SMA", "Value": 2625.7, "Action": "Buy"},
        {"Indicator": "200-SMA", "Value": 2455.1, "Action": "Buy"}
      ],
      "Indicator": [
        {"Indicator": "RSI(14)", "Value": 58.3, "Action": "Buy"},
        {"Indicator": "MACD(12,26)", "Value": 12.5, "Action": "Buy"}
      ],
      "Pivot": [
        {
          "Classic": {
            "PP": 2745.0,
            "R1": 2780.5,
            "R2": 2816.0,
            "S1": 2709.5,
            "S2": 2674.0
          }
        }
      ]
    }
  ]
}
Saved Structure:
[
  {
    "Symbol": "RELIANCE",
    "EMA": [...],
    "SMA": [...],
    "TechnicalIndicators": [...],
    "Pivots": [...]
  }
]
Saved to: advanced_indicator_data.json

4. fetch_market_news.py

Purpose: Fetches last 50 market news items with sentiment analysis for each stock. Configuration:
MAX_THREADS = 15  # Conservative for this API
NEWS_LIMIT = 50   # News items per stock
API Endpoint:
url = "https://news-live.dhan.co/v2/news/getLiveNews"
payload = {
    "categories": ["ALL"],
    "page_no": 0,
    "limit": 50,
    "first_news_timeStamp": 0,
    "last_news_timeStamp": 0,
    "news_feed_type": "live",
    "stock_list": ["INE002A01018"],  # ISIN
    "entity_id": ""
}
Response Processing:
data = response.json()
news_items = data.get("data", {}).get("latest_news", [])

processed_news = []
for news in news_items:
    news_obj = news.get("news_object", {})
    processed_news.append({
        "Title": news_obj.get("title", ""),
        "Summary": news_obj.get("text", ""),
        "Sentiment": news_obj.get("overall_sentiment", "neutral"),
        "PublishDate": news.get("publish_date", 0),
        "Source": news.get("category", "")
    })
Output Structure:
{
  "Symbol": "RELIANCE",
  "ISIN": "INE002A01018",
  "News": [
    {
      "Title": "Reliance Q3 Results Beat Estimates",
      "Summary": "Reliance Industries reported...",
      "Sentiment": "positive",
      "PublishDate": 1709123456789,
      "Source": "Business"
    }
  ]
}
Saved to: market_news/SYMBOL_news.json (per-stock files)

5. fetch_corporate_actions.py

Purpose: Fetches upcoming and historical corporate actions (dividend, bonus, splits, buybacks, results). API Endpoint:
url = "https://ow-scanx-analytics.dhan.co/customscan/fetchdt"
Dual Time Windows: The script queries two ranges separately; the historical window (last 2 years) is built as:
start_date = (today - timedelta(days=730)).strftime("%Y-%m-%d")
end_date = (today - timedelta(days=1)).strftime("%Y-%m-%d")

payload = {
    "data": {
        "sort": "CorpAct.ExDate",
        "sorder": "asc",
        "count": 5000,
        "fields": [
            "CorpAct.ActType", "Sym", "DispSym", 
            "CorpAct.ExDate", "CorpAct.RecDate", "CorpAct.Note"
        ],
        "params": [
            {"field": "CorpAct.ExDate", "op": "lte", "val": end_date},
            {"field": "CorpAct.ExDate", "op": "gte", "val": start_date},
            {"field": "CorpAct.ActType", "op": "", "val": "BONUS,DIVIDEND,QUARTERLY RESULT ANNOUNCEMENT,SPLIT,RIGHTS,BUYBACK"}
        ]
    }
}
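The upcoming window (next 60 days, feeding upcoming_corporate_actions.json) can be built the same way; a sketch assuming the same date format:

```python
from datetime import date, timedelta

def upcoming_window(today=None):
    """Date range for the forward-looking corporate-actions query (next 60 days)."""
    today = today or date.today()
    start = today.strftime("%Y-%m-%d")
    end = (today + timedelta(days=60)).strftime("%Y-%m-%d")
    return start, end
```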
Flattening Logic:
for stock in raw_data:
    symbol = stock.get('Sym')
    name = stock.get('DispSym')
    actions = stock.get('CorpAct', [])
    
    for action in actions:
        ex_date = action.get('ExDate')
        if start_date <= ex_date <= end_date:
            flattened.append({
                "Symbol": symbol,
                "Name": name,
                "Type": action.get('ActType'),
                "ExDate": ex_date,
                "RecordDate": action.get('RecDate'),
                "Details": action.get('Note')
            })
Output Files:
  • history_corporate_actions.json - Last 2 years
  • upcoming_corporate_actions.json - Next 60 days

6. fetch_surveillance_lists.py

Purpose: Fetches NSE Additional Surveillance Measure (ASM) and Graded Surveillance Measure (GSM) lists. Multi-Source Strategy: This script tries three sources in order, falling back when one fails, to ensure data availability:
  1. Primary: Google Sheets Gviz API (fastest)
  2. Secondary: Next.js Direct JSON API
  3. Fallback: Web scraping with BeautifulSoup
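The fallback chain can be sketched generically (the fetcher callables stand in for the three sources above):

```python
def fetch_with_fallback(sources):
    """Try each (name, fetcher) pair in order; return the first non-empty result."""
    for name, fetcher in sources:
        try:
            data = fetcher()
            if data:
                return name, data
        except Exception:
            continue  # source down or malformed; try the next one
    return None, []
```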
Configuration:
lists_config = {
    "nse_asm_list.json": {
        "gid": "290894275",
        "web_url": "https://dhan.co/nse-asm-list/",
        "data_key": "nse-asm-list"
    },
    "nse_gsm_list.json": {
        "gid": "1525483995",
        "web_url": "https://dhan.co/nse-gsm-list/",
        "data_key": "nse-gsm-list"
    }
}
Source 1: Gviz API
url = f"https://docs.google.com/spreadsheets/d/1zqhM3geRNW_ZzEx62y0W5U2ZlaXxG-NDn0V8sJk5TQ4/gviz/tq?tqx=out:json&gid={gid}"

response = requests.get(url)
text = response.text
match = re.search(r'setResponse\((.*)\);', text)
data = json.loads(match.group(1))
rows = data.get('table', {}).get('rows', [])

for row in rows:
    c = row.get('c', [])
    if len(c) >= 5:
        cleaned_list.append({
            "Symbol": c[1].get('v'),
            "Name": c[2].get('v'),
            "ISIN": c[3].get('v'),
            "Stage": c[4].get('v')  # LTASM, STASM, etc.
        })
Output Structure:
[
  {
    "Symbol": "EXAMPLE",
    "Name": "Example Company Ltd",
    "ISIN": "INE123A01012",
    "Stage": "LTASM Stage 1"
  }
]

7. fetch_circuit_stocks.py

Purpose: Fetches stocks hitting upper or lower circuit limits. Configuration:
scans_config = {
    "upper_circuit_stocks.json": {
        "field": "LiveData.UpperCircuitBreak",
        "val": "1"
    },
    "lower_circuit_stocks.json": {
        "field": "LiveData.LowerCircuitBreak",
        "val": "1"
    }
}
API Payload:
payload = {
    "data": {
        "sort": "Mcap", "sorder": "desc", "count": 500,
        "fields": [
            "Sym", "DispSym", "Ltp", "PPerchange", "Mcap", 
            "Volume", "High5yr", "Low1Yr", "High1Yr", 
            "Pe", "Pb", "DivYeild"
        ],
        "params": [
            {"field": "LiveData.UpperCircuitBreak", "op": "", "val": "1"},
            {"field": "OgInst", "op": "", "val": "ES"},
            {"field": "Seg", "op": "", "val": "E"}
        ]
    }
}
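Driving both scans from scans_config might look like this (fetch_scan is an illustrative stand-in for the actual API call):

```python
import json

def run_circuit_scans(scans_config, fetch_scan):
    """Run one scan per output file; fetch_scan(field, val) returns the stock list."""
    for filename, cfg in scans_config.items():
        rows = fetch_scan(cfg["field"], cfg["val"])
        with open(filename, "w") as f:
            json.dump(rows, f, indent=4)
```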

8-11. Additional Fetchers

fetch_bulk_block_deals.py

Fetches recent bulk and block deals (large institutional trades).

fetch_incremental_price_bands.py

Fetches circuit limit revisions (bands changing from 5% to 2%, etc.).
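One way to detect such revisions, assuming the script compares a previous snapshot against the current bands (the field names here are hypothetical):

```python
def band_revisions(previous, current):
    """Report symbols whose circuit band changed between snapshots (e.g. 5% -> 2%)."""
    prev = {p["Symbol"]: p["Band"] for p in previous}
    return [
        {"Symbol": c["Symbol"], "OldBand": prev[c["Symbol"]], "NewBand": c["Band"]}
        for c in current
        if c["Symbol"] in prev and prev[c["Symbol"]] != c["Band"]
    ]
```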

fetch_complete_price_bands.py

Fetches current circuit limits for all stocks.

fetch_all_indices.py

Fetches index constituents for all major indices (NIFTY 50, NIFTY 500, sectoral indices).

Common Utilities (pipeline_utils.py)

Header Generation:
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...",
    # ... 5 total user agents
]

def get_headers(include_origin=False):
    h = {
        "Content-Type": "application/json",
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "application/json, text/plain, */*"
    }
    if include_origin:
        h["Origin"] = "https://scanx.dhan.co"
        h["Referer"] = "https://scanx.dhan.co/"
    return h
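These headers are typically paired with the retry-and-backoff behaviour described in the execution pattern; a sketch of a generic helper (post_with_retry is illustrative, not an actual pipeline_utils function):

```python
import random
import time

def post_with_retry(session_post, url, payload, headers, retries=3, backoff=1.5):
    """Retry a POST with exponential backoff plus jitter; returns None on failure."""
    for attempt in range(retries):
        try:
            resp = session_post(url, json=payload, headers=headers, timeout=10)
            if resp.status_code == 200:
                return resp
        except Exception:
            pass  # network error; fall through to backoff
        if attempt < retries - 1:
            time.sleep((backoff ** attempt) + random.random() * 0.1)
    return None
```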

Performance Summary

| Script | Threads | Avg Time | Output |
|---|---|---|---|
| fetch_company_filings.py | 20 | 60-90s | ~5000 JSON files |
| fetch_new_announcements.py | 40 | 30-45s | 1 consolidated JSON |
| fetch_advanced_indicators.py | 50 | 45-60s | 1 JSON (~5000 stocks) |
| fetch_market_news.py | 15 | 90-120s | ~5000 JSON files |
| fetch_corporate_actions.py | 1 | 5-8s | 2 JSON files |
| fetch_surveillance_lists.py | 1 | 3-5s | 2 JSON files |
| fetch_circuit_stocks.py | 1 | 2-3s | 2 JSON files |
| Others | Varies | 10-30s | Various |
| Total Phase 2 | - | 3-5 min | - |

Next Steps

Phase 3: Base Analysis

Learn how all of this enrichment data is merged into the master JSON
