Overview
Phase 2 consists of 11 parallel data enrichment scripts that fetch additional context for each stock. All scripts depend on master_isin_map.json but are independent of each other, so they can run concurrently.
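Because the scripts do not depend on each other, they could be launched side by side. The following is a minimal sketch of that idea, not the pipeline's actual runner: `run_script`, `run_all`, and the `max_workers` default are hypothetical names chosen for illustration, and the script list is a subset taken from this page.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Subset of Phase 2 script names from this page (list shortened for the sketch).
PHASE2_SCRIPTS = [
    "fetch_company_filings.py",
    "fetch_new_announcements.py",
    "fetch_advanced_indicators.py",
]


def run_script(path):
    """Run one script in a child interpreter and return (path, exit code)."""
    result = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return path, result.returncode


def run_all(scripts, runner=run_script, max_workers=4):
    """Launch every script concurrently and collect exit codes keyed by script."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(runner, scripts))
```

Calling `run_all(PHASE2_SCRIPTS)` would return a mapping such as script name to exit code, where a non-zero code marks a script that needs a rerun.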
Script Execution Pattern
All Phase 2 scripts follow a common pattern:
- Load master_isin_map.json - Get list of all ISINs/Symbols
- Multi-threaded fetching - Use ThreadPoolExecutor for parallel requests
- Rate limiting - Respect API limits with delays and retry logic
- Output to JSON - Save structured data for Phase 4 processing
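The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the scripts' real code: master_isin_map.json is assumed to be a JSON object keyed by ISIN or symbol, and `fetch_one`, `fetch_with_retry`, `fetch_all`, and `save_json` are hypothetical helper names.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


def load_keys(map_path="master_isin_map.json"):
    """Return the top-level keys of the map (assumed to be a JSON object)."""
    with open(map_path, encoding="utf-8") as fh:
        return list(json.load(fh))


def fetch_with_retry(fetch_one, key, retries=3, delay=0.5):
    """Call fetch_one(key), sleeping a little longer after each failure."""
    for attempt in range(1, retries + 1):
        try:
            return fetch_one(key)
        except Exception:
            if attempt == retries:
                return None  # give up on this key, keep the batch going
            time.sleep(delay * attempt)  # simple linear backoff


def fetch_all(keys, fetch_one, max_workers=20, retries=3, delay=0.5):
    """Fetch every key in parallel and keep only the successful results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda k: fetch_with_retry(fetch_one, k, retries, delay), keys
        )
        return {k: r for k, r in zip(keys, results) if r is not None}


def save_json(data, out_path):
    """Write the structured result for later phases to read."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
```

Returning `None` for a key that keeps failing lets one bad symbol drop out without aborting the rest of the batch.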
1. fetch_company_filings.py
Purpose: Fetches regulatory filings (quarterly results, board meetings, announcements) from both legacy and new LODR endpoints.
Configuration:
- 20 concurrent threads
- ~5000 stocks in 60-90 seconds
- Smart skip logic: only refreshes if FORCE_UPDATE=True or file missing
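The skip rule can be expressed in a few lines. This is a sketch of the rule as stated above; the function name `needs_refresh` and the default value of the flag are assumptions, and only the flag name FORCE_UPDATE comes from this page.

```python
import os

FORCE_UPDATE = False  # flag name from this page; the default here is assumed


def needs_refresh(out_path, force_update=FORCE_UPDATE):
    """Refresh when forced, or when the output file does not exist yet."""
    return force_update or not os.path.exists(out_path)
```

A per-stock loop would call `needs_refresh` before each request and skip the fetch when it returns `False`.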
2. fetch_new_announcements.py
Purpose: Fetches live company announcements (events, results updates, material info).
Configuration:
- Output: all_company_announcements.json (consolidated, sorted by date descending)
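One way to produce a consolidated file sorted by date descending is sketched below. The input shape (a dict of per-company lists), the `date` field name, and the ISO date format are all assumptions made for illustration; the real announcement records may differ.

```python
from datetime import datetime


def consolidate(per_company, date_key="date", date_format="%Y-%m-%d"):
    """Flatten per-company lists into one list, newest announcement first."""
    merged = [item for items in per_company.values() for item in items]
    merged.sort(
        key=lambda a: datetime.strptime(a[date_key], date_format),
        reverse=True,
    )
    return merged
```

Parsing the date before sorting avoids ordering mistakes that plain string comparison would make with non-ISO formats.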
3. fetch_advanced_indicators.py
Purpose: Fetches advanced technical indicators (SMA, EMA, RSI, MACD, Pivot Points).
Configuration:
- Output: advanced_indicator_data.json
4. fetch_market_news.py
Purpose: Fetches last 50 market news items with sentiment analysis for each stock.
Configuration:
- Output: market_news/SYMBOL_news.json (per-stock files)
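The per-stock file layout can be illustrated with a small helper. This is a sketch only: `news_path`, `trim_news`, and `save_news` are hypothetical names, and the input list is assumed to be ordered newest first so that trimming keeps the latest 50 items.

```python
import json
from pathlib import Path


def news_path(symbol, out_dir="market_news"):
    """Build the per-stock file path, e.g. market_news/SYMBOL_news.json."""
    return Path(out_dir) / f"{symbol}_news.json"


def trim_news(items, limit=50):
    """Keep the first `limit` items (assumes the list is newest first)."""
    return items[:limit]


def save_news(symbol, items, out_dir="market_news", limit=50):
    """Write one stock's trimmed news list to its own JSON file."""
    path = news_path(symbol, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(trim_news(items, limit), indent=2), encoding="utf-8")
    return path
```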
5. fetch_corporate_actions.py
Purpose: Fetches upcoming and historical corporate actions (dividend, bonus, splits, buybacks, results).
API Endpoint:
Output:
- history_corporate_actions.json - Last 2 years
- upcoming_corporate_actions.json - Next 60 days
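If the two date windows were applied on the client side, the split could look like the sketch below. The API may well do this filtering itself; the `ex_date` field name, the ISO date format, and the approximation of 2 years as 730 days are assumptions for illustration.

```python
from datetime import date, timedelta


def split_actions(actions, today, history_days=730, upcoming_days=60,
                  date_key="ex_date"):
    """Split actions into (history, upcoming) lists around `today`."""
    history_start = today - timedelta(days=history_days)
    upcoming_end = today + timedelta(days=upcoming_days)
    history, upcoming = [], []
    for action in actions:
        when = date.fromisoformat(action[date_key])
        if history_start <= when < today:
            history.append(action)      # within the last ~2 years
        elif today <= when <= upcoming_end:
            upcoming.append(action)     # within the next 60 days
    return history, upcoming
```

Actions outside both windows are dropped, so each output file stays bounded in size.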
6. fetch_surveillance_lists.py
Purpose: Fetches NSE Additional Surveillance Measure (ASM) and Graded Surveillance Measure (GSM) lists.
Multi-Source Strategy: This script uses 3 fallback sources to ensure data availability:
- Primary: Google Sheets Gviz API (fastest)
- Secondary: Next.js Direct JSON API
- Fallback: Web scraping with BeautifulSoup
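The ordered fallback described above can be sketched generically. The helper name `fetch_with_fallback` is hypothetical, and each source is represented by a plain callable; the real fetchers for the Gviz API, the JSON API, and the scraper are not shown on this page, so they are not reproduced here.

```python
def fetch_with_fallback(sources):
    """Try each (name, fetcher) pair in order; return the first non-empty result."""
    errors = []
    for name, fetcher in sources:
        try:
            data = fetcher()
            if data:
                return name, data
            errors.append(f"{name}: empty result")
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    # Every source failed or returned nothing; surface all reasons at once.
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

Treating an empty result like a failure means a source that responds but returns no rows still triggers the next one in the chain.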
7. fetch_circuit_stocks.py
Purpose: Fetches stocks hitting upper or lower circuit limits.
Configuration:
8-11. Additional Fetchers
fetch_bulk_block_deals.py
Fetches recent bulk and block deals (large institutional trades).
fetch_incremental_price_bands.py
Fetches circuit limit revisions (bands changing from 5% to 2%, etc.).
fetch_complete_price_bands.py
Fetches current circuit limits for all stocks.
fetch_all_indices.py
Fetches index constituents for all major indices (NIFTY 50, NIFTY 500, sectoral indices).
Common Utilities (pipeline_utils.py)
Header Generation:
Performance Summary
| Script | Threads | Avg Time | Output |
|---|---|---|---|
| fetch_company_filings.py | 20 | 60-90s | ~5000 JSON files |
| fetch_new_announcements.py | 40 | 30-45s | 1 consolidated JSON |
| fetch_advanced_indicators.py | 50 | 45-60s | 1 JSON (~5000 stocks) |
| fetch_market_news.py | 15 | 90-120s | ~5000 JSON files |
| fetch_corporate_actions.py | 1 | 5-8s | 2 JSON files |
| fetch_surveillance_lists.py | 1 | 3-5s | 2 JSON files |
| fetch_circuit_stocks.py | 1 | 2-3s | 2 JSON files |
| Others | Varies | 10-30s | Various |
| Total Phase 2 | - | 3-5 min | - |
Next Steps
Phase 3: Base Analysis
Learn how all this data is merged into the master JSON