DSPy ChainOfThought extraction with transcript windowing, candidate deduplication, and memory validation
The extraction pipeline transforms raw agent session transcripts into structured memory candidates using DSPy’s ChainOfThought prompting with transcript windowing.
Each window is processed independently, then candidates are merged.
Why overlapping windows?
Overlapping windows ensure that important context near window boundaries isn’t lost. A decision that straddles a boundary falls within the overlap region, so it can be extracted from both windows; the merge step then deduplicates it.
Example: With 300K-token windows and a 30K-token overlap, a 500K-token transcript becomes:
Window 1: tokens 0-300K
Window 2: tokens 270K-500K (30K overlap with Window 1)
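The window boundaries above follow from advancing each window's start by the window size minus the overlap. A minimal sketch of that arithmetic is shown here; the function name `split_into_windows` and its parameters are hypothetical and not taken from the Lerim source, which may compute windows differently.

```python
def split_into_windows(
    total_tokens: int, window_size: int, overlap: int
) -> list[tuple[int, int]]:
    """Return (start, end) token offsets for overlapping windows."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    if total_tokens <= 0:
        return []

    stride = window_size - overlap  # how far each window's start advances
    windows = []
    start = 0
    while True:
        end = min(start + window_size, total_tokens)
        windows.append((start, end))
        if end >= total_tokens:
            break  # the last window reaches the end of the transcript
        start += stride
    return windows


# The example from the text: 300K windows, 30K overlap, 500K transcript.
for start, end in split_into_windows(500_000, 300_000, 30_000):
    print(f"tokens {start}-{end}")
```

A transcript shorter than one window yields a single window, which corresponds to the single-window case where no merge is needed.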
When the transcript spans multiple windows, candidates are merged:
# src/lerim/memory/extract_pipeline.py:112-130
if len(windows) == 1:
    capture_dspy_cost(lm, history_start)
    return all_candidates

# Multiple windows: merge and deduplicate
merger = dspy.ChainOfThought(MemoryMergeSignature)
with dspy.context(lm=lm):
    merge_result = merger(candidates=all_candidates, metadata=meta)
capture_dspy_cost(lm, history_start)
merged = getattr(merge_result, "primitives", [])
if not isinstance(merged, list):
    return all_candidates
return [
    item.model_dump(mode="json", exclude_none=True)
    if isinstance(item, MemoryCandidate)
    else item
    for item in merged
    if isinstance(item, (MemoryCandidate, dict))
]
The merge step uses the same ChainOfThought prompting but operates on structured candidate data instead of raw transcripts. Because its input is a short list of candidates rather than the full transcript, it is much cheaper than re-processing the transcript.
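The list comprehension at the end of the excerpt normalizes whatever the merge step returns: model instances are converted to plain dicts, dicts pass through, and anything else is dropped. The sketch below reproduces that filtering with the standard library only. `CandidateStandIn` is a hypothetical dataclass used in place of `MemoryCandidate` (the `model_dump` call suggests a Pydantic model, though the source does not say so), and `normalize_merged` is an illustrative name, not a function from the pipeline.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class CandidateStandIn:
    """Hypothetical stand-in for MemoryCandidate (not the real model)."""

    title: str
    body: str
    confidence: Optional[float] = None


def normalize_merged(merged: Any, fallback: list) -> list:
    """Return plain dicts from a merge result, or the fallback if it is malformed."""
    if not isinstance(merged, list):
        return fallback  # mirrors the `return all_candidates` guard
    normalized = []
    for item in merged:
        if isinstance(item, CandidateStandIn):
            # Rough analogue of model_dump(exclude_none=True): drop None fields.
            normalized.append(
                {k: v for k, v in asdict(item).items() if v is not None}
            )
        elif isinstance(item, dict):
            normalized.append(item)
        # Anything else (strings, numbers, None) is silently dropped.
    return normalized


merged = [
    CandidateStandIn("Queue fix", "Heartbeat every 15s"),
    {"title": "raw dict"},
    "stray text",
]
print(normalize_merged(merged, fallback=[]))
```

Falling back to the unmerged candidates when the result is not a list means a malformed merge response costs duplicates rather than lost memories.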
Example: given the following transcript excerpt:
{"role":"user","content":"Queue jobs got stuck again. Heartbeat drift caused retries and duplicate claims."}
{"role":"assistant","content":"Fix worked: heartbeat every 15s, max_attempts=3, then dead_letter. Add metrics for retries and dead letters."}
The pipeline might extract:
{
  "primitive": "learning",
  "title": "Queue heartbeat timing fix",
  "body": "Jobs stuck due to heartbeat drift causing duplicate claims. Fixed by setting heartbeat interval to 15 seconds with max_attempts=3 and dead_letter queue for failures. Added metrics for monitoring retries and dead letters.",
  "kind": "procedure",
  "confidence": 0.85,
  "tags": ["queue", "reliability", "monitoring"],
  "evidence": "heartbeat every 15s, max_attempts=3, then dead_letter"
}
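Before a candidate like this is stored, it can be checked against the transcript it came from. The following is a minimal validation sketch, not Lerim's actual validator. The function names are hypothetical, and three rules are assumptions made for illustration: `primitive`, `title`, and `body` are treated as required, `confidence` is expected to lie in [0, 1], and `evidence` is expected to be a verbatim quote from a message's `content`.

```python
import json


def transcript_text(jsonl: str) -> str:
    """Join the `content` of each JSONL message into one searchable string."""
    messages = [json.loads(line) for line in jsonl.splitlines() if line.strip()]
    return "\n".join(str(m.get("content", "")) for m in messages)


def validate_candidate(candidate: dict, source_text: str) -> list[str]:
    """Return a list of problems; an empty list means the candidate passed."""
    problems = []
    # Assumption: these three fields are required and must be non-empty.
    for field in ("primitive", "title", "body"):
        value = candidate.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field}: expected a non-empty string")
    # Assumption: confidence is a probability-like score in [0, 1].
    confidence = candidate.get("confidence")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        problems.append("confidence: expected a number")
    elif not 0.0 <= confidence <= 1.0:
        problems.append("confidence: expected a value in [0, 1]")
    tags = candidate.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags: expected a list of strings")
    # Assumption: evidence, when present, quotes the transcript verbatim.
    evidence = candidate.get("evidence")
    if evidence is not None and (
        not isinstance(evidence, str) or evidence not in source_text
    ):
        problems.append("evidence: not found verbatim in the transcript")
    return problems


TRANSCRIPT = """\
{"role":"user","content":"Queue jobs got stuck again. Heartbeat drift caused retries and duplicate claims."}
{"role":"assistant","content":"Fix worked: heartbeat every 15s, max_attempts=3, then dead_letter. Add metrics for retries and dead letters."}
"""

candidate = {
    "primitive": "learning",
    "title": "Queue heartbeat timing fix",
    "body": "Jobs stuck due to heartbeat drift causing duplicate claims.",
    "kind": "procedure",
    "confidence": 0.85,
    "tags": ["queue", "reliability", "monitoring"],
    "evidence": "heartbeat every 15s, max_attempts=3, then dead_letter",
}

source = transcript_text(TRANSCRIPT)
print(validate_candidate(candidate, source))
```

Returning a list of problems instead of raising on the first one lets a caller log every issue with a candidate and decide whether to drop it or keep it with a lower confidence.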