PRD Parser
The PRD Parser is Phase 1 of the Omni Architect pipeline. It analyzes Product Requirements Documents (PRDs) written in Markdown and extracts semantic structure including features, user stories, domain entities, business flows, and acceptance criteria.
Purpose
The PRD Parser transforms unstructured PRD text into a machine-readable semantic structure that serves as the foundation for automated diagram generation and validation. It identifies:
- Features with priority and complexity ratings
- User Stories following the “As X, I want Y, so that Z” pattern
- Domain Entities and their attributes
- Business Flows with sequential steps
- Acceptance Criteria per feature
- Dependencies between features
- Personas and their characteristics
Inputs
Complete PRD content in Markdown format. The parser expects structured PRDs with heading levels (H1, H2, H3) and recognizable section patterns.
Outputs
Semantic structure (parsed_prd) containing extracted features, stories, entities, flows, requirements, acceptance criteria, dependencies, and personas.
Quality score (completeness_score) indicating how complete the PRD is, from 0.0 to 1.0. Scores below 0.6 trigger warnings with improvement suggestions.
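To make the shape of these outputs concrete, here is a minimal sketch of how they could be modeled. The class names `ParsedPRD` and `ParserResult`, the field types, and the `needs_attention` helper are all hypothetical; only the category names, the `warnings` array, and the 0.6 threshold come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedPRD:
    # Hypothetical container: one list per category named in the Outputs text.
    features: list[dict] = field(default_factory=list)
    stories: list[dict] = field(default_factory=list)
    entities: list[dict] = field(default_factory=list)
    flows: list[dict] = field(default_factory=list)
    requirements: list[dict] = field(default_factory=list)
    acceptance_criteria: list[dict] = field(default_factory=list)
    dependencies: list[dict] = field(default_factory=list)
    personas: list[dict] = field(default_factory=list)


@dataclass
class ParserResult:
    # Hypothetical wrapper pairing the semantic structure with its score.
    parsed_prd: ParsedPRD
    completeness_score: float  # expected range: 0.0 to 1.0
    warnings: list[str] = field(default_factory=list)

    def needs_attention(self) -> bool:
        """True when the score falls below the documented 0.6 threshold."""
        return self.completeness_score < 0.6
```

The real output may use plain dictionaries or a different nesting; the sketch only illustrates which pieces travel together.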
Algorithm
The parser follows a 7-step extraction process:
1. Tokenization
The PRD is tokenized by heading levels (H1, H2, H3) to establish document structure and hierarchy.
2. Semantic Classification
Each section is classified by type using pattern-matching heuristics:
| Pattern in Text | Classification |
|---|---|
| “Como [persona], quero…” or “As [persona], I want…” | User Story |
| “Requisito:”, “Deve…”, “Must…” | Functional Requirement |
| “Performance:”, “Security:”, “Availability:” | Non-Functional Requirement |
| Tables with attributes/fields | Domain Entity |
| “Fluxo:”, “Flow:”, numbered step lists | Business Flow |
| “Critério de aceite”, “Acceptance criteria”, checkboxes | Acceptance Criteria |
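The first two steps can be sketched as follows. This is an illustrative approximation, not the parser's actual code: the `Section` type, the function names, the exact regular expressions, and the order in which rules are tried are all assumptions. Only the heading levels and the pattern-to-classification pairs come from the table above.

```python
import re
from dataclasses import dataclass

HEADING_RE = re.compile(r"^(#{1,3})\s+(.*)$")  # H1 to H3 only


@dataclass
class Section:
    level: int
    title: str
    body: str


def tokenize(prd: str) -> list[Section]:
    """Split Markdown text into sections at H1-H3 headings."""
    sections: list[Section] = []
    current = None
    for line in prd.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = Section(len(match.group(1)), match.group(2).strip(), "")
            sections.append(current)
        elif current is not None:
            current.body += line + "\n"
    return sections


# (pattern, label) pairs mirroring the table above; first match wins.
# The priority order is an assumption made for this sketch.
RULES = [
    (re.compile(r"\b(?:Como .+?, quero|As .+?, I want)", re.I), "User Story"),
    (re.compile(r"crit[ée]rio de aceite|acceptance criteria|^\s*[-*] \[[ xX]\]",
                re.I | re.M), "Acceptance Criteria"),
    (re.compile(r"\b(?:Fluxo|Flow):|^\s*\d+\.\s", re.I | re.M), "Business Flow"),
    (re.compile(r"\b(?:Performance|Security|Availability):", re.I),
     "Non-Functional Requirement"),
    (re.compile(r"\bRequisito:|\b(?:Deve|Must)\b", re.I), "Functional Requirement"),
    (re.compile(r"^\s*\|.*\|\s*$", re.M), "Domain Entity"),
]


def classify(section: Section) -> str:
    """Return the first matching classification, or 'Unclassified'."""
    text = f"{section.title}\n{section.body}"
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "Unclassified"
```

Because the first matching rule wins, a section that mixes a user story with a numbered list is labeled a User Story here; a real classifier might instead assign several labels per section.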
3. Named Entity Recognition (NER)
Extracts domain entities by identifying nouns that appear consistently across the document, particularly in:
- Table column headers
- Entity relationship descriptions
- User story subjects and objects
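As a rough illustration of "nouns that appear consistently", the sketch below swaps real NER for a simple frequency heuristic: it keeps capitalized terms that occur in at least two sections. The function name, the stopword list, and the `min_sections` threshold are assumptions; a production parser would more likely use a part-of-speech tagger or an NLP model.

```python
import re
from collections import Counter

CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")
# Hypothetical stopword list: capitalized words that are not entities.
STOPWORDS = {"As", "The", "A", "An", "It", "Each"}


def candidate_entities(sections: list[str], min_sections: int = 2) -> list[str]:
    """Return capitalized terms that occur in at least `min_sections` sections."""
    seen: Counter = Counter()
    for text in sections:
        # Count each term once per section, so repetition inside one
        # section does not inflate its score.
        terms = set(CAPITALIZED.findall(text)) - STOPWORDS
        seen.update(terms)
    return sorted(term for term, count in seen.items() if count >= min_sections)
```

Counting per section rather than per occurrence is a deliberate choice: a term repeated ten times in one table says less about document-wide consistency than a term that appears once in three different places.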
4. Relationship Mapping
Maps relationships between entities based on:
- Explicit relationship statements (“User has many Orders”)
- Foreign key references in entity tables
- Implied relationships in user stories and flows
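The explicit-statement case is the easiest to illustrate. The sketch below recognizes three phrasings ("has many", "has one", "belongs to"); that vocabulary, the `Relationship` type, and the naive singularization are assumptions made for the example. Foreign-key and implied relationships are not covered.

```python
import re
from typing import NamedTuple


class Relationship(NamedTuple):
    source: str
    kind: str
    target: str


# Matches statements such as "User has many Orders".
REL_RE = re.compile(
    r"\b([A-Z]\w*) (has many|has one|belongs to) (?:an? |one )?([A-Z]\w*)"
)


def extract_relationships(text: str) -> list[Relationship]:
    """Extract explicit entity relationships from free text."""
    relationships = []
    for source, kind, target in REL_RE.findall(text):
        if kind == "has many" and target.endswith("s"):
            target = target[:-1]  # naive singularization: "Orders" -> "Order"
        relationships.append(Relationship(source, kind, target))
    return relationships
```

For example, `extract_relationships("User has many Orders.")` yields one relationship from `User` to `Order` of kind `has many`.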
5. Dependency Graph Calculation
Builds a directed acyclic graph (DAG) of feature dependencies by analyzing:
- Explicit “depends on” statements
- Sequential ordering in roadmaps
- Prerequisite mentions in acceptance criteria
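A minimal sketch of the graph construction, covering only explicit "depends on" statements written one per line. The function names and the one-statement-per-line assumption are illustrative. Ordering and cycle detection use `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
import re
from graphlib import CycleError, TopologicalSorter

# One statement per line, e.g. "Checkout depends on Cart."
DEPENDS_RE = re.compile(r"^(.+?) depends on (.+?)\.?$", re.I | re.M)


def build_dependency_graph(text: str) -> dict[str, set[str]]:
    """Map each feature to the set of features it depends on."""
    graph: dict[str, set[str]] = {}
    for feature, prerequisite in DEPENDS_RE.findall(text):
        graph.setdefault(feature.strip(), set()).add(prerequisite.strip())
        graph.setdefault(prerequisite.strip(), set())  # keep leaf nodes visible
    return graph


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return features with prerequisites first.

    Raises graphlib.CycleError when the dependencies are not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())
```

A `CycleError` here signals that the PRD describes circular dependencies, which would be worth reporting as a warning rather than silently breaking the cycle.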
6. Completeness Score Computation
Calculates a weighted score based on:
- Features defined (25%): Are features clearly documented?
- User stories present (20%): Are user stories complete?
- Entities documented (20%): Are domain entities specified?
- Acceptance criteria (15%): Do features have acceptance criteria?
- Flows defined (10%): Are business flows documented?
- Dependencies mapped (10%): Are feature dependencies clear?
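The weights above sum to 100%, so the score can be read as a weighted average of per-category coverage. The sketch below assumes each category is first reduced to a coverage value between 0.0 and 1.0. How the parser measures that coverage is not specified here, so the `coverage` input and the key names are hypothetical.

```python
# Weights taken from the list above.
WEIGHTS = {
    "features": 0.25,
    "user_stories": 0.20,
    "entities": 0.20,
    "acceptance_criteria": 0.15,
    "flows": 0.10,
    "dependencies": 0.10,
}


def completeness_score(coverage: dict[str, float]) -> float:
    """Weighted sum of per-category coverage, each clamped to [0.0, 1.0]."""
    total = 0.0
    for key, weight in WEIGHTS.items():
        value = min(max(coverage.get(key, 0.0), 0.0), 1.0)
        total += weight * value
    return round(total, 4)
```

As a worked example, a PRD with full coverage of features, user stories, and flows, half of its entities documented, and no acceptance criteria or dependencies scores 0.25 + 0.20 + 0.10 + 0.10 = 0.65, just above the 0.6 warning threshold.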
7. Warning Generation
If completeness_score < 0.6, the parser generates specific warnings with actionable suggestions:
- Missing sections
- Incomplete user stories
- Undefined entities
- Ambiguous requirements
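One way to picture this step: the earlier steps collect findings in the four categories above, and the warning generator turns them into messages only when the score is below the threshold. The `Findings` type and the message wording below are hypothetical; only the 0.6 threshold and the four categories come from this page.

```python
from dataclasses import dataclass, field

WARNING_THRESHOLD = 0.6


@dataclass
class Findings:
    # Hypothetical container for problems collected during parsing.
    missing_sections: list[str] = field(default_factory=list)
    incomplete_stories: list[str] = field(default_factory=list)
    undefined_entities: list[str] = field(default_factory=list)
    ambiguous_requirements: list[str] = field(default_factory=list)


def generate_warnings(score: float, findings: Findings) -> list[str]:
    """Return actionable warnings when the score is below the threshold."""
    if score >= WARNING_THRESHOLD:
        return []
    warnings = []
    for name in findings.missing_sections:
        warnings.append(f"Missing section '{name}': add it to the PRD.")
    for story in findings.incomplete_stories:
        warnings.append(f"Incomplete user story '{story}': state persona, goal, and benefit.")
    for entity in findings.undefined_entities:
        warnings.append(f"Undefined entity '{entity}': document its attributes in a table.")
    for requirement in findings.ambiguous_requirements:
        warnings.append(f"Ambiguous requirement '{requirement}': make it measurable.")
    return warnings
```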
Example Output
Best Practices
Write PRDs with Clear Structure
Organize your PRD using consistent heading levels.
Use Standard User Story Format
Document Entities with Tables
Define Clear Acceptance Criteria
Error Handling
| Scenario | Behavior |
|---|---|
| PRD is empty or too short | Returns error with minimum length requirement |
| No features detected | Emits warning, attempts to parse stories and entities |
| Ambiguous entities | Lists ambiguities in warnings array |
| Score < 0.6 | Continues processing but includes detailed improvement suggestions |
| Invalid Markdown | Attempts graceful parsing, warns about malformed sections |
Integration
The PRD Parser is invoked automatically as Phase 1 when running the full Omni Architect pipeline. The parsed_prd output flows directly into the Mermaid Generator (Phase 2).