Overview
The PRD Parser is Phase 1 of the Omni Architect pipeline. It transforms unstructured Markdown PRD documents into a rich semantic structure that identifies features, user stories, domain entities, business flows, and acceptance criteria.
Version: 1.0.0
Author: fabioeloi
Pipeline Phase: 1 of 5
Purpose
The PRD Parser solves the critical problem of extracting actionable structure from natural-language requirements. By tokenizing and classifying PRD sections, it creates a machine-readable representation that serves as the foundation for automated diagram generation and validation.
Inputs & Outputs
Inputs
Complete PRD content in Markdown format. It should include headings, user stories, entity descriptions, and business flows.
Outputs
Semantic structure containing:
- features: List of features, each with a priority and complexity
- user_stories: Stories in “As X, I want Y, so that Z” format
- entities: Domain entities with attributes and relationships
- flows: Business flows with sequential steps
- requirements: Functional and non-functional requirements
- acceptance_criteria: Acceptance criteria per feature
- dependencies: Feature dependency graph
- personas: Identified user personas
PRD completeness score ranging from 0.0 to 1.0. Scores below 0.6 trigger warnings with improvement suggestions.
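The output fields above can be pictured as one typed record. This is a minimal sketch assuming Python 3.9+ dataclasses. The class names (`Feature`, `UserStory`, `Entity`, `ParsedPRD`), the field types, and the `warnings()` helper are hypothetical and are not the Omni Architect schema. Only the field names and the 0.6 warning threshold come from this page.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record types mirroring the documented output fields.
WARNING_THRESHOLD = 0.6  # documented: scores below 0.6 trigger warnings


@dataclass
class Feature:
    name: str
    priority: str = "unspecified"    # label set is an assumption
    complexity: str = "unspecified"


@dataclass
class UserStory:
    persona: str
    goal: str
    benefit: Optional[str] = None


@dataclass
class Entity:
    name: str
    attributes: list[str] = field(default_factory=list)
    relationships: list[str] = field(default_factory=list)


@dataclass
class ParsedPRD:
    features: list[Feature] = field(default_factory=list)
    user_stories: list[UserStory] = field(default_factory=list)
    entities: list[Entity] = field(default_factory=list)
    flows: dict[str, list[str]] = field(default_factory=dict)  # flow name -> ordered steps
    requirements: dict[str, list[str]] = field(
        default_factory=lambda: {"functional": [], "non_functional": []}
    )
    acceptance_criteria: dict[str, list[str]] = field(default_factory=dict)  # feature -> criteria
    dependencies: dict[str, set[str]] = field(default_factory=dict)  # feature -> features it needs
    personas: list[str] = field(default_factory=list)
    completeness: float = 0.0

    def warnings(self) -> list[str]:
        """Return a warning when the completeness score is below the threshold."""
        if self.completeness < WARNING_THRESHOLD:
            return [f"Completeness {self.completeness:.2f} is below {WARNING_THRESHOLD}; review the PRD."]
        return []
```

Using `default_factory` gives every parsed PRD its own lists and dicts. A plain mutable default would be shared between instances.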
Algorithm
The parser follows a multi-stage extraction process:
Tokenize PRD
Split the document into sections by heading levels (H1, H2, H3) to create a hierarchical structure.
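This is a minimal sketch of the tokenizing step. It assumes ATX-style headings (`#`, `##`, `###`) and Python 3.9+. The `Section` type and the `tokenize` function are hypothetical names. H4 and deeper headings stay in their parent section's body, and `#` lines inside fenced code blocks are not special-cased.

```python
import re
from dataclasses import dataclass, field

# H1-H3 only; deeper headings fail to match and remain body text.
HEADING = re.compile(r"^(#{1,3})\s+(.+?)\s*#*\s*$")


@dataclass
class Section:
    level: int   # 0 for the synthetic root, 1-3 for H1-H3
    title: str
    body: list[str] = field(default_factory=list)
    children: list["Section"] = field(default_factory=list)


def tokenize(markdown: str) -> Section:
    """Split a Markdown PRD into a tree of sections keyed by heading level."""
    root = Section(0, "<root>")
    stack = [root]  # path from the root to the section currently being filled
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            stack[-1].body.append(line)
            continue
        level = len(match.group(1))
        # Close every open section at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        section = Section(level, match.group(2))
        stack[-1].children.append(section)
        stack.append(section)
    return root
```

The stack holds the chain of currently open sections. This lets one linear pass attach each heading to the nearest shallower parent.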
Classify Sections
Apply semantic classification to each section using pattern matching heuristics (feature, story, requirement, entity, flow).
Extract Named Entities
Perform Named Entity Recognition (NER) to identify domain-specific entities and concepts.
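This page does not say which NER technique the parser uses. The sketch below is a narrower stand-in that covers only the entity convention the page describes: a Markdown table whose columns are the entity's attributes. It takes the section title as the entity name and the header row of the first pipe table as the attribute list. `entity_from_table` and `split_row` are hypothetical helpers. This is plain table parsing, not NER. It ignores escaped pipes and tables written without leading pipes.

```python
import re
from typing import Optional

# Matches a GFM delimiter row such as |---|:---:|---|
DELIMITER = re.compile(r"^\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?$")


def split_row(line: str) -> list[str]:
    # Drop the outer pipes, then split on the inner ones.
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def entity_from_table(name: str, lines: list[str]) -> Optional[dict]:
    """Treat the header row of a section's first pipe table as entity attributes."""
    rows = [line.strip() for line in lines if line.strip().startswith("|")]
    # A real table needs a header row followed by a delimiter row.
    if len(rows) < 2 or not DELIMITER.match(rows[1]):
        return None
    return {"name": name, "attributes": split_row(rows[0])}
```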
Calculate Dependency Graph
Build a directed graph of dependencies between features based on explicit references and implicit relationships.
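The sketch below shows one way to handle the "explicit references" part of this step. It assumes each feature has a name and a free-text description. An edge is added whenever a description names another feature as a whole word. The result is then ordered with Python's standard `graphlib` module (3.9+), which also reports circular dependencies. Implicit relationships are not modeled, and `dependency_graph` and `build_order` are hypothetical names.

```python
import re
from graphlib import CycleError, TopologicalSorter


def dependency_graph(features: dict[str, str]) -> dict[str, set[str]]:
    """Map each feature to the other features its description names explicitly."""
    graph: dict[str, set[str]] = {}
    for name, description in features.items():
        graph[name] = {
            other
            for other in features
            if other != name
            and re.search(rf"\b{re.escape(other)}\b", description, re.IGNORECASE)
        }
    return graph


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return features with every dependency listed before its dependents."""
    try:
        # TopologicalSorter expects node -> predecessors, which matches the graph above.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes of the detected cycle.
        raise ValueError(f"circular feature dependency: {err.args[1]}") from err
```

Turning a cycle into an explicit error matches the spirit of the "Document Dependencies" advice on this page: an ambiguous or circular dependency is something the PRD author should resolve.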
Compute Completeness Score
Calculate a completeness score (0.0 - 1.0) based on the presence of key sections and the depth of detail.
Classification Heuristics
The parser uses pattern matching to classify PRD sections:
| Pattern in Text | Classification | Example |
|---|---|---|
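| "Como [persona], quero…" ("As [persona], I want…") | User Story | "Como cliente, quero visualizar meu histórico de pedidos" ("As a customer, I want to view my order history") |
| "Requisito:", "Deve…" ("Requirement:", "Must…") | Functional Requirement | "O sistema deve validar CPF no cadastro" ("The system must validate the CPF at registration") |
| "Performance:", "Segurança:" ("Security:") | Non-Functional Requirement | "Performance: Tempo de resposta < 200ms" ("Performance: response time < 200 ms") |
| Tables with attribute columns | Domain Entity | Table with columns: id, name, email, role |
| "Fluxo:" ("Flow:"), numbered step lists | Business Flow | "Fluxo de checkout: 1. Adicionar ao carrinho…" ("Checkout flow: 1. Add to cart…") |
| "Critério de aceite" ("Acceptance criterion"), checkboxes | Acceptance Criteria | "- [x] Validação de email implementada" ("- [x] Email validation implemented") |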
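The heuristics in the table above translate into an ordered list of regular expressions. This sketch is an illustration under assumptions, not the parser's actual rule set. The label names, the rule order (first match wins), and the reading of "Deve…" as any text containing the word "deve" are choices made here. Only the Portuguese cue words from the table are matched.

```python
import re

# Hypothetical rule order: first match wins, so more specific cues come first.
RULES = [
    ("user_story", re.compile(r"^\s*como\s+[^,]+,\s*quero\b", re.I | re.M)),
    ("acceptance_criteria", re.compile(r"crit[ée]rio de aceite|^\s*[-*]\s+\[[ xX]\]", re.I | re.M)),
    ("non_functional_requirement", re.compile(r"^\s*(performance|segurança|seguranca)\s*:", re.I | re.M)),
    ("business_flow", re.compile(r"\bfluxo\b|^\s*\d+\.\s", re.I | re.M)),
    ("domain_entity", re.compile(r"^\s*\|.*\|\s*$", re.M)),
    ("functional_requirement", re.compile(r"^\s*requisito\s*:|\bdeve\b", re.I | re.M)),
]


def classify(text: str) -> str:
    """Return the label of the first rule whose pattern occurs in the text."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unclassified"
```

Ordering matters. For example, "Performance: o sistema deve responder…" contains "deve", but it is caught by the non-functional rule before the functional rule is tried.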
Example Output
Completeness Scoring
The parser evaluates PRD quality based on:
- Feature Coverage: Are all major features described?
- Story Depth: Do user stories follow the standard format?
- Entity Definitions: Are domain entities clearly defined with attributes?
- Flow Documentation: Are business flows documented with steps?
- Acceptance Criteria: Are testable acceptance criteria provided?
- Dependency Clarity: Are feature dependencies explicitly stated?
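This page does not publish a formula for combining these criteria. The sketch below assumes each criterion is first reduced to a sub-score between 0 and 1, for example the fraction of user stories written in the standard format. By default the six sub-scores are averaged with equal weights. The criterion keys, `ratio`, and `completeness` are hypothetical.

```python
from typing import Optional

# The six criteria listed above; equal default weights are an assumption.
CRITERIA = (
    "feature_coverage", "story_depth", "entity_definitions",
    "flow_documentation", "acceptance_criteria", "dependency_clarity",
)


def ratio(satisfied: int, total: int) -> float:
    """Fraction of items meeting a criterion; an empty category earns 0."""
    return satisfied / total if total else 0.0


def completeness(sub_scores: dict[str, float], weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of per-criterion sub-scores, rounded to two decimals."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights[c] for c in CRITERIA)
    weighted = sum(
        weights[c] * min(max(sub_scores.get(c, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        for c in CRITERIA
    )
    return round(weighted / total, 2)


# Example: 3 of 4 stories well-formed, 3 of 5 features with acceptance criteria.
score = completeness({
    "feature_coverage": 1.0,
    "story_depth": ratio(3, 4),
    "entity_definitions": ratio(2, 2),
    "flow_documentation": ratio(1, 2),
    "acceptance_criteria": ratio(3, 5),
    "dependency_clarity": ratio(0, 0),
})
print(score)  # 0.64
```

With these sample inputs the score lands just above the 0.6 warning threshold. The missing dependency information alone costs a sixth of the maximum.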
Score Interpretation
| Score Range | Assessment | Action |
|---|---|---|
| 0.85 - 1.0 | Excellent | Proceed with confidence |
| 0.70 - 0.84 | Good | Minor improvements suggested |
| 0.60 - 0.69 | Fair | Review warnings carefully |
| 0.0 - 0.59 | Poor | Significant PRD improvements needed |
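The table can be applied mechanically by checking lower bounds from the top band down. Using lower bounds is an interpretation made here. It makes the bands contiguous, so a score such as 0.845, which falls between the printed ranges, still gets a label (here, Good). `interpret` is a hypothetical helper.

```python
# Lower bounds from the Score Interpretation table, highest band first.
BANDS = (
    (0.85, "Excellent", "Proceed with confidence"),
    (0.70, "Good", "Minor improvements suggested"),
    (0.60, "Fair", "Review warnings carefully"),
)


def interpret(score: float) -> tuple[str, str]:
    """Map a completeness score to its (assessment, action) pair."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    for lower_bound, assessment, action in BANDS:
        if score >= lower_bound:
            return assessment, action
    # Anything below 0.60 falls into the lowest band.
    return "Poor", "Significant PRD improvements needed"
```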
Usage in Pipeline
The PRD Parser is automatically invoked as Phase 1 when running the full Omni Architect pipeline.
Best Practices
Use Structured Headings
Organize the PRD with a clear H1/H2/H3 hierarchy, since the parser splits sections on these heading levels.
Write Explicit User Stories
Follow the “Como X, quero Y, para Z” format (Portuguese for “As X, I want Y, so that Z”) so that stories are extracted accurately.
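Story extraction can go one step beyond classification and split a story into persona, goal, and benefit. This is a minimal sketch that assumes each story fits on one line. It accepts either the Portuguese template above or the English "As X, I want Y, so that Z" template from the Outputs list. `parse_story` and its dictionary keys are hypothetical.

```python
import re
from typing import Optional

# "Como X, quero Y, para Z" with the ", para Z" clause optional.
STORY_PT = re.compile(
    r"^\s*como\s+(?P<persona>[^,]+),\s*quero\s+(?P<goal>.+?)"
    r"(?:,\s*para\s+(?P<benefit>.+?))?\s*\.?\s*$",
    re.I,
)
# "As (a|an) X, I want (to) Y, so that Z" with the benefit clause optional.
STORY_EN = re.compile(
    r"^\s*as\s+(?:an?\s+)?(?P<persona>[^,]+),\s*i\s+want\s+(?:to\s+)?(?P<goal>.+?)"
    r"(?:,\s*so\s+that\s+(?P<benefit>.+?))?\s*\.?\s*$",
    re.I,
)


def parse_story(text: str) -> Optional[dict]:
    """Split a one-line user story into persona, goal, and benefit (or None)."""
    for pattern in (STORY_PT, STORY_EN):
        match = pattern.match(text.strip())
        if match:
            return {key: (value.strip() if value else None) for key, value in match.groupdict().items()}
    return None
```

The goal group is non-greedy, so it stops at the first ", para" (or ", so that") that leads to a full match. Commas inside the goal are still allowed.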
Define Entities in Tables
Use Markdown tables with columns for attributes to enable entity recognition.
Document Dependencies
Explicitly state which features depend on others so the parser can build an accurate dependency graph.
Next Phase
Once parsing is complete, the structured PRD is passed to:
Phase 2: Mermaid Generator
Automatically generate flowcharts, sequence diagrams, ER diagrams, and more from the parsed structure.