DocuGen AI uses Python’s built-in ast (Abstract Syntax Tree) module to extract structured metadata from source code. This approach is more reliable than regex-based parsing because it parses the code with Python’s own grammar instead of matching text patterns.
The AST parser is implemented in docugen/core/parser.py and handles classes, functions, type annotations, docstrings, and more.
An Abstract Syntax Tree represents the syntactic structure of source code as a tree. Each node represents a construct in the code (class, function, expression, etc.).
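To make the tree structure concrete, here is a minimal sketch using only the standard library. The sample source string is made up for illustration and is not taken from DocuGen.

```python
import ast

# Illustrative input; any valid Python source works here.
source = "def greet(name: str) -> str:\n    return 'hi ' + name\n"
tree = ast.parse(source)

# The root is a Module node; its body holds the top-level statements.
top_level = [type(node).__name__ for node in tree.body]

# ast.walk visits every node in the tree (order is not guaranteed).
node_types = {type(node).__name__ for node in ast.walk(tree)}

print(top_level)
print(sorted(node_types))
```

The function definition, its return statement, and the string concatenation each appear as separate nodes (FunctionDef, Return, BinOp), which is what lets a parser query code by construct rather than by text.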
The parser uses errors="replace" to handle files with encoding issues gracefully.
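The exact read call in parser.py is not shown here, so the following is only a sketch of how errors="replace" behaves. The read_source helper and the choice of UTF-8 are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def read_source(path: Path) -> str:
    # Hypothetical helper: bytes that cannot be decoded become U+FFFD
    # (the Unicode replacement character) instead of raising an error.
    # The UTF-8 encoding is an assumption, not confirmed by the source.
    return path.read_text(encoding="utf-8", errors="replace")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.py"
    # 0xE9 is Latin-1 for an accented e and is invalid UTF-8 in this position.
    sample.write_bytes(b"x = 1  # caf\xe9\n")
    text = read_source(sample)

print(repr(text))
```

Without errors="replace", the same read would raise UnicodeDecodeError and the file could not be documented at all.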
Parse into AST (parser.py:114)
tree = ast.parse(source, filename=str(path))
Traverse Top-Level Nodes (parser.py:123-139)
for node in tree.body:
    if isinstance(node, ast.ClassDef):
        # Extract class metadata
    elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        # Extract function metadata
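The traversal above can be sketched as a self-contained script. The sample source and the shape of the collected entries are illustrative assumptions; the real parser records more fields per item.

```python
import ast

source = '''
class Greeter:
    def hello(self):
        pass

def main():
    pass

async def fetch():
    pass
'''

FUNC_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)
classes, functions = [], []

# Only tree.body is inspected, so nested definitions are not
# reported as module-level items.
for node in ast.parse(source).body:
    if isinstance(node, ast.ClassDef):
        methods = [item.name for item in node.body if isinstance(item, FUNC_TYPES)]
        classes.append({"name": node.name, "methods": methods})
    elif isinstance(node, FUNC_TYPES):
        functions.append(node.name)

print(classes)
print(functions)
```

Checking both ast.FunctionDef and ast.AsyncFunctionDef matters because async def produces a distinct node type that would otherwise be skipped.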
Calculate Metrics (parser.py:143-145)
result["metrics"]["class_count"] = len(classes)
result["metrics"]["function_count"] = len(functions)
result["metrics"]["method_count"] = sum(len(item["methods"]) for item in classes)
For each ast.ClassDef node, DocuGen extracts (parser.py:124-136):
{
    "name": node.name,
    "bases": [_safe_unparse(base) for base in node.bases if _safe_unparse(base)],
    "docstring": ast.get_docstring(node) or "",
    "methods": methods,
}
Fields:
name: Class name (e.g., GeminiClient)
bases: List of base classes (e.g., ["BaseModel", "ABC"])
docstring: The class-level docstring
methods: List of method metadata (see Functions below)
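Base classes are converted back to strings with ast.unparse() (through the _safe_unparse() helper), which generates source code equivalent to the AST node. The output is normalized, so whitespace and quoting may differ from the original file.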
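As a rough illustration of the class fields described above, the sketch below builds a similar dictionary for a sample class. It calls ast.unparse() directly (Python 3.9+) rather than DocuGen's helper, and the sample class is invented.

```python
import ast

source = '''
class GeminiClient(BaseModel, abc.ABC):
    """Client wrapper."""
'''

node = ast.parse(source).body[0]

class_info = {
    "name": node.name,
    # Each base is an expression node (Name, Attribute, ...), so it has
    # to be turned back into text before it can be shown in documentation.
    "bases": [ast.unparse(base) for base in node.bases],
    # get_docstring returns None when there is no docstring.
    "docstring": ast.get_docstring(node) or "",
}

print(class_info)
```

Note that the base classes never need to be importable: the parser only reads syntax, so names such as BaseModel are treated as plain text.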
Each argument includes its name, type annotation, default value, and kind. This granular metadata helps the AI understand function signatures completely.
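The argument extraction code is not reproduced here, so the following is a sketch of one way to collect this metadata from an ast.arguments node. The describe_args helper, the dictionary keys, and the kind labels are all hypothetical and may not match what parser.py emits.

```python
import ast


def describe_args(func):
    """Hypothetical helper: list name/annotation/default/kind per parameter."""
    spec = func.args

    def entry(arg, default, kind):
        return {
            "name": arg.arg,
            "annotation": ast.unparse(arg.annotation) if arg.annotation is not None else "",
            "default": ast.unparse(default) if default is not None else "",
            "kind": kind,
        }

    out = []
    positional = spec.posonlyargs + spec.args
    # spec.defaults lines up with the LAST positional parameters,
    # so pad the front with None for parameters without a default.
    padded = [None] * (len(positional) - len(spec.defaults)) + list(spec.defaults)
    for index, (arg, default) in enumerate(zip(positional, padded)):
        kind = "positional_only" if index < len(spec.posonlyargs) else "positional_or_keyword"
        out.append(entry(arg, default, kind))
    if spec.vararg is not None:
        out.append(entry(spec.vararg, None, "var_positional"))
    # kw_defaults has one slot per keyword-only parameter (None if no default).
    for arg, default in zip(spec.kwonlyargs, spec.kw_defaults):
        out.append(entry(arg, default, "keyword_only"))
    if spec.kwarg is not None:
        out.append(entry(spec.kwarg, None, "var_keyword"))
    return out


func = ast.parse('def f(a, /, b: int = 2, *args, key: str = "x", **kw): pass').body[0]
described = describe_args(func)
for item in described:
    print(item)
```

The alignment of defaults is the usual pitfall: ast stores defaults only for the trailing positional parameters, so pairing them from the front would attach values to the wrong names.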
The _safe_unparse() helper (parser.py:8-14) converts AST nodes back to strings:
def _safe_unparse(node: ast.AST | None) -> str:
    if node is None:
        return ""
    try:
        return ast.unparse(node)
    except Exception:
        return ""
Why is this needed?
Type annotations and default values are stored as AST nodes. To include them in documentation, we need to convert them back to readable strings:
ast.Name(id='str') → "str"
ast.Constant(value=25) → "25"
ast.Call(...) → "datetime.now()"
The try/except block ensures that even if unparsing fails (rare edge cases), the parser continues rather than crashing.
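The conversions listed above can be checked directly. This is a minimal sketch assuming Python 3.9 or newer, where ast.unparse() is available; unparse_expr is a hypothetical helper used only for this demonstration.

```python
import ast


def unparse_expr(code: str) -> str:
    # mode="eval" parses a single expression; .body is that expression's node.
    return ast.unparse(ast.parse(code, mode="eval").body)


annotation = unparse_expr("str")             # ast.Name
default = unparse_expr("25")                 # ast.Constant
call = unparse_expr("datetime.now()")        # ast.Call
normalized = unparse_expr("Optional[ int ]") # extra spaces are dropped

print(annotation, default, call, normalized)
```

The last case shows that the result is regenerated from the tree rather than copied from the file, so spacing is normalized.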
try:
    tree = ast.parse(source, filename=str(path))
except SyntaxError as exc:
    message = f"SyntaxError at line {exc.lineno}, column {exc.offset}: {exc.msg}"
    result["errors"].append(message)
    return result
When a file has syntax errors, the parser records the error details but continues processing other files. This allows documentation generation even for projects with incomplete or broken code.
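The effect across several files can be sketched as follows. The parse_source function and the file names are hypothetical stand-ins, and the result dictionary is simplified compared to what the real parser returns.

```python
import ast


def parse_source(source: str, filename: str = "<string>") -> dict:
    # Simplified stand-in for the real parser's per-file result.
    result = {"errors": [], "functions": []}
    try:
        tree = ast.parse(source, filename=filename)
    except SyntaxError as exc:
        result["errors"].append(
            f"SyntaxError at line {exc.lineno}, column {exc.offset}: {exc.msg}"
        )
        return result  # Partial result: the error is recorded, nothing is raised.
    result["functions"] = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return result


sources = {
    "broken.py": "def broken(:\n    pass\n",
    "ok.py": "def fine():\n    pass\n",
}
results = {name: parse_source(text, name) for name, text in sources.items()}

for name, result in results.items():
    print(name, result)
```

Because the error is returned as data instead of propagating, the loop over files never stops early, and the broken file can still be listed in the output with its error message.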
The parser calculates useful code metrics (parser.py:96-101):
"metrics": {
    "line_count": 0,      # Total lines in file
    "class_count": 0,     # Number of classes
    "method_count": 0,    # Total methods across all classes
    "function_count": 0,  # Module-level functions
}
These metrics help users understand the project size and complexity at a glance.
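As a worked example of the aggregation, the sketch below fills in the four metrics from already-extracted class and function lists. The compute_metrics helper is hypothetical, and counting lines with splitlines() is an assumption, since the line_count calculation is not shown in this section.

```python
def compute_metrics(source: str, classes: list, functions: list) -> dict:
    # Hypothetical helper mirroring the metric names used by the parser.
    return {
        # Assumption: line_count is the number of lines in the raw source.
        "line_count": len(source.splitlines()),
        "class_count": len(classes),
        "method_count": sum(len(item["methods"]) for item in classes),
        "function_count": len(functions),
    }


source = "class A:\n    def one(self): pass\n    def two(self): pass\n\ndef helper():\n    pass\n"
classes = [{"name": "A", "methods": [{"name": "one"}, {"name": "two"}]}]
functions = [{"name": "helper"}]

metrics = compute_metrics(source, classes, functions)
print(metrics)
```

Methods and module-level functions are counted separately, so a file with one class of two methods and one standalone function reports method_count 2 and function_count 1.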