The processor module transforms raw parsed data from the parser into a clean, normalized format suitable for AI model consumption or further analysis.

prepare_for_ai()

Normalizes parsed file data and computes aggregate statistics across the entire project. Location: docugen/core/processor.py:57
def prepare_for_ai(
    parsed_files: Mapping[str, Mapping[str, Any]]
) -> dict[str, Any]

Parameters

parsed_files
Mapping[str, Mapping[str, Any]]
required
Dictionary mapping file paths to parsed data structures (typically output from parse_project()). Each value should be a dictionary with classes, functions, metrics, and errors keys.

Returns

result
dict[str, Any]
Normalized project data with summary statistics and cleaned file data:
{
    "summary": {
        "file_count": int,        # Total number of files
        "class_count": int,       # Total classes across all files
        "method_count": int,      # Total methods across all classes
        "function_count": int,    # Total top-level functions
        "error_count": int        # Total parse errors
    },
    "files": [
        {
            "path": str,              # File path
            "classes": list[dict],    # Normalized classes
            "functions": list[dict],  # Normalized functions
            "metrics": dict,          # File metrics
            "errors": list[str]       # Non-empty error messages
        },
        # ... more files
    ]
}

Normalization Process

The function performs the following transformations:
  1. Text cleaning: All string values are stripped of whitespace
  2. Type safety: Missing values are replaced with appropriate defaults
  3. Error filtering: Only non-empty error messages are included
  4. Sorting: Files are sorted alphabetically by path
  5. Statistics: Computes aggregate counts across all files

Behavior

  • Processes files in sorted order by path
  • Empty or missing fields are normalized to empty strings, empty lists, or zero
  • Non-integer metrics are coerced to integers
  • All text values are cleaned using _as_clean_text()
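Taken together, the behavior above can be sketched as follows. This is an illustrative sketch, not the actual implementation: the function name `summarize` is hypothetical, and it only mirrors the documented counting, error filtering, and path sorting, without the per-record cleaning that the real helpers perform.

```python
from typing import Any, Mapping

def summarize(parsed_files: Mapping[str, Mapping[str, Any]]) -> dict[str, Any]:
    """Sketch of the documented summary/sorting behavior (hypothetical helper)."""
    summary = {"file_count": 0, "class_count": 0, "method_count": 0,
               "function_count": 0, "error_count": 0}
    files = []
    for path in sorted(parsed_files):  # files processed in sorted order by path
        data = parsed_files[path]
        classes = data.get("classes") or []      # missing fields -> empty defaults
        functions = data.get("functions") or []
        # only non-empty error messages are kept
        errors = [e.strip() for e in (data.get("errors") or []) if e and e.strip()]
        summary["file_count"] += 1
        summary["class_count"] += len(classes)
        summary["method_count"] += sum(len(c.get("methods") or []) for c in classes)
        summary["function_count"] += len(functions)
        summary["error_count"] += len(errors)
        files.append({"path": path, "classes": classes, "functions": functions,
                      "metrics": data.get("metrics") or {}, "errors": errors})
    return {"summary": summary, "files": files}
```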

Example

from docugen.core.scanner import scan_python_files
from docugen.core.parser import parse_project
from docugen.core.processor import prepare_for_ai

# Complete workflow: scan, parse, and prepare
files = scan_python_files("./my-project")
parsed = parse_project(files, root="./my-project")
ai_ready = prepare_for_ai(parsed)

# Access summary statistics
print("Project Summary:")
print(f"  Files: {ai_ready['summary']['file_count']}")
print(f"  Classes: {ai_ready['summary']['class_count']}")
print(f"  Functions: {ai_ready['summary']['function_count']}")
print(f"  Methods: {ai_ready['summary']['method_count']}")
print(f"  Errors: {ai_ready['summary']['error_count']}")

# Iterate through normalized files
for file_data in ai_ready["files"]:
    print(f"\n{file_data['path']}:")
    for cls in file_data["classes"]:
        print(f"  class {cls['name']}:")
        for method in cls["methods"]:
            args_str = ", ".join(arg["name"] for arg in method["args"])
            print(f"    def {method['name']}({args_str})")

Normalization Helper Functions

The module includes several internal helper functions that ensure data consistency:

_as_clean_text()

Location: docugen/core/processor.py:6
def _as_clean_text(value: Any) -> str
Converts any value to a clean string:
  • None → empty string
  • Strings → stripped of leading/trailing whitespace
  • Other types → converted to string representation
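A minimal sketch consistent with these three rules (illustrative only; the real helper may be written differently):

```python
from typing import Any

def as_clean_text(value: Any) -> str:
    """Sketch of the documented behavior: None -> "", strings stripped,
    everything else converted via str()."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value.strip()
    return str(value)
```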

_normalize_args()

Location: docugen/core/processor.py:14
def _normalize_args(
    args: list[Mapping[str, Any]] | None
) -> list[dict[str, str]]
Normalizes function argument lists to ensure all fields are clean strings:
[
    {
        "name": str,          # Parameter name (cleaned)
        "annotation": str,    # Type annotation (cleaned)
        "default": str,       # Default value (cleaned)
        "kind": str          # "positional", "keyword_only", etc.
    },
    # ... more args
]
  • Missing fields default to empty strings
  • kind defaults to "positional" if not specified
  • None input returns empty list
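The three rules above can be sketched like this (an illustrative sketch under the documented contract, not the actual source; `_clean` stands in for the module's `_as_clean_text()`):

```python
from typing import Any, Mapping

def _clean(value: Any) -> str:
    # stand-in for the module's _as_clean_text()
    return "" if value is None else (value.strip() if isinstance(value, str) else str(value))

def normalize_args(args: "list[Mapping[str, Any]] | None") -> "list[dict[str, str]]":
    """Sketch: missing fields -> empty strings, kind -> "positional", None -> []."""
    if args is None:
        return []
    return [
        {
            "name": _clean(arg.get("name")),
            "annotation": _clean(arg.get("annotation")),
            "default": _clean(arg.get("default")),
            "kind": _clean(arg.get("kind")) or "positional",  # default kind
        }
        for arg in args
    ]
```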

_normalize_function()

Location: docugen/core/processor.py:28
def _normalize_function(record: Mapping[str, Any]) -> dict[str, Any]
Normalizes a function or method record:
{
    "name": str,              # Function name (cleaned)
    "args": list[dict],       # Normalized arguments
    "returns": str,           # Return type annotation (cleaned)
    "docstring": str,         # Docstring (cleaned)
    "is_async": bool          # Async function flag
}

_normalize_class()

Location: docugen/core/processor.py:38
def _normalize_class(record: Mapping[str, Any]) -> dict[str, Any]
Normalizes a class definition:
{
    "name": str,              # Class name (cleaned)
    "bases": list[str],       # Base classes (cleaned)
    "docstring": str,         # Class docstring (cleaned)
    "methods": list[dict]     # Normalized methods
}

_normalize_metrics()

Location: docugen/core/processor.py:47
def _normalize_metrics(
    metrics: Mapping[str, Any] | None
) -> dict[str, int]
Normalizes metrics dictionary to ensure all values are integers:
{
    "line_count": int,
    "class_count": int,
    "method_count": int,
    "function_count": int
}
  • Missing metrics default to 0
  • Non-integer values are coerced to integers
  • None input is treated as empty dictionary
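The three rules above can be sketched as follows (illustrative only; the actual helper may handle coercion failures differently):

```python
from typing import Any, Mapping

METRIC_KEYS = ("line_count", "class_count", "method_count", "function_count")

def normalize_metrics(metrics: "Mapping[str, Any] | None") -> "dict[str, int]":
    """Sketch: None -> empty dict, missing keys -> 0, values coerced via int()."""
    metrics = metrics or {}
    out = {}
    for key in METRIC_KEYS:
        try:
            out[key] = int(metrics.get(key) or 0)
        except (TypeError, ValueError):
            out[key] = 0  # uncoercible values fall back to zero
    return out
```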

Usage Patterns

Full Pipeline

from docugen.core.scanner import scan_python_files
from docugen.core.parser import parse_project
from docugen.core.processor import prepare_for_ai

# 1. Discover Python files
files = scan_python_files("./project")

# 2. Parse files to extract structure
parsed = parse_project(files, root="./project")

# 3. Normalize and prepare for AI
ai_ready = prepare_for_ai(parsed)

# 4. Use the clean data
for file_data in ai_ready["files"]:
    if file_data["errors"]:
        print(f"Errors in {file_data['path']}: {file_data['errors']}")

Error Checking

ai_ready = prepare_for_ai(parsed_files)

if ai_ready["summary"]["error_count"] > 0:
    print("Files with errors:")
    for file_data in ai_ready["files"]:
        if file_data["errors"]:
            print(f"  {file_data['path']}: {file_data['errors']}")

Statistics Gathering

ai_ready = prepare_for_ai(parsed_files)
summary = ai_ready["summary"]

print("Project contains:")
print(f"  {summary['file_count']} files")
print(f"  {summary['class_count']} classes")
print(f"  {summary['function_count']} functions")
print(f"  {summary['method_count']} methods")

# Guard against an empty project before computing the average
if summary["file_count"]:
    avg_lines = sum(f["metrics"]["line_count"] for f in ai_ready["files"]) / summary["file_count"]
    print(f"  {avg_lines:.0f} average lines per file")
