The scanner module provides functionality to discover Python files in a directory tree while respecting .gitignore patterns and default exclusions.

scan_python_files()

Scans a directory tree for Python files while respecting .gitignore rules.
Location: docugen/core/scanner.py:94
def scan_python_files(root_path: str | Path) -> list[Path]

Parameters

root_path
str | Path
required
Path to the root directory or single Python file to scan. Accepts both string paths and Path objects. Supports ~ for home directory expansion.

Returns

files
list[Path]
Sorted list of absolute Path objects for all discovered Python files. Returns an empty list if no .py files are found.

Behavior

  • Single file mode: If root_path points to a file, returns a list containing that file only if it has a .py extension
  • Directory mode: Recursively walks the directory tree and discovers all .py files
  • Automatic filtering: Excludes directories listed in DEFAULT_IGNORED_DIRS and paths matching patterns from .gitignore
  • Path resolution: Expands ~ and resolves to absolute paths
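
The behavior listed above can be approximated with a short standard-library sketch. This is not the docugen implementation: sketch_scan and IGNORED_DIRS are hypothetical names, and .gitignore handling is left out to keep the example small.

```python
from __future__ import annotations

import os
from pathlib import Path

# Assumed copy of the documented default exclusions, for illustration only.
IGNORED_DIRS = {
    "__pycache__", ".git", ".venv", "venv",
    ".mypy_cache", ".pytest_cache", "build", "dist",
}


def sketch_scan(root_path: str | Path) -> list[Path]:
    """Hypothetical stand-in for scan_python_files, without .gitignore support."""
    # Path resolution: expand ~ and convert to an absolute path.
    root = Path(root_path).expanduser().resolve()
    if not root.exists():
        raise FileNotFoundError(f"Path does not exist: {root}")

    # Single file mode: keep the file only if it is a Python source file.
    if root.is_file():
        return [root] if root.suffix == ".py" else []

    # Directory mode: walk the tree and collect every .py file.
    found: list[Path] = []
    for current, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        found.extend(Path(current) / name for name in filenames if name.endswith(".py"))
    return sorted(found)
```

Pruning dirnames in place means excluded trees such as .venv are never traversed, rather than being walked and filtered afterwards.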

Raises

  • FileNotFoundError: If the specified path does not exist

Example

from docugen.core.scanner import scan_python_files

# Scan entire project
files = scan_python_files("~/my-project")
print(f"Found {len(files)} Python files")

# Scan single file
files = scan_python_files("script.py")
if files:
    print(f"File is valid: {files[0]}")

GitIgnoreRule

Represents a parsed rule from a .gitignore file.
Location: docugen/core/scanner.py:20
@dataclass(frozen=True)
class GitIgnoreRule:
    pattern: str
    negated: bool
    directory_only: bool
    anchored: bool

Attributes

pattern
str
The gitignore pattern without special prefix/suffix characters (e.g., "*.pyc", "build", "docs/temp")
negated
bool
True if the rule starts with !, meaning it negates (un-ignores) matching paths
directory_only
bool
True if the rule ends with /, meaning it only matches directories
anchored
bool
True if the rule starts with /, meaning the pattern is relative to the repository root

Usage

This class is immutable (frozen dataclass) and primarily used internally by the scanner to evaluate whether paths should be ignored.

Example

from docugen.core.scanner import GitIgnoreRule

# How some .gitignore lines map to rules (the !, leading /, and trailing / markers are stripped from pattern)
rule1 = GitIgnoreRule(pattern="*.pyc", negated=False, directory_only=False, anchored=False)  # *.pyc
rule2 = GitIgnoreRule(pattern="build", negated=False, directory_only=True, anchored=False)  # build/
rule3 = GitIgnoreRule(pattern="important.log", negated=True, directory_only=False, anchored=False)  # !important.log
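
To show how a raw .gitignore line might map onto these attributes, here is a minimal parsing sketch. parse_gitignore_line is a hypothetical helper, not part of docugen, and the dataclass is redeclared only so the example runs on its own. The real _load_gitignore_rules may handle more cases, such as escaped characters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GitIgnoreRule:
    # Redeclared here with the documented fields so the sketch is self-contained.
    pattern: str
    negated: bool
    directory_only: bool
    anchored: bool


def parse_gitignore_line(line: str) -> GitIgnoreRule | None:
    """Hypothetical parser: returns None for blank lines and comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None

    # A leading "!" negates the rule.
    negated = text.startswith("!")
    if negated:
        text = text[1:]

    # A leading "/" anchors the pattern to the root.
    anchored = text.startswith("/")
    if anchored:
        text = text[1:]

    # A trailing "/" restricts the rule to directories.
    directory_only = text.endswith("/")
    if directory_only:
        text = text.rstrip("/")

    if not text:
        return None
    return GitIgnoreRule(text, negated, directory_only, anchored)
```

Under these assumptions, parse_gitignore_line("build/") yields a rule with pattern "build" and directory_only set, and parse_gitignore_line("!important.log") yields pattern "important.log" with negated set.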

Default Ignored Directories

Location: docugen/core/scanner.py:8
The scanner automatically excludes these directories regardless of .gitignore rules:
DEFAULT_IGNORED_DIRS = {
    "__pycache__",
    ".git",
    ".venv",
    "venv",
    ".mypy_cache",
    ".pytest_cache",
    "build",
    "dist",
}
These directories hold tooling caches, virtual environments, version-control metadata, and build artifacts. Skipping them keeps scans fast and limits results to project source files.
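
As a rough illustration of the exclusion check, the sketch below tests whether any directory component of a relative path is in the set. in_default_ignored_dir is a hypothetical helper, and the assumption that the names match at any depth is not confirmed by the module source.

```python
from pathlib import PurePosixPath

# Assumed copy of the documented set, so the sketch runs without docugen installed.
DEFAULT_IGNORED_DIRS = {
    "__pycache__", ".git", ".venv", "venv",
    ".mypy_cache", ".pytest_cache", "build", "dist",
}


def in_default_ignored_dir(relative_path: str) -> bool:
    """Hypothetical helper: True if any parent directory component is excluded."""
    parts = PurePosixPath(relative_path).parts
    # The last component is treated as the file name, so only parents are checked.
    return any(part in DEFAULT_IGNORED_DIRS for part in parts[:-1])
```

Because only whole components are compared, a file such as src/build.py is kept, while build/lib/mod.py is excluded.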

Helper Functions

The module includes internal helper functions that are not part of the public API:
  • _load_gitignore_rules(root: Path) -> list[GitIgnoreRule] - Parses .gitignore file (scanner.py:28)
  • _match_rule(relative_path: str, is_dir: bool, rule: GitIgnoreRule) -> bool - Tests if a path matches a rule (scanner.py:64)
  • _is_ignored(relative_path: str, is_dir: bool, rules: list[GitIgnoreRule]) -> bool - Determines if a path should be ignored (scanner.py:86)
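
The sketch below shows one plausible way the matching and ignore decisions could fit together, using fnmatch and the standard gitignore convention that the last matching rule wins. match_rule and is_ignored are hypothetical stand-ins for the private helpers, and the matching is simplified: for example, fnmatch lets * match across / separators, which real gitignore matching does not.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class GitIgnoreRule:
    # Redeclared with the documented fields so the sketch is self-contained.
    pattern: str
    negated: bool
    directory_only: bool
    anchored: bool


def match_rule(relative_path: str, is_dir: bool, rule: GitIgnoreRule) -> bool:
    """Hypothetical matcher for a POSIX-style relative path."""
    if rule.directory_only and not is_dir:
        return False
    if rule.anchored or "/" in rule.pattern:
        # Anchored or multi-segment patterns are compared with the whole path.
        return fnmatch(relative_path, rule.pattern)
    # Otherwise the pattern is compared with the final path component.
    return fnmatch(relative_path.rsplit("/", 1)[-1], rule.pattern)


def is_ignored(relative_path: str, is_dir: bool, rules: list[GitIgnoreRule]) -> bool:
    """Apply rules in order; a later match overrides an earlier one."""
    ignored = False
    for rule in rules:
        if match_rule(relative_path, is_dir, rule):
            ignored = not rule.negated
    return ignored
```

With rules for *.log followed by a negated important.log, this sketch ignores logs/debug.log but keeps logs/important.log, because the negated rule matches last.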
