Python Transforms

Python transforms provide a powerful, code-based approach to transforming data in Infrahub. They’re ideal for complex data manipulation, integration with external systems, and type-safe transformations with validation.

What are Python Transforms?

Python transforms are Python classes that inherit from InfrahubTransform and implement custom transformation logic. Unlike Jinja2 templates, Python transforms give you:

Full Python language capabilities
Access to external libraries and APIs
Complex data manipulation and business logic
Type safety and validation
Structured data output (JSON, YAML, etc.)
Reusable transformation logic

Creating a Python Transform

1. Define a GraphQL Query

Create a GraphQL query to retrieve the data:

# templates/person_with_cars.gql
query PersonWithTheirCars($name: String!) {
  TestingPerson(name__value: $name) {
    edges {
      node {
        id
        __typename
        name { value }
        age { value }
        cars {
          edges {
            node {
              id
              __typename
              name { value }
            }
          }
        }
      }
    }
  }
}
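For orientation, the response to this query has the nested shape sketched below. The IDs, names, ages, and the `TestingCar` type name are made-up sample data, not output from a real Infrahub instance.

```python
# Hypothetical sample of what the PersonWithTheirCars query could return.
# All IDs, names, and values are invented for illustration.
sample_response = {
    "TestingPerson": {
        "edges": [
            {
                "node": {
                    "id": "person-1",
                    "__typename": "TestingPerson",
                    "name": {"value": "Alice"},
                    "age": {"value": 34},
                    "cars": {
                        "edges": [
                            {"node": {"id": "car-1", "__typename": "TestingCar", "name": {"value": "Volvo"}}},
                            {"node": {"id": "car-2", "__typename": "TestingCar", "name": {"value": "Saab"}}},
                        ]
                    },
                }
            }
        ]
    }
}

# Each level of the query maps to one dictionary key or list index.
person = sample_response["TestingPerson"]["edges"][0]["node"]
car_names = [edge["node"]["name"]["value"] for edge in person["cars"]["edges"]]
print(person["name"]["value"], car_names)
```

Attributes are wrapped in a `{"value": ...}` object and relationships in an `edges`/`node` structure, which is why the transform code indexes several levels deep.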

2. Create the Transform Class

Create a Python file with a class inheriting from InfrahubTransform:

# transforms/person_with_cars_transform.py
from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class PersonWithCarsTransform(InfrahubTransform):
    query = "person_with_cars"
    timeout = 10  # Optional, default is 10 seconds

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Transform the GraphQL response data.
        
        Args:
            data: GraphQL query response as nested dictionaries
            
        Returns:
            Transformed data as dictionary
        """
        if data["TestingPerson"]["edges"]:
            return {
                "name": data["TestingPerson"]["edges"][0]["node"]["name"]["value"]
            }
        
        return {"name": None}

3. Register in .infrahub.yml

Add the query and transform to your repository configuration:

# .infrahub.yml
queries:
  - name: person_with_cars
    file_path: "templates/person_with_cars.gql"

python_transforms:
  - name: PersonWithCarsTransform
    class_name: PersonWithCarsTransform
    file_path: "transforms/person_with_cars_transform.py"

Transform Class Structure

Basic Structure

Every Python transform follows this pattern:

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class MyTransform(InfrahubTransform):
    # Required: GraphQL query name or ID
    query = "my_query_name"
    
    # Optional: Timeout in seconds (default: 10)
    timeout = 30
    
    # Required: Transform method
    async def transform(self, data: dict[str, Any]) -> Any:
        """Implement your transformation logic here."""
        # Process data and return result
        return processed_data

Class Attributes

query (required): Name or ID of the GraphQL query to execute
timeout (optional): Maximum execution time in seconds

Transform Method

The transform() method receives the GraphQL query response and returns the transformed data:

async def transform(self, data: dict[str, Any]) -> Any:
    """Transform method signature.
    
    Args:
        data: GraphQL query response as nested dictionaries
        
    Returns:
        Transformed data - can be dict, str, list, etc.
    """

Data Access Patterns

Raw GraphQL Response (Default)

By default, transforms receive raw GraphQL responses as nested dictionaries:

class CarSpecMarkdown(InfrahubTransform):
    query = "person_with_cars"
    
    async def transform(self, data: dict[str, Any]) -> str:
        # Access nested dictionary structure
        person_name = data["TestingPerson"]["edges"][0]["node"]["name"]["value"]
        
        markdown = f"# {person_name}\n\n"
        
        for car in data["TestingPerson"]["edges"][0]["node"]["cars"]["edges"]:
            car_name = car["node"]["name"]["value"]
            markdown += f"- {car_name}\n"
        
        return markdown
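The repeated `["edges"][...]["node"]` indexing can be factored into a small helper. The sketch below is plain Python with no SDK dependency; `unwrap_edges`, `render_markdown`, and the sample data are hypothetical names introduced here, not part of Infrahub.

```python
from typing import Any, Optional


def unwrap_edges(connection: Optional[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the node dictionaries from an {"edges": [{"node": ...}]} block."""
    if not connection:
        return []
    return [edge["node"] for edge in connection.get("edges", [])]


def render_markdown(data: dict[str, Any]) -> str:
    """Build one Markdown section per person, listing their cars."""
    lines: list[str] = []
    for person in unwrap_edges(data.get("TestingPerson")):
        lines.append(f"# {person['name']['value']}")
        lines.append("")
        for car in unwrap_edges(person.get("cars")):
            lines.append(f"- {car['name']['value']}")
    return "\n".join(lines) + "\n"


# Invented sample data for illustration only.
sample = {
    "TestingPerson": {
        "edges": [
            {
                "node": {
                    "name": {"value": "Alice"},
                    "cars": {
                        "edges": [
                            {"node": {"name": {"value": "Volvo"}}},
                            {"node": {"name": {"value": "Saab"}}},
                        ]
                    },
                }
            }
        ]
    }
}
print(render_markdown(sample))
```

Unlike indexing `["edges"][0]` directly, this version does not raise `IndexError` when the query matches no person.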

InfrahubNode Objects (Converted)

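Set convert_query_response: true in .infrahub.yml to have the query response converted into InfrahubNode objects. The transform() method still receives the raw dictionary; the converted nodes are available through self.store and support direct attribute access: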

python_transforms:
  - name: ConvertedPersonWithCarsTransform
    class_name: ConvertedPersonWith
    file_path: "transforms/converted_person_with_cars.py"
    convert_query_response: true  # Enable conversion

class ConvertedPersonWith(InfrahubTransform):
    query = "person_with_cars"

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        # Get node ID from response
        node_id = data["TestingPerson"]["edges"][0]["node"]["id"]
        
        # Access converted InfrahubNode object from store
        person = self.store.get(key=node_id, kind="TestingPerson")
        
        # Direct attribute access (cleaner than nested dicts)
        return {
            "name": person.name.value,
            "age": person.age.value
        }

When to Use Each Approach

Raw Response (default):

Simple data extraction
Working with query results directly
No need for node operations

Converted Nodes (convert_query_response: true):

Complex node operations
Need to access relationships
Type-safe attribute access
Working with InfrahubNode methods

Return Types

Transforms can return various data types:

Dictionary (JSON/YAML)

async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
    return {
        "name": "John",
        "age": 30,
        "cars": ["Toyota", "Honda"]
    }

Useful for:

JSON artifacts (content_type: application/json)
YAML artifacts (content_type: application/yaml)
Structured data output
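A dictionary result is only useful as a JSON artifact if every value in it can be serialized. The sketch below checks this with the standard library. How Infrahub itself serializes the result is not shown on this page, so `json.dumps` stands in for that step as an assumption, and `to_json_artifact` is a hypothetical helper.

```python
import json
from datetime import date
from typing import Any


def to_json_artifact(result: dict[str, Any]) -> str:
    """Serialize a transform result, failing early on unsupported value types."""
    # sort_keys gives stable output, which keeps diffs between runs small.
    return json.dumps(result, indent=2, sort_keys=True)


good = {"name": "John", "age": 30, "cars": ["Toyota", "Honda"]}
print(to_json_artifact(good))

# Values such as date objects or sets are not JSON-serializable by default.
bad = {"name": "John", "registered": date(2024, 1, 1)}
try:
    to_json_artifact(bad)
except TypeError as exc:
    print(f"not serializable: {exc}")
```

Converting such values to strings or numbers inside `transform()` avoids a serialization failure later in the pipeline.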

String

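async def transform(self, data: dict[str, Any]) -> str:
    # Join explicit lines so the method's indentation does not leak into the
    # Markdown (four leading spaces would render as a code block).
    lines = [
        "## Car Specification",
        "",
        "**blue** Sedan",
        "Make: **Toyota**",
        "Model: **Camry**",
    ]
    return "\n".join(lines) + "\n"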

Useful for:

Markdown artifacts (content_type: text/markdown)
Plain text artifacts (content_type: text/plain)
Configuration files
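When a template is easier to read as a triple-quoted literal, note that a literal written inside an indented method keeps the source indentation, and Markdown renderers treat four or more leading spaces as a code block. A minimal sketch using the standard library's `textwrap.dedent` to strip that indentation follows; `car_spec_markdown` and its parameters are hypothetical.

```python
import textwrap


def car_spec_markdown(color: str, make: str, model: str) -> str:
    """Return a Markdown snippet without the indentation of the source code."""
    raw = f"""
        ## Car Specification

        **{color}** Sedan
        Make: **{make}**
        Model: **{model}**
    """
    # dedent removes the common leading whitespace; strip drops the blank
    # first and last lines left over from the triple-quoted literal.
    return textwrap.dedent(raw).strip() + "\n"


print(car_spec_markdown("blue", "Toyota", "Camry"))
```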

List

async def transform(self, data: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {"name": "Item 1", "value": 100},
        {"name": "Item 2", "value": 200}
    ]

Real-World Examples

Example 1: Simple Data Extraction

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class PersonWithCarsTransform(InfrahubTransform):
    query = "person_with_cars"
    timeout = 2

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Extract person's name from query response."""
        if data["TestingPerson"]["edges"]:
            return {
                "name": data["TestingPerson"]["edges"][0]["node"]["name"]["value"]
            }
        
        return {"name": None}

Example 2: Markdown Generation

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class CarSpecMarkdown(InfrahubTransform):
    query = "person_with_cars"
    timeout = 10

    async def transform(self, data: dict[str, Any]) -> str:
        """Generate markdown documentation for car specifications."""
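        # Join explicit lines so the method's indentation does not end up in the
        # Markdown output, where it would render as a code block.
        lines = [
            "## Car Specification",
            "",
            "**blue** Sedan",
            "Make: **Toyota**",
            "Model: **Camry**",
        ]
        return "\n".join(lines) + "\n"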

Example 3: Using Converted Nodes

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class ConvertedPersonWith(InfrahubTransform):
    query = "person_with_cars"

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Use InfrahubNode objects for cleaner data access."""
        node_id = data["TestingPerson"]["edges"][0]["node"]["id"]
        
        # Get InfrahubNode from store
        person = self.store.get(key=node_id, kind="TestingPerson")
        
        # Clean attribute access
        return {
            "name": person.name.value,
            "age": person.age.value
        }

Example 4: OpenConfig Interface Generation

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class OCInterfaces(InfrahubTransform):
    query = "oc_interfaces"
    timeout = 10

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Generate OpenConfig-formatted interface configuration."""
        device = data["InfraDevice"]["edges"][0]["node"]
        
        interfaces = []
        for intf in device["interfaces"]["edges"]:
            intf_node = intf["node"]
            interfaces.append({
                "name": intf_node["name"]["value"],
                "config": {
                    "name": intf_node["name"]["value"],
                    "enabled": intf_node["enabled"]["value"],
                    "description": intf_node.get("description", {}).get("value", "")
                }
            })
        
        return {
            "openconfig-interfaces:interfaces": {
                "interface": interfaces
            }
        }
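One detail worth checking in this example: `dict.get(key, default)` only falls back to the default when the key is absent. If `description` were to come back as `null`, or as `{"value": null}`, the chained `.get()` calls would raise `AttributeError` or return `None` instead of an empty string. The sketch below normalizes both cases; `attr_value` and `to_openconfig_interface` are hypothetical helpers, not SDK functions.

```python
from typing import Any


def attr_value(node: dict[str, Any], name: str, default: Any = "") -> Any:
    """Read node[name]["value"], falling back to default when missing or null."""
    attribute = node.get(name) or {}
    value = attribute.get("value")
    return default if value is None else value


def to_openconfig_interface(intf_node: dict[str, Any]) -> dict[str, Any]:
    """Map one interface node to an OpenConfig-style interface entry."""
    name = intf_node["name"]["value"]
    return {
        "name": name,
        "config": {
            "name": name,
            "enabled": attr_value(intf_node, "enabled", default=False),
            "description": attr_value(intf_node, "description"),
        },
    }


# Invented sample nodes covering a set, a null, and an absent description.
nodes = [
    {"name": {"value": "Ethernet1"}, "enabled": {"value": True}, "description": {"value": "uplink"}},
    {"name": {"value": "Ethernet2"}, "enabled": {"value": False}, "description": {"value": None}},
    {"name": {"value": "Ethernet3"}, "enabled": {"value": True}},
]
interfaces = [to_openconfig_interface(node) for node in nodes]
print([intf["config"]["description"] for intf in interfaces])
```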

Transform Execution

Transforms are executed by the Infrahub repository integrator:

# Abridged from infrahub/git/integrator.py. The names module_name and
# commit_worktree are set by steps omitted from this excerpt.
async def execute_python_transform(
    self,
    branch_name: str,
    commit: str,
    location: str,
    client: InfrahubClient,
    convert_query_response: bool,
    data: dict | None = None,
) -> Any:
    """Execute A Python Transform stored in the repository."""
    
    # Parse location (file_path::class_name)
    file_path, class_name = location.split("::") 
    
    # Import the module
    module = importlib.import_module(module_name)
    
    # Get the transform class
    transform_class = getattr(module, class_name)
    
    # Instantiate and run
    transform = transform_class(
        root_directory=commit_worktree.directory,
        branch=branch_name,
        client=client,
        convert_query_response=convert_query_response,
        infrahub_node=InfrahubNode,
    )
    
    return await transform.run(data=data)
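The excerpt relies on Python's dynamic import machinery. The self-contained sketch below shows the general technique with the standard library: split the location string, load the file as a module, and look up the class by name. It uses `importlib.util.spec_from_file_location` rather than `importlib.import_module`, and it is not Infrahub's implementation. `load_class`, `DemoTransform`, and the temporary `demo_transform.py` file are hypothetical.

```python
import asyncio
import importlib.util
import tempfile
from pathlib import Path


def load_class(root: Path, location: str) -> type:
    """Load the class named in a 'file_path::class_name' location under root."""
    file_path, class_name = location.split("::")
    module_path = root / file_path
    spec = importlib.util.spec_from_file_location(module_path.stem, str(module_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {module_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, class_name)


# Stand-in transform source written to a temporary directory for the demo.
SOURCE = '''
class DemoTransform:
    async def transform(self, data):
        return {"count": len(data)}
'''

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo_transform.py").write_text(SOURCE)
    transform_class = load_class(root, "demo_transform.py::DemoTransform")
    result = asyncio.run(transform_class().transform({"a": 1, "b": 2}))

print(result)
```

Because the class is resolved by name at run time, a typo in `class_name` surfaces as an `AttributeError` from `getattr` rather than at configuration time.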

Using with Artifacts

Combine Python transforms with artifact definitions:

python_transforms:
  - name: OCInterfaces
    class_name: OCInterfaces
    file_path: "transforms/openconfig.py"

artifact_definitions:
  - name: "Openconfig Interface for Arista devices"
    artifact_name: "openconfig-interfaces"
    parameters:
      device: "name__value"
    content_type: "application/json"
    targets: "arista_devices"
    transformation: "OCInterfaces"

When the artifact is generated:

1. GraphQL query executes with device parameters
2. Python transform processes the response
3. Return value is formatted based on content_type
4. Result is stored as an artifact
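The formatting step can be pictured as a small dispatch on content_type. This is an illustrative sketch only: `render_artifact` is a hypothetical name, and the actual formatting rules live in Infrahub, not in this snippet.

```python
import json
from typing import Any


def render_artifact(result: Any, content_type: str) -> str:
    """Turn a transform's return value into artifact text for a content type."""
    if content_type == "application/json":
        return json.dumps(result, indent=2)
    if content_type.startswith("text/"):
        if not isinstance(result, str):
            raise TypeError(f"{content_type} expects a str, got {type(result).__name__}")
        return result
    raise ValueError(f"Unsupported content type in this sketch: {content_type}")


print(render_artifact({"interface": []}, "application/json"))
print(render_artifact("# Title\n", "text/markdown"))
```

The point of the sketch is the pairing: a transform registered for a JSON artifact should return a dictionary or list, and one registered for a text artifact should return a string.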

Error Handling

Implement proper error handling in transforms:

class SafeTransform(InfrahubTransform):
    query = "my_query"
    
    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        try:
            # Safe data access
            edges = data.get("MyNode", {}).get("edges", [])
            if not edges:
                return {"error": "No data found"}
            
            node = edges[0]["node"]
            return {
                "name": node["name"]["value"],
                "status": node.get("status", {}).get("value", "unknown")
            }
        except (KeyError, IndexError, TypeError) as exc:
            # Log error and return safe default
            self.log.error(f"Transform error: {exc}")
            return {"error": str(exc)}

Common error scenarios:

Missing data in query response
Unexpected data types
External API failures
Timeout exceeded
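For the first two scenarios, a tolerant accessor can replace long chains of `.get()` calls. The helper below is a hypothetical sketch (`dig` is not part of the SDK) that walks nested dictionaries and lists and returns a default as soon as a step is missing or null.

```python
from typing import Any


def dig(data: Any, *path: Any, default: Any = None) -> Any:
    """Walk nested dicts and lists, returning default if any step is missing."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or indexing into None.
            return default
    return default if current is None else current


# Invented sample response: the status attribute is null.
response = {"MyNode": {"edges": [{"node": {"name": {"value": "core-1"}, "status": None}}]}}

print(dig(response, "MyNode", "edges", 0, "node", "name", "value"))
print(dig(response, "MyNode", "edges", 0, "node", "status", "value", default="unknown"))
print(dig(response, "MyNode", "edges", 5, "node", default={}))
```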

Accessing the SDK Client

Transforms have access to the Infrahub client:

class ClientTransform(InfrahubTransform):
    query = "my_query"
    
    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
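        # self.client is an InfrahubClient instance.
        # Placeholder: take the ID for the follow-up query from the primary response.
        # "MyNode" is an illustrative kind name; adjust it to match your query.
        some_id = data["MyNode"]["edges"][0]["node"]["id"]

        # Make additional queries if needed
        additional_data = await self.client.query_gql_query(
            name="another_query",
            variables={"id": some_id}
        )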
        
        # Combine data
        return {
            "primary": data,
            "additional": additional_data
        }

Testing Transforms

Test transforms using pytest:

import pytest
from infrahub_sdk import InfrahubClient
from infrahub.transformations.models import TransformPythonData
from infrahub.transformations.tasks import transform_python

async def test_transform_python_success(
    git_fixture_repo, init_service, prefect_test_fixture
):
    commit = git_fixture_repo.get_commit_value(branch_name="main")
    
    message = TransformPythonData(
        repository_id=str(git_fixture_repo.id),
        repository_name=git_fixture_repo.name,
        repository_kind="CoreRepository",
        commit=commit,
        branch="main",
        transform_location="transforms/my_transform.py::MyTransform",
        timeout=10,
        data={"key": "value"},
        convert_query_response=False,
    )
    
    response = await transform_python(message=message)
    assert response == {"expected": "output"}
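The test above exercises the full task and depends on repository fixtures. The transformation logic itself can often be checked without any infrastructure by calling `transform()` directly. In the sketch below, `PersonNameLogic` is a hypothetical stand-in that copies the method body of `PersonWithCarsTransform` but does not inherit from `InfrahubTransform`, so the snippet runs without the SDK installed.

```python
import asyncio
from typing import Any


class PersonNameLogic:
    """Stand-in with the same transform() body as PersonWithCarsTransform."""

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        if data["TestingPerson"]["edges"]:
            return {"name": data["TestingPerson"]["edges"][0]["node"]["name"]["value"]}
        return {"name": None}


def test_returns_first_person_name() -> None:
    data = {"TestingPerson": {"edges": [{"node": {"name": {"value": "Alice"}}}]}}
    assert asyncio.run(PersonNameLogic().transform(data)) == {"name": "Alice"}


def test_returns_none_when_no_person_matches() -> None:
    data = {"TestingPerson": {"edges": []}}
    assert asyncio.run(PersonNameLogic().transform(data)) == {"name": None}


if __name__ == "__main__":
    test_returns_first_person_name()
    test_returns_none_when_no_person_matches()
    print("ok")
```

Tests of this kind cover only the data mapping. Query execution, node conversion, and timeouts still need the integration-style test shown earlier.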

Configuration Options

In .infrahub.yml

python_transforms:
  - name: MyTransform                    # Unique identifier
    class_name: MyTransform              # Python class name
    file_path: "transforms/my.py"        # Path to Python file
    timeout: 30                          # Optional: override class timeout
    convert_query_response: true         # Optional: enable node conversion

Transform Location Format

The transform location uses the format file_path::class_name:

transforms/openconfig.py::OCInterfaces
│                         │
│                         └─ Class name
└─ File path
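A location string in this format can be split and validated in a few lines. The sketch below is a hypothetical helper (`parse_location` and `TransformLocation` are names introduced here) that rejects values without exactly one `::` separator.

```python
from typing import NamedTuple


class TransformLocation(NamedTuple):
    file_path: str
    class_name: str


def parse_location(location: str) -> TransformLocation:
    """Split 'file_path::class_name' and reject malformed values."""
    file_path, separator, class_name = location.partition("::")
    if not separator or not file_path or not class_name or "::" in class_name:
        raise ValueError(f"Expected 'file_path::class_name', got {location!r}")
    return TransformLocation(file_path=file_path, class_name=class_name)


print(parse_location("transforms/openconfig.py::OCInterfaces"))
```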

Best Practices

Type hints: Use type hints for clarity and IDE support

async def transform(self, data: dict[str, Any]) -> dict[str, Any]:

Error handling: Handle missing or unexpected data gracefully

edges = data.get("MyNode", {}).get("edges", [])
if not edges:
    return {"error": "No data"}

Docstrings: Document transform purpose and behavior

async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
    """Generate OpenConfig interface configuration.
    
    Args:
        data: GraphQL response with device and interface data
        
    Returns:
        OpenConfig-formatted interface configuration
    """

Keep transforms focused: One transformation responsibility per class
Use convert_query_response: When working extensively with nodes
Set appropriate timeouts: Balance between allowing complexity and preventing hangs
Test thoroughly: Write unit tests for transform logic
Log appropriately: Use self.log for debugging information
Validate input: Check data structure before processing
Return consistent types: Match artifact content_type expectations
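For the input validation practice, one option is a small guard that checks the response shape up front and fails with a readable message instead of a bare `KeyError`. `require_first_node` is a hypothetical helper, not an SDK function, and the device name is invented sample data.

```python
from typing import Any


def require_first_node(data: dict[str, Any], kind: str) -> dict[str, Any]:
    """Return the first node of the given kind, or raise a descriptive ValueError."""
    block = data.get(kind)
    if not isinstance(block, dict):
        raise ValueError(f"Response has no '{kind}' block")
    edges = block.get("edges")
    if not isinstance(edges, list) or not edges:
        raise ValueError(f"Query returned no '{kind}' objects")
    node = edges[0].get("node") if isinstance(edges[0], dict) else None
    if not isinstance(node, dict):
        raise ValueError(f"First '{kind}' edge has no node")
    return node


data = {"InfraDevice": {"edges": [{"node": {"name": {"value": "atl1-edge1"}}}]}}
print(require_first_node(data, "InfraDevice")["name"]["value"])
```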

Related Topics

Transformations Overview - Understanding transformation concepts
Artifacts - Using transforms to generate artifacts
Jinja2 Templates - Alternative for simple templates
GraphQL Queries - Writing queries for transforms

Next Steps

Create your first Python transformation
Learn about Jinja2 templates for simpler use cases
Build artifacts using your transforms
Explore computed attributes with transforms
