Python transforms provide a powerful, code-based approach to transforming data in Infrahub. They’re ideal for complex data manipulation, integration with external systems, and type-safe transformations with validation.

What are Python Transforms?

Python transforms are Python classes that inherit from InfrahubTransform and implement custom transformation logic. Unlike Jinja2 templates, Python transforms give you:
  • Full Python language capabilities
  • Access to external libraries and APIs
  • Complex data manipulation and business logic
  • Type safety and validation
  • Structured data output (JSON, YAML, etc.)
  • Reusable transformation logic

Creating a Python Transform

1. Define a GraphQL Query

Create a GraphQL query to retrieve the data:
# templates/person_with_cars.gql
query PersonWithTheirCars($name: String!) {
  TestingPerson(name__value: $name) {
    edges {
      node {
        id
        __typename
        name { value }
        age { value }
        cars {
          edges {
            node {
              id
              __typename
              name { value }
            }
          }
        }
      }
    }
  }
}
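The query returns nested dictionaries. Every relationship is wrapped in edges/node objects, and every attribute is wrapped in a {"value": ...} object. Below is a hand-written sample of that shape for illustration only: the IDs and values are invented, and some fields such as the cars' __typename are omitted. It comes with the same lookup that the transform in step 2 performs:

```python
from typing import Any, Optional

# Illustrative, hand-written PersonWithTheirCars response; IDs and values are invented
SAMPLE_RESPONSE: dict[str, Any] = {
    "TestingPerson": {
        "edges": [
            {
                "node": {
                    "id": "person-1",
                    "__typename": "TestingPerson",
                    "name": {"value": "John"},
                    "age": {"value": 30},
                    "cars": {
                        "edges": [
                            {"node": {"id": "car-1", "name": {"value": "Toyota"}}},
                            {"node": {"id": "car-2", "name": {"value": "Honda"}}},
                        ]
                    },
                }
            }
        ]
    }
}


def first_person_name(data: dict[str, Any]) -> Optional[str]:
    """Return the first person's name, or None when the query matched nothing."""
    edges = data["TestingPerson"]["edges"]
    if not edges:
        return None
    # Attributes are objects, so the scalar lives under "value"
    return edges[0]["node"]["name"]["value"]


print(first_person_name(SAMPLE_RESPONSE))  # John
```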

2. Create the Transform Class

Create a Python file with a class inheriting from InfrahubTransform:
# transforms/person_with_cars_transform.py
from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class PersonWithCarsTransform(InfrahubTransform):
    query = "person_with_cars"
    timeout = 10  # Optional, default is 10 seconds

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Transform the GraphQL response data.
        
        Args:
            data: GraphQL query response as nested dictionaries
            
        Returns:
            Transformed data as dictionary
        """
        if data["TestingPerson"]["edges"]:
            return {
                "name": data["TestingPerson"]["edges"][0]["node"]["name"]["value"]
            }
        
        return {"name": None}

3. Register in .infrahub.yml

Add the query and transform to your repository configuration:
# .infrahub.yml
queries:
  - name: person_with_cars
    file_path: "templates/person_with_cars.gql"

python_transforms:
  - name: PersonWithCarsTransform
    class_name: PersonWithCarsTransform
    file_path: "transforms/person_with_cars_transform.py"

Transform Class Structure

Basic Structure

Every Python transform follows this pattern:
from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class MyTransform(InfrahubTransform):
    # Required: GraphQL query name or ID
    query = "my_query_name"
    
    # Optional: Timeout in seconds (default: 10)
    timeout = 30
    
    # Required: Transform method
    async def transform(self, data: dict[str, Any]) -> Any:
        """Implement your transformation logic here."""
        # Process data and return the result; this placeholder passes it through unchanged
        processed_data = data
        return processed_data

Class Attributes

  • query (required): Name or ID of the GraphQL query to execute
  • timeout (optional): Maximum execution time in seconds (default: 10)

Transform Method

The transform() method receives the GraphQL query response and returns the transformed data:
async def transform(self, data: dict[str, Any]) -> Any:
    """Transform method signature.
    
    Args:
        data: GraphQL query response as nested dictionaries
        
    Returns:
        Transformed data - can be dict, str, list, etc.
    """

Data Access Patterns

Raw GraphQL Response (Default)

By default, transforms receive raw GraphQL responses as nested dictionaries:
class CarSpecMarkdown(InfrahubTransform):
    query = "person_with_cars"
    
    async def transform(self, data: dict[str, Any]) -> str:
        # Access nested dictionary structure
        person_name = data["TestingPerson"]["edges"][0]["node"]["name"]["value"]
        
        markdown = f"# {person_name}\n\n"
        
        for car in data["TestingPerson"]["edges"][0]["node"]["cars"]["edges"]:
            car_name = car["node"]["name"]["value"]
            markdown += f"- {car_name}\n"
        
        return markdown
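Every relationship in the raw response uses the same edges/node envelope, so a small helper can hide that boilerplate. The sketch below uses a hypothetical helper, iter_nodes, which is not part of the SDK, and applies it to a trimmed sample response:

```python
from collections.abc import Iterator
from typing import Any


def iter_nodes(connection: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield each node of a GraphQL connection shaped like {"edges": [{"node": {...}}]}."""
    for edge in connection.get("edges", []):
        yield edge["node"]


def car_names(data: dict[str, Any]) -> list[str]:
    """Collect car names for every person in a person_with_cars response."""
    names = []
    for person in iter_nodes(data["TestingPerson"]):
        for car in iter_nodes(person["cars"]):
            names.append(car["name"]["value"])
    return names


# Trimmed, invented sample data with only the fields car_names reads
sample = {
    "TestingPerson": {
        "edges": [
            {"node": {"name": {"value": "John"},
                      "cars": {"edges": [{"node": {"name": {"value": "Toyota"}}},
                                         {"node": {"name": {"value": "Honda"}}}]}}}
        ]
    }
}
print(car_names(sample))  # ['Toyota', 'Honda']
```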

InfrahubNode Objects (Converted)

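Set convert_query_response: true in .infrahub.yml to have the query response converted into InfrahubNode objects. The transform still receives the raw dictionary as data, but the converted nodes are also loaded into self.store. You can look them up there by ID and use direct attribute access: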
python_transforms:
  - name: ConvertedPersonWithCarsTransform
    class_name: ConvertedPersonWith
    file_path: "transforms/converted_person_with_cars.py"
    convert_query_response: true  # Enable conversion

Then retrieve the converted node from the store inside the transform:

class ConvertedPersonWith(InfrahubTransform):
    query = "person_with_cars"

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        # Get node ID from response
        node_id = data["TestingPerson"]["edges"][0]["node"]["id"]
        
        # Access converted InfrahubNode object from store
        person = self.store.get(key=node_id, kind="TestingPerson")
        
        # Direct attribute access (cleaner than nested dicts)
        return {
            "name": person.name.value,
            "age": person.age.value
        }

When to Use Each Approach

Raw Response (default):
  • Simple data extraction
  • Working with query results directly
  • No need for node operations
Converted Nodes (convert_query_response: true):
  • Complex node operations
  • Need to access relationships
  • Type-safe attribute access
  • Working with InfrahubNode methods

Return Types

Transforms can return various data types:

Dictionary (JSON/YAML)

async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
    return {
        "name": "John",
        "age": 30,
        "cars": ["Toyota", "Honda"]
    }
Useful for:
  • JSON artifacts (content_type: application/json)
  • YAML artifacts (content_type: application/yaml)
  • Structured data output
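A returned dictionary must be serializable for the artifact's content type. The sketch below uses the standard json module to approximate what a JSON artifact would contain. It is an illustration, not Infrahub's actual rendering code, whose formatting (indentation, key order) may differ:

```python
import json
from typing import Any


def render_json(result: dict[str, Any]) -> str:
    """Serialize a transform result roughly the way a JSON artifact might store it."""
    # json.dumps raises TypeError for values such as sets or datetime objects,
    # so convert those to lists or strings inside the transform itself
    return json.dumps(result, indent=2)


result = {"name": "John", "age": 30, "cars": ["Toyota", "Honda"]}
text = render_json(result)
print(text)
```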

String

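import textwrap

async def transform(self, data: dict[str, Any]) -> str:
    # dedent() strips the common leading whitespace; without it every line
    # would start with 4 spaces, which Markdown renders as a code block
    markdown = textwrap.dedent("""\
        ## Car Specification

        **blue** Sedan
        Make: **Toyota**
        Model: **Camry**
        """)
    return markdown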
Useful for:
  • Markdown artifacts (content_type: text/markdown)
  • Plain text artifacts (content_type: text/plain)
  • Configuration files

List

async def transform(self, data: dict[str, Any]) -> list[dict[str, Any]]:
    return [
        {"name": "Item 1", "value": 100},
        {"name": "Item 2", "value": 200}
    ]

Real-World Examples

Example 1: Simple Data Extraction

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class PersonWithCarsTransform(InfrahubTransform):
    query = "person_with_cars"
    timeout = 2

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Extract person's name from query response."""
        if data["TestingPerson"]["edges"]:
            return {
                "name": data["TestingPerson"]["edges"][0]["node"]["name"]["value"]
            }
        
        return {"name": None}

Example 2: Markdown Generation

import textwrap
from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class CarSpecMarkdown(InfrahubTransform):
    query = "person_with_cars"
    timeout = 10

    async def transform(self, data: dict[str, Any]) -> str:
        """Generate markdown documentation for car specifications."""
        # dedent() removes the common leading whitespace; without it each line
        # would start with 8 spaces and Markdown would render a code block
        markdown = textwrap.dedent("""\
            ## Car Specification

            **blue** Sedan
            Make: **Toyota**
            Model: **Camry**
            """)
        return markdown
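Example 2 returns a fixed string. In practice the Markdown would be built from the query response. This sketch does that for the person_with_cars data using a standalone function, render_cars_markdown, which is a hypothetical name. Keeping the formatting logic in a plain function makes it testable without a running Infrahub instance:

```python
from typing import Any


def render_cars_markdown(data: dict[str, Any]) -> str:
    """Render one Markdown section per person, listing their cars as bullets."""
    lines: list[str] = []
    for person_edge in data["TestingPerson"]["edges"]:
        person = person_edge["node"]
        lines.append(f"## {person['name']['value']}")
        lines.append("")
        for car_edge in person["cars"]["edges"]:
            lines.append(f"- {car_edge['node']['name']['value']}")
        lines.append("")
    # Joining a list of lines avoids the indentation pitfalls of triple-quoted strings
    return "\n".join(lines)


# Invented sample data with only the fields the function reads
sample = {
    "TestingPerson": {
        "edges": [
            {"node": {"name": {"value": "John"},
                      "cars": {"edges": [{"node": {"name": {"value": "Toyota"}}},
                                         {"node": {"name": {"value": "Honda"}}}]}}}
        ]
    }
}
print(render_cars_markdown(sample))
```

Inside the transform, the method body then reduces to return render_cars_markdown(data).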

Example 3: Using Converted Nodes

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class ConvertedPersonWith(InfrahubTransform):
    query = "person_with_cars"

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Use InfrahubNode objects for cleaner data access."""
        node_id = data["TestingPerson"]["edges"][0]["node"]["id"]
        
        # Get InfrahubNode from store
        person = self.store.get(key=node_id, kind="TestingPerson")
        
        # Clean attribute access
        return {
            "name": person.name.value,
            "age": person.age.value
        }

Example 4: OpenConfig Interface Generation

from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

class OCInterfaces(InfrahubTransform):
    query = "oc_interfaces"
    timeout = 10

    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Generate OpenConfig-formatted interface configuration."""
        device = data["InfraDevice"]["edges"][0]["node"]
        
        interfaces = []
        for intf in device["interfaces"]["edges"]:
            intf_node = intf["node"]
            interfaces.append({
                "name": intf_node["name"]["value"],
                "config": {
                    "name": intf_node["name"]["value"],
                    "enabled": intf_node["enabled"]["value"],
                    # Fall back to "" when the attribute or its value is null
                    "description": (intf_node.get("description") or {}).get("value") or "",
                }
            })
        
        return {
            "openconfig-interfaces:interfaces": {
                "interface": interfaces
            }
        }
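To see the mapping end to end, the same logic can be pulled into a plain function and fed a hand-made device node. The function name build_oc_interfaces is hypothetical, and the interface names and values are invented sample data:

```python
from typing import Any


def build_oc_interfaces(device: dict[str, Any]) -> dict[str, Any]:
    """Map a raw InfraDevice node dict to an OpenConfig interfaces tree."""
    interfaces = []
    for edge in device["interfaces"]["edges"]:
        intf = edge["node"]
        name = intf["name"]["value"]
        # A null attribute or a null value both become an empty description
        description = (intf.get("description") or {}).get("value") or ""
        interfaces.append({
            "name": name,
            "config": {"name": name, "enabled": intf["enabled"]["value"], "description": description},
        })
    return {"openconfig-interfaces:interfaces": {"interface": interfaces}}


device = {
    "interfaces": {
        "edges": [
            {"node": {"name": {"value": "Ethernet1"}, "enabled": {"value": True},
                      "description": {"value": "uplink"}}},
            {"node": {"name": {"value": "Ethernet2"}, "enabled": {"value": False},
                      "description": {"value": None}}},
        ]
    }
}
print(build_oc_interfaces(device))
```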

Transform Execution

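Transforms are executed by the Infrahub repository integrator. The code below is a simplified excerpt of that path. Setup steps such as deriving module_name from file_path and locating commit_worktree (the worktree for the given commit) are omitted, so it does not run on its own:
# Simplified from infrahub/git/integrator.py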
async def execute_python_transform(
    self,
    branch_name: str,
    commit: str,
    location: str,
    client: InfrahubClient,
    convert_query_response: bool,
    data: dict | None = None,
) -> Any:
    """Execute A Python Transform stored in the repository."""
    
    # Parse location (file_path::class_name)
    file_path, class_name = location.split("::") 
    
    # Import the module
    module = importlib.import_module(module_name)
    
    # Get the transform class
    transform_class = getattr(module, class_name)
    
    # Instantiate and run
    transform = transform_class(
        root_directory=commit_worktree.directory,
        branch=branch_name,
        client=client,
        convert_query_response=convert_query_response,
        infrahub_node=InfrahubNode,
    )
    
    return await transform.run(data=data)
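The import step above uses importlib.import_module with a module name. When you only have a file path, the standard library can load a class from it directly. The sketch below is a generic approach based on importlib.util, not the integrator's actual code, and load_class is a hypothetical helper:

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_class(file_path: str, class_name: str) -> type:
    """Import the module at file_path and return its attribute named class_name."""
    module_name = Path(file_path).stem
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {file_path}")
    module: ModuleType = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, class_name)  # AttributeError if the class is missing


# Demo: write a tiny transform-like class to a temporary file, then load it
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_transform.py"
    path.write_text("class DemoTransform:\n    query = 'demo'\n")
    cls = load_class(str(path), "DemoTransform")
    print(cls.__name__, cls.query)  # DemoTransform demo
```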

Using with Artifacts

Combine Python transforms with artifact definitions:
python_transforms:
  - name: OCInterfaces
    class_name: OCInterfaces
    file_path: "transforms/openconfig.py"

artifact_definitions:
  - name: "Openconfig Interface for Arista devices"
    artifact_name: "openconfig-interfaces"
    parameters:
      device: "name__value"
    content_type: "application/json"
    targets: "arista_devices"
    transformation: "OCInterfaces"
When the artifacts are generated:
  1. For each member of the target group (arista_devices), the GraphQL query runs with its device variable set from that member's name__value
  2. Python transform processes the response
  3. Return value is formatted based on content_type
  4. Result is stored as an artifact
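Step 1 depends on the parameters mapping. Each key becomes a query variable, and each value names the target attribute to read. The sketch below illustrates that idea. It assumes a value such as name__value means "the value property of the name attribute". resolve_parameters is hypothetical, Infrahub's own resolution logic may differ, and the device names are invented:

```python
from typing import Any


def resolve_parameters(parameters: dict[str, str], target: dict[str, Any]) -> dict[str, Any]:
    """Build GraphQL variables for one target from an artifact's parameters mapping."""
    variables = {}
    for variable, path in parameters.items():
        value: Any = target
        # Assumed convention: "name__value" -> target["name"]["value"]
        for part in path.split("__"):
            value = value[part]
        variables[variable] = value
    return variables


targets = [{"name": {"value": "atl1-edge1"}}, {"name": {"value": "ord1-edge1"}}]
for target in targets:
    print(resolve_parameters({"device": "name__value"}, target))
```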

Error Handling

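Handle missing or malformed data explicitly, so that a bad response yields a clear result instead of an unhandled exception. This example logs through Python's standard logging module:
import logging
from typing import Any
from infrahub_sdk.transforms import InfrahubTransform

logger = logging.getLogger(__name__)

class SafeTransform(InfrahubTransform):
    query = "my_query"
    
    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        try:
            # Safe data access
            edges = data.get("MyNode", {}).get("edges", [])
            if not edges:
                return {"error": "No data found"}
            
            node = edges[0]["node"]
            return {
                "name": node["name"]["value"],
                # "or {}" also covers a status attribute that is present but null
                "status": (node.get("status") or {}).get("value", "unknown")
            }
        except (KeyError, IndexError, TypeError) as exc:
            # Log the error and return a safe default
            logger.error("Transform error: %s", exc)
            return {"error": str(exc)}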
Common error scenarios:
  • Missing data in query response
  • Unexpected data types
  • External API failures
  • Timeout exceeded
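Chained .get() calls become hard to read for deep paths. A small helper such as get_path (hypothetical, not part of the SDK) can walk a response and return a default whenever a key is missing, an index is out of range, or a value along the way is null:

```python
from typing import Any, Union


def get_path(data: Any, *path: Union[str, int], default: Any = None) -> Any:
    """Follow dict keys and list indexes in order; return default on any miss or null."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # TypeError covers indexing into None or into the wrong container type
            return default
        if current is None:
            return default
    return current


# Invented sample response with a null status attribute
response = {"MyNode": {"edges": [{"node": {"name": {"value": "a"}, "status": None}}]}}
print(get_path(response, "MyNode", "edges", 0, "node", "name", "value"))
```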

Accessing the SDK Client

Each transform has access to an InfrahubClient instance through self.client, which it can use to run additional queries:
class ClientTransform(InfrahubTransform):
    query = "my_query"
    
    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        # self.client is an InfrahubClient instance
        # Run an additional stored query, here keyed on the first node's ID
        some_id = data["MyNode"]["edges"][0]["node"]["id"]
        additional_data = await self.client.query_gql_query(
            name="another_query",
            variables={"id": some_id}
        )
        
        # Combine data
        return {
            "primary": data,
            "additional": additional_data
        }
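When a transform needs several follow-up queries, they can run concurrently with asyncio.gather instead of one at a time. In this sketch, StubClient is a hypothetical stand-in for self.client. It only mimics the query_gql_query(name=..., variables=...) call used above, and the query name and IDs are placeholders:

```python
import asyncio
from typing import Any


class StubClient:
    """Hypothetical stand-in for InfrahubClient, used only for this demo."""

    async def query_gql_query(self, name: str, variables: dict[str, Any]) -> dict[str, Any]:
        await asyncio.sleep(0)  # simulate yielding to the event loop during network I/O
        return {"query": name, "id": variables["id"]}


async def fetch_related(client: StubClient, ids: list[str]) -> list[dict[str, Any]]:
    """Run one follow-up query per ID concurrently; results keep the order of ids."""
    return await asyncio.gather(
        *(client.query_gql_query(name="another_query", variables={"id": node_id}) for node_id in ids)
    )


results = asyncio.run(fetch_related(StubClient(), ["id-1", "id-2"]))
print(results)
```

Inside a transform you would await fetch_related(self.client, ids) rather than calling asyncio.run, because transform() already runs in an event loop.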

Testing Transforms

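Infrahub's own test suite exercises transforms end to end with pytest. The example below depends on fixtures from that suite (git_fixture_repo, init_service, prefect_test_fixture) and on pytest being configured to run async tests. Treat it as a pattern to adapt rather than a drop-in test:
import pytest
from infrahub.transformations.models import TransformPythonData
from infrahub.transformations.tasks import transform_python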

async def test_transform_python_success(
    git_fixture_repo, init_service, prefect_test_fixture
):
    commit = git_fixture_repo.get_commit_value(branch_name="main")
    
    message = TransformPythonData(
        repository_id=str(git_fixture_repo.id),
        repository_name=git_fixture_repo.name,
        repository_kind="CoreRepository",
        commit=commit,
        branch="main",
        transform_location="transforms/my_transform.py::MyTransform",
        timeout=10,
        data={"key": "value"},
        convert_query_response=False,
    )
    
    response = await transform_python(message=message)
    assert response == {"expected": "output"}
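The test above runs the full pipeline. For fast unit tests of the transformation logic alone, one option is to keep that logic in a plain function and test it against a hand-written response. extract_person below is a hypothetical example, and these tests run under plain pytest with no server or fixtures:

```python
from typing import Any, Optional


def extract_person(data: dict[str, Any]) -> dict[str, Optional[str]]:
    """Logic that a transform's transform() method could delegate to."""
    edges = data["TestingPerson"]["edges"]
    if not edges:
        return {"name": None}
    return {"name": edges[0]["node"]["name"]["value"]}


def test_extract_person_returns_first_name() -> None:
    data = {"TestingPerson": {"edges": [{"node": {"name": {"value": "John"}}}]}}
    assert extract_person(data) == {"name": "John"}


def test_extract_person_handles_no_match() -> None:
    assert extract_person({"TestingPerson": {"edges": []}}) == {"name": None}
```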

Configuration Options

In .infrahub.yml

python_transforms:
  - name: MyTransform                    # Unique identifier
    class_name: MyTransform              # Python class name
    file_path: "transforms/my.py"        # Path to Python file
    timeout: 30                          # Optional: override class timeout
    convert_query_response: true         # Optional: enable node conversion

Transform Location Format

The transform location uses the format file_path::class_name:
transforms/openconfig.py::OCInterfaces
│                         │
│                         └─ Class name
└─ File path
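Splitting a location string into its parts is straightforward. The sketch below uses a hypothetical helper, parse_location, which also derives a dotted module name from the file path as one way a loader could import it. It assumes the path is relative to the repository root:

```python
from pathlib import PurePosixPath


def parse_location(location: str) -> tuple[str, str, str]:
    """Split "file_path::class_name" and derive a dotted module name."""
    file_path, sep, class_name = location.partition("::")
    if not sep or not file_path or not class_name:
        raise ValueError(f"Expected 'file_path::class_name', got {location!r}")
    # transforms/openconfig.py -> transforms.openconfig
    module_name = ".".join(PurePosixPath(file_path).with_suffix("").parts)
    return file_path, class_name, module_name


print(parse_location("transforms/openconfig.py::OCInterfaces"))
# ('transforms/openconfig.py', 'OCInterfaces', 'transforms.openconfig')
```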

Best Practices

  1. Type hints: Use type hints for clarity and IDE support
    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
    
  2. Error handling: Handle missing or unexpected data gracefully
    edges = data.get("MyNode", {}).get("edges", [])
    if not edges:
        return {"error": "No data"}
    
  3. Docstrings: Document transform purpose and behavior
    async def transform(self, data: dict[str, Any]) -> dict[str, Any]:
        """Generate OpenConfig interface configuration.
        
        Args:
            data: GraphQL response with device and interface data
            
        Returns:
            OpenConfig-formatted interface configuration
        """
    
  4. Keep transforms focused: One transformation responsibility per class
  5. Use convert_query_response: When working extensively with nodes
  6. Set appropriate timeouts: Balance between allowing complexity and preventing hangs
  7. Test thoroughly: Write unit tests for transform logic
  8. Log appropriately: Use Python's logging module for debugging information
  9. Validate input: Check data structure before processing
  10. Return consistent types: Match artifact content_type expectations

