The Pipeline class is the primary interface for creating and executing PDAL operations. It can be constructed from JSON strings, sequences of Stage objects, or by piping stages together.
Constructor
Pipeline(
spec: Union[None, str, Sequence[Stage]] = None,
arrays: Sequence[np.ndarray] = (),
loglevel: int = logging.ERROR,
json: Optional[str] = None,
dataframes: Sequence[DataFrame] = (),
stream_handlers: Sequence[Callable[[], int]] = ()
)
spec
Union[None, str, Sequence[Stage]]
default:"None"
Pipeline specification. Can be a JSON string or a sequence of Stage objects.
arrays
Sequence[np.ndarray]
default:"()"
Numpy arrays to use as input data for the pipeline.
loglevel
int
default:"logging.ERROR"
Logging level using Python’s logging module constants (ERROR, WARNING, INFO, DEBUG).
json
Optional[str]
default:"None"
JSON string specification (alternative to spec parameter). Cannot be used together with spec.
dataframes
Sequence[DataFrame]
default:"()"
Pandas DataFrames to use as input data. Will be converted to Numpy structured arrays.
stream_handlers
Sequence[Callable[[], int]]
default:"()"
Functions called to populate input arrays during streaming execution. Must match the number of input arrays/dataframes.
Example
import pdal
import logging
# From JSON string
pipeline = pdal.Pipeline('{"pipeline": ["input.las", {"type": "filters.sort", "dimension": "X"}]}')
# From Stage objects
pipeline = pdal.Pipeline([pdal.Reader.las("input.las"), pdal.Filter.sort(dimension="X")])
# Using pipe operator
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
# With numpy arrays as input
import numpy as np
array = np.array([(0, 0, 0)], dtype=[('X', float), ('Y', float), ('Z', float)])
pipeline = pdal.Filter.sort(dimension="X").pipeline(array)
Properties
stages
@property
stages -> List[Stage]
Returns a list of Stage objects in the pipeline.
streamable
@property
streamable -> bool
Returns True if all stages in the pipeline support streaming execution.
loglevel
@property
loglevel -> int
Gets or sets the logging level. Accepts Python logging module constants.
pipeline.loglevel = logging.INFO
arrays
Numpy structured arrays containing the point cloud data after pipeline execution. Each array represents a point view output from the pipeline.
pipeline.execute()
point_data = pipeline.arrays[0]
print(point_data['X']) # Access X coordinates
meshes
Numpy arrays containing mesh data (triangles) from stages like filters.delaunay. Each triangle is a tuple (A, B, C) of indices into the corresponding point view.
metadata
Dictionary containing metadata from the pipeline execution. This is automatically parsed from PDAL's JSON output.
pipeline.execute()
print(pipeline.metadata) # Access metadata as dict
log
Log output from the pipeline execution.
schema
Dictionary containing the schema information (dimensions and their types) for the point cloud data.
pipeline.execute()
print(pipeline.schema)
pipeline
JSON string representation of the pipeline configuration. This is the internal representation used by PDAL.
quickinfo
Dictionary containing quick preview information about the data source without fully reading it. Useful for inspecting file headers and metadata.
pipeline = pdal.Reader.las("input.las").pipeline()
info = pipeline.quickinfo
print(info)
srswkt2
Spatial reference system in WKT2 format.
Methods
execute
execute(allowed_dims: list = []) -> int
Executes the pipeline in standard (non-streaming) mode.
allowed_dims
list
default:"[]"
Optional list of dimension names to include in the output arrays. If empty, all dimensions are included.
Total number of points processed.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
count = pipeline.execute()
print(f"Processed {count} points")
# Only load specific dimensions
count = pipeline.execute(allowed_dims=['X', 'Y', 'Z', 'Intensity'])
execute_streaming
execute_streaming(chunk_size: int = 10000, allowed_dims: list = []) -> int
Executes a streamable pipeline in streaming mode without allocating arrays in memory. Useful when the pipeline has Writer stages and you don’t need to access point data.
chunk_size
int
default:"10000"
Number of points to process per chunk.
allowed_dims
list
default:"[]"
Optional list of dimension names to include. If empty, all dimensions are included.
Total number of points processed.
pipeline = pdal.Reader.las("input.las") | pdal.Writer.las("output.las")
count = pipeline.execute_streaming(chunk_size=10000)
iterator
iterator(chunk_size: int = 10000, prefetch: int = 0, allowed_dims: list = []) -> Iterator[np.ndarray]
Returns an iterator that yields Numpy arrays of up to chunk_size points at a time. Only works with streamable pipelines.
chunk_size
int
default:"10000"
Maximum number of points per yielded array.
prefetch
int
default:"0"
Number of arrays to prefetch and buffer in parallel.
allowed_dims
list
default:"[]"
Optional list of dimension names to include in yielded arrays. If empty, all dimensions are included.
Iterator yielding Numpy structured arrays.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.range(limits="Intensity[100:200]")
for chunk in pipeline.iterator(chunk_size=5000):
    print(f"Processing {len(chunk)} points")
    # Process chunk...
# Only iterate over specific dimensions
for chunk in pipeline.iterator(chunk_size=5000, allowed_dims=['X', 'Y', 'Z']):
    print(f"Processing {len(chunk)} points with X, Y, Z only")
toJSON
toJSON() -> str
Serializes the pipeline to a JSON string representation.
JSON string of the pipeline configuration.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
json_str = pipeline.toJSON()
print(json_str)
get_meshio
get_meshio(idx: int) -> Optional[Mesh]
Creates a meshio Mesh object from the point view and mesh data at the specified index. Requires the meshio package to be installed.
idx
int
Index of the point view to convert.
Meshio Mesh object, or None if no mesh data exists.
import pdal
pipeline = pdal.Reader.las("input.las") | pdal.Filter.delaunay()
pipeline.execute()
mesh = pipeline.get_meshio(0)
if mesh:
    mesh.write('output.obj')
get_dataframe
get_dataframe(idx: int) -> Optional[DataFrame]
Converts the point view at the specified index to a Pandas DataFrame. Requires the pandas package to be installed.
idx
int
Index of the point view to convert.
Pandas DataFrame containing the point data.
pipeline.execute()
df = pipeline.get_dataframe(0)
print(df.head())
get_geodataframe
get_geodataframe(idx: int, xyz: bool = False, crs: Any = None) -> Optional[GeoDataFrame]
Converts the point view at the specified index to a GeoPandas GeoDataFrame with Point geometries. Requires the geopandas package to be installed.
idx
int
Index of the point view to convert.
xyz
bool
default:"False"
If True, creates 3D points including Z coordinates; otherwise creates 2D points.
crs
Any
default:"None"
Coordinate reference system to assign to the GeoDataFrame.
GeoPandas GeoDataFrame with Point geometries.
pipeline.execute()
gdf = pipeline.get_geodataframe(0, xyz=True, crs="EPSG:4326")
Pipeline Composition
Pipelines support the pipe operator (|) for composition:
# Pipe stages together
pipeline = stage1 | stage2 | stage3
# Pipe a stage to an existing pipeline
pipeline |= new_stage
# Pipe pipelines together
combined = pipeline1 | pipeline2