The Pipeline class is the primary interface for creating and executing PDAL operations. It can be constructed from JSON strings, sequences of Stage objects, or by piping stages together.
Constructor
Pipeline(
spec: Union[None, str, Sequence[Stage]] = None,
arrays: Sequence[np.ndarray] = (),
loglevel: int = logging.ERROR,
json: Optional[str] = None,
dataframes: Sequence[DataFrame] = (),
stream_handlers: Sequence[Callable[[], int]] = ()
)
spec
Union[None, str, Sequence[Stage]]
default:"None"
Pipeline specification. Can be a JSON string or a sequence of Stage objects.
arrays
Sequence[np.ndarray]
default:"()"
Numpy arrays to use as input data for the pipeline.
loglevel
int
default:"logging.ERROR"
Logging level using Python’s logging module constants (ERROR, WARNING, INFO, DEBUG).
json
Optional[str]
default:"None"
JSON string specification (alternative to spec parameter). Cannot be used together with spec.
dataframes
Sequence[DataFrame]
default:"()"
Pandas DataFrames to use as input data. Will be converted to Numpy structured arrays.
stream_handlers
Sequence[Callable[[], int]]
default:"()"
Functions called to populate input arrays during streaming execution. Must match the number of input arrays/dataframes.
Example
import pdal
import logging
# From JSON string
pipeline = pdal.Pipeline('{"pipeline": ["input.las", {"type": "filters.sort", "dimension": "X"}]}')
# From Stage objects
pipeline = pdal.Pipeline([pdal.Reader.las("input.las"), pdal.Filter.sort(dimension="X")])
# Using pipe operator
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
# With numpy arrays as input
import numpy as np
array = np.array([(0, 0, 0)], dtype=[('X', float), ('Y', float), ('Z', float)])
pipeline = pdal.Filter.sort(dimension="X").pipeline(array)
Properties
stages
@property
stages -> List[Stage]
Returns a list of Stage objects in the pipeline.
streamable
@property
streamable -> bool
Returns True if all stages in the pipeline support streaming execution.
loglevel
@property
loglevel -> int
Gets or sets the logging level. Accepts Python logging module constants.
pipeline.loglevel = logging.INFO
arrays
Numpy structured arrays containing the point cloud data after pipeline execution. Each array represents a point view output from the pipeline.
pipeline.execute()
point_data = pipeline.arrays[0]
print(point_data['X']) # Access X coordinates
meshes
Numpy arrays containing mesh data (triangles) from stages like filters.delaunay. Each triangle is a tuple (A, B, C) of indices into the corresponding point view.
metadata
Dictionary containing metadata from the pipeline execution. This is automatically parsed from PDAL's JSON output.
pipeline.execute()
print(pipeline.metadata) # Access metadata as dict
log
Log output from the pipeline execution.
schema
Dictionary containing the schema information (dimensions and their types) for the point cloud data.
pipeline.execute()
print(pipeline.schema)
pipeline
JSON string representation of the pipeline configuration. This is the internal representation used by PDAL.
quickinfo
Dictionary containing quick preview information about the data source without fully reading it. Useful for inspecting file headers and metadata.
pipeline = pdal.Reader.las("input.las").pipeline()
info = pipeline.quickinfo
print(info)
srswkt2
Spatial reference system in WKT2 format.
Methods
execute
execute(allowed_dims: list = []) -> int
Executes the pipeline in standard (non-streaming) mode.
allowed_dims
list
default:"[]"
Optional list of dimension names to include in the output arrays. If empty, all dimensions are included.
Total number of points processed.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
count = pipeline.execute()
print(f"Processed {count} points")
# Only load specific dimensions
count = pipeline.execute(allowed_dims=['X', 'Y', 'Z', 'Intensity'])
execute_streaming
execute_streaming(chunk_size: int = 10000, allowed_dims: list = []) -> int
Executes a streamable pipeline in streaming mode without allocating arrays in memory. Useful when the pipeline has Writer stages and you don’t need to access point data.
chunk_size
int
default:"10000"
Number of points to process per chunk.
allowed_dims
list
default:"[]"
Optional list of dimension names to include. If empty, all dimensions are included.
Total number of points processed.
pipeline = pdal.Reader.las("input.las") | pdal.Writer.las("output.las")
count = pipeline.execute_streaming(chunk_size=10000)
iterator
iterator(chunk_size: int = 10000, prefetch: int = 0, allowed_dims: list = []) -> Iterator[np.ndarray]
Returns an iterator that yields Numpy arrays of up to chunk_size points at a time. Only works with streamable pipelines.
chunk_size
int
default:"10000"
Maximum number of points per yielded array.
prefetch
int
default:"0"
Number of arrays to prefetch and buffer in parallel.
allowed_dims
list
default:"[]"
Optional list of dimension names to include in yielded arrays. If empty, all dimensions are included.
Iterator yielding Numpy structured arrays.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.range(limits="Intensity[100:200]")
for chunk in pipeline.iterator(chunk_size=5000):
    print(f"Processing {len(chunk)} points")
    # Process chunk...
# Only iterate over specific dimensions
for chunk in pipeline.iterator(chunk_size=5000, allowed_dims=['X', 'Y', 'Z']):
    print(f"Processing {len(chunk)} points with X, Y, Z only")
toJSON
toJSON() -> str
Serializes the pipeline to a JSON string representation.
JSON string of the pipeline configuration.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
json_str = pipeline.toJSON()
print(json_str)
get_meshio
get_meshio(idx: int) -> Optional[Mesh]
Creates a meshio Mesh object from the point view and mesh data at the specified index. Requires the meshio package to be installed.
idx
int
Index of the point view to convert.
Meshio Mesh object, or None if no mesh data exists.
import pdal
pipeline = pdal.Reader.las("input.las") | pdal.Filter.delaunay()
pipeline.execute()
mesh = pipeline.get_meshio(0)
if mesh:
    mesh.write('output.obj')
get_dataframe
get_dataframe(idx: int) -> Optional[DataFrame]
Converts the point view at the specified index to a Pandas DataFrame. Requires the pandas package to be installed.
idx
int
Index of the point view to convert.
Pandas DataFrame containing the point data.
pipeline.execute()
df = pipeline.get_dataframe(0)
print(df.head())
get_geodataframe
get_geodataframe(idx: int, xyz: bool = False, crs: Any = None) -> Optional[GeoDataFrame]
Converts the point view at the specified index to a GeoPandas GeoDataFrame with Point geometries. Requires the geopandas package to be installed.
idx
int
Index of the point view to convert.
xyz
bool
default:"False"
If True, creates 3D points including Z coordinates; otherwise creates 2D points.
crs
Any
default:"None"
Coordinate reference system to assign to the GeoDataFrame.
GeoPandas GeoDataFrame with Point geometries.
pipeline.execute()
gdf = pipeline.get_geodataframe(0, xyz=True, crs="EPSG:4326")
Pipeline Composition
Pipelines support the pipe operator (|) for composition:
# Pipe stages together
pipeline = stage1 | stage2 | stage3
# Pipe a stage to an existing pipeline
pipeline |= new_stage
# Pipe pipelines together
combined = pipeline1 | pipeline2