Quick start

This guide will walk you through creating and executing your first PDAL pipeline to process point cloud data with Python.

Your first pipeline

Let’s start with a simple example that reads a LAS file and sorts it by the X dimension.

JSON pipeline approach

You can define a pipeline using a JSON string:
import pdal

json = """
{
  "pipeline": [
    "1.2-with-color.las",
    {
        "type": "filters.sort",
        "dimension": "X"
    }
  ]
}"""

pipeline = pdal.Pipeline(json)
count = pipeline.execute()
arrays = pipeline.arrays
metadata = pipeline.metadata
log = pipeline.log

print(f"Processed {count} points")

Programmatic pipeline approach

Alternatively, you can build pipelines programmatically using Python objects and the pipe operator:
import pdal

pipeline = pdal.Reader("1.2-with-color.las") | pdal.Filter.sort(dimension="X")
count = pipeline.execute()

print(f"Processed {count} points")

Both approaches produce identical results. The programmatic approach is often more readable for complex pipelines.

Working with arrays

PDAL Python converts point cloud data into NumPy structured arrays, making it easy to work with point attributes:
import pdal

# Read point cloud data
data = "https://github.com/PDAL/PDAL/blob/master/test/data/las/1.2-with-color.las?raw=true"

pipeline = pdal.Reader.las(filename=data).pipeline()
count = pipeline.execute()
print(f"Read {count} points")  # 1065 points

# Access the array
arr = pipeline.arrays[0]
print(arr.dtype)  # Shows available dimensions: X, Y, Z, Intensity, etc.

# Filter with NumPy
intensity_filtered = arr[arr["Intensity"] > 30]
print(f"After NumPy filter: {len(intensity_filtered)} points")  # 704 points

The array is a NumPy structured array with fields for each dimension (X, Y, Z, Intensity, Classification, etc.).
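If you want to experiment with the structured-array mechanics without downloading any data, a small hand-built array behaves the same way. The field names below mirror common PDAL dimensions, but the values are made up purely for illustration:

```python
import numpy as np

# A tiny stand-in for a PDAL point array: same structured-dtype idea,
# with hypothetical values (not real point cloud data).
dtype = np.dtype([("X", "f8"), ("Y", "f8"), ("Z", "f8"), ("Intensity", "u2")])
points = np.array(
    [(1.0, 2.0, 10.0, 25), (1.5, 2.5, 12.0, 40), (2.0, 3.0, 11.0, 90)],
    dtype=dtype,
)

# Indexing by field name returns a plain 1-D array for that dimension
print(points["Z"])  # [10. 12. 11.]

# Boolean masks select whole points, keeping all fields together
bright = points[points["Intensity"] > 30]
print(len(bright))  # 2
```

This is exactly the kind of filtering shown above with `arr["Intensity"] > 30`; the only difference is where the array came from.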

Combining PDAL and NumPy

You can mix PDAL operations with NumPy processing in the same workflow:
import pdal

data = "https://github.com/PDAL/PDAL/blob/master/test/data/las/1.2-with-color.las?raw=true"

# Step 1: Read data with PDAL
pipeline = pdal.Reader.las(filename=data).pipeline()
pipeline.execute()
arr = pipeline.arrays[0]

# Step 2: Filter with NumPy
intensity = arr[arr["Intensity"] > 30]
print(f"After NumPy filter: {len(intensity)} points")  # 704 points

# Step 3: Process filtered data with PDAL
pipeline = pdal.Filter.expression(
    expression="Intensity >= 100 && Intensity < 300"
).pipeline(intensity)
pipeline.execute()
clamped = pipeline.arrays[0]
print(f"After PDAL filter: {len(clamped)} points")  # 387 points

Writing output

You can write processed point clouds to various formats:
import pdal

# Build a pipeline with a writer
pipeline = (
    pdal.Reader.las("input.las")
    | pdal.Filter.sort(dimension="X")
    | pdal.Writer.las(
        filename="output.las",
        offset_x="auto",
        offset_y="auto",
        offset_z="auto",
        scale_x=0.01,
        scale_y=0.01,
        scale_z=0.01,
    )
)

count = pipeline.execute()
print(f"Wrote {count} points")

Stage types

PDAL pipelines are built from three types of stages:

Readers

Readers load point cloud data from files or URLs:
# Explicit reader type
reader = pdal.Reader.las(filename="data.las")

# Automatic type inference from filename
reader = pdal.Reader("data.las")

# Reader with options
reader = pdal.Reader.las(
    filename="data.laz",
    spatialreference="EPSG:4326"
)

Filters

Filters transform point cloud data:
# Sort by dimension
filter1 = pdal.Filter.sort(dimension="Z")

# Filter by expression
filter2 = pdal.Filter.expression(expression="Classification == 2")

# Compute statistics
filter3 = pdal.Filter.stats()

# Chain multiple filters
pipeline = reader | filter1 | filter2 | filter3

Writers

Writers save point cloud data to files:
# LAS writer
writer1 = pdal.Writer.las(filename="output.las")

# TileDB writer
writer2 = pdal.Writer.tiledb(array_name="output_array")

# Multiple writers in one pipeline
pipeline = reader | filter1 | writer1 | writer2

Streaming large datasets

For large point clouds that don’t fit in memory, use streaming execution:
import pdal

pipeline = (
    pdal.Reader("large-file.las")
    | pdal.Filter.expression(expression="Intensity > 80 && Intensity < 120")
)

# Process in chunks of 500 points
for array in pipeline.iterator(chunk_size=500):
    print(f"Processing chunk with {len(array)} points")
    # Process each chunk...
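Per-chunk results usually need to be combined after the loop. One minimal pattern, sketched here with plain NumPy arrays standing in for the chunks `pipeline.iterator()` would yield, is to collect each chunk's contribution and concatenate at the end:

```python
import numpy as np

# Hypothetical chunks standing in for what pipeline.iterator() yields;
# each is a structured array of points.
dtype = np.dtype([("X", "f8"), ("Intensity", "u2")])
chunks = [
    np.array([(0.0, 10), (1.0, 90)], dtype=dtype),
    np.array([(2.0, 85), (3.0, 20)], dtype=dtype),
]

kept = []
for array in chunks:
    # Keep only the high-intensity points from this chunk
    kept.append(array[array["Intensity"] > 80])

# Combine the per-chunk selections into one array
result = np.concatenate(kept)
print(len(result))  # 2
```

Because each chunk is processed independently, peak memory stays proportional to the chunk size plus whatever you choose to keep.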

If you don’t need to access the point data (for example, when using writers), use execute_streaming() for better performance:
pipeline = (
    pdal.Reader("input.laz")
    | pdal.Filter.expression(expression="Classification == 2")
    | pdal.Writer.las(filename="output.las")
)

# Stream processing without allocating arrays
count = pipeline.execute_streaming(chunk_size=1000000)
print(f"Processed {count} points")

Next steps

Now that you’ve created your first PDAL pipeline, explore more advanced features:

Pipeline API: learn about all Pipeline methods and properties

Stage objects: explore Readers, Filters, and Writers

Working with arrays: deep dive into NumPy array operations

Streaming: process massive datasets efficiently