PDAL Python seamlessly integrates with NumPy, allowing you to read point cloud data into arrays, perform operations using NumPy, and pass the results back to PDAL for further processing.

Complete workflow example

This example demonstrates the full cycle between PDAL and Python:
  1. Read a point cloud file into a NumPy array
  2. Filter the array using NumPy operations
  3. Pass the filtered array back to PDAL for additional filtering
  4. Write the final result to output files
import pdal

data = "https://github.com/PDAL/PDAL/blob/master/test/data/las/1.2-with-color.las?raw=true"

pipeline = pdal.Reader.las(filename=data).pipeline()
print(pipeline.execute())  # 1065 points

# Get the data from the first array
# [array([(637012.24, 849028.31, 431.66, 143, 1,
# 1, 1, 0, 1,  -9., 132, 7326, 245380.78254963,  68,  77,  88),
# dtype=[('X', '<f8'), ('Y', '<f8'), ('Z', '<f8'), ('Intensity', '<u2'),
# ('ReturnNumber', 'u1'), ('NumberOfReturns', 'u1'), ('ScanDirectionFlag', 'u1'),
# ('EdgeOfFlightLine', 'u1'), ('Classification', 'u1'), ('ScanAngleRank', '<f4'),
# ('UserData', 'u1'), ('PointSourceId', '<u2'),
# ('GpsTime', '<f8'), ('Red', '<u2'), ('Green', '<u2'), ('Blue', '<u2')])]
arr = pipeline.arrays[0]

# Keep only points with intensity greater than 30
intensity = arr[arr["Intensity"] > 30]
print(len(intensity))  # 704 points

# Now use PDAL to keep only points whose intensity satisfies 100 <= v < 300
pipeline = pdal.Filter.expression(expression="Intensity >= 100 && Intensity < 300").pipeline(intensity)
print(pipeline.execute())  # 387 points
clamped = pipeline.arrays[0]

# Write our intensity data to a LAS file and a TileDB array. For TileDB it is
# recommended to use Hilbert ordering by default with geospatial point cloud data,
# which requires specifying a domain extent. This can be determined automatically
# from a stats filter that computes statistics about each dimension (min, max, etc.).
pipeline = pdal.Writer.las(
    filename="clamped.las",
    offset_x="auto",
    offset_y="auto",
    offset_z="auto",
    scale_x=0.01,
    scale_y=0.01,
    scale_z=0.01,
).pipeline(clamped)
pipeline |= pdal.Filter.stats() | pdal.Writer.tiledb(array_name="clamped")
print(pipeline.execute())  # 387 points

# Dump the TileDB array schema
import tiledb
with tiledb.open("clamped") as a:
    print(a.schema)

Reading data into NumPy arrays

1. Execute the pipeline

First, create and execute a pipeline to read your data:
pipeline = pdal.Reader.las(filename=data).pipeline()
count = pipeline.execute()
2. Access the arrays

After execution, access the NumPy arrays via the arrays property:
arr = pipeline.arrays[0]
The array contains structured data with fields for each dimension (X, Y, Z, Intensity, etc.).
pipeline.arrays returns a list of NumPy arrays, one for each PointView in the pipeline output. Most pipelines produce a single array.
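The structured layout can be explored without PDAL at all. The following is a minimal NumPy-only sketch using synthetic values (not taken from the example file) with a subset of the LAS dimensions shown earlier:

```python
import numpy as np

# Two synthetic points mimicking part of the structured dtype that
# pipeline.arrays[0] returns for a LAS file.
points = np.array(
    [
        (637012.24, 849028.31, 431.66, 143),
        (636896.33, 849087.70, 446.39, 18),
    ],
    dtype=[("X", "<f8"), ("Y", "<f8"), ("Z", "<f8"), ("Intensity", "<u2")],
)

# Indexing by field name returns a plain ndarray of that dimension
print(points["Z"].mean())
print(points.dtype.names)  # ('X', 'Y', 'Z', 'Intensity')
```

Because each field is an ordinary ndarray, the full NumPy toolkit (statistics, masking, vectorized math) applies directly to any dimension.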

Filtering arrays with NumPy

Once you have data in a NumPy array, you can use standard NumPy operations to filter it:
# Filter points by intensity
intensity = arr[arr["Intensity"] > 30]
print(len(intensity))  # Shows number of filtered points
NumPy’s boolean indexing makes it easy to create complex filters:
# Multiple conditions
filtered = arr[(arr["Intensity"] > 30) & (arr["Classification"] == 2)]

# Filter by spatial bounds
subset = arr[(arr["X"] > xmin) & (arr["X"] < xmax)]

Passing filtered arrays back to PDAL

You can pass NumPy arrays back to PDAL for further processing:
# Create a pipeline from a NumPy array
pipeline = pdal.Filter.expression(
    expression="Intensity >= 100 && Intensity < 300"
).pipeline(intensity)

count = pipeline.execute()
clamped = pipeline.arrays[0]
The .pipeline() method on a stage accepts a NumPy array as input, allowing you to chain Python processing with PDAL operations.
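This works for any structured array whose field names match PDAL dimension names, not just arrays that came out of a pipeline. A sketch with synthetic coordinates (the PDAL call at the end is left as a comment so the snippet stands alone without the library):

```python
import numpy as np

# A structured array using PDAL-recognized dimension names
synthetic = np.zeros(
    5,
    dtype=[("X", "<f8"), ("Y", "<f8"), ("Z", "<f8"), ("Intensity", "<u2")],
)
synthetic["X"] = np.linspace(0.0, 4.0, 5)
synthetic["Y"] = np.linspace(0.0, 4.0, 5)
synthetic["Z"] = [10.0, 11.0, 12.0, 13.0, 14.0]
synthetic["Intensity"] = [10, 50, 150, 250, 350]

# The array can now be handed to a stage, e.g.:
# pipeline = pdal.Filter.expression(expression="Intensity >= 100").pipeline(synthetic)
# pipeline.execute()  # would keep the three points with intensity >= 100
```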

Writing filtered data

Once you’ve filtered your data, write it to various formats:
pipeline = pdal.Writer.las(
    filename="output.las",
    offset_x="auto",
    offset_y="auto",
    offset_z="auto",
    scale_x=0.01,
    scale_y=0.01,
    scale_z=0.01,
).pipeline(clamped)

count = pipeline.execute()
You can chain multiple writers to output the same data in different formats simultaneously using the pipe operator.
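A pipe-chained pipeline also has an equivalent JSON pipeline specification, which `pdal.Pipeline` accepts directly. A sketch of the JSON form of a read-then-write-twice pipeline (the filenames here are placeholders, not files from this example):

```python
import json

# JSON pipeline: read a LAS file, compute stats, then write two outputs.
# readers.las, filters.stats, writers.las, and writers.text are standard
# PDAL stage names.
spec = [
    {"type": "readers.las", "filename": "input.las"},
    {"type": "filters.stats"},
    {"type": "writers.las", "filename": "output.las", "scale_x": 0.01},
    {"type": "writers.text", "filename": "output.csv"},
]
pipeline_json = json.dumps(spec)
# pdal.Pipeline(pipeline_json).execute()  # requires pdal and input.las
```

Both forms describe the same stage graph; the pipe operator is simply a programmatic way to build it.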
