include/linux/dma-mapping.h) provides a bus-independent interface that handles IOMMU translation, cache coherency, and bounce buffers automatically.
Address spaces
Understanding the three address spaces is critical for correct DMA use:

| Address type | C type | Description |
|---|---|---|
| Virtual address | void * | CPU virtual address returned by kmalloc(), vmalloc(), etc. |
| Physical address | phys_addr_t | CPU physical address; visible in /proc/iomem. Not directly usable by devices. |
| DMA (bus) address | dma_addr_t | The address a device places on the bus. May differ from physical via an IOMMU mapping. |
A driver passes a CPU virtual address X to dma_map_single(dev, X, size, dir), which sets up the IOMMU mapping and returns Z, the DMA address to program into the device’s DMA descriptor.
Memory returned by vmalloc(), kernel image addresses (.bss, .data, .text), and stack addresses are not suitable for DMA. Use memory from kmalloc(), kzalloc(), or the page allocator, which return physically contiguous memory.
Required include
Drivers that use the DMA API include <linux/dma-mapping.h>.
DMA addressing limitations
Set the DMA mask before performing any mappings. The mask is a bitmask of the address bits the device can access.
Coherent DMA mappings
Coherent memory is simultaneously accessible by both the CPU and the device without explicit cache flushes. It is appropriate for long-lived control structures such as descriptor rings, completion queues, and command buffers.
Allocation and release
Coherent memory can be expensive: the minimum allocation can be as large as one page. Consolidate small allocations using DMA pools (see below) rather than calling dma_alloc_coherent() for each small structure.
DMA pools
DMA pools are analogous to kmem_cache but allocate from coherent DMA memory. They are ideal for drivers that need many small DMA-coherent allocations (e.g., per-packet descriptors).
Streaming DMA mappings
Streaming mappings are for one-time or short-lived transfers: individual packets, disk I/O buffers, user-space bounce buffers. Unlike coherent memory, streaming mappings require explicit synchronization when ownership is transferred between CPU and device.
Data direction
Single-buffer mapping
Synchronization
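When using streaming mappings with DMA_BIDIRECTIONAL, or when the CPU needs to read data written by the device, explicit synchronization is required: call dma_sync_single_for_cpu() before the CPU accesses the buffer and dma_sync_single_for_device() before returning it to the device.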
Scatter-gather mappings
Scatter-gather (SG) allows a single DMA transfer to span multiple non-contiguous memory buffers. The IOMMU can merge adjacent entries, so the returned count of mapped segments may be smaller than the input count.
Scatter-gather sync helpers
Page mapping
To map an individual struct page (e.g., from alloc_page()), use dma_map_page().
IOMMU considerations
When an IOMMU is present, the dma_map_* functions create a mapping in the IOMMU page tables and return an IOVA (I/O Virtual Address) that the device uses. Key behaviors:
Mapping may fail
Streaming mappings can fail even when the requested memory region is valid, for example when IOVA space is exhausted or IOMMU hardware limits are reached. Always check dma_mapping_error() after every call to dma_map_single() or dma_map_page(), and check the return value of dma_map_sg() for zero.
Segment merging
An IOMMU can merge physically discontiguous scatter-gather entries into fewer IOVA segments. The dma_map_sg() return value is the number of device-visible DMA segments, which may be less than the number of input scatter-gather entries. Always iterate the return count, never the original nents.
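Why the mapped count can be smaller than nents is easier to see in a small userspace model. The sketch below is not kernel code: struct seg and model_map_sg() are hypothetical names, and the only merging rule modeled is a maximum segment size. Real IOMMU drivers apply further constraints (page granularity, segment boundaries).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one device-visible DMA segment. */
struct seg {
    uint64_t dma_addr;
    uint32_t dma_len;
};

/*
 * Model of an IOMMU laying out nents buffers back to back in IOVA space.
 * Consecutive buffers are folded into one segment while the merged length
 * stays within max_seg. Assumes every lens[i] is non-zero and <= max_seg.
 * Returns the number of segments written to out, which is <= nents.
 */
static size_t model_map_sg(const uint32_t *lens, size_t nents,
                           uint64_t iova_base, uint32_t max_seg,
                           struct seg *out)
{
    uint64_t iova = iova_base;
    size_t count = 0;

    for (size_t i = 0; i < nents; i++) {
        if (count > 0 &&
            (uint64_t)out[count - 1].dma_len + lens[i] <= max_seg) {
            /* Contiguous in IOVA space: extend the previous segment. */
            out[count - 1].dma_len += lens[i];
        } else {
            /* Start a new device-visible segment. */
            out[count].dma_addr = iova;
            out[count].dma_len = lens[i];
            count++;
        }
        iova += lens[i];
    }
    return count;
}
```

With four input buffers of 4096, 4096, 8192, and 4096 bytes and a 16384-byte segment limit, the model returns two segments. A driver would program two descriptors, looping over the returned count rather than the four input entries.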
Bounce buffers
On platforms without an IOMMU (or when the DMA mask is too small), the kernel may use the software I/O translation lookaside buffer (SWIOTLB). Data is copied to and from a fixed low-memory bounce region. This is transparent to the driver but adds copy latency and contention on the bounce region.
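Whether a buffer has to be bounced comes down to whether its address range fits under the device's DMA mask. The following userspace sketch models only that arithmetic. dma_bit_mask() mirrors the computation of the kernel's DMA_BIT_MASK(n) macro, while needs_bounce() is a hypothetical helper; the real decision is made inside the DMA core, not by the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n), written as a function. */
static uint64_t dma_bit_mask(unsigned int n)
{
    return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/*
 * Hypothetical helper: true if any byte of [addr, addr + size) lies
 * above the highest address the device can reach with this mask.
 */
static bool needs_bounce(uint64_t addr, uint64_t size, uint64_t mask)
{
    uint64_t last;

    if (size == 0)
        return false;
    last = addr + size - 1;
    if (last < addr)        /* range wrapped around the address space */
        return true;
    return last > mask;
}
```

For a device limited to 32-bit addressing, a 4 KiB buffer ending exactly at 0xffffffff is reachable, while one that extends a single byte further is not and would be a bounce candidate in this model.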
dma_max_mapping_size()
The maximum size of a single mapping varies by IOMMU implementation. Query it at probe time and clamp request sizes accordingly.
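The clamping itself is plain arithmetic, sketched here in userspace C. The helpers clamp_len() and chunks_needed() are hypothetical names. In a driver, max_map would be the size_t returned by dma_max_mapping_size(dev), cached at probe time.

```c
#include <stddef.h>

/* Length of the next mapping: what is left, capped at the per-mapping limit. */
static size_t clamp_len(size_t remaining, size_t max_map)
{
    return remaining < max_map ? remaining : max_map;
}

/* Number of mappings needed to cover total bytes (0 if max_map is 0). */
static size_t chunks_needed(size_t total, size_t max_map)
{
    if (max_map == 0)
        return 0;
    return total / max_map + (total % max_map != 0);
}
```

Under these assumptions, a 10000-byte request against a 4096-byte limit is split into three mappings of 4096, 4096, and 1808 bytes.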
dma_opt_mapping_size(dev) returns the optimal single-mapping size, beyond which performance may degrade.
DMA mapping rules summary
Rule 1: Only DMA-safe memory
Only map memory allocated via kmalloc(), kzalloc(), __get_free_pages(), or similar page-allocator functions. Never use vmalloc() addresses, kernel image addresses, or stack memory.
Rule 2: Coherence granularity
Streaming-mapped regions must begin and end on cache-line boundaries to avoid cache-line sharing between independently-mapped regions. Use page-aligned buffers when in doubt.
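A minimal userspace sketch of the boundary check, assuming the cache line size is a power of two (64 bytes in the usage below, which is an assumption and not universal). Both helper names are hypothetical. In the kernel, dma_get_cache_alignment() reports the alignment the platform requires.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the region starts and ends on a cache-line boundary.
 * line must be a power of two. */
static bool is_cacheline_aligned(uintptr_t start, size_t len, size_t line)
{
    return (start & (line - 1)) == 0 && (len & (line - 1)) == 0;
}

/* Round a length up to the next multiple of line (a power of two). */
static size_t round_up_to_line(size_t len, size_t line)
{
    return (len + line - 1) & ~(line - 1);
}
```

With 64-byte lines, a 100-byte buffer would be padded to 128 bytes so that no other data shares its last cache line.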
Rule 3: Ownership discipline
After dma_map_single(), the device owns the buffer. Do not access it from the CPU until after dma_sync_single_for_cpu() or dma_unmap_single(). After dma_sync_single_for_device(), the device owns it again.
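The ownership hand-offs can be pictured as a small state machine. The userspace model below is purely illustrative: struct buf_state and the model_* functions are hypothetical stand-ins named after the calls they represent, and they perform no DMA.

```c
#include <stdbool.h>

enum buf_owner { OWNER_CPU, OWNER_DEVICE };

/* Hypothetical tracking state for one streaming buffer. */
struct buf_state {
    bool mapped;
    enum buf_owner owner;
};

/* Stands for dma_map_single(): the device takes ownership. */
static void model_map(struct buf_state *b)
{
    b->mapped = true;
    b->owner = OWNER_DEVICE;
}

/* Stands for dma_sync_single_for_cpu(): the CPU may access the buffer. */
static void model_sync_for_cpu(struct buf_state *b)
{
    if (b->mapped)
        b->owner = OWNER_CPU;
}

/* Stands for dma_sync_single_for_device(): ownership returns to the device. */
static void model_sync_for_device(struct buf_state *b)
{
    if (b->mapped)
        b->owner = OWNER_DEVICE;
}

/* Stands for dma_unmap_single(): the mapping ends, the CPU owns the buffer. */
static void model_unmap(struct buf_state *b)
{
    b->mapped = false;
    b->owner = OWNER_CPU;
}

/* The CPU may touch the buffer only while it is the owner. */
static bool cpu_may_access(const struct buf_state *b)
{
    return b->owner == OWNER_CPU;
}
```

A debug build of a driver could keep similar state per buffer and assert cpu_may_access() before every CPU read or write.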
Rule 4: Match map/unmap parameters
dma_unmap_single(), dma_unmap_sg(), and dma_free_coherent() require the exact parameters that were passed at map/allocation time. Stash them in your per-device private structure.
Rule 5: sg nents at unmap
Pass the original nents to dma_unmap_sg(), not the mapped segment count returned by dma_map_sg().