Direct Memory Access (DMA) allows hardware devices to transfer data to and from system memory without CPU involvement. The Linux DMA API (include/linux/dma-mapping.h) provides a bus-independent interface that handles IOMMU translation, cache coherency, and bounce buffers automatically.

Address spaces

Understanding the three address spaces is critical for correct DMA use:
  Address type        C type         Description
  ------------------  -------------  ----------------------------------------------------------
  Virtual address     void *         CPU virtual address returned by kmalloc(), vmalloc(), etc.
  Physical address    phys_addr_t    CPU physical address; visible in /proc/iomem. Not directly
                                     usable by devices.
  DMA (bus) address   dma_addr_t     The address a device places on the bus. May differ from
                                     the physical address via an IOMMU mapping.
  CPU Virtual    CPU Physical     Bus/DMA
  Address        Address          Address
  Space          Space            Space

  +-------+      +------+        +------+
  |       |      |MMIO  |        |      |
  |  (C)  +----->| (B)  +------->| (A)  |  device registers
  |       | vmap |      | bridge |      |
  +-------+      +------+        +------+
  |       |      |Buffer|        |      |
  |  (X)  +----->| (Y)  |<-------+ (Z)  |  DMA buffer
  |       | pmap |      |  IOMMU |      |
  +-------+      +------+        +------+
A driver calls dma_map_single(dev, X, size, dir) which sets up the IOMMU mapping and returns Z — the address to program into the device’s DMA descriptor.
Memory returned by vmalloc(), kernel image addresses (.bss, .data, .text), and stack addresses are not suitable for DMA. Use memory from kmalloc(), kzalloc(), or the page allocator, all of which return physically contiguous memory.

Required include

#include <linux/dma-mapping.h>  /* dma_addr_t, all dma_map_* functions */
#include <linux/dmapool.h>      /* struct dma_pool, dma_pool_create */

DMA addressing limitations

Set the DMA mask before performing any mappings. The mask is a bitmask of the address bits the device can access.
/*
 * dma_set_mask_and_coherent - set both streaming and coherent DMA masks.
 * Returns 0 on success, negative error if the mask cannot be satisfied.
 */
int dma_set_mask_and_coherent(struct device *dev, u64 mask);

/* Set only the streaming DMA mask */
int dma_set_mask(struct device *dev, u64 mask);

/* Set only the coherent (alloc_coherent) mask */
int dma_set_coherent_mask(struct device *dev, u64 mask);

/* Query the minimum mask required to cover all RAM on this platform */
u64 dma_get_required_mask(struct device *dev);
Typical probe-time setup:
static int mydrv_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    int ret;

    /* Prefer 64-bit DMA; fall back to 32-bit */
    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (ret) {
        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (ret) {
            dev_err(&pdev->dev, "no suitable DMA mask available\n");
            return ret;
        }
        dev_warn(&pdev->dev, "using 32-bit DMA\n");
    }
    return 0;
}
Always check the return value of dma_set_mask_and_coherent(). Skipping this check means mappings may silently fail later on platforms with restricted DMA address ranges.

Coherent DMA mappings

Coherent memory is simultaneously accessible by both the CPU and the device without explicit cache flushes. It is appropriate for long-lived control structures such as descriptor rings, completion queues, and command buffers.

Allocation and release

/*
 * dma_alloc_coherent - allocate coherent DMA memory.
 *
 * @dev:        device for which memory is allocated
 * @size:       allocation size in bytes (will be rounded up to page size)
 * @dma_handle: output — DMA address to give to the device
 * @flag:       GFP_* allocation flags (GFP_DMA is ignored)
 *
 * Returns: CPU virtual address, or NULL on failure.
 */
void *dma_alloc_coherent(struct device *dev, size_t size,
                         dma_addr_t *dma_handle, gfp_t flag);

/*
 * dma_free_coherent - free a coherent DMA allocation.
 * All parameters must match those passed to dma_alloc_coherent().
 * Must be called with IRQs enabled.
 */
void dma_free_coherent(struct device *dev, size_t size,
                       void *cpu_addr, dma_addr_t dma_handle);
Example — allocating a descriptor ring:
struct mydrv_priv {
    void          *desc_ring_cpu;  /* CPU-accessible virtual address */
    dma_addr_t     desc_ring_dma; /* DMA address for the device */
    size_t         desc_ring_size;
};

static int mydrv_alloc_rings(struct mydrv_priv *priv, struct device *dev)
{
    priv->desc_ring_size = 256 * sizeof(struct mydrv_desc);

    priv->desc_ring_cpu = dma_alloc_coherent(dev, priv->desc_ring_size,
                                             &priv->desc_ring_dma,
                                             GFP_KERNEL);
    if (!priv->desc_ring_cpu)
        return -ENOMEM;

    /* Program the device with the DMA address */
    writel((u32)priv->desc_ring_dma, priv->base + REG_DESC_BASE_LO);
    writel((u32)(priv->desc_ring_dma >> 32), priv->base + REG_DESC_BASE_HI);
    return 0;
}

static void mydrv_free_rings(struct mydrv_priv *priv, struct device *dev)
{
    dma_free_coherent(dev, priv->desc_ring_size,
                      priv->desc_ring_cpu, priv->desc_ring_dma);
}
Coherent memory can be expensive — the minimum allocation is one page. Consolidate small allocations using DMA pools (see below) rather than calling dma_alloc_coherent() for each small structure.

DMA pools

DMA pools are analogous to kmem_cache but allocate from coherent DMA memory. They are ideal for drivers that need many small DMA-coherent allocations (e.g., per-packet descriptors).
#include <linux/dmapool.h>

/* Create a pool */
struct dma_pool *dma_pool_create(const char *name, struct device *dev,
                                 size_t size,      /* object size */
                                 size_t align,     /* alignment (power of 2) */
                                 size_t boundary); /* objects won't straddle
                                                      this boundary; 0 = none */

/* Allocate from pool */
void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
                     dma_addr_t *handle);

void *dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags,
                      dma_addr_t *handle);

/* Free back to pool */
void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t addr);

/* Destroy pool (all allocations must have been freed first) */
void dma_pool_destroy(struct dma_pool *pool);
static int mydrv_init_pool(struct mydrv_priv *priv, struct device *dev)
{
    /* 64-byte descriptors aligned to 64 bytes, not crossing 4K boundaries */
    priv->desc_pool = dma_pool_create("mydrv_desc", dev,
                                      sizeof(struct mydrv_desc), 64, 4096);
    if (!priv->desc_pool)
        return -ENOMEM;
    return 0;
}

static struct mydrv_desc *mydrv_alloc_desc(struct mydrv_priv *priv,
                                           dma_addr_t *dma_addr)
{
    return dma_pool_zalloc(priv->desc_pool, GFP_ATOMIC, dma_addr);
}
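A matching teardown path returns each descriptor to the pool and destroys the pool once every object has been freed. A minimal sketch, reusing the hypothetical desc_pool field from the example above:

static void mydrv_free_desc(struct mydrv_priv *priv, struct mydrv_desc *desc,
                            dma_addr_t dma_addr)
{
    dma_pool_free(priv->desc_pool, desc, dma_addr);
}

static void mydrv_destroy_pool(struct mydrv_priv *priv)
{
    /* All objects must have been freed; NULL pool is a safe no-op */
    dma_pool_destroy(priv->desc_pool);
    priv->desc_pool = NULL;
}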

Streaming DMA mappings

Streaming mappings are for one-time or short-lived transfers: individual packets, disk I/O buffers, user-space bounce buffers. Unlike coherent memory, streaming mappings require explicit synchronization when ownership is transferred between CPU and device.

Data direction

enum dma_data_direction {
    DMA_BIDIRECTIONAL = 0,  /* driver isn't sure; sync in both directions */
    DMA_TO_DEVICE     = 1,  /* CPU → device (write) */
    DMA_FROM_DEVICE   = 2,  /* device → CPU (read) */
    DMA_NONE          = 3,  /* debugging only */
};

Single-buffer mapping

/*
 * dma_map_single - map a kernel virtual address for DMA.
 *
 * The buffer must be from kmalloc/kzalloc (physically contiguous).
 * Returns the DMA address; check with dma_mapping_error().
 */
dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
                          size_t size, enum dma_data_direction direction);

/*
 * dma_unmap_single - unmap after DMA is complete.
 * All parameters must match those given to dma_map_single().
 */
void dma_unmap_single(struct device *dev, dma_addr_t dma_addr,
                      size_t size, enum dma_data_direction direction);

/* Always check for mapping failure */
int dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
Example — mapping a transmit buffer:
static int mydrv_send(struct mydrv_priv *priv, void *data, size_t len)
{
    dma_addr_t dma_addr;

    dma_addr = dma_map_single(priv->dev, data, len, DMA_TO_DEVICE);
    if (dma_mapping_error(priv->dev, dma_addr)) {
        dev_err(priv->dev, "DMA mapping failed\n");
        return -ENOMEM;
    }

    /* Program descriptor */
    mydrv_post_tx_desc(priv, dma_addr, len);

    /* dma_unmap_single() called from TX completion interrupt handler */
    return 0;
}
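The completion side must pass back the exact parameters used at map time, which is why drivers stash them per transfer. A sketch of that path; mydrv_tx_done() and the mydrv_tx_info bookkeeping structure are assumptions, not part of the DMA API:

struct mydrv_tx_info {
    dma_addr_t dma_addr;  /* saved from dma_map_single() */
    size_t     len;       /* saved mapping length */
};

static void mydrv_tx_done(struct mydrv_priv *priv, struct mydrv_tx_info *info)
{
    /* Parameters must match the original dma_map_single() call */
    dma_unmap_single(priv->dev, info->dma_addr, info->len, DMA_TO_DEVICE);
    /* The CPU owns the buffer again; safe to free or reuse */
}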

Synchronization

When using streaming mappings with DMA_BIDIRECTIONAL or when the CPU needs to read data written by the device, explicit synchronization is required:
/* Transfer ownership from device to CPU (before CPU reads) */
void dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
                              size_t size, enum dma_data_direction direction);

/* Transfer ownership back to device (before device reads/writes) */
void dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
                                 size_t size, enum dma_data_direction direction);

Scatter-gather mappings

Scatter-gather (SG) allows a single DMA transfer to span multiple non-contiguous memory buffers. The IOMMU can merge adjacent entries, so the returned count of mapped segments may be smaller than the input count.
/*
 * dma_map_sg - map a scatter/gather list.
 *
 * @nents: number of entries in sglist
 * Returns: number of DMA segments mapped (≤ nents); 0 on failure.
 *
 * On failure the driver must abort the request.
 */
int dma_map_sg(struct device *dev, struct scatterlist *sg,
               int nents, enum dma_data_direction direction);

/*
 * dma_unmap_sg - unmap a scatter/gather list.
 * nents must be the original count, not the mapped count.
 */
void dma_unmap_sg(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction direction);
Iterating mapped segments:
static int mydrv_map_and_post(struct mydrv_priv *priv,
                              struct scatterlist *sglist, int nents)
{
    struct scatterlist *sg;
    int count, i;

    count = dma_map_sg(priv->dev, sglist, nents, DMA_TO_DEVICE);
    if (!count) {
        dev_err(priv->dev, "dma_map_sg failed\n");
        return -ENOMEM;
    }

    /*
     * count may be less than nents if the IOMMU merged consecutive
     * entries. Always iterate the returned count, not nents.
     */
    for_each_sg(sglist, sg, count, i) {
        dma_addr_t addr = sg_dma_address(sg);
        unsigned int len  = sg_dma_len(sg);

        mydrv_post_sg_desc(priv, addr, len, i == count - 1);
    }

    /* dma_unmap_sg(priv->dev, sglist, nents, DMA_TO_DEVICE)
     * called from completion path with the *original* nents */
    return 0;
}

Scatter-gather sync helpers

void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
                          int nents, enum dma_data_direction direction);

void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
                             int nents, enum dma_data_direction direction);

Page mapping

For mapping an individual struct page (e.g., from alloc_page()):
dma_addr_t dma_map_page(struct device *dev, struct page *page,
                        unsigned long offset, size_t size,
                        enum dma_data_direction direction);

void dma_unmap_page(struct device *dev, dma_addr_t dma_address,
                    size_t size, enum dma_data_direction direction);

IOMMU considerations

When an IOMMU is present, dma_map_* functions create a mapping in the IOMMU page tables, returning an IOVA (I/O Virtual Address) that the device uses. Key behaviors:
Streaming mappings can fail even when the requested memory region is valid — for example, when IOVA space is exhausted or IOMMU hardware limits are reached. Always check dma_mapping_error() after every call to dma_map_single() or dma_map_page(), and check the return value of dma_map_sg() for zero.
An IOMMU can merge physically discontiguous scatter-gather entries into fewer IOVA segments. The dma_map_sg() return value is the number of device-visible DMA segments, which may be less than the number of input scatter-gather entries. Always iterate the return count, never the original nents.
On platforms without an IOMMU (or when the DMA mask is too small), the kernel may use a software IOMMU translation lookaside buffer (SWIOTLB). Data is copied to/from a fixed low-memory bounce region. This is transparent to the driver but adds latency and contention on the bounce buffer.
The maximum size of a single mapping varies by IOMMU implementation. Query it at probe time and clamp request sizes accordingly:
size_t max = dma_max_mapping_size(dev);
Similarly, dma_opt_mapping_size(dev) returns the optimal single-mapping size beyond which performance may degrade.
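A probe-time sketch of such clamping; the max_seg field and the MYDRV_MAX_SEG hardware limit are assumptions for illustration:

static void mydrv_query_limits(struct mydrv_priv *priv, struct device *dev)
{
    /* Never build a single descriptor larger than the IOMMU allows */
    priv->max_seg = min_t(size_t, dma_max_mapping_size(dev),
                          MYDRV_MAX_SEG);
}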

DMA mapping rules summary

Only map memory allocated via kmalloc(), kzalloc(), __get_free_pages(), or similar page-allocator functions. Never map vmalloc() addresses, kernel image addresses, or stack memory.
Streaming-mapped regions must begin and end on cache-line boundaries to avoid cache-line sharing between independently-mapped regions. Use page-aligned buffers when in doubt.
After dma_map_single(), the device owns the buffer. Do not access it from the CPU until after dma_sync_single_for_cpu() or dma_unmap_single(). After dma_sync_single_for_device(), the device owns it again.
dma_unmap_single(), dma_unmap_sg(), and dma_free_coherent() require the exact parameters that were passed at map/allocation time. Stash them in your per-device private structure.
Pass the original nents to dma_unmap_sg(), not the mapped segment count returned by dma_map_sg().
