RFC 8878
Zstandard’s format is stable and documented in RFC 8878. This IETF standard defines the Zstandard Compression Format as an independent specification suitable for file compression, pipe and streaming compression. Multiple independent implementations are already available based on this specification.
Format Documentation
The detailed format specification is maintained in the Zstandard repository:
- Current Version: 0.4.4 (2025-03-22)
- Full Specification: Available in the repository at doc/zstd_compression_format.md
Key Format Features
Frames
Zstandard compressed data is made of one or more frames. Each frame can be decompressed independently of the others. The decompressed content of multiple concatenated frames is the concatenation of each frame's decompressed content. There are two frame formats defined by Zstandard:
- Zstandard frames: Contain compressed data
- Skippable frames: Contain custom user metadata
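As a rough illustration of how a skippable frame can carry user metadata, the sketch below builds one and reads it back. It assumes the layout given in the format specification, which this section does not spell out: a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, a 4-byte little-endian size, then the user data. The helper names are hypothetical and not part of any library.

```python
import struct

# Skippable-frame magic numbers span 0x184D2A50..0x184D2A5F
# (assumption taken from the format specification, not from this page).
SKIPPABLE_MAGIC_BASE = 0x184D2A50
SKIPPABLE_MAGIC_MASK = 0xFFFFFFF0
HEADER_SIZE = 8  # 4-byte magic number + 4-byte frame size


def make_skippable_frame(user_data: bytes, variant: int = 0) -> bytes:
    """Wrap user_data in a skippable frame (hypothetical helper)."""
    if not 0 <= variant <= 15:
        raise ValueError("variant must be in 0..15")
    if len(user_data) > 0xFFFFFFFF:
        raise ValueError("user data too large for a 32-bit size field")
    header = struct.pack("<II", SKIPPABLE_MAGIC_BASE + variant, len(user_data))
    return header + user_data


def read_skippable_frame(buf: bytes, offset: int = 0):
    """Return (user_data, next_offset), or None if no skippable frame starts here."""
    if len(buf) - offset < HEADER_SIZE:
        return None
    magic, size = struct.unpack_from("<II", buf, offset)
    if (magic & SKIPPABLE_MAGIC_MASK) != SKIPPABLE_MAGIC_BASE:
        return None
    start = offset + HEADER_SIZE
    end = start + size
    if end > len(buf):
        raise ValueError("truncated skippable frame")
    return buf[start:end], end
```

Because frames are independent, a reader that does not care about the metadata can jump straight to the returned next_offset and continue with the following frame.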
Blocks
Each frame encapsulates one or multiple blocks. Each block contains arbitrary content, which is described by its header, and has a guaranteed maximum content size, which depends on frame parameters. Unlike frames, each block depends on previous blocks for proper decoding. However, each block can be decompressed without waiting for its successor, allowing streaming operations.
Entropy Encoding
Zstandard uses two types of entropy encoding:
- FSE (Finite State Entropy): Based on ANS (Asymmetric Numeral Systems), used for all symbols except literals
- Huffman coding: Used to compress literals
Magic Number
Zstandard frames begin with a 4-byte magic number in little-endian format:
- Value: 0xFD2FB528
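Since the value is stored in little-endian order, the bytes appear on disk reversed relative to how the constant is written. A minimal sketch of a check for the magic number follows; the function name is hypothetical.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528
# Little-endian encoding puts the least significant byte first: 28 B5 2F FD.
ZSTD_MAGIC_BYTES = struct.pack("<I", ZSTD_MAGIC)


def starts_with_zstd_frame(data: bytes) -> bool:
    """Return True if data begins with the Zstandard frame magic number."""
    return data[:4] == ZSTD_MAGIC_BYTES
```

A match only indicates that the input looks like the start of a Zstandard frame; it does not validate the rest of the frame.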
Window Size
The format supports configurable window sizes for compression:
- Minimum: 1 KB
- Maximum: (1<<41) + 7*(1<<38) bytes (3.75 TB)
- Recommended for decoders: At least 8 MB
- Recommended for encoders: Not to exceed 8 MB
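The minimum and maximum above follow from how the window size is encoded. The sketch below assumes the one-byte Window_Descriptor layout from the format specification (a 5-bit exponent in the high bits and a 3-bit mantissa in the low bits), which this section does not describe; the function name is hypothetical.

```python
def window_size_from_descriptor(descriptor: int) -> int:
    """Decode a one-byte Window_Descriptor into a window size in bytes."""
    if not 0 <= descriptor <= 0xFF:
        raise ValueError("descriptor must fit in one byte")
    exponent = descriptor >> 3    # high 5 bits
    mantissa = descriptor & 0x7   # low 3 bits
    window_log = 10 + exponent
    window_base = 1 << window_log
    window_add = (window_base // 8) * mantissa
    return window_base + window_add


# Smallest descriptor: exponent 0, mantissa 0 gives 1 KB.
min_window = window_size_from_descriptor(0x00)
# Largest descriptor: exponent 31, mantissa 7 gives (1<<41) + 7*(1<<38) bytes.
max_window = window_size_from_descriptor(0xFF)
```

With the largest descriptor, the result is 4,123,168,604,160 bytes, which is 3.75 times 2^40 and matches the 3.75 TB figure in the list.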
Content Checksum
An optional 32-bit checksum can be included at the end of frames. The checksum is computed with the xxHash-64 hash function over the original (decoded) data, with a seed of zero. The low 4 bytes of the resulting hash are stored in little-endian format.
License and Permissions
Copyright (c) Meta Platforms, Inc. and affiliates. Permission is granted to copy and distribute this document for any purpose and without charge, including translations into other languages and incorporation into compilations, provided that the copyright notice and this notice are preserved, and that any substantive changes or deletions from the original are clearly marked. Distribution of this document is unlimited.
Reference Implementation
The Zstandard format is supported by an open source reference implementation, written in portable C, and available at: https://github.com/facebook/zstd
The reference implementation provides:
- A dual BSD OR GPLv2 licensed C library
- A command line utility producing and decoding .zst, .gz, .xz and .lz4 files