ZDICT_trainFromBuffer()
Train a dictionary from an array of samples using the fast COVER algorithm.
dictBuffer: Output buffer where the trained dictionary will be stored.
dictBufferCapacity: Maximum size of the output dictionary buffer. Recommended: ~100 KB.
samplesBuffer: Input buffer containing all samples concatenated together.
samplesSizes: Array containing the size of each sample, in order.
nbSamples: Number of samples provided. Recommended: the samples should total about 100x the dictionary size.
Returns
Size of the dictionary stored into dictBuffer (<= dictBufferCapacity), or an error code which can be tested with ZDICT_isError().
Notes
- This function redirects to ZDICT_optimizeTrainFromBuffer_fastCover() with default parameters (d=8, steps=4, f=20, accel=1)
- Memory usage is about 6 MB
- Training will fail if there are not enough samples to build a dictionary, or if most samples are too small (< 8 bytes)
- Recommended to provide a few thousand samples totaling ~100x the target dictionary size
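The sizing guidance in these notes can be checked before training. The following is a minimal sketch in plain C, not part of zdict.h: `SampleReport` and `check_samples` are hypothetical names, and the 8-byte and 100x thresholds are taken directly from the notes above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical report type for illustration; not part of zdict.h. */
typedef struct {
    size_t   totalSize;    /* sum of all sample sizes */
    unsigned tinySamples;  /* number of samples shorter than 8 bytes */
    int      enoughData;   /* 1 if totalSize >= 100 * targetDictSize */
} SampleReport;

/* Summarizes a sample set against the documented recommendations. */
SampleReport check_samples(const size_t* samplesSizes, unsigned nbSamples,
                           size_t targetDictSize)
{
    SampleReport r = { 0, 0, 0 };
    unsigned i;
    assert(samplesSizes != NULL || nbSamples == 0);
    for (i = 0; i < nbSamples; i++) {
        r.totalSize += samplesSizes[i];
        if (samplesSizes[i] < 8) r.tinySamples++;
    }
    /* Dividing avoids overflow in 100 * targetDictSize. */
    r.enoughData = (r.totalSize / 100 >= targetDictSize);
    return r;
}
```

A caller might skip training, and compress without a dictionary, when `enoughData` is 0 or when most samples are counted in `tinySamples`.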
Example
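A minimal sketch of preparing the input layout this function expects: all samples copied back to back into one buffer, plus an array of their sizes. `PackedSamples`, `pack_samples`, and `free_samples` are hypothetical helpers written for illustration and are not part of zdict.h. The call to ZDICT_trainFromBuffer() is shown only in a comment so that the block compiles without libzstd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the flat layout; not part of zdict.h. */
typedef struct {
    void*    buffer;  /* all samples, back to back (samplesBuffer) */
    size_t*  sizes;   /* size of each sample, in order (samplesSizes) */
    unsigned count;   /* number of samples (nbSamples) */
} PackedSamples;

/* Copies `count` separate samples into one contiguous buffer.
 * Returns 0 on success, -1 on allocation failure. */
int pack_samples(PackedSamples* out, const void* const* samples,
                 const size_t* sizes, unsigned count)
{
    size_t total = 0, offset = 0;
    unsigned i;
    assert(out != NULL);
    for (i = 0; i < count; i++) total += sizes[i];

    out->buffer = malloc(total ? total : 1);
    out->sizes  = malloc((count ? count : 1) * sizeof(size_t));
    if (out->buffer == NULL || out->sizes == NULL) {
        free(out->buffer);
        free(out->sizes);
        out->buffer = NULL; out->sizes = NULL; out->count = 0;
        return -1;
    }
    for (i = 0; i < count; i++) {
        if (sizes[i] > 0)
            memcpy((char*)out->buffer + offset, samples[i], sizes[i]);
        out->sizes[i] = sizes[i];
        offset += sizes[i];
    }
    out->count = count;
    return 0;
}

/* Releases the buffers allocated by pack_samples(). */
void free_samples(PackedSamples* p)
{
    free(p->buffer);
    free(p->sizes);
    p->buffer = NULL; p->sizes = NULL; p->count = 0;
}

/* With <zdict.h> included and libzstd linked, the packed set would be used as:
 *
 *   size_t const n = ZDICT_trainFromBuffer(dictBuffer, dictBufferCapacity,
 *                                          set.buffer, set.sizes, set.count);
 *   if (ZDICT_isError(n)) { ...training failed, compress without a dictionary... }
 *   else                  { ...the first n bytes of dictBuffer hold the dictionary... }
 */
```

Copying into a fresh buffer keeps the sketch simple. If the samples already sit contiguously in memory, only the sizes array needs to be built.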
ZDICT_finalizeDictionary()
Convert raw dictionary content into a zstd dictionary by adding headers and entropy tables.
dstDictBuffer: Output buffer for the finalized dictionary. Can overlap with dictContent.
maxDictSize: Maximum size of the output dictionary. Must be >= max(dictContentSize, ZDICT_DICTSIZE_MIN).
dictContent: Raw dictionary content (can be from any source, not just zstd training).
dictContentSize: Size of the raw dictionary content.
samplesBuffer: Buffer containing concatenated samples for building entropy tables.
samplesSizes: Array of sizes for each sample.
nbSamples: Number of samples provided.
parameters: Dictionary parameters:
- compressionLevel: Optimize for a specific compression level (0 = default)
- notificationLevel: Log verbosity (0-4, where 0 = none)
- dictID: Force a specific dictionary ID (0 = auto-generate a random ID)
Returns
Size of the dictionary stored into dstDictBuffer (<= maxDictSize), or an error code which can be tested with ZDICT_isError().
Notes
- Adds zstd header with magic number, dictionary ID, and entropy tables
- Samples are used to construct statistics for the compression level specified
- If header + content doesn’t fit in maxDictSize, content is truncated from the beginning
- Most profitable content is presumed to be at the end of the dictionary
- May fail if not enough samples, samples are uncompressible, or all samples are identical
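The capacity precondition and the truncation rule described above can be illustrated with a short sketch in plain C. Several things here are assumptions: `SKETCH_DICTSIZE_MIN` stands in for ZDICT_DICTSIZE_MIN and is assumed to be 256 (check zdict.h for the real definition), both helper functions are hypothetical, and `headerSize` is an illustrative input because the library computes the actual header size internally.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for ZDICT_DICTSIZE_MIN; verify against zdict.h. */
#define SKETCH_DICTSIZE_MIN 256

/* Smallest maxDictSize that satisfies the documented precondition
 * maxDictSize >= max(dictContentSize, ZDICT_DICTSIZE_MIN). */
size_t min_max_dict_size(size_t dictContentSize)
{
    return dictContentSize > SKETCH_DICTSIZE_MIN ? dictContentSize
                                                 : SKETCH_DICTSIZE_MIN;
}

/* Number of leading content bytes dropped, per the documented rule, when
 * header + content exceed maxDictSize. headerSize is hypothetical here. */
size_t content_bytes_dropped(size_t headerSize, size_t dictContentSize,
                             size_t maxDictSize)
{
    size_t room;
    assert(headerSize <= maxDictSize);
    room = maxDictSize - headerSize;
    return dictContentSize > room ? dictContentSize - room : 0;
}

/* With <zdict.h> included and libzstd linked, a call might look like:
 *
 *   ZDICT_params_t params;
 *   memset(&params, 0, sizeof(params));   // 0 = defaults, auto dictID
 *   params.compressionLevel = 3;          // illustrative level
 *   size_t const n = ZDICT_finalizeDictionary(dstDictBuffer, maxDictSize,
 *                                             dictContent, dictContentSize,
 *                                             samplesBuffer, samplesSizes,
 *                                             nbSamples, params);
 *   if (ZDICT_isError(n)) { ...finalization failed... }
 */
```

Under these assumptions, passing a maxDictSize equal to dictContentSize always drops some leading content, so a caller who wants to keep all content would size the output buffer larger than the raw content.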