The Format
A block editor post is the proper block-aware representation of a post: a collection of semantically consistent descriptions of what each block is and what its essential data is. This representation only ever exists in memory. The block editor post is not the artifact it produces, namely thepost_content. The latter is the saved HTML, optimized for the reader but retaining invisible markings for later editing.
The input and output of the block editor is a tree of block objects:
The Block Object
Each block object has an ID, a set of attributes, and potentially a list of child blocks:- The core quote block has a
citestring attribute - A heading block has a numeric
levelattribute (1 to 6)
Metadata During Editing
During the block’s lifecycle in the editor, the block object receives extra metadata:isValid- Boolean representing whether the block is validoriginalContent- The original HTML serialization of the block
Examples
Simple paragraph block:Serialization and Parsing
The block tree lives in memory while editing. To save it, the editor transforms it into HTML that can be stored inpost_content through serialization.
Why HTML Storage?
The WordPress ecosystem expects HTML when rendering or editing posts. Serialization ensures:- Single source of truth for content
- Content remains readable and compatible with all WordPress tools
- No risk of data duplication or sync issues
- Backward compatibility with blocks-unaware themes
The Serialization Process
Serialization converts the block tree into HTML using HTML comments as explicit block delimiters. These comments can contain attributes in JSON form. This creates invisible marks on the saved page that leave a trace of the original structured intention.Delimiters and Parsing
HTML comments were chosen as delimiters because:- They don’t break the rest of the HTML document
- Browsers ignore them
- They simplify parsing
- They cannot exist in ambiguous places (like inside HTML attributes)
- They’re permissive—we only need to escape double-hyphen sequences
Comment Structure
Comments are easily described by:- Leading
<!-- - Any content except
-- - Closing
-->
Benefits of This Approach
- Simple and performant parsing
- Damage isolation - Errors in one block don’t bleed into others
- Unrecognized block identification - System can identify blocks before rendering
- No dependency on valid HTML - Top-level blocks can be extracted in a first pass
The Anatomy of a Serialized Block
When blocks are saved, their attributes are serialized to explicit comment delimiters. Static block with content:The Data Lifecycle
The complete workflow:- Parsing - Parse saved document to in-memory block tree using token delimiters
- Editing - All manipulations happen within the block tree
- Serialization - Serialize blocks back to
post_content
Flexibility
The workflow relies on a serialization/parser pair to persist posts. Hypothetically:- Post data could be stored via a plugin
- Data could be retrieved from a remote JSON file
- Alternative storage mechanisms could be used
Important Notes
Architectural Decision:Blocks are in-memory tree structures during editing, serialized as HTML with comment delimiters. Work with the block tree via APIs, not the serialized HTML.Block Identity: The defining aspects of blocks are:
- Their semantics
- The isolation mechanism they provide
- Their identity
- Static local data (JSON in HTML comments)
- Data within the block’s HTML
- Global/reusable blocks
- Storage in complementary
WP_Postobjects