Protocol Buffers serializes messages to a compact binary format. Understanding the encoding helps you choose the right field types, debug wire-level issues, and reason about forward/backward compatibility.
Wire types
The binary format is a sequence of tag–value pairs. Each pair begins with a tag that encodes both the field number and the wire type. The wire type tells the decoder how to find the end of the value that follows: a self-terminating varint, a fixed 4 or 8 bytes, or an explicit length prefix.
| Wire type | ID | Used for |
|---|---|---|
| Varint | 0 | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
| 64-bit | 1 | fixed64, sfixed64, double |
| Length-delimited | 2 | string, bytes, embedded messages, packed repeated fields |
| Start group | 3 | Groups (deprecated) |
| End group | 4 | Groups (deprecated) |
| 32-bit | 5 | fixed32, sfixed32, float |
Wire types 3 and 4 (groups) are deprecated. You will not encounter them in proto3 files, but you may see them in older proto2 data.
The tag is a varint formed by combining the field number and wire type:
tag = (field_number << 3) | wire_type
For example, field number 1 with wire type 0 (varint) produces:
tag = (1 << 3) | 0 = 0x08
Field number 2 with wire type 2 (length-delimited) produces:
tag = (2 << 3) | 2 = 0x12
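The tag arithmetic above can be sketched in Python. The helper names `make_tag` and `split_tag` are illustrative and do not come from any protobuf library.

```python
def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into a tag value."""
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    if not 0 <= wire_type <= 5:
        raise ValueError("wire type must be in the range 0-5")
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> tuple[int, int]:
    """Recover (field_number, wire_type) from a tag value."""
    return tag >> 3, tag & 0x07


print(hex(make_tag(1, 0)))  # 0x8
print(hex(make_tag(2, 2)))  # 0x12
print(split_tag(0x1A))      # (3, 2)
```

Because the tag is itself a varint, field numbers 1 through 15 fit in a single tag byte, while field 16 and above need at least two.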
Varint encoding
Varints use one or more bytes to encode an integer. The most significant bit (MSB) of each byte is a continuation bit: 1 means more bytes follow, 0 means this is the last byte. The remaining 7 bits of each byte carry the data, in little-endian order.
Example: encoding the integer 300
The value 300 in binary is 100101100. Splitting into 7-bit groups (little-endian):
300 = 0b100101100
→ groups: 0101100 0000010
→ with continuation bits:
1_0101100 0_0000010
= 0xAC 0x02
Hex dump:
AC 02
Example: encoding the integer 1
Single byte, no continuation:
0_0000001 = 0x01
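A minimal sketch of the varint scheme described above, limited to non-negative integers. The function names `encode_varint` and `decode_varint` are assumptions made for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)         # continuation bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # groups arrive least-significant first
        if not byte & 0x80:
            return result, pos
        shift += 7


print(encode_varint(300).hex(" "))         # ac 02
print(encode_varint(1).hex(" "))           # 01
print(decode_varint(bytes([0xAC, 0x02])))  # (300, 2)
```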
Negative numbers and ZigZag encoding
Standard varint encoding of a negative int32 always uses 10 bytes because the sign bit propagates through the full 64-bit representation. Use sint32 / sint64 for fields that frequently carry negative values — these use ZigZag encoding that maps signed integers to unsigned integers:
ZigZag(n) = (n << 1) ^ (n >> 31) // for sint32
ZigZag(n) = (n << 1) ^ (n >> 63) // for sint64
| Signed | ZigZag encoded |
|---|---|
| 0 | 0 |
| -1 | 1 |
| 1 | 2 |
| -2 | 3 |
| 2147483647 | 4294967294 |
| -2147483648 | 4294967295 |
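The mapping in the table can be reproduced with a short Python sketch. The names `zigzag_encode32` and `zigzag_decode` are hypothetical. Python integers are unbounded, so the result is masked to 32 bits to mimic fixed-width arithmetic. The last lines compare the size of -1 encoded as a plain int32 (sign-extended to 64 bits) with its ZigZag form.

```python
def zigzag_encode32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (sint32 rule)."""
    if not -(1 << 31) <= n < (1 << 31):
        raise ValueError("value does not fit in 32 bits")
    # Python's >> is an arithmetic shift, so n >> 31 is 0 or -1.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag_decode(z: int) -> int:
    """Invert the ZigZag mapping."""
    return (z >> 1) ^ -(z & 1)


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


for n in (0, -1, 1, -2, 2147483647, -2147483648):
    assert zigzag_decode(zigzag_encode32(n)) == n

plain = encode_varint(-1 & 0xFFFFFFFFFFFFFFFF)  # int32 -1, sign-extended to 64 bits
zigzag = encode_varint(zigzag_encode32(-1))     # sint32 -1
print(len(plain), len(zigzag))                  # 10 1
```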
Encoding a simple message
Consider this message:
message Test {
  int32 a = 1;
}
With a = 150, the encoded bytes are:
08 96 01
Breaking it down:
| Bytes | Meaning |
|---|---|
| 08 | Tag: field 1, wire type 0 (varint) |
| 96 01 | Value 150 as varint (0x96 = 10010110, continuation set; 0x01 = 00000001) |
Decoding 96 01:
96 → 0b10010110 → continuation bit set, data = 0010110
01 → 0b00000001 → continuation bit clear, data = 0000001
Concatenate the 7-bit groups, placing the last byte's group first because the groups arrive least-significant first:
0000001_0010110 = 0b10010110 = 150
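The breakdown above can be checked with a small round trip. This is a sketch under simplifying assumptions: it handles only non-negative values in varint fields, and the helper names are made up for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Emit tag (wire type 0) followed by the varint value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def decode_varint_field(data: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Read one varint field; return (field_number, value, next_pos)."""
    tag, pos = decode_varint(data, pos)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type != 0:
        raise ValueError(f"expected wire type 0, got {wire_type}")
    value, pos = decode_varint(data, pos)
    return field_number, value, pos


encoded = encode_varint_field(1, 150)
print(encoded.hex(" "))              # 08 96 01
print(decode_varint_field(encoded))  # (1, 150, 3)
```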
Strings and bytes
Strings and bytes use wire type 2 (length-delimited). The value is a byte count (as a varint) followed by that many bytes: the UTF-8 encoding for a string field, or the raw bytes for a bytes field.
For a string name = 2 field with value "testing":
12 07 74 65 73 74 69 6E 67
| Bytes | Meaning |
|---|---|
| 12 | Tag: field 2, wire type 2 |
| 07 | Length: 7 bytes |
| 74 65 73 74 69 6E 67 | UTF-8 bytes for "testing" |
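As a rough illustration, the following Python sketch builds the same bytes. `encode_string_field` is a hypothetical helper. Note that the length prefix counts encoded bytes, not characters, which the second call demonstrates.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Emit tag (wire type 2), byte length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


print(encode_string_field(2, "testing").hex(" "))  # 12 07 74 65 73 74 69 6e 67
print(encode_string_field(2, "é").hex(" "))        # 12 02 c3 a9
```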
Embedded messages
Embedded messages also use wire type 2. The serialized sub-message bytes follow a length varint, exactly like a string or bytes field.
For this schema:
message Inner {
  int32 x = 1;
}
message Outer {
  Inner inner = 3;
}
With inner.x = 150, the encoding of Outer is:
1A 03 08 96 01
| Bytes | Meaning |
|---|---|
| 1A | Tag: field 3, wire type 2 |
| 03 | Length: 3 bytes |
| 08 96 01 | Encoded Inner with x = 150 |
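Nesting falls out naturally when the inner message is serialized first and then wrapped as a length-delimited payload. A minimal sketch, with illustrative helper names and support for non-negative varint values only:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0, then the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def length_delimited_field(field_number: int, payload: bytes) -> bytes:
    """Tag with wire type 2, then the payload length, then the payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


inner = varint_field(1, 150)              # Inner { x = 150 }
outer = length_delimited_field(3, inner)  # Outer { inner = ... }
print(inner.hex(" "))  # 08 96 01
print(outer.hex(" "))  # 1a 03 08 96 01
```

On the wire, nothing distinguishes the embedded message from a bytes field holding the same payload. Only the schema tells the decoder to parse the payload recursively.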
Repeated field encoding
Unpacked (non-scalar or legacy)
Each element of a repeated field is encoded as a separate tag–value pair using the same field number:
// repeated int32 ids = 6 with values [1, 2, 3]:
30 01 // tag field 6, wire type 0, value 1
30 02 // tag field 6, wire type 0, value 2
30 03 // tag field 6, wire type 0, value 3
Packed encoding (default for scalar types in proto3)
In proto3, repeated scalar fields are packed by default. All values are concatenated into a single length-delimited record, saving the overhead of repeating the tag for each element.
message Test {
  repeated int32 d = 4;
}
With d = [3, 270, 86942], the packed encoding is:
22 06 03 8E 02 9E A7 05
| Bytes | Meaning |
|---|---|
| 22 | Tag: field 4, wire type 2 |
| 06 | Payload length: 6 bytes |
| 03 | Value 3 |
| 8E 02 | Value 270 |
| 9E A7 05 | Value 86942 |
A parser must be able to accept both packed and unpacked formats for backward compatibility, even in proto3. The packed option defaults to true for scalar numeric repeated fields.
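To make the "accept both formats" requirement concrete, here is a sketch of a decoder for a single repeated varint field. It branches on the wire type of each record. The function names are assumptions, and the sketch ignores negative values and any other fields in the message.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_packed(field_number: int, values: list[int]) -> bytes:
    """One length-delimited record holding all values back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_repeated_varints(data: bytes, field_number: int) -> list[int]:
    """Collect values from packed and unpacked records of one field."""
    values, pos = [], 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if number != field_number:
            raise ValueError("this sketch handles a single field only")
        if wire_type == 0:    # unpacked: one value per tag
            value, pos = decode_varint(data, pos)
            values.append(value)
        elif wire_type == 2:  # packed: many values in one record
            length, pos = decode_varint(data, pos)
            end = pos + length
            if end > len(data):
                raise ValueError("truncated packed record")
            while pos < end:
                value, pos = decode_varint(data, pos)
                values.append(value)
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
    return values


packed = encode_packed(4, [3, 270, 86942])
unpacked = bytes.fromhex("20 03 20 8e 02 20 9e a7 05")
print(packed.hex(" "))                       # 22 06 03 8e 02 9e a7 05
print(decode_repeated_varints(packed, 4))    # [3, 270, 86942]
print(decode_repeated_varints(unpacked, 4))  # [3, 270, 86942]
```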
Map encoding
Map fields are encoded as repeated entries of an auto-generated message type:
// map<string, int32> scores = 1;
// is equivalent to:
message ScoresEntry {
  string key = 1;
  int32 value = 2;
}
repeated ScoresEntry scores = 1;
Each map entry uses wire type 2 (length-delimited), encoding the key and value as fields of the synthetic entry message.
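A sketch of how one `map<string, int32>` entry could be laid out, following the equivalence above. The helper names are hypothetical, and values are assumed to be non-negative to keep the varint code short.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0, then the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def length_delimited_field(field_number: int, payload: bytes) -> bytes:
    """Tag with wire type 2, then the payload length, then the payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_string_int_map(field_number: int, entries: dict[str, int]) -> bytes:
    """Encode each key/value pair as one synthetic entry message."""
    out = bytearray()
    for key, value in entries.items():
        entry = length_delimited_field(1, key.encode("utf-8"))  # key = 1
        entry += varint_field(2, value)                         # value = 2
        out += length_delimited_field(field_number, entry)
    return bytes(out)


print(encode_string_int_map(1, {"a": 1}).hex(" "))
# 0a 05 0a 01 61 10 01
```

Reading the output left to right: `0a 05` opens a 5-byte entry for field 1, `0a 01 61` is the key "a", and `10 01` is the value 1.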
Field ordering and unknown fields
Fields are not required to appear in field-number order in the wire format. Parsers must handle any ordering.
When a parser encounters a field number that is not defined in the current schema, it stores the raw tag–value bytes as unknown fields. This is the mechanism that enables forward compatibility: old code reading a message written by new code preserves any fields it does not understand.
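The wire type is what lets a parser step over fields it does not recognize. The sketch below walks a message, keeps the raw bytes of fields whose numbers are in a caller-supplied set, and collects everything else as unknown bytes. `split_known_unknown` is an illustrative name, and the deprecated group wire types are deliberately not handled.

```python
def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_known_unknown(data: bytes, known_fields: set[int]):
    """Return ({field_number: [raw records]}, unknown_bytes)."""
    known: dict[int, list[bytes]] = {}
    unknown = bytearray()
    pos = 0
    while pos < len(data):
        start = pos
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint: skip until continuation bit clears
            _, pos = decode_varint(data, pos)
        elif wire_type == 1:  # 64-bit: always 8 bytes
            pos += 8
        elif wire_type == 2:  # length-delimited: explicit byte count
            length, pos = decode_varint(data, pos)
            pos += length
        elif wire_type == 5:  # 32-bit: always 4 bytes
            pos += 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if pos > len(data):
            raise ValueError("truncated message")
        raw = data[start:pos]
        if number in known_fields:
            known.setdefault(number, []).append(raw)
        else:
            unknown += raw    # preserved verbatim for re-serialization
    return known, bytes(unknown)


# Field 1 (varint 150) followed by field 2 (string "testing").
data = bytes.fromhex("08 96 01 12 07 74 65 73 74 69 6e 67")
known, unknown = split_known_unknown(data, {1})
print(known)             # {1: [b'\x08\x96\x01']}
print(unknown.hex(" "))  # 12 07 74 65 73 74 69 6e 67
```

Because each record is self-describing, the same function gives the same result if the two fields appear in the opposite order.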
Wire compatibility
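Some field type changes are wire-compatible because both types share a wire type and interpret the payload in the same way:
| Compatible types (wire type 0) |
|---|
| int32, int64, uint32, uint64, bool, enum |
| Compatible types (wire type 2) |
|---|
| string, bytes, embedded messages, packed repeated fields |
Sharing a wire type is necessary but not sufficient. sint32 and sint64 also use wire type 0, but they are ZigZag-encoded, so they are compatible only with each other. Within wire type 2, string and bytes are interchangeable only when the bytes are valid UTF-8, and an embedded message can be read as bytes because the payload is simply the serialized message. Narrowing an integer type (for example, int64 to int32) truncates values that do not fit.
Changing a field to a type with a different wire type breaks data written in the old format: depending on the implementation, the parser rejects the field or treats it as unknown, so the value is lost either way. Always check wire type compatibility before changing a field type in a production schema.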