Protocol Buffers serializes messages to a compact binary format. Understanding the encoding helps you choose the right field types, debug wire-level issues, and reason about forward/backward compatibility.

Wire types

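The binary format is a sequence of tag–value pairs. Each pair begins with a tag that encodes both the field number and the wire type. The wire type tells the decoder how to find the end of the value: a varint runs until a byte whose continuation bit is clear, the fixed-width types take exactly 4 or 8 bytes, and a length-delimited value carries an explicit length prefix.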
| Wire type | ID | Used for |
| --- | --- | --- |
| Varint | 0 | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
| 64-bit | 1 | fixed64, sfixed64, double |
| Length-delimited | 2 | string, bytes, embedded messages, packed repeated fields |
| Start group | 3 | Groups (deprecated) |
| End group | 4 | Groups (deprecated) |
| 32-bit | 5 | fixed32, sfixed32, float |
Wire types 3 and 4 (groups) are deprecated. You will not encounter them in proto3 files, but you may see them in older proto2 data.

Field tags

The tag is a varint formed by combining the field number and wire type:
tag = (field_number << 3) | wire_type
For example, field number 1 with wire type 0 (varint) produces:
tag = (1 << 3) | 0 = 0x08
Field number 2 with wire type 2 (length-delimited) produces:
tag = (2 << 3) | 2 = 0x12
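
The tag formula can be checked with a few lines of Python. This is a minimal sketch; `make_tag` and `split_tag` are illustrative names, not part of any protobuf library.

```python
def make_tag(field_number: int, wire_type: int) -> int:
    """Combine a field number and a wire type into a tag value."""
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    if not 0 <= wire_type <= 5:
        raise ValueError("wire type must be in the range 0..5")
    return (field_number << 3) | wire_type


def split_tag(tag: int) -> tuple[int, int]:
    """Recover (field_number, wire_type) from a tag value."""
    return tag >> 3, tag & 0x07


print(hex(make_tag(1, 0)))  # 0x8
print(hex(make_tag(2, 2)))  # 0x12
print(split_tag(0x1A))      # (3, 2)
```

Because the tag is itself a varint, field numbers 1 through 15 fit in a single tag byte; field number 16 and above need at least two.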

Varint encoding

Varints use one or more bytes to encode an integer. The most significant bit (MSB) of each byte is a continuation bit: 1 means more bytes follow, 0 means this is the last byte. The remaining 7 bits of each byte carry the data, with the least significant 7-bit group first (little-endian order).

Example: encoding the integer 300

The value 300 in binary is 100101100. Splitting into 7-bit groups (little-endian):
300 = 0b100101100
    → groups: 0101100  0000010
    → with continuation bits:
       1_0101100  0_0000010
       = 0xAC     0x02
Hex dump:
AC 02

Example: encoding the integer 1

Single byte, no continuation:
01
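
The two examples above can be reproduced with a small encoder and decoder. This is a sketch for non-negative integers only; the function names are illustrative and do not come from a protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("encode_varint expects a non-negative integer")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(low7)         # last group: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # later bytes are more significant
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint longer than 10 bytes")


print(encode_varint(300).hex(" ").upper())   # AC 02
print(encode_varint(1).hex(" ").upper())     # 01
print(decode_varint(bytes([0xAC, 0x02])))    # (300, 2)
```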

Negative numbers and ZigZag encoding

Standard varint encoding of a negative int32 or int64 always uses 10 bytes because the value is sign-extended to a full 64-bit two's-complement representation. Use sint32 / sint64 for fields that frequently carry negative values. These types use ZigZag encoding, which maps signed integers to unsigned integers so that values of small magnitude stay short:
ZigZag(n) = (n << 1) ^ (n >> 31)   // for sint32
ZigZag(n) = (n << 1) ^ (n >> 63)   // for sint64
| Signed | ZigZag encoded |
| --- | --- |
| 0 | 0 |
| -1 | 1 |
| 1 | 2 |
| -2 | 3 |
| 2147483647 | 4294967294 |
| -2147483648 | 4294967295 |
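
The mapping in the table can be reproduced with the sketch below. It assumes the `>>` in the formula is an arithmetic (sign-propagating) shift, which is how Python shifts negative integers; the function names are illustrative.

```python
def zigzag_encode32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (sint32 rule)."""
    if not -(1 << 31) <= n < (1 << 31):
        raise ValueError("value does not fit in sint32")
    # n >> 31 is 0 for non-negative n and -1 (all ones) for negative n.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def zigzag_decode(z: int) -> int:
    """Inverse mapping: recover the signed value from its ZigZag form."""
    return (z >> 1) ^ -(z & 1)


for n in (0, -1, 1, -2, 2147483647, -2147483648):
    z = zigzag_encode32(n)
    print(n, "->", z, "->", zigzag_decode(z))
```

The encoded value is then written as an ordinary varint, so -1 costs one byte as sint32 instead of ten as int32.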

Encoding a simple message

Consider this message:
message Test {
  int32 a = 1;
}
With a = 150, the encoded bytes are:
08 96 01
Breaking it down:
| Bytes | Meaning |
| --- | --- |
| 08 | Tag: field 1, wire type 0 (varint) |
| 96 01 | Value 150 as varint (0x96 = 10010110, continuation set; 0x01 = 00000001) |
Decoding 96 01:
  • 0x96 = 0b10010110 → continuation bit set, data = 0010110
  • 0x01 = 0b00000001 → continuation bit clear, data = 0000001
  • Concatenate little-endian: 0000001_0010110 = 10010110 = 150
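
As a rough sketch, the same bytes can be produced by concatenating a tag varint and a value varint. The helper names are illustrative. The mask in `encode_int32_field` models the sign extension to 64 bits that makes negative int32 values occupy 10 bytes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_int32_field(field_number: int, value: int) -> bytes:
    """Encode one int32 field as tag + varint value."""
    tag = (field_number << 3) | 0  # wire type 0 (varint)
    # Negative values are sign-extended to 64 bits before varint encoding.
    return encode_varint(tag) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)


print(encode_int32_field(1, 150).hex(" ").upper())  # 08 96 01
print(len(encode_int32_field(1, -1)))               # 11 (1 tag byte + 10 value bytes)
```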

Strings and bytes

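Strings and bytes use wire type 2 (length-delimited). The value is the byte count (as a varint) followed by the payload: the UTF-8 encoding for a string field, or the raw bytes unchanged for a bytes field. The length counts bytes, not characters. For a string name = 2 field with value "testing":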
12 07 74 65 73 74 69 6E 67
| Bytes | Meaning |
| --- | --- |
| 12 | Tag: field 2, wire type 2 |
| 07 | Length: 7 bytes |
| 74 65 73 74 69 6E 67 | UTF-8 bytes for "testing" |
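
A minimal sketch of the length-delimited layout for a string field follows. `encode_string_field` is an illustrative helper, not a library function.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field as tag, byte length, then UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 (length-delimited)
    return encode_varint(tag) + encode_varint(len(payload)) + payload


print(encode_string_field(2, "testing").hex(" ").upper())
# 12 07 74 65 73 74 69 6E 67
print(encode_string_field(2, "é").hex(" ").upper())
# 12 02 C3 A9  (one character, two UTF-8 bytes)
```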

Embedded messages

Embedded messages also use wire type 2. The serialized sub-message bytes follow a length varint, exactly like a string or bytes field. For this schema:
message Inner {
  int32 x = 1;
}

message Outer {
  Inner inner = 3;
}
With inner.x = 150, the encoding of Outer is:
1A 03 08 96 01
| Bytes | Meaning |
| --- | --- |
| 1A | Tag: field 3, wire type 2 |
| 03 | Length: 3 bytes |
| 08 96 01 | Encoded Inner with x = 150 |
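
The nesting can be sketched by serializing Inner first and then wrapping the result as a length-delimited field of Outer. The helpers are illustrative and handle only non-negative values of x.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_inner(x: int) -> bytes:
    """Serialize Inner { int32 x = 1; } for a non-negative x."""
    return encode_varint((1 << 3) | 0) + encode_varint(x)


def encode_outer(inner_bytes: bytes) -> bytes:
    """Serialize Outer { Inner inner = 3; } around already-encoded Inner bytes."""
    tag = (3 << 3) | 2  # field 3, wire type 2
    return encode_varint(tag) + encode_varint(len(inner_bytes)) + inner_bytes


print(encode_outer(encode_inner(150)).hex(" ").upper())  # 1A 03 08 96 01
```

The length prefix can only be written once the sub-message size is known, which is why this sketch encodes the inner message before the outer one.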

Repeated field encoding

Unpacked (non-scalar or legacy)

Each element of a repeated field is encoded as a separate tag–value pair using the same field number:
// repeated int32 ids = 6 with values [1, 2, 3]:
30 01    // tag field 6 (wire type 0), value 1
30 02    // tag field 6 (wire type 0), value 2
30 03    // tag field 6 (wire type 0), value 3

Packed encoding (default for scalar numeric types in proto3)

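In proto3, repeated fields of scalar numeric types are packed by default. All values are concatenated into a single length-delimited record, saving the overhead of repeating the tag for each element. Repeated string, bytes, and message fields cannot be packed and always use one tag–value pair per element.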
message Test {
  repeated int32 d = 4;
}
With d = [3, 270, 86942], the packed encoding is:
22 06 03 8E 02 9E A7 05
| Bytes | Meaning |
| --- | --- |
| 22 | Tag: field 4, wire type 2 |
| 06 | Payload length: 6 bytes |
| 03 | Value 3 |
| 8E 02 | Value 270 |
| 9E A7 05 | Value 86942 |
A parser must accept both the packed and the unpacked encoding of a packable repeated field, even in proto3, so that data written under either setting stays readable. In proto3, the packed option defaults to true for repeated fields of scalar numeric types.
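
The sketch below encodes the same values both ways and shows a decoder that accepts either form. It handles a single repeated varint field only, and every function name is illustrative.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint at pos; return (value, next_pos). Raises IndexError if truncated."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_packed(field_number: int, values: list[int]) -> bytes:
    """One length-delimited record holding all values back to back."""
    payload = b"".join(encode_varint(v) for v in values)
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_unpacked(field_number: int, values: list[int]) -> bytes:
    """One tag-value pair per element."""
    tag = encode_varint((field_number << 3) | 0)
    return b"".join(tag + encode_varint(v) for v in values)


def decode_repeated_varints(data: bytes, field_number: int) -> list[int]:
    """Collect the values of one repeated varint field, packed or unpacked."""
    values, pos = [], 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if number != field_number:
            raise ValueError("this sketch handles a single field only")
        if wire_type == 0:                      # unpacked element
            value, pos = decode_varint(data, pos)
            values.append(value)
        elif wire_type == 2:                    # packed run of elements
            length, pos = decode_varint(data, pos)
            end = pos + length
            while pos < end:
                value, pos = decode_varint(data, pos)
                values.append(value)
        else:
            raise ValueError("unexpected wire type")
    return values


packed = encode_packed(4, [3, 270, 86942])
unpacked = encode_unpacked(4, [3, 270, 86942])
print(packed.hex(" ").upper())    # 22 06 03 8E 02 9E A7 05
print(unpacked.hex(" ").upper())  # 20 03 20 8E 02 20 9E A7 05
print(decode_repeated_varints(packed, 4) == decode_repeated_varints(unpacked, 4))  # True
```

For three values the packed form saves one byte; the saving grows with the element count because the tag is written once instead of once per element.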

Map encoding

Map fields are encoded as repeated entries of an auto-generated message type:
// map<string, int32> scores = 1;
// is equivalent to:
message ScoresEntry {
  string key = 1;
  int32 value = 2;
}
repeated ScoresEntry scores = 1;
Each map entry uses wire type 2 (length-delimited), encoding the key and value as fields of the synthetic entry message.
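
As a sketch of that layout, the code below serializes a `map<string, int32>` field by emitting one length-delimited entry per key. The key "alice" and value 90 are made-up sample data, and `encode_scores` is an illustrative helper.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_scores(scores: dict[str, int], field_number: int = 1) -> bytes:
    """Encode map<string, int32> as repeated entry messages (key = 1, value = 2)."""
    out = bytearray()
    for key, value in scores.items():
        key_bytes = key.encode("utf-8")
        entry = (
            encode_varint((1 << 3) | 2) + encode_varint(len(key_bytes)) + key_bytes
            + encode_varint((2 << 3) | 0) + encode_varint(value & 0xFFFFFFFFFFFFFFFF)
        )
        # Each entry is wrapped as a length-delimited field of the parent message.
        out += encode_varint((field_number << 3) | 2) + encode_varint(len(entry)) + entry
    return bytes(out)


print(encode_scores({"alice": 90}).hex(" ").upper())
# 0A 09 0A 05 61 6C 69 63 65 10 5A
```

This sketch writes entries in dict insertion order; readers should not rely on any particular ordering of map entries on the wire.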

Field ordering and unknown fields

Fields are not required to appear in field-number order in the wire format. Parsers must handle any ordering. When a parser encounters a field number that is not defined in the current schema, it stores the raw tag–value bytes as unknown fields. This is the mechanism that enables forward compatibility: old code reading a message written by new code preserves any fields it does not understand.
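
One way to picture this is a parser loop that uses only the wire type to find the end of each field, then sets aside the raw bytes of any field number it does not know. The sketch below makes that assumption explicit; it does not support groups, and all names are illustrative.

```python
def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint at pos; return (value, next_pos). Raises IndexError if truncated."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def iter_fields(data: bytes):
    """Yield (field_number, wire_type, raw_bytes) for each top-level field, in wire order."""
    pos = 0
    while pos < len(data):
        start = pos
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                      # varint
            _, pos = decode_varint(data, pos)
        elif wire_type == 1:                    # 64-bit
            pos += 8
        elif wire_type == 2:                    # length-delimited
            length, pos = decode_varint(data, pos)
            pos += length
        elif wire_type == 5:                    # 32-bit
            pos += 4
        else:
            raise ValueError("groups are not supported in this sketch")
        if pos > len(data):
            raise ValueError("truncated field")
        yield number, wire_type, data[start:pos]


def split_known_unknown(data: bytes, known_numbers: set[int]):
    """Separate fields the schema knows from raw bytes to preserve as unknown fields."""
    known, unknown = [], bytearray()
    for number, wire_type, raw in iter_fields(data):
        if number in known_numbers:
            known.append((number, wire_type, raw))
        else:
            unknown += raw
    return known, bytes(unknown)


# Field 2 (a string) appears before field 1; the reader's schema only knows field 1.
data = bytes.fromhex("120774657374696e67" + "089601")
known, unknown = split_known_unknown(data, {1})
print(known)                      # [(1, 0, b'\x08\x96\x01')]
print(unknown.hex(" ").upper())   # 12 07 74 65 73 74 69 6E 67
```

Re-emitting the preserved bytes when the message is serialized again is what lets old code pass newer fields through untouched.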

Wire compatibility

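Certain field type changes are wire-compatible. Sharing a wire type is necessary but not sufficient: both types must also interpret the payload the same way. For example, sint32 and sint64 use wire type 0 but are not interchangeable with the other varint types, because their values are ZigZag-encoded.
| Wire type | Compatible types |
| --- | --- |
| 0 (varint) | int32, int64, uint32, uint64, bool, enum |
| 2 (length-delimited) | string, bytes, embedded messages, packed repeated fields |
Within wire type 0, a value that does not fit the new type is truncated. Within wire type 2, the payload must also be valid for the new type: string and bytes are interchangeable only when the bytes are valid UTF-8, and bytes can be read as an embedded message only when they contain a serialized instance of that message.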
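Changing a field type to one with a different wire type makes data written in the old format unreadable for that field. Depending on the implementation, the parser either reports an error or treats the mismatched field as unknown, so the old value is effectively lost. Always check wire type compatibility before changing a field type in a production schema.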
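
To illustrate why sharing wire type 0 is not enough on its own, the sketch below reinterprets the same decoded varint under several declared types. The `as_*` helpers are illustrative and model only how each type reads the raw unsigned value.

```python
def as_int32(raw: int) -> int:
    """Read a raw varint as int32: keep the low 32 bits, two's complement."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def as_int64(raw: int) -> int:
    """Read a raw varint as int64: 64-bit two's complement."""
    raw &= 0xFFFFFFFFFFFFFFFF
    return raw - (1 << 64) if raw & (1 << 63) else raw


def as_bool(raw: int) -> bool:
    """Read a raw varint as bool: any non-zero value is true."""
    return raw != 0


def as_sint32(raw: int) -> int:
    """Read a raw varint as sint32: undo ZigZag on the low 32 bits."""
    raw &= 0xFFFFFFFF
    return (raw >> 1) ^ -(raw & 1)


raw = 150  # the varint 96 01, written by a field declared as int32
print(as_int32(raw), as_int64(raw), as_bool(raw))  # 150 150 True
print(as_sint32(raw))                              # 75: same bytes, different meaning

raw_minus_one = 0xFFFFFFFFFFFFFFFF  # int32 value -1 after sign extension
print(as_int32(raw_minus_one), as_int64(raw_minus_one))  # -1 -1
```

Switching between int32 and int64 preserves the value here, while switching to sint32 parses without error and silently yields a different number.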
