Understanding Deep Packet Inspection

What DPI Does

Deep Packet Inspection (DPI) examines the content of network packets beyond basic header information. For censorship purposes, DPI systems:
  1. Reassemble TCP streams (to varying degrees)
  2. Parse application protocols (HTTP, TLS, etc.)
  3. Extract identifying information (SNI, Host headers, URLs)
  4. Match against blocklists
  5. Terminate or redirect connections

DPI Architectures

An inline DPI system sits directly in the network path:
Client → [DPI Box] → Server
Characteristics:
  • Can block in real-time
  • Must be fast (limited processing time)
  • Typically stateless or minimally stateful
  • Performance-constrained
Turkish ISPs use this model.

Evasion Strategy 1: Header Fragmentation

TLS Record Header Splitting

The TLS record header is 5 bytes:
Byte:   0      1-2     3-4      5+
      ┌────┬────────┬────────┬─────────────────────┐
      │ 16 │ 03 03  │ length │ Handshake data...   │
      └────┴────────┴────────┴─────────────────────┘
        │      │        │
        │      │        └─ 2 bytes: record length
        │      └─ TLS version (0x0303 = TLS 1.2)
        └─ Content type (0x16 = Handshake)
Strategy: Split before the DPI can identify this as a TLS Handshake record.
// From engine/src/bypass.rs:178-179
let split_pos = if self.config.tls_split_pos > 0 {
    self.config.tls_split_pos.min(data.len() - 1)

Example: Split at position 2

Original:  [16] [03 03] [00 xx] [01 00 00 ... SNI ...]
                                 └─ Handshake type (0x01 = ClientHello)
Fragment:  [16 03] | [03 00 xx 01 00 ... SNI ...]
            └─ DPI sees an incomplete header and can't classify it as a TLS Handshake
The first packet has:
  • Byte 0: 0x16 (could be TLS Handshake)
  • Byte 1: 0x03 (first byte of version)
  • Missing: Second version byte, length field, handshake type
DPI cannot confirm this is a ClientHello without more data.
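The effect of the split can be sketched in isolation. The snippet below is a minimal illustration, not the engine's code: `split_record` and `looks_like_client_hello` are hypothetical names, and the classifier models a DPI box that inspects each packet on its own and needs the 5-byte record header plus the handshake type byte.

```rust
/// Split `data` into two non-empty fragments at `pos` (clamped into range).
/// Returns `None` when there are fewer than 2 bytes to split.
fn split_record(data: &[u8], pos: usize) -> Option<(&[u8], &[u8])> {
    if data.len() < 2 {
        return None;
    }
    let pos = pos.clamp(1, data.len() - 1);
    Some(data.split_at(pos))
}

/// Hypothetical per-packet classifier: it can only answer "ClientHello"
/// when the record header and the handshake type are in the same packet.
fn looks_like_client_hello(packet: &[u8]) -> bool {
    packet.len() >= 6
        && packet[0] == 0x16 // content type: Handshake
        && packet[1] == 0x03 // major version byte
        && packet[5] == 0x01 // handshake type: ClientHello
}
```

With a split position of 2, the first fragment is too short to classify and the second no longer starts with 0x16, so a classifier of this kind matches neither packet, while the concatenation of the two fragments is still the original record.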

HTTP Host Header Splitting

HTTP headers are line-delimited text:
GET / HTTP/1.1\r\n
Host: discord.com\r\n
Connection: close\r\n
\r\n
Strategy: Find the Host header line and split within the value.
// From engine/src/tls.rs:233-252
pub fn find_http_host(data: &[u8]) -> Option<(usize, usize)> {
    let text = std::str::from_utf8(data).ok()?;
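    // ASCII-only lowercasing keeps byte offsets identical to `text`;
    // full Unicode lowercasing can change the length of non-ASCII input.
    let lower = text.to_ascii_lowercase();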
    let host_pos = lower.find("\nhost:")?;
    
    let value_start = host_pos + 6;  // "\nhost:" = 6 chars
    
    // Skip whitespace
    let mut start = value_start;
    while start < text.len() && 
          (text.as_bytes()[start] == b' ' || 
           text.as_bytes()[start] == b'\t') {
        start += 1;
    }
    
    // Find end of line
    let end = text[start..].find('\r')
        .or_else(|| text[start..].find('\n'))
        .map(|p| start + p)
        .unwrap_or(text.len());
    
    Some((start, end - start))
}

Example: Split at “Host: twi”

Fragment 1:
GET / HTTP/1.1\r\n
Host: twi

Fragment 2:
tter.com\r\n
Connection: close\r\n
\r\n
The DPI sees:
  • Packet 1: Incomplete Host header (“twi” is not in blocklist)
  • Packet 2: Continuation (“tter.com” alone is not recognized as a full hostname)
Neither packet alone reveals “twitter.com”.
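The Host split shown above can be sketched as follows. This is an assumption-laden illustration rather than the engine's implementation: `host_value_span` and `split_in_host` are hypothetical helpers that work directly on bytes, so the offsets always refer to the original buffer.

```rust
/// Locate the Host header value; returns (offset, length) in bytes.
fn host_value_span(req: &[u8]) -> Option<(usize, usize)> {
    let needle = b"\nhost:";
    // Case-insensitive, byte-wise search for the header name.
    let at = req
        .windows(needle.len())
        .position(|w| w.eq_ignore_ascii_case(needle))?;
    let mut start = at + needle.len();
    // Skip optional whitespace before the value.
    while start < req.len() && (req[start] == b' ' || req[start] == b'\t') {
        start += 1;
    }
    // The value runs until the end of the line (or the end of the buffer).
    let len = req[start..]
        .iter()
        .position(|&b| b == b'\r' || b == b'\n')
        .unwrap_or(req.len() - start);
    Some((start, len))
}

/// Split the request so that only `keep` bytes of the Host value stay in
/// the first fragment. Returns `None` if the value is too short to split.
fn split_in_host(req: &[u8], keep: usize) -> Option<(&[u8], &[u8])> {
    let (start, len) = host_value_span(req)?;
    if len < 2 {
        return None;
    }
    let cut = start + keep.clamp(1, len - 1);
    Some(req.split_at(cut))
}
```

Calling `split_in_host(request, 3)` on a request for twitter.com yields a first fragment ending in `Host: twi` and a second fragment starting with `tter.com`.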
HTTP fragmentation is less important since most sites now use HTTPS, but it’s still effective for legacy HTTP connections and some proxy scenarios.

Evasion Strategy 2: SNI Field Splitting

Locating the SNI Extension

The TLS ClientHello structure is complex:
TLS Record (5 bytes)
└─ Handshake Protocol
   ├─ Type (1 byte): 0x01 = ClientHello
   ├─ Length (3 bytes)
   ├─ Version (2 bytes)
   ├─ Random (32 bytes)
   ├─ Session ID (variable)
   ├─ Cipher Suites (variable)
   ├─ Compression Methods (variable)
   └─ Extensions (variable)
      └─ Server Name (type 0x0000)
         ├─ Extension length
         ├─ Server Name list length
         ├─ Name type: 0x00 = hostname
         ├─ Name length
         └─ Name data ← Target for fragmentation
We must parse the entire structure to find the SNI:
// From engine/src/tls.rs:74-196 (abbreviated)
pub fn parse_client_hello(data: &[u8]) -> Option<ClientHelloInfo> {
    let mut info = ClientHelloInfo::default();
    let mut pos = 0;
    
    // Parse TLS record header
    if data[0] != TLS_HANDSHAKE { return None; }
    pos += 1;
    info.record_version = (data[pos], data[pos + 1]);
    pos += 2;
    let record_length = u16::from_be_bytes([data[pos], data[pos + 1]]) as usize;
    pos += 2;
    
    // Parse handshake header
    if data[pos] != HANDSHAKE_CLIENT_HELLO { return None; }
    pos += 1;
    let _handshake_length = u32::from_be_bytes(
        [0, data[pos], data[pos + 1], data[pos + 2]]
    ) as usize;
    pos += 3;
    
    // Parse ClientHello fields
    info.client_version = (data[pos], data[pos + 1]);
    pos += 2;
    pos += 32;  // Random
    
    let session_id_len = data[pos] as usize;
    pos += 1 + session_id_len;
    
    let cipher_suites_len = u16::from_be_bytes(
        [data[pos], data[pos + 1]]
    ) as usize;
    pos += 2 + cipher_suites_len;
    
    let compression_len = data[pos] as usize;
    pos += 1 + compression_len;
    
    let extensions_len = u16::from_be_bytes(
        [data[pos], data[pos + 1]]
    ) as usize;
    pos += 2;
    
    // Parse extensions to find SNI
    let extensions_end = pos + extensions_len;
    while pos + 4 <= data.len() && pos < extensions_end {
        let ext_type = u16::from_be_bytes([data[pos], data[pos + 1]]);
        pos += 2;
        let ext_len = u16::from_be_bytes([data[pos], data[pos + 1]]) as usize;
        pos += 2;
        
        if ext_type == EXT_SERVER_NAME {  // 0x0000
            let _sni_list_len = u16::from_be_bytes(
                [data[pos], data[pos + 1]]
            ) as usize;
            let name_type = data[pos + 2];
            let name_len = u16::from_be_bytes(
                [data[pos + 3], data[pos + 4]]
            ) as usize;
            
            if name_type == SNI_HOST_NAME {  // 0x00
                let name_offset = pos + 5;
                info.sni_offset = Some(name_offset);
                info.sni_length = Some(name_len);
                
                if name_offset + name_len <= data.len() {
                    if let Ok(hostname) = std::str::from_utf8(
                        &data[name_offset..name_offset + name_len]
                    ) {
                        info.sni_hostname = Some(hostname.to_string());
                    }
                }
            }
            break;
        }
        pos += ext_len;
    }
    
    Some(info)
}

Splitting the Hostname

Once we know the SNI offset and length, we can split the hostname itself:
// From engine/src/bypass.rs:180-187
} else if let (Some(sni_off), Some(sni_len)) = 
          (info.sni_offset, info.sni_length) {
    // Split hostname in the middle
    if sni_len > 2 {
        sni_off + (sni_len / 2)
    } else {
        sni_off
    }.min(data.len() - 1)
}

Example: “discord.com” split

SNI field bytes:
[...] [00 0b] "discord.com" [...]
               ↓ split at position 5 (11 / 2, integer division)
[...] [00 0b] "disco" | "rd.com" [...]
Packet 1 ends with: "disco"
Packet 2 starts with: "rd.com"
The full hostname never appears in a single packet. This is effective because:
  • The length field (00 0b = 11 bytes) is in packet 1
  • But only 5 bytes of the name follow in that packet
  • Reconstructing the SNI requires buffering and reassembly
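The midpoint rule can be checked with a small standalone function. This is a sketch that mirrors the excerpt from bypass.rs under one assumption: `sni_split_pos` is a hypothetical name, and it uses `saturating_sub` so that an empty buffer does not underflow.

```rust
/// Cut inside the hostname when it is longer than 2 bytes, otherwise cut
/// right before it. The result is clamped to the last valid index.
fn sni_split_pos(sni_off: usize, sni_len: usize, total_len: usize) -> usize {
    let pos = if sni_len > 2 {
        sni_off + sni_len / 2 // integer division: 11 / 2 == 5
    } else {
        sni_off
    };
    pos.min(total_len.saturating_sub(1))
}
```

For an 11-byte name such as "discord.com", integer division places the cut 5 bytes into the name, so the first fragment ends with "disco" and the second begins with "rd.com".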

Evasion Strategy 3: Timing Delays

Why Timing Matters

Some DPI systems use short-term packet buffering to defeat simple fragmentation:
DPI Buffer:
Packet 1 arrives → [Buffer: pkt1]
Packet 2 arrives within 1ms → [Buffer: pkt1, pkt2] → Reassemble → Block
Counter-strategy: Delay fragments beyond the DPI’s buffer timeout.
// From engine/src/bypass.rs:215-217
if self.config.fragment_delay_us > 0 {
    result.inter_fragment_delay = Some(
        Duration::from_micros(self.config.fragment_delay_us)
    );
}
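To see why the delay matters, here is a toy model of the buffering behaviour described above. It is not taken from any real DPI product: `WindowedBuffer` is a hypothetical type that holds one fragment for a fixed window and only reassembles if the next fragment arrives in time.

```rust
use std::time::Duration;

/// Toy DPI buffer: keeps the first fragment for `window`, then forgets it.
struct WindowedBuffer {
    window: Duration,
    held: Option<(Duration, Vec<u8>)>, // (arrival time, bytes)
}

impl WindowedBuffer {
    fn new(window: Duration) -> Self {
        Self { window, held: None }
    }

    /// Feed a packet seen at time `at` (measured from an arbitrary start).
    /// Returns the reassembled bytes when two fragments land in the window.
    fn observe(&mut self, at: Duration, packet: &[u8]) -> Option<Vec<u8>> {
        match self.held.take() {
            Some((first_at, mut bytes)) if at.saturating_sub(first_at) <= self.window => {
                bytes.extend_from_slice(packet);
                Some(bytes)
            }
            _ => {
                // Nothing held, or the held fragment expired: start over.
                self.held = Some((at, packet.to_vec()));
                None
            }
        }
    }
}
```

In this model a second fragment arriving 0.5 ms after the first is reassembled by a 1 ms window, while one arriving 10 ms later is treated as the start of a new, unrelated buffer.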

Delay Values

No delay (fragment_delay_us: 0)
Works against: Simple stateless DPI
Used by: Turk Telekom, Superonline presets
Fragments are sent immediately. Effective when DPI doesn’t attempt reassembly.
Delays only apply to the initial handshake packets. Once the TLS connection is established, data packets flow normally without artificial delays.
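On the sending side, applying an inter-fragment delay could look roughly like the sketch below. `send_fragments` is a hypothetical helper, written against any `std::io::Write` so it can be exercised without a network; how the engine actually schedules its writes is not shown in this document.

```rust
use std::io::{self, Write};
use std::thread;
use std::time::Duration;

/// Write each fragment separately, pausing between fragments when a delay
/// is configured. No pause is inserted before the first fragment.
fn send_fragments<W: Write>(
    out: &mut W,
    fragments: &[&[u8]],
    delay: Option<Duration>,
) -> io::Result<()> {
    for (i, frag) in fragments.iter().enumerate() {
        if i > 0 {
            if let Some(d) = delay {
                thread::sleep(d);
            }
        }
        out.write_all(frag)?;
        out.flush()?;
    }
    Ok(())
}
```

On a real `TcpStream`, separate writes do not guarantee separate packets. Enabling `set_nodelay(true)` is the usual way to keep the kernel from coalescing small writes, though the exact packetization still depends on the network stack.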

Evasion Strategy 4: Segment Size Control

TCP Segmentation

Fragments can be further split into very small TCP segments:
// From engine/src/bypass.rs:196-207
let segment_size = self.config.max_segment_size.max(1);

if segment_size < split_pos {
    // Multi-segment fragmentation
    let mut pos = 0;
    while pos < split_pos {
        let end = (pos + segment_size).min(split_pos);
        result.fragments.push(
            Bytes::copy_from_slice(&data[pos..end])
        );
        pos = end;
    }
    result.fragments.push(
        Bytes::copy_from_slice(&data[split_pos..])
    );
}

Example: 5-byte max segments

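If all 50 bytes of a ClientHello are cut into 5-byte segments (the excerpt above segments only the bytes before split_pos and sends the remainder as one fragment), it becomes: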
Original: [50 bytes in 1 packet]

Segmented: 
[5 bytes] packet 1
[5 bytes] packet 2
[5 bytes] packet 3
[5 bytes] packet 4
...
[5 bytes] packet 10
This is expensive for a DPI box to process:
  • 10x more packets to track
  • Requires stateful reassembly
  • Memory-intensive
  • Easy to time out
The aggressive preset uses max_segment_size: 5, creating maximum fragmentation at the cost of increased packet overhead.
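The segmentation loop can be reproduced with the standard library alone. The sketch below assumes plain `Vec<u8>` fragments instead of `Bytes`, and `segment` is a hypothetical name. It follows the same shape as the excerpt: bytes before `split_pos` are cut into small chunks, and the remainder goes out as one final fragment.

```rust
/// Cut `data[..split_pos]` into chunks of at most `segment_size` bytes and
/// append `data[split_pos..]` as one final fragment.
fn segment(data: &[u8], split_pos: usize, segment_size: usize) -> Vec<Vec<u8>> {
    let split_pos = split_pos.min(data.len());
    let segment_size = segment_size.max(1); // a size of 0 would never advance
    let mut fragments: Vec<Vec<u8>> = data[..split_pos]
        .chunks(segment_size)
        .map(|chunk| chunk.to_vec())
        .collect();
    if split_pos < data.len() {
        fragments.push(data[split_pos..].to_vec());
    }
    fragments
}
```

With 50 bytes of input and a segment size of 5, a `split_pos` of 50 produces ten 5-byte fragments, while a `split_pos` of 20 produces four 5-byte fragments followed by one 30-byte fragment. In both cases concatenating the fragments gives back the input.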

Evasion Strategy 5: Fake Packets (Future)

The codebase includes infrastructure for sending decoy packets:
// From engine/src/bypass.rs:268-288
fn generate_fake_tls_packet(&self, original: &[u8]) -> Bytes {
    let mut fake = BytesMut::with_capacity(original.len());
    fake.extend_from_slice(original);
    
    // Corrupt the SNI field
    if let Some(info) = parse_client_hello(original) {
        if let (Some(offset), Some(len)) = 
               (info.sni_offset, info.sni_length) {
            if offset + len <= fake.len() {
                for i in 0..len {
                    // Dots are overwritten too: "discord.com" becomes "xxxxxxxxxxx"
                    fake[offset + i] = b'x';
                }
            }
        }
    }
    
    fake.freeze()
}

TTL-Based Decoys

The idea:
  1. Send a fake packet with valid structure but wrong SNI (e.g., “xxxxxxxxxxx”)
  2. Set TTL low enough to expire before reaching the server
  3. Send real fragmented packets immediately after
Client → [TTL=1, SNI="xxxxxxxxxxx"] → [DPI] → [X dies]
       → [TTL=64, fragmented real SNI] → [DPI confused] → [Server ✓]
The DPI sees the fake packet first and might whitelist the connection.
Fake packet injection is currently disabled by default (send_fake_packets: false) in all presets. It’s experimental and requires further tuning.

Why These Techniques Work

DPI Performance Constraints

DPI boxes must inspect millions of packets per second. They have nanosecond-scale budgets per packet.
Stateful reassembly is expensive:
  • Allocate memory per flow
  • Track sequence numbers
  • Buffer out-of-order packets
  • Implement timeouts
Most inline DPI systems avoid this complexity.
Buffering requires RAM. At 1 million concurrent connections:
  • 1KB per connection = 1GB RAM
  • 10KB per connection = 10GB RAM
DPI boxes can’t buffer significant amounts per flow.
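The time and memory figures in this section are simple arithmetic, which the following sketch makes explicit. The function names are hypothetical, and the model is deliberately crude: it treats the box as a single serial pipeline, whereas real hardware spreads work across many cores.

```rust
/// Average time available per packet, in nanoseconds, at a given rate.
fn per_packet_budget_ns(packets_per_sec: u64) -> u64 {
    1_000_000_000 / packets_per_sec.max(1)
}

/// Total buffer memory, in bytes, for `flows` concurrent connections.
fn buffer_bytes(flows: u64, bytes_per_flow: u64) -> u64 {
    flows.saturating_mul(bytes_per_flow)
}
```

Taking 10 million packets per second as an assumed example rate, `per_packet_budget_ns` gives 100 ns per packet, and `buffer_bytes(1_000_000, 1_000)` gives 1,000,000,000 bytes, matching the 1GB figure above.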
Censorship systems must minimize false positives (blocking legitimate traffic).
Conservative approach: If a packet can’t be conclusively identified, let it through.
Fragmentation creates ambiguity that triggers this conservatism.

Server Robustness

Legitimate servers are designed to handle:
  • Fragmented packets (TCP is a stream protocol)
  • Out-of-order delivery (TCP reordering)
  • Retransmissions (packet loss)
  • Variable packet sizes (MSS negotiation)
TLS libraries like OpenSSL parse ClientHello incrementally and handle split records transparently. HTTP parsers are line-buffered and don’t care about packet boundaries.
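The incremental behaviour on the server side can be illustrated with a small buffer that only yields a TLS record once all of its bytes have arrived. This is a simplified sketch, not how OpenSSL is implemented, and `RecordBuffer` is a hypothetical type.

```rust
/// Accumulates stream bytes and yields complete TLS records, however the
/// bytes were split across reads.
#[derive(Default)]
struct RecordBuffer {
    buf: Vec<u8>,
}

impl RecordBuffer {
    /// Append newly received bytes; return the next complete record, if any.
    fn push(&mut self, chunk: &[u8]) -> Option<Vec<u8>> {
        self.buf.extend_from_slice(chunk);
        if self.buf.len() < 5 {
            return None; // record header not complete yet
        }
        // Bytes 3-4 of the header carry the record body length.
        let body_len = u16::from_be_bytes([self.buf[3], self.buf[4]]) as usize;
        let total = 5 + body_len;
        if self.buf.len() < total {
            return None; // body not complete yet
        }
        // Keep any bytes after this record for the next call.
        let rest = self.buf.split_off(total);
        Some(std::mem::replace(&mut self.buf, rest))
    }
}
```

Feeding a record in pieces of 2, 5, and 2 bytes returns `None` twice and then the full record. The endpoint never depends on packet boundaries, which is the property fragmentation relies on.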

Defense Against Evasion

DPI vendors could counter these techniques:
Implement complete TCP reassembly for all flows.
Cost:
  • Massive memory requirements
  • Complex state management
  • Performance degradation
TurkeyDPI counter: Timing delays + resource exhaustion

Testing Bypass Effectiveness

The engine validates that fragmentation preserves data integrity:
// From engine/src/bypass.rs:333-339
// Critical: verify reassembly produces original data
let mut reassembled = Vec::new();
for frag in &result.fragments {
    reassembled.extend_from_slice(frag);
}
assert_eq!(reassembled, data);
All presets are tested to ensure:
  1. ✓ Fragmentation occurs (result.modified == true)
  2. ✓ Multiple fragments created (fragments.len() >= 2)
  3. ✓ Hostname extracted correctly
  4. ✓ Reassembly produces byte-identical data
