The Encoding APIs convert between strings and byte sequences. workerd implements the WHATWG Encoding Standard.

Overview

The Encoding APIs provide:
  • TextEncoder - Encode strings to UTF-8 bytes
  • TextDecoder - Decode bytes to strings with multiple encodings
  • TextEncoderStream - Streaming text encoding
  • TextDecoderStream - Streaming text decoding
  • Support for legacy encodings (ISO-8859, Windows-1252, etc.)
Implementation: src/workerd/api/encoding.h and encoding.c++

TextEncoder

Encode strings to UTF-8 bytes:
const encoder = new TextEncoder();

// Encode a string
const bytes = encoder.encode('Hello World');
console.log(bytes); // Uint8Array [72, 101, 108, 108, 111, ...]

// Encoding is always UTF-8
console.log(encoder.encoding); // "utf-8"
Definition: src/workerd/api/encoding.h:133
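Because the output is always UTF-8, characters outside ASCII expand to multiple bytes. A small sketch:

```javascript
const encoder = new TextEncoder();

// '€' (U+20AC) encodes to three UTF-8 bytes
const euro = encoder.encode('€');
console.log(euro); // Uint8Array [226, 130, 172]

// encode() with no argument returns an empty Uint8Array
console.log(encoder.encode().length); // 0
```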

encodeInto()

Encode directly into an existing buffer:
const encoder = new TextEncoder();
const buffer = new Uint8Array(100);

const result = encoder.encodeInto('Hello', buffer);
console.log(result.read);    // 5 (characters read)
console.log(result.written); // 5 (bytes written)
This is more efficient than encode() when you have a pre-allocated buffer. Source: src/workerd/api/encoding.h:147
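If the destination is too small, encodeInto() stops at the last whole code point that fits, so it never writes a partial character. A minimal sketch:

```javascript
const encoder = new TextEncoder();
const tiny = new Uint8Array(2);

// 'A' takes 1 byte, but '€' needs 3 more bytes, which the
// 2-byte buffer cannot hold, so encoding stops after 'A'.
const result = encoder.encodeInto('A€', tiny);
console.log(result.read);    // 1 (only 'A' was consumed)
console.log(result.written); // 1 (one byte written)
```

Checking `read` against the input length tells you whether the whole string was encoded.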

TextDecoder

Decode bytes to strings with support for multiple encodings:
const decoder = new TextDecoder('utf-8');

// Decode bytes
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decoder.decode(bytes);
console.log(text); // "Hello"
Definition: src/workerd/api/encoding.h:59

Supported encodings

TextDecoder supports many encodings through ICU:
// UTF encodings
new TextDecoder('utf-8');
new TextDecoder('utf-16le');
new TextDecoder('utf-16be');

// Legacy single-byte encodings
new TextDecoder('iso-8859-1');   // Latin-1
new TextDecoder('iso-8859-2');   // Latin-2
new TextDecoder('windows-1252'); // Windows Western

// And many more...
Source: src/workerd/api/encoding.h:17

Constructor options

Customize decoder behavior:
const decoder = new TextDecoder('utf-8', {
  fatal: false,    // Don't throw on invalid sequences
  ignoreBOM: false // Don't ignore byte order mark
});

console.log(decoder.encoding);  // "utf-8"
console.log(decoder.fatal);     // false
console.log(decoder.ignoreBOM); // false
Source: src/workerd/api/encoding.h:63

Streaming decoding

Decode data across multiple chunks:
const decoder = new TextDecoder('utf-8');

// First chunk (incomplete character)
const text1 = decoder.decode(new Uint8Array([0xE2, 0x82]), {
  stream: true  // More data coming
});

// Second chunk (completes character)
const text2 = decoder.decode(new Uint8Array([0xAC]), {
  stream: false // Last chunk
});

console.log(text1 + text2); // "€"
Source: src/workerd/api/encoding.h:70
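The same pattern scales to any number of chunks, no matter where character boundaries fall. A sketch using in-memory chunks:

```javascript
const decoder = new TextDecoder('utf-8');

// '€' (E2 82 AC) is split across the two chunks
const chunks = [
  new Uint8Array([0x48, 0x69, 0x20, 0xE2]),
  new Uint8Array([0x82, 0xAC])
];

let text = '';
for (const chunk of chunks) {
  text += decoder.decode(chunk, { stream: true });
}
text += decoder.decode(); // final call flushes any buffered bytes

console.log(text); // "Hi €"
```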

TextEncoderStream

Encode text streams to byte streams:
const textStream = new ReadableStream({
  start(controller) {
    controller.enqueue('Hello ');
    controller.enqueue('World');
    controller.close();
  }
});

const byteStream = textStream.pipeThrough(
  new TextEncoderStream()
);

// byteStream now contains Uint8Array chunks
const reader = byteStream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(value); // Uint8Array
}
Implementation: src/workerd/api/streams/encoding.h and encoding.c++

TextDecoderStream

Decode byte streams to text streams:
const byteStream = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([72, 101, 108, 108, 111]));
    controller.enqueue(new Uint8Array([32, 87, 111, 114, 108, 100]));
    controller.close();
  }
});

const textStream = byteStream.pipeThrough(
  new TextDecoderStream('utf-8')
);

// textStream now contains string chunks
const reader = textStream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(value); // string
}

Stream options

Configure the decoder stream:
const decoder = new TextDecoderStream('utf-8', {
  fatal: true,    // Throw on invalid sequences
  ignoreBOM: true // Ignore byte order mark
});
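With fatal: true, invalid bytes cause the readable side of the stream to error rather than emit U+FFFD. A sketch of that behavior (demoFatalStream is a hypothetical helper name, not part of the API):

```javascript
async function demoFatalStream() {
  const stream = new TextDecoderStream('utf-8', { fatal: true });
  const writer = stream.writable.getWriter();
  const reader = stream.readable.getReader();

  // 0xFF can never start a valid UTF-8 sequence, so the
  // transform errors immediately in fatal mode.
  writer.write(new Uint8Array([0xFF])).catch(() => {});

  try {
    await reader.read();
    return 'decoded';
  } catch {
    return 'errored';
  }
}
```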

Common patterns

Converting between strings and bytes

// String to bytes
const encoder = new TextEncoder();
const bytes = encoder.encode('Hello World');

// Bytes to string
const decoder = new TextDecoder();
const text = decoder.decode(bytes);

Base64 encoding

Combine with base64 for data URLs:
function stringToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  // Note: spreading a very large array into fromCharCode can
  // exceed the call stack limit; chunk the conversion for big inputs.
  return btoa(String.fromCharCode(...bytes));
}

function base64ToString(base64) {
  const bytes = Uint8Array.from(
    atob(base64),
    c => c.charCodeAt(0)
  );
  return new TextDecoder().decode(bytes);
}

const encoded = stringToBase64('Hello');
const decoded = base64ToString(encoded);

Reading text files

export default {
  async fetch(request, env) {
    const object = await env.BUCKET.get('file.txt');
    
    if (!object) {
      return new Response('Not found', { status: 404 });
    }
    
    // Decode as UTF-8 text
    const text = await object.text();
    
    // Or manually decode with a specific encoding:
    // const bytes = await object.arrayBuffer();
    // const text = new TextDecoder('iso-8859-1').decode(bytes);
    
    return new Response(text);
  }
};

Processing CSV with encoding

async function parseCSV(response) {
  // Detect encoding from Content-Type header
  const contentType = response.headers.get('Content-Type') || '';
  const charset = contentType.match(/charset=([^;]+)/)?.[1] || 'utf-8';
  
  // Decode with detected encoding
  const decoder = new TextDecoder(charset);
  const bytes = await response.arrayBuffer();
  const text = decoder.decode(bytes);
  
  // Parse CSV
  const lines = text.split('\n');
  return lines.map(line => line.split(','));
}

Streaming text transformation

export default {
  async fetch(request, env) {
    const response = await fetch('https://example.com/large.txt');
    
    // Transform: bytes -> text -> uppercase -> bytes
    const transformed = response.body
      .pipeThrough(new TextDecoderStream())
      .pipeThrough(new TransformStream({
        transform(chunk, controller) {
          controller.enqueue(chunk.toUpperCase());
        }
      }))
      .pipeThrough(new TextEncoderStream());
    
    return new Response(transformed);
  }
};

Handling invalid data

Non-fatal decoding

Replace invalid sequences with replacement character:
const decoder = new TextDecoder('utf-8', { fatal: false });
const invalid = new Uint8Array([0xFF, 0xFE, 0xFD]);
const text = decoder.decode(invalid);
console.log(text); // Contains � (U+FFFD REPLACEMENT CHARACTER)
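Because replacement-mode decoding never throws, a simple after-the-fact check can detect that input was lossy:

```javascript
const decoder = new TextDecoder('utf-8', { fatal: false });

// 0xFF is invalid UTF-8 and becomes a single U+FFFD
const text = decoder.decode(new Uint8Array([0x48, 0xFF, 0x69]));
console.log(text); // "H�i"

// Detect that the input was not valid UTF-8
if (text.includes('\uFFFD')) {
  console.log('Input contained invalid UTF-8');
}
```

Note this check cannot distinguish invalid input from input that legitimately contained U+FFFD; use fatal: true when that distinction matters.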

Fatal decoding

Throw on invalid sequences:
const decoder = new TextDecoder('utf-8', { fatal: true });
const invalid = new Uint8Array([0xFF, 0xFE, 0xFD]);

try {
  const text = decoder.decode(invalid);
} catch (error) {
  console.error('Invalid UTF-8 sequence');
}

Best practices

Create once and reuse for better performance:
// Good: reuse one encoder
const encoder = new TextEncoder();
const bytes1 = encoder.encode('text1');
const bytes2 = encoder.encode('text2');

// Avoid: creating a new instance for every call
const bytes3 = new TextEncoder().encode('text1');
const bytes4 = new TextEncoder().encode('text2');
Use encodeInto() when writing into a pre-allocated buffer:
const encoder = new TextEncoder();
const buffer = new Uint8Array(1024);

let offset = 0;
for (const chunk of chunks) {
  const result = encoder.encodeInto(
    chunk,
    buffer.subarray(offset)
  );
  offset += result.written;
  // If result.read < chunk.length, the buffer is full and the
  // rest of the chunk was not encoded; grow or flush before continuing.
}
Always specify the encoding when decoding:
// Good: explicit encoding
const decoder = new TextDecoder('utf-8');

// Avoid: relying on defaults
const decoder = new TextDecoder();
Stream large text data instead of loading it all:
// Good: streaming
const text = response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(processingStream);

// Avoid: loading everything
const bytes = await response.arrayBuffer();
const text = new TextDecoder().decode(bytes);

Implementation details

The Encoding APIs are implemented in:
  • src/workerd/api/encoding.h / .c++ - TextEncoder and TextDecoder (180 lines)
  • src/workerd/api/encoding-legacy.h / .c++ - Legacy encoding support
  • src/workerd/api/encoding-shared.h - Shared encoding utilities
  • src/workerd/api/streams/encoding.h / .c++ - TextEncoderStream and TextDecoderStream
TextEncoder always produces UTF-8 (per WHATWG spec):
class TextEncoder final: public jsg::Object {
  jsg::JsUint8Array encode(jsg::Lock& js, jsg::Optional<jsg::JsString> input);
  
  EncodeIntoResult encodeInto(
    jsg::Lock& js,
    jsg::JsString input,
    jsg::JsUint8Array buffer);
  
  kj::StringPtr getEncoding() {
    return "utf-8"; // Always UTF-8
  }
};
TextDecoder uses ICU (International Components for Unicode) for comprehensive encoding support:
class IcuDecoder final: public Decoder {
  static kj::Maybe<IcuDecoder> create(
    Encoding encoding,
    bool fatal,
    bool ignoreBom);
  
  kj::Maybe<jsg::JsString> decode(
    jsg::Lock& js,
    kj::ArrayPtr<const kj::byte> buffer,
    bool flush = false) override;
};
Source: src/workerd/api/encoding.h:20

Character encoding reference

Common encodings supported:
  • UTF-8 - Universal, variable-width (1-4 bytes)
  • UTF-16LE - 16-bit little-endian
  • UTF-16BE - 16-bit big-endian
  • ISO-8859-1 - Latin-1 (Western European)
  • ISO-8859-2 - Latin-2 (Central European)
  • Windows-1252 - Windows Western European
  • GBK - Simplified Chinese
  • Shift_JIS - Japanese
  • EUC-KR - Korean
For a complete list of supported encodings, refer to the ICU documentation.
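As a quick illustration of a legacy single-byte decode, windows-1252 maps bytes 0x93 and 0x94 to curly quotation marks (U+201C and U+201D):

```javascript
const decoder = new TextDecoder('windows-1252');
const bytes = new Uint8Array([0x93, 0x48, 0x69, 0x94]);

const text = decoder.decode(bytes);
console.log(text); // "Hi" wrapped in curly quotes: “Hi”
```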

Related documentation

  • Streams API - TextEncoderStream and TextDecoderStream
  • Crypto API - Often used with TextEncoder for hashing
  • Fetch API - Response.text() uses TextDecoder internally
