The Encoding APIs convert between strings and byte sequences. workerd implements the WHATWG Encoding Standard.

Overview

The Encoding APIs provide:
  • TextEncoder - Encode strings to UTF-8 bytes
  • TextDecoder - Decode bytes to strings with multiple encodings
  • TextEncoderStream - Streaming text encoding
  • TextDecoderStream - Streaming text decoding
  • Support for legacy encodings (ISO-8859, Windows-1252, etc.)
Implementation: src/workerd/api/encoding.h and encoding.c++

TextEncoder

Encode strings to UTF-8 bytes:
const encoder = new TextEncoder();

// Encode a string
const bytes = encoder.encode('Hello World');
console.log(bytes); // Uint8Array [72, 101, 108, 108, 111, ...]

// Encoding is always UTF-8
console.log(encoder.encoding); // "utf-8"
Definition: src/workerd/api/encoding.h:133
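Because the output is always UTF-8, characters outside ASCII expand to multiple bytes. A small sketch:

```javascript
const encoder = new TextEncoder();

// '€' (U+20AC) encodes to three UTF-8 bytes
const euro = encoder.encode('€');
console.log(euro); // Uint8Array [226, 130, 172]

// encode() with no argument returns an empty Uint8Array
console.log(encoder.encode().length); // 0
```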

encodeInto()

Encode directly into an existing buffer:
const encoder = new TextEncoder();
const buffer = new Uint8Array(100);

const result = encoder.encodeInto('Hello', buffer);
console.log(result.read);    // 5 (characters read)
console.log(result.written); // 5 (bytes written)
This is more efficient than encode() when you have a pre-allocated buffer. Source: src/workerd/api/encoding.h:147
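If the destination is too small, encodeInto() stops at the last whole code point that fits, so it never writes a partial character. A minimal sketch:

```javascript
const encoder = new TextEncoder();
const tiny = new Uint8Array(2);

// 'A' takes 1 byte, but '€' needs 3 more bytes, which the
// 2-byte buffer cannot hold, so encoding stops after 'A'.
const result = encoder.encodeInto('A€', tiny);
console.log(result.read);    // 1 (only 'A' was consumed)
console.log(result.written); // 1 (one byte written)
```

Checking `read` against the input length tells you whether the whole string was encoded.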

TextDecoder

Decode bytes to strings with support for multiple encodings:
const decoder = new TextDecoder('utf-8');

// Decode bytes
const bytes = new Uint8Array([72, 101, 108, 108, 111]);
const text = decoder.decode(bytes);
console.log(text); // "Hello"
Definition: src/workerd/api/encoding.h:59

Supported encodings

TextDecoder supports many encodings through ICU:
// UTF encodings
new TextDecoder('utf-8');
new TextDecoder('utf-16le');
new TextDecoder('utf-16be');

// Legacy single-byte encodings
new TextDecoder('iso-8859-1');   // Latin-1
new TextDecoder('iso-8859-2');   // Latin-2
new TextDecoder('windows-1252'); // Windows Western

// And many more...
Source: src/workerd/api/encoding.h:17

Constructor options

Customize decoder behavior:
const decoder = new TextDecoder('utf-8', {
  fatal: false,    // Don't throw on invalid sequences
  ignoreBOM: false // Don't ignore byte order mark
});

console.log(decoder.encoding);  // "utf-8"
console.log(decoder.fatal);     // false
console.log(decoder.ignoreBOM); // false
Source: src/workerd/api/encoding.h:63

Streaming decoding

Decode data across multiple chunks:
const decoder = new TextDecoder('utf-8');

// First chunk (incomplete character)
const text1 = decoder.decode(new Uint8Array([0xE2, 0x82]), {
  stream: true  // More data coming
});

// Second chunk (completes character)
const text2 = decoder.decode(new Uint8Array([0xAC]), {
  stream: false // Last chunk
});

console.log(text1 + text2); // "€"
Source: src/workerd/api/encoding.h:70
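The same pattern scales to any number of chunks, no matter where character boundaries fall. A sketch using in-memory chunks:

```javascript
const decoder = new TextDecoder('utf-8');

// '€' (E2 82 AC) is split across the two chunks
const chunks = [
  new Uint8Array([0x48, 0x69, 0x20, 0xE2]),
  new Uint8Array([0x82, 0xAC])
];

let text = '';
for (const chunk of chunks) {
  text += decoder.decode(chunk, { stream: true });
}
text += decoder.decode(); // final call flushes any buffered bytes

console.log(text); // "Hi €"
```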

TextEncoderStream

Encode text streams to byte streams:
const textStream = new ReadableStream({
  start(controller) {
    controller.enqueue('Hello ');
    controller.enqueue('World');
    controller.close();
  }
});

const byteStream = textStream.pipeThrough(
  new TextEncoderStream()
);

// byteStream now contains Uint8Array chunks
const reader = byteStream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(value); // Uint8Array
}
Implementation: src/workerd/api/streams/encoding.h and encoding.c++

TextDecoderStream

Decode byte streams to text streams:
const byteStream = new ReadableStream({
  start(controller) {
    controller.enqueue(new Uint8Array([72, 101, 108, 108, 111]));
    controller.enqueue(new Uint8Array([32, 87, 111, 114, 108, 100]));
    controller.close();
  }
});

const textStream = byteStream.pipeThrough(
  new TextDecoderStream('utf-8')
);

// textStream now contains string chunks
const reader = textStream.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(value); // string
}

Stream options

Configure the decoder stream:
const decoder = new TextDecoderStream('utf-8', {
  fatal: true,    // Throw on invalid sequences
  ignoreBOM: true // Ignore byte order mark
});
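With fatal: true, invalid bytes cause the readable side of the stream to error rather than emit U+FFFD. A sketch of that behavior (demoFatalStream is a hypothetical helper name, not part of the API):

```javascript
async function demoFatalStream() {
  const stream = new TextDecoderStream('utf-8', { fatal: true });
  const writer = stream.writable.getWriter();
  const reader = stream.readable.getReader();

  // 0xFF can never start a valid UTF-8 sequence, so the
  // transform errors immediately in fatal mode.
  writer.write(new Uint8Array([0xFF])).catch(() => {});

  try {
    await reader.read();
    return 'decoded';
  } catch {
    return 'errored';
  }
}
```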

Common patterns

Converting between strings and bytes

// String to bytes
const encoder = new TextEncoder();
const bytes = encoder.encode('Hello World');

// Bytes to string
const decoder = new TextDecoder();
const text = decoder.decode(bytes);

Base64 encoding

Combine with base64 for data URLs:
function stringToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  // Note: spreading a very large array into fromCharCode can
  // exceed the call stack limit; chunk the conversion for big inputs.
  return btoa(String.fromCharCode(...bytes));
}

function base64ToString(base64) {
  const bytes = Uint8Array.from(
    atob(base64),
    c => c.charCodeAt(0)
  );
  return new TextDecoder().decode(bytes);
}

const encoded = stringToBase64('Hello');
const decoded = base64ToString(encoded);

Reading text files

export default {
  async fetch(request, env) {
    const object = await env.BUCKET.get('file.txt');
    
    if (!object) {
      return new Response('Not found', { status: 404 });
    }
    
    // Decode as UTF-8 text
    const text = await object.text();
    
    // Or manually decode with a specific encoding:
    // const bytes = await object.arrayBuffer();
    // const text = new TextDecoder('iso-8859-1').decode(bytes);
    
    return new Response(text);
  }
};

Processing CSV with encoding

async function parseCSV(response) {
  // Detect encoding from Content-Type header
  const contentType = response.headers.get('Content-Type') || '';
  const charset = contentType.match(/charset=([^;]+)/)?.[1] || 'utf-8';
  
  // Decode with detected encoding
  const decoder = new TextDecoder(charset);
  const bytes = await response.arrayBuffer();
  const text = decoder.decode(bytes);
  
  // Parse CSV
  const lines = text.split('\n');
  return lines.map(line => line.split(','));
}

Streaming text transformation

export default {
  async fetch(request, env) {
    const response = await fetch('https://example.com/large.txt');
    
    // Transform: bytes -> text -> uppercase -> bytes
    const transformed = response.body
      .pipeThrough(new TextDecoderStream())
      .pipeThrough(new TransformStream({
        transform(chunk, controller) {
          controller.enqueue(chunk.toUpperCase());
        }
      }))
      .pipeThrough(new TextEncoderStream());
    
    return new Response(transformed);
  }
};

Handling invalid data

Non-fatal decoding

Replace invalid sequences with replacement character:
const decoder = new TextDecoder('utf-8', { fatal: false });
const invalid = new Uint8Array([0xFF, 0xFE, 0xFD]);
const text = decoder.decode(invalid);
console.log(text); // Contains � (U+FFFD REPLACEMENT CHARACTER)
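Because replacement-mode decoding never throws, a simple after-the-fact check can detect that input was lossy:

```javascript
const decoder = new TextDecoder('utf-8', { fatal: false });

// 0xFF is invalid UTF-8 and becomes a single U+FFFD
const text = decoder.decode(new Uint8Array([0x48, 0xFF, 0x69]));
console.log(text); // "H�i"

// Detect that the input was not valid UTF-8
if (text.includes('\uFFFD')) {
  console.log('Input contained invalid UTF-8');
}
```

Note this check cannot distinguish invalid input from input that legitimately contained U+FFFD; use fatal: true when that distinction matters.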

Fatal decoding

Throw on invalid sequences:
const decoder = new TextDecoder('utf-8', { fatal: true });
const invalid = new Uint8Array([0xFF, 0xFE, 0xFD]);

try {
  const text = decoder.decode(invalid);
} catch (error) {
  console.error('Invalid UTF-8 sequence');
}

Best practices

Create once and reuse for better performance:
// Good: reuse one encoder
const encoder = new TextEncoder();
const bytes1 = encoder.encode('text1');
const bytes2 = encoder.encode('text2');

// Avoid: creating a new instance for every call
const bytes3 = new TextEncoder().encode('text1');
const bytes4 = new TextEncoder().encode('text2');
Use encodeInto() when writing into a pre-allocated buffer:
const encoder = new TextEncoder();
const buffer = new Uint8Array(1024);

let offset = 0;
for (const chunk of chunks) {
  const result = encoder.encodeInto(
    chunk,
    buffer.subarray(offset)
  );
  offset += result.written;
  // If result.read < chunk.length, the buffer is full and the
  // rest of the chunk was not encoded; grow or flush before continuing.
}
Always specify the encoding when decoding:
// Good: explicit encoding
const decoder = new TextDecoder('utf-8');

// Avoid: relying on defaults
const decoder = new TextDecoder();
Stream large text data instead of loading it all:
// Good: streaming
const text = response.body
  .pipeThrough(new TextDecoderStream())
  .pipeThrough(processingStream);

// Avoid: loading everything
const bytes = await response.arrayBuffer();
const text = new TextDecoder().decode(bytes);

Implementation details

The Encoding APIs are implemented in:
  • src/workerd/api/encoding.h / .c++ - TextEncoder and TextDecoder (180 lines)
  • src/workerd/api/encoding-legacy.h / .c++ - Legacy encoding support
  • src/workerd/api/encoding-shared.h - Shared encoding utilities
  • src/workerd/api/streams/encoding.h / .c++ - TextEncoderStream and TextDecoderStream
TextEncoder always produces UTF-8 (per WHATWG spec):
class TextEncoder final: public jsg::Object {
  jsg::JsUint8Array encode(jsg::Lock& js, jsg::Optional<jsg::JsString> input);
  
  EncodeIntoResult encodeInto(
    jsg::Lock& js,
    jsg::JsString input,
    jsg::JsUint8Array buffer);
  
  kj::StringPtr getEncoding() {
    return "utf-8"; // Always UTF-8
  }
};
TextDecoder uses ICU (International Components for Unicode) for comprehensive encoding support:
class IcuDecoder final: public Decoder {
  static kj::Maybe<IcuDecoder> create(
    Encoding encoding,
    bool fatal,
    bool ignoreBom);
  
  kj::Maybe<jsg::JsString> decode(
    jsg::Lock& js,
    kj::ArrayPtr<const kj::byte> buffer,
    bool flush = false) override;
};
Source: src/workerd/api/encoding.h:20

Character encoding reference

Common encodings supported:
  • UTF-8 - Universal, variable-width (1-4 bytes)
  • UTF-16LE - 16-bit little-endian
  • UTF-16BE - 16-bit big-endian
  • ISO-8859-1 - Latin-1 (Western European)
  • ISO-8859-2 - Latin-2 (Central European)
  • Windows-1252 - Windows Western European
  • GBK - Simplified Chinese
  • Shift_JIS - Japanese
  • EUC-KR - Korean
For a complete list of supported encodings, refer to the ICU documentation.
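As a quick illustration of a legacy single-byte decode, windows-1252 maps bytes 0x93 and 0x94 to curly quotation marks (U+201C and U+201D):

```javascript
const decoder = new TextDecoder('windows-1252');
const bytes = new Uint8Array([0x93, 0x48, 0x69, 0x94]);

const text = decoder.decode(bytes);
console.log(text); // "Hi" wrapped in curly quotes: “Hi”
```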

Related documentation

  • Streams API - TextEncoderStream and TextDecoderStream
  • Crypto API - Often used with TextEncoder for hashing
  • Fetch API - Response.text() uses TextDecoder internally
