Streaming tar archive parsing for JavaScript. The tar-parser package handles POSIX/GNU/PAX archives incrementally so large tar files can be processed without buffering the full payload.

Installation

npm i remix

Features

  • Universal Runtime - Runs anywhere JavaScript runs (Node.js, Bun, Deno, browsers, edge)
  • Web Streams - Built on the standard Streams API
  • Format Support - Supports POSIX, GNU, and PAX tar formats
  • Memory Efficient - Processes entries as they stream in, without buffering the whole archive
  • Zero Dependencies - No external dependencies
  • Composable - Works with fetch() streams and compression streams

Basic Usage

Parsing a Tar Archive

The main parser interface is the parseTar(archive, handler) function:
import { parseTar } from 'remix/tar-parser'

let response = await fetch(
  'https://github.com/remix-run/remix/archive/refs/heads/main.tar.gz'
)

await parseTar(
  response.body.pipeThrough(new DecompressionStream('gzip')),
  (entry) => {
    console.log(entry.name, entry.size)
  }
)

API Reference

parseTar

Parses a tar archive stream and invokes a handler for each entry.
function parseTar(
  archive: ReadableStream<Uint8Array>,
  handler: (entry: TarEntry) => void | Promise<void>
): Promise<void>

function parseTar(
  archive: ReadableStream<Uint8Array>,
  options: ParseTarOptions,
  handler: (entry: TarEntry) => void | Promise<void>
): Promise<void>
archive
ReadableStream<Uint8Array>
required
The tar archive stream to parse.
options
ParseTarOptions
Optional parsing configuration.
handler
(entry: TarEntry) => void | Promise<void>
required
Function called for each entry in the archive.

ParseTarOptions

filenameEncoding
string
The character encoding for filenames in the archive.
Default: 'utf-8'
Common values: 'utf-8', 'latin1', 'ascii'
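
To see why the filename encoding matters, here is a minimal sketch using the standard TextDecoder API on its own, independent of tar-parser. It assumes the runtime's TextDecoder supports the 'latin1' label (browsers and Node.js builds with full ICU do); the byte values are made up for illustration.

```typescript
// Bytes for "café.txt" as written by a tool that used Latin-1 for filenames.
// 0xe9 is "é" in Latin-1 but is not a valid UTF-8 sequence on its own.
let rawName = new Uint8Array([0x63, 0x61, 0x66, 0xe9, 0x2e, 0x74, 0x78, 0x74])

let asUtf8 = new TextDecoder('utf-8').decode(rawName)
let asLatin1 = new TextDecoder('latin1').decode(rawName)

console.log(asUtf8) // the "é" byte is replaced with U+FFFD
console.log(asLatin1) // "café.txt"
```

If entry names come out with replacement characters, the archive was probably written with a non-UTF-8 encoding.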

TarEntry

Represents a single entry (file or directory) in a tar archive.

Properties

name
string
The name (path) of the entry.
size
number
The size of the entry data in bytes.
type
string
The entry type:
  • 'file' - Regular file
  • 'directory' - Directory
  • 'symlink' - Symbolic link
  • 'link' - Hard link
  • Other tar entry types
mode
number
Unix file mode (permission bits) as a number. It is conventionally displayed in octal, for example 0o644.
mtime
Date
Modification time of the entry.
uid
number
User ID of the owner.
gid
number
Group ID of the owner.
uname
string | undefined
Username of the owner (if available).
gname
string | undefined
Group name of the owner (if available).
linkname
string | undefined
Target path for symlinks and hard links.
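
Since mode is a plain number, it usually needs formatting before display. The following is a small sketch, assuming mode carries the standard Unix permission bits in its lowest nine bits; formatMode is a hypothetical helper, not part of tar-parser.

```typescript
// Format the permission bits of a Unix mode the way `ls -l` does.
// Only the lower 9 bits (owner/group/other rwx) are considered here.
function formatMode(mode: number): string {
  let flags = 'rwxrwxrwx'
  let out = ''
  for (let i = 0; i < 9; i++) {
    // Bit 8 is owner-read, bit 0 is other-execute
    let bit = 1 << (8 - i)
    out += mode & bit ? flags[i] : '-'
  }
  return out
}

console.log((0o644).toString(8)) // "644"
console.log(formatMode(0o644)) // "rw-r--r--"
console.log(formatMode(0o755)) // "rwxr-xr-x"
```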

Methods

arrayBuffer
() => Promise<ArrayBuffer>
Reads the entry data as an ArrayBuffer.
text
() => Promise<string>
Reads the entry data and decodes it as UTF-8 text.
bytes
() => Promise<Uint8Array>
Reads the entry data as a Uint8Array.
stream
() => ReadableStream<Uint8Array>
Returns a stream for reading the entry data.

TarParser

Low-level class for parsing tar archives with manual control.
class TarParser {
  constructor(options?: TarParserOptions)
  
  next(): Promise<TarEntry | null>
}
options
TarParserOptions
Parser configuration options.

Methods

next
() => Promise<TarEntry | null>
Returns the next entry in the archive, or null when the archive is complete.

parseTarHeader

Low-level function to parse a tar header from a buffer.
function parseTarHeader(
  buffer: Uint8Array,
  options?: ParseTarHeaderOptions
): TarHeader | null
buffer
Uint8Array
required
The header block data (512 bytes).
options
ParseTarHeaderOptions
Header parsing options.
returns
TarHeader | null
The parsed header, or null if the buffer is all zeros (end of archive).
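
For readers unfamiliar with the header block, here is a rough sketch of how fields in a 512-byte ustar header can be read. It is not the library's implementation: isZeroBlock, readString, and readOctal are hypothetical helpers, and only the field offsets (name at 0, mode at 100, size at 124) come from the POSIX ustar layout.

```typescript
const BLOCK_SIZE = 512

// True when every byte is zero, which marks the end of a tar archive
function isZeroBlock(block: Uint8Array): boolean {
  return block.every((byte) => byte === 0)
}

// Read a NUL-terminated string field
function readString(block: Uint8Array, offset: number, length: number): string {
  let field = block.subarray(offset, offset + length)
  let end = field.indexOf(0)
  return new TextDecoder().decode(end === -1 ? field : field.subarray(0, end))
}

// Numeric fields are stored as ASCII octal digits, padded with NULs or spaces
function readOctal(block: Uint8Array, offset: number, length: number): number {
  let text = readString(block, offset, length).trim()
  return text === '' ? 0 : parseInt(text, 8)
}

// Build a tiny header by hand to exercise the helpers
let header = new Uint8Array(BLOCK_SIZE)
let encoder = new TextEncoder()
header.set(encoder.encode('hello.txt'), 0) // name: offset 0, 100 bytes
header.set(encoder.encode('0000644\0'), 100) // mode: offset 100, 8 bytes
header.set(encoder.encode('00000000013\0'), 124) // size: offset 124, 12 bytes

console.log(readString(header, 0, 100)) // "hello.txt"
console.log(readOctal(header, 100, 8).toString(8)) // "644"
console.log(readOctal(header, 124, 12)) // 11
console.log(isZeroBlock(new Uint8Array(BLOCK_SIZE))) // true
```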

Examples

List Archive Contents

import { parseTar } from 'remix/tar-parser'

let response = await fetch('archive.tar')

await parseTar(response.body, (entry) => {
  console.log(`${entry.type.padEnd(10)} ${entry.size.toString().padStart(8)} ${entry.name}`)
})

Extract Specific Files

import { parseTar } from 'remix/tar-parser'
import * as fsp from 'node:fs/promises'
import * as path from 'node:path'

let response = await fetch('archive.tar')

await parseTar(response.body, async (entry) => {
  if (entry.type === 'file' && entry.name.endsWith('.txt')) {
    let filepath = path.join('./extracted', entry.name)
    await fsp.mkdir(path.dirname(filepath), { recursive: true })
    await fsp.writeFile(filepath, await entry.bytes())
    console.log(`Extracted: ${entry.name}`)
  }
})
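
When extracting to disk, entry names from untrusted archives may contain ".." segments or absolute paths. One way to guard against that is sketched below, using only node:path; resolveInside is a hypothetical helper, not something tar-parser provides.

```typescript
import * as path from 'node:path'

// Resolve an entry name inside `root`, or return null if it would escape it.
// Hypothetical helper, not part of tar-parser.
function resolveInside(root: string, entryName: string): string | null {
  let base = path.resolve(root)
  let target = path.resolve(base, entryName)
  let relative = path.relative(base, target)
  let escapes =
    relative === '..' || relative.startsWith('..' + path.sep) || path.isAbsolute(relative)
  return escapes ? null : target
}

console.log(resolveInside('./extracted', 'docs/readme.txt') !== null) // true
console.log(resolveInside('./extracted', '../../etc/passwd')) // null
console.log(resolveInside('./extracted', '/etc/passwd')) // null
```

In the handler above, the result could replace the path.join() call, skipping any entry for which it returns null.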

Parse Compressed Archive

import { parseTar } from 'remix/tar-parser'

// Gzip
let response = await fetch('archive.tar.gz')
await parseTar(
  response.body.pipeThrough(new DecompressionStream('gzip')),
  (entry) => {
    console.log(entry.name)
  }
)

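// Deflate (zlib format). DecompressionStream accepts 'gzip', 'deflate', and
// 'deflate-raw'; 'br' is not a valid format, and Brotli support varies by runtime
let response2 = await fetch('archive.tar.deflate')
await parseTar(
  response2.body.pipeThrough(new DecompressionStream('deflate')),
  (entry) => {
    console.log(entry.name)
  }
)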

Custom Filename Encoding

import { parseTar } from 'remix/tar-parser'

let response = await fetch('archive.tar')

await parseTar(
  response.body,
  { filenameEncoding: 'latin1' },
  (entry) => {
    console.log(entry.name)
  }
)

Read Entry Content

import { parseTar } from 'remix/tar-parser'

let response = await fetch('archive.tar')

await parseTar(response.body, async (entry) => {
  if (entry.name === 'README.md') {
    // Read as text
    let content = await entry.text()
    console.log(content)
  } else if (entry.name.endsWith('.json')) {
    // Parse JSON
    let json = JSON.parse(await entry.text())
    console.log(json)
  } else if (entry.name.endsWith('.jpg')) {
    // Get bytes
    let bytes = await entry.bytes()
    await saveImage(entry.name, bytes)
  }
})

Stream Entry Content

For large files, use streaming:
import { parseTar } from 'remix/tar-parser'
import * as fs from 'node:fs'
import * as fsp from 'node:fs/promises'
import * as path from 'node:path'
import { Writable } from 'node:stream'

let response = await fetch('archive.tar')

await parseTar(response.body, async (entry) => {
  if (entry.type === 'file') {
    // Make sure the destination directory exists
    let filepath = path.join('./output', entry.name)
    await fsp.mkdir(path.dirname(filepath), { recursive: true })

    // Pipe the entry straight to disk. pipeTo() respects backpressure and
    // resolves only after the file has been fully written.
    await entry.stream().pipeTo(Writable.toWeb(fs.createWriteStream(filepath)))
  }
})

Filter by Entry Type

import { parseTar } from 'remix/tar-parser'

let response = await fetch('archive.tar')

await parseTar(response.body, async (entry) => {
  switch (entry.type) {
    case 'file':
      console.log(`File: ${entry.name} (${entry.size} bytes)`)
      break
    case 'directory':
      console.log(`Directory: ${entry.name}/`)
      break
    case 'symlink':
      console.log(`Symlink: ${entry.name} -> ${entry.linkname}`)
      break
  }
})

Low-Level Parser

For manual control over parsing:
import { TarParser } from 'remix/tar-parser'

let response = await fetch('archive.tar')
let parser = new TarParser({ filenameEncoding: 'utf-8' })

while (true) {
  let entry = await parser.next()
  if (!entry) break
  
  console.log(`Processing: ${entry.name}`)
  
  // Must consume entry data before calling next()
  await entry.arrayBuffer()
}

Type Definitions

interface ParseTarOptions {
  filenameEncoding?: string
}

interface TarParserOptions {
  filenameEncoding?: string
}

interface ParseTarHeaderOptions {
  filenameEncoding?: string
}

interface TarHeader {
  name: string
  size: number
  type: string
  mode: number
  mtime: Date
  uid: number
  gid: number
  uname?: string
  gname?: string
  linkname?: string
}

class TarEntry {
  readonly name: string
  readonly size: number
  readonly type: string
  readonly mode: number
  readonly mtime: Date
  readonly uid: number
  readonly gid: number
  readonly uname: string | undefined
  readonly gname: string | undefined
  readonly linkname: string | undefined
  
  arrayBuffer(): Promise<ArrayBuffer>
  text(): Promise<string>
  bytes(): Promise<Uint8Array>
  stream(): ReadableStream<Uint8Array>
}

class TarParser {
  constructor(options?: TarParserOptions)
  next(): Promise<TarEntry | null>
}

class TarParseError extends Error {}

Entry Types

The entry.type property can be:
  • 'file' - Regular file
  • 'directory' - Directory
  • 'symlink' - Symbolic link
  • 'link' - Hard link
  • 'character-device' - Character device node
  • 'block-device' - Block device node
  • 'fifo' - FIFO (named pipe)
  • 'contiguous-file' - Contiguous file (rarely used)
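
These names correspond to the single-character typeflag stored at offset 156 of each header. The sketch below shows one possible mapping: the flag values come from the POSIX ustar format, but that tar-parser maps them to exactly these strings is an assumption based on the list above, and both typeFromFlag and the 'unknown' fallback are hypothetical.

```typescript
// Typeflag character to entry type name
const TYPE_BY_FLAG: Record<string, string> = {
  '0': 'file',
  '\0': 'file', // pre-POSIX archives use NUL for regular files
  '1': 'link',
  '2': 'symlink',
  '3': 'character-device',
  '4': 'block-device',
  '5': 'directory',
  '6': 'fifo',
  '7': 'contiguous-file',
}

// Look up a flag, falling back to a placeholder for anything unrecognized
function typeFromFlag(flag: string): string {
  return TYPE_BY_FLAG[flag] ?? 'unknown'
}

console.log(typeFromFlag('5')) // "directory"
console.log(typeFromFlag('2')) // "symlink"
```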

Performance

tar-parser performs on par with other popular tar libraries. In the benchmark below, the column header names the archive that was parsed:
Platform: Darwin (24.0.0)
Node.js v22.8.0
┌────────────┬────────────────────┐
│ (index)    │ lodash npm package │
├────────────┼────────────────────┤
│ tar-parser │ '6.23 ms ± 0.58'   │
│ tar-stream │ '6.72 ms ± 2.24'   │
│ node-tar   │ '6.49 ms ± 0.44'   │
└────────────┴────────────────────┘
Because entries are processed as a stream, memory usage stays low regardless of archive size.

Best Practices

Always Consume Entry Data

// ✅ Good: Consume entry data
await parseTar(stream, async (entry) => {
  await entry.arrayBuffer() // or text(), bytes(), stream()
})

// ❌ Bad: Not consuming entry data will break parsing
await parseTar(stream, (entry) => {
  console.log(entry.name) // Data not consumed!
})

Use Streaming for Large Entries

// ✅ Good: Stream large files
if (entry.size > 100_000_000) {
  // Stream instead of buffering
  let stream = entry.stream()
  // Process stream...
}

// ❌ Bad: Buffering very large files
let buffer = await entry.arrayBuffer() // May exhaust memory
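
As an illustration of the "Process stream..." step, here is a minimal sketch that consumes a ReadableStream<Uint8Array> chunk by chunk. countLines and demoStream are hypothetical helpers; the in-memory stream stands in for what entry.stream() would return.

```typescript
// Count newline bytes in a stream one chunk at a time, never holding more
// than a single chunk in memory
async function countLines(stream: ReadableStream<Uint8Array>): Promise<number> {
  let reader = stream.getReader()
  let lines = 0
  while (true) {
    let result = await reader.read()
    if (result.done) break
    let chunk = result.value
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === 0x0a) lines++
    }
  }
  return lines
}

// Stand-in for entry.stream(): a small in-memory stream split across chunks
function demoStream(): ReadableStream<Uint8Array> {
  let encoder = new TextEncoder()
  return new ReadableStream<Uint8Array>({
    start(controller) {
      controller.enqueue(encoder.encode('first\nsec'))
      controller.enqueue(encoder.encode('ond\nthird\n'))
      controller.close()
    },
  })
}

countLines(demoStream()).then((lines) => console.log(lines)) // 3
```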

Handle Compressed Archives

// ✅ Good: Decompress before parsing
let stream = response.body.pipeThrough(new DecompressionStream('gzip'))
await parseTar(stream, handler)

// ❌ Bad: Parsing compressed data directly
await parseTar(response.body, handler) // Will fail
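
When it is not known in advance whether an archive is gzipped, the first two bytes can be checked before choosing a pipeline. This is a sketch only; looksLikeGzip is a hypothetical helper and the sample bytes are made up.

```typescript
// Gzip streams always begin with the two magic bytes 0x1f 0x8b
function looksLikeGzip(firstBytes: Uint8Array): boolean {
  return firstBytes.length >= 2 && firstBytes[0] === 0x1f && firstBytes[1] === 0x8b
}

// An uncompressed tar archive starts with the first entry's name instead
let plainTar = new TextEncoder().encode('package/index.js')
let gzipped = new Uint8Array([0x1f, 0x8b, 0x08, 0x00])

console.log(looksLikeGzip(plainTar)) // false
console.log(looksLikeGzip(gzipped)) // true
```

With a Blob or File, the first two bytes can be read via blob.slice(0, 2) before deciding whether to pipe the stream through DecompressionStream.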

Error Handling

import { TarParseError, parseTar } from 'remix/tar-parser'

try {
  await parseTar(stream, (entry) => {
    console.log(entry.name)
  })
} catch (error) {
  if (error instanceof TarParseError) {
    console.error('Invalid tar archive:', error.message)
  } else {
    console.error('Unexpected error:', error)
  }
}
