Streaming tar archive parsing for JavaScript. The tar-parser package handles POSIX/GNU/PAX archives incrementally so large tar files can be processed without buffering the full payload.
Installation
Features
- Universal Runtime - Runs anywhere JavaScript runs (Node.js, Bun, Deno, browsers, edge)
- Web Streams - Built on the standard Streams API
- Format Support - Supports POSIX, GNU, and PAX tar formats
- Memory Efficient - Streams entries incrementally instead of buffering the archive in memory
- Zero Dependencies - No external dependencies
- Composable - Works with fetch() streams and compression streams
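A standalone sketch of that composability (no tar-parser involved): the same pipeThrough pattern chains a ReadableStream through the standard compression transforms, which is exactly how a compressed archive gets fed to the parser.

```javascript
// Gzip a payload and decompress it again, purely with Web Streams APIs
// (Blob, CompressionStream, DecompressionStream, Response are all standard).
let source = new Blob(['hello tar']).stream()
let gzipped = source.pipeThrough(new CompressionStream('gzip'))
let restored = gzipped.pipeThrough(new DecompressionStream('gzip'))

let text = await new Response(restored).text()
console.log(text) // 'hello tar'
```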
Basic Usage
Parsing a Tar Archive
The main parser interface is the parseTar(archive, handler) function:
import { parseTar } from 'remix/tar-parser'
let response = await fetch(
  'https://github.com/remix-run/remix/archive/refs/heads/main.tar.gz'
)

await parseTar(
  response.body.pipeThrough(new DecompressionStream('gzip')),
  (entry) => {
    console.log(entry.name, entry.size)
  }
)
API Reference
parseTar
Parses a tar archive stream and invokes a handler for each entry.
function parseTar(
  archive: ReadableStream<Uint8Array>,
  handler: (entry: TarEntry) => void | Promise<void>
): Promise<void>

function parseTar(
  archive: ReadableStream<Uint8Array>,
  options: ParseTarOptions,
  handler: (entry: TarEntry) => void | Promise<void>
): Promise<void>
archive
ReadableStream<Uint8Array>
required
The tar archive stream to parse.
options
ParseTarOptions
optional
Optional parsing configuration.
handler
(entry: TarEntry) => void | Promise<void>
required
Function called for each entry in the archive.
ParseTarOptions
filenameEncoding
string
optional
The character encoding for filenames in the archive. Default: 'utf-8'. Common values: 'utf-8', 'latin1', 'ascii'.
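To see why the encoding matters, here is a standalone sketch (not the parser's internals): the same filename bytes decode to different strings under 'latin1' and 'utf-8'.

```javascript
// The byte 0xE9 is 'é' in latin1 but an invalid sequence in UTF-8,
// where it decodes to the replacement character U+FFFD.
let nameBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9, 0x2e, 0x74, 0x78, 0x74])

let asLatin1 = new TextDecoder('latin1').decode(nameBytes) // 'café.txt'
let asUtf8 = new TextDecoder('utf-8').decode(nameBytes) // 'caf\uFFFD.txt'

console.log(asLatin1, asUtf8)
```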
TarEntry
Represents a single entry (file or directory) in a tar archive.
Properties
name
The name (path) of the entry.
size
The size of the entry data in bytes.
type
The entry type:
'file' - Regular file
'directory' - Directory
'symlink' - Symbolic link
'link' - Hard link
See Entry Types below for the full list of other tar entry types.
mode
Unix file mode (permissions) as an octal number.
mtime
Modification time of the entry.
uid
Numeric user ID of the owner.
gid
Numeric group ID of the owner.
uname
Username of the owner (if available).
gname
Group name of the owner (if available).
linkname
Target path for symlinks and hard links.
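Since mode is a plain number, displaying it in the familiar Unix permission form takes an octal conversion; a quick sketch (formatMode is a hypothetical helper, not part of the package):

```javascript
// entry.mode is numeric; mask the permission bits and print them in octal.
function formatMode(mode) {
  return (mode & 0o7777).toString(8).padStart(4, '0')
}

console.log(formatMode(0o644)) // '0644'
console.log(formatMode(0o100755)) // '0755' – file-type bits are masked off
```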
Methods
arrayBuffer
() => Promise<ArrayBuffer>
Reads the entry data as an ArrayBuffer.
text
() => Promise<string>
Reads the entry data and decodes it as UTF-8 text.
bytes
() => Promise<Uint8Array>
Reads the entry data as a Uint8Array.
stream
() => ReadableStream<Uint8Array>
Returns a stream for reading the entry data.
TarParser
Low-level class for parsing tar archives with manual control.
class TarParser {
  constructor(options?: TarParserOptions)
  next(): Promise<TarEntry | null>
}
options
TarParserOptions
optional
Parser configuration options.
Methods
next
() => Promise<TarEntry | null>
Returns the next entry in the archive, or null when the archive is complete.
parseTarHeader
Low-level function to parse a tar header from a buffer.
function parseTarHeader(
  buffer: Uint8Array,
  options?: ParseTarHeaderOptions
): TarHeader | null
buffer
Uint8Array
required
The header block data (512 bytes).
Returns the parsed header, or null if the buffer is all zeros (end of archive).
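To make the header layout concrete, here is a standalone sketch of the ustar field reads involved: fixed offsets, NUL-padded strings, and numbers stored as octal ASCII. This is illustrative only, not the library's implementation.

```javascript
// Read a NUL-terminated string field out of a 512-byte header block.
function readField(block, offset, length) {
  let bytes = block.subarray(offset, offset + length)
  let end = bytes.indexOf(0)
  return new TextDecoder('utf-8').decode(end === -1 ? bytes : bytes.subarray(0, end))
}

// Numeric fields (mode, size, mtime, ...) are octal ASCII.
function readOctal(block, offset, length) {
  return parseInt(readField(block, offset, length).trim() || '0', 8)
}

function sketchParseHeader(block) {
  if (block.every((b) => b === 0)) return null // all-zero block ends the archive
  return {
    name: readField(block, 0, 100),
    mode: readOctal(block, 100, 8),
    size: readOctal(block, 124, 12),
    typeflag: String.fromCharCode(block[156]),
  }
}

// Build a fake header block for "hello.txt", 11 bytes, regular file.
let block = new Uint8Array(512)
new TextEncoder().encodeInto('hello.txt', block)
new TextEncoder().encodeInto('0000644', block.subarray(100))
new TextEncoder().encodeInto('00000000013', block.subarray(124)) // 11 in octal
block[156] = 0x30 // '0' = regular file

let header = sketchParseHeader(block)
console.log(header.name, header.size) // hello.txt 11
```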
Examples
List Archive Contents
import { parseTar } from 'remix/tar-parser'
let response = await fetch('archive.tar')
await parseTar(response.body, (entry) => {
  console.log(`${entry.type.padEnd(10)} ${entry.size.toString().padStart(8)} ${entry.name}`)
})
Extract Files
import { parseTar } from 'remix/tar-parser'
import * as fsp from 'node:fs/promises'
import * as path from 'node:path'

let response = await fetch('archive.tar')

await parseTar(response.body, async (entry) => {
  if (entry.type === 'file' && entry.name.endsWith('.txt')) {
    let filepath = path.join('./extracted', entry.name)
    await fsp.mkdir(path.dirname(filepath), { recursive: true })
    await fsp.writeFile(filepath, await entry.bytes())
    console.log(`Extracted: ${entry.name}`)
  }
})
Parse Compressed Archive
import { parseTar } from 'remix/tar-parser'

// Gzip
let response = await fetch('archive.tar.gz')

await parseTar(
  response.body.pipeThrough(new DecompressionStream('gzip')),
  (entry) => {
    console.log(entry.name)
  }
)

// Brotli
let response2 = await fetch('archive.tar.br')

await parseTar(
  response2.body.pipeThrough(new DecompressionStream('br')),
  (entry) => {
    console.log(entry.name)
  }
)
Custom Filename Encoding
import { parseTar } from 'remix/tar-parser'

let response = await fetch('archive.tar')

await parseTar(
  response.body,
  { filenameEncoding: 'latin1' },
  (entry) => {
    console.log(entry.name)
  }
)
Read Entry Content
import { parseTar } from 'remix/tar-parser'

let response = await fetch('archive.tar')

await parseTar(response.body, async (entry) => {
  if (entry.name === 'README.md') {
    // Read as text
    let content = await entry.text()
    console.log(content)
  } else if (entry.name.endsWith('.json')) {
    // Parse JSON
    let json = JSON.parse(await entry.text())
    console.log(json)
  } else if (entry.name.endsWith('.jpg')) {
    // Get bytes
    let bytes = await entry.bytes()
    await saveImage(entry.name, bytes)
  }
})
Stream Entry Content
For large files, use streaming:
import { parseTar } from 'remix/tar-parser'
import * as fs from 'node:fs'
let response = await fetch('archive.tar')
await parseTar(response.body, async (entry) => {
  if (entry.type === 'file') {
    // Stream directly to file
    let writeStream = fs.createWriteStream(`./output/${entry.name}`)
    let reader = entry.stream().getReader()

    while (true) {
      let { done, value } = await reader.read()
      if (done) break
      writeStream.write(value)
    }

    writeStream.end()
  }
})
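In Node.js, a manual read loop like the one above can also be replaced with node:stream's pipeline, which handles backpressure and waits for the write to finish. A standalone sketch, using an in-memory Blob where entry.stream() would go (the './entry-output.txt' path is just for illustration):

```javascript
import { Readable } from 'node:stream'
import { pipeline } from 'node:stream/promises'
import * as fs from 'node:fs'

// Any web ReadableStream<Uint8Array> works here; with tar-parser this
// would be entry.stream().
let webStream = new Blob(['entry data']).stream()

// pipeline propagates backpressure and errors, and resolves on completion.
await pipeline(Readable.fromWeb(webStream), fs.createWriteStream('./entry-output.txt'))

let written = fs.readFileSync('./entry-output.txt', 'utf8')
console.log(written) // 'entry data'
```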
Filter by Entry Type
import { parseTar } from 'remix/tar-parser'
let response = await fetch('archive.tar')
await parseTar(response.body, async (entry) => {
  switch (entry.type) {
    case 'file':
      console.log(`File: ${entry.name} (${entry.size} bytes)`)
      break
    case 'directory':
      console.log(`Directory: ${entry.name}/`)
      break
    case 'symlink':
      console.log(`Symlink: ${entry.name} -> ${entry.linkname}`)
      break
  }
})
Low-Level Parser
For manual control over parsing:
import { TarParser } from 'remix/tar-parser'
let response = await fetch('archive.tar')
let parser = new TarParser({ filenameEncoding: 'utf-8' })
while (true) {
  let entry = await parser.next()
  if (!entry) break

  console.log(`Processing: ${entry.name}`)

  // Must consume entry data before calling next()
  await entry.arrayBuffer()
}
Type Definitions
interface ParseTarOptions {
  filenameEncoding?: string
}

interface TarParserOptions {
  filenameEncoding?: string
}

interface ParseTarHeaderOptions {
  filenameEncoding?: string
}

interface TarHeader {
  name: string
  size: number
  type: string
  mode: number
  mtime: Date
  uid: number
  gid: number
  uname?: string
  gname?: string
  linkname?: string
}

class TarEntry {
  readonly name: string
  readonly size: number
  readonly type: string
  readonly mode: number
  readonly mtime: Date
  readonly uid: number
  readonly gid: number
  readonly uname: string | undefined
  readonly gname: string | undefined
  readonly linkname: string | undefined

  arrayBuffer(): Promise<ArrayBuffer>
  text(): Promise<string>
  bytes(): Promise<Uint8Array>
  stream(): ReadableStream<Uint8Array>
}

class TarParser {
  constructor(options?: TarParserOptions)
  next(): Promise<TarEntry | null>
}

class TarParseError extends Error {}
Entry Types
The entry.type property can be:
'file' - Regular file
'directory' - Directory
'symlink' - Symbolic link
'link' - Hard link
'character-device' - Character device node
'block-device' - Block device node
'fifo' - FIFO (named pipe)
'contiguous-file' - Contiguous file (rarely used)
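For reference, these names line up with the typeflag byte in the ustar header; a sketch of the mapping (per the POSIX ustar format — the parser's internal table may differ):

```javascript
// Map a raw typeflag byte from a ustar header to the entry type names above.
const TYPEFLAG_NAMES = {
  '\0': 'file', // pre-POSIX tars wrote NUL for regular files
  '0': 'file',
  '1': 'link',
  '2': 'symlink',
  '3': 'character-device',
  '4': 'block-device',
  '5': 'directory',
  '6': 'fifo',
  '7': 'contiguous-file',
}

function entryTypeFromFlag(byte) {
  return TYPEFLAG_NAMES[String.fromCharCode(byte)] ?? 'unknown'
}

console.log(entryTypeFromFlag(0x30)) // 'file'
console.log(entryTypeFromFlag(0x35)) // 'directory'
```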
Performance
tar-parser performs on par with other popular tar libraries (times to parse the lodash npm package tarball):
Platform: Darwin (24.0.0)
Node.js v22.8.0
┌────────────┬────────────────────┐
│ (index) │ lodash npm package │
├────────────┼────────────────────┤
│ tar-parser │ '6.23 ms ± 0.58' │
│ tar-stream │ '6.72 ms ± 2.24' │
│ node-tar │ '6.49 ms ± 0.44' │
└────────────┴────────────────────┘
The streaming design ensures minimal memory usage regardless of archive size.
Best Practices
Always Consume Entry Data
// ✅ Good: Consume entry data
await parseTar(stream, async (entry) => {
  await entry.arrayBuffer() // or text(), bytes(), stream()
})

// ❌ Bad: Not consuming entry data will break parsing
await parseTar(stream, (entry) => {
  console.log(entry.name) // Data not consumed!
})
Use Streaming for Large Entries
// ✅ Good: Stream large files
if (entry.size > 100_000_000) {
  // Stream instead of buffering
  let stream = entry.stream()
  // Process stream...
}

// ❌ Bad: Buffering very large files
let buffer = await entry.arrayBuffer() // May exhaust memory
Handle Compressed Archives
// ✅ Good: Decompress before parsing
let stream = response.body.pipeThrough(new DecompressionStream('gzip'))
await parseTar(stream, handler)

// ❌ Bad: Parsing compressed data directly
await parseTar(response.body, handler) // Will fail
Error Handling
import { TarParseError, parseTar } from 'remix/tar-parser'
try {
  await parseTar(stream, (entry) => {
    console.log(entry.name)
  })
} catch (error) {
  if (error instanceof TarParseError) {
    console.error('Invalid tar archive:', error.message)
  } else {
    console.error('Unexpected error:', error)
  }
}