Batch Conversion - TOON JSON Converter

Overview

The TOON JSON Converter includes built-in batch conversion capabilities for processing multiple files efficiently. This guide covers JSONL splitting, folder aggregation, and advanced batch processing techniques.

Built-In Batch Modes

The converter provides two native batch conversion modes:

JSONL → TOON Folder: split a JSONL file into multiple TOON files

TOON Folder → JSONL: aggregate multiple TOON files into one JSONL file

JSONL to TOON Folder

Basic Usage

Convert each line in a JSONL file to a separate TOON file:

python toon_json_converter.py dataset.jsonl output_folder/

File Naming Convention

Generated files follow this pattern: {base_name}_{index:04d}.toon

Example:

python toon_json_converter.py logs.jsonl logs_toons/

Creates:

logs_toons/
  logs_0000.toon
  logs_0001.toon
  logs_0002.toon
  ...

Source reference: toon_json_converter.py:1113

output_file = os.path.join(output_dir, f"{base_name}_{i:04d}.toon")
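The pattern can be reproduced outside the converter, for example to predict which file a given record will land in. The following is a minimal sketch, assuming the base name is the input file name without its extension (as the logs.jsonl example suggests). The helper name toon_output_path is hypothetical and not part of the converter.

```python
import os

def toon_output_path(input_path: str, output_dir: str, index: int) -> str:
    """Build the expected output path for record `index` (hypothetical helper)."""
    # Assumption: base_name is the input file name without its extension.
    base_name = os.path.splitext(os.path.basename(input_path))[0]
    # :04d pads to at least four digits; larger indexes simply get wider.
    return os.path.join(output_dir, f"{base_name}_{index:04d}.toon")
```

Because 04d is a minimum width, an index of 10000 or more yields a five-digit suffix, so such names no longer sort in numeric order alongside the four-digit ones.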

Automatic Output Folder

If the output argument is omitted, the converter creates an {input_name}_toons/ folder:

python toon_json_converter.py dataset.jsonl
# Creates: dataset_toons/

Example: Processing Log Files

Input: events.jsonl (1,000 lines)

{"id": 1, "event": "login", "user": "alice", "timestamp": "2024-01-01T10:00:00Z"}
{"id": 2, "event": "purchase", "user": "bob", "amount": 49.99}
{"id": 3, "event": "logout", "user": "alice", "timestamp": "2024-01-01T10:30:00Z"}
...

Command:

python toon_json_converter.py events.jsonl events_toons/ --tab --length-marker

Output: 1,000 files in events_toons/

events_0000.toon:

id: 1
event: login
user: alice
timestamp: 2024-01-01T10:00:00Z

events_0001.toon:

id: 2
event: purchase
user: bob
amount: 49.99

Terminal output:

✅ Converted 1000 items to events_toons/

Error Handling

The converter reports each invalid line as a warning and continues with the next line.

Input: mixed.jsonl

{"valid": "data", "id": 1}
invalid json line here
{"valid": "data", "id": 2}
{"another": "malformed" json
{"valid": "data", "id": 3}

Command:

python toon_json_converter.py mixed.jsonl output/

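Terminal output (the line numbers in the warnings appear to count from 0, so "Line 1" is the second line of the file):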

⚠️  Line 1: Invalid JSON: Expecting value: line 1 column 1 (char 0)
⚠️  Line 3: Invalid JSON: Expecting ',' delimiter: line 1 column 25 (char 24)
✅ Converted 3 items to output/
⚠️  2 items skipped due to errors
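The skip-and-continue behaviour can be illustrated with the standard json module alone. This is a sketch under assumptions, not the converter's own loop: parse_jsonl_lines is a hypothetical name, and the zero-based line index is inferred from the sample output above.

```python
import json

def parse_jsonl_lines(lines):
    """Return (records, errors); errors holds (line_index, message) pairs."""
    records, errors = [], []
    for i, raw_line in enumerate(lines):  # zero-based index, as in the sample output
        line = raw_line.strip()
        if not line:
            continue  # blank lines are skipped, not counted as errors
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as e:
            # Record the problem and keep going with the next line.
            errors.append((i, f"Invalid JSON: {e}"))
    return records, errors
```

Run against the five lines of mixed.jsonl, this yields three records and two errors, matching the counts in the terminal output.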

Empty Lines

Empty or whitespace-only lines are automatically skipped.

Input: sparse.jsonl

{"id": 1}

{"id": 2}
   
{"id": 3}

Result: Only 3 files are created (empty lines are ignored).

Source reference: toon_json_converter.py:1107-1109

for i, raw_line in enumerate(f):
    line = raw_line.strip()
    if not line:
        continue

Performance Considerations

Timing the conversion of a 10,000-line JSONL file:

time python toon_json_converter.py large_dataset.jsonl output/

Typical performance:

Small objects (less than 1KB): ~5,000-10,000 records/second
Medium objects (around 10KB): ~1,000-2,000 records/second
Large objects (greater than 100KB): ~100-500 records/second

Performance depends on object complexity, disk I/O speed, and system resources.

TOON Folder to JSONL

Basic Usage

Aggregate all TOON files in a folder into a single JSONL file:

python toon_json_converter.py input_folder/ output.jsonl

File Processing Order

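TOON files are processed in sorted (lexicographic) file-name order, so the output is deterministic. Zero-padded indexes such as _0003 and _0010 keep the numeric order intact: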

python toon_json_converter.py data_toons/ dataset.jsonl

Processing order:

data_toons/
  config_0003.toon  ← 1st (alphabetically)
  config_0010.toon  ← 2nd
  log_0001.toon     ← 3rd
  log_0002.toon     ← 4th

Source reference: toon_json_converter.py:1124

toon_files = sorted(f for f in os.listdir(input_folder) if f.endswith(".toon"))
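Sorting by string has one practical consequence worth seeing: names without zero padding do not sort numerically. A small sketch using made-up file names (toon_files_in_order is a hypothetical helper that mirrors the filter-and-sort step above):

```python
def toon_files_in_order(names):
    """Mimic the listing step: keep .toon names and sort them as strings."""
    return sorted(n for n in names if n.endswith(".toon"))

names = ["log_2.toon", "log_10.toon", "readme.txt", "config_0003.toon"]
ordered = toon_files_in_order(names)
# String comparison puts "log_10" before "log_2" because "1" < "2".
```

If record order matters, keeping the zero-padded names that the JSONL split produces avoids this reordering.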

Automatic Output Path

If the output argument is omitted, the converter creates {folder_name}.jsonl:

python toon_json_converter.py data_toons/
# Creates: data_toons.jsonl
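The two default-output rules (the _toons/ folder for a JSONL input and the .jsonl file for a folder input) can be summarised in one function. This is only a sketch of the documented naming patterns: default_output is a hypothetical helper, and whether the converter places the result next to the input or in the working directory is not stated here.

```python
import os

def default_output(input_path: str) -> str:
    """Guess the default output name from the documented patterns (hypothetical)."""
    # Trailing separators are dropped so "data_toons/" and "data_toons" agree.
    trimmed = input_path.rstrip("/\\")
    if trimmed.lower().endswith(".jsonl"):
        # JSONL file -> "{input_name}_toons/" folder
        stem = os.path.splitext(os.path.basename(trimmed))[0]
        return stem + "_toons/"
    # Assumption: anything else is treated as a TOON folder -> "{folder_name}.jsonl"
    return os.path.basename(trimmed) + ".jsonl"
```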

Example: Aggregating User Records

Input folder: users/

users/
  user_0000.toon
  user_0001.toon
  user_0002.toon

user_0000.toon:

id: 1
name: Alice
email: [email protected]
roles[2]: admin, developer

user_0001.toon:

id: 2
name: Bob
email: [email protected]
roles[1]: user

user_0002.toon:

id: 3
name: Carol
email: [email protected]
roles[3]: admin, user, tester

Command:

python toon_json_converter.py users/ users.jsonl --compact

Output: users.jsonl

{"id":1,"name":"Alice","email":"[email protected]","roles":["admin","developer"]}
{"id":2,"name":"Bob","email":"[email protected]","roles":["user"]}
{"id":3,"name":"Carol","email":"[email protected]","roles":["admin","user","tester"]}

Terminal output:

✅ Converted 3 items to users.jsonl

Error Handling

The converter distinguishes three kinds of errors and reports each one with the name of the affected file:

File Read Errors

⚠️  locked.toon: File read error: [Errno 13] Permission denied: 'locked.toon'

Parse Errors

⚠️  corrupt.toon: Parse error: Invalid array header

Unexpected Errors

⚠️  data.toon: Unexpected error (UnicodeDecodeError): 'utf-8' codec can't decode byte

Source reference: toon_json_converter.py:1136-1144

try:
    data = self.parser.parse(self._read_toon(toon_path))
    outfile.write(json.dumps(data, ensure_ascii=False) + "\n")
    result.record_success()
except OSError as e:
    result.record_error(f"{toon_file}: File read error", e)
except (ValueError, KeyError) as e:
    result.record_error(f"{toon_file}: Parse error", e)
except Exception as e:
    result.record_error(f"{toon_file}: Unexpected error ({type(e).__name__})", e)
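The excerpt relies on a result object with record_success and record_error methods, whose definition is not shown. A minimal sketch of what such a tracker might look like follows. The class name BatchResult, its fields, and the summary method are assumptions for illustration; the converter's real class may differ.

```python
from dataclasses import dataclass, field

@dataclass
class BatchResult:
    """Hypothetical stand-in for the `result` object used in the excerpt."""
    converted: int = 0
    errors: list = field(default_factory=list)

    def record_success(self) -> None:
        self.converted += 1

    def record_error(self, context: str, exc: Exception) -> None:
        # Stored in the same "context: detail" shape as the warnings shown earlier.
        self.errors.append(f"{context}: {exc}")

    def summary(self) -> str:
        text = f"Converted {self.converted} items"
        if self.errors:
            text += f", {len(self.errors)} items skipped due to errors"
        return text
```

Collecting errors instead of raising them is what lets one unreadable file be reported without stopping the rest of the batch.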

Empty Folder Handling

If no .toon files are found:

python toon_json_converter.py empty_folder/ output.jsonl

Terminal output:

⚠️  No .toon files found in empty_folder/

Source reference: toon_json_converter.py:1125-1127

if not toon_files:
    print(f"⚠️  No .toon files found in {input_folder}")
    return

Mixed File Types

Only .toon files are processed; other files are ignored.

Folder structure:

data/
  record_0001.toon  ← Processed
  record_0002.toon  ← Processed
  readme.txt        ← Ignored
  config.json       ← Ignored
  image.png         ← Ignored

Result: Only record_0001.toon and record_0002.toon are converted.

Advanced Batch Processing

Shell Scripting

Process multiple files using shell loops:

Convert Multiple JSON Files

#!/bin/bash
for file in data/*.json; do
  python toon_json_converter.py "$file" --tab --length-marker
done

Convert with Custom Output Names

#!/bin/bash
for file in inputs/*.json; do
  base=$(basename "$file" .json)
  python toon_json_converter.py "$file" "outputs/${base}.toon" --pipe
done

Parallel Processing with xargs

find data/ -name "*.json" -print0 | xargs -0 -I {} -P 4 python toon_json_converter.py {} --tab

-P 4 runs up to four conversions in parallel; adjust it to the number of CPU cores. -print0 and -0 keep file names that contain spaces intact.

Python Scripting

Use the converter programmatically:

#!/usr/bin/env python3
import os
from toon_json_converter import BidirectionalConverter, EncodeOptions, Delimiter

# Configure options
options = EncodeOptions(
    delimiter=Delimiter.TAB,
    length_marker=True,
    key_folding=True
)

converter = BidirectionalConverter(encode_options=options)

# Batch convert all JSON files
input_dir = "data/json"
output_dir = "data/toon"

os.makedirs(output_dir, exist_ok=True)

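for filename in sorted(os.listdir(input_dir)):
    if filename.endswith(".json"):
        input_path = os.path.join(input_dir, filename)
        # splitext swaps only the final extension; str.replace would also
        # rewrite a ".json" that occurs earlier in the name
        stem = os.path.splitext(filename)[0]
        output_path = os.path.join(output_dir, stem + ".toon")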
        
        try:
            converter.convert_file(input_path, output_path)
        except Exception as e:
            print(f"❌ Failed to convert {filename}: {e}")

print("\n✅ Batch conversion complete!")

Directory Structure Preservation

Preserve directory hierarchy during batch conversion:

#!/bin/bash

# Find all JSON files recursively
find source/ -name "*.json" | while IFS= read -r file; do
  # Calculate relative path
  rel_path="${file#source/}"
  output_path="output/${rel_path%.json}.toon"
  
  # Create output directory
  mkdir -p "$(dirname "$output_path")"
  
  # Convert
  python toon_json_converter.py "$file" "$output_path" --tab
done

Example structure:

source/                      output/
  users/                      users/
    admin.json        →         admin.toon
    guest.json        →         guest.toon
  config/                     config/
    server.json       →         server.toon
    database.json     →         database.toon
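The same path mapping can be expressed with pathlib, which avoids manual string trimming. This sketch only plans the input/output pairs and leaves the conversion call to the caller; both function names are hypothetical.

```python
from pathlib import Path

def mirrored_output_path(json_file: Path, source_root: Path, output_root: Path) -> Path:
    """Map source_root/a/b.json to output_root/a/b.toon (hypothetical helper)."""
    relative = json_file.relative_to(source_root)
    return (output_root / relative).with_suffix(".toon")

def plan_conversions(source_root: Path, output_root: Path):
    """Yield (input, output) pairs for every .json file under source_root."""
    for json_file in sorted(source_root.rglob("*.json")):
        yield json_file, mirrored_output_path(json_file, source_root, output_root)
```

Before converting a pair, the caller would create the target folder with output.parent.mkdir(parents=True, exist_ok=True), mirroring the mkdir -p step in the shell version.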

Batch Processing with Filtering

Convert only files matching specific criteria:

#!/bin/bash

# Convert only files larger than 1KB
find data/ -name "*.json" -size +1k | while IFS= read -r file; do
  python toon_json_converter.py "$file" --tab
done

# Convert only recently modified files (last 7 days)
find data/ -name "*.json" -mtime -7 | while IFS= read -r file; do
  python toon_json_converter.py "$file"
done
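When the selection logic grows beyond what find expresses comfortably, the same filters can be written in Python. A sketch using only the standard library follows. select_json_files is a hypothetical helper, and its byte-based size test is only roughly equivalent to find's block-based -size +1k.

```python
import time
from pathlib import Path

def select_json_files(root, min_bytes=0, max_age_days=None, now=None):
    """Return sorted .json paths under root that pass the size and age filters."""
    now = time.time() if now is None else now
    selected = []
    for path in Path(root).rglob("*.json"):
        stat = path.stat()
        if stat.st_size < min_bytes:
            continue  # too small
        if max_age_days is not None and now - stat.st_mtime > max_age_days * 86400:
            continue  # modified too long ago
        selected.append(path)
    return sorted(selected)
```

For example, select_json_files("data", min_bytes=1025, max_age_days=7) combines both filters from the shell script in a single pass.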

Progress Tracking

Add progress indicators for large batches:

#!/bin/bash

files=(data/*.json)
total=${#files[@]}
current=0

for file in "${files[@]}"; do
  current=$((current + 1))  # unlike ((current++)), never returns a non-zero status
  echo "[$current/$total] Converting $file..."
  python toon_json_converter.py "$file" --tab
done

echo "✅ Converted $total files"

Output:

[1/50] Converting data/file1.json...
✅ data/file1.json → data/file1.toon
[2/50] Converting data/file2.json...
✅ data/file2.json → data/file2.toon
...
✅ Converted 50 files

Error Handling Strategies

Logging Errors to File

#!/bin/bash

error_log="conversion_errors.log"
: > "$error_log"  # Clear log file

for file in data/*.json; do
  if ! python toon_json_converter.py "$file" 2>> "$error_log"; then
    echo "Failed: $file" >> "$error_log"
  fi
done

if [ -s "$error_log" ]; then
  echo "⚠️  Errors occurred. See $error_log"
else
  echo "✅ All conversions successful"
fi

Skip vs. Halt on Error

Skip errors (continue processing):

for file in data/*.json; do
  python toon_json_converter.py "$file" || echo "⚠️  Skipped $file"
done

Halt on first error:

for file in data/*.json; do
  python toon_json_converter.py "$file" || exit 1
done

Retry Failed Conversions

#!/bin/bash

max_retries=3

for file in data/*.json; do
  success=false
  
  for ((i=1; i<=max_retries; i++)); do
    if python toon_json_converter.py "$file"; then
      success=true
      break
    else
      echo "⚠️  Retry $i/$max_retries for $file"
      sleep 1
    fi
  done
  
  if [ "$success" = false ]; then
    echo "❌ Failed after $max_retries attempts: $file"
  fi
done
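The retry loop translates directly to Python when the batch is driven from a script. The sketch below assumes, as the shell version does, that the converter exits with a non-zero status on failure; run_with_retries is a hypothetical helper.

```python
import subprocess
import time

def run_with_retries(command, max_retries=3, delay_seconds=1.0):
    """Run command until it exits with 0 or the attempts are used up."""
    for attempt in range(1, max_retries + 1):
        if subprocess.run(command).returncode == 0:
            return True
        print(f"Attempt {attempt}/{max_retries} failed for {' '.join(command)}")
        if attempt < max_retries:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return False
```

A call such as run_with_retries(["python", "toon_json_converter.py", "data/file1.json"]) returns False only after all attempts have failed. Retrying helps with transient problems such as a locked file, while a malformed input fails the same way every time.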

Performance Optimization

Parallel Processing

Use GNU Parallel for efficient batch processing:

# Install GNU Parallel (if not installed)
# sudo apt-get install parallel  # Ubuntu/Debian
# brew install parallel           # macOS

# Convert files in parallel (8 jobs)
find data/ -name "*.json" | parallel -j 8 python toon_json_converter.py {} --tab
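Where GNU Parallel is not available, a thread pool that launches one converter process per file gives a similar effect. This is a sketch, not a feature of the converter: convert_in_parallel and its parameters are hypothetical, and the command prefix is passed in so that nothing about the converter's CLI is hard-coded.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def convert_in_parallel(files, command_prefix, extra_args=(), jobs=4):
    """Run `command_prefix + [file] + extra_args` per file; return exit codes."""
    files = list(files)

    def run_one(path):
        cmd = [*command_prefix, str(path), *extra_args]
        return subprocess.run(cmd).returncode

    # Threads are enough here: each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(run_one, files))
    return dict(zip(files, codes))
```

A call such as convert_in_parallel(paths, ["python", "toon_json_converter.py"], ["--tab"], jobs=8) corresponds to the parallel command above; any non-zero value in the returned mapping marks a file that needs attention.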

Benchmarking

Measure conversion performance:

#!/bin/bash

echo "Starting batch conversion..."
start_time=$(date +%s)

file_count=0
for file in data/*.json; do
  python toon_json_converter.py "$file"
  file_count=$((file_count + 1))
done

end_time=$(date +%s)
elapsed=$((end_time - start_time))

echo "✅ Converted $file_count files in ${elapsed}s"
# Guard against division by zero when the run takes less than a second
if [ "$elapsed" -gt 0 ]; then
  echo "   Average: $((file_count / elapsed)) files/second"
fi
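Whole-second timestamps and integer division make short runs hard to measure. A sketch of the same measurement with a high-resolution clock follows. Both helpers are hypothetical, and convert_one stands for whatever callable performs a single conversion.

```python
import time

def files_per_second(file_count: int, elapsed_seconds: float) -> float:
    """Throughput that stays defined even when no time was measured."""
    return file_count / elapsed_seconds if elapsed_seconds > 0 else float("inf")

def benchmark(convert_one, files):
    """Time convert_one(file) over all files; return (count, seconds, rate)."""
    files = list(files)
    start = time.perf_counter()
    for f in files:
        convert_one(f)
    elapsed = time.perf_counter() - start
    return len(files), elapsed, files_per_second(len(files), elapsed)
```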

Best Practices

Use Consistent Options

Apply the same options across all files in a batch for consistent output:

# Good: Consistent options
for file in *.json; do
  python toon_json_converter.py "$file" --tab --length-marker
done

# Bad: Inconsistent options
python toon_json_converter.py file1.json --tab
python toon_json_converter.py file2.json --pipe
python toon_json_converter.py file3.json  # defaults

Validate Output

Verify conversions by round-tripping:

# Convert JSON → TOON → JSON and compare
python toon_json_converter.py original.json temp.toon
python toon_json_converter.py temp.toon reconstructed.json
diff <(jq -S . original.json) <(jq -S . reconstructed.json)
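If jq is not installed, the comparison can be done with Python's json module, which also ignores key order when comparing objects. same_json is a hypothetical helper for illustration.

```python
import json

def same_json(path_a: str, path_b: str) -> bool:
    """True when both files decode to equal JSON values (key order ignored)."""
    with open(path_a, encoding="utf-8") as a, open(path_b, encoding="utf-8") as b:
        return json.load(a) == json.load(b)
```

Array order still matters in this comparison, and Python treats 1 and 1.0 as equal, so a round trip that changes an integer into a float would not be flagged.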

Handle Large Files

For very large JSONL files (>1GB), consider splitting first:

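# Split into 10,000-line chunks; --additional-suffix (GNU split) keeps the
# .jsonl extension in case the converter selects its mode by extension
split -l 10000 --additional-suffix=.jsonl huge_file.jsonl chunk_

# Convert each chunk (the glob skips the chunk_*_toons/ folders on reruns)
for chunk in chunk_*.jsonl; do
  python toon_json_converter.py "$chunk" "${chunk%.jsonl}_toons/"
done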
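On systems without the split utility, the chunking step can be sketched in Python. The helper below reads the file lazily, so memory use stays bounded by one chunk. split_jsonl and its chunk naming scheme are assumptions for illustration.

```python
from itertools import islice
from pathlib import Path

def split_jsonl(input_path, lines_per_chunk=10000):
    """Write input_path into numbered .jsonl chunks next to it; return their paths."""
    source = Path(input_path)
    chunks = []
    with source.open(encoding="utf-8") as f:
        while True:
            batch = list(islice(f, lines_per_chunk))  # next N lines, or fewer at EOF
            if not batch:
                break
            chunk = source.with_name(f"{source.stem}_chunk{len(chunks):03d}.jsonl")
            chunk.write_text("".join(batch), encoding="utf-8")
            chunks.append(chunk)
    return chunks
```

Each returned path can then be passed to the converter as a separate JSONL input.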

Use Version Control

Track batch conversion scripts in version control:

git add batch_convert.sh
git commit -m "Add batch conversion script with tab delimiter"

Monitor Disk Space

Check available disk space before large batch operations:

# Estimate output size (TOON is typically 1.1-1.3x JSON size)
# Note: du -b and df -B1 are GNU coreutils options
input_size=$(du -sb data/*.json | awk '{sum+=$1} END {print sum}')
required_space=$((input_size * 13 / 10))  # 1.3x
available_space=$(df -B1 . | tail -1 | awk '{print $4}')

if [ "$required_space" -gt "$available_space" ]; then
  echo "❌ Insufficient disk space"
  exit 1
fi
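A portable variant of the same check can use shutil.disk_usage, which works without GNU-specific du and df flags. The 1.3 factor is taken from the estimate above; has_enough_space is a hypothetical helper.

```python
import os
import shutil

def has_enough_space(json_files, target_dir=".", factor=1.3):
    """Compare factor * total input size with the free space at target_dir."""
    input_size = sum(os.path.getsize(p) for p in json_files)
    required = int(input_size * factor)
    free = shutil.disk_usage(target_dir).free
    return required <= free, required, free
```

The function returns the decision together with both byte counts, so a calling script can print a meaningful message before it aborts.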

Real-World Examples

Example 1: Processing API Responses

#!/bin/bash
# Convert API response logs from JSONL to individual TOON files

input="api_responses.jsonl"
output_dir="responses_$(date +%Y%m%d)"

python toon_json_converter.py "$input" "$output_dir/" --tab --length-marker

echo "✅ Converted API responses to $output_dir/"

Example 2: Aggregating Configuration Files

#!/bin/bash
# Aggregate all service configs into a single JSONL file

configs_dir="configs/services"
output="all_configs_$(date +%Y%m%d).jsonl"

python toon_json_converter.py "$configs_dir/" "$output" --compact

echo "✅ Aggregated configs to $output"

Example 3: Data Pipeline

#!/bin/bash
# Multi-stage data processing pipeline

# Stage 1: Extract data (assume this generates raw.jsonl)
./extract_data.sh > raw.jsonl

# Stage 2: Split into individual TOON files for manual review
python toon_json_converter.py raw.jsonl review_toons/ --tab

echo "📝 Review files in review_toons/ and make edits"
read -r -p "Press Enter when ready to continue..."

# Stage 3: Aggregate back to JSONL
python toon_json_converter.py review_toons/ processed.jsonl --compact

# Stage 4: Load into database (assume this consumes JSONL)
./load_to_db.sh processed.jsonl

echo "✅ Pipeline complete"

Next Steps

CLI Reference: Complete command-line interface reference

Options Guide: Deep dive into all options

Conversion Modes: Learn about all conversion modes

TOON Format: Understand the TOON format specification
