Overview
Forge’s data processing feature allows you to apply AI-powered transformations to large datasets in JSONL (JSON Lines) format. This is particularly useful for:
- Data enrichment and augmentation
- Batch classification tasks
- Content generation at scale
- Dataset validation and cleaning
- Synthetic data generation
Basic Usage
Command Structure
Required Files
Example
Configuration Options
System Prompt
Define the AI’s behavior and role in system.txt.
User Prompt Template
Define how each data item is presented in user.txt.
The {{text}} placeholder is replaced with the corresponding field from each input JSONL record.
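As a rough illustration of the substitution described above, the sketch below fills {{field}} placeholders from one JSONL record. It is not Forge's actual template engine: the helper name render_template, the sample record, and the choice to leave unknown placeholders untouched are all assumptions made for this example.

```python
import json
import re

# Hypothetical helper, not part of Forge: illustrates {{field}} substitution.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_template(template: str, record: dict) -> str:
    """Replace each {{name}} with the matching field from the record.

    Assumption: placeholders with no matching field are left as-is.
    """
    def substitute(match):
        key = match.group(1)
        return str(record[key]) if key in record else match.group(0)
    return PLACEHOLDER.sub(substitute, template)

line = '{"text": "Great product, fast shipping."}'  # one line of input JSONL
record = json.loads(line)
prompt = render_template("Classify the sentiment of: {{text}}", record)
print(prompt)
```

The same helper works for any field name, so a template containing {{title}} or {{category}} would be filled from record["title"] or record["category"] in the same way.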
Concurrency Control
Process multiple items in parallel:
- Default: 5
- Higher values: Faster processing, more API load
- Lower values: Slower processing, more conservative
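To make the trade-off concrete, here is a minimal sketch of bounded parallelism using a thread pool capped at 5 workers. How Forge schedules work internally is not documented here; process_item is a hypothetical stand-in for a single AI call, and the thread-pool approach is simply one common way to limit in-flight requests.

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item: dict) -> dict:
    """Hypothetical stand-in for one AI call (a real call is network-bound)."""
    return {**item, "length": len(item["text"])}

def process_all(items: list, concurrency: int = 5) -> list:
    # At most `concurrency` items are in flight at once.
    # executor.map yields results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        return list(executor.map(process_item, items))

items = [{"text": "a" * n} for n in range(1, 8)]
results = process_all(items, concurrency=5)
print(len(results))
```

Raising the worker cap shortens wall-clock time only while the API can absorb the extra requests; past that point, rate limiting tends to cancel out the gain.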
Common Use Cases
Sentiment Analysis
Input (reviews.jsonl):
Data Enrichment
Input (companies.jsonl):
Text Classification
Input (support-tickets.jsonl):
Synthetic Data Generation
Input (templates.jsonl):
Advanced Features
Conversation Context
Continue processing in an existing conversation:
Template Variables
Use any field from the input JSON in your prompts:
Input:
Schema Validation
Forge validates output against your schema:
- Type checking (string, number, boolean, array, object)
- Required fields enforcement
- Enum validation
- Range validation (minimum, maximum)
- Pattern matching (regex)
- Custom constraints
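The checks listed above can be sketched with a small hand-written validator. This is an illustration only, not Forge's validator: it assumes the schema uses JSON-Schema-style keywords (type, required, properties, enum, minimum, maximum, pattern), and the sample schema is hypothetical. Custom constraints are not covered.

```python
import re

# JSON-Schema-style type names mapped to Python types (assumed naming).
TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

def validate(value, schema: dict, path: str = "$") -> list:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    expected = schema.get("type")
    if expected:
        ok = isinstance(value, TYPES[expected])
        # bool is a subclass of int in Python, so exclude it from "number".
        if expected == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: not one of {schema['enum']}")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{path}: above maximum {schema['maximum']}")
    if isinstance(value, str) and "pattern" in schema:
        if not re.search(schema["pattern"], value):
            errors.append(f"{path}: does not match {schema['pattern']}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: required field missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    return errors

# Hypothetical schema for a sentiment task.
schema = {
    "type": "object",
    "required": ["sentiment", "confidence"],
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}
good = {"sentiment": "positive", "confidence": 0.9}
bad = {"sentiment": "happy", "confidence": 1.5}
print(validate(good, schema))
print(validate(bad, schema))
```

Returning a list of messages rather than raising on the first failure makes it easier to log every problem with an item in one pass.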
Output Format
Processed data is written to stdout in JSONL format. Each output line contains:
- All original fields from input
- New fields generated by the AI
- Fields validated against the schema
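One way to picture an output line is as the input record merged with the generated fields and serialized as a single line of JSON. The sketch below assumes exactly that; merge_record and to_jsonl_line are hypothetical helpers, and the rule that generated values win on a key collision is an assumption, not documented Forge behavior.

```python
import json
import sys

def merge_record(original: dict, generated: dict) -> dict:
    # Assumption: on a key collision, the generated value wins.
    return {**original, **generated}

def to_jsonl_line(record: dict) -> str:
    # One JSON object per line, terminated by a newline.
    return json.dumps(record, ensure_ascii=False) + "\n"

merged = merge_record(
    {"id": 1, "text": "Great product"},   # original input fields
    {"sentiment": "positive"},            # fields produced by the AI
)
sys.stdout.write(to_jsonl_line(merged))
```

Because each record occupies exactly one line, the output can be redirected to a file and read back line by line with json.loads.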
Performance Optimization
Optimal Concurrency
Choose concurrency based on:
Batch Processing
For very large datasets, process in batches:
Monitoring Progress
Forge displays progress during processing:
Error Handling
Schema Validation Errors
If output doesn’t match schema:
- Forge automatically retries
- After 3 retries, the item is skipped
- Error is logged to stderr
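The retry-then-skip behavior described above can be sketched as a simple loop. This is a hedged illustration rather than Forge's code: generate and is_valid are hypothetical callables standing in for the AI call and the schema check, and "3 retries" is read here as one initial attempt plus up to three more.

```python
import sys

MAX_RETRIES = 3  # from the text: after 3 retries the item is skipped

def process_with_validation(item, generate, is_valid, max_retries=MAX_RETRIES):
    """Return validated output, or None if the item had to be skipped.

    `generate` and `is_valid` are hypothetical stand-ins for the AI call
    and the schema check.
    """
    # One initial attempt plus up to `max_retries` retries (assumed reading).
    for _ in range(max_retries + 1):
        output = generate(item)
        if is_valid(output):
            return output
    # Skipped items are reported on stderr so stdout stays clean JSONL.
    print(f"skipped after {max_retries} retries: {item!r}", file=sys.stderr)
    return None

calls = {"n": 0}

def flaky_generate(item):
    # Produces an invalid label twice, then a valid one.
    calls["n"] += 1
    return {"label": "ok"} if calls["n"] >= 3 else {"label": 42}

result = process_with_validation(
    {"id": 1}, flaky_generate, lambda out: isinstance(out["label"], str)
)
print(result)
```

Keeping the error message on stderr means a pipeline such as redirecting stdout to a results file still captures only valid records.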
API Errors
For API failures:
- Automatic retry with exponential backoff
- Configurable retry attempts (see Environment Variables)
- Failed items can be reprocessed
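Exponential backoff means waiting roughly twice as long after each failed attempt. The sketch below shows the general technique under stated assumptions: the attempt count, base delay, delay cap, and added jitter are illustrative values and not Forge's actual settings, and ConnectionError stands in for whatever error an API client would raise.

```python
import random
import time

def call_with_backoff(call, max_attempts=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call `call()` until it succeeds or the attempts run out.

    All defaults are illustrative, not Forge's real configuration.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller record the failure
            # Delay doubles each time: base, 2*base, 4*base, ... up to the cap.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter spreads out retries from parallel workers.
            sleep(delay + random.uniform(0, delay / 10))

delays = []            # records requested sleeps instead of really waiting
attempts = {"n": 0}

def flaky_call():
    # Fails twice with a transient error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

outcome = call_with_backoff(flaky_call, sleep=delays.append)
print(outcome, len(delays))
```

Passing sleep as a parameter keeps the example fast to run and easy to test; in real use the default time.sleep applies.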