The forge loop is the core of Dream Foundry. It runs multiple AI agent implementations in parallel, scores them, and selects the best approach.

Prerequisites

Before running the forge, ensure you have:
  • Python 3.11+
  • Required dependencies installed
  • Optional: Daytona API key for sandbox execution
  • Optional: Sentry DSN for error monitoring
The forge works both locally and in Daytona sandboxes. Local execution is faster for development, while Daytona provides true isolation.
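You can confirm the setup before a run with a small check. This is a minimal sketch; the environment variable names come from the prerequisites list above, not from forge.py itself:

```python
import os
import sys

def check_prerequisites() -> dict:
    """Report Python version and optional integrations.

    A minimal sketch; DAYTONA_API_KEY and SENTRY_DSN are the
    variable names listed in the prerequisites above.
    """
    return {
        "python_ok": sys.version_info >= (3, 11),
        "daytona": bool(os.environ.get("DAYTONA_API_KEY")),
        "sentry": bool(os.environ.get("SENTRY_DSN")),
    }
```

Both integrations are optional: the forge runs locally without either key.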

Command Line Interface

The forge is available through forge.py in the project root.

Basic Usage

1. Run with default objective

   python forge.py

   Uses the default objective: “Generate weekly ‘AI Events in San Francisco’ post for Discord”

2. Run with custom objective

   python forge.py "Create a weekly newsletter about AI events"

   Or use the --objective flag:

   python forge.py --objective "Weekly SF AI events"

3. Run in Daytona sandboxes

   python forge.py --daytona

   Requires DAYTONA_API_KEY in your environment.

Command Options

Option        Short   Description
--------------------------------------------------------------------
--objective   -o      Specify the objective for agents to implement
--daytona     -d      Run candidates in Daytona sandboxes
--help        -h      Show help message

Example Output

When you run the forge, you’ll see output like this:
============================================================
DREAM FOUNDRY - Forge Loop
============================================================
Objective: Generate weekly AI events for Discord
Candidates: 5
Execution: LOCAL

PHASE: Running Candidates
----------------------------------------

[Agent Alpha]
  [Run] Starting Agent Alpha...
  [Done] Output produced
  [Done] 4.2s

[Agent Beta]
  [Run] Starting Agent Beta...
  [Done] Output produced
  [Done] 12.8s

[Agent Gamma]
  [Run] Starting Agent Gamma...
  [Done] Output produced
  [Done] 3.1s

============================================================
PHASE: Scoring
----------------------------------------
Candidate     Success  Quality    Speed    TOTAL   
----------------------------------------------
alpha         PASS     45.0       72.0     62.4    
beta          PASS     88.5       57.3     79.7    
gamma         PASS     95.2       89.7     92.8    

============================================================
WINNER: gamma (score: 92.8)
============================================================

Streamlit UI

For a visual, step-by-step experience, use the Streamlit interface.

Starting the UI

streamlit run app.py
The UI will open at http://localhost:8501

UI Phases

The Streamlit interface walks through all 5 phases of the Dream Foundry:

1. Dreamcatcher: Input your objective and constraints
2. Dream Factory: View generated candidate approaches
3. Arena: Watch agents compete in real-time with live logs
4. Podium: See CodeRabbit polish the top 3 candidates
5. Awakening: Experience the winner with ElevenLabs narration

Live Execution Logs

The Arena phase shows real-time logs from each candidate:
# From app.py:304-308
def log_callback(candidate_id: str, message: str):
    if candidate_id in logs:
        logs[candidate_id].append(message)
        log_containers[candidate_id].code("\n".join(logs[candidate_id][-8:]))
This callback streams execution progress directly to the UI.
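The same (candidate_id, message) signature works outside Streamlit too. A hypothetical console stand-in, for illustration only:

```python
from collections import defaultdict

# Buffers log lines per candidate, mirroring the Streamlit version's
# per-candidate containers (this console variant is a sketch, not
# code from app.py).
logs: dict[str, list[str]] = defaultdict(list)

def log_callback(candidate_id: str, message: str) -> None:
    logs[candidate_id].append(message)
    # Show only the most recent 8 lines, like the UI does
    print(f"[{candidate_id}] " + " | ".join(logs[candidate_id][-8:]))
```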

Understanding Execution Modes

Local Execution

In local mode (default), candidates run as subprocesses on your machine:
# From forge.py:154-160
result = subprocess.run(
    [sys.executable, script_path, objective, str(output_file)],
    capture_output=True,
    text=True,
    timeout=60,
    cwd=os.path.dirname(os.path.abspath(__file__)),
)
Advantages:
  • Fast startup (no sandbox creation)
  • Easy debugging
  • No API dependencies
Disadvantages:
  • No isolation between candidates
  • Shares system resources
  • Can’t test different environments

Daytona Execution

With --daytona, each candidate runs in an isolated sandbox:
# From forge.py:241-246
if execution_mode == "daytona":
    daytona_results = run_all_candidates_in_daytona(
        CANDIDATES, objective, artifacts_dir, log_callback
    )
Advantages:
  • Complete isolation
  • Can test different Python versions
  • Parallel execution
  • Safer for untrusted code
Disadvantages:
  • Slower startup (sandbox creation)
  • Requires API key
  • Network dependency
Daytona mode automatically falls back to local execution if DAYTONA_API_KEY is not configured.
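The fallback described above amounts to a simple check. A hypothetical sketch (the real logic lives in forge.py and may differ):

```python
import os

def resolve_execution_mode(requested: str) -> str:
    """Fall back to local execution when Daytona isn't configured.

    Sketch of the fallback behavior described above; not the actual
    forge.py implementation.
    """
    if requested == "daytona" and not os.environ.get("DAYTONA_API_KEY"):
        return "local"
    return requested
```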

Scoring System

The forge scores candidates on three criteria:

Success (20%)

Did the candidate produce output without errors?
# From scoring.py:376-377
success = produced_output and not error_occurred

Quality (60%)

Does the output meet requirements?
# From scoring.py:381-383
if artifact_content:
    quality = validate_quality(artifact_content)
    quality_score = quality.score
Quality validation checks:
  • Valid events: 10+ events (40 points)
  • Hackathon included: Must have a hackathon event (25 points)
  • Event count bonus: 14+ events (20 points)
  • No invalid events: All events pass validation (15 points)
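The point breakdown above can be sketched as a scoring function. This is a hypothetical reimplementation for illustration; the real checks live in scoring.py's validate_quality and may differ in detail:

```python
def quality_points(valid_events: int, total_events: int,
                   has_hackathon: bool) -> int:
    """Hypothetical sketch of the quality point breakdown above."""
    points = 0
    if valid_events >= 10:
        points += 40   # valid events: 10+
    if has_hackathon:
        points += 25   # hackathon included
    if valid_events >= 14:
        points += 20   # event count bonus: 14+
    if valid_events == total_events:
        points += 15   # no invalid events
    return points
```

The four checks sum to a maximum of 100 points.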

Speed (20%)

Faster execution = higher score:
# From scoring.py:332-338
def calculate_speed_score(runtime_seconds: float, max_time: float = 30.0) -> float:
    if runtime_seconds <= 0:
        return 100.0
    if runtime_seconds >= max_time:
        return 0.0
    return round(100 * (1 - runtime_seconds / max_time), 1)
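Plugging in the runtimes from the sample output above (the function is repeated so the snippet runs standalone):

```python
def calculate_speed_score(runtime_seconds: float, max_time: float = 30.0) -> float:
    # Same linear falloff as the scoring.py excerpt above
    if runtime_seconds <= 0:
        return 100.0
    if runtime_seconds >= max_time:
        return 0.0
    return round(100 * (1 - runtime_seconds / max_time), 1)

print(calculate_speed_score(3.1))   # gamma's 3.1s run -> 89.7
print(calculate_speed_score(15.0))  # halfway to the cutoff -> 50.0
print(calculate_speed_score(45.0))  # past the 30s cap -> 0.0
```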

Final Score

# From scoring.py:341-365
def calculate_total_score(
    success: bool,
    quality_score: float,
    speed_score: float,
    weights: dict = None,
) -> float:
    if weights is None:
        weights = {'success': 0.2, 'quality': 0.6, 'speed': 0.2}

    if not success:
        return 0.0

    success_score = 100.0
    total = (
        success_score * weights['success'] +
        quality_score * weights['quality'] +
        speed_score * weights['speed']
    )

    return round(total, 1)
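A worked example with the default weights, using illustrative scores rather than the sample run above:

```python
# A passing run contributes a flat 100 * 0.2 = 20.0 for success.
quality_score = 80.0
speed_score = 50.0

total = (
    100.0 * 0.2 +          # success
    quality_score * 0.6 +  # quality
    speed_score * 0.2      # speed
)
print(round(total, 1))  # 20.0 + 48.0 + 10.0 = 78.0
```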

Artifacts

After a forge run, check the artifacts/ directory:
artifacts/
├── alpha/
│   └── discord_post.md
├── beta/
│   └── discord_post.md
├── gamma/
│   └── discord_post.md
├── scores.json
└── winner.txt

scores.json

Complete scoring breakdown:
{
  "objective": "Generate weekly AI events for Discord",
  "timestamp": "2026-01-24T10:30:00",
  "execution_mode": "local",
  "candidates": [
    {
      "candidate_id": "gamma",
      "success": true,
      "quality_score": 95.2,
      "speed_score": 89.7,
      "total_score": 92.8,
      "quality_details": {
        "passed": true,
        "total_events": 10,
        "valid_events": 10,
        "has_daytona_event": true
      }
    }
  ],
  "winner": "gamma",
  "winner_score": 92.8
}

winner.txt

Simple file containing just the winner’s ID:
gamma
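Both files are easy to consume from a script. A minimal loader sketch based on the artifact layout above:

```python
import json
from pathlib import Path

def load_run_summary(artifacts_dir: str = "artifacts") -> tuple[str, dict]:
    """Load the winner ID and scoring breakdown from a forge run.

    A sketch based on the artifacts/ layout documented above.
    """
    root = Path(artifacts_dir)
    winner = (root / "winner.txt").read_text().strip()
    scores = json.loads((root / "scores.json").read_text())
    return winner, scores
```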

Programmatic Usage

You can also use the forge programmatically:
from forge import run_forge, ForgeResult

def my_log_callback(candidate_id: str, message: str):
    print(f"[{candidate_id}] {message}")

result: ForgeResult = run_forge(
    objective="Create a weekly AI newsletter",
    use_daytona=True,
    log_callback=my_log_callback
)

print(f"Winner: {result.winner_id}")
print(f"Score: {result.winner_score}")
print(f"Artifact:\n{result.winner_artifact}")

Troubleshooting

No output produced

Check that the candidate script:
  • Accepts two arguments: objective and output_file
  • Writes to the output file path
  • Exits with code 0 on success
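A minimal candidate skeleton that satisfies this contract (a sketch, not one of the shipped agents):

```python
#!/usr/bin/env python3
"""Minimal candidate script: argv[1] is the objective, argv[2] is
the output file path, exit code 0 signals success to the forge."""
import sys

def main() -> int:
    objective, output_file = sys.argv[1], sys.argv[2]
    post = f"# Weekly Post\n\nObjective: {objective}\n"
    with open(output_file, "w") as f:
        f.write(post)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```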

Timeout errors

Candidates have a 60-second timeout. If they exceed this:
# From forge.py:184-187
except subprocess.TimeoutExpired:
    error_occurred = True
    error_message = "Timeout after 60 seconds"
    log("[Error] Timeout!")
Optimize slow candidates or increase the timeout in forge.py:158.

Daytona connection failed

Ensure your API key is valid:
export DAYTONA_API_KEY="your_key_here"
Test with:
from src.daytona_runner import is_daytona_configured
print(is_daytona_configured())  # Should print True

Next Steps

  • Creating Agents: Learn how to create your own agent implementations
  • Sandbox Execution: Deep dive into Daytona sandbox integration
