Agent Epsilon: The Hallucinator
Nickname: The Hallucinator
Strategy: Return unverified data
Purpose: Demonstrate quality scoring with bad data
Overview
Agent Epsilon produces syntactically valid output but semantically incorrect data. It generates “hallucinated” events with wrong dates, wrong locations, non-AI content, and fake URLs. This agent demonstrates how the Dream Arena’s quality scoring system detects and penalizes poor data quality even when code executes successfully.

Strategy & Approach
Data Quality Problems
Agent Epsilon’s output contains five types of errors:

- Wrong Dates - Events in the wrong month or year
- Wrong Locations - Events outside Bay Area (NYC, etc.)
- Wrong Topic - Non-AI events (cooking classes, etc.)
- Fake URLs - Invalid domains that don’t exist
- Mixed Errors - Combinations of the above
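The five error types above can be illustrated with a sketch of what flawed entries might look like. This is a hedged illustration only: the field names and sample values are assumptions, not the actual contents of `BAD_EVENTS` in agent_epsilon.py.

```python
# Illustrative sketch of intentionally flawed event records.
# Field names and values are assumptions for demonstration; the real
# BAD_EVENTS list is defined in agent_epsilon.py (lines 14-60).
BAD_EVENTS = [
    {   # Wrong date (wrong month, and an impossible calendar date)
        "name": "AI Builders Meetup",
        "date": "2025-02-30",
        "location": "San Francisco, CA",
        "url": "https://ai-builders.fake-domain.invalid/event",  # fake URL
    },
    {   # Wrong location: outside the Bay Area
        "name": "Machine Learning Summit",
        "date": "2025-01-15",
        "location": "New York, NY",
        "url": "https://mlsummit.example.com/2025",
    },
    {   # Wrong topic: not an AI event at all
        "name": "Sourdough Baking Class",
        "date": "2025-01-20",
        "location": "Oakland, CA",
        "url": "https://bake.example.com/sourdough",
    },
]
```

Each record is structurally well-formed, which is the point: nothing about the shape of the data reveals that the content is wrong.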
Purpose in Dream Foundry
This agent demonstrates:

- Quality scoring catches bad data even with valid code
- Content validation beyond syntax checking
- Location filtering for geographic relevance
- Date range verification for temporal accuracy
- URL validation to prevent broken links
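The checks listed above can be sketched as a single per-event validator. This is a minimal sketch: the helper name, keyword sets, and thresholds are assumptions, and the actual Dream Arena validator may work differently.

```python
from datetime import date
from urllib.parse import urlparse

# Assumed keyword and location sets for illustration only.
AI_KEYWORDS = {"ai", "machine learning", "llm", "agents"}
BAY_AREA_HINTS = {"san francisco", "oakland", "san jose", "palo alto", "berkeley"}

def validate_event(event, window_start, window_end):
    """Return a list of human-readable problems found in one event dict."""
    problems = []
    # Date range verification for temporal accuracy
    try:
        d = date.fromisoformat(event["date"])
        if not (window_start <= d <= window_end):
            problems.append("date outside target window")
    except ValueError:
        problems.append("unparseable or impossible date")
    # Location filtering for geographic relevance
    if not any(hint in event["location"].lower() for hint in BAY_AREA_HINTS):
        problems.append("location outside Bay Area")
    # Content validation beyond syntax checking
    if not any(kw in event["name"].lower() for kw in AI_KEYWORDS):
        problems.append("not obviously AI-related")
    # URL validation (structural only; a real check would resolve the host)
    parsed = urlparse(event["url"])
    if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
        problems.append("malformed URL")
    return problems
```

Running a validator like this over Epsilon's output would flag a cooking class in Oakland as off-topic while accepting its location, which is exactly the multi-dimensional checking the section describes.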
Implementation
Hallucinated Data
BAD_EVENTS
The hardcoded list of intentionally incorrect events.
Each event in
BAD_EVENTS has at least one critical flaw. The data is defined at agent_epsilon.py:14-60.

Core Functions
fetch_events()
Returns bad data without verification.
format_discord_post(events, objective)
Formats the bad data (format is correct, content is wrong).
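A hedged reconstruction of how these two functions could look follows. It is a sketch only: the event entries and function bodies are illustrative assumptions, not the actual code at agent_epsilon.py:63-96.

```python
# Minimal sketch of Agent Epsilon's core functions (illustrative only).
BAD_EVENTS = [
    {"name": "AI Builders Meetup", "date": "2025-02-30",
     "location": "New York, NY", "url": "https://fake-events.invalid/ai"},
    {"name": "Sourdough Baking Class", "date": "2025-01-20",
     "location": "Oakland, CA", "url": "https://bake.example.com"},
]

def fetch_events():
    """Return the hardcoded bad data without any verification step."""
    return BAD_EVENTS

def format_discord_post(events, objective):
    """Format events as Discord-flavored markdown.

    The structure is correct even though the content is wrong, which is
    what makes Epsilon useful for testing quality scoring.
    """
    lines = [f"**{objective}**", ""]
    for ev in events:
        lines.append(f"- **{ev['name']}** | {ev['date']} | {ev['location']}")
        lines.append(f"  <{ev['url']}>")
    return "\n".join(lines)
```

Calling `format_discord_post(fetch_events(), "Upcoming AI events")` yields perfectly valid markdown wrapped around entirely wrong events.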
Data Quality Analysis
Error Breakdown
Output Example
Agent Epsilon produces valid markdown but terrible content.

Quality Scoring Penalties
The Dream Arena’s quality scorer penalizes Epsilon for:

| Issue | Penalty | Affected Events |
|---|---|---|
| Wrong month/year | -80% | Events 1, 4 |
| Wrong location | -60% | Event 2 |
| Non-AI topic | -70% | Event 3 |
| Fake/invalid URL | -50% | Events 1, 5 |
| No verification | -30% | All events |
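The penalty table can be read as per-event deductions. One hedged way to apply it is multiplicatively, as sketched below; the aggregation rule is an assumption, and the actual scorer may combine penalties differently.

```python
# Assumed penalty schedule mirroring the table above (fraction deducted).
PENALTIES = {
    "wrong_date": 0.80,
    "wrong_location": 0.60,
    "non_ai_topic": 0.70,
    "fake_url": 0.50,
    "no_verification": 0.30,
}

def event_score(flaws, base=10.0):
    """Apply each flaw's penalty multiplicatively to a base score of 10."""
    score = base
    for flaw in flaws:
        score *= (1.0 - PENALTIES[flaw])
    return score

# An event with a wrong date, a fake URL, and no verification:
# 10 * 0.20 * 0.50 * 0.70 = 0.70
```

Under this illustrative rule, even a single serious flaw collapses an event's score, which matches Epsilon landing near 1.5/10 overall.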
Performance Characteristics
Speed Metrics
- Execution Time: Less than 1 second (no network calls)
- Network Requests: 0 (returns hardcoded data)
- Timeout: N/A (no external requests)
Quality Metrics
- Event Coverage: 5 events (but 0 relevant)
- Accuracy: 0% (all events have errors)
- Completeness: 100% format, 0% content
- Reliability: 100% execution (code never crashes), 0% trustworthiness (data is always wrong)
- Relevance: ~20% (1 out of 5 meets basic criteria)
Comparison with Good Agents
Demo Walkthrough Moment
In the Dream Foundry demo, Agent Epsilon plays an important role:

Phase 3: Dream Arena
- All agents complete - Epsilon doesn’t crash (unlike Delta)
- Output looks valid - Markdown formatting is perfect
- Quality scoring runs - Automated validation checks each event
- Epsilon scores low - Quality score ~1.5/10
- Does not advance - Fails to make top 3 for Dream Podium
Key Demo Moment
Presenter: “Notice that Agent Epsilon completed successfully and produced nicely formatted output. But look at the actual events—one is in February, one is in New York, one is a cooking class! The Dream Arena’s quality scoring system detected these issues and gave Epsilon a quality score of only 1.5 out of 10. This shows that success isn’t just about running without errors—it’s about producing valuable, accurate results.”
Source Code Location
File: /candidates/agent_epsilon.py
Lines: 122 total
Key Constants:
- BAD_EVENTS - agent_epsilon.py:14-60 - ⚠️ Intentionally wrong data

Key Functions:
- fetch_events() - agent_epsilon.py:63-67
- format_discord_post() - agent_epsilon.py:70-96
- main() - agent_epsilon.py:99-117
Scoring in Dream Arena
Agent Epsilon receives:

- Speed: ⭐⭐⭐ (2x weight - EXCELLENT - instant)
- Reliability: ⭐⭐⭐ (3x weight - runs without crashing)
- Quality: ⭐ (3x weight - TERRIBLE - 1.5/10)
- Format: ⭐⭐⭐ (1x weight - perfect markdown)
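The per-dimension ratings and weights above can be combined into one overall score. The sketch below uses the 2x/3x/3x/1x weights from the list; the per-dimension numbers and the averaging formula itself are illustrative assumptions, not the exact Dream Arena computation.

```python
# Weights taken from the list above.
WEIGHTS = {"speed": 2, "reliability": 3, "quality": 3, "format": 1}

def weighted_score(scores):
    """Weighted average of per-dimension scores (each on a 0-10 scale)."""
    total_weight = sum(WEIGHTS.values())  # 2 + 3 + 3 + 1 = 9
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / total_weight

# Hypothetical numbers for Epsilon: perfect everywhere except quality.
epsilon = {"speed": 10, "reliability": 10, "quality": 1.5, "format": 10}
# (2*10 + 3*10 + 3*1.5 + 1*10) / 9 = 64.5 / 9 ≈ 7.17
```

Even with perfect speed, reliability, and format, the triple-weighted quality score drags the total well below a perfect 10 — illustrating why Epsilon fails to make the top 3 despite never crashing.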
When to Use Agent Epsilon
Use Cases
- Demonstrating quality scoring systems
- Testing validation logic
- Showing difference between format and content
- Educational/demo purposes only
Never Use For
- Any production workload
- Actual event discovery
- User-facing features
- Anything requiring accurate data
Educational Value
Agent Epsilon teaches important lessons:

- Format ≠ Quality - Perfect syntax doesn’t mean good data
- Validation matters - Content must be checked, not just structure
- Quality scoring works - Automated systems can detect bad data
- Success isn’t enough - Running without errors doesn’t guarantee value
- Multi-dimensional scoring - Speed and reliability alone don’t win
Related Documentation
- Quality Scoring System - How quality is measured
- Dream Arena - Where agents compete
- Validation Rules - What makes data “good”
Comparison Summary
| Agent | Crashes | Format | Quality | Advances |
|---|---|---|---|---|
| Alpha | No | ✅ | ⭐⭐ (40%) | Maybe |
| Beta | No | ✅ | ⭐⭐⭐ (80%) | Yes |
| Gamma | No | ✅ | ⭐⭐⭐ (100%) | Yes |
| Delta | Yes | ❌ | 0% | No |
| Epsilon | No | ✅ | ⭐ (15%) | No |