How to create custom presets and tune YC-Bench for your specific testing needs
YC-Bench’s five built-in presets test different agent capabilities, but you may want to create custom configurations to test specific scenarios. This guide shows you how.
1. Copy default.toml or another preset as a starting point
2. Use extends = "default" to inherit base parameters
3. Override only the parameters you want to change
4. Save to src/yc_bench/config/presets/your_preset.toml
5. Run with yc-bench run --config your_preset
Best practice: Always use extends = "default" unless you want to specify every parameter from scratch. This ensures you get sensible defaults for everything you don’t override.
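As a rough illustration of the steps above, the following Python sketch renders a small preset that inherits from "default" and overrides a single table. The render_preset helper and the single_domain preset name are hypothetical, introduced only for this example. The extends, name, and description keys and the world.dist.domain_count table are the ones that appear in this guide's own presets.

```python
def render_preset(name: str, description: str, overrides: dict) -> str:
    """Render a minimal preset that extends "default" and overrides a few tables.

    Hypothetical helper for illustration; it is not part of YC-Bench.
    `overrides` maps a table name (e.g. "world.dist.domain_count") to its keys.
    """
    def fmt(value: object) -> str:
        # bool must be checked first because bool is a subclass of int
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, str):
            return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        return repr(value)  # ints and floats

    lines = [
        'extends = "default"',
        f"name = {fmt(name)}",
        f"description = {fmt(description)}",
    ]
    for table, values in overrides.items():
        lines.append("")
        lines.append(f"[{table}]")
        lines.extend(f"{key} = {fmt(val)}" for key, val in values.items())
    return "\n".join(lines) + "\n"


preset_text = render_preset(
    "single_domain",
    "All tasks are single-domain.",
    {"world.dist.domain_count": {"type": "constant", "value": 1}},
)
print(preset_text)
```

Saving the printed text as src/yc_bench/config/presets/single_domain.toml and running yc-bench run --config single_domain would complete the last two steps.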
Use constant distributions to isolate specific variables:
    # Remove prestige as a variable entirely
    [world.dist.required_prestige]
    type = "constant"
    value = 1

    # Force ALL tasks to be single-domain
    [world.dist.domain_count]
    type = "constant"
    value = 1
Deadlines too tight: If deadline_qty_per_day is too high, even perfectly played tasks will miss deadlines because employee throughput is insufficient.
Formula to check:
If this inequality fails for typical tasks, deadlines are mathematically impossible.
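One way to sanity-check deadline feasibility is sketched below. It rests on two simplifying assumptions that are not confirmed by this guide: that a task's deadline window is required_qty / deadline_qty_per_day days, and that the assigned team completes a fixed team_qty_per_day (a hypothetical parameter) of work each day. The function name is also hypothetical.

```python
def deadline_is_feasible(required_qty: float,
                         deadline_qty_per_day: float,
                         team_qty_per_day: float) -> bool:
    """Rough feasibility check under the assumptions stated above.

    ASSUMPTION: deadline window (days) = required_qty / deadline_qty_per_day.
    ASSUMPTION: the team delivers a constant team_qty_per_day units per day.
    """
    days_allowed = required_qty / deadline_qty_per_day
    days_needed = required_qty / team_qty_per_day
    return days_needed <= days_allowed


# A giant 8000-unit task with deadline_qty_per_day = 300.0
fast_team_ok = deadline_is_feasible(8000, 300.0, team_qty_per_day=400.0)
slow_team_ok = deadline_is_feasible(8000, 300.0, team_qty_per_day=250.0)
print(fast_team_ok, slow_team_ok)
```

Under these assumptions the task size cancels out, so the check reduces to whether the team's daily throughput is at least deadline_qty_per_day. The real simulation may account for factors this sketch ignores.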
Runway too short: If initial_funds / (num_employees × avg_salary) is less than ~3 months, agents may not have time to climb prestige and reach profitability.
Rule of thumb: Runway should be at least 2× the time needed to reach break-even prestige tier.
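The runway check can be worked through in a few lines. This is a minimal sketch, assuming salaries are monthly amounts in cents (so the ratio comes out in months). The helper names, the team size of 5, the average salary of 800_000 cents, and the break-even estimates are all hypothetical values chosen for the arithmetic.

```python
def runway_months(initial_funds_cents: int,
                  num_employees: int,
                  avg_monthly_salary_cents: int) -> float:
    """Months of payroll the starting funds cover, ignoring any revenue."""
    return initial_funds_cents / (num_employees * avg_monthly_salary_cents)


def runway_is_safe(months: float,
                   months_to_break_even: float,
                   min_months: float = 3.0) -> bool:
    """Apply both checks: the ~3 month floor and the 2x break-even rule of thumb."""
    return months >= min_months and months >= 2 * months_to_break_even


months = runway_months(18_000_000, num_employees=5, avg_monthly_salary_cents=800_000)
print(months)                                             # 4.5
print(runway_is_safe(months, months_to_break_even=2.0))   # True
print(runway_is_safe(months, months_to_break_even=3.0))   # False
```

In the last call the runway clears the 3 month floor but is shorter than twice the break-even time, so the preset would likely starve agents before they become profitable.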
Salary tier shares: The three salary tier share values MUST sum to exactly 1.0:
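A quick pre-flight check for this constraint might look like the sketch below. The tier_share_problems helper is hypothetical, and the tolerance is an assumption: this guide says the shares must sum to exactly 1.0, so the actual loader may be stricter than this check.

```python
import math


def tier_share_problems(shares, tol: float = 1e-9) -> list:
    """Return a list of human-readable problems; an empty list means the shares look valid."""
    problems = []
    if len(shares) != 3:
        problems.append(f"expected 3 tier shares, got {len(shares)}")
    if any(s < 0 for s in shares):
        problems.append("shares must be non-negative")
    total = math.fsum(shares)  # fsum reduces floating-point accumulation error
    if not math.isclose(total, 1.0, rel_tol=0.0, abs_tol=tol):
        problems.append(f"shares sum to {total}, expected 1.0")
    return problems


print(tier_share_problems((0.5, 0.3, 0.2)))  # no problems
print(tier_share_problems((0.5, 0.3, 0.3)))  # reports the bad sum
```

Values that are exactly representable in binary floating point, such as 0.5, 0.25, and 0.25, avoid any question about rounding if the loader compares against 1.0 strictly.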
Prestige gate lockout: If world.dist.required_prestige.mode is too high and reward_prestige_delta is too low, agents may be unable to climb fast enough to access high-reward tasks before running out of money.
Rule of thumb: climbing to a target tier takes roughly (target_prestige - 1) / avg_prestige_delta tasks. Ensure this many tasks can be completed within your runway.
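The climb estimate can be combined with the runway in a short calculation. This sketch assumes prestige starts at 1 (as the rule of thumb implies) and that tasks are completed one after another. The helper names, avg_days_per_task, and the example numbers are hypothetical.

```python
import math


def tasks_to_climb(target_prestige: float, avg_prestige_delta: float) -> int:
    """Number of successful tasks needed to go from prestige 1 to the target."""
    if avg_prestige_delta <= 0:
        raise ValueError("avg_prestige_delta must be positive")
    return math.ceil((target_prestige - 1) / avg_prestige_delta)


def climb_fits_runway(tasks_needed: int,
                      avg_days_per_task: float,
                      runway_days: float) -> bool:
    """ASSUMPTION: tasks run sequentially, each taking avg_days_per_task days."""
    return tasks_needed * avg_days_per_task <= runway_days


needed = tasks_to_climb(target_prestige=5, avg_prestige_delta=0.25)
print(needed)                                                            # 16
print(climb_fits_runway(needed, avg_days_per_task=5, runway_days=135))   # True
print(climb_fits_runway(needed, avg_days_per_task=10, runway_days=135))  # False
```

If the second check fails for realistic task durations, either lower the prestige requirement, raise the prestige reward, or increase the starting funds.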
Goal: Test whether agents can handle “burst” workloads: long quiet periods followed by deadline crunches.
    # presets/burst_workload.toml
    extends = "default"
    name = "burst_workload"
    description = "Tests burst workloads: deadlines are VERY tight but task qty is bimodal (some tiny, some huge)."

    [sim]
    horizon_years = 1

    [world]
    initial_funds_cents = 18_000_000

    # Tight deadlines across the board
    deadline_qty_per_day = 300.0

    # High parallelism penalty: agents must sequence carefully
    penalty_fail_multiplier = 1.6
    penalty_cancel_multiplier = 2.2

    # Skewed task sizes: density peaks at small tasks, with a long tail of giants.
    # This table comes last so the [world] keys above are not absorbed into it.
    [world.dist.required_qty]
    type = "triangular"
    low = 300    # Quick wins
    high = 8000  # All-hands-on-deck crunch
    mode = 500   # Density peaks at small tasks, but the big ones dominate
What this tests:
Can the agent recognize when a “giant” task appears and dedicate the team to it?
Does it use small tasks to fill gaps between big tasks?
Does it avoid accepting a giant task when another is in flight?
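Before running a preset like this, it can help to sample the required_qty spec offline and see what task mix it produces. The sketch below assumes YC-Bench's "triangular" type behaves like the standard triangular distribution, as implemented by Python's random.triangular. The helper name and the 1000-unit cutoff for a "small" task are hypothetical.

```python
import random
import statistics


def sample_required_qty(n: int, low: float = 300, high: float = 8000,
                        mode: float = 500, seed: int = 0) -> list:
    """Draw n task sizes from a triangular(low, high, mode) distribution."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.triangular(low, high, mode) for _ in range(n)]


samples = sample_required_qty(100_000)
median_qty = statistics.median(samples)
share_small = sum(q <= 1000 for q in samples) / len(samples)

print(f"median task size: {median_qty:.0f}")
print(f"share of tasks at or below 1000 units: {share_small:.1%}")
```

For these parameters the analytical median is about 2,600 units and only about 15% of tasks fall at or below 1,000 units, so under this assumption large tasks are common rather than rare. If the intent is for small tasks to be the majority, the spec would need adjusting.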