## Development Environment Setup

### Prerequisites
- Python 3.12+ (required)
- uv (recommended package manager)
- Git for version control
### Installation
- Clone the repository:
- Install the dependencies declared in `pyproject.toml`:
- Set up API keys: create a `.env` file in the project root:
- Verify the installation:
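A minimal sketch of the steps above, assuming a `uv`-managed project; the repository URL, API key name, and package name are placeholders rather than confirmed values:

```shell
# Clone the repository (URL is a placeholder)
git clone https://github.com/<org>/yc-bench.git
cd yc-bench

# Install dependencies from pyproject.toml with uv
uv sync

# Set up API keys: create a .env file in the project root
cat > .env <<'EOF'
OPENAI_API_KEY=sk-...   # replace with your key
EOF

# Verify the installation (package name is an assumption)
uv run python -c "import yc_bench"
```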
### Optional: PostgreSQL Support

By default, YC-Bench uses SQLite. For PostgreSQL:

## Running Tests
YC-Bench currently does not have a formal test suite. Contributions to add testing infrastructure are welcome! Recommended testing approach during development:

- Run a fast tutorial benchmark. The `tutorial` preset has relaxed deadlines and minimal prestige complexity, completing in ~10-20 turns.
- Inspect the output:
  - SQLite DB: `db/tutorial_1_<model>.db`
  - Rollout JSON: `results/yc_bench_result_tutorial_1_<model>.json`
  - Logs: `logs/debug.log` (if using the live dashboard)
- Use the CLI directly for manual testing:
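For the "inspect the output" step, a small sketch that loads a rollout JSON and lists its top-level keys; the file path pattern comes from the docs above, and no internal JSON structure is assumed:

```python
import json
from pathlib import Path


def summarize_rollout(path: str) -> list[str]:
    """Return the sorted top-level keys of a rollout JSON file."""
    data = json.loads(Path(path).read_text())
    return sorted(data)


# Usage (substitute your model name for <model>):
# summarize_rollout("results/yc_bench_result_tutorial_1_<model>.json")
```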
## Code Style and Standards

### Python Style

- Follow PEP 8 with a 100-character line length
- Use type hints for function signatures
- Prefer `from __future__ import annotations` for cleaner type syntax
- Docstrings: use for complex functions; keep them concise
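A short illustration of these conventions; the function itself is hypothetical:

```python
from __future__ import annotations


def net_burn(expenses: float, revenue: float) -> float:
    """Return monthly net burn; a negative value means the company is profitable."""
    return expenses - revenue
```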
### Import Order
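The exact grouping is not spelled out here; a common three-group convention (standard library, third-party, local), which presumably applies:

```python
# 1. Standard library
import json
from pathlib import Path

# 2. Third-party packages (e.g. pydantic, sqlalchemy)
# 3. Local application modules (e.g. the project's cli/ or core/ packages)
```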
### Database Access

- CLI commands: use the `get_db()` context manager (auto-commit)
- Services: accept a `db: Session` parameter
- Always flush after mutations: `db.flush()`
- Use UUIDs for primary keys (not auto-increment integers)
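A sketch of the `get_db()` pattern described above. The real implementation presumably wraps a SQLAlchemy `Session`; stdlib `sqlite3` is used here so the sketch runs standalone, and the insert shows the UUID-as-primary-key convention:

```python
import sqlite3
import uuid
from contextlib import contextmanager


@contextmanager
def get_db(path: str = ":memory:"):
    """Context manager that commits on clean exit and rolls back on error."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()  # auto-commit, matching the CLI convention above
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


with get_db() as db:
    db.execute("CREATE TABLE company (id TEXT PRIMARY KEY, name TEXT)")
    # UUID string primary key rather than an auto-increment integer
    db.execute("INSERT INTO company VALUES (?, ?)", (str(uuid.uuid4()), "Acme"))
```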
### Error Handling

- CLI commands: return JSON errors via `error_output("message")`
- Services: raise exceptions with descriptive messages
- Agent loop: catch exceptions and mark the run terminal with `TerminalReason.ERROR`
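A sketch of the JSON-error convention; the helper's real signature and payload shape are assumptions:

```python
import json


def error_output(message: str) -> str:
    """Print and return a JSON error payload (payload shape is illustrative)."""
    payload = json.dumps({"status": "error", "error": message})
    print(payload)
    return payload
```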
### Configuration

- All tunable parameters live in `config/schema.py`
- Add new parameters to the appropriate Pydantic model
- Document their meaning in inline comments
- Provide sensible defaults in field factories
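A sketch of this pattern. The project uses Pydantic models in `config/schema.py`; a stdlib dataclass is shown here so the sketch runs without third-party dependencies, and the field names are illustrative, not the real schema:

```python
from dataclasses import dataclass, field


@dataclass
class PrestigeConfig:
    """Illustrative config model; real parameters live in config/schema.py."""

    decay_rate: float = 0.05  # fraction of prestige lost per turn (illustrative)
    floor: float = 0.0        # prestige never decays below this value
    exempt_domains: list[str] = field(default_factory=list)  # default via factory
```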
## How to Add New Features

### Adding a New CLI Command
- Choose the appropriate module:
  - Company queries → `cli/company_commands.py`
  - Employee operations → `cli/employee_commands.py`
  - Task lifecycle → `cli/task_commands.py`
  - Market browsing → `cli/market_commands.py`
  - Financial data → `cli/finance_commands.py`
  - Reports → `cli/report_commands.py`
  - Simulation control → `cli/sim_commands.py`
  - Agent memory → `cli/scratchpad_commands.py`
- Define the command:
- Test the command:
- Add to the agent tool schema (if agent-accessible): update `agent/tools/run_command_schema.py` to document the new command.
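A hypothetical command following the conventions above (JSON output, JSON errors). The function name, payload shape, and in-memory stand-in for the database are all assumptions, not the project's real API:

```python
import json

# Stand-in for the real database so the sketch runs standalone
_COMPANIES = {"c-1": {"name": "Acme", "prestige": 42.0}}


def error_output(message: str) -> str:
    """Illustrative JSON-error helper matching the error-handling convention."""
    return json.dumps({"status": "error", "error": message})


def company_show(company_id: str) -> str:
    """Hypothetical `company show` command: JSON on success, JSON error otherwise."""
    company = _COMPANIES.get(company_id)
    if company is None:
        return error_output(f"company {company_id} not found")
    return json.dumps({"status": "ok", "company": company})
```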
### Modifying Simulation Mechanics

Example: change the prestige decay formula.

- Locate the implementation: `core/engine.py:apply_prestige_decay()`
- Modify the formula:
- Test with a benchmark run:
- Compare results:
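For orientation, a hedged sketch of what a decay function of this shape might look like; the actual formula in `core/engine.py` is not shown here and may differ:

```python
def apply_prestige_decay(prestige: float, decay_rate: float = 0.05, floor: float = 0.0) -> float:
    """Hypothetical multiplicative decay with a floor (illustrative only)."""
    return max(floor, prestige * (1.0 - decay_rate))
```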
### Creating New Agent Runtimes

To support a custom agent architecture or a new LLM provider:

- Create a runtime file:
- Register in factory:
- Test:
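A sketch of a runtime-plus-factory pattern like the one these steps describe; the base class, registry, and registration decorator are assumptions, not the project's real interfaces:

```python
class AgentRuntime:
    """Hypothetical base class for agent runtimes."""

    def step(self, observation: str) -> str:
        """Given an observation, return the next CLI command to run."""
        raise NotImplementedError


RUNTIMES: dict[str, type[AgentRuntime]] = {}


def register_runtime(name: str):
    """Decorator that registers a runtime class in the factory registry."""
    def decorator(cls: type[AgentRuntime]) -> type[AgentRuntime]:
        RUNTIMES[name] = cls
        return cls
    return decorator


@register_runtime("echo")
class EchoRuntime(AgentRuntime):
    """Trivial runtime for testing: echoes the observation instead of calling an LLM."""

    def step(self, observation: str) -> str:
        return f"noted: {observation}"
```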
### Adding New Configuration Parameters
- Update schema:
- Use in code:
- Add to presets:
## Pull Request Process

### Before Submitting

- Test your changes:
  - Run at least one full benchmark with the `tutorial` or `easy` preset
  - Verify no runtime errors or crashes
  - Check that the JSON output is well-formed
- Check code quality:
  - Remove debug print statements
  - Add docstrings to new functions/classes
  - Ensure type hints are present
  - Follow the existing code style
- Update documentation:
  - If adding a new CLI command, document it in `/docs/api-reference/`
  - If changing mechanics, update `/docs/how-it-works/`
  - If adding config parameters, document them in `/docs/configuration/`
### Submitting a PR
- Fork the repository and create a feature branch:
- Commit your changes:
  - Use imperative mood (“Add” not “Added”)
  - Keep the first line under 72 characters
  - Add details in the body if needed
- Push to your fork:
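The branch, commit, and push steps above can be sketched as follows; the branch and commit names are illustrative:

```shell
# Create a feature branch off main
git checkout -b feature/task-priority

# Commit with an imperative, <72-character first line
git add -A
git commit -m "Add task priority command"

# Push the branch to your fork
git push -u origin feature/task-priority
```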
- Open a pull request on GitHub:
  - Title: clear, concise description (e.g., “Add task priority command”)
  - Description:
    - What does this PR do?
    - Why is this change needed?
    - How was it tested?
    - Any breaking changes?
- Respond to review feedback:
  - Address reviewer comments
  - Push updates to the same branch
  - Mark conversations as resolved when addressed
### PR Checklist
- Code follows project style (PEP 8, type hints, imports)
- No debug print statements or commented-out code
- Changes tested with at least one benchmark run
- Documentation updated (if adding features)
- Commit messages are clear and descriptive
- No merge conflicts with `main`
## Areas for Contribution

### High Priority
- Test suite: Add pytest-based tests for core mechanics, CLI commands, and agent loop
- Metrics dashboard: Extend `runner/dashboard.py` with more visualizations
- Multi-agent support: Allow multiple agents to compete on the same world seed
- Evaluation framework: Automated scoring and leaderboard generation
### New Features
- Task dependencies: Tasks that unlock after completing prerequisites
- Employee hiring/firing: Dynamic workforce management
- Market events: Random external shocks (funding rounds, competitor launches)
- Custom domains: Allow users to define their own domain types
- Visualization tools: Plot prestige curves, cash flow, task timelines
### Documentation
- Tutorial videos: Walkthrough of CLI commands and strategy
- Example strategies: Annotated rollouts showing good vs. bad decisions
- API reference completeness: Full CLI command documentation
- Developer guides: Deep dives into specific subsystems
### Performance
- Profiling: Identify and optimize slow DB queries
- Batch operations: Reduce DB round-trips in CLI commands
- Parallel benchmarks: Run multiple seeds in parallel without conflicts
## Code of Conduct

YC-Bench is an open-source project. We expect contributors to:

- Be respectful of other contributors and maintainers
- Provide constructive feedback in code reviews
- Focus on technical merit rather than personal preferences
- Credit others’ work when building on existing code
## Getting Help
- GitHub Issues: For bug reports and feature requests
- GitHub Discussions: For questions and community chat
- Email: [email protected] for security issues