## Quick reference

The test runner script auto-activates your virtual environment and runs tests from the project root. It activates `.venv` if it exists and is not already active.

## Test organization

Tests are organized by tier in subdirectories: `tests/unit/`, `tests/smoke/`, `tests/integration/`, and `tests/e2e/`. Test selection is directory-based: `pytest tests/unit/` runs only unit tests, with no `--ignore` flags or marker filtering needed.

## Test markers
Lerim uses pytest markers to categorize tests. Markers are defined in `pyproject.toml`:
| Marker | Description | Environment variable |
|---|---|---|
| `unit` | Fast, deterministic tests with no LLM calls | N/A (default) |
| `smoke` | Quick LLM sanity checks | `LERIM_SMOKE=1` |
| `integration` | Real LLM pipeline tests | `LERIM_INTEGRATION=1` |
| `e2e` | End-to-end CLI flows | `LERIM_E2E=1` |
| `llm` | Non-agentic LLM integration tests | Set by integration tests |
| `agent` | Agent SDK or CLI integration tests | Set by integration tests |
| `embeddings` | Embedding integration tests | Set by integration tests |
| `openrouter` | OpenRouter provider tests | Set by provider tests |
| `openai` | OpenAI provider tests | Set by provider tests |
| `zai` | ZAI provider tests | Set by provider tests |
| `kimi` | Kimi provider tests | Set by provider tests |
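As an illustration of how these markers appear in test code, a marked test can be selected with `pytest -m smoke` (the test body below is hypothetical, not an actual Lerim test):

```python
import pytest

# Hypothetical smoke test showing marker usage; running it for real would
# also require LERIM_SMOKE=1 per the table above.
@pytest.mark.smoke
def test_model_answers_simple_prompt():
    answer = "4"  # placeholder for a real LLM round-trip
    assert answer.strip() != ""
```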
## Running tests with pytest

You can also run tests directly with pytest. Smoke, integration, and e2e tests run in parallel via `pytest-xdist` using `-n auto` to utilize all CPU cores.

## Test categories explained
### Unit tests (`tests/unit/`)
Fast, deterministic tests with no LLM calls and no network. External state (config paths, DB paths) is monkeypatched to temp directories.
**Execution time:** ~2 seconds
**Requirements:** none (no API keys needed)

Example test files:

- `test_claude_adapter.py` - Claude JSONL trace parsing and session discovery
- `test_codex_adapter.py` - Codex trace parsing and session metadata
- `test_adapter_registry.py` - Adapter loading and registration
- `test_memory_record.py` - Memory record construction and serialization
- `test_config.py` - Settings loading and TOML layer merging
Complete unit test listing:
- `test_adapter_common.py` - Shared adapter utilities
- `test_claude_adapter.py` - Claude adapter
- `test_codex_adapter.py` - Codex adapter
- `test_opencode_adapter.py` - OpenCode adapter
- `test_cursor_adapter.py` - Cursor adapter
- `test_adapter_registry.py` - Registry and platform discovery
- `test_memory_record.py` - Memory serialization
- `test_memory_schemas.py` - Pydantic schema validation
- `test_summary_write.py` - Summary file writing
- `test_catalog_queries.py` - Session catalog DB queries
- `test_fts.py` - Full-text search
- `test_config.py` - Configuration system
- `test_settings.py` - Settings layer precedence
- `test_project_scope.py` - Project scope resolution
- `test_cli.py` - CLI argument parsing
- `test_docker_compose.py` - Docker compose generation
- `test_runtime_tools.py` - Runtime tool boundary enforcement
### Smoke tests (`tests/smoke/`)
Quick LLM round-trips to verify basic functionality. Requires API keys.
**Execution time:** ~40 seconds (parallel)
**Requirements:** `LERIM_SMOKE=1` and LLM API keys
Test files:
- `test_pipelines.py` - DSPy extraction and summarization pipelines
- `test_agent.py` - PydanticAI agent basic response
### Integration tests (`tests/integration/`)
Multi-component flows with real LLM calls, real file I/O, and real DB writes.
**Execution time:** ~3 minutes (parallel)
**Requirements:** `LERIM_INTEGRATION=1` and LLM API keys
Test files:
- `test_extract.py` - Full DSPy extraction pipeline with fixtures
- `test_summarize.py` - Summarization pipeline with seeded memories
- `test_agent.py` - Full agent ask with memory context
- `test_providers.py` - LM provider construction with real backends
- `test_memory_write.py` - Agent-driven memory write flows
### E2E tests (`tests/e2e/`)
Full CLI command flows as a user would invoke them.
**Execution time:** ~5 minutes (parallel)
**Requirements:** `LERIM_E2E=1` and LLM API keys
Test files:
- `test_sync.py` - Full `lerim sync` against fixture traces
- `test_maintain.py` - Full `lerim maintain` on seeded memories
- `test_full_cycle.py` - Complete lifecycle: reset → sync → ask
- `test_context_layers.py` - Context layer resolution end-to-end
- `test_memory_write_modes.py` - Agent memory write modes
## Test script usage
The `tests/run_tests.sh` script provides several options:
- `lint` - Run ruff linter
- `unit` - Unit tests (no LLM calls)
- `smoke` - Smoke tests (quick LLM round-trips)
- `integration` - Integration tests (real LLM pipelines)
- `e2e` - End-to-end tests (full sync/maintain flows)
- `quality` - Compile check + pip check
- `all` - Run all groups in order
## Environment variables
| Variable | Required for | Default |
|----------|--------------|---------|
| `LERIM_SMOKE=1` | Smoke tests | - |
| `LERIM_INTEGRATION=1` | Integration tests | - |
| `LERIM_E2E=1` | E2E tests | - |
| `LERIM_CONFIG` | Override config path | `tests/test_config.toml` |
| `LERIM_TEST_PROVIDER` | Override provider | `openrouter` |
| `LERIM_TEST_MODEL` | Override model | `x-ai/grok-4.1-fast` |
The root `conftest.py` automatically applies `tests/test_config.toml` when running smoke/integration/e2e tests.

## Test fixtures

Lerim uses hand-crafted fixture files for deterministic testing. Fixtures are NOT auto-generated.

### Trace fixtures (`fixtures/traces/`)
| File | Format | Purpose |
|------|--------|---------|
| `claude_simple.jsonl` | Claude | JWT auth decision + CORS learning |
| `claude_long_multitopic.jsonl` | Claude | Multi-topic session for windowed extraction |
| `codex_simple.jsonl` | Codex | Basic Codex adapter parsing |
| `codex_with_tools.jsonl` | Codex | Tool call extraction |
| `debug_session.jsonl` | Generic | Debugging session for pitfall extraction |
| `mixed_decisions_learnings.jsonl` | Generic | Multiple primitives in one trace |
| `edge_short.jsonl` | Generic | Minimal conversation edge case |
| `edge_empty.jsonl` | Generic | Empty content handling |
### Memory fixtures (`fixtures/memories/`)
| File | Type | Purpose |
|------|------|---------|
| `decision_auth_pattern.md` | decision | JWT/HS256 auth decision |
| `learning_queue_fix.md` | learning | Atomic queue operations |
| `learning_stale.md` | learning | Low-confidence record for decay testing |
| `learning_duplicate_a.md` | learning | Deduplication test A |
| `learning_duplicate_b.md` | learning | Deduplication test B |
## Writing new tests
### Choose the appropriate tier
- Unit: No LLM, no network, mocked external state
- Smoke: Quick LLM sanity check
- Integration: Multi-component with real LLM
- E2E: Full CLI command flows
## Shared test infrastructure
### Root `conftest.py` fixtures
Available to all test tiers:
- `tmp_lerim_root` - Temporary directory with canonical Lerim folder structure
- `tmp_config` - `Config` object pointing at `tmp_lerim_root`
- `seeded_memory` - `tmp_lerim_root` with fixture memory files pre-populated
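A sketch of how a fixture like `tmp_lerim_root` might be built on top of pytest's `tmp_path`; the subdirectory names below are assumptions, and Lerim's actual `conftest.py` layout may differ:

```python
from pathlib import Path

import pytest

def make_lerim_root(base: Path) -> Path:
    # Assumed folder names for illustration; Lerim's canonical layout may differ.
    for sub in ("memories", "summaries", "catalog"):
        (base / sub).mkdir(parents=True, exist_ok=True)
    return base

@pytest.fixture
def tmp_lerim_root(tmp_path: Path) -> Path:
    # Each test gets a fresh, isolated root under pytest's tmp_path.
    return make_lerim_root(tmp_path)
```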
### Helper functions (`tests/helpers.py`)
- `make_config(base)` - Build a deterministic `Config` rooted at a path
- `write_test_config(tmp_path, **sections)` - Write a TOML config file
- `run_cli(args)` - Run a CLI command in-process, returns `(exit_code, stdout)`
- `run_cli_json(args)` - Run a CLI command and parse stdout as JSON
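A minimal stand-in illustrating the `run_cli` contract, assuming the documented `(exit_code, stdout)` return shape; the real helper in `tests/helpers.py` invokes Lerim's actual CLI entry point rather than the placeholder used here:

```python
import contextlib
import io

def run_cli(args: list[str]) -> tuple[int, str]:
    # Stand-in for tests/helpers.run_cli: run the CLI in-process,
    # capture stdout, and translate SystemExit into an exit code.
    buf = io.StringIO()
    exit_code = 0
    with contextlib.redirect_stdout(buf):
        try:
            print("lerim 0.0.0")  # placeholder for the real CLI main(args)
        except SystemExit as exc:
            exit_code = int(exc.code or 0)
    return exit_code, buf.getvalue()

exit_code, stdout = run_cli(["--version"])
assert exit_code == 0
```

Running in-process (rather than via `subprocess`) keeps these tests fast and lets pytest fixtures patch state before the command runs.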
### Tier-specific `conftest.py`
- `unit/conftest.py` - Autouse dummy API key for PydanticAI constructors
- `smoke/conftest.py` - Skip all unless `LERIM_SMOKE=1`
- `integration/conftest.py` - Skip all unless `LERIM_INTEGRATION=1`
- `e2e/conftest.py` - Skip all unless `LERIM_E2E=1`
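The skip-unless-env pattern used by these tier conftests can be sketched with pytest's collection hook (the hook body is an assumption, not Lerim's exact implementation):

```python
import os

import pytest

def pytest_collection_modifyitems(config, items):
    # Skip every collected test unless the tier's env var opts in,
    # mirroring what smoke/conftest.py does for LERIM_SMOKE.
    if os.environ.get("LERIM_SMOKE") != "1":
        skip_marker = pytest.mark.skip(reason="set LERIM_SMOKE=1 to run smoke tests")
        for item in items:
            item.add_marker(skip_marker)
```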
## CI/CD setup
Lerim uses GitHub Actions for continuous integration. The workflow is defined in `.github/workflows/ci.yml`.
Only unit tests run in CI by default. Smoke, integration, and e2e tests require API keys and are run manually or in separate workflows.
## Common test patterns
### Testing adapters
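A hedged sketch of the adapter-testing pattern: write a small JSONL trace to a temp path, parse it, and assert on the events. `parse_trace` is an illustrative stand-in for an adapter's parser, not Lerim's actual adapter API; real tests would use the fixture files listed above (e.g. `claude_simple.jsonl`):

```python
import json
from pathlib import Path

def parse_trace(path: Path) -> list[dict]:
    # Stand-in for an adapter's trace parser: one JSON event per line.
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]

def test_simple_trace_parses(tmp_path: Path):
    trace = tmp_path / "claude_simple.jsonl"
    trace.write_text(
        '{"role": "user", "content": "hi"}\n'
        '{"role": "assistant", "content": "hello"}\n'
    )
    events = parse_trace(trace)
    assert [e["role"] for e in events] == ["user", "assistant"]
```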
### Testing memory operations
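A hedged sketch of the memory-operation pattern: write a memory record under a temp root and assert on the result. `write_memory` is an illustrative helper, not Lerim's actual API; it models memories as markdown files with a type, loosely matching the fixture layout above:

```python
from pathlib import Path

def write_memory(root: Path, name: str, mem_type: str, body: str) -> Path:
    # Stand-in for a memory writer; the "type:" header line is an assumption.
    path = root / "memories" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"type: {mem_type}\n\n{body}\n")
    return path

def test_memory_roundtrip(tmp_path: Path):
    path = write_memory(tmp_path, "decision_auth_pattern", "decision", "Use JWT/HS256.")
    text = path.read_text()
    assert "type: decision" in text
    assert "JWT/HS256" in text
```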
### Testing CLI commands
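A hedged sketch of the CLI-testing pattern: exercise argument parsing in-process and catch `SystemExit` for failure cases. The parser below is an illustrative stand-in; Lerim's real CLI entry point, subcommands, and test helpers (`run_cli`) may differ:

```python
import argparse

import pytest

def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser with two subcommands named after real Lerim commands.
    parser = argparse.ArgumentParser(prog="lerim")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("sync")
    sub.add_parser("maintain")
    return parser

def test_known_subcommand_parses():
    args = build_parser().parse_args(["sync"])
    assert args.command == "sync"

def test_unknown_subcommand_exits_nonzero():
    # argparse raises SystemExit on invalid input; assert on the exit code.
    with pytest.raises(SystemExit) as excinfo:
        build_parser().parse_args(["frobnicate"])
    assert excinfo.value.code != 0
```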
## Next steps
- **Getting started** - Set up your development environment
- **Adding adapters** - Step-by-step guide for adding new agent platform adapters