Lerim has a comprehensive test suite organized into four tiers: unit, smoke, integration, and e2e. Tests are directory-based, with no marker filtering needed.

Quick reference

The test runner script auto-activates your virtual environment and runs tests from the project root:
tests/run_tests.sh unit
The test runner automatically activates .venv if it exists and is not already active.

Test organization

Tests are organized by tier in subdirectories:
tests/
  conftest.py              # Root: shared fixtures, marker registration
  helpers.py               # make_config, run_cli, etc.
  run_tests.sh             # Directory-based test selection
  test_config.toml         # Default LLM config for smoke/integration/e2e
  js_render_harness.js     # JavaScript rendering test
  fixtures/                # Shared across tiers
    traces/                # JSONL session traces
    memories/              # Seeded memory files
  unit/                    # Flat, descriptive names. No LLM, <5s.
    conftest.py            # autouse dummy API key
  smoke/                   # Quick LLM sanity, requires LERIM_SMOKE=1
    conftest.py            # Skip gate
  integration/             # Real LLM, quality checks, requires LERIM_INTEGRATION=1
    conftest.py            # Skip gate
  e2e/                     # Full CLI flows, requires LERIM_E2E=1
    conftest.py            # Skip gate
Test selection is directory-based: pytest tests/unit/ runs only unit tests. No --ignore flags or marker filtering needed.

Test markers

Lerim uses pytest markers to categorize tests. Markers are defined in pyproject.toml:
| Marker | Description | Environment variable |
|--------|-------------|----------------------|
| `unit` | Fast, deterministic tests with no LLM calls | N/A (default) |
| `smoke` | Quick LLM sanity checks | `LERIM_SMOKE=1` |
| `integration` | Real LLM pipeline tests | `LERIM_INTEGRATION=1` |
| `e2e` | End-to-end CLI flows | `LERIM_E2E=1` |
| `llm` | Non-agentic LLM integration tests | Set by integration tests |
| `agent` | Agent SDK or CLI integration tests | Set by integration tests |
| `embeddings` | Embedding integration tests | Set by integration tests |
| `openrouter` | OpenRouter provider tests | Set by provider tests |
| `openai` | OpenAI provider tests | Set by provider tests |
| `zai` | ZAI provider tests | Set by provider tests |
| `kimi` | Kimi provider tests | Set by provider tests |
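Since the page states that markers are defined in pyproject.toml, a sketch of what that registration might look like under `[tool.pytest.ini_options]` (the exact description strings are assumptions):

```toml
[tool.pytest.ini_options]
markers = [
    "unit: fast, deterministic tests with no LLM calls",
    "smoke: quick LLM sanity checks",
    "integration: real LLM pipeline tests",
    "e2e: end-to-end CLI flows",
    "llm: non-agentic LLM integration tests",
    "agent: agent SDK or CLI integration tests",
    "embeddings: embedding integration tests",
    "openrouter: OpenRouter provider tests",
    "openai: OpenAI provider tests",
    "zai: ZAI provider tests",
    "kimi: Kimi provider tests",
]
```

Registering markers this way keeps `pytest --strict-markers` from rejecting them as typos.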

Running tests with pytest

You can also run tests directly with pytest:
pytest tests/unit/ -x -q
Smoke, integration, and e2e tests run in parallel via pytest-xdist using -n auto to utilize all CPU cores.

Test categories explained

Unit tests (tests/unit/)

Fast, deterministic tests with no LLM calls and no network. External state (config paths, DB paths) is monkeypatched to temp directories. Execution time: ~2 seconds
Requirements: None (no API keys needed)
Example test files:
  • test_adapter_common.py - Shared adapter utilities
  • test_claude_adapter.py - Claude JSONL trace parsing and session discovery
  • test_codex_adapter.py - Codex trace parsing and session metadata
  • test_opencode_adapter.py - OpenCode adapter
  • test_cursor_adapter.py - Cursor adapter
  • test_adapter_registry.py - Adapter loading, registration, and platform discovery
  • test_memory_record.py - Memory record construction and serialization
  • test_memory_schemas.py - Pydantic schema validation
  • test_summary_write.py - Summary file writing
  • test_catalog_queries.py - Session catalog DB queries
  • test_fts.py - Full-text search
  • test_config.py - Settings loading and TOML layer merging
  • test_settings.py - Settings layer precedence
  • test_project_scope.py - Project scope resolution
  • test_cli.py - CLI argument parsing
  • test_docker_compose.py - Docker compose generation
  • test_runtime_tools.py - Runtime tool boundary enforcement
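As an illustration of the monkeypatch-to-temp-directory pattern these tests rely on, a minimal sketch (the config file contents and assertions here are hypothetical; real unit tests build state via `make_config` and the shared fixtures):

```python
import os
from pathlib import Path


def test_config_path_points_at_temp_dir(tmp_path, monkeypatch):
    """Unit tests redirect external state (here LERIM_CONFIG) to a temp dir."""
    cfg = tmp_path / "config.toml"
    cfg.write_text('[llm]\nprovider = "openrouter"\n')  # hypothetical contents
    monkeypatch.setenv("LERIM_CONFIG", str(cfg))
    # No user-level config can leak in: the env var points inside tmp_path.
    assert os.environ["LERIM_CONFIG"] == str(cfg)
    assert Path(os.environ["LERIM_CONFIG"]).exists()
```

Because `tmp_path` and `monkeypatch` are built-in pytest fixtures, the test is fully isolated and needs no cleanup.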

Smoke tests (tests/smoke/)

Quick LLM round-trips to verify basic functionality. Requires API keys. Execution time: ~40 seconds (parallel)
Requirements: LERIM_SMOKE=1 and LLM API keys
Test files:
  • test_pipelines.py - DSPy extraction and summarization pipelines
  • test_agent.py - PydanticAI agent basic response

Integration tests (tests/integration/)

Multi-component flows with real LLM calls, real file I/O, and real DB writes. Execution time: ~3 minutes (parallel)
Requirements: LERIM_INTEGRATION=1 and LLM API keys
Test files:
  • test_extract.py - Full DSPy extraction pipeline with fixtures
  • test_summarize.py - Summarization pipeline with seeded memories
  • test_agent.py - Full agent ask with memory context
  • test_providers.py - LM provider construction with real backends
  • test_memory_write.py - Agent-driven memory write flows

E2E tests (tests/e2e/)

Full CLI command flows as a user would invoke them. Execution time: ~5 minutes (parallel)
Requirements: LERIM_E2E=1 and LLM API keys
Test files:
  • test_sync.py - Full lerim sync against fixture traces
  • test_maintain.py - Full lerim maintain on seeded memories
  • test_full_cycle.py - Complete lifecycle: reset → sync → ask
  • test_context_layers.py - Context layer resolution end-to-end
  • test_memory_write_modes.py - Agent memory write modes

Test script usage

The tests/run_tests.sh script provides several options:
tests/run_tests.sh [group] [options]
Groups:
  • lint - Run ruff linter
  • unit - Unit tests (no LLM calls)
  • smoke - Smoke tests (quick LLM round-trips)
  • integration - Integration tests (real LLM pipelines)
  • e2e - End-to-end tests (full sync/maintain flows)
  • quality - Compile check + pip check
  • all - Run all groups in order
LLM configuration options:
--llm-provider PROVIDER
--llm-model MODEL
--llm-base-url URL
--agent-provider PROVIDER
--agent-model MODEL
Example:
tests/run_tests.sh smoke --llm-provider openrouter --llm-model x-ai/grok-4.1-fast
Some test tiers require API keys. Set these as environment variables:
  • OPENROUTER_API_KEY
  • ZAI_API_KEY or ZAI_CODING_API_KEY
  • OPENAI_API_KEY
  • ANTHROPIC_API_KEY (optional)

Environment variables

| Variable | Required for | Default |
|----------|--------------|---------|
| `LERIM_SMOKE=1` | Smoke tests | - |
| `LERIM_INTEGRATION=1` | Integration tests | - |
| `LERIM_E2E=1` | E2E tests | - |
| `LERIM_CONFIG` | Override config path | `tests/test_config.toml` |
| `LERIM_TEST_PROVIDER` | Override provider | `openrouter` |
| `LERIM_TEST_MODEL` | Override model | `x-ai/grok-4.1-fast` |
The root conftest.py automatically applies tests/test_config.toml when running smoke/integration/e2e tests.
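The provider/model overrides behave like ordinary environment lookups with defaults. A sketch of that resolution, using the variable names and defaults from the table above (`resolve_test_llm` is a hypothetical helper; the real lookup lives in the test configuration code):

```python
import os


def resolve_test_llm() -> tuple[str, str]:
    """Resolve the test provider/model, falling back to the documented defaults."""
    provider = os.environ.get("LERIM_TEST_PROVIDER", "openrouter")
    model = os.environ.get("LERIM_TEST_MODEL", "x-ai/grok-4.1-fast")
    return provider, model
```

Setting either variable before invoking the runner overrides just that value; leaving both unset exercises the defaults.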

Test fixtures

Lerim uses hand-crafted fixture files for deterministic testing. Fixtures are NOT auto-generated.

Trace fixtures (fixtures/traces/)

| File | Format | Purpose |
|------|--------|---------|
| `claude_simple.jsonl` | Claude | JWT auth decision + CORS learning |
| `claude_long_multitopic.jsonl` | Claude | Multi-topic session for windowed extraction |
| `codex_simple.jsonl` | Codex | Basic Codex adapter parsing |
| `codex_with_tools.jsonl` | Codex | Tool call extraction |
| `debug_session.jsonl` | Generic | Debugging session for pitfall extraction |
| `mixed_decisions_learnings.jsonl` | Generic | Multiple primitives in one trace |
| `edge_short.jsonl` | Generic | Minimal conversation edge case |
| `edge_empty.jsonl` | Generic | Empty content handling |

Memory fixtures (fixtures/memories/)

| File | Type | Purpose |
|------|------|---------|
| `decision_auth_pattern.md` | decision | JWT/HS256 auth decision |
| `learning_queue_fix.md` | learning | Atomic queue operations |
| `learning_stale.md` | learning | Low-confidence record for decay testing |
| `learning_duplicate_a.md` | learning | Deduplication test A |
| `learning_duplicate_b.md` | learning | Deduplication test B |
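As a rough illustration of the shape of a memory fixture, a hypothetical sketch of a learning record (field names are inferred from the schema examples elsewhere on this page; the real fixture files may use a different layout):

```markdown
---
type: learning
title: Atomic queue operations
confidence: 0.9
tags: [queue, concurrency]
---
Queue push/pop must happen in a single transaction, otherwise
concurrent workers can observe a half-applied state.
```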

Writing new tests

1. Choose the appropriate tier

  • Unit: No LLM, no network, mocked external state
  • Smoke: Quick LLM sanity check
  • Integration: Multi-component with real LLM
  • E2E: Full CLI command flows
2. Create the test file

Place it in the appropriate directory:
tests/unit/test_my_feature.py
tests/smoke/test_my_feature.py
tests/integration/test_my_feature.py
tests/e2e/test_my_feature.py
3. Add a docstring

Every test file must have a top-level docstring:
"""Tests for my new feature."""
4. Write focused tests

Each test function should test ONE thing:
def test_my_specific_behavior():
    """Test that my feature does X when Y."""
    # Arrange
    # Act
    # Assert
5. Use fixtures

Leverage shared fixtures from conftest.py:
def test_with_temp_memory(tmp_lerim_root):
    """Test using temporary Lerim directory."""
    assert tmp_lerim_root.exists()
6. Update documentation

Add your test file to tests/README.md with a brief description.

Shared test infrastructure

Root conftest.py fixtures

Available to all test tiers:
  • tmp_lerim_root - Temporary directory with canonical Lerim folder structure
  • tmp_config - Config object pointing at tmp_lerim_root
  • seeded_memory - tmp_lerim_root with fixture memory files pre-populated

Helper functions (tests/helpers.py)

  • make_config(base) - Build a deterministic Config rooted at a path
  • write_test_config(tmp_path, **sections) - Write a TOML config file
  • run_cli(args) - Run a CLI command in-process, returns (exit_code, stdout)
  • run_cli_json(args) - Run a CLI command and parse stdout as JSON

Tier-specific conftest.py

  • unit/conftest.py - Autouse dummy API key for PydanticAI constructors
  • smoke/conftest.py - Skip all unless LERIM_SMOKE=1
  • integration/conftest.py - Skip all unless LERIM_INTEGRATION=1
  • e2e/conftest.py - Skip all unless LERIM_E2E=1
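The skip gates are small. A sketch of what `smoke/conftest.py` might contain (the exact hook and skip reason are assumptions; the integration and e2e gates would differ only in the environment variable):

```python
import os

import pytest


def pytest_collection_modifyitems(config, items):
    """Skip every collected smoke test unless LERIM_SMOKE=1 is set."""
    if os.environ.get("LERIM_SMOKE") == "1":
        return
    skip = pytest.mark.skip(reason="set LERIM_SMOKE=1 to run smoke tests")
    for item in items:
        item.add_marker(skip)
```

Gating at collection time means an accidental `pytest tests/` never burns API credits: the LLM tiers show up as skipped rather than failing on missing keys.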

CI/CD setup

Lerim uses GitHub Actions for continuous integration. The workflow is defined in .github/workflows/ci.yml:
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 10

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v4
      - run: sudo apt-get update && sudo apt-get install -y ripgrep
      - run: uv venv && uv pip install -e '.[test,lint]'
      - run: uv run ruff check src/ tests/
      - run: uv run python -m pytest tests/unit/ -x -q
Only unit tests run in CI by default. Smoke, integration, and e2e tests require API keys and are run manually or in separate workflows.

Common test patterns

Testing adapters

from pathlib import Path
from lerim.adapters import claude

def test_read_claude_session(tmp_path):
    """Test Claude adapter reads JSONL session correctly."""
    # Create a fixture JSONL file
    session_file = tmp_path / "test.jsonl"
    session_file.write_text(
        '{"type":"user","message":{"content":"hello"}}\n'
        '{"type":"assistant","message":{"content":"hi"}}\n'
    )
    
    # Read the session
    session = claude.read_session(session_file)
    
    # Verify structure
    assert session is not None
    assert len(session.messages) == 2
    assert session.messages[0].role == "user"
    assert session.messages[0].content == "hello"

Testing memory operations

from lerim.memory.schemas import Decision

def test_decision_serialization(tmp_path):
    """Test decision record serializes to markdown correctly."""
    decision = Decision(
        title="Use JWT for auth",
        body="We chose JWT with HS256 for stateless auth.",
        confidence=0.9,
        tags=["auth", "jwt"]
    )
    
    # Write to file
    output = tmp_path / "decision.md"
    output.write_text(decision.to_markdown())
    
    # Read back and verify
    content = output.read_text()
    assert "title: Use JWT for auth" in content
    assert "JWT with HS256" in content

Testing CLI commands

from tests.helpers import run_cli

def test_memory_list_command(seeded_memory):
    """Test 'lerim memory list' returns expected records."""
    exit_code, output = run_cli(["memory", "list"])
    
    assert exit_code == 0
    assert "decision" in output.lower()
    assert "learning" in output.lower()

Next steps

  • Getting started - Set up your development environment
  • Adding adapters - Step-by-step guide for adding new agent platform adapters
