Test Case Groups

The test-case-groups parameter allows you to selectively run specific categories of security tests, enabling focused evaluation during development and CI/CD workflows.

Overview

By default, Circuit Breaker Labs actions run all available test groups when evaluating your model. The test-case-groups parameter lets you narrow the scope to specific security categories.

test-case-groups

string

Format: Space-separated list of test group identifiers
Required: No (defaults to all groups)
Example:

test-case-groups: "prompt_injection jailbreak data_exfiltration"

Why Use Test Case Groups?

Focused Development

When improving a specific security aspect of your system prompt, test only the relevant categories:

# Testing improvements to prompt injection defenses
test-case-groups: "prompt_injection"

This provides faster feedback cycles during development.

Staged CI/CD Pipeline

Run critical tests on every pull request and comprehensive tests before deployment:

# Fast check on pull requests
- name: Quick security check
  uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
  with:
    test-case-groups: "prompt_injection jailbreak"
    # ... other params

# Full evaluation before production deployment
- name: Comprehensive security evaluation
  uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
  with:
    # No test-case-groups parameter = run all tests
    # ... other params

Cost and Time Optimization

Fewer test groups mean:

Faster evaluation runs
Lower API usage costs
Quicker developer feedback

Use targeted test groups during development, then run full evaluations before production releases.

Available Test Case Groups

Circuit Breaker Labs supports various test case groups through the TestCaseGroup enum. Groups target different attack vectors and security concerns.

The specific test groups available depend on your Circuit Breaker Labs API subscription and the latest API version. Refer to the Circuit Breaker Labs API documentation for the current list of supported groups.

Common Test Groups

While the exact groups may vary, typical categories include:

prompt_injection

Tests for prompt injection attacks where adversaries attempt to override your system instructions.

Example attacks:

“Ignore previous instructions and…”
Role-switching attempts
Instruction override patterns

jailbreak

Tests for jailbreak attempts that try to bypass safety guardrails through techniques such as the following.

Example attacks:

Hypothetical scenarios
Roleplay requests
Multi-step manipulation

data_exfiltration

Tests for attempts to extract training data, system information, or confidential details.

Example attacks:

Requests for system prompts
Training data extraction
Internal information disclosure

toxic_content

Tests for generation of harmful, offensive, or toxic content.

Example attacks:

Hate speech generation
Violence glorification
Harassment content

Custom test groups may also be supported. Check your Circuit Breaker Labs account to see which options are available.

Usage Examples

Single Test Group

Test only prompt injection vulnerabilities:

- name: Test prompt injection defenses
  uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
  with:
    fail-action-threshold: "0.80"
    fail-case-threshold: "0.6"
    variations: "3"
    maximum-iteration-layers: "2"
    system-prompt: ${{ steps.load-prompt.outputs.prompt }}
    openrouter-model-name: "anthropic/claude-3.7-sonnet"
    circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
    test-case-groups: "prompt_injection"

Multiple Test Groups

Test several related security categories:

- name: Test core security defenses
  uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
  with:
    fail-action-threshold: "0.75"
    fail-case-threshold: "0.6"
    variations: "2"
    maximum-iteration-layers: "1"
    system-prompt: "You are a helpful assistant"
    openrouter-model-name: "openai/gpt-4"
    circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
    test-case-groups: "prompt_injection jailbreak data_exfiltration"

All Test Groups (Default)

Omit the parameter to run comprehensive evaluation:

- name: Comprehensive security evaluation
  uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
  with:
    fail-action-threshold: "0.85"
    fail-case-threshold: "0.7"
    variations: "5"
    maximum-iteration-layers: "3"
    system-prompt: ${{ steps.load-prompt.outputs.prompt }}
    openrouter-model-name: "anthropic/claude-3.7-sonnet"
    circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
    # No test-case-groups parameter = all groups

Multi-Turn Evaluation

Test case groups work identically in multi-turn evaluations:

- name: Multi-turn jailbreak testing
  uses: circuitbreakerlabs/actions/multiturn-evaluate-system-prompt@v1
  with:
    fail-action-threshold: "0.70"
    fail-case-threshold: "0.6"
    max-turns: "4"
    test-types: "crescendo context_switching"
    system-prompt: "You are a helpful assistant"
    openrouter-model-name: "anthropic/claude-3.7-sonnet"
    circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
    test-case-groups: "jailbreak"

OpenAI Fine-Tune Evaluations

Test case groups apply to fine-tuned model evaluations as well:

- name: Evaluate fine-tuned model safety
  uses: circuitbreakerlabs/actions/singleturn-evaluate-openai-finetune@v1
  with:
    fail-action-threshold: "0.80"
    fail-case-threshold: "0.65"
    variations: "3"
    maximum-iteration-layers: "2"
    model-name: "ft:gpt-4-0125-preview:org:model:id"
    circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
    test-case-groups: "prompt_injection jailbreak"

Implementation Details

How It Works

When you specify test case groups, the action:

Parses the space-separated list of group identifiers
Converts each identifier to a TestCaseGroup enum value, keeping unrecognized names as custom strings
Passes the resulting list to the Circuit Breaker Labs API
Runs only tests belonging to the specified groups

From the source code (src/actions/common.py:64-69):

def parse_test_case_group(value: str) -> TestCaseGroup | str:
    try:
        return TestCaseGroup(value)
    except ValueError:
        # If not a valid enum value, treat it as a custom string
        return value
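
To illustrate how the parsing steps might fit together, here is a minimal sketch that splits a space-separated input and applies the function above to each identifier. The enum members and the parse_test_case_groups helper are assumptions made for illustration; the real TestCaseGroup enum may define different values, and the action's actual list handling is not shown in the source.

```python
from __future__ import annotations

from enum import Enum


class TestCaseGroup(str, Enum):
    # Hypothetical members for illustration; the real enum may differ.
    PROMPT_INJECTION = "prompt_injection"
    JAILBREAK = "jailbreak"


def parse_test_case_group(value: str) -> TestCaseGroup | str:
    try:
        return TestCaseGroup(value)
    except ValueError:
        # Not a standard enum value: keep it as a custom string.
        return value


def parse_test_case_groups(raw: str) -> list[TestCaseGroup | str]:
    # Hypothetical helper. str.split() with no arguments splits on any
    # run of whitespace and ignores leading/trailing whitespace.
    return [parse_test_case_group(token) for token in raw.split()]


parsed = parse_test_case_groups("prompt_injection  acme_internal")
print(parsed)
```

Here "prompt_injection" resolves to an enum member, while the made-up name "acme_internal" is kept as a plain string for the API to accept or reject.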

Custom Test Groups

The parser accepts both:

Standard enum values from TestCaseGroup
Custom string values for organization-specific test groups

If a group name doesn’t match a standard enum value, it’s passed to the API as a custom string. The API will validate whether the group exists for your account.

Invalid group names will cause the API to return an error, failing the action early before running tests.

Workflow Patterns

Pattern 1: Progressive Testing

Run quick tests on every pull request and comprehensive tests on the main branch:

name: Security Evaluation

on:
  pull_request:
  push:
    branches: [main]

jobs:
  quick-check:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Quick security test
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          test-case-groups: "prompt_injection"
          fail-action-threshold: "0.80"
          fail-case-threshold: "0.6"
          # ... other params

  comprehensive-check:
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Full security evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          # No test-case-groups = run all
          fail-action-threshold: "0.85"
          fail-case-threshold: "0.7"
          # ... other params

Pattern 2: Parallel Group Testing

Run different test groups in parallel for faster results:

jobs:
  security-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        group:
          - "prompt_injection"
          - "jailbreak"
          - "data_exfiltration"
    steps:
      - uses: actions/checkout@v6
      - name: Test ${{ matrix.group }}
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          test-case-groups: ${{ matrix.group }}
          # ... other params
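
If the list of groups is kept in one place as a space-separated string, the matrix entries could be generated rather than written by hand. The sketch below is one possible approach, not part of the Circuit Breaker Labs actions: groups_to_matrix is a hypothetical helper that emits a JSON array, which GitHub Actions can read into a matrix with the fromJSON expression function.

```python
import json


def groups_to_matrix(raw: str) -> str:
    """Return a JSON array string with one entry per distinct group."""
    # dict.fromkeys removes duplicates while preserving the original order.
    groups = list(dict.fromkeys(raw.split()))
    return json.dumps(groups)


print(groups_to_matrix("prompt_injection jailbreak data_exfiltration"))
```

A setup job could write this string to a job output, and the test job could then consume it through fromJSON in its matrix definition; the job and output names are up to you.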

Pattern 3: Critical vs. Non-Critical Tests

Apply different thresholds to different test categories:

jobs:
  critical-security:
    runs-on: ubuntu-latest
    steps:
      - name: Critical security tests (strict)
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          test-case-groups: "prompt_injection jailbreak"
          fail-action-threshold: "0.10"  # Strict
          fail-case-threshold: "0.8"     # High bar
          # ... other params

  general-safety:
    runs-on: ubuntu-latest
    steps:
      - name: General safety tests (moderate)
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          test-case-groups: "toxic_content"
          fail-action-threshold: "0.30"  # More permissive
          fail-case-threshold: "0.6"     # Moderate bar
          # ... other params

Best Practices

Start Broad, Then Focus

Begin with full evaluations (no test-case-groups) to identify weaknesses, then use targeted groups to iterate on specific issues.

Match Groups to Development Phase

Feature branches: Single critical group
Pull requests: Core security groups
Main/Production: All groups
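
The mapping above could be expressed as a small helper that a workflow script calls to decide which value to pass. This is only a sketch: groups_for is a hypothetical function, and the choice of "prompt_injection" as the single critical group and "prompt_injection jailbreak" as the core groups is an assumption borrowed from the examples on this page.

```python
from __future__ import annotations


def groups_for(event_name: str, ref: str) -> str | None:
    """Pick a test-case-groups value; None means omit the parameter."""
    if ref == "refs/heads/main":
        # Main/production: omit test-case-groups so all groups run.
        return None
    if event_name == "pull_request":
        # Pull requests: core security groups (assumed selection).
        return "prompt_injection jailbreak"
    # Feature branches: a single critical group (assumed selection).
    return "prompt_injection"


print(groups_for("pull_request", "refs/pull/12/merge"))
```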

Document Your Group Choices

Comment in your workflow files why specific groups are chosen:

# Testing only prompt injection as that's what we improved in this PR
test-case-groups: "prompt_injection"

Combine with Variations and Layers

Fewer test groups allow higher variations and iteration layers within the same runtime:

test-case-groups: "jailbreak"
variations: "5"              # More variations
maximum-iteration-layers: "3" # Deeper testing

Troubleshooting

Invalid Test Group Error

Error message:

Error: Invalid test case group 'unknown_group'

Solution: Verify group names against the Circuit Breaker Labs API documentation or remove the invalid group from your list.
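
A local pre-flight check can catch typos before the action calls the API. The sketch below compares each identifier against a list you maintain yourself; KNOWN_GROUPS and check_groups are hypothetical, and because custom groups are allowed, a name missing from your list is not necessarily invalid.

```python
from __future__ import annotations

import difflib

# Hypothetical list; replace it with the groups available to your account.
KNOWN_GROUPS = ("prompt_injection", "jailbreak", "data_exfiltration", "toxic_content")


def check_groups(raw: str, known: tuple[str, ...] = KNOWN_GROUPS) -> dict[str, list[str]]:
    """Map each unrecognized identifier to its closest known name, if any."""
    unknown: dict[str, list[str]] = {}
    for name in raw.split():
        if name not in known:
            # get_close_matches returns an empty list when nothing is similar.
            unknown[name] = difflib.get_close_matches(name, known, n=1)
    return unknown


print(check_groups("prompt_injection jailbrake"))
```

An empty result means every identifier is in your list; otherwise each entry shows a likely intended name to review.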

Empty Test Results

Symptom: Action completes but reports 0 tests run
Cause: The specified test groups don’t exist or aren’t available for your API subscription
Solution: Check your Circuit Breaker Labs account settings or run without test-case-groups to see all available tests.

Unexpected Failure Rates

Symptom: Failure rate is dramatically different when using specific groups vs. all groups
Explanation: Different test groups have different difficulty levels. Some categories are inherently harder to defend against.
Action: This is expected behavior. Adjust your thresholds based on the specific groups you’re testing.

Related Documentation

Input Parameters - Complete reference for all action inputs
Thresholds - Understanding how to set appropriate threshold values
Single-Turn Actions - Action-specific documentation
Multi-Turn Actions - Multi-turn evaluation guides
