
Overview

This guide will help you set up your first AI safety evaluation using Circuit Breaker Labs GitHub Actions. You’ll learn how to evaluate a system prompt for potential security vulnerabilities using automated testing in your CI/CD pipeline.

Prerequisites

Before you begin, make sure you have:

GitHub Repository

A GitHub repository where you want to run evaluations

Circuit Breaker Labs Account

Sign up at circuitbreakerlabs.ai to get your API key
Circuit Breaker Labs provides comprehensive safety testing for AI systems, including prompt injection detection, jailbreak attempts, and other security vulnerabilities.

Step 1: Get Your API Key

1. Sign up for Circuit Breaker Labs

Visit circuitbreakerlabs.ai and create an account.
2. Generate an API key

Navigate to your dashboard and generate a new API key. Copy this key - you’ll need it in the next step.
3. Store as a GitHub Secret

In your GitHub repository, go to Settings → Secrets and variables → Actions → New repository secret. Create a secret with:
  • Name: CBL_API_KEY
  • Value: Your Circuit Breaker Labs API key
Never commit your API key directly in your workflow files. Always use GitHub secrets to keep your credentials secure.
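If you use the GitHub CLI, the same secret can be created from your terminal instead of the web UI (a sketch assuming `gh` is installed and authenticated for this repository):

```shell
# Prompts for the value interactively, so the key never lands in shell history
gh secret set CBL_API_KEY
```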

Step 2: Create Your First Workflow

Create a new file in your repository at .github/workflows/evaluate-prompt.yml:
```yaml
name: Evaluate System Prompt

on:
  workflow_dispatch:

jobs:
  evaluate:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Run system prompt evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.80"
          fail-case-threshold: "0.5"
          variations: "1"
          maximum-iteration-layers: "1"
          system-prompt: "You are a helpful assistant"
          openrouter-model-name: "anthropic/claude-3.7-sonnet"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```

Understanding the Parameters

| Parameter | Description | Example Value |
| --- | --- | --- |
| `fail-action-threshold` | Failure rate above this threshold fails the workflow | `0.80` (80%) |
| `fail-case-threshold` | Score above which an individual test case is considered failed | `0.5` (50%) |
| `variations` | Number of test variations to run per test case | `1` |
| `maximum-iteration-layers` | Maximum depth of iterative testing | `1` |
| `system-prompt` | The system prompt text to evaluate | `"You are a helpful assistant"` |
| `openrouter-model-name` | Model to test (via OpenRouter) | `"anthropic/claude-3.7-sonnet"` |
The fail-action-threshold determines when your workflow fails. Setting it to 0.80 means if more than 80% of test cases fail, the action will fail your CI/CD pipeline.
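To make the two thresholds concrete, here is a small shell sketch of the pass/fail logic described above, using made-up scores. The real scoring is done by the Circuit Breaker Labs API; this only illustrates how the two thresholds interact.

```shell
# Hypothetical per-case vulnerability scores (higher = more vulnerable)
scores="0.2 0.7 0.9 0.1 0.6 0.3 0.8 0.4 0.95 0.2"
fail_case_threshold=0.5
fail_action_threshold=0.80

failed=0; total=0
for s in $scores; do
  total=$((total + 1))
  # A case counts as failed when its score exceeds fail-case-threshold
  if awk -v s="$s" -v t="$fail_case_threshold" 'BEGIN { exit !(s > t) }'; then
    failed=$((failed + 1))
  fi
done

# The workflow fails when the overall failure rate exceeds fail-action-threshold
if awk -v f="$failed" -v n="$total" -v t="$fail_action_threshold" \
     'BEGIN { exit !((f / n) > t) }'; then
  echo "workflow: FAIL ($failed/$total cases failed)"
else
  echo "workflow: PASS ($failed/$total cases failed)"  # prints this branch
fi
```

With these numbers, 5 of 10 cases exceed the 0.5 case threshold, giving a 50% failure rate. That stays under the 0.80 action threshold, so the workflow would pass.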

Step 3: Run Your Evaluation

1. Commit and push the workflow

```shell
git add .github/workflows/evaluate-prompt.yml
git commit -m "Add Circuit Breaker Labs evaluation workflow"
git push
```
2. Trigger the workflow manually

  1. Go to your repository on GitHub
  2. Click on the Actions tab
  3. Select Evaluate System Prompt from the left sidebar
  4. Click Run workflow, then confirm with the Run workflow button in the dropdown
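The same manual trigger is also available from the terminal via the GitHub CLI (assumes `gh` is installed and authenticated for this repository):

```shell
# Trigger the workflow by the name declared in its YAML file
gh workflow run "Evaluate System Prompt"

# Follow the resulting run's progress live
gh run watch
```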
3. Monitor the evaluation

Watch the workflow execution in real-time. The action will test your system prompt against various security scenarios.

Step 4: View Results

Once the workflow completes, you’ll see:

Pass/Fail Status

Whether your system prompt passed the security evaluation based on your thresholds

Detailed Logs

Complete test results including which test cases passed or failed

Understanding Results

The evaluation will:
  • Test your system prompt against known attack vectors
  • Generate variations of test cases to find edge cases
  • Score each test on how well your prompt resists manipulation
  • Fail the workflow if too many tests exceed the failure threshold
A lower score indicates better security. If a test case scores above your fail-case-threshold, it means the prompt was vulnerable to that specific attack.

Next Steps

Explore All Actions

Learn about all available evaluation actions and their parameters

Fine-tune Evaluations

Evaluate fine-tuned OpenAI models instead of system prompts

Advanced Workflows

Set up automated evaluations on pull requests or scheduled runs

API Documentation

Explore the full Circuit Breaker Labs API

Common Patterns

Evaluate on Pull Requests

Automatically test system prompt changes in pull requests:
```yaml
name: PR Evaluation

on:
  pull_request:
    paths:
      - "prompts/**"
      - "system_prompt.txt"

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Read system prompt
        id: prompt
        run: |
          # Use a delimiter so multi-line prompts are passed through intact
          {
            echo "content<<PROMPT_EOF"
            cat system_prompt.txt
            echo "PROMPT_EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Evaluate
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.80"
          fail-case-threshold: "0.5"
          variations: "2"
          maximum-iteration-layers: "2"
          system-prompt: ${{ steps.prompt.outputs.content }}
          openrouter-model-name: "anthropic/claude-3.7-sonnet"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```

Scheduled Security Audits

Run regular security audits of your AI systems:
```yaml
name: Weekly Security Audit

on:
  schedule:
    - cron: '0 0 * * 1'  # Every Monday at midnight
  workflow_dispatch:

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Comprehensive evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.4"
          variations: "5"
          maximum-iteration-layers: "3"
          system-prompt: ${{ vars.PRODUCTION_SYSTEM_PROMPT }}
          openrouter-model-name: "anthropic/claude-3.7-sonnet"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```
For production systems, use higher variations and maximum-iteration-layers values to run more thorough tests. This will increase evaluation time but provide better security coverage.

Troubleshooting

Workflow fails immediately

  • Check your API key: Ensure CBL_API_KEY is correctly set in your repository secrets
  • Verify syntax: Make sure your YAML file is properly formatted
  • Review parameters: All required inputs must be provided with valid values
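A quick local check can catch YAML syntax problems before you push. This is a sketch assuming Python 3 with PyYAML installed; adjust the path if your workflow file is named differently:

```shell
python3 -c '
import sys, yaml
path = ".github/workflows/evaluate-prompt.yml"
try:
    yaml.safe_load(open(path))
    print("YAML OK:", path)
except yaml.YAMLError as e:
    sys.exit("YAML error: %s" % e)
'
```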

All tests are failing

  • Your system prompt may be vulnerable to common attacks
  • Try adjusting the fail-case-threshold to better calibrate what constitutes a failure
  • Review the detailed logs to understand which specific test cases are failing
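For example, if legitimate behavior is being counted as failure, the `with:` block can be recalibrated (illustrative values only; tune them to your own risk tolerance):

```yaml
        with:
          # Count a case as failed only at a higher vulnerability score
          fail-case-threshold: "0.7"
          # Tolerate a higher overall failure rate before failing CI
          fail-action-threshold: "0.90"
          # (other inputs unchanged)
```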

Need help?

Visit the Circuit Breaker Labs documentation or contact support for assistance.

What’s Happening Under the Hood

When you run a Circuit Breaker Labs evaluation:
  1. The action calls the Circuit Breaker Labs API with your system prompt and configuration
  2. The API generates adversarial test cases designed to exploit common vulnerabilities
  3. Each test is executed against your specified model using OpenRouter
  4. Results are scored based on whether the model’s responses indicate a security breach
  5. The workflow passes or fails based on your configured thresholds
This is equivalent to making a direct API call:
```shell
curl -X 'POST' \
  'https://api.circuitbreakerlabs.ai/v1/singleturn_evaluate_system_prompt' \
  -H 'accept: application/json' \
  -H "cbl-api-key: $CBL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "threshold": 0.5,
  "variations": 1,
  "maximum_iteration_layers": 1,
  "openrouter_model_name": "anthropic/claude-3.7-sonnet",
  "system_prompt": "You are a helpful assistant"
}'
```
The GitHub Action simplifies this process and integrates it seamlessly into your CI/CD pipeline.
