Quickstart

Overview

This guide will help you set up your first AI safety evaluation using Circuit Breaker Labs GitHub Actions. You’ll learn how to evaluate a system prompt for potential security vulnerabilities using automated testing in your CI/CD pipeline.

Prerequisites

Before you begin, make sure you have:

GitHub Repository

A GitHub repository where you want to run evaluations

Circuit Breaker Labs Account

An account at circuitbreakerlabs.ai. Circuit Breaker Labs provides safety testing for AI systems, covering prompt injection, jailbreak attempts, and other security vulnerabilities.

Step 1: Get Your API Key

Visit circuitbreakerlabs.ai and create an account.

Generate an API key

Navigate to your dashboard and generate a new API key. Copy this key; you’ll need it in the next step.

Store as a GitHub Secret

In your GitHub repository, go to Settings → Secrets and variables → Actions → New repository secret. Create a secret with:

Name: CBL_API_KEY
Value: Your Circuit Breaker Labs API key

Never commit your API key directly in your workflow files. Always use GitHub secrets to keep your credentials secure.
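Outside GitHub Actions, for example when calling the API from a local script, the same rule applies: read the key from the environment instead of writing it into code. A minimal Python sketch, assuming you export the key under the same name, `CBL_API_KEY`; the helper `load_api_key` is hypothetical and not part of any Circuit Breaker Labs SDK:

```python
import os


def load_api_key(env_var: str = "CBL_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical helper)."""
    # Read from the environment so the key never lands in source control
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message rather than sending an unauthenticated request
        raise RuntimeError(f"{env_var} is not set; export it before running this script")
    return key
```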

Step 2: Create Your First Workflow

Create a new file in your repository at .github/workflows/evaluate-prompt.yml:

```yaml
name: Evaluate System Prompt

on:
  workflow_dispatch:

jobs:
  evaluate:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Run system prompt evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.80"
          fail-case-threshold: "0.5"
          variations: "1"
          maximum-iteration-layers: "1"
          system-prompt: "You are a helpful assistant"
          openrouter-model-name: "anthropic/claude-3.7-sonnet"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```

Understanding the Parameters

| Parameter | Description | Example value |
| --- | --- | --- |
| `fail-action-threshold` | Fraction of failed test cases above which the workflow fails | `0.80` (80%) |
| `fail-case-threshold` | Score above which an individual test case is considered failed | `0.5` |
| `variations` | Number of test variations to run per test case | `1` |
| `maximum-iteration-layers` | Maximum depth of iterative testing | `1` |
| `system-prompt` | The system prompt text to evaluate | `"You are a helpful assistant"` |
| `openrouter-model-name` | Model to test (via OpenRouter) | `"anthropic/claude-3.7-sonnet"` |
| `circuit-breaker-labs-api-key` | Your Circuit Breaker Labs API key, read from the `CBL_API_KEY` secret | `${{ secrets.CBL_API_KEY }}` |
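GitHub Actions passes every input to an action as a string, which is why the workflow quotes the numbers. The curl example later in this guide suggests how these inputs map onto the API request body; the Python sketch below assumes that mapping, in which `fail-case-threshold` is sent as `threshold` and `fail-action-threshold` is not sent at all (it appears to be applied by the action to the results). The function name `inputs_to_payload` is hypothetical:

```python
def inputs_to_payload(inputs: dict[str, str]) -> dict:
    """Convert action inputs (always strings) into a JSON-ready request body.

    Assumed mapping, inferred from the curl example in this guide:
    fail-case-threshold becomes "threshold"; fail-action-threshold is
    not sent and is presumably applied by the action to the results.
    """
    return {
        "threshold": float(inputs["fail-case-threshold"]),
        "variations": int(inputs["variations"]),
        "maximum_iteration_layers": int(inputs["maximum-iteration-layers"]),
        "openrouter_model_name": inputs["openrouter-model-name"],
        "system_prompt": inputs["system-prompt"],
    }


# Inputs from the workflow in Step 2 (the API key is omitted here)
quickstart_inputs = {
    "fail-action-threshold": "0.80",
    "fail-case-threshold": "0.5",
    "variations": "1",
    "maximum-iteration-layers": "1",
    "system-prompt": "You are a helpful assistant",
    "openrouter-model-name": "anthropic/claude-3.7-sonnet",
}
payload = inputs_to_payload(quickstart_inputs)
```

With the quickstart values, the resulting dictionary matches the body of the curl request shown under What’s Happening Under the Hood.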

The fail-action-threshold determines when your workflow fails. Setting it to 0.80 means if more than 80% of test cases fail, the action will fail your CI/CD pipeline.
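The rule can be written as a small Python function. This is an illustrative sketch of the comparison described above (a strict "more than"), not the action's source code, and `workflow_should_fail` is a hypothetical name:

```python
def workflow_should_fail(failed_cases: int, total_cases: int, fail_action_threshold: float) -> bool:
    """Return True if the share of failed cases is strictly above the threshold."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    failure_rate = failed_cases / total_cases
    # "More than 80%" means a strict comparison: exactly 80% still passes
    return failure_rate > fail_action_threshold


print(workflow_should_fail(8, 10, 0.80))  # False (exactly 80%)
print(workflow_should_fail(9, 10, 0.80))  # True (90%)
```

With 10 test cases and a threshold of 0.80, eight failures still pass; a ninth failure fails the workflow.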

Step 3: Run Your Evaluation

Commit and push the workflow

```bash
git add .github/workflows/evaluate-prompt.yml
git commit -m "Add Circuit Breaker Labs evaluation workflow"
git push
```

Trigger the workflow manually

Go to your repository on GitHub
Click on the Actions tab
Select Evaluate System Prompt from the left sidebar
Click Run workflow → Run workflow
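Because the workflow uses the workflow_dispatch trigger, you can also start it without the web UI, either with the GitHub CLI (`gh workflow run evaluate-prompt.yml`) or through GitHub's REST endpoint for creating a workflow dispatch event. A minimal Python sketch using only the standard library; `your-org`, `your-repo`, the `main` branch, and the helper name are placeholders, and the token needs permission to run Actions in the repository:

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str, ref: str = "main") -> urllib.request.Request:
    """Build a request for GitHub's "create a workflow dispatch event" endpoint."""
    # The workflow can be identified by its file name instead of a numeric ID
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        "/actions/workflows/evaluate-prompt.yml/dispatches"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"ref": ref}).encode("utf-8"),  # branch or tag to run on
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


# Placeholder values; sending requires a real token:
# urllib.request.urlopen(build_dispatch_request("your-org", "your-repo", token))
```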

Monitor the evaluation

Watch the workflow execution in real time. The action tests your system prompt against a range of security scenarios.
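To follow the run from a script instead, GitHub's REST API lists runs for a workflow at `GET /repos/{owner}/{repo}/actions/workflows/evaluate-prompt.yml/runs`. The sketch below only interprets an already-decoded response; it assumes the newest run is listed first and reads the `status` and `conclusion` fields of each run. The function name is hypothetical:

```python
def summarize_latest_run(runs_response: dict) -> str:
    """Describe the newest run in a decoded "list workflow runs" response."""
    runs = runs_response.get("workflow_runs", [])
    if not runs:
        return "no runs yet"
    latest = runs[0]  # assumed to be the most recent run
    if latest["status"] != "completed":
        # e.g. "queued" or "in_progress": no conclusion is available yet
        return f"running ({latest['status']})"
    # conclusion is e.g. "success" or "failure"
    return f"completed: {latest['conclusion']}"
```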

Step 4: View Results

Once the workflow completes, you’ll see:

Pass/Fail Status

Whether your system prompt passed the security evaluation based on your thresholds

Detailed Logs

Complete test results including which test cases passed or failed

Understanding Results

The evaluation will:

Test your system prompt against known attack vectors
Generate variations of test cases to find edge cases
Score each test on how well your prompt resists manipulation
Fail the workflow if too many tests exceed the failure threshold

A lower score indicates better security. If a test case scores above your fail-case-threshold, it means the prompt was vulnerable to that specific attack.
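Per-case scoring can be sketched the same way. The scores below are made up for illustration, treating a score exactly equal to the threshold as a pass is an assumption based on the word "above", and `failed_case_indices` is a hypothetical helper:

```python
def failed_case_indices(scores: list[float], fail_case_threshold: float) -> list[int]:
    """Return indices of cases scoring above the threshold (higher = more vulnerable)."""
    return [i for i, score in enumerate(scores) if score > fail_case_threshold]


scores = [0.1, 0.7, 0.5, 0.9]  # made-up scores for illustration
failed = failed_case_indices(scores, fail_case_threshold=0.5)
failure_rate = len(failed) / len(scores)
print(failed, failure_rate)  # [1, 3] 0.5
```

Two of the four cases fail, a 50% failure rate, which stays below a fail-action-threshold of 0.80, so the workflow would pass.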

Next Steps

Explore All Actions

Learn about all available evaluation actions and their parameters

Fine-tune Evaluations

Evaluate fine-tuned OpenAI models instead of system prompts

Advanced Workflows

Set up automated evaluations on pull requests or scheduled runs

API Documentation

Explore the full Circuit Breaker Labs API

Common Patterns

Evaluate on Pull Requests

Automatically test system prompt changes in pull requests:

```yaml
name: PR Evaluation

on:
  pull_request:
    paths:
      - "prompts/**"
      - "system_prompt.txt"

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Read system prompt
        id: prompt
        # The name<<DELIMITER form keeps multiline prompts intact in $GITHUB_OUTPUT
        run: |
          {
            echo "content<<EOF"
            printf '%s\n' "$(cat system_prompt.txt)"
            echo "EOF"
          } >> "$GITHUB_OUTPUT"

      - name: Evaluate
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.80"
          fail-case-threshold: "0.5"
          variations: "2"
          maximum-iteration-layers: "2"
          system-prompt: ${{ steps.prompt.outputs.content }}
          openrouter-model-name: "anthropic/claude-3.7-sonnet"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```
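If you would rather assemble the prompt in a Python step, for example to combine several files under `prompts/`, the helper below writes a multiline step output in GitHub's documented `name<<DELIMITER` format. A sketch only; the helper name is hypothetical, and the random delimiter makes it very unlikely that prompt text ends the value early:

```python
import os
import uuid
from typing import Optional


def set_multiline_output(name: str, value: str, output_path: Optional[str] = None) -> None:
    """Append a multiline step output in GitHub's name<<DELIMITER format."""
    path = output_path or os.environ["GITHUB_OUTPUT"]
    # A random delimiter is very unlikely to appear inside the prompt text
    delimiter = f"EOF_{uuid.uuid4().hex}"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{name}<<{delimiter}\n{value}\n{delimiter}\n")


# In a step with `id: prompt`:
# with open("system_prompt.txt", encoding="utf-8") as f:
#     set_multiline_output("content", f.read().rstrip("\n"))
```

Writing the output under the name `content` keeps `${{ steps.prompt.outputs.content }}` working in the evaluation step.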

Scheduled Security Audits

Run regular security audits of your AI systems:

```yaml
name: Weekly Security Audit

on:
  schedule:
    - cron: '0 0 * * 1'  # Every Monday at 00:00 UTC
  workflow_dispatch:

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Comprehensive evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.4"
          variations: "5"
          maximum-iteration-layers: "3"
          system-prompt: ${{ vars.PRODUCTION_SYSTEM_PROMPT }}
          openrouter-model-name: "anthropic/claude-3.7-sonnet"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```
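GitHub evaluates `schedule` cron expressions in UTC, so `'0 0 * * 1'` fires at the start of each Monday, UTC, and GitHub may start scheduled runs late during periods of high load. The Python sketch below computes the next firing time after a given instant; the helper name is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def next_weekly_audit(now: datetime) -> datetime:
    """Next firing of cron '0 0 * * 1' (Monday 00:00 UTC) strictly after `now`."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (0 - now.weekday()) % 7  # datetime.weekday(): Monday == 0
    candidate = midnight + timedelta(days=days_ahead)
    if candidate <= now:
        # Already past this Monday's midnight: move to next week
        candidate += timedelta(days=7)
    return candidate


print(next_weekly_audit(datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)))  # 2024-01-08 00:00:00+00:00
```

Call `.astimezone()` on the result to see the corresponding local time.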

For production systems, use higher variations and maximum-iteration-layers values to run more thorough tests. This will increase evaluation time but provide better security coverage.

Troubleshooting

Workflow fails immediately

Check your API key: Ensure CBL_API_KEY is correctly set in your repository secrets
Verify syntax: Make sure your YAML file is properly formatted
Review parameters: All required inputs must be provided with valid values
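A quick local check of the inputs can catch typos before a run. This sketch treats every input used in this guide as required and assumes both thresholds are fractions between 0 and 1 and that `variations` and `maximum-iteration-layers` are positive integers; those ranges are inferred from the examples here, not taken from the action's definition, and `validate_inputs` is hypothetical:

```python
REQUIRED_INPUTS = (
    "fail-action-threshold",
    "fail-case-threshold",
    "variations",
    "maximum-iteration-layers",
    "system-prompt",
    "openrouter-model-name",
    "circuit-breaker-labs-api-key",
)


def validate_inputs(inputs: dict[str, str]) -> list[str]:
    """Return human-readable problems with the inputs; an empty list means none found."""
    problems = [f"missing input: {n}" for n in REQUIRED_INPUTS if not inputs.get(n, "").strip()]
    # Assumed range: thresholds are fractions between 0 and 1
    for name in ("fail-action-threshold", "fail-case-threshold"):
        value = inputs.get(name, "").strip()
        if not value:
            continue
        try:
            in_range = 0.0 <= float(value) <= 1.0
        except ValueError:
            in_range = False
        if not in_range:
            problems.append(f"{name} must be a number between 0 and 1, got {value!r}")
    # Assumed range: counts are positive integers
    for name in ("variations", "maximum-iteration-layers"):
        value = inputs.get(name, "").strip()
        if not value:
            continue
        try:
            positive = int(value) >= 1
        except ValueError:
            positive = False
        if not positive:
            problems.append(f"{name} must be a positive integer, got {value!r}")
    return problems
```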

All tests are failing

Your system prompt may be vulnerable to common attacks
Try adjusting the fail-case-threshold to better calibrate what constitutes a failure
Review the detailed logs to understand which specific test cases are failing

Need help?

Visit the Circuit Breaker Labs documentation or contact support for assistance.

What’s Happening Under the Hood

When you run a Circuit Breaker Labs evaluation:

The action calls the Circuit Breaker Labs API with your system prompt and configuration
The API generates adversarial test cases designed to exploit common vulnerabilities
Each test is executed against your specified model using OpenRouter
Results are scored based on whether the model’s responses indicate a security breach
The workflow passes or fails based on your configured thresholds

This is equivalent to making a direct API call:

```bash
curl -X 'POST' \
  'https://api.circuitbreakerlabs.ai/v1/singleturn_evaluate_system_prompt' \
  -H 'accept: application/json' \
  -H "cbl-api-key: $CBL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
  "threshold": 0.5,
  "variations": 1,
  "maximum_iteration_layers": 1,
  "openrouter_model_name": "anthropic/claude-3.7-sonnet",
  "system_prompt": "You are a helpful assistant"
}'
```

The GitHub Action makes this call for you and turns the results into a pass/fail status for your CI/CD pipeline.
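For scripting outside CI, the curl call above translates directly to Python's standard library. This sketch builds the same request with the same headers and body; the helper name is hypothetical, and because the response format isn't described in this guide it only shows how to send the request and print the decoded JSON:

```python
import json
import os
import urllib.request

API_URL = "https://api.circuitbreakerlabs.ai/v1/singleturn_evaluate_system_prompt"


def build_evaluation_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "accept": "application/json",
            "cbl-api-key": api_key,  # same header name as in the curl call
            "Content-Type": "application/json",
        },
    )


request = build_evaluation_request(
    {
        "threshold": 0.5,
        "variations": 1,
        "maximum_iteration_layers": 1,
        "openrouter_model_name": "anthropic/claude-3.7-sonnet",
        "system_prompt": "You are a helpful assistant",
    },
    os.environ.get("CBL_API_KEY", ""),
)
# To send it and print the decoded JSON response:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```

`urllib` normalizes header-name capitalization (for example to `Cbl-api-key`); HTTP header names are case-insensitive, so the request is equivalent to the curl version.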