## Overview
Advanced CI/CD patterns for integrating Circuit Breaker Labs evaluations into sophisticated deployment pipelines. These examples demonstrate multi-stage evaluations, conditional deployments, and integration with other tools.

## Multi-Stage Evaluation Pipeline
Run evaluations in stages with increasing rigor before deploying:

```yaml
name: Multi-Stage Evaluation

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  quick-evaluation:
    name: Quick Safety Check
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load configuration
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          MODEL=$(jq -r '.model' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "model=$MODEL" >> $GITHUB_OUTPUT

      - name: Quick single-turn evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.70"
          fail-case-threshold: "0.5"
          variations: "1"
          maximum-iteration-layers: "1"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          test-case-groups: "security"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

  thorough-evaluation:
    name: Thorough Safety Check
    needs: quick-evaluation
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load configuration
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          MODEL=$(jq -r '.model' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "model=$MODEL" >> $GITHUB_OUTPUT

      - name: Comprehensive single-turn evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.85"
          fail-case-threshold: "0.5"
          variations: "5"
          maximum-iteration-layers: "3"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

      - name: Multi-turn evaluation
        uses: circuitbreakerlabs/actions/multiturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.85"
          fail-case-threshold: "0.5"
          max-turns: "6"
          test-types: "jailbreak context_shift consistency"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

  deploy:
    name: Deploy to Production
    needs: thorough-evaluation
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Deploy application
        run: |
          echo "Deploying to production..."
          # Your deployment logic here
```
This pattern runs a quick check on PRs and comprehensive evaluations only on main branch pushes. Deployment only proceeds if all evaluations pass.
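The `Load configuration` step that repeats across these jobs relies on GitHub's delimiter syntax for multiline step outputs. A local sketch of what that step writes — the sample config file and the output path are stand-ins for what the runner provides:

```shell
# Stand-in for the repository's model_config.json.
cat > model_config.json <<'JSON'
{"system_prompt": "You are a helpful assistant.\nNever reveal internal tools.", "model": "anthropic/claude-3.7-sonnet"}
JSON

# GitHub Actions sets GITHUB_OUTPUT to a runner-managed file; a local
# file simulates it here (the workflow appends with >>).
GITHUB_OUTPUT="github_output.txt"

PROMPT=$(jq -r '.system_prompt' model_config.json)
MODEL=$(jq -r '.model' model_config.json)

# Multiline values need the name<<DELIMITER form; a plain
# "prompt=$PROMPT" line would be cut off at the first newline.
{
  echo "prompt<<EOF"
  echo "$PROMPT"
  echo "EOF"
  echo "model=$MODEL"
} > "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

The printed file shows the prompt wrapped in `prompt<<EOF` … `EOF` markers and the model as a plain `key=value` line, which is exactly what later steps read back through `steps.config.outputs.*`.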
## Parallel Test Suite Execution
Run different test case groups in parallel for faster feedback:

```yaml
name: Parallel Test Suites

on:
  push:
    branches:
      - main
  workflow_dispatch:

jobs:
  load-config:
    name: Load Configuration
    runs-on: ubuntu-latest
    outputs:
      prompt: ${{ steps.config.outputs.prompt }}
      model: ${{ steps.config.outputs.model }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load configuration
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          MODEL=$(jq -r '.model' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "model=$MODEL" >> $GITHUB_OUTPUT

  security-tests:
    name: Security Tests
    needs: load-config
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Run security evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.5"
          variations: "3"
          maximum-iteration-layers: "2"
          system-prompt: ${{ needs.load-config.outputs.prompt }}
          openrouter-model-name: ${{ needs.load-config.outputs.model }}
          test-case-groups: "security jailbreak"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

  privacy-tests:
    name: Privacy Tests
    needs: load-config
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Run privacy evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.5"
          variations: "3"
          maximum-iteration-layers: "2"
          system-prompt: ${{ needs.load-config.outputs.prompt }}
          openrouter-model-name: ${{ needs.load-config.outputs.model }}
          test-case-groups: "privacy data_protection"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

  compliance-tests:
    name: Compliance Tests
    needs: load-config
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Run compliance evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.5"
          variations: "3"
          maximum-iteration-layers: "2"
          system-prompt: ${{ needs.load-config.outputs.prompt }}
          openrouter-model-name: ${{ needs.load-config.outputs.model }}
          test-case-groups: "compliance regulatory"
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

  multi-turn-tests:
    name: Multi-Turn Tests
    needs: load-config
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Run multi-turn evaluation
        uses: circuitbreakerlabs/actions/multiturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.85"
          fail-case-threshold: "0.5"
          max-turns: "6"
          test-types: "jailbreak context_shift"
          system-prompt: ${{ needs.load-config.outputs.prompt }}
          openrouter-model-name: ${{ needs.load-config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```
Parallel execution reduces total pipeline time. The three single-turn suites apply a strict threshold (0.90) to critical areas like security and privacy, while the multi-turn suite uses 0.85.

## Matrix Strategy for Multiple Models
Test multiple models or configurations simultaneously:

```yaml
name: Multi-Model Evaluation

on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * 0' # Weekly on Sunday

jobs:
  evaluate-models:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        model:
          - name: "claude-3.7-sonnet"
            openrouter: "anthropic/claude-3.7-sonnet"
          - name: "claude-3-opus"
            openrouter: "anthropic/claude-3-opus"
          - name: "gpt-4"
            openrouter: "openai/gpt-4"
        threshold: ["0.80", "0.85", "0.90"]
      fail-fast: false
    name: Evaluate ${{ matrix.model.name }} (threshold ${{ matrix.threshold }})
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load system prompt
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT

      - name: Evaluate ${{ matrix.model.name }}
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: ${{ matrix.threshold }}
          fail-case-threshold: "0.5"
          variations: "3"
          maximum-iteration-layers: "2"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ matrix.model.openrouter }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
```
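The matrix expands to nine jobs — the cross product of three models and three thresholds — each named by the job's `name:` template. A quick shell sketch of that expansion:

```shell
# Enumerate the combinations GitHub generates from the matrix above.
models="claude-3.7-sonnet claude-3-opus gpt-4"
thresholds="0.80 0.85 0.90"

count=0
for model in $models; do
  for threshold in $thresholds; do
    echo "Evaluate $model (threshold $threshold)"
    count=$((count + 1))
  done
done
echo "total jobs: $count"   # prints: total jobs: 9
```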
Set `fail-fast: false` to ensure all matrix jobs complete even if one fails. This gives you complete comparison data across models and thresholds.

## Conditional Deployment Based on Environment
Different evaluation strategies for staging vs production:

```yaml
name: Environment-Specific Evaluation

on:
  push:
    branches:
      - staging
      - main

jobs:
  evaluate-staging:
    name: Staging Evaluation
    if: github.ref == 'refs/heads/staging'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load configuration
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          MODEL=$(jq -r '.model' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "model=$MODEL" >> $GITHUB_OUTPUT

      - name: Staging evaluation (lighter)
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.75"
          fail-case-threshold: "0.5"
          variations: "2"
          maximum-iteration-layers: "2"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

      - name: Deploy to staging
        run: |
          echo "Deploying to staging environment..."
          # Staging deployment logic

  evaluate-production:
    name: Production Evaluation
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load configuration
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          MODEL=$(jq -r '.model' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "model=$MODEL" >> $GITHUB_OUTPUT

      - name: Production single-turn evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.5"
          variations: "5"
          maximum-iteration-layers: "3"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

      - name: Production multi-turn evaluation
        uses: circuitbreakerlabs/actions/multiturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.5"
          max-turns: "8"
          test-types: "jailbreak context_shift consistency instruction_following"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

      - name: Deploy to production
        run: |
          echo "Deploying to production environment..."
          # Production deployment logic
```
Staging uses a lighter evaluation (2 variations, threshold 0.75), while production uses rigorous testing (5 variations, threshold 0.90, plus a multi-turn evaluation with 8 turns).
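One way to keep the two profiles from drifting apart is to store both in a single file and select by environment. A hypothetical `eval_config.json` and the jq lookup a workflow step could use — the file name and keys are illustrative, not part of the actions' API:

```shell
# Hypothetical per-environment evaluation settings in one file.
cat > eval_config.json <<'JSON'
{
  "staging":    { "threshold": "0.75", "variations": "2" },
  "production": { "threshold": "0.90", "variations": "5" }
}
JSON

# In a workflow this would be derived from github.ref; hard-coded here.
ENVIRONMENT=production

# --arg passes the shell value into jq safely; .[$env] indexes by key.
THRESHOLD=$(jq -r --arg env "$ENVIRONMENT" '.[$env].threshold' eval_config.json)
VARIATIONS=$(jq -r --arg env "$ENVIRONMENT" '.[$env].variations' eval_config.json)
echo "env=$ENVIRONMENT threshold=$THRESHOLD variations=$VARIATIONS"
```

The two extracted values would then feed the `fail-action-threshold` and `variations` inputs, so both environments share one evaluation job definition.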
## Integration with PR Comments
Post evaluation results as PR comments:

```yaml
name: PR Evaluation with Comments

on:
  pull_request:
    paths:
      - "model_config.json"
      - "prompts/**"

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load configuration
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          MODEL=$(jq -r '.model' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "model=$MODEL" >> $GITHUB_OUTPUT

      - name: Run evaluation
        id: evaluation
        continue-on-error: true
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.80"
          fail-case-threshold: "0.5"
          variations: "3"
          maximum-iteration-layers: "2"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const status = '${{ steps.evaluation.outcome }}';
            const body = status === 'success'
              ? '✅ Circuit Breaker Labs evaluation passed!\n\nYour system prompt changes meet safety requirements.'
              : '❌ Circuit Breaker Labs evaluation failed.\n\nPlease review the safety evaluation results and update your system prompt.';
            github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: body
            });

      - name: Fail if evaluation failed
        if: steps.evaluation.outcome == 'failure'
        run: exit 1
```
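In plain shell terms, the control flow of this job is: capture the outcome without aborting, report it, and only then propagate the failure. A sketch, with a stub standing in for the evaluation action (made to fail here so the failure path is exercised):

```shell
# Stub for the evaluation action; returning 1 simulates a failed evaluation.
run_evaluation() { return 1; }

# continue-on-error: true -- record the outcome instead of stopping the job.
if run_evaluation; then
  outcome=success
else
  outcome=failure
fi

# The comment step runs regardless of the outcome.
if [ "$outcome" = success ]; then
  echo "✅ evaluation passed"
else
  echo "❌ evaluation failed"
fi

# The final step re-raises the failure so the check still blocks the PR.
if [ "$outcome" = failure ]; then
  echo "(workflow would exit 1 here)"
fi
```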
Using `continue-on-error: true` allows the workflow to post a comment before failing. This provides immediate feedback to PR authors.

## Reusable Workflow
Create a reusable workflow for consistent evaluations across repositories.

**.github/workflows/evaluate-reusable.yml**

```yaml
name: Reusable Evaluation Workflow

on:
  workflow_call:
    inputs:
      system-prompt:
        required: true
        type: string
      model:
        required: true
        type: string
      fail-action-threshold:
        required: false
        type: string
        default: "0.80"
      variations:
        required: false
        type: string
        default: "3"
    secrets:
      cbl-api-key:
        required: true

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Single-turn evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: ${{ inputs.fail-action-threshold }}
          fail-case-threshold: "0.5"
          variations: ${{ inputs.variations }}
          maximum-iteration-layers: "2"
          system-prompt: ${{ inputs.system-prompt }}
          openrouter-model-name: ${{ inputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.cbl-api-key }}

      - name: Multi-turn evaluation
        uses: circuitbreakerlabs/actions/multiturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: ${{ inputs.fail-action-threshold }}
          fail-case-threshold: "0.5"
          max-turns: "6"
          test-types: "jailbreak context_shift"
          system-prompt: ${{ inputs.system-prompt }}
          openrouter-model-name: ${{ inputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.cbl-api-key }}
```
**Use the reusable workflow**

```yaml
name: Evaluate with Reusable Workflow

on:
  push:
    branches:
      - main

jobs:
  evaluate:
    uses: ./.github/workflows/evaluate-reusable.yml
    with:
      system-prompt: "You are a helpful assistant"
      model: "anthropic/claude-3.7-sonnet"
      fail-action-threshold: "0.85"
      variations: "5"
    secrets:
      cbl-api-key: ${{ secrets.CBL_API_KEY }}
```
Reusable workflows promote consistency and reduce duplication across multiple repositories in your organization.
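The caller above references the workflow by local path, which only works within the same repository. For organization-wide reuse, GitHub Actions lets you call a reusable workflow by repository and ref; `your-org/shared-workflows` below is a placeholder for wherever you host it:

```yaml
jobs:
  evaluate:
    uses: your-org/shared-workflows/.github/workflows/evaluate-reusable.yml@main
    with:
      system-prompt: "You are a helpful assistant"
      model: "anthropic/claude-3.7-sonnet"
    secrets:
      cbl-api-key: ${{ secrets.CBL_API_KEY }}
```

Pinning to a tag or commit SHA instead of `@main` gives consuming repositories a stable evaluation contract.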
## Fine-Tune Training Pipeline
Complete pipeline from training data to evaluated fine-tune:

```yaml
name: Fine-Tune Training Pipeline

on:
  workflow_dispatch:
    inputs:
      training_data_path:
        description: 'Path to training data file'
        required: true
        default: 'data/training.jsonl'
      model_suffix:
        description: 'Model name suffix'
        required: true

jobs:
  upload-training-data:
    name: Upload Training Data
    runs-on: ubuntu-latest
    outputs:
      file-id: ${{ steps.upload.outputs.file-id }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Upload training file to OpenAI
        id: upload
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          response=$(curl -s https://api.openai.com/v1/files \
            -H "Authorization: Bearer $OPENAI_API_KEY" \
            -F purpose="fine-tune" \
            -F file="@${{ github.event.inputs.training_data_path }}")
          file_id=$(echo $response | jq -r '.id')
          echo "file-id=$file_id" >> $GITHUB_OUTPUT
          echo "Uploaded training file: $file_id"

  create-fine-tune:
    name: Create Fine-Tune Job
    needs: upload-training-data
    runs-on: ubuntu-latest
    outputs:
      job-id: ${{ steps.create.outputs.job-id }}
    steps:
      - name: Create fine-tuning job
        id: create
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          response=$(curl -s https://api.openai.com/v1/fine_tuning/jobs \
            -H "Content-Type: application/json" \
            -H "Authorization: Bearer $OPENAI_API_KEY" \
            -d '{
              "training_file": "${{ needs.upload-training-data.outputs.file-id }}",
              "model": "gpt-4o-2024-08-06",
              "suffix": "${{ github.event.inputs.model_suffix }}"
            }')
          job_id=$(echo $response | jq -r '.id')
          echo "job-id=$job_id" >> $GITHUB_OUTPUT
          echo "Created fine-tuning job: $job_id"

  wait-for-completion:
    name: Wait for Fine-Tune Completion
    needs: create-fine-tune
    runs-on: ubuntu-latest
    outputs:
      model-id: ${{ steps.wait.outputs.model-id }}
    steps:
      - name: Wait for job completion
        id: wait
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          job_id="${{ needs.create-fine-tune.outputs.job-id }}"
          echo "Waiting for fine-tuning job to complete..."
          while true; do
            response=$(curl -s https://api.openai.com/v1/fine_tuning/jobs/$job_id \
              -H "Authorization: Bearer $OPENAI_API_KEY")
            status=$(echo $response | jq -r '.status')
            echo "Job status: $status"
            if [ "$status" = "succeeded" ]; then
              model_id=$(echo $response | jq -r '.fine_tuned_model')
              echo "model-id=$model_id" >> $GITHUB_OUTPUT
              echo "Fine-tuning completed! Model: $model_id"
              break
            elif [ "$status" = "failed" ]; then
              echo "Fine-tuning job failed"
              exit 1
            fi
            sleep 300 # Wait 5 minutes before checking again
          done

  evaluate-fine-tune:
    name: Evaluate Fine-Tuned Model
    needs: wait-for-completion
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Single-turn evaluation
        uses: circuitbreakerlabs/actions/singleturn-evaluate-openai-finetune@v1
        with:
          fail-action-threshold: "0.85"
          fail-case-threshold: "0.5"
          variations: "5"
          maximum-iteration-layers: "3"
          model-name: ${{ needs.wait-for-completion.outputs.model-id }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}

      - name: Multi-turn evaluation
        uses: circuitbreakerlabs/actions/multiturn-evaluate-openai-finetune@v1
        with:
          fail-action-threshold: "0.85"
          fail-case-threshold: "0.5"
          max-turns: "6"
          test-types: "jailbreak context_shift consistency"
          model-name: ${{ needs.wait-for-completion.outputs.model-id }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}

  save-model-config:
    name: Save Model Configuration
    needs: [wait-for-completion, evaluate-fine-tune]
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Update configuration file
        run: |
          jq '.model_id = "${{ needs.wait-for-completion.outputs.model-id }}"' \
            finetune_config.json > finetune_config.json.tmp
          mv finetune_config.json.tmp finetune_config.json

      - name: Commit configuration
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add finetune_config.json
          git commit -m "Update fine-tuned model to ${{ needs.wait-for-completion.outputs.model-id }}"
          git push
```
This complete pipeline uploads training data, creates a fine-tune job, waits for completion, evaluates the model, and updates your configuration file automatically.
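Since a malformed training file only surfaces after the upload step runs, it can be worth validating the JSONL locally first. A minimal sketch — the two-line sample file stands in for your real `training_data_path`, and the check only asserts the chat-format shape (each line is a JSON object carrying a `messages` array):

```shell
# Stand-in training file; real data comes from the training_data_path input.
cat > training.jsonl <<'JSONL'
{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}
{"messages": [{"role": "user", "content": "thanks"}, {"role": "assistant", "content": "any time"}]}
JSONL

# Each line must parse as JSON and contain a "messages" array.
bad=0
n=0
while IFS= read -r line; do
  n=$((n + 1))
  if ! printf '%s' "$line" | jq -e '.messages | type == "array"' >/dev/null 2>&1; then
    echo "line $n: not a valid chat-format training example"
    bad=1
  fi
done < training.jsonl

if [ "$bad" -eq 0 ]; then
  echo "training.jsonl: $n examples OK"
fi
```

`jq -e` exits non-zero both for unparseable lines and for lines whose `messages` field is missing or not an array, so either failure mode is flagged with its line number.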
## Monitoring and Alerting
Integrate with Slack for evaluation notifications:

```yaml
name: Evaluation with Slack Notifications

on:
  schedule:
    - cron: '0 */12 * * *' # Every 12 hours
  workflow_dispatch:

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

      - name: Load configuration
        id: config
        run: |
          PROMPT=$(jq -r '.system_prompt' model_config.json)
          MODEL=$(jq -r '.model' model_config.json)
          echo "prompt<<EOF" >> $GITHUB_OUTPUT
          echo "$PROMPT" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
          echo "model=$MODEL" >> $GITHUB_OUTPUT

      - name: Run evaluation
        id: evaluation
        continue-on-error: true
        uses: circuitbreakerlabs/actions/singleturn-evaluate-system-prompt@v1
        with:
          fail-action-threshold: "0.90"
          fail-case-threshold: "0.5"
          variations: "5"
          maximum-iteration-layers: "3"
          system-prompt: ${{ steps.config.outputs.prompt }}
          openrouter-model-name: ${{ steps.config.outputs.model }}
          circuit-breaker-labs-api-key: ${{ secrets.CBL_API_KEY }}

      - name: Notify Slack on success
        if: steps.evaluation.outcome == 'success'
        uses: slackapi/slack-github-action@v1
        with:
          webhook-url: ${{ secrets.SLACK_WEBHOOK_URL }}
          payload: |
            {
              "text": "✅ Circuit Breaker Labs evaluation passed",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Circuit Breaker Labs Evaluation Passed* ✅\n\nModel: `${{ steps.config.outputs.model }}`\nThreshold: 0.90\n\n<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View Results>"
                  }
                }
              ]
            }

      - name: Notify Slack on failure
        if: steps.evaluation.outcome == 'failure'
        uses: slackapi/slack-github-action@v1
        with:
          webhook-url: ${{ secrets.SLACK_WEBHOOK_URL }}
          payload: |
            {
              "text": "❌ Circuit Breaker Labs evaluation failed",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Circuit Breaker Labs Evaluation Failed* ❌\n\nModel: `${{ steps.config.outputs.model }}`\nThreshold: 0.90\n\n⚠️ *Action Required:* Review safety evaluation results\n\n<${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View Details>"
                  }
                }
              ]
            }

      - name: Fail if evaluation failed
        if: steps.evaluation.outcome == 'failure'
        run: exit 1
```
Slack notifications provide immediate visibility into evaluation results for your team. Configure `SLACK_WEBHOOK_URL` in your repository secrets.
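Hand-writing Block Kit JSON inside workflow YAML is easy to get wrong; building the payload with jq and checking that it parses can catch escaping mistakes before a run. The model name and run URL below are local stand-ins for the workflow expressions:

```shell
# Stand-ins for ${{ steps.config.outputs.model }} and the run URL expression.
MODEL="anthropic/claude-3.7-sonnet"
RUN_URL="https://github.com/example-org/example-repo/actions/runs/123"

# jq -n builds JSON from scratch; --arg values are escaped safely, and
# \($var) interpolates them inside jq string literals.
payload=$(jq -n --arg model "$MODEL" --arg url "$RUN_URL" '{
  text: "✅ Circuit Breaker Labs evaluation passed",
  blocks: [
    {
      type: "section",
      text: {
        type: "mrkdwn",
        text: "*Circuit Breaker Labs Evaluation Passed* ✅\n\nModel: `\($model)`\nThreshold: 0.90\n\n<\($url)|View Results>"
      }
    }
  ]
}')

# Confirm the payload round-trips as valid JSON with the expected shape.
echo "$payload" | jq -e '.blocks[0].type'
```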