# singleturn-evaluate-openai-finetune

This action evaluates an OpenAI fine-tuned model by running it through Circuit Breaker Labs safety tests with single-turn interactions.
## Overview

This action is ideal for:

- Validating OpenAI fine-tuned models before deployment
- Comparing fine-tune safety against base models
- Automated testing of model training results
- Ensuring fine-tuning hasn’t degraded safety properties
`POST /v1/singleturn_evaluate_openai_finetune`
## Usage
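A minimal workflow might look like the sketch below. The action-reference path and the two API-key input names are assumptions for illustration; the inputs confirmed elsewhere in this document are `fail-action-threshold`, `variations`, `maximum-iteration-layers`, `model-name`, and `test-case-groups`. Adjust names to match the action's actual inputs.

```yaml
name: Fine-Tune Safety Check
on: [push]

jobs:
  safety-evaluation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Action path and the two *-api-key input names are assumed here.
      - uses: circuit-breaker-labs/singleturn-evaluate-openai-finetune@v1
        with:
          model-name: 'ft:gpt-4o-mini-2024-07-18:acmecorp:custom-model:AaBbCcDd'
          fail-action-threshold: '0.10'
          variations: '3'
          maximum-iteration-layers: '5'
          cbl-api-key: ${{ secrets.CBL_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```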
## Parameters

### Required Parameters
**`fail-action-threshold`**
Failure rate threshold that causes the workflow to fail. Expressed as a decimal (e.g., `'0.10'` for 10%). If the overall test failure rate exceeds this threshold, the action exits with code 1 and fails the workflow.
Example: `'0.10'` means the workflow fails if more than 10% of test cases fail.

**Safety score threshold**
Safety score threshold below which an individual test case is considered failed. Expressed as a decimal between 0 and 1. Test cases with safety scores below this threshold are marked as failures and included in failure rate calculations.
Example: `'0.85'` means any response with a safety score below 0.85 is considered unsafe.

**`variations`**
Number of test variations to run for each test case. Higher values provide more comprehensive testing but increase evaluation time and cost.
Example: `'3'` runs each test case 3 times with different variations.

**`maximum-iteration-layers`**
Maximum number of iteration layers for the evaluation. This controls the depth of adversarial testing by generating increasingly sophisticated attack variations.
Example: `'5'` generates up to 5 layers of test iterations.

**`model-name`**
Fully qualified name of the OpenAI fine-tuned model to evaluate. This must be the complete model identifier from OpenAI, including the fine-tune suffix.
Format: `ft:<base-model>:<org-name>:<custom-name>:<suffix>`
Example: `'ft:gpt-4o-mini-2024-07-18:acmecorp:custom-model:AaBbCcDd'`

**Circuit Breaker Labs API key**
Your Circuit Breaker Labs API key. Important: Always store this as a GitHub secret, never commit it to your repository.
Example: `${{ secrets.CBL_API_KEY }}`

**OpenAI API key**
Your OpenAI API key with access to the fine-tuned model. This key must have permission to use the specified fine-tuned model. Important: Always store this as a GitHub secret, never commit it to your repository.
Example: `${{ secrets.OPENAI_API_KEY }}`

### Optional Parameters
**`test-case-groups`**
Space-separated list of test case groups to run. If not specified, all test case groups are executed. This allows you to run specific subsets of tests for targeted evaluation.
Example: `'jailbreak prompt_injection'`

## Example Workflows
### Post-Training Validation
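A manually triggered workflow is a natural fit for validating a model right after fine-tuning completes. This is a sketch under the same assumptions as the usage example above (the action path and API-key input names are hypothetical):

```yaml
name: Post-Training Safety Validation
on:
  workflow_dispatch:
    inputs:
      model-name:
        description: 'Fine-tuned model ID to validate'
        required: true

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      # Action path and *-api-key input names are assumed for illustration.
      - uses: circuit-breaker-labs/singleturn-evaluate-openai-finetune@v1
        with:
          model-name: ${{ inputs.model-name }}
          fail-action-threshold: '0.10'
          variations: '3'
          maximum-iteration-layers: '5'
          cbl-api-key: ${{ secrets.CBL_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```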
### Continuous Model Monitoring
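A scheduled run can catch drift or regressions over time. This sketch assumes the same hypothetical action path and input names as above, and uses lower `variations` and `maximum-iteration-layers` to keep recurring costs down:

```yaml
name: Weekly Safety Monitoring
on:
  schedule:
    - cron: '0 6 * * 1'  # every Monday at 06:00 UTC

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      # Action path and *-api-key input names are assumed for illustration.
      - uses: circuit-breaker-labs/singleturn-evaluate-openai-finetune@v1
        with:
          model-name: 'ft:gpt-4o-mini-2024-07-18:acmecorp:custom-model:AaBbCcDd'
          fail-action-threshold: '0.10'
          variations: '1'
          maximum-iteration-layers: '2'
          cbl-api-key: ${{ secrets.CBL_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```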
### Deployment Gate
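Tying the evaluation to a release event lets the failing exit code block a deployment. This sketch (same hypothetical action path and input names) uses a stricter threshold and more variations, as the Best Practices section below recommends for pre-deployment gates:

```yaml
name: Pre-Deployment Safety Gate
on:
  release:
    types: [published]

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      # Action path and *-api-key input names are assumed for illustration.
      - uses: circuit-breaker-labs/singleturn-evaluate-openai-finetune@v1
        with:
          model-name: 'ft:gpt-4o-mini-2024-07-18:acmecorp:custom-model:AaBbCcDd'
          fail-action-threshold: '0.05'
          variations: '5'
          maximum-iteration-layers: '5'
          cbl-api-key: ${{ secrets.CBL_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```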
### Targeted Vulnerability Testing
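When debugging a specific weakness, `test-case-groups` restricts the run to the relevant test subsets. Again a sketch with a hypothetical action path and API-key input names:

```yaml
name: Targeted Safety Testing
on: workflow_dispatch

jobs:
  targeted:
    runs-on: ubuntu-latest
    steps:
      # Action path and *-api-key input names are assumed for illustration.
      - uses: circuit-breaker-labs/singleturn-evaluate-openai-finetune@v1
        with:
          model-name: 'ft:gpt-4o-mini-2024-07-18:acmecorp:custom-model:AaBbCcDd'
          fail-action-threshold: '0.10'
          variations: '3'
          maximum-iteration-layers: '5'
          test-case-groups: 'jailbreak prompt_injection'
          cbl-api-key: ${{ secrets.CBL_API_KEY }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```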
## Output and Reporting

### Success Case
When all tests pass within acceptable thresholds, the action exits successfully (code 0).

### Failure Case
When the failure rate exceeds the threshold, the action outputs detailed failure information and exits with code 1, failing the workflow.

## Implementation Details
The action performs the following steps:

- Installs uv: Uses `astral-sh/setup-uv@1e862dfacbd1d6d858c55d9b792c756523627244` for Python environment management
- Constructs API Request: Builds a `SingleTurnEvaluateOpenAiFinetuneRequest` with your parameters
- Calls API: POSTs to `/v1/evaluations/single-turn/evaluate-openai-fine-tune` with both API keys
- Processes Results: Parses the `SingleTurnRunTestsResponse` and calculates failure rates
- Reports Failures: If the failure rate exceeds the threshold, outputs detailed failure information
- Exits: Returns the appropriate exit code based on test results
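The failure-rate and exit-code logic in the steps above can be sketched in Python. This is a hypothetical illustration, not the action's actual code; the function and parameter names simply mirror the inputs described earlier.

```python
# Hypothetical sketch of the action's pass/fail logic (not the real
# Circuit Breaker Labs implementation). Parameter names mirror the
# documented inputs: the safety score threshold marks individual test
# cases as failed, and fail-action-threshold gates the whole workflow.

def failure_rate(safety_scores, safety_score_threshold):
    """Fraction of test cases whose safety score falls below the threshold."""
    failures = [s for s in safety_scores if s < safety_score_threshold]
    return len(failures) / len(safety_scores)

def exit_code(safety_scores, safety_score_threshold, fail_action_threshold):
    """Return 1 (fail the workflow) when the failure rate exceeds the threshold."""
    rate = failure_rate(safety_scores, safety_score_threshold)
    return 1 if rate > fail_action_threshold else 0
```

For example, with a safety score threshold of 0.85 and `fail-action-threshold: '0.10'`, a batch of scores `[0.9, 0.8, 0.95]` has one unsafe response out of three (a 33% failure rate), so the workflow fails.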
The action uses the Circuit Breaker Labs Python SDK internally, calling `single_turn_evaluate_openai_fine_tune_post.sync_detailed()` from the `circuit_breaker_labs.api.evaluations` module.

## Best Practices
### Fine-Tune Testing Strategy
- Baseline Testing: Test the base model before fine-tuning to establish safety baselines
- Post-Training Testing: Run comprehensive safety tests immediately after fine-tuning completes
- Regression Testing: Compare fine-tune results against base model to detect safety degradation
- Continuous Monitoring: Schedule periodic tests to catch any drift or issues
### Threshold Configuration
- Development Models: Use moderate thresholds (e.g., `fail-action-threshold: '0.10'`)
- Production Models: Use strict thresholds (e.g., `fail-action-threshold: '0.05'`)
- Safety-Critical Systems: Use very strict thresholds (e.g., `fail-action-threshold: '0.01'`)
### API Key Management
- Store both API keys as GitHub secrets
- Use different keys for different environments (dev/staging/prod)
- Rotate keys regularly
- Monitor API key usage for anomalies
### Cost Optimization
- Use lower `variations` and `maximum-iteration-layers` for frequent CI checks
- Reserve comprehensive testing (high values) for pre-deployment gates
- Use `test-case-groups` to run targeted tests when debugging specific issues
This action incurs costs from both Circuit Breaker Labs (for safety evaluation) and OpenAI (for model inference). Monitor your usage on both platforms.
## Troubleshooting

### Authentication Errors
Problem: `401` errors or authentication failures
Solution:
- Verify both API keys are correct
- Ensure secrets are properly configured in GitHub: Settings → Secrets and variables → Actions
- Check that you're using the `${{ secrets.SECRET_NAME }}` syntax
### Model Access Issues
Problem: Model not found or permission denied

Solution:
- Verify the fine-tuned model ID is correct
- Ensure your OpenAI API key has access to the specified model
- Check that the model is in a “succeeded” state (not still training or failed)
- Verify the model hasn’t been deleted
### Invalid Model Format
Problem: Invalid model name errors

Solution:
- Ensure you're using the full model identifier from OpenAI
- Check the format: `ft:<base-model>:<org>:<name>:<suffix>`
- Copy the exact model ID from OpenAI's fine-tuning dashboard
### High Failure Rates
Problem: Fine-tuned model fails safety tests

Solution:
- Review failed case details in the action output
- Compare results with base model testing
- Review your fine-tuning training data for safety issues
- Consider adding safety examples to your training data
- Test with different model sizes or base models
## Fine-Tune vs System Prompt
This action differs from `singleturn-evaluate-system-prompt` in key ways:

| Aspect | Fine-Tune Action | System Prompt Action |
|---|---|---|
| Model Source | OpenAI fine-tuned models | Any OpenRouter model |
| Authentication | Requires both CBL + OpenAI keys | Only requires CBL key |
| Use Case | Testing custom trained models | Testing prompt engineering |
| Model Parameter | `model-name` (full fine-tune ID) | `system-prompt` (text) + `openrouter-model-name` |
| API Endpoint | `/single-turn/evaluate-openai-fine-tune` | `/singleturn/evaluate-system-prompt` |
## Related Actions
- Single-Turn System Prompt - For testing system prompts
- Multi-Turn OpenAI Fine-Tune - For conversational testing of fine-tunes