Overview
The `/v1/chat/completions` endpoint provides an OpenAI-compatible API with CheckThat AI’s claim normalization and evaluation features. It is designed as a drop-in replacement for OpenAI’s API with additional capabilities.
Endpoint
`POST /v1/chat/completions`
Authentication
Use Bearer token authentication with your LLM provider’s API key.

Standard OpenAI Parameters
All standard OpenAI parameters are supported:

`messages` (array, required)
Array of message objects comprising the conversation. Each message must have:
- `role` (string): `system`, `user`, or `assistant`
- `content` (string): Message content

`model` (string, required)
The LLM model to use. Supports all CheckThat AI models:
- OpenAI: `gpt-4o`, `gpt-5-2025-08-07`, `o3-2025-04-16`, etc.
- Anthropic: `claude-sonnet-4-20250514`, `claude-opus-4-1-20250805`
- Google: `gemini-2.5-pro`, `gemini-2.5-flash`
- xAI: `grok-3`, `grok-4-0709`, `grok-3-mini`
- Together AI: `meta-llama/Llama-3.3-70B-Instruct-Turbo-Free`

`stream` (boolean)
Whether to stream the response. When `true`, responses are sent as Server-Sent Events (SSE).

`temperature` (number)
Sampling temperature (0.0 to 2.0). Higher values make output more random.

`max_tokens` (integer)
Maximum number of tokens to generate.

`max_completion_tokens` (integer)
Alternative to `max_tokens` for specifying maximum completion length.

`top_p` (number)
Nucleus sampling parameter (0.0 to 1.0).

`frequency_penalty` (number)
Penalty for token frequency (-2.0 to 2.0).

`presence_penalty` (number)
Penalty for token presence (-2.0 to 2.0).

`stop` (string or array)
Stop sequences where the API will stop generating.

`n` (integer)
Number of completions to generate (1 to 128).

`logprobs` (boolean)
Whether to return log probabilities.

`top_logprobs` (integer)
Number of most likely tokens to return (0 to 20).

`reasoning_effort` (string)
For reasoning models (o3, o4-mini): `low`, `medium`, or `high`.

`response_format` (object)
Specifies the output format. Supports structured output for compatible models.

`tools` (array)
List of tools the model can call.

`tool_choice` (string or object)
Controls which tool is called.
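As an illustration of the parameters above, a request body tuned for deterministic, short output might look like this (a sketch; the values are examples only):

```python
# Sketch of a request body using only standard OpenAI parameters.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the claim in one sentence."},
    ],
    "temperature": 0.2,   # low randomness for factual tasks
    "max_tokens": 256,    # cap completion length
    "top_p": 1.0,
    "n": 1,               # a single completion
    "stop": ["\n\n"],     # stop at the first blank line
}
```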
CheckThat AI Custom Parameters
These parameters enable CheckThat AI’s advanced features:

`refine_claims` (boolean)
Enable automatic claim refinement through iterative evaluation. When enabled, CheckThat AI will:
- Generate an initial response
- Evaluate claim quality
- Iteratively improve the claim until the threshold is met

Refinement model
Model to use for claim refinement. Can be different from the main model. Example: use `gpt-4o` for refinement even if using `gpt-3.5-turbo` for generation.

`threshold` (number)
Quality threshold (0.0 to 1.0) for claim refinement. Claims scoring below this threshold will be refined.

`max_iters` (integer)
Maximum refinement iterations before stopping.

Evaluation metrics
DeepEval metrics to use for claim evaluation. Can be a metric name or a custom metric instance.

Refinement API key
API key for the refinement model (if different from the main API key).
Response Format
Non-Streaming Response
Standard OpenAI `ChatCompletion` object with CheckThat AI extensions:
Response Fields
- `id`: Unique identifier for the completion
- `object`: Object type (`chat.completion`)
- `created`: Unix timestamp of creation
- `model`: Model used for generation
- `choices`: Array of completion choices
- `usage`: Token usage information
- Post-normalization evaluation results (CheckThat AI extension)
- Claim refinement metadata (CheckThat AI extension)
- Additional CheckThat AI metadata
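A minimal sketch of reading the documented fields from a parsed response. The values below are hypothetical; the CheckThat AI extension fields arrive as additional top-level keys alongside the standard ones:

```python
# Sketch: reading documented fields from a parsed (hypothetical) response.
response = {
    "id": "chatcmpl-abc123",            # illustrative values only
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-4o",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "The claim is supported."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21},
}

answer = response["choices"][0]["message"]["content"]
total_tokens = response["usage"]["total_tokens"]
```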
Streaming Response
Streaming responses are delivered in Server-Sent Events (SSE) format.

Example Requests
Basic Request
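A basic non-streaming request can be made with only the standard library. The base URL below is an assumption for illustration; substitute your actual CheckThat AI host:

```python
# Basic request sketch using only the Python standard library.
import json
import urllib.request

def chat(api_key: str, prompt: str, model: str = "gpt-4o") -> str:
    """Send one non-streaming chat completion request and return the text."""
    req = urllib.request.Request(
        "https://api.checkthat.ai/v1/chat/completions",  # assumed base URL
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```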
With Streaming
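A streaming request reads SSE lines and extracts each chunk’s delta. The base URL is again an assumption; the `data: [DONE]` sentinel follows the standard OpenAI streaming convention:

```python
# Streaming sketch: read Server-Sent Events line by line.
import json
import urllib.request

def parse_sse_line(line: str):
    """Return the delta text from one SSE data line, or None."""
    if not line.startswith("data: "):
        return None                    # comments / keep-alives / blank lines
    data = line[len("data: "):].strip()
    if data == "[DONE]":               # standard end-of-stream sentinel
        return None
    chunk = json.loads(data)
    return chunk["choices"][0]["delta"].get("content")

def stream_chat(api_key: str, prompt: str) -> None:
    req = urllib.request.Request(
        "https://api.checkthat.ai/v1/chat/completions",  # assumed base URL
        data=json.dumps({
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,            # request SSE output
        }).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            text = parse_sse_line(raw.decode())
            if text:
                print(text, end="", flush=True)
```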
With Claim Refinement
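A sketch of a refinement request body. `refine_claims`, `threshold`, and `max_iters` are the names used elsewhere in this document; the refinement-model parameter name is not stated here, so it is omitted:

```python
# Sketch of a claim-refinement request body.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Normalize this claim: ..."}],
    "refine_claims": True,  # enable iterative refinement
    "threshold": 0.7,       # refine claims scoring below 0.7
    "max_iters": 3,         # at most three refinement passes
}
```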
With System Prompt
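A system prompt is simply the first entry in the `messages` array:

```python
# Sketch: steer model behavior with a leading `system` message.
messages = [
    {"role": "system",
     "content": "You are a fact-checking assistant. Answer tersely."},
    {"role": "user",
     "content": "Is the Eiffel Tower taller than 300 meters?"},
]
payload = {"model": "gpt-4o", "messages": messages}
```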
Python with OpenAI SDK
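Because the endpoint is OpenAI-compatible, the official Python SDK can be pointed at it via `base_url` (the URL below is an assumption for illustration):

```python
# Sketch: use the official OpenAI SDK against CheckThat AI.
def ask(api_key: str, prompt: str) -> str:
    # Imported lazily; requires `pip install openai`.
    from openai import OpenAI

    client = OpenAI(api_key=api_key,
                    base_url="https://api.checkthat.ai/v1")  # assumed URL
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```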
Python with Streaming
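Streaming with the SDK is the standard `stream=True` pattern; each chunk exposes its text in `choices[0].delta.content`:

```python
# Sketch: streaming with the OpenAI SDK.
def stream_ask(api_key: str, prompt: str) -> str:
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(api_key=api_key,
                    base_url="https://api.checkthat.ai/v1")  # assumed URL
    parts = []
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,                      # yields chunks as they arrive
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)
```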
Python with Claim Refinement
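CheckThat AI’s custom parameters are not part of the SDK’s typed signature, so they can be passed through `extra_body` (a standard openai-python option); the base URL is again assumed:

```python
# Sketch: pass CheckThat AI custom parameters via the SDK's `extra_body`.
def refined_ask(api_key: str, prompt: str) -> str:
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(api_key=api_key,
                    base_url="https://api.checkthat.ai/v1")  # assumed URL
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        extra_body={                      # documented CheckThat AI parameters
            "refine_claims": True,
            "threshold": 0.7,
            "max_iters": 3,
        },
    )
    return resp.choices[0].message.content
```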
JavaScript/TypeScript
JavaScript with Streaming
Error Responses
Missing Authorization
Status Code: `401 Unauthorized`

Invalid API Key
Status Code: `403 Forbidden`

Validation Error
Status Code: `422 Unprocessable Entity`

Bad Request
Status Code: `400 Bad Request`

Server Error
Status Code: `500 Internal Server Error`
Claim Refinement Process
When `refine_claims: true` is set, CheckThat AI follows this process:
- Initial Generation - Generate response using specified model
- Quality Evaluation - Evaluate claim using DeepEval metrics
- Refinement Loop - If score < threshold:
- Generate feedback on claim quality
- Refine claim based on feedback
- Re-evaluate refined claim
- Repeat until threshold met or max iterations reached
- Return Enhanced Response - Include refinement metadata
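The loop above can be sketched abstractly, with `evaluate` and `refine` standing in for DeepEval scoring and the feedback-driven rewrite (a sketch, not the actual implementation):

```python
# Abstract sketch of the refinement loop described above.
from typing import Callable

def refine_claim(claim: str,
                 evaluate: Callable[[str], float],
                 refine: Callable[[str], str],
                 threshold: float = 0.7,
                 max_iters: int = 3) -> tuple[str, float, int]:
    """Refine `claim` until its score meets `threshold` or iterations run out."""
    score = evaluate(claim)
    iters = 0
    while score < threshold and iters < max_iters:
        claim = refine(claim)     # rewrite based on evaluator feedback
        score = evaluate(claim)   # re-evaluate the refined claim
        iters += 1
    return claim, score, iters    # refinement metadata for the response
```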
Refinement Metrics
Supported evaluation metrics (via DeepEval):
- `AnswerRelevancyMetric` - Measures answer relevance to the query
- `FaithfulnessMetric` - Checks factual consistency
- `ContextualPrecisionMetric` - Evaluates precision in context
- `ContextualRecallMetric` - Measures information recall
- `ContextualRelevancyMetric` - Assesses contextual relevance
- Custom metrics via DeepEval
Refinement Example
Rate Limiting
Same rate limits as other endpoints:
- 10 requests per 60 seconds per IP address
- Rate limit headers included in all responses
- Streaming requests count as a single request
Best Practices
Use Appropriate Models
Choose models based on your needs:
- gpt-4o: High-quality general purpose
- o3/o4-mini: Advanced reasoning tasks
- claude-sonnet-4: Nuanced analysis
- llama-3.3-70b: Free tier option
Enable Streaming for UX
Always use `stream: true` for user-facing applications to provide immediate feedback and a better user experience.

Set Appropriate Thresholds
When using claim refinement:
- Start with `threshold: 0.7` for most cases
- Use `threshold: 0.8` or higher for high-accuracy requirements
- Set `max_iters` to 3-5 to balance quality and cost
Handle Errors Gracefully
Implement retry logic with exponential backoff for transient errors. Log errors for debugging.
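A minimal sketch of retry with exponential backoff; delays double per attempt, and `max_attempts` and `base_delay` should be tuned for your workload:

```python
# Sketch of retry with exponential backoff for transient errors.
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 0.5):
    """Invoke `call()`, retrying on exception with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                # out of attempts
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

In production, retry only on transient statuses (429, 5xx) and log each failure for debugging.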
Monitor Token Usage
Track the `usage` field in responses to monitor costs and optimize prompts. Refinement increases token usage.

Implementation Details
The endpoint uses a service-layer architecture for a clean separation of concerns:

Service Layer (`api/services/chat/completions.py`)

LLM Router (`api/_utils/LLMRouter.py`)
Automatically selects the correct client based on model:
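A hypothetical sketch of that selection, keyed on the model families listed earlier; the actual `LLMRouter` implementation may differ:

```python
# Hypothetical sketch of model-prefix routing; the real LLMRouter may differ.
def select_provider(model: str) -> str:
    """Map a model name to its provider, based on the documented families."""
    if model.startswith(("gpt-", "o3", "o4")):
        return "openai"
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "google"
    if model.startswith("grok-"):
        return "xai"
    if "/" in model:                  # e.g. meta-llama/Llama-3.3-...
        return "together"
    raise ValueError(f"unknown model: {model}")
```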
Comparison with Standard OpenAI API
| Feature | OpenAI API | CheckThat AI |
|---|---|---|
| Chat completions | ✅ | ✅ |
| Streaming | ✅ | ✅ |
| Function calling | ✅ | ✅ |
| Multi-provider support | ❌ | ✅ |
| Claim refinement | ❌ | ✅ |
| Evaluation metrics | ❌ | ✅ |
| Refinement metadata | ❌ | ✅ |
| Drop-in compatible | N/A | ✅ |
Related Endpoints
Chat Endpoint
Simplified streaming chat interface
Models
List all available models
Authentication
Authentication methods and setup
Batch Processing
Process multiple claims efficiently