Overview
Evaluations (also called annotations) attach quality metrics to your traces. They can be:

- Trace evaluations: Score entire traces
- Span evaluations: Score individual spans
- Document evaluations: Score retrieved documents in RAG applications
Evaluations added through this API are recorded with an annotator_kind of "LLM" to distinguish them from human annotations.
Endpoints
Add Evaluations
Headers
Content-Type, one of:

- application/x-protobuf: Protocol Buffer format
- application/x-pandas-arrow: PyArrow table format (recommended)

Optional compression:

- gzip (for protobuf only)

Request Body

The request body format depends on the Content-Type:

Protocol Buffer Format (application/x-protobuf)
Binary Protocol Buffer containing an Evaluation message with:

- name (string, required): Name of the evaluation
- Evaluation data in protobuf format
The evaluation name must not be blank or empty.
PyArrow Format (application/x-pandas-arrow)
PyArrow IPC stream containing a table with one of these index structures:

For trace evaluations:

- Index: trace_id or context.trace_id
- Columns: score (float), label (string), explanation (string)

For span evaluations:

- Index: span_id or context.span_id
- Columns: score (float), label (string), explanation (string)

For document evaluations:

- Multi-index: [span_id, document_position] or [context.span_id, document_position]
- Columns: score (float), label (string), explanation (string)
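As a sketch, the three index structures above might be built as pandas DataFrames. The IDs and values here are placeholders, not data from the API; only the index and column names follow the schema above.

```python
import pandas as pd

# Trace evaluations: one row per trace, indexed by trace_id.
trace_evals = pd.DataFrame(
    {"score": [1.0], "label": ["correct"], "explanation": ["answer matches reference"]},
    index=pd.Index(["trace-123"], name="trace_id"),
)

# Span evaluations: one row per span, indexed by span_id.
span_evals = pd.DataFrame(
    {"score": [0.0], "label": ["hallucinated"], "explanation": ["cites a nonexistent paper"]},
    index=pd.Index(["span-456"], name="span_id"),
)

# Document evaluations: one row per retrieved document,
# multi-indexed by [span_id, document_position].
doc_evals = pd.DataFrame(
    {
        "score": [0.8, 0.2],
        "label": ["relevant", "irrelevant"],
        "explanation": ["on topic", "off topic"],
    },
    index=pd.MultiIndex.from_tuples(
        [("span-456", 0), ("span-456", 1)], names=["span_id", "document_position"]
    ),
)
```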
Response
Returns HTTP 204 (No Content) on success.

Example
Get Evaluations
Query Parameters
Name of the project to get evaluations from. Defaults to "default" if omitted.

Response
Returns a streaming response of PyArrow tables in application/x-pandas-arrow format. Each table contains evaluations for a specific evaluation name, grouped by type (trace/span/document).

Response Schema
trace evaluations
Tables with trace_id index containing:

- name: Evaluation name
- score: Numeric score
- label: Categorical label
- explanation: Text explanation
- Additional annotation metadata
span evaluations
Tables with span_id index containing the same fields as trace evaluations

document evaluations
Tables with multi-index [span_id, document_position] containing the same fields

Example
Evaluation Data Formats
Trace Evaluations
Evaluate entire conversation traces.

Span Evaluations
Evaluate individual LLM calls or operations.

Document Evaluations
Evaluate retrieved documents in RAG applications.

Error Handling
No evaluations found for the specified project
Unsupported content type. Must be application/x-protobuf or application/x-pandas-arrow

Invalid request body:
- Evaluation name is blank or empty
- Invalid PyArrow format
- Invalid data structure (wrong index columns)
Best Practices
Use PyArrow Format
PyArrow format is more efficient than protobuf for bulk evaluations
Include Explanations
Add explanation text to help understand evaluation decisions
Consistent Naming
Use consistent evaluation names across your project (e.g., “correctness”, “relevance”)
Batch Processing
Send multiple evaluations in a single request for better performance
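One way to batch, sketched with hypothetical span IDs: accumulate per-item evaluation frames and concatenate them into a single table, so a whole batch goes out in one request instead of one request per row.

```python
import pandas as pd

# Per-item evaluation frames accumulated during a run (IDs are placeholders).
batches = [
    pd.DataFrame(
        {"score": [1.0], "label": ["correct"], "explanation": ["ok"]},
        index=pd.Index([f"span-{i}"], name="span_id"),
    )
    for i in range(3)
]

# Concatenate into one table and send a single request for the whole batch.
combined = pd.concat(batches)
```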