The @arizeai/phoenix-evals package provides a framework for evaluating LLM outputs using LLM-based evaluators and custom functions.
Installation
Quick Start
Built-in Evaluators
Phoenix provides ready-to-use evaluators for common LLM evaluation tasks.
Hallucination / Faithfulness
Detects when the model generates information not supported by the context.
createFaithfulnessEvaluator() (same functionality)
Required fields:
- output: The LLM’s response
- context: The context/documents provided to the LLM
Document Relevance
Evaluates whether the retrieved documents are relevant to the query.
Required fields:
- input: The user’s query
- context: The retrieved document(s)
Correctness
Compares the output against a reference answer.
Fields:
- output: The LLM’s response
- expected: The reference/correct answer
- input: The original query (optional but recommended)
Conciseness
Evaluates whether the response is appropriately concise.
Required fields:
- input: The user’s query
- output: The LLM’s response
Refusal
Detects when the model inappropriately refuses to answer.
Required fields:
- input: The user’s query
- output: The LLM’s response
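The field lists for the built-in evaluators above can be summarized in a small lookup table, which is handy for validating a dataset before running evaluations. This is a minimal sketch: the evaluator keys, the `FIELD_SPECS` table, and the `missingFields` helper are hypothetical names made up for illustration and are not exports of the package.

```typescript
// Hypothetical lookup table built from the field lists in this document.
// The keys and the helper below are illustrative, not package exports.
type FieldSpec = { required: string[]; optional?: string[] };

const FIELD_SPECS: Record<string, FieldSpec> = {
  hallucination: { required: ["output", "context"] },
  documentRelevance: { required: ["input", "context"] },
  correctness: { required: ["output", "expected"], optional: ["input"] },
  conciseness: { required: ["input", "output"] },
  refusal: { required: ["input", "output"] },
};

// Returns the names of required fields that are absent or empty in `record`.
function missingFields(
  evaluator: string,
  record: Record<string, unknown>
): string[] {
  const spec = FIELD_SPECS[evaluator];
  if (!spec) {
    throw new Error(`Unknown evaluator: ${evaluator}`);
  }
  return spec.required.filter((field) => {
    const value = record[field];
    return value === undefined || value === null || value === "";
  });
}

const sampleRecord = {
  input: "What is Phoenix?",
  output: "An observability tool.",
};

// Nothing is missing for the conciseness evaluator.
console.log(missingFields("conciseness", sampleRecord));
// The hallucination evaluator also needs `context`, which is absent here.
console.log(missingFields("hallucination", sampleRecord));
```

Checking records up front avoids spending LLM calls on examples that cannot be evaluated.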
Tool Calling Evaluators
Evaluate tool/function calling behavior:
Custom Evaluators
Classification Evaluator
Create a custom binary or multi-class classifier:
Function-Based Evaluator
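To illustrate the idea of a function-based evaluator, here is a minimal sketch of a deterministic check written as a plain function. It does not use the package's API: the `SketchResult` shape (score, label, explanation) and the `exactMatchEvaluator` name are assumptions chosen for illustration, and the types the package actually exports may differ.

```typescript
// Assumed result shape; the field names are illustrative and may differ
// from the types the package actually exports.
interface SketchResult {
  score: number;
  label: string;
  explanation: string;
}

interface SketchExample {
  output: string;
  expected: string;
}

// A deterministic evaluator: exact match after trimming and lowercasing.
function exactMatchEvaluator(example: SketchExample): SketchResult {
  const normalize = (text: string): string => text.trim().toLowerCase();
  const matched = normalize(example.output) === normalize(example.expected);
  return {
    score: matched ? 1 : 0,
    label: matched ? "match" : "mismatch",
    explanation: matched
      ? "Output equals the expected answer after normalization."
      : "Output differs from the expected answer.",
  };
}

const matchResult = exactMatchEvaluator({ output: " Paris ", expected: "paris" });
console.log(matchResult.label, matchResult.score);
```

Function-based checks like this are cheap and repeatable, so they pair well with LLM-based evaluators for criteria that can be verified mechanically.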
Create an evaluator from a custom function:
LLM-Based Custom Evaluator
Evaluation Result
All evaluators return a result object:
Batch Evaluation
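One common way to run many evaluations concurrently is `Promise.all` over an array of examples. The sketch below assumes an evaluator is simply an async function from an example to a result. `lengthEvaluator` is a stand-in for an LLM-backed evaluator, and `evaluateBatch` is a hypothetical helper, not a package export.

```typescript
interface BatchExample {
  input: string;
  output: string;
}

interface BatchResult {
  score: number;
  label: string;
}

// Stand-in evaluator. A real one would call an LLM and be I/O bound,
// which is why running the calls concurrently pays off.
async function lengthEvaluator(example: BatchExample): Promise<BatchResult> {
  const concise = example.output.length <= 80;
  return { score: concise ? 1 : 0, label: concise ? "concise" : "verbose" };
}

// Starts all evaluations at once and resolves when every one has finished.
// Promise.all keeps results in the same order as the input examples.
async function evaluateBatch<T, R>(
  examples: T[],
  evaluate: (example: T) => Promise<R>
): Promise<R[]> {
  return Promise.all(examples.map((example) => evaluate(example)));
}

const batchExamples: BatchExample[] = [
  { input: "Capital of France?", output: "Paris." },
  { input: "Capital of France?", output: "word ".repeat(30) },
];

evaluateBatch(batchExamples, lengthEvaluator).then((results) => {
  results.forEach((result, index) => console.log(index, result.label));
});
```

Note that this starts every call at once. With a real LLM provider you would likely want to cap concurrency to stay within rate limits, and `Promise.all` rejects as soon as any single evaluation fails.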
Evaluate multiple examples in parallel:
Model Configuration
Phoenix evals support multiple LLM providers:
OpenAI
Anthropic
Google (Gemini)
Azure OpenAI
Template System
Customize evaluation prompts using templates:
Template Variables
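As a rough illustration of what extracting variables from a template involves, the sketch below scans a prompt for placeholders and fills them in. It assumes Mustache-style `{{name}}` placeholders. `extractTemplateVariables` and `renderTemplate` are hypothetical helpers written for this example and are not the package's own implementation.

```typescript
// Assumes Mustache-style placeholders such as {{input}}.
// Both helpers are illustrative, not the package's implementation.
function extractTemplateVariables(template: string): string[] {
  const pattern = /\{\{\s*([A-Za-z_][A-Za-z0-9_.]*)\s*\}\}/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(template)) !== null) {
    const name = match[1];
    // Keep first-seen order and skip duplicates.
    if (name !== undefined && names.indexOf(name) === -1) {
      names.push(name);
    }
  }
  return names;
}

// Fills each placeholder from `values`; unknown placeholders are left as-is.
function renderTemplate(
  template: string,
  values: Record<string, string>
): string {
  return template.replace(
    /\{\{\s*([A-Za-z_][A-Za-z0-9_.]*)\s*\}\}/g,
    (whole: string, name: string): string => {
      const value = values[name];
      return value !== undefined ? value : whole;
    }
  );
}

const promptTemplate =
  "Question: {{input}}\nAnswer: {{ output }}\nIs the answer to {{input}} concise?";

console.log(extractTemplateVariables(promptTemplate));
console.log(renderTemplate(promptTemplate, { input: "2+2", output: "4" }));
```

Listing the variables first makes it possible to check that every example supplies the fields a template needs before any prompt is sent.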
Extract variables from a template:
Binding Evaluators
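Pre-filling inputs is essentially partial application: some fields are fixed once, and callers supply the rest. The sketch below shows that general technique with a hypothetical `bindInputs` helper and a toy keyword evaluator. It is not the package's binding helper, whose signature and semantics may differ.

```typescript
type SimpleEvaluator<T> = (example: T) => { score: number; label: string };

// Returns a new evaluator whose `fixed` fields are merged into every call,
// so callers only supply the remaining fields. Hypothetical helper.
function bindInputs<T extends object, K extends keyof T>(
  evaluator: SimpleEvaluator<T>,
  fixed: Pick<T, K>
): SimpleEvaluator<Omit<T, K>> {
  return (rest) => evaluator({ ...fixed, ...rest } as unknown as T);
}

interface RelevanceExample {
  input: string;
  context: string;
}

// Toy evaluator: the context counts as relevant if it mentions the query.
const keywordRelevance: SimpleEvaluator<RelevanceExample> = ({ input, context }) => {
  const relevant = context.toLowerCase().indexOf(input.toLowerCase()) !== -1;
  return { score: relevant ? 1 : 0, label: relevant ? "relevant" : "unrelated" };
};

// Fix the query once, then evaluate many documents against it.
const relevantToPhoenix = bindInputs(keywordRelevance, { input: "phoenix" });

console.log(relevantToPhoenix({ context: "Phoenix is an observability tool." }).label);
console.log(relevantToPhoenix({ context: "A recipe for sourdough bread." }).label);
```

Binding keeps call sites short when one field, such as the query or a reference answer, stays constant across a whole batch.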
Create an evaluator with pre-filled inputs:
Helper Functions
toEvaluationResult()
Convert custom data to a standard evaluation result:
asEvaluatorFn()
Convert a function to an evaluator:
Integration Examples
With Phoenix Client
With Vercel AI SDK
TypeScript Types
The package provides full TypeScript support:
See Also
- TypeScript Client Reference - Interact with Phoenix
- TypeScript OTEL Reference - Set up tracing
- Evaluations Guide - Learn about evaluation concepts