Overview
The NemoEvaluator resource provides model evaluation services with support for multiple benchmark frameworks including LM Eval Harness, BigCode, MT-Bench, RAG, and more.
API Group: apps.nvidia.com
API Version: v1alpha1
Kind: NemoEvaluator
Spec Fields
image: Container image configuration.
  repository: Container image repository
databaseConfig: PostgreSQL database configuration.
  host: Database hostname (minimum length: 1)
  databaseName: Database name (minimum length: 1)
  credentials: Database credentials
    secretName: Secret containing the password
    passwordKey: Key in the Secret that holds the password
argoWorkflows: Argo Workflows service configuration.
  endpoint: Argo Workflows endpoint URL (pattern: ^http, minimum length: 1)
  serviceAccount: Service account for workflow execution
vectorDB: Vector database configuration.
  endpoint: Vector database endpoint URL (pattern: ^http, minimum length: 1)
datastore: NeMo Datastore endpoint.
  endpoint: Datastore endpoint URL (pattern: ^http, minimum length: 1)
entitystore: NeMo Entitystore endpoint.
  endpoint: Entitystore endpoint URL (pattern: ^http, minimum length: 1)
evaluationImages: Container images for different evaluation frameworks.
  bigcodeEvalHarness: BigCode evaluation harness image
  lmEvalHarness: LM evaluation harness image
  similarityMetrics: Similarity metrics evaluation image
  llmAsJudge: LLM-as-a-judge evaluation image
  mtBench: MT-Bench evaluation image
  retriever: Retriever evaluation image
otel: OpenTelemetry configuration.
  enabled: Enable OpenTelemetry tracing
  exporterOtlpEndpoint: OTLP collector endpoint URL
  Disable Python logging auto-instrumentation
  Exporter configuration
    Traces exporter (enum: otlp, console, none)
    Metrics exporter (enum: otlp, console, none)
    Logs exporter (enum: otlp, console, none)
  excludedUrls (array, default: ["health"]): URLs to exclude from tracing
  Log level (enum: INFO, DEBUG)
Evaluator log level (enum: INFO, DEBUG)
Log sink handlers (enum: console, file)
Console log level (enum: INFO, DEBUG)
Enable validation jobs (default: true)
expose: Service exposure configuration.
replicas: Number of replicas (minimum: 1)
Autoscaling configuration
Metrics collection configuration
Override container command
Additional environment variables
User ID for container security context (default: 1000)
Group ID for container security context (default: 2000)
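Several of the endpoint fields listed above share the same two constraints: a minimum length of 1 and the pattern ^http. Because the pattern only anchors the start of the string, both http:// and https:// URLs satisfy it. The fragment below is a small illustration that reuses endpoint values from the example in this document; the commented-out value is a hypothetical mistake, included only to show the kind of value the pattern would not match.

```yaml
spec:
  argoWorkflows:
    # Matches ^http ("https" also begins with "http")
    endpoint: https://argo-workflows-server.nemo.svc.cluster.local:2746
  vectorDB:
    # Matches ^http
    endpoint: http://milvus.nemo.svc.cluster.local:19530
    # Would not match ^http because the scheme is missing (hypothetical mistake):
    # endpoint: milvus.nemo.svc.cluster.local:19530
```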
Status Fields
Number of available replicas
Current state (Pending, NotReady, Ready, Failed)
Example
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEvaluator
metadata:
  name: nemoevaluator-sample
  namespace: nemo
spec:
  evaluationImages:
    bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.21"
    lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.21"
    similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.21"
    llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.21"
    mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.21"
    retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.21"
    rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.21"
    bfcl: "nvcr.io/nvidia/nemo-microservices/eval-factory-benchmark-bfcl:25.6.1"
    agenticEval: "nvcr.io/nvidia/nemo-microservices/eval-factory-benchmark-agentic-eval:25.6.1"
  image:
    repository: nvcr.io/nvidia/nemo-microservices/evaluator
    tag: "25.06"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  expose:
    service:
      type: ClusterIP
      port: 8000
  argoWorkflows:
    endpoint: https://argo-workflows-server.nemo.svc.cluster.local:2746
    serviceAccount: argo-workflows-executor
  vectorDB:
    endpoint: http://milvus.nemo.svc.cluster.local:19530
  datastore:
    endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000/v1/hf
  entitystore:
    endpoint: http://nemoentitystore-sample.nemo.svc.cluster.local:8000
  databaseConfig:
    host: evaluator-pg-postgresql.nemo.svc.cluster.local
    port: 5432
    databaseName: evaldb
    credentials:
      user: evaluser
      secretName: evaluator-pg-existing-secret
      passwordKey: password
  otel:
    enabled: true
    exporterOtlpEndpoint: http://evaluator-otel-opentelemetry-collector.nemo.svc.cluster.local:4317
  replicas: 1
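
The example refers to a Secret named evaluator-pg-existing-secret and reads the database password from its password key. A minimal sketch of such a Secret follows, assuming it lives in the same namespace as the NemoEvaluator resource. The password value is a placeholder, and in many setups the Secret will already have been created by whatever provisions the PostgreSQL instance, in which case nothing like this needs to be applied by hand.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: evaluator-pg-existing-secret   # matches credentials.secretName in the example
  namespace: nemo                      # assumed: same namespace as the NemoEvaluator
type: Opaque
stringData:
  # The key name matches credentials.passwordKey; the value is a placeholder.
  password: "<database-password>"
```

Using stringData lets the value be written in plain text; Kubernetes stores it base64-encoded under data when the Secret is created.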