Overview

The NemoEvaluator resource provides model evaluation services with support for multiple benchmark frameworks including LM Eval Harness, BigCode, MT-Bench, RAG, and more.

API Group: apps.nvidia.com
API Version: v1alpha1
Kind: NemoEvaluator

Spec Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| image | object | yes | | Container image configuration. |
| databaseConfig | object | yes | | PostgreSQL database configuration. |
| argoWorkflows | object | yes | | Argo Workflows service configuration. |
| vectorDB | object | yes | | Vector database configuration. |
| datastore | object | yes | | NeMo Datastore endpoint. |
| entitystore | object | yes | | NeMo Entitystore endpoint. |
| evaluationImages | object | yes | | Container images for different evaluation frameworks. |
| otel | object | | | OpenTelemetry configuration. |
| evalLogLevel | string | | INFO | Evaluator log level (enum: INFO, DEBUG). |
| logHandlers | string | | console | Log sink handlers (enum: console, file). |
| consoleLogLevel | string | | INFO | Console log level (enum: INFO, DEBUG). |
| enableValidation | boolean | | true | Enable validation jobs. |
| expose | object | | | Service exposure configuration. |
| replicas | integer | | 1 | Number of replicas (minimum: 1). |
| scale | object | | | Autoscaling configuration. |
| metrics | object | | | Metrics collection configuration. |
| command | array | | | Override container command. |
| args | array | | | Container arguments. |
| env | array | | | Additional environment variables. |
| resources | object | | | Resource requirements. |
| nodeSelector | object | | | Node selector labels. |
| tolerations | array | | | Pod tolerations. |
| affinity | object | | | Pod affinity rules. |
| labels | object | | | Custom labels. |
| annotations | object | | | Custom annotations. |
| userID | integer | | 1000 | User ID for container security context. |
| groupID | integer | | 2000 | Group ID for container security context. |
| runtimeClass | string | | | Runtime class name. |
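The `databaseConfig.credentials` block references an existing Secret holding the PostgreSQL password rather than embedding it in the manifest. A minimal sketch of creating that Secret with kubectl; the Secret name and key match the Example below, while the namespace and password value are placeholders to adapt to your environment:

```shell
# Create the namespace (if it does not exist) and the password Secret
# referenced by spec.databaseConfig.credentials in the Example below.
kubectl create namespace nemo
kubectl create secret generic evaluator-pg-existing-secret \
  --namespace nemo \
  --from-literal=password='<your-db-password>'
```

The `passwordKey` field in `databaseConfig.credentials` must match the key used in `--from-literal` (here, `password`).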

Status Fields

| Field | Type | Description |
|---|---|---|
| conditions | array | Current state conditions. |
| availableReplicas | integer | Number of available replicas. |
| state | string | Current state (Pending, NotReady, Ready, Failed). |

Example

```yaml
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEvaluator
metadata:
  name: nemoevaluator-sample
  namespace: nemo
spec:
  evaluationImages:
    bigcodeEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-bigcode:0.12.21"
    lmEvalHarness: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-lm-eval-harness:0.12.21"
    similarityMetrics: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-custom-eval:0.12.21"
    llmAsJudge: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.21"
    mtBench: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-llm-as-a-judge:0.12.21"
    retriever: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-retriever:0.12.21"
    rag: "nvcr.io/nvidia/nemo-microservices/eval-tool-benchmark-rag:0.12.21"
    bfcl: "nvcr.io/nvidia/nemo-microservices/eval-factory-benchmark-bfcl:25.6.1"
    agenticEval: "nvcr.io/nvidia/nemo-microservices/eval-factory-benchmark-agentic-eval:25.6.1"
  image:
    repository: nvcr.io/nvidia/nemo-microservices/evaluator
    tag: "25.06"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  expose:
    service:
      type: ClusterIP
      port: 8000
  argoWorkflows:
    endpoint: https://argo-workflows-server.nemo.svc.cluster.local:2746
    serviceAccount: argo-workflows-executor
  vectorDB:
    endpoint: http://milvus.nemo.svc.cluster.local:19530
  datastore:
    endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000/v1/hf
  entitystore:
    endpoint: http://nemoentitystore-sample.nemo.svc.cluster.local:8000
  databaseConfig:
    host: evaluator-pg-postgresql.nemo.svc.cluster.local
    port: 5432
    databaseName: evaldb
    credentials:
      user: evaluser
      secretName: evaluator-pg-existing-secret
      passwordKey: password
  otel:
    enabled: true
    exporterOtlpEndpoint: http://evaluator-otel-opentelemetry-collector.nemo.svc.cluster.local:4317
  replicas: 1
```
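After applying the manifest, the status fields above can be used to follow the rollout. A sketch, assuming kubectl access to the target cluster and that the manifest was saved to a local file (the filename is illustrative):

```shell
# Apply the sample manifest.
kubectl apply -f nemoevaluator-sample.yaml

# Poll .status.state, which reports one of: Pending, NotReady, Ready, Failed.
kubectl get nemoevaluator nemoevaluator-sample -n nemo \
  -o jsonpath='{.status.state}'

# Inspect .status.conditions for details when the resource is not Ready.
kubectl describe nemoevaluator nemoevaluator-sample -n nemo
```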
