NemoCustomizer is a production-ready service for fine-tuning large language models using Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA. It provides a scalable API for creating, managing, and deploying customized model adapters.

Overview

NemoCustomizer enables you to:
  • Fine-tune foundation models on proprietary datasets
  • Use LoRA/PEFT for memory-efficient training
  • Orchestrate multi-GPU training jobs with Volcano or Run.AI
  • Track experiments with Weights & Biases
  • Store and version adapters in NemoEntitystore
  • Serve multiple fine-tuned adapters from a single NIM instance

When to Use NemoCustomizer

Domain Adaptation

Adapt general models to specialized domains like healthcare, legal, or finance

Instruction Tuning

Train models to follow specific instruction formats or conversation styles

Style Transfer

Customize output style, tone, or formatting for brand consistency

Multi-Tenancy

Create customer-specific adapters for SaaS deployments

Architecture

NemoCustomizer consists of several components:
  • API Service: REST API for managing customization jobs
  • Training Jobs: Kubernetes Jobs/VolcanoJobs for model training
  • Model Storage: PVC for base models and training artifacts
  • Database: PostgreSQL for job metadata and state
  • Datastore Integration: Fetch training datasets
  • Entitystore Integration: Store trained adapters

Configuration

Complete Example

apiVersion: apps.nvidia.com/v1alpha1
kind: NemoCustomizer
metadata:
  name: nemocustomizer-sample
  namespace: nemo
spec:
  # Container image configuration
  image:
    repository: nvcr.io/nvidia/nemo-microservices/customizer-api
    tag: "25.08"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  
  # API service exposure
  expose:
    service:
      type: ClusterIP
      port: 8000
  
  # Replica configuration
  replicas: 1
  
  # Scheduler for training jobs (volcano or runai)
  scheduler:
    type: "volcano"
  
  # PostgreSQL database connection
  databaseConfig:
    credentials:
      user: ncsuser
      secretName: customizer-pg-existing-secret
      passwordKey: password
    host: customizer-pg-postgresql.nemo.svc.cluster.local
    port: 5432
    databaseName: ncsdb
  
  # NeMo EntityStore endpoint for storing adapters
  entitystore:
    endpoint: http://nemoentitystore-sample.nemo.svc.cluster.local:8000
  
  # NeMo DataStore endpoint for fetching datasets
  datastore:
    endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000
  
  # MLflow tracking server
  mlflow:
    endpoint: http://mlflow-tracking.nemo.svc.cluster.local:80
  
  # Weights & Biases configuration
  wandb:
    secretName: wandb-secret
    apiKeyKey: apiKey
    encryptionKey: encryptionKey
  
  # OpenTelemetry tracing
  otel:
    enabled: true
    exporterOtlpEndpoint: http://customizer-otel-opentelemetry-collector.nemo.svc.cluster.local:4317
  
  # Data store CLI tools image
  nemoDatastoreTools:
    image: nvcr.io/nvidia/nemo-microservices/nds-v2-huggingface-cli:25.08
  
  # Model download jobs configuration
  modelDownloadJobs:
    image: "nvcr.io/nvidia/nemo-microservices/customizer-api:25.08"
    ngcAPISecret:
      name: ngc-api-secret
      key: "NGC_API_KEY"
    securityContext:
      fsGroup: 1000
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 1000
    ttlSecondsAfterFinished: 600
    pollIntervalSeconds: 15
  
  # Model configuration ConfigMap
  modelConfig:
    name: nemo-model-config
  
  # Training job configuration
  trainingConfig:
    configMap:
      name: nemo-training-config
    
    # Base model storage PVC
    modelPVC:
      create: true
      name: finetuning-ms-models-pvc
      storageClass: ""
      volumeAccessMode: ReadWriteOnce
      size: 50Gi
    
    # Per-job workspace PVC
    workspacePVC:
      storageClass: "local-path"
      volumeAccessMode: ReadWriteOnce
      size: 10Gi
      mountPath: /pvc/workspace
    
    # Training container image
    image:
      repository: nvcr.io/nvidia/nemo-microservices/customizer
      tag: "25.08"
    
    # Environment variables for training
    env:
      - name: LOG_LEVEL
        value: INFO
    
    # Multi-node networking configuration
    networkConfig:
      - name: NCCL_IB_SL
        value: "0"
      - name: NCCL_IB_TC
        value: "41"
      - name: UCX_TLS
        value: TCP
      - name: UCX_NET_DEVICES
        value: eth0
    
    # Job lifecycle
    ttlSecondsAfterFinished: 3600
    timeout: 3600
    
    # GPU node configuration
    tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"

Key Configuration Fields

spec.scheduler.type (string, default "volcano")
  Training job scheduler. Options: volcano or runai.

spec.databaseConfig (object, required)
  PostgreSQL connection configuration for storing customization job metadata.

spec.entitystore.endpoint (string, required)
  NemoEntitystore service URL for uploading trained adapters.

spec.datastore.endpoint (string, required)
  NemoDatastore service URL for fetching training datasets.

spec.mlflow.endpoint (string, required)
  MLflow tracking server URL for experiment tracking.

spec.wandb (object)
  Weights & Biases configuration for experiment tracking and visualization.

spec.trainingConfig.modelPVC (object, required)
  Persistent volume for caching base models. Shared across training jobs.

spec.trainingConfig.workspacePVC (object, required)
  Per-job workspace configuration. Automatically created for each training job.

spec.trainingConfig.networkConfig (array)
  NCCL/networking environment variables for multi-node training.

Integration with Services

NemoDatastore Integration

NemoCustomizer fetches training datasets from NemoDatastore:
datastore:
  endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000
Datasets are referenced in customization requests and downloaded by training jobs.

NemoEntitystore Integration

Trained adapters are automatically uploaded to NemoEntitystore:
entitystore:
  endpoint: http://nemoentitystore-sample.nemo.svc.cluster.local:8000
Once uploaded, adapters can be served by NIM instances configured to poll the entitystore.
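On the NIM side, adapter discovery is typically configured through environment variables. A hedged sketch of the serving half of this handoff (the variable names NIM_PEFT_SOURCE and NIM_PEFT_REFRESH_INTERVAL are assumptions based on NIM's dynamic LoRA support; verify against your NIM version's documentation):

```yaml
# Hypothetical NIM deployment fragment: point NIM at the entitystore
# so it can discover and hot-load trained adapters. Variable names
# are assumptions; check your NIM release notes before using.
env:
  - name: NIM_PEFT_SOURCE
    value: "http://nemoentitystore-sample.nemo.svc.cluster.local:8000"
  - name: NIM_PEFT_REFRESH_INTERVAL
    value: "60"   # seconds between polls for newly registered adapters
```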

MLflow Tracking

Training metrics are logged to MLflow:
mlflow:
  endpoint: http://mlflow-tracking.nemo.svc.cluster.local:80

Weights & Biases

For advanced experiment tracking:
1. Create W&B Secret

kubectl create secret generic wandb-secret \
  --from-literal=apiKey=<YOUR_WANDB_API_KEY> \
  --from-literal=encryptionKey=<RANDOM_KEY> \
  -n nemo
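The <RANDOM_KEY> placeholder can be any sufficiently long random string; one way to generate one locally:

```shell
# Generate a 32-byte random key, hex-encoded (64 characters)
openssl rand -hex 32
```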
2. Configure W&B in Spec

wandb:
  secretName: wandb-secret
  apiKeyKey: apiKey
  encryptionKey: encryptionKey
  entity: your-team
  projectName: nemo-customization

Training Job Schedulers

Volcano Scheduler

Default scheduler for Kubernetes-native gang scheduling:
scheduler:
  type: "volcano"
Ensure Volcano is installed:
kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
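Before creating customization jobs, it is worth confirming the scheduler actually came up (the volcano-system namespace and CRD name below match the default install, but may differ in a customized deployment):

```shell
# Confirm the Volcano scheduler pods are running
kubectl get pods -n volcano-system

# Confirm the VolcanoJob CRD is registered
kubectl get crd jobs.batch.volcano.sh
```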

Run.AI Scheduler

For Run.AI-managed clusters:
scheduler:
  type: "runai"

trainingConfig:
  runaiQueue: default

Storage Configuration

Model PVC

Shared storage for base models:
trainingConfig:
  modelPVC:
    create: true
    name: finetuning-ms-models-pvc
    storageClass: "fast-ssd"  # Use fast storage for models
    volumeAccessMode: ReadWriteOnce
    size: 50Gi
Use ReadWriteMany if multi-node training jobs need to mount the model PVC from different nodes.
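A multi-node variant might look like the following sketch; the storage class name is a placeholder for any RWX-capable provisioner in your cluster (NFS, CephFS, a parallel filesystem):

```yaml
trainingConfig:
  modelPVC:
    create: true
    name: finetuning-ms-models-pvc
    storageClass: "nfs-client"     # placeholder: any RWX-capable class
    volumeAccessMode: ReadWriteMany
    size: 50Gi
```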

Workspace PVC

Per-job workspace (automatically created):
trainingConfig:
  workspacePVC:
    storageClass: "local-path"
    volumeAccessMode: ReadWriteOnce
    size: 10Gi
    mountPath: /pvc/workspace

API Usage

Create Customization Job

curl -X POST http://nemocustomizer-sample.nemo.svc.cluster.local:8000/v1/customizations \
  -H "Content-Type: application/json" \
  -d '{
    "name": "finance-model-v1",
    "model": "meta/llama-3.1-8b-instruct",
    "dataset": "finance-qa-dataset",
    "hyperparameters": {
      "lora_rank": 16,
      "lora_alpha": 32,
      "learning_rate": 2e-4,
      "num_epochs": 3
    }
  }'
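The response is JSON, so follow-up automation usually extracts the job identifier with jq. The field name .id below is an assumption about the response schema; inspect a raw response from your deployment first:

```shell
# Submit a customization request from a file and capture the job id
# for later status checks. request.json is a placeholder for the
# JSON payload shown above; the .id field name is an assumption.
curl -s -X POST http://nemocustomizer-sample.nemo.svc.cluster.local:8000/v1/customizations \
  -H "Content-Type: application/json" \
  -d @request.json | jq -r '.id'
```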

Monitor Job Status

curl http://nemocustomizer-sample.nemo.svc.cluster.local:8000/v1/customizations/finance-model-v1

List Customizations

curl http://nemocustomizer-sample.nemo.svc.cluster.local:8000/v1/customizations

Best Practices

  • Use dedicated GPU nodes for training jobs
  • Configure appropriate node selectors and tolerations
  • Set reasonable timeout values to prevent stuck jobs
  • Monitor GPU utilization and adjust batch sizes
  • Use fast storage (NVMe/SSD) for model PVCs
  • Pre-download large models to avoid repeated downloads
  • Clean up workspace PVCs after job completion
  • Version your datasets in NemoDatastore
  • Use descriptive names for customization jobs
  • Tag experiments with metadata (model, dataset, purpose)
  • Monitor training metrics in W&B or MLflow
  • Keep records of successful hyperparameter configurations
  • Run multiple replicas of the API service for HA
  • Use PostgreSQL with backups for metadata
  • Configure OpenTelemetry for observability
  • Implement proper secret rotation for credentials

Troubleshooting

If training jobs fail to start, check:
  • GPU node availability and labels
  • PVC creation and mounting
  • NGC pull secrets are valid
  • The Volcano/Run.AI scheduler is running
If training runs out of GPU memory:
  • Reduce the batch size in hyperparameters
  • Move to instance types with more GPU memory
  • Enable gradient checkpointing
  • Use smaller LoRA rank values
If training is slow:
  • Use NVMe storage for the model PVC
  • Tune NCCL settings for your network
  • Check for CPU bottlenecks in data loading
  • Use mixed precision training (automatic in NeMo)
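When a job misbehaves, the standard Kubernetes debugging loop applies. The commands below assume training pods run in the nemo namespace; the pod name is a placeholder, and the vcjob shorthand applies only to Volcano-scheduled jobs:

```shell
# Inspect pods and scheduling events for a stuck training job
kubectl get pods -n nemo
kubectl describe pod <training-pod-name> -n nemo

# Tail training logs for OOM or NCCL errors
kubectl logs <training-pod-name> -n nemo --tail=100

# If using Volcano, check the VolcanoJob status as well
kubectl get vcjob -n nemo
```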

Next Steps

Deploy Entitystore

Set up adapter storage and serving

Configure NIM

Enable dynamic LoRA loading in NIM

Setup Evaluation

Evaluate your fine-tuned models

API Reference

Detailed API documentation
