NemoEntitystore - NVIDIA NIM Operator

NemoEntitystore provides a storage and serving solution for fine-tuned model adapters (LoRA/PEFT). It enables NIM instances to dynamically load customer-specific adapters without redeploying the base model.

Overview

NemoEntitystore enables you to:

Store fine-tuned LoRA adapters
Serve adapters to NIM instances via HTTP API
Version and track adapter metadata
Enable multi-tenant model serving
Integrate with NemoCustomizer for automatic adapter upload
Support dynamic adapter loading in NIM

When to Use NemoEntitystore

Multi-Tenant Serving

Serve customer-specific adapters from a single NIM instance

Adapter Versioning

Track and manage different versions of fine-tuned models

A/B Testing

Deploy multiple adapter versions and route traffic

Cost Optimization

Share base model infrastructure across tenants

Architecture

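NemoEntitystore bridges NemoCustomizer training and NIM serving. NemoCustomizer uploads trained adapters to Entitystore, Entitystore stores the adapter files in NemoDatastore and the metadata in PostgreSQL, and NIM polls Entitystore to discover and load adapters.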

Configuration

Complete Example

apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEntitystore
metadata:
  name: nemoentitystore-sample
  namespace: nemo
spec:
  # Container image configuration
  image:
    repository: nvcr.io/nvidia/nemo-microservices/entity-store
    tag: "25.08"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  
  # Service exposure
  expose:
    service:
      type: ClusterIP
      port: 8000
  
  # Replica configuration
  replicas: 1
  
  # NemoDatastore endpoint for adapter file storage
  datastore:
    endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000
  
  # PostgreSQL database for adapter metadata
  databaseConfig:
    databaseName: nesdb
    host: entity-store-pg-postgresql.nemo.svc.cluster.local
    port: 5432
    credentials:
      user: nesuser
      secretName: entity-store-pg-existing-secret
      passwordKey: password
  
  # Resource limits
  resources:
    requests:
      memory: "256Mi"
      cpu: "500m"
    limits:
      memory: "512Mi"
      cpu: "1"

Key Configuration Fields

spec.datastore (object, required)
NemoDatastore endpoint for storing adapter files.

spec.datastore.endpoint (string, required)
HTTP URL to NemoDatastore service.

spec.databaseConfig (object, required)
PostgreSQL database configuration for adapter metadata and versioning.

spec.image.tag (string)
Version of the entity-store image. Should match your NemoCustomizer version.
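As a rough illustration of the required fields above, the following Python sketch checks a parsed spec mapping for them before the manifest is applied. The helper name `missing_required_fields` and the plain-dict input are assumptions made for this example; it is not part of the operator.

```python
# Illustrative only: required paths are taken from the field list above.
REQUIRED_PATHS = ("datastore.endpoint", "databaseConfig")


def missing_required_fields(spec: dict) -> list:
    """Return the required dotted paths that are absent or empty in spec."""
    missing = []
    for path in REQUIRED_PATHS:
        node = spec
        for key in path.split("."):
            # Walk one level down; stop as soon as a level is missing.
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if not node:
            missing.append(path)
    return missing


spec = {
    "datastore": {
        "endpoint": "http://nemodatastore-sample.nemo.svc.cluster.local:8000"
    },
    "databaseConfig": {"databaseName": "nesdb", "port": 5432},
}
print(missing_required_fields(spec))               # []
print(missing_required_fields({"datastore": {}}))  # ['datastore.endpoint', 'databaseConfig']
```

A check like this can run in CI against rendered manifests so that a missing endpoint is caught before the resource reaches the cluster.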

Integration with Services

NemoDatastore Integration

Entitystore uses Datastore to persist adapter files:

spec:
  datastore:
    endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000

Adapter files are stored as Git repositories in Datastore, enabling:

Version control of adapters
Efficient storage with deduplication
Access via Git or HTTP

NemoCustomizer Integration

Customizer automatically uploads trained adapters:

# NemoCustomizer config
spec:
  entitystore:
    endpoint: http://nemoentitystore-sample.nemo.svc.cluster.local:8000

After training completes, adapters are:

Packaged with metadata
Uploaded to Entitystore
Registered in the database
Available for NIM to load

NIM Integration

Configure NIM to poll Entitystore for adapters:

apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-1-8b-instruct
  namespace: nemo
spec:
  env:
    # Enable PEFT adapter loading
    - name: NIM_PEFT_SOURCE
      value: http://nemoentitystore-sample.nemo.svc.cluster.local:8000
    
    # Refresh interval (seconds)
    - name: NIM_PEFT_REFRESH_INTERVAL
      value: "180"
    
    # Maximum adapters to cache
    - name: NIM_MAX_CPU_LORAS
      value: "16"
    - name: NIM_MAX_GPU_LORAS
      value: "8"

NIM polls Entitystore at the specified interval and automatically loads new adapters.
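One way to picture each polling cycle is as a diff between the adapters an instance already holds and the adapters Entitystore currently lists. The sketch below is a hypothetical model of that step, not NIM's actual implementation. The function name `diff_adapters`, the adapter ID `legal-summary-v1`, and the assumption that unlisted adapters are dropped are all invented for illustration.

```python
def diff_adapters(loaded: set, listed: list) -> tuple:
    """Compare loaded adapter IDs with the latest listing.

    Returns (to_load, to_drop): IDs that are new in the listing, in listing
    order, and IDs that are loaded but no longer listed, sorted by name.
    """
    to_load = [adapter for adapter in listed if adapter not in loaded]
    to_drop = sorted(loaded - set(listed))
    return to_load, to_drop


# "legal-summary-v1" is a made-up ID used only for this example.
loaded = {"finance-qa-v1", "legal-summary-v1"}
listed = ["finance-qa-v1", "finance-qa-v2"]
print(diff_adapters(loaded, listed))
# (['finance-qa-v2'], ['legal-summary-v1'])
```

With a 180-second refresh interval, a newly registered adapter would appear in `to_load` on the first poll after registration, so the worst-case discovery delay is roughly one interval.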

API Usage

List Adapters

curl http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters

Response:

{
  "adapters": [
    {
      "id": "finance-qa-v1",
      "name": "finance-qa-v1",
      "base_model": "meta/llama-3.1-8b-instruct",
      "version": "1.0.0",
      "created_at": "2024-03-15T10:30:00Z",
      "status": "ready"
    }
  ]
}
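The same listing can be consumed from Python. The following is a minimal sketch that assumes the response shape shown above; `fetch_adapters` and `ready_adapter_ids` are illustrative helper names, and `fetch_adapters` only works from a host that can reach the service.

```python
import json
import urllib.request


def fetch_adapters(base_url: str) -> dict:
    """GET /v1/adapters and decode the JSON body (needs cluster network access)."""
    with urllib.request.urlopen(f"{base_url}/v1/adapters", timeout=10) as resp:
        return json.load(resp)


def ready_adapter_ids(payload: dict) -> list:
    """Return the IDs of adapters whose status is "ready"."""
    return [
        adapter["id"]
        for adapter in payload.get("adapters", [])
        if adapter.get("status") == "ready"
    ]


# Sample payload copied from the response above.
sample = {
    "adapters": [
        {
            "id": "finance-qa-v1",
            "name": "finance-qa-v1",
            "base_model": "meta/llama-3.1-8b-instruct",
            "version": "1.0.0",
            "created_at": "2024-03-15T10:30:00Z",
            "status": "ready",
        }
    ]
}
print(ready_adapter_ids(sample))  # ['finance-qa-v1']
```

Inside the cluster, `ready_adapter_ids(fetch_adapters("http://nemoentitystore-sample.nemo.svc.cluster.local:8000"))` would combine the two steps.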

Get Adapter Details

curl http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters/finance-qa-v1

Response:

{
  "id": "finance-qa-v1",
  "name": "finance-qa-v1",
  "base_model": "meta/llama-3.1-8b-instruct",
  "version": "1.0.0",
  "lora_config": {
    "rank": 16,
    "alpha": 32,
    "target_modules": ["q_proj", "v_proj"]
  },
  "download_url": "http://nemodatastore-sample.nemo.svc.cluster.local:8000/adapters/finance-qa-v1.tar.gz",
  "metadata": {
    "dataset": "finance-qa-dataset",
    "training_job": "customization-123"
  }
}

Upload Adapter (Manual)

NemoCustomizer normally uploads adapters, but you can also upload one manually:

curl -X POST http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F 'metadata={
    "name": "custom-adapter-v1",
    "base_model": "meta/llama-3.1-8b-instruct",
    "version": "1.0.0"
  }'

Delete Adapter

curl -X DELETE http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters/finance-qa-v1

Adapter Lifecycle

Training

NemoCustomizer fine-tunes a model and produces a LoRA adapter.

Upload

Customizer packages the adapter with metadata and uploads to Entitystore.

Registration

Entitystore:

Stores adapter files in NemoDatastore
Registers metadata in PostgreSQL
Validates adapter compatibility

Discovery

NIM polls Entitystore and discovers new adapters.

Download

NIM downloads adapter files from NemoDatastore.

Loading

NIM loads the adapter into memory (CPU or GPU).

Serving

Adapter is available for inference requests.

Using Adapters in NIM

Request with Adapter

Specify the adapter in the request:

curl -X POST http://nim-service.nemo.svc.cluster.local:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "What are quarterly earnings?"}
    ],
    "peft_name": "finance-qa-v1"
  }'
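For clients written in Python, the request body can be assembled programmatically. This sketch mirrors the curl payload above, including its `peft_name` field; the helper name `build_chat_request` is an assumption made for the example.

```python
import json
from typing import Optional


def build_chat_request(base_model: str, prompt: str, adapter: Optional[str] = None) -> str:
    """Build the JSON body for /v1/chat/completions.

    When adapter is None the request targets the base model only.
    """
    body = {
        "model": base_model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if adapter is not None:
        body["peft_name"] = adapter
    return json.dumps(body)


payload = build_chat_request(
    "meta/llama-3.1-8b-instruct",
    "What are quarterly earnings?",
    adapter="finance-qa-v1",
)
print(payload)
```

Keeping the adapter optional lets the same client code serve tenants with and without a fine-tuned adapter.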

List Loaded Adapters in NIM

curl http://nim-service.nemo.svc.cluster.local:8000/v1/models

Response includes base model and loaded adapters:

{
  "data": [
    {
      "id": "meta/llama-3.1-8b-instruct",
      "object": "model"
    },
    {
      "id": "finance-qa-v1",
      "object": "model",
      "owned_by": "peft"
    }
  ]
}
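To separate adapters from the base model in that response, a client can filter on the `owned_by` field. The sketch below assumes the response shape shown above; `loaded_adapter_ids` is an illustrative name.

```python
def loaded_adapter_ids(models_response: dict) -> list:
    """Return IDs of entries marked as PEFT adapters in a /v1/models response."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if model.get("owned_by") == "peft"
    ]


# Sample payload copied from the response above.
models_response = {
    "data": [
        {"id": "meta/llama-3.1-8b-instruct", "object": "model"},
        {"id": "finance-qa-v1", "object": "model", "owned_by": "peft"},
    ]
}
print(loaded_adapter_ids(models_response))  # ['finance-qa-v1']
```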

Database Schema

Entitystore uses PostgreSQL to track:

Adapter metadata (name, version, base model)
Training provenance (job ID, dataset, hyperparameters)
Storage locations (Datastore repository, file paths)
Status (ready, uploading, failed)
Access logs (who accessed an adapter and when)

The database is initialized automatically by an init container.

Best Practices

Adapter Management

Use semantic versioning (v1.0.0, v1.1.0)
Include metadata about training data and purpose
Test adapters before making them available
Archive old versions rather than deleting
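Semantic versions need numeric comparison, because plain string ordering ranks "v1.9.3" above "v1.10.0". A minimal sketch, assuming versions always have exactly three numeric parts and an optional leading "v"; the helper names are illustrative.

```python
def parse_version(version: str) -> tuple:
    """Turn "v1.10.0" or "1.10.0" into (1, 10, 0) for numeric comparison."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def latest(versions: list) -> str:
    """Return the highest version string by numeric comparison."""
    return max(versions, key=parse_version)


versions = ["v1.2.0", "v1.10.0", "v1.9.3"]
print(max(versions))     # v1.9.3  (string comparison, wrong)
print(latest(versions))  # v1.10.0 (numeric comparison)
```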

Performance

Monitor adapter cache hit rates in NIM
Adjust NIM_MAX_GPU_LORAS based on GPU memory
Use faster storage for frequently accessed adapters
Pre-load commonly used adapters
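To build intuition for how a cache limit such as NIM_MAX_GPU_LORAS affects hit rate, the following toy model replays a request sequence against a least-recently-used cache. The LRU eviction policy and the `AdapterCache` class are assumptions made for illustration; this document does not state which eviction policy NIM uses.

```python
from collections import OrderedDict


class AdapterCache:
    """Toy LRU cache that counts hits and misses for adapter requests."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._slots = OrderedDict()
        self.hits = 0
        self.misses = 0

    def request(self, adapter_id: str) -> None:
        if adapter_id in self._slots:
            self.hits += 1
            self._slots.move_to_end(adapter_id)  # mark as most recently used
            return
        self.misses += 1
        self._slots[adapter_id] = True
        if len(self._slots) > self.capacity:
            self._slots.popitem(last=False)  # evict the least recently used

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


requests = ["a", "b", "a", "c", "b", "a"]
for capacity in (2, 3):
    cache = AdapterCache(capacity)
    for adapter in requests:
        cache.request(adapter)
    print(capacity, cache.hits, cache.misses)
# 2 1 5
# 3 3 3
```

In this toy sequence, raising the capacity from 2 to 3 lifts the hit count from 1 to 3, which is the kind of effect to look for when tuning the limit against available GPU memory.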

Multi-Tenancy

Use adapter naming conventions (tenant-purpose-version)
Implement access control if needed
Monitor adapter usage per tenant
Set resource quotas for adapter storage
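A naming convention is most useful when tooling can parse it. The sketch below splits names of the form tenant-purpose-version, assuming the tenant and version contain no hyphens while the purpose may. `AdapterName`, `parse_adapter_name`, and the name `acme-contract-review-v2` are illustrative, not part of Entitystore.

```python
from typing import NamedTuple


class AdapterName(NamedTuple):
    tenant: str
    purpose: str
    version: str


def parse_adapter_name(name: str) -> AdapterName:
    """Split "tenant-purpose-version"; raises ValueError if a part is missing."""
    tenant, rest = name.split("-", 1)       # first hyphen ends the tenant
    purpose, version = rest.rsplit("-", 1)  # last hyphen starts the version
    return AdapterName(tenant, purpose, version)


print(parse_adapter_name("finance-qa-v1"))
# AdapterName(tenant='finance', purpose='qa', version='v1')
print(parse_adapter_name("acme-contract-review-v2").purpose)
# contract-review
```

Grouping parsed names by `tenant` gives a simple starting point for per-tenant usage monitoring.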

Production Deployment

Run multiple replicas for HA
Back up PostgreSQL regularly
Monitor Datastore storage capacity
Implement adapter approval workflows

Monitoring

Key Metrics

Number of registered adapters
Adapter upload/download rates
Storage usage in Datastore
NIM adapter cache statistics
API request latency

Health Checks

# Entitystore health
curl http://nemoentitystore-sample.nemo.svc.cluster.local:8000/health

# Database connectivity
kubectl exec deployment/nemoentitystore-sample -n nemo -- \
  nc -zv entity-store-pg-postgresql.nemo.svc.cluster.local 5432

# Datastore connectivity
curl http://nemodatastore-sample.nemo.svc.cluster.local:8000/v1/health
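The HTTP checks above can also be scripted for periodic monitoring. The following sketch treats any HTTP 200 response as healthy, which is an assumption about these endpoints. `is_healthy` is an illustrative name, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.request


def is_healthy(url: str, opener=urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False on any error."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, so DNS failures,
        # refused connections, timeouts, and 4xx/5xx responses all land here.
        return False


# Endpoints copied from the curl commands above.
ENDPOINTS = {
    "entitystore": "http://nemoentitystore-sample.nemo.svc.cluster.local:8000/health",
    "datastore": "http://nemodatastore-sample.nemo.svc.cluster.local:8000/v1/health",
}
```

From a pod inside the cluster, `{name: is_healthy(url) for name, url in ENDPOINTS.items()}` returns a small status map that can be logged or exported.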

Troubleshooting

Adapter Upload Fails

Check:

NemoDatastore is accessible
Database connection is healthy
Adapter file format is correct
Storage quota is not exceeded

NIM Not Loading Adapters

Verify:

NIM_PEFT_SOURCE points to Entitystore
Adapter is marked as “ready” in database
NIM has network access to Entitystore and Datastore
Adapter is compatible with base model

Database Migration Fails

Solutions:

Check init container logs
Verify PostgreSQL version compatibility
Ensure database user has proper permissions
Check if database already exists

Advanced Usage

Adapter Metadata Schema

Adapters support rich metadata:

{
  "name": "finance-qa-v2",
  "version": "2.0.0",
  "base_model": "meta/llama-3.1-8b-instruct",
  "lora_config": {
    "rank": 16,
    "alpha": 32,
    "dropout": 0.1
  },
  "training": {
    "dataset": "finance-qa-v2-dataset",
    "job_id": "customization-456",
    "hyperparameters": {
      "learning_rate": 2e-4,
      "epochs": 3
    }
  },
  "metrics": {
    "train_loss": 0.45,
    "eval_accuracy": 0.89
  },
  "tags": ["finance", "qa", "production"]
}

Adapter Approval Workflow

Implement approval before serving:

Adapter uploaded with status “pending”
Review adapter quality and metrics
Approve via API: PATCH /v1/adapters/{id} with {"status": "ready"}
NIM only loads adapters with status “ready”
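The approval step can be scripted with the Python standard library. This sketch only builds the PATCH request described above and does not send it; `build_approval_request` is an illustrative name, and the adapter ID in the example is reused from the metadata sample.

```python
import json
import urllib.request


def build_approval_request(base_url: str, adapter_id: str) -> urllib.request.Request:
    """Build PATCH /v1/adapters/{id} with body {"status": "ready"}."""
    body = json.dumps({"status": "ready"}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/adapters/{adapter_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


req = build_approval_request(
    "http://nemoentitystore-sample.nemo.svc.cluster.local:8000",
    "finance-qa-v2",
)
print(req.get_method(), req.full_url)
# PATCH http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters/finance-qa-v2
```

Passing the request to `urllib.request.urlopen(req)` from inside the cluster would submit the approval. Separating construction from sending makes it easy to log or review the call first.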

Next Steps

Train Adapters

Fine-tune models with NemoCustomizer

Serve with NIM

Configure NIM for dynamic adapter loading

Set Up Datastore

Configure adapter file storage

API Reference

Detailed API documentation
