NemoEntitystore provides a storage and serving solution for fine-tuned model adapters (LoRA/PEFT). It enables NIM instances to dynamically load customer-specific adapters without redeploying the base model.

Overview

NemoEntitystore enables you to:
  • Store fine-tuned LoRA adapters
  • Serve adapters to NIM instances via HTTP API
  • Version and track adapter metadata
  • Enable multi-tenant model serving
  • Integrate with NemoCustomizer for automatic adapter upload
  • Support dynamic adapter loading in NIM

When to Use NemoEntitystore

Multi-Tenant Serving

Serve customer-specific adapters from a single NIM instance

Adapter Versioning

Track and manage different versions of fine-tuned models

A/B Testing

Deploy multiple adapter versions and route traffic

Cost Optimization

Share base model infrastructure across tenants

Architecture

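NemoEntitystore bridges NemoCustomizer training and NIM serving. NemoCustomizer uploads trained adapters to Entitystore, which stores the adapter files in NemoDatastore and the adapter metadata in PostgreSQL. NIM instances poll Entitystore, download new adapters, and load them for inference.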

Configuration

Complete Example

apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEntitystore
metadata:
  name: nemoentitystore-sample
  namespace: nemo
spec:
  # Container image configuration
  image:
    repository: nvcr.io/nvidia/nemo-microservices/entity-store
    tag: "25.08"
    pullPolicy: IfNotPresent
    pullSecrets:
      - ngc-secret
  
  # Service exposure
  expose:
    service:
      type: ClusterIP
      port: 8000
  
  # Replica configuration
  replicas: 1
  
  # NemoDatastore endpoint for adapter file storage
  datastore:
    endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000
  
  # PostgreSQL database for adapter metadata
  databaseConfig:
    databaseName: nesdb
    host: entity-store-pg-postgresql.nemo.svc.cluster.local
    port: 5432
    credentials:
      user: nesuser
      secretName: entity-store-pg-existing-secret
      passwordKey: password
  
  # Resource limits
  resources:
    requests:
      memory: "256Mi"
      cpu: "500m"
    limits:
      memory: "512Mi"
      cpu: "1"
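
The endpoints in this example follow the standard Kubernetes in-cluster DNS pattern `<service>.<namespace>.svc.cluster.local`. The sketch below shows how those URLs are composed. It assumes the default cluster domain `cluster.local` and that each Service is named after its resource, as the sample URLs on this page suggest. `service_url` is a hypothetical helper, not part of any NVIDIA tooling.

```python
def service_url(name: str, namespace: str, port: int = 8000, scheme: str = "http") -> str:
    """Build an in-cluster URL for a Kubernetes Service.

    Assumes the default cluster domain `cluster.local`.
    """
    return f"{scheme}://{name}.{namespace}.svc.cluster.local:{port}"


# Values taken from the sample manifests on this page.
ENTITYSTORE_URL = service_url("nemoentitystore-sample", "nemo")
DATASTORE_URL = service_url("nemodatastore-sample", "nemo")
```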

Key Configuration Fields

  • spec.datastore (object, required): NemoDatastore endpoint for storing adapter files.
      • endpoint (string, required): HTTP URL to NemoDatastore service.
  • spec.databaseConfig (object, required): PostgreSQL database configuration for adapter metadata and versioning.
  • spec.image.tag (string): Version of the entity-store image. Should match your NemoCustomizer version.

Integration with Services

NemoDatastore Integration

Entitystore uses Datastore to persist adapter files:
spec:
  datastore:
    endpoint: http://nemodatastore-sample.nemo.svc.cluster.local:8000
Adapter files are stored as Git repositories in Datastore, enabling:
  • Version control of adapters
  • Efficient storage with deduplication
  • Access via Git or HTTP

NemoCustomizer Integration

Customizer automatically uploads trained adapters:
# NemoCustomizer config
spec:
  entitystore:
    endpoint: http://nemoentitystore-sample.nemo.svc.cluster.local:8000
After training completes, adapters are:
  1. Packaged with metadata
  2. Uploaded to Entitystore
  3. Registered in the database
  4. Available for NIM to load

NIM Integration

Configure NIM to poll Entitystore for adapters:
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: llama-3-1-8b-instruct
  namespace: nemo
spec:
  env:
    # Enable PEFT adapter loading
    - name: NIM_PEFT_SOURCE
      value: http://nemoentitystore-sample.nemo.svc.cluster.local:8000
    
    # Refresh interval (seconds)
    - name: NIM_PEFT_REFRESH_INTERVAL
      value: "180"
    
    # Maximum adapters to cache
    - name: NIM_MAX_CPU_LORAS
      value: "16"
    - name: NIM_MAX_GPU_LORAS
      value: "8"
NIM polls Entitystore at the specified interval and automatically loads new adapters.
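
To make the polling behavior concrete, here is a minimal sketch of the reconciliation a poller could perform on each refresh. It is not NIM's actual implementation. It assumes the listing has the shape of the List Adapters response shown later on this page, and `diff_adapters` is a hypothetical name. This page does not state whether NIM evicts adapters that disappear from Entitystore, so the second returned set is for illustration only.

```python
from __future__ import annotations


def diff_adapters(loaded: set[str], listing: dict) -> tuple[set[str], set[str]]:
    """Compare loaded adapter ids with an Entitystore listing.

    Returns (to_load, no_longer_listed). Only adapters whose status is
    "ready" are considered loadable.
    """
    ready = {
        adapter["id"]
        for adapter in listing.get("adapters", [])
        if adapter.get("status") == "ready"
    }
    return ready - loaded, loaded - ready
```

A poller would call this once per `NIM_PEFT_REFRESH_INTERVAL` and act on the first set.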

API Usage

List Adapters

curl http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters
Response:
{
  "adapters": [
    {
      "id": "finance-qa-v1",
      "name": "finance-qa-v1",
      "base_model": "meta/llama-3.1-8b-instruct",
      "version": "1.0.0",
      "created_at": "2024-03-15T10:30:00Z",
      "status": "ready"
    }
  ]
}
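
The same call can be made from Python with the standard library. The sketch below assumes the response shape shown above. `list_adapters` and `ready_adapters_for` are hypothetical helper names. The filter is kept separate from the network call so it can be tested without a running service.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

BASE_URL = "http://nemoentitystore-sample.nemo.svc.cluster.local:8000"


def list_adapters(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET /v1/adapters and return the decoded JSON body."""
    with urlopen(f"{base_url}/v1/adapters", timeout=timeout) as resp:
        return json.load(resp)


def ready_adapters_for(listing: dict, base_model: str) -> list[str]:
    """Return sorted ids of adapters that are ready and match base_model."""
    return sorted(
        adapter["id"]
        for adapter in listing.get("adapters", [])
        if adapter.get("status") == "ready"
        and adapter.get("base_model") == base_model
    )
```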

Get Adapter Details

curl http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters/finance-qa-v1
Response:
{
  "id": "finance-qa-v1",
  "name": "finance-qa-v1",
  "base_model": "meta/llama-3.1-8b-instruct",
  "version": "1.0.0",
  "lora_config": {
    "rank": 16,
    "alpha": 32,
    "target_modules": ["q_proj", "v_proj"]
  },
  "download_url": "http://nemodatastore-sample.nemo.svc.cluster.local:8000/adapters/finance-qa-v1.tar.gz",
  "metadata": {
    "dataset": "finance-qa-dataset",
    "training_job": "customization-123"
  }
}
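
In the standard LoRA formulation, the low-rank update is scaled by `alpha / rank`, so the `lora_config` above corresponds to a scaling factor of 2.0. The sketch below computes it from a details response. It assumes the field names shown in the sample and the standard scaling rule. Variants such as rank-stabilized LoRA scale differently.

```python
def lora_scaling(adapter_details: dict) -> float:
    """Return alpha / rank for an adapter details payload.

    Assumes `lora_config.rank` and `lora_config.alpha` are present,
    as in the sample response on this page.
    """
    config = adapter_details["lora_config"]
    return config["alpha"] / config["rank"]
```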

Upload Adapter (Manual)

NemoCustomizer normally uploads adapters for you, but you can also upload one manually:
curl -X POST http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F 'metadata={
    "name": "custom-adapter-v1",
    "base_model": "meta/llama-3.1-8b-instruct",
    "version": "1.0.0"
  }'

Delete Adapter

curl -X DELETE http://nemoentitystore-sample.nemo.svc.cluster.local:8000/v1/adapters/finance-qa-v1

Adapter Lifecycle

  1. Training: NemoCustomizer fine-tunes a model and produces a LoRA adapter.
  2. Upload: Customizer packages the adapter with metadata and uploads to Entitystore.
  3. Registration: Entitystore stores adapter files in NemoDatastore, registers metadata in PostgreSQL, and validates adapter compatibility.
  4. Discovery: NIM polls Entitystore and discovers new adapters.
  5. Download: NIM downloads adapter files from NemoDatastore.
  6. Loading: NIM loads the adapter into memory (CPU or GPU).
  7. Serving: Adapter is available for inference requests.

Using Adapters in NIM

Request with Adapter

Specify the adapter in the request:
curl -X POST http://nim-service.nemo.svc.cluster.local:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama-3.1-8b-instruct",
    "messages": [
      {"role": "user", "content": "What are quarterly earnings?"}
    ],
    "peft_name": "finance-qa-v1"
  }'
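
For multi-tenant serving, an application typically picks the adapter per request. The sketch below builds the same request body as the curl example above, selecting the adapter from a tenant lookup. The `TENANT_ADAPTERS` mapping and the `chat_request_body` helper are hypothetical. The `peft_name` field is taken from the example on this page.

```python
from __future__ import annotations

# Hypothetical tenant-to-adapter mapping; maintain this in your own application.
TENANT_ADAPTERS = {"finance": "finance-qa-v1"}


def chat_request_body(
    tenant: str,
    prompt: str,
    base_model: str = "meta/llama-3.1-8b-instruct",
) -> dict:
    """Build a /v1/chat/completions body, adding the tenant's adapter if any."""
    body = {
        "model": base_model,
        "messages": [{"role": "user", "content": prompt}],
    }
    adapter = TENANT_ADAPTERS.get(tenant)
    if adapter is not None:
        # Tenants without an adapter fall back to the base model.
        body["peft_name"] = adapter
    return body
```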

List Loaded Adapters in NIM

curl http://nim-service.nemo.svc.cluster.local:8000/v1/models
Response includes base model and loaded adapters:
{
  "data": [
    {
      "id": "meta/llama-3.1-8b-instruct",
      "object": "model"
    },
    {
      "id": "finance-qa-v1",
      "object": "model",
      "owned_by": "peft"
    }
  ]
}
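
To check programmatically whether a given adapter has been loaded, you can filter this response. The sketch below assumes adapters are marked with `"owned_by": "peft"`, as in the sample above. `loaded_adapters` is a hypothetical helper name.

```python
from __future__ import annotations


def loaded_adapters(models_response: dict) -> list[str]:
    """Return ids of entries in a /v1/models response that are PEFT adapters."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if model.get("owned_by") == "peft"
    ]
```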

Database Schema

Entitystore uses PostgreSQL to track:
  • Adapter metadata (name, version, base model)
  • Training provenance (job ID, dataset, hyperparameters)
  • Storage locations (Datastore repository, file paths)
  • Status (ready, uploading, failed)
  • Access logs (who/when accessed)
The database is initialized automatically by an init container.

Best Practices

Adapter management:
  • Use semantic versioning (v1.0.0, v1.1.0)
  • Include metadata about training data and purpose
  • Test adapters before making them available
  • Archive old versions rather than deleting
Performance:
  • Monitor adapter cache hit rates in NIM
  • Adjust NIM_MAX_GPU_LORAS based on GPU memory
  • Use faster storage for frequently accessed adapters
  • Pre-load commonly used adapters
Multi-tenancy:
  • Use adapter naming conventions (tenant-purpose-version)
  • Implement access control if needed
  • Monitor adapter usage per tenant
  • Set resource quotas for adapter storage
Operations:
  • Run multiple replicas for HA
  • Back up PostgreSQL regularly
  • Monitor Datastore storage capacity
  • Implement adapter approval workflows
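
The tenant-purpose-version naming convention suggested above can be enforced with a small parser. This is a sketch under one possible reading of the convention: lowercase alphanumeric tenant, a purpose that may contain hyphens, and a `v<number>` suffix. The pattern and the `parse_adapter_name` helper are assumptions, not an Entitystore requirement.

```python
from __future__ import annotations

import re

# Assumed pattern: <tenant>-<purpose>-v<number>, e.g. "finance-qa-v1".
NAME_RE = re.compile(r"^(?P<tenant>[a-z0-9]+)-(?P<purpose>[a-z0-9-]+)-v(?P<version>\d+)$")


def parse_adapter_name(name: str) -> dict[str, str] | None:
    """Split an adapter name into tenant, purpose, and version, or return None."""
    match = NAME_RE.match(name)
    return match.groupdict() if match else None
```

Rejecting names that do not parse at upload time keeps per-tenant usage reporting straightforward.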

Monitoring

Key Metrics

  • Number of registered adapters
  • Adapter upload/download rates
  • Storage usage in Datastore
  • NIM adapter cache statistics
  • API request latency

Health Checks

# Entitystore health
curl http://nemoentitystore-sample.nemo.svc.cluster.local:8000/health

# Database connectivity
kubectl exec deployment/nemoentitystore-sample -n nemo -- \
  nc -zv entity-store-pg-postgresql.nemo.svc.cluster.local 5432

# Datastore connectivity
curl http://nemodatastore-sample.nemo.svc.cluster.local:8000/v1/health
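
The HTTP checks above can also be scripted, for example in a readiness probe or a smoke test. The sketch below uses only the standard library and treats any 2xx response as healthy. The `is_healthy` helper and its `opener` parameter (there to allow testing without a live service) are hypothetical. The URLs are the ones used on this page.

```python
from urllib.error import URLError
from urllib.request import urlopen

HEALTH_URLS = {
    "entitystore": "http://nemoentitystore-sample.nemo.svc.cluster.local:8000/health",
    "datastore": "http://nemodatastore-sample.nemo.svc.cluster.local:8000/v1/health",
}


def is_healthy(url: str, timeout: float = 5.0, opener=urlopen) -> bool:
    """Return True if the endpoint answers with a 2xx status."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return False
```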

Troubleshooting

If an adapter upload fails, check:
  • NemoDatastore is accessible
  • Database connection is healthy
  • Adapter file format is correct
  • Storage quota is not exceeded
If NIM does not load an adapter, verify:
  • NIM_PEFT_SOURCE points to Entitystore
  • Adapter is marked as “ready” in database
  • NIM has network access to Entitystore and Datastore
  • Adapter is compatible with base model
If database initialization fails:
  • Check init container logs
  • Verify PostgreSQL version compatibility
  • Ensure database user has proper permissions
  • Check if database already exists

Advanced Usage

Adapter Metadata Schema

Adapters support rich metadata:
{
  "name": "finance-qa-v2",
  "version": "2.0.0",
  "base_model": "meta/llama-3.1-8b-instruct",
  "lora_config": {
    "rank": 16,
    "alpha": 32,
    "dropout": 0.1
  },
  "training": {
    "dataset": "finance-qa-v2-dataset",
    "job_id": "customization-456",
    "hyperparameters": {
      "learning_rate": 2e-4,
      "epochs": 3
    }
  },
  "metrics": {
    "train_loss": 0.45,
    "eval_accuracy": 0.89
  },
  "tags": ["finance", "qa", "production"]
}
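
Because `version` is a string, comparing versions lexically gives wrong answers ("1.10.0" sorts before "1.9.0"). The sketch below picks the newest adapter by numeric comparison. It assumes plain `MAJOR.MINOR.PATCH` strings with an optional leading `v` and does not handle pre-release tags. Both helper names are hypothetical.

```python
from __future__ import annotations


def version_key(version: str) -> tuple[int, ...]:
    """Convert "1.10.0" or "v1.10.0" into a tuple that sorts numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def latest_adapter(adapters: list[dict]) -> dict:
    """Return the adapter metadata entry with the highest version."""
    return max(adapters, key=lambda adapter: version_key(adapter["version"]))
```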

Adapter Approval Workflow

Implement approval before serving:
  1. Adapter uploaded with status “pending”
  2. Review adapter quality and metrics
  3. Approve via API: PATCH /v1/adapters/{id} with {"status": "ready"}
  4. NIM only loads adapters with status “ready”
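
The approval call in step 3 can be scripted as follows. The sketch only constructs the request described above, so you can inspect it before sending it with `urllib.request.urlopen`. `approval_request` is a hypothetical helper, and the endpoint and payload are the ones given in step 3.

```python
import json
from urllib.request import Request


def approval_request(base_url: str, adapter_id: str) -> Request:
    """Build PATCH /v1/adapters/{id} with body {"status": "ready"}."""
    return Request(
        f"{base_url}/v1/adapters/{adapter_id}",
        data=json.dumps({"status": "ready"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```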

Next Steps

Train Adapters

Fine-tune models with NemoCustomizer

Serve with NIM

Configure NIM for dynamic adapter loading

Setup Datastore

Configure adapter file storage

API Reference

Detailed API documentation
