NeMo Microservices is a suite of production-ready services designed to streamline the lifecycle of generative AI models. These microservices work together to provide a complete platform for customizing, evaluating, protecting, and serving LLMs at scale.

What are NeMo Microservices?

NeMo Microservices provide essential infrastructure components for enterprise AI deployments:
  • Model Customization: Fine-tune and adapt pre-trained models to your specific use cases
  • Model Evaluation: Assess model performance across multiple benchmarks and metrics
  • Safety & Compliance: Apply guardrails to ensure responsible AI deployment
  • Data Management: Store and version control training data, models, and artifacts
  • Entity Management: Track and serve fine-tuned model adapters (LoRA/PEFT)

Architecture

NeMo Microservices integrate with NVIDIA NIM to create a comprehensive AI platform: each service is deployed as a Kubernetes custom resource and works alongside NIM inference endpoints.

Available Services

NemoCustomizer

Fine-tune foundation models with your proprietary data using LoRA/PEFT techniques

NemoEvaluator

Evaluate model performance across standard benchmarks and custom metrics

NemoGuardrails

Add programmable guardrails to ensure safe and compliant AI interactions

NemoDatastore

Git-based storage for datasets, models, and training artifacts

NemoEntitystore

Manage and serve fine-tuned model adapters to NIM deployments

Key Features

Enterprise-Ready

  • Scalable Architecture: Horizontal scaling with Kubernetes HPA
  • High Availability: Multi-replica deployments with load balancing
  • Observability: Built-in OpenTelemetry support for tracing and monitoring
  • Security: PostgreSQL-backed persistence with secret management
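
As a sketch of the horizontal-scaling point above, a standard Kubernetes HorizontalPodAutoscaler can target a NeMo service's Deployment. The target name (`datastore`, matching the example later on this page) and the thresholds are illustrative assumptions, not values mandated by the operator:

```yaml
# Illustrative HPA for a NeMo service Deployment -- the target name
# and utilization thresholds are assumptions; adjust for your workload.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: datastore-hpa
  namespace: nemo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: datastore
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```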

Integration with NIM

NeMo Microservices are designed to work seamlessly with NVIDIA NIM:
  • Dynamic LoRA Loading: Serve multiple fine-tuned adapters from a single NIM instance
  • Model Versioning: Track and deploy different versions of customized models
  • Guardrail Integration: Apply safety policies at inference time
  • Performance Optimization: Efficient adapter switching without model reloading
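
To make dynamic LoRA loading concrete: with NIM's OpenAI-compatible API, a client selects a fine-tuned adapter simply by naming it in the request's `model` field. The adapter name and in-cluster endpoint below are placeholders for illustration, not values from this guide:

```shell
# Build a request that selects a hypothetical LoRA adapter by name.
# "llama-3.1-8b-customer-a" and the endpoint URL are placeholders.
cat <<'EOF' > request.json
{
  "model": "llama-3.1-8b-customer-a",
  "messages": [{"role": "user", "content": "Summarize this contract clause."}],
  "max_tokens": 128
}
EOF
# Send it to the NIM OpenAI-compatible endpoint (uncomment in-cluster):
# curl -s http://nim.nemo.svc.cluster.local:8000/v1/chat/completions \
#   -H "Content-Type: application/json" -d @request.json
```

Because the base model stays resident, switching between adapters this way avoids a full model reload.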

Production Deployment

  • Kubernetes Native: Full integration with Kubernetes ecosystem
  • Resource Management: GPU scheduling for training jobs (Volcano/Run.AI)
  • Storage Flexibility: Support for PVCs, object storage (S3/MinIO)
  • Database Support: PostgreSQL for metadata and state management
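
For the storage-flexibility point, a PersistentVolumeClaim is one common backing option. The claim name, storage class, and size here are placeholders; object storage (S3/MinIO) is the alternative mentioned above:

```yaml
# Example PVC for datasets and model artifacts -- name, storageClassName,
# and size are placeholders; substitute values for your cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nemo-artifacts
  namespace: nemo
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: standard
  resources:
    requests:
      storage: 100Gi
```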

Common Use Cases

  • Domain Specialization: Fine-tune general-purpose models on industry-specific data (legal, medical, financial) while maintaining the base model's capabilities.
  • Multi-Tenant Adapter Serving: Deploy a single NIM instance that serves multiple customer-specific fine-tuned adapters, reducing infrastructure costs.
  • Continuous Improvement: Evaluate models on production data, identify weaknesses, fine-tune with targeted datasets, and redeploy seamlessly.
  • Safety and Compliance: Implement guardrails to prevent harmful outputs, ensure regulatory compliance, and maintain brand safety.

Prerequisites

Before deploying NeMo Microservices, ensure you have:
1. NVIDIA NIM Operator: The NIM Operator must be installed in your Kubernetes cluster. See the Installation Guide.
2. PostgreSQL Database: Most services require PostgreSQL for metadata storage. Each service can use a separate database or schema.
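
The database credentials are passed to each service through a Kubernetes Secret (see `credentials.secretName` in the Getting Started example below). A minimal sketch, assuming the service expects a `password` key; verify the expected key name against the service documentation:

```yaml
# Secret consumed via credentials.secretName in the service specs.
# The key name "password" is an assumption -- confirm it in the
# NemoDatastore/NemoEntitystore documentation before use.
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: nemo
type: Opaque
stringData:
  password: change-me
```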
3. NGC Credentials: Pull secrets for accessing NeMo Microservices container images from NGC.
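
A pull secret for nvcr.io is typically created with kubectl. The secret name `ngc-secret` is a placeholder; the username is literally `$oauthtoken`, with your NGC API key as the password:

```shell
# Create an image pull secret for nvcr.io. Requires cluster access and
# an NGC API key exported as NGC_API_KEY. "ngc-secret" is a placeholder.
kubectl create secret docker-registry ngc-secret \
  --namespace nemo \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"
```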
4. GPU Resources: NemoCustomizer training jobs require GPU nodes. Configure appropriate node selectors and tolerations.
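
A typical way to steer training pods onto GPU nodes is a nodeSelector plus a toleration. The label and taint keys below are common conventions, not values required by NemoCustomizer; match them to your cluster's actual node labels and taints:

```yaml
# Pod-spec fragment illustrating GPU node targeting. The label
# (nvidia.com/gpu.present) and the taint key are conventional examples.
nodeSelector:
  nvidia.com/gpu.present: "true"
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
resources:
  limits:
    nvidia.com/gpu: 1
```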

Getting Started

The example below deploys the two storage services, NemoDatastore and NemoEntitystore, sharing a single PostgreSQL host:
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoDatastore
metadata:
  name: datastore
  namespace: nemo
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/datastore
    tag: "25.08"
  databaseConfig:
    host: postgres.nemo.svc.cluster.local
    port: 5432
    databaseName: ndsdb
    credentials:
      user: ndsuser
      secretName: postgres-credentials
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NemoEntitystore
metadata:
  name: entitystore
  namespace: nemo
spec:
  image:
    repository: nvcr.io/nvidia/nemo-microservices/entity-store
    tag: "25.08"
  datastore:
    endpoint: http://datastore.nemo.svc.cluster.local:8000
  databaseConfig:
    host: postgres.nemo.svc.cluster.local
    port: 5432
    databaseName: nesdb
    credentials:
      user: nesuser
      secretName: postgres-credentials
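
Assuming the manifest above is saved as `nemo-core.yaml` (a filename chosen here for illustration), apply it and check that the custom resources come up:

```shell
kubectl apply -f nemo-core.yaml
# Check the custom resources. The plural resource names below are a
# guess -- confirm them with: kubectl api-resources | grep -i nemo
kubectl get nemodatastores,nemoentitystores -n nemo
```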

Next Steps

Deploy Customizer

Set up model fine-tuning capabilities

Configure Guardrails

Add safety controls to your AI applications

Setup Evaluation

Implement model quality assessment

Storage Setup

Configure data and model storage