Buy vs Build

The most important decision in ML infrastructure: Should we build or buy?
Building gives you control and customization. Buying gives you speed and support. Neither is universally better.

Decision Framework

Build when:
  • You have unique requirements (custom hardware, proprietary algorithms)
  • Cost at scale justifies engineering investment (>$500K/year cloud spend)
  • You have a strong ML platform team (5+ engineers)
  • Vendor lock-in is unacceptable
Buy when:
  • Time-to-market is critical (startup, new product)
  • Small team (fewer than 5 engineers)
  • Standard use cases (image classification, NER, embeddings)
  • Need SLA and support
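
The criteria above can be turned into a rough scoring sketch. The weights and thresholds below are illustrative assumptions for discussion, not a validated decision model:

```python
# Rough build-vs-buy scorer; weights and thresholds are illustrative only
def build_vs_buy(annual_cloud_spend: float, platform_engineers: int,
                 unique_requirements: bool, lock_in_unacceptable: bool,
                 time_to_market_critical: bool) -> str:
    build_score = 0
    if annual_cloud_spend > 500_000:   # cost at scale justifies investment
        build_score += 1
    if platform_engineers >= 5:        # strong ML platform team
        build_score += 1
    if unique_requirements:            # custom hardware, proprietary algorithms
        build_score += 1
    if lock_in_unacceptable:
        build_score += 1
    if time_to_market_critical:        # speed strongly favors buying
        build_score -= 2
    return 'build' if build_score >= 3 else 'buy'

print(build_vs_buy(1_000_000, 8, True, False, False))  # big spend, large team
print(build_vs_buy(50_000, 2, False, False, True))     # small startup
```

In practice the decision is rarely binary; treat a score near the threshold as a signal to look at the hybrid options on the spectrum below.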

The Build vs Buy Spectrum

Full Build ←―――――――――――――――――――――――――→ Full Buy
  K8s + OSS      Managed K8s       Platform Services     Pure SaaS
(Kubeflow+MLflow) (EKS+SageMaker)   (Vertex AI)         (OpenAI API)
Most companies end up in the middle: managed infrastructure + open-source tools.

Open Source Stack

Pro: No vendor lock-in, full control
Con: Requires deep expertise, ongoing maintenance
Example: Self-hosted K8s, Kubeflow, MLflow, Prometheus

Managed Platform

Pro: Fast setup, integrated services, SLA
Con: Vendor lock-in, less flexibility
Example: AWS SageMaker, Google Vertex AI, Azure ML

Serverless

Pro: Zero ops, pay-per-use, scales to zero
Con: Cold starts, vendor-specific APIs
Example: Modal, Replicate, Banana, Baseten

API Services

Pro: No infrastructure, state-of-the-art models
Con: Expensive at scale, data privacy concerns
Example: OpenAI, Anthropic, Cohere
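
"Expensive at scale" is worth quantifying. The sketch below compares a pay-per-token API against a self-hosted GPU fleet; every price in it (token price, GPU hourly rate, engineering overhead) is an assumption for illustration, not real vendor pricing:

```python
# Break-even sketch: pay-per-token API vs self-hosted GPUs.
# All prices are illustrative assumptions, not real vendor pricing.
def monthly_api_cost(tokens_per_month: float, price_per_1k_tokens: float) -> float:
    return tokens_per_month / 1000 * price_per_1k_tokens

def monthly_selfhost_cost(gpu_count: int, gpu_hourly: float,
                          engineer_overhead_monthly: float) -> float:
    # Assumes 24x7 utilization plus a share of an engineer's time for ops
    return gpu_count * gpu_hourly * 24 * 30 + engineer_overhead_monthly

api = monthly_api_cost(20e9, 0.002)              # 20B tokens at $0.002/1K (assumed)
hosted = monthly_selfhost_cost(4, 2.5, 15_000)   # 4 GPUs + partial engineer time
print(f"API: ${api:,.0f}/mo, self-hosted: ${hosted:,.0f}/mo")
```

At low volume the API side wins easily; the crossover point depends entirely on your utilization and the ops cost you can actually sustain, so rerun the arithmetic with your own numbers.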

Real-World Stack Examples

Startup (Seed Stage)

Goal: Ship fast, minimize ops

Stack:
  • Compute: Modal or Railway (serverless)
  • Data: Postgres (Supabase) + S3
  • Experiments: Weights & Biases
  • Serving: Modal.com or Replicate
  • Monitoring: Sentry + built-in metrics
Why:
  • No Kubernetes complexity
  • Pay-per-use (cost scales with revenue)
  • Focus on product, not infrastructure
Many successful companies stay on this stack for years. Don’t over-engineer early.

Mid-Size Company (Series A-B)

Goal: Control costs, improve reliability

Stack:
  • Infra: Managed Kubernetes (EKS/GKE)
  • Data: S3 + Snowflake/BigQuery
  • Experiments: W&B + MLflow
  • Pipelines: Airflow (Astronomer)
  • Serving: FastAPI on K8s + KServe
  • Monitoring: Prometheus + Grafana + Datadog
Why:
  • Managed services reduce ops burden
  • K8s for flexibility without full self-hosting
  • Mix of open-source and commercial tools

Large Enterprise

Goal: Scale, compliance, multi-tenancy

Stack:
  • Infra: Self-hosted K8s (on-prem or cloud)
  • Data: Data lake (Delta Lake) + feature store (Feast/Tecton)
  • Experiments: MLflow + custom platform
  • Pipelines: Airflow or Kubeflow
  • Serving: Custom inference framework + Triton
  • Monitoring: Prometheus + Grafana + Seldon
Why:
  • Full control for compliance (HIPAA, GDPR)
  • Cost optimization at scale
  • Custom features (multi-tenancy, chargeback)

AWS Example

A production ML system on AWS.

Components:
  • Data: S3 for raw data, Athena for queries, RDS for metadata
  • Processing: SageMaker Processing Jobs (Spark/Pandas at scale)
  • Training: EC2 with GPUs + SageMaker Training Jobs
  • Pipelines: MWAA (Managed Airflow)
  • Serving: SageMaker Multi-Model Endpoints
  • Monitoring: CloudWatch + SageMaker Model Monitor
SageMaker Multi-Model Endpoints:
import json
import boto3

sagemaker = boto3.client('sagemaker')
s3 = boto3.client('s3')

# Create the endpoint (assumes the endpoint config already exists)
sagemaker.create_endpoint(
    EndpointName='multi-model-endpoint',
    EndpointConfigName='multi-model-config'
)

# Upload model archives to the S3 prefix the endpoint reads from
s3.upload_file('model1.tar.gz', 'bucket', 'models/model1.tar.gz')
s3.upload_file('model2.tar.gz', 'bucket', 'models/model2.tar.gz')

# Invoke a specific model; it is loaded into memory on first request
runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName='multi-model-endpoint',
    TargetModel='model1.tar.gz',  # which model archive to run
    Body=json.dumps({'inputs': [...]})
)
Why Multi-Model Endpoints?
  • Share instance across 100s of models
  • Models loaded on-demand (saves memory)
  • Cost-effective for many small models
SageMaker is great for teams already on AWS. For multi-cloud or open-source preference, use EKS + Kubeflow.

GCP Example (Vertex AI)

Components:
  • Data: GCS + BigQuery
  • Processing: Dataflow (Apache Beam)
  • Training: Vertex AI Training (managed)
  • Pipelines: Vertex AI Pipelines (Kubeflow-based)
  • Serving: Vertex AI Prediction
  • Monitoring: Cloud Monitoring + Vertex AI Model Monitoring
Vertex AI simplifies Kubeflow:
from google.cloud import aiplatform

aiplatform.init(project='my-project', location='us-central1')

job = aiplatform.CustomContainerTrainingJob(
    display_name='bert-training',
    container_uri='gcr.io/my-project/trainer:v1',
)

model = job.run(
    replica_count=4,
    machine_type='n1-standard-8',
    accelerator_type='NVIDIA_TESLA_T4',
    accelerator_count=1
)

endpoint = model.deploy(machine_type='n1-standard-4')
Vertex AI handles:
  • Distributed training
  • Model versioning
  • A/B testing
  • Drift detection
Vertex AI is the most “batteries-included” cloud ML platform. Use it if you’re all-in on GCP.

Common Patterns

Hybrid Serving

Problem: Some models need GPUs, others don't. Running all on GPU is wasteful.

Solution:
API Gateway
  ├─> Lightweight models (CPU, FastAPI)
  └─> Heavy models (GPU, Triton)
Route based on model type:
from fastapi import FastAPI
from pydantic import BaseModel
import httpx

app = FastAPI()

class PredictRequest(BaseModel):
    model: str
    inputs: list

@app.post('/predict')
async def predict(request: PredictRequest):
    if request.model in ['bert', 'gpt']:
        # Route heavy models to the GPU cluster
        async with httpx.AsyncClient() as client:
            resp = await client.post('http://gpu-cluster/predict',
                                     json=request.dict())
        return resp.json()
    # Handle lightweight models locally on CPU
    return cpu_model.predict(request.inputs)

Feature Store

Problem: Training uses batch features, serving needs real-time features. Code diverges.

Solution: Centralized feature store (Feast, Tecton)
from feast import FeatureStore

store = FeatureStore(repo_path='.')

# Training: batch features
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=['user_features:age', 'user_features:country']
).to_df()

# Serving: online features
features = store.get_online_features(
    features=['user_features:age', 'user_features:country'],
    entity_rows=[{'user_id': 123}]
).to_dict()
Feature store ensures consistency.
Feature stores are essential for large teams (>20 data scientists) but overkill for small projects. Start simple.

Shadow Mode

Problem: New model needs validation before replacing old one.

Solution: Run both, compare predictions
@app.post('/predict')
def predict(request):
    # Production model
    prod_result = model_v1.predict(request.inputs)
    
    # Shadow model (logged, not returned)
    shadow_result = model_v2.predict(request.inputs)
    log_comparison(prod_result, shadow_result)
    
    return prod_result  # Only return prod
After validating shadow model performs well, swap it in.
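
The `log_comparison` helper in the snippet above is left undefined. One minimal sketch of it, with an assumed record schema, could look like this:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("shadow")

# Minimal sketch of the comparison logger; the record schema is an assumption
def log_comparison(prod_result, shadow_result):
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "prod": prod_result,
        "shadow": shadow_result,
        "agree": prod_result == shadow_result,
    }
    logger.info(json.dumps(record))
    return record

rec = log_comparison("cat", "dog")
print(rec["agree"])  # the two models disagree on this input
```

In a real system you would ship these records to your metrics store and watch the agreement rate (plus latency of the shadow model) before promoting v2.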

A/B Testing

import hashlib

@app.post('/predict')
def predict(request):
    user_id = request.user_id
    # Stable bucketing: Python's built-in hash() is salted per process,
    # so use a deterministic hash for consistent variant assignment
    bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    if bucket < 10:  # 10% of traffic to the new model
        result = model_v2.predict(request.inputs)
        variant = 'B'
    else:
        result = model_v1.predict(request.inputs)
        variant = 'A'

    log_prediction(user_id, result, variant)
    return result
Track metrics (accuracy, latency, user engagement) by variant.
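
A sketch of that per-variant aggregation, assuming each logged record carries `variant`, `correct`, and `latency_ms` fields (the schema is an assumption, not a fixed API):

```python
from collections import defaultdict

# Sketch: aggregate logged predictions into per-variant metrics.
# The record shape (variant, correct, latency_ms) is assumed.
def metrics_by_variant(logs):
    agg = defaultdict(lambda: {"n": 0, "correct": 0, "latency_ms": 0.0})
    for rec in logs:
        m = agg[rec["variant"]]
        m["n"] += 1
        m["correct"] += int(rec["correct"])
        m["latency_ms"] += rec["latency_ms"]
    return {
        v: {"accuracy": m["correct"] / m["n"],
            "avg_latency_ms": m["latency_ms"] / m["n"]}
        for v, m in agg.items()
    }

logs = [
    {"variant": "A", "correct": True,  "latency_ms": 20},
    {"variant": "A", "correct": False, "latency_ms": 30},
    {"variant": "B", "correct": True,  "latency_ms": 50},
]
print(metrics_by_variant(logs))
```

With the aggregates in hand, apply a significance test before declaring a winner; small traffic splits take a long time to reach meaningful sample sizes.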

Tech Radar

Module 8 includes a tech radar (inspired by Thoughtworks):
  • Adopt: Proven, safe for production
  • Trial: Worth trying in non-critical projects
  • Assess: Interesting but not ready
  • Hold: Avoid or phase out
Examples:
  • Adopt: FastAPI, Kubernetes, Prometheus, W&B
  • Trial: Dagster, vLLM, Modal
  • Assess: MLflow Model Registry, Seldon Core v2
  • Hold: TensorFlow (prefer PyTorch), Airflow 1.x
Tech radars are opinionated. Build your own based on team experience and requirements.
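
One way to keep your own radar honest is to store it as plain data in version control, so ring changes show up in code review. A minimal sketch, seeded with the examples above (the structure is an assumption, not a standard format):

```python
# A tech radar kept as plain data so it can live in version control.
# Ring names follow the Thoughtworks convention; entries come from this page.
RADAR = {
    "adopt": ["FastAPI", "Kubernetes", "Prometheus", "W&B"],
    "trial": ["Dagster", "vLLM", "Modal"],
    "assess": ["MLflow Model Registry", "Seldon Core v2"],
    "hold": ["TensorFlow", "Airflow 1.x"],
}

def ring_of(tool: str) -> str:
    for ring, tools in RADAR.items():
        if tool in tools:
            return ring
    return "unlisted"

print(ring_of("vLLM"))         # trial
print(ring_of("Airflow 1.x"))  # hold
```

A lookup like `ring_of` can then back a CI check that flags new dependencies sitting in "hold" or "unlisted".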

Hands-On Examples

Explore production patterns in Module 8:
  • Deploy multi-model endpoints on SageMaker
  • Understand buy vs build trade-offs
  • Review real-world architecture examples
  • Build a custom tech radar

Next Steps

Course Overview

Review all concepts

Module 1

Start with containerization
