## Introduction

Kubernetes is powerful but complex. For small teams, pet projects, or when you simply don’t want to manage infrastructure, serverless platforms offer simpler alternatives.

Serverless doesn’t mean “no servers”; it means you don’t manage them. The platform handles scaling, orchestration, and infrastructure automatically.
## When to Choose Serverless

### Good Use Cases

- **Small teams**: No dedicated DevOps resources
- **Rapid prototyping**: Get from idea to production in minutes
- **Variable workloads**: Scale to zero when idle, scale up on demand
- **Focus on code**: Spend time on ML, not infrastructure
### When to Use Kubernetes Instead

- **Complex microservices**: Many interdependent services
- **Strict cost control**: Reserved capacity is cheaper than pay-per-use at sustained, high utilization
- **Custom infrastructure**: Need specific networking, storage, or security setups
- **Vendor lock-in concerns**: Kubernetes provides portability

> Serverless platforms can become expensive at scale. Always monitor costs and compare with dedicated infrastructure as your usage grows.
## Modal: Serverless for AI/ML

Modal is purpose-built for AI/ML workloads. It provides serverless GPU access, automatic scaling, and a Python-native API.

### Why Modal?

- **GPU support**: Access H100, A100, and other GPUs without provisioning
- **Python-first**: Define infrastructure with Python decorators
- **Fast iteration**: Hot-reload code without rebuilding containers
- **Automatic scaling**: Scale from 0 to 1000s of containers
- **Built-in orchestration**: Distributed map, parallel jobs, scheduled functions
### Installation

```bash
# Install Modal
uv add modal

# Or with pip
pip install modal

# Authenticate
modal token new
```

This opens your browser to complete authentication.
### Hello World Example

```python
import sys

import modal

app = modal.App("ml-in-production-module-1")

@app.function()
def f(i):
    if i % 2 == 0:
        print("hello", i)
    else:
        print("world", i, file=sys.stderr)
    return i * i

@app.local_entrypoint()
def main():
    # run the function remotely on Modal
    print(f.remote(1000))

    # run the function in parallel and remotely on Modal
    total = 0
    for ret in f.map(range(20)):
        total += ret
    print(total)
```

Run the example:

```bash
uv run modal run -d ./modal-examples/modal_hello_world.py
```
### Key Features Demonstrated

- **Remote execution**: `f.remote(1000)` runs the function on Modal’s infrastructure
- **Parallel processing**: `f.map(range(20))` distributes work across multiple containers
- **Local entrypoint**: `@app.local_entrypoint()` runs locally and orchestrates remote functions

Modal handles containerization automatically. You don’t write Dockerfiles; you specify dependencies in your Python code.
## ML Training Example

Modal excels at GPU-accelerated training. Here’s a real-world example that fine-tunes a language model:

`modal_hello_world_training.py`:

```python
import modal

app = modal.App("function-calling-finetune")

image = (
    modal.Image.debian_slim()
    .pip_install(
        [
            "transformers==4.51.2",
            "peft==0.15.1",
            "bitsandbytes==0.45.4",
            "trl==0.16.1",
            "datasets==3.5.0",
            "torch==2.2.1",
            "accelerate==1.5.2",
            "wandb==0.19.8",
        ]
    )
    .env({"WANDB_PROJECT": "function-calling-finetune"})
)

with image.imports():
    from enum import Enum

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer
    from peft import LoraConfig, TaskType, PeftConfig, PeftModel

DATASET_NAME = "Jofthomas/hermes-function-calling-thinking-V1"
USERNAME = "truskovskiyk"
MODEL_NAME = "google/gemma-3-4b-it"
OUTPUT_DIR = "gemma-3-4b-it-function-calling"

@app.function(
    image=image,
    cloud="aws",
    gpu="H200",
    timeout=86400,
    secrets=[modal.Secret.from_name("training-config")],
)
def function_calling_finetune():
    set_seed(42)

    dataset_name = DATASET_NAME
    username = USERNAME
    model_name = MODEL_NAME
    output_dir = OUTPUT_DIR

    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # ... (training code continues)
```

Run training:

```bash
uv run modal run -d ./modal-examples/modal_hello_world_training.py::function_calling_finetune
```
## Modal Features Breakdown

### Container Image Definition

```python
image = (
    modal.Image.debian_slim()
    .pip_install(["transformers==4.51.2", "torch==2.2.1"])
    .env({"WANDB_PROJECT": "my-project"})
)
```

Define dependencies programmatically, with no Dockerfile needed.

### GPU Allocation

```python
@app.function(
    image=image,
    cloud="aws",
    gpu="H200",
    timeout=86400,
)
```

Request specific GPU types and cloud providers.

### Secret Management

```python
secrets = [modal.Secret.from_name("training-config")]
```

Securely inject API keys and credentials.

### Distributed Computing

```python
results = f.map(data_batches, order_outputs=False)
```

Automatically parallelize work across containers.
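Conceptually, `f.map` is a distributed version of a local parallel map. A rough local analogy using only the Python standard library (no Modal involved), which computes the same total as the hello-world example above:

```python
from concurrent.futures import ThreadPoolExecutor

def square(i):
    return i * i

# Local stand-in for f.map: fan the inputs out across a worker pool
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(20)))

print(sum(results))  # 2470
```

The difference is that Modal runs each call in its own container and scales the pool for you.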
## Modal Best Practices

**Optimize cold starts.** Use slim base images, and load expensive resources once per container instead of once per request. With Modal, per-container setup goes in an `@modal.enter()` method on an `@app.cls()` class:

```python
# Use slim base images
image = modal.Image.debian_slim()

# Cache expensive operations
@app.cls(image=image)
class Server:
    @modal.enter()
    def load(self):
        # This loads once per container lifetime
        self.model = load_model()

    @modal.web_endpoint()
    def serve(self, request: dict):
        return self.model.predict(request["data"])
```

**Volume storage.** Persist data across runs:

```python
# Persist data across runs
volume = modal.Volume.from_name("my-data")

@app.function(volumes={"/data": volume})
def train():
    # Read from /data
    dataset = load_from_disk("/data/dataset")

    # Write results
    model.save("/data/model")
    volume.commit()  # Persist changes
```

**Scheduled jobs.** Run recurring work on a schedule:

```python
# Run daily training
@app.function(
    schedule=modal.Period(days=1),
    secrets=[modal.Secret.from_name("api-keys")],
)
def daily_retrain():
    download_latest_data()
    train_model()
    deploy_model()
```
## Modal Pricing

Modal charges for compute time only (no idle costs):

- **CPU**: ~$0.0001/CPU-second
- **GPU**: ~$1.10/hour for A10G, ~$4.50/hour for A100
- **Storage**: Volumes are billed separately

You only pay when functions are executing. Containers scale to zero automatically, making Modal cost-effective for intermittent workloads.
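As a back-of-envelope check, per-second billing makes short GPU bursts cheap. The rates below are the approximate figures quoted above; check Modal’s pricing page for current numbers:

```python
# Approximate rates; verify against Modal's current pricing
A100_PER_HOUR = 4.50

def gpu_cost(seconds, rate_per_hour=A100_PER_HOUR):
    # Per-second billing: pay only for actual execution time
    return seconds / 3600 * rate_per_hour

# A 90-second fine-tuning smoke test on an A100
print(round(gpu_cost(90), 4))  # 0.1125
```

At these rates, even a full hour on an A100 costs about $4.50, while idle time costs nothing.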
## Railway: Simple App Deployment

Railway provides simple deployment for web applications and APIs. It’s ideal for model serving endpoints.

### Why Railway?

- **Zero config**: Deploy from GitHub with one click
- **Databases included**: PostgreSQL, Redis, MongoDB built in
- **Automatic HTTPS**: SSL certificates and domains handled automatically
- **Preview environments**: Every PR gets its own environment
- **Simple pricing**: Pay for resources used, no hidden fees
### Getting Started

1. **Visit Railway.** Sign up with your GitHub account:

   ```bash
   open https://railway.app/
   ```

2. **Create a project.** Click “New Project” and select your repository. Railway detects your runtime (Python, Node, etc.) automatically.

3. **Configure the service.** Railway reads configuration from:
   - `Dockerfile` (if present)
   - `requirements.txt` for Python
   - `package.json` for Node.js

   No configuration is needed for standard projects.

4. **Deploy.** Push to your main branch and Railway deploys automatically. You get a public URL instantly: `https://your-app.up.railway.app`
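For projects that need explicit settings, Railway also supports config-as-code in a `railway.json` file at the repo root. A minimal sketch (field names follow Railway’s config schema; the start command is an assumption for a FastAPI app):

```json
{
  "$schema": "https://railway.app/railway.schema.json",
  "build": { "builder": "NIXPACKS" },
  "deploy": {
    "startCommand": "uvicorn app:app --host 0.0.0.0 --port $PORT",
    "restartPolicyType": "ON_FAILURE"
  }
}
```

Verify the available fields against Railway’s documentation before relying on them.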
## Railway Use Cases

### Model Serving API

```python
# app.py
from fastapi import FastAPI
import joblib

app = FastAPI()
model = joblib.load("model.pkl")

@app.post("/predict")
def predict(data: dict):
    prediction = model.predict([data["features"]])
    return {"prediction": prediction[0]}
```

Railway automatically:

- Detects FastAPI
- Installs dependencies
- Exposes the service port
- Provides an HTTPS URL

### Streamlit Dashboard

```python
# streamlit_app.py
import streamlit as st
import pandas as pd
import joblib

st.title("ML Model Dashboard")

model = joblib.load("model.pkl")  # load the trained model
uploaded_file = st.file_uploader("Upload data")
if uploaded_file:
    df = pd.read_csv(uploaded_file)
    predictions = model.predict(df)
    st.write(predictions)
```

Railway detects Streamlit and configures it automatically.

### Background Workers

```python
# worker.py
import schedule
import time

def retrain_model():
    print("Retraining model...")
    # Training logic here

schedule.every().day.at("02:00").do(retrain_model)

while True:
    schedule.run_pending()
    time.sleep(60)
```

Deploy background tasks alongside your API.
## Railway Environment Variables

Configure secrets in Railway’s dashboard:

```bash
# Set via Railway UI
DATABASE_URL=postgresql://...
WANDB_API_KEY=...
MODEL_PATH=/app/models/model.pkl
```

Access them in code:

```python
import os

db_url = os.environ["DATABASE_URL"]
api_key = os.environ.get("WANDB_API_KEY")
```
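For variables the app cannot run without, it helps to fail fast at startup rather than mid-request. A small helper sketch (the variable name mirrors the example above; the demo value is made up):

```python
import os

def require_env(name: str) -> str:
    # Raise at startup, not mid-request, when configuration is missing
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

os.environ["DATABASE_URL"] = "postgresql://localhost/demo"  # demo value
print(require_env("DATABASE_URL"))
```

Use `os.environ.get` with a default only for genuinely optional settings.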
## Railway Pricing

Railway uses credit-based pricing:

- **Starter**: $5/month (free trial available)
- **Developer**: $20/month for more resources
- **Pay-as-you-go**: ~$0.000463/GB-hour for memory

> Railway is convenient but can be more expensive than self-hosted options at scale. Monitor usage and set spending limits.
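At the quoted memory rate, an always-on service’s monthly memory cost is easy to estimate (the rate is approximate, and Railway also bills CPU and egress separately):

```python
MEMORY_RATE_PER_GB_HOUR = 0.000463  # approximate rate quoted above

def monthly_memory_cost(gb: float, hours: float = 730) -> float:
    # 730 ≈ average hours in a month
    return gb * hours * MEMORY_RATE_PER_GB_HOUR

# A 1 GB FastAPI service running 24/7
print(round(monthly_memory_cost(1.0), 2))  # 0.34
```

Running this kind of estimate before deploying makes the "monitor usage" advice concrete.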
## Comparison: Modal vs Railway

| Feature | Modal | Railway |
| --- | --- | --- |
| Best for | GPU training, batch jobs | Web APIs, databases |
| GPU support | ✅ H100, A100, A10G | ❌ No GPU |
| Scaling | Automatic, 0 to 1000s | Automatic, but limited |
| Pricing | Per-second GPU/CPU | Per-resource usage |
| Setup complexity | Python decorators | Git push |
| Use case | Heavy ML workloads | Simple deployments |
## Other Serverless Options

### Google Cloud Run

Containerized applications on serverless infrastructure:

```bash
# Deploy with one command
gcloud run deploy model-server \
  --image gcr.io/project/model-server \
  --platform managed \
  --allow-unauthenticated
```

- **Pros**: Fast scaling, free tier, GCP integration
- **Cons**: Request timeout limits, no standard GPU support
### AWS Lambda

Function-as-a-Service with ML support:

```python
# lambda_function.py
import json

def lambda_handler(event, context):
    # Your ML inference code (assumes `model` is loaded at module scope)
    prediction = model.predict(event["data"])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }
```

- **Pros**: Massive scale, pay-per-invocation
- **Cons**: Cold starts, 15-minute execution limit, complex ML dependencies
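A common cold-start mitigation is loading the model once and caching it across warm invocations of the same container. A sketch with a stand-in model, since the handler above assumes `model` already exists:

```python
import json

_model = None  # cached across warm invocations of the same container

def get_model():
    global _model
    if _model is None:
        # Stand-in for an expensive load, e.g. joblib.load("model.pkl")
        _model = lambda data: [x * 2 for x in data]
    return _model

def lambda_handler(event, context):
    prediction = get_model()(event["data"])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction}),
    }

print(lambda_handler({"data": [1, 2, 3]}, None)["body"])
```

Only the first invocation on a fresh container pays the load cost; subsequent ones reuse the cached object.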
### Hugging Face Spaces

Host ML demos and models for free:

```python
# app.py
import gradio as gr

def predict(text):
    return model.generate(text)

gr.Interface(fn=predict, inputs="text", outputs="text").launch()
```

Upload to Hugging Face Spaces for instant public hosting.
## Migration Path

1. **Prototype.** Start with Railway or Modal for rapid development.
2. **Validate.** Prove your ML system works and provides value.
3. **Scale.** Monitor costs and performance metrics.
4. **Migrate.** Move to Kubernetes when:
   - Serverless costs exceed dedicated infrastructure
   - You need custom networking/security
   - Your team has DevOps capacity
### Keeping Options Open

Design portable applications:

```python
# config.py
import os

def load_model():
    """Load the model from an environment-specific location."""
    if os.environ.get("MODAL_RUNTIME"):
        return load_from_volume("/cache/model.pkl")
    elif os.environ.get("RAILWAY_ENVIRONMENT"):
        return load_from_s3("s3://models/model.pkl")
    else:
        return load_from_disk("./model.pkl")
```

This allows switching platforms without code changes.
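The same environment-variable detection can drive any platform-specific branch. A minimal, self-contained demo (the env-var names mirror the config above; the demo value is made up):

```python
import os

def current_platform() -> str:
    # Detect the runtime from platform-specific env vars
    if os.environ.get("MODAL_RUNTIME"):
        return "modal"
    if os.environ.get("RAILWAY_ENVIRONMENT"):
        return "railway"
    return "local"

os.environ.pop("MODAL_RUNTIME", None)           # ensure a clean demo state
os.environ["RAILWAY_ENVIRONMENT"] = "production"  # simulate running on Railway
print(current_platform())  # railway
```

Centralizing the detection in one function keeps the rest of the codebase platform-agnostic.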
## Best Practices

> Always set budget alerts on serverless platforms. GPU costs can accumulate quickly if workloads run longer than expected.

### Cost Control

- **Set limits**: Configure max concurrency and timeout limits
- **Monitor usage**: Track GPU hours and function invocations
- **Optimize cold starts**: Cache models and dependencies
- **Use preemptible instances**: Save costs on interruptible workloads
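Monitoring can be as simple as tracking accumulated GPU hours against a budget before launching more work. A hypothetical helper (not a Modal or Railway API):

```python
class GpuBudget:
    """Track GPU hours against a hard cap before launching new jobs."""

    def __init__(self, max_hours: float):
        self.max_hours = max_hours
        self.used_hours = 0.0

    def record(self, hours: float) -> None:
        # Call after each job finishes
        self.used_hours += hours

    def allow(self, hours: float) -> bool:
        # Refuse to launch a job that would exceed the budget
        return self.used_hours + hours <= self.max_hours

budget = GpuBudget(max_hours=10.0)
budget.record(8.5)
print(budget.allow(1.0))  # True
print(budget.allow(2.0))  # False
```

Gating job launches this way complements, rather than replaces, the platform’s own spending limits.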
### Development Workflow

```bash
# Develop locally
python train.py

# Test on serverless
modal run train.py

# Deploy to production
modal deploy train.py
```

Keep local development fast; use serverless for expensive operations.
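One way to keep that workflow smooth is a single switch between local and remote execution. A sketch: `fn.remote` is Modal’s remote-call interface, while the `USE_MODAL` flag is a hypothetical convention, not a Modal feature:

```python
import os

def dispatch(fn, *args):
    # Run on Modal only when explicitly requested; default to local
    if os.environ.get("USE_MODAL") == "1":
        return fn.remote(*args)  # Modal-decorated function
    return fn(*args)             # plain local call

def train_step(x):
    return x * x

print(dispatch(train_step, 7))  # 49
```

The same code path then serves quick local iteration and expensive remote runs.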
## Resources

- Modal
- Railway
- General

## Next Steps

Ready to practice everything you’ve learned? Head to the Practice Exercise to apply containerization, Kubernetes, CI/CD, and serverless concepts in a hands-on project.