
Overview

Modal provides flexible trial credits: $5 per month upon sign up, which increases to $30 per month when you add a payment method. Modal is a serverless platform where you pay by compute time for any supported model.

Free Tier

$5/month (no payment method)

Enhanced Free Tier

$30/month (with payment method)

Pricing Model

Modal charges by compute time rather than tokens. You pay for the actual CPU/GPU time your code uses, making it cost-effective for batch processing and custom workloads.
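To make compute-time billing concrete, here is a back-of-the-envelope cost function. The per-second rates below are hypothetical placeholders, not Modal's actual prices; check Modal's pricing page for real numbers.

```python
# Hypothetical per-second rates -- see Modal's pricing page for actual values.
GPU_RATE_PER_SEC = 0.000306   # assumed rate for an A10G-class GPU
CPU_RATE_PER_SEC = 0.0000131  # assumed rate per physical CPU core

def job_cost(gpu_seconds: float, cpu_core_seconds: float) -> float:
    """Cost of a job billed purely by compute time, not by tokens."""
    return gpu_seconds * GPU_RATE_PER_SEC + cpu_core_seconds * CPU_RATE_PER_SEC

# A 10-minute GPU batch job that also uses 2 CPU-core-minutes:
cost = job_cost(gpu_seconds=600, cpu_core_seconds=120)
print(f"${cost:.4f}")
```

Because billing stops the moment your container scales to zero, short bursty jobs can cost far less than an always-on instance.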

Credits Structure

Tier      | Monthly Credits | Requirements
Basic     | $5/month        | Just sign up
Enhanced  | $30/month       | Add payment method

Available Models

Modal supports any model you can deploy. Unlike traditional API providers, Modal lets you:
  • Deploy any open-source model from Hugging Face
  • Run custom inference code with your own optimizations
  • Use any framework: PyTorch, JAX, TensorFlow, etc.
  • Scale automatically based on demand
Modal is ideal for developers who want full control over their model deployment and inference pipeline.

Getting Started

1. Sign Up

Visit modal.com and create a free account to receive $5/month in credits.

2. Add Payment Method (Optional)

Add a payment method to increase your free monthly credits to $30.

3. Install Modal CLI

pip install modal

4. Authenticate

modal token new

5. Deploy Your First Function

import modal

# Recent Modal releases use modal.App; modal.Stub is the deprecated older name.
app = modal.App("example-app")

@app.function(
    gpu="A10G",
    image=modal.Image.debian_slim().pip_install(
        "transformers",
        "torch",
        "accelerate",
    ),
)
def generate_text(prompt: str):
    from transformers import pipeline

    # Note: this model is gated on Hugging Face; you must accept its license
    # and provide an access token for the download to succeed.
    generator = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.1-8B-Instruct",
        device="cuda",
    )

    # max_new_tokens bounds only the generated continuation,
    # unlike max_length, which also counts the prompt.
    return generator(prompt, max_new_tokens=100)[0]["generated_text"]

@app.local_entrypoint()
def main():
    result = generate_text.remote("What is the capital of France?")
    print(result)

6. Run Your Function

modal run app.py

This builds the container image (on first run), executes main() in Modal's cloud, and streams logs back to your terminal.

Key Features

Serverless Infrastructure

  • Automatic scaling: Scale to zero when idle
  • GPU access: Use A10G, A100, or other GPUs
  • Fast cold starts: Optimized container loading

Flexible Deployment

  • Any Model: Deploy any model from Hugging Face or custom weights
  • Custom Code: Full control over the inference pipeline
  • Multiple Frameworks: PyTorch, JAX, TensorFlow, ONNX, etc.
  • Auto Scaling: Scale from zero to thousands of GPUs

Use Cases

  • Custom Models: Deploy proprietary or fine-tuned models
  • Batch Processing: Process large datasets efficiently
  • Research: Experiment with different model architectures
  • API Services: Build production-grade inference APIs
  • Data Processing: Run GPU-accelerated data pipelines

Cost Optimization

Since Modal charges by compute time:
  • Use smaller GPUs for testing
  • Implement caching to avoid redundant computation
  • Batch requests when possible
  • Scale to zero when idle (automatic)
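As a sketch of the caching point above: an in-process cache (here Python's functools.lru_cache; sharing a cache across containers would need external storage, which this sketch doesn't cover) avoids paying GPU time twice for identical inputs. The expensive_generate function is a hypothetical stand-in for a real remote inference call.

```python
import functools

calls = []  # records every time the "expensive" compute actually runs

def expensive_generate(prompt: str) -> str:
    # Stand-in for a real GPU inference call (e.g. generate_text.remote(prompt)).
    calls.append(prompt)
    return f"response to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    # Identical prompts hit the cache and never reach the compute path again.
    return expensive_generate(prompt)

cached_generate("hello")
cached_generate("hello")  # cache hit: no second compute charge
print(len(calls))         # -> 1
```

Since billing is by compute time, every cache hit is billed time you did not spend.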

Resources

  • Modal Platform: access the platform
  • Documentation: view documentation
  • Examples: browse example apps
  • Pricing: view pricing details

Free credits are provided monthly. They reset each month and do not roll over.
