Modal Classes (referred to as Cls) enable you to build stateful serverless applications with method pooling and lifecycle hooks. They’re ideal for workloads like model inference, where you want to load a model once and reuse it across many requests.

Creating classes

Create a class by decorating a Python class with @app.cls():
import modal

app = modal.App()

@app.cls()
class MyModel:
    @modal.method()
    def predict(self, input: str):
        return f"Prediction for: {input}"

Class configuration

The @app.cls() decorator accepts the same configuration parameters as @app.function():
@app.cls(
    image=modal.Image.debian_slim().pip_install("torch"),
    gpu="A100",
    secrets=[modal.Secret.from_name("model-api-key")],
    timeout=600
)
class GPUModel:
    @modal.method()
    def inference(self, data):
        pass
Configuration specified at the class level applies to all methods within the class.

Methods

Decorate instance methods with @modal.method() to make them remotely callable:
@app.cls()
class Counter:
    @modal.method()
    def increment(self, value: int):
        return value + 1
    
    @modal.method()
    def decrement(self, value: int):
        return value - 1
Call methods remotely:
with app.run():
    counter = Counter()
    result = counter.increment.remote(5)
    print(result)  # 6

Lifecycle hooks

Lifecycle methods let you manage state across the lifetime of a container.

Enter methods

Use @modal.enter() to run setup code when a container starts:
@app.cls(gpu="A100")
class Model:
    @modal.enter()
    def load_model(self):
        import torch
        # Load the model once when container starts
        self.model = torch.load("model.pth")
        self.model.eval()
    
    @modal.method()
    def predict(self, input):
        # Reuse the loaded model
        return self.model(input)
The @modal.enter() method runs once per container, not once per method call. This makes it perfect for expensive initialization like loading models or establishing database connections.
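The once-per-container behavior can be illustrated without Modal at all. What follows is a minimal plain-Python sketch, not Modal's implementation: FakeContainer is a hypothetical stand-in that calls the setup method once and then serves every call from the same instance.

```python
class FakeContainer:
    """Hypothetical stand-in for one container (not part of Modal)."""

    def __init__(self, instance, enter_name: str):
        self.instance = instance
        # Setup runs exactly once, when the simulated container starts.
        getattr(instance, enter_name)()

    def call(self, method_name: str, *args):
        # Every call reuses the same, already-initialized instance.
        return getattr(self.instance, method_name)(*args)


class Model:
    load_count = 0  # counts how many times the expensive setup ran

    def load_model(self):
        Model.load_count += 1
        self.weights = {"scale": 2}  # placeholder for a real model

    def predict(self, x: int) -> int:
        return x * self.weights["scale"]


container = FakeContainer(Model(), "load_model")
results = [container.call("predict", x) for x in (1, 2, 3)]
# Three calls were served, but load_model ran only once.
print(results, Model.load_count)
```

If each call had to run load_model itself, the setup cost would be paid three times instead of once.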

Exit methods

Use @modal.exit() to run cleanup code when a container shuts down:
@app.cls()
class DatabaseWorker:
    @modal.enter()
    def connect(self):
        self.db = connect_to_database()
    
    @modal.exit()
    def disconnect(self):
        self.db.close()
    
    @modal.method()
    def query(self, sql: str):
        return self.db.execute(sql)
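To try the connect/query/close pattern locally, here is a sketch that substitutes an in-memory SQLite database for the undefined connect_to_database() helper. It uses no Modal decorators; the try/finally only imitates cleanup running after the work is done, and the table and rows are made up for illustration.

```python
import sqlite3


class DatabaseWorker:
    def connect(self):
        # Plays the role of the enter hook: open the connection once.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (name TEXT)")
        self.db.execute("INSERT INTO items VALUES ('a'), ('b')")

    def disconnect(self):
        # Plays the role of the exit hook: release the connection.
        self.db.close()

    def query(self, sql: str):
        return self.db.execute(sql).fetchall()


worker = DatabaseWorker()
worker.connect()
try:
    rows = worker.query("SELECT name FROM items ORDER BY name")
finally:
    worker.disconnect()

closed_error = None
try:
    worker.query("SELECT 1")
except sqlite3.ProgrammingError as exc:
    closed_error = exc  # the connection is unusable once closed
```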

Async lifecycle methods

Lifecycle methods can be async:
@app.cls()
class AsyncService:
    @modal.enter()
    async def setup(self):
        self.client = await create_async_client()
    
    @modal.method()
    async def call_api(self, endpoint: str):
        return await self.client.get(endpoint)
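The same shape can be exercised locally with plain asyncio. In this sketch FakeClient is a hypothetical replacement for whatever create_async_client() returns, and the Modal decorators are left out so the snippet runs anywhere; it only shows that an awaited setup step can be followed by overlapping method calls.

```python
import asyncio


class FakeClient:
    """Hypothetical async client used only for this illustration."""

    async def get(self, endpoint: str) -> str:
        await asyncio.sleep(0)  # yield control, as real network I/O would
        return f"GET {endpoint}"


class AsyncService:
    async def setup(self):
        # Async setup: awaited once before any call is served.
        self.client = FakeClient()

    async def call_api(self, endpoint: str) -> str:
        return await self.client.get(endpoint)


async def main():
    service = AsyncService()
    await service.setup()
    # Both calls share the client created in setup().
    return await asyncio.gather(
        service.call_api("/a"),
        service.call_api("/b"),
    )


responses = asyncio.run(main())
```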

Parameterized classes

Use modal.parameter() to create parameterized classes:
import modal

app = modal.App()

@app.cls()
class ParameterizedModel:
    model_name: str = modal.parameter()
    batch_size: int = modal.parameter(default=32)
    
    @modal.enter()
    def load_model(self):
        print(f"Loading {self.model_name} with batch size {self.batch_size}")
        self.model = load_model(self.model_name)
    
    @modal.method()
    def predict(self, input):
        return self.model.predict(input, batch_size=self.batch_size)
Instantiate with different parameters:
with app.run():
    gpt2 = ParameterizedModel(model_name="gpt2", batch_size=16)
    gpt4 = ParameterizedModel(model_name="gpt4", batch_size=8)
    
    result1 = gpt2.predict.remote("Hello")
    result2 = gpt4.predict.remote("World")
Parameters declared without a default are required; parameters with a default are optional:
@app.cls()
class Model:
    # Required parameter
    model_id: str = modal.parameter()
    
    # Optional parameter with default
    temperature: float = modal.parameter(default=0.7)
Custom __init__ constructors are deprecated. Use modal.parameter() annotations instead for class parameterization.
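For readers who know dataclasses, the required/optional distinction behaves much like dataclass fields. The sketch below is only an analogy in plain Python; ModelParams is a hypothetical name, and Modal's own validation and error types may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    model_id: str             # required: no default value
    temperature: float = 0.7  # optional: falls back to the default


default = ModelParams(model_id="gpt2")
custom = ModelParams(model_id="gpt2", temperature=0.2)

missing_error = None
try:
    ModelParams()  # omitting the required field is rejected
except TypeError as exc:
    missing_error = exc
```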

State management

State stored in instance variables persists across method calls within the same container:
@app.cls()
class StatefulCounter:
    @modal.enter()
    def initialize(self):
        self.count = 0
    
    @modal.method()
    def increment(self):
        self.count += 1
        return self.count
State is container-local. Different containers maintain separate state, and containers may be recycled over time.
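A small plain-Python simulation makes the consequence concrete. Here start_container() is a hypothetical helper that stands in for a fresh container; nothing in it is Modal API.

```python
class StatefulCounter:
    def initialize(self):
        self.count = 0

    def increment(self) -> int:
        self.count += 1
        return self.count


def start_container() -> StatefulCounter:
    """Hypothetical helper: a new container means a new, freshly set up instance."""
    counter = StatefulCounter()
    counter.initialize()
    return counter


container_a = start_container()
container_b = start_container()

a_results = [container_a.increment() for _ in range(3)]
b_results = [container_b.increment()]  # unaffected by container_a's calls

# A recycled container starts again from the initial state.
recycled = start_container()
recycled_result = recycled.increment()
```

Because a caller cannot choose which container serves a request, state that must be shared or durable belongs in an external store rather than in instance variables.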

Web endpoints on classes

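Expose class methods as web endpoints by decorating them with @modal.fastapi_endpoint() (named @modal.web_endpoint() in older Modal releases) instead of @modal.method(); the two decorators cannot be stacked:
@app.cls()
class API:
    @modal.fastapi_endpoint()
    def health(self):
        return {"status": "healthy"}
    
    @modal.fastapi_endpoint(method="POST")
    def process(self, data: dict):
        return {"result": data}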

Looking up classes

Reference deployed classes by name:
import modal

Model = modal.Cls.from_name("my-app", "Model")
model = Model()
result = model.predict.remote("input")

Runtime configuration

Override class configuration at runtime:
Model = modal.Cls.from_name("my-app", "Model")

# Use a different GPU
ModelOnH100 = Model.with_options(gpu="H100")

# Override multiple settings
CustomModel = Model.with_options(
    gpu="A100",
    memory=16384,
    timeout=1200
)

model = CustomModel()
result = model.predict.remote("input")

Concurrency configuration

Enable input concurrency:
Model = modal.Cls.from_name("my-app", "Model")
ConcurrentModel = Model.with_concurrency(max_inputs=100, target_inputs=80)

model = ConcurrentModel()
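As a rough mental model of what a concurrency cap means, the sketch below uses an asyncio semaphore. It assumes max_inputs limits how many inputs one container handles at the same time; target_inputs is not modeled, and run_with_limit is a hypothetical helper, not Modal's scheduler.

```python
import asyncio


async def run_with_limit(n_inputs: int, max_inputs: int):
    """Process n_inputs tasks, never more than max_inputs at the same time."""
    slots = asyncio.Semaphore(max_inputs)
    active = 0
    peak = 0

    async def handle(i: int) -> int:
        nonlocal active, peak
        async with slots:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # simulated I/O-bound work
            active -= 1
            return i * 2

    results = await asyncio.gather(*(handle(i) for i in range(n_inputs)))
    return results, peak


results, peak = asyncio.run(run_with_limit(n_inputs=10, max_inputs=4))
```

All ten inputs complete, but the recorded peak never exceeds the cap of four.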

Batching configuration

Enable dynamic batching:
Model = modal.Cls.from_name("my-app", "Model")
BatchedModel = Model.with_batching(max_batch_size=32, wait_ms=100)

model = BatchedModel()
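The interaction of the two settings can be sketched with a deterministic simulation. It assumes that a batch is dispatched as soon as it holds max_batch_size inputs, and that wait_ms bounds how long a partially filled batch waits after its first input arrives; form_batches is a hypothetical helper, not Modal's batching code.

```python
def form_batches(arrivals, max_batch_size: int, wait_ms: int):
    """Group (arrival_ms, item) pairs, sorted by time, into batches."""
    batches = []
    current = []
    deadline = None
    for arrival_ms, item in arrivals:
        # The waiting window of the open batch expired before this input came.
        if current and arrival_ms > deadline:
            batches.append(current)
            current = []
        # The first input of a new batch starts the waiting window.
        if not current:
            deadline = arrival_ms + wait_ms
        current.append(item)
        # A full batch is dispatched immediately.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


arrivals = [(0, "a"), (10, "b"), (20, "c"), (200, "d"), (250, "e")]
large = form_batches(arrivals, max_batch_size=3, wait_ms=100)
small = form_batches(arrivals, max_batch_size=2, wait_ms=100)
```

With a batch size of 3 the first three inputs fill a batch at once; with a batch size of 2 the input "c" is left alone and goes out by itself once its window expires.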

Autoscaling

Dynamically adjust autoscaling for class instances:
Model = modal.Cls.from_name("my-app", "Model")
model = Model()

# Update autoscaling settings
model.update_autoscaler(
    min_containers=2,
    max_containers=10,
    buffer_containers=1
)
Autoscaler updates apply only to the class instance they are called on. Redeploying the app resets the autoscaler to its static configuration.

Memory snapshots

Enable faster cold starts with memory snapshots:
@app.cls(enable_memory_snapshot=True)
class FastStartModel:
    @modal.enter(snap=True)
    def load_model(self):
        # This runs once and is checkpointed
        self.model = load_large_model()
    
    @modal.method()
    def predict(self, input):
        return self.model(input)
The snap=True parameter creates a checkpoint after the enter method completes, enabling subsequent containers to start from that state.

Best practices

Use lifecycle hooks for expensive initialization

Load models, establish connections, and perform other expensive setup in @modal.enter() methods:
@app.cls(gpu="A100")
class EfficientModel:
    @modal.enter()
    def load(self):
        # Expensive: runs once per container
        self.model = load_model()
    
    @modal.method()
    def predict(self, input):
        # Fast: reuses loaded model
        return self.model(input)

Group related methods

Group related functionality in one class so that the methods share the same configuration and setup:
@app.cls()
class ImageProcessor:
    @modal.method()
    def resize(self, image, width, height):
        pass
    
    @modal.method()
    def crop(self, image, x, y, w, h):
        pass
    
    @modal.method()
    def filter(self, image, filter_type):
        pass

Use parameters for configuration

Make classes reusable with parameters:
@app.cls()
class ConfigurableModel:
    model_version: str = modal.parameter()
    precision: str = modal.parameter(default="fp16")
    
    @modal.enter()
    def load_model(self):
        self.model = load_model(self.model_version, self.precision)

Clean up resources in exit methods

Ensure proper cleanup:
@app.cls()
class ResourceManager:
    @modal.enter()
    def acquire_resources(self):
        self.resource = acquire_expensive_resource()
    
    @modal.exit()
    def release_resources(self):
        self.resource.close()
