
Cog: Containers for Machine Learning

Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container. You can deploy your packaged model to your own infrastructure, or to Replicate.

Why Cog?

It’s really hard for researchers to ship machine learning models to production. Docker is part of the solution, but getting it all working is complex: Dockerfiles, pre-/post-processing, Flask servers, CUDA versions. More often than not, the researcher has to sit down with an engineer to get the damn thing deployed. Cog solves this with a simple interface for packaging your model with all its dependencies, automatically handling the complexity of Docker, CUDA, and production serving.

Installation

Install Cog on macOS, Linux, or Windows with WSL2

Quickstart

Run your first prediction in under 5 minutes

YAML Reference

Learn how to configure your model’s environment

Python API

Define predictions with the Predictor interface

Key Features

Docker containers without the pain

Writing your own Dockerfile can be a bewildering process. With Cog, you define your environment with a simple configuration file and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on.

No more CUDA hell

Cog knows which CUDA/cuDNN/PyTorch/Tensorflow/Python combos are compatible and will set it all up correctly for you.
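In practice this means you only declare the framework versions you need; Cog selects a compatible CUDA base image for you. A hypothetical `cog.yaml` fragment (the pinned `torch` version here is just an example):

```yaml
# cog.yaml — illustrative example. With gpu: true and a pinned framework
# version, Cog resolves a CUDA/cuDNN base image known to be compatible.
build:
  gpu: true
  python_version: "3.12"
  python_packages:
    - "torch==2.3.1"
```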

Define inputs and outputs with standard Python

Define your model’s inputs and outputs using standard Python type annotations. Cog then generates an OpenAPI schema and validates the inputs and outputs automatically.
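Conceptually, the schema is derived from the type hints on your `predict` function. A minimal standard-library sketch of the idea (the function and type mapping here are hypothetical; Cog’s real implementation uses Pydantic):

```python
from typing import get_type_hints

def predict(image_path: str, scale: float = 1.5) -> str:
    """Hypothetical predict function with typed inputs."""
    return f"{image_path} upscaled by {scale}x"

# Map Python types to JSON-schema type names, as an OpenAPI generator would.
JSON_TYPES = {str: "string", float: "number", int: "integer", bool: "boolean"}

hints = get_type_hints(predict)
schema = {
    name: {"type": JSON_TYPES[tp]}
    for name, tp in hints.items()
    if name != "return"
}
print(schema)  # {'image_path': {'type': 'string'}, 'scale': {'type': 'number'}}
```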

Automatic HTTP prediction server

Your model’s types are used to dynamically generate a RESTful HTTP API using FastAPI. The server supports both synchronous and asynchronous predictions, with webhooks for long-running models.
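Once the container is running, a prediction is a JSON POST whose `input` keys match the arguments of `predict()`. A small sketch of the request body (the image URL is a placeholder):

```python
import json

# Request body for the prediction endpoint: an "input" object whose keys
# match the predict() arguments. The URL below is a placeholder.
payload = {"input": {"image": "https://example.com/cat.png"}}
body = json.dumps(payload)

# e.g.  curl http://localhost:5000/predictions -X POST \
#         -H "Content-Type: application/json" -d '<body>'
print(body)
```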

Ready for production

Deploy your model anywhere that Docker images run. Your own infrastructure, or Replicate.

Quick Example

Define the Docker environment your model runs in with cog.yaml:
build:
  gpu: true
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
  python_version: "3.12"
  python_requirements: requirements.txt
predict: "predict.py:Predictor"

Define how predictions are run on your model with predict.py:
from cog import BasePredictor, Input, Path
import torch

class Predictor(BasePredictor):
    def setup(self):
        """Load the model into memory to make running multiple predictions efficient"""
        self.model = torch.load("./weights.pth")

    def predict(self,
          image: Path = Input(description="Grayscale input image")
    ) -> Path:
        """Run a single prediction on the model"""
        # preprocess/postprocess are user-defined helpers (not shown here)
        processed_image = preprocess(image)
        output = self.model(processed_image)
        return postprocess(output)

Now, you can run predictions on this model:
cog predict -i [email protected]
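Conceptually, `cog predict` builds the image, starts the container, calls `setup()` once, then invokes `predict()` for each request. A toy sketch of that lifecycle (hypothetical class, not Cog’s actual harness):

```python
class ToyPredictor:
    """Illustrates the setup-once, predict-many lifecycle Cog drives."""

    def setup(self):
        # Expensive one-time work (e.g. loading model weights) goes here.
        self.offset = 10

    def predict(self, x: int) -> int:
        # Cheap per-request work: runs once for every prediction.
        return x + self.offset

p = ToyPredictor()
p.setup()                                      # called once at container start
results = [p.predict(x) for x in (1, 2, 3)]    # called once per prediction
print(results)  # [11, 12, 13]
```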

Next Steps

Get Started

Follow our quickstart guide to run your first model

Deploy Models

Learn how to deploy your packaged models

Examples

Browse example models built with Cog

Join Discord

Get help from the community in #cog
