This guide assumes you have a model packaged with Cog. If you don’t, follow the setting up your own model guide or use an example model.
Getting Started
Build your model
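A sketch of the build step, assuming Cog is installed and you are in a project directory containing a cog.yaml:

```shell
# Build the image and tag it as my-model (the tag name is up to you).
cog build -t my-model
```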
Building your model produces a Docker image tagged as my-model containing your model and all its dependencies.

Start the Docker container
Run the container with the appropriate configuration for your model:
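For example (a sketch; the exact flags depend on your model):

```shell
# Start the server in the background, mapping container port 5000
# to host port 5001. Add --gpus all if your model needs a GPU.
docker run -d -p 5001:5000 my-model
```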
The -d flag runs the container in detached mode. The -p 5001:5000 flag maps port 5000 from the container to port 5001 on your host machine.

Verify the server is running
The server is now running locally on port 5001. View the OpenAPI schema to confirm; you can also open http://localhost:5001/openapi.json in your browser.
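For example, fetching the schema with curl:

```shell
# Returns the server's OpenAPI schema as JSON if the server is up.
curl http://localhost:5001/openapi.json
```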
Running Predictions
To run a prediction, call the /predictions endpoint with a POST request:
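A sketch with curl; the keys inside "input" depend on your model's predict() signature, so "prompt" here is only a placeholder:

```shell
# POST a prediction request; the response JSON contains the output.
curl http://localhost:5001/predictions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"input": {"prompt": "hello world"}}'
```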
Managing the Server
Stop the server
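For example, assuming the container ID (or name) reported by docker ps:

```shell
docker ps                   # find the container running my-model
docker stop <container-id>  # stop it
```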
Stopping the container shuts the server down.

View logs
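For example, where <container-id> is the ID shown by docker ps:

```shell
# Print the server's logs; add -f to follow them live.
docker logs <container-id>
```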
The server's logs go to the container's output.

Restart the server
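For example, again using the ID from docker ps:

```shell
# Stop and start the container in one step.
docker restart <container-id>
```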
Restarting the container restarts the server inside it.

Server Configuration
Cog Docker images have python -m cog.server.http set as the default command. When using command-line options, pass the full command before the options.
Controlling Threads
The --threads option controls how many requests Cog serves in parallel:
- CPU models: Defaults to the number of CPUs on your machine
- GPU models: Defaults to 1 (GPUs typically can only be used by one process)
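A sketch of overriding the default command to raise the thread count — per the note above, the full command comes first, then its options:

```shell
# Override the image's default command to pass server options.
docker run -d -p 5001:5000 my-model \
  python -m cog.server.http --threads=4
```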
Custom Host Configuration
By default, Cog serves on 0.0.0.0. Use the --host option to override:
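One scenario (a sketch): with Docker host networking, binding the server to 127.0.0.1 makes it reachable only from the host machine itself:

```shell
# With --network host there is no port mapping; the server listens
# on the host's 127.0.0.1:5000 only.
docker run -d --network host my-model \
  python -m cog.server.http --host=127.0.0.1
```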
Deployment Options
Since Cog models are standard Docker images, you can deploy them to any platform that supports Docker:

- Cloud platforms: AWS ECS, Google Cloud Run, Azure Container Instances
- Kubernetes: Any Kubernetes cluster
- Serverless: AWS Lambda (with container support), Google Cloud Functions
- Replicate: Deploy directly to Replicate’s managed infrastructure
Next Steps
- Learn about the HTTP API for making predictions
- See example deployments for different platforms
- Explore the Python API reference for advanced features