Ollama Cloud allows you to run large language models without requiring powerful local hardware. Cloud models are automatically offloaded to Ollama’s cloud infrastructure while maintaining the same interface as local models.

What are Cloud Models?

Cloud models are a new type of model in Ollama that:
  • Run remotely on Ollama’s cloud infrastructure
  • Require no GPU on your local machine
  • Use the same API as local models
  • Work with existing tools and workflows
  • Enable access to larger models that won’t fit on personal computers
This makes it possible to run models with 120B+ parameters without investing in expensive hardware.

Getting Started

Sign In to Ollama

Cloud models require an account on ollama.com. Sign in or create an account:
ollama signin
Follow the prompts to authenticate.

Browse Cloud Models

Explore available cloud models at ollama.com/search?c=cloud.

Running Cloud Models

Run a cloud model directly from the terminal:
ollama run gpt-oss:120b-cloud
Cloud models work exactly like local models:
# Chat interactively
ollama run gpt-oss:120b-cloud

# One-off prompt
ollama run gpt-oss:120b-cloud "Explain quantum computing"

# Check running models
ollama ps

Direct Cloud API Access

Cloud models can also be accessed directly via ollama.com’s API without a local Ollama installation. In this mode, ollama.com acts as a remote Ollama host.

Authentication

1. Create an API key

Generate an API key at ollama.com/settings/keys.

2. Set the environment variable

Export your API key:
export OLLAMA_API_KEY=your_api_key
Or in Windows PowerShell:
$env:OLLAMA_API_KEY="your_api_key"
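The key is passed to the API as a Bearer token. A minimal sketch of building that header in Python, assuming the environment variable above is set (the `auth_headers` helper is illustrative, not part of the Ollama SDK):

```python
import os

def auth_headers():
    # Read the API key exported above; fail early with a clear message
    # if it is missing rather than sending an unauthenticated request.
    key = os.environ.get("OLLAMA_API_KEY")
    if not key:
        raise RuntimeError(
            "OLLAMA_API_KEY is not set; create a key at ollama.com/settings/keys"
        )
    return {"Authorization": f"Bearer {key}"}
```

This is the same `Authorization: Bearer` header that the Python client example later in this page passes via its `headers` argument.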

List Available Models

Retrieve models available via the cloud API:
curl https://ollama.com/api/tags

Generate Responses

Install the library:
pip install ollama
Connect to the cloud API:
import os
from ollama import Client

client = Client(
    host="https://ollama.com",
    headers={'Authorization': 'Bearer ' + os.environ['OLLAMA_API_KEY']}
)

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
  print(part['message']['content'], end='', flush=True)
When using the direct cloud API, use model names without the -cloud suffix (e.g., gpt-oss:120b instead of gpt-oss:120b-cloud).
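To keep one code path that works against both a local server and ollama.com, it can help to normalize model names before sending a request. A small sketch, where `strip_cloud_suffix` is a hypothetical helper rather than an Ollama API:

```python
def strip_cloud_suffix(model: str) -> str:
    """Map a local cloud-model name to its direct-API name,
    e.g. 'gpt-oss:120b-cloud' -> 'gpt-oss:120b'."""
    # The '-cloud' suffix, when present, sits at the end of the model name.
    return model[: -len("-cloud")] if model.endswith("-cloud") else model
```

Local model names without the suffix pass through unchanged, so the helper is safe to apply to every request.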

Privacy and Data Handling

Ollama is designed with privacy in mind:

Local Models

  • Run entirely on your machine
  • No data sent to Ollama servers
  • Complete privacy and control

Cloud Models

  • Prompts and responses are processed to provide the service
  • No storage or logging of prompt/response content
  • No training on your data
  • Basic account info and usage metadata collected (not including content)
  • Data is never sold
  • You can delete your account anytime
For more details, see the Privacy Policy.

Disabling Cloud Features

If you prefer to run Ollama in local-only mode:

Using Configuration File

Edit ~/.ollama/server.json:
{
  "disable_ollama_cloud": true
}
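If you script this change, it is safer to merge the flag into any existing settings than to overwrite the file. A sketch, assuming server.json is plain JSON and writable (path handling simplified):

```python
import json
from pathlib import Path

def disable_cloud(config_path: Path) -> None:
    # Load existing settings if the file exists, otherwise start fresh,
    # then set the flag and write the file back.
    settings = {}
    if config_path.exists():
        settings = json.loads(config_path.read_text())
    settings["disable_ollama_cloud"] = True
    config_path.write_text(json.dumps(settings, indent=2))
```

Call it with the path from above, e.g. `disable_cloud(Path.home() / ".ollama" / "server.json")`, then restart Ollama.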

Using Environment Variable

export OLLAMA_NO_CLOUD=1
Or in Windows PowerShell:
$env:OLLAMA_NO_CLOUD="1"
Restart Ollama after changing the configuration. Check logs to verify:
Ollama cloud disabled: true
When cloud features are disabled:
  • Cloud models will not be accessible
  • All inference happens locally
  • No connections to ollama.com services

Benefits of Cloud Models

No GPU Required

Run large models without expensive hardware

Access Larger Models

Use 120B+ parameter models that won’t fit locally

Same API

Works with existing Ollama tools and workflows

Flexible Deployment

Mix local and cloud models based on your needs

Pricing

For current pricing information and usage limits, visit ollama.com/pricing.

Use Cases

Development and Testing

Quickly prototype with large models before deploying locally:
# Test with cloud model
ollama run gpt-oss:120b-cloud "Generate API documentation"

# Deploy locally when ready
ollama pull codellama:34b
ollama run codellama:34b "Generate API documentation"

Hybrid Workflows

Use local models for privacy-sensitive tasks and cloud models for heavy lifting:
import ollama

# Sensitive data - local model
local_response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Analyze customer data'}]
)

# General purpose - cloud model
cloud_response = ollama.chat(
    model='gpt-oss:120b-cloud',
    messages=[{'role': 'user', 'content': 'Explain machine learning'}]
)
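One way to implement this split is a small router that picks the model from a sensitivity flag. The function and model names below are illustrative assumptions, not part of the Ollama SDK:

```python
# Assumed model choices; substitute whatever you have pulled locally.
LOCAL_MODEL = "llama3.2"
CLOUD_MODEL = "gpt-oss:120b-cloud"

def pick_model(sensitive: bool) -> str:
    # Keep sensitive prompts on the local model; send everything else
    # to the larger cloud model.
    return LOCAL_MODEL if sensitive else CLOUD_MODEL
```

Because local and cloud models share the same API, the call site stays a single line: `ollama.chat(model=pick_model(is_sensitive), messages=messages)`.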

Resource-Constrained Environments

Run on devices without GPUs:
# Works on CPU-only laptops
ollama run gpt-oss:120b-cloud

FAQ

Do cloud models support the full Ollama API?
Yes, cloud models support the full Ollama API including streaming, embeddings, and multimodal inputs where applicable.

Can I mix local and cloud models?
Absolutely. You can have both local and cloud models installed and switch between them at any time using the same commands.

What happens if my internet connection drops?
Cloud models require an internet connection. If connectivity is lost, local models will continue to work normally.

Are cloud models faster than local models?
It depends on your hardware. Cloud models have network latency but run on powerful infrastructure. Local models on high-end GPUs may be faster for small batches.

Can I use cloud models in production?
Yes, cloud models are designed for production use. Check the pricing page for rate limits and terms of service.

Next Steps

Model Library

Browse all available models

API Reference

Complete API documentation

Local Installation

Set up Ollama locally

Python SDK

Python integration guide
