Ollama Cloud allows you to run large language models without requiring powerful local hardware. Cloud models are automatically offloaded to Ollama’s cloud infrastructure while maintaining the same interface as local models.

What are Cloud Models?

Cloud models are a new type of model in Ollama that:
  • Run remotely on Ollama’s cloud infrastructure
  • Require no GPU on your local machine
  • Use the same API as local models
  • Work with existing tools and workflows
  • Enable access to larger models that won’t fit on personal computers
This makes it possible to run models with 120B+ parameters without investing in expensive hardware.

Getting Started

Sign In to Ollama

Cloud models require an account on ollama.com. Sign in or create an account:
ollama signin
Follow the prompts to authenticate.

Browse Cloud Models

Explore available cloud models at ollama.com/search?c=cloud.

Running Cloud Models

Run a cloud model directly from the terminal:
ollama run gpt-oss:120b-cloud
Cloud models work exactly like local models:
# Chat interactively
ollama run gpt-oss:120b-cloud

# One-off prompt
ollama run gpt-oss:120b-cloud "Explain quantum computing"

# Check running models
ollama ps

Direct Cloud API Access

Cloud models can also be accessed directly via ollama.com’s API without a local Ollama installation. In this mode, ollama.com acts as a remote Ollama host.

Authentication

1. Create an API key

Generate an API key at ollama.com/settings/keys.

2. Set the environment variable

Export your API key:
export OLLAMA_API_KEY=your_api_key
Or in Windows PowerShell:
$env:OLLAMA_API_KEY="your_api_key"
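The key is passed to the API as a Bearer token. A minimal sketch of building that header in Python, assuming the environment variable above is set (the `auth_headers` helper is illustrative, not part of the Ollama SDK):

```python
import os

def auth_headers():
    # Read the API key exported above; fail early with a clear message
    # if it is missing rather than sending an unauthenticated request.
    key = os.environ.get("OLLAMA_API_KEY")
    if not key:
        raise RuntimeError(
            "OLLAMA_API_KEY is not set; create a key at ollama.com/settings/keys"
        )
    return {"Authorization": f"Bearer {key}"}
```

This is the same `Authorization: Bearer` header that the Python client example later in this page passes via its `headers` argument.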

List Available Models

Retrieve models available via the cloud API:
curl https://ollama.com/api/tags

Generate Responses

Install the library:
pip install ollama
Connect to the cloud API:
import os
from ollama import Client

client = Client(
    host="https://ollama.com",
    headers={'Authorization': 'Bearer ' + os.environ['OLLAMA_API_KEY']}
)

messages = [
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
]

for part in client.chat('gpt-oss:120b', messages=messages, stream=True):
  print(part['message']['content'], end='', flush=True)
When using the direct cloud API, use model names without the -cloud suffix (e.g., gpt-oss:120b instead of gpt-oss:120b-cloud).
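To keep one code path that works against both a local server and ollama.com, it can help to normalize model names before sending a request. A small sketch, where `strip_cloud_suffix` is a hypothetical helper rather than an Ollama API:

```python
def strip_cloud_suffix(model: str) -> str:
    """Map a local cloud-model name to its direct-API name,
    e.g. 'gpt-oss:120b-cloud' -> 'gpt-oss:120b'."""
    # The '-cloud' suffix, when present, sits at the end of the model name.
    return model[: -len("-cloud")] if model.endswith("-cloud") else model
```

Local model names without the suffix pass through unchanged, so the helper is safe to apply to every request.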

Privacy and Data Handling

Ollama is designed with privacy in mind:

Local Models

  • Run entirely on your machine
  • No data sent to Ollama servers
  • Complete privacy and control

Cloud Models

  • Prompts and responses are processed to provide the service
  • No storage or logging of prompt/response content
  • No training on your data
  • Basic account info and usage metadata collected (not including content)
  • Data is never sold
  • You can delete your account anytime
For more details, see the Privacy Policy.

Disabling Cloud Features

If you prefer to run Ollama in local-only mode:

Using Configuration File

Edit ~/.ollama/server.json:
{
  "disable_ollama_cloud": true
}
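If you script this change, it is safer to merge the flag into any existing settings than to overwrite the file. A sketch, assuming server.json is plain JSON and writable (path handling simplified):

```python
import json
from pathlib import Path

def disable_cloud(config_path: Path) -> None:
    # Load existing settings if the file exists, otherwise start fresh,
    # then set the flag and write the file back.
    settings = {}
    if config_path.exists():
        settings = json.loads(config_path.read_text())
    settings["disable_ollama_cloud"] = True
    config_path.write_text(json.dumps(settings, indent=2))
```

Call it with the path from above, e.g. `disable_cloud(Path.home() / ".ollama" / "server.json")`, then restart Ollama.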

Using Environment Variable

export OLLAMA_NO_CLOUD=1
Or in Windows PowerShell:
$env:OLLAMA_NO_CLOUD="1"
Restart Ollama after changing the configuration. Check logs to verify:
Ollama cloud disabled: true
When cloud features are disabled:
  • Cloud models will not be accessible
  • All inference happens locally
  • No connections to ollama.com services

Benefits of Cloud Models

No GPU Required

Run large models without expensive hardware

Access Larger Models

Use 120B+ parameter models that won’t fit locally

Same API

Works with existing Ollama tools and workflows

Flexible Deployment

Mix local and cloud models based on your needs

Pricing

For current pricing information and usage limits, visit ollama.com/pricing.

Use Cases

Development and Testing

Quickly prototype with large models before deploying locally:
# Test with cloud model
ollama run gpt-oss:120b-cloud "Generate API documentation"

# Deploy locally when ready
ollama pull codellama:34b
ollama run codellama:34b "Generate API documentation"

Hybrid Workflows

Use local models for privacy-sensitive tasks and cloud models for heavy lifting:
import ollama

# Sensitive data - local model
local_response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Analyze customer data'}]
)

# General purpose - cloud model
cloud_response = ollama.chat(
    model='gpt-oss:120b-cloud',
    messages=[{'role': 'user', 'content': 'Explain machine learning'}]
)
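One way to implement this split is a small router that picks the model from a sensitivity flag. The function and model names below are illustrative assumptions, not part of the Ollama SDK:

```python
# Assumed model choices; substitute whatever you have pulled locally.
LOCAL_MODEL = "llama3.2"
CLOUD_MODEL = "gpt-oss:120b-cloud"

def pick_model(sensitive: bool) -> str:
    # Keep sensitive prompts on the local model; send everything else
    # to the larger cloud model.
    return LOCAL_MODEL if sensitive else CLOUD_MODEL
```

Because local and cloud models share the same API, the call site stays a single line: `ollama.chat(model=pick_model(is_sensitive), messages=messages)`.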

Resource-Constrained Environments

Run on devices without GPUs:
# Works on CPU-only laptops
ollama run gpt-oss:120b-cloud

FAQ

Do cloud models support the full Ollama API?
Yes, cloud models support the full Ollama API including streaming, embeddings, and multimodal inputs where applicable.

Can I mix local and cloud models?
Absolutely. You can have both local and cloud models installed and switch between them at any time using the same commands.

What happens if my internet connection drops?
Cloud models require an internet connection. If connectivity is lost, local models will continue to work normally.

Are cloud models faster than local models?
It depends on your hardware. Cloud models have network latency but run on powerful infrastructure. Local models on high-end GPUs may be faster for small batches.

Can I use cloud models in production?
Yes, cloud models are designed for production use. Check the pricing page for rate limits and terms of service.

Next Steps

Model Library

Browse all available models

API Reference

Complete API documentation

Local Installation

Set up Ollama locally

Python SDK

Python integration guide
