Cloudflare Workers AI provides free access to various open-source models running on Cloudflare’s edge network.

Overview

Cloudflare Workers AI offers serverless inference for multiple open-source models, running on Cloudflare’s global network for low latency worldwide.

Rate Limits

Free allocation: 10,000 neurons per day (see Cloudflare’s pricing page for details).

Neurons are Cloudflare’s unit of measurement for AI workloads; different models and operations consume different amounts of neurons.
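Because each model consumes neurons at its own rate, it helps to budget requests against the daily allocation. A minimal sketch, where the per-request neuron cost is a hypothetical number you would look up on the pricing page for your model:

```python
# Free-tier budgeting sketch. The per-request cost below is
# illustrative only; actual neuron costs vary by model and operation.
DAILY_FREE_NEURONS = 10_000

def requests_per_day(neurons_per_request: float) -> int:
    """Number of whole requests the free daily allocation covers."""
    return int(DAILY_FREE_NEURONS // neurons_per_request)

# Example: if one chat completion cost ~25 neurons (hypothetical),
# the free tier would cover 400 such requests per day.
print(requests_per_day(25))
```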

Available Models

Cloudflare Workers AI offers a wide variety of models:
  • @cf/openai/gpt-oss-120b - Open-source GPT model
  • @cf/qwen/qwen3-30b-a3b-fp8 - Qwen 3 model
  • Llama 4 Scout Instruct - Meta’s Llama 4 Scout model
  • Llama 3.3 70B Instruct (FP8) - Optimized Llama 3.3
  • Gemma 3 12B Instruct - Google’s Gemma model
  • Mistral Small 3.1 24B Instruct - Mistral’s efficient model

DeepSeek Models

  • DeepSeek R1 Distill Qwen 32B
  • DeepSeek Coder 6.7B Base (AWQ)
  • DeepSeek Coder 6.7B Instruct (AWQ)
  • DeepSeek Math 7B Instruct

Llama Models

  • Llama 2 7B Chat (FP16, INT8, LoRA)
  • Llama 2 13B Chat (AWQ)
  • Llama 3 8B Instruct (AWQ)
  • Llama 3.1 8B Instruct (AWQ, FP8)
  • Llama 3.2 1B, 3B, 11B Vision Instruct
  • Llama 3.3 70B Instruct (FP8)
  • Llama 4 Scout Instruct
  • Llama Guard 3 8B

Mistral Models

  • Mistral 7B Instruct v0.1 (AWQ)
  • Mistral 7B Instruct v0.2 (LoRA)
  • Mistral Small 3.1 24B Instruct
  • Hermes 2 Pro Mistral 7B

Qwen Models

  • Qwen 1.5 (0.5B, 1.8B, 7B, 14B)
  • Qwen 2.5 Coder 32B Instruct
  • Qwen QwQ 32B

Gemma Models

  • Gemma 2B Instruct (LoRA)
  • Gemma 3 12B Instruct
  • Gemma 7B Instruct (LoRA)

Other Models

  • @cf/aisingapore/gemma-sea-lion-v4-27b-it
  • @cf/ibm-granite/granite-4.0-h-micro
  • @cf/zai-org/glm-4.7-flash
  • Discolm German 7B v1 (AWQ)
  • Falcon 7B Instruct
  • Neural Chat 7B v3.1 (AWQ)
  • OpenChat 3.5 0106
  • OpenHermes 2.5 Mistral 7B (AWQ)
  • Phi-2
  • SQLCoder 7B 2
  • Starling LM 7B Beta
  • TinyLlama 1.1B Chat v1.0
  • Una Cybertron 7B v2 (BF16)
  • Zephyr 7B Beta (AWQ)
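The catalog above is a snapshot; the current list can also be fetched programmatically. A sketch assuming the `/ai/models/search` endpoint from Cloudflare’s API docs (the response shape, a `result` array of objects with a `name` field, is an assumption to verify against the API reference):

```python
import requests

API_BASE = "https://api.cloudflare.com/client/v4"

def models_search_url(account_id: str) -> str:
    """Build the model-catalog search URL (path assumed from the API docs)."""
    return f"{API_BASE}/accounts/{account_id}/ai/models/search"

def list_models(account_id: str, token: str) -> list:
    """Return the model names visible to this account."""
    resp = requests.get(
        models_search_url(account_id),
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumed response shape: {"result": [{"name": "@cf/..."}, ...]}
    return [m["name"] for m in resp.json()["result"]]
```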

API Usage

import requests

# Replace YOUR_ACCOUNT_ID with the account ID shown in your Cloudflare dashboard.
url = "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8"

# The token must have Workers AI permissions.
headers = {
    "Authorization": "Bearer YOUR_CLOUDFLARE_API_TOKEN",
    "Content-Type": "application/json"
}

# Chat-style payload: a list of role/content messages.
data = {
    "messages": [
        {"role": "user", "content": "Hello, how are you?"}
    ]
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # surface HTTP errors instead of parsing an error body
print(response.json())
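For chat interfaces you will usually want tokens as they are generated rather than one final JSON body. A streaming sketch, assuming the server-sent-events format described in Cloudflare’s streaming docs (each line is `data: {"response": "..."}`, terminated by a `data: [DONE]` sentinel; verify the exact shape against the docs):

```python
import json
import requests

def parse_sse_line(line: bytes):
    """Extract the text chunk from one SSE line; return None for blank
    lines and the final [DONE] sentinel (format assumed from the docs)."""
    if not line.startswith(b"data: ") or line == b"data: [DONE]":
        return None
    return json.loads(line[len(b"data: "):]).get("response", "")

def stream_chat(url: str, token: str, prompt: str):
    """Yield response text incrementally by setting `stream: true`."""
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {token}"},
        json={"messages": [{"role": "user", "content": prompt}],
              "stream": True},
        stream=True,   # tell requests not to buffer the whole body
        timeout=60,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        chunk = parse_sse_line(line)
        if chunk is not None:
            yield chunk
```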

Getting Started

  1. Create a Cloudflare account: sign up at cloudflare.com.
  2. Get an API token: create a token with Workers AI permissions.
  3. Find your account ID: copy it from the Cloudflare dashboard.
  4. Start building: use the REST API to run inference.
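Once you have an account ID and token, running inference reduces to building the run URL and posting a chat payload. A minimal sketch; reading credentials from `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` environment variables is an illustrative convention, not something the platform requires:

```python
import os
import requests

def run_url(account_id: str, model: str) -> str:
    """Build the Workers AI inference endpoint for a given model ID."""
    return (
        "https://api.cloudflare.com/client/v4/accounts/"
        f"{account_id}/ai/run/{model}"
    )

def run_inference(prompt: str,
                  model: str = "@cf/meta/llama-3.3-70b-instruct-fp8"):
    """Post a single chat message and return the parsed JSON response.
    Expects credentials in environment variables (see note above)."""
    resp = requests.post(
        run_url(os.environ["CLOUDFLARE_ACCOUNT_ID"], model),
        headers={"Authorization":
                 f"Bearer {os.environ['CLOUDFLARE_API_TOKEN']}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```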

Key Features

  • Global Network: run inference on Cloudflare’s edge network
  • Low Latency: models run close to your users
  • Multiple Models: 50+ open-source models available
  • Optimized Variants: AWQ, LoRA, FP8, and INT8 quantized models
  • Serverless: no infrastructure to manage
  • Pay As You Go: free tier with neurons-based pricing

Model Optimizations

Cloudflare offers various optimized versions:
  • AWQ: Activation-aware Weight Quantization (4-bit)
  • FP8: 8-bit floating point
  • INT8: 8-bit integer quantization
  • LoRA: Low-Rank Adaptation for fine-tuning
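In the catalog above, the variant is typically encoded as a suffix of the model ID (e.g. `@cf/meta/llama-3.1-8b-instruct-awq`). A small sketch of picking that suffix out of an ID; the naming convention is inferred from the catalog, not a documented guarantee:

```python
# Quantization tags observed in Workers AI model IDs (see list above).
QUANT_SUFFIXES = ("awq", "fp8", "int8", "lora")

def quantization(model_id: str):
    """Return the quantization tag encoded in a model ID, or None
    if the ID has no recognized variant suffix."""
    tail = model_id.rsplit("-", 1)[-1].lower()
    return tail if tail in QUANT_SUFFIXES else None

print(quantization("@cf/meta/llama-3.1-8b-instruct-awq"))  # awq
print(quantization("@cf/meta/llama-3-8b-instruct"))        # None
```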

Use Cases

  • Edge AI: Run AI close to your users
  • Chatbots: Build conversational interfaces
  • Content Generation: Generate text content
  • Code Assistance: Code completion and generation
  • Translation: Multilingual applications
  • Global Applications: Low-latency worldwide

Additional Resources

  • Cloudflare Dashboard: manage your account
  • Documentation: official Workers AI documentation
  • Model Catalog: browse all available models
  • Pricing: pricing details
