Cloudflare Workers AI provides free access to various open-source models running on Cloudflare’s edge network.
## Overview

Cloudflare Workers AI offers serverless inference for multiple open-source models, running on Cloudflare’s global network for low latency worldwide.

## Rate Limits

- Free allocation: 10,000 neurons per day
## Available Models

Cloudflare Workers AI offers a wide variety of models.

### Featured Models
- @cf/openai/gpt-oss-120b - Open-source GPT model
- @cf/qwen/qwen3-30b-a3b-fp8 - Qwen 3 model
- Llama 4 Scout Instruct - Latest Llama model
- Llama 3.3 70B Instruct (FP8) - Optimized Llama 3.3
- Gemma 3 12B Instruct - Google’s Gemma model
- Mistral Small 3.1 24B Instruct - Mistral’s efficient model
### DeepSeek Models

- DeepSeek R1 Distill Qwen 32B
- DeepSeek Coder 6.7B Base (AWQ)
- DeepSeek Coder 6.7B Instruct (AWQ)
- DeepSeek Math 7B Instruct
### Llama Models
- Llama 2 7B Chat (FP16, INT8, LoRA)
- Llama 2 13B Chat (AWQ)
- Llama 3 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (AWQ, FP8)
- Llama 3.2 1B, 3B, 11B Vision Instruct
- Llama 3.3 70B Instruct (FP8)
- Llama 4 Scout Instruct
- Llama Guard 3 8B
### Mistral Models
- Mistral 7B Instruct v0.1 (AWQ)
- Mistral 7B Instruct v0.2 (LoRA)
- Mistral Small 3.1 24B Instruct
- Hermes 2 Pro Mistral 7B
### Qwen Models
- Qwen 1.5 (0.5B, 1.8B, 7B, 14B)
- Qwen 2.5 Coder 32B Instruct
- Qwen QwQ 32B
### Gemma Models
- Gemma 2B Instruct (LoRA)
- Gemma 3 12B Instruct
- Gemma 7B Instruct (LoRA)
### Other Models
- @cf/aisingapore/gemma-sea-lion-v4-27b-it
- @cf/ibm-granite/granite-4.0-h-micro
- @cf/zai-org/glm-4.7-flash
- DiscoLM German 7B v1 (AWQ)
- Falcon 7B Instruct
- Neural Chat 7B v3.1 (AWQ)
- OpenChat 3.5 0106
- OpenHermes 2.5 Mistral 7B (AWQ)
- Phi-2
- SQLCoder 7B 2
- Starling LM 7B Beta
- TinyLlama 1.1B Chat v1.0
- Una Cybertron 7B v2 (BF16)
- Zephyr 7B Beta (AWQ)
## API Usage

### Getting Started

1. Create a Cloudflare account: sign up at cloudflare.com.
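Once you have an account, models can be invoked over the Workers AI REST API (`POST /accounts/{account_id}/ai/run/{model}` with a bearer token). A minimal Python sketch, using only the standard library; `YOUR_ACCOUNT_ID` and `YOUR_API_TOKEN` are placeholders you must replace with your own credentials:

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"

def build_run_request(account_id: str, model: str,
                      prompt: str, api_token: str) -> urllib.request.Request:
    """Build a POST request for the Workers AI run endpoint.

    Follows the REST shape /accounts/{account_id}/ai/run/{model};
    the body format ({"prompt": ...}) is the simple text-generation input.
    """
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_run_request(
        "YOUR_ACCOUNT_ID",
        "@cf/openai/gpt-oss-120b",  # a featured model from the catalog above
        "Explain edge inference in one sentence.",
        "YOUR_API_TOKEN",
    )
    print(req.full_url)
    # Sending the request requires real credentials:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read())["result"])
```

The same call can also be made from inside a Worker via the `env.AI` binding instead of the public REST endpoint.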
## Key Features

- Global Network: run inference on Cloudflare’s edge network
- Low Latency: models run close to your users
- Multiple Models: 50+ open-source models available
- Optimized Variants: AWQ, LoRA, FP8, and INT8 quantized models
- Serverless: no infrastructure to manage
- Pay As You Go: free tier with neurons-based pricing
## Model Optimizations
Cloudflare offers various optimized versions:
- AWQ: Activation-aware Weight Quantization (4-bit)
- FP8: 8-bit floating point
- INT8: 8-bit integer quantization
- LoRA: Low-Rank Adaptation for fine-tuning
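In the catalog, the optimization usually appears as a suffix of the model ID itself (e.g. `@cf/qwen/qwen3-30b-a3b-fp8`). As a rough illustration of that naming convention (an assumption based on the IDs listed above; check the model catalog for exact IDs), a small helper that extracts the variant:

```python
# Quantization/adaptation variants assumed to appear as a trailing
# "-awq", "-fp8", "-int8", "-fp16", or "-lora" segment of the model ID.
KNOWN_VARIANTS = {"awq", "fp8", "int8", "fp16", "lora"}

def quantization_of(model_id: str):
    """Return the variant suffix of a Workers AI model ID, or None."""
    last = model_id.rsplit("-", 1)[-1].lower()
    return last if last in KNOWN_VARIANTS else None

# Examples using IDs from this page:
# quantization_of("@cf/qwen/qwen3-30b-a3b-fp8") -> "fp8"
# quantization_of("@cf/openai/gpt-oss-120b")    -> None
```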
## Use Cases
- Edge AI: Run AI close to your users
- Chatbots: Build conversational interfaces
- Content Generation: Generate text content
- Code Assistance: Code completion and generation
- Translation: Multilingual applications
- Global Applications: Low-latency worldwide
## Additional Resources

- Cloudflare Dashboard: manage your account
- Documentation: official Workers AI documentation
- Model Catalog: browse all models
- Pricing: pricing details
