Workers AI provides access to AI models directly from Workers. Use the API to run inference for tasks such as text generation, chat, embeddings, translation, classification, and image generation.

Overview

Access the Workers AI API:
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env.CLOUDFLARE_API_TOKEN,
});

// Access AI resources
const ai = client.ai;

Run inference

Execute AI models on-demand.

Text generation

Generate text using large language models.
const response = await client.ai.run(
  '@cf/meta/llama-2-7b-chat-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    prompt: 'Tell me about Cloudflare Workers',
  }
);

console.log(response.response);
Parameters:
- model_name (string, required): The AI model identifier (e.g., '@cf/meta/llama-2-7b-chat-int8')
- account_id (string, required): Your Cloudflare account ID
- prompt (string, required): The input text prompt
- max_tokens (number): Maximum number of tokens to generate
- temperature (number): Controls randomness (0.0 to 1.0; higher = more random)
- top_p (number): Nucleus sampling parameter
- top_k (number): Top-k sampling parameter

Response:
- response (string): The generated text response
- usage (object): Token usage statistics
- usage.prompt_tokens (number): Number of tokens in the prompt
- usage.completion_tokens (number): Number of tokens in the completion
- usage.total_tokens (number): Total number of tokens used
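The sampling parameters above can be bundled into the options object passed to run(). A minimal sketch, using a hypothetical helper (buildTextGenParams and its defaults are illustrative, not part of the SDK):

```javascript
// Hypothetical helper: assembles run() parameters with sampling options,
// falling back to conservative defaults when none are given.
function buildTextGenParams(prompt, opts = {}) {
  return {
    account_id: opts.accountId ?? process.env.CLOUDFLARE_ACCOUNT_ID,
    prompt,
    max_tokens: opts.maxTokens ?? 256,
    temperature: opts.temperature ?? 0.7,
  };
}

// Usage (model name as in the example above):
// const response = await client.ai.run(
//   '@cf/meta/llama-2-7b-chat-int8',
//   buildTextGenParams('Tell me about Cloudflare Workers', { temperature: 0.2 })
// );
```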

Chat completion

Generate chat responses using a messages format.
const response = await client.ai.run(
  '@cf/meta/llama-2-7b-chat-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is Workers AI?' },
    ],
  }
);
Parameters:
- messages (array, required): Array of message objects
- messages.role (string, required): Message role: 'system', 'user', or 'assistant'
- messages.content (string, required): Message content
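For multi-turn conversations, each run() call receives the full history in the messages array. A sketch of maintaining that history (appendTurn is an illustrative helper, not an SDK function):

```javascript
// Append one completed user/assistant exchange to the running history,
// returning a new messages array suitable for the next run() call.
function appendTurn(history, userInput, assistantReply) {
  return [
    ...history,
    { role: 'user', content: userInput },
    { role: 'assistant', content: assistantReply },
  ];
}

let history = [{ role: 'system', content: 'You are a helpful assistant.' }];
history = appendTurn(history, 'What is Workers AI?', 'Workers AI runs models at the edge.');
// Pass `history` plus the next user message as `messages` on the next call.
```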

Text embeddings

Generate vector embeddings for text.
const response = await client.ai.run(
  '@cf/baai/bge-base-en-v1.5',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'Cloudflare Workers AI is awesome',
  }
);

console.log(response.data); // Array of embeddings
Parameters:
- model_name (string, required): The embedding model (e.g., '@cf/baai/bge-base-en-v1.5')
- text (string | array, required): Text string or array of strings to embed

Response:
- data (array): Array of embedding vectors
- shape (array): Shape of the embedding array [count, dimensions]
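Embedding vectors are typically compared with cosine similarity (e.g., for semantic search over the data array returned above). A minimal sketch, not part of the SDK:

```javascript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1]; 1 means identical direction.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```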

Text classification

Classify text into categories.
const response = await client.ai.run(
  '@cf/huggingface/distilbert-sst-2-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'This product is amazing!',
  }
);

console.log(response); // [{ label: 'POSITIVE', score: 0.99 }]
Response:
- label (string): Classification label
- score (number): Confidence score (0-1)
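Since the response is a list of label/score pairs, a common pattern is to pick the highest-scoring label and ignore low-confidence results. A sketch (topLabel and the 0.5 threshold are illustrative):

```javascript
// Return the highest-scoring label, or null if confidence is too low.
function topLabel(results, minScore = 0.5) {
  const best = results.reduce((a, b) => (b.score > a.score ? b : a));
  return best.score >= minScore ? best.label : null;
}
```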

Translation

Translate text between languages.
const response = await client.ai.run(
  '@cf/meta/m2m100-1.2b',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'Hello, how are you?',
    source_lang: 'en',
    target_lang: 'es',
  }
);

console.log(response.translated_text);
Parameters:
- text (string, required): Text to translate
- source_lang (string): Source language code (e.g., 'en')
- target_lang (string, required): Target language code (e.g., 'es')

Summarization

Summarize long text.
const response = await client.ai.run(
  '@cf/facebook/bart-large-cnn',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    input_text: longArticleText,
    max_length: 150,
  }
);

console.log(response.summary);
Parameters:
- input_text (string, required): Text to summarize
- max_length (number): Maximum summary length in tokens

Text-to-image

Generate images from text descriptions.
const response = await client.ai.run(
  '@cf/stabilityai/stable-diffusion-xl-base-1.0',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    prompt: 'A sunset over mountains',
  }
);

// Response is a binary image
Parameters:
- prompt (string, required): Text description of the image to generate
- negative_prompt (string): Things to avoid in the image
- num_steps (number): Number of diffusion steps (higher = better quality, slower)
- guidance (number): How closely to follow the prompt (1-20)
- width (number): Image width in pixels
- height (number): Image height in pixels
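Because the response is binary image data rather than JSON, you will usually want to write it straight to disk or a storage bucket. A sketch assuming the bytes arrive as an ArrayBuffer or Uint8Array (check your SDK version's actual return type for image models):

```javascript
import { writeFile } from 'node:fs/promises';

// Persist raw image bytes (ArrayBuffer or Uint8Array) to a file.
async function saveImage(bytes, path) {
  await writeFile(path, Buffer.from(bytes));
}

// Usage: await saveImage(response, './sunset.png');
```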

Image classification

Classify images into categories.
const response = await client.ai.run(
  '@cf/microsoft/resnet-50',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    image: imageArray, // Array of pixel values
  }
);

console.log(response); // [{ label: 'dog', score: 0.95 }]
Parameters:
- image (array, required): Image data as an array of 8-bit unsigned integers
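To produce the array-of-integers format from an image file in Node.js, spread the file's bytes into a plain array (one 0-255 value per byte). A minimal sketch:

```javascript
import { readFile } from 'node:fs/promises';

// Load an image file and convert it to the array-of-integers shape
// expected by the `image` parameter.
async function imageToArray(path) {
  const buf = await readFile(path);
  return [...buf];
}

// Usage: const imageArray = await imageToArray('./photo.jpg');
```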

Automatic speech recognition

Transcribe audio to text.
const response = await client.ai.run(
  '@cf/openai/whisper',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    audio: audioArray,
  }
);

console.log(response.text); // Transcribed text
Parameters:
- audio (array, required): Audio data as array of integers
- source_lang (string): Source language of the audio

Response:
- text (string): The transcribed text
- word_count (number): Number of words transcribed

Models

Browse and discover available AI models.

List models

Retrieve all available AI models.
for await (const model of client.ai.models.list({
  account_id: '023e105f4ecef8ad9ca31a8372d0c353',
})) {
  console.log(model);
}
Parameters:
- account_id (string, required): Your Cloudflare account ID

Response:
- name (string): Model identifier
- description (string): Model description
- task (string): Task type (e.g., 'text-generation', 'text-embeddings')
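Since the list is paginated behind an async iterator, filtering by task is a simple loop. A sketch using the response fields above (modelsForTask is an illustrative helper):

```javascript
// Collect the names of all models matching a given task
// by iterating the paginated model list.
async function modelsForTask(client, accountId, task) {
  const names = [];
  for await (const model of client.ai.models.list({ account_id: accountId })) {
    if (model.task === task) names.push(model.name);
  }
  return names;
}

// Usage: const embedders = await modelsForTask(client, accountId, 'text-embeddings');
```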

Using Workers AI in Workers

Bind Workers AI to your Worker:
export default {
  async fetch(request, env) {
    const response = await env.AI.run(
      '@cf/meta/llama-2-7b-chat-int8',
      {
        prompt: 'Tell me a joke',
      }
    );
    
    return Response.json(response);
  },
};
Configure the binding when deploying:
const version = await client.workers.beta.workers.versions.create(
  workerId,
  {
    account_id: accountId,
    main_module: 'worker.mjs',
    compatibility_date: '2024-03-01',
    bindings: [
      {
        type: 'ai',
        name: 'AI',
      },
    ],
    modules: [...],
  }
);

Best practices

  1. Model selection: Choose the right model for your task (size vs. performance)
  2. Caching: Cache embeddings and frequently used results
  3. Rate limiting: Implement rate limiting for user-facing applications
  4. Error handling: Handle model errors gracefully with fallbacks
  5. Streaming: Use streaming for long-running text generation
  6. Context length: Be mindful of model context limits
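Practice 4 (graceful fallbacks) can be sketched as trying models in order of preference and rethrowing only after all fail; the model list and error handling here are illustrative:

```javascript
// Try each model in order; return the first successful response.
// If every model fails, rethrow the last error.
async function runWithFallback(ai, params, models) {
  let lastError;
  for (const model of models) {
    try {
      return await ai.run(model, params);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Usage:
// const response = await runWithFallback(client.ai, { account_id, prompt },
//   ['@cf/meta/llama-2-7b-chat-int8', '@cf/some/smaller-model']);
```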
