Workers AI provides access to AI models directly from Workers. Use the API to run inference for tasks such as text generation, chat, embeddings, translation, classification, and image generation.

Overview

Access the Workers AI API:
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env.CLOUDFLARE_API_TOKEN,
});

// Access AI resources
const ai = client.ai;

Run inference

Execute AI models on-demand.

Text generation

Generate text using large language models.
const response = await client.ai.run(
  '@cf/meta/llama-2-7b-chat-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    prompt: 'Tell me about Cloudflare Workers',
  }
);

console.log(response.response);
Parameters:
- model_name (string, required): The AI model identifier (e.g., '@cf/meta/llama-2-7b-chat-int8')
- account_id (string, required): Your Cloudflare account ID
- prompt (string, required): The input text prompt
- max_tokens (number): Maximum number of tokens to generate
- temperature (number): Controls randomness (0.0 to 1.0; higher = more random)
- top_p (number): Nucleus sampling parameter
- top_k (number): Top-k sampling parameter

Response:
- response (string): The generated text response
- usage (object): Token usage statistics
- usage.prompt_tokens (number): Number of tokens in the prompt
- usage.completion_tokens (number): Number of tokens in the completion
- usage.total_tokens (number): Total number of tokens used
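The sampling parameters above can be bundled into the options object passed to run(). A minimal sketch, using a hypothetical helper (buildTextGenParams and its defaults are illustrative, not part of the SDK):

```javascript
// Hypothetical helper: assembles run() parameters with sampling options,
// falling back to conservative defaults when none are given.
function buildTextGenParams(prompt, opts = {}) {
  return {
    account_id: opts.accountId ?? process.env.CLOUDFLARE_ACCOUNT_ID,
    prompt,
    max_tokens: opts.maxTokens ?? 256,
    temperature: opts.temperature ?? 0.7,
  };
}

// Usage (model name as in the example above):
// const response = await client.ai.run(
//   '@cf/meta/llama-2-7b-chat-int8',
//   buildTextGenParams('Tell me about Cloudflare Workers', { temperature: 0.2 })
// );
```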

Chat completion

Generate chat responses using a messages format.
const response = await client.ai.run(
  '@cf/meta/llama-2-7b-chat-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is Workers AI?' },
    ],
  }
);
Parameters:
- messages (array, required): Array of message objects
- messages.role (string, required): Message role: 'system', 'user', or 'assistant'
- messages.content (string, required): Message content
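For multi-turn conversations, each run() call receives the full history in the messages array. A sketch of maintaining that history (appendTurn is an illustrative helper, not an SDK function):

```javascript
// Append one completed user/assistant exchange to the running history,
// returning a new messages array suitable for the next run() call.
function appendTurn(history, userInput, assistantReply) {
  return [
    ...history,
    { role: 'user', content: userInput },
    { role: 'assistant', content: assistantReply },
  ];
}

let history = [{ role: 'system', content: 'You are a helpful assistant.' }];
history = appendTurn(history, 'What is Workers AI?', 'Workers AI runs models at the edge.');
// Pass `history` plus the next user message as `messages` on the next call.
```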

Text embeddings

Generate vector embeddings for text.
const response = await client.ai.run(
  '@cf/baai/bge-base-en-v1.5',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'Cloudflare Workers AI is awesome',
  }
);

console.log(response.data); // Array of embeddings
Parameters:
- model_name (string, required): The embedding model (e.g., '@cf/baai/bge-base-en-v1.5')
- text (string | array, required): Text string or array of strings to embed

Response:
- data (array): Array of embedding vectors
- shape (array): Shape of the embedding array [count, dimensions]
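Embedding vectors are typically compared with cosine similarity (e.g., for semantic search over the data array returned above). A minimal sketch, not part of the SDK:

```javascript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1]; 1 means identical direction.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```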

Text classification

Classify text into categories.
const response = await client.ai.run(
  '@cf/huggingface/distilbert-sst-2-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'This product is amazing!',
  }
);

console.log(response); // [{ label: 'POSITIVE', score: 0.99 }]
Response:
- label (string): Classification label
- score (number): Confidence score (0-1)
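Since the response is a list of label/score pairs, a common pattern is to pick the highest-scoring label and ignore low-confidence results. A sketch (topLabel and the 0.5 threshold are illustrative):

```javascript
// Return the highest-scoring label, or null if confidence is too low.
function topLabel(results, minScore = 0.5) {
  const best = results.reduce((a, b) => (b.score > a.score ? b : a));
  return best.score >= minScore ? best.label : null;
}
```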

Translation

Translate text between languages.
const response = await client.ai.run(
  '@cf/meta/m2m100-1.2b',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'Hello, how are you?',
    source_lang: 'en',
    target_lang: 'es',
  }
);

console.log(response.translated_text);
Parameters:
- text (string, required): Text to translate
- source_lang (string): Source language code (e.g., 'en')
- target_lang (string, required): Target language code (e.g., 'es')

Summarization

Summarize long text.
const response = await client.ai.run(
  '@cf/facebook/bart-large-cnn',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    input_text: longArticleText,
    max_length: 150,
  }
);

console.log(response.summary);
Parameters:
- input_text (string, required): Text to summarize
- max_length (number): Maximum summary length in tokens

Text-to-image

Generate images from text descriptions.
const response = await client.ai.run(
  '@cf/stabilityai/stable-diffusion-xl-base-1.0',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    prompt: 'A sunset over mountains',
  }
);

// Response is a binary image
Parameters:
- prompt (string, required): Text description of the image to generate
- negative_prompt (string): Things to avoid in the image
- num_steps (number): Number of diffusion steps (higher = better quality, slower)
- guidance (number): How closely to follow the prompt (1-20)
- width (number): Image width in pixels
- height (number): Image height in pixels
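Because the response is binary image data rather than JSON, you will usually want to write it straight to disk or a storage bucket. A sketch assuming the bytes arrive as an ArrayBuffer or Uint8Array (check your SDK version's actual return type for image models):

```javascript
import { writeFile } from 'node:fs/promises';

// Persist raw image bytes (ArrayBuffer or Uint8Array) to a file.
async function saveImage(bytes, path) {
  await writeFile(path, Buffer.from(bytes));
}

// Usage: await saveImage(response, './sunset.png');
```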

Image classification

Classify images into categories.
const response = await client.ai.run(
  '@cf/microsoft/resnet-50',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    image: imageArray, // Array of pixel values
  }
);

console.log(response); // [{ label: 'dog', score: 0.95 }]
Parameters:
- image (array, required): Image data as an array of 8-bit unsigned integers
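To produce the array-of-integers format from an image file in Node.js, spread the file's bytes into a plain array (one 0-255 value per byte). A minimal sketch:

```javascript
import { readFile } from 'node:fs/promises';

// Load an image file and convert it to the array-of-integers shape
// expected by the `image` parameter.
async function imageToArray(path) {
  const buf = await readFile(path);
  return [...buf];
}

// Usage: const imageArray = await imageToArray('./photo.jpg');
```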

Automatic speech recognition

Transcribe audio to text.
const response = await client.ai.run(
  '@cf/openai/whisper',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    audio: audioArray,
  }
);

console.log(response.text); // Transcribed text
Parameters:
- audio (array, required): Audio data as array of integers
- source_lang (string): Source language of the audio

Response:
- text (string): The transcribed text
- word_count (number): Number of words transcribed

Models

Browse and discover available AI models.

List models

Retrieve all available AI models.
for await (const model of client.ai.models.list({
  account_id: '023e105f4ecef8ad9ca31a8372d0c353',
})) {
  console.log(model);
}
Parameters:
- account_id (string, required): Your Cloudflare account ID

Response:
- name (string): Model identifier
- description (string): Model description
- task (string): Task type (e.g., 'text-generation', 'text-embeddings')
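Since the list is paginated behind an async iterator, filtering by task is a simple loop. A sketch using the response fields above (modelsForTask is an illustrative helper):

```javascript
// Collect the names of all models matching a given task
// by iterating the paginated model list.
async function modelsForTask(client, accountId, task) {
  const names = [];
  for await (const model of client.ai.models.list({ account_id: accountId })) {
    if (model.task === task) names.push(model.name);
  }
  return names;
}

// Usage: const embedders = await modelsForTask(client, accountId, 'text-embeddings');
```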

Using Workers AI in Workers

Bind Workers AI to your Worker:
export default {
  async fetch(request, env) {
    const response = await env.AI.run(
      '@cf/meta/llama-2-7b-chat-int8',
      {
        prompt: 'Tell me a joke',
      }
    );
    
    return Response.json(response);
  },
};
Configure the binding when deploying:
const version = await client.workers.beta.workers.versions.create(
  workerId,
  {
    account_id: accountId,
    main_module: 'worker.mjs',
    compatibility_date: '2024-03-01',
    bindings: [
      {
        type: 'ai',
        name: 'AI',
      },
    ],
    modules: [...],
  }
);

Best practices

  1. Model selection: Choose the right model for your task (size vs. performance)
  2. Caching: Cache embeddings and frequently used results
  3. Rate limiting: Implement rate limiting for user-facing applications
  4. Error handling: Handle model errors gracefully with fallbacks
  5. Streaming: Use streaming for long-running text generation
  6. Context length: Be mindful of model context limits
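Practice 4 (graceful fallbacks) can be sketched as trying models in order of preference and rethrowing only after all fail; the model list and error handling here are illustrative:

```javascript
// Try each model in order; return the first successful response.
// If every model fails, rethrow the last error.
async function runWithFallback(ai, params, models) {
  let lastError;
  for (const model of models) {
    try {
      return await ai.run(model, params);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Usage:
// const response = await runWithFallback(client.ai, { account_id, prompt },
//   ['@cf/meta/llama-2-7b-chat-int8', '@cf/some/smaller-model']);
```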
