Workers AI provides access to AI models directly from Workers and through the REST API. Use the API to run inference on text, images, audio, and embeddings.
Overview
Access the Workers AI API:
import Cloudflare from 'cloudflare';

const client = new Cloudflare({
  apiToken: process.env.CLOUDFLARE_API_TOKEN,
});

// Access AI resources
const ai = client.ai;
Run inference
Execute AI models on-demand.
Text generation
Generate text using large language models.
const response = await client.ai.run(
  '@cf/meta/llama-2-7b-chat-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    prompt: 'Tell me about Cloudflare Workers',
  }
);

console.log(response.response);
Parameters:

- model_name: The AI model identifier (e.g., '@cf/meta/llama-2-7b-chat-int8')
- account_id: Your Cloudflare account ID
- max_tokens: Maximum number of tokens to generate
- temperature: Controls randomness (0.0 to 1.0, higher = more random)
- top_p: Nucleus sampling parameter

Response fields:

- response: The generated text response
- prompt_tokens: Number of tokens in the prompt
- completion_tokens: Number of tokens in the completion
- total_tokens: Total number of tokens used
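The optional sampling parameters can be passed alongside the prompt. A minimal sketch; the values below are illustrative, and supported ranges should be checked against the target model's schema:

```typescript
// Illustrative sampling options for text generation. The names
// (max_tokens, temperature, top_p) match the parameters described above.
const genOptions = {
  max_tokens: 256,  // cap on tokens generated in the completion
  temperature: 0.7, // 0.0 = near-deterministic, higher = more random
  top_p: 0.9,       // nucleus sampling: keep the top 90% probability mass
};

// Passed along with the prompt (network call shown for context only):
// const response = await client.ai.run('@cf/meta/llama-2-7b-chat-int8', {
//   account_id: '023e105f4ecef8ad9ca31a8372d0c353',
//   prompt: 'Tell me about Cloudflare Workers',
//   ...genOptions,
// });
```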
Chat completion
Generate chat responses using a messages format.
const response = await client.ai.run(
  '@cf/meta/llama-2-7b-chat-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'What is Workers AI?' },
    ],
  }
);
Parameters:

- role: Message role: 'system', 'user', or 'assistant'
- content: The message text
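For multi-turn conversations, append each user turn and assistant reply to the messages array so the model sees the full history on the next call. A sketch; the `addTurn` helper is our own, not part of the SDK:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Conversation history, seeded with a system message as in the example above.
const history: ChatMessage[] = [
  { role: 'system', content: 'You are a helpful assistant.' },
];

// Record one completed exchange so later calls carry the full context.
function addTurn(messages: ChatMessage[], user: string, assistant: string): void {
  messages.push({ role: 'user', content: user });
  messages.push({ role: 'assistant', content: assistant });
}

addTurn(history, 'What is Workers AI?', 'Workers AI runs models on Cloudflare\'s network.');
// Next call passes the accumulated history plus the new user message:
// await client.ai.run('@cf/meta/llama-2-7b-chat-int8', {
//   account_id: '023e105f4ecef8ad9ca31a8372d0c353',
//   messages: [...history, { role: 'user', content: 'Tell me more' }],
// });
```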
Text embeddings
Generate vector embeddings for text.
const response = await client.ai.run(
  '@cf/baai/bge-base-en-v1.5',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'Cloudflare Workers AI is awesome',
  }
);

console.log(response.data); // Array of embeddings
Parameters:

- model_name: The embedding model (e.g., '@cf/baai/bge-base-en-v1.5')
- text: Text string or array of strings to embed

Response fields:

- data: Array of embedding vectors
- shape: Shape of the embedding array [count, dimensions]
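Embedding vectors are typically compared with cosine similarity, for example when ranking cached embeddings for semantic search. A self-contained sketch; the helper below is our own, not part of the SDK:

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// Returns 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // 1
cosineSimilarity([1, 0], [0, 1]); // 0
```

In practice the inputs would be two vectors from `response.data`, e.g. a query embedding against each stored document embedding.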
Text classification
Classify text into categories.
const response = await client.ai.run(
  '@cf/huggingface/distilbert-sst-2-int8',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'This product is amazing!',
  }
);

console.log(response); // [{ label: 'POSITIVE', score: 0.99 }]
Translation
Translate text between languages.
const response = await client.ai.run(
  '@cf/meta/m2m100-1.2b',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    text: 'Hello, how are you?',
    source_lang: 'en',
    target_lang: 'es',
  }
);

console.log(response.translated_text);
Parameters:

- source_lang: Source language code (e.g., 'en')
- target_lang: Target language code (e.g., 'es')
Summarization
Summarize long text.
const response = await client.ai.run(
  '@cf/facebook/bart-large-cnn',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    input_text: longArticleText,
    max_length: 150,
  }
);

console.log(response.summary);
Parameters:

- max_length: Maximum summary length in tokens
Text-to-image
Generate images from text descriptions.
const response = await client.ai.run(
  '@cf/stabilityai/stable-diffusion-xl-base-1.0',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    prompt: 'A sunset over mountains',
  }
);

// Response is a binary image
Parameters:

- prompt: Text description of the image to generate
- negative_prompt: Things to avoid in the image
- num_steps: Number of diffusion steps (higher = better quality, slower)
- guidance: How closely to follow the prompt (1-20)
Image classification
Classify images into categories.
const response = await client.ai.run(
  '@cf/microsoft/resnet-50',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    image: imageArray, // Array of pixel values
  }
);

console.log(response); // [{ label: 'dog', score: 0.95 }]
Parameters:

- image: Image data as an array of 8-bit unsigned integers
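The `image` parameter (and likewise the `audio` parameter for speech recognition) expects the raw file bytes spread into a plain number array. A sketch of the conversion from an ArrayBuffer, as returned by `fetch` or a file read:

```typescript
// Convert raw binary data into the array-of-integers format the model
// expects: one unsigned byte (0-255) per element.
function toByteArray(buffer: ArrayBuffer): number[] {
  return [...new Uint8Array(buffer)];
}

const buf = new Uint8Array([255, 0, 128]).buffer;
toByteArray(buf); // [255, 0, 128]

// Usage: client.ai.run('@cf/microsoft/resnet-50', {
//   account_id: '023e105f4ecef8ad9ca31a8372d0c353',
//   image: toByteArray(buf),
// });
```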
Automatic speech recognition
Transcribe audio to text.
const response = await client.ai.run(
  '@cf/openai/whisper',
  {
    account_id: '023e105f4ecef8ad9ca31a8372d0c353',
    audio: audioArray,
  }
);

console.log(response.text); // Transcribed text
Parameters:

- audio: Audio data as an array of integers
- source_lang: Source language of the audio

Response fields:

- text: The transcribed text
- word_count: Number of words transcribed
Models
Browse and discover available AI models.
List models
Retrieve all available AI models.
for await (const model of client.ai.models.list({
  account_id: '023e105f4ecef8ad9ca31a8372d0c353',
})) {
  console.log(model);
}
Parameters:

- account_id: Your Cloudflare account ID
- task: Task type (e.g., 'text-generation', 'text-embeddings')
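The task parameter narrows the listing to models for one task type. As a sketch, the same narrowing client-side over an already-fetched catalog; the `{ name, task }` shape here is illustrative, not a guaranteed schema:

```typescript
// Server-side filtering via the task parameter (shown for context only):
// for await (const model of client.ai.models.list({
//   account_id: '023e105f4ecef8ad9ca31a8372d0c353',
//   task: 'Text Generation',
// })) { console.log(model); }

// Equivalent client-side filter over cached catalog entries.
type ModelInfo = { name: string; task: { name: string } };

function byTask(models: ModelInfo[], task: string): ModelInfo[] {
  return models.filter((m) => m.task.name === task);
}

const catalog: ModelInfo[] = [
  { name: '@cf/meta/llama-2-7b-chat-int8', task: { name: 'Text Generation' } },
  { name: '@cf/baai/bge-base-en-v1.5', task: { name: 'Text Embeddings' } },
];

byTask(catalog, 'Text Generation'); // one match: the llama-2 model
```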
Using Workers AI in Workers
Bind Workers AI to your Worker:
export default {
  async fetch(request, env) {
    const response = await env.AI.run(
      '@cf/meta/llama-2-7b-chat-int8',
      {
        prompt: 'Tell me a joke',
      }
    );
    return Response.json(response);
  },
};
Configure the binding when deploying:
const version = await client.workers.beta.workers.versions.create(
  workerId,
  {
    account_id: accountId,
    main_module: 'worker.mjs',
    compatibility_date: '2024-03-01',
    bindings: [
      {
        type: 'ai',
        name: 'AI',
      },
    ],
    modules: [...],
  }
);
Best practices
- Model selection: Choose the right model for your task (size vs. performance)
- Caching: Cache embeddings and frequently used results
- Rate limiting: Implement rate limiting for user-facing applications
- Error handling: Handle model errors gracefully with fallbacks
- Streaming: Use streaming for long-running text generation
- Context length: Be mindful of model context limits
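The error-handling practice can be sketched as a small fallback loop that tries models in order; the helper is our own pattern, not an SDK feature:

```typescript
// Try each model in order and return the first successful result.
// `run` stands in for a call like client.ai.run(model, params).
async function runWithFallback<T>(
  models: string[],
  run: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await run(model);
    } catch (err) {
      lastError = err; // this model failed; fall through to the next one
    }
  }
  throw lastError;
}

// Example with a fake runner; '@cf/unavailable-model' is a made-up name
// used only to simulate a failure.
const result = await runWithFallback(
  ['@cf/unavailable-model', '@cf/meta/llama-2-7b-chat-int8'],
  async (model) => {
    if (model === '@cf/unavailable-model') throw new Error('capacity');
    return `ok from ${model}`;
  },
);
// result === 'ok from @cf/meta/llama-2-7b-chat-int8'
```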