How it works
- Initial message: When an agent response begins, publish an initial message with the `message.create` action to the Ably channel. It is either empty or contains the first token as content.
- Token streaming: Append subsequent tokens to the original message by publishing them with the `message.append` action.
- Live delivery: Clients subscribed to the channel receive each appended token in realtime, allowing them to progressively render the response.
- Compacted history: The channel history contains only one message per agent response, which includes all appended tokens concatenated as contiguous text.
Enable appends
Message append functionality requires “Message annotations, updates, deletes and appends” to be enabled in a channel rule associated with the channel. To enable the channel rule:
- Go to the Ably dashboard and select your app.
- Navigate to the “Configuration” > “Rules” section from the left-hand navigation bar.
- Choose “Add new rule”.
- Enter a channel name or namespace pattern (e.g. `ai` for all channels starting with `ai:`).
- Select the “Message annotations, updates, deletes and appends” option from the list.
- Click “Create channel rule”.
The examples on this page use the `ai:` namespace prefix, which assumes you have configured the rule for `ai`.
Your token or API key needs the following capabilities on the channel:
| Capability | Purpose |
|---|---|
| `subscribe` | Receive messages |
| `history` | Retrieve historical messages for client hydration |
| `publish` | Create new messages |
| `message-update-own` | Append to your own messages |
Publishing tokens
Publish tokens from a Realtime client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see Realtime and REST.

Channels separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel. Use the `get()` method to create or retrieve a channel instance:
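A minimal sketch of obtaining a per-conversation channel. The client is created as in the commented lines; the helper itself only needs an object exposing `channels.get()`, and the channel name is illustrative:

```javascript
// The real client would be created like this:
//   import * as Ably from 'ably';
//   const realtime = new Ably.Realtime({ key: 'your-ably-api-key' });

function channelForConversation(realtime, conversationId) {
  // One channel per conversation, under the `ai:` namespace configured
  // in the channel rule above.
  return realtime.channels.get(`ai:${conversationId}`);
}
```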
To start streaming an AI response, publish the initial message. The message is identified by a server-assigned identifier called a serial. Use the serial to append each subsequent token to the message as it arrives from the AI model:
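A sketch of the create-then-append flow. The `publish` and `appendMessage` names follow this guide; the `{ serial }` return shape is an assumption and may differ in the SDK version you use:

```javascript
// Create the message, then append each token as it arrives from the model.
async function streamResponse(channel, tokenStream) {
  // The publish acknowledgment carries the server-assigned serial.
  const { serial } = await channel.publish({ data: '' });

  for await (const token of tokenStream) {
    // Intentionally not awaited: appends are pipelined so the token rate
    // is not capped by round-trip time.
    channel.appendMessage(serial, { data: token });
  }
  return serial;
}
```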
When publishing tokens, don’t await the `channel.appendMessage()` call. Ably rolls up acknowledgments and debounces them for efficiency, which means awaiting each append would unnecessarily slow down your token stream. Messages are still published in the order that `appendMessage()` is called, so delivery order is not affected.
Handling append failures
The examples above append successive tokens to a response message by pipelining the append operations: the agent publishes an append operation without waiting for prior operations to complete. This is necessary to avoid the append rate being capped by the round-trip time from the agent to the Ably endpoint.

However, because the agent does not await the outcome of each append operation, it can continue to submit append operations after an earlier operation has failed. For example, if a rate limit is exceeded, a single append may be rejected while the following tokens continue to be accepted.

The agent therefore needs to obtain the outcome of each append operation and take corrective action if any operation failed. A simple but effective approach is to ensure that, if streaming of a response fails for any reason, the message is updated with the final complete response text once it is available. The streaming experience is disrupted in the case of failure, but there is no consistency problem with the final result once the response completes.

To detect append failures, keep a reference to each append operation and check for rejections after the stream completes. If any append fails, use `updateMessage()` to replace the message content with the complete response. This ensures subscribers receive the full response regardless of any gaps caused by failed appends. The `message.update` action replaces the entire message content, so subscribers will have the complete response after processing the update.
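A sketch of this recovery pattern: pipeline the appends, then inspect every outcome once the stream ends. The `appendMessage` and `updateMessage` names follow this guide; their exact signatures are assumptions:

```javascript
async function streamWithRecovery(channel, serial, tokenStream) {
  const pending = [];
  let fullText = '';

  for await (const token of tokenStream) {
    fullText += token;
    // Keep each promise for later inspection, but don't await it here.
    pending.push(channel.appendMessage(serial, { data: token }));
  }

  // allSettled waits for every outcome instead of failing fast.
  const outcomes = await Promise.allSettled(pending);
  if (outcomes.some((o) => o.status === 'rejected')) {
    // One or more appends failed, so the streamed message may have gaps:
    // replace the whole body with the complete response text.
    await channel.updateMessage(serial, { data: fullText });
  }
  return fullText;
}
```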
This pattern allows publishing append operations for multiple concurrent model responses on the same channel. As long as you append to the correct message serial, tokens from different responses will not interfere with each other, and the final concatenated message for each response will contain only the tokens from that response.
Configuring rollup behaviour
By default, AI Transport automatically rolls up tokens into messages at a rate of 25 messages per second (using a 40ms rollup window). This protects you from hitting connection rate limits while maintaining a smooth user experience. You can tune this behaviour when establishing your connection to balance between message costs and delivery speed.

The `appendRollupWindow` parameter controls how many tokens are combined into each published message for a given model output rate. This creates a trade-off between delivery smoothness and the number of concurrent model responses you can stream on a single connection:
- With a shorter rollup window, tokens are published more frequently, creating a smoother, more fluid experience for users as they see the response appear in more fine-grained chunks. However, this consumes more of your connection’s message rate capacity, limiting how many simultaneous model responses you can stream.
- With a longer rollup window, multiple tokens are batched together into fewer messages, allowing you to run more concurrent response streams on the same connection, but users will notice tokens arriving in larger chunks.
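As a rough sketch, the rollup window might be tuned when creating the client. The `appendRollupWindow` option name comes from this guide; exactly where it is passed is an assumption:

```javascript
// import * as Ably from 'ably';
const realtime = new Ably.Realtime({
  key: 'your-ably-api-key',
  appendRollupWindow: 40, // ms; the 40ms default gives ~25 messages per second
});
```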
Subscribing to token streams
Subscribers receive different message actions depending on when they join and how they’re retrieving messages. Each message has an `action` field that indicates how to process it, and a `serial` field that identifies which message the action relates to:

- `message.create`: Indicates a new response has started (i.e. a new message was created). The message `data` contains the initial content (often empty or the first token). Store this as the beginning of a new response using `serial` as the identifier.
- `message.append`: Contains a single token fragment to append. The message `data` contains only the new token, not the full concatenated response. Append this token to the existing response identified by `serial`.
- `message.update`: Contains the whole response up to that point. The message `data` contains the full concatenated text so far. Replace the entire response content with this data for the message identified by `serial`. This action occurs when the channel needs to resynchronize the full message state, such as after a client resumes from a transient disconnection.
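These three actions can be handled with a small reducer. A sketch, keyed by `serial`; because it is pure logic, it works the same whether messages arrive live, via rewind, or from history:

```javascript
// Apply one message to a Map of serial -> response text.
function applyMessage(responses, message) {
  const { serial, action, data } = message;
  switch (action) {
    case 'message.create':
      responses.set(serial, data ?? ''); // start of a new response
      break;
    case 'message.append':
      responses.set(serial, (responses.get(serial) ?? '') + data); // one token
      break;
    case 'message.update':
      responses.set(serial, data); // full text so far: replace wholesale
      break;
  }
  return responses;
}

// Typical wiring, with a channel obtained as shown earlier:
//   const responses = new Map();
//   channel.subscribe((message) => applyMessage(responses, message));
```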
Client hydration
When clients connect or reconnect, such as after a page refresh, they often need to catch up on complete responses and individual tokens that were published while they were offline or before they joined. The message-per-response pattern enables efficient client state hydration without needing to process every individual token, and supports seamlessly transitioning from historical responses to live tokens.

Using rewind for recent history
The simplest approach is to use Ably’s rewind channel option to attach to the channel at some point in the recent past and automatically receive all messages since that point. Historical messages are delivered as `message.update` events containing the complete concatenated response, which then seamlessly transition to live `message.append` events for any ongoing responses:
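A sketch using the `rewind` channel param (a standard Ably channel option); the channel name and the two-minute window are illustrative:

```javascript
function subscribeWithRewind(realtime, conversationId, onMessage) {
  const channel = realtime.channels.get(`ai:${conversationId}`, {
    params: { rewind: '2m' }, // time interval, or a count such as '10'
  });
  // Historical responses arrive first as message.update events, then any
  // in-progress response continues as live message.append events.
  channel.subscribe(onMessage);
  return channel;
}
```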
Rewind supports two formats:
- Time-based: Use a time interval like `'30s'` or `'2m'` to retrieve messages from that time period.
- Count-based: Use a number like `10` or `50` to retrieve the most recent N messages (maximum 100).
Using history for older messages
Use channel history with the `untilAttach` option to paginate back through history to obtain historical responses, while preserving continuity with the delivery of live tokens:
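A sketch: attach first, then page back with `untilAttach` so paged history and live delivery join up without gaps or duplicates:

```javascript
async function loadHistory(channel, onMessage) {
  await channel.attach(); // live delivery is continuous from this point
  let page = await channel.history({ untilAttach: true });
  while (page) {
    // History pages are returned newest-first by default.
    for (const message of page.items) onMessage(message);
    page = page.hasNext() ? await page.next() : null;
  }
}
```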
Hydrating an in-progress response
A common pattern is to persist complete model responses in your database while using Ably for streaming in-progress responses. The client loads completed responses from your database, then uses Ably to catch up on any response that was still in progress. You can hydrate in-progress responses using either the rewind or history pattern.

Publishing with correlation metadata
To correlate Ably messages with your database records, include the `responseId` in the message extras when publishing:
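A sketch of carrying your database's response identifier in message extras. The `responseId` name and the `extras.headers` placement are illustrative:

```javascript
async function startResponse(channel, responseId) {
  const { serial } = await channel.publish({
    data: '',
    extras: { headers: { responseId } },
  });
  return serial; // use for the subsequent appendMessage calls
}
```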
Hydrate using rewind
When hydrating, load completed responses from your database, then use rewind to catch up on any in-progress response. Check the `responseId` from message extras to skip responses already loaded from your database:
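A sketch of the skip logic: wrap a message handler so rewound messages for responses already loaded from the database are ignored. The extras shape mirrors the publishing sketch and is illustrative:

```javascript
function skippingCompleted(completedResponseIds, handler) {
  return (message) => {
    const responseId = message.extras?.headers?.responseId;
    if (responseId && completedResponseIds.has(responseId)) return; // already loaded
    handler(message);
  };
}

// Usage with a rewind subscription:
//   channel.subscribe(skippingCompleted(completedIds, (m) => applyMessage(responses, m)));
```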
Hydrate using history
Load completed responses from your database, then use channel history with the `untilAttach` option to catch up on any in-progress responses. Use the timestamp of the last completed response as a lower bound, so that only messages after that timestamp are retrieved, ensuring continuity with live message delivery.
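A sketch of the timestamp bound: because history pages are newest-first, the first message at or before the bound means the client has caught up and paging can stop:

```javascript
async function catchUpSince(channel, lastCompletedTimestamp, onMessage) {
  await channel.attach();
  let page = await channel.history({ untilAttach: true });
  while (page) {
    for (const message of page.items) {
      if (message.timestamp <= lastCompletedTimestamp) return; // caught up
      onMessage(message);
    }
    page = page.hasNext() ? await page.next() : null;
  }
}
```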
