- Limits relating to usage across an account, such as the total number of messages sent in a month, or the aggregate instantaneous message rate across all connections and channels
- Limits relating to the capacity of a single resource, such as a connection or a channel
Message-per-response
The message-per-response pattern includes automatic rate limit protection. AI Transport prevents a single response stream from reaching the message rate limit for a connection by rolling up multiple appends into a single published message:
- Your agent streams tokens to the channel at the model’s output rate
- Ably publishes the first token immediately, then automatically rolls up subsequent tokens on receipt
- Clients receive the same content, delivered in fewer discrete messages
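The rollup behaviour described above can be modelled with a small simulation. This is a simplified sketch, not Ably's implementation: the `Token` type and `rollUp` function are hypothetical names, and the model assumes fixed windows measured from the arrival time of the first token.

```typescript
// Hypothetical model of append rollup: NOT Ably's actual implementation.
interface Token {
  atMs: number; // arrival time of the append, in milliseconds
  text: string; // token content
}

// Returns the messages a subscriber would receive under this simplified model.
function rollUp(tokens: Token[], windowMs: number): string[] {
  const first = tokens[0];
  if (first === undefined) return [];

  // The first token is published immediately as its own message.
  const published: string[] = [first.text];

  let buffer = "";
  let bufferBucket = -1;
  for (const token of tokens.slice(1)) {
    // Assumption: windows are fixed-size buckets counted from the first token.
    const bucket =
      windowMs > 0 ? Math.floor((token.atMs - first.atMs) / windowMs) : -1;
    // Flush when the token falls in a new window, or always when rollup is off.
    if (buffer !== "" && (windowMs <= 0 || bucket !== bufferBucket)) {
      published.push(buffer);
      buffer = "";
    }
    buffer += token.text;
    bufferBucket = bucket;
  }
  if (buffer !== "") published.push(buffer);
  return published;
}

const sampleTokens: Token[] = [
  { atMs: 0, text: "a" },
  { atMs: 10, text: "b" },
  { atMs: 20, text: "c" },
  { atMs: 30, text: "d" },
  { atMs: 50, text: "e" },
  { atMs: 90, text: "f" },
  { atMs: 110, text: "g" },
];
const rolledUp = rollUp(sampleTokens, 40);
```

With a 40ms window, the seven sample tokens are delivered as four messages, and concatenating those messages reproduces the original content.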
Configure rollup behaviour
Ably concatenates all appends for a single response that are received during the rollup window into one published message. You can specify the rollup window for a particular connection by setting the appendRollupWindow transport parameter. This allows you to determine how much of the connection message rate can be consumed by a single response stream and control your consumption costs.
| appendRollupWindow | Maximum message rate for a single response |
|---|---|
| 0ms | Model output rate |
| 20ms | 50 messages/s |
| 40ms (default) | 25 messages/s |
| 100ms | 10 messages/s |
| 500ms (max) | 2 messages/s |
For example, with appendRollupWindow set to 100ms, a single response stream publishes at most 10 messages/s.
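The rates in the table follow from dividing one second by the window length. The sketch below shows that arithmetic and one possible shape for the connection options. The helper name `maxMessagesPerSecond` is hypothetical. Passing the parameter through a `transportParams` client option, with a plain number of milliseconds as the value, is an assumption to verify against the SDK reference.

```typescript
// Upper bound on the rollup window, taken from the table above.
const MAX_ROLLUP_WINDOW_MS = 500;

// Hypothetical helper: maximum published message rate for one response stream.
function maxMessagesPerSecond(windowMs: number): number {
  if (windowMs < 0 || windowMs > MAX_ROLLUP_WINDOW_MS) {
    throw new RangeError(`window must be between 0 and ${MAX_ROLLUP_WINDOW_MS}ms`);
  }
  // A 0ms window disables rollup, so the rate is bounded only by the model.
  return windowMs === 0 ? Infinity : 1000 / windowMs;
}

// Assumption: the value is a number of milliseconds passed via transportParams.
const transportParams: Record<string, string | number> = {
  appendRollupWindow: 100,
};

// Options object that would be handed to the realtime client constructor.
const clientOptions = {
  key: "your-api-key", // placeholder, not a real key
  transportParams,
};

const rateAt100Ms = maxMessagesPerSecond(100);
```

A larger window lowers the share of the connection message rate that one response can consume, at the cost of coarser updates for subscribers.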
Message-per-token
The message-per-token pattern requires you to manage rate limits directly. Each token publishes as a separate message, so high-speed model output can hit per-connection or per-channel rate limits and quickly consume your overall message allowance. To stay within limits:
- Calculate your headroom by comparing your model’s peak output rate against your package’s connection inbound message rate
- Account for concurrency by multiplying peak rates by the maximum number of simultaneous streams your application supports
- If required, batch tokens in your agent before publishing to the SDK, reducing message count while maintaining delivery speed
- Enable server-side batching to reduce the number of messages delivered to your subscribers
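The headroom and batching steps above can be sketched as a short calculation. All names here (`StreamLoad`, `peakMessageRate`, `headroom`, `minTokensPerBatch`, `batchTokens`) are hypothetical, and the numbers are illustrative rather than real package limits. Check your own package's limits before relying on them.

```typescript
interface StreamLoad {
  peakTokensPerSecond: number; // peak output rate of one model stream
  maxConcurrentStreams: number; // simultaneous streams sharing the connection
  connectionInboundLimit: number; // messages/s allowed on the connection
}

// Peak publish rate if every token is sent as its own message.
function peakMessageRate(load: StreamLoad): number {
  return load.peakTokensPerSecond * load.maxConcurrentStreams;
}

// Positive headroom means the load fits; negative means batching is needed.
function headroom(load: StreamLoad): number {
  return load.connectionInboundLimit - peakMessageRate(load);
}

// Smallest number of tokens per message that keeps the rate within the limit.
function minTokensPerBatch(load: StreamLoad): number {
  return Math.max(1, Math.ceil(peakMessageRate(load) / load.connectionInboundLimit));
}

// Groups tokens into messages in the agent, before they are published.
function batchTokens(tokens: string[], tokensPerBatch: number): string[] {
  if (tokensPerBatch < 1) throw new RangeError("tokensPerBatch must be at least 1");
  const batches: string[] = [];
  for (let i = 0; i < tokens.length; i += tokensPerBatch) {
    batches.push(tokens.slice(i, i + tokensPerBatch).join(""));
  }
  return batches;
}

// Illustrative values only: 3 streams at 80 tokens/s against a 50 messages/s limit.
const exampleLoad: StreamLoad = {
  peakTokensPerSecond: 80,
  maxConcurrentStreams: 3,
  connectionInboundLimit: 50,
};
const tokensPerMessage = minTokensPerBatch(exampleLoad);
const batchedMessages = batchTokens(["a", "b", "c", "d", "e", "f", "g"], tokensPerMessage);
```

In this example the unbatched peak is 240 messages/s, so grouping five tokens per message brings the publish rate down to 48 messages/s, which fits under the assumed limit.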
Next steps
- Review Ably platform limits to understand rate limit thresholds for your package
- Learn about the message-per-response pattern for automatic rate limit protection
- Learn about the message-per-token pattern for fine-grained control
