Scalability refers to a system’s ability to change in size or scale to accommodate varying workloads. In practice, this means that as demand increases — whether through more users, more messages, or more channels — the system can grow to meet that demand without degradation in performance or reliability.

Why scalability matters

For realtime platforms, scalability is not a nice-to-have feature but a fundamental requirement, as applications can experience unpredictable growth and traffic spikes.

Consequences of poor scalability

When systems cannot scale effectively, users experience:
  • Degraded service
  • Increased latency
  • Connection failures
  • Message loss during periods of high demand
A truly scalable system removes these constraints, allowing developers to build without worrying about artificial ceilings on their growth potential.

Ably’s scalability dimensions

Ably scales on several key dimensions:
  • Channels: The maximum number of channels that can be used simultaneously by a single application can be scaled horizontally with no technical limit
  • Connections: The maximum number of connections that can exist simultaneously for a single application can be scaled horizontally without limitation
  • Message throughput: The total volume of messages Ably can process for a single application at any given moment is scaled horizontally, meaning there is no limit on the maximum possible aggregate rate

Vertical vs. horizontal scalability

There are fundamentally two approaches to scaling systems.

Vertical scalability (scaling up)

Vertical scalability means tackling larger problems by using larger components:
  • Deploying server instances with more CPU cores
  • Adding more memory to existing servers
  • Using larger storage devices
Limitations of vertical scaling:
  • Physical constraints on how powerful a single machine can become
  • Costs increase non-linearly with capacity
  • Single points of failure become more critical
  • Downtime is often required for upgrades

Horizontal scalability (scaling out)

Horizontal scalability means solving larger problems by having more components instead of larger ones:
  • Adding more server instances to a distributed system
  • Distributing load across a larger number of machines
  • Partitioning data and workloads across multiple resources
Benefits of horizontal scaling:
  • Virtually unlimited scaling potential
  • Greater resilience through redundancy
  • More cost-effective scaling at large scales
  • Ability to scale both up and down with demand (elasticity)
  • Resources can be added incrementally as needed

The need for elasticity

Modern applications require elasticity — the ability to scale both up and down in response to changing demand:
  • Applications experience fluctuations in usage
  • Traffic patterns follow time zones and regional events
  • Cost optimization demands that resources match current needs
  • Successful applications can experience exponential growth that is impossible to predict accurately
Realtime system challenges:
  • Persistent connections
  • Message fanout
  • Low latency requirements
  • Global synchronization
For platforms like Ably that need to support arbitrary and elastic scalability, horizontal scaling is the only viable approach.

Challenges of horizontally scaling resources

Effective horizontal scaling involves several significant challenges that must be addressed in a distributed system design.

Resource coordination

It’s not enough just to have unlimited resources — you have to direct requests to those resources effectively:
  • Resources need coordinated access to shared dependencies
  • Requests must be distributed efficiently across available resources
  • The system must maintain consistent behavior as resources are added or removed
Without proper coordination mechanisms, a distributed system quickly becomes unpredictable and unreliable even as more capacity is added.

Stateful interactions

Most realtime systems involve stateful interactions where the replicated resources cannot operate independently:
  • Users expect consistent experiences across sessions
  • Messages need to be delivered in the correct order
  • Subscribers to the same channel need to see the same content
Maintaining this state consistently across a distributed system requires sophisticated algorithms and careful system design to avoid race conditions, conflicts, and inconsistent views of data.

State maintenance

In stateful systems like Ably, the system must maintain information across requests or messages:
  • Which clients are connected
  • Which channels are active
  • Which clients are subscribed to which channels
  • Message ordering and history
  • Presence information
Simply adding more servers isn’t enough — the system must ensure that state is maintained correctly across the distributed infrastructure.

High-scale fanout

This occurs when a single message needs to be delivered to a very large number of recipients:
  • A sporting event might have millions of viewers receiving the same score updates
  • A financial application might distribute price changes to thousands of traders
  • A chat application might deliver a message to everyone in a popular channel
These scenarios require specialized architectures to handle the efficient distribution of messages to large numbers of connections without overwhelming individual system components or creating network bottlenecks.

Consistent hashing for workload distribution

Ably uses consistent hashing as the foundation of its horizontal scaling approach. Consistent hashing solves a key problem in distributed systems: how to distribute work evenly across a changing set of resources while minimizing redistribution when resources are added or removed.

Traditional hashing limitations

In a traditional hashing approach, work might be distributed using a formula like:
server = hash(item) % number_of_servers
This works well when the number of servers is fixed, but causes major problems when servers are added or removed:
  • When the number of servers changes, the output of the calculation changes for most items
  • This results in most items being moved to new servers
  • Massive redistribution causes service disruption
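To make the breakage concrete, the sketch below (Python; the key names are illustrative, not Ably's) counts how many of 10,000 hypothetical channel keys change servers when a cluster grows from 10 to 11 servers under modulo placement:

```python
import hashlib

def server_for(item: str, num_servers: int) -> int:
    # Traditional placement: hash the item, then take it modulo the server count.
    return int(hashlib.md5(item.encode()).hexdigest(), 16) % num_servers

items = [f"channel-{i}" for i in range(10_000)]
before = {item: server_for(item, 10) for item in items}
after = {item: server_for(item, 11) for item in items}  # one server added

moved = sum(1 for item in items if before[item] != after[item])
print(f"{moved / len(items):.0%} of items reassigned")  # the vast majority move
```

An item keeps its server only when its hash gives the same remainder modulo 10 and modulo 11, which happens for roughly 1 in 11 items; about 90% of the workload relocates for a single added server.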

How consistent hashing works

Consistent hashing addresses this issue by arranging both servers and work items on a conceptual “ring”:
1. Map to ring: Both servers and work items are mapped to positions on the ring using a hash function.
2. Assign to nearest server: Each work item is assigned to the first server encountered moving clockwise around the ring from the item's position.
3. Minimal redistribution: When a server is added or removed, only the work items that fall between that server's position and the preceding server position on the ring need to be reassigned.
This dramatically reduces the amount of redistribution required when the set of resources changes.
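The ring can be sketched in a few lines of Python. This is a minimal illustration of the technique, not Ably's implementation; the class and server names are invented for the example. Note that when a server is added, every reassigned item moves onto the new server and nothing else shuffles:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Map any string to a position on the ring (here, the MD5 integer space).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # The ring is a sorted list of (position, server) pairs.
        self._ring = sorted((ring_hash(s), s) for s in servers)

    def add(self, server):
        bisect.insort(self._ring, (ring_hash(server), server))

    def server_for(self, item: str) -> str:
        # First server at or after the item's position, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (ring_hash(item),))
        return self._ring[idx % len(self._ring)][1]

items = [f"channel-{i}" for i in range(10_000)]
ring = HashRing([f"server-{i}" for i in range(10)])
before = {item: ring.server_for(item) for item in items}

ring.add("server-10")
after = {item: ring.server_for(item) for item in items}

moved = [item for item in items if before[item] != after[item]]
# Every reassigned item landed on the new server; all other items stayed put.
print(f"{len(moved) / len(items):.1%} of items moved")
```

Contrast this with the modulo approach: here only the arc of the ring claimed by the new server changes hands.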

Ably’s implementation

At Ably, consistent hashing enables efficient distribution of channels across the available processing resources:
  • Each channel is placed on a specific server instance based on its position on the hash ring
  • When the system scales — either to add capacity or to respond to failures — only a small fraction of channels need to be moved to different servers
  • This approach minimizes disruption during scaling events and ensures that the system can maintain performance even as it grows

Multiple hashes for even distribution

To address potential uneven distribution, especially when the number of servers is small, Ably assigns multiple hash positions to each possible placement location (server process):
  • Each placement location is represented by multiple points on the hash ring
  • The number of points can be adjusted based on server capacity
  • This statistically creates a more even distribution of work
Example: If each server has 100 points on the ring, and a server is added to a cluster of 10 servers:
  • Each existing server will give up approximately 1/11th of its load to the new server
  • This results in a balanced distribution
Benefits of multiple hashes:
  • Busy items are distributed more evenly across the available servers
  • The law of large numbers helps ensure that no single placement location gets an unfair share of high-traffic items
  • The system becomes more statistically predictable as scale increases
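The 1/11th figure from the example above can be checked with a small simulation. The sketch below assigns 100 points per server (often called "virtual nodes"); the point-naming scheme `server#point` is an assumption for illustration only:

```python
import bisect
import hashlib

POINTS_PER_SERVER = 100

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(servers):
    # Each server is represented by many points on the ring, so its share
    # of the hash space is an average over many small arcs.
    ring = []
    for server in servers:
        for p in range(POINTS_PER_SERVER):
            ring.append((ring_hash(f"{server}#{p}"), server))
    ring.sort()
    return ring

def server_for(ring, item):
    idx = bisect.bisect_left(ring, (ring_hash(item),))
    return ring[idx % len(ring)][1]

items = [f"channel-{i}" for i in range(20_000)]
old = build_ring([f"server-{i}" for i in range(10)])
new = build_ring([f"server-{i}" for i in range(11)])

moved = sum(1 for i in items if server_for(old, i) != server_for(new, i))
print(f"{moved / len(items):.1%} moved")  # close to 1/11, i.e. about 9%
```

With a single point per server, the fraction moved would depend on one random arc and could be far from 1/11; with 100 points it concentrates tightly around the expected value, which is the statistical predictability the text describes.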

Progressive hashing for graceful scaling

Ably extends consistent hashing with “progressive hashing” to make scaling operations more gradual and controlled.

Problem: Even with consistent hashing, adding or removing a server causes an immediate redistribution of the affected work items, which can lead to:
  • Thundering herd problems
  • Connection spikes
  • Processing delays
  • Resource pressure
Solution: Progressive hashing ensures that changes to the available resources are introduced gradually.

Scaling up with progressive hashing

When a new server joins the cluster:
1. Gradual hash announcement: The server doesn’t immediately take on its full share of work. Instead, it gradually announces additional hash positions over time.
2. Progressive load absorption: The new server might start by claiming just 10% of its eventual hash positions, then increase to 20%, 30%, and so on.
3. Warm-up period: This allows the new server to warm up and absorb load progressively.
4. Stable performance: Existing servers maintain stable performance as they gradually shed load.

Scaling down with progressive hashing

The same approach works in reverse when a server is scheduled for termination:
1. Gradual relinquishment: The server gradually relinquishes its hash positions before actual termination.
2. Progressive redistribution: Its workload is redistributed gradually to other servers.
3. Clean termination: By the time the server is actually removed, most or all of its work has already been transitioned.
This controlled shedding is particularly important for graceful scaling down or for replacing instances during maintenance.
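The ramp-up side of progressive hashing can be modelled by having a joining server announce only a fraction of its hash positions at each step. This is a sketch of the idea under assumed parameters (100 points per server, four ramp steps), not Ably's actual schedule:

```python
import bisect
import hashlib

POINTS_PER_SERVER = 100

def ring_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def points_for(server, fraction):
    # Hash positions a server has announced once it has ramped up to `fraction`.
    count = int(POINTS_PER_SERVER * fraction)
    return [(ring_hash(f"{server}#{p}"), server) for p in range(count)]

def build_ring(servers, fractions):
    ring = []
    for server in servers:
        ring.extend(points_for(server, fractions.get(server, 1.0)))
    ring.sort()
    return ring

def server_for(ring, item):
    idx = bisect.bisect_left(ring, (ring_hash(item),))
    return ring[idx % len(ring)][1]

items = [f"channel-{i}" for i in range(20_000)]
servers = [f"server-{i}" for i in range(10)] + ["server-new"]

# The joining server ramps from 10% of its positions to 100% in steps, so
# existing servers shed load gradually instead of all at once.
shares = []
for step in (0.1, 0.3, 0.6, 1.0):
    ring = build_ring(servers, {"server-new": step})
    share = sum(1 for i in items if server_for(ring, i) == "server-new") / len(items)
    shares.append(share)
    print(f"ramp {step:.0%}: new server owns {share:.1%} of items")
```

Scaling down is the same process in reverse: stepping the fraction from 1.0 back toward 0 sheds the server's positions, and its items flow back to the survivors a slice at a time.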

How Ably achieves scalability

Ably’s architecture is built from the ground up to enable horizontal scalability across all dimensions. This is achieved through several key design principles that work together to create a seamlessly scalable platform.

Multi-layered architecture

Ably uses a multi-layered architecture organized into independently scalable layers:
  • Frontend layer: Handles REST requests and realtime connections (WebSocket and Comet)
  • Core layer: Performs central message processing for channels
These layers scale independently in each region according to demand, monitored through metrics like CPU and memory utilization. This separation of concerns allows each layer to scale efficiently according to its specific workload characteristics.

Channel scalability

Channels are the core building block of Ably’s service. Ably achieves horizontal scalability for channels through consistent hashing:
  • Each compute instance within the core layer has a set of pseudo-randomly generated hashes
  • Hashing determines the location of any given channel
  • As a cluster scales, channels relocate to maintain an even load distribution
  • Any number of channels can exist as long as sufficient compute capacity is available
Whether there are many lightly-loaded channels or fewer heavily-loaded ones, scaling and placement strategies ensure capacity is added as required and load is effectively distributed.

Connection scalability

Connection processing is stateless, meaning connections can be freely routed to any frontend server without impacting functionality:
  • A load balancer distributes work and decides where to terminate each connection
  • Routing combines simple random allocation with prioritization based on instantaneous load factors
  • The system performs background shedding to force the relocation of connections for balanced load
  • As long as sufficient capacity exists and routing maintains a balanced load, the service can absorb an unlimited number of connections
This stateless approach to connection handling significantly simplifies scaling while maintaining consistent user experience.
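The text does not specify the exact balancing policy, but one common way to combine random allocation with instantaneous load factors is a "best of k random choices" rule: sample a few frontends at random and route to the least loaded. The sketch below illustrates that pattern as an assumption, not Ably's load balancer:

```python
import random

rng = random.Random(7)  # seeded so the sketch is reproducible

def pick_frontend(loads, k=2):
    # Sample k frontends at random and route to the least loaded of them.
    candidates = rng.sample(list(loads), k)
    return min(candidates, key=loads.get)

loads = {f"frontend-{i}": 0 for i in range(8)}
for _ in range(10_000):
    loads[pick_frontend(loads)] += 1

print(sorted(loads.values()))  # per-frontend counts stay near the 1,250 average
```

Because connections are stateless, any such policy is free to place (or later relocate) a connection on any frontend without breaking functionality, which is what makes this layer simple to scale.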

Handling high-scale fanout

The main challenge for connection scaling is high-scale fanout — when a large number of connections are subscribed to common channels. Ably addresses this through a tiered fanout architecture.

Two-tier fanout

When a message is published to a channel with many subscribers:
1. First tier (channel to frontends): The channel processor forwards the message to all frontend servers that have subscribers for that channel.
2. Second tier (frontend to connections): Each frontend server then delivers the message to its connected clients who are subscribed to the channel.
This two-tier fanout approach allows Ably to scale to handle millions of subscribers per channel.

Regional tier

At the regional tier, a channel also disseminates processed messages to corresponding channels in other regions where the channel is active. This ensures global distribution with optimized routing.

Subscription mapping

The channel processor maintains a map of which frontend servers have subscribers for each channel:
  • When a new subscription is created, the frontend server notifies the channel processor
  • The channel processor updates its subscription map
  • Messages are only sent to frontend servers that have active subscribers
  • This optimizes network usage and processing resources
By separating the concerns of channel processing and connection management, and by implementing efficient fanout mechanisms, Ably can scale to handle channels with millions of subscribers while maintaining low latency and high throughput.
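The interaction between the subscription map and the two fanout tiers can be sketched as follows. The class and channel names are invented for illustration; this models the mechanism described above, not Ably's code:

```python
from collections import defaultdict

delivered = []  # (frontend, connection, message) records, for illustration

class Frontend:
    """Second tier: delivers a message to its locally connected subscribers."""
    def __init__(self, name):
        self.name = name
        self.subs = defaultdict(list)  # channel -> local connection ids

    def attach(self, channel, conn_id, processor):
        if not self.subs[channel]:
            # Notify the channel processor on the first local subscriber.
            processor.subscribe(channel, self)
        self.subs[channel].append(conn_id)

    def deliver(self, channel, message):
        for conn_id in self.subs[channel]:
            delivered.append((self.name, conn_id, message))

class ChannelProcessor:
    """First tier: forwards each message only to frontends with subscribers."""
    def __init__(self):
        self.frontends = defaultdict(set)  # channel -> frontend servers

    def subscribe(self, channel, frontend):
        self.frontends[channel].add(frontend)

    def publish(self, channel, message):
        for frontend in self.frontends[channel]:
            frontend.deliver(channel, message)

processor = ChannelProcessor()
fe1, fe2 = Frontend("fe-1"), Frontend("fe-2")
fe1.attach("scores", "conn-a", processor)
fe1.attach("scores", "conn-b", processor)
fe2.attach("scores", "conn-c", processor)
fe2.attach("news", "conn-d", processor)

processor.publish("scores", "3-1")
print(delivered)  # three deliveries; the "news" subscriber receives nothing
```

Note that the publish sends one message per interested frontend, not one per connection: the per-connection fanout work is spread across the frontend fleet rather than concentrated on the channel processor.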

Message throughput scalability

Ably achieves message throughput scalability through multiple complementary approaches:
  • Distributed processing: Messages are processed by the core instance responsible for the channel, distributing the load across the cluster
  • Efficient routing: The system routes messages directly to interested parties without unnecessary network hops
  • Optimized protocols: Binary protocols and efficient message encoding minimize overhead
  • Connection optimizations: Features like delta compression reduce bandwidth requirements for large messages

Monitoring and auto-scaling

Maintaining effective horizontal scalability requires continuous monitoring and automated scaling.

Monitoring metrics

Ably’s platform monitors various metrics:
  • CPU and memory utilization
  • Message rates
  • Channel and connection counts
  • Resource headroom

Automated scaling triggers

When monitoring determines that the load is approaching the current capacity:
  • Automatic scaling triggers to add more resources
  • For stateful roles, progressive hashing introduces new capacity gradually
  • This minimizes disruption to the existing workload

Scaling down

When the system detects excess capacity:
  • It can scale down by gradually removing resources
  • This optimizes cost efficiency without impacting performance
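A threshold-based scaling decision of the kind described in the last two sections can be sketched as follows. The thresholds and step sizes here are illustrative assumptions, not Ably's actual policy:

```python
def desired_instances(current: int, cpu_utilization: float,
                      scale_up_at: float = 0.7, scale_down_at: float = 0.3) -> int:
    """Decide the target instance count from current load (illustrative thresholds).

    Scale out when load approaches capacity; scale in one step at a time
    when there is sustained excess headroom, never dropping below one instance.
    """
    if cpu_utilization > scale_up_at:
        return current + max(1, current // 4)   # add roughly 25% more capacity
    if cpu_utilization < scale_down_at and current > 1:
        return current - 1                      # shed capacity gradually
    return current

print(desired_instances(8, 0.85))  # 10 (scale out)
print(desired_instances(8, 0.50))  # 8  (hold steady)
print(desired_instances(8, 0.10))  # 7  (scale in)
```

For stateful roles, the output of a decision like this would then be applied via progressive hashing, so the new or departing capacity is blended in or out gradually rather than all at once.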

Regional variations

The auto-scaling systems also account for regional variations in load:
  • Different regions may experience peak loads at different times due to time zone differences and regional events
  • By scaling each region independently based on its current load, Ably ensures efficient resource utilization across the global platform

Load testing

Regular load testing helps validate the system’s scalability properties:
  • Ensures that the distribution mechanisms work as expected at scale
  • Identifies potential bottlenecks before they affect real traffic
  • Tests how well the system redistributes work after failures
  • Measures how quickly the system can scale up and down

Practical limits and considerations

While Ably’s architecture is designed for horizontal scalability, practical considerations do exist that developers should understand when architecting applications on the platform.

Channel considerations

When working with channels, several factors should be considered:
  • While there’s no hard limit on the number of channels, each active channel consumes memory and CPU resources
  • Very high message rates on a single channel may encounter throughput limitations, as all processing for one channel occurs on a single core instance
  • Applications should distribute high-volume message traffic across multiple channels when possible
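One simple way to apply the last point is to shard a high-volume stream across several channels by a stable key, so related messages keep their ordering while the aggregate load lands on many core instances. The naming convention below is a hypothetical application-level pattern, not an Ably feature:

```python
import hashlib

def shard_channel(base: str, key: str, shards: int = 16) -> str:
    """Map a message key to one of `shards` channels derived from `base`.

    Messages with the same key always land on the same channel, preserving
    per-key ordering, while different keys spread across the shard channels.
    """
    shard = int(hashlib.md5(key.encode()).hexdigest(), 16) % shards
    return f"{base}:{shard}"

# The same key is always routed to the same shard channel.
name = shard_channel("ticks", "AAPL")
print(name)
```

Subscribers that need the full stream attach to all shard channels; subscribers that only care about particular keys attach to just the relevant shards.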

Connection considerations

Connection factors to keep in mind:
  • Each connection consumes memory and processing resources on frontend instances
  • Very high message rates on a single connection may encounter throughput limitations due to network constraints and protocol overhead
  • For publishing at sustained high rates, applications may need to distribute work across multiple connections or use the REST API

Message considerations

Message rate and size impact overall system performance:
  • Default message size limits (typically 64KB) protect against excessive memory pressure and network load
  • Very large messages impact processing cost and transit latency, especially in high-volume scenarios
  • Features like delta compression help manage bandwidth for large messages with minor changes

Benefits of Ably’s scalable architecture

Ably’s horizontally scalable architecture provides several key benefits that directly impact application development and user experience.

No scale ceiling

The most fundamental benefit is the removal of technical limitations on growth:

  • Unlimited channels: No limit on the number of channels your application can use
  • Unlimited connections: No limit on the number of concurrent connections
  • Unlimited throughput: No limit on the aggregate message throughput
  • Seamless growth: Applications can scale from prototype to global adoption without fundamental architecture changes

Automatic elasticity

The platform handles scaling automatically:
  • Resources are provisioned on demand as load changes
  • Scaling occurs independently across different dimensions based on actual usage patterns
  • Applications benefit from elasticity without any additional configuration or management
  • No need for capacity planning and over-provisioning

Developer focus

Engineering teams can concentrate on building features that deliver business value:
  • No need to design complex scaling architectures
  • No requirement to manage infrastructure
  • No operational overhead of monitoring and scaling systems
  • Accelerated time-to-market
  • Teams can invest their time in innovation rather than operations

Cost efficiency

Elastic scaling provides cost benefits:
  • Pay only for the resources you use
  • Automatic scaling down during periods of low demand
  • No need to over-provision for peak capacity

Next steps

  • Performance: Learn how Ably maintains low latency at scale
  • Fault tolerance: Understand how Ably maintains reliability while scaling
  • Edge network: Explore Ably’s global edge network infrastructure
  • Limits: Review rate limits and quotas
