Scalability refers to a system’s ability to change in size or scale to accommodate varying workloads. In practice, this means that as demand increases — whether through more users, more messages, or more channels — the system can grow to meet that demand without degradation in performance or reliability.

Why scalability matters

For realtime platforms, scalability is not a nice-to-have feature but a fundamental requirement, as applications can experience unpredictable growth and traffic spikes.

Consequences of poor scalability

When systems cannot scale effectively, users experience:
  • Degraded service
  • Increased latency
  • Connection failures
  • Message loss during periods of high demand
A truly scalable system removes these constraints, allowing developers to build without worrying about artificial ceilings on their growth potential.

Ably’s scalability dimensions

Ably scales on several key dimensions:
  • Channels: The maximum number of channels that can be used simultaneously by a single application can be scaled horizontally with no technical limit
  • Connections: The maximum number of connections that can exist simultaneously for a single application can be scaled horizontally without limitation
  • Message throughput: The total volume of messages Ably can process for a single application at any given moment is scaled horizontally, meaning there is no limit on the maximum possible aggregate rate

Vertical vs. horizontal scalability

There are fundamentally two approaches to scaling systems.

Vertical scalability (scaling up)

Vertical scalability means tackling larger problems by using larger components:
  • Deploying server instances with more CPU cores
  • Adding more memory to existing servers
  • Using larger storage devices
Limitations of vertical scaling:
  • Physical constraints on how powerful a single machine can become
  • Costs increase non-linearly with capacity
  • Single points of failure become more critical
  • Downtime is often required for upgrades

Horizontal scalability (scaling out)

Horizontal scalability means solving larger problems by having more components instead of larger ones:
  • Adding more server instances to a distributed system
  • Distributing load across a larger number of machines
  • Partitioning data and workloads across multiple resources
Benefits of horizontal scaling:
  • Virtually unlimited scaling potential
  • Greater resilience through redundancy
  • More cost-effective scaling at large scales
  • Ability to scale both up and down with demand (elasticity)
  • Resources can be added incrementally as needed

The need for elasticity

Modern applications require elasticity — the ability to scale both up and down in response to changing demand:
  • Applications experience fluctuations in usage
  • Traffic patterns follow time zones and regional events
  • Cost optimization demands that resources match current needs
  • Successful applications can experience exponential growth that is impossible to predict accurately
Realtime system challenges:
  • Persistent connections
  • Message fanout
  • Low latency requirements
  • Global synchronization
For platforms like Ably that need to support arbitrary and elastic scalability, horizontal scaling is the only viable approach.

Challenges of horizontally scaling resources

Effective horizontal scaling involves several significant challenges that must be addressed in a distributed system design.

Resource coordination

It’s not enough just to have unlimited resources — you have to direct requests to those resources effectively:
  • Resources need coordinated access to shared dependencies
  • Requests must be distributed efficiently across available resources
  • The system must maintain consistent behavior as resources are added or removed
Without proper coordination mechanisms, a distributed system quickly becomes unpredictable and unreliable even as more capacity is added.

Stateful interactions

Most realtime systems involve stateful interactions where the replicated resources cannot operate independently:
  • Users expect consistent experiences across sessions
  • Messages need to be delivered in the correct order
  • Subscribers to the same channel need to see the same content
Maintaining this state consistently across a distributed system requires sophisticated algorithms and careful system design to avoid race conditions, conflicts, and inconsistent views of data.

State maintenance

In stateful systems like Ably, the system must maintain information across requests or messages:
  • Which clients are connected
  • Which channels are active
  • Which clients are subscribed to which channels
  • Message ordering and history
  • Presence information
Simply adding more servers isn’t enough — the system must ensure that state is maintained correctly across the distributed infrastructure.

High-scale fanout

This occurs when a single message needs to be delivered to a very large number of recipients:
  • A sporting event might have millions of viewers receiving the same score updates
  • A financial application might distribute price changes to thousands of traders
  • A chat application might deliver a message to everyone in a popular channel
These scenarios require specialized architectures to handle the efficient distribution of messages to large numbers of connections without overwhelming individual system components or creating network bottlenecks.

Consistent hashing for workload distribution

Ably uses consistent hashing as the foundation of its horizontal scaling approach. Consistent hashing solves a key problem in distributed systems: how to distribute work evenly across a changing set of resources while minimizing redistribution when resources are added or removed.

Traditional hashing limitations

In a traditional hashing approach, work might be distributed using a formula like:
server = hash(item) % number_of_servers
This works well when the number of servers is fixed, but causes major problems when servers are added or removed:
  • When the number of servers changes, the output of the calculation changes for most items
  • This results in most items being moved to new servers
  • Massive redistribution causes service disruption
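To make the breakage concrete, the sketch below (Python; the key names are illustrative, not Ably's) counts how many of 10,000 hypothetical channel keys change servers when a cluster grows from 10 to 11 servers under modulo placement:

```python
import hashlib

def server_for(item: str, num_servers: int) -> int:
    # Traditional placement: hash the item, then take it modulo the server count.
    return int(hashlib.md5(item.encode()).hexdigest(), 16) % num_servers

items = [f"channel-{i}" for i in range(10_000)]
before = {item: server_for(item, 10) for item in items}
after = {item: server_for(item, 11) for item in items}  # one server added

moved = sum(1 for item in items if before[item] != after[item])
print(f"{moved / len(items):.0%} of items reassigned")  # the vast majority move
```

An item keeps its server only when its hash gives the same remainder modulo 10 and modulo 11, which happens for roughly 1 in 11 items; about 90% of the workload relocates for a single added server.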

How consistent hashing works

Consistent hashing addresses this issue by arranging both servers and work items on a conceptual “ring”:
1. Map to ring: Both servers and work items are mapped to positions on the ring using a hash function.
2. Assign to nearest server: Each work item is assigned to the first server encountered moving clockwise around the ring from the item's position.
3. Minimal redistribution: When a server is added or removed, only the work items that fall between that server's position and the preceding server position on the ring need to be reassigned.
This dramatically reduces the amount of redistribution required when the set of resources changes.
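The ring can be sketched in a few lines of Python. This is a minimal illustration of the technique, not Ably's implementation; the class and server names are invented for the example. Note that when a server is added, every reassigned item moves onto the new server and nothing else shuffles:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Map any string to a position on the ring (here, the MD5 integer space).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # The ring is a sorted list of (position, server) pairs.
        self._ring = sorted((ring_hash(s), s) for s in servers)

    def add(self, server):
        bisect.insort(self._ring, (ring_hash(server), server))

    def server_for(self, item: str) -> str:
        # First server at or after the item's position, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (ring_hash(item),))
        return self._ring[idx % len(self._ring)][1]

items = [f"channel-{i}" for i in range(10_000)]
ring = HashRing([f"server-{i}" for i in range(10)])
before = {item: ring.server_for(item) for item in items}

ring.add("server-10")
after = {item: ring.server_for(item) for item in items}

moved = [item for item in items if before[item] != after[item]]
# Every reassigned item landed on the new server; all other items stayed put.
print(f"{len(moved) / len(items):.1%} of items moved")
```

Contrast this with the modulo approach: here only the arc of the ring claimed by the new server changes hands.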

Ably’s implementation

At Ably, consistent hashing enables efficient distribution of channels across the available processing resources:
  • Each channel is placed on a specific server instance based on its position on the hash ring
  • When the system scales — either to add capacity or to respond to failures — only a small fraction of channels need to be moved to different servers
  • This approach minimizes disruption during scaling events and ensures that the system can maintain performance even as it grows

Multiple hashes for even distribution

To address potential uneven distribution, especially when the number of servers is small, Ably assigns multiple hash positions to each possible placement location (server process):
  • Each placement location is represented by multiple points on the hash ring
  • The number of points can be adjusted based on server capacity
  • This statistically creates a more even distribution of work
Example: If each server has 100 points on the ring, and a server is added to a cluster of 10 servers:
  • Each existing server will give up approximately 1/11th of its load to the new server
  • This results in a balanced distribution
Benefits of multiple hashes:
  • Busy items are distributed more evenly across the available servers
  • The law of large numbers helps ensure that no single placement location gets an unfair share of high-traffic items
  • The system becomes more statistically predictable as scale increases
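The 1/11th figure from the example above can be checked with a small simulation. The sketch below assigns 100 points per server (often called "virtual nodes"); the point-naming scheme `server#point` is an assumption for illustration only:

```python
import bisect
import hashlib

POINTS_PER_SERVER = 100

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(servers):
    # Each server is represented by many points on the ring, so its share
    # of the hash space is an average over many small arcs.
    ring = []
    for server in servers:
        for p in range(POINTS_PER_SERVER):
            ring.append((ring_hash(f"{server}#{p}"), server))
    ring.sort()
    return ring

def server_for(ring, item):
    idx = bisect.bisect_left(ring, (ring_hash(item),))
    return ring[idx % len(ring)][1]

items = [f"channel-{i}" for i in range(20_000)]
old = build_ring([f"server-{i}" for i in range(10)])
new = build_ring([f"server-{i}" for i in range(11)])

moved = sum(1 for i in items if server_for(old, i) != server_for(new, i))
print(f"{moved / len(items):.1%} moved")  # close to 1/11, i.e. about 9%
```

With a single point per server, the fraction moved would depend on one random arc and could be far from 1/11; with 100 points it concentrates tightly around the expected value, which is the statistical predictability the text describes.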

Progressive hashing for graceful scaling

Ably extends consistent hashing with “progressive hashing” to make scaling operations more gradual and controlled.

Problem: Even with consistent hashing, adding or removing a server causes an immediate redistribution of the affected work items, which can lead to:
  • Thundering herd problems
  • Connection spikes
  • Processing delays
  • Resource pressure
Solution: Progressive hashing ensures that changes to the available resources are introduced gradually.

Scaling up with progressive hashing

When a new server joins the cluster:
1. Gradual hash announcement: The server doesn’t immediately take on its full share of work. Instead, it gradually announces additional hash positions over time.
2. Progressive load absorption: The new server might start by claiming just 10% of its eventual hash positions, then increase to 20%, 30%, and so on.
3. Warm-up period: This allows the new server to warm up and absorb load progressively.
4. Stable performance: Existing servers maintain stable performance as they gradually shed load.

Scaling down with progressive hashing

The same approach works in reverse when a server is scheduled for termination:
1. Gradual relinquishment: The server gradually relinquishes its hash positions before actual termination.
2. Progressive redistribution: Its workload is redistributed gradually to other servers.
3. Clean termination: By the time the server is actually removed, most or all of its work has already been transitioned.
This controlled shedding is particularly important for graceful scaling down or for replacing instances during maintenance.
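The ramp-up side of progressive hashing can be modelled by having a joining server announce only a fraction of its hash positions at each step. This is a sketch of the idea under assumed parameters (100 points per server, four ramp steps), not Ably's actual schedule:

```python
import bisect
import hashlib

POINTS_PER_SERVER = 100

def ring_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def points_for(server, fraction):
    # Hash positions a server has announced once it has ramped up to `fraction`.
    count = int(POINTS_PER_SERVER * fraction)
    return [(ring_hash(f"{server}#{p}"), server) for p in range(count)]

def build_ring(servers, fractions):
    ring = []
    for server in servers:
        ring.extend(points_for(server, fractions.get(server, 1.0)))
    ring.sort()
    return ring

def server_for(ring, item):
    idx = bisect.bisect_left(ring, (ring_hash(item),))
    return ring[idx % len(ring)][1]

items = [f"channel-{i}" for i in range(20_000)]
servers = [f"server-{i}" for i in range(10)] + ["server-new"]

# The joining server ramps from 10% of its positions to 100% in steps, so
# existing servers shed load gradually instead of all at once.
shares = []
for step in (0.1, 0.3, 0.6, 1.0):
    ring = build_ring(servers, {"server-new": step})
    share = sum(1 for i in items if server_for(ring, i) == "server-new") / len(items)
    shares.append(share)
    print(f"ramp {step:.0%}: new server owns {share:.1%} of items")
```

Scaling down is the same process in reverse: stepping the fraction from 1.0 back toward 0 sheds the server's positions, and its items flow back to the survivors a slice at a time.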

How Ably achieves scalability

Ably’s architecture is built from the ground up to enable horizontal scalability across all dimensions. This is achieved through several key design principles that work together to create a seamlessly scalable platform.

Multi-layered architecture

Ably uses a multi-layered architecture organized into independently scalable layers:
  • Frontend layer: Handles REST requests and realtime connections (WebSocket and Comet)
  • Core layer: Performs central message processing for channels
These layers scale independently in each region according to demand, monitored through metrics like CPU and memory utilization. This separation of concerns allows each layer to scale efficiently according to its specific workload characteristics.

Channel scalability

Channels are the core building block of Ably’s service. Ably achieves horizontal scalability for channels through consistent hashing:
  • Each compute instance within the core layer has a set of pseudo-randomly generated hashes
  • Hashing determines the location of any given channel
  • As a cluster scales, channels relocate to maintain an even load distribution
  • Any number of channels can exist as long as sufficient compute capacity is available
Whether there are many lightly-loaded channels or fewer heavily-loaded ones, scaling and placement strategies ensure capacity is added as required and load is effectively distributed.

Connection scalability

Connection processing is stateless, meaning connections can be freely routed to any frontend server without impacting functionality:
  • A load balancer distributes work and decides where to terminate each connection
  • Routing combines simple random allocation with prioritization based on instantaneous load factors
  • The system performs background shedding to force the relocation of connections for balanced load
  • As long as sufficient capacity exists and routing maintains a balanced load, the service can absorb an unlimited number of connections
This stateless approach to connection handling significantly simplifies scaling while maintaining consistent user experience.
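The text does not specify the exact balancing policy, but one common way to combine random allocation with instantaneous load factors is a "best of k random choices" rule: sample a few frontends at random and route to the least loaded. The sketch below illustrates that pattern as an assumption, not Ably's load balancer:

```python
import random

rng = random.Random(7)  # seeded so the sketch is reproducible

def pick_frontend(loads, k=2):
    # Sample k frontends at random and route to the least loaded of them.
    candidates = rng.sample(list(loads), k)
    return min(candidates, key=loads.get)

loads = {f"frontend-{i}": 0 for i in range(8)}
for _ in range(10_000):
    loads[pick_frontend(loads)] += 1

print(sorted(loads.values()))  # per-frontend counts stay near the 1,250 average
```

Because connections are stateless, any such policy is free to place (or later relocate) a connection on any frontend without breaking functionality, which is what makes this layer simple to scale.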

Handling high-scale fanout

The main challenge for connection scaling is high-scale fanout — when a large number of connections are subscribed to common channels. Ably addresses this through a tiered fanout architecture.

Two-tier fanout

When a message is published to a channel with many subscribers:
1. First tier (channel to frontends): The channel processor forwards the message to all frontend servers that have subscribers for that channel.
2. Second tier (frontend to connections): Each frontend server then delivers the message to its connected clients who are subscribed to the channel.
This two-tier fanout approach allows Ably to scale to handle millions of subscribers per channel.

Regional tier

At the regional tier, a channel also disseminates processed messages to corresponding channels in other regions where the channel is active. This ensures global distribution with optimized routing.

Subscription mapping

The channel processor maintains a map of which frontend servers have subscribers for each channel:
  • When a new subscription is created, the frontend server notifies the channel processor
  • The channel processor updates its subscription map
  • Messages are only sent to frontend servers that have active subscribers
  • This optimizes network usage and processing resources
By separating the concerns of channel processing and connection management, and by implementing efficient fanout mechanisms, Ably can scale to handle channels with millions of subscribers while maintaining low latency and high throughput.
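The interaction between the subscription map and the two fanout tiers can be sketched as follows. The class and channel names are invented for illustration; this models the mechanism described above, not Ably's code:

```python
from collections import defaultdict

delivered = []  # (frontend, connection, message) records, for illustration

class Frontend:
    """Second tier: delivers a message to its locally connected subscribers."""
    def __init__(self, name):
        self.name = name
        self.subs = defaultdict(list)  # channel -> local connection ids

    def attach(self, channel, conn_id, processor):
        if not self.subs[channel]:
            # Notify the channel processor on the first local subscriber.
            processor.subscribe(channel, self)
        self.subs[channel].append(conn_id)

    def deliver(self, channel, message):
        for conn_id in self.subs[channel]:
            delivered.append((self.name, conn_id, message))

class ChannelProcessor:
    """First tier: forwards each message only to frontends with subscribers."""
    def __init__(self):
        self.frontends = defaultdict(set)  # channel -> frontend servers

    def subscribe(self, channel, frontend):
        self.frontends[channel].add(frontend)

    def publish(self, channel, message):
        for frontend in self.frontends[channel]:
            frontend.deliver(channel, message)

processor = ChannelProcessor()
fe1, fe2 = Frontend("fe-1"), Frontend("fe-2")
fe1.attach("scores", "conn-a", processor)
fe1.attach("scores", "conn-b", processor)
fe2.attach("scores", "conn-c", processor)
fe2.attach("news", "conn-d", processor)

processor.publish("scores", "3-1")
print(delivered)  # three deliveries; the "news" subscriber receives nothing
```

Note that the publish sends one message per interested frontend, not one per connection: the per-connection fanout work is spread across the frontend fleet rather than concentrated on the channel processor.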

Message throughput scalability

Ably achieves message throughput scalability through multiple complementary approaches:
  • Distributed processing: Messages are processed by the core instance responsible for the channel, distributing the load across the cluster
  • Efficient routing: The system routes messages directly to interested parties without unnecessary network hops
  • Optimized protocols: Binary protocols and efficient message encoding minimize overhead
  • Connection optimizations: Features like delta compression reduce bandwidth requirements for large messages

Monitoring and auto-scaling

Maintaining effective horizontal scalability requires continuous monitoring and automated scaling.

Monitoring metrics

Ably’s platform monitors various metrics:
  • CPU and memory utilization
  • Message rates
  • Channel and connection counts
  • Resource headroom

Automated scaling triggers

When monitoring determines that the load is approaching the current capacity:
  • Automatic scaling triggers to add more resources
  • For stateful roles, progressive hashing introduces new capacity gradually
  • This minimizes disruption to the existing workload

Scaling down

When the system detects excess capacity:
  • It can scale down by gradually removing resources
  • This optimizes cost efficiency without impacting performance
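A threshold-based scaling decision of the kind described in the last two sections can be sketched as follows. The thresholds and step sizes here are illustrative assumptions, not Ably's actual policy:

```python
def desired_instances(current: int, cpu_utilization: float,
                      scale_up_at: float = 0.7, scale_down_at: float = 0.3) -> int:
    """Decide the target instance count from current load (illustrative thresholds).

    Scale out when load approaches capacity; scale in one step at a time
    when there is sustained excess headroom, never dropping below one instance.
    """
    if cpu_utilization > scale_up_at:
        return current + max(1, current // 4)   # add roughly 25% more capacity
    if cpu_utilization < scale_down_at and current > 1:
        return current - 1                      # shed capacity gradually
    return current

print(desired_instances(8, 0.85))  # 10 (scale out)
print(desired_instances(8, 0.50))  # 8  (hold steady)
print(desired_instances(8, 0.10))  # 7  (scale in)
```

For stateful roles, the output of a decision like this would then be applied via progressive hashing, so the new or departing capacity is blended in or out gradually rather than all at once.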

Regional variations

The auto-scaling systems also account for regional variations in load:
  • Different regions may experience peak loads at different times due to time zone differences and regional events
  • By scaling each region independently based on its current load, Ably ensures efficient resource utilization across the global platform

Load testing

Regular load testing helps validate the system’s scalability properties:
  • Ensures that the distribution mechanisms work as expected at scale
  • Identifies potential bottlenecks before they affect real traffic
  • Tests how well the system redistributes work after failures
  • Measures how quickly the system can scale up and down

Practical limits and considerations

While Ably’s architecture is designed for horizontal scalability, practical considerations do exist that developers should understand when architecting applications on the platform.

Channel considerations

When working with channels, several factors should be considered:
  • While there’s no hard limit on the number of channels, each active channel consumes memory and CPU resources
  • Very high message rates on a single channel may encounter throughput limitations, as all processing for one channel occurs on a single core instance
  • Applications should distribute high-volume message traffic across multiple channels when possible
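One simple way to apply the last point is to shard a high-volume stream across several channels by a stable key, so related messages keep their ordering while the aggregate load lands on many core instances. The naming convention below is a hypothetical application-level pattern, not an Ably feature:

```python
import hashlib

def shard_channel(base: str, key: str, shards: int = 16) -> str:
    """Map a message key to one of `shards` channels derived from `base`.

    Messages with the same key always land on the same channel, preserving
    per-key ordering, while different keys spread across the shard channels.
    """
    shard = int(hashlib.md5(key.encode()).hexdigest(), 16) % shards
    return f"{base}:{shard}"

# The same key is always routed to the same shard channel.
name = shard_channel("ticks", "AAPL")
print(name)
```

Subscribers that need the full stream attach to all shard channels; subscribers that only care about particular keys attach to just the relevant shards.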

Connection considerations

Connection factors to keep in mind:
  • Each connection consumes memory and processing resources on frontend instances
  • Very high message rates on a single connection may encounter throughput limitations due to network constraints and protocol overhead
  • For publishing at sustained high rates, applications may need to distribute work across multiple connections or use the REST API

Message considerations

Message rate and size impact overall system performance:
  • Default message size limits (typically 64KB) protect against excessive memory pressure and network load
  • Very large messages impact processing cost and transit latency, especially in high-volume scenarios
  • Features like delta compression help manage bandwidth for large messages with minor changes

Benefits of Ably’s scalable architecture

Ably’s horizontally scalable architecture provides several key benefits that directly impact application development and user experience.

No scale ceiling

The most fundamental benefit is the removal of technical limitations on growth:

  • Unlimited channels: No limit on the number of channels your application can use
  • Unlimited connections: No limit on the number of concurrent connections
  • Unlimited throughput: No limit on the aggregate message throughput
  • Seamless growth: Applications can scale from prototype to global adoption without fundamental architecture changes

Automatic elasticity

The platform handles scaling automatically:
  • Resources are provisioned on demand as load changes
  • Scaling occurs independently across different dimensions based on actual usage patterns
  • Applications benefit from elasticity without any additional configuration or management
  • No need for capacity planning and over-provisioning

Developer focus

Engineering teams can concentrate on building features that deliver business value:
  • No need to design complex scaling architectures
  • No requirement to manage infrastructure
  • No operational overhead of monitoring and scaling systems
  • Accelerated time-to-market
  • Teams can invest their time in innovation rather than operations

Cost efficiency

Elastic scaling provides cost benefits:
  • Pay only for the resources you use
  • Automatic scaling down during periods of low demand
  • No need to over-provision for peak capacity

Next steps

  • Performance: Learn how Ably maintains low latency at scale
  • Fault tolerance: Understand how Ably maintains reliability while scaling
  • Edge network: Explore Ably’s global edge network infrastructure
  • Limits: Review rate limits and quotas
