Global infrastructure

The Ably platform runs on fault-tolerant, highly available, elastic global infrastructure that scales with demand.

Infrastructure overview

Ably’s platform is built primarily on Amazon’s cloud infrastructure. The platform is distributed across more than 15 physical datacenters within the AWS network, with 700+ edge locations globally through AWS CloudFront, ensuring there isn’t a single point of failure or congestion across the service.

Key infrastructure characteristics

Multi-region deployment

Servers distributed across 15+ physical datacenters within the AWS network

Global edge network

700+ edge locations globally through AWS CloudFront

Physical isolation

Each datacenter is physically isolated from others to prevent cascading failures

Independent scaling

Each datacenter scales independently to meet regional load

Datacenter distribution

Ably operates in multiple regions around the world, with each datacenter operating independently. This global distribution provides several benefits:

Geographic proximity

Clients are automatically connected to the nearest datacenter to reduce latency. This ensures optimal performance regardless of user location.

Fault isolation

Each datacenter is physically isolated from the others, ensuring that a failure in one datacenter has no effect on any other datacenter. This isolation is critical for maintaining service availability during regional outages.

Data residency

The global distribution enables Ably to comply with data residency requirements by keeping data within specific geographic regions when required. This is important for customers operating in regulated industries or jurisdictions with strict data sovereignty laws.

Intelligent routing

Ably is designed to route messages using the fewest network hops, minimizing latency and maximizing performance for clients regardless of their location.

DNS-based routing

Ably uses DNS-based latency routing to direct clients to the nearest available datacenter. When a client performs a DNS lookup, the DNS service resolves to the datacenter closest to the client’s location.

Primary endpoint: main.realtime.ably.net

Ably’s DNS configuration uses a TTL of 60 seconds, allowing for relatively quick rerouting of traffic if a datacenter becomes unhealthy.
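To illustrate why the 60-second TTL bounds how long a client can keep using a stale DNS answer, here is a minimal sketch of a TTL-respecting resolver cache. The class name TtlDnsCache, the injected resolve and clock callables, and the addresses used in the usage example are all assumptions made for illustration; this is not how Ably's SDKs or any particular resolver are implemented.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Caches one resolved address per hostname for a fixed TTL (hypothetical helper)."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float = 60.0,  # matches the 60-second TTL described above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        # hostname -> (address, expiry time on the injected clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, hostname: str) -> str:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now < cached[1]:
            # Still within the TTL: reuse the cached answer, even if DNS has changed.
            return cached[0]
        # TTL expired (or first lookup): ask the resolver again and re-cache.
        address = self._resolve(hostname)
        self._entries[hostname] = (address, now + self._ttl)
        return address
```

With a cache like this, a client that resolved the primary endpoint just before a datacenter became unhealthy would pick up the new DNS answer on its first lookup after the TTL elapses, assuming the DNS service has already stopped returning the unhealthy datacenter by then.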

Fallback mechanisms

To address DNS limitations in failure scenarios, Ably implements a fallback mechanism in all client libraries:

If a client cannot connect to the primary endpoint, it automatically attempts to connect using alternative endpoints
Fallback endpoints include direct connections to specific datacenters
A completely segregated secondary domain (ably-realtime.com) uses a different DNS provider

When using fallbacks, clients may connect to a datacenter that is not the closest to them, potentially increasing latency by up to 150ms. However, Ably prioritizes service availability over optimal latency in failure scenarios.
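The fallback behaviour described above can be sketched as a loop that tries the primary endpoint first and then each alternative in turn. This is a simplified illustration, not the logic of any Ably client library: connect_with_fallback is a hypothetical helper, the connect callable stands in for whatever transport a real client uses, and any fallback hostnames passed to it are placeholders that the caller supplies.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def connect_with_fallback(
    hosts: Iterable[str],
    connect: Callable[[str], T],
) -> Tuple[str, T]:
    """Try each host in order and return the first (host, connection) that succeeds.

    `connect` is an assumed callable that raises OSError on failure.
    """
    last_error: Optional[Exception] = None
    for host in hosts:
        try:
            return host, connect(host)
        except OSError as error:
            # Remember the failure and move on to the next endpoint.
            last_error = error
    # Every endpoint failed (or the list was empty).
    raise ConnectionError("all endpoints failed") from last_error
```

A caller would pass the primary endpoint (main.realtime.ably.net) first, followed by the fallback endpoints. The host that finally accepts the connection may not be the closest one, which is where the additional latency mentioned above comes from.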

CloudFront and load balancing

AWS CloudFront

Client connections to Ably are handled through AWS CloudFront for global edge distribution. When a client attempts to connect to Ably, the request is first routed to the nearest of CloudFront’s 700+ edge locations. This reduces public internet transit time, because clients connect to a nearby edge node rather than traversing the entire distance to an Ably datacenter.

Network Load Balancers

Behind CloudFront, each Ably region employs AWS Network Load Balancers (NLBs) to distribute traffic to the application servers. NLBs:

Operate at the transport layer
Handle millions of requests per second
Maintain ultra-low latencies
Distribute traffic to frontend servers for establishing and maintaining client connections

Auto-healing and auto-scaling

Dynamic load assignment

Load is dynamically assigned and reassigned across servers in realtime. The service auto-heals and routes around network failures automatically.

Independent regional scaling

Each datacenter scales independently to meet the load within that region. Ably continuously monitors CPU, memory, and other key metrics, triggering autoscaling based on aggregated performance indicators.
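As a rough illustration of triggering autoscaling from aggregated indicators, the sketch below averages CPU and memory utilisation across the servers in one datacenter and compares the higher of the two averages against thresholds. The ServerMetrics type, the scaling_decision function, and the 0.75 and 0.30 thresholds are all assumptions chosen for the example; the source does not state which metrics are aggregated, how, or at what thresholds.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class ServerMetrics:
    cpu: float     # utilisation as a fraction, 0.0 to 1.0
    memory: float  # utilisation as a fraction, 0.0 to 1.0


def scaling_decision(
    samples: Sequence[ServerMetrics],
    scale_out_above: float = 0.75,  # hypothetical threshold
    scale_in_below: float = 0.30,   # hypothetical threshold
) -> str:
    """Return "scale_out", "scale_in", or "hold" for one datacenter."""
    if not samples:
        return "hold"
    # Aggregate per metric across servers, then act on the most constrained one.
    pressure = max(
        mean(s.cpu for s in samples),
        mean(s.memory for s in samples),
    )
    if pressure > scale_out_above:
        return "scale_out"
    if pressure < scale_in_below:
        return "scale_in"
    return "hold"
```

Because the function only sees the samples from a single datacenter, each datacenter's decision is independent of load elsewhere, mirroring the independent regional scaling described above.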

Capacity management

All Ably infrastructure scales on demand to handle ambient traffic levels. The infrastructure is typically provisioned with significant headroom above current demand, ensuring that sudden increases in traffic can be accommodated without impacting service quality.

Infrastructure redundancy

Beyond DNS and client-side fallbacks, Ably’s infrastructure includes multiple layers of redundancy:

Datacenter redundancy

Each datacenter contains redundant servers, network paths, and storage systems to eliminate single points of failure.

Multi-region redundancy

The failure of an entire datacenter does not impact the availability of the service as a whole. Clients can continue to connect via other datacenters.

Edge redundancy

CloudFront is designed to be highly available, with redundant capacity across multiple edge locations.

Message persistence

Messages are persisted in multiple locations:

Every message is stored in RAM in two or more physically isolated datacenters within the receiving region
Every message is additionally stored in RAM in at least one other region
For persisted messages, storage across three regions is required before the message is deemed successfully stored
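The replication rules above can be expressed as two simple predicates. This is a sketch of the stated conditions only, assuming a hypothetical Replica record and placeholder region and datacenter names; it says nothing about how Ably actually tracks replicas or acknowledgements.

```python
from dataclasses import dataclass
from typing import Collection, Iterable


@dataclass(frozen=True)
class Replica:
    region: str      # placeholder region name
    datacenter: str  # placeholder datacenter name


def meets_in_memory_rule(replicas: Collection[Replica], receiving_region: str) -> bool:
    """True if the message is held in RAM in at least two datacenters in the
    receiving region and in at least one other region."""
    local_datacenters = {r.datacenter for r in replicas if r.region == receiving_region}
    other_regions = {r.region for r in replicas if r.region != receiving_region}
    return len(local_datacenters) >= 2 and len(other_regions) >= 1


def meets_persisted_rule(acked_regions: Iterable[str], required_regions: int = 3) -> bool:
    """True once distinct regions confirming storage reach the required count."""
    return len(set(acked_regions)) >= required_regions
```

Counting distinct datacenters and distinct regions, rather than raw replica counts, is what makes the checks reflect physical isolation: two copies in the same datacenter do not satisfy the first rule.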

Network information

For detailed information about Ably’s global infrastructure:

Network map: View Ably’s datacenters and global points of presence
Status monitoring: Check the status of datacenters by region
Latency metrics: See global round-trip latency statistics measured externally by Uptrends

Next steps

Edge network

Learn about Ably’s edge network architecture and DDoS protection

Fault tolerance

Understand how Ably maintains high availability

Performance

Explore Ably’s performance characteristics

Scalability

Discover how Ably achieves unlimited scalability
