Overview
Lichess handles real-time communication through a separate WebSocket server called lila-ws, which communicates with the main Scala application (lila) via Redis pub/sub. This architecture enables horizontal scaling and handles millions of concurrent WebSocket connections.
Architecture Diagram
Clients ⇄ lila-ws ⇄ Redis pub/sub ⇄ lila
Why Separate WebSocket Server?
Benefits
- Horizontal scaling: Run multiple lila-ws instances behind a load balancer
- Isolation: WebSocket connections don’t consume resources from the main app
- Resilience: Main app restarts don’t drop WebSocket connections
- Performance: Optimized for connection handling rather than business logic
- Language flexibility: Could be rewritten in a different language if needed
Trade-offs
- Complexity: Additional service to deploy and monitor
- Latency: Extra hop through Redis adds ~1-5ms
- Consistency: Must handle Redis connection failures gracefully
lila-ws Server
Technology Stack
- Language: Scala 3
- HTTP: Netty-based HTTP server
- Concurrency: Akka actors for connection management
- Redis: Lettuce client for pub/sub
Core Responsibilities
Connection Management
Maintain WebSocket connections for:
- Game rooms: Players and spectators watching a game
  - /watch/{gameId} - Spectate game
  - /play/{gameId} - Play in game
- /tournament/{tournamentId}
- /study/{studyId}
- /user/{userId}
- /lobby
- /site
Message Routing
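lila-ws routes messages between clients and lila in both directions:
- Client → lila: lila-ws receives the message over the WebSocket and publishes it to Redis for lila to consume.
- lila → Client: lila publishes the message to Redis, and lila-ws forwards it to the relevant connected clients.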
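The two routing directions can be sketched with an in-memory stand-in for Redis pub/sub. Everything in this sketch is hypothetical: the `PubSub` and `Gateway` classes, the channel names `example-in` and `example-out`, and the envelope fields `from`, `to`, and `msg` are illustrative assumptions, not lila-ws's actual code or wire format.

```typescript
// In-memory stand-in for Redis pub/sub (assumption: real code would use a Redis client).
type Handler = (payload: string) => void;

class PubSub {
  private subscribers = new Map<string, Handler[]>();

  subscribe(channel: string, handler: Handler): void {
    const list = this.subscribers.get(channel) ?? [];
    list.push(handler);
    this.subscribers.set(channel, list);
  }

  publish(channel: string, payload: string): void {
    for (const handler of this.subscribers.get(channel) ?? []) handler(payload);
  }
}

// Hypothetical channel names, chosen only for this sketch.
const IN_CHANNEL = "example-in"; // clients -> lila
const OUT_CHANNEL = "example-out"; // lila -> clients

// Minimal gateway: forwards client messages to the app and app messages to clients.
class Gateway {
  private clients = new Map<string, (msg: string) => void>();
  private readonly bus: PubSub;

  constructor(bus: PubSub) {
    this.bus = bus;
    // lila -> Client: deliver each outbound message to the addressed connection.
    bus.subscribe(OUT_CHANNEL, (payload) => {
      const { to, msg } = JSON.parse(payload) as { to: string; msg: string };
      this.clients.get(to)?.(msg);
    });
  }

  // Registers a connection and the function used to deliver messages to it.
  connect(userId: string, deliver: (msg: string) => void): void {
    this.clients.set(userId, deliver);
  }

  // Client -> lila: tag the message with its sender and publish it.
  receive(userId: string, msg: string): void {
    this.bus.publish(IN_CHANNEL, JSON.stringify({ from: userId, msg }));
  }
}
```

In this sketch the gateway holds no business logic: it only tags inbound messages with their sender and looks up the target connection for outbound ones, which is the kind of separation the architecture above aims for.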
Presence Tracking
lila-ws tracks online users.
Used for:
- “Online” indicators on profiles
- Friend online notifications
- Activity statistics
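One way to picture presence tracking is a per-user connection counter, since a user may hold several connections at once (for example, multiple tabs). The `PresenceTracker` class below is a hypothetical sketch under that assumption and does not reflect how lila-ws actually stores presence.

```typescript
// Tracks online users by counting open connections per user.
// Assumption: a user counts as online while at least one connection is open.
class PresenceTracker {
  private connections = new Map<string, number>();

  // Returns true when this connection brings the user online.
  connect(userId: string): boolean {
    const count = (this.connections.get(userId) ?? 0) + 1;
    this.connections.set(userId, count);
    return count === 1;
  }

  // Returns true when this disconnect takes the user offline.
  disconnect(userId: string): boolean {
    const count = this.connections.get(userId) ?? 0;
    if (count <= 1) {
      // Last connection closed (or unknown user): drop the entry.
      return this.connections.delete(userId);
    }
    this.connections.set(userId, count - 1);
    return false;
  }

  isOnline(userId: string): boolean {
    return this.connections.has(userId);
  }

  // Number of distinct users currently online.
  onlineCount(): number {
    return this.connections.size;
  }
}
```

The boolean return values mark the online/offline transitions, which is where a notification to friends or a profile indicator update would be triggered.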
Connection Lifecycle
Redis Communication
Redis Channels
Lichess uses Redis pub/sub with predictable channel patterns, such as a site input channel (clients → lila).
Message Protocol
Messages use JSON with a type discriminator.
Example: Game Move
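As a sketch of what a type-discriminated JSON message might look like for a game move, the envelope can be modeled as a TypeScript discriminated union. The field names `t` (type) and `d` (data), the `uci` payload field, and the helper functions are assumptions made for illustration; the original text does not specify the exact wire format.

```typescript
// Hypothetical envelope: `t` is the type discriminator, `d` the payload.
type ClientMessage =
  | { t: "move"; d: { uci: string } }
  | { t: "resign" }
  | { t: "draw" };

// Parses raw JSON text and returns a typed message, or null if the text is
// malformed, has an unknown type, or is missing required payload fields.
function parseClientMessage(raw: string): ClientMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const obj = value as { t?: unknown; d?: unknown };
  switch (obj.t) {
    case "move": {
      const d = obj.d as { uci?: unknown } | null | undefined;
      if (!d || typeof d.uci !== "string") return null;
      return { t: "move", d: { uci: d.uci } };
    }
    case "resign":
      return { t: "resign" };
    case "draw":
      return { t: "draw" };
    default:
      return null;
  }
}

// Describes a message; the switch narrows the union on `t`.
function describeMessage(msg: ClientMessage): string {
  switch (msg.t) {
    case "move":
      return `move ${msg.d.uci}`;
    case "resign":
      return "resign";
    case "draw":
      return "draw";
  }
}
```

Validating at the boundary means the rest of the code can rely on the compiler to check that every message type is handled.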
1. Client sends the move.
lila Integration
Socket Module
The main lila application communicates with lila-ws through the socket module.
RemoteSocket API
Round Module Integration
The round module uses sockets for live game updates.
WebSocket Message Types
Game Room Messages
Client → Server:
- move - Player makes a move
- rematch - Player offers rematch
- takeback - Request takeback
- draw - Offer/accept draw
- resign - Resign game
- abort - Abort game
- moretime - Give opponent more time
- palantir - Premium user sees opponent’s time usage
Server → Client:
- move - Move made (includes fen, clock, etc.)
- end - Game ended (winner, reason)
- crowd - Spectator count changed
- clock - Clock sync
- gone - Player disconnected
- goneIn - Player will be considered gone soon
- premove - Premove acknowledged
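A client could handle the server-sent game room messages with a small dispatcher keyed by message type. This is a hypothetical sketch: the tags come from the list above, but the `t`/`d` envelope, the `Dispatcher` class, and any payload fields are assumptions rather than the actual Lichess client code.

```typescript
// Message tags taken from the game room list; payload shapes are assumptions.
type ServerTag = "move" | "end" | "crowd" | "clock" | "gone" | "goneIn" | "premove";

type ServerMessage = { t: ServerTag; d?: unknown };

type ServerHandler = (data: unknown) => void;

// Routes each incoming message to the handler registered for its tag.
class Dispatcher {
  private handlers = new Map<ServerTag, ServerHandler>();

  // Registers (or replaces) the handler for a tag; returns this for chaining.
  on(tag: ServerTag, handler: ServerHandler): this {
    this.handlers.set(tag, handler);
    return this;
  }

  // Returns true if a handler was found and invoked.
  dispatch(msg: ServerMessage): boolean {
    const handler = this.handlers.get(msg.t);
    if (!handler) return false;
    handler(msg.d);
    return true;
  }
}
```

Returning `false` for unhandled tags lets the caller decide whether to log or ignore messages it does not understand.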
Tournament Room Messages
Server → Client:
- reload - Tournament data changed (refetch)
- redirect - Player should join game
- standing - Leaderboard update
Lobby Messages
Client → Server:
- join - Join lobby pool
- cancel - Leave lobby pool
Server → Client:
- redirect - Game found, join it
- pools - Pool counts updated
Scalability
Horizontal Scaling
Multiple lila-ws instances can run simultaneously.
Connection Limits
Each lila-ws instance handles:
- 100,000+ concurrent WebSocket connections
- 10,000+ messages per second
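The per-instance figures can be turned into a rough capacity estimate. The function below is only a planning sketch: it treats the quoted figures as hard limits, and the `headroom` parameter is an assumption added for illustration.

```typescript
// Per-instance figures quoted in the text, used here as planning limits.
const MAX_CONNECTIONS_PER_INSTANCE = 100000;
const MAX_MESSAGES_PER_INSTANCE_PER_SEC = 10000;

// Returns how many instances are needed to satisfy both limits.
// `headroom` is the fraction of capacity kept spare (0.5 = use only half).
function instancesNeeded(connections: number, messagesPerSec: number, headroom = 0): number {
  const usable = 1 - headroom;
  const byConnections = connections / (MAX_CONNECTIONS_PER_INSTANCE * usable);
  const byMessages = messagesPerSec / (MAX_MESSAGES_PER_INSTANCE_PER_SEC * usable);
  // The tighter of the two constraints decides; always run at least one instance.
  return Math.max(1, Math.ceil(Math.max(byConnections, byMessages)));
}
```

For example, 250,000 connections at 20,000 messages per second is bounded by connections (2.5 instances, rounded up to 3), while 50,000 connections at 45,000 messages per second is bounded by message rate (5 instances).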
Redis Performance
Redis pub/sub can handle:
- 100,000+ messages/second
- Sub-millisecond latency
High-availability setup:
- 3+ Redis nodes with replication
- Automatic failover
- Sentinel for monitoring
Client-Side WebSocket
TypeScript Socket Client
The frontend uses a custom WebSocket wrapper.
Socket Library
Reconnection Strategy
- Automatic reconnect: Exponential backoff (2s, 4s, 8s, …)
- State recovery: Server sends full state on reconnect
- Buffering: Queue messages while disconnected
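The backoff and buffering parts of this strategy can be sketched without a real socket. The names `backoffDelayMs` and `BufferedSender` are hypothetical, and the 60-second cap is an assumption, since the text only lists the first few delays.

```typescript
// Delay before the given reconnect attempt (1-based): 2s, 4s, 8s, ... capped at maxMs.
// The cap is an assumption; the text only lists the first delays.
function backoffDelayMs(attempt: number, baseMs = 2000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// Queues outgoing messages while disconnected and flushes them in order on reconnect.
class BufferedSender {
  private queue: string[] = [];
  private connected = false;
  private readonly transmit: (msg: string) => void;

  // `transmit` is whatever actually writes to the socket (injected for testability).
  constructor(transmit: (msg: string) => void) {
    this.transmit = transmit;
  }

  send(msg: string): void {
    if (this.connected) this.transmit(msg);
    else this.queue.push(msg);
  }

  // Call when the socket opens: mark connected and drain the queue.
  onOpen(): void {
    this.connected = true;
    // splice(0) empties the queue and yields the messages in their original order.
    for (const msg of this.queue.splice(0)) this.transmit(msg);
  }

  // Call when the socket closes: later sends are queued again.
  onClose(): void {
    this.connected = false;
  }
}
```

In a real client, `onClose` would also schedule a reconnect after `backoffDelayMs(attempt)` and reset the attempt counter once the connection opens again.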
Monitoring
Metrics
Key metrics tracked:
- Connection count: Active WebSocket connections per node
- Message rate: Messages/second in/out
- Redis lag: Time between publish and receive
- Reconnection rate: Clients reconnecting (indicates issues)
- Error rate: Failed message parsing, Redis timeouts
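Of these metrics, Redis lag is the least obvious to compute. A minimal sketch, assuming the publisher stamps each message with its send time in milliseconds, keeps a sliding window of lag samples. The `LagTracker` class is hypothetical and not taken from the lila-ws codebase.

```typescript
// Tracks Redis lag as the time between publish and receive.
// Assumption: the publisher stamps each message with its send time in milliseconds.
class LagTracker {
  private samples: number[] = [];
  private readonly maxSamples: number;

  constructor(maxSamples = 100) {
    this.maxSamples = maxSamples;
  }

  // Records one sample; clock skew can make the difference negative, so clamp at 0.
  record(publishedAtMs: number, receivedAtMs: number): void {
    this.samples.push(Math.max(0, receivedAtMs - publishedAtMs));
    // Keep only the most recent maxSamples entries.
    if (this.samples.length > this.maxSamples) this.samples.shift();
  }

  // Mean lag over the retained window, or 0 when no samples exist.
  meanMs(): number {
    if (this.samples.length === 0) return 0;
    return this.samples.reduce((sum, s) => sum + s, 0) / this.samples.length;
  }

  // Largest lag in the retained window, or 0 when no samples exist.
  maxMs(): number {
    return this.samples.reduce((max, s) => Math.max(max, s), 0);
  }
}
```

Because publisher and subscriber run on different machines, this measurement is only as accurate as their clock synchronization.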
Health Checks
See Also
- Backend Architecture - lila application structure
- Frontend Architecture - Client-side socket usage
- Deployment - Production WebSocket infrastructure

