Payment System Architecture

Overview

When you click the Buy button on Amazon or any e-commerce website, a complex payment processing system springs into action. This case study explores how money moves through a payment system, from the initial purchase to final settlement.
Payment systems handle real money and must prioritize reliability, security, and consistency above all else.

Payment Flow Architecture

Let’s trace how a payment moves through the system step by step:

1. Payment Event Generation

Step 1: User Action

When a user clicks the “Buy” button, a payment event is generated and sent to the payment service.
Step 2: Event Persistence

The payment service immediately stores the payment event in the database for audit and recovery purposes.
Why persist immediately?
  • Ensure no payment request is lost
  • Enable recovery from failures
  • Maintain audit trail
  • Support dispute resolution

2. Payment Order Processing

Step 1: Order Decomposition

A single payment event may contain multiple payment orders. For example, you might buy products from several sellers in one checkout. The payment service breaks down the event into individual payment orders, one per seller.
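The decomposition described above can be sketched roughly as follows. This is a minimal illustration, not the actual service code: `LineItem`, `PaymentOrder`, and `split_into_orders` are hypothetical names, and the sketch assumes every item in a checkout uses the same currency.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    """One purchased item (hypothetical shape)."""
    seller_id: str
    amount: Decimal


@dataclass(frozen=True)
class PaymentOrder:
    """One order per seller, linked back to the originating payment event."""
    payment_event_id: str
    seller_id: str
    amount: Decimal


def split_into_orders(payment_event_id: str, items: list[LineItem]) -> list[PaymentOrder]:
    """Group a checkout's line items by seller and emit one order per seller."""
    totals: dict[str, Decimal] = defaultdict(Decimal)  # Decimal() is zero
    for item in items:
        totals[item.seller_id] += item.amount
    return [
        PaymentOrder(payment_event_id, seller_id, total)
        for seller_id, total in totals.items()
    ]


orders = split_into_orders(
    "evt_1",
    [
        LineItem("seller_a", Decimal("20.00")),
        LineItem("seller_b", Decimal("5.50")),
        LineItem("seller_a", Decimal("4.50")),
    ],
)
```

Amounts are kept as `Decimal` rather than `float` so that sums of currency values stay exact.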
Step 2: Payment Execution

For each payment order:
  1. The payment service calls the payment executor
  2. The payment executor stores the order in the database
  3. The executor calls an external PSP (Payment Service Provider) to process the credit card payment
Payment Service Provider (PSP) Examples:
  • Stripe
  • PayPal
  • Adyen
  • Square
  • Braintree

3. Balance Updates

Step 1: Wallet Update

After successful payment execution:
  1. Payment service calls the wallet service
  2. Wallet service updates the seller’s balance
  3. Updated balance is persisted in the wallet database
Step 2: Ledger Recording

After wallet update:
  1. Payment service calls the ledger service
  2. Ledger service appends new transaction records
  3. Ledger maintains immutable financial record

4. Settlement Process

Step 1: Daily Settlement

Every night, PSPs and banks send settlement files to their clients.
Step 2: Settlement File Contents

The settlement file contains:
  • Current balance of the bank account
  • All transactions that occurred during the day
  • Transaction fees and charges
  • Any chargebacks or refunds

Key Components Explained

Payment Service

Role: Orchestrates the entire payment flow
Responsibilities:
  • Receive and validate payment events
  • Break down events into individual orders
  • Coordinate with payment executor, wallet, and ledger
  • Handle payment state management
  • Manage retries and error handling
Critical Features:
  • Idempotency: Same payment request should not be processed twice
  • State machine: Track payment through various states (pending, processing, completed, failed)
  • Error handling: Gracefully handle failures and timeouts
Payment Executor

Role: Execute payments through external PSPs
Responsibilities:
  • Store payment orders before processing
  • Integrate with multiple PSPs (Stripe, PayPal, etc.)
  • Handle PSP-specific protocols and formats
  • Manage payment method details securely
  • Retry failed payments with exponential backoff
Security Considerations:
  • Never store raw credit card numbers (use tokens)
  • PCI DSS compliance
  • Secure communication with PSPs (TLS)
  • Encrypt sensitive data at rest
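The "retry with exponential backoff" responsibility listed above can be sketched as follows. `TransientPSPError` and `call_with_backoff` are hypothetical names, the default delays are arbitrary, and the random jitter is an added assumption, not something the text prescribes.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientPSPError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient failures with capped exponential delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientPSPError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay;
            # the actual wait is a random value up to that ceiling (jitter).
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Retries are only safe when the retried request carries the same idempotency key, otherwise a retry after a timeout could charge the card twice.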
Wallet Service

Role: Manage account balances for all parties
Responsibilities:
  • Track balances for sellers and buyers
  • Process debits and credits atomically
  • Prevent negative balances
  • Support multiple currencies
  • Handle holds and reservations
Database Design:
CREATE TABLE wallets (
  wallet_id BIGINT PRIMARY KEY,
  user_id BIGINT NOT NULL,
  currency VARCHAR(3) NOT NULL,
  balance DECIMAL(19, 4) NOT NULL,
  available_balance DECIMAL(19, 4) NOT NULL,
  held_balance DECIMAL(19, 4) NOT NULL,
  version BIGINT NOT NULL,  -- For optimistic locking
  updated_at TIMESTAMP NOT NULL
);
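The `version` column supports optimistic locking: a writer updates the row only if the version it read is still current. The sketch below illustrates the idea with Python's built-in `sqlite3` and a cut-down table. Storing the balance as text is a workaround specific to this demo, and `read_wallet` and `write_balance` are hypothetical helper names.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wallets ("
    " wallet_id INTEGER PRIMARY KEY,"
    " balance TEXT NOT NULL,"      # demo only: SQLite has no exact DECIMAL type
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO wallets VALUES (1, '100.0000', 0)")
conn.commit()


def read_wallet(conn: sqlite3.Connection, wallet_id: int) -> tuple[Decimal, int]:
    """Return the current balance and the version it was read at."""
    balance, version = conn.execute(
        "SELECT balance, version FROM wallets WHERE wallet_id = ?", (wallet_id,)
    ).fetchone()
    return Decimal(balance), version


def write_balance(conn: sqlite3.Connection, wallet_id: int,
                  new_balance: Decimal, expected_version: int) -> bool:
    """Write only if nobody changed the row since expected_version was read."""
    cursor = conn.execute(
        "UPDATE wallets SET balance = ?, version = version + 1"
        " WHERE wallet_id = ? AND version = ?",
        (str(new_balance), wallet_id, expected_version),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 rows updated means the version was stale


# Two writers read the same version; only the first write is accepted.
balance_a, version_a = read_wallet(conn, 1)
balance_b, version_b = read_wallet(conn, 1)
first_ok = write_balance(conn, 1, balance_a + Decimal("10"), version_a)
second_ok = write_balance(conn, 1, balance_b + Decimal("5"), version_b)
```

The writer whose update is rejected would normally re-read the wallet and try again, so neither credit is silently lost.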
Ledger Service

Role: Maintain an immutable record of all financial transactions
Characteristics:
  • Append-only: Never update or delete entries
  • Double-entry bookkeeping: Every transaction has equal debits and credits
  • Audit trail: Complete history of all money movements
  • Reconciliation: Match internal records with bank statements
Why separate from wallet?
  • Wallet = current state (“How much money do I have now?”)
  • Ledger = complete history (“How did I get this balance?”)
  • Separation enables independent scaling and optimization
Example Ledger Entry:
{
  "transaction_id": "tx_123456",
  "timestamp": "2024-02-28T10:30:00Z",
  "entries": [
    {
      "account": "buyer_wallet_456",
      "type": "debit",
      "amount": 100.00,
      "currency": "USD"
    },
    {
      "account": "seller_wallet_789",
      "type": "credit",
      "amount": 95.00,
      "currency": "USD"
    },
    {
      "account": "platform_fee",
      "type": "credit",
      "amount": 5.00,
      "currency": "USD"
    }
  ]
}
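In the entry above, the 100.00 debit equals the sum of the 95.00 and 5.00 credits. A ledger service could enforce that double-entry invariant before appending, roughly as in this sketch. `is_balanced` is a hypothetical helper, and it assumes all entries in a transaction share one currency.

```python
from decimal import Decimal


def is_balanced(transaction: dict) -> bool:
    """Return True when total debits equal total credits."""
    totals = {"debit": Decimal("0"), "credit": Decimal("0")}
    for entry in transaction["entries"]:
        # Go through str() so float inputs such as 95.0 convert cleanly.
        totals[entry["type"]] += Decimal(str(entry["amount"]))
    return totals["debit"] == totals["credit"]


transaction = {
    "transaction_id": "tx_123456",
    "entries": [
        {"account": "buyer_wallet_456", "type": "debit", "amount": 100.00},
        {"account": "seller_wallet_789", "type": "credit", "amount": 95.00},
        {"account": "platform_fee", "type": "credit", "amount": 5.00},
    ],
}
balanced = is_balanced(transaction)
```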

10 Principles for Resilient Payment Systems

Based on Shopify’s experience processing billions in payments (see: Shopify Payment System Resilience Principles).
1. Lower Your Timeouts

Problem: Default timeout of 60 seconds is too long
Solution:
  • Open (connect) timeout: 1 second
  • Read and write timeouts: 5 seconds
Rationale: Fail fast and retry rather than blocking for a minute
2. Install Circuit Breakers

Problem: Cascading failures when downstream services fail
Solution: Use circuit breakers to stop calling failing services
Example: Shopify’s Semian library protects:
  • Net::HTTP
  • MySQL
  • Redis
  • gRPC services
States:
  • Closed: Normal operation
  • Open: Stop calling service (return cached/default response)
  • Half-open: Test if service recovered
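The three states can be illustrated with a small hand-rolled breaker. This is a simplified sketch and not Semian's API: the class, its parameters, and the thresholds are hypothetical, it is not thread-safe, and it lets every call through while half-open rather than a single trial call.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling a downstream that is marked unhealthy."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # enough time has passed to allow a trial call
        return "open"

    def call(self, operation: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            raise CircuitOpenError("downstream marked unhealthy")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would catch `CircuitOpenError` and return a cached or default response, as described for the open state.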
3. Understand Capacity

Formula (Little’s Law, rearranged):
Throughput = Queue Size (requests in the system) / Average Processing Time
Example:
  • 50 requests in queue
  • 100ms average processing time
  • Throughput = 500 requests/second
Action: Monitor queue depth and processing time to detect capacity issues early
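The arithmetic in the example can be checked with a few lines of code. `throughput_per_second` is a hypothetical helper name. It takes the processing time in milliseconds to match the units used above.

```python
def throughput_per_second(requests_in_system: float, avg_processing_ms: float) -> float:
    """Little's Law rearranged: throughput = requests in system / time in system."""
    if avg_processing_ms <= 0:
        raise ValueError("avg_processing_ms must be positive")
    # Multiply by 1000 to convert requests per millisecond to requests per second.
    return requests_in_system * 1000 / avg_processing_ms


throughput = throughput_per_second(requests_in_system=50, avg_processing_ms=100)
```

With 50 requests in the system and 100 ms per request, this gives 500 requests per second. If processing time doubles to 200 ms, the same queue depth yields only 250 requests per second.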
4. Add Monitoring and Alerting

Four Golden Signals (from Google SRE):
  1. Latency: How long requests take
  2. Traffic: How many requests you’re getting
  3. Errors: Rate of failed requests
  4. Saturation: How full your service is
Set up alerts for all four metrics.
5. Implement Structured Logging

Requirements:
  • Centralized logging system
  • Structured format (JSON)
  • Easily searchable
  • Correlated by request ID
Example:
{
  "timestamp": "2024-02-28T10:30:00Z",
  "level": "info",
  "request_id": "req_abc123",
  "user_id": "user_456",
  "payment_id": "pay_789",
  "message": "Payment processed successfully",
  "amount": 100.00,
  "currency": "USD"
}
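One way to produce log lines in this shape is a custom formatter on Python's standard `logging` module, sketched below. `JsonFormatter` is a hypothetical class name, and the list of contextual fields is assumed from the example above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    CONTEXT_FIELDS = ("request_id", "user_id", "payment_id", "amount", "currency")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


logger = logging.getLogger("payments")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "Payment processed successfully",
    extra={"request_id": "req_abc123", "payment_id": "pay_789",
           "amount": "100.00", "currency": "USD"},
)
```

Attaching `request_id` to every line is what makes it possible to correlate entries from different services for a single payment.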
6. Use Idempotency Keys

Problem: Network failures can cause duplicate payments
Solution: The client provides a unique idempotency key with each request
Implementation:
POST /payments
Idempotency-Key: ulid_01HQJK8M9N7P6R3S5T4V2W1X0Y

{
  "amount": 100.00,
  "currency": "USD",
  "source": "tok_visa_1234"
}
Why ULID over a random (version 4) UUID?
  • Lexicographically sortable
  • More compact as text (26 characters vs. 36)
  • Contains a timestamp
7. Be Consistent with Reconciliation

Purpose: Ensure internal records match bank/PSP statements
Process:
  1. Receive settlement file from PSP/bank
  2. Compare with internal ledger
  3. Identify discrepancies
  4. Investigate and resolve breaks
  5. Store results in database
Frequency: Daily (automated), plus on-demand for investigations
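Steps 2 and 3 of the process above amount to comparing two sets of transactions. In the minimal sketch below, both sides are assumed to be already parsed into mappings from transaction ID to amount. `reconcile` and the category names are hypothetical, and real settlement files vary by PSP.

```python
from decimal import Decimal


def reconcile(ledger: dict[str, Decimal],
              settlement: dict[str, Decimal]) -> dict[str, list[str]]:
    """Compare internal ledger amounts with a settlement file and list the breaks."""
    shared = ledger.keys() & settlement.keys()
    return {
        # Recorded internally but absent from the PSP/bank file.
        "missing_in_settlement": sorted(ledger.keys() - settlement.keys()),
        # Reported by the PSP/bank but unknown internally.
        "missing_in_ledger": sorted(settlement.keys() - ledger.keys()),
        # Present on both sides with different amounts.
        "amount_mismatch": sorted(tx for tx in shared if ledger[tx] != settlement[tx]),
    }


breaks = reconcile(
    ledger={"tx_1": Decimal("100.00"), "tx_2": Decimal("40.00"), "tx_3": Decimal("9.99")},
    settlement={"tx_1": Decimal("100.00"), "tx_2": Decimal("45.00"), "tx_4": Decimal("12.00")},
)
```

Each non-empty category is a break to investigate: here `tx_3` never settled, `tx_4` is unknown internally, and `tx_2` differs in amount.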
8. Incorporate Load Testing

Strategy: Regularly simulate high-volume scenarios
Shopify’s Approach:
  • Simulate flash sales (Black Friday, Cyber Monday)
  • Test with 10x normal traffic
  • Measure latency at different percentiles (p50, p95, p99)
  • Identify bottlenecks before they hit production
9. Get on Top of Incident Management

Three Key Roles:
  1. Incident Manager on Call (IMOC): Coordinates response
  2. Support Response Manager (SRM): Handles customer communication
  3. Service Owners: Fix the technical issue
Process:
  • Detect → Alert → Assemble team → Diagnose → Fix → Communicate → Document
10. Organize Incident Retrospectives

Three Questions:
  1. What exactly happened? (Timeline of events)
  2. What incorrect assumptions did we hold? (Root cause)
  3. What can we do to prevent this? (Action items)
Culture: Blameless retrospectives focused on system improvements

Design Tradeoffs

Consistency vs. Availability

Challenge: In a distributed payment system, network partitions can occur
Decision: Choose consistency over availability
Rationale:
  • Better to fail a payment than process it twice
  • Financial accuracy is non-negotiable
  • Temporary unavailability is acceptable; incorrect balances are not
Implementation:
  • Coordinate multi-service updates with distributed transactions (2PC) or sagas with compensating actions
  • Strong consistency for wallet updates
  • Eventual consistency acceptable for analytics/reporting
Synchronous vs. Asynchronous Processing

Synchronous (for critical path):
  • Payment validation
  • Wallet debits/credits
  • Ledger recording
Asynchronous (for non-critical):
  • Email notifications
  • Analytics updates
  • Fraud scoring (post-authorization)
  • Webhooks to merchants
Benefit: Keep critical path fast and reliable
Build vs. Buy

Don’t build:
  • Credit card processing (PCI DSS complexity)
  • Fraud detection (requires massive data)
  • Bank integrations (regulatory requirements)
Do build:
  • Payment orchestration layer
  • Wallet and ledger systems
  • Business logic and rules
  • Reconciliation systems
Strategy: Use PSPs (Stripe, Adyen) for payment processing, build orchestration and financial systems

Security Considerations

PCI DSS Compliance

Never store raw credit card numbers. Use tokenization provided by PSPs.

Encryption

Encrypt all sensitive data at rest and in transit (TLS 1.2+).

Fraud Detection

Integrate with fraud detection services. Monitor for suspicious patterns.

Rate Limiting

Prevent brute force attacks on payment endpoints.

Audit Logging

Log all payment actions with user, timestamp, and outcome.

Access Control

Strict RBAC for payment system access. Separate read/write permissions.

Handling Failures

Double Payment Prevention

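Problem: A network timeout occurs after the PSP charges the card but before the response is received, so a blind retry could charge the card twice
Solution:
  1. Idempotency Keys: The client includes a unique key with each request
  2. Request Deduplication: The server checks whether the key has already been processed before executing
  3. Store Results: The server caches the response and returns it for duplicate requests (for example, for 24 hours)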
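The server-side half of this scheme can be sketched with an in-memory store. This is a single-process illustration only: `IdempotencyStore` is a hypothetical class, and a real deployment would need an atomic check-and-set in shared storage (for example Redis `SET` with the `NX` and `EX` options, or a unique database constraint) so that concurrent duplicates cannot both execute.

```python
import time
from typing import Any, Callable


class IdempotencyStore:
    """Remember the result of each key and replay it for duplicates."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def execute(self, key: str, operation: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # duplicate request: replay the stored response
        result = operation()  # first time this key is seen (or it has expired)
        self._entries[key] = (now, result)
        return result


charges = []

def charge_card() -> dict:
    charges.append("charged")
    return {"status": "succeeded"}

store = IdempotencyStore()
first = store.execute("01HQJK8M9N7P6R3S5T4V2W1X0Y", charge_card)
retry = store.execute("01HQJK8M9N7P6R3S5T4V2W1X0Y", charge_card)
```

The retry returns the stored response without calling `charge_card` again, so the card is charged once.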

Wallet Update Failure

Problem: Payment succeeds at PSP but wallet update fails
Solution: Coordinate the steps as a saga, where each step has a compensating action that undoes it
Saga Pattern:
1. Reserve amount in buyer's wallet
2. Call PSP to charge card
3. If success:
   - Commit buyer wallet debit
   - Credit seller wallet
   - Record in ledger
4. If failure:
   - Release reservation in buyer wallet
   - Refund via PSP (if charged)
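The saga above can be expressed as a list of steps, each paired with a compensating action, where a failure triggers the compensations of the completed steps in reverse order. In the sketch below, `run_saga` and the step names are hypothetical, and the wallet and PSP calls are stand-ins that only record what happened.

```python
from typing import Callable

# (name, action, compensation) for each step of the saga.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        _, action, _ = step
        try:
            action()
        except Exception:
            for _, _, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


log: list[str] = []

def credit_seller() -> None:
    raise RuntimeError("wallet service unavailable")  # simulated failure

ok = run_saga([
    ("reserve", lambda: log.append("reserve"), lambda: log.append("release")),
    ("charge", lambda: log.append("charge"), lambda: log.append("refund")),
    ("credit", credit_seller, lambda: log.append("reverse credit")),
])
```

Here the charge succeeded but the seller credit failed, so the saga refunds via the PSP and then releases the buyer's reservation. Compensations can fail too, so a production saga would persist its progress and retry them.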

Key Technologies

Database

PostgreSQL with ACID transactions for wallet and ledger

Message Queue

Kafka for async processing (notifications, webhooks)

Cache

Redis for idempotency keys and rate limiting

Monitoring

Prometheus + Grafana for metrics, PagerDuty for alerts

Summary

Building a resilient payment system requires:
  1. Strong Consistency: Use ACID transactions for financial operations. Never compromise on data accuracy.
  2. Idempotency: Prevent duplicate payments with idempotency keys on all write operations.
  3. Comprehensive Logging: Maintain immutable ledger and detailed audit logs for compliance and debugging.
  4. Fault Tolerance: Implement circuit breakers, timeouts, and graceful degradation.
  5. Reconciliation: Daily reconciliation with PSPs and banks to catch discrepancies early.
  6. Security First: PCI DSS compliance, encryption, fraud detection, and strict access controls.

Payment systems are among the most critical systems in software engineering. Prioritize correctness and reliability over performance. It’s better to process payments slowly than to process them incorrectly.
