YugabyteDB delivers powerful distributed database capabilities that combine the best of traditional RDBMS with modern NoSQL systems. This page explores the core features that make YugabyteDB an ideal choice for cloud-native applications.

Distributed SQL with PostgreSQL compatibility

YugabyteDB SQL (YSQL) provides a fully relational SQL API that is wire-compatible with PostgreSQL.

PostgreSQL query layer

YSQL reuses the PostgreSQL query layer, providing:
  • Full SQL support - DDL, DML, joins, subqueries, window functions
  • Data types - All PostgreSQL data types including JSON, arrays, and custom types
  • Constraints - Primary keys, foreign keys, unique, check, and not null constraints
  • Indexes - B-tree, hash, GIN, GiST, and covering indexes
  • Advanced features - Stored procedures, triggers, views, and partitions
CREATE TABLE orders (
  order_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id INT NOT NULL,
  order_date TIMESTAMP DEFAULT NOW(),
  total_amount DECIMAL(10,2),
  status VARCHAR(20) CHECK (status IN ('pending', 'shipped', 'delivered')),
  CONSTRAINT fk_customer FOREIGN KEY (customer_id) 
    REFERENCES customers(customer_id)
);
YugabyteDB currently supports PostgreSQL 15 features. The database is designed to support newer PostgreSQL versions over time.

Supported PostgreSQL features

YugabyteDB supports the following PostgreSQL capabilities:

Data types

  • Numeric, text, date/time
  • JSON/JSONB
  • Arrays and ranges
  • UUID, INET, CIDR
  • Custom types

Indexes

  • Primary key indexes
  • Secondary indexes
  • Partial indexes
  • Expression indexes
  • Covering indexes

Constraints

  • Primary keys
  • Foreign keys
  • Unique constraints
  • Check constraints
  • Not null constraints

Advanced SQL

  • CTEs and recursive queries
  • Window functions
  • Stored procedures
  • Triggers
  • Partitioned tables

Distributed ACID transactions

YugabyteDB provides strong consistency and ACID guarantees across distributed nodes.

Transaction isolation levels

YugabyteDB supports three isolation levels: Serializable, Snapshot (mapped from PostgreSQL's REPEATABLE READ), and Read Committed.
Serializable, the strongest level, prevents all concurrency anomalies:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- All reads see a consistent snapshot
  -- Writes are serialized globally
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
Use when: Absolute correctness is required (financial transactions, inventory management)

Transaction implementation

YugabyteDB implements distributed transactions based on Google Spanner’s architecture:
  • Hybrid logical clocks - Provide global ordering without atomic clocks
  • Raft consensus - Ensures strong consistency for writes
  • Snapshot isolation by default - Balances performance and consistency
  • Automatic conflict resolution - Handles concurrent updates transparently
BEGIN;
  -- Debit from one account
  UPDATE accounts 
  SET balance = balance - 500 
  WHERE account_id = 'acc_001';
  
  -- Credit to another account  
  UPDATE accounts 
  SET balance = balance + 500 
  WHERE account_id = 'acc_002';
  
  -- Both updates succeed or both fail
COMMIT;
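The hybrid logical clock mentioned above can be sketched as a (physical, logical) timestamp pair. This is a simplified model, not YugabyteDB's internal implementation; the class and method names are illustrative.

```python
# Minimal sketch of a hybrid logical clock (HLC), assuming a simplified
# model: a timestamp is a (physical, logical) pair ordered lexicographically.
# Class and method names are illustrative, not YugabyteDB internals.

class HLC:
    def __init__(self):
        self.physical = 0  # highest physical time seen so far
        self.logical = 0   # tie-breaking counter within one physical tick

    def tick(self, wall_clock):
        """Advance the clock for a local event or message send."""
        if wall_clock > self.physical:
            self.physical, self.logical = wall_clock, 0
        else:
            self.logical += 1
        return (self.physical, self.logical)

    def update(self, wall_clock, remote):
        """Merge a timestamp received from another node, preserving causality."""
        rp, rl = remote
        if wall_clock > self.physical and wall_clock > rp:
            self.physical, self.logical = wall_clock, 0
        elif rp > self.physical:
            self.physical, self.logical = rp, rl + 1
        elif rp == self.physical:
            self.logical = max(self.logical, rl) + 1
        else:
            self.logical += 1
        return (self.physical, self.logical)

a, b = HLC(), HLC()
t1 = a.tick(wall_clock=100)              # write on node A
t2 = b.update(wall_clock=95, remote=t1)  # node B's clock lags behind A's
# t2 orders after t1 even though B's wall clock is behind: causality wins.
```

This is why no atomic clocks are needed: the logical component breaks ties whenever physical clocks disagree, so causally related events are still globally ordered.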

Automatic data sharding

YugabyteDB automatically distributes data across nodes for horizontal scalability.

Sharding strategies

Hash sharding is the default method and distributes data evenly:
-- Hash-sharded table (default)
CREATE TABLE users (
  user_id SERIAL PRIMARY KEY,
  username VARCHAR(50),
  email VARCHAR(100)
);
How it works:
  • YugabyteDB hashes the primary key
  • Assigns rows to tablets based on hash value
  • Ensures even distribution across nodes
Best for:
  • Uniform data access patterns
  • Write-heavy workloads
  • No natural range queries
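The scheme above can be sketched in a few lines. YugabyteDB maps each key into a 16-bit hash space (0-65535) and assigns contiguous ranges of it to tablets; the hash function and tablet count below are illustrative choices, not the actual internals.

```python
# Rough sketch of hash sharding. YugabyteDB maps each key into a 16-bit
# hash space (0-65535) and assigns contiguous ranges of it to tablets;
# the hash function and tablet count here are illustrative.
import hashlib

NUM_TABLETS = 4

def hash_key(primary_key):
    """Map a key into the 16-bit hash space."""
    digest = hashlib.md5(str(primary_key).encode()).digest()
    return int.from_bytes(digest[:2], "big")  # 0..65535

def tablet_for(primary_key):
    """Each tablet owns an equal slice of the hash space."""
    return hash_key(primary_key) * NUM_TABLETS // 65536

# 1000 user_ids spread across 4 tablets end up roughly even.
placement = {t: 0 for t in range(NUM_TABLETS)}
for user_id in range(1000):
    placement[tablet_for(user_id)] += 1
```

Because the hash scatters adjacent keys, sequential inserts do not hotspot a single tablet, which is what makes this layout good for write-heavy workloads.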

Tablet splitting

As data grows, YugabyteDB automatically splits tablets:
1. Monitor tablet size: YugabyteDB tracks the size of each tablet continuously.
2. Trigger split: when a tablet exceeds the threshold (default 10 GB), it's marked for splitting.
3. Split tablet: the tablet is divided into two tablets with roughly equal data.
4. Rebalance: new tablets may be moved to other nodes to maintain balance.
Tablet splitting is automatic and online. Applications continue running without interruption.
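The split decision can be sketched as a toy function, assuming a single size threshold; real split policies also consider phase-dependent thresholds and per-node tablet limits.

```python
# Toy sketch of the tablet split decision, assuming a single size threshold;
# real split policies also consider phase-dependent thresholds and limits.
SPLIT_THRESHOLD_GB = 10

def maybe_split(tablet):
    """Split a tablet into two halves once it crosses the size threshold."""
    if tablet["size_gb"] <= SPLIT_THRESHOLD_GB:
        return [tablet]
    lo, hi = tablet["hash_range"]
    mid = (lo + hi) // 2
    half = tablet["size_gb"] / 2
    return [
        {"hash_range": (lo, mid), "size_gb": half},
        {"hash_range": (mid, hi), "size_gb": half},
    ]

# A 12 GB tablet covering the whole hash space splits into two 6 GB halves.
tablets = maybe_split({"hash_range": (0, 65536), "size_gb": 12})
```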

High availability and fault tolerance

YugabyteDB ensures continuous availability through replication and automatic failover.

Replication with Raft consensus

Each tablet is replicated across multiple nodes:
┌──────────────────────────────────────────────┐
│  Tablet (Users table, hash range 0-1000)    │
├──────────────────────────────────────────────┤
│  Node 1        Node 2        Node 3          │
│  [Leader]      [Follower]    [Follower]      │
│                                              │
│  Writes go     Replicated    Replicated     │
│  to leader     via Raft      via Raft       │
└──────────────────────────────────────────────┘
How Raft works:
1. Write request: the client sends a write to any node, which forwards it to the tablet leader.
2. Leader proposes: the leader proposes the write to follower replicas.
3. Majority vote: once a majority (2 of 3) acknowledges, the write is committed.
4. Apply write: all replicas apply the committed write to their local storage.
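The four steps above can be sketched as a toy in-memory simulation; the Replica structure and replicate() function are illustrative, not YugabyteDB internals.

```python
# Toy simulation of majority-commit replication; the Replica structure
# and replicate() function are illustrative, not YugabyteDB internals.

class Replica:
    def __init__(self, name, available=True):
        self.name, self.available, self.log = name, available, []

def replicate(leader, followers, entry):
    """Leader appends and proposes an entry; it commits on majority ack."""
    cluster_size = 1 + len(followers)
    leader.log.append(entry)
    acks = 1  # the leader always holds the entry
    for f in followers:
        if f.available:
            f.log.append(entry)
            acks += 1
    return acks > cluster_size // 2  # majority vote

leader = Replica("node1")
followers = [Replica("node2"), Replica("node3", available=False)]
# 2 of 3 replicas acknowledge, so the write commits despite node3 being down.
ok = replicate(leader, followers, {"op": "UPDATE accounts SET ..."})
```

The majority rule is the key property: writes keep committing as long as a quorum of replicas is reachable, which is what makes the failover described next transparent.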

Automatic failover

When a node fails, YugabyteDB automatically recovers:
Before failure:
Node 1 [Leader] → Node 2 [Follower] → Node 3 [Follower]

Node 1 fails:
❌ Node 1        Node 2 [Follower] → Node 3 [Follower]

Leader election (3 seconds):
❌ Node 1        Node 2 [NEW LEADER] → Node 3 [Follower]

Continue serving:
❌ Node 1        Node 2 [Leader] → Node 3 [Follower]
Recovery time (RTO): ~3 seconds
Data loss (RPO): 0 (no data lost)
For fault tolerance, you need at least 3 nodes with replication factor 3. With RF=3, you can tolerate 1 node failure. With RF=5, you can tolerate 2 node failures.
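The RF arithmetic above generalizes: a majority of replicas must survive for writes to commit, so a cluster tolerates (RF - 1) // 2 simultaneous node failures.

```python
# With RF replicas of each tablet, a majority must survive for writes to
# commit, so the cluster tolerates (RF - 1) // 2 simultaneous node failures.
def max_tolerated_failures(rf):
    return (rf - 1) // 2

assert max_tolerated_failures(3) == 1  # RF=3: 1 failure
assert max_tolerated_failures(5) == 2  # RF=5: 2 failures
```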

Read replicas

For read-heavy workloads, add read replicas:
-- Create read replica cluster
CREATE TABLESPACE read_replica_ts
WITH (replica_placement='{"num_replicas": 1,
  "placement_blocks": [{"cloud":"aws","region":"us-east-1",
    "min_num_replicas":1}]}') READ ONLY;

-- Query from read replica
SET default_tablespace = read_replica_ts;
SELECT * FROM large_table WHERE category = 'analytics';
Benefits:
  • Offload read traffic from primary cluster
  • Serve stale reads with lower latency
  • No impact on write performance

Linear horizontal scalability

YugabyteDB scales reads and writes by adding nodes.

Scaling writes

Adding nodes increases write capacity:
1. Add new node: start a new yugabyted process and point it at the cluster.
   # Add node to cluster
   ./bin/yugabyted start --join=existing_node_ip
2. Automatic rebalancing: YugabyteDB automatically moves tablets to the new node to balance load.
3. Increased capacity: with more nodes, more tablets can accept writes in parallel.
Result: Write throughput scales linearly with the number of nodes.

Scaling reads

Reads scale in multiple ways:

  • Add more nodes - more nodes mean more tablet replicas and more read capacity.
  • Read from followers - use follower reads for eventually consistent queries.
  • Smart drivers - topology-aware drivers route reads to the nearest replica.
  • Read replicas - dedicated read-only replicas in additional regions.
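Topology-aware routing reduces to picking the lowest-latency replica. A minimal sketch, assuming replica latencies are already known (a real smart driver measures and refreshes them continuously; the zone names and numbers below are illustrative):

```python
# Sketch of topology-aware read routing: pick the replica with the lowest
# observed latency. Zone names and latency numbers are illustrative; a real
# smart driver measures and refreshes these continuously.
replicas = {
    "us-west-2a": 2.0,    # round-trip latency in ms
    "us-east-1b": 65.0,
    "eu-west-1a": 140.0,
}

def nearest_replica(latencies):
    return min(latencies, key=latencies.get)

target = nearest_replica(replicas)  # routes reads to "us-west-2a"
```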

Benchmark results

YugabyteDB demonstrates linear scalability:
Throughput vs. Nodes (TPCC benchmark)

50k ops/sec  │                                    ●
             │                              ●
40k ops/sec  │                        ●
             │                  ●
30k ops/sec  │            ●
             │      ●
20k ops/sec  │●
             └────────────────────────────────────
              3     6     9    12    15    18  nodes
For optimal performance, use SSD or NVMe storage and ensure adequate network bandwidth between nodes.

Multi-region deployments

Deploy YugabyteDB across multiple geographic regions for global applications.

Synchronous replication

Stretch cluster across regions with strong consistency:
Region: us-west-2 (Oregon)
  - Node 1, Node 2
  
Region: us-east-1 (Virginia)
  - Node 3, Node 4
  
Region: eu-west-1 (Ireland)
  - Node 5, Node 6

Replication Factor: 3
Quorum: 2 of 3 replicas across regions
Characteristics:
  • Strong consistency across regions
  • Automatic failover to any region
  • Higher latency for cross-region writes
  • Best for: Geo-redundancy, disaster recovery

Asynchronous replication (xCluster)

Replicate data between independent clusters:
Primary (us-west-2)       Secondary (eu-west-1)
┌──────────────┐          ┌──────────────┐
│ Cluster A    │────CDC──>│ Cluster B    │
│ (read/write) │          │ (read-only)  │
└──────────────┘          └──────────────┘
Use cases:
  • Cross-region disaster recovery
  • Active-active multi-region setups
  • Compliance with data residency

AI/ML capabilities

Build intelligent applications with vector search and machine learning features.

pgvector extension

Store and query high-dimensional vectors:
-- Enable pgvector
CREATE EXTENSION vector;

-- Create table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536)  -- OpenAI embedding size
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents 
USING hnsw (embedding vector_cosine_ops);

-- Insert embeddings
INSERT INTO documents (content, embedding) VALUES
  ('YugabyteDB is a distributed database', '[0.1, 0.2, ...]'),
  ('PostgreSQL compatibility', '[0.3, 0.1, ...]');

-- Find similar documents
SELECT content, 
       embedding <=> '[0.15, 0.25, ...]' AS distance
FROM documents
ORDER BY embedding <=> '[0.15, 0.25, ...]'
LIMIT 10;
Use cases:
  • Semantic search
  • Recommendation engines
  • Image similarity
  • Retrieval Augmented Generation (RAG)
HNSW indexing provides fast approximate nearest neighbor search, making vector queries practical at scale.
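The <=> operator above computes cosine distance. A pure-Python sketch of the metric behind vector_cosine_ops (pgvector implements this in C, with HNSW providing approximate nearest-neighbor search on top):

```python
# Pure-Python sketch of the metric behind vector_cosine_ops: the <=>
# operator computes cosine distance (1 - cosine similarity). pgvector
# implements this in C; HNSW makes the nearest-neighbor search approximate.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way have distance 0; orthogonal ones have 1.
assert abs(cosine_distance([1, 2], [2, 4])) < 1e-9
assert abs(cosine_distance([1, 0], [0, 1]) - 1.0) < 1e-9
```

Cosine distance ignores vector magnitude, which is why it suits embeddings: two documents about the same topic point in the same direction regardless of length.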

Performance optimizations

YugabyteDB includes several performance enhancements:

Cost-based optimizer

Enabled by default in v2025.2+:
-- View query plan
EXPLAIN (ANALYZE, COSTS, VERBOSE)
SELECT c.name, COUNT(o.order_id)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date > '2024-01-01'
GROUP BY c.name;
The CBO:
  • Analyzes table statistics
  • Estimates row counts and costs
  • Chooses optimal join strategies
  • Selects best index to use
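The join-strategy choice can be illustrated with a toy cost model; the formulas and constants below are made up for illustration, while the real CBO costs plans using statistics gathered by ANALYZE.

```python
# Toy cost model in the spirit of a cost-based optimizer. The formulas and
# constants are made up for illustration; the real CBO costs plans using
# statistics gathered by ANALYZE.
def nested_loop_cost(outer_rows, inner_rows):
    return outer_rows * inner_rows      # rescan inner for every outer row

def hash_join_cost(outer_rows, inner_rows):
    return outer_rows + 2 * inner_rows  # build hash table once, then probe

def choose_join(outer_rows, inner_rows):
    costs = {
        "Nested Loop": nested_loop_cost(outer_rows, inner_rows),
        "Hash Join": hash_join_cost(outer_rows, inner_rows),
    }
    return min(costs, key=costs.get)

# Tiny inputs favor a nested loop; large ones favor a hash join.
assert choose_join(2, 3) == "Nested Loop"
assert choose_join(1000, 1000) == "Hash Join"
```

This is why accurate row-count estimates matter: a misestimate flips the comparison and the optimizer picks the wrong strategy, which is what EXPLAIN ANALYZE helps diagnose.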

Parallel query execution

Leverage multiple CPU cores:
-- Enable parallel query
SET yb_enable_parallel_append = ON;

-- Large aggregation uses parallel workers
SELECT region, SUM(sales)
FROM sales_data
GROUP BY region;

Bitmap scan

Combine multiple indexes efficiently:
-- Create indexes
CREATE INDEX idx_category ON products(category);
CREATE INDEX idx_price ON products(price);

-- Query uses both indexes via bitmap scan
SELECT * FROM products
WHERE category = 'electronics' 
  AND price < 1000;

Next steps

Now that you understand YugabyteDB’s core features:

  • Build an application - connect from Java, Python, Go, Node.js, and more.
  • Explore features - hands-on exercises with transactions, indexes, and more.
  • Deploy to production - set up multi-node clusters for production workloads.
  • Architecture deep dive - learn about the internal architecture and design.
