Follow these best practices to build efficient, scalable, and resilient applications on YugabyteDB. These guidelines cover data modeling, connection management, query optimization, and distributed system considerations.

Connection Management

Use Connection Pooling

Connection pooling is essential for production applications to reduce overhead and improve performance.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5433/yugabyte");
config.setUsername("yugabyte");
config.setPassword("yugabyte");
config.setMaximumPoolSize(10);
config.setMinimumIdle(2);
config.setConnectionTimeout(30000);
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);

HikariDataSource dataSource = new HikariDataSource(config);
Pool Configuration Guidelines:
  • Set maxPoolSize based on expected concurrent operations
  • Configure connectionTimeout to fail fast on connection issues
  • Use maxLifetime to recycle connections periodically
  • Enable connection validation before use (for example, SQLAlchemy's pool_pre_ping in Python)
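As a starting point for maxPoolSize, the HikariCP project's pool-sizing guidance suggests sizing by core count rather than by concurrent users. A small sketch of that heuristic (the function name is illustrative, not part of any driver API):

```python
import os

def suggested_pool_size(core_count=None, spindle_count=1):
    """Starting-point heuristic from the HikariCP pool-sizing guide:
    connections = (core_count * 2) + effective_spindle_count.
    Treat the result as a baseline and tune from measured load."""
    if core_count is None:
        core_count = os.cpu_count() or 1
    return core_count * 2 + spindle_count

# A 4-core app server with SSD storage (counted as one spindle)
print(suggested_pool_size(core_count=4))  # 9
```

A larger pool is not automatically faster: beyond this range, extra connections tend to add contention on the database side rather than throughput.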

Use Smart Drivers for Load Balancing

YugabyteDB Smart Drivers provide cluster-aware load balancing:
import com.yugabyte.ysql.YBClusterAwareDataSource;

YBClusterAwareDataSource ds = new YBClusterAwareDataSource();
ds.setUrl("jdbc:yugabytedb://node1:5433,node2:5433,node3:5433/yugabyte");
ds.setUser("yugabyte");
ds.setPassword("yugabyte");

// Enable topology awareness
ds.setTopologyAwareLoadBalance(true);
ds.setTopologyKeys("aws.us-west-2.us-west-2a");
Benefits:
  • Automatic distribution of connections across nodes
  • Topology-aware routing to reduce latency
  • Automatic failover on node failures
  • Connection balancing during scale-out

Handle Connection Failures

Implement retry logic for transient failures:
import psycopg2
import time
from psycopg2 import OperationalError

def execute_with_retry(query, max_retries=3, delay=1):
    for attempt in range(max_retries):
        conn = None  # Ensure the name exists even if connect() fails
        try:
            conn = psycopg2.connect(
                host="localhost",
                port=5433,
                database="yugabyte",
                user="yugabyte",
                password="yugabyte"
            )
            cursor = conn.cursor()
            cursor.execute(query)
            conn.commit()
            return cursor.fetchall()
        except OperationalError:
            if attempt < max_retries - 1:
                time.sleep(delay * (2 ** attempt))  # Exponential backoff
                continue
            raise
        finally:
            if conn is not None:
                conn.close()
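One caveat with a fixed exponential delay: when many clients fail at the same moment, they all retry at the same moments too. Adding randomized "full jitter" (a widely used variant, popularized by AWS's architecture blog) spreads the retries out. A minimal sketch, with an illustrative helper name:

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter backoff: a uniform random delay in
    [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# In the retry loop above, replace the fixed sleep with:
# time.sleep(backoff_delay(attempt))
```

The cap keeps late attempts from waiting unreasonably long while still decorrelating clients.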

Data Modeling

Choose Primary Keys Wisely

Primary key design impacts data distribution and query performance.
For auto-incrementing IDs, use hash sharding on the key to distribute data evenly:
-- Without HASH: Data concentrates on one tablet
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    total DECIMAL(10, 2)
);

-- With HASH: Data distributed across tablets
CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER,
    total DECIMAL(10, 2),
    PRIMARY KEY (id HASH)
) SPLIT INTO 16 TABLETS;
Use composite keys for efficient time-range queries:
CREATE TABLE sensor_data (
    sensor_id INTEGER,
    timestamp TIMESTAMP,
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION,
    PRIMARY KEY ((sensor_id) HASH, timestamp ASC)
) SPLIT INTO 16 TABLETS;

-- Efficient range query
SELECT * FROM sensor_data 
WHERE sensor_id = 101 
  AND timestamp BETWEEN '2024-01-01' AND '2024-01-31';
Use naturally distributed keys when available:
-- UUID provides natural distribution
CREATE TABLE users (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username TEXT,
    email TEXT
);

-- Email as primary key (if unique and evenly distributed)
CREATE TABLE user_profiles (
    email TEXT PRIMARY KEY,
    name TEXT,
    preferences JSONB
);

Leverage Indexes Effectively

Include frequently queried columns in the index to avoid table lookups:
-- Regular index requires table lookup
CREATE INDEX idx_users_email ON users(email);

-- Covering index includes needed columns
CREATE INDEX idx_users_email_covering ON users(email) 
INCLUDE (username, created_at);

-- This query uses index-only scan
SELECT email, username, created_at 
FROM users 
WHERE email = '[email protected]';

Use Colocation for Small Tables

Colocate small reference tables for better performance:
-- Create colocated database
CREATE DATABASE myapp WITH COLOCATION = true;

-- Small tables are automatically colocated
CREATE TABLE countries (
    country_code CHAR(2) PRIMARY KEY,
    country_name TEXT
);

CREATE TABLE currencies (
    currency_code CHAR(3) PRIMARY KEY,
    currency_name TEXT
);

-- Large tables can opt out
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    username TEXT
) WITH (COLOCATION = false);

Query Optimization

Use Prepared Statements

Prepared statements improve performance and prevent SQL injection:
String sql = "SELECT * FROM users WHERE email = ?";
PreparedStatement pstmt = conn.prepareStatement(sql);
pstmt.setString(1, email);
ResultSet rs = pstmt.executeQuery();

Optimize Batch Operations

Batch multiple operations to reduce round-trips:
-- Instead of multiple inserts
INSERT INTO users (username, email) VALUES ('user1', '[email protected]');
INSERT INTO users (username, email) VALUES ('user2', '[email protected]');

-- Use single batch insert
INSERT INTO users (username, email) VALUES
    ('user1', '[email protected]'),
    ('user2', '[email protected]'),
    ('user3', '[email protected]');
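When the rows come from application code rather than a hand-written statement, it is common to chunk them client-side and send each chunk as one multi-row insert (drivers offer helpers for the send itself, such as psycopg2's execute_values or JDBC's addBatch). A minimal chunking sketch, with illustrative names and data:

```python
def chunked(rows, batch_size=500):
    """Yield successive slices of rows, each at most batch_size long."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

rows = [(f"user{n}", f"user{n}@example.com") for n in range(1200)]
batches = list(chunked(rows))
print([len(b) for b in batches])  # [500, 500, 200]
```

Each batch then becomes a single round-trip instead of hundreds; batch sizes in the low hundreds to low thousands are a typical starting point to tune from.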

Use RETURNING Clause

Combine insert/update with select to reduce round-trips:
-- Single round-trip instead of two
INSERT INTO users (username, email) 
VALUES ('john_doe', '[email protected]')
RETURNING id, created_at;

-- Update and return in one operation
UPDATE accounts 
SET balance = balance - 100 
WHERE account_id = 123
RETURNING balance;

Transaction Best Practices

Keep Transactions Short

-- Bad: Long-running transaction
BEGIN;
  SELECT * FROM large_table;  -- Expensive operation
  -- ... complex processing ...
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- Good: Short transaction
-- Do expensive operations outside transaction
SELECT * FROM large_table;
-- ... process data ...

BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

Use Appropriate Isolation Levels

-- Default: READ COMMITTED (best for most cases)
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- For stronger consistency
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- For strongest consistency (use sparingly)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Handle Serialization Errors

import psycopg2
from psycopg2 import extensions

def execute_transaction(func, max_retries=3):
    for attempt in range(max_retries):
        conn = None  # Ensure the name exists even if get_connection() fails
        try:
            conn = get_connection()
            conn.set_isolation_level(
                extensions.ISOLATION_LEVEL_SERIALIZABLE
            )
            result = func(conn)
            conn.commit()
            return result
        except extensions.TransactionRollbackError:
            if attempt < max_retries - 1:
                continue  # Serialization conflict: retry the transaction
            raise
        finally:
            if conn is not None:
                conn.close()

Performance Monitoring

Use EXPLAIN to Analyze Queries

-- Analyze query execution plan
EXPLAIN (ANALYZE, COSTS, VERBOSE) 
SELECT u.username, COUNT(o.id) 
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.username;

-- Check if index is being used
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

Monitor Slow Queries

Enable slow query logging:
-- Set log threshold (in milliseconds)
SET log_min_duration_statement = 1000;

-- Log all statements (verbose; suitable for debugging, not production)
SET log_statement = 'all';

Multi-Region Best Practices

Use Tablespaces for Geo-Partitioning

-- Create tablespace for specific region
CREATE TABLESPACE us_west 
WITH (replica_placement='{"num_replicas": 3, "placement_blocks": 
  [{"cloud": "aws", "region": "us-west-2", "zone": "us-west-2a", "min_num_replicas": 1}]
}');

-- Partition table by geography
CREATE TABLE users (
    user_id UUID,
    region TEXT,
    username TEXT,
    PRIMARY KEY (user_id, region)
) PARTITION BY LIST (region);

CREATE TABLE users_us_west PARTITION OF users
    FOR VALUES IN ('us-west') TABLESPACE us_west;

Leverage Follower Reads

Reduce latency for read-heavy workloads:
-- Enable follower reads for current session
SET yb_read_from_followers = true;

-- Set staleness tolerance (in milliseconds)
SET yb_follower_read_staleness_ms = 10000;  -- 10 seconds

-- Query will use nearest replica
SELECT * FROM users WHERE user_id = 'uuid';

Security Best Practices

Use SSL/TLS Connections

-- Connection with SSL
postgresql://user:password@host:5433/database?sslmode=verify-full&sslrootcert=/path/to/root.crt
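Passwords and file paths often contain characters that are significant in URLs, so it can help to build the connection URL programmatically and let the standard library handle escaping. A sketch using only the Python standard library (the host, credentials, and certificate path are placeholders):

```python
from urllib.parse import quote, urlencode

def build_url(user, password, host, db, port=5433,
              sslmode="verify-full", sslrootcert="/path/to/root.crt"):
    """Assemble a PostgreSQL-style connection URL with SSL parameters,
    percent-encoding the credentials so special characters survive."""
    params = urlencode({"sslmode": sslmode, "sslrootcert": sslrootcert})
    return (f"postgresql://{quote(user)}:{quote(password, safe='')}"
            f"@{host}:{port}/{db}?{params}")

print(build_url("yugabyte", "p@ss", "db.example.com", "yugabyte"))
```

Note that the password "p@ss" is emitted as "p%40ss"; passing it unescaped would make the URL ambiguous.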

Implement Row-Level Security

-- Enable row-level security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Create policy
CREATE POLICY user_documents ON documents
    FOR ALL
    TO app_user
    USING (user_id = current_setting('app.user_id')::INTEGER);

-- Set user context
SET app.user_id = 123;

-- User only sees their documents
SELECT * FROM documents;  -- Filtered by policy

Summary Checklist

  • Use connection pooling with appropriate sizing
  • Implement retry logic with exponential backoff
  • Design primary keys for even data distribution
  • Create indexes for frequently queried columns
  • Use covering indexes to avoid table lookups
  • Keep transactions short and focused
  • Use prepared statements for security and performance
  • Batch operations to reduce round-trips
  • Monitor query performance with EXPLAIN
  • Use Smart Drivers for automatic load balancing
  • Enable follower reads for read-heavy workloads
  • Implement proper error handling and retries

Next Steps

  • Data Modeling: learn advanced data modeling techniques
  • Performance Tuning: monitor and optimize performance
  • Global Applications: build multi-region applications
  • Smart Drivers: explore smart driver features
