Token Efficiency

Progressive Disclosure with search_objects

The search_objects tool offers three detail levels to minimize token usage:
// Start with names only (minimal tokens)
search_objects({
  object_type: "table",
  detail_level: "names"
})

// Then get summary metadata when needed
search_objects({
  object_type: "table",
  pattern: "user%",
  detail_level: "summary"  // adds column_count, row_count, comments
})

// Finally, get full details only for specific objects
search_objects({
  object_type: "table",
  pattern: "users",
  schema: "public",
  detail_level: "full"  // includes all columns, indexes, definitions
})
Token Savings: Using names instead of full can reduce response size by 80-90% for large schemas.

Limit Result Sets

Always use the limit parameter to control result size:
search_objects({
  object_type: "column",
  pattern: "%id%",
  limit: 50  // default: 100, max: 1000
})

Multi-Database Organization

Named Sources in TOML

Give each database a descriptive ID and description:
[[sources]]
id = "prod_analytics"
description = "Production analytics database (read-only)"
dsn = "postgres://readonly:***@prod.example.com:5432/analytics"
lazy = true  # Defer connection until first use

[[sources]]
id = "staging_api"
description = "Staging API database (full access)"
dsn = "postgres://api_user:***@staging.example.com:5432/api"

[[sources]]
id = "local_dev"
description = "Local development database"
dsn = "postgres://postgres:postgres@localhost:5432/myapp"

Consistent source_id Usage

When tools are registered per-source, use the source-specific tool names:
// For source "prod_analytics", the tool is named:
execute_sql_prod_analytics({ sql: "SELECT * FROM events LIMIT 10" })

// Default source (first in TOML) uses base names:
execute_sql({ sql: "SELECT version()" })
Tip: The first source in your TOML file becomes the default database.

Lazy Connections for Remote Databases

Use lazy = true for databases that aren’t always needed:
[[sources]]
id = "prod_reporting"
dsn = "postgres://user:***@prod-db.company.com:5432/reports"
lazy = true  # Only connect when first query is executed
ssh_host = "bastion.company.com"
ssh_user = "deploy"
ssh_key = "~/.ssh/id_ed25519"
Benefits:
  • Faster startup times
  • Reduced connection overhead for rarely used databases
  • Delayed SSH tunnel establishment until needed

Read-Only Mode

When to Enable Read-Only

Configure per-tool readonly mode for:
  1. Production databases - Prevent accidental modifications
  2. Shared staging environments - Protect shared resources
  3. Analytics/reporting databases - Enforce query-only access
  4. Exploratory data analysis - Safe browsing without risk
[[sources]]
id = "prod_db"
dsn = "postgres://readonly:***@prod.example.com:5432/app"

[[tools]]
name = "execute_sql"
source = "prod_db"
readonly = true  # Only SELECT, SHOW, DESCRIBE, EXPLAIN allowed
max_rows = 1000

Allowed Read-Only Operations

From src/utils/allowed-keywords.ts:9-15:
  • PostgreSQL: SELECT, WITH, EXPLAIN, ANALYZE, SHOW
  • MySQL/MariaDB: SELECT, WITH, EXPLAIN, ANALYZE, SHOW, DESCRIBE, DESC
  • SQLite: SELECT, WITH, EXPLAIN, ANALYZE, PRAGMA
  • SQL Server: SELECT, WITH, EXPLAIN, SHOWPLAN
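The check itself amounts to comparing a statement's leading keyword against the per-dialect allow-list. A minimal TypeScript sketch of that idea (a hypothetical illustration, not DBHub's actual implementation in allowed-keywords.ts):

```typescript
// Hypothetical read-only guard: allow a query only if its first keyword
// is in the dialect's allow-list. Real implementations must also handle
// comments, multiple statements, and CTEs that hide writes.
const ALLOWED_POSTGRES = ["SELECT", "WITH", "EXPLAIN", "ANALYZE", "SHOW"];

function isReadOnlyQuery(sql: string, allowed: string[]): boolean {
  const firstKeyword = sql.trim().split(/\s+/)[0]?.toUpperCase() ?? "";
  return allowed.includes(firstKeyword);
}
```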

Query Limiting

Setting Appropriate max_rows Values

[[tools]]
name = "execute_sql"
source = "analytics_db"
max_rows = 1000  # Per-tool row limit
Recommended Limits:
  • Production queries: 100-1,000 rows (quick results)
  • Analytics/exploration: 1,000-10,000 rows (detailed analysis)
  • Reporting: 10,000+ rows (full datasets)
  • Testing: 10-100 rows (fast iteration)

How Limits Are Applied

From src/utils/sql-row-limiter.ts:126-144:
-- Original query
SELECT * FROM users WHERE active = true

-- With max_rows = 100 (PostgreSQL/MySQL/MariaDB/SQLite)
SELECT * FROM users WHERE active = true LIMIT 100

-- With max_rows = 100 (SQL Server)
SELECT TOP 100 * FROM users WHERE active = true
Smart Limiting:
  • Existing LIMIT clauses are respected (uses minimum of existing and configured)
  • Parameterized LIMIT clauses trigger subquery wrapping for safety
  • Only applies to SELECT queries
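The "use the minimum of existing and configured" rule can be sketched roughly as follows. This is a simplified, hypothetical illustration (real limiting, as in sql-row-limiter.ts, must handle comments, subqueries, SQL Server's TOP, and parameterized limits, which this regex-based sketch does not):

```typescript
// Hypothetical sketch of max_rows enforcement for LIMIT-style dialects.
function applyMaxRows(sql: string, maxRows: number): string {
  const existing = sql.match(/\bLIMIT\s+(\d+)\s*$/i);
  if (existing) {
    // Respect an existing LIMIT: keep whichever value is smaller.
    const effective = Math.min(parseInt(existing[1], 10), maxRows);
    return sql.replace(/\bLIMIT\s+\d+\s*$/i, `LIMIT ${effective}`);
  }
  // No LIMIT present: append the configured cap.
  return `${sql} LIMIT ${maxRows}`;
}
```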

Custom Tools for Common Queries

Encapsulating Business Logic

Define reusable tools for frequently used queries:
[[tools]]
name = "active_users"
description = "Get all active users with their last login"
source = "prod_db"
readonly = true
max_rows = 500
statement = """
  SELECT user_id, email, last_login_at, account_status
  FROM users
  WHERE account_status = 'active'
  ORDER BY last_login_at DESC
"""

[[tools]]
name = "user_by_email"
description = "Find user by email address"
source = "prod_db"
readonly = true
statement = """
  SELECT user_id, email, created_at, account_status
  FROM users
  WHERE email = $1
"""

[[tools.parameters]]
name = "email"
type = "string"
description = "User email address"
required = true
Benefits:
  • Consistent query patterns across team
  • Type-safe parameters
  • Built-in documentation via descriptions
  • Centralized query optimization

SSH Tunnel Best Practices

When to Use SSH Tunnels

  1. Databases behind firewalls - Production databases in private networks
  2. Bastion host architectures - Security requirement for database access
  3. Multi-hop access - Jump through multiple security layers
  4. Encrypted connections - Additional security layer for sensitive data

Key Management

Prefer key-based authentication over passwords:
[[sources]]
id = "prod_db"
dsn = "postgres://app:***@10.0.1.100:5432/app"
ssh_host = "bastion.company.com"
ssh_user = "deploy"
ssh_key = "~/.ssh/id_ed25519"  # Key-based (recommended)
# ssh_password = "***"  # Avoid password auth when possible
Passphrase-protected keys (recommended for production):
ssh_key = "~/.ssh/id_ed25519"
ssh_passphrase = "key_passphrase"  # Only if key is encrypted

SSH Keepalive Configuration

Prevent idle disconnections for long-running operations:
[[sources]]
id = "remote_db"
dsn = "postgres://user:***@10.0.1.100:5432/app"
ssh_host = "bastion.company.com"
ssh_user = "deploy"
ssh_key = "~/.ssh/id_ed25519"
ssh_keepalive_interval = 60  # Send keepalive every 60 seconds
ssh_keepalive_count_max = 3   # Disconnect after 3 missed responses
Defaults: keepalive is disabled (interval = 0).

ProxyJump for Multi-Hop Connections

Connect through multiple jump hosts:
[[sources]]
id = "internal_db"
dsn = "postgres://app:***@10.0.2.50:5432/app"
ssh_host = "internal-server.company.com"
ssh_user = "deploy"
ssh_key = "~/.ssh/id_ed25519"
# Jump through bastion, then jump2
ssh_proxy_jump = "bastion.company.com,deploy@jump2.company.com:2222"
From src/utils/ssh-tunnel.ts:70-138, connections are established sequentially through each jump host.

Connection Pooling and Timeouts

Connection Timeouts

Set appropriate connection timeouts for remote databases:
[[sources]]
id = "remote_db"
dsn = "postgres://user:***@remote-db.example.com:5432/app"
connection_timeout = 30  # Seconds to wait for initial connection
Recommended values:
  • Local databases: 5-10 seconds
  • Remote databases: 20-30 seconds
  • SSH tunnel connections: 30-60 seconds

Query Timeouts

Prevent runaway queries:
[[sources]]
id = "analytics_db"
dsn = "postgres://user:***@analytics.example.com:5432/warehouse"
query_timeout = 60  # Seconds per query (PostgreSQL, MySQL, MariaDB, SQL Server)
Note: SQLite does not support query timeouts at the protocol level.

Configuration Best Practices

TOML Over Environment Variables

Use TOML for multi-database setups:
# dbhub.toml - version controlled, shareable
[[sources]]
id = "prod_read_replica"
dsn = "postgres://readonly:${PROD_DB_PASSWORD}@prod-replica.example.com:5432/app"
readonly = true

[[sources]]
id = "staging"
dsn = "postgres://user:${STAGING_DB_PASSWORD}@staging.example.com:5432/app"
Use .env for single-database or local development:
# .env - local only, not committed
DSN=postgres://postgres:postgres@localhost:5432/myapp

Credential Management

Never commit secrets:
# .gitignore
.env
.env.local
dbhub.toml  # If it contains credentials
Use environment variable substitution (future feature) or:
  1. Keep dbhub.toml.example in version control
  2. Copy to dbhub.toml locally and add credentials
  3. Use cloud secret managers for production

Testing with Workbench

Test Queries Before Using in MCP Client

The built-in workbench (--transport http --port 8080) allows:
  1. Query validation - Test SQL syntax before sending to Claude
  2. Performance testing - Measure query execution time
  3. Result inspection - Verify output format
  4. Tool testing - Execute custom tools interactively
# Start DBHub with workbench
npx @bytebase/dbhub --transport http --port 8080 --config dbhub.toml

# Open browser to http://localhost:8080
# Test queries in the UI before using in Claude Desktop

Request Tracing

Workbench includes request traces for debugging:
  • View all executed queries
  • See execution times
  • Inspect parameters and results
  • Debug source routing in multi-database setups

Schema Search Path (PostgreSQL)

Custom Schema Configuration

When working with non-public schemas:
[[sources]]
id = "app_db"
dsn = "postgres://user:***@localhost:5432/myapp"
search_path = "myschema,public"  # Comma-separated, first is default
From src/types/config.ts:54:
  • First schema in the list becomes the default for schema discovery
  • Subsequent schemas are included in the search path
  • Affects search_objects results and unqualified table references
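The first-schema-wins behavior can be illustrated with a small sketch of parsing a comma-separated search_path string (`parseSearchPath` is a hypothetical helper, not DBHub's API):

```typescript
// Hypothetical parse of a search_path value: the first entry becomes the
// default schema; all entries remain in the resolution order.
function parseSearchPath(searchPath: string): {
  defaultSchema: string;
  schemas: string[];
} {
  const schemas = searchPath
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  return { defaultSchema: schemas[0] ?? "public", schemas };
}
```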

Performance Tips

1. Use Specific Filters

// Good: Narrow scope
search_objects({
  object_type: "column",
  schema: "public",
  table: "users",
  pattern: "email"
})

// Avoid: Broad scope across all schemas
search_objects({
  object_type: "column",
  pattern: "%"  // Searches every column in every table
})

2. Leverage Detail Levels

// Step 1: Get table names (fast)
const tables = search_objects({
  object_type: "table",
  schema: "public",
  detail_level: "names"
})

// Step 2: Get full details only for tables of interest
const userTableDetails = search_objects({
  object_type: "table",
  schema: "public",
  pattern: "users",
  detail_level: "full"
})

3. Set Realistic Limits

[[tools]]
name = "execute_sql"
source = "large_warehouse"
max_rows = 100  # Start small, increase if needed

Security Checklist

  • Use readonly = true for production databases
  • Set appropriate max_rows limits
  • Use SSH tunnels for remote database access
  • Prefer key-based SSH authentication
  • Enable SSL/TLS (sslmode=require) for network databases
  • Never commit .env or credential-containing TOML files
  • Use lazy = true for infrequently-accessed databases
  • Set connection and query timeouts
  • Test queries in workbench before production use
  • Document custom tools with clear descriptions