Overview
Column families provide a way to logically partition a RocksDB database. Each column family can have its own options and configuration, while sharing the same Write-Ahead Log (WAL). This enables efficient multi-tenancy and data segregation within a single database.
ColumnFamilyDescriptor
Describes a column family with its name and options.
struct ColumnFamilyDescriptor {
  std::string name;
  ColumnFamilyOptions options;
  ColumnFamilyDescriptor();
  ColumnFamilyDescriptor(const std::string& _name,
                         const ColumnFamilyOptions& _options);
};
name
std::string
Name of the column family. The default column family name is stored in kDefaultColumnFamilyName.
options
ColumnFamilyOptions
Configuration options specific to this column family
// Create descriptor for default column family
ColumnFamilyDescriptor default_cf(
    kDefaultColumnFamilyName,
    ColumnFamilyOptions());
// Create descriptor for custom column family
ColumnFamilyOptions cf_options;
cf_options.write_buffer_size = 128 << 20;  // 128MB
ColumnFamilyDescriptor user_data_cf("user_data", cf_options);
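Size literals such as 128 << 20 are plain int expressions, so a size of 2048 << 20 (2GB) or more does not fit in a 32-bit int before it is ever assigned to write_buffer_size. A minimal sketch of a hypothetical helper, MiB (an illustration, not part of RocksDB), that does the shift in 64 bits:

```cpp
#include <cstdint>

// Hypothetical helper (not a RocksDB API): converts mebibytes to bytes
// using 64-bit arithmetic so large sizes do not overflow int.
constexpr std::uint64_t MiB(std::uint64_t n) { return n << 20; }

static_assert(MiB(128) == 134217728ULL, "128MB in bytes");
static_assert(MiB(4096) == 4294967296ULL, "4GB needs more than 32 bits");
```

With such a helper, the assignment above could be written as cf_options.write_buffer_size = MiB(128);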
ColumnFamilyHandle
Handle to access a specific column family within the database.
class ColumnFamilyHandle {
 public:
  virtual ~ColumnFamilyHandle();
  virtual const std::string& GetName() const = 0;
  virtual uint32_t GetID() const = 0;
  virtual Status GetDescriptor(ColumnFamilyDescriptor* desc) = 0;
  virtual const Comparator* GetComparator() const = 0;
};
GetName
virtual const std::string& GetName() const = 0;
Returns the name of the column family associated with this handle.
GetID
virtual uint32_t GetID() const = 0;
Returns the ID of the column family.
GetDescriptor
virtual Status GetDescriptor(ColumnFamilyDescriptor* desc) = 0;
desc
ColumnFamilyDescriptor*
Output parameter filled with the up-to-date descriptor
This call may lock and release the DB mutex to read the current CF options. Pointer-typed options in the returned descriptor must not be used after the original options objects they refer to are destroyed.
GetComparator
virtual const Comparator* GetComparator() const = 0;
Returns the comparator of the column family.
Creating Column Families
CreateColumnFamily
Create a single column family.
virtual Status CreateColumnFamily(const ColumnFamilyOptions& options,
                                  const std::string& column_family_name,
                                  ColumnFamilyHandle** handle);
options
const ColumnFamilyOptions&
Options for the new column family
column_family_name
const std::string&
Name of the column family to create
handle
ColumnFamilyHandle**
Output parameter for the column family handle
Creating many column families one-by-one is not recommended due to quadratic overheads (e.g., writing a full OPTIONS file for all CFs after each creation). Use CreateColumnFamilies() or DB::Open() with create_missing_column_families=true instead.
ColumnFamilyOptions cf_options;
cf_options.write_buffer_size = 64 << 20;  // 64MB
ColumnFamilyHandle* cf_handle;
Status s = db->CreateColumnFamily(cf_options, "new_cf", &cf_handle);
if (!s.ok()) {
  // Handle error
}
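To make the quadratic-overhead warning above concrete, here is a rough cost model. It assumes, as a simplification rather than RocksDB's actual accounting, that each OPTIONS file write contains one section per existing column family. The function names are hypothetical.

```cpp
#include <cstdint>

// Simplified cost model (an assumption, not RocksDB's real accounting):
// every OPTIONS file write contains one section per existing column family.
// `existing` is the number of column families already in the DB
// (at least 1, for the default column family).

// Creating `n` column families one call at a time rewrites the file
// after each call: the sum over k = 1..n of (existing + k) sections.
constexpr std::uint64_t SectionsWrittenOneByOne(std::uint64_t existing,
                                                std::uint64_t n) {
  return n * existing + n * (n + 1) / 2;
}

// A single bulk call writes the file once, with every column family in it.
constexpr std::uint64_t SectionsWrittenBulk(std::uint64_t existing,
                                            std::uint64_t n) {
  return existing + n;
}
```

Under this model, adding 100 column families to a database that only has the default one writes 5150 sections one by one, versus 101 in a single bulk call.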
CreateColumnFamilies (Same Options)
Bulk create column families with the same options.
virtual Status CreateColumnFamilies(
    const ColumnFamilyOptions& options,
    const std::vector<std::string>& column_family_names,
    std::vector<ColumnFamilyHandle*>* handles);
column_family_names
const std::vector<std::string>&
Names of column families to create
handles
std::vector<ColumnFamilyHandle*>*
Output vector for column family handles
In case of error, the request may succeed partially. The handles vector will contain handles for successfully created column families.
ColumnFamilyOptions cf_options;
std::vector<std::string> cf_names = {"cf1", "cf2", "cf3"};
std::vector<ColumnFamilyHandle*> handles;
Status s = db->CreateColumnFamilies(cf_options, cf_names, &handles);
if (!s.ok()) {
  // handles holds only the column families that were actually created
  std::cerr << "Created " << handles.size() << " out of "
            << cf_names.size() << " column families: " << s.ToString() << "\n";
}
CreateColumnFamilies (Different Options)
Bulk create column families with individual options.
virtual Status CreateColumnFamilies(
    const std::vector<ColumnFamilyDescriptor>& column_families,
    std::vector<ColumnFamilyHandle*>* handles);
column_families
const std::vector<ColumnFamilyDescriptor>&
Descriptors for each column family to create
std::vector<ColumnFamilyDescriptor> column_families;
// Small memtable: flushes often, uses little memory
// (write_buffer_size controls memtable size, not durability)
ColumnFamilyOptions fast_opts;
fast_opts.write_buffer_size = 16 << 20;  // 16MB
column_families.push_back(ColumnFamilyDescriptor("cache", fast_opts));
// Large memtable and larger SST files: fewer flushes, more memory
ColumnFamilyOptions durable_opts;
durable_opts.write_buffer_size = 128 << 20;      // 128MB
durable_opts.target_file_size_base = 256 << 20;  // 256MB
column_families.push_back(ColumnFamilyDescriptor("persistent", durable_opts));
std::vector<ColumnFamilyHandle*> handles;
Status s = db->CreateColumnFamilies(column_families, &handles);
Opening Database with Column Families
Open with All Column Families
static Status Open(const DBOptions& db_options,
                   const std::string& name,
                   const std::vector<ColumnFamilyDescriptor>& column_families,
                   std::vector<ColumnFamilyHandle*>* handles,
                   std::unique_ptr<DB>* dbptr);
You must open ALL column families in the database. Use ListColumnFamilies() to get the list of existing column families.
// First, list existing column families
DBOptions db_options;
std::vector<std::string> cf_names;
Status s = DB::ListColumnFamilies(db_options, "/path/to/db", &cf_names);
if (s.ok()) {
  // Build descriptors for all column families
  std::vector<ColumnFamilyDescriptor> column_families;
  for (const auto& name : cf_names) {
    column_families.push_back(ColumnFamilyDescriptor(
        name, ColumnFamilyOptions()));
  }
  // Open database
  std::vector<ColumnFamilyHandle*> handles;
  std::unique_ptr<DB> db;
  s = DB::Open(db_options, "/path/to/db", column_families, &handles, &db);
  if (s.ok()) {
    // handles[i] corresponds to column_families[i]
    // ... use db ...
    // Destroy every handle before db goes out of scope and closes
    for (auto* handle : handles) {
      db->DestroyColumnFamilyHandle(handle);
    }
  }
} else if (s.IsPathNotFound()) {
  // New database - create with default column family
  std::vector<ColumnFamilyDescriptor> column_families;
  column_families.push_back(ColumnFamilyDescriptor(
      kDefaultColumnFamilyName, ColumnFamilyOptions()));
  db_options.create_if_missing = true;
  std::vector<ColumnFamilyHandle*> handles;
  std::unique_ptr<DB> db;
  s = DB::Open(db_options, "/path/to/db", column_families, &handles, &db);
}
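Because handles[i] corresponds to column_families[i] and ListColumnFamilies() returns names in unspecified order, it can be convenient to look handles up by name instead of by position. A minimal sketch of a hypothetical helper, IndexHandlesByName; the handle type is a template parameter so the sketch compiles without RocksDB headers, and in real code it would be ColumnFamilyHandle:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Builds a name -> handle lookup from two parallel vectors: the names used
// to build the descriptors and the handles filled in by DB::Open().
// Handle is a template parameter (an assumption for self-containment);
// in real code it would be rocksdb::ColumnFamilyHandle.
template <typename Handle>
std::unordered_map<std::string, Handle*> IndexHandlesByName(
    const std::vector<std::string>& names,
    const std::vector<Handle*>& handles) {
  assert(names.size() == handles.size());
  std::unordered_map<std::string, Handle*> by_name;
  for (std::size_t i = 0; i < names.size(); ++i) {
    by_name.emplace(names[i], handles[i]);
  }
  return by_name;
}
```

The map does not own the handles; each one still has to be destroyed before the DB is closed.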
ListColumnFamilies
Get list of all column families in a database.
static Status ListColumnFamilies(const DBOptions& db_options,
                                 const std::string& name,
                                 std::vector<std::string>* column_families);
db_options
const DBOptions&
Database options (primarily for env)
name
const std::string&
Path of the database
column_families
std::vector<std::string>*
Output vector filled with column family names (ordering is unspecified)
Deleting Column Families
DropColumnFamily
Mark a column family for deletion.
virtual Status DropColumnFamily(ColumnFamilyHandle* column_family);
Dropping only writes a drop record to the manifest and stops further flushes and compactions for the column family. The column family is not fully removed until all of its handles are destroyed.
Status s = db->DropColumnFamily(cf_handle);
if (s.ok()) {
  // Column family marked for deletion
  // Still need to call DestroyColumnFamilyHandle
}
DropColumnFamilies
Bulk drop multiple column families.
virtual Status DropColumnFamilies(
    const std::vector<ColumnFamilyHandle*>& column_families);
Request may succeed partially. Use ListColumnFamilies() to check the result.
DestroyColumnFamilyHandle
Release and deallocate a column family handle.
virtual Status DestroyColumnFamilyHandle(ColumnFamilyHandle* column_family);
A column family is only fully removed once both of the following hold:
It has been dropped via DropColumnFamily()
All of its handles have been destroyed via DestroyColumnFamilyHandle()
Destroy every handle before closing the DB, except the one returned by DefaultColumnFamily().
// Proper cleanup sequence
Status s = db->DropColumnFamily(cf_handle);
if (!s.ok()) {
  // Handle error; the handle must still be destroyed
}
s = db->DestroyColumnFamilyHandle(cf_handle);
// After a successful drop and destroy, the column family is removed
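Since every handle has to be destroyed through the DB before the DB is closed, one option is to tie that call to scope exit. The following is a sketch of a hypothetical deleter and alias (HandleDeleter and ScopedHandle are illustrative names, not RocksDB types). The database and handle types are template parameters so the sketch compiles without RocksDB headers; in real code they would be DB and ColumnFamilyHandle.

```cpp
#include <memory>

// Deleter that hands a column family handle back to the database that
// created it. DB and Handle are template parameters (an assumption for
// self-containment); in real code they would be rocksdb::DB and
// rocksdb::ColumnFamilyHandle.
template <typename DB, typename Handle>
struct HandleDeleter {
  DB* db;
  void operator()(Handle* handle) const {
    if (db != nullptr && handle != nullptr) {
      // A deleter cannot report failure, so the returned status is discarded.
      static_cast<void>(db->DestroyColumnFamilyHandle(handle));
    }
  }
};

// Owning handle: destroyed through the DB when it goes out of scope.
template <typename DB, typename Handle>
using ScopedHandle = std::unique_ptr<Handle, HandleDeleter<DB, Handle>>;
```

A scoped handle only covers the destroy step. Dropping the column family remains a separate, explicit call, and all scoped handles must go out of scope before the DB itself is closed.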
Using Column Families
Reading and Writing
Most DB operations accept an optional ColumnFamilyHandle parameter.
Put Example
// Write to specific column family
Status s = db->Put(WriteOptions(), cf_handle, "key", "value");
// Write to default column family
s = db->Put(WriteOptions(), "key", "value");
Atomic Writes Across Column Families
Use WriteBatch to write atomically across multiple column families.
WriteBatch batch;
batch.Put(cf_handle1, "key1", "value1");
batch.Put(cf_handle2, "key2", "value2");
batch.Delete(cf_handle1, "key3");
Status s = db->Write(WriteOptions(), &batch);
// All operations succeed or all fail atomically
Iterating Column Families
Iterator Example
// Create iterator for specific column family
Iterator* it = db->NewIterator(ReadOptions(), cf_handle);
for (it->SeekToFirst(); it->Valid(); it->Next()) {
  std::cout << it->key().ToString() << ": "
            << it->value().ToString() << std::endl;
}
// Valid() also returns false on error, so check the status after the loop
if (!it->status().ok()) {
  // Handle error
}
delete it;
Column Family Options
Each column family can have independent configuration.
// Small writes, frequent compaction
ColumnFamilyOptions small_cf_opts;
small_cf_opts.write_buffer_size = 16 << 20;  // 16MB
small_cf_opts.level0_file_num_compaction_trigger = 2;
// Large writes, less frequent compaction
ColumnFamilyOptions large_cf_opts;
large_cf_opts.write_buffer_size = 256 << 20;  // 256MB
large_cf_opts.level0_file_num_compaction_trigger = 8;
large_cf_opts.target_file_size_base = 512 << 20;  // 512MB
ColumnFamilyHandle* small_cf;
ColumnFamilyHandle* large_cf;
db->CreateColumnFamily(small_cf_opts, "small_cf", &small_cf);
db->CreateColumnFamily(large_cf_opts, "large_cf", &large_cf);
Best Practices
When to Use Column Families
Good Use Cases
Multi-tenancy (one CF per tenant)
Different data types with different access patterns
Time-series data (one CF per time bucket)
Different durability/performance requirements
Avoid When
Very large number of column families (>100)
Frequently creating/deleting column families
All data has similar characteristics
Simple key prefixes would suffice
Shared WAL: All column families share the same Write-Ahead Log. A WAL file cannot be deleted until every column family with data in it has been flushed, so a column family that flushes rarely can hold back WAL cleanup for the others.
Compaction: Each column family is compacted independently, which can help or hurt depending on your workload.
Memory: Each column family has its own memtable(s), so memory usage scales with the number of CFs.
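The memory point can be estimated with simple arithmetic. The sketch below computes a rough upper bound on memtable memory from write_buffer_size and max_write_buffer_number per column family. The struct is a stand-in whose field names mirror ColumnFamilyOptions, and the bound ignores any global cap such as db_write_buffer_size, so treat it as an approximation.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the two ColumnFamilyOptions fields that bound memtable
// memory; defined here so the sketch compiles without RocksDB headers.
struct MemtableSettings {
  std::uint64_t write_buffer_size;        // bytes per memtable
  std::uint64_t max_write_buffer_number;  // memtables kept in memory
};

// Rough upper bound: each column family may hold up to
// max_write_buffer_number memtables of write_buffer_size bytes each.
inline std::uint64_t MemtableUpperBoundBytes(
    const std::vector<MemtableSettings>& cfs) {
  std::uint64_t total = 0;
  for (const auto& cf : cfs) {
    total += cf.write_buffer_size * cf.max_write_buffer_number;
  }
  return total;
}
```

For example, a 16MB and a 256MB column family with two write buffers each come to at most 544MB of memtable memory under this estimate.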
Common Patterns
Separation by Access Pattern
// Hot data - frequently accessed, tuned for point lookups
ColumnFamilyOptions hot_opts;
hot_opts.OptimizeForPointLookup(1024);  // 1024MB (1GB) block cache
// Cold data - rarely accessed, can use more compression
ColumnFamilyOptions cold_opts;
cold_opts.compression = kZSTD;
cold_opts.bottommost_compression = kZSTD;
ColumnFamilyHandle* hot_cf;
ColumnFamilyHandle* cold_cf;
db->CreateColumnFamily(hot_opts, "hot_data", &hot_cf);
db->CreateColumnFamily(cold_opts, "cold_data", &cold_cf);
Default Column Family
Every database has a default column family that always exists.
extern const std::string kDefaultColumnFamilyName;  // "default"
// Get default column family handle
ColumnFamilyHandle* default_cf = db->DefaultColumnFamily();
// Most methods use default CF when handle is not specified
std::string value;
db->Put(WriteOptions(), "key", "value");  // Uses default CF
db->Get(ReadOptions(), "key", &value);    // Uses default CF
The default column family cannot be dropped, and the handle returned by DefaultColumnFamily() must not be destroyed.