
Overview

Column families provide a way to logically partition a RocksDB database. Each column family can have its own options and configuration, while sharing the same Write-Ahead Log (WAL). This enables efficient multi-tenancy and data segregation within a single database.

ColumnFamilyDescriptor

Describes a column family with its name and options.
struct ColumnFamilyDescriptor {
  std::string name;
  ColumnFamilyOptions options;
  
  ColumnFamilyDescriptor();
  ColumnFamilyDescriptor(const std::string& _name,
                         const ColumnFamilyOptions& _options);
};
name
std::string
Name of the column family. The default column family name is stored in kDefaultColumnFamilyName.
options
ColumnFamilyOptions
Configuration options specific to this column family
// Create descriptor for default column family
ColumnFamilyDescriptor default_cf(
    kDefaultColumnFamilyName, 
    ColumnFamilyOptions());

// Create descriptor for custom column family
ColumnFamilyOptions cf_options;
cf_options.write_buffer_size = 128 << 20;  // 128MB
ColumnFamilyDescriptor user_data_cf("user_data", cf_options);
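The size expressions in these examples, such as 128 << 20, are evaluated as int before being assigned, so a larger value such as 4096 << 20 would overflow a 32-bit int. A minimal sketch of a helper that does the arithmetic in 64 bits; MiB is a hypothetical name, not part of RocksDB:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a RocksDB API): converts mebibytes to bytes
// using 64-bit arithmetic so large sizes do not overflow int.
constexpr std::uint64_t MiB(std::uint64_t n) { return n << 20; }

// Matches the literal used in the example above.
static_assert(MiB(128) == 134217728ULL, "128 MiB in bytes");
// A value that would not fit in a 32-bit int.
static_assert(MiB(4096) == 4294967296ULL, "4 GiB in bytes");
```

With it, the assignment could be written as cf_options.write_buffer_size = MiB(128);.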

ColumnFamilyHandle

Handle to access a specific column family within the database.
class ColumnFamilyHandle {
 public:
  virtual ~ColumnFamilyHandle();
  
  virtual const std::string& GetName() const = 0;
  virtual uint32_t GetID() const = 0;
  virtual Status GetDescriptor(ColumnFamilyDescriptor* desc) = 0;
  virtual const Comparator* GetComparator() const = 0;
};

GetName

virtual const std::string& GetName() const = 0;
name
const std::string&
Returns the name of the column family associated with this handle

GetID

virtual uint32_t GetID() const = 0;
id
uint32_t
Returns the ID of the column family

GetDescriptor

virtual Status GetDescriptor(ColumnFamilyDescriptor* desc) = 0;
desc
ColumnFamilyDescriptor*
Output parameter filled with up-to-date descriptor
This call may lock and release DB mutex to access up-to-date CF options. Pointer-typed options cannot be referenced longer than the original options exist.

GetComparator

virtual const Comparator* GetComparator() const = 0;
comparator
const Comparator*
Returns the comparator of the column family

Creating Column Families

CreateColumnFamily

Create a single column family.
virtual Status CreateColumnFamily(const ColumnFamilyOptions& options,
                                  const std::string& column_family_name,
                                  ColumnFamilyHandle** handle);
options
const ColumnFamilyOptions&
Options for the new column family
column_family_name
const std::string&
Name of the column family to create
handle
ColumnFamilyHandle**
Output parameter for the column family handle
Creating many column families one-by-one is not recommended due to quadratic overheads (e.g., writing a full OPTIONS file for all CFs after each creation). Use CreateColumnFamilies() or DB::Open() with create_missing_column_families=true instead.
ColumnFamilyOptions cf_options;
cf_options.write_buffer_size = 64 << 20;

ColumnFamilyHandle* cf_handle;
Status s = db->CreateColumnFamily(cf_options, "new_cf", &cf_handle);
if (!s.ok()) {
  // Handle error
}

CreateColumnFamilies (Same Options)

Bulk create column families with the same options.
virtual Status CreateColumnFamilies(
    const ColumnFamilyOptions& options,
    const std::vector<std::string>& column_family_names,
    std::vector<ColumnFamilyHandle*>* handles);
column_family_names
const std::vector<std::string>&
Names of column families to create
handles
std::vector<ColumnFamilyHandle*>*
Output vector for column family handles
In case of error, the request may succeed partially. The handles vector will contain handles for successfully created column families.
ColumnFamilyOptions cf_options;
std::vector<std::string> cf_names = {"cf1", "cf2", "cf3"};
std::vector<ColumnFamilyHandle*> handles;

Status s = db->CreateColumnFamilies(cf_options, cf_names, &handles);
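if (!s.ok()) {
  // Partial success: handles holds only the column families that were created
  std::cerr << "Created " << handles.size() << " out of "
            << cf_names.size() << " column families: "
            << s.ToString() << std::endl;
}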
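After a partial failure it can help to know which names still need to be created. The sketch below assumes the names of the created column families have been collected by calling GetName() on each returned handle; MissingNames is a hypothetical helper, not a RocksDB function:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the requested names that do not appear in
// `created`, preserving the order in which they were requested.
std::vector<std::string> MissingNames(
    const std::vector<std::string>& requested,
    const std::vector<std::string>& created) {
  const std::unordered_set<std::string> done(created.begin(), created.end());
  std::vector<std::string> missing;
  for (const auto& name : requested) {
    if (done.count(name) == 0) {
      missing.push_back(name);
    }
  }
  return missing;
}
```

The result could be passed to a second CreateColumnFamilies() call once the cause of the error has been addressed.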

CreateColumnFamilies (Different Options)

Bulk create column families with individual options.
virtual Status CreateColumnFamilies(
    const std::vector<ColumnFamilyDescriptor>& column_families,
    std::vector<ColumnFamilyHandle*>* handles);
column_families
const std::vector<ColumnFamilyDescriptor>&
Descriptors for each column family to create
std::vector<ColumnFamilyDescriptor> column_families;

// Small memtable: flushes to disk more often, uses less memory
ColumnFamilyOptions fast_opts;
fast_opts.write_buffer_size = 16 << 20;
column_families.push_back(ColumnFamilyDescriptor("cache", fast_opts));

// Large memtable and larger SST files: fewer flushes, more memory
ColumnFamilyOptions durable_opts;
durable_opts.write_buffer_size = 128 << 20;
durable_opts.target_file_size_base = 256 << 20;
column_families.push_back(ColumnFamilyDescriptor("persistent", durable_opts));

std::vector<ColumnFamilyHandle*> handles;
Status s = db->CreateColumnFamilies(column_families, &handles);

Opening Database with Column Families

Open with All Column Families

static Status Open(const DBOptions& db_options,
                   const std::string& name,
                   const std::vector<ColumnFamilyDescriptor>& column_families,
                   std::vector<ColumnFamilyHandle*>* handles,
                   std::unique_ptr<DB>* dbptr);
You must open ALL column families in the database. Use ListColumnFamilies() to get the list of existing column families.
// First, list existing column families
DBOptions db_options;
std::vector<std::string> cf_names;
Status s = DB::ListColumnFamilies(db_options, "/path/to/db", &cf_names);

if (s.ok()) {
  // Build descriptors for all column families
  std::vector<ColumnFamilyDescriptor> column_families;
  for (const auto& name : cf_names) {
    column_families.push_back(ColumnFamilyDescriptor(
        name, ColumnFamilyOptions()));
  }
  
  // Open database
  std::vector<ColumnFamilyHandle*> handles;
  std::unique_ptr<DB> db;
  s = DB::Open(db_options, "/path/to/db", column_families, &handles, &db);
  
  if (s.ok()) {
    // handles[i] corresponds to column_families[i]
  }
} else if (s.IsPathNotFound()) {
  // New database - create with default column family
  std::vector<ColumnFamilyDescriptor> column_families;
  column_families.push_back(ColumnFamilyDescriptor(
      kDefaultColumnFamilyName, ColumnFamilyOptions()));
  
  db_options.create_if_missing = true;
  std::vector<ColumnFamilyHandle*> handles;
  std::unique_ptr<DB> db;
  s = DB::Open(db_options, "/path/to/db", column_families, &handles, &db);
}
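Because handles[i] corresponds to column_families[i], looking up a handle by name requires knowing its position. A minimal sketch of a name-to-index map built from the same list of names used to build the descriptors; IndexByName is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps each column family name to its position in the
// list passed to DB::Open(), which is also its position in `handles`.
std::unordered_map<std::string, std::size_t> IndexByName(
    const std::vector<std::string>& names) {
  std::unordered_map<std::string, std::size_t> index;
  for (std::size_t i = 0; i < names.size(); ++i) {
    const bool inserted = index.emplace(names[i], i).second;
    assert(inserted && "column family names must be unique");
    (void)inserted;  // avoids an unused-variable warning when NDEBUG is set
  }
  return index;
}
```

After a successful open, a handle could then be fetched with handles[index.at("user_data")], where index is the map built from cf_names.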

ListColumnFamilies

Get list of all column families in a database.
static Status ListColumnFamilies(const DBOptions& db_options,
                                 const std::string& name,
                                 std::vector<std::string>* column_families);
db_options
const DBOptions&
Database options (primarily for env)
name
const std::string&
Path to the database
column_families
std::vector<std::string>*
Output vector filled with column family names (ordering is unspecified)
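When opening with create_missing_column_families=true, the descriptor list has to cover every existing column family plus any new ones the application wants. A sketch of merging the two name lists, assuming the default column family is named "default" as stated for kDefaultColumnFamilyName; NamesToOpen is a hypothetical helper:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns "default" first, then the existing names,
// then any wanted names that do not exist yet, without duplicates.
std::vector<std::string> NamesToOpen(const std::vector<std::string>& existing,
                                     const std::vector<std::string>& wanted) {
  std::vector<std::string> result;
  std::unordered_set<std::string> seen;
  auto add = [&](const std::string& name) {
    if (seen.insert(name).second) {
      result.push_back(name);
    }
  };
  add("default");  // the default column family must always be opened
  for (const auto& name : existing) add(name);
  for (const auto& name : wanted) add(name);
  return result;
}
```

Each returned name would then be wrapped in a ColumnFamilyDescriptor with the options appropriate for that column family.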

Deleting Column Families

DropColumnFamily

Mark a column family for deletion.
virtual Status DropColumnFamily(ColumnFamilyHandle* column_family);
This only records a drop record in the manifest and prevents the column family from flushing and compacting. The column family is not fully removed until all handles are destroyed.
Status s = db->DropColumnFamily(cf_handle);
if (s.ok()) {
  // Column family marked for deletion
  // Still need to call DestroyColumnFamilyHandle
}

DropColumnFamilies

Bulk drop multiple column families.
virtual Status DropColumnFamilies(
    const std::vector<ColumnFamilyHandle*>& column_families);
Request may succeed partially. Use ListColumnFamilies() to check the result.

DestroyColumnFamilyHandle

Release and deallocate a column family handle.
virtual Status DestroyColumnFamilyHandle(ColumnFamilyHandle* column_family);
A column family is only fully removed once both of the following have happened:
  1. It has been dropped via DropColumnFamily()
  2. All of its handles have been destroyed via DestroyColumnFamilyHandle()
You must call this before closing the DB (except for DefaultColumnFamily() handle).
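// Proper cleanup sequence
Status drop_status = db->DropColumnFamily(cf_handle);
// Destroy the handle whether or not the drop succeeded;
// every handle must be released before the DB is closed
Status destroy_status = db->DestroyColumnFamilyHandle(cf_handle);
// If both calls succeeded and no other handles remain,
// the column family is now removed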
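Since every handle has to be destroyed before the DB is closed, one option is to tie that call to scope exit. The class below is a hypothetical RAII sketch, not part of RocksDB. It is written as a template so it only assumes the database type has a DestroyColumnFamilyHandle(handle) member, as shown above:

```cpp
#include <cassert>

// Hypothetical RAII wrapper: calls DestroyColumnFamilyHandle() on scope exit.
// The database object must outlive the wrapper.
template <typename DBType, typename HandleType>
class ScopedHandle {
 public:
  ScopedHandle(DBType* db, HandleType* handle) : db_(db), handle_(handle) {
    assert(db_ != nullptr);
  }
  ~ScopedHandle() { Reset(); }

  // Non-copyable: exactly one owner may destroy the handle.
  ScopedHandle(const ScopedHandle&) = delete;
  ScopedHandle& operator=(const ScopedHandle&) = delete;

  HandleType* get() const { return handle_; }

  // Destroys the handle now; safe to call more than once.
  void Reset() {
    if (handle_ != nullptr) {
      (void)db_->DestroyColumnFamilyHandle(handle_);  // result ignored here
      handle_ = nullptr;
    }
  }

 private:
  DBType* db_;
  HandleType* handle_;
};
```

With RocksDB types this would be instantiated as ScopedHandle<DB, ColumnFamilyHandle>. The sketch discards the returned Status, which a production version would probably want to log.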

Using Column Families

Reading and Writing

Most DB operations accept an optional ColumnFamilyHandle parameter.
// Write to specific column family
Status s = db->Put(WriteOptions(), cf_handle, "key", "value");

// Write to default column family
s = db->Put(WriteOptions(), "key", "value");

Atomic Writes Across Column Families

Use WriteBatch to write atomically across multiple column families.
WriteBatch batch;
batch.Put(cf_handle1, "key1", "value1");
batch.Put(cf_handle2, "key2", "value2");
batch.Delete(cf_handle1, "key3");

Status s = db->Write(WriteOptions(), &batch);
// All operations succeed or all fail atomically

Iterating Column Families

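// Create iterator for specific column family
std::unique_ptr<Iterator> it(db->NewIterator(ReadOptions(), cf_handle));
for (it->SeekToFirst(); it->Valid(); it->Next()) {
  std::cout << it->key().ToString() << ": " 
            << it->value().ToString() << std::endl;
}
if (!it->status().ok()) {
  // The loop ended because of an error, not because the data ran out
}
// The iterator is released when `it` goes out of scope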

Column Family Options

Each column family can have independent configuration.
// Small writes, frequent compaction
ColumnFamilyOptions small_cf_opts;
small_cf_opts.write_buffer_size = 16 << 20;  // 16MB
small_cf_opts.level0_file_num_compaction_trigger = 2;

// Large writes, less frequent compaction
ColumnFamilyOptions large_cf_opts;
large_cf_opts.write_buffer_size = 256 << 20;  // 256MB
large_cf_opts.level0_file_num_compaction_trigger = 8;
large_cf_opts.target_file_size_base = 512 << 20;

ColumnFamilyHandle* small_cf = nullptr;
ColumnFamilyHandle* large_cf = nullptr;
Status s = db->CreateColumnFamily(small_cf_opts, "small_cf", &small_cf);
if (s.ok()) {
  s = db->CreateColumnFamily(large_cf_opts, "large_cf", &large_cf);
}

Best Practices

When to Use Column Families

Good Use Cases

  • Multi-tenancy (one CF per tenant)
  • Different data types with different access patterns
  • Time-series data (one CF per time bucket)
  • Different durability/performance requirements

Avoid When

  • Very large number of column families (>100)
  • Frequently creating/deleting column families
  • All data has similar characteristics
  • Simple key prefixes would suffice
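For the last case, a key-prefix layout inside a single column family might look like the following sketch. It assumes the default bytewise key ordering and a hypothetical layout of tenant, a zero byte, then the user key; both function names are illustrative:

```cpp
#include <cassert>
#include <string>

// Hypothetical layout: "<tenant>\0<user_key>". The tenant must not contain
// a zero byte, otherwise tenant ranges could overlap.
std::string PrefixedKey(const std::string& tenant,
                        const std::string& user_key) {
  assert(tenant.find('\0') == std::string::npos);
  std::string key;
  key.reserve(tenant.size() + 1 + user_key.size());
  key.append(tenant);
  key.push_back('\0');
  key.append(user_key);
  return key;
}

// Exclusive upper bound for a scan over one tenant: it sorts after every
// key that starts with "<tenant>\0" under bytewise comparison.
std::string TenantUpperBound(const std::string& tenant) {
  std::string bound = tenant;
  bound.push_back('\x01');
  return bound;
}
```

A scan over one tenant would start at PrefixedKey(tenant, "") and stop before TenantUpperBound(tenant). The trade-off against column families is that all tenants then share one set of options and one compaction schedule.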

Performance Considerations

Shared WAL: All column families share the same Write-Ahead Log, so a slow flush in one CF can affect others.

Compaction: Each column family has independent compaction, which can help or hurt depending on your workload.

Memory: Each column family has its own memtable(s), so memory usage scales with number of CFs.
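To make the memory point concrete, a rough upper bound on memtable memory can be estimated as the sum of write_buffer_size times max_write_buffer_number over all column families. The sketch below is an approximation under that assumption, not an exact accounting; CfMemtableConfig and MemtableUpperBound are hypothetical names that mirror the two ColumnFamilyOptions fields:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct mirroring two ColumnFamilyOptions fields.
struct CfMemtableConfig {
  std::size_t write_buffer_size;  // bytes per memtable
  int max_write_buffer_number;    // memtables kept in memory (RocksDB default: 2)
};

// Rough estimate: every column family fills all of its memtables at once.
std::size_t MemtableUpperBound(const std::vector<CfMemtableConfig>& cfs) {
  std::size_t total = 0;
  for (const auto& cf : cfs) {
    assert(cf.max_write_buffer_number >= 1);
    total += cf.write_buffer_size *
             static_cast<std::size_t>(cf.max_write_buffer_number);
  }
  return total;
}
```

For a 16 MB and a 256 MB column family with two memtables each, the estimate is 544 MB. A DB-wide limit such as db_write_buffer_size can cap the total below this figure.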

Common Patterns

// Hot data - frequently read by key, tuned for point lookups
ColumnFamilyOptions hot_opts;
hot_opts.OptimizeForPointLookup(1024);  // block cache size in MB (1 GB)

// Cold data - rarely accessed, can use more compression
ColumnFamilyOptions cold_opts;
cold_opts.compression = kZSTD;
cold_opts.bottommost_compression = kZSTD;

ColumnFamilyHandle* hot_cf = nullptr;
ColumnFamilyHandle* cold_cf = nullptr;
db->CreateColumnFamily(hot_opts, "hot_data", &hot_cf);
db->CreateColumnFamily(cold_opts, "cold_data", &cold_cf);

Default Column Family

Every database has a default column family that always exists.
extern const std::string kDefaultColumnFamilyName;  // "default"
// Get default column family handle
ColumnFamilyHandle* default_cf = db->DefaultColumnFamily();

// Most methods use default CF when handle is not specified
std::string value;
db->Put(WriteOptions(), "key", "value");  // Uses default CF
db->Get(ReadOptions(), "key", &value);    // Uses default CF
You cannot drop the default column family, and you should not destroy the handle returned by DefaultColumnFamily().
