Overview

MC-CPP includes dedicated profiling benchmarks to measure performance-critical operations. The test suite identifies hotspots and validates optimization strategies.

Benchmark Architecture

Benchmarks use std::chrono::high_resolution_clock for precise timing measurements:
using Clock = std::chrono::high_resolution_clock;

auto start = Clock::now();
// ... operation to measure
auto end = Clock::now();
double ms = std::chrono::duration<double, std::milli>(end - start).count();

Chunk Meshing Benchmarks

Implementation (tests/perf_bench.cpp)

Measures mesh generation performance for different chunk densities.

Helper: Block Type Setup

static BlockType* make_block_type(bool transparent, bool translucent = false) {
    auto* bt = new BlockType();
    bt->transparent = transparent;
    bt->glass = false;
    bt->translucent = translucent;
    bt->is_cube = true;
    bt->vertex_positions = std::vector<std::vector<float>>(6, std::vector<float>(12, 0.0f));
    bt->tex_coords = std::vector<std::vector<float>>(6, std::vector<float>(8, 0.0f));
    bt->tex_indices = std::vector<int>(6, 0);
    bt->shading_values = std::vector<std::vector<float>>(6, std::vector<float>(4, 1.0f));
    return bt;
}

Test World Builder

static std::unique_ptr<World> build_world_for_bench() {
    auto world = std::make_unique<World>(nullptr, nullptr, nullptr);
    world->block_types.resize(76);
    world->block_types[1] = make_block_type(false);  // opaque block
    world->block_types[10] = make_block_type(false); // light source
    return world;
}

Dense vs Sparse Filling

static void fill_chunk(Chunk* chunk, int block_id, bool sparse) {
    for (int x = 0; x < CHUNK_WIDTH; x++) {
        for (int z = 0; z < CHUNK_LENGTH; z++) {
            for (int y = 0; y < CHUNK_HEIGHT; y++) {
                bool place = !sparse || ((x + y + z) % 31 == 0);
                chunk->blocks[x][y][z] = place ? block_id : 0;
            }
        }
    }
}
  • Dense: all blocks filled (16×128×16 = 32,768 blocks)
  • Sparse: ~3% of blocks filled via the (x + y + z) % 31 == 0 pattern

Mesh Rebuilding Benchmark

double bench_chunk_meshing(bool sparse, int iterations) {
    auto world = build_world_for_bench();
    Chunk chunk(world.get(), {0, 0, 0});
    fill_chunk(&chunk, 1, sparse);

    auto rebuild_subchunks = [&]() {
        for (auto& kv : chunk.subchunks) {
            kv.second->update_mesh();
        }
    };

    rebuild_subchunks(); // warmup
    chunk.update_mesh();

    auto start = Clock::now();
    for (int i = 0; i < iterations; i++) {
        rebuild_subchunks();
        chunk.update_mesh();
    }
    auto end = Clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}
Measures: time to rebuild all subchunk meshes and merge them into the final chunk mesh.
Expected results:
  • Dense chunks: Higher mesh generation time (more visible faces)
  • Sparse chunks: Lower time (most blocks culled by neighbors)

Usage

int main() {
    double dense_mesh = bench_chunk_meshing(false, 10);
    double sparse_mesh = bench_chunk_meshing(true, 10);
    std::cout << "[meshing] dense chunk avg:  " << dense_mesh << " ms per rebuild\n";
    std::cout << "[meshing] sparse chunk avg: " << sparse_mesh << " ms per rebuild\n";
    return 0;
}

Block Operations Benchmark

Set/Remove Block Performance

double bench_set_block(int block_id, int iterations) {
    auto world = build_world_for_bench();
    std::vector<glm::ivec3> positions(iterations);
    for (int i = 0; i < iterations; i++) {
        int x = i % 16;
        int z = (i / 16) % 16;
        int y = 60 + (i % 8);
        positions[i] = {x, y, z};
    }

    auto start = Clock::now();
    for (int i = 0; i < iterations; i++) {
        world->set_block(positions[i], block_id);
        world->set_block(positions[i], 0); // remove to trigger reverse lighting
    }
    auto end = Clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
Tests:
  • Opaque block placement/removal (block_id = 1)
  • Light source placement/removal (block_id = 10)
Measures:
  • Block data update time
  • Lighting queue population
  • Mesh invalidation overhead

Usage

const int set_iters = 500;
double opaque_ms = bench_set_block(1, set_iters);
double light_ms = bench_set_block(10, set_iters);

std::cout << "[set_block] " << set_iters << " opaque place/remove: " << opaque_ms << " ms total\n";
std::cout << "[set_block] " << set_iters << " light place/remove:  " << light_ms << " ms total\n";

Hotspot Analysis (tests/perf_hotspots.cpp)

Chunk Sorting Optimization

Compares distance calculation methods for chunk rendering order.

Method 1: Direct Distance with sqrt

double bench_sort_with_sqrt(int count, int iterations) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-500.0f, 500.0f);

    glm::vec3 player(0.0f);
    std::vector<glm::vec3> base(count);
    for (auto& v : base) v = {dist(rng), dist(rng), dist(rng)};

    double total_ms = 0.0;
    for (int i = 0; i < iterations; i++) {
        auto data = base; // copy to avoid best-case sorts
        auto start = Clock::now();
        std::sort(data.begin(), data.end(), [&](const glm::vec3& a, const glm::vec3& b) {
            // Matches World::prepare_rendering comparator (glm::distance -> sqrt)
            return glm::distance(player, a) > glm::distance(player, b);
        });
        auto end = Clock::now();
        total_ms += std::chrono::duration<double, std::milli>(end - start).count();
    }
    return total_ms / iterations;
}
Performance issue: glm::distance computes sqrt(dx² + dy² + dz²) for every comparison

Method 2: Squared Distance (Optimized)

double bench_sort_with_dist2(int count, int iterations) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(-500.0f, 500.0f);

    glm::vec3 player(0.0f);
    std::vector<std::pair<float, glm::vec3>> base(count);
    for (auto& v : base) {
        glm::vec3 p{dist(rng), dist(rng), dist(rng)};
        float d2 = glm::length2(p - player);
        v = {d2, p};
    }

    double total_ms = 0.0;
    for (int i = 0; i < iterations; i++) {
        auto data = base;
        auto start = Clock::now();
        std::sort(data.begin(), data.end(), [](const auto& a, const auto& b) {
            return a.first > b.first; // sort by squared distance
        });
        auto end = Clock::now();
        total_ms += std::chrono::duration<double, std::milli>(end - start).count();
    }
    return total_ms / iterations;
}
Optimization: pre-compute squared distances once and skip the sqrt in the comparator.
Ordering preserved: since sqrt is strictly increasing on non-negative values, dist²(a) > dist²(b) ⇔ dist(a) > dist(b).

Mesh Merging Optimization

Compares vector concatenation strategies for merging subchunk meshes.

Without Reserve (Naive)

double bench_merge_no_reserve(const std::vector<DummySubchunk>& subs, int iterations) {
    double total_ms = 0.0;
    for (int i = 0; i < iterations; i++) {
        std::vector<float> mesh;
        std::vector<float> translucent;
        auto start = Clock::now();
        for (auto& sc : subs) {
            mesh.insert(mesh.end(), sc.mesh.begin(), sc.mesh.end());
            translucent.insert(translucent.end(), sc.translucent.begin(), sc.translucent.end());
        }
        auto end = Clock::now();
        total_ms += std::chrono::duration<double, std::milli>(end - start).count();
    }
    return total_ms / iterations;
}
Performance issue: Repeated vector reallocations as size grows

With Reserve (Optimized)

double bench_merge_with_reserve(const std::vector<DummySubchunk>& subs, int iterations) {
    double total_ms = 0.0;
    size_t mesh_total = 0;
    size_t translucent_total = 0;
    for (auto& sc : subs) {
        mesh_total += sc.mesh.size();
        translucent_total += sc.translucent.size();
    }

    for (int i = 0; i < iterations; i++) {
        std::vector<float> mesh;
        std::vector<float> translucent;
        mesh.reserve(mesh_total);
        translucent.reserve(translucent_total);

        auto start = Clock::now();
        for (auto& sc : subs) {
            mesh.insert(mesh.end(), sc.mesh.begin(), sc.mesh.end());
            translucent.insert(translucent.end(), sc.translucent.begin(), sc.translucent.end());
        }
        auto end = Clock::now();
        total_ms += std::chrono::duration<double, std::milli>(end - start).count();
    }
    return total_ms / iterations;
}
Optimization: Pre-allocate final size to avoid reallocations

Running Hotspot Tests

int main() {
    const int chunk_count = 2000;
    const int iterations = 6;

    double sqrt_sort = bench_sort_with_sqrt(chunk_count, iterations);
    double dist2_sort = bench_sort_with_dist2(chunk_count, iterations);

    std::cout << "Sort by glm::distance (sqrt comparator): " << sqrt_sort << " ms avg\n";
    std::cout << "Sort by squared distance (precomputed):  " << dist2_sort << " ms avg\n";

    // Simulate 32 subchunks with moderate mesh payload
    auto subs = make_subchunks(32, 12000, 3000);
    double merge_plain = bench_merge_no_reserve(subs, iterations);
    double merge_reserved = bench_merge_with_reserve(subs, iterations);

    std::cout << "Merge subchunk meshes without reserve:    " << merge_plain << " ms avg\n";
    std::cout << "Merge subchunk meshes with pre-reserve:   " << merge_reserved << " ms avg\n";
    return 0;
}

Building and Running Tests

Compilation

# From project root
mkdir -p build
cd build
cmake ..
make

# Run benchmarks
./tests/perf_bench
./tests/perf_hotspots

Expected Output

perf_bench.cpp:
[set_block] 500 opaque place/remove: X.XX ms total
[set_block] 500 light place/remove:  Y.YY ms total
[meshing] dense chunk avg:  A.AA ms per rebuild
[meshing] sparse chunk avg: B.BB ms per rebuild
perf_hotspots.cpp:
Sort by glm::distance (sqrt comparator): X.XX ms avg
Sort by squared distance (precomputed):  Y.YY ms avg
Merge subchunk meshes without reserve:    A.AA ms avg
Merge subchunk meshes with pre-reserve:   B.BB ms avg

Profiling Workflow

1. Identify Hotspot

Use an external profiler for runtime analysis of the full application:
Linux: perf or valgrind --tool=callgrind
perf record ./MC-CPP
perf report
Windows: Visual Studio Performance Profiler or Intel VTune
Cross-platform: Tracy Profiler (requires instrumentation)

2. Write Isolated Benchmark

Extract suspected hotspot into standalone test:
// tests/perf_my_feature.cpp
#include <chrono>
#include <iostream>

using Clock = std::chrono::high_resolution_clock;

double bench_my_operation(int iterations) {
    auto start = Clock::now();
    for (int i = 0; i < iterations; i++) {
        // operation to test
    }
    auto end = Clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

int main() {
    double time_ms = bench_my_operation(1000);
    std::cout << "Average: " << (time_ms / 1000.0) << " ms per operation\n";
    return 0;
}

3. Test Optimization

Compare baseline vs optimized implementation:
double baseline = bench_original(iterations);
double optimized = bench_improved(iterations);
double speedup = baseline / optimized;

std::cout << "Baseline:  " << baseline << " ms\n";
std::cout << "Optimized: " << optimized << " ms\n";
std::cout << "Speedup:   " << speedup << "x\n";

4. Validate in Production

Confirm improvement in actual gameplay:
  • Monitor frame times during chunk loading
  • Test with various RENDER_DISTANCE settings
  • Verify no visual regressions

Performance Metrics

Target Frame Budget (60 FPS)

  • Total frame time: 16.67 ms
  • Chunk mesh updates: < 2 ms
  • Lighting propagation: < 1 ms
  • Rendering: < 10 ms
  • Input/physics: < 2 ms
  • Overhead: < 2 ms

Bottleneck Identification

Common symptoms:
CPU-bound:
  • High frame time with low GPU utilization
  • Hotspots: mesh generation, lighting BFS, sorting
  • Solution: Reduce CHUNK_UPDATES, LIGHT_STEPS_PER_TICK
GPU-bound:
  • GPU at 100% utilization
  • Hotspots: fragment shader complexity, overdraw
  • Solution: Reduce RENDER_DISTANCE, disable SMOOTH_LIGHTING, lower ANTIALIASING
Memory-bound:
  • Stuttering during chunk load/unload
  • Hotspots: texture uploads, VBO uploads
  • Solution: Reduce RENDER_DISTANCE, optimize atlas packing

Adding New Benchmarks

Template

#include <chrono>
#include <iostream>

using Clock = std::chrono::high_resolution_clock;

double benchmark_operation(int iterations) {
    // Setup
    auto data = prepare_test_data();
    
    // Warmup (avoid cold cache effects)
    perform_operation(data);
    
    // Measure
    auto start = Clock::now();
    for (int i = 0; i < iterations; i++) {
        perform_operation(data);
    }
    auto end = Clock::now();
    
    // Return average time per iteration
    return std::chrono::duration<double, std::milli>(end - start).count() / iterations;
}

int main() {
    const int iterations = 100;
    double avg_ms = benchmark_operation(iterations);
    std::cout << "Average: " << avg_ms << " ms\n";
    return 0;
}

Best Practices

  1. Warmup runs: Execute operation once before timing to warm caches
  2. Multiple iterations: Average across many runs to reduce noise
  3. Representative data: Use realistic data sizes and distributions
  4. Isolated testing: Minimize external dependencies
  5. Consistent environment: Close other applications, disable CPU throttling
  6. Statistical validation: Report min/max/stddev for stability analysis
