Tuning Vespa involves optimizing resource allocation, thread pools, caching, and query execution to maximize throughput and minimize latency.
Query Performance: Optimize search and ranking latency
Feed Performance: Maximize document indexing throughput
Resource Efficiency: Optimize CPU, memory, and disk usage
Thread Pool Configuration
Optimize container thread pools for query handling:
<container version="1.0" id="default">
  <search/>

  <!-- Configure search handler threads -->
  <handler id="com.yahoo.search.handler.SearchHandler">
    <binding>http://*/search/*</binding>
  </handler>

  <nodes>
    <!-- JVM heap tuning -->
    <jvm options="-Xms8g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200"/>
  </nodes>
</container>
Content Node Tuning
Configure executor threads on content nodes:
<content version="1.0" id="my-content">
  <tuning>
    <searchnode>
      <!-- Match (search) executor threads -->
      <requestthreads>
        <count>8</count>          <!-- Number of search threads -->
        <persearch>2</persearch>  <!-- Threads per search -->
      </requestthreads>

      <!-- Summary (docsum) threads -->
      <summary>
        <io>
          <threads>8</threads>  <!-- Summary fetch threads -->
        </io>
      </summary>
    </searchnode>
  </tuning>
  <nodes>
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
  </nodes>
</content>
// Executor metrics to watch (from SearchNodeMetrics.java)
CONTENT_PROTON_EXECUTOR_MATCH_QUEUESIZE // Match queue depth
CONTENT_PROTON_EXECUTOR_MATCH_UTILIZATION // Match thread utilization
CONTENT_PROTON_EXECUTOR_DOCSUM_QUEUESIZE // Docsum queue depth
CONTENT_PROTON_EXECUTOR_DOCSUM_UTILIZATION // Docsum utilization
// Threading service per document DB
CONTENT_PROTON_DOCUMENTDB_THREADING_SERVICE_MASTER_QUEUESIZE
CONTENT_PROTON_DOCUMENTDB_THREADING_SERVICE_INDEX_QUEUESIZE
CONTENT_PROTON_DOCUMENTDB_THREADING_SERVICE_SUMMARY_QUEUESIZE
Rule of thumb: start with one thread per CPU core, then adjust based on the utilization metrics above.
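These utilization and queue metrics can be read from a node's /state/v1/metrics endpoint. A minimal Python sketch, assuming the payload is an object with a metrics.values[] array of {name, values} entries (the exact metric names and port are assumptions about your deployment):

```python
import json
import urllib.request

def executor_utilization(metrics_json):
    """Pick executor utilization averages out of a /state/v1/metrics payload."""
    report = {}
    for metric in metrics_json.get("metrics", {}).get("values", []):
        name = metric.get("name", "")
        # Keep only executor utilization gauges,
        # e.g. content.proton.executor.match.utilization
        if "executor" in name and name.endswith("utilization"):
            report[name] = metric.get("values", {}).get("average")
    return report

def fetch_executor_utilization(host="localhost", port=19050):
    """Fetch a content node's state API and summarize executor utilization."""
    with urllib.request.urlopen(f"http://{host}:{port}/state/v1/metrics") as resp:
        return executor_utilization(json.load(resp))
```

If match utilization stays near 1.0 while queue sizes grow, add threads; if it stays low, fewer threads may reduce contention.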
Rank Profile Optimization
Use Phased Ranking
Optimize expensive ranking with two phases:

rank-profile optimized {
    first-phase {
        expression: bm25(title) + bm25(body)
    }
    second-phase {
        expression: xgboost("model.json")
        rerank-count: 100  # Only rerank top 100
    }
}
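Conceptually, phased ranking trades a cheap scoring pass over all matches for an expensive pass over only the best candidates. A plain-Python model of the idea (a conceptual sketch, not Vespa's implementation; function names are illustrative):

```python
def two_phase_rank(docs, first_phase, second_phase, rerank_count=100):
    """Model of phased ranking: score every matched document with the cheap
    first-phase function, then re-score only the top rerank_count with the
    expensive second-phase function."""
    ranked = sorted(docs, key=first_phase, reverse=True)
    head, tail = ranked[:rerank_count], ranked[rerank_count:]
    return sorted(head, key=second_phase, reverse=True) + tail
```

Lowering rerank-count cuts second-phase cost proportionally, at the risk of missing documents the expensive model would have promoted.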
Monitor Ranking Metrics
// From SearchNodeMetrics.java
CONTENT_PROTON_DOCUMENTDB_MATCHING_DOCS_MATCHED // First phase
CONTENT_PROTON_DOCUMENTDB_MATCHING_DOCS_RANKED // First phase ranked
CONTENT_PROTON_DOCUMENTDB_MATCHING_DOCS_RERANKED // Second phase
// Per rank profile
CONTENT_PROTON_DOCUMENTDB_MATCHING_RANK_PROFILE_QUERY_LATENCY
CONTENT_PROTON_DOCUMENTDB_MATCHING_RANK_PROFILE_RERANK_TIME
Optimize Match Phase
Cap how many documents are fully matched and ranked, keeping the best by a quality attribute:

rank-profile with-match-phase inherits default {
    match-phase {
        attribute: quality_score
        max-hits: 10000
        max-filter-coverage: 0.5
    }
}
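What match-phase does can be sketched in plain Python (a conceptual model only; the real implementation degrades inside the matching loop rather than sorting all candidates):

```python
def match_phase_limit(candidates, quality, max_hits=10000):
    """Model of match-phase degradation: when more than max_hits documents
    would match, only the max_hits best by the quality attribute are fully
    matched and ranked."""
    if len(candidates) <= max_hits:
        return candidates
    return sorted(candidates, key=quality, reverse=True)[:max_hits]
```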
Attribute vs Index Trade-offs
Add fast-search to attributes that are frequently searched or filtered on:

schema product {
    document product {
        field category type string {
            indexing: summary | attribute
            attribute: fast-search  # Build index structures for fast lookup
        }
        field price type int {
            indexing: summary | attribute
            attribute: fast-search
        }
    }
}
When to use fast-search:
Low cardinality fields (< 10,000 unique values)
Frequently used in filters or grouping
Need fast counting/aggregation
Memory impact:

// Monitor attribute memory
CONTENT_PROTON_DOCUMENTDB_ATTRIBUTE_MEMORY_USAGE_ALLOCATED_BYTES
CONTENT_PROTON_DOCUMENTDB_ATTRIBUTE_MEMORY_USAGE_USED_BYTES
CONTENT_PROTON_DOCUMENTDB_ATTRIBUTE_RESOURCE_USAGE_ADDRESS_SPACE
Document Processing
Optimize feeding throughput:
<content version="1.0" id="my-content">
  <tuning>
    <searchnode>
      <index>
        <!-- Flush tuning -->
        <io>
          <write>directio</write>
        </io>
        <!-- Memory index flush threshold -->
        <maxflushed>2</maxflushed>
      </index>
    </searchnode>
  </tuning>
</content>
Feed Metrics
Container Feed Metrics
// From ContainerMetrics.java
FEED_OPERATIONS // Total feed operations
FEED_LATENCY // Feed latency
FEED_HTTP_REQUESTS // Feed HTTP requests
HTTPAPI_NUM_PUTS // Put operations
HTTPAPI_NUM_UPDATES // Update operations
HTTPAPI_NUM_REMOVES // Remove operations
HTTPAPI_LATENCY // Operation latency
Batch Feeding
Optimize for high-throughput ingestion:
import requests
from concurrent.futures import ThreadPoolExecutor

def batch_feed(documents, batch_size=1000, max_workers=8):
    """Feed documents in batches, posting each batch concurrently."""
    def put(doc):
        # Each document gets its own /document/v1 path, keyed by its id
        # (assumes each doc is a dict with 'id' and 'fields' keys)
        url = f"http://localhost:8080/document/v1/namespace/doctype/docid/{doc['id']}"
        return requests.post(
            url,
            json={'fields': doc['fields']},
            params={'timeout': '180s'},  # generous timeout under feed load
            headers={'Content-Type': 'application/json'},
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i in range(0, len(documents), batch_size):
            batch = documents[i:i + batch_size]
            # Post concurrently instead of one blocking request at a time
            for response in pool.map(put, batch):
                response.raise_for_status()
Feed rate limits : Monitor HTTPAPI_FAILED_TIMEOUT and CONTENT_PROTON_RESOURCE_USAGE_FEEDING_BLOCKED to detect throttling.
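A minimal throttling probe against the same state API, assuming the /state/v1/metrics payload shape and a metric whose name contains feeding_blocked that is nonzero while feed operations are being rejected:

```python
def feeding_blocked(metrics_json):
    """True when the content node reports feed blocking (resource limit hit)."""
    for metric in metrics_json.get("metrics", {}).get("values", []):
        if "feeding_blocked" in metric.get("name", ""):
            # A nonzero last value means feeding is currently blocked
            if (metric.get("values", {}).get("last") or 0) > 0:
                return True
    return False
```

Calling this between batches and backing off while it returns True avoids hammering a node that is already rejecting writes.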
Memory Optimization
Content Node Memory
Configure Memory Limits
<content version="1.0" id="my-content">
  <tuning>
    <searchnode>
      <resource-limits>
        <memory>0.85</memory>  <!-- 85% threshold -->
      </resource-limits>
    </searchnode>
  </tuning>
</content>
Optimize Document Store Cache
<content version="1.0" id="my-content">
  <tuning>
    <searchnode>
      <summary>
        <store>
          <cache>
            <maxsize>1073741824</maxsize>  <!-- 1GB cache -->
            <maxsize-percent>5</maxsize-percent>
          </cache>
        </store>
      </summary>
    </searchnode>
  </tuning>
</content>
Monitor cache effectiveness:

CONTENT_PROTON_DOCUMENTDB_READY_DOCUMENT_STORE_CACHE_HIT_RATE
CONTENT_PROTON_DOCUMENTDB_READY_DOCUMENT_STORE_CACHE_MEMORY_USAGE
CONTENT_PROTON_DOCUMENTDB_READY_DOCUMENT_STORE_CACHE_ELEMENTS
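One way to turn these metrics into a sizing decision (the 0.80 hit-rate and 0.95 fullness thresholds are illustrative heuristics, not Vespa defaults):

```python
def cache_undersized(hit_rate, memory_usage_bytes, max_size_bytes,
                     min_hit_rate=0.80, full_fraction=0.95):
    """Heuristic: the document store cache is likely undersized when the hit
    rate is poor while the cache is already nearly full. A low hit rate with
    a half-empty cache points at access patterns, not capacity."""
    nearly_full = memory_usage_bytes >= full_fraction * max_size_bytes
    return hit_rate < min_hit_rate and nearly_full
```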
Monitor Memory Pressure
# Check memory metrics
curl http://localhost:19050/state/v1/metrics | \
jq '.metrics.values[] | select(.name | contains("memory"))'
JVM Tuning (Container Nodes)
<container version="1.0" id="default">
  <nodes>
    <jvm options="-Xms8g -Xmx8g
                  -XX:+UseG1GC
                  -XX:MaxGCPauseMillis=200
                  -XX:InitiatingHeapOccupancyPercent=70
                  -XX:+ParallelRefProcEnabled
                  -XX:MaxTenuringThreshold=8"/>
  </nodes>
</container>
GC Metrics to Monitor:
JDISC_GC_COUNT // GC frequency
JDISC_GC_MS // GC pause time
MEM_HEAP_USED // Heap utilization
Heap size: Set to 50-75% of container node RAM
GC algorithm: Use G1GC for heaps > 4GB
Pause target: 200ms is reasonable for most applications
Monitor: GC should be < 5% of total CPU time
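The 5% rule can be checked from JDISC_GC_MS. A simplification: this compares summed GC pause time against wall-clock time over the metrics window, which approximates the CPU share for stop-the-world collections:

```python
def gc_time_fraction(gc_ms_total, window_seconds):
    """Fraction of the window spent in GC pauses (gc_ms_total is the
    summed JDISC_GC_MS over that window)."""
    return gc_ms_total / (window_seconds * 1000.0)

def gc_healthy(gc_ms_total, window_seconds, budget=0.05):
    """Apply the 'less than 5% in GC' rule of thumb."""
    return gc_time_fraction(gc_ms_total, window_seconds) < budget
```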
Storage Configuration
<content version="1.0" id="my-content">
  <tuning>
    <searchnode>
      <!-- Document store tuning -->
      <summary>
        <store>
          <!-- Compression -->
          <compression>
            <type>lz4</type>
            <level>6</level>  <!-- 1-9, higher = more compression -->
          </compression>
          <!-- File size -->
          <logstore>
            <maxfilesize>4000000000</maxfilesize>  <!-- 4GB files -->
          </logstore>
        </store>
      </summary>

      <!-- Index tuning -->
      <index>
        <io>
          <write>directio</write>  <!-- Bypass OS cache -->
          <read>directio</read>
        </io>
      </index>
    </searchnode>
  </tuning>
</content>
Disk Metrics
// Monitor disk usage (from SearchNodeMetrics.java)
CONTENT_PROTON_DOCUMENTDB_DISK_USAGE // Total disk usage
CONTENT_PROTON_DOCUMENTDB_READY_DOCUMENT_STORE_DISK_USAGE
CONTENT_PROTON_DOCUMENTDB_READY_DOCUMENT_STORE_DISK_BLOAT
CONTENT_PROTON_DOCUMENTDB_INDEX_DISK_USAGE
// Transaction log
CONTENT_PROTON_TRANSACTIONLOG_DISK_USAGE
CONTENT_PROTON_TRANSACTIONLOG_ENTRIES
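A useful derived signal from the metrics above is the bloat ratio: how much of the document store's disk footprint is dead data awaiting compaction. A small helper (the function name is illustrative):

```python
def document_store_bloat_ratio(disk_usage_bytes, disk_bloat_bytes):
    """Share of document store disk that is bloat, computed from the
    DISK_USAGE and DISK_BLOAT metrics above."""
    if disk_usage_bytes <= 0:
        return 0.0
    return disk_bloat_bytes / disk_usage_bytes
```

A persistently high ratio suggests flushing and compaction are falling behind the update rate.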
Index Cache Tuning
<content version="1.0" id="my-content">
  <tuning>
    <searchnode>
      <!-- Posting list cache -->
      <diskindexcache>
        <size>2147483648</size>  <!-- 2GB -->
      </diskindexcache>
    </searchnode>
  </tuning>
</content>
Monitor cache performance:
CONTENT_PROTON_INDEX_CACHE_POSTINGLIST_HIT_RATE
CONTENT_PROTON_INDEX_CACHE_POSTINGLIST_MEMORY_USAGE
CONTENT_PROTON_INDEX_CACHE_BITVECTOR_HIT_RATE
Network Optimization
Connection Tuning
<container version="1.0" id="default">
  <http>
    <server id="default" port="8080">
      <config name="jdisc.http.connector">
        <maxConnectionLife>300.0</maxConnectionLife>
        <idleTimeout>60.0</idleTimeout>
      </config>
    </server>
  </http>
</container>
Connection Metrics
// From ContainerMetrics.java
SERVER_NUM_OPEN_CONNECTIONS // Current open connections
SERVER_NUM_CONNECTIONS // Total connections
SERVER_CONNECTIONS_OPEN_MAX // Max concurrent connections
SERVER_CONNECTION_DURATION_MEAN // Average connection duration
Query Timeout Configuration
<container version="1.0" id="default">
  <search>
    <chain id="default" inherits="vespa">
      <searcher id="com.yahoo.search.searchers.TimeoutSearcher">
        <config name="search.searchers.timeout">
          <timeout>5.0</timeout>  <!-- 5 second query timeout -->
        </config>
      </searcher>
    </chain>
  </search>
</container>
Monitor timeout-related metrics:
ERROR_TIMEOUT // Timeout errors
CONTENT_PROTON_DOCUMENTDB_MATCHING_SOFT_DOOMED_QUERIES // Soft timeouts
QUERY_TIMEOUT // Configured timeout
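Timeouts can also be set per request through the query API's timeout parameter, which overrides the configured default for that query only. A sketch that builds such a request URL (the endpoint and YQL are illustrative):

```python
from urllib.parse import urlencode

def search_url(query, timeout="2s", endpoint="http://localhost:8080/search/"):
    """Build a query API URL with an explicit per-request timeout."""
    params = {
        "yql": "select * from sources * where userQuery()",
        "query": query,
        "timeout": timeout,  # overrides the configured default for this query
    }
    return endpoint + "?" + urlencode(params)
```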
Resource Prioritization
Feeding vs Queries
Balance resource allocation:
<content version="1.0" id="my-content">
  <tuning>
    <searchnode>
      <!-- Lower feeding priority during peak query times -->
      <feeding>
        <concurrency>0.5</concurrency>  <!-- 50% of capacity -->
      </feeding>
    </searchnode>
  </tuning>
</content>
Benchmarking and Testing
Load Testing
Establish Baseline
Measure performance before tuning:

# Use vespa-fbench or a custom load generator
vespa-fbench -n 100 -q queries.txt -s 30 -c 10 localhost 8080
Apply Tuning
Make one configuration change at a time
Measure Impact
Compare metrics:
Query latency (p50, p95, p99)
Throughput (QPS)
Resource utilization
Error rates
Iterate
Continue tuning based on results
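Comparing latency percentiles before and after each change can be done with a simple nearest-rank computation over the recorded samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile, e.g. p=95 for p95 latency."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]
```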
Avoid these common mistakes:
Over-provisioning threads: More threads != better performance
Ignoring cache metrics: Poor cache hit rates waste resources
Synchronous feeding: Always use async operations for high throughput
No query timeouts: Can cause resource exhaustion
Tuning without measuring: Always benchmark before and after changes
Next Steps
Monitoring: Track performance metrics
Scaling: Scale resources when tuning isn’t enough
Troubleshooting: Debug performance issues