Stress Testing

Stress testing pushes your system beyond normal operating capacity to identify breaking points, observe how it fails, and test recovery mechanisms.

Purpose

Stress tests help you:

Identify system breaking points and maximum capacity
Observe how the system fails under extreme load
Test system recovery after failure
Find memory leaks and resource exhaustion issues
Validate that the system degrades gracefully

Stress tests will likely cause errors and failures. This is intentional: the goal is to understand failure modes and breaking points.

Configuration Pattern

Stress tests ramp up beyond normal capacity:

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // Below normal load
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },  // Normal load
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },  // Around breaking point
    { duration: '5m', target: 300 },
    { duration: '2m', target: 400 },  // Beyond breaking point
    { duration: '5m', target: 400 },
    { duration: '10m', target: 0 },   // Recovery
  ],
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1);
}

Using the Ramping VUs Executor

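The top-level stages option is a shortcut for a single ramping-vus scenario. Declaring the scenario explicitly exposes settings such as startVUs and gracefulRampDown: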

export const options = {
  scenarios: {
    stress: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 300 },
        { duration: '5m', target: 300 },
        { duration: '2m', target: 400 },
        { duration: '5m', target: 400 },
        { duration: '10m', target: 0 },
      ],
      gracefulRampDown: '30s',
    },
  },
};

Ramp Up

Gradually increase to breaking point

Peak Stress

Maintain extreme load

Recovery

Monitor system recovery

Stress Test Stages

Baseline Load

Start below normal operating capacity to establish a baseline.

{ duration: '2m', target: 100 },
{ duration: '5m', target: 100 },

Normal Capacity

Reach expected peak load to verify normal operation.

{ duration: '2m', target: 200 },
{ duration: '5m', target: 200 },

Stress Zone

Push beyond normal capacity to find breaking points.

{ duration: '2m', target: 300 },
{ duration: '5m', target: 300 },
{ duration: '2m', target: 400 },
{ duration: '5m', target: 400 },

Recovery Period

Ramp down and observe how the system recovers.

{ duration: '10m', target: 0 },
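
Adding up the stage durations gives the planned run time, which helps when booking a test window. The following is a minimal sketch in plain JavaScript, assuming durations use only the h, m, and s units that appear in this guide. The helper names durationToSeconds and totalSeconds are hypothetical and are not part of k6.

```javascript
// Hypothetical helpers (not part of k6): sum the planned duration of a stages array.
// Assumes durations are strings built from h/m/s parts, such as '2m', '30s' or '1h30m'.
function durationToSeconds(duration) {
  const unitSeconds = { h: 3600, m: 60, s: 1 };
  let total = 0;
  let matchedLength = 0;
  for (const [part, value, unit] of duration.matchAll(/(\d+)([hms])/g)) {
    total += Number(value) * unitSeconds[unit];
    matchedLength += part.length;
  }
  // Reject empty strings and anything the pattern did not fully consume (for example '500ms').
  if (duration.length === 0 || matchedLength !== duration.length) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  return total;
}

function totalSeconds(stages) {
  return stages.reduce((sum, stage) => sum + durationToSeconds(stage.duration), 0);
}

// The stages from the configuration pattern in this guide.
const plannedStages = [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 200 },
  { duration: '5m', target: 200 },
  { duration: '2m', target: 300 },
  { duration: '5m', target: 300 },
  { duration: '2m', target: 400 },
  { duration: '5m', target: 400 },
  { duration: '10m', target: 0 },
];

console.log(totalSeconds(plannedStages) / 60); // 38 (minutes)
```

The stages in this guide therefore take 38 minutes, of which 10 are the recovery period.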

Advanced Stress Testing

Multi-Stage Stress Pattern

Test multiple stress levels, expressed as percentages of normal load (100 VUs in this example):

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    // Warm up
    { duration: '1m', target: 50 },
    
    // Stress level 1: 150% of normal
    { duration: '3m', target: 150 },
    { duration: '5m', target: 150 },
    
    // Stress level 2: 200% of normal
    { duration: '3m', target: 200 },
    { duration: '5m', target: 200 },
    
    // Stress level 3: 300% of normal
    { duration: '3m', target: 300 },
    { duration: '5m', target: 300 },
    
    // Recovery
    { duration: '10m', target: 0 },
  ],
  thresholds: {
    // More lenient thresholds for stress tests
    http_req_duration: ['p(95)<2000'],
    http_req_failed: ['rate<0.1'], // Allow up to 10% errors
  },
};

export default function() {
  // No sleep() here: each VU sends requests back to back, which maximizes pressure per VU
  const res = http.get('https://quickpizza.grafana.com/api/pizza');
  check(res, { 'status is 200': (r) => r.status === 200 });
}

Stress test thresholds are typically more lenient than load tests since you expect the system to struggle under extreme conditions.

What to Monitor

System Metrics

Response times: When do they start degrading?
Error rates: At what load do errors appear?
Throughput: Where does it plateau?
Resource usage: CPU, memory, disk, network
Queue depths: Database connections, message queues

Breaking Point Indicators

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter } from 'k6/metrics';

const errors = new Counter('errors');
const timeouts = new Counter('timeouts');

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
  ],
  thresholds: {
    errors: ['count<100'],
    timeouts: ['count<50'],
  },
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com', {
    timeout: '10s',
  });
  
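  // Status 0 means no HTTP response arrived. That includes timeouts,
  // but also refused connections and DNS failures.
  if (res.status === 0) {
    timeouts.add(1);
  }
  
  // Any 4xx or 5xx response counts as an error
  if (res.status >= 400) {
    errors.add(1);
  }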
  
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  
  sleep(1);
}

Use custom metrics to track specific failure modes like timeouts, connection errors, and server errors.
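
To separate those failure modes, each response can first be mapped to a category. The following is a minimal sketch in plain JavaScript. The function name classifyFailure and the category names are hypothetical. It reads only the status and error_code fields of the k6 response object, and it assumes that 1050 is the k6 error code for an HTTP request timeout, which is worth verifying against the error code list for your k6 version.

```javascript
// Hypothetical helper (not part of k6): map a response to a failure category.
// Assumption to verify: 1050 is the k6 error code for an HTTP request timeout.
const REQUEST_TIMEOUT_CODE = 1050;

function classifyFailure(res) {
  if (res.status === 0) {
    // No HTTP response arrived: a timeout, a refused connection, a DNS failure, and so on.
    return res.error_code === REQUEST_TIMEOUT_CODE ? 'timeout' : 'connection_error';
  }
  if (res.status >= 500) return 'server_error';
  if (res.status >= 400) return 'client_error';
  return null; // not a failure
}

// Plain objects standing in for k6 responses:
console.log(classifyFailure({ status: 0, error_code: 1050 })); // timeout
console.log(classifyFailure({ status: 503 })); // server_error
console.log(classifyFailure({ status: 200 })); // null
```

In a k6 script, one Counter per category could be created in the init context and incremented whenever classifyFailure returns that category.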

Best Practices

Gradual Stress Increase

Incremental Steps

Increase load in 25-50% increments to identify exact breaking points

Hold Periods

Maintain each stress level for 3-5 minutes to observe steady-state behavior
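
Both practices can be combined in a small generator for the stages array. The following is a minimal sketch in plain JavaScript. The function buildStepStages and its parameter names are hypothetical, and it assumes that each step adds a fixed percentage of the baseline (additive, not compounding).

```javascript
// Hypothetical helper (not part of k6): build a stair-step stages array.
// Assumption: each step adds `stepPercent` of the baseline (additive, not compounding).
function buildStepStages({ baselineVUs, stepPercent, steps, ramp, hold, recovery }) {
  const stages = [];
  for (let i = 0; i <= steps; i++) {
    const target = Math.round(baselineVUs * (1 + (i * stepPercent) / 100));
    stages.push({ duration: ramp, target }); // ramp to this level
    stages.push({ duration: hold, target }); // hold to observe steady-state behavior
  }
  stages.push({ duration: recovery, target: 0 }); // ramp down and watch recovery
  return stages;
}

const stepStages = buildStepStages({
  baselineVUs: 200,
  stepPercent: 25,
  steps: 4,
  ramp: '2m',
  hold: '5m',
  recovery: '10m',
});

// Distinct targets, in order:
console.log([...new Set(stepStages.map((stage) => stage.target))]);
// [ 200, 250, 300, 350, 400, 0 ]
```

In a k6 script, the result would be assigned to the stages property of the exported options object.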

Recovery Testing

The recovery period is critical because it shows whether the system returns to a healthy state on its own once load drops:

export const options = {
  stages: [
    // ... stress stages ...
    
    // Long recovery period to monitor
    { duration: '10m', target: 0 },
  ],
};

During recovery, monitor:

How quickly do response times return to normal?
Are there lingering errors or stuck processes?
Do queues drain properly?
Does memory get released?
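
The first question can be answered from latency samples collected during the recovery stage. The following is a minimal sketch in plain JavaScript. The function recoveryTime, the sample shape, and the 20% tolerance are assumptions for illustration, and the sample values are invented.

```javascript
// Hypothetical helper: return the first timestamp from which every later sample
// stays at or below `baselineMs * tolerance`, or null if latency never settles.
// Assumed sample shape: { t: secondsSinceRampDownStarted, p95: milliseconds }.
function recoveryTime(samples, baselineMs, tolerance = 1.2) {
  const limit = baselineMs * tolerance;
  let settledAt = null;
  for (const sample of samples) {
    if (sample.p95 <= limit) {
      if (settledAt === null) settledAt = sample.t;
    } else {
      settledAt = null; // latency rose again, so the system had not recovered yet
    }
  }
  return settledAt;
}

// Invented numbers for illustration only.
const recoverySamples = [
  { t: 0, p95: 1800 },
  { t: 30, p95: 900 },
  { t: 60, p95: 220 },
  { t: 90, p95: 640 },
  { t: 120, p95: 230 },
  { t: 150, p95: 210 },
];

console.log(recoveryTime(recoverySamples, 200)); // 120
```

The dip at 60 seconds does not count as recovery because latency spikes again at 90 seconds. Requiring the value to stay low avoids reporting a recovery that did not hold.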

Realistic Stress Scenarios

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
    { duration: '10m', target: 0 },
  ],
};

export default function() {
  // Mix of read and write operations
  http.get('https://quickpizza.grafana.com/api/pizza');
  sleep(0.5);
  
  http.post('https://quickpizza.grafana.com/api/cart', JSON.stringify({
    pizzaId: 1,
    quantity: 2,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
  
  sleep(1);
}
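
The script above performs one read and one write per iteration. If production traffic is read-heavy, a weighted choice per iteration may be closer to reality. The following is a minimal sketch in plain JavaScript. The function pickOperation and the 80/20 split are assumptions for illustration, not measured values.

```javascript
// Hypothetical helper (not part of k6): pick an operation name by weight.
// `random` is a number in [0, 1). It defaults to Math.random() and can be fixed in tests.
function pickOperation(weights, random = Math.random()) {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
  let cursor = random * total;
  for (const [name, weight] of entries) {
    if (cursor < weight) return name;
    cursor -= weight;
  }
  return entries[entries.length - 1][0]; // guard against floating-point drift
}

// Assumed mix for illustration: 80% reads, 20% writes.
const operationMix = { read: 80, write: 20 };

console.log(pickOperation(operationMix, 0.5)); // read
console.log(pickOperation(operationMix, 0.9)); // write
```

Inside the default function, the returned name would decide whether the iteration issues the GET or the POST request.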

When to Use

Capacity planning: Determine absolute maximum capacity
Failure mode analysis: Understand how the system fails
Auto-scaling validation: Test that scaling mechanisms work
Resource limits: Identify resource bottlenecks
Pre-production: Before major releases or traffic events

Common Findings

Expected Behaviors

Graceful degradation: System slows but doesn’t crash
Error handling: Meaningful error messages
Resource limits: Clear capacity boundaries
Recovery: System returns to normal after stress

Red Flags

Watch for these serious issues:

Complete system crashes
Cascading failures across services
Memory leaks that persist after recovery
Data corruption or inconsistency
Inability to recover without manual intervention

Analysis Tips

Identify your breaking point by analyzing:

Response time curve: Where does p(95) exceed acceptable limits?
Error rate: When do errors start appearing?
Throughput plateau: Where does requests/sec stop increasing?
Resource exhaustion: When do CPU/memory/connections max out?

Express the acceptable limits as thresholds so that k6 reports which ones were crossed:

export const options = {
  thresholds: {
    http_req_duration: [
      'p(50)<500',   // Median should be fast
      'p(95)<2000',  // 95th percentile can be slower
      'p(99)<5000',  // 99th percentile - stress conditions
    ],
    http_req_failed: ['rate<0.1'], // Up to 10% errors acceptable
  },
};
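
Once results are aggregated per load level, the breaking point and the throughput plateau can be located programmatically. The following is a minimal sketch in plain JavaScript. The function names, the per-level result shape, and the 5% minimum gain are assumptions for illustration, and the numbers are invented.

```javascript
// Hypothetical helper: find the first load level that violates the limits.
// Assumed result shape per level: { vus, p95, errorRate, rps }.
function findBreakingPoint(levels, { p95LimitMs, errorRateLimit }) {
  const broken = levels.find(
    (level) => level.p95 >= p95LimitMs || level.errorRate >= errorRateLimit
  );
  return broken ?? null;
}

// Throughput plateau: the first level whose requests/sec grew by less than
// `minGain` (as a fraction) compared with the previous level.
function findThroughputPlateau(levels, minGain = 0.05) {
  for (let i = 1; i < levels.length; i++) {
    const gain = (levels[i].rps - levels[i - 1].rps) / levels[i - 1].rps;
    if (gain < minGain) return levels[i];
  }
  return null;
}

// Invented numbers for illustration only.
const levelResults = [
  { vus: 100, p95: 300, errorRate: 0.0, rps: 95 },
  { vus: 200, p95: 450, errorRate: 0.001, rps: 188 },
  { vus: 300, p95: 1400, errorRate: 0.02, rps: 230 },
  { vus: 400, p95: 3200, errorRate: 0.14, rps: 234 },
];

const limits = { p95LimitMs: 2000, errorRateLimit: 0.1 };
console.log(findBreakingPoint(levelResults, limits).vus); // 400
console.log(findThroughputPlateau(levelResults).vus); // 400
```

In this invented data set, throughput already flattens between 300 and 400 VUs, which suggests that the usable capacity lies near 300 VUs even though the limits are only crossed at 400.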
