Stress testing pushes your system beyond normal operating capacity to identify breaking points, observe how it fails, and test recovery mechanisms.

Purpose

Stress tests help you:
  • Identify system breaking points and maximum capacity
  • Observe how the system fails under extreme load
  • Test system recovery after failure
  • Find memory leaks and resource exhaustion issues
  • Validate that the system degrades gracefully
Stress tests will likely cause errors and failures. This is intentional: the goal is to understand failure modes and breaking points.

Configuration Pattern

Stress tests ramp up beyond normal capacity:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // Below normal load
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },  // Normal load
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },  // Around breaking point
    { duration: '5m', target: 300 },
    { duration: '2m', target: 400 },  // Beyond breaking point
    { duration: '5m', target: 400 },
    { duration: '10m', target: 0 },   // Recovery
  ],
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1);
}

Using the Ramping VUs Executor

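The top-level stages option is a shortcut for a single ramping-vus scenario. Declaring the executor explicitly exposes settings such as startVUs and gracefulRampDown: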
export const options = {
  scenarios: {
    stress: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 300 },
        { duration: '5m', target: 300 },
        { duration: '2m', target: 400 },
        { duration: '5m', target: 400 },
        { duration: '10m', target: 0 },
      ],
      gracefulRampDown: '30s',
    },
  },
};
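
To see what load a stage list is expected to produce at a given moment, the ramp can be approximated by linear interpolation between stage targets. The sketch below is plain JavaScript and not part of the k6 API: `parseDuration` and `plannedVUs` are hypothetical helper names, and it assumes durations use only `h`, `m`, and `s` units.

```javascript
// Convert a k6-style duration string such as '2m', '30s', or '1h30m' to seconds.
// Assumption: only h, m, and s units appear (no 'ms').
function parseDuration(text) {
  const units = { h: 3600, m: 60, s: 1 };
  let seconds = 0;
  for (const [, value, unit] of text.matchAll(/(\d+)([hms])/g)) {
    seconds += Number(value) * units[unit];
  }
  return seconds;
}

// Approximate the planned VU count `elapsed` seconds into the test,
// assuming a linear ramp from the previous target to each stage's target.
function plannedVUs(stages, elapsed, startVUs = 0) {
  let from = startVUs;
  let stageStart = 0;
  for (const stage of stages) {
    const length = parseDuration(stage.duration);
    if (elapsed <= stageStart + length) {
      const progress = length === 0 ? 1 : (elapsed - stageStart) / length;
      return Math.round(from + (stage.target - from) * progress);
    }
    from = stage.target;
    stageStart += length;
  }
  return from; // past the last stage
}

const stages = [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 200 },
  { duration: '5m', target: 200 },
  { duration: '2m', target: 300 },
  { duration: '5m', target: 300 },
  { duration: '2m', target: 400 },
  { duration: '5m', target: 400 },
  { duration: '10m', target: 0 },
];

console.log(plannedVUs(stages, 60));      // 50  (halfway through the first ramp)
console.log(plannedVUs(stages, 8 * 60));  // 150 (halfway from 100 to 200)
console.log(plannedVUs(stages, 30 * 60)); // 320 (two minutes into recovery)
```

Comparing the planned VU count with the timestamp at which errors or latency first climb makes it easier to attribute a failure to a specific load level.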

  • Ramp Up: Gradually increase to breaking point
  • Peak Stress: Maintain extreme load
  • Recovery: Monitor system recovery
Stress Test Stages

1. Baseline Load

Start below normal operating capacity to establish a baseline.
{ duration: '2m', target: 100 },
{ duration: '5m', target: 100 },

2. Normal Capacity

Reach expected peak load to verify normal operation.
{ duration: '2m', target: 200 },
{ duration: '5m', target: 200 },

3. Stress Zone

Push beyond normal capacity to find breaking points.
{ duration: '2m', target: 300 },
{ duration: '5m', target: 300 },
{ duration: '2m', target: 400 },
{ duration: '5m', target: 400 },

4. Recovery Period

Ramp down and observe how the system recovers.
{ duration: '10m', target: 0 },
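
The four phases above follow a regular ramp-then-hold shape, so the stage list can be generated instead of typed by hand. This is a minimal sketch in plain JavaScript; `buildStressStages` is a hypothetical helper, not a k6 function, and its default durations simply mirror the values used on this page.

```javascript
// Build a ramp-and-hold stage list: for each load level, ramp to it, then hold.
// A final stage ramps down to 0 VUs for the recovery period.
function buildStressStages({ levels, ramp = '2m', hold = '5m', recovery = '10m' }) {
  const stages = [];
  for (const target of levels) {
    stages.push({ duration: ramp, target });
    stages.push({ duration: hold, target });
  }
  stages.push({ duration: recovery, target: 0 });
  return stages;
}

// Baseline (100), normal capacity (200), and the stress zone (300, 400).
const stages = buildStressStages({ levels: [100, 200, 300, 400] });
console.log(stages.length); // 9

// In a k6 script the result would be used as: export const options = { stages };
```

Generating the list keeps ramp and hold durations consistent across levels and makes it cheap to add another level once the first run shows where the system starts to struggle.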

Advanced Stress Testing

Multi-Stage Stress Pattern

Test multiple stress levels:
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    // Warm up (this example treats 100 VUs as normal load)
    { duration: '1m', target: 50 },
    
    // Stress level 1: 150% of normal
    { duration: '3m', target: 150 },
    { duration: '5m', target: 150 },
    
    // Stress level 2: 200% of normal
    { duration: '3m', target: 200 },
    { duration: '5m', target: 200 },
    
    // Stress level 3: 300% of normal
    { duration: '3m', target: 300 },
    { duration: '5m', target: 300 },
    
    // Recovery
    { duration: '10m', target: 0 },
  ],
  thresholds: {
    // More lenient thresholds for stress tests
    http_req_duration: ['p(95)<2000'],
    http_req_failed: ['rate<0.1'], // Allow up to 10% errors
  },
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com/api/pizza');
  check(res, { 'status is 200': (r) => r.status === 200 });
}
Stress test thresholds are typically more lenient than load tests since you expect the system to struggle under extreme conditions.
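
Lenient thresholds only decide pass or fail at the end, so a test can keep pushing load long after the system has collapsed. k6 thresholds also accept an object form with `abortOnFail` and `delayAbortEval`, which can act as a safety net. The sketch below defines the thresholds as a plain object; the 50% abort limit and the one-minute delay are illustrative assumptions, not recommendations from this page.

```javascript
// Threshold definitions for a stress test (limit values are illustrative assumptions).
const stressThresholds = {
  // Lenient pass/fail limits, matching the example above.
  http_req_duration: ['p(95)<2000'],
  http_req_failed: [
    'rate<0.1',
    // Safety net: abort the run once half of all requests fail.
    // delayAbortEval postpones the check so early samples do not trigger it.
    { threshold: 'rate<0.5', abortOnFail: true, delayAbortEval: '1m' },
  ],
};

// In a k6 script: export const options = { stages, thresholds: stressThresholds };
console.log(JSON.stringify(stressThresholds, null, 2));
```

Whether to abort at all is a design choice: aborting protects a shared environment, while letting the run finish shows how the system behaves far past its breaking point.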

What to Monitor

System Metrics

  • Response times: When do they start degrading?
  • Error rates: At what load do errors appear?
  • Throughput: Where does it plateau?
  • Resource usage: CPU, memory, disk, network
  • Queue depths: Database connections, message queues

Breaking Point Indicators

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter } from 'k6/metrics';

const errors = new Counter('errors');
const timeouts = new Counter('timeouts');

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
  ],
  thresholds: {
    errors: ['count<100'],
    timeouts: ['count<50'],
  },
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com', {
    timeout: '10s',
  });
  
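  // status 0 means no HTTP response arrived: a timeout, but also a refused
  // connection or DNS failure, so this counter is an upper bound on timeouts.
  if (res.status === 0) {
    timeouts.add(1);
  }
  
  // HTTP-level failures (4xx and 5xx responses)
  if (res.status >= 400) {
    errors.add(1);
  }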
  
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  
  sleep(1);
}
Use custom metrics to track specific failure modes like timeouts, connection errors, and server errors.
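
To separate those failure modes, a response can be classified before it is counted. The sketch below is plain JavaScript with a hypothetical `classifyResponse` helper. It assumes that `res.status` is 0 when no HTTP response was received and that `res.error_code` 1050 marks a request timeout; verify both against the k6 error code list for your k6 version.

```javascript
// Classify a k6 HTTP response into a failure category.
// Assumptions: status 0 means no HTTP response was received, and
// error_code 1050 marks a request timeout (check the k6 error code list).
function classifyResponse(res) {
  if (res.status === 0) {
    return res.error_code === 1050 ? 'timeout' : 'network_error';
  }
  if (res.status >= 500) return 'server_error';
  if (res.status >= 400) return 'client_error';
  return 'ok';
}

// Tally categories for a few mock responses (stand-ins for real k6 responses).
const samples = [
  { status: 200, error_code: 0 },
  { status: 503, error_code: 1503 },
  { status: 0, error_code: 1050 },
  { status: 0, error_code: 1212 },
  { status: 429, error_code: 1429 },
];
const tally = {};
for (const res of samples) {
  const kind = classifyResponse(res);
  tally[kind] = (tally[kind] || 0) + 1;
}
console.log(tally);
```

In a k6 script, each category could feed its own Counter, so that thresholds and dashboards distinguish an overloaded server (5xx) from a saturated network path (timeouts and connection errors).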

Best Practices

Gradual Stress Increase

  • Incremental steps: Increase load in 25-50% increments so the breaking point can be narrowed to a specific step
  • Hold periods: Maintain each stress level for 3-5 minutes to observe steady-state behavior
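
As a rough illustration of incremental steps, the load levels can be derived from a normal-load figure and a step size. This sketch assumes each increment is a percentage of normal load (additive, not compounding); `stressLevels` and `toStages` are hypothetical helpers, and the 200 VU baseline is an assumption taken from the earlier examples.

```javascript
// Compute load levels that grow by a fixed percentage of normal load per step.
// Example: normalLoad = 200, stepPercent = 25 gives 250, 300, 350, ...
function stressLevels(normalLoad, stepPercent, steps) {
  const levels = [];
  for (let i = 1; i <= steps; i++) {
    levels.push(Math.round(normalLoad * (1 + (stepPercent * i) / 100)));
  }
  return levels;
}

// Turn levels into ramp/hold stage pairs with a hold inside the 3-5 minute range.
function toStages(levels, ramp = '2m', hold = '5m') {
  return levels.flatMap((target) => [
    { duration: ramp, target },
    { duration: hold, target },
  ]);
}

const levels = stressLevels(200, 25, 4);
console.log(levels);                  // [ 250, 300, 350, 400 ]
console.log(toStages(levels).length); // 8
```

Smaller steps locate the breaking point more precisely but lengthen the run, since every extra level adds one ramp and one hold period.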

Recovery Testing

The recovery period shows whether the system returns to normal on its own once load drops, so keep it long enough to observe:
export const options = {
  stages: [
    // ... stress stages ...
    
    // Long recovery period to monitor
    { duration: '10m', target: 0 },
  ],
};
During recovery, monitor:
  • How quickly do response times return to normal?
  • Are there lingering errors or stuck processes?
  • Do queues drain properly?
  • Does memory get released?
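
The first question can be answered from exported latency data. The sketch below assumes you have per-interval p95 values recorded from the moment ramp-down starts; `recoveryTimeSeconds` is a hypothetical helper, and the sample numbers are invented purely for illustration.

```javascript
// Given p95 latencies sampled after ramp-down begins, return the time (seconds)
// at which latency came back within `tolerance` of the baseline and stayed there.
// Returns null if the system never settles within the observed window.
function recoveryTimeSeconds(samples, baselineMs, tolerance = 0.1) {
  const limit = baselineMs * (1 + tolerance);
  let recoveredAt = null;
  for (const { t, p95 } of samples) {
    if (p95 <= limit) {
      if (recoveredAt === null) recoveredAt = t;
    } else {
      recoveredAt = null; // latency regressed, so the earlier recovery did not hold
    }
  }
  return recoveredAt;
}

// Invented sample data: t is seconds since ramp-down started, p95 is in ms.
const samples = [
  { t: 0, p95: 4200 },
  { t: 60, p95: 1800 },
  { t: 120, p95: 310 },
  { t: 180, p95: 900 }, // brief relapse
  { t: 240, p95: 320 },
  { t: 300, p95: 305 },
];
console.log(recoveryTimeSeconds(samples, 300)); // 240
```

Requiring latency to stay within tolerance, not just touch it once, avoids reporting a recovery that is followed by a relapse such as a delayed garbage collection or queue flush.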

Realistic Stress Scenarios

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
    { duration: '10m', target: 0 },
  ],
};

export default function() {
  // Mix of read and write operations
  http.get('https://quickpizza.grafana.com/api/pizza');
  sleep(0.5);
  
  http.post('https://quickpizza.grafana.com/api/cart', JSON.stringify({
    pizzaId: 1,
    quantity: 2,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
  
  sleep(1);
}
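
Real traffic rarely splits evenly between reads and writes. One way to approximate a traffic mix is to pick an operation per iteration according to weights. This is a sketch in plain JavaScript; `pickOperation` is a hypothetical helper and the 80/20 split is an assumed ratio, not a measured one.

```javascript
// Pick an operation name according to relative weights.
// `rand` is a number in [0, 1); pass Math.random() in a real script.
function pickOperation(weights, rand) {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  let remaining = rand * total;
  for (const [name, weight] of Object.entries(weights)) {
    if (remaining < weight) return name;
    remaining -= weight;
  }
  return Object.keys(weights).pop(); // guard against rounding at the upper edge
}

// Assumed traffic mix: 80% reads, 20% writes.
const mix = { read: 80, write: 20 };
console.log(pickOperation(mix, 0.5));  // read
console.log(pickOperation(mix, 0.95)); // write
```

Inside the default function, the returned name would select between the `http.get` and `http.post` calls shown above, so that the write path receives a share of load closer to production.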

When to Use

  • Capacity planning: Determine absolute maximum capacity
  • Failure mode analysis: Understand how the system fails
  • Auto-scaling validation: Test that scaling mechanisms work
  • Resource limits: Identify resource bottlenecks
  • Pre-production: Before major releases or traffic events

Common Findings

Expected Behaviors

  • Graceful degradation: System slows but doesn’t crash
  • Error handling: Meaningful error messages
  • Resource limits: Clear capacity boundaries
  • Recovery: System returns to normal after stress

Red Flags

Watch for these serious issues:
  • Complete system crashes
  • Cascading failures across services
  • Memory leaks that persist after recovery
  • Data corruption or inconsistency
  • Inability to recover without manual intervention

Analysis Tips

Identify your breaking point by analyzing:
  1. Response time curve: Where does p(95) exceed acceptable limits?
  2. Error rate: When do errors start appearing?
  3. Throughput plateau: Where does requests/sec stop increasing?
  4. Resource exhaustion: When do CPU/memory/connections max out?
export const options = {
  thresholds: {
    http_req_duration: [
      'p(50)<500',   // Median should be fast
      'p(95)<2000',  // 95th percentile can be slower
      'p(99)<5000',  // 99th percentile - stress conditions
    ],
    http_req_failed: ['rate<0.1'], // Up to 10% errors acceptable
  },
};
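
The first three checks in the list can be automated once each hold period has been summarized. The sketch below assumes one summary object per load level with `vus`, `p95`, `errorRate`, and `rps` fields. Those field names, both helper functions, and the sample numbers are hypothetical and only illustrate the approach.

```javascript
// Find the first load level that violates a latency or error-rate limit.
// `results` holds one summary per hold period, ordered by increasing load.
function findBreakingPoint(results, { maxP95Ms, maxErrorRate }) {
  return results.find((r) => r.p95 > maxP95Ms || r.errorRate > maxErrorRate) || null;
}

// Find the first level where throughput grew by less than `minGain` (a fraction)
// compared with the previous level, which suggests a plateau.
function findThroughputPlateau(results, minGain = 0.05) {
  for (let i = 1; i < results.length; i++) {
    const gain = (results[i].rps - results[i - 1].rps) / results[i - 1].rps;
    if (gain < minGain) return results[i];
  }
  return null;
}

// Invented per-level summaries for illustration.
const results = [
  { vus: 100, p95: 180, errorRate: 0.0, rps: 95 },
  { vus: 200, p95: 260, errorRate: 0.001, rps: 188 },
  { vus: 300, p95: 1450, errorRate: 0.02, rps: 231 },
  { vus: 400, p95: 4800, errorRate: 0.18, rps: 236 },
];
console.log(findBreakingPoint(results, { maxP95Ms: 2000, maxErrorRate: 0.1 }).vus); // 400
console.log(findThroughputPlateau(results).vus); // 400
```

When the plateau appears at a lower level than the threshold violation, the system is already saturated before users see failures, which is usually the more useful number for capacity planning.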
