Stress testing pushes your system beyond normal operating capacity to identify breaking points, observe how it fails, and test recovery mechanisms.
Purpose
Stress tests help you:
Identify system breaking points and maximum capacity
Observe how the system fails under extreme load
Test system recovery after failure
Find memory leaks and resource exhaustion issues
Validate that the system degrades gracefully
Stress tests will likely cause errors and failures. This is intentional: the goal is to understand failure modes and breaking points.
Configuration Pattern
Stress tests ramp up beyond normal capacity:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // Below normal load
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 }, // Normal load
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 }, // Around breaking point
    { duration: '5m', target: 300 },
    { duration: '2m', target: 400 }, // Beyond breaking point
    { duration: '5m', target: 400 },
    { duration: '10m', target: 0 }, // Recovery
  ],
};

export default function () {
  const res = http.get('https://quickpizza.grafana.com');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1);
}
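Before launching a profile like this, it can help to know how long it will run and how many VUs it peaks at. The following is a minimal sketch in plain JavaScript, not part of k6 itself: `durationToSeconds` and `summarizeStages` are hypothetical helper names, and the parser only handles whole-number `h`, `m`, and `s` units.

```javascript
// Convert a duration string such as '2m', '30s' or '1h30m' to seconds.
// Assumption: only whole-number h, m and s units are supported in this sketch.
function durationToSeconds(duration) {
  const unitSeconds = { h: 3600, m: 60, s: 1 };
  let total = 0;
  let matched = '';
  for (const [part, value, unit] of duration.matchAll(/(\d+)([hms])/g)) {
    total += Number(value) * unitSeconds[unit];
    matched += part;
  }
  // Reject empty strings and anything the pattern did not fully consume.
  if (duration === '' || matched !== duration) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  return total;
}

// Total run time in minutes and the highest VU target across all stages.
function summarizeStages(stages) {
  const totalSeconds = stages.reduce(
    (sum, stage) => sum + durationToSeconds(stage.duration),
    0
  );
  return {
    totalMinutes: totalSeconds / 60,
    peakVUs: Math.max(...stages.map((stage) => stage.target)),
  };
}

// The stages from the configuration pattern above.
const stages = [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 200 },
  { duration: '5m', target: 200 },
  { duration: '2m', target: 300 },
  { duration: '5m', target: 300 },
  { duration: '2m', target: 400 },
  { duration: '5m', target: 400 },
  { duration: '10m', target: 0 },
];

const summary = summarizeStages(stages);
console.log(summary); // { totalMinutes: 38, peakVUs: 400 }
```

The 38-minute total is worth checking against any monitoring retention or CI time limits before the run starts.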
Using the Ramping VUs Executor
The ramping-vus executor is ideal for stress testing:
export const options = {
  scenarios: {
    stress: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 300 },
        { duration: '5m', target: 300 },
        { duration: '2m', target: 400 },
        { duration: '5m', target: 400 },
        { duration: '10m', target: 0 },
      ],
      gracefulRampDown: '30s',
    },
  },
};
Ramp Up: Gradually increase to breaking point
Peak Stress: Maintain extreme load
Recovery: Monitor system recovery
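To line up a spike in a dashboard with the load that caused it, you need the planned VU count at a given moment. The sketch below assumes VUs ramp linearly from one target to the next, and it ignores `gracefulRampDown`. `plannedVUsAt` and the numeric `seconds` field are hypothetical conveniences for this example, not k6 options.

```javascript
// Planned VU count at a given elapsed time.
// Assumption: the load ramps linearly from the previous target to the next.
function plannedVUsAt(stages, startVUs, elapsedSeconds) {
  let from = startVUs;
  let stageStart = 0;
  for (const stage of stages) {
    const stageEnd = stageStart + stage.seconds;
    if (stage.seconds > 0 && elapsedSeconds <= stageEnd) {
      const progress = (elapsedSeconds - stageStart) / stage.seconds;
      return Math.round(from + (stage.target - from) * progress);
    }
    from = stage.target;
    stageStart = stageEnd;
  }
  return from; // Past the last stage: stay at the final target.
}

// The stress scenario above, with durations written in seconds.
const stages = [
  { seconds: 120, target: 100 },
  { seconds: 300, target: 100 },
  { seconds: 120, target: 200 },
  { seconds: 300, target: 200 },
  { seconds: 120, target: 300 },
  { seconds: 300, target: 300 },
  { seconds: 120, target: 400 },
  { seconds: 300, target: 400 },
  { seconds: 600, target: 0 },
];

console.log(plannedVUsAt(stages, 0, 60));   // 50: halfway through the first ramp
console.log(plannedVUsAt(stages, 0, 480));  // 150: halfway between 100 and 200
console.log(plannedVUsAt(stages, 0, 1980)); // 200: halfway through recovery
```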
Stress Test Stages
Baseline Load
Start below normal operating capacity to establish a baseline.
{ duration: '2m', target: 100 },
{ duration: '5m', target: 100 },
Normal Capacity
Reach expected peak load to verify normal operation.
{ duration: '2m', target: 200 },
{ duration: '5m', target: 200 },
Stress Zone
Push beyond normal capacity to find breaking points.
{ duration: '2m', target: 300 },
{ duration: '5m', target: 300 },
{ duration: '2m', target: 400 },
{ duration: '5m', target: 400 },
Recovery Period
Ramp down and observe how the system recovers.
{ duration: '10m', target: 0 },
Advanced Stress Testing
Multi-Stage Stress Pattern
Test multiple stress levels:
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    // Warm up
    { duration: '1m', target: 50 },
    // Stress level 1: 150% of normal
    { duration: '3m', target: 150 },
    { duration: '5m', target: 150 },
    // Stress level 2: 200% of normal
    { duration: '3m', target: 200 },
    { duration: '5m', target: 200 },
    // Stress level 3: 300% of normal
    { duration: '3m', target: 300 },
    { duration: '5m', target: 300 },
    // Recovery
    { duration: '10m', target: 0 },
  ],
  thresholds: {
    // More lenient thresholds for stress tests
    http_req_duration: ['p(95)<2000'],
    http_req_failed: ['rate<0.1'], // Allow up to 10% errors
  },
};

export default function () {
  const res = http.get('https://quickpizza.grafana.com/api/pizza');
  check(res, { 'status is 200': (r) => r.status === 200 });
}
Stress test thresholds are typically more lenient than load tests since you expect the system to struggle under extreme conditions.
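Lenient thresholds can be paired with a hard stop, so the run ends early once the system has clearly collapsed instead of hammering it for the remaining stages. The sketch below uses the object form of a threshold with `abortOnFail` and `delayAbortEval`. The 50% cutoff and the one-minute delay are illustrative assumptions, not recommended values.

```javascript
// In a k6 script this object would be exported: `export const options = { ... }`.
// It is a plain constant here so the snippet also parses outside k6.
const options = {
  thresholds: {
    http_req_duration: ['p(95)<2000'],
    http_req_failed: [
      // Pass/fail criterion for the run as a whole.
      'rate<0.1',
      // Safety valve (assumed values): abort once more than half of all
      // requests fail, but wait one minute before evaluating so that a
      // brief burst of early errors does not end the test.
      { threshold: 'rate<0.5', abortOnFail: true, delayAbortEval: '1m' },
    ],
  },
};
```

Whether an early abort is desirable depends on the goal: leave it out if observing behavior far past the breaking point is the purpose of the run.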
What to Monitor
System Metrics
Response times: When do they start degrading?
Error rates: At what load do errors appear?
Throughput: Where does it plateau?
Resource usage: CPU, memory, disk, network
Queue depths: Database connections, message queues
Breaking Point Indicators
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter } from 'k6/metrics';

// Custom counters for the two failure modes tracked below
const errors = new Counter('errors');
const timeouts = new Counter('timeouts');

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
  ],
  thresholds: {
    errors: ['count<100'],
    timeouts: ['count<50'],
  },
};

export default function () {
  const res = http.get('https://quickpizza.grafana.com', {
    timeout: '10s',
  });
  // A status of 0 means no HTTP response was received. This is usually a
  // timeout, but refused or reset connections are counted here as well.
  if (res.status === 0) {
    timeouts.add(1);
  }
  // HTTP-level failures: both 4xx and 5xx responses
  if (res.status >= 400) {
    errors.add(1);
  }
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1);
}
Use custom metrics to track specific failure modes like timeouts, connection errors, and server errors.
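One way to separate those failure modes is to classify each response before counting it. The sketch below is plain JavaScript that works on any object with `status` and `error_code` fields, which is the shape of a k6 HTTP response. It assumes that error code 1050 denotes a request timeout, so check that value against the k6 error-code reference before relying on it. `classifyFailure` is a hypothetical helper name.

```javascript
// Assumption: 1050 is the k6 error code for an HTTP request timeout.
const REQUEST_TIMEOUT_CODE = 1050;

// Map a response to a coarse failure category.
function classifyFailure(res) {
  if (res.status >= 500) return 'server_error';
  if (res.status >= 400) return 'client_error';
  if (res.status !== 0) return 'ok';
  // status 0: no HTTP response was received at all.
  if (res.error_code === REQUEST_TIMEOUT_CODE) return 'timeout';
  return 'connection_error';
}

console.log(classifyFailure({ status: 200, error_code: 0 }));    // ok
console.log(classifyFailure({ status: 503, error_code: 1503 })); // server_error
console.log(classifyFailure({ status: 0, error_code: 1050 }));   // timeout
console.log(classifyFailure({ status: 0, error_code: 1212 }));   // connection_error
```

Inside the default function, the returned label could select which custom Counter to increment, giving one metric per failure mode.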
Best Practices
Gradual Stress Increase
Incremental Steps: Increase load in 25-50% increments to identify exact breaking points
Hold Periods: Maintain each stress level for 3-5 minutes to observe steady-state behavior
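Writing the ramp and hold pairs by hand becomes tedious with small increments, so the stages can be generated instead. In the sketch below, `buildStressStages` is a hypothetical helper. It assumes each step adds a fixed percentage of the baseline (additive rather than compounding), and its default durations are taken from the examples on this page.

```javascript
// Build ramp/hold stage pairs that step up from a baseline, then a recovery stage.
// Assumption: each step adds `stepPercent` of the baseline, not of the previous level.
function buildStressStages({
  baseline,
  stepPercent,
  steps,
  ramp = '2m',
  hold = '5m',
  recovery = '10m',
}) {
  const stages = [];
  for (let i = 0; i <= steps; i++) {
    const target = Math.round(baseline * (1 + (stepPercent / 100) * i));
    stages.push({ duration: ramp, target }); // Ramp to this level
    stages.push({ duration: hold, target }); // Hold for steady-state observation
  }
  stages.push({ duration: recovery, target: 0 });
  return stages;
}

const stages = buildStressStages({ baseline: 200, stepPercent: 50, steps: 2 });
console.log(stages.map((stage) => stage.target));
// [ 200, 200, 300, 300, 400, 400, 0 ]
```

In a k6 script, the returned array can be assigned to `stages` in the exported `options` object.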
Recovery Testing
The recovery period is critical:
export const options = {
  stages: [
    // ... stress stages ...
    // Long recovery period to monitor
    { duration: '10m', target: 0 },
  ],
};
During recovery, monitor:
How quickly do response times return to normal?
Are there lingering errors or stuck processes?
Do queues drain properly?
Does memory get released?
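The first question can be answered with a number if you export latency samples from the recovery window. The sketch below reports the time at which p95 latency returns to within a tolerance of the baseline and stays there for several consecutive samples. The sample shape, the 10% tolerance, and the three-sample stability window are assumptions for illustration, and the data is made up.

```javascript
// Find when latency has recovered: the start of the first run of
// `stableSamples` consecutive samples at or below baseline * (1 + tolerance).
// Returns the `t` of that run's first sample, or null if it never recovers.
function recoveryTimeSeconds(samples, baselineP95, tolerance = 0.1, stableSamples = 3) {
  const limit = baselineP95 * (1 + tolerance);
  let runStart = null;
  let runLength = 0;
  for (const sample of samples) {
    if (sample.p95 <= limit) {
      if (runLength === 0) runStart = sample.t;
      runLength += 1;
      if (runLength >= stableSamples) return runStart;
    } else {
      runLength = 0; // A relapse resets the stability window.
    }
  }
  return null;
}

// Hypothetical samples: t = seconds since ramp-down began, p95 in milliseconds.
const samples = [
  { t: 0, p95: 1800 },
  { t: 30, p95: 900 },
  { t: 60, p95: 210 },
  { t: 90, p95: 400 }, // Relapse: not yet stable
  { t: 120, p95: 215 },
  { t: 150, p95: 205 },
  { t: 180, p95: 200 },
];

console.log(recoveryTimeSeconds(samples, 200)); // 120
```

Requiring several consecutive good samples keeps a single lucky reading at 60 seconds from being reported as recovery.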
Realistic Stress Scenarios
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
    { duration: '10m', target: 0 },
  ],
};

export default function () {
  // Mix of read and write operations
  http.get('https://quickpizza.grafana.com/api/pizza');
  sleep(0.5);
  http.post('https://quickpizza.grafana.com/api/cart', JSON.stringify({
    pizzaId: 1,
    quantity: 2,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
  sleep(1);
}
When to Use
Capacity planning: Determine absolute maximum capacity
Failure mode analysis: Understand how the system fails
Auto-scaling validation: Test that scaling mechanisms work
Resource limits: Identify resource bottlenecks
Pre-production: Before major releases or traffic events
Common Findings
Expected Behaviors
Graceful degradation: System slows but doesn’t crash
Error handling: Meaningful error messages
Resource limits: Clear capacity boundaries
Recovery: System returns to normal after stress
Red Flags
Watch for these serious issues:
Complete system crashes
Cascading failures across services
Memory leaks that persist after recovery
Data corruption or inconsistency
Inability to recover without manual intervention
Analysis Tips
Identify your breaking point by analyzing:
Response time curve: Where does p(95) exceed acceptable limits?
Error rate: When do errors start appearing?
Throughput plateau: Where does requests/sec stop increasing?
Resource exhaustion: When do CPU/memory/connections max out?
export const options = {
  thresholds: {
    http_req_duration: [
      'p(50)<500', // Median should be fast
      'p(95)<2000', // 95th percentile can be slower
      'p(99)<5000', // 99th percentile - stress conditions
    ],
    http_req_failed: ['rate<0.1'], // Up to 10% errors acceptable
  },
};
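The same limits can be applied after the run to locate the breaking point, given one summary row per hold period. The sketch below is plain JavaScript over made-up numbers. `findBreakingPoint`, the row shape, and the 5% minimum throughput gain are assumptions for illustration, while the latency and error limits mirror the thresholds used on this page.

```javascript
// Return the first load level that violates a limit, with the reasons why.
// Each row summarizes one hold period: VUs, p95 latency, error rate, requests/sec.
function findBreakingPoint(levels, { maxP95Ms, maxErrorRate, minThroughputGain }) {
  for (let i = 0; i < levels.length; i++) {
    const level = levels[i];
    const reasons = [];
    if (level.p95Ms > maxP95Ms) reasons.push('latency');
    if (level.errorRate > maxErrorRate) reasons.push('errors');
    if (i > 0) {
      // Throughput plateau: adding load barely increases requests/sec.
      const previous = levels[i - 1];
      const gain = (level.rps - previous.rps) / previous.rps;
      if (gain < minThroughputGain) reasons.push('throughput plateau');
    }
    if (reasons.length > 0) return { vus: level.vus, reasons };
  }
  return null; // No limit was exceeded at any tested level.
}

// Hypothetical results, one row per hold period.
const levels = [
  { vus: 100, p95Ms: 180, errorRate: 0, rps: 95 },
  { vus: 200, p95Ms: 260, errorRate: 0.001, rps: 188 },
  { vus: 300, p95Ms: 1400, errorRate: 0.02, rps: 231 },
  { vus: 400, p95Ms: 4800, errorRate: 0.18, rps: 236 },
];

const result = findBreakingPoint(levels, {
  maxP95Ms: 2000,
  maxErrorRate: 0.1,
  minThroughputGain: 0.05,
});
console.log(result);
// { vus: 400, reasons: [ 'latency', 'errors', 'throughput plateau' ] }
```

In this made-up data all three signals fire at 400 VUs, so the real breaking point lies between 300 and 400 and a follow-up run with smaller increments would narrow it down.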