Spike testing evaluates how your system handles sudden, dramatic increases in load. It tests resilience to traffic surges like flash sales, viral content, or DDoS attacks.
Purpose
Spike tests help you:
Validate system behavior during sudden traffic spikes
Test auto-scaling responsiveness
Determine whether the system crashes or recovers
Verify rate limiting and throttling mechanisms
Test circuit breakers and fallback strategies
Spike tests differ from stress tests by focusing on sudden load changes rather than gradual increases.
Configuration Pattern
A spike test ramps load up and down within seconds, holding a steady baseline before and after the spike:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '10s', target: 100 },  // Ramp up to normal load
    { duration: '1m', target: 100 },   // Hold baseline
    { duration: '10s', target: 1400 }, // Spike to 1400 users
    { duration: '3m', target: 1400 },  // Stay at 1400 for 3 minutes
    { duration: '10s', target: 100 },  // Scale down rapidly
    { duration: '3m', target: 100 },   // Recovery
    { duration: '10s', target: 0 },    // Ramp down to zero
  ],
};

export default function () {
  const res = http.get('https://quickpizza.grafana.com');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1);
}
Baseline: Normal operating load
Spike: 10-20x increase in seconds
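The baseline and spike levels can be derived from a single multiplier instead of being typed by hand. The following is a minimal sketch, assuming a hypothetical helper named buildSpikeStages (it is not part of k6) and default durations copied from the example above:

```javascript
// Hypothetical helper (not a k6 API): builds a stages array for one spike.
// baseline   - normal number of virtual users
// multiplier - how many times the baseline the spike should reach
function buildSpikeStages({ baseline, multiplier, ramp = '10s', hold = '3m', settle = '3m' }) {
  const peak = Math.round(baseline * multiplier);
  return [
    { duration: ramp, target: baseline },   // ramp up to baseline
    { duration: '1m', target: baseline },   // hold baseline
    { duration: ramp, target: peak },       // spike
    { duration: hold, target: peak },       // hold the spike
    { duration: ramp, target: baseline },   // drop back to baseline
    { duration: settle, target: baseline }, // recovery
    { duration: ramp, target: 0 },          // ramp down to zero
  ];
}

// A 14x spike over a baseline of 100 VUs peaks at 1400 VUs.
const spikeStages = buildSpikeStages({ baseline: 100, multiplier: 14 });

// In a k6 script you would then write:
//   export const options = { stages: spikeStages };
```

Keeping the multiplier as a parameter makes it easy to rerun the same test shape at different spike sizes.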
Using the Ramping VUs Executor
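The top-level stages option is a shortcut for the ramping-vus executor. Configuring the executor explicitly as a scenario exposes additional settings such as startVUs and gracefulRampDown: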
export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },   // Normal load
        { duration: '30s', target: 2000 }, // Spike!
        { duration: '3m', target: 2000 },  // Maintain spike
        { duration: '1m', target: 100 },   // Recovery
        { duration: '2m', target: 0 },
      ],
      gracefulRampDown: '30s',
    },
  },
};
Spike Test Stages
Establish Baseline
Start with normal load to establish baseline metrics.
{ duration: '2m', target: 100 },
Spike
Rapidly increase to many times normal load (20x in this example) in 30 seconds or less.
{ duration: '30s', target: 2000 },
Maintain Spike
Hold the spike for 1-5 minutes to observe sustained behavior.
{ duration: '3m', target: 2000 },
Recovery
Return to normal load and monitor recovery.
{ duration: '1m', target: 100 },
{ duration: '2m', target: 0 },
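When planning these stages it helps to know how long the whole run takes. The sketch below sums the stage durations; the helper names are hypothetical, and the parser is assumed to handle only the h, m, s, and ms units used in this guide:

```javascript
// Milliseconds per supported unit.
const UNIT_MS = { h: 3600000, m: 60000, s: 1000, ms: 1 };

// Converts a duration string such as '2m', '30s' or '1m30s' to seconds.
function durationToSeconds(duration) {
  let totalMs = 0;
  // Match every "<number><unit>" pair; "ms" is listed first so it is
  // not mistaken for "m".
  for (const [, value, unit] of duration.matchAll(/(\d+(?:\.\d+)?)(ms|h|m|s)/g)) {
    totalMs += parseFloat(value) * UNIT_MS[unit];
  }
  return totalMs / 1000;
}

// Adds up the duration of every stage in a stages array.
function totalSeconds(stages) {
  return stages.reduce((sum, stage) => sum + durationToSeconds(stage.duration), 0);
}

// The four phases described above: baseline, spike, hold, recovery.
const plannedStages = [
  { duration: '2m', target: 100 },
  { duration: '30s', target: 2000 },
  { duration: '3m', target: 2000 },
  { duration: '1m', target: 100 },
  { duration: '2m', target: 0 },
];

const plannedSeconds = totalSeconds(plannedStages); // 510 seconds, or 8m30s
```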
Multiple Spike Pattern
Test resilience to repeated spikes:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    // First spike
    { duration: '1m', target: 100 },
    { duration: '30s', target: 1000 },
    { duration: '2m', target: 1000 },
    { duration: '30s', target: 100 },
    // Recovery period
    { duration: '2m', target: 100 },
    // Second spike
    { duration: '30s', target: 1500 },
    { duration: '2m', target: 1500 },
    { duration: '30s', target: 100 },
    // Recovery period
    { duration: '2m', target: 100 },
    // Third spike
    { duration: '30s', target: 2000 },
    { duration: '2m', target: 2000 },
    { duration: '30s', target: 0 },
  ],
};

export default function () {
  const res = http.get('https://quickpizza.grafana.com/api/pizza');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
Multiple spikes test whether the system can recover between surges and handle repeated stress.
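A repeated-spike profile follows a regular shape, so it can also be generated from a list of peak values. This is a sketch only: repeatedSpikeStages is a hypothetical helper rather than a k6 function, and its default durations mirror the pattern above.

```javascript
// Hypothetical helper (not a k6 API): a baseline warm-up followed by one
// spike per entry in `peaks`, with a rest period between spikes.
function repeatedSpikeStages(baseline, peaks, { ramp = '30s', hold = '2m', rest = '2m' } = {}) {
  const stages = [{ duration: '1m', target: baseline }];
  peaks.forEach((peak, i) => {
    const isLast = i === peaks.length - 1;
    stages.push({ duration: ramp, target: peak }); // spike
    stages.push({ duration: hold, target: peak }); // hold the spike
    // After the final spike ramp down to zero; otherwise return to
    // baseline and rest there before the next spike.
    stages.push({ duration: ramp, target: isLast ? 0 : baseline });
    if (!isLast) {
      stages.push({ duration: rest, target: baseline });
    }
  });
  return stages;
}

// Reproduces the three escalating spikes from the pattern above.
const escalatingStages = repeatedSpikeStages(100, [1000, 1500, 2000]);

// In a k6 script you would then write:
//   export const options = { stages: escalatingStages };
```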
Instant Spike Pattern
For testing extreme scenarios, use zero-duration stages:
export const options = {
  stages: [
    { duration: '2m', target: 100 },  // Normal load
    { duration: '0s', target: 2000 }, // Instant spike!
    { duration: '5m', target: 2000 }, // Hold
    { duration: '0s', target: 100 },  // Instant drop
    { duration: '5m', target: 100 },  // Recovery
  ],
};
Zero-duration stages create instantaneous load changes. This is extremely aggressive and may overwhelm systems. Use with caution.
Using Arrival Rate for Spikes
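For more realistic spike testing, use an arrival-rate executor. It starts iterations at the configured rate regardless of how slowly the system responds, so the offered load does not drop when response times climb: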
export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-arrival-rate',
      startRate: 50,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      maxVUs: 2000,
      stages: [
        { duration: '2m', target: 50 },   // Normal: 50 iterations/s
        { duration: '30s', target: 500 }, // Spike: 500 iterations/s
        { duration: '3m', target: 500 },  // Hold spike
        { duration: '30s', target: 50 },  // Back to normal
        { duration: '2m', target: 50 },   // Recovery
      ],
    },
  },
};
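The preAllocatedVUs and maxVUs values determine whether k6 can sustain the target rate once responses slow down. As a rough rule, the number of VUs busy at any moment is about the iteration rate multiplied by the time one iteration takes. The sketch below applies that estimate; requiredVUs is a hypothetical helper, and the iteration times are assumed values rather than measurements:

```javascript
// Rough estimate of concurrent VUs needed to sustain an arrival rate:
// VUs ≈ iterations per second × seconds per iteration.
function requiredVUs(ratePerSecond, iterationMs) {
  return Math.ceil((ratePerSecond * iterationMs) / 1000);
}

// At 500 iterations/s with 200 ms iterations, about 100 VUs are busy.
const vusWhenHealthy = requiredVUs(500, 200);

// If iterations slow to 4 s under the spike, about 2000 VUs are needed.
const vusWhenDegraded = requiredVUs(500, 4000);
```

If the estimate exceeds maxVUs, k6 cannot start every scheduled iteration and reports the shortfall in the dropped_iterations metric, so check that metric before trusting the achieved rate.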
What to Monitor
Critical Metrics
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter, Trend } from 'k6/metrics';

const errors = new Counter('errors');
const timeouts = new Counter('timeouts');
const latency = new Trend('custom_latency');

export const options = {
  stages: [
    { duration: '1m', target: 100 },
    { duration: '30s', target: 1000 },
    { duration: '3m', target: 1000 },
    { duration: '30s', target: 100 },
  ],
};

export default function () {
  // Wall-clock time seen by the script, including connection setup
  const start = Date.now();
  const res = http.get('https://quickpizza.grafana.com', {
    timeout: '30s',
  });
  const duration = Date.now() - start;
  latency.add(duration);

  // Status 0 means no HTTP response arrived: a timeout or another
  // network-level failure such as a refused or reset connection
  if (res.status === 0) {
    timeouts.add(1);
  }
  // HTTP 4xx and 5xx responses
  if (res.status >= 400) {
    errors.add(1);
  }
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 3s': (r) => r.timings.duration < 3000,
  });
  sleep(1);
}
System Behavior
Monitor these aspects during spikes:
Response times: How much do they degrade during the spike?
Error rates: What percentage of requests fail?
Throughput: Does it scale with increased load?
Recovery time: How long does the system take to return to normal?
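Recovery time can be made concrete by measuring how long latency takes to settle back near its baseline once the spike ends. The sketch below assumes you have already exported p95 latency samples from your metrics backend as an array of { t, p95 } objects; the helper name, the sample data, and the 20% tolerance are all illustrative assumptions.

```javascript
// samples: [{ t: secondsSinceTestStart, p95: milliseconds }], sorted by t.
// Returns the seconds after the spike ended at which p95 latency settled
// within `tolerance` times the baseline and stayed there, or null if it
// never settled inside the observed window.
function recoveryTimeSeconds(samples, spikeEndSeconds, baselineP95, tolerance = 1.2) {
  const limit = baselineP95 * tolerance;
  const afterSpike = samples.filter((s) => s.t >= spikeEndSeconds);
  for (let i = 0; i < afterSpike.length; i++) {
    // Recovered only if this sample and every later one are within the limit.
    if (afterSpike.slice(i).every((s) => s.p95 <= limit)) {
      return afterSpike[i].t - spikeEndSeconds;
    }
  }
  return null;
}

// Illustrative data: the spike ends at t = 300 s and baseline p95 is 450 ms.
const latencySamples = [
  { t: 290, p95: 1900 },
  { t: 300, p95: 1800 },
  { t: 310, p95: 900 },
  { t: 320, p95: 650 },
  { t: 330, p95: 480 },
  { t: 340, p95: 510 },
  { t: 350, p95: 470 },
];

const recoverySeconds = recoveryTimeSeconds(latencySamples, 300, 450); // 30
```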
Best Practices
Spike Magnitude
Choose spike size based on your scenario:
Conservative: 2-3x normal load (test auto-scaling)
Realistic: 5-10x normal load (flash sales, viral content)
Extreme: 20-50x normal load (DDoS simulation)
Spike Duration
export const options = {
  stages: [
    { duration: '2m', target: 100 },
    // Short spike: 1-2 minutes
    { duration: '30s', target: 1000 },
    { duration: '1m', target: 1000 },
    // Allow 3-5 minute recovery
    { duration: '30s', target: 100 },
    { duration: '3m', target: 100 },
  ],
};
Shorter spikes (1-2 minutes) test immediate response. Longer spikes (5-10 minutes) test sustained resilience.
Realistic Spike Scenarios
Simulate different spike patterns:
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    // Flash sale scenario
    flash_sale: {
      executor: 'ramping-vus',
      startVUs: 100,
      stages: [
        { duration: '5m', target: 100 },   // Pre-sale normal
        { duration: '1m', target: 2000 },  // Sale starts!
        { duration: '10m', target: 2000 }, // Sale duration
        { duration: '5m', target: 500 },   // Post-sale decline
        { duration: '5m', target: 100 },   // Return to normal
      ],
    },
  },
};

export default function () {
  // Hot item endpoint
  http.get('https://quickpizza.grafana.com/api/pizza/1');
  // Add to cart
  const cartRes = http.post(
    'https://quickpizza.grafana.com/api/cart',
    JSON.stringify({ pizzaId: 1, quantity: 1 }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(cartRes, {
    'can add to cart': (r) => r.status === 200,
  });
}
When to Use
Before major events: Flash sales, product launches, marketing campaigns
DDoS resilience: Test protection mechanisms
Auto-scaling validation: Verify rapid scaling works
Traffic surge preparation: Viral content, breaking news
Capacity planning: Determine maximum burst capacity
Expected Outcomes
Healthy System Response
Brief Degradation
Response times increase temporarily but remain acceptable
Auto-scaling Triggers
Additional resources provision automatically
Stabilization
Performance recovers within 2-3 minutes
Clean Recovery
System returns to baseline after spike ends
Warning Signs
Watch for these problems:
Complete service outages
Error rates above 5%
Response times exceeding 10 seconds
System crashes or restarts
Inability to recover after spike ends
Cascading failures to dependent services
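Two of these warning signs, the 5% error rate and the 10-second response time, can be expressed as k6 thresholds so that a run fails automatically when they occur. The following is a sketch rather than a recommended configuration: the warningThresholds name is arbitrary, and aborting early with a 30-second evaluation delay is an assumed choice you may not want in every spike test.

```javascript
const warningThresholds = {
  // Fail the run if 5% or more of requests fail. abortOnFail stops the
  // test early, and delayAbortEval waits 30 s before the first check so
  // a few early failures do not end the run immediately.
  http_req_failed: [
    { threshold: 'rate<0.05', abortOnFail: true, delayAbortEval: '30s' },
  ],
  // Fail the run if any single request takes 10 seconds or longer.
  http_req_duration: ['max<10000'],
};

// In a k6 script you would then write:
//   export const options = { thresholds: warningThresholds };
```

Outages, crashes, and cascading failures in dependent services are not visible to k6 and still need to be checked in your own monitoring.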
Analysis Tips
Compare Pre-Spike and Spike Metrics
export const options = {
  thresholds: {
    // Each threshold applies only to requests that carry a matching
    // `phase` tag; k6 does not set this tag on its own.
    // Normal performance (first 2 minutes)
    'http_req_duration{phase:baseline}': ['p(95)<500'],
    // Spike performance (relaxed thresholds)
    'http_req_duration{phase:spike}': ['p(95)<2000'],
    // Recovery performance
    'http_req_duration{phase:recovery}': ['p(95)<500'],
  },
};
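For thresholds like these to match anything, each request has to be tagged with the phase in which it was sent. One possible approach is to derive the phase from the elapsed test time. In the sketch below, the PHASES table and the phaseFor helper are hypothetical, and the boundaries assume a 2-minute baseline followed by a 3m30s spike window:

```javascript
// Hypothetical phase boundaries, in seconds since the test started.
const PHASES = [
  { name: 'baseline', until: 120 },      // first 2 minutes
  { name: 'spike', until: 330 },         // 30 s ramp plus 3 min hold
  { name: 'recovery', until: Infinity }, // everything afterwards
];

// Returns the name of the phase that contains the given elapsed time.
function phaseFor(elapsedSeconds) {
  return PHASES.find((phase) => elapsedSeconds < phase.until).name;
}

// In a k6 script, the elapsed time is available from the k6/execution module:
//   import exec from 'k6/execution';
//   const phase = phaseFor(exec.instance.currentTestRunDuration / 1000);
//   http.get('https://quickpizza.grafana.com', { tags: { phase } });
```

Adjust the boundaries to match your own stages, since the phases are inferred from time and not from the executor's state.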
Key Questions
How long until the system stabilizes after the spike?
What is the error rate during peak spike?
Did any services crash or become unavailable?
Did auto-scaling respond in time?
Were there any data inconsistencies?
Did the system fully recover?