
When to Use SQL Query Stress

Ideal Use Cases

SQL Query Stress is purpose-built for testing individual queries and stored procedures under load, not full application testing.
Use SQL Query Stress for:
  • Query optimization validation: Test if your index/query changes actually improve performance
  • Concurrency testing: Understand how queries behave with multiple simultaneous executions
  • Resource impact analysis: Measure CPU, IO, and memory effects of queries under load
  • Regression testing: Verify that database changes don’t degrade performance
  • Capacity planning: Determine maximum throughput for specific operations
  • Stored procedure stress testing: Test procedures with realistic parameter distributions
Don’t use SQL Query Stress for:
  • Full application load testing (use JMeter, LoadRunner, etc.)
  • Testing web services or APIs directly
  • UI/UX performance testing
  • Multi-query transaction workflows (each test runs one query repeatedly)

Complementary Tools

SQL Query Stress works best alongside:
  • SQL Server Profiler/Extended Events: Capture broader database activity
  • Query Store: Track query performance over time
  • Database Engine Tuning Advisor: Get index recommendations
  • sp_BlitzCache: Analyze execution plan patterns
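As one example of pairing these tools, Query Store can confirm whether an optimization validated in SQL Query Stress holds up over time. A minimal sketch, assuming Query Store is enabled on the target database; the `%YourQueryFragment%` filter is illustrative:

```sql
-- Recent runtime stats for a tracked query
-- (requires ALTER DATABASE ... SET QUERY_STORE = ON)
SELECT TOP 20
    qt.query_sql_text,
    rs.avg_duration / 1000.0    AS avg_duration_ms,   -- stored in microseconds
    rs.avg_cpu_time / 1000.0    AS avg_cpu_ms,
    rs.avg_logical_io_reads     AS avg_logical_reads,
    rs.count_executions,
    rs.last_execution_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE '%YourQueryFragment%'  -- illustrative filter
ORDER BY rs.last_execution_time DESC;
```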

Setting Realistic Thread and Iteration Counts

Understanding Thread Behavior

Each thread in SQL Query Stress (LoadEngine.cs:120-158):
  1. Opens a dedicated SQL connection from the pool
  2. Executes the query for N iterations sequentially
  3. Optionally delays between iterations
  4. Runs at BelowNormal thread priority (LoadEngine.cs:152)
Threads ≠ concurrent executions at all times: because each thread runs its iterations sequentially, actual concurrency fluctuates during a test. Faster queries produce more overlap between threads.

Thread Count Guidelines

Start with your expected concurrent users:
  • OLTP applications: 20-100 threads per application server
  • Reporting systems: 5-20 threads (reports typically run sequentially)
  • Web applications: 10-50 threads (depends on connection pooling)
  • Batch processes: 1-10 threads (often single-threaded)
Progressive load testing approach:
  1. Baseline: 1 thread, 100 iterations (establishes single-user performance)
  2. Light load: 10 threads, 100 iterations each (1,000 total executions)
  3. Normal load: Match expected production concurrency
  4. Stress test: 2-3x normal load (find breaking points)
  5. Maximum throughput: Increase threads until performance plateaus
The LoadEngine automatically sets MinPoolSize = MaxPoolSize = threads (LoadEngine.cs:47-48) to pre-allocate connections. This means thread count directly impacts initial connection overhead.

Iteration Count Guidelines

Total executions = threads × iterations.
Choose iterations based on test goals:
Quick validation (10-50 iterations):
  • Verify query executes without errors
  • Get rough performance estimates
  • Fast feedback during development
Performance baselines (100-1000 iterations):
  • Establish reliable average execution times
  • Smooth out variance and outliers
  • Recommended minimum for comparative testing
Statistical significance (1000+ iterations):
  • Detailed percentile analysis (p50, p95, p99)
  • Identify rare edge cases
  • Production-like performance characterization
Endurance testing (10,000+ iterations):
  • Detect memory leaks
  • Identify performance degradation over time
  • Test plan cache stability
More iterations mean more statistical confidence but a longer test run. Balance thoroughness against total runtime.

Test Environment Setup

Environment Considerations

Ideal test environment:
  • Matches production hardware specifications
  • Isolated from other workloads
  • Contains production-like data volumes
  • Same SQL Server version and edition
  • Identical indexes and statistics
Common environment issues:
Problem: Your test server has 4 CPUs and 16GB RAM; production has 32 CPUs and 256GB RAM.
Impact:
  • CPU-bound queries will appear much slower
  • Memory pressure may force excessive disk IO
  • Parallelism behavior differs dramatically
Solution:
  • Scale thread counts proportionally (reduce by CPU ratio)
  • Focus on relative improvements, not absolute numbers
  • Test on production-equivalent hardware for capacity planning
  • Use Resource Governor to limit CPU/memory if testing on production
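If you must test on a production server, Resource Governor can cap what the stress run consumes. A minimal sketch; the pool, group, and login names are illustrative, and the classifier function must be created in master:

```sql
-- Cap the stress-test workload at 25% CPU and 25% memory
CREATE RESOURCE POOL StressTestPool
    WITH (MAX_CPU_PERCENT = 25, MAX_MEMORY_PERCENT = 25);

CREATE WORKLOAD GROUP StressTestGroup
    USING StressTestPool;
GO

-- Classifier: route the stress-test login into the limited group
-- (must live in master and be schema-bound)
CREATE FUNCTION dbo.fn_StressClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN (CASE WHEN SUSER_SNAME() = N'StressTestLogin'  -- illustrative login
                 THEN N'StressTestGroup'
                 ELSE N'default' END);
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_StressClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```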
Problem: Testing with 1,000 rows when production has 10 million rows.
Impact:
  • Entire table fits in memory (won’t in production)
  • Query plans differ dramatically (index scans vs seeks)
  • Locking behavior is unrealistic
  • Results are completely misleading
Solution:
  • Load representative data volumes
  • Include realistic data distributions (not just sequential IDs)
  • Update statistics: UPDATE STATISTICS TableName WITH FULLSCAN
  • Consider using production backup restored to test environment
Problem: You restored a production backup, but statistics are outdated.
Impact:
  • Query optimizer makes poor decisions
  • Execution plans don’t match production
  • Performance results are invalid
Solution:
-- Update all statistics with full scan
EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'

-- Verify statistics are current
SELECT 
    OBJECT_NAME(object_id) AS TableName,
    name AS StatisticName,
    STATS_DATE(object_id, stats_id) AS LastUpdated
FROM sys.stats
WHERE OBJECTPROPERTY(object_id, 'IsUserTable') = 1
ORDER BY LastUpdated;

Network Considerations

Run SQL Query Stress from the same network segment as your application servers, not your development machine. Network latency significantly impacts results.
Network impact on timing:
  • LoadEngine measures client-perceived time (LoadEngine.cs:413-435)
  • Includes query execution + network transfer
  • ForceDataRetrieval = true amplifies network effects for large result sets
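To separate network overhead from server work, you can run the same query once in SSMS with SET STATISTICS TIME ON and compare the server-reported elapsed time against the client-perceived Time that SQL Query Stress records. A sketch; the query and table are illustrative:

```sql
SET STATISTICS TIME ON;

SELECT OrderID, OrderDate, TotalDue   -- illustrative query
FROM dbo.Orders
WHERE CustomerID = 42;

SET STATISTICS TIME OFF;
-- Messages tab reports: "SQL Server Execution Times: CPU time = X ms, elapsed time = Y ms."
-- Client-perceived time minus server elapsed time ≈ network + client overhead.
```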

Baseline Before Optimization

Why Baselines Matter

Without a baseline, you can’t prove optimization worked. Baseline workflow:
  1. Document current state:
    • Current query or procedure code
    • Existing indexes
    • Current statistics collection state
  2. Run baseline test:
    • Enable CollectIoStats = true (QueryStressSettings.cs:122)
    • Enable CollectTimeStats = true (QueryStressSettings.cs:123)
    • Record: avg/min/max execution time, logical reads, CPU time
    • Save the test configuration
  3. Make ONE change:
    • Add an index, OR
    • Rewrite the query, OR
    • Adjust a parameter
    • Never multiple changes simultaneously
  4. Re-run identical test:
    • Same thread count
    • Same iteration count
    • Same connection settings
    • Same test data
  5. Compare results:
    • Did execution time improve?
    • Did logical reads decrease?
    • Did CPU time decrease?
Change one variable at a time. If you add an index AND rewrite the query, you won’t know which change helped (or if one change hurt).
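The baseline numbers from step 2 can also be cross-checked server-side. A sketch reading aggregate stats from the plan-cache DMVs; the `%YourQueryFragment%` filter is illustrative, and the stats only cover executions since the plan entered the cache:

```sql
-- Aggregate stats for a cached query since it entered the plan cache
SELECT TOP 10
    SUBSTRING(st.text, 1, 80)                           AS query_start,
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count / 1000   AS avg_elapsed_ms,  -- stored in microseconds
    qs.total_worker_time  / qs.execution_count / 1000   AS avg_cpu_ms,
    qs.total_logical_reads / qs.execution_count         AS avg_logical_reads
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%YourQueryFragment%'   -- illustrative filter
ORDER BY qs.total_worker_time DESC;
```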

Key Metrics to Baseline

Always track these QueryOutput metrics (LoadEngine.cs:535-545):
  1. Time (TimeSpan): Client-perceived execution time
    • Watch: Average, P95, P99, maximum
  2. LogicalReads (int): Pages read from buffer cache
    • Goal: Minimize (fewer IOs = faster queries)
    • Extracted via regex from InfoMessage events (LoadEngine.cs:280)
  3. CpuTime (int): SQL Server CPU milliseconds
    • Distinguishes CPU-bound from IO-bound queries
    • Extracted via regex from InfoMessage events (LoadEngine.cs:284-286)
  4. ElapsedTime (int): Server-side elapsed time
    • Compare to client Time to identify network overhead
  5. Exception count: Should be zero in successful tests
    • Any exceptions indicate problems

Interpreting Results Correctly

Understanding the Output

SQL Query Stress aggregates results from all threads via a BlockingCollection<QueryOutput> (LoadEngine.cs:18) that:
  • Collects output from each iteration of each thread
  • Reports progress to the UI (LoadEngine.cs:203)
  • Tracks active threads (LoadEngine.cs:202)

Common Interpretation Mistakes

Mistake: Relying only on average execution time
Why this is wrong:
  • Averages hide outliers that affect user experience
  • A few very slow executions get hidden by many fast ones
  • P95 and P99 metrics matter more for user satisfaction
Better approach:
  • Look at maximum execution time (worst case)
  • Calculate percentiles if your UI supports it
  • Investigate why some iterations are much slower
Mistake: Ignoring logical reads
Why this matters:
  • Logical reads = pages read from buffer cache
  • Direct indicator of IO efficiency
  • Reducing logical reads almost always improves performance
Example:
  • Query A: 50ms average, 10,000 logical reads
  • Query B: 55ms average, 1,000 logical reads
  • Query B is better: Lower IO means better scalability under load
When IO matters most:
  • Data not in cache (cold cache scenario)
  • High concurrency (cache contention)
  • Large databases (cache can’t hold everything)
Mistake: Counting the first (warm-up) iteration
Issue: The first iteration may be slower due to:
  • Plan compilation (LoadEngine caches SqlCommand objects, but plans may recompile)
  • Cold buffer cache
  • Statistics loading
Solution:
  • Run a warm-up test first
  • Use high iteration counts (100+) so first execution is insignificant
  • Or manually exclude first iteration from analysis
Mistake: Testing only under pristine conditions
Real production has:
  • Stale statistics
  • Fragmented indexes
  • Other concurrent workload
  • Plan cache pressure
  • TempDB contention
Better testing:
  • Test with realistic concurrent load (use multiple test scenarios)
  • Occasionally clear plan cache between tests
  • Test with statistics out of date
  • Run alongside other database activity

CPU-Bound vs IO-Bound Analysis

Compare CpuTime vs ElapsedTime from statistics:
CPU-bound query:
  • CPU time ≈ elapsed time (the server spends most of the elapsed time computing)
  • Solution: Optimize query logic, reduce computations, improve indexes to reduce rows processed
IO-bound query:
  • CPU time << elapsed time (the server spends most of the elapsed time waiting on IO)
  • Solution: Reduce logical reads, add covering indexes, increase memory
Wait-bound query:
  • Both CPU and elapsed time are low, but client Time is high
  • Solution: Investigate network issues and locking/blocking; check for waits in Extended Events
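For the wait-bound case, the wait-stats DMV shows where time is going. A sketch; the exclusion list of benign background waits is abbreviated and illustrative:

```sql
-- Top waits accumulated since the last service restart
SELECT TOP 10
    wait_type,
    wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_ms,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'BROKER_TASK_STOP', N'XE_TIMER_EVENT')  -- abbreviated benign-wait list
ORDER BY wait_time_ms DESC;
```

Snapshot this before and after a stress run and diff the two results to isolate the waits the test itself generated.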

Comparing Test Runs

Test Configuration Consistency

For valid comparisons, keep identical:
  • Connection string (same server, database)
  • Thread count (affects concurrency and connection pool)
  • Iteration count (affects statistical confidence)
  • CommandTimeout (QueryStressSettings.cs:30)
  • ForceDataRetrieval (QueryStressSettings.cs:48)
  • CollectIoStats / CollectTimeStats (QueryStressSettings.cs:18, 24)
  • DelayBetweenQueries (QueryStressSettings.cs:84)
  • Parameter query and mappings (if using parameters)
Save your test configurations! SQL Query Stress can save/load settings (QueryStressSettings.cs is serializable) to ensure consistency.

Making Valid Comparisons

Good comparison:
Test A: Original query, 50 threads, 200 iterations
        Avg: 125ms, Logical reads: 5000
        
Test B: Optimized query, 50 threads, 200 iterations (same settings)
        Avg: 45ms, Logical reads: 150
        
Conclusion: 64% faster, 97% fewer IOs - optimization successful
Invalid comparison:
Test A: 10 threads, 100 iterations
        Avg: 125ms
        
Test B: 50 threads, 500 iterations (different settings!)
        Avg: 45ms
        
Conclusion: Can't determine if optimization worked or if lower average 
            is due to caching from more iterations

Statistical Significance

Use high iteration counts (500+) when comparing close results. A 5ms difference in average time is meaningless with only 10 iterations but significant with 1000.
When results are close:
  • Run multiple test cycles (3-5 complete runs of each configuration)
  • Calculate standard deviation
  • Look for consistent patterns across runs
  • Small differences (<5%) may be noise

Advanced Testing Strategies

Progressive Load Testing

1 thread  → Baseline single-user performance
10 threads → Light load, check for obvious issues
25 threads → Moderate load, realistic workload
50 threads → Heavy load, stress testing
100 threads → Maximum load, find breaking point
Plot threads vs throughput to find optimal concurrency.

Parameter Distribution Testing

When using the parameter query feature (LoadEngine.cs:210-272), design parameter queries to reflect production:
  • Include same distribution of values (not just SELECT TOP 100)
  • Include edge cases (NULLs, empty strings, large values)
  • Randomize order to avoid cache effects
Example:
-- Bad: Sequential IDs (unrealistic cache behavior)
SELECT TOP 1000 CustomerID FROM Customers ORDER BY CustomerID

-- Good: Random sample matching production distribution
-- (TABLESAMPLE is page-based and approximate; ORDER BY NEWID() shuffles the sampled rows)
SELECT TOP 1000 CustomerID
FROM Customers
TABLESAMPLE (1000 ROWS)
ORDER BY NEWID()
The ParamServer rotates through rows using Interlocked.Increment (LoadEngine.cs:223), cycling when reaching the end.

Endurance Testing

Test for degradation over time.
Configuration:
  • Low thread count (5-10)
  • Very high iteration count (10,000-100,000)
  • Monitor for:
    • Increasing execution times
    • Growing memory usage
    • Plan cache bloat
    • TempDB growth
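These symptoms can be watched from DMVs during the run. A sketch polling plan-cache size and TempDB free space; run it periodically and compare snapshots:

```sql
-- Plan cache size and entry count (steady growth suggests cache bloat)
SELECT COUNT(*)                       AS cached_plans,
       SUM(size_in_bytes) / 1048576   AS cache_mb
FROM sys.dm_exec_cached_plans;

-- TempDB free space (steady shrinkage suggests growth driven by the workload)
SELECT SUM(unallocated_extent_page_count) * 8 / 1024 AS tempdb_free_mb
FROM tempdb.sys.dm_db_file_space_usage;
```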

Cancellation Testing

Test that your queries handle cancellation gracefully by stopping tests mid-run with KillQueriesOnCancel = true (the default).
The LoadEngine gives threads 5 seconds to exit gracefully (LoadEngine.cs:190) before forcing interruption. Queries should respond to cancellation within this window.

Common Pitfalls to Avoid

  1. Testing SELECT * instead of actual column list: Column list affects IO and network transfer
  2. Forgetting to update statistics after data changes: Results will be misleading
  3. Testing with empty TempDB: Production TempDB is usually busy with other operations
  4. Not clearing plan cache between major changes: May test cached plan, not new query
  5. Using COUNT(*) to validate results: Adds overhead that skews timing
  6. Testing stored procedures without parameters: Real usage always includes parameters
  7. Assuming faster = better: Sometimes slower query with fewer reads is more scalable
  8. Not monitoring SQL Server during tests: Use Perfmon/DMVs to see resource utilization
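For pitfall 4, the plan cache and buffer pool can be flushed between major changes. These commands affect every session on the instance; run them only in a dedicated test environment:

```sql
-- Test environments only
CHECKPOINT;              -- flush dirty pages so DROPCLEANBUFFERS fully empties the cache
DBCC DROPCLEANBUFFERS;   -- cold buffer cache: next run pays physical IO
DBCC FREEPROCCACHE;      -- clear cached plans: next run pays compilation
```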
