Drift’s Monte Carlo simulation engine is designed for high performance, running 100,000 simulations in ~500ms using NumPy vectorization and multiprocessing. This guide covers how to tune performance based on your deployment environment.
Simulation Architecture
The Python simulation engine (simulation/monte_carlo.py) uses:
- NumPy Vectorization - Operates on entire arrays instead of loops
- Multiprocessing - Distributes work across CPU cores
- Batch Processing - Splits simulations into chunks for parallel execution
- Pre-generated Random Numbers - Generates all random values upfront for efficiency
Worker Configuration
Default Behavior
By default, Drift caps workers at 4 to balance performance and resource usage (simulation/monte_carlo.py:237):
```python
if n_workers is None:
    n_workers = min(cpu_count(), 4)  # Cap at 4 for demo
```
Adjusting Worker Count
From API:
```typescript
// apps/api/src/services/simulationService.ts
const results = await runPythonSimulation(request, 8); // second argument: worker count
```
From Python:
```python
from simulation.monte_carlo import run_monte_carlo

results = run_monte_carlo(
    request=simulation_request,
    n_workers=8  # Specify worker count
)
```
Recommended Worker Counts
| Environment | CPU Cores | Recommended Workers | Reasoning |
|---|---|---|---|
| Development | 2-4 | 2 | Leave cores for IDE, browser |
| Laptop/Desktop | 4-8 | 4 | Balance performance & battery |
| Server (shared) | 8-16 | 6-8 | Don’t monopolize all cores |
| Server (dedicated) | 16+ | 12-16 | Maximize throughput |
| Serverless | 1-2 | 1 | Limited vCPUs per function |
Worker count should be ≤ CPU cores. Over-subscribing (e.g., 16 workers on 4 cores) causes context switching overhead and degrades performance.
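The table above can be encoded as a small helper. The environment names and caps here are assumptions taken from this guide, not values Drift itself defines:

```python
import os

def pick_worker_count(environment: str) -> int:
    """Suggest a worker count per environment (caps assumed from this guide)."""
    caps = {
        "development": 2,
        "laptop": 4,
        "server-shared": 8,
        "server-dedicated": 16,
        "serverless": 1,
    }
    cap = caps.get(environment, 4)
    # Never exceed the physical core count (avoids context-switching overhead)
    return min(os.cpu_count() or 1, cap)
```

Capping at `min(cpu_count(), cap)` enforces the rule above: workers never exceed cores.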
Benchmarking Workers
Use the built-in benchmark function to test optimal worker count:
```python
from simulation.monte_carlo import benchmark_simulation

results = benchmark_simulation(simulation_request)
print(results)
```
Output:
```json
{
  "1_workers": {
    "time_seconds": 2.45,
    "simulations_per_second": 40816
  },
  "2_workers": {
    "time_seconds": 1.28,
    "simulations_per_second": 78125
  },
  "4_workers": {
    "time_seconds": 0.68,
    "simulations_per_second": 147058
  },
  "speedup_4x": 3.6
}
```
Configured in simulation/monte_carlo.py:318-340.
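To act on benchmark output shaped like the dict above, a small helper (hypothetical, not part of Drift) can pick the worker count with the highest throughput:

```python
def best_worker_count(benchmark: dict) -> int:
    """Return the worker count with the highest throughput.

    Expects keys like "4_workers" mapping to dicts containing
    "simulations_per_second"; other keys (e.g. "speedup_4x") are ignored.
    """
    best_n, best_rate = 1, 0.0
    for key, stats in benchmark.items():
        if not key.endswith("_workers"):
            continue
        rate = stats["simulations_per_second"]
        if rate > best_rate:
            best_n, best_rate = int(key.split("_")[0]), rate
    return best_n

results = {
    "1_workers": {"time_seconds": 2.45, "simulations_per_second": 40816},
    "2_workers": {"time_seconds": 1.28, "simulations_per_second": 78125},
    "4_workers": {"time_seconds": 0.68, "simulations_per_second": 147058},
    "speedup_4x": 3.6,
}
print(best_worker_count(results))  # → 4
```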
Simulation Parameters
Number of Simulations
Controls the number of Monte Carlo paths to simulate.
Default: 100,000 simulations (simulation/models.py:210)
```python
class SimulationParams(BaseModel):
    n_simulations: int = 100000
```
Trade-offs:
| Simulations | Execution Time | Accuracy | Use Case |
|---|---|---|---|
| 1,000 | ~50ms | Low confidence intervals | Quick prototyping |
| 10,000 | ~100ms | Reasonable accuracy | Development/testing |
| 100,000 | ~500ms | High statistical confidence | Production (default) |
| 500,000 | ~2.5s | Very high confidence | Research, critical decisions |
| 1,000,000 | ~5s | Maximum accuracy | Academic analysis |
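The accuracy column follows from Monte Carlo error scaling: the standard error of an estimated probability shrinks as 1/√n, so 100× more simulations buys only 10× tighter confidence intervals. A quick illustration (the p = 0.73 value matches the example result later in this guide):

```python
import math

def success_prob_stderr(p: float, n: int) -> float:
    """Standard error of a success probability estimated from n
    Monte Carlo paths: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Error shrinks by ~10x for every 100x more simulations
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(success_prob_stderr(0.73, n), 4))
```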
Adjusting:
```typescript
// From API
const request: SimulationRequest = {
  simulationParams: {
    nSimulations: 50000  // Reduce for faster response
  }
}
```

```python
# From Python
params = SimulationParams(n_simulations=50000)
```
For interactive use cases (real-time what-if scenarios), use 10,000-50,000 simulations to keep response times under 200ms. For final results, use 100,000+ for statistical rigor.
Volatility Parameters
Control the randomness of income, expenses, and returns.
Configured in simulation/models.py:210-224:
```python
class SimulationParams(BaseModel):
    income_volatility: float = 0.05      # ±5% income variance
    expense_volatility: float = 0.15     # ±15% spending variance
    annual_return_mean: float = 0.07     # 7% average annual return
    annual_return_std: float = 0.15      # 15% return volatility
    inflation_rate: float = 0.025        # 2.5% annual inflation
    inflation_volatility: float = 0.01   # ±1% inflation variance
    emergency_probability: float = 0.08  # 8% monthly emergency chance
    emergency_min: float = 500           # Min emergency cost
    emergency_max: float = 3000          # Max emergency cost
```
Impact on Performance:
- Higher volatility → Wider outcome distributions (more realistic)
- Lower volatility → Narrower distributions (overly optimistic)
- No impact on execution time (pre-generated random numbers)
Risk Tolerance Presets
Drift includes three risk profiles that adjust investment returns (simulation/models.py:234-256):
```python
risk_profiles = {
    "low": {"annual_return_mean": 0.04, "annual_return_std": 0.08},     # Conservative (bonds)
    "medium": {"annual_return_mean": 0.07, "annual_return_std": 0.15},  # Balanced (60/40)
    "high": {"annual_return_mean": 0.10, "annual_return_std": 0.20},    # Aggressive (stocks)
}
```
Usage:
```python
params = SimulationParams.from_risk_tolerance("medium")
```
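A minimal sketch of how such a preset factory could work, using a plain dataclass as a stand-in for the Pydantic model (the real from_risk_tolerance lives in simulation/models.py):

```python
from dataclasses import dataclass

@dataclass
class SimulationParams:
    # Defaults mirror the "medium" values quoted earlier in this guide
    annual_return_mean: float = 0.07
    annual_return_std: float = 0.15

    @classmethod
    def from_risk_tolerance(cls, risk: str) -> "SimulationParams":
        risk_profiles = {
            "low": {"annual_return_mean": 0.04, "annual_return_std": 0.08},
            "medium": {"annual_return_mean": 0.07, "annual_return_std": 0.15},
            "high": {"annual_return_mean": 0.10, "annual_return_std": 0.20},
        }
        return cls(**risk_profiles[risk])

params = SimulationParams.from_risk_tolerance("high")
print(params.annual_return_mean)  # → 0.1
```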
Performance Techniques
1. Vectorization vs. Looping
The engine uses NumPy array operations instead of Python loops for ~100x speedup.
Fast (vectorized):
```python
# Simulate all months for all scenarios at once
income = base_income * income_multiplier * income_noise[:, month]  # Shape: (100000,)
```
Slow (looping):
```python
# Simulate one scenario at a time
for sim in range(n_simulations):
    for month in range(months):
        income = base_income * random.gauss(1.0, volatility)
```
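As a self-contained illustration (not Drift's code), the vectorized expression and the loop produce identical values — the speedup comes purely from moving the loop into NumPy's C internals:

```python
import numpy as np

n_sims, months = 1000, 12
base_income = 5000.0
rng = np.random.default_rng(42)
income_noise = rng.normal(1.0, 0.05, (n_sims, months))

# Vectorized: one array op per month covers every scenario at once
month = 3
vectorized = base_income * income_noise[:, month]

# Looped: same arithmetic, one scenario at a time
looped = np.array([base_income * income_noise[sim, month] for sim in range(n_sims)])

print(np.allclose(vectorized, looped))  # → True
```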
2. Pre-generating Random Numbers
All random values are generated upfront (simulation/monte_carlo.py:45-79):
```python
# Pre-generate all random numbers for efficiency
income_noise = rng.normal(1.0, params.income_volatility, (n_sims, months))
spending_noise = rng.normal(1.0, params.expense_volatility, (n_sims, months))
emergency_events = rng.random((n_sims, months)) < params.emergency_probability
market_returns = rng.normal(monthly_return_mean, monthly_return_std, (n_sims, months))
```
Why this is fast:
- NumPy’s C-based RNG is much faster than Python’s random module
- Generating in bulk amortizes function call overhead
- Enables vectorized operations later
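A minimal sketch of the bulk-generation pattern, using parameter values assumed from earlier in this guide:

```python
import numpy as np

n_sims, months = 100_000, 36
rng = np.random.default_rng(0)

# One bulk call per random source, instead of n_sims * months tiny calls
income_noise = rng.normal(1.0, 0.05, (n_sims, months))
emergency_events = rng.random((n_sims, months)) < 0.08

print(income_noise.shape)       # → (100000, 36)
print(emergency_events.mean())  # ≈ 0.08 across 3.6M draws
```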
3. Batch Processing
Work is split across workers in batches (simulation/monte_carlo.py:240-261):
```python
# Split 100,000 simulations across 4 workers
batches = np.array_split(seeds, n_workers)  # [25k, 25k, 25k, 25k]

with Pool(n_workers) as pool:
    for balances, batch_id in pool.imap_unordered(run_simulation_batch, batch_args):
        results_list.append(balances)
```
Benefits:
- Near-linear scaling with CPU cores (4 workers → 3.6x speedup)
- Progress reporting per batch
- Fault isolation (one worker crash doesn’t kill entire simulation)
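The split step can be seen in isolation (the worker pool is omitted to keep this sketch self-contained); np.array_split also tolerates counts that do not divide evenly:

```python
import numpy as np

n_simulations, n_workers = 100_000, 4
seeds = np.arange(n_simulations)

batches = np.array_split(seeds, n_workers)
print([len(b) for b in batches])  # → [25000, 25000, 25000, 25000]

# array_split handles uneven divisions gracefully (unlike np.split)
print([len(b) for b in np.array_split(seeds, 3)])  # → [33334, 33333, 33333]
```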
4. Memory Optimization
The engine stores only final balances, not full time series:
```python
balances = np.zeros((n_sims, months + 1))  # Shape: (100000, 37)
# After simulation:
return balances[:, -1]  # Return only final month (100000,)
```
Memory usage for 100k simulations (36-month timeline):
- Full time series: 100k × 37 months × 8 bytes = 29.6 MB
- Final balances only: 100k × 8 bytes = 0.8 MB
- Savings: 97% reduction
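The arithmetic above can be checked with NumPy's nbytes (array sizes assumed from this guide, float64 at 8 bytes per element):

```python
import numpy as np

n_sims, months = 100_000, 36

full_series = np.zeros((n_sims, months + 1))  # float64 = 8 bytes each
final_only = full_series[:, -1].copy()

print(full_series.nbytes / 1e6)  # → 29.6 (MB)
print(final_only.nbytes / 1e6)   # → 0.8 (MB)
print(1 - final_only.nbytes / full_series.nbytes)  # ≈ 0.97 reduction
```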
Account-Aware Simulation
When using Plaid integration, Drift supports per-account modeling.
Enabling Account-Aware Mode
```python
params = SimulationParams(
    use_account_aware_simulation=True,
    credit_cards=[
        CreditCardParams(id="card1", balance=5000, apr=18.99, minimum_payment=150),
        CreditCardParams(id="card2", balance=3000, apr=24.99, minimum_payment=90),
    ],
    loans=[
        LoanParams(id="loan1", balance=25000, interest_rate=4.5, monthly_payment=450),
    ]
)
```
Account-aware simulation is slower due to per-account interest calculations:
| Mode | Execution Time | Accuracy | Use Case |
|---|---|---|---|
| Legacy | ~500ms | Aggregated debt | Quick estimates |
| Account-Aware | ~850ms | Per-card interest | Plaid integration |
Code path (simulation/monte_carlo.py:152-213):
```python
if params.use_account_aware_simulation:
    # Per-card interest accrual
    for i in range(len(params.credit_cards)):
        monthly_rate = card_aprs[i] / 12
        interest = card_balances[:, i] * monthly_rate
        card_balances[:, i] += interest
        # ... payment logic
else:
    # Aggregated legacy logic (faster)
    balances[:, month + 1] = balances[:, month] + income - spending - loan_payments + returns
```
Account-aware mode is required when using Plaid data to accurately model per-card APRs and loan amortization schedules.
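A standalone sketch of one month of per-card accrual, vectorized across all simulations. Converting the percent-style APRs from the example above into decimals is an assumption here; Drift may store them differently:

```python
import numpy as np

n_sims = 4
# APRs quoted in percent, as in the example params above (assumption)
card_aprs = np.array([18.99, 24.99]) / 100.0
card_balances = np.tile([5000.0, 3000.0], (n_sims, 1))  # Shape: (n_sims, n_cards)

# One month of per-card interest accrual, vectorized over simulations
for i in range(card_aprs.shape[0]):
    monthly_rate = card_aprs[i] / 12
    card_balances[:, i] += card_balances[:, i] * monthly_rate

print(card_balances[0])  # ≈ [5079.13, 3062.48]
```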
Progress Reporting
For long-running simulations, enable progress callbacks:
```python
def progress_callback(update: dict):
    print(f"Worker {update['worker']}: {update['percentage']}% complete")

results = run_monte_carlo(
    request=simulation_request,
    n_workers=4,
    progress_callback=progress_callback
)
```
Callback payload:
```json
{
  "type": "progress",
  "completed": 25000,
  "total": 100000,
  "worker": 0,
  "percentage": 25.0
}
```
Configured in simulation/monte_carlo.py:254-260.
Deployment Recommendations
Local Development
```python
# Fast iteration, lower accuracy
SimulationParams(
    n_simulations=10000,  # 100ms response
    use_account_aware_simulation=False
)
```
Run with:
```shell
cd simulation
source venv/bin/activate
python main.py --mode simulate --input '{...}'
```
Production API
```python
# Balance speed and accuracy
SimulationParams(
    n_simulations=100000,  # 500ms response
    use_account_aware_simulation=True  # If using Plaid
)

# Configure workers based on server CPU count
n_workers = min(cpu_count(), 8)
```
Hosting considerations:
- Vercel/Netlify: Limited to 10s execution time → Use 50k simulations or offload to worker service
- Railway/Render: Full CPU access → Use default 100k simulations with auto-detected workers
- AWS Lambda: 1-2 vCPUs → Use 1 worker, 50k simulations, optimize for cold start
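One way to encode these hosting profiles is an environment-driven lookup. Everything here — the DRIFT_ENV variable and the profile values — is a hypothetical sketch based on the guidance above, not Drift configuration:

```python
import os

# Hypothetical profiles following the hosting guidance above
PROFILES = {
    "vercel": {"n_simulations": 50_000, "n_workers": 2},
    "railway": {"n_simulations": 100_000, "n_workers": None},  # None = auto-detect
    "lambda": {"n_simulations": 50_000, "n_workers": 1},
}

def simulation_profile() -> dict:
    """Pick a simulation profile from a (hypothetical) DRIFT_ENV variable."""
    env = os.environ.get("DRIFT_ENV", "railway")
    return PROFILES.get(env, PROFILES["railway"])

print(simulation_profile())
```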
Serverless (AWS Lambda)
Lambda functions have limited CPU and 15-minute timeout:
```python
# Optimized for Lambda constraints
SimulationParams(
    n_simulations=25000,  # ~200ms execution
    use_account_aware_simulation=False  # Reduce complexity
)

n_workers = 1  # Lambda typically has 1-2 vCPUs
```
Package size optimization:
```shell
# Use slim NumPy build
pip install numpy --no-binary :all:

# Strip debug symbols
find . -name "*.so" -exec strip {} \;
```
Dedicated Server
Maximize throughput on dedicated hardware:
```python
# Maximum accuracy and speed
SimulationParams(
    n_simulations=500000,  # ~2.5s execution
    use_account_aware_simulation=True
)

n_workers = cpu_count()  # Use all available cores
```
Built-in Metrics
Results include performance metadata:
```json
{
  "success_probability": 0.73,
  "median_outcome": 52000,
  "simulations_run": 100000,
  "workers_used": 4,
  "assumptions": { ... }
}
```
Logging Execution Time
```python
import time

start = time.time()
results = run_monte_carlo(request, n_workers=4)
elapsed = time.time() - start

print(f"Simulations: {results.simulations_run}")
print(f"Workers: {results.workers_used}")
print(f"Time: {elapsed:.2f}s")
print(f"Throughput: {results.simulations_run / elapsed:.0f} sims/sec")
```
| Metric | Target | Excellent | Needs Tuning |
|---|---|---|---|
| Execution time (100k sims) | < 1s | < 500ms | > 2s |
| Throughput | > 100k sims/sec | > 200k sims/sec | < 50k sims/sec |
| Speedup (4 workers vs 1) | > 3x | > 3.5x | < 2x |
| Memory usage | < 100 MB | < 50 MB | > 200 MB |
Troubleshooting
Problem: Slow Simulation (> 5s for 100k sims)
Possible causes:
- Too many workers (context switching overhead)
- Python environment not optimized (missing NumPy binaries)
- Account-aware mode with many accounts
Solutions:
```python
# Reduce workers
n_workers = min(cpu_count(), 4)

# Verify NumPy is using optimized BLAS
import numpy as np
np.show_config()  # Should show MKL, OpenBLAS, or ATLAS

# Reduce account complexity
params.use_account_aware_simulation = False
```
Problem: Out of Memory
Cause: Too many simulations or long timeline
Solutions:
```python
# Reduce simulations
params.n_simulations = 50000

# Process in smaller batches
n_workers = 8  # More workers = smaller batches per worker
```
Problem: Poor Speedup with Multiple Workers
Cause: Problem size too small — with multiprocessing, each worker must amortize process startup and result-serialization overhead, so tiny per-worker batches erase the parallel gains
Solutions:
```python
# Fewer workers = larger batches, less per-process overhead
n_workers = 2

# Profile to confirm where time is actually going:
# python -m cProfile simulation/main.py
```
For optimal performance on most systems, use 4 workers with 100,000 simulations. This provides a good balance of speed (~500ms), accuracy, and resource usage.