- Chaos engineering on Wikipedia: Describes the basic concepts, history and tools related to chaos engineering.
- Chaos engineering, the history, principles and practices: Excellent article about chaos engineering by Gremlin, a chaos engineering platform.
- Understanding chaos engineering and resilience: Intro to chaos engineering in the context of Azure Chaos Studio, a managed service that uses chaos engineering to help you measure, understand, and improve your cloud application and service resilience.
Chaos engineering with Simmy
Simmy is a major new addition to the Polly library starting with v8.3.0. It adds a chaos engineering and fault-injection dimension to Polly by providing strategies that selectively inject faults, latency, custom behavior, or fake results.

Chaos strategies are seamlessly integrated into Polly v8’s resilience pipeline architecture, allowing you to combine them with retry, circuit breaker, timeout, and other resilience strategies.
Basic usage
Here’s how to configure chaos strategies in your resilience pipeline:
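A minimal sketch of such a pipeline, assuming the `Polly.Core` package (v8.3.0 or later) and .NET 6 or later. The injection rates, the retry and timeout settings, and the `SimulateCacheRestartAsync` helper are illustrative assumptions, not recommended values:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;
using Polly.Simmy;

var builder = new ResiliencePipelineBuilder<HttpResponseMessage>();

// First, add the regular resilience strategies.
builder
    .AddRetry(new RetryStrategyOptions<HttpResponseMessage> { MaxRetryAttempts = 3 })
    .AddTimeout(TimeSpan.FromSeconds(5));

// Then add the chaos strategies, so the chaos is injected close to the
// actual call and the strategies above get a chance to react to it.
builder
    .AddChaosFault(0.02, () => new InvalidOperationException("Injected by chaos strategy!"))
    .AddChaosLatency(0.50, TimeSpan.FromSeconds(1))
    .AddChaosOutcome(0.10, () => new HttpResponseMessage(HttpStatusCode.InternalServerError))
    .AddChaosBehavior(0.01, cancellationToken => SimulateCacheRestartAsync(cancellationToken));

ResiliencePipeline<HttpResponseMessage> pipeline = builder.Build();

// Run a callback through the pipeline; chaos is injected at random at the configured rates.
HttpResponseMessage response = await pipeline.ExecuteAsync(
    static cancellationToken => ValueTask.FromResult(new HttpResponseMessage(HttpStatusCode.OK)),
    CancellationToken.None);

Console.WriteLine(response.StatusCode);

// Hypothetical custom behavior, standing in for something like restarting a cache.
static ValueTask SimulateCacheRestartAsync(CancellationToken cancellationToken)
{
    Console.WriteLine("Chaos behavior: simulating a cache restart.");
    return ValueTask.CompletedTask;
}
```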
`AddChaosFault`, `AddChaosLatency`, `AddChaosOutcome`, and `AddChaosBehavior` take effect sequentially, in the order you add them. In the above example, the fault strategy comes before the latency strategy, so an execution that receives an injected fault fails immediately instead of first waiting out an injected delay. If you put `AddChaosLatency` before `AddChaosFault`, you get different behavior: an execution can be delayed and then still fail with the injected fault.

Built-in chaos strategies
| Strategy | Type | What does the strategy do? |
|---|---|---|
| Fault | Proactive | Injects exceptions in your system. |
| Outcome | Reactive | Injects fake outcomes (results or exceptions) in your system. |
| Latency | Proactive | Injects latency into executions before the calls are made. |
| Behavior | Proactive | Allows you to inject any extra behavior, before a call is placed. |
Common options across strategies
All chaos strategies share these configuration options:

| Property | Default Value | Description |
|---|---|---|
| `InjectionRate` | 0.001 | A decimal between 0 and 1 inclusive. The strategy injects the chaos, randomly, that proportion of the time. For example, if 0.2, twenty percent of calls are randomly affected; if 0.01, one percent of calls; if 1, all calls. |
| `InjectionRateGenerator` | `null` | Generates the injection rate for a given execution. The value must be between 0 and 1 inclusive. |
| `Enabled` | `true` | Determines whether the strategy is enabled or not. |
| `EnabledGenerator` | `null` | The generator that indicates whether the chaos strategy is enabled for a given execution. |
- If both `InjectionRate` and `InjectionRateGenerator` are specified, then `InjectionRate` will be ignored.
- If both `Enabled` and `EnabledGenerator` are specified, then `Enabled` will be ignored.
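The precedence rules above can be sketched with the options-based overload. This assumes the `Polly.Core` package (v8.3.0 or later); the latency and rate values are arbitrary and chosen only for illustration:

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Simmy;
using Polly.Simmy.Latency;

var options = new ChaosLatencyStrategyOptions
{
    Latency = TimeSpan.FromMilliseconds(200),

    // Ignored, because InjectionRateGenerator is also set.
    InjectionRate = 0.2,
    InjectionRateGenerator = static args => new ValueTask<double>(0.05),

    // Ignored, because EnabledGenerator is also set.
    Enabled = false,
    EnabledGenerator = static args => new ValueTask<bool>(true),
};

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddChaosLatency(options)
    .Build();

// Roughly 5% of executions are delayed by 200 ms before the callback runs.
pipeline.Execute(() => Console.WriteLine("Executed, possibly after an injected delay."));
```

Both generators receive the execution's `ResilienceContext` through `args.Context`, which is what makes per-execution decisions possible.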
Major differences from Polly.Contrib.Simmy
This section highlights the major differences compared to the `Polly.Contrib.Simmy` library:
- From `MonkeyPolicy` to `ChaosStrategy`: We’ve updated the terminology from `Monkey` to `Chaos` to better align with the well-recognized principles of chaos engineering.
- Unified configuration options: The `InjectOptionsBase` and `InjectOptionsAsyncBase` are now consolidated into `ChaosStrategyOptions`. This change brings Simmy in line with the Polly v8 API, offering built-in support for options-based configuration and seamless integration of synchronous and asynchronous executions.
- Chaos strategies enabled by default: Adding a chaos strategy (previously known as monkey policy) now means it’s active right away. This is a departure from earlier versions, where the monkey policy had to be explicitly enabled.
- API changes: The new version of Simmy introduces several API updates:
| From | To |
|---|---|
| `InjectException` | `AddChaosFault` |
| `InjectResult` | `AddChaosOutcome` |
| `InjectBehavior` | `AddChaosBehavior` |
| `InjectLatency` | `AddChaosLatency` |
- Sync and async unification: Previously, Simmy had separate methods for each policy variant, such as `InjectLatency`, `InjectLatencyAsync`, `InjectLatency<T>`, and `InjectLatencyAsync<T>`. In the new version, based on Polly v8, these methods are combined into a single `AddChaosLatency` extension that works for both `ResiliencePipelineBuilder` and `ResiliencePipelineBuilder<T>`. The same applies to all types of chaos strategies (Outcome, Fault, Latency, and Behavior).
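As a rough illustration of the unified API, the sketch below calls the same `AddChaosLatency` extension on both builder types and then runs the generic pipeline synchronously and asynchronously. It assumes the `Polly.Core` package (v8.3.0 or later) and .NET 6 or later; the rate and delay are arbitrary:

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Simmy;

// Non-generic builder.
ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddChaosLatency(0.1, TimeSpan.FromMilliseconds(50))
    .Build();

// Generic builder, using the very same extension method.
ResiliencePipeline<string> typedPipeline = new ResiliencePipelineBuilder<string>()
    .AddChaosLatency(0.1, TimeSpan.FromMilliseconds(50))
    .Build();

// One pipeline serves both synchronous and asynchronous executions.
string viaSync = typedPipeline.Execute(() => "ok");
string viaAsync = await typedPipeline.ExecuteAsync(static token => ValueTask.FromResult("ok"));

pipeline.Execute(() => Console.WriteLine($"{viaSync} {viaAsync}"));
```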
Inject chaos selectively
You can dynamically adjust the frequency and timing of chaos injection. For instance, in pre-production and test environments, it’s sensible to consistently inject chaos. This proactive approach helps in preparing for potential failures. In production environments, however, you may prefer to limit chaos to certain users and tenants, ensuring that regular users remain unaffected. You control this per execution through the `EnabledGenerator` and `InjectionRateGenerator` options.

Centralize chaos management
We recommend encapsulating the chaos decisions and injection rate in a shared class, such as `IChaosManager`:
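One possible shape for such a class, sketched under assumptions: only the `IChaosManager` name comes from the text above. Its two members, the `EnvironmentChaosManager` implementation, the `TenantKey` property key, the tenant name, and the rates are all hypothetical. The sketch assumes the `Polly.Core` package (v8.3.0 or later) and .NET 6 or later:

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Simmy;
using Polly.Simmy.Fault;

IChaosManager chaosManager = new EnvironmentChaosManager("Production", chaosTenant: "contoso-test");

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddChaosFault(new ChaosFaultStrategyOptions
    {
        // Both decisions are delegated to the shared manager for every execution.
        EnabledGenerator = args => chaosManager.IsChaosEnabled(args.Context),
        InjectionRateGenerator = args => chaosManager.GetInjectionRate(args.Context),
        FaultGenerator = static args =>
            new ValueTask<Exception?>(new InvalidOperationException("Injected by chaos strategy!")),
    })
    .Build();

// The caller attaches the tenant to the context so the manager can inspect it.
ResilienceContext context = ResilienceContextPool.Shared.Get();
context.Properties.Set(EnvironmentChaosManager.TenantKey, "contoso-test");
try
{
    await pipeline.ExecuteAsync(static ctx => ValueTask.CompletedTask, context);
    Console.WriteLine("No fault injected this time.");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"Fault injected: {ex.Message}");
}
finally
{
    ResilienceContextPool.Shared.Return(context);
}

// Hypothetical contract for centralized chaos decisions.
public interface IChaosManager
{
    ValueTask<bool> IsChaosEnabled(ResilienceContext context);
    ValueTask<double> GetInjectionRate(ResilienceContext context);
}

// Hypothetical implementation: always on outside production,
// and limited to a single opted-in tenant in production.
public sealed class EnvironmentChaosManager : IChaosManager
{
    public static readonly ResiliencePropertyKey<string> TenantKey = new("tenant");

    private readonly bool _isProduction;
    private readonly string _chaosTenant;

    public EnvironmentChaosManager(string environmentName, string chaosTenant)
    {
        _isProduction = string.Equals(environmentName, "Production", StringComparison.OrdinalIgnoreCase);
        _chaosTenant = chaosTenant;
    }

    public ValueTask<bool> IsChaosEnabled(ResilienceContext context)
    {
        if (!_isProduction)
        {
            return new ValueTask<bool>(true);
        }

        bool isChaosTenant = context.Properties.TryGetValue(TenantKey, out var tenant)
            && tenant == _chaosTenant;
        return new ValueTask<bool>(isChaosTenant);
    }

    // Illustrative rates: inject often outside production, rarely in production.
    public ValueTask<double> GetInjectionRate(ResilienceContext context)
        => new ValueTask<double>(_isProduction ? 0.03 : 0.5);
}
```

Keeping the decision logic in one place means every chaos strategy in every pipeline follows the same rules, and those rules can be changed without touching pipeline configuration.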
Telemetry
The telemetry of chaos strategies is seamlessly integrated with Polly’s telemetry infrastructure. The chaos strategies produce the following events, all reported with `Information` severity:
- `Chaos.OnFault`: Reported when a fault is injected.
- `Chaos.OnOutcome`: Reported when an outcome is injected.
- `Chaos.OnLatency`: Reported when latency is injected.
- `Chaos.OnBehavior`: Reported when a behavior is injected.
Motivation
There are a lot of questions when it comes to chaos engineering and making sure that a system is actually ready to face the worst possible scenarios:
- Is my system resilient enough?
- Am I handling the right exceptions/scenarios?
- How will my system behave if X happens?
- How can I test without waiting for a handled (or even unhandled) exception to happen in my production environment?
What is needed to simulate chaotic scenarios?
- A way to simulate failures of dependencies (any service dependency for example).
- A way to define when to fail based on external factors, such as global configuration or a rule.
- A way to revert easily, to control the blast radius.
- Production-grade quality, so that it can run in a production or near-production system with automation.
Next steps
- Fault Injection: Inject exceptions to test error handling.
- Latency Injection: Add delays to simulate slow operations.
- Outcome Injection: Inject fake results or responses.
- Behavior Injection: Execute custom behavior before operations.