"With so many interacting components, the number of things that can go wrong in a distributed system is enormous. You’ll never be able to prevent all possible failure modes, but you can identify many of the weaknesses in your system before they’re triggered by these events. This report introduces you to Chaos Engineering, a method of experimenting on infrastructure that lets you expose weaknesses before they become a real problem.
Members of the Netflix team that developed Chaos Engineering explain how to apply these principles to your own system. By introducing controlled experiments, you’ll learn how emergent behavior from component interactions can cause your system to drift into an unsafe, chaotic state."
AWS Podcast | Listen & Learn About AWS
#238: Chaos Engineering and Architecture with Adrian Cockcroft | April 8, 2018
AWS re:Invent 2017: Digital Transformation (ARC219) - YouTube
Related book:
Drift into Failure: From Hunting Broken Components to Understanding Complex Systems, by Sidney Dekker (eBook, Amazon.com)
Also mentioned in the podcast:
Open Source at AWS
Open Source at AWS @ GitHub
GitHub - awslabs/sockeye: Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet