Welcome to Episode 45 of The Hockey Stick Show! I’m Miko Pawlikowski, and in this episode, I had the chance to chat with Rob Charlwood, Principal Site Reliability Engineer at Cazoo. We explored the world of chaos engineering — what it is, how to get started, and why it’s essential for building resilient systems.
Demystifying Chaos Engineering
Rob broke down what chaos engineering really means: intentional, controlled experiments designed to reveal weaknesses in systems before they cause real problems. We discussed how chaos engineering isn’t about breaking things for fun, but about building confidence in how systems behave under stress.
From Incident Response to Proactive Resilience
Rob shared how his background in incident response and SRE shaped his passion for chaos engineering. Instead of waiting for outages, he advocates for proactive discovery of failure points through thoughtful experimentation — turning chaos into a tool for continuous learning.
How to Start with Chaos Engineering
For teams looking to adopt chaos engineering, Rob’s advice is to start small and safe:
Begin with game days or small-scale failure injections.
Focus on learning and sharing results to build a culture of resilience.
Leverage existing tools (like the open-source Chaos Mesh or the commercial Gremlin) that help lower the barrier to entry.
Balancing Risk and Safety
We talked about how to design experiments responsibly, especially in production environments. Rob emphasized:
The importance of defining clear hypotheses.
Building guardrails and abort mechanisms.
Communicating openly with stakeholders.
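To make these points concrete, here is a minimal Python sketch of an experiment built around a hypothesis, a guardrail, and an abort path. It is an illustration only, not something from the episode: the `Experiment` class, the `run` function, the 5% threshold, and the simulated error rate are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """A hypothetical chaos experiment: hypothesis, fault, and guardrail."""
    hypothesis: str                   # what we expect to remain true
    inject: Callable[[], None]        # starts the fault
    rollback: Callable[[], None]      # removes the fault
    error_rate: Callable[[], float]   # reads the current error rate (0.0 to 1.0)
    abort_threshold: float = 0.05     # guardrail: stop if errors exceed this (assumed value)
    checks: int = 5                   # how many times to sample the metric


def run(exp: Experiment) -> str:
    """Run the experiment and return 'skipped', 'aborted', or 'passed'."""
    # Do not start if the system is already unhealthy.
    if exp.error_rate() > exp.abort_threshold:
        return "skipped"
    exp.inject()
    try:
        # A real run would wait between samples; omitted to keep the sketch short.
        for _ in range(exp.checks):
            if exp.error_rate() > exp.abort_threshold:
                return "aborted"  # guardrail tripped; hypothesis not supported
        return "passed"
    finally:
        exp.rollback()  # always remove the fault, whatever the outcome


if __name__ == "__main__":
    # Simulated system: the error rate rises slightly while the fault is active.
    state = {"fault": False}
    demo = Experiment(
        hypothesis="Error rate stays below 5% when one cache node is down",
        inject=lambda: state.update(fault=True),
        rollback=lambda: state.update(fault=False),
        error_rate=lambda: 0.02 if state["fault"] else 0.01,
    )
    print(demo.hypothesis, "->", run(demo))
```

The `finally` block is the key design choice here: the fault is removed whether the experiment passes, aborts, or raises an error, which is the kind of abort mechanism the discussion points to. The returned result is also easy to share with stakeholders alongside the written hypothesis.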
The Human Side of Chaos Engineering
Rob highlighted that chaos engineering is as much about people as technology. It’s about creating psychological safety so teams feel comfortable exploring failure scenarios, and about maintaining a blameless culture when things go wrong.
Looking Ahead: The Future of Chaos Engineering
Rob shared his thoughts on where the field is headed:
More automation and integration into CI/CD pipelines.
Greater focus on socio-technical systems, not just infrastructure.
Chaos engineering as a standard practice in modern reliability engineering.
Final Thoughts
This episode is packed with insights on how to make chaos engineering approachable and valuable — no matter your team size or maturity level. Rob’s journey shows how deliberate experimentation can turn potential disasters into opportunities for growth.
You can connect with Rob on LinkedIn and find more resources about chaos engineering at cazoo.co.uk and chaosengineering.org.
Thanks for listening!