DevOps and IT Service Management – If it Hurts, do it More Often
Blog: Good e-Learning Blog
Are your old thought patterns holding you back?
It’s just how the brain works. When you are a newborn baby, everything you experience is just a confusing mess. Slowly but surely, patterns emerge and you learn, for instance, that visual outlines tend to distinguish one thing from another. And that crying results in the appearance of milk.
As soon as a workable degree of predictability is established, we put these heuristics in our toolbox and move on to learning about different things. But from time to time, the need arises to revisit these trusted tools and reluctantly trade them in for superior notions. IT is not immune to this phenomenon. Let’s look at some DevOps thinking about familiar IT Service Management topics.
Change and Stability
Most people would agree that tinkering with something often causes additional problems. “If it ain’t broke, don’t fix it” – in other words, avoid change as much as possible. In IT, this has resulted in the avoidance of frequent releases of new versions of systems so as to reduce the risk of failures. The opposite also applies: systems that are pinned down by strict procedures are difficult to change. Processes and practices in many if not most organizations reflect this thinking. There seems to be some kind of natural tension between change and stability.
Yet there is another way of regarding the situation, as illustrated by the following train of thought. The smaller the change, the lower the risk of surprises. This plausible statement implies that large releases should be broken down into smaller changes, and that these smaller changes should therefore be deployed more frequently. Deploying changes more frequently means that the people doing the deployment get the opportunity to practice more often, which in turn increases their proficiency in executing change. The higher their proficiency, the lower the risk. So, the mental model “frequent change is bad” has now been turned on its head: “the more frequent, the better”.
This “do it more often” way of thinking is closely related to a concept that is commonly used when dealing with complex systems.
The author Nassim Nicholas Taleb, whose works include The Black Swan (2007) – about the extreme impact of certain kinds of rare and unpredictable events (outliers) – coined the term “antifragile” in his book Antifragile: Things That Gain From Disorder (2012). He contrasts antifragile with fragile and resilient. Something that is fragile is damaged by disorder. A resilient object is unaffected by disorder. Antifragile things benefit from disorder.
If it Hurts, do it More Often
“If it hurts, do it more often” is a statement often heard in DevOps circles. This is about growing the resilience, if not the antifragility, of the system. “System” is used here in the broadest sense, including the people and tooling involved in developing and running information systems. An extreme example is Chaos Monkey, a program developed by Netflix in 2011 that randomly chooses a server and disables it during its usual hours of activity.
At first glance, this sounds crazy. Who on earth would want to encourage production failures?
The rationale was that Netflix could not depend on naturally occurring outages to test how its systems behaved when a server failed. Knowing that Chaos Monkey would cause such failures frequently motivated the engineers to build redundancy and process automation to survive such incidents without impacting the millions of Netflix users. Although superficially counterintuitive, this bold strategy has improved the quality of their services. It just needed a different way of thinking about things – breaking the mental mould.
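To make the idea concrete, here is a toy sketch of the principle described above: pick one running server at random, disable it during working hours, and check whether the service as a whole survives. It is not Netflix's implementation. The server names, the 9-to-17 window and all function names are assumptions invented purely for illustration.

```python
import random


def pick_victim(servers, rng):
    """Return the name of one running server chosen at random, or None."""
    running = [name for name, up in servers.items() if up]
    return rng.choice(running) if running else None


def chaos_round(servers, rng, hour, business_hours=range(9, 17)):
    """Disable one random running server, but only during business hours.

    The 9-to-17 window is an assumed stand-in for "usual hours of
    activity", when engineers are around to respond.
    """
    if hour not in business_hours:
        return None
    victim = pick_victim(servers, rng)
    if victim is not None:
        servers[victim] = False  # simulate the outage
    return victim


def service_available(servers):
    """The service survives as long as at least one server is still up."""
    return any(servers.values())


# Hypothetical fleet of three redundant servers.
servers = {"web-1": True, "web-2": True, "web-3": True}
rng = random.Random(42)

victim = chaos_round(servers, rng, hour=10)
print(f"disabled {victim}; service still available: {service_available(servers)}")
```

With three redundant servers the service survives losing one. With a single server it would not, and that is the kind of weakness frequent, deliberate failures are meant to expose early.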
People are comfortable with the status quo. It’s human nature and, contrary to popular belief, people in IT are human beings. They like things the way they’ve always been. IT Service Management practitioners like process and control and predictability. But when faced with complex adaptive systems that exhibit emergent behavior, they have to leave their comfort zone and think anew.
From an evolutionary point of view, this is healthy. Communities need outside forces to disrupt them from time to time, because otherwise, inbreeding will lead to inevitable extinction. So, is DevOps the mutation that will enrich the IT Service Management gene pool? Will it create a resistant strand of digital DNA that won’t buckle under today’s strenuous demands for quicker delivery of more resilient IT service? It just might. It will mess with your head at first but “if it hurts, do it more often”.