The other day I had a chance to catch up with a colleague who was eagerly anticipating the arrival of a new incident manager in his group. He explained that they had quite a few really good candidates, but the role was a large one, and right now they were feeling the crunch of being short a person.
Initially I thought they might have been looking for someone to oversee the process, but it turned out that they really needed someone experienced in managing incidents who could hit the ground running. There were quite a few issues each day that a seasoned pro needed to manage to resolution.
Because I am curious, and this discussion was already prompting a blog post, I asked what he meant by dealing with loads of issues each day. He explained that it wasn't any one thing that was always a problem; rather, within the sea of applications and infrastructure they supported, there always seemed to be an issue with something.
“Bottom line, it’s because our change management process isn’t very mature. We still have loads of issues as the result of a change going bad.”
He also described how many mornings felt like a state of panic: people were correcting small issues caused by a change that had gone sideways or a change that had not completed on time. From the business's perspective, there were issues all the time. The reality in these situations is that support teams tend to go into firefighting mode, and as we all know, once you are in that state it can be hard to get out. We can all hear the business saying, “Get it fixed as fast as possible.”
They fought these fires so well that mastery of resolving issues became an expectation of the incident team. The trouble is that, because the incident process was not connected in a well-rounded way to the inputs and outputs of other processes, there was no real way to make lasting, overall improvements.
This is likely a loop that will continue until they stop and take a closer look at the big picture.
Let’s expand this one level
Why are we seeing issues in the first place? This isn’t to say that we are looking for the technical cause of each issue; that narrow search is typically how an incident-centric organization like this one ends up in a rut. They consistently look at the technical reasons these failures occur but do not apply what they find to the area that would actually improve the situation. When we take a larger, more process-centric view, we can see that the change management process clearly has room for improvement, since there are several failed changes and changes that exceed their windows.
Expand one more layer
The magnification shouldn’t stop there. Once we improve the initial process, we can look for the inputs and outputs that still have gaps and make enhancements in those areas. It’s almost like a domino effect. To continue with this example, this organization might focus on processes within Service Operation or Service Transition, but it should also start to think about what this looks like from the perspective of Service Design and Service Strategy. For example, as we look to improve change management, we might find that many of its problems stem from how we manage demand from the business.
Expand one more time
If we magnify this again, we might find that our challenges ultimately stem from poorly managed communication. Do we know and understand what the business outcomes are? Are we in a position to know or understand them? When we ask these questions, we need to be sure that we can answer them without making any assumptions. Are the business goals clear, and do they drive the decisions we make every day within the organization?
While our ability to restore service after an incident is important, a well-rounded approach to service delivery is far more sustainable in the long run. Continuing to pile resources onto break-fix work adds no real lasting value. It also undermines efforts to build and strengthen a partnership with the business you support.