Putting the Ops in DevOps
I often hear people who work in operational roles in IT complain that DevOps is just about development, and that much of the important work done in ops is not really included. When I look around the organizations where I work, I can immediately see why this view is so common. Almost all DevOps activity is driven by people who come from a development background. But if we want DevOps to work for everyone then we all need to be involved. This doesn’t mean waiting to be invited to join in and complaining when we aren’t. It means actively adopting ideas from DevOps into how we run IT operations and seeking out our development colleagues to foster collaboration.
Far too often, this just isn’t happening.
Bridging the Gap between Dev and Ops
It’s fair to say that almost all DevOps activity has been driven by development staff, and their approach has resulted in major improvements. They create valuable software much more quickly, and the software they create tends to be more reliable and better focussed on customer needs. Although some organizations that have adopted DevOps also include aspects of IT operations in how they do it, the push for this tends to come from the development community. To make things worse, these improvements are often resisted by IT operations people, who see them as encroaching on their area of expertise.
Even where aspects of DevOps have been embraced by operations, they have tended to be limited to just a few, largely technical, areas of operations. For example, toolchain optimizations create great improvements in testing, in change management and in release management; and sometimes adopting the collaboration and sharing that a DevOps approach promotes has resulted in improved time to resolution for software bugs. But other aspects of IT operations are rarely considered.
What’s needed first and foremost is a drive to engage with the actual DevOps approach itself, and to work out the ways in which adopting and adapting it can be helpful to operations.
What is DevOps?
I don’t have space in this blog post to give a detailed explanation of DevOps, so I’m just going to give a very brief overview of what I see as the most important aspects.
The Three Ways
One view of DevOps is that it is about understanding and optimising the three ways: Flow; Feedback; and Experiment and Learn. I have written about the three ways before, so if you want to learn more, have a look at my blog post The 3 ways aren’t just for DevOps. In summary:
- Flow means understanding the end-to-end flow of work you are part of, and thinking about how to optimise the whole thing, not just your bit. This involves eliminating queues and bottlenecks throughout the system and helping small pieces of work to flow independently to maximise throughput.
- Feedback means creating feedback loops. Not just for the end-to-end flow, but also for every stage within the flow. Offer feedback to people who provide you with input and solicit feedback from those that you affect. Then use the feedback to understand how you can remove friction and improve flow.
- Experiment and learn means trying things out in safe-to-fail environments, and constantly learning and improving. Don’t just try things at random; start by forming a hypothesis about how you could improve, and then try something to see if you were right. If it works out well then build on it; otherwise revert to the original situation and think some more.
CALMS
Another way of thinking about DevOps is summarised in the acronym CALMS, which stands for Culture, Automation, Lean, Measurement and Sharing. Focussing on these five important aspects of DevOps results in optimised working and improved value.
Culture is all about collaboration and working together to optimise the value we create for paying customers. The culture of your organization determines how well you manage flow, feedback and learning, as well as everything else you do.
Automation eliminates wasteful manual processes and replaces them with reliable, repeatable automated ones. The most common DevOps example is the automation of the toolchain that takes newly developed code, integrates it with other code, then builds and tests the full software solution ready for deployment.
Lean is a way of optimising work. It includes understanding what is really happening by going to where the work is done, value stream mapping to understand the flow of work, eliminating waste, and incorporating improvement into everything you do.
Measurement is just what you might expect. Measure what you do, and what outcomes this results in, so you can monitor the effect of changes and improvements.
Sharing stresses the importance of making work visible and of collaborating. It encourages the use of Kanban and other visualization methods to help everyone understand how work is flowing through the system.
How can IT Operations get more involved with DevOps?
The most important thing that needs to happen is for operations staff to engage with these ideas. If IT operations staff adopt the three ways, and think about CALMS, it should be possible to improve every aspect of IT, resulting in even greater value for customers. Here are some specific examples of how IT operations staff could create more value by adopting DevOps ideas.
Think about flow and feedback
When we design ITSM processes, we tend to introduce queues of work. We often create large numbers of queues, and each queue has lots of work waiting to be completed. For example, an incident management system may have a separate queue for each team that works on incidents. The more queues we create, the more we hinder the flow of work. Ideally every piece of work should flow from initiation to completion without having to sit in a queue at any time. If you are trying to manage large numbers of incidents and problems then it may not be possible to eliminate queues altogether, but it is important to realize that these queues can create an illusion of an orderly approach without really helping anyone. They are part of the problem. Reducing the number and length of queues will not only improve flow but will directly improve your customers’ experience of IT.
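The cost of queues can be made concrete with Little’s Law, which states that, for a stable system, the average time an item spends waiting equals the average number of items in the queue divided by the average throughput. The minimal Python sketch below applies it to incident queues; the function names, queue sizes and resolution rates are hypothetical, chosen only for illustration and not taken from any real service desk.

```python
def average_wait_days(items_in_queue: float, completed_per_day: float) -> float:
    """Little's Law: average time in queue = average queue size / average throughput.

    Holds for long-run averages in a stable system, not for any single incident.
    """
    if completed_per_day <= 0:
        raise ValueError("throughput must be positive")
    return items_in_queue / completed_per_day


def end_to_end_wait_days(queues: list[tuple[float, float]]) -> float:
    """Total average wait for work that passes through each queue in turn.

    Every extra handoff adds its own waiting time, which is why more queues hurt flow.
    """
    return sum(average_wait_days(size, rate) for size, rate in queues)


# Hypothetical example: the service desk queue, then the second-line support queue.
service_desk = (30, 30)  # 30 incidents waiting, 30 handled per day
second_line = (60, 20)   # 60 incidents waiting, 20 resolved per day
print(end_to_end_wait_days([service_desk, second_line]))  # 4.0 days on average
```

In this made-up example, halving the second-line queue to 30 incidents would cut the average end-to-end wait from 4.0 to 2.5 days without anyone working any faster.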
Multiple queues can be especially problematic for people working in support teams, who may have one queue of incoming incidents, another queue of incoming change requests, yet one more with project work, and no help in prioritizing the work coming from all these queues. Creation of a Kanban board for the team, to help visualize all the work, and to allow the team to pull work when they are ready for it, can be a great help here.
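A team board that merges several incoming queues and lets people pull work only when they have capacity can be sketched in a few lines. This is a minimal Python illustration, assuming a simple numeric priority (lower means more urgent) and a work-in-progress (WIP) limit; the class and field names are hypothetical and not taken from any particular Kanban tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkItem:
    title: str
    kind: str      # "incident", "change" or "project": the three queues in the text
    priority: int  # assumed convention: lower number = more urgent


@dataclass
class KanbanBoard:
    wip_limit: int  # assumed cap on items in progress at the same time
    backlog: list[WorkItem] = field(default_factory=list)
    in_progress: list[WorkItem] = field(default_factory=list)
    done: list[WorkItem] = field(default_factory=list)

    def add(self, item: WorkItem) -> None:
        # All incoming work lands on one board instead of separate queues.
        self.backlog.append(item)

    def pull(self) -> WorkItem | None:
        # Work is pulled only when there is capacity under the WIP limit.
        if len(self.in_progress) >= self.wip_limit or not self.backlog:
            return None
        item = min(self.backlog, key=lambda i: i.priority)
        self.backlog.remove(item)
        self.in_progress.append(item)
        return item

    def finish(self, item: WorkItem) -> None:
        self.in_progress.remove(item)
        self.done.append(item)


board = KanbanBoard(wip_limit=2)
board.add(WorkItem("Upgrade monitoring agent", "project", 3))
board.add(WorkItem("Email outage", "incident", 1))
board.add(WorkItem("Firewall rule change", "change", 2))
print(board.pull().title)  # Email outage
```

Because the WIP limit caps how much is in progress, a further pull returns `None` until something is finished, which makes overload visible on the board rather than hiding it in a queue.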
Most IT organizations solicit feedback from end users when incidents are closed, and many also carry out annual surveys of customer satisfaction. While these sources of feedback can be helpful, they are nowhere near sufficient, and often arrive far too late to be of much value. But if every person or group automatically offers feedback to those that deliver work to them and solicits feedback from those that they pass work on to, then every time you transfer an incident to a different group you create an opportunity for feedback that can be put to use immediately. Did the service desk collect the right information to enable second-line support to work on the incident? Did second-line support provide a timely response to the service desk, so they could update the end-user? If something could have gone better, what could you do to fix it next time? This type of feedback loop is an invaluable tool for improvement in a way that just waiting for feedback after an incident has been closed can never be.
If you regularly and reliably solicit and offer feedback then many improvement opportunities will become visible, resulting in much better flow of work, and much improved customer experiences.
Use automated testing
DevOps environments almost always include automated testing of software, but this testing is rarely extended all the way into the production environment for use by operations staff.
When things go wrong, IT operations staff need to investigate exactly what is working and what isn’t, so that they can intervene appropriately. Is this a software bug or a network failure? Or maybe there is an error in the data? In the same way that development teams create toolchains that automate the entire flow of integration and testing, IT operations teams could and should create automated testing that enables them to understand the health of the IT services and pinpoint the exact area of failure when there is an error. Ideally the testing can be shared, with the same tests being used to verify correct operation in the production environment as were used to verify correct behaviour in the integration and test environments. This is a great opportunity for collaboration and sharing between development and operations teams.
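Here is a minimal sketch of such a shared check suite, assuming each check is tagged with the layer it exercises (network, application or data, echoing the questions above). The probes are stand-in lambdas so that the example runs on its own; in practice they might call a service’s health endpoint or run a data-validation query, and the same list could be run in both the test and production environments. All names are hypothetical.

```python
from __future__ import annotations

from typing import Callable

# (layer, check name, probe). A probe returns True when the check passes.
Check = tuple[str, str, Callable[[], bool]]


def run_checks(checks: list[Check]) -> dict[str, list[str]]:
    """Run every check and return the names of failing checks, grouped by layer."""
    failures: dict[str, list[str]] = {}
    for layer, name, probe in checks:
        try:
            ok = probe()
        except Exception:
            # A probe that crashes counts as a failure rather than being ignored.
            ok = False
        if not ok:
            failures.setdefault(layer, []).append(name)
    return failures


def first_failing_layer(failures: dict[str, list[str]], order: list[str]) -> str | None:
    """Return the first layer, in the given investigation order, that has a failure."""
    for layer in order:
        if layer in failures:
            return layer
    return None


# Stand-in probes for illustration only.
checks: list[Check] = [
    ("network", "DNS resolves service name", lambda: True),
    ("application", "login page responds", lambda: False),
    ("data", "orders table is not empty", lambda: 1 / 0),  # simulates a crashing probe
]
failures = run_checks(checks)
print(failures)
print(first_failing_layer(failures, ["network", "application", "data"]))  # application
```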
Go to the Gemba
One of the principles of Lean is expressed in the phrase “go to the gemba”. This means that if you want to really understand what’s happening, you should go to where the work is done. IT managers shouldn’t just look at metrics; they should go and spend some time working on the service desk. And your service desk people shouldn’t just ask users about what they do; they should spend some time working in the various business units to understand how the organization really works.
This will not only result in improved understanding of how things work but will also help to foster the culture needed for collaboration and sharing.
Experiment and learn
IT operations usually has many processes that help to ensure reliable and repeatable delivery of services. In many organizations these processes are static: they don’t adapt to changing circumstances.
A really great IT organization understands that its processes need to evolve to keep up with changing business needs and environmental constraints. This doesn’t mean that every process needs a major redesign every year; that would be very inefficient. Instead, encourage staff to identify improvement opportunities, and evaluate these to pinpoint the improvements that you think will help. Then think about how you can test these ideas without creating major risk or disruption. Ideally you will find a way to measure the impact of each improvement, so that you can be sure it is working well before rolling it out across your organization.
For example, you may think it would help to change how the service desk categorises incidents. Think hard about the expected benefits. Then make a small, reversible change to just one or two categories and measure how well this works. If it doesn’t deliver the expected benefits then revert to the original situation, but if it does then you can plan your next incremental improvement. You can read some more examples of this approach in Major ITSM Improvements Should Start with Small Steps.
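The category experiment above can be evaluated with very little code. This minimal Python sketch assumes the chosen measure is the reassignment rate (the share of incidents in the pilot categories that had to be passed to a different group) and that you decide in advance how large a reduction would confirm your hypothesis; both the metric and the names are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    category: str
    reassigned: bool  # True if the incident had to be passed to another group


def reassignment_rate(incidents: list[Incident], pilot: set[str]) -> float:
    """Fraction of incidents in the pilot categories that were reassigned."""
    in_pilot = [i for i in incidents if i.category in pilot]
    if not in_pilot:
        raise ValueError("no incidents in the pilot categories")
    return sum(i.reassigned for i in in_pilot) / len(in_pilot)


def keep_or_revert(before: list[Incident], after: list[Incident],
                   pilot: set[str], expected_reduction: float) -> str:
    """Keep the change only if the rate fell by at least the hypothesised amount."""
    improvement = reassignment_rate(before, pilot) - reassignment_rate(after, pilot)
    return "keep" if improvement >= expected_reduction else "revert"
```

With real data you would also want enough incidents in each period for the difference not to be noise, and some confidence that nothing else changed at the same time.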
Measure what matters
The M in CALMS stands for measurement, and this is another area where IT operations can learn from the DevOps culture. It is vital to measure what you are doing, and what value this creates, but it is easy to measure too much and to become fixated on internal measurements.
I have written extensively about the need to base Key Performance Indicators (KPIs) on clearly understood Critical Success Factors (CSFs) and to remember that the measurements are NOT your goals, just a tool you can use to identify trends and thresholds. You can find more suggestions for IT operations metrics in these blog posts:
- Getting Your Service Desk Metrics and Measurement Right
- Defining Metrics for Problem Management
- Defining Metrics for Change Management
- How to Define, Measure, and Report IT Service Availability
- How Do You Measure IT Services?
- Your metrics are not your goals
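One way to keep metrics in their place is to record, alongside each KPI, the CSF it supports and a threshold that prompts investigation rather than a target to hit. The minimal Python sketch below does that, and reports whether the latest value breaches the threshold and whether it is moving in the right direction compared with the average of earlier periods. The field names, the example KPIs and this simple trend rule are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class KPI:
    csf: str                       # the Critical Success Factor this KPI supports
    name: str
    threshold: float               # a trigger for investigation, not a goal
    higher_is_better: bool = True


def review(kpi: KPI, history: list[float]) -> tuple[bool, str]:
    """Return (threshold breached?, trend) for the most recent period."""
    latest = history[-1]
    if kpi.higher_is_better:
        breached = latest < kpi.threshold
    else:
        breached = latest > kpi.threshold
    if len(history) < 2:
        return breached, "not enough data"
    # Compare the latest value with the average of all earlier periods.
    delta = latest - mean(history[:-1])
    if delta == 0:
        return breached, "flat"
    improving = delta > 0 if kpi.higher_is_better else delta < 0
    return breached, "improving" if improving else "worsening"


resolution = KPI("Restore normal service quickly", "% incidents resolved within target", 90.0)
print(review(resolution, [92.0, 91.0, 88.0]))  # (True, 'worsening')
```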
Summary and Conclusion
I have given just a few examples of how IT operations can use ideas from DevOps to improve how they work. There is clearly a great opportunity for DevOps to extend beyond its current Dev focus to make a huge improvement to how we do “Ops”, but people in IT operations teams can’t just sit back and wait for someone else to make DevOps work for them. We need to take responsibility for putting the Ops into DevOps. Don’t try to copy my examples but do think about how you can use the three ways and CALMS to improve how you deliver services to your customers.
The opportunity is there. Are you ready to take it?
This work is licensed under CC BY-SA 4.0