DevOps combines software development (Dev) and IT operations (Ops) philosophies and practices to shorten the software development lifecycle (SDLC), provide continuous delivery of products, and maintain high software quality. Engineers who tackle slow deployments, communication bottlenecks between development and operations teams, and manual, error-prone engineering processes are doing DevOps work. Their work directly affects development velocity, failure rates in production, and system downtime.
As an engineering team grows, its DevOps processes come to rely on automation. If DevOps remains manual and ad hoc, team efficiency suffers, error rates creep upward, delivery slows, and apps and services gradually become less reliable.
In this article you’ll learn about automating DevOps and the consequent benefits. We’ll cover:
What DevOps is, and what it means to automate it
What can be automated in DevOps workflows
The expected benefits of automation
Good metrics to use when measuring automation impact
Best practices for adopting automation into a DevOps flow
What an IDP like Cortex can do to help
What is DevOps automation?
Effective DevOps automation depends on engineers taking a rigorous and opinionated approach to deciding which work should be automated and when. Run manually, DevOps processes are error prone, because the underlying systems are complex and deeply integrated with each other. They are also inefficient: an attentive DevOps engineer has to be available to start each process, keep it running, and confirm that it completed successfully, which takes that engineer away from other, more productive tasks.
Most domains under DevOps’ purview—such as testing, integration, deployment, infrastructure provisioning, and monitoring—are highly leveraged and impact the entirety of the engineering org’s work. It’s difficult to support these processes at scale, while remaining efficient and reliable, without methodically automating significant portions of the work.
What DevOps processes can be automated?
When looking for which parts of the DevOps pipeline to automate, focus on rule-based, data-driven, and repetitive tasks, and prioritize the most time-consuming ones first. Across the entire DevOps lifecycle, there are many opportunities for applying automation.
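As a rough illustration of this prioritization, the sketch below ranks candidate tasks by the hands-on time they consume each month and keeps only the rule-based ones. The task names, the numbers, and the `Task` fields are hypothetical placeholders, not measurements from any real team.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runs_per_month: int     # how often the task is performed by hand
    minutes_per_run: float  # hands-on time per run
    rule_based: bool        # True if the steps follow fixed rules


def monthly_minutes(task: Task) -> float:
    """Total hands-on time the task consumes per month."""
    return task.runs_per_month * task.minutes_per_run


def automation_candidates(tasks: list[Task]) -> list[Task]:
    """Keep rule-based tasks and sort them by time consumed, largest first."""
    eligible = [t for t in tasks if t.rule_based]
    return sorted(eligible, key=monthly_minutes, reverse=True)


# Hypothetical inventory of manual work.
tasks = [
    Task("manual deploy to staging", 40, 15, True),
    Task("rotate TLS certificates", 2, 45, True),
    Task("architecture review", 4, 120, False),
    Task("provision test database", 25, 10, True),
]

ranked = [t.name for t in automation_candidates(tasks)]
print(ranked)
```

Time spent is only one possible ranking signal; a team could just as reasonably weight tasks by how often they cause errors.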
Continuous integration/continuous deployment (CI/CD)
Continuous integration (CI) and continuous deployment (CD), when automated, significantly reduce the operational load on team members. With an automated CI/CD pipeline, app changes reach production faster and without depending on manual operational processes. Automated software delivery can move work end to end, from the moment a developer pushes source code changes to version control through to production. Reflecting this, 50% of engineering leaders[1] name automation and automated CI/CD as key features of their production process (see our State of Software Production Readiness Report for details).
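The core idea of an automated pipeline can be sketched in a few lines: stages run in a fixed order, and a failure stops everything after it from running. This is a minimal illustration rather than how any particular CI/CD product works; the stage names and the placeholder `build`, `test`, and `deploy` functions are assumptions made for the example.

```python
from typing import Callable

# A stage is a name plus a step that reports success (True) or failure (False).
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure. Returns a result log."""
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            break  # later stages (for example, deploy) never run
    return log


# Placeholder steps; a real pipeline would invoke build tools, test runners,
# and deployment tooling here.
def build() -> bool:
    return True


def test() -> bool:
    return True


def deploy() -> bool:
    return True


pipeline_log = run_pipeline([("build", build), ("test", test), ("deploy", deploy)])
print(pipeline_log)
```

Because the deploy stage only runs when every earlier stage succeeds, no manual sign-off is needed for a change that passes all checks.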
Testing and validation
Automation can be used to run tests at various points in the development process, offload the running of tests to cloud systems, and generate tests based on various system properties. Unit tests that run automatically and are coupled with coverage checks guard against regressions and set a quality standard for production-ready work. When more complex tests and validation are offloaded to cloud platforms such as Azure or AWS, engineers can keep working without being blocked on testing. Cloud-hosted tests also scale beyond what’s possible on a development machine. Automatically generated tests can reduce the workload on engineering teams and increase velocity. Fully automated CI/CD processes depend heavily on a mature automated testing environment. In fact, test automation is second only to CI/CD as a requirement for production readiness automation: 42% of engineering leaders mention it explicitly[1].
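A coverage check coupled with test results might look like the following sketch. The 80% threshold and the `gate` function are illustrative assumptions; in practice the inputs would come from your test runner and coverage tool.

```python
def coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Line coverage as a percentage; an empty codebase counts as fully covered."""
    if total_lines == 0:
        return 100.0
    return 100.0 * covered_lines / total_lines


def gate(tests_failed: int, covered_lines: int, total_lines: int,
         min_coverage: float = 80.0) -> tuple[bool, str]:
    """Return (passed, reason). Fails on any test failure or on low coverage."""
    if tests_failed > 0:
        return False, f"{tests_failed} test(s) failed"
    pct = coverage_percent(covered_lines, total_lines)
    if pct < min_coverage:
        return False, f"coverage {pct:.1f}% is below {min_coverage:.1f}%"
    return True, f"coverage {pct:.1f}%"


print(gate(tests_failed=0, covered_lines=850, total_lines=1000))
print(gate(tests_failed=0, covered_lines=700, total_lines=1000))
```

Running a gate like this on every change is what turns a coverage target from a guideline into an enforced standard.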
Resource provisioning
Automated scaling and self-serve resource provisioning allow engineers to ship source code to production faster and help them avoid slowdowns from manual approval processes. Provisioning automation can also be integrated with identity management systems so that access and permissions to internal resources quickly follow an engineer’s changing responsibilities and work.
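One way to picture the identity management integration is to derive access from team membership and compute what to grant or revoke whenever membership changes. The team names, resource names, and the `TEAM_ACCESS` mapping below are hypothetical; a real system would read them from an identity provider.

```python
# Hypothetical mapping from team to the internal resources it may access.
TEAM_ACCESS = {
    "payments": {"payments-db", "payments-queue"},
    "search": {"search-index"},
}


def entitled_resources(teams: set[str]) -> set[str]:
    """All resources an engineer is entitled to, given their current teams."""
    resources: set[str] = set()
    for team in teams:
        resources |= TEAM_ACCESS.get(team, set())
    return resources


def access_changes(current: set[str], teams: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_grant, to_revoke) so that access matches team membership."""
    wanted = entitled_resources(teams)
    return wanted - current, current - wanted


# An engineer moves from the payments team to the search team.
to_grant, to_revoke = access_changes({"payments-db", "payments-queue"}, {"search"})
print(sorted(to_grant), sorted(to_revoke))
```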
Infrastructure as code (IaC)
IaC systems allow DevOps teams to rapidly and dynamically scale up their operations. By switching to a programmable model for infrastructure control, they enable the automation of cloud infrastructure and IT infrastructure deployment and configuration management.
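The programmable model usually rests on declaring a desired state and letting tooling work out the changes needed to reach it. The sketch below shows that comparison step only, under the assumption that infrastructure can be described as plain dictionaries; the resource names and fields are made up, and real IaC tools do considerably more (dependency ordering, applying changes, state storage).

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired and actual infrastructure and list the changes needed."""
    create = sorted(name for name in desired if name not in actual)
    delete = sorted(name for name in actual if name not in desired)
    update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Hypothetical desired state (checked into version control) and observed state.
desired = {
    "web": {"instances": 3, "size": "small"},
    "worker": {"instances": 2, "size": "medium"},
}
actual = {
    "web": {"instances": 2, "size": "small"},
    "cron": {"instances": 1, "size": "small"},
}

changes = plan(desired, actual)
print(changes)
```

Because the desired state is ordinary data, it can be reviewed, versioned, and tested like any other source code.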
Monitoring and incident response
Automatic monitoring tools dynamically learn how a system works and develop or suggest heuristics to detect when an incident is occurring, making it easier for teams to monitor large and complex systems. Once an incident is underway, automatic response and recovery tools can apply common recovery heuristics without waiting for human intervention, and apply alerting heuristics so that only the right people are involved when a more complex response is warranted.
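A very simplified version of both halves, detection and response, is sketched below. Detection here is a fixed statistical rule (a value far from the historical mean), which is a stand-in for the learned heuristics described above, and the `RUNBOOK` and `OWNERS` tables, symptom names, and on-call names are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Optional


def is_anomalous(history: list[float], value: float, k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations from the historical mean."""
    return abs(value - mean(history)) > k * pstdev(history)


# Hypothetical tables: known symptoms with an automatic fix, and service owners.
RUNBOOK = {"disk_full": "clear_temp_files", "process_crashed": "restart_service"}
OWNERS = {"checkout": "payments-oncall", "search": "search-oncall"}


def respond(service: str, symptom: str) -> dict[str, Optional[str]]:
    """Apply a known recovery action if one exists; otherwise page the owner."""
    action = RUNBOOK.get(symptom)
    if action is not None:
        return {"action": action, "page": None}
    return {"action": None, "page": OWNERS.get(service, "default-oncall")}


latency_ms = [100, 102, 98, 101, 99]
print(is_anomalous(latency_ms, 130))          # far outside the usual range
print(respond("checkout", "process_crashed"))  # handled automatically
print(respond("checkout", "data_corruption"))  # escalated to the owning team
```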
What are the benefits of DevOps automation?
Automation makes DevOps workflows more efficient, reliable, and scalable. Automated workflows are a highly leveraged way to streamline engineering work, and they can deliver a range of benefits.
Increased speed
New features and software improvements automatically move to production without being blocked by operational workflows or the availability of engineers for manual work.
Improved quality and reliability
Tests are guaranteed to run, and standards for reliability and quality are automatically enforced, preventing regressions and highlighting potential problems before they become difficult to address. This directly translates into an improved user experience for apps and services.
Enhanced collaboration
To support automated DevOps workflows, operations and development teams have to work closely together and develop shared accountability and a shared understanding of core work. The result is a more integrated work environment with fewer barriers to communication.
Increased innovation
Together, the safety and assurances that come from automated testing, validation, error detection, and error recovery de-risk innovative and creative solutions. As a result, teams can innovate more, and faster.
Cost reduction
When resource allocation becomes automatic, and controls for tuning resources up or down are more fine grained, operational and engineering teams can be much more efficient about resource use, and they can optimize costs.
Increased developer productivity
With each DevOps workflow that is automated, engineering and operational teams find themselves spending less time on repetitive and time-consuming tasks. Improved communication reduces instances of teams being blocked waiting for each other. Individual developer productivity increases as a result.
Improved developer experience (DevEx)
A low-friction and highly collaborative DevOps experience will improve developer morale and productivity, and translate into longer-term retention. There has been a renewed focus on DevEx this year, as Justin Reock highlights in his article, “Developer Experience is Dead: Long Live Developer Experience!”.
Improved production readiness
DevOps workflows can encompass the entire production readiness life cycle. As that life cycle becomes more automated, quality standards become easier to enforce and check globally.
How do you measure the success of DevOps automation?
Once the decision is made to increase DevOps automation, keeping track of progress in an objective manner will help quantify the impact and help drive continuous improvements. The DevOps Research and Assessment (DORA) team recommends the following metrics.
Deployment frequency
Deployment frequency measures how often a team deploys changes to production. It ranges from low (once per day to once per week) to elite (on demand, or multiple times daily).
Lead time for changes (LTC)
LTC measures the turnaround time between a commit to a version control system such as Git and the deployment of the same code to production. It ranges from low (1 week to 1 month) to elite (less than 1 hour).
Change failure rate (CFR)
CFR measures the percentage of releases that result in customer impact, such as product downtime, degraded service, or rollbacks. It ranges from low (64%) to elite (5% or less).
Failed deployment recovery time (FDRT)
Formerly known as MTTR (see Understanding Mean Time to Resolve (MTTR)), FDRT measures the amount of time it takes your team to restore service when there’s a disruption. It ranges from low (1–6 months) to elite (less than 1 hour).
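To make the four metrics above concrete, the sketch below computes them from a list of deployment records. The `Deployment` record shape and the sample data are assumptions for illustration; real figures would come from your CI/CD and incident tooling, and the choice of the median as the summary statistic is one reasonable option, not a DORA requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape for one production deployment.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def deployment_frequency(deploys: list[Deployment], period_days: int) -> float:
    """Average number of production deployments per day over the period."""
    return len(deploys) / period_days


def lead_time_for_changes(deploys: list[Deployment]) -> timedelta:
    """Median time from commit to production deployment."""
    return median(d.deployed_at - d.committed_at for d in deploys)


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    return sum(d.failed for d in deploys) / len(deploys)


def recovery_time(deploys: list[Deployment]) -> timedelta:
    """Median time to restore service after a failed deployment.

    Raises statistics.StatisticsError if there are no resolved failures.
    """
    return median(
        d.restored_at - d.deployed_at
        for d in deploys
        if d.failed and d.restored_at is not None
    )


start = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start, start + timedelta(hours=6), failed=True,
               restored_at=start + timedelta(hours=6, minutes=30)),
    Deployment(start, start + timedelta(hours=8)),
]

frequency = deployment_frequency(deploys, period_days=2)
lead_time = lead_time_for_changes(deploys)
failure_rate = change_failure_rate(deploys)
recovery = recovery_time(deploys)
print(frequency, lead_time, failure_rate, recovery)
```

Tracking these values over time, before and after an automation effort, is what lets a team attribute a change in the numbers to the work that was done.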
Reliability
There is no easy way to aggregate reliability system-wide. However, reliability can be measured for individual systems and services by making sure SLA/SLO, performance targets, and error budgets are defined, measured, and tracked.
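For a single service, an error budget follows directly from its SLO: the budget is the number of failures the target still allows. The sketch below assumes a request-based availability SLO; the 99.9% target and the request counts are example values only.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error budget use for a request-based availability SLO.

    slo_target is a fraction, for example 0.999 for a 99.9% success target.
    """
    allowed_failures = round((1 - slo_target) * total_requests)
    remaining = allowed_failures - failed_requests
    consumed = failed_requests / allowed_failures if allowed_failures else float("inf")
    return {
        "allowed_failures": allowed_failures,
        "remaining": remaining,
        "consumed": consumed,  # fraction of the budget already spent
    }


# Example: a 99.9% SLO over one million requests, with 250 failures so far.
budget = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(budget)
```

A negative `remaining` value means the service has exceeded its budget for the period, which is a common trigger for pausing feature work in favor of reliability work.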
If you want to dive deeper into the various metrics and how they might be used, see What are developer experience metrics? and Metrics for measuring developer productivity.
Best practices for automating DevOps processes
DevOps teams that decide to adopt a more automated approach to their workflow can have a daunting task ahead of them. They may face skill gaps caused by over-specialized teams, significant legacy infrastructure and software, and cultural resistance to change. It’s best to follow a gradual and methodical approach.
Start small
Look for opportunities to apply standardized automation tools, such as Jenkins and its plugins, to parts of the deployment process. There may be opportunities to use out-of-the-box automatic test generation or static code analysis tools for initial progress or to find other small and immediate wins to start.
Focus on pain points
As the team grows increasingly comfortable with and builds capacity for automating their work, prioritization becomes relevant. Identify areas that exhibit frequent human error, or use DORA metrics to identify and prioritize pain points.
Take a layered approach
Avoid getting bogged down in complexity by factoring DevOps toolchains and workflows into layers, and focus on automating each layer independently of the rest. For a detailed discussion on layered approaches, see Rethinking DevOps and automation with a layered approach.
Choose the right tools
To prevent or address scalability problems, it is crucial to use DevOps automation tools that can handle the load and reduce the amount of engineering work operations teams have to do in-house. There are many tools to consider for all layers. For example, one might use a mix of Docker, Kubernetes, Git, and GitHub Actions for CI/CD, Ansible for automatic provisioning, and in-house tooling for automatic error detection and remediation. An internal developer portal (IDP) will help document and coordinate this work and act as a single source of truth, tracking progress and related data (see What is an internal developer portal?).
Testing, version control, and documentation
Operations-specific automated processes should still follow standard engineering practices. Because these processes are highly leveraged and affect the entire engineering team, make sure that core engineering standards around testing, version control, and documentation apply equally to DevOps automation.
Iterate and improve
Once you’ve addressed most pain points in most of the layers with automation, it’s important to iterate on and maintain a culture of automation throughout the SDLC. DevOps metrics remain relevant, as any significant changes can be pushed into feedback loops to help improve operations as a whole.
To learn more about DevOps practices, we recommend Best practices for DevOps teams and Security, Automation and Developer Experience: The Top DevOps Trends of 2024.
How can Cortex help?
Cortex is an IDP that gives engineering teams a single interface for accessing all their DevOps tools and resources. It acts as a single source of truth for operational work and reduces the time spent tracking information down across multiple platforms.
Cortex is a fully featured IDP and it:
Speeds up development with features such as built-in actions to trigger API calls and templates for bootstrapping new software
Increases visibility into the SDLC, for example by keeping service ownership lists automatically up to date
Improves resource management, for example by centrally tracking usage, highlighting orphaned services, and reporting on operational efficiency
The following Cortex features are specifically relevant to DevOps automation efforts:
Scaffolder: The Scaffolder supports project templates containing pre-defined boilerplate code, which covers input validation, tracking, and catalog integration. It can also be used to compose templates into basic automation workflows and remove friction from production readiness flows.
Scorecards: Scorecards track system performance, deployment statuses, and other key metrics. Users can set benchmarks and view reports on progress against them.
Eng Intelligence: Cortex analyzes data already imported in its systems to uncover trends and drive meaningful improvements. Data can be compared across teams or groups and drive collaboration and communication, helping team members focus on impactful problems and have meaningful conversations about causes, not just symptoms.
Catalog: The Catalog provides an automatically updated view of service ownership information and status in one place, acting as a service version control and documentation center.