How Rapid7 performed 3,000 complex migrations in under 2 weeks

Introduction

Rapid7 is a cybersecurity platform relied on by thousands of organizations around the world to unify endpoint-to-cloud exposure management and detection and response, enabling teams to confidently anticipate threats and detect and respond to cyber attacks.

As a trusted security partner, Rapid7 must ensure utmost operational rigor in all areas of the business. Rapid7 chose Cortex as a partner to drive engineering efficiency through measurable improvements to project velocity and developer experience.

The initial “time to find” problem

In late 2021, long before many engineering teams were thinking about Internal Developer Portals (IDPs), Log4shell had just ground engineering productivity to a halt. Teams around the world spent days sifting through git, spreadsheets, and wikis to surface potentially affected software and its responsible owners. For Elaine Hardwick (Director of Engineering) and Amanda Jackson (Program Manager), this was a pivotal moment.

“Log4shell took up a ton of the Platform team’s time—trying to piece together package and ownership information without knowing whether this information was up to date. This sent us into a deep dive of service catalogs like OpsLevel and Backstage to shore up information and shorten time-to-find ownership.”

While Log4shell may have been a catalyst for turning attention to new ways of shoring up information, this use case quickly evolved. Elaine continues,

“—But we needed more. A central place to not only track information about software, but information about how it’s built. We wanted to ask which software was meeting the highest levels of operational maturity? Has everyone made the switch from core infrastructure to newer modules? Are vulnerabilities surfaced actioned on within our SLAs? We wanted this information all in one place, updated automatically, without juggling multiple spreadsheets. That’s when we found our way to Cortex.”

Why Rapid7 chose Cortex

Rapid7 was looking to improve three key areas of operations that all lead to improved information discovery:

  1. Accelerate migrations: Rapid7 wanted to ensure visibility not just across cloud environments, providers, and resources, but within those spaces as well. They needed to reduce time to find information and ensure alignment with ongoing standards of excellence and time-bound initiatives like migrations.

  2. Streamline incident response: While the team at Rapid7 had numerous security tools to ensure prompt identification of threats, they wanted to accelerate what came next. The team needed a faster path to understanding ownership, dependencies, and next steps.

  3. Accelerate delivery timelines: Time spent hunting for information, context switching, and responding to threats was taking a toll on developer productivity and time to market. Rapid7 wanted to reduce rote work for developers to help them get back to building high-quality software.

Amanda sums up the core problem, “Walk away from a spreadsheet for a minute, and it’s already stale—making program and software tracking really difficult, and noisy for developers. With Cortex, we never have that issue. I can just trust that information is always up to date, which is huge not just for me or other program managers, but for developer trust—everything is transparently tracked, and we can leave devs alone that have already done what they need to do.”

How Cortex accelerates migrations

Maintaining alignment across a team of more than 1,000 engineers without slowing anyone down isn’t easy. But engineering teams often face moments when consistency is a non-negotiable. Elaine expounds, “Ongoing maintenance of software—primarily addressing end of life initiatives and ensuring everything is on the latest package versions—was a huge pain for us, and really drained both manager and developer time.”

With Cortex, the team can immediately see who owns what and what its status is, while only the developers affected by a required change are pinged with explicit instructions and deadlines. For Rapid7’s initiative to upgrade RDS instances, this cut work that would have taken months down to less than 2 weeks.

Amanda explains, “We had around 3,000 RDS instances that needed to be migrated across a variety of regions. With Cortex, we were able to spread out the lift to the appropriate domain owners to track down which RDS instances were left to be upgraded and which teams were responsible for the upgrade. This allowed us to save a massive amount of time in gathering involvement for the upgrade process and allowed us to clearly see the quantity left to make sure we drove adoption across all Engineering teams.” Elaine adds,

“If we didn’t have Cortex, we might have missed an upgrade, which would have resulted in an outage. Cortex really helped make this an air-tight process. Without it, I would have had to personally ping every owner to have them look at service dashboards. Since Cortex looks directly at resource metadata, we get immediate access to the most up-to-date information, without distracting devs.”
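The tracking Amanda and Elaine describe, reading resource metadata to see which instances still need the upgrade and who owns them, can be illustrated with a minimal Python sketch. This is not Cortex's API or Rapid7's implementation: the `RdsInstance` record, the `pending_upgrades` helper, the team names, and the version numbers are all hypothetical, chosen only to show the idea of grouping outstanding work by owner.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RdsInstance:
    """Hypothetical metadata record for one database instance."""
    identifier: str
    region: str
    owner_team: str
    engine_version: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '14.10' into (14, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pending_upgrades(instances, target_version):
    """Group the identifiers of instances below target_version by owning team."""
    target = parse_version(target_version)
    pending = defaultdict(list)
    for instance in instances:
        if parse_version(instance.engine_version) < target:
            pending[instance.owner_team].append(instance.identifier)
    return dict(pending)


# Illustrative inventory; every value here is made up.
inventory = [
    RdsInstance("db-orders-1", "us-east-1", "payments", "13.7"),
    RdsInstance("db-orders-2", "eu-west-1", "payments", "14.10"),
    RdsInstance("db-search-1", "ap-southeast-2", "search", "12.15"),
]

remaining = pending_upgrades(inventory, "14.10")
for team, identifiers in sorted(remaining.items()):
    print(f"{team}: {len(identifiers)} instance(s) left to upgrade")
```

Because the grouping is keyed by owner, only teams with outstanding instances appear in the result, which mirrors the point about leaving developers alone once they have done what they need to do.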

How Cortex streamlines incident response

Cortex makes it easy for Rapid7 to quickly find information about incidents and vulnerabilities, and ensure that appropriate follow-up measures are executed in a timely fashion—even if the person on call doesn’t have a lot of historical context.

Amanda explains, “When we have an incident, the person receiving the notification can head straight to Cortex to view everything they need to know. They can drill down into ownership, on-call, and dependencies, check recent events across all connected tools, and access readmes and runbooks, all from one page. This cuts response time significantly.”

She continues, “If we had to do this manually, we’d need to rely on people at the company that have been there long enough to amass enough historical knowledge. That’s a common but problematic approach for lots of companies. Cortex automatically updates this information and enables us to house it in a central location everyone can access.”

How Cortex accelerates delivery timelines

One of Amanda’s primary roles at Rapid7 is to look after Developer Experience. This means her primary metrics often revolve around reducing friction to outcomes. “In my opinion, Rapid7’s best asset is its people. Having the right tools in hand enables them to focus on what they’re best at, and removes friction to impact,” she shares.

She continues, “Without Cortex, our time to deliver would be greatly slowed down—by a couple weeks at least. Cortex allows us to move faster and more securely. Without it, we’re chasing down details during an incident, throughout a tool swap, or during day-to-day developer operations—all things which distract developers from building and shipping quickly. Now all these details are handed to us in a single place that’s always up to date.”

A look at Rapid7’s top scorecards

Rapid7’s primary objective in using Cortex Scorecards is to accelerate time to action—both during an incident and when performing system swaps and upgrades. Top scorecards include:

RDS migrations

The team wanted to ensure global RDS upgrades took place in a timely fashion. Rapid7 successfully upgraded 3,000 multi-region RDS instances in under 2 weeks.

Vulnerability Remediation

The team’s vulnerability remediation Scorecard tracks vulnerability volume and the SLA for addressing each vulnerability, by severity. The team avoided outages that could have been caused by missed migrations.
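As a rough illustration of a per-severity SLA check like the one this Scorecard tracks, the sketch below flags vulnerabilities that have been open longer than their severity allows. It is an assumption-laden example, not Cortex's rule syntax: the `SLA_DAYS` windows, the `Vulnerability` record, and the sample data are hypothetical and do not reflect Rapid7's actual SLAs.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical remediation windows in days, keyed by severity.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90}


@dataclass(frozen=True)
class Vulnerability:
    """Hypothetical record for one open vulnerability."""
    vuln_id: str
    severity: str
    opened_on: date


def sla_breaches(vulnerabilities, today):
    """Return the IDs of vulnerabilities past their SLA deadline, grouped by severity."""
    breaches = {}
    for vuln in vulnerabilities:
        deadline = vuln.opened_on + timedelta(days=SLA_DAYS[vuln.severity])
        if today > deadline:
            breaches.setdefault(vuln.severity, []).append(vuln.vuln_id)
    return breaches


# Illustrative data; every value here is made up.
open_vulnerabilities = [
    Vulnerability("VULN-1", "critical", date(2024, 2, 1)),
    Vulnerability("VULN-2", "high", date(2024, 2, 15)),
    Vulnerability("VULN-3", "medium", date(2023, 11, 1)),
]

overdue = sla_breaches(open_vulnerabilities, today=date(2024, 3, 1))
for severity, ids in sorted(overdue.items()):
    print(f"{severity}: {len(ids)} past SLA ({', '.join(ids)})")
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check deterministic and easy to re-run against a snapshot of the data.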

For more information on how Cortex can help you drive alignment to standards of engineering excellence, visit our website, or take a tour today.