In recent years, as companies rush to adopt microservices as core components of their infrastructure, we've noticed a pattern plaguing SREs:
Teams are given the freedom to build their own services to meet their specific needs. As a result, disconnected services crop up, and the broader organization may not know what they are for, or even that they exist.
In time, teams reorganize, engineers leave the company, and information essential to the maintenance of services they owned is lost.
Then, when an outage occurs, SREs find themselves hunting through Slack threads, digging through old code comments, and even emailing former coworkers to reach a resolution that should have been simple and close at hand.
These inefficiencies lead to redundant tools, buggy code, and frustration across the organization.
If this situation sounds familiar, don’t worry: we’re here to help. With these three steps, your tangled web of unreliable services can become an accessible suite of software your whole organization can rely on.
1. Turn tribal knowledge into shared information
Some knowledge, like deep familiarity with the quirks and contingencies of a codebase, can only be gained through long periods of hands-on experience. When collected, documented, and shared across an engineering team, this knowledge is extremely valuable, providing critical intuition when fixing issues or building features. But when this information is “tribal knowledge,” circulated only informally among a few people on the team, it's easily lost in the churn of reorganizations and personnel changes.
To avoid the costly slowdowns that follow when that knowledge disappears, it’s crucial to make sure information about your services is communicated effectively. A service catalog is a good place to start. With a central location for each service's metadata (things like its owner, design documents, runbooks, and links to external dashboards), developers and SREs taking charge of unfamiliar services can quickly get the lay of the land.
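As a minimal sketch of the idea, a catalog entry can be modeled as a small record that ties a service to its owning team and the links a newcomer needs. The class names, fields, and URLs below are hypothetical illustrations, not Cortex's data model; a real catalog would persist its entries and keep them in sync with your tools.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """One hypothetical catalog record: who owns a service and where its docs live."""
    name: str
    owner_team: str
    description: str
    design_docs: list[str] = field(default_factory=list)
    runbooks: list[str] = field(default_factory=list)
    dashboards: list[str] = field(default_factory=list)


class ServiceCatalog:
    """In-memory catalog keyed by service name (a real one would be persisted)."""

    def __init__(self) -> None:
        self._entries: dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        # Refuse duplicates so each service has exactly one source of truth.
        if entry.name in self._entries:
            raise ValueError(f"service {entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> ServiceEntry:
        # Raises KeyError for services nobody has catalogued yet.
        return self._entries[name]

    def owned_by(self, team: str) -> list[str]:
        # Sorted names of every service a given team owns.
        return sorted(e.name for e in self._entries.values() if e.owner_team == team)


if __name__ == "__main__":
    catalog = ServiceCatalog()
    catalog.register(ServiceEntry(
        name="payments-api",  # hypothetical service
        owner_team="billing",
        description="Charges customers and issues refunds.",
        runbooks=["https://wiki.example.com/runbooks/payments-api"],
        dashboards=["https://grafana.example.com/d/payments"],
    ))
    entry = catalog.lookup("payments-api")
    print(entry.owner_team, entry.runbooks[0])
```

An SRE paged about `payments-api` can go straight from the service name to its owning team and runbook instead of searching Slack history.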
2. Define and enforce your standards
Now that you understand your services' context, how do you understand their quality? With a defined, measurable set of best practices, you can quantitatively determine which of your existing services need more attention. And by automating the measurement of these quality metrics in your CI/CD pipeline, you can ensure that every new service you deploy meets your benchmarks.
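To make "measurable" concrete, here is a hypothetical sketch of a quality gate a CI job could run: each best practice is a yes/no check, a service's score is the fraction of checks it passes, and the job fails when that score falls below a threshold. The rule names, service fields, and 80% coverage bar are illustrative assumptions; this is not how Cortex Scorecards or CQL are implemented.

```python
from typing import Callable

# A service's facts as a plain dict. In practice these would be pulled from
# the catalog and third-party tools; the field names here are hypothetical.
Service = dict
Rule = Callable[[Service], bool]

# Hypothetical best-practice rules: each maps a readable name to a yes/no check.
RULES: dict[str, Rule] = {
    "has an owner": lambda s: bool(s.get("owner_team")),
    "has a runbook": lambda s: len(s.get("runbooks", [])) > 0,
    "test coverage >= 80%": lambda s: s.get("coverage", 0.0) >= 0.80,
    "on-call rotation configured": lambda s: s.get("on_call") is not None,
}


def score(service: Service) -> tuple[float, list[str]]:
    """Return the fraction of rules passed and the names of the failed rules."""
    failed = [name for name, check in RULES.items() if not check(service)]
    passed = len(RULES) - len(failed)
    return passed / len(RULES), failed


def gate(service: Service, threshold: float = 1.0) -> int:
    """CI-style check: return exit code 0 if the score meets the threshold, else 1."""
    fraction, failed = score(service)
    print(f"{service.get('name', '?')}: {fraction:.0%} of rules passed")
    for name in failed:
        print(f"  FAIL: {name}")
    return 0 if fraction >= threshold else 1


if __name__ == "__main__":
    new_service = {
        "name": "payments-api",  # hypothetical service
        "owner_team": "billing",
        "runbooks": ["https://wiki.example.com/runbooks/payments-api"],
        "coverage": 0.72,
        "on_call": "billing-primary",
    }
    code = gate(new_service)
    # A CI job would finish with sys.exit(code) so a nonzero code fails the build.
    print("exit code:", code)
```

Requiring a perfect score by default keeps new services from shipping with gaps; for existing services, a lower threshold lets you track improvement over time instead of blocking every build.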
Cortex's Scorecards are one way to define these standards across your organization. With CQL, our custom query language, you can hook into your third-party tools, write rules for your use case, and automatically enforce quality in your production deployments, with no need for checklists or spreadsheets.
3. Make life easy
After you’ve catalogued your services and defined their best practices, you should have a good idea of what a high-quality service looks like for each of the technologies in use at your company. With this insight and a tool like Cookiecutter, you can create bespoke templates for your engineers based on the features and standards that matter to you. With a single click, and without boilerplate or error-prone manual configuration, they'll be able to spin up a service they know they can trust.
Of course, when it's this easy to create a service, it can be just as easy to lose track of what you've made. If you scaffold a service with Cortex, we'll automatically add it to your service catalog, giving your organization full visibility into the service from the very start of its lifecycle.
We know building a culture that supports high-quality microservices is tough. That’s why Cortex is committed to providing the tools your organization needs to develop and enforce best practices in each of its services. With these pieces in place, your teams will be sharing knowledge, confidently deploying quality code, and eliminating hours of unnecessary friction.
If you have any questions or would like to learn more, reach out at team@getcortexapp.com—we’d love to hear from you.
This article is based on an interview Cortex did with TFiR, which you can watch in its entirety here.