Over the past few years, there’s been a lot said about the need to “tame the monolith.” Tightly coupled data and domains, slow deployments, and long test runs present real challenges, especially when trying to scale teams and applications.
It’s no surprise that organizations of all kinds, from startups to well-established enterprises, have been migrating their monolithic apps to microservice and service-oriented architectures. Microservices break all the components of an application into independent modules, or services. For example, a service’s sole responsibility may be archiving a PDF. Service-oriented architectures (SOAs) function like microservices, but apply to the entire enterprise; think one service per domain, such as payments or auth.
Both microservices and SOAs can establish clear ownership over domains and their systems, decouple data, and allow for independent deployments. While this may address some of the issues your team runs into with a monolith, that doesn’t mean microservices are right for every organization or application. Through conversations with engineers, we’ve learned that there’s a real need to tame your microservices, too. In this article, we’ll examine some effective strategies for maintaining your microservices and improving your velocity in the process.
Why do my microservices need to be tamed?
For all of their warts, monoliths do have an edge when it comes to simplicity: all of the code is in one place, so it’s easy to peek under the hood. There are no network calls, and no overhead calling a function somewhere else in the codebase. If you’re using a typed language, you can catch API changes at compile time.
With microservices, on the other hand, there is a suite of technical challenges to consider:
Operational complexity. Managing a distributed system requires an investment in time and effort from everyone involved.
Network latency. Services communicate through APIs over a network rather than through in-process function calls, so every interaction between services adds latency.
Versioning. Each service is updated independently, so the application must be equipped to support multiple versions of a service.
Change management and releases. Without effective communication between teams, changes could impact other teams’ services without them knowing.
While technical challenges do require attention, sometimes the most significant roadblocks have nothing to do with software. When it comes to taming microservices, it’s also important to consider the developers and engineers who work with them.
PEBCAK: Problem exists between chair and keyboard
Most organizations make the move to microservices in order to scale their teams, rather than to scale their application. Distributing ownership of domains, data, and release cadence can significantly improve the velocity of your engineering teams.
However, distributing ownership means that teams may diverge in their operational procedures and documentation practices. Over time, this can wreak havoc on productivity, dragging your velocity back to its starting place. Your workplace culture and the challenges that relate to people and processes have far-reaching consequences:
It’s costly and challenging to make structural changes. If each team maintains their own processes, then moving an engineer from one team to another may be as costly and time-consuming as onboarding a new hire.
Operational triage takes longer. Tribal knowledge dissipates as an organization grows. This makes it harder to answer essential questions, like knowing what services exist, what they do, and who owns them.
Communication overhead between teams becomes significant. Oversharing information is noisy. Undersharing information can, in the worst cases, lead to outages or issues with functionality, negatively impacting the bottom line.
If you’re considering migrating to microservices to scale your DevOps team, begin by working out which people-related challenges you can anticipate and how you’ll plan for them.
Taming microservices: A multipronged approach
When it comes to taming your microservices, you’ll need to consider solutions that address both your team’s culture and the structure of your application.
The case for a catalog
One of the most valuable tools at your disposal is a service catalog, a centralized source of information about all of your services, including who owns them and what their dependencies are. Over time, you may develop dozens, if not hundreds, of services, so you’ll need an easy way to keep track of all of them, or your team will quickly become overwhelmed and their velocity will plummet.
Not only can a catalog help you keep track of your services, it can also help your teams develop better technology. Because the catalog centralizes key information — like programming language, docs, dashboard links, and runbooks — it makes collaboration easier, and it smooths the transition for anyone who’s joining a new team.
By making ownership clear, a catalog can also hold team members accountable for the services they own. Plus, it can be easier for other teams to get in touch with the right people when issues arise, so folks can reach a solution more quickly.
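As a rough illustration of the kind of information a catalog entry might hold, here is a minimal Python sketch. The `ServiceEntry` fields, the `Catalog` class, and the example service names and URLs are all hypothetical, chosen only to mirror the details described above (owner, language, runbooks, dependencies). A real catalog would more likely live in a shared tool than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    """One catalog record. All field names here are illustrative assumptions."""
    name: str
    owner_team: str
    language: str
    runbook_url: str
    dependencies: tuple[str, ...] = ()


class Catalog:
    """A tiny in-memory catalog keyed by service name."""

    def __init__(self) -> None:
        self._entries: dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        self._entries[entry.name] = entry

    def owner_of(self, service: str) -> str:
        """Answer "who owns this?" during triage."""
        return self._entries[service].owner_team

    def dependents_of(self, service: str) -> list[str]:
        """Services that list `service` as a dependency, sorted by name."""
        return sorted(
            e.name for e in self._entries.values() if service in e.dependencies
        )


# Hypothetical services and URLs, for illustration only.
catalog = Catalog()
catalog.register(ServiceEntry(
    "pdf-archiver", "documents", "Go",
    "https://runbooks.example.com/pdf-archiver",
))
catalog.register(ServiceEntry(
    "billing", "payments", "Java",
    "https://runbooks.example.com/billing",
    dependencies=("pdf-archiver",),
))

print(catalog.owner_of("billing"))
print(catalog.dependents_of("pdf-archiver"))
```

Even a structure this small answers the two questions that tend to slow down triage: who owns a service, and which other services are affected when it has a problem.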
Service-oriented teams
With a service-oriented team structure, the developers who build a service are responsible for handling all aspects of that service, from programming to production to managing outages. These team members will naturally become experts in their services, which improves agility and velocity, leading to overall better code.
This organizational strategy shouldn’t isolate teams from one another, though. Service owners should make sure that other teams can easily access all of the critical information about their services. Just like services need to communicate with one another, teams need to be in communication. Clearly defined ownership should not only promote accountability within a team, but should streamline operations for other teams.
Prevention is better than the cure — you can’t avoid issues and outages, but you can set yourself up well to manage them when they arise. Clear ownership does just that, empowering your engineers to take the necessary steps to resolve a problem.
API design
APIs are critical to a functioning microservice application, and there are a few different ways of building them. A code-first approach, also known as implementation-first, is the traditional way of building an API: the business requirements dictate how the API is coded, and its implementation drives development. Code comes first, then the contract generation, before any testing or documentation.
With a design-first approach, also known as spec-first, the API contract is designed and documentation is developed before any coding happens. A code-first approach lets your team hit the ground running. A spec-first approach requires more up-front planning and coordination, but the end result is an iterative process that will shorten the runway for future APIs.
A spec-first workflow provides even greater benefits when it’s used with additional tooling, like OpenAPI or gRPC. Designing schemas and APIs ahead of time comes with a bunch of benefits:
You can use code generation for clients across different languages.
You’ll catch breaking changes to your API at build time.
Code review processes loop in the right stakeholders before a change is released.
Contract testing ensures you don’t go out of compliance with your predetermined schema.
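To make the last item in the list above concrete, here is a minimal sketch of a contract test in Python. It assumes a hypothetical "archive PDF" response with three fields, and it uses a hand-rolled type check as a stand-in; a team working with OpenAPI or gRPC would more likely rely on the validators and generated types that come with those ecosystems.

```python
# Hypothetical contract for an "archive PDF" response: field name -> required type.
ARCHIVE_RESPONSE_CONTRACT = {
    "archive_id": str,
    "size_bytes": int,
    "archived": bool,
}


def contract_violations(payload: dict, contract: dict) -> list[str]:
    """Return human-readable violations; an empty list means the payload conforms."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in payload:
            problems.append(f"missing field: {field_name}")
        elif type(payload[field_name]) is not expected_type:
            # Exact type comparison, so a bool is not accepted where an int is required.
            problems.append(
                f"wrong type for {field_name}: expected {expected_type.__name__}, "
                f"got {type(payload[field_name]).__name__}"
            )
    return problems


def fake_archive_handler() -> dict:
    """Stand-in for the real service implementation under test."""
    return {"archive_id": "a-123", "size_bytes": 2048, "archived": True}


# The contract test itself: the implementation's response must match the agreed contract.
assert contract_violations(fake_archive_handler(), ARCHIVE_RESPONSE_CONTRACT) == []
```

Run in CI, a check like this fails the build as soon as the implementation drifts from the contract that the teams agreed on.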
Atlassian: A spec-first use case
Atlassian, the software development company behind Jira and Trello, found themselves building lots of APIs to keep up with the integrations produced by the Jira team. While their implementation-first approach minimized double handling and gave them the ability to evolve their APIs during testing, it would take days, if not weeks, to gain meaningful feedback.
Atlassian transitioned to a spec-first approach using Swagger and the OpenAPI Specification, and they were able to significantly tighten their feedback loops. Because feedback focused on the API design itself rather than its implementation, it was also more actionable. Plus, by validating against the spec, they were able to discover breaking changes to the API implementation during testing.
Avoid building a distributed monolith
Regardless of whether you take an implementation- or spec-first approach, you need to be thoughtful with your API design. If you’re not, you may end up building a distributed monolith instead of a microservice architecture, where components are distributed but still tightly coupled. These are a few signs that you’re dealing with a distributed monolith:
No separation of concerns between components
Shared databases and data models
Releasing one service requires deploying multiple other services
Consuming a service’s APIs requires knowledge about its data models and side effects
A distributed monolith takes on all the complexity of microservices, but doesn’t offer the agility and scalability that microservices promise, making the codebase incredibly difficult to work with. With a distributed monolith, you also need to worry about high latency and low throughput.
To avoid building a distributed monolith, you can try the Amazon model, where teams communicate exclusively through APIs. This approach requires a few things of the service owner(s):
Consider API requests and responses before implementation
Share the API for review so that use cases for dependencies are well-covered
Communicate API changes in advance
Version APIs to maintain backwards compatibility
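One way to picture the last point in the list above is a service that keeps an older response shape available while introducing a newer one. The sketch below is only an illustration in Python: the invoice resource, the `/v1` and `/v2` path prefixes, and the handler names are all hypothetical, and URL-prefix versioning is just one of several possible schemes.

```python
def get_invoice_v1(invoice_id: str) -> dict:
    """Original response shape, kept so existing consumers do not break."""
    return {"id": invoice_id, "total": 1250}


def get_invoice_v2(invoice_id: str) -> dict:
    """Newer shape: makes the unit and currency explicit."""
    return {"id": invoice_id, "total_cents": 1250, "currency": "USD"}


# Both versions stay registered until consumers of v1 have migrated.
HANDLERS = {"v1": get_invoice_v1, "v2": get_invoice_v2}


def handle(path: str) -> dict:
    """Route a path like '/v2/invoices/inv-7' to the matching version's handler."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise LookupError(f"no handler for {path}")
    version, resource, invoice_id = parts
    if resource != "invoices" or version not in HANDLERS:
        raise LookupError(f"no handler for {path}")
    return HANDLERS[version](invoice_id)


print(handle("/v1/invoices/inv-7"))
print(handle("/v2/invoices/inv-7"))
```

Because the v1 handler is left untouched, the owning team can ship v2 on its own schedule and announce a deprecation date for v1 in advance, rather than forcing every consumer to deploy at the same time.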
Data models
The real breaking point for a monolithic application is when multiple teams across domains depend on the same data models. This stalls the development cadence across the organization, resulting in spaghetti models that make it exponentially more difficult to navigate the codebase.
Although microservices may seem like a natural way to avoid this, it’s easy to fall into the same trap with SOAs if you’re not careful. There are a few good habits you can foster that will help you avoid these issues:
Separate data stores for each service. You can ensure that your services remain loosely coupled if each has its own database that cannot be accessed by other services. This also allows engineers to use databases that are best suited to the needs of each service.
Consider your data models and links before implementing APIs. By considering the big picture when designing APIs, you’re more likely to create a robust API that meets the needs of all teams involved.
Reduce dependencies between services at the data level. By minimizing the data dependencies, it becomes easier to make changes to individual services. For example, you can store reference IDs for different resources in a service, and fetch that data through an API instead.
Ensure stakeholders don’t build tooling directly on your data. If tooling is built on your data, your data model is locked in, preventing iteration. Instead, expose data to stakeholders under its own contract, through APIs or analytical data streams.
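The habit of storing reference IDs and fetching the rest through an API can be sketched in a few lines of Python. Everything named here is hypothetical: an orders service that keeps only a `customer_id`, and a `customers_api_get` function standing in for a network call to a separate customers service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Order:
    """Owned by a hypothetical orders service. It stores only a reference ID,
    not a copy of the customer record."""
    order_id: str
    customer_id: str
    total_cents: int


# Stand-in for the customers service's private data store.
_CUSTOMER_STORE = {
    "c-42": {"id": "c-42", "name": "Ada", "email": "ada@example.com"},
}


def customers_api_get(customer_id: str) -> dict:
    """Stand-in for a network call to the customers service's public API.
    Only the fields in its (assumed) contract are returned."""
    record = _CUSTOMER_STORE[customer_id]
    return {"id": record["id"], "name": record["name"]}


def order_summary(order: Order, fetch_customer: Callable[[str], dict]) -> str:
    """Resolve the reference ID through the API instead of reading the other store."""
    customer = fetch_customer(order.customer_id)
    return f"Order {order.order_id} for {customer['name']}: {order.total_cents / 100:.2f}"


print(order_summary(Order("o-1", "c-42", 1999), customers_api_get))
```

Because the orders code never touches `_CUSTOMER_STORE` directly, the customers team stays free to rename columns or switch databases, as long as the API response keeps its shape.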
Standardize the process with tooling
As we discussed above, distributed services mean distributed teams. While this separation of concerns can improve efficiency, it can also lead to teams developing unique ways of doing things. Down the road, these operational inconsistencies can cause serious slowdowns and make it hard for teams to collaborate. The best thing you can do is establish a standardized process with the help of well-designed tools.
For example, Atlassian developed an internal PaaS, Micros, that acts as a thin wrapper around AWS. They chose this path, rather than providing direct access to AWS to all teams, because of the scalability afforded by “a platform that strongly encourages consistent technologies, tools, and processes,” which just wouldn’t be possible otherwise. Micros gives developers the ability to provision their desired resources, while augmenting the process with standardized monitoring and logging protocols.
Spotify, too, developed an internal platform called System-Z, which functioned as a catalog. System-Z provided a source of truth for details about all services, like ownership, links to documentation and runbooks, and recent deploys. It was so successful that in 2020, System-Z was open-sourced as Backstage.
Well-designed tooling to enforce standardization will improve velocity and maintain consistency across teams, even if they’re distributed. Tools like Micros and Backstage become even more powerful when they’re used as the entry point for deeper insights into services, like on-call rotations, links to dashboards, and logging.
Having a centralized, standardized system makes it easier to ramp up engineers and onboard new team members. Plus, operational efficiency is greatly improved when all of your services are operating the same way and developers have a single place to look for critical information.
Cortex: The entry point into your service architecture
Unfortunately, many organizations learn these lessons only after the problem has gotten out of hand. You don’t want to wait until you’re dealing with too many services, growing teams, high turnover, and remote developers before you build the tools you need to effectively manage your architecture.
Cortex solves these problems by providing a standard, opinionated way to organize the vital information about all of your services, along with additional functionality to provide you with greater visibility into your microservices:
Service Catalog. The service catalog tells you what services exist, who owns them, how to contact the owners, where the runbooks are, and everything else you need to know.
Resource Catalog. The resource catalog tracks all the infrastructure assets that are operating under the hood, so you have full visibility into your architecture, and establishes ownership, so teams are just as accountable for resources as services.
Scorecards and reporting. Scorecards reveal the health of your services and allow you to set custom rules, so you can see at a glance which services are owned, whether on-call is established, and much more.
Integrations. By connecting with your favorite tools, Cortex acts as a single pane of glass, collecting all the info you need. When PagerDuty triggers an alert, Cortex will send you a Slack message with service information, so you can quickly take the steps toward resolution.
Query Builder. Easily query data that spans your catalog and third-party tools and gain deeper insights into how your services are actually working.
Scaffolder. Import and build templates so developers can easily spin up new services that follow your organization’s best practices.
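To give a feel for what rule-based scoring of services can look like in general, here is a small Python sketch. It is not Cortex’s rule syntax or API; the rule names, the metadata keys (`owner`, `on_call`, `runbook_url`), and the `score` function are all assumptions made up for illustration.

```python
from typing import Callable

# Each rule is a name plus a predicate over a service's metadata dict.
# Rule names and metadata keys are hypothetical.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("has owner", lambda svc: bool(svc.get("owner"))),
    ("has on-call rotation", lambda svc: bool(svc.get("on_call"))),
    ("has runbook", lambda svc: bool(svc.get("runbook_url"))),
]


def score(service: dict, rules=RULES) -> tuple[int, list[str]]:
    """Return (number of passing rules, names of failing rules)."""
    failing = [name for name, check in rules if not check(service)]
    return len(rules) - len(failing), failing


healthy = {
    "owner": "payments",
    "on_call": "payments-primary",
    "runbook_url": "https://runbooks.example.com/billing",
}
neglected = {"owner": "documents"}

print(score(healthy))
print(score(neglected))
```

Returning the names of the failing rules alongside the count tells a team exactly what to fix, in addition to showing at a glance how a service is doing.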
We get it — we’ve felt the headaches that come with managing microservices and SOAs. At Cortex, our mission is to rid engineers of this pain by providing a platform that creates visibility and drives a culture of accountability among and across teams. Request a free demo today to see how we can help you start the transition to managing your microservices with a modern catalog.