
Do You Really Need a Service Mesh?

Blog Article Published: 09/22/2022

Originally published by Tigera here.

Written by Phil DiCorpo, Tigera.

The challenges involved in deploying and managing microservices have led to the creation of the service mesh, a tool for adding observability, security, and traffic management capabilities at the application layer. While a service mesh is intended to help developers and SREs with a number of use cases related to service-to-service communication within Kubernetes clusters, it also adds operational complexity and introduces an additional control plane for security teams to manage.

What is a service mesh?

A service mesh is a software infrastructure layer for controlling and monitoring internal, service-to-service traffic in microservices applications.

A service mesh provides the middleware and components that enable service-to-service communication: dynamic service discovery, load balancing of traffic across services, security features such as encryption and authentication, distributed tracing and other observability capabilities, and more. Its architecture leverages design patterns that add these capabilities without requiring developers to rewrite their applications.

Service mesh architecture

One of the key aspects of how a service mesh works is that it leverages a sidecar design pattern: services communicate and handle requests via a proxy that is dynamically injected into each pod. Envoy is one of the most popular proxies used in service meshes, chosen for its performance, extensibility, and rich APIs.
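
To make the sidecar pattern more concrete, the minimal sketch below assumes Istio as the mesh: labeling a namespace opts its pods into automatic Envoy sidecar injection. The namespace name is hypothetical.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: shop                    # hypothetical application namespace
      labels:
        istio-injection: enabled    # Istio's mutating webhook injects an Envoy sidecar into pods created here

With the label in place, every new pod in the namespace is started with an extra Envoy container alongside the application container, and that proxy handles the pod's service-to-service traffic.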

While this design pattern of a sidecar proxy makes up the data plane of a service mesh, most meshes introduce an additional control plane to configure and operate the data plane.

Control plane vs. data plane

The data plane is implemented as proxies, such as Envoy, deployed as sidecars. This means that every pod includes an instance of this proxy to mediate and control communication between microservices, and observe, collect, and report mesh traffic telemetry. All application-layer traffic is routed through the data plane.

The control plane manages and configures proxies to route traffic, and collects and consolidates data plane telemetry. While Kubernetes has its own control plane that schedules pods and handles auto-scaling of deployments, service mesh introduces another control plane to manage what these proxies are doing in order to enable service-to-service communication.
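
To illustrate the control plane's role, the sketch below shows the kind of routing configuration a mesh control plane translates into proxy configuration and pushes to the sidecars. It assumes Istio as the mesh; the service name, namespace, and version labels are hypothetical. Here, 90% of requests to the payments service go to version v1 and 10% to v2.

    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: payments
      namespace: shop                # hypothetical namespace
    spec:
      host: payments                 # the Kubernetes service being routed to
      subsets:
      - name: v1
        labels:
          version: v1                # pods labeled version=v1
      - name: v2
        labels:
          version: v2
    ---
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: payments
      namespace: shop
    spec:
      hosts:
      - payments
      http:
      - route:
        - destination:
            host: payments
            subset: v1
          weight: 90                 # 90% of traffic stays on v1
        - destination:
            host: payments
            subset: v2
          weight: 10                 # 10% is shifted to the new version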


Service mesh features

Service mesh provides a powerful but complex set of features. One of the most popular open-source service meshes breaks these features down into four pillars: connect, secure, control, and observe.

  • Connect – Also referred to as traffic management. This pillar covers the advanced capabilities you might need if you're versioning microservices and need to be able to handle various failure scenarios.
  • Secure – Together with control, sometimes simply referred to as security. This is where a mesh provides facilities to secure traffic with mutual TLS authentication.
  • Control – Enforcing policies on that traffic in terms of service-to-service communication (a sketch of the secure and control pillars together follows this list).
  • Observe – Observability features, including sidecar proxies collecting telemetry and logs on how services within the mesh communicate with one another.
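
As a rough sketch of the secure and control pillars, the snippet below assumes Istio as the mesh; the namespaces, service account, and labels are hypothetical. The first resource requires mutual TLS for all workloads in a namespace, and the second allows only a specific client identity to reach the payments service.

    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: payments            # hypothetical namespace
    spec:
      mtls:
        mode: STRICT                 # sidecars reject plaintext traffic; all connections use mutual TLS
    ---
    apiVersion: security.istio.io/v1beta1
    kind: AuthorizationPolicy
    metadata:
      name: allow-checkout
      namespace: payments
    spec:
      selector:
        matchLabels:
          app: payments-api          # hypothetical workload label
      rules:
      - from:
        - source:
            # only the (hypothetical) checkout service account may call payments-api
            principals: ["cluster.local/ns/shop/sa/checkout"]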

These four pillars represent a really powerful feature set; however, this power comes at a cost. Service mesh introduces an additional control plane, which causes increased deployment complexity and significant operational overhead.

Service mesh challenges

The two main challenges posed by service mesh are complexity and performance. In terms of complexity, service mesh is difficult to set up and manage. It requires specialized skills, and includes capabilities that most users don’t need. Because service mesh introduces latency, it can also create performance issues.

Many of the challenges associated with service mesh stem from the fact that there is so much to configure (the majority of the features mentioned above require some form of configuration). While there are many service meshes out there, there’s no one-size-fits-all solution when it comes to the needs of different organizations. It's likely that security teams will need to spend a good amount of time figuring out which service mesh will work for their applications.

As such, use of a service mesh requires the development of domain knowledge and specialized skills around whichever service mesh you end up choosing. That adds another layer of complexity in addition to the work you're already doing with Kubernetes.

Main drivers of adoption

Through conversations with DevOps teams and platform and service owners, we’ve found that there are three main use cases driving interest in service mesh adoption: security/encryption, service-level observability, and service-level control.

  • Security/encryption – Security for data in transit within a cluster. Sometimes this is driven by industry-specific regulatory concerns, such as PCI compliance or HIPAA. In other cases, it is driven by internal data security requirements. When the security of internet-facing applications is at the core of an organization’s brand and reputation, security becomes extremely important.
  • Service-level observability – Visibility into how workloads and services are communicating at the application layer. By design, Kubernetes is a multi-tenant environment. As more workloads and services are deployed, it becomes harder to understand how everything is working together, especially if an organization is embracing a microservices-based architecture. Service teams want to understand what their upstream and downstream dependencies are.
  • Service-level control – Controlling which services can talk to one another. This includes the ability to implement best practices around a zero trust model to ensure security (a minimal sketch of this kind of control follows this list).
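
As an illustration of service-level control without a mesh, the sketch below uses a standard Kubernetes NetworkPolicy to allow only one client workload to reach a service; the namespaces, labels, and port are hypothetical. Policy engines such as Calico extend this model with richer, cluster-wide rules.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-checkout-to-payments
      namespace: payments                       # hypothetical namespace
    spec:
      podSelector:
        matchLabels:
          app: payments-api                     # the workload being protected
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: shop # namespace label set automatically in Kubernetes 1.21+
          podSelector:
            matchLabels:
              app: checkout                     # only this client may connect
        ports:
        - protocol: TCP
          port: 8080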

While these are the main drivers for adoption, the complexity involved in achieving them through use of a service mesh can be a deterrent for many organizations and teams.

An operationally simpler approach

Platform owners, DevOps teams, and SREs have limited resources, so adopting a service mesh is a significant undertaking given the effort required to configure and operate it. It is preferable to find a solution that provides a single pane of glass for the three most popular service mesh use cases (security, observability, and control) while avoiding the complexity of deploying a separate, standalone service mesh. Look for a solution that offers the following:

  • Encryption for data in transit that leverages the latest in crypto technology. For example, a solution that uses open-source WireGuard would provide highly performant encryption while still allowing visibility into all traffic flows (a minimal sketch of enabling this follows this list).
  • Visibility into service-to-service communication in a way that is resource efficient and cost effective.
  • Kubernetes-native visualization of all the data it collects, allowing the user to see communication flows across services and team spaces and troubleshoot more easily.
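
As an example of the encryption point above, Calico (from Tigera, the original publisher of this article) can, for instance, turn on WireGuard-based encryption of in-cluster traffic with a single configuration change. This is a minimal sketch, assuming Calico is the cluster's CNI and the node kernels support WireGuard; no sidecars are involved.

    apiVersion: projectcalico.org/v3
    kind: FelixConfiguration
    metadata:
      name: default
    spec:
      wireguardEnabled: true    # encrypt pod traffic between nodes with WireGuard; requires kernel WireGuard support

Because encryption happens at the node level rather than in per-pod proxies, traffic remains visible to network policy and flow logs while still being protected in transit.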

Summary

So… Do you really need a service mesh? In my opinion, if security and observability are your primary drivers, the answer is no. Instead, find a solution that provides granular observability and security—not just at the application layer, but across the full stack—while avoiding the operational complexities and overhead often associated with deploying a service mesh.

To learn about new cloud-native approaches for establishing security and observability for containers and Kubernetes, check out this O'Reilly eBook, authored by Tigera.
