Alibaba Cloud Service Mesh (ASM) Helps Achieve Zero Trust and Strengthen Application Service Security

Sep 9, 2021

Introduction: Alibaba Cloud Service Mesh (ASM, https://www.aliyun.com/product/servicemesh) has become a leading vehicle for cloud-native zero trust systems. It offloads authentication and authorization from application code to the service mesh, creating an out-of-the-box solution that is dynamically configurable: policy updates are simple to make and take effect immediately. ASM uses Kubernetes Network Policy for Layer 3 network security control, and provides peer authentication, request authentication, Istio authorization policies, and finer-grained policy control based on OPA (Open Policy Agent). Together, these zero trust capabilities help users meet their security goals.

Author: Wang Xining, Qi Fang

Microservices provide many advantages, including scalability, responsiveness, business logic isolation, independent lifecycle management, and more convenient distributed development. However, a large number of distributed microservices also increases security challenges, because each microservice is a potential target of attack. Kubernetes provides an excellent platform for hosting and orchestrating your microservices, but interactions between microservices are not secure by default. They communicate over plaintext HTTP, which does not meet security requirements. Relying on the network boundary alone is not enough either: once an internal service is compromised, boundary security becomes a Maginot Line, and attackers can use the compromised machine as a springboard to attack the internal network. Therefore, internal calls must also be secured. This is where zero trust comes in.

Zero trust is a concept first proposed by Forrester analyst John Kindervag meaning that there is no implicit trust inside or outside the network boundary. In other words, explicit authentication is required everywhere, and the principle of least privilege is used to restrict access to resources.

An important value proposition of service mesh technology is its ability to effectively protect the application production environment without reducing the productivity of developers. Service mesh technology allows the microservice architecture to adopt zero trust network security methods and implement strong identity verification, context-based authorization, and recording and monitoring for all access. Using these mesh functions, you can provide security control capabilities for all applications in the mesh. For example, all traffic is encrypted and all traffic to the application is verified at the policy enforcement point (PEP).

The paper Kubernetes Hardening Guidance issued by the National Security Agency (NSA) in August 2021 also proposed that administrators should consider using service mesh to strengthen the security of Kubernetes clusters.

The theoretical principles underlying ASM include the following:

1) The basis of zero trust: Unified identity for cloud-native workloads. ASM provides simple and easy-to-use identity definitions for each workload under the service mesh as well as mechanisms tailored to specific scenarios that expand the identity construction system. It is also compatible with community SPIFFE standards.

2) Vehicle of zero trust: Security certificates. ASM provides mechanisms for issuing certificates and for managing their lifecycle. Identities are established through X.509 TLS certificates, which each sidecar agent presents on behalf of its workload. ASM also rotates certificates and private keys.

3) Zero trust engine: Policy execution. A policy-based trust engine is the key to implementing zero trust. In addition to supporting Istio RBAC authorization policies, ASM also provides finer-grained authorization policies based on OPA.

4) Zero trust insights: Visualization and analysis. ASM provides observability mechanisms for monitoring the logs and metrics of policy execution, allowing users to evaluate how each policy is being enforced.

Why Use a Service Mesh to Achieve Zero Trust?

Compared with the traditional method of building these security mechanisms directly in the application code, the service mesh architecture provides the following security advantages:

· The lifecycle of sidecar agents is independent of the application, thus making them easier to manage.

· With dynamic configuration, policy updates are more convenient and take effect immediately without the need to redeploy the application.

· The central control architecture of the service mesh enables an enterprise's security team to construct, manage, and deploy security policies that apply to the entire enterprise, so that business applications built by application developers are secure by default. Developers can use these security policies immediately without additional work.

· The service mesh provides the ability to authenticate the end-user credentials attached to the request, such as JWT.

· In addition, with the service mesh architecture, the authentication and authorization system can itself be deployed as a service in the mesh. Like any other service in the mesh, it then benefits from the security safeguards of the mesh, including encryption in transit, workload identity, policy enforcement points, and end-user credential authentication and authorization.

With the aid of ASM, a single control plane can be used to implement strong identity and access management, transparent TLS and encryption, identity authentication and authorization, and audit logging. ASM provides these features out of the box, and the simplicity of installation and management allows developers, system administrators, and security teams to properly protect their microservice applications.

The ASM Zero Trust System

The service mesh can shrink the attack surface of the cloud-native environment and provide the basic framework required by a zero trust application network. By using ASM to manage service-to-service security, you get end-to-end encryption, service-level identity authentication, and fine-grained authorization policies across the mesh.

The service mesh system supports the following functions:

· Mutual TLS authentication or one-way (server-side) TLS authentication between services, with support for lifecycle management features such as automatic certificate rotation, so that all communication within the mesh is authenticated and encrypted.

· Fine-grained authorization based on identity and authorization based on other parameters. Role-based access control (RBAC) can be used to realize the principle of “least privilege”, meaning only authorized services can communicate with each other according to ALLOW/DENY rules.

1. Workload Identity

When applications run in the service mesh environment, the service mesh provides a unique identifier for each service. This identifier is used when connecting to other microservices running in the service mesh. The service identifier can be used for mutual authentication to verify whether access between services is allowed. It can also be used in authorization policies.

When you use ASM to manage workloads running on Kubernetes or to define virtual machine workloads based on WorkloadEntry, ASM provides a service identity for each workload. The identity is implemented based on the service account token of the workload.

The service identity in ASM conforms to the SPIFFE standard and takes the following format:

spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>
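To make the three parts of this identity concrete, here is a minimal Python sketch that splits an ID of the format above into trust domain, namespace, and service account. The helper name `parse_spiffe_id`, the `WorkloadIdentity` type, and the sample values are illustrative assumptions, not part of ASM or of any SPIFFE library.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class WorkloadIdentity:
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(spiffe_id: str) -> WorkloadIdentity:
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    parts = parsed.path.strip("/").split("/")
    # Expect exactly: ns/<namespace>/sa/<service-account>
    well_formed = (
        len(parts) == 4
        and parts[0] == "ns"
        and parts[2] == "sa"
        and parts[1] != ""
        and parts[3] != ""
    )
    if not well_formed:
        raise ValueError(f"unexpected path in SPIFFE ID: {spiffe_id!r}")
    return WorkloadIdentity(parsed.netloc, parts[1], parts[3])


# Hypothetical sample values, chosen only for illustration.
identity = parse_spiffe_id("spiffe://cluster.local/ns/default/sa/bookinfo-details")
print(identity.trust_domain, identity.namespace, identity.service_account)
```

Because the namespace and service account are both encoded in the ID, an authorization policy can match on either one without any knowledge of pod IP addresses.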

On the ASM console, open the corresponding ASM instance, and you will see the following workload identity under zero trust security in the left-side navigation bar.

Kubernetes cluster workloads running on the data plane and their identity definitions:

Virtual machine workloads defined by WorkloadEntry and their identity definitions:

2. Peer Authentication

Authentication refers to the identity: Who is this service? Who is this end user? Can I trust that they are who they say they are?

ASM products provide two types of authentication:

· Peer-to-peer authentication: When two microservices interact with each other, you can enable or disable mutual TLS for peer-to-peer authentication.

· Request authentication: You can allow end users and systems to interact with microservices using request authentication. This is usually done using the JSON Web Token (JWT).

See the Getting Started Guide (https://help.aliyun.com/document_detail/149552.html) to install and deploy the bookinfo example.

First, try to access the details service over plaintext HTTP from the productpage pod in the same namespace (default, in this example). Under normal conditions the request succeeds with status 200, because both TLS and plaintext traffic are accepted by default.

kubectl exec $(kubectl get pod -l app=productpage -o jsonpath={.items..metadata.name}) -c istio-proxy -- curl http://details:9080/details/1 -o /dev/null -s -w '%{http_code}\n'

Next, define peer authentication under the namespace default.

On the ASM console, open the corresponding ASM instance, and you will see the following peer-to-peer authentication under zero trust security in the left-side navigation bar. On the right, click the “New mTLS Mode” button to define the mTLS mode for the workload details as STRICT.

Next, use the productpage pod to access the service details using plaintext HTTP:

kubectl exec $(kubectl get pod -l app=productpage -o jsonpath={.items..metadata.name}) -c istio-proxy -- curl http://details:9080/details/1 -o /dev/null -s -w '%{http_code}\n'
000
command terminated with exit code 56

Exit code 56 indicates a failure to receive network data. This is the response we expect. The workload details define the mTLS mode as STRICT, and therefore TLS certificate authentication is required for each request.

To allow normal access, change the peer-to-peer identity authentication definition above from STRICT to PERMISSIVE. The corresponding YAML definition is as follows:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: details-strict
  namespace: default
spec:
  mtls:
    mode: PERMISSIVE
  selector:
    matchLabels:
      app: details
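The difference between the two modes used in this section can be summarized in a few lines of Python. This is only a simplified model of the behavior observed above, not how the sidecar is implemented, and the function name `accepts_connection` is a hypothetical helper introduced for illustration.

```python
def accepts_connection(mtls_mode: str, client_uses_mtls: bool) -> bool:
    """Model whether a workload's sidecar accepts an inbound connection."""
    if mtls_mode == "STRICT":
        # Only mutual TLS traffic is accepted; plaintext is rejected.
        return client_uses_mtls
    if mtls_mode == "PERMISSIVE":
        # Both mutual TLS and plaintext traffic are accepted.
        return True
    raise ValueError(f"mode not covered by this sketch: {mtls_mode!r}")


for mode in ("STRICT", "PERMISSIVE"):
    for uses_mtls in (True, False):
        kind = "mTLS" if uses_mtls else "plaintext"
        verdict = "accepted" if accepts_connection(mode, uses_mtls) else "rejected"
        print(f"{mode:<10} {kind:<9} {verdict}")
```

The curl commands in this section are issued without a client certificate, so they fall into the plaintext case: rejected under STRICT, accepted under PERMISSIVE.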

3. Request Authentication

First, we will create a request authentication policy to enforce JWT authentication for inbound requests of the service details. On the ASM console, open the corresponding ASM instance, and you will see the following request authentication under zero trust security in the left-side navigation bar. On the right, click the “New” button to define JWT policy for the workload details.

Here, the issuer value is set to “testing@secure.istio.io”, and the jwks value is taken from https://raw.githubusercontent.com/istio/istio/release-1.9/security/tools/jwt/samples/jwks.json, as follows:

{ "keys":[ {"e":"AQAB","kid":"DHFbpoIUqrY8t2zpA2qXfCmr5VO5ZEr4RzHU_-envvQ","kty":"RSA","n":"xAE7eB6qugXyCAG3yhh7pkDkT65pHymX-P7KfIupjf59vsdo91bSP9C8H07pSAGQO1MV_xFj9VswgsCg4R6otmg5PV2He95lZdHtOcU5DXIg_pbhLdKXbi66GlVeK6ABZOUW3WYtnNHD-91gVuoeJT_DwtGGcp4ignkgXfkiEm4sw-4sfb4qdt5oLbyVpmW6x9cfa7vs2WTfURiCrBoUqgBo_-4WTiULmmHSGZHOjzwa8WtrtOQGsAFjIbno85jp6MnGGGZPYZbDAa_b3y5u-YpW7ypZrvD8BgtKVjgtQgZhLAGezMt0ua3DRrWnKqTZ0BJ_EyxOGuHJrLsn00fnMQ"}]}

Then, when the productpage pod accesses the details service over plaintext HTTP with a valid JWT in the Authorization header, the returned status is 200.

The value of the variable TOKEN is set to:

export TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6IkRIRmJwb0lVcXJZOHQyenBBMnFYZkNtcjVWTzVaRXI0UnpIVV8tZW52dlEiLCJ0eXAiOiJKV1QifQ.eyJleHAiOjQ2ODU5ODk3MDAsImZvbyI6ImJhciIsImlhdCI6MTUzMjM4OTcwMCwiaXNzIjoidGVzdGluZ0BzZWN1cmUuaXN0aW8uaW8iLCJzdWIiOiJ0ZXN0aW5nQHNlY3VyZS5pc3Rpby5pbyJ9.CfNnxWP2tcnR9q0vxyxweaF3ovQYHYZl82hAUsn21bwQd9zP7c-LS9qd_vpdLG4Tn1A15NxfCjp5f7QNBUo-KC9PJqYpgGbaXhaGx7bEdFWjcwv3nZzvc7M__ZpaCERdwU7igUmJqYGBYQ51vr2njU9ZimyKkfDe3axcyiBZde7G6dabliUosJvvKOPcKIWPccCgefSj_GNfwIip3-SsFdlR7BtbVUcqR-yv-XOxJ3Uc1MI0tz3uMiiZcyPV7sNCU4KRnemRIMHVOfuvHsU60_GhGbiSFzgPTAa9WTltbnarTbxudb_YEOx12JiwYToeX0DCPb43W1tzIBxgm8NxUg
kubectl exec $(kubectl get pod -l app=productpage -o jsonpath={.items..metadata.name}) -c istio-proxy -- curl http://details:9080/details/1 -o /dev/null --header "Authorization: Bearer $TOKEN" -s -w '%{http_code}\n'
200

If an invalid token is passed, we should see a “401: Unauthorized” response:

kubectl exec $(kubectl get pod -l app=productpage -o jsonpath={.items..metadata.name}) -c istio-proxy -- curl http://details:9080/details/1 -o /dev/null --header "Authorization: Bearer badtoken" -s -w '%{http_code}\n'
401

However, if we do not pass a token at all, the RequestAuthentication policy is not triggered: it only validates tokens that are present. Requests without a JWT therefore also return 200.

kubectl exec $(kubectl get pod -l app=productpage -o jsonpath={.items..metadata.name}) -c istio-proxy -- curl http://details:9080/details/1 -o /dev/null -s -w '%{http_code}\n'
200

Therefore, in addition to this authentication policy, we also need an authorization policy that requires JWT for all requests. The next section will describe how to define authorization policies in ASM.

4. Authorization Policies

ASM supports authorization policies. You can use the AuthorizationPolicy resource to enable authorization between microservices, using the following fields to define the traffic authorization policy:

· The selector field uses workload labels to specify the policy target.

· The action field specifies whether to ALLOW or DENY the request. If you do not specify an action, it defaults to ALLOW. For clarity, we recommend that you always specify the action. (Authorization policies also support AUDIT and CUSTOM actions.)

· The rules field specifies when to trigger the action:

· The from field in rules specifies the source of the request.

· The to field in rules specifies the operation being requested.

· The when field in rules specifies other conditions that must be met for the rule to apply.

The corresponding YAML definition is as follows:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: default
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals:
        - testing@secure.istio.io/testing@secure.istio.io
  selector:
    matchLabels:
      app: details

Send the request again without a JWT, and you should get the response 403 Forbidden. This shows that the AuthorizationPolicy has taken effect: every request to the details service must now carry a valid JWT.

kubectl exec $(kubectl get pod -l app=productpage -o jsonpath={.items..metadata.name}) -c istio-proxy -- curl http://details:9080/details/1 -o /dev/null -s -w '%{http_code}\n'
403
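The status codes observed in the last two sections follow from evaluating the two policies in sequence: request authentication first, then authorization. The Python sketch below is a deliberately simplified model of that interplay, not the sidecar's actual implementation. The token table and the name `status_for` are hypothetical stand-ins for real signature validation.

```python
from typing import Optional

# Hypothetical lookup standing in for JWT validation against the JWKS:
# maps a valid token to the request principal "<iss>/<sub>".
VALID_TOKENS = {
    "good-token": "testing@secure.istio.io/testing@secure.istio.io",
}

# Principals listed under requestPrincipals in the ALLOW policy.
ALLOWED_PRINCIPALS = {"testing@secure.istio.io/testing@secure.istio.io"}


def status_for(token: Optional[str], require_jwt_policy: bool) -> int:
    """Return the HTTP status this simplified model predicts for a request."""
    principal = None
    if token is not None:
        # Request authentication: a token that is present must be valid.
        principal = VALID_TOKENS.get(token)
        if principal is None:
            return 401
    # Authorization: with the ALLOW policy in place, a request without a
    # matching request principal matches no rule and is denied.
    if require_jwt_policy and principal not in ALLOWED_PRINCIPALS:
        return 403
    return 200


for policy_on in (False, True):
    for tok in (None, "badtoken", "good-token"):
        print(f"authz={policy_on!s:<5} token={tok!s:<10} status={status_for(tok, policy_on)}")
```

The model makes the division of labor explicit: request authentication rejects only bad tokens, while the authorization policy is what turns a missing token into a denial.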

5. OPA Policies

Open Policy Agent (OPA), a graduated project hosted by the Cloud Native Computing Foundation (CNCF), is a policy engine that can be used to implement fine-grained access control for your applications. As a general-purpose policy engine, OPA can be deployed as an independent service alongside microservices. To protect the application, each request to a microservice must be authorized before it is processed. To check authorization, the microservice makes an API call to OPA, which decides whether the request is allowed.
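The API call described here can be sketched with the Python standard library. OPA exposes policy decisions through its REST Data API: the caller POSTs a JSON body of the form {"input": ...} to /v1/data/<policy path> and reads the decision from the "result" field of the response. In the sketch below, the address localhost:8181, the policy path httpapi/authz/allow, and the fields of the input document are assumptions chosen for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_opa_query(opa_url: str, policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build (but do not send) a Data API query asking OPA for a decision."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/data/{policy_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(response_body: bytes) -> bool:
    # OPA wraps the decision as {"result": ...}. A missing result means the
    # rule was undefined for this input, which this sketch treats as a deny.
    return json.loads(response_body).get("result", False) is True


# Hypothetical input document describing the incoming request.
query = build_opa_query(
    "http://localhost:8181",
    "httpapi/authz/allow",
    {"method": "GET", "path": "/details/1", "user": "testing@secure.istio.io"},
)
print(query.get_method(), query.full_url)
```

To actually send the query against a running OPA instance, pass the request to `urllib.request.urlopen` and feed the response body to `is_allowed`. Treating an undefined result as a deny keeps the check fail-closed.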

ASM integrates an Open Policy Agent (OPA) plug-in and defines access control policies through OPA. This enables fine-grained access control for your applications and supports the dynamic update of OPA policies.

For details, see https://help.aliyun.com/document_detail/277428.html

Summary and Cases

In summary, ASM provides the following components to enhance security:

· A managed certificate infrastructure with complete certificate lifecycle management, addressing the complexity of certificate issuance and CA rotation.

· Managed control plane APIs, used to distribute authentication policies, authorization policies, and secure naming information to the Envoy proxies.

· Sidecar agents, helping ensure the security of the mesh by providing the policy enforcement point (PEP).

· Envoy extensions, which enable telemetry data collection and auditing.

Each workload establishes its identity through an X.509 TLS certificate, which is used by its sidecar agent. ASM issues these certificates and private keys and rotates them periodically. If a private key is stolen, ASM soon replaces it with a new one, which greatly limits the impact of an attack.

Reference Cases

1. Use authorization policies to implement IP-based access control on the ingress gateway (refer to the document https://help.aliyun.com/document_detail/214764.html?spm=a2c4g.11186623.6.613.6b213918q0rlZV#title-f15-wyq-fxg) or implement access control based on custom external authorization, as shown in the figure below, in which a cloud product is implemented based on the ASM gateway authorization policy.

2. An internet finance customer used the authorization policy provided by ASM to isolate the external connection area and the application area in order to solve access control issues for cross-cluster multi-language applications. At the same time, ASM can be combined with an egress gateway to audit the mesh traffic, and in conjunction with authorization policies, it can also control the application’s access to third-party services.
