How to use SkyWalking for distributed tracing in Istio?

In cloud-native applications, a single request is often processed by a series of APIs or background services. Some of these services run in parallel, some run serially, and they are spread across different platforms or nodes. How can we determine which services and nodes a call passed through so that we can troubleshoot it? This is where distributed tracing comes in.

This article will introduce you to:

  • Principles of distributed tracing
  • How to choose a distributed tracing system
  • How to use distributed tracing in Istio
  • How to view distributed tracing data, using Bookinfo and SkyWalking as examples

Distributed Tracing Basics

Distributed tracing is a method for tracing requests in distributed systems, which can help users better understand, control and optimize distributed systems. Two concepts are used in distributed tracing: TraceID and SpanID.

  • TraceID is a globally unique ID that identifies the trace information of a request. All trace information of a request belongs to the same TraceID, and the TraceID remains unchanged for the whole lifetime of the request.

  • SpanID is a locally unique ID that identifies one span, that is, the trace information of a request during one period of its processing. A request produces different SpanIDs for different periods, and the SpanID distinguishes the trace information belonging to each of them.

TraceID and SpanID are the basis of distributed tracing. They provide a unified way to identify traced requests in distributed systems, making it easy for users to query, manage, and analyze the trace information of a request.

Figure: Distributed tracing schematic

The following is the process of distributed tracing:

  1. When a system receives a request, the distributed tracing system assigns the request a TraceID, which ties the entire call chain together.
  2. The distributed tracing system generates a SpanID and a ParentID for each service call made for the request, recording the parent-child relationship between calls. The span without a ParentID is the entry point of the call chain.
  3. The TraceID and SpanID are passed along with each service call.
  4. When viewing traces, the entire processing of a request is queried by its TraceID.
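
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular tracer is implemented: the Span class, the helper names, and the random hex ID format are all assumptions made for the example.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # identifies this particular service call
    parent_id: Optional[str]  # None marks the entry span of the call chain
    name: str


def new_id(num_bytes: int = 8) -> str:
    """Return a random hex ID (illustrative; real tracers define their own formats)."""
    return secrets.token_hex(num_bytes)


def start_trace(name: str) -> Span:
    """Step 1: the entry point assigns a TraceID; the root span has no ParentID."""
    return Span(trace_id=new_id(16), span_id=new_id(), parent_id=None, name=name)


def start_child(parent: Span, name: str) -> Span:
    """Steps 2 and 3: a downstream call keeps the TraceID and records its parent."""
    return Span(trace_id=parent.trace_id, span_id=new_id(),
                parent_id=parent.span_id, name=name)


def spans_of_trace(spans: List[Span], trace_id: str) -> List[Span]:
    """Step 4: look up every span of one request by its TraceID."""
    return [s for s in spans if s.trace_id == trace_id]


root = start_trace("/productpage")
details = start_child(root, "/details/0")
reviews = start_child(root, "/reviews/0")
ratings = start_child(reviews, "/ratings/0")
other = start_trace("/healthz")  # an unrelated request gets its own TraceID

collected = [root, details, reviews, ratings, other]
for span in spans_of_trace(collected, root.trace_id):
    print(span.name, "parent:", span.parent_id)
```

Only the four spans of the first request are printed, and the parent IDs are enough to rebuild the call tree.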

How Istio implements distributed tracing

Distributed tracing in Istio is based on the Envoy proxy in the data plane. After a service request is intercepted by Envoy, Envoy attaches a number of headers when forwarding it. The following ones are related to distributed tracing:

  • As the TraceID: x-request-id
  • Used to establish the parent-child relationship of spans in the LightStep tracing system: x-ot-span-context
  • For Zipkin, and also usable with Jaeger and SkyWalking (see b3-propagation for details):
    • x-b3-traceid
    • x-b3-spanid
    • x-b3-parentspanid
    • x-b3-sampled
    • x-b3-flags
    • b3
  • For Datadog:
    • x-datadog-trace-id
    • x-datadog-parent-id
    • x-datadog-sampling-priority
  • For SkyWalking: sw8
  • For AWS X-Ray: x-amzn-trace-id

For detailed usage of these headers, please refer to the Envoy documentation.
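
To make the B3 headers more concrete, the sketch below splits the single `b3` header into its parts and maps them onto the multi-header form listed above. It assumes the layout described in the b3-propagation specification, `{TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}`, with the last two fields optional. The function names are hypothetical and are not part of Envoy or Istio.

```python
from typing import Dict, Optional


def parse_b3_single(value: str) -> Dict[str, Optional[str]]:
    """Split a single `b3` header value into its fields."""
    parts = value.strip().split("-")
    if len(parts) == 1:
        # Only a sampling decision was sent, for example "0".
        return {"trace_id": None, "span_id": None,
                "sampled": parts[0], "parent_span_id": None}
    if len(parts) > 4:
        raise ValueError(f"unexpected b3 value: {value!r}")
    # Pad the optional trailing fields with None.
    padded = parts + [None] * (4 - len(parts))
    trace_id, span_id, sampled, parent_span_id = padded
    return {"trace_id": trace_id, "span_id": span_id,
            "sampled": sampled, "parent_span_id": parent_span_id}


def to_multi_headers(b3: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Express parsed fields as x-b3-* headers, skipping absent fields."""
    headers = {
        "x-b3-traceid": b3["trace_id"],
        "x-b3-spanid": b3["span_id"],
        "x-b3-parentspanid": b3["parent_span_id"],
    }
    if b3["sampled"] == "d":
        # The debug state "d" corresponds to x-b3-flags: 1.
        headers["x-b3-flags"] = "1"
    else:
        headers["x-b3-sampled"] = b3["sampled"]
    return {name: value for name, value in headers.items() if value is not None}


example = "80f198ee56343ba864fe8b2a57d3eff7-e457b5a2e4d86bd1-1-05e3ac9a4f6e3b90"
for name, value in to_multi_headers(parse_b3_single(example)).items():
    print(f"{name}: {value}")
```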

Envoy generates these headers automatically, whatever language your application is written in, but you still need to make small changes to the application code to get end-to-end traces. Envoy cannot tell which outbound request was caused by which inbound request, so the application has to copy these headers from each incoming request to the outgoing requests it makes. You can do this by integrating a distributed tracing agent or client library into the program, or by propagating the headers manually in code. Envoy sends the trace data to the tracing backend for processing, and you can then view it in the UI.

For example, the code of the Productpage service in the Bookinfo application integrates the Jaeger client library, and its getForwardHeaders(request) method copies the headers generated by Envoy onto the HTTP requests sent to the Details and Reviews services:

    def getForwardHeaders(request):
        headers = {}

        # Use the Jaeger client to obtain the x-b3-* headers
        span = get_current_span()
        carrier = {}
        tracer.inject(
            span_context=span.context,
            format=Format.HTTP_HEADERS,
            carrier=carrier)

        headers.update(carrier)

        # Handle the non x-b3-* headers manually
        if 'user' in session:
            headers['end-user'] = session['user']

        incoming_headers = [
            'x-request-id',
            'x-ot-span-context',
            'x-datadog-trace-id',
            'x-datadog-parent-id',
            'x-datadog-sampling-priority',
            'traceparent',
            'tracestate',
            'x-cloud-trace-context',
            'grpc-trace-bin',
            'sw8',
            'user-agent',
            'cookie',
            'authorization',
            'jwt',
        ]

        for ihdr in incoming_headers:
            val = request.headers.get(ihdr)
            if val is not None:
                headers[ihdr] = val

        return headers
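
If you do not want to integrate a tracing client library, the same propagation can be done by hand. The following is a minimal sketch of that idea, not the Bookinfo code: the helper name forward_trace_headers is hypothetical, and the header list is simply the set of tracing headers listed earlier in this article. It works on any mapping of header names to values, so it does not depend on a particular web framework.

```python
from typing import Dict, Mapping

# Tracing headers listed earlier in this article, in lowercase.
TRACE_HEADERS = (
    "x-request-id",
    "x-ot-span-context",
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
    "x-b3-sampled", "x-b3-flags", "b3",
    "x-datadog-trace-id", "x-datadog-parent-id", "x-datadog-sampling-priority",
    "sw8",
    "x-amzn-trace-id",
)


def forward_trace_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Pick the tracing headers out of an incoming request's headers.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


inbound = {
    "X-Request-Id": "abc-123",
    "X-B3-TraceId": "80f198ee56343ba864fe8b2a57d3eff7",
    "Accept": "text/html",
}
outbound = forward_trace_headers(inbound)
print(outbound)
```

The returned dictionary would then be passed as the headers of every outgoing request made while handling the incoming one.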

For frequently asked questions about distributed tracing in Istio, see the Istio documentation.

How to choose a distributed tracing system

Distributed tracing systems work on similar principles, and many are available, such as Apache SkyWalking, Jaeger, Zipkin, Lightstep, and Pinpoint. We compare three of them, SkyWalking, Jaeger, and Zipkin, across several dimensions. They were chosen because:

  • They are currently the most popular open source distributed tracing systems;
  • All are based on the OpenTracing specification;
  • All support integration with Istio and Envoy.

| Category | Apache SkyWalking | Jaeger | Zipkin |
|---|---|---|---|
| Implementation method | Language-based probes, service mesh probes, eBPF agent, third-party instrumentation libraries (currently supports Zipkin) | Language-based probes | Language-based probes |
| Data storage | ES, H2, MySQL, TiDB, ShardingSphere, BanyanDB | ES, MySQL, Cassandra, memory | ES, MySQL, Cassandra, memory |
| Language support | Java, Rust, PHP, NodeJS, Go, Ruby, Python | Java, Go, Python, NodeJS, C#, PHP, Ruby, C++ | Java, Go, Python, NodeJS, C#, PHP, Ruby, C++ |
| Initiator | Individual | Uber | Twitter |
| Governance | Apache Foundation | CNCF | CNCF |
| Version | 9.3.0 | 1.39.0 | 2.23.19 |
| Number of stars | 20.9k | 16.8k | 15.8k |

Distributed tracing system comparison table (data as of 2022-12-07)

Although Apache SkyWalking's agents do not support as many languages as Jaeger and Zipkin, SkyWalking offers more implementation methods and is compatible with Jaeger and Zipkin trace data, which makes it one of the best choices.

Experiment

Refer to the Istio documentation to install and configure Apache SkyWalking.

Environment

The experiment uses the following environment:

  • Kubernetes 1.24.5
  • Istio 1.16
  • SkyWalking 9.1.0

Install Istio

Before installing, you can check whether there is any problem with the environment:

    $ istioctl experimental precheck
    ✔ No issues found when checking the cluster. Istio is safe to install or upgrade!
      To get started, check out https://istio.io/latest/docs/setup/getting-started/

Then install Istio and integrate SkyWalking:

    # Initialize the Istio Operator
    istioctl operator init
    # Install Istio and configure it to use SkyWalking
    kubectl apply -f - <<EOF
    apiVersion: install.istio.io/v1alpha1
    kind: IstioOperator
    metadata:
      namespace: istio-system
      name: istio-with-skywalking
    spec:
      meshConfig:
        defaultProviders:
          tracing:
          - "skywalking"
        enableTracing: true
        extensionProviders:
        - name: "skywalking"
          skywalking:
            service: tracing.istio-system.svc.cluster.local
            port: 11800
    EOF

Deploy Apache SkyWalking

Istio 1.16 supports distributed tracing using Apache SkyWalking. Execute the following command to install SkyWalking:

    kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.16/samples/addons/extras/skywalking.yaml

The following components are installed in the istio-system namespace:

  • SkyWalking OAP (Observability Analysis Platform): receives trace data; supports the SkyWalking native data format, Zipkin v1 and v2, and the Jaeger format.
  • UI: used to query distributed tracing data.

For more information about SkyWalking, please refer to the SkyWalking documentation.

Deploy the Bookinfo application

Execute the following command to install the bookinfo example:

    kubectl label namespace default istio-injection=enabled
    kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
    kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml

Get the gateway IP address and request the Productpage service through it to generate some traffic, then open the SkyWalking UI:

    istioctl dashboard skywalking
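
Traces only appear after the application has received requests. The sketch below shows one way to send a batch of requests to the Productpage endpoint through the gateway. The gateway address 203.0.113.10 is a placeholder, not a value from this experiment, and the helper names are hypothetical. The example runs in dry-run mode with a stub so that it works without a cluster.

```python
import urllib.request
from typing import Callable, List


def productpage_url(gateway_host: str, gateway_port: int = 80) -> str:
    """Build the Bookinfo Productpage URL behind the ingress gateway."""
    return f"http://{gateway_host}:{gateway_port}/productpage"


def fetch_status(url: str) -> int:
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def generate_traffic(url: str, count: int,
                     fetch: Callable[[str], int] = fetch_status) -> List[int]:
    """Request the URL `count` times and collect the status codes."""
    return [fetch(url) for _ in range(count)]


# Placeholder address: replace it with the IP of your own ingress gateway.
url = productpage_url("203.0.113.10")

# Dry run with a stub fetcher; omit `fetch=` to send real requests.
statuses = generate_traffic(url, 3, fetch=lambda _url: 200)
print(url, statuses)
```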

The General Service page of SkyWalking shows all the services in the bookinfo application.

Figure: General Service page of SkyWalking

You can also see information about instances, endpoints, topology, traces, and more. For example, the figure below shows the service topology of the bookinfo application.

Figure: Service topology of the Bookinfo application

The trace view of SkyWalking offers several display styles, such as list, tree, table, and statistics.

Figure: The SkyWalking General Service trace view supports multiple display styles

For convenience, set the trace sampling rate to 100%:

    kubectl apply -f - <<EOF
    apiVersion: telemetry.istio.io/v1alpha1
    kind: Telemetry
    metadata:
      name: mesh-default
      namespace: istio-system
    spec:
      tracing:
      - randomSamplingPercentage: 100.00
    EOF
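
To get a feel for what a sampling percentage means in practice, the following sketch simulates random sampling over a batch of requests. It only illustrates the idea of keeping each trace with a given probability. It is not how Envoy implements sampling, and the function name and the seeded random generator are assumptions made for the example.

```python
import random


def sampled_count(requests: int, percentage: float, seed: int = 0) -> int:
    """Count how many of `requests` would be traced at the given percentage."""
    rng = random.Random(seed)  # seeded so that the simulation is repeatable
    # rng.random() is in [0, 1), so 100% keeps every request and 0% keeps none.
    return sum(1 for _ in range(requests) if rng.random() * 100 < percentage)


for pct in (100.0, 10.0, 1.0):
    print(f"{pct:>5}% of 10000 requests ->", sampled_count(10000, pct), "traces")
```

At 100% every request produces a trace, which is convenient for a demo but can be expensive under real load.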

Uninstall

After the experiment, execute the following commands to uninstall Istio and SkyWalking:

    samples/bookinfo/platform/kube/cleanup.sh
    istioctl uninstall --purge
    kubectl delete namespace istio-system

Bookinfo demo trace data explained

Navigate to the General Service tab in the Apache SkyWalking UI to view the trace of the most recent request to the istio-ingressgateway service. The table view is shown below. It lists the basic information of every span in the request; click a span to view its details.

Figure: The basic information of each span is displayed in the table view

Switch to the list view, and you can see the execution order of each Span, as shown in the figure below.

Figure: The execution order of spans is shown in the list view

You may wonder why such a simple application generates so many spans. After the Envoy proxy is injected into each Pod, every request between services is intercepted and processed by Envoy, as shown in the figure below.

Figure: Envoy intercepts the traffic entering and leaving the application container and generates spans for it

The entire tracing process is shown in the figure below.

Figure: Distributed trace of the Bookinfo application

In the figure, each Span is marked with a serial number, and the time consumption is indicated in brackets. For the sake of illustration, we summarize all spans in the table below.

| No. | Method | Total time (ms) | Component time (ms) | Current service | Description |
|---|---|---|---|---|---|
| 1 | /productpage | 190 | 0 | istio-ingressgateway | Envoy Outbound |
| 2 | /productpage | 190 | 1 | istio-ingressgateway | Ingress -> Productpage network transmission |
| 3 | /productpage | 189 | 1 | productpage | Envoy Inbound |
| 4 | /productpage | 188 | 21 | productpage | Application internal processing |
| 5 | /details/0 | 8 | 1 | productpage | Envoy Outbound |
| 6 | /details/0 | 7 | 3 | productpage | Productpage -> Details network transmission |
| 7 | /details/0 | 4 | 0 | details | Envoy Inbound |
| 8 | /details/0 | 4 | 4 | details | Application internal processing |
| 9 | /reviews/0 | 159 | 0 | productpage | Envoy Outbound |
| 10 | /reviews/0 | 159 | 14 | productpage | Productpage -> Reviews network transmission |
| 11 | /reviews/0 | 145 | 1 | reviews | Envoy Inbound |
| 12 | /reviews/0 | 144 | 109 | reviews | Application internal processing |
| 13 | /ratings/0 | 35 | 2 | reviews | Envoy Outbound |
| 14 | /ratings/0 | 33 | 16 | reviews | Reviews -> Ratings network transmission |
| 15 | /ratings/0 | 17 | 1 | ratings | Envoy Inbound |
| 16 | /ratings/0 | 16 | 16 | ratings | Application internal processing |

From the above information it can be found that:

  • The request takes 190 ms in total.
  • In Istio sidecar mode, traffic passes through the Envoy proxy every time it enters or leaves the application container, and each pass takes 0 to 2 ms.
  • Network requests between Pods take between 1 and 16 ms.
  • The call chain Ingress Gateway -> Productpage -> Reviews -> Ratings takes 182 ms in total, which is less than the total request time of 190 ms. This is because the data itself contains errors, and the start time of a span is not necessarily equal to the end time of its parent span. On the SkyWalking trace page, selecting the "list" style (see the list view figure above) shows this more intuitively.
  • The most time-consuming part is the Reviews application, which takes 109 ms, so this is the application to optimize first.
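
The figures in the table can be checked with a short calculation. The sketch below assumes that the component time of a span is its total time minus the total time of its direct child spans. The parent-child links are inferred from the order and descriptions in the table and are not read from SkyWalking, so treat them as an assumption.

```python
# (No., method, total time in ms, parent No. or None), copied from the table above.
# The parent links are inferred from the table, not exported from SkyWalking.
SPANS = [
    (1, "/productpage", 190, None),
    (2, "/productpage", 190, 1),
    (3, "/productpage", 189, 2),
    (4, "/productpage", 188, 3),
    (5, "/details/0", 8, 4),
    (6, "/details/0", 7, 5),
    (7, "/details/0", 4, 6),
    (8, "/details/0", 4, 7),
    (9, "/reviews/0", 159, 4),
    (10, "/reviews/0", 159, 9),
    (11, "/reviews/0", 145, 10),
    (12, "/reviews/0", 144, 11),
    (13, "/ratings/0", 35, 12),
    (14, "/ratings/0", 33, 13),
    (15, "/ratings/0", 17, 14),
    (16, "/ratings/0", 16, 15),
]


def component_times(spans):
    """Return {No.: total time minus the total time of direct children}."""
    total = {no: t for no, _method, t, _parent in spans}
    children_total = {no: 0 for no in total}
    for _no, _method, t, parent in spans:
        if parent is not None:
            children_total[parent] += t
    return {no: total[no] - children_total[no] for no in total}


own = component_times(SPANS)
chain = [1, 2, 3, 4, 9, 10, 11, 12, 13, 14, 15, 16]  # Ingress -> Productpage -> Reviews -> Ratings
details_branch = [5, 6, 7, 8]                        # Productpage -> Details

print("Reviews application:", own[12], "ms")
print("Call chain:", sum(own[n] for n in chain), "ms")
print("Details branch:", sum(own[n] for n in details_branch), "ms")
```

Under these assumptions the computed component times match the table, the call chain adds up to 182 ms, and the Details branch accounts for 8 ms, which is the same size as the gap between 182 ms and 190 ms.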

Summary

Distributed tracing can be used in Istio with only slight changes to the application code. Among the many distributed tracing systems supported by Istio, Apache SkyWalking is one of the best. Besides distributed tracing, it supports metrics and log collection, alerting, Kubernetes and service mesh monitoring, and diagnosing service mesh performance with eBPF, which makes it a full-featured cloud-native application analysis platform. In this article the trace sampling rate is set to 100% for ease of demonstration. In production, adjust the sampling strategy (sampling percentage) as needed to avoid generating excessive trace data.

Reference

This article is reproduced from https://jimmysong.io/blog/distributed-tracing-with-skywalking-in-istio/