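Prometheus High Availability with Thanos

# 1. Overview

Prometheus is the mainstream monitoring tool in the cloud-native world, so most readers will already be familiar with it. For an ordinary system architecture there is usually no need to deploy Prometheus in a highly available setup; a single instance covers the monitoring needs. As the business grows and the architecture gets larger, a single instance may no longer be enough. The business relies on monitoring to raise early warnings on its metrics, so the monitoring system itself has to meet a higher availability bar.

This article does not explore the multi-replica and federation patterns commonly described online. It goes straight to the Thanos architecture; Thanos is also one of the many CNCF projects.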

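# 2. Deployment

Our Prometheus cluster is deployed with Prometheus Operator, which has built-in support for Thanos. For details, see: https://github.com/coreos/prometheus-operator/blob/master/Documentation/thanos.md

Thanos supports many object storage backends. Here I use Alibaba Cloud OSS directly. For more information, see: https://thanos.io/tip/thanos/storage.md/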

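# 2.1 Deploying the Thanos Sidecar

The Sidecar connects to Prometheus, exposes its data to Thanos Query for querying, and can upload that data to cloud object storage for long-term retention.

Edit prometheus-prometheus.yaml directly and add the thanos configuration; the Sidecar container is then injected into the Prometheus pods.

# Upload the data to remote object storage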
[root@k8s-master01 thanos]# cat thanos-storage-ali.yaml 
type: ALIYUNOSS
config:
  endpoint: "oss-cn-hangzhou.aliyuncs.com"
  bucket: "vuepress-xxx"
  access_key_id: "*********"
  access_key_secret: "*********"
prefix: ""
# Apply the configuration
kubectl create secret generic thanos-objectstorage --from-file=thanos.yaml=./thanos-storage-ali.yaml -n monitoring
# prometheus-prometheus.yaml configuration
...
spec:
  ...
  thanos:
    image: quay.io/thanos/thanos:v0.30.2
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objectstorage
...

kubectl apply -f prometheus-prometheus.yaml
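
To make the `--from-file=thanos.yaml=./thanos-storage-ali.yaml` mapping concrete, here is a minimal Python sketch of the Secret that command produces: the file content ends up base64-encoded under the key `thanos.yaml`, which is the key that `objectStorageConfig` refers to. The helper name `build_secret` is hypothetical and the credentials are placeholders.

```python
import base64
import json


def build_secret(name: str, namespace: str, key: str, content: str) -> dict:
    """Return a Secret manifest (as a dict) equivalent to
    `kubectl create secret generic NAME --from-file=KEY=FILE`."""
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {key: encoded},
    }


# Same layout as thanos-storage-ali.yaml; the credentials are placeholders.
OBJSTORE_CONFIG = """\
type: ALIYUNOSS
config:
  endpoint: "oss-cn-hangzhou.aliyuncs.com"
  bucket: "vuepress-xxx"
  access_key_id: "<placeholder>"
  access_key_secret: "<placeholder>"
prefix: ""
"""

secret = build_secret("thanos-objectstorage", "monitoring", "thanos.yaml", OBJSTORE_CONFIG)
print(json.dumps(secret["metadata"]))
```

The key name matters more than the file name: the Operator looks up `thanos.yaml` inside the Secret, regardless of what the local file was called.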

You can verify the injection by inspecting the containers running in the corresponding Prometheus pod:

[root@k8s-master01 manifests]# kubectl describe pod -n monitoring prometheus-k8s-1
...
thanos-sidecar:
    Container ID:  containerd://6c8f8668cdfb7eccd0d26b3f36015cce5b24610fbd0c6e424092fe5e8fa4e811
    Image:         quay.io/thanos/thanos:v0.28.1
    Image ID:      quay.io/thanos/thanos@sha256:3e95df4ce38edf1ca60666be0be229bed71ae155e8f5cf3ebbe7b45fbea487cb
    Ports:         10902/TCP, 10901/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      sidecar
      --prometheus.url=http://localhost:9090/
      --prometheus.http-client={"tls_config": {"insecure_skip_verify":true}}
      --grpc-address=:10901
      --http-address=:10902
    State:          Running
      Started:      Wed, 15 Mar 2023 14:31:02 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gbttc (ro)
...

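# 2.2 Deploying thanos-querier

Thanos Query implements the Prometheus API and provides a global query view. It is fully compatible with native Prometheus PromQL, so Grafana can connect to it directly. Query is stateless and can be scaled horizontally.

Manifest for the thanos-query component: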

# thanos-query.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
  labels:
    app: thanos-querier
spec:
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier
    spec:
      containers:
      - name: thanos
        image: quay.io/thanos/thanos:v0.28.1
        args:
        - query
        - --log.level=debug
        - --query.replica-label=prometheus_replica
        - --store=dnssrv+prometheus-operated:10901 
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        resources:
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
          initialDelaySeconds: 10
        readinessProbe:
          httpGet:
            path: /-/ready
            port: http
          initialDelaySeconds: 15
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-querier
  namespace: monitoring
  labels:
    app: thanos-querier
spec:
  ports:
  - port: 9090
    protocol: TCP
    targetPort: http
    name: http
  selector:
    app: thanos-querier
  type: ClusterIP

Check that the pod is running:

[root@k8s-master01 thanos]# kubectl get pods -n monitoring -l app=thanos-querier
NAME                            READY   STATUS    RESTARTS   AGE
thanos-querier-5c97668b55-wfrxz   1/1     Running   0          2m16s

External exposure

Expose the service through ingress-nginx:

# thanos-query.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: thanos-querier-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: thanos-querier.tchua.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: thanos-querier
            port:
              number: 9090

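Access from a browser

You can see that both Sidecars have registered with the Querier.

# 2.3 Deploying Thanos Store

The Store component works with the Querier to retrieve historical data from object storage.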

# thanos-store.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
  labels:
    app: thanos-store
spec:
  selector:
    matchLabels:
      app: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app: thanos-store
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.30.2
          args:
            - "store"
            - "--log.level=debug"
            - "--data-dir=/data"
            - "--objstore.config-file=/etc/secret/thanos.yaml"
            - "--index-cache-size=500MB"
            - "--chunk-pool-size=500MB"
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
            initialDelaySeconds: 15
          volumeMounts:
            - name: object-storage-config
              mountPath: /etc/secret
              readOnly: false
      volumes:
        - name: object-storage-config
          secret:
            secretName: thanos-objectstorage
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-store
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    app: thanos-store

Check that the resources are running:

[root@k8s-master01 thanos]# kubectl get pods -n monitoring |grep thanos-store
NAME                                   READY   STATUS    RESTARTS   AGE
thanos-store-0                         1/1     Running   0          53s

[root@k8s-master01 thanos]# kubectl get svc -n monitoring |grep thanos-store
thanos-store            ClusterIP   None            <none>        10901/TCP                    3m53s

Registering with the Querier

Modify the Querier startup arguments:

...
containers:
      - name: thanos
        image: quay.io/thanos/thanos:v0.30.2
        args:
        - query
        - --log.level=debug
        - --query.replica-label=prometheus_replica
        - --store=dnssrv+prometheus-operated:10901 
        - --store=dnssrv+thanos-store:10901
...

Check whether the Store has registered with the Querier.

# 3. How Thanos Works

# 3.1 Thanos Architecture

# 3.2 How Each Thanos Component Works

# 3.2.1 Sidecar Component

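Overview

In a monitoring cluster deployed with Prometheus Operator, the Thanos Sidecar can be configured directly through the CRD. This adds a Sidecar container to each Prometheus pod. The container has two functions:

It acts as a proxy, exposing local Prometheus data to the Querier component for queries.
It persists Prometheus monitoring data to remote storage; the supported storage backends are listed at https://thanos.io/tip/thanos/storage.md/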

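Metric read path

After the sidecar receives a query request from query, it converts the request into a query API request and sends it to the Prometheus instance it is bound to. Prometheus reads the data locally and responds, returning short-term, locally scraped and rule-evaluated data.

After the store receives a query request from query, it first walks the meta.json of the blocks in the object storage bucket and filters them by the time range and labels recorded there. It then reads the index and chunks of the remaining blocks from the bucket to run the query. Parts of the index that are queried frequently are cached, so the next query that needs them can read them directly. The result is long-term, historical scraped and rule-evaluated metrics.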
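
The meta.json filtering step can be illustrated with a small Python sketch. It is not the Store's actual code: the `BlockMeta` class and the `select_blocks` helper are hypothetical, only a few meta.json fields are modelled, and each block's time range is treated as half-open, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class BlockMeta:
    """The subset of a block's meta.json used here (timestamps in milliseconds)."""
    ulid: str
    min_time: int
    max_time: int
    labels: dict = field(default_factory=dict)


def select_blocks(blocks, query_start, query_end, required_labels=None):
    """Keep blocks whose [min_time, max_time) range overlaps the query window
    and whose external labels match every required label."""
    required_labels = required_labels or {}
    selected = []
    for block in blocks:
        overlaps = block.min_time < query_end and block.max_time > query_start
        labels_match = all(block.labels.get(k) == v for k, v in required_labels.items())
        if overlaps and labels_match:
            selected.append(block)
    return selected


HOUR_MS = 3600 * 1000
blocks = [
    BlockMeta("block-a", 0, 2 * HOUR_MS, {"prometheus_replica": "prometheus-k8s-0"}),
    BlockMeta("block-b", 2 * HOUR_MS, 4 * HOUR_MS, {"prometheus_replica": "prometheus-k8s-0"}),
    BlockMeta("block-c", 4 * HOUR_MS, 6 * HOUR_MS, {"prometheus_replica": "prometheus-k8s-1"}),
]

# Query the window from hour 3 to hour 5: block-a ends before it starts.
hits = select_blocks(blocks, 3 * HOUR_MS, 5 * HOUR_MS)
print([b.ulid for b in hits])  # ['block-b', 'block-c']
```

Only the blocks that survive this filter need their index and chunks read from the bucket, which is why the meta.json pass comes first.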

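Note

Although the Thanos Sidecar uploads local data to remote object storage continuously, Prometheus does not flush data to disk in real time. It produces a TSDB block every 2 hours, and the Sidecar uploads each block only once it has been produced. If Prometheus crashes and is rebuilt, up to 2 hours of data can therefore be lost. It is still recommended to give Prometheus persistent storage, so that data loss is minimized when Prometheus is rebuilt after a crash.

Parameter configuration

Here is the configuration from the deployment above again: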

# Prometheus-Prometheus.yaml
thanos:
    image: quay.io/thanos/thanos:v0.30.2
    objectStorageConfig:
      key: thanos.yaml
      name: thanos-objectstorage
# Inspect the arguments after injection
[root@k8s-master01 thanos]# kubectl get pods -n monitoring prometheus-k8s-1 -o yaml
...
- args:
    - sidecar
    - --prometheus.url=http://localhost:9090/
    - '--prometheus.http-client={"tls_config": {"insecure_skip_verify":true}}'
    - --grpc-address=:10901
    - --http-address=:10902
    - --objstore.config=$(OBJSTORE_CONFIG)
    - --tsdb.path=/prometheus
    env:
    - name: OBJSTORE_CONFIG
      valueFrom:
        secretKeyRef:
          key: thanos.yaml
          name: thanos-objectstorage
    image: quay.io/thanos/thanos:v0.30.2
    imagePullPolicy: IfNotPresent
    name: thanos-sidecar
    ports:
    - containerPort: 10902
      name: http
      protocol: TCP
    - containerPort: 10901
      name: grpc
      protocol: TCP
...

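Parameter notes

From the pod's startup arguments and environment variables above, you can see that the argument --objstore.config=$(OBJSTORE_CONFIG) takes its value from the environment variable OBJSTORE_CONFIG, which holds exactly the content injected from the secret created earlier.

The Prometheus configuration in Prometheus-Prometheus.yaml has an external_labels parameter for specifying labels; the Querier uses these labels for deduplication at query time.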

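# 3.2.2 Querier Component

The Querier component (also called Query) implements the Prometheus HTTP API, so data in the Thanos cluster can be queried with PromQL just as on the Prometheus Graph page.
The Querier is stateless and can run as multiple replicas. When it serves a query, it fetches metric data from the Sidecar of every Prometheus instance and from the Store Gateway.
When it serves a query, the Querier strips the replica label and merges series whose metric name and remaining labels are identical, ordered by time. Although the data comes from different scrapers, the response contains one result rather than several duplicates.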
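
The idea of stripping the replica label and merging by time can be sketched in a few lines of Python. This is a deliberately naive illustration, not Thanos's algorithm: the `deduplicate` function is hypothetical, and it simply keeps the first sample seen for each timestamp.

```python
from collections import defaultdict


def deduplicate(series_list, replica_label="prometheus_replica"):
    """Merge series that differ only in `replica_label`.

    Each input series is (labels: dict, samples: list of (timestamp, value)).
    Naive strategy: for every timestamp keep the first sample seen.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        # Series identity is the label set without the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


series = [
    # Replica 0 missed the scrape at t=30.
    ({"__name__": "up", "job": "node", "prometheus_replica": "prometheus-k8s-0"},
     [(10, 1.0), (20, 1.0)]),
    ({"__name__": "up", "job": "node", "prometheus_replica": "prometheus-k8s-1"},
     [(10, 1.0), (20, 1.0), (30, 1.0)]),
]

result = deduplicate(series)
for key, points in result.items():
    print(dict(key), points)
```

Two input series collapse into one, and the gap left by one replica is filled from the other, which is the behaviour that makes running two Prometheus replicas worthwhile.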

Deployment parameters

# thanos-query.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
  labels:
    app: thanos-querier
spec:
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier
    spec:
      containers:
      - name: thanos
        image: quay.io/thanos/thanos:v0.28.1
        args:
        - query
        - --log.level=debug
        - --query.replica-label=prometheus_replica
        - --store=dnssrv+prometheus-operated:10901
        - --store=dnssrv+thanos-store:10901
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        resources:
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: http
          initialDelaySeconds: 10
        readinessProbe:
          httpGet:
            path: /-/ready
            port: http
          initialDelaySeconds: 15

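**Parameter explanation**

--query.replica-label=prometheus_replica: prometheus_replica is a built-in label used to deduplicate replicated data; within one cluster, each replica must have a unique value for it. On the Thanos Query page, the Use Deduplication checkbox controls whether deduplication is applied. Deduplication picks one of the replicas identified by the query.replica-label label. In this article the two replicas are prometheus_replica=prometheus-k8s-0 and prometheus_replica=prometheus-k8s-1; Thanos uses a scoring mechanism to pick the data from the more stable replica for display.
--store=dnssrv+prometheus-operated:10901: service discovery for the Sidecars. The Querier has to connect to both the Sidecar and Store components, so their addresses must be given when the Querier starts. Here they are discovered through the corresponding Headless Service.
--store=dnssrv+thanos-store:10901: same as above, for the Store.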
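
The Use Deduplication checkbox corresponds to the `dedup` parameter of the query API. The Python sketch below only builds such a request URL and does not send it. The host comes from the Ingress defined earlier; the `build_query_url` helper and the `node-exporter` job name are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url: str, promql: str, dedup: bool = True) -> str:
    """Build an instant-query URL for the Prometheus-compatible HTTP API
    exposed by Thanos Query. `dedup` mirrors the "Use Deduplication" checkbox."""
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Host taken from the Ingress above; adjust it for your environment.
url = build_query_url("http://thanos-querier.tchua.com", 'up{job="node-exporter"}')
print(url)

# Decode the query string again to show what the server would receive.
print(parse_qs(urlsplit(url).query))
```

Requesting the same query once with `dedup=True` and once with `dedup=False` is a quick way to see whether both replicas are reporting the series.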

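# 3.2.3 Store Component

When the Querier queries through the Sidecar, only about the most recent 2 hours of monitoring data is available. For historical data we need the Store component. When deploying the Sidecar above, we configured it to write data to remote object storage; the Store queries data from that object storage.

Deployment parameters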

# thanos-store.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  namespace: monitoring
  labels:
    app: thanos-store
spec:
  selector:
    matchLabels:
      app: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app: thanos-store
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.30.2
          args:
            - "store"
            - "--log.level=debug"
            - "--data-dir=/data"
            - "--objstore.config-file=/etc/secret/thanos.yaml"
            - "--index-cache-size=500MB"
            - "--chunk-pool-size=500MB"
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
            initialDelaySeconds: 15
          volumeMounts:
            - name: object-storage-config
              mountPath: /etc/secret
              readOnly: false
      volumes:
        - name: object-storage-config
          secret:
            secretName: thanos-objectstorage
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-store
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    app: thanos-store

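Parameter explanation

--objstore.config-file=/etc/secret/thanos.yaml: this configuration file is mounted locally through volumes; it comes from the same secret that was created for the remote object storage when deploying the Sidecar.

A Headless Service named thanos-store is created so that the Querier can discover the Store automatically.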

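# 3.2.4 Compactor Component

The Compactor component mainly interacts with object storage. It compacts data and downsamples it, which makes queries over long time ranges more efficient. Because it works directly on the data in object storage, it is generally deployed as a single replica for concurrency safety.

Downsampling: suppose monitoring data is originally scraped every 10s and is reduced to one point every 5m. The data volume becomes much smaller. When we look at a historical trend, we do not need every individual data point.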
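
The 10s to 5m example can be worked through in Python. This is only a sketch of the idea, not the Compactor's implementation: the `downsample` function is hypothetical, and it keeps count, sum, min and max per window so that an average can still be derived afterwards.

```python
def downsample(samples, window_seconds=300):
    """Aggregate raw (timestamp_seconds, value) samples into fixed windows.

    Returns one dict per window with count, sum, min and max, from which
    an average can be derived as sum / count.
    """
    windows = {}
    for ts, value in samples:
        start = ts - ts % window_seconds  # align to the start of the window
        agg = windows.get(start)
        if agg is None:
            windows[start] = {"start": start, "count": 1, "sum": value,
                              "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return [windows[start] for start in sorted(windows)]


# One hour of raw data scraped every 10 seconds: 360 samples.
raw = [(t, float(t % 60)) for t in range(0, 3600, 10)]
downsampled = downsample(raw)
print(len(raw), "raw samples ->", len(downsampled), "windows")  # 360 raw samples -> 12 windows
print(downsampled[0])
```

Each 5m window replaces 30 raw samples, so a year-long range query touches roughly one thirtieth of the points it would need at raw resolution.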

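Note

Three retention parameters are closely related to downsampling:

--retention.resolution-raw (unit: d, default 0d)
--retention.resolution-5m (unit: d, default 0d)
--retention.resolution-1h (unit: d, default 0d)

The goal of downsampling is to make range queries over large time spans (months or years) return quickly. In other words, if you keep raw data for a shorter time than the downsampled data (--retention.resolution-raw is smaller than --retention.resolution-5m and --retention.resolution-1h), you will hit a problem: when you want to zoom in on the details, they are gone.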
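
The relationship between the three retention values can be checked mechanically. The Python sketch below is a hypothetical helper, not part of Thanos; it treats 0 as "keep forever", matching the 0d default, and reports the combinations that lose detail on zoom-in.

```python
def check_retention(raw_days: int, five_min_days: int, one_hour_days: int) -> list:
    """Return warnings for retention settings that drop raw detail early.

    A value of 0 means "keep forever", matching the 0d default.
    """
    def effective(days):
        return float("inf") if days == 0 else days

    raw = effective(raw_days)
    warnings = []
    if raw < effective(five_min_days):
        warnings.append("raw retention is shorter than 5m retention: "
                        "old data can only be viewed at 5m resolution")
    if raw < effective(one_hour_days):
        warnings.append("raw retention is shorter than 1h retention: "
                        "old data can only be viewed at 1h resolution")
    return warnings


# Raw data kept for 30 days, downsampled data kept longer: both warnings fire.
for message in check_retention(30, 90, 365):
    print(message)

# The defaults (0d everywhere) keep everything, so there is nothing to warn about.
print(check_retention(0, 0, 0))  # []
```

Whether a warning is a problem depends on the use case: keeping raw data for a shorter time saves storage, at the cost of not being able to zoom in on older periods.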

Deployment parameters

# thanos-compactor.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
  labels:
    app: thanos-compactor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-compactor
  serviceName: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.30.2
          args:
            - "compact"
            - "--log.level=debug"
            - "--data-dir=/data"
            - "--objstore.config-file=/etc/secret/thanos.yaml"
            - "--wait"
          ports:
            - name: http
              containerPort: 10902
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
            initialDelaySeconds: 10
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
            initialDelaySeconds: 15
          volumeMounts:
            - name: object-storage-config
              mountPath: /etc/secret
              readOnly: false
      volumes:
        - name: object-storage-config
          secret:
            secretName: thanos-objectstorage
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-compactor
  namespace: monitoring
  labels:
    app: thanos-compactor
spec:
  ports:
    - port: 10902
      targetPort: http
      name: http
  selector:
    app: thanos-compactor
  type: ClusterIP

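Parameter explanation

--data-dir=/data: the local working directory the Compactor uses while processing blocks.
--objstore.config-file=/etc/secret/thanos.yaml: the object storage configuration file, as for the Store component.
--wait: keeps the component running. By default the Thanos Compactor runs once and exits, which allows it to be run as a cronjob. To keep it running, use --wait, optionally with --wait-interval=5m to set the interval between runs.

# 4. Integrating with Grafana

With the components above deployed, persistent storage and deduplication are both in place. Grafana previously pointed directly at the Prometheus svc address; it now needs to point at the thanos-query address.

Option 1: edit the YAML file directly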

# grafana-dashboardDatasources.yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 9.4.3
  name: grafana-datasources
  namespace: monitoring
stringData:
  datasources.yaml: |-
    {
        "apiVersion": 1,
        "datasources": [
            {
                "access": "proxy",
                "editable": false,
                "name": "prometheus",
                "orgId": 1,
                "type": "prometheus",
                "url": "http://thanos-querier.monitoring.svc:9090",
                "version": 1
            }
        ]
    }
type: Opaque
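
The datasource URL has to match the Service created for the Querier in section 2.2 (name thanos-querier, namespace monitoring, port 9090). The Python sketch below derives the URL from those three values and renders the same datasources document; the helper names `service_url` and `build_datasources` are hypothetical.

```python
import json


def service_url(name: str, namespace: str, port: int) -> str:
    """In-cluster URL for a Service, using the <name>.<namespace>.svc DNS form."""
    return f"http://{name}.{namespace}.svc:{port}"


def build_datasources(url: str) -> str:
    """Render the datasources document in the same shape as the Secret above."""
    doc = {
        "apiVersion": 1,
        "datasources": [{
            "access": "proxy",
            "editable": False,
            "name": "prometheus",
            "orgId": 1,
            "type": "prometheus",
            "url": url,
            "version": 1,
        }],
    }
    return json.dumps(doc, indent=4)


# Service name, namespace and port come from the thanos-querier Service in section 2.2.
querier_url = service_url("thanos-querier", "monitoring", 9090)
print(build_datasources(querier_url))
```

Deriving the URL from the Service definition avoids the easy mistake of a datasource that points at a name no Service actually has.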