A Practical Guide to kube-prometheus
# 1. Introduction to kube-prometheus
In today's cloud-native era, Kubernetes has become the de facto standard for container orchestration. As applications grow in scale and complexity, monitoring the Kubernetes cluster becomes increasingly important. Prometheus, a mature and widely used monitoring system, provides powerful monitoring capabilities for Kubernetes. The kube-prometheus project goes one step further: it uses the Prometheus Operator to automate the deployment and management of the Prometheus monitoring components on Kubernetes.
Kubernetes monitoring metrics fall into two categories:
- Core metrics: collected from the kubelet, cAdvisor, and similar sources, then served by metrics-server to consumers such as kube-scheduler, the HPA, and controllers; they mainly cover CPU and memory for nodes and pods.
- Custom metrics: served by prometheus-adapter through the custom.metrics.k8s.io API, which makes any metric scraped by Prometheus available to the cluster.
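Once prometheus-adapter is running, the custom metrics API can be queried straight through the apiserver. A minimal sketch; the metric name http_requests_total is a placeholder for whatever your applications actually expose:

```bash
# list the metrics exposed through the aggregated custom metrics API
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
# query a (hypothetical) per-pod metric in the default namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_total" | jq .
```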
# 2. Core components
- Operator
The core component of kube-prometheus. It creates and maintains the Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule CRD resource objects, continuously watching their state to make sure it always matches the desired configuration.
- Prometheus
The Prometheus CRD is an abstraction of the Prometheus server: it declaratively describes the desired state of a Prometheus deployment.
- Prometheus Server
Deployed by the Operator according to the Prometheus CRD definition, it is the core of the monitoring system. The Prometheus server pulls metrics from the endpoints specified by ServiceMonitor objects.
- ServiceMonitor
ServiceMonitor is also a CRD. It describes a list of targets to be scraped by Prometheus: a labelSelector picks the matching Service Endpoints, and the Prometheus server scrapes metrics through the selected Services. In effect it is an abstraction over the various exporters.
- kube-state-metrics
Dedicated to exposing state metrics for Kubernetes resource objects such as Deployments, Pods, and Nodes, providing Prometheus with Kubernetes-specific monitoring data.
- prometheus-adapter
Function: an API server (custom-metrics-apiserver) that is aggregated into the main apiserver through the Kubernetes aggregation layer. It converts metrics that can be queried with PromQL into the Kubernetes metrics format and exposes them under the custom.metrics.k8s.io API group. Startup flags:
- lister-kubeconfig: defaults to the in-cluster configuration
- metrics-relist-interval: interval at which the metric cache is refreshed; it should be greater than or equal to the Prometheus scrape interval, otherwise values come back empty
- prometheus-url: address of the Prometheus instance to connect to
- config: a YAML file describing how metrics are fetched from Prometheus, how they map to Kubernetes resources, and how they are exposed through the API (see the sketch below)
Note
Reference: https://www.cnblogs.com/zhangmingcheng/p/15773348.html
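For illustration only, a minimal sketch of what the file passed via --config might contain; the series http_requests_total and the rename rule are placeholders, not something shipped with kube-prometheus:

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'   # placeholder metric
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```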
# 3. Installation and deployment
# 3.1 Getting kube-prometheus
Pay attention to the compatibility between your cluster version and the kube-prometheus release; see the compatibility matrix in the project README.
# Getting it straight from the GitHub repository
Download: git clone https://github.com/prometheus-operator/kube-prometheus.git
After cloning you can see that the resource manifests are already organized by component.
# 3.2 Preparing the manifests (image pull issues)
Tip
In the latest releases the image registry has been changed from k8s.gcr.io to registry.k8s.io. Some say the new registry is reachable from mainland China without a proxy, but in my case the images still could not be pulled, hence the image preparation steps below. If you can pull the images normally, skip this step. The version used here is v0.10.0.
- Replacing the image --> prometheus-adapter
1) Build the image yourself (mind the version when pulling the code; if the build fails, adjust the image addresses as needed)
```bash
git clone https://github.com/kubernetes-sigs/prometheus-adapter.git
cd prometheus-adapter
docker build -t registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.9.1 .
```
2) prometheusAdapter-deployment.yaml
Change the image to selina5288/prometheus-adapter:v0.9.1
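If you build the image yourself and your nodes pull from a private registry, one option is to retag and push the local image there; my-registry.example.com below is a placeholder for your own registry:

```bash
# retag the locally built image and push it to a private registry (registry address is a placeholder)
docker tag registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.9.1 my-registry.example.com/prometheus-adapter/prometheus-adapter:v0.9.1
docker push my-registry.example.com/prometheus-adapter/prometheus-adapter:v0.9.1
# then point the image field in prometheusAdapter-deployment.yaml at the pushed image
```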
- Replacing the image --> kube-state-metrics
1) Build the image yourself (mind the version tag)
```bash
git clone --branch v2.4.2 https://github.com/kubernetes/kube-state-metrics.git
cd kube-state-metrics
docker build -t registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.4.2 .
```
2) kubeStateMetrics-deployment.yaml (match the tag)
Change the image to registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.4.2
# 3.3 Applying the manifests
```bash
cd manifests
kubectl apply -f .
```
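Once the manifests are applied, it is worth checking that everything in the monitoring namespace comes up before moving on:

```bash
# all pods should eventually reach Running / Ready
kubectl get pods -n monitoring
kubectl get svc -n monitoring
```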
# 3.4 Exposing the UIs with Ingress
Tip
The Ingress setup here is deliberately simple because this is a local deployment, and the ingress-nginx-controller network mode has been changed to hostNetwork: true. For a production environment, deploy Ingress according to your actual situation.
- Installing nginx-ingress
1) Download the manifest and change the pod network mode to hostNetwork: true
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
2) Replace the image: the image referenced in the manifest cannot be pulled from mainland China without a proxy; a mirror is available at
https://hub.docker.com/r/anjia0532/google-containers.ingress-nginx.controller/tags
- Exposing Prometheus (the same Ingress also exposes Alertmanager and Grafana)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prom-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: alert.tchua.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.tchua.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.tchua.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
```
- Access
After deployment, check which node the ingress-nginx controller landed on, then add local hosts entries for the three hostnames pointing at that node.
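For example, on the workstation used to open the UIs (the IP is a placeholder for whichever node runs the controller):

```
# /etc/hosts
10.15.0.205  prom.tchua.com grafana.tchua.com alert.tchua.com
```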
# 3.5 Persistent storage
A PV and PVC need to be prepared in advance.
1) By default the Prometheus instance deployed by the operator retains data for 1d; to change the retention period, add the following parameter to the Prometheus CR (prometheus-prometheus.yaml):
retention: 7d
2) Persistent storage configuration (with the PV/PVC prepared beforehand):
```yaml
storage:
  volumeClaimTemplate:
    spec:
      storageClassName: tchua-prometheus-data
      resources:
        requests:
          storage: 200Gi
```
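As a sketch of the "prepare the PV in advance" step, a hypothetical hostPath-backed PV matching the storageClassName above might look like this (capacity, access mode, and path are assumptions to adapt to your environment):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-data-pv
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: tchua-prometheus-data
  hostPath:
    path: /data/prometheus     # assumed local path on the node
```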
# 4. Fixing components that are not being monitored
Note
This section only applies to the official release-0.7 branch and below. From 0.8 onward the label selectors in the prometheus-serviceMonitor* manifests have changed, so the steps below do not apply as-is; adjust the label selectors for your own version, otherwise the instances will still not be found because the labels do not match.
# 4.1 KubeControllerManager
After installation, KubeControllerManager is not monitored by default. The serviceMonitorKubeControllerManager object selects targets through a Service, and the cluster does not ship a Service for this component by default (see the file kubernetesControlPlane-serviceMonitorKubeControllerManager).
- Change the bind address
vim /etc/kubernetes/manifests/kube-controller-manager.yaml
```yaml
.......
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=0.0.0.0
............
# The static pod is usually restarted automatically after this change; if not, restart it yourself.
```
- Create the Service
vim kubernetesControlPlane-serviceKubeControllerManager.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
subsets:
- addresses:
  - ip: 10.15.0.205
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
```
# 4.2 KubeScheduler
KubeScheduler has the same problem as KubeControllerManager, so the same procedure applies.
- Change the bind address
vim /etc/kubernetes/manifests/kube-scheduler.yaml
```yaml
......
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=0.0.0.0
......
```
- Create the Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
subsets:
- addresses:
  - ip: 10.15.0.205
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
```
# 5. Alerting configuration
Once all the monitoring components are up, a large number of rules are already built in, covering essentially all Kubernetes resources and node components. When you need to monitor the business layer or other resources, however, you have to define custom monitoring items.
# 5.1 Custom application monitoring (two approaches)
- Via the Prometheus Operator (ServiceMonitor)
Tip
Since scraping is always driven by Services, a workload you want to monitor through a ServiceMonitor must have a corresponding Service object; only then can the ServiceMonitor be configured to collect its metrics. There are two ways to write the ServiceMonitor object itself:
- Adapt an existing manifest, e.g. prometheus-serviceMonitor.yaml
- Use kubectl explain ServiceMonitor to look up the supported fields and write it from scratch
Steps for adding a custom ServiceMonitor (see the sketch after this list):
1) Create a ServiceMonitor object so that Prometheus adds the scrape job
2) Associate the ServiceMonitor with a Service object fronting the metrics endpoint
3) Make sure the metrics can actually be fetched through that Service
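A minimal sketch of such a ServiceMonitor; the application name, namespace, port name, and path are hypothetical and must match your own Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical application name
  namespace: monitoring
  labels:
    app: my-app
spec:
  namespaceSelector:
    matchNames:
    - default                  # namespace where the application's Service lives
  selector:
    matchLabels:
      app: my-app              # must match the labels on the Service
  endpoints:
  - port: metrics              # the port *name* in the Service
    path: /metrics
    interval: 30s
```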
- Via the additionalScrapeConfigs field of the Prometheus CR
kubectl explain Prometheus.spec.additionalScrapeConfigs
1) Prepare the scrape configuration file; this is the ordinary application scrape config you would normally write (a sketch follows below)
cat bsd-tchua-jvm.yaml
2) Create a Secret from it
kubectl create secret generic bsd-tchua-jvm --from-file=bsd-tchua-jvm.yaml -n monitoring
3) Register it with Prometheus
vim prometheus-prometheus.yaml # add the following under spec
```yaml
  additionalScrapeConfigs:
    name: bsd-tchua-jvm
    key: bsd-tchua-jvm.yaml
```
4) Updating the scrape config later --> no Prometheus restart needed
```bash
kubectl delete secrets -n monitoring bsd-tchua-jvm
kubectl create secret generic bsd-tchua-jvm --from-file=bsd-tchua-jvm.yaml -n monitoring
```
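A hedged sketch of what bsd-tchua-jvm.yaml might contain: a plain Prometheus scrape_configs fragment. The job name, namespace, and label filter are placeholders for your own JVM application:

```yaml
- job_name: bsd-tchua-jvm
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default                              # application namespace (assumption)
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    regex: my-jvm-app                        # keep only the application's endpoints (placeholder label value)
    action: keep
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```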
# 5.2 Pod status metrics
These rules live in the kubernetesControlPlane-prometheusRule.yaml file.
- The kube_pod_container_status_waiting_reason metric
Reasons a pod can be stuck in the waiting state:
1) ContainerCreating
2) CrashLoopBackOff - Kubernetes tried to start the pod, but the process kept failing
3) CreateContainerConfigError
4) ErrImagePull
5) ImagePullBackOff
6) CreateContainerError
7) InvalidImageName
Alert rule:
```yaml
- alert: ApplicationStartupFailure
  annotations:
    description: 'namespace: {{ $labels.namespace }}, pod: {{ $labels.pod }}, container: {{ $labels.container }}, reason: {{ $labels.reason }}.'
    runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubecontainerwaiting
    summary: Pod container waiting longer than 1 hour
  expr: |
    sum by (namespace, pod, container, reason) (kube_pod_container_status_waiting_reason{job="kube-state-metrics"}) > 0
  for: 5m
  labels:
    severity: warning
```
- KubePodNotReady
Meaning: a pod has failed to become ready.
Expression:
```
sum by (namespace, pod) (
  max by(namespace, pod) (
    kube_pod_status_phase{job="kube-state-metrics", phase=~"Pending|Unknown"}
  ) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (
    1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"})
  )
) > 0
```
- KubeDeploymentReplicasMismatch
Meaning: the number of available replicas of a Deployment does not match the expected count; contact ops to investigate.
- Pod memory usage
```yaml
- alert: PodMemoryUsageHigh
  expr: (sum(container_memory_working_set_bytes{container!="POD",name!=""}) BY (instance, namespace, pod) / sum(container_spec_memory_limit_bytes > 0) BY (instance, namespace, pod) * 100) > 95
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Container Memory usage (instance {{ $labels.instance }})
    description: '{{ $labels.namespace }}/{{ $labels.pod }} Pod memory usage is above 95%, current value: {{ $value | printf "%.2f" }}'
```
# 5.3 Modifying the Alertmanager configuration
Tip
There are two ways to change the Alertmanager configuration: edit the Secret created from alertmanager-secret.yaml directly, or create an AlertmanagerConfig resource object.
# Using the Secret
- Inspect the current configuration
1. Look at the alertmanager-secret.yaml manifest
2. Look at the Secret itself
1) kubectl get secrets alertmanager-main -n monitoring -o yaml # copy out the alertmanager.yaml field
2) echo "<base64 string from the previous step>" | base64 -d
"global":
"resolve_timeout": "5m"
"inhibit_rules":
- "equal":
- "namespace"
- "alertname"
"source_matchers":
- "severity = critical"
"target_matchers":
- "severity =~ warning|info"
- "equal":
- "namespace"
- "alertname"
"source_matchers":
- "severity = warning"
"target_matchers":
- "severity = info"
- "equal":
- "namespace"
"source_matchers":
- "alertname = InfoInhibitor"
"target_matchers":
- "severity = info"
"receivers":
- "name": "Default"
- "name": "Watchdog"
- "name": "Critical"
- "name": "null"
"route":
"group_by":
- "namespace"
"group_interval": "5m"
"group_wait": "30s"
"receiver": "Default"
"repeat_interval": "12h"
"routes":
- "matchers":
- "alertname = Watchdog"
"receiver": "Watchdog"
- "matchers":
- "alertname = InfoInhibitor"
"receiver": "null"
- "matchers":
- "severity = critical"
"receiver": "Critical"
- Create the new configuration
```yaml
# alertmanager.yaml
global:
  resolve_timeout: 1m
route:
  receiver: 'default-receiver'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 2m
  repeat_interval: 1h
  routes:
  - receiver: "web.hook.channel"
    match:
      severity: warning
receivers:
- name: 'default-receiver'
  webhook_configs:
  - url: 'http://dingtalk-hook-svc:5000'
    send_resolved: true
- name: 'web.hook.channel'
  webhook_configs:
  - url: 'http://10.1.1.173:5000'
    send_resolved: true
```
- Apply the configuration
1) Delete the old Secret
kubectl delete secret alertmanager-main -n monitoring
2) Create the new one
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
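To confirm the new configuration has been picked up (the operator's config-reloader sidecar reloads Alertmanager automatically), you can decode the Secret again or check the Status page of the Alertmanager UI:

```bash
kubectl get secret alertmanager-main -n monitoring -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
```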
# Using an AlertmanagerConfig object
AlertmanagerConfig is a Kubernetes resource object like any other, so it can be written as a manifest in the usual way. Use the following command to see the supported fields:
kubectl explain AlertmanagerConfig
- Create the alertmanager-config.yaml file
```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager-config
  namespace: monitoring
  labels:
    alertmanagerConfig: example
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://10.1.1.173:18088/v1/alert'
```
- Associate it with the existing Alertmanager config
```yaml
# Add the following under Alertmanager.spec in alertmanager-alertmanager.yaml; the selected
# AlertmanagerConfig objects are then appended to the existing configuration.
# Note that matchLabels must match the labels defined on the AlertmanagerConfig object,
# otherwise it will not be associated.
alertmanagerConfigSelector:
  matchLabels:
    alertmanagerConfig: example
```
# 5.4 Custom alerting rules
- Create a PrometheusRule manifest
cat jvm/bsd-channel-prometheusRules.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels: # these labels must match the ruleSelector in the Prometheus manifest
    prometheus: k8s
    role: alert-rules
  name: bsd-channel-jvm
  namespace: monitoring
spec:
  groups:
  - name: bsd-channel-jvm-rules
    rules:
    - alert: JvmHeapUsageHigh
      expr: sum(jvm_memory_used_bytes{area="heap"}) by(instance,app_name) / sum(jvm_memory_max_bytes{area="heap"}) by(instance,app_name) * 100 > 95
      for: 2m
      labels:
        severity: warning
      annotations:
        description: 'namespace: {{ $labels.namespace }}, application: {{ $labels.app_name }} heap memory usage above 95%, current value: {{ $value | printf "%.2f" }}'
```
- Associating the rules with Prometheus
The labels defined in the metadata of the manifest above
```yaml
  labels:
    prometheus: k8s
    role: alert-rules
```
must correspond to the ruleSelector defined on the Prometheus CR:
```yaml
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
```
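Once applied, the rule object can be checked with kubectl, and the rule group should appear on the Rules page of the Prometheus UI shortly afterwards:

```bash
kubectl get prometheusrule -n monitoring bsd-channel-jvm
```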
# 6. Component deep dive
# 6.1 ServiceMonitor
- Overview
Once created through the operator, a ServiceMonitor is a regular Kubernetes resource; a single ServiceMonitor can match a whole class of Services through its labelSelector.
- Resource definition
Taking kubeStateMetrics-serviceMonitor as the example:
cat kubeStateMetrics-serviceMonitor.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.4.2
  name: kube-state-metrics
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 30s # scrape every 30s
    port: https-main # port name in the kube-state-metrics Service
    relabelings:
    - action: labeldrop
      regex: (pod|service|endpoint|namespace)
    scheme: https
    scrapeTimeout: 30s
    tlsConfig:
      insecureSkipVerify: true
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-self
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  selector: # labels of the Services to match; with matchLabels a Service must carry all of the listed labels, with matchExpressions it must satisfy all of the listed expressions
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: kube-state-metrics
      app.kubernetes.io/part-of: kube-prometheus
```
Note: for more fields, see kubectl explain ServiceMonitor.spec.
# 7. Service discovery explained
# 7.1 node
The node role discovers one target per cluster node, with the address defaulting to the kubelet's HTTP port. The target address defaults to the first existing address of the node in the address-type order NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, NodeHostName. In addition, the instance label of the target is set to the node name as reported by the API server.
Available meta labels:
```
__meta_kubernetes_node_name: The name of the node object.
__meta_kubernetes_node_label_<labelname>: Each label from the node object.
__meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
__meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
__meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
__meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.
```
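For illustration, a minimal scrape_configs entry using the node role (in kube-prometheus these jobs are generated for you by the operator; this hand-written sketch only shows how the __meta_* labels are typically used):

```yaml
- job_name: kubernetes-nodes
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap                       # turn node labels into target labels
    regex: __meta_kubernetes_node_label_(.+)
```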
# 7.2 service
The service role discovers one target per service port of every Service. This is useful for blackbox monitoring of a Service. The address is set to the Kubernetes DNS name of the Service plus the respective service port.
Available meta labels:
```
__meta_kubernetes_namespace: The namespace of the service object.
__meta_kubernetes_service_annotation_<annotationname>: Each annotation from the service object.
__meta_kubernetes_service_annotationpresent_<annotationname>: "true" for each annotation of the service object.
__meta_kubernetes_service_cluster_ip: The cluster IP address of the service. (Does not apply to services of type ExternalName)
__meta_kubernetes_service_external_name: The DNS name of the service. (Applies to services of type ExternalName)
__meta_kubernetes_service_label_<labelname>: Each label from the service object.
__meta_kubernetes_service_labelpresent_<labelname>: true for each label of the service object.
__meta_kubernetes_service_name: The name of the service object.
__meta_kubernetes_service_port_name: Name of the service port for the target.
__meta_kubernetes_service_port_protocol: Protocol of the service port for the target.
```
# 7.3 pod
The pod role discovers all pods and exposes their containers as targets. One target is generated per declared container port. If a container has no declared ports, a port-free target per container is created, and a port can then be set manually via relabeling.
Available meta labels:
```
__meta_kubernetes_namespace: The namespace of the pod object.
__meta_kubernetes_pod_name: The name of the pod object.
__meta_kubernetes_pod_ip: The pod IP of the pod object.
__meta_kubernetes_pod_label_<labelname>: Each label from the pod object.
__meta_kubernetes_pod_labelpresent_<labelname>: true for each label from the pod object.
__meta_kubernetes_pod_annotation_<annotationname>: Each annotation from the pod object.
__meta_kubernetes_pod_annotationpresent_<annotationname>: true for each annotation from the pod object.
__meta_kubernetes_pod_container_init: true if the container is an InitContainer
__meta_kubernetes_pod_container_name: Name of the container the target address points to.
__meta_kubernetes_pod_container_port_name: Name of the container port.
__meta_kubernetes_pod_container_port_number: Number of the container port.
__meta_kubernetes_pod_container_port_protocol: Protocol of the container port.
__meta_kubernetes_pod_ready: Set to true or false for the pod's ready state.
__meta_kubernetes_pod_phase: Set to Pending, Running, Succeeded, Failed or Unknown in the lifecycle.
__meta_kubernetes_pod_node_name: The name of the node the pod is scheduled onto.
__meta_kubernetes_pod_host_ip: The current host IP of the pod object.
__meta_kubernetes_pod_uid: The UID of the pod object.
__meta_kubernetes_pod_controller_kind: Object kind of the pod controller.
__meta_kubernetes_pod_controller_name: Name of the pod controller.
```
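A common (but optional) pattern built on the pod role is to scrape only pods that opt in through an annotation; the prometheus.io/scrape convention below is just that, a convention, not something kube-prometheus configures for you:

```yaml
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep                           # drop every pod without the opt-in annotation
    regex: "true"
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```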
# 7.4 endpoints
The endpoints role discovers targets from the listed endpoints of a Service: one target per endpoint address and port. If the endpoint is backed by a pod, all additional container ports of that pod which are not bound to an endpoint port are discovered as targets as well.
Available meta labels:
```
__meta_kubernetes_namespace: The namespace of the endpoints object.
__meta_kubernetes_endpoints_name: The names of the endpoints object.
For all targets discovered directly from the endpoints list (those not additionally inferred from underlying pods), the following labels are attached:
__meta_kubernetes_endpoint_hostname: Hostname of the endpoint.
__meta_kubernetes_endpoint_node_name: Name of the node hosting the endpoint.
__meta_kubernetes_endpoint_ready: Set to true or false for the endpoint's ready state.
__meta_kubernetes_endpoint_port_name: Name of the endpoint port.
__meta_kubernetes_endpoint_port_protocol: Protocol of the endpoint port.
__meta_kubernetes_endpoint_address_target_kind: Kind of the endpoint address target.
__meta_kubernetes_endpoint_address_target_name: Name of the endpoint address target.
If the endpoints belong to a service, all labels of the role: service discovery are attached.
For all targets backed by a pod, all labels of the role: pod discovery are attached.
```
# 7.5 ingress
The ingress role discovers a target for each path of every Ingress, which is mainly useful for blackbox monitoring. The address is set to the host specified in the Ingress spec.
Available meta labels:
```
__meta_kubernetes_namespace: The namespace of the ingress object.
__meta_kubernetes_ingress_name: The name of the ingress object.
__meta_kubernetes_ingress_label_<labelname>: Each label from the ingress object.
__meta_kubernetes_ingress_labelpresent_<labelname>: true for each label from the ingress object.
__meta_kubernetes_ingress_annotation_<annotationname>: Each annotation from the ingress object.
__meta_kubernetes_ingress_annotationpresent_<annotationname>: true for each annotation from the ingress object.
__meta_kubernetes_ingress_scheme: Protocol scheme of ingress, https if TLS config is set. Defaults to http.
__meta_kubernetes_ingress_path: Path from ingress spec. Defaults to /.
```
# Appendix: Troubleshooting notes
# 1. kubeadm deployment of Kubernetes 1.23.1
1. kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Fix:
1) Adjust the Docker configuration
```bash
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {"max-size": "50m", "max-file": "2"}
}
EOF
```
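After changing daemon.json, Docker has to be restarted for the new cgroup driver to take effect, and the kubelet typically needs a restart as well:

```bash
systemctl daemon-reload
systemctl restart docker
systemctl restart kubelet
```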
# 2. kube-prometheus deployment
1. The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
Fix:
Create the resource with kubectl create instead of kubectl apply.
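The error comes from client-side kubectl apply storing the full object in the last-applied-configuration annotation, which overflows the annotation size limit on this very large CRD. Two ways around it (in recent kube-prometheus releases the CRDs live under manifests/setup; verify the path against your checkout):

```bash
# option 1: create instead of apply
kubectl create -f manifests/setup/
# option 2: server-side apply, which does not write the last-applied-configuration annotation
kubectl apply --server-side -f manifests/setup/
```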