    kube-prometheus Hands-On Guide

    tchua · 2023-02-09
    # 1. Introduction to kube-prometheus

    In today's cloud-native era, Kubernetes has become the de facto standard for container orchestration. As applications grow in scale and complexity, monitoring the Kubernetes cluster becomes critical. Prometheus, a mature and widely used monitoring system, provides powerful monitoring capabilities for Kubernetes. The kube-prometheus project goes a step further and simplifies deploying and managing Prometheus monitoring on Kubernetes, using the Prometheus Operator to automate deployment of the monitoring components.

    Kubernetes monitoring metrics fall into two categories:

    • Core metrics: collected from the Kubelet, cAdvisor, and similar sources, then served by metrics-server to consumers such as kube-scheduler, the HPA, and controllers; they mainly cover node and pod CPU and memory.
    • Custom metrics: exposed by Prometheus Adapter through the custom.metrics.k8s.io API, which makes any metric scraped by Prometheus available to the cluster (a quick query example follows this list).
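
    As a quick sanity check once prometheus-adapter is running, you can list what the aggregated custom metrics API actually exposes (jq is optional, only for pretty-printing):

    kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .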
    # 2. Core Components


    • Operator

    The core component of kube-prometheus. It creates and maintains four CRD resource types: Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule. It continuously watches these resources and reconciles them so they always match the desired configuration.

    • Prometheus

    The Prometheus CRD is an abstraction of a Prometheus server: it declaratively describes the desired state of a Prometheus deployment.

    • Prometheus Server

    Deployed by the Operator according to the Prometheus CRD definition, this is the heart of the monitoring system. The Prometheus server pulls monitoring data from the metrics endpoints specified by ServiceMonitor objects.

    • ServiceMonitor

    ServiceMonitor is another custom CRD. It describes a set of targets to be monitored by Prometheus, selecting the matching Service endpoints via a labelSelector so that the Prometheus server scrapes metrics through the selected Services. In effect it is an abstraction over the various exporters.

    • kube-state-metrics

    Dedicated to exposing state metrics for Kubernetes resource objects such as Deployments, Pods, and Nodes, providing Prometheus with Kubernetes-specific monitoring data.

    • prometheus-adapter

    Function: an API server named custom-metrics-apiserver that is aggregated into the main apiserver via the Kubernetes aggregator. It converts metric data queryable in Prometheus via PromQL into the corresponding Kubernetes format and exposes it under the custom.metrics.k8s.io API group. Startup flags:

    1. lister-kubeconfig: defaults to the in-cluster configuration
    2. metrics-relist-interval: interval for refreshing the metric cache; it should be greater than or equal to Prometheus's scrape interval, otherwise the data will come back empty
    3. prometheus-url: address of the Prometheus instance to connect to
    4. config: a YAML file configuring how metrics are fetched from Prometheus, mapped to Kubernetes resources, and exposed through the API (see the sketch below)

    Note

    Reference: https://www.cnblogs.com/zhangmingcheng/p/15773348.html
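
    A minimal sketch of what one rule in that config file can look like, assuming a hypothetical counter http_requests_total with namespace and pod labels (this follows prometheus-adapter's documented rule format, not a config shipped with kube-prometheus):

    rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'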

    # 3. Installation and Deployment

    # 3.1 Getting kube-prometheus

    Pay attention to the compatibility between your current cluster version and kube-prometheus; see the project's version compatibility matrix for details.

    # Fetch directly from the GitHub repository
    git clone https://github.com/prometheus-operator/kube-prometheus.git
    # After cloning, the resource manifests are already grouped by component.
    # 3.2 Preparing the Manifests (image pull issues)

    Tip

    In the latest versions the official image registry moved from k8s.gcr.io to registry.k8s.io. Some say the new registry is reachable without a proxy, but in my tests the images still would not download, hence the image preparation steps below; if the pulls work for you, skip this step. The version used here is v0.10.0.

    • Replace the image --> prometheus-adapter
    1) Build the image yourself (mind the version when cloning; if the build fails, adjust the image addresses as needed)
    git clone https://github.com/kubernetes-sigs/prometheus-adapter.git
    cd prometheus-adapter
    docker build -t registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.9.1 .
    2) prometheusAdapter-deployment.yaml
    	Change the image to selina5288/prometheus-adapter:v0.9.1
    • Replace the image --> kube-state-metrics
    1) Build the image yourself (mind the version tag)
    git clone --branch v2.4.2 https://github.com/kubernetes/kube-state-metrics.git
    cd kube-state-metrics
    docker build -t registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.4.2 .

    2) kubeStateMetrics-deployment.yaml (mind the matching tag)
    Change the image to registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.4.2
    # 3.3 Applying the Manifests
    cd manifests
    kubectl apply -f .
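
    After applying, it can take a few minutes for everything to come up; a quick check with plain kubectl:

    kubectl -n monitoring get pods
    kubectl -n monitoring get svc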
    # 3.4 Exposing via Ingress

    Tip

    The only reason for this quick-and-dirty ingress install is that this is a local deployment, with the ingress-nginx-controller network mode switched to hostNetwork: true. In production, deploy ingress according to your actual environment.

    • Installing nginx-ingress
    1) Download the manifest and change the pod network mode to hostNetwork: true
    wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/cloud/deploy.yaml
    2) Replace the image; the address in the manifest requires a proxy to reach
    https://hub.docker.com/r/anjia0532/google-containers.ingress-nginx.controller/tags
    4
    • Expose Prometheus, Alertmanager, and Grafana
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: prom-ingress
      namespace: monitoring
      annotations:
        kubernetes.io/ingress.class: "nginx"
        prometheus.io/http_probe: "true"
    spec:
      rules:
      - host: alert.tchua.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alertmanager-main
                port:
                  number: 9093
      - host: grafana.tchua.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
      - host: prom.tchua.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-k8s
                port:
                  number: 9090
    
    • Access
    After deployment, check which node the ingress controller runs on, then add the matching hosts entries locally.
    # 3.5 Persistent Storage

    The PV and PVC need to be prepared in advance.

    1) By default the Prometheus Operator retains data for 1d; to change the retention period, add the following parameter.
    retention: 7d
    2) Persistent storage configuration (with the PV and PVC prepared)
    storage:
        volumeClaimTemplate:
          spec:
            storageClassName: tchua-prometheus-data
            resources:
              requests:
                storage: 200Gi
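    Both fields go under Prometheus.spec in prometheus-prometheus.yaml. After applying, it is worth confirming that the claim actually binds; if it stays Pending, the StorageClass or PV is the usual suspect:

    kubectl -n monitoring get pvc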
    # 4. Fixing Components That Cannot Be Monitored

    Note

    This section only applies to the official release-0.7 branch and earlier. From 0.8 onward the label selectors in the prometheus-serviceMonitor* files have changed, so the steps below do not apply verbatim; adapt the label selectors to your version, otherwise the instances will still not be found because the labels won't match.

    # 4.1 KubeControllerManager
    After the cluster is installed, KubeControllerManager cannot be monitored by default. This is because serviceMonitorKubeControllerManager selects its targets through a Service, and a freshly installed cluster has no such Service (see the file kubernetesControlPlane-serviceMonitorKubeControllerManager).
    • Change the bind address
    vim /etc/kubernetes/manifests/kube-controller-manager.yaml
    .......
    - command:
        - kube-controller-manager
        - --allocate-node-cidrs=true
        - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
        - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
        - --bind-address=0.0.0.0
    ............

    # The static pod usually restarts on its own after this change; if it doesn't, restart it yourself.
    • Create the Service
    vim kubernetesControlPlane-serviceKubeControllerManager.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-controller-manager
      namespace: kube-system
      labels:
        k8s-app: kube-controller-manager
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - name: https-metrics
        port: 10257
        targetPort: 10257
        protocol: TCP
    
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: kube-controller-manager
      namespace: kube-system
      labels:
        k8s-app: kube-controller-manager
    subsets:
    - addresses:
      - ip: 10.15.0.205
      ports:
        - name: https-metrics
          port: 10257
          protocol: TCP
    
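    A quick sanity check that the new target is reachable (the IP is the control-plane address used in the Endpoints above; an authorization error from curl is expected without a token and still confirms the port is listening over TLS):

    kubectl -n kube-system get endpoints kube-controller-manager
    curl -k https://10.15.0.205:10257/metrics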
    # 4.2 KubeScheduler
    KubeScheduler has the same issue as KubeControllerManager, so apply the same steps.
    • Change the bind address
    vim /etc/kubernetes/manifests/kube-scheduler.yaml
    ......
    - command:
        - kube-scheduler
        - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
        - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
        - --bind-address=0.0.0.0
    ......
    
    • Create the Service
    apiVersion: v1
    kind: Service
    metadata:
      name: kube-scheduler
      namespace: kube-system
      labels:
        k8s-app: kube-scheduler
    spec:
      type: ClusterIP
      clusterIP: None
      ports:
      - name: https-metrics
        port: 10259
        targetPort: 10259
        protocol: TCP
    
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: kube-scheduler
      namespace: kube-system
      labels:
        k8s-app: kube-scheduler
    subsets:
    - addresses:
      - ip: 10.15.0.205
      ports:
        - name: https-metrics
          port: 10259
          protocol: TCP
    
    # 5. Alerting Configuration

    With all the monitoring components up, a large set of alert rules is already built in, covering essentially all Kubernetes resources and the node components. But when you need to monitor the business layer or other resources, you have to define custom monitoring items.

    # 5.1 Custom Application Alerts (two approaches)
    • Via the Prometheus Operator

    Tip

    Since ServiceMonitors select monitoring targets through a Service, any workload you want to monitor this way needs a corresponding Service object before a ServiceMonitor can be configured for it. As for writing the ServiceMonitor object itself, there are two ways:

    • Copy and adapt an existing manifest, e.g. prometheus-serviceMonitor.yaml
    • Run kubectl explain ServiceMonitor to inspect the supported fields and write it from scratch
    Steps to add a custom ServiceMonitor (a sketch of the Service/ServiceMonitor pair follows this list):
    1) Create a ServiceMonitor object so Prometheus picks up the new monitoring target
    2) Associate the ServiceMonitor with a Service object fronting the metrics endpoint
    3) Make sure the metrics data is actually reachable through that Service
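
    A minimal sketch of such a pair, for a hypothetical app demo-app in namespace default exposing /metrics on port 8080 (all names and the port are illustrative):

    apiVersion: v1
    kind: Service
    metadata:
      name: demo-app
      namespace: default
      labels:
        app: demo-app
    spec:
      selector:
        app: demo-app
      ports:
      - name: http-metrics
        port: 8080
        targetPort: 8080
    ---
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: demo-app
      namespace: monitoring
    spec:
      endpoints:
      - port: http-metrics   # must match the port *name* in the Service
        path: /metrics
        interval: 30s
      namespaceSelector:
        matchNames:
        - default
      selector:
        matchLabels:
          app: demo-app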
    • Via Prometheus's additionalScrapeConfigs field

    kubectl explain Prometheus.spec.additionalScrapeConfigs

    1) Prepare the scrape config file; this is the usual application scrape job definition (a sketch of its contents follows this block)
     cat bsd-tchua-jvm.yaml
    2) Create a secret from it
    kubectl create secret generic bsd-tchua-jvm --from-file=bsd-tchua-jvm.yaml -n monitoring
    3) Register it with Prometheus
    vim prometheus-prometheus.yaml # add
    additionalScrapeConfigs:
        name: bsd-tchua-jvm
        key: bsd-tchua-jvm.yaml

    4) To update the scrape config --> no Prometheus restart needed
    kubectl delete secrets -n monitoring bsd-tchua-jvm
    kubectl create secret generic bsd-tchua-jvm --from-file=bsd-tchua-jvm.yaml -n monitoring
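
    For reference, a sketch of what bsd-tchua-jvm.yaml could contain; additionalScrapeConfigs expects a bare list of Prometheus scrape_configs entries (the target address and metrics path here are hypothetical):

    - job_name: bsd-tchua-jvm
      metrics_path: /actuator/prometheus   # typical Spring Boot / Micrometer JVM endpoint
      static_configs:
      - targets: ['10.1.1.50:8080']
        labels:
          app_name: bsd-tchua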
    # 5.2 Pod Status Metrics

    The built-in rules live in the kubernetesControlPlane-prometheusRule.yaml configuration file.

    • The kube_pod_container_status_waiting_reason metric
    Reasons a pod sits in the waiting state:
    	1) ContainerCreating
    	2) CrashLoopBackOff: Kubernetes tried to start the Pod, but an error occurred in the process
    	3) CreateContainerConfigError
    	4) ErrImagePull
    	5) ImagePullBackOff
    	6) CreateContainerError
    	7) InvalidImageName

    Alert expression:
    - alert: AppStartupFailure
      annotations:
        description: 'namespace: {{ $labels.namespace }}, pod: {{ $labels.pod }}, application: {{ $labels.container }}, reason: {{ $labels.reason }}.'
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/kubecontainerwaiting
        summary: Pod container waiting longer than 1 hour
      expr: |
        sum by (namespace, pod, container,reason) (kube_pod_container_status_waiting_reason{job="kube-state-metrics"}) > 0
      for: 5m
      labels:
        severity: warning
    
    • KubePodNotReady
    Meaning: a pod has been in a non-ready state for too long
    Expression:
    sum by (namespace, pod) (
              max by(namespace, pod) (
                kube_pod_status_phase{job="kube-state-metrics", phase=~"Pending|Unknown"}
              ) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (
                1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"})
              )
            ) > 0
    
    
    • KubeDeploymentReplicasMismatch
    Meaning: a Deployment's replicas do not match the expected replica count; contact ops to handle it
    • Pod memory usage
    - alert: PodMemoryUsageHigh
      expr: (sum(container_memory_working_set_bytes{container!="POD",name!=""}) BY (instance, namespace,pod) / sum(container_spec_memory_limit_bytes > 0) BY (instance, namespace,pod) * 100) > 95
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Container Memory usage (instance {{ $labels.instance }})
        description: '{{ $labels.namespace }}/{{ $labels.pod }} Pod Memory usage is above 95%, current value: {{ $value | printf "%.2f" }}'
    
    # 5.3 Modifying the Alertmanager Configuration

    Tip

    There are also two ways to modify the Alertmanager configuration: either edit the secret created from alertmanager-secret.yaml directly, or create an AlertmanagerConfig resource object.

    # Using the secret approach

    • Inspect the current configuration
    1. Look at the alertmanager-secret.yaml config file
    2. Read it back from the live secret
     1) kubectl get secrets alertmanager-main -n monitoring -o yaml # take the alertmanager.yaml field
     2) echo "<alertmanager.yaml base64 payload>" | base64 -d
    "global":
      "resolve_timeout": "5m"
    "inhibit_rules":
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = critical"
      "target_matchers":
      - "severity =~ warning|info"
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = warning"
      "target_matchers":
      - "severity = info"
    - "equal":
      - "namespace"
      "source_matchers":
      - "alertname = InfoInhibitor"
      "target_matchers":
      - "severity = info"
    "receivers":
    - "name": "Default"
    - "name": "Watchdog"
    - "name": "Critical"
    - "name": "null"
    "route":
      "group_by":
      - "namespace"
      "group_interval": "5m"
      "group_wait": "30s"
      "receiver": "Default"
      "repeat_interval": "12h"
      "routes":
      - "matchers":
        - "alertname = Watchdog"
        "receiver": "Watchdog"
      - "matchers":
        - "alertname = InfoInhibitor"
        "receiver": "null"
      - "matchers":
        - "severity = critical"
        "receiver": "Critical"
    
    • Create a new configuration
    # alertmanager.yaml 
    global:
      resolve_timeout: 1m
      
    route:
      receiver: 'default-receiver'
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 2m
      repeat_interval: 1h
      
      routes:
      - receiver: "web.hook.channel"
        match:
          severity: warning
      
    receivers:
    - name: 'default-receiver'
      webhook_configs:
      - url: 'http://dingtalk-hook-svc:5000'
        send_resolved: true
    - name: 'web.hook.channel'
      webhook_configs:
      - url: 'http://10.1.1.173:5000'
        send_resolved: true
    
    • Apply the configuration
    1) Delete the old secret
    kubectl delete secret alertmanager-main -n monitoring
    2) Create the new configuration
    kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
    
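    To confirm the new configuration landed, decode it straight back out of the secret:

    kubectl -n monitoring get secret alertmanager-main -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d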
    # Using an AlertmanagerConfig object

    AlertmanagerConfig is a regular Kubernetes resource object, so we can write a manifest for it just as for any other resource. The supported fields can be inspected with:
    kubectl explain AlertmanagerConfig
    • Create an alertmanager-config.yaml file
    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: alertmanager-config
      namespace: monitoring
      labels:
        alertmanagerConfig: example
    spec:
      route:
        groupBy: ['job']
        groupWait: 30s
        groupInterval: 5m
        repeatInterval: 12h
        receiver: 'webhook'
      receivers:
      - name: 'webhook'
        webhookConfigs:
        - url: 'http://10.1.1.173:18088/v1/alert'
    
    • Hook it into the Alertmanager's existing config
    # In alertmanager-alertmanager.yaml, add the following under Alertmanager.spec; the new configuration will then be appended to the existing config
    # ## Note: the labels in matchLabels must match those defined on the AlertmanagerConfig object, otherwise it will not be associated
    alertmanagerConfigSelector:
        matchLabels:
          alertmanagerConfig: example
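    If the labels match, the Operator merges the AlertmanagerConfig into the running configuration. One way to verify is to inspect the generated secret: the Operator renders the final config into a secret named alertmanager-<name>-generated (some operator versions store the payload gzipped):

    kubectl -n monitoring get secret alertmanager-main-generated -o yaml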
    # 5.4 Custom Alerting Rules
    • Create a PrometheusRule manifest
    cat jvm/bsd-channel-prometheusRules.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:   # the labels must match the ruleSelector in the Prometheus manifest
        prometheus: k8s    
        role: alert-rules
      name: bsd-channel-jvm
      namespace: monitoring
    spec:
      groups:
      - name: bsd-channel-jvm-rules
        rules:
        - alert: JvmHeapUsageHigh
          expr: sum(jvm_memory_used_bytes{area="heap"}) by(instance,app_name) / sum(jvm_memory_max_bytes{area="heap"}) by(instance,app_name) * 100 > 95
          for: 2m
          labels:
            severity: warning
          annotations:
            description: 'namespace: {{ $labels.namespace }}, application: {{ $labels.app_name }} heap memory usage is above 95%, current value: {{ $value | printf "%.2f" }}'
    
    • Linking rules to Prometheus
    The labels defined under metadata in the manifest above
    labels:
      prometheus: k8s
      role: alert-rules
    must correspond to the ruleSelector defined on the Prometheus object
    ruleSelector:
        matchLabels:
          prometheus: k8s
          role: alert-rules
    
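    Once applied, the rule should show up both as a PrometheusRule object and, shortly after, under Status > Rules in the Prometheus UI:

    kubectl -n monitoring get prometheusrules bsd-channel-jvm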
    # 6. Component Deep Dive
    # 6.1 ServiceMonitor
    • Introduction
    A ServiceMonitor created through the Operator is itself a Kubernetes resource; a single ServiceMonitor can match a whole class of Services through its labelSelector.
    • Resource definition

    Using kubeStateMetrics-serviceMonitor as an example

    cat kubeStateMetrics-serviceMonitor.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/part-of: kube-prometheus
        app.kubernetes.io/version: 2.4.2
      name: kube-state-metrics
      namespace: monitoring
    spec:
      endpoints:
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        honorLabels: true
        interval: 30s   	# scrape every 30s
        port: https-main	# port name in the kube-state-metrics Service
        relabelings:
        - action: labeldrop
          regex: (pod|service|endpoint|namespace)
        scheme: https
        scrapeTimeout: 30s
        tlsConfig:
          insecureSkipVerify: true
      - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
        interval: 30s
        port: https-self
        scheme: https
        tlsConfig:
          insecureSkipVerify: true
      jobLabel: app.kubernetes.io/name
      selector: # matches the target Service's labels; with matchLabels, a Service is selected only when all listed labels match, and any matchExpressions must likewise all be satisfied
        matchLabels:
          app.kubernetes.io/component: exporter
          app.kubernetes.io/name: kube-state-metrics
          app.kubernetes.io/part-of: kube-prometheus
    

    Note: for more configuration fields, see kubectl explain ServiceMonitor.spec.

    # 7. Auto-Discovery Explained
    # 7.1 node

    The node role discovers one target per cluster node, with the address defaulting to the Kubelet's HTTP port. The target address defaults to the first existing address of the node in the address-type order NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, NodeHostName; additionally, the node's instance label is set to the node name as passed from the API server.

    Available meta labels:

    __meta_kubernetes_node_name: The name of the node object.
    __meta_kubernetes_node_label_<labelname>: Each label from the node object.
    __meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
    __meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
    __meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
    __meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.
    
    # 7.2 service

    The service role discovers a target for each service port of every Service. This is generally useful for blackbox monitoring of a service. The address is set to the Kubernetes DNS name of the Service and the respective service port.

    Available meta labels:

    __meta_kubernetes_namespace: The namespace of the service object.
    __meta_kubernetes_service_annotation_<annotationname>: Each annotation from the service object.
    __meta_kubernetes_service_annotationpresent_<annotationname>: "true" for each annotation of the service object.
    __meta_kubernetes_service_cluster_ip: The cluster IP address of the service. (Does not apply to services of type ExternalName)
    __meta_kubernetes_service_external_name: The DNS name of the service. (Applies to services of type ExternalName)
    __meta_kubernetes_service_label_<labelname>: Each label from the service object.
    __meta_kubernetes_service_labelpresent_<labelname>: true for each label of the service object.
    __meta_kubernetes_service_name: The name of the service object.
    __meta_kubernetes_service_port_name: Name of the service port for the target.
    __meta_kubernetes_service_port_protocol: Protocol of the service port for the target.
    
    # 7.3 pod

    The pod role discovers all pods and exposes their containers as targets. For each declared container port, a single target is generated. If a container has no specified port, a port-free target per container is created, and a port can be attached manually via relabeling.

    Available meta labels:

    __meta_kubernetes_namespace: The namespace of the pod object.
    __meta_kubernetes_pod_name: The name of the pod object.
    __meta_kubernetes_pod_ip: The pod IP of the pod object.
    __meta_kubernetes_pod_label_<labelname>: Each label from the pod object.
    __meta_kubernetes_pod_labelpresent_<labelname>: true for each label from the pod object.
    __meta_kubernetes_pod_annotation_<annotationname>: Each annotation from the pod object.
    __meta_kubernetes_pod_annotationpresent_<annotationname>: true for each annotation from the pod object.
    __meta_kubernetes_pod_container_init: true if the container is an InitContainer
    __meta_kubernetes_pod_container_name: Name of the container the target address points to.
    __meta_kubernetes_pod_container_port_name: Name of the container port.
    __meta_kubernetes_pod_container_port_number: Number of the container port.
    __meta_kubernetes_pod_container_port_protocol: Protocol of the container port.
    __meta_kubernetes_pod_ready: Set to true or false for the pod's ready state.
    __meta_kubernetes_pod_phase: Set to Pending, Running, Succeeded, Failed or Unknown in the lifecycle.
    __meta_kubernetes_pod_node_name: The name of the node the pod is scheduled onto.
    __meta_kubernetes_pod_host_ip: The current host IP of the pod object.
    __meta_kubernetes_pod_uid: The UID of the pod object.
    __meta_kubernetes_pod_controller_kind: Object kind of the pod controller.
    __meta_kubernetes_pod_controller_name: Name of the pod controller.
    
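    A sketch of how these meta labels are typically consumed in a raw scrape config, using the common (but purely conventional) prometheus.io/scrape: "true" pod annotation:

    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # keep only pods that opted in via the annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # surface the namespace and pod name as regular labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod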
    # 7.4 endpoints

    The endpoints role discovers targets from the listed endpoints of a service. For each endpoint address, one target is discovered per port. If the endpoint is backed by a pod, any additional container ports of that pod not bound to an endpoint port are discovered as targets as well.

    Available meta labels:

    __meta_kubernetes_namespace: The namespace of the endpoints object.
    __meta_kubernetes_endpoints_name: The names of the endpoints object.
    For all targets discovered directly from the endpoints list (those not additionally inferred from underlying pods), the following labels are attached:
    __meta_kubernetes_endpoint_hostname: Hostname of the endpoint.
    __meta_kubernetes_endpoint_node_name: Name of the node hosting the endpoint.
    __meta_kubernetes_endpoint_ready: Set to true or false for the endpoint's ready state.
    __meta_kubernetes_endpoint_port_name: Name of the endpoint port.
    __meta_kubernetes_endpoint_port_protocol: Protocol of the endpoint port.
    __meta_kubernetes_endpoint_address_target_kind: Kind of the endpoint address target.
    __meta_kubernetes_endpoint_address_target_name: Name of the endpoint address target.
    If the endpoints belong to a service, all labels of the role: service discovery are attached.
    For all targets backed by a pod, all labels of the role: pod discovery are attached.
    
    # 7.5 ingress

    The ingress role discovers a target for each ingress. This is generally useful for blackbox monitoring of an ingress. The address is set to the host specified in the ingress spec.

    Available meta labels:

    __meta_kubernetes_namespace: The namespace of the ingress object.
    __meta_kubernetes_ingress_name: The name of the ingress object.
    __meta_kubernetes_ingress_label_<labelname>: Each label from the ingress object.
    __meta_kubernetes_ingress_labelpresent_<labelname>: true for each label from the ingress object.
    __meta_kubernetes_ingress_annotation_<annotationname>: Each annotation from the ingress object.
    __meta_kubernetes_ingress_annotationpresent_<annotationname>: true for each annotation from the ingress object.
    __meta_kubernetes_ingress_scheme: Protocol scheme of ingress, https if TLS config is set. Defaults to http.
    __meta_kubernetes_ingress_path: Path from ingress spec. Defaults to /.
    
    # Appendix: Issue Log
    # 1. Deploying k8s 1.23.1 with kubeadm
    1. kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\"
    Fix:
     1) Update the Docker config (restart docker afterwards)
      cat > /etc/docker/daemon.json <<EOF
    {
      "exec-opts": ["native.cgroupdriver=systemd"],
      "log-driver":"json-file",
      "log-opts": {"max-size":"50m", "max-file":"2"}
    }
    EOF
    
    # 2. kube-prometheus deployment
    1. The CustomResourceDefinition "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
    Fix:
    Create the resources with create instead of apply (see the sketch below)
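
    The error comes from client-side apply storing the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation, which the very large prometheuses CRD exceeds. A sketch of the workaround (manifests/setup matches the kube-prometheus repository layout; kubectl apply --server-side is an alternative on recent kubectl versions):

    kubectl create -f manifests/setup/
    kubectl apply -f manifests/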