通过Prometheus查询计算Kubernetes集群中的容器CPU、内存使用率等指标

作者: 李佶澳   转载请保留:原文地址   发布时间:2018-09-14 13:36:26 +0800

说明

Kubernetes的kubelet组件内置了cadvisor,将Node上容器的指标以Prometheus支持的格式展示,可以通过这些指标计算得到更多有用的数据。

Kubelet的Cadvisor指标获取

直接访问Kubelet的10255端口,可以读取以Prometheus支持的格式呈现的指标:

$ curl http://192.168.88.10:10255/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="17.05.0-ce",kernelVersion="3.10.0-693.11.6.el7.x86_64",osVersion="CentOS Linux 7 (Core)"} 1
# HELP container_cpu_load_average_10s Value of container cpu load average over the last 10 seconds.
# TYPE container_cpu_load_average_10s gauge
container_cpu_load_average_10s{container_name="",id="/",image="",name="",namespace="",pod_name=""} 1
container_cpu_load_average_10s{container_name="POD",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1a666636_a687_11e8_9cc4_525400160f15.slice/docker-e433276784317535e206d33e8e703a7360de86402c8b3e0b335e0d8071edde72.scope",image="registry.aliyuncs.com/archon/pause-amd64:3.0",name="k8s_POD_prometheus-node-exporter-4mck8_default_1a666636-a687-11e8-9cc4-525400160f15_1",namespace="default",pod_name="prometheus-node-exporter-4mck8"} 0
...

在Prometheus的配置文件中,配置了相关的Target之后,这些指标就可以从Prometheus中查询到。见:新型监控告警工具prometheus(普罗米修斯)入门使用(附视频讲解)

容器CPU使用率的计算

man top手册中找到了CPU使用率的定义:

1. %CPU  --  CPU Usage
  The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time.

  In a true SMP environment, if a process is multi-threaded and top is not operating in Threads mode, amounts greater
  than 100% may be reported.  You toggle Threads mode with the `H' inter-active command.

  Also for multi-processor environments, if Irix mode is Off, top will operate in Solaris mode where a task's cpu usage
  will be divided by the total number of CPUs.  You toggle Irix/Solaris modes with the `I' interactive command.

即在过去的一段时间里进程占用的CPU时间与CPU总时间的比率,如果有多个CPU或者多核,需要将每个CPU的时间相加。

kubelet中的cadvisor采集的指标与含义,见:Monitoring cAdvisor with Prometheus

其中有一项是:

container_cpu_usage_seconds_total 	Counter 	Cumulative cpu time consumed 	seconds

container_cpu_usage_seconds_total是container累计使用的CPU时间,用它除以CPU的总时间,就得到了容器的CPU使用率:

先计算出容器的CPU占用时间,因为Node上的CPU有多个,需要将容器在每个CPU上占用的时间累加起来:

sum(
   delta(
       container_cpu_usage_seconds_total
           {container_name="webshell",pod_name="webshell-rc-8wjhv"}[1m]
   )
) 

然后计算CPU的总时间,这里的CPU数量是容器分配到CPU数量,公式如下:

sum(
    container_spec_cpu_quota
        {container_name="webshell",pod_name="webshell-rc-8wjhv"}
) / 1000 * 60

container_spec_cpu_quota是容器的CPU配额,它的值是:为容器指定的CPU个数*100000。

将上面两个公式的结果相除,就得到了容器的CPU使用率:

sum(
   delta(
       container_cpu_usage_seconds_total
           {container_name="webshell",pod_name="webshell-rc-8wjhv"}[1m]
   )
) 
/ 
( sum(
    container_spec_cpu_quota
        {container_name="webshell",pod_name="webshell-rc-8wjhv"}
  ) / 1000 * 60
)

写成一行就是:

sum(delta(container_cpu_usage_seconds_total{container_name="webshell",pod_name="webshell-rc-8wjhv"}[1m])) / (sum(container_spec_cpu_quota{container_name="webshell",pod_name="webshell-rc-8wjhv"}) /100000 * 60)

上面使用delta()计算增量,算的是1m中内的时间变化,用rate()直接计算比率更好:

sum(rate(container_cpu_usage_seconds_total{container_name="webshell",pod_name="webshell-rc-8wjhv"}[1m])) / (sum(container_spec_cpu_quota{container_name="webshell",pod_name="webshell-rc-8wjhv"}/100000))

如果要同时计算所有容器的CPU使用率:

(sum(rate(container_cpu_usage_seconds_total{container_name!="",pod_name!=""}[1m])) by(cluster,namespace,container_name,pod_name))/(sum(container_spec_cpu_quota{container_name!="",pod_name!=""}) by(cluster,namespace,container_name,pod_name) /100000)*100

容器内存使用率的计算

容器内存使用率的计算就简单多了,直接用CPU使用量除以CPU配额即可:

container_memory_rss{container_name="webshell",pod_name="webshell-rc-8wjhv"}
/
container_spec_memory_limit_bytes{container_name="webshell",pod_name="webshell-rc-8wjhv"}

计算Node CPU的空闲率

avg(rate(node_cpu_seconds_total{mode="idle"}[1m])) by (cluster,instance,nodename) < 0.1

参考

  1. 新型监控告警工具prometheus(普罗米修斯)入门使用(附视频讲解)
  2. Monitoring cAdvisor with Prometheus

限时活动,每邀请一人即返回25元!

Copyright @2011-2018 All rights reserved. 转载请添加原文连接,合作请加微信lijiaocn或者发送邮件: lijiaocn@foxmail.com,备注网站合作 友情链接: lijiaocn github.com