Kubernetes node cannot access a Service through the physical machine's NIC

Tags: kubernetes_problem 


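On one of the physical machines in a Kubernetes cluster, requests to a Service's clusterIP got no response. Investigation showed that the machine's default route is "dev eth0", and that the Service cannot be reached through the machine's underlay NIC.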

Environment

Service address (clusterIP):

10.254.51.153

Service endpoint:

 192.168.67.4:8082

eth0 address of the physical server:

10.39.0.17/24

flannel0 address:

192.168.82.0/24

The default route on the physical machine is "dev eth0".

Symptoms

Access through flannel0 works:

$curl  --interface flannel0 http://10.254.51.153/api/v1/model/namespaces/default/pods/busybox-1865195333-zkwtt/containers/busybox/metrics/cpu/usage
{
  "metrics": [
   {
    "timestamp": "2017-03-31T08:47:00Z",
    "value": 7846847
   },
   {
    "timestamp": "2017-03-31T08:48:00Z",
    "value": 7846847
   },
   {
    "timestamp": "2017-03-31T08:49:00Z",
    "value": 7846847
   }
  ],
  "latestTimestamp": "2017-03-31T08:49:00Z"
 }

Access through the default route (eth0) hangs until interrupted:

$curl  http://10.254.51.153/api/v1/model/namespaces/default/pods/busybox-1865195333-zkwtt/containers/busybox/metrics/cpu/usage
^C  

Packet-processing logs on the client

Flush iptables, then restart kube-proxy so that it re-creates its rules:

iptables -t raw -F
iptables -t mangle -F
iptables -t filter -F
iptables -t nat -F
systemctl restart kube-proxy

Add iptables LOG rules:

iptables -t raw -I OUTPUT -d 10.254.51.153 -j LOG --log-level 7 --log-prefix "raw out: "
iptables -t mangle -I OUTPUT -d 10.254.51.153 -j LOG --log-level 7 --log-prefix "mangle out: "
iptables -t nat -I OUTPUT -d 10.254.51.153 -j LOG --log-level 7 --log-prefix "nat out: "
iptables -t filter -I OUTPUT -d 10.254.51.153 -j LOG --log-level 7 --log-prefix "filter out: "
iptables -t mangle -I POSTROUTING -d 10.254.51.153 -j LOG --log-level 7 --log-prefix "mangle post: "
iptables -t nat -I POSTROUTING -d 10.254.51.153 -j LOG --log-level 7 --log-prefix "nat post: "
iptables -t nat -A POSTROUTING -d 10.254.51.153 -j LOG --log-level 7 --log-prefix "nat post: "

iptables -t raw -I OUTPUT -d 192.168.67.4 -j LOG --log-level 7 --log-prefix "raw out: "
iptables -t mangle -I OUTPUT -d 192.168.67.4 -j LOG --log-level 7 --log-prefix "mangle out: "
iptables -t nat -I OUTPUT -d 192.168.67.4 -j LOG --log-level 7 --log-prefix "nat out: "
iptables -t filter -I OUTPUT -d 192.168.67.4 -j LOG --log-level 7 --log-prefix "filter out: "
iptables -t mangle -I POSTROUTING -d 192.168.67.4 -j LOG --log-level 7 --log-prefix "mangle post: "
iptables -t nat -I POSTROUTING -d 192.168.67.4 -j LOG --log-level 7 --log-prefix "nat post: "
iptables -t nat -A POSTROUTING -d 192.168.67.4 -j LOG --log-level 7 --log-prefix "nat post: "
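
The fourteen LOG rules above repeat one pattern per destination. A minimal shell sketch, assuming a POSIX shell that supports local (bash, dash): the hypothetical helper gen_log_rules only prints the commands, so they can be reviewed before running, and with -D it prints the matching deletes for cleaning up after the test.

```shell
# Sketch (hypothetical helper): print the iptables LOG rules used above for one
# destination IP. Nothing is executed; review the output, then pipe it to sh.
#   $1: -I to insert the rules, -D to print the matching deletes
#   $2: destination IP to trace
# Usage (as root):
#   for ip in 10.254.51.153 192.168.67.4; do gen_log_rules -I "$ip"; done | sh
#   for ip in 10.254.51.153 192.168.67.4; do gen_log_rules -D "$ip"; done | sh
gen_log_rules() {
  local spec table rest chain prefix tail_action
  case $1 in
    -I) tail_action=-A ;;  # the extra copy is appended to the end of nat POSTROUTING
    -D) tail_action=-D ;;  # a second identical -D removes that appended copy
    *) echo "usage: gen_log_rules -I|-D IP" >&2; return 2 ;;
  esac
  # table:chain:prefix, in the order a locally generated packet traverses them
  for spec in "raw:OUTPUT:raw out" "mangle:OUTPUT:mangle out" \
              "nat:OUTPUT:nat out" "filter:OUTPUT:filter out" \
              "mangle:POSTROUTING:mangle post" "nat:POSTROUTING:nat post"; do
    table=${spec%%:*}; rest=${spec#*:}
    chain=${rest%%:*}; prefix=${rest#*:}
    printf 'iptables -t %s %s %s -d %s -j LOG --log-level 7 --log-prefix "%s: "\n' \
      "$table" "$1" "$chain" "$2" "$prefix"
  done
  printf 'iptables -t nat %s POSTROUTING -d %s -j LOG --log-level 7 --log-prefix "nat post: "\n' \
    "$tail_action" "$2"
}
```

Since iptables -D deletes the first rule matching the given spec, issuing it twice for nat POSTROUTING removes both the inserted and the appended copy.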

When accessing Service 10.254.51.153 through flannel0:

Mar 31 16:57:22 slave1 kernel: raw out: IN= OUT=flannel0 SRC=192.168.82.0 DST=10.254.51.153 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23666 DF PROTO=TCP SPT=58156 DPT=80 WINDOW=28640 RES=0x00 SYN URGP=0
Mar 31 16:57:22 slave1 kernel: mangle out: IN= OUT=flannel0 SRC=192.168.82.0 DST=10.254.51.153 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23666 DF PROTO=TCP SPT=58156 DPT=80 WINDOW=28640 RES=0x00 SYN URGP=0
Mar 31 16:57:22 slave1 kernel: nat out: IN= OUT=flannel0 SRC=192.168.82.0 DST=10.254.51.153 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23666 DF PROTO=TCP SPT=58156 DPT=80 WINDOW=28640 RES=0x00 SYN URGP=0
Mar 31 16:57:22 slave1 kernel: filter out: IN= OUT=flannel0 SRC=192.168.82.0 DST=192.168.67.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23666 DF PROTO=TCP SPT=58156 DPT=8082 WINDOW=28640 RES=0x00 SYN URGP=0
Mar 31 16:57:22 slave1 kernel: mangle post: IN= OUT=flannel0 SRC=192.168.82.0 DST=192.168.67.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23666 DF PROTO=TCP SPT=58156 DPT=8082 WINDOW=28640 RES=0x00 SYN URGP=0
Mar 31 16:57:22 slave1 kernel: nat post: IN= OUT=flannel0 SRC=192.168.82.0 DST=192.168.67.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23666 DF PROTO=TCP SPT=58156 DPT=8082 WINDOW=28640 RES=0x00 SYN URGP=0

When accessing Service 10.254.51.153 through eth0:

Mar 31 16:38:42 slave1 kernel: raw out: IN= OUT=eth0 SRC=10.39.0.17 DST=10.254.51.153 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=46455 DF PROTO=TCP SPT=34397 DPT=80 WINDOW=29200 RES=0x00 SYN URGP=0
Mar 31 16:38:42 slave1 kernel: mangle out: IN= OUT=eth0 SRC=10.39.0.17 DST=10.254.51.153 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=46455 DF PROTO=TCP SPT=34397 DPT=80 WINDOW=29200 RES=0x00 SYN URGP=0
Mar 31 16:38:42 slave1 kernel: nat out: IN= OUT=eth0 SRC=10.39.0.17 DST=10.254.51.153 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=46455 DF PROTO=TCP SPT=34397 DPT=80 WINDOW=29200 RES=0x00 SYN URGP=0
Mar 31 16:38:42 slave1 kernel: filter out: IN= OUT=eth0 SRC=10.39.0.17 DST=192.168.67.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=46455 DF PROTO=TCP SPT=34397 DPT=8082 WINDOW=29200 RES=0x00 SYN URGP=0
Mar 31 16:38:42 slave1 kernel: mangle post: IN= OUT=flannel0 SRC=10.39.0.17 DST=192.168.67.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=46455 DF PROTO=TCP SPT=34397 DPT=8082 WINDOW=29200 RES=0x00 SYN URGP=0
Mar 31 16:38:42 slave1 kernel: nat post: IN= OUT=flannel0 SRC=10.39.0.17 DST=192.168.67.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=46455 DF PROTO=TCP SPT=34397 DPT=8082 WINDOW=29200 RES=0x00 SYN URGP=0

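The biggest difference between the two is the SRC address: packets leaving through flannel0 have source address 192.168.82.0, while packets leaving through eth0 have source address 10.39.0.17.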
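
To compare the two traces hook by hook, each LOG line can be reduced to its prefix, OUT, SRC and DST. A shell sketch under the same assumptions; log_field and summarize_log are hypothetical helper names, and the parsing assumes the syslog format shown above (LOG prefix right after "kernel: ", as printed by journalctl -k).

```shell
# Sketch (hypothetical helpers): reduce iptables LOG lines to prefix, OUT, SRC, DST.
# Assumes the syslog format above: "<date> <host> kernel: <prefix>: IN=... OUT=...".
# Usage: journalctl -k | grep 'SPT=34397' | summarize_log

# Print the value of field KEY ($1) in log line $2, e.g. OUT=eth0 -> eth0.
log_field() {
  local v
  case $2 in
    *" $1="*) v=${2#*" $1="}; printf '%s' "${v%%" "*}" ;;
  esac
}

summarize_log() {
  local line prefix out prev_out note
  prev_out=
  while IFS= read -r line; do
    prefix=${line#*"kernel: "}    # drop "Mar 31 16:38:42 slave1 kernel: "
    prefix=${prefix%%": IN="*}    # keep only the LOG prefix, e.g. "nat out"
    out=$(log_field OUT "$line")
    note=
    # mark the hook where the output interface differs from the previous line
    if [ -n "$prev_out" ] && [ "$out" != "$prev_out" ]; then
      note=" <- OUT changed"
    fi
    printf '%s | OUT=%s SRC=%s DST=%s%s\n' "$prefix" "$out" \
      "$(log_field SRC "$line")" "$(log_field DST "$line")" "$note"
    prev_out=$out
  done
}
```

Fed the eth0 trace above, it marks the mangle post line: that is where OUT switches from eth0 to flannel0, while SRC stays 10.39.0.17.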

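Preliminary conclusion: the source address is chosen by the initial route lookup for 10.254.51.153, which matches the default route via eth0, so the packet gets eth0's IP as its source. After DNAT its destination is an IP reached through flannel0, so the packet is rerouted out through flannel0 (in the log, OUT switches from eth0 to flannel0 between filter OUTPUT and mangle POSTROUTING). Its source IP, however, does not belong to the flannel0 network, so no reply ever comes back.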
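
The hypothesis comes down to subnet membership: 192.168.82.0 lies inside the flannel0 subnet 192.168.82.0/24, while 10.39.0.17 does not. A minimal check in plain shell arithmetic; ip_to_int and ip_in_cidr are hypothetical helpers, and only dotted-quad IPv4 input without leading zeros is assumed.

```shell
# Sketch (hypothetical helpers): IPv4 subnet membership with shell arithmetic.
# Assumes dotted-quad input without leading zeros ("08" would be read as octal).

# Convert a.b.c.d to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1          # split on "." into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if IP $1 is inside CIDR block $2 (e.g. 192.168.82.0/24).
ip_in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  if [ "$bits" -gt 0 ]; then
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  else
    mask=0             # /0 matches every address
  fi
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Example:
#   ip_in_cidr 192.168.82.0 192.168.82.0/24 && echo inside   # source via flannel0
#   ip_in_cidr 10.39.0.17 192.168.82.0/24 || echo outside    # source via eth0
```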

A capture on flannel0 shows the SYN being sent out, but no reply ever comes back.

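Note: the link-layer type shown in the capture is Raw IP, i.e., packets without a link-layer header. This is because flannel0 is a TUN device, which carries layer-3 packets with no layer-2 header.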

Comparing packets on the client and on the endpoint side

Accessing the Service through flannel0

Client:

10:03:44.685400 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [S], seq 2191192236, win 28640, options [mss 1432,sackOK,TS val 230893328 ecr 0,nop,wscale 7], length 0
10:03:44.686347 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [S.], seq 2617803445, ack 2191192237, win 28960, options [mss 1460,sackOK,TS val 1347268161 ecr 230893328,nop,wscale 7], length 0
10:03:44.686443 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [.], ack 1, win 224, options [nop,nop,TS val 230893329 ecr 1347268161], length 0
10:03:44.686630 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [P.], seq 1:176, ack 1, win 224, options [nop,nop,TS val 230893329 ecr 1347268161], length 175
10:03:44.698455 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [.], ack 176, win 235, options [nop,nop,TS val 1347268173 ecr 230893329], length 0
10:03:44.699126 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [P.], seq 1:396, ack 176, win 235, options [nop,nop,TS val 1347268174 ecr 230893329], length 395
10:03:44.699200 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [.], ack 396, win 233, options [nop,nop,TS val 230893342 ecr 1347268174], length 0
10:03:44.699489 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [F.], seq 176, ack 396, win 233, options [nop,nop,TS val 230893342 ecr 1347268174], length 0
10:03:44.701185 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [F.], seq 396, ack 177, win 235, options [nop,nop,TS val 1347268176 ecr 230893342], length 0
10:03:44.701248 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [.], ack 397, win 233, options [nop,nop,TS val 230893344 ecr 1347268176], length 0

Endpoint side:

10:03:44.636957 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [S], seq 2191192236, win 28640, options [mss 1432,sackOK,TS val 230893328 ecr 0,nop,wscale 7], length 0
10:03:44.637063 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [S.], seq 2617803445, ack 2191192237, win 28960, options [mss 1460,sackOK,TS val 1347268161 ecr 230893328,nop,wscale 7], length 0
10:03:44.638093 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [.], ack 1, win 224, options [nop,nop,TS val 230893329 ecr 1347268161], length 0
10:03:44.649241 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [P.], seq 1:176, ack 1, win 224, options [nop,nop,TS val 230893329 ecr 1347268161], length 175
10:03:44.649266 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [.], ack 176, win 235, options [nop,nop,TS val 1347268173 ecr 230893329], length 0
10:03:44.650030 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [P.], seq 1:396, ack 176, win 235, options [nop,nop,TS val 1347268174 ecr 230893329], length 395
10:03:44.651534 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [.], ack 396, win 233, options [nop,nop,TS val 230893342 ecr 1347268174], length 0
10:03:44.651596 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [F.], seq 176, ack 396, win 233, options [nop,nop,TS val 230893342 ecr 1347268174], length 0
10:03:44.651682 IP 192.168.67.4.us-cli > 192.168.82.0.37025: Flags [F.], seq 396, ack 177, win 235, options [nop,nop,TS val 1347268176 ecr 230893342], length 0
10:03:44.652512 IP 192.168.82.0.37025 > 192.168.67.4.us-cli: Flags [.], ack 397, win 233, options [nop,nop,TS val 230893344 ecr 1347268176], length 0

Accessing the Service through the default route (eth0)

Client:

 $tcpdump -i flannel0
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on flannel0, link-type RAW (Raw IP), capture size 65535 bytes
 10:00:18.136220 IP 10.39.0.17.55735 > 192.168.67.4.us-cli: Flags [S], seq 2791854768, win 29200, options [mss 1460,sackOK,TS val 230686779 ecr 0,nop,wscale 7], length 0
 10:00:19.136867 IP 10.39.0.17.55735 > 192.168.67.4.us-cli: Flags [S], seq 2791854768, win 29200, options [mss 1460,sackOK,TS val 230687780 ecr 0,nop,wscale 7], length 0
 10:00:21.140809 IP 10.39.0.17.55735 > 192.168.67.4.us-cli: Flags [S], seq 2791854768, win 29200, options [mss 1460,sackOK,TS val 230689784 ecr 0,nop,wscale 7], length 0

Endpoint side:

 $tcpdump -i flannel0
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on flannel0, link-type RAW (Raw IP), capture size 65535 bytes
 10:00:18.088526 IP 10.39.0.17.55735 > 192.168.67.4.us-cli: Flags [S], seq 2791854768, win 29200, options [mss 1460,sackOK,TS val 230686779 ecr 0,nop,wscale 7], length 0
 10:00:19.089163 IP 10.39.0.17.55735 > 192.168.67.4.us-cli: Flags [S], seq 2791854768, win 29200, options [mss 1460,sackOK,TS val 230687780 ecr 0,nop,wscale 7], length 0
 10:00:21.093064 IP 10.39.0.17.55735 > 192.168.67.4.us-cli: Flags [S], seq 2791854768, win 29200, options [mss 1460,sackOK,TS val 230689784 ecr 0,nop,wscale 7], length 0
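
Both sides of the eth0 capture show the same thing: the client sends the SYN three times (retransmitted after about 1 s and then about 2 s more) and no SYN-ACK ever appears. A shell sketch that classifies a capture by counting SYN and SYN-ACK lines; handshake_status is a hypothetical name, and it assumes tcpdump's default one-line output ("Flags [S]," for a SYN, "Flags [S.]," for a SYN-ACK).

```shell
# Sketch (hypothetical helper): classify a TCP handshake from tcpdump text on stdin.
# Usage: tcpdump -l -nn -c 3 -i flannel0 host 192.168.67.4 | handshake_status
handshake_status() {
  local line syn synack
  syn=0 synack=0
  while IFS= read -r line; do
    case $line in
      *"Flags [S],"*)  syn=$((syn + 1)) ;;        # SYN from the client
      *"Flags [S.],"*) synack=$((synack + 1)) ;;  # SYN-ACK from the server
    esac
  done
  if [ "$synack" -gt 0 ]; then
    echo "answered: $syn SYN, $synack SYN-ACK"
  else
    echo "unanswered: $syn SYN, no SYN-ACK"
  fi
}
```

On the flannel0 captures it reports an answered handshake; on the eth0 captures only retransmitted SYNs are counted.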

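As the captures show, when flannel0 receives a packet whose source IP is outside the network managed by flannel, the SYN is never answered.

In other words, when a packet arriving on flannel0 carries an underlay-network source IP, it is not processed.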


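Copyright @2011-2019 All rights reserved. When reposting, please include a link to the original article. For collaboration, add WeChat lijiaocn or send an email to [email protected], noting "website collaboration".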
