K8s Multi-Master Reboot: Procedure and Troubleshooting 📅 2026/6/30 4:35:53

Node Reboot Procedure

Each node goes through the following steps in order: check etcd → cordon → drain → reboot → uncordon. Only move on to the next node after every step is complete.

1. Check etcd status

etcd should be deployed with an odd number of members to maintain quorum. As long as a majority of members is alive, the cluster can read and write normally. The quorum size is ⌊n/2⌋ + 1, so a cluster of n members tolerates n - (⌊n/2⌋ + 1) = ⌊(n - 1)/2⌋ failed members (see the official fault-tolerance notes). With the current 3 members, at least 2 etcd members must stay alive.

Run the following checks on every etcd node:

```
## Verify data consistency / member health
# etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint status --write-out=table
## DB SIZE:    database size; the limit is set with --quota-backend-bytes at deploy time (default 2 GB)
## IS LEADER:  whether this member is the leader
## IS LEARNER: whether this member is a non-voting member (learner)
## RAFT TERM:  the leader's term; it must be identical on all members. A restart or network jitter can trigger an election, which increments it by 1
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT             | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://x.x.x.x:2379 | 683c58b549788bd9 | 3.5.15  | 30 MB   | true      | false      | 40        | 130863104  | 130863104          |        |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

## Verify connectivity / response latency
# etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health --write-out=table
## HEALTH: whether the member can serve reads/writes (it reads a key; no error means healthy)
## TOOK:   time taken ≈ network round trip + leader heartbeat confirmation
+----------------------+--------+-------------+-------+
| ENDPOINT             | HEALTH | TOOK        | ERROR |
+----------------------+--------+-------------+-------+
| https://x.x.x.x:2379 | true   | 60.842745ms |       |
+----------------------+--------+-------------+-------+

## List members
# etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list -w table
## STATUS:       member status
## PEER ADDRS:   address used for member-to-member traffic
## CLIENT ADDRS: address clients (the API server) connect to
## IS LEARNER:   whether this member is a non-voting member (learner)
+---------+---------+--------+----------------------+----------------------+------------+
| ID      | STATUS  | NAME   | PEER ADDRS           | CLIENT ADDRS         | IS LEARNER |
+---------+---------+--------+----------------------+----------------------+------------+
| xxxxxxx | started | demo-1 | https://x.x.x.x:2380 | https://x.x.x.x:2379 | false      |
| xxxxxxx | started | demo-2 | https://x.x.x.x:2380 | https://x.x.x.x:2379 | false      |
| xxxxxxx | started | demo-3 | https://x.x.x.x:2380 | https://x.x.x.x:2379 | false      |
+---------+---------+--------+----------------------+----------------------+------------+
```

2. Cordon the node

Mark the node to be rebooted with cordon so that no new Pods are scheduled onto it.

Note: work on only one node at a time. Finish the cordon → reboot → uncordon cycle for that node before moving on to the next one.

```
# kubectl cordon demo-1
```

3. Drain Pods

kube-apiserver, controller-manager, scheduler and etcd are static Pods managed directly by the kubelet, so drain does not evict them. As long as etcd on the other two nodes is alive, the control plane keeps working.

```
# kubectl drain demo-1 \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout=300s
```

After draining, confirm that scheduling is disabled on the node:

```
# kubectl get node
NAME     STATUS                     ROLES           AGE    VERSION
demo-1   Ready,SchedulingDisabled   control-plane   336d   v1.31.14
demo-2   Ready                      control-plane   336d   v1.31.14
demo-3   Ready                      control-plane   336d   v1.31.14
```

Common error: a drain timeout is usually caused by a PDB (PodDisruptionBudget) that does not allow the eviction, for example:

```
error when evicting pods/prometheus-k8s-0 -n monitoring: Cannot evict pod as it would violate the pod's disruption budget.
```

Inspect and handle the PDB that covers the Pods on this node. Besides temporarily relaxing the PDB, you can scale up the corresponding workload, or delete the Pod manually (which bypasses the PDB and defeats its purpose).

```
# kubectl get pdb -n monitoring prometheus-k8s
NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
prometheus-k8s   1               N/A               0                     337d

## Temporarily set minAvailable to 0; restore it afterwards
# kubectl patch pdb prometheus-k8s -n monitoring --type=json \
    -p='[{"op":"replace","path":"/spec/minAvailable","value":0}]'
```

4. Reboot the node

```
# ssh demo-1
# reboot
```

5. Uncordon and verify

After the reboot, confirm the node is back to Ready (if a newer kernel was installed, KERNEL-VERSION also shows the new version), then lift the scheduling restriction with uncordon:

```
# kubectl get node -o wide
NAME     STATUS                     ROLES           AGE    VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
demo-1   Ready,SchedulingDisabled   control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-124-generic   containerd://1.7.27
demo-2   Ready                      control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-64-generic    containerd://1.7.27
demo-3   Ready                      control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-106-generic   containerd://1.7.27

# kubectl uncordon demo-1
node/demo-1 uncordoned

# kubectl get node -o wide
NAME     STATUS   ROLES           AGE    VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
demo-1   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-124-generic   containerd://1.7.27
demo-2   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-64-generic    containerd://1.7.27
demo-3   Ready    control-plane   336d   v1.31.14   x.x.x.x       <none>        Ubuntu 24.04.4 LTS   6.8.0-106-generic   containerd://1.7.27
```

6. Repeat

Once the current node is done, go back to step 1 and repeat the whole procedure for the next node.

Troubleshooting

1. Pods show <invalid> after a reboot

1.1. Symptoms

After a node is rebooted, the RESTARTS column of every Pod on that node shows "(<invalid> ago)", and the static Pods show 0/1 in the READY column. In the Kubernetes source, a relative time whose underlying timestamp lies 2 seconds or more in the future is rendered as <invalid>. Here that timestamp is the last container restart time shown in RESTARTS; the AGE column still looks normal.

```
# kubectl get pods -A -o wide | grep demo-3
NAMESPACE     NAME                             READY   STATUS    RESTARTS            AGE    IP        NODE
kube-system   coredns-dbbb9ff68-z8wjd          1/1     Running   0 (<invalid> ago)   131m   x.x.x.x   demo-3
kube-system   etcd-demo-3                      0/1     Running   0 (<invalid> ago)   10m    x.x.x.x   demo-3
kube-system   kube-apiserver-demo-3            0/1     Running   0 (<invalid> ago)   10m    x.x.x.x   demo-3
kube-system   kube-controller-manager-demo-3   0/1     Running   0 (<invalid> ago)   10m    x.x.x.x   demo-3
kube-system   kube-proxy-f2fj8                 1/1     Running   1 (<invalid> ago)   75d    x.x.x.x   demo-3
kube-system   kube-scheduler-demo-3            0/1     Running   0 (<invalid> ago)   10m    x.x.x.x   demo-3
kube-system   node-local-dns-xkfxc             1/1     Running   3 (<invalid> ago)   75d    x.x.x.x   demo-3
```

1.2. Investigation

1.2.1. Compare container time with node time

Take the etcd Pod as an example. Its startedAt (UTC) is about 7 hours 40 minutes later than the node's current UTC time, which puts it in the "future". The trailing Z in the timestamp marks it as UTC.

```
## Node's current time (CST / UTC)
# date
Wed Jun 17 09:06:16 PM CST 2026
# date -u
Wed Jun 17 01:06:16 PM UTC 2026

## Pod container status (times are in UTC)
# kubectl get pod -n kube-system etcd-demo-1 -o jsonpath='{.status.containerStatuses}' | jq .
[
  {
    ...
    "lastState": {
      "terminated": {
        "exitCode": 255,
        "finishedAt": "2026-06-17T20:46:05Z",
        "reason": "Unknown",
        "startedAt": "2026-06-17T09:56:07Z"
      }
    },
    "name": "etcd",
    "ready": false,
    "restartCount": 4,
    "state": {
      "running": {
        "startedAt": "2026-06-17T20:46:16Z"
      }
    }
  }
]

## Container creation/start times recorded by containerd
## https://github.com/kubernetes-sigs/cri-tools/blob/v1.26.1/cmd/crictl/container.go#L862
# crictl inspect 2ea3cdadfec6f | grep -Ei 'createdAt|startedAt|finishedAt'
    "createdAt": "2026-06-18T04:46:16.17870878+08:00",
    "startedAt": "2026-06-18T04:46:16.641950022+08:00",
    "finishedAt": "0001-01-01T00:00:00Z",

## Convert to UTC for a like-for-like comparison
## containerd:   2026-06-17 20:46:16 UTC
## node current: 2026-06-17 13:06:16 UTC  ← container time is in the "future", 7h40m ahead
```

1.2.2. Check the Pod's underlying containers

Rule out containerd / etcd failures. The underlying container is actually in the Running state:

```
# crictl ps -a | grep etcd
CONTAINER       IMAGE           CREATED                  STATE     NAME   ATTEMPT   POD ID          POD
2ea3cdadfec6f   2e96e5913fc06   Less than a second ago   Running   etcd   4         7451b51061d22   etcd-demo-1
fab531bce16e7   2e96e5913fc06   3 hours ago              Exited    etcd   3         5f897d72fd206   etcd-demo-1

## The process really is running
# ps aux | grep -v grep | grep etcd
root  2089 ... etcd --advertise-client-urls=https://x.x.x.x:2379 ...
```
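The UTC conversion done by hand in 1.2.1 can also be scripted on the node. The following is a minimal sketch, not part of the original troubleshooting. skew_seconds and fmt_hm are hypothetical helper names, and the sketch assumes GNU date, whose -d option parses RFC 3339 timestamps ending in Z or an offset such as +08:00.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original article) for the time comparison in 1.2.1.
# Assumes GNU date: `date -d` parses RFC 3339 timestamps such as
# "2026-06-18T04:46:16.641950022+08:00" or "2026-06-17T13:06:16Z".

# skew_seconds TS [REF]
# Prints TS - REF in whole seconds (REF defaults to "now").
# A positive result means TS lies in the "future" relative to REF.
skew_seconds() {
  local ts_epoch ref_epoch
  ts_epoch=$(date -u -d "$1" +%s) || return 1
  ref_epoch=$(date -u -d "${2:-now}" +%s) || return 1
  echo $(( ts_epoch - ref_epoch ))
}

# fmt_hm SECONDS
# Formats a (possibly negative) number of seconds as hours and minutes, e.g. "7h40m".
fmt_hm() {
  local s="$1" sign=""
  if [ "$s" -lt 0 ]; then
    sign="-"
    s=$(( -s ))
  fi
  printf '%s%dh%dm\n' "$sign" $(( s / 3600 )) $(( s % 3600 / 60 ))
}

# Example using the values from 1.2.1: containerd's startedAt vs. the node's UTC time.
skew=$(skew_seconds "2026-06-18T04:46:16.641950022+08:00" "2026-06-17T13:06:16Z")
fmt_hm "$skew"   # prints 7h40m
```

On a live node, pass the startedAt value printed by crictl inspect as the only argument. A result of 2 seconds or more means the container timestamps are ahead of the node clock, which is the condition under which kubectl renders <invalid>.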
Then check the etcd member status:

```
## etcd member status
# etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint status -w table
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT             | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://x.x.x.x:2379 | 683c58b549788bd9 | 3.5.15  | 30 MB   | false     | false      | 59        | 131325454  | 131325454          |        |
+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```

1.2.3. Find the root cause of the time problem

Every node has NTP configured, and their clocks agree when compared. My reasoning went like this. containerd has been around for many years and is unlikely to have a bug like this. More likely, the time it saw when creating the container really was in the "future". The clock looks correct now, so it was probably set back by the time sync configured on the node. That is why I used dmesg to look at the kernel log since boot:

```
# dmesg -T | grep -iEw 'rtc|time|clock'
[Wed Jun 17 20:54:08 2026] vmware: Host bus clock speed read from hypervisor : 66000000 Hz
[Wed Jun 17 20:54:08 2026] vmware: using clock offset of 5629468472 ns
[Wed Jun 17 20:54:08 2026] PM: RTC time: 12:54:08, date: 2026-06-17
[Wed Jun 17 20:54:09 2026] PTP clock support registered
## The kernel sets the system clock (UTC)
[Wed Jun 17 20:54:10 2026] rtc_cmos 00:01: setting system clock to 2026-06-17T12:54:10 UTC (1781700850)
[Wed Jun 17 20:54:11 2026] Loaded X.509 cert 'Build time autogenerated kernel key: ...'
## About 40 seconds after the kernel set the time, systemd-journald logged a backward time jump
[Wed Jun 17 20:54:50 2026] systemd-journald[525]: Time jumped backwards, rotating.
```

Next I exported the systemd-journald logs with journalctl; the output is long, so it is trimmed below. The timeline is 20:54:xx → 04:45:xx → 20:54:xx. containerd started only after the host clock had been pushed into the future (04:45), so the containers it created also carry "future" timestamps.

Note: this log alone does not prove that VMware changed the time. My reasoning is as follows. Before the clock jumped to 04:45:xx, only system services such as ssh and cron had started, and they have no ability to change the time. VGAuthService is the service that started closest to the jump and is capable of changing the time, so it became the prime suspect. The permanent fix later confirmed this.

```
## The output is large: write it to a file, then extract the relevant fragments
# journalctl -b --no-pager > journalctl.txt
## -b:         only logs from the current boot
## --no-pager: print everything without paging

## Within about 50 seconds the time visibly jumps: 20:54:xx → 04:45:xx → 20:54:xx
Jun 17 20:54:19 demo-1 kernel: DMI: VMware, Inc. VMware Virtual Platform/...
Jun 17 20:54:19 demo-1 kernel: vmware: hypercall mode: 0x02
Jun 17 20:54:19 demo-1 kernel: Hypervisor detected: VMware
Jun 17 20:54:19 demo-1 kernel: vmware: TSC freq read from hypervisor : 2600.000 MHz
Jun 17 20:54:19 demo-1 kernel: vmware: Host bus clock speed read from hypervisor : 66000000 Hz
Jun 17 20:54:19 demo-1 kernel: vmware: using clock offset of 5629468472 ns
Jun 17 20:54:19 demo-1 kernel: Booting paravirtualized kernel on VMware hypervisor
Jun 17 20:54:26 demo-1 VGAuthService[799]: Using /var/lib/vmware/VGAuth/aliasStore for alias store root directory
Jun 17 20:54:26 demo-1 VGAuthService[799]: LoadCatalogAndSchema: Using /etc/vmware-tools/vgauth/schemas for SAML schemas
Jun 17 20:54:26 demo-1 VGAuthService[799]: LoadPrefs: Allowing 300 of clock skew for SAML date validation
Jun 17 20:54:26 demo-1 VGAuthService[799]: SAML_Init: Using xmlsec1 1.2.39 for XML signature support
Jun 17 20:54:26 demo-1 VGAuthService[799]: ServiceNetworkCreateSocketDir: Created socket directory /var/run/vmware
Jun 17 20:54:26 demo-1 VGAuthService[799]: BEGIN SERVICE
Jun 17 20:54:26 demo-1 systemd[1]: Starting etcd.service - Etcd Service...
## containerd starts only after the host clock has been pushed into the "future"
Jun 18 04:45:57 demo-1 systemd-resolved[796]: Clock change detected. Flushing caches.
Jun 18 04:45:57 demo-1 systemd[1]: Started kubelet.service - kubelet: The Kubernetes Node Agent.
Jun 18 04:45:58 demo-1 systemd[1]: Starting containerd.service - containerd container runtime...
Jun 18 04:46:16 demo-1 containerd[919]: time="..." msg="CreateContainer ... for ContainerMetadata{Name:etcd,Attempt:4,} returns container id \"2ea3cdadfec6f...\""
Jun 18 04:46:16 demo-1 containerd[919]: time="..." msg="StartContainer for \"2ea3cdadfec6f...\""
Jun 18 04:46:16 demo-1 systemd[1]: Started cri-containerd-2ea3cdadfec6f....scope - libcontainer container 2ea3cdadfec6f....
Jun 18 04:46:16 demo-1 containerd[919]: time="..." msg="StartContainer for \"2ea3cdadfec6f...\" returns successfully"
## The clock is changed again
Jun 17 20:54:50 demo-1 systemd-resolved[796]: Clock change detected. Flushing caches.
Jun 17 20:54:50 demo-1 systemd-journald[525]: Time jumped backwards, rotating.
Jun 17 20:54:50 demo-1 systemd-timesyncd[797]: Contacted time server 91.189.91.157:123 (ntp.ubuntu.com).
Jun 17 20:54:50 demo-1 systemd-timesyncd[797]: Initial clock synchronization to Wed 2026-06-17 20:54:50.365713 CST.
Jun 17 20:54:50 demo-1 systemd[1]: etcd.service: Scheduled restart job, restart counter is at 2.
Jun 17 20:54:50 demo-1 systemd[1]: Starting etcd.service - Etcd Service...
```

1.3. Solution

Based on the logs above, I suspect the root cause is a time jump caused by VMware. There are two ways to deal with it: change the VM's own startup order (a workaround), or change VMware's time sync configuration (the permanent fix).

1.3.1. Workaround: change the startup order

This option suits environments without access to the VMware host. Create a service that waits for clock synchronization and make kubelet depend on it, so containers are started only after the host clock is back to normal. containerd needs the same dependency, otherwise problem 2 below occurs. The exact contents of this custom service don't matter; replacing it with sleep 60 would also do the job.

```
## 1. Use a drop-in instead of editing kubelet.service directly.
## /usr/lib/systemd/system/kubelet.service is owned by the rpm/deb package and is overwritten on upgrade;
## a systemd drop-in under /etc/systemd/system/kubelet.service.d/ belongs to no package (similar to Helm custom values).
# systemctl status kubelet
## ● kubelet.service - kubelet: The Kubernetes Node Agent
##      Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: enabled)
##     Drop-In: /usr/lib/systemd/system/kubelet.service.d/
##              └─10-kubeadm.conf
# mkdir -p /etc/systemd/system/kubelet.service.d/ /etc/systemd/system/containerd.service.d/

## 2. Create the clock-sync wait service (polls for up to 60 s, then exits 0 either way so boot is never blocked)
# cat > /etc/systemd/system/wait-for-clock-sync.service <<'EOF'
[Unit]
Description=Wait for system clock to be synchronized
After=systemd-timesyncd.service network-online.target
Before=kubelet.service containerd.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c 'for i in $(seq 1 60); do if timedatectl show -p NTPSynchronized --value 2>/dev/null | grep -q yes; then exit 0; fi; sleep 1; done; exit 0'
TimeoutStartSec=70

[Install]
WantedBy=multi-user.target
EOF

## 3. Make kubelet (and containerd) depend on it
# cat > /etc/systemd/system/kubelet.service.d/20-wait-time-sync.conf <<'EOF'
[Unit]
After=wait-for-clock-sync.service
Requires=wait-for-clock-sync.service
EOF
# cp /etc/systemd/system/kubelet.service.d/20-wait-time-sync.conf /etc/systemd/system/containerd.service.d/

## 4. Reload and enable
# systemctl daemon-reload
# systemctl enable --now wait-for-clock-sync.service

## 5. Verify the dependency chain
# systemctl list-dependencies --reverse wait-for-clock-sync.service
# systemctl status wait-for-clock-sync.service
```

1.3.2. Permanent fix: disable VMware time sync

In this environment the VMware host's clock differed from the VM's, and VMware reset the VM's clock at boot. Following VMware's official blog, turn off all time synchronization: power off the VM, edit its .vmx configuration, then power it back on.

```
## On the ESXi host, edit the VM's .vmx
# grep -i time /vmfs/volumes/data-1/demo-1/demo-1.vmx
time.synchronize.continue = "FALSE"
time.synchronize.restore = "FALSE"
time.synchronize.resume.disk = "FALSE"
time.synchronize.shrink = "FALSE"
time.synchronize.tools.startup = "FALSE"
time.synchronize.tools.enable = "FALSE"
time.synchronize.resume.host = "FALSE"

## After the change and a reboot, dmesg no longer shows a time jump
# dmesg -T | grep -iEw 'rtc|time|clock'
[Thu Jun 18 18:14:06 2026] vmware: Host bus clock speed read from hypervisor : 66000000 Hz
[Thu Jun 18 18:14:06 2026] vmware: using clock offset of 4177512396 ns
[Thu Jun 18 18:14:07 2026] PM: RTC time: 10:14:06, date: 2026-06-18
[Thu Jun 18 18:14:08 2026] PTP clock support registered
[Thu Jun 18 18:14:08 2026] rtc_cmos 00:01: setting system clock to 2026-06-18T10:14:08 UTC (1781777648)
[Thu Jun 18 18:14:08 2026] Loaded X.509 cert 'Build time autogenerated kernel key: ...'
[Thu Jun 18 18:14:09 2026] Loaded X.509 cert 'Build time autogenerated kernel key: ...'
```

2. Container name already in use after a reboot; the container cannot be created

2.1. Symptoms

This problem has the same trigger as problem 1. The time jump left containerd's data incompletely written, and the permanent fix is again the solution in 1.3. This section covers how to recover manually when the symptom has already appeared after a reboot.

This problem is generally caused by one of two situations. (1) Non-atomic writes: when containerd creates a container it writes two records, the name and the details. If a power loss, panic, or time jump interrupts this and only one record is written, a stale entry is left behind. (2) A time jump, which was the root cause this time.

```
# kubectl get pods -n kube-system -o wide | grep demo-1
NAME                             READY   STATUS    RESTARTS   AGE    IP        NODE
coredns-dbbb9ff68-pzr4j          1/1     Running   4          5h8m   x.x.x.x   demo-1
etcd-demo-1                      0/1     Unknown   8          20m    x.x.x.x   demo-1
kube-apiserver-demo-1            0/1     Unknown   32         20m    x.x.x.x   demo-1
kube-controller-manager-demo-1   0/1     Unknown   20         20m    x.x.x.x   demo-1
kube-proxy-jgx42                 1/1     Running   0          34m    x.x.x.x   demo-1
kube-scheduler-demo-1            0/1     Unknown   19         20m    x.x.x.x   demo-1
node-local-dns-vgjlf             1/1     Running   0          28m    x.x.x.x   demo-1
```

2.2. Investigation

2.2.1. Check the underlying container and its logs

The container has been created but is Exited, and its log file does not exist, so it never actually started:

```
# crictl ps -a | grep -i etcd
CONTAINER       IMAGE           CREATED                  STATE    NAME   ATTEMPT   POD ID          POD
c93ac13a41b67   2e96e5913fc06   Less than a second ago   Exited   etcd   8         f2fc82e9a4d98   etcd-demo-1

## The log file does not exist
# crictl logs 624bf43aff2db
FATA[0000] failed to try resolving symlinks in path "/var/log/pods/kube-system_etcd-demo-1_c93ac13a41b67d89d5dbbfbc90cf9c8f/etcd/8.log": lstat /var/log/pods/kube-system_etcd-demo-1_c93ac13a41b67d89d5dbbfbc90cf9c8f/etcd/8.log: no such file or directory
```

2.2.2. Check the kubelet logs

Containers are started by the kubelet, so look at its errors. The log lines below all say the same thing: the kubelet wants to create the etcd-demo-1 sandbox, but that name is already taken (reserved) in containerd.

```
# journalctl -u kubelet --since "3 min ago" --no-pager | grep -i etcd
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675072 ... "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to reserve sandbox name \"etcd-demo-1_kube-system_df1fae7c70ff1a1dfc6127a8f7bf67a2_6\": name ... is reserved for \"7cc9d6627964...\""
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675212 ... "Failed to create sandbox for pod" err="... failed to reserve sandbox name \"etcd-demo-1_kube-system_..._6\": name ... is reserved for \"7cc9d6627964...\"" pod="kube-system/etcd-demo-1"
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675273 ... "CreatePodSandbox for pod failed" err="... failed to reserve sandbox name \"etcd-demo-1_kube-system_..._6\": name ... is reserved for \"7cc9d6627964...\"" pod="kube-system/etcd-demo-1"
Jun 18 23:02:36 demo-1 kubelet[23429]: E0618 23:02:36.675410 ... "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-demo-1_kube-system(...)\" with CreatePodSandboxError: ... name ... is reserved for \"7cc9d6627964...\"" pod="kube-system/etcd-demo-1"
```

2.3. Solution

Note: the containerd community already has a matching issue (#10848). It is fixed in 2.2.x / 2.3.x (see PR #11576), so upgrading should prevent it from recurring.

Stop the kubelet, manually clean up the sandbox that holds the name, then restart containerd:

```
## Stop the kubelet, otherwise it keeps retrying the creation
# systemctl stop kubelet

## Find the sandbox holding the name (look it up by name)
# crictl pods                           ## list all sandboxes
# ID=$(crictl pods -q --name <name>)    ## get the matching ID

## Remove the stale sandbox
# crictl stopp "$ID"
# crictl rmp -f "$ID"

# systemctl restart containerd
# systemctl start kubelet
```
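Copying IDs out of the kubelet log by hand gets tedious when several sandboxes are stuck. The following minimal sketch only extracts the IDs. It assumes the message format shown in 2.2.2 (is reserved for \"<id>\"), and it assumes the reserved-for ID is the sandbox ID that crictl stopp / crictl rmp expect. reserved_ids is a hypothetical helper name, not part of crictl or the kubelet.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article).
# Reads kubelet log lines on stdin and prints each distinct ID that a sandbox
# name is "reserved for". Matches both escaped (\"id\") and plain ("id") quoting,
# and requires at least one hex digit so empty matches are skipped.
reserved_ids() {
  grep -o 'is reserved for \\*"[0-9a-f]\{1,\}' \
    | sed 's/.*"//' \
    | sort -u
}

# Usage on the node (kubelet already stopped, as in 2.3); review the IDs before deleting:
#   journalctl -u kubelet --since "3 min ago" --no-pager | reserved_ids
#   for id in $(journalctl -u kubelet --since "3 min ago" --no-pager | reserved_ids); do
#     crictl stopp "$id" && crictl rmp -f "$id"
#   done
```

Keeping extraction separate from deletion is deliberate. The printed IDs can be cross-checked against crictl pods before anything is removed.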