
Kubernetes: the "device or resource busy" problem

The system log reports the following error:

Apr 19 08:19:58 vpsz-dce-mgt03 kubelet: E0419 08:19:58.636618   76840 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/secret/13b322de-35d5-11eb-a480-0242ac120004-default-token-thcng\" (\"13b322de-35d5-11eb-a480-0242ac120004\")" failed. No retries permitted until 2021-04-19 08:22:00.636581297 +0800 CST m=+26338789.158428442 (durationBeforeRetry 2m2s). Error: "UnmountVolume.TearDown failed for volume \"default-token-thcng\" (UniqueName: \"kubernetes.io/secret/13b322de-35d5-11eb-a480-0242ac120004-default-token-thcng\") pod \"13b322de-35d5-11eb-a480-0242ac120004\" (UID: \"13b322de-35d5-11eb-a480-0242ac120004\") : remove /var/lib/kubelet/pods/13b322de-35d5-11eb-a480-0242ac120004/volumes/kubernetes.io~secret/default-token-thcng: device or resource busy"
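
When many of these messages pile up, it can help to pull just the stuck path out of each one. The following is a minimal sketch, assuming the message always ends in `remove <path>: device or resource busy` as in the line above; the function name `extract_busy_path` is made up for illustration, and the log file location in the usage comment is an assumption that depends on the distribution.

```shell
#!/bin/bash
# Read kubelet log lines on stdin and print the path that could not be removed.
# Assumption: the message has the form "... remove <path>: device or resource busy".
extract_busy_path() {
    sed -n 's|.*remove \(/[^ ]*\): device or resource busy.*|\1|p'
}

# Possible usage (the log file path is an assumption):
#   grep 'device or resource busy' /var/log/messages | extract_busy_path | sort -u
```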

Solution:

Use a script to check which process is holding the directory:

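#!/bin/bash
# Usage: bash test.sh <mount-path>
# Lists every process whose mount table contains <mount-path>, then prints the
# mount namespaces among them that also belong to a docker-containerd-shim.
declare -A map

# Each grep hit looks like "/proc/<pid>/mounts:<device> <mount point> ...";
# awk reduces it to "/proc/<pid>/mounts:<device>#<mount point>".
for i in $(find /proc/*/mounts -exec grep -HF -- "$1" {} + 2>/dev/null | awk '{print $1"#"$2}')
do
    pid=$(echo "$i" | awk -F "[/]" '{print $3}')
    point=$(echo "$i" | awk -F "[#]" '{print $2}')
    mnt=$(readlink "/proc/$pid/ns/mnt" 2>/dev/null)
    [ -n "$mnt" ] || continue   # the process exited in the meantime
    map["$mnt"]="exist"
    # cmdline is NUL-separated; drop the NULs so it prints on one line
    cmd=$(tr -d '\0' 2>/dev/null < "/proc/$pid/cmdline")
    echo -e "$pid\t$mnt\t$cmd\t$point"
done

# Print the mount namespace of each docker-containerd-shim process
# if that namespace also appeared in the list above.
for i in $(ps aux | grep docker-containerd-shim | grep -v "grep" | awk '{print $2}')
do
    mnt=$(readlink "/proc/$i/ns/mnt" 2>/dev/null)
    if [[ -n "$mnt" && "${map[$mnt]}" == "exist" ]]; then
        echo "$mnt"
    fi
done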
bash test.sh /var/lib/kubelet/pods/13b322de-35d5-11eb-a480-0242ac120004/volumes/kubernetes.io~secret/default-token-thcng

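The output shows that the mount is still held by a process (columns: PID, mount namespace, command line, mount point):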

82263	mnt:[4026532308]	/bin/node_exporter--path.procfs=/host/proc--path.sysfs=/host/sys--path.rootfs=/host/root--web.listen-address=0.0.0.0:9100--collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$--collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)	/host/root/var/lib/kubelet/pods/13b322de-35d5-11eb-a480-0242ac120004/volumes/kubernetes.io~secret/default-token-thcng

Find the parent process of 82263 (the third column of `ps -ef` is the parent PID):

[root@vpsz-dce-mgt03 ~]# ps -ef |grep 82263
nfsnobo+ 82263 82237 0 2020 ? 08:51:33 /bin/node_exporter --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/host/root --web.listen-address=0.0.0.0:9100 --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$ --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
root 107711 84220 0 08:34 pts/1 00:00:00 grep --color=auto 82263

Next, look up that parent process, 82237:

[root@vpsz-dce-mgt03 g1703269]# ps -ef |grep 82237
root 82237 3452 0 2020 ? 00:00:08 docker-containerd-shim 125ebce164eb4705ab348fed5642ed649437dd0f8ad0aa2092208c55727d4e1d /var/run/docker/libcontainerd/125ebce164eb4705ab348fed5642ed649437dd0f8ad0aa2092208c55727d4e1d docker-runc
nfsnobo+ 82263 82237 0 2020 ? 08:51:33 /bin/node_exporter --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/host/root --web.listen-address=0.0.0.0:9100 --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$ --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
root 114427 84220 0 08:35 pts/1 00:00:00 grep --color=auto 82237
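
The two `ps -ef` lookups above can also be done in one pass by following the `PPid:` field in `/proc/<pid>/status` until the chain ends. This is only a sketch: the function name `ancestors` is made up here, and it assumes a Linux `/proc` that the current user is allowed to read.

```shell
#!/bin/bash
# Print the ancestor chain of a PID, one "pid<TAB>command line" row per process,
# starting with the PID itself and ending at the process whose parent PID is 0.
ancestors() {
    local pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        # cmdline is NUL-separated; turn the NULs into spaces for display
        printf '%s\t%s\n' "$pid" "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
        pid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
    done
}

# Possible usage with the PID found earlier:
#   ancestors 82263
```

For the process in this post, the second row would be the docker-containerd-shim process, whose command line carries the container ID.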

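This shows that the Docker container with ID 125ebce164eb4705ab348fed5642ed649437dd0f8ad0aa2092208c55727d4e1d is holding the mount. Its node_exporter process sees the host filesystem under /host/root, so the secret volume is still mounted inside that container's mount namespace.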
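
To go from the shim's command line to a container name without copying the ID by hand, something like the following may work. It is a sketch under assumptions: the helper name `shim_container_id` is made up, and it treats the first 64-hex-digit token on the line as the container ID, which matches the `ps` output above but is not guaranteed for other runtimes or versions.

```shell
#!/bin/bash
# Read a docker-containerd-shim command line (or a whole `ps -ef` row) on stdin
# and print the first 64-hex-digit token, taken here to be the container ID.
shim_container_id() {
    grep -oE '[0-9a-f]{64}' | head -n 1
}

# Possible usage (needs access to the Docker daemon):
#   id=$(ps -o args= -p 82237 | shim_container_id)
#   docker inspect --format '{{.Name}}' "$id"
```

Once the container is identified, which action to take (for example restarting it so that it picks up a fresh view of the host mounts) depends on the workload, so that decision is left to the operator.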