Official documentation: https://kubernetes.io/docs/setup/independent/high-availability/
Disable the firewall
Alternatively, open only the required ports; see https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

```bash
systemctl stop firewalld
systemctl disable firewalld
```
Disable SELinux

```bash
# Set SELinux in permissive mode (effectively disabling it)
setenforce 0
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
```
Enable net.bridge.bridge-nf-call-ip6tables and net.bridge.bridge-nf-call-iptables

```bash
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
EOF
sysctl --system
```
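On CentOS 7 the net.bridge.bridge-nf-call-* keys only exist once the br_netfilter kernel module is loaded, so sysctl --system may complain about unknown keys on a fresh host. A minimal sketch, assuming the module is not already loaded:

```bash
# load br_netfilter so the bridge-nf-call sysctls exist before applying them
modprobe br_netfilter
lsmod | grep br_netfilter
```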
Disable swap

```bash
swapoff -a
```

Edit /etc/fstab and comment out the swap mount entry so that swap stays off after a reboot, then confirm with free -m that swap is disabled.
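If you prefer to comment out the swap entry non-interactively instead of editing /etc/fstab by hand, a sed one-liner along the following lines works; this is only a sketch, so back up the file first:

```bash
# back up fstab, then comment out every line that mentions swap
cp /etc/fstab /etc/fstab.bak
sed -ri 's/.*swap.*/#&/' /etc/fstab
```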
Load the ipvs kernel modules (a prerequisite for enabling ipvs mode in kube-proxy)

```bash
cat > /etc/sysconfig/modules/ipvs.modules <<'EOF'
#!/bin/bash
ipvs_mods_dir="/usr/lib/modules/$(uname -r)/kernel/net/netfilter/ipvs"
for i in $(ls $ipvs_mods_dir | grep -o "^[^.]*"); do
    /sbin/modinfo -F filename $i &> /dev/null
    if [ $? -eq 0 ]; then
        /sbin/modprobe $i
    fi
done
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4
```
The script above creates /etc/sysconfig/modules/ipvs.modules so that the required modules are loaded automatically after a node reboot. Run lsmod | grep -e ip_vs -e nf_conntrack_ipv4 to verify that the kernel modules are loaded. Each node also needs the ipset package installed; installing the ipvsadm management tool as well makes it easier to inspect the ipvs proxy rules.
```bash
yum install ipset ipvsadm -y
```
Install Docker

```bash
# Install Docker
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

# List the available Docker versions
yum list docker-ce.x86_64 --showduplicates | sort -r

# Kubernetes 1.15 supports Docker 1.13.1, 17.03, 17.06, 17.09, 18.06 and 18.09
yum install -y --setopt=obsoletes=0 \
  docker-ce-18.09.7-3.el7

# Start Docker
systemctl enable docker
systemctl start docker
```

Change the Docker cgroup driver to systemd. As described in https://kubernetes.io/docs/setup/production-environment/container-runtimes/, on Linux distributions that use systemd as the init system, using systemd as Docker's cgroup driver keeps the node more stable when resources are under pressure.

```bash
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart Docker
systemctl restart docker

# To change the default storage directory, add e.g. {"graph": "/new-path/docker"}
# to /etc/docker/daemon.json (vim /etc/docker/daemon.json)

# Check the Docker cgroup driver
docker info | grep Cgroup
```
Install kubeadm

```bash
# Add the Aliyun mirror repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

# Install kubeadm, kubelet and kubectl
yum install -y kubelet-1.15.0 kubeadm-1.15.0 kubectl-1.15.0
```

The installation output shows that three dependencies are pulled in as well: cri-tools, kubernetes-cni and socat:

1. Starting with Kubernetes 1.14, the official packages bump the cni dependency to version 0.7.5
2. socat is a dependency of kubelet
3. cri-tools is the command-line tool for the CRI (Container Runtime Interface)

```bash
# Enable and start kubelet
systemctl enable kubelet && systemctl start kubelet
```
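To confirm that the pinned 1.15.0 packages were installed rather than a newer version from the repository, the versions can be checked directly, for example:

```bash
kubeadm version -o short   # should print v1.15.0
kubelet --version          # should print Kubernetes v1.15.0
```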
Deploy the master node
The official documentation recommends using --config to point kubeadm at a configuration file and putting the settings previously passed as individual flags into that file; see https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/. This is also part of Kubernetes' support for dynamic kubelet configuration, see https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/.
Running kubeadm config print init-defaults prints the default configuration used for cluster initialization:
```yaml
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 1.2.3.4
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  name: master
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
```
The default configuration shows that imageRepository can be used to customize where the images needed for cluster initialization are pulled from. Based on the defaults, the configuration file kubeadm.yaml used to initialize this cluster is:
```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.168.4.5
  bindPort: 6443
nodeRegistration:
  taints:
  - effect: PreferNoSchedule
    key: node-role.kubernetes.io/master
---
apiVersion: kubeadm.k8s.io/v1beta2
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.15.0
networking:
  podSubnet: 10.244.0.0/16
```
A cluster initialized with the kubeadm defaults taints the master node with node-role.kubernetes.io/master:NoSchedule, which prevents the master from running ordinary workloads. Since this test environment has only two nodes, the taint is changed here to node-role.kubernetes.io/master:PreferNoSchedule.
Compared with the defaults, this file changes advertiseAddress (the API server address), imageRepository (the Aliyun image repository) and podSubnet (the Pod network CIDR).
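Before running kubeadm init it can help to pre-pull the control-plane images with the same configuration file, so that initialization does not stall on image downloads; for example:

```bash
# pull the control-plane images for this configuration from registry.aliyuncs.com/google_containers
kubeadm config images pull --config kubeadm.yaml
```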
Initialize the cluster with kubeadm:

```bash
kubeadm init --config kubeadm.yaml
```
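After a successful init, kubeadm prints the commands for configuring kubectl for the current user; they amount to copying the admin kubeconfig into place (reproduced here from the standard kubeadm output):

```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```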
Check the cluster status and confirm that all components are healthy:

```
[root@master ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true"}
```
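At this point kubectl get nodes will usually still report the master as NotReady, because no Pod network add-on has been installed yet; this is expected and is resolved by the next step. Illustrative output:

```
kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
master   NotReady   master   2m    v1.15.0
```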
Install the Pod network add-on

```bash
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```
If a node has multiple network interfaces, flannel currently needs the --iface parameter in kube-flannel.yml to specify the name of the host's internal interface (see https://github.com/kubernetes/kubernetes/issues/39701); otherwise DNS resolution may fail. Download kube-flannel.yml locally and add --iface=<interface name> to the flanneld startup arguments:
```yaml
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.11.0-amd64
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=eth1
......
```
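Once the manifest has been applied, check that the flannel DaemonSet pod is running and that the node switches to Ready (the app=flannel label below matches the label used in kube-flannel.yml):

```bash
kubectl get pod -n kube-system -l app=flannel -o wide
kubectl get nodes
```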
Test that cluster DNS works:

```bash
kubectl run curl --image=radial/busyboxplus:curl -it --rm
```
Once inside the pod, run nslookup kubernetes.default and confirm that it resolves correctly.
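With the default service CIDR of 10.96.0.0/12 the cluster DNS service sits at 10.96.0.10, so a successful lookup inside the test pod looks roughly like this (addresses will differ if the service subnet was changed):

```
nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
```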
How to remove a Node from the cluster
To remove the node node2 from the cluster, run the following commands.
On the master node:
```bash
kubectl drain node2 --delete-local-data --force --ignore-daemonsets
kubectl delete node node2
```
On node2:
```bash
kubeadm reset
ifconfig cni0 down
ip link delete cni0
ifconfig flannel.1 down
ip link delete flannel.1
rm -rf /var/lib/cni/
```
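Note that kubeadm reset does not clean up iptables or IPVS rules; if the node is being retired rather than rejoined, they can be flushed manually. A sketch:

```bash
# flush iptables rules left behind by kube-proxy/flannel (optional)
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# if kube-proxy was running in ipvs mode, also clear the IPVS tables
ipvsadm --clear
```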
On the master node:
```bash
kubectl delete node node2
```
Enable ipvs mode in kube-proxy
Edit config.conf in the kube-system/kube-proxy ConfigMap:
```bash
kubectl edit cm kube-proxy -n kube-system
# around line 39: change mode: "" to mode: "ipvs"
```
Restart the kube-proxy pods on all nodes (delete them so the DaemonSet recreates them):

```bash
kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'
```
Check that the new kube-proxy pods have been recreated:

```bash
kubectl get pod -n kube-system | grep kube-proxy
```
Check the logs of one of the new kube-proxy pods; if ipvs mode is in effect, the log should contain a line like "Using ipvs Proxier":

```bash
kubectl -n kube-system logs kube-proxy-xddn9
```
View the ipvs rules
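With ipvsadm installed earlier, the rules that kube-proxy programs can be listed directly, for example:

```bash
# list the IPVS virtual servers and their backend real servers
ipvsadm -Ln
```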