For me personally, Rocky Linux 8 is the most anticipated software release of the year!
The Upgrade Plan
We are going to upgrade our Kubernetes homelab nodes from CentOS 7 to Rocky Linux 8.
We have a cluster of six nodes, three control planes and three worker nodes, all of which are KVM guests running CentOS 7.
We will upgrade the control plane nodes first, one at a time, using PXE boot and Ansible playbooks, and then upgrade the worker nodes, also one at a time, using the same approach.
This is a lengthy process, but it does not require rebuilding the cluster from scratch. Note that we will not be upgrading the Kubernetes version or software components such as Docker.
Software versions before the upgrade:
- CentOS 7
- Docker 20.10
- Kubernetes 1.21.1
- Calico 3.19
- Istio 1.9
Software versions after the upgrade:
- Rocky Linux 8
- Docker 20.10
- Kubernetes 1.21.1
- Calico 3.19
- Istio 1.9
SELinux is set to enforcing mode.
Configuration Files
For PXE boot setup, see here.
For Rocky Linux 8 kickstart file, see GitHub repository here.
For Ansible playbooks, see GitHub repository here.
Caveats
RHEL 8 ships with firewalld, which uses nftables as its default backend. Depending on the CNI, you may have a hard time getting Kubernetes pod-to-pod communication to work with nftables.
Calico's IptablesBackend setting specifies which variant of iptables Felix should use. The default is Legacy. We will therefore remove firewalld from all Rocky Linux 8 nodes, which in turn removes the nftables package as an unused dependency. This configuration is not supported and has not been tested outside of the homelab environment.
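If you want to confirm which backend Felix is actually using rather than relying on the default, a quick check along these lines should do (this assumes calicoctl is installed and configured to talk to the cluster datastore; it is not part of the playbooks above):
$ calicoctl get felixconfiguration default -o yaml | grep -i iptablesbackend
# No output means iptablesBackend is unset, so Felix falls back to its default (Legacy).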
Disclaimer
THIS IS NOT SUPPORTED.
Cluster Information
$ kubectl get nodes -o wide
NAME    STATUS   ROLES                  AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION           CONTAINER-RUNTIME
srv31   Ready    control-plane,master   123d   v1.21.1   10.11.1.31    <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://20.10.7
srv32   Ready    control-plane,master   123d   v1.21.1   10.11.1.32    <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://20.10.7
srv33   Ready    control-plane,master   123d   v1.21.1   10.11.1.33    <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://20.10.7
srv34   Ready    <none>                 123d   v1.21.1   10.11.1.34    <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://20.10.7
srv35   Ready    <none>                 95d    v1.21.1   10.11.1.35    <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://20.10.7
srv36   Ready    <none>                 95d    v1.21.1   10.11.1.36    <none>        CentOS Linux 7 (Core)   3.10.0-1160.el7.x86_64   docker://20.10.7
Upgrade the First Control Plane Node
We will start with srv31.
Drain and Delete Control Plane from Kubernetes Cluster
Drain and delete the control plane from the cluster:
$ kubectl drain srv31
$ kubectl delete node srv31
Make sure the node is no longer in the Kubernetes cluster:
$ kubectl get nodes
NAME    STATUS   ROLES                  AGE    VERSION
srv32   Ready    control-plane,master   123d   v1.21.1
srv33   Ready    control-plane,master   123d   v1.21.1
srv34   Ready    <none>                 123d   v1.21.1
srv35   Ready    <none>                 95d    v1.21.1
srv36   Ready    <none>                 95d    v1.21.1
The cluster will remain operational as long as the other two control planes are online.
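A three-member etcd cluster keeps quorum with one member down but cannot survive losing a second, so it is worth pausing here if anything looks unhealthy. An optional sanity check (not part of the original procedure) uses the same etcdctl flags as the commands in the next section:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  endpoint health
# Expect something like: 127.0.0.1:2379 is healthy: successfully committed proposal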
Delete Control Plane from Etcd Cluster
Etcd will have a record of all three control plane nodes. We therefore have to delete the control plane node from the Etcd cluster too.
$ kubectl get pods -n kube-system -l component=etcd -o wide
NAME         READY   STATUS    RESTARTS   AGE   IP           NODE    NOMINATED NODE   READINESS GATES
etcd-srv32   1/1     Running   2          9d    10.11.1.32   srv32   <none>           <none>
etcd-srv33   1/1     Running   2          9d    10.11.1.33   srv33   <none>           <none>
Query the cluster for the Etcd members:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list
24b959b3d5fb9579, started, srv32, https://10.11.1.32:2380, https://10.11.1.32:2379, false
4a9dc4303465abc8, started, srv31, https://10.11.1.31:2380, https://10.11.1.31:2379, false
d60055f923c49949, started, srv33, https://10.11.1.33:2380, https://10.11.1.33:2379, false
Delete the member for control plane srv31:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member remove 4a9dc4303465abc8
Member 4a9dc4303465abc8 removed from cluster 53e3f96426ba03f3
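Optionally, re-run the member list command to confirm that only srv32 and srv33 remain before moving on:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list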
Delete Control Plane KVM Guest
SSH into the hypervisor where the control plane server is running, and stop the VM:
$ ssh [email protected] "virsh destroy srv31-master"
Delete the current KVM snapshot (it’s the one from the previous Kubernetes upgrade):
$ ssh [email protected] "virsh snapshot-delete srv31-master --current"
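If you are not sure whether the guest has a snapshot to delete in the first place, virsh can list them (optional):
$ ssh [email protected] "virsh snapshot-list srv31-master"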
Delete the control plane guest definition, including its storage:
$ ssh [email protected] "virsh undefine srv31-master --remove-all-storage"
Domain srv31-master has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv31.qcow2) removed.
Create a Rocky Linux KVM Guest
Provision a new control plane KVM guest using PXE boot:
$ virt-install \
  --connect qemu+ssh://[email protected]/system \
  --name srv31-master \
  --network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:31 \
  --disk path=/var/lib/libvirt/images/srv31.qcow2,size=16 \
  --pxe \
  --ram 4096 \
  --vcpus 2 \
  --os-type linux \
  --os-variant centos7.0 \
  --sound none \
  --rng /dev/urandom \
  --virt-type kvm \
  --wait 0
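The installation itself is driven by the kickstart file, so there is nothing to do interactively. If you want to keep an eye on it, you can poll the guest state from your workstation (optional):
$ virsh --connect qemu+ssh://[email protected]/system list --all
# The domain shows as "running" while the kickstart install is in progress.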
Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment:
$ cd kubernetes-homelab/ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub [email protected]
$ ansible-playbook playbooks/main-k8s-hosts.yml
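If your inventory covers more than the Kubernetes hosts, it may be safer to limit the first run to the rebuilt node; the host pattern below is a guess, so match it to your own inventory:
$ ansible-playbook playbooks/main-k8s-hosts.yml --limit srv31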
Prepare Kubernetes Cluster for Control Plane Node to Join
SSH into a working control plane node, srv32, and re-upload certificates:
$ ssh [email protected] "kubeadm init phase upload-certs --upload-certs"
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
7118834f8c6ae140c574e44a495fe51705c869a91508e8152871ca26a500a440
Print the join command on the same control plane node:
$ ssh [email protected] "kubeadm token create --print-join-command"
kubeadm join kubelb.hl.test:6443 --token 1spj1c.zzzeydqo3yhvvaoy --discovery-token-ca-cert-hash sha256:f2e8bdc45d591d475c84a7cf69d56ba056ba034febe1561e7f77641d869ab0c5
SSH into the newly created control plane srv31 and join the Kubernetes cluster:
$ ssh [email protected] \
  "kubeadm join kubelb.hl.test:6443 --token 1spj1c.zzzeydqo3yhvvaoy \
  --discovery-token-ca-cert-hash sha256:f2e8bdc45d591d475c84a7cf69d56ba056ba034febe1561e7f77641d869ab0c5 \
  --control-plane \
  --certificate-key 7118834f8c6ae140c574e44a495fe51705c869a91508e8152871ca26a500a440"
Restart kubelet:
$ ssh [email protected] "systemctl restart kubelet"
Label the node:
$ kubectl label node srv31 node-role.kubernetes.io/control-plane=
$ kubectl label node srv31 node-role.kubernetes.io/master=
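Optionally, confirm the labels were applied:
$ kubectl get node srv31 --show-labels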
Check cluster status:
$ kubectl get nodes -o wide
NAME    STATUS   ROLES                  AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                CONTAINER-RUNTIME
srv31   Ready    control-plane,master   14m    v1.21.1   10.11.1.31    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv32   Ready    control-plane,master   123d   v1.21.1   10.11.1.32    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
srv33   Ready    control-plane,master   123d   v1.21.1   10.11.1.33    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
srv34   Ready    <none>                 123d   v1.21.1   10.11.1.34    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
srv35   Ready    <none>                 95d    v1.21.1   10.11.1.35    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
srv36   Ready    <none>                 95d    v1.21.1   10.11.1.36    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
We have our very first control plane running on Rocky Linux 8!
Repeat the process for the other two control planes, srv32 and srv33.
Do not proceed any further until all control plane nodes have been upgraded:
$ kubectl get nodes -o wide
NAME    STATUS   ROLES                  AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                CONTAINER-RUNTIME
srv31   Ready    control-plane,master   167m    v1.21.1   10.11.1.31    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv32   Ready    control-plane,master   32m     v1.21.1   10.11.1.32    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv33   Ready    control-plane,master   7m42s   v1.21.1   10.11.1.33    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv34   Ready    <none>                 124d    v1.21.1   10.11.1.34    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
srv35   Ready    <none>                 95d     v1.21.1   10.11.1.35    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
srv36   Ready    <none>                 95d     v1.21.1   10.11.1.36    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
$ kubectl -n kube-system get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP               NODE    NOMINATED NODE   READINESS GATES
calico-kube-controllers-7f4f5bf95d-42n7z   1/1     Running   0          24m     192.168.137.36   srv35   <none>           <none>
calico-node-4x79f                          1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
calico-node-54c25                          1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
calico-node-7fmzb                          1/1     Running   1          9d      10.11.1.36       srv36   <none>           <none>
calico-node-hvh28                          1/1     Running   0          4m39s   10.11.1.33       srv33   <none>           <none>
calico-node-p5vkt                          1/1     Running   1          9d      10.11.1.35       srv35   <none>           <none>
calico-node-stfm6                          1/1     Running   1          9d      10.11.1.34       srv34   <none>           <none>
coredns-85d9df8444-897jj                   1/1     Running   0          110m    192.168.137.34   srv35   <none>           <none>
coredns-85d9df8444-stn4d                   1/1     Running   0          24m     192.168.137.37   srv35   <none>           <none>
etcd-srv31                                 1/1     Running   0          157m    10.11.1.31       srv31   <none>           <none>
etcd-srv32                                 1/1     Running   0          26m     10.11.1.32       srv32   <none>           <none>
etcd-srv33                                 1/1     Running   0          4m36s   10.11.1.33       srv33   <none>           <none>
kube-apiserver-srv31                       1/1     Running   6          164m    10.11.1.31       srv31   <none>           <none>
kube-apiserver-srv32                       1/1     Running   4          29m     10.11.1.32       srv32   <none>           <none>
kube-apiserver-srv33                       1/1     Running   0          4m38s   10.11.1.33       srv33   <none>           <none>
kube-controller-manager-srv31              1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
kube-controller-manager-srv32              1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
kube-controller-manager-srv33              1/1     Running   0          4m38s   10.11.1.33       srv33   <none>           <none>
kube-proxy-5d25q                           1/1     Running   0          4m39s   10.11.1.33       srv33   <none>           <none>
kube-proxy-bpbrc                           1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
kube-proxy-ltssd                           1/1     Running   1          9d      10.11.1.36       srv36   <none>           <none>
kube-proxy-rqmk6                           1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
kube-proxy-z9wg2                           1/1     Running   2          9d      10.11.1.35       srv35   <none>           <none>
kube-proxy-zkj8c                           1/1     Running   1          9d      10.11.1.34       srv34   <none>           <none>
kube-scheduler-srv31                       1/1     Running   0          164m    10.11.1.31       srv31   <none>           <none>
kube-scheduler-srv32                       1/1     Running   0          29m     10.11.1.32       srv32   <none>           <none>
kube-scheduler-srv33                       1/1     Running   0          4m38s   10.11.1.33       srv33   <none>           <none>
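A quick way to spot anything that is not in the Running phase across all namespaces (optional; completed Jobs would also show up here):
$ kubectl get pods -A --field-selector=status.phase!=Running
# An empty result (No resources found) means every pod is in the Running phase.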
Upgrade Worker Nodes
We will start with srv34.
Drain and Delete Worker Node from Kubernetes Cluster
$ kubectl drain srv34 --delete-emptydir-data --ignore-daemonsets
$ kubectl delete node srv34
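If you want to see what is left on the node between the drain and the delete, a field selector on spec.nodeName does the trick; only DaemonSet-managed pods such as calico-node and kube-proxy should remain:
$ kubectl get pods -A -o wide --field-selector spec.nodeName=srv34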
Make sure the node is no longer in the Kubernetes cluster:
$ kubectl get nodes
NAME    STATUS   ROLES                  AGE   VERSION
srv31   Ready    control-plane,master   13h   v1.21.1
srv32   Ready    control-plane,master   11h   v1.21.1
srv33   Ready    control-plane,master   11h   v1.21.1
srv35   Ready    <none>                 95d   v1.21.1
srv36   Ready    <none>                 95d   v1.21.1
Stop the server:
$ ssh [email protected] "virsh destroy srv34-node"
Domain srv34-node destroyed
Delete the current snapshot:
$ ssh [email protected] "virsh snapshot-delete srv34-node --current"
Delete the server, including its storage:
$ ssh [email protected] "virsh undefine srv34-node --remove-all-storage"
Domain srv34-node has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv34.qcow2) removed.
Create a Rocky Linux KVM Guest
Provision a new KVM guest using PXE boot:
$ virt-install \
  --connect qemu+ssh://[email protected]/system \
  --name srv34-node \
  --network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:34 \
  --disk path=/var/lib/libvirt/images/srv34.qcow2,size=16 \
  --pxe \
  --ram 8192 \
  --vcpus 2 \
  --os-type linux \
  --os-variant centos7.0 \
  --sound none \
  --rng /dev/urandom \
  --virt-type kvm \
  --wait 0
Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment:
$ cd kubernetes-homelab/ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub [email protected]
$ ansible-playbook playbooks/main-k8s-hosts.yml
SSH into the newly created worker node srv34 and join the Kubernetes cluster:
$ ssh [email protected] \
  "kubeadm join kubelb.hl.test:6443 --token 1spj1c.zzzeydqo3yhvvaoy \
  --discovery-token-ca-cert-hash sha256:f2e8bdc45d591d475c84a7cf69d56ba056ba034febe1561e7f77641d869ab0c5"
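The bootstrap token created earlier has a default TTL of 24 hours. If it has expired by the time you reach the worker nodes, generate a fresh join command on any control plane, exactly as before:
$ ssh [email protected] "kubeadm token create --print-join-command"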
Restart kubelet:
$ ssh [email protected] "systemctl restart kubelet"
Check cluster status:
$ kubectl get nodes -o wide
NAME    STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                CONTAINER-RUNTIME
srv31   Ready    control-plane,master   14h   v1.21.1   10.11.1.31    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv32   Ready    control-plane,master   12h   v1.21.1   10.11.1.32    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv33   Ready    control-plane,master   12h   v1.21.1   10.11.1.33    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv34   Ready    <none>                 20m   v1.21.1   10.11.1.34    <none>        Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64   docker://20.10.7
srv35   Ready    <none>                 95d   v1.21.1   10.11.1.35    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
srv36   Ready    <none>                 95d   v1.21.1   10.11.1.36    <none>        CentOS Linux 7 (Core)              3.10.0-1160.el7.x86_64        docker://20.10.7
Repeat the process for the other two worker nodes, srv35 and srv36.
The end result should be all nodes running Rocky Linux 8:
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion
NAME    VERSION   OS-IMAGE                           KERNEL
srv31   v1.21.1   Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64
srv32   v1.21.1   Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64
srv33   v1.21.1   Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64
srv34   v1.21.1   Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64
srv35   v1.21.1   Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64
srv36   v1.21.1   Rocky Linux 8.4 (Green Obsidian)   4.18.0-305.3.1.el8_4.x86_64
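Since SELinux is meant to stay in enforcing mode, one last check across the rebuilt nodes does not hurt (the hostnames below are an assumption; substitute whatever your SSH configuration resolves):
$ for node in srv31 srv32 srv33 srv34 srv35 srv36; do ssh root@"${node}" getenforce; done
# Expect "Enforcing" printed once per node.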
Phew! Seems like a lot of work!
So, for your home lab you decided to stick with iptables instead of adopting nftables based on personal preference, did I read that right? If pod-to-pod communication is disrupted by nftables but production environments require it, how do you think enterprises address that issue?
I have no strong preference between iptables and nftables. Rocky 8 uses nftables by default; I tried it with the Calico CNI and it didn't work for me. I'll admit I didn't spend much time debugging it, as it's only a homelab environment. I then tried iptables and Calico worked without issues, so I switched to iptables (iptables or firewalld/nftables, I don't mind as long as it works, since it's all managed by Ansible anyway). I don't think Kubernetes is supported on RHEL 8 (I may be wrong); for production environments, I'd suggest RHEL 7.
As for how large enterprises address this, the way I see it the problem isn't Kubernetes itself: it's the CNI that has to work with whatever backend the vendor chooses for the firewall. If you use a CNI that does not support nftables, you either have to stick with iptables or move to a different CNI.
I deployed Kubernetes with the Calico CNI on RHEL 8 on AWS, without firewalld/nftables, around September 2022. It's supported by Red Hat.
Hi Oli, thanks, I wasn’t aware. Have you got any link to Red Hat’s documentation where it says it’s a supported deployment?