Kubernetes homelab migration to the latest version of Rocky Linux.
The Upgrade Plan
We are going to upgrade our Kubernetes homelab nodes from Rocky 8 to Rocky 9.
We have a cluster of six nodes, three control planes and three worker nodes, all of which are KVM guests running Rocky 8.
We will upgrade the control plane nodes first, one at a time using Packer images and Ansible playbooks, and then upgrade the worker nodes, also one at a time, using the same approach.
This is a lengthy but zero-downtime process, and it does not require rebuilding the Kubernetes cluster from scratch. Note that we will not be upgrading the Kubernetes version.
Software versions before the upgrade:
- Rocky 8
- Containerd 1.6
- Kubernetes 1.26
- Calico 3.25
- Istio 1.17
Software versions after the upgrade:
- Rocky 9
- Containerd 1.6
- Kubernetes 1.26
- Calico 3.25
- Istio 1.17
SELinux is set to enforcing mode.
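To confirm the mode on any node (for example after it has been rebuilt), run getenforce on the node itself; it should print Enforcing:
$ getenforce
Enforcing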
Configuration Files
For the Packer setup, see the lisenet/kubernetes-homelab GitHub repository.
For the Ansible playbooks, see the lisenet/homelab-ansible GitHub repository.
Cluster Information
$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION              CONTAINER-RUNTIME
srv31   Ready    control-plane   347d   v1.26.4   10.11.1.31    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv32   Ready    control-plane   347d   v1.26.4   10.11.1.32    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv33   Ready    control-plane   477d   v1.26.4   10.11.1.33    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv34   Ready    <none>          477d   v1.26.4   10.11.1.34    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv35   Ready    <none>          347d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.20
Build a Rocky 9 KVM Image with Packer
First of all, we need to build a Rocky 9 KVM image using Packer.
$ git clone https://github.com/lisenet/kubernetes-homelab.git
$ cd ./packer
$ PACKER_LOG=1 packer build ./rocky9.json
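Before copying the image to a hypervisor, it can be worth sanity-checking the artifact that Packer produced. This step is optional; qemu-img info simply prints the image format and virtual size, and the path below matches the artifact location used later in this post:
$ qemu-img info ./packer/artifacts/qemu/rocky9/rocky9.qcow2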
Upgrade the First Control Plane Node
We will start with srv31.
Drain and Delete Control Plane from Kubernetes Cluster
Drain and delete the control plane from the cluster:
$ kubectl drain srv31
$ kubectl delete node srv31
Make sure the node is no longer in the Kubernetes cluster:
$ kubectl get nodes
NAME    STATUS   ROLES           AGE    VERSION
srv32   Ready    control-plane   347d   v1.26.4
srv33   Ready    control-plane   477d   v1.26.4
srv34   Ready    <none>          477d   v1.26.4
srv35   Ready    <none>          347d   v1.26.4
srv36   Ready    <none>          477d   v1.26.4
The cluster will remain operational as long as the other two control planes are online.
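If you want extra reassurance while srv31 is out of the cluster, the kube-apiserver readiness endpoint can be queried through kubectl; this is optional and simply confirms that the API served by the remaining control planes is healthy:
$ kubectl get --raw='/readyz?verbose'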
Delete Control Plane from Etcd Cluster
Etcd still has a record of all three control plane nodes, so we have to remove the deleted control plane from the Etcd cluster as well.
$ kubectl get pods -n kube-system -l component=etcd -o wide
NAME         READY   STATUS    RESTARTS     AGE   IP           NODE    NOMINATED NODE   READINESS GATES
etcd-srv32   1/1     Running   4 (2d ago)   20d   10.11.1.32   srv32   <none>           <none>
etcd-srv33   1/1     Running   4 (2d ago)   20d   10.11.1.33   srv33   <none>           <none>
Query the cluster for the Etcd members:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list
c36952e9f5bf4f49, started, srv33, https://10.11.1.33:2380, https://10.11.1.33:2379, false
c44657d8f6e7dea5, started, srv31, https://10.11.1.31:2380, https://10.11.1.31:2379, false
e279a8288f4be237, started, srv32, https://10.11.1.32:2380, https://10.11.1.32:2379, false
Delete the member for control plane srv31:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member remove c44657d8f6e7dea5
Member c44657d8f6e7dea5 removed from cluster 53e3f96426ba03f3
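Optionally, confirm that the two remaining Etcd members are healthy before tearing down the VM. This reuses the same etcdctl invocation through the etcd-srv32 pod; the --cluster flag makes etcdctl check every member it knows about:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  endpoint health --cluster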
Delete Control Plane KVM Guest
SSH into the hypervisor where the control plane server is running, and stop the VM:
$ ssh [email protected] "virsh destroy srv31-master"
Domain 'srv31-master' destroyed
Delete the current KVM snapshot (it’s the one from the previous Kubernetes upgrade):
$ ssh [email protected] "virsh snapshot-delete srv31-master --current"
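If you are not sure whether the guest has a snapshot at all (snapshot-delete will simply report an error when there is nothing to remove), you can list the snapshots on the same hypervisor first:
$ ssh [email protected] "virsh snapshot-list srv31-master"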
Delete the control plane server image, including its storage:
$ ssh [email protected] "virsh undefine srv31-master --remove-all-storage"
Domain srv31-master has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv31.qcow2) removed.
Create a Rocky Linux Control Plane KVM Guest
Copy the Rocky 9 image that was built with Packer to the hypervisor for srv31:
$ scp ./packer/artifacts/qemu/rocky9/rocky9.qcow2 [email protected]:/var/lib/libvirt/images/srv31.qcow2
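Optionally, verify that the image was not corrupted in transit by comparing checksums on both ends; the two sums should match:
$ sha256sum ./packer/artifacts/qemu/rocky9/rocky9.qcow2
$ ssh [email protected] "sha256sum /var/lib/libvirt/images/srv31.qcow2"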
Provision a new srv31 control plane KVM guest:
$ virt-install \
  --connect qemu+ssh://[email protected]/system \
  --name srv31-master \
  --network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:31 \
  --disk path=/var/lib/libvirt/images/srv31.qcow2,size=32 \
  --import \
  --ram 4096 \
  --vcpus 2 \
  --os-type linux \
  --os-variant centos8 \
  --sound none \
  --rng /dev/urandom \
  --virt-type kvm \
  --wait 0
Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment.
$ git clone https://github.com/lisenet/homelab-ansible.git
$ cd ./homelab-ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub [email protected]
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml
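If you prefer to validate the playbook before letting it loose on the new node, ansible-playbook supports --syntax-check, and --limit can restrict the run to a single host. The host name used with --limit is an assumption here; it has to match how srv31 appears in the repository's inventory:
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml --syntax-check
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml --limit srv31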
Prepare Kubernetes Cluster for Control Plane Node to Join
SSH into a working control plane node, srv32, and re-upload certificates:
$ ssh [email protected] "kubeadm init phase upload-certs --upload-certs"
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
d6c4506ef4f1150686b05599fe7019b3adcf914eaaba3e602a3e0d8f8efd0a78
Print the join command on the same control plane node:
$ ssh [email protected] "kubeadm token create --print-join-command"
kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll --discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1
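Bootstrap tokens created this way expire after 24 hours by default, so if the rebuild drags on you may need to generate a fresh one. Existing tokens can be listed on the same control plane:
$ ssh [email protected] "kubeadm token list"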
SSH into the newly created control plane srv31 and join the Kubernetes cluster:
$ ssh [email protected] \
  "kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll \
  --discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1 \
  --control-plane \
  --certificate-key d6c4506ef4f1150686b05599fe7019b3adcf914eaaba3e602a3e0d8f8efd0a78"
Restart kubelet on srv31:
$ ssh [email protected] "systemctl restart kubelet"
Check cluster status:
$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
srv31   Ready    control-plane   11m    v1.26.4   10.11.1.31    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv32   Ready    control-plane   347d   v1.26.4   10.11.1.32    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv33   Ready    control-plane   477d   v1.26.4   10.11.1.33    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv34   Ready    <none>          477d   v1.26.4   10.11.1.34    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv35   Ready    <none>          348d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
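It is also worth confirming that srv31 has been added back as an Etcd member; the same member list command used earlier should now show three members again:
$ kubectl exec etcd-srv32 \
  -n kube-system -- etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  member list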
We have our very first control plane running on Rocky 9.
Repeat the process for the other two control planes, srv32 and srv33.
Do not proceed further until all control planes have been upgraded:
$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
srv31   Ready    control-plane   89m    v1.26.4   10.11.1.31    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv32   Ready    control-plane   32m    v1.26.4   10.11.1.32    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv33   Ready    control-plane   52s    v1.26.4   10.11.1.33    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv34   Ready    <none>          477d   v1.26.4   10.11.1.34    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv35   Ready    <none>          348d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
$ kubectl -n kube-system get pods -o wide
NAME                                      READY   STATUS    RESTARTS   AGE     IP                NODE    NOMINATED NODE   READINESS GATES
calico-kube-controllers-57b57c56f-l9ps5   1/1     Running   0          24m     192.168.134.2     srv31   <none>           <none>
calico-node-4x79f                         1/1     Running   0          164m    10.11.1.31        srv31   <none>           <none>
calico-node-54c25                         1/1     Running   0          29m     10.11.1.32        srv32   <none>           <none>
calico-node-7fmzb                         1/1     Running   1          9d      10.11.1.36        srv36   <none>           <none>
calico-node-hvh28                         1/1     Running   0          4m39s   10.11.1.33        srv33   <none>           <none>
calico-node-p5vkt                         1/1     Running   1          9d      10.11.1.35        srv35   <none>           <none>
calico-node-stfm6                         1/1     Running   1          9d      10.11.1.34        srv34   <none>           <none>
coredns-787d4945fb-9dq4q                  1/1     Running   0          110m    192.168.134.1     srv31   <none>           <none>
coredns-787d4945fb-k67rx                  1/1     Running   0          24m     192.168.134.3     srv31   <none>           <none>
etcd-srv31                                1/1     Running   0          157m    10.11.1.31        srv31   <none>           <none>
etcd-srv32                                1/1     Running   0          26m     10.11.1.32        srv32   <none>           <none>
etcd-srv33                                1/1     Running   0          4m36s   10.11.1.33        srv33   <none>           <none>
kube-apiserver-srv31                      1/1     Running   6          164m    10.11.1.31        srv31   <none>           <none>
kube-apiserver-srv32                      1/1     Running   4          29m     10.11.1.32        srv32   <none>           <none>
kube-apiserver-srv33                      1/1     Running   0          4m38s   10.11.1.33        srv33   <none>           <none>
kube-controller-manager-srv31             1/1     Running   0          164m    10.11.1.31        srv31   <none>           <none>
kube-controller-manager-srv32             1/1     Running   0          29m     10.11.1.32        srv32   <none>           <none>
kube-controller-manager-srv33             1/1     Running   0          4m38s   10.11.1.33        srv33   <none>           <none>
kube-proxy-5d25q                          1/1     Running   0          4m39s   10.11.1.33        srv33   <none>           <none>
kube-proxy-bpbrc                          1/1     Running   0          29m     10.11.1.32        srv32   <none>           <none>
kube-proxy-ltssd                          1/1     Running   1          9d      10.11.1.36        srv36   <none>           <none>
kube-proxy-rqmk6                          1/1     Running   0          164m    10.11.1.31        srv31   <none>           <none>
kube-proxy-z9wg2                          1/1     Running   2          9d      10.11.1.35        srv35   <none>           <none>
kube-proxy-zkj8c                          1/1     Running   1          9d      10.11.1.34        srv34   <none>           <none>
kube-scheduler-srv31                      1/1     Running   0          164m    10.11.1.31        srv31   <none>           <none>
kube-scheduler-srv32                      1/1     Running   0          29m     10.11.1.32        srv32   <none>           <none>
kube-scheduler-srv33                      1/1     Running   0          4m38s   10.11.1.33        srv33   <none>           <none>
metrics-server-77dff74649-lkhll           1/1     Running   0          146m    192.168.135.194   srv34   <none>           <none>
Upgrade Worker Nodes
We will start with srv34.
Drain and Delete Worker Node from Kubernetes Cluster
$ kubectl drain srv34 --delete-emptydir-data --ignore-daemonsets
$ kubectl delete node srv34
Make sure the node is no longer in the Kubernetes cluster:
$ kubectl get nodes
NAME    STATUS   ROLES           AGE    VERSION
srv31   Ready    control-plane   89m    v1.26.4
srv32   Ready    control-plane   32m    v1.26.4
srv33   Ready    control-plane   52s    v1.26.4
srv35   Ready    <none>          348d   v1.26.4
srv36   Ready    <none>          477d   v1.26.4
Stop the server:
$ ssh [email protected] "virsh destroy srv34-node"
Domain srv34-node destroyed
Delete the current snapshot:
$ ssh [email protected] "virsh snapshot-delete srv34-node --current"
Delete the server, including its storage:
$ ssh [email protected] "virsh undefine srv34-node --remove-all-storage"
Domain srv34-node has been undefined
Volume 'vda'(/var/lib/libvirt/images/srv34.qcow2) removed.
Create a Rocky Linux Worker Node KVM Guest
Copy the Rocky 9 image that was built with Packer to the hypervisor for srv34:
$ scp ./packer/artifacts/qemu/rocky9/rocky9.qcow2 [email protected]:/var/lib/libvirt/images/srv34.qcow2
Provision a new srv34 worker node KVM guest:
$ virt-install \
  --connect qemu+ssh://[email protected]/system \
  --name srv34-node \
  --network bridge=br0,model=virtio,mac=C0:FF:EE:D0:5E:34 \
  --disk path=/var/lib/libvirt/images/srv34.qcow2,size=32 \
  --import \
  --ram 8192 \
  --vcpus 4 \
  --os-type linux \
  --os-variant centos8 \
  --sound none \
  --rng /dev/urandom \
  --virt-type kvm \
  --wait 0
Once the server is up, set up passwordless root authentication and run the Ansible playbook to configure the Kubernetes homelab environment:
$ cd ./homelab-ansible
$ ssh-copy-id -f -i ./roles/hl.users/files/id_rsa_root.pub [email protected]
$ ansible-playbook ./playbooks/configure-k8s-hosts.yml
SSH into the newly created worker node srv34 and join the Kubernetes cluster:
$ ssh [email protected] \
  "kubeadm join kubelb.hl.test:6443 --token fkfjv6.hp756ohdx6bv2hll \
  --discovery-token-ca-cert-hash sha256:e98d5740c0ff6d5fd567cba755e27ea57fcc06fd694436a90ad632813351aae1"
Restart kubelet on srv34:
$ ssh [email protected] "systemctl restart kubelet"
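Rather than polling manually, kubectl can wait for the node to report Ready before you carry on:
$ kubectl wait --for=condition=Ready node/srv34 --timeout=5m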
Check cluster status:
$ kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
srv31   Ready    control-plane   109m   v1.26.4   10.11.1.31    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv32   Ready    control-plane   52m    v1.26.4   10.11.1.32    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv33   Ready    control-plane   21m    v1.26.4   10.11.1.33    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv34   Ready    <none>          38s    v1.26.4   10.11.1.34    <none>        Rocky Linux 9.2 (Blue Onyx)        5.14.0-284.18.1.el9_2.x86_64   containerd://1.6.20
srv35   Ready    <none>          348d   v1.26.4   10.11.1.35    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
srv36   Ready    <none>          477d   v1.26.4   10.11.1.36    <none>        Rocky Linux 8.7 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.6.20
Repeat the process for the other two worker nodes, srv35 and srv36.
The end result should be all nodes running Rocky 9:
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion,OS-IMAGE:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion
NAME    VERSION   OS-IMAGE                      KERNEL
srv31   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv32   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv33   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv34   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv35   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
srv36   v1.26.4   Rocky Linux 9.2 (Blue Onyx)   5.14.0-284.18.1.el9_2.x86_64
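As a final sanity check, make sure no pods have been left behind in a bad state anywhere in the cluster; anything that is not Running or Succeeded will show up here, so an empty result is what you want:
$ kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded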