Upgrading Homelab Kubernetes Cluster from 1.23 to 1.24

This was our most time-consuming Kubernetes upgrade to date, because dockershim was removed in 1.24.

The Upgrade Path

Our cluster was originally built with Ansible (using kubeadm), so we will upgrade it with kubeadm upgrade.

We will be upgrading from:

kubeadm 1.23.5
kubelet 1.23.5
kubectl 1.23.5
kubernetes-cni 0.8.7
coredns 1.8.6
etcd 3.5.1
calico 3.23
docker-ce 20.10.13 (this will be removed)
Istio 1.13

to:

kubeadm 1.24.6
kubelet 1.24.6
kubectl 1.24.6
kubernetes-cni 1.1.1
coredns 1.8.6
etcd 3.5.3
calico 3.24
containerd 1.6.8 (this will be installed)
Istio 1.14
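kubeadm only supports upgrading one minor version at a time, so the jump from 1.23 to 1.24 is a valid single step. A minimal sketch of that check follows; the function name is_single_minor_step is our own and is not part of kubeadm.

```shell
# Hypothetical helper: succeeds only if TARGET is the same minor release as
# CURRENT or the next one (kubeadm does not support skipping minor versions).
is_single_minor_step() {
  local current="${1#v}" target="${2#v}"          # strip an optional leading "v"
  local cur_major="${current%%.*}" tgt_major="${target%%.*}"
  local cur_rest="${current#*.}" tgt_rest="${target#*.}"
  local cur_minor="${cur_rest%%.*}" tgt_minor="${tgt_rest%%.*}"
  [ "$cur_major" = "$tgt_major" ] || return 1
  local step=$((tgt_minor - cur_minor))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}

if is_single_minor_step 1.23.5 1.24.6; then
  echo "1.23.5 -> 1.24.6: supported upgrade step"
fi
```

A jump such as 1.23.5 to 1.25.2 would make the function return a non-zero status.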

Breaking Changes

Docker runtime support via dockershim in the kubelet is completely removed in 1.24. The kubelet used to include a module called “dockershim”, which implemented CRI support for Docker and had become a maintenance burden for the Kubernetes community. The following dockershim-related flags were removed from the kubelet along with it:

--experimental-dockershim-root-directory
--docker-endpoint
--image-pull-progress-deadline
--network-plugin
--cni-conf-dir
--cni-bin-dir
--cni-cache-dir
--network-plugin-mtu

We will therefore need to remove the --network-plugin flag from the /var/lib/kubelet/kubeadm-flags.env file. We will also have to migrate the container runtime on each node from Docker Engine to containerd.
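Before editing anything, it can help to see which of the removed flags a node still carries. The sketch below scans a file for the flags listed above; the helper name find_removed_flags is our own invention.

```shell
# Flags removed from the kubelet together with dockershim in 1.24.
REMOVED_FLAGS="--experimental-dockershim-root-directory --docker-endpoint
--image-pull-progress-deadline --network-plugin --cni-conf-dir --cni-bin-dir
--cni-cache-dir --network-plugin-mtu"

# Hypothetical helper: print every removed flag still present in the given file.
find_removed_flags() {
  local file="$1" flag
  for flag in $REMOVED_FLAGS; do
    # Require "=", whitespace, a quote or end of line after the flag name so
    # that --network-plugin does not also match --network-plugin-mtu.
    if grep -Eq -- "${flag}(=|[[:space:]]|\"|\$)" "$file"; then
      echo "$flag"
    fi
  done
}

# Usage on a node:
#   find_removed_flags /var/lib/kubelet/kubeadm-flags.env
```

An empty result means the file contains none of the removed flags.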

Also, the kubeadm configuration now defaults to the containerd socket unix:///var/run/containerd/containerd.sock instead of the one for Docker.
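Note that the socket is now expressed as a URL with a unix:// scheme rather than a bare path. kubeadm prepends the scheme by itself and prints a warning when it is missing; the sketch below only illustrates that normalisation, and the function name is hypothetical.

```shell
# Hypothetical helper: add the "unix://" scheme to a bare socket path and
# leave values that already carry a scheme untouched.
normalize_cri_socket() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;
    *)     printf 'unix://%s\n' "$1" ;;
  esac
}

normalize_cri_socket /var/run/dockershim.sock
normalize_cri_socket unix:///var/run/containerd/containerd.sock
```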

Back Up the Cluster

The Kubernetes nodes run on KVM, so we took a KVM snapshot of each virtual machine before starting the upgrade.
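One way to take such snapshots is virsh snapshot-create-as on the KVM host. The sketch below only prints the commands so they can be reviewed first. It assumes that the libvirt domain names match the Kubernetes node names, and the snapshot name is made up for illustration.

```shell
# Assumption: libvirt domain names match the Kubernetes node names.
NODES="srv31 srv32 srv33 srv34 srv35 srv36"
SNAPSHOT_NAME="pre-1.24-upgrade"   # hypothetical snapshot name

# Print (rather than run) one virsh command per node; pipe the output to
# "sh" on the KVM host to execute it.
print_snapshot_commands() {
  local node
  for node in $NODES; do
    echo "virsh snapshot-create-as --domain ${node} --name ${SNAPSHOT_NAME}"
  done
}

print_snapshot_commands
```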

Upgrade Control Plane Nodes

Cluster node status before proceeding:

$ kubectl get no -o wide
NAME    STATUS   ROLES                  AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION              CONTAINER-RUNTIME
srv31   Ready    control-plane,master   110d   v1.23.5   10.11.1.31    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   docker://20.10.13
srv32   Ready    control-plane,master   62d    v1.23.5   10.11.1.32    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   docker://20.10.13
srv33   Ready    control-plane,master   191d   v1.23.5   10.11.1.33    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   docker://20.10.13
srv34   Ready    <none>                 191d   v1.23.5   10.11.1.34    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   docker://20.10.13
srv35   Ready    <none>                 62d    v1.23.5   10.11.1.35    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   docker://20.10.13
srv36   Ready    <none>                 191d   v1.23.5   10.11.1.36    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   docker://20.10.13

Perform kubeadm upgrade

The upgrade procedure on control plane nodes should be executed one node at a time.

We will start with the first control plane node, srv31:

$ sudo yum install -y kubeadm-1.24.6-0 --disableexcludes=kubernetes
$ kubeadm version

Verify the upgrade plan:

$ sudo kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0922 12:23:21.666475  206120 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.5
[upgrade/versions] kubeadm version: v1.24.6
I0922 12:23:27.297081  206120 version.go:255] remote version is much newer: v1.25.2; falling back to: stable-1.24
[upgrade/versions] Target version: v1.24.6
[upgrade/versions] Latest version in the v1.23 series: v1.23.12

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     6 x v1.23.5   v1.23.12

Upgrade to the latest version in the v1.23 series:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.23.5   v1.23.12
kube-controller-manager   v1.23.5   v1.23.12
kube-scheduler            v1.23.5   v1.23.12
kube-proxy                v1.23.5   v1.23.12
CoreDNS                   v1.8.6    v1.8.6
etcd                      3.5.1-0   3.5.3-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.23.12

_____________________________________________________________________

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     6 x v1.23.5   v1.24.6

Upgrade to the latest stable version:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.23.5   v1.24.6
kube-controller-manager   v1.23.5   v1.24.6
kube-scheduler            v1.23.5   v1.24.6
kube-proxy                v1.23.5   v1.24.6
CoreDNS                   v1.8.6    v1.8.6
etcd                      3.5.1-0   3.5.3-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.24.6

_____________________________________________________________________


The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

Upgrade the cluster:

$ sudo kubeadm upgrade apply v1.24.6
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0922 12:25:57.398963  207926 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/var/run/dockershim.sock". Please update your configuration!
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.24.6"
[upgrade/versions] Cluster version: v1.23.5
[upgrade/versions] kubeadm version: v1.24.6
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]: y
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.24.6" (timeout: 5m0s)...
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Renewing etcd-server certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-09-22-12-26-49/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=etcd
[upgrade/staticpods] Component "etcd" upgraded successfully!
[upgrade/etcd] Waiting for etcd to become available
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests1124044511"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-09-22-12-26-49/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-09-22-12-26-49/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2022-09-22-12-26-49/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upgrade/postupgrade] Removing the deprecated label node-role.kubernetes.io/master='' from all control plane Nodes. After this step only the label node-role.kubernetes.io/control-plane='' will be present on control plane Nodes.
[upgrade/postupgrade] Adding the new taint &Taint{Key:node-role.kubernetes.io/control-plane,Value:,Effect:NoSchedule,TimeAdded:,} to all control plane Nodes. After this step both taints &Taint{Key:node-role.kubernetes.io/control-plane,Value:,Effect:NoSchedule,TimeAdded:,} and &Taint{Key:node-role.kubernetes.io/master,Value:,Effect:NoSchedule,TimeAdded:,} should be present on control plane Nodes.
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.24.6". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
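To double-check that the static Pods really run the new version, the image tags can be compared against the target. This is a minimal sketch: the helper name is ours, and it assumes tag-based image references (not digests).

```shell
# Hypothetical helper: read container image references (one per line) on stdin
# and print those whose tag differs from the expected version.
images_not_at_version() {
  local expected="$1"
  awk -v want="$expected" -F: 'NF && $NF != want { print }'
}

# On a live cluster the input could come from the component labels shown in
# the upgrade log, for example:
#   kubectl get po -n kube-system -l component=kube-apiserver \
#     -o jsonpath='{range .items[*]}{.spec.containers[0].image}{"\n"}{end}' \
#     | images_not_at_version v1.24.6
```

No output means every image that was passed in carries the expected tag. Keep in mind that only the node where kubeadm upgrade apply ran is upgraded at this point.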

Next, we need to upgrade our CNI provider plugin. We will upgrade to Calico 3.24, which has been tested against Kubernetes 1.24.

$ kubectl apply -f https://docs.projectcalico.org/archive/v3.24/manifests/calico.yaml
poddisruptionbudget.policy/calico-kube-controllers configured
serviceaccount/calico-kube-controllers unchanged
serviceaccount/calico-node unchanged
configmap/calico-config unchanged
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org configured
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org configured
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrole.rbac.authorization.k8s.io/calico-node configured
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
clusterrolebinding.rbac.authorization.k8s.io/calico-node unchanged
daemonset.apps/calico-node configured
deployment.apps/calico-kube-controllers configured
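Applying the manifest returns before the calico-node DaemonSet has finished rolling out. A small wrapper around kubectl rollout status can wait for it; the function name and the default timeout are our own choices.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: block until the calico-node DaemonSet has finished
# rolling out, or fail once the timeout expires.
wait_for_calico() {
  local timeout="${1:-300s}"
  "$KUBECTL" rollout status daemonset/calico-node -n kube-system --timeout="$timeout"
}

# Usage: wait_for_calico 10m
```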

For the other control plane nodes:

$ sudo yum install -y kubeadm-1.24.6-0 --disableexcludes=kubernetes
$ kubeadm version
$ sudo kubeadm config images pull
$ sudo kubeadm upgrade node

According to the Kubernetes documentation, running kubeadm upgrade plan and upgrading the CNI provider plugin are not needed on the additional control plane nodes.

Migrate from dockershim

Drain the node, then upgrade kubelet and kubectl:

$ export CONTROL_PLANE="srv31"
$ kubectl drain ${CONTROL_PLANE} --ignore-daemonsets --delete-emptydir-data
$ sudo yum install -y kubelet-1.24.6-0 kubectl-1.24.6-0 kubernetes-cni-1.1.1-0.x86_64 --disableexcludes=kubernetes

Remove Docker packages and install containerd. Then change the container runtime on the node from Docker engine to containerd:

$ sudo systemctl stop kubelet
$ sudo systemctl disable docker.service --now
$ sudo yum remove docker-ce docker-ce-cli
$ sudo yum install -y containerd
$ sudo mkdir -p /etc/containerd
$ containerd config default | sudo tee /etc/containerd/config.toml
$ sudo systemctl enable containerd
$ sudo systemctl restart containerd
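Before pointing the kubelet at containerd, it is worth confirming that the socket actually exists. A minimal sketch, with a helper name of our own:

```shell
# Hypothetical helper: succeed only if a Unix socket exists at the given path.
cri_socket_ready() {
  local socket="${1:-/run/containerd/containerd.sock}"
  if [ -S "$socket" ]; then
    echo "socket present: $socket"
  else
    echo "socket missing: $socket" >&2
    return 1
  fi
}

# On the node, after restarting containerd:
#   systemctl is-active containerd && cri_socket_ready
```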

Configure the kubelet to use containerd as its container runtime:

$ sudo sed -i 's|--network-plugin=cni|--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock|g' /var/lib/kubelet/kubeadm-flags.env
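Because sed -i edits the file in place, it can be reassuring to preview the result first. The sketch below performs the same substitution on stdin; the sample line is illustrative, and a real kubeadm-flags.env may carry different flags.

```shell
# Hypothetical helper: same substitution as the in-place sed command, but it
# reads from stdin and writes to stdout so the result can be reviewed first.
preview_kubelet_flags() {
  sed 's|--network-plugin=cni|--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock|g'
}

# Illustrative sample line, not copied from a real node.
echo 'KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6"' \
  | preview_kubelet_flags
```

On a node, the real file can be previewed with preview_kubelet_flags < /var/lib/kubelet/kubeadm-flags.env.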

The kubeadm tool stores the CRI socket for each host as an annotation in the node object for that host. To change it we can execute the following command on a machine that has the kubeadm /etc/kubernetes/admin.conf file:

$ sudo kubectl edit no ${CONTROL_PLANE}

Change the value of kubeadm.alpha.kubernetes.io/cri-socket from /var/run/dockershim.sock to the CRI socket path unix:///run/containerd/containerd.sock.

$ sudo systemctl daemon-reload && sudo systemctl restart kubelet
$ kubectl uncordon ${CONTROL_PLANE}
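When repeating this on several nodes, the interactive kubectl edit step can be replaced with kubectl annotate --overwrite. This is a sketch of that alternative, not what we ran above; the function name is hypothetical.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: overwrite the kubeadm CRI socket annotation on one node.
set_cri_socket_annotation() {
  local node="$1"
  local socket="${2:-unix:///run/containerd/containerd.sock}"
  "$KUBECTL" annotate node "$node" --overwrite \
    "kubeadm.alpha.kubernetes.io/cri-socket=${socket}"
}

# Usage: set_cri_socket_annotation srv32
```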

Repeat the process for control plane nodes srv32 and srv33.

Upgrade Worker Nodes

We will start with the worker node srv34.

Upgrade kubeadm:

$ sudo yum install -y kubeadm-1.24.6-0 --disableexcludes=kubernetes
$ sudo kubeadm upgrade node

Drain the worker node:

$ export WORKER_NODE="srv34"
$ kubectl drain ${WORKER_NODE} --ignore-daemonsets --delete-emptydir-data

Upgrade kubelet and kubectl:

$ sudo yum install -y kubelet-1.24.6-0 kubectl-1.24.6-0 kubernetes-cni-1.1.1-0.x86_64 --disableexcludes=kubernetes

Remove Docker packages and install containerd. Then change the container runtime on the node from Docker engine to containerd:

$ sudo systemctl stop kubelet
$ sudo systemctl disable docker.service --now
$ sudo yum remove docker-ce docker-ce-cli
$ sudo yum install -y containerd
$ sudo mkdir -p /etc/containerd
$ containerd config default | sudo tee /etc/containerd/config.toml
$ sudo systemctl enable containerd
$ sudo systemctl restart containerd

Configure the kubelet to use containerd as its container runtime:

$ sudo sed -i 's|--network-plugin=cni|--container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock|g' /var/lib/kubelet/kubeadm-flags.env

As mentioned previously, the kubeadm tool stores the CRI socket for each host as an annotation in the node object for that host. To change it we can execute the following command on a machine that has the kubeadm /etc/kubernetes/admin.conf file:

$ sudo kubectl edit no ${WORKER_NODE}

Change the value of kubeadm.alpha.kubernetes.io/cri-socket from /var/run/dockershim.sock to the CRI socket path unix:///run/containerd/containerd.sock.

$ sudo systemctl daemon-reload && sudo systemctl restart kubelet

Uncordon the worker node:

$ kubectl uncordon ${WORKER_NODE}

Repeat the process for worker nodes srv35 and srv36.

Verify Cluster Status

Check cluster node status:

$ kubectl get no -o wide
NAME    STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION              CONTAINER-RUNTIME
srv31   Ready    control-plane   110d   v1.24.6   10.11.1.31    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.8
srv32   Ready    control-plane   62d    v1.24.6   10.11.1.32    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.8
srv33   Ready    control-plane   191d   v1.24.6   10.11.1.33    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.8
srv34   Ready    <none>          191d   v1.24.6   10.11.1.34    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.8
srv35   Ready    <none>          62d    v1.24.6   10.11.1.35    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.8
srv36   Ready    <none>          191d   v1.24.6   10.11.1.36    <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64   containerd://1.6.8
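The same check can be scripted instead of read off the table. The sketch below flags any node that is not on the expected kubelet version or is not running containerd; the helper name and the tab-separated input format are our own.

```shell
# Hypothetical helper: read "name<TAB>kubeletVersion<TAB>runtime" lines on
# stdin and print nodes that are not on the expected version or whose runtime
# is not containerd.
nodes_needing_attention() {
  local expected="$1"
  awk -F'\t' -v want="$expected" \
    'NF && ($2 != want || $3 !~ /^containerd:\/\//) { print $1 }'
}

# On a live cluster the input could be produced with:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' \
#     | nodes_needing_attention v1.24.6
```

An empty result means every node is on the target version and on containerd.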

Check Calico pods:

$ kubectl -n kube-system get po -l k8s-app=calico-node
NAME                READY   STATUS    RESTARTS   AGE
calico-node-55z4t   1/1     Running   1          113m
calico-node-592sw   1/1     Running   1          111m
calico-node-9m5zp   1/1     Running   1          109m
calico-node-nnvqn   1/1     Running   1          112m
calico-node-qx4kh   1/1     Running   1          110m
calico-node-r6t6j   1/1     Running   1          111m

Update Istio

Istio canary upgrades are not great because they don’t upgrade sidecars. In production, we build a new Kubernetes cluster using a red/black deployment and install a new version of Istio. For the homelab environment, we will do an in-place upgrade.

Download istioctl binary:

$ wget https://github.com/istio/istio/releases/download/1.14.4/istioctl-1.14.4-linux-amd64.tar.gz
$ tar xf istioctl-1.14.4-linux-amd64.tar.gz 
$ sudo mv istioctl /usr/local/bin/
$ sudo chown root: /usr/local/bin/istioctl

Ensure that the upgrade is compatible with our environment:

$ istioctl x precheck
✔ No issues found when checking the cluster. Istio is safe to install or upgrade!

$ istioctl version
client version: 1.14.4
control plane version: 1.13.2
data plane version: 1.13.2 (15 proxies)

Generate a YAML manifest for Kubernetes:

$ git clone https://github.com/lisenet/kubernetes-homelab.git
$ cd ./kubernetes-homelab/istio
$ istioctl manifest generate -f ./istio-operator.yml > ./istio-kubernetes.yml

Upgrade Istio. The kubectl apply command may show transient errors due to resources not being available in the cluster in the correct order. If that happens, simply run the command again.

$ kubectl apply -f ./istio-kubernetes.yml
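Re-running the command by hand works fine, but the retry can also be scripted. A minimal sketch of a generic retry helper follows; the name, the attempt count and the delay are our own choices.

```shell
# Hypothetical helper: run a command up to N times, pausing between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    echo "attempt $i failed, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
}

# Usage: retry 3 10 kubectl apply -f ./istio-kubernetes.yml
```

Since kubectl apply is declarative, running it again on resources that were already created is harmless.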

Verify:

$ kubectl get po -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istio-ingressgateway-658c8d864c-bdxvq   1/1     Running   0          103s
istio-ingressgateway-658c8d864c-hz96q   1/1     Running   0          103s
istiod-6f644998b7-62tsv                 1/1     Running   0          103s
istiod-6f644998b7-nmzrg                 1/1     Running   0          103s
kiali-c946fb5bc-68dp8                   1/1     Running   0          15m
prometheus-6d496598f9-tdxwl             2/2     Running   0          19m

We should see the updated version on the control plane but a bunch of old proxies (sidecars) on the data plane:

$ istioctl version
client version: 1.14.4
control plane version: 1.14.4
data plane version: 1.14.4 (2 proxies), 1.13.2 (13 proxies)

Restart all pods that have Istio sidecars running so that they pick up the new version of Istio. When done, we should have no old proxy versions running:

$ istioctl version
client version: 1.14.4
control plane version: 1.14.4
data plane version: 1.14.4 (15 proxies)
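The sidecar restart mentioned above can be done with kubectl rollout restart. The sketch below covers Deployments only and takes the namespaces as arguments. It assumes that injection is enabled through the istio-injection=enabled namespace label, which may not match every setup, and the function name is hypothetical.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: restart every Deployment in each namespace given as an
# argument so that the new Pods are injected with the upgraded sidecar.
restart_injected_deployments() {
  local ns
  for ns in "$@"; do
    "$KUBECTL" rollout restart deployment -n "$ns"
  done
}

# Assumption: injected namespaces carry the istio-injection=enabled label.
# They could be listed with:
#   kubectl get ns -l istio-injection=enabled -o jsonpath='{.items[*].metadata.name}'
# Usage: restart_injected_deployments my-app-namespace
```

StatefulSets and DaemonSets with sidecars would need the same treatment.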

References

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/change-runtime-containerd/

https://github.com/kubernetes/kubernetes/pull/106907

https://istio.io/latest/docs/setup/upgrade/in-place/