How do OpenShift Over The Air Updates Work?

March 9, 2021

OpenShift 4 extends the operator pattern introduced by CoreOS, and enables automated management of the Kubernetes cluster and the underlying resources, including machine instances and operating system configuration. Operator-driven over the air updates arrive automatically, much like the updates you are accustomed to receiving on your smartphone. What follows is a technical exploration of the OpenShift over the air updates implementation.

Operators All the Way Down

What is an “Operator”?

An Operator is a method of packaging, deploying, and managing a Kubernetes application.

What is a “Kubernetes application”?

A Kubernetes application is an application that is both deployed on Kubernetes and managed using the Kubernetes APIs.

By creating APIs and Custom Resource Definitions for all aspects of a cluster, OpenShift essentially turns the cluster itself into a “Kubernetes application”.

For example, here is a list of machine-configuration-related CRDs now available for management through the Kubernetes API.

$ oc api-resources --api-group=machineconfiguration.openshift.io
NAME                      SHORTNAMES   APIGROUP                            NAMESPACED   KIND
containerruntimeconfigs   ctrcfg       machineconfiguration.openshift.io   false        ContainerRuntimeConfig
controllerconfigs                      machineconfiguration.openshift.io   false        ControllerConfig
kubeletconfigs                         machineconfiguration.openshift.io   false        KubeletConfig
machineconfigpools        mcp          machineconfiguration.openshift.io   false        MachineConfigPool
machineconfigs            mc           machineconfiguration.openshift.io   false        MachineConfig

Operators Manage the Workloads

The Operator SDK behind this work leverages technologies including Ansible, Helm, and Golang to bundle software and related operational knowledge into a Kubernetes Operator. OpenShift enables easy installation of community and ISV applications from customizable catalog sources such as OperatorHub and Red Hat Marketplace.

After installation of a workload operator, its update or removal is managed by the Operator Lifecycle Manager (OLM). When a new release of the operator is published, OLM facilitates an update to the new operator bundle.
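
Under the hood, OLM tracks each operator’s channel and installed version in Subscription and ClusterServiceVersion resources, which you can inspect directly (namespaces vary by installation):

$ oc get subscriptions --all-namespaces
$ oc get clusterserviceversions -n openshift-operators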

You can see the list of installed optional workload operators with the oc get operators command.

$ oc get operators
NAME                                                  AGE
advanced-cluster-management.open-cluster-management   65d
argocd-operator.dale                                  21d
container-security-operator.openshift-operators       54d
kubevirt-hyperconverged.openshift-cnv                 71d
podium-operator-bundle.dale                           54d
splunk-certified.splunk                               56d

This post will focus less on workload operators and more on cluster operators.

Operators Manage the Platform

There is a set of operators that do not require manual installation. These operators provide the services, such as DNS, etcd, monitoring, and cloud API automation, that make up the OpenShift cluster itself.

They are exposed by the ClusterOperator custom resource. You can see them with the oc get clusteroperators command.

$ oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.12    True        False         False      5m9s
cloud-credential                           4.6.12    True        False         False      102d
cluster-autoscaler                         4.6.12    True        False         False      102d
config-operator                            4.6.12    True        False         False      102d
console                                    4.6.12    True        False         False      2d9h
csi-snapshot-controller                    4.6.12    True        False         False      13h
dns                                        4.6.12    True        False         False      89d
etcd                                       4.6.12    True        False         False      102d
image-registry                             4.6.12    True        False         False      82d
ingress                                    4.6.12    True        False         False      7d18h
insights                                   4.6.12    True        False         False      102d
kube-apiserver                             4.6.12    True        False         False      102d
kube-controller-manager                    4.6.12    True        False         False      102d
kube-scheduler                             4.6.12    True        False         False      102d
kube-storage-version-migrator              4.6.12    True        False         False      74m
machine-api                                4.6.12    True        False         False      102d
machine-approver                           4.6.12    True        False         False      102d
machine-config                             4.6.12    True        False         False      4m30s
marketplace                                4.6.12    True        False         False      3d17h
monitoring                                 4.6.12    True        False         False      84m
network                                    4.6.12    True        False         False      102d
node-tuning                                4.6.12    True        False         False      6d18h
openshift-apiserver                        4.6.12    True        False         False      4m57s
openshift-controller-manager               4.6.12    True        False         False      10d
openshift-samples                          4.6.12    True        False         False      6d18h
operator-lifecycle-manager                 4.6.12    True        False         False      102d
operator-lifecycle-manager-catalog         4.6.12    True        False         False      102d
operator-lifecycle-manager-packageserver   4.6.12    True        False         False      7m22s
service-ca                                 4.6.12    True        False         False      102d
storage                                    4.6.12    True        False         False      77d

By examining these cluster operators with oc describe you can get a sense of their purpose and context. For example, you can see that the “authentication” clusteroperator relates to the namespace “openshift-authentication”. That would be a logical place to look for further details such as events and pod logs.

$ oc describe clusteroperator/authentication
Name:         authentication
Namespace:
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  snip...
Status:
  Related Objects:
    Group:      operator.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   authentications
    Group:      config.openshift.io
    Name:       cluster
    Resource:   infrastructures
    Group:      config.openshift.io
    Name:       cluster
    Resource:   oauths
    Group:      route.openshift.io
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   routes
    Group:
    Name:       oauth-openshift
    Namespace:  openshift-authentication
    Resource:   services
    Group:
    Name:       openshift-authentication
    Resource:   namespaces
    Group:
    Name:       openshift-authentication-operator
    snip...
  Versions:
    Name:     oauth-apiserver
    Version:  4.6.12
    Name:     operator
    Version:  4.6.12
    Name:     oauth-openshift
    Version:  4.6.12_openshift

These operators watch for specific custom resources that they understand how to implement. For example, the authentication operator understands how to turn an authentication custom resource into an appropriately configured service.

$ oc api-resources --api-group=config.openshift.io --sort-by=name
NAME               SHORTNAMES   APIGROUP              NAMESPACED   KIND
apiservers                      config.openshift.io   false        APIServer
authentications                 config.openshift.io   false        Authentication
builds                          config.openshift.io   false        Build
clusteroperators   co           config.openshift.io   false        ClusterOperator
clusterversions                 config.openshift.io   false        ClusterVersion
consoles                        config.openshift.io   false        Console
dnses                           config.openshift.io   false        DNS
featuregates                    config.openshift.io   false        FeatureGate
images                          config.openshift.io   false        Image
infrastructures                 config.openshift.io   false        Infrastructure
ingresses                       config.openshift.io   false        Ingress
networks                        config.openshift.io   false        Network
oauths                          config.openshift.io   false        OAuth
operatorhubs                    config.openshift.io   false        OperatorHub
projects                        config.openshift.io   false        Project
proxies                         config.openshift.io   false        Proxy
schedulers                      config.openshift.io   false        Scheduler

To understand how to define an “authentication”, you can refer to the spec in the API documentation in addition to the product documentation, or take advantage of the explain command, e.g. oc explain DNS; oc explain DNS.spec.
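
Most of these config resources exist as a cluster-scoped singleton named “cluster”, so inspecting the live configuration is a one-liner. For example, for authentication:

$ oc get authentications.config.openshift.io cluster -o yaml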

Cluster Version Operator

Cluster operators like “authentication” and “console” report their status to, and are kept up to date by, the Cluster Version Operator (CVO). The CVO is responsible for orchestrating over the air updates to the cluster. By acting on a payload representing the artifacts that make up an OpenShift release, the CVO ensures that everything from the RHEL CoreOS version on the nodes to the OpenShift web console is running at the same release.
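
The CVO itself runs as an ordinary deployment, which is useful to know when troubleshooting an update; it lives in the openshift-cluster-version namespace:

$ oc -n openshift-cluster-version get pods
$ oc -n openshift-cluster-version logs deploy/cluster-version-operator | tail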

So how does the CVO learn what upgrades are available?

OpenShift Update Service

Updates made available to OpenShift are published using a protocol dubbed Cincinnati. The update service uses integration test results and telemetry data to present only the most reliable update paths.

As updates graduate from candidate to fast to stable (or possibly EUS), they are presented for consumption in the corresponding release channel. If an issue is discovered after release, through testing or telemetry feedback, a release may be pulled and will no longer show up in the graph as a possible update target. This is why it is important to never force an upgrade.
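
Changing channels is itself just an edit to the ClusterVersion resource. A minimal sketch, assuming a fast-4.6 channel exists for your minor version:

$ oc patch clusterversion/version --type merge -p '{"spec":{"channel":"fast-4.6"}}'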

More detail may be found in the documentation and this blog post describing support for a locally hosted OpenShift Update Service.

The source of our update graph can be seen in the ClusterVersion resource named “version”. An alternative command would be oc describe clusterversion. Notice this cluster is tracking the stable-4.6 channel and retrieving the graph from https://api.openshift.com/api/upgrades_info/v1/graph.

$ oc get ClusterVersion/version -o jsonpath="{.spec}" | jq
{
  "channel": "stable-4.6",
  "clusterID": "e640ac63-7a14-4f72-99c5-e363a071e2d8",
  "desiredUpdate": {
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:5c3618ab914eb66267b7c552a9b51c3018c3a8f8acf08ce1ff7ae4bfdd3a82bd",
    "version": "4.6.12"
  },
  "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
}
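
Since the upstream endpoint returns plain JSON, you can query the same graph the CVO sees; a sketch with curl and jq (note the Accept header):

$ curl -s -H 'Accept: application/json' \
    'https://api.openshift.com/api/upgrades_info/v1/graph?channel=stable-4.6' \
    | jq -r '.nodes[].version' | head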

The oc adm upgrade command will use the information from that graph to present any suitable updates. The Red Hat OpenShift Container Platform Update Graph tool generates a dynamic graphical representation of the update options.

$ oc adm upgrade
Cluster version is 4.6.12

Updates:

VERSION IMAGE
4.6.13  quay.io/openshift-release-dev/ocp-release@sha256:8a9e40df2a19db4cc51dc8624d54163bef6e88b7d88cc0f577652ba25466e338
4.6.15  quay.io/openshift-release-dev/ocp-release@sha256:b70f550e3fa94af2f7d60a3437ec0275194db36f2dc49991da2336fe21e2824c
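
When you are ready to move, the same command starts the update. A sketch, targeting a specific version from the graph (the --to-latest flag would instead select the newest available version):

$ oc adm upgrade --to=4.6.15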

Examining the Content of a Release Payload

We can get some info about a release, including the list of container images providing cluster operators, with the oc adm release info command.

$ oc adm release info \
quay.io/openshift-release-dev/ocp-release@sha256:8a9e40df2a19db4cc51dc8624d54163bef6e88b7d88cc0f577652ba25466e338
Name:      4.6.13
Digest:    sha256:8a9e40df2a19db4cc51dc8624d54163bef6e88b7d88cc0f577652ba25466e338
Created:   2021-01-20T13:12:02Z
OS/Arch:   linux/amd64
Manifests: 444

Pull From: quay.io/openshift-release-dev/ocp-release@sha256:8a9e40df2a19db4cc51dc8624d54163bef6e88b7d88cc0f577652ba25466e338

Release Metadata:
  Version:  4.6.13
  Upgrades: 4.5.21, 4.5.22, 4.5.23, 4.5.24, 4.5.27, 4.5.28, 4.6.1, 4.6.3, 4.6.4, 4.6.6, 4.6.8, 4.6.9, 4.6.10, 4.6.12
  Metadata:
    url: https://access.redhat.com/errata/RHBA-2021:0171

Component Versions:
  kubernetes 1.19.4
  machine-os 46.82.202101191342-0 Red Hat Enterprise Linux CoreOS

Images:
  NAME                                           DIGEST
  aws-ebs-csi-driver                             sha256:8f34b0cc5c7554bdfacee78ebcb5747c22dd1bf72eb4dd6007c35830e9062106
  aws-ebs-csi-driver-operator                    sha256:940d4757e6e0a603ef4fafd3d2772306b1d54f318696a35321e46ecaf2998284
  aws-machine-controllers                        sha256:a1d91b36f474b371120ae5af9c67ef97fb06cffe9d6c96ff0d2367c7fe239a43
  aws-pod-identity-webhook                       sha256:66bc5e8411973292c5f155ff5cb67569d2c081499dbf89f5c708504afa1ff600
  azure-machine-controllers                      sha256:f40212ae38936c094dc16dbf2f4d1fefa5f0ee6b68e78afe72af3cfc6b0ce8f6
  baremetal-installer                            sha256:8d2e5b267d9c45a2d808f6437023aceafad8c707cb94e57e325e3f0b2beadd62
  baremetal-machine-controllers                  sha256:36b69cc3989b1b486e7078b544fe7316e92d92ab0f1a139bcb7b1acdd1d54f14
  baremetal-operator                             sha256:ca65a235189a2929df9615995dd5f27a9ab3cda95a311cda9421cc21489e15be
  baremetal-runtimecfg                           sha256:34b95e9d96daeb80c08dbdf8a66f5b8d4a2e51a86879edd91d7bb0b40df67093
  cli                                            sha256:cf339d3bac5ecf4ee836f72bd1b3c4f0f7b0553dda63c3e73e6e2a7f1d1560ae
  cli-artifacts                                  sha256:32b6f0648ed05d5e0109d651d32093209de7db8a116fbe25e0250fac65bd2901
  cloud-credential-operator                      sha256:e490c84ffcfd6a07589eeb56600e04afc2e58edb3459643bd569be50e66e6061
  cluster-authentication-operator                sha256:baa7275273e6a4e2adb75aced5485c880368d9260df03f832a2b0a4c6cb194e3
  ...snip...

Pro Tip: oc adm release info quay.io/openshift-release-dev/ocp-release:4.6.13-x86_64 would have also worked.
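
To dig a level deeper, oc adm release extract will dump the manifests that the CVO applies during an update. A sketch:

$ oc adm release extract --to=/tmp/ocp-4.6.13 \
    quay.io/openshift-release-dev/ocp-release:4.6.13-x86_64
$ ls /tmp/ocp-4.6.13 | head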

Managing the Nodes

The machines which make up the cluster are also managed by operators. The machine-api operator interfaces with the underlying infrastructure provider (OpenStack, AWS, etc.) to provision machines on Day 1, and the machine-config operator manages the resources that implement machine configuration.

Machines are provisioned by a technology called Ignition. Ignition can be thought of as a combination of Kickstart and cloud-init. When a machine boots, it is provided a URL to the machine config server, which serves up an Ignition config. That config includes a reference to the RHCOS image to install on the host along with every file and configuration detail to be applied.
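
You can fetch a pool’s Ignition config from the machine config server yourself. A sketch, assuming api.cluster.example.com is replaced with your cluster’s API hostname (the MCS listens on port 22623, and the Accept header selects the Ignition spec version):

$ curl -k -H 'Accept: application/vnd.coreos.ignition+json;version=3.1.0' \
    https://api.cluster.example.com:22623/config/worker | jq '.ignition.version'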

Machines are grouped into pools, each subscribing to a corresponding MachineConfigPool with a unique machine configuration. Example pools include master and worker, as pictured below.

[Diagram: Machine Config Server]

Since Ignition runs in the initial RAM disk (initrd), it can make low-level changes to storage and networking without requiring a reboot. If Ignition reports any errors during the provisioning phase, the machine will not be placed into service.

The machine-config ClusterOperator has as one of its operands the machine-config-daemon (MCD). The MCD is deployed as a daemonset on all nodes and prevents configuration drift by checking in with the machine-config-server. When a difference is detected between the currently applied machine configuration and the desired configuration, the machine-config-controller (MCC) coordinates the application of the change in a controlled manner.

OpenShift 4 considers nodes to be immutable. That is to say, if they break or require a change, they should be “replaced” or configured in whole rather than in part. A practical effect of this is that the only way to effect a change is to create a MachineConfig, and any change to a machine configuration requires a reapplication of all changes by way of a node reboot. The number of nodes impacted simultaneously may be configured with the maxUnavailable attribute of a MachineConfigPool, which will be honored by the MCC, as sketched below.
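
As a sketch, a MachineConfig that lays down a single file on worker nodes might look like the following (the file name and contents here are made up for illustration). Applying it causes the MCC to drain, update, and reboot the worker nodes in batches:

$ cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-example
  labels:
    machineconfiguration.openshift.io/role: worker   # targets the worker pool
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - path: /etc/example.conf            # hypothetical file
          mode: 420                          # decimal for octal 0644
          contents:
            source: data:,hello%20openshift%0A
EOF
$ oc patch mcp/worker --type merge -p '{"spec":{"maxUnavailable":1}}'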

Updating Node Operating Systems with libostree

As shown above, the OpenShift release image contains a reference to a specific release of RHCOS, e.g. machine-os 46.82.202101191342-0 Red Hat Enterprise Linux CoreOS.

RHEL CoreOS, like RHEL Atomic Host, uses libostree to apply operating system updates in a transactional manner. During an update the MCD downloads the container image referenced by the osImageURL attribute of the Ignition config. This image is essentially a wrapper around an OSTree commit, which is extracted and written to /ostree/deploy/rhcos/deploy/<hash>. The boot configuration is then updated to point to the new OSTree deployment, which takes effect upon reboot.

sh-4.4# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6069ffcc5dcd227c867a86970f3d2076221081a81b6c92fbecfd044ef160161e
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202101131942-0 (2021-01-13T19:45:25Z)
                    Commit: 046de410ba240083996855d797207e09374bbfdf65e5be34baa1e1f1ab49918a
                            |- art-rhaos-4.6 (2021-01-13T17:25:39Z)
                            |- rhel8-fast-datapath (2020-11-23T19:03:06Z)
                            |- rhel8-baseos (2021-01-12T14:14:01Z)
                            `- rhel8-appstream (2021-01-13T10:43:31Z)
                    Staged: no
                 StateRoot: rhcos

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4c5b83a192734ad6aa33da798f51b4b7ebe0f633ed63d53867a0c3fb73993024
              CustomOrigin: Managed by machine-config-operator
                   Version: 46.82.202012051820-0 (2020-12-05T18:24:10Z)
                    Commit: 7e014b64b96536487eb3701fa4f0e604697730bde2f2f9034c4f56c024c67119
                            |- art-rhaos-4.6 (2020-12-05T14:07:54Z)
                            |- rhel8-fast-datapath (2020-11-23T19:03:06Z)
                            |- rhel8-baseos (2020-11-23T17:53:53Z)
                            `- rhel8-appstream (2020-11-30T10:18:29Z)
                 StateRoot: rhcos

Machine Config Daemon Troubleshooting

The MCD is of course running in a pod, so you can watch its logs on each node to observe its actions or check for problems. For example, while unusual (I’ve seen the following only once), the logs add some behind-the-scenes context: after finding a node NotReady, a problem with the boot config was exposed in the logs.

oc project openshift-machine-config-operator
# Check every MCD pod for a mismatch between the booted OS image
# and the osImageURL that the rendered MachineConfig expects.
for POD in `oc get pod -o name -l k8s-app=machine-config-daemon`; do
  echo ----
  echo Checking $POD for osImageURL mismatch
  oc get $POD -o wide
  echo
  oc logs $POD -c machine-config-daemon | grep "expected target osImageURL" | tail -1
done
...snip...
----
Checking pod/machine-config-daemon-nn5p8 for osImageURL mismatch
NAME                          READY   STATUS    RESTARTS   AGE   IP             NODE                  NOMINATED NODE   READINESS GATES
machine-config-daemon-nn5p8   2/2     Running   0          15m   192.168.4.63   vipi-7zc6h-master-1   <none>           <none>

E0209 18:23:14.965590  116436 writer.go:135] Marking Degraded due to: unexpected on-disk state validating against \
 rendered-master-ff38f849e21009792e7db36fe2c27ef9: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6069ffcc5dcd227c867a86970f3d2076221081a81b6c92fbecfd044ef160161e",\
 have "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4c5b83a192734ad6aa33da798f51b4b7ebe0f633ed63d53867a0c3fb73993024"

Updating the boot config allowed the host to come back into compliance.

oc debug node/vipi-7zc6h-master-1
sh-4.4# chroot /host
sh-4.4# rpm-ostree status -v
...snip...
sh-4.4# /run/bin/machine-config-daemon pivot quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6069ffcc5dcd227c867a86970f3d2076221081a81b6c92fbecfd044ef160161e
sh-4.4# reboot

Finally, when the node is rebooted, its annotations are updated to reflect the new state. When the values of the machineconfiguration.openshift.io/currentConfig and machineconfiguration.openshift.io/desiredConfig annotations on the Node match, the MCD is happy and so are we!
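
A quick way to compare those two annotations on a node (the dots in the annotation key must be escaped in JSONPath):

$ oc get node vipi-7zc6h-master-1 -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{"\n"}'
$ oc get node vipi-7zc6h-master-1 -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'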

I could also have simply deleted the node and allowed the machine-api operator to replace it automatically, but I didn’t even talk about MachineSets and node autoscaling! Maybe next time. :)

Summary

This was admittedly heavy on external links and possibly “hand wavy” at points, but I hope it gives you a better sense of exactly how OpenShift Over The Air Updates automate the administration of your entire Kubernetes cluster.

Watch a Video Demonstration

For more information on this topic, including a video demonstration of the OpenShift 4 upgrade process, please see my BrightTALK presentation from September 2020.

OpenShift Over The Air Updates BrightTALK
