June 20, 2025
CoreOS On-cluster Image Layering in OpenShift 4.19 allows modifications to the node operating system. This detailed walkthrough customizes the node operating system image by adding RPMs to support autofs.
In part 2 we will configure autofs and enable automatic filesystem mounting across cluster nodes.
Background
RHEL CoreOS is a container-optimized operating system that is distributed as a container image. OpenShift leverages this capability to keep the operating system up to date without the need to apply individual patches to nodes in the cluster.
Typically, software is not installed directly into the host operating system but is instead provided via images running in containers managed by Kubernetes. There are cases, however, where it may be desirable to add packages directly to the operating system.
The RHCOS Image Layering feature of OpenShift makes it possible to modify the container image that provides the CoreOS operating system in a manner compatible with the automated node lifecycle management performed by the OpenShift Machine Config Operator.
The Challenge - Missing RPMs
❓ How can I install RPMs on my OpenShift nodes?
I want to run the automount daemon on cluster nodes so I can expose dynamic NFS mounts from existing infrastructure to OpenShift workloads. Once the mounts are triggered by accessing a path on the host, the mounted filesystem can be exposed to pods via a hostPath volume, but the autofs
RPM is not installed on the nodes.
I could run automountd in a container, but that presents challenges: configuring and accessing sssd
for user and automount lookups in LDAP, propagating the mounts back to the host, and taking on responsibility for building and maintaining the image that provides autofs. Having automountd run directly in the node operating system is the preferred solution in this case.
So how do I get the autofs and related RPMs installed on the nodes?
By adding the RPMs to the container image the nodes are running.
💡 Install RPMs by layering on top of the node CoreOS image.
Understanding On-Cluster Layering
Out-of-cluster layering made it possible to build and host a custom CoreOS image for nodes on an external registry.
On-Cluster Layering (OCL) is a GA feature in OpenShift 4.19 that enables the building and management of custom node images entirely within the cluster.
The MachineOSConfig
resource defines the production of the layered image. It specifies the following parameters:
- Association to a single MachineConfigPool
- Registry location to push & pull the layered image
- The registry credentials required to push and pull
- Containerfile (Dockerfile) to build the image
The Containerfile will be embedded in the MachineOSConfig, but on its own it may look like this:
FROM configs AS final
RUN dnf install -y \
autofs \
openldap-clients \
&& dnf clean all \
&& ostree container commit
⭐ Tip
Using a custom layered image does result in more reboots than the default image. Certificate rotation and smaller config changes will cause reboots that are otherwise avoided, and any change to the MachineConfigs requires an image rebuild. Pausing the MachineConfigPool during MachineConfig changes can minimize reboots.
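As an illustration of that pattern (using the worker-automount pool created later in this post), pausing and unpausing looks like this:
# Pause the pool before making a batch of MachineConfig changes
oc patch machineconfigpool/worker-automount \
  --type merge --patch '{"spec":{"paused":true}}'

# ...create or edit MachineConfigs...

# Unpause to roll everything out in a single rebuild/reboot cycle
oc patch machineconfigpool/worker-automount \
  --type merge --patch '{"spec":{"paused":false}}'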
Preparing for On-Cluster Image Layering
Image Registry
The custom image we are creating must be pushed to a container image registry. The on-cluster registry can be used for that if it is enabled. That is what I will use here.
Creating Pull Secrets for the Image Build
Uploading (pushing) an image to a registry or downloading (pulling) an image from one requires a credential called a “pull secret”.
OpenShift includes a global pull secret which is supplied during installation and stored in the openshift-config
namespace. This has privilege to download the base CoreOS image, but we also need a credential to push our custom image to a registry.
The image will be built by the builder ServiceAccount in the openshift-machine-config-operator
namespace, so we need to create a credential and associate it with this user.
Testing revealed that the same secret is also used for pulling the standard images, so we need to combine this new secret with the global pull secret.
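As a rough sketch of how that could be done (the token-based credential and the jq merge are my own approach, not necessarily what the demo uses; the secret names match those referenced later in the MachineOSConfig):
# Sketch: create a push credential for the builder ServiceAccount and
# merge it with the global pull secret.
NS=openshift-machine-config-operator
REGISTRY=image-registry.openshift-image-registry.svc:5000

# Push secret: a registry credential built from a builder ServiceAccount token
oc create secret docker-registry push-secret -n $NS \
  --docker-server=$REGISTRY \
  --docker-username=builder \
  --docker-password="$(oc create token builder -n $NS)"

# Combine it with the global pull secret so the build can also pull base images
oc get secret/pull-secret -n openshift-config \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > global.json
oc get secret/push-secret -n $NS \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d > push.json
jq -s '.[0] * .[1]' global.json push.json > combined.json

oc create secret generic pull-and-push-secret -n $NS \
  --type=kubernetes.io/dockerconfigjson \
  --from-file=.dockerconfigjson=combined.json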
Demo
👀 Watch a demonstration of the above.
Building the CoreOS Image
Creating the worker-automount MachineConfigPool
We need a way to associate the custom image with the nodes of our choosing. This will be done using the MachineConfigPool
(MCP) resource as shown below. This is also how we will associate the added configuration values a bit later. You can learn a bit more about MachineConfigPools in this blog post.
1apiVersion: machineconfiguration.openshift.io/v1
2kind: MachineConfigPool
3metadata:
4  annotations:
5    description: Worker nodes with automount enabled
6  labels:
7    pools.operator.machineconfiguration.openshift.io/worker-automount: ""
8  name: worker-automount
9spec:
10  machineConfigSelector:
11    matchExpressions:
12      - key: machineconfiguration.openshift.io/role
13        operator: In
14        values:
15          - worker
16          - worker-automount
17  nodeSelector:
18    matchLabels:
19      node-role.kubernetes.io/worker-automount: ""
20  paused: true
On line 19 we say that this MachineConfigPool will include nodes that are labeled with “node-role.kubernetes.io/worker-automount”, and on lines 15 and 16 we specify that any MachineConfig
resources labeled with either “worker” or “worker-automount” will be used for those machines. Also notice on line 20 that we are defining this pool as “paused” by default.
🔧 Creating the “worker-automount” MachineConfigPool
oc create -f machineconfigpool.yaml
Because we are not creating any MachineConfig resources yet, the nodes in this pool will be configured just like the existing worker nodes. Once we create the MachineOSConfig
below, however, the nodes in the pool will also have our custom image applied.
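If you want to confirm the pool's state before continuing, a quick check like the following works (the output columns are a convenience of mine, not from the demo):
# Confirm the pool exists, is paused, and has not picked up any nodes yet
oc get machineconfigpool/worker-automount \
  -o custom-columns='NAME:.metadata.name,PAUSED:.spec.paused,MACHINECOUNT:.status.machineCount'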
Creating the worker-automount MachineOSConfig
Here is the MachineOSConfig
which will be associated with the worker-automount MachineConfigPool
above. This will define how to build the image.
Using the pull secrets we created above, it will push the image to the registry at the location specified.
1apiVersion: machineconfiguration.openshift.io/v1
2kind: MachineOSConfig
3metadata:
4  name: worker-automount
5spec:
6  machineConfigPool:
7    name: worker-automount
8  containerFile:
9    - content: |-
10        FROM configs AS final
11        RUN dnf install -y \
12            autofs \
13            libsss_autofs \
14            openldap-clients \
15            && dnf clean all \
16            && ostree container commit
17  imageBuilder:
18    imageBuilderType: Job
19  baseImagePullSecret:
20    # baseImagePullSecret is the secret used to pull the base image
21    name: pull-and-push-secret
22  renderedImagePushSecret:
23    # renderedImagePushSecret is the secret used to push the custom image
24    name: push-secret
25  renderedImagePushSpec: image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image:latest
On line 7 we are specifying that this MachineOSConfig is to be applied to the “worker-automount” pool. In the content section we have the Containerfile instructions. On lines 21 and 24 we reference the secrets holding the necessary registry credentials, and line 25 defines where the built image will be pushed.
🔧 Creating the “worker-automount” MachineOSConfig
oc create -f machineosconfig.yaml
🌳 In the Weeds
Here is what happens when you create a MachineOSConfig
named “worker-automount”:
- A deployment/machine-os-builder is created, which in turn creates a pod/machine-os-builder-<hash>.
- pod/machine-os-builder-<hash> waits to acquire a lease.
- pod/machine-os-builder-<hash> creates a machineosbuild/worker-automount-<hash> resource.
- pod/machine-os-builder-<hash> creates a job/build-worker-automount-<hash>.
- job/build-worker-automount-<hash> creates a pod/build-worker-automount-<hash> to perform the build.
- This build pod's log shows the build progress (see the commands below).
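One way to follow this chain from the CLI (pod and job names include generated hashes, so the grep is just a convenience):
# List the build tracking resource and the builder/build pods
oc get machineosbuild
oc get pods -n openshift-machine-config-operator | grep -E 'machine-os-builder|build-worker-automount'

# Tail the build pod's log to watch the image build progress
BUILD_POD=$(oc get pods -n openshift-machine-config-operator -o name | grep build-worker-automount)
oc logs -n openshift-machine-config-operator -f $BUILD_POD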
Once the image is pushed it is visible in an ImageStream
in the openshift-machine-config-operator namespace.
oc describe imagestream/os-image -n openshift-machine-config-operator
...
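If you prefer a one-liner, the tags on that ImageStream and their digest-pinned references can also be listed directly (a small convenience of my own, not from the demo):
# List the tags pushed to the os-image ImageStream and their digest references
oc get imagestream/os-image -n openshift-machine-config-operator \
  -o jsonpath='{range .status.tags[*]}{.tag}{"\t"}{.items[0].dockerImageReference}{"\n"}{end}'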
Demo
👀 Watch a demonstration of the above.
Deploying the Layered Image
⚠️ Warning!
Until 🐛 OCPBUGS-56648 is repaired, it is important that the custom image is deployed to nodes before any MachineConfigs that refer to new components are added. For example, do not
systemctl enable autofs
until the layered image with autofs installed is fully deployed. Another possible workaround would be to make this change in the Containerfile.
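A sketch of that Containerfile workaround, assuming enabling the unit at image build time is acceptable for your use case:
FROM configs AS final
# Install the packages and enable autofs in the image itself, so no
# separate MachineConfig is needed to enable the unit.
RUN dnf install -y \
    autofs \
    libsss_autofs \
    openldap-clients \
    && systemctl enable autofs \
    && dnf clean all \
    && ostree container commit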
Node Imaging
With the necessary changes layered onto the CoreOS image and that image associated with a pool, the next step is to add a node to the pool and cause the CoreOS image to be redeployed.
This is done by labeling the node with the role label the pool uses to select nodes.
🔧 Adding a Node to the “worker-automount” MachineConfigPool
# 🎯 Select a test node
TEST_WORKER=hub-v57jl-worker-0-jlvfs

# 🏷️ Adjust node-role label & move it to worker-automount pool
oc label node $TEST_WORKER \
  node-role.kubernetes.io/worker- \
  node-role.kubernetes.io/worker-automount=''

# 🔍 worker-automount MCP now has a node count of 1
oc get mcp -o custom-columns='NAME:.metadata.name, MACHINECOUNT:.status.machineCount'
NAME               MACHINECOUNT
master             3
worker             7
worker-automount   1
Now the update can be triggered by unpausing the pool. After this change, any further changes that affect the “worker-automount” MachineConfigPool will automatically be applied.
🔧 Unpausing the worker-automount MachineConfigPool to begin updates
oc patch machineconfigpool/worker-automount \
  --type merge --patch '{"spec":{"paused":false}}'
With the pool unpaused, the MachineConfigDaemon
running on nodes in the pool will begin applying the rendered machineconfig by cordoning and draining the node. The node will then reboot and begin running the custom image.
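The rollout can also be observed at the pool and node level while this happens (purely optional):
# Optionally watch the pool status and the node as it cordons, reboots, and rejoins
oc get machineconfigpool/worker-automount -w
oc get node $TEST_WORKER -w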
The machine config daemon log can be watched as the node is updated.
MCD_POD=$(oc get pods -A -l k8s-app=machine-config-daemon --field-selector=spec.host=$TEST_WORKER -o name)
oc logs -n openshift-machine-config-operator -f $MCD_POD
... 👀 look for something like this at the end...
I0611 17:26:38.595644 3512 update.go:2786] "Validated on-disk state"
I0611 17:26:38.607920 3512 daemon.go:2340] Completing update to target MachineConfig: rendered-worker-automount-5ffdffe14badbefb26817971e15627a6 / Image: image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/os-image@sha256:326d0638bb78f372e378988a4bf86c46005ccba0b269503ee05e841ea127945e
I0611 17:26:48.790395 3512 update.go:2786] "Update completed for config rendered-worker-automount-5ffdffe14badbefb26817971e15627a6 and node has been successfully uncordoned"
Once the node is back online, check to confirm that the autofs RPM is now present.
# ✅ Verify that the autofs RPM now exists on the node
oc debug node/$TEST_WORKER -- chroot /host rpm -q autofs 2>/dev/null
autofs-5.1.7-60.el9.x86_64
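The other packages from the Containerfile can be verified the same way:
# Check all three layered packages in one pass
oc debug node/$TEST_WORKER -- chroot /host \
  rpm -q autofs libsss_autofs openldap-clients 2>/dev/null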
👀 Watch a demonstration of the above.
Summary
On-cluster image layering allows the OpenShift node operating system to be customized, with the responsibility for maintaining the image handled by the cluster itself.
🚀 Now that we have added the RPMs to support it, we can proceed to configure autofs by teaching sssd
how to talk to our LDAP server to look up users and automount maps, and by accounting for CoreOS requirements.
Be sure to check out part 2 of this series where we will do exactly that!