OpenShift Virtualization on vSphere

May 13, 2022

OpenShift Virtualization builds upon KubeVirt to provide a container-native home for your virtual machine workloads. While bare metal is the only officially supported platform today, this post will walk through enabling OpenShift Virtualization on vSphere in a lab environment. With nested virtualization you’ll be able to spin up containerized VMs bridged to your physical networks.

Understanding OpenShift Virtualization

Why virtual machines in containers?!

As you migrate applications to a containerized, cloud-native platform like OpenShift, you begin to realize benefits like portability, repeatability, and automation. Some applications hosted on virtual machines, however, may be impractical to containerize, or may not be compatible with containers at all.

OpenShift Virtualization enables you to run your virtualized workloads on the same platform powering your containerized workloads using a consistent Kubernetes interface.

To experiment with container-native virtualization on vSphere, let’s begin by enabling a suitable network configuration.

Configuring vSphere Networking for KubeVirt

Virtual machines attach to the pod network by default. You likely already have several networks plumbed to your ESXi hosts, and you may want to attach your containerized virtual machines to these same networks. VLAN tagging makes this possible: with Virtual Switch Tagging (VST) the vSwitch adds and strips the VLAN tag on behalf of the guest, while Virtual Guest Tagging (VGT) carries the tag all the way from the physical rack switch through the vSwitch to the guest. The trunk configured below relies on VGT.

Adding a PortGroup to vSwitch

Follow the VMware documentation to configure your standard or distributed vSwitch by adding a PortGroup that carries all the VLANs you would like to be present in the OpenShift Virtualization environment. For a standard vSwitch, set the VLAN ID to 4095 to carry all VLANs. For a distributed vSwitch, set the VLAN type to “VLAN trunking” and specify the appropriate range, or 0-4094 to carry all.

⚠️ IMPORTANT Enable promiscuous mode

Switches improve network efficiency by learning where MAC addresses are and not sending traffic where it isn’t needed. Because our virtual machines will be using MAC addresses that vSphere does not know about, you will see failures such as no response to DHCP or ARP requests unless you enable promiscuous mode in the security settings of the network or PortGroup.
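On a standalone ESXi host with a standard vSwitch, the same settings can be applied from the ESXi shell instead of the client. The following is a minimal sketch rather than a tested procedure: the port group name Trunk matches the rest of this post, vSwitch0 is an assumed switch name, and the function name is hypothetical. It also allows forged transmits and MAC address changes, which bridged guests commonly need in addition to promiscuous mode; treat that as an assumption to verify in your environment. A distributed vSwitch is configured through vCenter instead.

```shell
# Sketch: create a trunk port group on a standard vSwitch and relax its
# security policy. Intended to run on the ESXi host itself.
configure_trunk_portgroup() {
  pg="$1"        # port group name, e.g. Trunk
  vswitch="$2"   # standard vSwitch name, e.g. vSwitch0 (assumed)

  # Create the port group on the vSwitch.
  esxcli network vswitch standard portgroup add \
    --portgroup-name="$pg" --vswitch-name="$vswitch" || return 1

  # VLAN ID 4095 passes all VLAN tags through to the guest.
  esxcli network vswitch standard portgroup set \
    --portgroup-name="$pg" --vlan-id=4095 || return 1

  # Accept frames for MAC addresses that vSphere does not know about.
  esxcli network vswitch standard portgroup policy security set \
    --portgroup-name="$pg" \
    --allow-promiscuous=true \
    --allow-forged-transmits=true \
    --allow-mac-change=true
}

# Usage (on the ESXi host):
#   configure_trunk_portgroup Trunk vSwitch0
```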

Trunk Port Group for Standard vSwitch

Distributed vSwitch Option

Why did we do that?

Now we can create guests for our OpenShift nodes which have a 2nd network interface card. When this NIC is attached to the newly created “Trunk” port group, it will receive an 802.1Q trunk from the vSwitch. The VLANs on that trunk may then be split back out to bridges in the node, which provide connectivity to containerized virtual machines.

Customizing the OpenShift RHCOS Node Template

First, let’s restate what we are going to achieve: A virtual machine in a container on a virtual machine on a physical ESXi host. For this to work, our Guest (OpenShift Node) running in vSphere needs to know how to “do virtualization”.

Things are beginning to feel a bit recursive. 😵 Don’t worry. We’ll get there.

It all starts with a template…

Cloning the Existing RHCOS Template as a VM

The OpenShift Machine API operator builds nodes in vSphere by cloning a guest template that was created during cluster installation. This template does not include settings required for nested virtualization.

Clone the “*rhcos” template to a virtual machine so that it is possible to make edits. Give the VM a name that matches the template with “-cnv” on the end. For example, “hub-7vxwj-rhcos” becomes “hub-7vxwj-rhcos-cnv”.

📓 We are using “cnv” as shorthand for Container Native Virtualization which predates the OpenShift Virtualization name.

Customizing the Temporary VM

This VM is temporary. Don’t boot it. We just want to use it to make some changes that aren’t possible to make on a static template.

Make These Changes

  • Enable these CPU features: Hardware virtualization, IOMMU, Performance counters
  • Add a 2nd NIC attached to the Trunk portgroup

CNV Node Template with Customizations

Converting the Customized VM to a Template

Once these changes have been made, convert this VM to a template. Keep the same hub-7vxwj-rhcos-cnv name.
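If you prefer the command line to the vSphere client, the clone, customize, and convert steps above might be scripted with govc, the CLI from the govmomi project. This is a sketch under assumptions: govc is installed and pointed at vCenter through its GOVC_URL and credential environment variables (a resource pool and datastore may also need to be set, since templates have none of their own), the vmxnet3 adapter type is an arbitrary choice, and the function name is hypothetical. Only hardware virtualization is enabled here; the IOMMU and performance counter options are left to the vSphere client.

```shell
# Sketch: build the "-cnv" template from the installer-created RHCOS template.
# Assumes govc is configured via GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD.
make_cnv_template() {
  src="$1"          # existing template, e.g. hub-7vxwj-rhcos
  dst="${src}-cnv"  # new template name
  trunk="$2"        # trunk port group, e.g. Trunk

  # Clone the template to a VM and leave it powered off.
  govc vm.clone -vm "$src" -on=false "$dst" || return 1

  # Expose hardware virtualization to the guest (nested virtualization).
  # IOMMU and performance counters are not covered by this sketch.
  govc vm.change -vm "$dst" -nested-hv-enabled=true || return 1

  # Add a 2nd NIC attached to the trunk port group.
  govc vm.network.add -vm "$dst" -net "$trunk" -net.adapter vmxnet3 || return 1

  # Convert the customized VM back into a template.
  govc vm.markastemplate "$dst"
}

# Usage:
#   make_cnv_template hub-7vxwj-rhcos Trunk
```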

Creating a MachineSet for Hypervisors

How do we tell OpenShift to use this template?

MachineSets define how many machines to provision as worker nodes and exactly how to build them.

Based on the existing worker MachineSet, create a new one that is CNV specific. This MachineSet will use the newly created template.

# copy the existing worker MachineSet
INFRA_ID=$(oc get infrastructure/cluster -o jsonpath='{.status.infrastructureName}')
echo "$INFRA_ID"   # prints hub-7vxwj in this lab
oc get "machineset/${INFRA_ID}-worker" -n openshift-machine-api -o yaml > "${INFRA_ID}-cnv.yaml"
# edit the copy to look like the example below, then create it
vi "${INFRA_ID}-cnv.yaml"
oc create -f "${INFRA_ID}-cnv.yaml" -n openshift-machine-api

📓 MachineSet For Workers With Virtualization

Notice that the second entry under network.devices refers to the Trunk port group and that the template field refers to the virtual machine template hub-7vxwj-rhcos-cnv created above. Server-generated fields such as resourceVersion are left out of the copy, because oc create rejects a manifest that sets them.

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    machine.openshift.io/memoryMb: "16384"
    machine.openshift.io/vCPU: "6"
  labels:
    machine.openshift.io/cluster-api-cluster: hub-7vxwj
  name: hub-7vxwj-cnv
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: hub-7vxwj
      machine.openshift.io/cluster-api-machineset: hub-7vxwj-cnv
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: hub-7vxwj
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: hub-7vxwj-cnv
    spec:
      metadata:
        labels:
          machine.openshift.io/cluster-api-machineset: hub-7vxwj-cnv
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 90
          kind: VSphereMachineProviderSpec
          memoryMiB: 16384
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: lab-192-168-4-0-b24
            - networkName: Trunk
          numCPUs: 6
          numCoresPerSocket: 1
          snapshot: ""
          template: hub-7vxwj-rhcos-cnv
          userDataSecret:
            name: worker-user-data
          workspace:
            datacenter: Garden
            datastore: VMData-HD
            folder: /Garden/vm/hub-7vxwj
            resourcePool: /Garden/host/Goat/Resources
            server: vcenter.lab.bewley.net
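Once the new machine joins the cluster, it is worth confirming that the virtualization extensions actually reached the node. A small sketch: the helper below (a hypothetical name) counts the logical CPUs in a cpuinfo-format file that advertise Intel VT-x (vmx) or AMD-V (svm). A count of zero would suggest the template customization did not take effect.

```shell
# Count logical CPUs whose flags include vmx (Intel VT-x) or svm (AMD-V).
# Reads /proc/cpuinfo unless another cpuinfo-format file is given.
count_virt_cpus() {
  grep -E -c '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}"
}

# On a node, the same check can be run through a debug pod, for example:
#   oc debug node/<node-name> -- chroot /host \
#     grep -E -c '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' /proc/cpuinfo
```

The count should match the number of vCPUs assigned to the node when nested virtualization is exposed to every CPU.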

Configuring OpenShift Virtualization Networking

Install OpenShift Virtualization using the web UI or GitOps and this repo.

Once the operator is installed and a HyperConverged resource has been created, the nmstate.io API group will become available.

📓 nmstate.io API Group Resources for node network configuration

$ oc api-resources --api-group nmstate.io
NAME                                 SHORTNAMES   APIVERSION           NAMESPACED   KIND
nodenetworkconfigurationenactments   nnce         nmstate.io/v1beta1   false        NodeNetworkConfigurationEnactment
nodenetworkconfigurationpolicies     nncp         nmstate.io/v1beta1   false        NodeNetworkConfigurationPolicy
nodenetworkstates                    nns          nmstate.io/v1beta1   false        NodeNetworkState

Creating a Node Network Configuration Policy

If we want to use all the VLANs we are trunking to a node, we need to tell OpenShift how to configure the NIC for all those networks. Using resources from the NMState API, we can configure the networking in the node operating system.

Create a NodeNetworkConfigurationPolicy that will be used to configure the 2nd NIC for us in a way that will present each VLAN as a bridge.

You may optionally log in to the node using oc debug node or ssh, and look at the current network settings before making changes.

Notice in this case that the 2nd NIC ens224 exists, but it has no useful configuration.

Node Network before NNCP

📓 NodeNetworkConfigurationPolicy For Workers With Virtualization

Notice that the nodeSelector checks for a label that is common to the nodes with virtualization support, the same label the MachineSet above applies to its nodes. This will ensure our NNCP is applied only to the appropriate nodes.

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ens224-policy
spec:
  nodeSelector:
    machine.openshift.io/cluster-api-machineset: hub-7vxwj-cnv
  desiredState:
    interfaces:

      # defined only to facilitate disabling DHCP
      - name: ens224
        type: ethernet
        state: up
        ipv4:
          enabled: false
        ipv6:
          enabled: false

      # trans proxy
      - name: ens224.1925
        type: vlan
        state: up
        vlan:
          base-iface: ens224
          id: 1925
        ipv4:
          enabled: false
        ipv6:
          enabled: false
      - name: br-1925
        type: linux-bridge
        state: up
        ipv4:
          enabled: false
        ipv6:
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
          - name: ens224.1925
            vlan: {}

      # disco
      - name: ens224.1926
        type: vlan
        state: up
        vlan:
          base-iface: ens224
          id: 1926
        ipv4:
          enabled: false
        ipv6:
          enabled: false
      - name: br-1926
        type: linux-bridge
        state: up
        ipv4:
          enabled: false
        ipv6:
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
          - name: ens224.1926
            vlan: {}

      # metal
      - name: ens224.1927
        type: vlan
        state: up
        vlan:
          base-iface: ens224
          id: 1927
        ipv4:
          enabled: false
        ipv6:
          enabled: false
      - name: br-1927
        type: linux-bridge
        state: up
        ipv4:
          enabled: false
        ipv6:
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
          - name: ens224.1927
            vlan: {}

      # provisioning
      - name: ens224.1928
        type: vlan
        state: up
        vlan:
          base-iface: ens224
          id: 1928
        ipv4:
          enabled: false
        ipv6:
          enabled: false
      - name: br-1928
        type: linux-bridge
        state: up
        ipv4:
          enabled: false
        ipv6:
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
          - name: ens224.1928
            vlan: {}
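The VLAN and bridge stanzas in this policy differ only by VLAN ID, so for longer VLAN lists it may be easier to generate the file. The following sketch prints an equivalent policy for a list of VLAN IDs. The function name and argument order are hypothetical, and the output omits the per-VLAN comments from the hand-written version.

```shell
# Print a NodeNetworkConfigurationPolicy that presents each VLAN on a trunk
# NIC as its own Linux bridge named br-<id>.
# Usage: gen_nncp <policy-name> <machineset-label-value> <nic> <vlan-id>...
gen_nncp() {
  policy="$1"; machineset="$2"; nic="$3"
  shift 3

  # Header plus the base NIC, which carries no IP configuration of its own.
  cat <<EOF
apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: ${policy}
spec:
  nodeSelector:
    machine.openshift.io/cluster-api-machineset: ${machineset}
  desiredState:
    interfaces:
      # defined only to facilitate disabling DHCP
      - name: ${nic}
        type: ethernet
        state: up
        ipv4:
          enabled: false
        ipv6:
          enabled: false
EOF

  # One VLAN sub-interface and one bridge per VLAN ID.
  for id in "$@"; do
    cat <<EOF
      - name: ${nic}.${id}
        type: vlan
        state: up
        vlan:
          base-iface: ${nic}
          id: ${id}
        ipv4:
          enabled: false
        ipv6:
          enabled: false
      - name: br-${id}
        type: linux-bridge
        state: up
        ipv4:
          enabled: false
        ipv6:
          enabled: false
        bridge:
          options:
            stp:
              enabled: false
          port:
          - name: ${nic}.${id}
            vlan: {}
EOF
  done
}

# Usage:
#   gen_nncp ens224-policy hub-7vxwj-cnv ens224 1925 1926 1927 1928 \
#     > nodenetworkconfigurationpolicy.yaml
```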

⚠️ Ambiguous KubeVirt Labels

Ideally, we could rely on a label like kubevirt.io/schedulable: "true", but in my experience that label is not unique to hosts having virtualization extensions. I have opened a bug to find out more: https://bugzilla.redhat.com/show_bug.cgi?id=2081133

Confirming Node Networking Configuration Changes

After creation of the NodeNetworkConfigurationPolicy, a NodeNetworkConfigurationEnactment will be created for each node that satisfies the node selector in the policy (machine.openshift.io/cluster-api-machineset: hub-7vxwj-cnv).

$ oc get nodes -l machine.openshift.io/cluster-api-machineset=hub-7vxwj-cnv
NAME                  STATUS   ROLES    AGE    VERSION
hub-7vxwj-cnv-drhkz   Ready    worker   162m   v1.23.5+9ce5071

$ oc create -f nodenetworkconfigurationpolicy.yaml
nodenetworkconfigurationpolicy.nmstate.io/ens224-policy created

$ oc get nnce
NAME                                STATUS
hub-7vxwj-cnv-drhkz.ens224-policy   Available
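Rather than polling oc get nnce by hand, a script can block until the policy has been applied. This is a sketch, assuming the policy reports an Available condition as kubernetes-nmstate policies do; the function name is hypothetical and the two-minute default timeout is arbitrary.

```shell
# Wait for a NodeNetworkConfigurationPolicy to be applied, then list the
# per-node enactments.
wait_for_nncp() {
  policy="$1"
  timeout="${2:-2m}"   # arbitrary default
  oc wait nncp "$policy" --for=condition=Available --timeout="$timeout" || return 1
  oc get nnce
}

# Usage:
#   wait_for_nncp ens224-policy
```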

Now that the NNCE reports a status of Available, optionally log back into the node and take a look at the network configuration.

Woah! Look at all those interfaces on ens224!

Node Network after NNCE

📓 Node Network Debugging

See Observing node network state

Attaching a Containerized Virtual Machine to a VLAN

All the work above was performed at the cluster level by a cluster admin. Further configuration takes place within the namespaces that host the virtual machines.

Configuring OpenShift Namespace Networking for VMs

For developers to attach CNV virtual machines to the networks plumbed above, we need to create points of attachment in the namespaces they have access to.

The NetworkAttachmentDefinition resource provides virtual machines a logical reference to the network interfaces we created previously.

$ oc api-resources --api-group k8s.cni.cncf.io
NAME                             SHORTNAMES       APIVERSION           NAMESPACED   KIND
network-attachment-definitions   net-attach-def   k8s.cni.cncf.io/v1   true         NetworkAttachmentDefinition

$ oc explain network-attachment-definition
KIND:     NetworkAttachmentDefinition
VERSION:  k8s.cni.cncf.io/v1

DESCRIPTION:
     NetworkAttachmentDefinition is a CRD schema specified by the Network
     Plumbing Working Group to express the intent for attaching pods to one or
     more logical or physical networks. More information available at:
     https://github.com/k8snetworkplumbingwg/multi-net-spec

📓 NetworkAttachmentDefinition Enabling access to provisioning bridge.

Enables VMs in a namespace to attach to a network using CNI.

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    description: Provisioning Bridge V1928
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br-1928
  name: vlan-1928
  namespace: provisioning
spec:
  config: |-
    {
      "name": "vlan-1928",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "cnv-bridge",
          "bridge":"br-1928",
          "vlan":1928,
          "ipam":{}
        },
        {
          "type": "cnv-tuning"
        }
      ]
    }

Let’s create an attachment to the provisioning network on bridge br-1928.

$ oc new-project provisioning
$ oc create -f net-attach-def.yaml -n provisioning

$ oc get net-attach-def -n provisioning
NAME        AGE
vlan-1928   1m
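Each additional VLAN or namespace needs its own NetworkAttachmentDefinition. As a sketch, the function below prints a manifest with the same shape as the vlan-1928 example for any namespace and VLAN ID. The function name is hypothetical, and the disco namespace in the usage line is only an illustration.

```shell
# Print a NetworkAttachmentDefinition that attaches VMs in <namespace> to
# bridge br-<vlan-id>.
# Usage: gen_net_attach_def <namespace> <vlan-id> <description>
gen_net_attach_def() {
  ns="$1"; id="$2"; desc="$3"
  cat <<EOF
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  annotations:
    description: ${desc}
    k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/br-${id}
  name: vlan-${id}
  namespace: ${ns}
spec:
  config: |-
    {
      "name": "vlan-${id}",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "cnv-bridge",
          "bridge": "br-${id}",
          "vlan": ${id},
          "ipam": {}
        },
        {
          "type": "cnv-tuning"
        }
      ]
    }
EOF
}

# Usage (namespace "disco" is illustrative):
#   gen_net_attach_def disco 1926 "Disco Bridge V1926" | oc create -f -
```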

Once the network attachment definition is available, a VM can be launched using this network on a bridged interface.

Creating a Containerized VM

I’ll leave the details of creating and using VMs in OpenShift for another time, but I will complete the example.

Create a virtual machine in the OpenShift console, and customize the VM by adding a 2nd NIC. Select the vlan-1928 network attachment definition created above.

📓 Select The Proper Namespace

Remember that network attachment definitions are namespace scoped, as are VMs. Select the provisioning namespace when creating the virtual machine.
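For those who prefer manifests to the console, the 2nd NIC amounts to one entry under interfaces and a matching entry under networks. The sketch below prints a minimal VirtualMachine to illustrate this. The function name, the NIC name nic-1, the memory request, and the quay.io/containerdisks/fedora image are illustrative assumptions rather than what was used for this post.

```shell
# Print a minimal VirtualMachine with a pod-network NIC and a 2nd NIC bridged
# to a NetworkAttachmentDefinition in the same namespace.
# Usage: gen_vm <vm-name> <namespace> <net-attach-def-name>
gen_vm() {
  name="$1"; ns="$2"; nad="$3"
  cat <<EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ${name}
  namespace: ${ns}
spec:
  running: false
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
          interfaces:
          - name: default
            masquerade: {}
          - name: nic-1
            bridge: {}
        resources:
          requests:
            memory: 2Gi
      networks:
      - name: default
        pod: {}
      - name: nic-1
        multus:
          networkName: ${nad}
      volumes:
      - name: rootdisk
        containerDisk:
          image: quay.io/containerdisks/fedora:latest  # assumed image
EOF
}

# Usage:
#   gen_vm test-vm provisioning vlan-1928 | oc create -f -
```

The interface and network entries are paired by name, and the multus networkName refers to the network attachment definition in the VM's own namespace.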

VM Dialog: Add 2nd NIC

After booting and logging into the VM, it can be seen that the eth1 NIC obtained an IP address from the DHCP server on the provisioning LAN.

VM console showing eth1 Provisioning LAN

Summary

Having enabled CPU virtualization extensions on the virtual machine template and added trunk support to the vSwitch, you can now launch virtual machines in your OpenShift cluster on vSphere with access to your lab networks.

This is a great way to explore OpenShift Virtualization and experiment with its features as you architect a production use case that leverages bare-metal nodes at a greater scale.

Have fun!
