OpenShift High Availability - Routing

March 1, 2016

Highly available containers in OpenShift are baked into the cake thanks to replication controllers and service load balancing, but there are plenty of other single points of failure. Here is how to eliminate many of them.

Single Points of Failure

The components of OpenShift include:

  • Master controller manager server and API endpoint
  • Etcd configuration and state storage
  • Docker Registry
  • Router (haproxy)

This post is mostly about adding high availability to the routing layer.

OpenShift High Availability Configuration

What do we have to do?


Host Inventory and Installation

Of course you’ll be doing the advanced install, which leverages the OpenShift Ansible playbooks.

An example installation might look like this:

  • 3 master nodes
  • 2 infrastructure nodes
  • 3 primary nodes
  • 3 etcd servers
  • 2 load balancers

Here is an overview of the hosts with IP addresses and labels.

Infrastructure Nodes

The infrastructure nodes will be used to run non-user pods, like haproxy routers.

Host                          IP           Labels
ose-ha-node-01.example.com    192.0.2.1    region=infra, zone=metal
ose-ha-node-02.example.com    192.0.2.2    region=infra, zone=metal

Primary Nodes

The primary nodes will run user application pods.

Host                          IP           Labels
ose-ha-node-03.example.com    192.0.2.3    region=primary, zone=rhev
ose-ha-node-04.example.com    192.0.2.4    region=primary, zone=rhev
ose-ha-node-05.example.com    192.0.2.5    region=primary, zone=rhev

Master Nodes

The master servers act as the API endpoint and can be load balanced by independent load balancer nodes or by dedicated hardware. One master is elected to run the controller manager server.

Host                            IP            Labels
ose-ha-master-01.example.com    192.0.2.21    region=infra, zone=rhev
ose-ha-master-02.example.com    192.0.2.22    region=infra, zone=rhev
ose-ha-master-03.example.com    192.0.2.23    region=infra, zone=rhev

Load Balancer Servers

These hosts run haproxy and sit in front of the masters, using the cluster hostname defined in the hosts file.


Etcd Servers

Etcd is used to maintain all state for the cluster, and is configured as a standalone cluster.


Hosts Inventory File

And here is the inventory file based on the examples.


openshift_master_identity_providers=[{'name': 'my_ldap_provider', 'challenge': 'true', 'login': 'true', 'kind': 'LDAPPasswordIdentityProvider', 'attributes': {'id': ['dn'], 'email': ['mail'], 'name': ['cn'], 'preferredUsername': ['uid']}, 'bindDN': '', 'bindPassword': '', 'ca': '', 'insecure': 'true', 'url': 'ldap://,'}]




ose-ha-master-[01:03] openshift_node_labels="{'region': 'infra', 'zone': 'rhev'}" openshift_schedulable=False
ose-ha-node-[01:02]   openshift_node_labels="{'region': 'infra', 'zone': 'metal'}"
ose-ha-node-[03:05]   openshift_node_labels="{'region': 'primary', 'zone': 'rhev'}"
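Lines like ose-ha-node-[01:02] use Ansible's numeric host-range pattern. It expands to one host per number and keeps the leading zero. The sketch below reproduces that expansion in POSIX sh, which is handy for double-checking which hosts an inventory line covers. expand_host_range is a hypothetical helper of my own, not part of openshift-ansible. It handles a single [start:end] range and ignores Ansible's optional stride and alphabetic ranges.

```sh
# Hypothetical helper: expand one Ansible-style numeric range such as
# "ose-ha-node-[01:03]" into one hostname per line, zero-padded to the
# width of the start value. Patterns without a [start:end] range are
# printed unchanged. The subshell body keeps its variables private.
expand_host_range() (
  pattern=$1
  case $pattern in
    *\[*:*\]*)
      prefix=${pattern%%"["*}      # text before "["
      rest=${pattern#*"["}         # e.g. "01:03].example.com"
      range=${rest%%"]"*}          # e.g. "01:03"
      suffix=${rest#*"]"}          # text after "]"
      start=${range%%:*}
      end=${range#*:}
      width=${#start}
      # expr parses decimal, so "08" and "09" are not treated as octal.
      i=$(expr "$start" + 0)
      last=$(expr "$end" + 0)
      while [ "$i" -le "$last" ]; do
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + 1))
      done
      ;;
    *)
      printf '%s\n' "$pattern"
      ;;
  esac
)
```

For example, expand_host_range 'ose-ha-node-[03:05]' prints ose-ha-node-03, ose-ha-node-04 and ose-ha-node-05, which are the primary nodes listed above.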

Perform the Install

Run my prep playbook and then run the byo playbook to perform the actual install.


Initial DNS Configuration

Access to applications like starts with a wildcard DNS A record in your domain, *, pointing to a router pod. The router pod should be assigned to an infrastructure node, since the router container binds haproxy to the node's host ports.

The DNS records should point to the router pods which are using the infrastructure node host ports. That means the DNS record should point to the IP of the infrastructure node(s). But what if that node fails? Don’t worry about that just yet.

Using nsupdate and a key which is allowed to manipulate our zone, let’s insert a * wildcard, and point the name ose-master at the IP of the first load balancer node (for now).

nsupdate -v -k
    update add *      300 A
    update add 300 A

HA Routing

Of course, if DNS points at the IP of a single node, your apps become unavailable whenever that node reboots. That can be fixed with an IP failover service and floating IPs.

The result will look like this:

[Figure: OpenShift HA Routing]

Create an HA router set for the application pods in the primary region. The routers will run on the schedulable nodes in the infra region.

OpenShift’s ipfailover internally uses keepalived, so ensure that multicast is enabled on the labeled nodes, specifically for the VRRP multicast IP address.
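keepalived uses VRRP to decide which node owns each floating IP. The IANA-assigned IPv4 multicast group for VRRP is 224.0.0.18. If a node's firewall drops that traffic, both nodes may claim the same IPs.

Below is a minimal sketch of a helper, allow_vrrp (a hypothetical name), that prints the firewall rule and runs it only when APPLY=1. It assumes two things you should adjust for your hosts: the nodes use plain iptables, and eth0 is the interface that carries VRRP.

```sh
# Hypothetical helper: print (or, with APPLY=1 and root, run) an iptables
# rule that accepts VRRP multicast traffic for 224.0.0.18 arriving on the
# given interface. The rule is inserted at runtime only; it is not saved.
allow_vrrp() {
  iface=${1:?usage: allow_vrrp INTERFACE}
  if [ "${APPLY:-0}" = 1 ]; then
    iptables -I INPUT -i "$iface" -d 224.0.0.18/32 -j ACCEPT
  else
    echo "iptables -I INPUT -i $iface -d 224.0.0.18/32 -j ACCEPT"
  fi
}
```

Run APPLY=1 allow_vrrp eth0 as root on each labeled node. Because the rule does not survive a reboot, persist it however your hosts manage iptables, for example in /etc/sysconfig/iptables on RHEL 7.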

Label the nodes ha-router=primary so they can be selected for the service:

oc label nodes ose-ha-node-0{1,2} "ha-router=primary"

# confirm the change
oc get nodes --selector='ha-router=primary'
NAME                         LABELS                                        STATUS    AGE
                             ha-router=primary,,region=infra,zone=rhev     Ready     3d
                             ha-router=primary,,region=infra,zone=rhev     Ready     3d

Infrastructure Nodes

Host                          IP           Labels
ose-ha-node-01.example.com    192.0.2.1    region=infra, zone=metal, ha-router=primary
ose-ha-node-02.example.com    192.0.2.2    region=infra, zone=metal, ha-router=primary

Use the router service account (or optionally create an ipfailover service account) to create the router. Check that it is listed among the users of the privileged SCC:

oc get scc privileged -o json | jq .users
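The jq filter prints the privileged SCC's users as a JSON array. Service account user names take the form system:serviceaccount:PROJECT:NAME. Assuming the router service account lives in the default project, the entry to look for is system:serviceaccount:default:router.

Here is a small sketch of that check. scc_has_user is a hypothetical helper that reads the array on stdin.

```sh
# Hypothetical helper: read the JSON array printed by
#   oc get scc privileged -o json | jq .users
# on stdin and succeed only if the exact quoted user name appears in it.
# Matching the surrounding quotes avoids false hits such as "...:router2".
scc_has_user() (
  user=${1:?usage: scc_has_user USER}
  grep -qF -e "\"$user\""
)
```

Usage: oc get scc privileged -o json | jq .users | scc_has_user system:serviceaccount:default:router && echo "router can run privileged"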

Since we have 2 infrastructure (region=infra) nodes labeled ha-router=primary, let’s start 2 replicas of a router called ha-router-primary.
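Rather than hard-coding the replica count, you can derive it from the same node listing used to confirm the labels. The rough sketch below uses count_selected_nodes, a hypothetical helper that counts the non-header rows of oc get nodes output. It deliberately ignores the columns, since their layout differs between releases.

```sh
# Hypothetical helper: count node rows in `oc get nodes --selector=...`
# output read from stdin, skipping the header line and any blank lines.
count_selected_nodes() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}
```

For example, set replicas=$(oc get nodes --selector='ha-router=primary' | count_selected_nodes) and pass --replicas="$replicas" to the router and ipfailover commands.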

Go get a legit wildcard cert for * instead of generating one, and concatenate the cert, key, and intermediate certs into a pem file.

cat \ \ \
  gd_bundle-g2-g1.crt \
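The concatenation above is easy to get subtly wrong, so here is a hedged sketch of the same step as a reusable function. build_router_pem and the example file names are hypothetical.

The function:

  • keeps the order described above (certificate, key, intermediates);
  • adds a newline after any input file that lacks one, since otherwise an END marker would be glued onto the next BEGIN marker;
  • checks that the result holds at least two certificates and exactly one private key.

It does not verify that the key actually matches the certificate.

```sh
# Hypothetical helper: concatenate the server certificate, its private key
# and the intermediate bundle (in that order) into one PEM file, then check
# that it holds >= 2 certificates and exactly 1 private key.
build_router_pem() {
  if [ "$#" -ne 4 ]; then
    echo "usage: build_router_pem CERT KEY BUNDLE OUT.pem" >&2
    return 2
  fi
  out=$4
  : > "$out" || return 1
  chmod 600 "$out"                      # the file will hold a private key
  for f in "$1" "$2" "$3"; do
    cat "$f" >> "$out" || return 1
    # Add a newline if the input lacks one, so PEM markers never merge.
    if [ -n "$(tail -c 1 "$f")" ]; then
      echo >> "$out"
    fi
  done
  certs=$(grep -c -e '-----BEGIN CERTIFICATE-----' "$out")
  keys=$(grep -c -e '-----BEGIN .*PRIVATE KEY-----' "$out")
  if [ "$certs" -lt 2 ] || [ "$keys" -ne 1 ]; then
    echo "unexpected PEM contents: $certs certificate(s), $keys key(s)" >&2
    return 1
  fi
}
```

Usage, with hypothetical file names: build_router_pem star.crt star.key gd_bundle-g2-g1.crt star.pem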

Deploy the router with the wildcard cert as the default certificate.

oadm router ha-router-primary \
    --replicas=2 \
    --selector="ha-router=primary,region=infra" \
    --labels="ha-router=primary" \
    --credentials=/etc/origin/master/openshift-router.kubeconfig \
    --service-account=router
password for stats user admin has been set to cixBqxbXyz
DeploymentConfig "ha-router-primary" created
Service "ha-router-primary" created

Edit the deployment config for the HA router and set the router container's DEFAULT_CERTIFICATE environment variable to the contents of the PEM file.

oc edit dc ha-router-primary

From the initial install there will be a pre-existing router (router-1) holding host ports 80 and 443, which prevents the HA router pods from starting. Scale that old router down to 0 pods:

oc scale --replicas=0 rc router-1

Pick 2 IP addresses that will float between the 2 infra nodes, and create an IP failover service.

IP Failover Nodes

IP Failover Service    IPs    Labels

Create an IP failover configuration named ipf-ha-router-primary with N replicas, where N is the number of nodes labeled ha-router=primary:

oadm ipfailover ipf-ha-router-primary \
    --replicas=2 \
    --watch-port=80 \
    --selector="ha-router=primary" \
    --virtual-ips="" \
    --credentials=/etc/origin/master/openshift-router.kubeconfig \
    --service-account=router \
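The --virtual-ips value is blank in the command above. I believe oadm ipfailover accepts either a comma-separated list or a range in the last octet. An example would be 192.0.2.101-102, a hypothetical pair of free addresses. Treat that syntax as an assumption and confirm it with oadm ipfailover --help.

Expanding the value before deploying is a cheap way to see exactly which addresses will float, and therefore which A records the wildcard DNS entry needs. Here is a sketch using expand_vips, a hypothetical helper.

```sh
# Hypothetical helper: expand a --virtual-ips style value into one address
# per line. Items are comma-separated; each is either one IPv4 address or
# a last-octet range such as 192.0.2.101-102 (range syntax is assumed).
expand_vips() (
  set -f                           # no globbing while splitting the list
  IFS=,
  for item in $1; do
    case $item in
      *.*.*.*-*)
        base=${item%.*}            # first three octets, e.g. 192.0.2
        octets=${item##*.}         # e.g. 101-102
        first=${octets%-*}
        last=${octets#*-}
        i=$first
        while [ "$i" -le "$last" ]; do
          echo "$base.$i"
          i=$((i + 1))
        done
        ;;
      *)
        echo "$item"
        ;;
    esac
  done
)
```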

Keepalived Readiness Probe

As of OSE 3.2 the oc status -v command will warn you that there is no readiness probe defined for this ipf-ha-router-primary deployment config. I tried to resolve that warning with this probe, but deployment failed with a connection refused to port 1985 on the node host IP.

oc set probe dc/ipf-ha-router-primary --readiness --open-tcp=1985

OpenShift HA DNS Configuration

Wildcard DNS records point users to the OpenShift routers providing ingress to application services.

Update the DNS wildcard records to reflect the floating IPs instead of the infra nodes’ primary IPs.

nsupdate -v -k
    update delete * 300 A
    update add    * 300 A
    update delete * 300 A
    update add    * 300 A
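The record swap can also be generated instead of typed by hand. nsupdate_swap is a hypothetical helper that prints:

  • one update delete line per old address;
  • one update add line per new address;
  • a final send, which submits the batch explicitly.

The inputs in the usage line are:

  • *.apps.example.com. and the key path are placeholders;
  • 192.0.2.1 and 192.0.2.2 are the infra node IPs from the tables above;
  • 192.0.2.101 and 192.0.2.102 are hypothetical floating IPs.

```sh
# Hypothetical helper: print an nsupdate batch that replaces the A records
# for one name: delete each old address, add each new one, then send.
# Usage: nsupdate_swap NAME TTL "OLD_IPS" "NEW_IPS"
nsupdate_swap() (
  name=$1 ttl=$2 old=$3 new=$4
  for ip in $old; do
    printf 'update delete %s %s A %s\n' "$name" "$ttl" "$ip"
  done
  for ip in $new; do
    printf 'update add %s %s A %s\n' "$name" "$ttl" "$ip"
  done
  echo send
)
```

Usage: nsupdate_swap '*.apps.example.com.' 300 '192.0.2.1 192.0.2.2' '192.0.2.101 192.0.2.102' | nsupdate -v -k /path/to/ddns.key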

HA Master


If an lb group is defined in the Ansible inventory, an haproxy node will be set up to load balance the master API endpoint. That load balancer becomes a single point of failure, however.

HA Registry


This is pretty easy. I’ll post about it at some point.
