OpenShift

OpenShift 4 on OpenStack Networking and Installation

OpenShift Container Platform 4 is much more like Tectonic than OpenShift 3, particularly when it comes to installation and node management. Rather than building machines and running an Ansible playbook to configure them, you now have the option of setting a few parameters in an install config and running an installer to build and configure the cluster from scratch.

I would like to illustrate how the basics of the networking might look when installing OpenShift on OpenStack. I also wanted an excuse to try out a new iPad sketch app. These notes are based on recent 4.4 nightly builds on OSP 13 Queens.
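To put the installer-driven flow in concrete terms, it boils down to just a couple of commands. A sketch (the directory name is a placeholder; the interactive prompts cover the OpenStack-specific details):

```shell
# Generate an install-config.yaml interactively; for OpenStack the prompts
# include the cloud, external network, flavor, and API floating IP.
openshift-install create install-config --dir=ocp-osp

# Build the cluster from that config: Nova instances, networks, and the
# bootstrap sequence. Add --log-level=debug to watch the details.
openshift-install create cluster --dir=ocp-osp
```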

Continue reading

Playbook to replace bootstrap.kubeconfig and node certificates on OpenShift 3.10/3.11

If you are a serial upgrader like me, you may have found that at one point during your 3.10.xx patching (say 3.10.119) you hit this error during the data plane upgrade:

TASK [openshift_node : Approve the node] ************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/restart.yml:49
Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_csr_approve.py
...
FAILED - RETRYING: Approve the node (30 retries left).Result was: {
    "all_subjects_found": [],
    "attempts": 1,
    "changed": false,
    "client_approve_results": [],
    "client_csrs": {},
    "failed": true,
    "invocation": {
        "module_args": {
            "node_list": [
                "ose-test-node-01.example.com"
            ],
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig"
        }
    },
    "msg": "Could not find csr for nodes: ose-test-node-01.example.com",
...

It turns out this was because atomic-openshift-node failed to generate a CSR on startup.
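If you hit the same wall, it is worth checking the CSR state by hand before re-running the playbook. A sketch (run on a master; the service name matches OCP 3.10/3.11):

```shell
# List CSRs; a healthy node bootstrap shows Pending or Approved entries
oc get csr

# Inspect why the node service produced no CSR
systemctl status atomic-openshift-node
journalctl -u atomic-openshift-node --since "-1 hour" | grep -i csr

# If CSRs do appear after a restart, approve them all in one shot
oc get csr -o name | xargs oc adm certificate approve
```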

Continue reading

Downgrade Etcd 3.3.11 to 3.2.22 for OpenShift Compatibility

While I was working on migrating etcd to my master nodes, I was bitten by an incompatible etcd v3.3.11 RPM made available via the RHEL Server Extras repo. Before I got to my last master the RPM was no longer available, and the scaleup playbook failed. I then became aware that 3.3.11 is not compatible with OpenShift and should not have been made available.

Unfortunately, all members of my etcd cluster had already been upgraded, and the fix is to take down the cluster, downgrade etcd, and restore from a snapshot. It would be great if the etcd version were pinned the way the Docker version is.

Continue reading

Etcdctl v2 and v3 Aliases for Peer Authenticated Commands

Getting all the arguments to etcdctl right can be a bit of a pain. Here are a couple of aliases which take advantage of the values in the etcd.conf file.

alias etcd2='. /etc/etcd/etcd.conf && \
    ETCDCTL_API=2 etcdctl \
    --cert-file ${ETCD_PEER_CERT_FILE} \
    --key-file ${ETCD_PEER_KEY_FILE} \
    --ca-file ${ETCD_PEER_TRUSTED_CA_FILE:-$ETCD_PEER_CA_FILE} \
    --endpoints "${ETCD_ADVERTISE_CLIENT_URLS}"'

alias etcd3='. /etc/etcd/etcd.conf && \
    ETCDCTL_API=3 etcdctl \
    --cert ${ETCD_PEER_CERT_FILE} \
    --key ${ETCD_PEER_KEY_FILE} \
    --cacert ${ETCD_PEER_TRUSTED_CA_FILE:-$ETCD_PEER_CA_FILE} \
    --endpoints "${ETCD_ADVERTISE_CLIENT_URLS}"'
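With the aliases in place, a quick health check might look like this (note that the v2 and v3 tools use different subcommands):

```shell
# v2 API: overall cluster health and membership
etcd2 cluster-health
etcd2 member list

# v3 API: per-endpoint health and the member table
etcd3 endpoint health
etcd3 member list
```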

If you are using OpenShift, you may also find that you already have some bash functions enabled by the etcd role in /etc/profile.d/etcdctl.sh. They will look different depending on your version. Below is from 3.9.

Continue reading

Migration of Etcd to Masters for OpenShift 3.9 to 3.10 Upgrade

As of OpenShift Container Platform 3.10, etcd is expected to run in static pods on the master nodes in the control plane. You may have deployed an HA cluster with dedicated etcd nodes managed by systemd. How do you migrate to this new architecture?

Assumptions:

  • You are running OCP 3.9
  • You have multiple master nodes
  • You have dedicated etcd nodes
  • You are running RHEL, not Atomic, nodes

Outline:

  • Back up etcd
  • Scale up the etcd cluster to include the master nodes
  • Configure the OpenShift masters to ignore the old etcd nodes
  • Scale down the etcd cluster to remove the old etcd nodes

Detailed Steps

Follow along with this document: https://docs.openshift.com/container-platform/3.9/admin_guide/assembly_replace-etcd-member.html. You may find some etcd aliases handy before proceeding.
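Before touching the cluster, take the backup from the first step of the outline. A sketch using the v3 snapshot facility and the values from etcd.conf (the snapshot path is an example; on 3.9 a copy of the v2 data directory is also prudent):

```shell
# Source the etcd config for cert paths and endpoints
. /etc/etcd/etcd.conf

# Take a v3 snapshot of the keyspace
ETCDCTL_API=3 etcdctl \
    --cert "${ETCD_PEER_CERT_FILE}" \
    --key "${ETCD_PEER_KEY_FILE}" \
    --cacert "${ETCD_PEER_TRUSTED_CA_FILE}" \
    --endpoints "${ETCD_ADVERTISE_CLIENT_URLS}" \
    snapshot save "/var/lib/etcd/snapshot-$(date +%F).db"

# Verify the snapshot is readable before proceeding
ETCDCTL_API=3 etcdctl snapshot status "/var/lib/etcd/snapshot-$(date +%F).db"
```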

Continue reading

Load Balancing of OpenShift HA Routers: Mind the GARP

OpenShift HA Routing uses haproxy application routers to get traffic into the cluster. These application routers are made redundant by running ipfailover (keepalived) pods to maintain a set of Virtual IPs on each infrastructure node where the application routers run. These VIPs are then referenced by round robin DNS records to enable a measure of load balancing.

OK, so now you are load balancing at the network layer, but what about the link layer? Did you know that even if you somehow manage to perfectly balance traffic among the VIPs using RR DNS, you could still be using only one of your application routers? Well, you could be!
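One way to see the problem, and to nudge upstream switches after a failover, is to watch for and send gratuitous ARP yourself. A sketch using iputils arping (the interface and VIP are placeholders):

```shell
# Watch the VIP network for ARP traffic, including gratuitous ARP
tcpdump -n -i eth0 arp

# From the node that should own the VIP, send a gratuitous ARP reply so
# that switches and neighbors update their tables for 192.0.2.10
arping -c 4 -A -I eth0 192.0.2.10
```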

Continue reading

OpenShift 3.6 Upgrade: Metrics Fails with Missing heapster-certs Secret

After your upgrade to OpenShift v3.6 did the deployment of cluster metrics wind up with empty graphs? Check if the heapster pod failed to start due to a missing secret called heapster-certs in the openshift-infra namespace.

Problem

The heapster pod is failing to start:

$ oc get pods
NAME                         READY     STATUS              RESTARTS   AGE
hawkular-cassandra-1-l1f3s   1/1       Running             0          9m
hawkular-metrics-rdl07       1/1       Running             0          9m
heapster-cfpcj               0/1       ContainerCreating   0          3m

Check what volumes it is attempting to mount
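Checking which volumes and secrets the pod wants is straightforward (the pod name here is taken from the output above):

```shell
# A missing secret shows up under Events as a failed mount
oc describe pod heapster-cfpcj -n openshift-infra

# Or pull just the volume list
oc get pod heapster-cfpcj -n openshift-infra -o jsonpath='{.spec.volumes}'

# Confirm whether the secret actually exists
oc get secret heapster-certs -n openshift-infra
```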

Continue reading

Installing OpenShift on OpenStack

This is a work in progress

The OpenShift Container Platform (OCP) can run on many types of infrastructure: from a Docker container, to a single VM, to a fleet of bare metal machines or VMs on an infrastructure provider such as RHV, VMware, Amazon EC2, Google Compute Engine, or OpenStack Platform (OSP). This post is to document my experimentation with setting up OCP on OSP.

Doc Overview

So where are the docs?

Continue reading

How to push an image to an unexposed OpenShift Docker registry

How do I push an image to the OpenShift Docker registry if it is not exposed outside the cluster?

Login to a member node

Get on a machine that has docker and participates in the cluster SDN, or can somehow access that network (e.g. 172.30.0.0/16).

Get the IP of the registry

# Print the cluster IP of the registry service
oc get svc docker-registry -n default --template "{{ .spec.clusterIP }}"
# Capture it for later use
SVC_REGISTRY=$(oc get svc docker-registry -n default --template "{{ .spec.clusterIP }}")

Get a token for your session
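From here the flow is a token, a docker login against the service IP, and a tag/push. A sketch (the project and image names are examples):

```shell
# Token for the logged-in OpenShift user
TOKEN=$(oc whoami -t)

# Authenticate docker against the internal registry service IP
docker login -u "$(oc whoami)" -p "$TOKEN" "${SVC_REGISTRY}:5000"

# Tag and push into a project/imagestream you can write to
docker tag myimage:latest "${SVC_REGISTRY}:5000/myproject/myimage:latest"
docker push "${SVC_REGISTRY}:5000/myproject/myimage:latest"
```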

Continue reading

Automated Pruning of OpenShift Artifacts: Builds, Deploys, Images

After running OpenShift for a while I discovered that letting builds pile up to around 1,200 led to what was essentially a deadlock in the scheduling of new builds, which were stuck in a New, waiting state indefinitely.

This was fixed as of OCP 3.4.1, but it prompted me to get more proactive about pruning artifacts within OpenShift.

I threw together a script and a playbook to deploy it. YMMV
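The script is essentially a wrapper around the oc adm prune commands. A sketch of the sort of invocations involved (the retention numbers are examples, tune to taste):

```shell
# Prune completed/failed builds, keeping a small history per build config
oc adm prune builds --keep-complete=5 --keep-failed=1 --orphans --confirm

# Prune old replication controllers left behind by past deployments
oc adm prune deployments --keep-complete=5 --keep-failed=1 --orphans --confirm

# Prune image data no longer referenced by any imagestream (requires
# registry access and a user with the system:image-pruner role)
oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm
```

Without --confirm each command runs in dry-run mode and only reports what it would delete, which is a sensible first pass.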

Continue reading