OCP3

Playbook to Replace bootstrap.kubeconfig and Node Certificates on OpenShift 3.10 / 3.11

If you are a serial upgrader like me, you may have found that at one point during your 3.10.xx patching (say 3.10.119) you hit this error during the data plane upgrade:

TASK [openshift_node : Approve the node] ************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/restart.yml:49
Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_csr_approve.py
...
FAILED - RETRYING: Approve the node (30 retries left).Result was: {
    "all_subjects_found": [],
    "attempts": 1,
    "changed": false,
    "client_approve_results": [],
    "client_csrs": {},
    "failed": true,
    "invocation": {
        "module_args": {
            "node_list": [
                "ose-test-node-01.example.com"
            ],
            "oc_bin": "oc",
            "oc_conf": "/etc/origin/master/admin.kubeconfig"
        }
    },
    "msg": "Could not find csr for nodes: ose-test-node-01.example.com",
...

It turns out this was because atomic-openshift-node failed to generate a CSR on startup.
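
A quick way to confirm the symptom (names taken from the log above) is to restart the node service and then look for a pending CSR on a master:

# On the affected node
systemctl restart atomic-openshift-node

# On a master: a healthy restart should produce a Pending CSR that can then be approved
oc get csr
oc adm certificate approve <csr-name>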

Continue reading

Downgrade Etcd 3.3.11 to 3.2.22 for OpenShift Compatibility

While I was working on migrating etcd to my master nodes, I was bitten by an incompatible etcd v3.3.11 RPM made available via the RHEL Server Extras repo. Before I got to my last master the RPM was no longer available and the scaleup playbook failed. It turned out that 3.3.11 is not compatible and should not have been made available.

Unfortunately, all members of my etcd cluster had already been upgraded, so the fix was to take down the cluster, downgrade etcd, and restore from a snapshot. It would be great if the etcd version were pinned the way the Docker version is.
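
For reference, the shape of that recovery on each etcd host looks roughly like this (package version, hostnames, and snapshot path are illustrative; the full procedure is in the post):

systemctl stop etcd
yum downgrade -y etcd-3.2.22

# Restore the pre-upgrade snapshot into a fresh directory, then swap it into place
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
    --name master-01 \
    --initial-cluster "master-01=https://master-01.example.com:2380,master-02=https://master-02.example.com:2380,master-03=https://master-03.example.com:2380" \
    --initial-advertise-peer-urls https://master-01.example.com:2380 \
    --data-dir /var/lib/etcd/restored
mv /var/lib/etcd/member /var/lib/etcd/member.bak
mv /var/lib/etcd/restored/member /var/lib/etcd/member
chown -R etcd:etcd /var/lib/etcd/member

systemctl start etcd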

Continue reading

Migration of Etcd to Masters for OpenShift 3.9 to 3.10 Upgrade

As of OpenShift Container Platform 3.10, etcd is expected to run in static pods on the master nodes in the control plane. You may have deployed an HA cluster with dedicated etcd nodes managed with systemd. How do you migrate to this new architecture?

Assumptions:

  • You are running OCP 3.9
  • You have multiple master nodes
  • You have dedicated etcd nodes
  • Your nodes run RHEL, not Atomic Host

Outline:

  • Back up etcd
  • Scale up the etcd cluster to include the master nodes
  • Configure the OpenShift masters to ignore the old etcd nodes
  • Scale down the etcd cluster to remove the old etcd nodes

Detailed Steps

Follow along with this document: https://docs.openshift.com/container-platform/3.9/admin_guide/assembly_replace-etcd-member.html. You may find some etcd aliases handy before proceeding.
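
For example, something like these (cert paths assume the default OpenShift etcd layout under /etc/etcd):

# v2 API, which OCP 3.9 uses for cluster membership
alias etcdctl2='etcdctl --cert-file=/etc/etcd/peer.crt --key-file=/etc/etcd/peer.key --ca-file=/etc/etcd/ca.crt --endpoints=https://$(hostname):2379'

# v3 API, handy for snapshots
alias etcdctl3='ETCDCTL_API=3 etcdctl --cert=/etc/etcd/peer.crt --key=/etc/etcd/peer.key --cacert=/etc/etcd/ca.crt --endpoints=https://$(hostname):2379'

# Check membership and health before and after each scale-up/scale-down step
etcdctl2 member list
etcdctl2 cluster-health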

Continue reading

OpenShift 3.6 Upgrade Metrics Fails Missing heapster-certs Secret

After your upgrade to OpenShift v3.6, did the deployment of cluster metrics wind up with empty graphs? Check whether the heapster pod failed to start due to a missing secret called heapster-certs in the openshift-infra namespace.

Problem

Heapster pod is failing to start

$ oc get pods
NAME                         READY     STATUS              RESTARTS   AGE
hawkular-cassandra-1-l1f3s   1/1       Running             0          9m
hawkular-metrics-rdl07       1/1       Running             0          9m
heapster-cfpcj               0/1       ContainerCreating   0          3m

Check which volumes it is attempting to mount:
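
A quick way to do that, and to verify whether the secret exists at all (pod name from the listing above):

$ oc -n openshift-infra describe pod heapster-cfpcj | grep -A 2 -i secret
$ oc -n openshift-infra get secret heapster-certs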

Continue reading

Automated Pruning of OpenShift Artifacts: Builds, Deploys, Images

After running OpenShift for a while I discovered that letting builds pile up to around 1,200 led to what was essentially a deadlock in the scheduling of new builds: they were stuck in a New, waiting state indefinitely.

This was fixed as of OCP 3.4.1, but it prompted me to get more proactive about pruning artifacts within OpenShift.

I threw together a script and a playbook to deploy it. YMMV
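
The script is essentially a wrapper around the standard prune commands, roughly like this (the retention values are examples, not necessarily what the playbook uses):

# Builds and deployments: keep a handful of recent completed/failed objects
oadm prune builds      --orphans --keep-complete=5 --keep-failed=1 --keep-younger-than=60m --confirm
oadm prune deployments --orphans --keep-complete=5 --keep-failed=1 --keep-younger-than=60m --confirm

# Images: run as a user with the system:image-pruner role
oadm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm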

Continue reading

Configuring OpenShift with Multiple Sharded Routers

I needed to host a service that would be consumed by a closed client that insists on speaking HTTPS on port 50000. To solve this, I added a second router deployment and used the OpenShift router sharding feature to selectively enable routes on it by way of selectors.

To summarize:

Existing HA router:

  • HTTP 80
  • HTTPS 443
  • HAProxy stats 1936

Added HA router:

  • HTTP 49999
  • HTTPS 50000
  • HAProxy stats 51936

How To

Open infra node firewalls

  • Open the firewall on the infra nodes where the router will run to allow the new HTTP and HTTPS ports
 iptables -A OS_FIREWALL_ALLOW -m tcp -p tcp --dport 49999 -j ACCEPT
 iptables -A OS_FIREWALL_ALLOW -m tcp -p tcp --dport 50000 -j ACCEPT
  • This can also be done with Ansible and the os_firewall role in your playbook. (untested)
- hosts: infra-nodes

  vars:
    os_firewall_use_firewalld: False
    os_firewall_allow:
      - service: teradici-http
        port: 49999/tcp
      - service: teradici-https
        port: 50000/tcp

  roles:
    - os_firewall

Create a router

  • Create a router called ha-router-teradici with oc adm router (or oadm router) on these ports, making sure the stats port does not clash with the existing router on port 1936
[root@ose-test-master-01 ~]# oc get nodes --show-labels
NAME                           STATUS    AGE       LABELS
ose-test-master-01.example.com   Ready     180d      kubernetes.io/hostname=ose-test-master-01.example.com,region=master,zone=rhev
ose-test-master-02.example.com   Ready     180d      kubernetes.io/hostname=ose-test-master-02.example.com,region=master,zone=rhev
ose-test-node-01.example.com     Ready     180d      ha-router=primary,kubernetes.io/hostname=ose-test-node-01.example.com,region=infra,zone=rhev
ose-test-node-02.example.com     Ready     180d      ha-router=primary,kubernetes.io/hostname=ose-test-node-02.example.com,region=infra,zone=rhev
ose-test-node-03.example.com     Ready     180d      kubernetes.io/hostname=ose-test-node-03.example.com,region=primary,zone=rhev
ose-test-node-04.example.com     Ready     180d      kubernetes.io/hostname=ose-test-node-04.example.com,region=primary,zone=rhev

[root@ose-test-master-01 ~]#  oadm router ha-router-teradici \
    --ports='49999:49999,50000:50000' \
    --stats-port=51936 \
    --replicas=2 \
    --selector="ha-router=primary" \
    --selector="region=infra" \
    --labels="ha-router=teradici" \
    --default-cert=201602_router_wildcard.os.example.com.pem \
    --service-account=router

GOOD: I see that the ports are set properly in the haproxy.config and the service objects
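
Sharding is then a matter of giving the new router a route selector and labelling the routes that belong to it; roughly like this (the route name is hypothetical, and oc env works in place of oc set env on older releases):

# Only serve routes carrying the matching label on the new router
oc set env dc/ha-router-teradici ROUTE_LABELS="ha-router=teradici"

# Label the route that should be exposed on port 50000
oc label route my-teradici-route ha-router=teradici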

Continue reading

OpenShift Cluster Metrics and Cassandra Troubleshooting

OpenShift gathers cluster metrics such as CPU, memory, and network bandwidth per pod, which can assist in troubleshooting and capacity planning. The metrics also support horizontal pod autoscaling, which makes the metrics service not just helpful but critical to operation.

Missing Liveness Probes

There are three major components in the metrics collection process: Heapster gathers stats from Docker and feeds them to Hawkular Metrics, which tucks them away for safekeeping in Cassandra.
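
A quick sanity check is to make sure all three pods are running and that Hawkular Metrics reports itself as started (the hostname is whatever you exposed the metrics route on):

$ oc get pods -n openshift-infra
$ curl -k https://hawkular-metrics.example.com/hawkular/metrics/status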

Continue reading

How to List Tags on Red Hat Registry Images

Ever gone to Red Hat’s container registry to search for an image and been left wondering what versions exist? Ever been frustrated by the inconsistent tag format? Is there a v or is there not a v? Me too.

Docker Hub has progressed to the v2 registry API, while the Red Hat registry still speaks v1 at the moment. As long as you use the right syntax, you can use curl to query the registry API and list the tags like this:
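
For a v1 registry that boils down to something like this (the image name is just an example):

$ curl -s https://registry.access.redhat.com/v1/repositories/openshift3/ose-haproxy-router/tags | python -m json.tool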

Continue reading

Deploy Hawkular Metrics in CDK 2.1 OpenShift 3.2

Update! I failed with CDK 2.0, but CDK 2.1 works with some fiddling.

In my last post I installed the Red Hat Container Development Kit to deploy OpenShift Enterprise using Vagrant. Now I want to add Hawkular Metrics to that deployment.

Deploy Metrics

Refer to the docs for deploying metrics in OSE.

OpenShift Metrics

Log in to the Vagrant CDK VM before continuing

$ cd ~/cdk/components/rhel/rhel-ose/
$ vagrant ssh

$ oc login
Authentication required for https://127.0.0.1:8443 (openshift)
Username: admin
Password: admin
Login successful.

$ oc project openshift-infra

$ oc get sa
NAME                        SECRETS   AGE
build-controller            2         10m
builder                     2         10m
daemonset-controller        2         10m
default                     2         10m
deployer                    2         10m
deployment-controller       2         10m
gc-controller               2         10m
hpa-controller              2         10m
job-controller              2         10m
namespace-controller        2         10m
pv-binder-controller        2         10m
pv-provisioner-controller   2         10m
pv-recycler-controller      2         10m
replication-controller      2         10m

$ oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-deployer
secrets:
- name: metrics-deployer
API

$ oc secrets new metrics-deployer nothing=/dev/null

$ oadm policy add-role-to-user         edit           system:serviceaccount:openshift-infra:metrics-deployer
$ oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster

From your OSE server, grab /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml.
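
Then process the template; at minimum set the Hawkular hostname (the values below are examples and should match your router's wildcard DNS):

$ oc new-app -f metrics-deployer.yaml \
    -p HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com \
    -p USE_PERSISTENT_STORAGE=false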

Continue reading

Getting Started With the Red Hat Container Development Kit

The Red Hat Container Development Kit allows you to deploy OpenShift on your laptop for easier testing and development. Here is how to deploy it.

OpenShift CDK

Register as a Red Hat Developer

  • Obtain a Red Hat login

  • Place your credentials in ~/.vagrant.d/Vagrantfile to enable updates for the VMs by automatically registering them with Red Hat Subscription Manager

Vagrant.configure('2') do |config|
  config.registration.username = '<your Red Hat username>'
  config.registration.password = '<your Red Hat password>'
end
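
The config.registration settings come from the vagrant-registration plugin; if Vagrant complains that the setting is unknown, install the plugin first:

$ vagrant plugin install vagrant-registration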

Mac OS X Prereqs