OCP3

Playbook to Replace bootstrap.kubeconfig and Node Certificates on OpenShift 3.10 / 3.11

If you are a serial upgrader like me, you may have found that at one point during your 3.10.xx patching (say 3.10.119) you hit this error during the data plane upgrade:

TASK [openshift_node : Approve the node] ************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/restart.yml:49
Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_csr_approve.py
...
FAILED - RETRYING: Approve the node (30 retries left). Result was:
{
    "all_subjects_found": [],
    "attempts": 1,
    "changed": false,
    "client_approve_results": [],
    "client_csrs": {},
    "failed": true,
    "invocation": {
        "module_args": {
            "node_list": [
                "ose-test-node-01.
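While the playbook retries, it can help to inspect the CSR state by hand. A quick sketch using standard oc commands, not the playbook this post describes; the CSR name below is a placeholder:

$ oc get csr
# inspect a specific request (name is a placeholder)
$ oc describe csr csr-8kcs2
# approve it manually if it is stuck pending
$ oc adm certificate approve csr-8kcs2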

Continue reading

Downgrade Etcd 3.3.11 to 3.2.22 for OpenShift Compatibility

While I was working on migrating etcd to my master nodes I was bitten by an incompatible etcd v3.3.11 RPM made available via the RHEL Server Extras repo. Before I got to my last master the RPM was no longer available, and the scaleup playbook failed. I became aware that 3.3.11 is not compatible and should never have been made available. Unfortunately, all members of my etcd cluster had already been upgraded, and the fix was to take down the cluster, downgrade etcd, and restore from a snapshot.
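A minimal sketch of that sequence, assuming the etcd v3 API and a snapshot taken before the upgrade; the paths and endpoint/cert flags your cluster needs will differ, and the package version comes from this post's scenario:

# save a snapshot before touching the cluster (v3 API)
$ ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd/snapshot.db
# on each member: stop etcd and roll the package back
$ systemctl stop etcd
$ yum downgrade etcd-3.2.22
# restore the snapshot into a fresh data directory, then start etcd again
$ ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/snapshot.db --data-dir /var/lib/etcd/member-new
$ systemctl start etcd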

Continue reading

Migration of Etcd to Masters for OpenShift 3.9 to 3.10 Upgrade

As of OpenShift Container Platform 3.10, etcd is expected to run in static pods on the master nodes in the control plane. You may have deployed an HA cluster with dedicated etcd nodes managed with systemd. How do you migrate to this new architecture?

Assumptions:
- You are running OCP 3.9
- You have multiple master nodes
- You have dedicated etcd nodes
- You are running RHEL, not Atomic nodes

Outline:
- Back up etcd
- Scale up the etcd cluster to include the master nodes
- Configure the OpenShift masters to ignore the old etcd nodes
- Scale down the etcd cluster to remove the old etcd nodes

Detailed Steps
Follow along in this document https://docs.
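The scale-up step is driven by openshift-ansible. A rough sketch, assuming the 3.9/3.10-era playbook layout and an inventory where the masters have been added to the [new_etcd] group; the inventory path is a placeholder and your playbook location may differ:

# back up etcd first, then add the masters (listed under [new_etcd]) to the cluster
$ ansible-playbook -i /path/to/inventory \
    /usr/share/ansible/openshift-ansible/playbooks/openshift-etcd/scaleup.yml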

Continue reading

OpenShift 3.6 Upgrade: Metrics Fail with Missing heapster-certs Secret

After your upgrade to OpenShift v3.6, did the deployment of cluster metrics wind up with empty graphs? Check if the heapster pod failed to start due to a missing secret called heapster-certs in the openshift-infra namespace.

Problem
Heapster pod is failing to start:

$ oc get pods
NAME                         READY     STATUS              RESTARTS   AGE
hawkular-cassandra-1-l1f3s   1/1       Running             0          9m
hawkular-metrics-rdl07       1/1       Running             0          9m
heapster-cfpcj               0/1       ContainerCreating   0          3m

Check what volumes it is attempting to mount
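To confirm the diagnosis, you can check for the secret directly. A quick sketch with standard oc commands; the namespace, secret name, and pod name come from the excerpt above:

# the secret the heapster pod expects to mount
$ oc get secret heapster-certs -n openshift-infra
# describe the stuck pod to see the failed volume mount in its events
$ oc describe pod heapster-cfpcj -n openshift-infra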

Continue reading

Automated Pruning of OpenShift Artifacts: Builds, Deploys, Images

After running OpenShift for a while I discovered that letting builds pile up to around 1,200 led to what was essentially a deadlock in the scheduling of new builds. New builds were stuck in a New, waiting state indefinitely. This was fixed as of OCP 3.4.1, but it caused me to get more proactive in the pruning of artifacts within OpenShift. I threw together a script and a playbook to deploy it.
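The oc client has built-in prune subcommands that a script like this would typically wrap. A minimal sketch; the retention counts are illustrative, not the values from the post's script:

# keep the 5 most recent completed builds and 1 failed build per build config
$ oc adm prune builds --keep-complete=5 --keep-failed=1 --confirm
# same idea for deployments
$ oc adm prune deployments --keep-complete=5 --keep-failed=1 --confirm
# remove image layers no longer referenced, keeping the 3 newest tag revisions
$ oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm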

Continue reading

Configuring OpenShift with Multiple Sharded Routers

I needed to host a service that would be consumed by a closed client that insists on speaking HTTPS on port 50000. To solve this, I added a second router deployment and used the OpenShift router sharding feature to selectively enable routes on the second router by way of selectors.

To summarize:

Existing HA router: HTTP 80, HTTPS 443, HAProxy stats 1936
Added HA router: HTTP 49999, HTTPS 50000, HAProxy stats 51936

How To

Open infra node firewalls
Open the firewall on the infra nodes where the router will run to allow the new HTTP and HTTPS ports:

iptables -A OS_FIREWALL_ALLOW -m tcp -p tcp --dport 49999 -j ACCEPT
iptables -A OS_FIREWALL_ALLOW -m tcp -p tcp --dport 50000 -j ACCEPT

This can also be done with Ansible and the os_firewall role in your playbook.
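For the router itself, a sketch of creating the second deployment and labeling routes for it. The router name, selector, and label key here are illustrative assumptions, not the exact values from the post:

# create a second router listening on the non-standard ports (name and selector are placeholders)
$ oc adm router router-shard --replicas=2 --ports='49999:49999,50000:50000' \
    --stats-port=51936 --selector='region=infra'
# have the new router admit only routes carrying a matching label
$ oc set env dc/router-shard ROUTE_LABELS='router=shard'
# label a route so the sharded router picks it up
$ oc label route my-service router=shard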

Continue reading

OpenShift Cluster Metrics and Cassandra Troubleshooting

OpenShift gathers cluster metrics such as CPU, memory, and network bandwidth per pod, which can assist in troubleshooting and capacity planning. The metrics are also used to support horizontal pod autoscaling, which makes the metrics service not just helpful, but critical to operation.

Missing Liveness Probes
There are 3 major components in the metrics collection process. Heapster gathers stats from Docker and feeds them to Hawkular Metrics to tuck away for safekeeping in Cassandra.
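Missing probes can be added with oc set probe. A sketch against the metrics replication controllers in openshift-infra; the endpoints and ports below are assumptions for illustration, not values taken from the post:

# add a liveness probe to Hawkular Metrics (URL and port are assumptions)
$ oc set probe rc/hawkular-metrics -n openshift-infra --liveness \
    --get-url=https://:8443/hawkular/metrics/status
# add a simple TCP liveness check to heapster (port is an assumption)
$ oc set probe rc/heapster -n openshift-infra --liveness --open-tcp=8082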

Continue reading

How to List Tags On Redhat Registry Images

Ever gone to Red Hat's container registry to search for an image and been left wondering what versions exist? Ever been frustrated by the inconsistent tag format? Is there a v or is there not a v? Me too. Docker Hub has progressed to the v2 registry API, while the Red Hat registry is still v1 at the moment. As long as you use the right syntax, you can use curl to query the registry API and list the tags like this:
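The post's exact command is cut off in this excerpt, but a sketch of a v1-style tag listing looks roughly like this; the image path is only an example, and python -m json.tool is just for pretty-printing:

$ curl -s https://registry.access.redhat.com/v1/repositories/openshift3/ose-haproxy-router/tags | python -m json.tool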

Continue reading

Deploy Hawkular Metrics in CDK 2.1 / OpenShift 3.2

Update! I failed with CDK 2.0, but CDK 2.1 works with some fiddling. In my last post I installed the Red Hat Container Development Kit to deploy OpenShift Enterprise using Vagrant. Now I want to add Hawkular Metrics to that deployment.

Deploy Metrics
Refer to the docs for deploying metrics in OSE. Log in to the Vagrant CDK VM before continuing:

$ cd ~/cdk/components/rhel/rhel-ose/
$ vagrant ssh
$ oc login
Authentication required for https://127.
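From there, the OSE 3.2 docs drive the deployment through the metrics deployer template. A rough sketch, assuming the stock template path on the master and a hostname chosen for the Hawkular route; both are assumptions here, not values from the post:

# create the deployer service account and grant it access in openshift-infra
$ oc project openshift-infra
$ oc create serviceaccount metrics-deployer
$ oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer
# an empty secret lets the deployer generate its own certificates
$ oc secrets new metrics-deployer nothing=/dev/null
# launch the deployer template (hostname is a placeholder)
$ oc new-app -f /usr/share/openshift/examples/infrastructure-templates/enterprise/metrics-deployer.yaml \
    -p HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com -p USE_PERSISTENT_STORAGE=false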

Continue reading

Getting Started With RedHat Container Development Kit

The Red Hat Container Development Kit allows you to deploy OpenShift on your laptop for easier testing and development. Here is how to deploy it.

Register as a Red Hat Developer
Obtain a Red Hat login. Place credentials in ~/.vagrant.d/Vagrantfile to enable updates for VMs by automatically registering with Red Hat Subscription Manager:

Vagrant.configure('2') do |config|
  config.registration.username = '<your Red Hat username>'
  config.registration.password = '<your Red Hat password>'
end

Mac OS X Prereqs
Install pre-reqs:
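Once the prerequisites and Vagrant plugins are installed, bringing the VM up is a single command from the rhel-ose directory used in the post above; a sketch, assuming the same ~/cdk layout:

# the vagrant-registration plugin picks up the credentials configured above
$ cd ~/cdk/components/rhel/rhel-ose/
$ vagrant up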

Continue reading