Migration of Etcd to Masters for OpenShift 3.9 to 3.10 Upgrade

etcd openshift OCP3.9 OCP3.10

Feb 08, 2019

As of OpenShift Container Platform 3.10, etcd is expected to run in static pods on the master nodes in the control plane. You may have deployed an HA cluster with dedicated etcd nodes managed with systemd. How do you migrate to this new architecture?



Detailed Steps

Follow along in this document. You may find some etcd aliases handy before proceeding.


. /etc/etcd/etcd.conf


ETCDCTL_API=3 etcdctl \
    --cert ${ETCD_PEER_CERT_FILE} \
    --key ${ETCD_PEER_KEY_FILE} \
    --endpoints "$ENDPOINTS" \
    endpoint health
[root@ose-test-etcd-01 bin]# ./etcd-health
is healthy: successfully committed proposal: took = 2.41743ms
is healthy: successfully committed proposal: took = 2.363286ms
is healthy: successfully committed proposal: took = 2.213456ms
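
The health script expects an ENDPOINTS variable holding a comma-separated list of client URLs. Here is a minimal sketch of one way to build it. The function name build_endpoints and the example hostnames are hypothetical, and it assumes every member serves clients over HTTPS on port 2379.

```shell
# Build a comma-separated etcd endpoint list from hostnames.
# Assumption: every member serves clients over https on port 2379.
build_endpoints() {
    list=""
    for host in "$@"; do
        # Prepend a comma only when the list already has an entry.
        list="${list:+${list},}https://${host}:2379"
    done
    printf '%s\n' "$list"
}

# Hypothetical hostnames; substitute your own etcd members.
ENDPOINTS=$(build_endpoints etcd-01.example.com etcd-02.example.com etcd-03.example.com)
```

With ENDPOINTS set this way, the etcdctl call above checks every member in a single invocation.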

Because the etcd data was migrated from the v2 store to v3 during the upgrade to 3.7, I am assuming I do not need to back up v2 data. I have not fully confirmed that, however.


. /etc/etcd/etcd.conf

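# etcdctl invocation using the peer certificate and key from etcd.conf
ETCD3="etcdctl --cert ${ETCD_PEER_CERT_FILE} \
  --key ${ETCD_PEER_KEY_FILE}"

# Keep the config, this script, and the snapshot together in a timestamped directory
BACKUP_DIR="/var/backup/etcd/$(date +%Y%m%d%H%M)"
mkdir -p "${BACKUP_DIR}/snap"
cp -rp /etc/etcd "${BACKUP_DIR}/"
cp -p "$0" "${BACKUP_DIR}/"

ETCDCTL_API=3 ${ETCD3} \
        snapshot save "${BACKUP_DIR}/snap/db"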

# Restore:
# . ${BACKUP_DIR}/etcd/etcd.conf
# ETCDCTL_API=3 etcdctl \
#    --name $ETCD_NAME \
#    --initial-cluster $ETCD_INITIAL_CLUSTER \
#    --initial-cluster-token $ETCD_INITIAL_CLUSTER_TOKEN \
#    --initial-advertise-peer-urls $ETCD_INITIAL_ADVERTISE_PEER_URLS \
#    snapshot restore ${BACKUP_DIR}/snap/db
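
Before relying on a backup it is worth confirming that the snapshot file was actually written. The sketch below is not part of the original script, and verify_snapshot is a hypothetical helper name. It checks that the file exists and is not empty, and if etcdctl is on the PATH it also asks for the snapshot's hash, revision, and size.

```shell
# Check that a snapshot file exists and is not empty. If etcdctl is
# installed, also print the snapshot's hash, revision, and size.
verify_snapshot() {
    snap="$1"
    if [ ! -s "$snap" ]; then
        echo "snapshot missing or empty: $snap" >&2
        return 1
    fi
    if command -v etcdctl >/dev/null 2>&1; then
        ETCDCTL_API=3 etcdctl snapshot status "$snap" -w table
    fi
}
```

Calling verify_snapshot "${BACKUP_DIR}/snap/db" at the end of the backup script would make a failed snapshot visible immediately.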

ansible-playbook -vvv \
        -i hosts "$PLAYBOOK" \
        | tee $(date +%Y%m%d-%H%M)-etcd-scaleup.log
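
Because the playbook output is piped through tee, the shell reports the exit status of tee rather than that of ansible-playbook, so a failed run can go unnoticed. One way around that is to inspect the saved log. This is a rough sketch with a hypothetical helper name, recap_ok, and it assumes the usual PLAY RECAP format with failed= and unreachable= counters on each host line.

```shell
# Return 0 only if the log contains a PLAY RECAP and every host line
# after it reports failed=0 and unreachable=0.
recap_ok() {
    log="$1"
    # No recap at all means the run was cut short.
    grep -q 'PLAY RECAP' "$log" || return 1
    # Look only at the recap section for non-zero counters.
    if sed -n '/PLAY RECAP/,$p' "$log" | grep -Eq '(failed|unreachable)=[1-9]'; then
        return 1
    fi
    return 0
}
```

For example, recap_ok 201902081843-etcd-scaleup.log || echo "scaleup failed" (the log name here is made up). In bash, set -o pipefail before the pipeline is another option.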

In my case I found etcd had been accidentally started by hand with a default config file, which listened only on localhost. The config file was modified by the etcd role and the restart etcd handler was notified, but the handler was skipped. This caused the etcd cluster status check task to time out, and subsequent steps in the playbook to fail.

After I restarted etcd at 18:43 the cluster reported as healthy, and I re-ran the playbook successfully.
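
A quick check of the config file would have caught this before the playbook ran. The sketch below reads ETCD_LISTEN_CLIENT_URLS from an etcd.conf style file and succeeds only if at least one URL is not a loopback address. The function name listens_beyond_localhost is hypothetical, and this only inspects the file, not what the running process is bound to.

```shell
# Succeed if ETCD_LISTEN_CLIENT_URLS in the given file names at least
# one address that is not a loopback address.
listens_beyond_localhost() {
    conf="$1"
    # Extract the value and strip any surrounding quotes.
    urls=$(sed -n 's/^ETCD_LISTEN_CLIENT_URLS=//p' "$conf" | tr -d "\"'")
    [ -n "$urls" ] || return 1
    # Split on commas and drop loopback entries; anything left is reachable.
    printf '%s\n' "$urls" | tr ',' '\n' \
        | grep -Ev '^https?://(localhost|127\.0\.0\.1)(:|$)' \
        | grep -q .
}
```

Running listens_beyond_localhost /etc/etcd/etcd.conf on the new member before the cluster status check would flag a leftover default config.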

After the playbook has been run successfully, it can be seen that the master node has been added as an etcd endpoint in /etc/origin/master/master-config.yaml on every master node.

  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
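
To confirm this on each master without opening the file, a plain text match is enough for a quick look. This is a sketch only: config_has_etcd_url is a hypothetical helper, it does not parse YAML, and it assumes the URLs appear as list items of the form "- https://HOST:PORT" with 2379 as the default client port.

```shell
# Succeed if the given master-config.yaml contains a list item with an
# https client URL for the host. Plain text match, not a YAML parse.
config_has_etcd_url() {
    config="$1"
    host="$2"
    port="${3:-2379}"
    grep -Fq -- "- https://${host}:${port}" "$config"
}
```

For example, config_has_etcd_url /etc/origin/master/master-config.yaml master-01.example.com, where the hostname is a placeholder for one of your masters.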

I considered the modify_yaml module, but after noticing it inserted some nulls and converted some double quotes to single quotes, I was happy to find the yedit module.

# playbook to replace currently configured master etcd URLs with
# the hosts found in ansible etcd group
- hosts: masters
  vars:
    openshift_master_fire_handlers: true
  roles:
    - lib_utils
    - openshift_facts

  tasks:
    - name: Gather Cluster facts
      openshift_facts:
        role: common

    - name: Derive etcd url list
      set_fact:
        openshift_master_etcd_urls: "{{ groups['etcd'] | lib_utils_oo_etcd_host_urls(l_use_ssl, openshift_master_etcd_port) }}"
      vars:
        l_use_ssl: "{{ openshift_master_etcd_use_ssl | default(True) | bool }}"
        openshift_master_etcd_port: "{{ etcd_client_port | default('2379') }}"

    - name: Configure etcd url list
      yedit:
        src: "{{ openshift.common.config_base }}/master/master-config.yaml"
        key: etcdClientInfo.urls
        value: "{{ openshift_master_etcd_urls }}"
        backup: yes
      notify: restart master api

  handlers:
    - import_tasks: /usr/share/ansible/openshift-ansible/roles/openshift_master/handlers/main.yml

[root@ose-test-master-01 etcd]# etcdctl3 member list
3cc657644e2e1080, started,,,
669fc09764815697, started,,,
dd1f136e71579ace, started,,,
eafa4cc2f9510e7b, started,,,

[root@ose-test-master-01 etcd]# etcdctl3 member remove 669fc09764815697
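
Picking the right IDs by eye is error prone when several old members need to go. As a sketch, the member list output can be filtered by member name. This assumes the usual comma-separated format of ID, status, name, peer URLs, client URLs. The helper name member_ids_matching and the hostname pattern in the comment are hypothetical.

```shell
# Read `etcdctl member list` output on stdin and print the ID of every
# member whose name (third field) matches the given regular expression.
member_ids_matching() {
    awk -F', *' -v pat="$1" '$3 ~ pat { print $1 }'
}

# Example (not executed here; adjust the pattern to your old etcd hosts):
#   ETCDCTL_API=3 etcdctl member list | member_ids_matching '^ose-test-etcd-'
```

I would still remove the members one at a time and check cluster health between removals, rather than looping over the whole list unattended.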

You are now one step closer to OpenShift 3.10.

At this point etcd should be running only on the three master nodes and not on the old etcd nodes, and every master should be configured with the new list of etcd endpoints.




