Debugging AWS STS Authentication for OpenShift Operators

March 10, 2022

OpenShift supports granular AWS permissions for pods running cluster operators or even user applications. This enhances security by granting only the necessary privileges and nothing more. This post explores debugging authentication (authN) and authorization (authZ) for pods attempting to use fine-grained IAM roles in combination with the AWS Security Token Service.

What is STS Authentication?

First, some background on what the AWS Security Token Service (STS) is, and why using it is a best practice.

The typical method of programmatic authentication to AWS uses long-lived credentials in the form of an aws_access_key_id and an associated aws_secret_access_key. If those 2 items were accidentally leaked, it’s game over for your AWS bill.

STS adds an ephemeral third factor in aws_session_token. This token will expire after a period of time, so it must be regenerated on a regular basis. In this case, these tokens are validated and refreshed by interacting with an OpenID Connect Identity Provider created for use by OpenShift.

OpenShift projects these 3 factors into a pod which may then use them to authenticate and assume an IAM role appropriately authorized for the relevant AWS services.

📓 OpenShift STS Authentication Process

[Figure: STS authentication flow]

Let’s explore this more deeply by looking at an example where this process may not be working properly.

Degraded Ingress Cluster Operator

Multiple operators interact with STS to assume fine-grained roles, but we’ll focus on the Ingress operator.

In this example, the Ingress cluster operator status is Degraded. The last message indicates the controller could not find the wildcard DNS (*.apps) entry in Route53. Normally the Ingress operator is the component that creates this record, but for some reason it can’t even search for it.

$ oc get co ingress
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress                                              False       True          True       18h     ...

The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DNSReady=False (NoZones: The record isn't present in any zones.)
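The MESSAGE column is truncated in the table view, so it can help to list every status condition on its own line. The following is a minimal sketch, assuming jq is installed; list_conditions is a hypothetical helper name, and it only reads the standard status.conditions array from JSON on stdin.

```shell
# Print each status condition as "type=status reason: message".
# Reads the JSON of a resource (e.g. a ClusterOperator) on stdin.
list_conditions() {
  jq -r '.status.conditions[]
    | [.type, .status, (.reason // "-"), (.message // "-")]
    | "\(.[0])=\(.[1]) \(.[2]): \(.[3])"'
}

# Usage against the cluster (needs a logged-in oc):
#   oc get co ingress -o json | list_conditions
#   oc -n openshift-ingress-operator get ingresscontroller default -o json | list_conditions
```

Missing reason or message fields are printed as "-" so that each condition still produces one line.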

Debugging Operator STS Authentication

The Ingress operator needs to authenticate to interact with the Route53 service. To do so it seeks to assume a role that was provisioned by the cloud-credential-operator, by ccoctl, or by hand.

This role is named in a secret, along with the location of a token signed by a key trusted by the OpenID Connect provider you configured in AWS IAM.

Start by describing the cluster operator (oc describe co ingress) but more importantly, check the logs of the ingress operator pod.

$ oc logs -n openshift-ingress-operator deployment/ingress-operator -c ingress-operator

2022-02-25T17:08:36.019Z ERROR operator.init.controller-runtime.manager.controller.dns_controller controller/controller.go:253 Reconciler error {"name": "default-wildcard", "namespace": "openshift-ingress-operator", "error": failed to create DNS provider: failed to create AWS DNS manager: failed to validate aws provider service endpoints: [failed to list route53 hosted zones: WebIdentityErr: failed to retrieve credentials caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements status code: 400, request id: 6aa3127f-e9b8-44e1-87e6-a739746f0370, …

It appears the token was not verified.

Mapping the Operator to a Role

Is STS authentication and “assume role with web identity” working?

The ingress operator pod should be assuming a unique role from AWS that is more fine-grained than the instance profile used by the node. In this way the pod has only the permissions it requires to interact with the Route53 and ELB APIs.

📓 Try to reproduce this failure by replicating what the pod is doing.

Once you have a copy of the token, you can test with the aws sts assume-role-with-web-identity command.

First, gather some details from the pod’s context: specifically, the role to be assumed and the token to authenticate with.

Cloud-credentials Secret

The cloud-credentials secret holds the ARN of the role which the pod should assume once authenticated, and the location of the token used to authenticate with. That token will be automatically available.

$ oc -n openshift-ingress-operator extract secret/cloud-credentials --to=-
# credentials
[default]
role_arn = arn:aws:iam::1234567890:role/ocp-oidc-openshift-ingress-operator-cloud-credentials
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token

Role ARN

The role_arn refers to the IAM role, which may have been created by the cloud-credential-operator or by other means.

Save this value to $ROLE_ARN.
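One way to capture the value without copying it by hand is to parse it out of the extracted secret. This is a minimal sketch: ini_value is a hypothetical helper name, and it assumes the "key = value" layout shown in the secret above.

```shell
# Print the value of one key from AWS-style "key = value" text read on stdin.
ini_value() {
  awk -F'=' -v key="$1" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }   # trim the key
    k == key {
      v = substr($0, index($0, "=") + 1)         # everything after the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", v)             # trim the value
      print v
      exit
    }'
}

# Usage against the cluster (needs a logged-in oc):
#   ROLE_ARN=$(oc -n openshift-ingress-operator extract secret/cloud-credentials --to=- | ini_value role_arn)
```

The same helper returns the token path when called with web_identity_token_file.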

JWT Token

The web_identity_token_file identifies the projected volume mount holding the JWT used to authenticate.

Download a copy of this file and save its path to $TOKEN.

Viewing the Contents of the Role

Before moving on let’s pause and understand a bit more about the IAM role the operator seeks to assume.

The definition of the IAM role used by an operator will be derived from the CredentialsRequest resources bundled with the operator.

$ RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.9.21-x86_64
$ mkdir -p credrequests
$ oc adm release extract \
    --credentials-requests --cloud=aws \
    --to=credrequests/ $RELEASE_IMAGE

$ cat credrequests/0000_50_cluster-ingress-operator_00-ingress-credentials-request.yaml
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  labels:
    controller-tools.k8s.io: "1.0"
  name: openshift-ingress
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - action:
      - elasticloadbalancing:DescribeLoadBalancers
      - route53:ListHostedZones
      - route53:ChangeResourceRecordSets
      - tag:GetResources
      effect: Allow
      resource: '*'
  secretRef:
    name: cloud-credentials
    namespace: openshift-ingress-operator
  serviceAccountNames:
  - ingress-operator

Examining the JSON Web Token

The JWT, or JSON Web Token, contains 3 fields delimited by a “.”.

  1. Header with the key ID, which should match the IDP key ID. Confirm this in the OIDC provider config.
  2. Payload with the token issuer and the OpenShift service account name
  3. Cryptographic signature

Note the key ID (kid).

Field 1

$ export TOKEN=/tmp/token
$ oc  -n openshift-ingress-operator -c ingress-operator \
       rsh deployment/ingress-operator \
       cat /var/run/secrets/openshift/serviceaccount/token > $TOKEN

$ cat $TOKEN | awk -F. '{ print $1 }' | base64 -d | jq
{
  "alg": "RS256",
  "kid": "TrJph8YY31qgQcN_KTaQspV7dY6Uks1BynN3YsoxJ5s"
}
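JWT fields are base64url encoded with the trailing "=" padding removed, so a plain base64 -d can fail on some tokens with an invalid input error. The following is a hedged sketch of a more tolerant decoder; b64url_decode and jwt_part are hypothetical helper names, and the -d flag assumes GNU coreutils base64 or a compatible implementation.

```shell
# Decode one base64url segment read from stdin.
b64url_decode() {
  seg=$(tr '_-' '/+')            # map the URL-safe alphabet back to standard base64
  case $(( ${#seg} % 4 )) in     # restore the padding that JWTs strip
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print the decoded header (field 1) or payload (field 2) of the JWT in a file.
jwt_part() {
  cut -d. -f"$1" < "$2" | b64url_decode
}

# Usage:
#   jwt_part 1 "$TOKEN" | jq
```

The signature (field 3) is binary, so printing it with this helper is not useful.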

In the 2nd field we want to note the audience (aud), the issuer (iss), and the service account name (serviceaccount) scoped by namespace.

Field 2

$ cat $TOKEN | awk -F. '{ print $2 }' | base64 -d | jq
{
  "aud": [
    "openshift"
  ],
  "exp": 1642528257,
  "iat": 1642524657,
  "iss": "https://ocp-oidc-oidc.s3.us-west-2.amazonaws.com",
  "kubernetes.io": {
    "namespace": "openshift-ingress-operator",
    "pod": {
      "name": "ingress-operator-5b8c4b8d8b-8kb8b",
      "uid": "cf9ee1e6-387f-445d-b8c0-90f40ad8a174"
    },
    "serviceaccount": {
      "name": "ingress-operator",
      "uid": "38313504-2c4c-4d1b-92a9-c7db391a8a59"
    }
  },
  "nbf": 1642524657,
  "sub": "system:serviceaccount:openshift-ingress-operator:ingress-operator"
}
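The claims worth checking can be pulled out in one pass. This is a small sketch, assuming jq is installed and that aud is an array as in the payload above; jwt_claims_summary is a hypothetical helper name.

```shell
# Summarize a decoded JWT payload (JSON on stdin).
jwt_claims_summary() {
  jq -r '
    "issuer:   \(.iss)",
    "audience: \(.aud | join(","))",
    "subject:  \(.sub)",
    "lifetime: \(.exp - .iat) seconds"
  '
}

# Usage:
#   cat $TOKEN | awk -F. '{ print $2 }' | base64 -d | jwt_claims_summary
```

For the payload above, exp minus iat is 3600 seconds, so a copied token is only usable for about an hour before a fresh copy is needed.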

Field 3

Not relevant today.

Verify the JWT Values

Confirm the key is recognized and the resources are publicly accessible.

Is the Key used to generate the token recognized by the OIDC provider in AWS?

Compare the kid in field 1 of the token with the key ID known to the AWS OIDC provider.

Verify OpenID Connect IDP Reachability

Can you reach the IDP issuer’s openid-configuration?

The configuration file MUST be publicly reachable for OIDC to function, because Amazon makes the call to this resource on your behalf. This must succeed both inside and outside the VPC!

ISSUER=$(cat $TOKEN | awk -F. '{ print $2 }' | base64 -d | jq -r .iss)
# try this from a pod and from your laptop
# -f makes curl exit non-zero on HTTP errors such as 403, so the fallback fires
curl -sf $ISSUER/.well-known/openid-configuration || echo "Failure"

Can you reach the IDP issuer’s public keys?

The keys MUST be publicly reachable for OIDC to function, because Amazon makes the call to this resource on your behalf. This must succeed both inside and outside the VPC!

KEYS=$(curl -sf $ISSUER/.well-known/openid-configuration | jq -r .jwks_uri)
# try this from a pod and from your laptop
# -f makes curl exit non-zero on HTTP errors such as 403, so the fallback fires
curl -sf $KEYS || echo "Failure"
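Once the keys document is reachable, it is also worth confirming that it publishes the kid noted in field 1 of the token. This is a minimal sketch, assuming jq is installed and that the document is a standard JWKS with a top-level keys array; jwks_has_kid is a hypothetical helper name.

```shell
# Succeed if the JWKS document on stdin publishes a key with the given kid.
jwks_has_kid() {
  jq -e --arg kid "$1" 'any(.keys[]; .kid == $kid)' > /dev/null
}

# Usage:
#   KID=$(cat $TOKEN | awk -F. '{ print $1 }' | base64 -d | jq -r .kid)
#   curl -sf "$KEYS" | jwks_has_kid "$KID" && echo "kid found" || echo "kid missing"
```

A missing kid would suggest the token was signed by a key other than the one published at the issuer.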

Testing Authentication with the Token

Use the token to attempt to assume the role.

First confirm you are authenticated to AWS normally.

$ aws sts get-caller-identity
{
    "UserId": "AROATPFFJRK73QH4YHANK:i-07d6dbc1345299a43",
    "Account": "1234567890",
    "Arn": "arn:aws:sts::1234567890:assumed-role/managed-roles-ocp-BastionInstanceRole-X15RKXKW1US9/i-07d6dbc1345299a43"
}

Can you “assume role with web identity” using this token?

Copy the token to a host where the aws CLI is configured and available. Then do what the pod is trying to do.

# Remember role_arn and token location here:
#  oc -n openshift-ingress-operator extract secret/cloud-credentials --to=-
echo $ROLE_ARN
arn:aws:iam::1234567890:role/ocp-oidc-cf-openshift-ingress-operator-cloud-credentials
echo $TOKEN
/tmp/token

aws sts assume-role-with-web-identity \
    --duration-seconds 900 \
    --role-session-name "assumeroletest" \
    --role-arn "$ROLE_ARN" \
    --web-identity-token "$(cat $TOKEN)"

On success you should get back a block of JSON.

{
    "Credentials": {
        "AccessKeyId": "ASIAT...............",
        "SecretAccessKey": "cgp/a/ziU.......",
        "SessionToken": "IQoJb3JpZ2luX2VjE..",
        "Expiration": "2022-02-25T22:09:33+00:00"
    },
    "SubjectFromWebIdentityToken": "system:serviceaccount:openshift-ingress-operator:ingress-operator",
    "AssumedRoleUser": {
        "AssumedRoleId": "AROATPFFJRK72IHENLQ7D:assumeroletest",
        "Arn": "arn:aws:sts::1234567890:assumed-role/ocp-oidc-cf-openshift-ingress-operator-cloud-credentials/assumeroletest"
    },
    "Provider": "arn:aws:iam::1234567890:oidc-provider/ocp-oidc-oidc.s3.us-west-2.amazonaws.com",
    "Audience": "openshift"
}
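To go one step further and act as the pod would, the returned credentials can be loaded into the environment. This is a hedged sketch, assuming jq is installed; sts_exports is a hypothetical helper name, while the three variable names are the standard ones read by the aws CLI.

```shell
# Turn an AssumeRoleWithWebIdentity response (JSON on stdin) into export statements.
# @sh single-quotes each value so the output is safe to eval.
sts_exports() {
  jq -r '.Credentials |
    "export AWS_ACCESS_KEY_ID=\(.AccessKeyId | @sh)",
    "export AWS_SECRET_ACCESS_KEY=\(.SecretAccessKey | @sh)",
    "export AWS_SESSION_TOKEN=\(.SessionToken | @sh)"'
}

# Usage (needs a configured aws CLI):
#   eval "$(aws sts assume-role-with-web-identity \
#       --duration-seconds 900 \
#       --role-session-name "assumeroletest" \
#       --role-arn "$ROLE_ARN" \
#       --web-identity-token "$(cat $TOKEN)" | sts_exports)"
#   aws route53 list-hosted-zones
```

Listing hosted zones is the same call the operator log reported as failing, so a successful listing here suggests the role's permissions are adequate for that step.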

If you see an error like this, your pod can not authenticate or isn’t authorized to assume the role.

An error occurred (InvalidIdentityToken) when calling the AssumeRoleWithWebIdentity operation: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements

Possible Failures

  • Are the openid-configuration and the keys publicly reachable over HTTPS? Remember this does not HAVE to be S3. If your S3 bucket must remain private, you could use CloudFront to expose it.
  • Is the key ID in the token the same as configured in the OIDC IDP?
  • Is the openid-configuration pointing to the right URL for the keys?
  • Is the audience in the token the same as configured in the OIDC IDP? The identity provider should have ‘sts.amazonaws.com’ and ‘openshift’ audiences.
  • Was credentialsMode set to Manual at OpenShift install time?
  • Does the token service account match the trust relationship on the role? Notice the namespace and service account names.

IAM Role Trust Relationship

Note the service account in the condition.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::1234567890:oidc-provider/ocp-oidc-oidc.s3.us-west-2.amazonaws.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "ocp-oidc-oidc.s3.us-west-2.amazonaws.com:sub": "system:serviceaccount:openshift-ingress-operator:ingress-operator"
                }
            }
        }
    ]
}
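The two strings in that condition can be derived from values already gathered, which makes a mismatch easier to spot. This is a minimal sketch with hypothetical helper names; it assumes the condition key is the issuer URL without its https:// scheme followed by ":sub", as in the policy above.

```shell
# Condition key used in the trust policy: issuer URL without the scheme, plus ":sub".
trust_condition_key() {
  printf '%s:sub\n' "${1#https://}"
}

# Subject claim carried by a service account token: namespace, then account name.
expected_sub() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Usage:
#   trust_condition_key "$ISSUER"
#   expected_sub openshift-ingress-operator ingress-operator
```

Both outputs should appear verbatim in the StringEquals block of the role's trust relationship, and the second should equal the sub claim in field 2 of the token.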
