Migrating off AAD Pod Identity on Multi-Region AKS
AAD Pod Identity hit end-of-support in late 2024. The migration to Workload Identity on a multi-region AKS estate is straightforward at the YAML level — and brutal in the details. Two failure modes that forced the cutover, the architecture that works, and what to budget for.
Microsoft retired AAD Pod Identity in late 2024 with a hard deprecation. The replacement — Azure AD Workload Identity — uses OIDC federation instead of node-bound identities, removes the nmi DaemonSet from the cluster, and is materially better. The official migration story is “annotate your service account, label your pod, done.”
That story is true on a single cluster. On a multi-region AKS estate, it's the easy day in a week of work per cluster. The hard part is everything around it.
This is what actually goes wrong, drawn from migrating a fleet of around a dozen clusters across two Azure regions.
Why the migration was non-optional
Pod Identity wasn’t just deprecated — it was getting flakier. Two specific failure modes drove the timeline.
Failure mode 1: the NMI race at startup
Pod Identity worked by intercepting IMDS (169.254.169.254) calls from pods and rewriting them, via a per-node DaemonSet called nmi (Node Managed Identity). When a pod started, it asked for an Azure token via the standard SDK; nmi saw the request, looked up which managed identity should answer, and proxied the response.
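Concretely, the call nmi intercepted is the standard IMDS token request every Azure SDK issues from inside the pod, reproducible by hand with curl (the resource parameter here is only an illustrative example):

curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/"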
The race: pods could ask for a token before nmi had finished syncing the AzureIdentity and AzureIdentityBinding CRDs from the API server. Symptom on the application side was a single ManagedIdentityCredential failure on first boot — sometimes retried successfully by the SDK, sometimes not, depending on the SDK version. On a small cluster with steady-state pods, you’d never see it. On a fleet with frequent rolling deployments and HPA-driven scale-out events, it surfaced as intermittent 401s in the early-startup window.
Cilium-style pre-pull init containers, Karpenter-driven node churn, and pod packing density all made it worse. The fix at runtime was application-level retry — a band-aid on a known race in a deprecated component.
Workload Identity removes the race entirely: there’s no nmi, no IMDS interception, no CRD sync. The pod gets a projected service-account token on volume mount, exchanges it for an Azure token via the standard OIDC federation flow, and the only failure mode is “the federation isn’t configured” — which fails fast and loud, not intermittently.
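For intuition, the exchange the SDK performs can be reproduced by hand from inside a labeled pod (or a debug container carrying the Azure CLI), using the environment variables the Workload Identity webhook injects. The variable names below are the upstream webhook defaults, so verify them on your cluster:

az login --service-principal \
  --username "$AZURE_CLIENT_ID" \
  --tenant "$AZURE_TENANT_ID" \
  --federated-token "$(cat "$AZURE_FEDERATED_TOKEN_FILE")" \
  --allow-no-subscriptions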
Failure mode 2: cross-region trust drift
The other thing Pod Identity hid was how fragile the trust relationship was across regions. Each cluster had its own AzureIdentityBinding objects, and over time those drifted between regions — usually because someone hot-fixed one cluster and forgot to mirror to the other, or because Karpenter recreated nodes with stale AzureAssignedIdentity records that didn’t get garbage-collected cleanly.
Workload Identity makes the trust explicit and visible. Each AKS cluster exposes its own OIDC issuer URL, and each Azure managed identity carries an explicit list of federatedIdentityCredentials that name-by-name allow specific (issuer, subject, audience) triples to mint tokens. There’s nowhere for drift to hide — either the federation entry exists in Entra ID for that exact (cluster, namespace, serviceaccount) triple, or it doesn’t and the pod’s az login fails with a clear error.
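Auditing that trust is one CLI call per identity; a sketch, with illustrative identity and resource-group names:

az identity federated-credential list \
  --identity-name mi-workload-eastus \
  --resource-group rg-identity-eastus \
  -o table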
This is good. It’s also the source of the next problem.
The architecture that works
Per region, three things wire together: the AKS cluster’s OIDC issuer, a user-assigned managed identity in the same region, and an explicit federation entry on that identity.
Two managed identities, one per region, each federated to the OIDC issuer of its own cluster. Pods in eastus mint tokens that only the East US identity will accept; pods in westeurope mint tokens that only the West EU identity will accept. Regional blast-radius is preserved by construction — a compromised cluster cannot impersonate workloads in the other region.
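Both halves assume the clusters were created with the OIDC issuer and the Workload Identity webhook enabled. A sketch against the azurerm provider, with the cluster resource elided to the two relevant attributes:

resource "azurerm_kubernetes_cluster" "aks" {
  for_each = var.regions
  # ... rest of the cluster definition ...

  oidc_issuer_enabled       = true  # exposes the per-cluster OIDC issuer URL
  workload_identity_enabled = true  # enables the webhook that projects the token into labeled pods
}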
What the federation actually looks like
The Terraform side reduces to one resource per cluster — the federated identity credential pinned to a specific (issuer, subject, audience):
resource "azurerm_federated_identity_credential" "workload" {
for_each = var.regions
name = "fic-app-${each.value.location}"
resource_group_name = azurerm_resource_group.aks[each.key].name
parent_id = azurerm_user_assigned_identity.workload[each.key].id
audience = ["api://AzureADTokenExchange"]
issuer = azurerm_kubernetes_cluster.aks[each.key].oidc_issuer_url
subject = "system:serviceaccount:default:workload-identity-sa"
}
The Kubernetes side reduces to one annotation on the service account and one label on the pod:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: workload-identity-sa
  namespace: default
  annotations:
    azure.workload.identity/client-id: "${MI_CLIENT_ID}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        azure.workload.identity/use: "true"  # injects projected token
    spec:
      serviceAccountName: workload-identity-sa
      containers:
        - name: app
          image: myregistry.azurecr.io/my-app:1.0.0
That’s the whole runtime contract. Everything else in the Microsoft Learn walkthrough is plumbing around this.
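A quick way to confirm the contract took effect on a live pod, assuming the image ships a shell (the deployment name matches the example above):

kubectl exec deploy/my-app -- env | grep '^AZURE_'
# Expect AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_AUTHORITY_HOST, and
# AZURE_FEDERATED_TOKEN_FILE pointing at the projected token volume.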
Three things to budget for in the migration
The minimal Terraform/YAML above is one day of work per cluster. The rest of the week per cluster goes here.
1. Catalog every (namespace, serviceaccount) that needs an identity
Before writing any HCL, run this against every cluster in the fleet and reconcile the output:
kubectl get azureidentitybinding -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.spec.azureIdentity}{"\n"}{end}'
Across a dozen clusters this is rarely identical — there are ad-hoc bindings someone added during an incident two years ago that nobody removed. Each surviving binding becomes a federatedIdentityCredential on the new managed identity. Underestimate this catalog and the migration stalls in the gap between “Pod Identity removed” and “Workload Identity wired up for the long-tail workloads.”
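Because Pod Identity binds via the aadpodidbinding pod label rather than the service account, the service account each bound workload actually runs as has to come from the pod spec itself. A companion query that surfaces that mapping (output shape is a sketch):

kubectl get pods -A -l aadpodidbinding \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.spec.serviceAccountName}{" -> "}{.metadata.labels.aadpodidbinding}{"\n"}{end}' \
  | sort -u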
2. The OIDC issuer URL is per-cluster and changes when the cluster does
The federation entries pin to the cluster’s oidcIssuerProfile.issuerUrl. That URL is per-cluster and is regenerated when the cluster is rebuilt — for example, during a region rebuild for DR drills, or when an environment is recreated from Terraform with a different name. The federation entries do not auto-update.
The mitigation that actually holds up: drive the federation entries from Terraform in the same module as the cluster, with oidc_issuer_url as the input — never paste it as a string anywhere. The trust then re-establishes on every terraform apply. Document this as the canonical pattern; teams copying YAML from a runbook and pasting issuer URLs are the source of the next P1 incident.
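For a drift check outside Terraform, the live issuer can be read straight off the cluster and compared against the federation entries (resource names are illustrative):

az aks show -g rg-aks-eastus -n aks-eastus \
  --query "oidcIssuerProfile.issuerUrl" -o tsv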
3. Test the cutover under load, not at idle
The migration is not “Pod Identity off, Workload Identity on” in one window. The two systems coexist as long as both are installed, so the safer pattern is per-namespace cutover: enable Workload Identity, redeploy the workload, verify the pod gets a token via the new path, then remove the aadpodidbinding label.
The trap: at idle, both systems work. Under load — HPA scaling out, Karpenter provisioning new nodes, image pulls warming caches — the failure modes that make Pod Identity painful in the first place reappear in the cutover window itself. Test the cutover on a workload that is actively scaling, not a one-replica deployment. The findings will be different.
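One way to make that concrete during the cutover window, sketched on the assumption that the image ships a shell and the deployment is labeled app=my-app:

# Force a scale-out, then confirm every new pod actually received the projected token.
kubectl scale deploy/my-app --replicas=20
kubectl wait --for=condition=Ready pod -l app=my-app --timeout=120s
for p in $(kubectl get pods -l app=my-app -o name); do
  kubectl exec "$p" -- sh -c 'test -s "$AZURE_FEDERATED_TOKEN_FILE" && echo "$HOSTNAME ok" || echo "$HOSTNAME missing token"'
done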
What to plan differently next time
If running this migration again on a fresh estate, two architectural decisions would change up front:
- One managed identity per workload, not per cluster. The architecture described above has one identity per region, which works for a single workload. For an estate with many workloads, the cleaner model is one managed identity per workload with federation entries for both regional clusters on that single identity. Regional isolation is preserved by RBAC scoping, not by identity duplication, and the count of federatedIdentityCredentials to keep in sync drops by ~50% (a sketch of this layout follows the list).
- Bake the GitOps reconciler into the migration, not after it. ArgoCD or Flux reconciling the federation entries from a single Git source of truth removes the cross-region drift failure mode entirely, because there's no manual step that can desync. Doing this during the migration costs a day of setup and saves the entire ongoing operational tax.
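A sketch of the per-workload layout, with illustrative variable names (var.workloads, var.regions) and the assumption that each workload runs in its own namespace with a matching service account:

resource "azurerm_user_assigned_identity" "per_workload" {
  for_each            = var.workloads
  name                = "mi-${each.key}"
  location            = var.primary_location
  resource_group_name = azurerm_resource_group.identity.name
}

resource "azurerm_federated_identity_credential" "per_workload" {
  # One entry per (workload, regional cluster) pair, all hanging off the same identity.
  for_each = {
    for pair in setproduct(keys(var.workloads), keys(var.regions)) :
    "${pair[0]}-${pair[1]}" => { workload = pair[0], region = pair[1] }
  }
  name                = "fic-${each.key}"
  resource_group_name = azurerm_resource_group.identity.name
  parent_id           = azurerm_user_assigned_identity.per_workload[each.value.workload].id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.aks[each.value.region].oidc_issuer_url
  subject             = "system:serviceaccount:${each.value.workload}:${each.value.workload}-sa"
}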
Verdict
The migration is mandatory and the destination is genuinely better — projected tokens, no DaemonSet, explicit trust. The work that is not in the official migration guide is the catalog of long-tail bindings, the federation-URL stability problem, and the cutover-under-load discipline. Budget those three and the migration is a week per cluster of straightforward work. Skip them and it stretches into a quarter.
Divyansh Srivastav
DevOps Architect · Kubernetes Platform Engineering