Setting Up Velero for AKS Cluster Backups with Azure Workload Identity



This content originally appeared on DEV Community and was authored by Taylor Levits

I initially thought cluster backups shouldn’t be necessary, especially if you generally follow IaC principles. However, for backing up persistent storage data, point-in-time recovery, compliance, or simply peace of mind, cluster backups are extremely useful. In our current setup, application deployment is done through a self-managed CLI, while microservices are managed via IaC. If the cluster were to go down or be deleted, 70+ apps would need to be redeployed manually.

I recently completed the task of moving all of our applications from using the long-deprecated AAD Pod Identity to Azure Workload Identity for Azure resource authentication. I did this because:

  1. AAD Pod Identities have been deprecated since late 2023
  2. I wanted to upgrade our cluster to 1.31 and was tired of performing a backup using a hack described later in this post

I normally like to take a backup of the cluster before performing a Kubernetes upgrade, and we’ve used AKS backup in the past, however…

The Problem with AKS Native Backup

Previously, we would hack our way through AKS backups by temporarily enabling pod identity (we installed it through helm, not through AKS) then disabling it for the upgrade process:

az aks update --enable-pod-identity --resource-group <resource-group> --name <cluster-name>
az aks pod-identity exception add --resource-group <resource-group> --cluster-name <cluster-name> --namespace dataprotection-microsoft --pod-labels app.kubernetes.io/name=dataprotection-microsoft-kubernetes
kubectl get azurepodidentityexceptions --all-namespaces
az aks update --resource-group <resource-group> --name <cluster-name> --enable-managed-identity

Then perform the backup, followed by:

az aks update --disable-pod-identity --resource-group <resource-group> --name <cluster-name>

This approach is not sustainable! After migrating to Workload Identity, I attempted to use AKS backup and encountered this error:

Error Code: UserErrorGenericPodIdentityMisconfiguration
Message: The AKS Backup extension is unable to use its managed identity because the AAD pod-managed identity is not properly configured.

Even though we no longer use pod identities and Microsoft recommends using Workload Identity, AKS backup still appears to depend on the deprecated pod identity system. It’s hard to find documentation on this, but it seems that for AKS backups to work, I would have to add the necessary workload identity labels and service accounts to the backup extension’s pods.

Why Velero?

I considered two options:

  1. Manually create a service account, managed identity, and federated credentials, then manually annotate all dataprotection pods. If this is indeed the solution, it would be difficult to store this configuration in IaC since it’s a managed Microsoft solution.
  2. Use Velero, a popular open-source backup tool with lots of documentation and excellent examples.

I chose Velero because I can manage it through Helm and ArgoCD, rather than manual annotations and configurations, making it easier to document and maintain.

Installing and Configuring Velero in AKS

Prerequisites

  • Azure CLI installed and configured
  • kubectl configured for your AKS cluster
  • Appropriate permissions to create Azure resources

Step 1: Create Storage Account and Container

First, create the storage account and blob container where backups will be saved:

# Create storage account
az storage account create \
    --name <storage-account-name> \
    --resource-group <resource-group> \
    --location <location> \
    --sku Standard_LRS

# Create container
az storage container create \
    --name velero-backups \
    --public-access off \
    --account-name <storage-account-name>
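
Storage account names are a common stumbling block here: Azure requires 3 to 24 characters, lowercase letters and digits only, and the name must be globally unique. Below is a minimal sketch of a local pre-check. The function name is my own and is not part of the Azure CLI.

```shell
# Hypothetical helper: returns 0 if the name satisfies Azure's storage account
# naming rules (3-24 characters, lowercase letters and digits only).
is_valid_storage_account_name() {
    name=$1
    len=${#name}
    [ "$len" -ge 3 ] && [ "$len" -le 24 ] || return 1
    case $name in
        # Any character outside a-z and 0-9 makes the name invalid.
        *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
    esac
    return 0
}
```

This only checks the format. To check global availability as well, something like `az storage account check-name --name <storage-account-name>` should work before running the create command.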

Step 2: Create Custom Role for Velero

Define the permissions Velero needs to perform backups and restores:

AZURE_ROLE=Velero
AZURE_SUBSCRIPTION_ID=$(az account list --query '[?isDefault].id' -o tsv)
AZURE_BLOB_STORAGE_RESOURCE_GROUP=<blob-storage-resource-group>
AKS_RESOURCE_GROUP=<aks-resource-group, this generally starts with MC_>

az role definition create --role-definition '{
   "Name": "'$AZURE_ROLE'",
   "Description": "Velero related permissions to perform backups, restores and deletions",
   "Actions": [
       "Microsoft.Compute/disks/read",
       "Microsoft.Compute/disks/write",
       "Microsoft.Compute/disks/endGetAccess/action",
       "Microsoft.Compute/disks/beginGetAccess/action",
       "Microsoft.Compute/snapshots/read",
       "Microsoft.Compute/snapshots/write",
       "Microsoft.Compute/snapshots/delete",
       "Microsoft.Storage/storageAccounts/listkeys/action",
       "Microsoft.Storage/storageAccounts/regeneratekey/action",
       "Microsoft.Storage/storageAccounts/read",
       "Microsoft.Storage/storageAccounts/blobServices/containers/delete",
       "Microsoft.Storage/storageAccounts/blobServices/containers/read",
       "Microsoft.Storage/storageAccounts/blobServices/containers/write",
       "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action"
   ],
   "DataActions": [
     "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
     "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
     "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
     "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
     "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
   ],
   "AssignableScopes": [
     "/subscriptions/'$AZURE_SUBSCRIPTION_ID'/resourceGroups/'$AZURE_BLOB_STORAGE_RESOURCE_GROUP'",
     "/subscriptions/'$AZURE_SUBSCRIPTION_ID'/resourceGroups/'$AKS_RESOURCE_GROUP'"
   ]
}'

Step 3: Create Managed Identity

Create the managed identity that Velero will use:

IDENTITY_NAME=velero
AZURE_RESOURCE_GROUP=<resource-group>

az identity create \
    --subscription $AZURE_SUBSCRIPTION_ID \
    --resource-group $AZURE_RESOURCE_GROUP \
    --name $IDENTITY_NAME

IDENTITY_CLIENT_ID=$(az identity show -g $AZURE_RESOURCE_GROUP -n $IDENTITY_NAME --subscription $AZURE_SUBSCRIPTION_ID --query clientId -o tsv)

Step 4: Assign Role to Managed Identity

Grant the managed identity the necessary permissions to access blob storage:

PRINCIPAL_ID=$(az identity show --name $IDENTITY_NAME --resource-group $AZURE_RESOURCE_GROUP --query principalId -o tsv)

az role assignment create \
    --role $AZURE_ROLE \
    --assignee $PRINCIPAL_ID \
    --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AZURE_BLOB_STORAGE_RESOURCE_GROUP"

az role assignment create \
    --role $AZURE_ROLE \
    --assignee $PRINCIPAL_ID \
    --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AKS_RESOURCE_GROUP"
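
A role assignment only succeeds if its scope falls within the role's AssignableScopes, so it helps to build the scope strings in one place and then confirm what was actually assigned. Here is a rough sketch, assuming the variables from the previous steps are still set. Both function names are made up for illustration.

```shell
# Hypothetical helper: build a resource-group scope string from a
# subscription ID and a resource group name.
rg_scope() {
    printf '/subscriptions/%s/resourceGroups/%s\n' "$1" "$2"
}

# Hypothetical helper: list every role assignment held by the given principal.
# Requires the Azure CLI and an active login; it is not run automatically.
list_velero_assignments() {
    az role assignment list \
        --assignee "$1" \
        --all \
        --query '[].{role:roleDefinitionName, scope:scope}' \
        -o table
}
```

Running `list_velero_assignments "$PRINCIPAL_ID"` should show the Velero role twice: once for the blob storage resource group and once for the AKS node resource group.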

Step 5: Deploy Velero with Helm

This is our general folder configuration to deploy with Helm:

core/
└── velero/
    ├── Chart.yaml
    └── values.yaml

Chart.yaml:

apiVersion: v2
name: velero
description: Velero Backup and Restore
type: application
version: 0.1.0
appVersion: "1.16.1"
dependencies:
- name: velero
  version: 1.16.1
  repository: https://vmware-tanzu.github.io/helm-charts

values.yaml:

velero:
  podLabels:
    azure.workload.identity/use: "true"

  initContainers:
    - name: velero-plugin-for-microsoft-azure
      image: velero/velero-plugin-for-microsoft-azure:v1.12.1
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /target
          name: plugins

  configuration:
    backupStorageLocation:
      - name: azure
        provider: azure
        bucket: velero-backups
        default: true
        config:
          useAAD: true
          resourceGroup: "<resource-group>"
          storageAccount: "<storage-account-name>"
          subscriptionId: "<subscription-id>"

    volumeSnapshotLocation:
      - name: azure
        provider: azure
        config:
          resourceGroup: "<resource-group>"
          subscriptionId: "<subscription-id>"

  serviceAccount:
    server:
      create: true
      name: velero
      annotations:
        azure.workload.identity/client-id: "<client-id-from-managed-identity>"
        azure.workload.identity/tenant-id: "<tenant-id>"
      labels:
        azure.workload.identity/use: "true"

  credentials:
    secretContents:
      cloud: |
        AZURE_SUBSCRIPTION_ID=<subscription-id>
        AZURE_RESOURCE_GROUP=<resource-group>
        AZURE_CLOUD_NAME=AzurePublicCloud

Step 6: Create Federated Credential

After deploying Velero, create the federated credential to enable Workload Identity:

az identity federated-credential create \
    --name "kubernetes-federated-credential" \
    --identity-name "${IDENTITY_NAME}" \
    --resource-group "${AZURE_RESOURCE_GROUP}" \
    --issuer $(az aks show --name <cluster-name> --resource-group <resource-group> --query "oidcIssuerProfile.issuerUrl" -o tsv) \
    --subject "system:serviceaccount:velero:velero" \
    --audience api://AzureADTokenExchange
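
If authentication fails, the usual cause is a mismatch between the federated credential's subject or issuer and what the cluster actually presents. The subject has the form `system:serviceaccount:<namespace>:<service-account-name>`, and the cluster needs the OIDC issuer and workload identity features enabled. The sketch below prints the values that have to line up. The function names and argument order are my own assumptions.

```shell
# Hypothetical helper: build the subject claim from a namespace and a
# service account name.
federated_subject() {
    printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: print the issuer URL, the registered federated
# credentials, and the client ID annotated on the service account.
# Requires az and kubectl access; it is not run automatically.
show_federation_state() {
    cluster=$1
    cluster_rg=$2
    identity=$3
    identity_rg=$4
    az aks show --name "$cluster" --resource-group "$cluster_rg" \
        --query "oidcIssuerProfile.issuerUrl" -o tsv
    az identity federated-credential list \
        --identity-name "$identity" --resource-group "$identity_rg" \
        --query '[].{issuer:issuer, subject:subject}' -o table
    kubectl get serviceaccount velero -n velero \
        -o jsonpath='{.metadata.annotations.azure\.workload\.identity/client-id}'
}
```

For the values.yaml above, `federated_subject velero velero` gives the same subject passed to `--subject`.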

Testing Your Velero Installation

1. Create a Test Application

Install the Velero CLI according to the official documentation.

Create a simple test application to back up:

# Create a test namespace
kubectl create namespace velero-test

# Create a simple deployment
kubectl create deployment nginx-test --image=nginx -n velero-test

# Create a configmap with some data
kubectl create configmap test-config \
    --from-literal=key1=value1 \
    --from-literal=key2=value2 \
    -n velero-test

2. Perform Different Types of Backups

Basic Namespace Backup:

# Backup specific namespace
velero backup create test-backup-namespace --include-namespaces velero-test

# Check backup status
velero backup describe test-backup-namespace

Cluster-wide Backup:

# Backup entire cluster
velero backup create test-backup-cluster

# Check status
velero backup get

Backup with Labels:

# Backup resources with specific labels
velero backup create test-backup-labels --selector app=nginx-test

Scheduled Backup:

# Create a daily backup schedule
velero schedule create daily-backup \
    --schedule="0 2 * * *" \
    --include-namespaces velero-test
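
The schedule uses standard cron syntax (minute, hour, day of month, month, day of week), so `0 2 * * *` runs at 02:00 every day. Backups are kept for 30 days (720h) by default. If you want a different retention, `--ttl` takes a duration string. Here is a small sketch with a helper of my own that converts days into that format.

```shell
# Hypothetical helper: convert a number of days into a duration string
# suitable for Velero's --ttl flag.
ttl_for_days() {
    printf '%dh0m0s\n' "$(( $1 * 24 ))"
}

# Hypothetical wrapper: same schedule as above, but with an explicit
# retention (7 days unless another value is passed). Requires the velero CLI.
# Delete or rename the earlier daily-backup schedule first, since names
# must be unique.
create_daily_schedule() {
    velero schedule create daily-backup \
        --schedule="0 2 * * *" \
        --include-namespaces velero-test \
        --ttl "$(ttl_for_days "${1:-7}")"
}
```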

3. Monitor the Backup

# Watch backup progress
velero backup get

# Get detailed backup information
velero backup describe test-backup-namespace --details

# Check backup logs
velero backup logs test-backup-namespace
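
If you are scripting a backup before an upgrade, polling by hand gets old quickly. Below is a rough sketch that waits for a backup to finish by reading the phase from the Backup custom resource. It assumes Velero is installed in the `velero` namespace. Both function names are made up for illustration.

```shell
# Hypothetical helper: read the current phase of a Velero Backup resource.
get_backup_phase() {
    kubectl get backups.velero.io "$1" -n velero -o jsonpath='{.status.phase}'
}

# Hypothetical helper: poll until the backup completes (return 0), fails
# (return 1), or the attempts run out (return 2).
# Arguments: backup name, attempts (default 60), delay in seconds (default 10).
wait_for_backup() {
    name=$1
    attempts=${2:-60}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        phase=$(get_backup_phase "$name")
        case $phase in
            Completed)
                echo "backup $name completed"
                return 0 ;;
            Failed|PartiallyFailed|FailedValidation)
                echo "backup $name ended in phase $phase" >&2
                return 1 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for backup $name" >&2
    return 2
}
```

For example, `wait_for_backup test-backup-namespace && echo "safe to proceed"` blocks for up to ten minutes with the defaults.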

4. Verify Backup in Azure Storage

Check your Azure storage account to confirm backup files were created:

# List containers in your storage account
az storage container list --account-name <storage-account-name> --output table

# List backup files
az storage blob list \
    --account-name <storage-account-name> \
    --container-name velero-backups \
    --output table

5. Test Restore (Recommended)

To fully validate your backup setup:

# Delete the test namespace
kubectl delete namespace velero-test

# Restore from backup
velero restore create test-restore --from-backup test-backup-namespace

# Check restore status
velero restore describe test-restore

# Verify the namespace and resources were restored
kubectl get all -n velero-test
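
Note that `kubectl get all` does not list ConfigMaps, so it won't tell you whether the test data came back. Below is a minimal sketch that checks the two keys created earlier. The function names are my own.

```shell
# Hypothetical helper: read one key from a ConfigMap.
# Arguments: namespace, ConfigMap name, key.
get_configmap_value() {
    kubectl get configmap "$2" -n "$1" -o "jsonpath={.data.$3}"
}

# Hypothetical helper: compare the restored ConfigMap against the values
# created in the test application step. Returns 1 on any mismatch.
verify_restored_config() {
    status=0
    for pair in key1=value1 key2=value2; do
        key=${pair%%=*}
        expected=${pair#*=}
        actual=$(get_configmap_value velero-test test-config "$key")
        if [ "$actual" = "$expected" ]; then
            echo "ok: $key=$actual"
        else
            echo "mismatch: $key expected '$expected' got '$actual'" >&2
            status=1
        fi
    done
    return "$status"
}
```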

Conclusion

Velero provides a robust, GitOps-friendly alternative to AKS Backup, especially when paired with Azure Workload Identity. The CLI is intuitive and gives you the flexibility to schedule backups, control what you’re backing up, and configure your setup.

Please reach out if you have any questions!
