This content originally appeared on DEV Community and was authored by Taylor Levits
I initially thought that cluster backups shouldn’t need to be a thing, especially if you generally follow IaC principles. However, for backing up persistent storage data, point-in-time recovery, compliance, or simply peace of mind, cluster backups are extremely useful. In our current setup, application deployment is done through a self-managed CLI, while microservices are managed via IaC. If the cluster were to go down or be deleted, 70+ apps would need to be redeployed manually.
I recently completed the task of moving all of our applications from using the long-deprecated AAD Pod Identity to Azure Workload Identity for Azure resource authentication. I did this because:
- AAD Pod Identities have been deprecated since late 2023
- I wanted to upgrade our cluster to 1.31 and was tired of performing backups using the hack described later in this post
I normally like to take a backup of the cluster before performing a Kubernetes upgrade, and we’ve used AKS backup in the past. However…
The Problem with AKS Native Backup
Previously, we would hack our way through AKS backups by temporarily enabling pod identity (we installed it through Helm, not through AKS), then disabling it again for the upgrade process:
az aks update --enable-pod-identity --resource-group <resource-group> --name <cluster-name>
az aks pod-identity exception add --resource-group <resource-group> --cluster-name <cluster-name> --namespace dataprotection-microsoft --pod-labels app.kubernetes.io/name=dataprotection-microsoft-kubernetes
kubectl get azurepodidentityexceptions --all-namespaces
az aks update --resource-group <resource-group> --name <cluster-name> --enable-managed-identity
Then perform the backup, followed by:
az aks update --disable-pod-identity --resource-group <resource-group> --name <cluster-name>
This approach is not sustainable! After migrating to Workload Identity, I attempted to use AKS backup and encountered this error:
Error Code: UserErrorGenericPodIdentityMisconfiguration
Message: The AKS Backup extension is unable to use its managed identity because the AAD pod-managed identity is not properly configured.
Even though we no longer use pod identities and Microsoft recommends using Workload Identity, AKS backup still appears to depend on the deprecated pod identity system. It’s hard to find documentation on this, but it seems that for AKS backups to work, I would have to add the necessary Workload Identity labels and service accounts to the dataprotection pods.
Why Velero?
I considered two options:
- Manually create a service account, managed identity, and federated credentials, then manually annotate all dataprotection pods. If this is indeed the solution, it would be difficult to store this configuration in IaC since it’s a managed Microsoft solution.
- Use Velero, a popular open-source backup service with lots of documentation and excellent examples.
I chose Velero because I can manage it through Helm and ArgoCD, rather than manual annotations and configurations, making it easier to document and maintain.
Installing and Configuring Velero in AKS
Prerequisites
- Azure CLI installed and configured
- kubectl configured for your AKS cluster
- Appropriate permissions to create Azure resources
Step 1: Create Storage Account and Container
First, create the storage account and blob container where backups will be saved:
# Create storage account
az storage account create \
  --name <storage-account-name> \
  --resource-group <resource-group> \
  --location <location> \
  --sku Standard_LRS
# Create container
az storage container create \
  --name velero-backups \
  --public-access off \
  --account-name <storage-account-name>
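Storage account names are restricted to 3-24 characters of lowercase letters and digits, so it can save a failed `az` call to check the name first. The following is a minimal sketch; the function name `valid_storage_account_name` and the example value `velerobackups01` are hypothetical and not part of the original setup.

```shell
# Returns 0 if the name is 3-24 characters long and contains only
# lowercase letters and digits (Azure storage account naming rule).
valid_storage_account_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# Hypothetical example value; replace with your own account name.
STORAGE_ACCOUNT_NAME=velerobackups01

if valid_storage_account_name "$STORAGE_ACCOUNT_NAME"; then
  echo "storage account name looks valid: $STORAGE_ACCOUNT_NAME"
else
  echo "invalid storage account name: $STORAGE_ACCOUNT_NAME" >&2
fi
```

The check only covers the naming rule; the name must also be globally unique, which only Azure can confirm.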
Step 2: Create Custom Role for Velero
Define the permissions Velero needs to perform backups and restores:
AZURE_ROLE=Velero
AZURE_SUBSCRIPTION_ID=$(az account list --query '[?isDefault].id' -o tsv)
AZURE_BLOB_STORAGE_RESOURCE_GROUP=<blob-storage-resource-group>
AKS_RESOURCE_GROUP=<aks-node-resource-group>  # generally starts with MC_
az role definition create --role-definition '{
  "Name": "'$AZURE_ROLE'",
  "Description": "Velero related permissions to perform backups, restores and deletions",
  "Actions": [
    "Microsoft.Compute/disks/read",
    "Microsoft.Compute/disks/write",
    "Microsoft.Compute/disks/endGetAccess/action",
    "Microsoft.Compute/disks/beginGetAccess/action",
    "Microsoft.Compute/snapshots/read",
    "Microsoft.Compute/snapshots/write",
    "Microsoft.Compute/snapshots/delete",
    "Microsoft.Storage/storageAccounts/listkeys/action",
    "Microsoft.Storage/storageAccounts/regeneratekey/action",
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/write",
    "Microsoft.Storage/storageAccounts/blobServices/generateUserDelegationKey/action"
  ],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
  ],
  "AssignableScopes": [
    "/subscriptions/'$AZURE_SUBSCRIPTION_ID'/resourceGroups/'$AZURE_BLOB_STORAGE_RESOURCE_GROUP'",
    "/subscriptions/'$AZURE_SUBSCRIPTION_ID'/resourceGroups/'$AKS_RESOURCE_GROUP'"
  ]
}'
Step 3: Create Managed Identity
Create the managed identity that Velero will use:
IDENTITY_NAME=velero
AZURE_RESOURCE_GROUP=<resource-group>
az identity create \
  --subscription $AZURE_SUBSCRIPTION_ID \
  --resource-group $AZURE_RESOURCE_GROUP \
  --name $IDENTITY_NAME
IDENTITY_CLIENT_ID=$(az identity show -g $AZURE_RESOURCE_GROUP -n $IDENTITY_NAME --subscription $AZURE_SUBSCRIPTION_ID --query clientId -o tsv)
Step 4: Assign Role to Managed Identity
Grant the managed identity the custom role on both resource groups listed in the role’s AssignableScopes: the one holding the blob storage account and the AKS node resource group (used for disk snapshots):
PRINCIPAL_ID=$(az identity show --name $IDENTITY_NAME --resource-group $AZURE_RESOURCE_GROUP --query principalId -o tsv)
az role assignment create \
  --role $AZURE_ROLE \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AZURE_BLOB_STORAGE_RESOURCE_GROUP"
az role assignment create \
  --role $AZURE_ROLE \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/$AKS_RESOURCE_GROUP"
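Role assignments that silently land on the wrong scope are a common cause of later 403 errors, so it can help to build the scope string in one place and list what the identity actually holds. This is a sketch only: the helper names `rg_scope` and `show_velero_assignments` are hypothetical, and the second function needs an authenticated `az` session, so it is defined here but not invoked.

```shell
# Print the ARM resource ID of a resource group, the form --scope expects.
# $1 = subscription ID, $2 = resource group name
rg_scope() {
  printf '/subscriptions/%s/resourceGroups/%s\n' "$1" "$2"
}

# List the role assignments a principal holds at a resource-group scope.
# $1 = principal (object) ID, $2 = subscription ID, $3 = resource group name
show_velero_assignments() {
  az role assignment list \
    --assignee "$1" \
    --scope "$(rg_scope "$2" "$3")" \
    --query '[].{role:roleDefinitionName, scope:scope}' \
    --output table
}

# Example (assumes the variables from the steps above are set):
# show_velero_assignments "$PRINCIPAL_ID" "$AZURE_SUBSCRIPTION_ID" "$AKS_RESOURCE_GROUP"
```

If the custom role does not show up for a resource group, the corresponding assignment probably needs to be re-created against that scope.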
Step 5: Deploy Velero with Helm
This is our general folder configuration to deploy with Helm:
core/
└── velero/
    ├── Chart.yaml
    └── values.yaml
Chart.yaml:
apiVersion: v2
name: velero
description: Velero Backup and Restore
type: application
version: 0.1.0
appVersion: "1.16.1"
dependencies:
  - name: velero
    version: 1.16.1
    repository: https://vmware-tanzu.github.io/helm-charts
values.yaml:
velero:
  podLabels:
    azure.workload.identity/use: "true"
  initContainers:
    - name: velero-plugin-for-microsoft-azure
      image: velero/velero-plugin-for-microsoft-azure:v1.12.1
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /target
          name: plugins
  configuration:
    backupStorageLocation:
      - name: azure
        provider: azure
        bucket: velero-backups
        default: true
        config:
          useAAD: true
          resourceGroup: "<resource-group>"
          storageAccount: "<storage-account-name>"
          subscriptionId: "<subscription-id>"
    volumeSnapshotLocation:
      - name: azure
        provider: azure
        config:
          resourceGroup: "<resource-group>"
          subscriptionId: "<subscription-id>"
  serviceAccount:
    server:
      create: true
      name: velero
      annotations:
        azure.workload.identity/client-id: "<client-id-from-managed-identity>"
        azure.workload.identity/tenant-id: "<tenant-id>"
      labels:
        azure.workload.identity/use: "true"
  credentials:
    secretContents:
      cloud: |
        AZURE_SUBSCRIPTION_ID=<subscription-id>
        AZURE_RESOURCE_GROUP=<resource-group>
        AZURE_CLOUD_NAME=AzurePublicCloud
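The identity-related placeholders in values.yaml map onto values already collected in the earlier steps. As a rough sketch, a small helper can substitute them; `render_values` is a hypothetical name, the approach assumes the values contain no `|` or `&` characters (true for GUIDs), and the `<resource-group>` and `<storage-account-name>` placeholders are deliberately left for you to fill in.

```shell
# Substitute the identity placeholders in a values file and print the result.
# $1 = values file, $2 = client ID, $3 = tenant ID, $4 = subscription ID
render_values() {
  sed \
    -e "s|<client-id-from-managed-identity>|$2|g" \
    -e "s|<tenant-id>|$3|g" \
    -e "s|<subscription-id>|$4|g" \
    "$1"
}

# Example (assumes an authenticated az session and the variables from Steps 2-3):
# TENANT_ID=$(az account show --query tenantId -o tsv)
# render_values values.yaml "$IDENTITY_CLIENT_ID" "$TENANT_ID" "$AZURE_SUBSCRIPTION_ID" > values.rendered.yaml
```

In a GitOps setup you would more likely commit the filled-in values directly; the helper is mainly useful for a first local test.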
Step 6: Create Federated Credential
After deploying Velero, create the federated credential to enable Workload Identity:
az identity federated-credential create \
  --name "kubernetes-federated-credential" \
  --identity-name "${IDENTITY_NAME}" \
  --resource-group "${AZURE_RESOURCE_GROUP}" \
  --issuer "$(az aks show --name <cluster-name> --resource-group <resource-group> --query "oidcIssuerProfile.issuerUrl" -o tsv)" \
  --subject "system:serviceaccount:velero:velero" \
  --audience api://AzureADTokenExchange
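The `--subject` value has to match the namespace and service account name exactly, in the form `system:serviceaccount:<namespace>:<service-account>`, and the Workload Identity webhook should then inject a set of `AZURE_*` environment variables into the Velero pod. The sketch below covers both checks; `wi_subject` and `missing_wi_env` are hypothetical helper names, and the `velero` namespace is an assumption based on the subject used above.

```shell
# Build the federated credential subject for a service account.
# $1 = namespace, $2 = service account name
wi_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Read environment variable names (one per line) on stdin and print the
# ones the Workload Identity webhook normally injects but that are absent.
missing_wi_env() {
  present=$(cat)
  for var in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_FEDERATED_TOKEN_FILE AZURE_AUTHORITY_HOST; do
    printf '%s\n' "$present" | grep -qx "$var" || printf '%s\n' "$var"
  done
}

# Example (replace the pod name with your Velero pod; namespace is assumed):
# kubectl get pod <velero-pod> -n velero \
#   -o jsonpath='{range .spec.containers[0].env[*]}{.name}{"\n"}{end}' | missing_wi_env
```

An empty result from `missing_wi_env` suggests the pod label and service account annotations were picked up; any names it prints point to a pod that was created before the label was applied or a webhook that is not running.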
Testing Your Velero Installation
1. Create a Test Application
Install the Velero CLI according to the official documentation, then create a simple test application to back up:
# Create a test namespace
kubectl create namespace velero-test
# Create a simple deployment
kubectl create deployment nginx-test --image=nginx -n velero-test
# Create a configmap with some data
kubectl create configmap test-config \
  --from-literal=key1=value1 \
  --from-literal=key2=value2 \
  -n velero-test
2. Perform Different Types of Backups
Basic Namespace Backup:
# Backup specific namespace
velero backup create test-backup-namespace --include-namespaces velero-test
# Check backup status
velero backup describe test-backup-namespace
Cluster-wide Backup:
# Backup entire cluster
velero backup create test-backup-cluster
# Check status
velero backup get
Backup with Labels:
# Backup resources with specific labels
velero backup create test-backup-labels --selector app=nginx-test
Scheduled Backup:
# Create a daily backup schedule
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces velero-test
3. Monitor the Backup
# Watch backup progress
velero backup get
# Get detailed backup information
velero backup describe test-backup-namespace --details
# Check backup logs
velero backup logs test-backup-namespace
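When a backup runs as part of a script (for example before a cluster upgrade), polling by hand gets tedious. The following sketch waits until the Backup resource reaches a terminal phase. The function names are hypothetical, the `velero` namespace default is an assumption, and the polling interval and attempt limit are arbitrary choices.

```shell
# Returns 0 if the given Velero backup phase is a final one.
backup_phase_is_terminal() {
  case $1 in
    Completed|PartiallyFailed|Failed|FailedValidation) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a backup until it finishes; succeeds only if the phase is Completed.
# $1 = backup name, $2 = namespace Velero runs in (defaults to "velero")
wait_for_backup() {
  name=$1
  ns=${2:-velero}
  attempts=0
  while [ "$attempts" -lt 120 ]; do
    phase=$(kubectl get backups.velero.io "$name" -n "$ns" -o jsonpath='{.status.phase}')
    if backup_phase_is_terminal "$phase"; then
      echo "$phase"
      [ "$phase" = Completed ]
      return
    fi
    attempts=$((attempts + 1))
    sleep 5
  done
  echo "timed out waiting for backup $name" >&2
  return 1
}

# Example:
# wait_for_backup test-backup-namespace
```

For interactive use, `velero backup create ... --wait` blocks until the backup finishes and is usually simpler.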
4. Verify Backup in Azure Storage
Check your Azure storage account to confirm backup files were created:
# List containers in your storage account
az storage container list --account-name <storage-account-name> --output table
# List backup files
az storage blob list \
  --account-name <storage-account-name> \
  --container-name velero-backups \
  --output table
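Velero stores each backup under `backups/<backup-name>/` in the container, so the listing can be narrowed to a single backup. A minimal sketch follows, assuming no extra prefix is configured on the backup storage location (none is set in the values.yaml above) and that your own user has blob read access for `--auth-mode login`. The helper names are hypothetical.

```shell
# Print the blob prefix under which Velero stores a given backup.
# $1 = backup name
backup_blob_prefix() {
  printf 'backups/%s/\n' "$1"
}

# List only the blobs belonging to one backup.
# $1 = storage account name, $2 = container name, $3 = backup name
list_backup_blobs() {
  az storage blob list \
    --account-name "$1" \
    --container-name "$2" \
    --prefix "$(backup_blob_prefix "$3")" \
    --auth-mode login \
    --query '[].name' \
    --output tsv
}

# Example:
# list_backup_blobs <storage-account-name> velero-backups test-backup-namespace
```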
5. Test Restore (Recommended)
To fully validate your backup setup:
# Delete the test namespace
kubectl delete namespace velero-test
# Restore from backup
velero restore create test-restore --from-backup test-backup-namespace
# Check restore status
velero restore describe test-restore
# Verify the namespace and resources were restored
kubectl get all -n velero-test
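Eyeballing `kubectl get all` works for a small test, but `all` does not include ConfigMaps, so a restored namespace can look complete while data is missing. One way to check, sketched here with a hypothetical helper `missing_after_restore`, is to capture resource names before deleting the namespace and compare them with the list after the restore.

```shell
# Print the lines present in the "before" file but missing from the "after" file.
# $1 = file captured before deletion, $2 = file captured after restore
missing_after_restore() {
  before_sorted=$(mktemp)
  after_sorted=$(mktemp)
  sort "$1" > "$before_sorted"
  sort "$2" > "$after_sorted"
  # comm -23 keeps lines unique to the first input.
  comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}

# Example:
# kubectl get deployments,configmaps -n velero-test -o name > before.txt   # before deleting
# kubectl get deployments,configmaps -n velero-test -o name > after.txt    # after the restore
# missing_after_restore before.txt after.txt
```

No output means every resource captured beforehand is present again; it does not compare the contents of those resources.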
Conclusion
Velero provides a robust, GitOps-friendly alternative to AKS Backup, especially when paired with Azure Workload Identity. The CLI is intuitive and gives you the flexibility to schedule backups, control what you’re backing up, and configure the setup to suit your needs.
Please reach out if you have any questions!
References
- Velero Documentation
- Velero Azure Plugin
- Azure Workload Identity Documentation
- AKS Backup Overview
- Azure CLI Reference
- Kubernetes Documentation
- Helm Documentation
- ArgoCD Documentation