Automatic Cleanup
This document explains the automatic PVC cleanup feature that removes cloned PVCs after successful backup operations, helping to manage storage resources efficiently.
Overview
The automatic cleanup feature allows the DataMover Operator to automatically delete cloned PVCs after successful data synchronization. This helps prevent storage waste, reduces operational overhead, and maintains a clean Kubernetes environment.
Feature Configuration
Enabling Automatic Cleanup
Enable automatic cleanup in your DataMover specification:
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: auto-cleanup-backup
spec:
  sourcePvc: "app-data"
  secretName: "storage-credentials"
  deletePvcAfterBackup: true # Enable automatic cleanup
Disabling Automatic Cleanup
Keep cloned PVCs for manual management:
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: manual-cleanup-backup
spec:
  sourcePvc: "app-data"
  secretName: "storage-credentials"
  deletePvcAfterBackup: false # Disable automatic cleanup
Default Behavior: deletePvcAfterBackup defaults to false, so cloned PVCs are kept unless you explicitly enable cleanup.
Cleanup Workflow
Phase Progression
When automatic cleanup is enabled, the DataMover follows this workflow:
- CreatingClonedPVC: Create clone of source PVC
- ClonedPVCReady: Clone is bound and ready
- CreatingPod: Execute rclone job for data sync
- CleaningUp: Delete cloned PVC (if deletePvcAfterBackup: true)
- Completed: Operation finished successfully
Phase Progression Without Cleanup
When automatic cleanup is disabled:
- CreatingClonedPVC: Create clone of source PVC
- ClonedPVCReady: Clone is bound and ready
- CreatingPod: Execute rclone job for data sync
- Completed: Operation finished (clone PVC remains)
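The two progressions above differ only in what follows CreatingPod. A minimal Go sketch of that branching, assuming a plain string type for phases; `NextPhase` and `Walk` are hypothetical helpers for illustration, not the operator's actual transition code:

```go
package main

import "fmt"

// Phase names are taken from the workflow lists above.
type Phase string

const (
	CreatingClonedPVC Phase = "CreatingClonedPVC"
	ClonedPVCReady    Phase = "ClonedPVCReady"
	CreatingPod       Phase = "CreatingPod"
	CleaningUp        Phase = "CleaningUp"
	Completed         Phase = "Completed"
)

// NextPhase returns the phase that follows current on the success path.
// deletePvcAfterBackup mirrors the spec field of the same name.
// Hypothetical helper: the operator's real transition logic may differ.
func NextPhase(current Phase, deletePvcAfterBackup bool) Phase {
	switch current {
	case CreatingClonedPVC:
		return ClonedPVCReady
	case ClonedPVCReady:
		return CreatingPod
	case CreatingPod:
		// The only branch point: cleanup is inserted before Completed.
		if deletePvcAfterBackup {
			return CleaningUp
		}
		return Completed
	default:
		// CleaningUp, Completed, and unknown phases all end at Completed.
		return Completed
	}
}

// Walk lists every phase from CreatingClonedPVC through Completed.
func Walk(deletePvcAfterBackup bool) []Phase {
	phases := []Phase{CreatingClonedPVC}
	for p := CreatingClonedPVC; p != Completed; {
		p = NextPhase(p, deletePvcAfterBackup)
		phases = append(phases, p)
	}
	return phases
}

func main() {
	fmt.Println("with cleanup:   ", Walk(true))
	fmt.Println("without cleanup:", Walk(false))
}
```

Only the success path is modeled here; failure handling is left out on purpose.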
Cleanup Trigger
Cleanup is triggered only after:
- ✅ Successful job completion: Rclone job completes successfully
- ✅ Data sync verification: Sync operation reports success
- ✅ Status confirmation: Job status shows completion
Cleanup is NOT triggered when:
- ❌ Job fails: Any failure prevents cleanup
- ❌ Sync errors: Data synchronization errors prevent cleanup
- ❌ Operator errors: Internal operator errors prevent cleanup
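The trigger conditions above reduce to a single predicate. The sketch below is one way to express it in Go; the `JobOutcome` struct and its field names are assumptions made for illustration and do not come from the operator's API:

```go
package main

import "fmt"

// JobOutcome is a hypothetical summary of one sync run.
type JobOutcome struct {
	JobSucceeded  bool // the rclone job completed successfully
	SyncSucceeded bool // the sync operation reported success
	OperatorError bool // the operator hit an internal error
}

// ShouldCleanup reports whether the cloned PVC may be deleted.
// Cleanup needs the feature flag plus every success condition;
// any single failure keeps the clone in place.
func ShouldCleanup(deletePvcAfterBackup bool, o JobOutcome) bool {
	if !deletePvcAfterBackup {
		return false
	}
	return o.JobSucceeded && o.SyncSucceeded && !o.OperatorError
}

func main() {
	ok := JobOutcome{JobSucceeded: true, SyncSucceeded: true}
	syncFailed := JobOutcome{JobSucceeded: true, SyncSucceeded: false}

	fmt.Println(ShouldCleanup(true, ok))         // all conditions met
	fmt.Println(ShouldCleanup(true, syncFailed)) // sync error blocks cleanup
	fmt.Println(ShouldCleanup(false, ok))        // feature disabled
}
```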
Implementation Details
Cleanup Logic
The cleanup process involves:
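// Simplified cleanup logic (assumes the usual imports: context, corev1,
// metav1, and sigs.k8s.io/controller-runtime/pkg/client)
func (r *DataMoverReconciler) cleanupClonedPVC(ctx context.Context, dm *datamoverv1alpha1.DataMover) error {
	if dm.Status.RestoredPVCName == "" {
		// No PVC to clean up
		return nil
	}

	// Reference the cloned PVC by name and namespace
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{
			Name:      dm.Status.RestoredPVCName,
			Namespace: dm.Namespace,
		},
	}

	// Delete the cloned PVC; a PVC that is already gone is not an error
	if err := r.Delete(ctx, pvc); client.IgnoreNotFound(err) != nil {
		// Record the failure and return the error so reconciliation retries
		metrics.RecordCleanupOperation("failure", dm.Namespace)
		return err
	}

	// Record cleanup metrics
	metrics.RecordCleanupOperation("success", dm.Namespace)
	return nil
}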
Error Handling
If cleanup fails:
- Retry: Cleanup is retried on subsequent reconciliation
- Logging: Failure is logged with detailed error information
- Metrics: Cleanup failure is recorded in metrics
- Status: DataMover phase remains "CleaningUp" until successful
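The retry behavior described above can be simulated without a cluster. This is a rough sketch under stated assumptions: `State`, `Deleter`, `ReconcileCleanup`, and `FlakyDeleter` are hypothetical stand-ins, and the `Failures` counter only imitates the failure metric.

```go
package main

import (
	"errors"
	"fmt"
)

// Deleter abstracts the PVC delete call so the flow runs without a cluster.
type Deleter func(pvcName string) error

// State is a hypothetical, trimmed-down view of the DataMover status.
type State struct {
	Phase     string
	ClonedPVC string
	Failures  int // stands in for the cleanup failure metric
}

// ReconcileCleanup performs one cleanup attempt. On failure the phase stays
// "CleaningUp" and the error is returned, so the next reconciliation retries.
func ReconcileCleanup(s *State, del Deleter) error {
	if s.Phase != "CleaningUp" {
		return nil
	}
	if err := del(s.ClonedPVC); err != nil {
		s.Failures++
		return fmt.Errorf("deleting cloned PVC %q: %w", s.ClonedPVC, err)
	}
	s.Phase = "Completed"
	return nil
}

// FlakyDeleter fails the first n calls, then succeeds.
func FlakyDeleter(n int) Deleter {
	calls := 0
	return func(string) error {
		calls++
		if calls <= n {
			return errors.New("pvc is still attached to a pod")
		}
		return nil
	}
}

func main() {
	s := &State{Phase: "CleaningUp", ClonedPVC: "restored-app-data-20240806143052"}
	del := FlakyDeleter(2)

	// Each loop iteration plays the role of one reconciliation pass.
	for s.Phase == "CleaningUp" {
		if err := ReconcileCleanup(s, del); err != nil {
			fmt.Println("will retry:", err)
		}
	}
	fmt.Printf("phase=%s failed attempts=%d\n", s.Phase, s.Failures)
}
```

A real controller would requeue with backoff rather than loop; the loop here only makes the retry visible.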
Safety Mechanisms
The cleanup process includes safety checks:
- PVC Ownership: Only delete PVCs created by the operator
- Status Verification: Confirm successful job completion before cleanup
- Error Recovery: Handle partial cleanup scenarios gracefully
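As an illustration of the first two checks, the sketch below gates deletion on an ownership label and on job success. The label key and value are borrowed from the kubectl selector in the Debug Commands section; whether the operator checks this exact label before deleting is an assumption, and `PVC` and `SafeToDelete` are hypothetical names.

```go
package main

import "fmt"

// Label key and value as used by the kubectl selector in Debug Commands.
// Assumption: the operator stamps its clones with this label.
const (
	createdByLabel = "app.kubernetes.io/created-by"
	operatorName   = "datamover-operator"
)

// PVC is a hypothetical stand-in for corev1.PersistentVolumeClaim.
type PVC struct {
	Name   string
	Labels map[string]string
}

// SafeToDelete reports whether a PVC passes both safety checks:
// it was created by the operator and the sync job succeeded.
func SafeToDelete(pvc PVC, jobSucceeded bool) bool {
	ownedByOperator := pvc.Labels[createdByLabel] == operatorName
	return ownedByOperator && jobSucceeded
}

func main() {
	clone := PVC{
		Name:   "restored-app-data-20240806143052",
		Labels: map[string]string{createdByLabel: operatorName},
	}
	source := PVC{Name: "app-data"} // user-created, carries no operator label

	fmt.Println(SafeToDelete(clone, true))  // operator-owned, job succeeded
	fmt.Println(SafeToDelete(source, true)) // not operator-owned
	fmt.Println(SafeToDelete(clone, false)) // job did not succeed
}
```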
Benefits
Storage Management
Automatic cleanup provides:
- Cost Reduction: Eliminates storage costs for temporary clones
- Resource Efficiency: Prevents storage quota exhaustion
- Clean Environment: Maintains organized Kubernetes resources
Operational Benefits
- Reduced Manual Work: No need for manual PVC cleanup
- Consistent Behavior: Predictable resource lifecycle
- Automation: Fits well into automated backup workflows
Example Storage Savings
Consider a backup operation for a 100GB PVC:
Without Cleanup:
Original PVC: 100GB (permanent)
Clone PVC: 100GB (remains after backup)
Total Usage: 200GB
With Cleanup:
Original PVC: 100GB (permanent)
Clone PVC: 100GB (deleted after backup)
Total Usage: 100GB after completion
Savings: 100GB (50%) less storage held after each backup operation. Peak usage while the backup runs is still 200GB.
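The same arithmetic can be written as a small Go function. It assumes the clone is accounted at the full size of the source PVC (for example against a storage quota), which may not hold for copy-on-write backends; `StorageUsage` and `Usage` are hypothetical names.

```go
package main

import "fmt"

// Usage holds storage figures in GB for one backup operation.
type Usage struct {
	PeakGB   int // while source and clone both exist
	SteadyGB int // after the operation has finished
}

// StorageUsage models the example above. Assumption: the clone is
// accounted at the full size of the source PVC.
func StorageUsage(sourceGB int, deletePvcAfterBackup bool) Usage {
	u := Usage{PeakGB: 2 * sourceGB, SteadyGB: 2 * sourceGB}
	if deletePvcAfterBackup {
		// The clone is removed, leaving only the source PVC.
		u.SteadyGB = sourceGB
	}
	return u
}

func main() {
	kept := StorageUsage(100, false)
	cleaned := StorageUsage(100, true)
	saved := kept.SteadyGB - cleaned.SteadyGB

	fmt.Printf("without cleanup: %+v\n", kept)
	fmt.Printf("with cleanup:    %+v\n", cleaned)
	fmt.Printf("saved: %dGB (%d%%)\n", saved, 100*saved/kept.SteadyGB)
}
```

Peak usage is identical in both cases, which is why quota planning still has to allow for the clone.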
Monitoring Cleanup Operations
Metrics
The operator provides Prometheus metrics for cleanup operations:
# Cleanup operation counters
datamover_cleanup_operations_total{status="success", namespace="default"}
datamover_cleanup_operations_total{status="failure", namespace="default"}
# Phase duration including cleanup
datamover_phase_duration_seconds{phase="CleaningUp", namespace="default"}
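To turn these counters into a quick health check, one option is to parse the text exposition output of the metrics endpoint and sum the cleanup counter per status. The sketch below uses only the Go standard library and a hand-written sample; the sample values are invented for illustration, and `CleanupCounts` is a hypothetical helper rather than part of the operator.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

const cleanupMetric = "datamover_cleanup_operations_total"

var statusLabel = regexp.MustCompile(`status="([^"]*)"`)

// CleanupCounts sums the cleanup counter per status label, across all
// namespaces, from Prometheus text exposition output.
func CleanupCounts(exposition string) (map[string]float64, error) {
	counts := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, cleanupMetric+"{") {
			continue // comments, other metrics, blank lines
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		m := statusLabel.FindStringSubmatch(line[:end])
		fields := strings.Fields(line[end+1:]) // sample value, optional timestamp
		if m == nil || len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("parsing sample %q: %w", line, err)
		}
		counts[m[1]] += v
	}
	return counts, sc.Err()
}

func main() {
	// Invented sample data in the exposition format.
	sample := `# TYPE datamover_cleanup_operations_total counter
datamover_cleanup_operations_total{namespace="default",status="success"} 12
datamover_cleanup_operations_total{namespace="default",status="failure"} 1
datamover_cleanup_operations_total{namespace="prod",status="success"} 30
`
	counts, err := CleanupCounts(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("success=%v failure=%v\n", counts["success"], counts["failure"])
}
```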
Status Tracking
Monitor cleanup through DataMover status:
# Watch cleanup progress
kubectl get datamover my-backup -w
# Check detailed status
kubectl describe datamover my-backup
Expected output during cleanup:
status:
  phase: "CleaningUp"
  restoredPvcName: "restored-app-data-20240806143052"
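The clone name in this output appears to end in a 14-digit timestamp (YYYYMMDDhhmmss). That naming scheme is inferred from the single example above and is not a documented guarantee. Under that assumption, a small Go helper can recover the creation time of a clone, which is handy for spotting clones that have outlived their backup; `CloneCreatedAt` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// CloneCreatedAt extracts the trailing timestamp from a cloned PVC name.
// Assumption: names end in "-YYYYMMDDhhmmss", as in the example output.
func CloneCreatedAt(name string) (time.Time, error) {
	i := strings.LastIndex(name, "-")
	if i < 0 || len(name)-i-1 != 14 {
		return time.Time{}, fmt.Errorf("no 14-digit timestamp suffix in %q", name)
	}
	// Go reference layout for YYYYMMDDhhmmss; the result is in UTC.
	return time.Parse("20060102150405", name[i+1:])
}

func main() {
	created, err := CloneCreatedAt("restored-app-data-20240806143052")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("clone created at:", created.Format(time.RFC3339))
	fmt.Println("age:", time.Since(created).Round(time.Hour))
}
```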
Logging
Monitor cleanup operations through operator logs:
# View cleanup logs
kubectl logs -n datamover-operator-system deployment/datamover-operator-controller-manager | grep -i cleanup
# Example log entries
# INFO Cleaning up cloned PVC {"pvc": "restored-app-data-20240806143052"}
# INFO Successfully deleted cloned PVC {"pvc": "restored-app-data-20240806143052"}
Use Cases
1. Automated Backup Workflows
Perfect for scheduled backups where clones are temporary:
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: nightly-backup
spec:
  sourcePvc: "production-data"
  secretName: "backup-credentials"
  deletePvcAfterBackup: true
  addTimestampPrefix: true
Workflow:
1. Clone production PVC
2. Sync to timestamped backup location
3. Automatically delete clone
4. Preserve only original PVC
2. Development Environment Snapshots
For development workflows where clones are not needed after sync:
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: dev-snapshot
spec:
  sourcePvc: "dev-workspace"
  secretName: "dev-storage"
  deletePvcAfterBackup: true
3. Compliance Backups
For compliance where only the backup copy matters:
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: compliance-backup
spec:
  sourcePvc: "financial-records"
  secretName: "compliance-storage"
  deletePvcAfterBackup: true
  addTimestampPrefix: true
When NOT to Use Cleanup
1. Clone Analysis Workflows
When you need to analyze or compare cloned data:
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: data-analysis
spec:
  sourcePvc: "production-data"
  secretName: "storage-credentials"
  deletePvcAfterBackup: false # Keep clone for analysis
2. Multi-Stage Backups
When clones are used in multiple backup stages:
# First stage: Create clone and initial backup
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: stage1-backup
spec:
  sourcePvc: "app-data"
  secretName: "primary-storage"
  deletePvcAfterBackup: false # Keep for stage 2
# Second stage: Use same clone for secondary backup
# (would reference the same cloned PVC)
3. Debugging Scenarios
When troubleshooting backup issues:
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: debug-backup
spec:
  sourcePvc: "problematic-data"
  secretName: "storage-credentials"
  deletePvcAfterBackup: false # Keep clone for debugging
Troubleshooting
Common Issues
1. Cleanup Stuck in Progress
Symptoms: DataMover phase remains "CleaningUp"
Possible Causes:
- PVC has active pod attachments
- PVC finalizers preventing deletion
- RBAC permission issues
Diagnosis:
# Check PVC status
kubectl get pvc <cloned-pvc-name>
# Check for pods using the PVC (listed under "Used By" in the output)
kubectl describe pvc <cloned-pvc-name>
# Check PVC finalizers
kubectl get pvc <cloned-pvc-name> -o jsonpath='{.metadata.finalizers}'
# Check operator permissions
kubectl auth can-i delete persistentvolumeclaims --as=system:serviceaccount:datamover-operator-system:datamover-operator-controller-manager
2. Cleanup Fails After Successful Sync
Symptoms: Job succeeds but cleanup fails
Possible Causes:
- PVC in use by other processes
- Storage class deletion policies
- Volume attachment issues
Solutions:
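# Force PVC deletion by clearing its finalizers. This bypasses protections such as
# kubernetes.io/pvc-protection, so use it only when no pod still mounts the volume.
kubectl patch pvc <cloned-pvc-name> -p '{"metadata":{"finalizers":[]}}' --type=merge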
# Check for volume attachments
kubectl get volumeattachment | grep <pv-name>
3. Metrics Not Recording Cleanup
Symptoms: Cleanup happens but metrics not updated
Diagnosis:
# Check operator logs for metric errors
kubectl logs -n datamover-operator-system deployment/datamover-operator-controller-manager | grep -i metric
# Verify Prometheus scraping
curl http://operator-metrics-service:8080/metrics | grep cleanup
Debug Commands
# Monitor cleanup process
kubectl get datamover <name> -w
# Check cleanup logs
kubectl logs -n datamover-operator-system deployment/datamover-operator-controller-manager | grep -i "cleanup\|delete"
# List PVCs created by operator
kubectl get pvc -l app.kubernetes.io/created-by=datamover-operator
# Check PVC deletion events
kubectl get events --field-selector involvedObject.kind=PersistentVolumeClaim
Best Practices
1. Resource Planning
Consider cleanup in resource planning:
- Temporary Storage: Plan for peak usage during clone creation
- Cleanup Timing: Consider cleanup duration in scheduling
- Quota Management: Account for temporary storage quota usage
2. Monitoring
Set up monitoring for cleanup operations:
# Example Prometheus alert
groups:
  - name: datamover.cleanup
    rules:
      - alert: DataMoverCleanupFailing
        expr: increase(datamover_cleanup_operations_total{status="failure"}[5m]) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "DataMover cleanup operations are failing"
          description: "DataMover cleanup failures in namespace {{ $labels.namespace }}"
3. Backup Verification
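Confirm independently that the backup reached its destination. With automatic cleanup enabled, the clone is deleted as soon as the job reports success, so this check is your own confirmation that the data is there:
# List today's backup directories in the bucket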
rclone lsd s3:my-bucket/ | grep $(date +%Y-%m-%d)
4. Testing
Test cleanup behavior:
# Test cleanup with small PVC
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: cleanup-test
spec:
  sourcePvc: "test-data"
  secretName: "test-credentials"
  deletePvcAfterBackup: true
5. Documentation
Document cleanup policies in your backup procedures:
- When cleanup is enabled/disabled
- Storage impact of cleanup decisions
- Recovery procedures if cleanup fails
Advanced Scenarios
Conditional Cleanup
Implement conditional cleanup based on backup verification:
# Example: Only cleanup after backup verification
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: verified-cleanup
spec:
  sourcePvc: "critical-data"
  secretName: "storage-credentials"
  deletePvcAfterBackup: true
  additionalEnv:
    - name: "VERIFY_BACKUP"
      value: "true"
Multi-Destination Cleanup
When backing up to multiple destinations:
# Primary backup: keep the clone
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: primary-backup
spec:
  sourcePvc: "important-data"
  secretName: "primary-storage"
  deletePvcAfterBackup: false # Keep for secondary backup
---
# Secondary backup: clean up afterwards
apiVersion: datamover.a-cup-of.coffee/v1alpha1
kind: DataMover
metadata:
  name: secondary-backup
spec:
  sourcePvc: "important-data" # Same source
  secretName: "secondary-storage"
  deletePvcAfterBackup: true # Cleanup after both complete
Cleanup with Lifecycle Management
Integrate with external lifecycle management:
#!/bin/bash
# External cleanup verification script
NAMESPACE="default"
DATAMOVER_NAME="my-backup"
# Wait for backup completion
kubectl wait --for=condition=Complete "datamover/$DATAMOVER_NAME" -n "$NAMESPACE" --timeout=3600s
# Verify backup in storage
if rclone check s3:my-bucket/latest/ /verify/path/; then
  echo "Backup verified, cleanup can proceed"
else
  echo "Backup verification failed, manual intervention required"
  exit 1
fi
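The wait step in the script above can also be done from Go by polling the phase until it reaches the desired value or a timeout expires. The sketch below is deliberately cluster-free: `PhaseGetter` is a hypothetical callback that a real tool would back with a Kubernetes client reading `status.phase`, and `SequenceGetter` exists only to drive the demo.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// PhaseGetter returns the current DataMover phase. Hypothetical hook:
// a real tool would read status.phase through a Kubernetes client.
type PhaseGetter func() (string, error)

// WaitForPhase polls get every interval until it returns want,
// get fails, or timeout elapses.
func WaitForPhase(get PhaseGetter, want string, interval, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		phase, err := get()
		if err != nil {
			return fmt.Errorf("reading phase: %w", err)
		}
		if phase == want {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for phase %q (last seen %q)", want, phase)
		case <-ticker.C:
			// Poll again on the next tick.
		}
	}
}

// SequenceGetter replays the given phases in order and then repeats the
// last one. It needs at least one phase. Demo helper only.
func SequenceGetter(phases ...string) PhaseGetter {
	i := 0
	return func() (string, error) {
		p := phases[i]
		if i < len(phases)-1 {
			i++
		}
		return p, nil
	}
}

func main() {
	get := SequenceGetter("CreatingPod", "CleaningUp", "Completed")
	if err := WaitForPhase(get, "Completed", 10*time.Millisecond, time.Second); err != nil {
		fmt.Println("backup did not complete:", err)
		return
	}
	fmt.Println("backup completed, verification can run")
}
```

Polling the phase has the side benefit of working with the `phase` field shown in the status output, independent of whether the resource also exposes conditions.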
This comprehensive documentation covers all aspects of the automatic cleanup feature, providing users with the knowledge needed to effectively use and troubleshoot this functionality.