Operations | Updated July 3, 2026

ANF — DR Testing, Cutover, and Failback Guide

Tags: azure, netapp, anf, disaster-recovery, epic, failover, runbook

Operational runbook for testing, cutting over to DR, and failing back ANF volumes.

🎯 Overview

This runbook provides step-by-step procedures for Disaster Recovery (DR) testing, manual cutover, and failback for Azure NetApp Files (ANF) volumes supporting Epic environments. Following it helps DR events run with minimal risk and downtime while preserving data integrity and compliance.

Strategic Benefits:

Operational Resilience: Ensures Epic data and services remain available during regional outages or planned DR events.
Regulatory Compliance: Supports DR testing evidence for audits (e.g., HIPAA, SOX).
Controlled Failover/Failback: Mitigates split-brain risk and preserves data consistency.

📋 Process Classification

Phase	Scope	Purpose	Governance Level
DR Test Preparation	All Epic ANF Volumes	Ensure readiness for DR drill	Mandatory
DR Cutover	Target (DR) Region	Enable Epic services from DR site	Controlled
DR Failback	Source (Primary) Region	Restore Epic services to original region	Controlled

📝 Step-by-Step Procedures

1. Prepare For DR Testing

ANF Account / Volume: ohemr-anf-west-epic-pro-wus3-001 (primary / source)

1.1 Check Replication Status in Azure Portal

Navigate to Azure NetApp Files → Volumes.
Select the volume and review the Replication tab.
Ensure the relationship reports Healthy, Mirror State is Mirrored, and the last sync is recent relative to the replication schedule.
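The same pre-test check can be scripted so every drill applies identical criteria. This is a minimal sketch, assuming the JSON shape returned by the Azure CLI command az netappfiles volume replication status (keys healthy, mirrorState, relationshipStatus). The resource group and capacity pool names are not in this runbook and must be supplied; "recent" is approximated by a caller-supplied lag in seconds, since the acceptable value depends on your replication schedule.

```python
import json
import subprocess


def get_replication_status(resource_group: str, account: str, pool: str, volume: str) -> dict:
    """Query a volume's replication status through the Azure CLI (requires az login)."""
    result = subprocess.run(
        [
            "az", "netappfiles", "volume", "replication", "status",
            "--resource-group", resource_group,
            "--account-name", account,
            "--pool-name", pool,
            "--name", volume,
            "--output", "json",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def readiness_problems(status: dict, lag_seconds: float, max_lag_seconds: float) -> list[str]:
    """Return the reasons a volume is not ready for a DR test (empty list means ready)."""
    problems = []
    if status.get("healthy") is not True:
        problems.append("replication relationship is not healthy")
    if status.get("mirrorState") != "Mirrored":
        problems.append(f"mirror state is {status.get('mirrorState')!r}, expected 'Mirrored'")
    if lag_seconds > max_lag_seconds:
        problems.append(f"last sync was {lag_seconds:.0f}s ago (limit {max_lag_seconds:.0f}s)")
    return problems


# Example with a captured status payload instead of a live CLI call:
sample = {"healthy": True, "mirrorState": "Mirrored", "relationshipStatus": "Idle"}
print(readiness_problems(sample, lag_seconds=300, max_lag_seconds=1200))  # → []
```

Keeping the CLI call and the evaluation separate lets the readiness rules be reviewed and tested without Azure access.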

1.2 Notify Stakeholders About Downtime Impact

Identify all applications, teams, and business owners impacted.
Define the DR test window, scope, and expected user impact, and communicate them to those stakeholders before the test.

1.3 Identify DNS Records That Need Updating

List client-facing hostnames used by the share/mount path.
Locate associated CNAME (preferred) or A records in DNS.
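Document current values and TTL; lower the TTL (e.g., to 60s) at least one full original TTL period before the test window starts, so records cached under the old TTL have expired by then.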
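The timing rule for the TTL change is simple arithmetic, shown here as a small worked sketch. The function names, the window date, and the one-hour original TTL are illustrative assumptions.

```python
from datetime import datetime, timedelta


def latest_ttl_change(window_start: datetime, current_ttl_s: int) -> datetime:
    """Latest moment to lower the TTL so every answer cached under the
    original TTL has expired by the time the window opens."""
    return window_start - timedelta(seconds=current_ttl_s)


def worst_case_redirect_delay(lowered_ttl_s: int) -> timedelta:
    """After the record itself is changed, clients may keep the old answer
    for up to one lowered TTL."""
    return timedelta(seconds=lowered_ttl_s)


window = datetime(2026, 7, 10, 22, 0)  # hypothetical DR test window start
print(latest_ttl_change(window, current_ttl_s=3600))  # 2026-07-10 21:00:00
print(worst_case_redirect_delay(60))                  # 0:01:00
```

Clients or resolvers that cap or ignore TTLs can still lag behind, which is why the troubleshooting table also calls for flushing client DNS caches.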

2. Begin DR Manual Cutover

ANF Account / Volume: ohemr-anf-epic-pro-cus-001 (DR / destination)

2.1 Break Replication in DR Region (Destination Volume)

Go to the DR (secondary) volume in the target region.
Open the Replication tab, select Break Peering (i.e., break replication), and confirm the prompt.
Wait until Mirror State shows Broken before proceeding; the destination volume is then writable.

💡 Note: Breaking replication makes the secondary volume writable for DR operations.
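Replication actions complete asynchronously, so a scripted run should poll until the expected state appears instead of proceeding immediately. A minimal sketch, assuming a get_status callable that returns a status dictionary in the CLI's shape (for example, a wrapper around az netappfiles volume replication status); the timeout and polling interval are illustrative values.

```python
import time
from typing import Callable


def wait_for_state(
    get_status: Callable[[], dict],
    expected: dict,
    timeout_s: float = 900,
    interval_s: float = 15,
) -> dict:
    """Poll get_status() until every key in `expected` matches, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if all(status.get(key) == value for key, value in expected.items()):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"state {expected} not reached; last status: {status}")
        time.sleep(interval_s)


# Example with a fake status sequence standing in for live CLI calls:
states = iter([{"mirrorState": "Mirrored"}, {"mirrorState": "Broken"}])
print(wait_for_state(lambda: next(states), {"mirrorState": "Broken"}, interval_s=0))
# {'mirrorState': 'Broken'}
```

Raising on timeout, rather than continuing, matches the guideline to confirm volume status after each step before proceeding.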

2.2 Reconfigure Protocol Access

CIFS/SMB: Ensure an Active Directory (AD) connection is configured on the NetApp account in the DR region.
Verify share permissions and NTFS ACLs (these should be preserved by cross-region replication, CRR).
Test access from a client in the DR network.
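For the client access test, a quick probe that writes, reads back, and removes a uniquely named file confirms that the share is mounted and writable with the tester's credentials. This is a sketch only: the mount point or UNC path is supplied by the operator, the probe file name is a hypothetical convention, and it does not inspect individual NTFS ACL entries.

```python
import uuid
from pathlib import Path


def probe_write_access(mount_path: str) -> bool:
    """Write, read back, and delete a small probe file on a mounted share."""
    probe = Path(mount_path) / f".dr-probe-{uuid.uuid4().hex}"
    payload = b"dr-access-check"
    try:
        probe.write_bytes(payload)
        return probe.read_bytes() == payload
    except OSError:
        return False  # not mounted, read-only, or access denied
    finally:
        try:
            probe.unlink(missing_ok=True)
        except OSError:
            pass  # a cleanup failure should not mask the probe result
```

On a Windows client the same function accepts a UNC path such as the DR share path; on Linux, pass the NFS or SMB mount point.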

2.3 DNS Updates to Point to DR ANF

Update the CNAME (or A) records identified in step 1.3 to point to the DR volume's endpoint, so clients are redirected without manual mount path changes.
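Once the lowered TTL has elapsed, it is worth confirming from a client in each network that the hostname now resolves to the DR endpoint. A minimal sketch; the hostname and DR IP address are placeholders you supply, and the default lookup uses the client's own resolver, so it reflects what that client will actually see, cached answers included. The lookup is injectable so the check can be exercised without network access.

```python
import socket
from typing import Callable, Iterable


def resolved_ips(hostname: str) -> set[str]:
    """IPv4 addresses the local resolver returns for hostname (CNAMEs are followed)."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def points_to_dr(
    hostname: str,
    dr_ips: Iterable[str],
    lookup: Callable[[str], set[str]] = resolved_ips,
) -> bool:
    """True only if the hostname resolves and every address is a DR endpoint address."""
    addresses = lookup(hostname)
    return bool(addresses) and addresses <= set(dr_ips)
```

For example, points_to_dr("epic-share.example.org", ["10.20.0.4"]) would be run on each test client; both values are hypothetical.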

2.4 Validate DR Access

Test mounting/accessing the DR volume from multiple clients.
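Verify that data is consistent as of the last replication sync point; changes written to the primary after that point are not present on the DR volume.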
Ensure application services are functioning from DR site.
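One way to make the consistency check concrete is to compare SHA-256 checksums of a representative set of files against a manifest captured on the primary before the drill. This sketch assumes such a manifest (relative path to hex digest) exists; files modified on the primary after the last replication transfer will legitimately differ, so pick files that are static during the test.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str, relative_paths: list[str]) -> dict[str, str]:
    """Checksum selected files under root (run against the primary before the test)."""
    base = Path(root)
    return {rel: sha256_file(base / rel) for rel in relative_paths}


def compare_manifest(root: str, manifest: dict[str, str]) -> dict[str, str]:
    """Return {relative_path: problem} for files missing or different under root."""
    base = Path(root)
    problems = {}
    for rel, expected in manifest.items():
        target = base / rel
        if not target.is_file():
            problems[rel] = "missing"
        elif sha256_file(target) != expected:
            problems[rel] = "checksum mismatch"
    return problems
```

An empty result from compare_manifest against the DR mount is the evidence to archive with the drill records.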

2.5 Post-DR Operations

Monitor DR volume capacity and performance.
Keep primary volume read-only or offline to prevent split-brain writes.
Plan reverse replication (section 3) for when the primary region is restored.
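For capacity monitoring during the DR period, a simple threshold check over used versus provisioned size flags when the DR volume needs attention. The 80% and 90% thresholds and the example sizes are illustrative assumptions; the caller supplies the figures from whatever monitoring source is in use.

```python
def capacity_status(
    used_bytes: int,
    quota_bytes: int,
    warn_at: float = 0.80,
    critical_at: float = 0.90,
) -> str:
    """Classify volume utilization as 'ok', 'warning', or 'critical'."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    utilization = used_bytes / quota_bytes
    if utilization >= critical_at:
        return "critical"
    if utilization >= warn_at:
        return "warning"
    return "ok"


TIB = 1024 ** 4
print(capacity_status(used_bytes=17 * TIB, quota_bytes=20 * TIB))  # warning (85%)
```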

3. Begin Failback Manually

ANF Account / Volume: ohemr-anf-west-epic-pro-wus3-001 (primary / source)

3.1 Reverse Resync to Reactivate Source Volume

Select the source volume in Azure.
Open Replication and select Reverse Resync.
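Confirm the prompt, then monitor the source volume until Mirror State shows Mirrored and Relationship Status shows Idle, indicating that changes made on the DR volume have been captured on the source volume.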

3.2 Reestablish Source-to-Destination Replication

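Stop all client writes to the DR (destination) volume so no changes are lost when peering is broken.
On the destination volume, open Replication.
Confirm Mirror State is Mirrored and Relationship Status is Idle.
Select Break Peering and confirm.
Remount the source volume for client access if necessary, and revert the DNS records changed in step 2.3 to the primary endpoint.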

3.3 Resync the Source Volume with the Destination Volume

On the destination (DR) volume, select Reverse Resync to restore the original source-to-destination replication direction, then confirm that Mirror State returns to Mirrored.
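The failback only works when sections 3.1 to 3.3 run in order, with each result confirmed before the next step. The sketch below encodes that order together with the state expected after each step (Mirrored/Idle after a resync, Broken after breaking peering). The step names are hypothetical labels, status dictionaries are assumed to follow the CLI output shape, and observe is a hypothetical callback that performs or waits for a step and returns the observed status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # hypothetical label for the runbook action
    volume: str      # side the action is run against: "source" or "destination"
    expected: dict   # replication state that must be observed afterwards


FAILBACK_SEQUENCE = [
    Step("reverse_resync", "source", {"mirrorState": "Mirrored", "relationshipStatus": "Idle"}),
    Step("break_peering", "destination", {"mirrorState": "Broken"}),
    Step("resync_original_direction", "destination", {"mirrorState": "Mirrored", "relationshipStatus": "Idle"}),
]


def verify_step(step: Step, observed: dict) -> None:
    """Raise if the observed state does not match what the step should produce."""
    mismatched = {
        key: observed.get(key)
        for key, value in step.expected.items()
        if observed.get(key) != value
    }
    if mismatched:
        raise RuntimeError(f"{step.name} on {step.volume}: unexpected state {mismatched}; stop and investigate")


def run_failback(observe) -> list[str]:
    """Walk the sequence in order, halting at the first unexpected state."""
    completed = []
    for step in FAILBACK_SEQUENCE:
        verify_step(step, observe(step))
        completed.append(step.name)
    return completed
```

Halting on the first mismatch, instead of skipping ahead, is the scripted equivalent of confirming volume status after each step.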

🏥 Healthcare-Specific Considerations

PHI Data: Ensure DR/Failback processes do not expose PHI to unauthorized environments.
Audit Trail: Retain logs of DR events for compliance (HIPAA, SOX).
Testing Frequency: Schedule DR tests per regulatory and internal policy.

🔧 Implementation Guidelines

Azure Portal Operations

Use the Replication tab for all ANF volume replication actions.
Confirm volume status after each step before proceeding.

DNS Management

Lower DNS TTL before DR events for faster client redirection.
Document changes and revert the TTL to its standard value after the event.

Access Validation

Test both Windows (CIFS/SMB) and Linux (NFS) clients as applicable.
Validate application-level access, not just share mounts.

📊 Monitoring & Reporting

Replication Health: Use the Azure Portal or the Azure CLI (az netappfiles volume replication status) to monitor replication status.
Capacity/Performance: Monitor DR volumes for IOPS, latency, and space utilization during the DR event.
Event Logging: Track all steps and changes for audit purposes.
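For event logging, appending one JSON record per action (UTC timestamp, operator, volume, action, outcome) yields a timeline that is easy to archive with the drill evidence. A minimal sketch; the file location and field names are assumptions rather than a mandated audit format, and records should never include PHI.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_dr_event(log_path: str, operator: str, volume: str, action: str, outcome: str, **details) -> dict:
    """Append one JSON Lines record describing a DR action and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operator": operator,
        "volume": volume,
        "action": action,
        "outcome": outcome,
        **details,  # optional extra fields, e.g. the observed mirror state
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

For example, log_dr_event("dr-drill.jsonl", "jdoe", "ohemr-anf-epic-pro-cus-001", "break_peering", "success", mirror_state="Broken") records the cutover break; the log file name and operator ID are hypothetical.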

🔍 Compliance Validation

DR Drill Evidence: Archive runbook execution logs and notifications.
PHI Handling: Ensure no DR operations violate HIPAA or internal data handling policies.
Change Control: All DNS and storage changes must follow established change management procedures.

🚨 Troubleshooting Guide

Common Issues

Problem	Diagnosis	Resolution
Replication fails to break	Volume busy or Azure API issue	Retry; check portal for active connections or errors
Access fails in DR	AD not configured or permissions not synced	Validate AD integration, review ACLs
DNS redirection delayed	TTL not lowered or cached values	Lower TTL in advance; flush client DNS cache
Split-brain risk	Primary is still writable	Set primary to read-only/offline before DR cutover

🔗 Related Documentation

Epic Architecture Requirements: Storage and DR architecture details
Operational Procedures: Standard operating procedures for Epic on Azure
Security Baseline: Security and compliance controls for DR operations

📞 Support & Contacts

Domain	Contact	Responsibility
ANF/DR Operations	[email protected]	Storage operations and runbook execution
DNS Management	[email protected]	DNS changes and troubleshooting
Compliance Audit	[email protected]	DR test evidence and regulatory reporting
Epic Application	[email protected]	Application validation in DR/failback

🗂️ DR Excellence: Reliable DR and failback processes minimize downtime, ensure data integrity, and support regulatory compliance for Epic healthcare infrastructure.