Navigation
GuidesUpdated July 3, 2026

SQL Server Always On Availability Group (AAG) Deployment

guidesql-serveraagavailability-grouphigh-availabilityansibleautomationepicmpsqlcogitofailover-clusterazure

Operational Documentation: SQL Server Always On Availability Group (AAG) Deployment

1. Purpose and Scope

The Epic AAG playbook automates provisioning and configuration of two Windows-based Microsoft SQL Server instances participating in a Windows Failover Cluster and Always On Availability Group (AAG) for Epic Systems workloads on Azure. It supports:

  • High availability for:
    • Multi-Purpose SQL (MPSQL) – used by Epic “central services.”
    • Cogito platform (Clarity, Caboodle, SSIS workloads).
  • Secure dependency retrieval (vault secrets, JFrog-hosted artifacts)
  • Consistent OS, SQL Server instance, clustering, replication, and post-install tuning.

This playbook orchestrates three major operational phases:

  1. Secret and prerequisite bootstrap (Python module + Vault secrets)
  2. Core provisioning (OS base config, SQL Server install, AAG formation)
  3. Post-installation server hardening and application-oriented tuning (power, networking, tools, Defender exclusions)

It leverages modular roles, with the heaviest logic inside the database_setup role (implemented in the ohemr-ansible-role-db-install-config repository). That role handles both single-instance and AAG-capable deployments, dynamically branching when AAG variables are present.


2. High-Level Workflow Summary

PhasePlay NameHostsPrivilegeKey Outcomes
1Load HashiVault Secretsallno becomeInstall hvac, retrieve secrets (JFrog token, credentials)
2Gather Facts (No Become)allno becomeCollect Windows/system facts for later logic
3SQL Server and AAG Installallbecome as sql_install_accountApply roles: optum, os_base_configuration, database_setup
4Server and App Configurationallno becomePost-flight tuning (power plan, dump settings, keepalive, tooling, Defender exclusions)

3. Detailed Playbook Breakdown (pb_aag.yml)

3.1 Bootstrap & Secrets

  • Installs hvac (Python HashiCorp Vault client) locally on the first host only.
  • Fails early if jfrog_token is missing (ensures artifact retrieval works).
  • Includes utilities:get_vault_secrets.yml to populate sensitive variables.

3.2 Fact Gathering

A second play runs standard Ansible fact gathering (Windows environment and network data) without privilege escalation to avoid polluting privilege-sensitive state early.

3.3 Core Provisioning (SQL + AAG)

The third play escalates to the SQL installation account:

Roles executed in order:

  1. optum – Enterprise hardening and baseline configuration (accounts, policies, logging, secure defaults)
  2. os_base_configuration – Core OS tuning (baseline security, services, filesystem layout, NIC/WinRM/registry tuning)
  3. database_setup – Installs SQL Server and configures clustering/AAG as required

become_user is set to sql_install_account; ansible_become_password maps to sql_install_password. This allows local/privileged actions requiring installer context (SQL setup bootstrap, service ACLs).

3.4 Post-Provision Configuration

Includes discrete task fragments from the utilities role for:

  • High Performance power plan
  • Kernel memory dump settings
  • TCP KeepAliveTime tuning
  • Disables “Memory Pressure Protection”
  • Installs:
    • SSMS
    • SQLCMD
  • Microsoft Defender exclusions:
    • Cluster + MSDTC directories
    • SQL database file extensions (mdf, ldf, ndf, bak, trn)
    • Core SQL & SSIS executables

4. Core Role Deep Dive: database_setup (db-install-config)

4.1 Functional Modes

ModeTriggerBehavior
Standalonesql.aag undefinedInstalls SQL instance + databases only
AAG-enabledsql.aag definedAdds clustering, replication, listener, quorum, replica seeding, failover test

4.2 Task Sequence (Execution Order)

  1. SQL Facts Collection

    • Validates domain context (win_whoami)
    • Sets defaults: is_primary: true, is_aag: false
    • Normalizes sql_cmd_instance (removes \MSSQLSERVER for default instance)
  2. AAG Node Facts (conditional)

    • Extracts node definition from sql.aag.nodes
    • Validates listener_port, listener_ip, listener_subnet
    • Determines primary via cluster group ownership
    • Sets is_primary, primary_node, is_aag: true
  3. PowerShell Prerequisites

    • Installs required PS modules (e.g. SqlServer, DSC components)
  4. Forced Sector Size (non-AAG)

    • Aligns physical/logical sector metadata (performance optimization)
  5. SQL Server Install

    • Unattended install (instance name, directories, collation, TempDB layout, services)
  6. Enable Windows Clustering (AAG)

    • Adds Failover Clustering feature and configures cluster object(s)
  7. Create SQL User

    • Adds logins (installer / service / SYSTEM / domain groups)
  8. Quorum Storage Setup (AAG)

    • Configures cloud / storage witness (Azure consensus best practice)
  9. Create Backup Share (AAG)

    • Facilitates initial DB backup/restore synchronization
  10. Create SQL Databases

    • Primary baseline DB creation (FULL recovery model for AAG readiness)
  11. Create Replica Databases (AAG)

    • Backs up & restores DBs to secondary nodes WITH NORECOVERY
  12. AAG Prerequisites

    • HADR endpoints, server-level config, endpoint permissions
  13. Create AAG

    • Defines Availability Group
    • Adds databases
    • Configures listener (IP, port)
    • Sets replica modes (sync/async) + failover modes
  14. Remove Backup Share (AAG)

    • Cleans ephemeral seeding resource (unless permanent)
  15. Cleanup Local Admins

    • Removes temporary elevated local accounts (sql_cleanup_admins)
  16. AAG Failover Test (AAG)

    • Controlled failover + validation (synchronization & listener)

4.3 Tag Taxonomy

Please see the readme of the database install role for a good description of the tags used.


5. Inventory / Variable Model

Please see the readme of the database install role for a good description of the variables used.


6. Conceptual Flow

flowchart TD
  A[Secrets & hvac Install] --> B[Vault Secrets Loaded]
  B --> C[Gather Facts]
  C --> D[PowerShell Prereqs]
  D --> E[SQL Install]
  E --> F{AAG Enabled?}
  F -- No --> G[Create Databases]
  F -- Yes --> H[Windows Cluster + Quorum]
  H --> I[Backup Share + Primary DB Create]
  I --> J[Replica DB Restore]
  J --> K[AAG Prereqs]
  K --> L[Create AG + Listener]
  L --> M[Remove Backup Share]
  G --> N[Create SQL User]
  M --> N
  N --> O[Cleanup Admins]
  O --> P[Failover Test]
  P --> Q[Post Config: Power, Dump, KeepAlive, Tools, Defender Exclusions]

7. Tag Usage Reference

ObjectiveExample Command
Full build(no --tags needed)
Recreate availability group onlyansible-playbook pb_aag.yml --tags sql:aag:cluster
Re-run replica restore--tags sql:aag:database_replicas
Run failover test again--tags sql:aag:failover_test
Only baseline databases (non-AAG)--tags sql:databases
Cleanup temp admins--tags sql:cleanup_admins

Warning: Running isolated tags without prerequisite state may fail. Use only for controlled remediation.


8. Specifics by Epic SQL instance

MPSQL is the only instance where an AWX job was created to assist in setting up the cluster. The MPSQL AAG Prod template utilizes the MPSQL - Prod inventory. This AWX template was intended to be kicked off on an adhoc basis when the MPSQL AAG cluster in prod was deemed necessary.

8.1 Topology Patterns

  • Caboodle (test): Two regional HA pairs (caboodle_db_west_epic, caboodle_db_east_epic), each with 2 hosts.

8.2 Secret & Account Strategy

AspectMPSQL ProdCaboodle Test (East/West)
Install + Service AccountSingle shared account (MS\\sharedepicsql)Region-specific (MS\\eastepiccabct, MS\\westepiccabct)
SA Password Sourcesql/prod/mpsql/sharedsql/cloudtest/caboodle/<region>
SSIS / Catalog PasswordN/APresent (sql_ssis_catalog_password)
Product KeyNot shownRetrieved (license-keys/sql-server)
Domain Join Passwordjoin-credentials/svc_0085665_prdjoin-credentials/svc_0085665_tst

| JFrog Token | Shared service account path | Same path reused | | Quorum Storage Secrets | One shared storage account reused across MPSQL regions | Distinct storage account per region (east vs west) |

8.3 Quorum / Witness Strategy

DimensionMPSQL ProdCaboodle Test
Witness Storage AccountSingle shared (west-region) reused across all MPSQL regionsN/A
Witness ANFN/ARegion-specific (east vs west)
Access Key VariablesSame vault path reusedDistinct vault paths per region