SQL Server Always On Availability Group (AAG) Deployment
Operational Documentation: SQL Server Always On Availability Group (AAG) Deployment
1. Purpose and Scope
The Epic AAG playbook automates provisioning and configuration of two Windows-based Microsoft SQL Server instances participating in a Windows Failover Cluster and Always On Availability Group (AAG) for Epic Systems workloads on Azure. It supports:
- High availability for:
- Multi-Purpose SQL (MPSQL) – used by Epic “central services.”
- Cogito platform (Clarity, Caboodle, SSIS workloads).
- Secure dependency retrieval (vault secrets, JFrog-hosted artifacts)
- Consistent OS, SQL Server instance, clustering, replication, and post-install tuning.
This playbook orchestrates three major operational phases:
- Secret and prerequisite bootstrap (Python module + Vault secrets)
- Core provisioning (OS base config, SQL Server install, AAG formation)
- Post-installation server hardening and application-oriented tuning (power, networking, tools, Defender exclusions)
It leverages modular roles, with the heaviest logic inside the database_setup role (implemented in the ohemr-ansible-role-db-install-config repository). That role handles both single-instance and AAG-capable deployments, dynamically branching when AAG variables are present.
2. High-Level Workflow Summary
| Phase | Play Name | Hosts | Privilege | Key Outcomes |
|---|---|---|---|---|
| 1 | Load HashiVault Secrets | all | no become | Install hvac, retrieve secrets (JFrog token, credentials) |
| 2 | Gather Facts (No Become) | all | no become | Collect Windows/system facts for later logic |
| 3 | SQL Server and AAG Install | all | become as sql_install_account | Apply roles: optum, os_base_configuration, database_setup |
| 4 | Server and App Configuration | all | no become | Post-flight tuning (power plan, dump settings, keepalive, tooling, Defender exclusions) |
3. Detailed Playbook Breakdown (pb_aag.yml)
3.1 Bootstrap & Secrets
- Installs
hvac(Python HashiCorp Vault client) locally on the first host only. - Fails early if
jfrog_tokenis missing (ensures artifact retrieval works). - Includes
utilities:get_vault_secrets.ymlto populate sensitive variables.
3.2 Fact Gathering
A second play runs standard Ansible fact gathering (Windows environment and network data) without privilege escalation to avoid polluting privilege-sensitive state early.
3.3 Core Provisioning (SQL + AAG)
The third play escalates to the SQL installation account:
Roles executed in order:
optum– Enterprise hardening and baseline configuration (accounts, policies, logging, secure defaults)os_base_configuration– Core OS tuning (baseline security, services, filesystem layout, NIC/WinRM/registry tuning)database_setup– Installs SQL Server and configures clustering/AAG as required
become_user is set to sql_install_account; ansible_become_password maps to sql_install_password. This allows local/privileged actions requiring installer context (SQL setup bootstrap, service ACLs).
3.4 Post-Provision Configuration
Includes discrete task fragments from the utilities role for:
- High Performance power plan
- Kernel memory dump settings
- TCP KeepAliveTime tuning
- Disables “Memory Pressure Protection”
- Installs:
- SSMS
- SQLCMD
- Microsoft Defender exclusions:
- Cluster + MSDTC directories
- SQL database file extensions (mdf, ldf, ndf, bak, trn)
- Core SQL & SSIS executables
4. Core Role Deep Dive: database_setup (db-install-config)
4.1 Functional Modes
| Mode | Trigger | Behavior |
|---|---|---|
| Standalone | sql.aag undefined | Installs SQL instance + databases only |
| AAG-enabled | sql.aag defined | Adds clustering, replication, listener, quorum, replica seeding, failover test |
4.2 Task Sequence (Execution Order)
-
SQL Facts Collection
- Validates domain context (
win_whoami) - Sets defaults:
is_primary: true,is_aag: false - Normalizes
sql_cmd_instance(removes\MSSQLSERVERfor default instance)
- Validates domain context (
-
AAG Node Facts (conditional)
- Extracts node definition from
sql.aag.nodes - Validates
listener_port,listener_ip,listener_subnet - Determines primary via cluster group ownership
- Sets
is_primary,primary_node,is_aag: true
- Extracts node definition from
-
PowerShell Prerequisites
- Installs required PS modules (e.g.
SqlServer, DSC components)
- Installs required PS modules (e.g.
-
Forced Sector Size (non-AAG)
- Aligns physical/logical sector metadata (performance optimization)
-
SQL Server Install
- Unattended install (instance name, directories, collation, TempDB layout, services)
-
Enable Windows Clustering (AAG)
- Adds Failover Clustering feature and configures cluster object(s)
-
Create SQL User
- Adds logins (installer / service / SYSTEM / domain groups)
-
Quorum Storage Setup (AAG)
- Configures cloud / storage witness (Azure consensus best practice)
-
Create Backup Share (AAG)
- Facilitates initial DB backup/restore synchronization
-
Create SQL Databases
- Primary baseline DB creation (FULL recovery model for AAG readiness)
-
Create Replica Databases (AAG)
- Backs up & restores DBs to secondary nodes WITH NORECOVERY
-
AAG Prerequisites
- HADR endpoints, server-level config, endpoint permissions
-
Create AAG
- Defines Availability Group
- Adds databases
- Configures listener (IP, port)
- Sets replica modes (sync/async) + failover modes
-
Remove Backup Share (AAG)
- Cleans ephemeral seeding resource (unless permanent)
-
Cleanup Local Admins
- Removes temporary elevated local accounts (
sql_cleanup_admins)
- Removes temporary elevated local accounts (
-
AAG Failover Test (AAG)
- Controlled failover + validation (synchronization & listener)
4.3 Tag Taxonomy
Please see the readme of the database install role for a good description of the tags used.
5. Inventory / Variable Model
Please see the readme of the database install role for a good description of the variables used.
6. Conceptual Flow
flowchart TD
A[Secrets & hvac Install] --> B[Vault Secrets Loaded]
B --> C[Gather Facts]
C --> D[PowerShell Prereqs]
D --> E[SQL Install]
E --> F{AAG Enabled?}
F -- No --> G[Create Databases]
F -- Yes --> H[Windows Cluster + Quorum]
H --> I[Backup Share + Primary DB Create]
I --> J[Replica DB Restore]
J --> K[AAG Prereqs]
K --> L[Create AG + Listener]
L --> M[Remove Backup Share]
G --> N[Create SQL User]
M --> N
N --> O[Cleanup Admins]
O --> P[Failover Test]
P --> Q[Post Config: Power, Dump, KeepAlive, Tools, Defender Exclusions]
7. Tag Usage Reference
| Objective | Example Command |
|---|---|
| Full build | (no --tags needed) |
| Recreate availability group only | ansible-playbook pb_aag.yml --tags sql:aag:cluster |
| Re-run replica restore | --tags sql:aag:database_replicas |
| Run failover test again | --tags sql:aag:failover_test |
| Only baseline databases (non-AAG) | --tags sql:databases |
| Cleanup temp admins | --tags sql:cleanup_admins |
Warning: Running isolated tags without prerequisite state may fail. Use only for controlled remediation.
8. Specifics by Epic SQL instance
MPSQL is the only instance where an AWX job was created to assist in setting up the cluster. The MPSQL AAG Prod template utilizes the MPSQL - Prod inventory. This AWX template was intended to be kicked off on an adhoc basis when the MPSQL AAG cluster in prod was deemed necessary.
8.1 Topology Patterns
- Caboodle (test): Two regional HA pairs (
caboodle_db_west_epic,caboodle_db_east_epic), each with 2 hosts.
8.2 Secret & Account Strategy
| Aspect | MPSQL Prod | Caboodle Test (East/West) |
|---|---|---|
| Install + Service Account | Single shared account (MS\\sharedepicsql) | Region-specific (MS\\eastepiccabct, MS\\westepiccabct) |
| SA Password Source | sql/prod/mpsql/shared | sql/cloudtest/caboodle/<region> |
| SSIS / Catalog Password | N/A | Present (sql_ssis_catalog_password) |
| Product Key | Not shown | Retrieved (license-keys/sql-server) |
| Domain Join Password | join-credentials/svc_0085665_prd | join-credentials/svc_0085665_tst |
| JFrog Token | Shared service account path | Same path reused | | Quorum Storage Secrets | One shared storage account reused across MPSQL regions | Distinct storage account per region (east vs west) |
8.3 Quorum / Witness Strategy
| Dimension | MPSQL Prod | Caboodle Test |
|---|---|---|
| Witness Storage Account | Single shared (west-region) reused across all MPSQL regions | N/A |
| Witness ANF | N/A | Region-specific (east vs west) |
| Access Key Variables | Same vault path reused | Distinct vault paths per region |