Updated July 3, 2026

Incident Management

Tags: support, incident-management, servicenow, itsm, escalation, communication, operations, epic, azure, sla


1. Introduction

1.1 Purpose

This document defines a structured approach to identifying, documenting, and resolving incidents promptly while maintaining clear communication with all stakeholders.

1.2 Scope

  • Applies to both Production and Non-Production (testing, development, etc.) environments.
  • Training environments are treated with the same standards and processes as Production.
  • Covers all teams involved in operational and technical incident resolution for the Epic implementation on Azure.
  • Encompasses detection, triaging, escalation, communication, resolution, and post-incident review.

1.3 Key Principles

  • Rapid Response: Swiftly address incidents to minimize impact on patient care and business operations.
  • Consistent Communication: Provide clear, timely updates to stakeholders.
  • Continuous Improvement: Leverage lessons from each incident to drive process enhancements.

2. Roles & Responsibilities

The table below lists each team and the assignment group that incidents for that team are routed and escalated to.

| Team | Assignment Group |
|---|---|
| Business Operations (Epic App DBA) | Epic - Azure (National West) |
| Citrix | USS_Virtual_Workspace |
| Azure Platform Ops (Prod) | Epic_Azure_Infrastructure_Ops (Prod) |
| Azure Platform Ops (Non-Prod) | Epic_Azure_Infrastructure_Ops_NonProd |
| Network NSIS (Topology & Connectivity) | NSIS FIREWALL ANALYST |
| Core DNS (Infoblox) | ISO - IPAM |
| Network Security (Palo Alto & FW) | ISO_Cyber_Defense_Support_CDS |
| Active Directory | Directory Services Infrastructure (DSI) |
| Cloudflare | Load Balancer Web Application Firewall |
| Terraform Enterprise & Hashi Vault | E2M TEK |

Visual: Team Escalation/Support Flow

flowchart TD
    BO["Business Operations (Epic App DBA)"]
    CIT["Citrix"]
    subgraph AZG ["Azure Platform Ops"]
      APROD["Prod"]
      ANONPROD["Non-Prod"]
    end
    NSIS["Network NSIS (Topology & Connectivity)"]
    DNS["Core DNS (Infoblox)"]
    SEC["Network Security (Palo Alto & FW)"]
    AD["Active Directory"]
    CF["Cloudflare"]
    TEK["Terraform Enterprise & Hashi Vault"]

    BO --> CIT
    BO --> AZG

    AZG --> NSIS
    AZG --> DNS
    AZG --> SEC
    AZG --> AD
    AZG --> CF
    AZG --> TEK
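
The flow above can also be read as a small directed graph. The Python sketch below encodes the same edges and lists every team reachable from a starting team. The names `ESCALATION_FLOW` and `downstream_teams` are hypothetical, chosen for illustration, and the Azure Platform Ops subgraph (Prod and Non-Prod) is collapsed into a single node as a simplifying assumption.

```python
from collections import deque

# Edges copied from the flowchart. The Azure Platform Ops subgraph
# (Prod / Non-Prod) is treated as one node; this is an assumption.
ESCALATION_FLOW = {
    "Business Operations (Epic App DBA)": ["Citrix", "Azure Platform Ops"],
    "Azure Platform Ops": [
        "Network NSIS (Topology & Connectivity)",
        "Core DNS (Infoblox)",
        "Network Security (Palo Alto & FW)",
        "Active Directory",
        "Cloudflare",
        "Terraform Enterprise & Hashi Vault",
    ],
}


def downstream_teams(team: str) -> list[str]:
    """Return every team reachable from `team`, in breadth-first order."""
    seen: list[str] = []
    queue = deque(ESCALATION_FLOW.get(team, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; avoids loops if edges are added later
        seen.append(current)
        queue.extend(ESCALATION_FLOW.get(current, []))
    return seen
```

Calling `downstream_teams("Business Operations (Epic App DBA)")` returns Citrix and Azure Platform Ops first, followed by the six teams that Azure Platform Ops hands off to.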

2.1 Team Responsibilities

Business Operations (Epic - Azure National West)

  • Oversee the Epic application and ensure it meets operational and patient-care needs.
  • Perform Epic database administration (DBA), performance monitoring, and routine maintenance.
  • Coordinate with other teams for Epic upgrades or critical infrastructure changes.
  • Work with Azure Platform Ops and Citrix to ensure infrastructure changes follow Terraform/IaC principles.

Citrix (CITRIX IMS (UHT) - OSW)

  • Manage remote application and desktop virtualization services for secure Epic access.
  • Maintain Citrix server farms, load balancing, and user access policies.
  • Collaborate with Azure Platform Ops for infrastructure provisioning and updates via IaC.

Azure Platform Ops (Prod) (Epic_Azure_Infrastructure_Ops (Prod))

  • Manage the production Azure environment hosting Epic (including training, which is treated as production).
  • Monitor workloads, manage capacity, and ensure regulatory/performance compliance.
  • Implement changes per Terraform and IaC best practices.

Azure Platform Ops (Non-Prod) (Epic_Azure_Infrastructure_Ops_NonProd)

  • Maintain non-production Azure environments (e.g., development, testing).
  • Provision, patch, and decommission resources using IaC standards.

Network NSIS (Topology & Connectivity) (NSIS FIREWALL ANALYST)

  • Ensure secure, efficient network communication within Azure environments.
  • Troubleshoot network issues impacting Epic performance.
  • Collaborate with Azure Platform Ops for infrastructure changes via Terraform.

Core DNS (Infoblox) (ISO - IPAM)

  • Maintain domain name resolution services for Epic and manage DNS records.
  • Coordinate DNS updates for new/decommissioned systems.
  • Work with Azure Platform Ops to push DNS-related changes through IaC workflows.

Network Security (Palo Alto & Firewalls) (ISO - CYBER DEFENSE SUPPORT)

  • Manage advanced firewall/security configurations (IPS/IDS, threat prevention, logging, rules).
  • Enforce network segmentation policies.
  • Implement firewall changes with Azure Platform Ops using Terraform/IaC.

Active Directory (Directory Services Infrastructure (DSI))

  • Administer domain controllers, group policies, and authentication for Epic.
  • Coordinate account provisioning, deprovisioning, and security policy changes with Azure Platform Ops.

Cloudflare (Load Balancer Web Application Firewall)

  • Provide external DNS management, Content Delivery Network (CDN), and DDoS protection for public Epic services.
  • Monitor edge performance/availability.
  • Implement changes through IaC tools with Azure Platform Ops.

Terraform Enterprise & Hashi Vault (E2M TEK)

  • Maintain and support the Terraform Enterprise and HashiCorp Vault platforms.
  • Address incidents or outages related to the availability or performance of these tools.
  • Collaborate with the Azure Platform Ops teams and other teams if platform issues impact environment provisioning or secret management.
  • Responsibility for IaC templates, modules, and secrets remains with the Azure Platform Ops teams.

3. Incident Detection & Triage

3.1 Corporate Priority Grid (ServiceNow SLA Group)

Incident priorities are assigned in ServiceNow based on the corporate SLA group. The table below describes each priority, with response and resolution goals:

| Priority | Definition | Response Goal | Restoration/Fulfillment Goal |
|---|---|---|---|
| 1 | Major outages (business-critical app down). TCC & SMEs assess VBFs. | 15 min | 1 hr |
| 2 | Outages/service degradations. TCC & SMEs assess impact. | 15 min | 4 hrs |
| 3 | Single-user “hard down” or multi-user (not P1/P2). | 4 business hrs | 1 business day |
| 4 | Single-user, workaround exists. | 1 business day | 2 business days |
| 5 | Low-impact (password resets, RFIs, etc.). | 5 business days | 5 business days |
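
For tooling or reporting, the priority grid above can be captured as a lookup table. The following Python sketch is one possible encoding, not a ServiceNow configuration: `Goal`, `PRIORITY_GRID`, and `restoration_deadline` are hypothetical names. It assumes that Priority 1 and 2 goals are measured in clock time, since the grid does not mark them as business time, and it deliberately returns `None` for business-time goals because computing those requires a business calendar that this document does not define.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Goal:
    amount: int
    unit: str  # "min", "hr", "business_hr", or "business_day"

    def as_timedelta(self) -> Optional[timedelta]:
        """Clock-time duration, or None for business-time goals."""
        if self.unit == "min":
            return timedelta(minutes=self.amount)
        if self.unit == "hr":
            return timedelta(hours=self.amount)
        return None  # business hours/days need a business calendar


# priority -> (response goal, restoration/fulfillment goal), from the grid
PRIORITY_GRID = {
    1: (Goal(15, "min"), Goal(1, "hr")),
    2: (Goal(15, "min"), Goal(4, "hr")),
    3: (Goal(4, "business_hr"), Goal(1, "business_day")),
    4: (Goal(1, "business_day"), Goal(2, "business_day")),
    5: (Goal(5, "business_day"), Goal(5, "business_day")),
}


def restoration_deadline(priority: int, opened_at: datetime) -> Optional[datetime]:
    """Clock-time restoration deadline, or None if the goal is business time."""
    _, restoration = PRIORITY_GRID[priority]
    delta = restoration.as_timedelta()
    return opened_at + delta if delta is not None else None
```

Under these assumptions, a Priority 1 incident opened at 09:00 has a restoration deadline of 10:00 the same day, and a Priority 2 incident opened at the same time has a deadline of 13:00.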

Monitoring & Alerts

  • Automated monitoring tools (e.g., logging platforms, system alerts) should be tuned for timely detection.
  • All incident alerts route to the respective assignment and/or paging groups.

Triage

  • Validate incident severity and assign the appropriate priority (1–5).
  • Assign incidents to the appropriate team based on environment, service ownership, and business impact.
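
As an illustration of environment-based assignment, the sketch below picks the Azure Platform Ops assignment group for an incident. The group names come from Section 2; the environment labels (`"production"`, `"training"`, and so on) and the function name `azure_ops_group` are hypothetical. Training is routed to the production group because Section 1.2 treats training environments as Production.

```python
# Assignment group names are taken from Section 2 of this document.
PROD_GROUP = "Epic_Azure_Infrastructure_Ops (Prod)"
NONPROD_GROUP = "Epic_Azure_Infrastructure_Ops_NonProd"

# Hypothetical environment labels. Training is treated as Production
# (Section 1.2), so it routes to the production group.
PRODUCTION_LIKE = {"production", "training"}


def azure_ops_group(environment: str) -> str:
    """Pick the Azure Platform Ops assignment group for an environment."""
    env = environment.strip().lower()
    # Anything not production-like (development, testing, etc.) is
    # assumed to be Non-Production.
    return PROD_GROUP if env in PRODUCTION_LIKE else NONPROD_GROUP
```

Defaulting unknown labels to the Non-Prod group mirrors the open-ended "testing, development, etc." wording in the scope. A stricter implementation could instead reject unrecognized environments so that a mistyped label is never silently routed away from production.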

4. Escalation & Communication Plan

4.1 Escalation Triggers

  • Escalate to the next tier (DevOps, Infrastructure, Security, etc.) when an incident cannot be resolved within the restoration goal for its priority (see Section 3.1) or when it requires specialized expertise.
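
The two triggers can be expressed as a single check. This is a minimal sketch: the function name `should_escalate` and its parameters are hypothetical, and it reads "cannot be resolved within specified timeframes" as "the elapsed time has reached the restoration goal", which is one interpretation.

```python
from datetime import timedelta


def should_escalate(
    elapsed: timedelta,
    restoration_goal: timedelta,
    needs_specialist: bool = False,
) -> bool:
    """True if either escalation trigger applies.

    Trigger 1: the incident is still open once the restoration goal
               for its priority has elapsed.
    Trigger 2: the incident requires specialized expertise.
    """
    return needs_specialist or elapsed >= restoration_goal
```

For example, a Priority 1 incident (1 hr restoration goal) that has been open for 61 minutes meets the first trigger, while one open for 5 minutes escalates only if it needs a specialist.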

4.2 Communication Channels

  • Primary: Email distribution lists, instant messaging channels, or ticketing system notifications (e.g., ServiceNow).
  • Executive Updates: High-priority incidents warrant direct communication to executive and business leadership. These communications are typically coordinated and delivered by the Business Operations (Epic - Azure National West) team.

4.3 Timeframes

  • Acknowledge incidents within the timeframe defined by their assigned priority.
  • Update stakeholders at defined intervals (e.g., every 30 minutes for Priority 1 incidents).
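
To make the update cadence concrete, the sketch below lists the times at which stakeholder updates fall due for an open incident. The 30-minute default matches the Priority 1 example above; the function name `update_times` and its parameters are hypothetical, and intervals for other priorities are not specified in this document.

```python
from datetime import datetime, timedelta


def update_times(
    opened_at: datetime,
    until: datetime,
    interval: timedelta = timedelta(minutes=30),
) -> list[datetime]:
    """Return the due times for stakeholder updates between two instants.

    The first update falls one interval after the incident opens.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    times: list[datetime] = []
    next_update = opened_at + interval
    while next_update <= until:
        times.append(next_update)
        next_update += interval
    return times
```

For an incident opened at 09:00 and still open at 10:45, this yields updates due at 09:30, 10:00, and 10:30.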

5. Resolution & Post-Incident Review

Resolution

  • The assigned team mitigates or resolves the issue.
  • Confirm all systems are stable and notify impacted users/stakeholders of the resolution.

Post-Incident Review

  • Conduct a retrospective for critical (P1/P2) incidents.
  • Document root cause, lessons learned, and action items to prevent recurrence.
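
A review record can be modeled so that missing pieces are easy to spot before the review is closed. The sketch below is illustrative only: the class `PostIncidentReview`, its field names, and the example incident identifier are hypothetical and do not correspond to ServiceNow fields.

```python
from dataclasses import dataclass, field

# Retrospectives are conducted for critical (P1/P2) incidents.
REVIEW_PRIORITIES = {1, 2}


@dataclass
class PostIncidentReview:
    incident_id: str
    priority: int
    root_cause: str = ""
    lessons_learned: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)

    def is_required(self) -> bool:
        """True if this incident's priority calls for a retrospective."""
        return self.priority in REVIEW_PRIORITIES

    def missing_fields(self) -> list[str]:
        """Names of the documented items that are still empty."""
        missing = []
        if not self.root_cause.strip():
            missing.append("root_cause")
        if not self.lessons_learned:
            missing.append("lessons_learned")
        if not self.action_items:
            missing.append("action_items")
        return missing
```

A newly created record for a Priority 1 incident, such as `PostIncidentReview("INC-EXAMPLE", 1)`, reports that a review is required and that all three items are still missing.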

6. Additional Resources