Monitoring | Updated July 3, 2026

Monitoring & Alerting Standards

Tags: monitoring, alerting, azure, epic, infrastructure, logging, dynatrace, splunk

Monitoring and alerting standards for OHEMR Epic healthcare infrastructure.


Overview

All critical infrastructure components supporting Epic systems must implement standardized monitoring and alerting using Azure Monitor, Dynatrace, Splunk, ServiceNow, and other integrated solutions. The goal is for incidents to be detected quickly, classified by priority, and escalated to the appropriate teams, which supports clinical uptime and compliance.

Benefits

  • Unified Operations View: Centralized event collection and alerting
  • Rapid Incident Response: Automated notifications to command centers and support teams
  • Compliance Readiness: Logging and event history for HIPAA, Epic, and SOX needs
  • Operational Consistency: Common patterns for all monitored resource types

Monitoring Coverage Matrix

| Category | Operations Owner | Primary Monitor | Secondary Monitor | EventHub/Integration | Dynatrace | Splunk Logging | Interlink Alerts | ServiceNow Ticketing | Command Center/TCC | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| Azure Services (Storage, ExpressRoute) | Paris | Azure Monitor | Dynatrace OneAgent | Sends to Splunk | Yes | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Azure Base VM Windows | Paris | Azure Monitor + Guest OS logs | Dynatrace OneAgent | Sends to Splunk | Yes | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Azure Base VM Linux | - | Azure Monitor (AMA agent, DCR, DCE, AMPLS) | Dynatrace OneAgent | (EventHub possible/test) | Yes | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | Linux log aggregation under review. P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Azure NetApp Volume | Dwayne B Jones | Azure Monitor | - | Sends to Splunk | - | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Appliances (Firewalls, Infoblox) | - | Appliance Syslogs (Palo Alto) | - | Sends to Splunk | - | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Citrix | - | UberAgent dashboards | - | Sends to Splunk | - | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |

Link to the comprehensive monitoring and alerting matrix
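
The matrix above can also be held as structured data so that coverage gaps are easy to query. The following is a minimal sketch, not an existing tool: the `Coverage` class and the helper names are hypothetical, only a subset of the columns is modeled, and a "-" in the matrix is assumed to mean "not assigned" or "not deployed".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Coverage:
    """One row of the coverage matrix (subset of columns, for illustration)."""
    category: str
    owner: Optional[str]   # None where the matrix shows "-" (assumption)
    primary_monitor: str
    dynatrace: bool        # False where the matrix shows "-" (assumption)


MATRIX: List[Coverage] = [
    Coverage("Azure Services (Storage, ExpressRoute)", "Paris", "Azure Monitor", True),
    Coverage("Azure Base VM Windows", "Paris", "Azure Monitor + Guest OS logs", True),
    Coverage("Azure Base VM Linux", None, "Azure Monitor (AMA agent, DCR, DCE, AMPLS)", True),
    Coverage("Azure NetApp Volume", "Dwayne B Jones", "Azure Monitor", False),
    Coverage("Appliances (Firewalls, Infoblox)", None, "Appliance Syslogs (Palo Alto)", False),
    Coverage("Citrix", None, "UberAgent dashboards", False),
]


def unowned(matrix: List[Coverage]) -> List[str]:
    """Categories with no operations owner listed."""
    return [row.category for row in matrix if row.owner is None]


def without_dynatrace(matrix: List[Coverage]) -> List[str]:
    """Categories where the Dynatrace column is not marked Yes."""
    return [row.category for row in matrix if not row.dynatrace]


if __name__ == "__main__":
    print("No owner listed:", unowned(MATRIX))
    print("No Dynatrace coverage:", without_dynatrace(MATRIX))
```

Keeping the matrix in one structure like this would let a review script flag rows that still lack an owner before a new resource type goes live.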

๐Ÿฅ Monitoring Tools & Integration Patterns

Primary Monitoring Solutions

  • Azure Monitor: Default for all Azure-native resources (VMs, storage, NetApp, SQL)
  • Guest OS Logs: Collected via the Azure Monitor Agent (AMA), Data Collection Rules (DCR), and Data Collection Endpoints (DCE)
  • Dynatrace OneAgent: Deployed on VMs for deep application metrics
  • Splunk: Central log aggregation for all monitored events
  • EventHub: Used for data streaming where needed (e.g., manual log aggregation for Linux)
  • ServiceNow (SNOW): Incident and ticket generation; P3/P4 go to the owning team or a ServiceNow ticket, P1/P2 go to the Command Center
  • Interlink: Alert routing and escalation for integration with command centers
  • Epic Command Center / TCC: Escalation point for high-severity incidents

Standard Configuration Requirements

1. Monitoring & Alerting By Resource Type

Azure Services (Storage, ExpressRoute, NetApp, SQL)

  • Monitoring: Azure Monitor
  • Secondary: Dynatrace (where supported), Splunk logging via export or agent
  • Alert Routing: Interlink for all critical alerts
  • Incident Escalation: ServiceNow for P3/P4, Command Center for P1/P2

Azure Base VMs (Windows & Linux)

  • Windows: Azure Monitor + Guest OS logs, Dynatrace OneAgent, Splunk
  • Linux: Azure Monitor Agent (AMA) + DCR + DCE + Azure Monitor Private Link Scope (AMPLS), Dynatrace, Splunk (EventHub as needed)
  • Alert Routing: Interlink, ServiceNow (team), Command Center
  • Special Note: The Linux log aggregation pattern is under review (manual aggregation to the Log Analytics Workspace, LAW)

Appliances (Firewalls, Infoblox, etc.)

  • Monitoring: Appliance Syslogs (e.g., Palo Alto), Splunk
  • Alert Routing: Interlink, ServiceNow, Command Center

Citrix

  • Monitoring: UberAgent → Kafka → Splunk
  • Alert Routing: ServiceNow/Team, Command Center

Epic System Pulse

  • Monitoring: System Pulse (Epic native)
  • Alert Routing: Email inbox, with manual review and classification by the technical team (future: Splunk/Dynatrace integration)

๐Ÿ›ก๏ธ Alert Classification & Escalation

  • P1 & P2 (High/Critical): Immediate notification to Command Center/TCC
  • P3 & P4 (Warning/Info): Notification to support team or ServiceNow ticket
  • Manual Review: Certain systems (e.g., Epic System Pulse) require manual alert review/classification by technical team
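
The escalation rules above can be sketched as a small routing function. This is an illustration under stated assumptions rather than the actual Interlink or ServiceNow configuration: the destination strings and the `route_alert` name are hypothetical, and an alert with no priority yet (for example, one arriving from the Epic System Pulse inbox) is assumed to go to manual review first.

```python
from enum import Enum
from typing import Optional


class Priority(Enum):
    P1 = "P1"
    P2 = "P2"
    P3 = "P3"
    P4 = "P4"


# Hypothetical destination labels, used only for this example.
COMMAND_CENTER = "command-center-tcc"
TEAM_OR_SNOW = "team-or-servicenow"
MANUAL_REVIEW = "technical-team-manual-review"


def route_alert(priority: Optional[Priority]) -> str:
    """Return the destination for an alert based on its priority.

    None means the alert has not been classified yet, so it is sent
    to the technical team for manual review (assumption).
    """
    if priority is None:
        return MANUAL_REVIEW
    if priority in (Priority.P1, Priority.P2):
        return COMMAND_CENTER
    return TEAM_OR_SNOW


if __name__ == "__main__":
    for p in (Priority.P1, Priority.P3, None):
        print(p, "->", route_alert(p))
```

Treating "unclassified" as its own case keeps manually reviewed sources from being routed by a default priority they were never assigned.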

๐Ÿ” Validation & Troubleshooting

Pre-Deployment Checklist

  1. All required monitoring agents deployed (Azure Monitor Agent (AMA), Dynatrace OneAgent, etc.)
  2. Splunk logging integration configured and tested
  3. Interlink and ServiceNow routing rules validated for each category
  4. Alert severity mapping confirmed (P1/P2 → Command Center, P3/P4 → Team/SNOW)
  5. For Epic System Pulse, confirm manual review procedures are documented and followed
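
The checklist above could be tracked programmatically, for example as a gate in a deployment script. The sketch below is one possible shape: the item keys and function names are hypothetical, and the fifth item is assumed to apply only when the deployment involves Epic System Pulse.

```python
from typing import Dict, List

# Hypothetical keys, one per checklist item, in checklist order.
CHECKLIST = (
    "monitoring_agents_deployed",
    "splunk_logging_tested",
    "routing_rules_validated",
    "severity_mapping_confirmed",
    "system_pulse_review_documented",
)


def outstanding_items(status: Dict[str, bool], epic_system_pulse: bool = False) -> List[str]:
    """Return checklist items not yet marked done, in checklist order.

    The System Pulse item is only required when epic_system_pulse is True.
    Items missing from `status` count as not done.
    """
    required = [
        item for item in CHECKLIST
        if epic_system_pulse or item != "system_pulse_review_documented"
    ]
    return [item for item in required if not status.get(item, False)]


def ready_to_deploy(status: Dict[str, bool], epic_system_pulse: bool = False) -> bool:
    """True when no required checklist item is outstanding."""
    return not outstanding_items(status, epic_system_pulse)


if __name__ == "__main__":
    status = {
        "monitoring_agents_deployed": True,
        "splunk_logging_tested": True,
        "routing_rules_validated": False,
        "severity_mapping_confirmed": True,
    }
    print("Outstanding:", outstanding_items(status))
    print("Ready:", ready_to_deploy(status))
```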

Troubleshooting Common Issues

  • Missing alerts in Splunk: Check agent status and EventHub/DCR integrations
  • Alerts not escalated to Command Center: Review Interlink rules and ServiceNow integration
  • Guest OS log gaps: Validate AMA agent and DCR/DCE configurations
  • Manual classification backlog (Epic System Pulse): Ensure technical team coverage, automate where possible
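
For the guest OS log gap case above, one simple way to confirm a gap before digging into AMA and DCR/DCE settings is to compare consecutive event timestamps against a threshold. This is only a sketch: the `find_gaps` helper and the five-minute threshold are illustrative assumptions, and the timestamps would come from whatever export of the log data is available.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def find_gaps(
    timestamps: Iterable[datetime], max_gap: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Return (last_seen, next_seen) pairs where the silence exceeds max_gap."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


if __name__ == "__main__":
    events = [
        datetime(2026, 7, 3, 10, 0),
        datetime(2026, 7, 3, 10, 1),
        datetime(2026, 7, 3, 10, 20),  # 19 minutes after the previous event
        datetime(2026, 7, 3, 10, 21),
    ]
    # Threshold is an example value, not a documented standard.
    for start, end in find_gaps(events, timedelta(minutes=5)):
        print(f"Gap from {start} to {end}")
```

A gap that lines up with an agent restart or a rule change points at configuration, while a gap with no matching change suggests checking agent health on the VM.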

Support & Contacts

Monitoring Contacts

  • Azure Monitoring: Clint / Indhu
  • Dynatrace: Paris
  • Splunk Logging: Clint / Indhu / Paris
  • Appliance Monitoring: Dwayne B Jones
  • Citrix: Jason
  • SQL Monitoring: Laura / Clint / John Brownlee
  • Epic System Pulse: Matt / Jordan

Incident Escalation

  • P1/P2 Incidents: Notify Epic Command Center / TCC
  • P3/P4 Incidents: Team notification or ServiceNow ticket

Operational Excellence: Standardized monitoring and alerting ensures reliable healthcare delivery, rapid incident response, and compliance for all OHEMR Epic infrastructure.