Monitoring | Updated July 3, 2026
Monitoring & Alerting Standards
Tags: monitoring, alerting, azure, epic, infrastructure, logging, dynatrace, splunk
Monitoring and alerting standards for OHEMR Epic healthcare infrastructure.
Overview
All critical infrastructure components supporting Epic systems must implement standardized monitoring and alerting, leveraging Azure Monitor, Dynatrace, Splunk, ServiceNow, and other integrated solutions. This ensures incidents are rapidly detected, classified, and escalated to the appropriate teams, supporting clinical uptime and compliance.
Benefits
- Unified Operations View: Centralized event collection and alerting
- Rapid Incident Response: Automated notifications to command centers and support teams
- Compliance Readiness: Logging and event history for HIPAA, Epic, and SOX needs
- Operational Consistency: Common patterns for all monitored resource types
Monitoring Coverage Matrix
| Category | Operations Owner | Primary Monitor | Secondary Monitor | EventHub/Integration | Dynatrace | Splunk Logging | Interlink Alerts | ServiceNow Ticketing | Command Center/TCC | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| Azure Services (Storage, ExpressRoute) | Paris | Azure Monitor | Dynatrace OneAgent | Sends to Splunk | Yes | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Azure Base VM Windows | Paris | Azure Monitor + Guest OS logs | Dynatrace OneAgent | Sends to Splunk | Yes | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Azure Base VM Linux | - | Azure Monitor (AMA agent, DCR, DCE, AMPLS) | Dynatrace OneAgent | (EventHub possible/test) | Yes | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | Linux log aggregation under review. P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Azure NetApp Volume | Dwayne B Jones | Azure Monitor | - | Sends to Splunk | - | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Appliances (Firewalls, Infoblox) | - | Appliance Syslogs (Palo Alto) | - | Sends to Splunk | - | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
| Citrix | - | uberAgent dashboards | - | Sends to Splunk | - | Yes | Yes | Yes (Team/SNOW) | Yes (P1 & P2) | P3 & P4: Team/SNOW, P1 & P2: Command Center |
Link to the comprehensive monitoring and alerting matrix
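The matrix lends itself to a simple automated gap check. The following is a minimal sketch, assuming the rows are transcribed by hand into a small data structure; `CoverageRow`, `MATRIX`, and `missing_owner` are hypothetical names used only for illustration, and the category labels are shortened from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CoverageRow:
    """One row of the coverage matrix (illustrative subset of columns)."""
    category: str
    owner: str      # "-" means the matrix lists no operations owner
    secondary: str  # "-" means the matrix lists no secondary monitor


# Values transcribed from the matrix above (assumption: short category labels).
MATRIX = [
    CoverageRow("Azure Services", "Paris", "Dynatrace OneAgent"),
    CoverageRow("Azure Base VM Windows", "Paris", "Dynatrace OneAgent"),
    CoverageRow("Azure Base VM Linux", "-", "Dynatrace OneAgent"),
    CoverageRow("Azure NetApp Volume", "Dwayne B Jones", "-"),
    CoverageRow("Appliances", "-", "-"),
    CoverageRow("Citrix", "-", "-"),
]


def missing_owner(rows: List[CoverageRow]) -> List[str]:
    """Return the categories that have no operations owner recorded."""
    return [r.category for r in rows if r.owner.strip() in ("", "-")]


if __name__ == "__main__":
    for category in missing_owner(MATRIX):
        print(f"No operations owner listed: {category}")
```

Run against the rows as currently published, this flags the categories whose Operations Owner cell is still a dash, which may be a useful prompt when the matrix is next reviewed.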
Monitoring Tools & Integration Patterns
Primary Monitoring Solutions
- Azure Monitor: Default for all Azure-native resources (VMs, storage, NetApp, SQL)
- Guest OS Logs: Collected via the Azure Monitor Agent (AMA), Data Collection Rules (DCR), and Data Collection Endpoints (DCE)
- Dynatrace OneAgent: Deployed on VMs for deep application metrics
- Splunk: Central log aggregation for all monitored events
- EventHub: Used for data streaming where needed (e.g., manual log aggregation for Linux)
- ServiceNow: Incident and ticket generation; P3/P4 go to the owning team or a ServiceNow (SNOW) ticket, P1/P2 escalate to the Command Center
- Interlink: Alert routing and escalation to the command centers
- Epic Command Center / TCC: Escalation point for high-severity incidents
Standard Configuration Requirements
1. Monitoring & Alerting By Resource Type
Azure Services (Storage, ExpressRoute, NetApp, SQL)
- Monitoring: Azure Monitor
- Secondary: Dynatrace (where supported), Splunk logging via export or agent
- Alert Routing: Interlink for all critical alerts
- Incident Escalation: ServiceNow for P3/P4, Command Center for P1/P2
Azure Base VMs (Windows & Linux)
- Windows: Azure Monitor + Guest OS logs, Dynatrace OneAgent, Splunk
- Linux: AMA + DCR + DCE + Azure Monitor Private Link Scope (AMPLS), Dynatrace, Splunk (EventHub as needed)
- Alert Routing: Interlink, ServiceNow (team), Command Center
- Special Note: The Linux log aggregation pattern is under review (currently manual aggregation to a Log Analytics workspace, LAW)
Appliances (Firewalls, Infoblox, etc.)
- Monitoring: Appliance Syslogs (e.g., Palo Alto), Splunk
- Alert Routing: Interlink, ServiceNow, Command Center
Citrix
- Monitoring: uberAgent → Kafka → Splunk
- Alert Routing: ServiceNow/Team, Command Center
Epic System Pulse
- Monitoring: System Pulse (Epic native)
- Alert Routing: Email inbox, with manual review and classification by the technical team (future: Splunk/Dynatrace integration)
Alert Classification & Escalation
- P1 & P2 (Critical/High): Immediate notification to Command Center/TCC
- P3 & P4 (Warning/Info): Notification to support team or ServiceNow ticket
- Manual Review: Certain systems (e.g., Epic System Pulse) require manual alert review/classification by technical team
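The escalation rules above can be sketched as a small routing function. This is an illustration only, not the actual Interlink or ServiceNow rule set; `Priority`, `route_alert`, and the destination strings are hypothetical names, and an alert with no priority is assumed to be one that still awaits manual classification (as with Epic System Pulse).

```python
from enum import Enum
from typing import Optional


class Priority(Enum):
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


# Destination labels are illustrative, taken from the wording of this page.
COMMAND_CENTER = "Command Center/TCC"
TEAM_OR_SNOW = "Support team or ServiceNow ticket"
MANUAL_REVIEW = "Manual review by technical team"


def route_alert(priority: Optional[Priority]) -> str:
    """Map an alert priority to its escalation destination.

    A priority of None models an alert that has not been classified yet.
    """
    if priority is None:
        return MANUAL_REVIEW
    if priority in (Priority.P1, Priority.P2):
        return COMMAND_CENTER
    return TEAM_OR_SNOW


if __name__ == "__main__":
    for p in (Priority.P1, Priority.P3, None):
        print(p, "->", route_alert(p))
```

Keeping the mapping in one function makes it easier to confirm that every priority has exactly one destination before the same rules are configured in the routing tools.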
Validation & Troubleshooting
Pre-Deployment Checklist
- All required monitoring agents deployed (Azure Monitor Agent (AMA), Dynatrace OneAgent, etc.)
- Splunk logging integration configured and tested
- Interlink and ServiceNow routing rules validated for each category
- Alert severity mapping confirmed (P1/P2 → Command Center, P3/P4 → Team/SNOW)
- For Epic System Pulse, confirm manual review procedures are documented and followed
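One way to make the checklist enforceable is to record each item as a boolean and report whatever is still open. The sketch below assumes the results are gathered elsewhere (by hand or by tooling); the item keys and the `outstanding_items` helper are hypothetical and simply mirror the bullets above.

```python
from typing import Dict, List

# Keys mirror the checklist bullets; the names themselves are illustrative.
CHECKLIST = [
    "monitoring_agents_deployed",
    "splunk_logging_tested",
    "routing_rules_validated",
    "severity_mapping_confirmed",
]

# Extra item that applies only to Epic System Pulse.
SYSTEM_PULSE_ITEM = "manual_review_documented"


def outstanding_items(results: Dict[str, bool],
                      is_system_pulse: bool = False) -> List[str]:
    """Return checklist items that are missing or not yet passed."""
    required = list(CHECKLIST)
    if is_system_pulse:
        required.append(SYSTEM_PULSE_ITEM)
    # An item absent from the results is treated as not done.
    return [item for item in required if not results.get(item, False)]


if __name__ == "__main__":
    status = {"monitoring_agents_deployed": True, "splunk_logging_tested": True}
    print("Still open:", outstanding_items(status))
```

Treating a missing entry as a failure is a deliberate choice here: a deployment should not pass because a check was never recorded.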
Troubleshooting Common Issues
- Missing alerts in Splunk: Check agent status and EventHub/DCR integrations
- Alerts not escalated to Command Center: Review Interlink rules and ServiceNow integration
- Guest OS log gaps: Validate the AMA installation and the DCR/DCE configurations
- Manual classification backlog (Epic System Pulse): Ensure technical team coverage, automate where possible
Related Documentation
Support & Contacts
Monitoring Contacts
- Azure Monitoring: Clint / Indhu
- Dynatrace: Paris
- Splunk Logging: Clint / Indhu / Paris
- Appliance Monitoring: Dwayne B Jones
- Citrix: Jason
- SQL Monitoring: Laura / Clint / John Brownlee
- Epic System Pulse: Matt / Jordan
Incident Escalation
- P1/P2 Incidents: Notify Epic Command Center / TCC
- P3/P4 Incidents: Team notification or ServiceNow ticket
Operational Excellence: Standardized monitoring and alerting ensures reliable healthcare delivery, rapid incident response, and compliance for all OHEMR Epic infrastructure.