Monitoring | Updated July 3, 2026

Monitoring & Alerting Standards

Tags: monitoring, alerting, azure, epic, infrastructure, logging, dynatrace, splunk

Monitoring and alerting standards for OHEMR Epic healthcare infrastructure.

🎯 Overview

All critical infrastructure components supporting Epic systems must implement standardized monitoring and alerting using Azure Monitor, Dynatrace, Splunk, ServiceNow, and other integrated tools. The goal is that incidents are detected, classified, and escalated to the appropriate teams quickly, in support of clinical uptime and compliance.

Benefits

Unified Operations View: Centralized event collection and alerting
Rapid Incident Response: Automated notifications to command centers and support teams
Compliance Readiness: Logging and event history for HIPAA, Epic, and SOX needs
Operational Consistency: Common patterns for all monitored resource types

📋 Monitoring Coverage Matrix

Category	Operations Owner	Primary Monitor	Secondary Monitor	EventHub/Integration	Dynatrace	Splunk Logging	Interlink Alerts	ServiceNow Ticketing	Command Center/TCC	Notes
Azure Services (Storage, ExpressRoute)	Paris	Azure Monitor	Dynatrace OneAgent	Sends to Splunk	Yes	Yes	Yes	Yes (Team/SNOW)	Yes (P1 & P2)	P3 & P4: Team/SNOW, P1 & P2: Command Center
Azure Base VM Windows	Paris	Azure Monitor + Guest OS logs	Dynatrace OneAgent	Sends to Splunk	Yes	Yes	Yes	Yes (Team/SNOW)	Yes (P1 & P2)	P3 & P4: Team/SNOW, P1 & P2: Command Center
Azure Base VM Linux	-	Azure Monitor (AMA agent, DCR, DCE, AMPLS)	Dynatrace OneAgent	(EventHub possible/test)	Yes	Yes	Yes	Yes (Team/SNOW)	Yes (P1 & P2)	Linux log aggregation under review. P3 & P4: Team/SNOW, P1 & P2: Command Center
Azure NetApp Volume	Dwayne B Jones	Azure Monitor	-	Sends to Splunk	-	Yes	Yes	Yes (Team/SNOW)	Yes (P1 & P2)	P3 & P4: Team/SNOW, P1 & P2: Command Center
Appliances (Firewalls, Infoblox)	-	Appliance Syslogs (Palo Alto)	-	Sends to Splunk	-	Yes	Yes	Yes (Team/SNOW)	Yes (P1 & P2)	P3 & P4: Team/SNOW, P1 & P2: Command Center
Citrix	-	uberAgent dashboards	-	Sends to Splunk	-	Yes	Yes	Yes (Team/SNOW)	Yes (P1 & P2)	P3 & P4: Team/SNOW, P1 & P2: Command Center

Link to the comprehensive monitoring and alerting matrix
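
The matrix above can also be kept as data so that gaps (for example, categories with no operations owner) are easy to spot. The following is a minimal sketch, not an existing OHEMR tool: the `CoverageRow` type and `rows_missing_owner` helper are hypothetical names, and only a subset of the matrix columns is modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CoverageRow:
    """One row of the coverage matrix (subset of columns, for illustration)."""
    category: str
    owner: Optional[str]  # None where the matrix shows "-"
    primary_monitor: str
    dynatrace: bool
    splunk_logging: bool


# Values copied from the matrix above.
MATRIX: List[CoverageRow] = [
    CoverageRow("Azure Services (Storage, ExpressRoute)", "Paris",
                "Azure Monitor", True, True),
    CoverageRow("Azure Base VM Windows", "Paris",
                "Azure Monitor + Guest OS logs", True, True),
    CoverageRow("Azure Base VM Linux", None,
                "Azure Monitor (AMA agent, DCR, DCE, AMPLS)", True, True),
    CoverageRow("Azure NetApp Volume", "Dwayne B Jones",
                "Azure Monitor", False, True),
    CoverageRow("Appliances (Firewalls, Infoblox)", None,
                "Appliance Syslogs (Palo Alto)", False, True),
    CoverageRow("Citrix", None, "uberAgent dashboards", False, True),
]


def rows_missing_owner(rows: List[CoverageRow]) -> List[str]:
    """Return the categories that have no operations owner assigned."""
    return [row.category for row in rows if row.owner is None]


if __name__ == "__main__":
    for category in rows_missing_owner(MATRIX):
        print(f"No operations owner: {category}")
```

Run against the current matrix, this flags the Linux VM, appliance, and Citrix rows, which are the three rows with "-" in the Operations Owner column.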

🏥 Monitoring Tools & Integration Patterns

Primary Monitoring Solutions

Azure Monitor: Default for all Azure-native resources (VMs, storage, NetApp, SQL)
Guest OS Logs: Collected via the Azure Monitor Agent (AMA), Data Collection Rules (DCR), and Data Collection Endpoints (DCE)
Dynatrace OneAgent: Deployed on VMs for deep application metrics
Splunk: Central log aggregation for all monitored events
EventHub: Used for data streaming where needed (e.g., manual log aggregation for Linux)
ServiceNow: Incident and ticket generation; P3/P4 go to the owning team or a ServiceNow (SNOW) ticket, P1/P2 go to the Command Center
Interlink: Alert routing and escalation into the command centers
Epic Command Center / TCC: Escalation point for high-severity incidents

🔧 Standard Configuration Requirements

1. Monitoring & Alerting By Resource Type

Azure Services (Storage, ExpressRoute, NetApp, SQL)

Monitoring: Azure Monitor
Secondary: Dynatrace (where supported), Splunk logging via export or agent
Alert Routing: Interlink for all critical alerts
Incident Escalation: ServiceNow for P3/P4, Command Center for P1/P2

Azure Base VMs (Windows & Linux)

Windows: Azure Monitor + Guest OS logs, Dynatrace OneAgent, Splunk
Linux: AMA agent + DCR + DCE + Azure Monitor Private Link Scope (AMPLS), Dynatrace, Splunk (EventHub as needed)
Alert Routing: Interlink, ServiceNow (team), Command Center
Special Note: The Linux log aggregation pattern is under review (currently manual aggregation to a Log Analytics workspace, LAW)
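
To make the AMA + DCR + DCE pattern more concrete, the sketch below builds a Data Collection Rule payload for Linux syslog as a Python dictionary and checks that every data flow points at a declared destination. It follows the general shape of an Azure `Microsoft.Insights/dataCollectionRules` resource as commonly documented, but the rule name, facility list, log levels, and resource IDs are illustrative assumptions, not OHEMR values, and the payload should be checked against the current Azure API version before use.

```python
import json
from typing import Any, Dict, List


def build_linux_syslog_dcr(workspace_resource_id: str,
                           dce_resource_id: str,
                           location: str) -> Dict[str, Any]:
    """Build an illustrative DCR body that sends Linux syslog to a LAW."""
    destination_name = "law-destination"  # hypothetical destination label
    return {
        "location": location,
        "kind": "Linux",
        "properties": {
            "dataCollectionEndpointId": dce_resource_id,
            "dataSources": {
                "syslog": [
                    {
                        "name": "syslog-base",  # hypothetical data source name
                        "streams": ["Microsoft-Syslog"],
                        # Assumed facilities and levels; tune per standard.
                        "facilityNames": ["auth", "authpriv", "daemon", "syslog"],
                        "logLevels": ["Warning", "Error", "Critical",
                                      "Alert", "Emergency"],
                    }
                ]
            },
            "destinations": {
                "logAnalytics": [
                    {
                        "name": destination_name,
                        "workspaceResourceId": workspace_resource_id,
                    }
                ]
            },
            "dataFlows": [
                {
                    "streams": ["Microsoft-Syslog"],
                    "destinations": [destination_name],
                }
            ],
        },
    }


def undeclared_destinations(dcr: Dict[str, Any]) -> List[str]:
    """Return data-flow destinations that are not declared in the rule."""
    props = dcr["properties"]
    declared = {d["name"] for d in props["destinations"].get("logAnalytics", [])}
    used = {name for flow in props["dataFlows"] for name in flow["destinations"]}
    return sorted(used - declared)


if __name__ == "__main__":
    # Placeholder resource IDs; substitute real values before deploying.
    rule = build_linux_syslog_dcr(
        workspace_resource_id="<log-analytics-workspace-resource-id>",
        dce_resource_id="<data-collection-endpoint-resource-id>",
        location="<azure-region>",
    )
    assert undeclared_destinations(rule) == []
    print(json.dumps(rule, indent=2))
```

A data flow that references an undeclared destination is one way guest OS log gaps can arise, so a check like `undeclared_destinations` may be worth running before a rule is deployed.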

Appliances (Firewalls, Infoblox, etc.)

Monitoring: Appliance Syslogs (e.g., Palo Alto), Splunk
Alert Routing: Interlink, ServiceNow, Command Center

Citrix

Monitoring: uberAgent → Kafka → Splunk
Alert Routing: ServiceNow/Team, Command Center

Epic System Pulse

Monitoring: System Pulse (Epic-native)
Alert Routing: Email inbox, with manual review and classification by the technical team (planned: Splunk/Dynatrace integration)

🛡️ Alert Classification & Escalation

P1 & P2 (High/Critical): Immediate notification to Command Center/TCC
P3 & P4 (Warning/Info): Notification to support team or ServiceNow ticket
Manual Review: Certain systems (e.g., Epic System Pulse) require manual alert review/classification by technical team
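
The classification rules above reduce to a small routing function. This is a minimal sketch of that mapping; `Priority`, `route_alert`, and the destination strings are hypothetical names chosen for illustration and do not correspond to actual Interlink or ServiceNow configuration.

```python
from enum import Enum
from typing import Optional


class Priority(Enum):
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


COMMAND_CENTER = "Command Center/TCC"
TEAM_OR_SNOW = "Support team or ServiceNow ticket"
MANUAL_REVIEW = "Manual review by technical team"


def route_alert(priority: Optional[Priority],
                needs_manual_review: bool = False) -> str:
    """Return the escalation target for an alert.

    Alerts from systems that require manual classification (for example,
    Epic System Pulse) or that arrive without a priority are sent to
    manual review first.
    """
    if needs_manual_review or priority is None:
        return MANUAL_REVIEW
    if priority in (Priority.P1, Priority.P2):
        return COMMAND_CENTER
    return TEAM_OR_SNOW


if __name__ == "__main__":
    for p in Priority:
        print(p.name, "->", route_alert(p))
    print("System Pulse ->", route_alert(None, needs_manual_review=True))
```

Sending unclassified alerts to manual review rather than defaulting them to a priority is a design assumption in this sketch; it keeps unreviewed System Pulse alerts from being silently treated as P3/P4.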

🔍 Validation & Troubleshooting

Pre-Deployment Checklist

All required monitoring agents deployed (Azure Monitor Agent (AMA), Dynatrace OneAgent, etc.)
Splunk logging integration configured and tested
Interlink and ServiceNow routing rules validated for each category
Alert severity mapping confirmed (P1/P2 → Command Center, P3/P4 → Team/SNOW)
For Epic System Pulse, confirm manual review procedures are documented and followed
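
The checklist can be treated as a simple deployment gate. The sketch below assumes each item is recorded as a boolean by whoever performs the validation; the item keys and the `outstanding_items` helper are hypothetical and not part of any existing tooling.

```python
from typing import Dict, List

# Checklist items from the list above, as hypothetical keys.
BASE_CHECKLIST = (
    "agents_deployed",
    "splunk_integration_tested",
    "routing_rules_validated",
    "severity_mapping_confirmed",
)
# Applies only to Epic System Pulse.
SYSTEM_PULSE_ITEM = "manual_review_documented"


def outstanding_items(status: Dict[str, bool],
                      is_system_pulse: bool = False) -> List[str]:
    """Return checklist items not yet confirmed; missing keys count as open."""
    required = list(BASE_CHECKLIST)
    if is_system_pulse:
        required.append(SYSTEM_PULSE_ITEM)
    return [item for item in required if not status.get(item, False)]


if __name__ == "__main__":
    status = {
        "agents_deployed": True,
        "splunk_integration_tested": True,
        "routing_rules_validated": False,
    }
    remaining = outstanding_items(status)
    if remaining:
        print("Not ready to deploy; open items:", ", ".join(remaining))
    else:
        print("All checklist items confirmed.")
```

Treating a missing key as unconfirmed means a resource cannot pass the gate simply because a check was never recorded.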

Troubleshooting Common Issues

Missing alerts in Splunk: Check agent status and EventHub/DCR integrations
Alerts not escalated to Command Center: Review Interlink rules and ServiceNow integration
Guest OS log gaps: Validate AMA agent and DCR/DCE configurations
Manual classification backlog (Epic System Pulse): Ensure technical team coverage, automate where possible

🔗 Related Documentation

📞 Support & Contacts

Monitoring Contacts

Azure Monitoring: Clint / Indhu
Dynatrace: Paris
Splunk Logging: Clint / Indhu / Paris
Appliance Monitoring: Dwayne B Jones
Citrix: Jason
SQL Monitoring: Laura / Clint / John Brownlee
Epic System Pulse: Matt / Jordan

Incident Escalation

P1/P2 Incidents: Notify Epic Command Center / TCC
P3/P4 Incidents: Team notification or ServiceNow ticket

📈 Operational Excellence: Standardized monitoring and alerting ensures reliable healthcare delivery, rapid incident response, and compliance for all OHEMR Epic infrastructure.