On August 1st, 2024, between 13:06 PDT and 14:25 PDT, some customers experienced delays in telemetry data processing. The incident was caused by a networking configuration change to a Key Vault, which left parts of the service unable to access it. We resolved the issue by reverting the networking change.
Impact
During the incident, all telemetry data was delayed by approximately 1 hour. This affected Data Explorer and data connections. However, reports were not impacted.
Action Items
To prevent similar incidents from happening again, we have taken the following actions:
• Created additional monitoring to reduce the time needed to identify the root cause of this kind of issue.
• Developed additional dashboards so that we can identify the affected part of the pipeline more quickly.
• Authored a TSG (Technical Support Guide) describing how to use the new dashboards.