On January 8th, 2025, between 5:30 PM and 9:02 PM PST, some customers experienced delayed PlayStream processing and increased error rates for economy requests when accessing PlayFab's API. The incident was caused by network connectivity issues in backend services. We resolved the issue by migrating traffic to a healthy backend.
The incident resulted in delayed PlayStream processing and increased error rates for economy requests. Additionally, Economy V2 reliability dipped to 99% during the impact period.
The root cause of the incident was identified as network connectivity errors from a specific backend. The connectivity issues were not caused by any recent changes.
To prevent similar incidents from happening again, we have taken the following actions:
· Enhanced our monitoring and alerting systems to detect and report any anomalies in the load balancer's behavior and performance.
· Added automatic failover to a healthy backend when backend services are impacted.
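
As an illustration of the last action item, the following is a minimal sketch of how traffic might be switched away from an impacted backend automatically. It is not PlayFab's implementation: the class names, the `failure_threshold` value, and the consecutive-failure rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    """A hypothetical backend with a running count of consecutive failures."""
    name: str
    consecutive_failures: int = 0


@dataclass
class FailoverRouter:
    """Routes to one active backend and fails over after repeated errors.

    The threshold of 3 is an illustrative assumption, not a documented value.
    """
    backends: List[Backend]
    failure_threshold: int = 3
    active_index: int = 0

    def active(self) -> Backend:
        return self.backends[self.active_index]

    def record_result(self, ok: bool) -> None:
        """Record the outcome of one request sent to the active backend."""
        backend = self.active()
        if ok:
            # Any success resets the failure streak.
            backend.consecutive_failures = 0
            return
        backend.consecutive_failures += 1
        if backend.consecutive_failures >= self.failure_threshold:
            self._fail_over()

    def _fail_over(self) -> None:
        """Switch to the next backend whose failure streak is below the threshold."""
        count = len(self.backends)
        for offset in range(1, count):
            candidate = (self.active_index + offset) % count
            if self.backends[candidate].consecutive_failures < self.failure_threshold:
                self.active_index = candidate
                return
        # No healthy candidate found: keep the current backend and rely on alerting.
```

In this sketch, a router created with a primary and a secondary backend keeps sending traffic to the primary until three consecutive requests fail, then switches to the secondary without operator action. A production system would also need active health probes and a way to return traffic once the original backend recovers.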