On January 22, 2025, between 10:44 AM and 11:15 AM PST, some customers experienced increased latency in PlayFab's API. The incident was caused by a network configuration issue during the migration to new Redis instances, which resulted in ports being blocked. We resolved the issue by rolling back to the previous Redis cluster and restarting the pods.
The APIs experienced increased latency; however, the availability remained above the Service Level Objective (SLO).
The issue was caused by the migration to new Redis instances: the new instances used ports that were not included in the exclusion list, so traffic on those ports was blocked.
The issue was not detected sooner because the alert was configured as severity 4 and was not noticed immediately. Availability metrics were not affected by the change.
To prevent similar incidents from happening again, we have taken the following actions:
· Excluded the full range of Redis ports.
· Improved our testing and validation procedures for network configuration changes to catch such issues before they reach production.
· Improved the deployment process for infrastructure changes by rolling out updates to a subset of users first.
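
As an illustration of the kind of pre-deployment validation described above, the following is a minimal sketch of a check that every port a new Redis cluster needs is covered by the exclusion list. The function name, the port numbers, and the representation of the exclusion list as inclusive ranges are all assumptions made for this example; they do not reflect PlayFab's actual configuration or tooling.

```python
from typing import Iterable, List, Tuple

# An inclusive (start, end) port range. This representation is an assumption.
PortRange = Tuple[int, int]


def uncovered_ports(required: Iterable[int], excluded: Iterable[PortRange]) -> List[int]:
    """Return the required ports that fall outside every excluded range."""
    ranges = list(excluded)
    return sorted(
        port
        for port in set(required)
        if not any(start <= port <= end for start, end in ranges)
    )


# Hypothetical values for illustration only.
exclusion_list = [(6379, 6379)]
new_cluster_ports = [6379, 6380, 6381]

missing = uncovered_ports(new_cluster_ports, exclusion_list)
if missing:
    # In a real pipeline this could fail the deployment step instead of printing.
    print(f"Ports missing from exclusion list: {missing}")
```

Running a check like this before a migration would flag any port that the new instances need but the exclusion list does not cover, which is the gap that led to this incident.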