At 2018-07-11T00:54 UTC, an engineer ran an automated job to re-route a title’s traffic from one cluster to another by updating DNS records. Moving a title between clusters is a routine job that requires entering the ID of the title being migrated into a form. The engineer pasted an incorrect value into the job form, which caused the job to point the DNS records for the primary public API cluster at an incorrect destination.
As a result, a majority of API calls failed because they were routed to the wrong cluster. Within about two minutes the engineer noticed the drop in request volume and started to investigate. Shortly after that, a service health check detected the failure and paged the on-call engineer, who started to investigate. The engineer identified the issue and deployed a fix in 7 minutes.
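To illustrate the kind of guard that can stop a mistaken form value before it reaches DNS, here is a minimal sketch of input validation for a migration job. It does not reflect the actual job tooling: the function name `validate_migration_request`, the `PROTECTED_CLUSTERS` set, the cluster name `primary-public-api`, and the assumed title ID format (a short hexadecimal string) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical names of clusters that a title migration must never retarget.
PROTECTED_CLUSTERS = {"primary-public-api"}

# Assumption: title IDs are short hexadecimal strings. Adjust to the real format.
TITLE_ID_PATTERN = re.compile(r"[0-9A-Fa-f]{1,8}")


class JobInputError(ValueError):
    """Raised when the value entered into the migration form is rejected."""


def validate_migration_request(title_id, known_titles):
    """Return the normalized title ID, or raise JobInputError.

    known_titles is a set of upper-case title IDs that are eligible for migration.
    """
    cleaned = title_id.strip()

    # Refuse anything that names a protected cluster instead of a title.
    if cleaned.lower() in PROTECTED_CLUSTERS:
        raise JobInputError(f"{cleaned!r} is a protected cluster, not a title ID")

    # Refuse values that do not look like a title ID at all.
    if not TITLE_ID_PATTERN.fullmatch(cleaned):
        raise JobInputError(f"{cleaned!r} does not match the title ID format")

    # Refuse well-formed IDs that do not belong to a known title.
    normalized = cleaned.upper()
    if normalized not in known_titles:
        raise JobInputError(f"{normalized!r} is not a known title")

    return normalized
```

A job would call this check before touching any DNS record and abort on `JobInputError`, so that a pasted value which is not a known title ID fails loudly instead of silently changing where traffic is routed.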
With the service working normally again, we are focusing on making sure this issue won’t happen again and on finding similar issues so that we can prevent them in the first place. We have already put the following fixes in place:
The PlayFab Team