At 2018-07-24-16T08 UTC, leaderboard APIs experienced an elevated API latency and error response rate for approximate 10 minutes. A service health check detected the failure and paged the on-call engineer, who started to investigate. The engineer identified an issue with the Leaderboards database instance not being reachable. The engineer rebooted the database instance, bringing the API latency and error response rate back to the normal range.
When doing the root cause analysis, the issue was identified as a database mirroring malfunction. The database environment is configured for redundancy and high availability. In the version of the database environment that was deployed at the time, there was a known issue where the database could become unreachable in certain circumstances.
With the service working normally again we are focusing on making sure this issue won’t happen again and look for other similar issues that could potentially happen to prevent them in the first place. We have already put the following fixes in place:
Regards,
The PlayFab Team