Leaderboard API Request High Latency
Incident Report for PlayFab
Postmortem

At 2018-07-24-16T08 UTC, leaderboard APIs experienced an elevated API latency and error response rate for approximate 10 minutes.  A service health check detected the failure and paged the on-call engineer, who started to investigate.  The engineer identified an issue with the Leaderboards database instance not being reachable.  The engineer rebooted the database instance, bringing the API latency and error response rate back to the normal range.

When doing the root cause analysis, the issue was identified as a database mirroring malfunction.  The database environment is configured for redundancy and high availability.  In the version of the database environment that was deployed at the time, there was a known issue where the database could become unreachable in certain circumstances.

With the service working normally again we are focusing on making sure this issue won’t happen again and look for other similar issues that could potentially happen to prevent them in the first place.  We have already put the following fixes in place:

  • The software patch was applied to the database engine which will ensure the database mirroring malfunction doesn’t happen again.

Regards,

The PlayFab Team

Posted 3 months ago. Sep 07, 2018 - 11:03 PDT

Resolved
Due to a server upgrade we experienced a spike in latency for approximately 5 minutes. Latency has returned to normal now. This occurred from 11:56AM PST (6:56PM UTC) to 12:01PM PST (7:01PM UTC). We do not expect this to recur as it was caused by applying a patch to prevent an occurrence of today's earlier Leaderboard failures.
Posted 6 months ago. Jun 24, 2018 - 12:08 PDT
This incident affected: API (Statistics and Leaderboards).