System Status

Status of PlayFab services and history of incidents

Increased error rates for Cloudscript APIs
Incident Report for PlayFab
Postmortem

On June 7th, 2024, between 11:26 AM and 12:17 PM UTC, some customers experienced intermittent errors and delays when calling PlayFab's Cloud Script APIs. After a network configuration change, the cloud script infrastructure scaled down rapidly, which left too few compute instances to handle the load. The remaining instances were starved of resources and became overloaded. We resolved the issue by increasing the minimum number of replicas and decreasing the maximum number of script engines per title for some heavy cloud script users.

Impact 

During the incident, about 7% of all cloud script calls failed with an InternalServerError response.

Action Items 

To prevent similar incidents from happening again, we have taken the following actions: 

  • We enhanced our monitoring and alerting systems to detect and report any anomalies in the cloud script server's behavior and performance. 

  • We updated the scaling policies for the cloud script server deployment to ensure a sufficient number of replicas and a balanced distribution of traffic. 

  • We decreased the maximum number of script engines per title for some heavy cloud script users to reduce resource contention and improve service quality.
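
The two capacity changes above (a higher replica floor and a lower per-title engine cap) can be illustrated with a minimal sketch. This is not PlayFab's actual scaling logic: the function names, parameters, and numbers below are hypothetical and chosen only to show how a minimum-replica floor limits scale-down and how a per-title cap bounds what a single heavy title can consume.

```python
import math


def desired_replicas(requests_per_sec, capacity_per_replica,
                     min_replicas, max_replicas):
    """Replicas needed for the current load, clamped to [min_replicas, max_replicas].

    Hypothetical policy: the floor keeps a dip in observed traffic from
    shrinking the deployment below a safe baseline.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(needed, max_replicas))


def engines_granted(requested_by_title, per_title_cap):
    """Cap the number of script engines each title may hold (hypothetical)."""
    return {title: min(requested, per_title_cap)
            for title, requested in requested_by_title.items()}


# Even when observed traffic drops to zero, the floor keeps 5 replicas running.
print(desired_replicas(0, capacity_per_replica=100, min_replicas=5, max_replicas=50))
# A heavy title asking for 40 engines is limited to the cap of 10.
print(engines_granted({"heavy_title": 40, "small_title": 3}, per_title_cap=10))
```

In this sketch the floor trades some idle capacity for protection against a sudden scale-down, and the cap trades peak throughput for one title against fairness across all titles.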

Posted Jun 26, 2024 - 17:10 PDT

Resolved
This incident has been resolved.
Posted Jun 07, 2024 - 12:38 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 07, 2024 - 12:20 PDT
Investigating
We are currently investigating this issue.
Posted Jun 07, 2024 - 11:35 PDT
This incident affected: API (Cloud Script).