System Status

Status of PlayFab services and history of incidents

PlayFab Multiplayer API Errors
Incident Report for PlayFab
Postmortem

Almost all HTTP requests sent to the PlayFab multiplayer server orchestrator (also known as the Control Plane) were dropped during the 90-minute outage. The outage occurred during an upgrade of the control plane. During such an upgrade, the system normally kills the existing connections to managed multiplayer servers in the current update domain and then rebuilds them. This is what happened:

  1. VM heartbeats failed because their connections were lost.
  2. After a failure, each VM sent heartbeats 30 times more frequently.
  3. The resulting flood of new connections drove CPU usage to 100%.
  4. The entire system fell into a self-reinforcing feedback loop.
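The cascade above can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not PlayFab's implementation: only the 30x heartbeat boost comes from this postmortem, while the base load, CPU capacity, disruption sizes, and the functions `offered_load`, `drop_fraction`, and `simulate` are hypothetical. The model assumes the control plane drops whatever load exceeds its CPU capacity, that the share of VMs whose heartbeats fail equals the share of requests dropped, that a failing VM heartbeats 30x as often in the next round, and that a VM returns to its normal rate as soon as a heartbeat gets through.

```python
# Toy model of the heartbeat retry cascade. Only BOOST comes from the
# postmortem; every other number and name here is hypothetical.
BOOST = 30  # heartbeat frequency multiplier after a failed heartbeat


def offered_load(failing: float, base_load: float) -> float:
    """Heartbeat requests/s hitting the control plane.

    failing is the fraction of VMs in boosted-heartbeat mode; base_load is
    the rate when every VM heartbeats at the normal interval.
    """
    return base_load * ((1.0 - failing) + failing * BOOST)


def drop_fraction(load: float, capacity: float) -> float:
    """Fraction of requests dropped once load exceeds CPU capacity."""
    if load <= capacity:
        return 0.0
    return 1.0 - capacity / load


def simulate(initial_failing: float, base_load: float,
             capacity: float, rounds: int = 50) -> float:
    """Return the fraction of VMs failing after `rounds` heartbeat rounds.

    Each round, the dropped share of heartbeats marks that share of VMs as
    failing (and boosted) for the next round; the rest recover.
    """
    failing = initial_failing
    for _ in range(rounds):
        failing = drop_fraction(offered_load(failing, base_load), capacity)
    return failing


if __name__ == "__main__":
    base, cap = 1000.0, 4000.0  # hypothetical: 4x CPU headroom at normal load
    for disrupted in (0.0, 0.1, 0.2):
        print(disrupted, round(simulate(disrupted, base, cap), 3))
    # Halving per-request cost is modeled as doubling capacity.
    print("cheaper requests:", round(simulate(0.2, base, 2 * cap), 3))
```

With these made-up numbers, disrupting 10% of VMs recovers immediately, while disrupting 20% tips the model into a stable state in which roughly 84% of heartbeats are dropped, even though normal load uses only a quarter of capacity. Doubling capacity, which is roughly what halving the cost of each request would do, lets the same 20% disruption recover. This loosely mirrors the first follow-up item below, but the model says nothing about PlayFab's real traffic or capacity.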

Follow up:

  1. The most expensive requests (GetQosServers and VM heartbeat) have been optimized to be dramatically less costly.
  2. We are improving our roll-out and testing processes to catch these issues before deployment.
Posted Oct 22, 2019 - 14:01 PDT

Resolved
The incident has been resolved, and a postmortem will be posted next week.
Posted Sep 23, 2019 - 17:21 PDT
Update
We're still investigating errors reported in the Multiplayer API for a subset of users. An update will be posted in 20 minutes.
Posted Sep 23, 2019 - 16:25 PDT
Monitoring
The issue has been mitigated and the service is recovering. We are actively monitoring the service. An update will be posted in the next 20 minutes.
Posted Sep 23, 2019 - 16:00 PDT
Identified
The source of the problem has been identified, and a service update is being deployed to fix a previous bad deployment. An update will be posted in the next 20 minutes.
Posted Sep 23, 2019 - 15:35 PDT
Update
We are continuing to investigate this issue.
Posted Sep 23, 2019 - 15:18 PDT
Investigating
We are experiencing API errors with PlayFab Multiplayer Servers. An update will be posted in the next 15 minutes.
Posted Sep 23, 2019 - 15:15 PDT
This incident affected: Multiplayer Game Servers 2.0 (Request Multiplayer Server API).