Microsoft Azure PlayFab

System Status

Status of PlayFab services and history of incidents

Multiplayer Servers - Unable to build new servers in Japan East
Incident Report for PlayFab
Postmortem

Starting 05/30 at 01:35 PDT, the new virtual machines needed to meet Multiplayer Server demand were not being created in Japan East. This eventually caused allocation issues because the standby pools did not refill.

The Multiplayer Server control plane uses an array of Orleans grains to orchestrate server builds. A normally benign restart of an Orleans silo exposed an Orleans bug in which the other silos did not pick up the restarted node. The issue was mitigated at 11:00 PDT by restarting all nodes serving Japan East.

Short-term (June) repair actions include:

1. Repairing the Orleans issue so that the control plane is more resilient to node restarts

2. Increasing the sensitivity of our alerts so that single-region issues are escalated more rapidly to engineering
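
As an illustration of the kind of single-region alert described in item 2, here is a minimal sketch. It is not PlayFab's actual alerting logic. The `RegionPoolSample` type, its fields, the `regions_to_escalate` function, and both thresholds are hypothetical names and values chosen for this example. The idea is to flag a region when its standby pool is draining and no new virtual machine has been created for some time, which is the pattern seen in Japan East.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionPoolSample:
    """Hypothetical per-region metrics; field names are assumptions."""
    region: str
    standby_servers: int          # servers currently idle and ready to allocate
    target_standby: int           # configured standby pool size
    minutes_since_last_vm: float  # time since a new VM was last created


def regions_to_escalate(
    samples: List[RegionPoolSample],
    min_fill_ratio: float = 0.5,        # assumed threshold, not a PlayFab value
    max_vm_gap_minutes: float = 30.0,   # assumed threshold, not a PlayFab value
) -> List[str]:
    """Return regions whose standby pool is draining without being refilled."""
    flagged = []
    for s in samples:
        # Treat a region with no configured standby target as fully filled.
        fill = s.standby_servers / s.target_standby if s.target_standby else 1.0
        if fill < min_fill_ratio and s.minutes_since_last_vm > max_vm_gap_minutes:
            flagged.append(s.region)
    return flagged


samples = [
    RegionPoolSample("JapanEast", standby_servers=2, target_standby=10,
                     minutes_since_last_vm=95.0),
    RegionPoolSample("EastUS", standby_servers=9, target_standby=10,
                     minutes_since_last_vm=3.0),
]
print(regions_to_escalate(samples))
```

Evaluating each region separately, rather than against a fleet-wide aggregate, keeps one unhealthy region from being masked by healthy ones.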

Long-term repair actions (this quarter) include:

3. Providing monitoring tools for Multiplayer Server regions in Game Manager

Posted Jun 11, 2019 - 23:16 PDT

Resolved
Multiplayer Servers experienced issues in Japan East that left customers unable to spin up servers in this region. The issue was resolved within 8 hours.
Posted May 30, 2019 - 01:30 PDT