On July 31, 2025, between 10:30 AM and 7:29 PM PDT, some customers experienced significant delays when using PlayStream actions for rules and segments. Action executions were delayed by more than 500 minutes at peak. The incident was caused by a combination of load and the configured maximum number of records each processor could read at once. Together with the pod health logic, this led to excessive memory usage and processor failures. We resolved the issue by reducing the configuration value, which restored healthy processing across all partitions.
All PlayFab titles using PlayStream actions for rules and segments were impacted. Action executions were delayed but not dropped; however, because of the prolonged delay, some actions may no longer have been useful by the time they were processed.
The incident was caused by a misconfiguration in the number of records each processor attempted to read, combined with a change in the logic for partition allocation per processor. As processor pods failed due to memory exhaustion, the remaining healthy pods became overloaded, leading to a cascading failure and increasing delays in action processing.
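The cascade described above can be illustrated with a minimal sketch. It is a simplified model, not the actual processor code: the function name, the even partition spread, the one-failure-per-round rule, and every number (pod count, partition count, record size, memory limit) are hypothetical assumptions chosen only to show how a large read limit plus partition reallocation can turn a single pod failure into a full outage.

```python
def simulate_cascade(healthy_pods, num_partitions, max_records,
                     record_bytes, pod_memory_bytes):
    """Return the healthy pod count after each rebalancing round.

    Hypothetical model: partitions are spread evenly over healthy pods,
    and a pod fails when one full batch per owned partition no longer
    fits in its memory. One pod fails per round (a simplification).
    """
    history = [healthy_pods]
    while healthy_pods > 0:
        # The busiest pod owns the ceiling of partitions / pods.
        partitions_per_pod = -(-num_partitions // healthy_pods)
        # Worst case: every owned partition buffers a full batch at once.
        peak_bytes = partitions_per_pod * max_records * record_bytes
        if peak_bytes <= pod_memory_bytes:
            break  # the remaining pods can carry the load
        healthy_pods -= 1  # the busiest pod exhausts memory and fails
        history.append(healthy_pods)
    return history


MEMORY = 512 * 1024 ** 2  # assumed 512 MiB per pod

# Assumed limit of 120,000 records: stable while all 8 pods are healthy.
print(simulate_cascade(8, 32, 120_000, 1024, MEMORY))
# Same limit after losing one pod: each survivor takes more partitions,
# exceeds its memory, and the failure spreads to every pod.
print(simulate_cascade(7, 32, 120_000, 1024, MEMORY))
# A reduced limit leaves headroom, so the same loss does not cascade.
print(simulate_cascade(7, 32, 50_000, 1024, MEMORY))
```

In this toy model, lowering the read limit is what breaks the feedback loop, which matches the direction of the fix described in this report: the per-pod memory peak scales with both the read limit and the number of partitions a pod owns, so a smaller limit leaves room to absorb reallocated partitions.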
To prevent similar incidents from happening again, we have taken the following actions: