On December 11th, 2024, between 9:20 AM and 11:20 AM PST, some customers experienced processing delays with PlayFab's scheduled tasks. The incident was caused by a faulty configuration change and was resolved by reverting that change.
During the incident, the scheduled task processor failed to process any messages for approximately 2 hours. Scheduled tasks queued to run during this time were delayed or did not trigger, but no customer data was lost.
To prevent similar incidents from happening again, we have taken the following actions:
Created a repair item to update our mock unit tests to catch regressions related to this configuration change.
Investigated why end-to-end tests and integration deployment did not catch this issue.
Added a new production alert that fires when no messages are processed within a specified time frame.
Adjusted existing production alerts to trigger faster when no scheduled tasks/messages are processed.
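
As a rough illustration of the "no messages processed" alert described in the last two items, the check amounts to comparing the time since the last processed message against a threshold. The sketch below is a minimal, hypothetical example: the class name, method names, and the 10-minute threshold are assumptions made for illustration, not PlayFab's actual alerting configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StalledProcessorAlert:
    """Hypothetical watchdog: fires when no message was processed recently."""

    max_silence: timedelta        # assumed threshold; tune per workload
    last_processed_at: datetime   # time of the most recent processed message

    def record_processed(self, at: datetime) -> None:
        # Keep only the latest timestamp so out-of-order reports cannot
        # move the clock backwards.
        if at > self.last_processed_at:
            self.last_processed_at = at

    def should_fire(self, now: datetime) -> bool:
        # The alert fires once the processor has been silent for longer
        # than the allowed window.
        return now - self.last_processed_at > self.max_silence


start = datetime(2024, 12, 11, 9, 20, tzinfo=timezone.utc)
alert = StalledProcessorAlert(
    max_silence=timedelta(minutes=10),
    last_processed_at=start,
)

quiet_for_5_min = alert.should_fire(start + timedelta(minutes=5))
quiet_for_15_min = alert.should_fire(start + timedelta(minutes=15))

# A processed message resets the silence window.
alert.record_processed(start + timedelta(minutes=14))
after_recovery = alert.should_fire(start + timedelta(minutes=15))
```

Lowering `max_silence` corresponds to making the alert trigger faster, at the cost of more false alarms during legitimately quiet periods.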