System Status

Status of PlayFab services and history of incidents

Event Export jobs to AWS S3 are paused.
Incident Report for PlayFab
Postmortem

Between 3:00 PM PT on December 9th, 2024, and 11:00 AM PT the following day, some customers using PlayFab's event data export to S3 buckets received corrupted export files. The incident was caused by a bug introduced during an update to legacy code. We resolved the issue by deploying a hotfix and reprocessing the corrupted data.

Impact 

58 titles were affected by this incident, all of them configured for event exports to S3 buckets. The affected exports contained invalid characters, which caused parsing and decompression failures downstream. During mitigation, exports to S3 were paused to prevent further impact. The affected data was successfully backfilled by December 11th, 2024, at 7:00 PM PT.

Root Cause Analysis 

The bug was introduced during an update to legacy export code and caused additional padding bytes to be included in the export data. This codebase had not been actively maintained and lacked end-to-end tests, so the bug went undetected during manual testing.
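
The report does not describe the export file format or the checks that would have caught the padding. The following is a minimal sketch of how extra bytes in an export object can be detected, assuming (not stated in this report) that each object is a single gzip member containing newline-delimited JSON. The function `check_export_object` and the sample record are hypothetical. It uses `zlib.decompressobj` rather than `gzip.decompress` because Python's `gzip` module silently skips trailing zero bytes, so a plain decompress can succeed on a padded file that stricter consumers reject.

```python
import gzip
import json
import zlib


def check_export_object(data: bytes) -> list[str]:
    """Return the problems found in one export object (empty list means clean).

    Assumes a single gzip member holding newline-delimited JSON; this format
    is a hypothetical stand-in, not a documented PlayFab contract.
    """
    problems = []
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    try:
        payload = inflater.decompress(data)
    except zlib.error as exc:
        return [f"decompression failed: {exc}"]
    if not inflater.eof:
        problems.append("gzip stream is truncated")
    # Any bytes after the gzip trailer (such as stray padding) end up here.
    if inflater.unused_data:
        problems.append(f"{len(inflater.unused_data)} trailing byte(s) after gzip stream")
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("payload is not valid UTF-8")
        return problems
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {lineno} is not valid JSON")
    return problems


# Hypothetical sample record; real export schemas may differ.
clean = gzip.compress(b'{"EventName":"player_logged_in"}\n')
print(check_export_object(clean))                # []
print(check_export_object(clean + b"\x00" * 4))  # ['4 trailing byte(s) after gzip stream']
# Python's gzip module tolerates the same zero padding:
print(gzip.decompress(clean + b"\x00" * 4) == gzip.decompress(clean))  # True
```

A multi-member gzip file would also be reported as having trailing bytes by this check, which is acceptable only under the single-member assumption above.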

Action Items 

To prevent similar incidents from happening again, we have taken the following actions: 

  • Enhanced our monitoring and alerting systems to detect anomalies in export data. 

  • Refactored the code for downloading and uploading blobs to S3. 

  • Added end-to-end tests for exports to blob and S3. 

  • Created tools for backfilling corrupted data.
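
The end-to-end tests are not described in detail here. As a minimal sketch of the general idea, assuming gzip-compressed newline-delimited JSON (an assumption, not confirmed by this report), a round-trip test writes known records through the export path, reads them back as a downstream consumer would, and compares the results. All function names below, including the two upload stand-ins, are hypothetical; a real test would exercise the actual upload and download code against S3 or blob storage.

```python
import gzip
import json
import zlib
from collections.abc import Callable


def export_records(records: list[dict]) -> bytes:
    """Serialize records as newline-delimited JSON and gzip them (assumed format)."""
    text = "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    return gzip.compress(text.encode("utf-8"))


def read_back(blob: bytes) -> list[dict]:
    """Decompress and parse an object the way a downstream consumer might."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


def round_trip_ok(records: list[dict], upload: Callable[[bytes], bytes]) -> bool:
    """End-to-end check: export, pass through `upload`, read back, compare."""
    stored = upload(export_records(records))
    try:
        return read_back(stored) == records
    except (OSError, EOFError, zlib.error, UnicodeDecodeError, json.JSONDecodeError):
        # gzip.BadGzipFile is a subclass of OSError.
        return False


# Hypothetical stand-ins for the real upload/download path.
def faithful_upload(blob: bytes) -> bytes:
    return blob


def padding_upload(blob: bytes) -> bytes:
    return blob + b"\xff" * 4  # simulates stray non-zero padding bytes


sample = [{"EventName": "player_logged_in", "EntityId": "ABC123"}]
print(round_trip_ok(sample, faithful_upload))  # True
print(round_trip_ok(sample, padding_upload))   # False
```

Because Python's `gzip` module skips trailing zero bytes, zero padding would pass this particular comparison; also asserting that the stored bytes equal the exported bytes gives a stricter check.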

Posted Dec 18, 2024 - 10:44 PST

Resolved
Reprocessing of S3 Event Exports for the period between Dec 9th, 1:00 PM PST, and Dec 10th, 6:00 PM PST has been completed. Customers are advised to check their S3 buckets for the updated data.
Posted Dec 12, 2024 - 10:20 PST
Update
A fix has been deployed and we have resumed processing S3 Event Exports. The engineering team is now reprocessing exports that may have had missing or corrupted data between Dec 9th, 1:00 PM PST, and Dec 10th, 6:00 PM PST. We will post additional updates when reprocessing is completed.
Posted Dec 10, 2024 - 17:48 PST
Identified
The issue has been identified and a fix is being implemented.
Posted Dec 10, 2024 - 15:27 PST
Investigating
We have identified an issue with Event Export jobs to S3 where some uploads contain invalid characters that may cause errors when parsing or decompressing the contents. S3 Event Export jobs are being paused and data is being queued while we investigate and deploy a fix; jobs will resume once the fix is in place.
Posted Dec 10, 2024 - 11:14 PST
This incident affected: Analytics (Event Archiving).