Median latency is back to normal. We are still investigating root cause and prevention, but this issue is mitigated for now.
Posted Nov 27, 2020 - 14:13 PST
We're recovering again after the second nuclear cluster reset. Latency of responses is still higher than normal, so we will continue monitoring.
We are also going to add more VMs to the cluster and more Asset Silo Pods to spread out the load. We are not under CPU or memory pressure, and the network traffic is not higher than we have seen in the past. However, there may be a resource utilization issue that we have not yet identified.
Posted Nov 27, 2020 - 13:45 PST
Pods appeared healthy for 10 minutes, but have moved back into a bad state. Retrying the nuclear cluster reset before attempting to redeploy the service.
Posted Nov 27, 2020 - 13:05 PST
We seem to be recovering again after the nuclear reset. The root cause is still unknown, but the issue is likely mitigated.
Posted Nov 27, 2020 - 12:32 PST
We are continuing to investigate this issue and are starting the nuclear cluster reset option, which worked yesterday.
Posted Nov 27, 2020 - 12:25 PST
Outage of the asset service for Economy v2 customers; Economy Classic is unaffected. This seems related to yesterday's incident.