Seattle - Core Networking - Post Upgrade failure
Incident Report for True North Cloud
Postmortem

Seattle - Core Networking - Post Upgrade failure

Summary of impact: at 3:35AM PST on Sunday May 2nd, 2021 , there was a post upgrade failure of one of our core networking switches, this caused some virtual machines in customer environments to reboot.

Root cause: We determined the recently installed firmware upgrade to our core switching to be problematic.

Mitigation: Core switch firmware was rolled back to a known good version.

Next steps: We sincerely apologize for the impact to affected customers. We are working with the switch vendor to identify the problem with the latest firmware version.

Posted May 06, 2021 - 15:45 PDT

Resolved
This incident has been resolved.
Posted May 02, 2021 - 21:38 PDT
Monitoring
The upgrade has been rolled back successfully and potentially impacted environments are being tested and monitored.
Posted May 02, 2021 - 04:28 PDT
Identified
True North Cloud services experienced a failure of a core networking switch around 4 hours after a recent upgrade. This may have briefly impacted virtual machine and other network operations. This upgrade is being rolled back.
Posted May 02, 2021 - 04:11 PDT
This incident affected: True North Cloud Seattle Overall Availability (Seattle Internet Connectivity, Seattle Virtual Machine Availability, Seattle Storage Availability, Seattle vCloud Availability, Seattle Private Circuit Connectivity).