On Tuesday, May 16, Asana was down for approximately 30 minutes and partially available for another hour. In the spirit of transparency, we want to let our customers know what caused this outage and how we resolved it.
On Tuesday morning at approximately 8:40AM PDT, one of our databases became unreachable. The database's automatic failover did not kick in, so the database stayed unreachable until an on-call engineer triggered a failover manually. After the database was failed over, many of Asana's API servers did not correctly recover, so they also needed to be manually restarted.
We've since discovered a regression that stopped the API from recovering gracefully. That bug has been fixed and is making its way through our Continuous Integration pipeline. We're also working with AWS to understand why the database failed and why it didn't automatically recover. Finally, we're working to understand any other reasons why our API servers did not correctly recover once the database was back up.
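For readers curious what "recovering gracefully" means in practice: an API server should notice on its own that its database connection has gone away and re-establish it once the failover completes, rather than waiting for a restart. The sketch below is a simplified, hypothetical illustration of that pattern, not our production code; the driver, host name, and credentials are stand-ins. The key ideas are retrying the connection (so the DNS change from a failover is picked up) and backing off with jitter (so a fleet of servers doesn't overwhelm the newly promoted primary).

```python
# Hypothetical sketch of graceful recovery after a database failover.
# Driver (pymysql), host name, and credentials are placeholders.
import random
import time

import pymysql

DB_HOST = "primary.db.example.internal"  # DNS name pointing at the current primary


def connect_with_backoff(max_attempts=8, base_delay=0.5):
    """Retry the connection so a failover is picked up automatically."""
    for attempt in range(max_attempts):
        try:
            return pymysql.connect(
                host=DB_HOST,
                user="api",
                password="<secret>",
                database="app",
                connect_timeout=5,
            )
        except pymysql.err.OperationalError:
            # Exponential backoff with jitter so many API servers
            # don't all hammer the new primary at the same moment.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise RuntimeError("database still unreachable after failover retries")


def run_query(conn_holder, sql, params=()):
    """Execute a query, transparently reconnecting if the old connection
    was severed by the failover."""
    try:
        with conn_holder["conn"].cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
    except pymysql.err.OperationalError:
        conn_holder["conn"] = connect_with_backoff()
        with conn_holder["conn"].cursor() as cur:
            cur.execute(sql, params)
            return cur.fetchall()
```

The regression we found broke this kind of automatic reconnection path, which is why our API servers had to be restarted by hand instead of coming back on their own.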