How we modified our infrastructure to deploy an EU data center

Asana Engineering Team
March 30, 2020
4 min read

We recently deployed additional infrastructure in Europe to give enterprise customers more control over where their data is hosted. Supporting customers whose data is hosted in Europe required substantial changes to our infrastructure. This post explains the issues we had to solve to run Asana from multiple regions and roll out the European region.

Background: The life of an Asana session

When a user visits app.asana.com, our goal is for the app to load as quickly as possible. We have a dedicated Pageload Service that serves our initial application HTML. The rest of the application is then loaded in the browser, which connects to LunaDb, Asana’s internal application data graph server, to read and write application data. (If you’re familiar with GraphQL, it’s more or less a GraphQL server, except it automatically updates clients whenever any relevant data changes.)

To support customers with data housed in our European region, we needed to modify our application loading procedures. We wanted each such session to connect directly to backends hosted in Europe. Our Pageload Service identifies European sessions and changes the connection information in the initial HTML, allowing the browser to talk directly to LunaDb in Europe.

[IA Blog] Life-of-a-session-1 (Image 1)

When a user visits app.asana.com to access a domain whose data is hosted in Europe, the flow is as follows:

  1. Check that the user is authenticated, and redirect to the login page if that is not the case.

  2. Determine the domain the user is trying to access and identify which region that domain's database resides in. In our example, the domain database is located in Europe.

  3. Mark Europe as the service region in the initial HTML and send this back to the browser.

  4. The browser then asynchronously connects to the European region, establishes a session and syncs the data to the application.

  5. All subsequent application access is synced directly to Europe.
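The server-side part of this flow (steps 1 to 3) can be sketched roughly as follows. This is a minimal illustration and not Asana's Pageload Service: the `handle_pageload` function, the `DOMAIN_REGION` and `REGION_ENDPOINT` tables, the endpoint URLs, and the `data-*` attribute names are all assumptions made for the example. Steps 4 and 5 happen in the browser, which would read the region from the HTML and open its connection there.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data: which region hosts each domain's database.
DOMAIN_REGION = {"acme-eu": "eu", "acme-us": "us"}

# Hypothetical sync endpoints, one per region.
REGION_ENDPOINT = {
    "us": "wss://sync.us.example.invalid",
    "eu": "wss://sync.eu.example.invalid",
}


@dataclass
class Response:
    status: int
    body: str = ""
    location: Optional[str] = None


def handle_pageload(user: Optional[str], domain: str) -> Response:
    # Step 1: redirect unauthenticated users to the login page.
    if user is None:
        return Response(status=302, location="/login")

    # Step 2: identify the region that hosts the domain's database.
    region = DOMAIN_REGION.get(domain)
    if region is None:
        return Response(status=404, body="unknown domain")

    # Step 3: mark the service region in the initial HTML.
    endpoint = REGION_ENDPOINT[region]
    html = (
        f'<html><body data-service-region="{region}" '
        f'data-sync-endpoint="{endpoint}"></body></html>'
    )
    return Response(status=200, body=html)
```

Embedding the region in the first response means the browser never has to ask a U.S. service where to connect before it starts syncing.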

Issue 1: Data Model sharding

In 2015, we took Asana from one database to multiple. Today, the details on how we shard data remain roughly the same. Customer data in Asana can be stored in domains, and it is sharded by domain. 

We have an additional shard (the master shard) that stores user data and keeps track of the domain membership for each user. The master shard is stored in a centralized database called the Master Shard Database. This database does not store any data related to tasks, projects, or teams. 

[IA Blog] Sharding-diagram-3 (Image 2)

Each database stores data for multiple customer domains, and each domain is isolated to a single database. User data is stored on the database that contains the master shard.
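As a rough sketch of this layout, the routing rule could look like the following. The database names, the `DOMAIN_DATABASE` table, and the `database_for` helper are hypothetical and only illustrate the rule that user data lives with the master shard while all other data is routed by domain.

```python
from typing import Optional

# Hypothetical name for the database that holds the master shard.
MASTER_SHARD_DATABASE = "master-shard-db"

# Hypothetical mapping: several domains may share one database,
# but each domain lives in exactly one database.
DOMAIN_DATABASE = {
    "acme": "domain-db-1",
    "globex": "domain-db-1",
    "initech": "domain-db-2",
}

# Kinds of data kept on the master shard (users and their memberships).
MASTER_SHARD_KINDS = {"user", "domain_membership"}


def database_for(kind: str, domain: Optional[str] = None) -> str:
    """Return the database that stores the given kind of object."""
    if kind in MASTER_SHARD_KINDS:
        return MASTER_SHARD_DATABASE
    if domain is None:
        raise ValueError(f"{kind!r} is domain data and needs a domain")
    return DOMAIN_DATABASE[domain]
```

For example, `database_for("task", "acme")` and `database_for("project", "globex")` resolve to the same database here, while `database_for("user")` always resolves to the master shard database.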

[IA Blog] AWS-region-520x228 (Image 3)

Although the number of databases has grown, all of them were isolated to a single AWS region within the United States, and all of our serving infrastructure was deployed alongside them. When we originally built our services, we assumed that they would be able to read and write data to any database as required.

This poses a problem when a portion of these databases resides in Europe. When domain databases are located outside of our U.S. AWS region, the assumption of full read/write access to any database no longer holds. Instead, we needed to introduce a new restriction that each session could only access data in a single domain shard.

[IA Blog] Multi-Cluster-Diagram (Image 4)

In this setup, we deploy domain databases in Europe while keeping the master shard in the United States. However, our users should still be able to be members of multiple domains across multiple regions, so we make the master shard available in Europe through an encrypted channel.

Issue 2: Constraints from European infrastructure

Storing customer data in Europe also requires us to deploy infrastructure that is capable of processing and serving the data. We made the choice to not expose the domain databases over the region boundary. This hard restriction makes it very difficult to accidentally store/process data in the wrong region. Within our European region we deploy copies of several of our services. This allows the application to be served directly from the European region, instead of being proxied through our U.S. region. This further ensures that European data is not funneled through infrastructure serving from the U.S.  

The master shard data is still stored in our U.S. region, yet it needs to be accessible from the European region, which was a significant hurdle for our setup. To facilitate this and other cross-region communication, we built a new cross-region service mesh framework called AsanaServices. (I’ve simplified the details of this data access for the purposes of this post.)

Cross-region data access requests add about 100ms in latency, which affects master shard access from the European region. We mitigate this by duplicating frequently accessed fields from the master shard onto the domain database. This means that we are rarely required to directly access the master shard.
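One way to picture this mitigation is a domain database that keeps local copies of selected master shard fields and only makes a cross-region call on a miss. The sketch below is illustrative only: `MasterShardClient`, `DomainDatabase`, and the choice of duplicated fields are assumptions, and the post does not describe how the copies are kept up to date.

```python
class MasterShardClient:
    """Stand-in for a cross-region call to the master shard (hypothetical)."""

    def __init__(self, users):
        self._users = users
        self.cross_region_calls = 0  # each call would cost roughly 100 ms

    def get_field(self, user_id, field):
        self.cross_region_calls += 1
        return self._users[user_id][field]


class DomainDatabase:
    """Domain database holding copies of frequently accessed user fields."""

    def __init__(self, master, duplicated_fields):
        self._master = master
        self._duplicated_fields = set(duplicated_fields)
        self._copies = {}  # (user_id, field) -> locally stored value

    def sync_user(self, user_id):
        # Copy the duplicated fields from the master shard into this database.
        for field in self._duplicated_fields:
            self._copies[(user_id, field)] = self._master.get_field(user_id, field)

    def get_user_field(self, user_id, field):
        # Serve duplicated fields locally; fall back to the master shard otherwise.
        key = (user_id, field)
        if key in self._copies:
            return self._copies[key]
        return self._master.get_field(user_id, field)
```

Reads of a duplicated field such as a user's name stay inside the region, and only the rarely needed fields pay the cross-region latency.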

Issue 3: Introducing data access restrictions

Customer data should not be accessed across regions. If a session in one region requested data from a domain shard in a different region, that data would be inaccessible. To avoid this, we decided to restrict data access within a session to the master shard and a single domain shard. Because a session connects to only one domain shard, it connects to only one region, which effectively removes the concern of cross-region data access.

[IA Blog] Session-Isolation (Image 5)

Each session can then be configured to directly sync with the appropriate region depending on where the domain is. This restriction allows us to avoid the problem of cross-region access.
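A minimal sketch of such a per-session restriction is shown below. The `Session` class, the `check_access` method, and the shard names are hypothetical; the point is only that a session is bound to one domain shard when it is created and rejects reads from any other domain shard.

```python
MASTER_SHARD = "master"  # hypothetical identifier for the master shard


class CrossShardAccessError(Exception):
    """Raised when a session touches a domain shard other than its own."""


class Session:
    def __init__(self, domain_shard: str, region: str):
        self.domain_shard = domain_shard
        self.region = region  # the region this session syncs with

    def check_access(self, shard: str) -> None:
        # Only the master shard and the session's own domain shard are allowed.
        if shard != MASTER_SHARD and shard != self.domain_shard:
            raise CrossShardAccessError(
                f"session bound to {self.domain_shard!r} requested {shard!r}"
            )


session = Session(domain_shard="domain-eu-7", region="eu")
session.check_access("master")       # allowed
session.check_access("domain-eu-7")  # allowed
```

With this check in place, a call such as `session.check_access("domain-us-3")` raises `CrossShardAccessError` instead of silently reaching into another region.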

However, our codebase contained pre-existing violations of this restriction, and we needed to fix them before we could launch. We added non-user-visible monitoring to alert on cases where the restriction was violated. This monitoring enabled us to find and address violations quickly.
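Monitoring of this kind might, for instance, record a violation and let the request proceed rather than fail it, so users see no change while engineers get a list of call sites to fix. The sketch below assumes that approach; the `monitored_access` function, the `violations` counter, and the call-site labels are invented for illustration.

```python
import logging
from collections import Counter

logger = logging.getLogger("shard_access_monitor")

# Hypothetical in-process tally of violations per call site.
violations = Counter()


def monitored_access(session_shard: str, requested_shard: str, call_site: str) -> bool:
    """Report whether the access obeys the restriction; never block it."""
    allowed = requested_shard in ("master", session_shard)
    if not allowed:
        # Record and log the violation without failing the request.
        violations[call_site] += 1
        logger.warning(
            "cross-shard access at %s: session=%s requested=%s",
            call_site, session_shard, requested_shard,
        )
    return allowed
```

Sorting `violations.most_common()` would then show which code paths break the restriction most often and should be fixed first.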

Looking ahead: Building for scale

These changes represent the next phase of scale now that we host Asana across multiple continents. The tools we built to enable this rollout are now core components of our infrastructure, and they will enable us to develop new infrastructure and launch more regions in the future. 

Does solving these kinds of infrastructure problems sound interesting? Our Infrastructure team is hiring — check out our open roles and apply today.