Scaling Stateful Backend Processes at Asana: Sync Server Process Warming

Team Asana
June 30th, 2025
7 min read

At Asana, our sync servers are the backbone of reactive, lightning-fast updates for millions of users globally. But how do you provide reliable autoscaling for an extremely stateful backend like this? This was the core challenge we tackled as part of a broader infrastructure modernization, which also opened up some major deployment and scaling opportunities.

Where we began: existing architectural challenges 

Our sync servers are complex components that implement multiple parts of our reactive data loading system (which is similar to GraphQL). They manage stateful client session connections and run a variety of product-specific custom data models, access control rules, and resolvers while maintaining sub-second reactivity. This complexity makes it difficult to de-provision and provision new instances, as both traffic shifting and performant startup are independently difficult problems.


Web clients connect to sync servers via websockets

Our legacy architecture relied on a blue-green approach with static deployment sizes and cache warming as part of the deployment process: once the new cluster finished cache warming, we would switch traffic over. Long-lived, sticky sessions made it ineffective to manually scale up existing deployments to rebalance sessions across pods, especially when load was unevenly distributed. Instead, operators would forecast future peak load to determine whether to scale up ahead of a deployment. 

This approach created several operational challenges:       

  • Websocket connection overhead made skewed CPU load hard to rebalance and slowed down deployments: scaling up the existing deployment would trigger a thundering herd effect if we rebalanced sync sessions across all pods at once. Establishing connections between web clients and sync servers has a high overhead, limiting the effectiveness of manually rebalancing to handle skewed CPU load. We handled this by slowly draining sessions from old sync pods, which increased total deployment time. Deployments would take 1-2 hours and frequently experienced transient failures.

  • Static capacity planning: Managing the fleet size required forecasting a static number of desired replicas before initiating the deployment. If we got the scaling wrong, it would lead to expensive over-provisioning or app downtime that was hard to remediate until we passed peak traffic for the day.

Despite these operational challenges, cache warming served several critical purposes that justified its complexity: 

  1. Warming in-memory caches - Caching commonly or recently accessed queries greatly improves performance.

  2. Warming the JVM - Our sync servers are written in Scala, and rely heavily on JVM optimizations for their performance. 

    1. Performing just-in-time (JIT) compilation without cache warming is challenging because of the diverse codepaths from arbitrary resolvers and generated data model code.

  3. Canarying performance - Dark launching new deployments ensures engineers can catch performance regressions before they affect users.


Simplified view of our warming setup pre-migration

In essence, persistent websocket connections with sticky sessions created our core constraint: we couldn't scale up existing deployments because new pods couldn't easily inherit existing connections, and rebalancing triggered thundering herd effects. This forced us into expensive blue-green deployments where we had to scale by prediction rather than demand.

Our solution

Given this core constraint with persistent websocket connections, we needed a new approach. Our design goals were as follows: 

  1. Functional requirements: The team wanted to preserve system performance - sync servers should be able to perform rolling updates and scaling operations reliably and in a reasonable amount of time, without causing a noticeable performance regression. 

  2. Time requirements: The data loading infrastructure team was time-constrained to avoid bottlenecking the rest of the Infrastructure org for the architecture migration. While we explored an alternative approach using ArgoCD’s blue-green deployments, which can create a “preview stack” without having it serve production traffic, we deemed it too risky given our need for a quick implementation. 

The simplest path forward was to replicate the cache warming logic from our existing architecture, but in a slightly more performant and scalable way that didn’t require creating and waiting on as many resources.

Building it: implementation deep dive

With these goals established, we dove into implementation and faced the following questions along the way: 

What type of requests should we replay? 

We tested three approaches and found that full websocket replay (#3) was necessary for performance (a sketch of the difference follows the list): 

  1. Simple synthetic requests only (e.g., load a single task repeatedly)

  2. Replay data loading from web sessions without using the websocket API

  3. Replay full web sessions (full websocket-based replay)
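
To make the distinction concrete, here is a minimal sketch in Scala of the difference between approaches 2 and 3. The `RecordedFrame`, `DataLoader`, and `WebSocketHandler` types are hypothetical stand-ins, not our actual sync server interfaces:

```scala
// Hypothetical stand-ins for the sync server's internals.
final case class RecordedFrame(sessionId: String, payload: String)

trait DataLoader {
  def load(query: String): Unit
}

trait WebSocketHandler {
  // Handles a raw frame exactly as if it had arrived from a web client:
  // session setup, subscriptions, access control, and resolvers all run.
  def handleFrame(frame: RecordedFrame): Unit
}

object WarmupReplay {
  // Approach 2: exercises only the data loading layer, so the websocket and
  // session-management codepaths are never JIT-compiled.
  def replayDataLoadingOnly(queries: Seq[String], loader: DataLoader): Unit =
    queries.foreach(loader.load)

  // Approach 3: replays full recorded sessions through the same path real
  // clients use, warming session management and data loading together.
  def replayFullSessions(frames: Seq[RecordedFrame], handler: WebSocketHandler): Unit =
    frames.foreach(handler.handleFrame)
}
```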

When validating performance, we made an intentional tradeoff: focus on JIT-compiling code rather than on exactly warming caches, which would have been trickier to do with rolling updates. We discovered that two JVM metrics were incredibly helpful for validating when warming was actually complete (see the sketch after this list):

  1. Jitter: Each JVM process runs an internal scheduler that queues no-op work items and measures how long each one takes to actually run. This value decreases over time, but starts out extremely high, adding seconds to thread execution in the worst case.

  2. Compilation time: Asana uses JVM metrics showing how much time is spent per-pod on JIT compilation. As this metric decreases, more CPU is available for session management and data loading, resulting in better performance.
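
Both signals can be read through the standard `java.lang.management` API. The sketch below is illustrative rather than our production monitoring code; the probe period and scheduler details are assumptions:

```scala
import java.lang.management.ManagementFactory
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicLong

object WarmupSignals {
  // Cumulative milliseconds the JVM has spent in JIT compilation. When this
  // curve flattens, most hot codepaths have been compiled.
  def totalJitCompilationMillis(): Long =
    ManagementFactory.getCompilationMXBean.getTotalCompilationTime

  // "Jitter" probe: schedule a no-op at a fixed rate and record how late it
  // actually runs. The lag is large on a cold process and shrinks as the JVM
  // warms up. The period here is an illustrative choice.
  def startJitterProbe(periodMillis: Long = 100L): AtomicLong = {
    val lastLagMillis = new AtomicLong(0L)
    val expectedNextNanos = new AtomicLong(System.nanoTime() + periodMillis * 1000000L)
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      () => {
        val now = System.nanoTime()
        lastLagMillis.set(math.max(0L, (now - expectedNextNanos.get()) / 1000000L))
        expectedNextNanos.set(now + periodMillis * 1000000L)
      },
      periodMillis,
      periodMillis,
      TimeUnit.MILLISECONDS
    )
    lastLagMillis // read periodically; the scheduler is left running in this sketch
  }
}
```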

How do we handle rate limiting during replay? 

We sourced web sessions for traffic replay by capturing traffic from live sessions and exporting it to a Kinesis stream, and then we read it as part of request replay. However, Kinesis rate-limited us since every pod read from the request stream directly. To solve this, we added a caching traffic replay server that could stream websocket session messages to sync servers that started the warming process.
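
Conceptually, the replay server is a shared read-through cache sitting in front of the stream: warming pods ask the replay server for batches, and only cache misses hit the stream. The sketch below illustrates the idea with a hypothetical `RecordedTrafficSource` trait standing in for the Kinesis consumer; the real service also handles batching, retention, and streaming over websockets:

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical source of recorded session messages. In production this would
// be a Kinesis consumer; the AWS client code is omitted from this sketch.
trait RecordedTrafficSource {
  def fetchBatch(shard: String, batchIndex: Int, limit: Int): Seq[String]
}

// Caching replay server: each batch is read from the upstream stream once and
// then served to every warming sync server, so N warming pods cause one
// upstream read instead of N and stay under the stream's read limits.
final class ReplayCache(upstream: RecordedTrafficSource, batchSize: Int = 500) {
  private val cache = TrieMap.empty[(String, Int), Seq[String]]

  def batchFor(shard: String, batchIndex: Int): Seq[String] =
    cache.getOrElseUpdate((shard, batchIndex), upstream.fetchBatch(shard, batchIndex, batchSize))
}
```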

How do we deploy without causing performance regressions? 

Our sync servers are usually stable: they can flexibly adapt to bursty request volume because in our Kubernetes scheduling strategy, we co-locate several pods per node. These pods have Burstable QoS, allowing them to use more CPU than allocated. This smooths out the routing for sessions to sync servers - while traffic patterns are dependent on individual user activity, co-locating pods means we only need to worry about load averaged across a few pods. 

So while co-locating pods is great for load distribution, it created specific challenges during deployments when we transitioned to our new Kubernetes rolling updates, particularly around multi-tenancy. Individual pods can consume large percentages of the node's CPU during warming, which limits our deployment rate and is problematic because: 

  1. Each pod still must be warmed enough to be performant.

  2. Some pods serve real traffic while others are still warming.

What goes wrong: 

  • While warming, pods can use more CPU than their requests for several minutes. 

  • This cuts into the burst capacity of colocated pods that are serving real traffic. 

  • Since each session is served by a fixed pod, the rate of incoming requests cannot easily be reduced. 

  • This can cause concurrency to spike on the pod, which degrades latency.

To address these conflicting concerns, we:

  • Tuned Kubernetes resource management thresholds so that sync pods can only burst to a lower percentage of the node’s CPU.

  • Configured pod anti-affinity: all sync pods must be on nodes with other pods that have the same commit id.

  • Decreased concurrency: this is controllable by shrinking JVM threadpool sizes, limiting the number of requests loaded concurrently, and splitting the rolling update into several waves (with a time buffer between waves). 

  • Slowed session draining: establishing connections to sync servers is more expensive than maintaining existing ones because web clients resubscribe to all subscriptions on new sessions, so we slowed the rate that sessions “drain” from old to new pods.

The result: With these changes, we were able to colocate new sync server pods on the same nodes as ones serving traffic without affecting user-facing performance.
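
As one illustration of the "decreased concurrency" mitigation above, a warming pod can gate request handling behind a semaphore that holds fewer permits until warming finishes. This is a minimal sketch; the class name and permit counts are illustrative, not our production configuration:

```scala
import java.util.concurrent.Semaphore

// Gate concurrent request handling so a warming pod cannot saturate the node's
// CPU and starve colocated pods that are serving real traffic. The permit
// counts here are illustrative, not production values.
final class WarmupConcurrencyLimiter(warmingPermits: Int = 4, steadyStatePermits: Int = 32) {
  private val permits = new Semaphore(warmingPermits)
  @volatile private var warmed = false

  // Call once warming signals (JIT compilation time, jitter) have settled.
  def markWarmed(): Unit = synchronized {
    if (!warmed) {
      warmed = true
      permits.release(steadyStatePermits - warmingPermits)
    }
  }

  // Wrap each incoming request so only a bounded number run concurrently.
  def withPermit[A](work: => A): A = {
    permits.acquire()
    try work
    finally permits.release()
  }
}
```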

When is performance good enough to only use our new architecture?

Our test environments differed significantly from production - to avoid disruption to real customers, we started internal testing on Asana’s domain with isolated infrastructure, which hosts the majority of data on a single-tenant database. This differs from our production topology, where most of Asana’s customers are on shared-tenancy databases and use shared compute resources.

After getting performance on our internal domain to an acceptable state, we ran a small percentage of production traffic on the new architecture, keeping a quick fallback that could drain all traffic back to the stable legacy environment. Performance held up with largely the same tuning parameters. We used this to validate performance before a broader rollout, and kept the fallback around for months as the rest of Asana continued the migration process.


Simplified view of our warming setup post-migration

Other gotchas along the way 

As the first adopters of Asana’s new infrastructure architecture, our team encountered unique 'gotchas' that ultimately yielded critical learnings, streamlining adoption for other teams across the company. These included: 

  • Coordinating the order of rolling updates across different deployments is tricky for a stateful service.

  • We were proxying all websocket traffic through a proxy layer (sync routers) for load balancing, but accidentally rolled them at the same time as the sync servers, so we initially couldn’t tell whether performance regressions came from a reconnect storm or from insufficiently warmed sync servers.

  • When provisioning a new cell (copy of the compute infrastructure), the initialization order could stall deployments if we didn’t create replay servers before sync servers.

  • We needed both the initial connect and the follow-up messages from the same session to land in the same replay window, since we were replaying the sync protocol. If our window was too small, we would effectively not replay any requests, since they would all error out for not having started a sync session first (see the sketch below).
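
A minimal sketch of that windowing constraint, using a hypothetical `CapturedMessage` type:

```scala
// Hypothetical captured frame; isConnect marks the message that opens a sync session.
final case class CapturedMessage(sessionId: String, isConnect: Boolean, payload: String)

object ReplayWindowing {
  // Keep only messages for sessions whose initial connect falls inside this
  // window. Follow-up messages for sessions that connected before the window
  // began would just error out, so replaying them warms nothing.
  def replayableMessages(window: Seq[CapturedMessage]): Seq[CapturedMessage] = {
    val sessionsWithConnect = window.filter(_.isConnect).map(_.sessionId).toSet
    window.filter(m => sessionsWithConnect.contains(m.sessionId))
  }
}
```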

What does our reactive data loading infrastructure look like today?

Fast forward to today: the new architecture has fundamentally changed how we operate sync servers. 

  • Most of Asana’s resources now use autoscaling. This has resulted in fewer incidents related to capacity planning around peak traffic, and has also saved time for our on-call engineers.

  • Even “cold starts” of our infrastructure are pretty efficient, and we can failover a set of customers to new compute resources without much hassle.

  • We’ve made deployments faster, cheaper, and more stable - our beta servers deploy every two hours, and production deploys twice daily. We rarely see deployments fail due to transient issues.

  • Building upon these improvements, we’ve broken up the sync backend into independent session management and data loading components, which greatly simplifies our operating model. We are in the final stages of rolling out this new system.

These changes are just one example of the complex challenges our team is tackling daily to build a robust and scalable platform. If you're passionate about solving these kinds of problems at scale, we'd love to hear from you! 

We're hiring!

Explore career opportunities at Asana. If you don’t see a role that’s right for you at Asana today, sign up for Asana’s Talent Network to stay up to date on career opportunities and life at Asana.


Author Biography

Spencer Yu is the technical lead of the Core Storage Infrastructure team, which builds and manages the foundational cloud storage layer for all of Asana’s user data. He is a former member of LunaDb Infrastructure, the team behind Asana’s core data loading platform - the critical infrastructure that ensures users always see accurate, reactive, and lightning-fast updates across our web apps and APIs.

Team Shout Outs

This work wouldn’t have been possible without the LunaDb Infrastructure team and some of its past members: Arvind Vijayakumar, Eric Walton, George Ong, Sophia Yao, Alex Matevish, Sean Wentzel, Spencer Yu, Vinodh Kumar Chandra Murthy, Natan Dubitski
