One of Curai’s core products is First Opinion (now Curai Health), a chat application where users can connect with medical providers to access health information and address their primary care needs. Within our chat app, messages are exchanged between users and providers, and many of those messages require calls out to ML services. Our legacy codebase handled much of this time-consuming, CPU-intensive work via a job queue system. That system was a classic example of high-complexity, low-reliability architecture. As we ported our codebase from Python 2 to 3, we saw an opportunity to replace the jobs and move toward a simpler, more maintainable architecture better suited to our product needs. For anyone weighing whether they need a job queue, we hope this post serves as an instructive example.
The Legacy Architecture
In our legacy architecture, messages were sent from the frontend to the backend over a WebSocket connection. Our backend asynchronously forwarded pertinent payloads to Amazon’s SQS, which queued up the jobs in FIFO order. Some (but not all) of the queued jobs made additional calls out to separate machine learning services.
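To make the indirection concrete, here is a rough sketch of what the enqueue step might have looked like, assuming a boto3 client; the queue URL, job type, and payload shape are hypothetical placeholders rather than our actual schema.

```python
# Hypothetical sketch of the legacy enqueue step. boto3 is the real AWS SDK;
# the queue URL and payload shape are invented for illustration.
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/chat-jobs.fifo"

def enqueue_job(job_type: str, payload: dict) -> None:
    """Push a chat-related job onto the FIFO queue for a consumer to pick up."""
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"type": job_type, "payload": payload}),
        # FIFO queues require a group ID; ordering is preserved per group.
        # Assumes content-based deduplication is enabled on the queue.
        MessageGroupId="chat-jobs",
    )
```

A separate consumer process then polled the queue, performed the work (sometimes calling out to an ML service), and pushed results back to the chat server, which is where the indirection piled up.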
This system resulted in a number of product-impacting issues. It had too much indirection for our current product needs, and the complexity did not carry the added benefit of preparing us for future scale. Adding a new message type required engineers to modify five separate files and familiarize themselves with SQS tooling, and even once that work was done, flaky performance was hard to test and debug.
Requirements and considerations
As we set out to build something better, we drew up a set of requirements that any replacement had to satisfy.
In deciding our path forward, deprecating the queue jobs completely was not a foregone conclusion. Many of the disadvantages we’ve enumerated thus far were a result of our particular implementation; these drawbacks are certainly not endemic to queues or to SQS itself. It is also worth noting that when we look across the tech ecosystem at how large-scale chat applications are built, job queues are a common pattern. Notably, as of 2017 Slack’s job queue system was processing 1.4 billion jobs daily; for Slack, the job queue is an integral part of the architecture, enabling reliable service to more than 10 million daily active users. For comparison, in 2019 First Opinion saw an average of 14.7k messages sent per day. With Slack and other companies (such as Quora) using message queues to great effect, why then would we forgo our job queue completely? What it came down to was identifying the engineering solution best fitted to our current business requirements. Our choice here was a nod to the agile and extreme programming idea to “Do the simplest thing that could possibly work.”
The path forward
We satisfied all our requirements by integrating Flask-SocketIO into our stack. Flask-SocketIO gives our Flask server access to Socket.IO, which provides real-time, bidirectional client-server communication. In terms of implementation details, we started with about 20 jobs to replace. We migrated them incrementally, removing the jobs one by one and tearing out the old infrastructure at the very end.
With each job migration, we had a decision to make: would we run the task synchronously or asynchronously? As a guiding principle, we handled tasks that relied on results from ML models asynchronously and everything else synchronously. For example, sending a wait-time message to the user could execute synchronously on the parent thread of execution. We used SocketIO’s emit() function to send messages between our front and back ends. Many tasks that once navigated a complex loop (chat server to SQS, down to a queue consumer, and back to the chat server again) are now handled quite simply, directly between the frontend and chat server! To avoid delays on more time-consuming tasks, such as calls out to ML models, we leveraged SocketIO’s start_background_task() function, which spawns a background green thread within our server so that other work is not blocked during execution.
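Here is a minimal sketch of that split, assuming a Flask-SocketIO server; the event names, payloads, and the call_ml_service() helper are hypothetical stand-ins rather than our production code.

```python
# Minimal sketch of the sync/async split described above. Event names,
# payload shapes, and call_ml_service() are hypothetical placeholders.
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

def call_ml_service(payload):
    """Hypothetical client for one of our ML services."""
    return {"message_id": payload.get("message_id"), "label": "triage"}

@socketio.on("message_sent")
def handle_message_sent(payload):
    # Fast work (e.g., a wait-time message) runs synchronously,
    # straight back to the connected client.
    emit("wait_time", {"minutes": 5})

    # Time-consuming work (e.g., an ML model call) runs in a
    # background green thread so the handler returns immediately.
    socketio.start_background_task(run_ml_task, payload)

def run_ml_task(payload):
    result = call_ml_service(payload)
    # Outside a request context we emit via the SocketIO instance
    # rather than the context-bound emit().
    socketio.emit("ml_result", result)

if __name__ == "__main__":
    socketio.run(app)
```

The key design point is that the event handler itself never blocks on the model call; the result is pushed to the client over the same socket whenever it is ready.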
The pros and cons of the path we chose
Investing engineering time to port a legacy system is never a given. Because we had to make changes for Python 3 compatibility anyway, and because our entire team had consistently felt the pain of maintaining the legacy system, we felt a few days of investment here was worth it. Replacing our job queue increased the percentage of tasks handled synchronously, which carried with it the risk of increasing latency for some message deliveries. On the other hand, without the queue jobs our architecture is much simpler, and being able to spawn green threads for the specifically time-consuming tasks has its own latency-reducing benefits. In summary, we feel the pros outweigh the cons: we chose to refrain from over-engineering and to architect thoughtfully for our current scale. For fans of pro/con lists, the trade-offs boil down to this. Pros: a much simpler architecture, and green threads that keep time-consuming tasks from blocking. Cons: more tasks handled synchronously, with the attendant risk of added latency on some message deliveries.
Results and a look to the future
As our application scales and as we introduce more ML services to augment our doctors’ work, we can imagine integrating PubSub or a similar service so that our frontend can speak to microservices directly. For now, with basic SocketIO tooling, we are handling events more reliably and with faster speeds for the end user.
In short, with simpler, less fancy infrastructure we now have better performance. While chat applications handling much larger volumes may benefit from the job queue pattern, we learned that by deprecating our job queue we could vastly simplify a core piece of our codebase while serving our users better. Our work here was a lesson in how the shiniest or most “industry standard” solution is not always the best one for your current needs, and in how engineering simplicity really does have positive impacts for engineers and users alike.
If Curai’s engineering work interests you, please check out our jobs page and don’t hesitate to say hi 👋. To continue the conversation, you can also find me on Twitter.
Thanks to Matt Willian for collaboration and mentorship on this project.