Tumblr Engineering — Colossus: A New Service Framework from Tumblr


One of the biggest challenges we continue to face at Tumblr is how to properly organize and scale our infrastructure as the platform grows. One very promising strategy has been the adoption of microservices: small, specialized applications, each designed to efficiently encapsulate a single feature or component. Compared with one monolithic application that contains the entire site's business logic, the clean separation of responsibilities that microservices provide keeps the infrastructure well organized and makes it easier to track down bugs and performance bottlenecks.

While microservices offer plenty of advantages, they come with their own set of challenges. Microservices need to be easy to build, maintain, deploy, and monitor, and on top of that they need to be extremely high-performing and fault-tolerant. A single service may serve tens or hundreds of thousands of requests per second with strict requirements on latency and uptime.

Colossus is a new framework developed at Tumblr that addresses these challenges. It offers a lightweight, straightforward model for building high-performing microservices. Written in Scala and built on Java NIO and the Akka actor framework, Colossus has had a huge impact on the way we build services at Tumblr.

Microservices are not new to Tumblr, but in the past we struggled to write services that performed well, stayed stable, and were easy to maintain. Building a service was a major undertaking, and only a handful of engineers had built up the domain knowledge needed to write them effectively. Colossus has completely changed that picture. Services built on it are fast and fault-tolerant out of the box, which greatly lowers the barrier to entry.

Our past experience in building services led to two main goals with Colossus:

Performance

By far the most important goal is that an application written in Colossus should be essentially as fast as if it were written directly on NIO without any framework. This is largely because Colossus was designed to encapsulate the I/O layer of an existing service that we had written directly on NIO. We had written that service on raw NIO because of performance problems with the existing frameworks we tried, so we wanted to ensure that refactoring it onto a framework would not cost us that performance.

The general structure of a microservice is that it concurrently processes small requests from potentially many clients and keeps little to no internal state. The reactor pattern, which uses single-threaded event loops to multiplex client TCP connections, is ideal for this situation. Thus for Colossus we aimed to build a clean implementation of this model with as little overhead as possible. In many cases entire services can be written without code ever leaving the event loop, and in cases where we do need true parallelism, using Futures and Akka actors is easy and efficient.
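To make the reactor pattern concrete, here is a minimal single-threaded echo server in Scala, written directly against Java NIO's `Selector`. It is a sketch of the general pattern under simplifying assumptions (small messages, no back-pressure, echoing as a stand-in for business logic). It is not Colossus's implementation or API, and the `EchoReactor` class and its methods are hypothetical names chosen for illustration.

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel, SocketChannel}
import scala.jdk.CollectionConverters._

// Hypothetical sketch of a reactor: one thread and one Selector multiplex the
// listening socket and every client connection, echoing back whatever arrives.
final class EchoReactor(port: Int) {
  private val selector = Selector.open()
  private val server = ServerSocketChannel.open()
  server.bind(new InetSocketAddress("127.0.0.1", port))
  server.configureBlocking(false)
  server.register(selector, SelectionKey.OP_ACCEPT)

  /** The port actually bound (useful when `port` is 0, meaning "any free port"). */
  def boundPort: Int = server.socket().getLocalPort

  @volatile private var running = true

  /** Ask the loop to exit; wakeup() unblocks a pending select(). */
  def stop(): Unit = { running = false; selector.wakeup() }

  /** The event loop: wait until some channel is ready, then handle each ready key. */
  def run(): Unit = {
    val buf = ByteBuffer.allocate(4096) // safe to share: only this thread uses it
    while (running) {
      selector.select()
      val keys = selector.selectedKeys().iterator()
      while (keys.hasNext) {
        val key = keys.next()
        keys.remove()
        if (key.isValid && key.isAcceptable) {
          // New client: register it with the same selector for read readiness.
          val client = server.accept()
          if (client != null) {
            client.configureBlocking(false)
            client.register(selector, SelectionKey.OP_READ)
          }
        } else if (key.isValid && key.isReadable) {
          val client = key.channel().asInstanceOf[SocketChannel]
          buf.clear()
          val n = client.read(buf)
          if (n < 0) { key.cancel(); client.close() } // client hung up
          else {
            buf.flip()
            // Simplification: spin until the write completes instead of
            // registering OP_WRITE interest and resuming later.
            while (buf.hasRemaining) client.write(buf)
          }
        }
      }
    }
    // Close the listening socket and any remaining client connections.
    selector.keys().asScala.foreach(_.channel().close())
    selector.close()
  }
}
```

Calling `run()` on a dedicated thread and `stop()` from another is enough to exercise it. A production reactor would register `OP_WRITE` interest when a write cannot complete immediately rather than spinning, which is the kind of I/O detail a framework exists to hide from application code.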

This hybrid actor/reactor model has ensured that Colossus meets our performance requirements. We've benchmarked Colossus services at millions of requests per second, and some of our production services using Colossus have handled hundreds of billions of requests with 99.99th percentile latency under 5ms.
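One way to picture the hybrid model is to keep request handling on a single event-loop thread and push only expensive work onto a worker pool through a `Future`, then hand the result back to the loop thread. The sketch below uses plain `scala.concurrent` with two hypothetical executors (`loop` and `workers`) rather than Colossus or Akka APIs, and `expensive` is a placeholder for real CPU-bound work.

```scala
import java.util.concurrent.{ExecutorService, Executors, ThreadFactory}
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical sketch of the hybrid model, not Colossus's API: a single
// "event-loop" thread owns request handling, while CPU-heavy work runs on a
// worker pool and its result is handed back to the loop thread.
object HybridSketch {
  private val loopPool: ExecutorService =
    Executors.newSingleThreadExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = new Thread(r, "event-loop")
    })
  private val workerPool: ExecutorService = Executors.newFixedThreadPool(4)

  val loop: ExecutionContext = ExecutionContext.fromExecutorService(loopPool)
  val workers: ExecutionContext = ExecutionContext.fromExecutorService(workerPool)

  // Placeholder for work too expensive to run on the event loop.
  def expensive(n: Int): Long = (1L to n.toLong).sum

  // Run `expensive` on a worker thread, then build the reply on the loop thread.
  def handle(n: Int): Future[String] =
    Future(expensive(n))(workers).map { sum =>
      s"${Thread.currentThread().getName} replied $sum"
    }(loop)

  def shutdown(): Unit = { loopPool.shutdown(); workerPool.shutdown() }
}
```

Because the `map` callback runs on `loop`, any per-connection state it touches is only ever accessed from one thread and needs no locking; only the body passed to `Future(...)(workers)` runs in parallel.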

Simplicity

The other major goal is to ensure that Colossus is a small, focused framework with as low a barrier to entry as possible.

Simplicity comes in two related but different flavors: simplicity of the framework and simplicity of applications using the framework. Simplicity of the framework comes from the fact that Colossus is focused largely on microservices. While the core of Colossus is a fully generalized wrapper on NIO, most of our effort has gone into the microservice use case, which lets us keep the code base small and straightforward.

Simplicity of the application depends largely on how the framework presents its API. Here we take heavy advantage of the fact that Colossus is written in Scala. One of Scala's biggest benefits is the ability to write highly expressive code and design simple DSLs with minimal boilerplate. Furthermore, because Scala places a heavy emphasis on type safety and functional programming, we wanted Colossus to reflect these principles as much as possible. The result is that applications can be written concisely, keeping the focus on the business logic rather than the boilerplate.
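As an illustration of the concision Scala's pattern matching allows, here is a small routing table written as a `PartialFunction`. The `Request`, `Response`, `Method`, and `Routes` definitions are hypothetical and deliberately minimal. Colossus's own routing DSL is different and is not reproduced here.

```scala
// Hypothetical request/response model for illustration; these are not Colossus types.
sealed trait Method
case object Get extends Method
case object Post extends Method

final case class Request(method: Method, path: List[String])
final case class Response(status: Int, body: String)

object Routes {
  // Split "/users/42" into List("users", "42").
  def path(s: String): List[String] = s.split("/").filter(_.nonEmpty).toList

  // Each route is one pattern-match case; the compiler checks the types of
  // everything bound in a pattern (e.g. `id` is a String).
  val routes: PartialFunction[Request, Response] = {
    case Request(Get, List("ping"))      => Response(200, "pong")
    case Request(Get, List("users", id)) => Response(200, s"user $id")
    case Request(Post, List("users"))    => Response(201, "created")
  }

  // Fall back to a 404 for anything no route matches.
  def handle(r: Request): Response =
    routes.applyOrElse(r, (_: Request) => Response(404, "not found"))
}
```

Each endpoint is a single `case` with no registration boilerplate, and a request that matches nothing falls through to the 404 handler rather than throwing a `MatchError`.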

These principles have allowed Colossus to become a fundamental part of Tumblr's infrastructure and have put us on the path to a more service-oriented architecture.

I'm happy to announce that Colossus is now open-sourced under the Apache License and is available on GitHub. While it is still a work in progress and we are releasing it as a pre-1.0 version, Colossus has already significantly improved the way we build services at Tumblr, and we are currently using it in production for several back-end systems with great success. So take a look, try it out, and let us know what you think!