Chat in the making

Published on 17/3/2010 by Carlos Abalde, Backend Engineer

What is the recipe for successfully deploying a large-scale and cost-effective chat service in a couple of months without dying in the attempt? There are probably as many answers as there are people willing to contribute their ideas. Here at Tuenti we enjoy open source and innovative approaches. That has been our dynamic duo while developing our web-based instant messaging (IM) service.

Why reinvent the wheel and design the ultimate IM service from scratch? That's usually the shortest path to repeating old mistakes and, even worse, delaying the product launch indefinitely. Be innovative: take a high-quality IM platform, extend and/or adapt it to fit your requirements and, finally, use your experience to contribute back to the community. That is the philosophy behind open source, and that's the way we wanted to build Tuenti's chat service.

Outstanding technologies

There is a good number of outstanding open source IM solutions available out there. In particular, those based on the open messaging standard XMPP are becoming increasingly popular. Nowadays XMPP is a mature, open, distributed and extensible middleware ready to power next-generation large-scale real-time web applications. We strongly believe in the power of XMPP, and consequently we believe in the Jabber instant messaging and presence technology as the best choice for Tuenti's IM service.

Jabber is a powerful IM technology, but that alone is not enough. The beauty of working at Tuenti is that every new product must be able to handle millions of concurrent users as soon as it is launched. In particular, our goal with Tuenti's instant messaging service was to handle peaks of one million concurrent users chatting. That's how we arrived at ejabberd, a high-performance, clustered and fault-tolerant Jabber server implemented in Erlang and deployed all over the world, from small internal installations to large-scale ones handling millions of users.

Erlang is a functional distributed language created by Ericsson two decades ago. From its inception, Erlang was specifically designed for developing the large-scale, highly distributed, non-stop soft-real-time services running in telephony switches. After its release under an open source license, the Erlang Open Telecom Platform (OTP) has become a general-purpose framework successfully applied in many projects worldwide. In fact, ejabberd is a great example of the main Erlang/OTP strengths: high productivity, 5 to 10 times that of traditional programming languages (ejabberd is developed by a very small team), and above-average scalability and fault-tolerance facilities for complex server projects (the prestige of ejabberd compared with commercial alternatives is notable proof of that).

Putting all the pieces together

The next step was gluing XMPP, ejabberd and Erlang/OTP together with the complex backend currently handling all Tuenti services. Tuenti's chat is a simplified instant messaging service with no persistent state, accessed through a single JavaScript client implementation. However, XMPP provides lots of extra built-in features and extensions. Therefore, the big challenge when putting all the pieces together was simplifying and optimizing the ejabberd implementation as much as possible in order to handle even more concurrent users per server.
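At the protocol level, the traffic between the JavaScript client and the server is standard XMPP. A one-to-one chat message, the core of a stripped-down service like this one, is just a small stanza (the JIDs below are illustrative, not real Tuenti addresses):

```xml
<message from="alice@tuenti.example" to="bob@tuenti.example" type="chat">
  <body>Hi Bob!</body>
</message>
```

Everything beyond this kind of minimal exchange (offline storage, multi-user chat, rich presence extensions, etc.) is a candidate for removal when the only goal is squeezing more concurrent users out of each server.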

Specifically, we focused on memory consumption optimizations, XMPP message efficiency, avoiding any additional storage requirements and/or data duplication, bidirectional integration with Tuenti's current backend services, self-managed contention strategies on server overload, integration with existing monitoring systems, anti-abuse features, etc. As a result, a fully customized ejabberd implementation, together with smart partitioning and load-balancing strategies, was deployed in our data center to support the new service.
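The post doesn't detail the anti-abuse or contention mechanisms, but a common building block for this kind of per-user throttling is a token bucket: each sender may burst up to a fixed number of messages and then gets refilled at a steady rate. A minimal sketch, in Python for illustration (the class, names and rates are hypothetical, not Tuenti's actual implementation):

```python
import time

class TokenBucket:
    """Per-user rate limiter: allow bursts of up to `capacity`
    messages, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start with a full bucket
        self.now = now            # injectable clock, eases testing
        self.last = now()

    def allow(self):
        """Return True if one message may be sent now."""
        current = self.now()
        # Refill proportionally to the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example: bursts of at most 5 messages, refilling 1 token per second.
bucket = TokenBucket(rate=1.0, capacity=5)
results = [bucket.allow() for _ in range(7)]  # 7 back-to-back sends
# The first 5 pass; the remaining 2 are throttled.
```

The same shape works server-side (dropping or queuing stanzas from abusive senders) and as a global contention valve when the whole node is overloaded.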

Lots of simulations, benchmarks and stress tests were conducted during the whole implementation process, but how do you launch a new, high-traffic service like chat to a massive audience with some quality guarantees? Our approach was a combination of dark-launch and incremental roll-out strategies: a couple of weeks before the public release of the instant messaging service, increasingly larger groups of selected users were connected to the service in the background, sending messages and reconnecting to the service every time they logged into the site.
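The post doesn't say how those user groups were selected, but a common way to implement this kind of growing dark-launch cohort is deterministic hash-based bucketing, so the same user stays enrolled as the percentage is ratcheted up. A hypothetical sketch (function name and bucketing scheme are assumptions, not the actual Tuenti mechanism):

```python
import hashlib

def in_dark_launch(user_id, percentage):
    """Deterministically decide whether `user_id` is in the cohort.

    Hashing keeps the decision stable across requests, and using a
    fixed bucket in 0..99 means widening the roll-out from, say,
    5% to 25% only adds users; nobody already enrolled is dropped.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percentage

# Widening the roll-out keeps earlier cohorts enrolled.
cohort_5 = {uid for uid in range(1000) if in_dark_launch(uid, 5)}
cohort_25 = {uid for uid in range(1000) if in_dark_launch(uid, 25)}
```

With a gate like this, the frontend can silently open a background connection and send traffic for enrolled users on every login, which is exactly the load pattern described above.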

Thanks to the dark launch, several performance bottlenecks and minor bugs were detected and fixed, both in the implementation and in the systems architecture. The fine-tuned service was then gradually rolled out to all our users in just two days. As a result, Tuenti is the largest ejabberd deployment in Spain, one of the largest in the world, and probably among the top few for its combination of frontend and backend quality and usability. Since the public launch, almost all Tuenti users logging into the site with supported browsers have also logged into the service, which has routed more than 100 million messages during its first week online.