Tuenti Group Chat: Simple, yet complex

Published on 28/9/2012 by Diego Muñoz, Senior Engineer

We have recently released the #1 requested feature at Tuenti: group chat.
It has been a titanic effort: months of developing the server code, the client-side code, and new systems infrastructure to support this highly anticipated feature. But was the feature really so big that it had to take this long?

Scope

Since 2010 we have been improving the chat server code (Ejabberd, written in Erlang), achieving important performance gains and lowering server resource consumption.
We had approximately 3x better performance than a vanilla Ejabberd setup, which, considering that we currently handle more than 400M chat messages daily, is not bad at all.

We also had 20 chat server machines, each running 6 Ejabberd instances on average, and they were running well below capacity, so resharding the machines and setting up a load balancer was appealing.

Chat history was almost done, but we had to add support for group chat. It was one of the first projects we did with HBase instead of MySQL as the storage layer.

The message delivery system (aka message receipts) was also well advanced in its development, but not yet finished. It uses a simple flow of Sent -> Delivered -> Read states.
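The receipt flow above can be pictured as a tiny forward-only state machine. This is an illustrative sketch, not Tuenti's actual implementation:

```javascript
// Receipt states advance strictly forward: sent -> delivered -> read.
var RECEIPT_STATES = ['sent', 'delivered', 'read'];

function advanceReceipt(current, next) {
  var from = RECEIPT_STATES.indexOf(current);
  var to = RECEIPT_STATES.indexOf(next);
  if (to === -1) throw new Error('unknown receipt state: ' + next);
  // Ignore stale or duplicate receipts: a state never moves backwards.
  return to > from ? next : current;
}
```

A "delivered" receipt arriving out of order after "read" is simply ignored, which keeps the UI state monotonic even if receipts race each other.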

Multi-presence means being able to open multiple browser windows and/or multiple mobile devices without losing the chat connection on any of them (up to a maximum). To achieve this, the server-side logic needs to handle not only Jabber IDs but also resources, so that the same JID can be connected from multiple sources at the same time.
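In XMPP terms, each session is identified by a full JID of the form user@domain/resource. A minimal sketch of the split (the helper name is an assumption for illustration):

```javascript
// Split a full JID ("alice@tuenti.com/mobile1") into its bare JID and
// resource, so the same account can hold several concurrent sessions.
function parseJid(fullJid) {
  var slash = fullJid.indexOf('/');
  if (slash === -1) {
    return { bare: fullJid, resource: null };
  }
  return {
    bare: fullJid.substring(0, slash),
    resource: fullJid.substring(slash + 1)
  };
}
```

The server routes by bare JID when a message is addressed to the user and by full JID when it is addressed to one specific session.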

The “new Tuenti”: this new version of the main website required a great part of the company's technical resources. The team in charge of chat has other responsibilities as well, so we had to dedicate engineers to building parts of the new website.
And since the new version implied a completely new visual look, the chat had to change its appearance too.

And of course, the group chat.

  • Being able to chat with multiple people at once
  • Roles of room owner (the “administrator” of that chat group/room), members and banned members
  • Keeping rooms around even if you close the window (until you explicitly close them)
  • Supporting both default group room avatars (a pretty mosaic) and custom ones (choose any of your photos or upload a new one)
  • Supporting custom room titles
  • Room mute
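The role model above can be expressed as a small permission table; the names and actions here are illustrative, not Tuenti's actual API:

```javascript
// Owners administer the room, members can post, banned members can do neither.
var ROOM_PERMISSIONS = {
  owner:  { post: true,  rename: true,  ban: true  },
  member: { post: true,  rename: false, ban: false },
  banned: { post: false, rename: false, ban: false }
};

function may(role, action) {
  var perms = ROOM_PERMISSIONS[role];
  return Boolean(perms && perms[action]); // unknown roles get nothing
}
```

Keeping the rules in one table makes it easy to check them consistently on both client and server.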

The Old Chat Web Client

The web chat is a full JavaScript client, using Flash only for video chat. We use a modified open-source JavaScript XMPP library, JSJaC, tailored to our needs.
A rough schematic of the chat client's architecture:

  • An HTML receiver file that performs long-polling connections to the chat servers to simulate a typical socket.
  • A request controller that processes incoming XML chat messages (stanzas, IQs and the like) using the JSJaC library and converts them to JavaScript objects.
  • A chat UI controller that issues commands to the chat windows, buddy list, etc.
  • Buddylist, User and other classes, each of them twice: one with a UI prefix and the other with a Data prefix. We separate UI behaviours from data handling, and all components communicate with each other (think of linked widgets rather than a traditional desktop chat client application).
  • The User class performs two tasks: it represents a buddy-list contact, but it also represents a conversation room (it stores the conversation, etc.)
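The long-polling receiver in the first bullet follows a well-known pattern: open a request, wait for the server to answer with pending stanzas, hand them to the controller, and immediately re-open the request. A minimal sketch with the transport injected (illustrative only, not the actual receiver code):

```javascript
// Repeatedly poll the server; each answered request delivers a batch of
// stanzas and triggers the next poll, simulating a persistent socket.
function startLongPoll(request, onStanza) {
  var stopped = false;
  function poll() {
    if (stopped) return;
    request(function (stanzas) {
      stanzas.forEach(onStanza); // hand each stanza to the controller
      poll();                    // immediately re-open the "connection"
    });
  }
  poll();
  return function stop() { stopped = true; };
}
```

Because a pending request simply stays open until the server has something to send, the client sees near-real-time delivery without a raw socket.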

The code had been working perfectly, with almost no client-side maintenance since its launch in 2009; we just added new features and visual style changes.

What went right

New cluster: it works really well. Not only do we now have load balancing, but we can also perform upgrades on one leaf while keeping the chat up on the other leaf's nodes.
Each node now has 10 machines running up to 4 instances per machine, so we actually do more with less hardware.

Cleaner, up-to-date code: the chat client code now uses inheritance, letting us avoid repeated code by having a base chat room extended by one-to-one and group rooms. Data-related classes are also now better separated from UI-related ones, much of the code is well commented, and we have private and public fields (by convention, not enforced by any JavaScript framework).
Many events are now handled by YUI, and the dozens of JavaScript files are still bundled into one when we deploy the code live, which eases development a lot.
Overall, the client will now support future enhancements and additions much faster.
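The room hierarchy described above can be sketched with the prototype-based inheritance available to the client at the time; class and method names are illustrative:

```javascript
// Shared behaviour lives in a base room; room types only add their specifics.
function BaseRoom(id) {
  this.id = id;
  this.messages = [];
}
BaseRoom.prototype.addMessage = function (msg) {
  this.messages.push(msg);
};

// A group room extends the base room with an owner and a member list.
function GroupRoom(id, ownerJid) {
  BaseRoom.call(this, id);
  this.ownerJid = ownerJid;
  this.members = [ownerJid];
}
GroupRoom.prototype = Object.create(BaseRoom.prototype);
GroupRoom.prototype.constructor = GroupRoom;
GroupRoom.prototype.addMember = function (jid) {
  if (this.members.indexOf(jid) === -1) this.members.push(jid);
};
```

A one-to-one room would extend BaseRoom the same way, so conversation storage and rendering hooks are written once.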

Fast, very fast: the server-side code is even faster. More optimized and better adapted to our needs, it can handle up to 13 times more messages at once! Custom XMPP stanzas have been built to allow fast but correct delivery.

Everything works as expected: we didn't have to make any tradeoffs due to technical limitations. We have kept the same browser support (including IE7), and all features work as the original requirements defined.

Two UIs co-exist happily: both versions of www.tuenti.com, each with its own distinct UI, share all the inner code and are easy to extend.

What went wrong

Ran two projects in parallel: along with housekeeping tasks, the team had other high-priority projects to work on, which took resources and time away from group chat. Bad timing meant half of the client-side team was dedicated to building the “new Tuenti”, so group chat didn't get our full power until the last stages of development.

One step at a time: Tuenti has migrated almost all client-side code to Yahoo's YUI library. We had to migrate the chat client, plus do a huge code refactor to add support for group chats, plus make the visual changes for the new website, plus build new features (chat history, receipts...). This generated a lot of overhead and an initial phase of code instability in which we couldn't quickly tell whether a bug was due to the refactor, to YUI, or to a new feature that wasn't finished yet.
It would probably have been much better to migrate to the new framework first, then refactor, and only then apply the visual changes and implement or finish the new features.

Single responsibility principle: a class should have only one responsibility. By far the biggest and hardest part of the refactor was separating the original User class into ChatUser and ChatRoom. We weren't thinking about group chat back in 2009, and when planning it we were too optimistic in estimating the impact of this change.
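A minimal sketch of what such a split looks like, with one concern per class (illustrative names, not the actual Tuenti code):

```javascript
// Before: one User class both represented a contact and held a conversation.
// After: each concern gets its own class.
function ChatUser(jid, name) {  // a buddy-list contact
  this.jid = jid;
  this.name = name;
  this.presence = 'offline';
}

function ChatRoom(id) {         // a conversation
  this.id = id;
  this.participantJids = [];
  this.history = [];
}
ChatRoom.prototype.join = function (user) {
  this.participantJids.push(user.jid);
};
```

With the concerns separated, a group room is just a ChatRoom holding several participant JIDs instead of a second responsibility bolted onto a contact class.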

Lack of client-side tests: the old chat client had no tests, so QA had to test everything manually, which also generated a lot of overhead.
We are now preparing a client-side testing environment and framework to keep the new chat codebase bug-free.

CSS 3 selector performance: with multi-presence and the new social application, all users now have many more friends online or reachable via mobile device at once. Rendering hundreds of chat friends, combined with some CSS 3 selectors that are dangerous performance-wise, hit us in the late stages of development.
We hurried out some fixes, and we are still improving performance, as some browsers still suffer a bit from the number of DOM nodes plus CSS matching rules.

Chat in the making

Published on 17/3/2010 by Carlos Abalde, Backend Engineer

What is the recipe for successfully deploying a large-scale and cost-effective chat service in a couple of months without dying in the attempt? There are probably as many answers as there are people wanting to contribute their ideas. Here at Tuenti we enjoy open source and innovative approaches. This has been our dynamic duo when developing our web-based instant messaging (IM) service.

Why reinvent the wheel by designing the ultimate IM service? That's usually the shortest path to repeating old mistakes and, even worse, delaying the product launch indefinitely. Be innovative: get a high-quality IM platform, extend and/or adapt it to fit your requirements, and finally, use your experience to contribute back to the community. That is the philosophy behind open source, and that's the way we wanted to build Tuenti's chat service.

Outstanding technologies

There are a good number of outstanding open source IM solutions available out there. Specifically, those based on the open messaging standard XMPP are becoming increasingly popular. Nowadays XMPP is a mature, open, distributed and extensible middleware ready to power next-generation, large-scale, real-time web applications. We strongly believe in the power of XMPP, and consequently we believe in the Jabber instant messaging and presence technology as the best choice for Tuenti's IM service.

Jabber is a powerful IM technology, but that's not enough. The beauty of working at Tuenti is that every new product must be able to handle millions of concurrent users as soon as it is launched. In particular, our goal with Tuenti's instant messaging service was to handle peaks of one million concurrent users chatting. That's how we arrived at ejabberd, a high-performance, clustered and fault-tolerant Jabber protocol server implemented in Erlang and deployed all over the world, from small internal setups to large-scale ones handling millions of users.

Erlang is a functional, distributed language created by Ericsson two decades ago. Ever since its inception, Erlang was specifically designed for developing large-scale, highly distributed, non-stop soft-real-time services running in telephony switches. After its publication under an open license, the Open Telecom Platform (OTP) has become a general-purpose framework successfully applied in many projects worldwide. In fact, ejabberd is a great example of the main Erlang/OTP strengths: its high productivity, 5 to 10 times higher than traditional programming languages (ejabberd is developed by a very small team), and its above-average scalability and fault-tolerance facilities for complex server projects (the prestige of ejabberd among commercial products is notable proof of that).

Putting all the pieces together

The next step was gluing XMPP, ejabberd and Erlang/OTP together with the complex backend currently handling all Tuenti services. Tuenti's chat is a simplified, persistent-state-less instant messaging service accessed by a single JavaScript client implementation. However, XMPP provides lots of extra built-in features and extensions. Therefore, the big challenge when putting all the pieces together was simplifying and optimizing the ejabberd implementation as much as possible in order to handle even more concurrent users per server.

Specifically, we focused on memory consumption optimizations, XMPP message efficiency, avoiding any additional storage requirements and/or data duplication, bidirectional integration with Tuenti's existing backend services, self-managed contention strategies on server overload, integration with existing monitoring systems, anti-abuse features, etc. As a result, a fully customized ejabberd implementation, together with smart partitioning and load balancing strategies, was deployed in our data center to support the new service.

Lots of simulations, benchmarks and stress tests were conducted during the whole implementation process. But how do you launch a new and highly trafficked service like chat to a massive audience with some quality guarantees? Our approach was a combination of dark-launch and increasing roll-out strategies: for a couple of weeks before the public release of the instant messaging service, increasingly larger groups of selected users were connected to the service in the background, sending messages and reconnecting to the service every time they logged into the site.
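One common way to implement such an increasing roll-out is to gate each user on a stable hash of their id and raise the percentage over time. The post doesn't describe Tuenti's actual selection logic, so the following is only an illustration of the idea:

```javascript
// A user is dark-launched when a stable hash of their id falls below the
// current roll-out percentage; raising `percent` widens the group without
// dropping users who were already selected.
function inDarkLaunch(userId, percent) {
  var hash = 0;
  for (var i = 0; i < userId.length; i++) {
    hash = (hash * 31 + userId.charCodeAt(i)) >>> 0; // simple string hash
  }
  return (hash % 100) < percent;
}
```

Because the hash is deterministic, the same users stay selected between visits, which matches the "reconnecting every time they logged into the site" behaviour described above.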

Thanks to the dark launch, several performance bottlenecks and minor bugs were detected and fixed, both in the implementation and in the systems architecture. The fine-tuned service was then gradually rolled out to all our users in just two days. As a result, Tuenti is the largest ejabberd deployment in Spain, one of the largest in the world, and probably among the top few worldwide for that combination of frontend and backend quality and usability. After the public launch, almost all Tuenti users logging into the site with supported browsers have also logged into the service, which has routed more than 100 million messages during its first week online.
