Tuenti Group Chat: Simple, yet complex

Posted on 9/28/2012 by Diego Muñoz, Senior Engineer

We have recently released the #1 requested feature at Tuenti, group chat.
It has been a titanic effort, months of developing the server code, client side code, and systems new infrastructure to support this highly anticipated feature. But was it so big as to take so much time?

Scope

Since 2010 improvements have been made to the chat server code (Ejabberd, using Erlang as the programming language), achieving important performance gains and lowering the server resource consumption.
We had approximately 3x better performance than a vanilla Ejabberd setup, which taking into account that we currently have more than 400M daily chat messages is not bad at all.

We also had 20 chat server machines, each running on average 6 instances of Ejabberd, and behaving even too well, under their capabilities, so resharding the machines and setting up a load balancer was appealing.

Chat history was almost done, but we had to add support of group chat. It is one of the first projects we do with HBase instead of MySQL as the storage layer.

The messages delivery system (aka message receipts) was also quite advanced in its development, but not yet finished. It uses a simple flow of Sent -> Delivered -> Read states.

Multi-presence means being able to open multiple browser windows and/or multiple mobile devices and not losing the chat connection at any of them (up to a maximum). In order to achieve this the server side logic needs to handle not only jabber ids but also resources, so that the same JID can be connected from multiple sources at the same time.

The “new Tuenti”: This new version of the main website required to focus great part of the technical resources of the company. The team in charge of the Chat not only has that responsibility, so we had to dedicate engineers to build parts of the new website.
As it implied a complete new visual look, the chat had to change its appearance too.

And of course, the group chat.

  • Being able to chat with multiple people at once
  • Roles of room owner (the “administrator” of that chat group/room), members and banned members
  • Storing the rooms even if you close the window (until you explicitly close them)
  • Supporting both default group room avatars (a pretty mosaic) or custom ones (choose any of your photos or upload a new one)
  • Supporting custom room titles
  • Room mute

The Old Chat Web Client

The web chat it is a full Javascript client, only using Flash for videochat. We use a modified opensource Javascript XMPP library, JSJaC, tailored to our needs.
A rough schematic architecture of the chat client is:

  • HTML receiver file, that performs long polling connections to the chat servers to simulate a typical socket.
  • One requests controller that processes incoming XML chat messages (stanzas, iqs and the like) using the JSJaC library, and converts them to javascript objects.
  • A chat UI controller, for instructing chat windows, buddy list, etc. commands
  • Buddylist, User and other classes, all of them twice, one with UI prefix, and the other with Data prefix. We separate UI behaviours from data handling, and all components communicate with each other (think of linked widgets more than a traditional desktop chat client application).
  • User class performs two tasks: It represents a buddy list contact, but it also represents a conversation room (stores the conversation, etc.)

The code has been working perfectly and with almost no client-side maintenance since it was launched in 2009, just adding new features and visual style changes.

What went right

New cluster: It works really good. Now we not only have load balancing, but also we can perform upgrades on one leaf and keep the chat up with the other leaf’s nodes.
Each node now has 10 machines running up to 4 instances per machine, so we actually do more with less hardware.

Cleaner, up to date code: Now we have inheritance in the chat client code, allowing to avoid repeating code by having a base chat room, then one-to-one and group rooms. Data-related classes are also now better separated from UI-related ones, a lot of the code now has lots of comments, we have private and public fields (by convention, not enforced by any javascript framework).
Many events are now handled by YUI and we have dozens of javascript files that are still bundled into one when we deploy the code live, so it eases a lot the development.
Overall, now the client will support future enhancements and additions quite faster.

Fast, very fast: Server side code is even faster. More optimized, more adapted to our needs, being able to handle up to 13 times more messages at once! Custom XMPP stanzas have been built to allow fast but correct delivery.

Everything works as expected: We didn’t had to do any tradeoffs due to technical limitations. We have kept the same browsers support (including IE7) and all features work as the original requisites defined.

Two UIs co-exist happily: Both versions of www.tuenti.com, each with their distinct UI, share all inner code and are easy to extend.

What went wrong

Ran two projects in parallel: Along with housekeeping tasks, the team had other high priority projects to work on, which took resources and time out of the group chat. Bad timing made half of the client side team dedicated to building the “new Tuenti” instead of bringing the full power until last stages of development.

One step at a time: Tuenti has migrated almost all client side code to use Yahoo’s YUI library. We had to migrate the chat client, plus do a huge code refactor to add support of group chats, plus visual changes of the new website, plus new features (chat history, receipts...). This generated a lot of overhead and a first phase of code instability where we didn’t know quickly if a bug was due to the refactor, due to YUI or due to a new feature not yet finished.
Probably would have been much better to first migrate to the new framework, then refactor and then apply the visual changes and implement or finish the new features.

Single responsibility principle: A class should have only one single responsibility. By far, the biggest and hardest part of the refactor was to separate the original User class into ChatUser and ChatRoom. We couldn’t think about group chat back in 2009, but we estimated too optimistically the impact of this change when planning the group chat.

Lack of client side tests: Old chat client had no tests, so QA has to manually test everything and this generated too a lot of overhead.
We are now getting ready a client side testing environment and framework to have the new chat codebase bug-free.

CSS 3 selectors performance: With the multipresence and the new social application, all users now have many more friends online or reachable via mobile device at once. Rendering hundreds of chat friends, plus some performance-wise dangerous CSS 3 selectors hit us in the late stages of development.
We hurried to do some fixes and we are still improving performance as some browsers still suffer a bit from the amount of DOM nodes plus CSS matching rules.

Follow us