A Faster Tuenti. Lessons Learned in Client Side Scalability

Published on 07/9/2012 by Alberto Gragera, Frontend Framework Tech Lead, and Nuria Ruiz, Principal Engineer

We have been busy at work building a new Tuenti that we wanted to be much faster. In the process of renewing ourselves we needed to shake off the older client architecture and start afresh. Beyond code changes, we found a new philosophy of doing things in which performance was one of the main criteria for how to architect the website.

Load JavaScript on Demand

The most important shift we made was loading JavaScript on demand. This means downloading JavaScript lazily rather than eagerly. To be clear, you can download eagerly and still download JavaScript and CSS asynchronously; even so, that is not a great practice to follow if you have a lot of JavaScript in the client.

Loading JavaScript eagerly led us to a situation in which, at some point, we had about 1.5MB of JS code (minified and gzipped) for the browser to parse and execute, much of which the user wouldn't use at all. What we wanted was to download just the JavaScript needed to display the current page. To do dynamic code downloading in the client we used YUI, which worked great for us once we integrated its build with our own.

Once we added the ability to download JavaScript on demand, we extended it to client side translations and client side templates. Both translations and templates are compiled to JavaScript by our build process and can thus be loaded as needed, just like the rest of the code. We use Handlebars as a client side rendering engine; it has a great module that runs on Node.js to compile HTML templates into JavaScript.
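
As an illustration, here is a minimal sketch of what on-demand loading looks like with YUI 3. The 'photos-view' module name and the Y.PhotosView object are made up for this example; they are not our actual modules.

YUI().use('node', function (Y) {
    Y.one('#open-photos').on('click', function () {
        // The 'photos-view' module (and everything it requires) is only
        // fetched by the loader when the user actually asks for it.
        Y.use('photos-view', function (Y) {
            Y.PhotosView.render('#content');
        });
    });
});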

Measure Everything

We use the HTML5 performance timing API to gather performance data:

[Figure: HTML5 performance timing API]
Now, we make sure we are not wasting precious keep-alive connections to send performance data. We found out that sending stats data through our regular HTTP connection was counterproductive: sending data so frequently kept keep-alive connections in the load balancer open for too long, which increased the load balancer's memory usage. We ended up sending our stats and performance data from the client to a different domain, without keep-alive, so as not to interfere with requests to www.tuenti.com.
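
As a rough sketch (not our exact code), this is how Navigation Timing data can be collected and beaconed to a separate stats domain; stats.example.com and the query parameters are placeholders.

window.addEventListener('load', function () {
    // Wait one tick so loadEventEnd has been filled in.
    setTimeout(function () {
        var t = window.performance && window.performance.timing;
        if (!t) { return; } // browser without the timing API
        var ttfb = t.responseStart - t.navigationStart;
        var domReady = t.domContentLoadedEventEnd - t.navigationStart;
        var pageLoad = t.loadEventEnd - t.navigationStart;
        // An image beacon to a separate domain keeps these requests off
        // the keep-alive connections used for www.tuenti.com.
        new Image().src = 'http://stats.example.com/beacon.gif?ttfb=' + ttfb +
            '&dom=' + domReady + '&load=' + pageLoad;
    }, 0);
}, false);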

We optimize everything down to the connection level. Not only do we compress images, CSS, and JS, version them, and use cache headers, but we also analyze what happens at the network level to guarantee the best experience.

For example, we make your browser fetch images from several domains so it opens more parallel connections for those resources, but not from so many domains that DNS resolution for the extra ones causes more delay than benefit.
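
A simple sketch of the idea, with hypothetical hostnames: hashing the image path picks a stable shard, so each image always comes from the same domain and stays cacheable.

var IMAGE_SHARDS = ['img1.example.net', 'img2.example.net'];

function shardedImageUrl(path) {
    // Cheap, deterministic hash of the path so the shard choice is stable
    // across page views (important for browser caching).
    var hash = 0;
    for (var i = 0; i < path.length; i++) {
        hash = (hash + path.charCodeAt(i)) % IMAGE_SHARDS.length;
    }
    return 'http://' + IMAGE_SHARDS[hash] + path;
}

// shardedImageUrl('/photos/1234/thumb.jpg') always maps to the same host.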

For the main site we focus on fast response times rather than increased parallelism, so we use long-lasting keep-alive connections, we serve more types of requests from the same origin domain to reuse open connections as much as possible, and we have tuned TCP's initial window to allow sending more data on newly established connections before waiting for the client's ACKs.

Optimize the First Page Load. Use the Right Tool for the Job.

We wanted to keep the JavaScript needed to show the first page to a minimum -- none, if possible. So we removed client side rendering for the first page load, which is normally the Home Page, and made sure all of it was rendered server side. Since we rely on YUI for dynamic loading, we at least needed to load the code for the YUI bootstrap. We ran experiments with slow connections to decide whether it was better to make a request to retrieve the YUI bootstrap or to simply inline it in the page. While fast browsers didn't care, for slow browsers it was actually faster to make a request for the bootstrap code: inlining it made the page payload bigger and slowed the first page response, whereas an external request that gets cached (changes are rare) reduces the data the browser needs to fetch.

While we kept the JavaScript to a minimum, there was still some JavaScript we needed to download for the page to render, besides the YUI bootstrap. To make things faster, rather than 1) serving the page and 2) doing client side calculations to see what extra JavaScript we needed to download and then downloading it (resulting in a waterfall in the network panel graph), we added the ability, server side, to find the JavaScript needed for a given URL. Thus, on the first page load the server computes the JavaScript needed to run that particular URL and adds it to the page, so by the time the user gets the page it is ready to roll.
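
Sketching the idea (the TUENTI_BOOT variable and the module names are invented for illustration): the server embeds the module list it computed for that URL, so the YUI seed can load everything in one go instead of discovering it client side.

// Emitted by the server into the page for this particular URL:
var TUENTI_BOOT = { modules: ['home-page', 'chat-core'] };

var Y = YUI();
Y.use.apply(Y, TUENTI_BOOT.modules.concat([function (Y) {
    // Everything this URL needs is already loaded; no extra round trips.
}]));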

Maintainable JavaScript

Writing maintainable JavaScript is very important, but it becomes critical if you work with a large number of developers. YUI provides a very good set of tools and principles to make this happen:

  • Everything is a module that can depend on other modules. This allows you to create decoupled and reusable components without worrying about dependency management.
  • YUI modules run in a sandbox, which can be a little confusing at the beginning but brings a lot of benefits in the long run. For more information, check their quick start guide.
  • Custom events provide a very simple way to isolate your components. Rather than having a component make direct calls to other components, i.e. making one know about the other and increasing the coupling between them, it is better to have two isolated components that fire events, with an upper-level entity that sets up the event listening and wires them together (see the sketch after this list).
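
For example, here is a minimal sketch of that last point using YUI custom events; the module, object, and event names are invented for illustration.

YUI.add('friend-list', function (Y) {
    Y.FriendList = {
        select: function (id) {
            // The component only announces what happened; it knows
            // nothing about who is listening.
            Y.fire('friend:selected', { id: id });
        }
    };
}, '1.0', { requires: ['event-custom'] });

YUI.add('chat-window', function (Y) {
    Y.ChatWindow = {
        open: function (id) { /* render the conversation with friend `id` */ }
    };
}, '1.0');

// An upper-level entity wires the two together, so neither component
// depends directly on the other.
YUI().use('friend-list', 'chat-window', function (Y) {
    Y.on('friend:selected', function (e) {
        Y.ChatWindow.open(e.id);
    });
});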

CSS matters

We found several performance issues related to having a large number of DOM nodes. Since the chat client needs to be able to render a lot of contacts, this is something we have to deal with.

Selectors

Certain CSS selectors can slow the page down. Browsers match selectors from right to left, so the faster a rule can be discarded, the faster the browser can process the whole tree and apply the right style to each element. For example:

Consider this HTML code:

<div>
    <p>
        <span class="foo-upper-parent">Bar</span>
        <span class="foo-parent">Bar</span>
        <span class="foo">Bar</span>
    </p>
</div>

which is translated to this DOM tree internally

div
 |__ p
       |__span
       |__span
       |__span

And these CSS selectors:

a) .foo { color: red; }

b) div p span { width: 100px; }

c) div > p > span { height: 100px; }

d) .foo-parent + .foo

e) .foo-upper-parent ~ .foo

All selectors will match when the browser resolves the computed style for the span[class="foo"] element, but what the browser needs to do in each case is very different.

For a) it just compares the class of the element with the selector. Trivial for the browser.

For b) it has to go up the DOM tree to find out whether the span has a <p> element as an ancestor, and then move up the tree again to check whether that <p> has a <div> element as an ancestor. This is horrifically slow.

For c) we reduce the ancestor lookups to direct parents only. Better than b), but still bad.

For d) we need to check whether there is a sibling at distance 1 that has the foo-parent class.

For e) we need to check whether, at the same level of the DOM tree, there is any preceding sibling that has the foo-upper-parent class.

In summary: avoid descendant selectors, avoid qualifying ID or class rules with tag names, and use CSS3 selectors with care, thinking about what the browser will need to do to compute the style of each element.

Event delegation

Event delegation is a nice practice to reduce the number of DOM event subscribers. It uses event bubbling to centralize listeners in a container node, working more or less like this:

<div id="container">
    <ul>
        <li>
            <span>Foo</span>
        </li>
        <li>
            <span>Bar</span>
        </li>
        ...
    </ul>
</div>

// Using YUI as an example; it's pretty much the same in every library out there
function myHandler(e) { /* e.currentTarget is the <span> that matched the selector */ }
Y.one('#container').delegate('click', myHandler, 'span');

So, when we click on one of these span elements, the click event bubbles up to the <li>, then the <ul>, and finally the container; since the target matches the provided CSS selector, it is properly handled there.

The math is pretty simple -- the deeper your DOM is, the slower all your event handling will be. The CSS sanity rules apply here too: don't use expensive selectors to perform the matching, as the browser needs to run a selector-matching check (querySelector, or a replacement where it is not available) for each of the nodes the event propagates through.

Wrapping Up

Making the changes outlined above (and some more) resulted in a Tuenti experience that is about five times faster than before. Not only that, but we no longer need a loading bar, since we download only the minimum amount of resources needed to display the page you want to see.

You can give the new Tuenti a try here: http://www.tuenti.com/nuevotuenti
