The Tuenti Release and Development Process: Release Time!

Posted on 9/16/2013 by Víctor García, DevOps Engineer

Blog post series: part 4
You can read the previous post here.

Release Candidate Selection

So, we have made sure that the integration branch is good enough and that every changeset is a potential release candidate. Therefore, release branch selection is trivial: just pick the latest changeset.

The release manager is in charge of this. How does s/he do it? Using Flow and Jira.

Like the pull request system we talked about in the previous post, Jira orchestrates the release workflow, and Flow takes care of the logic and operations. So:

  1. The release manager creates a Jira “release” type ticket.
  2. To start the release, the release manager just transitions the ticket to “Start”. When this happens, Jira notifies Flow and the release start process begins.
  3. This is what Flow does in the background:

    1. It creates a new branch from the latest integration branch changeset.
    2. It analyzes the release contents and gathers the involved tickets, linking them to the release ticket (Jira supports ticket linking).
    3. It configures Jenkins to test the new branch.
    4. It adds everyone involved in the release as a watcher of the release ticket (a Jira watcher is notified of every ticket change), so that all of them are aware of anything related to it.
    5. It sends an email notification with the release contents to everyone on the company’s tech side.

The whole process takes about a minute.
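The steps Flow runs at release start could be sketched as a small handler like the one below. This is only an illustration: the class, method, and address names (`ReleaseStartHandler`, `vcs.create_branch`, `tech@example.com`, etc.) are invented, not Tuenti's actual internal API.

```python
# Hypothetical sketch of Flow's "release start" handler; every name
# here is illustrative, not Tuenti's real internal API.
from dataclasses import dataclass

@dataclass
class ReleaseStartHandler:
    vcs: object        # wraps Mercurial operations
    jira: object       # wraps the Jira REST API
    jenkins: object    # wraps the Jenkins API
    mailer: object     # sends notifications

    def on_release_started(self, release_ticket: str) -> str:
        # 1. Branch from the latest integration changeset.
        tip = self.vcs.latest_changeset("integration")
        branch = f"release-{tip[:12]}"
        self.vcs.create_branch(branch, from_changeset=tip)

        # 2. Gather the tickets contained in the release and link them.
        tickets = self.vcs.tickets_since_last_release("integration")
        for ticket in tickets:
            self.jira.link(release_ticket, ticket)

        # 3. Create a Jenkins job for the new branch.
        self.jenkins.configure_job(branch)

        # 4. Watch everyone involved so they see every ticket change.
        for person in self.jira.assignees(tickets):
            self.jira.add_watcher(release_ticket, person)

        # 5. Announce the release contents to the tech side.
        self.mailer.send("tech@example.com",
                         subject=f"Release {branch}",
                         body="\n".join(tickets))
        return branch
```

Each collaborator is injected, which is also roughly how such a handler stays testable without touching real Jira or Jenkins instances.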

Building, Compiling and Testing the Release Branch

Once the release branch is selected, it is time to start testing it because there might be a last minute bug that escaped all of the previous tests and eluded our awesome QA team’s eyes.

Flow detects new commits made to the release branch (these commits almost never occur) and builds, compiles, and updates an alpha server dedicated to the release branch.

Build

Why do we build the code? PHP is an interpreted language that doesn’t need to be built! Yes, it’s true, but we need to build other things:

  1. Minifying JavaScript and HTML code
  2. Fetching libraries
  3. Versioning static files
  4. Generating translation binaries
  5. Building YUI libraries
  6. etc.
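As a tiny illustration of one of these steps, static file versioning typically embeds a content hash in the file name, so browsers can cache aggressively and each release busts the cache automatically. This sketch shows the common approach, not necessarily our exact scheme:

```python
# Illustrative sketch of static file versioning: derive a new file
# name from a short hash of the file's contents, so the name changes
# whenever the content does.
import hashlib
from pathlib import Path

def versioned_name(path: Path) -> str:
    """Return e.g. 'app.3f2a9c1b.js' for 'app.js'."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()[:8]
    return f"{path.stem}.{digest}{path.suffix}"
```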

Compilation

We also use HipHop, so we do have to compile the PHP code.
HipHop compilation for a huge code base like ours is quite a heavy operation, so we use a farm of six servers dedicated to just this purpose; with it, the full compilation takes only about five to six minutes.

Testing

The built and compiled code is deployed to an alpha server for the release branch, and QA tests it there. The testing is fast and not extensive: it’s basically a sanity test of the main features, since Jenkins and the previous rounds of testing already assure their quality. The error log is also checked, in case anything fails silently but leaves an error trace.
This testing phase usually takes a few minutes, and bugs are rarely found.
In addition, Jenkins runs all of the automated tests, so we have even more assurance that no tests have been broken.
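The error-log check amounts to filtering for error entries written since the deployment. A minimal sketch, assuming an invented log format with a sortable timestamp prefix:

```python
# Minimal sketch of the "check the error log" step: flag ERROR-level
# lines written at or after the alpha deployment. The log format
# (ISO timestamp, then level, then message) is an assumption.
def new_errors(log_lines, deployed_at):
    """Return ERROR-level lines timestamped at or after deployment."""
    errors = []
    for line in log_lines:
        timestamp, _, rest = line.partition(" ")
        if timestamp >= deployed_at and rest.startswith("ERROR"):
            errors.append(line)
    return errors
```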

Staging Phase, the Users Test for Us

Staging is the last step before the final production deployment. It consists of a handful of dedicated servers where the release branch code is deployed and thousands of real users transparently “test” it. We just need to keep an eye on the error log, the performance stats, and the server monitors to see if any issue arises.

This step is quite important. New bugs are almost never found here, but the ones that are found are very hard to detect, so anything found here is more than welcome, especially because those bugs are usually caused by a large number of users browsing the site, something we can’t easily reproduce in an alpha or development environment.

Updating Website!

We are now sure that the release branch code is correct and bug-free, so we are ready to deploy it to hundreds of frontend servers. The same built code we used for the release branch’s alpha server will be used for production.

The Deployment: TuentiDeployer

The deployment is performed with a tool called TuentiDeployer. Every time we’ve mentioned a “deploy” within these blog posts, that deploy was done using this tool.

It is used across the entire company, for any type of deployment, for any service, to any server. It’s basically a smart wrapper over Rsync and WebDAV that parallelizes operations and supports multiple, flexible configurations, letting you deploy almost anything you want wherever you want.

The production deployment, of course, is also done with TuentiDeployer, and pushing code to hundreds of servers only takes 1 - 2 minutes (mostly depending on the network latency).

It performs different types of deployments:

  1. PHP code to some servers that do not support HipHop yet.
  2. Static files to the static servers.
  3. An alpha deployment, to keep at least one alpha server running live code.
  4. HipHop code to most of the servers. Strictly speaking, we can’t push a huge binary file to hundreds of servers, so:

    1. TuentiDeployer only deploys a text file containing the new HipHop binary version.
    2. A daemon on each frontend server detects that this file has changed.
    3. If it has changed, the servers fetch the binary from the artifact server, where it was pushed as a previous step, right after compilation.
    4. Obviously, hundreds of servers fetching a big file from a single artifact server would bring it down, so there are a bunch of artifact cache servers to fetch from and relieve the master one.
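One polling iteration of such a daemon could be sketched like this; the function shape, and fetching from a randomly chosen cache to spread the load, are assumptions for illustration rather than the daemon's real design:

```python
# Hypothetical sketch of one polling iteration of the frontend
# daemon: watch a small version file and, when the version changes,
# fetch the matching HipHop binary from a cache artifact server.
import random

def check_for_update(read_version_file, current_version, cache_servers, fetch):
    """Return the version the server ends up running after one poll."""
    wanted = read_version_file()
    if wanted == current_version:
        return current_version          # nothing to do
    # Spread the load: fetch from a random cache, not the master.
    server = random.choice(cache_servers)
    fetch(server, wanted)               # download the binary for `wanted`
    return wanted
```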

Finishing the Release

Release done! It can now be safely merged into the live branch so that every developer can work with the latest code. This is when Jira and Flow take part again: the Jira ticket triggers the whole automated process with the click of a button.

The Jira release ticket is transitioned to the “Deployed” status and a process in Flow starts. This process:

  1. Merges the release branch to the live branch.
  2. Closes the release branch in Mercurial.
  3. Disables the Jenkins job that tested the release branch to avoid wasting resources.
  4. Notifies everyone by email that the release is already in production.
  5. Updates the Flow dashboards to reflect that the release is over.
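The Mercurial part of this process (steps 1 and 2) boils down to a handful of commands. A sketch, with an injectable command runner and an invented branch name; the surrounding notification and Jenkins steps are omitted:

```python
# Sketch of the Mercurial side of finishing a release: merge the
# release branch into live, then close it. The command runner is
# injectable so the logic can be exercised without a real repo.
import subprocess

def finish_release(branch, run=None):
    """Merge a release branch into live and close it in Mercurial."""
    cmds = [
        ["hg", "update", "live"],
        ["hg", "merge", branch],                      # 1. merge to live
        ["hg", "commit", "-m", f"Merge {branch} into live"],
        ["hg", "update", branch],
        ["hg", "commit", "--close-branch",            # 2. close the branch
         "-m", f"Close {branch}"],
    ]
    run = run or (lambda cmd: subprocess.run(cmd, check=True))
    for cmd in cmds:
        run(cmd)
    return cmds
```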

Oh No!! We Have to Revert the Release!!

After deploying the release to production, we might detect that something important is really broken, that the feature cannot be disabled by config, and that the fix looks complex and won’t be ready in the short term. In that case, we have no choice but to revert the release.

No problem!! We store compressed builds of the last few releases, so we just need to decompress one and do the production deployment.
The revert process takes only about 4 minutes, and it almost never takes place.
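Assuming the stored builds are kept as tarballs in an archive directory (the paths, names, and tarball format are invented for illustration), the revert could look like:

```python
# Minimal sketch of a revert: unpack the stored build of an earlier
# release and hand it to the deployer. Archive layout is an assumption.
import tarfile
from pathlib import Path

def revert_to(release, archive_dir, deploy_dir, deploy):
    """Decompress the stored build of `release` and deploy it."""
    with tarfile.open(Path(archive_dir) / f"{release}.tar.gz") as tar:
        tar.extractall(deploy_dir)
    deploy(deploy_dir)   # the usual production push, e.g. TuentiDeployer
```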

You can keep reading the next post here.

The Tuenti Release and Development Process: Development Branch Integration

Posted on 8/26/2013 by Víctor García, DevOps Engineer

Blog post series: part 3
You can read the previous post here.

In the previous blog post, we mentioned that one of the requirements a development branch must fulfill is a pull request. This is the only way to merge code into the integration branch at Tuenti.
We really want an evergreen integration branch that is always in a good state, with no tests failing. To accomplish that, we created a pull request system managed by a tool called Flow.

The Flow Pull Request System

Flow is a tool that fully orchestrates the whole integration, release, and deployment process and offers dashboards that show everyone the release status, the integration branch, pull request results, etc. A full blog post will explain it; here we’re focusing on branch integration.

Although Flow has the logic and performs the operations, every action is triggered through Jira.


These are the steps performed:

  • The developer creates an “integration” ticket in Jira. This ticket is considered a unique entity that represents a developer’s intention to integrate code, so the ticket must contain some information regarding the code that will be merged:

    • Branch contents
    • Author
    • QA member representative
    • Risks
    • Branch name
    • Mercurial revision to merge
  • Then, the ticket must be transitioned to the “Accepted” status by a QA member, so we can keep track and be assured that at least one of them has reviewed this branch.
  • To integrate the branch, the ticket must be transitioned to the “Pull request” status. This action will launch a pull request in Flow.

    • Flow and Jira are integrated and they “talk” in both directions. In this case, Jira sends a notification to Flow informing of a new pull request.
    • Additionally, in the background, Flow will gather the tickets involved in the branch and will link all of these tickets with the integration ticket using the Jira ticket relationships.
    • This provides a good overview of what is included in the branch to be merged.
  • Flow has a pull request queue, so they are processed sequentially.
  • Flow starts processing the pull request:

    • It creates a temporary branch with the merge of the current integration branch and the pull request branch that will be merged.
    • It configures Jenkins and triggers a build for that temporary branch to run all tests.
    • It waits until Jenkins has finished and when it’s done, Jenkins notifies Flow.

      • Jenkins executes all tests in 40 minutes, using a special and more parallelized configuration.
    • Then, it checks the Jenkins results and decides whether or not the branch can actually be merged to the integration branch.

      • If successful, it performs the merge, transitions the Jira ticket to “Integrated,” and starts with the next pull request.
      • Otherwise, the pull request is marked as failed and transitions the Jira ticket to “Rejected”.
    • After every operation, Flow sends an email to the branch owner notifying him or her of the status of the pull request, and adds a Jira comment to the ticket.
  • If the pull request fails, Flow shows the reason for this in its dashboard and in the ticket (failed tests, merge conflicts), so the developer must:

    • Fix the problems
    • Change the Mercurial revision in the Jira ticket, since the fix will require a new commit.
    • Transition the ticket to the pull request status again to launch a new pull request.
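The sequential queue processing described above can be sketched as follows; all class and method names are illustrative, not Flow's real API:

```python
# Sketch of Flow-style sequential pull-request processing: each
# request is merged into a temporary branch, tested, and only merged
# for real if the build is green. Collaborators are injected fakes
# of the VCS, Jenkins, and Jira wrappers.
from collections import deque

def process_pull_requests(queue, vcs, jenkins, jira):
    results = {}
    pending = deque(queue)
    while pending:
        pr = pending.popleft()
        # Temporary merge of integration + the branch to be merged.
        tmp = vcs.merge_to_temp("integration", pr.branch, pr.revision)
        if tmp is None:                          # merge conflict
            jira.transition(pr.ticket, "Rejected")
            results[pr.ticket] = "Rejected"
            continue
        if jenkins.build_passes(tmp):            # all tests green
            vcs.merge("integration", pr.branch, pr.revision)
            jira.transition(pr.ticket, "Integrated")
            results[pr.ticket] = "Integrated"
        else:                                    # failed tests
            jira.transition(pr.ticket, "Rejected")
            results[pr.ticket] = "Rejected"
    return results
```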


As you can see, this process requires minimal manual intervention; clicking a few Jira buttons is enough. It’s quite stable and, more importantly, it lets developers work on other projects while Flow is working, so they don’t have to worry about merges anymore. They just launch the pull request from their Jira ticket and can forget about everything else.

So, this is how we ensure that the integration branch is always safe and that tests don’t fail. Therefore, every integration branch changeset is a potential release candidate.

This model has demonstrably improved the integration workflow and increased developer throughput, so it can be considered a total success.

You can keep reading the next post here.

JavaScript Continuous Integration

Posted on 8/22/2013 by Miguel Ángel García, Test FW Engineer; Juan Ramírez, Software Engineer & Alberto Gragera, Software Engineer

Here at Tuenti, we believe that Continuous Integration is the way to go in order to release in a fast, reliable fashion, and we apply it to every single level of our stack. Talking about CI is the same as talking about testing, which is critical for the health of any serious project. The bigger the project is, the more important automatic testing becomes. There are three requirements for testing something automatically:

  • You should be able to run your tests in a single step.
  • Results must be collected in a way that is machine-readable (JSON, XML, YAML, it doesn’t matter which).
  • And of course, you need to have tests. Tests are written by programmers, and it is very important to give them an easy way to write them and a functioning environment in which to execute them.

This is how we tackled the problem.

Our Environment

We use YUI as the main JS library in our frontend, so our JS CI needs to be YUI-compliant. There are very few tools that enable JS CI; we went with JSTestDriver at the beginning but quickly found two main problems:

  • There were a lot of problems between the YUI event system and the way that JSTestDriver retrieves the results.
  • The way tests are executed is not very human-friendly. We use a browser to work (and several to test our code in), so it would be nice to use the same interface to run JS tests. This ruled out tools like PhantomJS and other headless runners.

Therefore, we had to build our own framework to execute tests, keeping all of the requirements in mind. Our efforts were focused on keeping it as simple as possible, and we decided to use the YUI testing libraries (with Sinon as the mocking framework) because they let us keep each test properly isolated while providing reporting in both machine-readable (XML or JSON) and human-readable formats at the same time.

The Approach

This consisted of solving the simplest use case first (executing one test in a browser) and then iterating from there. A very simple PHP framework was put together to go through the JS module directories, identify the modules with tests (thanks to a basic naming convention), and execute them. The results were shown in a human-readable way as well as in XML.
PHP was required because we wanted to reuse a lot of the YUI-related tools we had already implemented in PHP (server-side dependency map generation, etc.).
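The discovery step amounts to a directory walk keyed on the naming convention. Here is a sketch in Python rather than the PHP we actually used, and the `*-tests.js` suffix is an invented example of such a convention:

```python
# Sketch of naming-convention test discovery: walk the JS module
# tree and map each module directory to its test files. The
# "*-tests.js" suffix is an invented example convention.
from pathlib import Path

def modules_with_tests(root):
    """Map each JS module directory name to its test file names."""
    found = {}
    for test_file in sorted(Path(root).rglob("*-tests.js")):
        found.setdefault(test_file.parent.name, []).append(test_file.name)
    return found
```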


After that, we wrote a client using Selenium/WebDriver so that the machine-readable results could be easily gathered, combined, and shown in a report.

At this point in the project:

  • Developers could use the browser to run the tests just by going to a predictable URL,
  • CI could use a command-line program to execute them automatically. The result format is natively supported by Jenkins, so we only needed to configure the jobs properly, telling our test framework to put the results in the correct folder and store them in the Jenkins job.

That was enough to allow us to execute them hundreds of times a day.

Coverage

Another important aspect of testing is knowing the coverage of your codebase. Due to the nature of our codebase and how we divide everything into modules that work together, each module may have dependencies on others, despite the fact that our tests are meant to be designed per module.

Most of the dependencies are mocked with SinonJS, but some tests don’t mock them. Since a test in one module can exercise another, we had to decide whether to compute coverage “as integration tests” or with a per-module approach.

The first alternative would instrument all of the code (with YUI Test) before running the tests, so a module might show coverage from a test that is meant to exercise a different module. An approach like that may encourage developers to create acceptance tests instead of unit tests.

Since we wanted to encourage developers to create unit tests for every module, the decision was finally taken to compute the coverage with a per-module approach. This way, we instrument only the code of the module being tested and ignore the others.

To make this happen, we moved the run operation from the PHP framework to a JS runner, because that allowed for some of the operations required to get coverage, such as instrumenting and de-instrumenting the code before and after running the tests.

Once the tests are executed, the final coverage report is generated directly, so there was no need to integrate with our current Jenkins infrastructure and set up all the jobs to generate the report. Coverage generation takes about 50% more time than regular test execution, but it’s absolutely worth it.

To be Done

There is more work that can be done at this point, such as improving performance by executing tests in parallel or enforcing a minimum coverage for each module, but we haven’t done this yet because we think that coverage is only a measure of untested code and doesn’t ensure the quality of the tests.

It is much more important to have a testing culture than just get the metrics of the code.

The Tuenti Release and Development Process: The Development Environment and Workflow

Posted on 7/22/2013 by Víctor García, DevOps Engineer

Blog post series: part 2
You can read the previous post here.

The Development Environment

We maintain two development environments, a shared one and a private self-hosted one, and we encourage developers to move to the private one in order to ease and speed up the development workflow, getting rid of the constraints and slowness caused by shared resources.

Both infrastructures provide full “dev” and “test” sub-environments:

  • The dev environment has its own fake user data to play with while coding.
  • The test environment is basically a full environment with an empty database, to be filled with test fixtures, able to run any kind of test:

    1. Unit and integration tests run on the same machine as the code, using PHPUnit.
    2. Browser tests run in a Windows VirtualBox machine we provide, with Webdriver configured to run tests in Firefox, Chrome, and Internet Explorer.

These two sub-environments are under two different Nginx virtual hosts, so a developer can use both at the same time.

The Shared Environment
The shared environment consists of virtual machines in a VMware ESXi environment, with each development team assigned to a single virtual machine. Their creation and management is pretty straightforward, as they are fully provisioned with Puppet. They provide full dev and test environments with their own Nginx server, Memcache and Beanstalk instances, HipHop PHP interpreter, MySQL database, etc. All these resources are shared among the users of each virtual machine. The development database, HBase, Sphinx, and Jabber servers are hosted on a separate machine, so those particular resources are shared by all users.

That’s why we’re currently moving everyone to the private environment. We’ve had problems with shared resources, such as when someone executes a heavy script that raises CPU usage to 100%, makes the machine swap, or slows it down with very high IO usage, sometimes even affecting users sharing the same physical machine.

The Private Environment: Tuenti-in-a-Box
Tuenti-in-a-box is the official name of this private environment. It’s a VirtualBox machine managed with Vagrant that runs on the developer’s own laptop. The combination of Vagrant and Puppet is fantastic for setting this up, since it lets us provision virtual machines with Puppet very easily; every developer can launch as many private virtual machines as they want with a simple command. Thus, Tuenti-in-a-box is an autonomous machine that provides the full dev and test environments mentioned above, ridding us of the problems with shared resources. Every resource lives within the virtual machine, with one exception: the development database, which is still shared among all users.

Right now, a project is underway that will let every Tuenti-in-a-box user have their own database with a small, obfuscated subset of production data.

The Code Workflow Until Integration

Developers work in their own branches, and each branch can be shared with other developers. These branches must be kept up to date, so developers frequently update them from the integration branch, which is always considered safe, with no broken tests (we will see how this is achieved in the next post).

Each team organizes its development however it wants, but to integrate the code, some steps must be followed:

  • A code review must be carried out by your teammates, and the branch must pass the review:

    1. In order to keep track of what a branch contains, and for the sake of organization, every piece of code must be tied to a Jira ticket.
    2. Doing this lets us use Fisheye and Crucible to create code reviews.
  • The code must be good enough to be deployed to an alpha server.
  • The code can’t break any tests; Jenkins is in charge of executing all of them.
  • A QA team member tests the branch and gives it the thumbs up.
  • After all of that, a pull request is made and the branch starts the integration process (more details in the next post).
Jenkins Takes Part!

At this stage, testing is one of the most important things. The development branches need to be tested, and giving developers feedback while they are coding keeps them aware of the state of their branches.

Those branches can be tested in three different ways:

  • Development Jenkins builds, which run only a subset of tests: faster and more frequent (~14,500 tests in ~10 minutes).
  • Nightly Jenkins builds, which run all tests: slower, but run nightly, so developers have feedback from the full suite the next day (~26,500 tests in ~60 minutes).
  • Preintegration Jenkins builds, which also run all tests: slower, but this is the final testing before merging to the integration branch (~26,500 tests in ~60 minutes).

The creation of Jenkins builds is automated by scripts that anyone can execute from the command line. No manual intervention in the interface is necessary.

Our Jenkins infrastructure has a master server with 22 slave servers. Each slave parallelizes test execution across 6 environments, and to go even faster, a “pipeline mode” takes 6 slaves and makes them run together to execute all the tests. You might be interested in what the Jenkins architecture is like and how it executes and parallelizes such a large number of tests in such a short time, but we’ll leave that for another blog post.
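The basic idea behind that kind of parallelization is sharding the suite across workers. A simple round-robin split illustrates it; this chunking strategy is an assumption for illustration, not necessarily Jenkins’ or our actual scheduler:

```python
# Illustrative sketch of sharding a test suite across parallel
# environments, the way a pipeline-mode build might split ~26,500
# tests over 6 slaves x 6 environments each.
def shard(tests, workers):
    """Round-robin tests across workers so shard sizes stay balanced."""
    shards = [[] for _ in range(workers)]
    for i, test in enumerate(tests):
        shards[i % workers].append(test)
    return shards
```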

You can keep reading the next post here.
