The Epic Tale of Moving to HipHop

Posted on 3/05/2013 by Daniel Paneda (Senior Engineer) and Jaime Medrano (Principal Engineer)

This article will not explain what HipHop itself is (if you are not familiar with this software, you can read about it here). What we’d like to share are the main changes we had to make to our codebase to get things working in HipHop, and whether, performance-wise, it was worth it.

TL;DR: it was very much worth it. Performance improvements have been significant. The migration was not easy, and it took us about a year to make our full code base HipHop compliant. We had to make many changes on our side so that about a million lines of code would compile on HipHop, and that wasn’t all: we also needed to make several changes to HipHop itself. But back to our epic tale!

While the driving motivator to move to HipHop was performance (we expected a much lower usage of CPU and memory and somewhat lower latencies), there were also added benefits to having our code base compiled on HipHop. An important one is that HipHop can be used to do static analysis.

 

Changes to PHP code

If your codebase is on the bigger side, be prepared to invest a lot of effort in getting something that can run on HipHop. The more extensions you use, the greater the chance that you’ll have to invest in some major refactoring. Unit and integration tests are your friends. Our browser tests were almost useless for finding issues, mostly because they were far less deterministic than our unit and integration tests. Without unit tests, our refactoring efforts would have taken a lot longer.

The most important changes we had to make to the code were the following:

  • Get rid of all PHP > 5.3 functionality (anything introduced after PHP 5.3).
  • Remove all uses of eval and dynamic defines.
  • Rework code that depended on files being physically on disk. When possible, we tried to remove instances of those files from PHP code. When that wasn’t an option, we had to get creative. Since HipHop keeps an internal cache of files in order to resolve PHP requires, we hooked into this functionality to resolve our own files, which in some cases were not PHP files.
  • Move configuration from PHP code to JSON files.
  • Refactor code that relied on unsupported features:

    • Remove all uses of ArrayObject, since it is not implemented on HipHop (ArrayObject is not your friend, don’t use it).
    • Replace ftp_* functions with the curl extension.
    • Avoid unsupported stream wrappers: the only ones supported are http, zlib, and some php ones (memory, temp, input, output and stderr).
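
To make two of the items above more concrete (removing dynamic defines and moving configuration into JSON files), here is a minimal sketch of what such a refactoring might look like. It is not our actual code: the function names load_config() and config_get(), as well as the configuration keys, are hypothetical and chosen only for illustration. The sketch sticks to PHP 5.3 features.

```php
<?php
// Hypothetical sketch: load_config(), config_get() and the JSON keys below
// are illustrative assumptions, not names from the real codebase.

// Before: constant names are built at runtime (a "dynamic define"),
// so they cannot be resolved by analysing the source ahead of time.
//
//   foreach ($settings as $name => $value) {
//       define('APP_' . strtoupper($name), $value);
//   }

/**
 * Decode a JSON configuration string into a plain PHP array.
 * Throws if the input is not a JSON object or array.
 */
function load_config($json)
{
    $config = json_decode($json, true);
    if (!is_array($config)) {
        throw new InvalidArgumentException('Invalid JSON configuration');
    }
    return $config;
}

/**
 * Look up a configuration value by key, falling back to a default.
 */
function config_get(array $config, $key, $default = null)
{
    return array_key_exists($key, $config) ? $config[$key] : $default;
}

// After: settings live in data, not in code. In practice the string would
// come from file_get_contents() on a .json file; it is inlined here so the
// example is self-contained.
$json = '{"cache_ttl": 300, "feature_chat": true}';
$config = load_config($json);

echo config_get($config, 'cache_ttl'), "\n";
```

Reading values through a lookup function keeps every setting as plain data, so nothing has to be defined or evaluated at runtime.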

HipHop improvements

Sometimes, refactoring the PHP code to make it HipHop compliant was not an option, or at least not the best one. In some cases, it was very difficult to remove the dependency on some PHP extensions we were using. In others, we discovered a difference in behaviour between PHP and HipHop that we needed to fix. The code quality of HipHop is pretty good, so fixing things is not a big problem. The hard part is finding the exact difference in behaviour between PHP and HipHop that produces a given bug. Again, unit tests to the rescue here.

The most important changes that we had to make to HipHop code were the following:

  • Migration to libevent2 (the stock version is quite old and has some memory leaks)
  • Create a proper build script on top of CMake for Debian packages.
  • Many, many modifications to match PHP behavior.
  • Some stability fixes in HipHop code regarding non-thread-safe PHP libraries we were using. For example, we moved the locale implementation from libc to Boost (a thread-safe implementation).
  • Addition of new extensions:

    • MemcachePool, fully compatible with the PHP one (this also involved large changes to libmemcache to fully support UDP).
    • Configuration handling
    • Filter extension
    • Geoip extension
    • GMagick extension
    • Some internal extensions

Summary of total changes, excluding autogenerated files:

666 files changed, 61990 insertions(+), 35065 deletions(-)

So, in the end, there are major differences between the vanilla HipHop and the one we are using. Of the changes we made, the ones we thought everyone could use are now pull requests to HipHop.
 

Environment changes

Moving over to HipHop wasn’t all about code changes. Big changes were also needed in our development environment. For example, we run HipHop in interpreted mode in development and only fully compile in later testing phases. Changes to the development environment and the code affected a lot of teams, from devops to backend engineers to QA. Our main concern was to make the migration as transparent as possible for the development environment.

The main changes made in this area were the following:

  • Move development machines to HipHop instead of php-fpm.
  • Migrate the base Debian distro we are using on live and in development.
  • Create a compilation farm with distcc to be able to compile our source code faster.
  • Create new deployment tools to distribute and run HipHop binary balls.

Wrap Up

Moving to HipHop took a huge effort. In our case, it took two people working full time for over a year to complete the migration. Along the way, we improved not only the quality of our codebase but also our tools for building and deployment.

While it was a very long development effort, the reward was excellent, to say the least. Since we migrated our fleet to HipHop, our response times (measured server-side) have dropped a whopping 45%. We use 2-3x less CPU and 10x less memory.
