Tuenti hosts the 9th PHPMad Meetup about Emergent design with phpspec

Posted on 10/23/2014 by Eng. Outreach Committee

As we do every month, we hosted the 9th PHPMad meetup in the kitchen of our central Madrid office. Francisco Santamaría talked about “Emergent design with phpspec”. He gave a brief introduction to phpspec, a SpecBDD tool, and showed how it can help us design our applications, being much more than a simple testing tool.

In addition to PHPMad, we are open to hosting other user groups' meetings and talks, so if you’re interested in organising a tech or design-related event, get in touch!

Spawnable Framework - Scripting for everyone

Posted on 9/02/2010 by Nick Flink Lead Dev Tools Engineer

At Tuenti we are constantly running scripts. We run them to change the configuration of the site, to update the code on the site and even to check out our current working branch. Having a good scripting framework is obviously important when it comes to automating manual tasks. The way we have chosen to look at scripts is a bit different from other approaches, because our scripts are quite important to us. We set up our servers so that we can deploy scripts to nearly every server and run them in a consistent manner across all of them. We have managed this by putting our scripts at the top-most layer (the same layer as the traditional index file), as a slightly different entry point. So if index.php is the CGI entry point to our site, Spawn.php is the CLI entry point to the same site, though generally used for different purposes.

Using scripts

Scripts are often used to migrate users from older modules of our site to new ones, or to prime memcache so as not to strain the db with every little change. Other times we want to write scripts that work well from the command line, with arguments, man pages, etc. Sometimes we want to run a script on another server, perhaps even as another user. And many times we just want to run some script every day to analyze a specific set of data.

Requirements

Our scripts need to be maintainable (with code using the regular domain layer where possible), easy for developers to write and use, and easily executed in parallel. To increase maintainability we've decided that all scripts will have one entry point: php Spawn.php. To make the lives of our developers easier, the system has to be equipped with bash completion and support parallel execution by default. Spawn.php can only run jobs that implement the Spawnable interface. To run a job, Spawn.php simply calls the static function run(), as shown below in the Spawnable interface.


interface Spawnable {
    // Called by Spawn.php to bootstrap and start the job.
    public static function run();
    // Contains the actual work of the job.
    public function startJob();
}
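
For illustration, the dispatch inside Spawn.php can be imagined roughly like this; this is a minimal sketch, not the actual file, and the error handling shown is ours:

$jobClass = $argv[1];
if (!class_exists($jobClass) || !in_array('Spawnable', class_implements($jobClass))) {
    fwrite(STDERR, $jobClass . " does not implement Spawnable\n");
    exit(1);
}
$jobClass::run(); // the job bootstraps itself and eventually calls startJob()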

Script types

CommandLineArgJob

The command line arg job allows easy use from the bash command line. It has bash completion and man pages, and it uses reflection to construct your CommandLineArgJob, matching command line arguments to the parameter names used in the constructor. For example, the following class would be constructed with $number=5, and then startJob() would run and echo the number that was sent.


class MyJob extends CommandLineArgJob {
    private $number;

    // Reflection maps the command line argument "number=5" to $number.
    public function __construct($number) {
        $this->number = $number;
    }

    public function startJob() {
        echo 'number = ' . $this->number;
    }
}

me@aserver:~# php Spawn.php MyJob number=5
number = 5

Bash completion tells you all of the optional arguments, and if you miss one that isn't optional, e.g. $number, the framework will tell you that it was required.

ForkedJob

We use forked jobs mainly for migrations. The jobs are controlled by a parent process, which fetches the arguments for each of them from the input the script has received. As long as there is data for the next job and the maximum number of active jobs hasn't been exceeded, the parent will fork new processes with new jobs assigned to them. Developers can trade resource usage against speed of execution by changing the allowed number of active forked processes.
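
As a rough illustration of that loop, here is a sketch using PHP's pcntl functions; ForkedJob is the framework base class described above, while getNextJobArguments() and processJob() are hypothetical names stubbed out for the example:

class MyMigrationJob extends ForkedJob {
    const MAX_ACTIVE_JOBS = 4; // trade resource usage against speed here

    public function startJob() {
        $active = 0;
        while (($args = $this->getNextJobArguments()) !== null) {
            if ($active >= self::MAX_ACTIVE_JOBS) {
                pcntl_wait($status); // block until one child finishes
                $active--;
            }
            if (pcntl_fork() === 0) {
                $this->processJob($args); // the child runs one job...
                exit(0);                  // ...and exits
            }
            $active++;
        }
        while ($active-- > 0) {
            pcntl_wait($status); // reap the remaining children
        }
    }

    private function getNextJobArguments() {
        return null; // would read the next row of input; stub for the sketch
    }

    private function processJob($args) {
        // would migrate one batch of users; stub for the sketch
    }
}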

QueuedJob

The final job type is the most complicated and flexible one. The system requires a queue client daemon (QCD from now on) to be running on a target machine specifically configured to run particular types of jobs. A page execution can then create a job and enqueue it into a named queue located on one of our queue servers. If there is a QCD waiting on the queue server socket when the job arrives at the queue server, it will be routed to the server on which the QCD is running. This allows us to specifically configure servers to run different jobs. Jobs that are transferred to the QCD through a queue server must extend the QueuedJob, which has an additional method named enqueue() beyond the normal run() and startJob() methods. QueuedJobs are transferred as serialized PHP classes, so when the QCD deserializes them, all of the member variables remain intact. Finally the QCD calls the startJob() method and the job is executed. This has an interesting effect: the constructor is actually called on the frontend server, while startJob() is called on the processing server.
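
Based on the serialized payload shown below, such a job might look roughly like this (a hedged reconstruction for illustration, not the actual class):

class ExampleJob extends QueuedJob {
    private $id;

    public function __construct($id) {
        $this->id = $id; // runs on the frontend server
    }

    public function startJob() {
        // runs later on the processing server, with $this->id intact
    }
}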

An example QueuedJob invocation could appear like this:

php Spawn.php ExampleJob a:1:{i:0;a:2:{i:0;s:10:"ExampleJob";i:1;O:10:"ExampleJob":2:{s:15:"^@ExampleJob^@id";s:3:"156";}}}

An advantage of the serialized packaging is the ability to send specific commands by email. For example, if I were to find a bug in the above ExampleJob, I could email the packaged command line to another developer, who would simply copy and paste the line into a command prompt to reproduce the problem and start debugging.

Architectural description of the new Frontend Framework

Posted on 6/04/2010 by Andrzej Tuchołka, Lead Code Architect, and Prem Gurbani, Frontend Architect

Abstract

Web applications are becoming more feature rich and ever more ubiquitous, while trying to address the needs of millions of users. Server-side scalability and performance are a serious matter, as farms are unceasingly expanding for high growth sites. User interface designers are looking for novel approaches to stimulate user engagement, which results in more advanced and feature-rich clients. Remaining competitive requires companies to constantly push and release new features along with overall UI redesigns. The evolution of the web paradigm requires architectural changes, using models that increase flexibility and address the scaling problems in terms of performance, development process and demanding product requirements. This paper presents a web architectural design that decouples several layers of a web application, while delegating all presentation related routines to the client. To address organizational concerns in the design, the client-side layers have been highly decoupled, resulting in well-defined, natural responsibilities for each of them: structure - HTML, layout - CSS, behavior - JavaScript. The server exclusively produces data that is sent to the client and then mapped to the HTML templates, taking advantage of the interpretation of the data structure (presence of items, detection of sets of data, manual operations) to construct the resulting view for the user. The data produced by the server is client-independent, enabling reuse of the server API by several clients. Overall, the strong responsibilities of the identified layers allow parallelizing the development process and reduce operational friction between teams. The server-side part of the frontend framework is designed around the novel Printer-Controller-Abstraction (PCA), a variation of the Presentation-Abstraction-Controller (PAC) architectural pattern. The design keeps the high flexibility of the graph of controllers, introduces additional concepts such as response caching and reuse, and allows easy changes of the input and output formats.

State of the Art

The current system is running rich-client JavaScript and HTML/CSS generated by the server. Responses are generated by an in-house designed template engine working within an MVC-like (Model-View-Controller) framework. From an organizational perspective, frontend engineers build the controllers and views in PHP, and afterwards the framework populates the templates (a mixture of PHP, HTML and Javascript) to produce the output. Since each part of the response can be generated with PHP, developers take many shortcuts, which results in tight coupling between every possible piece of the code. From a design perspective, the existing front-controller is an ad-hoc control flow that routes calls to the MVC framework. The standard request protocol consists of URL-visible GET requests complemented with data sent via POST. The existing template engine is tightly coupled with the View, and delivers final HTML to every user.

The existing system has several problems that need to be addressed on an architectural level:

  • minimize inter-team dependencies that force the work organization to be sequential,
  • avoid duplicating work while introducing additional client applications,
  • maximize possible optimization and caching solutions that can be implemented on several levels of the system,
  • reduce TCO (Total Cost of Ownership) by reducing bandwidth and CPU load of the in-house infrastructure,
  • maximize flexibility in terms of changing the UI (User Interface) while reducing the time required to release UI changes,
  • implement an easily adoptable communication protocol to increase opportunities for external usage of the system,
  • maximize the reuse of common data used by the system (list of friends, partitioning schema) and minimize the cost of bootstrapping the system.

An analysis performed on existing web applications such as Facebook, Gmail, Flickr, MySpace or Twitter shows that none of these sites produce AJAX (Asynchronous JavaScript and XML) responses which decouple data from presentation. Usually, these responses are a pre-built stream mixing Javascript, CSS, HTML and data, which is then inserted into specific containers in the DOM (Document Object Model) or just evaluated in the Javascript engine. The suggested solution defines a communication protocol built on JSON-RPC (JavaScript Object Notation - Remote Procedure Call), with well-defined scopes of responsibility for the technological components of the client and a highly customizable structure of server-side controllers.

Several frameworks exist that introduce similar solutions in order to decouple HTML from data and perform rendering in the client browser. Ext JS [1] allows placing templates in separate files that can be fetched and cached by the browser. Later, AJAX is used to fetch data from the server to populate the templates. This leads to bandwidth savings, as there is no redundant HTML served with every pageview, and the cost of producing rendered HTML is moved to the client browser. However, the drawback of the Ext JS approach is that it introduces a new language in the templates and therefore increases the cost of any UI and design related operations. Consequently there is an increased need for interaction between designers and client-side developers. Also, the HTML templates produced by designers must be converted into Ext Templates, increasing the possible points of failure and the complexity of the development process, as well as making maintenance cumbersome.

Many other JavaScript libraries work with essentially the same concept. These include Mjt [2], UIZE [3], TrimPath [4] and EmbeddedJS [5]. An implementation worth mentioning here is PureJS [6], which tackles this issue by creating HTML templates that do not require the usage of conditional or looping statements. The templates remain in HTML and the data is matched with injection points identified by a specific attribute in the structure. Conditional statements are reduced to simply hiding or showing an element. Loops are inferred automatically by detecting whether the data is an array. However, PureJS does not effectively manage the decoupling of data from structure: more complex (real) usages of the PureJS framework require constructing statements named "directives", which define the process of inserting the data into HTML along with other elements, such as additional data structures or user interaction.

The existing system produces server-side Views as part of the MVC framework. In a multi-client environment this leads to an added load on the server infrastructure, along with increased complexity of implementing any changes to the presentation. Currently the system uses seven different interfaces, each with a separate set of views, controllers and dynamic templates. The new solution will reduce that to much simpler, static templates. An additional limitation of the current MVC system is that it is capable of producing only one view per URL with a request; in contrast, a request with a chained JSON-RPC call can perform several operations and return data that can be used for display, caching, or configuration of the client. The new approach also opens several optimization opportunities in terms of reusing the bootstrapped instance and in-memory cache of the system, and in the number of client connections.

Overall front-end architecture and strategy

The front-end framework project is part of the overall architectural redesign of Tuenti.com. The front-end layer is responsible for rendering views, communication with the server and UI interaction. The second part of the project covers the back-end redesign, which is outside the scope of this document. The main concerns identified by the stakeholders, and addressed by the front-end design, include: flexibility, cost and schedule, and integrability.

The frontend framework's principles are to produce a highly decoupled system by introducing a natural separation of concerns in the source code: structure (HTML), layout styles (CSS), behavior and control (Javascript), and data (JSON-RPC). All except the data will be cached on the client and in content caching solutions, reducing the load on the in-house infrastructure. Furthermore, due to the nature of some requests that return mostly unchanged data, it is possible to cache and reuse them. This opportunity is supported by dramatically (45% to 91% according to our tests) reducing the size of the response produced by the server.

The decoupling mentioned above also supports the organizational aspects of development projects based on the framework. Work can easily be parallelized between the teams participating in a project. The only required interaction between them takes place in the analysis stage, when the interface and the model are defined to satisfy product requirements. Projects that involve a redesign of the user interface can remove (or minimize) the need for developer involvement, since no transformations are applied to the templates, which are now pure HTML.

JSON was picked as the data transport format because it is technology independent (easily parsed) but also native to the presentation layer of the main client, which is written in Javascript. It can easily (in accordance with the PCA design of the server side) be replaced by XML or other formats on demand. The communication protocol itself has been designed to address the need for semantic identification of the data, human readability, and server-side optimizations. The performance concern is addressed by the possibility of chaining multiple calls in one request; this technique not only reduces the need to bootstrap the system but also can drastically reduce the number of connections between the client and the server (which is especially significant for mobile applications).

(Figure: a more detailed view of the execution flow.)

Client-side framework

The approach followed in the framework makes use of a concept similar to the one followed by PureJS for templates. However, all behaviour and logic is provided by client-side JavaScript only, and the mapping between the data and the template structure is performed automatically, based on the semantic information contained in both. Since the mapping information is contained within the standard id and class attributes of the HTML tags, it can be naturally reused by Javascript and CSS without introducing any new meta-language.

Effectively, the framework resides in the user's browser and is responsible for interacting with the servers to fetch static and dynamic data, for template rendering, and for user interaction. Specifically, the server interaction involves sending requests to download all statics from the content caches, pooling and parallelizing requests for optimal bandwidth usage and total request handling time. Upon receiving the response from the servers, the framework executes the hooks for client data transformations (e.g. date transformation) and renders the page.

The Main Controller is the core of the client-side framework, as it manages the communication with the servers and handles several hooks allowing code execution within the process of handling a user action. The data is retrieved by sending JSON-RPC requests to the server, whose responses answer the called procedures but can also contain additional information. This will usually be data corresponding to actions that took place within the functional scope of interest of the user (e.g. a new message has arrived), but it can also be data used to configure the client, such as re-routing the client to a different farm, throttling automatic chat status updates, etc.

Example JSON-RPC call:

{
    "Friends": ["getOnline", {"maxCount": "10", "order": "recentVisit"}],
    "Messages": ["getThread", {"id": "12670096710000000066", "msgCount": "5"}],
}

The server provides a response in JSON with a flat structure: it does not impose or suggest any hierarchy for the final display, and is purely data-centric.

{
    "output" : [
        {"friends" : [
            {
                "userId" : 23432,
                "avatar" : {
                    "offsetX": 31,
                    "offsetY": 0,
                    "url": "v1m222apwer124SaaE.jpg"
                },
                "friendAlias" : "Nick"
            },
            {
                "userId" : 63832,
                "avatar" : {
                    "offsetX": 32,
                    "offsetY": 50,
                    "url": "MLuv22apwer114SaaE.jpg"
                },
                "friendAlias" : "John"
            }
        ]},
        {
            "threadId": "12670096710000000066",
            "canReply": true,
            "totalMessageNumber": 3,
            "messages": [
                {
                    "isUnread": true,
                    "senderId": 32,
                    "senderFullName": "Daniel Martín",
                    "body": "But I must explain to you...",
                    "validBody": true
                },
                {
                    "isUnread": false,
                    "senderId": 66,
                    "senderFullName": "Carlos De Miguel Izquierdo",
                    "body": "Sed ut perspiciatis unde omnis...",
                    "validBody": true
                }
            ]
        }
    ]
}

The page structure is defined in pure HTML. This means that the templates themselves do not require any new meta-language; instead they rely on the framework to show or repeat pieces depending on its interpretation of the response data. It is possible, though, to extend the execution of a user action with arbitrary routines that add additional logic. Still, that doesn't influence the way the templates are created, and all of the routines for handling templates are implemented in Javascript.

Here is an example of a piece of HTML code which serves as a template to display user avatars (the attribute values shown are illustrative, chosen to match the keys of the avatar data in the response above):

<!-- illustrative: class names a data key to inject, params lists data exposed to UI scripts -->
<span class="avatar" params="offsetX offsetY">
 <img src="" alt="" class="url" />
</span>

With this code sample the Template Engine is capable of implicitly performing a very flexible conditional statement: it can show the element based on the presence of the avatar in the data, repeat it if the avatar is an array, and inject data into the DOM element based on the information contained in the params attribute. If no match with the data can be found, the DOM element is left unprocessed.

The data-centric approach of the framework means that the Template Engine identifies elements by iterating through the data and matching items in the DOM structure, which is dynamically scoped. When iterating into nested structures, the Template Engine searches only within the corresponding context in the DOM. If certain DOM elements are not identified by the Template Engine, they are left unprocessed and their default appearance applies. In order to match elements to data, the DOM elements only need to refer to the data through values set in two distinct element attributes: the class attribute for data to be injected into the page structure, and the params attribute for data to be made available to UI interaction scripts.

Specific actions for user interaction can be added to the DOM elements by specifying the action. All actions are implemented in an external static JavaScript file and, as a good practice and internal coding convention, no other JavaScript code is allowed inside the HTML templates. This is a natural decoupling of behavior from structure, similar to decoupling page structure from style by not setting inline CSS using the style attribute.

Server-side framework

The main architectural element of the server-side part of the front-end framework is inspired by an architectural pattern known as Presentation-Abstraction-Controller (PAC) [7][8]. The design (named Printer-Controller-Abstraction) is based on identifying data-centric controllers and allows free interactions between them. The graph structure of controllers receives data from the abstraction layer, where abstractions are instantiated by controllers. The abstraction layer communicates with the domain layer, which manages and identifies domain entities. All of the controllers that participate in processing a request populate a central response buffer that is later printed (currently there is only a JSON printer) and output from the system. The Abstraction layer plays a relatively small role in the PCA structure and only interacts with the Domain layer, but its presence is very important from the perspective of the back-end framework.

General model: none of the PCA agents produces the presentation as such. Instead, the response is sent to a Response Buffer, where each agent caches the data it produces. The Response Printer is a lazy component which produces a representation of the data only once the full request has been performed (when the Printer asks for it). The Response component allows greater control over the re-usability of responses across the hierarchy, allowing the reuse of an agent's response throughout the lifetime of a request. At the end of processing, the Printer component can accept any printing strategy to produce output in the desired format.

The complexity of each agent is dramatically reduced, as the framework does not produce any views. The Abstraction layer of an agent accesses the Domain layer, which is part of the backend framework, to fetch the requested data. The Controller contains all the actions an agent can perform, and it may instantiate multiple Abstraction objects to fetch the data needed to build its output. The structure of controllers gives them a lot of flexibility: a Controller can delegate tasks to other agents, or fetch their responses and then perform the requested action. Additionally, controllers are capable of accessing the responses of other agents through the response buffer.
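
To make the flow concrete, here is a minimal PHP sketch of the buffer/printer interplay described above; all class and method names are invented for illustration and are not the framework's actual API:

class ResponseBuffer {
    private $entries = array();

    public function set($key, $data) {
        $this->entries[$key] = $data;
    }

    // Lets one agent reuse the response another agent already produced.
    public function get($key) {
        return isset($this->entries[$key]) ? $this->entries[$key] : null;
    }

    public function all() {
        return $this->entries;
    }
}

class FriendsController {
    private $buffer;

    public function __construct(ResponseBuffer $buffer) {
        $this->buffer = $buffer;
    }

    public function getOnline($maxCount) {
        // In the real design an Abstraction object would fetch this
        // from the domain layer; here we fake the data.
        $friends = array(array('userId' => 23432, 'friendAlias' => 'Nick'));
        $this->buffer->set('friends', array_slice($friends, 0, $maxCount));
    }
}

// The Printer is lazy: it serializes the buffer only once the whole request
// has been processed. Other printing strategies could be swapped in here.
class JsonPrinter {
    public function output(ResponseBuffer $buffer) {
        return json_encode(array('output' => array($buffer->all())));
    }
}

$buffer = new ResponseBuffer();
$controller = new FriendsController($buffer);
$controller->getOnline(10);
$printer = new JsonPrinter();
echo $printer->output($buffer);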

Conclusion and Future Work

The new design improves project organization performance by reducing inter-team dependencies and confining communication to the initial analysis stage. Apart from that, it reduces the amount of work required to prepare the front-end code for rich UI clients. Teams are able to focus more on their core activities and technologies, reducing friction and optimizing the communication paths. Upcoming visual redesign projects can be carried out primarily by a design team, rather than involving significant work from client-side developers to support iterative changes to views, templates and controllers. Simple, pure HTML templates improve the throughput of designers, who may now work in a WYSIWYG way, able to use their tools directly on the templates.

Key benefits of this architecture are:

  • minimized overhead of maintaining multiple interfaces,
  • team focus shifted to match their responsibilities,
  • no need for server-side changes when changing page structure, layout, design or UI scripts,
  • parallelized development project efforts,
  • minimized bandwidth consumption (savings of 45%-92% depending on the page type),
  • minimized server-side CPU use (savings of 65%-73% depending on the page type),
  • improved developer performance, by providing tools with clearly defined responsibilities and scope.

A working prototype has been built, proving that the above concepts are viable and functional. A subset of an existing feature, the Private Messages module of the Tuenti.com application, was used to test the framework. This initial proof of concept (PoC) has been preliminarily evaluated using Firefox 3.5 and Chromium. The successful PoC shows that highly complex and feature-rich applications can be developed with the proposed framework. Server-side code complexity is greatly reduced through the usage of the PCA framework. Preliminary results show that the response time is almost a third of that of the existing MVC framework. Templates can now be visualized directly in the browser, raw template sizes are at least 30% smaller, and there are no conditional or iterative flows. The cost of producing rendered HTML is now moved to the client browser, which might become a challenge that has to be faced before rolling out the system into the live environment. However, the overall response time observed by the user is still lower than in the current implementation and will be subject to further optimizations.

References

[1] Ext JS. Palo Alto, CA (USA), 2009 [Online].
[2] Mjt, "Template-Driven Web Applications in JavaScript" [Online].
[3] UIZE, "JavaScript Framework", 2010 [Online].
[4] TrimPath, 2008 [Online].
[5] Jupiter Consulting, "EmbeddedJS, An Open Source JavaScript Template Library", Libertyville, IL (USA), 2010 [Online].
[6] BeeBole, "PureJS, Templating Tool to generate HTML from JSON data", 2010 [Online].
[7] J. Coutaz, "PAC: an Implementation Model for Dialog Design", in H-J. Bullinger, B. Shackel (eds.), Proceedings of the Interact'87 Conference, September 1-4, 1987, Stuttgart, Germany. North-Holland, pp. 431-436.
[8] J. Cai, R. Kapila and G. Pal, "HMVC: The layered pattern for developing strong client tiers", JavaWorld, 2000 [Online].

Tuenti Rich Media vs Twannotations

Posted on 5/10/2010 by Tomasz Matuszczyk Lead of Frontend Engineering

Recently Twitter developer Raffi Krikorian published a presentation on "Twannotations" - "a game-changer to the twitter platform" (http://techcrunch.com/2010/05/08/twitter-annotations/, http://mehack.com/extremely-preliminary-look-at-twitters-annota). To provide some feedback on this wonderful initiative, I'd like to use this blog post to tell you about a very similar annotations system that we've been using at Tuenti since late 2008.

The need for an annotation format came up when Tuenti faced its first major code refactor. Tuenti has always allowed users to attach videos, links, and other types of media to pretty much any post they write on the social network.

The first approach, anno 2006

The first version of this system relied only on URLs being present in the posts. On output, a regular expression would scan the content for URLs and identify each of them as: a YouTube video, an internal profile link, an internal photo link, etc.

The parser would then replace the occurrences of these URLs with the HTML needed to render a video player, a profile link, a photo thumbnail, etc. Since the URLs were stored in-line with the other post content, the Rich Media content would render in the same place where the user originally wrote it.

The second approach, anno 2008

The second version of this system had a very short life-span; it was introduced as a first attempt at storing something more useful than URLs in annotated posts. Why was it more useful? Because it stored the type of the Rich Media, along with information about how to retrieve more details about it. This allowed us to internally use a simple Factory pattern to deal with the Rich Media items, as we started calling them. Why was it short-lived? Because it was a non-standard format:

[[media:[type:Video,id:155]]]

The non-standard format meant that anyone who wanted to communicate in this format had to write their own parser.
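
For illustration, the simple Factory mentioned above might look roughly like this in PHP; all class names here are hypothetical, not the ones in our codebase (the "Video" and "ExternalURL" types are taken from the examples in this post):

class VideoItem {
    public $id;
    public function __construct($id) { $this->id = $id; }
}

class ExternalUrlItem {
    public $url;
    public function __construct($url) { $this->url = $url; }
}

class RichMediaFactory {
    public static function create(array $tag) {
        switch ($tag['type']) {
            case 'Video':
                return new VideoItem($tag['id']);
            case 'ExternalURL':
                return new ExternalUrlItem($tag['url']);
            default:
                throw new InvalidArgumentException('Unknown Rich Media type');
        }
    }
}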

The current approach, anno 2008

The third, and currently used, version of the Rich Media system was born only weeks after the second version was implemented. To address the problem of needing a custom parser - but still take advantage of the low space usage, human readability, and extensibility of the syntax - it was decided that JSON should be used to embed Rich Media information in posts:

{media:{"type":"Video","id":"9304"}}

{media:{"type":"ExternalURL","url":"http:\/\/www.tuenti.com\/"}}

In the storage layer a sample post can look like this:

Hey Nick! Check this video out! {media:{"type":"Video","id":"21767091"}} Let's go there this winter!

Server based services will parse this JSON, retrieve additional information about the video, and then render or modify it.

Client based services need more information, so in the API's response this post could look like this:

Hey Nick! Check this video out! {media:{"type":"Video","id":"21767091","playCount":"10","title":"Rogers Terrain Park at Sunshine Village - 2009"}} Let's go there this winter!

In the UI, depending on the context, the video will get rendered either inline or in its own section. Some UIs that don't support videos (such as the web for simple mobile phones) will only render the title of the video.

The full life cycle of a post with annotations

When writing a post, the user simply copies and pastes the URL of the media they want to add to the post from the browser window. Buttons are also provided in the regular web UI to make this more convenient.

On the server, a Rich Media Input Parser parses the post and converts it into a Rich Media formatted post. This parser has to be fairly complex, since it needs to be able to handle multiple formats that don't provide any type context or additional information. What's then stored in the DB is something like this:

Hey Nick! Check this video out! {media:{"type":"Video","id":"21767091"}} Let's go there this winter!

On the way back out to the user, a very simple Rich Media Output Parser parses the post and provides an OO structure to the template that is in charge of displaying the final result. This parser is currently slightly custom, since only parts of the post are JSON formatted. The most complex parts of the post are, however, JSON formatted, and thus you can use the power of a standard JSON parser where it's really needed. You can also easily sanity-check that each Rich Media attachment is correctly formatted using any standard JSON parser at your disposal.
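
A minimal sketch of that output parsing in PHP might look like this; the regex and function name are ours, purely illustrative:

function parseRichMediaTags($post) {
    $items = array();
    // Pull out each {media:{...}} tag, then hand the payload to a
    // standard JSON parser.
    if (preg_match_all('/\{media:(\{[^{}]*\})\}/', $post, $matches)) {
        foreach ($matches[1] as $json) {
            $items[] = json_decode($json, true);
        }
    }
    return $items; // e.g. array(array('type' => 'Video', 'id' => '21767091'))
}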

(Screenshot: a post where the Rich Media content has been rendered in-line.)

Future improvements

As you can see above, the format is still partially non-standard, since only the Rich Media items are JSON formatted; the rest of the content in the post is not. This requires developers to implement a simple parser that extracts all Rich Media JSON tags before they can be interpreted using a standard JSON parser. A fully standard format has therefore been discussed, but not yet implemented:

[{"media":{"type":"Text","content":"Hey Nick! Check this video out! "}},{"media":{"type":"Video","id":"21767091"}},{"media":{"type":"Text","content":" Let's go there this winter!"}}]

The major advantage of this approach is that it's fully JSON based: any application written in any language that comes with a JSON parser could interpret it, without its developer having to implement their own parser.

Evolving a Backend framework

Posted on 4/16/2010 by César Ortiz Architecture Engineer

The duties of a Backend Software Architect at Tuenti include the maintenance and evolution of the Backend Framework. In this article we will talk about the evolution of Tuenti's framework, share its pros and cons, and briefly introduce its features without entering into many technical or architectural details (those will be covered in future articles).

Historical Review

The software that runs www.tuenti.com changes continuously, with at least two code deployments per week. The scope of these releases varies, but usually we release a lot of small changes that touch many different parts of the system. Of course, sometimes our projects are really big, and their releases get divided into a series to reduce overall complexity and minimize risk.

An identical approach is applied to framework releases. Currently the modifications are mainly subtle, but the introduction of the framework had to be divided into a few phases, some of which were further decomposed into smaller ones.

The original version

Since its creation, the site has run on lighttpd, MySQL and PHP. From the first version, no third-party frameworks have been used and all the software has been developed in-house (for better or worse).

The first version of the "lib" was quite primitive from an architectural point of view since, as a start-up, Tuenti's primary aim was to reach the public fast and then evolve once the product proved successful.

The transitional version

The transitional version was the one in place before we introduced the current framework. This code used a framework built around the MVC pattern, with a set of libraries supporting model definition and communication with storage devices (memcached and MySQL). At this point in time, data partitioning was being introduced for both memcached and MySQL, allowing Tuenti to scale much more effectively.

The use of memcached is very important for the performance of the site. When a feature was being implemented, the developer not only had to consider how the data was going to be partitioned in the database; they also had to decide what data was going to be cached in memcache, how the cache would work, and make sure that all interdependencies for data consistency were satisfied. The caching layer contained not only simple data structures, but also indexes, paging structures, etc.

The current version

Currently, newly developed domain modules use the new backend framework (which replaced the old model and its supporting classes), and we are gradually migrating modules from the transitional framework to the new one.

We have also designed and developed a new front-end architecture which is still under evaluation and testing. In the following months we will be posting more information about the framework and implemented solutions, so please be patient.

Some of the most important advantages of the new framework are:

  • standardization of data containers,
  • transactional access to the storage (even for devices not supporting transactions),
  • complete abstraction of the data storage layer.

In addition to the above, the framework introduces several concepts, among which you'll find:

  • domain-driven development,
  • automatic handling and synchronization of 3 caching layers,
  • support for data migration, partitioning, replication,
  • automatic CRUD support for all domain entities,
  • object-oriented access to data as well as direct access from containers (avoiding expensive instantiation of objects).

The framework is entirely coded in PHP and (so far) we have not moved any parts of the code into PHP extensions. This leaves us a lot of room for possible performance improvements, but it will reduce the flexibility of the code if we decide to take that step.

Selected framework features

A framework designed for a website like Tuenti has to address a lot of technical issues which you would not encounter in a standard website deployment. The problems arise in different areas: the number of developers working on the project, scalability problems, the migration phases, and many more that appear as the site evolves over time.

Although a deep explanation is out of the scope of this article, let's briefly look at the features mentioned above.

Transactional access to the storage

Systems using many storage devices require additional implementation effort to keep the data in a consistent state. We cannot completely avoid data inconsistencies (due to the delayed nature of some operations, and failures), so we have to keep part of the consistency checks in the source code. Yet we can minimize the impact and number of problems in this area by implementing transaction handling within our application. This means that for more complex operations involving changes in several data sources, we can maintain relatively high data consistency with a design that defines "domain transactions", which map to "storage transactions" assigned to different servers with different types of devices running on them.

This approach allows developers to focus on the logic and on specific storage related cases, while the framework handles the transactions for most standard operations automatically.
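
As a rough sketch of the idea (the names and API here are invented for illustration; the real framework differs):

interface StorageTransaction {
    public function commit();
    public function rollback();
}

// A "domain transaction" fans out to one "storage transaction" per device.
class DomainTransaction {
    private $storageTransactions = array();

    public function enlist(StorageTransaction $transaction) {
        $this->storageTransactions[] = $transaction;
    }

    public function commit() {
        foreach ($this->storageTransactions as $transaction) {
            $transaction->commit();
        }
    }

    public function rollback() {
        foreach ($this->storageTransactions as $transaction) {
            $transaction->rollback();
        }
    }
}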

Complete abstraction of the data storage layer

A central point of the storage layer is the "storage target name". These names are linked to several pieces of configuration data, such as the storage devices used, the partitioning and/or replication schema, different authentication data, etc. In the domain layer, developers can write code focusing on the logic and relations between domain entities, and communicate with the storage layer as if it were one device (with transactions handled as mentioned above).

This means that when there is a need to perform a data related operation, developers don't need to worry about all the device specific details, caching, etc. Everything is handled automatically, so (in the most common case) the data will come from memcache; if it was already used while handling this request, it will already be cached in the framework; or, if the data has not been used for a while and the cache has expired, it will come from MySQL.
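
Conceptually, a storage target name could map to configuration like the following (a made-up example; all keys and values are ours):

$storageTargets = array(
    'user_messages' => array(
        'device'       => 'mysql',
        'partitioning' => 'by_user_id',            // partitioning schema
        'replication'  => array('db-12', 'db-13'), // replica servers
        'cache'        => 'memcache',
        'credentials'  => 'messages_rw',           // authentication data
    ),
);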

Standardization of data containers

Almost all data loaded into the system is stored in standard containers (DataContainer) that are later sub-classed to implement different logic for handling different types of data groups (Queue, Collection, etc.). The implementation of standard containers allows us to integrate several features into the framework that not only speed up development and reduce the domain layer's complexity, but also apply system-wide security and unify the data access interfaces.
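
A minimal sketch of that container hierarchy (method names are illustrative, not the framework's real interface):

abstract class DataContainer {
    protected $items = array();

    // A unified read interface shared by every container subclass.
    public function toArray() {
        return $this->items;
    }
}

class Collection extends DataContainer {
    public function add($item) {
        $this->items[] = $item;
    }
}

class Queue extends DataContainer {
    public function push($item) {
        $this->items[] = $item;
    }

    public function pop() {
        return array_shift($this->items);
    }
}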

Drawbacks

Every architecture is designed with trade-offs in mind: support for some architectural concerns is increased, and for others decreased. In this case we have observed higher memory consumption, a bigger challenge in implementing particular performance related optimizations, and reduced flexibility in the ways code can be implemented.

Currently developers have less freedom than with the first version of Tuenti's back-end framework. Previously a developer could just write any SQL statement they wanted and decide whether to cache the data or not, and how that caching should work, down to the last detail. There was more flexibility, but the process was error-prone and produced a lot of duplicated code (read: copy+paste, or a waste of time). We still need to provide a way for developers to write complex SQL queries that cannot be generated by the framework automatically, but these are just exceptions, as the regular queries executed at Tuenti are very simple.

As already mentioned, higher memory consumption and more challenging implementation of optimizations are drawbacks associated with the use of a more complex framework. Neither CPU nor memory consumption is considered a problem when we're thinking about regular web requests. Standard response time was not affected in a noticeable way, yet a glance at the back-office script execution statistics shows us that there is still a lot of room for improvement in terms of memory usage and CPU consumption.

The root cause of the higher memory consumption cannot be attributed exclusively to the framework; it is also due to the fact that objects are cached in memory. Having a garbage collector is useless unless you release all references to objects. Caching is a very good way to improve speed, but the code must provide ways to flush the cached data in order to make it usable in scripts, which usually work on larger amounts of data than web requests do.

Evolution of the framework

A good framework design will allow for its evolution, but will define and enforce clear boundaries. Re-architecting a system is always a very difficult and expensive process, so one has to take into consideration all possible concerns (especially non-technical ones) and the requirements defined for the system. It is also clear that the first version will never be the last one, so you need to be patient and listen to all of the feedback you're receiving.

Once you have a stable version of your framework you need to convince the developers that it really solves their needs and that it will make their lives easier. Having your developers "on board" has several advantages:

  • they will suggest improvements and point out anything that feels awkward,
  • they remove the communication barrier that would isolate your framework from "reality",
  • they speed up the development process of the framework by streamlining ideas and effort.

When you introduce a new framework, you also need to integrate it with the old one. This can be very hard and tricky. What you usually want to do is make the old framework use the new one: you maintain the old interface but run the new logic inside. Hopefully the old interfaces will make sense and you will not have to spend weeks trying to make "the magic" work in a technical world. You need to consider that the interface is not just the function signature and its arguments; you also have to respect the same error handling and the old code's influence on the environment.

As a framework developer you should never forget that, however cool your framework is, it is there to help the people developing functionality on top of it.
