Building a VoIP Service using WebRTC

Posted on 4/28/2014 by Pedro Álvarez & Iván Mosquera, Senior Software Engineers 

Why We Built a VoIP Service

Tuenti is the first Spanish telco to integrate the social Internet world with the telco world. Innovation is part of our identity, and we offer VoIP services through our applications on several platforms (Android, iOS and Web). This is key to providing our customers with valuable services that let them communicate with their contacts wherever they are, at no cost in credit or data, thanks to Zerolímites.

How We Build our VoIP Service

A couple of weeks ago, we read the blog post Tuenti, Telefonica, Tokbox and zero-rated Mobile WebRTC?, written by Dean Bubley, which raises open questions about how we build our VoIP service. At the same time, people in the community are asking whether it is possible to build a business product using WebRTC technology. Here is our answer.

Infrastructure

Messaging has become one of the most used applications, and we have been competing in that market for a few years, building clients for every platform and scaling to support millions of users. In this scenario, it would not make sense to build a different infrastructure when we could reuse the one that already supported our chat clients. Signaling is exchanged through our existing XMPP channel, and we set up STUN servers on those same chat servers. Our TURN servers, though, are external to the chat infrastructure.

Signaling

When the Mobile Originated party (MO, the caller) wants to make a call, it has to negotiate a session with the Mobile Terminated party (MT, the callee). This process is known as signaling. There are different ways to carry it out, depending on the protocol used to exchange information between the MO and the MT.

One of the best-known protocols is Jingle. Jingle is an extension to the Extensible Messaging and Presence Protocol (XMPP), which adds peer-to-peer (P2P) session control (signaling) for VoIP. Our applications already include a chat, and we keep an XMPP channel open to support it, so Jingle was our first approach.

The first application in which we built the VoIP service was the web client. There we found a good Jingle library, so that was easy. On Android, we found an open source library to handle the Jingle parsing as well. For Jingle we were using libjingle from Google. The problem came when we started building the iOS application: we did not find any good open source Jingle library. We were forced to parse the Jingle stanzas by hand, which was not a big deal because we use only a small piece of the protocol.

One thing we did not like about Jingle was the Session Description Protocol (SDP) handling. The SDP has to be converted from its native format (RFC 4566) to the XML format proposed by Jingle. That is overhead in an area where we did not want to invest, so we decided to fork Jingle.

As we were using just a subset of Jingle and we do not support interoperability with other VoIP applications, we created Tangle. We simply include the SDP as the body of a Tangle element, as in the following example:

<tangle xmlns='com:tuenti:voice:tangle'
action='session-initiate' sid='cea59k47sd59n'>
<sdp>
<![CDATA[
SDP
]]>
</sdp>
</tangle>

Note: Following the feedback we got from readers, we changed the namespace in our tangle specification.

One recommendation: before choosing a name, google it!
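As a minimal sketch of the idea, a stanza like the example earlier in this section could be built and read back with Python's standard `xml.dom.minidom` module. The function names `build_tangle` and `extract_sdp` are hypothetical, the namespace is the one shown in the example, and this is an illustration, not the code our clients run.

```python
from xml.dom.minidom import Document, parseString

# Namespace copied from the example stanza; treat it as illustrative.
TANGLE_NS = "com:tuenti:voice:tangle"


def build_tangle(action: str, sid: str, sdp: str) -> str:
    """Wrap a raw SDP blob in a <tangle> element, untouched, inside CDATA."""
    doc = Document()
    tangle = doc.createElement("tangle")
    tangle.setAttribute("xmlns", TANGLE_NS)
    tangle.setAttribute("action", action)
    tangle.setAttribute("sid", sid)
    sdp_el = doc.createElement("sdp")
    # CDATA lets the SDP travel as-is, with no SDP-to-XML conversion.
    sdp_el.appendChild(doc.createCDATASection(sdp))
    tangle.appendChild(sdp_el)
    return tangle.toxml()


def extract_sdp(stanza: str) -> str:
    """Return the SDP text carried by a <tangle> stanza."""
    root = parseString(stanza).documentElement
    sdp_el = root.getElementsByTagName("sdp")[0]
    # XML parsers normalize CRLF to LF, so a receiver may need to
    # restore CRLF line endings before handing the SDP to WebRTC.
    return "".join(node.data for node in sdp_el.childNodes)
```

The receiving side only needs `extract_sdp` to recover the session description, which is the whole point of skipping Jingle's XML mapping.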

Multi Resourcing

One of the most difficult signaling challenges we had to manage was multi-resource support, or multi-ringing. Our VoIP service is available on Android, web and iOS, which means a customer can be logged into our Android client and the web client at the same time. It is a very common scenario.

When someone calls a customer who is logged into several resources, the application has to ring on all of them. When the callee picks up the call on one resource, the ringing has to be canceled on the others. To achieve this, we introduced already-answered as a new reason for session-terminate. We can do this because we know how many resources each customer is logged into.
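The cancellation step can be sketched as follows. This is a simplified model under stated assumptions: the `Terminate` record and the `on_call_answered` function are hypothetical names, resources are represented as plain strings, and actually sending the session-terminate over XMPP is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Terminate:
    """A session-terminate to send to one resource (hypothetical record)."""
    to_resource: str
    sid: str
    reason: str


def on_call_answered(sid: str, ringing: List[str], answered_on: str) -> List[Terminate]:
    """Build the terminations that stop the ringing on every other resource."""
    return [
        Terminate(to_resource=resource, sid=sid, reason="already-answered")
        for resource in ringing
        if resource != answered_on
    ]
```

For example, if a call rings on "android", "web" and "ios" and is answered on "web", the function returns terminations for "android" and "ios" only.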

WebRTC

As stated in webrtc.org:

WebRTC is a free, open project that enables web browsers with Real-Time Communications (RTC) capabilities via simple JavaScript APIs. The WebRTC components have been optimized to best serve this purpose.

It's a major innovation in the web world, as it makes building RTC applications affordable and ubiquitous. Without WebRTC you would need to build a full client stack for each platform from the ground up. That is especially inconvenient on the web, as it would require installing a third-party plugin, defining your own protocols, dealing with codec licensing, and building and maintaining those APIs, which would be enough to make a project like this unfeasible.


The WebRTC project solves all these problems. You have an API with the media capabilities you might need, the communication protocols are defined, and you avoid codec licensing problems because the project takes care of that too. Having all this available to developers is remarkable and is driving great innovation.

So what we have done is build a VoIP solution powered by WebRTC, on top of our existing messaging infrastructure.

We have built VoIP clients for the current top client platforms: Web, Android and iOS. They interoperate with each other using the same technology under the hood, as WebRTC provides not only a client stack for browsers but also native client libraries, which we rely on.

The project is evolving fast, it is becoming more and more stable, and it supports high call quality using top-notch codecs like Opus. Besides, there is a growing community working and building things on top of the platform. Again, compare that with a full in-house effort, which would be really difficult to maintain. With WebRTC we can focus on high-level product tasks and benefit from WebRTC advances.

Always Connected Application

Previously, when the application lost its XMPP connection to the chat, it switched to a push mode: if your connection was closed and someone sent you a chat message, you were notified with a push notification. This way of handling connections works well when the application only exchanges messages. When it has to keep customers available for calls, you cannot base your model on push notifications.

We developed a mechanism to keep the connection alive instead of letting it disconnect after the application has spent some time in the background. We implemented a low-comm connection state to transfer as little data as possible, in particular avoiding the transmission of XMPP presences. We did not want battery consumption to rise just because the connection stays alive longer than before.

Another improvement to the connection system was an adaptive ping. It looks for the longest interval between pings that still keeps the XMPP channel alive.
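This post does not spell out the search strategy, so the following is only one possible sketch: lengthen the interval by a fixed step after each successful ping, and when the connection drops, fall back to the previous interval and stop probing beyond it. The class name `AdaptivePing` and all the default values are assumptions made for illustration.

```python
class AdaptivePing:
    """Searches for the longest ping interval that keeps the channel alive."""

    def __init__(self, min_interval: float = 30.0,
                 max_interval: float = 600.0, step: float = 30.0) -> None:
        self.min_interval = min_interval
        self.step = step
        self.interval = min_interval  # interval currently in use, in seconds
        self.limit = max_interval     # longest interval we still allow

    def on_ping_ok(self) -> float:
        """The channel survived the current gap: try a longer one, up to the limit."""
        self.interval = min(self.interval + self.step, self.limit)
        return self.interval

    def on_connection_lost(self) -> float:
        """The current gap was too long: step back and never exceed that again."""
        self.limit = max(self.min_interval, self.interval - self.step)
        self.interval = self.limit
        return self.interval
```

A real client would probably also re-probe from time to time, since the tolerated idle time depends on the network the device is attached to.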

Zero Limits Calls

As part of our data plans, we offer “Zerolímites” to our customers. Customers who purchase this plan can still access the internet through our application when they run out of data, with no effect on their bills.

To provide this, we have zero-rated IP ranges, which are free when you are on Zerolímites. To make calls free, we force them to go through our TURN servers. This way, the customer does not consume any data. When the customer is on Zerolímites, we filter the ICE candidates in the clients and pass only the TURN server ICE candidates to the PeerConnection.
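The filtering step can be illustrated with a short sketch. In the standard ICE candidate syntax, the token after `typ` is the candidate type, and candidates allocated on a TURN server have type `relay`. The function names and the sample candidate lines (which use documentation-only IP addresses) are made up for this example.

```python
from typing import List, Optional


def candidate_type(candidate_line: str) -> Optional[str]:
    """Return the token that follows 'typ' (host, srflx, prflx or relay), if any."""
    tokens = candidate_line.split()
    try:
        return tokens[tokens.index("typ") + 1]
    except (ValueError, IndexError):
        return None  # not a well-formed candidate line


def relay_only(candidates: List[str]) -> List[str]:
    """Keep only TURN (relay) candidates, so media is forced through TURN."""
    return [c for c in candidates if candidate_type(c) == "relay"]


candidates = [
    "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
    "candidate:2 1 udp 1686052607 198.51.100.7 54321 typ srflx raddr 192.0.2.10 rport 54321",
    "candidate:3 1 udp 41885439 203.0.113.20 61000 typ relay raddr 198.51.100.7 rport 54321",
]
zero_rated = relay_only(candidates)  # only the third candidate survives
```

Only the surviving candidates would then be handed to the PeerConnection.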

Building the product, getting the VoIP flows right

Building a VoIP product is not easy. It's not just a matter of making a call: you need to care about several scenarios and make sure the user experience is consistent with what the user expects from a phone interface. Our product managers define each detail of the different flows and we, the engineers, need to make sure that we model and support all those cases the right way and that we write maintainable code.

Having those flows well defined is also key for testing. They need to be covered by automated tests that exercise the state machine supporting them, and it is also important that anyone can understand and test those flows. The Tuenti Voice team bootstrapped the product but, as happens in any other project, the whole company should be able to join that effort.
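To give an idea of what such a test-friendly state machine can look like, here is a deliberately small sketch. The states, the transition table and the function names are hypothetical and far simpler than a real call flow; `session-accept` is borrowed from Jingle's action names and is assumed here, not confirmed as part of Tangle.

```python
from enum import Enum, auto
from typing import Iterable


class CallState(Enum):
    IDLE = auto()
    RINGING = auto()
    IN_CALL = auto()
    ENDED = auto()


# (current state, signaling action) -> next state. Illustrative only.
TRANSITIONS = {
    (CallState.IDLE, "session-initiate"): CallState.RINGING,
    (CallState.RINGING, "session-accept"): CallState.IN_CALL,
    (CallState.RINGING, "session-terminate"): CallState.ENDED,
    (CallState.IN_CALL, "session-terminate"): CallState.ENDED,
}


def step(state: CallState, action: str) -> CallState:
    """Apply one signaling action, rejecting anything the flow does not allow."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed in state {state.name}") from None


def run(actions: Iterable[str], state: CallState = CallState.IDLE) -> CallState:
    """Feed a whole flow through the machine and return the final state."""
    for action in actions:
        state = step(state, action)
    return state
```

With the flows expressed as data, an automated test is just a list of actions and an expected final state, for example `run(["session-initiate", "session-terminate"])` for a call that is rejected or canceled while ringing.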
