Tuenti Voice Control is a proof-of-concept that allows users to browse Tuenti with their voice instead of using a mouse or keyboard. It was created for HackMeUp 15, a 24-hour code competition held among Tuenti engineers every quarter, and uses the Experimental Speech API available in Google Chrome since 2011.
In our demonstration video, Ismael Gonzalez browses Tuenti tabs, goes to specific profiles and starts chats. Watch the video now!
After creating a Chrome extension that communicates speech-to-text data to the website, we spent the remaining three hours adding commands related to Tuenti. By the deadline we could:
Chrome’s Experimental Speech API implements a subset of the features detailed in the W3C Recommendation for Speech Grammar (March 2004) and allows extensions to start speech recognition and retrieve the captured text. To use experimental extension APIs, you must start Chrome with the command-line option --enable-experimental-extension-apis.
Google Chrome extensions are composed of HTML pages with specific functions. We use a single content script to capture events from the browser and send requests to a background page:
window.addEventListener("speechstart", function (e) {
    // Ask the background page to start recognition, then notify the website
    chrome.extension.sendRequest('speechstart', function (response) {
        triggerSimpleEvent('speechstarted');
    });
});
This background page is able to access the experimental API and start speech recognition:
chrome.experimental.speechInput.start({
    // Language tags use the BCP 47 form (language-REGION)
    language: 'es-ES'
}, function () {
    if (chrome.extension.lastError) {
        console.debug("Couldn't start speech input: " + chrome.extension.lastError.message);
    }
});
The background page then communicates the result to the content script via an asynchronous request.
// Target active tab
chrome.tabs.getSelected(null, function (tab) {
    chrome.tabs.sendRequest(tab.id, {
        success: true,
        result: result
    }, function (response) {
        // Handle request callback
    });
});
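The snippet above does not show where `result` comes from. As a minimal sketch, assuming the experimental API exposed `onResult` and `onError` events on `chrome.experimental.speechInput` (the event names are recalled from the experimental documentation of the time and should be treated as assumptions), the background page could forward both outcomes to the active tab. The `forwardSpeechEvents` function and its `api` parameter are hypothetical; `api` stands in for the global `chrome` object so that the logic can be tried with a stub.

```javascript
// Sketch only: `api` stands in for the global `chrome` object.
// The onResult/onError event names are assumptions about the
// experimental API, not something shown in this post.
function forwardSpeechEvents(api) {
    function sendToActiveTab(message) {
        api.tabs.getSelected(null, function (tab) {
            api.tabs.sendRequest(tab.id, message, function (response) {
                // The content script's reply is ignored in this sketch
            });
        });
    }

    var speech = api.experimental.speechInput;

    // Recognition succeeded: pass the whole result object along
    speech.onResult.addListener(function (result) {
        sendToActiveTab({ success: true, result: result });
    });

    // Recognition failed: report the failure so the content script
    // can store an empty list of hypotheses
    speech.onError.addListener(function (error) {
        sendToActiveTab({ success: false, result: null });
    });
}

// In the background page this would be called as:
// forwardSpeechEvents(chrome);
```

Sending a message on failure as well as on success means the website always receives a 'speechresult' event and can decide for itself how to react.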
If recognition has been successful, the content script appends a JSON-serialized version of the speech data array to the DOM and fires a ‘speechresult’ event.
chrome.extension.onRequest.addListener(
    function (request, sender, sendResponse) {
        // The website reads the result from this element's attributes
        var voice = document.getElementById('voice');
        voice.setAttribute('success', request.success ? 'true' : '');
        voice.setAttribute('data', JSON.stringify(request.success ? request.result.hypotheses : []));
        triggerSimpleEvent('speechresult');
    }
);
Serialization is required because the content script and the underlying website run in different JavaScript contexts, so objects cannot be shared between them.
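On the website side, the data has to be read back from the DOM and parsed. The following is a minimal sketch, not the code we shipped: `readSpeechResult` is a hypothetical helper, the `{utterance, confidence}` shape of each hypothesis is an assumption about what the API returned, and listening on `window` assumes that the 'speechresult' event bubbles up to it.

```javascript
// Turn the two attributes written by the content script back into
// an array of hypotheses. Returns [] on failure or malformed data.
function readSpeechResult(successAttr, dataAttr) {
    if (successAttr !== 'true') {
        return [];
    }
    try {
        var parsed = JSON.parse(dataAttr);
        return Array.isArray(parsed) ? parsed : [];
    } catch (e) {
        return [];
    }
}

// Browser-only wiring; skipped when no DOM is available
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
    window.addEventListener('speechresult', function () {
        var voice = document.getElementById('voice');
        if (!voice) {
            return;
        }
        var hypotheses = readSpeechResult(
            voice.getAttribute('success'),
            voice.getAttribute('data')
        );
        // Assumed shape: [{ utterance: '...', confidence: 0.9 }, ...]
        console.log(hypotheses);
    });
}
```

Keeping the parsing in a small function of its own makes the failure cases (empty 'success' attribute, invalid JSON) easy to check without a browser.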
The W3C recommendation includes a method for specifying a grammar. This is crucial for achieving high accuracy and precision in a speech recognition system, because error rates decrease as the vocabulary size shrinks: the digits 0-9 can be recognized without error, but vocabulary sizes of 200, 5000 or 100000 can have error rates of 3%, 7% or 45% respectively. After experimentation we found that, as of December 2011, custom grammars are not implemented in Chrome and that Google returns any set of words from its dictionary.
We solved this issue by converting the recognized text to a bag of words and calculating the probability of a user wanting to perform an action on a friend, based on the number of occurrences of words related to that action/friend pair.
This approach worked flawlessly whenever the words in the text returned by Google corresponded to a valid action/friend pair. It helps to speak clearly and to use a high-quality noise-cancelling microphone (Apple MacBook Pro), so that the speech recognizer can detect the beginning and end of the command.
Before starting the project we did not know whether it would be possible to start speech recognition, especially using a single key, let alone recognize commands. It was, and we think that such techniques can provide a better web experience. For this to happen, browser makers (Google and others) and the W3C must work together to provide a stable API that can be used by all websites without extensions.
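The bag-of-words scoring described above can be sketched as follows. This is an illustration under stated assumptions rather than our actual code: the `ACTIONS` and `FRIENDS` vocabularies, the function names and the exact scoring rule (sum of matching word counts, normalized over all valid pairs) are all hypothetical.

```javascript
// Hypothetical vocabularies: words related to each action and friend
var ACTIONS = {
    chat: ['chat', 'chatear', 'hablar'],
    profile: ['perfil', 'profile', 'ver']
};
var FRIENDS = {
    'Ismael Gonzalez': ['ismael', 'gonzalez'],
    'Michael Clark': ['michael', 'clark']
};

// Count how often each word occurs, ignoring order and punctuation
function toBagOfWords(text) {
    var bag = Object.create(null);
    text.toLowerCase().split(/[^a-z0-9áéíóúüñ]+/).forEach(function (word) {
        if (word) {
            bag[word] = (bag[word] || 0) + 1;
        }
    });
    return bag;
}

// Total occurrences in the bag of any word from the given list
function countMatches(bag, words) {
    return words.reduce(function (sum, word) {
        return sum + (bag[word] || 0);
    }, 0);
}

// Score every action/friend pair that has at least one matching word
// on each side, then normalize the scores into probabilities
function rankCommands(text) {
    var bag = toBagOfWords(text);
    var candidates = [];
    var total = 0;
    Object.keys(ACTIONS).forEach(function (action) {
        var actionHits = countMatches(bag, ACTIONS[action]);
        Object.keys(FRIENDS).forEach(function (friend) {
            var friendHits = countMatches(bag, FRIENDS[friend]);
            if (actionHits > 0 && friendHits > 0) {
                var score = actionHits + friendHits;
                total += score;
                candidates.push({ action: action, friend: friend, score: score });
            }
        });
    });
    candidates.forEach(function (c) {
        c.probability = c.score / total;
    });
    candidates.sort(function (a, b) {
        return b.probability - a.probability;
    });
    return candidates;
}
```

With these vocabularies, `rankCommands('chatear con ismael')` yields a single candidate, the chat action on Ismael Gonzalez, while text containing no known action or friend yields an empty list and can simply be ignored.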
HackMeUps are code competitions we hold every quarter at Tuenti. If you would like to participate in development at Tuenti, consider applying! We are always looking for talented candidates.
Tuenti Voice Control was created by Michael Clark (frontend developer) and Ismael Gonzalez (CSS architect).