Regular expressions in JavaScript: Concepts & Performance Issues

Published on 15/3/2011 by

Miguel Jiménez Esún Intern, Frontend Framework

Regular expressions are a sophisticated way to test strings against patterns. In JavaScript, five native methods work with regular expressions when operating on strings:

  • search, match and replace methods of the String object (regexp is provided as the method’s argument).
  • exec and test methods of the RegExp object.
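As a quick orientation, the five methods can be sketched side by side. The sample string and variable names below are illustrative assumptions, not taken from the original measurements.

```javascript
var text = "Order 66 shipped on day 12";
var digits = /\d+/;

// String methods: the regexp is passed as the argument.
var position = text.search(digits);       // index of the first match
var firstMatch = text.match(digits)[0];   // first matched substring
var masked = text.replace(digits, "#");   // replaces the first match only (no g modifier)

// RegExp methods: the string is passed as the argument.
var execResult = digits.exec(text);       // match array with an index property
var hasDigits = digits.test(text);        // boolean
```

Note that search and test only report where or whether a match exists, while match and exec return the matched text itself.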

Construction optimization

Regular expressions have two sections: the regexp itself, and the modifiers. According to this scheme, they can be generated in two different ways:

  • With a literal: the pattern goes between two slashes, and the modifiers follow the second slash.
  • With the RegExp constructor: the pattern is passed as a string in the first parameter, and the modifiers as a string in the second. This allows the pattern to be built at run time from any data (strings, numbers, etc.), but it has a performance cost. For example, you can replace one string with another by using a predefined regexp:
    var str = "Hello, John! Today is Monday.";
    var regexp = /John/g;
    var user = str.replace(regexp, 'Mark');

    This will return "Hello, Mark! Today is Monday.". However, in some cases this approach may not be possible, e.g. you may need to receive the string to be replaced as a parameter:

    function myReplace(name) {
        var str = "Hello, John! Today is Monday.";
        // The pattern is only known at call time, so it is compiled on every call.
        var regexp = new RegExp(name, "g");
        return str.replace(regexp, 'Mark');
    }

    This can slow down your code by several orders of magnitude. Be extra careful, especially if you are running it inside a loop. Here we can see the performance impact across different browsers (lower is better):

    Table 1. Different ways of building a regexp
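When the pattern really has to come from a parameter, one way to limit the cost is to build the regexp once, outside the loop, and reuse it. The following is a minimal sketch under that assumption; escapeRegExp and replaceNameInAll are hypothetical helper names introduced here for illustration.

```javascript
// Hypothetical helper: escapes regexp metacharacters so that a name such as
// "J. Doe" is matched literally when passed to the RegExp constructor.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: compiles the pattern once and reuses it for every line.
function replaceNameInAll(lines, name, replacement) {
    var regexp = new RegExp(escapeRegExp(name), "g"); // built a single time
    var out = [];
    for (var i = 0; i < lines.length; i++) {
        out.push(lines[i].replace(regexp, replacement));
    }
    return out;
}

var greetings = ["Hello, John!", "Bye, John.", "John? John!"];
var results = replaceNameInAll(greetings, "John", "Mark");
```

The escaping step is not a performance measure; it is included because a name received as a parameter may contain characters that have a special meaning inside a pattern.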

Global modifiers optimization

There are three modifiers in JavaScript, written after the closing slash of the regexp: g, i, and m.

  • g: global matching. With it, replace and match process every occurrence instead of only the first one, and exec and test resume from lastIndex on successive calls.
  • i: case-insensitive matching for searches and replacements. For example, /a/gi will find both A and a. Use this modifier instead of listing both cases in the pattern, because it can be up to 10% faster: /a/gi is faster than /[Aa]/g. Take into account that the i modifier handles any Unicode character that has case, not only ASCII letters.

    Table 2. Differences between using the /i modifier against double match

  • m: multiline modifier. It lets the ^ and $ anchors match at the start and end of each line (next to a \n or \r character), instead of only at the start and end of the whole string. This is a very useful and fast modifier, so it is recommended over an alternative regular expression whenever possible; according to Table 3 it can be almost 3x faster. For example, when parsing code, it is better to use /^for\s*(.*);/gm rather than /(\nfor\s*(.*);)|(^for\s*(.*);)/g.

    Table 3. Differences between using the /m modifier against compound patterns
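The effect of the three modifiers can be sketched with a few small cases. The sample strings are assumptions chosen for illustration; only the patterns /a/gi, /[Aa]/g and the multiline "for" pattern come from the text above.

```javascript
// g: every occurrence instead of only the first one.
var firstOnly = "a-a-a".replace(/a/, "b");
var all = "a-a-a".replace(/a/g, "b");

// i: /a/gi finds the same matches as /[Aa]/g.
var withModifier = "A banana".match(/a/gi).length;
var withClass = "A banana".match(/[Aa]/g).length;

// m: ^ matches at the start of every line, not only at the start of the string.
var source = "for (a);\nwhile (b);\nfor (c);";
var singleLine = source.match(/^for\s*(.*);/g);
var multiLine = source.match(/^for\s*(.*);/gm);
```

Without the m modifier only the loop on the first line is found; with it, the loop on the third line is found as well.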

Quantity modifiers

  • Using {} notation to find a repeated pattern can be marginally slower than using the special characters, because the regexp engine must evaluate what is inside the braces. In general, you can replace:
    • {0,} with *.
    • {1,} with +.
    • {0,1} with ?.
    • {x} with the pattern written out x times.
    • {1} with nothing (a single occurrence is the default).
  • The + modifier is slightly slower than *. Rewriting a pattern such as (?:text)+ as text(?:text)* makes the regular expression a bit redundant, because you have to repeat the pattern, but it gives a tiny performance improvement: the backtracking algorithm gets a fixed pattern that must be present in the string. Not all browsers benefit from this rewrite.

In the chart below we see the metrics comparing these:

Table 4. Quantity modifiers

Pattern          IE7       IE8       Firefox 3.6   Chrome
{0,}             1850.7    1536.2    1575.3        1407.2
*                1817.6    1653.4    1490          1389.1
{1,}             636.9     497.7     275.5         208.4
+                650.9     523.8     269           210.6
{0,1}            1824.6    1650.4    1843.8        1391.6
?                1876.4    1674.4    1842.7        1389.6
{3}              559.8     404.6     346.5         79.5
Repeated         506.5     361.6     122.4         78.5
(?:text)         598.9     430.6     173.9         97.4
text(?:text)*    585.8     435.6     173.3         97.2
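The substitutions listed above only make sense if both forms accept exactly the same strings. The sketch below checks that on a handful of inputs; the sample strings and the sameResults helper are hypothetical and serve only as an illustration.

```javascript
var samples = ["", "a", "aa", "aaa", "aaaa", "b", "ab", "abab", "aba"];

// Each pair holds the brace form (or the + form) and its suggested replacement.
var pairs = [
    [/^a{0,}$/, /^a*$/],
    [/^a{1,}$/, /^a+$/],
    [/^a{0,1}$/, /^a?$/],
    [/^a{3}$/, /^aaa$/],
    [/^(?:ab)+$/, /^ab(?:ab)*$/]
];

// Hypothetical helper: true when both patterns agree on every sample.
function sameResults(pair) {
    for (var i = 0; i < samples.length; i++) {
        if (pair[0].test(samples[i]) !== pair[1].test(samples[i])) {
            return false;
        }
    }
    return true;
}

var allEquivalent = true;
for (var p = 0; p < pairs.length; p++) {
    allEquivalent = allEquivalent && sameResults(pairs[p]);
}
```

A check like this says nothing about speed; it only guards against changing the meaning of a pattern while tuning it.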

Back-references

Back-references let you reuse the text matched by a parenthesized part of the regexp, either later in the same regexp or in the replacement string. They are also available in the replacement callback. JavaScript exposes at most nine of them through the legacy RegExp.$1 to RegExp.$9 properties, and they are numbered in the order in which their opening parentheses appear in the expression.

  • Avoid parentheses when you do not need a back-reference; if you need them only for grouping, start the group with ?: to mark it as non-capturing. This prevents the match from being stored in memory and thus makes the regexp faster.
  • Inside the same regexp, a back-reference is written as a backslash followed by the group number, but this has a performance cost. Do not use it for strings that can be matched without it (e.g. /"[^"]*"/g instead of /(")[^"]*(\1)/g).
  • In JavaScript, back-references can be used in the replacement string by writing the $ symbol followed by the group number. This notation supports several replacement patterns:
    • $$ returns the $ symbol.
    • $n returns the back-reference labeled with number “n”.
    • $` returns everything before the matched string.
    • $' returns everything after the matched string.
    • $& returns the entire matched string.

    For example, to delete the quotes around strings, you can use the following: str.replace(/"([^"]*)"/g, "$1");. This will be a *lot* faster than passing a function that makes the replacement:

    str.replace(/"([^"]*)"/g, function(match) {
        // match is the whole quoted string; drop the first and last character
        return match.slice(1,-1);
    });
  • There are other $ sequences not covered here as they are not fully supported in some browsers such as Opera or IE.

Table 5. Backreferences performance against function substitution
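The back-reference features described in this section can be sketched as follows. The sample strings are assumptions made for illustration.

```javascript
var str = 'say "hello" and "bye"';

// $1 in the replacement string inserts what the first group captured.
var unquoted = str.replace(/"([^"]*)"/g, "$1");

// The callback version gives the same result.
var viaCallback = str.replace(/"[^"]*"/g, function (match) {
    return match.slice(1, -1);
});

// A non-capturing group (?:...) is not numbered, so (c) is group 1 here.
var groups = /(?:ab)+(c)/.exec("ababc");

// Other replacement patterns: $$ is a literal $, $& is the whole match,
// $` is the text before the match and $' is the text after it.
var price = "cost: 5".replace(/\d+/, "$$$&");
var before = "abc".replace(/b/, "[$`]");
var after = "abc".replace(/b/, "[$']");

// \1 inside the pattern must repeat exactly what group 1 matched.
var hasDoubledLetter = /(\w)\1/.test("hello");
```

The string form and the callback form are interchangeable for simple cases like this one, which is why the string form is usually the better choice.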

Note: All the charts shown in this article were generated by executing the code 100,000 times, in a page containing a single test.
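The harness used for the charts is not shown in this article. A minimal sketch of how such a measurement could be taken, assuming a hypothetical timeIt helper and the 100,000 iterations mentioned in the note, might look like this:

```javascript
// Hypothetical helper: runs fn the given number of times and
// returns the elapsed time in milliseconds.
function timeIt(fn, iterations) {
    var start = new Date().getTime();
    for (var i = 0; i < iterations; i++) {
        fn();
    }
    return new Date().getTime() - start;
}

// Literal regexp versus a regexp built with the constructor on every call.
var literalMs = timeIt(function () {
    "Hello, John!".replace(/John/g, "Mark");
}, 100000);

var constructorMs = timeIt(function () {
    "Hello, John!".replace(new RegExp("John", "g"), "Mark");
}, 100000);
```

Results from a sketch like this vary with the browser, the engine version and the machine, so they should be compared only within a single run.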
