james mckay dot net
because there are few things that are less logical than business logic

How to match any character (including newlines) in a JavaScript regular expression

There is a little gotcha with JavaScript regular expressions. The . (dot) character, which supposedly matches any character, does not match newlines.
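Here is a small sketch that shows the gotcha in action. It runs the same dot-based pattern against strings with and without a line break; the pattern and test strings are purely illustrative. In JavaScript the dot fails to match all four line terminators: \n, \r, \u2028 and \u2029.

```javascript
// "." matches any single character *except* a line terminator.
const dotPattern = /a.b/;

console.log(dotPattern.test("a-b"));      // true:  "-" is an ordinary character
console.log(dotPattern.test("a\nb"));     // false: line feed is not matched
console.log(dotPattern.test("a\rb"));     // false: carriage return is not matched
console.log(dotPattern.test("a\u2028b")); // false: nor is the Unicode line separator
```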

Now this is actually standard (if somewhat counter-intuitive) behaviour in regular expressions in most languages, but it can be changed, for example, by setting the RegexOptions.Singleline option in .NET, the /s modifier in Perl, or the PCRE_DOTALL option in PHP.

Unfortunately, there doesn’t seem to be a corresponding option in JavaScript.
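Newer JavaScript engines do now have such an option. ECMAScript 2018 added the s ("dotAll") flag, which makes the dot match line terminators as well. The sketch below assumes an environment that implements ES2018. Older browsers reject the /s literal as a syntax error, and that is where the [\s\S] trick is still useful.

```javascript
// Requires an ES2018-capable engine: the "s" flag makes "." match newlines too.
const withoutFlag = /a.b/;
const withDotAll = /a.b/s;

console.log(withoutFlag.test("a\nb")); // false: default behaviour
console.log(withDotAll.test("a\nb"));  // true:  dotAll mode
console.log(withDotAll.dotAll);        // true:  the flag is exposed on the RegExp object
```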

However, there is a workaround. The \s character class matches any whitespace character (including carriage returns and line feeds), whereas the \S character class matches any non-whitespace character, i.e., anything not matched by \s. Since every character is either whitespace or not, [\s\S] matches any character at all. So… if you want to match any character in JavaScript, including newlines, using [\s\S] instead of the dot should do the trick.
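Here is a minimal sketch of the [\s\S] workaround, again with made-up inputs. It first checks the class against every JavaScript line terminator. It then compares a dot-based capture with a [\s\S]-based one on a string that spans two lines.

```javascript
// [\s\S] means "whitespace or non-whitespace", i.e. any character at all.
const anyChar = /a[\s\S]b/;

const samples = ["a-b", "a\nb", "a\rb", "a\u2028b", "a\u2029b"];
console.log(samples.every((s) => anyChar.test(s))); // true

// Capturing text that spans a line break:
const text = "<p>one\ntwo</p>";
console.log(/<p>(.*)<\/p>/.exec(text));        // null: "." stops at the newline
console.log(/<p>([\s\S]*)<\/p>/.exec(text)[1]); // captures "one\ntwo"
```

The class works with the usual quantifiers, so [\s\S]*? is the lazy counterpart of .*? that can cross lines.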

For example, to extract the contents of the <body> section of an HTML document (they end up in the first capturing group of the result):

/<body[^>]*>([\s\S]*)<\/body>/.exec(html)
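Wrapping the <body> example in a reusable function might look like the sketch below. The name extractBodyContents is hypothetical. The i flag is an addition that is not in the one-liner above; it lets the function accept upper-case <BODY> tags too. Because [\s\S]* is greedy, the match runs to the last </body> in the string. That is fine for a quick job on a single well-formed document, but a regex is no substitute for a real HTML parser.

```javascript
// Hypothetical helper: returns everything between <body ...> and </body>, or null.
function extractBodyContents(html) {
  // [^>]* skips any attributes on the opening tag; [\s\S]* spans newlines.
  // The "/" in the closing tag is escaped as \/ because "/" delimits the literal.
  const match = /<body[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : null;
}

const page = [
  "<html>",
  '<body class="main">',
  "<p>Hello,</p>",
  "<p>world</p>",
  "</body>",
  "</html>",
].join("\n");

console.log(JSON.stringify(extractBodyContents(page))); // "\n<p>Hello,</p>\n<p>world</p>\n"
console.log(extractBodyContents("<p>no body here</p>")); // null
```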

2 comments:

  • # Reply from Moot at 06:11 on 20 Nov 2008

    This is the ultimate catch-all:
    [\u0000-\uFFFF]

  • # Reply from Moot at 06:14 on 20 Nov 2008

    I can’t use bbcode, but the catch-all range is from 0 to \uFFFF
