How to match any character (including newlines) in a JavaScript regular expression
Posted at 11:19 on 04 May 2008
There is a little gotcha with JavaScript regular expressions. The . (dot) character, which supposedly matches any character, does not match newlines.
Now this is actually standard (if somewhat counter-intuitive) behaviour in regular expressions in most languages, but it can be changed, for example, by setting the RegexOptions.Singleline option in .NET, the /s modifier in Perl, or the PCRE_DOTALL option in PHP.
Unfortunately, JavaScript had no corresponding option when this was written (the s flag, which makes the dot match newlines, only arrived with ES2018).
However, there is a workaround. The \s character class matches any whitespace character (including carriage returns and line feeds), whereas the \S character class matches any non-whitespace character, i.e., anything not included in \s. So if you want to match any character in JavaScript, including newlines, using [\s\S] instead of the dot should do the trick.
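A quick way to see the difference is to run both patterns against a string that contains a line feed. This is only a minimal sketch; the sample string and variable names are made up for illustration.

```javascript
// Sample input containing a line feed; purely illustrative.
var text = "first line\nsecond line";

// The dot stops at the newline, so only the first line is matched.
var dotMatch = /.*/.exec(text)[0];

// [\s\S] means "whitespace or non-whitespace", i.e. every character,
// so the match runs across the newline to the end of the string.
var anyMatch = /[\s\S]*/.exec(text)[0];

console.log(JSON.stringify(dotMatch)); // "first line"
console.log(JSON.stringify(anyMatch)); // "first line\nsecond line"
```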
For example, to extract the contents of the <body> section of an HTML document:
/<body[^>]*?>([\s\S]*)<\/body>/.exec(html)
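Since exec() returns null when nothing matches, it may be worth wrapping the call before reading the captured group. The sketch below assumes a hypothetical helper named extractBody and a made-up sample page; neither is part of any library.

```javascript
// Hypothetical helper (name chosen for illustration): returns the markup
// between <body ...> and </body>, or null if the input has no body element.
function extractBody(html) {
  var match = /<body[^>]*?>([\s\S]*)<\/body>/.exec(html);
  // exec() returns null when nothing matches; otherwise index 1 holds
  // the first capture group, i.e. everything between the two tags.
  return match ? match[1] : null;
}

// Made-up sample document spanning several lines.
var page = "<html>\n<body class=\"home\">\n<p>Hello</p>\n</body>\n</html>";

console.log(JSON.stringify(extractBody(page)));  // "\n<p>Hello</p>\n"
console.log(extractBody("<p>no body here</p>")); // null
```

Note that [\s\S]* is greedy, so the capture runs up to the last </body> in the input. The pattern is also case-sensitive, so it would need the i flag to handle markup written as <BODY>.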