Now this is actually standard (if somewhat counter-intuitive) behaviour in regular expressions in most languages, but it can be changed, for example, by setting the RegexOptions.Singleline option in .NET, the /s modifier in Perl, or the PCRE_DOTALL option in PHP.
However, there is a workaround. The \s character class matches any whitespace character (including carriage returns and line feeds), whereas the \S character class matches any non-whitespace character, i.e., anything not included in \s. Together the two classes cover every possible character, so using [\s\S] instead of the dot should do the trick.
For example, to extract the contents of the
section of an HTML document:
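A minimal sketch in Python, assuming the section in question is <body> and using a made-up sample document; the variable names are illustrative only. It compares the plain dot with [\s\S], and with Python's re.DOTALL flag, which plays the same role as the options mentioned above.

```python
import re

# Hypothetical sample document; the <body> spans several lines.
html = """<html>
<head><title>Demo</title></head>
<body>
<p>First line</p>
<p>Second line</p>
</body>
</html>"""

# By default the dot does not match a newline, so this pattern
# cannot bridge the line breaks between <body> and </body>.
dot_match = re.search(r"<body>(.*?)</body>", html)

# [\s\S] matches whitespace or non-whitespace, i.e. any character,
# so the same pattern now spans multiple lines.
any_match = re.search(r"<body>([\s\S]*?)</body>", html)
body = any_match.group(1) if any_match else None

# For comparison: re.DOTALL makes the dot itself match newlines.
flag_match = re.search(r"<body>(.*?)</body>", html, re.DOTALL)

print(dot_match)   # no match with the plain dot
print(repr(body))  # the multi-line contents of the section
```

The lazy quantifier *? makes the match stop at the first closing tag rather than the last one.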