Hi jh

Let me first say that if you intend to do that with a regex or even several regexes, I am afraid this is going to be quite difficult.

To quote from the documentation on the x modifier:

A single /x tells the regular expression parser to ignore most whitespace that is neither backslashed nor within a bracketed character class. You can use this to break up your regular expression into more readable parts. Also, the "#" character is treated as a metacharacter introducing a comment that runs up to the pattern's closing delimiter, or to the end of the current line if the pattern extends onto the next line. Hence, this is very much like an ordinary Perl code comment. (You can include the closing delimiter within the comment only if you precede it with a backslash, so be careful!)

Use of /x means that if you want real whitespace or "#" characters in the pattern (outside a bracketed character class, which is unaffected by /x), then you'll either have to escape them (using backslashes or \Q...\E ) or encode them using octal, hex, or \N{} escapes. It is ineffective to try to continue a comment onto the next line by escaping the \n with a backslash or \Q .

So it means, for example, that you can't simply remove everything that follows a # pound sign on a line: a pound sign inside a bracketed character class is a literal character, not the start of a comment. That in turn means you need to detect character classes (and that, in itself, is far from trivial). Also, for any pound sign you find, you need to check that it is not escaped by a backslash.
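To make the three cases concrete, here is a small sketch (the variable names are mine, purely for illustration). It shows a pound sign inside a character class, a backslashed pound sign, and an unescaped pound sign that really does start a comment:

```perl
use strict;
use warnings;

# '#' inside a bracketed character class is a literal, even under /x.
my $re_class = qr/ \A [#] \d+ \z /x;

# A backslashed '#' is a literal too.
my $re_escaped = qr/ \A \# \d+ \z /x;

# An unescaped '#' outside a class starts a comment that runs to the end
# of the line, so the \d+ on the next line is never part of the pattern.
my $re_comment = qr/ \A foo   # \d+ is only commentary here
                     \z /x;

for my $str ('#42', 'foo') {
    printf "%-4s class=%d escaped=%d comment=%d\n", $str,
        ($str =~ $re_class)   ? 1 : 0,
        ($str =~ $re_escaped) ? 1 : 0,
        ($str =~ $re_comment) ? 1 : 0;
}
```

A naive "delete everything after #" pass would break the first two patterns and only get the third one right.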

Assuming that you build a bunch of regexes dealing correctly with pound signs, you then need to deal with whitespace, which is also quite difficult, because whitespace must be kept when it is backslashed or sits inside a bracketed character class.

So, in brief, it is certainly possible to use regexes to do that, but it is likely to be complex and very difficult.
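Just to illustrate the amount of bookkeeping involved, here is a rough sketch of a character-by-character scanner built from small \G-anchored matches. The function name strip_x is hypothetical, and this is definitely not a complete /x parser: it assumes the pattern is already in a plain string, and it knows nothing about (?#...) comments, (?x-x:...) groups, POSIX classes such as [[:alpha:]], or \Q...\E. It also uses \s as an approximation of the whitespace that /x ignores.

```perl
use strict;
use warnings;

# Hypothetical helper: remove /x-style comments and whitespace from a
# pattern string. Handles only backslash escapes, simple [...] classes,
# '#' comments and unescaped whitespace.
sub strip_x {
    my ($pat) = @_;
    my $out      = '';
    my $in_class = 0;
    pos($pat) = 0;

    while (pos($pat) < length $pat) {
        if ($pat =~ /\G(\\.)/gcs) {
            # Backslash escape (including "\ " and "\#"): keep both characters.
            $out .= $1;
        }
        elsif ($in_class) {
            # Inside [...] everything is literal; an unescaped ']' closes it.
            $pat =~ /\G(.)/gcs;
            $out .= $1;
            $in_class = 0 if $1 eq ']';
        }
        elsif ($pat =~ /\G(\[\^?\]?)/gc) {
            # Start of a class; a ']' right after '[' or '[^' is a literal.
            $out .= $1;
            $in_class = 1;
        }
        elsif ($pat =~ /\G#[^\n]*/gc) {
            # Comment: drop everything up to the end of the line.
        }
        elsif ($pat =~ /\G\s+/gc) {
            # Unescaped whitespace outside a class: drop it.
        }
        else {
            # Any other character is kept as is.
            $pat =~ /\G(.)/gcs;
            $out .= $1;
        }
    }
    return $out;
}

my $verbose = ' \d+  # digits' . "\n" . ' [#a b] \# ';
print strip_x($verbose), "\n";
```

Even this limited version needs to track state (inside a class or not, escaped or not), which is why a single substitution is unlikely to do the job.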

FWIW, I can think of the following alternatives:

Maybe some other monk(s) will be able to suggest a better solution, but that's what I can think of at the moment.

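Please also note that, starting with Perl 5.26, there is also an /xx modifier with slightly different rules: in addition to everything /x does, it ignores spaces and tabs inside bracketed character classes.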
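A quick way to see the difference between the two modifiers, assuming Perl 5.26 or later (the variable names are again just for illustration):

```perl
use v5.26;
use warnings;

# Under /x the blank inside the class is a member of the class.
my $re_x  = qr/ \A [a b] \z /x;

# Under /xx blanks and tabs inside the class are ignored,
# so the class only contains 'a' and 'b'.
my $re_xx = qr/ \A [a b] \z /xx;

for my $char ('a', ' ') {
    say "'$char': /x ",  ($char =~ $re_x  ? 'matches' : 'fails'),
        ", /xx ",        ($char =~ $re_xx ? 'matches' : 'fails');
}
```

So any stripping code also has to know which of the two modifiers the regex was written for.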


In reply to Re: How to strip comments and whitespace from a regex defined with /x? by Laurent_R
in thread How to strip comments and whitespace from a regex defined with /x? by jh
