harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible to "break up" a regex so that it spans multiple lines?

Re: Multiline regex
by kcott (Archbishop) on Dec 18, 2022 at 04:04 UTC

    G'day harangzsolt33,

    "Is it possible to "break up" a regex so that it spans multiple lines?"

    The /x and /xx modifiers exist for this purpose. See "perlre: Modifiers: /x and /xx". There are quite a few gotchas associated with these.
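Here is a minimal sketch of what /x allows. The pattern is split over several lines and each piece carries its own comment. Nothing in it needs more than Perl 5.8. The name `$date_re` and the YYYY-MM-DD date format are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical example: an ISO-style date "YYYY-MM-DD".
# With /x, unescaped whitespace (including newlines) in the pattern is
# ignored, and "#" starts a comment that runs to the end of the line.
my $date_re = qr{
    \A
    (\d{4})    # year
    -
    (\d{2})    # month
    -
    (\d{2})    # day
    \z
}x;

for my $s ('2022-12-18', '18 Dec 2022') {
    # In list context a successful match returns the three captures;
    # a failed match returns an empty list, which is false here.
    if (my ($y, $m, $d) = $s =~ $date_re) {
        print "$s: year=$y month=$m day=$d\n";
    }
    else {
        print "$s: no match\n";
    }
}
```

Running it prints `2022-12-18: year=2022 month=12 day=18` followed by `18 Dec 2022: no match`.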

    In http://wzsn.net/perl/index.html, you say you're using "TinyPerl 5.8". I'm not familiar with this, but I'll assume it's a cut-down version of the standard "Perl 5.8"; I don't know what features or support it modifies or excludes. The following version notes refer to "Perl 5.8"; you'll need to adjust for any "TinyPerl 5.8" limitations. (Perl v5.8.0 was released over 20 years ago, see perlhist; you're missing out on many features, bug and security fixes, and optimisations with such an old version. An upgrade is recommended.)

    I personally find the /x modifier to be very helpful, particularly with respect to improved readability, and use it often (except for the simplest regexes). On the other hand, I'm not convinced that /xx offers equivalent enhancements; making changes can, on occasion, be tricky. Of course, those are my preferences; they're not recommendations, make your own choices.

    When using either /x or /xx, you need to be mindful of whitespace and hash characters. Here's a non-exhaustive demonstration of some of the similarities and differences:

    $ perl -E '
    my $x = qq{ A B\tC # comment};

    say q{Original string: |}, $x, q{|};

    say q{s/\s//g: |}, $x =~ s/\s//gr, q{|};
    say q{s/\s//gx: |}, $x =~ s/\s//grx, q{|};
    say q{s/\s//gxx: |}, $x =~ s/\s//grxx, q{|};

    say q{s/ //g: |}, $x =~ s/ //gr, q{|};
    say q{s/ //gx: |}, $x =~ s/ //grx, q{|};
    say q{s/ //gxx: |}, $x =~ s/ //grxx, q{|};

    say q{s/[ ]//g: |}, $x =~ s/[ ]//gr, q{|};
    say q{s/[ ]//gx: |}, $x =~ s/[ ]//grx, q{|};
    #say q{s/[ ]//gxx: |}, $x =~ s/[ ]//grxx, q{|};
    say q{s/[\ ]//gxx: |}, $x =~ s/[\ ]//grxx, q{|};

    say q{s/#//g: |}, $x =~ s/#//gr, q{|};
    say q{s/#//gx: |}, $x =~ s/#//grx, q{|};
    say q{s/#//gxx: |}, $x =~ s/#//grxx, q{|};

    say q{s/[ #]//g: |}, $x =~ s/[ #]//gr, q{|};
    say q{s/[ #]//gx: |}, $x =~ s/[ #]//grx, q{|};
    say q{s/[ #]//gxx: |}, $x =~ s/[ #]//grxx, q{|};
    '
    Original string: | A B	C # comment|
    s/\s//g: |ABC#comment|
    s/\s//gx: |ABC#comment|
    s/\s//gxx: |ABC#comment|
    s/ //g: |AB	C#comment|
    s/ //gx: | A B	C # comment|
    s/ //gxx: | A B	C # comment|
    s/[ ]//g: |AB	C#comment|
    s/[ ]//gx: |AB	C#comment|
    s/[\ ]//gxx: |AB	C#comment|
    s/#//g: | A B	C  comment|
    s/#//gx: | A B	C # comment|
    s/#//gxx: | A B	C # comment|
    s/[ #]//g: |AB	Ccomment|
    s/[ #]//gx: |AB	Ccomment|
    s/[ #]//gxx: | A B	C  comment|

    If I uncomment line 16 (s/[ ]//gxx), I get a single line of output:

    Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]/ at -e line 16.

    Line 17, with s/[\ ]//gxx, fixes this. It also demonstrates one of the traps for the unwary.
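The one-liner relies on -E/say (Perl 5.10), s///r (5.14) and /xx (5.26), so it won't run as-is on a 5.8 interpreter. The sketch below sticks to /x, copies the string before substituting instead of using /r, and uses print instead of say. It shows a few ways to keep a space or "#" literal inside an /x pattern. It should run on a plain Perl 5.8 but is untested on TinyPerl. The `@results` layout is just for display.

```perl
use strict;
use warnings;

# Same test string as in the one-liner: a leading space,
# a tab between B and C, and a "#".
my $x = " A B\tC # comment";

# Under /x a bare space or "#" in the pattern is NOT literal.
# Each pattern below uses a form that stays literal under /x.
# (my $y = $x) =~ s/...//  copies $x first, so $x itself is unchanged.
my @results = (
    [ 's/\ //gx',   do { (my $y = $x) =~ s/\ //gx;   $y } ],  # escaped space
    [ 's/\x20//gx', do { (my $y = $x) =~ s/\x20//gx; $y } ],  # hex escape for space
    [ 's/[ ]//gx',  do { (my $y = $x) =~ s/[ ]//gx;  $y } ],  # space in a class (/x only, not /xx)
    [ 's/\#//gx',   do { (my $y = $x) =~ s/\#//gx;   $y } ],  # escaped hash
    [ 's/[#]//gx',  do { (my $y = $x) =~ s/[#]//gx;  $y } ],  # hash in a class
);

print "$_->[0]: |$_->[1]|\n" for @results;
```

The three space-removing forms all give `AB<tab>C#comment`. The two hash-removing forms give ` A B<tab>C  comment`.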

    As well as adding modifiers to the end of m// and s///, you can also embed them in "Extended Patterns". I find this handy when used with qr//:

    my $re = qr{(?x: ... multiline regex pattern here ... )};
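Here is a small, filled-in version of the qr{(?x: ... )} idea. The names `$sub_decl` and `$sub_with_brace` and the sample lines are hypothetical. The pattern is a crude line-by-line match for `sub NAME`, not a Perl parser; for real dependency analysis a proper tool such as B::Xref is more reliable. The example also shows that /x applies only inside the (?x: ... ) group. Once the qr// is interpolated into a pattern without /x, the space before \{ is a literal space.

```perl
use strict;
use warnings;

# (?x: ... ) switches /x on only for the text inside the group,
# so this pattern can span several lines and carry comments.
my $sub_decl = qr{(?x:
    ^ \s*
    sub \s+
    ( [A-Za-z_] \w* )   # capture the subroutine name
)};

# Outside the (?x: ... ) group /x is not in effect, so the space
# before "\{" here must appear literally in the input.
my $sub_with_brace = qr/$sub_decl \{/;

my @lines = (
    'sub draw_line {',
    '    my ($x, $y) = @_;',
    '  sub fill_rect{',
    '# sub not_real',
);

for my $line (@lines) {
    if ($line =~ $sub_decl) {
        my $name  = $1;    # save before the next match resets $1
        my $brace = $line =~ $sub_with_brace ? 'yes' : 'no';
        print "$name: followed by ' {'? $brace\n";
    }
}
```

This reports `draw_line` (with " {") and `fill_rect` (without), and skips the commented-out line.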

    Examples of usage crop up here fairly often. A couple of my most recent offerings: "a fairly simple example"; "a more involved, and fully commented, example". And, from many years ago, "a very long and complex example, using qr{...}msx".

    — Ken

      Wow, thank you. That was really helpful! I also looked at your example. I use TinyPerl 5.8 because it is very small, and I don't do a lot with it. Lately I have been playing around with graphics, and I wrote a very long Perl script, which I would like to cut up into smaller pieces (maybe). For this purpose, I want to write a script that analyzes my script and creates a list of dependencies that shows which function depends on which other function. This is why I wanted to know if a multiline regex can exist.

        Rather than reimplementing something that tries to parse Perl (most likely poorly and/or incompletely), let Perl do it for you with B::Xref.

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

Re: Multiline regex
by johngg (Canon) on Dec 17, 2022 at 22:11 UTC

    Yes. See the x modifier in perlre.

    Cheers,

    JohnGG