in reply to Break a long regex across multiple lines of code, with comments

Use the 'x' modifier:

    m/
        (?<=a)   # positive lookbehind for 'a'
        sd       # literal 'sd'
        [^e]     # ensure there isn't an 'e' here
    /x;
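One thing to watch for with /x: unescaped whitespace and '#' stop being literal, so they have to be escaped or put in a character class. A minimal sketch, assuming a made-up input line (the string and the variable names are hypothetical, not from the original question):

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $line = 'id #42: disk full';

my $re = qr/
    ^ id [ ]      # literal 'id' and one space (bare spaces are ignored under x)
    \# (\d+)      # a literal hash sign must be escaped; capture the number
    : \s+         # colon, then whitespace
    (.+) \z       # capture the rest of the line as the message
/x;

if ( my ( $id, $msg ) = $line =~ $re ) {
    print "id=$id msg=$msg\n";
}
```

Keep the regex delimiter out of the comments as well: Perl finds the closing delimiter before it looks at the /x comments, so a stray '/' in a comment would end the pattern early.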

To help troubleshoot, you can enable use re 'debug'; or run the pattern through YAPE::Regex::Explain.
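A small sketch of the first option, assuming a throwaway test string ($string here is hypothetical). The pragma is lexically scoped, so wrapping it in a block keeps the debugging output limited to the one regex you care about:

```perl
use strict;
use warnings;

my $string = 'asdx';   # hypothetical test input

{
    # Only regexes compiled and run inside this block emit
    # debugging output, which goes to STDERR.
    use re 'debug';
    print "match\n" if $string =~ m/ (?<=a) sd [^e] /x;
}

# Outside the block, matching is silent again.
print "also matches quietly\n" if $string =~ /sd/;
```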

You might also want to show the regex, a bit of surrounding code, and a sample of your data, as there may be more efficient or cleaner ways to do this than one long regex. We won't know, though, unless you provide more details.

-stevieb

Re^2: [OT: Pedantic] Break a long regex across multiple lines of code, with comments
by AnomalousMonk (Archbishop) on Sep 23, 2015 at 03:19 UTC
    [^e]     # ensure there isn't an 'e' here

    stevieb: A small point, but occasionally a very important one (measured in terms of how much of your hair you may pull out): your comment isn't quite right. For instance, in the string 'asd', 'sd' is not followed by an 'e', but it will not match:

        c:\@Work\Perl>perl -wMstrict -le
        "$_ = 'asd';
         ;;
         print 'match' if m/ (?<=a) sd [^e] /x;
         print 'qed';
        "
        qed

    That's because  [^e] asserts that a character must be present, and that character must not be an 'e'. (The string 'asdx' will match.)

    To assert simply "an 'e' must not be present" and have a match with 'asd', use a negative look-ahead:

        c:\@Work\Perl>perl -wMstrict -le
        "$_ = 'asd';
         ;;
         print 'match' if m/ (?<=a) sd (?! e) /x;
         print 'qed';
        "
        match
        qed

    (This will also match 'asdx', but will not match 'asde'.)

    See the "Look-Around Assertions" sub-section of the "Extended Patterns" section of perlre. See also perlretut.

    Update: Just making the character class optional with  [^e]? won't work because 'asde' will then match. You could exclude 'asde' while matching 'asd' and 'asdx' by adding the  \z "absolute end of string" assertion
        [^e]? \z
    but then 'asdxy' won't match! (Would maybe  [^e]* \z work? Have to know the precise data.)
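To compare the variants discussed above side by side, here is a minimal sketch that runs each one against the four sample strings from this reply. The labels and the loop are only scaffolding for illustration:

```perl
use strict;
use warnings;

my @strings  = qw(asd asdx asde asdxy);
my @patterns = (
    [ '[^e]'     => qr/ (?<=a) sd [^e]     /x ],
    [ '(?! e)'   => qr/ (?<=a) sd (?! e)   /x ],
    [ '[^e]?'    => qr/ (?<=a) sd [^e]?    /x ],
    [ '[^e]? \z' => qr/ (?<=a) sd [^e]? \z /x ],
    [ '[^e]* \z' => qr/ (?<=a) sd [^e]* \z /x ],
);

for my $p (@patterns) {
    my ( $label, $re ) = @$p;
    # Collect the sample strings this variant matches.
    my @hits = grep { $_ =~ $re } @strings;
    printf "%-10s matches: %s\n", $label, join( ' ', @hits ) || '(none)';
}
```

On these four strings the look-ahead and the [^e]* \z variant accept the same set, but they are not equivalent in general: [^e]* \z rejects any string with an 'e' anywhere after 'sd' (for example 'asdxe'), which is why the precise data matters.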

    Update 2: A more accurate comment for the original regex element would be
        [^e]     # ensure a character is present that is not an 'e'


    Give a man a fish:  <%-{-{-{-<

      Thanks AnomalousMonk for pointing this out. I will vet my example code more closely before posting it, especially when I make direct claims of functionality.

      I know better.

      -stevieb

Re^2: Break a long regex across multiple lines of code, with comments
by davidfilmer (Sexton) on Sep 22, 2015 at 23:24 UTC
    Thank you, stevieb.

    >>> You might be inclined to show the regex, a bit of surrounding code, and a sample of your data, as there may be more efficient/cleaner ways to do this instead of using one long regex.

    Thanks. Here's my demonstrator program, which works properly (though perhaps not efficiently):

        #!/usr/bin/perl
        use strict;

        my $string = join ( "\n", <DATA> );  #slurp it all into a string with newlines

        my( $configuration, $memory, $serial_number ) = (
           $string =~ /System Configuration:\s+([\w\s]*?)\n.*Memory size:\s+(\d+).*Chassis Serial Number\W+(\w+)/s
        );

        print(
           "System Configuration: '$configuration'\n",
           "Memory Size: '$memory'\n",
           "Serial Number: '$serial_number'\n\n",
        );

        __DATA__
        ============================ FW Version ============================
        la la la
        System Configuration: Oracle Corporation sun4v SPARC Enterprise T5220
        la la
        Memory size: 65408 Megabytes
        Version
        ------------------------------------------------------------
        Sun System Firmware 7.4.7 2014/01/14 18:48
        ====================== System PROM revisions =======================
        Version
        ------------------------------------------------------------
        OBP 4.33.6.e 2014/01/14 15:19
        Chassis Serial Number
        ---------------------
        FDL10792DE
        la la

    OUTPUT

        System Configuration: 'Oracle Corporation sun4v SPARC Enterprise T5220'
        Memory Size: '65408'
        Serial Number: 'FDL10792DE'

      Hello davidfilmer,

      Here's my demonstrator program, which works properly

      It will work properly only as long as you have no more than one configuration/memory/serial_no dataset in the file. As soon as you add a second set, the regex fails.

      That’s because there are two occurrences of .* in the regex which are greedy, but need to be made non-greedy: .*?
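A minimal sketch of the difference, assuming two abbreviated datasets back to back (the second dataset, its values, and the variable names are invented for illustration). The patterns are the original regex written with /x, once with greedy .* and once with .*?; the /g loop over the non-greedy version is an extra assumption about how you might want to walk through several datasets:

```perl
use strict;
use warnings;

# Hypothetical input: two abbreviated datasets back to back.
my $string = <<'END';
System Configuration: Oracle Corporation sun4v SPARC Enterprise T5220
Memory size: 65408 Megabytes
Chassis Serial Number
---------------------
FDL10792DE
System Configuration: Oracle Corporation sun4v SPARC Enterprise T5240
Memory size: 32768 Megabytes
Chassis Serial Number
---------------------
ABC12345XY
END

my $greedy = qr/
    System [ ] Configuration: \s+ ([\w\s]*?) \n   # configuration line
    .*  Memory [ ] size: \s+ (\d+)                # greedy skip, then the size
    .*  Chassis [ ] Serial [ ] Number \W+ (\w+)   # greedy skip, then the serial
/xs;

my $lazy = qr/
    System [ ] Configuration: \s+ ([\w\s]*?) \n   # configuration line
    .*? Memory [ ] size: \s+ (\d+)                # lazy skip stops at the nearest size
    .*? Chassis [ ] Serial [ ] Number \W+ (\w+)   # lazy skip stops at the nearest serial
/xs;

# Greedy: the first configuration gets paired with the LAST size and serial.
my @mixed = $string =~ $greedy;
print "greedy: ", join( ' | ', @mixed ), "\n";

# Non-greedy with g: each pass picks up one complete dataset.
while ( $string =~ /$lazy/g ) {
    print "lazy:   $1 | $2 | $3\n";
}
```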

      BTW, why don’t you use warnings? Also, why are you adding an extra newline to the end of each line of input data? Simply joining on the empty string would make $string contain the same data as in the file. But the usual Perl idiom for slurping is this (which is simpler):

      my $string = do { local $/; <DATA>; };
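To see the effect of the extra newlines without a DATA section, here is a small sketch that reads the same text twice through an in-memory filehandle. The sample text and the variable names are assumptions for illustration only:

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";   # hypothetical file contents

# Reading in list context keeps each line's trailing newline,
# so joining on "\n" doubles every line break.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $doubled = join "\n", <$fh>;
close $fh;

# Undefining $/ inside the do-block makes a single read return everything.
open $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $slurped = do { local $/; <$fh> };
close $fh;

print "join on newline doubled the line breaks\n" if $doubled =~ /\n\n/;
print "slurp is identical to the source\n"        if $slurped eq $text;
```

Because local restores $/ when the do-block exits, later reads elsewhere in the program go back to line-at-a-time behaviour.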

      Hope that helps,

      Athanasius <°(((>< contra mundum
      Iustus alius egestas vitae, eros Piratica,