Re: Break a long regex across multiple lines of code, with comments

Re^2: [OT: Pedantic] Break a long regex across multiple lines of code, with comments
by AnomalousMonk (Archbishop) on Sep 23, 2015 at 03:19 UTC

[^e] # ensure there isn't an 'e' here

stevieb: A small point, but occasionally a very important one (measured in terms of how much of your hair you may pull out): your comment isn't quite right. For instance, in the string 'asd', 'sd' is not followed by an 'e', but it will not match:

c:\@Work\Perl>perl -wMstrict -le
"$_ = 'asd';
 ;;
 print 'match' if
   m/
     (?<=a)
     sd
     [^e]
   /x;
 print 'qed';
"
qed

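The character class [^e] asserts that a character must be present at that position and that it must not be an 'e'. A string such as 'asdx' would match.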

To assert simply "an 'e' must not be present" and have a match with 'asd', use a negative look-ahead:

c:\@Work\Perl>perl -wMstrict -le
"$_ = 'asd';
 ;;
 print 'match' if
   m/
     (?<=a)
     sd
     (?! e)
   /x;
 print 'qed';
"
match
qed

With the negative look-ahead, 'asdx' also matches, and 'asde' does not.

See the "Look-Around Assertions" sub-section of the "Extended Patterns" section of perlre. See also perlretut.

Update: Just making the character class optional with [^e]? won't work, because 'asde' will then match. You could exclude 'asde' while still matching 'asd' and 'asdx' by adding the \z "absolute end of string" assertion:
[^e]? \z
but then 'asdxy' won't match! (Would [^e]* \z work, maybe? You'd have to know the precise data.)
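
For what it's worth, here is a small sketch that runs each of the endings discussed above against the four sample strings. The matches_for helper and the @candidates table are names made up for this illustration; only the (?<=a) sd prefix and the candidate endings come from the post.

```perl
use strict;
use warnings;

# The four sample strings mentioned in the post.
my @samples = qw(asd asdx asde asdxy);

# matches_for() is a hypothetical helper: it returns, space-separated,
# the samples that the given compiled regex matches.
sub matches_for {
    my ($re) = @_;
    return join ' ', grep { $_ =~ $re } @samples;
}

# Each candidate ending is appended to the same (?<=a) sd prefix.
my @candidates = (
    [ '[^e]',     qr/ (?<=a) sd [^e]     /x ],   # a non-'e' character must be present
    [ '(?! e)',   qr/ (?<=a) sd (?! e)   /x ],   # no 'e' may follow; nothing is consumed
    [ '[^e]?',    qr/ (?<=a) sd [^e]?    /x ],   # optional, so it can match nothing at all
    [ '[^e]? \z', qr/ (?<=a) sd [^e]? \z /x ],   # at most one non-'e', then end of string
    [ '[^e]* \z', qr/ (?<=a) sd [^e]* \z /x ],   # any run of non-'e', then end of string
);

for my $candidate (@candidates) {
    my ( $label, $re ) = @$candidate;
    printf "%-9s matches: %s\n", $label, matches_for($re);
}
```

Of these, only (?! e) and [^e]* \z accept 'asd', 'asdx' and 'asdxy' while rejecting 'asde'. They are still not equivalent: [^e]* \z also rejects a string like 'asdxe', which (?! e) accepts, so the right choice does depend on the data.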

Update 2: A more accurate comment for the original regex element would be
[^e] # ensure a character is present that is not an 'e'

Give a man a fish: <%-{-{-{-<

Re^3: [OT: Pedantic] Break a long regex across multiple lines of code, with comments

by stevieb (Canon) on Sep 23, 2015 at 23:43 UTC

Thanks AnomalousMonk for pointing this out. I will vet my example code more closely before posting it, especially when I make direct claims of functionality.

I know better.

-stevieb

Re^2: Break a long regex across multiple lines of code, with comments
by davidfilmer (Sexton) on Sep 22, 2015 at 23:24 UTC

>>> You might be inclined to show the regex, a bit of surrounding code, and a sample of your data, as there may be more efficient/cleaner ways to do this instead of using one long regex.

Thanks. Here's my demonstrator program, which works properly (though perhaps not efficiently):

#!/usr/bin/perl

use strict;

my $string = join ( "\n", <DATA> );    #slurp it all into a string with newlines

my( $configuration, $memory, $serial_number ) =
   ( $string =~ /System Configuration:\s+([\w\s]*?)\n.*Memory size:\s+(\d+).*Chassis Serial Number\W+(\w+)/s );

print(
   "System Configuration: '$configuration'\n",
   "Memory Size:          '$memory'\n",
   "Serial Number:        '$serial_number'\n\n",
);

__DATA__
============================ FW Version ============================
la
la
la
System Configuration: Oracle Corporation sun4v SPARC Enterprise T5220
la
la

Memory size: 65408 Megabytes



Version
------------------------------------------------------------
 Sun System Firmware 7.4.7 2014/01/14 18:48


====================== System PROM revisions =======================
Version
------------------------------------------------------------
OBP 4.33.6.e 2014/01/14 15:19

Chassis Serial Number
---------------------
FDL10792DE
la
la

System Configuration: 'Oracle Corporation sun4v SPARC Enterprise T5220'
Memory Size:          '65408'
Serial Number:        'FDL10792DE'
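
Since the thread is about breaking a long regex across lines, here is one way the same pattern might be laid out with the /x modifier. Treat it as a sketch, not a drop-in replacement: the sample text is a shortened, hypothetical version of the data above, held in a heredoc (END_REPORT is a made-up tag) instead of being read from DATA. The point to watch is that /x ignores unescaped whitespace in the pattern, so the literal spaces in the labels are written as [ ].

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, hypothetical sample of the report text from the post.
my $string = <<'END_REPORT';
la
System Configuration: Oracle Corporation sun4v SPARC Enterprise T5220
la

Memory size: 65408 Megabytes

Chassis Serial Number
---------------------
FDL10792DE
la
END_REPORT

my ( $configuration, $memory, $serial_number ) = $string =~ m{
    System [ ] Configuration: \s+    # label; a literal space is written as [ ]
    ( [\w\s]*? ) \n                  # capture 1: lazily, up to the first newline
    .*                               # skip ahead; with the s modifier, dot matches newlines
    Memory [ ] size: \s+
    ( \d+ )                          # capture 2: the number of megabytes
    .*
    Chassis [ ] Serial [ ] Number
    \W+                              # newlines and the dashed underline
    ( \w+ )                          # capture 3: the serial number
}xs;

print(
   "System Configuration: '$configuration'\n",
   "Memory Size:          '$memory'\n",
   "Serial Number:        '$serial_number'\n\n",
);
```

The m{...} delimiters are used here only so that comments can be written without worrying about a stray slash ending the pattern early; comments inside an /x regex must still avoid unbalanced braces and the $ and @ sigils, which would be interpolated.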

Re^3: Break a long regex across multiple lines of code, with comments

by Athanasius (Archbishop) on Sep 23, 2015 at 07:59 UTC

Hello davidfilmer,

Here's my demonstrator program, which works properly

It will work properly only as long as you have no more than one configuration/memory/serial_no dataset in the file. As soon as you add a second set, the regex fails: