Re: nested reg ex over multiple lines

On reflection, there is a way to do this kind of parsing using regexes. I think of it as the "inch along with negative lookahead" strategy described (sort of) by Merlyn at Death to Dot Star. I believe something like this does what you described above.

use strict;
use warnings;

my $content = "";
while (<DATA>) {
    $content .= $_;
}

#print "content: $content"; # sanity check

while ($content =~m/(
                    CALCON\([^)]*?\)[\r\n]*\{[^}]*?\}            # entire match. Same as in negative lookahead on next line.
                    ((?!CALCON\([^)]*?\)[\r\n]*\{[^}]*?\}).)* # inch along with negative lookahead
                    )/xsmg){
    my $entire_match = $1;
    if ($entire_match =~ /CALCON\((.*?)\)/) {
        my $test_number = $1;
        print "entire match: $entire_match\n";
        print "test number: $test_number\n";
        print "\n\n";
    }
    
    
}
__DATA__

CALCON(test1)
{
  TYPE(U8)
  FEATURE(DCOM)
  NAM(stmin)
  LABEL(Min seperation time between CFs)
  MIN(0)
  MAX(127)
  UNITS(ms)  
}

CALCON(test2)
{
  TYPE(U16)
  FEATURE(DCOM)
  NAM(dcomc_sestmr_timeout)
  LABEL(DCOM Session Timer Timeout)
  MIN(0)
  MAX(65535)
  UNITS(ms)
}

CALCON(test3)
{
  TYPE(U16)
  FEATURE(CALCON)
  NAM(dcomc_sestmr_timeout)
  LABEL(DCOM Session Timer Timeout)
  MIN(CALCON)
  MAX(65535)
  UNITS(ms)
}

This may be a case of killing a mosquito with a flamethrower, but... well... TIMTOWTDI. Maybe you like it :)

But seriously, an internal rule of thumb for me is that when I start having to inch along, it may be time to stop thinking regexes and start thinking something else.

Disclaimer: this works for your input data, but it makes me a little uneasy. Are there edge cases I haven't thought of? Maybe. That's why the gut still says, uh oh, reach for P::RD (Parse::RecDescent).
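For comparison, here is a minimal sketch of what "something else" could look like without pulling in P::RD: a plain line-by-line state machine. This is not Parse::RecDescent, just a hand-rolled loop, and it assumes each header, brace and field sits on its own line as in your sample. The sample data and the names @records, $current and $in_body are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample in the same shape as the data in this thread.
my @lines = split /\n/, <<'END';
CALCON(test1)
{
  TYPE(U8)
  MIN(0)
}

CALCON(test3)
{
  FEATURE(CALCON)
  MIN(CALCON)
}
END

my (@records, $current, $in_body);
for my $line (@lines) {
    if (!$in_body && $line =~ /^CALCON\((.*?)\)\s*$/) {
        # Header line outside a body: start a new record.
        $current = { name => $1, fields => [] };
    }
    elsif ($current && !$in_body && $line =~ /^\{\s*$/) {
        # Opening brace: field lines follow.
        $in_body = 1;
    }
    elsif ($in_body && $line =~ /^\}\s*$/) {
        # Closing brace: the record is complete.
        push @records, $current;
        ($current, $in_body) = (undef, 0);
    }
    elsif ($in_body && $line =~ /^\s*(\w+)\((.*)\)\s*$/) {
        # KEY(value) line inside the body.
        push @{ $current->{fields} }, [ $1, $2 ];
    }
}

for my $rec (@records) {
    print "test number: $rec->{name}\n";
    print "  $_->[0] = $_->[1]\n" for @{ $rec->{fields} };
}
```

Because the loop knows whether it is inside a body, a "CALCON" among the field values never gets mistaken for a new record, so no lookahead is needed.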

UPDATE: Replaced the $& with $1 per holli below.
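A small sketch of two ways to get at the matched text without touching $&, in case it helps. The first is the explicit capture group used in the code above. The second uses the /p modifier with ${^MATCH}, which as far as I know needs Perl 5.10 or later. The string and variable names here are just for illustration.

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^MATCH} need at least this version

my $text = "CALCON(test1) { TYPE(U8) }";

# Option 1: wrap the whole pattern in a capture group and read $1.
my $via_capture;
if ($text =~ /(CALCON\([^)]*\))/) {
    $via_capture = $1;
}

# Option 2: ask for the match to be preserved with /p and read ${^MATCH}.
my $via_p;
if ($text =~ /CALCON\([^)]*\)/p) {
    $via_p = ${^MATCH};
}

print "capture: $via_capture\n";
print "/p:      $via_p\n";
```

Both variables should end up holding the same text, CALCON(test1).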

UPDATE 2: Made the "inch ahead" a bit more thorough, so it doesn't fail on "CALCON" in the data area, as in the third test case. Originally this was just

$content =~m/(CALCON((?!CALCON).)* )/xsmg
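To make the difference concrete, here is a cut-down sketch comparing the two guards on a made-up two-record sample whose second record mentions CALCON in its body. The names $record, @simple and @strict are mine, and I've used non-capturing groups so that a list-context match returns one element per chunk.

```perl
use strict;
use warnings;

# Hypothetical sample; the second record has CALCON inside the body.
my $content = <<'END';
CALCON(test1)
{
  TYPE(U8)
}

CALCON(test3)
{
  FEATURE(CALCON)
  MIN(CALCON)
}
END

# Simple guard: stop inching before any bare CALCON.
my @simple = $content =~ /(CALCON(?:(?!CALCON).)*)/sg;

# Stricter guard: stop only before something shaped like a whole record.
my $record = qr/CALCON\([^)]*\)[\r\n]*\{[^}]*\}/;
my @strict = $content =~ /($record(?:(?!$record).)*)/sg;

printf "simple guard: %d chunks\n", scalar @simple;   # prints 4
printf "strict guard: %d chunks\n", scalar @strict;   # prints 2
```

The simple guard splits the second record at every CALCON in its body, giving four chunks. The stricter guard consumes the whole body as part of the record first, so it gives one chunk per record.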


Replies are listed 'Best First'.
Re^2: nested reg ex over multiple lines by holli (Abbot) on Jun 20, 2005 at 14:07 UTC
Are you aware of the runtime drawbacks that $& (and its brethren $' and $`) impose?
Re^3: nested reg ex over multiple lines by tphyahoo (Vicar) on Jun 20, 2005 at 14:10 UTC
Yeah, but to be honest, I had kind of forgotten about them when I posted the above. I was just all into the inch along with negative lookahead thing. Basically, the $& construct is slow, and might not be supported into the future. (Right?) What's the "right" way to do this again? UPDATE: Changed above code to use $1 instead.
Re^4: nested reg ex over multiple lines by holli (Abbot) on Jun 20, 2005 at 14:15 UTC
... `while ($content =~m/(CALCON((?!CALCON).))/smg){ my $entire_match = $1;` ... holli, /regexed monk/*