in reply to nested reg ex over multiple lines

On reflection, there is a way to do this kind of parsing using regexes. I think of it as the "inch along with negative lookahead" strategy, described (sort of) by Merlyn at Death to Dot Star. Something like this does what you described above, I believe.

    use strict;
    use warnings;

    my $content = "";
    while (<DATA>) {
        $content .= $_;
    }
    #print "content: $content"; # sanity check

    while ($content =~ m/(
        CALCON\([^)]*?\)[\r\n]*\{[^}]*?\}            # entire match. Same as in negative lookahead on next line.
        ((?!CALCON\([^)]*?\)[\r\n]*\{[^}]*?\}).)*    # inch along with negative lookahead
        )/xsmg) {
        my $entire_match = $1;
        if ($entire_match =~ /CALCON\((.*?)\)/) {
            my $test_number = $1;
            print "entire match: $entire_match\n";
            print "test number: $test_number\n";
            print "\n\n";
        }
    }

    __DATA__
    CALCON(test1)
    {
      TYPE(U8)
      FEATURE(DCOM)
      NAM(stmin)
      LABEL(Min seperation time between CFs)
      MIN(0)
      MAX(127)
      UNITS(ms)
    }
    CALCON(test2)
    {
      TYPE(U16)
      FEATURE(DCOM)
      NAM(dcomc_sestmr_timeout)
      LABEL(DCOM Session Timer Timeout)
      MIN(0)
      MAX(65535)
      UNITS(ms)
    }
    CALCON(test3)
    {
      TYPE(U16)
      FEATURE(CALCON)
      NAM(dcomc_sestmr_timeout)
      LABEL(DCOM Session Timer Timeout)
      MIN(CALCON)
      MAX(65535)
      UNITS(ms)
    }

This may be a case of killing a mosquito with a flamethrower, but... well... TIMTOWTDI. Maybe you'll like it :)

But seriously, my personal rule of thumb is that when I have to start inching along, it may be time to stop thinking in regexes and start thinking about something else.

Disclaimer: this works for your input data, but it makes me a little uneasy. There may be edge cases I haven't thought of. That's why my gut still says, uh oh, reach for P::RD (Parse::RecDescent).
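For comparison, here is one possible "something else": a minimal line-oriented state machine. This is a hand-rolled sketch, not Parse::RecDescent, and it assumes each CALCON header, each brace, and each FIELD(value) sits on its own line, as in the sample data. The names %calcon, $current and $in_body are illustrative only. Because the body is read field by field, a "CALCON" inside a value (as in test3) is just data.

```perl
use strict;
use warnings;

# Trimmed sample in the same layout as the thread's __DATA__.
my $input = <<'END';
CALCON(test1)
{
  TYPE(U8)
  MIN(0)
}
CALCON(test3)
{
  FEATURE(CALCON)
  MIN(CALCON)
}
END

my %calcon;       # block name => { FIELD => value } (hypothetical layout)
my $current;      # name of the block being read, undef between blocks
my $in_body = 0;  # true between the "{" and "}" lines

for my $line (split /\n/, $input) {
    if (!$in_body && $line =~ /^\s*CALCON\(([^)]*)\)\s*$/) {
        # Header line: start a new block.
        $current = $1;
        $calcon{$current} = {};
    }
    elsif (defined $current && $line =~ /^\s*\{\s*$/) {
        $in_body = 1;                       # opening brace
    }
    elsif ($in_body && $line =~ /^\s*\}\s*$/) {
        ($in_body, $current) = (0, undef);  # closing brace ends the block
    }
    elsif ($in_body && $line =~ /^\s*(\w+)\((.*)\)\s*$/) {
        $calcon{$current}{$1} = $2;         # FIELD(value) inside a body
    }
}

for my $name (sort keys %calcon) {
    print "$name: MIN=$calcon{$name}{MIN}\n";
}
```

Each branch only has to recognize one kind of line, so there is no lookahead to get wrong; the trade-off is that it relies on the one-item-per-line layout.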

UPDATE: Replaced the $& with $1 per holli below.

UPDATE 2: Made the "inch along" more thorough, so it doesn't fail when "CALCON" appears in the data area, as in the third test case. Originally this was just

$content =~m/(CALCON((?!CALCON).)* )/xsmg
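To see why the update was needed, the sketch below runs the original and the updated pattern over a trimmed copy of the test data. Collecting all matches in list context, and switching the inner group to non-capturing (?:...) so each match yields exactly one string, are my changes for the demonstration; the patterns are otherwise the ones from this post.

```perl
use strict;
use warnings;

# One ordinary block plus the troublesome third test case (trimmed).
my $content = <<'END';
CALCON(test2)
{
  TYPE(U16)
}
CALCON(test3)
{
  FEATURE(CALCON)
  MIN(CALCON)
}
END

# Original inch-along: stops at ANY occurrence of "CALCON",
# including the ones inside FEATURE(...) and MIN(...).
my @naive = $content =~ m/(CALCON(?:(?!CALCON).)*)/sg;

# Updated inch-along: stops only where a whole CALCON(...){...} block begins.
my @thorough = $content =~ m/(
    CALCON\([^)]*?\)[\r\n]*\{[^}]*?\}
    (?:(?!CALCON\([^)]*?\)[\r\n]*\{[^}]*?\}).)*
)/xsg;

printf "naive: %d pieces, thorough: %d pieces\n",
    scalar @naive, scalar @thorough;
# prints "naive: 4 pieces, thorough: 2 pieces"
```

The naive version splits test3 into three fragments at each embedded "CALCON"; the updated one keeps each block whole.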

Re^2: nested reg ex over multiple lines
by holli (Abbot) on Jun 20, 2005 at 14:07 UTC
    Are you aware of the runtime drawbacks that $& (and his brethren $' and $`) impose?
      Yeah, but to be honest, I had kind of forgotten about them when I posted the above. I was just all into the inch along with negative lookahead thing.

      Basically, once $& appears anywhere in a program, every regex match in that program gets slower, not just the one that uses it, and it might not be supported in the future. (Right?) What's the "right" way to do this again?

      UPDATE: Changed above code to use $1 instead.

        ...
        while ($content =~m/(CALCON((?!CALCON).)*)/smg){ my $entire_match = $1;
        ...
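Besides wrapping the whole pattern in capturing parentheses and reading $1, as the update does, the match-offset arrays @- and @+ can rebuild the whole match without mentioning $& anywhere. A small sketch; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "xx CALCON(test1) yy";
my ($whole, $name);

if ($text =~ /CALCON\((\w+)\)/) {
    # $-[0] and $+[0] are the start and end offsets of the entire match,
    # so this substr is the same text $& would hold.
    $whole = substr($text, $-[0], $+[0] - $-[0]);
    $name  = $1;    # first capture group, as in the updated code
}

print "whole: $whole\nname: $name\n";
```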


        holli, /regexed monk/