Al Shiferaw has asked for the wisdom of the Perl Monks concerning the following question:

There is a single IF/ENDIF statement that contains X number of single-level IF/THEN statements. E.g.

IF(A) anytext
IF(B) anytext ENDIF
IF(C) anytext ENDIF
and on ....
ENDIF

desired output:
$1 = IF(A) anytext IF(B) anytext ENDIF IF(C) anytext ENDIF ENDIF
$2 = IF(B) anytext ENDIF
$3 = IF(C) anytext ENDIF

Replies are listed 'Best First'.
Re: How to Extract Nested IF/THEN elements
by demerphq (Chancellor) on Jan 24, 2002 at 17:34 UTC
    Arbitrarily nested structures can't be matched by proper regular expressions. Luckily, Perl's regular expression engine isn't all that regular, so it probably is possible to do this. (japhy knows how, I'm sure.)

    However, it would be so much easier to do it with Text::Balanced or Parse::RecDescent that it's unlikely anyone would try. Here is a solution using Text::Balanced:

    use Text::Balanced 'extract_tagged';
    use Data::Dumper;

    # Collect every IF ... ENDIF block found in $str into @$list.
    sub get_ifs {
        my $str  = shift;
        my $list = shift || [];
        my ($extracted, $remainder, $prefix, $open, $inside, $close) =
            extract_tagged($str, "IF", "ENDIF", "(?s).*?(?=IF)");
        if ($extracted) {
            push @$list, $extracted;
            get_ifs($inside, $list);                   # blocks nested inside this one
            get_ifs($remainder, $list) if $remainder;  # blocks that follow this one
        }
        return $list;
    }

    print Dumper(get_ifs(<<ENDOFIFS));
    IF(A) anytext IF(B) IF(C) anytext ENDIF IF(D) anytext ENDIF ENDIF ENDIF
    ENDOFIFS
      What my first sentence meant to say was:

      A formal regular expression can only match a nested structure down to a fixed depth. Thus if we wanted to match to a depth of, say, 6 we could do it, but if we want to match to any depth a formal regex won't work. And as I said above, Perl's regexes aren't formal regexes. They are decidedly irregular, as they have support for backrefs.
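For what it's worth, Perl 5.10 and later also support recursive patterns, which is one way the "any depth" case can be handled by the regex engine itself. The following is only a minimal sketch under that version assumption, not something posted in this thread; the names `$block` and `nested_blocks` are made up for illustration. The pattern matches one balanced IF ... ENDIF block, and the helper re-scans the inside of each match to collect the nested blocks.

```perl
use strict;
use warnings;
use 5.010;    # named recursion, (?&name), needs Perl 5.10 or later

# One balanced block: IF, then any mix of nested blocks and single
# characters that do not begin an IF/ENDIF keyword, then ENDIF.
my $block = qr/
    (?<block>
        \bIF\b
        (?:
            (?&block)                     # a nested IF ... ENDIF block
          | (?! \b(?:IF|ENDIF)\b ) .      # any other single character
        )*
        \bENDIF\b
    )
/xs;

# Hypothetical helper: returns each block followed by the blocks nested in it.
sub nested_blocks {
    my ($text) = @_;
    my @found;
    while ( $text =~ /($block)/g ) {
        my $whole = $1;
        push @found, $whole;
        # Drop the leading "IF" (2 chars) and trailing "ENDIF" (5 chars),
        # then look for further blocks in what is left.
        my $inside = substr $whole, 2, length($whole) - 7;
        push @found, nested_blocks($inside);
    }
    return @found;
}

my $input  = "IF(A) anytext IF(B) anytext ENDIF IF(C) anytext ENDIF ENDIF";
my @result = nested_blocks($input);
print '$', $_ + 1, " = $result[$_]\n" for 0 .. $#result;
```

Input with an unbalanced IF simply fails to match here, so a caller that needs a diagnostic would have to check for that separately.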

      Yves / DeMerphq
      --
      When to use Prototypes?

      Thanks!
Re: How to Extract Nested IF/THEN elements
by I0 (Priest) on Jun 21, 2002 at 07:34 UTC
    $_ = " IF(A) anytext IF(B) anytext ENDIF IF(C) anytext ENDIF and on .... ENDIF ";
    @( = ('(','');
    @) = (')','');
    ($re=$_)=~s/((\bIF\b)|(\bENDIF\b)|.)/$([!$2]\Q$1\E$)[!$3]/gs;
    print join"\n\n",eval{/$re/},"";
    warn $@ if $@=~/unmatched/;
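As far as I can tell, the snippet above turns the input itself into a pattern: each IF gets a "(" in front of it, each ENDIF gets a ")" after it, and everything is quoted, so matching the text against its own pattern returns one capture per IF ... ENDIF block. The same pairing can be spelled out with an explicit stack, which may be easier to follow. This is a rough sketch rather than a drop-in replacement; `if_blocks` is a hypothetical name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every IF ... ENDIF block in $text,
# ordered by where each block starts (so the outer block comes first).
sub if_blocks {
    my ($text) = @_;
    my ( @open, @blocks );
    while ( $text =~ /\b(IF|ENDIF)\b/g ) {
        if ( $1 eq 'IF' ) {
            push @open, $-[0];    # offset where this block starts
        }
        else {
            die "ENDIF without a matching IF\n" unless @open;
            my $start = pop @open;
            # $+[0] is the offset just past the ENDIF that was matched.
            push @blocks, [ $start, substr( $text, $start, $+[0] - $start ) ];
        }
    }
    die "IF without a matching ENDIF\n" if @open;
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @blocks;
}

my $sample = "IF(A) anytext IF(B) anytext ENDIF IF(C) anytext ENDIF ENDIF";
my @found  = if_blocks($sample);
print '$', $_ + 1, " = $found[$_]\n" for 0 .. $#found;
```

Sorting by start offset is what puts the enclosing block in `$1` and the inner blocks after it, matching the order asked for in the question.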