
Yours is wrong.
    #!/usr/bin/perl

    use strict;
    use warnings 'all';

    use vars qw /$re/;

    $re = qr /begin (?: (?>[^be])* |(??{ $re }) | [be] )* end/x;

    sub pass {local $_ = shift; print   /^$re$/ ? "ok\n" : "not ok: $_\n"}
    sub fail {local $_ = shift; print ! /^$re$/ ? "ok\n" : "not ok: $_\n"}

    pass 'begin end';
    fail 'begin en';
    fail 'begin nd';
    pass 'begin begin end end';
    pass 'beginend';
    pass 'beginbeginbeginendendend';
    pass 'begin begin end begin begin end begin end end end';
    fail 'begin begin end begin egin end begin end end end';
    fail 'begin end begin end';

    __END__
    ok
    ok
    ok
    ok
    ok
    ok
    ok
    not ok: begin begin end begin egin end begin end end end
    not ok: begin end begin end
It matches strings that shouldn't be matched.
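For comparison, here is a minimal sketch of a pattern that passes the test suite above, assuming Perl 5.10 or later (it relies on named recursion with `(?<group>...)` and `(?&group)`). The key difference from the regex under discussion is that there is no catch-all `[be]` branch: the body may only contain characters that do not start `begin`/`end`, or a nested group. The helper name `is_nested` is hypothetical; like the thread's tests, keywords are treated as plain substrings, so a word such as "ending" would count as an `end`.

```perl
use strict;
use warnings;

# Sketch (Perl 5.10+): the WHOLE string must be one balanced group.
# Body = chars that do not start a keyword, or a nested group.
my $nested = qr/
    \A
    (?<group>
        begin
        (?:
            (?! begin | end ) .    # a char that does not start a keyword
          | (?&group)              # a nested begin ... end group
        )*
        end
    )
    \z
/xs;

# Hypothetical helper: 1 if the string is exactly one nested construct.
sub is_nested { return $_[0] =~ $nested ? 1 : 0 }

for my $case (
    [ 'begin end',                                         1 ],
    [ 'begin en',                                          0 ],
    [ 'begin nd',                                          0 ],
    [ 'begin begin end end',                               1 ],
    [ 'beginend',                                          1 ],
    [ 'beginbeginbeginendendend',                          1 ],
    [ 'begin begin end begin begin end begin end end end', 1 ],
    [ 'begin begin end begin egin end begin end end end',  0 ],
    [ 'begin end begin end',                               0 ],
) {
    my ($str, $want) = @$case;
    print is_nested($str) == $want ? "ok\n" : "not ok: $str\n";
}
```

Because a character either starts a keyword or it does not, the two branches in the body never compete, which keeps backtracking modest on failing inputs.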

Abigail

Re: Re: Matching nested begin/ends
by jryan (Vicar) on Aug 01, 2002 at 23:16 UTC

    It is not.

    We simply test differently; I tested with something like this (using your input):

        my $re = qr/ begin (?: (?> [^be]* ) |(??{ $re }) | [be] )* end /x;

        foreach (<DATA>) {
            chomp;
            my @matches = $_ =~ /($re)/g;
            print qq(For "$_":\n\t);
            print (@matches ? join("*",@matches) : "no matches", "\n");
        }

        __DATA__
        begin end
        begin en
        begin nd
        begin begin end end
        beginend
        beginbeginbeginendendend
        begin begin end begin begin end begin end end end
        begin begin end begin egin end begin end end end
        begin end begin end

    Which prints:

        For "begin end":
            begin end
        For "begin en":
            no matches
        For "begin nd":
            no matches
        For "begin begin end end":
            begin begin end end
        For "beginend":
            beginend
        For "beginbeginbeginendendend":
            beginbeginbeginendend
        For "begin begin end begin begin end begin end end end":
            begin begin end begin begin end begin end end
        For "begin begin end begin egin end begin end end end":
            begin begin end begin egin end begin end end
        For "begin end begin end":
            begin end*begin end
      That just means your test isn't good enough. You are testing whether "begin end begin end" *contains* a matched begin/end pair. My test, however, anchors the regex to the beginning and end of the string, and hence correctly flags "begin end begin end" as *not* being a nested begin/end construct.

      It's the same as how /\d/ is *not* a correct regex to test whether a string is a number: it only tests whether a string contains a digit. But if all you want to know is whether a string contains a begin/end delimited substring, all you need is /begin.*end/. No recursion required.
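To make the "contains" versus "is" distinction concrete, a small sketch; the helper names are hypothetical, and `is_integer` is a deliberate simplification that only accepts non-negative integers. An unanchored pattern asks whether *some* substring matches, while `\A ... \z` asks about the whole string.

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting the two questions.
sub contains_digit { return $_[0] =~ /\d/      ? 1 : 0 }  # some digit anywhere?
sub is_integer     { return $_[0] =~ /\A\d+\z/ ? 1 : 0 }  # whole string is digits?

# The begin/end version of "contains": SOME begin followed by SOME end.
# It says yes to unbalanced or sibling groups as well.
sub contains_begin_end { return $_[0] =~ /begin.*end/s ? 1 : 0 }

for my $s ('42', 'abc1', 'begin end begin end', 'beginbeginend') {
    printf "%-22s digit:%d integer:%d begin..end:%d\n",
        "'$s'", contains_digit($s), is_integer($s), contains_begin_end($s);
}
```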

      Abigail

        I again disagree. Delimited text is generally part of a larger document that needs to be processed. I can see your point that anchoring the regex is the best way to fully verify the text item's syntax. However, why would this be needed? The item must have followed some pattern to be extracted in the first place.

        Also, /begin.*end/ matches any begin that is followed somewhere by an end, so an unbalanced item like beginbeginend passes with no problems; hence recursion is needed.
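For the extraction use case described above, a sketch of the same recursive group used without anchors, assuming Perl 5.10 or later. With `/g` in list context and a single capture group, each match returns its outermost group; perlre notes that captures made inside a recursion are not visible after it returns, so nested groups do not show up separately. The document string is made up for illustration.

```perl
use strict;
use warnings;

# Sketch (Perl 5.10+): unanchored, so it FINDS groups inside a larger
# text instead of validating the whole string.
my $group = qr/
    (?<group>
        begin
        (?: (?! begin | end ) .    # a char that does not start a keyword
          | (?&group)              # a nested group
        )*
        end
    )
/xs;

# Made-up document, for illustration only.
my $doc = "intro begin a begin b end c end middle begin d end outro";

# One capture group, so list-context m//g returns each top-level group.
my @groups = $doc =~ /$group/g;
print "$_\n" for @groups;

# Unanchored, malformed input still yields whatever balanced part it contains.
my @found = 'beginbeginend' =~ /$group/g;
print "found: @found\n";
```

Here `@groups` holds `begin a begin b end c end` and `begin d end`, and `@found` holds only `beginend`.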