in reply to Re: Flattening REs into opcodes for Perl 6
in thread Flattening REs into opcodes for Perl 6

First of all, thanks for replying.

start: exactf "f", find_o advance goto start find_o: exactf "o" exactf "bar", done goto find_o done: end
How does that deal with being matched against, say, "afafoobarz"? Perhaps:
start: save exactf "f", find_o restore #to keep the balance advance goto start find_o: exactf "o", find_bar restore advance goto start find_bar: exactf "bar", done goto find_o ...
would be more correct? Actually, come to think of it, I don't see how it deals with a case like "babylon" (it doesn't match at all) either--maybe advance should have a parameter that it jumps to if it fails?

=cut
--Brent Dax
There is no sig.

Replies are listed 'Best First'.
Re: Re: Re: Flattening REs into opcodes for Perl 6
by japhy (Canon) on Nov 03, 2001 at 04:06 UTC
    Ooh, right, thanks for pointing that out. Let's see...
    start: exactf "f", find_o advance quit goto start find_o: # non-greedy, by the way goto find_bar find_o_real: exactf "o", find_o goto start find_bar: exactf "bar", done goto find_o_real done: end quit: fail
    I should point out that I haven't optimized find_o out for readability purposes, and that in "real life", this would look like:
    start: exactf "f", find_bar advance quit goto start find_o_real: exactf "o", find_bar goto start find_bar: exactf "bar", done goto find_o_real done: end quit: fail
    This feels like drawing a DFA... honestly. I'm not sure where save and restore are going to come into play in this particular example -- it seems not at all.

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Would this work? I don't have the ops integrated into Parrot's normal set yet, but I'd like to know if I'm on the right track. re_match is like calling RE_0 as a subroutine, and re_finished is like returning from it. The set is an assignment (in this case to Integer register 0); pretty much everything else that may look odd in here has been explained inline.
      re_match RE_0, "afoobarz" RE_0: # /fo*?bar/s (and yes, the "s" flag is pointless) re_flags "s" re_minlength 4 $start: #a 'local' label--it disappears the next time a normal # label is seen re_pushindex #in case we fail in $find_o or something re_literal "f", $find_bar re_popindex re_advance $failure branch $start #think "goto" $find_o: re_literal "o", $find_bar branch $start $find_bar: re_literal "bar", $success branch $find_o $success: set I0, 1 branch $end $failure: set I0, 0 $end: re_finished
      Looking at this, it seems that most of the time we use the second parameter we're trying to avoid a branch put in in case we fail; is this just a quirk of this example, or might it make more sense for the second parameter to represent where to jump if we fail?
      $start: re_pushindex re_literal "f", $advance $find_bar: re_literal "bar", $find_o #if we made it this far, we're done! set I0, 1 branch $end $find_o: re_literal "o", $advance branch $find_bar $advance: re_popindex re_advance $failure branch $start $failure: set I0, 0 $end: re_finished
      This would be a bigger win in the case of, say, several character classes in a row:
      RE_1: #/[az][by][cx]/ re_flags "" re_minlength 3 $start: re_pushindex re_oneof "az", $advance re_oneof "by", $advance re_oneof "cx", $advance set I0, 1 branch $end $advance: re_popindex re_advance $fail branch $start $fail: set I0, 0 $end: re_finished

      =cut
      --Brent Dax
      There is no sig.

        (The previous content was erroneous.)

        It looks good; I'm tired, though. I'm going to sleep, and I'm going to be programming C all day (well, until 3:00 pm) tomorrow, so I'll get back to you.

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;