in reply to Re: Re: Re: Flattening REs into opcodes for Perl 6
in thread Flattening REs into opcodes for Perl 6

Would this work? I don't have the ops integrated into Parrot's normal set yet, but I'd like to know if I'm on the right track. re_match is like calling RE_0 as a subroutine, and re_finished is like returning from it. The set is an assignment (in this case to Integer register 0); pretty much everything else that may look odd in here has been explained inline.
re_match RE_0, "afoobarz" RE_0: # /fo*?bar/s (and yes, the "s" flag is pointless) re_flags "s" re_minlength 4 $start: #a 'local' label--it disappears the next time a normal # label is seen re_pushindex #in case we fail in $find_o or something re_literal "f", $find_bar re_popindex re_advance $failure branch $start #think "goto" $find_o: re_literal "o", $find_bar branch $start $find_bar: re_literal "bar", $success branch $find_o $success: set I0, 1 branch $end $failure: set I0, 0 $end: re_finished
Looking at this, it seems that most of the time we use the second parameter we're trying to avoid a branch put in in case we fail; is this just a quirk of this example, or might it make more sense for the second parameter to represent where to jump if we fail?
$start: re_pushindex re_literal "f", $advance $find_bar: re_literal "bar", $find_o #if we made it this far, we're done! set I0, 1 branch $end $find_o: re_literal "o", $advance branch $find_bar $advance: re_popindex re_advance $failure branch $start $failure: set I0, 0 $end: re_finished
This would be a bigger win in the case of, say, several character classes in a row:
RE_1: #/[az][by][cx]/ re_flags "" re_minlength 3 $start: re_pushindex re_oneof "az", $advance re_oneof "by", $advance re_oneof "cx", $advance set I0, 1 branch $end $advance: re_popindex re_advance $fail branch $start $fail: set I0, 0 $end: re_finished

=cut
--Brent Dax
There is no sig.

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: Flattening REs into opcodes for Perl 6
by japhy (Canon) on Nov 03, 2001 at 09:49 UTC
    (The previous content was erroneous.)

    It looks good; I'm tired, though. I'm going to sleep, and I'm going to be programming C all day (well, until 3:00 pm) tomorrow, so I'll get back to you.

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Sorry, one more thing. :^)

      Angel Faus and I were discussing the best way to handle multiple backtracking thingies in the same regexp without one backtracking thingie trying to use another's index. I suggested pushing a mark (a special value, probably -1) onto the stack; if rePopindex popped a mark, it will have 'failed', and thus it would jump to the address given as its parameter. That would work something like this:

      RE: #set up flags and stuff branch $start $advance: rePopindex reAdvance $fail $start: reLiteral "f", $advance rePushindex rePushmark $findo: rePushindex reLiteral "o", $findbar branch $findo $backo: rePopindex $advance $findbar: reLiteral "o", $backo set I0, 1 reFinished $fail: set I0, 0 reFinished
      Angel suggested that when we push an address onto the stack, we give it a label as a parameter; if a matching operation fails, it pops an index/label pair off the stack, sets the index and jumps to the label. Something like this:
      RE: #set up flags and stuff reOnfail $fail #if there's nothing left on the stack branch $start $advance: reAdvance $start: rePushindex $advance reLiteral "f" $findo: rePushindex $findbar reLiteral "o" branch $findo $findbar: reLiteral "bar" set I0, 1 reFinished $fail: set I0, 0 reFinished
      Can you give me any guidance about which will work better in the long run? Angel's idea has ease-of-code-generation and code compactness going for it; mine has explicitness (for debugging) and flexibility. From what I can tell, alternation will be a bitch with either, but mine should be easier; same with optional constructs and lookaheads, from what I can see. (Of course, I could just be blinded because one is my idea and the other is someone else's; that's why I'm asking you.)

      Thank you for helping me out with this. Discussing it with an expert has made this a hundred times better than it was.

      =cut
      --Brent Dax
      There is no sig.

        I'm not sure I can say which is better. It took me a minute to grok the whole thing, but now that I do, I can see how your explicit code can be easier to use in the long run. So I guess that was just my answer.

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;