You feel a pair of eyes staring at you, from behind the trees.
Well, I guess I'll pipe up. First, I would like to see a parallel between the actual node names and the pseudocode you have above. Here's how /fo*?bar/i is currently represented (via -Dr or -mre=debug):
1: EXACTF <f>(3)
3: MINMOD(4)
4: STAR(7)
5: EXACTF <o>(0)
7: EXACTF <bar>(10)
10: END(0)
I'd like to see something like that. The numbers in parentheses represent where to go after the node has matched -- a value of 0 is obviously a special one.
start:
exactf "f", find_o
advance
goto start
find_o:
exactf "o"
exactf "bar", done
goto find_o
done:
end
That looks about right to me. Here we have exactf taking one or two arguments -- the string to match, and the node to jump to if successful (optional). If you want to see how /fo*bar/ differs...
start:
exactf "f", find_o
advance
goto start
find_o:
save
exactf "o", find_o
find_bar:
exactf "bar", done
restore
goto find_bar
done:
end
Here, we've added find_o as the second argument to exactf "o", making it execute itself as many times as it can (greedy). We've also added two important instructions (which aren't all that important, given the text of the regex), save and restore. These basically make a push-pop method for recording and going back to a location in the string.
Notice this code shows absolutely NOTHING in the way of optimizations. Here's the optimized code for /fo*?bar/:
start:
exactf "f", find_o
advance
goto start
find_o:
exactf "o", find_o
exactf "bar", done
goto start
done:
end
And here's the (surprisingly similar!) optimized code for /fo*bar/:
start:
exactf "f", find_o
advance
goto start
find_o:
exactf "o", find_o
exactf "bar", done
goto start
done:
end
They're the same because the regexes are one in the same in this specific example. More complex regexes will behave differently. This regex just happens to be wonderfully implemented like a DFA, really...
Well, I hope this has been a good example.
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; |