Ovid has asked for the wisdom of the Perl Monks concerning the following question:
Part of the way that I learn something is to read the documentation and then write up a Meditation about it to ensure that I really grokked what is going on. I seem to have some problems understanding exactly what is going on with the re debug output.
One can use the re pragma to see the opcodes generated by the regex engine. This is useful when you're trying to debug a regex. Typically, I just follow the output to see what's matching and dismiss the rest as voodoo. However, I decided that I really wanted to get a grasp on it, and petral and japhy pointed me to "perldebguts". This has a section entitled "Debugging regular expressions". I won't go over it all, but here's some sample output and a quote:
>perl -mre=debug -e '/ab+[cd]/'
Compiling REx `ab+[cd]'
size 15 first at 1
   1: EXACT <a>(3)
   3: PLUS(6)
   4:   EXACT <b>(0)
   6: ANYOF[cd](15)
  15: END(0)
anchored `ab' at 0 floating `b' at 1..2147483647 (checking anchored) minlen 3
Freeing REx: `ab+[cd]'
Each of the numbered lines is in the following format:
*id*: *TYPE* *OPTIONAL-INFO* (*next-id*)
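To make that format concrete, here is a minimal sketch that splits the node lines from the dump above into their four fields. The helper name parse_node and its regex are my own assumptions for illustration; they are not part of the re pragma or of perldebguts, and they only cover the simple node lines shown in this post.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of 're'): split one node line of the
# dump into id, TYPE, OPTIONAL-INFO and next-id.
sub parse_node {
    my ($line) = @_;
    return unless $line =~ /^\s*(\d+):\s*([A-Z_]+)\s*(.*?)\s*\((\d+)\)\s*$/;
    return { id => $1, type => $2, info => $3, next => $4 };
}

# Node lines copied from the /ab+[cd]/ dump above.
my @dump = (
    '1: EXACT <a>(3)',
    '3: PLUS(6)',
    '4: EXACT <b>(0)',
    '6: ANYOF[cd](15)',
    '15: END(0)',
);

for my $line (@dump) {
    my $node = parse_node($line) or next;
    printf "id=%d type=%s info='%s' next=%d\n",
        @{$node}{qw(id type info next)};
}
```

A next-id of 0, as on the EXACT <b> node, is simply reported as 0 here; the sketch does not try to interpret it.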
See the line of the output that reads "size 15 first at 1"? Here's what the docs say about it:
[It's] the size of the compiled form (in arbitrary units, usually 4-byte words) and the label *id* of the first node that does a match.
That seems clear enough. The match starts at the line which has an ID of 1 (1: EXACT <a>(3)). I was asking in the chatterbox for examples that don't start at one and wog listed /x+/. Here's the result:
C:\>perl -mre=debug -e "/x+/"
Freeing REx: `,'
Compiling REx `x+'
size 4 first at 2
   1: PLUS(4)
   2:   EXACT <x>(0)
   4: END(0)
anchored `x' at 0 (checking anchored) plus minlen 1
Freeing REx: `x+'
japhy pointed out that by switching the plus to a star, it will start at 1:
C:\>perl -mre=debug -e "/x*/"
Freeing REx: `,'
Compiling REx `x*'
size 4 first at 1
   1: STAR(4)
   2:   EXACT <x>(0)
   4: END(0)
minlen 0
[snip]
tye mentioned that if the output says "first at 5" and opcode 5 is EXACT <ab>(9), then the regex engine can do something similar to pos($str) = index($str, "ab", pos($str)) before each match attempt to speed up matching. So, I think the docs mean that "first at \d+" identifies the first opcode that must match. Is this correct?
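To check my understanding of tye's point, here is a rough sketch of that idea written out by hand in Perl. It is only an illustration: the real engine does its own thing internally, and the function name find_all, the sample string, and the use of \G are all my assumptions. The sketch uses index() to jump to the next place the required literal "ab" can occur, and only then lets the regex try to match at that spot.

```perl
use strict;
use warnings;

# Hypothetical illustration only; the regex engine does not literally
# run this code. Collect all matches of /ab+[cd]/, skipping ahead with
# index() to the next occurrence of the literal "ab" before each attempt.
sub find_all {
    my ($str) = @_;
    my @hits;
    my $from = 0;
    while (1) {
        my $at = index($str, 'ab', $from);
        last if $at < 0;                  # no "ab" left, so nothing can match
        pos($str) = $at;                  # try the regex exactly there
        if ($str =~ /\Gab+[cd]/gc) {
            push @hits, [ $at, substr($str, $at, pos($str) - $at) ];
            $from = pos($str);            # continue after this match
        }
        else {
            $from = $at + 1;              # this "ab" was a dead end
        }
    }
    return @hits;
}

for my $hit (find_all('xxabbbc yy abd ab')) {
    printf "offset %d: %s\n", @$hit;
}
```

The payoff in the sketch is that positions where "ab" cannot possibly start are never handed to the regex at all.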
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
Re: Understanding 're' debug output
by japhy (Canon) on Jan 10, 2002 at 06:36 UTC