Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Bravo! Abigail-II++

Now the question is: "How does it work?". This is my (possibly buggy) exploration of the technique.

First I printed out the regex that is generated.

(?:(?{local $q [0] = $nodes [0]})| (?{local $q [0] = $nodes [1]})| (?{local $q [0] = $nodes [2]})| (?{local $q [0] = $nodes [3]})| (?{local $q [0] = $nodes [4]})| (?{local $q [0] = $nodes [5]})| (?{local $q [0] = $nodes [6]})| (?{local $q [0] = $nodes [7]})) (?:(?{local $q [1] = $nodes [0]})| (?{local $q [1] = $nodes [1]})| (?{local $q [1] = $nodes [2]})| (?{local $q [1] = $nodes [3]})| (?{local $q [1] = $nodes [4]})| (?{local $q [1] = $nodes [5]})| (?{local $q [1] = $nodes [6]})| (?{local $q [1] = $nodes [7]})) (?(?{$q [1] eq $q [0] || !$graph {$q [0]} {$q [1]}})x|) (?:(?{local $q [2] = $nodes [0]})| (?{local $q [2] = $nodes [1]})| (?{local $q [2] = $nodes [2]})| (?{local $q [2] = $nodes [3]})| (?{local $q [2] = $nodes [4]})| (?{local $q [2] = $nodes [5]})| (?{local $q [2] = $nodes [6]})| (?{local $q [2] = $nodes [7]})) (?(?{$q [2] eq $q [0] || $q [2] eq $q [1] || !$graph {$q [1]} {$q [2]}})x|) (?:(?{local $q [3] = $nodes [0]})| (?{local $q [3] = $nodes [1]})| (?{local $q [3] = $nodes [2]})| (?{local $q [3] = $nodes [3]})| (?{local $q [3] = $nodes [4]})| (?{local $q [3] = $nodes [5]})| (?{local $q [3] = $nodes [6]})| (?{local $q [3] = $nodes [7]})) (?(?{$q [3] eq $q [0] || $q [3] eq $q [1] || $q [3] eq $q [2] || !$graph {$q [2]} {$q [3]}})x|) (?:(?{local $q [4] = $nodes [0]})| (?{local $q [4] = $nodes [1]})| (?{local $q [4] = $nodes [2]})| (?{local $q [4] = $nodes [3]})| (?{local $q [4] = $nodes [4]})| (?{local $q [4] = $nodes [5]})| (?{local $q [4] = $nodes [6]})| (?{local $q [4] = $nodes [7]})) (?(?{$q [4] eq $q [0] || $q [4] eq $q [1] || $q [4] eq $q [2] || $q [4] eq $q [3] || !$graph {$q [3]} {$q [4]}})x|) (?:(?{local $q [5] = $nodes [0]})| (?{local $q [5] = $nodes [1]})| (?{local $q [5] = $nodes [2]})| (?{local $q [5] = $nodes [3]})| (?{local $q [5] = $nodes [4]})| (?{local $q [5] = $nodes [5]})| (?{local $q [5] = $nodes [6]})| (?{local $q [5] = $nodes [7]})) (?(?{$q [5] eq $q [0] || $q [5] eq $q [1] || $q [5] eq $q [2] || $q [5] eq $q [3] || $q [5] eq $q [4] || !$graph {$q [4]} {$q [5]}})x|) (?:(?{local $q [6] = $nodes [0]})| (?{local $q [6] = $nodes [1]})| (?{local $q [6] = $nodes [2]})| (?{local $q [6] = $nodes [3]})| (?{local $q [6] = $nodes [4]})| (?{local $q [6] = $nodes [5]})| (?{local $q [6] = $nodes [6]})| (?{local $q [6] = $nodes [7]})) (?(?{$q [6] eq $q [0] || $q [6] eq $q [1] || $q [6] eq $q [2] || $q [6] eq $q [3] || $q [6] eq $q [4] || $q [6] eq $q [5] || !$graph {$q [5]} {$q [6]}})x|) (?:(?{local $q [7] = $nodes [0]})| (?{local $q [7] = $nodes [1]})| (?{local $q [7] = $nodes [2]})| (?{local $q [7] = $nodes [3]})| (?{local $q [7] = $nodes [4]})| (?{local $q [7] = $nodes [5]})| (?{local $q [7] = $nodes [6]})| (?{local $q [7] = $nodes [7]})) (?(?{$q [7] eq $q [0] || $q [7] eq $q [1] || $q [7] eq $q [2] || $q [7] eq $q [3] || $q [7] eq $q [4] || $q [7] eq $q [5] || $q [7] eq $q [6] || !$graph {$q [6]} {$q [7]}})x|) (?{ @path = @q })

Which is a doosey, but upon closer inspection reveals that there are a good deal of repeating elements. If you can understand one of these groups, then the overall picture falls into place.

The eye first picks out this as the first repeating group

(?:(?{local $q [0] = $nodes [0]})| (?{local $q [0] = $nodes [1]})| (?{local $q [0] = $nodes [2]})| (?{local $q [0] = $nodes [3]})| (?{local $q [0] = $nodes [4]})| (?{local $q [0] = $nodes [5]})| (?{local $q [0] = $nodes [6]})| (?{local $q [0] = $nodes [7]})) (?:(?{local $q [1] = $nodes [0]})| (?{local $q [1] = $nodes [1]})| (?{local $q [1] = $nodes [2]})| (?{local $q [1] = $nodes [3]})| (?{local $q [1] = $nodes [4]})| (?{local $q [1] = $nodes [5]})| (?{local $q [1] = $nodes [6]})| (?{local $q [1] = $nodes [7]})) (?(?{$q [1] eq $q [0] || !$graph {$q [0]} {$q [1]}})x|)

and this as the second

(?:(?{local $q [2] = $nodes [0]})| (?{local $q [2] = $nodes [1]})| (?{local $q [2] = $nodes [2]})| (?{local $q [2] = $nodes [3]})| (?{local $q [2] = $nodes [4]})| (?{local $q [2] = $nodes [5]})| (?{local $q [2] = $nodes [6]})| (?{local $q [2] = $nodes [7]})) (?(?{$q [2] eq $q [0] || $q [2] eq $q [1] || !$graph {$q [1]} {$q [2]}})x|)

But on closer inspection, you'll notice that the first of these is actually 2 groups, it just happens that the second element of the first group is missing. This is clearly shown by laying out the first three groups side-by-side. (Though it isn't so clear if your terminal is restricted to 70 chars as will probably be the case when this is displayed in a post :()

(?:(?{local $q [0] = $nodes [0]})| (?:(?{local $q [1] = $nodes [0]}) +| (?:(?{local $q [2] = $nodes [0]})| (?{local $q [0] = $nodes [1]})| (?{local $q [1] = $nodes [1]}) +| (?{local $q [2] = $nodes [1]})| (?{local $q [0] = $nodes [2]})| (?{local $q [1] = $nodes [2]}) +| (?{local $q [2] = $nodes [2]})| (?{local $q [0] = $nodes [3]})| (?{local $q [1] = $nodes [3]}) +| (?{local $q [2] = $nodes [3]})| (?{local $q [0] = $nodes [4]})| (?{local $q [1] = $nodes [4]}) +| (?{local $q [2] = $nodes [4]})| (?{local $q [0] = $nodes [5]})| (?{local $q [1] = $nodes [5]}) +| (?{local $q [2] = $nodes [5]})| (?{local $q [0] = $nodes [6]})| (?{local $q [1] = $nodes [6]}) +| (?{local $q [2] = $nodes [6]})| (?{local $q [0] = $nodes [7]})) (?{local $q [1] = $nodes [7]}) +) (?{local $q [2] = $nodes [7]})) (?(?{$q [1] eq $q [0] || + (?(?{$q [2] eq $q [0] || !$graph {$q [0]} {$q [1]}})x +|) $q [2] eq $q [1] || + !$graph {$q [1]} {$q [2]}})x|)

There are 8 groups, one per node in the graph, and two elements per node, plus a final 1-line. Using group 3 as placeholder for the others and breaking out the 2 elements we have

(?:(?{local $q [2] = $nodes [0]})| (?{local $q [2] = $nodes [1]})| (?{local $q [2] = $nodes [2]})| (?{local $q [2] = $nodes [3]})| (?{local $q [2] = $nodes [4]})| (?{local $q [2] = $nodes [5]})| (?{local $q [2] = $nodes [6]})| (?{local $q [2] = $nodes [7]})) (?(?{$q [2] eq $q [0] || $q [2] eq $q [1] || !$graph {$q [1]} {$q [2]}})x|)

Looking at the first element we can see that it consists of a non-capturing group containing 8 expressions or'd together.

(?: exp1 | exp2 | exp3 ... )

And looking at each of the expressions, we find a code block (?{ ... }) and the contents of each code block follows a consistant pattern.  local $q[n] = $nodes[m] where n varies by group and m varies by expression. So each expression simply assigns the name of the next node from @nodes to a localise copy of the nth group element of @q.

The code block construct (?{...}) is described in perlre as

This zero-width assertion evaluate any embedded Perl code. It always succeeds, and its code is not interpolated.

That worried me for a while. If each block always succeeded, the how does or'ing them together help allow us to iterate?

The answer is, that that is what the regex engine does. If a regex fails somewhere to the right of an alternation grouping, it backtracks and picks the next (succeeding) alternate and then attempts to move forward again. As each of the expressions always succeeds, then each time the RE backtacks to this alternation, it will pick the sequentially next alternation for its next attempt.

This can be viewed that within a regex, an alternation is a for loop, iterating over each of its alternations, with an implicit next for any that fail. In this case they will never fail, so they will all be chosen in turn until the overall regex succeeds.

Moving on to the second element in each group we find (slightly reformatted as Abigail's consistant but eclectic formatting always throws me:)

# The second element was missing from the first group. (? (?{ $q[1] eq $q[0] || !$graph{$ +q [0]}{$q [1]} } ) x | ) (? (?{ $q[2] eq $q[0] || $q[2] eq $q[1] || !$graph{$ +q [1]}{$q [2]} } ) x | ) (? (?{ $q[3] eq $q[0] || $q[3] eq $q[1] || $q[3] eq $q[2] || !$graph{$ +q [2]}{$q [3]} } ) x | )

The pattern is clear, but what is going on. The thing to notice is that there are two (nested) expressions here.

The outer expression is (? (inner) x | )

Looking this up in perlre, it matches (?(condition)yes-pattern|no-pattern) which is described as

Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or look-ahead/look-behind/evaluate zero-width assertion.

which is about as clear as mud:).

I never really understood this description until now...and the understanding came slowly even with the aid of a complete worked example.

The "condition expression" is (not unsurprisingly) the RE equivalent of perls if statement.

if(cond) re_1 else re_2

If the first clause (condition) is true,

then attempt to match re_1 at this point in the string

else, attempt to match re_2, if supplied.

The bit that eluded me was that the overall expression succeeds if

  1. The condition is true and re_1 matches at this point.
  2. The condition is false and re_2 matches at the point.

I certainly never understood that from the description. I also never realised the significance of the "/evaluate zero-width assertion." part of the description. This is giving a name (which I don;t recall seeing used anywhere else for the (?{...}) construct. It also gives a better understanding of the syntax diagram for the conditional expression.

Getting back to Abigail's regex, this element (? (inner) x | ) is saying execute the code block ("evaluate zero-width assertion") and (despite the description for that construct saying they always succeed) use the result of that evaluation to choose which of the following regular expressions to try and match at this point. In this case, Abigail uses 'x' as re_1 and '' as re_2. This is a little mysterious at first until you look back at the code where the regex is used.

if ("" =~ /$regex/x) {

The regex is being used on an empty string. re_1 ('x') will never succeed in this string regardless of where in the regex it is used. However, re_2 (''), will always match (in this string or any other) regardless of where it is used.

The upshot of all of these conditions, expressions and re's is that if the code block evaluates to true, the conditional element will fail and cause backtracking in the overall re. If the code block evaluates false then the conditional expression succeeds and the overall re moves forward to the next element.

Looking at the contents of the code blocks.

# null. $q[1] eq $q[0] || !$graph{$q [0]}{ +$q [1]} $q[2] eq $q[0] || $q[2] eq $q[1] || !$graph{$q [1]}{ +$q [2]} $q[3] eq $q[0] || $q[3] eq $q[1] || $q[3] eq $q[2] || !$graph{$q [2]}{ +$q [3]}

Each condition is saying

True if the element of @q for this group is equal to that of the element of @q for any of the preceding groups OR if the node of the graph for this group doesn't share an edge with any of the preceeding groups.

Remembering that the second element succeeds if the code block fails and vice versa, the meaning of the second element of each group boils down to

Backtrack if we've already passed through this node, or if this node doesn;t share an edge with the preceeding node.

Repeat the group once for each node and that is essentially that, except that the final line just copies the contents of the path found from @q to @path.

Thanks Abigail. As always, I learned a crap load from your code.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.


In reply to Re: Finding Hamiltonian Paths using the Regexp Engine by BrowserUk
in thread Finding Hamiltonian Paths using the Regexp Engine by Abigail-II

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2024-03-28 16:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found