b4swine has asked for the wisdom of the Perl Monks concerning the following question:

I need to implement a fast compare according to a fixed partial ordering. I have about 50 objects (imagine vertices on a DAG, ie a directed acyclic graph). If an object A is preferred to B, we write this as A>B (on the graph imagine a directed edge from A to B).

I need to do millions of comparisons, so my plan was to create a hash of hashes, and $pref->{A}{B} is +1, 0, -1, or undef depending on if A>B, A=B, A<B or A and B are not comparable. (Is this a good idea?)

Also, the input describing the (fixed) partial order is in the form of a array of strings, eg:

('xx < yy < zz > aa', 'bb < yy = aa > cc')
This is to be interpreted as a sequence of pairwise comparisons, so the above strings give six different paired comparisons as follows:
xx<yy; yy<zz; zz>aa; bb<yy; yy=aa; aa>cc;
This information along with other implied information due to reflexivity: we know yy>xx because xx<yy, and transitivity we know xx<yy and yy<zz so we can infer that xx<zz. (On the graph a vertex xx < yy if there is any directed path from yy to xx).

So two things to be done

I have solved the problem, and here is my code, but I find it very ugly. Would love to see something nicer.

use strict; use warnings; my $pref = {}; # # Argument is a list of comparison chain strings # my $err; sub PartialOrder { # # preference: a>b means prefer(a,b) # sub markpref { my ($a, $c, $b) = @_; $err=''; if (exists $pref->{$a}) # we knew about a { if (exists $pref->{$a}{$b}) # and b { return $err = "ERROR contradictory $a $b"# but contraditor +y unless $pref->{$a}{$b} == $c; # so error messa +ge return 1; # else nothing to +do } # but not about b $pref->{$a}{$b} = $c; # so add info } else # nothing known about +a { $pref->{$a} = {$b => $c, $a => 0}; # so set info } return 0; } sub prefer { my ($a, $c, $b) = @_; $err = ''; if ($c == 1) { return $err="ERROR: $a>$a" if $a eq $b; # error return $err if markpref @_; return $err if markpref $b, -1, $a; # # Now add all the extended relations # while (my ($x, $v) = each %{$pref->{$b}}) { if ($v>=0 and $x ne $b) { return $err if prefer($a, 1, $x) } } while (my ($x, $v) = each %{$pref->{$a}}) { if ($v<=0 and $x ne $a) { return $err if prefer($x, 1, $b) } } } else # $c == 0 { if ($a eq $b) { return 0; } return $err if markpref @_; return $err if markpref $b, 0, $a; while (my ($x, $v) = each %{$pref->{$b}}) { if ($v>0) { return $err if prefer($a, 1, $x) } elsif ($v<0) { return $err if prefer($x, 1, $a) } else { return $err if prefer($x, 0, $a) } } while (my ($x, $v) = each %{$pref->{$a}}) { if ($v>0) { return $err if prefer($b, 1, $x) } elsif ($v<0) { return $err if prefer($x, 1, $b) } else { return $err if prefer($x, 0, $b) } } } return 0; } # Compute all neighbor relationships # for (@_) { my $chain = $_; $chain =~ s/ //g; my @seq = split /([>=<])/, $chain; my $x = shift @seq; while (my $c = shift @seq) { my $z = shift @seq; if ($c eq '>') { return $err if prefer($x, 1, $z); } elsif ($c eq '=') { return $err if prefer($x, 0, $z); } elsif ($c eq '<') { return $err if prefer($z, 1, $x); } else { return "Error in preferences\n" } $x = $z; } } } ########################################################## # # A Example # die if PartialOrder('xx < yy < zz > aa', 'bb < yy = aa > cc'); sub compare { my ($x, $y) = @_; return $pref->{$x}{$y} if exists $pref->{$x}{$y}; return undef; } print "Comparison Table:\n aa bb cc xx yy zz\n"; for my $a (qw(aa bb cc xx yy zz)) { print "$a "; for my $b(qw(aa bb cc xx yy zz)) { my $c = compare $a, $b; $c = qw(- = + .)[defined $c ? 1+$c : 3]; print " $c "; } print "\n"; }

Replies are listed 'Best First'.
Re: Partial Order (cases)
by tye (Sage) on Jul 25, 2007 at 21:12 UTC

    The complexity in your code is due to having to code all of the combining of cases, each in two directions. Simply using two bits, one for '<=', one for '>=', means that the cases can be combined with a simple bit-wise &, greatly reducing the repetition in the code.

    I also moved the checking for special cases to places where each case only needs to be checked once, eliminating many of the checks. Many operations will be done more than once in building the data structure, but the code is simpler.

    #!/usr/bin/perl -w use strict; sub UNK() { 0; } sub LTE() { 1; } sub GTE() { 2; } sub EQU() { 3; } my $Pref= {}; my %OpBits= ( '<' => LTE(), '>' => GTE(), '=' => EQU(), ); my %RevBits= ( UNK() => UNK(), LTE() => GTE(), GTE() => LTE(), EQU() => EQU(), ); Main( @ARGV ); exit( 0 ); sub onePref { my( $x, $bits, $y )= @_; return if $bits == UNK(); if( $Pref->{$x} && exists $Pref->{$x}{$y} ) { die "ERROR contradictory $x $y\n" if $Pref->{$x}{$y} != $bits; } else { $Pref->{$x}{$y}= $bits; $Pref->{$y}{$x}= $RevBits{$bits}; } } sub allPrefs { my( $x, $c, $y )= @_; my $bits= $OpBits{$c}; $Pref->{$x}{$x}= $Pref->{$y}{$y}= EQU(); onePref( $x, $bits, $y ); while( my($z,$v)= each %{$Pref->{$y}} ) { onePref( $x, $bits & $v, $z ); } $bits= $RevBits{$bits}; while( my($z,$v)= each %{$Pref->{$x}} ) { onePref( $y, $bits & $v, $z ); } } sub partialOrder { for my $chain ( @_ ) { while( $chain =~ /\G\s*(\w+)\s*([<=>])\s*(?=(\w+))/gc ) { my( $x, $c, $y )= ( $1, $2, $3 ); allPrefs( $x, $c, $y ); } if( $chain !~ /\G\s*(\w+)\s*$/g ) { die "Invalid preferences ($chain).\n"; } } } sub compare { my( $x, $y )= @_; return $Pref->{$x}{$y} || UNK(); } sub Main { partialOrder( 'xx < yy < zz > aa', 'bb < yy = aa > cc', ); my @vals= sort keys %$Pref; my %bitVal= ( UNK() => '.', LTE() => '-', GTE() => '+', EQU() => '=', ); my $len= 1; for( map length($_), @vals ) { $len= $_ if $len < $_; } for my $v ( @vals ) { $v= sprintf "%-*s", $len, $v; } print "Comparison Table:\n"; print join " ", ' 'x$len, @vals; print $/; for my $x ( @vals ) { print "$x "; for my $y ( @vals ) { my $c= compare( $x, $y ); printf "%-*s", 1+$len, $bitVal{$c}; } print "\n"; } partialOrder( 'ww < ww' ); }

    Update: The code simplifies even further by moving one of the "do both directions" responsibilities up one level in the call stack:

    sub allPrefs { my( $x, $bits, $y )= @_; $Pref->{$x}{$x}= EQU(); onePref( $x, $bits, $y ); while( my($z,$v)= each %{$Pref->{$y}} ) { onePref( $x, $bits & $v, $z ); } } sub partialOrder { for my $chain ( @_ ) { while( $chain =~ /\G\s*(\w+)\s*([<=>])\s*(?=(\w+))/gc ) { my( $x, $c, $y )= ( $1, $2, $3 ); my $bits= $OpBits{$c}; allPrefs( $x, $bits, $y ); allPrefs( $y, $RevBits{$bits}, $x ); } if( $chain !~ /\G\s*(\w+)\s*$/g ) { die "Invalid preferences ($chain).\n"; } } }

    - tye        

      Tye, this is really lovely.

      The way that "x [v1] y [v2] z" always implies "x [v1 & v2] z", for any two relations [v1] and [v2]. This is the code that I ended up using.

      Thanks.

Re: Partial Order
by Limbic~Region (Chancellor) on Jul 25, 2007 at 18:05 UTC
    b4swine,
    To repeat what you are asking:
    • A data structure that supports fast lookups
    • A routine that implements that fast lookup
    • More elegant code

    The last two depend on the first one. One idea might be to use a single level hash instead of a HoH. Something like $lookup{"$obj_a*$obj_b"}. Another possibility is to use an array instead of a hash if there is some sort of efficient obj to index conversion that could be done since array lookups are faster. In fact, there may be no need to use a data structure at all. If you can convert your objects into an ordinal number efficiently, you could just store all the comparisons in a string and unpack the solution.

    Only you are going to be able to answer if any of those ideas are viable. If you gave us some sample data with more information we might be able to propose a working solution.

    Cheers - L~R

Re: Partial Order
by blokhead (Monsignor) on Jul 25, 2007 at 19:23 UTC
    Since you say that the preprocessing step does not need to be super-fast, then I suggest storing it as a graph. Once you add the "base" comparisons as directed edges, you can compute the transitive closure of the graph. To check a comparison query, you just have to see whether the edge exists in the transitive closure, which is a very fast operation if the graph is pre-computed.

    The only thing to watch out for is the equalities in the data. As long as we're using graphs, you can use an undirected graph to keep track of which items are equivalent, so you can map the members of each equivalence class to a unique representative. Also, if something comes in the data that says xx>yy and later yy>xx, should I infer that xx=yy, or is that an error? My sample code below treats it as an error (the resulting graph is not a DAG), but I suppose you could do either.

    Here is a rough proof-of-concept that I hope doesn't have too many bugs.

    use strict; use Graph::Undirected; use Graph::Directed; my @data = qw[ xx<yy yy<zz zz>aa bb<yy yy=aa aa>cc ]; ## take all equalities in the data, map each item to ## a canonical (equivalent) representative. my $eq = Graph::Undirected->new; $eq->add_edge( split /=/, $_ ) for grep /=/, @data; my %canonical; for ($eq->connected_components) { my $representative = $_->[0]; for (@$_) { $canonical{$_} = $representative; } } sub canonical { exists $canonical{$_[0]} ? $canonical{$_[0]} : $_[0]; } ## take all inequalities in the data, make a dag from them ## and take the transitive closure. my $g = Graph::Directed->new; for (grep !/=/, @data) { my ($x, $y) = split /<|>/, $_; ($x,$y) = ($y,$x) if /</; $g->add_edge( canonical($x), canonical($y) ); } die "Not a DAG!" unless $g->is_dag; $g = $g->transitive_closure; sub compare { my ($x,$y) = map canonical($_), @_; return 0 if $x eq $y; return 1 if $g->has_edge($x,$y); return -1 if $g->has_edge($y,$x); return 0; } my @items = qw[ aa bb cc xx yy zz ]; for my $x (@items) { for my $y (@items) { print "$x, $y: " . compare($x,$y). $/; } }

    blokhead

      blokhead,
      To check a comparison query, you just have to see whether the edge exists in the transitive closure, which is a very fast operation if the graph is pre-computed.

      No offense, but your compare() sub looks like it would be much slower than the OP's HoH solution. Since this is the sub that will be called millions of times, have you Benchmarked your assertion? Especially if the OP's compare() became return $lookup{$x}{$y}; since undef is implied.

      Cheers - L~R

        I admit I didn't study it the OP's code in great detail. Now I see that it probably does an equivalent thing to what mine does: infer all transitive relations from the data and store them in an object that supports individual queries. The OP code infers the transitive relations in some ad-hoc way, storing in a hash; mine uses a graph module and stores the results in a graph. Of course, you can probably guess that I prefer the elegance of using the high-level abstraction of graph theory ;) (probably because I don't want to re-implement transitive closure logic) So perhaps my previous post should be interpreted as advice on computing the transitive closure using pre-existing hammers instead of an ad-hoc method.

        As for efficiency of the compare() sub.. I doubt mine is insanely slow. At worst, the calls to $g->has_edge would cost a few hash lookups in the internals of Graph. If optimization really is a concern at such a low level, you're right -- nothing could be faster than a simple hash lookup. Of course, you could preemptively convert the whole graph's adjacency information into a HoH after doing the transitive closure. For only 50 items, it should be no problem.

        Here is an interesting tradeoff that the OP might be willing to consider (described in the language of graphs, but could be also be coded using the HoH of adjacencies). At the moment, both my code and the OP's essentially precompute answers to all possible queries, which on some level might feel inelegant. A more interesting tradeoff is to put all the relations into the graph, but don't compute the transitive closure right away. Instead, whenever a query comes, do a reachability search in the graph (say, breadth-first search from one of the points). For every path that you happen to inspect at along the way, add in all the edges which are inferred by the transitivity property. The next time you query the graph, there will be new "shortcuts" that you can exploit. Eventually, with enough queries, the graph slowly approaches the transitive closure of the original graph.

        blokhead