Merge lists of ordered strings into canonical ordering

NERDVANA has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Merge lists of ordered strings into canonical ordering by tybalt89 (Monsignor) on Sep 19, 2024 at 21:52 UTC
#!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11161855 use warnings; use List::AllUtils qw( min_by nsort_by ); $SIG{__WARN__} = sub { die @_ }; my @list1= qw( K B I S Y A Q ); my @list2= qw( B S N A ); my @list3= qw( Y N Q ); my $combined= fancy_algorithm(\@list1, \@list2, \@list3); use Data::Dump 'dd'; dd 'combined', $combined; my @lists= ( [qw( X P Y )], [qw( X B Y N )], [qw( P B N )], [qw( X B P N )], [qw( X B P Y )], [qw( A Z )], [qw( A K L )], ); $combined= fancy_algorithm(@lists); use Data::Dump 'dd'; dd 'combined', $combined; sub fancy_algorithm { my (%names, @order); @names{ @$_ } = () for my @args = @_; use Data::Dump 'dd'; dd 'input', \@args; while( @args ) { my %names; my $see = 1; $names{$_} += $see++ for map @$_, @args; my @names = nsort_by { $names{$_} } sort +keys %names; my %score; for ( @args ) { my ($head, @rest) = @$_; $score{$head} -= 1e9; $score{$_} //= 0 for @rest; } my $pick = min_by { $score{$_} } @names; push @order, $pick; $_ = [ grep $pick ne $_, @$_ ] for @args; @args = grep @$_, @args; } return \@order; } [download] Outputs: `( "input", [ ["K", "B", "I", "S", "Y", "A", "Q"], ["B", "S", "N", "A"], ["Y", "N", "Q"], ], ) ("combined", ["K", "B", "I", "S", "Y", "N", "A", "Q"]) ( "input", [ ["X", "P", "Y"], ["X", "B", "Y", "N"], ["P", "B", "N"], ["X", "B", "P", "N"], ["X", "B", "P", "Y"], ["A", "Z"], ["A", "K", "L"], ], ) ("combined", ["X", "B", "P", "Y", "N", "A", "Z", "K", "L"])` [download]	[reply] [d/l] [select]
Re^2: Merge lists of ordered strings into canonical ordering by NERDVANA (Priest) on Sep 20, 2024 at 05:47 UTC
Very interesting! It took me a while to figure out what you were doing here - basically just assuming that the next in the sequence must be found at the start of the remaining options, choosing one that is either the most occurrences at starts of remaining lists or first seen, removing it from the options, then repeat? It seems like you can simplify that a bit though. (I'm not sure what the 1e9 was about unless that was from an earlier attempt?) `sub fancy_algorithm { my @args= @_; p @args; my @order; while (@args) { my ($pick, %score, %seen_order); my $see = 1; $seen_order{$_} += $see++ for map @$_, @args; for (@args) { my $n= --$score{$_->[0]}; $pick= $_->[0] if !defined $pick or $n < $score{$pick} or $n == $score{$pick} && $seen_order{$_->[0]} < $seen_ord +er{$pick}; } push @order, $pick; $_ = [ grep $pick ne $_, @$_ ] for @args; @args = grep @$_, @args; } return \@order; }` [download] It seems to work in every case I've tried, though it gives less ideal results with: `my @lists= ( [qw( S T )], [qw( S U )], [qw( Y Z )], [qw( X Y )], [qw( W X )], [qw( V W )], [qw( U V )], ); # S U V W X Y T Z` [download]	[reply] [d/l] [select]
Re: Merge lists of ordered strings into canonical ordering by tybalt89 (Monsignor) on Sep 20, 2024 at 16:24 UTC
Sometimes algorithms are shorter as strings instead of arrays. Not quite the same answer as yours, but I'd claim it's good enough :) #!/usr/bin/perl use strict; # https://perlmonks.org/?node_id=11161855 use warnings; use List::AllUtils qw( max_by ); $SIG{__WARN__} = sub { die @_ }; my @list1= qw( K B I S Y A Q ); my @list2= qw( B S N A ); my @list3= qw( Y N Q ); my $combined= fancy_algorithm(\@list1, \@list2, \@list3); use Data::Dump 'dd'; dd 'combined', $combined; my @lists= ( [qw( X P Y )], [qw( X B Y N )], [qw( P B N )], [qw( X B P N )], [qw( X B P Y )], [qw( A Z )], [qw( A K L )], ); $combined= fancy_algorithm(@lists); use Data::Dump 'dd'; dd 'combined', $combined; @lists= ( [qw( S T )], [qw( S U )], [qw( Y Z )], [qw( X Y )], [qw( W X )], [qw( V W )], [qw( U V )], ); $combined= fancy_algorithm(@lists); use Data::Dump 'dd'; dd 'combined', $combined; sub fancy_algorithm { my $rule = join "\n", map "@$_", @_; my @order; print "$rule\n"; # FIXME only for testing while( $rule =~ /\S/ ) { my ($head) = max_by { my $n = () = $rule =~ /\b$_\b/g } grep { $rule !~ /\w +$_\b/ } my @front = sort $rule =~ /^ *(\w+) +\b/gm; $head //= shift @front; # for breaking cycles push @order, $head; $rule =~ s/\b$head\b//g; } return \@order; } [download] Outputs: `K B I S Y A Q B S N A Y N Q ("combined", ["K", "B", "I", "S", "Y", "N", "A", "Q"]) X P Y X B Y N P B N X B P N X B P Y A Z A K L ("combined", ["X", "A", "K", "L", "Z", "B", "P", "Y", "N"]) S T S U Y Z X Y W X V W U V ("combined", ["S", "U" .. "Y", "T", "Z"])` [download]	[reply] [d/l] [select]
Re: Merge lists of ordered strings into canonical ordering by hv (Prior) on Sep 19, 2024 at 18:46 UTC
I think a graph approach makes sense, as suggested by Anonymous Monk. I think you can do this using a directed Graph with countedged to record the fact that an ordering has been seen multiple times, then break loops by reducing the weight of all edges in the loop by the minimum of them and removing any edge that reaches zero. It's not immediately obvious to me what should happen when there are multiple intersecting loops - you'd probably need to experiment to find a viable approach for that case.	[reply]
Re: Merge lists of ordered strings into canonical ordering by Corion (Patriarch) on Sep 19, 2024 at 18:18 UTC
I would look at Algorithm::Diff or simply sequentially merging the columns into the master result until all columns have been merged into their respective places.	[reply]
Re^2: Merge lists of ordered strings into canonical ordering by NERDVANA (Priest) on Sep 19, 2024 at 18:57 UTC
The problem with this approach is it can't preserve the unknown status of the orderings, to be resolved by later orderings. In the first example, "S Y A" and "S N A" should combine in a way that expresses a very low preference for Y coming before N, and then that low preference should get overridden by the strong preference for Y coming before N by the 3rd rule. (and it would have made a better example if I put N before Y in the 3rd rule because then the strong preference would be contrary to the default order)	[reply]
Re: Merge lists of ordered strings into canonical ordering by Anonymous Monk on Sep 19, 2024 at 18:15 UTC
Dependency graph / Topological sorting ?	[reply]
Re^2: Merge lists of ordered strings into canonical ordering by NERDVANA (Priest) on Sep 19, 2024 at 19:04 UTC
Topological sort is basically it, but with the catch that sometimes the user contradicts themself.	[reply]
Re: Merge lists of ordered strings into canonical ordering by LanX (Saint) on Sep 19, 2024 at 21:00 UTC
I missed your second test, not sure if it's because -- I don't understand your criteria "select the ordering that occurs more frequently" avoiding ambiguity can't be tested or my code is faulty Could you please elaborate where and why your $combined2 is better? (UPDATE: If there is a quantifiable reason why "X" should be ordered before "A", I'm sure this can be added to the sort function, see `cmp_more_frequently($a,$b)` ) The basic idea is to build a sort function which sorts by the combined less-then relation stored in a HoH %cmp. Contradictory orders are calculated by increments and decrements. <Reveal this spoiler or all in this thread> Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery} updates renamed %lt to %cmp DISCLAIMER: This is a hack, and can be done better and more elegantly, and I'm sure it lacks proper testing. OK there is definitely a bug, looking into it	[reply] [d/l] [select]
Re^2: Merge lists of ordered strings into canonical ordering by LanX (Saint) on Sep 19, 2024 at 23:38 UTC
So I fixed the bug in the $osort (precedence) and my second "non-ambiguity" test still "fails", but is only diverging for "Z" from yours. And I think my solution is better because Z has only one relation. (YMMV) use v5.12; use warnings; use Data::Dump qw/pp dd ddx/; use Test::More; my @list1= qw( K B I S Y A Q ); my @list2= qw( B S N A ); my @list3= qw( Y N Q ); my $combined = combined_order(\@list1, \@list2, \@list3); my $expected = [qw( K B I S Y N A Q )]; is_deeply( $combined, $expected ) or diag pp { got => $combined, exp => $expected }; my @lists2= ( [qw( X P Y )], [qw( X B Y N )], [qw( P B N )], [qw( X B P N )], [qw( X B P Y )], [qw( A Z )], [qw( A K L )], ); # my $dbg_sort = gen_sort_by_orders(@lists2); # ($a,$b) = qw/B X/; # is(signum(&$dbg_sort), 1, "$a cmp $b"); # ($a,$b) = qw/X B/; # is(signum(&$dbg_sort), -1, "$a cmp $b"); my $combined2 = combined_order(@lists2); my $expected2 = [qw( X B P Y N A Z K L )]; is_deeply( $combined2, $expected2 ) or diag pp { got => $combined2, exp => $expected2 }; sub combined_order { my @orders = @_; my @total = map {@$_} @orders; my %uniq; @uniq{@total} =(); @total = keys %uniq; my $osort = gen_sort_by_orders (@orders); #ddx @total; return [ sort {&$osort} @total ]; } sub signum { my ($val) = @_; return 0 unless $val; # 0 or undef return $val / abs($val); } sub gen_sort_by_orders { my @orders = @_; my %cmp; # relation matrix my %count; for my $order (@orders) { my @list = @$order; while (my $before = shift @list) { for my $after (@list) { # comparing lesser < greater is enough bc of symmetry $before lt $after ? $cmp{$before}{$after}-- : $cmp{$after}{$before}++; # we count the relations for second criteria $count{$before}++; $count{$after}++; } } } ddx { "Matrix:" => \%cmp, "Count:" => \%count }; # debu +g my $osort = sub { return ( $a lt $b ? ( $cmp{$a}{$b} // 0 ) : - ( $cmp{$b}{$a} // 0 ) ) \|\| ( $count{$b} - $count{$a} ) # more relations win \|\| $a cmp $b # default anti ambiguity }; return $osort; } done_testing; [download] perl /home/lanx/perl/multisort.pl # multisort.pl:83: { # "Count:" => { A => 9, B => 9, I => 6, K => 6, N => 5, Q => 8, S = +> 9, Y => 8 }, # "Matrix:" => { # A => { B => 2, I => 1, K => 1, N => 1, Q => -1, S = +> 2, Y => 1 }, # B => { I => -1, K => 1, N => -1, Q => -1, S => -2, +Y => -1 }, # I => { K => 1, Q => -1, S => -1, Y => -1 }, # K => { Q => -1, S => -1, Y => -1 }, # N => { Q => -1, S => 1, Y => 1 }, # Q => { S => 1, Y => 2 }, # S => { Y => -1 }, # }, # } ok 1 # multisort.pl:83: { # "Count:" => { A => 3, B => 11, K => 2, L => 2, N => 8, P => 10, X + => 11, Y => 8, Z => 1 }, # "Matrix:" => { # A => { K => -1, L => -1, Z => -1 }, # B => { N => -3, P => -1, X => 3, Y => -2 }, # K => { L => -1 }, # N => { P => 2, X => 2, Y => 1 }, # P => { X => 3, Y => -2 }, # X => { Y => -3 }, # }, # } not ok 2 # Failed test at /home/lanx/perl/multisort.pl line 39. # Structures begin differing at: # $got->[6] = 'K' # $expected->[6] = 'Z' # { # exp => ["X", "B", "P", "Y", "N", "A", "Z", "K", "L"], # got => ["X", "B", "P", "Y", "N", "A", "K", "L", "Z"], # } 1..2 # Looks like you failed 1 test of 2. [download] Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^3: Merge lists of ordered strings into canonical ordering by NERDVANA (Priest) on Sep 20, 2024 at 06:51 UTC
Thanks! This is pretty similar to what I had, but yours works :-)	[reply]
Re^4: Merge lists of ordered strings into canonical ordering by LanX (Saint) on Sep 20, 2024 at 11:41 UTC
Re^2: Merge lists of ordered strings into canonical ordering by NERDVANA (Priest) on Sep 20, 2024 at 00:19 UTC
Could you please elaborate where and why your $combined2 is better? (UPDATE: If there is a quantifiable reason why "X" should be ordered before "A", I'm sure this can be added to the sort function, see cmp_more_frequently($a,$b) ) It wasn't really a requirement, other than that every time you give the algorithm those inputs in that order, you get the same output. In other words I don't want it to depend on hash randomization, so I think original input order should be used as a tie-breaker. When I built the example, I put 'A' after everything else since there was no relation to the previous rules and the rule with 'A' came later. Same with 'Z' vs. 'K'.	[reply]
Re: Merge lists of ordered strings into canonical ordering by ikegami (Patriarch) on Sep 20, 2024 at 16:32 UTC
Nevermind. Second example contradicts this. You're asking for an order-preserving duplicate remover. `my %seen; my @combined = grep !$seen{ $_ }++, @list1, @list2, @list3;` [download] `use List::Util qw( uniq ); my @combined = uniq @list1, @list2, @list3;` [download]	[reply] [d/l] [select]

updates