Re: Merge lists of ordered strings into canonical ordering
by tybalt89 (Monsignor) on Sep 19, 2024 at 21:52 UTC
|
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11161855
use warnings;
use List::AllUtils qw( min_by nsort_by );
$SIG{__WARN__} = sub { die @_ };
my @list1= qw( K B I S Y A Q );
my @list2= qw( B S N A );
my @list3= qw( Y N Q );
my $combined= fancy_algorithm(\@list1, \@list2, \@list3);
use Data::Dump 'dd'; dd 'combined', $combined;
my @lists= (
[qw( X P Y )],
[qw( X B Y N )],
[qw( P B N )],
[qw( X B P N )],
[qw( X B P Y )],
[qw( A Z )],
[qw( A K L )],
);
$combined= fancy_algorithm(@lists);
use Data::Dump 'dd'; dd 'combined', $combined;
sub fancy_algorithm
{
my (%names, @order);
@names{ @$_ } = () for my @args = @_;
use Data::Dump 'dd'; dd 'input', \@args;
while( @args )
{
my %names;
my $see = 1;
$names{$_} += $see++ for map @$_, @args;
my @names = nsort_by { $names{$_} } sort +keys %names;
my %score;
for ( @args )
{
my ($head, @rest) = @$_;
$score{$head} -= 1e9;
$score{$_} //= 0 for @rest;
}
my $pick = min_by { $score{$_} } @names;
push @order, $pick;
$_ = [ grep $pick ne $_, @$_ ] for @args;
@args = grep @$_, @args;
}
return \@order;
}
Outputs:
(
"input",
[
["K", "B", "I", "S", "Y", "A", "Q"],
["B", "S", "N", "A"],
["Y", "N", "Q"],
],
)
("combined", ["K", "B", "I", "S", "Y", "N", "A", "Q"])
(
"input",
[
["X", "P", "Y"],
["X", "B", "Y", "N"],
["P", "B", "N"],
["X", "B", "P", "N"],
["X", "B", "P", "Y"],
["A", "Z"],
["A", "K", "L"],
],
)
("combined", ["X", "B", "P", "Y", "N", "A", "Z", "K", "L"])
| [reply] [d/l] [select] |
|
|
Very interesting! It took me a while to figure out what you were doing here - basically just assuming that the next in the sequence must be found at the start of the remaining options, choosing one that is either the most occurrences at starts of remaining lists or first seen, removing it from the options, then repeat? It seems like you can simplify that a bit though. (I'm not sure what the 1e9 was about unless that was from an earlier attempt?)
sub fancy_algorithm {
my @args= @_;
p @args;
my @order;
while (@args) {
my ($pick, %score, %seen_order);
my $see = 1;
$seen_order{$_} += $see++ for map @$_, @args;
for (@args) {
my $n= --$score{$_->[0]};
$pick= $_->[0] if !defined $pick or $n < $score{$pick}
or $n == $score{$pick} && $seen_order{$_->[0]} < $seen_ord
+er{$pick};
}
push @order, $pick;
$_ = [ grep $pick ne $_, @$_ ] for @args;
@args = grep @$_, @args;
}
return \@order;
}
It seems to work in every case I've tried, though it gives less ideal results with:
my @lists= (
[qw( S T )],
[qw( S U )],
[qw( Y Z )],
[qw( X Y )],
[qw( W X )],
[qw( V W )],
[qw( U V )],
);
# S U V W X Y T Z
| [reply] [d/l] [select] |
Re: Merge lists of ordered strings into canonical ordering
by tybalt89 (Monsignor) on Sep 20, 2024 at 16:24 UTC
|
Sometimes algorithms are shorter as strings instead of arrays.
Not quite the same answer as yours, but I'd claim it's good enough :)
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11161855
use warnings;
use List::AllUtils qw( max_by );
$SIG{__WARN__} = sub { die @_ };
my @list1= qw( K B I S Y A Q );
my @list2= qw( B S N A );
my @list3= qw( Y N Q );
my $combined= fancy_algorithm(\@list1, \@list2, \@list3);
use Data::Dump 'dd'; dd 'combined', $combined;
my @lists= (
[qw( X P Y )],
[qw( X B Y N )],
[qw( P B N )],
[qw( X B P N )],
[qw( X B P Y )],
[qw( A Z )],
[qw( A K L )],
);
$combined= fancy_algorithm(@lists);
use Data::Dump 'dd'; dd 'combined', $combined;
@lists= (
[qw( S T )],
[qw( S U )],
[qw( Y Z )],
[qw( X Y )],
[qw( W X )],
[qw( V W )],
[qw( U V )],
);
$combined= fancy_algorithm(@lists);
use Data::Dump 'dd'; dd 'combined', $combined;
sub fancy_algorithm
{
my $rule = join "\n", map "@$_", @_;
my @order;
print "$rule\n"; # FIXME only for testing
while( $rule =~ /\S/ )
{
my ($head) = max_by { my $n = () = $rule =~ /\b$_\b/g }
grep { $rule !~ /\w +$_\b/ } my @front = sort $rule =~ /^ *(\w+)
+\b/gm;
$head //= shift @front; # for breaking cycles
push @order, $head;
$rule =~ s/\b$head\b//g;
}
return \@order;
}
Outputs:
K B I S Y A Q
B S N A
Y N Q
("combined", ["K", "B", "I", "S", "Y", "N", "A", "Q"])
X P Y
X B Y N
P B N
X B P N
X B P Y
A Z
A K L
("combined", ["X", "A", "K", "L", "Z", "B", "P", "Y", "N"])
S T
S U
Y Z
X Y
W X
V W
U V
("combined", ["S", "U" .. "Y", "T", "Z"])
| [reply] [d/l] [select] |
Re: Merge lists of ordered strings into canonical ordering
by hv (Prior) on Sep 19, 2024 at 18:46 UTC
|
I think a graph approach makes sense, as suggested by Anonymous Monk. I think you can do this using a directed Graph with countedged to record the fact that an ordering has been seen multiple times, then break loops by reducing the weight of all edges in the loop by the minimum of them and removing any edge that reaches zero. It's not immediately obvious to me what should happen when there are multiple intersecting loops - you'd probably need to experiment to find a viable approach for that case.
| [reply] |
Re: Merge lists of ordered strings into canonical ordering
by Corion (Patriarch) on Sep 19, 2024 at 18:18 UTC
|
I would look at Algorithm::Diff or simply sequentially merging the columns into the master result until all columns have been merged into their respective places.
| [reply] |
|
|
The problem with this approach is it can't preserve the unknown status of the orderings, to be resolved by later orderings. In the first example, "S Y A" and "S N A" should combine in a way that expresses a very low preference for Y coming before N, and then that low preference should get overridden by the strong preference for Y coming before N by the 3rd rule. (and it would have made a better example if I put N before Y in the 3rd rule because then the strong preference would be contrary to the default order)
| [reply] |
Re: Merge lists of ordered strings into canonical ordering
by Anonymous Monk on Sep 19, 2024 at 18:15 UTC
|
| [reply] |
|
|
Topological sort is basically it, but with the catch that sometimes the user contradicts themself.
| [reply] |
Re: Merge lists of ordered strings into canonical ordering
by LanX (Saint) on Sep 19, 2024 at 21:00 UTC
|
I missed your second test, not sure if it's because --
- I don't understand your criteria "select the ordering that occurs more frequently"
- avoiding ambiguity can't be tested
- or my code is faulty
Could you please elaborate where and why your $combined2 is better?
(UPDATE: If there is a quantifiable reason why "X" should be ordered before "A", I'm sure this can be added to the sort function, see cmp_more_frequently($a,$b) )
The basic idea is to build a sort function which sorts by the combined less-then relation stored in a HoH %cmp.
Contradictory orders are calculated by increments and decrements.
updates
- renamed %lt to %cmp
- DISCLAIMER: This is a hack, and can be done better and more elegantly, and I'm sure it lacks proper testing.
- OK there is definitely a bug, looking into it
| [reply] [d/l] [select] |
|
|
use v5.12;
use warnings;
use Data::Dump qw/pp dd ddx/;
use Test::More;
my @list1= qw( K B I S Y A Q );
my @list2= qw( B S N A );
my @list3= qw( Y N Q );
my $combined = combined_order(\@list1, \@list2, \@list3);
my $expected = [qw( K B I S Y N A Q )];
is_deeply( $combined, $expected )
or diag pp { got => $combined, exp => $expected };
my @lists2= (
[qw( X P Y )],
[qw( X B Y N )],
[qw( P B N )],
[qw( X B P N )],
[qw( X B P Y )],
[qw( A Z )],
[qw( A K L )],
);
# my $dbg_sort = gen_sort_by_orders(@lists2);
# ($a,$b) = qw/B X/;
# is(signum(&$dbg_sort), 1, "$a cmp $b");
# ($a,$b) = qw/X B/;
# is(signum(&$dbg_sort), -1, "$a cmp $b");
my $combined2 = combined_order(@lists2);
my $expected2 = [qw( X B P Y N A Z K L )];
is_deeply( $combined2, $expected2 )
or diag pp { got => $combined2, exp => $expected2 };
sub combined_order {
my @orders = @_;
my @total = map {@$_} @orders;
my %uniq;
@uniq{@total} =();
@total = keys %uniq;
my $osort = gen_sort_by_orders (@orders);
#ddx @total;
return [ sort {&$osort} @total ];
}
sub signum {
my ($val) = @_;
return 0 unless $val; # 0 or undef
return $val / abs($val);
}
sub gen_sort_by_orders {
my @orders = @_;
my %cmp; # relation matrix
my %count;
for my $order (@orders) {
my @list = @$order;
while (my $before = shift @list) {
for my $after (@list) {
# comparing lesser < greater is enough bc of symmetry
$before lt $after
? $cmp{$before}{$after}--
: $cmp{$after}{$before}++;
# we count the relations for second criteria
$count{$before}++;
$count{$after}++;
}
}
}
ddx { "Matrix:" => \%cmp, "Count:" => \%count }; # debu
+g
my $osort = sub {
return
(
$a lt $b
? ( $cmp{$a}{$b} // 0 )
: - ( $cmp{$b}{$a} // 0 )
)
|| ( $count{$b} - $count{$a} ) # more relations win
|| $a cmp $b # default anti ambiguity
};
return $osort;
}
done_testing;
perl /home/lanx/perl/multisort.pl
# multisort.pl:83: {
# "Count:" => { A => 9, B => 9, I => 6, K => 6, N => 5, Q => 8, S =
+> 9, Y => 8 },
# "Matrix:" => {
# A => { B => 2, I => 1, K => 1, N => 1, Q => -1, S =
+> 2, Y => 1 },
# B => { I => -1, K => 1, N => -1, Q => -1, S => -2,
+Y => -1 },
# I => { K => 1, Q => -1, S => -1, Y => -1 },
# K => { Q => -1, S => -1, Y => -1 },
# N => { Q => -1, S => 1, Y => 1 },
# Q => { S => 1, Y => 2 },
# S => { Y => -1 },
# },
# }
ok 1
# multisort.pl:83: {
# "Count:" => { A => 3, B => 11, K => 2, L => 2, N => 8, P => 10, X
+ => 11, Y => 8, Z => 1 },
# "Matrix:" => {
# A => { K => -1, L => -1, Z => -1 },
# B => { N => -3, P => -1, X => 3, Y => -2 },
# K => { L => -1 },
# N => { P => 2, X => 2, Y => 1 },
# P => { X => 3, Y => -2 },
# X => { Y => -3 },
# },
# }
not ok 2
# Failed test at /home/lanx/perl/multisort.pl line 39.
# Structures begin differing at:
# $got->[6] = 'K'
# $expected->[6] = 'Z'
# {
# exp => ["X", "B", "P", "Y", "N", "A", "Z", "K", "L"],
# got => ["X", "B", "P", "Y", "N", "A", "K", "L", "Z"],
# }
1..2
# Looks like you failed 1 test of 2.
| [reply] [d/l] [select] |
|
|
Thanks! This is pretty similar to what I had, but yours works :-)
| [reply] |
|
|
|
|
Could you please elaborate where and why your $combined2 is better?
(UPDATE: If there is a quantifiable reason why "X" should be ordered before "A", I'm sure this can be added to the sort function, see cmp_more_frequently($a,$b) )
It wasn't really a requirement, other than that every time you give the algorithm those inputs in that order, you get the same output. In other words I don't want it to depend on hash randomization, so I think original input order should be used as a tie-breaker. When I built the example, I put 'A' after everything else since there was no relation to the previous rules and the rule with 'A' came later. Same with 'Z' vs. 'K'.
| [reply] |
Re: Merge lists of ordered strings into canonical ordering
by ikegami (Patriarch) on Sep 20, 2024 at 16:32 UTC
|
Nevermind. Second example contradicts this.
You're asking for an order-preserving duplicate remover.
my %seen;
my @combined = grep !$seen{ $_ }++, @list1, @list2, @list3;
use List::Util qw( uniq );
my @combined = uniq @list1, @list2, @list3;
| [reply] [d/l] [select] |