Re: Sort mechanics problems with objects and potentially contradicting comparisons (can't cause infinite loop)
by tye (Sage) on Jun 01, 2016 at 16:51 UTC
|
which one would expect to cause sort to go into an infinite loop
Not this "one". And the documentation is pretty clear that this can't happen:
In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm whose worst-case behavior is O(NlogN).
An infinite loops is a way worse case than O(NlogN).
And, no, I'm pretty sure that the following bit in the documentation doesn't trump that part:
If it returns inconsistent results (sometimes saying $x[1] is less than $x[2] and sometimes saying the opposite, for example) the results are not well-defined.
"Not well-defined" means that you might not be able to predict the order in which the items are returned. All of the items will be returned, one alias each, and in under O(NlogN) "time". It is just their precise order that is not well-defined.
| [reply] [d/l] [select] |
Re: Sort mechanics problems ...
by haukex (Archbishop) on Jun 01, 2016 at 18:34 UTC
|
use Graph;
my $g = Graph->new(directed => 1);
$g->add_edge(split /:/)
for qw/ a:d a:b c:b d:c /;
my @order = $g->topological_sort;
print join(", ", @order), "\n";
__END__
a, d, c, b
Hope this helps, -- Hauke D | [reply] [d/l] |
|
|
You are indeed correct that the classic solution to the problem is a graph and the type of sort is topological. However, the perl code is not compelled to support every step of the theory. It turns out to be much simpler in practice with perl sort:
if $a is dependent on $b, return -1
if the reverse, return 1
otherwise return 0
this lets $a and $b remain unsorted relative to each other where there is no dependency.
| [reply] [d/l] |
|
|
...But, as you've discovered, it's not actually that simple. Another issue is that you'll need a stable sort algorithm for this to work (see the sort pragma). We're trying to tell you that sort is really not the right tool for this job.
| [reply] |
|
|
This was definitely the most helpful answer. Thanks for taking the trouble!
| [reply] |
Re: Sort mechanics problems ...
by haukex (Archbishop) on Jun 01, 2016 at 15:52 UTC
|
The syntax for custom sort routines is sometimes tricky to get right - towards the end of the sort page there are some examples of the syntax. It sounds like your problem might be fixed with the syntax sort {$obj->_DBsort4create(...args...)} keys %$csvQ, but I'm not sure as you haven't provided runnable example code...
As for your second question, I haven't yet wrapped my head around the potential problem you're describing (no sample code that demonstrates the infinite loop?), but if you're asking about optimizing the sort, then maybe a Schwartzian transform will help?
Regards, -- Hauke D
| [reply] [d/l] |
|
|
Yes I agree with the idea of enclosing the sort call in a block, especially given that Perl documents this approach and it appears all the mechanics are fine, except the potential for infinite loop needing to be caught.
The problem with sample code that is already wrong is that it night have a different problem from the same theoretical weakness. So the issue of circular dependencies should be resolved in theory before writing any more code.
The optimisation part is best done using an orcish manoeuvre than a schwartzian transform because it is a question of not doing the same processing twice although the sort algorithm will call the function many times with the same table appearing in $a or $b for successive iterations.
Except that this time, apart from storing past processing results of individual elements to prevent reiterating them I need also to store past comparisons to check for circular dependencies.
| [reply] |
|
|
The problem with sample code that is already wrong is that it night have a different problem from the same theoretical weakness. So the issue of circular dependencies should be resolved in theory before writing any more code.
Even if we discuss theory, you still need to communicate the potential problem somehow. So I disagree that code isn't useful at this point - it should be possible to construct a piece of sample code that demonstrates the infinite loop issue that you are worried about, even if it takes one or two iterations.
Regards, -- Hauke D
| [reply] |
|
|
|
|
|
|
|
Re: Sort mechanics problems with objects and potentially contradicting comparisons (would cause infinite loop)
by ikegami (Patriarch) on Jun 01, 2016 at 16:50 UTC
|
perl -e'
use strict;
use warnings;
use Data::Dumper qw( Dumper );
sub _DBsort4create {
my $obj = shift;
print(Dumper($obj));
0
}
my $csvQ = { x => 123, y => 456 };
for my $csvq ( sort _DBsort4create keys %$csvQ ) {
}
'
$VAR1 = undef;
If it does happen to return the right object, you are modifying a stack you shouldn't be modifying, which should lead to dire repercussions.
| [reply] [d/l] |
|
|
In your example no object has been blessed
| [reply] |
|
|
Do you really think that adding my $o = bless({}); is going to make a difference? Why? Did you try it?
| [reply] [d/l] |
|
|
|
|
|
|
sub compare {
my $x = shift;
print "\$a = $a, \$b = $b, \$x = $x, \@_ = @_\n";
$a cmp $b;
}
sub test { sort compare 'z','a','b' }
my @z = test('foo', 'bar', 'baz');
print "\@z = @z\n";
output:
$a = z, $b = a, $x = foo, @_ = bar baz
$a = a, $b = b, $x = bar, @_ = baz
$a = b, $b = z, $x = baz, @_ =
@z = a b z
| [reply] [d/l] |
Re: Sort mechanics problems with objects and potentially contradicting comparisons (would cause infinite loop)
by Anonymous Monk on Jun 01, 2016 at 15:47 UTC
|
sort { _compare($obj, $a, $b) } @stuff | [reply] [d/l] |
|
|
That's a way to avoid the undocumented feature, but it is also nonstandard. I'll experiment with a variation though:
sort { $obj->_compare }
which also passes $obj as first parameter and see if I still get $a and $b for free, which IS documented magic. Many thanks for the suggestion.
| [reply] [d/l] |
|
|
and see if I still get $a and $b for free
You can get at them:
use warnings;
use strict;
{ package Bar;
my $obj = Foo->new;
print join ", ",
sort {$obj->mysort($a,$b,"bar")} qw/x a o/;
}
{ package Foo;
sub new { bless {}, shift }
use Data::Dump 'pp';
sub mysort {
pp \@_;
$_[0] cmp $_[1];
}
}
__END__
[bless({}, "Foo"), "x", "a", "bar"]
[bless({}, "Foo"), "x", "o", "bar"]
[bless({}, "Foo"), "o", "a", "bar"]
x, o, a
Updated example: Previously I had mysort reaching into the main package via $::a, which is obviously not the best solution, now I'm passing $a and $b as parameters. (I see the AM made the same suggestion.) Update 2: There are other solutions possible as well.
Hope this helps, -- Hauke D | [reply] [d/l] [select] |
|
|
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] |
|
|
|
|
Re: Sort mechanics problems with objects and potentially contradicting comparisons (would cause infinite loop)
by Anonymous Monk on Jun 01, 2016 at 15:51 UTC
|
For part 2, it sounds like you're trying to do some kind of topological sort, but I'm not sure of the details. | [reply] |
|
|
There are several modules available, but I haven't tried them. It may actually be easier to roll your own than to figure out how the modules work. topological sort
| [reply] |
|
|
I am beginning to agree with you on that. Although I need both a sorted list of groups and a hash for looking up what is in which group to do the homegrown thing, it is not so big a stretch to go from the orcish manoeuvre to a group sort on the same basis that handles the topological versus circular dependency scenario safely.
| [reply] |
Re: Sort mechanics problems with objects and potentially contradicting comparisons (would cause infinite loop)
by anonymized user 468275 (Curate) on Jun 02, 2016 at 08:03 UTC
|
So I decided to let the database handle any circular dependencies (it will fail to define one of the tables and I have to then roll back everything) but apart from that I have a working topological sort. Thanks to all who helped out.
my $csvQ = {};
for my $csv ($obj->_globcsv) {
my $base = basename $csv;
$base =~ s/\.txt$//;
$base = lc($base);
$obj->_csv2arr($csv);
$csvQ->{$base}{table} = $obj->_getset('ar');
}
$obj->_getset('csvQ', $csvQ);
for my $table ( sort _DBsort4create keys %$csvQ ) {
$obj->_getset('ar', $csvQ->{$table}{table});
$obj->_arr2db; # process the csv into the db
}
sub _DBsort4create {
my $obj = shift;
my $csvQ = $obj->_getset('csvQ');
for my $ab ($a, $b) {
unless(exists($csvQ->{$ab}{ancestor})) {
$csvQ->{$ab}{ancestor} = {};
ROWDBS4C: for (my $row=1; $row <= $#{$csvQ->{$ab}{table}}; $ro
+w++) { # avoid first row, which is a header not csv data
($csvQ->{$ab}{table}[$row][0] eq '__COLUMNS__')
and last ROWDBS4C;
if ($csvQ->{$ab}{table}[$row][0] eq 'FK') {
$csvQ->{$ab}{ancestor}{$csvQ->{$ab}{table}[$row][2]}='
+';
}
}
}
}
if (exists($csvQ->{$a}{ancestor}{$b})) {
return 1;
}
elsif (exists($csvQ->{$b}{ancestor}{$a})) {
return -1;
}
return 0;
}
Sorts by ancestral relationship of FKs. Minor optimisation similar to orcish manoevre: only search a csv array for FK entries once.
| [reply] [d/l] |
Re: Sort mechanics problems with objects and potentially contradicting comparisons (would cause infinite loop)
by FloydATC (Deacon) on Jun 02, 2016 at 10:56 UTC
|
I would say that the task of detecting circular references in a data structure does not involve sorting at all. Simply walk the data structure/hierarchy/whatever while keeping track of the path you followed to get there.
Here's some untested code to show what I mean:
sub is_circular {
my $obj = shift; # Object to be traversed
my $path = shift || [];
# Check of $obj is in the $path
foreach my $ancestor (@{$path}) {
return 1 if $ancestor eq $obj;
}
# Check the children recursively, depth first
# (My imaginary $obj conveniently has
# a method for getting an array of its children)
foreach my $child ($obj->children) {
return 1 if is_circular($child, [ @{$path}, $obj ] );
}
return 0;
}
This can be done recursively or in a linear fashion if this suits you better. Modules already exist but writing your own code would probably be faster than trying to fit an existing module with your particular data.
-- FloydATC
Time flies when you don't know what you're doing
| [reply] [d/l] |
|
|
You haven't realised that any traversal would be infinite for the failing cases for the same reasons. Repeat visits to a node in the structure tell you nothing about whether the traversal is finite. And counting those is not really a direct approach. A sort is a kind of traversal too.
| [reply] |
|
|
This problem is known as cycle detection, and there are a number of different solutions, depending on your requirements.
Repeat visits to a node in the structure tell you nothing about whether the traversal is finite.
I'm not sure what you mean by that. Repeat visits to a node tell you that you're in a cycle, and you'll keep going around forever unless you break out (as Floyd's code does).
Most sort algorithms won't loop infinitely if the comparison function is inconsistent. They'll just return the list in a non-meaningful order.
| [reply] |
|
|