
The issue is, plain and simple, that @Notes is not being assigned to. Here's further evidence:

I put the comparison checks back in and logged them this way:

    my $tmp1 = "";
    @Notes = sort {
        $tmp1 .= $b->{terms}.$a->{terms};
        my $r = $b->{terms} <=> $a->{terms};
        $tmp1 .= "($r) ";
        return $r;
    } @Notes;

    my $tmp2 = "";
    my @Sorted = sort {
        $tmp2 .= $b->{terms}.$a->{terms};
        my $r = $b->{terms} <=> $a->{terms};
        $tmp2 .= "($r) ";
        return $r;
    } @Notes;

    open(LOG,">/var/www/dump/tmp.txt");
    print LOG "$tmp1\n$tmp2\n";
    if ($tmp1 ne $tmp2) { print LOG "UNEQUAL" }

$tmp1 does equal $tmp2, demonstrating that the sort operation occurred and occurred exactly the same way both times. I have also compared @Notes to a copy made before the operation, and it does not change. So again, the issue is, plain and simple, that @Notes is not being assigned to by sort @Notes. Sorry to be stubborn and assertive here, but what else is there to believe at that point? It is not that @Notes gets sorted wrongly, or that the one sort operation differs from the other.
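For reference, a minimal sketch of the kind of before/after check described above. The data is made up (five hashes with only a terms key) and stands in for the real PNSearch objects; the variable names @before, @after, $changed and $order are illustrative only. It compares the addresses of the elements before and after the in-place sort, so it detects reordering even when several notes have the same terms count.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Made-up stand-in data: only the 'terms' key matters for the sort.
my @Notes = map { +{ terms => $_ } } (1, 1, 2, 1, 2);

# Record the identity (memory address) of each element in its current position.
my @before = map { refaddr $_ } @Notes;

# The same in-place, descending sort as in the post.
@Notes = sort { $b->{terms} <=> $a->{terms} } @Notes;

my @after   = map { refaddr $_ } @Notes;
my $changed = "@before" ne "@after" ? 1 : 0;
my $order   = join ",", map { $_->{terms} } @Notes;

print "order after sort: $order\n";
print "array reordered:  ", ($changed ? "yes" : "no"), "\n";
```

If a standalone check like this reorders the array but the same statement inside the CGI script does not, the difference lies in the surrounding code or environment rather than in the sort expression itself.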

Of course, it's not a big deal, as using the @Sorted array is fine and I can undef @Notes afterward. But if anyone thinks this looks like a bona fide Perl bug (the first one I'll have found) and wants to give me some tips on how that could be verified using my existing code, I'd probably be willing to investigate. Otherwise...

ps. I also did this using both "use sort _mergesort" and "use sort _quicksort". The quicksort does slightly fewer comparisons (kind of a surprise, considering the 11111111121111111112111 nature of the data) but otherwise the result is exactly the same.

Re^6: sort != sort
by Corion (Patriarch) on Oct 25, 2010 at 18:41 UTC

    I'm not saying that what you're seeing is not a Perl bug. But your code is not yet convincing. You are fairly close though - you just need to show where @Notes is filled with data, and with what data. Note that Data::Dumper will output a string that is valid Perl code, so you can use Data::Dumper to get a canned version of the data that exhibits this behaviour.
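A minimal sketch of what "a canned version of the data" could look like in practice. The three notes here are hypothetical stand-ins (the real @Notes would come from the script), and the variable names $canned and @sorted are illustrative. Data::Dumper->Dump writes the structure out as a Perl assignment, which can be pasted into a small test script, or eval'ed as done here, to rebuild the same objects without the databases.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the real @Notes: blessed hashes with a 'terms' count.
my @Notes = map { bless { terms => $_, title => "note $_" }, 'PNSearch' } (1, 2, 1);

local $Data::Dumper::Sortkeys = 1;   # stable key order, easier to diff
local $Data::Dumper::Indent   = 1;   # compact indentation

# Produces a string of the form '$Notes = [ bless( {...}, 'PNSearch' ), ... ];'
my $canned = Data::Dumper->Dump([\@Notes], ['Notes']);
print $canned;

# In a separate test script the string would simply be pasted in as code.
# Here it is eval'ed to show that it really is valid Perl.
our $Notes;
eval $canned;
die $@ if $@;

# The rebuilt data can now be sorted in isolation from the CGI script.
my @sorted = sort { $b->{terms} <=> $a->{terms} } @$Notes;
print join(",", map { $_->{terms} } @sorted), "\n";
```

Posting the dumped string together with the sort statement gives others something they can run directly.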

      I'm not saying it is a Perl bug, but I am saying it is not a bug in my code or an inconsistency in the data. I'm completely satisfied with this explanation (and thank you for egging me on to that point!), since I do not see anything going wrong beyond the fact that @Notes simply remains unchanged at the end of the sort. If you want to explain further what exactly you mean about using Data::Dumper in a way I haven't already (that's where the previous dump came from), I'm listening; the bit about getting "a string that is valid Perl code, so you can use Data::Dumper to get a canned version of the data that exhibits this behaviour" is not clear to me. For the curious, here's the package for the object:

      package PNSearch;

      use strict;
      use warnings FATAL => qw(all);
      use CGIutil;

      our $Log = "PNSearch.log";
      $SIG{__WARN__} = \&logwarn;

      sub logwarn { CGIutil->logger($Log,shift) }

      sub new { # represents a note
          my $self = {};
          (my $fname, $self->{terms}, my $ln, my $dbh) = (pop,pop,pop,pop);
          my $cur = 0;
          while (<$dbh>) {
              next unless (++$cur == $ln);
              $_ =~ s/^([^>]+?)<\|>(.*?)\((.*?)\)\s*<\|>//;
              if (!defined $1) {
                  CGIutil->logger($Log, "No href defined: $fname\n$_\n\n");
                  return undef;
              }
              my $href = $1;
              if (!defined $2) {
                  $self->{date} = "&mdash;";
                  $self->{title} = "[no title]";
              } else {
                  $self->{date} = $2;
                  if (!defined $3) { $self->{title} = "[no title]" }
                  else { $self->{title} = $3 };
              }
              $self->{href} = "<a class=\"ntitle\" href=\"/archives/$fname.html#$href\">";
              $self->{body} = $_;
              last;
          }
          bless($self);
      }

      sub hilight {
          (my $self, my $term) = (shift,shift);
          my @left = split /</,$self->{body};
          foreach (@left) { # @right halves each elem of @left
              my @right = split />/,$_;
              next if ($#right < 1); # no half = <tag><tag>
              $right[1] =~ s/($term)/<em class="hlite">$1<\/em>/g;
              $_ = join(">",@right);
          }
          $self->{body} = join("<",@left);
          $self->{title} =~ s/($term)/<em class="hlite">$1<\/em>/g;
      }

      1;

      Here's the actual construction:

      my @Notes; # array of PNSearch objects
      foreach my $file (@Files) {
          next unless (-f "$DBDir/$file" && !-z "$DBDir/$file");
          # scan text only database
          # each file represents one .html page, each line represents one whole note
          unless (open(DB, "<$DBDir/$file")) {
              CGIutil->logger($Log,"!!Could not open $DBDir/$file: $!");
              next;
          }
          my @lines = (); # array of arrays, 0 = line number 1 = terms found: qv. checkline() below
          my $ln = 1;
          while (<DB>) {
              my @found = ($ln,checkline($_));
              push @lines, \@found if $found[1];
              $ln++;
          }
          close(DB);
          # pull selected notes from markup database
          my $cur = 0; # last line in db
          my $MUH;
          unless (open($MUH, "<$DBDir/markup/$file")) {
              CGIutil->logger($Log,"!!Can't open /markup/$file: $!");
              next;
          }
          foreach my $l (@lines) {
              my $pns = PNSearch->new($MUH, $l->[0]-$cur, $l->[1], $file);
              push @Notes, $pns if ($pns);
              $cur = $l->[0];
          }
          close($MUH);
      }

      sub checkline {
          my $line = pop;
          my $c = 0;
          foreach (@Terms) {
              $c += 1 if ($line =~ /$Pfix<\|>.*?$_/); # anchor name is before first <|> (don't search that)
          }
          # nb: return value is the number of terms found, not the number of individual hits
          # ie, if there is only one search term, this will be 0 or 1
          return $c;
      }

      The databases are not relational, they are flat files. Example of the plaintext source:

      30 April 2008 (Possession of <|>30 April 2008 (Possession of "extreme pornography") <|>SNIP
      29 April 2008 (Labor Department and whistleblower law)<|>29 April 2008 (Labor Department and whistleblower law) <|>SNIP
      29 April 2008 (Dalit woman refused treatment and dies)<|>29 April 2008 (Dalit woman refused treatment and dies) <|>SNIP
      29 April 2008 (Veterans and suicide)<|>29 April 2008 (Veterans and suicide) <|>SNIP
      28 April 2008 (Cluster bombs in Iraq)<|>28 April 2008 (Cluster bombs in Iraq)<|>SNIP

      The only difference between that one and the "markup" one is the SNIPPED part contains html.

      NB: all the db files have already been verified line by line to ensure they are structured correctly. And as I've said, the final output demonstrates no mistakes in the data set. I can't force anyone to believe that, of course. Notice that I'm using fatal warnings and logging inconsistencies (none are reported at this point, and the db has about 15000 notes in it).

      ps. for the astute: yes, those hrefs are not uri_encoded, however, that was not my decision, I'm working to spec.

        If this is mod_perl and you turn on warnings, you might see "will not stay shared" in your error log as a hint to what might be causing the problem.

        - tye        
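A minimal sketch of the situation the "will not stay shared" hint refers to. It is an illustration only: whether it applies here depends on code not shown in this thread. The names handler, sort_in_place and @notes are hypothetical. As far as I understand it, registry-style mod_perl handlers compile the whole script inside a subroutine, so a named sub in the script becomes a nested named sub, and a file-level "my" array becomes a lexical of the enclosing sub. The nested sub stays bound to the array from the first call only.

```perl
use strict;
use warnings;

# Collect warnings, including the compile-time one, instead of printing them.
our @warnings;
BEGIN { $SIG{__WARN__} = sub { push @warnings, $_[0] } }

# Stands in for the wrapper that a registry-style handler puts around a script.
sub handler {
    my @notes = @_;

    # A named sub nested inside another sub: it captures the @notes that
    # existed during the first call to handler() and keeps using that one.
    sub sort_in_place { @notes = sort { $b <=> $a } @notes }

    sort_in_place();
    return "@notes";
}

my $first  = handler(1, 3, 2);   # first call: inner and outer share @notes
my $second = handler(1, 3, 2);   # later calls: inner sorts the stale copy

print "first call:  $first\n";   # 3 2 1
print "second call: $second\n";  # 1 3 2, the outer array is left untouched
print "warning:     $_" for grep { /will not stay shared/ } @warnings;
```

From the outside, the second call looks exactly like a sort whose result is never assigned: the comparisons run, but the array the caller sees does not change.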

        So, where in the code you've shown do you sort the things?

        Likely, there is some place where you either sort the wrong thing, sort by the wrong column or assign the result to the wrong thing (a global variable instead of a lexical variable or the other way around). But to determine that, we'll need to see the code as a (small) whole instead of as a collection of selected excerpts.
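A minimal sketch of the "global variable instead of a lexical variable" case mentioned above, using made-up numbers rather than PNSearch objects. The helper notes_as_seen_elsewhere and the result variables are hypothetical names. Two different arrays share the name @Notes: a package variable and a lexical declared in an inner scope. Sorting one leaves the other as it was.

```perl
use strict;
use warnings;

our @Notes = (1, 3, 2);                            # package (global) @Notes

# Compiled while only the package variable is in scope, so it reads that one.
sub notes_as_seen_elsewhere { return "@Notes" }

my $lexical_result;
{
    my @Notes = (1, 3, 2);                         # a different, lexical @Notes
    @Notes = sort { $b <=> $a } @Notes;            # sorts the lexical only
    $lexical_result = "@Notes";
}

my $global_result = notes_as_seen_elsewhere();

print "lexical \@Notes: $lexical_result\n";        # 3 2 1
print "package \@Notes: $global_result\n";         # 1 3 2
```

Seeing the declaration of @Notes, the sort statement and the code that reads the result together in one small script is what would rule this case in or out.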