in reply to Re^6: sort != sort
in thread sort != sort

I'm not saying it is a perl bug, but I am saying it is not a bug in my code, or an inconsistency in the data. I'm completely satisfied with this explanation (and thank you for egging me on to that point!) since I do not see anything going wrong beyond the fact that @Notes simply remains unchanged at the end of the sort. If you want to explain further about what exactly you mean about the use of Data::Dumper that I haven't done already (that's where the previous dump came from), I'm listening (the bit about getting "a string that is valid Perl code, so you can use Data::Dumper to get a canned version of the data that exhibits this behaviour" is not clear to me). For the curious, here's the package for the object:

package PNSearch;

use strict;
use warnings FATAL => qw(all);
use CGIutil;

our $Log = "PNSearch.log";
$SIG{__WARN__} = \&logwarn;

sub logwarn { CGIutil->logger($Log,shift) }

sub new {   # represents a note
    my $self = {};
    (my $fname, $self->{terms}, my $ln, my $dbh) = (pop,pop,pop,pop);
    my $cur = 0;
    while (<$dbh>) {
        next unless (++$cur == $ln);
        $_ =~ s/^([^>]+?)<\|>(.*?)\((.*?)\)\s*<\|>//;
        if (!defined $1) {
            CGIutil->logger($Log, "No href defined: $fname\n$_\n\n");
            return undef;
        }
        my $href = $1;
        if (!defined $2) {
            $self->{date} = "&mdash;";
            $self->{title} = "[no title]";
        } else {
            $self->{date} = $2;
            if (!defined $3) { $self->{title} = "[no title]" }
            else { $self->{title} = $3 };
        }
        $self->{href} = "<a class=\"ntitle\" href=\"/archives/$fname.html#$href\">";
        $self->{body} = $_;
        last;
    }
    bless($self);
}

sub hilight {
    (my $self, my $term) = (shift,shift);
    my @left = split /</,$self->{body};
    foreach (@left) {   # @right halves each elem of @left
        my @right = split />/,$_;
        next if ($#right < 1);   # no half = <tag><tag>
        $right[1] =~ s/($term)/<em class="hlite">$1<\/em>/g;
        $_ = join(">",@right);
    }
    $self->{body} = join("<",@left);
    $self->{title} =~ s/($term)/<em class="hlite">$1<\/em>/g;
}

1;

Here's the actual construction:

my @Notes;   # array of PNSearch objects
foreach my $file (@Files) {
    next unless (-f "$DBDir/$file" && !-z "$DBDir/$file");
    # scan text only database
    # each file represents one .html page, each line represents one whole note
    unless (open(DB, "<$DBDir/$file")) {
        CGIutil->logger($Log,"!!Could not open $DBDir/$file: $!");
        next;
    }
    my @lines = ();   # array of arrays, 0 = line number 1 = terms found: qv. checkline() below
    my $ln = 1;
    while (<DB>) {
        my @found = ($ln,checkline($_));
        push @lines, \@found if $found[1];
        $ln++;
    }
    close(DB);
    # pull selected notes from markup database
    my $cur = 0;   # last line in db
    my $MUH;
    unless (open($MUH, "<$DBDir/markup/$file")) {
        CGIutil->logger($Log,"!!Can't open /markup/$file: $!");
        next;
    }
    foreach my $l (@lines) {
        my $pns = PNSearch->new($MUH, $l->[0]-$cur, $l->[1], $file);
        push @Notes, $pns if ($pns);
        $cur = $l->[0];
    }
    close($MUH);
}

sub checkline {
    my $line = pop;
    my $c = 0;
    foreach (@Terms) {
        $c += 1 if ($line =~ /$Pfix<\|>.*?$_/);   # anchor name is before first <|> (don't search that)
    }
    # nb: return value is the number of terms found, not the number of individual hits
    # ie, if there is only one search term, this will be 0 or 1
    return $c;
}

The databases are not relational, they are flat files. Example of the plaintext source:

30 April 2008 (Possession of <|>30 April 2008 (Possession of "extreme pornography") <|>SNIP
29 April 2008 (Labor Department and whistleblower law)<|>29 April 2008 (Labor Department and whistleblower law) <|>SNIP
29 April 2008 (Dalit woman refused treatment and dies)<|>29 April 2008 (Dalit woman refused treatment and dies) <|>SNIP
29 April 2008 (Veterans and suicide)<|>29 April 2008 (Veterans and suicide) <|>SNIP
28 April 2008 (Cluster bombs in Iraq)<|>28 April 2008 (Cluster bombs in Iraq)<|>SNIP

The only difference between that one and the "markup" one is the SNIPPED part contains html.
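
To make the record layout concrete, here is a minimal sketch of how one such line comes apart under the same substitution that PNSearch::new uses. The sample line is copied from the plaintext example above, with "SNIP" standing in for the note body; the %note hash is only an illustration and is not part of the original code.

```perl
use strict;
use warnings;

# Sample record from the plaintext example; "SNIP" stands in for the note body.
my $line = '29 April 2008 (Veterans and suicide)<|>29 April 2008 (Veterans and suicide) <|>SNIP';

my %note;    # hypothetical holder for the parsed fields
# Same pattern as PNSearch::new: anchor name, date, (title), and the body is what is left.
if ( $line =~ s/^([^>]+?)<\|>(.*?)\((.*?)\)\s*<\|>// ) {
    @note{qw(href date title)} = ( $1, $2, $3 );
    $note{body} = $line;
}

printf "%-5s => '%s'\n", $_, $note{$_} for qw(href date title body);
```

Note that the date capture keeps its trailing space, since the pattern only stops at the opening parenthesis.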

NB: all the db files have already been verified line by line to ensure they are structured correctly. And as I've said, the final output demonstrates no mistakes in the data set. I can't force anyone to believe that, of course. Notice I'm using fatal warnings and logging inconsistencies (there are none reported at this point, and the db has about 15000 notes in it).

ps. for the astute: yes, those hrefs are not uri_encoded, however, that was not my decision, I'm working to spec.

Re^8: sort != sort (mod_perl?)
by tye (Sage) on Oct 25, 2010 at 20:01 UTC

    If this is mod_perl and you turn on warnings, you might see "will not stay shared" in your error log as a hint to what might be causing the problem.
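
For readers who have not met that warning, here is a minimal sketch of the situation it describes. It is not taken from the code in this thread. Registry-style mod_perl handlers wrap the whole script in a subroutine, so a named sub that reads a file-level "my" variable ends up nested inside another sub and stays bound to the first instance of that variable. The names handler_nested, count_nested, and handler_closure are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the wrapper sub a registry-style handler
# puts around a CGI script.
sub handler_nested {
    my @terms = @_;
    # Named sub nested in another sub: it binds to the FIRST @terms only.
    # Compiling this line is what produces "will not stay shared".
    sub count_nested { return scalar @terms }
    return count_nested();
}

# Same idea with an anonymous sub, which captures @terms afresh on each call.
sub handler_closure {
    my @terms = @_;
    my $count = sub { return scalar @terms };
    return $count->();
}

my @nested  = ( handler_nested(qw(a b)),  handler_nested(qw(x)) );
my @closure = ( handler_closure(qw(a b)), handler_closure(qw(x)) );

print "nested:  @nested\n";
print "closure: @closure\n";
```

The second call to handler_nested still reports 2 because the inner sub keeps looking at the array from the first call, while the anonymous-sub version reports 1.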

    - tye        

      Nope.
Re^8: sort != sort
by Corion (Patriarch) on Oct 25, 2010 at 19:13 UTC

    So, where in the code you've shown do you sort the things?

    Likely, there is some place where you either sort the wrong thing, sort by the wrong column or assign the result to the wrong thing (a global variable instead of a lexical variable or the other way around). But to determine that, we'll need to see the code as a (small) whole instead of as a collection of selected excerpts.

      The only sort that happens is the one I've shown you, on the array constructed above (notice the array name, @Notes). There is nothing else to it. You have 100% of the code in which @Notes is referenced here, and 100% of the code which constructed it. If anyone wants to take the time to look for errors, I'm of course grateful. Just to be extra clear:
      1. @Notes is constructed using the code and package above.
      2. @Notes is sorted using the code shown earlier.
      3. Sorted array is displayed making use of PNSearch::hilight, but this is subsequent to everything else; in fact I've been logging this with "die" right after the sort. Without that die all the notes do appear correctly, by which point the data has passed through more than a few error checks. As I said, none of those if (!defined) fire, I'm logging warnings and making them fatal, use strict, use Taint, not a peep. Numbers compare to numbers, strings are not mixed up, mangled, or foreshortened etc, etc. On visual inspection all the search finds and highlights are also correct.

      Maybe also worth mentioning this is a revision (new features) of code I wrote last year that has been used on a production server daily since then. So it's already gone thru a lot of tweaking, testing, usage/error logging, and debugging. I've been asked to implement this elsewhere, and I've never received any complaints.

        halfcountplus,

        Here's my two cents. Please consider the following:

        1. The various sort functions are well-known and I doubt the built-in sort has a new bug.

        2. Even if the built-in sort is somehow broken, you said you used a couple of other sorts, as well, and I doubt they all broke at once.

        3. When you feed sort a block, as below,
          @Notes = sort { $b->{terms} <=> $a->{terms} } @Notes;
          you're comparing number to number (<=>), or text to text (cmp). It returns a -1, 0, or 1. Constructions like the one above are VERY common and work in the general case. You've made an assumption that @Notes is not being assigned to, but if the comparison operators are working, and the sort functions are working, and the assign-to operator is working, then the only thing left is to look at what's actually being compared.

        4. Which, I hasten to point out, is the one thing we haven't seen yet except as a small excerpt of dumped hashes. We can't duplicate all your code because we don't have all your modules or your databases, and all you are showing us is your code and part of your databases. Please use Data::Dumper or YAML or whatever you like and Dump the actual structure of @Notes and post at least a subset. Maybe here:
          foreach my $l (@lines) {
              my $pns = PNSearch->new($MUH, $l->[0]-$cur, $l->[1], $file);
              push @Notes, $pns if ($pns);
              $cur = $l->[0];
          }
          close($MUH);
          }

          use Data::Dumper;
          open DUMP, ">somefile.txt" or die $!;
          print DUMP Dumper(\@Notes);   # Dumper only returns a string, so it has to be printed
          close DUMP;
        5. No one has called you a liar, and it doesn't have to be a bug in your code (which you said you updated, by the by!!!), but obviously something isn't working and that's why people keep asking to see what's actually in @Notes, rather than hear your assurances about it.

        Sincerely,

        --marmot
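
To illustrate the "canned data" idea and the sort check discussed in this thread, here is a minimal sketch under stated assumptions. The three objects are made-up stand-ins for PNSearch objects with only {terms} and {title} filled in; the real @Notes would come from the construction code earlier in the thread. The dump is evaluated back into a data structure and sorted with the same comparison block, so anyone can rerun it without the original databases.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-ins for PNSearch objects (made-up values).
my @Notes = map { bless { terms => $_->[0], title => $_->[1] }, 'PNSearch' }
    ( [ 1, 'one' ], [ 3, 'three' ], [ 2, 'two' ] );

# Terse drops the "$VAR1 =" prefix, so the dump is a bare expression
# that can be eval'd under "use strict"; Sortkeys makes the output stable.
local $Data::Dumper::Terse    = 1;
local $Data::Dumper::Sortkeys = 1;
my $canned = Dumper( \@Notes );    # a string of valid Perl code

# Rebuild the structure from the string, as someone else could from a posted dump.
my $restored = eval $canned or die "dump did not eval: $@";

my @before = map { $_->{terms} } @$restored;
my @sorted = sort { $b->{terms} <=> $a->{terms} } @$restored;
my @after  = map { $_->{terms} } @sorted;

print "before: @before\n";
print "after:  @after\n";
```

If the "after" line comes out in the same order as "before" for a real dump, the {terms} values themselves are the first thing to inspect, since <=> compares them numerically.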