Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have an issue with sorting the contents of a memory buffer, as described below. I'm not sure:
1. How to do this?
2. What the most efficient method is?
The input file looks something like this (I can't change the format of this file):
a 0 f g j k
b 4 y h t e e e
c 1 n s v c x
d 3 f f f
I've written a bit of code to produce an HTML page for use with MIME::Lite. This all works fine, except that I need to sort the contents on the second field (descending) of each row, not on the first, which is the order in which the file is presented to me.
This is the code I have to create the HTML page from the input file:
my $buffer;
open (INPF,"<input.dat") or die "Can't open input.dat: $!\n";
while (<INPF>) {
    m/^(\w+)\s+(\d+)\s+(\w+)(.*)$/ or die "Unable to match any lines: $!\n";
    if ($2 < 90) {
        $buffer .= "<tr bgcolor='#00FF00'><td>$1</td><td>$2</td><td>$3</td><td>$4</td></tr>\n";
    } elsif ($2 < 180) {
        $buffer .= "<tr bgcolor='#FF6600'><td>$1</td><td>$2</td><td>$3</td><td>$4</td></tr>\n";
    } else {
        $buffer .= "<tr bgcolor='#FF0000'><td>$1</td><td>$2</td><td>$3</td><td>$4</td></tr>\n";
    }
}
close INPF;
How do I sort the resulting buffer on field 2 before passing it on to the mail routine?
I should point out that the input file is unlikely to grow beyond about 30,000 records (approx. 20 MB).
Any help appreciated.

Replies are listed 'Best First'.
Re: most efficient buffer sort
by GrandFather (Saint) on Dec 14, 2005 at 11:09 UTC

    I'm not sure what you are trying to achieve, but if the following (leaving out the HTML stuff for the moment) does what you want let us know:

    use strict;
    use warnings;

    my @lines;
    while (<DATA>) {
        my ($first, $sort, $second, $tail) = m/^(\w+)\s+(\d+)\s+(\w+)(.*)$/;
        die "Unable to match any lines:$!\n" if ! defined $tail;
        push @lines, [$sort, $first, $second, $tail];
    }

    print join "\n", map {"$_->[1] $_->[0] $_->[2]$_->[3]"} sort {$a->[0] <=> $b->[0]} @lines;

    __DATA__
    a 0 f g j k
    b 4 y h t e e e
    c 1 n s v c x
    d 3 f f f

    Prints:

    a 0 f g j k
    c 1 n s v c x
    d 3 f f f
    b 4 y h t e e e
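For the descending order asked for in the question, a minimal sketch along the same lines might look like the following. It uses core Perl only; the `row_colour` helper and the in-memory `@input` sample are made up for illustration, and the colour thresholds are copied from the question's code.

```perl
use strict;
use warnings;

# Sample rows in the question's format: name, numeric key, then the rest.
my @input = (
    "a 0 f g j k\n",
    "b 4 y h t e e e\n",
    "c 1 n s v c x\n",
    "d 3 f f f\n",
);

# Hypothetical helper: pick the row colour using the question's thresholds.
sub row_colour {
    my ($n) = @_;
    return $n < 90 ? '#00FF00' : $n < 180 ? '#FF6600' : '#FF0000';
}

# Parse every line first, keeping the sort key at index 0.
my @rows;
for my $line (@input) {
    my ($first, $key, $second, $tail) = $line =~ m/^(\w+)\s+(\d+)\s+(\w+)(.*)$/
        or die "line does not match input format: $line";
    push @rows, [$key, $first, $second, $tail];
}

# Swapping $a and $b in the comparison gives a descending numeric sort.
my $buffer = '';
for my $r (sort { $b->[0] <=> $a->[0] } @rows) {
    my ($key, $first, $second, $tail) = @$r;
    $buffer .= sprintf "<tr bgcolor='%s'><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>\n",
        row_colour($key), $first, $key, $second, $tail;
}
print $buffer;
```

The HTML is built only after sorting, so the sort never has to look inside the generated markup.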

    DWIM is Perl's answer to Gödel
Re: most efficient buffer sort
by salva (Canon) on Dec 14, 2005 at 11:55 UTC
    use Sort::Key qw(ikeysort);

    my $buffer;
    open (INPF,"<input.dat") or die "Can't open input.dat: $!\n";
    for (ikeysort {/^\w+\s+(\d+)/; $1} <INPF>) {
        m/^(\w+)\s+(\d+)\s+(\w+)(.*)$/ or die "Unable to match any lines:$!\n";
        if ($2 < 90) {
            $buffer .= "<tr bgcolor='#00FF00'><td>$1</td><td>$2</td><td>$3</td><td>$4</td></tr>\n";
        } elsif ($2 < 180) {
            $buffer .= "<tr bgcolor='#FF6600'><td>$1</td><td>$2</td><td>$3</td><td>$4</td></tr>\n";
        } else {
            $buffer .= "<tr bgcolor='#FF0000'><td>$1</td><td>$2</td><td>$3</td><td>$4</td></tr>\n";
        }
    }
    close INPF;
      This is good stuff. However, having downloaded your CPAN module, I changed the code above so that it does a descending sort, e.g.
      for (rikeysort {/^\w+\s+(\d+)/; $2} <INPF>) {
      This error now occurs:
      Can't call method "rikeysort" on an undefined value at ./x.pl line 401, <INPF> line 12.
      There are only 12 records in my test input file.
      It works fine if I use the ascending key sort.
      Any ideas what is happening here?
        Have you changed the use Sort::Key statement to import rikeysort instead of ikeysort?
        use Sort::Key qw(rikeysort);
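One plausible reading of that error message can be sketched with core Perl only. When a name followed by a block is not a known sub, perl treats the statement as an indirect-object method call on whatever the block returns, and an undefined block value then gives the "Can't call method" failure at run time. The name `not_imported` below is made up and stands in for a sort function that was never imported.

```perl
use strict;
use warnings;

my @data = (3, 1, 2);

# 'not_imported' is a made-up name. The statement sits inside a string eval
# so that the run-time failure can be captured instead of ending the script.
my $ok = eval q{
    my $key;    # stays undefined, much like an unset capture variable
    my @sorted = not_imported { $key } @data;
    1;
};
my $error = $ok ? '' : $@;
print "error: $error";
```

Once the name has been imported, the same statement is compiled as an ordinary function call taking a block and a list.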
Re: most efficient buffer sort
by serf (Chaplain) on Dec 14, 2005 at 12:15 UTC
    I have been a bit naughty in assuming that it's OK to use
    my ($first, $sort, $second, $tail) = split(/\s+/, $_, 4);
    to replace your:
    m/^(\w+)\s+(\d+)\s+(\w+)(.*)$/
    because if you have multiple spaces immediately after $second they will be lost. As you're putting the data into HTML table cells, though, this shouldn't be an issue.

    NB: I used split(/\s+/, $_, 4) and not split(' ', $_, 4) because you were matching /^(\w+), which may just have been for efficiency and anchoring, but I don't know that you didn't need to make sure that there was no leading white space on the line in the input file.

    I have not used GrandFather's map because I needed to do quite a lot to the elements returned by the sort (the tests on $sort and the sprintf), and it looked like it was going to be messy and possibly difficult to fit it all in there.

    I like map, but I tend to shy away from using it for anything beyond the most basic cases. 95%+ of the people I've ever worked with who have to deal with Perl would not be able to understand how the map worked, but they could all unroll a foreach loop if they needed to change the code after I had moved on to my next contract.

    use strict;
    use warnings;

    my $buffer = '';
    my $input = "input.dat";

    my %colour = (
        'lt_90'  => '#00FF00',
        'lt_180' => '#FF6600',
        'gt_179' => '#FF0000'
    );

    my @lines;

    open( INPF, '<', $input ) or die "Can't read '$input': $!\n";
    while (<INPF>) {
        chomp;
        # The limit of 4 keeps the rest of the line together in $tail.
        my ($first, $sort, $second, $tail) = split(/\s+/, $_, 4);
        if ( defined $tail && $sort =~ /^\d+$/ ) {
            push(@lines, [$first, $sort, $second, $tail]);
        }
        else {
            # die "Unable to match any lines: $!\n";
            # Do you really want to die with $! here?
            # There wasn't an error, just a failed test,
            # so $! won't have a relevant message in it.
            # (I get "Bad file descriptor" YMMV on your OS?)
            # Perhaps you want something like:
            die "line: '$_' does not match input format\n";
        }
    }
    close (INPF);

    for my $line ( sort { $a->[1] <=> $b->[1] } @lines ) {
        my ($first, $sort, $second, $tail) = @{$line};
        my $colour = "gt_179";
        if ( $sort < 90 ) {
            $colour = "lt_90";
        }
        elsif ( $sort < 180 ) {
            $colour = "lt_180";
        }
        $buffer .= sprintf (
            "<tr bgcolor='%s'><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>\n",
            $colour{$colour}, $first, $sort, $second, $tail);
    }
    print $buffer;
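The difference between the two split patterns mentioned in the NB can be seen in a small sketch. The sample strings are invented, and the third call (with a limit argument) is an extra illustration of how the rest of a line can be kept together in the last field.

```perl
use strict;
use warnings;

my $line = "  a 0 f g j k";    # note the leading spaces

# A ' ' pattern is special-cased: leading whitespace is skipped first.
my @awk_style = split ' ', $line;

# A /\s+/ pattern treats the leading spaces as a separator, so the first
# field comes back as an empty string.
my @regex = split /\s+/, $line;

# A limit of 4 stops splitting after three separators, leaving the rest of
# the line (inner spacing intact) in the last field.
my @limited = split /\s+/, "a 0 f g  j k", 4;

printf "awk-style first field: '%s'\n", $awk_style[0];
printf "regex first field:     '%s'\n", $regex[0];
printf "limited last field:    '%s'\n", $limited[3];
```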
      I like map...

      Calling die from inside a map block is not a very good idea; it can hit a bug in the perl interpreter that has only recently been corrected:

      for (1) { map { die } 2 }
      Good point re the die statement. It's just habit to put that line in. I've changed it accordingly. Thanks.
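A minimal sketch of one way to keep die out of map altogether: validate the lines in an ordinary loop first, then leave map with formatting only. The sample lines and the `@bad` list are assumptions made for illustration, not part of the code posted above.

```perl
use strict;
use warnings;

my @input = ("a 0 f g j k", "b 4 y h t e e e", "not a record");

# Validate in an ordinary loop and collect anything that does not match.
my (@rows, @bad);
for my $line (@input) {
    if (my @f = $line =~ m/^(\w+)\s+(\d+)\s+(\w+)(.*)$/) {
        push @rows, \@f;
    }
    else {
        push @bad, $line;
    }
}

# Report failures outside any map block. The eval is only here so that the
# sketch carries on and shows the formatted rows as well.
my $ok = eval {
    die "line: '$bad[0]' does not match input format\n" if @bad;
    1;
};
print "rejected: $@" unless $ok;

# map now only formats rows that are already known to be valid.
my @html = map { sprintf "<tr><td>%s</td><td>%s</td></tr>", @{$_}[0, 1] }
           sort { $b->[1] <=> $a->[1] } @rows;
print "$_\n" for @html;
```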