TrinityInfinity has asked for the wisdom of the Perl Monks concerning the following question:

OK - Here's a sample line of what I'm trying to sort:

# <b><a href="http://a.website.I.need.to.use/CHARTS/formbrowsec.cgi?AP=CHOPS&FN=F3000022">CHOPS F3000022</a></b> 20000407 Brief problem desc goes here.      SMITH JOHN          ACCEPTED   NGSS70

I need to sort each of these lines by date (20000407 in this example). If you split the above line on whitespace (split(/\s+/, $line)), the date is the 4th item (index 3) in the array returned by split. I'm running into all sorts of problems with sort because of that whole ASCIIbetical thing, and because I can't seem to get $a and $b to point to the right items to sort by.

Can anyone offer any examples of reading from a file, sorting by a particular field, and then outputting the resulting sorted array? Thanks so much. When I can't understand the examples in the O'Reilly book I start to worry...

Replies are listed 'Best First'.
Re: Sorting, arrays and other problems
by mirod (Canon) on Mar 06, 2001 at 23:14 UTC

    It seems like we are answering this kind of question a lot these days...

    So here is a simplified Guttman-Rosler Transform, which sticks the field to sort on at the beginning of the data, sorts on it (using the speed of the native sort), and then removes the field from the data:

    # this should be read from the last line to the top one
    my @sorted =
        map  { substr( $_, 8 ) }          # [3]: remove the date
        sort                              # [2]: sort alphabetically on the date
        map  { ( split ' ' )[3] . $_ }    # [1]: get the date and add it at the beginning of the string
        <DATA>;

    print @sorted;    # each line still ends with its own newline

    # the link text has two words, as in the original sample line,
    # so the date is field 3 (counting from 0)
    __DATA__
    <b><a href="url">some text</a></b> 20000407 text
    <b><a href="url">some text</a></b> 20000409 text
    <b><a href="url">some text</a></b> 20000408 text
    <b><a href="url">some text</a></b> 20000507 text

    Note that this breaks _really_ easily, especially because it uses split to extract the date. If I wanted to use this for anything other than a one-shot script I would definitely use some kind of HTML parser here.
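
    To illustrate how fragile the field index is, here is a minimal sketch of the same prepend, sort, strip idea that finds the date with a regex instead of a fixed split position. It is not an HTML parser, only a less brittle stopgap. The sample lines, the rule "the first standalone run of 8 digits is the date", and the '00000000' fallback key are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the link text has a different number of
# words in each, so a fixed split index would pick the wrong field.
my @lines = (
    qq{<b><a href="url">one</a></b> 20000409 third\n},
    qq{<b><a href="url">two words</a></b> 20000407 first\n},
    qq{<b><a href="url">three more words</a></b> 20000408 second\n},
);

my @sorted =
    map  { substr( $_, 8 ) }    # [3]: strip the 8-character key again
    sort                        # [2]: plain string sort on the prepended key
    map  {                      # [1]: prepend the date as a fixed-width key
        my ($date) = /\b(\d{8})\b/;    # assumption: first standalone 8-digit run
        ( defined $date ? $date : '00000000' ) . $_;
    }
    @lines;

print @sorted;
```

    The plain string sort is safe here only because every key has the same width (8 digits), so string order and numeric order agree.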

Re: Sorting, arrays and other problems
by arturo (Vicar) on Mar 06, 2001 at 23:01 UTC

    I advocate this a lot these days: if you can, use

    lynx -dump "http://site.i.have.to.use/CHARTS/etc"
    and work on plain text.

    The rest of your problem sounds a little vague. Using $a and $b is pretty simple, unless you're playing with references (which one might do in this situation).

    @dates = sort { $a <=> $b } @dates; #numerical sort of dates
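
    For what it's worth, the "ASCIIbetical" trouble mentioned in the question comes from sort comparing strings by default. A small sketch with made-up numbers (not taken from the original data) shows where the two orders diverge:

```perl
use strict;
use warnings;

# Made-up numbers of different widths, chosen so the two orders differ.
my @numbers = ( 10, 9, 100, 1 );

# With no block, sort compares its arguments as strings (like { $a cmp $b }).
my @as_strings = sort @numbers;

# The <=> operator compares them as numbers instead.
my @as_numbers = sort { $a <=> $b } @numbers;

print "string order:  @as_strings\n";    # 1 10 100 9
print "numeric order: @as_numbers\n";    # 1 9 10 100
```

    Dates written as 8 digits (YYYYMMDD) all have the same width, so for them both comparisons give the same order.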

    It would help us a lot more if you could post your code and tell us the error message or how the output you're getting diverges from what you want.

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

      I got it!
      I had the right idea, but didn't get things split right!! Here's my code now:

      #!/usr/bin/perl
      # first we open the temp file w/the records to be sorted, place in an array
      $input = 'frecord.tmp';
      open( INPUT, '<', $input ) || die "Can't open file $input: $!\n";
      @data = <INPUT>;
      close INPUT;

      # sort numerically on the 4th whitespace-separated field (the date)
      @sorted = sort {
          @a_fields = split /\s+/, $a;
          @b_fields = split /\s+/, $b;
          $a_fields[3] <=> $b_fields[3];
      } @data;

      $sortedlength = @sorted;    # array length
      # stop at $sortedlength - 1; each line already ends with a newline
      for ( $i = 0; $i < $sortedlength; $i++ ) {
          print $sorted[$i];
      }

      Thanks for your help!! I do appreciate it! =)

        You're performing the split for each comparison of the elements in @data. This could add a significant amount of time to your file processing. Since the data does not change, it makes sense to do the split one time and cache the results.
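
        To make the cost concrete, here is a rough sketch that counts how often split runs in each approach. The records are invented for the example, and the cache is a plain hash keyed by line, which is just one simple way to split each line only once.

```perl
use strict;
use warnings;

# Hypothetical records; field 3 (counting from 0) holds the date.
my @data = map { "a b c $_ rest\n" }
    qw(20000507 20000407 20000409 20000408 20000101);

# Approach 1: split inside the comparison, as in the code above.
my $naive_splits = 0;
my @naive = sort {
    my @a_fields = split /\s+/, $a; $naive_splits++;
    my @b_fields = split /\s+/, $b; $naive_splits++;
    $a_fields[3] <=> $b_fields[3];
} @data;

# Approach 2: split every line once up front and remember the date.
my $cached_splits = 0;
my %date_of;
for my $line (@data) {
    $date_of{$line} = ( split /\s+/, $line )[3];
    $cached_splits++;
}
my @cached = sort { $date_of{$a} <=> $date_of{$b} } @data;

print "splits inside the comparison: $naive_splits\n";
print "splits done once up front:    $cached_splits\n";
```

        The second count always equals the number of lines, while the first grows with the number of comparisons sort has to make.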

        The Schwartzian Transform is one solution. It combines the sort on the 4th column that you need with a more efficient one-time split:

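        #!/usr/bin/perl -w
        use strict;

        open INPUT, "< $ARGV[0]" or die "Could not open file $ARGV[0]: $!";
        my @sorted =
            map  { $_->[0] }                       # keep only the original line
            sort { $a->[1] <=> $b->[1] }           # compare the cached dates
            map  { [ $_, (split /\s+/)[3] ] }      # pair each line with its date
            <INPUT>;
        close INPUT;

        print for @sorted;    # each line still ends with its own newline
        __END__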

        Here's an explanation of how this works. Please remember you'll need to read the algorithm backwards, from the bottom to the top, to follow this explanation:

        <INPUT> is read in list context, which returns every line of the file, and that list is processed by map. map builds a temporary anonymous array for each line, placing the original line, held in $_, into position 0. It then splits the line and pulls out the 4th column, which goes into position 1.

        Once every line from <INPUT> has been processed by map, the results are all passed on to the sort function. Remember that each element coming into sort is a reference to one of the anonymous arrays we built in the previous step. Now we dereference these and compare the dates (in position 1) to each other.

        Now that all of the lines are sorted in memory, they are passed into map a second time. This map pulls out the position 0 element of each anonymous array, which is the original line, and the resulting list is assigned to @sorted.
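
        If the chained form is hard to follow, the three steps described above can be sketched with a named variable for each stage. The input lines and the variable names @pairs and @sorted_pairs are made up for this illustration; the chained version never stores these intermediate lists.

```perl
use strict;
use warnings;

# Hypothetical input lines; the date is the 4th whitespace-separated field.
my @lines = (
    "x y z 20000409 later\n",
    "x y z 20000407 earliest\n",
    "x y z 20000408 middle\n",
);

# Step 1 (the bottom map): pair each line with its sort key.
# Position 0 holds the original line, position 1 holds the date.
my @pairs = map { [ $_, ( split /\s+/ )[3] ] } @lines;

# Step 2 (the sort): compare position 1 of each anonymous array.
my @sorted_pairs = sort { $a->[1] <=> $b->[1] } @pairs;

# Step 3 (the top map): keep only position 0, the original line.
my @sorted = map { $_->[0] } @sorted_pairs;

print @sorted;
```

        Written this way the steps read top to bottom, at the cost of two extra arrays.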