bionicle32 has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Perl Monks,

I am in need of your help again. My approach may not be the best, but I will try to explain what I am trying to do. I have a directory of text files with information in each of them. I have done the easiest part, which was to read the directory, pull specific information out of each file, and write it to one file for further manipulation.

Now that I have this one file, I have read it into an array, which I now need to sort by the fourth subscript (array[3]), which is the date field. I know I could make it my first field, but I arranged the fields by order of importance.

I have read and even attempted a few of the examples in previous write-ups, but have had no success. I am using a tab as my delimiter, which shouldn't make a difference. Here is a snippet of my code.
for (@links) {
    my ($status, $reference, $project, $created) = split(/\t/, $_);
    if ($status eq "Completed") {
        push(@complete, $_);
    } else {
        push(@progress, $_);
    }
}
my @sortedComplete = sort { $a->[3] <=> $b->[3] } @complete;
my @sortedProgress = sort { $a->[3] <=> $b->[3] } @progress;
The four fields are defined after the declaration of my for loop. I would greatly appreciate your help. All I need is to keep all of the rows as is, but sorted by the date the project was created.

Thank you all,
Bionicle32

Replies are listed 'Best First'.
Re: Sorting a slurped file by a date field
by hardburn (Abbot) on Feb 03, 2004 at 23:03 UTC

    You're almost there. There are a few different approaches you can take to get the rest of the way.

    The simplest is to store a reference to an array of fields in each element of your existing arrays (thus creating an array of arrays). In this case, you wouldn't need to modify your current sort lines. However, it would change the layout of your data structure, so other code will probably have to change.
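A minimal sketch of the array-of-arrays approach described above. The sample rows are hypothetical, and the fourth field is assumed to be a plain number (epoch seconds here) so that `<=>` compares it correctly:

```perl
use strict;
use warnings;

# Hypothetical sample rows: status, reference, project, created.
# ASSUMPTION: created is a number (epoch seconds), not a date string.
my @links = (
    "Completed\tREF-1\tAlpha\t1039772571\n",
    "In Progress\tREF-2\tBeta\t1039115990\n",
    "Completed\tREF-3\tGamma\t1039115990\n",
);

my (@complete, @progress);
for my $line (@links) {
    chomp(my $row = $line);
    my @fields = split /\t/, $row;
    # Store a reference to the field array instead of the raw line.
    if ($fields[0] eq 'Completed') { push @complete, \@fields }
    else                           { push @progress, \@fields }
}

# Each element is now an array reference, so $a->[3] is the fourth field.
my @sortedComplete = sort { $a->[3] <=> $b->[3] } @complete;

# Rebuild the tab-separated lines when the rows are written back out.
my @output = map { join("\t", @$_) . "\n" } @sortedComplete;
print @output;
```

The cost is the one mentioned above: any code that expects each element to be a tab-separated string has to join the fields again, as the last `map` does.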

    The second way is to use a Schwartzian Transform, which pulls the data apart before it gets to sort and then puts it back together. It looks like this:

    my @sortedComplete =
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }    # element 1 is the extracted date; element 0 is the whole line
        map  { [ $_, (split /\t/, $_)[3] ] }
        @complete;
    # Do the same for @progress

    The third solution is the GRT (Guttman-Rosler Transform), which is conceptually similar to the Schwartzian but a bit faster. The difference is that in the Schwartzian you pull the sort key into a separate slot of a temporary data structure, whereas in the GRT you keep everything in the same string. This means you can use Perl's default sort comparison, which is implemented in C and will likely be much faster than calling a Perl block. It looks something like this (totally untested):

    my @sortedComplete =
        map { join "\t", (split /\t/)[1,2,3,0] }
        sort
        map { join "\t", (split /\t/)[3,0,1,2] }
        @complete;
    # Do the same for @progress

    The trick here is that the field you want to sort on goes first in the string. Note, though, that if two dates are the same, they'll be sorted based on the characters following the date.
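Another common way to write a GRT is to prepend a fixed-width packed key to each line rather than reordering the fields. The following is only a sketch: the rows are hypothetical, and it assumes the fourth field is a non-negative integer that fits in 32 bits (such as epoch seconds), so that `pack 'N'` yields a 4-byte big-endian key whose byte order matches numeric order:

```perl
use strict;
use warnings;

# Hypothetical rows. ASSUMPTION: the fourth field is a 32-bit unsigned integer.
my @complete = (
    "Completed\tREF-1\tAlpha\t1039772571",
    "Completed\tREF-2\tBeta\t5",
    "Completed\tREF-3\tGamma\t1039115990",
);

my @sortedComplete =
    map  { substr $_, 4 }                          # strip the 4-byte key again
    sort                                           # default string sort, no Perl block
    map  { pack('N', (split /\t/, $_)[3]) . $_ }   # prepend the big-endian key
    @complete;

print "$_\n" for @sortedComplete;
```

Because the key has a fixed width, removing it is a single `substr`, and the original line is never rearranged. As with the field-reordering version, rows with equal keys fall back to being ordered by the rest of the line.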

    Note: All code is untested, unless otherwise stated

Re: Sorting a slurped file by a date field
by ysth (Canon) on Feb 03, 2004 at 22:49 UTC
    It looks like you are expecting the arrays to have each element contain an array of $status,$reference,$project,$created, but you're not actually storing them that way. Try changing your pushes to be like:
    push @complete, [$status, $reference, $project, $created];
      I am expecting each row in the array to contain "$status\t$reference\t$project\t$created\n"

      The array has 96 rows so far and each row has four elements tab separated.

      Here I am slurping the entire file into an array.
      open (LINKS, "$linksQuery") || die ("Can't open $linksQuery: $!");
      my @links = <LINKS>;
      close (LINKS);
      I take it one step further by separating this one array into two arrays based on $status.
      for (@links) {
          chomp($_);
          my ($status, $reference, $project, $created) = split(/\t/, $_);
          if ($status eq "Completed") {
              push(@complete, $_);
          } else {
              push(@progress, $_);
          }
      }
      After I have these two arrays I want to sort each in most recent to latest order. I used the other response to my post, which did sort both arrays, but "12/13/02 09:42:51 CST" happens to appear before "12/05/02 13:19:50 CST" when it should be the other way around.

      I hope this explanation makes things a little clearer.

      Thanks for your help again, Bionicle32
        Thanks, that's a lot clearer. To sort in descending order, simply replace $a with $b and $b with $a; e.g. in place of { $a->[3] <=> $b->[3] }, use { $b->[3] <=> $a->[3] }.

        I hope your $created's are actual numbers, not strings as you show. If you actually have strings in your data, you are going to have to parse them further, since they won't be easily comparable as is.
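If the dates really are strings like the ones quoted above, one way to make them comparable is to rebuild each into a fixed-width key that sorts correctly as text. This is only a sketch: `date_key` is a hypothetical helper, the rows are made up, and it assumes every date is in "MM/DD/YY hh:mm:ss TZ" form, in the same time zone and the same century:

```perl
use strict;
use warnings;

# Hypothetical helper: turn "MM/DD/YY hh:mm:ss TZ" into "YYMMDDhhmmss".
# ASSUMPTION: all rows share one time zone and one century.
sub date_key {
    my ($created) = @_;
    my ($mon, $day, $yr, $h, $m, $s) =
        $created =~ m{^(\d\d)/(\d\d)/(\d\d) (\d\d):(\d\d):(\d\d)}
        or die "Unrecognized date: $created";
    return "$yr$mon$day$h$m$s";
}

# Hypothetical rows in the layout described in the thread.
my @complete = (
    "Completed\tREF-1\tAlpha\t12/13/02 09:42:51 CST",
    "Completed\tREF-2\tBeta\t12/05/02 13:19:50 CST",
    "Completed\tREF-3\tGamma\t01/20/03 08:00:00 CST",
);

# Schwartzian Transform: pair each line with its key, sort on the key, unwrap.
my @oldestFirst =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }    # string compare works on fixed-width keys
    map  { [ $_, date_key((split /\t/, $_)[3]) ] }
    @complete;

# Reverse for most-recent-first order.
my @newestFirst = reverse @oldestFirst;

print "$_\n" for @oldestFirst;
```

The rows themselves are left untouched; only the temporary key is rearranged. Swapping `$a` and `$b` in the comparison would give the descending order directly instead of using `reverse`.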