bionicle32 has asked for the wisdom of the Perl Monks concerning the following question:

I have been programming in Perl for about two years and have finally got to a point where I am getting better at using more of Perl's array and hash functions in large-scale programs. I have been challenged with figuring out a way to sort a text file that contains about eight tab-separated fields, the last of which is a date field (format ex. 09-01-2000). They want this file to be sorted by the date field. There are about 250 lines of data in this file. I am not that confident in my programming skills and was wondering if someone has done something like this before. If so, could you provide some help? Thanks! -Bionicle32

Replies are listed 'Best First'.
Re: Sorting a textfile by a date field
by Zaxo (Archbishop) on Oct 01, 2003 at 05:49 UTC

    Does your format example say Jan 9 or Sep 1? Whichever it is, transform to ISO standard format (YYYY-MM-DD) and do a dictionary sort:

    sub to_iso {
        join '-', reverse split /-/, shift;
        # otherwise (MM-DD-YYYY),
        # join '-', (split /-/, shift)[2,0,1];
    }
    sub from_iso {
        join '-', reverse split /-/, shift;
        # otherwise (MM-DD-YYYY),
        # join '-', (split /-/, shift)[1,2,0];
    }

    # @data is all your records as an AoA.
    # Each map block ends with $_ so that it returns the record itself,
    # not the value of the assignment.
    my @sorted =
        map  { $_->[7] = from_iso($_->[7]); $_ }
        sort { $a->[7] cmp $b->[7] }
        map  { $_->[7] = to_iso($_->[7]); $_ }
        @data;

    The maps surrounding the sort routine serve to reduce the number of calls to to_iso(). That pattern is called the Schwartzian Transform, an idiom worth understanding.
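
    For comparison, here is a minimal sketch of the textbook form of that idiom, which pairs each record with a precomputed key instead of rewriting the date field in place. It assumes MM-DD-YYYY input; iso_key and sort_by_date are hypothetical names chosen for illustration, and the sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: turn MM-DD-YYYY into YYYY-MM-DD so that plain string
# comparison orders dates chronologically. For DD-MM-YYYY input, use
# reverse instead of the [2, 0, 1] slice.
sub iso_key {
    my ($date) = @_;
    return join '-', (split /-/, $date)[2, 0, 1];
}

# Sort array-of-arrays records by the date stored in column $col.
sub sort_by_date {
    my ($col, @records) = @_;
    return map  { $_->[1] }                        # 3. unwrap the record
           sort { $a->[0] cmp $b->[0] }            # 2. compare the cached keys
           map  { [ iso_key($_->[$col]), $_ ] }    # 1. pair each key with its record
           @records;
}

# Made-up sample data: two fields per record, date in column 1.
my @data = (
    [ 'widget', '09-01-2000' ],
    [ 'gadget', '01-15-1999' ],
    [ 'gizmo',  '12-31-1999' ],
);
print join("\t", @$_), "\n" for sort_by_date(1, @data);
```

    The key is computed once per record rather than once per comparison, and the original records come back untouched.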

    After Compline,
    Zaxo

Re: Sorting a textfile by a date field
by davido (Cardinal) on Oct 01, 2003 at 06:07 UTC
    Even at 250 lines of 512 bytes each (pretty long lines) you're only looking at 128k worth of data, so you don't really need to worry about trying to tie an array to a file or something like that. You can do the most time-efficient and programming-efficient thing and just slurp it in. Slurping a file in its entirety usually isn't all that great, except when you're sorting, in which case it really can simplify your life.

    If you're sure that the last field always contains the date, and you want to sort by date, you can probably do something like this:

    use strict;
    use warnings;
    use Date::Manip;

    my @array = <DATA>;   # Slurp in the whole file, line by line.
    @array = map { [split /\t/, $_] } @array;
    @array = sort { ParseDate($a->[-1]) cmp ParseDate($b->[-1]) } @array;
    @array = map { join "\t", @{$_} } @array;

    Let me explain that....
    The first line slurps in your whole file into @array.

    The second line turns each element of @array into an anonymous array where each element is one field from a line of your file. The last field is the date.

    The third line takes advantage of Date::Manip's ability to turn just about any string into a date that can be compared with cmp. That line also uses a sort routine where you've defined the comparison mechanism to compare the last element of the anonymous array contained in each line of @array.

    The fourth (last) line joins up the elements of the anonymous array contained in each line of @array, so that each line now contains its original tab delimited version.

    If you want to express that in fewer lines:

    @array = map { join "\t", @{$_} } sort{ParseDate($a->[-1]) cmp ParseDate($b->[-1])} map { [split /\t/, $_] } @array;

    The above is untested, so you may have to tinker a little to get it to your liking. You may even prefer to use a different module (a standard library one, for example). But the logic demonstrated ought to be a pretty good starting point.
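
    As one possible standard-library variant, here is a sketch using the core Time::Piece module. It assumes the dates are month-day-year (change $FORMAT if the day comes first), and sort_lines_by_last_field is a hypothetical name. It also caches one epoch value per line, so the date is parsed once per line rather than once per comparison.

```perl
use strict;
use warnings;
use Time::Piece;    # core module

# Assumed input format: month-day-year. Use '%d-%m-%Y' if the day comes first.
my $FORMAT = '%m-%d-%Y';

# Takes raw tab-separated lines and returns them sorted by the date
# found in the last field.
sub sort_lines_by_last_field {
    my @lines = @_;
    chomp @lines;    # keep the trailing newline out of the date field
    return map  { $_->[1] . "\n" }       # restore the newline on output
           sort { $a->[0] <=> $b->[0] }  # numeric compare on epoch seconds
           map  { [ Time::Piece->strptime((split /\t/)[-1], $FORMAT)->epoch, $_ ] }
           @lines;
}

# Made-up sample lines; in practice these would come from the file.
my @lines = (
    "alpha\tx\t09-01-2000\n",
    "beta\ty\t01-15-1999\n",
    "gamma\tz\t12-31-1999\n",
);
print sort_lines_by_last_field(@lines);
```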

    Good luck. I hope this helps!

    Dave

    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein

      Or use Data::Sorting to gloss over the details of the transform:
      use Date::Manip;
      use Data::Sorting qw( sort_array );

      my @array = <DATA>;
      # Capture the whole last field (one or more non-tab characters).
      sort_array( @array, sub { ParseDate( (shift) =~ /\t([^\t\n]+)$/ ) } );
Re: Sorting a textfile by a date field
by matthewb (Curate) on Oct 01, 2003 at 05:48 UTC
    How far have you got?

    There is, as ever, more than one way to do it but if you only have 250 lines to deal with it seems reasonable to get it all into a suitable data structure and then operate on that.

    It sounds like this will involve little more than splitting on whitespace in a while loop.

    But what then? Are the dates unique? If so, you could make them keys in a hash and just output it with a sort { date_sorting_code() } keys %hash. But it could be more complicated than that.
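
    If the dates turn out not to be unique, one way around it is a hash of arrays, so that lines sharing a date are kept together rather than overwritten. This is only a sketch: it assumes the last field is MM-DD-YYYY, group_by_date is a hypothetical name, and the sample lines are made up.

```perl
use strict;
use warnings;

# Group lines under a YYYY-MM-DD key so that duplicate dates are kept
# and a plain "sort keys" yields chronological order.
sub group_by_date {
    my %by_date;
    for my $line (@_) {
        chomp(my $copy = $line);
        # Last tab-separated field, assumed to be MM-DD-YYYY.
        my ($m, $d, $y) = split /-/, (split /\t/, $copy)[-1];
        push @{ $by_date{"$y-$m-$d"} }, $copy;
    }
    return \%by_date;
}

my $by_date = group_by_date(
    "alpha\t09-01-2000\n",
    "beta\t01-15-1999\n",
    "gamma\t09-01-2000\n",
);
for my $key (sort keys %$by_date) {
    print "$_\n" for @{ $by_date->{$key} };
}
```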

    I reckon if you gave a few sample lines and clarified some characteristics your problem could be solved in a few minutes.

    MB