Sorting a textfile by a date field

bionicle32 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Sorting a textfile by a date field by Zaxo (Archbishop) on Oct 01, 2003 at 05:49 UTC
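Does your format example say Jan 9 or Sep 1? Whichever it is, transform to ISO standard format (YYYY-MM-DD) and do a dictionary (plain string) sort. As long as month and day are zero-padded, string order is date order.

```perl
sub to_iso {
    # DD-MM-YYYY -> YYYY-MM-DD
    join '-', reverse split '-', shift;
    # otherwise (MM-DD-YYYY -> YYYY-MM-DD),
    # join '-', (split '-', shift)[2,0,1];
}

sub from_iso {
    # YYYY-MM-DD -> DD-MM-YYYY
    join '-', reverse split '-', shift;
    # otherwise (YYYY-MM-DD -> MM-DD-YYYY),
    # join '-', (split '-', shift)[1,2,0];
}

# @data is all your records as an AoA; the date is in field 7.
# Each map block ends with $_ so it passes the record itself down the
# pipeline, not the value of the assignment (the date string).
my @sorted = map  { $_->[7] = from_iso($_->[7]); $_ }
             sort { $a->[7] cmp $b->[7] }
             map  { $_->[7] = to_iso($_->[7]); $_ }
             @data;
```

The maps surrounding the sort routine convert each date once per record instead of once per comparison, which reduces the number of calls to to_iso(). That pattern is a close relative of the Schwartzian Transform, an idiom worth understanding.

After Compline,
Zaxo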
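For comparison, here is a sketch of the textbook Schwartzian Transform, which keeps each computed key in a separate slot next to the untouched record instead of rewriting field 7 and converting it back afterwards. It assumes DD-MM-YYYY dates in field 7 of each record; the sample records and this to_iso() helper are illustrative, not taken from the original question.

```perl
use strict;
use warnings;

# Illustrative helper: DD-MM-YYYY -> YYYY-MM-DD (assumes that input format).
sub to_iso { join '-', reverse split /-/, shift }

# Hypothetical sample records (AoA); the date is assumed to be in field 7.
my @data = (
    [ ('x') x 7, '09-01-2003' ],
    [ ('y') x 7, '25-12-2002' ],
    [ ('z') x 7, '01-10-2003' ],
);

# Schwartzian Transform: pair each record with its key, sort on the key,
# then strip the key off again. The records themselves are never modified.
my @sorted =
    map  { $_->[1] }                    # undecorate: keep only the record
    sort { $a->[0] cmp $b->[0] }        # string sort on the ISO date
    map  { [ to_iso($_->[7]), $_ ] }    # decorate: [ key, record ]
    @data;

print join("\t", @$_), "\n" for @sorted;
```

Run as-is, it prints the records in date order (25-12-2002, 09-01-2003, 01-10-2003), and @data still holds its original date strings because only the temporary [key, record] pairs carry the ISO form.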
Re: Sorting a textfile by a date field by davido (Cardinal) on Oct 01, 2003 at 06:07 UTC
Even at 250 lines of 512 bytes each (pretty long lines), you're only looking at 128k worth of data, so you don't really need to worry about tying an array to a file or anything like that. You can do the most time-efficient / programming-efficient thing and just slurp it in. Slurping in a file in its entirety usually isn't all that great, except when you're sorting, in which case it really can simplify your life.

If you're sure that the last field always contains the date, and you want to sort by date, you can probably do something like this:

```perl
use strict;
use warnings;
use Date::Manip;

my @array = <DATA>;    # Slurp in the whole file, line by line.
@array = map { [split /\t/, $_] } @array;
@array = sort { ParseDate($a->[-1]) cmp ParseDate($b->[-1]) } @array;
@array = map { join "\t", @{$_} } @array;
```

Let me explain that. The first line slurps your whole file into @array. The second line turns each element of @array into an anonymous array where each element is one field from a line of your file; the last field is the date. The third line takes advantage of Date::Manip's ability to turn just about any string into a date that can be compared with cmp. It also uses a sort routine where you've defined the comparison mechanism: compare the last element of the anonymous array held in each element of @array. The fourth (last) line joins up the elements of each anonymous array, so that each element of @array once again contains its original tab-delimited line.

If you want to express that in fewer lines:

```perl
@array = map  { join "\t", @{$_} }
         sort { ParseDate($a->[-1]) cmp ParseDate($b->[-1]) }
         map  { [split /\t/, $_] }
         @array;
```

The above is untested, so you may have to tinker a little to get it to your liking. You may even prefer to use a different module (a standard library one, for example). But the logic demonstrated ought to be a pretty good starting point.

Good luck. I hope this helps!

Dave

"If I had my life to do over again, I'd be a plumber." -- Albert Einstein
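As a sketch of the "standard library" route mentioned above, the same sort can use Time::Piece (a core module since Perl 5.10) and compute each date's epoch once per line with a Schwartzian Transform, instead of calling ParseDate twice per comparison. The sample lines and the DD-MM-YYYY format are assumptions; change the strptime format string to match the real data.

```perl
use strict;
use warnings;
use Time::Piece;    # core module since Perl 5.10

# Hypothetical tab-separated input; the date is assumed to be the last
# field and to use DD-MM-YYYY (adjust the strptime format otherwise).
my @lines = (
    "alpha\t10\t09-01-2003\n",
    "beta\t20\t25-12-2002\n",
    "gamma\t30\t01-10-2003\n",
);

my @sorted =
    map  { $_->[1] }                              # original line back
    sort { $a->[0] <=> $b->[0] }                  # numeric sort on epoch
    map  {
        chomp(my $line = $_);                     # copy without the newline
        my $date = (split /\t/, $line)[-1];       # last field holds the date
        [ Time::Piece->strptime($date, '%d-%m-%Y')->epoch, $_ ];
    }
    @lines;

print @sorted;
```

The chomp works on a copy, so each original line (newline included) is what comes out the other end; this prints the beta, alpha, gamma lines in that order.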
Re: Re: Sorting a textfile by a date field by simonm (Vicar) on Oct 01, 2003 at 16:55 UTC
Or use Data::Sorting to gloss over the details of the transform:

```perl
use Date::Manip;
use Data::Sorting qw( sort_array );

my @array = <DATA>;
# Sort key: the last tab-separated field (without its newline), parsed as a date.
sort_array( @array, sub { ParseDate( (shift) =~ /\t([^\t\n]+)$/ ) } );
```
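To show what "the details of the transform" are, here is a hand-rolled stand-in for a key-based, in-place sort. It is not Data::Sorting's code: the helper name sort_in_place_by(), the sample lines, and the DD-MM-YYYY date format are all hypothetical, and the key sub swaps ParseDate for a plain ISO rewrite so it runs with core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper: compute each key once via the callback, sort on it,
# and write the sorted elements back into the caller's array.
sub sort_in_place_by {
    my ($array, $key_of) = @_;          # array ref, key callback
    @$array =
        map  { $_->[1] }
        sort { $a->[0] cmp $b->[0] }
        map  { [ $key_of->($_), $_ ] }
        @$array;
    return;
}

# Hypothetical lines; the last tab-separated field is a DD-MM-YYYY date.
my @array = (
    "alpha\t09-01-2003\n",
    "beta\t25-12-2002\n",
);

sort_in_place_by(\@array, sub {
    my ($date) = $_[0] =~ /\t([^\t\n]+)$/;   # last field, newline excluded
    join '-', reverse split /-/, $date;      # DD-MM-YYYY -> YYYY-MM-DD
});

print @array;
```

Note the `+` in the key pattern: without it, `[^\t]` matches only a single character, so a multi-character date field would never match.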
Re: Sorting a textfile by a date field by matthewb (Curate) on Oct 01, 2003 at 05:48 UTC
How far have you got? There is, as ever, more than one way to do it but if you only have 250 lines to deal with it seems reasonable to get it all into a suitable data structure and then operate on that. It sounds like this will involve little more than splitting on whitespace in a while loop. But what then? Are the dates unique? If so, you could make them keys in a hash and just output it with a `sort { date_sorting_code() } keys %hash`. But it could be more complicated than that. I reckon if you gave a few sample lines and clarified some characteristics your problem could be solved in a few minutes.

MB
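A minimal sketch of the hash approach, assuming whitespace-separated fields with a DD-MM-YYYY date as the last field and every date unique. Rewriting the date as YYYY-MM-DD means a plain `sort keys` can stand in for the date_sorting_code() placeholder. An in-memory filehandle replaces the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the real file: whitespace-separated
# fields, DD-MM-YYYY date last, and every date assumed unique.
my $text = "gamma 30 01-10-2003\n"
         . "alpha 10 09-01-2003\n"
         . "beta 20 25-12-2002\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %by_date;
while (my $line = <$fh>) {
    my @fields = split ' ', $line;                        # split on whitespace
    my $iso = join '-', reverse split /-/, $fields[-1];   # DD-MM-YYYY -> YYYY-MM-DD
    warn "Duplicate date $iso; keeping the later line\n" if exists $by_date{$iso};
    $by_date{$iso} = $line;                               # date is the hash key
}
close $fh;

# With ISO keys, a plain string sort of the keys is a date sort.
my @sorted = map { $by_date{$_} } sort keys %by_date;
print @sorted;
```

A repeated date would overwrite the earlier line in the hash, which is why the sketch warns when one appears; if dates can repeat, one of the array-based sorts above is the safer choice.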