Re^4: sorting logfiles by timestamp

Here's essentially the same thing without Time::Piece.

use strict;
use warnings;

my @data;

while (<DATA>) {
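    # Fields: date (MM/DD/YYYY), time (hh:mm:ss), and two data columns.
    my ($date, $time, $data1, $data2) = split;
    my $key = $date . $time;

    # Rearrange the key into YYYYMMDDhhmmss so it sorts chronologically.
    $key =~ s{^(\d\d)/(\d\d)/(\d\d\d\d)(\d\d):(\d\d):(\d\d)}{$3$1$2$4$5$6};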

    push @data, [ $key, join(',', $date, $time, $data1, $data2) ];
}

print map { "$_->[1]\n" } sort { $a->[0] <=> $b->[0] } @data;

__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X

Better yet, skip split() and use a single regular expression to parse the whole log record as I did here. Personally, I'd take this opportunity to improve the log records by keeping the ISO 8601 format timestamps instead of just using them as a transitory sort key.
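
A minimal sketch of that idea follows. It is not the code linked above, which isn't reproduced in this thread. The helper name iso_record is hypothetical, and the sketch assumes MM/DD/YYYY dates, two whitespace-free data fields, and the "T"-separated form of ISO 8601.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one log record with a single regex and return
# it with the timestamp rewritten as ISO 8601 (YYYY-MM-DDThh:mm:ss).
# Assumes MM/DD/YYYY dates and two whitespace-free data fields.
sub iso_record {
    my ($line) = @_;
    my ($mon, $day, $year, $time, $data1, $data2) = $line =~ m{
        ^(\d\d)/(\d\d)/(\d{4})   # date as MM/DD/YYYY
        \s+(\d\d:\d\d:\d\d)      # time as hh:mm:ss
        \s+(\S+)\s+(\S+)\s*\z    # placeholder patterns for the data fields
    }x or return;                # skip records that don't match
    return "${year}-${mon}-${day}T$time,$data1,$data2";
}

my @lines = (
    "01/14/2014 23:44:14 A Y",
    "01/14/2014 23:44:12 B Y",
    "01/14/2014 23:44:13 C X",
    "01/14/2014 23:44:12 D X",
);

# Each record now starts with an ISO 8601 timestamp, so a plain string
# sort puts the records in chronological order.
my @sorted = sort map { iso_record($_) } @lines;
print "$_\n" for @sorted;
```

Because the reformatted timestamp leads each record, no separate sort key is needed. Records with identical timestamps fall back to comparing the rest of the line.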

Even Dave Rolsky sanctions using a regular expression instead of a proper timestamp parser in exactly this kind of situation. See this slide in his presentation titled A Date with Perl, which you can watch him present here.

Jim


Replies are listed 'Best First'.
Re^5: sorting logfiles by timestamp by kcott (Archbishop) on Jan 22, 2014 at 14:37 UTC
I have no problem with your regex approach for the timestamp. The OP seemed to be asking how `Time::Piece` could be used for sorting, so I showed a method for doing that.

I agree that ISO 8601 format for the timestamps would be preferable.

The OP didn't show real data for "`<data1> <data2>`". I suspect your '`(\S+) (\S+)`' may well be an oversimplification of what's really required; however, no more so than my `split`ting on whitespace. :-)

-- Ken
Re^6: sorting logfiles by timestamp by Jim (Curate) on Jan 22, 2014 at 18:04 UTC
"I have no problem with your regex approach for the timestamp."

And I have no problem with your parser approach for the timestamp. TIMTOWTDI. ☺

Did you happen to view Dave Rolsky's brief presentation of the slide titled Don't Use a Parser? He explains his qualified recommendation perfectly.

"I agree that ISO 8601 format for the timestamps would be preferable."

Yep. Unless jasonl is required to maintain fidelity to the original representation of the timestamps in the logs for some peculiar reason, this is his best opportunity to improve the data by keeping the reformatted timestamps in ISO 8601 format.

"The OP didn't show real data for "<data1> <data2>". I suspect your '(\S+) (\S+)' may well be an oversimplification of what's really required; however, no more so than my splitting on whitespace. :-)"

My '`(\S+) (\S+)`' was an intentional simplification, not an inadvertent oversimplification. Although I didn't explicitly state it, I was tacitly making the point that jasonl could parse the whole log record, including the timestamps, with a single regex. The pattern I used in my code snippet to match `<data1>` and `<data2>` was just a placeholder, one that happens to match the placeholder strings "`<data1>`" and "`<data2>`" literally. ☺

As you've pointed out, jasonl didn't include in his post any verisimilar example data besides just the timestamps, so we can't possibly know how to parse his actual log records properly.

Jim
Re^7: sorting logfiles by timestamp by kcott (Archbishop) on Jan 23, 2014 at 13:59 UTC
"Did you happen to view Dave Rolsky's brief presentation of the slide titled Don't Use a Parser?"

The link to the presentation gave: "This video is unavailable." (I was able to view the slide.)

-- Ken
Re^8: sorting logfiles by timestamp by Jim (Curate) on Jan 24, 2014 at 16:50 UTC