
Thanks, all. I haven't had time to fully absorb the different transforms, but they sound helpful. kcott, you've pretty much hit the nail on the head, except that your output isn't sorted the way I need it to be. You've got:

01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:14,A,Y

When it needs to be:

01/14/2014,23:44:12,D,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:13,C,X
01/14/2014,23:44:14,A,Y

Is that the expected behavior? (Note: for my needs, when there are multiple entries in the same second, they can be in any order.)

Re^3: sorting logfiles by timestamp
by kcott (Archbishop) on Jan 21, 2014 at 18:29 UTC

    From the information provided, I can see no need for that complex data structure (i.e. @{$myHash{$data2}{info}}).

    This code produces the output you say you want:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    use Time::Piece;

    my @data;

    while (<DATA>) {
        my ($date, $time, $data1, $data2) = split;
        my $key = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S')->epoch;
        push @data, [$key, "$date,$time,$data1,$data2"];
    }

    print for map { $_->[1] } sort { $a->[0] <=> $b->[0] } @data;

    __DATA__
    01/14/2014 23:44:14 A Y
    01/14/2014 23:44:12 B Y
    01/14/2014 23:44:13 C X
    01/14/2014 23:44:12 D X

    Output:

    01/14/2014,23:44:12,B,Y
    01/14/2014,23:44:12,D,X
    01/14/2014,23:44:13,C,X
    01/14/2014,23:44:14,A,Y

    If that doesn't do exactly what you want, it should at least provide sufficient information for you to attempt a solution yourself. If you do need further help, please ensure you post the missing details.

    -- Ken

      Here's essentially the same thing without Time::Piece.

      use strict;
      use warnings;

      my @data;

      while (<DATA>) {
          my ($date, $time, $data1, $data2) = split;
          my $key = $date . $time;
          $key =~ s{^(\d\d)/(\d\d)/(\d\d\d\d)(\d\d):(\d\d):(\d\d)}{$3$1$2$4$5$6};
          push @data, [ $key, join(',', $date, $time, $data1, $data2) ];
      }

      print map { "$_->[1]\n" } sort { $a->[0] <=> $b->[0] } @data;

      __DATA__
      01/14/2014 23:44:14 A Y
      01/14/2014 23:44:12 B Y
      01/14/2014 23:44:13 C X
      01/14/2014 23:44:12 D X
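Both versions above build @data in a loop and then sort it in a separate statement. The same Schwartzian transform (decorate each record with a key, sort on the key, strip the key off again) can also be written as one map/sort/map pipeline. This is a minimal sketch, assuming the same four sample records (inlined here rather than read from __DATA__) and whitespace-separated fields. It compares the rearranged YYYYMMDDHH:MM:SS key as a string, which works because every part of it is fixed-width.

```perl
use strict;
use warnings;

# Same sample records as in the thread, inlined instead of read from __DATA__.
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

# Schwartzian transform, read bottom to top:
#   1. map each record to [sort_key, output_line]
#   2. sort on the precomputed key (fixed-width, so a string compare works)
#   3. map back to just the output line
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  {
        my ($date, $time, $data1, $data2) = split;
        my ($mm, $dd, $yyyy) = split m{/}, $date;
        [ "$yyyy$mm$dd$time", join(',', $date, $time, $data1, $data2) ];
    } @lines;

print "$_\n" for @sorted;
```

Records that fall in the same second (B and D here) are not guaranteed any particular relative order by this key, which matches the requirement that same-second entries can come out in any order.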

      Better yet, skip split() and use a single regular expression to parse the whole log record as I did here. Personally, I'd take this opportunity to improve the log records by keeping the ISO 8601 format timestamps instead of just using them as a transitory sort key.
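A minimal sketch of the single-regex approach, keeping the timestamp in ISO 8601 form instead of discarding it after sorting. The pattern assumes each record is MM/DD/YYYY HH:MM:SS followed by exactly two whitespace-free fields (the simplification debated later in the thread); real log lines may need a different pattern. Because fixed-width ISO 8601 timestamps sort correctly as plain strings, no separate sort key is needed.

```perl
use strict;
use warnings;

# Sample records from the thread, inlined for a self-contained example.
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

my @records;
for my $line (@lines) {
    # One regex parses the whole record: MM/DD/YYYY HH:MM:SS <data1> <data2>.
    # Assumption: <data1> and <data2> contain no whitespace.
    my ($mm, $dd, $yyyy, $hms, $data1, $data2) =
        $line =~ m{^(\d\d)/(\d\d)/(\d{4}) (\d\d:\d\d:\d\d) (\S+) (\S+)$}
        or die "Unparseable log record: $line\n";

    # Rewrite the timestamp as ISO 8601 (YYYY-MM-DDTHH:MM:SS) and keep it.
    push @records, join(',', "$yyyy-$mm-${dd}T$hms", $data1, $data2);
}

# Fixed-width ISO 8601 timestamps sort chronologically as plain strings.
my @sorted = sort @records;

print "$_\n" for @sorted;
```

Since the whole string is compared, records that share a second are ordered by the remaining fields, so the output is deterministic.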

      Even Dave Rolsky sanctions using a regular expression instead of a proper timestamp parser in exactly this kind of situation. See this slide in his presentation titled "A Date with Perl", which you can watch him present here.

      Jim

        I have no problem with your regex approach for the timestamp. The OP seemed to be asking how Time::Piece could be used for sorting, so I showed a method for doing that.

        I agree that ISO 8601 format for the timestamps would be preferable.

        The OP didn't show real data for "<data1> <data2>". I suspect your '(\S+) (\S+)' may well be an oversimplification of what's really required; however, no more so than my splitting on whitespace. :-)

        -- Ken

      (Posting anonymously because I don't have my password saved on this browser.) Apologies: I don't know how many times I looked at your first example, and it never clicked that you had grouped the records by <data2>, which is exactly what I was going for. To clarify, <data2> is a client identifier. I need to grab all log messages (<data1>) for each client, group them by client (hence the hash of arrays with the client ID as key), and then sort each group by time. I haven't tried to implement it yet, but it looks like what I was trying to do. Thanks.
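A minimal sketch of that structure: a hash of arrays keyed on the client ID (<data2>), with each client's messages sorted by time. It assumes whitespace-separated fields as in the sample data; %by_client and %sorted_by_client are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Sample records from the thread: date, time, <data1> (message), <data2> (client ID).
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

# Hash of arrays: client ID => list of [sort_key, "date,time,message"] pairs.
my %by_client;
for my $line (@lines) {
    my ($date, $time, $message, $client) = split ' ', $line;
    my ($mm, $dd, $yyyy) = split m{/}, $date;
    push @{ $by_client{$client} }, [ "$yyyy$mm$dd $time", "$date,$time,$message" ];
}

# Sort each client's messages by timestamp, then drop the sort keys.
my %sorted_by_client;
for my $client (sort keys %by_client) {
    $sorted_by_client{$client} = [
        map  { $_->[1] }
        sort { $a->[0] cmp $b->[0] } @{ $by_client{$client} }
    ];
    print "$client:\n";
    print "  $_\n" for @{ $sorted_by_client{$client} };
}
```

For the sample data this prints client X's messages D then C, and client Y's messages B then A, each group in time order.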