From the information provided, I can see no need for that complex data structure (i.e. @{$myHash{$data2}{info}}).

This code produces the output you say you want:

#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::Piece;

my @data;

while (<DATA>) {
    my ($date, $time, $data1, $data2) = split;
    my $key = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S')->epoch;
    push @data, [$key, "$date,$time,$data1,$data2"];
}

print for map { $_->[1] } sort { $a->[0] <=> $b->[0] } @data;

__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X

Output:

01/14/2014,23:44:12,B,Y
01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:14,A,Y

If that doesn't do exactly what you want, it should at least provide sufficient information for you to attempt a solution yourself. If you do need further help, please ensure you post the missing details.

-- Ken

Re^4: sorting logfiles by timestamp
by Jim (Curate) on Jan 21, 2014 at 22:57 UTC

    Here's essentially the same thing without Time::Piece.

    use strict;
    use warnings;

    my @data;

    while (<DATA>) {
        my ($date, $time, $data1, $data2) = split;
        my $key = $date . $time;
        $key =~ s{^(\d\d)/(\d\d)/(\d\d\d\d)(\d\d):(\d\d):(\d\d)}{$3$1$2$4$5$6};
        push @data, [ $key, join(',', $date, $time, $data1, $data2) ];
    }

    print map { "$_->[1]\n" } sort { $a->[0] <=> $b->[0] } @data;

    __DATA__
    01/14/2014 23:44:14 A Y
    01/14/2014 23:44:12 B Y
    01/14/2014 23:44:13 C X
    01/14/2014 23:44:12 D X

    Better yet, skip split() and use a single regular expression to parse the whole log record as I did here. Personally, I'd take this opportunity to improve the log records by keeping the ISO 8601 format timestamps instead of just using them as a transitory sort key.
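
A minimal sketch of the single-regex idea, for illustration only: it parses each record with one pattern and keeps the timestamp in ISO 8601 form, which sorts chronologically as a plain string. The sub name parse_record, the sample records, and the (\S+) placeholders for <data1> and <data2> are assumptions, not taken from the real logs.

```perl
use strict;
use warnings;

# Hypothetical sample records. <data1> and <data2> are assumed to be
# single whitespace-free tokens, which the real logs may not satisfy.
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

# Parse one record with a single regex. Returns an ISO 8601 timestamp
# plus the two data fields, or an empty list if the line doesn't match.
sub parse_record {
    my ($line) = @_;
    my ($mon, $day, $year, $time, $data1, $data2) = $line =~ m{
        \A (\d\d) / (\d\d) / (\d{4})    # month, day, year
        \s+ (\d\d:\d\d:\d\d)            # hours, minutes, seconds
        \s+ (\S+) \s+ (\S+) \s* \z      # placeholder data fields
    }x or return;
    return ("${year}-${mon}-${day}T${time}", $data1, $data2);
}

# The ISO 8601 string is both the sort key and the stored timestamp.
my @sorted = sort { $a->[0] cmp $b->[0] }
             map  { my @f = parse_record($_); @f ? [@f] : () } @lines;

my @output = map { join ',', @$_ } @sorted;
print "$_\n" for @output;
```

Lines that don't match the pattern are silently skipped here; a real log processor would probably want to warn about them instead.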

    Even Dave Rolsky sanctions using a regular expression instead of a proper timestamp parser in exactly this kind of situation. See this slide in his presentation titled A Date with Perl, which you can watch him present here.

    Jim

      I have no problem with your regex approach for the timestamp. The OP seemed to be asking how Time::Piece could be used for sorting, so I showed a method for doing that.

      I agree that ISO 8601 format for the timestamps would be preferable.
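
For anyone staying with Time::Piece, here is a small sketch of getting the ISO 8601 form from the same object used for sorting, via its datetime method. The helper name to_iso8601 is hypothetical, and the input format is assumed to match the sample data.

```perl
use strict;
use warnings;
use Time::Piece;

# Convert "MM/DD/YYYY" and "HH:MM:SS" into a single ISO 8601 string.
# to_iso8601 is a hypothetical helper name, not part of Time::Piece.
sub to_iso8601 {
    my ($date, $time) = @_;
    my $t = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S');
    return $t->datetime;    # e.g. 2014-01-14T23:44:14
}

print to_iso8601('01/14/2014', '23:44:14'), "\n";
```

The resulting string can serve as the sort key directly (with cmp rather than <=>), so the epoch value is only needed if arithmetic on the times is required.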

      The OP didn't show real data for "<data1> <data2>". I suspect your '(\S+) (\S+)' may well be an oversimplification of what's really required; however, no more so than my splitting on whitespace. :-)

      -- Ken

        I have no problem with your regex approach for the timestamp.

        And I have no problem with your parser approach for the timestamp. TIMTOWTDI. ☺

        Did you happen to view Dave Rolsky's brief presentation of the slide titled Don't Use a Parser? He explains his qualified recommendation perfectly.

        I agree that ISO 8601 format for the timestamps would be preferable.

        Yep. Unless jasonl is required to maintain fidelity to the original representation of the timestamps in the logs for some peculiar reason, this is his best opportunity to improve the data by keeping the reformatted timestamps in ISO 8601 format.

        The OP didn't show real data for "<data1> <data2>". I suspect your '(\S+) (\S+)' may well be an oversimplification of what's really required; however, no more so than my splitting on whitespace. :-)

        My '(\S+) (\S+)' was an intentional simplification, not an inadvertent oversimplification. Although I didn't explicitly state it, I was tacitly making the point that jasonl could parse the whole log record, including the timestamps, with a single regex. The pattern I used in my code snippet to match <data1> and <data2> was just a placeholder—one that happens to match the placeholder strings "<data1>" and "<data2>" literally. ☺ As you've pointed out, jasonl didn't include in his post any verisimilar example data besides just the timestamps, so we can't possibly know how to parse his actual log records properly.

        Jim

Re^4: sorting logfiles by timestamp
by Anonymous Monk on Jan 22, 2014 at 14:05 UTC

    (Posting anon b/c I don't have my pw saved on this browser.) Apologies, I don't know how many times I looked at your first example and it never clicked that you had them grouped by <data2>, which is indeed what I was going for. To clarify, <data2> is a client identifier, and I need to grab all log messages (<data1>) for each client, group them together by client (hence the hash of arrays with the client ID as key), and then sort each group of logs by time. I haven't tried to implement it yet, but it does indeed look like what I was trying to do. Thanks.
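
A rough sketch of the grouping described above, under stated assumptions: each record is "date time message client" with whitespace-free fields (real log messages may well contain spaces), and the variable names %by_client and @report are made up for illustration. It builds a hash of arrays keyed by client ID and then sorts each client's entries by epoch time.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical sample records: date, time, message (<data1>), client (<data2>).
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

# Hash of arrays: client ID => list of [epoch, original timestamp, message].
my %by_client;
for my $line (@lines) {
    my ($date, $time, $message, $client) = split ' ', $line;
    my $epoch = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S')->epoch;
    push @{ $by_client{$client} }, [ $epoch, "$date $time", $message ];
}

# Walk the clients in name order and sort each group by time.
my @report;
for my $client (sort keys %by_client) {
    my @entries = sort { $a->[0] <=> $b->[0] } @{ $by_client{$client} };
    push @report, map { "$client,$_->[1],$_->[2]" } @entries;
}

print "$_\n" for @report;
```

If messages can contain spaces, the split would need a different field layout or a regex that anchors the client ID at a known position.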