in reply to sorting logfiles by timestamp

G'day jasonl,

You show no indication of what <data2> contains. I assume its values aren't unique, as you've used it as the hash key for an array (@{$myHash{$data2}{info}}). You don't say how many elements this array might hold, whether you want to sort on <data2>, or whether <data2> needs to appear in the output.

A representative, unordered sample of the input as well as how you'd expect that to be output would have been helpful.

The following script may provide some help in formulating your solution:

#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::Piece;

my %myHash;
my $format = '%m/%d/%Y %H:%M:%S';

while (<DATA>) {
    my ($date, $time, $data1, $data2) = split;
    my $sort_key = Time::Piece->strptime("$date $time", $format)->epoch;
    push @{$myHash{$data2}{info}}, "$sort_key:$date,$time,$data1";
}

for my $key (sort keys %myHash) {
    print "$_,$key" for map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ split /:/, $myHash{$key}{info}[$_], 2 ] } 0 .. $#{$myHash{$key}{info}};
}

__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X

Output:

01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:14,A,Y
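To make the grouping in that script easier to see, here is a minimal sketch (not part of the original post) that runs the same loop over the four sample lines, held in a hypothetical @lines array instead of being read from __DATA__, and prints what ends up in %myHash. Each <data2> value gets one entry, and its info array holds that client's records in input order, still unsorted; the sorting happens later, per key.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical copy of the sample input, held in an array instead of __DATA__.
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

my %myHash;
my $format = '%m/%d/%Y %H:%M:%S';

for my $line (@lines) {
    my ($date, $time, $data1, $data2) = split ' ', $line;
    my $sort_key = Time::Piece->strptime("$date $time", $format)->epoch;
    # Autovivification creates $myHash{$data2}{info} as an array ref on the first push.
    push @{$myHash{$data2}{info}}, "$sort_key:$date,$time,$data1";
}

# One entry per <data2> value; each array is still in input order.
for my $client (sort keys %myHash) {
    print "$client:\n";
    print "  $_\n" for @{$myHash{$client}{info}};
}
```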

If you provide a better description of your problem, a better solution can probably be provided. The guidelines in "How do I post a question effectively?" may help you with this.

-- Ken

Re^2: sorting logfiles by timestamp
by jasonl (Acolyte) on Jan 22, 2014 at 23:28 UTC

    OK, I finally got a chance to try this. It works a treat, but I cannot for the life of me wrap my head around how and why, and I hate using code I can't understand (for several reasons). I'm OK until:

    print "$_,$key" for map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ split /:/, $myHash{$key}{info}[$_], 2 ] } 0 .. $#{$myHash{$key}{info}};

    Is there a way to write that as a more C-style for loop, even if it's pseudo-code, or is that the only way it will work? I don't follow the flow as-is, and my attempt to rewrite it ended up with only printing indices of the array. I'm also having trouble following the map { } statements, but hopefully if I can grok the way the loop is working the rest will start to make a little more sense.

    Thanks again.

      "It works a treat, but I cannot for the life of me wrap my head around how and why, and I hate using code I can't understand (for several reasons)."

      Firstly, I'm glad to hear you're not just blindly plugging in code you don't understand.

      The "map {} sort {} map {}" construct is known as the "Schwartzian Transform" (to which Laurent_R referred earlier in this thread).
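For comparison, here is a minimal Schwartzian Transform on made-up data (the word list is hypothetical). Each word is paired with a precomputed key (its length), the pairs are sorted on that key, and then only the original words are kept. The word lengths are all different so the result doesn't depend on how ties are ordered.

```perl
use strict;
use warnings;

my @words = qw(banana fig apple kiwi);

# Read bottom-up: build [key, value] pairs, sort on the key, extract the value.
my @by_length =
    map  { $_->[1] }                 # 3. keep only the original word
    sort { $a->[0] <=> $b->[0] }     # 2. sort the pairs on the precomputed key
    map  { [ length($_), $_ ] }      # 1. pair each word with its length
    @words;

print "@by_length\n";    # prints "fig kiwi apple banana"
```

The point of the transform is that the key is computed once per element in step 1, rather than on every comparison inside sort.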

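      Where you encounter combinations of functions which take a list and return a list (e.g. grep, map, sort, etc.), it's often best to read them in reverse order, right to left: the rightmost function receives the original list, and each function to its left works on the list returned by the one to its right.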
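A small illustration of that right-to-left reading, using made-up numbers: the chained expression and the step-by-step version below compute the same list.

```perl
use strict;
use warnings;

my @numbers = (5, 2, 9, 4, 1);

# Read right to left: grep runs first, then sort, then map.
my @result = map { $_ * 10 } sort { $a <=> $b } grep { $_ % 2 } @numbers;

# The same pipeline, one step at a time:
my @odd    = grep { $_ % 2 } @numbers;    # (5, 9, 1)
my @sorted = sort { $a <=> $b } @odd;     # (1, 5, 9)
my @scaled = map  { $_ * 10 } @sorted;    # (10, 50, 90)

print "@result\n";    # prints "10 50 90"
```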

      Consider this (clunky) rewrite of that piece of code:

      my @indices_of_myhash_key_info_array = 0 .. $#{$myHash{$key}{info}};
      my @two_element_arrayrefs_with_sortkey_and_data = map { [ split /:/, $myHash{$key}{info}[$_], 2 ] } @indices_of_myhash_key_info_array;
      my @two_element_arrayrefs_sorted_by_sortkey = sort { $a->[0] <=> $b->[0] } @two_element_arrayrefs_with_sortkey_and_data;
      my @data_element_only_sorted_by_sortkey = map { $_->[1] } @two_element_arrayrefs_sorted_by_sortkey;
      for (@data_element_only_sorted_by_sortkey) {
          print "$_,$key";
      }
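Since a C-style loop was specifically asked about, here is a sketch of the same steps with explicit index loops. It is self-contained, so %myHash and $key are filled with hypothetical values in the "sortkey:data" form the original script builds (the sort keys are illustrative epoch values). The sort itself is left as a call to sort; replacing it with a hand-written sorting loop wouldn't make the flow any clearer.

```perl
use strict;
use warnings;

# Hypothetical contents for one client, in the "sortkey:data" form
# built by the original script (sort keys are illustrative epochs).
my %myHash = (
    Y => { info => [ '1389743054:01/14/2014,23:44:14,A',
                     '1389743052:01/14/2014,23:44:12,B' ] },
);
my $key  = 'Y';
my $info = $myHash{$key}{info};

# Step 1 (the rightmost map): split each element into [sortkey, data].
my @pairs;
for (my $i = 0; $i <= $#$info; $i++) {
    my ($sort_key, $data) = split /:/, $info->[$i], 2;
    push @pairs, [ $sort_key, $data ];
}

# Step 2 (sort): order the pairs numerically by their sort key.
my @sorted = sort { $a->[0] <=> $b->[0] } @pairs;

# Steps 3 and 4 (the leftmost map, then print): keep only the data part.
my @lines;
for (my $i = 0; $i <= $#sorted; $i++) {
    push @lines, "$sorted[$i][1],$key";
}
print "$_\n" for @lines;
```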

      Hopefully that explains what is going on, but feel free to ask if anything needs further explanation.

      -- Ken

Re^2: sorting logfiles by timestamp
by jasonl (Acolyte) on Jan 21, 2014 at 16:15 UTC

    Thanks, all. I haven't had time to fully absorb the different transforms, but that sounds helpful. kcott, you've pretty much hit the nail on the head, except your output isn't sorted the way I need it to be. You've got:

    01/14/2014,23:44:12,D,X
    01/14/2014,23:44:13,C,X
    01/14/2014,23:44:12,B,Y
    01/14/2014,23:44:14,A,Y

    When it needs to be:

    01/14/2014,23:44:12,D,X
    01/14/2014,23:44:12,B,Y
    01/14/2014,23:44:13,C,X
    01/14/2014,23:44:14,A,Y

    Is that the expected behavior? (Note, for my needs in cases where there are multiple entries in the same second, they can be in any order.)

      From the information provided, I can see no need for that complex data structure (i.e. @{$myHash{$data2}{info}}).

      This code produces the output you say you want:

      #!/usr/bin/env perl -l

      use strict;
      use warnings;

      use Time::Piece;

      my @data;

      while (<DATA>) {
          my ($date, $time, $data1, $data2) = split;
          my $key = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S')->epoch;
          push @data, [$key, "$date,$time,$data1,$data2"];
      }

      print for map { $_->[1] } sort { $a->[0] <=> $b->[0] } @data;

      __DATA__
      01/14/2014 23:44:14 A Y
      01/14/2014 23:44:12 B Y
      01/14/2014 23:44:13 C X
      01/14/2014 23:44:12 D X

      Output:

      01/14/2014,23:44:12,B,Y
      01/14/2014,23:44:12,D,X
      01/14/2014,23:44:13,C,X
      01/14/2014,23:44:14,A,Y

      If that doesn't do exactly what you want, it should at least provide sufficient information for you to attempt a solution yourself. If you do need further help, please ensure you post the missing details.

      -- Ken

        Here's essentially the same thing without Time::Piece.

        use strict;
        use warnings;

        my @data;

        while (<DATA>) {
            my ($date, $time, $data1, $data2) = split;
            my $key = $date . $time;
            $key =~ s{^(\d\d)/(\d\d)/(\d\d\d\d)(\d\d):(\d\d):(\d\d)}{$3$1$2$4$5$6};
            push @data, [ $key, join(',', $date, $time, $data1, $data2) ];
        }

        print map { "$_->[1]\n" } sort { $a->[0] <=> $b->[0] } @data;

        __DATA__
        01/14/2014 23:44:14 A Y
        01/14/2014 23:44:12 B Y
        01/14/2014 23:44:13 C X
        01/14/2014 23:44:12 D X

        Better yet, skip split() and use a single regular expression to parse the whole log record as I did here. Personally, I'd take this opportunity to improve the log records by keeping the ISO 8601 format timestamps instead of just using them as a transitory sort key.
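The linked version isn't reproduced in this thread, so the following is only a hypothetical sketch of the idea: one regular expression parses each whole record, the date is rearranged into an ISO 8601 timestamp, and that timestamp is kept in the output. It assumes the "MM/DD/YYYY HH:MM:SS data1 data2" layout of the sample data, with single-word data fields. Because ISO 8601 timestamps of the same width sort correctly as plain strings, a string comparison (cmp) is enough.

```perl
use strict;
use warnings;

# Hypothetical sample records in the thread's "MM/DD/YYYY HH:MM:SS data1 data2" layout.
my @records = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

my @parsed;
for my $record (@records) {
    # One regex parses the whole record (assumes single-word data1 and data2).
    my ($mon, $day, $year, $time, $data1, $data2) =
        $record =~ m{^(\d\d)/(\d\d)/(\d\d\d\d)\s+(\d\d:\d\d:\d\d)\s+(\S+)\s+(\S+)$}
        or die "Unparsable record: $record\n";

    # ISO 8601 timestamp, e.g. 2014-01-14T23:44:12; it sorts correctly as a string.
    my $iso = "$year-$mon-${day}T$time";
    push @parsed, [ $iso, "$iso,$data1,$data2" ];
}

my @sorted = map { $_->[1] } sort { $a->[0] cmp $b->[0] } @parsed;
print "$_\n" for @sorted;
```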

        Even Dave Rolsky sanctions using a regular expression instead of a proper timestamp parser in exactly this kind of situation. See this slide in his presentation titled A Date with Perl, which you can watch him present here.

        Jim

        (Posting anon b/c I don't have my pw saved on this browser.) Apologies, I don't know how many times I looked at your first example and it never clicked that you had them grouped by <data2>, which is indeed what I was going for. To clarify, <data2> is a client identifier, and I need to grab all log messages (<data1>) for each client, group them together by client (hence the hash of arrays with the client ID as key), and then sort each group of logs by time. I haven't tried to implement it yet, but it does indeed look like what I was trying to do. Thanks.
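For the "group by client, then sort each group by time" requirement, an alternative to the hash of arrays is a single sort on two keys: client ID first (string comparison), then epoch (numeric comparison). This is a different technique from the scripts above, sketched here on the same hypothetical sample lines and assuming <data2> is the client ID; it produces the same order as the first script's output.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical copy of the sample input.
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

my $format = '%m/%d/%Y %H:%M:%S';
my @records;
for my $line (@lines) {
    my ($date, $time, $data1, $data2) = split ' ', $line;
    my $epoch = Time::Piece->strptime("$date $time", $format)->epoch;
    # [client ID, epoch, output line]
    push @records, [ $data2, $epoch, "$date,$time,$data1,$data2" ];
}

# Primary key: client ID; secondary key (used only when IDs are equal): epoch.
my @output =
    map  { $_->[2] }
    sort { $a->[0] cmp $b->[0] || $a->[1] <=> $b->[1] }
    @records;

print "$_\n" for @output;
```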