Re: sorting logfiles by timestamp

G'day jasonl,

You show no indication of what <data2> contains. I assume its values are not unique, as you've used it as the hash key for an array (@{$myHash{$data2}{info}}). You don't say how many elements this array might hold, whether you want to sort on <data2>, or whether <data2> needs to appear in the output.

A representative, unordered sample of the input as well as how you'd expect that to be output would have been helpful.

The following script may provide some help in formulating your solution:

#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::Piece;

my %myHash;
my $format = '%m/%d/%Y %H:%M:%S';

while (<DATA>) {
    my ($date, $time, $data1, $data2) = split;
    my $sort_key = Time::Piece->strptime("$date $time", $format)->epoc
+h;
    push @{$myHash{$data2}{info}}, "$sort_key:$date,$time,$data1";
}

for my $key (sort keys %myHash) {
    print "$_,$key" for
        map { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map { [ split /:/, $myHash{$key}{info}[$_], 2 ] }
        0 .. $#{$myHash{$key}{info}};
}

__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X

Output:

01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:14,A,Y

If you provide a better description of your problem, a better solution can probably be provided. The guidelines in "How do I post a question effectively?" may help you with this.

-- Ken


Re^2: sorting logfiles by timestamp by jasonl (Acolyte) on Jan 22, 2014 at 23:28 UTC
OK, I finally got a chance to try this. It works a treat, but I cannot for the life of me wrap my head around how and why, and I hate using code I can't understand (for several reasons). I'm OK until:

print "$_,$key" for
    map { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map { [ split /:/, $myHash{$key}{info}[$_], 2 ] }
    0 .. $#{$myHash{$key}{info}};

Is there a way to write that as a more C-style for loop, even if it's pseudo-code, or is that the only way it will work? I don't follow the flow as-is, and my attempt to rewrite it ended up with only printing indices of the array. I'm also having trouble following the map { } statements, but hopefully if I can grok the way the loop is working the rest will start to make a little more sense.

Thanks again.
Re^3: sorting logfiles by timestamp by kcott (Archbishop) on Jan 23, 2014 at 14:43 UTC
"It works a treat, but I cannot for the life of me wrap my head around how and why, and I hate using code I can't understand (for several reasons)."

Firstly, I'm glad to hear you're not just blindly plugging in code you don't understand.

The "`map {} sort {} map {}`" construct is known as the "Schwartzian Transform" (to which Laurent_R referred earlier in this thread).

Where you encounter combinations of functions which take a list and return a list (e.g. grep, map, sort, etc.), it's often best to evaluate them in reverse order. Consider this (clunky) rewrite of that piece of code:

my @indices_of_myhash_key_info_array = 0 .. $#{$myHash{$key}{info}};

my @two_element_arrayrefs_with_sortkey_and_data
    = map { [ split /:/, $myHash{$key}{info}[$_], 2 ] }
      @indices_of_myhash_key_info_array;

my @two_element_arrayrefs_sorted_by_sortkey
    = sort { $a->[0] <=> $b->[0] }
      @two_element_arrayrefs_with_sortkey_and_data;

my @data_element_only_sorted_by_sortkey
    = map { $_->[1] }
      @two_element_arrayrefs_sorted_by_sortkey;

for (@data_element_only_sorted_by_sortkey) {
    print "$_,$key";
}

Hopefully that explains what is going on but feel free to ask if anything needs further explanation.

-- Ken
Re^2: sorting logfiles by timestamp by jasonl (Acolyte) on Jan 21, 2014 at 16:15 UTC
Thanks, all. I haven't had time to fully absorb the different transforms, but that sounds helpful.

kcott, you've pretty much hit the nail on the head, except your output isn't sorted the way I need it to be. You've got:

01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:14,A,Y

When it needs to be:

01/14/2014,23:44:12,D,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:13,C,X
01/14/2014,23:44:14,A,Y

Is that the expected behavior? (Note, for my needs in cases where there are multiple entries in the same second, they can be in any order.)
Re^3: sorting logfiles by timestamp by kcott (Archbishop) on Jan 21, 2014 at 18:29 UTC
From the information provided, I can see no need for that complex data structure (i.e. `@{$myHash{$data2}{info}}`). This code produces the output you say you want:

#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::Piece;

my @data;

while (<DATA>) {
    my ($date, $time, $data1, $data2) = split;
    my $key = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S')->epoch;
    push @data, [$key, "$date,$time,$data1,$data2"];
}

print for map { $_->[1] } sort { $a->[0] <=> $b->[0] } @data;

__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X

Output:

01/14/2014,23:44:12,B,Y
01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:14,A,Y

If that doesn't do exactly what you want, it should at least provide sufficient information for you to attempt a solution yourself. If you do need further help, please ensure you post the missing details.

-- Ken
Re^4: sorting logfiles by timestamp by Jim (Curate) on Jan 21, 2014 at 22:57 UTC
Here's essentially the same thing without Time::Piece.

use strict;
use warnings;

my @data;

while (<DATA>) {
    my ($date, $time, $data1, $data2) = split;
    my $key = $date . $time;
    $key =~ s{^(\d\d)/(\d\d)/(\d\d\d\d)(\d\d):(\d\d):(\d\d)}{$3$1$2$4$5$6};
    push @data, [ $key, join(',', $date, $time, $data1, $data2) ];
}

print map { "$_->[1]\n" } sort { $a->[0] <=> $b->[0] } @data;

__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X

Better yet, skip `split()` and use a single regular expression to parse the whole log record as I did here.

Personally, I'd take this opportunity to improve the log records by keeping the ISO 8601 format timestamps instead of just using them as a transitory sort key.

Even Dave Rolsky sanctions using a regular expression instead of a proper timestamp parser in exactly this kind of situation. See this slide in his presentation titled A Date with Perl, which you can watch him present here.

Jim
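The two suggestions above (one regular expression for the whole record, and keeping an ISO 8601 timestamp in the output) might look something like the following. This is a sketch only, not the code linked to in the post: the helper name `to_iso_record` is invented for this example, and the record layout is assumed to be exactly `MM/DD/YYYY HH:MM:SS data1 data2` as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "MM/DD/YYYY HH:MM:SS data1 data2" record
# with a single regex and return it with an ISO 8601 style timestamp.
# Returns nothing (empty list / undef) if the line does not match.
sub to_iso_record {
    my ($line) = @_;
    my ($mon, $day, $year, $time, $data1, $data2) = $line =~ m{
        ^ (\d\d) / (\d\d) / (\d{4}) \s+   # date as MM/DD/YYYY
        (\d\d:\d\d:\d\d) \s+              # time as HH:MM:SS
        (\S+) \s+ (\S+) \s* \z            # data1 and data2
    }x or return;
    return "$year-$mon-${day}T$time,$data1,$data2";
}

my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

# Fixed-width ISO 8601 timestamps order correctly as plain strings,
# so no separate sort key is needed.
my @records = sort { $a cmp $b } map { to_iso_record($_) } @lines;

print "$_\n" for @records;
```

Because the timestamp now leads each record in year-month-day order, a plain string comparison is enough. Note that records sharing the same second are then ordered by the text that follows the timestamp, which is fine here since the order within a second was said not to matter.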
Re^5: sorting logfiles by timestamp by kcott (Archbishop) on Jan 22, 2014 at 14:37 UTC
Re^6: sorting logfiles by timestamp by Jim (Curate) on Jan 22, 2014 at 18:04 UTC
Re^4: sorting logfiles by timestamp by Anonymous Monk on Jan 22, 2014 at 14:05 UTC
(Posting anon b/c I don't have my pw saved on this browser.)

Apologies, I don't know how many times I looked at your first example and it never clicked that you had them grouped by <data2>, which is indeed what I was going for. To clarify, <data2> is a client identifier, and I need to grab all log messages (<data1>) for each client, group them together by client (hence the hash of arrays with the client ID as key), and then sort each group of logs by time. I haven't tried to implement it yet, but it does indeed look like what I was trying to do.

Thanks.
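The requirement as clarified here (group by client ID, then sort each group by time) could also be sketched with a hash of arrays that stores `[epoch, record]` pairs directly, rather than encoding the sort key into a "key:data" string and splitting it again later. This is a variation on the first script in this thread, not a replacement endorsed by anyone above; the helper name `group_by_client` is hypothetical, and the input format is assumed to match the sample data.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: group log lines by client ID (<data2>) and return a
# hash ref mapping each client ID to an array ref of its records, sorted
# by timestamp.
sub group_by_client {
    my (@lines) = @_;

    # Collect [epoch, record] pairs per client.
    my %by_client;
    for my $line (@lines) {
        my ($date, $time, $data1, $data2) = split ' ', $line;
        my $epoch = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S')->epoch;
        push @{ $by_client{$data2} }, [ $epoch, "$date,$time,$data1,$data2" ];
    }

    # Sort each client's pairs by epoch and keep only the record text.
    my %sorted;
    for my $client (keys %by_client) {
        $sorted{$client} = [
            map  { $_->[1] }
            sort { $a->[0] <=> $b->[0] }
            @{ $by_client{$client} }
        ];
    }
    return \%sorted;
}

my $groups = group_by_client(
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 B Y',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 D X',
);

for my $client (sort keys %$groups) {
    print "$_\n" for @{ $groups->{$client} };
}
```

Storing the pairs up front means the sort step no longer needs the inner `map` with `split`, which may make the flow a little easier to follow.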