Re: sorting logfiles by timestamp
by Preceptor (Deacon) on Jan 19, 2014 at 19:10 UTC
Date::Parse is your friend for this sort of operation. Because you're using US-format dates, you can't do a simple (numeric or stringwise) sort.
By preference, I'd say 'use the ISO 8601 standard date format', e.g. YYYY-MM-DD HH:MM:SS, because then this problem is trivial: the timestamps sort correctly stringwise (and numerically too, if you drop the punctuation). But I realise that's not always an option, so instead:
use Date::Parse;
my %sort_hash;
my $line = "01/14/2014 23:44:12 <data1> <data2>";
my ( $datestr, $timestr, @rest_of_string ) = split ( /\s+/, $line );
my $unix_time = str2time ( $datestr . " " . $timestr );
print $unix_time,"\n";
$sort_hash{$unix_time} = join ( " ", @rest_of_string );
I'm sure you can adapt this for a 'while' loop easily enough.
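As a rough sketch of that adaptation (not necessarily how the original poster's script is laid out): the loop below reads from an in-memory filehandle so the example is self-contained, and it stores a list of lines per timestamp, because assigning a single value to `$sort_hash{$unix_time}` would overwrite records that share the same second. The names `$log_text`, `%lines_by_time` and `@sorted` are made up for illustration. Date::Parse is not a core module; it ships with the TimeDate distribution.

```perl
use strict;
use warnings;
use Date::Parse;    # non-core; part of the TimeDate distribution

# Hypothetical sample input, held in a string so the sketch is self-contained.
my $log_text = <<'END_LOG';
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X
END_LOG

open my $fh, '<', \$log_text or die "Cannot open in-memory log: $!";

my %lines_by_time;
while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines

    my ( $datestr, $timestr ) = split /\s+/, $line;
    my $unix_time = str2time("$datestr $timestr");
    next unless defined $unix_time;    # skip lines whose date does not parse

    # Keep a list per timestamp so same-second records are not lost.
    push @{ $lines_by_time{$unix_time} }, $line;
}
close $fh;

# Walk the timestamps in numeric order and flatten the per-second lists.
my @sorted = map { @{ $lines_by_time{$_} } }
             sort { $a <=> $b } keys %lines_by_time;

print "$_\n" for @sorted;
```

Records that fall in the same second come out in the order they were read.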
Re: sorting logfiles by timestamp
by Jim (Curate) on Jan 19, 2014 at 21:35 UTC
I'm splitting the line and using data2 as the key for a hash…
You can do this because each log record's <data2> is guaranteed to be unique and is therefore a viable key, right?
…and then pushing date, time, and data1 into an array that is the value…
So to amplify Laurent_R's fine suggestion, you're already including in the hash values (i.e., the stored data) the timestamps that will serve as proper sort keys and that you'll therefore use to sort the records later with a Guttman Rosler Transform. You just need to ensure the sort key timestamps are in an ISO 8601 format instead of the format used in the logs. This ensures that when you sort the timestamps lexicographically (ASCIIbetically), they're ordered chronologically as well.
# Parse the log record...
m{^(\d\d)/(\d\d)/(\d\d\d\d) (\d\d:\d\d:\d\d) (\S+) (\S+)} or die;
my $timestamp = "$3-$1-$2 $4";
my $data1 = $5;
my $data2 = $6;
my %myHash;
push @{ $myHash{$data2}{'info'} }, "$timestamp,$data1";
And since it appears you intend to keep the sort key timestamps as data, you don't have to lop them off as you normally would in a Guttman Rosler Transform. So you won't really need to use a transform, per se, at all. You can just sort the records by their hash values.
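A minimal sketch of that idea, assuming the records look like the samples elsewhere in this thread. The `@records` list (including the 2013 entry, added to show a case where the US format would sort wrongly as a string) and the `@lines` and `@sorted` names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample records in the log's US date format.
my @records = (
    '01/14/2014 23:44:14 A Y',
    '12/31/2013 08:00:00 B Y',
    '01/14/2014 23:44:13 C X',
);

my %myHash;
for (@records) {
    # Parse the log record and rebuild the timestamp as YYYY-MM-DD HH:MM:SS.
    m{^(\d\d)/(\d\d)/(\d\d\d\d) (\d\d:\d\d:\d\d) (\S+) (\S+)} or next;
    my ( $timestamp, $data1, $data2 ) = ( "$3-$1-$2 $4", $5, $6 );
    push @{ $myHash{$data2}{info} }, "$timestamp,$data1";
}

# Flatten the hash back into one line per record, timestamp first.
my @lines;
for my $data2 ( keys %myHash ) {
    push @lines, map { "$_,$data2" } @{ $myHash{$data2}{info} };
}

# A plain string sort is now also a chronological sort.
my @sorted = sort @lines;
print "$_\n" for @sorted;
```

With the original MM/DD/YYYY strings, the December 2013 record would sort after the January 2014 ones; with the year first it sorts before them.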
Jim
Re: sorting logfiles by timestamp
by Laurent_R (Canon) on Jan 19, 2014 at 20:18 UTC
Re: sorting logfiles by timestamp
by kcott (Archbishop) on Jan 20, 2014 at 16:40 UTC
G'day jasonl,
You show no indication of what <data2> contains.
I assume they're not unique values as you've used it as a key for an array (@{$myHash{$data2}{info}}).
You don't say how many elements this array might hold, whether you want to sort on <data2>, or whether <data2> needs to appear in the output.
A representative, unordered sample of the input as well as how you'd expect that to be output would have been helpful.
The following script may provide some help in formulating your solution:
#!/usr/bin/env perl -l
use strict;
use warnings;
use Time::Piece;
my %myHash;
my $format = '%m/%d/%Y %H:%M:%S';
while (<DATA>) {
    my ($date, $time, $data1, $data2) = split;
    my $sort_key = Time::Piece->strptime("$date $time", $format)->epoch;
    push @{$myHash{$data2}{info}}, "$sort_key:$date,$time,$data1";
}
for my $key (sort keys %myHash) {
    print "$_,$key" for
        map { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map { [ split /:/, $myHash{$key}{info}[$_], 2 ] }
        0 .. $#{$myHash{$key}{info}};
}
__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X
Output:
01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:14,A,Y
If you provide a better description of your problem, a better solution can probably be provided.
The guidelines in "How do I post a question effectively?" may help you with this.
|
|
OK, I finally got a chance to try this. It works a treat, but I cannot for the life of me wrap my head around how and why, and I hate using code I can't understand (for several reasons). I'm OK until:
print "$_,$key" for
    map { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map { [ split /:/, $myHash{$key}{info}[$_], 2 ] }
    0 .. $#{$myHash{$key}{info}};
Is there a way to write that as a more C-style for loop, even if it's pseudo-code, or is that the only way it will work? I don't follow the flow as-is, and my attempt to rewrite it ended up with only printing indices of the array. I'm also having trouble following the map { } statements, but hopefully if I can grok the way the loop is working the rest will start to make a little more sense.
Thanks again.
|
|
"It works a treat, but I cannot for the life of me wrap my head around how and why, and I hate using code I can't understand (for several reasons)."
Firstly, I'm glad to hear you're not just blindly plugging in code you don't understand.
The "map {} sort {} map {}" construct is known as the "Schwartzian Transform" (to which Laurent_R referred earlier in this thread).
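Where you encounter combinations of functions which take a list and return a list (e.g. grep, map, sort, etc.), it's often best to read them in reverse order (bottom to top, or right to left), because each function consumes the list produced by the one written after it.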
Consider this (clunky) rewrite of that piece of code:
my @indices_of_myhash_key_info_array
    = 0 .. $#{$myHash{$key}{info}};
my @two_element_arrayrefs_with_sortkey_and_data
    = map {
        [ split /:/, $myHash{$key}{info}[$_], 2 ]
    } @indices_of_myhash_key_info_array;
my @two_element_arrayrefs_sorted_by_sortkey
    = sort {
        $a->[0] <=> $b->[0]
    } @two_element_arrayrefs_with_sortkey_and_data;
my @data_element_only_sorted_by_sortkey
    = map {
        $_->[1]
    } @two_element_arrayrefs_sorted_by_sortkey;
for (@data_element_only_sorted_by_sortkey) {
    print "$_,$key";
}
Hopefully that explains what is going on but feel free to ask if anything needs further explanation.
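Since a more C-style loop was asked for, here is one possible sketch of the same steps with explicit index loops in place of the two map calls; the sort itself stays a sort call. The sample `%myHash` contents (sort keys are epoch seconds for those timestamps taken as UTC) and the names `@info`, `@pairs`, `@sorted` and `@output` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample in the same "sortkey:data" shape as the script above.
my %myHash = (
    X => {
        info => [
            '1389743053:01/14/2014,23:44:13,C',
            '1389743052:01/14/2014,23:44:12,D',
        ],
    },
);

my $key  = 'X';
my @info = @{ $myHash{$key}{info} };

# Step 1 (the last map in the chain): split each string into [sortkey, data].
my @pairs;
for ( my $i = 0; $i < @info; $i++ ) {
    my ( $sort_key, $data ) = split /:/, $info[$i], 2;
    push @pairs, [ $sort_key, $data ];
}

# Step 2 (the sort): order the pairs numerically by their sort key.
my @sorted = sort { $a->[0] <=> $b->[0] } @pairs;

# Step 3 (the first map, plus the print): keep only the data part
# and append the hash key.
my @output;
for ( my $i = 0; $i < @sorted; $i++ ) {
    push @output, "$sorted[$i][1],$key";
}

print "$_\n" for @output;
```

The steps run top to bottom here, which is the reverse of how the chained version is written.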
|
|
Thanks, all. I haven't had time to fully absorb the different transforms, but that sounds helpful. kcott, you've pretty much hit the nail on the head, except your output isn't sorted the way I need it to be. You've got:
01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:14,A,Y
When it needs to be:
01/14/2014,23:44:12,D,X
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:13,C,X
01/14/2014,23:44:14,A,Y
Is that the expected behavior? (Note, for my needs in cases where there are multiple entries in the same second, they can be in any order.)
|
|
From the information provided, I can see no need for that complex data structure (i.e. @{$myHash{$data2}{info}}).
This code produces the output you say you want:
#!/usr/bin/env perl -l
use strict;
use warnings;
use Time::Piece;
my @data;
while (<DATA>) {
    my ($date, $time, $data1, $data2) = split;
    my $key = Time::Piece->strptime("$date $time", '%m/%d/%Y %H:%M:%S')->epoch;
    push @data, [$key, "$date,$time,$data1,$data2"];
}
print for map { $_->[1] } sort { $a->[0] <=> $b->[0] } @data;
__DATA__
01/14/2014 23:44:14 A Y
01/14/2014 23:44:12 B Y
01/14/2014 23:44:13 C X
01/14/2014 23:44:12 D X
Output:
01/14/2014,23:44:12,B,Y
01/14/2014,23:44:12,D,X
01/14/2014,23:44:13,C,X
01/14/2014,23:44:14,A,Y
If that doesn't do exactly what you want, it should at least provide sufficient information for you to attempt a solution yourself.
If you do need further help, please ensure you post the missing details.
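The thread says same-second entries may appear in any order. If a repeatable order were ever wanted, one option (an assumption here, not something the original question asked for) is a secondary comparison on the record text. A small sketch, with a made-up `@lines` list standing in for the DATA section:

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical sample input; D is listed before B on purpose.
my @lines = (
    '01/14/2014 23:44:14 A Y',
    '01/14/2014 23:44:12 D X',
    '01/14/2014 23:44:13 C X',
    '01/14/2014 23:44:12 B Y',
);

my @data;
for my $line (@lines) {
    my ( $date, $time, $data1, $data2 ) = split ' ', $line;
    my $key = Time::Piece->strptime( "$date $time", '%m/%d/%Y %H:%M:%S' )->epoch;
    push @data, [ $key, "$date,$time,$data1,$data2" ];
}

# Primary key: epoch seconds. Tiebreak: string comparison of the record text.
my @sorted = map { $_->[1] }
             sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } @data;

print "$_\n" for @sorted;
```

With the tiebreak in place, the result no longer depends on the order of the input lines.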