in reply to Merge 2 or more logs, sort by date & time

Hi, if the date format is the same on every line, I suggest you parse the dates and sort by comparing them, either as epoch values or with a built-in comparator, e.g. using DateTime or the core module Time::Piece.
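As a minimal sketch of that suggestion, assuming every timestamp looks like "Tue Oct 16 17:10:00 2018": the helper name to_epoch and the sample timestamps are made up for illustration. Time::Piece parses each string, and the epoch values are compared numerically.

```perl
use strict;
use warnings;
use Time::Piece;

# Assumed timestamp layout, e.g. "Tue Oct 16 17:10:00 2018"
my $fmt = '%a %b %d %T %Y';

# Hypothetical helper: timestamp string -> epoch seconds
sub to_epoch {
    my ($stamp) = @_;
    return Time::Piece->strptime($stamp, $fmt)->epoch;
}

# Made-up sample timestamps, deliberately out of order
my @stamps = (
    'Fri Nov 02 14:11:28 2018',
    'Tue Oct 16 17:10:00 2018',
    'Fri Nov 02 14:11:03 2018',
);

# Comparing epoch values numerically yields chronological order
my @sorted = sort { to_epoch($a) <=> to_epoch($b) } @stamps;
print "$_\n" for @sorted;
```

Note that this reparses both timestamps on every comparison, which is fine for a handful of lines but wasteful for large logs.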

Update: Here's a slightly silly example. (It's silly because you shouldn't be sorting log lines as strings for this at all; you'd do better to offload the log entries into a database (e.g. using DBD::SQLite) and do your sorting there.)

Parsing the dates each time the sort sub made a comparison would be grossly inefficient, so I replace them with epoch values before sorting and convert them back afterwards. This is not to say that sorting with a sub that uses regular expressions is not also hugely inefficient ;-)

use strict; use warnings; use feature 'say';
use Time::Piece;
use Data::Dumper;

my $fmt = '%a %b %d %T %Y';

# Read bottom-up: swap each date for its epoch, sort numerically, swap back.
# strptime parses the string as UTC while localtime() formats in the local
# zone, so the printed hours shift by the local UTC offset.
say for
    map  { s/ (?<=TIMESTAMP=) (\d+) / localtime($1)->strftime($fmt) /arex }
    sort { ($a =~ /(?<=TIMESTAMP=) (\d+)/ax)[0] <=> ($b =~ /(?<=TIMESTAMP=) (\d+)/ax)[0] }
    map  { chomp; s/ (?<=TIMESTAMP=) (\w+\s+\w+\s+\d+\s+\d+:\d+:\d+\s+\d+) / Time::Piece->strptime($1, $fmt)->epoch /arex }
    <DATA>;

__END__
...bla TIMESTAMP=Wed Oct 5 04:08:28 2018 bla...
...bla TIMESTAMP=Fri Nov 2 14:11:28 2018 bla...
...bla TIMESTAMP=Tue Oct 16 17:10:00 2018 bla...
...bla TIMESTAMP=Fri Nov 2 14:11:03 2018 bla...
Output:
$ perl 1225106.pl
...bla TIMESTAMP=Fri Oct 05 00:08:28 2018 bla...
...bla TIMESTAMP=Tue Oct 16 13:10:00 2018 bla...
...bla TIMESTAMP=Fri Nov 02 10:11:03 2018 bla...
...bla TIMESTAMP=Fri Nov 02 10:11:28 2018 bla...
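The same "parse once, then sort" idea can also be sketched as a Schwartzian transform, which keeps the regex out of the comparator and leaves the original lines untouched. This is only one possible variant, not the code used above: the helper name sort_by_timestamp and the sample lines are hypothetical, and treating lines without a timestamp as epoch 0 is an assumption.

```perl
use strict;
use warnings;
use Time::Piece;

# Assumed timestamp layout, e.g. "Tue Oct 16 17:10:00 2018"
my $fmt = '%a %b %d %T %Y';

# Hypothetical helper: return the given log lines ordered by TIMESTAMP=.
# Lines without a recognisable timestamp get epoch 0 and sort first
# (an assumption made for this sketch).
sub sort_by_timestamp {
    my @lines = @_;

    # Step 1: pair each line with its epoch, parsing each timestamp once
    my @decorated = map {
        my ($stamp) = /TIMESTAMP=(\w+\s+\w+\s+\d+\s+\d+:\d+:\d+\s+\d+)/;
        [ $stamp ? Time::Piece->strptime($stamp, $fmt)->epoch : 0, $_ ];
    } @lines;

    # Step 2: compare the cached epochs; step 3: unwrap the original lines
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @decorated;
}

# Made-up sample lines in the same shape as the example data
my @log = (
    '...bla TIMESTAMP=Fri Nov 02 14:11:28 2018 bla...',
    '...bla TIMESTAMP=Tue Oct 16 17:10:00 2018 bla...',
    '...bla TIMESTAMP=Fri Nov 02 14:11:03 2018 bla...',
);
print "$_\n" for sort_by_timestamp(@log);
```

Because the lines are never rewritten, the timestamps come out exactly as they went in, with no conversion back from epoch values.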

Hope this helps!



The way forward always starts with a minimal test.

Re^2: Merge 2 or more logs, sort by date & time
by ImJustAFriend (Scribe) on Nov 02, 2018 at 16:44 UTC
    Thanks for the updated code, very helpful! Question - how do I tell it to sort the file? Do I still need to split my input into an array?

      Are there multiple records in one log, like this?

      <13>{Mangled Date/Time} <source>[pid]: TIMESTAMP={I'm using this timestamp} MSGCLS= Title= Severity= message = <message part a> <message part b> ... Message Id= END OF REPORT
      <14>{Mangled Date/Time} <source>[pid]: TIMESTAMP=Tue Oct 16 17:10:00 2018 MSGCLS= Title= Severity= message = <message part a> <message part b> ... Message Id= END OF REPORT
      poj
        Yes, at least 3K per file
Re^2: Merge 2 or more logs, sort by date & time
by ImJustAFriend (Scribe) on Nov 02, 2018 at 16:14 UTC
    Thanks, I'll have a look!