in reply to very slow processing
(Pretty much what RonW said.) You need fast lookup on the $id field. Populate your data structure accordingly! A hash will be a good choice here.
If I understood correctly, the task is about grouping lines in a log file by the ID field, plus some additional munging. The following skeleton code might give you some ideas:
    use strict;
    use warnings;

    # @ID to keep unique id's (in order they are seen)
    # %T to group things by id. Make it HoA (Hash of Arrays)
    my (@ID, %T);
    ...
    while (<HANR>) {
        my ($date, $id, $kw) = /\[(.+?)\]/g;
        my $txt;
        next unless $kw;
        $txt = "$+ to P5" if $kw =~ /^(Input Message)/;
        $txt = $+ if $kw =~ /^(Orchestration Started)$/;
        $txt = $+ if $kw =~ /^ProcessName:(.*)/;
        # note the opportunity to merge some regexes above
        next unless $txt;
        push @ID, $id unless $T{$id};
        push @{ $T{$id} }, "$date,$id,$txt \n";
    }

    for my $group (@T{ @ID }) { # this is a hash slice!
        print OUT for @$group;
    }

ps. I'm trying to puzzle out what hanr might stand for? And what is a displamer?
Update.
In the above example, the @ID array is only there to keep the IDs ordered by their first encounter. If that's unimportant, replace the hash slice with just (values %T). The $+ variable is documented in perlvar. But my regex to split fields doesn't quite cut it if kw is not in brackets; it needs a fix.
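To make the difference between the hash slice and (values %T) concrete, here is a minimal, self-contained sketch. The sample records and the names group_by_id, @ids and %by_id are made up for illustration; they are not from the original log or script.

```perl
use strict;
use warnings;

# Hypothetical sample data: [id, text] pairs standing in for parsed log lines.
my @records = (
    [ 'b2', 'first'  ],
    [ 'a1', 'second' ],
    [ 'b2', 'third'  ],
);

# Group records by id (Hash of Arrays), remembering first-seen order of ids.
sub group_by_id {
    my @recs = @_;
    my (@ids, %by_id);
    for my $rec (@recs) {
        my ($id, $txt) = @$rec;
        push @ids, $id unless $by_id{$id};   # first sighting of this id
        push @{ $by_id{$id} }, $txt;         # autovivifies the inner array
    }
    return (\@ids, \%by_id);
}

my ($ids, $by_id) = group_by_id(@records);

# Hash slice: groups come back in first-seen order (b2, then a1).
for my $group (@{$by_id}{ @$ids }) {
    print join(',', @$group), "\n";
}

# values: the same groups, but in no guaranteed order.
for my $group (values %$by_id) {
    print join(',', @$group), "\n";
}
```

The slice costs one extra array, but it makes the output order reproducible from run to run, which plain values does not promise.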
Having a firm handle on (perl) data structures is of great utility; it helps one in making the right algorithmic decisions. For starters, perldsc is a good read.
pps. I'm awfully suspecting that Hanr shot first. With a displamer.
Re^2: very slow processing
by sandy105 (Scribe) on Aug 21, 2014 at 09:51 UTC