in reply to The need for speed
There are a couple things you can do to speed up your code a bit. I don't know if they'll give you the kind of improvements you are looking for though. You didn't mention what hardware you got those runtimes on, by the way. For heavy log processing, it often makes sense to use a dedicated host or hosts and this might be a problem worth throwing some hardware at as you need it in the future. Analyzing 6 hours worth of data in 4 hours still allows you to finish well ahead of starting the next batch so why isn't it quick enough? What are your requirements?
Following are some optimizations.
The first thing I see is that you use s/// to count characters. You should use tr/// for that instead.
my $count = $string =~ tr/@//;
I don't know how much it will help in your case but using alternation when a character class will do is generally frowned upon. Also, you might as well use precompiled expressions when you can. I'd change
to$CHARS = '(<|>|\[|\])';
$CHARS = qr/([][<>])/;
In your first loop, why do you perform a substitution on $id after matching it with a regular expression in the first place? I would replace
with$id = $1 if (/msgid=([^:]+):/); $id =~ s/(^<|>$)//g;
It isn't exactly the same but I think it would work based on your sample data. If the msgid isn't always bracketed by '<' and '>' then it will break. It will also break if a '>' is permitted in a msgid. One way or the other though, there is bound to be a way to express what you need in one regex.($id) = /msgid=<([^>]+)/;
This may be more of a style issue, but in that first loop, you also have
Why not just say $no_id++ and next unless $id =~ /\S/;? For that matter, you start it with next if (!/MsgTrace/); which would be better expressed with next unless /MsgTrace/; because it avoid the double negative.if (!$id || $id =~ /^(\s+|)$/) { $no_id++; next; }
In your next loop, you make a copy of your data each time. That's a waste especially since you go to lengths not to modify your copy and you throw both the copy and the original out when you are done. (In fact, you probably shouldn't go to the trouble throwing the original out, by the way. You aren't doing what you think anyway.) Just work with the original. That's bound to save you some time. Also, consider passing a reference to process_it() rather than the array itself. Also, in that function, each of those greps is a loop. Try to combine them. Instead of
try something like (untested):(@data) = grep(!/received from internet:/, @in); (@data) = grep(!/Error-Handler/, @data) if ( grep(/Error-Handler/, @data) && $data[1] );
There's probably a better way to do that but that's one way you might reduce those three loops (greps) to one (map).if ($data[1]) { # Do you really mean that, by the way? my (@t1,@t2); @t1 = map { if (/Error-Handler/) { $_ } else { !/received from internet:/ and push @t2, $_; (); } } @data; @data = @t2 if @t1; }
Another micro-optimisation: Don't capture things you don't use in regexen:
would be better written asif ($line[0] =~ /( |-)([a-zA-Z]+)$/) {
instead.if ($line[0] =~ /[ -](a-zA-Z]+)$/) {
I imagine you can clean it up in other ways too. The most dramatic changes to performance usually come from algorithmic changes and I don't know enough about your problem to really think about a better way to do it. Eliminating loops within other loops is one rule of thumb that should help you a great deal as long as you remember that greps and maps are loops too.
-sauoq "My two cents aren't worth a dime.";
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: The need for speed
by Aristotle (Chancellor) on Jan 23, 2003 at 23:50 UTC | |
|
Re: Re: The need for speed
by l2kashe (Deacon) on Jan 24, 2003 at 05:56 UTC | |
by sauoq (Abbot) on Jan 24, 2003 at 07:57 UTC |