sarutobi has asked for the wisdom of the Perl Monks concerning the following question:

hello, I am a noob at Perl. I wrote a small script to do the following: 1) read a log file and search for the keyword "error"; 2) read an error waiver file; 3) decide whether the log passed or failed. This is the main section of the code:
%g_wvr_list = ();
%g_log_err  = ();
@tmpArr     = ();

# get errors from waiver list
open(IFP0, "<$opt_wvr_file") or die "cannot open $opt_wvr_file: $!";
@tmpArr = <IFP0>;
foreach $el (@tmpArr) {
    chomp($el);
    $g_wvr_list{$el} = 1;
}
close(IFP0);

# get errors from test log file
@tmpArr   = `grep -iw error $opt_test_log`;
$errorCnt = $#tmpArr + 1;

# pass / fail
$waived = 0;
printDbg(@tmpArr);
foreach $key ( keys(%g_wvr_list) ) {
    printDbg("wvr List $key");
    @matchErr = grep( /\Q$key\E/, @tmpArr );
    printDbg("matchErrCnt {$#matchErr}");
    printDbg("matchErrCnt {@matchErr}");
    $waived += @matchErr if ( $#matchErr + 1 > 0 );
}
I wish to optimize this code, especially the cross-checking section. I will be handling logs around 100 MB in size with thousands of error lines, and my current approach is slow. Any ideas for improvement are highly appreciated. Thank you.

Replies are listed 'Best First'.
Re: efficient text processing
by moritz (Cardinal) on Nov 12, 2008 at 19:42 UTC
    If you read a file like this: my @array = <HANDLE> it will read the whole file into memory first and then start working with that. If the file is huge, a lot of memory will be consumed.

    Instead you should write it this way:

    while (my $line = <HANDLE>) {
        # do something with $line here
    }
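
    Applied to the waiver file from the original post, that might look like this (a minimal sketch, assuming $opt_wvr_file and %g_wvr_list as in the question; the same pattern applies to the log file):

    # build the waiver hash one line at a time instead of slurping
    open(my $wvr_fh, '<', $opt_wvr_file)
        or die "cannot open $opt_wvr_file: $!";
    while (my $line = <$wvr_fh>) {
        chomp $line;
        $g_wvr_list{$line} = 1;
    }
    close($wvr_fh);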

    Also, if each array item is counted only once, you can rewrite your whole last loop this way:

    my $regex = join '|', map quotemeta, keys %g_wvr_list;
    my $waived = 0;
    for (@tmpArr) {
        $waived++ if m/$regex/;
    }
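
    Putting the two suggestions together, here is a rough sketch of the whole check (assuming $opt_wvr_file and $opt_test_log from the original post, and assuming a log passes when every error line is waived):

    # one alternation regex built from all the waiver entries
    my $regex = join '|', map quotemeta, keys %g_wvr_list;
    my $waiver_re = length($regex) ? qr/$regex/ : undef;

    my ($errorCnt, $waived) = (0, 0);
    open(my $log_fh, '<', $opt_test_log)
        or die "cannot open $opt_test_log: $!";
    while (my $line = <$log_fh>) {
        # \b...\b approximates the whole-word match of grep -iw
        next unless $line =~ /\berror\b/i;
        $errorCnt++;
        $waived++ if defined $waiver_re && $line =~ $waiver_re;
    }
    close($log_fh);

    my $status = ($errorCnt == $waived) ? 'PASSED' : 'FAILED';

    Reading the log line by line keeps memory flat even at 100 MB, and matching against one compiled regex avoids rescanning the error lines once per waiver entry.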
Re: efficient text processing
by gone2015 (Deacon) on Nov 12, 2008 at 19:50 UTC

    Some of the ideas in thread 723032 may assist.