Even if you can't load the entire log into memory, loading it in chunks should speed things up.
1. Take the hash holding the entries and join a few thousand of them into one buffer (or however many fit in the memory you want to use). OR grab a fixed-size chunk of the log data from a newline-delimited file, then read on to the end of the current line with <IN>.
2. Run the search/replace for each regex over the entire buffer with s/foo/bar/g. The substitution returns the number of replacements it made, so you can still find out how many were done for each regex.
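As a quick illustration of step 2, here is a minimal sketch of counting replacements over a buffer. The sample text and the foo/qux pattern are made up for the example; the only thing it relies on is that s///g returns the number of substitutions, or an empty string when nothing matched.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $buffer = "foo bar foo\nbaz foo\n";

# s///g returns the number of substitutions made, or an empty string
# (false) when nothing matched, so "|| 0" normalizes the no-match case.
my $count = ( $buffer =~ s/foo/qux/g )     || 0;
my $none  = ( $buffer =~ s/nomatch/x/g )   || 0;

print "first regex replaced $count, second replaced $none\n";
```

The "|| 0" is only there so the count can be added to a running total without a "not numeric" warning when a regex matches nothing.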
Code:
Get a chunk of data:
    my $bufflen = 4 * 1024;    # or whatever chunk size you like
    do {
        $result = read( IN, $buffer, $bufflen - length($buffer), length($buffer) );
    } while ( $result && ( length($buffer) < $bufflen ) );
If the chunk ends in the middle of a line, strip off the remainder and save it for the next chunk (not needed if each entry is separated beforehand):

    my $newline = "\n";    # or some other unique record separator
    ## if we're not at eof
    if ( $result > 0 ) {
        my $last_newline = rindex $buffer, $newline;
        my $remainderlen = length($buffer) - $last_newline - length($newline);
        if ( $remainderlen <= 0 ) {
            $remainder = '';
        }
        else {
            ## 4-arg substr removes the partial line from $buffer and returns it
            $remainder = substr( $buffer, $last_newline + length($newline), $remainderlen, '' );
        }
    }

EDIT: Or instead of the above you could just do readline like BrowserUk did. D'oh!

    $buffer .= <IN>;

Then apply your regexes (and count how many replacements were done):

    foreach my $regex (@conversions) {
        ## s///g returns the number of replacements, or '' if there were none
        my $reps_done = ( $buffer =~ s/$regex->{from}/$regex->{to}/g ) || 0;
        $regex->{count} += $reps_done;
    }
    ## and do whatever with the result
    print OUT $buffer;

    ## this is important: once the chunk has been written out,
    ## prefix the remainder before reading the next chunk
    $buffer = $remainder;
The above would be inside a block that loops over each chunk until the end of the log file is reached.
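Putting the pieces together, the whole loop might look something like the sketch below. It uses the readline variant (read a chunk, then finish the current line) rather than the rindex/substr bookkeeping. The convert_log name, its arguments, the tiny chunk size, and the sample log lines are all assumptions made for the example, and it reads from an in-memory filehandle only so that it runs standalone.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $in in chunks of roughly $bufflen bytes,
# applies every conversion to each chunk, and writes the result to $out.
sub convert_log {
    my ( $in, $out, $conversions, $bufflen ) = @_;
    my $buffer;
    while (1) {
        # Without an offset argument, read() replaces the buffer contents.
        my $result = read( $in, $buffer, $bufflen );
        die "read failed: $!" unless defined $result;
        last if $result == 0;    # end of file

        # Finish the line the chunk stopped in. If the chunk happened to end
        # on a newline, this just pulls in one more complete line.
        my $rest = <$in>;
        $buffer .= $rest if defined $rest;

        for my $regex (@$conversions) {
            my $reps_done = ( $buffer =~ s/$regex->{from}/$regex->{to}/g ) || 0;
            $regex->{count} += $reps_done;
        }
        print {$out} $buffer;
    }
    return;
}

# Made-up sample log and conversions, for illustration only.
my $log = join '', map { "line $_: error at host$_\n" } 1 .. 5;
my @conversions = (
    { from => qr/error/, to => 'E' },
    { from => qr/host/,  to => 'h' },
);

my $converted = '';
open my $in,  '<', \$log       or die "cannot open input: $!";
open my $out, '>', \$converted or die "cannot open output: $!";
convert_log( $in, $out, \@conversions, 16 );
close $out;
close $in;

print $converted;
print "$_->{from}: $_->{count} replacements\n" for @conversions;
```

For a real log you would open the actual files and pick a much larger chunk size; the 16 here is only to force several passes over a five-line sample.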
In reply to "Re: Recommendations for efficient data reduction/substitution application" by ipherian, in thread "Recommendations for efficient data reduction/substitution application" by atcroft