Can anyone recommend the most efficient method of applying a large number of regular expression substitutions (> 100) to a relatively large number of input data lines (> 10_000_000)?
In a recent project at $work, I wrote a script to generate a report of logged problem occurrences. In this process, I have to simplify the data by removing variations that are "noise" relative to the issues I am looking at (such as a process ID, an amount of free memory when a memory use threshold is crossed, or a part of a message with more detail than needed in this report). Once I've broken the record into a few basic parts, I currently have a list of 100+ regexen, replacement strings, and order of application to use (currently stored in a DB table), and a simple for loop to apply them to each record:
    foreach my $regex (@conversions) {
        if ( $entry{$k} =~ s/$regex->{from}/$regex->{to}/g ) {
            $regex->{count}++;   # Count of applications, used to determine
                                 # whether a particular substitution is warranted.
        }
    }

I have the nagging feeling, however, that there is a more efficient way of dealing with this. (Also, this processing takes up the largest percentage of the processing time in the script, which often takes hours to run.) Any thoughts/suggestions?
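One common speedup worth trying, assuming the patterns come out of the DB as plain strings, is to precompile each pattern once with qr// before the main loop, so the regex engine does not have to recompile interpolated pattern strings for every one of the 10M+ records. A minimal sketch (the @conversions structure below mirrors the hashrefs in the question; @records stands in for the input loop):

    # Precompile each pattern string into a Regexp object exactly once.
    for my $regex (@conversions) {
        $regex->{from_re} = qr/$regex->{from}/;
    }

    for my $record (@records) {
        for my $regex (@conversions) {
            # Interpolating a precompiled qr// object avoids a
            # recompile check on every substitution attempt.
            if ( $record =~ s/$regex->{from_re}/$regex->{to}/g ) {
                $regex->{count}++;
            }
        }
    }

Whether this helps much depends on the patterns; profiling a sample of records with and without qr// (e.g. under Devel::NYTProf) would show if the compile overhead is where the time actually goes.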
Thank you for your time and attention, and any direction you may provide.