Parse a huge file and match the lines against a hash entry

snra_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am developing a log processing tool. I thought Perl would provide a better solution for text processing than Java.

Here is my question:

I have a hash containing 100 key/value pairs.

I need to parse a huge file of about 65,000 lines, compare each line against the values of the hash, and find the key whose value matches the line.
The code snippet is as follows:


$start = time();
TEST1: while (<FILEHANDLE>) {
    $count = 0;
    foreach $value5 (values %msgDefn) {
        $count++;
        if ($_ =~ /($value5)/) {
            print "Match found in $count iterations";
            print $_;
            next TEST1;
        }
    }
}
$end = time();
print "Time taken was ", ($end - $start), " seconds";

It takes about 30 seconds per file on average. I need to do this in an optimized way, since parsing thousands of similar files would take hours.

Is there any way to reduce the processing time?

Thanks a lot.


Replies are listed 'Best First'.
Re: Parse a huge file and match the lines against a hash entry by ig (Vicar) on Jul 26, 2009 at 02:43 UTC
If you have to find the key in your hash given a value, then perhaps your hash is the wrong way around. As long as your values are valid as hash keys, you can easily swap keys and values with `my %reversed = reverse %hash;` after which you can find the key corresponding to a value in your original hash, as follows: `my $value = 'whatever'; my $key = $reversed{$value};`
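A minimal sketch of the reversal, assuming some made-up message definitions (the keys and values below are hypothetical, not taken from the original post). One caveat worth checking: `reverse` keeps only one key when two keys share the same value, so the sketch compares the sizes of the two hashes.

```perl
use strict;
use warnings;

# Hypothetical message definitions: key => message text.
my %msgDefn = (
    ERR_DISK => 'disk full',
    ERR_NET  => 'connection reset',
    WARN_CPU => 'cpu load high',
);

# Swap keys and values: message text => key.
my %reversed = reverse %msgDefn;

# If two keys shared a value, one of them would be lost in %reversed,
# so a size mismatch signals duplicate values.
my $has_duplicates = keys(%reversed) != keys(%msgDefn);
warn "duplicate values: some keys were dropped\n" if $has_duplicates;

# Exact-match lookup: this only works when the whole string equals a value.
my $key = $reversed{'connection reset'};
print "key: $key\n";
```

This lookup is an exact string comparison, so on its own it only helps when a log line is identical to a value, not when the value is a substring of the line.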
Re: Parse a huge file and match the lines against a hash entry by graff (Chancellor) on Jul 26, 2009 at 03:39 UTC
As indicated by the first reply, a single regex consisting of 100 values as alternates is not such a big load, really. And you don't even need a special module to do it this way:

    my $value_regex = join( '|', values %msgDefn );  # actually, use anonymous monk's version below...
    while ( <FILEHANDLE> ) {
        print if ( /$value_regex/ );
    }

That assumes that the values in your hash are all "safe", in the sense that they don't contain any regex magic characters, like brackets, *, ?, +, period, slash, backslash, and so on. If the values might contain things of that sort, you could handle it like this (but YMMV, depending on what's really in your data):

    my $value_regex = join( '|', map { quotemeta($_) } values %msgDefn );

Now, if you ultimately need to know which hash key contains the value that actually matched a given line from the file, then you'd really want to build a reverse hash, as suggested in the 2nd reply.

Update (forgot to mention): Naturally, lots of other caveats apply, such as false-alarm matches on substrings (e.g. a value like "table", treated as above or as in the OP, would match on a line that contains "stable" or "tablet", which might not be what you want).
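One way to combine the single alternation with the reverse hash is to capture whichever value matched and look it up. The following is only a sketch under stated assumptions: the hash contents and the helper name `key_for_line` are hypothetical, the values are literal strings (not patterns), and they begin and end with word characters so that `\b` can guard against the "stable"/"tablet" false alarms.

```perl
use strict;
use warnings;

# Hypothetical message definitions: key => literal message text.
my %msgDefn = (
    ERR_DISK => 'disk full',
    ERR_NET  => 'connection reset',
    WARN_TBL => 'table',
);

# Reverse map: message text => key.
my %key_for = reverse %msgDefn;

# One alternation of all values, escaped and compiled once with qr//.
# Longer values go first so a value is preferred over its own prefix.
# \b assumes every value starts and ends with a word character.
my $alternation = join '|', map { quotemeta($_) }
                  sort { length($b) <=> length($a) } values %msgDefn;
my $value_regex = qr/\b($alternation)\b/;

# Return the key whose value occurs in the line, or undef if none does.
sub key_for_line {
    my ($line) = @_;
    return $line =~ $value_regex ? $key_for{$1} : undef;
}

for my $line ('eth0: connection reset by peer', 'the stable is empty') {
    my $key = key_for_line($line);
    print defined $key ? "$key: $line\n" : "no match: $line\n";
}
```

With this shape each line is scanned once against the whole set of values, instead of once per value as in the nested loop.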
Re^2: Parse a huge file and match the lines against a hash entry by Anonymous Monk on Jul 26, 2009 at 03:45 UTC
`join '|', map quotemeta, values`
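To illustrate why the escaping matters, here is a small sketch with two made-up values that contain regex metacharacters (the values and the test line are hypothetical). Without `quotemeta` the period matches any character, which produces a false positive.

```perl
use strict;
use warnings;

# Hypothetical values that contain regex metacharacters.
my @values = ('load > 1.5', 'retry (3)');

# Unescaped: "." matches any character and "(3)" becomes a capture group.
my $raw = join '|', @values;

# Escaped with quotemeta: every value is matched literally.
my $safe = join '|', map { quotemeta($_) } @values;

my $line = 'load > 1x5';    # not one of the values

print "raw pattern matched\n"  if $line =~ /$raw/;     # false positive
print "safe pattern matched\n" if $line =~ /$safe/;    # does not match
```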
Re^2: Parse a huge file and match the lines against a hash entry by snra_perl (Acolyte) on Jul 26, 2009 at 06:44 UTC
Thanks everyone... It helped me a lot. The processing time reduced from 25 seconds to 1 second!
Re: Parse a huge file and match the lines against a hash entry by Anonymous Monk on Jul 26, 2009 at 02:28 UTC
Try

    use Regex::PreSuf;
    my $bigregex = presuf( values %msgDefn );
    my $start = time();
    while ( <FILEHANDLE> ) {
        print if /$bigregex/;
    }
    my $end = time();
    print "Time taken was ", ($end - $start), " seconds";