snra_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am developing a log-processing tool. I thought Perl would be a better fit for text processing than Java.

Here is my question,

I have a hash containing 100 key/value pairs.

I need to parse a huge file of about 65,000 lines, compare each line against the values in the hash, and, when a value matches the line, find the corresponding key.
Here is the code snippet:
    $start = time();
    TEST1: while (<FILEHANDLE>) {
        $count = 0;
        foreach $value5 (values %msgDefn) {
            $count++;
            if ($_ =~ /($value5)/) {
                print "Match found in $count iterations\n";
                print $_;
                next TEST1;    # skip the remaining values for this line
            }
        }
    }
    $end = time();
    print "Time taken was ", ($end - $start), " seconds\n";
It takes an average of 30 seconds per file.

I need to optimize this, since processing thousands of similar files would take hours.

Is there any way to reduce the processing time?

Thanks a lot.
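A note on why the loop above is slow: `$value5` changes on every pass of the inner loop, and Perl recompiles an interpolated pattern whenever its text differs from the previous run of that match, so `/($value5)/` can be recompiled up to 100 times per line. A minimal sketch that keeps the per-key loop but compiles each pattern once with `qr//`; the sample `%msgDefn` contents and the `find_key` helper are hypothetical, and it assumes the values are literal text rather than regexes:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the OP's %msgDefn (key => text to look for).
my %msgDefn = (
    DISK_FULL  => 'No space left on device',
    CONN_RESET => 'Connection reset by peer',
);

# Compile every pattern once, before reading the file.
# \Q...\E treats each value as literal text (an assumption; drop it if the
# values are meant to be regexes).
my %compiled;
while (my ($key, $value) = each %msgDefn) {
    $compiled{$key} = qr/\Q$value\E/;
}

# Return the first key (in sorted key order) whose value occurs in $line,
# or undef if none does.
sub find_key {
    my ($line) = @_;
    for my $key (sort keys %compiled) {
        return $key if $line =~ $compiled{$key};
    }
    return;
}

# Demo on an in-memory "file" so the sketch runs without a real log.
my $log = "write failed: No space left on device\nall good\n";
open my $fh, '<', \$log or die "cannot open in-memory log: $!";
while (my $line = <$fh>) {
    my $key = find_key($line);
    print "$key: $line" if defined $key;
}
close $fh;
```

Compiling once removes the repeated compilation cost, but each line is still tested against up to 100 separate patterns; the replies below reduce that to one combined pattern per line.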

Re: Parse a huge file and match the lines against a hash entry
by ig (Vicar) on Jul 26, 2009 at 02:43 UTC

    If you have to find the key in your hash given a value, then perhaps your hash is the wrong way around.

    As long as your values are unique (duplicate values collapse into a single entry) and usable as hash keys, you can easily swap keys and values with

    my %reversed = reverse %hash;

    after which you can find the key corresponding to a value in your original hash, as follows:

    my $value = 'whatever';
    my $key   = $reversed{$value};
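    A small self-contained illustration of the swap, using a hypothetical `%hash`; it also shows why unique values matter, since a duplicated value silently loses one of its keys:

```perl
use strict;
use warnings;

# Hypothetical example hash: key => message text.
my %hash = (
    ERR1 => 'disk full',
    ERR2 => 'timeout',
);

# reverse turns the (key, value, key, value, ...) list into
# (value, key, value, key, ...), so the values become the new keys.
my %reversed = reverse %hash;

my $value = 'timeout';
my $key   = $reversed{$value};
print "$value => $key\n";

# If two keys share a value, only one of them survives the swap
# (which one depends on hash ordering).
my %dup          = (A => 'same', B => 'same');
my %dup_reversed = reverse %dup;
print scalar(keys %dup_reversed), " key(s) left after reversing\n";
```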
Re: Parse a huge file and match the lines against a hash entry
by graff (Chancellor) on Jul 26, 2009 at 03:39 UTC
    As indicated by the first reply, a single regex consisting of 100 values as alternates is not such a big load, really. And you don't even need a special module to do it this way:
    my $value_regex = join( '|', values %msgDefn );  # actually, use anonymonk's version below...
    while ( <FILEHANDLE> ) {
        print if ( /$value_regex/ );
    }
    That assumes that the values in your hash are all "safe", in the sense that they don't contain any regex magic characters, like brackets, *, ?, +, period, slash, backslash, and so on.

    If the values might contain things of that sort, you could handle it like this (but YMMV, depending on what's really in your data):

    my $value_regex = join( '|', map { "\Q$_\E" } values %msgDefn );
    Now, if you ultimately need to know which hash key contains the value that actually matched a given line from the file, then you'd really want to build a reverse hash, as suggested in the 2nd reply.
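    Putting the two suggestions together: one combined, quoted alternation with a capture group finds the matching text, and a reversed hash maps that text back to its key. A minimal sketch, assuming the values are unique literal strings; the sample data and the `key_for_line` helper are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for %msgDefn.
my %msgDefn = (
    DISK_FULL  => 'No space left on device',
    CONN_RESET => 'Connection reset by peer',
    TIMEOUT    => 'timed out (retrying)',    # contains regex metacharacters
);

# Reverse lookup table: matched text => key (assumes values are unique).
my %key_for = reverse %msgDefn;

# One alternation of the quoted values, compiled once, longest value first.
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } values %msgDefn;
my $value_regex = qr/($alt)/;

# Return the key for the value found in $line, or undef if none matches.
sub key_for_line {
    my ($line) = @_;
    return $line =~ $value_regex ? $key_for{$1} : undef;
}

for my $line ("connect: Connection reset by peer\n", "job timed out (retrying)\n", "ok\n") {
    my $key = key_for_line($line);
    print defined $key ? "$key\t$line" : "no match\t$line";
}
```

Sorting the values longest-first matters because Perl tries alternatives left to right and takes the first one that matches at a given position, so a shorter value that is a prefix of a longer one could otherwise win.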

    Update (forgot to mention): Naturally, lots of other caveats apply, such as false-alarm matches on substrings (e.g. a value like "table", treated as above or as in the OP, would match on a line that contains "stable" or "tablet"), which might not be what you want.
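    One common way to cut down such substring false alarms is to wrap the alternation in `\b` word boundaries. A sketch, assuming hypothetical values that each begin and end with a word character (letters, digits or underscore):

```perl
use strict;
use warnings;

# Hypothetical values; 'table' should not match inside 'stable' or 'tablet'.
my @values = ('table', 'disk full');

my $alt = join '|', map { quotemeta($_) } @values;

# The (?:...) group makes both \b assertions apply to every alternative,
# not just to the first and the last one.
# \b needs a word/non-word transition, so this only behaves as intended when
# each value starts and ends with a word character.
my $word_regex = qr/\b(?:$alt)\b/;

for my $line ('the table is empty', 'system is stable', 'new tablet') {
    printf "%-20s %s\n", $line, ($line =~ $word_regex ? 'match' : 'no match');
}
```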

      join '|', map quotemeta, values %msgDefn
      Thanks everyone...
      It helped me a lot. The processing time dropped from 25 seconds to 1 second!
Re: Parse a huge file and match the lines against a hash entry
by Anonymous Monk on Jul 26, 2009 at 02:28 UTC
    Try
    use Regex::PreSuf;

    my $bigregex = presuf( values %msgDefn );
    my $start = time();
    while( <FILEHANDLE> ){
        print if /$bigregex/;
    }
    my $end = time();
    print "Time taken was ", ($end - $start), " seconds";
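    The timings in this thread use the built-in `time()`, which has one-second resolution, so a result like "1 second" is coarse. A sketch of the same measurement with the core `Time::HiRes` module, whose exported `time()` returns fractional seconds. It uses a plain `quotemeta` alternation instead of Regex::PreSuf (a CPAN module) so it runs on a stock Perl; the generated data is hypothetical, sized like the question (100 values, 65,000 lines):

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # time() now returns fractional seconds

# Hypothetical stand-ins for %msgDefn and the log file.
my %msgDefn;
$msgDefn{"KEY$_"} = "message number $_ here" for 1 .. 100;

my $log = '';
$log .= "line $_: message number " . ($_ % 200) . " here\n" for 1 .. 65_000;

# One combined, quoted pattern, compiled once.
my $alt      = join '|', map { quotemeta($_) } values %msgDefn;
my $bigregex = qr/$alt/;

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my $start   = time();
my $matches = 0;
while (<$fh>) {
    $matches++ if $_ =~ $bigregex;
}
my $elapsed = time() - $start;
close $fh;

printf "%d matching lines, %.3f seconds\n", $matches, $elapsed;
```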