Re: Fast file parsing

It's faster to use index() when looking for exact matches. For example, I would change your first section to this:

    # Only care about comments
    next unless ( index( $_, '%%' ) == 0 );
    # Duplex or no...
    if (( index( $_, '%%' ) == 0 && index( $_, 'Duplex Duplex' ) > -1 )
          ||
      /^\[&l1S/i    # '[' escaped so the pattern compiles
          ||
      /^\[&l2S/i
          ||
      index( $_, 'DUPLEX=ON') > -1 ) {
        $pagestate = 'Duplex';
    }

You should make sure that the most common situations are the ones that come first in that list of || conditions. Also, for large files it can be helpful to read in chunks bigger than one line. I found this node sped things up a lot when I was doing something similar.
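
As a rough illustration of reading in chunks bigger than one line, here is a minimal sketch that pulls fixed-size blocks with `read` and carries any partial line over to the next block. The sub name `each_line_chunked`, the callback interface, and the 64 KB default are assumptions made for this example, not anything from the node mentioned above.

```perl
use strict;
use warnings;

# Read a file in fixed-size blocks and pass each complete line
# (without its trailing newline) to a callback.
# The name, the callback interface and the default size are illustrative.
sub each_line_chunked {
    my ($path, $callback, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;

    my $tail = '';    # partial line left over from the previous block
    while (1) {
        my $got = read($fh, my $buf, $chunk_size);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;    # end of file

        $buf = $tail . $buf;
        # Hand over every newline-terminated line in this block.
        while ($buf =~ /\G([^\n]*)\n/gc) {
            my $line = $1;
            $callback->($line);
        }
        # Whatever follows the last newline is an incomplete line.
        $tail = substr($buf, pos($buf) // 0);
    }
    $callback->($tail) if length $tail;    # final line without a newline
    close $fh;
    return;
}
```

It could be called as `each_line_chunked($file, sub { my ($line) = @_; ... })`. Whether this beats plain line-by-line reading depends on the file and the perl build, so it is worth timing on real data.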


Re: Re: Fast file parsing by hv (Prior) on Mar 09, 2004 at 17:43 UTC
"It's faster to use index() when looking for exact matches."

That ain't necessarily so, particularly when anchored: `/^%%/` only needs to check the start of the string, whereas `index($_, '%%')` needs to scan to the first match, possibly through the entire string.

Update: of course using `substr` and `eq` is a much more obvious way to check this, and I've added some options to the code below to reflect that.

Try this:

    #!/usr/bin/perl -w
    use Benchmark qw/ cmpthese /;
    my $count = shift;
    our $a = '%%';
    our $b = ' ' x 10000;
    cmpthese($count, {
        first_re     => q{ $match = "$a$b" =~ /^%%/ },
        last_re      => q{ $match = "$b$a" =~ /^%%/ },
        miss_re      => q{ $match = "$b" =~ /^%%/ },
        first_index  => q{ $match = index("$a$b", "%%") == 0 },
        last_index   => q{ $match = index("$b$a", "%%") == 0 },
        miss_index   => q{ $match = index("$b", "%%") == 0 },
        first_substr => q{ $match = substr("$a$b", 0, 2) eq '%%' },
        last_substr  => q{ $match = substr("$b$a", 0, 2) eq '%%' },
        miss_substr  => q{ $match = substr("$b", 0, 2) eq '%%' },
    });

Hugo
Re: Re: Re: Fast file parsing by perrin (Chancellor) on Mar 09, 2004 at 18:15 UTC
Good catch! The anchor makes all the difference, and the regex even beats substr in this case when I run it on my machine. So, index is better than regex for unanchored matches but much worse for anchored ones.
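
The benchmark in the parent reply only covers the anchored case. To check the unanchored half of that conclusion, something along these lines could be run. It is a sketch only: the test strings, the sub names and the `DUPLEX=ON` literal placement are made up for illustration, and no particular result is claimed, since timings vary with the perl version and the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: the literal sits in the middle of a long line,
# or is absent altogether.
my $hit  = (' ' x 5000) . 'DUPLEX=ON' . (' ' x 5000);
my $miss = ' ' x 10000;

# Two ways to ask "does this literal occur anywhere in the string?"
sub found_re    { return $_[0] =~ /DUPLEX=ON/ ? 1 : 0 }
sub found_index { return index($_[0], 'DUPLEX=ON') > -1 ? 1 : 0 }

# Time both approaches for a hit and a miss.
sub compare_unanchored {
    my ($count) = @_;
    return cmpthese($count, {
        hit_re     => sub { found_re($hit) },
        hit_index  => sub { found_index($hit) },
        miss_re    => sub { found_re($miss) },
        miss_index => sub { found_index($miss) },
    });
}

# Run only when an iteration count is given on the command line.
compare_unanchored(shift @ARGV) if @ARGV;
```

Saved to a file with any name, it would be run with an iteration count as its only argument, for example `perl unanchored.pl 200000`.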