in reply to Fast file parsing

It's faster to use index() when looking for exact matches. For example, I would change your first section to this:
# Only care about comments
next unless ( index( $_, '%%' ) == 0 );

# Duplex or no...
if (   ( index( $_, '%%' ) == 0 && index( $_, 'Duplex Duplex' ) > -1 )
    || /\e&l1S/i    # assumes the "^[" shown in the post is caret notation for ESC (PCL)
    || /\e&l2S/i
    || index( $_, 'DUPLEX=ON' ) > -1 )
{
    $pagestate = 'Duplex';
}
You should make sure that the most common situations come first in that list of || conditions, because || short-circuits and stops evaluating at the first test that is true. Also, for large files it can be helpful to read in chunks bigger than one line. I found this node sped things up a lot when I was doing something similar.
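As a rough sketch of the chunked-reading idea (this is not the code from the node mentioned above; the helper name each_line_chunked, its arguments, and the choice to split on "\n" are all assumptions made for illustration), something along these lines reads fixed-size blocks and carries any partial line over to the next block:

```perl
use strict;
use warnings;

# Hypothetical helper: calls $callback once per line of $path, reading the
# file in $chunk_size-byte blocks instead of one line at a time.
# Lines are passed without their trailing "\n".
sub each_line_chunked {
    my ( $path, $chunk_size, $callback ) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";

    my $tail = '';    # incomplete line left over from the previous block
    while ( read( $fh, my $buf, $chunk_size ) ) {
        $buf = $tail . $buf;

        # A limit of -1 keeps a trailing empty field, so the last element
        # is always the (possibly empty) unfinished line.
        my @lines = split /\n/, $buf, -1;
        $tail = pop @lines;

        $callback->($_) for @lines;
    }

    # Final line of a file that does not end with a newline.
    $callback->($tail) if length $tail;

    close $fh;
    return;
}
```

It could be called as each_line_chunked( $file, 64 * 1024, sub { ... } ), with the duplex tests placed inside the callback. Whether this actually beats plain line-by-line reading depends on the file and the Perl build, so it is worth timing on real data.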

Re: Re: Fast file parsing
by hv (Prior) on Mar 09, 2004 at 17:43 UTC

    It's faster to use index() when looking for exact matches.

    That ain't necessarily so, particularly when anchored: /^%%/ only needs to check the start of the string, whereas index($_, '%%') needs to scan to the first match, possibly through the entire string.

    Update: of course, using substr and eq is a much more obvious way to check this, and I've added some options to the code below to reflect that.

    Try this:

    #!/usr/bin/perl -w
    use Benchmark qw/ cmpthese /;

    my $count = shift;
    our $a = '%%';
    our $b = ' ' x 10000;

    cmpthese($count, {
        first_re     => q{ $match = "$a$b" =~ /^%%/ },
        last_re      => q{ $match = "$b$a" =~ /^%%/ },
        miss_re      => q{ $match = "$b" =~ /^%%/ },
        first_index  => q{ $match = index("$a$b", "%%") == 0 },
        last_index   => q{ $match = index("$b$a", "%%") == 0 },
        miss_index   => q{ $match = index("$b", "%%") == 0 },
        first_substr => q{ $match = substr("$a$b", 0, 2) eq '%%' },
        last_substr  => q{ $match = substr("$b$a", 0, 2) eq '%%' },
        miss_substr  => q{ $match = substr("$b", 0, 2) eq '%%' },
    });

    Hugo

      Good catch! The anchor makes all the difference, and the regex even beats substr in this case when I run it on my machine. So, index is better than regex for unanchored matches but much worse for anchored ones.
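To make the anchored versus unanchored distinction concrete, here is a small sketch. The helper names starts_with and contains are made up for illustration and do not appear in the thread. The anchored check uses rindex with a start position of 0, an alternative that was not benchmarked above: since rindex only reports matches beginning at or before the given position, it can only match at offset 0.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are not from the thread.

# Anchored test: does $str begin with $prefix?
# rindex() with POSITION 0 only considers a match starting at offset 0.
sub starts_with {
    my ( $str, $prefix ) = @_;
    return rindex( $str, $prefix, 0 ) == 0;
}

# Unanchored test: does $str contain $needle anywhere?
sub contains {
    my ( $str, $needle ) = @_;
    return index( $str, $needle ) > -1;
}
```

How this compares with /^%%/ or substr and eq on a given machine is not something the thread measured, so adding it as another case to the Benchmark script is the safest way to find out.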