    while (<>) { @_ = split; $wc += @_ }
    print "counted $wc white-space-delimited tokens\n";

That would show you the limit on how quickly the file could be processed using perl.
In any case, I'd be more inclined to look for ways to economize on the amount of code being written to accomplish the task. One thing that might simplify the logic a lot would be to do record-oriented input (rather than line-oriented input):
    open( NETSTATS, '<', $input_file ) or die "can't open $input_file: $!";
    $/ = "\nnet '";
    my @field_names = ( 'wire capacitance', 'wire resistance', 'number of loads',
                        'number of pins', 'total wire length' );
    while (<NETSTATS>) {
        chomp;  # removes $/ from end of string
        warn "got record # $.\n" if ( $. % 50000 == 0 );  # this belongs on STDERR, IMO
        next unless (/^([^']+)'/);
        $NetName = $1;
        my %field_val = ();
        for my $field ( @field_names ) {
            if ( /$field:\s+([\d.]+)/ ) {
                $field_val{$field} = $1;
            }
        }
        # ....
    }

(Update: changed the "next unless" line and added the assignment to $NetName, as per the OP.)
I'm not going to try reimplementing the whole thing, but just that little snippet should give you the basic idea of how I would go about it. The part I've shown replaces approximately the first 50 lines of code from the OP. As for the rest, instead of testing a bunch of distinct scalar variables in order to determine what to do with the record data, you are instead checking the keys and values of a hash, which can be done with less code.
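To give a rough idea of what "checking the keys and values of a hash" could look like, here is a minimal sketch. The helper name missing_fields and the sample values are made up for illustration; they are not from the OP's code, and the real decision logic would depend on what the OP does with each record.

```perl
use strict;
use warnings;

# Same field list as in the snippet above.
my @field_names = ( 'wire capacitance', 'wire resistance', 'number of loads',
                    'number of pins', 'total wire length' );

# Hypothetical helper: returns the names of the fields that were
# not captured for a record, in the order of @field_names.
sub missing_fields {
    my ($field_val) = @_;
    return grep { !exists $field_val->{$_} } @field_names;
}

# Made-up record contents, standing in for one parsed net.
my %field_val = (
    'wire capacitance' => '0.12',
    'number of loads'  => '3',
);

my @missing = missing_fields( \%field_val );
if (@missing) {
    print "incomplete record, missing: ", join( ', ', @missing ), "\n";
}
```

One grep over the field list takes the place of a separate test for each scalar, and adding a sixth field later means touching only @field_names.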
Even if it ends up running a little slower than the original (though I doubt it would), there are other advantages in terms of clarity and maintainability of the code.
And with this sort of approach, it might be easier to find tricks that will speed it up -- e.g. the regex matches in the for loop might be quicker if done like this (because with each iteration, $_ becomes shorter, and the target string is near the beginning):
    for my $field ( @field_names ) {
        if ( s/.*?\s$field:\s+([\d.]+)//s ) {
            $field_val{$field} = $1;
        }
    }

The main point is that by reading the data one whole record at a time, the logic becomes a lot easier (and might end up running faster, as well).
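If you want to try the record-at-a-time idea without the real file, something like the following self-contained sketch should do. The sample data is invented (I'm only guessing at the file layout from the field names), and %stats and $sample are names I made up. I've also assumed the very first record starts with "net '" and stripped that prefix by hand, since $/ only removes the separator from the end of each record.

```perl
use strict;
use warnings;

# Made-up sample data; the real file layout is an assumption.
my $sample = <<'END';
net 'alpha'
  wire capacitance: 0.25
  wire resistance: 1.5
net 'beta'
  wire capacitance: 0.75
  number of loads: 4
END

my @field_names = ( 'wire capacitance', 'wire resistance', 'number of loads' );
my %stats;    # net name => { field name => value }

# Read from an in-memory filehandle so the example needs no input file.
open( my $fh, '<', \$sample ) or die "can't open in-memory file: $!";
{
    local $/ = "\nnet '";    # each read returns one whole net record
    while ( my $rec = <$fh> ) {
        chomp $rec;            # strips the trailing "\nnet '" separator
        $rec =~ s/^net '//;    # only the first record still has this prefix
        next unless $rec =~ /^([^']+)'/;
        my $name = $1;
        for my $field (@field_names) {
            $stats{$name}{$field} = $1 if $rec =~ /\s$field:\s+([\d.]+)/;
        }
    }
}
close $fh;

for my $name ( sort keys %stats ) {
    for my $field ( sort keys %{ $stats{$name} } ) {
        print "$name: $field = $stats{$name}{$field}\n";
    }
}
```

Using "local $/" inside a block keeps the changed record separator from leaking into any other reads in the program.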
In reply to Re: Looking for ways to speed up the parsing of a file...
by graff
in thread Looking for ways to speed up the parsing of a file...
by fiddler42