in reply to Looking for ways to speed up the parsing of a file...
This:

    if (($TotalNets == 50000) || ($TotalNets == 100000) || ($TotalNets == 250000) ||
        ($TotalNets == 500000) || ($TotalNets == 1000000) || ($TotalNets == 1500000) ||
        ($TotalNets == 2000000) || ($TotalNets == 3000000)) {

should be done in parallel, i.e. write the current net to a FIFO or shared memory, then display the totals with another process. Inside the read loop, do only the tasks strictly necessary for processing the net records. Alternatively, read the file N lines at a time:

    do {
        for (1 .. $N) {
            # defined() so that a final line of "0" does not end the loop early
            if (defined(my $line = <FH>)) { ... do stuff here ... }
            else { last; }
        }
        print "$Some_Total\n";
    } until (eof(FH));
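To make the "display the totals with another process" idea concrete, here is a rough sketch using pipe and fork. It is not your code: the fixed reporting interval ($report_every), the 200_000-iteration stand-in loop and the message text are all assumptions, and your real milestones are irregular, so treat it only as the shape of the approach.

```perl
use strict;
use warnings;
use IO::Handle;

# Assumed setup: a pipe carries totals from the parser to a reporter process.
pipe(my $reader, my $writer) or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: does nothing but display totals as they arrive.
    close $writer;
    while (my $total = <$reader>) {
        chomp $total;
        print "Processed $total nets\n";
    }
    exit 0;
}

# Parent: does the parsing and sends a total at each interval.
close $reader;
$writer->autoflush(1);
my $report_every = 50_000;    # hypothetical interval
my $TotalNets    = 0;
for my $net (1 .. 200_000) {  # stand-in for the real read loop
    $TotalNets++;
    print {$writer} "$TotalNets\n" if $TotalNets % $report_every == 0;
}
close $writer;                # reporter sees EOF and exits
waitpid($pid, 0);
```

The read loop is left with one modulus test and an occasional write; all formatting and terminal output happens in the child.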
You're processing every token three times here:

    if ($_ =~ /wire capacitance/) {
        if ($_ =~ /^\s+wire capacitance\:\s+\d.*\d\s*$/) {
            ($NetCapRaw) = $_ =~ /^\s+wire capacitance\:\s+(\d.*\d)\s*$/;

Replace the token on the first pass, or capture the remainder of the string and pass it to another regex.
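A single match can do the test and the capture together. A minimal sketch, assuming lines shaped like the ones your regex expects; the helper name net_cap_raw and the sample lines are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: one regex both recognises the line and captures the value.
sub net_cap_raw {
    my ($line) = @_;
    # In list context a failed match returns an empty list, so $cap stays undef.
    my ($cap) = $line =~ /^\s+wire capacitance:\s+(\d.*\d)\s*$/;
    return $cap;
}

# Made-up sample lines; the real report format may differ.
for my $line ("   wire capacitance:  0.004512\n", "   pin capacitance:  0.1\n") {
    my $cap = net_cap_raw($line);
    print defined $cap ? "captured $cap\n" : "no match\n";
}
```

Lines that do not match cost one failed regex instead of up to three attempts.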
Actually, I like the idea of tokenizing the whole file in a multi-pass interpreter: tokenize the file first, replacing each token with a code-ref and each constant with an object that returns a constant, then execute the resulting file.
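Roughly what I have in mind, as a toy sketch only. The keywords, the "name: value" line shape and the names %handler_for, compile_lines and execute_program are all invented here, and I use a closure in place of a full constant-returning object to keep it short:

```perl
use strict;
use warnings;

my %totals;

# Hypothetical keyword table: each token maps to a code-ref.
my %handler_for = (
    'wire capacitance' => sub { my ($const) = @_; $totals{cap} += $const->() },
    'wire resistance'  => sub { my ($const) = @_; $totals{res} += $const->() },
);

# Pass 1: turn each recognised line into a [handler, constant] pair.
sub compile_lines {
    my @program;
    for my $line (@_) {
        my ($key, $value) = $line =~ /^\s*([^:]+?)\s*:\s*(\S+)/ or next;
        my $handler = $handler_for{$key} or next;   # skip unknown tokens
        push @program, [ $handler, sub { $value } ];
    }
    return \@program;
}

# Pass 2: execute the compiled program; no regexes run here.
sub execute_program {
    my ($program) = @_;
    for my $step (@$program) {
        my ($handler, $constant) = @$step;
        $handler->($constant);
    }
}

my $program = compile_lines(
    "  wire capacitance: 0.5\n",
    "  wire resistance: 2\n",
    "  wire capacitance: 0.25\n",
    "  unknown thing: 9\n",
);
execute_program($program);
print "cap=$totals{cap} res=$totals{res}\n";
```

All the pattern matching lives in the first pass, so the second pass is just a loop over code-refs.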
What does this do?

    if (($DriverForwardSlashCount == 0) && ($NetNameForwardSlashCount == 0)) {
        $AddToCustomTable = 1;

There are four copies of this and they all just set $AddToCustomTable to the same value. Isn't the following the same thing?

    # inner parentheses are needed: <= binds tighter than |
    $AddToCustomTable = 1 if (($DriverForwardSlashCount | $NetNameForwardSlashCount) <= 1);
There are two time eaters in the code: reading the file and executing the regexes. I would try to separate those. Read the file in and split the fields, generating a hash of tokens and data (note this is similar to the parsing idea above), then process the hash for your data. This may seem like extra work, but often when you refactor the code like this you see optimizations you wouldn't see with the code all in one mashup like it is now.
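A small sketch of that two-stage split, assuming simple "name: value" lines. The sample records and the %values_for hash are invented, and your real file will need a smarter split than this:

```perl
use strict;
use warnings;

# Made-up records standing in for lines read from the real file.
my @raw = (
    "  wire capacitance: 0.5\n",
    "  wire resistance: 2\n",
    "  wire capacitance: 1.5\n",
    "garbage line\n",
);

# Stage 1: split fields only; no per-field regex work yet.
my %values_for;
for my $line (@raw) {
    my ($field, $value) = split /\s*:\s*/, $line, 2;
    next unless defined $value;          # line had no "name: value" shape
    $field =~ s/^\s+//;
    $value =~ s/\s+$//;
    push @{ $values_for{$field} }, $value;
}

# Stage 2: process the collected data.
my $total_cap = 0;
$total_cap += $_ for @{ $values_for{'wire capacitance'} || [] };
print "total wire capacitance: $total_cap\n";
```

With the two stages apart you can time them separately and see which one actually dominates.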