I need to parse some very large files that are formatted like this:
net 'IR_REG_INST_INT[20]':
    dont_touch: FALSE
    pin capacitance: 0.00458335
    wire capacitance: 0.00103955
    total capacitance: 0.0056229
    wire resistance: 0.0663061
    number of drivers: 1
    number of loads: 2
    number of pins: 3
    total wire length: 9.20 (Routed) X_length = 0.96, Y_length = 8.24
    number of vias: 6

Connections for net 'IR_REG_INST_INT[20]':

    Driver Pins     Type                    Pin Cap       Pin Loc
    ------------    ----------------        --------      --------
    U195/o          Output Pin (invx20)     0.00162106    [1.12 409.88]

    Load Pins       Type                    Pin Cap       Pin Loc
    ------------    ----------------        --------      --------
    U196/c          Input Pin (and3x10)     0.00161077    [1.131 401.15]
    U1460/a         Input Pin (or2x05)      0.00135152    [1.68 409.22]
I have several million summaries like the one above in one giant file (about 6 GB in size). I need to collect the pertinent details from each "net" summary. Here is exactly what I am doing:
open (NETSTATS,"$input_file");
$TotalNets = 0;
while (<NETSTATS>) {
    if ($_ =~ /^net \'/) {
        ($NetName) = $_ =~ /^net \'(.*)'\:\s*$/;
        $c = 1;
        $TotalNets++;
        if (($TotalNets == 50000) || ($TotalNets == 100000) ||
            ($TotalNets == 250000) || ($TotalNets == 500000) ||
            ($TotalNets == 1000000) || ($TotalNets == 1500000) ||
            ($TotalNets == 2000000) || ($TotalNets == 3000000)) {
            print ("Parsed $TotalNets nets...\n");
        }
        do {
            if ($_ =~ /wire capacitance/) {
                if ($_ =~ /^\s+wire capacitance\:\s+\d.*\d\s*$/) {
                    ($NetCapRaw) = $_ =~ /^\s+wire capacitance\:\s+(\d.*\d)\s*$/;
                    $NetCap = $CapMultiplier*$NetCapRaw;
                    $c++;
                } else {
                    $NetCap = "NaN";
                }
            }
            if ($_ =~ /wire resistance/) {
                if ($_ =~ /^\s+wire resistance\:\s+\d.*\d\s*$/) {
                    ($NetRes) = $_ =~ /^\s+wire resistance\:\s+(\d.*\d)\s*$/;
                    $c++;
                } else {
                    $NetRes = "NaN";
                }
            }
            if ($_ =~ /number of loads/) {
                if ($_ =~ /^\s+number of loads\:\s+\d+\s*$/) {
                    ($NetFanout) = $_ =~ /^\s+number of loads\:\s+(\d+)\s*$/;
                    $c++;
                } else {
                    $NetFanout = "NaN";
                }
            }
            if ($_ =~ /total wire length/) {
                if ($_ =~ /^\s+total wire length\:\s+\d.*\d\s*/) {
                    ($NetLength) = $_ =~ /^\s+total wire length\:\s+(\d.*\d)\s*/;
                    $c++;
                } else {
                    $NetLength = "NaN";
                }
            }
            $_ = <NETSTATS>;
        } until (($_ =~ /Driver Pins/) || ($_ eq ""));
        if ($_ =~ /Driver Pins/) {
            $_ = <NETSTATS>;
            $_ = <NETSTATS>;
            $_ =~ s/^ *//;
            $_ =~ s/ *$//;
            @DriverLine = split (/\s+/,$_);
            $FirstDriver = $DriverLine[0];
            $c++;
        }
        $AddToCustomTable = 0;
        if (($NetName ne "") &&
            (($NetCap ne "") && ($NetCap ne "NaN")) &&
            (($NetRes ne "") && ($NetRes ne "NaN")) &&
            (($NetFanout ne "") && ($NetFanout ne "NaN")) &&
            (($NetLength ne "") && ($NetLength ne "NaN")) &&
            ($FirstDriver ne "") && ($c == 6)) {
            if ($NetFanout <= $UpperFanoutLimitOfTable) {
                if (($UseNetPattern == 0) && ($UseDriverCell == 0) && ($TopLevelOnly == 0)) {
                    $AddToCustomTable = 1;
                }
                if (($UseNetPattern == 0) && ($UseDriverCell == 0) && ($TopLevelOnly == 1)) {
                    $DriverForwardSlashCount = $FirstDriver =~ s/(\/)/$1/gs;   # Simple command to count characters...
                    $NetNameForwardSlashCount = $NetName =~ s/(\/)/$1/gs;
                    if (($DriverForwardSlashCount == 0) && ($NetNameForwardSlashCount == 0)) {
                        $AddToCustomTable = 1;
                        if ($DebugMode == 1) {
                            print ("Adding net $NetName (driver = $FirstDriver)...\n");
                            print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                        }
                    }
                    if (($DriverForwardSlashCount == 0) && ($NetNameForwardSlashCount == 1)) {
                        $AddToCustomTable = 1;
                        if ($DebugMode == 1) {
                            print ("Adding net $NetName (driver = $FirstDriver)...\n");
                            print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                        }
                    }
                    if (($DriverForwardSlashCount == 1) && ($NetNameForwardSlashCount == 0)) {
                        $AddToCustomTable = 1;
                        if ($DebugMode == 1) {
                            print ("Adding net $NetName (driver = $FirstDriver)...\n");
                            print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                        }
                    }
                    if (($DriverForwardSlashCount == 1) && ($NetNameForwardSlashCount == 1)) {
                        $AddToCustomTable = 1;
                        if ($DebugMode == 1) {
                            print ("Adding net $NetName (driver = $FirstDriver)...\n");
                            print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                        }
                    }
                }
                if (($UseNetPattern == 0) && ($UseDriverCell == 1) && ($TopLevelOnly == 0)) {
                    if ($FirstDriver =~ qr/$DriverPattern/x) {   # to regard variable as a regular expression...
                        $AddToCustomTable = 1;
                    }
                }
                #if (($UseNetPattern == 0) && ($UseDriverCell == 1) && ($TopLevelOnly == 1)) {
                    # This condition not allowed per input argument parsing...
                #}
                if (($UseNetPattern == 1) && ($UseDriverCell == 0) && ($TopLevelOnly == 0)) {
                    if ($NetName =~ qr/$NetPattern/x) {
                        $AddToCustomTable = 1;
                    }
                }
                #if (($UseNetPattern == 1) && ($UseDriverCell == 0) && ($TopLevelOnly == 1)) {
                    # This condition not allowed per input argument parsing...
                #}
                if (($UseNetPattern == 1) && ($UseDriverCell == 1) && ($TopLevelOnly == 0)) {
                    if ($NetName =~ qr/$NetPattern/x) {
                        $AddToCustomTable = 1;
                    } elsif ($FirstDriver =~ qr/$DriverPattern/x) {
                        $AddToCustomTable = 1;
                    }
                }
                #if (($UseNetPattern == 1) && ($UseDriverCell == 1) && ($TopLevelOnly == 1)) {
                    # This condition not allowed per input argument parsing...
                #}
            }
            if ($AddToCustomTable == 1) {
                #use constant (p_name => 0, p_cap => 1, p_res => 2, p_length => 3, p_driver => 4);
                push @{$NetStats[ $NetFanout ] ||= []}, [ $NetName, $NetCap, $NetRes, $NetLength, $FirstDriver ];
            }
        } else {
            if ($DebugMode == 1) {
                print DEBUG_VERBOSE ("ERROR: Problem deriving stats for net $NetName!\n");
                print DEBUG_VERBOSE ("ERROR: c=$c NetName=$NetName NetFanout=$NetFanout NetCap=$NetCap NetRes=$NetRes NetLength=$NetLength FirstDriver=$FirstDriver\n\n");
            }
        }
    }
    $NetName = "";
    $NetCap = "NaN";
    $NetRes = "NaN";
    $NetFanout = "NaN";
    $NetLength = "NaN";
    $FirstDriver = "";
}
print ("Parsed $TotalNets nets...\n\n");
close (NETSTATS);
...but I would really like to speed up the parsing of the input file. Does anyone have any general suggestions? It currently takes about 10 minutes to process the file, and I would like to pull that in a little. I know I am kinda splitting hairs here (10 minutes is reasonable, after all), but, again, any little improvement here and there would be much appreciated.
Thanks,
Fiddler42
In reply to Looking for ways to speed up the parsing of a file... by fiddler42
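One general direction, offered only as a rough sketch and not as a measured fix: the posted loop tests most lines against a loose pattern, then a strict pattern, then matches the strict pattern a third time to capture. Compiling the patterns once with qr// and letting a single match both test and capture removes the repeated work. The names below (parse_nets, $RE_NET, $RE_FIELD, the hash keys) are made up for illustration, and the capture is an assumption too: it takes only the first numeric token after the colon, which is not identical to the original `\d.*\d` capture. Whether this helps on a 6 GB file depends on how much of the 10 minutes is regex work rather than disk I/O, so profile before and after.

```perl
use strict;
use warnings;

# Patterns compiled once, outside the read loop (hypothetical names).
my $RE_NET   = qr/^net '(.*)':\s*$/;
my $RE_FIELD = qr/^\s+(wire capacitance|wire resistance|number of loads|total wire length):\s+(\d\S*)/;

# Read net records from an open filehandle; return a list of hash refs.
# Each line is matched at most once per pattern, and the same match
# that recognises the line also captures the value.
sub parse_nets {
    my ($fh) = @_;
    my (@nets, $cur);
    while (my $line = <$fh>) {
        if ($line =~ $RE_NET) {
            $cur = { name => $1 };
            push @nets, $cur;
        }
        elsif ($cur && $line =~ $RE_FIELD) {
            $cur->{$1} = $2;                 # e.g. 'number of loads' => 2
        }
        elsif ($cur && $line =~ /Driver Pins/) {
            my $dashes      = <$fh>;         # skip the dashed separator line
            my $driver_line = <$fh>;         # first driver row
            next unless defined $driver_line;
            ($cur->{driver}) = split ' ', $driver_line;
        }
    }
    return @nets;
}

# Small in-memory sample (abridged from the record in the question).
my $sample = <<'END';
net 'IR_REG_INST_INT[20]':
    wire capacitance: 0.00103955
    wire resistance: 0.0663061
    number of loads: 2
    total wire length: 9.20 (Routed) X_length = 0.96, Y_length = 8.24
Connections for net 'IR_REG_INST_INT[20]':
    Driver Pins     Type                    Pin Cap       Pin Loc
    ------------    ----------------        --------      --------
    U195/o          Output Pin (invx20)     0.00162106    [1.12 409.88]
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
for my $net (parse_nets($fh)) {
    printf "%s loads=%s cap=%s driver=%s\n",
        $net->{name}, $net->{'number of loads'},
        $net->{'wire capacitance'}, $net->{driver};
}
close $fh;
```

The filtering on fanout, net pattern and driver pattern from the original would then run on each finished hash, and the "NaN" bookkeeping could be replaced by checking that all expected keys exist.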