Hi, All,

I need to parse some very large files that are formatted like this:

    net 'IR_REG_INST_INT[20]':
        dont_touch: FALSE
        pin capacitance: 0.00458335
        wire capacitance: 0.00103955
        total capacitance: 0.0056229
        wire resistance: 0.0663061
        number of drivers: 1
        number of loads: 2
        number of pins: 3
        total wire length: 9.20 (Routed)
        X_length = 0.96, Y_length = 8.24
        number of vias: 6

    Connections for net 'IR_REG_INST_INT[20]':

        Driver Pins     Type                   Pin Cap       Pin Loc
        ------------    ----------------       --------      --------
        U195/o          Output Pin (invx20)    0.00162106    [1.12 409.88]

        Load Pins       Type                   Pin Cap       Pin Loc
        ------------    ----------------       --------      --------
        U196/c          Input Pin (and3x10)    0.00161077    [1.131 401.15]
        U1460/a         Input Pin (or2x05)     0.00135152    [1.68 409.22]

I have several million summaries like the one above in one giant file (about 6 GB in size). I need to collect the pertinent details from each "net" summary. Here is exactly what I am doing:

    open (NETSTATS,"$input_file");
    $TotalNets = 0;
    while (<NETSTATS>) {
        if ($_ =~ /^net \'/) {
            ($NetName) = $_ =~ /^net \'(.*)'\:\s*$/;
            $c = 1;
            $TotalNets++;
            if (($TotalNets == 50000) || ($TotalNets == 100000) || ($TotalNets == 250000) ||
                ($TotalNets == 500000) || ($TotalNets == 1000000) || ($TotalNets == 1500000) ||
                ($TotalNets == 2000000) || ($TotalNets == 3000000)) {
                print ("Parsed $TotalNets nets...\n");
            }
            do {
                if ($_ =~ /wire capacitance/) {
                    if ($_ =~ /^\s+wire capacitance\:\s+\d.*\d\s*$/) {
                        ($NetCapRaw) = $_ =~ /^\s+wire capacitance\:\s+(\d.*\d)\s*$/;
                        $NetCap = $CapMultiplier*$NetCapRaw;
                        $c++;
                    } else {
                        $NetCap = "NaN";
                    }
                }
                if ($_ =~ /wire resistance/) {
                    if ($_ =~ /^\s+wire resistance\:\s+\d.*\d\s*$/) {
                        ($NetRes) = $_ =~ /^\s+wire resistance\:\s+(\d.*\d)\s*$/;
                        $c++;
                    } else {
                        $NetRes = "NaN";
                    }
                }
                if ($_ =~ /number of loads/) {
                    if ($_ =~ /^\s+number of loads\:\s+\d+\s*$/) {
                        ($NetFanout) = $_ =~ /^\s+number of loads\:\s+(\d+)\s*$/;
                        $c++;
                    } else {
                        $NetFanout = "NaN";
                    }
                }
                if ($_ =~ /total wire length/) {
                    if ($_ =~ /^\s+total wire length\:\s+\d.*\d\s*/) {
                        ($NetLength) = $_ =~ /^\s+total wire length\:\s+(\d.*\d)\s*/;
                        $c++;
                    } else {
                        $NetLength = "NaN";
                    }
                }
                $_ = <NETSTATS>;
            } until (($_ =~ /Driver Pins/) || ($_ eq ""));
            if ($_ =~ /Driver Pins/) {
                $_ = <NETSTATS>;
                $_ = <NETSTATS>;
                $_ =~ s/^ *//;
                $_ =~ s/ *$//;
                @DriverLine = split (/\s+/,$_);
                $FirstDriver = $DriverLine[0];
                $c++;
            }
            $AddToCustomTable = 0;
            if (($NetName ne "") &&
                (($NetCap ne "") && ($NetCap ne "NaN")) &&
                (($NetRes ne "") && ($NetRes ne "NaN")) &&
                (($NetFanout ne "") && ($NetFanout ne "NaN")) &&
                (($NetLength ne "") && ($NetLength ne "NaN")) &&
                ($FirstDriver ne "") && ($c == 6)) {
                if ($NetFanout <= $UpperFanoutLimitOfTable) {
                    if (($UseNetPattern == 0) && ($UseDriverCell == 0) && ($TopLevelOnly == 0)) {
                        $AddToCustomTable = 1;
                    }
                    if (($UseNetPattern == 0) && ($UseDriverCell == 0) && ($TopLevelOnly == 1)) {
                        $DriverForwardSlashCount = $FirstDriver =~ s/(\/)/$1/gs; # Simple command to count characters...
                        $NetNameForwardSlashCount = $NetName =~ s/(\/)/$1/gs;
                        if (($DriverForwardSlashCount == 0) && ($NetNameForwardSlashCount == 0)) {
                            $AddToCustomTable = 1;
                            if ($DebugMode == 1) {
                                print ("Adding net $NetName (driver = $FirstDriver)...\n");
                                print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                            }
                        }
                        if (($DriverForwardSlashCount == 0) && ($NetNameForwardSlashCount == 1)) {
                            $AddToCustomTable = 1;
                            if ($DebugMode == 1) {
                                print ("Adding net $NetName (driver = $FirstDriver)...\n");
                                print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                            }
                        }
                        if (($DriverForwardSlashCount == 1) && ($NetNameForwardSlashCount == 0)) {
                            $AddToCustomTable = 1;
                            if ($DebugMode == 1) {
                                print ("Adding net $NetName (driver = $FirstDriver)...\n");
                                print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                            }
                        }
                        if (($DriverForwardSlashCount == 1) && ($NetNameForwardSlashCount == 1)) {
                            $AddToCustomTable = 1;
                            if ($DebugMode == 1) {
                                print ("Adding net $NetName (driver = $FirstDriver)...\n");
                                print DEBUG_VERBOSE ("$NetFanout $NetRes\n");
                            }
                        }
                    }
                    if (($UseNetPattern == 0) && ($UseDriverCell == 1) && ($TopLevelOnly == 0)) {
                        if ($FirstDriver =~ qr/$DriverPattern/x) { # to regard variable as a regular expression...
                            $AddToCustomTable = 1;
                        }
                    }
                    #if (($UseNetPattern == 0) && ($UseDriverCell == 1) && ($TopLevelOnly == 1)) {
                        # This condition not allowed per input argument parsing...
                    #}
                    if (($UseNetPattern == 1) && ($UseDriverCell == 0) && ($TopLevelOnly == 0)) {
                        if ($NetName =~ qr/$NetPattern/x) {
                            $AddToCustomTable = 1;
                        }
                    }
                    #if (($UseNetPattern == 1) && ($UseDriverCell == 0) && ($TopLevelOnly == 1)) {
                        # This condition not allowed per input argument parsing...
                    #}
                    if (($UseNetPattern == 1) && ($UseDriverCell == 1) && ($TopLevelOnly == 0)) {
                        if ($NetName =~ qr/$NetPattern/x) {
                            $AddToCustomTable = 1;
                        } elsif ($FirstDriver =~ qr/$DriverPattern/x) {
                            $AddToCustomTable = 1;
                        }
                    }
                    #if (($UseNetPattern == 1) && ($UseDriverCell == 1) && ($TopLevelOnly == 1)) {
                        # This condition not allowed per input argument parsing...
                    #}
                }
                if ($AddToCustomTable == 1) {
                    #use constant (p_name => 0, p_cap => 1, p_res => 2, p_length => 3, p_driver => 4);
                    push @{$NetStats[ $NetFanout ] ||= []}, [ $NetName, $NetCap, $NetRes, $NetLength, $FirstDriver ];
                }
            } else {
                if ($DebugMode == 1) {
                    print DEBUG_VERBOSE ("ERROR: Problem deriving stats for net $NetName!\n");
                    print DEBUG_VERBOSE ("ERROR: c=$c NetName=$NetName NetFanout=$NetFanout NetCap=$NetCap NetRes=$NetRes NetLength=$NetLength FirstDriver=$FirstDriver\n\n");
                }
            }
        }
        $NetName = "";
        $NetCap = "NaN";
        $NetRes = "NaN";
        $NetFanout = "NaN";
        $NetLength = "NaN";
        $FirstDriver = "";
    }
    print ("Parsed $TotalNets nets...\n\n");
    close (NETSTATS);

...but I would really like to speed up the parsing of the input file. Does anyone have any general suggestions? It currently takes about 10 minutes to process the file, and I would like to pull that in a little. I know I am kinda splitting hairs here (10 minutes is reasonable, after all), but, again, any little improvement here and there would be much appreciated.
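As a starting point for discussion, here is a minimal and unmeasured sketch of one direction: recognising all four fields with a single precompiled regex, so that each line is matched once rather than tested against several patterns in turn. The helper name parse_net_fields and the variable $FIELD_RE are made up for this illustration, and the sketch assumes every field sits on its own indented line, as the patterns in the loop imply. Whether it is actually faster on a 6 GB input would have to be benchmarked.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the original script: one precompiled regex
# covers all four fields of interest. Assumes one field per indented line.
# The value pattern is looser than the original's (\d.*\d); it simply takes
# the leading run of digits and dots.
my $FIELD_RE = qr/^\s+(wire capacitance|wire resistance|number of loads|total wire length):\s+(\d[\d.]*)/;

# parse_net_fields is a made-up helper name. It takes an array ref holding
# the lines of one net summary and returns a hash ref of field => value.
sub parse_net_fields {
    my ($lines) = @_;
    my %field;
    for my $line (@$lines) {
        $field{$1} = $2 if $line =~ $FIELD_RE;
    }
    return \%field;
}

# Sample lines modelled on the net summary shown in this post.
my @summary = (
    "net 'IR_REG_INST_INT[20]':\n",
    "    dont_touch: FALSE\n",
    "    pin capacitance: 0.00458335\n",
    "    wire capacitance: 0.00103955\n",
    "    total capacitance: 0.0056229\n",
    "    wire resistance: 0.0663061\n",
    "    number of loads: 2\n",
    "    total wire length: 9.20 (Routed)\n",
);

my $stats = parse_net_fields(\@summary);
for my $name (sort keys %$stats) {
    print "$name = $stats->{$name}\n";
}
```

A hash keyed by field name would also make the validity check simpler (for example, counting the keys instead of maintaining $c), though that changes the structure of the loop and is only a suggestion.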

Thanks,

Fiddler42


In reply to Looking for ways to speed up the parsing of a file... by fiddler42
