Just in case you are looking for a way to slow it down... :-)

I find that breaking this sort of job down into smaller tasks can offer benefits that outweigh any speed penalties.

For instance you could consider doing the 'extracting' and the 'reporting' separately. First load each 'record' into a hash and then worry about what to do with it.

Long-winded loops, complex ifs and elses, and tricky 'bulk' regexes often lead, imo, to code that is difficult to write, read and maintain. And if the spec changes...

Building on the suggestions already made, this does the parsing.

#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 1;
$Data::Dumper::Sortkeys = 1;

$/ = "\nnet '";

while (my $record = <DATA>){
  my %db_record = process_record($record);
  # write_record(%db_record);
  print Dumper \%db_record;
}

sub process_record {
  my ($record) = @_;

  my @parts = split /Connections for net/, $record;
  my ($net_id, $part_one_details) = split /\s*:\s*/, $parts[0], 2;
  my (undef, $part_two_details) = split /\s*:\s*/, $parts[1], 2;

  my $part_one = process_part_one($part_one_details);
  my $part_two = process_part_two($part_two_details);

  my %db;
  $db{$net_id}{part_one} = $part_one;
  $db{$net_id}{part_two} = $part_two;
  return %db;
}

sub process_part_one {
  my ($part) = @_;
  $part =~ s/(total wire length:.*?)\n/$1/;
  my %flds = trim(split(/:|\n/, $part));
  return {%flds};
}

sub process_part_two {
  my ($part) = @_;
  #my (undef, $data) = split /:/, $part;
  my @lines = split /\n/, $part;
  my ($spec_line, $id, $pin, %rec);
  for my $line (@lines){
    next unless $line;
    next if $line =~ /------/;
    last if $line =~ /net '/; #'
    if ($line =~ /(Driver|Load)/){
      $spec_line = 1;
      $pin = $1;
      next;
    }
    my @flds = grep{$_} split /\s{2,}/, $line;
    if ($spec_line == 1){
      $spec_line++;
      $id = trim($flds[0]);
      push @{$rec{$pin}{$id}}, trim($flds[1]);
    }
    else{
      push @{$rec{$pin}{$id}}, trim(@flds);
      $spec_line = 1;
    }
  }
  return \%rec;
}

sub write_record{
  my (%db_record) = @_;
  # do something with data in %db_record
  return;
}

sub trim {
  my @copy = @_;
  for (@copy){
    s/^\s+//;
    s/\s+$//;
    s/\s+/ /g;
  }
  return wantarray ? @copy : pop @copy;
}

__DATA__
net 'IR_REG_INST_INT[20]':
    dont_touch: FALSE
    pin capacitance:      0.00458335
    wire capacitance:     0.00103955
    total capacitance:    0.0056229
    wire resistance:      0.0663061
    number of drivers:    1
    number of loads:      2
    number of pins:       3
    total wire length:    9.20
    (Routed) X_length = 0.96, Y_length = 8.24
    number of vias:       6

Connections for net 'IR_REG_INST_INT[20]':
    Driver Pins     Type                   Pin Cap       Pin Loc
    ------------    ----------------       --------      --------
    U195/o          Output Pin (invx20)
                                           0.00162106    [1.12 409.88]

    Load Pins       Type                   Pin Cap       Pin Loc
    ------------    ----------------       --------      --------
    U196/c          Input Pin (and3x10)
                                           0.00161077    [1.131 401.15]
    U1460/a         Input Pin (or2x05)
                                           0.00135152    [1.68 409.22]

net 'IR_REG_INST_INT[30]':
    dont_touch: FALSE
    pin capacitance:      0.00458335
    wire capacitance:     0.00103955
    total capacitance:    0.0056229
    wire resistance:      0.0663061
    number of drivers:    1
    number of loads:      2
    number of pins:       3
    total wire length:    9.20
    (Routed) X_length = 0.96, Y_length = 8.24
    number of vias:       6

Connections for net 'IR_REG_INST_INT[30]':
    Driver Pins     Type                   Pin Cap       Pin Loc
    ------------    ----------------       --------      --------
    U195/o          Output Pin (invx20)
                                           0.00162106    [1.12 409.88]

    Load Pins       Type                   Pin Cap       Pin Loc
    ------------    ----------------       --------      --------
    U196/c          Input Pin (and3x10)
                                           0.00161077    [1.131 401.15]
    U1460/a         Input Pin (or2x05)
                                           0.00135152    [1.68 409.22]
output
$VAR1 = {
  'net \'IR_REG_INST_INT[20]\'' => {
    'part_one' => {
      'dont_touch' => 'FALSE',
      'number of drivers' => '1',
      'number of loads' => '2',
      'number of pins' => '3',
      'number of vias' => '6',
      'pin capacitance' => '0.00458335',
      'total capacitance' => '0.0056229',
      'total wire length' => '9.20 (Routed) X_length = 0.96, Y_length = 8.24',
      'wire capacitance' => '0.00103955',
      'wire resistance' => '0.0663061'
    },
    'part_two' => {
      'Driver' => {
        'U195/o' => [
          'Output Pin (invx20)',
          '0.00162106',
          '[1.12 409.88]'
        ]
      },
      'Load' => {
        'U1460/a' => [
          'Input Pin (or2x05)',
          '0.00135152',
          '[1.68 409.22]'
        ],
        'U196/c' => [
          'Input Pin (and3x10)',
          '0.00161077',
          '[1.131 401.15]'
        ]
      }
    }
  }
};
$VAR1 = {
  'IR_REG_INST_INT[30]\'' => {
    'part_one' => {
      'dont_touch' => 'FALSE',
      'number of drivers' => '1',
      'number of loads' => '2',
      'number of pins' => '3',
      'number of vias' => '6',
      'pin capacitance' => '0.00458335',
      'total capacitance' => '0.0056229',
      'total wire length' => '9.20 (Routed) X_length = 0.96, Y_length = 8.24',
      'wire capacitance' => '0.00103955',
      'wire resistance' => '0.0663061'
    },
    'part_two' => {
      'Driver' => {
        'U195/o' => [
          'Output Pin (invx20)',
          '0.00162106',
          '[1.12 409.88]'
        ]
      },
      'Load' => {
        'U1460/a' => [
          'Input Pin (or2x05)',
          '0.00135152',
          '[1.68 409.22]'
        ],
        'U196/c' => [
          'Input Pin (and3x10)',
          '0.00161077',
          '[1.131 401.15]'
        ]
      }
    }
  }
};
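Once a record is in a hash, the 'reporting' half can be written without touching the parser. A minimal sketch of what that might look like, assuming the structure shown in the dump; report_record and its one-line summary format are my own invention for illustration, not part of the code in this post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reporting step: builds one summary line per net.
# Assumes the structure returned by process_record():
#   net_id => { part_one => {...}, part_two => { Driver => {...}, Load => {...} } }
sub report_record {
    my (%db_record) = @_;
    my @lines;
    for my $net_id (sort keys %db_record) {
        my $part_one = $db_record{$net_id}{part_one} || {};
        my $part_two = $db_record{$net_id}{part_two} || {};
        my $cap      = $part_one->{'total capacitance'} // 'n/a';
        my $drivers  = join ',', sort keys %{ $part_two->{Driver} || {} };
        my $loads    = join ',', sort keys %{ $part_two->{Load}   || {} };
        push @lines, "$net_id cap=$cap drivers=$drivers loads=$loads";
    }
    return @lines;
}

# Sample record, trimmed down from the first net in the dump
my %sample = (
    "net 'IR_REG_INST_INT[20]'" => {
        part_one => { 'total capacitance' => '0.0056229' },
        part_two => {
            Driver => {
                'U195/o' => [ 'Output Pin (invx20)', '0.00162106', '[1.12 409.88]' ],
            },
            Load => {
                'U196/c'  => [ 'Input Pin (and3x10)', '0.00161077', '[1.131 401.15]' ],
                'U1460/a' => [ 'Input Pin (or2x05)',  '0.00135152', '[1.68 409.22]' ],
            },
        },
    },
);

print "$_\n" for report_record(%sample);
```

If the report format changes, only this sub needs to change; the parsing subs stay as they are.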
There is no error checking or validation, but I believe it would be easier to add that with this approach than with the "one big loop" method.
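As a rough illustration of that, here is one check that becomes easy once each record is a hash: compare the 'number of drivers' and 'number of loads' fields against the pins actually parsed. This is only a sketch; validate_record is a hypothetical helper, and the sample record is deliberately broken (one load removed) to trigger a message:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical validator: returns a list of problems (empty if none).
# Assumes the same hash layout that process_record() returns.
sub validate_record {
    my (%db_record) = @_;
    my %count_field = (
        Driver => 'number of drivers',
        Load   => 'number of loads',
    );
    my @problems;
    for my $net_id (sort keys %db_record) {
        my $part_one = $db_record{$net_id}{part_one} || {};
        my $part_two = $db_record{$net_id}{part_two} || {};
        for my $kind (sort keys %count_field) {
            my $field = $count_field{$kind};
            my $want  = $part_one->{$field};
            if (!defined $want || $want !~ /^\d+$/) {
                push @problems, "$net_id: missing or non-numeric '$field'";
                next;
            }
            # keys in scalar context gives the number of pins parsed
            my $have = keys %{ $part_two->{$kind} || {} };
            push @problems, "$net_id: '$field' is $want but $have $kind pin(s) parsed"
                if $have != $want;
        }
    }
    return @problems;
}

# Deliberately inconsistent sample: two loads claimed, one present
my %broken = (
    "net 'IR_REG_INST_INT[20]'" => {
        part_one => { 'number of drivers' => '1', 'number of loads' => '2' },
        part_two => {
            Driver => { 'U195/o' => [ 'Output Pin (invx20)' ] },
            Load   => { 'U196/c' => [ 'Input Pin (and3x10)' ] },
        },
    },
);

print "$_\n" for validate_record(%broken);
```

A call like this could sit in the main loop just before write_record, so a bad record is reported rather than stored.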

update
Redundant line at the beginning of the process_part_two sub commented out.


In reply to Re: Looking for ways to speed up the parsing of a file... by wfsp
in thread Looking for ways to speed up the parsing of a file... by fiddler42
