in reply to Looking for ways to speed up the parsing of a file...

Just in case you are looking for a way to slow it down... :-)

I find that breaking these sorts of jobs down into smaller tasks can offer benefits that outweigh any speed penalties.

For instance, you could consider doing the 'extracting' and the 'reporting' separately. First load each 'record' into a hash, and then worry about what to do with it.
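To make the split concrete, here is a minimal sketch of the idea on a toy record. The record text and the sub names extract_record and report_record are made up for illustration; they are not part of the original problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-field record, only for illustration.
my $record = "name: U195/o\ncap: 0.00162106\n";

# Step 1: extraction only. Turn "key: value" lines into a hash.
sub extract_record {
    my ($text) = @_;
    my %rec;
    for my $line (split /\n/, $text) {
        my ($key, $value) = split /\s*:\s*/, $line, 2;
        next unless defined $value;    # skip lines without a colon
        $rec{$key} = $value;
    }
    return \%rec;
}

# Step 2: reporting only. It sees the hash, never the input format.
sub report_record {
    my ($rec) = @_;
    return join '', map { sprintf "%s = %s\n", $_, $rec->{$_} } sort keys %$rec;
}

my $rec = extract_record($record);
print report_record($rec);
```

If the input format changes, only extract_record should need touching; if the report changes, only report_record.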

Long-winded loops, complex ifs and elses, and tricky 'bulk' regexes often lead, imo, to code that is difficult to write, read and maintain. And if the spec changes...

Building on the suggestions already made, this does the parsing.

#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 1;
$Data::Dumper::Sortkeys = 1;

$/ = "\nnet '";

while (my $record = <DATA>){
  my %db_record = process_record($record);
  # write_record(%db_record);
  print Dumper \%db_record;
}

sub process_record {
  my ($record) = @_;
  my @parts = split /Connections for net/, $record;
  my ($net_id, $part_one_details) = split /\s*:\s*/, $parts[0], 2;
  my (undef, $part_two_details) = split /\s*:\s*/, $parts[1], 2;

  my $part_one = process_part_one($part_one_details);
  my $part_two = process_part_two($part_two_details);

  my %db;
  $db{$net_id}{part_one} = $part_one;
  $db{$net_id}{part_two} = $part_two;
  return %db;
}

sub process_part_one {
  my ($part) = @_;
  $part =~ s/(total wire length:.*?)\n/$1/;
  my %flds = trim(split(/:|\n/, $part));
  return {%flds};
}

sub process_part_two {
  my ($part) = @_;
  #my (undef, $data) = split /:/, $part;
  my @lines = split /\n/, $part;
  my ($spec_line, $id, $pin, %rec);
  for my $line (@lines){
    next unless $line;
    next if $line =~ /------/;
    last if $line =~ /net '/; #'
    if ($line =~ /(Driver|Load)/){
      $spec_line = 1;
      $pin = $1;
      next;
    }
    my @flds = grep{$_} split /\s{2,}/, $line;
    if ($spec_line == 1){
      $spec_line++;
      $id = trim($flds[0]);
      push @{$rec{$pin}{$id}}, trim($flds[1]);
    }
    else{
      push @{$rec{$pin}{$id}}, trim(@flds);
      $spec_line = 1;
    }
  }
  return \%rec;
}

sub write_record{
  my (%db_record) = @_;
  # do something with data in %db_record
  return;
}

sub trim {
  my @copy = @_;
  for (@copy){
    s/^\s+//;
    s/\s+$//;
    s/\s+/ /g;
  }
  return wantarray ? @copy : pop @copy;
}

__DATA__
net 'IR_REG_INST_INT[20]':
    dont_touch:              FALSE
    pin capacitance:         0.00458335
    wire capacitance:        0.00103955
    total capacitance:       0.0056229
    wire resistance:         0.0663061
    number of drivers:       1
    number of loads:         2
    number of pins:          3
    total wire length:       9.20
    (Routed) X_length = 0.96, Y_length = 8.24
    number of vias:          6

Connections for net 'IR_REG_INST_INT[20]':

    Driver Pins         Type                Pin Cap     Pin Loc
    ------------        ----------------    --------    --------
    U195/o              Output Pin (invx20)
                                            0.00162106  [1.12 409.88]

    Load Pins           Type                Pin Cap     Pin Loc
    ------------        ----------------    --------    --------
    U196/c              Input Pin (and3x10)
                                            0.00161077  [1.131 401.15]
    U1460/a             Input Pin (or2x05)
                                            0.00135152  [1.68 409.22]
net 'IR_REG_INST_INT[30]':
    dont_touch:              FALSE
    pin capacitance:         0.00458335
    wire capacitance:        0.00103955
    total capacitance:       0.0056229
    wire resistance:         0.0663061
    number of drivers:       1
    number of loads:         2
    number of pins:          3
    total wire length:       9.20
    (Routed) X_length = 0.96, Y_length = 8.24
    number of vias:          6

Connections for net 'IR_REG_INST_INT[30]':

    Driver Pins         Type                Pin Cap     Pin Loc
    ------------        ----------------    --------    --------
    U195/o              Output Pin (invx20)
                                            0.00162106  [1.12 409.88]

    Load Pins           Type                Pin Cap     Pin Loc
    ------------        ----------------    --------    --------
    U196/c              Input Pin (and3x10)
                                            0.00161077  [1.131 401.15]
    U1460/a             Input Pin (or2x05)
                                            0.00135152  [1.68 409.22]
output
$VAR1 = {
  'net \'IR_REG_INST_INT[20]\'' => {
    'part_one' => {
      'dont_touch' => 'FALSE',
      'number of drivers' => '1',
      'number of loads' => '2',
      'number of pins' => '3',
      'number of vias' => '6',
      'pin capacitance' => '0.00458335',
      'total capacitance' => '0.0056229',
      'total wire length' => '9.20 (Routed) X_length = 0.96, Y_length = 8.24',
      'wire capacitance' => '0.00103955',
      'wire resistance' => '0.0663061'
    },
    'part_two' => {
      'Driver' => {
        'U195/o' => [
          'Output Pin (invx20)',
          '0.00162106',
          '[1.12 409.88]'
        ]
      },
      'Load' => {
        'U1460/a' => [
          'Input Pin (or2x05)',
          '0.00135152',
          '[1.68 409.22]'
        ],
        'U196/c' => [
          'Input Pin (and3x10)',
          '0.00161077',
          '[1.131 401.15]'
        ]
      }
    }
  }
};
$VAR1 = {
  'IR_REG_INST_INT[30]\'' => {
    'part_one' => {
      'dont_touch' => 'FALSE',
      'number of drivers' => '1',
      'number of loads' => '2',
      'number of pins' => '3',
      'number of vias' => '6',
      'pin capacitance' => '0.00458335',
      'total capacitance' => '0.0056229',
      'total wire length' => '9.20 (Routed) X_length = 0.96, Y_length = 8.24',
      'wire capacitance' => '0.00103955',
      'wire resistance' => '0.0663061'
    },
    'part_two' => {
      'Driver' => {
        'U195/o' => [
          'Output Pin (invx20)',
          '0.00162106',
          '[1.12 409.88]'
        ]
      },
      'Load' => {
        'U1460/a' => [
          'Input Pin (or2x05)',
          '0.00135152',
          '[1.68 409.22]'
        ],
        'U196/c' => [
          'Input Pin (and3x10)',
          '0.00161077',
          '[1.131 401.15]'
        ]
      }
    }
  }
};
There is no error checking or validation, but I believe it would be easier to add that with this approach than with the "one big loop" method.
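As a rough illustration of why validation gets easier, here is a sketch of a check that runs on one parsed record, shaped like the part_one/part_two hashes in the output above. The sub name validate_record and the particular checks (numeric fields, pin counts matching the header, three fields per pin) are my assumptions about what might be worth checking, not something taken from the original spec.

```perl
use strict;
use warnings;

# Hypothetical validator: returns a list of error strings (empty if all is well).
sub validate_record {
    my ($net_id, $rec) = @_;
    my @errors;
    my $one = $rec->{part_one} || {};
    my $two = $rec->{part_two} || {};

    # Assumed required numeric fields in the first part.
    for my $key ('number of drivers', 'number of loads', 'pin capacitance') {
        if (!defined $one->{$key}) {
            push @errors, "$net_id: missing '$key'";
        }
        elsif ($one->{$key} !~ /^\d+(?:\.\d+)?$/) {
            push @errors, "$net_id: '$key' is not numeric: $one->{$key}";
        }
    }

    # The pin tables should agree with the counts given in the first part.
    my %count_key = (Driver => 'number of drivers', Load => 'number of loads');
    for my $kind (sort keys %count_key) {
        my $pins  = $two->{$kind} || {};
        my $want  = $one->{ $count_key{$kind} };
        my $found = keys %$pins;
        push @errors, "$net_id: expected $want $kind pin(s), found $found"
            if defined $want && $want =~ /^\d+$/ && $want != $found;
        for my $pin (sort keys %$pins) {
            my $n = @{ $pins->{$pin} };
            push @errors, "$net_id: pin $pin has $n field(s), expected 3"
                if $n != 3;
        }
    }
    return @errors;
}

my %good = (
    part_one => {
        'number of drivers' => '1',
        'number of loads'   => '2',
        'pin capacitance'   => '0.00458335',
    },
    part_two => {
        Driver => { 'U195/o' => [ 'Output Pin (invx20)', '0.00162106', '[1.12 409.88]' ] },
        Load   => {
            'U196/c'  => [ 'Input Pin (and3x10)', '0.00161077', '[1.131 401.15]' ],
            'U1460/a' => [ 'Input Pin (or2x05)',  '0.00135152', '[1.68 409.22]' ],
        },
    },
);

my @errors = validate_record('IR_REG_INST_INT[20]', \%good);
print "$_\n" for @errors;
print "ok\n" unless @errors;
```

Because the check only sees the hash, it could be called from the while loop right after process_record without touching any of the parsing subs.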

update
Redundant line at the beginning of the process_part_two sub commented out.