
This:

    if (($TotalNets == 50000)   || ($TotalNets == 100000)  ||
        ($TotalNets == 250000)  || ($TotalNets == 500000)  ||
        ($TotalNets == 1000000) || ($TotalNets == 1500000) ||
        ($TotalNets == 2000000) || ($TotalNets == 3000000)) {

should be done in parallel, i.e., write the current net count to a FIFO or to shared memory and display the totals from another process. Inside the read loop, do only the work that is strictly necessary to process the net records. Alternatively, read the file N lines at a time:

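    use constant N => 10_000;    # hypothetical batch size: lines read between reports

    # Assumes $fh is the open report file and $Some_Total is your running total.
    until (eof $fh) {
        for (1 .. N) {
            my $line = <$fh>;
            last unless defined $line;    # a final line of "0" is still data
            # ... process the net record in $line here ...
        }
        print "$Some_Total\n";    # reported once per batch, not once per line
    }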
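The first suggestion, letting another process display the totals, might look roughly like this. It is only a sketch: it assumes a Unix-like system where fork() works, it uses an anonymous pipe in place of a named FIFO or shared memory, the milestone list is copied from the if-chain above, and the counting loop is a stand-in for the real read loop.

```perl
use strict;
use warnings;

# Milestones copied from the original if-chain.
my %is_milestone = map { $_ => 1 }
    (50_000, 100_000, 250_000, 500_000,
     1_000_000, 1_500_000, 2_000_000, 3_000_000);

# An anonymous pipe stands in for the FIFO: the parser writes, the reporter reads.
pipe(my $from_parser, my $to_reporter) or die "pipe failed: $!";
my $pid = fork() // die "fork failed: $!";

if ($pid == 0) {
    # Child (reporter): all progress logic lives here, outside the read loop.
    close $to_reporter;
    while (my $count = <$from_parser>) {
        chomp $count;
        print "Processed $count nets\n" if $is_milestone{$count};
    }
    exit 0;
}

# Parent (parser): per-net work only, then hand the running count off.
close $from_parser;
my $TotalNets = 0;
for (1 .. 120_000) {              # stand-in for the real read loop
    $TotalNets++;
    print {$to_reporter} "$TotalNets\n";
}
close $to_reporter;               # EOF lets the reporter's loop finish
waitpid($pid, 0);
my $reporter_status = $?;
```

Each print to the pipe is buffered, but it is still work done once per net, so it is worth timing this against the original if-chain before adopting it.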

You're processing every token three times here:

    if ($_ =~ /wire capacitance/) {
        if ($_ =~ /^\s+wire capacitance\:\s+\d.*\d\s*$/) {
            ($NetCapRaw) = $_ =~ /^\s+wire capacitance\:\s+(\d.*\d)\s*$/;

Instead, replace the token on the first pass, or capture the remainder of the string and pass that to another regex.
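A simpler variant of the same idea is to let a single match both test the line and capture the value, so each line is scanned once. A minimal sketch; the sample lines are made up and the real report layout may differ.

```perl
use strict;
use warnings;

# Made-up sample lines; the real report layout may differ.
my @lines = (
    "  wire capacitance: 0.0421\n",
    "  wire capacitance: n/a\n",
    "  pin capacitance: 0.0100\n",
);

my @caps;
for (@lines) {
    # One match both tests the line and captures the value.
    # A list assignment in boolean context is true only if the regex matched.
    if (my ($NetCapRaw) = /^\s+wire capacitance:\s+(\d.*\d)\s*$/) {
        push @caps, $NetCapRaw;
    }
}
print "@caps\n";
```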

Actually, I like the idea of treating the whole file as input to a multi-pass interpreter: first tokenize the file, replacing each token with a code ref and each constant with an object that returns that constant, then execute the result.
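One way to read that idea, as a sketch only: a first pass turns each recognised line into a pair of a handler code ref and a closure that returns the line's constant (the closure stands in for the "object that returns a constant"), and a second pass just runs the list. The keywords, handlers and sample lines are all hypothetical.

```perl
use strict;
use warnings;

my %totals = (cap => 0, fanout => 0);

# Hypothetical keywords; each maps to the code ref that handles it.
my %handler = (
    'wire capacitance' => sub { $totals{cap}    += $_[0] },
    'fanout'           => sub { $totals{fanout} += $_[0] },
);

my @lines = (
    "  wire capacitance: 1.5\n",
    "  fanout: 3\n",
    "  comment: ignore me\n",
    "  wire capacitance: 2.5\n",
);

# Pass 1: tokenize. Each recognised line becomes [handler, constant closure].
my @program;
for my $line (@lines) {
    my ($keyword, $value) = $line =~ /^\s*([^:]+?)\s*:\s*(\S+)\s*$/ or next;
    my $code = $handler{$keyword} or next;    # unknown keywords are dropped
    push @program, [ $code, sub { $value } ];
}

# Pass 2: execute the tokenized program; no regexes run here.
$_->[0]->( $_->[1]->() ) for @program;

print "cap=$totals{cap} fanout=$totals{fanout}\n";
```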

What does this do?

    if (($DriverForwardSlashCount == 0) && ($NetNameForwardSlashCount == 0)) {
        $AddToCustomTable = 1;

There are four copies of this and they all just set $AddToCustomTable to the same value. Isn't the following the same thing?

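    # (a | b) <= 1 holds exactly when both non-negative counts are 0 or 1;
    # the parentheses matter because <= binds more tightly than | in Perl.
    $AddToCustomTable = 1 if (($DriverForwardSlashCount | $NetNameForwardSlashCount) <= 1);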
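A quick brute-force check of that one-liner, assuming both counts are non-negative integers (bitwise | on negative numbers behaves differently). It compares the parenthesised bitwise form with an explicit "both counts are 0 or 1" test, and also shows that leaving the parentheses off gives a different result.

```perl
use strict;
use warnings;

# Assumes both counts are non-negative integers, as slash counts are.
my ($mismatch_paren, $mismatch_bare) = (0, 0);
for my $driver (0 .. 4) {
    for my $netname (0 .. 4) {
        my $explicit = ($driver <= 1 && $netname <= 1) ? 1 : 0;
        my $paren    = (($driver | $netname) <= 1)     ? 1 : 0;

        # Deliberately wrong: Perl parses this as $driver | ($netname <= 1).
        no warnings 'precedence';
        my $bare     = ($driver | $netname <= 1)       ? 1 : 0;

        $mismatch_paren++ if $paren != $explicit;
        $mismatch_bare++  if $bare  != $explicit;
    }
}
print "parenthesised: $mismatch_paren mismatches, bare: $mismatch_bare mismatches\n";
```

The parenthesised form reports no mismatches. Without the `no warnings 'precedence'` line, `use warnings` flags the bare form at compile time with "Possible precedence problem on bitwise | operator".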

There are two time eaters in the code: reading the file and executing the regexes. I would try to separate those. Read the file in and split the fields, generating a hash of tokens and data (note that this is similar to the parsing idea above), then process the hash for your data. This may look like extra work, but when you refactor code like this you often see optimizations you wouldn't spot with everything mashed into one loop as it is now.
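A minimal sketch of that split, assuming a simple "token: value" line format (the report text below is made up, and an in-memory filehandle stands in for the real file). Phase 1 only reads and splits; phase 2 does the per-token processing.

```perl
use strict;
use warnings;

# Hypothetical report excerpt; real field names and layout will differ.
my $report = <<'END';
  wire capacitance: 1.5
  fanout: 3
  wire capacitance: 2.5
END

# Phase 1: read and split only. No per-field decisions are made here.
open my $fh, '<', \$report or die "Cannot open in-memory file: $!";
my %fields;    # token => list of raw values, in file order
while (my $line = <$fh>) {
    my ($token, $value) = $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/ or next;
    push @{ $fields{$token} }, $value;
}
close $fh;

# Phase 2: process the collected data, one token type at a time.
my $total_cap = 0;
$total_cap += $_ for @{ $fields{'wire capacitance'} || [] };

my $max_fanout = 0;
for (@{ $fields{fanout} || [] }) {
    $max_fanout = $_ if $_ > $max_fanout;
}

printf "total cap %.1f, max fanout %d\n", $total_cap, $max_fanout;
```

In a real report you would probably key the hash by net name as well, so that each net's fields stay together.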

s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}

In reply to Re: Looking for ways to speed up the parsing of a file... by starbolin
in thread Looking for ways to speed up the parsing of a file... by fiddler42
