rr27 has asked for the wisdom of the Perl Monks concerning the following question:

I have to parse a formatted file of size 34 MB, but it is taking a long time: about 30 minutes. Can you please suggest changes to increase the speed?

#!usr/local/bin/perl
@timeData=localtime(time);
print "@timeData\n";
open FILE,"/usr/rr/xaa";

### points ###### 1. create tables
$count=0;
$ban_gate=0;
$path_gate=0;
$ras_gate=0;
$ldc_gate=0;
my @col;
my @data;
my $adjust_flag=0;

sub brace_rem {
    my($p)= @_;
    chop $p;
    $p=reverse $p;
    chop $p;
    $q= reverse $p;
    return "\'$q\'";
}

########### sqlite perl ###########
use DBI;
my $driver = "SQLite";
my $database = "test1.db";
my $dsn = "DBI:$driver:dbname=$database";
my $userid = "";
my $password = "";
$dbh = DBI->connect($dsn, $userid, $password, { RaiseError => 1 })
    or die $DBI::errstr;

sub table_create {
    my($tbname,@col)=@_;
    my $c;
    foreach $c(@col) {
        $c=brace_rem($c);
    }
    my @upcol=join(',',@col);
    my $sth=$dbh->prepare("CREATE TABLE $tbname (@upcol)");
    my $rv = $sth->execute() or die $DBI::errstr;
    if($rv < 0){
        print $DBI::errstr;
    }
    else {
        # print "Table $tbname created";
    }
}

sub insert_data {
    my $d;
    my($tbname,@data)=@_;
    foreach $d(@data) {
        $d=brace_rem($d);
    }
    my @updata=join(',',@data);
    my $stmt = qq(INSERT INTO $tbname VALUES(@updata););
    my $rv = $dbh->do($stmt) or die $DBI::errstr;
}

while(<FILE>) {
    ############# BANNER #################
    if ($ban_gate==1) {
        if($_=~/END_BANNER/) {
            $ban_gate=0;
            table_create('FileInfo',@col);
            insert_data(FileInfo,@data);
        }
        else {
            @parts=/\{.*?\}/g;
            push(@col,$parts[0]);
            push(@data,$parts[1]);
        }
    }
    else { ### bangate=0
        if($_=~/^BANNER/) {
            $ban_gate=1;
        }
        else { ## else of banner
            if($ras_gate==1) {
                if($_=~/END_REQ_CLC/ || $_=~/END_ARR_CLC/ || $_=~/END_SLK_CLC/) {
                    $ras_gate=0;
                    $tbname="$ras_token$pathno";
                    table_create($tbname,@col);
                    insert_data($tbname,@data);
                }
                else { ## else of re with rasgate=1
                    @parts=/\{.*?\}/g;
                    push(@col,$parts[1]);
                    push(@data,$parts[2]);
                }
            }
            else { ### else of rasgate=0
                if($_=~/(\s{2})REQ_CLC/ || $_=~/\s{2}SLK_CLC/ || $_=~/\s{2}ARR_CLC/) {
                    $ras_gate=1;
                    while (@col) {pop(@col);}
                    while (@data) {pop(@data);}
                    $ras_token=$_;
                    chomp $ras_token;
                }
                else { ###### else of rasgate=0 and not req
                    if ($ldc_gate==1) {
                        if ($_=~/END_LAUNCH_CLK_PATH/ || $_=~/END_DATA_PATH/ || $_=~/END_CAP_CLK_PATH/) {
                            $ldc_gate=0;
                        }
                        else { ### else of rasgate=0 and not req and not endlaunch, ldcgate=1
                            if($_=~/INST/ || $_=~/\s{4}HPIN/ || $_=~/\s{4}NET/ || $_=~/\s{4}PORT/ ) {
                                @parts=/\{.*?\}/g;
                                while (@data) {pop(@data);}
                                if ($flag==1 && $_=~/INST/) {
                                    for($i=0;$i<$#parts-2;++$i) {
                                        push(@data,$parts[$i]);
                                    }
                                    push(@data,"$parts[$i]$parts[$i+1]");
                                    push(@data,$parts[$i+2]);
                                }
                                else {
                                    foreach $l (@parts) {
                                        push(@data,$l);
                                    }
                                }
                                insert_data($tbname,@data);
                            }
                            else { ### else of inst net, rasgate=0
                                if($_=~/COLUMNS/) {
                                    @parts=/\{.*?\}/g;
                                    foreach $l (@parts) {
                                        if($l=~/adjustment/) {$flag=1; }
                                        else {$flag=0;}
                                        push(@col,$l);
                                    }
                                    $tbname="$ldc_token$pathno";
                                    table_create($tbname,@col);
                                }
                            }
                        }
                    }
                    else { ### ldcgate=0
                        if($_=~/(\s{2})LAUNCH_CLK_PATH/ || $_=~/\s{2}DATA_PATH/ || $_=~/\s{2}CAP_CLK_PATH/) {
                            $ldc_gate=1;
                            $ldc_token=$_;
                            while (@col) {pop(@col);}
                            chomp $ldc_token;
                            if($_=~/DATA_PATH/) { chop $ldc_token;}
                        }
                    }
                }
            }
            if($_=~/^PATH/) {
                @parts = split(/\s+/,$_);
                $pathno=$parts[1];
                print "$pathno\n";
            }
        }
    }
}

@timeData=localtime(time);
print "@timeData\n";
$dbh->disconnect();

Re: Perl script speed
by davido (Cardinal) on May 30, 2014 at 05:00 UTC

    Just tell us what the input looks like, how you want it processed and stored in the database. I think that would be faster than wading through 162 lines of code containing if/else if statements that go four layers deep in places.

    I can suggest that there is often room for improvement in working with the database by preparing statements once, and executing multiple times, rather than doing $dbh->do(...). Also, in the case of SQLite at least, transactions that span multiple inserts or updates will often mitigate the problem of being IO bound, since the transactions are often held in memory until they're committed. Re: Challenge: 8 Letters, Most Words demonstrates this technique.
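
    For example, a rough sketch of that pattern (the FileInfo layout and the sample rows here are invented just to show the shape of "prepare once, execute many, inside one transaction"; the database name is reused from the OP's script):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Sketch only: the table layout and @rows stand in for the real parsed data.
    my $dbh = DBI->connect("DBI:SQLite:dbname=test1.db", "", "",
                           { RaiseError => 1, AutoCommit => 0 });

    $dbh->do("CREATE TABLE IF NOT EXISTS FileInfo (name TEXT, value TEXT)");

    my @rows = ( [ 'Module', 'xyz' ], [ 'Timing', 'EARLY' ] );   # stand-in parsed data

    my $sth = $dbh->prepare("INSERT INTO FileInfo (name, value) VALUES (?, ?)");
    $sth->execute(@$_) for @rows;   # no SQL re-parse and no per-row commit
    $dbh->commit;                   # one commit for the whole batch
    $dbh->disconnect;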


    Dave

      Input file format is something like this:

      VERSION {1.0}
      PTDEF {instance} {pin} {cell} {edge} {clock_edge} {clock} {phase}
      BANNER
        {Module} {xyz}
        {Timing} {EARLY}
        {Slew Propagation} {WORST}
        {Operating Condition} {W_125_0.99_0.99}
        {PVT Mode} {min}
        {Tree Type} {balanced}
        {Process} {3.0000}
        {Voltage} {0.9900}
        {Temperature} {125.0000}
        {time unit} {1.0000 ps}
        {capacitance unit} {1.0000 fF}
        {resistance unit} {1.0000 kOhm}
        {TOOL} {v13.22-s020_1 ((64bit) 11/20/2013 12:28 (Linux 2.6))}
        {DATE} {May 16, 2014}
      END_BANNER
      PATH 1
        VIEW func_STRONG_LT_MINC
        CHECK_TYPE {Removal Check}
        REF {xsd_gen/xdiv_sd/DLSB/reg2_reg} {clk}
        ENDPT {xsd_gen/xdiv_sd/DLSB/reg2_reg} {clrz} {dffcqs_f1_fs_dh} {^} {leading} {ADPLLS_CLKOUT} {ADPLLS_CLKOUT(C)(P)(func_STRONG_LT_MINC)*}
        BEGINPT {xsd_gen/div_code_u_reg_reg_1_} {q} {sdffpqs_f2_dh_xsvt} {^} {leading} {DIVSD} {DIVSD(D)(P)(func_STRONG_LT_MINC)*}
        REQ_CLC
          {} {Other End Arrival Time} {307.3350}
          {+} {Removal} {61.1000}
          {+} {Phase Shift} {0.0000}
          {-} {CPPR Adjustment} {11.9700}
          {-} {Cycle Adjustment} {0.0000}
          {+} {Uncertainty} {20.0000}
          {=} {Required Time} {376.4650}
        END_REQ_CLC
        SLK_CLC
          {} {Arrival Time} {580.6149}
          {} {Slack Time} {204.1499}
        END_SLK_CLC
        SLK 204.1499
        ARR_CLC
          {} {Clock Rise Edge} {0.0000}
          {+} {Drive Adjustment} {1.3300}
          {=} {Beginpoint Arrival Time} {1.3300}
        END_ARR_CLC
        LAUNCH_CLK_PATH
          COLUMNS {instance} {fpin} {fedge} {tpin} {tedge} {net} {cell} {delay} {incr_delay} {slew} {load} {arrival} {required} {stolen} {fanout} {pin_location} {adjustment}
          PORT {} {adplls_clk} {^} {} {} {adplls_clk} {} {} {} {6.6000} {2.0255} {1.3300} {-202.8199} {} {1} {(0.00, 0.00) } {}
          NET {} {} {} {} {} {adplls_clk} {} {0.4750} {0.0000} {6.6000} {2.0255} {1.8050} {-202.3449} {} {} {} {}
          INST {dft_mux_adplls1_clk/tiboxv_clk_mx2_mux_0} {a} {^} {y} {^} {} {ctmux2_f4} {40.4700} {0.0000} {17.9000} {} {42.2750} {-161.8749} {} {6} {(16.47, 7.88) } {(16.11, 7.58)} {}
          NET {} {} {} {} {} {mx_fref_adplls_clk} {} {1.1400} {0.0000} {18.1000} {17.3511} {43.4150} {-160.7349} {} {} {} {}
          INST {dft_core_clock_leaker/func_clk_in_gate_dt/tiboxv_clk_icg_icg_0} {clkin} {^} {clkout} {^} {} {icg_f4} {26.4100} {0.0000} {7.8000} {} {69.8250} {-134.3249} {} {1} {(66.33, 12.83) } {(65.43, 12.53)} {}
          NET {} {} {} {} {} {dft_core_clock_leaker/func_clk_in_gate} {} {0.0950} {0.0000} {7.8000} {2.4991} {69.9200} {-134.2299} {} {} {} {}
          INST {dft_core_clock_leaker/func_clk_mux_dt/tiboxv_clk_mx2_mux_0} {a} {^} {y} {^} {} {ctmux2_f4} {43.7950} {0.0000} {21.9000} {} {113.7150} {-90.4349} {} {9} {(64.22, 13.72) } {(64.58, 14.03)} {}
          NET {} {} {} {} {} {cleaker_clk} {} {1.4250} {0.0000} {22.1000} {21.7218} {115.1400} {-89.0099} {} {} {} {}
          INST {icg_clk_983mhz/tiboxv_clk_icg_icg_0} {clkin} {^} {clkout} {^} {} {icg_f4} {85.5000} {0.0000} {103.6000} {} {200.6400} {-3.5099} {} {137} {(68.58, 8.78) } {(67.68, 9.08)} {}
          NET {} {} {} {} {} {clk_2ghz} {} {9.8800} {0.0000} {105.1000} {174.7581} {210.5200} {6.3701} {} {} {} {}
          INST {xsd_gen/xdiv_sd/U0_clkin_gate0/tiboxh_clk_or2_or_0} {a} {^} {y} {^} {} {ctor2_b4} {44.9350} {0.0000} {13.3000} {} {255.4550} {51.3051} {} {1} {(48.20, 31.88) } {(47.12, 31.73)} {}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/clkin_0_} {} {0.2850} {0.0000} {13.3000} {2.6745} {255.7400} {51.5901} {} {} {} {}
          INST {xsd_gen/xdiv_sd/U0_clkin0_buf/tiboxh_clk_buf_buf_0} {a} {^} {y} {^} {} {ctb_f4} {21.9450} {0.0000} {9.7000} {} {277.6850} {73.5351} {} {5} {(47.07, 30.53) } {(48.33, 30.53)} {}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/clkout_gdc_gated} {} {0.3800} {0.0000} {9.7000} {10.3982} {278.0650} {73.9151} {} {} {} {}
          INST {xsd_gen/xdiv_sd/DLSB/reg4_reg} {clk} {^} {q} {^} {} {dffcqs_f1_fs_dh} {94.0500} {0.0000} {51.7000} {} {372.1150} {167.9651} {} {4} {(52.61, 40.12) } {(53.19, 41.27)} {DIVSD Adj. = 0.0000}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/clkout_gdc} {} {3.8950} {0.0000} {51.8000} {11.1143} {376.0100} {171.8600} {} {} {} {}
          INST {xsd_gen/xdiv_sd/clkout_mux2/tiboxh_clk_mx2_mux_0} {b} {^} {y} {^} {} {ctmux2_f4} {64.1250} {0.0000} {34.2000} {} {440.1350} {235.9850} {} {8} {(54.36, 46.27) } {(53.28, 45.98)} {}
          NET {} {} {} {} {} {clkgen_obs_clk} {} {4.1800} {0.0000} {34.5000} {47.5230} {444.3150} {240.1650} {} {} {} {}
        END_LAUNCH_CLK_PATH
        DATA_PATH
          COLUMNS {instance} {fpin} {fedge} {tpin} {tedge} {net} {cell} {delay} {incr_delay} {slew} {load} {arrival} {required} {stolen} {fanout} {pin_location} {adjustment}
          INST {xsd_gen/div_code_u_reg_reg_1_} {clk} {^} {q} {^} {} {sdffpqs_f2_dh_xsvt} {39.3000} {0.0000} {16.0000} {} {483.6150} {279.4650} {} {4} {(59.22, 41.18) } {(60.66, 41.18)} {}
          NET {} {} {} {} {} {xsd_gen/div_code_u_reg_1_} {} {0.6000} {0.0000} {16.0000} {4.8318} {484.2150} {280.0650} {} {} {} {}
          INST {xsd_gen/xdiv_sd/g397} {a} {^} {y} {v} {} {inv_f1_xsvt} {16.6000} {0.0000} {12.5000} {} {500.8150} {296.6650} {} {3} {(56.97, 39.23) } {(57.15, 39.08)} {}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/n_16} {} {0.5000} {0.0000} {12.5000} {3.5621} {501.3150} {297.1650} {} {} {} {}
          INST {xsd_gen/xdiv_sd/g384} {b} {v} {y} {^} {} {nand3_f0p33_xsvt} {17.0000} {0.0000} {10.6000} {} {518.3149} {314.1650} {} {1} {(56.72, 37.73) } {(56.84, 37.43)} {}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/n_7} {} {0.1000} {0.0000} {10.6000} {0.8820} {518.4149} {314.2650} {} {} {} {}
          INST {xsd_gen/xdiv_sd/g375} {c2} {^} {y} {^} {} {aoa112_f1_xsvt} {62.2000} {0.0000} {12.2000} {} {580.6149} {376.4650} {} {1} {(54.86, 37.73) } {(53.95, 37.58)} {}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/n_25} {} {0.0000} {0.0000} {12.2000} {1.2445} {580.6149} {376.4650} {} {} {} {}
        END_DATA_PATH
        OTHER_ARR_CLC
          {} {Clock Rise Edge} {0.0000}
          {+} {Drive Adjustment} {1.4700}
          {=} {Beginpoint Arrival Time} {1.4700}
        END_OTHER_ARR_CLC
        CAP_CLK_PATH
          COLUMNS {instance} {fpin} {fedge} {tpin} {tedge} {net} {cell} {delay} {incr_delay} {slew} {load} {arrival} {required} {stolen} {fanout} {pin_location} {adjustment}
          PORT {} {adplls_clk} {^} {} {} {adplls_clk} {} {} {} {6.6000} {2.1483} {1.4700} {205.6199} {} {1} {(0.00, 0.00) } {}
          NET {} {} {} {} {} {adplls_clk} {} {0.5250} {0.0000} {6.6000} {2.1483} {1.9950} {206.1449} {} {} {} {}
          INST {dft_mux_adplls1_clk/tiboxv_clk_mx2_mux_0} {a} {^} {y} {^} {} {ctmux2_f4} {44.7300} {0.0000} {17.9000} {} {46.7250} {250.8749} {} {6} {(16.47, 7.88) } {(16.11, 7.58)} {}
          NET {} {} {} {} {} {mx_fref_adplls_clk} {} {1.2600} {0.0000} {18.1000} {17.5437} {47.9850} {252.1349} {} {} {} {}
          INST {dft_core_clock_leaker/func_clk_in_gate_dt/tiboxv_clk_icg_icg_0} {clkin} {^} {clkout} {^} {} {icg_f4} {29.1900} {0.0000} {7.8000} {} {77.1750} {281.3249} {} {1} {(66.33, 12.83) } {(65.43, 12.53)} {}
          NET {} {} {} {} {} {dft_core_clock_leaker/func_clk_in_gate} {} {0.1050} {0.0000} {7.8000} {2.6219} {77.2800} {281.4299} {} {} {} {}
          INST {dft_core_clock_leaker/func_clk_mux_dt/tiboxv_clk_mx2_mux_0} {a} {^} {y} {^} {} {ctmux2_f4} {48.4050} {0.0000} {21.9000} {} {125.6850} {329.8349} {} {9} {(64.22, 13.72) } {(64.58, 14.03)} {}
          NET {} {} {} {} {} {cleaker_clk} {} {1.5750} {0.0000} {22.1000} {22.3134} {127.2600} {331.4099} {} {} {} {}
          INST {icg_clk_983mhz/tiboxv_clk_icg_icg_0} {clkin} {^} {clkout} {^} {} {icg_f4} {94.5000} {0.0000} {103.7000} {} {221.7600} {425.9099} {} {137} {(68.58, 8.78) } {(67.68, 9.08)} {}
          NET {} {} {} {} {} {clk_2ghz} {} {10.9200} {0.0000} {105.1000} {182.0893} {232.6800} {436.8299} {} {} {} {}
          INST {xsd_gen/xdiv_sd/U0_clkin_gate0/tiboxh_clk_or2_or_0} {a} {^} {y} {^} {} {ctor2_b4} {49.6650} {0.0000} {13.3000} {} {282.3450} {486.4949} {} {1} {(48.20, 31.88) } {(47.12, 31.73)} {}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/clkin_0_} {} {0.3150} {0.0000} {13.3000} {2.8831} {282.6600} {486.8099} {} {} {} {}
          INST {xsd_gen/xdiv_sd/U0_clkin0_buf/tiboxh_clk_buf_buf_0} {a} {^} {y} {^} {} {ctb_f4} {24.2550} {0.0000} {9.7000} {} {306.9150} {511.0649} {} {5} {(47.07, 30.53) } {(48.33, 30.53)} {}
          NET {} {} {} {} {} {xsd_gen/xdiv_sd/clkout_gdc_gated} {} {0.4200} {0.0000} {9.7000} {10.7594} {307.3350} {511.4849} {} {} {} {}
        END_CAP_CLK_PATH
      END_PATH 1
        You'd be wise to completely refactor the code and to separate the logic for parsing and processing.

        ATM this is a maintenance nightmare (sorry)

        parser

        This "format" seems to have a strict grammar:

        • lines have <indentation> <capitalized KEYWORD> <datafields>
        • the <datafields> are either space separated or {grouped by curlies}
        • if lines are nested, the level is closed by an END_<keyword> line

        So write a _generic_ parser (which as a bonus implicitly checks the file for correctness) by doing the following (a minimal sketch follows the list):

        • parsing each line with a regex,
        • keeping record of nesting levels
        • putting parsed data into a hash of hashes
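
        A minimal sketch of that idea (which keywords open a nested level, and the tiny __DATA__ sample, are assumptions made here for illustration; in the real file the level-opening keywords are the ones with a matching END_ line):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        # assumption for this sketch: these keywords open a nested level
        my %opens_level = map { $_ => 1 }
            qw(BANNER PATH REQ_CLC SLK_CLC ARR_CLC OTHER_ARR_CLC
               LAUNCH_CLK_PATH DATA_PATH CAP_CLK_PATH);

        my %tree;
        my @stack = (\%tree);    # path of nested hashes, root first

        while (my $line = <DATA>) {
            chomp $line;
            next unless $line =~ /\S/;

            # <indentation> <capitalized KEYWORD> <datafields>, or a bare data line
            my ($kw, $rest) = $line =~ /^\s*([A-Z][A-Z_0-9]*)\s*(.*)/
                            ? ($1, $2)
                            : ('_data', $line);

            if ($kw =~ /^END_/) {          # END_<keyword> closes the current level
                pop @stack;
            }
            elsif ($opens_level{$kw}) {    # descend into a new level
                my $node = { _args => $rest };
                push @{ $stack[-1]{$kw} }, $node;
                push @stack, $node;
            }
            else {                         # data line: {curly} fields or space separated
                my @fields = $rest =~ /\{([^}]*)\}/g;
                @fields = split ' ', $rest unless @fields;
                push @{ $stack[-1]{$kw} }, \@fields;
            }
        }

        print Dumper(\%tree);

        __DATA__
        BANNER
        {Module} {xyz}
        END_BANNER
        PATH 1
          SLK_CLC
          {} {Slack Time} {204.1499}
          END_SLK_CLC
        END_PATH 1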

        If this parser works correctly you can start measuring the _speed_ for your huge file.

        Now you are able to tell whether the DB operations are the bottleneck (most likely they are).

        processing

        You should try to bundle many DB operations, because each one carries overhead.

        If the complete HoH after parsing the whole file is too large, you must define "rules" which hook into the parsing.

        This can be done by checking after each line for suitable callbacks to evaluate your temporary data so far.

        Define a callback hash, with entries $action{KEYWORD} holding subs to be called when a keyword is processed.
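
        For example (the handler subs here are empty placeholders, not your real processing):

        #!/usr/bin/perl
        use strict;
        use warnings;

        sub handle_banner  { print "banner starts: $_[0]" }
        sub handle_columns { print "columns line:  $_[0]" }

        my %action = (
            BANNER  => \&handle_banner,
            COLUMNS => \&handle_columns,
        );

        while (my $line = <DATA>) {
            my ($kw) = $line =~ /^\s*([A-Z_]\w*)/ or next;
            $action{$kw}->($line) if exists $action{$kw};   # evaluate the data gathered so far
        }

        __DATA__
        BANNER
        {Module} {xyz}
        END_BANNER
        COLUMNS {instance} {fpin}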

        HTH! :)

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        After this I might then have to ask what the output you are trying to get will look like.
        Please could you look at How do I post a question effectively?

        If you tell me, I'll forget.
        If you show me, I'll remember.
        If you involve me, I'll understand.
        --- Author unknown to me
Re: Perl script speed
by hippo (Archbishop) on May 30, 2014 at 08:31 UTC

    In any optimisation task the first step is to profile the code. I guess that you have not yet done this (otherwise, why would you not have shared the results with us?).

    Fortunately perl has a plethora of profiling options. The first port of call is the recommended Devel::NYTProf. Alternatively you could try the minimalist Devel::FastProf.
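
    For example, a typical NYTProf run (the script name here is just a placeholder) is:

    perl -d:NYTProf yourscript.pl   # writes profile data to ./nytprof.out
    nytprofhtml                     # turns ./nytprof.out into an HTML report under ./nytprof/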

    Once you've profiled it you will be in a better position to know where the problem lies.

    In the meantime I would suggest breaking out your big nest of if/elses into separate subroutines for ease of both profiling and general maintenance.

      As is evident, this is the first time I am writing Perl with SQLite, hence the inefficiency. I have done the profiling; the major time consumers were: 1. DBI::st::execute, 2. DBD::SQLite::st::_prepare, 3. DBI::_setup_handle. Please suggest ways to reduce the time.

        Try switching AutoCommit off and then use $dbh->commit() after, say, every 1000 inserts, adjusting the number to get the best result.
        This node might help: Loading bulk data into SQLite
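
        A rough sketch of that batching pattern (the in-memory database, table layout and dummy rows here are made up; only the commit-every-N idea is the point):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect("DBI:SQLite:dbname=:memory:", "", "",
                               { RaiseError => 1, AutoCommit => 0 });   # AutoCommit off

        $dbh->do("CREATE TABLE t (k TEXT, v TEXT)");
        my $sth = $dbh->prepare("INSERT INTO t VALUES (?, ?)");

        my @rows = map { [ "key$_", $_ ] } 1 .. 2500;   # pretend parser output
        my $n = 0;
        for my $row (@rows) {
            $sth->execute(@$row);
            $dbh->commit unless ++$n % 1000;   # commit every 1000 inserts; tune this number
        }
        $dbh->commit;                          # flush the final partial batch
        $dbh->disconnect;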

        poj
        I would have expected that parsing a 34 MB file would not take very much time (at least not anything near 30 minutes) and that the DB inserts would likely be the big time consumers (but it is of course much better to check it with profiling tools). Well, davido has suggested some possible improvements here.
Re: Perl script speed
by roboticus (Chancellor) on May 30, 2014 at 21:48 UTC

    rr27:

    Just a couple simple things:

    • while (@col) { pop(@col); } is better written as @col = ();
    • Similarly, foreach $l (@parts) { push( @data, $l ); } is better written as push @data, @parts;
    • And using elsif can simplify your code, turning:
      if (cond1) {
          ...stuff...
      }
      else {
          #nothing here
          if (cond2) {
              ...more stuff...
          }
          #nothing here, either
      }

      into:

      if (cond1) {
          ...stuff...
      }
      elsif (cond2) {
          ...more stuff...
      }

    Finally, your $xx_gate variables are all doing a similar task. You're basically doing a simple state-based parser where you're using the gate variables to track which section you happen to be in. If I were doing it, I'd use a single variable, and use text values to make the code a little easier to read. Additionally, since you're sitting in a loop and using mutually-exclusive if statements to do one small chunk each time through the loop, you can use the next statement to avoid some else statements. So where you have code something like:

    while (<FILE>) {
        if ($ban_gate==1) {
            if ($_=~/END_BANNER/) {
                # found end of section, do stuff and turn off section processing
                $ban_gate=0;
            }
            else {
                # process a line in the banner section
            }
        }
        else {
            if ($ras_gate==1) {
                if ($_=~/END_xxxx/) {
                    # found end of section, do stuff and turn off section processing
                    $ras_gate=0;
                }
            }
            else {
                # process a line in the ras section
            }
        }
    }

    I'd write it more like:

    while (<FILE>) {
        if ($state eq 'BANNER') {
            if ($_=~/END_BANNER/) {
                # found end of section, do stuff and turn off section processing
                $state='-none-';
                next;
            }
            # process a line in the banner section
        }
        elsif ($state eq 'RAS') {
            if ($_=~/END_xxxx/) {
                # found end of section, do stuff and turn off section processing
                $state='-none-';
                next;
            }
            # process a line in the ras section
        }
    }

    ...that's enough for now.

    Note: while I strongly suggest using the first three items, the rest are more opinion and style and you should take them with a grain of salt.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Perl script speed
by GrandFather (Saint) on May 30, 2014 at 21:35 UTC

    The following code does (as far as I can tell) substantially what you want, except that instead of creating a plethora of tables containing a single row, it generates a table per "signal" and adds a path number column. 4000 paths generates about a 30 MB "file" and runs in under 30 seconds.

    #!usr/local/bin/perl
    use strict;
    use warnings;

    use DBI;

    my $fileStr = do {local $/; <DATA>};
    my ($header, $meat) = $fileStr =~ /(.*)^PATH 1(.*)END_PATH 1.*/sm;
    my $dbName = 'test1.sqlite';
    my $paths = 4000;

    $fileStr = join '', $header, map {"PATH $_${meat}END_PATH $_\n"} 1 .. $paths;

    unlink $dbName;
    open my $fin, '<', \$fileStr;

    my $startTick = time ();

    run($fin, $dbName);

    my $seconds = time () - $startTick;
    print "Took $seconds for $paths paths\n";

    sub run {
        my ($fin, $dbName) = @_;
        my $self = bless {fin => $fin};
        my $dsn = "DBI:SQLite:dbname=$dbName";
        my $userid = "";
        my $password = "";

        $self->{dbh} = DBI->connect($dsn, $userid, $password, {RaiseError => 1})
            or die $DBI::errstr;

        my %dispatch = (
            VERSION => sub { },
            PTDEF   => sub { },
            BANNER  => \&banner,
            PATH    => \&path,
        );

        $self->{dbh}->do('BEGIN TRANSACTION;');

        while (defined (my $line = <$fin>)) {
            my ($key, $tail) = $line =~ /^(\w+)\s*(.*)/ or next;

            die "Parser doesn't understand $line" if !exists $dispatch{$key};
            $dispatch{$key}($self, $key, $tail);
        }

        $self->{dbh}->do('COMMIT TRANSACTION;');
    }

    sub banner {
        my ($self, $key, $tail) = @_;
        my $fin = $self->{fin};
        my @cols;
        my @values;

        while (defined (my $line = <$fin>)) {
            last if $line =~ /END_$key/;

            my ($key, $value) = $line =~ /\{([^}]+)\}/g or next;

            push @cols,   $key;
            push @values, $value;
        }

        $self->table_create('FileInfo', @cols);
        $self->insertRow('FileInfo', @values);
    }

    sub path {
        my ($self, $key, $tail) = @_;
        my $fin = $self->{fin};
        my %dispatch = (
            REQ_CLC         => \&rasGate,
            SLK_CLC         => \&rasGate,
            ARR_CLC         => \&rasGate,
            LAUNCH_CLK_PATH => \&ldcGate,
            DATA_PATH       => \&ldcGate,
            CAP_CLK_PATH    => \&ldcGate,
        );

        $self->{pathNum} = $tail;
        print "Processing path $tail\n";

        while (defined (my $line = <$fin>)) {
            last if $line =~ /END_$key/;

            my ($key, $tail) = $line =~ /^\s+(\w+)\s*(.*)/ or next;

            warn "Parser doesn't understand $line", next if !exists $dispatch{$key};
            $dispatch{$key}($self, $key, $tail);
        }
    }

    sub rasGate {
        my ($self, $key, $tail) = @_;
        my $fin = $self->{fin};
        my @cols;
        my @values;
        my $rasToken;

        while (defined (my $line = <$fin>)) {
            last if $line =~ /END_$key/;

            my (undef, $key, $value) = $line =~ /\{([^}]+)\}/g or next;

            push @cols,   $key;
            push @values, $value;
        }

        $self->table_create($key, 'Path', @cols);
        $self->insertRow($key, $self->{pathNum}, @values);
    }

    sub ldcGate {
        my ($self, $key, $tail) = @_;
        my $fin = $self->{fin};
        my %dispatch = (
            COLUMNS => \&ldcColumns,
            INST    => \&instLine,
            HPIN    => \&ldcLine,
            NET     => \&ldcLine,
            PORT    => \&ldcLine,
        );

        $self->{ldcKey} = $key;
        $self->{rows}   = [];

        while (defined (my $line = <$fin>)) {
            last if $line =~ /END_$key/;

            my ($key, $tail) = $line =~ /^\s*(\w+)\s*(.*)/ or next;

            die "Parser doesn't understand $line" if !exists $dispatch{$key};
            $dispatch{$key}($self, $key, $tail);
        }

        $self->{ldcSth}->execute($self->{pathNum}, @$_) for @{$self->{rows}};
    }

    sub ldcLine {
        my ($self, $key, $tail) = @_;

        push @{$self->{rows}}, [map {defined ($_) ? $_ : ''} $tail =~ /\{([^}]*)\}/g];
    }

    sub instLine {
        my ($self, $key, $tail) = @_;
        my @values = map {defined ($_) ? $_ : ''} $tail =~ /\{([^}]*)\}/g;

        $values[-3] = "$values[-3]$values[-2]";
        splice @values, $#values - 1, 1;
        push @{$self->{rows}}, \@values;
    }

    sub ldcColumns {
        my ($self, $key, $tail) = @_;
        my @cols = $tail =~ /\{([^}]+)\}/g;

        $self->table_create($self->{ldcKey}, 'Path', @cols);

        my $places = join ', ', ('?') x @cols;
        my $stmt = qq(INSERT INTO '$self->{ldcKey}' VALUES(?, $places););
        $self->{ldcSth} = $self->{dbh}->prepare($stmt) or die $DBI::errstr;
    }

    sub table_create {
        my ($self, $tbname, @cols) = @_;
        my $columns = "'" . join ("', '", @cols) . "'";

        return if $self->{haveTable}{$tbname}++;

        my $sth = $self->{dbh}->prepare("CREATE TABLE $tbname ($columns)");
        my $rv = $sth->execute() or die $DBI::errstr;

        if ($rv < 0) {
            print $DBI::errstr;
        }
        else {
            # print "Table $tbname created";
        }
    }

    sub insertRow {
        my ($self, $tbname, @data) = @_;
        my $places = join ', ', ('?') x @data;
        my $stmt = qq(INSERT INTO $tbname VALUES($places););
        my $sth = $self->{dbh}->prepare($stmt) or die $DBI::errstr;

        $sth->execute(@data) or die $DBI::errstr;
    }

    Tack the following __DATA__ to the end of the script before running it, or alter the code to use an external file.

    Perl is the programming world's equivalent of English

      I think the regex in rasGate should be

      my (undef, $key, $value) = $line =~ /\{([^}]*)\}/g or next; # * not +

      to parse the empty brackets {} correctly; otherwise you get a column called 0.0000 in the table.

      ARR_CLC
        {} {Clock Rise Edge} {0.0000}
        {+} {Drive Adjustment} {1.3300}
        {=} {Beginpoint Arrival Time} {1.3300}
      END_ARR_CLC
      poj
      Thanks a lot!! But I am facing an error:
      DBD::SQLite::st execute failed: called with 16 variables when 17 are needed
      at this line of ldcGate:
      $self->{ldcSth}->execute($self->{pathNum}, @$_) for @{$self->{rows}};
      Please can you explain how to solve this error.

        Am I right in thinking that some of the lines only have 16 columns, i.e. no adjustment?

        COLUMNS {instance} {fpin} {fedge} {tpin} {tedge} {net} {cell} {delay} {incr_delay} {slew} {load} {arrival} {required} {stolen} {fanout} {pin_location} {adjustment}
        Update:
        I ask because in your code you have these parts, which I guess deal with the extra column by concatenating two together:
        if ($l=~/adjustment/) { $flag=1; }
        and
        if ($flag==1 && $_=~/INST/) {
            for($i=0;$i<$#parts-2;++$i) {
                push(@data,$parts[$i]);
            }
            push(@data,"$parts[$i]$parts[$i+1]");
            push(@data,$parts[$i+2]);
        }
        else {
            foreach $l (@parts) {
                push(@data,$l);
            }
        }

        poj
Re: Perl script speed
by perlfan (Parson) on May 30, 2014 at 12:20 UTC
    This code has a complexity of O(n^2) at least since you have some while statements in there. Don't nest loops and you might go faster. I would suggest multiple passes if necessary over nesting loops.

      The "nested while loops" are a non-issue in terms of the overall performance of the code. They are a really inefficient way of emptying arrays, but in the context of the database I/O they are totally insignificant. The code complexity is O(n).

      There are nested for loops too, which are more interesting, but they effectively add a fixed computational load per line, so again they have no effect on the complexity.

      Perl is the programming world's equivalent of English