in reply to need to optimize my sub routine

That subroutine has large amounts of cut&paste code that could be reduced, but mostly that wouldn't affect the performance.

Two changes that might improve it are:

  1. Three occurrences of $HoH{$p} =  {%rec}; could become $HoH{$p} =  \%rec; without affecting the program, but avoiding replication of existing hashes.
  2. One occurance of  push @data, {%HoH}; can become  push @data, \%HoH; for the same reason.

This is total speculation given the number of variables used within this sub but not declared there, so take it with a pinch of salt, but the subroutine may be reducable to something along the lines of:

sub main { my $csv = Text::CSV_XS->new; for( @files ) { open( NOW , "$directory/$_" ) || die "you suck\n"; while( <NOW> ) { chomp; my( %rec, %HoH, $p ); $t_counter++; my( $attrsRef, @selectedFields ) = /^STOP/ ? ( \@attrs_sto, 0,1,13,14,16,20,33,34,36, +67 ) : /^START/ ? ( \@attrs_sta, 0,1,11,15,28,29,31,53 + ) : /^ATTEMPT/ ? ( \@attrs_att, 0,1,11,13,17,30,31,33,57 + ) : $ncounter++; ; my %rec{ @$attrs_ref ) = @fields{ @selectedFields ); my $p = $rec{ _i_pstn_trunk } ? extract( $rec{ _i_pstn_circuit + }, $rec{ _i_pstn_trunk } ) : $rec{ _e_pstn_trunk } ? extract( $rec{ _e_pstn_circuit + }, $rec{ _e_pstn_trunk } ) : $ncounter++ ; $csv->parse( $_ ); my @fields = $csv->fields; push @data, \%HoH; } close NOW; } }

The main unknown is the significance of the variable $ncounter? (BTW: Naming a variable $somethingcounter is ... That it is a counter is obvious from it's usage, but one letter to signify what it is counting, when it is declared at a higher level, does not convey much information.

On similar lines, having arrays called @attrs_sto, @attrs_sta, and @attrs_att, presumably STOP, START and ATTEMPT, minimises the important information (to just one character sometimes) and maximises the less important common element.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Replies are listed 'Best First'.
Re^2: need to optimize my sub routine
by convenientstore (Pilgrim) on Feb 20, 2008 at 02:11 UTC
    hmm, did not see this post and very interesting
    I will definitly change them as well and try it
    btw, $HoH{$p} = {%rec} isn't same as $HoH{$p} = \%rec ?
      btw, $HoH{$p} = {%rec} isn't same as $HoH{$p} = \%rec ?

      No.

      The former creates a new anonymous hash by copying the contents of %rec into it, and then stores a reference to that anonymous hash into $HoH{$p}.

      The latter, just stores a reference to the existing %rec into $HoH{$p}.

      As %rec is declared within the while loop, you will get an empty hash each time the loop iterates and so there is no need to copy it's contents, just take a reference.

      No copying means less work, so more efficient. How much difference it makes will depend on how large the hashes are. On the evidence of the script, probably not much individually, but over many iterations it may be significant enough to be worth while.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I definitely like your code better
        I however, just took out the extra variable that I could just calculate later so I replaced it w/ next
        results were very good
        AutoLoader::__ANON__[/usr/lib/perl5/5.8.8/AutoLoader.pm:96] Total Elapsed Time = 150.9968 Seconds User+System Time = 150.9268 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 38.4 57.95 152.52 1 57.950 152.52 main::main 34.4 52.00 50.872 194654 0.0003 0.0003 Text::CSV_XS::Parse 14.7 22.25 21.102 194654 0.0001 0.0001 Text::CSV_XS::fields 14.4 21.82 70.386 194654 0.0001 0.0004 Text::CSV_XS::parse 2.79 4.207 3.086 187290 0.0000 0.0000 main::extract 0.04 0.060 0.079 4 0.0150 0.0198 main::BEGIN 0.01 0.010 0.010 10 0.0010 0.0010 constant::import 0.01 0.010 0.010 2 0.0050 0.0049 Exporter::import
        I also rearranged it according to yours and I got pretty good results as well
        I will run it few more times to see what's going on
        Also, your sample gave me good idea to get rid of some other cut+paste in the code.. Thanks!!!
        sub main { my $csv = Text::CSV_XS->new; for ( @files ) { open ( NOW , "$directory/$_" ) || die "you suck\n"; while ( <NOW> ) { chomp; my (%rec,%HoH, $p); my( $attrsRef, @selectedFields ); $csv->parse($_); my @fields = $csv->fields; ( $attrsRef, @selectedFields ) = /^STOP/ ? ( \@attrs_sto, 0,1,13,14,16,20,33 +,34,36,67 ) : /^START/ ? ( \@attrs_sta, 0,1,11,15,28,29,31 +,53 ) : /^ATTEMPT/ ? ( \@attrs_att, 0,1,11,13,17,30,31 +,33,57 ) : next; @rec{ @$attrsRef } = @fields[ @selectedFields ]; $p = $rec{ _i_pstn_trunk } ? extract( $rec{ _i_pstn_ci +rcuit }, $rec{ _i_pstn_trunk } ) : $rec{ _e_pstn_trunk } ? extract( $rec{ _e_pstn_ci +rcuit }, $rec{ _e_pstn_trunk } ) : next; ; $HoH{$p} = \%rec; push @data, \%HoH; } close NOW; } } Total Elapsed Time = 159.9763 Seconds User+System Time = 160.7492 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 40.2 64.66 160.58 1 64.659 160.58 main::main 34.8 56.01 54.882 194654 0.0003 0.0003 Text::CSV_XS::Parse 13.5 21.78 20.632 194654 0.0001 0.0001 Text::CSV_XS::fields 11.9 19.17 71.746 194654 0.0001 0.0004 Text::CSV_XS::parse 2.90 4.667 3.546 187290 0.0000 0.0000 main::extract 0.28 0.450 0.500 5 0.0901 0.1001 main::BEGIN 0.02 0.040 0.031 1546 0.0000 0.0000 diagnostics::unescape 0.01 0.010 0.010 1 0.0100 0.0099 Exporter::as_heavy 0.01 0.010 0.010 20 0.0005 0.0005 Getopt::Long::BEGIN 0.00 0.000 -0.000 6 0.0000 - strict::import 0.00 0.000 -0.000 3 0.0000 - Text::CSV_XS::BEGIN 0.00 0.000 -0.000 7 0.0000 - vars::import 0.00 0.000 -0.000 1 0.0000 - DynaLoader::bootstrap 0.00 0.000 -0.000 1 0.0000 - DynaLoader::dl_load_flags 0.00 0.000 -0.000 1 0.0000 - DynaLoader::dl_load_file