in reply to need to optimize my sub routine

The simplest speedup should be to move from reading line-by-line with parse () to using getline (). This will pay off even more when you allow binary data and/or embedded newlines. I did a small benchmark on my machine:

/home/merijn> cat test.pl
#!/pro/bin/perl

use strict;
use warnings;

use Benchmark qw( cmpthese );
use Text::CSV_XS;
use IO::Handle;

my $csv = Text::CSV_XS->new;
my @f;

sub diamond
{
    open my $io, "<", "test.csv" or die "test.csv: $!";
    while (<$io>) {
        $csv->parse ($_);
        @f = $csv->fields;
        }
    } # diamond

sub intern
{
    open my $io, "<", "test.csv" or die "test.csv: $!";
    while (my $row = $csv->getline ($io)) {
        @f = @$row;
        }
    } # intern

cmpthese (-5, {
    "diamond" => \&diamond,
    "getline" => \&intern,
    });
/home/merijn> wc -l test.csv
12000 test.csv
/home/merijn> perl test.pl
          Rate diamond getline
diamond 6.89/s      --    -39%
getline 11.3/s     64%      --
/home/merijn>

You can use the first field to do your after-matches.
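The idea above can be sketched as follows. This is a minimal, self-contained illustration of dispatching on the first field of each record; it uses plain split on in-memory data so it runs without Text::CSV_XS installed. In real code you would keep the $csv->getline ($io) loop and test $row->[0] the same way; the record types (STOP/START/ATTEMPT) are taken from the sub shown further down in this thread.

    #!/pro/bin/perl
    use strict;
    use warnings;

    # In-memory stand-in for a CSV file, so the sketch is runnable as-is
    my $data = join "\n",
        "STOP,a,b",
        "START,c,d",
        "NOISE,x,y",
        "ATTEMPT,e,f";

    my %count;
    open my $io, "<", \$data or die "in-memory open: $!";
    while (my $line = <$io>) {
        chomp $line;
        my @fields = split m/,/, $line;
        # Match on the first field only -- no need to scan the whole line
        if    ($fields[0] =~ m/^STOP/)    { $count{stop}++    }
        elsif ($fields[0] =~ m/^START/)   { $count{start}++   }
        elsif ($fields[0] =~ m/^ATTEMPT/) { $count{attempt}++ }
        else                              { next }  # skip unknown records
        }
    close $io;

    printf "stop=%d start=%d attempt=%d\n",
        @count{qw( stop start attempt )};
    # prints "stop=1 start=1 attempt=1"

With getline () the record is already split into fields for you, so the first-field test replaces a regex over the whole raw line.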


Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re^2: need to optimize my sub routine
by convenientstore (Pilgrim) on Feb 21, 2008 at 01:09 UTC
    hey, thanks, this is another challenge for me
    I will look at your code and figure out how to incorporate that into mine and let you know the results
    This is fantastic!!!!
Re^2: need to optimize my sub routine
by convenientstore (Pilgrim) on Feb 21, 2008 at 05:51 UTC
    This has definitely improved the time, but I lost some accuracy.
    Perhaps that's due to an inaccurate calculation in another sub, but this cut my time down to 115 sec. Amazing
    sub main {
        my $csv = Text::CSV_XS->new;
        for (@files) {
            open my $io, "<", "$directory/$_" or die "you suck\n";
            DOIT: while (my $it = $csv->getline ($io)) {
                my (%rec, %HoH);
                my $p;
                chomp;
                $t_counter++;
                my @fields = @$it;
                if ($fields[0] =~ /^STOP/) {
                    @rec{@attrs_sto} = @fields[0,1,13,14,16,20,33,34,36,67];
                    if ($rec{_i_pstn_trunk}) {
                        $p = extract($rec{_i_pstn_circuit}, $rec{_i_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    elsif ($rec{_e_pstn_trunk}) {
                        $p = extract($rec{_e_pstn_circuit}, $rec{_e_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    else {
                        next DOIT;
                        }
                    }
                elsif ($fields[0] =~ /^START/) {
                    @rec{@attrs_sta} = @fields[0,1,11,15,28,29,31,53];
                    if ($rec{_i_pstn_trunk}) {
                        $p = extract($rec{_i_pstn_circuit}, $rec{_i_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    elsif ($rec{_e_pstn_trunk}) {
                        $p = extract($rec{_e_pstn_circuit}, $rec{_e_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    else {
                        next DOIT;
                        }
                    }
                elsif ($fields[0] =~ /^ATTEMPT/) {
                    @rec{@attrs_att} = @fields[0,1,11,13,17,30,31,33,57];
                    if ($rec{_i_pstn_trunk}) {
                        $p = extract($rec{_i_pstn_circuit}, $rec{_i_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    elsif ($rec{_e_pstn_trunk}) {
                        $p = extract($rec{_e_pstn_circuit}, $rec{_e_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    else {
                        next DOIT;
                        }
                    }
                else {
                    next DOIT;
                    }
                push @data, \%HoH;
                }
            }
        }

      I don't know how well the optimizer works, but if you really need the last millisecond, don't get the fields out, but keep working with the reference. Also drop the chomp. You're not working with $_ anymore.

      DOIT: while (my $it = $csv->getline ($io)) {
          my (%rec, %HoH, $p);
          # chomp;             # No need to chomp anymore!
          $t_counter++;
          # my @fields = @$it; # Don't make a copy
          if ($it->[0] =~ /^STOP/) { # use the reference
              :
              }
          elsif ($it->[0] ...

      Enjoy, Have FUN! H.Merijn