in reply to need to optimize my sub routine

The simplest speedup should be to move from reading line-by-line with parse () to using getline (). This will pay off even more when you allow binary data and/or embedded newlines. I did a small benchmark on my machine:

/home/merijn> cat test.pl
#!/pro/bin/perl

use strict;
use warnings;

use Benchmark qw( cmpthese );
use Text::CSV_XS;
use IO::Handle;

my $csv = Text::CSV_XS->new;
my @f;

sub diamond
{
    open my $io, "<", "test.csv" or die "test.csv: $!";
    while (<$io>) {
        $csv->parse ($_);
        @f = $csv->fields;
        }
    } # diamond

sub intern
{
    open my $io, "<", "test.csv" or die "test.csv: $!";
    while (my $row = $csv->getline ($io)) {
        @f = @$row;
        }
    } # intern

cmpthese (-5, {
    "diamond" => \&diamond,
    "getline" => \&intern,
    });
/home/merijn> wc -l test.csv
12000 test.csv
/home/merijn> perl test.pl
          Rate diamond getline
diamond 6.89/s      --    -39%
getline 11.3/s     64%      --
/home/merijn>

You can use the first field to do your after-matches.
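The idea above can be sketched as follows. This is a minimal, self-contained illustration of dispatching on the first field of each record; it uses plain split on in-memory data so it runs without Text::CSV_XS installed. In real code you would keep the $csv->getline ($io) loop and test $row->[0] the same way; the record types (STOP/START/ATTEMPT) are taken from the sub shown further down in this thread.

    #!/pro/bin/perl
    use strict;
    use warnings;

    # In-memory stand-in for a CSV file, so the sketch is runnable as-is
    my $data = join "\n",
        "STOP,a,b",
        "START,c,d",
        "NOISE,x,y",
        "ATTEMPT,e,f";

    my %count;
    open my $io, "<", \$data or die "in-memory open: $!";
    while (my $line = <$io>) {
        chomp $line;
        my @fields = split m/,/, $line;
        # Match on the first field only -- no need to scan the whole line
        if    ($fields[0] =~ m/^STOP/)    { $count{stop}++    }
        elsif ($fields[0] =~ m/^START/)   { $count{start}++   }
        elsif ($fields[0] =~ m/^ATTEMPT/) { $count{attempt}++ }
        else                              { next }  # skip unknown records
        }
    close $io;

    printf "stop=%d start=%d attempt=%d\n",
        @count{qw( stop start attempt )};
    # prints "stop=1 start=1 attempt=1"

With getline () the record is already split into fields for you, so the first-field test replaces a regex over the whole raw line.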


Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re^2: need to optimize my sub routine
by convenientstore (Pilgrim) on Feb 21, 2008 at 01:09 UTC
    hey, thanks, this is another challenge for me
    I will look at your code and figure out how to incorporate that into mine and let you know the results
    This is fantastic!!!!
Re^2: need to optimize my sub routine
by convenientstore (Pilgrim) on Feb 21, 2008 at 05:51 UTC
    This has definitely improved the time, but I lost some accuracy.
    Perhaps that's due to an inaccurate calculation in another sub, but this cut my time down to 115 sec. Amazing
    sub main {
        my $csv = Text::CSV_XS->new;
        for (@files) {
            open my $io, "<", "$directory/$_" or die "you suck\n";
            DOIT: while (my $it = $csv->getline ($io)) {
                my (%rec, %HoH);
                my $p;
                chomp;
                $t_counter++;
                my @fields = @$it;
                if ($fields[0] =~ /^STOP/) {
                    @rec{@attrs_sto} = @fields[0,1,13,14,16,20,33,34,36,67];
                    if ($rec{_i_pstn_trunk}) {
                        $p = extract($rec{_i_pstn_circuit}, $rec{_i_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    elsif ($rec{_e_pstn_trunk}) {
                        $p = extract($rec{_e_pstn_circuit}, $rec{_e_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    else {
                        next DOIT;
                        }
                    }
                elsif ($fields[0] =~ /^START/) {
                    @rec{@attrs_sta} = @fields[0,1,11,15,28,29,31,53];
                    if ($rec{_i_pstn_trunk}) {
                        $p = extract($rec{_i_pstn_circuit}, $rec{_i_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    elsif ($rec{_e_pstn_trunk}) {
                        $p = extract($rec{_e_pstn_circuit}, $rec{_e_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    else {
                        next DOIT;
                        }
                    }
                elsif ($fields[0] =~ /^ATTEMPT/) {
                    @rec{@attrs_att} = @fields[0,1,11,13,17,30,31,33,57];
                    if ($rec{_i_pstn_trunk}) {
                        $p = extract($rec{_i_pstn_circuit}, $rec{_i_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    elsif ($rec{_e_pstn_trunk}) {
                        $p = extract($rec{_e_pstn_circuit}, $rec{_e_pstn_trunk});
                        $HoH{$p} = \%rec;
                        }
                    else {
                        next DOIT;
                        }
                    }
                else {
                    next DOIT;
                    }
                push @data, \%HoH;
                }
            }
        }

      I don't know how well the optimizer works, but if you really need the last millisecond, don't get the fields out, but keep working with the reference. Also drop the chomp. You're not working with $_ anymore.

      DOIT: while (my $it = $csv->getline ($io)) {
          my (%rec, %HoH, $p);
          # chomp;             # No need to chomp anymore!
          $t_counter++;
          # my @fields = @$it; # Don't make a copy
          if ($it->[0] =~ /^STOP/) { # use the reference
              :
              }
          elsif ($it->[0] ...

      Enjoy, Have FUN! H.Merijn