Can this code be optimized further?

samy_kumar has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Can this code be optimized further? by dragonchild (Archbishop) on Feb 10, 2005 at 13:56 UTC
There's a few improvements that can be made. `use strict; my @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); my %values; foreach (@temp) { /^([ab])_(.)/ && do { push @{$values{$1}}, $2; next; }; die "'$_' doesn't match the pattern of ^[ab]_.\n"; } print "$_ => @{$values{$_}}\n" for qw( a b );` [download] Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.	[reply] [d/l]
Re: Can this code be optimized further? by eXile (Priest) on Feb 10, 2005 at 14:33 UTC
why not use only grep?: `my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my @a_arr = grep { s/^a_(.)/$1/} @temp; my @b_arr = grep { s/^b_(.)/$1/} @temp;` [download] PS: using @a and @b should be OK, but $a and $b have special meaning (in 'sort'), so I try to never use 'a' and 'b' as variable names.	[reply] [d/l]
Re: Can this code be optimized further? by BrowserUk (Patriarch) on Feb 10, 2005 at 14:53 UTC
`#!/usr/bin/perl -w use strict; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my (@a, @b) ; m[^([ab])_(.*)$] and push @{$1 eq 'a' ? \@a : \@b}, $2 for @temp; print "@a <==> @b";` [download] Examine what is said, not who speaks. Silence betokens consent. Love the truth but pardon error.	[reply] [d/l]
Re: Can this code be optimized further? by Fletch (Bishop) on Feb 10, 2005 at 14:06 UTC
Another way. `## I'd personally use qw( ) here, but this is probalby just sample dat +a my @temp = ("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4 +"); my( @a, @b ); my %arrays; eval qq{\$arrays{ "$_" } = \\\@$_} for qw( a b ); for( @temp ) { die "Invalid prefix on element '$_'\n" unless /^([ab])_(.)/; push @{ $arrays{ $1 } }, $2; }` [download] Update:* Gah, left off a couple of `\\` to get a reference in the eval.	[reply] [d/l]
Re^2: Can this code be optimized further? by dragonchild (Archbishop) on Feb 10, 2005 at 14:19 UTC
`eval qq{\$arrays{ "$_" } = \\\@$_} for qw( a b );` ?!?!? Am I the only one that thinks unncessary evals are just ugly? While I commend you for trying to keep the surrounding code unchanged (by creating @a and @b), I think the better solution is to improve the surrounding code by making sure the appropriate data structures are used - in this case, a HoA is appropriate. Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.	[reply] [d/l]
Re^3: Can this code be optimized further? by Fletch (Bishop) on Feb 10, 2005 at 14:26 UTC
If you can think of a better way to get a reference to an arbitrary lexical without `eval` sure. The original question was how to get things into arrays `@a` and `@b`; this gets things into the lexical arrays. Granted they may really need a HoA to begin with, but this technique (or the equivalent `$arrays{ $_ } = do { no strict 'refs'; \@{ $_ } }` for a package variable) sometimes is useful.	[reply] [d/l] [select]
Re^2: Can this code be optimized further? by Roy Johnson (Monsignor) on Feb 10, 2005 at 15:16 UTC
You can use a hash to reroute things to the lexical arrays so you don't have to eval: `my (@a, @b); my %router = (a => \@a, b => \@b); foreach my $value (@temp) { push @{$router{$1}}, $2 if $value =~ /([ab])_(.)/; } print "@a <==> @b";` [download] Caution:* Contents may have been coded under pressure.	[reply] [d/l]
Re^3: Can this code be optimized further? by Fletch (Bishop) on Feb 10, 2005 at 15:34 UTC
And if you had 10 arrays you'd want to keep that hash up to date? 20? 30? (Of course not, you'd use a HoA and not screw with this in the first place. :) That being said, in this specific case of just a and b I'd personally have probably used something like BrowserUK's `@{ $1 eq 'a' ? \@a : \@b}` below; but the eval's more flexible if you have a large number of destinations (then again with a large number of destinations you'd probably want an HoA). Of course at the moment I'd be more likely to just use `Enumerable#partition` and be done with it, but that's another language all together.	[reply] [d/l]
Re^4: Can this code be optimized further? by Roy Johnson (Monsignor) on Feb 10, 2005 at 16:43 UTC
Re: Can this code be optimized further? by Anonymous Monk on Feb 10, 2005 at 14:23 UTC
I wouldn't use push. If @temp is large (which assume it is, if it isn't there's no need to bother with optimizing it), repeated pushes means repeated mallocing the array needed for @a and @b as it extends. I'd use: `my @temp = (...); my @a = map {/a_(.)/ ? $1 : ()} @temp; my @b = map {/b_(.)/ ? $1 : ()} @temp;` [download] If you expect to not have many matches, that is, most elements in @temp don't contain a_ or b_, I'd try to see whether the simpler regex is enough gain to have both the map and the grep: `my @a = map {/a_(.)/; $1} grep /a_/ @temp; my @b = map {/b_(.)/; $1} grep /b_/ @temp;` [download]	[reply] [d/l] [select]
Re: Can this code be optimized further? by phaylon (Curate) on Feb 10, 2005 at 14:41 UTC
This works w/o pcre's and more than a and b, though it stores in a hash. `#!/usr/bin/perl use warnings; use strict; my @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); my %parts; push @{ $parts{ substr $_, 0, 1 }}, substr $_, 2 foreach @temp; use Data::Dumper; print Dumper \%parts;` [download]	[reply] [d/l]
Re^2: Can this code be optimized further? by Anonymous Monk on Feb 10, 2005 at 17:56 UTC
This works w/o pcre's I assume that by "pcre" you mean Perl Compatible Regular Expressions. Now -- putting aside all questions of how compatible those regexes really are -- I'd like to mention that when we're using Perl, we don't have to call them "Perl Compatible". We just call them "Regular Expressions". ;-P	[reply]
Re^3: Can this code be optimized further? by phaylon (Curate) on Feb 10, 2005 at 18:34 UTC
Yep, but I often talk to people not involved, so it's just a habit. Besides, I find it more aesthetic than "re's" or "RegExp's" or something :)	[reply]
Re: Can this code be optimized further? by cog (Parson) on Feb 10, 2005 at 14:49 UTC
Use the Benchmark module to try out all these answers :-)	[reply]
Re: Can this code be optimized further? by rir (Vicar) on Feb 10, 2005 at 20:22 UTC
`my ( @a, @b); { no strict "refs"; push @{substr( $_,0,1)}, substr($_,2) foreach ( @temp) }` [download] There is not much point in avoiding sym-refs here. Be well, rir	[reply] [d/l]
Re: Can this code be optimized further? by runrig (Abbot) on Feb 10, 2005 at 16:58 UTC
As long as everyone's just chiming in with different ways, if you don't mind modifying the original array: `for (@temp) { push @a, $_ if s/^a_//; push @b, $_ if s/^b_//; }` [download] You could even modify this to use a HoA-type solution, e.g.(update: fixed code..still untested): `my %hoa; s/^([a-z])_// and push @{$hoa{$1}}, $_ for @temp;` [download]	[reply] [d/l] [select]
Re: Can this code be optimized further? by holli (Abbot) on Feb 10, 2005 at 18:56 UTC
This is not an optimization, but for the sake of TIMTOWTDI: `#!/usr/bin/perl -w use strict; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my (@a, @b) ; my %dispatch = ( "a" => sub { push @a, shift }, "b" => sub { push @b, shift }, ); foreach my $value (@temp) { if ( $value =~ /^([ab])_(.+)$/ ) { my $sub = $dispatch{$1}; &$sub($2); } } print "@a <==> @b\n";` [download] holli, /regexed monk/	[reply] [d/l]
Re: Can this code be optimized further? by RazorbladeBidet (Friar) on Feb 10, 2005 at 15:23 UTC
#!/usr/bin/perl -w use strict; use Time::HiRes qw( gettimeofday tv_interval); my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my @startTime = gettimeofday; for ( 1..10000 ) { my (@a, @b) ; foreach my $value (@temp) { push @a, $1 if ($value =~ /a_(.)/) ; push @b, $1 if ($value =~ /b_(.)/) ; } } print "Time: ".tv_interval( \@startTime )."\n"; @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); @startTime = gettimeofday; for ( 1..10000 ) { my %values; foreach (@temp) { /^([ab])_(.)/ && do { push @{$values{$1}}, $2; next; }; die "'$_' doesn't match the pattern of ^[ab]_.\n"; } } print "Time: ".tv_interval( \@startTime )."\n"; @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); @startTime = gettimeofday; for ( 1..10000 ) { my (@a, @b); foreach (@temp) { my $prefix = substr( $_, 0, 2 ); if ( $prefix eq 'a_' ) { push @a, substr( $_, 2 ); } elsif ( $prefix eq 'b_' ) { push @b, substr( $_, 2 ); } } } print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my( @a, @b ); my %arrays; eval qq{\$arrays{ "$_" } = \\\@$_} for qw( a b ); for( @temp ) { die "Invalid prefix on element '$_'\n" unless /^([ab])_(.)/; push @{ $arrays{ $1 } }, $2; } } print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my @a = map {/a_(.)/ ? $1 : ()} @temp; my @b = map {/b_(.)/ ? $1 : ()} @temp; } print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my @a_arr = grep { s/^a_(.)/$1/} @temp; my @b_arr = grep { s/^b_(.)/$1/} @temp; } print "Time: ".tv_interval( \@startTime )."\n"; # GAVE ME substr outside of string at ./testme2.pl line 88. #@startTime = gettimeofday; #for ( 1..10000 ) { # my %parts; # push @{ $parts{ substr $_, 0, 1 }}, substr $_, 2 # foreach @temp; #} #print "Time: ".tv_interval( \@startTime )."\n"; @startTime = gettimeofday; for ( 1..10000 ) { my (@a, @b) ; m[^([ab])_(.)$] and push @{$1 eq 'a' ? \@a : \@b}, $2 for @temp; } print "Time: ".tv_interval( \@startTime )."\n"; Output: Time: 0.575739 Time: 0.717803 Time: 0.46604 Time: 1.731715 Time: 0.777471 Time: 0.462416 Time: 0.103439 [download] I'd go with BrowserUK's code :) Update!!!: Somewhere along the way the @temp was being affected, hence the warnings on phaylon's code. Here's the updated output w/ mine removed and phaylon's added (also a little more descriptive): `Baseline Time: 0.584196 Regexp w/ Hash Time: 0.7385 Eval Time: 1.774581 Map Time: 0.76179 Grep Time: 0.456991 Hash w/ substr Time: 0.411059 Regexp w/ eq Time: 0.683154 Router Time: 0.668033` [download] Hence, the HoA using substrings would be the most flexible and quickest (albeit a bit risky). Grep is a close second with the eval being 3x as long.	[reply] [d/l] [select]
Re^2: Can this code be optimized further? by eric256 (Parson) on Feb 10, 2005 at 16:10 UTC
You should use the benchmark module to do benchmarking. Hopefully I got these right. #!/usr/bin/perl -w use strict; use Benchmark qw/cmpthese/; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; # samy_kumar sub original { my (@a, @b) ; foreach my $value (@temp) { push @a, $1 if ($value =~ /a_(.)/) ; push @b, $1 if ($value =~ /b_(.)/) ; } } # dragonchild sub hash_style { my %values; foreach (@temp) { /^([ab])_(.)/ && do { push @{$values{$1}}, $2; next; }; } } # roy jonhson sub router_style { my (@a, @b); my %router = (a => \@a, b => \@b); foreach my $value (@temp) { push @{$router{$1}}, $2 if $value =~ /([ab])_(.)/; } } sub switch_style { my (@a, @b); foreach (@temp) { my $prefix = substr( $_, 0, 2 ); if ( $prefix eq 'a_' ) { push @a, substr( $_, 2 ); } elsif ( $prefix eq 'b_' ) { push @b, substr( $_, 2 ); } } } sub map_style { my @a = map {/a_(.)/ ? $1 : ()} @temp; my @b = map {/b_(.)/ ? $1 : ()} @temp; } sub grep_style { my @a_arr = grep { s/^a_(.)/$1/} @temp; my @b_arr = grep { s/^b_(.)/$1/} @temp; } sub grep_map_style { my @a = map {/a_(.)/; $1} grep /a_/, @temp; my @b = map {/b_(.)/; $1} grep /b_/, @temp; } sub trinary { my (@a, @b) ; m[^([ab])_(.*)$] and push @{$1 eq 'a' ? \@a : \@b}, $2 for @temp; } cmpthese( 100_000, { "Original" => \&original, "Hash" => \&hash_style, "Router" => \&router_style, "Switch" => \&switch_style, "Map" => \&map_style, "Grep" => \&grep_style, "Grep + Map" => \&grep_map_style, "Trinary" => \&trinary }); __DATA__ C:\test>perl 429768.pl Rate Grep Switch Map Router Original Grep + Map Ha +sh Trinary Grep 32489/s -- -67% -70% -78% -81% -83% -8 +5% -87% Switch 98425/s 203% -- -9% -34% -42% -49% -5 +5% -60% Map 108460/s 234% 10% -- -27% -36% -44% -5 +1% -56% Router 149031/s 359% 51% 37% -- -12% -23% -3 +2% -40% Original 168634/s 419% 71% 55% 13% -- -13% -2 +3% -32% Grep + Map 194175/s 498% 97% 79% 30% 15% -- -1 +2% -21% Hash 220264/s 578% 124% 103% 48% 31% 13% +-- -11% Trinary 246914/s 660% 151% 128% 66% 46% 27% 1 +2% -- C:\test>perl 429768.pl Rate Grep Switch Map Router Original Grep + Map Ha +sh Trinary Grep 32658/s -- -66% -69% -78% -81% -83% -8 +5% -86% Switch 95602/s 193% -- -10% -36% -43% -49% -5 +5% -60% Map 106610/s 226% 12% -- -28% -37% -43% -5 +0% -55% Router 148810/s 356% 56% 40% -- -12% -21% -3 +0% -37% Original 168350/s 415% 76% 58% 13% -- -11% -2 +1% -29% Grep + Map 188324/s 477% 97% 77% 27% 12% -- -1 +2% -21% Hash 213220/s 553% 123% 100% 43% 27% 13% +-- -10% Trinary 236967/s 626% 148% 122% 59% 41% 26% 1 +1% -- C:\test>perl 429768.pl Rate Grep Switch Map Router Original Grep + Map Ha +sh Trinary Grep 32819/s -- -66% -69% -78% -81% -82% -8 +6% -87% Switch 96899/s 195% -- -9% -35% -42% -47% -5 +8% -61% Map 106724/s 225% 10% -- -28% -37% -42% -5 +3% -57% Router 148810/s 353% 54% 39% -- -12% -19% -3 +5% -40% Original 168350/s 413% 74% 58% 13% -- -8% -2 +6% -32% Grep + Map 182815/s 457% 89% 71% 23% 9% -- -2 +0% -26% Hash 228833/s 597% 136% 114% 54% 36% 25% +-- -7% Trinary 246305/s 650% 154% 131% 66% 46% 35% +8% -- C:\test> [download] ___________ Eric Hodges	[reply] [d/l]
Re^3: Can this code be optimized further? by Roy Johnson (Monsignor) on Feb 10, 2005 at 18:08 UTC
I added a testing mode to the code, so you can verify that each sub yields proper results. When running, if you pass a n argument to the program, it will run a test instead of a benchmark. Benchmark runs for three seconds instead of 100_000 iterations, now. I also tweaked a few of the routines, which made substantial differences in their runtimes. And I dumped a couple of uninteresting subs. The code is in the readmore. Read more... (3 kB) My results: Rate Grep Trinary Hash Map Original New_Map New_Grep Tr +i_Substr Switch Tri_Substr2 Grep 969/s -- -30% -33% -37% -43% -50% -56% + -68% -71% -76% Trinary 1379/s 42% -- -4% -10% -19% -29% -37% + -55% -59% -66% Hash 1442/s 49% 5% -- -6% -15% -26% -34% + -53% -57% -64% Map 1528/s 58% 11% 6% -- -10% -21% -31% + -50% -54% -62% Original 1702/s 76% 23% 18% 11% -- -12% -23% + -44% -49% -57% New_Map 1937/s 100% 40% 34% 27% 14% -- -12% + -36% -42% -52% New_Grep 2202/s 127% 60% 53% 44% 29% 14% -- + -28% -34% -45% Tri_Substr 3050/s 215% 121% 111% 100% 79% 57% 39% + -- -9% -24% Switch 3353/s 246% 143% 132% 119% 97% 73% 52% + 10% -- -16% Tri_Substr2 4000/s 313% 190% 177% 162% 135% 106% 82% + 31% 19% -- [download] Tri_Substr2 is effectively a synthesis of switch and trinary. You can see the differences little design changes make. Using references slows things down. Matching and substituting back in instead of just deleting makes a big difference for grep vs. new_grep. Caution: Contents may have been coded under pressure.	[reply] [d/l] [select]
Re^4: Can this code be optimized further? by dragonchild (Archbishop) on Feb 10, 2005 at 18:35 UTC
Re^5: Can this code be optimized further? by Roy Johnson (Monsignor) on Feb 10, 2005 at 18:52 UTC
Re^3: Can this code be optimized further? by RazorbladeBidet (Friar) on Feb 10, 2005 at 16:18 UTC
Redefine @temp before each algorithm or pass it in as an argument. I see that Benchmark has a different output format, but the results should be close to the same, no?	[reply]
Re^4: Can this code be optimized further? by eric256 (Parson) on Feb 10, 2005 at 16:33 UTC
Re^5: Can this code be optimized further? by RazorbladeBidet (Friar) on Feb 10, 2005 at 16:37 UTC
Re: Can this code be optimized further? by TedPride (Priest) on Feb 10, 2005 at 16:15 UTC
I don't think regex is needed here, and you really should make the code somewhat more forgiving to data variations. What if you use all letters of the alphabet instead of just a and b? `use strict; use warnings; my @temp=("a_1","b_1","a_2","a_3","a_4","b_2","a_5","b_3","a_6","b_4") +; my ($key, $val, %hash); for (@temp) { ($key, $val) = split '_'; if (exists($hash{$key})) { push @{$hash{$key}}, $val; } else { $hash{$key} = [$val]; } } for (sort keys %hash) { print "$_: " . join(' ', @{$hash{$_}}) . "\n"; }` [download] You could probably optimize this further by allocating the entire expected space for each array when initially created, but I'll leave that for the next guy.	[reply] [d/l]
Re: Can this code be optimized further? by jdporter (Paladin) on Feb 10, 2005 at 20:58 UTC
This solution is more golfish than fast. (Anyone care to benchmark it?) `eval join '', map /(.)_(.)/ ? "push \@$1,'$2';" : (), @temp;` [download] Here's another which is shorter but perhaps slower, since it does many evals instead of one: `/(.)_(.)/ && eval "push \@$1,'$2'" for @temp;` [download]	[reply] [d/l] [select]
Re: Can this code be optimized further? by ikegami (Patriarch) on Feb 10, 2005 at 18:23 UTC
Another way that works from a string instead of from an array: `use strict; use warnings; my @temp = qw( a_1 b_1 a_2 a_3 a_4 b_2 a_5 b_3 a_6 b_4 ); local $_ = join(' ', @temp); my @a = /a_(\d+)/g; # or: my @a = /\ba_(\d+)\b/g; my @b = /b_(\d+)/g; # or: my @b = /\bb_(\d+)\b/g; $, = ", "; $\ = "\n"; print(@a); print(@b); __END__ output ====== 1, 2, 3, 4, 5, 6 1, 2, 3, 4` [download]	[reply] [d/l]