Re: How to transpose lines faster?
by pryrt (Abbot) on Feb 01, 2018 at 22:03 UTC
|
I would have used with_substr() (in the spoiler), but your chop-based is faster during my test. However, if the character '0' (ASCII 48) is in your possible alphabet, chop() won't give the right answer
benchmark:
#!/usr/bin/env perl -l
use warnings;
use strict;
use Benchmark qw/timethese cmpthese/;
use Test::More;
my @az = ('a'..'z', 'A'..'Z', '1'..'9'); # note: if '0' is in your
+alphabet, chop() will fail
my @golden_original;
my $N = 200;
for( 1 .. $N ) {
push @golden_original, join '', map { $az[rand @az] } 1..$N;
}
#print length($str), qq(\t"$str");
#print scalar @golden_original;
#diag explain org => \@gold
+en_original;
my $wch = [with_chop (@golden_original)]; #diag explain wch => $wch;
my $wsp = [with_split (@golden_original)]; #diag explain wsp => $wsp;
my $wsu = [with_substr(@golden_original)]; #diag explain wsu => $wsu;
is_deeply( $wsp, $wch, 'split vs chop');
is_deeply( $wsu, $wch, 'substr vs chop');
sub with_split {
my @original = @_;
my @transposed;
for( @original ) {
my $i = 0;
$transposed[$i++] .= $_ for split //;
}
return @transposed;
}
sub with_chop {
my @original = @_;
my @transposed;
for( map $_ = reverse, @original ){
my $i = 0;
$transposed[ $i ++ ] .= chop while $_;
}
return @transposed;
}
sub with_substr {
my @original = @_;
my $w = scalar @original;
my @transposed = ( ' 'x$w ) x $w;
for my $i ( 0 .. $#original ) {
for my $j ( 0 .. $i ) {
substr($transposed[$i], $j, 1) = substr($original[$j], $i,
+ 1);
substr($transposed[$j], $i, 1) = substr($original[$i], $j,
+ 1);
}
}
return @transposed;
}
my $r = timethese( 1000, {
with_substr => sub { with_substr( @golden_original ); },
with_split => sub { with_split ( @golden_original ); },
with_chop => sub { with_chop ( @golden_original ); },
});
cmpthese $r;
done_testing();
with_chop: 5 wallclock secs ( 5.19 usr + 0.00 sys = 5.19 CPU) @ 19
+2.79/s (n=1000)
with_split: 13 wallclock secs (13.14 usr + 0.02 sys = 13.16 CPU) @ 76
+.01/s (n=1000)
with_substr: 7 wallclock secs ( 6.89 usr + 0.00 sys = 6.89 CPU) @ 1
+45.14/s (n=1000)
Rate with_split with_substr with_chop
with_split 76.0/s -- -48% -61%
with_substr 145/s 91% -- -25%
with_chop 193/s 154% 33% --
I am no good with pack/unpack, otherwise, I would have tested LanX's suggestion, too. From what I remember in past benchmarks, that pair usually beats substr.
update: chop will fail because you're testing the truthiness of the character it returns. '0', like 0, is false.
update 2: hmm, no, you should be testing the truthiness of the whole string. When I had '0' in my alphabet, it dropped one or more '0's from the last row of @$wch... Oh, right, if the last character in one of the strings in @original ends in one or more zeroes, then it will evaluate to false, rather than your expected true. So if '0' might be in your alphabet, you should probably test the if length($_) instead.
update 3: I meant while length $_...
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] [select] |
|
|
Thank you, pryrt, for nice analysis, and thanks for finding a bug with '0'.
| [reply] |
Re: How to transpose lines faster?
by AnomalousMonk (Archbishop) on Feb 01, 2018 at 22:50 UTC
|
Both ways are non-destructive (after using map in second).
FWIW: I don't understand what "after using map in second" means, but the second OPed method is destructive (i.e., changes @orig array):
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my @orig = qw(
abcde
fghij
klmno
pqrst
uvwxy
);
;;
my @transposed;
for (map $_ = reverse, @orig) {
my $i = 0;
$transposed[ $i ++ ] .= chop while $_;
}
dd \@transposed;
dd \@orig;
"
["afkpu", "bglqv", "chmrw", "dinsx", "ejoty"]
["edcba", "jihgf", "onmlk", "tsrqp", "yxwvu"]
But changing the map expression to
for (map scalar(reverse), @orig) { ... }
fixes this.
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
Re: How to transpose lines faster?
by LanX (Saint) on Feb 01, 2018 at 20:50 UTC
|
| [reply] |
Re: How to transpose lines faster?
by BrowserUk (Patriarch) on Feb 02, 2018 at 00:38 UTC
|
for my $i (0 .. length( $in[0] ) - 1 ) {
push @out, join '', map substr( $_,$i,1 ), @in;
}
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
Suck that fhit
| [reply] [d/l] |
|
|
On my PC, 2e3 x 2e3 matrices:
$transposed[ $i ++ ] .= chop while $_ (incorrect): 0.6s
$transposed[ $i ++ ] .= chop while length (correct) (by pryrt): 0.75s
$transposed[ $i ++ ] .= chop while /./ (assuming that dot match any input char): 1.35s
$transposed[ $i ++ ] .= chop while !/^$/: 1.35s
next are without reverse:
$transposed[ $i ++ ] .= $& while /./g: 1.85s
$transposed[ $i ++ ] .= substr $_, 0, 1, '' while length: 1.0s
$transposed[ $i ++ ] .= $_ for split //: 1.0s
push @out, join '', map substr( $_,$i,1 ), @in (by BrowserUK): 1.1s
(pryrt's substr+substr variant ommited)
| [reply] [d/l] [select] |
|
|
I'm actually curious how my multi-substr variant would perform in your bench... because in my tests (on two different machines), and much to my surprise, mine significantly outperforms the join-map-substr.
anomalous_map: 20 wallclock secs (17.09 usr + 0.16 sys = 17.25 CPU) @
+ 2.03/s (n=35)
anomalous_two: 17 wallclock secs (15.12 usr + 0.08 sys = 15.20 CPU) @
+ 2.04/s (n=31)
buk_substr: 16 wallclock secs (15.59 usr + 0.01 sys = 15.61 CPU) @ 0
+.70/s (n=11)
with_chop_fixed: 16 wallclock secs (15.09 usr + 0.17 sys = 15.27 CPU)
+ @ 2.23/s (n=34)
with_split: 16 wallclock secs (15.19 usr + 0.00 sys = 15.19 CPU) @ 0
+.86/s (n=13)
with_substr: 15 wallclock secs (15.39 usr + 0.00 sys = 15.39 CPU) @
+1.75/s (n=27)
Rate buk_substr with_split with_substr anomalous_ma
+p anomalous_two with_chop_fixed
buk_substr 0.705/s -- -18% -60% -65
+% -65% -68%
with_split 0.856/s 21% -- -51% -58
+% -58% -62%
with_substr 1.75/s 149% 105% -- -14
+% -14% -21%
anomalous_map 2.03/s 188% 137% 16% -
+- -0% -9%
anomalous_two 2.04/s 189% 138% 16% 0
+% -- -8%
with_chop_fixed 2.23/s 216% 160% 27% 10
+% 9% --
anomalous_map: 16 wallclock secs (15.34 usr + 0.06 sys = 15.41 CPU) @
+ 1.49/s (n=23)
anomalous_two: 15 wallclock secs (14.98 usr + 0.02 sys = 15.00 CPU) @
+ 1.60/s (n=24)
buk_substr: 17 wallclock secs (16.56 usr + 0.03 sys = 16.59 CPU) @ 0
+.30/s (n=5)
with_chop_fixed: 15 wallclock secs (15.22 usr + 0.02 sys = 15.23 CPU)
+ @ 1.51/s (n=23)
with_split: 17 wallclock secs (17.41 usr + 0.00 sys = 17.41 CPU) @ 0
+.34/s (n=6)
with_substr: 15 wallclock secs (15.50 usr + 0.00 sys = 15.50 CPU) @
+1.29/s (n=20)
Rate buk_substr with_split with_substr anomalous_ma
+p with_chop_fixed anomalous_two
buk_substr 0.301/s -- -13% -77% -80
+% -80% -81%
with_split 0.345/s 14% -- -73% -77
+% -77% -78%
with_substr 1.29/s 328% 274% -- -14
+% -15% -19%
anomalous_map 1.49/s 395% 333% 16% -
+- -1% -7%
with_chop_fixed 1.51/s 401% 338% 17% 1
+% -- -6%
anomalous_two 1.60/s 431% 364% 24% 7
+% 6% --
And yes, the anomalous_map, anomalous_two and with_chop_fixed seemed to go back and forth a lot, even on the first system. It's not had a lot of runs, so the timing's not very accurate. Trying with 200x200 on the second machine, to give it more runs per function:
Rate buk_substr with_split with_substr with_chop_fix
+ed anomalous_map anomalous_two
buk_substr 29.1/s -- -15% -77% -7
+8% -78% -80%
with_split 34.3/s 18% -- -73% -7
+4% -74% -76%
with_substr 127/s 337% 271% -- -
+2% -5% -12%
with_chop_fixed 130/s 348% 280% 3%
+-- -3% -10%
anomalous_map 134/s 360% 291% 5%
+3% -- -8%
anomalous_two 145/s 399% 323% 14% 1
+1% 8% --
#!/usr/bin/env perl -l
use warnings;
use strict;
use Benchmark qw/timethese cmpthese/;
use Test::More;
my @az = ('a'..'z', 'A'..'Z', '1'..'9'); # note: if '0' is in your
+alphabet, chop() will fail
my @golden_original;
my $N = 2000;
for( 1 .. $N ) {
push @golden_original, join '', (map { $az[rand @az] } 1..$N-5),
+'0'x5;
}
#print length($str), qq(\t"$str");
#print scalar @golden_original;
#diag explain org => \@gold
+en_original;
my $wcf = [with_chop_fixed(@golden_original)]; #diag explain wcf => $w
+cf;
my $wch = [with_chop (@golden_original)]; #diag explain wch => $w
+ch;
my $wsp = [with_split (@golden_original)]; #diag explain wsp => $w
+sp;
my $wsu = [with_substr (@golden_original)]; #diag explain wsu => $w
+su;
my $wuk = [buk_substr (@golden_original)]; #diag explain wuk => $w
+uk;
my $wam = [anomalous_map (@golden_original)]; #diag explain wam => $w
+am;
my $wa2 = [anomalous_two (@golden_original)]; #diag explain wa2 => $w
+a2;
is_deeply( $wch, $wsp, 'your chop vs split');
is_deeply( $wcf, $wsp, 'my chop vs split');
is_deeply( $wsu, $wsp, 'substr vs split');
is_deeply( $wuk, $wsp, 'BrowserUK\'s substr vs split');
is_deeply( $wam, $wsp, 'AnomalousMonk\'s map vs split');
is_deeply( $wa2, $wsp, 'AnomalousMonk\'s second vs split');
sub with_split {
my @original = @_;
my @transposed;
for( @original ) {
my $i = 0;
$transposed[$i++] .= $_ for split //;
}
return @transposed;
}
sub with_chop {
my @original = @_;
my @transposed;
for( map $_ = reverse, @original ){
my $i = 0;
$transposed[ $i ++ ] .= chop while $_;
}
return @transposed;
}
sub with_chop_fixed {
my @original = @_;
my @transposed;
for( map $_ = reverse, @original ){
my $i = 0;
$transposed[ $i ++ ] .= chop while length $_;
}
return @transposed;
}
sub with_substr {
my @original = @_;
my $w = scalar @original;
my @transposed = ( ' 'x$w ) x $w;
for my $i ( 0 .. $#original ) {
for my $j ( 0 .. $i ) {
substr($transposed[$i], $j, 1) = substr($original[$j], $i,
+ 1);
substr($transposed[$j], $i, 1) = substr($original[$i], $j,
+ 1);
}
}
return @transposed;
}
sub buk_substr { # https://perlmonks.org/?node_id=1208297
my @in = @_;
my @out;
for my $i (0 .. length( $in[0] ) - 1 ) {
push @out, join '', map substr( $_,$i,1 ), @in;
}
return @out;
}
sub anomalous_map { # https://perlmonks.org/?node_id=1208294, but
+modified for my length $_ fix...
my @orig = @_;
my @transposed;
for (map scalar(reverse), @orig) {
my $i = 0;
$transposed[ $i ++ ] .= chop while length $_;
}
return @transposed;
}
sub anomalous_two { # [id://1208340]
my $ar_matrix = [@_];
my $r_all = reverse @$ar_matrix;
my $i_max = @$ar_matrix && length($ar_matrix->[0]) - 1;
my @transposed; $#transposed = $i_max - 1;
for( 0 .. $#$ar_matrix ) {
for( 0 .. $i_max) {
$transposed[$_] .= chop $r_all;
}
}
return @transposed;
}
my $r = timethese( -15, {
with_substr => sub { with_substr ( @golden_original
+ ); },
with_split => sub { with_split ( @golden_original
+ ); },
with_chop_fixed => sub { with_chop_fixed ( @golden_original
+ ); },
buk_substr => sub { buk_substr ( @golden_original
+ ); },
anomalous_map => sub { anomalous_map ( @golden_original
+ ); },
anomalous_two => sub { anomalous_two ( @golden_original
+ ); },
});
cmpthese $r;
done_testing();
| [reply] [d/l] [select] |
Re: How to transpose lines faster?
by AnomalousMonk (Archbishop) on Feb 02, 2018 at 18:20 UTC
|
Here's my nomination. I haven't done any Benchmark-ing, but the minimal computation done in the inmost for-loop gives me hope that it will have a chance.
Update: It just occurred to me that pre-extending the @transposed array might squeeze out a few more microseconds of performance. In the AnomalousMonk_1() function, change the
my @transposed;
statement to
my @transposed;
$#transposed = $i_max - 1;
(tested | tested for correctness, but again, no benchmarking for speed).
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |