How to transpose lines faster?

rsFalse has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to transpose lines faster? by pryrt (Abbot) on Feb 01, 2018 at 22:03 UTC
I would have used `with_substr()` (in the spoiler), but your chop-based is faster during my test. However, if the character '0' (ASCII 48) is in your possible alphabet, `chop()` won't give the right answer benchmark: #!/usr/bin/env perl -l use warnings; use strict; use Benchmark qw/timethese cmpthese/; use Test::More; my @az = ('a'..'z', 'A'..'Z', '1'..'9'); # note: if '0' is in your +alphabet, chop() will fail my @golden_original; my $N = 200; for( 1 .. $N ) { push @golden_original, join '', map { $az[rand @az] } 1..$N; } #print length($str), qq(\t"$str"); #print scalar @golden_original; #diag explain org => \@gold +en_original; my $wch = [with_chop (@golden_original)]; #diag explain wch => $wch; my $wsp = [with_split (@golden_original)]; #diag explain wsp => $wsp; my $wsu = [with_substr(@golden_original)]; #diag explain wsu => $wsu; is_deeply( $wsp, $wch, 'split vs chop'); is_deeply( $wsu, $wch, 'substr vs chop'); sub with_split { my @original = @_; my @transposed; for( @original ) { my $i = 0; $transposed[$i++] .= $_ for split //; } return @transposed; } sub with_chop { my @original = @_; my @transposed; for( map $_ = reverse, @original ){ my $i = 0; $transposed[ $i ++ ] .= chop while $_; } return @transposed; } sub with_substr { my @original = @_; my $w = scalar @original; my @transposed = ( ' 'x$w ) x $w; for my $i ( 0 .. $#original ) { for my $j ( 0 .. $i ) { substr($transposed[$i], $j, 1) = substr($original[$j], $i, + 1); substr($transposed[$j], $i, 1) = substr($original[$i], $j, + 1); } } return @transposed; } my $r = timethese( 1000, { with_substr => sub { with_substr( @golden_original ); }, with_split => sub { with_split ( @golden_original ); }, with_chop => sub { with_chop ( @golden_original ); }, }); cmpthese $r; done_testing(); [download] `with_chop: 5 wallclock secs ( 5.19 usr + 0.00 sys = 5.19 CPU) @ 19 +2.79/s (n=1000) with_split: 13 wallclock secs (13.14 usr + 0.02 sys = 13.16 CPU) @ 76 +.01/s (n=1000) with_substr: 7 wallclock secs ( 6.89 usr + 0.00 sys = 6.89 CPU) @ 1 +45.14/s (n=1000) Rate with_split with_substr with_chop with_split 76.0/s -- -48% -61% with_substr 145/s 91% -- -25% with_chop 193/s 154% 33% --` [download] I am no good with pack/unpack, otherwise, I would have tested LanX's suggestion, too. From what I remember in past benchmarks, that pair usually beats substr. update: chop will fail because you're testing the truthiness of the character it returns. '0', like 0, is false. update 2: hmm, no, you should be testing the truthiness of the whole string. When I had '0' in my alphabet, it dropped one or more '0's from the last row of `@$wch`... Oh, right, if the last character in one of the strings in @original ends in one or more zeroes, then it will evaluate to false, rather than your expected true. So if '0' might be in your alphabet, you should probably test the `if length($_)` instead. update 3: I meant `while length $_`...	[reply] [d/l] [select]
Re^2: How to transpose lines faster? by pryrt (Abbot) on Feb 01, 2018 at 22:23 UTC
yeah, the `while length $_` will do it; see modified code #!/usr/bin/env perl -l use warnings; use strict; use Benchmark qw/timethese cmpthese/; use Test::More; my @az = ('a'..'z', 'A'..'Z', '1'..'9'); # note: if '0' is in your +alphabet, chop() will fail my @golden_original; my $N = 2000; for( 1 .. $N ) { push @golden_original, join '', (map { $az[rand @az] } 1..$N-5), +'0'x5; } #print length($str), qq(\t"$str"); #print scalar @golden_original; #diag explain org => \@gold +en_original; my $wcf = [with_chop_fixed(@golden_original)]; #diag explain wcf => $w +cf; my $wch = [with_chop (@golden_original)]; #diag explain wch => $w +ch; my $wsp = [with_split (@golden_original)]; #diag explain wsp => $w +sp; my $wsu = [with_substr (@golden_original)]; #diag explain wsu => $w +su; is_deeply( $wch, $wsp, 'your chop vs split'); is_deeply( $wcf, $wsp, 'my chop vs split'); is_deeply( $wsu, $wsp, 'substr vs split'); sub with_split { my @original = @_; my @transposed; for( @original ) { my $i = 0; $transposed[$i++] .= $_ for split //; } return @transposed; } sub with_chop { my @original = @_; my @transposed; for( map $_ = reverse, @original ){ my $i = 0; $transposed[ $i ++ ] .= chop while $_; } return @transposed; } sub with_chop_fixed { my @original = @_; my @transposed; for( map $_ = reverse, @original ){ my $i = 0; $transposed[ $i ++ ] .= chop while length $_; } return @transposed; } sub with_substr { my @original = @_; my $w = scalar @original; my @transposed = ( ' 'x$w ) x $w; for my $i ( 0 .. $#original ) { for my $j ( 0 .. $i ) { substr($transposed[$i], $j, 1) = substr($original[$j], $i, + 1); substr($transposed[$j], $i, 1) = substr($original[$i], $j, + 1); } } return @transposed; } #my $r = timethese( -5, { # with_substr => sub { with_substr( @golden_original ); }, # with_split => sub { with_split ( @golden_original ); }, # with_chop => sub { with_chop ( @golden_original ); }, #}); #cmpthese $r; done_testing(); [download]	[reply] [d/l] [select]
Re^3: How to transpose lines faster? by rsFalse (Chaplain) on Feb 02, 2018 at 12:00 UTC
Thank you, pryrt, for nice analysis, and thanks for finding a bug with '0'.	[reply]
Re: How to transpose lines faster? by AnomalousMonk (Archbishop) on Feb 01, 2018 at 22:50 UTC
Both ways are non-destructive (after using map in second). FWIW: I don't understand what "after using map in second" means, but the second OPed method is destructive (i.e., changes `@orig` array): `c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le "my @orig = qw( abcde fghij klmno pqrst uvwxy ); ;; my @transposed; for (map $_ = reverse, @orig) { my $i = 0; $transposed[ $i ++ ] .= chop while $_; } dd \@transposed; dd \@orig; " ["afkpu", "bglqv", "chmrw", "dinsx", "ejoty"] ["edcba", "jihgf", "onmlk", "tsrqp", "yxwvu"]` [download] But changing the map expression to `for (map scalar(reverse), @orig) { ... }` fixes this. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re: How to transpose lines faster? by LanX (Saint) on Feb 01, 2018 at 20:50 UTC
> Do you know faster ways? Probably by using `unpack` and `pack` with a generated dynamic template. Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Wikisyntax for the Monastery}	[reply]
Re: How to transpose lines faster? by BrowserUk (Patriarch) on Feb 02, 2018 at 00:38 UTC
How does this score on your system? `for my $i (0 .. length( $in[0] ) - 1 ) { push @out, join '', map substr( $_,$i,1 ), @in; }` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit	[reply] [d/l]
Re^2: How to transpose lines faster? by rsFalse (Chaplain) on Feb 03, 2018 at 00:25 UTC
On my PC, 2e3 x 2e3 matrices: `$transposed[ $i ++ ] .= chop while $_` (incorrect): 0.6s `$transposed[ $i ++ ] .= chop while length` (correct) (by pryrt): 0.75s `$transposed[ $i ++ ] .= chop while /./` (assuming that dot match any input char): 1.35s `$transposed[ $i ++ ] .= chop while !/^$/`: 1.35s next are without reverse: `$transposed[ $i ++ ] .= $& while /./g`: 1.85s `$transposed[ $i ++ ] .= substr $_, 0, 1, '' while length`: 1.0s `$transposed[ $i ++ ] .= $_ for split //`: 1.0s `push @out, join '', map substr( $_,$i,1 ), @in` (by BrowserUK): 1.1s (pryrt's substr+substr variant ommited)	[reply] [d/l] [select]
Re^3: How to transpose lines faster? by pryrt (Abbot) on Feb 03, 2018 at 02:03 UTC
I'm actually curious how my multi-substr variant would perform in your bench... because in my tests (on two different machines), and much to my surprise, mine significantly outperforms the join-map-substr. anomalous_map: 20 wallclock secs (17.09 usr + 0.16 sys = 17.25 CPU) @ + 2.03/s (n=35) anomalous_two: 17 wallclock secs (15.12 usr + 0.08 sys = 15.20 CPU) @ + 2.04/s (n=31) buk_substr: 16 wallclock secs (15.59 usr + 0.01 sys = 15.61 CPU) @ 0 +.70/s (n=11) with_chop_fixed: 16 wallclock secs (15.09 usr + 0.17 sys = 15.27 CPU) + @ 2.23/s (n=34) with_split: 16 wallclock secs (15.19 usr + 0.00 sys = 15.19 CPU) @ 0 +.86/s (n=13) with_substr: 15 wallclock secs (15.39 usr + 0.00 sys = 15.39 CPU) @ +1.75/s (n=27) Rate buk_substr with_split with_substr anomalous_ma +p anomalous_two with_chop_fixed buk_substr 0.705/s -- -18% -60% -65 +% -65% -68% with_split 0.856/s 21% -- -51% -58 +% -58% -62% with_substr 1.75/s 149% 105% -- -14 +% -14% -21% anomalous_map 2.03/s 188% 137% 16% - +- -0% -9% anomalous_two 2.04/s 189% 138% 16% 0 +% -- -8% with_chop_fixed 2.23/s 216% 160% 27% 10 +% 9% -- [download] anomalous_map: 16 wallclock secs (15.34 usr + 0.06 sys = 15.41 CPU) @ + 1.49/s (n=23) anomalous_two: 15 wallclock secs (14.98 usr + 0.02 sys = 15.00 CPU) @ + 1.60/s (n=24) buk_substr: 17 wallclock secs (16.56 usr + 0.03 sys = 16.59 CPU) @ 0 +.30/s (n=5) with_chop_fixed: 15 wallclock secs (15.22 usr + 0.02 sys = 15.23 CPU) + @ 1.51/s (n=23) with_split: 17 wallclock secs (17.41 usr + 0.00 sys = 17.41 CPU) @ 0 +.34/s (n=6) with_substr: 15 wallclock secs (15.50 usr + 0.00 sys = 15.50 CPU) @ +1.29/s (n=20) Rate buk_substr with_split with_substr anomalous_ma +p with_chop_fixed anomalous_two buk_substr 0.301/s -- -13% -77% -80 +% -80% -81% with_split 0.345/s 14% -- -73% -77 +% -77% -78% with_substr 1.29/s 328% 274% -- -14 +% -15% -19% anomalous_map 1.49/s 395% 333% 16% - +- -1% -7% with_chop_fixed 1.51/s 401% 338% 17% 1 +% -- -6% anomalous_two 1.60/s 431% 364% 24% 7 +% 6% -- [download] And yes, the anomalous_map, anomalous_two and with_chop_fixed seemed to go back and forth a lot, even on the first system. It's not had a lot of runs, so the timing's not very accurate. Trying with 200x200 on the second machine, to give it more runs per function: `Rate buk_substr with_split with_substr with_chop_fix +ed anomalous_map anomalous_two buk_substr 29.1/s -- -15% -77% -7 +8% -78% -80% with_split 34.3/s 18% -- -73% -7 +4% -74% -76% with_substr 127/s 337% 271% -- - +2% -5% -12% with_chop_fixed 130/s 348% 280% 3% +-- -3% -10% anomalous_map 134/s 360% 291% 5% +3% -- -8% anomalous_two 145/s 399% 323% 14% 1 +1% 8% --` [download] #!/usr/bin/env perl -l use warnings; use strict; use Benchmark qw/timethese cmpthese/; use Test::More; my @az = ('a'..'z', 'A'..'Z', '1'..'9'); # note: if '0' is in your +alphabet, chop() will fail my @golden_original; my $N = 2000; for( 1 .. $N ) { push @golden_original, join '', (map { $az[rand @az] } 1..$N-5), +'0'x5; } #print length($str), qq(\t"$str"); #print scalar @golden_original; #diag explain org => \@gold +en_original; my $wcf = [with_chop_fixed(@golden_original)]; #diag explain wcf => $w +cf; my $wch = [with_chop (@golden_original)]; #diag explain wch => $w +ch; my $wsp = [with_split (@golden_original)]; #diag explain wsp => $w +sp; my $wsu = [with_substr (@golden_original)]; #diag explain wsu => $w +su; my $wuk = [buk_substr (@golden_original)]; #diag explain wuk => $w +uk; my $wam = [anomalous_map (@golden_original)]; #diag explain wam => $w +am; my $wa2 = [anomalous_two (@golden_original)]; #diag explain wa2 => $w +a2; is_deeply( $wch, $wsp, 'your chop vs split'); is_deeply( $wcf, $wsp, 'my chop vs split'); is_deeply( $wsu, $wsp, 'substr vs split'); is_deeply( $wuk, $wsp, 'BrowserUK\'s substr vs split'); is_deeply( $wam, $wsp, 'AnomalousMonk\'s map vs split'); is_deeply( $wa2, $wsp, 'AnomalousMonk\'s second vs split'); sub with_split { my @original = @_; my @transposed; for( @original ) { my $i = 0; $transposed[$i++] .= $_ for split //; } return @transposed; } sub with_chop { my @original = @_; my @transposed; for( map $_ = reverse, @original ){ my $i = 0; $transposed[ $i ++ ] .= chop while $_; } return @transposed; } sub with_chop_fixed { my @original = @_; my @transposed; for( map $_ = reverse, @original ){ my $i = 0; $transposed[ $i ++ ] .= chop while length $_; } return @transposed; } sub with_substr { my @original = @_; my $w = scalar @original; my @transposed = ( ' 'x$w ) x $w; for my $i ( 0 .. $#original ) { for my $j ( 0 .. $i ) { substr($transposed[$i], $j, 1) = substr($original[$j], $i, + 1); substr($transposed[$j], $i, 1) = substr($original[$i], $j, + 1); } } return @transposed; } sub buk_substr { # https://perlmonks.org/?node_id=1208297 my @in = @_; my @out; for my $i (0 .. length( $in[0] ) - 1 ) { push @out, join '', map substr( $_,$i,1 ), @in; } return @out; } sub anomalous_map { # https://perlmonks.org/?node_id=1208294, but +modified for my length $_ fix... my @orig = @_; my @transposed; for (map scalar(reverse), @orig) { my $i = 0; $transposed[ $i ++ ] .= chop while length $_; } return @transposed; } sub anomalous_two { # [id://1208340] my $ar_matrix = [@_]; my $r_all = reverse @$ar_matrix; my $i_max = @$ar_matrix && length($ar_matrix->[0]) - 1; my @transposed; $#transposed = $i_max - 1; for( 0 .. $#$ar_matrix ) { for( 0 .. $i_max) { $transposed[$_] .= chop $r_all; } } return @transposed; } my $r = timethese( -15, { with_substr => sub { with_substr ( @golden_original + ); }, with_split => sub { with_split ( @golden_original + ); }, with_chop_fixed => sub { with_chop_fixed ( @golden_original + ); }, buk_substr => sub { buk_substr ( @golden_original + ); }, anomalous_map => sub { anomalous_map ( @golden_original + ); }, anomalous_two => sub { anomalous_two ( @golden_original + ); }, }); cmpthese $r; done_testing(); [download]	[reply] [d/l] [select]
Re: How to transpose lines faster? by AnomalousMonk (Archbishop) on Feb 02, 2018 at 18:20 UTC
Here's my nomination. I haven't done any Benchmark-ing, but the minimal computation done in the inmost `for`-loop gives me hope that it will have a chance. Read more... (5 kB) Update: It just occurred to me that pre-extending the `@transposed` array might squeeze out a few more microseconds of performance. In the `AnomalousMonk_1()` function, change the `my @transposed;` statement to `my @transposed;` `$#transposed = $i_max - 1;` (~~tested~~ \| tested for correctness, but again, no benchmarking for speed). Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]