in reply to Is it possible to make reference to substrings and sort them?

With a bit of index arithmetic and a manual compare function you would get

my $ntext = length $text; my $bwt2 = join('', map{substr($text, ($_ - 1), 1)} sort{ my $j = -1; my $diff = 0; while( $j++<$ntext ) { $diff = substr( $text, ($j+$a) % $ntext, 1 ) cmp substr( $text +, ($j+$b) % $ntext, 1 ); last if $diff; } return $diff; } 0..$ntext-1 ); print "$text: $bwt2\n";

Whether or not this is faster, I do not know, but it avoids building all the cyclic rotations.

Replies are listed 'Best First'.
Re^2: Is it possible to make reference to substrings and sort them?
by hdb (Monsignor) on Mar 22, 2015 at 13:20 UTC

    If you can spare the memory to double your text, you can create references to all your rotations. Below I use the Schwartzian transform to create the references once, then sort them.

    my $text2 = "$text$text"; my $bwt4 = join('', map{substr($text, ($_ - 1), 1)} map{ $_->[0] } sort{ ${$a->[1]} cmp ${$b->[1]} } map{ [$_,\substr($text2,$_,$ntext)] } 0..$ntext-1 ); print "$text: $bwt4\n";

      Good call on doubling the string. Though the Schwartzian seems unnecessary here:

      my $text2 = "$text$text"; my $tlen = length($text); my $bwt7 = join('', map{ substr($$_, -1) } sort{ $$a cmp $$b } map{ \substr($text2, $_, $tlen) } 0..$tlen-1 ); print "$text: $bwt7\n";

      Intermediate array of references consumes more memory than an array of offsets would, I think, but is faster to sort. Something to test and benchmark...

Re^2: Is it possible to make reference to substrings and sort them?
by spooknick (Novice) on Mar 22, 2015 at 21:09 UTC

    Excellent, it took:

    Spent 398.221 wallclock secs (397.00 usr +  0.98 sys = 397.98 CPU)

    for a sequence of 4639675 nt

    I just thought about not using the complete length of text in sort, like using only the first 1000 nt, but your idea is accurate.