This is from real code. We have a string of DNA nucleotides (or amino acid residues) and a list of the distances between successive gaps to insert into it (a run-length encoding for alignments). The task is to put the gaps into the sequence: gapify('ACGTACGTACGT', 2, 4, 5) should return 'AC-GTAC-GTACG-T' (a gap after 2 letters, then after 4 more, then after 5 more).

It's also a nice example of how to use substr as an lvalue.

sub gapify {
    my $str = shift;
    my $loc;
    while (@_) {
        $loc += shift;
        substr($str,$loc++,0)='-';
    }
    $str
}
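
As a minimal sketch of the lvalue idiom used above (the helper names insert_lvalue and insert_4arg are hypothetical, not part of the original code): assigning to a zero-length substr inserts text without overwriting anything, and the 4-argument form of substr has the same effect.

```perl
use strict;
use warnings;

# Insert a gap marker at a given offset by assigning to a
# zero-length substr (the lvalue form used in gapify).
sub insert_lvalue {
    my ($str, $pos) = @_;
    substr($str, $pos, 0) = '-';
    return $str;
}

# The same insertion written with the 4-argument form of substr.
sub insert_4arg {
    my ($str, $pos) = @_;
    substr($str, $pos, 0, '-');
    return $str;
}

print insert_lvalue('ACGT', 2), "\n";   # AC-GT
print insert_4arg('ACGT', 2), "\n";     # AC-GT
```

This is also why gapify post-increments $loc: every gap already inserted shifts all later offsets one position to the right.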

Replies are listed 'Best First'.
Re: Adding gaps to a sequence
by danger (Priest) on Jul 15, 2001 at 21:27 UTC

    Here is another alternative using unpack(). It should be somewhat more efficient, especially for sequence strings with more cuts:

    sub gapify {
        my $s = shift;
        join '-', unpack join('A','',@_,'*'), $s;
    }
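
To make the template trick easier to see, here is a small sketch that builds the unpack() template separately and prints it (gap_template is a hypothetical helper name used only for illustration). Joining an empty string, the run lengths, and '*' with 'A' puts an 'A' in front of every field.

```perl
use strict;
use warnings;

# Build the unpack template from the run lengths:
# (2, 4, 5) becomes 'A2A4A5A*'.
sub gap_template {
    return join 'A', '', @_, '*';
}

my $template = gap_template(2, 4, 5);
my @pieces   = unpack($template, 'ACGTACGTACGT');

print "$template\n";              # A2A4A5A*
print join('-', @pieces), "\n";   # AC-GTAC-GTACG-T
```

One caveat worth keeping in mind: the 'A' format strips trailing spaces and nulls from each field, which does not matter for plain sequence letters but would for strings containing spaces.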

      My tests proved otherwise -- else, I would have used that code in my reply. Since it wasn't faster, I looked for some other avenue.

      Jeff japhy Pinyan: Perl, regex, and perl hacker.
      s++=END;++y(;-Q)}y js++=;shajsj<++y(p-q)}?print:??;

        That is odd, because my benchmark indicates the opposite. Here is some benchmark code calling the two versions with longer strings and more cuts each time. The results (shown at the end, slightly reformatted; perl 5.6.1) indicate a fairly predictable increase in efficiency of the unpack() method over the substr() method:

        #!/usr/bin/perl -w
        use strict;
        use Benchmark;

        my @args = ('ACGTACGTACGT', 2,4,5);
        timethese(-2,{
            japhy  => sub{japhy(@args)},
            danger => sub{danger(@args)},
        });

        @args = ('ACGTACGTACGT' x 4, (2,4,5) x 4);
        timethese(-2,{
            japhy  => sub{japhy(@args)},
            danger => sub{danger(@args)},
        });

        @args = ('ACGTACGTACGT' x 100, (2,4,5) x 100);
        timethese(-2,{
            japhy  => sub{japhy(@args)},
            danger => sub{danger(@args)},
        });

        sub danger {
            my $s = shift;
            join '-', unpack join('A','',@_,'*') ,$s;
        }

        sub japhy {
            my $s = shift;
            join '-', map(substr($s, 0, $_, ''), @_),$s;
        }
        __END__
        Benchmark: running danger, japhy, each for at least 2 CPU seconds...
            danger: 1 wallclock secs ( 2.06 usr + 0.00 sys = 2.06 CPU) @ 10438.35/s (n=21503)
            japhy: 1 wallclock secs ( 2.06 usr + 0.00 sys = 2.06 CPU) @ 8694.17/s (n=17910)
        Benchmark: running danger, japhy, each for at least 2 CPU seconds...
            danger: 2 wallclock secs ( 2.03 usr + 0.00 sys = 2.03 CPU) @ 4800.49/s (n=9745)
            japhy: 3 wallclock secs ( 2.10 usr + 0.00 sys = 2.10 CPU) @ 3409.05/s (n=7159)
        Benchmark: running danger, japhy, each for at least 2 CPU seconds...
            danger: 2 wallclock secs ( 2.13 usr + 0.00 sys = 2.13 CPU) @ 258.22/s (n=550)
            japhy: 3 wallclock secs ( 2.01 usr + 0.00 sys = 2.01 CPU) @ 155.72/s (n=313)

        Of course, it is entirely possible that I've completely messed up the benchmark.
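
A timing comparison only means something if both subs return the same string, so one cheap sanity check is to compare their output before benchmarking. The sketch below copies the danger and japhy subs from the benchmark (with parentheses added around the unpack arguments) and runs them on the same three input sizes; it checks correctness only and says nothing about speed.

```perl
use strict;
use warnings;

# The two implementations under test, as in the benchmark.
sub danger {
    my $s = shift;
    join '-', unpack(join('A', '', @_, '*'), $s);
}

sub japhy {
    my $s = shift;
    join '-', map(substr($s, 0, $_, ''), @_), $s;
}

# Same three input sizes as the benchmark: x1, x4 and x100.
for my $n (1, 4, 100) {
    my @args = ('ACGTACGTACGT' x $n, (2, 4, 5) x $n);
    my ($d, $j) = (danger(@args), japhy(@args));
    print "x$n: ", ($d eq $j ? 'same output' : 'DIFFERENT output'), "\n";
}
```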

Re: Adding gaps to a sequence
by japhy (Canon) on Jul 15, 2001 at 18:53 UTC
    Here's another approach -- still using substr(), but differently.
    sub gapify {
        my $s = shift;
        return join '-',
            map(substr($s, 0, $_, ''), @_),   # runs of the string
            $s;                               # the remainder of the string
    }
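
The key to this version is that 4-argument substr() returns the part it replaces, so replacing a prefix with the empty string both yields the prefix and shortens the string. A small sketch of that step on its own (chop_runs is a hypothetical helper name, not from the post):

```perl
use strict;
use warnings;

# Repeatedly remove a prefix of the given length from a copy of the
# sequence; return the removed runs followed by whatever is left.
sub chop_runs {
    my ($s, @lens) = @_;
    my @runs;
    for my $len (@lens) {
        push @runs, substr($s, 0, $len, '');   # returns the removed prefix
        print "took '$runs[-1]', left '$s'\n";
    }
    return (@runs, $s);
}

print join('-', chop_runs('ACGTACGTACGT', 2, 4, 5)), "\n";   # AC-GTAC-GTACG-T
```

Because the sub works on its own copy of the string, the caller's sequence is left untouched.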


    japhy -- Perl and Regex Hacker
Re: Adding gaps to a sequence
by MeowChow (Vicar) on Jul 15, 2001 at 22:10 UTC
    For amusement/golf purposes only...
    sub gapify { join'-',shift=~/@{[map"(.{$_})",@_]}(.*)/x }
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print
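
For readers who want to see what the golfed regex does, here is an ungolfed sketch of the same idea (gapify_regex is a hypothetical name). The golfed version interpolates the list of groups into the pattern, which joins them with spaces, hence the /x flag; the sketch joins them with an empty string instead, so /x is not needed.

```perl
use strict;
use warnings;

# Build one capture group per run length, plus a final group for the
# remainder, and join the captured pieces with '-'.
sub gapify_regex {
    my ($seq, @lens) = @_;
    my $pattern = join '', map { "(.{$_})" } @lens;   # e.g. (.{2})(.{4})(.{5})
    my @pieces  = $seq =~ /^$pattern(.*)/;            # list context: the captures
    return join '-', @pieces;
}

print gapify_regex('ACGTACGTACGT', 2, 4, 5), "\n";    # AC-GTAC-GTACG-T
```

If the sequence is shorter than the sum of the run lengths, the match fails and this sketch returns an empty string.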
      I probably missed a trick or two, but here goes nothing at 39 chars:
      sub gapify_lk { $s=shift;$s=~s/(.*-)?.{$_}/$&-/for@_;$s }
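
An ungolfed sketch of the same substitution idea, for readability (gapify_subst is a hypothetical name, and it uses a capture group and a lexical variable where the golfed version uses $& and a global $s). The optional greedy (?:.*-)? skips past the last gap inserted so far, then the next run is matched and a gap is appended to it. It assumes the input sequence contains no '-' characters of its own.

```perl
use strict;
use warnings;

# Insert one gap per run length, each time starting from the
# position just after the most recently inserted gap.
sub gapify_subst {
    my ($seq, @lens) = @_;
    for my $len (@lens) {
        # Group 1: everything up to and including the last '-' (if any),
        # followed by the next $len characters.
        $seq =~ s/^((?:.*-)?.{$len})/$1-/;
    }
    return $seq;
}

print gapify_subst('ACGTACGTACGT', 2, 4, 5), "\n";   # AC-GTAC-GTACG-T
```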

      "Argument is futile - you will be ignorralated!"