in reply to splitting a sequence using unpack

I think if you read the documentation for pack/unpack you'd know all you need to know (see perlfunc). But the brief version is: "a3" means extract the next three bytes of the input as a string. The 'x' operator is the repetition operator for strings and lists. In this context it repeats "a3" a certain number of times. So the overall effect is to divide your input string into as many three-character sequences as there are in the string (assuming the string's length is a multiple of 3). Try some one-liners to see what is going on:

D:\Development>perl -e "print 'a3' x 10"
a3a3a3a3a3a3a3a3a3a3
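
To see the same idea applied to a real string, here is a minimal sketch. The sample sequence and the variable names are only illustrative, not taken from the original question.

```perl
use strict;
use warnings;

my $seq = "atgcatccctttaat";              # sample input, 15 characters

# Build the template: one "a3" per complete triplet.
my $template = "a3" x int(length($seq) / 3);

# unpack returns one string per "a3" in the template.
my @triplets = unpack $template, $seq;

print "$template\n";                      # a3a3a3a3a3
print join(",", @triplets), "\n";         # atg,cat,ccc,ttt,aat
```

If the length is not a multiple of 3, the leftover characters are not covered by the template and are silently dropped.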

This use of unpack is quite fast for the job it does, but it has the disadvantage that the string must be stored twice in memory, and the template string must be built as well, which is only about a third shorter than the input (two template characters per three input characters). If you only need to deal with one three-character sequence at a time, it may be better to do

while ($seq =~ /\G(...)/sg) { print "Got: $1\n"; }
---
demerphq

Re^2: splitting a sequence using unpack
by Taulmarill (Deacon) on Mar 01, 2005 at 12:23 UTC
    if it's necessary to have the triplets in an array, this should do it.
    my $seq = "atgcatccctttaat";
    my @triplets = $seq =~ /(...)/g;
    print "$seq: ", join(",", @triplets), "\n";

    and if the last "triplet" can have only one or two characters, just change the regex.

    my @triplets = $seq =~ /(.{1,3})/g;
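
The difference between the two patterns only shows up when the length is not a multiple of 3. A small sketch, using a made-up 14-character sequence and illustrative variable names:

```perl
use strict;
use warnings;

my $seq = "atgcatccctttaa";               # 14 characters, so 2 are left over

# Exactly three characters per match: the trailing "aa" is skipped.
my @strict = $seq =~ /(...)/g;

# One to three characters per match: the trailing "aa" is kept.
my @loose  = $seq =~ /(.{1,3})/g;

print join(",", @strict), "\n";           # atg,cat,ccc,ttt
print join(",", @loose),  "\n";           # atg,cat,ccc,ttt,aa
```

Note that "." does not match a newline unless the /s modifier is used, so input read from a file should probably be chomped first.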

    i agree with demerphq that unpack should be faster than a regex, but it's far more readable what's going on when you use regular expressions. tune your code with unpack if the script is too slow.
    btw. if you don't know what a function does, just type perldoc -f unpack ;-)
Re^2: splitting a sequence using unpack
by BrowserUk (Patriarch) on Mar 01, 2005 at 12:25 UTC
    ...it does but has a disadvantage that the string must be stored twice in memory...

    Would you explain this please?


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      the string is originally in $seq and is duplicated into the array @triplets. if you only want to print them, it's unnecessary to hold all triplets in memory; you can go through them one by one instead. besides that, storing triplets in an array should imho use more memory than the string stored in a scalar.

        Okay. That makes sense if all you want to do is print them out. I'd probably tackle it with substr in that case though.

        $p=0; print( substr $s, $p, 3 ), $p+=3 while $p < length $s;
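
The same substr walk can be spelled out under strict, one chunk per iteration. This is only a sketch: the sample string, the "Got:" output format and the $count counter are illustrative additions.

```perl
use strict;
use warnings;

my $s     = "atgcatccctttaat";            # sample input
my $count = 0;                            # number of chunks handled

# Step through the string three characters at a time; no array of
# triplets is built, only one chunk exists at any moment.
for (my $p = 0; $p < length($s); $p += 3) {
    my $chunk = substr $s, $p, 3;         # the last chunk may be shorter than 3
    print "Got: $chunk\n";
    $count++;
}
```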
