in reply to variation on splitting a string into elements of an array
in thread splitting a sequence using unpack

Note that if you're using perl 5.8 or later, you can simplify your initial unpack to:
@triplets = unpack ('(a3)*', $line);
This works as long as the length of $line is a multiple of 3. If it isn't, the last element of @triplets will be a short leftover of one or two characters, which you can discard:
# throw away trailing crud
pop @triplets if $triplets[-1] !~ /.../;
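To make the leftover concrete, here is a minimal sketch using a made-up 8-character string (not from the original question). The extra `@triplets &&` guard is my own addition, assumed useful only to avoid a warning when the input is empty:

```perl
use strict;
use warnings;

# Hypothetical sample: 8 characters, so the last group is short.
my $line = 'atccatcc';

my @triplets = unpack('(a3)*', $line);
print join(',', @triplets), "\n";    # atc,cat,cc

# Drop the last element if it is not a full triplet.
# The "@triplets &&" guard is an assumption added for empty input.
pop @triplets if @triplets && $triplets[-1] !~ /.../;
print join(',', @triplets), "\n";    # atc,cat
```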
That gives the following answer to your original question (very similar to what's already been posted):
my $line = 'atccatccctttaat';
my @triplets  = unpack( '(a3)*', $line);
my @triplets2 = unpack( 'x(a3)*', $line);
my @triplets3 = unpack( 'xx(a3)*', $line);
# throw away trailing crud
pop @triplets  if $triplets[-1]  !~ /.../;
pop @triplets2 if $triplets2[-1] !~ /.../;
pop @triplets3 if $triplets3[-1] !~ /.../;
If you want to start reading at some arbitrary point, you can do:
my @triplets = unpack("x${skip}(a3)*", $line);
pop @triplets if $triplets[-1] !~ /.../;
Of course, it might just be easier to replace $line with substr($line,$skip).
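As a rough sketch of that alternative, the two approaches can be put side by side. The value of `$skip` and the names `@via_x` and `@via_substr` are chosen here purely for illustration:

```perl
use strict;
use warnings;

my $line = 'atccatccctttaat';
my $skip = 2;    # hypothetical offset, chosen for illustration

# Skip bytes inside the template with 'x'...
my @via_x = unpack("x${skip}(a3)*", $line);

# ...or trim the string first and use the plain template.
my @via_substr = unpack('(a3)*', substr($line, $skip));

# Throw away a short trailing element in either list.
pop @via_x      if @via_x      && length($via_x[-1])      != 3;
pop @via_substr if @via_substr && length($via_substr[-1]) != 3;

print "x:      @via_x\n";
print "substr: @via_substr\n";
```

For this input both lists should come out the same, so the choice is mostly about which reads better in context.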
-- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/

Re^2: variation on splitting a string into elements of an array
by bageler (Hermit) on Mar 02, 2005 at 17:49 UTC
    Since you had repeating code, my preference is to put it in a loop. I'd also use length instead of /.../. It's computationally less expensive, which I assume matters if you're parsing lots of sequences.
    my $line = 'atccatccctttaat';
    my %triplets;
    for (0 .. 2) {
        @{$triplets{$_}} = unpack(('x' x $_).'(a3)*',$line);
        pop @{$triplets{$_}} if length($triplets{$_}->[-1]) != 3;
        print "Offset $_: @{$triplets{$_}}\n";
    }
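Whether length is actually cheaper than the regex on a given perl is something you can measure rather than assume. The following is only a sketch using the core Benchmark module. The sample strings are made up, and no timings are claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical samples: one full triplet and some short leftovers.
my @samples = ('atc', 'at', 'a', '');

# Sanity check: both tests should agree on every sample.
for my $s (@samples) {
    my $by_regex  = ($s =~ /.../)      ? 1 : 0;
    my $by_length = (length($s) == 3)  ? 1 : 0;
    die "mismatch on '$s'" unless $by_regex == $by_length;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { my $n = grep { $_ =~ /.../ }     @samples },
    length => sub { my $n = grep { length($_) == 3 } @samples },
});
```

One difference worth keeping in mind: `.` does not match a newline by default, so the two tests can disagree on a three-character chunk that contains one.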
      Yeah, I thought about a loop, but I was afraid it would obscure what the code is doing. I'd certainly switch to a loop if I were doing more than three or four repetitions, or more than two statements per offset, but below that the loop syntax just clutters things up. (I realize that this is a matter of personal taste, and I might change my answer depending on my mood.)

      Good point about using length. Again, I think my way is clearer, but I'm not sure whether what makes yours less clear to me is the use of length or the -> (which you could drop).

      By the way, I would have written the first line of your loop as:

      $triplets{$_} = [unpack("x$_ (a3)*", $line)];
      But that's only because I don't like using @{$unusedvarref} to auto-vivify.
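For comparison, here is a small sketch of the two styles side by side. The hash names `%by_deref` and `%by_ref` are invented for this example. As far as I can tell, both end up storing the same data:

```perl
use strict;
use warnings;

my $line = 'atccatccctttaat';
my (%by_deref, %by_ref);    # hypothetical names, for illustration only

for my $offset (0 .. 2) {
    # Style 1: dereference a slot that does not exist yet;
    # perl autovivifies the array reference on assignment.
    @{ $by_deref{$offset} } = unpack("x$offset (a3)*", $line);

    # Style 2: build an anonymous array and store the reference explicitly.
    $by_ref{$offset} = [ unpack("x$offset (a3)*", $line) ];
}

print "$_: @{ $by_ref{$_} }\n" for sort keys %by_ref;
```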
      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/