in reply to Picking up Values By Group

Mixing tab delimiters with space-delimited data is a really bad idea, and if you have any choice in the matter, you should change it.

On the basis that you don't have the choice, the following should work. But realise that the tabs I've embedded in the data will likely have been corrupted in the process of upload and download, and by the wrapping etc. that PM does to code:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

my %data;
while( <DATA> ) {
    my( $str, @values ) = map{ s[^\s+|\s+$][]g; $_ } split "\t";
    $data{ $str } = [ map [ split ' ' ], @values ];
}
pp %data;

my @output;
while( my( $key, $valueRef ) = each %data ) {
    my @required;
    for my $c ( 0 .. length( $key ) - 1 ) {
        push @required, $valueRef->[ $c ][ index "ACGT", substr $key, $c, 1 ];
    }
    push @output, \@required;
}
pp \@output;

__DATA__
AGAC	9 -29 -39 -37	27 -28 -39 -37	26 -27 -39 -37	27 -27 -39 12
ACGT	1 -2 3 -4	5 -6 7 -8	9 -10 11 -12	13 -14 15 -16

Output:

c:\test>junk6
(
  "AGAC",
  [
    [9, -29, -39, -37],
    [27, -28, -39, -37],
    [26, -27, -39, -37],
    [27, -27, -39, 12],
  ],
  "ACGT",
  [
    [1, -2, 3, -4],
    [5, -6, 7, -8],
    [9, -10, 11, -12],
    [13, -14, 15, -16],
  ],
)
[[9, -39, 26, -27], [1, -6, 11, -16]]

This is the same, but I've substituted the text '<TAB>' for the tab character, which should make it easier to try:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

my %data;
while( <DATA> ) {
    my( $str, @values ) = map{ s[^\s+|\s+$][]g; $_ } split '<TAB>';
    $data{ $str } = [ map [ split ' ' ], @values ];
}
pp %data;

my @output;
while( my( $key, $valueRef ) = each %data ) {
    my @required;
    for my $c ( 0 .. length( $key ) - 1 ) {
        push @required, $valueRef->[ $c ][ index "ACGT", substr $key, $c, 1 ];
    }
    push @output, \@required;
}
pp \@output;

__DATA__
AGAC <TAB> 9 -29 -39 -37 <TAB> 27 -28 -39 -37 <TAB> 26 -27 -39 -37 <TAB> 27 -27 -39 12
ACGT <TAB> 1 -2 3 -4 <TAB> 5 -6 7 -8 <TAB> 9 -10 11 -12 <TAB> 13 -14 15 -16

The output is the same as above.

Like I say, if you have any influence over the file format, change the tab delimiters to something visible that does not match "\s".
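
A minimal sketch of that suggestion, assuming the file could be regenerated with '|' separating the groups. The choice of '|' and the helper name parse_line are assumptions for illustration only; neither comes from the original data:

```perl
#! perl -w
use strict;

# Hypothetical helper: parse one record whose groups are separated by '|'
# (an assumed stand-in for the tab) and whose values are space separated.
sub parse_line {
    my( $line ) = @_;

    # Split on the visible delimiter, then trim each field in place.
    my @fields = split /\|/, $line;
    s/^\s+|\s+$//g for @fields;

    # The first field is the key; each remaining field becomes an array
    # of its whitespace-separated values.
    my( $str, @groups ) = @fields;
    return ( $str, [ map { [ split ' ' ] } @groups ] );
}

my( $key, $groups ) = parse_line(
    "AGAC | 9 -29 -39 -37 | 27 -28 -39 -37 | 26 -27 -39 -37 | 27 -27 -39 12\n"
);
print "$key: ", join( ' / ', map { "@$_" } @$groups ), "\n";
```

Because '|' is not matched by \s, trimming whitespace or splitting on ' ' can never swallow the group separator, which is the point of choosing a visible delimiter.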


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Picking up Values By Group
by holli (Abbot) on Feb 04, 2009 at 16:35 UTC
    I would use a lookup hash for the indices instead of calling index repeatedly.
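
A minimal sketch of what such a lookup hash might look like. The names %index_of and pick_values are illustrative assumptions, and the sample groups are copied from the AGAC record in the parent post:

```perl
#! perl -w
use strict;

# Map each base to its column position once, rather than calling index()
# for every character of every key.
my %index_of = ( A => 0, C => 1, G => 2, T => 3 );

# Hypothetical helper: for the c-th character of $key, pick the value in
# the matching column of the c-th group.
sub pick_values {
    my( $key, $groups ) = @_;
    my @chars = split //, $key;
    return [ map { $groups->[ $_ ][ $index_of{ $chars[ $_ ] } ] } 0 .. $#chars ];
}

my $picked = pick_values( 'AGAC', [
    [  9, -29, -39, -37 ],
    [ 27, -28, -39, -37 ],
    [ 26, -27, -39, -37 ],
    [ 27, -27, -39,  12 ],
] );
print "@$picked\n";    # prints: 9 -39 26 -27
```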


    holli

    When you're up to your ass in alligators, it's difficult to remember that your original purpose was to drain the swamp.
      instead of calling index repeatedly.

      The tradeoff is:

      • scanning a 4 character string for a single character.
      • hashing a single character to a 32-bit hash and then performing a modulo 4 operation upon it.

      Which actually favours the former:

      #! perl -slw
      use strict;
      use Benchmark qw[ cmpthese ];

      our %lookup = ( A=>0, B=>1, C=>2, D=>3 );
      our $input = 'ACGT' x 1000;

      cmpthese -1, {
          index => q[
              our( %lookup, $input );
              my $n;
              $n = index "ACGT", $_ for split '', $input;
          ],
          hash => q[
              our( %lookup, $input );
              my $n;
              $n = $lookup{ $_ } for split '', $input;
          ],
      };

      __END__
      c:\test>junk5
              Rate  hash index
      hash   107/s    --  -21%
      index  135/s   27%    --
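
Note that the lookup table as posted is keyed on A, B, C and D, so the G and T characters in the input are never found in it. Below is a sketch of the same comparison with the table keyed on the four characters that actually occur, plus a sanity check that both approaches agree. The check and the $mismatches name are additions for illustration; the timings shown above were measured with the original table, and no claim is made about what this variant reports:

```perl
#! perl -w
use strict;
use Benchmark qw[ cmpthese ];

# Keys match the characters that actually occur in the input.
our %lookup = ( A => 0, C => 1, G => 2, T => 3 );
our $input  = 'ACGT' x 1000;

# Sanity check: count the characters for which index() and the hash disagree.
my $mismatches = grep { index( 'ACGT', $_ ) != $lookup{ $_ } } split //, $input;
print "mismatches: $mismatches\n";

# Same two candidates as the benchmark above, each run for at least 1 CPU second.
cmpthese( -1, {
    index => q[
        our( %lookup, $input );
        my $n;
        $n = index 'ACGT', $_ for split //, $input;
    ],
    hash => q[
        our( %lookup, $input );
        my $n;
        $n = $lookup{ $_ } for split //, $input;
    ],
} );
```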
