siskos1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I haven't used Perl for a while and have forgotten a lot, so this may be an easy question; sorry for that. My problem is that I am creating an 8000x8000 two-dimensional array. There are some proteins; if two of them interact, their corresponding value will be 1, otherwise 0. The interacting proteins are given pairwise in a tab-delimited text file, one pair per row. So I have thought of something like this:
$/ = undef;
my $string = <IN>;
my @all = split /\t|\n/, $string;
my @array;
for ($i = 0; $i < $#all; $i += 2) {
    $array[$all[$i]][$all[$i+1]] = 1;
    $array[$all[$i+1]][$all[$i]] = 1;
}
So my question is: how can I use strings inside array brackets, like $array[ proteinA ][ proteinB ]? If I am not wrong, it is possible without using "strict". The code will be used just a couple of times, so the easier way is better. I would also like to hear if there are other ways. Thanks for reading.

Replies are listed 'Best First'.
Re: how can i use characters in 2 dimensional array brackects
by BrowserUk (Patriarch) on Jun 13, 2010 at 10:31 UTC

    You can't use strings to index arrays. But you could use a 2D hash instead.

    The problem is, an 8000x8000 array is going to consume close to 2GB of RAM, and a 2D hash of the same dimensions considerably more.

    However, since you are only doing a boolean test, there is no need to actually store any values in the hash. Hashes have a very nice exists test.

    So, instead of storing 0 or 1 for each pair, you could make a sparse 2D hash by only creating keys (but assigning no value!) for those that are true. Then your test simply becomes if( exists $hash{ $first }{ $second } ){ ... which, if the ratio of true to false is less than ~50%, will save you memory relative to an array (~1.7GB vs. ~2GB).
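    As a rough illustration of the sparse 2D hash described above, here is a minimal sketch. The sample pairs, the hash name %interacts and the helper pair_interacts() are made up for the example; in real use the lines would be read from the tab-delimited file.

```perl
use strict;
use warnings;

# Hypothetical sample input: one interacting pair per line, tab-separated.
my @lines = ( "proteinA\tproteinB\n", "proteinB\tproteinC\n" );

my %interacts;
for my $line (@lines) {
    chomp $line;
    my ( $first, $second ) = split /\t/, $line;
    next unless defined $second;            # skip blank or malformed lines

    # Create the keys only; no value is stored (it stays undef).
    $interacts{$first}{$second} = undef;
    $interacts{$second}{$first} = undef;    # the relation is symmetric
}

# Returns 1 if the pair was recorded, 0 otherwise.
# Testing the outer key first avoids autovivifying $interacts{$x}.
sub pair_interacts {
    my ( $x, $y ) = @_;
    return ( exists $interacts{$x} && exists $interacts{$x}{$y} ) ? 1 : 0;
}

print pair_interacts( 'proteinA', 'proteinB' ), "\n";    # 1
print pair_interacts( 'proteinA', 'proteinC' ), "\n";    # 0
```

    Only pairs that actually occur in the input take up space, and the protein names are used directly as keys, so no numeric IDs are needed.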

    But, for best speed and minimal memory consumption, whenever I see "0 or 1" I think bit vectors. An array of 8000 bitstrings, each with 8000 bits (1000 bytes) requires just 8MB:

    use Devel::Size qw( total_size );   # total_size() comes from Devel::Size
    @vecs = ('') x 8000;
    for my $_1 ( 0 .. 7999 ) {
        vec( $vecs[ $_1 ], $_, 1 ) = ( rand() > 0.5 ) for 0 .. 7999;
    }
    print total_size \@vecs;
    # output: 8494224

    And lookups are very fast:

    $t = time;
    for my $_1 ( 0 .. 7999 ) {
        vec( $vecs[ $_1 ], $_, 1 ) ? ++$true : ++$false for 0 .. 7999;
    }
    print "$true $false", time() - $t;
    # output: 31998302 32001698 15.6879999637604

    At 4 million lookups/second, it beats hashes and probably arrays too, and definitely so if their size pushes you into swapping.
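    Bit vectors are indexed by number, so to use them with protein names each name first has to be mapped to an integer. The following is only a sketch of one way to do that, not part of the benchmark above; the sample pairs and the names %index_of, index_for() and pair_interacts() are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical sample pairs; in practice these come from the input file.
my @pairs = ( [ 'proteinA', 'proteinB' ], [ 'proteinB', 'proteinC' ] );

my %index_of;    # protein name => row/column number
my @vecs;        # one bit string per protein

# Hand out the next free integer the first time a name is seen.
sub index_for {
    my ($name) = @_;
    if ( !exists $index_of{$name} ) {
        my $next = keys %index_of;    # current number of known names
        $index_of{$name} = $next;
    }
    return $index_of{$name};
}

for my $pair (@pairs) {
    my ( $i, $j ) = map { index_for($_) } @$pair;
    $vecs[$_] //= '' for $i, $j;      # make sure both rows exist
    vec( $vecs[$i], $j, 1 ) = 1;      # set bit j in row i
    vec( $vecs[$j], $i, 1 ) = 1;      # and the symmetric bit
}

# Returns 1 if the pair was recorded, 0 otherwise (including unknown names).
sub pair_interacts {
    my ( $x, $y ) = @_;
    return 0 unless ( exists $index_of{$x} && exists $index_of{$y} );
    return vec( $vecs[ $index_of{$x} ], $index_of{$y}, 1 );
}

print pair_interacts( 'proteinA', 'proteinB' ), "\n";    # 1
print pair_interacts( 'proteinA', 'proteinC' ), "\n";    # 0
```

    The hash here holds only one entry per protein (about 8000 in the original question) instead of one per pair, and reading a bit beyond the end of a row simply yields 0.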


Re: how can i use characters in 2 dimensional array brackects
by Corion (Patriarch) on Jun 13, 2010 at 09:16 UTC

    Arrays use numbers as indices, not strings. Maybe you want to declare constants to access your array indices, or maybe you want to use a hash instead?
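    A minimal sketch of the constants idea, assuming just three made-up protein names. With thousands of proteins, declaring one constant per name is not practical, so this only illustrates the mechanism.

```perl
use strict;
use warnings;

# Hypothetical names; each constant is simply a numeric array index.
use constant {
    PROTEIN_A => 0,
    PROTEIN_B => 1,
    PROTEIN_C => 2,
};

my @array;
$array[PROTEIN_A][PROTEIN_B] = 1;
$array[PROTEIN_B][PROTEIN_A] = 1;

# Unset cells are undef, which is false, so they count as "no interaction".
for my $other ( PROTEIN_B, PROTEIN_C ) {
    my $flag = $array[PROTEIN_A][$other] ? 1 : 0;
    print "$flag\n";
}
```

    This keeps a plain numeric array while letting the code refer to rows and columns by name.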