You didn't mention which part of the problem you are having trouble with. I'm going to assume that you already know enough Perl to open files, so my solution assumes that you've already got the contents of the files in an array of some sort. Because of this assumption (made necessary by the lack of detail in your question), you will have to adapt this solution to your needs.

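use strict;
use warnings;

my @genes = qw( Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 );

# Each string stands in for the contents of one file.
my @raw_files = (
    "Gene1 Gene2 Gene3",
    "Gene2 Gene3 Gene4",
    "Gene3 Gene4 Gene5",
);

# One lookup hash per file: the keys are the genes found in that file.
my @gene_in_files = map {
    my %content;
    @content{ split " ", $_ } = ();
    \%content;
} @raw_files;

# One row per gene, one column per file; ~~ turns the result of
# exists into a plain 1 or 0.
my @gene_matrix = map {
    my $gene = $_;
    [ map { ~~exists $_->{$gene} } @gene_in_files ]
} @genes;

print "Gene", $_+1, " @{$gene_matrix[$_]}\n" for 0 .. $#gene_matrix;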

This solution puts the contents of each file into a hash so that it can be quickly determined if Gene1 can be found in File1. Then it just iterates over the genes, and tests each file to see if the gene is found in the file. If so, it sets that gene's flag for that file to 1 in the gene matrix. Otherwise, it sets the flag to 0.
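If your data lives in real files rather than in strings, here is a rough sketch of how the per-file lookup hashes could be built directly from filehandles. The helper names genes_in_handle and genes_in_files are made up for illustration, and I'm assuming each file is just whitespace-separated gene names; adjust the parsing if your format differs.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every whitespace-separated token
# read from a filehandle into a lookup hash.
sub genes_in_handle {
    my ($fh) = @_;
    my %content;
    while ( my $line = <$fh> ) {
        @content{ split " ", $line } = ();
    }
    return \%content;
}

# Hypothetical helper: open each named file and build its lookup hash.
sub genes_in_files {
    my @paths = @_;
    return map {
        open my $fh, '<', $_ or die "Cannot open $_: $!";
        genes_in_handle($fh);
    } @paths;
}

# Demonstration with an in-memory filehandle, so no real file is needed.
my $sample = "Gene1 Gene2\nGene3\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $seen = genes_in_handle($fh);
print join( " ", sort keys %$seen ), "\n";
```

The list returned by genes_in_files could take the place of @gene_in_files in the solution above.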

If your requirement is that you use actual bits rather than an array of 1's and 0's, that too is pretty simple, but I'm going to assume that you know how to read the documentation for vec, and are able to adapt the solution to fit that need.
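For what it's worth, here is a minimal sketch of how the vec variant might look, using the same sample data. The sub name gene_bit_vectors and the choice of one bit string per gene (bit N corresponds to file N) are my own assumptions, not a requirement of the original question.

```perl
use strict;
use warnings;

# Build one bit string per gene: bit $i is 1 when the gene is in file $i.
# (gene_bit_vectors is a hypothetical name chosen for this sketch.)
sub gene_bit_vectors {
    my ( $genes, $raw_files ) = @_;

    # Same per-file lookup hashes as in the array-based version.
    my @gene_in_files = map {
        my %content;
        @content{ split " ", $_ } = ();
        \%content;
    } @$raw_files;

    my %bits_for;
    for my $gene (@$genes) {
        my $bits = '';
        for my $i ( 0 .. $#gene_in_files ) {
            vec( $bits, $i, 1 ) = exists $gene_in_files[$i]{$gene} ? 1 : 0;
        }
        $bits_for{$gene} = $bits;
    }
    return \%bits_for;
}

my @genes     = qw( Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 );
my @raw_files = ( "Gene1 Gene2 Gene3", "Gene2 Gene3 Gene4", "Gene3 Gene4 Gene5" );

my $bits_for = gene_bit_vectors( \@genes, \@raw_files );

# Read the bits back with vec to print one row per gene.
for my $gene (@genes) {
    my @row = map { vec( $bits_for->{$gene}, $_, 1 ) } 0 .. $#raw_files;
    print "$gene @row\n";
}
```

With three files, each gene's flags fit in a single byte, which is where this approach starts to pay off if you have a lot of genes.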

Here is the output from my example script:

Gene1 1 0 0
Gene2 1 1 0
Gene3 1 1 1
Gene4 0 1 1
Gene5 0 0 1
Gene6 0 0 0

Also, when you're trying to show us tabular input and output, I suggest that you simply wrap it in <code></code> tags. It's easier to maintain fixed column widths when you don't have to worry about how HTML gobbles up duplicated whitespace, and you won't have to put <br /> after each line of tabular data. See Writeup Formatting Tips. By way of example, when I posted my sample output, I did this:

<code> [shift-insert, to paste output from my terminal] </code>

Update: Simplified the solution by eliminating temp variables holding various stages of the data transform.

Update2: And here's my "just for fun" version:

use strict;
use warnings;

my @genes = qw( Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 );
my @raw_files = (
    "Gene1 Gene2 Gene3",
    "Gene2 Gene3 Gene4",
    "Gene3 Gene4 Gene5"
);

my $gene_num = 1;
print "Gene", $gene_num++, " @{$_}\n" for sub {
    # The leading + makes Perl parse the inner braces as an anonymous
    # hash constructor rather than as a bare block.
    my @in_file = map { +{ map { $_ => 0 } split " ", $_ } } @{+shift};
    map {
        my $gene = $_;
        [ map { ~~exists $_->{$gene} } @in_file ]
    } @{+shift};
}->( \@raw_files, \@genes );

Dave


In reply to Re: Creating a binary matrix by davido
in thread Creating a binary matrix by perl_user123
