dear monks,

i have a very simple problem but can't work it out! I basically have two arrays, one contains a small number of unique id's, the second contains lots of sequences labelled with their id's.

I am simply trying to compare the unique sequence id's to those in the sequence file and extract the corresponding sequence.

I am getting confused trying to loop within two arrays. Please can someone show me where i'm getting it wrong??

# the uniq id's file looks like this: # gi|11995001:156374-156649 dbj|BA000040|:2701685-2702539 dbj|BA000040|:c8987046-8986282 gi|13488050:58289-58570 gi|13470324:5721573-5721854 # the corresponding sequence file looks like this: >gi|11995001:156374-156649, SMa0002 ATGGAGGCTGTTCCCATGAATGTAGACCTCTCACGGCGCAGCTTTTTGAAGCTGGCTGGAGCAGGGGCTG CGGCAACGTCACTCGGTGCGATGGGGTTTGGTGAGGCTGAGGCGGCGGTCGTCGCGCATGTCCGGCCTCA >dbj|BA000040|:2701685-2702539 GAAGGAGCCGATCTGGTCACCTTTTCCGGCGACAAGCTGCTGGGCGGTCCGCAGGCGGGTTTCATCGTCG GGCGCAGGGACCTGATCGCCGA # every unique_id has a corresonding sequence in @sequence # here is my attempt open (GENES, "$ARGV[1]") or die "unable to open file $!\n"; open (IDS, "$ARGV[0]") or die "unable to open file $!\n"; open (GENES, "$ARGV[1]") or die "unable to open file $!\n"; my @ids = <IDS>; my @genes = <GENES>; my $ids = join ('', @ids); @ids = split ('\n', $ids); my $genes = join ('', @genes); @genes = split ('>', $genes); my @accessions; foreach my $line (@file) { if ($line =~ /^(\w+\|\w+\.{0,1}\d{0,1}\|{0,1}:c{0,1}\d+\-\d+)/ +) { push @accessions, "$1"; } } # extract uniq id's my %seen=(); my @uniq = (); foreach my $item (@accessions) { unless ($seen{$item}) { $seen{$item}=1; push (@uniq, $item); } } # dig out the correspnding sequence for each id # THIS BIT NOT WORKING ;-( for (my $i=0; $i<@sequence; $i++) { foreach my $id (@uniq) { if ($sequence[$i] =~ /^$id/) { print "$id\n"; } } }

In reply to pattern matching and array comparison by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.