Perl_girl has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm quite new to programming, and have a bit of a problem that I can't work out! Basically, I need to go through one array and pull out the entries that match another. I have tried all sorts of different match and grep scripts, but just keep going round in circles! I'm sure it's all a mess, but any help would be much appreciated!

What I have includes:
1. List of Gene names and array probe names
2. Data table of these same probe names followed by values

What I need to end up with is ideally a reproduction of the second table with a column containing the gene name for the line of data. There are multiple data entries for the gene/probe names.

My current attempt, which prints out the data table again with an added column of tabs:

    #! usr/bin/perl
    use strict;
    use warnings;

    open (NAME, "found.txt") or die;
    open (DATA, "Array_values.txt") or die "$!\n";
    open (OUT, ">Array_fin.txt") or die "$!\n";

    my @line = <DATA>;
    my @probe_ID;
    my @probe_name;
    my @gene;
    my $line;
    my $probe_name;
    my $i = (0 .. 13005); #The number of genes and probe names (whereas there are 389308 data entries, including header line)

    while (<NAME>)
    {
        my @col = split(/\t/,$_); #split sequence names into gene and probe
        $probe_name = $col[1];
        my $gene = $col[0];
        push (@probe_name, $probe_name);
        push (@gene, $gene);
    }

    foreach $line (@line) #take each line of data
    {
        my @data = split(/\t/,$line); #extract probe name
        my $probe_ID = $data[0];
        # push (@probe_ID, $probe_ID);
        if ($probe_ID =~ m/$probe_name[$i]/) #match the current entry with the list of probe names
        {
            print OUT "$gene[$i]\t$line\n"; #print the relavant gene name and data
        }
        else
        {
            print OUT "no match\n";
        }
    }
    close

Re: Match within array
by moritz (Cardinal) on Jul 06, 2010 at 11:59 UTC
    my $i = (0 .. 13005);

    That's where your problems start; the code is not doing what you expect. Using 0 .. 13005 in scalar context is not a range, but a flip flop, so $i now contains 1. Surprise, surprise.
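
A minimal sketch of the difference between the two contexts, assuming nothing has been read from a filehandle yet (so `$.` is undefined). The variable names are made up for illustration; only the `0 .. 13005` expression comes from the original script.

```perl
use strict;
use warnings;

# List context: the range operator produces the list 0, 1, ..., 13005.
my @indices = (0 .. 13005);
my $count   = scalar @indices;   # number of elements in the list
my $last    = $indices[-1];      # last element of the list

# Scalar context: the same expression is a flip-flop, not a range.
my $scalar_result;
{
    no warnings 'uninitialized'; # $. is undef because no file has been read
    $scalar_result = (0 .. 13005);
}

# To visit every index, loop over the range instead of storing it in a scalar.
my $visited = 0;
for my $i (0 .. 13005) {
    $visited++;
}

print "count=$count last=$last scalar=$scalar_result visited=$visited\n";
```

The loop at the end is the usual way to step through every index; a single scalar can only ever hold one of them.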

    if ($probe_ID =~ m/$probe_name[$i]/) #match the current entry with the list of probe names

    Consequently this doesn't do what you want it to - it just checks against $probe_name[1], not all the probe names.

    One approach is to build a regex instead:

        # after the first loop:
        my $probes_regex = join '|', @probe_name;

        # you don't need to store all lines in @line, it only wastes memory
        while (my $line = <DATA>) {
            my @data = split /\t/, $line;
            if ($data[0] =~ $probes_regex) {
                # note: $gene[$i] is kept from the original; it still needs a per-probe gene lookup
                print OUT "$gene[$i]\t$line\n";
            }
            else {
                print "no match\n";
            }
        }

    But if you don't want or need regex matches, but rather exact matches, a hash is much more efficient.
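
To illustrate the difference between a regex match and an exact match, here is a small sketch. The probe names and the test ID are hypothetical, and `%is_probe` is a name introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical probe names, for illustration only.
my @probe_name = ('A_23_P1', 'A_23_P2');

# Unanchored alternation: matches any ID that merely contains a probe name.
my $probes_regex = join '|', map { quotemeta($_) } @probe_name;

# Hash keyed by probe name: only exact names are found.
my %is_probe = map { $_ => 1 } @probe_name;

my $id = 'A_23_P10';   # not in the list, but contains 'A_23_P1'

my $regex_hit = $id =~ /$probes_regex/ ? 1 : 0;
my $hash_hit  = exists $is_probe{$id}  ? 1 : 0;

print "regex: $regex_hit, hash: $hash_hit\n";
```

The regex reports a hit because `A_23_P1` is a substring of `A_23_P10`, while the hash lookup does not. Anchoring the pattern would fix the false hit, but a hash lookup also avoids trying every alternative for each data line.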

    In general it pays off to tell us what your data looks like; there might be ways to speed up or simplify the process quite a bit.

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Match within array
by ww (Archbishop) on Jul 06, 2010 at 12:59 UTC
    For future reference, this is a common inquiry... aka FAQ.

    You'll often find answers to such questions in the Monastery's fine Tutorials or Q&A sections, or by use of Super_Search.

Re: Match within array
by RMGir (Prior) on Jul 06, 2010 at 11:51 UTC
    I think you'd find walking through your script with the perl debugger instructive. For instance, $i has the scalar value 13006, I'll bet.

    I think what you need is just a hash, using the probe names as keys, and the genes as values. Then your regex match on $probe_ID becomes a simple hash lookup...
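
A rough sketch of that hash approach, under some assumptions: both inputs are tab-separated, the name list has the gene in the first column and the probe name in the second (as in the original script), and the probe ID is the first column of each data line. The sub names `build_gene_lookup` and `annotate` and the sample rows are invented for this example; in practice the lines would come from found.txt and Array_values.txt.

```perl
use strict;
use warnings;

# Build a probe-name => gene lookup from "gene<TAB>probe" lines.
sub build_gene_lookup {
    my @name_lines = @_;
    my %gene_for;
    for my $line (@name_lines) {
        chomp $line;                          # drop the newline after the probe name
        my ($gene, $probe) = split /\t/, $line;
        $gene_for{$probe} = $gene;
    }
    return \%gene_for;
}

# Prefix each data line with its gene name, or "no match" if the probe is unknown.
sub annotate {
    my ($gene_for, @data_lines) = @_;
    my @out;
    for my $line (@data_lines) {
        chomp $line;
        my ($probe_id) = split /\t/, $line;   # first column is the probe ID
        my $gene = exists $gene_for->{$probe_id}
                 ? $gene_for->{$probe_id}
                 : 'no match';
        push @out, "$gene\t$line";
    }
    return @out;
}

# Made-up sample rows standing in for the two input files.
my @names = ("GENE1\tprobe_a\n", "GENE2\tprobe_b\n");
my @data  = ("probe_b\t0.5\t1.2\n", "probe_a\t3.1\t0.7\n", "probe_x\t9.9\t9.9\n");

my @annotated = annotate(build_gene_lookup(@names), @data);
print "$_\n" for @annotated;
```

Each data line costs one hash lookup, however many probe names there are. The `chomp` matters here: if the probe name is the last column of the name list, it would otherwise carry a trailing newline and never equal the ID from the data table.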


    Mike
      For instance, $i has the scalar value 13006, I'll bet.

      Bet lost :-)

      $ perl -wle 'my $i = (0..13005); print $i'
      Use of uninitialized value in range (or flip) at -e line 1.
      Use of uninitialized value in range (or flop) at -e line 1.
      1

      It's not evaluated as a list (of which the last element is taken), but rather as a flip flop that matches against $. (the current input line number).
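
A small sketch of what the flip-flop with constant operands is normally used for: selecting a range of input lines by line number. The in-memory file and the variable names are assumptions made for this example.

```perl
use strict;
use warnings;

# An in-memory "file" with five lines, so the example needs no real file.
my $text = "line 1\nline 2\nline 3\nline 4\nline 5\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @picked;
while (my $line = <$fh>) {
    chomp $line;
    # With constant operands, scalar .. compares each operand against $.,
    # so the condition is true from input line 2 through input line 4.
    push @picked, $line if 2 .. 4;
}
close $fh;

print join(', ', @picked), "\n";   # prints "line 2, line 3, line 4"
```

In the one-liner above nothing has been read, so `$.` is undefined; it compares equal to 0 with a warning, which is why the flip-flop switches on and yields 1.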

        Gah, foiled by the scalar flip-flop again :)

        Mike