Giorgio C has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a question and hope someone can help me. I have a file with two columns, A and B, each containing a list of gene coordinates. I'd like to write to another file every element that is present in column A but not in B (that is, the non-common elements of the two columns). Do you have any suggestions (a little script in Perl)? Not Excel, because the list is huge. Thank you in advance, Giorgio

Re: NON Common Elements between two columns
by Corion (Patriarch) on Jul 17, 2012 at 09:58 UTC

    This is a FAQ. Please read perlfaq4 about "symmetric difference" and/or "duplicate".

      Yes, I've seen; but since I'm a novice I think I do not completely understand what the faq suggests. Could you give me a hand, please?
        Can you show an example of file (a couple of lines will be enough)?
        Sorry if my advice was wrong.
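    For a novice, the perlfaq4 idea boils down to using a hash as a lookup table. What follows is a minimal sketch of that idea, not the FAQ's exact code; the sub names only_in_first and symmetric_difference and the sample values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the elements of @$first that do not occur in @$second,
# keeping the original order and dropping repeats.
sub only_in_first {
    my ($first, $second) = @_;
    my %in_second = map { $_ => 1 } @$second;   # lookup table for the second list
    my %seen;
    return grep { !$in_second{$_} && !$seen{$_}++ } @$first;
}

# Elements that occur in exactly one of the two lists.
sub symmetric_difference {
    my ($first, $second) = @_;
    return (only_in_first($first, $second), only_in_first($second, $first));
}

# Hypothetical sample data, not taken from the real file.
my @col_a = qw(chr1:100-200 chr2:300-400 chr3:500-600);
my @col_b = qw(chr2:300-400 chr9:700-800);

print "Only in A: $_\n" for only_in_first(\@col_a, \@col_b);
print "In one column only: $_\n" for symmetric_difference(\@col_a, \@col_b);
```

    "Present in A but not in B" is the plain difference (only_in_first); the symmetric difference additionally reports what is in B but not in A.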
Re: NON Common Elements between two columns
by Anonymous Monk on Jul 17, 2012 at 13:04 UTC
    Also consider putting the two lists into, say, an SQLite database (file), at which point you can simply use a left outer join to find the non-matching elements from either or both lists. You might discover many benefits to putting these lists into SQLite tables. (There is no "database server" involved in this scenario.)
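    A rough sketch of the SQLite suggestion, assuming the DBI and DBD::SQLite modules are installed from CPAN. The table names col_a and col_b, the column name coord, and the sub name a_not_in_b are made up for illustration, and an in-memory database stands in for a database file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;   # assumption: DBI and DBD::SQLite are installed

# Load both columns into an in-memory SQLite database and return the
# values of column A that have no match in column B (left outer join).
sub a_not_in_b {
    my ($col_a, $col_b) = @_;
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE col_a (coord TEXT)');
    $dbh->do('CREATE TABLE col_b (coord TEXT)');

    my $ins_a = $dbh->prepare('INSERT INTO col_a (coord) VALUES (?)');
    my $ins_b = $dbh->prepare('INSERT INTO col_b (coord) VALUES (?)');
    $dbh->begin_work;                 # one transaction keeps bulk inserts fast
    $ins_a->execute($_) for @$col_a;
    $ins_b->execute($_) for @$col_b;
    $dbh->commit;

    # Rows of col_a without a join partner come back with b.coord NULL.
    my $rows = $dbh->selectcol_arrayref(q{
        SELECT a.coord
        FROM col_a AS a
        LEFT OUTER JOIN col_b AS b ON a.coord = b.coord
        WHERE b.coord IS NULL
        ORDER BY a.rowid
    });
    $dbh->disconnect;
    return @$rows;
}

# Hypothetical sample data, not taken from the real file.
print "$_\n" for a_not_in_b([qw(chr1:100-200 chr2:300-400)], [qw(chr2:300-400)]);
```

    Swapping the two tables in the query gives the elements that are only in column B.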
Re: NON Common Elements between two columns
by Cristoforo (Curate) on Jul 17, 2012 at 22:59 UTC
    aitap's solution works fine, but I did a version using only 1 hash (and 1 array).
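    #!/usr/bin/perl
    use strict;
    use warnings;

    my (@genes, %tested);
    while (<>) {
        s/^>//;                                # strip the leading '>' marker
        my ($col1, $col2) = split;
        next unless defined $col1;             # skip blank lines
        push @genes, $col1;                    # column A, in file order
        $tested{$col2}++ if defined $col2;     # column B may be missing on a line
    }

    {
        local $\ = "\n";                       # append a newline to every print
        for (@genes) {
            print if not $tested{$_};          # in column A but never in column B
        }
    }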
    When reading from the empty angle brackets, <>, the file to read has to be given on the command line (otherwise the program reads standard input). For this program, I had the contents in the file o33.txt.
    C:\Old_Data\perlp>type o33.txt
    >chr9:133738100-133738472_0 chr20:62159728-62161126_840
    >chr9:133738100-133738472_60 chr2:215589720-215676478_59220
    >chr9:133738100-133738472_120 chr2:215589720-215676478_59160
    >chr9:133738100-133738472_180 chr15:99500240-99507809_0
    >chr9:133738100-133738472_240 chr2:215589720-215676478_59100
    >chr9:133738100-133738472_253 chr1:162745876-162746210_215
    >chr9:133747466-133747650_0 chr5:108523084-108532592_960
    >chr9:133747466-133747650_60 chr20:62159728-62161126_900
    >chr9:133747466-133747650_65
    Then, my command line was the name of the program, t1.pl followed by the name of the file to read from, o33.txt.
    perl t1.pl o33.txt
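    Since the original question asked for the result in another file, here is a hedged variant of the same one-hash approach that opens the input and output files explicitly instead of relying on <> and standard output. The sub name write_a_not_in_b and the script name in the usage comment are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write every column-A value that never appears in column B to $out_file.
# Returns the number of lines written.
sub write_a_not_in_b {
    my ($in_file, $out_file) = @_;
    my (@col_a, %in_b);

    open my $in, '<', $in_file or die "Cannot read $in_file: $!";
    while (my $line = <$in>) {
        $line =~ s/^>//;                       # drop the leading '>' marker
        my ($col1, $col2) = split ' ', $line;
        next unless defined $col1;             # skip blank lines
        push @col_a, $col1;
        $in_b{$col2} = 1 if defined $col2;     # column B may be shorter
    }
    close $in;

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    my $written = 0;
    for my $gene (@col_a) {
        next if $in_b{$gene};                  # also present in column B
        print {$out} "$gene\n";
        $written++;
    }
    close $out or die "Cannot close $out_file: $!";
    return $written;
}

# Usage: perl a_not_in_b.pl input.txt output.txt   (script name is made up)
if (@ARGV == 2) {
    my $count = write_a_not_in_b(@ARGV);
    print "Wrote $count line(s) to $ARGV[1]\n";
}
```

    The whole file is held in memory (one array plus one hash), which should be acceptable for lists that are merely too large for a spreadsheet.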