Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I am new to Perl and I am facing my first serious problem.
I have 2 files, say FILE1 and FILE2, which both have this format:
>name1
ASDFGHTHHJYJYJYJYRGRGRGEERHTE
>name2
EGREGRGREGRHHTHTHTRHTHTRHTHRT
>name3
TRJYTJYTHREGRGWGEFVEFFRFREFRE

I want to read the first file, find which of its sequences (the strings of letters under >name1, >name2, etc.) are not found in FILE2, and print them.
I am giving an example:
[FILE1]
>name1
ABCDEFGHIJKLMNOPQ
>name2
RHREGRHRTHTRHTRHTHRTH
>name3
REHTRREFWDCEWCFEWWGREGREGRHTGREWFWE
>name4
EWDWQEREWREWTEW
##############################################
[FILE2]
>new_name1
REHEFWEGTRJTYIJYTHTGRER
>new_name2
ABCDEFGHIJKLMNOPQ
>new_name3
DVFHTGVCVTRYITYYTRETEWTRE
In the above example I want to read all the sequences in FILE1 and print name2, name3 and name4 (each along with its sequence), but not name1, because its sequence also appears in FILE2. What can I do?

Replies are listed 'Best First'.
Re: Silly newbie question on comparing files
by GrandFather (Saint) on Jan 20, 2008 at 20:27 UTC

    Perl has a magical data type called a "hash" (or "associative array", see perldata) which is frequently used to determine whether items are unique in some fashion. If your files are of modest size (up to 100 MB or so), the common technique is to populate a hash with the key sequences from the smaller file, then check whether the key sequences from the other file exist in the hash.
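
A minimal sketch of that hash technique follows. It is not code from this thread: the sub names read_records and unmatched_records are made up for illustration, and it assumes each record is a ">name" header line followed by one or more sequence lines, with sequences compared as exact whole strings. Because the question asks for FILE1 sequences that are missing from FILE2, the hash here is always built from FILE2, whichever file happens to be smaller.

```perl
use strict;
use warnings;

# Hypothetical helper: read ">name" / sequence records from a file and
# return them, in file order, as a list of [name, sequence] pairs.
sub read_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;              # drop newline and trailing whitespace
        next if $line eq '';            # skip blank lines
        if ($line =~ /^>\s*(\S+)/) {
            push @records, [$1, ''];    # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;   # a sequence may span several lines
        }
    }
    close $fh;
    return @records;
}

# Hypothetical helper: return the records of $file1 whose sequence does
# not occur anywhere in $file2.
sub unmatched_records {
    my ($file1, $file2) = @_;
    my %seen;                                        # sequence => 1
    $seen{ $_->[1] } = 1 for read_records($file2);
    return grep { !exists $seen{ $_->[1] } } read_records($file1);
}

# Run as: perl script.pl FILE1 FILE2
if (@ARGV == 2) {
    my ($file1, $file2) = @ARGV;
    print ">$_->[0]\n$_->[1]\n" for unmatched_records($file1, $file2);
}
```

Each lookup in %seen takes roughly constant time, so the work grows with the combined size of the two files rather than with the number of sequence pairs. If sequences should match regardless of letter case, one option would be to apply uc to each sequence both when storing it and when looking it up.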


    Perl is environmentally friendly - it saves trees
Re: Silly newbie question on comparing files
by cosmicperl (Chaplain) on Jan 20, 2008 at 19:44 UTC
    You want to look into diff tools. A quick Google search for "perl diff" brings up loads of material on doing this. CPAN also has a lot; try searching CPAN for "diff".