Silly newbie question on comparing files

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I am new to Perl and I am facing my first serious problem.
I have 2 files, say FILE1 and FILE2, which both have this format:

>name1
ASDFGHTHHJYJYJYJYRGRGRGEERHTE
>name2
EGREGRGREGRHHTHTHTRHTHTRHTHRT
>name3
TRJYTJYTHREGRGWGEFVEFFRFREFRE

I want to read the first file, find which of its sequences (the strings of letters below >name1, >name2 etc.) are not found in FILE2, and print them.
Here is an example:

[FILE1]
>name1
ABCDEFGHIJKLMNOPQ
>name2
RHREGRHRTHTRHTRHTHRTH
>name3
REHTRREFWDCEWCFEWWGREGREGRHTGREWFWE
>name4
EWDWQEREWREWTEW
##############################################
[FILE2]
>new_name1
REHEFWEGTRJTYIJYTHTGRER
>new_name2
ABCDEFGHIJKLMNOPQ
>new_name3
DVFHTGVCVTRYITYYTRETEWTRE

In the above example, I want to read all sequences in FILE1 and print name2, name3 and name4 (each along with its sequence), because only name1 has a sequence that also appears in FILE2. What can I do?


Replies are listed 'Best First'.
Re: Silly newbie question on comparing files by GrandFather (Saint) on Jan 20, 2008 at 20:27 UTC
Perl has a magical data type called a "hash" (or "associative array", see perldata), which is frequently used to determine whether items are unique in some fashion. If your files are of modest size (up to about 100 MB, maybe), the common technique is to populate a hash with the key sequences from the smaller file, then check whether the key sequences from the other file exist in the hash.
Perl is environmentally friendly - it saves trees
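The hash approach described above could be sketched roughly as follows. This is only a minimal illustration, not the poster's own code: the helper names read_records and unique_to_first are made up for this example, the two files are simulated with in-memory strings so the snippet runs on its own, and it assumes every ">name" header is followed by its sequence (possibly spread over several lines). Because the question asks for FILE1 sequences that are missing from FILE2, the hash here is built from FILE2 whichever file is smaller.

```perl
use strict;
use warnings;

# Read ">name" / sequence records from a filehandle.
# Returns a list of [name, sequence] pairs in file order.
# Assumption: sequence lines following a header belong to that header
# and are concatenated if there is more than one.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                 # tolerate Windows line endings
        next if $line =~ /^\s*$/;         # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];      # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;     # append to the current sequence
        }
    }
    return @records;
}

# Return the records from the first handle whose sequence does not
# occur anywhere in the second handle.
sub unique_to_first {
    my ($fh1, $fh2) = @_;
    # Hash keyed by every sequence seen in the second file.
    my %seen = map { $_->[1] => 1 } read_records($fh2);
    return grep { !exists $seen{ $_->[1] } } read_records($fh1);
}

# Sample data copied from the question, held in strings for the demo.
my $file1_text = <<'END';
>name1
ABCDEFGHIJKLMNOPQ
>name2
RHREGRHRTHTRHTRHTHRTH
>name3
REHTRREFWDCEWCFEWWGREGREGRHTGREWFWE
>name4
EWDWQEREWREWTEW
END

my $file2_text = <<'END';
>new_name1
REHEFWEGTRJTYIJYTHTGRER
>new_name2
ABCDEFGHIJKLMNOPQ
>new_name3
DVFHTGVCVTRYITYYTRETEWTRE
END

# In-memory filehandles; with real files, open them by name instead.
open my $fh1, '<', \$file1_text or die "Cannot open FILE1 data: $!";
open my $fh2, '<', \$file2_text or die "Cannot open FILE2 data: $!";

for my $record (unique_to_first($fh1, $fh2)) {
    my ($name, $sequence) = @$record;
    print ">$name\n$sequence\n";
}
```

With the sample data from the question, this prints the name2, name3 and name4 records and skips name1. For real files, replace the two in-memory opens with something like open my $fh1, '<', 'FILE1' or die ...; the rest stays the same.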
Re: Silly newbie question on comparing files by cosmicperl (Chaplain) on Jan 20, 2008 at 19:44 UTC
You want to look into diffs. A quick Google search for "perl diff" brings up loads of material on doing this. CPAN also has a lot of relevant modules; try a CPAN search for "diff".