Okay, if you and your friend agree on
which 5-second portion to compare (e.g. always use the first
5 seconds, not counting any initial silence that might be
present), then you have a fairly good chance of building a
DFT-based discriminator/identifier with a decent success
rate.
In this case, Perl could be very
handy for driving the DFT/VQ engine on your friend's
audio file, doing data reduction on that output, and
computing suitable statistics to identify a "best match"
in your local database of first-5-second snippets.
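To make the pipeline concrete, here is a rough sketch of the idea: trim leading silence, take the agreed-upon snippet, compute DFT magnitudes frame by frame, reduce them to a small signature vector, and pick the nearest entry in a local database. It's in Python rather than Perl purely for illustration; all the names, frame sizes, bin counts, and thresholds below are my own guesses, and a real setup would use an FFT library instead of this naive DFT loop.

```python
import math

FRAME = 256   # DFT window length (illustrative choice)
N_BINS = 16   # size of the reduced "signature" vector (illustrative)

def trim_silence(samples, thresh=0.01):
    """Drop leading samples below an amplitude threshold."""
    for i, s in enumerate(samples):
        if abs(s) >= thresh:
            return samples[i:]
    return []

def dft_mags(frame):
    """Naive DFT magnitude spectrum; a real system would use an FFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))  # magnitude, so the sign of im is moot
    return mags

def signature(samples, rate, seconds=5):
    """Average frame spectra over the snippet, then collapse to N_BINS bins."""
    snippet = trim_silence(samples)[:rate * seconds]
    frames = [snippet[i:i + FRAME]
              for i in range(0, len(snippet) - FRAME + 1, FRAME)]
    avg = [0.0] * (FRAME // 2)
    for f in frames:
        for k, m in enumerate(dft_mags(f)):
            avg[k] += m / len(frames)
    step = (FRAME // 2) // N_BINS     # data reduction: coarse frequency bins
    return [sum(avg[i:i + step]) / step
            for i in range(0, step * N_BINS, step)]

def best_match(sig, db):
    """Return the database entry whose signature is nearest (Euclidean)."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(db, key=lambda name: dist(sig, db[name]))
```

With synthetic test tones standing in for songs, the flow looks like:

```python
rate = 2048
tone = lambda hz: [math.sin(2 * math.pi * hz * t / rate) for t in range(rate)]
db = {"low": signature(tone(100), rate, seconds=1),
      "high": signature(tone(400), rate, seconds=1)}
print(best_match(signature(tone(100), rate, seconds=1), db))
```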
Just building your local database of "song signatures" will
be a very instructive exercise, and you can use it for both
"training" and "testing". I could go on... but it would all
be speculative, and you should work it out for yourself.