@links = filter(@links);

sub filter {
    # pseudocode
    #
    # strip off any hashes
    #   because "foo.com/bar.htm#quux" is the same as "foo.com/bar.htm"
    #
    # strip off "index.html" and so on
    #   because "foo.com/index.html" is the same as "foo.com/"
    #
    # various other things ...
    #
    # NOW feed them through a hash to guarantee uniqueness
    #
    # NOW grep them against the %seen hash
    #
    # return whatever's left
}
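The outline above could be fleshed out roughly as follows. This is only a minimal sketch, not a definitive implementation: the helper name `normalize`, the sample contents of `%seen`, the sample links, and the exact regexes (which treat both `index.htm` and `index.html` as the directory index) are assumptions made for illustration. The "various other things" step is left out because the outline does not say what it covers.

```perl
use strict;
use warnings;

# Assumed for the example: URLs that were already visited.
my %seen = ('foo.com/' => 1);

# Hypothetical helper: reduce a URL to one canonical spelling.
sub normalize {
    my ($url) = @_;
    $url =~ s/#.*\z//s;              # "foo.com/bar.htm#quux" -> "foo.com/bar.htm"
    $url =~ s{/index\.html?\z}{/};   # "foo.com/index.html"   -> "foo.com/"
    return $url;
}

sub filter {
    my @urls = map { normalize($_) } @_;

    # Feed them through a hash to guarantee uniqueness,
    # keeping the first occurrence so the original order survives.
    my %unique;
    @urls = grep { !$unique{$_}++ } @urls;

    # Grep them against %seen and return whatever is left.
    return grep { !$seen{$_} } @urls;
}

my @links = (
    'foo.com/bar.htm#quux',
    'foo.com/bar.htm',
    'foo.com/index.html',
    'foo.com/baz.htm',
);
@links = filter(@links);
print "$_\n" for @links;
```

With the sample data, this prints `foo.com/bar.htm` and `foo.com/baz.htm`: the two spellings of `bar.htm` collapse into one, and `foo.com/index.html` normalizes to `foo.com/`, which is already in `%seen`. Normalizing before deduplicating matters, since otherwise equivalent spellings would pass the uniqueness check as distinct strings.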