in reply to Using text files to remove duplicates in a web crawler

"For some reason this allows duplicates, any ideas?"
How do we keep track of duplicates in Perl? perldoc -q duplicate says (after listing a number of possible solutions): "perhaps you should have been using a hash all along, eh?" The FAQ is right: use a hash.
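Here is a minimal sketch of the hash approach. The names (%seen, @queue, crawl_push) are illustrative, not from the original post; the idea is simply that a hash lookup tells you in O(1) whether a URL has already been queued:

```perl
use strict;
use warnings;

# %seen maps each URL we have already queued to a true value.
my %seen;
my @queue;

# crawl_push is a hypothetical helper: enqueue a URL only once.
sub crawl_push {
    my ($url) = @_;
    return if $seen{$url}++;   # post-increment returns the OLD value,
                               # so this is false the first time only
    push @queue, $url;
}

crawl_push($_) for qw(
    http://example.com/
    http://example.com/about
    http://example.com/
);

print scalar(@queue), "\n";   # 2 -- the duplicate was skipped
```

The $seen{$url}++ idiom does the test and the marking in one step, which avoids the race between "have I seen this?" and "remember that I saw it".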

perldoc DB_File
perldoc AnyDBM_File
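Those modules matter once the crawl outgrows memory or needs to survive restarts: you tie the %seen hash to an on-disk DBM file and the dedup state persists between runs. A sketch, assuming AnyDBM_File (which picks whichever DBM library is installed); the filename "seen_db" is just an example:

```perl
use strict;
use warnings;
use Fcntl;         # for the O_CREAT and O_RDWR flags
use AnyDBM_File;   # ties the hash to an on-disk DBM file

# Duplicates recorded here are remembered across crawler runs.
tie my %seen, 'AnyDBM_File', 'seen_db', O_CREAT | O_RDWR, 0644
    or die "Cannot tie seen_db: $!";

my $url = 'http://example.com/';
if ($seen{$url}++) {
    print "skip  $url\n";   # already in the on-disk hash
} else {
    print "fetch $url\n";   # first time we have seen it
}

untie %seen;
```

DB_File works the same way with a Berkeley DB backend; the crawler code itself does not change, only the tie line.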

Also, merlyn has written a couple of articles on writing spiders, so you should check those out.

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.
