in reply to Using text files to remove duplicates in a web crawler
"For some reason this allows duplicates, any ideas?"

How do we keep track of "duplicates" in Perl? `perldoc -q duplicate` says (after listing a bunch of potential solutions) "perhaps you should have been using a hash all along, eh?" The FAQ is right: use a hash.
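A minimal sketch of the hash approach: keep a `%seen` hash keyed by URL, and only accept a URL the first time it turns up. The URL list here is made up for illustration; it is not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of URLs a crawler has extracted (illustrative only).
my @urls = (
    'http://example.com/',
    'http://example.com/about',
    'http://example.com/',        # duplicate
);

# %seen maps URL => number of times encountered. The grep keeps a URL
# only when its count was 0 before the post-increment.
my %seen;
my @unique = grep { !$seen{$_}++ } @urls;

print "$_\n" for @unique;         # each URL printed once, first-seen order
```

The same `$seen{$url}++` test works inline in a crawl loop: `next if $seen{$url}++;` skips anything already queued.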
`perldoc DB_File`
`perldoc AnyDBM_File`
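Those modules matter once the crawl outgrows memory: you can tie the `%seen` hash to an on-disk DBM file so the visited set persists between runs. A sketch, assuming a file name of `seen.db` (my choice, not from the post):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;    # for O_CREAT and O_RDWR

# Tie %seen to an on-disk DBM file so the set of visited URLs
# survives restarts and is not bounded by memory.
tie my %seen, 'DB_File', 'seen.db', O_CREAT|O_RDWR, 0644
    or die "Cannot tie seen.db: $!";

my $url = 'http://example.com/';
if ($seen{$url}++) {
    print "already crawled: $url\n";
}
else {
    print "new url: $url\n";
}
```

Swapping `DB_File` for `AnyDBM_File` in the `tie` line lets Perl pick whichever DBM implementation is available on the system.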
Also, merlyn (Randal Schwartz) has written a couple of articles on writing spiders, so you should check those out.