in reply to Creating a web crawler (theory)
Certainly, there are reasons for using full URLs occasionally BUT WHERE DID YOU GET THAT IDEA? (That's not purely sarcasm. If you can offer an authority for that, I'd like to read it!)
IIRC, a full URL forces the visitor's browser to revisit the DNS server, creating needless traffic and slowing rendering.
(see brian_d_foy's reply below re DNS revisits: He's right, and I clearly IDidNotRC ... but I believe the balance of this post can stand!)
However, you have a number of good answers on how to deal with your generic question, and good suggestions for dealing with relative links.
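(Aside: if it helps, the usual Perl idiom for canonicalizing those relative links is the URI module, which resolves a relative href against the URL of the page it appeared on. A minimal sketch; the page URL and href below are invented examples, not anything from this thread:)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Resolve a relative href against the URL of the page it was found on.
# Both values are invented examples.
my $base = 'http://example.com/docs/index.html';
my $href = '../images/logo.png';

# URI->new_abs applies the standard RFC 3986 resolution rules.
my $abs = URI->new_abs( $href, $base );
print "$abs\n";    # prints: http://example.com/images/logo.png
```

Once every link has been made absolute like this, spotting duplicates before you queue a page is a simple hash lookup.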
But you may want to consider the volume of data you're apt to deal with. One of my sites has ~1600 pages and well over 5000 links. I can collect those links with a script -- ON A LOCAL MIRROR (i.e., no net time and no competition for the server's attention) -- in about 15 seconds, but I can't even guess what the time required would be if one were to try to chase down all the links on the secondary, tertiary, etc., pages...
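For the curious, the kind of link-collecting script I mean is nothing fancy: walk the mirror tree and hand each HTML file to HTML::LinkExtor. This is a sketch, not my actual script, and the mirror path is a placeholder:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use HTML::LinkExtor;

# Walk a local mirror and tally every link in its HTML files.
# $mirror_root is a placeholder -- point it at your own mirror.
my $mirror_root = '/var/www/mirror';
my %links;

find( sub {
    return unless /\.html?$/i;                 # HTML files only
    my $parser = HTML::LinkExtor->new( sub {
        my ( $tag, %attr ) = @_;               # e.g. ( 'a', href => '...' )
        $links{$_}++ for values %attr;         # count href, src, etc.
    } );
    $parser->parse_file( $File::Find::name );  # no network involved
}, $mirror_root );

printf "%d distinct links found\n", scalar keys %links;
```

On a local mirror the whole job is disk-bound, which is where the 15-second figure comes from; chase the same links over the wire and you're at the mercy of every server you touch.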
Replies are listed 'Best First'.

Re^2: Creating a web crawler (theory)
  by brian_d_foy (Abbot) on Jan 28, 2005 at 21:01 UTC
  by gaal (Parson) on Jan 29, 2005 at 14:10 UTC
  by brian_d_foy (Abbot) on Jan 29, 2005 at 15:42 UTC
  by gaal (Parson) on Jan 29, 2005 at 16:26 UTC
  by brian_d_foy (Abbot) on Jan 29, 2005 at 18:06 UTC