in reply to Creating a web crawler (theory)
And using regexes would probably be a pain because webmasters don't always use FULL URLS like they should.
Erm, no, relative URLs are perfectly valid. Do you really think it'd be a good idea to have a hyooman explicitly add http://www.wherever.com/six/levels/deep/into/some/path/ to the front of every URI? Not every page is automatically generated.
At any rate, see the new_abs method from URI for how to handle these easily.
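Something along these lines should do it. This is only a rough sketch: the base URL is the one from my example above, standing in for whatever page your crawler just fetched, and the href values are made up for illustration.

```perl
use strict;
use warnings;
use URI;

# Assumption: the URL of the page the links were found on.
my $base = 'http://www.wherever.com/six/levels/deep/into/some/path/';

# Hypothetical href values as they might appear in that page's HTML.
my @hrefs = ('page.html', '../up.html', '/top.html', 'http://example.org/');

# new_abs resolves each link against the base; links that are
# already absolute come back unchanged.
my @absolute = map { URI->new_abs($_, $base)->as_string } @hrefs;

print "$_\n" for @absolute;
```

With that in place the crawler only ever deals with absolute URLs, so there's no need for a regex to guess where a relative link points.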