in reply to Creating a web crawler (theory)

And using regexes would probably be a pain because webmasters don't always use FULL URLS like they should.

Erm, no. Relative URLs are perfectly valid. Do you really think it'd be a good idea to have a hyooman explicitly add http://www.wherever.com/six/levels/deep/into/some/path/ to the front of every URI? Not every page is automatically generated.

At any rate, see the new_abs method from the URI module for how to handle these easily. It resolves a relative URL against the base URL of the page it was found on.
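
Something along these lines should do it. This is only a rough sketch: the base URL and the href values are made up for illustration, and in a real crawler the hrefs would come from whatever HTML parser you use rather than a hard-coded list.

```perl
use strict;
use warnings;
use URI;

# Hypothetical base: the URL of the page the links were found on.
my $base = 'http://www.wherever.com/six/levels/deep/into/some/path/index.html';

# Hypothetical href values as they might appear in that page.
my @hrefs = ('other.html', '../up.html', '/top.html', 'http://example.org/');

# Resolve each href against the base. URLs that are already
# absolute come back unchanged.
my @absolute = map { URI->new_abs($_, $base)->as_string } @hrefs;

print "$_\n" for @absolute;
```

The crawler then only ever has to deal with absolute URLs, and no regex needs to care how the webmaster wrote the link.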