in reply to Crawling Relative Links from Webpages
The PDF link on the example page is not relative to the current page. It starts with /, so it is an absolute path relative to the root of the current server, and simply appending it to base() isn't going to work.
You need to combine it with the server address to form the correct URL, but Mech doesn't break that out for you. It will give you a URI object, but that's documented in alien, so I have never been sure whether it can give you the root address of the server or not. I've always used:
my( $server ) = $url =~ m[^(https?://[^/]+/)];
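For anyone who wants to try it, here is a minimal sketch of that regex approach next to what I believe is the URI module's equivalent. The page URL and link are made-up values for illustration; in a real script they would come from Mech. My understanding is that URI->new_abs resolves a leading-slash link against the page URL, and that resolving '/' the same way yields the server root, but treat that as an assumption to check against the URI docs rather than a statement of how Mech does it.

```perl
use strict;
use warnings;
use URI;

# Hypothetical values for illustration only; in a real script these would
# come from the page you fetched and from one of its links.
my $url  = 'http://www.example.com/papers/index.html';
my $link = '/files/paper.pdf';

# Regex approach: capture scheme and host, including the trailing slash.
my( $server ) = $url =~ m[^(https?://[^/]+/)];

# $server already ends in '/', so drop the link's leading slash before joining.
( my $path = $link ) =~ s[^/][];
my $joined = $server . $path;

# URI approach (assumes the URI module is installed): resolve the link
# against the page URL.
my $abs = URI->new_abs( $link, $url );

# Resolving '/' against the page URL gives the server root.
my $root = URI->new_abs( '/', $url );

print "$server\n";   # http://www.example.com/
print "$joined\n";   # http://www.example.com/files/paper.pdf
print "$abs\n";      # http://www.example.com/files/paper.pdf
print "$root\n";     # http://www.example.com/
```

The regex version has no dependencies, but it only handles http and https URLs that have a slash after the host. The URI version should also cope with links that really are relative to the page.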