Re^3: Crawling Relative Links from Webpages

Something like:

my $url = $mech->uri;
# capture scheme plus host, e.g. "http://dspace.mit.edu" (https allowed too)
my( $server ) = $url =~ m[(^https?://[^/]+)/];
...
my $pdfurl = $server . $link;

Note: There probably is some way of getting the appropriate portion of the url from URI without resorting to regex, but I've never worked out how.
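For what it's worth, one regex-free way to get that portion seems to be gluing the `scheme` and `authority` accessors of the URI object back together. A minimal sketch, assuming the URL is absolute; `server_prefix` is a made-up helper name, not something from URI or WWW::Mechanize:

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return "scheme://authority" for an absolute URL.
# authority is the host plus any userinfo and port, so a non-default
# port survives.
sub server_prefix {
    my ($url) = @_;
    my $u = URI->new($url);
    return $u->scheme . '://' . $u->authority;
}

print server_prefix('http://dspace.mit.edu/handle/1721.1/53720'), "\n";
# prints: http://dspace.mit.edu
```

Since `$mech->uri` already returns a URI object, the same two accessors can be called on it directly. Note that prepending this prefix to `$link` is only right when the link starts with a slash.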

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Crawling Relative Links from Webpages by Anonymous Monk on May 08, 2010 at 03:44 UTC
uri returns a URI object, so `$mech->uri->host` or `$mech->uri->ihost`
Re^5: Crawling Relative Links from Webpages by BrowserUk (Patriarch) on May 08, 2010 at 03:54 UTC
I know. But how do you get the bit the OP needs? Not like this:

perl -MURI -E"$u=new URI('http://dspace.mit.edu/handle/1721.1/53720'); say $u->host"
dspace.mit.edu

Nor any of these:

c:\test>perl -MURI -E"my $u=new URI('http://dspace.mit.edu/handle/1721.1/53720'); say $u->authority"
dspace.mit.edu
c:\test>perl -MURI -E"my $u=new URI('http://dspace.mit.edu/handle/1721.1/53720'); say $u->path"
/handle/1721.1/53720
c:\test>perl -MURI -E"my $u=new URI('http://dspace.mit.edu/handle/1721.1/53720'); say $u->fragment"
c:\test>perl -MURI -E"my $u=new URI('http://dspace.mit.edu/handle/1721.1/53720'); say $u->opaque"
//dspace.mit.edu/handle/1721.1/53720
c:\test>perl -MURI -E"my $u=new URI('http://dspace.mit.edu/handle/1721.1/53720'); say $u->canonical"
http://dspace.mit.edu/handle/1721.1/53720
Re^6: Crawling Relative Links from Webpages by Anonymous Monk on May 08, 2010 at 04:17 UTC
I'm sorry, I thought he wanted the host. Then something like `my $uri = URI->new_abs( $str, $mech->uri )`, but I would just use follow_link.
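
To spell out the `new_abs` suggestion: it resolves a link against the URL of the page it was found on, so root-relative, document-relative and already-absolute links all come out as full URLs. A rough sketch; the link paths and the `absolutize` helper name are invented for illustration, only the base URL comes from this thread:

```perl
use strict;
use warnings;
use URI;

# Base URL taken from the thread; in a crawler this would be $mech->uri.
my $base = 'http://dspace.mit.edu/handle/1721.1/53720';

# Hypothetical helper: resolve $link relative to $base_url.
sub absolutize {
    my ($link, $base_url) = @_;
    return URI->new_abs($link, $base_url)->as_string;
}

# The three link values below are made up for the example.
print absolutize('/bitstream/x.pdf', $base), "\n";
# prints: http://dspace.mit.edu/bitstream/x.pdf
print absolutize('x.pdf', $base), "\n";
# prints: http://dspace.mit.edu/handle/1721.1/x.pdf
print absolutize('http://example.org/y.pdf', $base), "\n";
# prints: http://example.org/y.pdf
```

If the links come from WWW::Mechanize itself (e.g. via `find_all_links`), the link objects also have a `url_abs` method that should give the same result without calling URI by hand.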